Date of Award
2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering
Committee Chair
Rhonda Gaede
Committee Member
Jeffrey Kulick
Committee Member
David Coe
Committee Member
Tommy Morris
Committee Member
Ben Denton
Committee Member
Leon Jololian
Research Advisor
Rhonda Gaede
Subject(s)
Compilers (Computer programs), Neural networks (Computer science), Software maintenance, Reverse engineering
Abstract
When software is compiled into machine code, the high-level data types defined in the original source code are lost. Binary type inference is the process of attempting to recover these high-level data types from compiled binary code. This dissertation investigates binary type inference using graph neural networks (GNNs) to predict data types for decompiled variables. The study is organized into three phases. Phase one presents DRAGON, a GNN model that predicts simple data types for decompiled variables using input graphs derived from decompiled abstract syntax trees (ASTs). DRAGON also produces confidence estimates indicating how much faith the model places in each prediction. The results demonstrate that these confidence estimates correlate with prediction accuracy, giving insight into which predictions should be trusted. DRAGON's simple data type prediction accuracy is competitive with or better than that of two state-of-the-art methods, showing that GNNs can be used effectively with decompiled ASTs for type inference. Phase two presents SAPHIRA, an algorithm that extends DRAGON for structure recovery. SAPHIRA capitalizes on the interactive paradigm that modern decompilers are designed around, iteratively providing annotations to refine decompilation. The evaluation shows that SAPHIRA can recover structure definitions effectively whether or not the structures were observed during training. Phase three presents DRAGON-RYDER, a full type recovery solution combining DRAGON and SAPHIRA to infer both simple types and structure definitions. DRAGON-RYDER introduces incremental retyping, in which high-confidence predictions are applied first to improve the available context before predictions are regenerated for low-confidence variables. While not universally effective, incremental retyping can improve structure recovery for some programs.
DRAGON-RYDER implements a technique to decide dynamically, based on a program's initial distribution of confidence estimates, whether incremental retyping should be used for that program. Using this technique, DRAGON-RYDER achieves greater simple type prediction accuracy than TRex, a recent state-of-the-art approach. On the most difficult benchmark, DRAGON-RYDER demonstrates better structure recovery than TypeForge, a state-of-the-art method that uses a “mini” large language model (LLM) approximately 220 times larger than the total model footprint of DRAGON-RYDER.
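The incremental retyping idea described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not DRAGON-RYDER's implementation: the predictor interface, the fixed 0.9 confidence threshold, the single re-prediction pass, and the toy model are all hypothetical. The dynamic decision of whether to retype at all is not modeled here.

```python
from typing import Callable, Dict, Tuple

# A prediction maps a variable name to (predicted type, confidence in [0, 1]).
Predictions = Dict[str, Tuple[str, float]]
# Hypothetical model interface: given the types already applied, predict all variables.
Predictor = Callable[[Dict[str, str]], Predictions]


def incremental_retype(predict: Predictor, threshold: float = 0.9) -> Dict[str, str]:
    """Apply high-confidence predictions first, then re-predict the rest (assumed single pass)."""
    initial = predict({})
    # Step 1: commit only the predictions the model is confident about.
    applied = {v: t for v, (t, c) in initial.items() if c >= threshold}
    # Step 2: regenerate predictions with the committed types as extra context.
    refined = predict(dict(applied))
    # Low-confidence variables take their refined prediction (falling back to the initial one).
    result = dict(applied)
    for var in initial:
        if var not in applied:
            result[var] = refined.get(var, initial[var])[0]
    return result


def toy_predict(applied: Dict[str, str]) -> Predictions:
    """Hypothetical toy model whose guess for 'b' improves once 'a' is typed."""
    preds = {"a": ("int", 0.95), "b": ("char *", 0.4)}
    if applied.get("a") == "int":
        preds["b"] = ("struct node *", 0.8)
    return preds


print(incremental_retype(toy_predict))
# {'a': 'int', 'b': 'struct node *'}
```

With a threshold so high that nothing is committed (for example 0.99), the second pass sees no extra context and the result matches the initial predictions, which mirrors the observation that incremental retyping is not universally effective.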
Recommended Citation
Stewart, Caleb, "An incremental approach for recovering decompiled variable data types using graph neural networks with learned confidence estimates" (2026). Dissertations. 500.
https://louis.uah.edu/uah-dissertations/500