Author

Date of Award

2026

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering

Committee Chair

Rhonda Gaede

Committee Member

Jeffrey Kulick

Committee Member

David Coe

Committee Member

Tommy Morris

Committee Member

Ben Denton

Committee Member

Leon Jololian

Research Advisor

Rhonda Gaede

Subject(s)

Compilers (Computer programs), Neural networks (Computer science), Software maintenance, Reverse engineering

Abstract

When software is compiled into machine code, the high-level data types defined in the original source code are lost. Binary type inference is the process of attempting to recover these high-level data types from compiled binary code. This dissertation investigates binary type inference using graph neural networks (GNNs) to predict data types for decompiled variables. The study is organized into three phases. Phase one presents DRAGON, a GNN model that predicts simple data types for decompiled variables using input graphs derived from decompiled abstract syntax trees (ASTs). DRAGON also produces confidence estimates indicating the degree of faith the model places in each prediction. The results demonstrate that confidence estimates correlate with prediction accuracy, giving insight into which predictions should be trusted. DRAGON exhibits simple data type prediction accuracy competitive with or better than two state-of-the-art methods, showing that GNNs can be used effectively with decompiled ASTs for type inference. Phase two presents SAPHIRA, an algorithm extending DRAGON for structure recovery. SAPHIRA capitalizes on the interactive paradigm for which modern decompilers are designed by iteratively providing annotations to refine decompilation. The evaluation shows that SAPHIRA can recover structure definitions effectively whether or not the structures were observed during training. Phase three presents DRAGON-RYDER, a full type recovery solution combining DRAGON and SAPHIRA to infer both simple types and structure definitions. DRAGON-RYDER introduces incremental retyping, in which high-confidence predictions are applied first to improve available context before regenerating predictions for low-confidence variables. While not universally effective, incremental retyping can improve structure recovery for some programs. DRAGON-RYDER implements a technique to decide dynamically whether incremental retyping should be used for a program based on its initial confidence estimate distribution. Using this technique, DRAGON-RYDER achieves greater simple type prediction accuracy than TRex, a recent state-of-the-art approach. On the most difficult benchmark, DRAGON-RYDER demonstrates better structure recovery than TypeForge, a state-of-the-art method using a “mini” large language model (LLM) approximately 220x larger than the total model footprint of DRAGON-RYDER.
