Date of Award


Document Type


Degree Name

Master of Science (MS)


Computer Science : Modeling and Simulation

Committee Chair

Eric Mendenhall

Committee Member

Mikel D. Petty

Committee Member

Daniel Rochowiak


Cystic fibrosis; Machine learning; Binding sites (Biochemistry); Functional genomics


Cystic fibrosis is a Mendelian genetic disorder that causes production of hyperviscous mucus, which damages organs including the lungs and digestive system. It results from absent or defective production of the chloride transport protein CFTR. CFTR expression varies between organs, developmental stages, and individuals. We develop and optimize a machine learning pipeline to identify enhancers (binding sites for regulatory proteins) near CFTR. We segment A549, Caco-2, Calu-3, and PANC-1 genomes, train a sequence-based classifier on the predicted enhancer segments, and use the classifier score to predict sequence variants’ enhancer activities. Our optimizations appreciably improve the resulting enhancer predictions. We present 112 high-interest variants for further analysis and observe that a minority of CF-causing variants are predicted to modify enhancers. Our pipeline summarizes a vast amount of epigenetic data into a robust and simple metric, yielding valuable functional hypotheses.
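
The idea of using a classifier score to predict a variant's enhancer activity can be illustrated with a minimal sketch. The abstract does not name the classifier or its features, so everything here is an assumption made for illustration: a linear model over k-mer counts, the function names `kmer_counts`, `enhancer_score`, and `variant_effect`, and the toy weights and sequences. The sketch scores the reference and variant sequences and reports the difference.

```python
from collections import Counter
from typing import Dict


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def enhancer_score(seq: str, weights: Dict[str, float], k: int) -> float:
    """Linear score: sum over k-mers of (weight * count).

    Assumption: the trained classifier can be summarized as one weight
    per k-mer; k-mers without a weight contribute zero.
    """
    counts = kmer_counts(seq, k)
    return sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def variant_effect(ref_seq: str, alt_seq: str,
                   weights: Dict[str, float], k: int) -> float:
    """Predicted change in enhancer score caused by a variant (alt - ref)."""
    return enhancer_score(alt_seq, weights, k) - enhancer_score(ref_seq, weights, k)


# Hypothetical toy weights, not taken from any trained model.
TOY_WEIGHTS = {"GAT": 1.0, "ATA": 0.5, "TTT": -1.0}

ref = "CCGATACC"  # hypothetical reference window
alt = "CCGTTACC"  # same window with a single A>T substitution

delta = variant_effect(ref, alt, TOY_WEIGHTS, k=3)
print(delta)  # negative value: the variant removes positively weighted k-mers
```

In this toy case the substitution destroys the two positively weighted k-mers, so the score drops; a variant with a large absolute score change would be a candidate for closer study.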


