Date of Award
2017
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science : Modeling and Simulation
Committee Chair
Eric Mendenhall
Committee Member
Mikel D. Petty
Committee Member
Daniel Rochowiak
Subject(s)
Cystic fibrosis, Machine learning, Binding sites (Biochemistry), Functional genomics
Abstract
Cystic Fibrosis is a Mendelian genetic disorder causing production of hyper-viscous mucus, which damages organs including the lungs and digestive system, and is the result of absent or defective production of the chloride transport protein CFTR. CFTR expression varies between organs, developmental stages, and individuals. We develop and optimize a machine learning pipeline to identify enhancers (binding sites for regulatory proteins) near CFTR. We segment A549, Caco-2, Calu-3, and PANC-1 genomes, train a sequence-based classifier on the predicted enhancer segments, and use the classifier score to predict sequence variants’ enhancer activities. Our optimizations appreciably improve the resulting enhancer predictions. We present 112 high-interest variants for further analysis and observe a minority of CF-causing variants predicted to modify enhancers. Our pipeline summarizes a vast amount of epigenetic data into a robust and simple metric, yielding valuable functional hypotheses.
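The variant-scoring step described in the abstract can be illustrated with a minimal sketch. It assumes a toy linear model over k-mer counts standing in for the trained sequence-based classifier; the abstract does not specify the classifier, so the function names, the k-mer weights, and the example sequence below are hypothetical and are not taken from the thesis. A variant's predicted effect is taken here as the classifier score of the alternate sequence minus that of the reference sequence.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score(seq, weights, k):
    """Toy classifier score: weighted sum of k-mer counts.

    `weights` is a hypothetical stand-in for a trained model; k-mers
    without a weight contribute nothing.
    """
    counts = kmer_counts(seq, k)
    return sum(weights.get(kmer, 0.0) * n for kmer, n in counts.items())


def variant_delta(ref_seq, pos, alt_base, weights, k):
    """Score change caused by substituting `alt_base` at 0-based `pos`."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return score(alt_seq, weights, k) - score(ref_seq, weights, k)


# Hypothetical weights: positive values mark enhancer-like 3-mers.
weights = {"GAT": 1.0, "ATA": 0.5, "TAA": 0.5, "GCT": -1.0}
ref = "CCGATAACC"

# Substitute A -> C at position 3 and compare classifier scores.
delta = variant_delta(ref, 3, "C", weights, 3)
print(delta)
```

In this sketch a negative difference would be read as a variant predicted to weaken enhancer activity and a positive one as predicted to strengthen it; ranking variants by the magnitude of the difference is one way such a score could be used to shortlist candidates for follow-up.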
Recommended Citation
Lawlor, James M.J., "Application of machine learning models to CFTR enhancer discovery" (2017). Theses. 214.
https://louis.uah.edu/uah-theses/214