Date of Award
Master of Science (MS)
Computer Science : Modeling and Simulation
Mikel D. Petty
Cystic fibrosis; Machine learning; Binding sites (Biochemistry); Functional genomics
Cystic fibrosis is a Mendelian genetic disorder in which absent or defective production of the chloride transport protein CFTR leads to hyperviscous mucus that damages organs including the lungs and digestive system. CFTR expression varies between organs, developmental stages, and individuals. We develop and optimize a machine learning pipeline to identify enhancers (binding sites for regulatory proteins) near CFTR. We segment the A549, Caco-2, Calu-3, and PANC-1 genomes, train a sequence-based classifier on the predicted enhancer segments, and use the classifier score to predict the enhancer activity of sequence variants. Our optimizations appreciably improve the resulting enhancer predictions. We present 112 high-interest variants for further analysis and observe that a minority of CF-causing variants are predicted to modify enhancers. Our pipeline summarizes a vast amount of epigenetic data into a simple, robust metric, yielding valuable functional hypotheses.
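As an illustration of the variant-scoring idea only, the following minimal sketch scores a sequence with a k-mer log-odds model and reports the score change caused by a single-base substitution. The abstract does not specify the classifier, features, or training data, so everything here is an assumption: the k-mer log-odds model stands in for whatever sequence-based classifier the thesis uses, the function names (`kmers`, `train_weights`, `score`, `variant_delta`) are hypothetical, and the sequences are toy strings rather than genomic data.

```python
from collections import Counter
from math import log


def kmers(seq, k=3):
    """Return all overlapping k-mers of a DNA string (uppercased)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_weights(positives, negatives, k=3, pseudocount=1.0):
    """Assumed stand-in classifier: one log-odds weight per k-mer,
    comparing enhancer-like (positive) to background (negative) sequences."""
    pos = Counter(km for s in positives for km in kmers(s, k))
    neg = Counter(km for s in negatives for km in kmers(s, k))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudocount * len(vocab)
    neg_total = sum(neg.values()) + pseudocount * len(vocab)
    return {
        km: log((pos[km] + pseudocount) / pos_total)
        - log((neg[km] + pseudocount) / neg_total)
        for km in vocab
    }


def score(seq, weights, k=3):
    """Classifier score: sum of k-mer weights; unseen k-mers contribute 0."""
    return sum(weights.get(km, 0.0) for km in kmers(seq, k))


def variant_delta(ref_seq, pos, alt_base, weights, k=3):
    """Score change when the base at 0-based index `pos` becomes `alt_base`."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return score(alt_seq, weights, k) - score(ref_seq, weights, k)


# Toy data (hypothetical, not from the thesis).
positives = ["ACGTGACGTGAC", "TGACGTGACGTG"]
negatives = ["AAAAAAAAAAAA", "TTTTTATTTTTA"]
weights = train_weights(positives, negatives)

ref = "ACGTGACGTGAC"
delta = variant_delta(ref, 5, "T", weights)
print(f"score(ref) = {score(ref, weights):.3f}, delta = {delta:.3f}")
```

In this sketch a negative delta means the substitution removes k-mers that the model associates with enhancer-like sequence, and a positive delta means it adds them. Ranking variants by the magnitude of such a delta is one simple way to shortlist candidates for follow-up.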
Lawlor, James M.J., "Application of machine learning models to CFTR enhancer discovery" (2017). Theses. 214.