Date of Award
2023
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair
Tathagata Mukherjee
Committee Member
Manil Maskey
Committee Member
Chaity Banerjee Mukherjee
Subject(s)
Big data, Data mining, Machine learning, Gene expression
Abstract
Recent years have seen rapid growth in high-dimensional datasets. Most existing machine learning (ML) algorithms fail in high-dimensional settings, where many features may be redundant. Feature selection is therefore a critical step in such settings: it identifies the most relevant features while removing redundant ones. As dimensionality increases, feature selection methods also face problems of efficiency and interpretability. This thesis therefore proposes a novel feature selection framework that uses an ensemble of interpretable ML algorithms to select features and rank the final feature set. Finally, the framework is applied to a gene expression dataset obtained through collaboration with the National Aeronautics and Space Administration (NASA) Biological and Physical Sciences (BPS) team, where it helps identify important and relevant genes contributing to specific target attributes through classification tasks.
Recommended Citation
Pantha, Nishan, "Feature selection in high-dimensional space with applications to gene expression data" (2023). Theses. 523.
https://louis.uah.edu/uah-theses/523