Author

Nishan Pantha

Date of Award

2023

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair

Tathagata Mukherjee

Committee Member

Manil Maskey

Committee Member

Chaity Banerjee Mukherjee

Subject(s)

Big data, Data mining, Machine learning, Gene expression

Abstract

Recent years have seen rapid growth in high-dimensional datasets. Many existing machine learning (ML) algorithms perform poorly in high-dimensional settings, where many features may be redundant. Feature selection is therefore a critical step in such settings: it identifies the most relevant features while removing redundant ones. As dimensionality grows, however, the selection methods themselves face problems of efficiency and interpretability. To address this, this thesis proposes a novel feature selection framework that uses an ensemble of interpretable ML algorithms to select features and to rank the final set of features. Finally, the framework is applied to a gene expression dataset obtained through a collaboration with the National Aeronautics and Space Administration (NASA)'s Biological and Physical Sciences (BPS) team, where it helps identify important genes that contribute to specific target attributes through classification tasks.
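
The abstract does not state which interpretable models make up the ensemble or how their individual rankings are merged. As a minimal sketch only, assuming each model produces a per-feature importance score and that the rankings are combined by mean rank (a Borda-style aggregation, which may differ from the thesis's actual scheme), the final ranking step could look like the following. The model names, gene names, and scores are hypothetical placeholders, not results from the NASA BPS dataset.

```python
from statistics import mean


def rank_features(scores):
    """Map each feature to its rank within one model (1 = most important)."""
    ordered = sorted(scores, key=lambda f: scores[f], reverse=True)
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def ensemble_rank(model_scores, top_k=None):
    """Aggregate per-model importance scores by mean rank (Borda-style).

    model_scores maps a model name to {feature: importance score}.
    Assumption: every model scores the same set of features.
    """
    per_model_ranks = [rank_features(s) for s in model_scores.values()]
    features = per_model_ranks[0].keys()
    mean_ranks = {f: mean(r[f] for r in per_model_ranks) for f in features}
    # Lower mean rank = more consistently important across models;
    # ties are broken alphabetically by feature name.
    final = sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))
    return final[:top_k] if top_k is not None else final


# Hypothetical importance scores from three interpretable models.
scores = {
    "logistic_regression": {"gene_A": 0.9, "gene_B": 0.1, "gene_C": 0.5},
    "decision_tree": {"gene_A": 0.4, "gene_B": 0.2, "gene_C": 0.6},
    "lasso": {"gene_A": 0.7, "gene_B": 0.0, "gene_C": 0.3},
}

for gene, avg_rank in ensemble_rank(scores):
    print(f"{gene}: mean rank {avg_rank:.2f}")
```

Rank aggregation is used here instead of averaging raw scores because different models report importances on incomparable scales (coefficients, impurity reductions, and so on); ranks put them on a common footing.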
