Date of Award

2024

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair

Tathagata Mukherjee

Committee Member

Chaity Banerjee Mukherjee

Committee Member

Vineetha Menon

Research Advisor

Tathagata Mukherjee

Subject(s)

Natural language processing (Computer science), Computational linguistics--Methodology, Data mining, Artificial intelligence

Abstract

This thesis addresses the challenging problem of hierarchical multi-label text classification and introduces a novel zero-shot approach that recommends labels only down to the depth of the hierarchy at which it remains confident. To validate the proposed method, we experimented on Earth science datasets with several embedding models, such as text-embedding-ada-002, mpnet-all, instructor embeddings, and nasa-smd-ibm-st. The experimental results show that every embedding model considered outperforms the supervised learning baseline classifier, which supports the effectiveness of the proposed zero-shot approach. The proposed solution can also mitigate the label imbalance problem typically observed in supervised learning approaches. The findings from this research can help scholars, researchers, policymakers, and environmental scientists better understand and tackle urgent global issues. A next step is to apply the proposed framework to datasets from other domains, such as biology, physics, and medicine, to better understand the robustness of the model.
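
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea (descend a label hierarchy by embedding similarity and stop where confidence drops), not the thesis's actual method. The toy bag-of-words `embed` function stands in for a real embedding model such as those named above. The label hierarchy, its descriptions, the `predict_path` helper, and the 0.1 threshold are all hypothetical. For brevity the sketch follows a single best path, whereas the thesis addresses the multi-label case.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical two-level label hierarchy with short label descriptions.
HIERARCHY = {
    "atmosphere": {
        "desc": "atmosphere air weather clouds precipitation wind",
        "children": {
            "precipitation": {
                "desc": "precipitation rain snow rainfall amount",
                "children": {},
            },
            "clouds": {
                "desc": "clouds cloud cover cloud properties",
                "children": {},
            },
        },
    },
    "oceans": {
        "desc": "oceans sea water salinity ocean temperature",
        "children": {
            "salinity": {
                "desc": "salinity salt density sea water",
                "children": {},
            },
        },
    },
}


def predict_path(text, nodes, threshold=0.1):
    """Walk down the hierarchy, keeping the best-matching label at each
    level, and stop as soon as the best similarity falls below threshold."""
    doc = embed(text)
    path = []
    while nodes:
        best, score = max(
            ((name, cosine(doc, embed(node["desc"]))) for name, node in nodes.items()),
            key=lambda pair: pair[1],
        )
        if score < threshold:
            break  # not confident enough to go deeper
        path.append(best)
        nodes = nodes[best]["children"]
    return path


if __name__ == "__main__":
    print(predict_path("daily rainfall and snow amount from precipitation gauges", HIERARCHY))
    print(predict_path("weather and wind in the air", HIERARCHY))
```

In this sketch no labeled training examples are used: only the label descriptions and the input text are embedded, which is what makes the setup zero-shot. A text that matches a top-level label but none of its children receives a shorter path instead of a forced leaf label.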
