Date of Award
2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair
Tathagata Mukherjee
Committee Member
Chaity Banerjee Mukherjee
Committee Member
Vineetha Menon
Research Advisor
Tathagata Mukherjee
Subject(s)
Natural language processing (Computer science), Computational linguistics--Methodology, Data mining, Artificial intelligence
Abstract
This thesis addresses the challenging problem of hierarchical multi-label text classification and introduces a novel zero-shot approach that recommends labels only down to the depth of the hierarchy at which it remains confident. To validate the proposed method, we experimented with several embedding models, namely text-embedding-ada-002, mpnet-all, instructor embeddings, and nasa-smd-ibm-st, on Earth science datasets. The experimental results show that every embedding model considered outperforms the supervised learning classifier used as a baseline, demonstrating the advantage of the proposed zero-shot approach. Because it does not train on labeled examples, the proposed solution can also mitigate the label imbalance problem typically observed in supervised learning. The findings from this research can help scholars, researchers, policymakers, and environmental scientists better understand and tackle urgent global issues. A natural next step is to apply the proposed framework to datasets from other domains, such as biology, physics, and medicine, to better understand the robustness of the model.
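The idea of recommending labels only as deep as the classifier is confident can be illustrated with a minimal sketch. It is not the thesis's implementation: the label hierarchy, the `threshold` value, and the functions `embed`, `cosine`, and `classify` are all hypothetical, and a toy bag-of-words vector stands in for the embedding models named in the abstract. The sketch walks the hierarchy top-down, follows every child label whose cosine similarity to the text meets the threshold, and stops a branch as soon as no child qualifies.

```python
import math
from collections import Counter

# Hypothetical label hierarchy (parent -> children), for illustration only.
HIERARCHY = {
    "root": ["atmosphere", "oceans"],
    "atmosphere": ["atmospheric temperature", "precipitation"],
    "oceans": ["ocean temperature", "sea ice"],
}


def embed(text):
    """Toy stand-in for a sentence-embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, threshold=0.2):
    """Return every label path followed until confidence drops below threshold.

    The threshold is an assumed value, not one taken from the thesis.
    """
    doc = embed(text)
    results = []

    def descend(node, path):
        extended = False
        for child in HIERARCHY.get(node, []):
            # No training data is used: labels are scored by similarity alone.
            if cosine(doc, embed(child)) >= threshold:
                extended = True
                descend(child, path + [child])
        # Record the path once no deeper label is confident enough.
        if not extended and path:
            results.append(path)

    descend("root", [])
    return results


print(classify("rainfall and precipitation measurements in the atmosphere"))
# [['atmosphere', 'precipitation']]
print(classify("changes in the atmosphere"))
# [['atmosphere']]
```

In the second call the text matches a top-level label but none of its children, so the sketch stops at depth one rather than forcing a leaf label. Swapping `embed` for a real sentence-embedding model would leave the traversal logic unchanged.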
Recommended Citation
Dahal, Rajashree, "Hierarchical multi-label text classification in Earth science datasets" (2024). Theses. 662.
https://louis.uah.edu/uah-theses/662