Date of Award
2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
Committee Chair
Tathagata Mukherjee
Committee Member
Manil Maskey
Committee Member
Chaity Banerjee Mukherjee
Research Advisor
Tathagata Mukherjee
Subject(s)
Information retrieval, Natural language processing (Computer science), Computational linguistics, Artificial intelligence
Abstract
In the context of exponential data growth, efficient information retrieval remains a challenge. One key problem lies in bridging the gap between the search query and the information available. This thesis introduces a framework for information retrieval that combines large language models (LLMs) with query augmentation. Given a query, its sub-queries are created using a fine-tuned sequence-to-sequence (Seq2Seq) model through a technique called knowledge distillation. Different prompting methods are applied to produce an efficient query graph. The generated graph is then fed through a retrieval-augmented generation (RAG) pipeline to answer the original question. In experiments on the open-source question-answering dataset HotpotQA, the framework achieved over 51% exact match with the ground truth.
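For readers unfamiliar with the exact match metric mentioned in the abstract, the following is a minimal sketch of how such a score is commonly computed for question answering. It assumes the widely used normalization of lowercasing, removing punctuation, dropping English articles, and collapsing whitespace; the thesis may normalize answers differently. The function names normalize_answer, exact_match, and exact_match_rate are illustrative and are not taken from the thesis.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors a common QA evaluation convention, not
    necessarily the exact normalization used in the thesis.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    """Return True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def exact_match_rate(predictions: list[str], ground_truths: list[str]) -> float:
    """Fraction of predictions that exactly match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    matches = sum(exact_match(p, g) for p, g in zip(predictions, ground_truths))
    return matches / len(predictions)
```

Under this convention, a score above 51% would mean that more than half of the predicted answers are identical to the reference answer after normalization.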
Recommended Citation
Bhusal, Anish, "Query augmentation for information retrieval (IR) using large language model (LLM)" (2024). Theses. 663.
https://louis.uah.edu/uah-theses/663