Author

Anish Bhusal

Date of Award

2024

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair

Tathagata Mukherjee

Committee Member

Manil Maskey

Committee Member

Chaity Banerjee Mukherjee

Research Advisor

Tathagata Mukherjee

Subject(s)

Information retrieval, Natural language processing (Computer science), Computational linguistics, Artificial intelligence

Abstract

As data volumes grow exponentially, efficient information retrieval remains a challenge. One key problem lies in bridging the gap between the search query and the information available. This thesis introduces a framework for information retrieval that combines large language models (LLMs) with query augmentation. Given a query, a sequence-to-sequence (Seq2Seq) model, fine-tuned through knowledge distillation, generates its sub-queries. Different prompting methods are applied to produce an efficient query graph. The generated graph is then fed through a retrieval-augmented generation (RAG) pipeline to answer the original question. In experiments on the open-source question-answering dataset HotpotQA, the framework achieved over 51% exact match with the ground truth.
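
For readers unfamiliar with the exact match metric reported above, the following is a minimal sketch of how such a score is commonly computed. It assumes the widely used convention of lowercasing, stripping punctuation and English articles, and collapsing whitespace before comparing strings; the thesis may normalize answers differently, and the function names `normalize_answer`, `exact_match`, and `exact_match_score` are illustrative, not taken from the thesis.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors a common QA evaluation convention, not
    necessarily the exact normalization used in the thesis.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    """Return True when the normalized strings are identical."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def exact_match_score(predictions: list[str], ground_truths: list[str]) -> float:
    """Percentage of predictions that exactly match their ground truth."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    matches = sum(exact_match(p, g) for p, g in zip(predictions, ground_truths))
    return 100.0 * matches / len(predictions)
```

Under this convention, a prediction such as "The Eiffel Tower." counts as a match for the ground truth "eiffel tower", while any difference in the remaining words counts as a miss.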

Available for download on Tuesday, May 06, 2025
