Date of Award

2024

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

Committee Chair

Tathagata Mukherjee

Committee Member

Letha Etzkorn

Committee Member

Chaity Banerjee Mukherjee

Research Advisor

Tathagata Mukherjee

Subject(s)

Source code (Computer science), Python (Computer program language), Comparative semantics

Abstract

In this thesis, we conducted a comparative study of methods for generating Python source code embeddings and evaluated their effectiveness using semantic labels. We used both word embedding models, such as Word2Vec and GloVe, and document embedding models to capture the semantic meaning of Python source code. For word embeddings, Word2Vec combined with cosine distance achieved the highest nearest-neighbor precision, 0.5790. For Python source code (document) embeddings, our analysis across two datasets showed that Doc2Vec paired with cosine distance outperformed the other methods in semantic code similarity detection, achieving an AUROC between 0.80 and 0.81 and an AUPR between 0.82 and 0.83. Notably, transformer-based methods such as CodeBERT and GPT-2 underperformed when used solely for inference, likely because these large language models are better suited to tasks such as code completion and code recommendation than to generating robust source code embeddings.
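As a rough illustration of the evaluation described in the abstract, the sketch below scores pairs of embedding vectors with cosine similarity and computes AUROC from those scores and binary "semantically similar" labels. It is a minimal sketch under stated assumptions, not the thesis's actual pipeline: the function names, the toy two-dimensional vectors, and the labels are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def auroc(scores, labels):
    """AUROC as the probability that a positive pair outscores a negative pair.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: (embedding_a, embedding_b, label), where label 1
# means the two code snippets are assumed to be semantically similar.
toy_pairs = [
    ([1.0, 0.0], [0.9, 0.1], 1),
    ([0.0, 1.0], [0.1, 0.9], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([1.0, 1.0], [1.0, -1.0], 0),
]

toy_scores = [cosine_similarity(a, b) for a, b, _ in toy_pairs]
toy_labels = [label for _, _, label in toy_pairs]
toy_auroc = auroc(toy_scores, toy_labels)
```

In practice the vectors would come from a trained embedding model such as Doc2Vec, and a library routine would normally replace the hand-written AUROC; the pairwise form is used here only because it makes the definition explicit.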

Recommended Citation

Gyawali, Binita, "A comparative study of methods for modeling Python source code semantic similarity" (2024). Theses. 724.
https://louis.uah.edu/uah-theses/724
