Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION

 

Learning-Based Analysis of Pull Request-Issue Alignment with Large Language Models

 

Mustafa Yasir Altunhan
M.S. Student
(Supervisor: Assoc. Prof. Eray Tüzün)

Computer Engineering Department
Bilkent University

Abstract: Accurate alignment between pull requests (PRs) and their corresponding issues is crucial for efficient software development and for maintaining code quality, as misalignments can reduce traceability, hinder defect localization, and decrease maintainability. This thesis aims to improve automated PR–issue alignment classification by leveraging fine-tuned large language models (LLMs) across multiple alignment categories, to investigate the effects of PR–issue fields on model predictions through interpretability analysis, and to demonstrate how LLM-based PR–issue alignment analysis can be integrated into a real-world code review workflow. The proposed methodology consists of dataset preparation, LLM fine-tuning, interpretability analysis, and system implementation. An existing dataset is extended, and data augmentation techniques are applied to address class imbalance. Subsequently, GPT-4o is fine-tuned via instruction tuning, and several open-source LLMs (CodeLlama-7B, CodeQwen1.5-7B, StableCode-3B, CodeGemma-7B, and DeepSeek-Coder-6.7B) are fine-tuned using classification-specific model heads. Interpretability analysis using Shapley Additive Explanations (SHAP) is then conducted to examine the influence of PR–issue fields on the predictions of the best-performing open-source LLM. Beyond the modeling approach, this thesis presents the design and implementation of an LLM-based PR–issue alignment tool integrated into real-world software development workflows. The tool is implemented as an extension to Bitbucket and Jira: it automatically analyzes PR–issue pairs upon pull request updates, reports alignment predictions directly within the pull request interface, and allows developers to override automated decisions with an explicit label and explanation.
To support traceability and post-hoc analysis, the tool persistently stores model predictions, developer overrides, and the corresponding PR–issue artifacts (including the code diff) in a commit-scoped manner. Experimental results show that the fine-tuned LLMs outperform baseline models, achieving average improvements of 6.15% in accuracy and F1-micro, 14.69% in F1-macro, and 6.15% in recall. CodeLlama-7B emerges as the best-performing fine-tuned LLM overall, demonstrating consistent performance across evaluation metrics. Interpretability analysis further reveals that code diffs, together with issue body and PR body contents, exert the greatest influence on alignment predictions. Overall, the findings demonstrate that fine-tuning substantially enhances PR–issue alignment classification, while interpretability analysis provides actionable insights into the dataset features driving alignment decisions. Moreover, the implemented PR–issue alignment tool shows that LLM-based alignment analysis can be embedded into practical review workflows, supporting improved traceability, transparency, and decision-making in software engineering.

 

DATE: Friday, February 27 @ 16:30
PLACE: EA 409