Bilkent University
Department of Computer Engineering
CS 590/690 SEMINAR


INSPECTBUGS: DETECTING INVALID BUG REPORTS AND RESOLVING THEM WITH NO-CODE FIXES


Mahmut Furkan Gön
M.S. Student
(Supervisor: Assoc. Prof. Eray Tüzün)

Computer Engineering Department
Bilkent University

Abstract: Many software issues are reported to software maintainers in the form of bug reports. However, many bug reports are invalid: resolving them does not require any modification to the source code. Invalid bug reports cause unnecessary human effort and time to be spent determining their nature, and customer support staff spend a considerable amount of time explaining why a reported bug is invalid. In this study, we investigate the automated subclassification of invalid bug reports using different machine learning (ML), deep learning (DL), and large language model (LLM) techniques. We also study how subclassification, LLMs, and retrieval-augmented generation (RAG) systems can be used to suggest no-code fixes for invalid bug reports. On a dataset of bug reports from the Brave repository, we evaluated various ML, DL, and LLM-based models for determining the subclass of invalid bug reports. We then used different LLMs, both standalone and in a RAG setting, to generate no-code fixes for these invalid reports. We applied two evaluation mechanisms to the no-code fixes. First, we randomly sampled the suggested no-code fixes and manually evaluated whether they resolved the reported bug. Second, we used BERTScore to compare the semantic similarity of the suggested no-code fixes with ground-truth no-code fixes obtained from the dataset. This study shows that LLM-based solutions outperform traditional ML- and DL-based solutions, as well as standalone LLMs, on bug report subclassification tasks. RAG-based no-code fix suggestions successfully resolved a substantial proportion of the invalid bug reports, demonstrating their potential to provide accurate and quick solutions to frequently reported issues, and they displayed high similarity to real-world resolutions, indicating their potential for use in customer support services. In future work, further analysis is needed to evaluate the effect of different fine-tuning, prompting, and context engineering techniques, as well as other LLMs and AI models.
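As a concrete illustration of the BERTScore-based evaluation mentioned in the abstract, the following is a minimal Python sketch. It assumes the open-source bert-score package; the candidate and reference fixes shown are hypothetical placeholders, not items from the actual dataset.

    # Minimal sketch of the BERTScore similarity evaluation
    # (assumes the bert-score package: pip install bert-score).
    from bert_score import score

    # Hypothetical no-code fix suggested by the RAG pipeline.
    candidates = ["Clear the browser cache and restart Brave."]
    # Hypothetical ground-truth no-code fix from a resolved report.
    references = ["Clearing cached data and restarting the browser fixes it."]

    # score() returns per-pair precision, recall, and F1 tensors.
    P, R, F1 = score(candidates, references, lang="en")
    print(f"BERTScore F1: {F1.mean().item():.3f}")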


DATE: Monday, March 23 @ 16:30
PLACE: EA 502