Improving the Accuracy of Refactoring Detection
|Online Access:||PDF Full Text|
|Summary:||As refactoring technology has developed, refactoring detection, its reverse technology, has also advanced considerably and been widely applied; it plays an important role in code optimisation, code review and code reliability. Over the past 20 years, refactoring detection has evolved from a theoretical concept into mature approaches and tools. However, because of the various complexities that arise when code is refactored, these detection tools still have problems in practice: tool selection, detection of nested refactorings, false negatives caused by matching algorithms, and so on. As the requirements for detection increase, the pursuit of better detection performance (precision and recall) and more general detection tools has become the major research goal of my PhD.
The main research components of this work are as follows. First, at the beginning of my research I conducted a meta-analysis of refactoring detection and evaluated the detection performance of four common refactoring detection tools on the same benchmark, analysed and compared their strengths and weaknesses, and identified new research questions and directions. Second, I identified the study and detection of nested refactorings as a blind spot in existing approaches, so I demonstrated the feasibility of nesting multiple refactoring types with each other; in addition, I created an approach that detects nested refactorings from single-refactoring data, using manually defined refactoring features combined with a random forest algorithm, and that detects all 35 semantically meaningful nestings of these types with 91.4% accuracy. Third, I focused on the features that emerge during the refactoring process, mining the refactoring information in the diff to help RefDiff improve its detection performance. During this research I developed Diff Extractor and Diff Encoder for extracting and encoding diffs, transformed diffs into arrays for refactoring information mining, and trained two models: 1. the Diff Structure Feature Model, which determines the type of refactoring from the structural features of the refactored diffs and can be used as a result checker that improves the overall performance of RefDiff by checking its detection results for false positives; 2. the Diff Feature Matching Network, which is trained on the correspondence between the removed and added parts of the refactored diff, is highly robust, and can solve the problem of missed matches caused by the word-frequency matching approach.
Finally, I designed an approach that integrates the two models: it optimises Diff Extractor and Diff Encoder based on the characteristics of diff features, adds flags to diffs based on the refactoring property, introduces a new encoding approach that emphasises token uniqueness, trains better models, and builds a model cross-validation mechanism that yields detection results with a high level of confidence. I show that this approach, called RefDiff-Model, not only improves the precision of RefDiff 2.0 to 100% and increases its recall to 96.1%, but also continues to support detection tasks in multiple programming languages.|
|Physical Description:||152 Pages|