Tree-Pattern-Based Clone Detection with High Precision and Recall
- Authors
- Lee, Hyo-Sub; Choi, Myung-Ryul; Doh, Kyung-Goo
- Issue Date
- May-2018
- Publisher
- Korean Society for Internet Information (한국인터넷정보학회)
- Keywords
- Software maintenance; code clone; clone detection; abstract syntax tree
- Citation
- KSII Transactions on Internet and Information Systems, v.12, no.5, pp.1932 - 1950
- Indexed
- SCIE; SCOPUS; KCI
- Journal Title
- KSII Transactions on Internet and Information Systems
- Volume
- 12
- Number
- 5
- Start Page
- 1932
- End Page
- 1950
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/6216
- DOI
- 10.3837/tiis.2018.05.002
- ISSN
- 1976-7277
- Abstract
- This paper proposes a code-clone detection method that aims for the highest possible precision and recall, giving little weight to efficiency and scalability. The goal is to automatically create a reliable reference corpus that can serve as a basis for evaluating the precision and recall of clone detection tools. The algorithm takes an abstract-syntax-tree representation of source code and thoroughly examines every possible pair of duplicate tree patterns in the tree, while avoiding unnecessary and repeated comparisons wherever possible. The largest possible duplicate patterns are then collected into a set of pattern clusters, which are used to identify code clones. The method is implemented and evaluated on a standard set of open-source Java applications. The experimental results show very high precision and recall. All false-negative clones missed by the method are non-contiguous clones. Finally, the concept of neighbor patterns, which can be used to improve recall by detecting non-contiguous and intertwined clones, is proposed.
- Appears in Collections
- COLLEGE OF COMPUTING > SCHOOL OF COMPUTER SCIENCE > 1. Journal Articles
- COLLEGE OF ENGINEERING SCIENCES > SCHOOL OF ELECTRICAL ENGINEERING > 1. Journal Articles