2002/12/11 PROFES2002: On software maintenance process improvement based on code clone analysis. Yoshiki Higo*, Yasushi Ueda*, Toshihiro Kamiya**, Shinji.


1 2002/12/11 PROFES2002
On software maintenance process improvement based on code clone analysis
Yoshiki Higo*, Yasushi Ueda*, Toshihiro Kamiya**, Shinji Kusumoto*, Katsuro Inoue*
* Graduate School of Information Science and Technology, Osaka University
** PRESTO, Japan Science and Technology Corporation

2 Contents
Background: Code Clone
Objective
Code Clone Detection Tool: CCFinder
Proposed Clone Removal Technique
Case Studies
Summaries and Future Works

3 Background: Code Clone
A code clone is a code portion in source files that is identical or similar to another. Clones are described in terms of clone pairs and clone classes.
Code clones are one of the factors that make software maintenance more difficult: if a fault is found in a code clone, it must be corrected in all of its clones.

4 Code Clone Detection Tool: CCFinder
We have been developing a code clone detection tool, CCFinder. We have delivered CCFinder to software companies and evaluated its usefulness through several case studies.

5 Case studies of CCFinder
Open source software: JDK libraries (Java, 570 KLOC); Linux, FreeBSD (C, 1.6 + 1.3 MLOC); FreeBSD, OpenBSD, NetBSD (C); Qt (C++, 240 KLOC)
Commercial software (about 30 companies): NTT Data Corp., Hitachi Ltd., Hitachi GP, NEC soft Ltd., ASTEC Inc., SRA Inc., NASDA, Daiwa Computer, etc.
Student exercises at Osaka University
Filed in court as evidence in a software copyright suit

6 Purpose of our research
As a practical application of CCFinder, we want to use code clone analysis in the refactoring process. However, the code clones detected by CCFinder are sequences of tokens, so they are not appropriate for direct replacement by one module (subroutine, function, etc.).

7 Objective
We propose a method that extracts, from the code clones detected by CCFinder, those that are well suited to the refactoring process ([Extract Method], [Pull Up Method])*.
We apply the proposed method to several open source programs and evaluate its usefulness.
* M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.

8 Extract Method
Before:
void methodA(int i) {
    methodZ();
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}
void methodB(int i) {
    methodY();
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}
After (the duplicated statements are extracted into methodC):
void methodA(int i) {
    methodZ();
    methodC(i);
}
void methodB(int i) {
    methodY();
    methodC(i);
}
void methodC(int i) {
    System.out.println("name:" + name);
    System.out.println("amount:" + i);
}

9 Pull Up Method
[Figure: class diagrams of class A, class B, and class C before and after method A is pulled up]

10 Outline of CCFinder
The clone detection process consists of four steps.
Step 1: Lexical analysis (source files → token sequence)
Step 2: Transformation (token sequence → transformed token sequence)
Step 3: Match detection (transformed token sequence → clones on transformed sequence)
Step 4: Formatting (clones on transformed sequence → clone pairs)
Target programs: C/C++, Java, FORTRAN, COBOL, LISP, plain text

11 Process of CCFinder (1/4)
[Figure: the CCFinder pipeline (lexical analysis, transformation, match detection, formatting)]
static void foo() throws RESyntaxException {
    String a[] = new String [] { "123,400", "abc", "orange 100" };
    org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (pat.match(a[i]))
            sum += Sample.parseNumber(pat.getParen(0));
    System.out.println("sum = " + sum);
}
static void goo(String [] a) throws RESyntaxException {
    RE exp = new RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (exp.match(a[i]))
            sum += parseNumber(exp.getParen(0));
    System.out.println("sum = " + sum);
}

12 Process of CCFinder (2/4)
[Figure: the CCFinder pipeline (lexical analysis, transformation, match detection, formatting)]

13 Process of CCFinder (3/4)
[Figure: the CCFinder pipeline (lexical analysis, transformation, match detection, formatting)]

14 Process of CCFinder (4/4)
[Figure: the foo() and goo() source code from slide 11, shown again next to the CCFinder pipeline, whose last step (formatting) produces the clone pairs]
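The first three steps can be illustrated with a minimal sketch. It is not CCFinder's implementation: the tokenizer is a small regular expression, the keyword set is deliberately incomplete, and match detection uses a plain dynamic-programming longest-common-run search instead of a suffix tree. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenCloneSketch {
    // Small, deliberately incomplete keyword set (assumption for this sketch).
    private static final Set<String> KEYWORDS = Set.of(
        "static", "void", "int", "for", "if", "new", "throws", "return");

    // Identifier, number, string literal, or any single non-space character
    // (so "++" and "+=" each become two tokens).
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Za-z_][A-Za-z_0-9]*|[0-9]+|\"[^\"]*\"|\\S");

    /** Steps 1-2: split into tokens, then replace user identifiers and literals by "$". */
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char first = t.charAt(0);
            boolean identifier = Character.isJavaIdentifierStart(first);
            boolean literal = Character.isDigit(first) || first == '"';
            out.add((identifier && !KEYWORDS.contains(t)) || literal ? "$" : t);
        }
        return out;
    }

    /** Step 3: length of the longest run of equal tokens shared by a and b. */
    static int longestCommonRun(List<String> a, List<String> b) {
        int best = 0;
        int[] prev = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            int[] cur = new int[b.size() + 1];
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;   // extend the run ending at (i-1, j-1)
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }
}
```

With this normalization, the loops of foo() and goo() map to the same token sequence even though one uses pat and the other exp, which is why a token-based detector can report them as a clone pair.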

15 Issues in refactoring process
Since the code clones detected by CCFinder are sequences of tokens, they are not appropriate for direct replacement by one module (subroutine, function, etc.). Some of them are not suited to refactoring at all.

16 Example 1 (Code clones including statements that are needless for refactoring)
    righttokennumber = c.getEndNumber() - c.getStartNumber() + 1;
}
string getLeftClone() const {
    char temp[STRLENGTH];
    snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", leftID.c_str(),
             leftstartline, leftstartcolumn, leftstartnumber,
             leftendline, leftendcolumn, leftendnumber);
    string clone(temp);
    return clone;
}
string getRightClone() const {
    char temp[STRLENGTH];
    snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", rightID.c_str(),
             rightstartline, rightstartcolumn, rightstartnumber,
             rightendline, rightendcolumn, rightendnumber);
    string clone(temp);
    return clone;
}
int getLeftTokenNumber() const {
    return lefttokennumber;
}

17 Example 1 (Code clones including statements that are needless for refactoring)
Only the highlighted parts should be detected.

18 Example 2 (Code clones not suited to refactoring)
CCFinder extracts the highlighted parts as code clones, but these are not suited to refactoring.

19 Outline of proposed method
The proposed method extracts meaningful code clones from the output of CCFinder.
[Figure: source files → CCFinder → clone data → filter → meaningful clone data → GUI interface]

20 Processes executed by the filter
Syntax analysis unit: performs syntax analysis on the source code that includes code clones.
Clone extraction unit: extracts meaningful code clones from the result of syntax analysis and the output of CCFinder.
Clone management unit: sorts and merges the code clones detected by the clone extraction unit.
[Figure: output of CCFinder and source files → syntax analysis unit → clone extraction unit → clone management unit → output]
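One way to picture the clone extraction unit is as an interval filter. The sketch below is an assumption about how such a step could work, not a description of the authors' tool: given the line range of a raw clone and the line ranges of syntactic blocks found by syntax analysis, it keeps only the outermost blocks that lie completely inside the clone. The Range type and method names are hypothetical; records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockFilterSketch {
    /** A line range, used for both raw clones and syntactic blocks (hypothetical type). */
    record Range(int startLine, int endLine) {
        boolean contains(Range other) {
            return startLine <= other.startLine && other.endLine <= endLine;
        }
    }

    /** Returns the outermost blocks that lie completely inside the clone. */
    static List<Range> blocksInside(Range clone, List<Range> blocks) {
        // Keep only blocks fully covered by the clone.
        List<Range> inside = new ArrayList<>();
        for (Range b : blocks) {
            if (clone.contains(b)) inside.add(b);
        }
        // Drop blocks nested in another kept block.
        List<Range> outermost = new ArrayList<>();
        for (Range b : inside) {
            boolean nested = false;
            for (Range other : inside) {
                if (!other.equals(b) && other.contains(b)) nested = true;
            }
            if (!nested) outermost.add(b);
        }
        return outermost;
    }
}
```

A raw clone that starts at the tail of one method and ends in the middle of another would, under this sketch, be reduced to the complete methods in between, which is the shape an Extract Method or Pull Up Method refactoring needs.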

21 Implementation of proposed method
CCShaper (Code Clone Shaper) extracts meaningful blocks from the output of CCFinder.
Target programs: Java
Implementation language: C++
Source size: about 4000 LOC
Working environment: Windows 2000/XP

22 Outline of experiments
We conducted two experiments using two Java programs, ANTLR and Ant, both of which are open source software.
Experimental environment: Pentium 4 1.5 GHz, 512 MB SDRAM
We deal with code clones that include more than 50 tokens.
[Figure: source code of ANTLR and Ant → CCFinder → code clones → CCShaper → meaningful code clones]

23 Two criteria
A clone pair is a pair of code portions that are clones of each other.
A clone class is a set of code portions that are all clones of one another.
In this experiment, we use two criteria: the number of clone pairs and the number of clone classes.
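The relation between the two criteria can be made concrete with a small sketch that groups clone pairs into clone classes using a union-find structure. This is an illustration only; the fragment identifiers ("f1", "f2", ...) and the class name are hypothetical, and the slides do not say how clone classes are actually computed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class CloneClassSketch {
    // Maps each code fragment to its parent in the union-find forest.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p);   // path compression
        }
        return p;
    }

    /** Records that fragments a and b form a clone pair. */
    void addPair(String a, String b) {
        parent.put(find(a), find(b));
    }

    /** Returns the clone classes: fragments connected through clone pairs. */
    List<Set<String>> classes() {
        Map<String, Set<String>> byRoot = new TreeMap<>();
        for (String fragment : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(fragment), r -> new TreeSet<>()).add(fragment);
        }
        return new ArrayList<>(byRoot.values());
    }
}
```

A clone class with n members contains n(n-1)/2 clone pairs, so a class of 20 fragments already accounts for 190 pairs. This is why pair counts can be far larger than class counts for the same program.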

24 Experiment 1 (ANTLR) (1/3)
ANTLR is implemented in Java and generates parsers in either Java or C++. 239 source files; size: about 44000 LOC.
Result (analysis time: about 2 minutes):
                          Without CCShaper   With CCShaper   Ratio
Number of clone pairs     388574             984             1/400
Number of clone classes   1072               148             1/7

25 Experiment 1 (ANTLR) (2/3)
Comparison on scatter plots
[Figure: scatter plots without CCShaper and with CCShaper; the selected code clone is marked (a)]

26 Experiment 1 (ANTLR) (3/3)
Source code of the selected code clone (a):
public final void mOPEN_ELEMENT_OPTION(boolean _createToken)
        throws RecognitionException, CharStreamException, TokenStreamException {
    int _ttype;
    Token _token = null;
    int _begin = text.length();
    _ttype = OPEN_ELEMENT_OPTION;
    int _saveIndex;
    match('<');
    if (_createToken && _token == null && _ttype != Token.SKIP) {
        _token = makeToken(_ttype);
        _token.setText(new String(text.getBuffer(), _begin, text.length() - _begin));
    }
    _returnToken = _token;
}
Only the highlighted portions differ from the other clones.
This code clone appears in 20 places in ANTLR.
All of the code clones are methods of the same class.
These methods can be merged into one method by adding 2 arguments.
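A sketch of what "merging by adding 2 arguments" could look like follows. The highlighting on the slide is lost, so it is an assumption here that the two differing portions are the token type constant and the matched character. The lexer state is replaced by simplified, hypothetical stand-ins (a String input, a String token); this is not ANTLR's actual code.

```java
final class MergedTokenMethodSketch {
    // Hypothetical stand-ins for constants used by the generated lexer.
    static final int SKIP = -1;
    static final int OPEN_ELEMENT_OPTION = 1;
    static final int CLOSE_ELEMENT_OPTION = 2;

    private final String input;
    private int pos = 0;
    private final StringBuilder text = new StringBuilder();
    String returnToken;   // text of the last token, or null if none was created

    MergedTokenMethodSketch(String input) {
        this.input = input;
    }

    private void match(char expected) {
        if (pos >= input.length() || input.charAt(pos) != expected) {
            throw new IllegalStateException("expected '" + expected + "' at " + pos);
        }
        text.append(input.charAt(pos++));
    }

    /** The shared body: the two former differences are now parameters. */
    private void mSingleChar(boolean createToken, int ttype, char c) {
        String token = null;
        int begin = text.length();
        match(c);
        if (createToken && ttype != SKIP) {
            token = text.substring(begin);
        }
        returnToken = token;
    }

    // The former clones shrink to one-line delegations.
    void mOPEN_ELEMENT_OPTION(boolean createToken) {
        mSingleChar(createToken, OPEN_ELEMENT_OPTION, '<');
    }

    void mCLOSE_ELEMENT_OPTION(boolean createToken) {
        mSingleChar(createToken, CLOSE_ELEMENT_OPTION, '>');
    }
}
```

Keeping the original method names as thin wrappers means existing callers do not change, while a fault in the shared body needs to be fixed in only one place.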

27 Experiment 2 (Ant) (1/4)
Ant is a Java-based build tool. 689 source files; size: about 164000 LOC.
Result (analysis time: about 5 seconds):
                          Without CCShaper   With CCShaper   Ratio
Number of clone pairs     12870              159             1/80
Number of clone classes   1161               87              1/13

28 Experiment 2 (Ant) (2/4)
Comparison on scatter plots
[Figure: scatter plots without CCShaper and with CCShaper; the selected code clone is marked (b)]

29 Experiment 2 (Ant) (3/4)
Source code of the selected code clone (b):
public void getAutoresponse(Commandline cmd) {
    if (m_AutoResponse == null) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
    } else if (m_AutoResponse.equalsIgnoreCase("Y")) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_YES);
    } else if (m_AutoResponse.equalsIgnoreCase("N")) {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_NO);
    } else {
        cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
    } // end of else
}
These clones are verbatim copies of one another.
They appear in seven classes.
The seven classes inherit from the same class.
The methods can be merged into one method by pulling them up to the parent class.

30 Experiment 2 (Ant) (4/4)
Class diagram (before refactoring): MSVSS is the parent class of MSVSSADD, MSVSSCHECKIN, MSVSSCHECKOUT, MSVSSCP, MSVSSCREATE, MSVSSGET, and MSVSSLABEL. Each of the seven subclasses defines its own getAutoresponse(Commandline cmd).
Class diagram (after refactoring): the same hierarchy, with a single getAutoresponse(Commandline cmd) defined in MSVSS.
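The refactored hierarchy can be sketched in a few lines. This is a simplified illustration, not Ant's source: CommandlineStub is a hypothetical stand-in for Ant's Commandline, the flag values are placeholders, and only two of the seven subclasses are shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Ant's Commandline: it only records argument values.
final class CommandlineStub {
    final List<String> args = new ArrayList<>();

    void addValue(String value) {
        args.add(value);
    }
}

abstract class MsvssBase {
    // Placeholder flag values (assumption, not Ant's real constants).
    static final String FLAG_AUTORESPONSE_DEF = "DEF";
    static final String FLAG_AUTORESPONSE_YES = "YES";
    static final String FLAG_AUTORESPONSE_NO = "NO";

    protected String autoResponse;

    void setAutoresponse(String response) {
        autoResponse = response;
    }

    // Pulled up: formerly repeated verbatim in every subclass.
    public void getAutoresponse(CommandlineStub cmd) {
        if (autoResponse == null) {
            cmd.addValue(FLAG_AUTORESPONSE_DEF);
        } else if (autoResponse.equalsIgnoreCase("Y")) {
            cmd.addValue(FLAG_AUTORESPONSE_YES);
        } else if (autoResponse.equalsIgnoreCase("N")) {
            cmd.addValue(FLAG_AUTORESPONSE_NO);
        } else {
            cmd.addValue(FLAG_AUTORESPONSE_DEF);
        }
    }
}

// The subclasses inherit the method instead of redefining it.
final class MsvssAdd extends MsvssBase { }

final class MsvssCheckin extends MsvssBase { }
```

Because the clones are verbatim copies and the subclasses already share a parent, the pull-up needs no new parameters, in contrast to a merge of near-identical methods.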

31 Summaries
We have developed a filtering tool, CCShaper, which extracts code clones that are well suited to refactoring.
We have evaluated the usefulness of CCShaper by applying it to actual Java programs.

32 Future works
We are going to:
apply CCShaper to commercial software,
extend it to support other programming languages,
develop a filtering method that can extract code clones better suited to refactoring.


34 The web page of CCFinder/Gemini is available at http://sel.ist.osaka-u.ac.jp/cdtools/index.html.en

35 The difference between 'diff' and clone detection tools
Diff finds the longest common subsequence of two inputs. Given a code portion, diff does not report two or more occurrences of the same code portion (clones).
A clone detection tool finds all identical or similar code portions.

36 Suffix tree
A suffix tree is a tree that satisfies the following conditions.
1. A leaf node represents the starting position of a substring.
2. A path from the root node to a leaf node represents a substring.
3. The first characters of the labels of all the edges leaving one node are different from each other.
→ A common path means a clone.
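The idea that a common path means a clone can be tried out without building a tree. The sketch below swaps in a simpler technique: it sorts all suffixes and compares neighbours, since suffixes that share a path from the root of a suffix tree end up adjacent in sorted order. It works on characters rather than tokens and is quadratic or worse, so it is only an illustration; the class name is hypothetical.

```java
import java.util.Arrays;

final class RepeatedSubstringSketch {
    /** Longest substring that occurs at least twice in s ("" if there is none). */
    static String longestRepeat(String s) {
        Integer[] starts = new Integer[s.length()];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i;
        }
        // Sort suffix start positions by suffix text (naive comparison).
        Arrays.sort(starts, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        String best = "";
        for (int k = 1; k < starts.length; k++) {
            int a = starts[k - 1];
            int b = starts[k];
            int len = 0;
            // Common prefix of two neighbouring suffixes = a repeated substring.
            while (a + len < s.length() && b + len < s.length()
                    && s.charAt(a + len) == s.charAt(b + len)) {
                len++;
            }
            if (len > best.length()) {
                best = s.substring(a, a + len);
            }
        }
        return best;
    }
}
```

For example, longestRepeat("abcxabcy") yields "abc". The two occurrences may overlap in this sketch (as in "banana", where the result is "ana"), which a real clone detector would have to rule out.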

37 Example of transformation rules in Java
All user-defined identifiers are transformed to the same token.
A unique identifier is inserted at each end of the top-level definitions and declarations. This prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
"java.lang.Math.PI" is transformed to "Math.PI". With an import statement, a class can be referred to by either its full package name or a shorter name.
"new int[] {1, 2, 3}" is transformed to "new int[] {$}". This eliminates table initialization code.
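Two of these rules can be approximated with regular expressions, as in the sketch below. It operates on source text rather than on a token sequence, and it assumes that lower-case dotted prefixes are package names, which real code does not guarantee. The class and method names are hypothetical and this is not how CCFinder implements its rules.

```java
final class TransformRulesSketch {
    /** Rule: a fully qualified name is reduced to ClassName.member. */
    static String stripPackage(String code) {
        // Lower-case dotted prefixes such as "java.lang." are treated as
        // package names when followed by an upper-case letter (assumption).
        return code.replaceAll("\\b(?:[a-z][a-z0-9_]*\\.)+(?=[A-Z])", "");
    }

    /** Rule: the contents of an array initializer collapse to a single "$". */
    static String collapseInitializer(String code) {
        // "$1" keeps the "new T[] " prefix; "\\$" emits a literal dollar sign.
        return code.replaceAll("(new\\s+\\w+\\s*\\[\\s*\\]\\s*)\\{[^{}]*\\}", "$1{\\$}");
    }
}
```

Applied to the examples on the slide, stripPackage turns "java.lang.Math.PI" into "Math.PI", and collapseInitializer turns "new int[] {1, 2, 3}" into "new int[] {$}".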

38 The output of CCFinder
#version: ccfinder 3.1
#langspec: JAVA
#option: -b 30,1
#option: -k +
#option: -r abcdfikmnprsv
#option: -c wfg
#begin{file description}
0.0 52 C:\Gemini.java
0.1 94 C:\GeneralManager.java
:
#end{file description}
#begin{clone}
0.1 53,9 63,13 1.10 542,9 553,13 35
0.1 53,9 63,13 1.10 624,9 633,13 35
0.2 124,9 152,31 0.2 154,9 216,51 42
:
#end{clone}
In the file description section, "0.0" is an object file ID (file 0 in group 0).
Each line in the clone section gives the location of a clone pair (for example, lines 53 - 63 in file 0.1 and lines 542 - 553 in file 1.10 are identical or similar to each other).
It is difficult to analyze source code using only this text-based information about the location of clone pairs.
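A line of the clone section can be read into a small data structure, as sketched below. The slide only confirms that the first number of each position is a line; treating the second number as a column is an assumption, and the meaning of the trailing number is not stated, so it is kept as an uninterpreted field. The type and method names are hypothetical, and records require Java 16 or later.

```java
final class CcfinderLineSketch {
    /** One side of a clone pair: file ID plus start and end positions. */
    record Fragment(String fileId, int startLine, int startColumn,
                    int endLine, int endColumn) { }

    /** A clone pair; lastField is the trailing number, left uninterpreted. */
    record ClonePair(Fragment left, Fragment right, int lastField) { }

    /** Parses a line such as "0.1 53,9 63,13 1.10 542,9 553,13 35". */
    static ClonePair parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length != 7) {
            throw new IllegalArgumentException("unexpected clone line: " + line);
        }
        return new ClonePair(fragment(f[0], f[1], f[2]),
                             fragment(f[3], f[4], f[5]),
                             Integer.parseInt(f[6]));
    }

    private static Fragment fragment(String fileId, String start, String end) {
        String[] s = start.split(",");
        String[] e = end.split(",");
        return new Fragment(fileId,
                            Integer.parseInt(s[0]), Integer.parseInt(s[1]),
                            Integer.parseInt(e[0]), Integer.parseInt(e[1]));
    }
}
```

Once the pairs are in this form, they can be sorted, filtered by length, or plotted, which is easier than reading the raw text.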

39 Analysis of the comparison among students (non-gapped clones only)
[Figure: scatter plot with regions A and B marked, and the corresponding code]
A (2 students): Similar code fragments came from the source code of a sample compiler described in a textbook.
B (4 students): Many code fragments were similar even with respect to variable names and comments.

