
1 Retrieving Similar Code Fragments based on Identifier Similarity for Defect Detection
Norihiro Yoshida, Takashi Ishio, Makoto Matsushita, Katsuro Inoue (Osaka University)
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University

2 Similar code fragment
A similar code fragment is a code fragment that has a similar counterpart elsewhere in the source code.
- Similar code fragments are introduced into source code for various reasons, e.g. "copy-and-paste".
- They make software maintenance difficult.
[Figure: a source file containing the similar code fragments CF 1, CF 2, and CF 3.]
If CF 1 is defective, it is necessary to check CF 2 and CF 3.

3 Similar defects in Linux 2.6.6

    for(iter=0; iter<num_regs; iter++) {
        prom_prom_taken[iter].start_adr = prom_reg_memlist[iter].phys_addr;
        prom_prom_taken[iter].num_bytes = prom_reg_memlist[iter].reg_size;
        prom_prom_taken[iter].theres_more = &prom_phys_total[iter+1]; // should be: &prom_prom_taken[iter+1];
    }

    for(iter=0; iter<num_regs; iter++) {
        prom_prom_taken[iter].start_adr = (char *) prom_reg_memlist[iter].phys_addr;
        prom_prom_taken[iter].num_bytes = (unsigned long) prom_reg_memlist[iter].reg_size;
        prom_prom_taken[iter].theres_more = &prom_phys_total[iter+1]; // should be: &prom_prom_taken[iter+1];
    }

4 Similar defects in Linux 2.6.6 (continued)
The two loops on the previous slide contain the same defect, but the second one has type cast operations inserted: (char *) and (unsigned long). Because of the inserted casts, clone detection tools cannot treat the two code fragments as a clone pair.
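To illustrate why identifiers are a more robust basis than exact clones here, the following is a minimal sketch of identifier extraction. It is not the authors' tool: the function names, the capacity limits, and the tiny keyword table are assumptions made for this example, and comments and string literals are not handled. Because type names such as char, unsigned, and long are keywords rather than identifiers, the inserted casts leave the identifier list unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS 64 /* hypothetical capacity limits for this sketch */
#define MAX_LEN 64

/* A small subset of C keywords; a real lexer would list all of them. */
static const char *const KEYWORDS[] = {"char", "unsigned", "long", "int", "for"};

static int is_keyword(const char *word) {
    for (size_t k = 0; k < sizeof KEYWORDS / sizeof KEYWORDS[0]; k++)
        if (strcmp(word, KEYWORDS[k]) == 0) return 1;
    return 0;
}

/* Collect identifiers from src in order of appearance, skipping keywords.
   Comments and string literals are NOT handled in this sketch.
   Returns the number of identifiers stored in out (at most MAX_IDS). */
size_t extract_identifiers(const char *src, char out[][MAX_LEN]) {
    assert(src != NULL && out != NULL);
    size_t count = 0;
    const char *p = src;
    while (*p != '\0' && count < MAX_IDS) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            char word[MAX_LEN];
            size_t len = 0;
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < MAX_LEN - 1) word[len++] = *p; /* truncate overlong names */
                p++;
            }
            word[len] = '\0';
            if (!is_keyword(word)) strcpy(out[count++], word);
        } else if (isdigit((unsigned char)*p)) {
            /* Skip numeric literals, including suffixes such as 1UL. */
            while (isalnum((unsigned char)*p)) p++;
        } else {
            p++; /* operators, brackets, whitespace */
        }
    }
    return count;
}

/* Return 1 if both sources yield the same identifier list, else 0. */
int same_identifier_list(const char *a, const char *b) {
    static char ia[MAX_IDS][MAX_LEN], ib[MAX_IDS][MAX_LEN];
    size_t na = extract_identifiers(a, ia);
    size_t nb = extract_identifiers(b, ib);
    if (na != nb) return 0;
    for (size_t k = 0; k < na; k++)
        if (strcmp(ia[k], ib[k]) != 0) return 0;
    return 1;
}
```

Calling same_identifier_list on a statement from the first loop and its cast-carrying counterpart from the second loop returns 1 under these assumptions, even though the token sequences differ.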

5 An overview of proposed method
The method retrieves code fragments similar to an input code fragment.
1. Lexical analysis of the input code fragment (query) produces the input identifier list I_i[0] ... I_i[n_i].
2. Lexical analysis of the target source files produces the target identifier lists I_t1[0] ... I_t1[n_t1], I_t2[0] ... I_t2[n_t2], ..., I_tn[0] ... I_tn[n_tn].
3. Comparison of the input identifier list with the target identifier lists produces the similar sublists I_s1[0] ... I_s1[n_s1], I_s2[0] ... I_s2[n_s2], ..., I_sn[0] ... I_sn[n_sn].
4. Ranking of the similar sublists produces the similarity ranking:

    Rank  Start line #  End line #  Similarity
    1     Line_s1       Line_e1     Sim_1
    2     Line_s2       Line_e2     Sim_2

6 Comparison
Scan each target identifier list with a sliding window.
- We compare the identifiers in the sliding window with the input identifier list.
- We extract the code fragment corresponding to the sliding window if the window contains one or more identifiers from the input list.
[Figure: a fixed-length sliding window moves along the target identifier list It[0], It[1], ..., It[n]; its contents are compared with the input identifier list Ii[0], Ii[1], Ii[2].]
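The sliding-window scan can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name scan_windows, the hits output array, and passing the window length as a parameter are all hypothetical, since the slides say only that the window has a fixed length. Mapping a window back to source line numbers is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if id occurs in list[0..n-1], else 0. */
static int contains(const char *const *list, size_t n, const char *id) {
    for (size_t k = 0; k < n; k++)
        if (strcmp(list[k], id) == 0) return 1;
    return 0;
}

/* Slide a fixed-length window over the target identifier list and record
   the start offset of every window that shares at least one identifier
   with the input identifier list. Returns the number of offsets written
   to hits (at most max_hits). The window length is a free parameter here;
   how the original tool chooses it is not stated in the slides. */
size_t scan_windows(const char *const *input, size_t n_input,
                    const char *const *target, size_t n_target,
                    size_t window, size_t *hits, size_t max_hits) {
    assert(window > 0 && hits != NULL);
    size_t found = 0;
    for (size_t start = 0; start + window <= n_target && found < max_hits; start++) {
        for (size_t k = 0; k < window; k++) {
            if (contains(input, n_input, target[start + k])) {
                hits[found++] = start; /* window qualifies; record it once */
                break;
            }
        }
    }
    return found;
}
```

For example, with the input list {a, b}, the target list {x, y, a, z, w, v}, and a window of length 2, the windows starting at offsets 1 and 2 are the ones that contain an input identifier.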

7 Similarity-based ranking
The extracted code fragments are sorted according to a similarity defined over the following two sets (the formula itself appears only as an image on the slide and is not reproduced in this transcript).
- S_i: the set of elements in the input identifier list
- S_w: the set of elements in a sliding window
Developers investigate the resulting similarity-based ranking.
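Since the similarity formula is not available here, the sketch below substitutes a common set-based measure, the Jaccard coefficient |S_i ∩ S_w| / |S_i ∪ S_w|. This choice is an assumption and may differ from the definition the authors used. The struct ranked type and the function names are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if id occurs in list[0..n-1], else 0. */
static int contains(const char *const *list, size_t n, const char *id) {
    for (size_t k = 0; k < n; k++)
        if (strcmp(list[k], id) == 0) return 1;
    return 0;
}

/* Number of distinct strings in list[0..n-1]. */
static size_t distinct_count(const char *const *list, size_t n) {
    size_t count = 0;
    for (size_t k = 0; k < n; k++)
        if (!contains(list, k, list[k])) count++; /* first occurrence only */
    return count;
}

/* Number of distinct strings of a that also occur in b. */
static size_t intersection_count(const char *const *a, size_t na,
                                 const char *const *b, size_t nb) {
    size_t count = 0;
    for (size_t k = 0; k < na; k++)
        if (!contains(a, k, a[k]) && contains(b, nb, a[k])) count++;
    return count;
}

/* ASSUMPTION: Jaccard coefficient, |intersection| / |union|, treating both
   lists as sets. The slide's own definition may be different. */
double jaccard(const char *const *si, size_t n_i,
               const char *const *sw, size_t n_w) {
    assert(si != NULL && sw != NULL);
    size_t common = intersection_count(si, n_i, sw, n_w);
    size_t uni = distinct_count(si, n_i) + distinct_count(sw, n_w) - common;
    return uni == 0 ? 0.0 : (double)common / (double)uni;
}

/* One row of the ranking: line range of a fragment and its similarity. */
struct ranked {
    size_t start_line;
    size_t end_line;
    double similarity;
};

/* qsort comparator: higher similarity first. */
static int by_similarity_desc(const void *x, const void *y) {
    double a = ((const struct ranked *)x)->similarity;
    double b = ((const struct ranked *)y)->similarity;
    return (a < b) - (a > b);
}

/* Sort the extracted fragments into the similarity ranking. */
void rank_fragments(struct ranked *rows, size_t n) {
    qsort(rows, n, sizeof *rows, by_similarity_desc);
}
```

With S_i = {a, b, c} and S_w = {b, c, d}, two identifiers are shared out of four distinct ones, so this assumed measure gives 0.5. Duplicated identifiers in either list do not change the value, because both lists are treated as sets.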

8 Case Study
Target open source software systems
- arch/ directory in Linux 2.6.6: architecture-specific implementations in the OS; 2 incorrect pointer accesses
- server/ directory in Canna 3.6: Japanese input system; 19 buffer overflow errors
Procedure
1. Extract code fragments sharing similar defects.
2. Enter each code fragment into the tool implementing our method.
3. Inspect whether the similarity ranking places the code fragments involving defects near the top.

9 Result
Linux 2.6.6
- We used 2 code fragments as queries. Each code fragment involves an incorrect pointer access.
- For both queries, the 2 code fragments are ranked in the top 2.
Canna 3.6
- We used 19 code fragments as queries. Each code fragment involves a buffer overflow error.
- For every query, 18 or 19 of the code fragments are ranked in the top 30.
In our case studies, we could detect most of the similar defects.

10 Summary & Future work
We proposed a method to retrieve similar code fragments based on identifier similarity.
- Sliding window comparison
- Similarity-based ranking
We need further case studies.
- Application to similar defects in other software systems
- Effects of changing the "similarity" definition

