Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.


1 Alattin: Mining Alternative Patterns for Detecting Neglected Conditions
Suresh Thummalapenta and Tao Xie
Department of Computer Science, North Carolina State University, Raleigh, USA
ASE 2009
This work is supported in part by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443 and ARO grant W911NF-08-1-0105 managed by NCSU Secure Open Source Systems Initiative (SOSI)

2 Alattin: Motivation
- Problem: Programming rules are often not well documented
- General solution:
  - Mine common patterns across a large number of data points (e.g., code samples)
  - Use common patterns as programming rules to detect defects

3 Challenges addressed by Alattin
- Limited data points
  - Existing approaches mine specifications from a few code bases
  - They miss specifications due to lack of sufficient data points
- Existing approaches produce a large number of false positives

4 Limited Data Points
[Figure: Existing approaches mine patterns directly from a few code repositories (e.g., Eclipse, Linux, ...), which often lack sufficient relevant data points (e.g., API call sites). The Alattin approach first searches code repositories 1, 2, ..., N through a code search engine (e.g., open source code on the web) and then mines patterns.]

5 Large Number of False Positives
- Existing approaches produce a large number of false positives
- One major observation:
  - Programmers often write code in different ways for achieving the same task
  - Some ways are more frequent than others
[Figure: frequent ways -- mine patterns --> mined patterns -- detect violations --> infrequent ways]

6 Example: java.util.Iterator.next()
Code Sample 1:
    void PrintEntries1(ArrayList entries) {
        …
        Iterator it = entries.iterator();
        if (it.hasNext()) {
            String last = (String) it.next();
        }
        …
    }
Code Sample 2:
    void PrintEntries2(ArrayList entries) {
        …
        if (entries.size() > 0) {
            Iterator it = entries.iterator();
            String last = (String) it.next();
        }
        …
    }
java.util.Iterator.next() throws NoSuchElementException when the iteration has no more elements, for example when invoked on the iterator of a list without any elements
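The two guarding styles on this slide, and the unguarded call they protect against, can be tried out directly. The following is a minimal runnable sketch, not code from the talk: the class name `IteratorGuards` and its method names are hypothetical, and the methods return the first element rather than storing it in a local variable so that the behavior is observable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; names are illustrative and not taken from the slides.
final class IteratorGuards {

    // Style of Code Sample 1: check hasNext() before calling next().
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null;
    }

    // Style of Code Sample 2: check size() before calling next().
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null;
    }

    // No guard at all: the neglected condition.
    static String firstUnguarded(List<String> entries) {
        return entries.iterator().next();
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        List<String> one = new ArrayList<>(Arrays.asList("a"));

        System.out.println(firstViaHasNext(one));   // guarded, non-empty list
        System.out.println(firstViaSize(one));      // guarded, non-empty list
        System.out.println(firstViaHasNext(empty)); // guarded, empty list
        try {
            firstUnguarded(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded next() threw NoSuchElementException");
        }
    }
}
```

Both guards are equally safe for an `ArrayList`, which is why a rule that accepts only the `hasNext()` form would flag the `size()` form as a false positive.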

7 Example: java.util.Iterator.next()
Code Sample 1 and Code Sample 2: as on slide 6
- 1243 code examples
  - Sample 1 (1218 / 1243)
  - Sample 2 (6 / 1243)
- Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”

8 Example: java.util.Iterator.next()
- Require more general patterns (alternative patterns): P1 or P2
  - P1: boolean check on return of Iterator.hasNext before Iterator.next
  - P2: boolean check on return of ArrayList.size before Iterator.next
- Existing approaches cannot mine such patterns, since alternative P2 is infrequent
Code Sample 1 and Code Sample 2: as on slide 6

9 Our Solution: ImMiner Algorithm
- Mines alternative patterns of the form P1 or P2
- Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1
- 1243 code examples: Sample 1 (1218 / 1243), Sample 2 (6 / 1243)
  - P2 is infrequent among the entire 1243 code examples
  - P2 is frequent among code examples not supporting P1
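The counts on this slide can be turned into support values to see how strongly the picture changes once the examples supporting P1 are set aside. This is a worked calculation under two assumptions the slide does not state explicitly: the 1218 examples are exactly those supporting P1, and the 6 Sample 2 examples all fall among the remaining 1243 - 1218 = 25.

```latex
\[
\mathrm{sup}_{\text{all}}(P_2) = \frac{6}{1243} \approx 0.0048,
\qquad
\mathrm{sup}_{\neg P_1}(P_2) = \frac{6}{1243 - 1218} = \frac{6}{25} = 0.24
\]
```

Under these assumptions the support of P2 is about fifty times higher among the examples that do not support P1 than in the full set.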

10 Alternative Patterns
- ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”
  - Balanced: all alternatives (both P1 and P2) are frequent
  - Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent. Represented as “P1 or P̂2”
  - Single: only one alternative

11 ImMiner Algorithm
- Iteratively applies frequent-itemset mining [Burdick et al. ICDE 01]
- An input database with the following APIs for Iterator.next()
[Figure: input database; mapping of IDs to APIs]

12 ImMiner Algorithm: Frequent Alternatives
[Figure: input database]
- Frequent itemset mining (min_sup 0.5)
- Frequent item: 1
- P1: boolean check on the return of Iterator.hasNext() before Iterator.next()

13 ImMiner: Infrequent Alternatives of P1
[Figure: positive database (PSD); negative database (NSD)]
- Split the input database into two databases: positive (PSD) and negative (NSD)
- Mine patterns that are frequent in NSD and infrequent in PSD
  - Reason: only such patterns serve as alternatives for P1
- Alternative pattern P2: “const check on the return of ArrayList.size() before Iterator.next()”
- Alattin applies the ImMiner algorithm to detect neglected conditions
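The split-and-mine step can be sketched in a few lines. The code below is a deliberately simplified illustration, not the authors' implementation: it counts single items instead of running a full frequent-itemset miner, it assumes PSD holds the transactions that contain P1 and NSD those that do not, it reuses one threshold for all three tests, and the toy database, the class name `ImMinerSketch` and all method names are hypothetical. Item 1 stands for the hasNext check, item 2 for the size check, and item 3 for an unrelated condition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical sketch: single items stand in for itemsets.
final class ImMinerSketch {

    // Fraction of transactions in db that contain the given item.
    static double support(List<Set<Integer>> db, int item) {
        if (db.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Set<Integer> transaction : db) {
            if (transaction.contains(item)) {
                hits++;
            }
        }
        return (double) hits / db.size();
    }

    // Items whose support in db is at least minSup.
    static Set<Integer> frequentItems(List<Set<Integer>> db, double minSup) {
        Set<Integer> allItems = new TreeSet<>();
        for (Set<Integer> transaction : db) {
            allItems.addAll(transaction);
        }
        Set<Integer> result = new TreeSet<>();
        for (int item : allItems) {
            if (support(db, item) >= minSup) {
                result.add(item);
            }
        }
        return result;
    }

    // Alternatives of p1: items frequent in NSD and infrequent in PSD.
    static Set<Integer> infrequentAlternatives(List<Set<Integer>> db, int p1, double minSup) {
        List<Set<Integer>> psd = new ArrayList<>(); // transactions containing p1
        List<Set<Integer>> nsd = new ArrayList<>(); // transactions without p1
        for (Set<Integer> transaction : db) {
            if (transaction.contains(p1)) {
                psd.add(transaction);
            } else {
                nsd.add(transaction);
            }
        }
        Set<Integer> result = new TreeSet<>();
        for (int item : frequentItems(nsd, minSup)) {
            if (support(psd, item) < minSup) {
                result.add(item);
            }
        }
        return result;
    }

    static Set<Integer> tx(Integer... ids) {
        return new HashSet<>(Arrays.asList(ids));
    }

    // Toy data invented for this sketch; not the database shown in the talk.
    static List<Set<Integer>> toyDatabase() {
        return Arrays.asList(
            tx(1, 3), tx(1), tx(1), tx(1, 3), tx(1), tx(1), tx(1), tx(1),
            tx(2), tx(2, 3), tx(2));
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = toyDatabase();
        System.out.println(frequentItems(db, 0.5));             // prints [1]
        System.out.println(infrequentAlternatives(db, 1, 0.5)); // prints [2]
    }
}
```

On the toy data, item 2 has support 3/11 overall and would be missed by a single mining pass at min_sup 0.5, yet it appears in every transaction of the negative database and in none of the positive one.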

14 Neglected Conditions
- Neglected conditions refer to
  - Missing conditions that check the arguments or receiver of the API call before the API call
  - Missing conditions that check the return or receiver of the API call after the API call
- One of the primary reasons for many fatal issues
  - security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]

15 Alattin Approach
[Figure: application under analysis -> extract classes and methods reused -> classes and methods -> open source projects on the web (1, 2, ..., N) -> pattern candidates -> alternative patterns -> detect neglected conditions -> violations]
- Phase 1: Issue queries and collect relevant code samples, e.g., “lang:java java.util.Iterator next”
- Phase 2: Generate pattern candidates
- Phase 3: Mine alternative patterns
- Phase 4: Detect neglected conditions statically

16 Evaluation
- Research questions:
  - Do alternative patterns exist in real applications?
  - What percentage of false positives in the detected violations is eliminated (with little or no increase in false negatives)?

17 Subjects
- Two categories of subjects:
  - 3 Java default API libraries
  - 3 popular open source libraries
- Column “Samples”: number of code examples collected from Google Code Search

18 RQ1: Balanced and Imbalanced Patterns
- What percentage of the patterns found in real applications are balanced or imbalanced?
- Balanced patterns: 0% to 30% (average: 9.69%)
- Imbalanced patterns:
  - 30% to 100% (average: 65%) for Java default API libraries
  - 0% to 9.5% (average: 5%) for open source libraries
- Inference: Java default API libraries offer more alternative ways of writing code than open source libraries

19 RQ2: False Positives and False Negatives
- What percentage of false positives is eliminated (with little or no increase in false negatives)?
- Applied mined patterns (“P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj”) in three modes:
  - Existing mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” is applied as P1, P2, ..., Pi
  - Balanced mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” is applied as “P1 or P2 or ... or Pi”
  - Imbalanced mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” is applied as “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj”
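One way to read the three modes is as three different violation tests over the same mined pattern. The sketch below encodes that reading and is an interpretation rather than the tool's actual logic: it assumes the existing mode treats every frequent alternative as its own mandatory rule, the balanced mode accepts any one frequent alternative, and the imbalanced mode accepts any frequent or infrequent alternative. The class `PatternModes`, the enum, and the string labels for condition checks are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three application modes; not Alattin's real code.
final class PatternModes {

    enum Mode { EXISTING, BALANCED, IMBALANCED }

    // Returns true if the call site is reported as a violation in the given mode.
    static boolean isViolation(Set<String> checksAtCallSite,
                               List<String> frequentAlternatives,
                               List<String> infrequentAlternatives,
                               Mode mode) {
        switch (mode) {
            case EXISTING:
                // Each frequent alternative is a separate rule: all must be present.
                return !checksAtCallSite.containsAll(frequentAlternatives);
            case BALANCED:
                // "P1 or ... or Pi": one frequent alternative is enough.
                return Collections.disjoint(checksAtCallSite, frequentAlternatives);
            case IMBALANCED:
                // Full disjunction: any frequent or infrequent alternative is enough.
                return Collections.disjoint(checksAtCallSite, frequentAlternatives)
                    && Collections.disjoint(checksAtCallSite, infrequentAlternatives);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<String> frequent = Arrays.asList("hasNext-check");
        List<String> infrequent = Arrays.asList("size-check");
        Set<String> callSite = new HashSet<>(Arrays.asList("size-check"));

        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + isViolation(callSite, frequent, infrequent, mode));
        }
    }
}
```

With a call site guarded only by the size check, the first two modes report a violation and the imbalanced mode does not, which is the kind of false positive the imbalanced mode is meant to remove.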

20 RQ2: False Positives and False Negatives
Application       Existing Mode              Balanced Mode
                  Defects  False Positives   Defects  False Positives  % of reduction  False Negatives
Java Util         37       104               37       104              0               0
Java Transaction  51       105               51       105              0               0
Java SQL          56       143               56       90               37.06           0
BCEL              2        14                2        8                42.86           0
HSqlDB            1        0                 1        0                0               0
Hibernate         10       9                          8                11.11           0
AVERAGE/TOTAL                                                          15.17           0
- Existing Mode vs Balanced Mode
  - Balanced mode reduced false positives by 15.17% without any increase in false negatives

21 RQ2: False Positives and False Negatives
Application       Existing Mode              Imbalanced Mode
                  Defects  False Positives   Defects  False Positives  % of reduction  False Negatives
Java Util         37       104               36       74               28.85           1
Java Transaction  51       105               47       76               27.62           4
Java SQL          56       143               53       81               43.36           3
BCEL              2        14                2        6                57.14           0
HSqlDB            1        0                 1        0                0               0
Hibernate         10       9                          8                11.11           0
AVERAGE/TOTAL                                                          28.01           8
- Existing Mode vs Imbalanced Mode
  - Imbalanced mode reduced false positives by 28% with a small increase in false negatives (8 in total)
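The "% of reduction" column on slides 20 and 21 appears to be the relative drop in false positives from the existing mode, and the bottom row the unweighted mean of the six per-library percentages. That formula is inferred from the numbers rather than stated on the slides; the two rows below reproduce the reported values under that assumption.

```latex
\[
\text{reduction} = \frac{FP_{\text{existing}} - FP_{\text{mode}}}{FP_{\text{existing}}} \times 100\%
\]
\[
\text{Java SQL, balanced: } \frac{143 - 90}{143} \times 100\% \approx 37.06\%,
\qquad
\text{Java Util, imbalanced: } \frac{104 - 74}{104} \times 100\% \approx 28.85\%
\]
\[
\text{Balanced average: } \frac{0 + 0 + 37.06 + 42.86 + 0 + 11.11}{6} \approx 15.17\%
\]
```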

22 Conclusion
- Problem-driven methodology for advancing mining software engineering data by identifying
  - new problems, patterns
  - mining algorithms, defects
- Alattin mines alternative patterns classified into three categories: balanced, imbalanced, and single
- Alattin can be used to enhance various existing mining approaches to reduce false positives
- Future work: Exploit synergy between static and dynamic analysis to further reduce false positives

23 Thank You

