
1 On Establishing a Benchmark for Evaluating Static Analysis Prioritization and Classification Techniques
Sarah Heckman and Laurie Williams
Department of Computer Science, North Carolina State University
ESEM | October 9, 2008

2 Contents
Motivation
Research Objective
FAULTBENCH
Case Study
– False Positive Mitigation Models
– Results
Future Work

3 Motivation
Static analysis tools identify potential anomalies early in the development process.
– They can generate an overwhelming number of alerts.
– Alert inspection is required to determine whether the developer should fix the alert.
Actionable: an important anomaly the developer wants to fix – a True Positive (TP).
Unactionable: an unimportant or inconsequential alert – a False Positive (FP).
FP mitigation techniques can prioritize or classify alerts after static analysis is run.

4 Research Objective
Problem
– Several false positive mitigation models have been proposed.
– It is difficult to compare and evaluate the different models.
Research Objective: to propose the FAULTBENCH benchmark to the software anomaly detection community for comparison and evaluation of false positive mitigation techniques.
http://agile.csc.ncsu.edu/faultbench/

5 FAULTBENCH Definition [1]
Motivating Comparison: find the static analysis FP mitigation technique that correctly prioritizes or classifies actionable and unactionable alerts.
Research Questions
– Q1: Can alert prioritization improve the rate of anomaly detection when compared to the tool's output?
– Q2: How does the rate of anomaly detection compare between alert prioritization techniques?
– Q3: Can alert categorization correctly predict actionable and unactionable alerts?

6 FAULTBENCH Definition [1] (2)
Task Sample: a representative sample of tests that FP mitigation techniques should solve.
– Sample programs
– Oracles of FindBugs alerts (actionable or unactionable)
– Source code changes for fixes (for adaptive FP mitigation techniques)
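A minimal sketch of how one benchmark task entry might be represented, assuming a simple record per FindBugs alert; the field names and the example values are illustrative and are not FAULTBENCH's actual distribution format.

```python
# Illustrative sketch only: FAULTBENCH's actual distribution format may differ.
from dataclasses import dataclass

@dataclass
class AlertOracle:
    """One FindBugs alert from a subject program, labeled by the oracle."""
    subject: str        # subject program, e.g. "jdom"
    alert_type: str     # FindBugs bug pattern identifier
    location: str       # source folder / class / method containing the alert
    actionable: bool    # True = developer should fix it, False = unactionable
    fix_change: str     # source change that closes the alert (empty if unactionable)

# Hypothetical example entry:
example = AlertOracle(
    subject="jdom",
    alert_type="NP_NULL_ON_SOME_PATH",          # a real FindBugs pattern, used here as an example
    location="org/jdom/Element.java#getChild",  # hypothetical location
    actionable=True,
    fix_change="add a null check before the dereference",
)
```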

7 FAULTBENCH Definition [1] (3)
Evaluation Measures: metrics used to evaluate and compare FP mitigation techniques.
Prioritization
– Spearman rank correlation
Classification
– Precision
– Recall
– Accuracy
– Area under the anomaly detection rate curve
Classification outcomes (predicted vs. actual):
                          Actual Actionable        Actual Unactionable
Predicted Actionable      True Positive (TP_C)     False Positive (FP_C)
Predicted Unactionable    False Negative (FN_C)    True Negative (TN_C)
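The classification measures follow the standard definitions over the TP/FP/FN/TN counts in the matrix above, and the Spearman rank correlation compares a technique's alert ordering against a reference ordering (such as the optimal ordering). A minimal sketch, assuming orderings are given as parallel lists of ranks; scipy is used for the correlation.

```python
# Standard definitions over the TP/FP/FN/TN counts above, plus Spearman rank
# correlation for comparing a prioritized ordering against a reference ordering.
from scipy.stats import spearmanr

def classification_measures(tp, fp, fn, tn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # predicted-actionable alerts that really are actionable
    recall    = tp / (tp + fn) if (tp + fn) else 0.0   # actionable alerts that were predicted actionable
    accuracy  = (tp + tn) / (tp + fp + fn + tn)        # fraction of alerts classified correctly
    return precision, recall, accuracy

def prioritization_correlation(technique_ranks, reference_ranks):
    # Ranks are parallel lists: position i gives each ordering's rank for alert i.
    rho, p_value = spearmanr(technique_ranks, reference_ranks)
    return rho, p_value
```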

8 Subject Selection
Selection Criteria
– Open source
– Various domains
– Small
– Java
– SourceForge
– Small, commonly used libraries and applications

9 FAULTBENCH v0.1 Subjects
Subject                    Domain          # Dev.   # LOC    # Alerts   Maturity   Alert Dist. Area
csvobjects                 Data format     1        1577     7          Prod.      0.645477
importscrubber             Software dev.   2        1653     35         Beta       0.3126545
iTrust                     Web             5        14120    110        Alpha      0.61703277
jbook                      Edu             1        1276     52         Prod.      0.2829400
jdom                       Data format     3        8422     55         Prod.      0.19211638
org.eclipse.core.runtime   Software dev.   100      2791     98         Prod.      0.30239546

10 Subject Characteristics Visualization

11 FAULTBENCH Initialization
Alert Oracle: classification of alerts as actionable or unactionable
– Read the alert description generated by FindBugs
– Inspect the surrounding code and comments
– Search message boards
Alert Fixes
– Change required to fix the alert
– Minimize alert closures and creations
Experimental Controls (see the sketch after this slide)
– Optimal ordering of alerts
– Random ordering of alerts
– Tool ordering of alerts
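A minimal sketch of the three experimental control orderings, assuming each alert object carries its oracle label (actionable) and the position FindBugs reported it at (tool_rank); both attribute names are assumptions. Breaking ties in the optimal ordering by the tool's order matches the bias noted later in the FAULTBENCH limitations.

```python
# Sketch of the three control orderings; assumes each alert has .actionable
# (the oracle label) and .tool_rank (its position in FindBugs' own output).
import random

def optimal_ordering(alerts):
    # All actionable alerts first; ties broken by the tool's original order.
    return sorted(alerts, key=lambda a: (not a.actionable, a.tool_rank))

def random_ordering(alerts, seed=0):
    shuffled = list(alerts)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def tool_ordering(alerts):
    # The order in which FindBugs itself reports the alerts.
    return sorted(alerts, key=lambda a: a.tool_rank)
```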

12 FAULTBENCH Process
1. For each subject program:
   1. Run static analysis on the clean version of the subject.
   2. Record the original state of the alert set.
   3. Prioritize or classify the alerts with the FP mitigation technique.
2. Inspect each alert, starting at the top of the prioritized list or by randomly selecting an alert predicted as actionable:
   1. If the oracle says actionable, fix with the specified code change.
   2. If the oracle says unactionable, suppress the alert.
3. After each inspection, record the alert set state and rerun the static analysis tool.
4. Evaluate results via the evaluation metrics.
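A hedged sketch of this process as a driver loop. run_analysis, apply_fix, suppress, and evaluate are hypothetical callables standing in for the tool- and subject-specific steps, and technique.prioritize, oracle.is_actionable, and oracle.fix_for are assumed interfaces rather than FAULTBENCH APIs.

```python
# Hypothetical driver for the process above; run_analysis, apply_fix, suppress,
# and evaluate are stand-ins for the tool- and subject-specific steps.
def run_benchmark(subject, technique, oracle, run_analysis, apply_fix, suppress, evaluate):
    alerts = run_analysis(subject)               # 1.1 static analysis on the clean subject
    history = [list(alerts)]                     # 1.2 record the original alert set state
    worklist = technique.prioritize(alerts)      # 1.3 apply the FP mitigation technique

    for alert in worklist:                       # 2 inspect alerts from the top of the list
        if oracle.is_actionable(alert):
            apply_fix(subject, oracle.fix_for(alert))   # 2.1 fix with the specified change
        else:
            suppress(alert)                             # 2.2 suppress the unactionable alert
        alerts = run_analysis(subject)           # 3 rerun the tool and record the new state
        history.append(list(alerts))

    return evaluate(history, oracle)             # 4 compute the evaluation metrics
```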

13 Case Study Process
1. Open the subject program in Eclipse 3.3.1.1:
   1. Run FindBugs on the clean version of the subject.
   2. Record the original state of the alert set.
   3. Prioritize the alerts with a version of AWARE-APM.
2. Inspect each alert, starting at the top of the prioritized list:
   1. If the oracle says actionable, fix with the specified code change.
   2. If the oracle says unactionable, suppress the alert.
3. After each inspection, record the alert set state. FindBugs should run automatically.
4. Evaluate results via the evaluation metrics.

14 AWARE-APM
Adaptively prioritizes and classifies static analysis alerts by the likelihood that an alert is actionable.
Uses alert characteristics, alert history, and size information to prioritize alerts.
[Figure: prioritization scale between 0 and 1, with alerts ranked from unactionable to actionable and unknown alerts in between]

15 AWARE-APM Concepts
Alert Type Accuracy (ATA): based on the alert's type.
Code Locality (CL): based on the location of the alert in the source folder, class, and method.
Both measure the likelihood that an alert is actionable based on developer feedback:
– Alert Closure: the alert is no longer identified by the static analysis tool.
– Alert Suppression: an explicit action by the developer to remove the alert from the listing.
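To make the feedback idea concrete, here is an illustrative ranker, not the published AWARE-APM model: closures count as evidence that alerts sharing the same type (ATA) or code location (CL) are actionable, suppressions count as evidence that they are not, and alerts with no feedback sit at an "unknown" midpoint.

```python
# Illustrative only: not the published AWARE-APM formulas. Closures and
# suppressions of inspected alerts adjust the estimated likelihood that the
# remaining alerts of the same type (ATA) or location (CL) are actionable.
from collections import defaultdict

class FeedbackRanker:
    def __init__(self):
        # [actionable_count, unactionable_count] per alert type and per location
        self.type_feedback = defaultdict(lambda: [0, 0])
        self.location_feedback = defaultdict(lambda: [0, 0])

    def record_closure(self, alert):        # alert fixed -> evidence of "actionable"
        self.type_feedback[alert.alert_type][0] += 1
        self.location_feedback[alert.location][0] += 1

    def record_suppression(self, alert):    # alert suppressed -> evidence of "unactionable"
        self.type_feedback[alert.alert_type][1] += 1
        self.location_feedback[alert.location][1] += 1

    def score(self, alert):
        # Average the type and locality factors; 0.5 when no feedback yet ("unknown").
        return (self._ratio(self.type_feedback[alert.alert_type]) +
                self._ratio(self.location_feedback[alert.location])) / 2

    @staticmethod
    def _ratio(counts):
        fixed, suppressed = counts
        total = fixed + suppressed
        return fixed / total if total else 0.5
```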

16 Rate of Anomaly Detection Curve
Subject    Optimal   Random   ATA      CL       ATA+CL   Tool
jdom       91.82%    71.66%   86.16%   63.54%   85.35%   46.89%
Average    87.58%    61.73%   72.57%   53.94%   67.88%   50.42%
[Figure: rate of anomaly detection curve for jdom]
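The percentages above summarize detection-rate curves. One plausible way to compute such a curve and its normalized area is sketched below, assuming the oracle's set of actionable alerts is known; the exact normalization used for the reported values may differ.

```python
# One plausible computation of a detection-rate curve and the area under it,
# expressed as a percentage; the paper's exact normalization may differ.
def detection_rate_curve(inspection_order, oracle_actionable):
    """Fraction of actionable alerts found after each inspection."""
    total = sum(1 for a in inspection_order if a in oracle_actionable)
    found, curve = 0, [0.0]
    for alert in inspection_order:
        if alert in oracle_actionable:
            found += 1
        curve.append(found / total if total else 0.0)
    return curve

def area_under_curve(curve):
    # Trapezoidal area, normalized by the number of inspections, as a percentage.
    steps = len(curve) - 1
    area = sum((curve[i] + curve[i + 1]) / 2 for i in range(steps))
    return 100.0 * area / steps if steps else 0.0
```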

17 Spearman Rank Correlation
Subject                    ATA       CL        ATA+CL    Tool
csvobjects                 0.321     -0.643    -0.393    0.607
importscrubber             0.512**   -0.026    0.238     0.203
iTrust                     0.418**   0.264**   0.261**   0.772**
jbook                      0.798**   0.389**   0.599**   -0.002
jdom                       0.675**   0.288*    0.457**   0.724**
org.eclipse.core.runtime   0.395**   0.325**   0.246*    0.691**
* Significant at the 0.05 level   ** Significant at the 0.01 level

18 Classification Evaluation Measures
                           Average Precision        Average Recall           Average Accuracy
Subject                    ATA / CL / ATA+CL        ATA / CL / ATA+CL        ATA / CL / ATA+CL
csvobjects                 0.32 / 0.50 / 0.39       0.38 / 0.48 / 0.38       0.58 / 0.34 / 0.46
importscrubber             0.34 / 0.20 / 0.18       0.24 / 0.28 / 0.45       0.62 / 0.43 / 0.56
iTrust                     0.05 / 0.02 / 0.05       0.16 / 0.15 / 0.07       0.97 / 0.84 / 0.91
jbook                      0.22 / 0.27 / 0.23       0.65 / 0.48 / 0.61       0.68 / 0.62 / 0.66
jdom                       0.06 / 0.09 / 0.06       0.31 / 0.07 / 0.29       0.88 / 0.86 / 0.88
org.eclipse.core.runtime   0.05 / 0.04 / 0.03       0.17 / 0.05 / 0.11       0.92 / 0.94 / 0.95
Average                    0.17 / 0.19 / 0.16       0.42 / 0.25 / 0.32       0.76 / 0.67 / 0.74

19 Case Study Limitations
Construct Validity
– Possible alert closures and creations when fixing alerts
– Duplicate alerts
Internal Validity
– External variable: alert classification is subjective, based on inspection
External Validity
– May not scale to larger programs

20 FAULTBENCH Limitations
Alert oracles were chosen from a third-party inspection of the source code, not by the developers.
Generation of the optimal ordering is biased toward the tool's ordering of alerts.
Subjects are written in Java, so results may not generalize to FP mitigation techniques for other languages.

21 Future Work
Collaborate with other researchers to evolve FAULTBENCH.
Use FAULTBENCH to compare FP mitigation techniques from the literature.
http://agile.csc.ncsu.edu/faultbench/

22 Questions?
FAULTBENCH: http://agile.csc.ncsu.edu/faultbench/
Sarah Heckman: sarah_heckman@ncsu.edu

23 References
[1] S. E. Sim, S. Easterbrook, and R. C. Holt, "Using Benchmarking to Advance Research: A Challenge to Software Engineering," ICSE, Portland, Oregon, May 3-10, 2003, pp. 74-83.

