DynaMine: Finding Common Error Patterns by Mining Software Revision Histories Benjamin Livshits Stanford University Thomas Zimmermann Saarland University.

1 DynaMine: Finding Common Error Patterns by Mining Software Revision Histories Benjamin Livshits Stanford University Thomas Zimmermann Saarland University

2 A Box Full of Nails A lot of promise, potential, excitement Not that many success stories Not sure what to apply it to Let's try this particularly exciting idea Miners looking at their tools Promises, promises… Interesting usage patterns found by CVS mining Interesting error patterns found by CVS mining

3 My Background Tools for bug detection Analysis: pointer analysis, etc. Mostly static, some dynamic Applications: Security Buffer overruns Format string violations SQL injections Cross-site scripting HTTP response splitting Data lifetimes J2EE patterns Bad session stores Lapsed listeners Eclipse patterns Missing calls to dispose Not calling super Forgetting to deregister listeners

4 Glorified Bug Finding System A language for describing bug patterns Called PQL, see OOPSLA years of work Static and dynamic analysis combined We don't know what to look for Took a long time to find useful error patterns Programmers often don't recognize patterns Have pretty good tools How do we find more patterns to check? Want: find error patterns in unfamiliar code

5 The Usual Suspects Much bug-detection research in recent years Focus: generic patterns, sometimes language-specific NULL dereferences Security Buffer overruns Format string violations Memory Double-deletes Memory leaks Locking errors/threads Deadlock/race detection Atomicity Let's look at the space of error patterns in more detail

6 Classification of Error Patterns NULL dereferences Buffer overruns Double-deletes Locking errors/threads Generic patterns -- the usual suspects App-specific patterns particular to a system or a set of APIs Bugs in Linux code Bugs in J2EE servlets Device drivers Error Pattern Iceberg NULL dereferences Buffer overruns Double-deletes Locks/threads

7 Classification of Error Patterns App-specific patterns particular to a system or a set of APIs Intuition: Many other application-specific patterns exist Much of the application-specific space remains a gray area so far Goal: Let's figure out what the patterns are Generic patterns -- the usual suspects NULL dereferences Buffer overruns Double-deletes Locking errors/threads Does anybody know any good error patterns specific to WinAmp plugins? There are hundreds of WinAmp plugins out there

8 Motivation: Matching Method Pairs Start small: Matching method pairs Only two methods A very simple state machine Calls must match perfectly, order matters Very common, our inspiration is System calls fopen/fclose lock/unlock … GUI operations addNotify/removeNotify addListener/removeListener createWidget/destroyWidget … Want to find more of the same And, if we are lucky, more interesting patterns

9 DynaMine: Our Insight Our problem: Want to find patterns whose violation causes errors Want to find patterns for program understanding Our technique: Look at revision histories Crucial observation: Use data mining techniques to find methods that are often added at the same time Things that are frequently checked in together often form a pattern

10 DynaMine: Our Insight (continued) Now we know the potential patterns Profile the patterns Run the application See how many times each pattern hits – number of times a pattern is followed misses – number of times a pattern is violated Based on these statistics, classify the patterns Usage patterns – almost always hold Error patterns – violated a large number of times, but still hold most of the time Unlikely patterns – not validated enough times
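The classification step on this slide can be sketched as a small function over the hit and miss counts. This is only an illustration: the slide gives the rule qualitatively, so the class name, the NOT_HIT category for patterns never exercised, and both numeric thresholds are assumptions rather than values from the talk.

```java
// Sketch only: the thresholds below are assumptions, not values from the talk.
final class PatternClassifier {
    enum Kind { USAGE, ERROR, UNLIKELY, NOT_HIT }

    // Hypothetical cut-offs on the fraction of observations that are violations.
    static final double USAGE_MAX_MISS_RATIO = 0.10;
    static final double ERROR_MAX_MISS_RATIO = 0.50;

    /** Classifies a pattern from its runtime counts of hits (followed) and misses (violated). */
    static Kind classify(int hits, int misses) {
        int total = hits + misses;
        if (total == 0) {
            return Kind.NOT_HIT;              // never exercised at runtime
        }
        double missRatio = (double) misses / total;
        if (missRatio <= USAGE_MAX_MISS_RATIO) {
            return Kind.USAGE;                // almost always holds
        }
        if (missRatio < ERROR_MAX_MISS_RATIO) {
            return Kind.ERROR;                // often violated, but holds most of the time
        }
        return Kind.UNLIKELY;                 // not validated often enough
    }
}
```

With these assumed thresholds, a pattern with 80 hits and 20 misses would be reported as an error pattern, while one with 3 hits and 30 misses would be discarded as unlikely.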

11 Architecture of DynaMine mine CVS histories patterns run the application post-process usage patterns error patterns unlikely patterns sort and filter revision history mining dynamic analysis report bugs report patterns reporting instrument relevant method calls

12 Mining approach

13 Mining Basics Rely on co-change Simplification: look at method calls only Look for interesting patterns in the way methods are called Example: Sequence of revisions Files Foo.java, Bar.java, Baz.java, Qux.java o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java

14 Mining Matching Method Calls Use our observation: Methods that are frequently added simultaneously often represent a usage pattern For instance: … addListener(…); … removeListener(…); … o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java

15 Data Mining Summary We consider method calls added in each check-in We want to find patterns of method calls Too many potential patterns to consider Want to filter and rank them Use support and confidence for that Support and confidence of each pattern Standard metrics used in data mining Support reflects how many times each pair appears Confidence reflects how strongly a particular pair is correlated Refer to the paper for details
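As a rough illustration of the two metrics, the sketch below uses the textbook association-rule definitions: support counts the check-ins that add both calls, and confidence divides that count by the number of check-ins that add the first call. The slide defers the exact definitions to the paper, so treat these formulas and the class name PairMetrics as assumptions.

```java
import java.util.List;
import java.util.Set;

// Sketch using standard association-rule definitions (assumed, not taken from the slides).
final class PairMetrics {
    /** Number of check-ins whose added calls include both a and b. */
    static int support(List<Set<String>> checkIns, String a, String b) {
        int count = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a) && added.contains(b)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of check-ins adding a that also add b; 0.0 when a is never added. */
    static double confidence(List<Set<String>> checkIns, String a, String b) {
        int withA = 0;
        for (Set<String> added : checkIns) {
            if (added.contains(a)) {
                withA++;
            }
        }
        return withA == 0 ? 0.0 : (double) support(checkIns, a, b) / withA;
    }
}
```

Each element of checkIns stands for the set of method names added in one check-in, with receivers dropped, in line with the "look at method calls only" simplification.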

16 Improvements Over the Traditional Approach Default data mining approach doesn't quite work Filters based on confidence and support Still too many potential patterns! 1. Filtering: Consider only patterns with the same initial subsequence as potential patterns 2. Ranking: Use one-line fixes to find likely error patterns

17 Matching Initial Call Sequences o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java Pair 3 Pairs 1 Pair 10 Pairs 2 Pairs 1 Pair 0 Pairs 0 Pairs
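One way to picture this filter: enumerate every pair of calls added in a check-in, then keep only the pairs whose text before the last dot (the receiver, such as o3 or iter) is identical. The sketch below does exactly that. Reading "initial call sequence" as "text before the last dot" is a simplifying assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: "initial call sequence" is approximated by the text before the last '.' (an assumption).
final class InitialSequenceFilter {
    /** Returns e.g. "o1" for "o1.addListener"; "" when the call has no receiver. */
    static String prefix(String call) {
        int dot = call.lastIndexOf('.');
        return dot < 0 ? "" : call.substring(0, dot);
    }

    /** All unordered pairs of calls added in one check-in. */
    static List<String[]> allPairs(List<String> calls) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < calls.size(); i++) {
            for (int j = i + 1; j < calls.size(); j++) {
                pairs.add(new String[] { calls.get(i), calls.get(j) });
            }
        }
        return pairs;
    }

    /** Only the pairs whose two calls share the same non-empty prefix. */
    static List<String[]> matchingPairs(List<String> calls) {
        List<String[]> kept = new ArrayList<>();
        for (String[] pair : allPairs(calls)) {
            String first = prefix(pair[0]);
            if (!first.isEmpty() && first.equals(prefix(pair[1]))) {
                kept.add(pair);
            }
        }
        return kept;
    }
}
```

For the five calls o3.addListener, o3.removeListener, list.iterator, iter.hasNext and iter.next, this yields 10 candidate pairs before filtering and 2 after, which agrees with the "10 Pairs" and "2 Pairs" labels on the slide.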

18 Using Fixes to Rank Patterns Look for one-call additions which likely indicate fixes Rank patterns with such methods higher o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java This is a fix! Move patterns containing removeListener up
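The ranking idea might be sketched as follows: treat any check-in that adds exactly one call as a likely fix, collect the methods added by such check-ins, and move patterns that mention one of them ahead of the rest. The data representation and the names FixRanking, fixMethods and rank are hypothetical, and a real ranking would presumably combine this signal with support and confidence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of fix-based ranking; the data layout is an assumption made for illustration.
final class FixRanking {
    /** Methods added by check-ins that add exactly one call (likely fixes). */
    static Set<String> fixMethods(List<List<String>> checkIns) {
        Set<String> fixes = new HashSet<>();
        for (List<String> added : checkIns) {
            if (added.size() == 1) {
                fixes.add(added.get(0));
            }
        }
        return fixes;
    }

    /** Returns a copy of patterns with those containing a fix method moved to the front. */
    static List<List<String>> rank(List<List<String>> patterns, Set<String> fixes) {
        List<List<String>> ranked = new ArrayList<>(patterns);
        // Key is false when the pattern shares a method with fixes; false sorts before true.
        // List.sort is stable, so the original order is kept within each group.
        ranked.sort(Comparator.comparing((List<String> p) -> Collections.disjoint(p, fixes)));
        return ranked;
    }
}
```

In the slide's example, the check-in that adds only removeListener marks that method as a fix, so the addListener/removeListener pattern moves up.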

19 Applications under Study Apply these ideas to the revision history of Eclipse and jEdit Very large open-source projects Many people working on both, all over the planet 122 on Eclipse 92 on jEdit Many check-ins Eclipse 2,837,854 jEdit 144,495 Long histories Eclipse since 2001 jEdit since 2000

20 Some patterns (as promised)

21 Categories of Patterns Method calls during execution: Care about the methods Care about the order Care about the parameters/return values Here are some common cases Matching method pairs State machines More complex patterns

22 Some Interesting Method Pairs (1)
kEventControlActivate / kEventControlDeactivate
addDebugEventListener / removeDebugEventListener
beginRule / endRule
suspend / resume
NewPtr / DisposePtr
addListener / removeListener
register / deregister
addElementChangedListener / removeElementChangedListener
addResourceChangeListener / removeResourceChangeListener
addPropertyChangeListener / removePropertyChangeListener
createPropertyList / reapPropertyList
preReplaceChild / postReplaceChild
addWidget / removeWidget
stopMeasuring / commitMeasurements
blockSignal / unblockSignal
HLock / HUnlock
OpenEvent / fireOpen
…

23 Some Interesting Method Pairs (2) (same pairs as slide 22) Register/unregister the current widget with the parent display object for subsequent event forwarding

24 Some Interesting Method Pairs (3) (same pairs as slide 22) Add/remove listener for a particular kind of GUI events

25 Some Interesting Method Pairs (4) (same pairs as slide 22) Use OS native locking mechanism for resources such as icons, etc.

26 State Machines Order captured by a state machine Must be followed precisely: omitting or repeating a method call is a sign of error. Simplest formalism for describing the object life-cycle. Matching method pairs – specific case Very common in C Consider OS code Less common in Java, but…

27 State Machines (1) o.enterAlignment [o.redoAlignment] o.exitAlignment Part of org.eclipse.jdt.internal.formatter.Scribe, responsible for pretty-printing of code enterAlignment/exitAlignment pairs must match redoAlignment is invoked in exception cases

28 State Machines (2) o.beginCompoundEdit() (o.insert(...) | o.remove(...))+ o.endCompoundEdit() Compound edits within jEdit: can be undone at once beginCompoundEdit/endCompoundEdit act as brackets Other operations in between
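The compound-edit pattern above is a regular expression over method names, so a three-state automaton is enough to check it. The following is a minimal sketch of such a checker over a recorded trace for a single object; the class name, the state names and the choice to accept an empty trace are assumptions made for illustration.

```java
import java.util.List;

// Sketch: checks a call trace against beginCompoundEdit (insert | remove)+ endCompoundEdit.
final class CompoundEditChecker {
    enum State { IDLE, OPEN, EDITING }

    /** True when the trace is zero or more complete, well-formed compound edits. */
    static boolean follows(List<String> trace) {
        State state = State.IDLE;
        for (String call : trace) {
            boolean edit = call.equals("insert") || call.equals("remove");
            switch (state) {
                case IDLE:
                    if (!call.equals("beginCompoundEdit")) return false;
                    state = State.OPEN;
                    break;
                case OPEN:
                    // The '+' in the pattern requires at least one edit after begin.
                    if (!edit) return false;
                    state = State.EDITING;
                    break;
                case EDITING:
                    if (call.equals("endCompoundEdit")) {
                        state = State.IDLE;
                    } else if (!edit) {
                        return false;
                    }
                    break;
            }
        }
        // A trace that stops inside an open compound edit violates the pattern.
        return state == State.IDLE;
    }
}
```

A trace of beginCompoundEdit, insert, remove, endCompoundEdit is accepted, whereas a begin immediately followed by an end, or a begin that is never closed, is rejected.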

29 State Machines (3) OS.PmMemCreateMC [OS.PmMemStart OS.PmMemFlush OS.PmMemStop] OS.PmMemReleaseMC Memory context manipulation (like memory pools) Wrappers around underlying OS functionality The middle part of the pattern is optional

30 More Complex Stuff (1)

    try {
        monitor.beginTask(null, Policy.totalWork);
        int depth = -1;
        try {
            workspace.prepareOperation(null, monitor);
            workspace.beginOperation(true);
            depth = workspace.getWorkManager().beginUnprotected();
            return runInWorkspace(Policy.subMonitorFor(monitor, Policy.opWork,
                    SubProgressMonitor.PREPEND_MAIN_LABEL_TO_SUBTASK));
        } catch (OperationCanceledException e) {
            workspace.getWorkManager().operationCanceled();
            return Status.CANCEL_STATUS;
        } finally {
            if (depth >= 0)
                workspace.getWorkManager().endUnprotected(depth);
            workspace.endOperation(null, false,
                    Policy.subMonitorFor(monitor, Policy.endOpWork));
        }
    } catch (CoreException e) {
        return e.getStatus();
    } finally {
        monitor.done();
    }

33 Grammar for Workspace Transactions Requires human intelligence Requires a lot of it Is actually an excellent pattern – haven't seen runtime violations
S -> O
O -> w.prepareOperation() w.beginOperation() U w.endOperation()
U -> w.getWorkManager().beginUnprotected() S [w.getWorkManager().operationCanceled()] w.getWorkManager().endUnprotected()

34 Dynamic checking

35 Dynamically Check the Patterns Home-grown bytecode instrumentor Get a list of matching patterns Instrument calls to any of the methods to dump parameters Post-processing of the output Process a stream of events Find and count matches and mismatches … o.register(d) … o.deregister(d) … o.deregister(d) matched mismatched ???
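The post-processing step could look roughly like the sketch below, which consumes a stream of instrumented calls for one method pair and counts matches and mismatches per receiver and argument. The class PairMatcher, its event format, and the decision to count registrations still open at the end of the run as mismatches are all assumptions; the slides do not show the actual tool's logic.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of post-processing for one method pair, e.g. register/deregister (names assumed).
final class PairMatcher {
    final String open;
    final String close;
    int matched;
    int mismatched;
    // Number of still-unmatched opening calls per (receiver, argument) key.
    private final Map<String, Integer> pending = new HashMap<>();

    PairMatcher(String open, String close) {
        this.open = open;
        this.close = close;
    }

    /** Feeds one instrumented call; receiver and arg are identity strings dumped at runtime. */
    void onCall(String method, String receiver, String arg) {
        String key = receiver + "#" + arg;
        if (method.equals(open)) {
            pending.merge(key, 1, Integer::sum);
        } else if (method.equals(close)) {
            Integer count = pending.get(key);
            if (count == null) {
                mismatched++;                 // closing call with no matching opening call
            } else {
                matched++;
                if (count == 1) {
                    pending.remove(key);
                } else {
                    pending.put(key, count - 1);
                }
            }
        }
    }

    /** Call at end of run: opening calls that were never closed count as mismatches. */
    void finish() {
        for (int count : pending.values()) {
            mismatched += count;
        }
        pending.clear();
    }
}
```

For the slide's sequence o.register(d), o.deregister(d), o.deregister(d), this gives one match for the first deregister and one mismatch for the second.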

36 Experiments

37 Experimental Setup Applied to Eclipse and jEdit 3,600,000 lines of Java code combined Included many plugins Times: 6 days to fetch and process CVS histories 30 minutes to compute the patterns An hour to instrument 15 minutes to run And we are done!

38 Experimental Summary Pattern classification: 56 patterns total 13 are usage patterns 8 are error patterns 11 are unlikely patterns 24 were not hit at runtime Error patterns Resulted in a total of 264 dynamically confirmed pattern violations

39 Summary Knowing code patterns is important We explored using software histories: Co-change often indicates patterns Use previous fixes (one-line changes) to drive error patterns Found interesting patterns: Matching method pairs State machines More complex stuff Confirmed valid patterns Found pattern violations at runtime We have a paper in FSE 2005

