Locating Matching Method Calls by Mining Revision History Data Benjamin Livshits Stanford University and Thomas Zimmermann Saarland University.

Slides:



Advertisements
Similar presentations
Debugging ACL Scripts.
Advertisements

Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 10 Servlets and Java Server Pages.
DynaMine: Finding Common Error Patterns by Mining Software Revision Histories Benjamin Livshits Stanford University Thomas Zimmermann Saarland University.
XP New Perspectives on Microsoft Office Word 2003 Tutorial 6 1 Microsoft Office Word 2003 Tutorial 6 – Creating Form Letters and Mailing Labels.
Connecting to Databases. relational databases tables and relations accessed using SQL database -specific functionality –transaction processing commit.
compilers and interpreters
K T A U Kernel Tuning and Analysis Utilities Department of Computer and Information Science Performance Research Laboratory University of Oregon.
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
INTRODUCTION Chapter 1 1. Java CPSC 1100 University of Tennessee at Chattanooga 2  Difference between Visual Logic & Java  Lots  Visual Logic Flowcharts.
Statistics in Bioinformatics May 2, 2002 Quiz-15 min Learning objectives-Understand equally likely outcomes, Counting techniques (Example, genetic code,
INSE - Lectures 19 & 20 SE for Real-Time & SE for Concurrency  Really these are two topics – but rather tangled together.
Programming Types of Testing.
Test Logging and Automated Failure Analysis Why Weak Automation Is Worse Than No Automation Geoff Staneff
JAVA BASICS SYNTAX, ERRORS, AND DEBUGGING. OBJECTIVES FOR THIS UNIT Upon completion of this unit, you should be able to: Explain the Java virtual machine.
/ PSWLAB Concurrent Bug Patterns and How to Test Them by Eitan Farchi, Yarden Nir, Shmuel Ur published in the proceedings of IPDPS’03 (PADTAD2003)
Using Programmer-Written Compiler Extensions to Catch Security Holes Authors: Ken Ashcraft and Dawson Engler Presented by : Hong Chen CS590F 2/7/2007.
The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.
Atomicity in Multi-Threaded Programs Prachi Tiwari University of California, Santa Cruz CMPS 203 Programming Languages, Fall 2004.
Finding Security Errors in Java Applications Using Lightweight Static Analysis Benjamin Livshits Computer Science Lab Stanford University.
ADVERSARIAL MEMORY FOR DETECTING DESTRUCTIVE RACES Cormac Flanagan & Stephen Freund UC Santa Cruz Williams College PLDI 2010 Slides by Michelle Goodstein.
Tools for Investigating Graphics System Performance
1 Introducing Collaboration to Single User Applications A Survey and Analysis of Recent Work by Brian Cornell For Collaborative Systems Fall 2006.
ReferencesReferences DiscussionDiscussion Vulnerability Example: SQL injection Auditing Tool for Eclipse LAPSE: a Security Auditing Tool for Eclipse IntroductionIntroductionResultsResults.
Turning Eclipse Against Itself: Finding Errors in Eclipse Sources Benjamin Livshits Stanford University.
Memories of Bug Fixes Sunghun Kim, Kai Pan, and E. James Whitehead Jr., University of California, Santa Cruz Presented By Gleneesha Johnson CMSC 838P,
Microsoft Research Faculty Summit Yuanyuan(YY) Zhou Associate Professor University of Illinois, Urbana-Champaign.
COMP205 Comparative Programming Languages Part 1: Introduction to programming languages Lecture 3: Managing and reducing complexity, program processing.
V0.01 © 2009 Research In Motion Limited Introduction to Java Application Development for the BlackBerry Smartphone Trainer name Date.
Chapter 3.7 Memory and I/O Systems. 2 Memory Management Only applies to languages with explicit memory management (C or C++) Memory problems are one of.
DynaMine DynaMine Finding Common Error Patterns by Mining Software Revision Histories Benjamin Livshits Stanford University Thomas Zimmermann Saarland.
A Characterization of Processor Performance in the VAX-11/780 From the ISCA Proceedings 1984 Emer & Clark.
© Janice Regan, CMPT 128, Jan CMPT 128 Introduction to Computing Science for Engineering Students Creating a program.
Customer Portal – Customer User. You will receive an indicating that your Customer Portal registration is complete. A link to the Customer Portal,
What’s new in Stack 3.2 Michael Youngstrom. Disclaimer This IS a presentation – So sit back and relax Please ask questions.
Mining Function Usage Patterns to Find Bugs Chadd Williams.
University of Maryland Bug Driven Bug Finding Chadd Williams.
15-740/ Oct. 17, 2012 Stefan Muller.  Problem: Software is buggy!  More specific problem: Want to make sure software doesn’t have bad property.
Deutsches Elektronen-Synchrotron DESY Helmholtz Association of German Research Centres Hamburg, Germany The European X-Ray Laser Project.
Chapter Three The UNIX Editors. 2 Lesson A The vi Editor.
Bug Localization with Machine Learning Techniques Wujie Zheng
Reviewing Recent ICSE Proceedings For:.  Defining and Continuous Checking of Structural Program Dependencies  Automatic Inference of Structural Changes.
Chapter 3.5 Memory and I/O Systems. 2 Memory Management Memory problems are one of the leading causes of bugs in programs (60-80%) MUCH worse in languages.
Introduction and Features of Java. What is java? Developed by Sun Microsystems (James Gosling) A general-purpose object-oriented language Based on C/C++
Debugging in Java. Common Bugs Compilation or syntactical errors are the first that you will encounter and the easiest to debug They are usually the result.
Pallavi Joshi* Mayur Naik † Koushik Sen* David Gay ‡ *UC Berkeley † Intel Labs Berkeley ‡ Google Inc.
ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Cleansing Ola Ekdahl IT Mentors 9/12/08.
Use of Coverity & Valgrind in Geant4 Gabriele Cosmo.
Using Model Checking to Find Serious File System Errors StanFord Computer Systems Laboratory and Microsft Research. Published in 2004 Presented by Chervet.
What is Eclipse? Official Definition: Eclipse Evolution
1 Test Selection for Result Inspection via Mining Predicate Rules Wujie Zheng
C o n f i d e n t i a l 1 Course: BCA Semester: III Subject Code : BC 0042 Subject Name: Operating Systems Unit number : 1 Unit Title: Overview of Operating.
Application of Design Heuristics in the Designing and Implementation of Object Oriented Informational Systems.
Chapter Three The UNIX Editors.
© 2006, National Research Council Canada © 2006, IBM Corporation Solving performance issues in OTS-based systems Erik Putrycz Software Engineering Group.
Unix Security Assessing vulnerabilities. Classifying vulnerability types Several models have been proposed to classify vulnerabilities in UNIX-type Oses.
Bugs (part 1) CPS210 Spring Papers  Bugs as Deviant Behavior: A General Approach to Inferring Errors in System Code  Dawson Engler  Eraser: A.
Demo of Scalable Pluggable Types Michael Ernst MIT Dagstuhl Seminar “Scalable Program Analysis” April 17, 2008.
Differences Training BAAN IVc-BaanERP 5.0c: Application Administration, Customization and Exchange BaanERP 5.0c Tools / Exchange.
Cross Language Clone Analysis Team 2 February 3, 2011.
Recommending Adaptive Changes for Framework Evolution Barthélémy Dagenais and Martin P. Robillard ICSE08 Dec 4 th, 2008 Presented by EJ Park.
Announcements You will receive your scores back for Assignment 2 this week. You will have an opportunity to correct your code and resubmit it for partial.
Approaches to Intrusion Detection statistical anomaly detection – threshold – profile based rule-based detection – anomaly – penetration identification.
Testing Concurrent Programs Sri Teja Basava Arpit Sud CSCI 5535: Fundamentals of Programming Languages University of Colorado at Boulder Spring 2010.
Data Mining What is to be done before we get to Data Mining?
Improve Embedded System Stability and Performance through Memory Analysis Tools Bill Graham, Product Line Manager Development Tools November 14, 2006.
Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.
Code improvement: Coverity static analysis Valgrind dynamic analysis GABRIELE COSMO CERN, EP/SFT.
In-situ Visualization using VisIt
Mining and Analyzing Data from Open Source Software Repository
Mobile Programming Dr. Mohsin Ali Memon.
Presentation transcript:

Locating Matching Method Calls by Mining Revision History Data Benjamin Livshits Stanford University and Thomas Zimmermann Saarland University

The Usual Suspects Much bug-detection research in recent years Focus: generic patterns, sometimes language-specific NULL dereferences Security Buffer overruns Format string violations Memory Double-deletes Memory leaks Locking errors/threads Deadlock/races/atomicity Lets look at the space of error patterns in more detail

Classification of Error Patterns NULL dereferences Buffer overruns Double-deletes Locking errors/threads Generic patterns -- the usual suspects App-specific patterns particular to a system or a set of APIs Bugs in Linux code Bugs in J2EE servlets Device drivers Error Pattern Iceberg NULL dereferences Buffer overruns Double-deletes Locks/threads …

Classification of Error Patterns App-specific patterns particular to a system or a set of APIs Intuition: Many other application-specific patterns exist Much of application-specific stuff remains a gray area so far Goal: Lets figure out what the patterns are Generic patterns -- the usual suspects NULL dereferences Buffer overruns Double-deletes Locking errors/threads Anybody knows any good error patterns specific to WinAmp plugins? ? There are hundreds of WinAmp plugins out there

Focus: Matching Method Pairs Start small: Matching method pairs Only two methods A very simple state machine Calls have match perfectly Very common, our inspiration is System calls fopen/fclose lock/unlock … GUI operations addNotify/removeNotify addListener/removeListener createWidget/destroy Widget … Want to find more of the same

Our Insight Our problem: Want to find matching method pairs that are error patterns Our technique: Look at revision histories (think CVS) Crucial observation: Use data mining techniques to find method that are often added at the same time Things that are frequently checked in together often form a pattern

Our Insight (continued) Now we know the potential patterns Profile the patterns Run the application See how many times each pattern hits – number of times a pattern is followed misses – number of times a pattern is violated Based on this statistics, classify the patterns Usage patterns – almost always hold Error patterns – violated a large number of the times, but still hold most of the time Unlikely patterns – not validated enough times

System Architecture mine CVS histories patterns run the application post-process usage patterns error patterns unlikely patterns sort and filter revision history mining dynamic analysis report bugs report patterns reporting instrument relevant method calls

Mining Basics Sequence of revisions Files Foo.java, Bar.java, Baz.java, Qux.java Simplification: look at method calls only Look for interesting patterns in the way methods are called o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java

Mining Matching Method Calls Use our observation: Methods that are frequently added simultaneously often represent a use pattern For instance: … addListener(…); … removeListener(…); … o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java

Data Mining Summary We consider method calls added in each check-in We want to find patterns of method calls Too many potential pairs to consider Want to filter and rank them Use support and confidence for that Support and confidence of each pattern Standard metrics used in data mining Support reflects how many times each pair appears Confidence reflects how strongly a particular pair is correlated Refer to the paper for details

Improvements Over the Traditional Approach The default data mining approach doesnt really work Filters based on confidence and support Still too many potential patterns! 1.Filtering: Consider only pairs with the same initial subsequence as potential patterns 2.Ranking: Use one-line fixes to find likely error patterns

Matching Initial Call Sequences o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java Pair 3 Pairs 1 Pair 10 Pairs 2 Pairs 1 Pair 0 Pairs 0 Pairs

Using Fixes to Rank Patterns Look for one-call additions which likely indicate fixes. Rank pairs for such methods higher. o1.addListener o1.removeListener o2.addListener o2.removeListener System.out.println o3.addListener o3.removeListener list.iterator iter.hasNext iter.next o4.addListener System.out.println o4.removeListener Foo.java 1.12 Bar.java 1.47 Baz.java 1.23 Qux.java This is a fix! Move pairs containing removeListener up

Mining Setup Apply these ideas to the revision history of Eclipse Very large open-source project Many people working on it Perform filtering based on Pattern support Pattern strength Get 32 strongly correlated method pairs in Eclipse

Some Interesting Method Pairs (1) kEventControlActivatekEventControlDeactivate addDebugEventListenerremoveDebugEventListener beginRuleendRule suspendresume NewPtrDisposePtr addListenerremoveListener registerderegister addElementChangedListenerremoveElementChangedListener addResourceChangeListenerremoveResourceChangeListener addPropertyChangeListenerremovePropertyChangeListener createPropertyListreapPropertyList preReplaceChildpostReplaceChild addWidgetremoveWidget stopMeasuringcommitMeasurements blockSignalunblockSignal HLockHUnlock OpenEventfireOpen …

Some Interesting Method Pairs (2) kEventControlActivatekEventControlDeactivate addDebugEventListenerremoveDebugEventListener beginRuleendRule suspendresume NewPtrDisposePtr addListenerremoveListener registerderegister addElementChangedListenerremoveElementChangedListener addResourceChangeListenerremoveResourceChangeListener addPropertyChangeListenerremovePropertyChangeListener createPropertyListreapPropertyList preReplaceChildpostReplaceChild addWidgetremoveWidget stopMeasuringcommitMeasurements blockSignalunblockSignal HLockHUnlock OpenEventfireOpen … Begin applying a thread scheduling rule to a Java thread

Some Interesting Method Pairs (3) kEventControlActivatekEventControlDeactivate addDebugEventListenerremoveDebugEventListener beginRuleendRule suspendresume NewPtrDisposePtr addListenerremoveListener registerderegister addElementChangedListenerremoveElementChangedListener addResourceChangeListenerremoveResourceChangeListener addPropertyChangeListenerremovePropertyChangeListener createPropertyListreapPropertyList preReplaceChildpostReplaceChild addWidgetremoveWidget stopMeasuringcommitMeasurements blockSignalunblockSignal HLockHUnlock OpenEventfireOpen … Register/unregister the current widget with the parent display object for subsequent event forwarding

Some Interesting Method Pairs (4) kEventControlActivatekEventControlDeactivate addDebugEventListenerremoveDebugEventListener beginRuleendRule suspendresume NewPtrDisposePtr addListenerremoveListener registerderegister addElementChangedListenerremoveElementChangedListener addResourceChangeListenerremoveResourceChangeListener addPropertyChangeListenerremovePropertyChangeListener createPropertyListreapPropertyList preReplaceChildpostReplaceChild addWidgetremoveWidget stopMeasuringcommitMeasurements blockSignalunblockSignal HLockHUnlock OpenEventfireOpen … Add/remove listener for a particular kind of GUI events

Some Interesting Method Pairs kEventControlActivatekEventControlDeactivate addDebugEventListenerremoveDebugEventListener beginRuleendRule suspendresume NewPtrDisposePtr addListenerremoveListener registerderegister addElementChangedListenerremoveElementChangedListener addResourceChangeListenerremoveResourceChangeListener addPropertyChangeListenerremovePropertyChangeListener createPropertyListreapPropertyList preReplaceChildpostReplaceChild addWidgetremoveWidget stopMeasuringcommitMeasurements blockSignalunblockSignal HLockHUnlock OpenEventfireOpen … Use OS native locking mechanism for resources such as icons, etc.

Dynamically Check the Patterns Home-grown bytecode instrumenter Get a list of matching method pairs Instrument calls to any of the methods to dump parameters Post-processing of the output Process a stream of events Find and count matches and mismatches … o.register(d) … o.deregister(d) … o.deregister(d) matched mismatched ???

Experimental Setup Applied our approach to Eclipse One of the biggest Java applications 2,900,000 lines of Java code Included many Eclipse plugins consisting of lower quality code than the core Chose 32 matching method pairs Times: 5 days to fetch and process CVS histories 30 minutes to compute the patterns An hour to instrument Eclipse And we are done!

Experimental Summary Pattern classification: 5 are usage patterns 5 are error patterns 5 are unlikely patterns 17 were not hit at runtime Error patterns Resulted in a total of 107 dynamically confirmed bugs Results for a single run of Eclipse

A Preview of Coming Attractions… We have a paper in FSE 2005 Describes a tool called DynaMine Provides various extensions: More complex patterns: State machines Grammars More applications analyzed More bugs found