Mining Function Usage Patterns to Find Bugs
Chadd Williams



2/19 Thesis
University of Maryland
Source code is full of interesting properties
–they describe how the source code is written
–rules that one must adhere to for the code to work correctly
–what to do with values from a function
–how to use an API
Can we find the properties?
–every change is committed
–changes highlight misunderstood code
We can discover important properties by looking at source code changes
Can we use these rules to help the developer find bugs?

Example (pseudocode):
    open(f)
    tmp = cnt = 0
    while (cnt < sz && tmp != -1)
        tmp = read(f, sz)
        if (tmp != -1) cnt += tmp
    close(f)

3/19 Why?
We wrote the code, we know the rules!
Implicit rules build up over time
–little or no documentation
–failure to understand implicit rules causes bugs
  32% of bugs detected during maintenance [1]
How much do you know about your 10-year-old code base?
–Didn’t someone rewrite the matrix objects?
–What about that third-party library?
[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02

4/19 Static Analysis
Analysis of code without execution
–examine the source code only
Many successful static analysis tools check for violations of system-specific rules
–how to use an internal API
–specialized lock/unlock functionality
–data validation requirements
Often produces many false warnings
–can historical information improve this?

5/19 General Technique
Inspect each commit to each file
Identify properties in each version
Compare sets of properties to determine new instances of properties
Identify commonly added properties

Before the commit:
    …
    value = foo();
    newPosition += value;
    …
After the commit:
    …
    value = foo();
    if (value != error_code) {
        newPosition += value;
    }
    …

6/19 Evaluation
Does historical information help?
–can we get the same value by only looking at the latest version of the source code?
Metric
–are the likely bugs near the top?
–cumulative precision
Precision: number of likely bugs vs. number of warnings inspected
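The precision metric on this slide can be sketched in C as follows. This is a minimal illustration, not the author's evaluation code: the function names and the `is_bug` label array are hypothetical, and the labels would come from manually inspecting each ranked warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Cumulative precision after inspecting the first n ranked warnings:
 * (likely bugs among the first n) / n.
 * is_bug[i] is nonzero if the i-th ranked warning was judged a likely bug.
 * ASSUMPTION: the labels are hypothetical; the slides give no per-warning data. */
double cumulative_precision(const int *is_bug, size_t n)
{
    assert(is_bug != NULL && n > 0);
    size_t bugs = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_bug[i]) {
            bugs++;
        }
    }
    return (double)bugs / (double)n;
}

/* Print the precision at every cutoff from 1 to total, which shows
 * whether the likely bugs are concentrated near the top of the ranking. */
void print_precision_curve(const int *is_bug, size_t total)
{
    for (size_t n = 1; n <= total; n++) {
        printf("top %zu: %.2f\n", n, cumulative_precision(is_bug, n));
    }
}
```

A ranking that uses history well should keep the curve high for small cutoffs, since developers tend to inspect only the first few warnings.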

7/19 Return Value Check Bug
Identify functions whose return value induces a code change

    …
    value = foo();
    newPosition += value;    // ???
    …

    …
    value = foo();
    if (value != error_code) {    // Check
        newPosition += value;
    }
    …

Tool Inferred Bug Fix
Apache Results
Provide developers a list of sorted warnings
–use historical information for sorting
Chi-square = 6.15, p ≤ 0.025

8/19 Discovering Function Usage Patterns
Function Usage Pattern
–describe function invocations with respect to each other
  static analysis
  intraprocedural
–describe relationships between functions
  implicit rules

    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi)
        HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint(hwnd, &ps);
    if (hdc)
        DrawIcon(hdc, x, y, hIcon);
    EndPaint(hwnd, &ps);

9/19 Goals
Discover valid patterns
–use data mining techniques to identify patterns
Identify buggy patterns
–which patterns commonly cause a code change
Find violations of these patterns
–static analysis
–use history to rank violations

10/19 Mining Changes in Function Usage Patterns
Find new instances of patterns
–where that instance was not found in the revision immediately prior
This finds a large number of patterns
–need context to strengthen the ties between the pair of functions
–Data Flow

Before the commit:
    int foo() {
        open();
    }
After the commit (new instance of the pattern open() -> read()):
    int foo() {
        open();
        read();
    }
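The check for a new instance can be sketched as below. This is a simplified illustration under stated assumptions: each revision of a function is reduced to an ordered list of called function names, and a pattern first -> second is taken to mean "a call to first appears before a call to second". The helper names `has_pattern` and `is_new_instance` are hypothetical and are not from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `calls` contains `first` at some position before `second`.
 * ASSUMPTION: a function body is modeled as a flat, ordered list of call
 * names; the real analysis works on source code and control flow. */
int has_pattern(const char *const *calls, size_t n,
                const char *first, const char *second)
{
    assert(first != NULL && second != NULL);
    int seen_first = 0;
    for (size_t i = 0; i < n; i++) {
        /* Check `second` before updating seen_first so that a single call
         * cannot act as both halves of a pattern such as f -> f. */
        if (seen_first && strcmp(calls[i], second) == 0) {
            return 1;
        }
        if (strcmp(calls[i], first) == 0) {
            seen_first = 1;
        }
    }
    return 0;
}

/* A new instance is a pattern present after the commit that was not
 * present in the revision immediately prior. */
int is_new_instance(const char *const *before, size_t n_before,
                    const char *const *after, size_t n_after,
                    const char *first, const char *second)
{
    return has_pattern(after, n_after, first, second) &&
           !has_pattern(before, n_before, first, second);
}
```

For the commit on this slide, the earlier revision would be the list {"open"} and the later one {"open", "read"}, so open -> read counts as a new instance.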

11/19 Data Flow
Identify data flow relationships between function pairs
–produce/consume
–use same data
–update same data

    HDC hdc = BeginPaint(hwnd, &ps);
    if (hdc)
        DrawIcon(hdc, x, y, hIcon);
    EndPaint(hwnd, &ps);

    HDC hdc = BeginPaint(hwnd, &ps);
    if (hdc)
        hdc.x = genX();

Data flow confidence
–what percent of new instances of foo() -> bar() have a data flow relationship?
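The data flow confidence question can be written as a ratio. This is one natural reading of the slide; the notation DFC is an assumption introduced here and does not appear in the talk.

```latex
\[
\mathrm{DFC}(\mathtt{foo} \rightarrow \mathtt{bar})
  = \frac{\#\{\text{new instances of } \mathtt{foo} \rightarrow \mathtt{bar}
           \text{ with a data flow relationship}\}}
         {\#\{\text{new instances of } \mathtt{foo} \rightarrow \mathtt{bar}\}}
\]
```

With hypothetical counts, if 8 of 10 new instances of a pattern share data between the two calls, the data flow confidence is 8/10 = 0.8.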

12/19 Bug-Prone Patterns
How does a new instance enter the source code?
–both of the function calls were added
–one function call was added
  the added function completed the pairing
  bug fix? refactoring?
Bug confidence
–what percent of new instances of foo() -> bar() are created by adding one function call?

    int foo() {
    }
Commit:
    int foo() {
        open();
        read();
    }
Commit:
    int foo() {
        open();
        read();
        close();
    }

And which function call is most likely to be added?
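The two questions on this slide can be sketched as counts over the new instances of one pattern. This is an illustrative sketch, not the author's implementation: the enum, the struct, and the function name `summarize` are hypothetical, and each new instance is assumed to be already classified by how it entered the code.

```c
#include <assert.h>
#include <stddef.h>

/* How a new instance of a pattern first -> second entered the code. */
enum added_calls {
    ADDED_BOTH,        /* both calls were added in the same commit */
    ADDED_FIRST_ONLY,  /* second call was already there; first was added */
    ADDED_SECOND_ONLY  /* first call was already there; second was added */
};

struct pattern_stats {
    double bug_confidence;      /* share of new instances created by adding one call */
    double add_second_fraction; /* of those, share where the second call was added */
};

/* Summarize the history of one pattern.
 * ASSUMPTION: the classification in `instances` is given; deriving it
 * requires comparing consecutive revisions, which is not shown here. */
struct pattern_stats summarize(const enum added_calls *instances, size_t n)
{
    assert(instances != NULL && n > 0);
    size_t one_call = 0;
    size_t second = 0;
    for (size_t i = 0; i < n; i++) {
        if (instances[i] != ADDED_BOTH) {
            one_call++;
        }
        if (instances[i] == ADDED_SECOND_ONLY) {
            second++;
        }
    }
    struct pattern_stats s;
    s.bug_confidence = (double)one_call / (double)n;
    /* Avoid dividing by zero when every instance added both calls. */
    s.add_second_fraction = one_call ? (double)second / (double)one_call : 0.0;
    return s;
}
```

In the commit sequence on this slide, the first commit would be classified as ADDED_BOTH for open -> read, and the second as ADDED_SECOND_ONLY for open -> close and read -> close.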

13/19 Valid, Bug Prone Patterns
Patterns added completely could indicate valid patterns
Patterns added by adding one function call indicate:
–refactoring/very misunderstood pattern
–random noise
Which are likely to be buggy?
[Figure labels: Two Function Calls Added, One Function Call Added]

14/19 Ranking of Violations
Number of violations for each pattern
–experience from the current code base
Data Flow Confidence
–which are valid patterns
Bug Confidence
–which have caused code changes in the past
Confidence
–how often, when foo() is added, is foo() -> bar() created
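One way to turn these inputs into a sorted warning list is sketched below. The slide names the ranking inputs but does not say how they are combined, so the scoring rule here (product of the two confidences, ties broken by fewer violations) is purely an assumption for illustration, as are the struct and function names.

```c
#include <assert.h>
#include <stdlib.h>

struct warning {
    const char *first;      /* pattern first -> second that was violated */
    const char *second;
    int violations;         /* violations of this pattern in the current code */
    double data_flow_conf;  /* in [0, 1] */
    double bug_conf;        /* in [0, 1] */
};

/* ASSUMPTION: the combination rule is not given in the slides.
 * This sketch simply multiplies the two confidences. */
static double score(const struct warning *w)
{
    return w->bug_conf * w->data_flow_conf;
}

/* qsort comparator: higher score first; on equal scores, the pattern with
 * fewer violations first (also an illustrative choice). */
static int by_score_desc(const void *a, const void *b)
{
    const struct warning *wa = a;
    const struct warning *wb = b;
    double sa = score(wa);
    double sb = score(wb);
    if (sa < sb) return 1;
    if (sa > sb) return -1;
    return (wa->violations > wb->violations) - (wa->violations < wb->violations);
}

/* Sort warnings in place so the most suspicious ones come first. */
void rank_warnings(struct warning *ws, size_t n)
{
    assert(ws != NULL || n == 0);
    qsort(ws, n, sizeof ws[0], by_score_desc);
}
```

Any other monotone combination of the same inputs would fit the slide equally well; the point is only that history-derived confidences decide the order in which warnings are shown.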

15/19 Preliminary Results
Student Projects – CS 3
–Introduction to C
–CVS history for each student for each project
  CVS commit to see automated test results
–50% precision on final submission
Apache web server
–50% precision rate for the top 10 warnings
–identified a refactoring
Wine
    TREEVIEW_ValidItem(tree, item);
    TREEVIEW_SendTreeviewNotify(tree, command, item);

16/19 Apache Case Study
1,129 C source files
–includes modules
–Apache Portable Runtime
41,000 CVS commits
–6,000 compilable CVS transactions that change source files for the Linux version
Studied httpd-2.0 branch
–July 1996 through Oct 2003
–some files have history back through 1.0 branch

17/19 Apache Refactoring
Found many patterns of this form:

Thu Nov 18 23:07: UTC (6 years, 3 months ago)
… I then changed all the fprintf(stderr calls to ap_log_error …

Function 1            Function 2      Bug Confidence   Add Second Function
shmcb_get_safe_uint   ap_log_error    1.0
ssl_util_vhostid      ap_log_error

Bug Confidence: how often is this pattern created by adding exactly one function call?
Add Second Function: how often, when one function call is added to create this pattern, is it the second function call?

Change debug logging
–previously printf
–now ap_log_error or ap_log_rerror

18/19 Can we find bugs?
Static analysis to identify violations of ap_log_error patterns
–16 of first 20 warnings are likely bugs
  first 20 warnings involving ap_log_error
–ranking based on
  violations per pattern
  bug confidence
  data flow confidence
Why do these bugs exist?
–missed refactorings
–bugs caused by not knowing implicit rules
This refactoring started in 1999

19/19 Conclusions
Interesting properties can be mined from change history
–function usage patterns
Using historical information has improved static analysis tools
–provide a list of ranked warnings to user
–reduced false positive rate