
Mining Function Usage Patterns to Find Bugs Chadd Williams.


1 Mining Function Usage Patterns to Find Bugs Chadd Williams

2 2/19 University of Maryland
Thesis
    open(f)
    tmp = cnt = 0
    while(cnt < sz && tmp != -1)
        tmp = read(f,sz)
        if(tmp != -1) cnt += tmp
    close(f)
Source code is full of interesting properties
–describes how the source code is written
–rules that one must adhere to for code to work correctly
–what to do with values from a function
–how to use an API
Can we find the properties?
–every change is committed
–changes highlight misunderstood code
We can discover important properties by looking at source code changes
Can we use these rules to help the developer to find bugs?

3 Why?
We wrote the code, we know the rules!
Implicit rules build up over time
–little or no documentation
–failure to understand implicit rules causes bugs
32% of bugs detected during maintenance [1]
How much do you know about your 10 year old code base?
–Didn’t someone rewrite the matrix objects?
–What about that third party library?
[1] Matsumura, T., Monden, A., Matsumoto, K., The Detection of Faulty Code Violating Implicit Coding Rules, IWPSE ’02

4 Static Analysis
Analysis of code without execution
–examine the source code only
Many successful static analysis tools check for violations of system specific rules
–how to use an internal API
–specialized lock/unlock functionality
–data validation requirements
Often produces many false warnings
–can historical information improve this?

5 General Technique
Inspect each commit to each file
Identify properties in each version
Compare sets of properties to determine new instances of properties
Identify commonly added properties
Before the commit:
    …
    value = foo();
    newPosition += value;
    …
After the commit:
    …
    value = foo();
    if( value != error_code) {
        newPosition += value;
    }
    …

6 Evaluation
Does historical information help?
–can we get the same value by only looking at the latest version of the source code?
Metric
–are the likely bugs near the top?
–cumulative precision
Precision: number of likely bugs vs. number of warnings inspected
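The cumulative precision metric can be sketched as a running ratio over the ranked warning list. This is a minimal illustration, assuming each inspected warning has already been labeled by hand as a likely bug or not; the function name `cumulative_precision` and the flag array are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* is_likely_bug[i] is 1 if the i-th ranked warning was judged a likely bug
 * on inspection, 0 otherwise (hypothetical input format).
 * out[i] receives the precision after inspecting warnings 0..i:
 * likely bugs seen so far divided by warnings inspected so far. */
void cumulative_precision(const int *is_likely_bug, size_t n, double *out)
{
    size_t bugs = 0;

    assert((is_likely_bug != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (is_likely_bug[i])
            bugs++;
        out[i] = (double)bugs / (double)(i + 1);
    }
}
```

A ranking that places likely bugs near the top keeps the early entries of `out` close to 1.0, which is the property the metric is meant to reward.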

7 Return Value Check Bug
Identify functions whose return value induces a code change
    …
    value = foo();
    newPosition += value; // ???
    …
Tool Inferred Bug Fix
    …
    value = foo();
    if( value != error_code) { // Check
        newPosition += value;
    }
    …
Apache Results
Provide developers a list of sorted warnings
–use historical information for sorting
Chi-square = 6.15, p ≤ 0.025

8 Discovering Function Usage Patterns
Function Usage Pattern
–describe function invocations with respect to each other
    static analysis
    intraprocedural
–describe relationships between functions
    implicit rules

    mdi = HeapAlloc(GetProcessHeap());
    if (!mdi) HeapFree(GetProcessHeap(), 0, cs);

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc ) DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );

9 Goals
Discover valid patterns
–use data mining techniques to identify patterns
Identify buggy patterns
–which patterns commonly cause a code change
Find violations of these patterns
–static analysis
–use history to rank violations

10 Mining Changes in Function Usage Patterns
Find new instances of patterns
–where that instance was not found in the revision immediately prior
This finds a large number of patterns
–need context to strengthen the ties between the pair of functions
–Data Flow
Before the commit:
    int foo(){ open(); }
After the commit (new instance of the pattern open() -> read()):
    int foo(){ open(); read(); }
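Detecting a new instance of a pattern between two revisions can be sketched as follows. This is an illustration only, assuming each revision of a function has already been reduced to the ordered list of function names it calls; `has_pair` and `is_new_instance` are hypothetical names, and the slides do not describe how the actual tool represents a function body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `first` is called and `second` is called somewhere after it
 * within the same function body (intraprocedural), 0 otherwise. */
int has_pair(const char *const *calls, size_t n,
             const char *first, const char *second)
{
    assert(first != NULL && second != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(calls[i], first) != 0)
            continue;
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(calls[j], second) == 0)
                return 1;
    }
    return 0;
}

/* A new instance of first -> second exists when the pair is present in the
 * revision after the commit but absent in the revision immediately prior. */
int is_new_instance(const char *const *before, size_t n_before,
                    const char *const *after, size_t n_after,
                    const char *first, const char *second)
{
    return !has_pair(before, n_before, first, second)
        && has_pair(after, n_after, first, second);
}
```

For the slide's example, the call list `{"open"}` before the commit and `{"open", "read"}` after it would be reported as a new instance of open() -> read().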

11 Data Flow
Identify data flow relationships between function pairs
–produce/consume
–use same data
–update same data

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc ) DrawIcon( hdc, x, y, hIcon );
    EndPaint( hwnd, &ps );

    HDC hdc = BeginPaint( hwnd, &ps );
    if( hdc ) hdc.x = genX();
Data flow confidence
–what percent of new instances of foo() -> bar() have a data flow relationship?
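Data flow confidence can be sketched as the fraction of new pattern instances whose two calls are linked through shared data. The sketch below is a deliberate simplification: it matches variables by name (the result of the first call used as an argument of the second, or an argument common to both) instead of performing real data flow analysis, and `struct call`, `struct call_pair`, `has_data_flow`, and `data_flow_confidence` are hypothetical names not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4

/* One call site: callee name, the variable that receives the return value
 * (NULL if unused), and up to MAX_ARGS argument variable names (unused
 * slots are NULL). */
struct call {
    const char *name;
    const char *result;
    const char *args[MAX_ARGS];
};

struct call_pair {
    struct call first;
    struct call second;
};

/* Returns 1 if `var` appears among the arguments of call `c`. */
static int uses_var(const struct call *c, const char *var)
{
    if (var == NULL)
        return 0;
    for (size_t i = 0; i < MAX_ARGS; i++)
        if (c->args[i] != NULL && strcmp(c->args[i], var) == 0)
            return 1;
    return 0;
}

/* Name-based approximation of a data flow relationship between two calls. */
int has_data_flow(const struct call *first, const struct call *second)
{
    assert(first != NULL && second != NULL);
    if (uses_var(second, first->result))        /* produce/consume */
        return 1;
    for (size_t i = 0; i < MAX_ARGS; i++)       /* both touch the same data */
        if (uses_var(second, first->args[i]))
            return 1;
    return 0;
}

/* Fraction of new instances with a data flow relationship; 0.0 when there
 * are no instances (an assumption made here for the empty case). */
double data_flow_confidence(const struct call_pair *pairs, size_t n)
{
    size_t linked = 0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        linked += (size_t)has_data_flow(&pairs[i].first, &pairs[i].second);
    return (double)linked / (double)n;
}
```

With the slide's example, BeginPaint -> DrawIcon is linked through `hdc` and BeginPaint -> EndPaint through `hwnd` and `ps`, so both would count toward the confidence of their patterns.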

12 Bug-Prone Patterns
How does a new instance enter the source code?
–both of the function calls were added
–one function call was added
    the added function completed the pairing
    bug fix? refactoring?
Bug confidence
–what percent of new instances of foo()->bar() are created by adding one function call?
    int foo(){ }
Commit
    int foo(){ open(); read(); }
Commit
    int foo(){ open(); read(); close(); }
And which function call is most likely to be added?
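Bug confidence, and the related question of which call tends to be added, can be sketched as two ratios over the history of one pattern. This assumes each new instance has already been classified by how it entered the code; the `enum added` labels, `struct pattern_stats`, and `summarize` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* How a new instance of first -> second entered the code (hypothetical
 * classification of each mined instance). */
enum added { ADDED_BOTH, ADDED_FIRST_ONLY, ADDED_SECOND_ONLY };

struct pattern_stats {
    double bug_confidence; /* share of instances created by adding one call */
    double add_second;     /* among those, share where the second call was added */
};

struct pattern_stats summarize(const enum added *how, size_t n)
{
    struct pattern_stats s = {0.0, 0.0};
    size_t one_call = 0, second = 0;

    assert(how != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (how[i] != ADDED_BOTH)
            one_call++;
        if (how[i] == ADDED_SECOND_ONLY)
            second++;
    }
    if (n > 0)
        s.bug_confidence = (double)one_call / (double)n;
    if (one_call > 0)
        s.add_second = (double)second / (double)one_call;
    return s;
}
```

In the slide's example, the commit that adds both open() and read() would be labeled `ADDED_BOTH` for open() -> read(), while the later commit that adds only close() would be labeled `ADDED_SECOND_ONLY` for open() -> close().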

13 Valid, Bug Prone Patterns
Patterns added completely could indicate valid patterns
Patterns added by adding one function call indicate:
–refactoring/very misunderstood pattern
–random noise
Which are likely to be buggy?
[Figure: panels labeled "Two Function Calls Added" and "One Function Call Added"]

14 Ranking of Violations
Number of violations for each pattern
–experience from the current code base
Data Flow Confidence
–which are valid patterns
Bug Confidence
–which have caused code changes in the past
Confidence
–how often, when foo() is added, is foo()->bar() created
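One way the ranking factors could be combined is a lexicographic sort of the warnings. The slides do not say how the factors are weighted, so the order used here (higher bug confidence first, then higher data flow confidence, then fewer violations of the pattern) is purely an assumption for illustration, as are the names `struct warning`, `compare_warnings`, and `rank_warnings`.

```c
#include <assert.h>
#include <stdlib.h>

/* One warning: a violation of a mined pattern, with the statistics of that
 * pattern attached (hypothetical layout). */
struct warning {
    const char *pattern;          /* e.g. "foo->bar" */
    unsigned violations;          /* violations of this pattern in current code */
    double data_flow_confidence;
    double bug_confidence;
};

/* Assumed ordering: bug confidence descending, then data flow confidence
 * descending, then violation count ascending. */
static int compare_warnings(const void *pa, const void *pb)
{
    const struct warning *a = pa;
    const struct warning *b = pb;

    if (a->bug_confidence != b->bug_confidence)
        return a->bug_confidence > b->bug_confidence ? -1 : 1;
    if (a->data_flow_confidence != b->data_flow_confidence)
        return a->data_flow_confidence > b->data_flow_confidence ? -1 : 1;
    if (a->violations != b->violations)
        return a->violations < b->violations ? -1 : 1;
    return 0;
}

/* Sorts the warnings in place so the most suspicious ones come first. */
void rank_warnings(struct warning *w, size_t n)
{
    assert(w != NULL || n == 0);
    if (n > 0)
        qsort(w, n, sizeof *w, compare_warnings);
}
```

A weighted score would be an equally plausible design; the lexicographic comparator is used here only because it avoids inventing weights.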

15 Preliminary Results
Student Projects – CS 3
–Introduction to C
–CVS history for each student for each project
    CVS commit to see automated test results
–50% precision on final submission
Apache web server
–50% precision rate for the top 10 warnings
–identified a refactoring
Wine
    TREEVIEW_ValidItem(tree,item);
    TREEVIEW_SendTreeviewNotify(tree,command,item);

16 Apache Case Study
1,129 C source files
–includes modules
–Apache Portable Runtime
41,000 CVS commits
–6,000 compilable CVS transactions that change source files for the Linux version
Studied httpd-2.0 branch
–July 1996 through Oct 2003
–some files have history back through 1.0 branch

17 Apache Refactoring
Found many patterns of this form:
Thu Nov 18 23:07:53 1999 UTC (6 years, 3 months ago)
    … I then changed all the fprintf(stderr calls to ap_log_error …

Function 1            Function 2      Bug Confidence   Add Second Function
shmcb_get_safe_uint   ap_log_error    1.0
ssl_util_vhostid      ap_log_error    0.8              1.0

Bug Confidence: how often is this pattern created by adding exactly one function call?
Add Second Function: how often, when one function call is added to create this pattern, is it the second function call?
Change debug logging
–previously printf
–now ap_log_error or ap_log_rerror

18 Can we find bugs?
Static analysis to identify violations of ap_log_error patterns
–16 of first 20 warnings are likely bugs
    first 20 warnings involving ap_log_error
–ranking based on
    violations per pattern
    bug confidence
    data flow confidence
Why do these bugs exist?
–missed refactorings
–bugs caused by not knowing implicit rules
This refactoring started in 1999

19 Conclusions
Interesting properties can be mined from change history
–function usage patterns
Using historical information has improved static analysis tools
–provide a list of ranked warnings to user
–reduced false positive rate

