1 End-User Program Analysis Bor-Yuh Evan Chang University of California, Berkeley Dissertation Talk August 28, 2008 Advisor: George C. Necula, Collaborator: Xavier Rival (INRIA)

2 Software errors cost a lot: ~$60 billion annually (~0.5% of US GDP), per a 2002 National Institute of Standards and Technology report. [Two comparison graphics, "> total annual revenue of …" and "> 10x annual budget of …"; the logos they pointed to are missing from the transcript.] Bor-Yuh Evan Chang - End-User Program Analysis

3 But there’s hope in program analysis. Microsoft uses and distributes the Static Driver Verifier. Airbus applies the Astrée Static Analyzer. Companies such as Coverity and Fortify market static source code analysis tools.

4 Because program analysis can eliminate entire classes of bugs
For example,
–Reading from a closed file: read( );
–Reacquiring a locked lock: acquire( );
How?
–Systematically examine the program
–Simulate running the program on “all inputs”
–“Automated code review”

5 Program analysis by example: Checking for double acquires. Simulate running the program on “all inputs”. Code: … code … // x now points to an unlocked lock acquire(x); … code … Analysis state (diagram): x.

6 Program analysis by example: Checking for double acquires. Simulate running the program on “all inputs”. Code: … code … // x now points to an unlocked lock in a linked list acquire(x); … code … Ideal analysis state (diagram): one concrete list per case, with x pointing into it, or the next larger list, or …

7 Must abstract. Code: … code … // x now points to an unlocked lock in a linked list acquire(x); … code … For decidability, the analysis must abstract, that is, “model all inputs” (e.g., merge objects), which turns the ideal analysis state into an abstract analysis state. An abstraction that is too coarse or not precise enough (e.g., one that has lost the fact that x is always unlocked) mislabels good code as buggy.
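The double-acquire check and the cost of a coarse abstraction can be sketched in a few lines. This is only an illustration under assumptions, not the analysis from the talk: the operation encoding, the name `check_double_acquire`, and the two-valued lock state are all hypothetical, and the "program" is straight-line code.

```python
from typing import Dict, FrozenSet, List, Tuple

LOCKED, UNLOCKED = "locked", "unlocked"

# Abstract state: for each variable, the set of states its lock might be in.
AbstractState = Dict[str, FrozenSet[str]]


def check_double_acquire(
    ops: List[Tuple[str, str]], initial: AbstractState
) -> List[int]:
    """Simulate straight-line code over all lock states allowed by `initial`.

    Returns the indices of acquire operations that might reacquire a lock
    that is already held.
    """
    state = dict(initial)
    warnings = []
    for index, (op, var) in enumerate(ops):
        if op == "acquire":
            if LOCKED in state[var]:
                warnings.append(index)  # possible double acquire
            state[var] = frozenset({LOCKED})
        elif op == "release":
            state[var] = frozenset({UNLOCKED})
        else:
            raise ValueError(f"unknown operation: {op}")
    return warnings


program = [("acquire", "x"), ("release", "x"), ("acquire", "x")]

# Precise abstraction: the analysis still knows x is unlocked on entry.
precise = {"x": frozenset({UNLOCKED})}
# Coarse abstraction: merging objects lost the fact that x is unlocked.
coarse = {"x": frozenset({LOCKED, UNLOCKED})}

print(check_double_acquire(program, precise))  # no warnings
print(check_double_acquire(program, coarse))   # flags the first acquire
```

With the precise entry state the correct program passes; with the coarse one the same program gets a warning on its first acquire, which is the "good code mislabeled as buggy" case.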

8 To address the precision challenge
Traditional program analysis mentality:
–“Why can’t developers write more specifications for our analysis? Then, we could verify so much more.”
–“Since developers won’t write specifications, we will use default abstractions (perhaps coarse) that hopefully work most of the time.”
End-user approach:
–“Can we design program analyses around the user? Developers write testing code. Can we adapt the analysis to use that code as specifications?”

9 Summary of overview
Challenge in analysis: finding a good abstraction, precise enough but not more than necessary
–Powerful, generic abstractions: expensive, hard to use and understand
–Built-in, default abstractions: often not precise enough (e.g., data structures)
End-user approach: must involve the user in abstraction without expecting the user to be a program analysis expert

10 Overview of contributions: Extensible Inductive Shape Analysis [POPL’08, SAS’07]
–Precise inference of data structure properties: able to check, for instance, the locking example
–Targeted to software developers: uses data structure checking code for guidance; turns testing code into a specification for static analysis
–Efficient: ~10-100x speed-up over generic approaches; builds the abstraction out of developer-supplied checking code

11 Extensible Inductive Shape Analysis [POPL’08, SAS’07]: precise inference of data structure properties with an end-user approach

12 Shape analysis is a fundamental analysis
Data structures are at the core of
– Traditional languages (C, C++, Java)
– Emerging web scripting languages
Improves verifiers that try to
– Eliminate resource usage bugs (locks, file handles)
– Eliminate memory errors (leaks, dangling pointers)
– Eliminate concurrency errors (data races)
– Validate developer assertions
Enables program transformations
– Compile-time garbage collection
– Data structure refactorings
…

13 Shape analysis by example: Removing duplicates. Example/testing code: // l is a sorted doubly-linked list for each node cur in list l { remove cur if duplicate; } assert l is sorted, doubly-linked with no duplicates; Review/static analysis (diagram): l starts as a “sorted dl list” (e.g., 2 2 4 4) and ends as a “no duplicates” list (e.g., 2 4); these properties are program-specific. The intermediate state (e.g., 2 4 4 with cur in the middle) is more complicated: a “segment with no duplicates” from l to cur, followed by a “sorted dl list” from cur.

14 Shape analysis is not yet practical: choosing the heap abstraction is difficult for precision
Traditional approaches:
–TVLA [Sagiv et al.]: parametric in low-level, analyzer-oriented predicates. + Very general and expressive. - Hard for non-expert.
–Space Invader [Distefano et al.]: built-in high-level predicates. - Hard to extend. + No additional user effort (if precise enough).
End-user approach:
–Xisa: parametric in high-level, developer-oriented predicates. + Extensible. + Targeted to developers.

15 Key insight for being developer-friendly and efficient: utilize “run-time checking code” as specification for static analysis. Code: assert(sorted_dll(l,…)); for each node cur in list l { remove cur if duplicate; } assert(sorted_dll_nodup(l,…)); Checker: dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h), where p specifies where prev should point. Contribution: Build the abstraction for analysis out of developer-specified checking code. Contribution: Automatically generalize checkers for complicated intermediate states.
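As a concrete reading of "run-time checking code", here is a minimal runnable sketch of the checker and the remove-duplicates loop. The slide elides the extra arguments of `sorted_dll` and `sorted_dll_nodup` ("…"), so the `p` and `strict` parameters, the `Node` class, and the helpers `build` and `to_values` are assumptions made for illustration.

```python
from typing import Optional


class Node:
    def __init__(self, data: int) -> None:
        self.data = data
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


def build(values):
    """Build a doubly-linked list from `values` and return its head."""
    head = last = None
    for value in values:
        node = Node(value)
        node.prev = last
        if last is None:
            head = node
        else:
            last.next = node
        last = node
    return head


def to_values(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out


def dll(h, p):
    """dll(h, p) from the slide: p is where h.prev should point."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)


def sorted_dll(h, p=None, strict=False):
    """dll plus an ordering check; strict=True also forbids duplicates."""
    if h is None:
        return True
    if h.prev is not p:
        return False
    if p is not None and (p.data >= h.data if strict else p.data > h.data):
        return False
    return sorted_dll(h.next, h, strict)


def sorted_dll_nodup(h, p=None):
    return sorted_dll(h, p, strict=True)


def remove_duplicates(l):
    """Unlink every node whose value equals its predecessor's value."""
    cur = l.next if l is not None else None
    while cur is not None:
        following = cur.next
        if cur.prev.data == cur.data:
            cur.prev.next = cur.next  # cur->prev->next = cur->next
            if cur.next is not None:
                cur.next.prev = cur.prev
        cur = following
    return l


l = build([2, 2, 4, 4])
assert sorted_dll(l)
remove_duplicates(l)
assert sorted_dll_nodup(l)
print(to_values(l))  # [2, 4]
```

The two `assert` calls are ordinary test code; the talk's point is that a static analysis can take the same checker definitions as its specification instead of running them.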

16 Our framework is an automated shape analysis with a precise memory abstraction based around invariant checkers.
–Extensible and targeted for developers: parametric in developer-supplied checkers
–Precise yet compact abstraction for efficiency: data structure-specific, based on properties of interest to the developer
(Diagram: checkers, e.g., dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h), are the input to the shape analyzer.)

17 Shape analysis is an abstract interpretation on abstract memory descriptions with …
–Splitting of summaries, to reflect updates precisely
–And summarizing, for termination

18 Outline. The shape analyzer consists of abstract interpretation (splitting and interpreting update; summarizing) and type inference on checker definitions. Its input is checkers, e.g., dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). Learn information about the checker to use it as an abstraction. Compare and contrast manual code review and our automated shape analysis.

19 Overview: Split summaries to interpret updates precisely. Want abstract update to be “exact”, that is, to update one “concrete memory cell”. The example at a high level: iterate using cur, changing the doubly-linked list from purple to red. Steps (diagram): split at cur, then update cur from purple to red. Challenge: How does the analysis “split” summaries and know where to “split”?

20 “Split forward” by unfolding inductive definition. dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). To get cur→next, the summary dll(cur, p) at cur is unfolded into a disjunction (∨): either cur is null, or cur is a cell with prev = p and next = n, followed by dll(n, cur). The analysis doesn’t forget the empty case.
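The forward split can be sketched symbolically: unfolding one dll summary yields two cases, mirroring the two branches of the checker. This is a toy encoding under assumptions, not Xisa's data structures; the classes `Dll`, `PointsTo`, and `Case`, the function `unfold`, and the `fresh` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Dll:
    """Summary edge: dll(head, prev) holds for the rest of the list."""
    head: str
    prev: str


@dataclass(frozen=True)
class PointsTo:
    """One field of one concrete cell: addr->field = value."""
    addr: str
    field: str
    value: str


@dataclass(frozen=True)
class Case:
    pure: Tuple[str, ...]     # facts about values, e.g. "cur = null"
    heap: Tuple[object, ...]  # cells and summaries describing memory


def unfold(summary: Dll, fresh: str) -> List[Case]:
    """Unfold dll(h, p) once, following the checker's if/else.

    `fresh` names the newly exposed value h->next.
    """
    h, p = summary.head, summary.prev
    empty = Case(pure=(f"{h} = null",), heap=())
    non_empty = Case(
        pure=(f"{h} != null",),
        heap=(
            PointsTo(h, "prev", p),
            PointsTo(h, "next", fresh),
            Dll(fresh, h),  # the recursive call dll(h->next, h)
        ),
    )
    return [empty, non_empty]  # read as: empty OR non_empty


for case in unfold(Dll("cur", "p"), fresh="n"):
    print(case)
```

The second case exposes the cell at `cur` so that a read of `cur->next` hits a concrete points-to edge; the first case is the empty list that the analysis must keep.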

21 “Split backward” also possible and necessary. dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). Code: for each node cur in list l { remove cur if duplicate; } assert l is sorted, doubly-linked with no duplicates; The update cur→prev→next = cur→next; needs cur→prev→next. (Diagram: the state is a “dll segment” from l to cur followed by dll(n, cur); unfolding the segment backward from cur exposes the cell p0 before cur, and the result is a disjunction (∨) that includes a null case.) Technical details: How does the analysis do this unfolding? Why is this unfolding allowed? (Key: Segments are also inductively defined) [POPL’08] How does the analysis know to do this unfolding?

22 Outline. The shape analyzer consists of abstract interpretation (splitting and interpreting update; summarizing) and type inference on checker definitions. Its input is checkers, e.g., dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). How do we decide where to unfold? Type inference derives additional information to guide unfolding. Contribution: Turns testing code into specification for static analysis.

23 Abstract memory as graphs. dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). Make endpoints and segments explicit, yet high-level. Legend: a node is a memory address (value) such as α, β, γ, δ; a thin edge is a memory cell (points-to, e.g., γ→next = δ); a thick edge is a summary, either a checker summary (inductive predicate) or a segment summary standing for some number of memory cells. Example graph: l is α, a “dll segment” runs from α to cur, cur is γ with prev β and next δ, and dll(δ, γ) covers the rest. Contribution: Generalization of checker (intuitively, dll(α, null) up to dll(γ, β)). Which summary (thick edge), in what direction, and how far do we unfold to get the edge β→next (cur→prev→next)?

24 Types for deciding where to unfold. Checker definition: dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). Checker “run” (call tree/derivation): dll(α, null), dll(β, α), dll(γ, β), dll(δ, γ), dll(null, δ). (Diagram: an instance of the list and its summary, with nodes α, β, γ.) If it exists, where is γ→next? β→next? The checker definition says: for h→next/h→prev, unfold from h; for p→next/p→prev, unfold before h.

25 Types make the analysis robust with respect to how checkers are written. Doubly-linked list checker (as before): dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h). Alternative doubly-linked list checker: dll′(h) = if (h→next = null) then true else h→next→prev = h and dll′(h→next). Different types for different unfolding. (Diagram: an instance and a summary for each checker.)
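To see that the two checkers are different ways of writing the same back-pointer invariant, both can be run on the same lists. This is a small sketch under assumptions: `Node`, `build`, and the name `dll_alt` (for the alternative checker) are hypothetical, and, like the slide's definition, `dll_alt` expects a non-null head.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None


def build(values):
    """Build a doubly-linked list from `values` and return its head."""
    head = last = None
    for value in values:
        node = Node(value)
        node.prev = last
        if last is None:
            head = node
        else:
            last.next = node
        last = node
    return head


def dll(h, p):
    """Checker as before: carries the expected predecessor p along."""
    if h is None:
        return True
    return h.prev is p and dll(h.next, h)


def dll_alt(h):
    """Alternative checker: looks one cell ahead instead of carrying p."""
    if h.next is None:
        return True
    return h.next.prev is h and dll_alt(h.next)


good = build([1, 2, 3])
bad = build([1, 2, 3])
bad.next.next.prev = bad  # third node's back pointer skips a node

print(dll(good, None), dll_alt(good))  # True True
print(dll(bad, None), dll_alt(bad))    # False False
```

The first checker reads `h.prev` at the current cell while the second reads `h.next.prev` one cell ahead, which is why the analysis needs different unfolding guidance for each.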

26 Summary of checker parameter types
–Tell where to unfold for which fields
–Make analysis robust with respect to how checkers are written
–Learn where in summaries unfolding won’t help
–Can be inferred automatically with a fixed-point computation on the checker definitions

27 Summary of interpreting updates
–Splitting of summaries is needed for precision
–Unfolding checkers is a natural way to do splitting, when checker traversal matches code traversal
–Checker parameter types enable, for example, “back pointer” traversal without blindly guessing where to unfold

28 Outline. The shape analyzer consists of abstract interpretation (splitting and interpreting update; summarizing) and type inference on checker definitions. Its input is checkers, e.g., dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h).

29 Summarize by folding into inductive predicates. Code: last = l; cur = l→next; while (cur != null) { // … cur, last … if (…) last = cur; cur = cur→next; } (Diagram: successive graphs in which l, last, and cur are connected by next edges are summarized into list summaries between l, last, and cur.) Challenge: Precision (e.g., last, cur separated by at least one step). Previous approaches guess where to fold for each graph. Contribution: Determine where by comparing graphs across history.
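The precision challenge on this slide can be observed concretely by running the loop and measuring the distance between last and cur. This is a run-time illustration only, not the history-guided folding itself; `Node`, `build`, `steps_between`, `traverse`, and the `keep` predicate (standing in for the elided condition) are hypothetical.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def build(values):
    """Build a singly-linked list from `values` and return its head."""
    head = last = None
    for value in values:
        node = Node(value)
        if last is None:
            head = node
        else:
            last.next = node
        last = node
    return head


def steps_between(a, b):
    """Number of next-steps from a to b, or -1 if b is unreachable."""
    steps = 0
    node = a
    while node is not b:
        if node is None:
            return -1
        node = node.next
        steps += 1
    return steps


def traverse(l, keep):
    """The loop from the slide, recording the last-to-cur distance."""
    last = l
    cur = l.next
    gaps = []
    while cur is not None:
        gaps.append(steps_between(last, cur))
        if keep(cur):
            last = cur
        cur = cur.next
    return last, gaps


final_last, gaps = traverse(build([1, 2, 3, 4]), keep=lambda n: n.data % 2 == 1)
print(final_last.data, gaps)  # 3 [1, 2, 1]
```

Every recorded gap is at least 1, so a summary that folds last and cur into the same list summary, allowing distance 0, would lose a fact that holds on every iteration.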

30 Summary: Given checkers, everything is automatic. The shape analyzer consists of abstract interpretation (splitting and interpreting update; summarizing) and type inference on checker definitions. Its input is checkers, e.g., dll(h, p) = if (h = null) then true else h→prev = p and dll(h→next, h).

31 Results: Performance
Benchmark | Max. Num. Graphs at a Program Pt | Analysis Time (ms)
singly-linked list reverse | 1 | 0.6
doubly-linked list reverse | 1 | 1.4
doubly-linked list copy | 2 | 5.3
doubly-linked list remove | 5 | 6.5
doubly-linked list remove and back | 5 | 6.8
search tree with parent insert | 5 | 8.3
search tree with parent insert and back | 5 | 47.0
two-level skip list rebalance | 6 | 87.0
Linux scull driver (894 loc) (char arrays ignored, functions inlined) | 4 | 9710.0
Times negligible for data structure operations (often in sec or 1/10 sec).
Expressiveness: Different data structures.
Verified shape invariant as given by the checker is preserved across the operation.
Comparison callouts (the rows they annotate are not recoverable from the transcript): TVLA: 850 ms; TVLA: 290 ms; Space Invader only analyzes lists (built-in).

32 Demo: Doubly-linked list reversal http://xisa.cs.berkeley.edu Body of loop over the elements: swaps the next and prev fields of curr. Screenshot labels: already reversed segment; node whose next and prev fields were swapped; not yet reversed list.

33 Experience with the tool
Checkers are easy to write and try out
– Enlightening (e.g., red-black tree checker in 6 lines)
– Harder to “reverse engineer” for someone else’s code
– Default checkers based on types useful
Future expressiveness and usability improvements
– Pointer arithmetic and arrays
– More generic checkers: polymorphic (“element kind unspecified”), higher-order (parameterized by other predicates)
Future evaluation: user study

34 Summary of Extensible Inductive Shape Analysis
Key Insight: Checkers as specifications
– Developer View: Global, Expressed in a familiar style
– Analysis View: Capture developer intent, Not arbitrary inductive definitions
Constructing the program analysis
– Intermediate states: Generalized segment predicates
– Splitting: Checker parameter types with levels
– Summarizing: History-guided approach
(Diagram: a graph with a next edge, a list summary, nodes α and β, and checker edges c(γ) and c′(γ′).)

35 Conclusion. Extensible Inductive Shape Analysis: a precision-demanding program analysis improved by novel user interaction. Developer: Gets results corresponding to intuition. Analysis: Focused on what’s important to the developer. Practical precise tools for better software with an end-user approach!

36 What can inductive shape analysis do for you? http://xisa.cs.berkeley.edu

