Presentation is loading. Please wait.

Presentation is loading. Please wait.

Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology.

Similar presentations


Presentation on theme: "Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology."— Presentation transcript:

1 Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology arnaud@kestreltechnology.com Please see Notes View for annotations.

2 Page 2 5/2/2007  Kestrel Technology LLC Static Analysis Code Static Analysis Tool Parameters List of defects Confirmed property violations Indeterminates (possible violations)

3 Page 3 5/2/2007  Kestrel Technology LLC Can we find all defects? Classic undecidability results in Computer Science apply (halting problem) We require soundness (no defects are missed) –A conservative approach is acceptable Abstract Interpretation is the enabling theory in CodeHawk –Sound –Tunably precise –Scalable –“Generatively general”

4 Page 4 5/2/2007  Kestrel Technology LLC Intuition SW Defects All lines in the code Abstract Interpretation

5 Page 5 5/2/2007  Kestrel Technology LLC Abstract Interpretation Computes an envelope of all data in the program Mathematical assurance Static analyzers based on Abstract interpretation are difficult to engineer KT’s expertise: building scalable and effective abstract interpreters

6 Page 6 5/2/2007  Kestrel Technology LLC Properties vs. defects An application might be free of programming language defects but not carry the desired property –resource issues (memory, execution time) –separation –range of output data Abstract Interpretation can address application-specific of properties as well as language defects

7 Page 7 5/2/2007  Kestrel Technology LLC Objectives Go over a detailed example –Understand how the technology works Achievements and challenges in the engineering of abstract interpreters –What it means to build an analyzer based on Abstract Interpretation

8 Page 8 5/2/2007  Kestrel Technology LLC Detailed example

9 Page 9 5/2/2007  Kestrel Technology LLC Buffer overflow for(i = 0; i < 10; i++) { if(message[i].kind == SHORT_CMD) allocate_space (channel, 1000); else allocate_space (channel, 2000); } Can we exceed the channel’s buffer capacity?

10 Page 10 5/2/2007  Kestrel Technology LLC Control Flow Graph i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop

11 Page 11 5/2/2007  Kestrel Technology LLC i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop Analytic model of the code i < 10 ? i = 0 i++ M = M + 2000 M = M + 1000 stop M = 0

12 Page 12 5/2/2007  Kestrel Technology LLC Analysis process intuition We mimic the execution of the program We collect all possible data values for all variables for all possible traces

13 Page 13 5/2/2007  Kestrel Technology LLC Analyzing the model i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

14 Page 14 5/2/2007  Kestrel Technology LLC Initially i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

15 Page 15 5/2/2007  Kestrel Technology LLC Loop initialization i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

16 Page 16 5/2/2007  Kestrel Technology LLC Loop entry i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

17 Page 17 5/2/2007  Kestrel Technology LLC Analyzing a branching (1) i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

18 Page 18 5/2/2007  Kestrel Technology LLC Analyzing a branching (2) i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M ?

19 Page 19 5/2/2007  Kestrel Technology LLC Accumulating all possible values i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

20 Page 20 5/2/2007  Kestrel Technology LLC Abstraction of point clouds We want the analysis to terminate in reasonable time We need a tractable representation of point clouds in arbitrary dimensions Abstract Interpretation offers a broad choice of such representations Example: convex polyhedra –Compute the convex hull of a point cloud

21 Page 21 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M

22 Page 22 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

23 Page 23 5/2/2007  Kestrel Technology LLC Iterating the loop analysis i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M

24 Page 24 5/2/2007  Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

25 Page 25 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

26 Page 26 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M

27 Page 27 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

28 Page 28 5/2/2007  Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M

29 Page 29 5/2/2007  Kestrel Technology LLC Keeping iterating… i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

30 Page 30 5/2/2007  Kestrel Technology LLC Passing to the limit We want this iterative process to end at some point We need to converge when analyzing loops After some iteration steps, we use a widening operation at loop entry to enforce convergence

31 Page 31 5/2/2007  Kestrel Technology LLC Widening  Let a 1, a 2, …a n, … be a sequence of polyhedra, then the sequence –w 1 = a 1 –w n+1 = w n  a n+1 is ultimately stationary The widening is a join operation i.e., a  a  b & b  a  b

32 Page 32 5/2/2007  Kestrel Technology LLC Widening for intervals [a, b]  [c, d] = [if c < a then -  else a, if b < d then +  else b] Example: [10, 20]  [11, 30] = [10, +  ]

33 Page 33 5/2/2007  Kestrel Technology LLC Widening for polyhedra We eliminate the faces of the computed convex envelope that are not stable Convergence is reached in at most N steps where N is the number of faces of the polyhedron at loop entry

34 Page 34 5/2/2007  Kestrel Technology LLC Widening i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M

35 Page 35 5/2/2007  Kestrel Technology LLC After widening i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M

36 Page 36 5/2/2007  Kestrel Technology LLC Detecting convergence Abstract iteration sequence –F 1 = P (initial polyhedron) –F n+1 = F n if S(F n )  F n F n  S(F n ) otherwise where S is the semantic transformer associated to the loop body Theorem: if there exists N such that F N+1  F N, then F n = F N for n > N.

37 Page 37 5/2/2007  Kestrel Technology LLC Convergence i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M The computation has converged

38 Page 38 5/2/2007  Kestrel Technology LLC We are not done yet… The analyzer has just proven that 1000 * i ≤ M ≤ 2000 * i But we have lost all information about the termination condition 0 ≤ i ≤ 10 Since we have obtained an envelope of all possible values of the variables, if we run the computation again we still get such an envelope The point is that this new envelope can be smaller This refinement step is called narrowing

39 Page 39 5/2/2007  Kestrel Technology LLC Refinement i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i i M 9

40 Page 40 5/2/2007  Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i i M 9 i i M 9

41 Page 41 5/2/2007  Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M 9

42 Page 42 5/2/2007  Kestrel Technology LLC Back to loop entry i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M 10 1

43 Page 43 5/2/2007  Kestrel Technology LLC Narrowing i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M 10 M i

44 Page 44 5/2/2007  Kestrel Technology LLC Refined loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i 10

45 Page 45 5/2/2007  Kestrel Technology LLC Invariant at loop exit i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i 10 20,000 10,000 i  10

46 Page 46 5/2/2007  Kestrel Technology LLC [ ] Interpretation of the results 5,000 15,000 25,000 10,000 20,000 Space allocated in the buffer Buffer size: Certain buffer overflow Possible buffer overflow No buffer overflow

47 Page 47 5/2/2007  Kestrel Technology LLC Achievements and challenges

48 Page 48 5/2/2007  Kestrel Technology LLC Assured static analyzers for NASA C Global Surveyor: verified array bound compliance for NASA mission-critical software –Mars Exploration Rovers: 550K LOC –Deep Space 1: 280K LOC –Mars Path Finder: 140K LOC Pointer analysis: –International Space Station payload software (major bug found)

49 Page 49 5/2/2007  Kestrel Technology LLC What we observe No scalable, precise, and general-purpose abstract interpreter PolySpace: –Handles all kinds of runtime errors –Decent precision (<20% indeterminates) –Doesn’t scale (topped out at  40K LOC) Customization is the key –Specialized for a property or a class of applications –Manually crafted by experts

50 Page 50 5/2/2007  Kestrel Technology LLC CodeHawk™ Abstract Interpretation development platform/ static analyzer generator Automated generation of customized static analyzers –Leverage from pre-built analyzers –Directly tunable by the end-user

51 Page 51 5/2/2007  Kestrel Technology LLC Where CodeHawk stands Application: architecture environment input range usage protocol … Abstract Interpretation: application independent abstract model algorithms difficult to implement needs a lot of expertise CodeHawk

52 Page 52 5/2/2007  Kestrel Technology LLC Recent Applications Malware detection –Customized analyzers for specific kinds of malware –Naturally resistant to complex obfuscating transformations –Evaluated on DoD-supplied test case Library/Component analysis –Proof of absence of buffer overflow in OpenSSH’s dynamic buffer library Shared variables –Protection policies for shared variables –Evaluated on a code from a large DoD prime contractor

53 Page 53 5/2/2007  Kestrel Technology LLC Conclusions Promising and proven technology –key distinction for assurance: no false negatives –can verify application properties as well as detect defects –can be tailored for various domains (e.g., malware) Not a silver bullet..rather a bullet generator –each modeled domain offers leverage –required expertise for broad use must still be reduced


Download ppt "Page 1 5/2/2007  Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk  Arnaud Venet Kestrel Technology."

Similar presentations


Ads by Google