Challenges in Program Analysis


1 Challenges in Program Analysis
Mooly Sagiv

2 Contents
Future directions of program analysis; open problems in program analysis.

3 Future Directions
New applications; new abstractions; combining with other methods: dynamic analysis, decision procedures, machine learning.

4 SQL injection
    String queryString = "SELECT info FROM userTable WHERE ";
    if ((!login.equals("")) && (!password.equals(""))) {
        queryString += "login='" + login + "' AND pass='" + password + "'";
    } else {
        queryString += "login='guest'";
    }
    ResultSet tempSet = stmt.executeQuery(queryString);
User submits login "doe" and password "xyz": SELECT info FROM userTable WHERE login='doe' AND pass='xyz'
Attacker submits login "admin' --" and any non-empty password, e.g. "xyz" (an empty password would take the guest branch): SELECT info FROM userTable WHERE login='admin' --' AND pass='xyz'. Since -- starts an SQL comment, the password check is ignored.

5 SQL injection solutions
Compile-time detection. Static string analysis (approximating the possible query strings by regular or context-free languages) followed by a cheap runtime check. Dynamic taint monitoring.
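Dynamic taint monitoring can be illustrated with a small sketch. It is not one of the systems alluded to above: the class `TaintedQuery` and its methods are hypothetical, and the check is a simplified version of the idea that user-supplied characters must stay confined to string literals.

```java
import java.util.BitSet;

// Hypothetical sketch of character-level taint tracking for SQL query strings.
final class TaintedQuery {
    private final StringBuilder text = new StringBuilder();
    private final BitSet tainted = new BitSet(); // bit i set => character i came from the user

    // Append program-supplied (trusted) SQL text.
    TaintedQuery trusted(String s) {
        text.append(s);
        return this;
    }

    // Append user-supplied (untrusted) text and mark its characters as tainted.
    TaintedQuery untrusted(String s) {
        int start = text.length();
        text.append(s);
        tainted.set(start, text.length());
        return this;
    }

    String text() {
        return text.toString();
    }

    // Simplified runtime check: every tainted character must lie inside a '...' literal,
    // and tainted input may not contain a quote itself. Assumes single-quoted literals
    // and ignores other escaping conventions.
    boolean isSafe() {
        boolean inLiteral = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (tainted.get(i) && (c == '\'' || !inLiteral)) {
                return false;
            }
            if (c == '\'') {
                inLiteral = !inLiteral;
            }
        }
        return true;
    }
}
```

Built from the slide's fragments, the query for "doe"/"xyz" passes `isSafe()`, while the login "admin' --" is rejected because its tainted quote closes the literal. The check is conservative: tainted text outside a quoted literal, such as a numeric id, is rejected too.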

6 Other code injection attacks
Heap spraying; shell/script injection; HTML injection.

7 Static analysis of long-lived programs
Static analysis can be applied at runtime: just-in-time compilation, cloud computing, distributed applications (e.g., Hadoop).

8 The Prime Code Search Tool (Alon Mishne & Eran Yahav)
There is a lot of public-domain code, but it is hard to find the right code sequence. Prime uses abstract interpretation.

9 Static Analysis for Program Equivalence
Applications of program equivalence: compiler correctness, software patches. Can we use static analysis to detect equivalence? Approach: instrument and abstract.

10 Abstraction-Guided Synthesis
Eran Yahav, Technion. Joint work with Martin Vechev, Greta Yorsh, Michael Kuperstein, Veselin Raychev

11 Verification with Abstraction
$P \models_{\alpha} S$ ?

12 Now what? $P \not\models_{\alpha} S$. Refine the abstraction: $P \models_{\alpha'} S$

13 Alternatively… $P \not\models_{\alpha} S$. Relax the specification (but to what?): $P \models_{\alpha} S'$

14 Alternatively… $P \not\models_{\alpha} S$. Change the program: $P' \models_{\alpha} S$

15 A Standard Approach: Abstraction Refinement
[Figure: the program and specification are verified under an abstraction; the result is either Valid or an abstract counterexample, which triggers abstraction refinement.] Change the abstraction to match the program.

16 Abstraction-Guided Synthesis [VYY-POPL’10]
[Figure: the program P and specification are verified under an abstraction; an abstract counterexample leads either to abstraction refinement or to a program restriction, yielding a program P' that implements the specification.] Change the program to match the abstraction.
Speaker notes: there could be many solutions; "implement" picks one based on a quantitative criterion. The constraint captures changes to the program execution, i.e., what is permitted during program execution.

17 Example
Initially x = z = 0. Every single statement is atomic, and f(x) is atomic.
    1: y1 = f(x)
    2: y2 = x
    3: assert(y1 != y2)

    f(x) { if (x == 1) return 3; else if (x == 2) return 6; else return 5; }

18 Example: Concrete Values
[Figure: the final (y1, y2) values of the interleavings, plotted as concrete values.]
    T1: 1: x += z      2: x += z
    T2: 1: z++         2: z++
    T3: 1: y1 = f(x)   2: y2 = x   3: assert(y1 != y2)

    f(x) { if (x == 1) return 3; else if (x == 2) return 6; else return 5; }
Interleaving x += z; x += z; z++; z++; y1 = f(x); y2 = x; assert gives y1 = 5, y2 = 0, so the assertion holds.
Interleaving z++; x += z; y1 = f(x); z++; x += z; y2 = x; assert gives y1 = 3, y2 = 3, so the assertion fails.
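The state space of this example is small enough to explore exhaustively. The following sketch (class and method names are hypothetical) enumerates every interleaving of T1, T2, and T3 under the slide's assumption that each statement, including the call f(x), is atomic, and collects the final (y1, y2) pairs that violate the assertion.

```java
import java.util.Set;
import java.util.TreeSet;

// Brute-force exploration of all interleavings of T1, T2, T3 (initially x = z = 0).
final class InterleavingExplorer {
    // The function f from the slides.
    static int f(int x) {
        if (x == 1) return 3;
        else if (x == 2) return 6;
        else return 5;
    }

    // Returns every "y1,y2" pair for which assert(y1 != y2) fails.
    static Set<String> violations() {
        Set<String> out = new TreeSet<>();
        explore(0, 0, 0, 0, 0, 0, 0, out);
        return out;
    }

    // pc1, pc2, pc3 count the statements already executed by T1, T2, T3.
    private static void explore(int pc1, int pc2, int pc3,
                                int x, int z, int y1, int y2, Set<String> out) {
        if (pc1 == 2 && pc2 == 2 && pc3 == 2) {
            // T3 line 3: y1 and y2 are local to T3, so checking the assert after the
            // other threads finish gives the same verdict as checking it immediately.
            if (y1 == y2) out.add(y1 + "," + y2);
            return;
        }
        if (pc1 < 2) explore(pc1 + 1, pc2, pc3, x + z, z, y1, y2, out); // T1: x += z
        if (pc2 < 2) explore(pc1, pc2 + 1, pc3, x, z + 1, y1, y2, out); // T2: z++
        if (pc3 == 0) explore(pc1, pc2, 1, x, z, f(x), y2, out);        // T3: y1 = f(x)
        else if (pc3 == 1) explore(pc1, pc2, 2, x, z, y1, x, out);      // T3: y2 = x
    }
}
```

It reports only the pair 3,3, matching the failing interleaving above: f returns 3, 5, or 6, while x never exceeds 4 (T1 adds z, which is at most 2, twice), so y1 = y2 = 3 is the only possible collision.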

19 Example: Parity Abstraction
[Figure: the concrete (y1, y2) values next to their parity abstraction (even/odd).]
Interleaving x += z; x += z; z++; z++; y1 = f(x); y2 = x; assert gives y1 = Odd, y2 = Even.
Threads T1, T2, T3 and f(x) are the same as on slide 18.

20 Dynamically enforce consistency
Use static analysis to reduce the cost. Example: array-bound checks
    int a[100];
    …
    for (i = 0; i < n; i++) {
        …
        … a[i] …
    }
This can be very effective, and hardware support may be available, but it does not assure the absence of bugs.
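A sketch of how a static fact reduces the cost of dynamic bound checks, assuming a compiler that instruments every array access. Java already checks each access itself, so the explicit checks below only model that instrumentation; the class and method names are hypothetical.

```java
// Hypothetical illustration: per-access checks vs. one check justified by static analysis.
final class BoundsCheckHoisting {
    // Instrumented loop: one explicit check per access to a[i].
    static long sumPerAccess(int[] a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);
            s += a[i];
        }
        return s;
    }

    // Static analysis shows 0 <= i < n inside the loop, so the single check
    // n <= a.length before the loop covers every access. Failing before the loop
    // instead of mid-loop is only equivalent because this body has no other side effects.
    static long sumHoisted(int[] a, int n) {
        if (n > a.length) throw new ArrayIndexOutOfBoundsException(n - 1);
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += a[i];
        }
        return s;
    }
}
```

The hoisted version still fails at runtime when n exceeds the array length, which is why this approach lowers the cost of checking without assuring the absence of bugs.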

21 Reducing the cost of static analysis via dynamic analysis
Quickly find properties that do not hold; locate good abstractions.

22 Abstractions from Tests [POPL’12]
[Figure: given a program P and a query q, a dynamic analysis collects information from which parameter inference computes a parameter for a parametric static analysis. Possible outcomes: proof, don't know, disproved.]

23 Hypothesis If a query is simple, we can find why the query holds simply by looking at a few execution traces

24 Parameter Inference based on separability
[[Q]] (a)

25 Thread-local information
Does a local variable point to an object that cannot be reached from other threads (for example, via a global)?
    for (i = 0; i < n; i++) {
        x0 = new h0;
        x1 = new h1; x1.f1 = x0;
        x2 = new h2; x2.f2 = x1;
        x3 = new h3; x3.f3 = x2;
        x0.start();
    pc: x2.id = i;    // local(x2)?
        x3.start();
    }

26 Parametric thread-escape analysis
Represent the heap with two summary nodes, L and E: L represents objects that are guaranteed to be thread-local, and E represents objects that may escape. The parameter is $\mathit{Param} = \mathit{AllocSite} \to \{L, E\}$. An object can move from L to E, but not vice versa.

27 Example Partition
    for (i = 0; i < n; i++) {
        x0 = new h0;  // E
        x1 = new h1;
        x1.f1 = x0;
        x2 = new h2;  // L
        x2.f2 = x1;
        x3 = new h3;  // L
        x3.f3 = x2;
        x0.start();
    pc: x2.id = i;    // local(x2)?
        x3.start();
    }

28 Difficulties in choosing a good parameter
Using more L makes the analysis more expensive, and more L does not necessarily mean more precision:
    for (i = 0; i < n; i++) {
        x0 = new h0;  // L
        x1 = new h1;  // L
        x1.f1 = x0;
        x2 = new h2;  // L
        x2.f2 = x1;
        x3 = new h3;  // L
        x3.f3 = x2;
        x0.start();
    pc: x2.id = i;    // local(x2)?
        x3.start();
    }
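The parametric analysis on slides 26-28 can be approximated by a toy abstract interpreter for the loop body. This is a hypothetical simplification (names such as `ThreadEscapeSketch` and `provesLocalX2` are made up): each variable points to the summary node L or E that the parameter assigns to its allocation site; storing an L object into an E object, or starting a thread whose object is in L, makes the whole L node escape.

```java
import java.util.HashMap;
import java.util.Map;

// Toy, simplified parametric thread-escape analysis over the loop from the slides.
final class ThreadEscapeSketch {
    enum Node { L, E }

    private final Map<String, Node> param;                  // Param: AllocSite -> {L, E}
    private final Map<String, Node> vars = new HashMap<>(); // variable -> summary node

    ThreadEscapeSketch(Map<String, Node> param) {
        this.param = param;
    }

    // x = new h (sites missing from the parameter default to E)
    void alloc(String x, String site) {
        vars.put(x, param.getOrDefault(site, Node.E));
    }

    // x.f = y (field name ignored): an L object stored into an E object escapes.
    void store(String x, String y) {
        if (vars.get(x) == Node.E && vars.get(y) == Node.L) escapeAll();
    }

    // x.start(): the started thread object becomes visible to another thread.
    void start(String x) {
        if (vars.get(x) == Node.L) escapeAll();
    }

    boolean isLocal(String x) {
        return vars.get(x) == Node.L;
    }

    // L is one summary node, so once any L object escapes, everything it summarizes
    // is treated as E (L can move to E, never back).
    private void escapeAll() {
        vars.replaceAll((v, n) -> Node.E);
    }

    // One pass over the loop body. Every variable is assigned before it is used in the
    // body, so in this simplified model one pass stands for every iteration.
    static boolean provesLocalX2(Map<String, Node> param) {
        ThreadEscapeSketch a = new ThreadEscapeSketch(param);
        a.alloc("x0", "h0");
        a.alloc("x1", "h1");
        a.store("x1", "x0");
        a.alloc("x2", "h2");
        a.store("x2", "x1");
        a.alloc("x3", "h3");
        a.store("x3", "x2");
        a.start("x0");
        boolean local = a.isLocal("x2"); // pc: x2.id = i; local(x2)?
        a.start("x3");
        return local;
    }
}
```

With h2 and h3 mapped to L and h0 to E (h1 is mapped to E here as an assumption; in this model L works too), it proves local(x2). Mapping every site to L loses the proof because x0.start() then makes all L objects escape, which is the point of slide 28; mapping h3 to E loses it because x3.f3 = x2 stores an L object into an E object.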

29 Setting for the experiments
6 concurrent Java programs from DaCapo: 161K-491K bytecodes (including the analyzed JDK code); up to 5K allocation sites per program; 47K queries, of which only 17K (37%) were reached during testing.

30 Experiments
[Figure: the Abstractions-from-Tests pipeline (program P and query q, dynamic analysis, parameter inference, parametric static analysis), annotated with running times of 6-8 s and 38s-86ms.] Results: 52% proved, 28% disproved, 20% don't know.

31 Summary: Learning for Dynamic Analysis
Can be effective. Not sure that actual runs are needed.

32 Abstractions can be used to improve dynamic analysis
Examples: debugging, garbage collection.

33 Abstractions and Decision Procedures
Computing best transformers; assume-guarantee reasoning; CEGAR (interpolants); abduction for composing analyses. Abstraction can help scale decision procedures (SMT).

34 Symbolic Operations: Three Value-Spaces
[Figure: concrete values, formulas, and abstract values, with the concrete transformer $T$ and the abstract transformer $T^{\#}$.]

35 Symbolic Operations: Three Value-Spaces
[Figure: concrete values 2, 4, 16, …; the formula $\mathit{even}(x)$; the abstract value $x = E$ (even).]

36 Symbolic Operations: Three Value-Spaces
[Figure: a concrete linked list pointed to by x, its shape-graph abstraction with nodes $u_1$ and $u$, and the corresponding formula.]

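37 Required Primitive Operations
Abstraction: $\alpha(S) = \bigsqcup_{\mathit{store} \in S} \beta(\mathit{store})$, where $\beta$ maps a single store (here, a concrete heap) to its abstraction (here, a shape graph with nodes $u_1$ and $u$).
Symbolic concretization: $\widehat{\gamma}(a) = \exists v_1, v_2 : \mathit{node}_{u_1}(v_1) \wedge \mathit{node}_{u}(v_2) \wedge v_1 \neq v_2 \wedge \forall v : (\mathit{node}_{u_1}(v) \vee \mathit{node}_{u}(v))$
Theorem prover returning a satisfying structure (store) $S \models \varphi$.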

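38 Constant-Propagation Domain
$(\mathit{Var} \to \mathbb{Z}^{\top})_{\bot}$, where $\mathbb{Z}^{\top} = \mathbb{Z} \cup \{\top\}$ is the flat lattice of integer constants.
Examples: $\bot$, $[x \mapsto 0, y \mapsto 43, z \mapsto 0]$, $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$, $[x \mapsto \top, y \mapsto \top, z \mapsto \top]$.
Infinite cardinality, but finite height.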
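The finite height of this domain shows up in its join. A minimal sketch, assuming a representation chosen only for illustration: an abstract store is a `Map` from variable name to constant, a variable absent from the map is $\top$, and a `null` store is $\bot$; the class `ConstProp` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the constant-propagation domain (Var -> Z^T) with bottom.
// Missing variable = T; null store = bottom.
final class ConstProp {
    // Least upper bound: keep a constant only where both stores agree on it.
    static Map<String, Integer> join(Map<String, Integer> a, Map<String, Integer> b) {
        if (a == null) return b; // bottom is the identity of join
        if (b == null) return a;
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            if (e.getValue().equals(b.get(e.getKey()))) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Partial order: a is below b if every constant recorded in b is also recorded in a.
    static boolean leq(Map<String, Integer> a, Map<String, Integer> b) {
        if (a == null) return true;
        if (b == null) return false;
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            if (!e.getValue().equals(a.get(e.getKey()))) return false;
        }
        return true;
    }
}
```

Joining $[x \mapsto 0, y \mapsto 43, z \mapsto 0]$ with $[x \mapsto 0, y \mapsto 24, z \mapsto 0]$ gives $[x \mapsto 0, y \mapsto \top, z \mapsto 0]$. Every strict step up the order drops at least one constant, so above any non-$\bot$ store there are at most $|\mathit{Var}|$ strict steps, even though there are infinitely many stores.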

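39 Three Value-Spaces
[Figure: the concrete stores $[x \mapsto 0, y \mapsto 0, z \mapsto 0]$, $[x \mapsto 0, y \mapsto 1, z \mapsto 0]$, $[x \mapsto 0, y \mapsto 2, z \mapsto 0]$, … are represented by the abstract value $[x \mapsto 0, y \mapsto \top, z \mapsto 0]$ and by the formula $(x = 0) \wedge (z = 0)$.]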

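40 Three Value-Spaces
[Figure: the formula $(x = 0) \wedge (z = 0)$ is satisfied by concrete stores such as $[x \mapsto 0, y \mapsto 0, z \mapsto 0]$, $[x \mapsto 0, y \mapsto 1, z \mapsto 0]$, $[x \mapsto 0, y \mapsto 2, z \mapsto 0]$; the store $[x \mapsto 0, y \mapsto 2, z \mapsto 0]$ is mapped to the abstract value $[x \mapsto 0, y \mapsto 2, z \mapsto 0]$.]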

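41 Required Primitive Operations
Abstraction: $\alpha(S) = \bigsqcup_{\mathit{store} \in S} \beta(\mathit{store})$, e.g. $\beta([x \mapsto 0, y \mapsto 2, z \mapsto 0]) = [x \mapsto 0, y \mapsto 2, z \mapsto 0]$.
Symbolic concretization: $\widehat{\gamma}([x \mapsto 0, y \mapsto \top, z \mapsto 0]) = (x = 0) \wedge (z = 0)$.
Theorem prover returning a satisfying structure (store) $S \models \varphi$, e.g. $[x \mapsto 0, y \mapsto 2, z \mapsto 0] \models (x = 0) \wedge (z = 0)$.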

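42 Required Primitive Operations
Abstraction: $\alpha(S) = \bigsqcup_{\mathit{store} \in S} \beta(\mathit{store})$, e.g. $\beta([x \mapsto 0, y \mapsto 2, z \mapsto 0]) = [x \mapsto 0, y \mapsto 2, z \mapsto 0]$.
Symbolic concretization: $\widehat{\gamma}([x \mapsto 0, y \mapsto \top, z \mapsto 0]) = (x = 0) \wedge (z = 0)$.
Theorem prover returning a satisfying structure (store) $S \models \varphi$, e.g. $[x \mapsto 0, y \mapsto 2, z \mapsto 0] \models (z = 0) \wedge (x = y * z)$.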

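43 Constant Propagation
Concrete semantics of x = y * z: $\lambda e.\, e[x \mapsto e(y) * e(z)]$ maps $[x \mapsto 3, y \mapsto 4, z \mapsto 1]$ to $[x' \mapsto 4, y' \mapsto 4, z' \mapsto 1]$.
As a two-vocabulary formula: $T[x := y * z] \stackrel{\mathrm{df}}{=} (x' = y * z) \wedge (y' = y) \wedge (z' = z)$, which is satisfied by $[x \mapsto 3, y \mapsto 4, z \mapsto 1, x' \mapsto 4, y' \mapsto 4, z' \mapsto 1]$.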

44 Constant Propagation
Abstract semantics of x = y * z: $T^{\#}[x = y * z] = \lambda e.\, e[x \mapsto e(y) *^{\#} e(z)]$ maps $[x \mapsto 3, y \mapsto \top, z \mapsto 1]$ to $[x' \mapsto \top, y' \mapsto \top, z' \mapsto 1]$.

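45 Three Value-Spaces
[Figure: starting from the abstract value $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$, symbolic concretization gives $(z = 0)$; conjoining $T[x := y * z]$ and abstracting gives $[x' \mapsto 0, y' \mapsto \top, z' \mapsto 0]$, whose symbolic concretization is $(x' = 0) \wedge (z' = 0)$.]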

46 Remainder
$\widehat{\alpha}(\varphi)$: the best abstract value that represents $\varphi$.
$\mathit{Best} = \alpha \circ T \circ \gamma$: the best abstract transformer.

47-53 Idea Behind Procedure $\widehat{\alpha}_{CP}(\varphi)$
[Figure animation over concrete values, formulas, and abstract values: start with $\mathit{ans} = \bot$ and $\varphi_1 = \varphi$; select a store $S \models \varphi_1$ and join $\beta(S)$ into ans; form $\varphi_2 = \varphi_1 \wedge \neg\widehat{\gamma}(\mathit{ans})$ and repeat; the iteration stops when the formula becomes false ($\varphi_5 = \mathit{false}$ in the animation), leaving ans as the result.]

54 Procedure $\widehat{\alpha}$
    α̂(formula φ) {
        ans := ⊥
        φ' := φ
        while (φ' is satisfiable) {
            Select a store S such that S ⊨ φ'
            ans := ans ⊔ β(S)
            φ' := φ' ∧ ¬γ̂(ans)
        }
        return ans
    }
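Procedure $\widehat{\alpha}$ can be sketched for constant propagation under two loud assumptions: formulas are Java predicates over concrete stores, and the theorem prover is replaced by brute-force search over a small, hypothetical integer range, so the result is only exact when every model of $\varphi$ lies in that range. For this domain $\beta(S)$ is S itself read as an abstract store (a missing variable is $\top$, `null` is $\bot$). The class and method names are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of procedure alpha-hat(phi) for constant propagation.
final class AlphaHat {
    static final int LO = -3, HI = 3; // hypothetical search range for the stand-in prover

    // Join of abstract stores: missing variable = T, null = bottom.
    static Map<String, Integer> join(Map<String, Integer> a, Map<String, Integer> b) {
        if (a == null) return b;
        if (b == null) return a;
        Map<String, Integer> out = new HashMap<>();
        a.forEach((v, c) -> { if (c.equals(b.get(v))) out.put(v, c); });
        return out;
    }

    // gamma-hat(a) evaluated on a store: the conjunction of (v = c) for each constant in a.
    static boolean gammaHat(Map<String, Integer> a, Map<String, Integer> s) {
        if (a == null) return false; // gamma-hat(bottom) = false
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            if (!e.getValue().equals(s.get(e.getKey()))) return false;
        }
        return true;
    }

    // Stand-in "theorem prover": some store S over vars with S |= phi, if one exists in range.
    static Optional<Map<String, Integer>> findModel(Predicate<Map<String, Integer>> phi, List<String> vars) {
        int[] vals = new int[vars.size()];
        Arrays.fill(vals, LO);
        while (true) {
            Map<String, Integer> s = new HashMap<>();
            for (int k = 0; k < vals.length; k++) s.put(vars.get(k), vals[k]);
            if (phi.test(s)) return Optional.of(s);
            int i = 0; // advance an odometer over all assignments
            while (i < vals.length && vals[i] == HI) vals[i++] = LO;
            if (i == vals.length) return Optional.empty();
            vals[i]++;
        }
    }

    // ans := bottom; while phi' is satisfiable: pick S |= phi'; ans := ans join beta(S);
    // phi' := phi' and not gamma-hat(ans)
    static Map<String, Integer> alphaHat(Predicate<Map<String, Integer>> phi, List<String> vars) {
        Map<String, Integer> ans = null;
        Predicate<Map<String, Integer>> current = phi;
        Optional<Map<String, Integer>> s;
        while ((s = findModel(current, vars)).isPresent()) {
            ans = join(ans, s.get()); // beta(S) is S itself in this domain
            Map<String, Integer> snapshot = ans;
            current = current.and(st -> !gammaHat(snapshot, st));
        }
        return ans;
    }
}
```

On $\varphi = (z = 0) \wedge (x = y * z)$ the loop joins two stores that differ only in y and then finds the strengthened formula unsatisfiable, returning $[x \mapsto 0, y \mapsto \top, z \mapsto 0]$; the concrete stores it happens to pick depend on the search order.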

55 Procedure $\widehat{\alpha}_{CP}(\varphi)$
$\varphi = (z = 0) \wedge (x = y * z)$. Select $S = [x \mapsto 0, y \mapsto 43, z \mapsto 0]$; then $\mathit{ans} = [x \mapsto 0, y \mapsto 43, z \mapsto 0]$.

56 Procedure $\widehat{\alpha}_{CP}(\varphi)$
$\widehat{\gamma}(\mathit{ans}) = (x = 0) \wedge (y = 43) \wedge (z = 0)$; its negation is conjoined to $\varphi$.

57 Procedure $\widehat{\alpha}_{CP}(\varphi)$
$\varphi = (z = 0) \wedge (x = y * z) \wedge (y \neq 43)$ (given the rest of $\varphi$, the negated conjunct reduces to $y \neq 43$). Select $S = [x \mapsto 0, y \mapsto 24, z \mapsto 0]$.

58 Procedure $\widehat{\alpha}_{CP}(\varphi)$
$\mathit{ans} = [x \mapsto 0, y \mapsto 43, z \mapsto 0] \sqcup [x \mapsto 0, y \mapsto 24, z \mapsto 0] = [x \mapsto 0, y \mapsto \top, z \mapsto 0]$, with $\widehat{\gamma}(\mathit{ans}) = (x = 0) \wedge (z = 0)$. Conjoining its negation makes $\varphi$ unsatisfiable, so the procedure returns $[x \mapsto 0, y \mapsto \top, z \mapsto 0]$.

59-62 The Idea Behind $\mathit{Best} = \alpha \circ T \circ \gamma$
[Figure animation over formulas and abstract values: from an abstract value $a$, form $\widehat{\gamma}(a)$, conjoin the transformer formula $T$, and compute ans, the best abstract value that represents $\widehat{\gamma}(a) \wedge T$.]

63 Procedure Best
    Best(two-store-formula T, abs-store a) {
        ans' := ⊥'
        φ := T ∧ γ̂(a)
        while (φ is satisfiable) {
            Select a store pair (S, S') such that (S, S') ⊨ φ
            ans' := ans' ⊔ β'(S')
            φ := φ ∧ ¬γ̂'(ans')
        }
        return ans'
    }

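64 Best($(x' = y * z) \wedge (y' = y) \wedge (z' = z)$, $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$)
Initialization: $\mathit{ans}' := \bot'$; $\varphi := (z = 0) \wedge (x' = y * z) \wedge (y' = y) \wedge (z' = z)$.
Iteration 1: $(S, S') := [x \mapsto 5, y \mapsto 17, z \mapsto 0, x' \mapsto 0, y' \mapsto 17, z' \mapsto 0]$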

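65 The Idea Behind $\mathit{Best} = \alpha \circ T \circ \gamma$
[Figure: the store pair with pre-state $[x \mapsto 5, y \mapsto 17, z \mapsto 0]$ and post-state $[x' \mapsto 0, y' \mapsto 17, z' \mapsto 0]$ satisfies $\widehat{\gamma}(a) \wedge T$.]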

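66 Best($(x' = y * z) \wedge (y' = y) \wedge (z' = z)$, $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$)
Initialization: $\mathit{ans}' := \bot'$; $\varphi := (z = 0) \wedge (x' = y * z) \wedge (y' = y) \wedge (z' = z)$.
Iteration 1: $(S, S') := [x \mapsto 5, y \mapsto 17, z \mapsto 0, x' \mapsto 0, y' \mapsto 17, z' \mapsto 0]$; $\mathit{ans}' := [x' \mapsto 0, y' \mapsto 17, z' \mapsto 0]$; $\widehat{\gamma}'(\mathit{ans}') = (x' = 0) \wedge (y' = 17) \wedge (z' = 0)$; $\varphi := (z = 0) \wedge (x' = y * z) \wedge (y' = y) \wedge (z' = z) \wedge (y' \neq 17)$.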

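67 Best($(x' = y * z) \wedge (y' = y) \wedge (z' = z)$, $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$)
Iteration 2: $(S, S') := [x \mapsto 12, y \mapsto 99, z \mapsto 0, x' \mapsto 0, y' \mapsto 99, z' \mapsto 0]$; $\mathit{ans}' := [x' \mapsto 0, y' \mapsto 17, z' \mapsto 0] \sqcup [x' \mapsto 0, y' \mapsto 99, z' \mapsto 0] = [x' \mapsto 0, y' \mapsto \top, z' \mapsto 0]$; $\widehat{\gamma}'(\mathit{ans}') = (x' = 0) \wedge (z' = 0)$; $\varphi := (z = 0) \wedge (x' = y * z) \wedge (y' = y) \wedge (z' = z) \wedge (y' \neq 17) \wedge \neg((x' = 0) \wedge (z' = 0)) = \mathit{false}$.
Iteration 3: $\varphi$ is unsatisfiable. Return value: $[x' \mapsto 0, y' \mapsto \top, z' \mapsto 0]$.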
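Procedure Best can be sketched the same way. This is a self-contained, hypothetical illustration with a brute-force stand-in for the theorem prover over a small assumed range: T is a Java predicate over a combined pre/post store, post-state variables carry a prime (as in "x'"), and $\beta'$ keeps only the post-state. `BestTransformer` and `best` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of Best(T, a) for constant propagation (missing variable = T, null = bottom).
final class BestTransformer {
    static final int LO = -3, HI = 3; // hypothetical search range for the stand-in prover

    static Map<String, Integer> join(Map<String, Integer> a, Map<String, Integer> b) {
        if (a == null) return b;
        if (b == null) return a;
        Map<String, Integer> out = new HashMap<>();
        a.forEach((v, c) -> { if (c.equals(b.get(v))) out.put(v, c); });
        return out;
    }

    // gamma-hat(a) evaluated on a store: every constant recorded in a must match.
    static boolean gammaHat(Map<String, Integer> a, Map<String, Integer> s) {
        if (a == null) return false;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            if (!e.getValue().equals(s.get(e.getKey()))) return false;
        }
        return true;
    }

    // Stand-in prover: some assignment to vars satisfying phi, if one exists in range.
    static Optional<Map<String, Integer>> findModel(Predicate<Map<String, Integer>> phi, List<String> vars) {
        int[] vals = new int[vars.size()];
        Arrays.fill(vals, LO);
        while (true) {
            Map<String, Integer> s = new HashMap<>();
            for (int k = 0; k < vals.length; k++) s.put(vars.get(k), vals[k]);
            if (phi.test(s)) return Optional.of(s);
            int i = 0; // advance an odometer over all assignments
            while (i < vals.length && vals[i] == HI) vals[i++] = LO;
            if (i == vals.length) return Optional.empty();
            vals[i]++;
        }
    }

    static Map<String, Integer> best(Predicate<Map<String, Integer>> t, Map<String, Integer> a,
                                     List<String> vars, List<String> primed) {
        List<String> all = new ArrayList<>(vars);
        all.addAll(primed);
        Map<String, Integer> ans = null;                                   // ans' := bottom'
        Predicate<Map<String, Integer>> phi = t.and(s -> gammaHat(a, s)); // phi := T and gamma-hat(a)
        Optional<Map<String, Integer>> pair;
        while ((pair = findModel(phi, all)).isPresent()) {
            Map<String, Integer> post = new HashMap<>();                   // beta'(S'): post-state only
            for (String v : primed) post.put(v, pair.get().get(v));
            ans = join(ans, post);                                         // ans' := ans' join beta'(S')
            Map<String, Integer> snapshot = ans;
            phi = phi.and(s -> !gammaHat(snapshot, s));                    // phi := phi and not gamma-hat'(ans')
        }
        return ans;
    }
}
```

For the slides' input $[x \mapsto \top, y \mapsto \top, z \mapsto 0]$ it returns $[x' \mapsto 0, y' \mapsto \top, z' \mapsto 0]$ (the map {x'=0, z'=0} in this representation), and for $[x \mapsto 3, y \mapsto \top, z \mapsto 1]$ it returns only $z' = 1$, matching the $T^{\#}$ result on slide 44.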

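68 $\ldots \wedge (y'(v) \Leftrightarrow \exists v_1 : x(v_1) \wedge n(v_1, v)) \wedge \ldots$
[Figure: Best($y = x \rightarrow \mathit{next}$, $a$) applied to shape graphs with nodes $u_1, \ldots, u_4$ and $u$, labeled with pointer variables x, y (and x', y' after the statement) and reachability predicates r[x], r[y].]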

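69 Predicate Abstraction
Program: y := 3; x := 4*y + 1, reaching the store $[x \mapsto 13, y \mapsto 3]$.
Predicates: $\{ B_1 \equiv (y = 1), B_2 \equiv (y = 3), B_3 \equiv (y = 4), B_4 \equiv (x = 1), B_5 \equiv (x = 3), B_6 \equiv (x = 4) \}$.
Abstract value: $\neg B_1 \wedge B_2 \wedge \neg B_3 \wedge \neg B_4 \wedge \neg B_5 \wedge \neg B_6$, i.e. $y = 3 \wedge x \notin \{1, 3, 4\}$, which represents $[x \mapsto 13, y \mapsto 3]$.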

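70 Three Value-Spaces
[Figure: the abstract value $(\neg B_1, B_2, \neg B_3, \neg B_4, \neg B_5, \neg B_6)$ corresponds to the formula $(y \neq 1) \wedge (y = 3) \wedge (y \neq 4) \wedge (x \neq 1) \wedge (x \neq 3) \wedge (x \neq 4)$, which is satisfied by concrete stores such as $[x \mapsto 5, y \mapsto 3]$, $[x \mapsto 0, y \mapsto 3]$, $[x \mapsto 17, y \mapsto 3]$.]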

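71 Three Value-Spaces
[Figure: applying $T[x := x+1]$ to $(y \neq 1) \wedge (y = 3) \wedge (y \neq 4) \wedge (x \neq 1) \wedge (x \neq 3) \wedge (x \neq 4)$, the symbolic concretization of $(\neg B_1, B_2, \neg B_3, \neg B_4, \neg B_5, \neg B_6)$, and abstracting yields $(\neg B_1, B_2, \neg B_3, \neg B_6)$, i.e. $(y \neq 1) \wedge (y = 3) \wedge (y \neq 4) \wedge (x \neq 4)$.]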

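72 Predicate Abstraction
Abstract values are over $(B_1, B_2, B_3, B_4, B_5, B_6)$. Apply $\widehat{\gamma}$, which performs $\gamma$ symbolically, e.g. producing $(y \neq 1) \wedge (y = 3) \wedge (y \neq 4) \wedge (x \neq 1) \wedge (x \neq 3) \wedge (x \neq 4)$. Apply $\widehat{\alpha}_T$, which implements $\alpha \circ T$.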

73 $\alpha_{PA}$: Most-Precise Abstract Value [Predicate Abstraction]
[Figure: $\alpha_{PA}$ maps the formula $(y = 3) \wedge (x = 4*y + 1)$ to the abstract value $(\neg B_1, B_2, \neg B_3, \neg B_4, \neg B_5, \neg B_6)$.]

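74 $\alpha_{PA}$: Most-Precise Abstract Value [Predicate Abstraction]
$$\alpha_{PA}(\varphi) = \begin{cases} \mathit{false} & \text{if } \varphi \text{ is unsatisfiable} \\ \bigwedge_{j=1}^{k} \begin{cases} B_j & \text{if } \varphi \Rightarrow p_j \text{ is valid} \\ \neg B_j & \text{if } \varphi \Rightarrow \neg p_j \text{ is valid} \\ \mathit{true} & \text{otherwise} \end{cases} & \text{otherwise} \end{cases}$$
where $p_j$ is the predicate that $B_j$ stands for. $\alpha_{PA}((y = 3) \wedge (x = 4*y + 1)) = \neg B_1 \wedge B_2 \wedge \neg B_3 \wedge \neg B_4 \wedge \neg B_5 \wedge \neg B_6$. For $B_1, B_2, B_3$, the checks are whether $(y = 3) \wedge (x = 4*y + 1)$ implies $(y = 1)$, $(y = 3)$, $(y = 4)$ or their negations.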

75 $\alpha_{PA}$: Most-Precise Abstract Value [Predicate Abstraction]
The definition is the same as on slide 74. For $B_4, B_5, B_6$, the checks are whether $(y = 3) \wedge (x = 4*y + 1)$ implies $(x = 1)$, $(x = 3)$, $(x = 4)$ or their negations. Since the formula forces $x = 13$, each negation is implied, giving $\neg B_4 \wedge \neg B_5 \wedge \neg B_6$.
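The definition of $\alpha_{PA}$ translates into one satisfiability check per predicate and polarity. A minimal sketch, assuming formulas over x and y only and a brute-force search over a hypothetical range in place of a theorem prover; `Boolean.TRUE`, `Boolean.FALSE`, and `null` encode $B_j$, $\neg B_j$, and true, and the class name is made up.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Sketch of alpha_PA over the predicates B1..B6 (brute-force stand-in for a prover).
final class PredicateAbstraction {
    static final int LO = -20, HI = 20; // hypothetical search range for x and y

    static boolean satisfiable(Predicate<Map<String, Integer>> f) {
        for (int x = LO; x <= HI; x++) {
            for (int y = LO; y <= HI; y++) {
                if (f.test(Map.of("x", x, "y", y))) return true;
            }
        }
        return false;
    }

    // null means "false" (phi unsatisfiable). Otherwise entry j is TRUE for B_j,
    // FALSE for not-B_j, and null when neither is implied (the conjunct "true").
    static Boolean[] alphaPA(Predicate<Map<String, Integer>> phi,
                             List<Predicate<Map<String, Integer>>> preds) {
        if (!satisfiable(phi)) return null;
        Boolean[] out = new Boolean[preds.size()];
        for (int j = 0; j < preds.size(); j++) {
            Predicate<Map<String, Integer>> p = preds.get(j);
            if (!satisfiable(phi.and(p.negate()))) out[j] = Boolean.TRUE; // phi => p_j is valid
            else if (!satisfiable(phi.and(p))) out[j] = Boolean.FALSE;    // phi => not p_j is valid
        }
        return out;
    }

    // B1..B6 from slide 69.
    static List<Predicate<Map<String, Integer>>> slidePredicates() {
        return List.of(
            s -> s.get("y") == 1, s -> s.get("y") == 3, s -> s.get("y") == 4,
            s -> s.get("x") == 1, s -> s.get("x") == 3, s -> s.get("x") == 4);
    }
}
```

For $(y = 3) \wedge (x = 4*y + 1)$ it yields $(\neg B_1, B_2, \neg B_3, \neg B_4, \neg B_5, \neg B_6)$. For the post-state of x := x+1 on slide 71, $(y = 3) \wedge x \notin \{2, 4, 5\}$, it yields $(\neg B_1, B_2, \neg B_3, \neg B_6)$ with $B_4$ and $B_5$ unknown.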

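76 Procedure $\alpha_{PA}$ vs. General $\widehat{\alpha}$
[Figure: the two procedures side by side over concrete values, formulas, and abstract values. In the general procedure, step $i$ selects a store $S \models \varphi_i$ and sets $\mathit{ans}_i = \mathit{ans}_{i-1} \sqcup \beta(S)$, where $\varphi_i$ excludes $\widehat{\gamma}(\mathit{ans}_{i-1})$.]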

77 Conclusions
To compute $\widehat{\alpha}(\varphi)$, the best abstract value that represents $\varphi$, and $\mathit{Best}(T, a)$, the best abstract transformer, the requirements are: a finite-height abstract domain; a theorem prover that returns a satisfying structure (store); an abstraction operation $\alpha(S) = \bigsqcup_{\mathit{store} \in S} \beta(\mathit{store})$; and a symbolic-concretization operation $\widehat{\gamma}(a)$.

78 Abstractions and Machine Learning (Sriram Rajamani & Percy Liang)
Machine-learning techniques can learn abstractions; abstractions can be used for machine learning.

79 Open Problems in Program Analysis
Predictability. Specializing static analysis to a set of programs. Numerical analysis: polyhedra (cost of operations, scaling), disjunctions, narrowing, interesting subclasses. Shape analysis: ownership, non-disjunctive domains, binary widening. Can programming languages be designed for better static analysis? Concurrency. Abstractions of relations on programs. Modularity. Is the greatest fixed point the answer? Theory of abstract interpretation (AI): widening, monotonicity, necessity.

