Program Analysis via Graph Reachability Thomas Reps University of Wisconsin PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000


1 Program Analysis via Graph Reachability Thomas Reps University of Wisconsin PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000 http://www.cs.wisc.edu/~reps/

2 PLDI 00 Registration Form PLDI 00: …………………….. $ ____ Tutorial (morning): …………… $ ____ Tutorial (afternoon): ………….. $ ____ Tutorial (evening): ……………. $ – 0 –

3 Applications Program optimization Program-understanding and software-reengineering Security –information flow Verification –model checking –security of crypto-based protocols for distributed systems

4 1987 1993 1994 1995 1997 1998 1996 Slicing & Applications Dataflow Analysis Demand Algorithms Set Constraints Structure- Transmitted Dependences CFL Reachability

5 ... As Well As... Flow-insensitive points-to analysis Complexity results –Linear... cubic... undecidable variants –PTIME-completeness Model checking of recursive hierarchical finite-state machines –“infinite”-state systems –linear-time and cubic-time algorithms

6 ... And Also Analysis of attribute grammars Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] Formal-language problems –CFL-recognition (given G and ω, is ω ∈ L(G)?) –2DPDA- and 2NPDA-simulation (given M and ω, is ω ∈ L(M)?) String-matching problems

7 Unifying Conceptual Model for Dataflow-Analysis Literature Linear-time gen-kill [Hecht 76], [Kou 77] Path-constrained DFA [Holley & Rosen 81] Linear-time GMOD [Cooper & Kennedy 88] Flow-sensitive MOD [Callahan 88] Linear-time interprocedural gen-kill [Knoop & Steffen 93] Linear-time bidirectional gen-kill [Dhamdhere 94] Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

8 Collaborators Susan Horwitz Mooly Sagiv Genevieve Rosay David Melski David Binkley Michael Benedikt Patrice Godefroid

9 Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

10 Program Slicing The backward slice w.r.t variable v at program point p The program subset that may influence the value of variable v at point p. The forward slice w.r.t variable v at program point p The program subset that may be influenced by the value of variable v at point p.

11 int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Backward Slice Backward slice with respect to “printf(“%d\n”,i)”

12 int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Backward Slice Backward slice with respect to “printf(“%d\n”,i)”

13 int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); } Slice Extraction Backward slice with respect to “printf(“%d\n”,i)”

14 Forward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Forward slice with respect to “sum = 0”

15 Forward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); }

16 Who Cares About Slices? Understanding programs Restructuring Programs Program Specialization and Reuse Program Differencing Testing (and Retesting) Year 2000 Problem Automatic Differentiation

17 What Are Slices Useful For? Understanding Programs –What is affected by what? Restructuring Programs –Isolation of separate “computational threads” Program Specialization and Reuse –Slices = specialized programs –Only reuse needed slices Program Differencing –Compare slices to identify changes Testing –What new test cases would improve coverage? –What regression tests must be rerun after a change?

18 Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

19 Character-Count Program void char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

20 Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

21 Line-Count Program void line_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line2 (FILE *f, BOOL *bptr, int *iptr); scan_line2(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line2(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

22 Specialization Via Slicing wc -lc wc -c wc -l void line_count(FILE *f); Not partial evaluation!

23 How are Slices Computed? Reachability in a Dependence Graph –Program Dependence Graph (PDG) Dependences within one procedure Intraprocedural slicing is reachability in one PDG –System Dependence Graph (SDG) Dependences within entire system Interprocedural slicing is reachability in the SDG

24 How is a PDG Created? Control Flow Graph (CFG) PDG is union of: Control Dependence Graph Flow Dependence Graph computed from CFG

25 Control Flow Graph Enter sum = 0i = 1 while(i < 11) printf(sum)printf(i) sum = sum + ii = i + i T F int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); }

26 Flow Dependence Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0printf(sum) printf(i) sum = sum + ii = i + i Flow dependence pq Value of variable assigned at p may be used at q. i = 1 while(i < 11)

27 q is reached from p if condition p is true (T), not otherwise. Control Dependence Graph Control dependence pq T pq F Similar for false (F). Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T T T T int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); }

28 Program Dependence Graph (PDG) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T Control dependence Flow dependence T T T

29 Program Dependence Graph (PDG) int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T T T T Opposite Order Same PDG

30 Backward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T T T T

31 Backward Slice (2) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

32 Backward Slice (3) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

33 Backward Slice (4) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

34 Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); } Enter i = 1 while(i < 11) printf(i) i = i + i T T T T T
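The backward-slice walkthrough above amounts to reverse reachability in the PDG. A minimal sketch in C, assuming a hand-written edge list for the sum program: the node names, the `edges` table, and `backward_slice` are illustrative and not taken from the slides, and the dependence edges are one reading of the PDG figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* PDG nodes of the sum program (names are hypothetical). */
enum { ENTER, SUM0, I1, WHILE, SUM_ADD, I_INC, PRINT_SUM, PRINT_I, NUM_NODES };

/* Each pair {p, q} means "q depends on p" (control or flow dependence). */
static const int edges[][2] = {
    /* control dependences */
    {ENTER, SUM0}, {ENTER, I1}, {ENTER, WHILE}, {ENTER, PRINT_SUM},
    {ENTER, PRINT_I}, {WHILE, WHILE}, {WHILE, SUM_ADD}, {WHILE, I_INC},
    /* flow dependences */
    {SUM0, SUM_ADD}, {SUM0, PRINT_SUM}, {SUM_ADD, SUM_ADD}, {SUM_ADD, PRINT_SUM},
    {I1, WHILE}, {I1, SUM_ADD}, {I1, I_INC}, {I1, PRINT_I},
    {I_INC, WHILE}, {I_INC, SUM_ADD}, {I_INC, I_INC}, {I_INC, PRINT_I},
};
#define NUM_EDGES (sizeof edges / sizeof edges[0])

/* Sets in_slice[v] for every node v from which `criterion` is reachable. */
void backward_slice(int criterion, bool in_slice[NUM_NODES]) {
    int stack[NUM_NODES];          /* each node is pushed at most once */
    int top = 0;
    assert(criterion >= 0 && criterion < NUM_NODES);
    memset(in_slice, 0, NUM_NODES * sizeof(bool));
    in_slice[criterion] = true;
    stack[top++] = criterion;
    while (top > 0) {
        int q = stack[--top];
        for (size_t k = 0; k < NUM_EDGES; k++) {
            int p = edges[k][0];
            if (edges[k][1] == q && !in_slice[p]) {   /* follow edge backwards */
                in_slice[p] = true;
                stack[top++] = p;
            }
        }
    }
}
```

Calling `backward_slice(PRINT_I, s)` should mark Enter, `i = 1`, the `while` test, `i = i + 1`, and `printf(i)`, which matches the extracted slice on the slide.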

35 CodeSurfer

36

37

38

39 Browsing a Dependence Graph Pretend this is your favorite browser What does clicking on a link do? You get a new page Or you move to an internal tag

40

41

42

43 Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

44 Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

45 int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } Interprocedural Slice int add(int x, int y) { return x + y; } Superfluous components included by Weiser’s slicing algorithm [TSE 84] Left out by algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90]

46 How is an SDG Created? Each PDG has nodes for –entry point –procedure parameters and function result Each call site has nodes for –call –arguments and function result Appropriate edges –entry node to parameters –call node to arguments –call node to entry node –arguments to parameters

47 System Dependence Graph (SDG) Enter main Call p Enter p

48 SDG for the Sum Program Enter main sum = 0i = 1 while(i < 11) printf(sum) printf(i) Call add x in = sum y in = i sum = x out x in = iy in = 1i = x out Enter add x = x in y = y in x = x + yx out = x

49 Interprocedural Backward Slice Enter main Call p Enter p

50 Interprocedural Backward Slice (2) Enter main Call p Enter p

51 Interprocedural Backward Slice (3) Enter main Call p Enter p

52 Interprocedural Backward Slice (4) Enter main Call p Enter p

53 Interprocedural Backward Slice (5) Enter main Call p Enter p

54 Interprocedural Backward Slice (6) Enter main Call p Enter p [ ] ) (

55 Matched-Parenthesis Path ) ( ) [

56 Interprocedural Backward Slice (6) Enter main Call p Enter p

57 Interprocedural Backward Slice (7) Enter main Call p Enter p

58 Slice Extraction Enter main Call p Enter p

59 Slice of the Sum Program Enter main i = 1 while(i < 11) printf(i) Call add x in = iy in = 1i = x out Enter add x = x in y = y in x = x + yx out = x

60 CFL-Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A context-free language L-path from s to t iff the edge labels along the path, concatenated, spell a word in L Running time: O(N^3)

61 Interprocedural Slicing via CFL-Reachability Graph: System dependence graph L: L(matched) [roughly] Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

62 Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94] CFL-reachability –System dependence graph: N nodes, E edges –Running time: O(N^3) System dependence graph Special structure Running time: O(E + CallSites × MaxParams^3)

63 CFL-Reachability vs. Ordinary Graph Reachability matched → ε | e | [ matched ] | ( matched ) | matched matched [Figure: two copies of a graph from s to t with edges labeled (, ), [, ], and e, contrasting CFL-reachability with ordinary graph reachability]

64 CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B edge followed by a C edge induces an A edge
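The dynamic-programming step (a B edge followed by a C edge induces an A edge) can be sketched as a fixpoint over labeled edges. The example below is a naive iteration for the matched-parenthesis grammar, not the O(N^3) worklist algorithm; the five-node graph, the helper nonterminals `P1` and `B1`, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { N = 5 };  /* number of graph nodes in this toy example */
/* Terminals E ( ) [ ] and nonterminals M (matched), P1 = "( M", B1 = "[ M". */
enum Label { E, LP, RP, LB, RB, M, P1, B1, NUM_LABELS };

static bool edge[NUM_LABELS][N][N];

/* Binary productions lhs -> b c, i.e. the grammar split into two-symbol rules. */
struct Prod { enum Label lhs, b, c; };
static const struct Prod prods[] = {
    { M,  M,  M  },   /* matched -> matched matched */
    { P1, LP, M  },   /* P1 -> ( matched            */
    { M,  P1, RP },   /* matched -> P1 )            */
    { B1, LB, M  },   /* B1 -> [ matched            */
    { M,  B1, RB },   /* matched -> B1 ]            */
};

void add_edge(int u, enum Label l, int v) {
    assert(u >= 0 && u < N && v >= 0 && v < N);
    edge[l][u][v] = true;
}

/* Adds the edge if it is new; returns true when something changed. */
static bool derive(enum Label lhs, int u, int v) {
    if (edge[lhs][u][v]) return false;
    edge[lhs][u][v] = true;
    return true;
}

void solve(void) {
    bool changed = true;
    for (int u = 0; u < N; u++) edge[M][u][u] = true;       /* matched -> epsilon */
    while (changed) {
        changed = false;
        for (int u = 0; u < N; u++)                          /* matched -> e */
            for (int v = 0; v < N; v++)
                if (edge[E][u][v] && derive(M, u, v)) changed = true;
        for (size_t k = 0; k < sizeof prods / sizeof prods[0]; k++)
            for (int u = 0; u < N; u++)
                for (int w = 0; w < N; w++) {
                    if (!edge[prods[k].b][u][w]) continue;
                    for (int v = 0; v < N; v++)
                        if (edge[prods[k].c][w][v] && derive(prods[k].lhs, u, v))
                            changed = true;
                }
    }
}

bool matched_path(int s, int t) { return edge[M][s][t]; }
```

With edges `0 -(-> 1 -e-> 2 -)-> 3` and `2 -]-> 4`, node 3 is matched-reachable from node 0 but node 4 is not, even though ordinary reachability would reach both.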

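65 Degenerate Case: CFL-Recognition “(a + b) * c” ∈ L(exp) ? exp → id | exp + exp | exp * exp | ( exp ) [Figure: chain graph from s to t whose edges are labeled ( a + b ) * c]

66 Degenerate Case: CFL-Recognition “a + b) * c +” ∈ L(exp) ? exp → id | exp + exp | exp * exp | ( exp ) [Figure: chain graph from s to t whose edges are labeled a + b ) * c +]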

67 CYK: Context-Free Recognition ω = “( [ ] ) [ ]” Is ω ∈ L(M)? M → M M | ( M ) | [ M ] | ( ) | [ ]

68 CYK: Context-Free Recognition M → M M | ( M ) | [ M ] | ( ) | [ ] rewritten with at most two symbols per right-hand side: M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M

69 Is “( [ ] ) [ ]” ∈ L(M)? [CYK table indexed by start position and length: the length-1 row holds { ( } { [ } { ] } { ) } { [ } { ] }; M → [ ] contributes {M} and LPM → ( M contributes {LPM}; the other entries shown are ∅]

70 Is “( [ ] ) [ ]” ∈ L(M)? [Same CYK table with further {M} and {LPM} entries filled in; the entry marked “M?” is to be decided using M → M M]

71 CYK: Graphs vs. Tables Is “( [ ] ) [ ]” ∈ L(M)? M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M [Figure: chain graph from s to t labeled ( [ ] ) [ ], with derived edges labeled M and LPM]
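The CYK table for this grammar can be sketched in a few lines of C. This is an illustrative recognizer under stated assumptions: symbol sets are stored as bitmasks, input is written without spaces, and the names `rules`, `terminal`, and `in_L_M` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One bit per grammar symbol; terminals sit in the length-1 row of the table. */
enum { T_LP = 1, T_RP = 2, T_LB = 4, T_RB = 8, NT_M = 16, NT_LPM = 32, NT_LBM = 64 };
enum { MAX_LEN = 32 };

/* Rules lhs -> left right, taken from the two-symbol form of the grammar. */
struct Rule { unsigned lhs, left, right; };
static const struct Rule rules[] = {
    { NT_M,   NT_M,   NT_M },   /* M -> M M     */
    { NT_M,   NT_LPM, T_RP },   /* M -> LPM )   */
    { NT_M,   NT_LBM, T_RB },   /* M -> LBM ]   */
    { NT_M,   T_LP,   T_RP },   /* M -> ( )     */
    { NT_M,   T_LB,   T_RB },   /* M -> [ ]     */
    { NT_LPM, T_LP,   NT_M },   /* LPM -> ( M   */
    { NT_LBM, T_LB,   NT_M },   /* LBM -> [ M   */
};

static unsigned terminal(char ch) {
    switch (ch) {
    case '(': return T_LP;
    case ')': return T_RP;
    case '[': return T_LB;
    case ']': return T_RB;
    default:  return 0;          /* unknown characters derive nothing */
    }
}

/* Returns true iff s (at most MAX_LEN characters) is in L(M). */
bool in_L_M(const char *s) {
    static unsigned table[MAX_LEN][MAX_LEN + 1];   /* table[start][length] */
    size_t n = strlen(s);
    if (n == 0 || n > MAX_LEN) return false;
    memset(table, 0, sizeof table);
    for (size_t i = 0; i < n; i++) table[i][1] = terminal(s[i]);
    for (size_t len = 2; len <= n; len++)
        for (size_t i = 0; i + len <= n; i++)
            for (size_t split = 1; split < len; split++)
                for (size_t k = 0; k < sizeof rules / sizeof rules[0]; k++)
                    if ((table[i][split] & rules[k].left) &&
                        (table[i + split][len - split] & rules[k].right))
                        table[i][len] |= rules[k].lhs;
    return (table[0][n] & NT_M) != 0;
}
```

Here `in_L_M("([])[]")` accepts the slide's example string, while an interleaved string such as `"([)]"` is rejected.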

72 CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B edge followed by a C edge induces an A edge

73 Dynamic Transitive Closure ?! Aiken et al. –Set-constraint solvers –Points-to analysis Henglein et al. –type inference But a CFL captures a non-transitive reachability relation [Valiant 75]

74 S T Program Chopping Given source S and target T, what program points transmit effects from S to T? Intersect forward slice from S with backward slice from T, right?

75 Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Forward slice with respect to “sum = 0”

76 int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } Forward slice with respect to “sum = 0” Non-Transitivity and Slicing int add(int x, int y) { return x + y; }

77 Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

78 Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

79 Forward slice with respect to “sum = 0” Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)” 

80 Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Chop with respect to “sum = 0” and “printf(“%d\n”,i)” 

81 Non-Transitivity and Slicing Enter main sum = 0i = 1 while(i < 11) printf(sum) printf(i) Call add x in = sum y in = i sum = x out x in = iy in = 1i = x out Enter add x = x in y = y in x = x + yx out = x ( ]

82 Program Chopping Given source S and target T, what program points transmit effects from S to T? S T “Precise interprocedural chopping” [Reps & Rosay FSE 95]

83 CF-Recognition vs. CFL-Reachability CF-Recognition –Chain graphs –General grammar: sub-cubic time [Valiant 75] –LL(1), LR(1): linear time CFL-Reachability –General graphs: O(N^3) –LL(1): O(N^3) –LR(1): O(N^3) –Certain kinds of graphs: O(N+E) –Regular languages: O(N+E) Gen/kill IDFA GMOD IDFA

84 Regular-Language Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A regular language L-path from s to t iff the edge labels along the path, concatenated, spell a word in L Running time: O(N+E) vs. O(N^3) Ordinary reachability (= transitive closure) –Label each edge with e –L is e*

85 Security of Crypto-Based Protocols for Distributed Systems “Ping-pong” protocols (1) X → Y: Encrypt_Y(M X) (2) Y → X: Encrypt_X(M) [Dolev & Yao 83] –O(N^8) algorithm [Dolev, Even, & Karp 83] –Less well known than [Dolev & Yao 83] –O(N^3) algorithm

86 [Dolev, Even, & Karp 83] Id → Encrypt_X Id Decrypt_X Id → Decrypt_X Id Encrypt_X ... Id ? [Figure: graph between Message and Saboteur with edges labeled E_Y, A_X, and A_Z]

87 Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

88 Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

89 1987 1993 1994 1995 1997 1998 1996 Slicing & Applications Dataflow Analysis Demand Algorithms Set Constraints Structure- Transmitted Dependences CFL Reachability Dataflow Analysis Demand Algorithms

90 Dataflow Analysis Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution Examples –Constant propagation –Reaching definitions –Live variables –Possibly uninitialized variables

91 Useful For... Optimizing compilers Parallelizing compilers Tools that detect possible logical errors Tools that show the effects of a proposed modification

92 Possibly Uninitialized Variables Startx = 3 if... y = x y = w w = 8 printf(y) {w,x,y} {w,y} {w} {w,y} {} {w,y} {}
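A possibly-uninitialized-variables analysis can be sketched as an iterative fixpoint over sets of variables. The CFG below is a simplified reading of the slide's figure (the `w = 8` node is omitted because its position is unclear), and the node names, the `nodes` and `cfg` tables, and the functions are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Variables as bits; a set bit means "possibly uninitialized". */
enum { NO_VAR = 0, W = 1, X = 2, Y = 4, ALL = W | X | Y };
enum { START, X_EQ_3, IF, Y_EQ_X, Y_EQ_W, PRINT_Y, NUM_NODES };

/* Statement "lhs = rhs"; rhs == NO_VAR stands for a constant right-hand side. */
struct Node { unsigned lhs, rhs; };
static const struct Node nodes[NUM_NODES] = {
    [START]   = { NO_VAR, NO_VAR },
    [X_EQ_3]  = { X,      NO_VAR },
    [IF]      = { NO_VAR, NO_VAR },
    [Y_EQ_X]  = { Y,      X      },
    [Y_EQ_W]  = { Y,      W      },
    [PRINT_Y] = { NO_VAR, NO_VAR },
};
static const int cfg[][2] = {
    {START, X_EQ_3}, {X_EQ_3, IF}, {IF, Y_EQ_X}, {IF, Y_EQ_W},
    {Y_EQ_X, PRINT_Y}, {Y_EQ_W, PRINT_Y},
};

/* lhs becomes possibly uninitialized exactly when rhs is. */
static unsigned transfer(int n, unsigned in) {
    unsigned lhs = nodes[n].lhs;
    unsigned out;
    if (lhs == NO_VAR) return in;
    out = in & ~lhs;
    if (nodes[n].rhs & in) out |= lhs;
    return out;
}

/* Computes in[n]: the variables possibly uninitialized on entry to node n. */
void solve(unsigned in[NUM_NODES]) {
    bool changed = true;
    for (int n = 0; n < NUM_NODES; n++) in[n] = 0;
    in[START] = ALL;                       /* nothing is initialized at start */
    while (changed) {
        changed = false;
        for (size_t k = 0; k < sizeof cfg / sizeof cfg[0]; k++) {
            unsigned flow = transfer(cfg[k][0], in[cfg[k][0]]);
            if ((in[cfg[k][1]] | flow) != in[cfg[k][1]]) {
                in[cfg[k][1]] |= flow;     /* join is set union */
                changed = true;
            }
        }
    }
}
```

On this CFG the set reaching `printf(y)` is {w, y}: `y` is safe along the `y = x` branch but not along `y = w`. Note that `y = w` is not a gen/kill function, since its effect on `y` depends on whether `w` is in the incoming set.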

93 Precise Intraprocedural Analysis start n C

94 x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p ( ) ] (

95 Precise Interprocedural Analysis start n C ret () [Sharir & Pnueli 81]

96 Representing Dataflow Functions Identity Function Constant Function a bc a bc

97 Representing Dataflow Functions “Gen/Kill” Function Non-“Gen/Kill” Function a bc a bc

98 x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b

99 a bcbc a Composing Dataflow Functions bc a

100 x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( )

101 matched → matched matched | (_i matched )_i  (1 ≤ i ≤ CallSites) | edge | ε [Figure: a path of ( and ) edges with its stack of pending call sites; “Off Limits!”]

102 unbalLeft → matched unbalLeft | (_i unbalLeft  (1 ≤ i ≤ CallSites) | ε [Figure: a path of ( and ) edges whose stack still holds unmatched ( entries; “Off Limits!”]

103 Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from

104 Asymptotic Running Time [Reps, Horwitz, & Sagiv 95] CFL-reachability –Exploded control-flow graph: ND nodes –Running time: O(N^3 D^3) Exploded control-flow graph Special structure Running time: O(ED^3) Typically: E ≈ N, hence O(ED^3) ≈ O(ND^3) “Gen/kill” problems: O(ED)

105 Why Bother? “We’re only interested in million-line programs” Know thy enemy! –“Any” algorithm must do these operations –Avoid pitfalls (e.g., claiming O(N^2) algorithm) The essence of “context sensitivity” Special cases –“Gen/kill” problems: O(ED) Compression techniques –Basic blocks –SSA form, sparse evaluation graphs Demand algorithms

106 Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

107 The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (*q < 11) { *p = (*f)(*p,*q); *q = (*f)(*q,1); } printf("%d\n",*p); printf("%d\n",*q); } int add(int x, int y) { return x + y; }

108 The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (*q < 11) { *p = (*f)(*p,*q); *q = (*f)(*q,1); } printf("%d\n",*p); printf("%d\n",*q); } int add(int x, int y) { return x + y; }

109 The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; }

110 Flow-Sensitive Points-To Analysis p = &q; p = q; p = *q; *p = q; [Figure: points-to graphs before and after each of the four statement forms, over p, q, r1, r2, s1, s2, s3]

111 Flow-Sensitive → Flow-Insensitive [Figure: control-flow graph from start main to exit main over statements 1 to 5, next to the same five statements without control-flow order]

112 Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97] p = &q; p = q; p = *q; *p = q; [Figure: effect of each of the four statement forms on the points-to graph, over p, q, r1, r2, s1, s2, s3]

113 Flow-Insensitive Points-To Analysis a = &e; b = a; c = &f; *b = c; d = *a; [Figure: resulting points-to graph over a, b, c, d, e, f]

114 Flow-Insensitive Points-To Analysis Andersen [Thesis 94] –Formulated using set constraints –Cubic-time algorithm Shapiro & Horwitz (1995; [POPL 97] ) –Re-formulated as a graph-grammar problem Reps (1995; [unpublished] ) –Re-formulated as a Horn-clause program Melski (1996; see [Reps, IST98] ) –Re-formulated via CFL-reachability

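115 CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B edge followed by a C edge induces an A edge

116 CFL-Reachability = Chain Programs Grammar: A → B C Chain rule: a(X,Z) :- b(X,Y), c(Y,Z). [Figure: a B edge from x to y and a C edge from y to z induce an A edge from x to z]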

117 Base Facts for Points-To Analysis p = &q; p = q; p = *q; *p = q; assignAddr(p,q). assign(p,q). assignStar(p,q). starAssign(p,q).

118 Rules for Points-To Analysis (I) p = &q; pointsTo(P,Q) :- assignAddr(P,Q). p = q; pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).

119 Rules for Points-To Analysis (II) p = *q; pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). *p = q; pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S).
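The four rules can be run to a fixpoint directly. A minimal sketch in C, assuming the five-statement example `a = &e; b = a; c = &f; *b = c; d = *a;` as input: points-to sets are stored as bitmasks, and the type and function names are hypothetical. It is a naive iteration, not the CFL-reachability formulation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum Var { A, B, C, D, E, F, NUM_VARS };
enum Kind { ASSIGN_ADDR, ASSIGN, ASSIGN_STAR, STAR_ASSIGN };
struct Stmt { enum Kind kind; enum Var p, q; };

/* Base facts for: a = &e; b = a; c = &f; *b = c; d = *a; */
static const struct Stmt program[] = {
    { ASSIGN_ADDR, A, E },   /* assignAddr(a,e) */
    { ASSIGN,      B, A },   /* assign(b,a)     */
    { ASSIGN_ADDR, C, F },   /* assignAddr(c,f) */
    { STAR_ASSIGN, B, C },   /* starAssign(b,c) */
    { ASSIGN_STAR, D, A },   /* assignStar(d,a) */
};

/* Unions `set` into pts[v]; returns true when pts[v] grew. */
static bool add(unsigned pts[], int v, unsigned set) {
    unsigned before = pts[v];
    pts[v] |= set;
    return pts[v] != before;
}

/* pts[v] has bit r set iff pointsTo(v, r) is derivable. */
void points_to(unsigned pts[NUM_VARS]) {
    bool changed = true;
    for (int v = 0; v < NUM_VARS; v++) pts[v] = 0;
    while (changed) {
        changed = false;
        for (size_t k = 0; k < sizeof program / sizeof program[0]; k++) {
            enum Var p = program[k].p, q = program[k].q;
            switch (program[k].kind) {
            case ASSIGN_ADDR:                       /* p = &q */
                if (add(pts, p, 1u << q)) changed = true;
                break;
            case ASSIGN:                            /* p = q  */
                if (add(pts, p, pts[q])) changed = true;
                break;
            case ASSIGN_STAR:                       /* p = *q */
                for (int r = 0; r < NUM_VARS; r++)
                    if ((pts[q] & (1u << r)) && add(pts, p, pts[r])) changed = true;
                break;
            case STAR_ASSIGN:                       /* *p = q */
                for (int r = 0; r < NUM_VARS; r++)
                    if ((pts[p] & (1u << r)) && add(pts, r, pts[q])) changed = true;
                break;
            }
        }
    }
}
```

For this input the fixpoint is a → e, b → e, c → f, e → f, and d → f; because the analysis is flow-insensitive, the order of the statements in `program` does not affect the result.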

120 Rules for Points-To Analysis (II) p = *q; pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). *p = q; pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S). Reordered: pointsTo(R,S) :- pointsTo(P,R),starAssign(P,Q),pointsTo(Q,S).

121 Creating a Chain Program *p = q; pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S). pointsTo(R,S) :- pointsTo(P,R),starAssign(P,Q),pointsTo(Q,S). pointsTo(R,S) :- pointsTo⁻¹(R,P),starAssign(P,Q),pointsTo(Q,S). pointsTo⁻¹(R,P) :- pointsTo(P,R). (pointsTo⁻¹ is the inverse relation.)

122 Base Facts for Points-To Analysis p = &q; p = q; p = *q; *p = q; assignAddr(p,q). assign(p,q). assignStar(p,q). starAssign(p,q). Inverse facts: starAssign⁻¹(q,p). assignStar⁻¹(q,p). assign⁻¹(q,p). assignAddr⁻¹(q,p).

123 Creating a Chain Program pointsTo(P,Q) :- assignAddr(P,Q). pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). pointsTo⁻¹(Q,P) :- assignAddr⁻¹(Q,P). pointsTo(R,S) :- pointsTo⁻¹(R,P),starAssign(P,Q),pointsTo(Q,S). pointsTo⁻¹(S,P) :- pointsTo⁻¹(S,R),pointsTo⁻¹(R,Q),assignStar⁻¹(Q,P). pointsTo⁻¹(S,R) :- pointsTo⁻¹(S,Q),starAssign⁻¹(Q,P),pointsTo(P,R). pointsTo⁻¹(R,P) :- pointsTo⁻¹(R,Q), assign⁻¹(Q,P).

124 ... and now to CFL-Reachability pointsTo → assign pointsTo pointsTo → assignStar pointsTo pointsTo pointsTo → assignAddr pointsTo → pointsTo⁻¹ starAssign pointsTo pointsTo⁻¹ → pointsTo⁻¹ pointsTo⁻¹ assignStar⁻¹ pointsTo⁻¹ → pointsTo⁻¹ starAssign⁻¹ pointsTo pointsTo⁻¹ → pointsTo⁻¹ assign⁻¹

125 Points-To Analysis as CFL-Reachability: Consequences Points-to analysis solvable in time cubic in the number of variables –Known previously [Andersen 94] Demand algorithms: –What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem –What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

126 Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

127 1987 1993 1994 1995 1997 1998 1996 Slicing & Applications Dataflow Analysis Demand Algorithms Set Constraints Structure- Transmitted Dependences CFL Reachability Structure- Transmitted Dependences Set Constraints

128 Structure-Transmitted Dependences [Reps 1995] McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); [Figure: dependence edges among x, y, w, and v]

129 Set Constraints w = cons(x,y); v = car(w); McCarthy’s Equations Revisited Semantics of Set Constraints

130 CFL-Reachability versus Set Constraints Lazy languages: CFL-reachability is more natural –car(cons(X,Y)) = X Strict languages: Set constraints are more natural –car(cons(X,Y)) = X, provided I(Y) ≠ ∅ But... SC and CFL-reachability are equivalent! –[Melski & Reps 97]

131 Solving Set Constraints X is “inhabited” Y is “inhabited” W is “inhabited”

132 W Simulating “Inhabited” inhab a

133 W Y X Simulating “Inhabited” inhab

134 Simulating “Provided I(Y) ≠ ∅” [Figure: nodes V, W, Y, X with an inhab edge that enforces the side condition I(Y) ≠ ∅]

135 SC = CFL-Reachability: Consequences Demand algorithm for SC SC is log-space complete for PTIME –Limitations on ability to parallelize algorithms for solving set-constraint problems

136 Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

137 Exhaustive Versus Demand Analysis Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest

138 Exhaustive Versus Demand Analysis Demand analysis: –Does a given fact hold at a given point? –Which facts hold at a given point? –At which points does a given fact hold? Demand analysis via CFL-reachability –single-source/single-target CFL-reachability –single-source/multi-target CFL-reachability –multi-source/single-target CFL-reachability

139 x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b YES! ( ) NO! “Semi-exhaustive”: All “appropriate” demands Might y be uninitialized here? Might b be uninitialized here?

140 Experimental Results [Horwitz, Reps, & Sagiv 1995] 53 C programs (200-6,700 lines) For a single fact of interest: –demand always better than exhaustive All “appropriate” demands beats exhaustive when percentage of “yes” answers is high –Live variables –Truly live variables –Constant predicates –...

141 A Related Result [Sagiv, Reps, & Horwitz 1996] [Uses a generalized analysis technique] 38 C programs (300-6,000 lines) –copy-constant propagation –linear-constant propagation All “appropriate” demands always beats exhaustive –factor of 1.14 to about 6

142 Exhaustive Versus Demand Analysis Demand algorithms for –Interprocedural dataflow analysis –Set constraints –Points-to analysis

143 Demand Analysis and LP Queries (I) Flow-insensitive points-to analysis –Does variable p point to q? Issue query: ?- pointsTo(p, q). Solve single-source/single-target L(pointsTo)- reachability problem –What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem –What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

144 Demand Analysis and LP Queries (II) Flow-sensitive analysis –Does a given fact f hold at a given point p? ?- dfFact(p, f). –Which facts hold at a given point p? ?- dfFact(p, F). –At which points does a given fact f hold? ?- dfFact(P, f). E.g., flow-sensitive points-to analysis ?- dfFact(p, pointsTo(x, Y)). ?- dfFact(P, pointsTo(x, y)). etc.

145 Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

146 Interprocedural Backward Slice Enter main Call p Enter p [ ] ) (

147 x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) printf(b) y may be uninitialized here [ ] ) (

148 Structure-Transmitted Dependences [Reps 1995] McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); [Figure: dependence edges among x, y, w, and v]

149 Dependences + Matched Paths? Enter main Enter p w=cons(x,y) Call p w v = car(w) w w x y hd hd -1 ( ) tl [ ]

150 Undecidable! [Reps, TOPLAS 00] hd hd -1 () Interleaved Parentheses!

151 Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

152 CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B edge followed by a C edge induces an A edge

153 Beyond CFL-Reachability: Composition of Linear Functions An edge labeled λx.3x+5 followed by an edge labeled λx.2x+1 induces an edge labeled λx.6x+11: (λx.2x+1) ∘ (λx.3x+5) = λx.6x+11
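Composing linear functions only requires tracking two coefficients per edge. A small sketch in C, where the `Linear` struct and the `apply` and `compose` helpers are hypothetical names chosen for illustration:

```c
#include <assert.h>

/* Represents the function "lambda x. a*x + b". */
struct Linear { long a, b; };

long apply(struct Linear f, long x) { return f.a * x + f.b; }

/* (f o g)(x) = f(g(x)) = f.a * (g.a * x + g.b) + f.b */
struct Linear compose(struct Linear f, struct Linear g) {
    struct Linear h = { f.a * g.a, f.a * g.b + f.b };
    return h;
}
```

With `f = {2, 1}` and `g = {3, 5}`, `compose(f, g)` yields coefficients 6 and 11, the slide's λx.6x+11. The result of a composition is again a pair of coefficients, so summaries of arbitrarily long paths stay constant-size.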

154 Beyond CFL-Reachability: Composition of Linear Functions Interprocedural constant propagation –[Sagiv, Reps, & Horwitz TCS 96] Interprocedural path profiling –The number of path fragments contributed by a procedure is a function –[Melski & Reps CC 99]

155 Ball-Larus Intraprocedural Path Profiling Counting paths in the CFG NumPathsToExit(Exit) = 1 NumPathsToExit(v) = Σ_{w ∈ succ(v)} NumPathsToExit(w) [Figure: vertex v with successors w1, w2, ..., wk, each leading to Exit]
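The NumPathsToExit recurrence can be evaluated by memoized recursion over an acyclic CFG. The five-vertex graph below is a made-up example, and `succ`, `memo`, and `num_paths_to_exit` are illustrative names; the sketch assumes the graph has no cycles.

```c
#include <assert.h>

enum { NUM_V = 5, EXIT_V = 4 };

/* succ[v][w] != 0 iff there is an edge v -> w (hypothetical acyclic CFG). */
static const int succ[NUM_V][NUM_V] = {
    /* 0 */ { 0, 1, 1, 0, 0 },
    /* 1 */ { 0, 0, 0, 1, 1 },
    /* 2 */ { 0, 0, 0, 1, 0 },
    /* 3 */ { 0, 0, 0, 0, 1 },
    /* 4 */ { 0, 0, 0, 0, 0 },
};

/* 0 means "not computed yet"; a vertex with no path to Exit is just recomputed. */
static long memo[NUM_V];

long num_paths_to_exit(int v) {
    long total = 0;
    assert(v >= 0 && v < NUM_V);
    if (v == EXIT_V) return 1;                 /* NumPathsToExit(Exit) = 1 */
    if (memo[v] != 0) return memo[v];
    for (int w = 0; w < NUM_V; w++)            /* sum over w in succ(v)    */
        if (succ[v][w]) total += num_paths_to_exit(w);
    memo[v] = total;
    return total;
}
```

In this graph vertex 0 has three paths to Exit: 0-1-3-4, 0-1-4, and 0-2-3-4.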

156 Melski-Reps Interprocedural Path Profiling ψ_Exit(P) = λx.x (Exit vertex) ψ_GExit(P) = λx.1 (GExit vertex) ψ_c = ψ_Exit(Q) ∘ ψ_r (Call vertex to Q with return vertex r) ψ_v = Σ_{w ∈ succ(v)} ψ_w (Otherwise) Sharir-Pnueli Interprocedural Dataflow Analysis φ_Exit(P) = λx.x (Exit vertex) φ_c = φ_Exit(Q) ∘ φ_r (Call vertex to Q with return vertex r) φ_v = ⊓_{w ∈ succ(v)} φ_w (Otherwise)

157 Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)] Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs –T-reachability/circularity queries Recursive HFSMs –Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity –Single-entry/multi-exit [or multi-entry/single-exit] –Deterministic, multi-entry/multi-exit

158 T-Cyclicity in Hierarchical Kripke Structures SN/SX SN/MX MN/SX MN/MX non-rec: O(|k|) non-rec: O(|k|) ? ? rec: O(|k|^3) rec: ? SN/SX SN/MX MN/SX MN/MX O(|k|) O(|k|) O(|k|) O(|k|^3) O(|k||t|) [lin rec] O(|k|) [det]

159 Recursive HFSMs: Data Complexity SN/SX SN/MX MN/SX MN/MX LTL non-rec: O(|k|) non-rec: O(|k|) ? ? rec: P-time rec: ? CTL O(|k|) bad ? bad CTL* O(|k|^2) [L 2 ] bad ? bad

160 Recursive HFSMs: Data Complexity SN/SX SN/MX MN/SX MN/MX LTL O(|k|) O(|k|) O(|k|) O(|k|^3) O(|k||t|) [lin rec] O(|k|) [det] CTL O(|k|) bad O(|k|) bad CTL* O(|k|) bad O(|k|) bad Not Dual Problems!

161 CFL-Reachability: Scope of Applicability Static analysis –Slicing, DFA, structure-transmitted dep., points-to analysis Verification –Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] –Model-checking recursive HFSMs Formal-language theory –CF-, 2DPDA-, 2NPDA-recognition –Attribute-grammar analysis

162 CFL-Reachability: Benefits Algorithms –Exhaustive & demand Complexity –Linear-time and cubic-time algorithms –PTIME-completeness –Variants that are undecidable Complementary to –Equations –Set constraints –Types –...

163 But... Model checking –Huge graphs (10^100 reachable states) –Reachability/circularity queries –Represent implicitly (OBDDs) Dataflow analysis –Large graphs e.g., Stmts × Vars (≈ 10^11) –CFL-reachability queries [Reps, Horwitz, Sagiv 95] –OBDDs blew up [Siff & Reps 95 (unpub.)]... yes, we tried the usual tricks...

164 Most Significant Contributions: 1987-2000 Asymptotically fastest algorithms –Interprocedural slicing –Interprocedural dataflow analysis Demand algorithms –Interprocedural dataflow analysis [CC94,FSE95] –All “appropriate” demands beats exhaustive Tool for slicing and browsing ANSI C –Slices programs as large as 75,000 lines –University research distribution –Commercial product: CodeSurfer (GrammaTech, Inc.)

165 Most Significant Contributions: 1987-2000 Unifying conceptual model –[Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88],... Identifies fundamental bottlenecks –Cubic-time “barrier” –Litmus test: quadratic-time algorithm?! –PTIME-complete ⇒ limits to parallelizability Existence proofs for new algorithms –Demand algorithm for set constraints –Demand algorithm for points-to analysis

166 References Papers by Reps and collaborators: –http://www.cs.wisc.edu/~reps/ CFL-reachability –Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. –Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

167 References Slicing, chopping, etc. –Horwitz, Reps, & Binkley, TOPLAS 90 –Reps, Horwitz, Sagiv, & Rosay, FSE 94 –Reps & Rosay, FSE 95 Dataflow analysis –Reps, Horwitz, & Sagiv, POPL 95 –Horwitz, Reps, & Sagiv, FSE 95, TR-1283 Structure dependences; set constraints –Reps, PEPM 95 –Melski & Reps, Theor. Comp. Sci. 00

168 References Complexity –Undecidability: Reps, TOPLAS 00? –PTIME-completeness: Reps, Acta Inf. 96. Verification –Dolev, Even, & Karp, Inf & Control 82. –Benedikt, Godefroid, & Reps, In prep. Beyond CFL-reachability –Sagiv, Reps, Horwitz, Theor. Comp. Sci 96 –Melski & Reps, CC 99, TR-1382

169 Automatic Differentiation

170 double F(double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans; } double delta =...; /* small constant */ double F’(double x) { return (F(x+delta) - F(x)) / delta; }

171 Automatic Differentiation double F (double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans’; }

172 Automatic Differentiation double F’(double x) { int i; double ans’ = 0.0; double ans = 1.0; for(i = 1; i <= n; i++) { ans’ = ans * f’[i](x) + ans’ * f[i](x); ans = ans * f[i](x); } return ans’; }
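The primed names on the slide are mathematical notation rather than valid C identifiers. A compilable sketch of the same product-rule transformation follows, assuming three concrete factor functions; `f0`, `f1`, `f2`, their derivatives, and the arrays `f` and `df` are all made up for illustration, and indexing is 0-based rather than the slide's 1-based loop.

```c
#include <assert.h>

enum { N_FUNCS = 3 };
typedef double (*Fn)(double);

/* Hypothetical factors f[i] and their hand-written derivatives df[i]. */
static double f0(double x) { return x; }
static double f1(double x) { return x + 1.0; }
static double f2(double x) { return 2.0 * x; }
static double d0(double x) { (void)x; return 1.0; }
static double d1(double x) { (void)x; return 1.0; }
static double d2(double x) { (void)x; return 2.0; }

static const Fn f[N_FUNCS]  = { f0, f1, f2 };
static const Fn df[N_FUNCS] = { d0, d1, d2 };

/* F(x) = product of f[i](x). */
double F(double x) {
    double ans = 1.0;
    for (int i = 0; i < N_FUNCS; i++) ans = ans * f[i](x);
    return ans;
}

/* Derivative of F, obtained by differentiating each statement of F. */
double F_prime(double x) {
    double d_ans = 0.0;   /* plays the role of ans' on the slide */
    double ans = 1.0;
    for (int i = 0; i < N_FUNCS; i++) {
        d_ans = ans * df[i](x) + d_ans * f[i](x);   /* product rule, uses old ans */
        ans = ans * f[i](x);
    }
    return d_ans;
}
```

For these factors F(x) = 2x^3 + 2x^2, so `F_prime(2.0)` evaluates to 32 exactly, whereas the divided-difference version `(F(x+delta) - F(x)) / delta` only approximates that value.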

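173 Automatic Differentiation [Figure: dependence graph from inputs x1, x2, ..., xi, xi+1, ..., xm to outputs y1, y2, ..., yj, yj+1, ..., yn, with a selected subset of inputs and outputs related by Program Chopping]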

