Program Analysis via Graph Reachability


1 Program Analysis via Graph Reachability
Thomas Reps University of Wisconsin PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000

2 PLDI 00 Registration Form
Tutorial (morning): …………… $ ____ Tutorial (afternoon): ………….. $ ____ Tutorial (evening): ……………. $ – 0 –

3 Applications Program optimization
Program-understanding and software-reengineering Security information flow Verification model checking security of crypto-based protocols for distributed systems

4 [Figure: timeline, 1987–1998: Slicing & Applications, CFL Reachability, Dataflow Analysis, Demand Algorithms, Structure-Transmitted Dependences, Set Constraints]

5 . . . As Well As . . . Flow-insensitive points-to analysis
Complexity results Linear cubic undecidable variants PTIME-completeness Model checking of recursive hierarchical finite-state machines “infinite”-state systems linear-time and cubic-time algorithms

6 . . . And Also Analysis of attribute grammars
Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] Formal-language problems CFL-recognition (given G and σ, is σ ∈ L(G)?) 2DPDA- and 2NPDA-simulation (given M and σ, is σ ∈ L(M)?) String-matching problems

7 Unifying Conceptual Model for Dataflow-Analysis Literature
Linear-time gen-kill [Hecht 76], [Kou 77] Path-constrained DFA [Holley & Rosen 81] Linear-time GMOD [Cooper & Kennedy 88] Flow-sensitive MOD [Callahan 88] Linear-time interprocedural gen-kill [Knoop & Steffen 93] Linear-time bidirectional gen-kill [Dhamdhere 94] Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

8 Collaborators Susan Horwitz Mooly Sagiv Genevieve Rosay David Melski
David Binkley Michael Benedikt Patrice Godefroid

9 Themes Harnessing CFL-reachability
Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity Linear, cubic, undecidable Beyond CFL-reachability

10 Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}

11 Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}

12 Backward slice with respect to printf("%d\n",i)
Slice Extraction
int main() {
  int i = 1;
  while (i < 11) {
    i = i + 1;
  }
  printf("%d\n",i);
}

13 Forward slice with respect to “sum = 0”
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}

14 Forward slice with respect to “sum = 0”
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}

15 What Are Slices Useful For?
Understanding Programs What is affected by what? Restructuring Programs Isolation of separate “computational threads” Program Specialization and Reuse Slices = specialized programs Only reuse needed slices Program Differencing Compare slices to identify changes Testing What new test cases would improve coverage? What regression tests must be rerun after a change?

16 Line-Character-Count Program
void line_char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while(eof_flag == FALSE){
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}

17 Character-Count Program
void char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while(eof_flag == FALSE){
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}

18 Line-Character-Count Program
void line_char_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
  scan_line(f, &eof_flag, &n);
  chars = n;
  while(eof_flag == FALSE){
    lines = lines + 1;
    scan_line(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}

19 Line-Count Program
void line_count(FILE *f) {
  int lines = 0;
  int chars;
  BOOL eof_flag = FALSE;
  int n;
  extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
  scan_line2(f, &eof_flag, &n);
  chars = n;
  while(eof_flag == FALSE){
    lines = lines + 1;
    scan_line2(f, &eof_flag, &n);
    chars = chars + n;
  }
  printf("lines = %d\n", lines);
  printf("chars = %d\n", chars);
}

20 Specialization Via Slicing
[Figure: wc -lc specialized by slicing into wc -c and wc -l] Not partial evaluation! void line_count(FILE *f);

21 Control Flow Graph
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: CFG with nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the edges out of while(i < 11) are labeled T and F]

22 Flow Dependence Graph
Flow dependence p → q: the value of the variable assigned at p may be used at q.
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: flow-dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]
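Flow-dependence edges can be computed from reaching definitions. The C sketch below is one standard way to do it, not necessarily how the tutorial's tools build them; the Stmt encoding (defined variable plus a bitmask of used variables), the adjacency matrix succ, and the 32-node limit are assumptions made for brevity. It iterates the gen/kill equations for reaching definitions to a fixed point, then records p → q whenever a definition at p reaches q and q reads that variable.

```c
#include <stdbool.h>

#define MAXN 32   /* at most 32 CFG nodes, so a set of definition sites fits in a 32-bit mask */

/* Hypothetical encoding of one CFG node (not taken from the slides). */
typedef struct {
    int def;          /* index of the variable assigned at this node, or -1 */
    unsigned uses;    /* bitmask of the variables read at this node */
} Stmt;

/* Union of out[] over the CFG predecessors of q. */
static unsigned rd_in(int n, bool succ[][MAXN], const unsigned out[], int q) {
    unsigned in = 0;
    for (int p = 0; p < n; p++)
        if (succ[p][q]) in |= out[p];
    return in;
}

/* dep[p][q] becomes true iff the value assigned at p may be used at q. */
void flow_deps(int n, const Stmt s[], bool succ[][MAXN], bool dep[][MAXN]) {
    unsigned out[MAXN] = {0};   /* out[v]: definition sites reaching the exit of v */
    bool changed = true;
    while (changed) {           /* round-robin iteration to a fixed point */
        changed = false;
        for (int v = 0; v < n; v++) {
            unsigned o = rd_in(n, succ, out, v);
            if (s[v].def >= 0) {
                for (int d = 0; d < n; d++)     /* kill: every definition of the same variable */
                    if (s[d].def == s[v].def) o &= ~(1u << d);
                o |= 1u << v;                    /* gen: this definition */
            }
            if (o != out[v]) { out[v] = o; changed = true; }
        }
    }
    for (int q = 0; q < n; q++) {
        unsigned in = rd_in(n, succ, out, q);
        for (int p = 0; p < n; p++)
            dep[p][q] = ((in >> p) & 1u) && s[p].def >= 0
                        && ((s[q].uses >> s[p].def) & 1u);
    }
}
```

Numbering the sum program's nodes 0 Enter, 1 sum = 0, 2 i = 1, 3 while(i < 11), 4 sum = sum + i, 5 i = i + 1, 6 printf(sum), 7 printf(i), with sum as variable 0 and i as variable 1, this reports for example sum = 0 → printf(sum) and i = i + 1 → while(i < 11), but no edge from sum = 0 to printf(i).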

23 Control Dependence Graph
Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similarly for false (F): p →F q.
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: control-dependence edges, all labeled T, over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

24 Program Dependence Graph (PDG)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: PDG combining the control-dependence (T) and flow-dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

25 Program Dependence Graph (PDG)
int main() {
  int i = 1;
  int sum = 0;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
Opposite order, same PDG
[Figure: the same PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

26 Backward Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

27 Backward Slice (2)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

28 Backward Slice (3)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

29 Backward Slice (4)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = sum + i;
    i = i + 1;
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

30 Slice Extraction
int main() {
  int i = 1;
  while (i < 11) {
    i = i + 1;
  }
  printf("%d\n",i);
}
[Figure: sliced PDG with nodes Enter, i = 1, while(i < 11), i = i + 1, printf(i)]
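A backward slice is the set of PDG nodes that can reach the slicing criterion, so it can be computed with an ordinary depth-first search along reversed edges. A minimal C sketch, assuming a hypothetical adjacency-matrix PDG in which dep[p][q] means q is control- or flow-dependent on p:

```c
#include <stdbool.h>

#define MAXN 32

/* dep[p][q] == true means an edge p -> q: q is control- or flow-dependent on p.
   On return, in_slice[v] is true iff v can reach the criterion `crit`. */
void backward_slice(int n, bool dep[][MAXN], int crit, bool in_slice[]) {
    int stack[MAXN], top = 0;    /* each node is pushed at most once */
    for (int v = 0; v < n; v++) in_slice[v] = false;
    in_slice[crit] = true;
    stack[top++] = crit;
    while (top > 0) {
        int q = stack[--top];
        for (int p = 0; p < n; p++)      /* follow dependence edges backward */
            if (dep[p][q] && !in_slice[p]) {
                in_slice[p] = true;
                stack[top++] = p;
            }
    }
}
```

Encoding the sum program's PDG with the node numbering 0 Enter, 1 sum = 0, 2 i = 1, 3 while(i < 11), 4 sum = sum + i, 5 i = i + 1, 6 printf(sum), 7 printf(i), slicing from node 7 yields Enter, i = 1, while(i < 11), i = i + 1, and printf(i), the statements kept on the Slice Extraction slide.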

31 CodeSurfer

33 Browsing a Dependence Graph
Pretend this is your favorite browser What does clicking on a link do? You get a new page Or you move to an internal tag

36 Interprocedural Slice
Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

37 Interprocedural Slice
Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

38 Interprocedural Slice
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}
Superfluous components included by Weiser’s slicing algorithm [TSE 84], left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90]

39 System Dependence Graph (SDG)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

40 SDG for the Sum Program
[Figure: SDG. In main: Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), and two Call add sites, with parameter nodes xin = sum, yin = i, sum = xout at the first and xin = i, yin = 1, i = xout at the second. In add: Enter add, x = xin, y = yin, x = x + y, xout = x]

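41 Interprocedural Backward Slice
[Figure: SDG with Enter main, two Call p sites, and Enter p]

42 Interprocedural Backward Slice (2)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

43 Interprocedural Backward Slice (3)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

44 Interprocedural Backward Slice (4)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

45 Interprocedural Backward Slice (5)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

46 Interprocedural Backward Slice (6)
[Figure: SDG with Enter main, two Call p sites, and Enter p; the edges of one call site are labeled ( and ), those of the other [ and ]]

47 Matched-Parenthesis Path
[Figure: a path whose call and return edges are labeled with parentheses, each opening parenthesis matched by its closing partner]

48 Interprocedural Backward Slice (6)
[Figure: SDG with Enter main, two Call p sites, and Enter p]

49 Interprocedural Backward Slice (7)
[Figure: SDG with Enter main, two Call p sites, and Enter p]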

50 Slice Extraction [Figure: extracted SDG slice with Enter main, Call p, and Enter p]

51 Slice of the Sum Program
[Figure: sliced SDG with nodes Enter main, i = 1, while(i < 11), printf(i), Call add, xin = i, yin = 1, i = xout, Enter add, x = xin, y = yin, x = x + y, xout = x]

52 CFL-Reachability [Yannakakis 90]
G: graph (N nodes, E edges). L: a context-free language. An L-path from s to t is a path from s to t whose edge labels, concatenated in order, form a word in L. Running time: O(N³)

53 Interprocedural Slicing via CFL-Reachability
Graph: System dependence graph L: L(matched) [roughly] Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

54 Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
General CFL-reachability on a system dependence graph (N nodes, E edges): running time O(N³)
Exploiting the special structure of system dependence graphs: running time O(E + CallSites × MaxParams³)

55 Ordinary Graph Reachability
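matched → matched matched | ( matched ) | [ matched ] | e | ε
[Figure: the same graph from s to t, with edges labeled (, ), [, ], and e, viewed two ways: under ordinary graph reachability any path from s to t counts; under CFL-reachability the labels along the path must form a word in L(matched)]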

56 CFL-Reachability via Dynamic Programming
Graph / Grammar: A → B C [Figure: a B-edge from i to j followed by a C-edge from j to k induces an A-edge from i to k]
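The dynamic-programming rule above extends to a worklist algorithm. The C sketch below assumes the grammar has been normalized so that every production has the form A → B C, A → B, or A → ε; the Prod encoding, the symbol numbering, and the fixed size bounds are assumptions of this sketch. Each derived fact A(u,v) enters the worklist once and, when removed, is combined with every adjacent fact it can pair with, which is O(N³) work for a fixed grammar.

```c
#include <stdbool.h>
#include <string.h>

#define MAXN 16   /* graph nodes (small demo bound) */
#define MAXS 8    /* grammar symbols, terminals and nonterminals alike */

/* A production A -> B C; c < 0 means A -> B; b < 0 means A -> epsilon. */
typedef struct { int a, b, c; } Prod;

bool cfl_fact[MAXS][MAXN][MAXN];      /* cfl_fact[A][u][v]: an A-path from u to v */
static int wl[MAXS * MAXN * MAXN][3]; /* each fact is pushed at most once */
static int wl_len;

static void add_fact(int a, int u, int v) {
    if (!cfl_fact[a][u][v]) {
        cfl_fact[a][u][v] = true;
        wl[wl_len][0] = a; wl[wl_len][1] = u; wl[wl_len][2] = v;
        wl_len++;
    }
}

/* edges[i] = {label, from, to}; g must be in the normalized form above. */
void cfl_reach(int n, int edges[][3], int ne, const Prod g[], int np) {
    memset(cfl_fact, 0, sizeof cfl_fact);
    wl_len = 0;
    for (int i = 0; i < ne; i++) add_fact(edges[i][0], edges[i][1], edges[i][2]);
    for (int p = 0; p < np; p++)
        if (g[p].b < 0)                                   /* A -> epsilon */
            for (int u = 0; u < n; u++) add_fact(g[p].a, u, u);
    while (wl_len > 0) {
        wl_len--;
        int b = wl[wl_len][0], u = wl[wl_len][1], v = wl[wl_len][2];
        for (int p = 0; p < np; p++) {
            if (g[p].b < 0) continue;
            if (g[p].c < 0) {                             /* A -> B */
                if (g[p].b == b) add_fact(g[p].a, u, v);
                continue;
            }
            if (g[p].b == b)                              /* A -> B C: B-edge u->v, C-edge v->w */
                for (int w = 0; w < n; w++)
                    if (cfl_fact[g[p].c][v][w]) add_fact(g[p].a, u, w);
            if (g[p].c == b)                              /* A -> X B: X-edge w->u, B-edge u->v */
                for (int w = 0; w < n; w++)
                    if (cfl_fact[g[p].b][w][u]) add_fact(g[p].a, w, v);
        }
    }
}
```

With the matched-parenthesis grammar normalized as M → ε | e | M M | LP ) | LB ], LP → ( M, LB → [ M, the chain 0 -(→ 1 -[→ 2 -e→ 3 -]→ 4 -)→ 5 yields an M-edge from 0 to 5, while 5 -(→ 6 -]→ 7 yields no M-edge from 5 to 7.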

57 Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )   Is “(a + b) * c” ∈ L(exp)? [Figure: chain graph from s to t with edges labeled (, a, +, b, ), *, c]

58 Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )   Is “a + b) * c +” ∈ L(exp)? [Figure: chain graph from s to t with edges labeled a, +, b, ), *, c, +]

59 CYK: Context-Free Recognition
M → M M | ( M ) | [ M ] | ( ) | [ ]   σ = “( [ ] ) [ ]”   Is σ ∈ L(M)?

60 CYK: Context-Free Recognition
M → M M | LPM ) | LBM ] | ( ) | [ ]   LPM → ( M   LBM → [ M   (normalized from M → M M | ( M ) | [ M ] | ( ) | [ ])

61 Is “( [ ] ) [ ]” ∈ L(M)?
[Figure: CYK table indexed by length and start position for ( [ ] ) [ ]. Length-1 entries: {(}, {[}, {]}, {)}, {[}, {]}. Applying M → [ ], LPM → ( M, and M → LPM ) yields {M} for each [ ], {LPM} for ( [ ], and {M} for ( [ ] )]

62 Is “( [ ] ) [ ]” ∈ L(M)?
[Figure: the same CYK table; applying M → M M to {M} for ( [ ] ) and {M} for the final [ ] yields {M} for the whole string]

63 CYK: Graphs vs. Tables
Is “( [ ] ) [ ]” ∈ L(M)?
[Figure: the chain graph s → t labeled ( [ ] ) [ ], with derived M and LPM edges, shown next to the CYK table]
M → M M | LPM ) | LBM ] | ( ) | [ ]   LPM → ( M   LBM → [ M
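Specialized to this normalized grammar, CYK fills three Boolean tables by increasing span length. A minimal C sketch; it reads the string without spaces (for example "([])[]"), and the half-open span layout [i, j) and the 32-character limit are choices made here, not taken from the slides:

```c
#include <stdbool.h>
#include <string.h>

#define MAXL 32

/* CYK recognizer for the normalized grammar
     M -> M M | LPM ')' | LBM ']' | '(' ')' | '[' ']'
     LPM -> '(' M        LBM -> '[' M
   Entry X[i][j] is true iff nonterminal X derives s[i..j-1]. */
bool cyk_matched(const char *s) {
    static bool M[MAXL + 1][MAXL + 1], LPM[MAXL + 1][MAXL + 1], LBM[MAXL + 1][MAXL + 1];
    int n = (int)strlen(s);
    if (n < 2 || n > MAXL) return false;     /* shortest word in L(M) has length 2 */
    memset(M, 0, sizeof M);
    memset(LPM, 0, sizeof LPM);
    memset(LBM, 0, sizeof LBM);
    for (int len = 2; len <= n; len++) {
        for (int i = 0; i + len <= n; i++) {
            int j = i + len;
            if (len == 2 && ((s[i] == '(' && s[i + 1] == ')') ||
                             (s[i] == '[' && s[i + 1] == ']')))
                M[i][j] = true;                                   /* M -> ( ) | [ ] */
            if (s[j - 1] == ')' && LPM[i][j - 1]) M[i][j] = true; /* M -> LPM ) */
            if (s[j - 1] == ']' && LBM[i][j - 1]) M[i][j] = true; /* M -> LBM ] */
            for (int k = i + 1; k < j; k++)                       /* M -> M M */
                if (M[i][k] && M[k][j]) M[i][j] = true;
            if (s[i] == '(' && M[i + 1][j]) LPM[i][j] = true;     /* LPM -> ( M */
            if (s[i] == '[' && M[i + 1][j]) LBM[i][j] = true;     /* LBM -> [ M */
        }
    }
    return M[0][n];
}
```

On "([])[]" the tables reproduce the slides: M for each "[]", LPM for "([]", M for "([])", and finally M for the whole string via M → M M.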

64 CFL-Reachability via Dynamic Programming
Graph / Grammar: A → B C [Figure: a B-edge from i to j followed by a C-edge from j to k induces an A-edge from i to k]

65 Dynamic Transitive Closure ?!
Aiken et al. Set-constraint solvers Points-to analysis Henglein et al. type inference But a CFL captures a non-transitive reachability relation [Valiant 75]

66 Program Chopping Given source S and target T, what program points transmit effects from S to T? Intersect forward slice from S with backward slice from T, right?

67 Non-Transitivity and Slicing
Forward slice with respect to “sum = 0”
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

68 Non-Transitivity and Slicing
Forward slice with respect to “sum = 0”
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

69 Non-Transitivity and Slicing
Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

70 Non-Transitivity and Slicing
Backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

71 Non-Transitivity and Slicing
Forward slice with respect to “sum = 0”; backward slice with respect to printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

72 Non-Transitivity and Slicing
Chop with respect to “sum = 0” and printf("%d\n",i)
int main() {
  int sum = 0;
  int i = 1;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

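73 Non-Transitivity and Slicing
[Figure: SDG for the sum program (Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i); two Call add sites with xin = sum, yin = i, sum = xout and xin = i, yin = 1, i = xout; Enter add, x = xin, y = yin, x = x + y, xout = x). The path from sum = 0 to printf(i) enters add through an edge labeled ( and leaves through an edge labeled ], so its labels do not match]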

74 “Precise interprocedural chopping”
Program Chopping: Given source S and target T, what program points transmit effects from S to T? [Reps & Rosay FSE 95]

75 CF-Recognition vs. CFL-Reachability
CF-recognition (chain graphs): general grammar: sub-cubic time [Valiant 75]; LL(1), LR(1): linear time
CFL-reachability (general graphs): general grammar: O(N³); LL(1): O(N³); LR(1): O(N³); certain kinds of graphs: O(N+E); regular languages: O(N+E)
Examples on the slide: gen/kill IDFA, GMOD, IDFA

76 Regular-Language Reachability [Yannakakis 90]
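G: graph (N nodes, E edges). L: a regular language. An L-path from s to t is a path from s to t whose edge labels, concatenated in order, form a word in L. Running time: O(N+E), versus O(N³) for CFL-reachability. Ordinary reachability (= transitive closure) is the special case in which every edge is labeled e and L is e*.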
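Regular-language reachability reduces to ordinary reachability in the product of the graph with an automaton for L. The C sketch below assumes L is given by a hypothetical DFA table (start state 0, -1 meaning no transition). With adjacency lists the search visits each (node, state) pair once, O((N+E)·Q) for Q automaton states and thus linear for a fixed L; for brevity this version rescans the edge list instead.

```c
#include <stdbool.h>

#define MAXN 32   /* graph nodes */
#define MAXQ 8    /* DFA states */
#define MAXLBL 4  /* edge labels */

typedef struct { int from, label, to; } Edge;

/* Is there an L-path from s to t, where L is recognized by a DFA with start
   state 0, transitions delta[q][label] (-1 = none), and accepting states acc[q]?
   Breadth-first search over pairs (graph node, DFA state). */
bool reg_reach(int n, const Edge e[], int ne, int delta[][MAXLBL],
               const bool acc[], int s, int t) {
    static bool seen[MAXN][MAXQ];
    static int queue[MAXN * MAXQ][2];
    int head = 0, tail = 0;
    for (int v = 0; v < n; v++)
        for (int q = 0; q < MAXQ; q++) seen[v][q] = false;
    seen[s][0] = true;
    queue[tail][0] = s; queue[tail][1] = 0; tail++;
    while (head < tail) {
        int v = queue[head][0], q = queue[head][1];
        head++;
        if (v == t && acc[q]) return true;
        for (int i = 0; i < ne; i++) {       /* scans all edges; adjacency lists would be faster */
            if (e[i].from != v) continue;
            int q2 = delta[q][e[i].label];
            if (q2 >= 0 && !seen[e[i].to][q2]) {
                seen[e[i].to][q2] = true;
                queue[tail][0] = e[i].to; queue[tail][1] = q2; tail++;
            }
        }
    }
    return false;
}
```

For example, with labels a = 0 and b = 1 and a two-state DFA for a b*, the path 0 -a→ 1 -b→ 2 -b→ 3 is an L-path, while the single edge 0 -b→ 4 is not. Taking a one-state DFA that loops on e recovers ordinary reachability.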

77 Security of Crypto-Based Protocols for Distributed Systems
“Ping-pong” protocols: (1) X → Y: EncryptY(M X)   (2) Y → X: EncryptX(M)
[Dolev & Yao 83]: O(N⁸) algorithm. [Dolev, Even, & Karp 83]: less well known than [Dolev & Yao 83]; O(N³) algorithm

78 [Dolev, Even, & Karp 83]
Id → EncryptX Id DecryptX | DecryptX Id EncryptX | ε
[Figure: graph with Message and Saboteur nodes and edges labeled EY, AX, AZ; query: is there an Id-path?]

79 Themes Harnessing CFL-reachability
Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity Linear, cubic, undecidable Beyond CFL-reachability

80 Relationship to Other Analysis Paradigms
Dataflow analysis reachability versus equation solving Deduction Set constraints

81 Dataflow Analysis Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution Examples Constant propagation Reaching definitions Live variables Possibly uninitialized variables

82 Useful For . . . Optimizing compilers Parallelizing compilers
Tools that detect possible logical errors Tools that show the effects of a proposed modification

83 Possibly Uninitialized Variables
[Figure: CFG with nodes start, x = 3, if . . ., y = x, y = w, w = 8, printf(y); edges annotated with sets of possibly uninitialized variables such as {w,x,y}, {w,y}, and {w}]
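The possibly-uninitialized problem can be solved by round-robin iteration over the CFG. In this C sketch, the CFG shape (start → x = 3 → if . . . → y = x or y = w → w = 8 → printf(y)) and the Node encoding are assumptions loosely modeled on the figure. The transfer function for an assignment is not gen/kill: the target becomes possibly uninitialized exactly when some operand is.

```c
#include <stdbool.h>

#define MAXN 16

/* Hypothetical node encoding: an assignment target = <expr>, or target = -1. */
typedef struct {
    int target;      /* variable assigned at this node, or -1 */
    unsigned uses;   /* bitmask of variables read by the right-hand side */
} Node;

/* out[v] = variables that may be uninitialized after node v.
   Node 0 is start; on entry every variable in all_vars is uninitialized. */
void possibly_uninit(int n, const Node nodes[], bool succ[][MAXN],
                     unsigned all_vars, unsigned out[]) {
    for (int v = 0; v < n; v++) out[v] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; v++) {
            unsigned in = (v == 0) ? all_vars : 0;
            for (int p = 0; p < n; p++)          /* join = union over predecessors */
                if (succ[p][v]) in |= out[p];
            unsigned o = in;
            if (nodes[v].target >= 0) {
                unsigned t = 1u << nodes[v].target;
                /* target may be uninitialized iff some operand may be */
                o = (in & nodes[v].uses) ? (in | t) : (in & ~t);
            }
            if (o != out[v]) { out[v] = o; changed = true; }
        }
    }
}
```

With w, x, y as bits 0, 1, 2, the analysis reports {w} after y = x, {w,y} after y = w, and {y} on entry to printf(y), so y may be uninitialized there.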

84 Precise Intraprocedural Analysis
[Figure: the paths from start to node n]

85 [Figure: supergraph with nodes start main, start p(a,b), if . . ., x = 3, b = a, calls p(x,y) and p(a,b), two return-from-p nodes, printf(y), printf(b), exit main, and exit p; call and return edges are labeled with parentheses]

86 Precise Interprocedural Analysis
[Figure: paths from start to n along which each call edge “(” is matched by its return edge “)”] [Sharir & Pnueli 81]

87 Representing Dataflow Functions
[Figure: representation relations over the facts a, b, c for an identity function and for a constant function]

88 Representing Dataflow Functions
[Figure: representation relations over the facts a, b, c for a “gen/kill” function and for a non-“gen/kill” function]

89 [Figure: exploded supergraph for the program of slide 85, with fact nodes x, y in main and a, b in p]

90 Composing Dataflow Functions
[Figure: composing two representation relations over a, b, c by following edges through the middle layer]
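In this representation, a distributive dataflow function over facts D is an edge relation on D ∪ {0}: an edge 0 → d means the function generates d, an edge d → d' means d' holds afterward if d held before, and 0 → 0 is always present. Composition is then relational composition. A minimal C sketch with D = {a, b, c} encoded as facts 1, 2, 3 and sets as bitmasks; the Rel type and the encoding are assumptions of this sketch:

```c
#include <stdbool.h>

#define ND 4   /* facts: 0 (always holds), a = 1, b = 2, c = 3 */

/* r[d][e]: edge from fact d before the statement to fact e after it. */
typedef struct { bool r[ND][ND]; } Rel;

/* Apply the represented function to S, a bitmask over facts 1..ND-1. */
unsigned rel_apply(const Rel *R, unsigned S) {
    unsigned out = 0;
    for (int d = 0; d < ND; d++) {
        if (d != 0 && !((S >> d) & 1u)) continue;   /* fact 0 is always present */
        for (int e = 1; e < ND; e++)
            if (R->r[d][e]) out |= 1u << e;
    }
    return out;
}

/* Representation of g o f (f applied first): follow an f-edge, then a g-edge. */
Rel rel_compose(const Rel *g, const Rel *f) {
    Rel h = {{{false}}};
    for (int d = 0; d < ND; d++)
        for (int m = 0; m < ND; m++)
            if (f->r[d][m])
                for (int e = 0; e < ND; e++)
                    if (g->r[m][e]) h.r[d][e] = true;
    return h;
}
```

For instance, the identity is the diagonal, the constant function λS.{b} is the pair of edges 0 → 0 and 0 → b, and an assignment b = a in the uninitialized-variables problem keeps a and c and adds a → b, so {a} maps to {a,b}. Applying a composed relation gives the same answer as applying the two relations in sequence.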

91 Might y be uninitialized here? Might b be uninitialized here?
[Figure: exploded supergraph for the program of slide 85 (facts x, y in main; a, b in p). The query answered YES! is witnessed by a path whose labels match, ( ); the query answered NO! has only a path with mismatched labels, ( ]]

92 Off Limits!
matched → matched matched | (ᵢ matched )ᵢ  (i ∈ CallSites) | edge | ε
[Figure: call stack before and after a matched path]

93 Off Limits!
unbalLeft → matched unbalLeft | (ᵢ unbalLeft  (i ∈ CallSites) | ε
[Figure: call stack before and after an unbalanced-left path]

94 Interprocedural Dataflow Analysis via CFL-Reachability
Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from

95 Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
General CFL-reachability on the exploded control-flow graph (ND nodes): running time O(N³D³)
Exploiting the special structure of exploded control-flow graphs: running time O(ED³)
Typically E ≈ N, hence O(ED³) ≈ O(ND³). “Gen/kill” problems: O(ED)

96 Why Bother? “We’re only interested in million-line programs”
Know thy enemy! “Any” algorithm must do these operations Avoid pitfalls (e.g., claiming O(N²) algorithm) The essence of “context sensitivity” Special cases “Gen/kill” problems: O(ED) Compression techniques Basic blocks SSA form, sparse evaluation graphs Demand algorithms

97 Relationship to Other Analysis Paradigms
Dataflow analysis reachability versus equation solving Deduction Set constraints

98 The Need for Pointer Analysis
int add(int x, int y);
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int,int) = add;
  while (*q < 11) {
    *p = (*f)(*p,*q);
    *q = (*f)(*q,1);
  }
  printf("%d\n",*p);
  printf("%d\n",*q);
}
int add(int x, int y) {
  return x + y;
}

99 The Need for Pointer Analysis
int add(int x, int y);
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int,int) = add;
  while (*q < 11) {
    *p = (*f)(*p,*q);
    *q = (*f)(*q,1);
  }
  printf("%d\n",*p);
  printf("%d\n",*q);
}
int add(int x, int y) {
  return x + y;
}

100 The Need for Pointer Analysis
int add(int x, int y);
int main() {
  int sum = 0;
  int i = 1;
  int *p = &sum;
  int *q = &i;
  int (*f)(int,int) = add;
  while (i < 11) {
    sum = add(sum,i);
    i = add(i,1);
  }
  printf("%d\n",sum);
  printf("%d\n",i);
}
int add(int x, int y) {
  return x + y;
}

101 Flow-Sensitive Points-To Analysis
[Figure: effect of p = &q;, p = q;, p = *q;, and *p = q; on a points-to graph over p, q, r1, r2, s1, s2, s3, shown before and after each statement]

102 Flow-Sensitive vs. Flow-Insensitive
[Figure: statements 1–5 of main between start main and exit main, shown in CFG order and as an unordered set]

103 Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97]
[Figure: effect of p = &q;, p = q;, p = *q;, and *p = q; on a points-to graph over p, q, r1, r2, s1, s2, s3]

104 Flow-Insensitive Points-To Analysis
a = &e; b = a; c = &f; *b = c; d = *a; [Figure: resulting points-to graph over a, b, c, d, e, f]

105 Flow-Insensitive Points-To Analysis
Andersen [Thesis 94] Formulated using set constraints Cubic-time algorithm Shapiro & Horwitz (1995; [POPL 97]) Re-formulated as a graph-grammar problem Reps (1995; [unpublished]) Re-formulated as a Horn-clause program Melski (1996; see [Reps, IST98]) Re-formulated via CFL-reachability

106 CFL-Reachability via Dynamic Programming
Graph / Grammar: A → B C [Figure: a B-edge from i to j followed by a C-edge from j to k induces an A-edge from i to k]

107 CFL-Reachability = Chain Programs
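Graph / Grammar: the production A → B C corresponds to the chain rule a(X,Z) :- b(X,Y), c(Y,Z). [Figure: a B-edge from x to y and a C-edge from y to z induce an A-edge from x to z]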

108 Base Facts for Points-To Analysis
p = &q; assignAddr(p,q). p = q; assign(p,q). p = *q; assignStar(p,q). *p = q; starAssign(p,q).

109 Rules for Points-To Analysis (I)
p = &q;   pointsTo(P,Q) :- assignAddr(P,Q).
p = q;    pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).

110 Rules for Points-To Analysis (II)
p = *q;   pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
*p = q;   pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
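The four rules can be evaluated directly until nothing changes. The C sketch below is a naive fixed-point evaluation of exactly these rules, not the cubic-time algorithm; the PtStmt encoding and the variable numbering are assumptions. On the earlier example a = &e; b = a; c = &f; *b = c; d = *a; it derives a → e, b → e, c → f, e → f, and d → f.

```c
#include <stdbool.h>

#define MAXV 16

enum Kind { ADDR, COPY, LOAD, STORE };      /* p = &q;  p = q;  p = *q;  *p = q; */
typedef struct { enum Kind kind; int p, q; } PtStmt;

/* Set pts[x][y]; report whether it was newly added. */
static bool set_pt(bool pts[][MAXV], int x, int y) {
    if (pts[x][y]) return false;
    pts[x][y] = true;
    return true;
}

/* pts[x][y]: x may point to y.  Statements are treated as an unordered set
   (flow-insensitive); the rules are reapplied until nothing changes. */
void andersen(int nv, const PtStmt s[], int ns, bool pts[][MAXV]) {
    for (int x = 0; x < nv; x++)
        for (int y = 0; y < nv; y++) pts[x][y] = false;
    bool changed;
    do {
        changed = false;
        for (int i = 0; i < ns; i++) {
            int p = s[i].p, q = s[i].q;
            if (s[i].kind == ADDR) {              /* pointsTo(P,Q) :- assignAddr(P,Q). */
                changed |= set_pt(pts, p, q);
                continue;
            }
            for (int r = 0; r < nv; r++) {
                /* pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). */
                if (s[i].kind == COPY && pts[q][r]) changed |= set_pt(pts, p, r);
                for (int t = 0; t < nv; t++) {
                    /* pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). */
                    if (s[i].kind == LOAD && pts[q][r] && pts[r][t])
                        changed |= set_pt(pts, p, t);
                    /* pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S). */
                    if (s[i].kind == STORE && pts[p][r] && pts[q][t])
                        changed |= set_pt(pts, r, t);
                }
            }
        }
    } while (changed);
}
```

Because every rule only adds facts, the result does not depend on the order in which statements are visited, which is what makes the analysis flow-insensitive.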

111 Creating a Chain Program
*p = q;
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
pointsTo(R,S) :- pointsToBar(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsToBar(R,P) :- pointsTo(P,R).
(pointsToBar denotes the reverse of pointsTo.)

112 Base Facts for Points-To Analysis
p = &q;   assignAddr(p,q).   assignAddrBar(q,p).
p = q;    assign(p,q).       assignBar(q,p).
p = *q;   assignStar(p,q).   assignStarBar(q,p).
*p = q;   starAssign(p,q).   starAssignBar(q,p).
(xBar denotes the reverse of relation x.)

113 Creating a Chain Program
pointsTo(P,Q) :- assignAddr(P,Q).
pointsToBar(Q,P) :- assignAddrBar(Q,P).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsToBar(R,P) :- pointsToBar(R,Q), assignBar(Q,P).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsToBar(S,P) :- pointsToBar(S,R), pointsToBar(R,Q), assignStarBar(Q,P).
pointsTo(R,S) :- pointsToBar(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsToBar(S,R) :- pointsToBar(S,Q), starAssignBar(Q,P), pointsTo(P,R).

114 . . . and now to CFL-Reachability
pointsTo → assign pointsTo
pointsTo → assignStar pointsTo pointsTo
pointsTo → assignAddr
pointsTo → pointsToBar starAssign pointsTo
pointsToBar → pointsToBar pointsToBar assignStarBar
pointsToBar → pointsToBar assignBar

115 Relationship to Other Analysis Paradigms
Dataflow analysis reachability versus equation solving Deduction Set constraints

116 [Figure: timeline, 1987–1998: Slicing & Applications, CFL Reachability, Dataflow Analysis, Demand Algorithms, Structure-Transmitted Dependences, Set Constraints]

117 Structure-Transmitted Dependences [Reps1995]
McCarthy’s equations: car(cons(x,y)) = x, cdr(cons(x,y)) = y. Example: w = cons(x,y); v = car(w); [Figure: dependences among x, y, w, v]

118 Set Constraints w = cons(x,y); v = car(w);
McCarthy’s Equations Revisited Semantics of Set Constraints

119 CFL-Reachability versus Set Constraints
Lazy languages: CFL-reachability is more natural: car(cons(X,Y)) = X. Strict languages: set constraints are more natural: car(cons(X,Y)) = X, provided I(Y) ≠ ∅. But SC and CFL-reachability are equivalent! [Melski & Reps 97]

120 Solving Set Constraints
W is “inhabited” X is “inhabited” Y is “inhabited” W is “inhabited” Y is “inhabited” X is “inhabited”

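121 Simulating “Inhabited”
[Figure: constraint-graph node W]

122 Simulating “Inhabited”
[Figure: nodes X, Y, W with inhab edges]

123 Simulating “Provided I(Y) ≠ ∅”
[Figure: nodes X, Y, W, V with inhab edges, under the side condition provided I(Y) ≠ ∅]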

124 Themes Harnessing CFL-reachability
Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity Linear, cubic, undecidable Beyond CFL-reachability

125 Exhaustive Versus Demand Analysis
Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest

126 Exhaustive Versus Demand Analysis
Does a given fact hold at a given point? Which facts hold at a given point? At which points does a given fact hold? Demand analysis via CFL-reachability single-source/single-target CFL-reachability single-source/multi-target CFL-reachability multi-source/single-target CFL-reachability

127 All “appropriate” demands
“Semi-exhaustive”: All “appropriate” demands. Might b be uninitialized here? Might y be uninitialized here?
[Figure: exploded supergraph for the program of slide 85 (facts x, y in main; a, b in p); one demand is answered YES! along a path with matched labels ( ), the other NO!]

128 Experimental Results [Horwitz, Reps, & Sagiv 1995]
53 C programs (200-6,700 lines) For a single fact of interest: demand always better than exhaustive All “appropriate” demands beats exhaustive when percentage of “yes” answers is high Live variables Truly live variables Constant predicates . . .

129 A Related Result [Sagiv, Reps, & Horwitz 1996]
[Uses a generalized analysis technique] 38 C programs (300-6,000 lines) copy-constant propagation linear-constant propagation All “appropriate” demands always beats exhaustive factor of 1.14 to about 6

130 Exhaustive Versus Demand Analysis
Demand algorithms for Interprocedural dataflow analysis Set constraints Points-to analysis

131 Demand Analysis and LP Queries (I)
Flow-insensitive points-to analysis Does variable p point to q? Issue query: ?- pointsTo(p, q). Solve single-source/single-target L(pointsTo)-reachability problem What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

132 Demand Analysis and LP Queries (II)
Flow-sensitive analysis Does a given fact f hold at a given point p? ?- dfFact(p, f). Which facts hold at a given point p? ?- dfFact(p, F). At which points does a given fact f hold? ?- dfFact(P, f). E.g., flow-sensitive points-to analysis ?- dfFact(p, pointsTo(x, Y)). ?- dfFact(P, pointsTo(x, y)). etc.

133 Themes Harnessing CFL-reachability
Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity Linear, cubic, undecidable Beyond CFL-reachability

134 Interprocedural Backward Slice
[Figure: SDG with Enter main, two Call p sites, and Enter p; the edges of one call site are labeled ( and ), those of the other [ and ]]

135 [Figure: exploded supergraph for the program of slide 85, with a path whose call and return labels interleave as ( [ ) ]; annotation: y may be uninitialized here]

136 Structure-Transmitted Dependences [Reps1995]
McCarthy’s equations: car(cons(x,y)) = x, cdr(cons(x,y)) = y. Example: w = cons(x,y); v = car(w); [Figure: dependences among x, y, w, v]

137 Dependences + Matched Paths?
[Figure: SDG with Enter main, x, y, w = cons(x,y), two Call p sites passing w, Enter p, and v = car(w); structure edges labeled hd, hd⁻¹, tl and call/return edges labeled ( ) and [ ]]

138 Undecidable! [Reps, TOPLAS 00]
[Figure: a path whose hd/hd⁻¹ labels and ( / ) labels interleave] Interleaved Parentheses!

139 Themes Harnessing CFL-reachability
Relationship to other analysis paradigms Exhaustive alg. vs. demand alg. Understanding complexity Linear, cubic, undecidable Beyond CFL-reachability

140 CFL-Reachability via Dynamic Programming
Graph / Grammar: A → B C [Figure: a B-edge from i to j followed by a C-edge from j to k induces an A-edge from i to k]

141 Beyond CFL-Reachability: Composition of Linear Functions
[Figure: edges labeled λx.3x+5 and λx.2x+1 compose to an edge labeled λx.6x+11] (λx.2x+1) ∘ (λx.3x+5) = λx.6x+11

142 Beyond CFL-Reachability: Composition of Linear Functions
Interprocedural constant propagation [Sagiv, Reps, & Horwitz TCS 96] Interprocedural path profiling The number of path fragments contributed by a procedure is a function [Melski & Reps CC 99]
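Linear functions λx.ax+b are closed under composition, so the effect of a path, however long, can be summarized by a single pair (a, b). A minimal C sketch; the Lin type and the use of long coefficients are assumptions:

```c
/* Hypothetical representation of the linear function  lambda x. a*x + b. */
typedef struct { long a, b; } Lin;

long lin_apply(Lin f, long x) { return f.a * x + f.b; }

/* (f o g)(x) = f(g(x)) = f.a * (g.a * x + g.b) + f.b */
Lin lin_compose(Lin f, Lin g) {
    Lin h = { f.a * g.a, f.a * g.b + f.b };
    return h;
}
```

Composing f = λx.2x+1 after g = λx.3x+5 gives a = 6 and b = 11, the λx.6x+11 of the previous slide.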

143 Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs T-reachability/circularity queries Recursive HFSMs Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity Single-entry/multi-exit [or multi-entry/single-exit] Deterministic, multi-entry/multi-exit

144 T-Cyclicity in Hierarchical Kripke Structures
Before: SN/SX: non-rec O(|k|), rec O(|k|³); SN/MX: non-rec O(|k|), rec ?; MN/SX: ?; MN/MX: ?
After: SN/SX: O(|k|); SN/MX: O(|k|); MN/SX: O(|k|); MN/MX: O(|k|³), O(|k||t|) [lin rec], O(|k|) [det]

145 Recursive HFSMs: Data Complexity
Columns: SN/SX | SN/MX | MN/SX | MN/MX
LTL: non-rec O(|k|), rec P-time | non-rec O(|k|), rec ? | ? | ?
CTL: O(|k|) | bad | ? | bad
CTL*: O(|k|²) [L2] | bad | ? | bad

146 Recursive HFSMs: Data Complexity
Columns: SN/SX | SN/MX | MN/SX | MN/MX
LTL: O(|k|) | O(|k|) | O(|k|) | O(|k|³), O(|k||t|) [lin rec], O(|k|) [det]
CTL: O(|k|) | bad | O(|k|) | bad
CTL*: O(|k|) | bad | O(|k|) | bad
Not Dual Problems!

147 CFL-Reachability: Scope of Applicability
Static analysis Slicing, DFA, structure-transmitted dep., points-to analysis Verification Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] Model-checking recursive HFSMs Formal-language theory CF-, 2DPDA-, 2NPDA-recognition Attribute-grammar analysis

148 CFL-Reachability: Benefits
Algorithms Exhaustive & demand Complexity Linear-time and cubic-time algorithms PTIME-completeness Variants that are undecidable Complementary to Equations Set constraints Types . . .

149 Most Significant Contributions: 1987-2000
Asymptotically fastest algorithms Interprocedural slicing Interprocedural dataflow analysis Demand algorithms Interprocedural dataflow analysis [CC94,FSE95] All “appropriate” demands beats exhaustive Tool for slicing and browsing ANSI C Slices programs as large as 75,000 lines University research distribution Commercial product: CodeSurfer (GrammaTech, Inc.)

150 Most Significant Contributions: 1987-2000
Unifying conceptual model [Kou 77], [Holley&Rosen 81], [Cooper&Kennedy 88], [Callahan 88], [Horwitz,Reps,&Binkley 88], . . . Identifies fundamental bottlenecks Cubic-time “barrier” Litmus test: quadratic-time algorithm?! PTIME-complete ⇒ limits to parallelizability Existence proofs for new algorithms Demand algorithm for set constraints Demand algorithm for points-to analysis

151 References Papers by Reps and collaborators: CFL-reachability
CFL-reachability Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

152 References Slicing, chopping, etc. Dataflow analysis
Horwitz, Reps, & Binkley, TOPLAS 90 Reps, Horwitz, Sagiv, & Rosay, FSE 94 Reps & Rosay, FSE 95 Dataflow analysis Reps, Horwitz, & Sagiv, POPL 95 Horwitz, Reps, & Sagiv, FSE 95, TR-1283 Structure dependences; set constraints Reps, PEPM 95 Melski & Reps, Theor. Comp. Sci. 00

153 References Complexity Verification Beyond CFL-reachability
Undecidability: Reps, TOPLAS 00? PTIME-completeness: Reps, Acta Inf. 96. Verification: Dolev, Even, & Karp, Inf. & Control 82. Benedikt, Godefroid, & Reps, in prep. Beyond CFL-reachability: Sagiv, Reps, & Horwitz, Theor. Comp. Sci. 96. Melski & Reps, CC 99, TR-1382

