Published by Augustus Marshall. Modified about 1 year ago.

1
Program Analysis via Graph Reachability Thomas Reps University of Wisconsin PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000 http://www.cs.wisc.edu/~reps/

2
PLDI 00 Registration Form PLDI 00: …………………….. $ ____ Tutorial (morning): …………… $ ____ Tutorial (afternoon): ………….. $ ____ Tutorial (evening): ……………. $ – 0 –

3
Applications Program optimization Program-understanding and software-reengineering Security –information flow Verification –model checking –security of crypto-based protocols for distributed systems

4
[Timeline figure (1987, 1993, 1994, 1995, 1996, 1997, 1998): Slicing & Applications; Dataflow Analysis; Demand Algorithms; Set Constraints; Structure-Transmitted Dependences; CFL-Reachability]

5
... As Well As... Flow-insensitive points-to analysis Complexity results –Linear... cubic... undecidable variants –PTIME -completeness Model checking of recursive hierarchical finite-state machines –“infinite”-state systems –linear-time and cubic-time algorithms

6
... And Also Analysis of attribute grammars Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] Formal-language problems –CFL-recognition (given G and ω, is ω ∈ L(G)?) –2DPDA- and 2NPDA-simulation (given M and ω, is ω ∈ L(M)?) String-matching problems

7
Unifying Conceptual Model for Dataflow-Analysis Literature Linear-time gen-kill [Hecht 76], [Kou 77] Path-constrained DFA [Holley & Rosen 81] Linear-time GMOD [Cooper & Kennedy 88] Flow-sensitive MOD [Callahan 88] Linear-time interprocedural gen-kill [Knoop & Steffen 93] Linear-time bidirectional gen-kill [Dhamdhere 94] Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

8
Collaborators Susan Horwitz Mooly Sagiv Genevieve Rosay David Melski David Binkley Michael Benedikt Patrice Godefroid

9
Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. Demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

10
Program Slicing The backward slice w.r.t variable v at program point p The program subset that may influence the value of variable v at point p. The forward slice w.r.t variable v at program point p The program subset that may be influenced by the value of variable v at point p.

11
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Backward Slice Backward slice with respect to “printf(“%d\n”,i)”

12
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Backward Slice Backward slice with respect to “printf(“%d\n”,i)”

13
int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); } Slice Extraction Backward slice with respect to “printf(“%d\n”,i)”

14
Forward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Forward slice with respect to “sum = 0”

15
Forward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); }

16
Who Cares About Slices? Understanding programs Restructuring Programs Program Specialization and Reuse Program Differencing Testing (and Retesting) Year 2000 Problem Automatic Differentiation

17
What Are Slices Useful For? Understanding Programs –What is affected by what? Restructuring Programs –Isolation of separate “computational threads” Program Specialization and Reuse –Slices = specialized programs –Only reuse needed slices Program Differencing –Compare slices to identify changes Testing –What new test cases would improve coverage? –What regression tests must be rerun after a change?

18
Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

19
Character-Count Program void char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

20
Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line (FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

21
Line-Count Program void line_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line2 (FILE *f, BOOL *bptr, int *iptr); scan_line2(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; scan_line2(f, &eof_flag, &n); chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars); }

22
Specialization Via Slicing wc -lc wc -c wc -l void line_count(FILE *f); Not partial evaluation!

23
How are Slices Computed? Reachability in a Dependence Graph –Program Dependence Graph (PDG) Dependences within one procedure Intraprocedural slicing is reachability in one PDG –System Dependence Graph (SDG) Dependences within entire system Interprocedural slicing is reachability in the SDG
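
A minimal sketch of intraprocedural slicing as reachability, in C to match the slides' examples. The node numbering and the edge list are an assumed reading of the PDG for the sum program (the slides give the graph only as a figure), and `build_sum_pdg` and `backward_slice` are hypothetical names, not part of any tool mentioned in the tutorial.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define N 8  /* number of PDG nodes in the sum program */

enum { ENTER, SUM0, I1, WHILE, SUM_ADD, I_INC, PRINT_SUM, PRINT_I };

/* dep[p][q] is true when q depends on p (control or flow dependence). */
static bool dep[N][N];

static void add_dep(int p, int q) { dep[p][q] = true; }

/* Assumed PDG edges for the sum program. */
void build_sum_pdg(void) {
    memset(dep, 0, sizeof dep);
    /* control dependences */
    add_dep(ENTER, SUM0); add_dep(ENTER, I1); add_dep(ENTER, WHILE);
    add_dep(ENTER, PRINT_SUM); add_dep(ENTER, PRINT_I);
    add_dep(WHILE, WHILE); add_dep(WHILE, SUM_ADD); add_dep(WHILE, I_INC);
    /* flow dependences */
    add_dep(SUM0, SUM_ADD); add_dep(SUM0, PRINT_SUM);
    add_dep(I1, WHILE); add_dep(I1, SUM_ADD); add_dep(I1, I_INC); add_dep(I1, PRINT_I);
    add_dep(SUM_ADD, SUM_ADD); add_dep(SUM_ADD, PRINT_SUM);
    add_dep(I_INC, WHILE); add_dep(I_INC, SUM_ADD); add_dep(I_INC, I_INC);
    add_dep(I_INC, PRINT_I);
}

/* Backward slice: every node from which the criterion is reachable. */
void backward_slice(int criterion, bool in_slice[N]) {
    int stack[N], top = 0;  /* each node is pushed at most once */
    assert(criterion >= 0 && criterion < N);
    memset(in_slice, 0, N * sizeof(bool));
    in_slice[criterion] = true;
    stack[top++] = criterion;
    while (top > 0) {
        int q = stack[--top];
        for (int p = 0; p < N; p++) {
            if (dep[p][q] && !in_slice[p]) {
                in_slice[p] = true;
                stack[top++] = p;
            }
        }
    }
}
```

Slicing with respect to `PRINT_I` should mark exactly the Enter, `i = 1`, `while`, `i = i + 1`, and `printf(i)` nodes, which agrees with the extracted slice shown on the Slice Extraction slide. A forward slice is the same traversal with the edge direction flipped.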

24
How is a PDG Created? Control Flow Graph (CFG) PDG is union of: Control Dependence Graph Flow Dependence Graph computed from CFG

25
Control Flow Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } [CFG nodes: Enter; sum = 0; i = 1; while(i < 11), with T and F out-edges; sum = sum + i; i = i + 1; printf(sum); printf(i)]

26
Flow Dependence Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Flow dependence p → q: the value of the variable assigned at p may be used at q. [Graph nodes: Enter; sum = 0; i = 1; while(i < 11); sum = sum + i; i = i + 1; printf(sum); printf(i)]

27
Control Dependence Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F): p →F q. [Graph nodes: Enter; sum = 0; i = 1; while(i < 11); sum = sum + i; i = i + 1; printf(sum); printf(i); all eight control-dependence edges are labeled T]

28
Program Dependence Graph (PDG) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } [Graph nodes: Enter; sum = 0; i = 1; while(i < 11); sum = sum + i; i = i + 1; printf(sum); printf(i). Edges: control dependences (labeled T) and flow dependences]

29
Program Dependence Graph (PDG) int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T T T T Opposite Order Same PDG

30
Backward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0i = 1 while(i < 11) printf(sum) printf(i) sum = sum + ii = i + i T T T T T T T T

31
Backward Slice (2) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

32
Backward Slice (3) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

33
Backward Slice (4) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); } Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i T T T T T T T T

34
Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); } Enter i = 1 while(i < 11) printf(i) i = i + i T T T T T

35
CodeSurfer

36

37

38

39
Browsing a Dependence Graph Pretend this is your favorite browser What does clicking on a link do? You get a new page Or you move to an internal tag

40

41

42

43
Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

44
Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

45
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } Interprocedural Slice int add(int x, int y) { return x + y; } Superfluous components included by Weiser’s slicing algorithm [TSE 84] Left out by algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90]

46
How is an SDG Created? Each PDG has nodes for –entry point –procedure parameters and function result Each call site has nodes for –call –arguments and function result Appropriate edges –entry node to parameters –call node to arguments –call node to entry node –arguments to parameters

47
System Dependence Graph (SDG) Enter main Call p Enter p

48
SDG for the Sum Program [SDG nodes. main: Enter main; sum = 0; i = 1; while(i < 11); printf(sum); printf(i); first Call add with x_in = sum, y_in = i, sum = x_out; second Call add with x_in = i, y_in = 1, i = x_out. add: Enter add; x = x_in; y = y_in; x = x + y; x_out = x]

49
Interprocedural Backward Slice Enter main Call p Enter p

50
Interprocedural Backward Slice (2) Enter main Call p Enter p

51
Interprocedural Backward Slice (3) Enter main Call p Enter p

52
Interprocedural Backward Slice (4) Enter main Call p Enter p

53
Interprocedural Backward Slice (5) Enter main Call p Enter p

54
Interprocedural Backward Slice (6) Enter main Call p Enter p [ ] ) (

55
Matched-Parenthesis Path ) ( ) [

56
Interprocedural Backward Slice (6) Enter main Call p Enter p

57
Interprocedural Backward Slice (7) Enter main Call p Enter p

58
Slice Extraction Enter main Call p Enter p

59
Slice of the Sum Program [SDG nodes in the slice. main: Enter main; i = 1; while(i < 11); printf(i); Call add with x_in = i, y_in = 1, i = x_out. add: Enter add; x = x_in; y = y_in; x = x + y; x_out = x]

60
CFL-Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A context-free language L-path from s to t iff the labels on the path's edges, concatenated in order, spell a word in L Running time: O(N³)

61
Interprocedural Slicing via CFL-Reachability Graph: System dependence graph L: L(matched) [roughly] Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

62
Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94] CFL-reachability –System dependence graph: N nodes, E edges –Running time: O(N³) System dependence graph has special structure –Running time: O(E + CallSites × MaxParams³)

63
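CFL-Reachability versus Ordinary Graph Reachability Grammar: matched → ε | e | [ matched ] | ( matched ) | matched matched [Figure: paths from s to t whose edges are labeled e, (, ), [, and ]; CFL-reachability accepts only the paths whose labels spell a word in L(matched), whereas ordinary graph reachability accepts any path from s to t]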

64
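CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge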
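
The dynamic-programming step above can be sketched as a worklist algorithm. This is a minimal illustration, assuming a single parenthesis kind and the normalized grammar M → ε | e | M M | L ) with L → ( M; the symbol names, the fixed-size arrays, and the functions `cfl_reset`, `cfl_add_edge`, `cfl_solve`, and `cfl_reachable` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_N 16  /* maximum number of graph nodes */

enum { SYM_E, SYM_LP, SYM_RP, SYM_M, SYM_L, NSYM };

/* Binary productions {lhs, rhs1, rhs2}:  M -> M M | L ')'   L -> '(' M */
static const int binary[][3] = {
    { SYM_M, SYM_M,  SYM_M  },
    { SYM_M, SYM_L,  SYM_RP },
    { SYM_L, SYM_LP, SYM_M  },
};
#define NBIN (sizeof binary / sizeof binary[0])

static bool edge[NSYM][MAX_N][MAX_N];       /* edge[A][u][v]: A-edge u -> v */
static int work[NSYM * MAX_N * MAX_N][3];   /* worklist of (symbol, u, v) */
static int nwork;

void cfl_reset(void) { memset(edge, 0, sizeof edge); nwork = 0; }

/* Record an edge once and schedule it for processing. */
void cfl_add_edge(int sym, int u, int v) {
    assert(u >= 0 && u < MAX_N && v >= 0 && v < MAX_N);
    if (edge[sym][u][v]) return;
    edge[sym][u][v] = true;
    work[nwork][0] = sym; work[nwork][1] = u; work[nwork][2] = v;
    nwork++;
}

/* Close the graph over nodes 0..n-1 under the grammar's productions. */
void cfl_solve(int n) {
    for (int u = 0; u < n; u++) cfl_add_edge(SYM_M, u, u);  /* M -> epsilon */
    while (nwork > 0) {
        nwork--;
        int b = work[nwork][0], u = work[nwork][1], v = work[nwork][2];
        if (b == SYM_E) cfl_add_edge(SYM_M, u, v);          /* M -> e */
        for (size_t k = 0; k < NBIN; k++) {
            int a = binary[k][0];
            if (binary[k][1] == b)   /* a -> b c: look for c-edges out of v */
                for (int w = 0; w < n; w++)
                    if (edge[binary[k][2]][v][w]) cfl_add_edge(a, u, w);
            if (binary[k][2] == b)   /* a -> c b: look for c-edges into u */
                for (int w = 0; w < n; w++)
                    if (edge[binary[k][1]][w][u]) cfl_add_edge(a, w, v);
        }
    }
}

bool cfl_reachable(int s, int t) { return edge[SYM_M][s][t]; }
```

Each (symbol, u, v) triple enters the worklist at most once and is combined with at most n neighbours per production, which is where the cubic bound comes from for a fixed grammar.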

65
Degenerate Case: CFL-Recognition Is “(a + b) * c” ∈ L(exp)? exp → id | exp + exp | exp * exp | ( exp ) [Chain graph from s to t with edges labeled ( a + b ) * c]

66
Degenerate Case: CFL-Recognition Is “a + b) * c +” ∈ L(exp)? exp → id | exp + exp | exp * exp | ( exp ) [Chain graph from s to t with edges labeled a + b ) * c +]

67
CYK: Context-Free Recognition ω = “( [ ] ) [ ]” Is ω ∈ L(M)? M → M M | ( M ) | [ M ] | ( ) | [ ]

68
CYK: Context-Free Recognition Original grammar: M → M M | ( M ) | [ M ] | ( ) | [ ] Normalized grammar: M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M

69
Is “( [ ] ) [ ]” ∈ L(M)? [CYK table indexed by start position and length; the length-1 row holds { ( } { [ } { ] } { ) } { [ } { ] }; entries {M} and {LPM} are filled in using M → [ ] and LPM → ( M]

70
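Is “( [ ] ) [ ]” ∈ L(M)? [CYK table indexed by start position and length; the length-1 row holds { ( } { [ } { ] } { ) } { [ } { ] }; with {M} and {LPM} already present, the top entry (M?) is decided using M → M M]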

71
CYK: Graphs vs. Tables Is “( [ ] ) [ ]” ∈ L(M)? M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M [Chain graph from s to t with edges labeled ( [ ] ) [ ], plus derived edges labeled M and LPM]
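
A minimal CYK recognizer for the normalized grammar on the previous slides, given as a sketch. Terminals are treated as symbols that derive themselves at length 1; the table layout, the length limit `MAX_LEN`, and the function name `cyk_in_M` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LEN 32  /* longest input the table can hold */

enum { T_LP, T_RP, T_LB, T_RB, NT_M, NT_LPM, NT_LBM, NSYM };

/* Productions {lhs, rhs1, rhs2} of the normalized grammar. */
static const int prod[][3] = {
    { NT_M,   NT_M,   NT_M },   /* M   -> M M   */
    { NT_M,   NT_LPM, T_RP },   /* M   -> LPM ) */
    { NT_M,   NT_LBM, T_RB },   /* M   -> LBM ] */
    { NT_M,   T_LP,   T_RP },   /* M   -> ( )   */
    { NT_M,   T_LB,   T_RB },   /* M   -> [ ]   */
    { NT_LPM, T_LP,   NT_M },   /* LPM -> ( M   */
    { NT_LBM, T_LB,   NT_M },   /* LBM -> [ M   */
};
#define NPROD (sizeof prod / sizeof prod[0])

static int terminal_of(char c) {
    switch (c) {
    case '(': return T_LP;
    case ')': return T_RP;
    case '[': return T_LB;
    case ']': return T_RB;
    default:  return -1;
    }
}

/* Returns true iff s is in L(M); s may contain only ( ) [ ] characters. */
bool cyk_in_M(const char *s) {
    static bool t[MAX_LEN][MAX_LEN + 1][NSYM];  /* t[start][length][symbol] */
    int n = (int)strlen(s);
    assert(n <= MAX_LEN);
    if (n == 0) return false;                   /* the grammar has no epsilon rule */
    memset(t, 0, sizeof t);
    for (int i = 0; i < n; i++) {
        int sym = terminal_of(s[i]);
        if (sym < 0) return false;
        t[i][1][sym] = true;
    }
    for (int len = 2; len <= n; len++)
        for (int i = 0; i + len <= n; i++)
            for (int split = 1; split < len; split++)
                for (size_t k = 0; k < NPROD; k++)
                    if (t[i][split][prod[k][1]] &&
                        t[i + split][len - split][prod[k][2]])
                        t[i][len][prod[k][0]] = true;
    return t[0][n][NT_M];
}
```

The table entry `t[i][len][A]` plays the role of an A-labeled edge from position `i` to position `i + len` in the chain graph, which is the correspondence the Graphs vs. Tables slide draws.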

72
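CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge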

73
Dynamic Transitive Closure ?! Aiken et al. –Set-constraint solvers –Points-to analysis Henglein et al. –type inference But a CFL captures a non-transitive reachability relation [Valiant 75]

74
S T Program Chopping Given source S and target T, what program points transmit effects from S to T? Intersect forward slice from S with backward slice from T, right?

75
Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Forward slice with respect to “sum = 0”

76
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } Forward slice with respect to “sum = 0” Non-Transitivity and Slicing int add(int x, int y) { return x + y; }

77
Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

78
Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

79
Forward slice with respect to “sum = 0” Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

80
Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; } Chop with respect to “sum = 0” and “printf(“%d\n”,i)”

81
Non-Transitivity and Slicing Enter main sum = 0i = 1 while(i < 11) printf(sum) printf(i) Call add x in = sum y in = i sum = x out x in = iy in = 1i = x out Enter add x = x in y = y in x = x + yx out = x ( ]

82
Program Chopping Given source S and target T, what program points transmit effects from S to T? S T “Precise interprocedural chopping” [Reps & Rosay FSE 95]

83
CF-Recognition vs. CFL-Reachability CF-Recognition –Chain graphs –General grammar: sub-cubic time [Valiant 75] –LL(1), LR(1): linear time CFL-Reachability –General graphs: O(N³) –LL(1): O(N³) –LR(1): O(N³) –Certain kinds of graphs: O(N+E) –Regular languages: O(N+E) Gen/kill IDFA GMOD IDFA

84
Regular-Language Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A regular language L-path from s to t iff the labels on the path's edges, concatenated in order, spell a word in L Running time: O(N+E), vs. O(N³) for CFL-reachability Ordinary reachability (= transitive closure) –Label each edge with e –L is e*
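
One common way to decide regular-language reachability is a breadth-first search over pairs (graph node, automaton state). The sketch below assumes the language is given as a DFA with a transition table; the struct `ledge`, the array bounds, and the function `regular_reachable` are hypothetical. For brevity it rescans the whole edge list at every step; an adjacency-list representation would be needed to get time linear in N+E for a fixed automaton.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES  16
#define MAX_EDGES  64
#define MAX_STATES 4
#define NLABELS    2

struct ledge { int from, to, label; };  /* labelled graph edge */

/* delta[q][label] is the next DFA state, or -1 if there is no transition.
   Returns true iff some path from s to t spells a word the DFA accepts. */
bool regular_reachable(int n, const struct ledge *edges, int m,
                       int nstates, const int delta[][NLABELS],
                       int q0, const bool accepting[],
                       int s, int t) {
    bool seen[MAX_NODES][MAX_STATES];
    int queue[MAX_NODES * MAX_STATES][2];
    int head = 0, tail = 0;
    assert(n <= MAX_NODES && m <= MAX_EDGES && nstates <= MAX_STATES);
    memset(seen, 0, sizeof seen);
    seen[s][q0] = true;
    queue[tail][0] = s; queue[tail][1] = q0; tail++;
    while (head < tail) {
        int u = queue[head][0], q = queue[head][1];
        head++;
        if (u == t && accepting[q]) return true;
        for (int k = 0; k < m; k++) {
            if (edges[k].from != u) continue;
            int q2 = delta[q][edges[k].label];
            if (q2 < 0 || seen[edges[k].to][q2]) continue;
            seen[edges[k].to][q2] = true;   /* each pair is enqueued once */
            queue[tail][0] = edges[k].to; queue[tail][1] = q2; tail++;
        }
    }
    return false;
}
```

With a one-state DFA that loops on the single label e, this degenerates to ordinary reachability, matching the last bullet of the slide.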

85
Security of Crypto-Based Protocols for Distributed Systems “Ping-pong” protocols (1) X → Y: Encrypt_Y(M X) (2) Y → X: Encrypt_X(M) [Dolev & Yao 83] –O(N⁸) algorithm [Dolev, Even, & Karp 83] –Less well known than [Dolev & Yao 83] –O(N³) algorithm

86
[Dolev, Even, & Karp 83] Id Encrypt X Id Decrypt X Id Decrypt X Id Encrypt X Id ... Id ? Message Saboteur EYEY EYEY AXAX AZAZ

87
Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. Demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

88
Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

89
[Timeline figure (1987, 1993, 1994, 1995, 1996, 1997, 1998): Slicing & Applications; Dataflow Analysis; Demand Algorithms; Set Constraints; Structure-Transmitted Dependences; CFL-Reachability. Highlighted: Dataflow Analysis, Demand Algorithms]

90
Dataflow Analysis Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution Examples –Constant propagation –Reaching definitions –Live variables –Possibly uninitialized variables

91
Useful For... Optimizing compilers Parallelizing compilers Tools that detect possible logical errors Tools that show the effects of a proposed modification

92
Possibly Uninitialized Variables [CFG nodes: Start; x = 3; if...; y = x; y = w; w = 8; printf(y). Fact sets shown in the figure: {w,x,y} {w,y} {w} {w,y} {} {w,y} {}]

93
Precise Intraprocedural Analysis start n C

94
x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p ( ) ] (

95
Precise Interprocedural Analysis start n C ret () [Sharir & Pnueli 81]

96
Representing Dataflow Functions Identity Function Constant Function a bc a bc

97
Representing Dataflow Functions “Gen/Kill” Function Non-“Gen/Kill” Function a bc a bc

98
x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b

99
a bcbc a Composing Dataflow Functions bc a
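
The graph representation of dataflow functions and their composition can be sketched with small boolean matrices. This assumes the usual encoding for distributive functions over sets of facts: an extra node 0 that always holds, an edge 0 → y when y is generated unconditionally, and an edge x → y when fact x before the statement implies fact y after it. The type `rel` and the functions below are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define D 3        /* dataflow facts a, b, c are numbered 1..D */
#define K (D + 1)  /* index 0 is the extra node that always holds */

enum { ZERO, FACT_A, FACT_B, FACT_C };

typedef struct { bool r[K][K]; } rel;  /* r[x][y]: edge from x to y */

/* Relation with only the mandatory 0 -> 0 edge (the function S -> {}). */
rel rel_empty(void) {
    rel f;
    memset(&f, 0, sizeof f);
    f.r[ZERO][ZERO] = true;
    return f;
}

rel rel_identity(void) {
    rel f = rel_empty();
    for (int i = 1; i < K; i++) f.r[i][i] = true;
    return f;
}

/* Relation for "apply f, then g": ordinary relational composition. */
rel rel_compose(const rel *f, const rel *g) {
    rel h;
    memset(&h, 0, sizeof h);
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            for (int k = 0; k < K; k++)
                if (f->r[i][j] && g->r[j][k]) h.r[i][k] = true;
    return h;
}

/* Apply the function to a fact set given as a bit mask (bit i = fact i). */
unsigned rel_apply(const rel *f, unsigned in) {
    unsigned out = 0;
    assert((in >> K) == 0);  /* only bits 0..D may be set */
    in |= 1u << ZERO;        /* node 0 holds at every point */
    for (int i = 0; i < K; i++)
        if ((in >> i) & 1u)
            for (int j = 1; j < K; j++)
                if (f->r[i][j]) out |= 1u << j;
    return out;
}
```

Because composition of the functions is just composition of the relations, a path through a sequence of such graphs summarizes the effect of the corresponding sequence of statements, which is what lets the analysis be phrased as reachability in the exploded graph.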

100
x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( )

101
matched → matched matched | (_i matched )_i for 1 ≤ i ≤ CallSites | edge | ε [Figure: call stack and a path of ( and ) edges; “Off Limits!”]

102
unbalLeft → matched unbalLeft | (_i unbalLeft for 1 ≤ i ≤ CallSites | ε [Figure: call stack and a path of ( and ) edges with unmatched ( edges; “Off Limits!”]

103
Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from

104
Asymptotic Running Time [Reps, Horwitz, & Sagiv 95] CFL-reachability –Exploded control-flow graph: ND nodes –Running time: O(N³D³) Exploded control-flow graph has special structure –Running time: O(ED³) –Typically E ≈ N, hence O(ED³) ≈ O(ND³) “Gen/kill” problems: O(ED)

105
Why Bother? “We’re only interested in million-line programs” Know thy enemy! –“Any” algorithm must do these operations –Avoid pitfalls (e.g., claiming O(N²) algorithm) The essence of “context sensitivity” Special cases –“Gen/kill” problems: O(ED) Compression techniques –Basic blocks –SSA form, sparse evaluation graphs Demand algorithms

106
Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

107
The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (*q < 11) { *p = (*f)(*p,*q); *q = (*f)(*q,1); } printf(“%d\n”,*p); printf(“%d\n”,*q); } int add(int x, int y) { return x + y; }

108
The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (*q < 11) { *p = (*f)(*p,*q); *q = (*f)(*q,1); } printf(“%d\n”,*p); printf(“%d\n”,*q); } int add(int x, int y) { return x + y; }

109
The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); } int add(int x, int y) { return x + y; }

110
Flow-Sensitive Points-To Analysis p = &q; p = q; p = *q; *p = q; [Figure: points-to graphs before and after each of the four statement forms, over variables p, q, r1, r2, s1, s2, s3]

111
Flow-Sensitive Flow-Insensitive start main exit main 33 22 11 44 55 33 22 11 44 55

112
Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97] p = &q; p = q; p = *q; *p = q; [Figure: points-to graph for each of the four statement forms, over variables p, q, r1, r2, s1, s2, s3]

113
Flow-Insensitive Points-To Analysis a = &e; b = a; c = &f; *b = c; d = *a; [Points-to graph over a, b, c, d, e, f]

114
Flow-Insensitive Points-To Analysis Andersen [Thesis 94] –Formulated using set constraints –Cubic-time algorithm Shapiro & Horwitz (1995; [POPL 97] ) –Re-formulated as a graph-grammar problem Reps (1995; [unpublished] ) –Re-formulated as a Horn-clause program Melski (1996; see [Reps, IST98] ) –Re-formulated via CFL-reachability

115
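CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge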

116
CFL-Reachability = Chain Programs Grammar: A → B C Graph: a B-edge from x to y and a C-edge from y to z induce an A-edge from x to z Chain rule: a(X,Z) :- b(X,Y), c(Y,Z).

117
Base Facts for Points-To Analysis p = &q; p = q; p = *q; *p = q; assignAddr(p,q). assign(p,q). assignStar(p,q). starAssign(p,q).

118
Rules for Points-To Analysis (I) p = &q; pointsTo(P,Q) :- assignAddr(P,Q). p = q; pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).

119
Rules for Points-To Analysis (II) p = *q; pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). *p = q; pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
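
The four rules can be exercised with a naive fixpoint loop. This sketch is not the cubic-time worklist algorithm and not the CFL-reachability formulation; it simply reapplies the rules until nothing changes, which is enough to check small examples. The statement encoding and the names `stmt`, `add_fact`, and `solve_points_to` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VARS 16

enum stmt_kind { ASSIGN_ADDR, ASSIGN, ASSIGN_STAR, STAR_ASSIGN };
struct stmt { enum stmt_kind kind; int p, q; };  /* variables are small ints */

static bool points_to[MAX_VARS][MAX_VARS];       /* points_to[p][q]: p -> q */

/* Record pointsTo(p,q); returns true if the fact is new. */
static bool add_fact(int p, int q) {
    if (points_to[p][q]) return false;
    points_to[p][q] = true;
    return true;
}

/* Apply the four rules to every statement until a fixpoint is reached. */
void solve_points_to(const struct stmt *prog, int nstmts, int nvars) {
    bool changed = true;
    assert(nvars <= MAX_VARS);
    memset(points_to, 0, sizeof points_to);
    while (changed) {
        changed = false;
        for (int i = 0; i < nstmts; i++) {
            int p = prog[i].p, q = prog[i].q;
            switch (prog[i].kind) {
            case ASSIGN_ADDR:                    /* p = &q */
                changed |= add_fact(p, q);
                break;
            case ASSIGN:                         /* p = q */
                for (int r = 0; r < nvars; r++)
                    if (points_to[q][r]) changed |= add_fact(p, r);
                break;
            case ASSIGN_STAR:                    /* p = *q */
                for (int r = 0; r < nvars; r++)
                    if (points_to[q][r])
                        for (int s = 0; s < nvars; s++)
                            if (points_to[r][s]) changed |= add_fact(p, s);
                break;
            case STAR_ASSIGN:                    /* *p = q */
                for (int r = 0; r < nvars; r++)
                    if (points_to[p][r])
                        for (int s = 0; s < nvars; s++)
                            if (points_to[q][s]) changed |= add_fact(r, s);
                break;
            }
        }
    }
}
```

On the five-statement example from the earlier slide (a = &e; b = a; c = &f; *b = c; d = *a;) the loop derives a → e, b → e, c → f, e → f, and d → f.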

120
Rules for Points-To Analysis (II) p = *q; pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). *p = q; pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S). With the body literals reordered: pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).

121
Creating a Chain Program *p = q; pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S). Reordered: pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S). Chain form, using the inverse relation: pointsTo(R,S) :- pointsTo⁻¹(R,P), starAssign(P,Q), pointsTo(Q,S). pointsTo⁻¹(R,P) :- pointsTo(P,R).

122
Base Facts for Points-To Analysis p = &q; assignAddr(p,q). assignAddr⁻¹(q,p). p = q; assign(p,q). assign⁻¹(q,p). p = *q; assignStar(p,q). assignStar⁻¹(q,p). *p = q; starAssign(p,q). starAssign⁻¹(q,p).

123
Creating a Chain Program pointsTo(P,Q) :- assignAddr(P,Q). pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). pointsTo(R,S) :- pointsTo⁻¹(R,P), starAssign(P,Q), pointsTo(Q,S). pointsTo⁻¹(Q,P) :- assignAddr⁻¹(Q,P). pointsTo⁻¹(R,P) :- pointsTo⁻¹(R,Q), assign⁻¹(Q,P). pointsTo⁻¹(S,P) :- pointsTo⁻¹(S,R), pointsTo⁻¹(R,Q), assignStar⁻¹(Q,P). pointsTo⁻¹(S,R) :- pointsTo⁻¹(S,Q), starAssign⁻¹(Q,P), pointsTo(P,R).

124
... and now to CFL-Reachability pointsTo assign pointsTo pointsTo assignStar pointsTo pointsTo pointsTo assignAddr pointsTo pointsTo starAssign pointsTo pointsTo pointsTo pointsTo assignStar pointsTo pointsTo starAssign pointsTo pointsTo pointsTo assign

125
Points-To Analysis as CFL-Reachability: Consequences Points-to analysis solvable in time cubic in the number of variables –Known previously [Andersen 94] Demand algorithms: –What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem –What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

126
Relationship to Other Analysis Paradigms Dataflow analysis –reachability versus equation solving Deduction Set constraints

127
[Timeline figure (1987, 1993, 1994, 1995, 1996, 1997, 1998): Slicing & Applications; Dataflow Analysis; Demand Algorithms; Set Constraints; Structure-Transmitted Dependences; CFL-Reachability. Highlighted: Structure-Transmitted Dependences, Set Constraints]

128
Structure-Transmitted Dependences [Reps 1995] McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); [Dependence graph over v, w, x, y]

129
Set Constraints w = cons(x,y); v = car(w); McCarthy’s Equations Revisited Semantics of Set Constraints

130
CFL-Reachability versus Set Constraints Lazy languages: CFL-reachability is more natural –car(cons(X,Y)) = X Strict languages: Set constraints are more natural –car(cons(X,Y)) = X, provided I(Y) ≠ ∅ But... SC and CFL-reachability are equivalent! –[Melski & Reps 97]

131
Solving Set Constraints X is “inhabited” Y is “inhabited” W is “inhabited”

132
W Simulating “Inhabited” inhab a

133
W Y X Simulating “Inhabited” inhab

134
Simulating “Provided I(Y) ≠ ∅” [Graph over V, W, X, Y with an inhab edge and an edge labeled “provided I(Y) ≠ ∅”]

135
SC = CFL-Reachability: Consequences Demand algorithm for SC SC is log-space complete for PTIME –Limitations on ability to parallelize algorithms for solving set-constraint problems

136
Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. Demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

137
Exhaustive Versus Demand Analysis Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest

138
Exhaustive Versus Demand Analysis Demand analysis: –Does a given fact hold at a given point? –Which facts hold at a given point? –At which points does a given fact hold? Demand analysis via CFL-reachability –single-source/single-target CFL-reachability –single-source/multi-target CFL-reachability –multi-source/single-target CFL-reachability

139
x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b YES! ( ) NO! “Semi-exhaustive”: All “appropriate” demands Might y be uninitialized here? Might b be uninitialized here?

140
Experimental Results [Horwitz, Reps, & Sagiv 1995] 53 C programs (200-6,700 lines) For a single fact of interest: –demand always better than exhaustive All “appropriate” demands beats exhaustive when percentage of “yes” answers is high –Live variables –Truly live variables –Constant predicates –...

141
A Related Result [Sagiv, Reps, & Horwitz 1996] [Uses a generalized analysis technique] 38 C programs (300-6,000 lines) –copy-constant propagation –linear-constant propagation All “appropriate” demands always beats exhaustive –factor of 1.14 to about 6

142
Exhaustive Versus Demand Analysis Demand algorithms for –Interprocedural dataflow analysis –Set constraints –Points-to analysis

143
Demand Analysis and LP Queries (I) Flow-insensitive points-to analysis –Does variable p point to q? Issue query: ?- pointsTo(p, q). Solve single-source/single-target L(pointsTo)- reachability problem –What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem –What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

144
Demand Analysis and LP Queries (II) Flow-sensitive analysis –Does a given fact f hold at a given point p? ?- dfFact(p, f). –Which facts hold at a given point p? ?- dfFact(p, F). –At which points does a given fact f hold? ?- dfFact(P, f). E.g., flow-sensitive points-to analysis ?- dfFact(p, pointsTo(x, Y)). ?- dfFact(P, pointsTo(x, y)). etc.

145
Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. Demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

146
Interprocedural Backward Slice Enter main Call p Enter p [ ] ) (

147
x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) printf(b) y may be uninitialized here [ ] ) (

148
Structure-Transmitted Dependences [Reps 1995] McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); [Dependence graph over v, w, x, y]

149
Dependences + Matched Paths? Enter main Enter p w=cons(x,y) Call p w v = car(w) w w x y hd hd -1 ( ) tl [ ]

150
Undecidable! [Reps, TOPLAS 00] Interleaved parentheses: hd and hd⁻¹ interleaved with ( and )

151
Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. Demand alg. Understanding complexity –Linear... cubic... undecidable Beyond CFL-reachability

152
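CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge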

153
Beyond CFL-Reachability: Composition of Linear Functions [Edges labeled λx.3x+5 and λx.2x+1 compose to an edge labeled λx.6x+11] (λx.2x+1) ∘ (λx.3x+5) = λx.6x+11
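
A small worked check of the composition on this slide, as a sketch. A linear function λx.a·x+b is stored as the pair (a, b); the struct `linear` and the function names are hypothetical.

```c
#include <assert.h>

/* The function \x. a*x + b, stored as its two coefficients. */
struct linear { long a, b; };

long linear_apply(struct linear f, long x) { return f.a * x + f.b; }

/* (f o g)(x) = f(g(x)) = f.a * (g.a * x + g.b) + f.b */
struct linear linear_compose(struct linear f, struct linear g) {
    struct linear h = { f.a * g.a, f.a * g.b + f.b };
    return h;
}
```

With f = λx.2x+1 and g = λx.3x+5, `linear_compose(f, g)` has coefficients (6, 11), since 2(3x+5)+1 = 6x+11. The composed function stays a pair of numbers, so edge labels can be combined in constant time.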

154
Beyond CFL-Reachability: Composition of Linear Functions Interprocedural constant propagation –[Sagiv, Reps, & Horwitz TCS 96] Interprocedural path profiling –The number of path fragments contributed by a procedure is a function –[Melski & Reps CC 99]

155
Ball-Larus Intraprocedural Path Profiling Counting paths in the CFG NumPathsToExit(Exit) = 1 NumPathsToExit(v) = Σ_{w ∈ succ(v)} NumPathsToExit(w) [Figure: vertex v with successors w1, w2, ..., wk leading to Exit]
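
The recurrence can be evaluated directly with memoized recursion over an acyclic CFG. This is a sketch of the path-counting step only, not of the full Ball-Larus edge-numbering scheme; the adjacency matrix `succ`, the memo table, and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_V 16

/* succ[v][w] != 0 when the acyclic CFG has an edge v -> w. */
static unsigned char succ[MAX_V][MAX_V];
static long num_paths[MAX_V];  /* memo table; 0 means "not yet computed" */

void paths_reset(void) {
    memset(succ, 0, sizeof succ);
    memset(num_paths, 0, sizeof num_paths);
}

/* NumPathsToExit(v) for a CFG with vertices 0..n-1. The graph must be
   acyclic, otherwise the recursion does not terminate. */
long num_paths_to_exit(int v, int exit_v, int n) {
    assert(v >= 0 && v < n && n <= MAX_V);
    if (v == exit_v) return 1;
    if (num_paths[v] != 0) return num_paths[v];
    long total = 0;
    for (int w = 0; w < n; w++)
        if (succ[v][w]) total += num_paths_to_exit(w, exit_v, n);
    num_paths[v] = total;  /* a vertex with no path to Exit is recomputed */
    return total;
}
```

For two if-then-else diamonds in sequence, the entry vertex has 2 × 2 = 4 paths to Exit.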

156
Melski-Reps Interprocedural Path Profiling Exit(P) = x. x Exit vertex GExit(P) = x.1 GExit vertex c = Exit(Q) r Call vertex to Q with return vertex r w succ(v) v = w Otherwise Sharir-Pnueli Interprocedural Dataflow Analysis Exit(P) = x. x Exit vertex c = Exit(Q) r Call vertex to Q with return vertex r w succ(v) v = w Otherwise

157
Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)] Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs –T-reachability/circularity queries Recursive HFSMs –Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity –Single-entry/multi-exit [or multi-entry/single-exit] –Deterministic, multi-entry/multi-exit

158
T-Cyclicity in Hierarchical Kripke Structures
(i)  SN/SX: non-rec O(|k|), rec O(|k|³) | SN/MX: non-rec O(|k|), rec ? | MN/SX: ? | MN/MX: ?
(ii) SN/SX: O(|k|) | SN/MX: O(|k|) | MN/SX: O(|k|) | MN/MX: O(|k|³); O(|k||t|) [lin rec]; O(|k|) [det]

159
Recursive HFSMs: Data Complexity
        SN/SX                           SN/MX                     MN/SX   MN/MX
LTL     non-rec: O(|k|); rec: P-time    non-rec: O(|k|); rec: ?   ?       ?
CTL     O(|k|)                          bad                       ?       bad
CTL*    O(|k|²) [L₂]                    bad                       ?       bad

160
Recursive HFSMs: Data Complexity
        SN/SX    SN/MX    MN/SX    MN/MX
LTL     O(|k|)   O(|k|)   O(|k|)   O(|k|³); O(|k||t|) [lin rec]; O(|k|) [det]
CTL     O(|k|)   bad      O(|k|)   bad
CTL*    O(|k|)   bad      O(|k|)   bad
Not Dual Problems!

161
CFL-Reachability: Scope of Applicability Static analysis –Slicing, DFA, structure-transmitted dep., points-to analysis Verification –Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] –Model-checking recursive HFSMs Formal-language theory –CF-, 2DPDA-, 2NPDA-recognition –Attribute-grammar analysis

162
CFL-Reachability: Benefits Algorithms –Exhaustive & demand Complexity –Linear-time and cubic-time algorithms –PTIME -completeness –Variants that are undecidable Complementary to –Equations –Set constraints –Types –...

163
But... Model checking –Huge graphs (10¹⁰⁰ reachable states) –Reachability/circularity queries –Represent implicitly (OBDDs) Dataflow analysis –Large graphs, e.g., Stmts × Vars (≈ 10¹¹) –CFL-reachability queries [Reps, Horwitz, & Sagiv 95] –OBDDs blew up [Siff & Reps 95 (unpub.)]... yes, we tried the usual tricks...

164
Most Significant Contributions: 1987-2000 Asymptotically fastest algorithms –Interprocedural slicing –Interprocedural dataflow analysis Demand algorithms –Interprocedural dataflow analysis [CC94,FSE95] –All “appropriate” demands beats exhaustive Tool for slicing and browsing ANSI C –Slices programs as large as 75,000 lines –University research distribution –Commercial product: CodeSurfer (GrammaTech, Inc.)

165
Most Significant Contributions: 1987-2000 Unifying conceptual model –[Kou 77], [Holley&Rosen 81], [Cooper&Kennedy 88], [Callahan 88], [Horwitz,Reps,&Binkley 88],... Identifies fundamental bottlenecks –Cubic-time “barrier” –Litmus test: quadratic-time algorithm?! –PTIME -complete limits to parallelizability Existence proofs for new algorithms –Demand algorithm for set constraints –Demand algorithm for points-to analysis

166
References Papers by Reps and collaborators: –http://www.cs.wisc.edu/~reps/ CFL-reachability –Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. –Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

167
References Slicing, chopping, etc. –Horwitz, Reps, & Binkley, TOPLAS 90 –Reps, Horwitz, Sagiv, & Rosay, FSE 94 –Reps & Rosay, FSE 95 Dataflow analysis –Reps, Horwitz, & Sagiv, POPL 95 –Horwitz, Reps, & Sagiv, FSE 95, TR-1283 Structure dependences; set constraints –Reps, PEPM 95 –Melski & Reps, Theor. Comp. Sci. 00

168
References Complexity –Undecidability: Reps, TOPLAS 00? –PTIME-completeness: Reps, Acta Inf. 96. Verification –Dolev, Even, & Karp, Inf. & Control 82. –Benedikt, Godefroid, & Reps, In prep. Beyond CFL-reachability –Sagiv, Reps, & Horwitz, Theor. Comp. Sci. 96 –Melski & Reps, CC 99, TR-1382

169
Automatic Differentiation

170
double F(double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans; } double delta =...; /* small constant */ double F’(double x) { return (F(x+delta) - F(x)) / delta; }

171
Automatic Differentiation double F (double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans’; }

172
Automatic Differentiation double F’(double x) { int i; double ans’ = 0.0; double ans = 1.0; for(i = 1; i <= n; i++) { ans’ = ans * f’[i](x) + ans’ * f[i](x); ans = ans * f[i](x); } return ans’; }
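
A compilable variant of the derivative program, as a sketch. C identifiers cannot contain a prime, so `ans’` becomes `d_ans` and `f’` becomes a second array `df`; the arrays are 0-based here, and the two sample factors (x and x·x) are assumptions chosen only so the result can be checked by hand.

```c
#include <assert.h>

typedef double (*fn)(double);

/* Hypothetical sample factors and their derivatives. */
static double f0(double x)  { return x; }
static double df0(double x) { (void)x; return 1.0; }
static double f1(double x)  { return x * x; }
static double df1(double x) { return 2.0 * x; }

/* F(x) = f[0](x) * f[1](x) * ... * f[n-1](x) */
double product(const fn f[], int n, double x) {
    double ans = 1.0;
    for (int i = 0; i < n; i++) ans = ans * f[i](x);
    return ans;
}

/* F'(x) by the product rule, accumulated alongside F(x). */
double product_deriv(const fn f[], const fn df[], int n, double x) {
    double d_ans = 0.0;
    double ans = 1.0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        d_ans = ans * df[i](x) + d_ans * f[i](x);  /* uses the old ans */
        ans = ans * f[i](x);
    }
    return d_ans;
}
```

With the two sample factors, F(x) = x³ and the loop yields F'(2) = 12, with no step size `delta` and no truncation error from a finite difference.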

173
Automatic Differentiation [Figure: inputs x1, x2, ..., xi, xi+1, ..., xm and outputs y1, y2, ..., yj, yj+1, ..., yn] Program Chopping
