Program Analysis via Graph Reachability

1 Program Analysis via Graph Reachability
Thomas Reps University of Wisconsin Joint work with S. Horwitz, M. Sagiv, G. Rosay, and D. Melski

2 1987 Slicing & Applications CFL Reachability 1993 Dataflow Analysis Demand Algorithms 1994 Structure-Transmitted Dependences 1995 Set Constraints 1996 1997 1998

3 More Recently
Flow-insensitive points-to analysis. An undecidability result: context-sensitive structure-transmitted data-dependence analysis. Model checking of recursive hierarchical finite-state machines: “infinite”-state systems; CFL-reachability/circularity queries; linear-, quadratic-, and cubic-time algorithms.

4 Other Applications of CFL-Reachability
Analysis of attribute grammars. CFL-recognition: w ∈ L(G)? 2DPDA- and 2NPDA-recognition: w ∈ L(M)? String-matching problems. “Ping-pong” protocols in distributed systems [Dolev, Even, & Karp 83].

5 Outline
Interprocedural slicing. Interprocedural dataflow analysis. Demand-driven analysis. (Model-checking of recursive HFSMs.)

6 Backward slice with respect to statement “printf("%d\n",i)”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }

8 Forward slice with respect to statement “sum = 0”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }

10 What Are Slices Useful For?
Understanding programs: what is affected by what? Restructuring programs: isolation of separate “computational threads”. Program specialization and reuse: slices = specialized programs; only reuse needed slices. Program differencing: compare slices to identify changes. Testing: what new test cases would improve coverage? What regression tests must be rerun after a change?

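11 Control Flow Graph
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
[Figure: control-flow graph with nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i). The T branch of the while node enters the loop body; the F branch leads to printf(sum) and printf(i).]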

12 Control Dependence Graph
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F): p →F q.
[Figure: control dependence graph over the nodes Enter, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1; every edge is labeled T.]

13 Flow Dependence Graph
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
Flow dependence p → q: the value of the variable assigned at p may be used at q.
[Figure: flow dependence graph over the nodes Enter, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1.]

14 Program Dependence Graph (PDG)
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
[Figure: PDG combining control dependence edges (all labeled T) and flow dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1.]

15 Program Dependence Graph (PDG)
int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
The two initializations appear in the opposite order, yet the PDG is the same.
[Figure: the same PDG, over the nodes Enter, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1.]

16 Backward Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }
[Figure: PDG over the nodes Enter, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), sum = sum + i, i = i + 1.]

17 Backward Slice (2)
[Figure: same program and PDG as on slide 16.]

18 Backward Slice (3)
[Figure: same program and PDG as on slide 16.]

19 Backward Slice (4)
[Figure: same program and PDG as on slide 16.]

20 Slice Extraction
int main() { int i = 1; while (i < 11) { i = i + 1; } printf("%d\n",i); }
[Figure: PDG restricted to the slice, with nodes Enter, i = 1, while(i < 11), printf(i), i = i + 1; every control dependence edge is labeled T.]
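The intraprocedural backward slice can be read as plain reverse reachability in the PDG. The following is a minimal sketch in C, not the authors' implementation: the edge list is one reading of the control and flow dependences of the sum program, and the node names and the helper `backward_slice` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* PDG nodes of the sum program (names are illustrative). */
enum { ENTER, SUM_INIT, I_INIT, WHILE, SUM_ADD, I_INC, PRINT_SUM, PRINT_I, NUM_NODES };

typedef struct { int from, to; } Edge;

/* Assumed dependence edges: "to" depends on "from". */
static const Edge pdg_edges[] = {
    /* control dependences */
    {ENTER, SUM_INIT}, {ENTER, I_INIT}, {ENTER, WHILE},
    {ENTER, PRINT_SUM}, {ENTER, PRINT_I},
    {WHILE, SUM_ADD}, {WHILE, I_INC},
    /* flow dependences on sum */
    {SUM_INIT, SUM_ADD}, {SUM_INIT, PRINT_SUM},
    {SUM_ADD, SUM_ADD}, {SUM_ADD, PRINT_SUM},
    /* flow dependences on i */
    {I_INIT, WHILE}, {I_INIT, SUM_ADD}, {I_INIT, I_INC}, {I_INIT, PRINT_I},
    {I_INC, WHILE}, {I_INC, SUM_ADD}, {I_INC, I_INC}, {I_INC, PRINT_I},
};
#define NUM_EDGES (sizeof pdg_edges / sizeof pdg_edges[0])

/* Marks every node from which `criterion` is reachable along PDG edges. */
void backward_slice(int criterion, bool in_slice[NUM_NODES]) {
    int worklist[NUM_NODES];   /* each node is pushed at most once */
    int top = 0;

    assert(criterion >= 0 && criterion < NUM_NODES);
    for (int v = 0; v < NUM_NODES; v++) in_slice[v] = false;
    in_slice[criterion] = true;
    worklist[top++] = criterion;

    while (top > 0) {
        int n = worklist[--top];
        /* Follow every dependence edge that ends at n, backwards. */
        for (size_t k = 0; k < NUM_EDGES; k++) {
            int p = pdg_edges[k].from;
            if (pdg_edges[k].to == n && !in_slice[p]) {
                in_slice[p] = true;
                worklist[top++] = p;
            }
        }
    }
}
```

Called with `PRINT_I` as the criterion, the sketch marks Enter, i = 1, while(i < 11), i = i + 1, and printf(i), which are the nodes kept on the Slice Extraction slide.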

21 CodeSurfer

25 Interprocedural Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; }
Backward slice with respect to statement “printf("%d\n",i)”

27 Interprocedural Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; }
Superfluous components: included by Weiser’s slicing algorithm [TSE 84]; left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].

28 System Dependence Graph (SDG)
Enter main Call p Call p Enter p

29 SDG for the Sum Program
[Figure: system dependence graph. Procedure main: Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), and two Call add nodes, one with parameter nodes xin = sum, yin = i, sum = xout and one with xin = i, yin = 1, i = xout. Procedure add: Enter add, x = xin, y = yin, x = x + y, xout = x.]

30 Interprocedural Backward Slice
Enter main Call p Call p Enter p

31 Interprocedural Backward Slice (2)
Enter main Call p Call p Enter p

32 Interprocedural Backward Slice (3)
Enter main Call p Call p Enter p

33 Interprocedural Backward Slice (4)
Enter main Call p Call p Enter p

34 Interprocedural Backward Slice (5)
Enter main Call p Call p Enter p

35 Interprocedural Backward Slice (6)
Enter main Call p Call p ) ( [ ] Enter p

36 Matched-Parenthesis Path
) ( ) [

37 Interprocedural Backward Slice (6)
Enter main Call p Call p Enter p

38 Interprocedural Backward Slice (7)
Enter main Call p Call p Enter p

39 Slice Extraction Enter main Call p Enter p

40 Slice of the Sum Program
Enter main i = 1 while(i < 11) printf(i) Call add xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x
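Read back as source code, the slice on slide 40 keeps only the statements that can affect the value printed for i. The sketch below is an adaptation rather than the slide's program: the two hypothetical functions return the final value of i instead of printing it, so the original and the sliced version can be compared directly.

```c
/* Shared callee, as in the slides. */
static int add(int x, int y) { return x + y; }

/* Original sum program, returning the value that printf("%d\n",i) would print. */
int original_final_i(void) {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    (void)sum;   /* sum is computed but irrelevant to i */
    return i;
}

/* Slice with respect to the final value of i: every statement on sum is gone. */
int sliced_final_i(void) {
    int i = 1;
    while (i < 11) {
        i = add(i, 1);
    }
    return i;
}
```

Both functions return 11. The call `add(sum,i)` is dropped while `add(i,1)` is kept, which is the context-sensitive distinction that the interprocedural slides are about.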

41 CFL-Reachability [Yannakakis 90]
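G: a graph with labeled edges. L: a context-free language over the edge labels. There is an L-path from s to t iff the edge labels along some path from s to t spell a word in L. Running time: O(N^3).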

42 Ordinary Graph Reachability
Grammar: matched → ε | e | [ matched ] | ( matched ) | matched matched
[Figure: a graph between s and t whose edges are labeled (, ), [, ], and e, contrasting ordinary graph reachability from s to t with CFL-reachability from s to t.]

43 Degenerate Case: CFL-Recognition
“(a + b) * c” ∈ L(exp)? “a + b) * c +” ∈ L(exp)? Grammar: exp → id | exp + exp | exp * exp | ( exp ). [Figure: each string drawn as a linear graph from s to t, one edge per token.]

44 CFL-Reachability via Dynamic Programming
Grammar: A → B C. Graph: given a B-edge from u to v and a C-edge from v to w, add an A-edge from u to w.
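The dynamic-programming step (combine a B-edge and a C-edge into an A-edge whenever A → B C) can be sketched as a worklist algorithm. This is an illustrative C sketch, not the algorithm as published: it assumes the matched-parenthesis grammar has been normalized by hand with two helper nonterminals (`N_X1`, `N_X2`), it uses dense arrays for a small fixed node bound, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16

/* Terminals ( ) [ ] e, then nonterminals matched, X1, X2. */
enum { T_LP, T_RP, T_LB, T_RB, T_E, N_MATCHED, N_X1, N_X2, NUM_SYMS };

typedef struct { int lhs, rhs1, rhs2; } Production;   /* rhs2 < 0: unary */

/* matched -> e | matched matched | ( matched ) | [ matched ], normalized. */
static const Production prods[] = {
    {N_MATCHED, T_E, -1},
    {N_MATCHED, N_MATCHED, N_MATCHED},
    {N_MATCHED, T_LP, N_X1}, {N_X1, N_MATCHED, T_RP},
    {N_MATCHED, T_LB, N_X2}, {N_X2, N_MATCHED, T_RB},
};
#define NUM_PRODS (sizeof prods / sizeof prods[0])

typedef struct { int sym, u, v; } Item;

typedef struct {
    int n;                                        /* number of nodes */
    bool has[NUM_SYMS][MAX_NODES][MAX_NODES];     /* has[A][u][v]: A-edge u->v */
    Item work[NUM_SYMS * MAX_NODES * MAX_NODES];  /* each edge is pushed once */
    int top;
} Cfl;

void cfl_init(Cfl *g, int n) {
    assert(n >= 0 && n <= MAX_NODES);
    memset(g, 0, sizeof *g);
    g->n = n;
}

/* Records edge sym(u,v) and schedules it, unless it is already known. */
void cfl_add(Cfl *g, int sym, int u, int v) {
    assert(u >= 0 && u < g->n && v >= 0 && v < g->n);
    if (g->has[sym][u][v]) return;
    g->has[sym][u][v] = true;
    g->work[g->top++] = (Item){sym, u, v};
}

void cfl_solve(Cfl *g) {
    /* matched -> epsilon: every node reaches itself. */
    for (int v = 0; v < g->n; v++) cfl_add(g, N_MATCHED, v, v);

    while (g->top > 0) {
        Item it = g->work[--g->top];
        for (size_t p = 0; p < NUM_PRODS; p++) {
            const Production *pr = &prods[p];
            if (pr->rhs2 < 0) {                   /* A -> B */
                if (pr->rhs1 == it.sym) cfl_add(g, pr->lhs, it.u, it.v);
                continue;
            }
            if (pr->rhs1 == it.sym)               /* popped edge is the B of A -> B C */
                for (int w = 0; w < g->n; w++)
                    if (g->has[pr->rhs2][it.v][w]) cfl_add(g, pr->lhs, it.u, w);
            if (pr->rhs2 == it.sym)               /* popped edge is the C of A -> B C */
                for (int w = 0; w < g->n; w++)
                    if (g->has[pr->rhs1][w][it.u]) cfl_add(g, pr->lhs, w, it.v);
        }
    }
}
```

After `cfl_solve`, `has[N_MATCHED][s][t]` answers whether there is an L(matched)-path from s to t. With N nodes and a fixed grammar, each of the O(N^2) possible edges per symbol is popped once and scanned against N candidates, which is where the cubic bound comes from.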

45 Interprocedural Slicing via CFL-Reachability
Graph: System dependence graph L: L(matched) Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

46 Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
CFL-reachability on the system dependence graph (N nodes, E edges): running time O(N^3). Exploiting the special structure of the system dependence graph: running time O(E + CallSites × MaxParams^3).

47 Outline
Interprocedural slicing. Interprocedural dataflow analysis. Demand-driven analysis. (Model-checking of recursive HFSMs.)

48 Possibly Uninitialized Variables
{} Start {w,x,y} x = 3 {w,y} if . . . {w,y} y = x {w,y} y = w {w} w = 8 {w,y} {} printf(y) {w,y}

49 Precise Intraprocedural Analysis
start n

50 ( ) ] ( start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b)
return from p return from p printf(y) printf(b) exit main exit p

51 Precise Interprocedural Analysis
ret start n ( ) [Sharir & Pnueli 81]

52 Representing Dataflow Functions
b c Identity Function a b c Constant Function

53 Representing Dataflow Functions
b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function

54 x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

55 Composing Dataflow Functions
b c a b c a a b c
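One way to make the graph representation of dataflow functions concrete is to store each function as a relation over the facts plus a special zero fact, so that composing functions becomes relational composition. The sketch below is a hedged C illustration for three facts a, b, c standing for "possibly uninitialized"; the type `Rel` and the helpers `rel_gen_kill`, `rel_assign_copy`, `rel_apply`, and `rel_compose` are hypothetical names, not from the slides.

```c
#include <assert.h>

/* Fact 0 is the special "zero" fact; facts 1..3 stand for a, b, c. */
enum { ZERO = 0, FACT_A = 1, FACT_B = 2, FACT_C = 3, NFACTS = 4 };
#define BIT(i) (1u << (i))

/* out[i] is the bitmask of facts that fact i has an edge to. */
typedef struct { unsigned out[NFACTS]; } Rel;

/* f(X) = (X \ kill) | gen. With gen = kill = 0 this is the identity. */
Rel rel_gen_kill(unsigned gen, unsigned kill) {
    Rel r;
    r.out[ZERO] = BIT(ZERO) | gen;            /* generated facts hang off zero */
    for (int d = 1; d < NFACTS; d++)
        r.out[d] = ((gen | kill) & BIT(d)) ? 0u : BIT(d);
    return r;
}

/* Non-gen/kill function for "dst = src": dst is uninitialized iff src is. */
Rel rel_assign_copy(int dst, int src) {
    assert(dst > ZERO && dst < NFACTS && src > ZERO && src < NFACTS);
    Rel r = rel_gen_kill(0u, 0u);
    if (dst != src) {
        r.out[dst] = 0u;                       /* old value of dst is overwritten */
        r.out[src] |= BIT(dst);                /* src's status flows to dst */
    }
    return r;
}

/* Applies the represented function to a set of facts (bits 1..3). */
unsigned rel_apply(const Rel *r, unsigned facts) {
    unsigned src = facts | BIT(ZERO), dst = 0u;
    for (int i = 0; i < NFACTS; i++)
        if ((src >> i) & 1u) dst |= r->out[i];
    return dst & ~BIT(ZERO);
}

/* Relational composition: apply `first`, then `then`. */
Rel rel_compose(const Rel *first, const Rel *then) {
    Rel r;
    for (int i = 0; i < NFACTS; i++) {
        r.out[i] = 0u;
        for (int j = 0; j < NFACTS; j++)
            if ((first->out[i] >> j) & 1u) r.out[i] |= then->out[j];
    }
    return r;
}
```

For example, composing the relation for "a = 3" (`rel_gen_kill(0u, BIT(FACT_A))`) with the one for "b = a" (`rel_assign_copy(FACT_B, FACT_A)`) and applying the result to {a, b, c} leaves only c, the same answer as applying the two functions one after the other.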

56 ( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3
Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

57 Interprocedural Dataflow Analysis via CFL-Reachability
Graph: Exploded control-flow graph L: L(matched) Fact d holds at n iff there is an L(matched)-path from

58 Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
CFL-reachability on the exploded control-flow graph (ND nodes): running time O(N^3 D^3). Exploiting the special structure of the exploded control-flow graph: running time O(E D^3). Typically: E ≈ N. “Gen/kill” problems: O(ED).

59 Outline
Interprocedural slicing. Interprocedural dataflow analysis. Demand-driven analysis. (Model-checking of recursive HFSMs.)

60 Exhaustive Versus Demand Analysis
Exhaustive analysis: all facts at all points. Optimization: concentrate on inner loops. Program-understanding tools: only some facts are of interest. Demand analysis: Does a given fact hold at a given point? Which facts hold at a given point? At which points does a given fact hold?

61 All “appropriate” demands
x y a b YES! ( ) start p(a,b) “Semi-exhaustive”: All “appropriate” demands start main Might b be uninitialized here? Might y be uninitialized here? if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) NO! exit main exit p

62 Experimental Results [Horwitz, Reps, & Sagiv 1995]
53 C programs (200-6,700 lines). For a single fact of interest: demand algorithm always better than exhaustive algorithm. All “appropriate” demands beats exhaustive when percentage of “yes” answers is high. Live variables; truly live variables; constant predicates; . . .

63 Outline
Interprocedural slicing. Interprocedural dataflow analysis. Demand-driven analysis. (Model-checking of recursive HFSMs.)

64 Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs T-reachability/circularity queries Recursive HFSMs Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity Single-entry/multi-exit [or multi-entry/single-exit] Deterministic, multi-entry/multi-exit

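65 Recursive HFSMs: Data Complexity
Columns: SN/SX, SN/MX, MN/SX, MN/MX.
LTL: SN/SX PTIME; SN/MX ?; MN/SX ?; MN/MX ?, O(|k|) [non-rec].
CTL: SN/SX O(|k|); SN/MX PSPACE-comp; MN/SX ?; MN/MX bad.
CTL*: SN/SX O(|k|^2) [L2]; SN/MX bad; MN/SX ?; MN/MX bad.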

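66 Recursive HFSMs: Data Complexity
Columns: SN/SX, SN/MX, MN/SX, MN/MX.
LTL: SN/SX O(|k|); SN/MX O(|k|); MN/SX O(|k|); MN/MX O(|k|^3), O(|k|^2) [lin rec], O(|k|) [det].
CTL: SN/SX O(|k|); SN/MX PSPACE-comp; MN/SX O(|k|); MN/MX bad.
CTL*: SN/SX O(|k|); SN/MX bad; MN/SX O(|k|); MN/MX bad.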

67 CFL-Reachability: Scope of Applicability
Static analysis: slicing, DFA, structure-transmitted dep., points-to analysis. Formal-language theory: CF-, 2DPDA-, 2NPDA-recognition. Attribute-grammar analysis. Verification: model-checking recursive HFSMs. “Ping-pong” protocols [Dolev, Even, & Karp 83].

68 CFL-Reachability: Benefits
Algorithms: demand & exhaustive. Complexity: linear-, quadratic-, cubic-time algorithms; PTIME-completeness; variants that are undecidable. Complementary to: equations, set constraints, types, . . .

69 Most Significant Contributions: 1987-2000
Asymptotically fastest algorithms: interprocedural slicing; interprocedural dataflow analysis. Demand algorithms: all “appropriate” demands may beat exhaustive. Tool for slicing and browsing ANSI C: slices programs as large as 60,000 lines; university research distribution; commercial product: CodeSurfer (GrammaTech, Inc.). CFL-reachability as unifying conceptual model: [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . . Identifies fundamental bottlenecks (e.g., cubic-time “barrier”).

