Presentation is loading. Please wait.

Presentation is loading. Please wait.

Program Analysis via Graph Reachability

Similar presentations


Presentation on theme: "Program Analysis via Graph Reachability"— Presentation transcript:

1 Program Analysis via Graph Reachability
Thomas Reps University of Wisconsin Joint work with S. Horwitz, M. Sagiv, G. Rosay, D. Melski, M. Benedikt, and P. Godefroid

2 1987 Slicing & Applications CFL Reachability 1993 Dataflow Analysis Demand Algorithms 1994 Structure- Transmitted Dependences 1995 Set Constraints 1996 1997 1998

3 More Recently Flow-insensitive points-to analysis
An undecidability result context-sensitive structure-transmitted data-dependence analysis Model checking of recursive hierarchical finite-state machines “infinite”-state systems CFL-reachability/circularity queries linear-, quadratic-, and cubic-time algorithms

4 Outline Interprocedural slicing Interprocedural dataflow analysis
Demand-driven analysis Linear cubic undecidable Model-checking of recursive HFSMs

5 Backward slice with respect to statement “printf(“%d\n”,i)”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Backward slice with respect to statement “printf(“%d\n”,i)”

6 Backward slice with respect to statement “printf(“%d\n”,i)”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Backward slice with respect to statement “printf(“%d\n”,i)”

7 Backward slice with respect to statement “printf(“%d\n”,i)”
Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); Backward slice with respect to statement “printf(“%d\n”,i)”

8 Forward slice with respect to statement “sum = 0”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Forward slice with respect to statement “sum = 0”

9 Forward slice with respect to statement “sum = 0”
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Forward slice with respect to statement “sum = 0”

10 What Are Slices Useful For?
Understanding Programs What is affected by what? Restructuring Programs Isolation of separate “computational threads” Program Specialization and Reuse Slices = specialized programs Only reuse needed slices Program Differencing Compare slices to identify changes Testing What new test cases would improve coverage? What regression tests must be rerun after a change?

11 Control Flow Graph int main() { int sum = 0; int i = 1;
while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter F sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T sum = sum + i i = i + i

12 Flow Dependence Graph Flow dependence p q Value of variable
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Flow dependence p q Value of variable assigned at p may be used at q. Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i

13 Control Dependence Graph
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Control dependence q is reached from p if condition p is true (T), not otherwise. p q T Similar for false (F). p q F Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

14 Program Dependence Graph (PDG)
int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Control dependence Flow dependence Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

15 Program Dependence Graph (PDG)
int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Opposite Order Same PDG Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

16 Backward Slice int main() { int sum = 0; int i = 1;
while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

17 Backward Slice (2) int main() { int sum = 0; int i = 1;
while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

18 Backward Slice (3) int main() { int sum = 0; int i = 1;
while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

19 Backward Slice (4) int main() { int sum = 0; int i = 1;
while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

20 Slice Extraction int main() { int i = 1; while (i < 11) {
i = i + 1; } printf(“%d\n”,i); Enter T T T T i = 1 while(i < 11) printf(i) T i = i + i

21 CodeSurfer

22

23

24

25 Interprocedural Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to statement “printf(“%d\n”,i)”

26 Interprocedural Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to statement “printf(“%d\n”,i)”

27 Interprocedural Slice
int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Superfluous components included by Weiser’s slicing algorithm [TSE 84] Left out by algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90]

28 System Dependence Graph (SDG)
Enter main Call p Call p Enter p

29 SDG for the Sum Program xin = sum yin = i sum = xout xin = i yin= 1
Enter main sum = 0 i = 1 while(i < 11) printf(sum) printf(i) Call add Call add xin = sum yin = i sum = xout xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x

30 Interprocedural Backward Slice
Enter main Call p Call p Enter p

31 Interprocedural Backward Slice (2)
Enter main Call p Call p Enter p

32 Interprocedural Backward Slice (3)
Enter main Call p Call p Enter p

33 Interprocedural Backward Slice (4)
Enter main Call p Call p Enter p

34 Interprocedural Backward Slice (5)
Enter main Call p Call p Enter p

35 Interprocedural Backward Slice (6)
Enter main Call p Call p ) ( [ ] Enter p

36 Matched-Parenthesis Path
) ( ) [

37 Interprocedural Backward Slice (6)
Enter main Call p Call p Enter p

38 Interprocedural Backward Slice (7)
Enter main Call p Call p Enter p

39 Slice Extraction Enter main Call p Enter p

40 Slice of the Sum Program
Enter main i = 1 while(i < 11) printf(i) Call add xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x

41 CFL-Reachability [Yannakakis 90]
G: Graph L: A context-free language L-path from s to t iff Running time: O(N 3)

42 Ordinary Graph Reachability
( e [ ] ) matched | e | [ matched ] | ( matched ) | matched matched CFL-Reachability ( t ) e [ ] e e [ e ] [ ] e e s t Ordinary Graph Reachability s t s t s

43 CFL-Reachability via Dynamic Programming
Graph Grammar A B C B C A

44 Interprocedural Slicing via CFL-Reachability
Graph: System dependence graph L: L(matched) Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

45 Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
CFL-reachability System dependence graph: N nodes, E edges Running time: O(N 3) System dependence graph Special structure Running time: O(E + CallSites % MaxParams3)

46 Outline Interprocedural slicing Interprocedural dataflow analysis
Demand-driven analysis Linear cubic undecidable Model-checking of recursive HFSMs

47 Precise Intraprocedural Analysis
start n

48 ( ) ] ( start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b)
return from p return from p printf(y) printf(b) exit main exit p

49 Precise Interprocedural Analysis
ret start n ( ) [Sharir & Pnueli 81]

50 Representing Dataflow Functions
b c Identity Function a b c Constant Function

51 Representing Dataflow Functions
b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function

52 x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

53 Composing Dataflow Functions
b c a b c a a b c

54 ( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3
Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

55 Off Limits! matched  matched matched
| (i matched )i  i  CallSites | edge |  stack ) ( stack Off Limits!

56 Off Limits! unbalLeft  matched unbalLeft
| (i unbalLeft  i  CallSites |  stack ) ( stack Off Limits! (

57 Interprocedural Dataflow Analysis via CFL-Reachability
Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from

58 Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
CFL-reachability Exploded control-flow graph: ND nodes Running time: O(N3D3) Exploded control-flow graph Special structure Running time: O(ED3) Typically: E l N, hence O(ED3) l O(ND3) “Gen/kill” problems: O(ED)

59 Outline Interprocedural slicing Interprocedural dataflow analysis
Demand-driven analysis Linear cubic undecidable Model-checking of recursive HFSMs

60 Exhaustive Versus Demand Analysis
Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest Demand analysis: Does a given fact hold at a given point? Which facts hold at a given point? At which points does a given fact hold?

61 All “appropriate” demands
x y a b YES! ( ) start p(a,b) “Semi-exhaustive”: All “appropriate” demands start main Might b be uninitialized here? Might y be uninitialized here? if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) NO! exit main exit p

62 Experimental Results [Horwitz , Reps, & Sagiv 1995]
53 C programs (200-6,700 lines) For a single fact of interest: Demand algorithm always better than exhaustive algorithm All “appropriate” demands beats exhaustive when percentage of “yes” answers is high Live variables Truly live variables Constant predicates . . .

63 Outline Interprocedural slicing Interprocedural dataflow analysis
Demand-driven analysis Linear cubic undecidable Model-checking of recursive HFSMs

64 Structure-Transmitted Dependences [Reps1995]
McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); v w y x

65 Dependences + Matched Paths?
Enter main x y hd hd-1 ( ) w=cons(x,y) Call p Call p w w Enter p w v = car(w)

66 Undecidable! [Reps, TOPLAS 00]
hd hd-1 ( ) Interleaved Parentheses!

67 Outline Interprocedural slicing Interprocedural dataflow analysis
Demand-driven analysis Linear cubic undecidable Model-checking of recursive HFSMs

68 Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs T-reachability/circularity queries Recursive HFSMs Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity Single-entry/multi-exit [or multi-entry/single-exit] Deterministic, multi-entry/multi-exit

69 T-Cyclicity in Hierarchical Kripke Structures
SN/SX SN/MX MN/SX MN/MX non-rec: O(|k|) non-rec: O(|k|) ? ? rec: O(|k3|) rec: ? SN/SX SN/MX MN/SX MN/MX O(|k|) O(|k|) O(|k|) O(|k3|) O(|k2|) [lin rec] O(|k|) [det]

70 Recursive HFSMs: Data Complexity
SN/SX SN/MX MN/SX MN/MX LTL non-rec: O(|k|) non-rec: O(|k|) ? ? rec: P-time rec: ? CTL O(|k|) bad ? bad CTL* O(|k2|) [L2] bad ? bad

71 Recursive HFSMs: Data Complexity
SN/SX SN/MX MN/SX MN/MX LTL O(|k|) O(|k|) O(|k|) O(|k3|) O(|k2|) [lin rec] O(|k|) [det] CTL O(|k|) bad O(|k|) bad CTL* O(|k|) bad O(|k|) bad Not Dual Problems!

72 CFL-Reachability: Scope of Applicability
Static analysis Slicing, DFA, structure-transmitted dep., points-to analysis Formal-language theory CF-, 2DPDA-, 2NPDA-recognition Attribute-grammar analysis Verification Model-checking recursive HFSMs “Ping-pong” protocols [Dolev, Even, & Karp 83]

73 CFL-Reachability: Benefits
Algorithms Demand & exhaustive Complexity Linear-, quadratic-, cubic-time algorithms PTIME-completeness Variants that are undecidable Complementary to Equations Set constraints Types . . .

74 But . . .  Model checking Dataflow analysis
Huge graphs (10100 reachable states) Reachability/circularity queries Represent implicitly (OBDDs) Dataflow analysis Large graphs e.g., Stmts Vars ( 1011) CFL-reachability queries [Reps,Horwitz,Sagiv 95] OBDDs blew up [Siff & Reps 95 (unpub.)] . . . yes, we tried the usual tricks . . .

75 Most Significant Contributions: 1987-2000
Asymptotically fastest algorithms Interprocedural slicing Interprocedural dataflow analysis Demand algorithms All “appropriate” demands may beat exhaustive Tool for slicing and browsing ANSI C Slices programs as large as 60,000 lines University research distribution Commercial product: CodeSurfer (GrammaTech, Inc.) CFL-reachability as unifying conceptual model [Kou 77], [Holley&Rosen 81], [Cooper&Kennedy 88], [Callahan 88], [Horwitz,Reps,&Binkley 88], . . . Identifies fundamental bottlenecks (e.g., cubic-time “barrier”)


Download ppt "Program Analysis via Graph Reachability"

Similar presentations


Ads by Google