Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer
Microsoft Research, University of Washington, UC Berkeley


1 Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer
Microsoft Research, University of Washington, UC Berkeley

2 Motivation Static analysis for program verification Complex dataflow analyses are popular –SLAM, ESP, BLAST, CQual, … –Flow-Sensitive –Interprocedural –Expensive! Cut down on “data flow facts” Without losing anything important

3 General Idea If complex analysis is worse than O(N) And you have a cheap analysis that –Is O(N) –Reduces N Then composing them saves time
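The claim on this slide can be made concrete with a short back-of-the-envelope inequality. The cost model below is an assumption for illustration only: the constants a and c, the exponent k, and the reduction factor ρ are not from the talk.

```latex
% Assumed cost model: the complex analysis costs c N^k with k > 1,
% the cheap pre-pass costs a N and shrinks the problem from N to \rho N, 0 < \rho < 1.
\begin{align*}
T_{\text{direct}}(N)   &= c\,N^{k} \\
T_{\text{composed}}(N) &= a\,N + c\,(\rho N)^{k} \\
T_{\text{composed}}(N) < T_{\text{direct}}(N)
  &\iff a\,N < c\,N^{k}\,(1 - \rho^{k})
   \iff N^{k-1} > \frac{a}{c\,(1 - \rho^{k})}
\end{align*}
```

Under these assumptions the composition wins for every sufficiently large N, and the saving grows as ρ shrinks.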

4 Value Flow Graph (VFG) Variant of a points-to graph Encodes the flow of values in the program Conservative approximation Lightweight, fast to compute and query Early queries can safely reduce –data-flow facts considered –program points considered Like slicing a program wrt. value flow

5 Computing a VFG Use a subtyping-based pointer analysis –We used One-Level Flow [Das] Process all assignments –Not just those involving pointers Represent constant values explicitly –Put them in the graph Label graph with source locations –Encodes program slices

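6 Example Points-To Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: points-to graph over x and a. Legend: points-to edge, source, “address” node, expr node.]

7 One Level Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: one-level flow graph over x and a. Legend: flow edge, points-to edge, source, “address” node, expr node.]

8 Value Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: value flow graph with nodes 7, x, and a; edges are labeled with source lines 2, 3, and 2,3. Legend: flow edge, points-to edge, source, “address” node, expr node.]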

9 VFG Properties Computed in almost-linear time Get points-to sets from VFG in linear time –Backwards reachability via flow edges –Gather up all variables Get value flow from VFG in linear time –Backwards reachability via flow edges –Follow points-to edges up one

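10 VFG Query: Points-To of x
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: value flow graph with nodes 7, x, and a; edges are labeled with source lines 2, 3, and 2,3. Legend: flow edge, points-to edge, source, “address” node, expr node.]

11 VFG Query: Value Flow into a
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: value flow graph with nodes 7, x, and a; edges are labeled with source lines 2, 3, and 2,3. Legend: flow edge, points-to edge, source, “address” node, expr node.]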
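The two queries on slides 10 and 11 can be sketched in a few lines of Python. This is a simplified model, not the One-Level Flow construction used in the talk: it resolves stores with a small fixpoint loop instead of unification, and the statement encoding (`addr`, `copy`, `store`), the `&a` naming of address nodes, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

def backward_reach(flow, start):
    """Return (nodes, lines) reachable backwards from `start` over flow edges."""
    seen, lines, stack = {start}, set(), [start]
    while stack:
        node = stack.pop()
        for src, edge_lines in flow.get(node, {}).items():
            lines |= edge_lines          # collect source-line labels (the slice)
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen - {start}, lines

def points_to(flow, var):
    """Variables whose address node flows into `var`, with the lines involved."""
    nodes, lines = backward_reach(flow, var)
    return {n[1:]: set(lines) for n in nodes if n.startswith("&")}

def build_vfg(stmts):
    """stmts: (line, kind, dst, src) with kind in {"addr", "copy", "store"}."""
    flow = defaultdict(dict)             # dst node -> {src node: source lines}

    def add_edge(src, dst, lines):
        known = flow[dst].setdefault(src, set())
        size = len(known)
        known |= lines
        return len(known) > size         # True if the edge gained information

    for line, kind, dst, src in stmts:
        if kind == "addr":               # dst = &src
            add_edge("&" + src, dst, {line})
        elif kind == "copy":             # dst = src (variable or constant)
            add_edge(src, dst, {line})

    changed = True
    while changed:                       # stores depend on points-to sets
        changed = False
        for line, kind, dst, src in stmts:
            if kind == "store":          # *dst = src
                for target, lines in points_to(flow, dst).items():
                    if add_edge(src, target, lines | {line}):
                        changed = True
    return flow

# The three-line example from the slides: x = &a (line 2), *x = 7 (line 3).
example = [(2, "addr", "x", "a"), (3, "store", "x", "7")]
vfg = build_vfg(example)
```

In this model `points_to(vfg, "x")` yields `a` via line 2, and `backward_reach(vfg, "a")` reports that the constant 7 flows into `a` through lines 2 and 3, which matches the `2,3` edge label in the figure.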

12 VFG Summary Computed in almost-linear time Queries complete in linear time Approximates flow of values in program Show two applications that benefit –ESP –SLAM

13 Application 1: ESP Verification tool for large C++ programs Tracks “typestate” of values –Encoded as Finite State Machine –Special Error state Core: interprocedural data-flow engine –Flow sensitive: state at every point Performed bottom-up on call graph Requires function summaries

14 ESP Function Summaries Consider stateful memory locations Summarize function behavior for each loc –Reducing number of locs would be good! –But C has evil casts, so types cannot be used Worst case set of locations: –All globals and formal parameters –Everything transitively reachable from there

15 Reduce Location Set Location L needs to be considered in F if –Some exp E has its state changed in F –Value held by L at entry to F can flow into E Assuming state-changing ops are known Query VFG to find values that flow in

16 ESP Example
    FILE *e, *f, *g, *h;
    void foo() {
      FILE **p;
      int a = (int)h;
      if (…) p = &e;
      else p = &f;
      *p = fopen(…);
    }
Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }

17 ESP Example (same program as slide 16)
(1) Compute VFG
(2) Query value flow on *p
(3) Reduced locations to consider for foo() summary: { e, f }
(4) Reduce lines to consider for dataflow
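The reduction in steps (2) and (3) can be sketched as a backward reachability query over flow edges. The sketch below is illustrative only and is not ESP's implementation: the hand-written edge set for foo(), the `&e` naming of address nodes, and the function names are assumptions.

```python
def backward_closure(flow, starts):
    """All nodes that can flow into any node in `starts` (including `starts`)."""
    seen, stack = set(starts), list(starts)
    while stack:
        node = stack.pop()
        for src in flow.get(node, ()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen

def reduce_locations(flow, candidates, state_changing):
    """Keep only candidate locations whose value can reach a state-changing expr."""
    targets = set()
    for expr in state_changing:
        if expr.startswith("*"):         # resolve *p to the pointees of p
            reach = backward_closure(flow, [expr[1:]])
            targets |= {n[1:] for n in reach if n.startswith("&")}
        else:
            targets.add(expr)
    relevant = backward_closure(flow, targets)
    return [loc for loc in candidates if loc in relevant]

# Hand-built flow edges for foo(): dst -> set of sources (hypothetical encoding).
foo_flow = {
    "p": {"&e", "&f"},                   # p = &e; p = &f
    "a": {"h"},                          # int a = (int)h
    "e": {"fopen"},                      # *p = fopen(…), with p -> e
    "f": {"fopen"},                      # *p = fopen(…), with p -> f
}
candidates = ["e", "*e", "f", "*f", "g", "*g", "h", "*h"]
reduced = reduce_locations(foo_flow, candidates, ["*p"])
```

Here `reduced` is `["e", "f"]`: `h` only flows into the local `a`, which never reaches the state-changing expression `*p`, so it drops out of the summary along with `g`.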

18 ESP Results FILE * output in GCC –140 KLOC, 2149 functions, 66 files, 1068 globals VFG Queries take 200 seconds Reduce average number of locations per function summary from 1100 to <1 –Median of 15 for functions with >0 Verification takes 15 minutes –Infeasible otherwise

19 Application 2: SLAM Validates temporal safety properties –Boolean abstraction –Interprocedural dataflow analysis –Counterexample-driven refinement Convert C program to Boolean program Exhaustive dataflow analysis –No errors? Program is safe. –Real error? Program has a bug. –False error? Add predicates, repeat.

20 Boolean Programs
C Program:
    int x, y;
    x = 5;
    y = 6;
    x = x * 2;
    y = y * 2;
    assert(x < y)
Predicates (important!):
    p means “x == 5”
    q means “x < y”
Boolean Program:
    bool p, q;
    p = 1;
    q = 1;
    p = 0;
    q = 0;
    q = 1;
    assert(q)
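One way to see that the Boolean program on this slide tracks the C program is to run the C statements concretely and evaluate the two predicates after each step. The sketch below does only that; it is not how SLAM computes abstractions, and the function name and trace format are assumptions.

```python
def abstract_trace():
    """Run the slide's C program and record predicate values after each step."""
    predicates = {
        "p": lambda s: s["x"] == 5,      # p means "x == 5"
        "q": lambda s: s["x"] < s["y"],  # q means "x < y"
    }
    state = {"x": 5, "y": 6}             # after: x = 5; y = 6;
    steps = [
        ("x = x * 2", lambda s: {**s, "x": s["x"] * 2}),
        ("y = y * 2", lambda s: {**s, "y": s["y"] * 2}),
    ]
    trace = [("x = 5; y = 6", {n: f(state) for n, f in predicates.items()})]
    for label, step in steps:
        state = step(state)
        trace.append((label, {n: f(state) for n, f in predicates.items()}))
    return trace
```

The recorded values are p = 1, q = 1 after the initial assignments, p = 0, q = 0 after `x = x * 2`, and q = 1 after `y = y * 2`, so `assert(q)` holds, in line with the Boolean program.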

21 SLAM Predicates Hard to come up with good predicates Counterexample-driven refinement –Picks good predicates –Is very slow Taking all possible predicates –Is even slower Want “all the useful” predicates

22 Speeding Up SLAM For a simple subset of C –Similar to “Copy Constants” –Use VFG to find a sufficient set of predicates –Provably sufficient for this subset If this set fails to prove the real program –Fall back on counterexample-driven refinement

23 A Simple Language
    s ::= v_i = n                  // constants
        | v_i = v_j                // variable copy
        | if (*) s_1 else s_2      // condition ignored
        | v_i = fun(v_j, …)        // function call
        | return(v_i)              // function return
        | assert(v_i relop v_j)    // safety property

24 Predicate Discovery High-level idea –Each flow edge in the VFG means “values may flow from X to Y” –Add predicates to see if they do For each assert(v_i relop v_j) –Consider the chain of values flowing to v_i and v_j –Add an equality predicate for each link –Use constants to resolve scoping

25 SLAM Example
    int sel(int f) {
      int r;
      if (*) r = f;
      else r = 3;
      return(r);
    }
    void main() {
      int a, b, c;
      a = 1;
      b = sel(a);
      if (*) c = 2;
      else c = 4;
      assert(b > c);
    }
[Figure: value flow graph over nodes a, 1, f, r, 3, b and 4, c, 2]

26 Predicates For “b” (same program as slide 25)
[Figure: value flow graph over nodes a, 1, f, r, 3, b]
Predicates:
    b == r
    r == 3
    r == f
    f == a
    a == 1

27 Predicates For “b” (same program as slide 25)
[Figure: value flow graph over nodes a, 1, f, r, 3, b]
Predicates:
    b == r
    r == 3
    r == f
    f == a   // no scope!
    a == 1

28 Predicates For “b” (same program as slide 25)
[Figure: value flow graph over nodes a, 1, f, r, 3, b]
Predicates:
    b == r
    r == 3
    r == f
    f == a   // no scope!     f == 1    f == 3
    a == 1                    a == 1    a == 3
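The "one equality predicate per link" step can be sketched as a backward walk over the flow edges that lead into an asserted variable. This is a minimal illustration, not SLAM's code: the edge table is written by hand from the sel/main example, the function name is hypothetical, and the scoping step (replacing `f == a` with predicates over constants) is deliberately left out.

```python
from collections import deque

def link_predicates(flow, var):
    """Emit "dst == src" for every flow edge on a chain leading into `var`."""
    preds, seen, queue = [], {var}, deque([var])
    while queue:
        dst = queue.popleft()
        for src in flow.get(dst, []):
            preds.append(f"{dst} == {src}")
            if src not in seen:          # visit each node once, even on cycles
                seen.add(src)
                queue.append(src)
    return preds

# Flow edges for the example, dst -> list of sources (hand-written assumption).
sel_flow = {
    "a": ["1"],                          # a = 1
    "f": ["a"],                          # call sel(a) binds f
    "r": ["f", "3"],                     # r = f; r = 3
    "b": ["r"],                          # b = sel(a) receives return(r)
    "c": ["2", "4"],                     # c = 2; c = 4
}
preds_b = link_predicates(sel_flow, "b")
preds_c = link_predicates(sel_flow, "c")
```

For `b` this produces the same five predicates listed on slide 26 (in breadth-first order), and for `c` it produces `c == 2` and `c == 4`.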

29 Why does this work? Simple language –No arithmetic, etc. –Just copying around initial values Knowing final values of variables –Completely decides safety condition Still related to real life –Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.

30 Some SLAM Results
    Program    LOC    Original Runtime   Improved Runtime   Generated Predicates   Missing Predicates
    apmbatt    2207   229s               22s                85                     0
    pnpmem     3849   1132s              125s               143                    4
    floppy     7562   1063s              600s               154                    33
    iscsiprt   4543   **                 729s               146                    42
Generated predicates are between all and two-thirds of the necessary predicates. However, since SLAM must iterate once to generate 3-7 missing predicates, the net performance increase is more than linear. Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).

31 Conclusions Complex interprocedural analyses can benefit from inexpensive value-flow VFG encodes value flow –Constructed and queried quickly Prune the set of dataflow facts and program points considered Large net performance increase

