Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis. Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer. Microsoft.

Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das, Sorin Lerner, Mark Seigle, Westley Weimer
Microsoft Research, University of Washington, UC Berkeley

Motivation
Static analysis for program verification
Complex dataflow analyses are popular
– SLAM, ESP, BLAST, CQual, …
– Flow-sensitive
– Interprocedural
– Expensive!
Cut down on “data flow facts”
Without losing anything important

General Idea
If the complex analysis is worse than O(N)
And you have a cheap analysis that
– Is O(N)
– Reduces N
Then composing them saves time
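A rough arithmetic sketch of the argument above. The cubic cost model, the per-fact constant, and the reduction factor are hypothetical numbers chosen only for illustration; the slides do not give the actual complexity of either analysis.

```python
def complex_cost(n, exponent=3):
    """Hypothetical cost of the expensive analysis on n dataflow facts."""
    return n ** exponent


def cheap_cost(n, per_fact=10):
    """Hypothetical cost of the linear pre-analysis: a constant per fact."""
    return per_fact * n


def composed_cost(n, keep_one_in):
    """Run the cheap pass on all n facts, then the complex pass on the rest.

    keep_one_in is an assumed reduction factor: the cheap pass keeps
    one fact out of every keep_one_in.
    """
    remaining = n // keep_one_in
    return cheap_cost(n) + complex_cost(remaining)


n = 1000
print(complex_cost(n))        # expensive analysis alone
print(composed_cost(n, 10))   # cheap pass first, keeping one fact in ten
```

Under these assumed numbers the composed run is far cheaper, because the superlinear term is applied to the reduced N while the pre-pass only adds a linear term.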

Value Flow Graph (VFG)
Variant of a points-to graph
Encodes the flow of values in the program
Conservative approximation
Lightweight, fast to compute and query
Early queries can safely reduce
– data-flow facts considered
– program points considered
Like slicing a program with respect to value flow

Computing a VFG
Use a subtyping-based pointer analysis
– We used One-Level Flow [Das]
Process all assignments
– Not just those involving pointers
Represent constant values explicitly
– Put them in the graph
Label graph with source locations
– Encodes program slices
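A minimal sketch of the construction steps above, under loud assumptions: this is not One-Level Flow but a naive flow-insensitive fixpoint over three statement kinds, and the tuple encoding, the name `build_vfg`, and the kind tags are all hypothetical. It only illustrates processing every assignment, keeping constants as graph nodes, and labeling flow edges with source lines. Loads (`x = *y`) are left out for brevity.

```python
from collections import defaultdict


def build_vfg(statements):
    """Build a toy value flow graph (assumed encoding, not the authors' tool).

    statements: list of (line, kind, lhs, rhs) with kind in
      "addr"  : lhs = &rhs
      "copy"  : lhs = rhs   (rhs is a variable or a constant)
      "store" : *lhs = rhs  (rhs is a variable or a constant)
    Returns (flow, points_to):
      flow      maps (src, dst) -> set of source lines where src may flow to dst
      points_to maps a pointer  -> set of variables it may point to
    """
    flow = defaultdict(set)
    points_to = defaultdict(set)

    def size():
        return (sum(len(v) for v in flow.values()),
                sum(len(v) for v in points_to.values()))

    while True:  # flow-insensitive: statement order is ignored, iterate to a fixpoint
        before = size()
        for line, kind, lhs, rhs in statements:
            if kind == "addr":
                points_to[lhs].add(rhs)
            elif kind == "copy":
                flow[(rhs, lhs)].add(line)
                points_to[lhs] |= points_to.get(rhs, set())
            elif kind == "store":
                for target in sorted(points_to.get(lhs, ())):
                    flow[(rhs, target)].add(line)
                    points_to[target] |= points_to.get(rhs, set())
        if size() == before:
            return dict(flow), dict(points_to)


# The three-line example used on the following slides:
#   1: int a, *x;   2: x = &a;   3: *x = 7;
flow, points_to = build_vfg([(2, "addr", "x", "a"), (3, "store", "x", "7")])
print(flow)            # the constant 7 flows into a at line 3
print(points_to["x"])  # x may point to a
```

The fixpoint loop is quadratic in the worst case; the almost-linear bound claimed in the slides comes from the unification-style machinery of the real analysis, which this toy does not reproduce.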

Example Points-To Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: points-to graph over nodes x and a. Legend: points-to edge, source “address” node, expr node.]

One Level Flow Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: one-level flow graph over nodes x and a. Legend: flow edge, points-to edge, source “address” node, expr node.]

Value Flow Graph
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: value flow graph over nodes 7, x, and a, with edges labeled by source line numbers. Legend: flow edge, points-to edge, source “address” node, expr node.]

VFG Properties
Computed in almost-linear time
Get points-to sets from VFG in linear time
– Backwards reachability via flow edges
– Gather up all variables
Get value flow from VFG in linear time
– Backwards reachability via flow edges
– Follow points-to edges up one
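A minimal sketch of the two queries, assuming a hand-encoded graph for the running three-line example. The encoding (`FLOW_EDGES`, `POINTEE`, `VARIABLES`) and the function names are hypothetical, and the queries follow one plausible reading of the bullets above: both are a backwards walk over flow edges, and the points-to query first steps across the single points-to edge of the pointer.

```python
from collections import defaultdict, deque


def backward_reach(flow_edges, start):
    """All nodes from which a value may flow into `start`, including `start`."""
    preds = defaultdict(set)
    for src, dst in flow_edges:
        preds[dst].add(src)
    seen, work = {start}, deque([start])
    while work:  # each node is expanded once, so the walk is linear in graph size
        node = work.popleft()
        for pred in preds[node]:
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


def points_to_query(flow_edges, pointee, variables, pointer):
    """Variables gathered by walking backwards from the node `pointer` points to."""
    target = pointee.get(pointer)
    if target is None:
        return set()
    return backward_reach(flow_edges, target) & variables


def value_flow_query(flow_edges, node):
    """Everything (constants and variables) whose value may flow into `node`."""
    return backward_reach(flow_edges, node) - {node}


# Assumed encoding of:  1: int a, *x;   2: x = &a;   3: *x = 7;
FLOW_EDGES = [("7", "a")]   # line 3: the constant 7 flows into a
POINTEE = {"x": "a"}        # line 2: x points to a
VARIABLES = {"a", "x"}

print(points_to_query(FLOW_EDGES, POINTEE, VARIABLES, "x"))
print(value_flow_query(FLOW_EDGES, "a"))
```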

VFG Query: Points-To of x
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: value flow graph over nodes 7, x, and a, with edges labeled by source line numbers. Legend: flow edge, points-to edge, source “address” node, expr node.]

VFG Query: Value Flow into a
1: int a, *x;
2: x = &a;
3: *x = 7;
[Figure: value flow graph over nodes 7, x, and a, with edges labeled by source line numbers. Legend: flow edge, points-to edge, source “address” node, expr node.]

VFG Summary
Computed in almost-linear time
Queries complete in linear time
Approximates flow of values in program
Show two applications that benefit
– ESP
– SLAM

Application 1: ESP
Verification tool for large C++ programs
Tracks “typestate” of values
– Encoded as Finite State Machine
– Special Error state
Core: interprocedural data-flow engine
– Flow sensitive: state at every point
Performed bottom-up on call graph
Requires function summaries

ESP Function Summaries
Consider stateful memory locations
Summarize function behavior for each loc
– Reducing number of locs would be good!
– But C has evil casts, so types cannot be used
Worst case set of locations:
– All globals and formal parameters
– Everything transitively reachable from there

Reduce Location Set
Location L needs to be considered in F if
– Some exp E has its state changed in F
– Value held by L at entry to F can flow into E
Assuming state-changing ops are known
Query VFG to find values that flow in

ESP Example
FILE *e, *f, *g, *h;
void foo() {
  FILE **p;
  int a = (int)h;
  if (…) p = &e;
  else p = &f;
  *p = fopen(…);
}
Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }

ESP Example
FILE *e, *f, *g, *h;
void foo() {
  FILE **p;
  int a = (int)h;
  if (…) p = &e;
  else p = &f;
  *p = fopen(…);
}
(1) Compute VFG
(2) Query value flow on *p
(3) Reduced locations to consider for foo() summary: { e, f }
(4) Reduce lines to consider for dataflow
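Steps (2) and (3) can be sketched as follows, assuming a hand-encoded graph for foo(). The encoding, the node name `"fopen(…)"`, and the function `reduced_locations` are hypothetical; the sketch resolves the state-changing expression `*p` to the pointees of `p`, walks backwards over flow edges, and keeps only the candidate locations it meets.

```python
def reduced_locations(points_to, flow_edges, changed_derefs, candidates):
    """Candidate locations whose value may flow into a state-changing expression.

    changed_derefs lists pointers p such that *p has its state changed.
    """
    # The expressions whose state changes are the pointees of each such pointer.
    changed = set()
    for ptr in changed_derefs:
        changed |= points_to.get(ptr, set())

    # Backwards reachability over flow edges from those expressions.
    preds = {}
    for src, dst in flow_edges:
        preds.setdefault(dst, set()).add(src)
    seen, work = set(changed), list(changed)
    while work:
        node = work.pop()
        for pred in preds.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen & candidates


# Assumed encoding of foo() from the slide.
POINTS_TO = {"p": {"e", "f"}}          # p = &e;  p = &f;
FLOW_EDGES = [
    ("h", "a"),                        # int a = (int)h;
    ("fopen(…)", "e"),                 # *p = fopen(…);  with p -> e
    ("fopen(…)", "f"),                 # *p = fopen(…);  with p -> f
]
WORST_CASE = {"e", "*e", "f", "*f", "g", "*g", "h", "*h"}

print(sorted(reduced_locations(POINTS_TO, FLOW_EDGES, ["p"], WORST_CASE)))
```

In this encoding h drops out because its value only flows into a, which is never the target of a state-changing operation, and g is never touched at all.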

ESP Results
FILE * output in GCC
– 140 KLOC, 2149 functions, 66 files, 1068 globals
VFG queries take 200 seconds
Reduce average number of locations per function summary from 1100 to <1
– Median of 15 for functions with >0
Verification takes 15 minutes
– Infeasible otherwise

Application 2: SLAM
Validates temporal safety properties
– Boolean abstraction
– Interprocedural dataflow analysis
– Counterexample-driven refinement
Convert C program to Boolean program
Exhaustive dataflow analysis
– No errors? Program is safe.
– Real error? Program has a bug.
– False error? Add predicates, repeat.

Boolean Programs
C Program:
  int x, y;
  x = 5;
  y = 6;
  x = x * 2;
  y = y * 2;
  assert(x < y)
Predicates (important!):
  p means “x == 5”
  q means “x < y”
Boolean Program:
  bool p, q;
  p = 1;
  q = 1;
  p = 0;
  q = 0;
  q = 1;
  assert(q)
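A small sketch that runs the C program and the Boolean program from this slide in lock-step and checks that each Boolean variable matches its predicate on the concrete state. The pairing of Boolean statements with C statements (for example, `p = 0; q = 0;` standing for `x = x * 2;`) is an assumption about how the two columns line up, and the dictionary encoding is hypothetical.

```python
PREDICATES = {
    "p": lambda s: s["x"] == 5,   # p means "x == 5"
    "q": lambda s: s["x"] < s["y"],  # q means "x < y"
}

C_PROGRAM = [
    lambda s: {**s, "x": 5},           # x = 5;
    lambda s: {**s, "y": 6},           # y = 6;
    lambda s: {**s, "x": s["x"] * 2},  # x = x * 2;
    lambda s: {**s, "y": s["y"] * 2},  # y = y * 2;
]

BOOLEAN_PROGRAM = [
    {"p": True},                # p = 1;
    {"q": True},                # q = 1;
    {"p": False, "q": False},   # p = 0; q = 0;
    {"q": True},                # q = 1;
]


def abstraction_agrees():
    """Return (agrees, final Boolean state) after running both programs in step."""
    state, bools = {}, {}
    for c_stmt, b_stmt in zip(C_PROGRAM, BOOLEAN_PROGRAM):
        state = c_stmt(state)
        bools.update(b_stmt)
        # Only check Boolean variables that have been assigned so far.
        for name, value in bools.items():
            if PREDICATES[name](state) != value:
                return False, bools
    return True, bools


agrees, final = abstraction_agrees()
print(agrees, final["q"])  # final["q"] is what assert(q) checks
```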

SLAM Predicates
Hard to come up with good predicates
Counterexample-driven refinement
– Picks good predicates
– Is very slow
Taking all possible predicates
– Is even slower
Want “all the useful” predicates

Speeding Up SLAM
For a simple subset of C
– Similar to “Copy Constants”
– Use VFG to find a sufficient set of predicates
– Provably sufficient for this subset
If this set fails to prove the real program
– Fall back on counterexample-driven refinement

A Simple Language
s ::= v_i = n                // constants
    | v_i = v_j              // variable copy
    | if (*) s_1 else s_2    // condition ignored
    | v_i = fun(v_j, …)      // function call
    | return(v_i)            // function return
    | assert(v_i op v_j)     // safety property

Predicate Discovery
High-level idea
– Each flow edge in the VFG means “values may flow from X to Y”
– Add predicates to see if they do
For each assert(v_i op v_j)
– Consider the chain of values flowing to v_i, v_j
– Add an equality predicate for each link
– Use constants to resolve scoping

SLAM Example
int sel(int f) {
  int r;
  if (*) r = f;
  else r = 3;
  return(r);
}
void main() {
  int a, b, c;
  a = 1;
  b = sel(a);
  if (*) c = 2;
  else c = 4;
  assert(b > c);
}
[Figure: value flow graph over nodes a, 1, f, r, 3, b and 4, c, 2.]

Predicates For “b”
[Same sel()/main() program as on the SLAM Example slide. Figure: value flow graph over nodes a, 1, f, r, 3, b.]
Predicates:
b == r
r == 3
r == f
f == a
a == 1

Predicates For “b”
[Same sel()/main() program as on the SLAM Example slide. Figure: value flow graph over nodes a, 1, f, r, 3, b.]
Predicates:
b == r
r == 3
r == f
f == a // no scope!
a == 1

Predicates For “b”
[Same sel()/main() program as on the SLAM Example slide. Figure: value flow graph over nodes a, 1, f, r, 3, b.]
Predicates:
b == r
r == 3
r == f
f == a // no scope!    f == 1    f == 3
a == 1                 a == 1    a == 3
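The link-generation step for this example can be sketched as a backwards walk that emits one equality predicate per flow edge. The edge list is a hand encoding of the graph drawn on the slide and the name `link_predicates` is hypothetical. The sketch stops before scoping is resolved: it still emits `f == a`, and the replacement of that out-of-scope link by comparisons against constants is not modeled.

```python
# Assumed encoding of the value flow graph for sel()/main(): (source, destination).
FLOW_EDGES = [
    ("1", "a"),   # a = 1;
    ("a", "f"),   # b = sel(a);  actual a flows into formal f
    ("f", "r"),   # r = f;
    ("3", "r"),   # r = 3;
    ("r", "b"),   # return(r);   returned value flows into b
    ("2", "c"),   # c = 2;
    ("4", "c"),   # c = 4;
]


def link_predicates(flow_edges, var):
    """One 'dst == src' predicate for every flow edge on a chain into `var`."""
    preds = {}
    for src, dst in flow_edges:
        preds.setdefault(dst, []).append(src)
    out, seen, work = [], {var}, [var]
    while work:
        dst = work.pop()
        for src in preds.get(dst, []):
            out.append(f"{dst} == {src}")
            if src not in seen:   # expand each node once so every edge is emitted once
                seen.add(src)
                work.append(src)
    return out


print(sorted(link_predicates(FLOW_EDGES, "b")))
print(sorted(link_predicates(FLOW_EDGES, "c")))
```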

Why does this work?
Simple language
– No arithmetic, etc.
– Just copying around initial values
Knowing final values of variables
– Completely decides safety condition
Still related to real life
– Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
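To illustrate that final values decide the safety condition in this language, the sketch below enumerates every resolution of the two ignored conditions in the sel()/main() example and evaluates `assert(b > c)` on each. The explicit Boolean parameters standing in for `if (*)` are an assumption of the sketch, not part of the simple language.

```python
from itertools import product


def sel(f, take_copy):
    """int sel(int f): if (*) r = f; else r = 3; return(r);"""
    return f if take_copy else 3


def final_values():
    """(b, c) at the assert for every way of resolving the two if (*) choices."""
    results = []
    for take_copy, pick_two in product([True, False], repeat=2):
        a = 1
        b = sel(a, take_copy)
        c = 2 if pick_two else 4
        results.append((b, c))
    return results


def assertion_holds_on_all_paths():
    """True only if assert(b > c) holds for every possible pair of final values."""
    return all(b > c for b, c in final_values())


print(sorted(final_values()))
print(assertion_holds_on_all_paths())
```

Because values are only copied, b can end up as 1 or 3 and c as 2 or 4, so a finite set of equality predicates against those constants is enough to decide the assertion.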

Some SLAM Results
Program | LOC | Original Runtime | Improved Runtime | Generated Predicates | Missing Predicates
apmbatt s22s850
pnpmem s125s1434
floppy s600s15433
iscsiprt4543**729s14642
The generated predicates cover between two-thirds and all of the necessary predicates.
However, since SLAM must iterate once to generate 3-7 missing predicates, the net performance increase is more than linear.
Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).

Conclusions
Complex interprocedural analyses can benefit from inexpensive value-flow
VFG encodes value flow
– Constructed and queried quickly
Prune the set of dataflow facts and program points considered
Large net performance increase