Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis: Data-flow frameworks –Classic.

Presentation transcript:

Course Outline
Traditional Static Program Analysis
–Theory: Compiler Optimizations; Control Flow Graphs; Data-flow Analysis: Data-flow frameworks
–Classic analyses and applications
Software Testing
Dynamic Program Analysis

Announcements
No class on Monday
–Tuesday follows Tuesday schedule
Homework 1 due today
Homework 2 posted
–Due Monday, February 28th
Requests for CS accounts sent to labstaff

Outline
Data-flow frameworks
–Monotone frameworks
–Distributive frameworks
–A non-distributive example: Points-to Analysis
–The “Maximal Fixed Point” (MFP) solution
–The “Meet Over all Paths” (MOP) solution
Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.3

Monotone Dataflow Frameworks
Generic data-flow equations:
  in(i) = ∨ { out(m) : m in pred(i) }
  out(i) = f_i(in(i))
Parameters:
–Property space: in(i) and out(i) are elements of a property space
  Combination operator ∨: ∪ for may problems and ∩ for must problems
  Initial values are set to the 0 (smallest element) of the property space
–Transfer functions: f_i is associated with node i
If we instantiate these parameters in a certain way, then our analysis is an instance of the monotone dataflow framework.

Monotone Frameworks: Requirements
The property space
–Is a complete lattice L under partial order ≤, where L satisfies the Ascending Chain Condition (i.e., all ascending chains are finite)
The combination operator ∨
–Is the join (∨, pronounced “vee”) of L
Initial values are set to the 0 of L
–Reaching Definitions: Property space? Combination operator?
–Available Expressions: Property space? Combination operator?

Monotone Frameworks: Requirements
The transfer functions: f_i : L → L
Formally, there is a function space F such that
1. F contains all f_i
2. F contains the identity function id(x) = x
3. F is closed under composition
4. Each f_i is monotone

Monotonicity
It is defined as
(1) a ≤ b ⇒ f(a) ≤ f(b)
An equivalent definition is
(2) f(x) ∨ f(y) ≤ f(x ∨ y)
Lemma: The two definitions are equivalent. First, we show that (1) implies (2). Second, we show that (2) implies (1).
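The two directions of the lemma can be sketched as follows. This is only a proof outline, and it assumes that ∨ is the join (least upper bound) of L, so that a ≤ b holds exactly when a ∨ b = b.

```latex
\begin{align*}
&(1)\Rightarrow(2):\quad x \le x \vee y \ \text{and}\ y \le x \vee y
  \;\Longrightarrow\; f(x) \le f(x \vee y) \ \text{and}\ f(y) \le f(x \vee y) \\
&\qquad\Longrightarrow\; f(x) \vee f(y) \le f(x \vee y)
  \quad\text{(the join is the least upper bound).} \\[4pt]
&(2)\Rightarrow(1):\quad a \le b \;\Longrightarrow\; a \vee b = b
  \;\Longrightarrow\; f(a) \le f(a) \vee f(b) \le f(a \vee b) = f(b).
\end{align*}
```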

Distributive Frameworks
A distributive framework is a monotone framework whose transfer functions are distributive: f(x) ∨ f(y) = f(x ∨ y).

The four classical dataflow problems
Let AExp denote all expressions in the program. Let 2^AExp denote the powerset of AExp.
Let Def denote all definitions in the program. Let 2^Def denote the powerset of Def.
                Reaching Definitions                    Available Expressions
L, elements     2^Def                                   2^AExp
L, ≤            2^Def, ⊆                                2^AExp, ⊇
L, ∨            2^Def, ∪                                2^AExp, ∩
f_i             out(i) = (in(i) - kill(i)) ∪ gen(i)     out(i) = (in(i) - kill(i)) ∪ gen(i)
                kill(i) = ?  gen(i) = ?                 kill(i) = ?  gen(i) = ?
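One standard way to fill in kill(i) and gen(i) for Reaching Definitions is sketched below. The three-node program and the `DEFS` table are hypothetical, and the sketch assumes each node is a single assignment identified by its node number.

```python
# Hypothetical program: node id -> variable assigned at that node.
# Each assignment is one definition, named by its node id.
DEFS = {1: "x", 2: "y", 3: "x"}


def gen(i):
    """Definitions created at node i: just definition i itself."""
    return frozenset({i})


def kill(i):
    """All other definitions of the variable that node i assigns."""
    return frozenset(d for d, var in DEFS.items() if var == DEFS[i] and d != i)


def transfer(i, in_set):
    """out(i) = (in(i) - kill(i)) U gen(i)."""
    return (frozenset(in_set) - kill(i)) | gen(i)


# Definition 1 of x is killed by node 3; definition 3 is generated.
print(sorted(transfer(3, {1, 2})))
```

For Available Expressions the same shape applies, with gen(i) the expressions evaluated at i and kill(i) the expressions that use the variable assigned at i.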

Distributive Frameworks
Each of the four problems is an instance of a distributive framework.
–First, prove monotonicity
–Second, prove distributivity of the functions

Distributivity
Each of the four problems is an instance of a distributive framework.
–First, prove monotonicity: if in’(i) ≤ in”(i) then out’(i) ≤ out”(i)
  For Reaching Definitions we have to show: if in’(i) ⊆ in”(i) then (in’(i) ∩ pres(i)) ∪ gen(i) ⊆ (in”(i) ∩ pres(i)) ∪ gen(i), where pres(i) is the complement of kill(i)
–Second, prove distributivity:
  ((in’(i) ∪ in”(i)) ∩ pres(i)) ∪ gen(i) = ((in’(i) ∩ pres(i)) ∪ gen(i)) ∪ ((in”(i) ∩ pres(i)) ∪ gen(i))
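The two proof obligations can also be checked by brute force on a small universe. The sketch below is an illustration rather than a proof: it assumes a hypothetical universe of three definitions and tries every choice of pres(i) and gen(i) over it.

```python
from itertools import chain, combinations


def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def make_transfer(pres, gen):
    """Reaching Definitions transfer function: f(s) = (s ∩ pres) ∪ gen."""
    return lambda s: (s & pres) | gen


def is_monotone(f, lattice):
    return all(f(a) <= f(b) for a in lattice for b in lattice if a <= b)


def is_distributive(f, lattice):
    return all(f(a | b) == f(a) | f(b) for a in lattice for b in lattice)


UNIVERSE = frozenset({"d1", "d2", "d3"})  # hypothetical definitions
LATTICE = powerset(UNIVERSE)

ALL_OK = all(
    is_monotone(make_transfer(pres, gen), LATTICE)
    and is_distributive(make_transfer(pres, gen), LATTICE)
    for pres in LATTICE
    for gen in LATTICE
)
print(ALL_OK)
```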

Points-to Analysis
Points-to Analysis is a fundamental analysis. It computes the memory locations that a pointer variable may point to.
Example 1:
  int a, b;
  int *p1, *p2;
  p1 = &a;
  p2 = p1;
  *p2 = 1;
Example 2:
  int a, b = 15;
  int *p1, *p2;
  int **p3;
  p3 = &p1;
  p1 = &a;
  p2 = *p3;
  *p2 = b;

Points-to Analysis: Monotone, Non-distributive Analysis
Lattice: the set of all points-to graphs Pt
≤ is inclusion: Pt1 ≤ Pt2 if Pt1 is a subgraph of Pt2
∨ is union: Pt1 ∨ Pt2 = Pt1 ∪ Pt2
Transfer functions are defined on four kinds of statements:
–(1) f(p=&q): “kill” all points-to edges from p, and “generate” a new points-to edge from p to q
–(2) f(p=q): “kill” all points-to edges from p, and “generate” new points-to edges from p to every x such that q points to x
–(3) f(p=*q): “kill” all points-to edges from p, and “generate” new points-to edges from p to every x such that there exists y where q points to y and y points to x
–(4) f(*p=q): do not perform kill. Can you think of a reason why? “Generate” new points-to edges from every y to every x, such that p points to y and q points to x.
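The four transfer functions can be sketched in a few lines. The representation is an assumption made for illustration: a points-to graph is a frozenset of (source, target) edges, and the function names `f_addr`, `f_copy`, `f_load` and `f_store` are hypothetical.

```python
def pts(graph, v):
    """Set of nodes that v points to in the points-to graph."""
    return {dst for src, dst in graph if src == v}


def without_edges_from(graph, p):
    """'Kill': drop every points-to edge leaving p."""
    return {(s, d) for s, d in graph if s != p}


def f_addr(graph, p, q):   # (1) p = &q
    return frozenset(without_edges_from(graph, p) | {(p, q)})


def f_copy(graph, p, q):   # (2) p = q
    new = {(p, x) for x in pts(graph, q)}
    return frozenset(without_edges_from(graph, p) | new)


def f_load(graph, p, q):   # (3) p = *q
    new = {(p, x) for y in pts(graph, q) for x in pts(graph, y)}
    return frozenset(without_edges_from(graph, p) | new)


def f_store(graph, p, q):  # (4) *p = q, no kill
    new = {(y, x) for y in pts(graph, p) for x in pts(graph, q)}
    return frozenset(set(graph) | new)


# Example 2 above: p3 = &p1; p1 = &a; p2 = *p3; *p2 = b;
g = frozenset()
g = f_addr(g, "p3", "p1")
g = f_addr(g, "p1", "a")
g = f_load(g, "p2", "p3")
g = f_store(g, "p2", "b")  # b is an int and points to nothing: no new edges
print(sorted(g))
```

New edges are always computed from the incoming graph before the kill is applied, so a statement such as p = p leaves the graph unchanged.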

Monotone non-distributive Analysis
First, we show that the framework is monotone
–I.e., for each of the four transfer functions we have to show that if Pt1 ≤ Pt2, then f(Pt1) ≤ f(Pt2)
Second, we show that the framework is not distributive
–It is easy to find f, Pt1 and Pt2 such that f(Pt1 ∨ Pt2) ≠ f(Pt1) ∨ f(Pt2)
Another example is constant propagation

Non-distributivity of Points-to Analysis
One branch executes p=&x; q=&y, the other executes p=&z; q=&w, and both reach the statement *p=q, whose transfer function is f.
Pt1: p→x, q→y
Pt2: p→z, q→w
f(Pt1): p→x, q→y, x→y
f(Pt2): p→z, q→w, z→w
f(Pt1) ∨ f(Pt2): p→x, p→z, q→y, q→w, x→y, z→w
Pt1 ∨ Pt2: p→x, p→z, q→y, q→w
f(Pt1 ∨ Pt2): p→x, p→z, q→y, q→w, x→y, x→w, z→y, z→w
What f does: adds edges from each variable that p points to (i.e., x and z) to each variable that q points to (i.e., y and w). 4 new edges: from x to y and w, and from z to y and w.
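The counterexample can be replayed mechanically. This is a minimal sketch that assumes points-to graphs are frozensets of (source, target) edges; `f_store` is a hypothetical name for the transfer function of *p=q.

```python
def pts(graph, v):
    """Set of nodes that v points to."""
    return {dst for src, dst in graph if src == v}


def f_store(graph, p, q):
    """Transfer function of *p = q: no kill, add y -> x for p -> y and q -> x."""
    return frozenset(graph) | {(y, x) for y in pts(graph, p) for x in pts(graph, q)}


PT1 = frozenset({("p", "x"), ("q", "y")})   # after p=&x; q=&y
PT2 = frozenset({("p", "z"), ("q", "w")})   # after p=&z; q=&w

join_of_results = f_store(PT1, "p", "q") | f_store(PT2, "p", "q")
result_of_join = f_store(PT1 | PT2, "p", "q")

# Edges that appear only when the join is taken before applying f.
spurious = result_of_join - join_of_results
print(sorted(spurious))
```

The two extra edges, x→w and z→y, correspond to no actual execution path, which is exactly the precision lost by joining first.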

The Maximal Fixed Point (MFP)¹
/* Initialize to initial values */
in(1) := InitialValue;                    /* RD: in_RD(1) := UNDEF */
for m := 2 to n do in(m) := 0;            /* RD: in_RD(m) := Ø */
W := {1,2,…,n}                            /* put every node on the worklist */
while W ≠ Ø do {
  remove i from W;
  out(i) := f_i(in(i));                   /* RD: out_RD(i) := (in_RD(i) ∩ pres(i)) ∪ gen(i) */
  for j in successors(i) do
    if not (out(i) ≤ in(j)) then {        /* RD: if out_RD(i) is not a subset of in_RD(j) */
      in(j) := out(i) ∨ in(j);            /* RD: in_RD(j) := out_RD(i) ∪ in_RD(j) */
      if j not in W then add j to W
    }
}
¹ The Least Fixed Point (LFP) actually…
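The worklist algorithm, instantiated for Reaching Definitions, can be sketched as below. The function name, its parameters, and the four-node loop CFG with definitions `d1` and `d3` are all hypothetical; `pres[i]` is assumed to be the complement of kill(i).

```python
from collections import deque


def reaching_definitions(nodes, succ, gen, pres, entry, init=frozenset()):
    """Worklist solver for out(i) = (in(i) ∩ pres(i)) ∪ gen(i), with join = union."""
    in_ = {n: frozenset() for n in nodes}   # 0 of the lattice is the empty set
    in_[entry] = frozenset(init)
    out = {n: frozenset() for n in nodes}
    work = deque(nodes)                     # put every node on the worklist
    queued = set(nodes)
    while work:
        i = work.popleft()
        queued.discard(i)
        out[i] = (in_[i] & pres[i]) | gen[i]
        for j in succ.get(i, ()):
            if not out[i] <= in_[j]:        # out(i) is not below in(j)
                in_[j] = in_[j] | out[i]
                if j not in queued:
                    work.append(j)
                    queued.add(j)
    return in_, out


# Hypothetical CFG: 1: x := 0 (d1); 2: loop test; 3: x := x + 1 (d3); 4: exit.
ALL = frozenset({"d1", "d3"})
NODES = [1, 2, 3, 4]
SUCC = {1: [2], 2: [3, 4], 3: [2]}
GEN = {1: frozenset({"d1"}), 2: frozenset(), 3: frozenset({"d3"}), 4: frozenset()}
PRES = {1: ALL - {"d3"}, 2: ALL, 3: ALL - {"d1"}, 4: ALL}

IN, OUT = reaching_definitions(NODES, SUCC, GEN, PRES, entry=1)
print({n: sorted(IN[n]) for n in NODES})
```

Both definitions of x reach the loop test and the exit, while only d3 leaves node 3. The `queued` set is just a cheap membership test for the "if j not in W" check.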

Properties of the algorithm
Lemma 1: The algorithm terminates.
Sketch of the proof: We have in_k(j) ≤ in_{k+1}(j), and since L has ACC, in(j) changes at most O(h) times. Thus, each j is put on W at most O(h) times (h is the height of the lattice L).
Complexity: Each time node j is processed, the analysis examines its e_out(j) out-edges. Thus, the number of basic operations is bounded by h*(e_out(1) + … + e_out(N)) = O(h*E).
We can do better on reducible graphs.

Properties of the Algorithm
Lemma 2: The algorithm computes the least solution of the dataflow equations.
–For every node i, MFP computes a solution MFP(i) = {in(i), out(i)} such that every other solution {in’(i), out’(i)} of the dataflow equations is “larger” than the MFP
Lemma 3: The algorithm computes a correct (safe) solution.

Example
1. z := x+y
2. if (z > 500)
3. skip
Equations:
  in_AE(1) = Ø
  out_AE(1) = (in_AE(1) - E_z) ∪ {x+y}
  in_AE(2) = out_AE(1) ∨ out_AE(3)
  out_AE(2) = in_AE(2)
  in_AE(3) = out_AE(2)
  out_AE(3) = in_AE(3)
Equivalent to: in_AE(2) = {x+y} ∨ in_AE(2), and recall that ∨ is ∩ (i.e., set intersection).
This equation has two solutions: in_AE(2) = {x+y} and in_AE(2) = Ø.
That is why we needed to initialize in_AE(2) and the other initial values to the universal set of expressions (the 0 of the Available Expressions lattice), rather than to the more intuitive empty set.
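The effect of the initial value can be seen by iterating the example's equations from both starting points. This is a sketch under two assumptions: the only expression tracked is x+y, and E_z (the expressions that use z) is therefore empty. The name `solve` is hypothetical.

```python
UNIVERSE = frozenset({"x+y"})  # assumed set of tracked expressions


def solve(initial):
    """Iterate the example's equations to a fixed point, starting from `initial`."""
    in_ = {1: frozenset(), 2: initial, 3: initial}
    out = {1: initial, 2: initial, 3: initial}
    changed = True
    while changed:
        new_out = {
            1: in_[1] | {"x+y"},            # (in(1) - E_z) ∪ {x+y}, with E_z empty
            2: in_[2],
            3: in_[3],
        }
        new_in = {
            1: frozenset(),
            2: new_out[1] & new_out[3],     # join is set intersection
            3: new_out[2],
        }
        changed = (new_in, new_out) != (in_, out)
        in_, out = new_in, new_out
    return in_, out


from_universe, _ = solve(UNIVERSE)
from_empty, _ = solve(frozenset())
print(sorted(from_universe[2]), sorted(from_empty[2]))
```

Both results satisfy the equations, but only the run that starts from the universal set reports x+y as available at node 2.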

Meet Over All Paths (MOP) Solution
The desired dataflow information at n is obtained by traversing ALL PATHS from ρ to n.
For every path p = (ρ, n_1, n_2, ..., n_k) we compute f_{n_k}(…f_{n_2}(f_{n_1}(init(ρ))))
The MOP at entry of n is ∨ { f_{n_k}(…f_{n_2}(f_{n_1}(init(ρ)))) : p in paths from ρ to n }
The MOP is the best summary of dataflow facts possible to compute with this static analysis.

MOP vs. MFP
For distributive functions the dataflow analysis can merge paths (p1, p2) without loss of precision!
–E.g., f_p1(0) need not be calculated explicitly
–MFP = MOP
Due to Kam and Ullman, 1976, 1977: This is not true for functions that are monotone but not distributive.
Lemma 3: The MFP approximates the MOP for general monotone functions: MFP ≥ MOP
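The gap between the two solutions can be computed directly on an acyclic CFG, where the paths can be enumerated. The sketch below reuses the points-to counterexample on a hypothetical diamond-shaped CFG (node 0 is the entry ρ) and compares the two solutions at the exit of node 5, i.e. at the entry of any successor. All names are illustrative.

```python
from functools import reduce


def pts(g, v):
    return {d for s, d in g if s == v}


def addr_of(p, q):   # transfer function of p = &q
    return lambda g: frozenset({e for e in g if e[0] != p} | {(p, q)})


def store(p, q):     # transfer function of *p = q
    return lambda g: frozenset(g) | {(y, x) for y in pts(g, p) for x in pts(g, q)}


# Diamond CFG: 0 -> 1 -> 2 -> 5 and 0 -> 3 -> 4 -> 5.
PRED = {0: [], 1: [0], 2: [1], 3: [0], 4: [3], 5: [2, 4]}
F = {0: lambda g: g,
     1: addr_of("p", "x"), 2: addr_of("q", "y"),
     3: addr_of("p", "z"), 4: addr_of("q", "w"),
     5: store("p", "q")}


def join(graphs):
    return frozenset().union(*graphs)


def paths_to(n):
    """All paths from the entry to n (the CFG is assumed acyclic)."""
    if not PRED[n]:
        return [[n]]
    return [path + [n] for m in PRED[n] for path in paths_to(m)]


def mop_out(n, init=frozenset()):
    """Join, over all paths, of the composed transfer functions."""
    return join(reduce(lambda g, node: F[node](g), path, init)
                for path in paths_to(n))


def mfp_out(n, init=frozenset()):
    """Solve the equations; node ids are already in topological order."""
    out = {}
    for node in sorted(PRED):
        in_node = join(out[m] for m in PRED[node]) if PRED[node] else init
        out[node] = F[node](in_node)
    return out[n]


MOP, MFP = mop_out(5), mfp_out(5)
print(sorted(MFP - MOP))
```

Here the MFP is a strict superset of the MOP, which is the MFP ≥ MOP relation of the lemma with strict inequality.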

Many Applications!
White-box testing: compute coverage
–Control-flow-based testing
–Data-flow-based testing: intuitively, test each def-use chain
Regression testing
–Analyze changes and select regression tests that actually test changed code

Many Applications!
Reverse engineering
–UML class diagrams
–UML sequence diagrams
–Many tools do these; Eclipse plug-ins
Automated refactoring
–Analyze program, prove “safety” of the refactoring
–Eclipse plug-ins

Many Applications!
Static debugging
–Memory errors in C/C++ programs: memory leaks, null pointer dereferences, array-out-of-bounds accesses
–Concurrency errors in shared-memory apps: data races, atomicity violations, deadlocks