Formal Methods Program Slicing & Dataflow Analysis February 2015.


Program Analysis
Automatic analysis of a program
Three main objectives
  – Correctness: program verification
  – Efficiency: code optimization (compilers)
  – Security: understand code vulnerabilities
Two types of analysis
  – Static analysis: do not execute the program; reason over all inputs
  – Dynamic analysis: execute the program; reason over specific inputs

Static Analysis
Based upon source code analysis
Useful for:
  – Semantic analysis of programs, e.g. type inference
  – Optimizations and transformations, e.g. dataflow/control-flow analysis
  – Program verification, e.g. Dijkstra’s weakest precondition method

Dynamic Analysis
Based upon one or more runs of the program on given inputs
Useful for:
  – Performance analysis
  – Dynamic slicing
  – Program debugging

Static Analysis Techniques
Type inference
  – Check or infer types for program expressions
Data flow analysis
  – Analyze variable and other dependencies
Program slicing
  – Construct a reduced program with respect to variables of interest
Model checking
  – Check temporal properties of programs
Theorem proving
  – Use logical deduction to prove facts

References
Hiralal Agrawal and Joseph Horgan, Dynamic Program Slicing, ACM SIGPLAN Conf. on Programming Language Design and Implementation; also in SIGPLAN Notices, 25(6), 1990.
H. Agrawal, Richard A. DeMillo, and Eugene H. Spafford, Dynamic Slicing in the Presence of Unconstrained Pointers, Proceedings of the Symposium on Testing, Analysis, and Verification, 1991.
Frank Tip, A Survey of Program Slicing Techniques, Journal of Programming Languages, (3), 1995.
Mark Weiser, Program Slicing, IEEE Transactions on Software Engineering, 10(4), 1984.

Static and Dynamic Program Slicing

Static Program Slicing
Computing a reduced program with respect to a criterion
Helps understand dependencies in programs and helps program debugging
Other applications:
  – software testing
  – software maintenance
  – parallelization

Example: Char, Line, and Word Counter

    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void)
    {
        int c, nl, nw, nc, inword;

        inword = NO;
        nl = 0;
        nw = 0;
        nc = 0;
        c = getchar();
        while (c != EOF) {
            nc = nc + 1;
            if (c == '\n')
                nl = nl + 1;
            if (c == ' ' || c == '\n' || c == '\t')
                inword = NO;
            else if (inword == NO) {
                inword = YES;
                nw = nw + 1;
            }
            c = getchar();
        }
        printf("%d \n", nl);
        printf("%d \n", nw);
        printf("%d \n", nc);
        return 0;
    }

[Figure: control flow graph of the counter program, with one node per statement or condition: the initial assignments, the while (c != EOF) test, the two if tests on c, the inword == NO test, the updates of nc, nl, nw and inword, the second c = getchar(), and the three printf calls.]

Program Slice: Word Counter

    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void)
    {
        int c, nw, inword;

        inword = NO;
        nw = 0;
        c = getchar();
        while (c != EOF) {
            if (c == ' ' || c == '\n' || c == '\t')
                inword = NO;
            else if (inword == NO) {
                inword = YES;
                nw = nw + 1;
            }
            c = getchar();
        }
        printf("%d \n", nw);
        return 0;
    }

Program Slice: Line Counter

    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void)
    {
        int c, nl;

        nl = 0;
        c = getchar();
        while (c != EOF) {
            if (c == '\n')
                nl = nl + 1;
            c = getchar();
        }
        printf("%d \n", nl);
        return 0;
    }

Program Slice: Character Counter

    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void)
    {
        int c, nc;

        nc = 0;
        c = getchar();
        while (c != EOF) {
            nc = nc + 1;
            c = getchar();
        }
        printf("%d \n", nc);
        return 0;
    }
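The three slices above can be reproduced with a small backward-slicing sketch. It is a deliberately simplified, flow-insensitive approximation rather than the algorithm from the cited papers: a statement joins the slice if it defines a variable the slice needs, and every statement in the slice pulls in its controlling condition. The statement numbering, the `PROGRAM` table, and the function name `backward_slice` are all assumptions made for this illustration.

```python
# Hypothetical encoding of the counter program, one entry per statement:
# (text, variables defined, variables used, controlling statement or None).
PROGRAM = {
    1: ("inword = NO", {"inword"}, set(), None),
    2: ("nl = 0", {"nl"}, set(), None),
    3: ("nw = 0", {"nw"}, set(), None),
    4: ("nc = 0", {"nc"}, set(), None),
    5: ("c = getchar()", {"c"}, set(), None),
    6: ("while (c != EOF)", set(), {"c"}, None),
    7: ("nc = nc + 1", {"nc"}, {"nc"}, 6),
    8: ("if (c == newline)", set(), {"c"}, 6),
    9: ("nl = nl + 1", {"nl"}, {"nl"}, 8),
    10: ("if (c is whitespace)", set(), {"c"}, 6),
    11: ("inword = NO", {"inword"}, set(), 10),
    12: ("else if (inword == NO)", set(), {"inword"}, 10),
    13: ("inword = YES", {"inword"}, set(), 12),
    14: ("nw = nw + 1", {"nw"}, {"nw"}, 12),
    15: ("c = getchar()", {"c"}, set(), 6),
    16: ("printf nl", set(), {"nl"}, None),
    17: ("printf nw", set(), {"nw"}, None),
    18: ("printf nc", set(), {"nc"}, None),
}


def backward_slice(program, criterion):
    """Return the sorted statement numbers that may affect `criterion`."""
    in_slice = set()
    needed = set()            # variables whose definitions must be kept
    worklist = [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt in in_slice:
            continue
        in_slice.add(stmt)
        _, _, uses, parent = program[stmt]
        if parent is not None:
            worklist.append(parent)       # control dependence
        new_vars = uses - needed
        needed |= new_vars
        for other, (_, defs, _, _) in program.items():
            if defs & new_vars:
                worklist.append(other)    # data dependence (flow-insensitive)
    return sorted(in_slice)


for target in (17, 16, 18):
    print(PROGRAM[target][0], "->", backward_slice(PROGRAM, target))
```

Because the sketch ignores statement order, it can keep more statements than a slicer built on reaching definitions would; for this program the two coincide.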

Slicing OO Programs: Example (Ohm’s Law)

    class component {
        attributes
            Real V, I, R;
        constraints
            V = I * R;
        constructor component(V1, I1, R1) {
            V = V1; I = I1; R = R1;
        }
    }

    class parallel extends component {
        attributes
            component [ ] C;
        constraints
            forall X in C: (X.V = V);
            (sum X in C: X.I) = I;
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) {
            C = P;
        }
    }

Slice WRT Resistance

Original class:

    class parallel extends component {
        attributes
            component [ ] C;
        constraints
            forall X in C: (X.V = V);
            (sum X in C: X.I) = I;
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) {
            C = P;
        }
    }

Slice:

    class parallel extends component {
        attributes
            component [ ] C;
        constraints
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) {
            C = P;
        }
    }

Slice WRT Resistance

Original class:

    class component {
        attributes
            Real V, I, R;
        constraints
            V = I * R;
        constructor component(V1, I1, R1) {
            V = V1; I = I1; R = R1;
        }
    }

Slice:

    class component {
        attributes
            Real R;
        constraints
        constructor component(R1) {
            R = R1;
        }
    }

Static Slicing Classification
Forward vs. backward
Intraprocedural vs. interprocedural
Procedural vs. OO languages
OO slicing is a good topic for a presentation.
Slicing is based upon dataflow analysis, so we examine that topic first.

Data Flow Analysis
A compiler performs data flow analysis for various reasons: to detect common subexpressions, loop-invariant operations, uninitialized variables, etc.
Two forms of data flow analysis:
  – Forward flow
  – Backward flow
Each analysis is characterized by its data flow constraints.

Examples of Data Flow Analysis
Forward flow
  – Reaching Definitions (∪)
  – Available Expressions (∩)
Backward flow
  – Live Variables (∪)
  – Very Busy Expressions (∩)

Summary of DF Analyses

                Forward                  Backward
    ∪ (LFP)     Reaching Definitions     Live Variables
    ∩ (GFP)     Available Expressions    Very Busy Expressions

KILL(B) and GEN(B) sets

Definitions reaching the entry of block B:
    IN = { d5: v := 10;  d6: y := 20;  d7: w := 30;  d8: u := 40 }

Block B:
    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

KILL(B) = {d5}
GEN(B) = {d2, d3, d4}

KILL(B) eliminates each definition whose variable is re-assigned within B.
GEN(B) adds the last definition in B of each variable that is assigned in B.

Reaching Definitions
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
IN(B) = ∪ { OUT(P) | P → B in the graph }
[Figure: block B with IN(B) on its incoming edges and OUT(B) on its outgoing edges]

Illustrating the equations

    IN = { d5: v := 10;  d6: y := 20;  d7: w := 30;  d8: u := 40 }

Block B:
    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

KILL(B) = {d5}
GEN(B) = {d2, d3, d4}
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B) = {d2, d3, d4, d6, d7, d8}
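The GEN, KILL and OUT sets for block B can be computed mechanically. The sketch below assumes a hypothetical representation in which every definition is a (label, variable) pair; the helper name `gen_kill` is invented for the illustration, and KILL is restricted to definitions outside the block, as on the slide.

```python
def gen_kill(block, outside):
    """block: list of (label, variable) pairs in execution order.
    outside: dict mapping each definition label outside the block to its variable."""
    last_def = {}
    for label, var in block:
        last_def[var] = label              # a later definition replaces an earlier one
    gen = set(last_def.values())           # last definition of each assigned variable
    kill = {d for d, var in outside.items() if var in last_def}
    return gen, kill


BLOCK_B = [("d1", "x"), ("d2", "z"), ("d3", "v"), ("d4", "x")]
OUTSIDE = {"d5": "v", "d6": "y", "d7": "w", "d8": "u"}

gen, kill = gen_kill(BLOCK_B, OUTSIDE)
in_b = set(OUTSIDE)                        # all four outside definitions reach B
out_b = (in_b - kill) | gen

print(sorted(gen), sorted(kill), sorted(out_b))
```

Note that d1 does not appear in GEN(B) because d4 redefines x later in the same block.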

Least Fixed Point
Theorem: Every monotonic function on a finite lattice has a least fixed point.
For Reaching Definitions, the lattice of interest is P(S), the powerset of S, where S is the set of all definition points, ordered by ⊆ (subset or equal). Note that S is finite.
The least upper bound and greatest lower bound are set union and set intersection, respectively.

More on Monotonicity
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
  – ∪ is monotonic in both arguments
  – X − Y is monotonic in X but not in Y
  – Since KILL(B) is a constant for each B, its use does not violate monotonicity
Fixed point iteration is guaranteed to converge on a finite lattice when the functions are monotonic.
Note: The composition of monotonic functions is monotonic.
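These monotonicity claims can be checked by brute force on a small powerset lattice. The following sketch uses an assumed three-element universe and made-up KILL and GEN sets; `powerset` and `is_monotonic` are helper names introduced here, not part of the slides.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotonic(f, universe):
    """True if x <= y (subset) implies f(x) <= f(y) for all subsets x, y."""
    subsets = powerset(universe)
    return all(f(x) <= f(y) for x in subsets for y in subsets if x <= y)


UNIVERSE = frozenset({"d1", "d2", "d3"})
KILL = frozenset({"d1"})
GEN = frozenset({"d3"})


def transfer(x):
    # OUT as a function of IN, with KILL and GEN held constant.
    return (x - KILL) | GEN


def difference_in_second_argument(y):
    # X - Y viewed as a function of Y, with X fixed to the whole universe.
    return UNIVERSE - y


print(is_monotonic(transfer, UNIVERSE))
print(is_monotonic(difference_in_second_argument, UNIVERSE))
```

The first check succeeds and the second fails, which is why it matters that KILL(B) is a constant rather than something the iteration changes.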

Example: Non-Monotonic Function
x = not(y)
y = not(x)
Over the Boolean lattice there are two fixed points:
  1. x = T, y = F
  2. x = F, y = T
No unique solution!
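A short enumeration makes the point concrete. The sketch treats the two equations as one simultaneous update `step` (a name chosen for this example), lists every fixed point, and shows that iterating from (F, F) never settles.

```python
from itertools import product


def step(state):
    """One simultaneous application of x = not(y), y = not(x)."""
    x, y = state
    return (not y, not x)


# All states that the update leaves unchanged.
fixed_points = [s for s in product([False, True], repeat=2) if step(s) == s]
print(fixed_points)

# Starting from the bottom element (F, F), the iteration oscillates.
trace = [(False, False)]
for _ in range(4):
    trace.append(step(trace[-1]))
print(trace)
```

Neither fixed point lies below the other, so there is no least fixed point, and the iteration from (F, F) alternates between (F, F) and (T, T).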

Sketch of Algorithm: Least Fixed Point Iteration

    Forall basic blocks B Do {
        IN(B) := {};
        OUT(B) := GEN(B)
    }
    While there are changes Do {
        Forall B Do { IN(B) := ∪ { OUT(P) | P → B in the graph }; }
        Forall B Do { OUT(B) := (IN(B) − KILL(B)) ∪ GEN(B); }
    }

Control Flow Graph
[Figure: control flow graph, from Aho & Ullman, Principles of Compiler Design, 1977]

KILL(B) and GEN(B)
(blocks listed in the order B1 to B5)

    Block   GEN(B)      KILL(B)
    B1      {d1,d2}     {d3,d4,d5}
    B2      {d3}        {d1}
    B3      {d4}        {d2,d5}
    B4      {d5}        {d2,d4}
    B5      {}          {}

Initialize: IN(B) = {} and OUT(B) = GEN(B)

    Block   OUT(B)
    B1      {d1,d2}
    B2      {d3}
    B3      {d4}
    B4      {d5}
    B5      {}

Iteration 1: IN(B) = ∪ { OUT(P) | P → B }, then OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)

    Block   IN(B)       OUT(B)
    B1      {d3}        {d1,d2}
    B2      {d1,d2}     {d2,d3}
    B3      {d3}        {d3,d4}
    B4      {d4}        {d5}
    B5      {d4,d5}     {d4,d5}

Iteration 2:

    Block   IN(B)            OUT(B)
    B1      {d2,d3}          {d1,d2}
    B2      {d1,d2,d4,d5}    {d2,d3,d4,d5}
    B3      {d2,d3}          {d3,d4}
    B4      {d3,d4}          {d3,d5}
    B5      {d3,d4,d5}       {d3,d4,d5}

Iteration 3:

    Block   IN(B)               OUT(B)
    B1      {d2,d3,d4,d5}       {d1,d2}
    B2      {d1,d2,d3,d4,d5}    {d2,d3,d4,d5}
    B3      {d2,d3,d4,d5}       {d3,d4}
    B4      {d3,d4}             {d3,d5}
    B5      {d3,d4,d5}          {d3,d4,d5}

Iteration 4 (no further change):

    Block   IN(B)               OUT(B)
    B1      {d2,d3,d4,d5}       {d1,d2}
    B2      {d1,d2,d3,d4,d5}    {d2,d3,d4,d5}
    B3      {d2,d3,d4,d5}       {d3,d4}
    B4      {d3,d4}             {d3,d5}
    B5      {d3,d4,d5}          {d3,d4,d5}
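The trace above can be reproduced with a direct transcription of the least fixed point iteration. This is a sketch under stated assumptions: the block names B1 to B5 and, in particular, the predecessor lists in `PREDS` are not given in the text; they were inferred from the IN values of the trace, since the flow graph figure itself is not available here.

```python
GEN = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"}, "B4": {"d5"}, "B5": set()}
KILL = {"B1": {"d3", "d4", "d5"}, "B2": {"d1"}, "B3": {"d2", "d5"},
        "B4": {"d2", "d4"}, "B5": set()}
# Assumed edges, inferred from the IN sets of the trace (not from the figure).
PREDS = {"B1": ["B2"], "B2": ["B1", "B5"], "B3": ["B2"],
         "B4": ["B3"], "B5": ["B3", "B4"]}


def reaching_definitions(gen, kill, preds):
    """Two-phase iteration: recompute every IN, then every OUT, until stable."""
    in_sets = {b: set() for b in gen}
    out_sets = {b: set(gen[b]) for b in gen}
    rounds = 0
    changed = True
    while changed:
        rounds += 1
        new_in = {b: set().union(*(out_sets[p] for p in preds[b])) for b in gen}
        new_out = {b: (new_in[b] - kill[b]) | gen[b] for b in gen}
        changed = new_in != in_sets or new_out != out_sets
        in_sets, out_sets = new_in, new_out
    return in_sets, out_sets, rounds


in_sets, out_sets, rounds = reaching_definitions(GEN, KILL, PREDS)
for b in sorted(GEN):
    print(b, sorted(in_sets[b]), sorted(out_sets[b]))
print("rounds:", rounds)
```

Under these assumptions the fourth round makes no change, matching the last step of the trace.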

Uses of Reaching Definitions
Uninitialized variables: Add a dummy assignment for every variable at the start of the program and check which uses those dummy definitions “reach”.
Loop-invariant operations: An expression ‘X op Y’ in a loop is invariant if all definitions of X and Y that reach it are outside the loop.
Static program slicing: We will examine the technique in more detail in the next class.

Analysis is Approximate
[Figure, from Aho & Ullman, Principles of Compiler Design, 1977: a flow graph containing a statement that will never be executed; the analysis still reports that both A := 2 and A := 3 reach point p.]

Algorithm Efficiencies
Sets can be represented by bit vectors, so that ∪ and ∩ become logical \/ and /\.
The number of iterations is bounded by the number of nodes in the graph.
By visiting the nodes B1, …, Bk in “depth-first order”, the number of iterations can be minimized.
In practice, the number of iterations is <= 5.
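A minimal sketch of the bit-vector idea, using Python integers as bit vectors: each definition gets one bit, so union is `|`, intersection is `&`, and set difference is `& ~`. The helper names and the sample sets are assumptions chosen for the illustration.

```python
DEFS = ["d1", "d2", "d3", "d4", "d5"]
BIT = {d: 1 << i for i, d in enumerate(DEFS)}    # one bit per definition


def to_bits(defs):
    bits = 0
    for d in defs:
        bits |= BIT[d]
    return bits


def to_set(bits):
    return {d for d in DEFS if bits & BIT[d]}


def transfer(in_bits, kill_bits, gen_bits):
    # (IN - KILL) U GEN using only bitwise operations.
    return (in_bits & ~kill_bits) | gen_bits


in_bits = to_bits({"d1", "d2", "d4", "d5"})
kill_bits = to_bits({"d1"})
gen_bits = to_bits({"d3"})

out_bits = transfer(in_bits, kill_bits, gen_bits)
print(format(out_bits, "05b"), sorted(to_set(out_bits)))
```

With a fixed-width machine word in place of a Python integer, each transfer function costs a handful of instructions per word of definitions.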

(Reverse) Depth-first Traversal Order
Traversal sequence: IN(B1) OUT(B1) IN(B2) OUT(B2) IN(B3) OUT(B3) … IN(B10) OUT(B10)
The path of “back edges” 10 → 7 → 4 → 3 determines the number of iterations.

Global Common Subexpressions
[Figure: a flow graph in which X * Y is evaluated on the way to point p and evaluated again at p (… := X * Y); X and Y do not change in between.]

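Global Common Subexpressions
[Figure: the same flow graph after the transformation: the earlier evaluation is saved as T := X * Y, and the statement at p becomes … := T; X and Y do not change in between.]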

Available Expression
X op Y is said to be ‘available’ at a point p if every path from the start of the program to p evaluates X op Y and, after the last such evaluation prior to p, there are no subsequent assignments to X or Y.
OUT(B) = (IN(B) − KILL_e(B)) ∪ GEN_e(B)
IN(B) = ∩ { OUT(P) | P → B in the graph }

Algorithmic Sketch: Greatest Fixed Point Computation

    Forall basic blocks B except the initial block Do {
        IN(B) := E (set of all exprs in program);
        OUT(B) := E − KILL_e(B)
    }
    While there are changes Do {
        Forall B Do { IN(B) := ∩ { OUT(P) | P → B in the graph }; }
        Forall B Do { OUT(B) := (IN(B) − KILL_e(B)) ∪ GEN_e(B); }
    }

Greatest Fixed Point Iteration
Theorem: Every monotonic function on a finite lattice has a greatest fixed point.
For Available Expressions, the lattice of interest is P(S), the powerset of S, where S is the set of all expressions appearing in the program, ordered by ⊆ (subset or equal). Note that S is finite.
The least upper bound and greatest lower bound are set union and set intersection, respectively.
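The greatest fixed point computation can be sketched as follows. The four-block diamond graph, the expressions `a+b` and `c+d`, and the function name are all hypothetical: B1 evaluates a+b, B2 evaluates c+d, B3 assigns a (killing a+b) and then evaluates c+d, and both branches join at B4. Unlike the two-phase sketch, this version updates IN and OUT block by block, and it treats the entry block as having IN = {}, which the sketch leaves implicit.

```python
def available_expressions(entry, exprs, gen, kill, preds):
    # Start from the top of the lattice (all expressions) everywhere
    # except the entry block, where nothing is available yet.
    in_sets = {b: set(exprs) for b in gen}
    out_sets = {b: set(exprs) - kill[b] for b in gen}
    in_sets[entry] = set()
    out_sets[entry] = set(gen[entry])
    changed = True
    while changed:
        changed = False
        for b in gen:
            if b == entry:
                continue
            new_in = set(exprs)
            for p in preds[b]:
                new_in &= out_sets[p]          # intersection over predecessors
            new_out = (new_in - kill[b]) | gen[b]
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


EXPRS = {"a+b", "c+d"}
GEN_E = {"B1": {"a+b"}, "B2": {"c+d"}, "B3": {"c+d"}, "B4": set()}
KILL_E = {"B1": set(), "B2": set(), "B3": {"a+b"}, "B4": set()}
PREDS = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}

avail_in, avail_out = available_expressions("B1", EXPRS, GEN_E, KILL_E, PREDS)
print(sorted(avail_in["B4"]))
```

At the join block only c+d is available, because a+b is killed on the path through B3.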

Live Variables
A variable X is live at p if X will be referenced on some path from p to the end of the program.
IN(B) = (OUT(B) − DEF(B)) ∪ USE(B)
OUT(B) = ∪ { IN(S) | B → S in the graph }
DEF(B) = variables that are assigned in B before they are used
USE(B) = variables that are used in B before any assignment to them in B

Live Variable Analysis
Example of a backward flow analysis.
Useful in register allocation/deallocation.
The roles of IN and OUT are reversed compared with reaching definitions and available expressions.
This is a least fixed point iteration due to the use of ∪ in defining OUT(B).
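A backward least fixed point iteration for live variables might look like the sketch below. The three-block graph is hypothetical: B1 assigns x and y, B2 is a loop body that increments x, and B3 prints y; the function name and the dictionaries are likewise assumptions for the example.

```python
def live_variables(defs, uses, succs):
    """Iterate IN(B) = (OUT(B) - DEF(B)) U USE(B), OUT(B) = U IN(S) to a fixed point."""
    in_sets = {b: set() for b in defs}
    out_sets = {b: set() for b in defs}
    changed = True
    while changed:
        changed = False
        for b in defs:
            new_out = set().union(*(in_sets[s] for s in succs[b]))
            new_in = (new_out - defs[b]) | uses[b]
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


# B1: x := 5; y := x + 2     B2: x := x + 1 (loops on itself)     B3: print y
DEF = {"B1": {"x", "y"}, "B2": set(), "B3": set()}
USE = {"B1": set(), "B2": {"x"}, "B3": {"y"}}
SUCCS = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}

live_in, live_out = live_variables(DEF, USE, SUCCS)
for b in sorted(DEF):
    print(b, sorted(live_in[b]), sorted(live_out[b]))
```

Both x and y are live on exit from B1, so a register allocator could not reuse either register there, while nothing is live on entry to B1.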

Very Busy Expressions
X op Y is said to be ‘very busy’ at a point p if every path from p encounters X op Y before any assignment to X or Y.
DEF_vb(B) = expressions X op Y in B in which X or Y is defined before computing X op Y
USE_vb(B) = expressions X op Y in B in which neither X nor Y is defined before computing X op Y
IN(B) = (OUT(B) − DEF_vb(B)) ∪ USE_vb(B)
OUT(B) = ∩ { IN(S) | B → S in the graph }

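Code Hoisting
Very busy expressions are useful in “code hoisting”.
Example of backward flow analysis.
[Figure: a flow graph in which point p is followed by the statements A := B op C and D := B op C, with later uses … := A and … := D; B and C do not change in between.]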

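After Code Hoisting
[Figure: the flow graph after hoisting: T := B op C is computed at point p, and the later statements become … := T.]
Assumes that B op C does not …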
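The hoisting decision can be sketched with a backward greatest fixed point iteration. The graph below is a hypothetical reading of the example: block P ends at point p and branches to L (A := B op C) and R (D := B op C), which both flow into an exit block J. The block names, the dictionaries, and the function name `very_busy` are assumptions for this illustration.

```python
def very_busy(exit_block, exprs, defs, uses, succs):
    # Start from the top of the lattice (all expressions) everywhere
    # except after the exit block, where nothing is very busy.
    out_sets = {b: set(exprs) for b in defs}
    out_sets[exit_block] = set()
    in_sets = {b: (out_sets[b] - defs[b]) | uses[b] for b in defs}
    changed = True
    while changed:
        changed = False
        for b in defs:
            new_out = set()
            if b != exit_block:
                new_out = set(exprs)
                for s in succs[b]:
                    new_out &= in_sets[s]      # intersection over successors
            new_in = (new_out - defs[b]) | uses[b]
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                changed = True
    return in_sets, out_sets


EXPRS = {"B op C"}
SUCCS = {"P": ["L", "R"], "L": ["J"], "R": ["J"], "J": []}
DEF_VB = {"P": set(), "L": set(), "R": set(), "J": set()}

# Both branches evaluate B op C: it is very busy at p and can be hoisted.
USE_BOTH = {"P": set(), "L": {"B op C"}, "R": {"B op C"}, "J": set()}
_, out_both = very_busy("J", EXPRS, DEF_VB, USE_BOTH, SUCCS)

# Only the left branch evaluates it: hoisting would add work on the right path.
USE_LEFT = {"P": set(), "L": {"B op C"}, "R": set(), "J": set()}
_, out_left = very_busy("J", EXPRS, DEF_VB, USE_LEFT, SUCCS)

print(sorted(out_both["P"]), sorted(out_left["P"]))
```

An expression in OUT(P) is evaluated on every path leaving p before its operands change, which is the condition under which computing it once at p is safe.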

Programming with Partial Orders and Lattices

Terms and Exprs

LUB and GLB are basic operations

Pattern Matching with Sets

Program Flow Analysis: Reaching Definitions

Program Flow Analysis: Very Busy Expressions
Note: E is the set of all expressions in the program being analyzed.

Conditional Clauses: Shortest Distance

function short/total