
1 Formal Methods: Program Slicing & Dataflow Analysis (February 2015)

2 Program Analysis
Automatic analysis of a program. Three main objectives:
- Correctness: program verification
- Efficiency: code optimization (compilers)
- Security: understanding code vulnerabilities
Two types of analysis:
- Static analysis: do not execute the program; reason over all inputs
- Dynamic analysis: execute the program; reason over specific inputs

3 Static Analysis
Based upon source code analysis. Useful for:
- Semantic Analysis of Programs, e.g. Type Inference
- Optimizations and Transformations, e.g. Dataflow/Control-flow Analysis
- Program Verification, e.g. Dijkstra's Weakest Precondition Method

4 Dynamic Analysis
Based upon one or more runs of the program on given inputs. Useful for:
- Performance Analysis
- Dynamic Slicing
- Program Debugging

5 Static Analysis Techniques
- Type Inference: check or infer types for program expressions
- Data Flow Analysis: analyze variable and other dependencies
- Program Slicing: construct a reduced program with respect to variables of interest
- Model Checking: check temporal properties of programs
- Theorem Proving: use logical deduction to prove facts about programs

6 References
- Hiralal Agrawal and Joseph Horgan. Dynamic Program Slicing. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI); also in SIGPLAN Notices, 25(6):246-256, 1990.
- Hiralal Agrawal, Richard A. DeMillo, and Eugene H. Spafford. Dynamic Slicing in the Presence of Unconstrained Pointers. Proceedings of the Symposium on Testing, Analysis, and Verification, pp. 60-73, 1991.
- Frank Tip. A Survey of Program Slicing Techniques. Journal of Programming Languages, 3(3):121-189, 1995.
- Mark Weiser. Program Slicing. IEEE Transactions on Software Engineering, 10(4):352-357, 1984.

7 Static and Dynamic Program Slicing

8 Static Program Slicing
Computing a reduced program with respect to a slicing criterion (e.g. a program point and the variables of interest there).
Helps in understanding dependencies in programs and in program debugging.
Other applications:
- software testing
- software maintenance
- parallelization

9 Example: Char, Line, and Word Counter

#include <stdio.h>   /* added for getchar, printf, EOF */
#define YES 1
#define NO  0

int main() {
    int c, nl, nw, nc, inword;
    inword = NO; nl = 0; nw = 0; nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        if (c == '\n')
            nl = nl + 1;
        if (c == ' ' || c == '\n' || c == '\t')
            inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nl);
    printf("%d \n", nw);
    printf("%d \n", nc);
    return 0;
}

10 [Figure: control-flow graph of the counter program on slide 9, with one node per statement: the initializations, the while (c != EOF) test, the if tests, the increments of nc, nl, and nw, the assignments to inword, the calls to getchar(), and the final printf statements.]

11 Program Slice: Word Counter

#include <stdio.h>
#define YES 1
#define NO  0

int main() {
    int c, nw, inword;
    inword = NO; nw = 0;
    c = getchar();
    while (c != EOF) {
        if (c == ' ' || c == '\n' || c == '\t')
            inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nw);
    return 0;
}

12 Program Slice: Line Counter

#include <stdio.h>
#define YES 1
#define NO  0

int main() {
    int c, nl;
    nl = 0;
    c = getchar();
    while (c != EOF) {
        if (c == '\n')
            nl = nl + 1;
        c = getchar();
    }
    printf("%d \n", nl);
    return 0;
}

13 Program Slice: Character Counter

#include <stdio.h>
#define YES 1
#define NO  0

int main() {
    int c, nc;
    nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        c = getchar();
    }
    printf("%d \n", nc);
    return 0;
}

14 Slicing OO Programs: Example Ohm's Law

class component {
  attributes
    Real V, I, R;
  constraints
    V = I * R;
  constructor component(V1, I1, R1) {
    V = V1; I = I1; R = R1;
  }
}

class parallel extends component {
  attributes
    component [] C;
  constraints
    forall X in C: (X.V = V);
    (sum X in C: X.I) = I;
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) {
    C = P;
  }
}

15 Slice WRT Resistance

Original class:

class parallel extends component {
  attributes
    component [] C;
  constraints
    forall X in C: (X.V = V);
    (sum X in C: X.I) = I;
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) {
    C = P;
  }
}

Slice with respect to R:

class parallel extends component {
  attributes
    component [] C;
  constraints
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) {
    C = P;
  }
}

16 Slice WRT Resistance

Original class:

class component {
  attributes
    Real V, I, R;
  constraints
    V = I * R;
  constructor component(V1, I1, R1) {
    V = V1; I = I1; R = R1;
  }
}

Slice with respect to R:

class component {
  attributes
    Real R;
  constraints
  constructor component(R1) {
    R = R1;
  }
}

17 Static Slicing Classification
- Forward vs. Backward
- Intra- vs. Inter-procedural
- Procedural vs. OO Languages
OO slicing is a good topic for a presentation.
Slicing is based upon dataflow analysis, and hence we will examine that topic first.

18 Data Flow Analysis
Compilers perform data flow analysis for various reasons: to detect common subexpressions, loop-invariant operations, uninitialized variables, etc.
Two forms of data flow analysis:
- Forward flow
- Backward flow
Each analysis is characterized by a set of data flow constraints.

19 Examples of Data Flow Analysis
Forward flow:
- Reaching Definitions (U)
- Available Expressions (∩)
Backward flow:
- Live Variables (U)
- Very Busy Expressions (∩)

20 Summary of DF Analyses

            Forward                  Backward
  U (LFP)   Reaching Definitions     Live Variables
  ∩ (GFP)   Available Expressions    Very Busy Expressions

21 KILL(B) and GEN(B) Sets

Block B:
  d1: x := y + 1;
  d2: z := w + x;
  d3: v := z + u;
  d4: x := x + v;

IN = { d5: v := 10;  d6: y := 20;  d7: w := 30;  d8: u := 40 }

KILL(B) = {d5}
GEN(B)  = {d2, d3, d4}

KILL(B) eliminates each definition whose variable is re-assigned within B.
GEN(B) adds the (last) definition for each variable that is assigned in B.
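
A small runnable sketch of these two definitions, assuming a block is represented as an ordered list of (label, variable) pairs; the representation and the helper name gen_kill are illustrative choices, not something taken from the slides.

    # Sketch: computing GEN(B) and KILL(B) for the block above.
    # A definition is a (label, variable) pair; this encoding is an assumption.

    def gen_kill(block, all_defs):
        """block: definitions inside B, in order; all_defs: every definition
        point in the program, including those reaching B from outside."""
        assigned = {var for _, var in block}
        # KILL(B): definitions outside B whose variable is re-assigned in B
        kill = {lab for lab, var in all_defs
                if var in assigned and (lab, var) not in block}
        # GEN(B): the last definition in B of each variable assigned in B
        last = {}
        for lab, var in block:
            last[var] = lab
        gen = set(last.values())
        return gen, kill

    B = [("d1", "x"), ("d2", "z"), ("d3", "v"), ("d4", "x")]
    others = [("d5", "v"), ("d6", "y"), ("d7", "w"), ("d8", "u")]
    gen, kill = gen_kill(B, B + others)
    print(sorted(gen))   # ['d2', 'd3', 'd4']
    print(sorted(kill))  # ['d5']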


23 Reaching Definitions

OUT(B) = (IN(B) – KILL(B)) U GEN(B)
IN(B)  = U { OUT(P) | P → B in the graph }

[Figure: a block B with IN(B) at its entry and OUT(B) at its exit.]


25 Illustrating the Equations

Block B:
  d1: x := y + 1;
  d2: z := w + x;
  d3: v := z + u;
  d4: x := x + v;

IN(B)   = { d5: v := 10;  d6: y := 20;  d7: w := 30;  d8: u := 40 }
KILL(B) = {d5}
GEN(B)  = {d2, d3, d4}

OUT(B) = (IN(B) – KILL(B)) U GEN(B) = {d2, d3, d4, d6, d7, d8}

26 Least Fixed Point
Theorem: every monotonic function on a finite lattice has a least fixed point.
For Reaching Definitions, the lattice of interest is P(S), the powerset of the set S of all definition points, ordered by the ⊆ (subset-or-equal) relation. Note that S is finite. The least upper bound and greatest lower bound are set union and set intersection, respectively.
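
The theorem can be read computationally: starting from the bottom element and applying a monotone function until nothing changes yields the least fixed point. A minimal sketch, with an arbitrarily chosen monotone function on a small powerset lattice (the function and the definition names are made up for illustration):

    # Sketch: least fixed point by iteration from bottom on a finite lattice.
    # Works for any monotone f; here the lattice is a powerset ordered by subset.

    def lfp(f, bottom):
        x = bottom
        while True:
            y = f(x)
            if y == x:          # reached a fixed point
                return x
            x = y

    # f(X) = (X - {"d1"}) | {"d2", "d3"} is monotone in X, since removing and
    # adding constant sets both preserve the subset ordering.
    f = lambda X: (X - {"d1"}) | {"d2", "d3"}
    print(lfp(f, frozenset()))  # frozenset({'d2', 'd3'})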


43 More on Monotonicity

OUT(B) = (IN(B) – KILL(B)) U GEN(B)

- X U Y is monotonic in both arguments.
- X – Y is monotonic in X but not in Y.
- Since KILL(B) is a constant for each B, its use does not violate monotonicity.
- Fixed point iteration is guaranteed to converge when the transfer functions are monotonic (on a finite lattice).
- Note: the composition of monotonic functions is monotonic.
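
A quick, purely illustrative check of this monotonicity claim by brute force over a small powerset lattice; the universe and the GEN/KILL sets below are made up.

    # Sketch: checking that f(X) = (X - KILL) U GEN is monotone, i.e.
    # X <= Y implies f(X) <= f(Y), over all subsets of a small universe.
    import itertools

    UNIV = {"d1", "d2", "d3", "d4"}
    KILL, GEN = {"d2"}, {"d3"}
    f = lambda X: (X - KILL) | GEN

    subsets = [set(c) for r in range(len(UNIV) + 1)
               for c in itertools.combinations(UNIV, r)]
    assert all(f(X) <= f(Y) for X in subsets for Y in subsets if X <= Y)
    print("f is monotone on the powerset of", sorted(UNIV))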

44 Example: Non-Monotonic Functions

  x = not(y)
  y = not(x)

over the Boolean lattice F ≤ T. not(·) is not monotonic, and the system has two incomparable fixed points:
1. x = T, y = F
2. x = F, y = T
There is no unique (least) solution.
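
The two fixed points can be confirmed by brute force; encoding the two-element Boolean lattice as Python booleans is just for illustration.

    # Sketch: enumerate the fixed points of x = not(y), y = not(x)
    # over the two-element Boolean lattice (False <= True).
    from itertools import product

    fixed_points = [(x, y) for x, y in product([False, True], repeat=2)
                    if x == (not y) and y == (not x)]
    print(fixed_points)  # [(False, True), (True, False)] -- two incomparable solutions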


47 Sketch of Algorithm: Least Fixed Point Iteration

Forall basic blocks B Do {
  IN(B) := {};
  OUT(B) := GEN(B);
}
Repeat until no more changes {
  Forall B Do {
    IN(B) := U { OUT(P) | P → B in the graph };
  }
  Forall B Do {
    OUT(B) := (IN(B) – KILL(B)) U GEN(B);
  }
}
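
A runnable sketch of this least fixed point iteration for Reaching Definitions. The three-block flow graph and its GEN/KILL sets below are hypothetical (they are not the Aho & Ullman graph used on the following slides); only the iteration scheme follows the algorithm above.

    # Sketch: least fixed point iteration for Reaching Definitions.
    # The CFG and the GEN/KILL sets are made up for illustration.

    preds = {                       # predecessors of each block (P -> B edges)
        "B1": [],
        "B2": ["B1", "B3"],         # B2 is the head of a loop
        "B3": ["B2"],
    }
    GEN  = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
    KILL = {"B1": {"d3"}, "B2": set(), "B3": {"d1"}}

    def reaching_definitions(preds, GEN, KILL):
        IN  = {B: set() for B in preds}
        OUT = {B: set(GEN[B]) for B in preds}
        changed = True
        while changed:              # iterate until nothing changes
            changed = False
            for B in preds:
                IN[B] = set().union(*[OUT[P] for P in preds[B]])
                new_out = (IN[B] - KILL[B]) | GEN[B]
                if new_out != OUT[B]:
                    OUT[B] = new_out
                    changed = True
        return IN, OUT

    IN, OUT = reaching_definitions(preds, GEN, KILL)
    for B in preds:
        print(B, "IN =", sorted(IN[B]), "OUT =", sorted(OUT[B]))

The slide computes all IN values and then all OUT values in separate passes; updating each block in turn, as here, is a chaotic iteration that converges to the same least fixed point because the transfer functions are monotonic.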

48 Control Flow Graph
[Figure: the example flow graph with five basic blocks used on slides 49-58, from Aho & Ullman, Principles of Compiler Design, 1977.]

49 KILL(B) and GEN(B)

  Block   GEN(B)      KILL(B)
  B1      {d1, d2}    {d3, d4, d5}
  B2      {d3}        {d1}
  B3      {d4}        {d2, d5}
  B4      {d5}        {d2, d4}
  B5      {}          {}

50 Initialize: IN(B) = {} and OUT(B) = GEN(B)
OUT: B1 = {d1, d2}, B2 = {d3}, B3 = {d4}, B4 = {d5}, B5 = {}

51 Iteration 1: IN(B) = U {OUT(P) | P → B}
IN: B1 = {d3}, B2 = {d1, d2}, B3 = {d3}, B4 = {d4}, B5 = {d4, d5}

52 OUT(B) = (IN(B) – KILL(B)) U GEN(B)
OUT: B1 = {d1, d2}, B2 = {d2, d3}, B3 = {d3, d4}, B4 = {d5}, B5 = {d4, d5}

53 Iteration 2: IN(B) = U {OUT(P) | P → B}
IN: B1 = {d2, d3}, B2 = {d1, d2, d4, d5}, B3 = {d2, d3}, B4 = {d3, d4}, B5 = {d3, d4, d5}

54 OUT(B) = (IN(B) – KILL(B)) U GEN(B)
OUT: B1 = {d1, d2}, B2 = {d2, d3, d4, d5}, B3 = {d3, d4}, B4 = {d3, d5}, B5 = {d3, d4, d5}

55 Iteration 3: IN(B) = U {OUT(P) | P → B}
IN: B1 = {d2, d3, d4, d5}, B2 = {d1, d2, d3, d4, d5}, B3 = {d2, d3, d4, d5}, B4 = {d3, d4}, B5 = {d3, d4, d5}

56 OUT(B) = (IN(B) – KILL(B)) U GEN(B)
OUT: B1 = {d1, d2}, B2 = {d2, d3, d4, d5}, B3 = {d3, d4}, B4 = {d3, d5}, B5 = {d3, d4, d5}

57 Iteration 4: IN(B) = U {OUT(P) | P → B}
IN: B1 = {d2, d3, d4, d5}, B2 = {d1, d2, d3, d4, d5}, B3 = {d2, d3, d4, d5}, B4 = {d3, d4}, B5 = {d3, d4, d5}
(unchanged from iteration 3)

58 OUT(B) = (IN(B) – KILL(B)) U GEN(B)
OUT: B1 = {d1, d2}, B2 = {d2, d3, d4, d5}, B3 = {d3, d4}, B4 = {d3, d5}, B5 = {d3, d4, d5}
(unchanged: the iteration has converged to the least fixed point)


60 Uses of Reaching Definitions
- Uninitialized variables: add a dummy assignment for every variable at the start of the program and check where these dummy definitions "reach" (see the sketch below).
- Loop-invariant operations: an expression 'X op Y' in a loop is invariant if all definitions of X and Y that reach it lie outside the loop.
- Static program slicing: we will examine this technique in more detail in the next class.
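
To make the uninitialized-variable check concrete, the sketch below plants a dummy definition d0 of a variable x at the entry of a small hypothetical program in which x is assigned on only one branch; if d0 reaches the block that uses x, then x may be used uninitialized. The graph, block names, and sets are invented for illustration.

    # Sketch: detecting a possibly-uninitialized variable x with reaching definitions.
    # "d0" is a dummy definition of x at program entry; d1 is the only real definition.

    preds = {"entry": [], "then": ["entry"], "use": ["entry", "then"]}
    GEN   = {"entry": {"d0"}, "then": {"d1"}, "use": set()}   # d1: x := ... on one branch only
    KILL  = {"entry": set(), "then": {"d0"}, "use": set()}

    IN  = {B: set() for B in preds}
    OUT = {B: set(GEN[B]) for B in preds}
    changed = True
    while changed:
        changed = False
        for B in preds:
            IN[B] = set().union(*[OUT[P] for P in preds[B]])
            new_out = (IN[B] - KILL[B]) | GEN[B]
            if new_out != OUT[B]:
                OUT[B], changed = new_out, True

    if "d0" in IN["use"]:   # the dummy definition reaches the use of x
        print("warning: x may be uninitialized at block 'use'")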


63 Analysis is Approximate
[Figure from Aho & Ullman, Principles of Compiler Design, 1977: a flow graph containing a statement that will never be executed; the analysis nevertheless reports that both A := 2 and A := 3 reach point p.]

64 Algorithm Efficiencies
- Sets can be represented by bit vectors, so that U and ∩ become logical \/ and /\.
- The number of iterations is bounded by the number of nodes in the graph.
- By visiting the nodes B1, ..., Bk in "depth-first order", the number of iterations can be minimized; in practice the number is <= 5.
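
A small sketch of the bit-vector idea: every definition gets a bit position, a set of definitions becomes an integer, and union, intersection, and difference become bitwise operations. The values reuse block B1 of the running example as reconstructed on slides 49-52; the helper name to_bits is an invented convenience.

    # Sketch: definition sets as bit vectors (Python integers).
    defs = ["d1", "d2", "d3", "d4", "d5"]
    bit  = {d: 1 << i for i, d in enumerate(defs)}   # d1 -> 0b00001, d2 -> 0b00010, ...

    def to_bits(s):
        v = 0
        for d in s:
            v |= bit[d]
        return v

    GEN_B1  = to_bits({"d1", "d2"})
    KILL_B1 = to_bits({"d3", "d4", "d5"})
    IN_B1   = to_bits({"d3"})
    OUT_B1  = (IN_B1 & ~KILL_B1) | GEN_B1            # (IN - KILL) U GEN as bitwise ops
    print(bin(OUT_B1))                               # 0b11, i.e. {d1, d2}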

65 (Reverse) Depth-first Traversal Order
Traversal sequence: IN(B1), OUT(B1), IN(B2), OUT(B2), IN(B3), OUT(B3), ..., IN(B10), OUT(B10)
The path of "back edges" 10 → 7 → 4 → 3 determines the number of iterations.

66 Global Common Subexpressions
[Figure: a flow graph with a statement ':= X * Y' at point p; X and Y do not change on the paths leading to p, along which X * Y has already been computed.]

67 Global Common Subexpressions
[Figure: the same flow graph after the transformation; the earlier computations are saved as 'T := X * Y', and the statement at p becomes ':= T'. X and Y do not change on the paths leading to p.]

68 Available Expressions
An expression X op Y is said to be 'available' at a point p if every path from the start of the program to p evaluates X op Y, and after the last such evaluation prior to p there are no subsequent assignments to X or Y.

OUT(B) = (IN(B) – KILL_e(B)) U GEN_e(B)
IN(B)  = ∩ { OUT(P) | P → B in the graph }

69 Algorithmic Sketch: Greatest Fixed Point Computation

Forall basic blocks B except the initial block Do {
  IN(B) := E;   (E = the set of all expressions in the program)
  OUT(B) := E – KILL_e(B);
}
Repeat until no more changes {
  Forall B Do {
    IN(B) := ∩ { OUT(P) | P → B in the graph };
  }
  Forall B Do {
    OUT(B) := (IN(B) – KILL_e(B)) U GEN_e(B);
  }
}
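
A runnable sketch of this greatest fixed point computation for Available Expressions, on an invented three-block graph; the expression names and the GEN_e/KILL_e sets are hypothetical, and the initial block is given IN = {} since nothing is available at program entry.

    # Sketch: greatest fixed point iteration for Available Expressions.
    # Non-initial blocks start at the top of the lattice and only shrink.

    preds  = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
    E      = {"x+y", "a*b", "i+1"}                     # all expressions in the hypothetical program
    GEN_E  = {"B1": {"x+y"}, "B2": {"a*b"}, "B3": {"i+1"}}
    KILL_E = {"B1": set(), "B2": {"i+1"}, "B3": {"x+y"}}

    IN, OUT = {}, {}
    for B in preds:
        if preds[B]:                     # non-initial block
            IN[B]  = set(E)
            OUT[B] = E - KILL_E[B]
        else:                            # initial block: nothing available on entry
            IN[B]  = set()
            OUT[B] = (IN[B] - KILL_E[B]) | GEN_E[B]

    changed = True
    while changed:
        changed = False
        for B in preds:
            if preds[B]:
                IN[B] = set.intersection(*[set(OUT[P]) for P in preds[B]])
            new_out = (IN[B] - KILL_E[B]) | GEN_E[B]
            if new_out != OUT[B]:
                OUT[B], changed = new_out, True

    for B in preds:
        print(B, "available on entry:", sorted(IN[B]))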

70 Greatest Fixed Point Iteration
Theorem: every monotonic function on a finite lattice has a greatest fixed point.
For Available Expressions, the lattice of interest is P(S), the powerset of the set S of all expressions appearing in the program, ordered by the ⊆ (subset-or-equal) relation. Note that S is finite. The least upper bound and greatest lower bound are set union and set intersection, respectively.

71 Live Variables
A variable X is live at a point p if X may be referenced along some path from p to the end of the program.

IN(B)  = (OUT(B) – DEF(B)) U USE(B)
OUT(B) = U { IN(S) | B → S in the graph }

DEF(B) = variables that are assigned in B before they are used in B
USE(B) = variables that are used in B before any assignment to them in B

72 Live Variable Analysis
- An example of a backward flow analysis.
- Useful in register allocation/deallocation.
- The roles of IN and OUT are reversed compared with reaching definitions and available expressions.
- This is a least fixed point iteration, due to the use of U in defining OUT(B).
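
A sketch of the backward least fixed point iteration for live variables on an invented two-block loop; the block names, successor edges, and DEF/USE sets are hypothetical. Very Busy Expressions (slide 74) has the same backward shape, with ∩ in place of U and initialization from the top of the lattice.

    # Sketch: backward least fixed point iteration for Live Variables.
    # Edges, DEF and USE sets are made up for illustration.

    succs = {"B1": ["B2"], "B2": ["B1", "exit"], "exit": []}
    DEF = {"B1": {"i"}, "B2": {"t"}, "exit": set()}
    USE = {"B1": {"n"}, "B2": {"i", "x"}, "exit": set()}

    IN  = {B: set() for B in succs}
    OUT = {B: set() for B in succs}
    changed = True
    while changed:
        changed = False
        for B in succs:
            OUT[B] = set().union(*[IN[S] for S in succs[B]])   # information flows backward
            new_in = (OUT[B] - DEF[B]) | USE[B]
            if new_in != IN[B]:
                IN[B], changed = new_in, True

    for B in succs:
        print(B, "live on entry:", sorted(IN[B]))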


74 Very Busy Expressions
An expression X op Y is said to be 'very busy' at a point p if every path from p evaluates X op Y before any assignment to X or Y.

IN(B)  = (OUT(B) – DEF_vb(B)) U USE_vb(B)
OUT(B) = ∩ { IN(S) | B → S in the graph }

DEF_vb(B) = expressions X op Y in B in which X or Y is assigned before X op Y is computed
USE_vb(B) = expressions X op Y in B in which neither X nor Y is assigned before X op Y is computed

75 Code Hoisting
Very busy expressions are useful for "code hoisting". This is another example of a backward flow analysis.
[Figure: a branch point p whose two successors compute A := B op C and D := B op C respectively, and then use A and D; B and C do not change in between.]

76 After Code Hoisting
[Figure: after hoisting, T := B op C is computed once at point p, and the statements on the branches become ':= T'.]
Assumes that B op C does not ...

77 Programming with Partial Orders and Lattices

78 Terms and Exprs

79 LUB and GLB are basic operations

80 Pattern Matching with Sets

81 Program Flow Analysis: Reaching Definitions

82 Program Flow Analysis: Very Busy Expressions Note: E is the set of all expressions in the program being analyzed.

83 Conditional Clauses: Shortest Distance

84 function short/total

