Interprocedural Analysis Noam Rinetzky Mooly Sagiv.

Slides:

Advertisements

Similar presentations

Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.

Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.

Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.

Interprocedural Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday

Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.

Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani

A Fixpoint Calculus for Local and Global Program Flows Swarat Chaudhuri, U.Penn (with Rajeev Alur and P. Madhusudan)

1 Lecture 06 – Inter-procedural Analysis Eran Yahav.

Common Sub-expression Elim Want to compute when an expression is available in a var Domain:

1 Operational Semantics Mooly Sagiv Tel Aviv University Textbook: Semantics with Applications.

Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.

Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.

Program analysis Mooly Sagiv html://

Recap Mooly Sagiv. Outline Subjects Studied Questions & Answers.

Control Flow Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

From last time: live variables Set D = 2 Vars Lattice: (D, v, ?, >, t, u ) = (2 Vars, µ, ;,Vars, [, Å ) x := y op z in out F x := y op z (out) = out –

Program analysis Mooly Sagiv html://

1 Control Flow Analysis Mooly Sagiv Tel Aviv University Textbook Chapter 3

Previous finals up on the web page use them as practice problems look at them early.

Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.

Interprocedural Analysis Noam Rinetzky Mooly Sagiv.

Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.

Interprocedural Analysis Noam Rinetzky Mooly Sagiv Tel Aviv University Textbook Chapter 2.5.

1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.

Data Flow Analysis Compiler Design Nov. 8, 2005.

Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.

Overview of program analysis Mooly Sagiv html://

1 Program Analysis Systematic Domain Design Mooly Sagiv Tel Aviv University Textbook: Principles.

Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.

Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

Overview of program analysis Mooly Sagiv html://

Improving the Precision of Abstract Simulation using Demand-driven Analysis Olatunji Ruwase Suzanne Rivoire CS June 12, 2002.

Model Checking Lecture 5. Outline 1 Specifications: logic vs. automata, linear vs. branching, safety vs. liveness 2 Graph algorithms for model checking.

1 Tentative Schedule u Today: Theory of abstract interpretation u May 5 Procedures u May 15, Orna Grumberg u May 12 Yom Hatzamaut u May.

Constant Propagation. The constant propagation framework is different from all the data-flow problems discussed so far, in that It has an unbounded set.

Procedure Optimizations and Interprocedural Analysis Chapter 15, 19 Mooly Sagiv.

Precision Going back to constant prop, in what cases would we lose precision?

Precise Interprocedural Dataflow Analysis via Graph Reachibility

1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:

Program Analysis and Verification

Program Analysis and Verification

Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.

Program Analysis and Verification

Convergence of Model Checking & Program Analysis Philippe Giabbanelli CMPT 894 – Spring 2008.

Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 12: Abstract Interpretation IV Roman Manevich Ben-Gurion University.

Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

Interprocedural Analysis Noam Rinetzky Mooly Sagiv.

Generating Precise and Concise Procedure Summaries Greta Yorsh Eran Yahav Satish Chandra.

Operational Semantics Mooly Sagiv Tel Aviv University Textbook: Semantics with Applications Chapter.

1 Iterative Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program.

1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:

Operational Semantics Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.

Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.

Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.

Program Analysis Last Lesson Mooly Sagiv. Goals u Show the significance of set constraints for CFA of Object Oriented Programs u Sketch advanced techniques.

Program Analysis and Verification

Mooly Sagiv html://

Spring 2016 Program Analysis and Verification

Interprocedural Analysis Chapter 19

Symbolic Implementation of the Best Transformer

Iterative Program Analysis Abstract Interpretation

Program Analysis and Verification

Objective of This Course

Program Analysis and Verification

Program Analysis via Graph Reachability

Program Analysis via Graph Reachability

Data Flow Analysis Compiler Design

Pointer analysis.

Program Analysis via Graph Reachability

Presentation transcript:

Interprocedural Analysis Noam Rinetzky Mooly Sagiv

Bibliography Textbook 2.5 Patrick Cousot & Radhia Cousot. Static determination of dynamic properties of recursive procedures In IFIP Conference on Formal Description of Programming Concepts, E.J. Neuhold, (Ed.), pages , St-Andrews, N.B., Canada, North-Holland Publishing Company (1978). Two Approaches to interprocedural analysis by Micha Sharir and Amir Pnueli IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Sagiv, Reps, Horowitz, and TCS’96

Challenges in Interprocedural Analysis Respect call-return mechanism Handling recursion Local variables Parameter passing mechanisms The called procedure is not always known The source code of the called procedure is not always available Concurrency

Stack regime P() { … R(); … } R(){ … } Q() { … R(); … } call return call return P R

Guiding light Exploit stack regime  Precision  Efficiency

Simplifying Assumptions  Parameter passed by value  No OO  No nesting  No concurrency Recursion is supported

Topics Covered The trivial approach Procedure Inlining The naive approach The call string approach Functional Approach Valid paths Interprocedural Analysis via Graph Reachability Beyond reachability [Demand analysis] [Advanced Topics] –Karr’s analysis –Backward –MOPED

Undecidability Issues It is undecidable if a program point is reachable in some execution Some static analysis problems are undecidable even if the program conditions are ignored

The Constant Propagation Example while (...) { if (...) x_1 = x_1 + 1; if (...) x_2 = x_2 + 1;... if (...) x_n = x_n + 1; } y = truncate (1/ (1 + p 2 (x_1, x_2,..., x_n)) /* Is y=0 here? */

Conservative Solution Every detected constant is indeed constant –But may fail to identify some constants Every potential impact is identified –Superfluous impacts

The extra cost of procedures Call return complicates the analysis Unbounded stack memory Increases asymptotic complexity But can be tolerable in practice Sometimes even cheaper/more precise than intraprocedural analysis

A trivial treatment of procedure Analyze a single procedure After every call continue with conservative information –Global variables and local variables which “may be modified by the call” have unknown values Can be easily implemented Procedures can be written in different languages Procedure inline can help

Disadvantages of the trivial solution Modular (object oriented and functional) programming encourages small frequently called procedures Almost all information is lost

Procedure Inlining Inline called procedures Simple Does not handle recursion Exponential blow up p1 { call p2 … call p2 } p2 { call p3 … call p3 } p3{ }

A Naive Interprocedural solution Treat procedure calls as gotos Abstract call/return correlations Obtain a conservative solution Use chaotic iterations Usually fast

Simple Example int p(int a) { return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a  7] return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a  7] return a + 1; [a  7, $$  8] } void main() { int x ; x = p(7); x = p(9) ; }

Simple Example int p(int a) { [a  7] return a + 1; [a  7, $$  8] } void main() { int x ; x = p(7); [x  8] x = p(9) ; [x  8] }

Simple Example int p(int a) { [a  7] return a + 1; [a  7, $$  8] } void main() { int x ; x =l p(7); [x  8] x = p(9) ; [x  8] }

Simple Example int p(int a) { [a  ] return a + 1; [a  7, $$  8] } void main() { int x ; x = p(7); [x  8] x = p(9) ; [x  8] }

Simple Example int p(int a) { [a  ] return a + 1; [a , $$  ] } void main() { int x ; x = p(7); [x  8] x = p(9); [x  8] }

Simple Example int p(int a) { [a  ] return a + 1; [a , $$  ] } void main() { int x ; x = p(7) ; [x  ] x = p(9) ; [x  ] }

The Call String Approach The data flow value is associated with sequences of calls (call string) Use Chaotic iterations To guarantee termination limit the size of call string (typically 1 or 2) –Represents tails of calls Abstract inline

Simple Example int p(int a) { return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a  7] return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a  7] return a + 1; c1:[a  7, $$  8] } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

Simple Example int p(int a) { c1: [a  7] return a + 1; c1:[a  7, $$  8] } void main() { int x ; c1: x = p(7);  : x  8 c2: x = p(9) ; }

Simple Example int p(int a) { c1:[a  7] return a + 1; c1:[a  7, $$  8] } void main() { int x ; c1: x = p(7);  : [x  8] c2: x = p(9) ; }

Simple Example int p(int a) { c1:[a  7] c2:[a  9] return a + 1; c1:[a  7, $$  8] } void main() { int x ; c1: x = p(7);  : [x  8] c2: x = p(9) ; }

Simple Example int p(int a) { c1:[a  7] c2:[a  9] return a + 1; c1:[a  7, $$  8] c2:[a  9, $$  10] } void main() { int x ; c1: x = p(7);  : [x  8] c2: x = p(9) ; }

Simple Example int p(int a) { c1:[a  7] c2:[a  9] return a + 1; c1:[a  7, $$  8] c2:[a  9, $$  10] } void main() { int x ; c1: x = p(7);  : [x  8] c2: x = p(9) ;  : [x  10] }

Another Example int p(int a) { c1:[a  7] c2:[a  9] return c3: p1(a + 1); c1:[a  7, $$  ] c2:[a  9, $$  ] } void main() { int x ; c1: x = p(7);  : [x  8] c2: x = p(9) ;  : [x  10] } int p1(int b) { (c1|c2)c3:[b  ] return 2 * b; (c1|c2)c3:[b , $$  ] }

Recursion int p(int a) { c1: [a  7] if (…) { c1: [a  7] a = a -1 ; c1: [a  6] c2: p (a); a = a + 1;{ x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2: [a  6] if (…) { c1: [a  7] a = a -1 ; c1: [a  6] c2: p (a); a = a + 1;} x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2: [a  6] if (…) { c1: [a  7] c1.c2: [a  6] a = a -1 ; c1: [a  6] c2: p (a); a = a + 1;} x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2: [a  6] if (…) { c1: [a  7] c1.c2: [a  6] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2: p (a); a = a + 1;} x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2: [a  6] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2: p (a); a = a + 1;} x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2: p (a); a = a + 1;} x = -2*a + 5; } void main() { c1: x = p(7); }

Recursion int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); a = a + 1; x = -2*a + 5; } void main() { c1: x = p(7); }

int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); a = a + 1;} c1: [a  7] c1.c2+: [a   ] x = -2*a + 5; c1: [a  7, x  -9] c1.c2+: [a  , x  ] } void main() { c1: x = p(7); }

int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); c1.c2+: [a  , x  ] a = a + 1; } c1: [a  7] c1.c2+: [a   ] x = -2*a + 5; c1: [a  7, x  -9] c1.c2+: [a  , x  ] } void main() { c1: x = p(7); }

int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); c1.c2+: [a  , x  ] a = a + 1; c1.c2+: [a  , x  ] } c1: [a  7] c1.c2+: [a   ] x = -2*a + 5; c1: [a  7, x  -9] c1.c2+: [a  , x  ] } void main() { c1: x = p(7); }

int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); c1: [a  , x  ] a = a + 1; c1.c2+: [a  , x  ] c1: [a  , x  ] } c1: [a  7] c1.c2+: [a   ] x = -2*a + 5; c1: [a  , x  ] c1.c2+: [a  , x  ] } void main() { c1: x = p(7); }

int p(int a) { c1: [a  7] c1.c2+: [a   ] if (…) { c1: [a  7] c1.c2+: [a   ] a = a -1 ; c1: [a  6] c1.c2+: [a   ] c2: p (a); c1.c2*: [a   ] a = a + 1; c1.c2*: [a   ] } c1.c2*: [a   ] x = -2*a + 5; c1.c2*: [a  , x  ] } void main() { c1: p(7);  : [x  ] }

Summary Call String Easy to implement Efficient for very small call strings Too expensive even for call strings of length 3 For finite domains can be precise even with recursion Limited precision Order of calls can be abstracted Related method: procedure cloning

Interprocedural analysis sRsR n1n1 n3n3 n2n2 f1f1 f3f3 f2f2 f1f1 eReR f3f3 sPsP n5n5 n4n4 f1f1 ePeP f5f5 P R Call Q f c2e f x2r Call node Return node Entry node Exit node sQsQ n7n7 n6n6 f1f1 eQeQ f6f6 Q Call Q Call node Return node f c2e f x2r f1f1 Supergraph

paths(n) the set of paths from s to n –( (s,n 1 ), (n 1,n 3 ), (n 3,n 1 ) ) Paths s n1n1 n3n3 n2n2 f1f1 f1f1 f3f3 f2f2 f1f1 e f3f3

Interprocedural Valid Paths () f1f1 f2f2 f k-1 fkfk f3f3 f4f4 f5f5 f k-2 f k-3 call q enter q exit q ret IVP: all paths with matching calls and returns –And prefixes

Interprocedural Valid Paths IVP set of paths –Start at program entry Only considers matching calls and returns –aka, valid Can be defined via context free grammar –matched ::= matched ( i matched ) i | ε –valid ::= valid ( i matched | matched paths can be defined by a regular expression

Join Over All Paths (JOP) start n i JOP[v] =  {[[e 1, e 2, …,e n ]](  ) | (e 1, …, e n )  paths(v)} JOP  LFP –Sometimes JOP = LFP precise up to “symbolic execution” Distributive problem  L  L

The Join-Over-Valid-Paths (JVP) vpaths(n) all valid paths from program start to n JVP[n] =  {[[e 1, e 2, …, e]](  ) (e 1, e 2, …, e)  vpaths(n)} JVP  JFP –In some cases the JVP can be computed –(Distributive problem)

The Functional Approach The meaning of a procedure is mapping from states into states The abstract meaning of a procedure is function from an abstract state to abstract states Relation between input and output In certain cases can compute JVP

The Functional Approach Two phase algorithm –Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values) –Compute the dataflow values at every point using the functional values

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; p (a); a = a + 1; { x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; p (a); a = a + 1; { [a  a 0, x  x 0 ] x = -2*a + 5; } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; p (a); a = a + 1; { [a  a 0, x  x 0 ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; [a  a 0 -1, x  x 0 ] p (a); a = a + 1; { [a  a 0, x  x 0 ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; [a  a 0 -1, x  x 0 ] p (a); [a  a 0 -1, x  -2a 0 +7] a = a + 1; { [a  a 0, x  x 0 ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; [a  a 0 -1, x  x 0 ] p (a); [a  a 0 -1, x  -2a 0 +7] a = a + 1; [a  a 0, x  -2a 0 +7] { [a  a 0, x  x 0 ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; [a  a 0 -1, x  x 0 ] p (a); [a  a 0 -1, x  -2a 0 +7] a = a + 1; [a  a 0, x  -2a 0 +7] { [a  a 0, x  ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 1 int p(int a) { [a  a 0, x  x 0 ] if (…) { a = a -1 ; [a  a 0 -1, x  x 0 ] p (a); [a  a 0 -1, x  -2a 0 +7] a = a + 1; [a  a 0, x  -2a 0 +7] { [a  a 0, x  ] x = -2*a + 5; [a  a 0, x  -2a 0 + 5] } void main() { p(7); }

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; p (a); a = a + 1; { x = -2*a + 5; } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; p (a); a = a + 1; { [a  7, x  0] x = -2*a + 5; } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; p (a); a = a + 1; { [a  7, x  0] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); a = a + 1; { [a  7, x  0] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; { [a  7, x  0] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  0] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a  7, x  0] [a  6, x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a  6, x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a , x  0] p (a); [a  6, x  -9] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a , x  0] p (a); [a , x  ] a = a + 1; [a  7, x  -9] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a , x  0] p (a); [a , x  ] a = a + 1; [a , x  ] { [a  7, x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a , x  0] p (a); [a , x  ] a = a + 1; [a , x  ] { [a , x  ] x = -2*a + 5; [a  7, x  -9] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Phase 2 int p(int a) { [a , x  0] if (…) { a = a -1 ; [a , x  0] p (a); [a , x  ] a = a + 1; [a , x  ] { [a , x  ] x = -2*a + 5; [a , x  ] } void main() { p(7); [x  -9] } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5]

Summary Functional approach Computes procedure abstraction Sharing between different contexts Rather precise Recursive procedures may be more precise/efficient than loops But requires more from the implementation –Representing relations –Composing relations

Issues in Functional Approach How to guarantee that finite height for functional lattice? –It may happen that L has finite height and yet the lattice of monotonic function from L to L do not Efficiently represent functions –Functional join –Functional composition –Testing equality

A Complicated Example main() { a = 0; p(); print a; } p() { if (…) { a= a sgn(a -100); p(); a – 1 ; }

CFL-Graph reachability Special cases of functional analysis Finite distributive lattices Provides more efficient analysis algorithms Reduce the interprocedural analysis problem to finding context free reachability

IDFS / IDE IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Reps, Horowitz, and Sagiv, FASE’95, TCS’96 –More general solutions exist

Possibly Uninitialized Variables Startx = 3 if... y = x y = w w = 8 printf(y) {w,x,y} {w,y} {w} {w,y} {} {w,y} {}

Efficiently Representing Functions Let f:2 D  2 D be a distributive function Then: f(X) = f(  )   {z  X: f({z}) }

Representing Dataflow Functions Identity Function Constant Function a bc a bc

“Gen/Kill” Function Non-“Gen/Kill” Function a bc a bc Representing Dataflow Functions

x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b

a bcbc a Composing Dataflow Functions bc a

x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( )

Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) –unbalLeft = valid Fact d holds at n iff there is an L(unbalLeft)- path from

Asymptotic Running Time CFL-reachability –Exploded control-flow graph: ND nodes –Running time: O(N 3 D 3 ) Exploded control-flow graph Special structure Running time: O(ED 3 ) Typically: E l N, hence O(ED 3 ) l O(ND 3 ) “Gen/kill” problems: O(ED)

a bcbc a Dataflow Flow Functions bc a

x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( )

IDE Goes beyond IFDS problems –Can handle unbounded domains Requires special form of the domain Can be much more efficient than IFDS

Domain of environments Environments: Env(D,L) = D  L –D finite set of program symbols –L finite height lattice E.g., map from variables (D) to values (L) e  e’ =  d.(e(d)  e’(d) ) Top in Env(D,L) is  d. 

Environment transformers t: Env(D,L)  Env(D,L) –t should be distributive  d  D: (  {t(e)|e  E})(d) = (t(  {(e) | e  E}))(d) Must hold for inifinite sets E  Env(D,L)

IDE Problem IP=(G*,D,L,M) –G* supergraph –D, L as above –M assigns distributive transformers to E*

Point-wise representation of environments IFDS with “weights” Bi-partite graph –D  {  }  D  {  } –R t (d’,d)=  l. … t (  d. ) macro-function R t (d’,d) (  l.) micro-function a x  l.l - 2  l.l x = a - 2 [[R t ]](env)(d)= f( … R t (, ) …)(d) t = [[R t ]]

Point-wise representation of environments  = top  = bottom  = join

Point-wise representation of environments [[R t ]](env)(d)=R t ( ,d)(  )  (  d’  D R t (d’,d)(env(d’))) t = [[R t ]]  = bottom  = top

Example Linear Constant Propagation Consider the constant propagation lattice The value of every variable y at the program exit can be represented by: y =  {(a x x + b x )| x  Var * }  c a x,c  Z  { ,  } b x  Z Supports efficient composition and “functional” join –[z := a * y + b] –What about [z:=x+y]?

Linear constant propagation Point-wise representation of environment transformers

IDE Analysis Point-wise representation closed under composition CFL-Reachability on the exploded graph Compose functions

Costs O(ED 3 ) Class of value transformers F  L  L –id  F –Finite height Representation scheme with (efficient) Application Composition Join Equality Storage

Conclusion Handling functions is crucial for abstract interpretation Virtual functions and exceptions complicate things But scalability is an issue –Small call strings –Small functional domains –Demand analysis