CMPUT 680 - Compiler Design and Optimization, Winter 2006. Topic P: Reference Analysis. José Nelson Amaral



References
- Ryder, Barbara G., “Dimensions of Precision in Reference Analysis of Object-Oriented Programming Languages,” Compiler Construction, Warsaw, Poland, April.
- Shapiro, Marc and Horwitz, Susan, “Fast and Accurate Flow-Insensitive Points-To Analysis,” Symposium on Principles of Programming Languages, pp. 1-14, Paris, France, 1997.
- Emami, Maryam, Ghiya, Rakesh, and Hendren, Laurie J., “Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers,” Programming Language Design & Implementation, Orlando, FL, 1994.
- Steensgaard, Bjarne, “Points-to Analysis in Almost Linear Time,” Symposium on Principles of Programming Languages, 1996.
- Landi, William, and Ryder, Barbara G., “A Safe Approximate Algorithm for Interprocedural Pointer Aliasing,” Programming Language Design & Implementation, 1992.

Motivation

    …
    [1] x = 0;
    [2] *p = 1;
    [3] write(x);
    …

Examples of optimization problems:
(1) Draw a data dependence graph for statements 1-3.
(2) Does definition [1] reach statement [3]?
(3) Can the constant 0 be propagated to statement [3]?
(Shapiro/Horwitz, POPL 97)

Motivation (continued)
For the same three statements, there are three situations to consider:
- p must point to x ⇒ x = 1 in [3].
- p must not point to x ⇒ x = 0 in [3].
- p may point to x ⇒ the compiler does not know the value of x in [3].
(Shapiro/Horwitz, POPL 97)

Flow-sensitivity
- Flow-sensitive analysis: takes into account the order in which statements are executed.
- Flow-insensitive analysis: assumes that statements can be executed in any order.
(Shapiro/Horwitz, POPL 97)
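The difference between the two definitions above can be seen on a tiny straight-line program. The following is a minimal sketch, not any published algorithm: it assumes a program made only of statements of the form `lhs = &target`, with no branches or calls, and the function names `flow_insensitive` and `flow_sensitive` are hypothetical.

```python
def flow_insensitive(statements):
    """Ignore statement order: every address-of assignment contributes a target."""
    points_to = {}
    for lhs, target in statements:
        points_to.setdefault(lhs, set()).add(target)
    return points_to


def flow_sensitive(statements):
    """Respect statement order: a later assignment overwrites an earlier one.

    Returns the points-to sets that hold after the last statement.
    """
    points_to = {}
    for lhs, target in statements:
        points_to[lhs] = {target}
    return points_to


# p = &a; q = &c; p = &b;
program = [("p", "a"), ("q", "c"), ("p", "b")]
insensitive = flow_insensitive(program)
sensitive = flow_sensitive(program)
```

At the end of this program the flow-sensitive result says p points only to b, while the flow-insensitive result keeps both a and b as possible targets of p.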

Context-sensitivity
- Context-sensitive analysis: takes into account the fact that a function must return to the site of the most recent call.
- Context-insensitive analysis: propagates information from a call site, through the called function, and back to all call sites.
- A context-insensitive analysis constructs a single approximation to a procedure’s effect on all of its callers (Ruf, PLDI 95).
(Shapiro/Horwitz, POPL 97)

Andersen’s vs. Steensgaard’s Analysis
- Both analyses are flow-insensitive and context-insensitive.
- Both build an alias graph (Burke, Carini, Choi, Hind, LCPC 94) or a storage shape graph (Chase, Wegman, Zadeck, PLDI 90).
- Andersen: each node can have an arbitrary number of out-edges ⇒ each node represents one variable.
- Steensgaard: each node has at most one out-edge ⇒ each node may represent more than one variable.
(Shapiro/Horwitz, POPL 97)

Andersen’s vs. Steensgaard’s (Example)
The program is built up one statement at a time. After each statement, the points-to set S and the graph produced by each analysis are shown. A node written (b,d) represents the merged variables b and d.

Step 1. Program: a = &b;
- Andersen: S = {(a,b)}; graph: a → b
- Steensgaard: S = {(a,b)}; graph: a → b

Step 2. Program: a = &b; b = &c;
- Andersen: S = {(a,b); (b,c)}; graph: a → b → c
- Steensgaard: S = {(a,b); (b,c)}; graph: a → b → c

Step 3. Program: a = &b; b = &c; a = &d;
What should happen in each analysis?
- Andersen: S = {(a,b); (b,c); (a,d)}; graph: a → b → c, a → d
- Steensgaard: S = {(a,b); (b,c); (a,d); (d,c)}; graph: a → (b,d) → c

Step 4. Program: a = &b; b = &c; a = &d; d = &e;
And now?
- Andersen: S = {(a,b); (b,c); (a,d); (d,e)}; graph: a → b → c, a → d → e
- Steensgaard: S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}; graph: a → (b,d) → (c,e)
(Shapiro/Horwitz, POPL 97)
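The four-statement example can be replayed with a short sketch. It is a simplification under stated assumptions: only statements of the form `x = &y` are handled (no copies, loads, or stores), so the Andersen side reduces to collecting pairs. The function names `andersen` and `steensgaard` are hypothetical and not taken from either paper.

```python
def andersen(statements):
    """Address-of statements only: each one adds a single points-to pair."""
    return {(x, y) for x, y in statements}


def steensgaard(statements):
    """Unification-based sketch: every class of variables has at most one out-edge."""
    parent = {}   # union-find forest over variable names
    pointee = {}  # class representative -> some member of the pointed-to class

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    def join(u, v):
        u, v = find(u), find(v)
        if u == v:
            return
        pu, pv = pointee.pop(u, None), pointee.pop(v, None)
        parent[v] = u
        if pu is not None and pv is not None:
            pointee[u] = pu
            join(pu, pv)  # both classes had a target: the targets must merge too
        elif pu is not None or pv is not None:
            pointee[u] = pu if pu is not None else pv

    for x, y in statements:  # x = &y
        cx = find(x)
        find(y)  # make sure y is registered
        if cx in pointee:
            join(pointee[cx], y)
        else:
            pointee[cx] = y

    variables = list(parent)
    result = set()
    for v in variables:
        target = pointee.get(find(v))
        if target is not None:
            t = find(target)
            result |= {(v, w) for w in variables if find(w) == t}
    return result


# a = &b; b = &c; a = &d; d = &e;
program = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]
andersen_set = andersen(program)
steensgaard_set = steensgaard(program)
```

On this input the sketch yields the two sets listed for the final step of the example: the unification in `steensgaard` merges b with d and then c with e, which is where the extra pairs (d,c) and (b,e) come from.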

Steensgaard’s Algorithm
- Based on a non-standard type system. The “type” of a variable describes the set of locations possibly pointed to by the variable.
- At initialization, each variable is described by a different type.
- Fast union-find structures provide near-constant-time access to the type associated with a variable name.
- Each statement is processed exactly once, joining type variables as necessary to ensure that the program is well-typed.
(Steensgaard, POPL 96)

Comparison of Steensgaard and Andersen
- For small programs (up to 3,000 lines) both analyses are very fast.
- For some large programs, Andersen’s may take a very long time (from more than 10 to more than 100 times as long as Steensgaard’s).
- For 37 out of 61 programs (21 out of 25 “large” programs), Andersen’s points-to set is less than half the size of Steensgaard’s (thus Andersen’s is more precise).
(Shapiro/Horwitz, POPL 97)

Shapiro/Horwitz Algorithm 1
- Allows each node in the alias graph to have out-degree up to k (k is an input to the algorithm).
- Each variable is assigned to one of k categories. Only nodes of variables in the same category are merged.
- The partition of variables into categories is an input to the algorithm:
  - All variables in one category ⇒ Steensgaard
  - Each variable in a separate category ⇒ Andersen
(Shapiro/Horwitz, POPL 97)

Shapiro/Horwitz Algorithm 1 (Example)
Program: a = &b; a = &c; a = &d; c = &d;

Categories: {a, b, c, d}
- Graph: a → (b,c,d), (b,c,d) → (b,c,d)
- S = {(a,b); (a,c); (a,d); (b,b); (b,c); (b,d); (c,b); (c,c); (c,d); (d,b); (d,c); (d,d)}

Categories: {a, b}, {c, d}
- Graph: a → b, a → (c,d), (c,d) → (c,d)
- S = {(a,b); (a,c); (a,d); (c,c); (c,d); (d,c); (d,d)}

Categories: {a, c}, {b, d}
- Graph: a → (b,d), a → c, c → (b,d)
- S = {(a,b); (a,c); (a,d); (c,b); (c,d)}

Categories: {a, b}, {c}, {d}
- Graph: a → b, a → c, a → d, c → d
- S = {(a,b); (a,c); (a,d); (c,d)}
(Shapiro/Horwitz, POPL 97)
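The four categorizations above can be reproduced with a small sketch of the category idea. It is an illustration under assumptions, not the paper's implementation: only `x = &y` statements are supported, each node keeps one out-edge slot per category, and the names `algorithm1` and `category` are hypothetical.

```python
def algorithm1(statements, category):
    """category maps each variable to a category id; one out-edge per category."""
    cats = set(category.values())
    parent = {}  # union-find forest over variable names
    slots = {}   # (class representative, category id) -> member of target class

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def join(u, v):
        work = [(u, v)]
        while work:
            u, v = work.pop()
            u, v = find(u), find(v)
            if u == v:
                continue
            parent[v] = u
            for cat in cats:  # move v's out-edges over to u
                moved = slots.pop((v, cat), None)
                if moved is None:
                    continue
                if (u, cat) in slots:
                    work.append((slots[(u, cat)], moved))  # same slot: merge targets
                else:
                    slots[(u, cat)] = moved

    for x, y in statements:  # x = &y
        key = (find(x), category[y])
        find(y)
        if key in slots:
            join(slots[key], y)  # slot taken: merge y into the existing target
        else:
            slots[key] = y

    variables = list(parent)
    result = set()
    for v in variables:
        for cat in cats:
            target = slots.get((find(v), cat))
            if target is not None:
                t = find(target)
                result |= {(v, w) for w in variables if find(w) == t}
    return result


program = [("a", "b"), ("a", "c"), ("a", "d"), ("c", "d")]
one_category = algorithm1(program, {"a": 0, "b": 0, "c": 0, "d": 0})
ab_cd = algorithm1(program, {"a": 0, "b": 0, "c": 1, "d": 1})
ac_bd = algorithm1(program, {"a": 0, "c": 0, "b": 1, "d": 1})
ab_c_d = algorithm1(program, {"a": 0, "b": 0, "c": 1, "d": 2})
```

With a single category the sketch behaves like the Steensgaard-style unification (12 pairs); as the partition becomes finer the result shrinks toward the Andersen-style set of 4 pairs.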

Shapiro/Horwitz Algorithm 2
- Insight: run Algorithm 1 multiple times with k categories and a different category partition each time.
  - True points-to relationships are found by taking the intersection of the points-to sets computed by each run.
- Goal: select a set of runs such that for every pair of variables (x, y), there is at least one run in which x and y are in different categories.
(Shapiro/Horwitz, POPL 97)

Shapiro/Horwitz Algorithm 2
- Solution: $R = \lceil \log_k N \rceil$ runs, where N is the number of variables.
- Assign each variable a unique number, in base k, in the range 0 to N-1, using R digits.
- Use this encoding to select the variables’ categories: in each run, one digit of a variable’s number determines its category.
(Shapiro/Horwitz, POPL 97)

Shapiro/Horwitz Algorithm 2 (Example)
Program: a = &b; a = &c; a = &d; c = &d;
The program has four variables. For k = 2 the encoding is: a = 00, b = 01, c = 10, d = 11.
- Run 1, categories {a,c}, {b,d}: S1 = {(a,b); (a,c); (a,d); (c,b); (c,d)}
- Run 2, categories {a,b}, {c,d}: S2 = {(a,b); (a,c); (a,d); (c,c); (c,d); (d,c); (d,d)}
- Result: S1 ∩ S2 = {(a,b); (a,c); (a,d); (c,d)}
(Shapiro/Horwitz, POPL 97)
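The category selection and the final intersection can be sketched as follows. This is only an illustration: `run_categories` is a hypothetical helper, the assumption that run i uses the i-th least significant digit is inferred from the example, and the per-run sets S1 and S2 are copied from the example rather than recomputed.

```python
def run_categories(variables, k):
    """Number the variables in base k and derive one category map per run.

    Run i (0-based) groups variables by the i-th least significant digit.
    """
    n = len(variables)
    runs = 1
    while k ** runs < n:  # smallest R with k**R >= N, i.e. ceil(log_k N)
        runs += 1
    result = []
    for i in range(runs):
        result.append({v: (index // k ** i) % k for index, v in enumerate(variables)})
    return result


categories = run_categories(["a", "b", "c", "d"], k=2)

# Points-to sets of the two runs, copied from the example above.
s1 = {("a", "b"), ("a", "c"), ("a", "d"), ("c", "b"), ("c", "d")}
s2 = {("a", "b"), ("a", "c"), ("a", "d"), ("c", "c"), ("c", "d"), ("d", "c"), ("d", "d")}
combined = s1 & s2
```

Because any two distinct variables differ in at least one digit, there is always some run that keeps them in different categories, which is the goal stated for Algorithm 2.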

Possible (May) vs. Definite Points-to Relations
- Possible and definite points-to information can be important.
- Statement: *p = x. If p definitely points to y at this point, then all previous points-to relations from y are killed.
[Figure: points-to graph over p, y, w, z before the statement, and over x, p, y, w, z after the statement.]
(Emami, Ghiya, Hendren, PLDI 94)

Possible (May) vs. Definite Points-to Relations
- Possible and definite points-to information can be important.
- Statement: x = *q. If q definitely points to y at this point, then the statement can be replaced by x = y.
(Emami, Ghiya, Hendren, PLDI 94)

Stack-based and Heap-based Aliasing
- Three varieties of aliases:
  - Aliases between variable references to the stack;
  - Aliases between references to the heap;
  - Aliases between two references to the same array.
- Analysis of stack-based aliases and heap-based aliases should be decoupled.
(Emami, Ghiya, Hendren, PLDI 94)

Stack-based and Heap-based Aliasing
- Stack-based analysis:
  - A name exists for each location of interest;
  - Compute an approximation of the relationship between these locations.
- Heap-based analysis:
  - There are no natural names for locations;
  - We don’t know how many locations will exist.
- Solution: consider the entire heap a single location in the stack analysis.
(Emami, Ghiya, Hendren, PLDI 94)

The Reference Analysis Problem Statement
- The reference analysis problem is a generalization of alias analysis:
  - Given two references p and q, is it possible that p and q may:
    - point to the same memory location?
    - refer to the same object?
    - dispatch to the same method?

Alias Pairs
- Sets of alias pairs (Landi & Ryder, PLDI 92): two variable references may be aliased if they may refer to the same memory location.
- After the statement p = q, the following alias pairs are created: ⟨*p, *q⟩, ⟨p->next, q->next⟩, ⟨p->next->next, q->next->next⟩, …

Points-to Abstraction
- Create an abstract representation of the stack:
  - Each real stack location that is involved in a points-to relationship is represented by exactly one named abstract location;
  - Each named abstract location represents one or more real stack locations.
[Figure: abstract locations x and y mapped to real locations loc_i, loc_j, and loc_k.]
(Emami, Ghiya, Hendren, PLDI 94)

Abstract Stack Location
- An abstract stack location corresponds to one of the following:
  - The name of a local variable, global variable, or parameter.
  - A symbolic name that represents a location not in the scope of the procedure under analysis.
  - The symbolic name heap.
(Emami, Ghiya, Hendren, PLDI 94)

Definitely Points-to
- An abstract location x definitely points to abstract location y, denoted (x, y, D), for a given invocation context, if:
  - x and y each represent exactly one real location in that context; and
  - the real location corresponding to x contains the address of the real location corresponding to y.
(Emami, Ghiya, Hendren, PLDI 94)

Possibly Points-to
- An abstract location x possibly points to abstract location y, denoted (x, y, P), for a given invocation context, if:
  - it is possible that one of the real locations corresponding to x contains the address of one of the real locations represented by y.
(Emami, Ghiya, Hendren, PLDI 94)

Safe Approximation
- Let S be the points-to set at point p.
- Consider all pairs of real locations loc_i and loc_j. Let x be the abstract location of loc_i and y be the abstract location of loc_j.
[Figure: abstract locations x and y above real locations loc_i and loc_j.]
(Emami, Ghiya, Hendren, PLDI 94)

Safe Approximation
- S is a safe approximation at p if:
  - S contains (x,y,D) or (x,y,P) when loc_i points to loc_j on all valid execution paths to p.
  - S contains (x,y,P) when loc_i points to loc_j on some, but not all, execution paths to p.
  - If S contains (x,y,D), then loc_i must point to loc_j on all paths to p.
[Figure: abstract locations x and y above real locations loc_i and loc_j, with the relation between x and y marked “?”.]
(Emami, Ghiya, Hendren, PLDI 94)

Safety and Precision
- An approximation is unsafe if:
  - A real points-to relationship is not in S;
  - A spurious definite points-to relation is in S.
- The following approximation is safe, just not very precise:
  - Every abstract location possibly points to every other abstract location.
(Emami, Ghiya, Hendren, PLDI 94)
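The two ways of being unsafe can be written as a small check. This is a sketch under simplifying assumptions: every abstract location stands for exactly one real location, the real points-to pairs on each execution path reaching p are given explicitly, and the name `is_safe` is hypothetical.

```python
def is_safe(abstract_set, paths):
    """abstract_set: set of (x, y, 'D' or 'P'); paths: list of sets of real (x, y) pairs."""
    covered = {(x, y) for (x, y, _) in abstract_set}
    definite = {(x, y) for (x, y, d) in abstract_set if d == "D"}

    # Unsafe if a real points-to relationship is missing from S.
    for path in paths:
        if not path <= covered:
            return False

    # Unsafe if S claims a definite relation that fails on some path.
    for pair in definite:
        if not all(pair in path for path in paths):
            return False
    return True


# Two execution paths reach the point: p -> x on one, p -> y on the other.
paths = [{("p", "x")}, {("p", "y")}]
precise = {("p", "x", "P"), ("p", "y", "P")}
spurious_definite = {("p", "x", "D"), ("p", "y", "P")}
missing_pair = {("p", "x", "P")}
everything_possible = {(a, b, "P") for a in "pxy" for b in "pxy" if a != b}
```

Here `precise` and `everything_possible` both pass the check, which matches the last bullet above: the all-possible set is safe but much larger than necessary.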

L-locations and R-locations
- Problem: given the input points-to set $S_{in}$ of a statement, generate its output points-to set $S_{out}$.
- Solution:
  - Compute an abstract representation of the locations represented by the left-hand side (L-locations) and by the right-hand side (R-locations) of the statement.
  - Notation:
    - (x,D): abstract location x is definitely in the set
    - (x,P): abstract location x is possibly in the set

L-locations and R-locations (example)
Statement: $*q = w$, with $S_{input} = \{(q,y,D); (y,w,P); (y,z,P); (w,v,D)\}$. What is $S_{output}$?
[Figure: points-to graph over q, y, w, z, v before the statement.]

Step 1. Compute the L-locations and R-locations:
$L\text{-}locations(*q) = \{(x,d) \mid (q,x,d) \in S_{input}\} = \{(y,D)\}$
$R\text{-}locations(w) = \{(x,d) \mid (w,x,d) \in S_{input}\} = \{(v,D)\}$

Step 2. Compute the kill, change, and gen sets:
$S_{kill} = \{(p,x,d) \mid (p,D) \in L\text{-}locations(*q) \land (p,x,d) \in S_{input}\} = \{(y,w,P); (y,z,P)\}$
$S_{change} = \{(p,x,D) \mid (p,P) \in L\text{-}locations(*q) \land (p,x,D) \in S_{input}\} = \{\}$
$S_{gen} = \{(p,x,d_1 \land d_2) \mid (p,d_1) \in L\text{-}locations(*q) \land (x,d_2) \in R\text{-}locations(w)\} = \{(y,v,D)\}$
One clause in these definitions (highlighted in green on the original slide) is not in the original paper, but it is necessary.

Step 3. Demote the changed relations from definite to possible:
$S_{inputchanged} = (S_{input} - S_{change}) \cup \{(p,x,P) \mid (p,x,D) \in S_{change}\} = S_{input}$

Step 4. Compute the output:
$S_{output} = (S_{inputchanged} - S_{kill}) \cup S_{gen} = \{(q,y,D); (w,v,D); (y,v,D)\}$
[Figure: points-to graph over q, y, w, z, v after the statement.]
(Emami, Ghiya, Hendren, PLDI 94)
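The steps of this example can be replayed in a few lines. The sketch covers only statements of the form `*q = w`; it assumes that a changed definite relation is demoted to a possible one, and that the combination of d1 and d2 is D only when both are D. The function name `store_through_pointer` is hypothetical.

```python
def store_through_pointer(s_input, q, w):
    """Transfer function sketch for the statement *q = w on (x, y, 'D'/'P') triples."""
    l_locations = {(x, d) for (src, x, d) in s_input if src == q}
    r_locations = {(x, d) for (src, x, d) in s_input if src == w}

    definite_targets = {p for (p, d) in l_locations if d == "D"}
    possible_targets = {p for (p, d) in l_locations if d == "P"}

    # Relations leaving a definitely assigned location are killed.
    s_kill = {(p, x, d) for (p, x, d) in s_input if p in definite_targets}
    # Definite relations leaving a possibly assigned location become possible.
    s_change = {(p, x, d) for (p, x, d) in s_input
                if p in possible_targets and d == "D"}
    # New relations are definite only if both sides are definite.
    s_gen = {(p, x, "D" if d1 == "D" and d2 == "D" else "P")
             for (p, d1) in l_locations for (x, d2) in r_locations}

    s_input_changed = (s_input - s_change) | {(p, x, "P") for (p, x, _) in s_change}
    return (s_input_changed - s_kill) | s_gen


s_input = {("q", "y", "D"), ("y", "w", "P"), ("y", "z", "P"), ("w", "v", "D")}
s_output = store_through_pointer(s_input, "q", "w")

# Variant where q only possibly points to y: nothing is killed.
weak_input = {("q", "y", "P"), ("y", "z", "D"), ("w", "v", "D")}
weak_output = store_through_pointer(weak_input, "q", "w")
```

The first call performs a strong update because q definitely points to y. In the variant, q only possibly points to y, so the old relation from y survives as a possible relation and the new one is also only possible.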

Invocation Graph
- Programs without recursion: the invocation graph is built with a depth-first traversal of the call structure, starting with main.
- Programs with recursion: the invocation structure is not known at compile time. The invocation graph approximates all unrollings of the recursion.
(Emami, Ghiya, Hendren, PLDI 94)

Invocation Graph (examples)

Example without recursion:

    main() { … g(); }
    g() { … f(); … }

[Figure: invocation graph over main, g, and f.]

Example with recursion:

    main() { … f(); }
    f() { g(); if (y) f(); }
    g() { if (e) f(); }

[Figure: invocation graph with nodes main, f-R, g, and f-A.]
- f-A: an approximate node, where a stored approximation for the function should be used (instead of an evaluation of the function call).
- f-R: a recursive node, where a fixed-point computation must be performed.
(Emami, Ghiya, Hendren, PLDI 94)
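The depth-first construction with recursive and approximate nodes can be sketched as below. It assumes the direct calls of each function are already known as a dictionary and that there are no function pointers; the names `build_invocation_graph` and `render`, and the nested-dictionary node layout, are hypothetical.

```python
def build_invocation_graph(calls, root="main"):
    """Depth-first expansion of the call structure starting at root.

    A call to a function already on the current call chain is not expanded:
    it becomes an approximate node, and the matching ancestor is marked recursive.
    """
    def visit(func, chain):
        node = {"func": func, "kind": "ordinary", "children": []}
        chain = chain + [node]
        for callee in calls.get(func, []):
            ancestor = next((n for n in chain if n["func"] == callee), None)
            if ancestor is not None:
                ancestor["kind"] = "recursive"
                node["children"].append(
                    {"func": callee, "kind": "approximate", "children": []})
            else:
                node["children"].append(visit(callee, chain))
        return node

    return visit(root, [])


def render(node):
    """Print a node as name, name-R, or name-A, followed by its children."""
    suffix = {"ordinary": "", "recursive": "-R", "approximate": "-A"}[node["kind"]]
    label = node["func"] + suffix
    if not node["children"]:
        return label
    return label + "(" + ", ".join(render(c) for c in node["children"]) + ")"


# The recursive example: main calls f; f calls g and f; g calls f.
recursive_calls = {"main": ["f"], "f": ["g", "f"], "g": ["f"]}
recursive_graph = render(build_invocation_graph(recursive_calls))

# A hypothetical non-recursive program in which f is reached along two call chains.
plain_calls = {"main": ["g", "f"], "g": ["f"], "f": []}
plain_graph = render(build_invocation_graph(plain_calls))
```

In the non-recursive case each call chain gets its own copy of f, which is what gives later analyses a separate place to store context-sensitive information for every calling context.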

Advantages of using an Invocation Graph
- Separates inter-procedural analysis from calling contexts.
- Creates places to deposit context-sensitive information for subsequent analysis.
- Creates places to store IN/OUT pairs summarizing the effects of a function call.
- Allows simple compositional fixed-point computations for recursions.
(Emami, Ghiya, Hendren, PLDI 94)

Inter-procedural Context-sensitive Analysis

    f() { … g(a); }    (caller)
    g(x) { … }         (callee)

[Figure: caller and callee connected by the map process, the function analysis, and the unmap process.]

Special care is needed in the map process for:
- Formal parameters and global variables that are multi-level pointers.
- Invisible variables: formals and globals that point to variables outside of the scope of the callee.

Solutions in the map process:
- Multi-level pointers: apply the mapping process recursively to all levels of the pointer type.
- Invisible variables: generate special symbolic names to represent each level of indirection of pointer variables (see paper for details).
(Emami, Ghiya, Hendren, PLDI 94)

Approximate and Recursive Nodes
- A recursive node f-R stores an input, an output, and a list of pending inputs.
  - The input and output pairs approximate the effect of the call to f.
  - The fixed-point computation generalizes the stored input and output until it summarizes all invocations of f.
(Emami, Ghiya, Hendren, PLDI 94)

Function Pointer
- When a function is called through a function pointer, any one of a set of functions may be called.
- Some safe approximations for this set are:
  - All the functions in the program.
  - All functions which have had their addresses taken.
  - The set of functions that the pointer can point to at the program point where the call is.
(Emami, Ghiya, Hendren, PLDI 94)

A cyclic dependence
- To obtain the points-to set for the function pointer, we need to perform points-to analysis.
- But points-to analysis needs the invocation graph of the program because it is context-sensitive and inter-procedural.
- The solution is to construct the invocation graph while performing points-to analysis.
(Emami, Ghiya, Hendren, PLDI 94)

Handling Function Pointers
1. Build the invocation graph, leaving it incomplete where a function pointer call is encountered.
2. Perform points-to analysis using the incomplete invocation graph.
3. When an indirect call through a function pointer is encountered, find the set P of all functions it can point to according to current information.
4. Update the invocation graph to indicate that the indirect call may call any function in P.
5. Analyze each function f ∈ P in the context of the call; while analyzing f, assume that the function pointer definitely points to f.
6. Merge the output points-to sets of all functions in P.
(Emami, Ghiya, Hendren, PLDI 94)

Function Pointers (Example)

    int a, b, c;
    int *pa, *pb, *pc;
    int (*fp)();
    main() { … pc = &c; if (cond) fp = foo; else fp = bar; fp(); }
    foo() { … pa = &a; if (cond) fp(); }
    bar() { ... pb = &b; }

The slides build the invocation graph and the points-to sets step by step:
1. At the indirect call fp() in main: S = {(fp,foo,P); (fp,bar,P); (pc,c,D)}. The invocation graph becomes main → fp → {foo, bar}.
2. foo is analyzed assuming fp definitely points to foo: S = {(fp,foo,D); (pc,c,D)}.
3. The call fp() inside foo can only reach foo, so foo becomes the recursive node foo-R with an approximate node foo-A below it.
4. After pa = &a in foo: S = {(fp,foo,D); (pc,c,D); (pa,a,D)}.
5. bar is analyzed assuming fp definitely points to bar: S = {(fp,bar,D); (pc,c,D)}.
6. After pb = &b in bar: S = {(fp,bar,D); (pc,c,D); (pb,b,D)}.
7. Merging the outputs of foo and bar after the call in main: S = {(fp,foo,P); (fp,bar,P); (pc,c,D); (pa,a,P); (pb,b,P)}.
[Figure: final points-to graph with edges fp → foo, fp → bar, pc → c, pa → a, pb → b.]
(Emami, Ghiya, Hendren, PLDI 94)
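The final merge of the per-callee results can be sketched as a set operation. The rule assumed here is that a relation stays definite only if it is definite in every merged output, and becomes possible otherwise; the name `merge_outputs` is hypothetical, and the input sets are the ones shown for foo and bar in the example.

```python
def merge_outputs(outputs):
    """Merge points-to sets of (x, y, 'D'/'P') triples from alternative callees."""
    pairs = {(x, y) for s in outputs for (x, y, _) in s}
    merged = set()
    for (x, y) in pairs:
        definite_everywhere = all((x, y, "D") in s for s in outputs)
        merged.add((x, y, "D" if definite_everywhere else "P"))
    return merged


after_foo = {("fp", "foo", "D"), ("pc", "c", "D"), ("pa", "a", "D")}
after_bar = {("fp", "bar", "D"), ("pc", "c", "D"), ("pb", "b", "D")}
after_call = merge_outputs([after_foo, after_bar])
```

Only (pc,c) holds definitely on both alternatives, so it is the only relation that remains definite after the indirect call.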