# Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India.

## Presentation on theme: "Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India."— Presentation transcript:

Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India

Andersen’s Analysis A flow-insensitive analysis –computes a single points-to solution valid at all program points –ignores control-flow – treats program as a set of statements –equivalent to merging all vertices into one (and applying algorithm A) –equivalent to adding an edge between every pair of vertices (and applying algo. A) –a solution R such that R  IdealMayPT(u) for every vertex u

Example (Flow-Sensitive Analysis) x = &a; y = x; x = &b; z = x; 1 2 3 x = &a y = x 4 5 x = &b z = x

Example: Andersen’s Analysis x = &a; y = x; x = &b; z = x; 1 2 3 x = &a y = x 4 5 x = &b z = x

Andersen’s Analysis Strong updates? Initial state?

Why Flow-Insensitive Analysis? Reduced space requirements –a single points-to solution Reduced time complexity –no copying individual updates more efficient –no need for joins –number of iterations? –a cubic-time algorithm Scales to millions of lines of code –most popular points-to analysis

Andersen’s Analysis A Set-Constraints Formulation Compute PT x for every variable x StatementConstraint x = null x = &y x = y x = *y *x = y

Steensgaard’s Analysis Unification-based analysis Inspired by type inference –an assignment “lhs := rhs” is interpreted as a constraint that lhs and rhs have the same type –the type of a pointer variable is the set of variables it can point-to “Assignment-direction-insensitive” –treats “lhs := rhs” as if it were both “lhs := rhs” and “rhs := lhs” An almost-linear time algorithm –single-pass algorithm; no iteration required

Example: Andersen’s Analysis x = &a; y = x; y = &b; b = &c; 1 2 3 x = &a y = x 4 5 y = &b b = &c

Example: Steensgaard’s Analysis x = &a; y = x; y = &b; b = &c; 1 2 3 x = &a y = x 4 5 y = &b b = &c

Steensgaard’s Analysis Can be implemented using Union-Find data-structure Leads to an almost-linear time algorithm

Exercise x = &a; y = x; y = &b; b = &c; *x = &d;

May-Point-To Analyses Ideal-May-Point-To Algorithm A Andersen’s Steensgaard’s more efficient / less precise ??? more efficient / less precise

Ideal Points-To Analysis: Definition Recap A sequence of states s 1 s 2 … s n is said to be an execution (of the program) iff – s 1 is the Initial-State –s i | s i+1 for 1 <= I < n A state s is said to be a reachable state iff there exists some execution s 1 s 2 … s n is such that s n = s. RS(u) = { s | (u,s) is reachable } IdealMayPT (u) = { (p,x) |  s  RS(u). s(p) == x } IdealMustPT (u) = { (p,x) |  s  RS(u). s(p) == x }

Does Algorithm A Compute The Most Precise Solution?

Ideal Algorithm A Abstract away correlations between variables –relational analysis vs. –independent attribute x: &by: &x x: &yy: &z x: {&y,&b}y: {&x,&z}   x: &yy: &x x: &by: &z x: &yy: &z x: &by: &x

Does Algorithm A Compute The Most Precise Solution?

Is The Precise Solution Computable? Claim: The set RS(u) of reachable concrete states (for our language) is computable. Note: This is true for any collecting semantics with a finite state space.

Precise Points-To Analysis: Decidability Corollary: Precise may-point-to analysis is computable. Corollary: Precise (demand) may-alias analysis is computable. –Given ptr-exp1, ptr-exp2, and a program point u, identify if there exists some reachable state at u where ptr-exp1 and ptr-exp2 are aliases. Ditto for must-point-to and must-alias … for our restricted language!

Precise Points-To Analysis: Computational Complexity What’s the complexity of the least-fixed point computation using the collecting semantics? The worst-case complexity of computing reachable states is exponential in the number of variables. –Can we do better? Theorem: Computing precise may-point-to is PSPACE-hard even if we have only two-level pointers.

May-Point-To Analyses Ideal-May-Point-To Algorithm A Andersen’s Steensgaard’s more efficient / less precise

Precise Points-To Analysis: Caveats Theorem: Precise may-alias analysis is undecidable in the presence of dynamic memory allocation. –Add “x = new/malloc ()” to language –State-space becomes infinite Digression: Integer variables + conditional- branching also makes any precise analysis undecidable.

May-Point-To Analyses Ideal (no Int, no Malloc) Algorithm A Andersen’s Steensgaard’s Ideal (with Int, with Malloc) Ideal (with Int) Ideal (with Malloc)

Dynamic Memory Allocation s: x = new () / malloc () Assume, for now, that allocated object stores one pointer –s: x = malloc ( sizeof(void*) ) Introduce a pseudo-variable V s to represent objects allocated at statement s, and use previous algorithm –treat s as if it were “x = &V s ” –also track possible values of V s –allocation-site based approach Key aspect: V s represents a set of objects (locations), not a single object –referred to as a summary object (node)

Dynamic Memory Allocation: Example x = new; y = x; *y = &b; *y = &a; 1 2 3 x = new y = x 4 5 *y = &b *y = &a

Dynamic Memory Allocation: Object Fields Field-sensitive analysis class Foo { A* f; B* g; } s: x = new Foo() x->f = &b; x->g = &a;

Dynamic Memory Allocation: Object Fields Field-insensitive analysis class Foo { A* f; B* g; } s: x = new Foo() x->f = &b; x->g = &a;

Other Aspects Context-sensitivity Indirect (virtual) function calls and call- graph construction Pointer arithmetic Object-sensitivity

Andersen’s Analysis: Further Optimizations and Extensions Fahndrich et al., Partial online cycle elimination in inclusion constraint graphs, PLDI 1998. Rountev and Chandra, Offline variable substitution for scaling points-to analysis, 2000. Heintze and Tardieu, Ultra-fast aliasing analysis using CLA: a million lines of C code in a second, PLDI 2001. M. Hind, Pointer analysis: Haven’t we solved this problem yet?, PASTE 2001. Hardekopf and Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007. Hardekopf and Lin, Exploiting pointer and location equivalence to optimize pointer analysis, SAS 2007. Hardekopf and Lin, Semi-sparse flow-sensitive pointer analysis, POPL 2009.

Context-Sensitivity Etc. Liang & Harrold, Efficient computation of parameterized pointer information for interprocedural analyses. SAS 2001. Lattner et al., Making context-sensitive points-to analysis with heap cloning practical for the real world, PLDI 2007. Zhu & Calman, Symbolic pointer analysis revisited. PLDI 2004. Whaley & Lam, Cloning-based context-sensitive pointer alias analysis using BDD, PLDI 2004. Rountev et al. Points-to analysis for Java using annotated constraints. OOPSLA 2001. Milanova et al. Parameterized object sensitivity for points-to and side-effect analyses for Java. ISSTA 2002.

Applications Compiler optimizations Verification & Bug Finding –use in preliminary phases –use in verification itself

Dynamic Memory Allocation: Summary Object Update 4 5 *y = &a

Abstract Transformers: Weak/Strong Update AS[stmt] : AbsDataState -> AbsDataState AS[ *x = y ] s =

Correctness & Precision How can we formally reason about the correctness & precision of abstract transformers? Can we systematically derive a correct abstract transformer?

Enter: The French Recipe (Abstract Interpretation) 2 Data-State 2 Var x Var’ Concrete Domain Concrete states: C Semantics: For every statement st, CS[st] : C -> C  

Points-To Analysis (Abstract Interpretation)  (Y) = { (p,x) | exists s in Y. s(p) == x } RS(u) 2 Data-State 2 Var x Var’ IdealMayPT(u) MayPT(u)    IdealMayPT (u) =  ( RS(u) )

Approximating Transformers: Correctness Criterion CA correctly approximated by c1 c2 f a1 a2 f#f# correctly approximated by c is said to be correctly approximated by a iff  (c)   a

Approximating Transformers: Correctness Criterion CA c1 c2 f a1 a2 f#f# concretization  abstraction  requirement: f # (a1) ≥  (f(  (a1))

Concrete Transformers CS[stmt] : Data-State -> Data-State CS[ x = y ] s = s[x  s(y)] CS[ x = *y ] s = s[x  s(s(y))] CS[ *x = y ] s = s[s(x)  s(y)] CS[ x = null ] s = s[x  null] CS*[stmt] : 2 Data-State -> 2 Data-State CS*[st] X = { CS[st]s | s  X }

Abstract Transformers AS[stmt] : AbsDataState -> AbsDataState AS[ x = y ] s = s[x  s(y)] AS[ x = null ] s = s[x  {null}] AS[ x = *y ] s = s[x  s*(s(y))] where s*({v 1,…,v n }) = s(v 1 )  …  s(v n ) AS[ *x = y ] s = ???

Algorithm A: Tranformers Weak/Strong Update x: {&y}y: {&x,&z}z: {&a} x: &by: &xz: &a x: &yy: &zz: &b x: {&y,&b}y: {&x,&z}z: {&a,&b} x: &yy: &xz: &a x: &yy: &zz: &a *y = &b; f#f# f  

Algorithm A: Tranformers Weak/Strong Update x: {&y}y: {&x,&z}z: {&a} x: &yy: &bz: &a x: &yy: &bz: &a x: {&y}y: {&b}z: {&a} x: &yy: &xz: &a x: &yy: &zz: &a *x = &b; f#f# f  

Dynamic Memory Allocation: Summary Object Update 4 5 *y = &a

Download ppt "Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India."

Similar presentations