Presentation on theme: "Flow-Insensitive Points-to Analysis with Term and Set Constraints Presentation by Kaleem Travis Patrick."— Presentation transcript:
Flow-Insensitive Points-to Analysis with Term and Set Constraints Presentation by Kaleem Travis Patrick
Two methods: Andersen vs Steensgaard Foster claims these systems are nearly identical, and may actually be combined in their implementation.
Andersen: For an assignment e 1 = e 2 anything in the points-to set for e 2 must also be in the points-to set for e 1. Steensgaard: For an assignment e 1 = e 2 the points-to set for e 2 must be equal to the points-to set for e 1.
Foster’s Framework Foster's type systems are designed using Term and Set constraints: Set constraints define inclusion relationships between types; we use set constraints to describe Andersen's analysis. Term constraints define equality relationships between types; we use term equations to describe Steensgaard's analysis.
What’s so important about their similarity? The main difference between the Steensgaard and Andersen is Steensgaard uses term constraints as opposed set constraints.Term constraints describe equality. Set constraints describe inclusion By carefully defining our inference rules for both methods, the implementation is vastly simplified. This is because both methods will be combined into one set of inference rules. The difference in set constraints is minimal in the implementation.
Const - Int S: _ is a wildcard - a fresh, unconstrained variable Var S: variables are elevated to references for simplicity
Addr S: &e points to e Deref S: if e is a reference to then * e is of type
Asst S: unifies the equivalence classes for the points- to sets of e 1 and e 2 In other words, if e 1 is of type 1 and e 2 is of type 2 then e 1 = e 2 is of type 2 This is where Steensgaard uses his time-saving, conservative merging.
Const - Int A: assigns the empty set for integers. Foster uses 0 instead of “bottom” 0 stands for the “least set” Var A: lifts regular variables to a pointer type for simplicity, as with Steensgaard. But we now have to take into account covariance/contravariance.
Addr A: &e points to e Deref A: *e is an upper bound on the type of whatever e points to. In other words, this is nearly the inverse of Addr A.
Asst A: illustrates the difference between Andersen and Steensgaard - in the assignment e 1 =e 2, e 1 could potentially point to anything e 2 can, so the type of the expression is the type of e 2
Constructor Signatures The constructor signatures (section 3) merely describe a key difference between the two algorithms. Set constraints describe Andersen's analysis. Term constraints describe Steensgaard's analysis. This difference must also be handled when combining both algorithms
Combining And/Ste Foster combines the type languages for And and Ste by redefining their constructor signatures to yield a reference with two p fields and a tag field: ref (p get, p set, t) (page 11) For Andersen analysis, the Pget fields are covariant, the Pset fields are contravariant, and the t (tag) field is ignored. For Steensgaard analysis, all the subfields are Term fields, and we can assure that Pget=Pset.
After redefining the signatures for constructors, Foster combines And+Common with Ste+Common to arrive at the final set of inference rules, named Comb At this point, we no longer need to worry about separate And and Ste inference rules. Comb+Common represents both at once. This vastly simplifies the implementation of both algorithms.
The only difference between Comb and And/Ste is the use of the tag field t and the definition of a general-purpose symbol for the constraints. First, the tag t is shown in Ste+Common. It is used to identify equivalence classes. And+Common deals with inclusion rather than equivalence, so Comb’s tag field is simply ignored when we wish to use it for Andersen-style results. How does Comb work?
Second, changing the interpretation of the general- purpose constraint symbol (subset-iota) yields the two different algorithms. If it is used as a subset constraint, the rules compute Andersen's analysis. Steensgaard instead treats this constraint as conditional unification. Also, Pget=Pset, because the distinction is not used in Ste+Common How does Comb work?
Implementation There are 3 major problems with using C for the implementation.
Problem 1 We must determine how library functions affect the points-to graph without looking at their source. First, assume that most undefined functions have no effect on the analysis. Second, for those functions that do have an effect (such as strcpy(char* s1, char s2), we write a false stub of the function that provides enough information to the analysis to determine how the real function behaves.
Problem 2 Some functions can take a variable number of arguments. For the most part, C implementations of varargs do not affect the points-to set. But some implementations accomplish varargs by treating the first argument as a pointer to any subsequent arguments. None of these algorithms handle this correctly. Foster manually modified the vararg functions to take a fixed number of arguments
Problem 3 When a multidimensional array is allocated, C actually uses a contiguous block of memory. So if b is two-dimensional and a is one-dimensional, the statement: b = (int**) a; results in b being an alias to a. Dealing with this added complexity involves determining the C types for each expression, adding more overhead to the existing algorithms.