Flow-Insensitive Points-to Analysis with Term and Set Constraints Presentation by Kaleem Travis Patrick.



Two methods: Andersen vs. Steensgaard
Foster claims these two systems are nearly identical and can be combined in a single implementation.

Andersen: For an assignment e1 = e2, anything in the points-to set of e2 must also be in the points-to set of e1.
Steensgaard: For an assignment e1 = e2, the points-to set of e2 must be equal to the points-to set of e1.

Foster’s Framework
Foster's type systems are designed using term and set constraints:
• Set constraints define inclusion relationships between types; we use set constraints to describe Andersen's analysis.
• Term constraints define equality relationships between types; we use term equations to describe Steensgaard's analysis.

What’s so important about their similarity?
The main difference between Steensgaard and Andersen is that Steensgaard uses term constraints as opposed to set constraints. Term constraints describe equality; set constraints describe inclusion. By carefully defining the inference rules for both methods, the implementation is vastly simplified, because both methods are combined into one set of inference rules. Switching between the two kinds of constraint then requires only a minimal change in the implementation.

Steensgaard

Const - Int_S: _ is a wildcard, that is, a fresh, unconstrained variable.
Var_S: variables are elevated to references for simplicity.

Addr_S: &e points to e.
Deref_S: if e is a reference to τ, then *e is of type τ.

Asst_S: unifies the equivalence classes for the points-to sets of e1 and e2. In other words, if e1 is of type τ1 and e2 is of type τ2, then e1 = e2 is of type τ2. This is where Steensgaard uses his time-saving, conservative merging.
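The merging described in the Steensgaard rules can be sketched with a union-find structure. This is a minimal illustration, not Foster's implementation: the statement encoding `(kind, lhs, rhs)`, the class name `Steensgaard`, and the helper names are all assumptions made for the example, and the sketch unifies unconditionally rather than using conditional unification.

```python
import itertools


class Steensgaard:
    """Union-find over abstract locations; each class has at most one pointee class."""

    def __init__(self):
        self.parent = {}
        self.pointee = {}  # class representative -> node that the class points to
        self._fresh = itertools.count()

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Return the class that n points to, creating a fresh one if needed."""
        r = self.find(n)
        if r not in self.pointee:
            self.pointee[r] = self.find(("fresh", next(self._fresh)))
        return self.find(self.pointee[r])

    def unify(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[a] = b
        ta = self.pointee.pop(a, None)
        tb = self.pointee.get(b)
        if ta is not None and tb is not None:
            self.unify(ta, tb)  # merged classes must share one pointee class
        elif ta is not None:
            self.pointee[b] = ta


def steensgaard(stmts):
    """stmts: (kind, lhs, rhs) tuples, a hypothetical encoding of C assignments."""
    s = Steensgaard()
    variables = set()
    for kind, lhs, rhs in stmts:
        variables |= {lhs, rhs}
        if kind == "addr":     # lhs = &rhs
            s.unify(s.target(lhs), rhs)
        elif kind == "copy":   # lhs = rhs
            s.unify(s.target(lhs), s.target(rhs))
        elif kind == "load":   # lhs = *rhs
            s.unify(s.target(lhs), s.target(s.target(rhs)))
        elif kind == "store":  # *lhs = rhs
            s.unify(s.target(s.target(lhs)), s.target(rhs))
    result = {}
    for v in variables:
        r = s.find(v)
        if r in s.pointee:
            t = s.find(s.pointee[r])
            members = {w for w in variables if s.find(w) == t}
            if members:
                result[v] = members
    return result


# p = &x; q = &y; p = q;
prog = [("addr", "p", "x"), ("addr", "q", "y"), ("copy", "p", "q")]
steens_result = steensgaard(prog)
```

On this three-statement program the assignment `p = q` merges the classes of `x` and `y`, so both `p` and `q` end up pointing to `{x, y}`, which is the conservative loss of precision the slide refers to.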

Andersen

Const - Int_A: assigns the empty set to integers. Foster writes 0 instead of “bottom”; 0 stands for the “least set”.
Var_A: lifts regular variables to a pointer type for simplicity, as with Steensgaard, but we now have to take covariance and contravariance into account.

Addr_A: &e points to e.
Deref_A: *e is an upper bound on the type of whatever e points to. In other words, this is nearly the inverse of Addr_A.

Asst_A: illustrates the difference between Andersen and Steensgaard. In the assignment e1 = e2, e1 could potentially point to anything e2 can, so the type of the expression is the type of e2.
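The inclusion reading of assignment can be sketched as a naive fixpoint over points-to sets. This is an illustration under assumptions, not the type-based formulation from the paper: the `(kind, lhs, rhs)` statement encoding and the function name `andersen` are hypothetical, and a real solver would use a worklist and a constraint graph instead of rescanning every statement.

```python
from collections import defaultdict


def andersen(stmts):
    """Inclusion-based points-to analysis by repeated propagation.

    stmts: (kind, lhs, rhs) tuples, a hypothetical encoding of C assignments.
    """
    pts = defaultdict(set)

    def include(dst, values):
        """Add values to pts[dst]; report whether anything new was added."""
        before = len(pts[dst])
        pts[dst] |= values
        return len(pts[dst]) != before

    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":     # lhs = &rhs
                changed |= include(lhs, {rhs})
            elif kind == "copy":   # lhs = rhs: pts(rhs) is a subset of pts(lhs)
                changed |= include(lhs, set(pts[rhs]))
            elif kind == "load":   # lhs = *rhs
                for t in list(pts[rhs]):
                    changed |= include(lhs, set(pts[t]))
            elif kind == "store":  # *lhs = rhs
                for t in list(pts[lhs]):
                    changed |= include(t, set(pts[rhs]))
    return {v: s for v, s in pts.items() if s}


# p = &x; q = &y; p = q;
prog = [("addr", "p", "x"), ("addr", "q", "y"), ("copy", "p", "q")]
andersen_result = andersen(prog)
```

Here `p` may point to `x` or `y`, while `q` still points only to `y`: the assignment flows information in one direction, which is where the extra precision over equality-based merging comes from.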

Constructor Signatures
The constructor signatures (section 3) merely describe a key difference between the two algorithms:
• Set constraints describe Andersen's analysis.
• Term constraints describe Steensgaard's analysis.
• This difference must also be handled when combining both algorithms.

Combining And/Ste
Foster combines the type languages for And and Ste by redefining their constructor signatures to yield a reference with two p fields and a tag field: ref(p_get, p_set, t) (page 11). For Andersen's analysis, the p_get field is covariant, the p_set field is contravariant, and the t (tag) field is ignored. For Steensgaard's analysis, all the subfields are term fields, and we can ensure that p_get = p_set.

After redefining the signatures for constructors, Foster combines And+Common with Ste+Common to arrive at the final set of inference rules, named Comb. At this point, we no longer need to worry about separate And and Ste inference rules: Comb+Common represents both at once. This vastly simplifies the implementation of both algorithms.

How does Comb work?
The only differences between Comb and And/Ste are the use of the tag field t and the definition of a general-purpose symbol for the constraints. First, the tag t appears in Ste+Common, where it is used to identify equivalence classes. And+Common deals with inclusion rather than equivalence, so Comb’s tag field is simply ignored when we want Andersen-style results.

Second, changing the interpretation of the general-purpose constraint symbol (subset-iota) yields the two different algorithms. If it is read as a subset constraint, the rules compute Andersen's analysis. Steensgaard's analysis instead treats this constraint as conditional unification, and p_get = p_set because the distinction is not used in Ste+Common.
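The idea of one constraint list with two interpretations can be shown in a deliberately flattened sketch. It leaves out the ref constructor, the get/set fields, the tag, and conditional unification, so it only illustrates the switch between inclusion and equality; the function `solve`, its `mode` parameter, and the input encoding are assumptions made for this example.

```python
def solve(facts, constraints, mode):
    """Solve points-to constraints under one of two readings.

    facts:       {var: set of locations} from address-of statements
    constraints: list of (src, dst) pairs, written once for both analyses
    mode:        "subset" reads each pair as pts(src) included in pts(dst);
                 "unify"  reads each pair as pts(src) equal to pts(dst)
    """
    pts = {v: set(s) for v, s in facts.items()}
    for src, dst in constraints:
        pts.setdefault(src, set())
        pts.setdefault(dst, set())

    if mode == "subset":
        changed = True
        while changed:
            changed = False
            for src, dst in constraints:
                if not pts[src] <= pts[dst]:
                    pts[dst] |= pts[src]
                    changed = True
    elif mode == "unify":
        for src, dst in constraints:
            old_src, old_dst = pts[src], pts[dst]
            merged = old_src | old_dst
            # every variable already sharing either set now shares the merged one
            for v, s in list(pts.items()):
                if s is old_src or s is old_dst:
                    pts[v] = merged
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {v: set(s) for v, s in pts.items()}


# p = &x; q = &y; p = q;  (the assignment yields the single constraint q -> p)
facts = {"p": {"x"}, "q": {"y"}}
constraints = [("q", "p")]
as_subset = solve(facts, constraints, "subset")
as_unify = solve(facts, constraints, "unify")
```

The constraint generation is identical in both runs; only the interpretation of each pair changes, which mirrors how a single rule set can serve both analyses.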

Implementation
There are three major problems in implementing the analyses for C programs.

Problem 1
We must determine how library functions affect the points-to graph without looking at their source. First, assume that most undefined functions have no effect on the analysis. Second, for those functions that do have an effect (such as strcpy(char *s1, const char *s2)), we write a stub of the function that gives the analysis enough information to approximate how the real function behaves.
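One way to picture the stub approach is a table that maps a library function name to the abstract statements a call contributes. Everything here is hypothetical: the names `STUBS`, `strcpy_stub`, and `model_call`, the `(kind, lhs, rhs)` statement encoding, and the choice to model strcpy as "*s1 = *s2; return s1" are assumptions for illustration, not the stubs Foster actually wrote.

```python
import itertools

_temps = itertools.count()


def strcpy_stub(ret, args):
    """Model strcpy(s1, s2) as: *s1 = *s2; return s1."""
    s1, s2 = args
    tmp = f"$t{next(_temps)}"          # temporary holding *s2
    stmts = [("load", tmp, s2),        # tmp = *s2
             ("store", s1, tmp)]       # *s1 = tmp
    if ret is not None:
        stmts.append(("copy", ret, s1))  # the result aliases the destination
    return stmts


# Library functions with a known effect on the points-to graph (assumed table).
STUBS = {"strcpy": strcpy_stub}


def model_call(ret, callee, args):
    """Return abstract statements for a call to an undefined function.

    Functions without a stub are assumed to have no effect on the analysis.
    """
    stub = STUBS.get(callee)
    return stub(ret, args) if stub is not None else []


with_result = model_call("r", "strcpy", ["dst", "src"])
no_effect = model_call(None, "puts", ["msg"])
```

The statements returned for a call can then be fed to whichever points-to solver is in use, so the analysis never needs the library's source.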

Problem 2
Some functions take a variable number of arguments. For the most part, C implementations of varargs do not affect the points-to sets, but some implementations accomplish varargs by treating the first argument as a pointer to the subsequent arguments. Neither algorithm handles this correctly, so Foster manually modified the vararg functions to take a fixed number of arguments.

Problem 3
When a multidimensional array is allocated, C actually uses a contiguous block of memory. So if b is two-dimensional and a is one-dimensional, the statement b = (int**) a; results in b[0][0] being an alias of a[0]. Dealing with this added complexity involves determining the C type of each expression, which adds overhead to the existing algorithms.