Solving Shape-Analysis Problems in Languages with Destructive Updating, by M. Sagiv et al. Presentation by Justin Bronn, Travis Pinney, and Josh Bauman.

Introduction
• Addresses problems in pointer, alias, sharing, and storage analysis, as well as type-checking problems.
• Storage analysis is emphasized.

Shape Analysis
• Gives a conservative, finite characterization of the possible "shapes" that a program's heap-allocated data structures can have at each program point.

New Algorithm
• Verifies shape-preservation properties of programs that operate on lists and trees, and of certain programs that update circular lists.
• Differences from previous methods:
  - Deliberately drops information about concrete locations.
  - Run-time locations that are not pointed to by variables are clustered into a single summary node.
  - Removes edges on the tail of lists.
  - Sharing through variables: when two variables point to the same cons-cell, this is represented directly by shape-graph edges.

The is# (is-shared) Function
• Each shape-node n of an SSG# (static shape-graph) has a Boolean value is#(n) ∈ {false, true} associated with it. When true, it indicates that the cons-cells represented by n may be the target of pointers emanating from two or more distinct cons-cell fields.

is# Exercises (three slides): What is the is# value of this graph? [graphs not included]

Example
• Uses a program that performs list reversal via destructive updating.
• An invariant is a property that does not change. In the program reverse(x, y), the following always hold:
  1. Variable x points to an unshared, acyclic, singly linked list.
  2. Variable y points to an unshared, acyclic, singly linked list, and t may point to the second element of the y-list (if such an element exists).
  3. The lists pointed to by x and y are disjoint.
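The reverse program itself is not reproduced on the slide. The sketch below assumes the usual formulation (y := nil; while x ≠ nil do t := y; y := x; x := x.cdr; y.cdr := t od; t := nil) and transcribes it into Python, checking the three invariants on concrete lists at each loop head. The helpers Cons, from_list, cells, check_invariants, and to_values are illustrative names, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so cells can go in sets
class Cons:
    car: int
    cdr: Optional["Cons"] = None


def from_list(values: List[int]) -> Optional[Cons]:
    head = None
    for v in reversed(values):
        head = Cons(v, head)
    return head


def cells(head: Optional[Cons]) -> List[Cons]:
    """Cells of the list starting at head; raises if the list is cyclic."""
    out, seen = [], set()
    while head is not None:
        if head in seen:
            raise AssertionError("cyclic list")
        seen.add(head)
        out.append(head)
        head = head.cdr
    return out


def check_invariants(x, y, t) -> None:
    xs, ys = cells(x), cells(y)                    # invariants 1 and 2: acyclic
    assert not set(xs) & set(ys)                   # invariant 3: disjoint
    targets = [c.cdr for c in xs + ys if c.cdr is not None]
    assert len(targets) == len(set(targets))       # unshared: at most one incoming cdr
    assert t is None or (y is not None and y.cdr is t)  # t is y's second element


def reverse(x: Optional[Cons]) -> Optional[Cons]:
    y = None
    t = None  # Python needs t bound before the first invariant check
    while x is not None:
        check_invariants(x, y, t)
        t = y
        y = x
        x = x.cdr
        y.cdr = t
    t = None  # temporaries are nulled at the end
    return y


def to_values(head: Optional[Cons]) -> List[int]:
    return [c.car for c in cells(head)]
```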

Iteration 1

Iteration 1 (Continued)

Iteration 2

Iteration 2 (Continued)

Iteration 3

Iteration 3 (Continued)

Iteration 4

Iteration 4 (Continued)

Tracking and Aliasing Functions
• "t := y": aliasing configurations are tracked using "names" (sets of variables) attached to shape-nodes.
• Liquidization and renaming
  - Liquidization: when a variable t no longer points to a cons-cell, t is removed from the name of the shape-node n{t}.
  - If the name becomes empty, the shape-node goes into n_φ (it falls into the "primordial soup").
  - Renaming occurs when a statement is processed that can increase the amount of sharing in a concrete store.
  - Ex: "y := x": n{t,y} → n{t}, n{x} → n{x,y}
• Using sets of variables to name the nodes in SSGs can result in an exponential number of shape-nodes. Techniques to sidestep this problem are discussed in the Optimization section (Section 2.2).
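The renaming step for a copy statement can be sketched as a function on names alone, ignoring edges. This is a minimal illustration, assuming names are frozensets of variable names and that the empty set stands for n_φ; rename_for_copy is a hypothetical helper, not the paper's transfer function.

```python
from typing import Dict, FrozenSet, Iterable

Name = FrozenSet[str]  # a shape-node is named by the set of variables pointing to it


def rename_for_copy(names: Iterable[Name], lhs: str, rhs: str) -> Dict[Name, Name]:
    """Return old-name -> new-name for the statement `lhs := rhs` (sketch)."""
    assert lhs != rhs, "normalized programs never have lhs == rhs"
    mapping = {}
    for name in names:
        new = name - {lhs}        # liquidize: lhs no longer points here
        if rhs in name:
            new = new | {lhs}     # lhs now points wherever rhs points
        mapping[name] = new       # distinct old names may collapse into one
    return mapping
```

For "y := x", a node named {t, y} becomes {t} and {x} becomes {x, y}; a node named only {y} would map to the empty name, i.e., fall into n_φ.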

Node Materialization
• When x := x.cdr is executed, the effect is to "materialize" a new non-summary node from n_φ. This materialization conservatively covers all possible configurations of the storage.

Cutting the List
• "y.cdr := t" cuts the y-list at its head (producing the fifth SSG from the fourth).
• The cdr edge of shape-node n{y} is first removed. This cuts the y-list at the head, separating the first element, n{y}, from the tail, which x points to.
• A cdr edge from n{y} to n{t} is then added, which places shape-node n{y} at the head of the list that t points to.
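The two steps can be sketched on a set of selector edges, treating "y.cdr := t" as its normalized form "y.cdr := nil; y.cdr := t". This is a simplified illustration: nullify and link are hypothetical helper names, node names are assumed to be frozensets of variables, and the accompanying is# update is omitted.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Node = FrozenSet[str]
Edge = Tuple[Node, str, Node]  # selector edge <s, sel, t>


def nullify(edges: Set[Edge], x: str, sel: str) -> Set[Edge]:
    """x.sel := nil: drop every sel-edge leaving a node whose name contains x."""
    return {(s, f, t) for (s, f, t) in edges if not (f == sel and x in s)}


def link(edges: Set[Edge], nodes: Iterable[Node], x: str, sel: str, y: str) -> Set[Edge]:
    """x.sel := y: add a sel-edge from each node x points to, to each node y points to."""
    nodes = list(nodes)
    new = set(edges)
    for s in nodes:
        if x in s:
            for t in nodes:
                if y in t:
                    new.add((s, sel, t))
    return new
```

In the fourth SSG of the reversal, n{y} has a cdr edge to n{x}; after nullify and link, that edge is replaced by one to n{t}, while the edges of the x-list are untouched.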

Normalization
• Only one constructor or selector is applied per assignment statement.
• A selector is something like x.car or x.cdr.
• An expression cons(x, y) is executed in three steps:
  1. An uninitialized cons-cell is allocated, and its address is assigned to a new temporary variable.
  2. The car component of the temporary is initialized with the value of x.
  3. The cdr component of the temporary is initialized with the value of y.
• All allocation statements must have the form x := new; x.sel := new is not allowed.
• In each assignment statement, the same variable cannot occur on both the left-hand and the right-hand side.
• Each assignment of the form lhs := rhs in which rhs ≠ nil is immediately preceded by an assignment lhs := nil.
• An assignment of the form temp := nil is placed at the end of the program for each temporary variable introduced during normalization.
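The rules above can be sketched as a small rewriting pass over simple assignments. This is a hypothetical illustration, not the paper's procedure: it covers only statements whose sides are nil, new, a variable, or a single selector, does not expand cons(x, y), and names temporaries _t0, _t1, … by an invented scheme. (The mergesort example below reuses the existing variable t instead of introducing a fresh temporary.)

```python
from itertools import count
from typing import List, Tuple

Stmt = Tuple[str, str]  # (lhs, rhs), e.g. ("x", "y.cdr") for x := y.cdr


def base(expr: str) -> str:
    """Variable part of an expression: 'x.cdr' -> 'x'."""
    return expr.split(".")[0]


def normalize(stmts: List[Stmt]) -> List[Stmt]:
    fresh = (f"_t{i}" for i in count())  # hypothetical temporary naming scheme
    out: List[Stmt] = []
    temps: List[str] = []
    for lhs, rhs in stmts:
        if "." in lhs and ("." in rhs or rhs == "new"):
            raise ValueError(f"violates the normalization rules: {lhs} := {rhs}")
        if rhs == "nil":
            out.append((lhs, rhs))
        elif base(lhs) == base(rhs):
            # same variable on both sides: route the value through a temporary
            tmp = next(fresh)
            temps.append(tmp)
            out += [(tmp, "nil"), (tmp, rhs), (lhs, "nil"), (lhs, tmp)]
        else:
            out += [(lhs, "nil"), (lhs, rhs)]  # lhs := nil precedes lhs := rhs
    out += [(tmp, "nil") for tmp in temps]     # null every temporary at the end
    return out
```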

Shape Graphs – Basic Terminology
• E_v is the graph's set of variable edges, which lead from a pointer variable to a shape-node. A variable edge is denoted [x, n], where x is a pointer variable (x ∈ PVar) and n is a shape-node.
• E_s is the graph's set of selector edges, which lead between shape-nodes. A selector edge is denoted ⟨s, sel, t⟩, where s and t are shape-nodes and sel ∈ {car, cdr}.
• When E_v(x) is overloaded, it represents the set of shape-nodes {n | [x, n] ∈ E_v}.
• When E_s(s, sel) is overloaded, it represents the set of shape-nodes {t | ⟨s, sel, t⟩ ∈ E_s}.
• In the concrete semantics, the result of an execution sequence is a shape-graph that represents the state of heap-allocated storage in memory.
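A shape-graph is just the two edge sets, with the overloaded E_v(x) and E_s(s, sel) as lookups. This is a minimal sketch; ShapeGraph, ev_of, and es_of are illustrative names, and nodes are given SSG-style frozenset names here although any hashable node identity would work.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

Node = FrozenSet[str]  # assumption: nodes named by variable sets, as in SSGs


@dataclass
class ShapeGraph:
    ev: Set[Tuple[str, Node]] = field(default_factory=set)        # variable edges [x, n]
    es: Set[Tuple[Node, str, Node]] = field(default_factory=set)  # selector edges <s, sel, t>

    def ev_of(self, x: str) -> Set[Node]:
        """E_v(x): the shape-nodes that variable x points to."""
        return {n for (v, n) in self.ev if v == x}

    def es_of(self, s: Node, sel: str) -> Set[Node]:
        """E_s(s, sel): the targets of sel-edges leaving shape-node s."""
        return {t for (src, f, t) in self.es if src == s and f == sel}
```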

Concrete Semantic Operations
1. Variable edge ← nil (x := nil)
2. Selector edge ← nil (x.sel := nil)
3. Variable edge ← new (x := new)
4. Variable edge ← variable edge (x := y)
5. Selector edge ← variable edge (x.sel := y)
6. Variable edge ← selector edge (x := y.sel)
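The six operations can be sketched over a concrete store encoded as two dictionaries, a representation chosen here for illustration only. Following the nonstandard concrete semantics, dereferencing nil or an uninitialized pointer is treated as a no-op rather than an error; Store and its method names are hypothetical.

```python
from typing import Dict, Optional


class Store:
    """Concrete store: pointer variables plus cons-cells (sketch)."""

    def __init__(self) -> None:
        self.env: Dict[str, Optional[int]] = {}              # variable -> cell id
        self.heap: Dict[int, Dict[str, Optional[int]]] = {}  # cell -> car/cdr targets

    def _deref(self, x: str) -> Optional[int]:
        # An uninitialized variable is treated like nil.
        return self.env.get(x)

    def assign_nil(self, x: str) -> None:                      # 1. x := nil
        self.env[x] = None

    def field_nil(self, x: str, sel: str) -> None:             # 2. x.sel := nil
        c = self._deref(x)
        if c is not None:                                      # nil dereference: no-op
            self.heap[c][sel] = None

    def assign_new(self, x: str) -> None:                      # 3. x := new
        c = len(self.heap)                                     # cells are never freed here
        self.heap[c] = {"car": None, "cdr": None}
        self.env[x] = c

    def assign_var(self, x: str, y: str) -> None:              # 4. x := y
        self.env[x] = self._deref(y)

    def field_var(self, x: str, sel: str, y: str) -> None:     # 5. x.sel := y
        c = self._deref(x)
        if c is not None:
            self.heap[c][sel] = self._deref(y)

    def assign_field(self, x: str, y: str, sel: str) -> None:  # 6. x := y.sel
        c = self._deref(y)
        if c is not None:
            self.env[x] = self.heap[c][sel]
```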

Deterministic (DSG) vs. Nondeterministic Shape Graphs (SSG)
• Deterministic shape graphs (DSG): [figure]
• Nondeterministic shape graphs (SSG): [figure]

Concrete Semantics
• Nonstandard in the following ways:
  1. The only parts of the store that the concrete semantics keeps track of are the pointer variables and the cons-cells of heap-allocated storage.
  2. Rather than causing an "abnormal termination" of the program, dereferences of nil pointers and uninitialized pointers are treated as no-ops.
  3. The concrete semantics does not interpret predicates, read statements, or assignment statements that do not perform pointer manipulations.
• A small amount of abstraction is therefore built in, which may associate a control-flow-graph vertex with more concrete stores than can actually arise there.

Abstract Semantics – Static Shape-Graphs (Definition 5.1.1)
• A static shape-graph is a pair ⟨SG#, is#⟩ where:
  - SG# is a shape-graph
  - is# is a function of type shape_nodes(SG#) → {false, true}
• The class of static shape-graphs is denoted by SSG.

Join Operation (⊔) (Definition 5.1.2)
• Given just two static shape-graphs, the join operation is:
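The join formula itself appeared as an image. A minimal sketch, assuming the join unions the variable and selector edges and ORs the is# values node by node (which is what a least upper bound over these components would do): SSG and join are illustrative names, and a node missing from one operand is treated as is# = false there.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

Node = FrozenSet[str]


@dataclass
class SSG:
    ev: Set[Tuple[str, Node]] = field(default_factory=set)        # variable edges
    es: Set[Tuple[Node, str, Node]] = field(default_factory=set)  # selector edges
    is_shared: Dict[Node, bool] = field(default_factory=dict)     # is#, keyed by node


def join(a: SSG, b: SSG) -> SSG:
    """Assumed join: union the edge sets, OR the is# values node by node."""
    nodes = set(a.is_shared) | set(b.is_shared)
    shared = {n: a.is_shared.get(n, False) or b.is_shared.get(n, False) for n in nodes}
    return SSG(a.ev | b.ev, a.es | b.es, shared)
```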

Abstract Semantics – The Abstraction Function (Definition 5.2.1)
• Let SG# = ⟨E_v, E_s⟩ be a shape-graph in DSG, and let l, l1, and l2 be shape-nodes.
• Then: π[E_v](l) is also written as π(l). Yikes! (What this means is that π(l) is the set of all variables that point to l.)

Abstract Semantics – The Abstraction Function (Definition 5.2.1)
• Beta functions (the author overloads β six times!)

Abstract Semantics – The Abstraction Function (Definition 5.2.1)
• Beta functions (continued)

Abstract Semantics – The Abstraction Function (Definition 5.2.1)
• Finally, we get to the abstraction function:
• What's going on? The author is pruning each DSG down to an SSG and building up a naming scheme (via the β functions). The abstraction function merges multiple DSGs into one SSG.

iis Function (induced is-shared)
• Checks whether a cons-cell l is the target of pointers emanating from two or more distinct cons-cell fields.
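The abstraction of a single concrete store, together with iis, can be sketched as follows. This simplifies Definition 5.2.1 (it does not follow the β functions step by step): each cell is named by the set of variables pointing to it, so all cells that no variable points to collapse into the summary node n_φ (the empty name), and a node's is# is the OR of iis over the cells it represents. Several DSGs would then be combined with the join. The env/heap dictionary encoding and the names iis and abstract are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Env = Dict[str, Optional[int]]              # variable -> cell id (or None)
Heap = Dict[int, Dict[str, Optional[int]]]  # cell id -> {"car": ..., "cdr": ...}
Node = FrozenSet[str]


def iis(heap: Heap, cell: int) -> bool:
    """Induced is-shared: cell is the target of two or more distinct fields."""
    incoming = sum(1 for fields in heap.values() for d in fields.values() if d == cell)
    return incoming >= 2


def abstract(env: Env, heap: Heap):
    # Name each cell by the variables pointing to it; unnamed cells share
    # the empty name, i.e. they are clustered into the summary node n_phi.
    name = {c: frozenset(x for x, v in env.items() if v == c) for c in heap}
    ev: Set[Tuple[str, Node]] = {(x, name[c]) for x, c in env.items() if c is not None}
    es: Set[Tuple[Node, str, Node]] = {
        (name[c], sel, name[d])
        for c, fields in heap.items() for sel, d in fields.items() if d is not None
    }
    is_shared: Dict[Node, bool] = {}
    for c in heap:  # a node is shared if any cell it represents is shared
        is_shared[name[c]] = is_shared.get(name[c], False) or iis(heap, c)
    return ev, es, is_shared
```

For a three-cell list headed by x, the result has the node n{x} with a cdr edge to n_φ, which has a cdr self-loop; adding a fourth cell whose cdr also points at the last cell makes is#(n_φ) true.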

Concretization
• This function goes from an SSG to a set of DSGs (think of it as the "reverse" of the abstraction function).

Examples (Abstract Semantics)

Abstract Interpretation
• Figure 8

Concrete Predicates

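Strong Nullification
• When the algorithm processes a statement like x.sel := nil, it always removes the sel-edges emanating from the shape-nodes that x points to. This is safe because each shape-node whose name contains x represents exactly the one cons-cell that x points to.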

If-Then-Else

Merging of Two Lists
    program mergesort(x, y)
    // merge the lists x and y into foo
    foo := nil
    while x.cdr != nil and y.cdr != nil do
        if x.car > y.car then
            t := x
            x := x.cdr
        else
            t := y
            y := y.cdr
        fi
        t.cdr := foo
        foo := t
    od

Merging of Two Lists - normalized
    program mergesort(x, y)
    // merge the lists x and y into foo
    foo := nil
    while x.cdr != nil and y.cdr != nil do
        t := nil
        if x.car > y.car then
            t := x
            x := nil
            x := t.cdr
        else
            t := y
            y := nil
            y := t.cdr
        fi
        t.cdr := nil
        t.cdr := foo
        foo := nil
        foo := t
    od
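To see what the normalized program does on concrete lists, here is a direct Python transcription. It keeps the slide's loop guard; the extra "is not None" tests are added only so that Python never dereferences None. Running it shows that, with this guard, the loop stops while cells still remain in x and y. Cons, from_list, to_list, and mergesort_normalized are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Cons:
    car: int
    cdr: Optional["Cons"] = None


def from_list(values: List[int]) -> Optional[Cons]:
    head = None
    for v in reversed(values):
        head = Cons(v, head)
    return head


def to_list(head: Optional[Cons]) -> List[int]:
    out = []
    while head is not None:
        out.append(head.car)
        head = head.cdr
    return out


def mergesort_normalized(x, y):
    foo = None
    # slide's guard (x.cdr != nil and y.cdr != nil), plus None checks for Python
    while x is not None and x.cdr is not None and y is not None and y.cdr is not None:
        t = None
        if x.car > y.car:
            t = x
            x = None
            x = t.cdr
        else:
            t = y
            y = None
            y = t.cdr
        t.cdr = None
        t.cdr = foo  # push the chosen cell onto the front of foo
        foo = None
        foo = t
    return foo, x, y
```

With x = [5, 3, 1] and y = [6, 4, 2], foo ends as [4, 5, 6] while [3, 1] stays in x and [2] stays in y.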

Inconsistencies
• Does not report running times or the lengths of the programs analyzed.
• Relevance to current times: published in January 1998.
• Seems too slow to be run on a whole program: in the worst case the cost is doubly exponential, 2^(2^|PVar|), in the number of pointer variables. We believe that keeping a single SSG per vertex would be more practical, but it is still very slow.
• Programs must be normalized before they can be analyzed.

Extensions
• Inaccuracy is caused by n_φ representing unrelated cons-cells, which is particularly a problem when is#(n_φ) = true.
• Possible ways of dealing with this:
  1. Using two separate summary nodes, one with is#(n_φ) = false and one with is#(n_φ) = true.
  2. Using allocation sites to identify shape-nodes: shape-nodes have names like n_{s,X}, where s is an allocation site and X is a set of program variables.
  3. Using type information: having a unique n_φ for every declared data type.
• These extensions do not work in all cases, though. One example where they cause problems is…

Reducing the Number of Shape-Nodes
• The number of shape-nodes in an SSG is bounded by 2^|PVar|. Reaching this limit is unlikely in practice, mainly because the number of possible aliasing configurations is normally small. (It is difficult to determine whether this is true, because the author never shows any results from running the analysis on programs.)
• Widening can be used to eliminate the possibility of exponential blow-up.

Widening
• The basic idea of widening is to discard an arbitrary amount of information.
• At various points in loops, we can widen an SSG into a less precise, but usually more compact, SSG by merging shape-nodes, e.g., merging n_{Z1} and n_{Z2} into n_{Z1 ∪ Z2} and combining all of their variable and selector edges.
• A weaker property that would enable widening would be
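Merging two shape-nodes can be sketched as a renaming of both nodes to the union of their names, applied to every edge. This is an illustration under stated assumptions: widen is a hypothetical helper, the merged node's is# is taken as the OR of the two inputs (sound, since is# describes the individual cells a node represents), and after merging, a name no longer means "pointed to by exactly these variables".

```python
from typing import Dict, FrozenSet, Set, Tuple

Node = FrozenSet[str]


def widen(ev: Set[Tuple[str, Node]], es: Set[Tuple[Node, str, Node]],
          is_shared: Dict[Node, bool], z1: Node, z2: Node):
    """Merge shape-nodes n_z1 and n_z2 into n_(z1 | z2), keeping all edges."""
    merged = z1 | z2

    def r(n: Node) -> Node:
        return merged if n in (z1, z2) else n

    new_ev = {(x, r(n)) for (x, n) in ev}
    new_es = {(r(s), sel, r(t)) for (s, sel, t) in es}
    new_shared: Dict[Node, bool] = {}
    for n, flag in is_shared.items():
        new_shared[r(n)] = new_shared.get(r(n), False) or flag
    return new_ev, new_es, new_shared
```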

Narrowing
• An SSG can be narrowed to be more "precise".
• x.sel := nil

Refining Concrete Semantics
• In Scheme, when a cons-cell is no longer reachable, the interpreter's garbage collector frees its memory. The shape analysis can benefit from the same idea when shape-nodes are no longer reachable from variables.
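The abstract counterpart of garbage collection can be sketched as a reachability pass from the variable edges. This is an illustrative helper (collect_garbage is a hypothetical name); it leaves the is# values of surviving nodes unchanged, which is conservative. In a well-formed SSG every node with a nonempty name has a variable edge, so in practice only n_φ can end up unreachable.

```python
from typing import Dict, FrozenSet, Set, Tuple

Node = FrozenSet[str]


def collect_garbage(ev: Set[Tuple[str, Node]], es: Set[Tuple[Node, str, Node]],
                    is_shared: Dict[Node, bool]):
    """Drop shape-nodes (and their edges) not reachable from any variable."""
    reachable = {n for (_, n) in ev}
    frontier = list(reachable)
    while frontier:  # depth-first search along selector edges
        n = frontier.pop()
        for (s, _, t) in es:
            if s == n and t not in reachable:
                reachable.add(t)
                frontier.append(t)
    new_es = {(s, sel, t) for (s, sel, t) in es if s in reachable}
    new_shared = {n: f for n, f in is_shared.items() if n in reachable}
    return set(ev), new_es, new_shared
```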

May-Alias Problem
• The may-alias problem (deciding whether two pointer expressions may refer to the same location) is a fundamental problem in optimizing compilers that generate code for scalar, superscalar, and parallel architectures.
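As an illustration of how an SSG can answer a simple may-alias query between two pointer variables: x and y may alias if some shape-node is a target of both variables' edges. Because names record which variables point to a node, in a well-formed SSG this amounts to asking whether some node's name contains both x and y. points_to and may_alias are hypothetical helpers for this sketch.

```python
from typing import FrozenSet, Set, Tuple

Node = FrozenSet[str]


def points_to(ev: Set[Tuple[str, Node]], x: str) -> Set[Node]:
    """Shape-nodes that variable x may point to."""
    return {n for (v, n) in ev if v == x}


def may_alias(ev: Set[Tuple[str, Node]], x: str, y: str) -> bool:
    """x and y may alias if some shape-node is a target of both variables."""
    return bool(points_to(ev, x) & points_to(ev, y))
```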

Possible Enhancements
• The current approach works for a small Lisp-like language and conservatively approximates the possible shapes that the program's data structures can have.
• It could not be used for a language such as C++ or C.
• The analysis becomes more complex when the SSG is nondeterministic, which will occur with any control structure, e.g., if-then-else.
• It may be usable for Java because of Java's stronger typing, but the analysis would have to be greatly extended.
• The analysis could be run on separate sections of a program, such as individual classes or subprograms.
• A linking mechanism would then have to be created to combine the analyses of the individual parts.

Insertion into a List
• Listness and circular listness are preserved. In other words, when the shape-analysis algorithm is run on the insertion program with a list (or a circular list) as input, it shows that a list (or a circular list) is still there at the end of the program.

The End
• Happy Thanksgiving!!