Prof. Aiken CS 294 Lecture 11 Program Analysis. Prof. Aiken CS 294 Lecture 12 The Purpose of this Course How are the following related? –Program analysis.

Slides:



Advertisements
Similar presentations
Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.
Advertisements

SSA and CPS CS153: Compilers Greg Morrisett. Monadic Form vs CFGs Consider CFG available exp. analysis: statement gen's kill's x:=v 1 p v 2 x:=v 1 p v.
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Lecture 11: Code Optimization CS 540 George Mason University.
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Automatic Verification Book: Chapter 6. What is verification? Traditionally, verification means proof of correctness automatic: model checking deductive:
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
CS412/413 Introduction to Compilers Radu Rugina Lecture 37: DU Chains and SSA Form 29 Apr 02.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Dataflow Analysis Introduction Guo, Yao Part of the slides are adapted from.
A Fixpoint Calculus for Local and Global Program Flows Swarat Chaudhuri, U.Penn (with Rajeev Alur and P. Madhusudan)
1 Introduction to Computability Theory Lecture12: Decidable Languages Prof. Amos Israeli.
6/9/2015© Hal Perkins & UW CSEU-1 CSE P 501 – Compilers SSA Hal Perkins Winter 2008.
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
1 Undecidability Andreas Klappenecker [based on slides by Prof. Welch]
CS 536 Spring Global Optimizations Lecture 23.
From last time: live variables Set D = 2 Vars Lattice: (D, v, ?, >, t, u ) = (2 Vars, µ, ;,Vars, [, Å ) x := y op z in out F x := y op z (out) = out –
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Chair of Software Engineering Fundamentals of Program Analysis Dr. Manuel Oriol.
Data Flow Analysis Compiler Design Nov. 3, 2005.
4/25/08Prof. Hilfinger CS164 Lecture 371 Global Optimization Lecture 37 (From notes by R. Bodik & G. Necula)
Static Program Analysis Xiangyu Zhang The slides are compiled from Alex Aiken’s Michael D. Ernst’s Sorin Lerner’s.
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
CS 412/413 Spring 2007Introduction to Compilers1 Lecture 29: Control Flow Analysis 9 Apr 07 CS412/413 Introduction to Compilers Tim Teitelbaum.
CS5371 Theory of Computation Lecture 1: Mathematics Review I (Basic Terminology)
Intraprocedural Points-to Analysis Flow functions:
Data Flow Analysis Compiler Design Nov. 8, 2005.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs, Data-flow Analysis Data-flow Frameworks --- today’s.
Prof. Fateman CS 164 Lecture 221 Global Optimization Lecture 22.
From last lecture x := y op z in out F x := y op z (in) = in [ x ! in(y) op in(z) ] where a op b =
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Prof. Aiken CS 294 Lecture 21 Abstract Interpretation Part 2.
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
Recap from last time: live variables x := 5 y := x + 2 x := x + 1 y := x y...
Machine-Independent Optimizations Ⅰ CS308 Compiler Theory1.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Data Flow Analysis Compiler Design Nov. 8, 2005.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis: Data-flow frameworks –Classic.
Prof. Bodik CS 164 Lecture 16, Fall Global Optimization Lecture 16.
Precision Going back to constant prop, in what cases would we lose precision?
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Example x := read() v := a + b x := x + 1 w := x + 1 a := w v := a + b z := x + 1 t := a + b.
Data Flow Analysis. 2 Source code parsed to produce AST AST transformed to CFG Data flow analysis operates on control flow graph (and other intermediate.
Dataflow Analysis Topic today Data flow analysis: Section 3 of Representation and Analysis Paper (Section 3) NOTE we finished through slide 30 on Friday.
Type Systems CS Definitions Program analysis Discovering facts about programs. Dynamic analysis Program analysis by using program executions.
ISBN Chapter 3 Describing Semantics -Attribute Grammars -Dynamic Semantics.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
Λλ Fernando Magno Quintão Pereira P ROGRAMMING L ANGUAGES L ABORATORY Universidade Federal de Minas Gerais - Department of Computer Science P ROGRAM A.
Compiler Principles Fall Compiler Principles Lecture 0: Local Optimizations Roman Manevich Ben-Gurion University.
Compiler Principles Fall Compiler Principles Lecture 11: Loop Optimizations Roman Manevich Ben-Gurion University.
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Iterative Dataflow Problems Taken largely from notes of Alex Aiken (UC Berkeley) and Martin Rinard (MIT) Dataflow information used in optimization Several.
Dataflow Analysis CS What I s Dataflow Analysis? Static analysis reasoning about flow of data in program Different kinds of data: constants, variables,
Data Flow Analysis Suman Jana
Iterative Dataflow Problems
Fall Compiler Principles Lecture 8: Loop Optimizations
Topic 10: Dataflow Analysis
University Of Virginia
Fall Compiler Principles Lecture 10: Loop Optimizations
Data Flow Analysis Compiler Design
Pointer analysis.
CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019
Presentation transcript:

Prof. Aiken CS 294 Lecture 11 Program Analysis

Prof. Aiken CS 294 Lecture 12 The Purpose of this Course How are the following related? –Program analysis –Model checking (as applied to software) –Theorem proving (as applied to software) But program analysis itself has sub-disciplines...

Prof. Aiken CS 294 Lecture 13 What is Program Analysis? A collection of communities: –Dataflow analysis –Abstract interpretation –Type inference –Constraint-based analysis The relationships among these are not completely clear

Prof. Aiken CS 294 Lecture 14 What is Program Analysis For? Historically: Optimizing compilers More recently: –Influencing language design –Finding bugs

Prof. Aiken CS 294 Lecture 15 Culture Emphasis on low-complexity techniques –Because of emphasis on usage in tools –High-complexity techniques also studied, but often don’t survive Emphasis on complete automation Driven by language features –Particular languages and features give rise to their own sub-disciplines

Prof. Aiken CS 294 Lecture 16 Dataflow Analysis Part 1

Prof. Aiken CS 294 Lecture 17 Control-Flow Graphs x := a + b; y := a * b; while y > a + b { a := a + 1; x := a + b } if y > a + b a := a + 1 x := a + b y := a * b x := a + b Control-flow graphs are state-transisition systems.

Prof. Aiken CS 294 Lecture 18 Notation s is a statement succ(s) = { successor statements of s } pred(s) = { predecessor statements of s } write(s) = { variables written by s } read(s) = { variables read by s } Note: In literature write = kill and read = gen

Prof. Aiken CS 294 Lecture 19 Available Expressions For each program point p, which expressions must have already been computed, and not later modified, on all paths to p. if y > a + b a := a + 1 x := a + b y := a * b x := a + b Optimization: Where available, expressions need not be recomputed. a+b is available here

Prof. Aiken CS 294 Lecture 110 Dataflow Equations

Prof. Aiken CS 294 Lecture 111 Example if y > a + b a := a + 1 x := a + b y := a * b x := a + ba+ba+b, a*ba+b, a*b, y > a+ba+b a+b, y > a+b

Prof. Aiken CS 294 Lecture 112 Liveness Analysis For each program point p, which of the variables defined at that point are used on some execution path? if y > a + b a := a + 1 x := a + b y := a * b x := a + b Optimization: If a variable is not live, no need to keep it in a register. x is not live here

Prof. Aiken CS 294 Lecture 113 Dataflow Equations

Prof. Aiken CS 294 Lecture 114 Example if y > a + b a := a + 1 x := a + b y := a * b x := a + bx,a,bx,y,a,by,a,b x,y,a,ba,bx

Prof. Aiken CS 294 Lecture 115 Available Expressions Again

Prof. Aiken CS 294 Lecture 116 Available Expressions: Schematic Must analysis: property holds on all paths Forwards analysis: from inputs to outputs Transfer function:

Prof. Aiken CS 294 Lecture 117 Live Variables Again

Prof. Aiken CS 294 Lecture 118 Live Variables: Schematic May analysis: property holds on some path Backwards analysis: from outputs to inputs Transfer function:

Prof. Aiken CS 294 Lecture 119 Very Busy Expressions An expression e is very busy at program point p if every path from p must evaluate e before any variable in e is redefined Optimization: hoisting expressions A must-analysis A backwards analysis

Prof. Aiken CS 294 Lecture 120 Reaching Definitions For a program point p, which assignments made on paths reaching p have not been overwritten Connects definitions with uses (use-def chains) A may-anlaysis A forwards analysis

Prof. Aiken CS 294 Lecture 121 One Cut at the Dataflow Design Space MayMust ForwardsReaching definitions Available expressions BackwardsLive variablesVery busy expressions

Prof. Aiken CS 294 Lecture 122 The Literature Vast literature of dataflow analyses 90+% can be described by –Forwards or backwards –May or must Some oddballs, but not many –Bidirectional analyses

Prof. Aiken CS 294 Lecture 123 Another Cut at Dataflow Design What theory are we dealing with? Review our schemas:

Prof. Aiken CS 294 Lecture 124 Essential Features Set variables L in (s), L out (S) Set operations: union, intersection –Restricted complement (- constant) Domain of atoms –E.g., variable names Equations with single variable on lhs

Prof. Aiken CS 294 Lecture 125 Dataflow Problems Many dataflow equations are described by the grammar: v is a variable a is an atom Note: More general than most problems...

Prof. Aiken CS 294 Lecture 126 Solving Dataflow Equations Simple worklist algorithm: –Initially let S(v) = 0 for all v –Repeat until S(v) = S(E) for all equations Pick any v = E such that S(v)  S(E) Set S := S[v/S(E)]

Prof. Aiken CS 294 Lecture 127 Termination How do we know the algorithm terminates? Because –operations are monotonic –the domain is finite

Prof. Aiken CS 294 Lecture 128 Monotonicity Operation f is monotonic if X  Y  f(x)  f(y) We require that all operations be monotonic –Easy to check for the set operations –Easy to check for all transfer functions; recall:

Prof. Aiken CS 294 Lecture 129 Termination again To see the algorithm terminates –All variables start empty –Variables and rhs’s only increase with each update By induction on # of updates, using monotonicity –Sets can only grow to a max finite size Together, these imply termination

Prof. Aiken CS 294 Lecture 130 The Rest of the Lecture Distributive Problems Flow Sensitivity Context Sensitivity –Or interprocedural analysis What are the limits of dataflow analysis?

Prof. Aiken CS 294 Lecture 131 Distributive Dataflow Problems Monotonicity implies for a transfer function f: Distributive dataflow problems satisfy a stronger property: f(x  y)  f(x)  f(y) f(x  y) =f(x)  f(y)

Prof. Aiken CS 294 Lecture 132 Distributivity Example h gf k k(h(f(0)  g(0))) = k(h(f(0))  h(g(0))) = k(h(f(0)))  k(h(g(0))) The analysis of the graph is equivalent to combining the analysis of each path!

Prof. Aiken CS 294 Lecture 133 Meet Over All Paths If a dataflow problem is distributive, then the (least) solution of the dataflow equations is equivalent to the analyzing every path (including infinite ones) and combining the results Says joins cause no loss of information

Prof. Aiken CS 294 Lecture 134 Distributivity Again Obtaining the meet over all paths solution is a very powerful guarantee Says that dataflow analysis is really as good as you can do for a distributive problem. Alternatively, can be viewed as saying distributive problems are very easy indeed...

Prof. Aiken CS 294 Lecture 135 What Problems are Distributive? Many analyses of program structure are distributive –E.g., live variables, available expressions, reaching definitions, very busy expressions –Properties of how the program computes

Prof. Aiken CS 294 Lecture 136 Liveness Example Revisited if y > a + b a := a + 1 x := a + b y := a * b x := a + bx,a,bx,y,a,by,a,b x,y,a,ba,bx

Prof. Aiken CS 294 Lecture 137 Constant Folding Ordering i<  for any integer i j  k=  if j  k Example transfer function: Consider

Prof. Aiken CS 294 Lecture 138 What Problems are Not Distributive? Analyses of what the program computes –The output is (a constant, positive, …)

Prof. Aiken CS 294 Lecture 139 Flow Sensitivity Flow sensitive analyses –The order of statements matters –Need a control flow graph Or transition system, …. Flow insensitive analyses –The order of statements doesn’t matter –Analysis is the same regardless of statement order

Prof. Aiken CS 294 Lecture 140 Example Flow Insensitive Analysis What variables does a program fragment modify? Note G(s 1 ;s 2 ) = G(s 2 ;s 1 )

Prof. Aiken CS 294 Lecture 141 The Advantage Flow-sensitive analyses require a model of program state at each program point –E.g., liveness analysis, reaching definitions, … Flow-insensitive analyses require only a single global state –E.g., for G, the set of all variables modified

Prof. Aiken CS 294 Lecture 142 Notes on Flow Sensitivity Flow insensitive analyses seem weak, but: Flow sensitive analyses are hard to scale to very large programs –Additional cost: state size X # of program points Beyond 1000’s of lines of code, only flow insensitive analyses have been shown to scale

Prof. Aiken CS 294 Lecture 143 Context-Sensitive Analysis What about analyzing across procedure boundaries? Def f(x){…} Def g(y){…f(a)…} Def h(z){…f(b)…} Goal: Specialize analysis of f to take advantage of f is called with a by g f is called with b by h

Prof. Aiken CS 294 Lecture 144 Control-Flow Graphs Again How do we extend control-flow graphs to procedures? Idea: Model procedure call f(a) by: –Edge from point before call to entry of f –Edge from exit(s) of f to point after call

Prof. Aiken CS 294 Lecture 145 Example Edges from –before f(a) to entry of f –Exit of f to after f(a) –Before f(b) to entry of f –Exit of f to after f(b) f(x){…} g(y){…f(a)…}h(z){…f(b)…}

Prof. Aiken CS 294 Lecture 146 Example Edges from –before f(a) to entry of f –Exit of f to after f(a) –Before f(b) to entry of f –Exit of f to after f(b) Has the correct flows for g f(x){…} g(y){…f(a)…}h(z){…f(b)…}

Prof. Aiken CS 294 Lecture 147 Example Edges from –before f(a) to entry of f –Exit of f to after f(a) –Before f(b) to entry of f –Exit of f to after f(b) Has the correct flows for h f(x){…} g(y){…f(a)…}h(z){…f(b)…}

Prof. Aiken CS 294 Lecture 148 Example But also has flows we don’t want –One path captures a call to g returning at h! So-called “infeasible paths” f(x){…} g(y){…f(a)…}h(z){…f(b)…}

Prof. Aiken CS 294 Lecture 149 What to do? Must distinguish calls to f in different contexts Three techniques –Assumptions later –Context-free reachability Later –Call strings Today

Prof. Aiken CS 294 Lecture 150 Call Strings Observation: –At run time, different calls to f are distinguished by the call stack Problem: –The stack is unbounded Idea: –Use the last k calls on the stack to distinguish context –Represent a call by the name of the calling procedure

Prof. Aiken CS 294 Lecture 151 Example Revisited Use call strings of length 1 Context is name of calling procedure f(x){…} g(y){…f(a)…}h(z){…f(b)…} g g h h Note: labels on edges are part of the state: tag a call with “g” on call of f() from g(), filter out all but that portion of the state with call string “g” on return from g() to f()

Prof. Aiken CS 294 Lecture 152 Experience with Call Strings Very expensive –Multiplies # of abstract values by (# of procedures ** length of call string) –Hard to contemplate call strings > 1 Fragile –Very sensitive to organization of procedures Well-studied, but not much used in practice

Prof. Aiken CS 294 Lecture 153 Review of Terminology Must vs. May Forwards vs. Backwards Flow-sensitive vs. Flow-insensitive Context-sensitive vs. Context-insensitive Distributive vs. non-Distributive

Prof. Aiken CS 294 Lecture 154 Where is Dataflow Analysis Useful? Best for flow-sensitive, context-insensitive, distributive problems on small pieces of code –E.g., the examples we’ve seen and many others Extremely efficient algorithms are known –Use different representation than control-flow graph, but not fundamentally different –More on this in a minute...

Prof. Aiken CS 294 Lecture 155 Where is Dataflow Analysis Weak? Lots of places

Prof. Aiken CS 294 Lecture 156 Data Structures Not good at analyzing data structures Works well for atomic values –Labels, constants, variable names Not easily extended to arrays, lists, trees, etc. –Work on shape analysis

Prof. Aiken CS 294 Lecture 157 The Heap Good at analyzing flow of values in local variables No notion of the heap in traditional dataflow applications In general, very hard to model anonymous values accurately –Aliasing –The “strong update” problem

Prof. Aiken CS 294 Lecture 158 Context Sensitivity Standard dataflow techniques for handling context sensitivity don’t scale well Brittle under common program edits E.g., call strings

Prof. Aiken CS 294 Lecture 159 Flow Sensitivity (Beyond Procedures) Flow sensitive analyses are standard for analyzing single procedures Not used (or not aware of uses) for whole programs –Too expensive

Prof. Aiken CS 294 Lecture 160 The Call Graph Dataflow analysis requires a call graph –Or something close Inadequate for higher-order programs –First class functions –Object-oriented languages with dynamic dispatch Call-graph hinders algorithmic efficiency –Desire to keep executable specification is limiting

Prof. Aiken CS 294 Lecture 161 Forwards vs. Backwards Restriction to forwards/backwards reachability –Very constraining –Many important problems not easy to fit into this mold

Prof. Aiken CS 294 Lecture 162 Next Time: Abstract Interpretation Theory –Lots Examples –Lots Focus on contrast with traditional dataflow analysis