Program Analysis via Graph Reachability

Program Analysis via Graph Reachability Thomas Reps University of Wisconsin http://www.cs.wisc.edu/~reps/ PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000

PLDI 00 Registration Form Tutorial (morning): …………… $ ____ Tutorial (afternoon): ………….. $ ____ Tutorial (evening): ……………. $ – 0 –

Applications: Program optimization; Program understanding and software reengineering; Security (information flow); Verification (model checking; security of crypto-based protocols for distributed systems)

[Timeline figure, 1987 to 1998; topics in order of appearance: Slicing & Applications, CFL Reachability, Dataflow Analysis, Demand Algorithms, Structure-Transmitted Dependences, Set Constraints]

. . . As Well As . . . Flow-insensitive points-to analysis Complexity results Linear . . . cubic . . . undecidable variants PTIME-completeness Model checking of recursive hierarchical finite-state machines “infinite”-state systems linear-time and cubic-time algorithms

. . . And Also Analysis of attribute grammars Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83] Formal-language problems CFL-recognition (given G and ω, is ω ∈ L(G)?) 2DPDA- and 2NPDA-simulation (given M and ω, is ω ∈ L(M)?) String-matching problems

Unifying Conceptual Model for Dataflow-Analysis Literature Linear-time gen-kill [Hecht 76], [Kou 77] Path-constrained DFA [Holley & Rosen 81] Linear-time GMOD [Cooper & Kennedy 88] Flow-sensitive MOD [Callahan 88] Linear-time interprocedural gen-kill [Knoop & Steffen 93] Linear-time bidirectional gen-kill [Dhamdhere 94] Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

Collaborators Susan Horwitz Mooly Sagiv Genevieve Rosay David Melski David Binkley Michael Benedikt Patrice Godefroid

Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. → Demand alg. Understanding complexity Linear . . . cubic . . . undecidable Beyond CFL-reachability

Program Slicing The backward slice w.r.t. variable v at program point p: the program subset that may influence the value of variable v at point p. The forward slice w.r.t. variable v at program point p: the program subset that may be influenced by the value of variable v at point p.

Backward slice with respect to "printf("%d\n",i)": int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }

Slice Extraction: backward slice with respect to "printf("%d\n",i)": int main() { int i = 1; while (i < 11) { i = i + 1; } printf("%d\n",i); }

Forward slice with respect to "sum = 0": int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); }

Who Cares About Slices? Understanding programs Restructuring Programs Program Specialization and Reuse Program Differencing Testing (and Retesting) Year 2000 Problem Automatic Differentiation

What Are Slices Useful For? Understanding Programs What is affected by what? Restructuring Programs Isolation of separate “computational threads” Program Specialization and Reuse Slices = specialized programs Only reuse needed slices Program Differencing Compare slices to identify changes Testing What new test cases would improve coverage? What regression tests must be rerun after a change?

Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line(FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while (eof_flag == FALSE) { lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf("lines = %d\n", lines); printf("chars = %d\n", chars); }

Character-Count Program void char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line(FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while (eof_flag == FALSE) { lines = lines + 1; scan_line(f, &eof_flag, &n); chars = chars + n; } printf("lines = %d\n", lines); printf("chars = %d\n", chars); }

Line-Count Program void line_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line2(FILE *f, BOOL *bptr, int *iptr); scan_line2(f, &eof_flag, &n); chars = n; while (eof_flag == FALSE) { lines = lines + 1; scan_line2(f, &eof_flag, &n); chars = chars + n; } printf("lines = %d\n", lines); printf("chars = %d\n", chars); }

Specialization Via Slicing: from wc -lc, slicing yields wc -c and wc -l (for example, void line_count(FILE *f);). Not partial evaluation!

How are Slices Computed? Reachability in a Dependence Graph Program Dependence Graph (PDG) Dependences within one procedure Intraprocedural slicing is reachability in one PDG System Dependence Graph (SDG) Dependences within entire system Interprocedural slicing is reachability in the SDG
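To make "slicing is reachability in a dependence graph" concrete, here is a minimal Python sketch. The edge lists are hand-transcribed for the sum program shown earlier and are an assumption of this example, not the output of any tool; `backward_slice` and `forward_slice` are hypothetical helper names.

```python
from collections import defaultdict

def backward_slice(edges, criterion):
    """Return every node from which `criterion` is reachable along dependence edges."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    seen = {criterion}
    stack = [criterion]
    while stack:
        node = stack.pop()
        for p in preds[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def forward_slice(edges, criterion):
    """A forward slice is a backward slice over the reversed edges."""
    return backward_slice([(dst, src) for src, dst in edges], criterion)

# Control-dependence edges (hand-built assumption for the sum program).
CONTROL = [
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while (i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while (i < 11)", "sum = sum + i"), ("while (i < 11)", "i = i + 1"),
]
# Flow-dependence edges: a value assigned at the source may be used at the target.
FLOW = [
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while (i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while (i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]
PDG = CONTROL + FLOW

slice_i = backward_slice(PDG, "printf(i)")
slice_sum = forward_slice(PDG, "sum = 0")
```

Under these assumed edges, `slice_i` contains only the statements that mention `i` plus `Enter`, which agrees with the extracted slice shown on the earlier slide.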

How is a PDG Created? Start from the Control Flow Graph (CFG). The PDG is the union of the Control Dependence Graph and the Flow Dependence Graph, both computed from the CFG.

Control Flow Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); } [Figure: CFG with nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the while node has a T edge and an F edge]

Flow Dependence Graph Flow dependence p → q: the value of the variable assigned at p may be used at q. [Figure: flow-dependence edges over the nodes of the sum program]

Control Dependence Graph Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F): p →F q. [Figure: control-dependence edges, all labeled T, from Enter and from while(i < 11)]

Program Dependence Graph (PDG) [Figure: control-dependence and flow-dependence edges of the sum program combined in one graph]

Program Dependence Graph (PDG) int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf("%d\n",sum); printf("%d\n",i); } Opposite order of the first two statements, same PDG.

Backward Slice [Figure sequence (1) to (4): the PDG of the sum program with the slice highlighted step by step]

Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf("%d\n",i); } [Figure: PDG restricted to Enter, i = 1, while(i < 11), i = i + 1, printf(i)]

CodeSurfer

Browsing a Dependence Graph Pretend this is your favorite browser What does clicking on a link do? You get a new page Or you move to an internal tag

Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; } Backward slice with respect to "printf("%d\n",i)"

Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; } Superfluous components are included by Weiser's slicing algorithm [TSE 84]; they are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].

How is an SDG Created? Each PDG has nodes for: entry point; procedure parameters and function result. Each call site has nodes for: call; arguments and function result. Appropriate edges: entry node to parameters; call node to arguments; call node to entry node; arguments to parameters.

System Dependence Graph (SDG) Enter main Call p Call p Enter p

SDG for the Sum Program [Figure: SDG with three regions. Enter main: sum = 0, i = 1, while(i < 11), printf(sum), printf(i), and two Call add sites. First call: xin = sum, yin = i, sum = xout. Second call: xin = i, yin = 1, i = xout. Enter add: x = xin, y = yin, x = x + y, xout = x.]

Interprocedural Backward Slice [Figure sequence (1) to (7): SDG with Enter main, two Call p sites, and Enter p, with the slice highlighted step by step. From step (6) on, the edges between the call sites and p are labeled with the parentheses ( ) and [ ].]

Matched-Parenthesis Path [Figure: a path in the SDG whose edges carry the labels ), (, ), [ ]

Slice Extraction [Figure: Enter main, Call p, Enter p]

Slice of the Sum Program Enter main i = 1 while(i < 11) printf(i) Call add xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x

CFL-Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A context-free language L-path from s to t iff the edge labels along some path from s to t spell a word in L Running time: O(N^3)

Interprocedural Slicing via CFL-Reachability Graph: System dependence graph L: L(matched) [roughly] Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94] CFL-reachability System dependence graph: N nodes, E edges Running time: O(N^3) System dependence graph Special structure Running time: O(E + CallSites × MaxParams^3)

Ordinary Graph Reachability vs. CFL-Reachability Grammar: matched → ε | e | [ matched ] | ( matched ) | matched matched [Figure: the same graph from s to t shown for ordinary graph reachability and for CFL-reachability, with edges labeled e, (, ), [, ]]

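CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: if a B-edge is followed by a C-edge, add an A-edge from the source of the B-edge to the target of the C-edge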
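The dynamic-programming step on this slide can be sketched as a worklist algorithm. The sketch below is a minimal Python version, assuming the grammar has already been normalized so that every production has the form A → ε, A → B, or A → B C; the function name `cfl_reach`, the helper nonterminals `P` and `Q`, and the tiny example graph are all hypothetical.

```python
from collections import defaultdict, deque

def cfl_reach(nodes, edges, unary, binary, epsilon=()):
    """edges: (u, label, v) triples. unary: (A, B) for A -> B.
    binary: (A, B, C) for A -> B C. epsilon: nonterminals A with A -> eps.
    Returns the set of all derived (u, label, v) facts."""
    facts = set()
    work = deque()
    out_by = defaultdict(set)  # (u, label) -> targets v
    in_by = defaultdict(set)   # (v, label) -> sources u

    def add(u, lab, v):
        if (u, lab, v) not in facts:
            facts.add((u, lab, v))
            out_by[(u, lab)].add(v)
            in_by[(v, lab)].add(u)
            work.append((u, lab, v))

    for u, lab, v in edges:
        add(u, lab, v)
    for a in epsilon:
        for n in nodes:
            add(n, a, n)

    while work:
        u, lab, v = work.popleft()
        for a, b in unary:
            if b == lab:
                add(u, a, v)
        for a, b, c in binary:
            if b == lab:  # popped edge is the B part: look for C-edges leaving v
                for w in list(out_by[(v, c)]):
                    add(u, a, w)
            if c == lab:  # popped edge is the C part: look for B-edges entering u
                for t in list(in_by[(u, b)]):
                    add(t, a, v)
    return facts

# Matched parentheses with two kinds of brackets, normalized by hand:
# M -> eps | e | M M | ( P | [ Q,   P -> M ),   Q -> M ]
UNARY = [("M", "e")]
BINARY = [("M", "M", "M"), ("M", "(", "P"), ("P", "M", ")"),
          ("M", "[", "Q"), ("Q", "M", "]")]
NODES = ["s", "a", "b", "t", "u"]
EDGES = [("s", "(", "a"), ("a", "e", "b"), ("b", ")", "t"), ("b", "]", "u")]

facts = cfl_reach(NODES, EDGES, UNARY, BINARY, epsilon=["M"])
```

In this example `t` is M-reachable from `s` because `(` is closed by `)`, while `u` is not, because `(` is followed by `]`.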

Degenerate Case: CFL-Recognition exp → id | exp + exp | exp * exp | ( exp ) Is "(a + b) * c" ∈ L(exp)? [Figure: chain graph from s to t with edges labeled ( a + b ) * c]

Degenerate Case: CFL-Recognition exp → id | exp + exp | exp * exp | ( exp ) Is "a + b) * c +" ∈ L(exp)? [Figure: chain graph from s to t with edges labeled a + b ) * c +]

CYK: Context-Free Recognition M → M M | ( M ) | [ M ] | ( ) | [ ] ω = "( [ ] ) [ ]" Is ω ∈ L(M)?

CYK: Context-Free Recognition Original grammar: M → M M | ( M ) | [ M ] | ( ) | [ ] Normalized grammar: M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M
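The normalized grammar above has only two-symbol right-hand sides, so a CYK-style table can be filled directly. The following Python sketch is one way to do that; it is an illustration under the assumption that terminals are stored in the table alongside nonterminals, and the name `cyk` is hypothetical.

```python
def cyk(tokens, binary, start):
    """binary: (head, left, right) productions over terminals and nonterminals.
    table[i][j] holds every symbol that derives tokens[i:j]."""
    n = len(tokens)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        table[i][i + 1].add(tok)  # each terminal derives itself
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point
                for head, left, right in binary:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(head)
    return start in table[0][n], table

# M -> M M | LPM ) | LBM ] | ( ) | [ ]    LPM -> ( M    LBM -> [ M
GRAMMAR = [
    ("M", "M", "M"), ("M", "LPM", ")"), ("M", "LBM", "]"),
    ("M", "(", ")"), ("M", "[", "]"),
    ("LPM", "(", "M"), ("LBM", "[", "M"),
]

accepted, table = cyk("( [ ] ) [ ]".split(), GRAMMAR, "M")
rejected, _ = cyk("( [ ) ]".split(), GRAMMAR, "M")
```

The table entry for the first three tokens contains `LPM`, mirroring the LPM → ( M step in the slides that follow.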

Is "( [ ] ) [ ]" ∈ L(M)? [CYK table; rows by substring length, columns by start position 1 to 6]
length 1: { ( } { [ } { ] } { ) } { [ } { ] }
length 2: ∅ {M} ∅ ∅ {M} (by M → [ ])
length 3: {LPM} ∅ ∅ ∅ (by LPM → ( M)
length 4: {M} ∅ ∅ (by M → LPM ))
length 5: ∅ ∅
length 6: {M} (by M → M M), so the answer is yes

CYK: Graphs vs. Tables Is "( [ ] ) [ ]" ∈ L(M)? [Figure: chain graph from s to t with edges labeled ( [ ] ) [ ] and derived M and LPM edges added, shown next to the equivalent CYK table] M → M M | LPM ) | LBM ] | ( ) | [ ] LPM → ( M LBM → [ M

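CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: if a B-edge is followed by a C-edge, add an A-edge from the source of the B-edge to the target of the C-edge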

Dynamic Transitive Closure ?! Aiken et al. Set-constraint solvers Points-to analysis Henglein et al. type inference But a CFL captures a non-transitive reachability relation [Valiant 75]

Program Chopping Given source S and target T, what program points transmit effects from S to T? S T Intersect forward slice from S with backward slice from T, right?

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; } Forward slice with respect to "sum = 0"

Non-Transitivity and Slicing (same program) Backward slice with respect to "printf("%d\n",i)"

Non-Transitivity and Slicing (same program) Forward slice with respect to "sum = 0" ∩ Backward slice with respect to "printf("%d\n",i)"

Non-Transitivity and Slicing (same program) ≠ Chop with respect to "sum = 0" and "printf("%d\n",i)"

Non-Transitivity and Slicing Enter main sum = 0 i = 1 while(i < 11) printf(sum) printf(i) Call add Call add xin = sum yin = i sum = xout xin = i yin= 1 i = xout ( ] Enter add x = xin y = yin x = x + y xout = x

Program Chopping Given source S and target T, what program points transmit effects from S to T? "Precise interprocedural chopping" [Reps & Rosay FSE 95]

CF-Recognition vs. CFL-Reachability CF-Recognition Chain graphs General grammar: sub-cubic time [Valiant 75] LL(1), LR(1): linear time CFL-Reachability General graphs: O(N^3) LL(1): O(N^3) LR(1): O(N^3) Certain kinds of graphs: O(N+E) Regular languages: O(N+E) Gen/kill IDFA GMOD IDFA

Regular-Language Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A regular language L-path from s to t iff the edge labels along some path from s to t spell a word in L Running time: O(N+E), vs. O(N^3) for a context-free L Ordinary reachability (= transitive closure): label each edge with e; L is e*
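One common way to obtain the linear bound for a regular language is to search the product of the graph with a DFA for L. The Python sketch below takes that approach; it is an assumption of this example rather than something the slide spells out, and `regular_reach` and both toy graphs are hypothetical.

```python
from collections import deque

def regular_reach(edges, source, dfa_start, dfa_delta, dfa_accept):
    """Breadth-first search over (graph node, DFA state) pairs.
    dfa_delta maps (state, label) -> state; missing entries mean 'reject'.
    Returns the nodes reachable from `source` along a path whose labels are in L."""
    succ = {}
    for u, lab, v in edges:
        succ.setdefault(u, []).append((lab, v))
    start = (source, dfa_start)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        for lab, nxt in succ.get(node, []):
            nstate = dfa_delta.get((state, lab))
            if nstate is not None and (nxt, nstate) not in seen:
                seen.add((nxt, nstate))
                queue.append((nxt, nstate))
    return {node for node, state in seen if state in dfa_accept}

# Ordinary reachability: every edge is labeled e and L = e*.
plain_edges = [("s", "e", "a"), ("a", "e", "t"), ("b", "e", "t")]
plain = regular_reach(plain_edges, "s", "q0", {("q0", "e"): "q0"}, {"q0"})

# A labeled example with L = a b*.
lab_edges = [("s", "a", "x"), ("x", "b", "y"), ("y", "a", "z")]
ab_star = regular_reach(lab_edges, "s", "q0",
                        {("q0", "a"): "q1", ("q1", "b"): "q1"}, {"q1"})
```

Each (node, state) pair is visited at most once, so for a fixed DFA the work is proportional to the size of the graph.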

Security of Crypto-Based Protocols for Distributed Systems "Ping-pong" protocols: (1) X → Y: Encrypt_Y(M X) (2) Y → X: Encrypt_X(M) [Dolev & Yao 83]: O(N^8) algorithm [Dolev, Even, & Karp 83]: O(N^3) algorithm, less well known than [Dolev & Yao 83]

[Dolev, Even, & Karp 83] Id → Encrypt_X Id Decrypt_X Id → Decrypt_X Id Encrypt_X Id → . . . [Figure: graph for Message and Saboteur with edge labels EY, AX, AZ; Id ?]

Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. → Demand alg. Understanding complexity Linear . . . cubic . . . undecidable Beyond CFL-reachability

Relationship to Other Analysis Paradigms Dataflow analysis reachability versus equation solving Deduction Set constraints

Dataflow Analysis Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution Examples Constant propagation Reaching definitions Live variables Possibly uninitialized variables

Useful For . . . Optimizing compilers Parallelizing compilers Tools that detect possible logical errors Tools that show the effects of a proposed modification

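Possibly Uninitialized Variables [Figure: control-flow graph with nodes Start, x = 3, if . . ., y = x, y = w, w = 8, printf(y); each program point is annotated with its set of possibly uninitialized variables, among them {w,x,y}, {w,y}, {w}, and {}]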
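A small intraprocedural version of this analysis can be sketched in Python. The control-flow graph below is a hypothetical one, loosely modeled on the node names in the figure; the exact edges of the original figure are not recoverable from the transcript. The transfer rule assumed here: after `v = expr`, `v` is possibly uninitialized exactly when some variable used in `expr` is.

```python
def possibly_uninit(cfg, assigns, entry, variables):
    """cfg: node -> successor list. assigns: node -> (target, [used variables]).
    Returns, for each node, the variables possibly uninitialized just before it."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    before = {n: set() for n in cfg}
    after = {n: set() for n in cfg}
    work = list(cfg)
    while work:
        n = work.pop()
        if n == entry:
            in_set = set(variables)  # nothing is initialized at entry
        else:
            in_set = set().union(*(after[p] for p in preds[n]))  # may-analysis: union
        out_set = set(in_set)
        if n in assigns:
            target, used = assigns[n]
            if any(u in in_set for u in used):
                out_set.add(target)
            else:
                out_set.discard(target)
        before[n] = in_set
        if out_set != after[n]:
            after[n] = out_set
            work.extend(cfg[n])  # successors must be re-examined
    return before

# Hypothetical CFG (an assumption of this sketch).
CFG = {
    "start": ["x = 3"], "x = 3": ["if"], "if": ["y = x", "y = w"],
    "y = x": ["printf(y)"], "y = w": ["w = 8"], "w = 8": ["printf(y)"],
    "printf(y)": [],
}
ASSIGNS = {"x = 3": ("x", []), "y = x": ("y", ["x"]),
           "y = w": ("y", ["w"]), "w = 8": ("w", [])}

before = possibly_uninit(CFG, ASSIGNS, "start", ["w", "x", "y"])
```

With this CFG, `y` is still possibly uninitialized at `printf(y)` because of the branch through `y = w`. Note that `y = w` is not a gen/kill function: its effect on `y` depends on whether `w` is in the incoming set.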

Precise Intraprocedural Analysis start n

( ) ] ( start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

Precise Interprocedural Analysis ret start n ( ) [Sharir & Pnueli 81]

Representing Dataflow Functions b c Identity Function a b c Constant Function

Representing Dataflow Functions b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function

x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

Composing Dataflow Functions b c a b c a a b c
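The graph representation of dataflow functions and their composition can be sketched as follows. The encoding assumed here is the usual one for distributive functions over subsets of a finite set D: a relation on D ∪ {0} with an edge 0 → 0, an edge 0 → y for each y in f(∅), and an edge x → y when y is in f({x}) but not in f(∅). The function names and the two example functions are hypothetical.

```python
from itertools import combinations

ZERO = "0"

def represent(f, domain):
    """Encode a distributive function f on subsets of `domain` as a relation."""
    base = f(frozenset())
    rel = {(ZERO, ZERO)} | {(ZERO, y) for y in base}
    for x in domain:
        rel |= {(x, y) for y in f(frozenset([x])) if y not in base}
    return rel

def apply_rel(rel, facts):
    """Recover f(facts) from the relation."""
    sources = set(facts) | {ZERO}
    return {y for (x, y) in rel if x in sources and y != ZERO}

def compose(first, second):
    """Relational composition: apply `first`, then `second`."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}

DOMAIN = ["a", "b", "c"]

def gen_kill(s):
    # Gen/kill function: kill a, generate b.
    return (set(s) - {"a"}) | {"b"}

def copy_a_to_b(s):
    # Non-gen/kill function for "b = a": b holds afterwards iff a held before.
    return (set(s) - {"b"}) | ({"b"} if "a" in s else set())

r1 = represent(gen_kill, DOMAIN)
r2 = represent(copy_a_to_b, DOMAIN)
r12 = compose(r1, r2)

subsets = [set(c) for k in range(len(DOMAIN) + 1) for c in combinations(DOMAIN, k)]
agrees = all(apply_rel(r12, s) == copy_a_to_b(gen_kill(s)) for s in subsets)
```

Composing the two relations gives the same answers as composing the two functions on every subset of the domain, which is what lets paths in the exploded graph stand in for function composition.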

( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3 Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

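Off Limits! matched → matched matched | (_i matched )_i for 1 ≤ i ≤ CallSites | edge | ε [Figure: stack of pending call sites; a path whose return edge ) does not match the pending call edge ( is off limits]

Off Limits! unbalLeft → matched unbalLeft | (_i unbalLeft for 1 ≤ i ≤ CallSites | ε [Figure: stack of pending call sites; unmatched call edges ( may remain on the stack]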
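For a single, given path, membership in L(matched) or L(unbalLeft) can be checked with an explicit stack of pending call sites. The Python sketch below does only that; it is not the reachability algorithm itself, and the label encoding and the name `classify_path` are assumptions of this example.

```python
def classify_path(labels):
    """labels: sequence of ("call", i), ("ret", i), or ("edge", None).
    Returns "matched" if every call is closed by its own return,
    "unbalLeft" if some calls are still pending at the end,
    and "invalid" if a return does not match the most recent pending call."""
    stack = []
    for kind, site in labels:
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            if not stack or stack.pop() != site:
                return "invalid"
        # plain intraprocedural edges do not touch the stack
    return "matched" if not stack else "unbalLeft"

ok = classify_path([("call", 1), ("edge", None), ("ret", 1)])
wrong_site = classify_path([("call", 1), ("edge", None), ("ret", 2)])
pending = classify_path([("call", 1), ("call", 2), ("ret", 2)])
stray_return = classify_path([("ret", 1)])
```

Every matched path is also in L(unbalLeft); the function reports the more specific answer.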

Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from ⟨start_main, 0⟩ to ⟨n, d⟩

Asymptotic Running Time [Reps, Horwitz, & Sagiv 95] CFL-reachability Exploded control-flow graph: ND nodes Running time: O(N^3 D^3) Exploded control-flow graph Special structure Running time: O(E D^3) Typically: E ≈ N, hence O(E D^3) ≈ O(N D^3) "Gen/kill" problems: O(ED)

Why Bother? "We're only interested in million-line programs" Know thy enemy! "Any" algorithm must do these operations Avoid pitfalls (e.g., claiming O(N^2) algorithm) The essence of "context sensitivity" Special cases "Gen/kill" problems: O(ED) Compression techniques Basic blocks SSA form, sparse evaluation graphs Demand algorithms

Relationship to Other Analysis Paradigms Dataflow analysis reachability versus equation solving Deduction Set constraints

The Need for Pointer Analysis int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (*q < 11) { *p = (*f)(*p,*q); *q = (*f)(*q,1); } printf("%d\n",*p); printf("%d\n",*q); } int add(int x, int y) { return x + y; }

The Need for Pointer Analysis (pointers resolved) int main() { int sum = 0; int i = 1; int *p = &sum; int *q = &i; int (*f)(int,int) = add; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf("%d\n",sum); printf("%d\n",i); } int add(int x, int y) { return x + y; }

Flow-Sensitive Points-To Analysis q p = &q; p q p r1 r2 q p r1 r2 q p = q; r1 r2 q s1 s2 s3 p r1 r2 q s1 s2 s3 p p = *q; p s1 s2 q r1 r2 p s1 s2 q r1 r2 *p = q;

Flow-Sensitive → Flow-Insensitive [Figure: statements 1 to 5 between start main and exit main, shown once with control flow and once without]

Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97] p = &q; p q p r1 r2 q p = q; r1 r2 q s1 s2 s3 p p = *q; p s1 s2 q r1 r2 *p = q;

Flow-Insensitive Points-To Analysis a = &e; b = a; c = &f; *b = c; d = *a; e b c f d

Flow-Insensitive Points-To Analysis Andersen [Thesis 94] Formulated using set constraints Cubic-time algorithm Shapiro & Horwitz (1995; [POPL 97]) Re-formulated as a graph-grammar problem Reps (1995; [unpublished]) Re-formulated as a Horn-clause program Melski (1996; see [Reps, IST98]) Re-formulated via CFL-reachability

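CFL-Reachability via Dynamic Programming Grammar: A → B C Graph: if a B-edge is followed by a C-edge, add an A-edge from the source of the B-edge to the target of the C-edge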

CFL-Reachability = Chain Programs Grammar: A → B C Graph: a B-edge from x to y and a C-edge from y to z give an A-edge from x to z Chain rule: a(X,Z) :- b(X,Y), c(Y,Z).

Base Facts for Points-To Analysis p = &q; assignAddr(p,q). p = q; assign(p,q). p = *q; assignStar(p,q). *p = q; starAssign(p,q).

Rules for Points-To Analysis (I) p = &q; p q pointsTo(P,Q) :- assignAddr(P,Q). p = q; p r1 r2 q pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).

Rules for Points-To Analysis (II) p = *q; r1 r2 q s1 s2 s3 p pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). *p = q; p s1 s2 q r1 r2 pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S).
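The four rules can be run as a naive fixpoint computation. The Python sketch below applies them to the five-statement example from the earlier slide (a = &e; b = a; c = &f; *b = c; d = *a). It is a plain iterate-until-stable loop written for illustration, not the cubic-time algorithm discussed in the talk; `points_to` is a hypothetical name.

```python
def points_to(assign_addr, assign, assign_star, star_assign):
    """Each argument is a list of (p, q) pairs taken from the base facts.
    Returns the set of derived pointsTo(p, q) pairs."""
    pts = set(assign_addr)  # pointsTo(P,Q) :- assignAddr(P,Q).
    changed = True
    while changed:
        changed = False
        new = set()
        # pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
        for p, q in assign:
            new |= {(p, r) for (q2, r) in pts if q2 == q}
        # pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
        for p, q in assign_star:
            for q2, r in pts:
                if q2 == q:
                    new |= {(p, s) for (r2, s) in pts if r2 == r}
        # pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
        for p, q in star_assign:
            for p2, r in pts:
                if p2 == p:
                    new |= {(r, s) for (q2, s) in pts if q2 == q}
        if not new <= pts:
            pts |= new
            changed = True
    return pts

result = points_to(
    assign_addr=[("a", "e"), ("c", "f")],  # a = &e;  c = &f;
    assign=[("b", "a")],                   # b = a;
    assign_star=[("d", "a")],              # d = *a;
    star_assign=[("b", "c")],              # *b = c;
)
```

Because the analysis is flow-insensitive, the order of the five statements does not matter: `*b = c` makes `e` point to `f`, and `d = *a` then picks up `f`.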

Rules for Points-To Analysis (II) p = *q; r1 r2 q s1 s2 s3 p pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). *p = q; p s1 s2 q r1 r2 pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S). pointsTo(R,S) :- pointsTo(P,R),starAssign(P,Q),pointsTo(Q,S).

Creating a Chain Program *p = q; p s1 s2 q r1 r2 pointsTo(R,S) :- starAssign(P,Q),pointsTo(P,R),pointsTo(Q,S). pointsTo(R,S) :- pointsTo(P,R),starAssign(P,Q),pointsTo(Q,S). pointsTo(R,S) :- pointsTo(R,P),starAssign(P,Q),pointsTo(Q,S). pointsTo(R,P) :- pointsTo(P,R).

Base Facts for Points-To Analysis p = &q; assignAddr(p,q). assignAddr(q,p). p = q; assign(p,q). assign(q,p). p = *q; assignStar(p,q). assignStar(q,p). *p = q; starAssign(p,q). starAssign(q,p).

Creating a Chain Program pointsTo(P,Q) :- assignAddr(P,Q). pointsTo(Q,P) :- assignAddr(Q,P). pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). pointsTo(R,P) :- pointsTo(R,Q), assign(Q,P). pointsTo(P,S) :- assignStar(P,Q),pointsTo(Q,R),pointsTo(R,S). pointsTo(S,P) :- pointsTo(S,R),pointsTo(R,Q),assignStar(Q,P). pointsTo(R,S) :- pointsTo(R,P),starAssign(P,Q),pointsTo(Q,S). pointsTo(S,R) :- pointsTo(S,Q),starAssign(Q,P),pointsTo(P,R).

. . . and now to CFL-Reachability pointsTo → assign pointsTo; pointsTo → assignStar pointsTo pointsTo; pointsTo → assignAddr; pointsTo → pointsTo starAssign pointsTo; pointsTo → pointsTo pointsTo assignStar; pointsTo → pointsTo assign

Points-To Analysis as CFL-Reachability: Consequences Points-to analysis solvable in time cubic in the number of variables Known previously [Andersen 94] Demand algorithms: What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

Relationship to Other Analysis Paradigms Dataflow analysis reachability versus equation solving Deduction Set constraints

Structure-Transmitted Dependences [Reps1995] McCarthy’s equations: car(cons(x,y)) = x cdr(cons(x,y)) = y w = cons(x,y); v = car(w); v w y x

Set Constraints w = cons(x,y); v = car(w); McCarthy’s Equations Revisited Semantics of Set Constraints

CFL-Reachability versus Set Constraints Lazy languages: CFL-reachability is more natural car(cons(X,Y)) = X Strict languages: Set constraints are more natural car(cons(X,Y)) = X, provided I(Y) ≠ ∅ But . . . SC and CFL-reachability are equivalent! [Melski & Reps 97]

Solving Set Constraints W is “inhabited” X is “inhabited” Y is “inhabited” W is “inhabited” Y is “inhabited” X is “inhabited”

Simulating “Inhabited” W

Simulating “Inhabited” X Y W inhab

Simulating "Provided I(Y) ≠ ∅" [Figure: nodes X, Y, W, V with an inhab edge; the edge is enabled provided I(Y) ≠ ∅]

SC = CFL-Reachability: Consequences Demand algorithm for SC SC is log-space complete for PTIME Limitations on ability to parallelize algorithms for solving set-constraint problems

Themes Harnessing CFL-reachability Relationship to other analysis paradigms Exhaustive alg. → Demand alg. Understanding complexity Linear . . . cubic . . . undecidable Beyond CFL-reachability

Exhaustive Versus Demand Analysis Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest

Exhaustive Versus Demand Analysis Does a given fact hold at a given point? Which facts hold at a given point? At which points does a given fact hold? Demand analysis via CFL-reachability single-source/single-target CFL-reachability single-source/multi-target CFL-reachability multi-source/single-target CFL-reachability

All “appropriate” demands x y a b YES! ( ) start p(a,b) “Semi-exhaustive”: All “appropriate” demands start main Might b be uninitialized here? Might y be uninitialized here? if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) NO! exit main exit p

Experimental Results [Horwitz , Reps, & Sagiv 1995] 53 C programs (200-6,700 lines) For a single fact of interest: demand always better than exhaustive All “appropriate” demands beats exhaustive when percentage of “yes” answers is high Live variables Truly live variables Constant predicates . . .

A Related Result [Sagiv, Reps, & Horwitz 1996] [Uses a generalized analysis technique] 38 C programs (300-6,000 lines) copy-constant propagation linear-constant propagation All “appropriate” demands always beats exhaustive factor of 1.14 to about 6

Exhaustive Versus Demand Analysis Demand algorithms for Interprocedural dataflow analysis Set constraints Points-to analysis

Demand Analysis and LP Queries (I) Flow-insensitive points-to analysis Does variable p point to q? Issue query: ?- pointsTo(p, q). Solve single-source/single-target L(pointsTo)-reachability problem What does variable p point to? Issue query: ?- pointsTo(p, Q). Solve single-source L(pointsTo)-reachability problem What variables point to q? Issue query: ?- pointsTo(P, q). Solve single-target L(pointsTo)-reachability problem

Demand Analysis and LP Queries (II) Flow-sensitive analysis Does a given fact f hold at a given point p? ?- dfFact(p, f). Which facts hold at a given point p? ?- dfFact(p, F). At which points does a given fact f hold? ?- dfFact(P, f). E.g., flow-sensitive points-to analysis ?- dfFact(p, pointsTo(x, Y)). ?- dfFact(P, pointsTo(x, y)). etc.

Themes: Harnessing CFL-reachability. Relationship to other analysis paradigms. Exhaustive alg. vs. demand alg. Understanding complexity: linear . . . cubic . . . undecidable. Beyond CFL-reachability.

Interprocedural Backward Slice. [Figure: dependence graph with vertices Enter main, Call p, Call p, and Enter p; edge labels ( ) and [ ].]

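[Figure: exploded supergraph for main (start main, x = 3, p(x,y), return from p, printf(y), exit main) and p (start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p), over the variables x, y, a, b, with edge labels ( ) and [ ]. Annotation at printf(y): “y may be uninitialized here”.]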

Structure-Transmitted Dependences [Reps 1995]. McCarthy’s equations: car(cons(x,y)) = x; cdr(cons(x,y)) = y. Example: w = cons(x,y); v = car(w); [Figure: dependence graph over x, y, w, and v.]
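
McCarthy's equations can be checked directly on a tiny pair type. The following is a minimal sketch, assuming integer-valued cells; the type `Cell` and the field names `hd` and `tl` are illustrative choices (the names echo the edge labels on the later slides) and are not taken from the deck.

```c
/* A cons cell with two fields; hd and tl are illustrative field names. */
typedef struct {
    int hd;
    int tl;
} Cell;

/* cons(x, y): x flows into the hd field, y into the tl field. */
Cell cons(int x, int y) {
    Cell w;
    w.hd = x;
    w.tl = y;
    return w;
}

/* car(w) reads back the hd field, so car(cons(x, y)) == x. */
int car(Cell w) { return w.hd; }

/* cdr(w) reads back the tl field, so cdr(cons(x, y)) == y. */
int cdr(Cell w) { return w.tl; }
```

With `w = cons(x, y)` and `v = car(w)`, the value of `v` is determined by `x` alone: changing `y` changes `w` but never `v`. A dependence analysis that ignores which field is written and read would report that `v` depends on both.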

Dependences + Matched Paths? [Figure: dependence graph with vertices Enter main, x, y, w=cons(x,y), Call p, Call p, Enter p, w, and v = car(w); edge labels $hd$, $hd^{-1}$, $tl$, ( ), and [ ].]

Undecidable! [Reps, TOPLAS 00] Interleaved Parentheses! [Figure: a path whose $hd$ / $hd^{-1}$ labels interleave with its ( ) labels.]

Themes: Harnessing CFL-reachability. Relationship to other analysis paradigms. Exhaustive alg. vs. demand alg. Understanding complexity: linear . . . cubic . . . undecidable. Beyond CFL-reachability.

CFL-Reachability via Dynamic Programming. Grammar: $A \to B\,C$. Graph: a B-edge followed by a C-edge induces an A-edge between the same endpoints.
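
The dynamic-programming idea can be sketched as a naive fixed-point loop: keep adding an A-edge wherever a production A -> B C matches a B-edge followed by a C-edge. This is an illustration only, not the worklist algorithm that achieves the cubic bound. The types `LGraph` and `BinProd`, the array limits, and the sample symbols `OPEN`, `CLOSE`, `M`, `T` are all assumptions made for this sketch.

```c
#include <stdbool.h>

#define MAX_N 8   /* max vertices (kept small for illustration) */
#define MAX_L 8   /* max grammar symbols */

/* Hypothetical symbols for a matched-parenthesis grammar. */
enum { OPEN, CLOSE, M, T };

/* A production of the form lhs -> rhs1 rhs2. */
typedef struct { int lhs, rhs1, rhs2; } BinProd;

typedef struct {
    int n;                            /* number of vertices */
    bool edge[MAX_L][MAX_N][MAX_N];   /* edge[X][u][v]: X-labeled edge u -> v */
} LGraph;

/* Add induced edges until no production can add another one. */
void cfl_close(LGraph *g, const BinProd *prods, int nprods) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < nprods; p++) {
            int a = prods[p].lhs, b = prods[p].rhs1, c = prods[p].rhs2;
            for (int u = 0; u < g->n; u++) {
                for (int w = 0; w < g->n; w++) {
                    if (!g->edge[b][u][w]) continue;
                    for (int v = 0; v < g->n; v++) {
                        if (g->edge[c][w][v] && !g->edge[a][u][v]) {
                            g->edge[a][u][v] = true;   /* new a-edge u -> v */
                            changed = true;
                        }
                    }
                }
            }
        }
    }
}
```

For example, with the productions M -> OPEN CLOSE, T -> OPEN M, M -> T CLOSE, and M -> M M, and a path whose edges are labeled `( ( ) )`, the closure adds an M-edge spanning the inner pair and another spanning the whole path. After `cfl_close` returns, a query "is v M-reachable from u?" is a lookup of `edge[M][u][v]`.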

Beyond CFL-Reachability: Composition of Linear Functions. $\lambda x.\,3x+5$, then $\lambda x.\,2x+1$, gives $\lambda x.\,6x+11$: $(\lambda x.\,2x+1) \circ (\lambda x.\,3x+5) = \lambda x.\,6x+11$
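
Because a linear function is determined by two coefficients, composition stays inside the same representation. Here is a minimal sketch; the type `LinFn` and the function names are hypothetical, and integer coefficients are assumed.

```c
/* A linear function x -> a*x + b, stored by its two coefficients. */
typedef struct {
    long a;
    long b;
} LinFn;

/* Apply f to x. */
long lin_apply(LinFn f, long x) {
    return f.a * x + f.b;
}

/* Compose: (f o g)(x) = f(g(x)) = f.a * (g.a * x + g.b) + f.b. */
LinFn lin_compose(LinFn f, LinFn g) {
    LinFn h;
    h.a = f.a * g.a;
    h.b = f.a * g.b + f.b;
    return h;
}
```

Composing `{2, 1}` with `{3, 5}` yields the coefficients `{6, 11}`, matching the slide. The summary of a path is again a pair of numbers, so summaries can be combined in constant time regardless of path length.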

Beyond CFL-Reachability: Composition of Linear Functions Interprocedural constant propagation [Sagiv, Reps, & Horwitz TCS 96] Interprocedural path profiling The number of path fragments contributed by a procedure is a function [Melski & Reps CC 99]

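Ball-Larus Intraprocedural Path Profiling: Counting paths in the CFG. [Figure: vertex v with successors w1, w2, . . ., wk leading to Exit.] $\mathit{NumPathsToExit}(\mathit{Exit}) = 1$; $\mathit{NumPathsToExit}(v) = \sum_{w \in \mathit{succ}(v)} \mathit{NumPathsToExit}(w)$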
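
The recurrence can be evaluated with a memoized traversal. The sketch below assumes the control-flow graph has already been made acyclic and is small enough for fixed-size arrays; the `Dag` type and the function names are illustrative and are not part of the Ball-Larus presentation.

```c
#include <assert.h>

#define MAX_V 16   /* max vertices (kept small for illustration) */

typedef struct {
    int n;                    /* number of vertices */
    int exit_v;               /* index of the Exit vertex */
    int nsucc[MAX_V];         /* nsucc[v]: number of successors of v */
    int succ[MAX_V][MAX_V];   /* succ[v][k]: k-th successor of v */
} Dag;

/* NumPathsToExit(v), memoized in memo[] (-1 means "not yet computed"). */
static long count_from(const Dag *g, int v, long memo[]) {
    if (memo[v] >= 0) return memo[v];
    long total = 0;
    if (v == g->exit_v) {
        total = 1;   /* NumPathsToExit(Exit) = 1 */
    } else {
        for (int k = 0; k < g->nsucc[v]; k++)
            total += count_from(g, g->succ[v][k], memo);
    }
    memo[v] = total;
    return total;
}

/* Number of distinct paths from v to Exit in an acyclic graph. */
long num_paths_to_exit(const Dag *g, int v) {
    long memo[MAX_V];
    assert(g->n <= MAX_V && v >= 0 && v < g->n);
    for (int i = 0; i < MAX_V; i++) memo[i] = -1;
    return count_from(g, v, memo);
}
```

On a diamond-shaped graph (one branch, two arms, one join that is also Exit) the count from the branch vertex is 2. Each vertex is evaluated once, so the work is proportional to the number of edges.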

Melski-Reps Interprocedural Path Profiling: $\psi_{Exit(P)} = \lambda x.\,x$ (Exit vertex); $\psi_{GExit(P)} = \lambda x.\,1$ (GExit vertex); $\psi_c = \psi_{Exit(Q)} \circ \psi_r$ (call vertex to Q with return vertex r); $\psi_v = \sum_{w \in succ(v)} \psi_w$ (otherwise). Sharir-Pnueli Interprocedural Dataflow Analysis: $\phi_{Exit(P)} = \lambda x.\,x$ (Exit vertex); $\phi_c = \phi_{Exit(Q)} \circ \phi_r$ (call vertex to Q with return vertex r); $\phi_v = \sqcap_{w \in succ(v)} \phi_w$ (otherwise).

Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)] Non-recursive HFSMs [Alur & Yannakakis 98] Ordinary FSMs T-reachability/circularity queries Recursive HFSMs Matched-parenthesis T-reachability/circularity Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity Single-entry/multi-exit [or multi-entry/single-exit] Deterministic, multi-entry/multi-exit

T-Cyclicity in Hierarchical Kripke Structures SN/SX SN/MX MN/SX MN/MX non-rec: O(|k|) non-rec: O(|k|) ? ? rec: O(|k|^3) rec: ? SN/SX SN/MX MN/SX MN/MX O(|k|) O(|k|) O(|k|) O(|k|^3) O(|k||t|) [lin rec] O(|k|) [det]

Recursive HFSMs: Data Complexity SN/SX SN/MX MN/SX MN/MX LTL non-rec: O(|k|) non-rec: O(|k|) ? ? rec: P-time rec: ? CTL O(|k|) bad ? bad CTL* O(|k|^2) [L2] bad ? bad

Recursive HFSMs: Data Complexity SN/SX SN/MX MN/SX MN/MX LTL O(|k|) O(|k|) O(|k|) O(|k|^3) O(|k||t|) [lin rec] O(|k|) [det] CTL O(|k|) bad O(|k|) bad CTL* O(|k|) bad O(|k|) bad Not Dual Problems!

CFL-Reachability: Scope of Applicability. Static analysis: slicing, DFA, structure-transmitted dep., points-to analysis. Verification: security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 82]; model-checking recursive HFSMs. Formal-language theory: CF-, 2DPDA-, 2NPDA-recognition; attribute-grammar analysis.

CFL-Reachability: Benefits Algorithms Exhaustive & demand Complexity Linear-time and cubic-time algorithms PTIME-completeness Variants that are undecidable Complementary to Equations Set constraints Types . . .

But . . . Model checking: huge graphs ($10^{100}$ reachable states); reachability/circularity queries; represent implicitly (OBDDs). Dataflow analysis: large graphs, e.g., Stmts × Vars (about $10^{11}$); CFL-reachability queries [Reps, Horwitz, Sagiv 95]; OBDDs blew up [Siff & Reps 95 (unpub.)] . . . yes, we tried the usual tricks . . .

Most Significant Contributions: 1987-2000 Asymptotically fastest algorithms Interprocedural slicing Interprocedural dataflow analysis Demand algorithms Interprocedural dataflow analysis [CC94,FSE95] All “appropriate” demands beats exhaustive Tool for slicing and browsing ANSI C Slices programs as large as 75,000 lines University research distribution Commercial product: CodeSurfer (GrammaTech, Inc.)

Most Significant Contributions: 1987-2000. Unifying conceptual model: [Kou 77], [Holley&Rosen 81], [Cooper&Kennedy 88], [Callahan 88], [Horwitz,Reps,&Binkley 88], . . . Identifies fundamental bottlenecks: cubic-time “barrier”; litmus test: quadratic-time algorithm?!; PTIME-complete ⇒ limits to parallelizability. Existence proofs for new algorithms: demand algorithm for set constraints; demand algorithm for points-to analysis.

References. Papers by Reps and collaborators: http://www.cs.wisc.edu/~reps/ CFL-reachability: Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

References. Slicing, chopping, etc.: Horwitz, Reps, & Binkley, TOPLAS 90; Reps, Horwitz, Sagiv, & Rosay, FSE 94; Reps & Rosay, FSE 95. Dataflow analysis: Reps, Horwitz, & Sagiv, POPL 95; Horwitz, Reps, & Sagiv, FSE 95, TR-1283. Structure dependences; set constraints: Reps, PEPM 95; Melski & Reps, Theor. Comp. Sci. 00.

References. Complexity: Undecidability: Reps, TOPLAS 00?; PTIME-completeness: Reps, Acta Inf. 96. Verification: Dolev, Even, & Karp, Inf & Control 82; Benedikt, Godefroid, & Reps, In prep. Beyond CFL-reachability: Sagiv, Reps, Horwitz, Theor. Comp. Sci 96; Melski & Reps, CC 99, TR-1382.

Automatic Differentiation

Automatic Differentiation double F(double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans; } double delta = . . .; /* small constant */ double F’(double x) { return (F(x+delta) - F(x)) / delta; }

Automatic Differentiation double F (double x) { int i; double ans = 1.0; for(i = 1; i <= n; i++) { ans = ans * f[i](x); } return ans’; }

Automatic Differentiation double F’(double x) { int i; double ans’ = 0.0; double ans = 1.0; for(i = 1; i <= n; i++) { ans’ = ans * f’[i](x) + ans’ * f[i](x); ans = ans * f[i](x); } return ans’; }
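
The slide's primed names are not legal C identifiers, so a compilable version needs renaming. The sketch below is one way to do that under stated assumptions: the factors are taken to be f_i(x) = x + i with N = 3, which the deck does not specify, and the names `factor`, `factor_deriv`, `F_fd`, and `F_ad` are hypothetical.

```c
#define N 3   /* number of factors (assumed for this example) */

/* Hypothetical factor f_i(x) = x + i. */
static double factor(int i, double x) { return x + (double)i; }

/* Derivative of the hypothetical factor: d/dx (x + i) = 1. */
static double factor_deriv(int i, double x) { (void)i; (void)x; return 1.0; }

/* F(x) = f_1(x) * f_2(x) * ... * f_N(x). */
double F(double x) {
    double ans = 1.0;
    for (int i = 1; i <= N; i++)
        ans = ans * factor(i, x);
    return ans;
}

/* Finite-difference approximation of F'(x); accuracy depends on delta. */
double F_fd(double x, double delta) {
    return (F(x + delta) - F(x)) / delta;
}

/* Derivative carried alongside the value via the product rule. */
double F_ad(double x) {
    double ans_d = 0.0;   /* plays the role of ans' on the slide */
    double ans = 1.0;
    for (int i = 1; i <= N; i++) {
        /* Update the derivative first: it needs the old value of ans. */
        ans_d = ans * factor_deriv(i, x) + ans_d * factor(i, x);
        ans = ans * factor(i, x);
    }
    return ans_d;
}
```

With these factors, F(x) = (x+1)(x+2)(x+3), so F(0) = 6 and the exact derivative at 0 is 11. `F_ad(0.0)` reproduces that value, while `F_fd(0.0, delta)` only approaches it as `delta` shrinks.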

Automatic Differentiation. [Figure: dependences between inputs x1, x2, . . ., xi, xi+1, . . ., xm and outputs y1, y2, . . ., yj, yj+1, . . ., yn, labeled “Program Chopping”.]