Program Analysis via Graph Reachability

Thomas Reps, University of Wisconsin
PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000

PLDI 00 Registration Form
Tutorial (morning): …………… $ ____
Tutorial (afternoon): ………….. $ ____
Tutorial (evening): ……………. $ – 0 –

Applications
Program optimization
Program understanding and software reengineering
Security: information flow
Verification: model checking; security of crypto-based protocols for distributed systems

Figure: timeline, 1987 to 1998: Slicing & Applications, CFL Reachability, Dataflow Analysis, Demand Algorithms, Structure-Transmitted Dependences, Set Constraints

. . . As Well As . . .
Flow-insensitive points-to analysis
Complexity results: linear, cubic, undecidable variants; PTIME-completeness
Model checking of recursive hierarchical finite-state machines: “infinite”-state systems; linear-time and cubic-time algorithms

. . . And Also
Analysis of attribute grammars
Security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]
Formal-language problems:
CFL-recognition (given G and σ, is σ ∈ L(G)?)
2DPDA- and 2NPDA-simulation (given M and σ, is σ ∈ L(M)?)
String-matching problems

Unifying Conceptual Model for Dataflow-Analysis Literature
Linear-time gen-kill [Hecht 76], [Kou 77]
Path-constrained DFA [Holley & Rosen 81]
Linear-time GMOD [Cooper & Kennedy 88]
Flow-sensitive MOD [Callahan 88]
Linear-time interprocedural gen-kill [Knoop & Steffen 93]
Linear-time bidirectional gen-kill [Dhamdhere 94]
Relationship to interprocedural DFA [Sharir & Pnueli 81], [Knoop & Steffen 92]

Collaborators: Susan Horwitz, Mooly Sagiv, Genevieve Rosay, David Melski, David Binkley, Michael Benedikt, Patrice Godefroid

Themes
Harnessing CFL-reachability
Relationship to other analysis paradigms
Exhaustive alg. → demand alg.
Understanding complexity: linear, cubic, undecidable
Beyond CFL-reachability

Backward slice with respect to “printf(“%d\n”,i)”
    #include <stdio.h>

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n",sum);
        printf("%d\n",i);
    }

Slice Extraction
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n",i);
    }
The backward slice with respect to printf("%d\n",i), extracted as a program.

Forward slice with respect to “sum = 0”
In the sum program above, this slice contains int sum = 0;, sum = sum + i;, and printf("%d\n",sum);. No statement that computes or prints i depends on sum, so none of them is in the slice.

What Are Slices Useful For?
Understanding programs: what is affected by what?
Restructuring programs: isolation of separate “computational threads”
Program specialization and reuse: slices = specialized programs; only reuse needed slices
Program differencing: compare slices to identify changes
Testing: what new test cases would improve coverage? What regression tests must be rerun after a change?

Line-Character-Count Program
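    #include <stdio.h>

    typedef int BOOL;   /* assumed definitions */
    #define FALSE 0

    void line_char_count(FILE *f) {
        int lines = 0;
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
        scan_line(f, &eof_flag, &n);
        chars = n;
        while(eof_flag == FALSE){
            lines = lines + 1;
            scan_line(f, &eof_flag, &n);   /* read the next line; without it the loop never ends */
            chars = chars + n;
        }
        printf("lines = %d\n", lines);
        printf("chars = %d\n", chars);
    }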

Character-Count Program
    void char_count(FILE *f) {
        int chars;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line(FILE *f, BOOL *bptr, int *iptr);
        scan_line(f, &eof_flag, &n);
        chars = n;
        while(eof_flag == FALSE){
            scan_line(f, &eof_flag, &n);
            chars = chars + n;
        }
        printf("chars = %d\n", chars);
    }

Line-Count Program
    void line_count(FILE *f) {
        int lines = 0;
        BOOL eof_flag = FALSE;
        int n;
        extern void scan_line2(FILE *f, BOOL *bptr, int *iptr);
        scan_line2(f, &eof_flag, &n);
        while(eof_flag == FALSE){
            lines = lines + 1;
            scan_line2(f, &eof_flag, &n);
        }
        printf("lines = %d\n", lines);
    }

Specialization Via Slicing
Figure: wc -lc specialized by slicing into wc -c and wc -l
Not partial evaluation!
void line_count(FILE *f);

Control Flow Graph
Figure: control flow graph of the sum program, with nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); the T edge from while(i < 11) enters the loop body, and the F edge leads to printf(sum)

Flow Dependence Graph
Flow dependence p → q: the value of a variable assigned at p may be used at q.
Figure: flow dependence graph of the sum program over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)

Control Dependence Graph
Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similarly p →F q for false (F).
Figure: control dependence graph of the sum program, with T edges from Enter to the top-level statements and from while(i < 11) to the loop body (sum = sum + i, i = i + 1)

Program Dependence Graph (PDG)
Figure: PDG of the sum program, combining its control dependence and flow dependence edges over the nodes Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)

Program Dependence Graph (PDG)
    int main() {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n",sum);
        printf("%d\n",i);
    }
Opposite order of the first two statements, same PDG.

Backward Slice
Figure: PDG of the sum program; the backward slice with respect to printf(i) consists of the nodes that reach printf(i) along dependence edges

Slice Extraction
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n",i);
    }
Figure: PDG of the extracted slice, with nodes Enter, i = 1, while(i < 11), i = i + 1, printf(i)
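Computing this slice is ordinary graph reachability over PDG edges. The following is a minimal C sketch that assumes a hand-built PDG for the sum program. The node names (ENTER, SUM0, ...) and the functions build_pdg and slice are hypothetical. The edge list holds the control and flow dependences of the sum program, derived by hand for this sketch, and is stored as an adjacency matrix for brevity. Walking edges backward from printf(i) gives the extracted slice. Walking forward from sum = 0 gives the forward slice.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical node numbering for the PDG of the sum program. */
enum {
    ENTER,      /* Enter main          */
    SUM0,       /* sum = 0             */
    I1,         /* i = 1               */
    WHILE,      /* while (i < 11)      */
    SUM_ADD,    /* sum = sum + i       */
    I_INC,      /* i = i + 1           */
    PRINT_SUM,  /* printf("%d\n",sum)  */
    PRINT_I,    /* printf("%d\n",i)    */
    NUM_NODES
};

/* dep[p][q] is true when there is a dependence edge p -> q. */
static bool dep[NUM_NODES][NUM_NODES];

static void build_pdg(void) {
    static const int edges[][2] = {
        /* control dependences (all labeled T) */
        {ENTER, SUM0}, {ENTER, I1}, {ENTER, WHILE}, {ENTER, PRINT_SUM},
        {ENTER, PRINT_I}, {WHILE, SUM_ADD}, {WHILE, I_INC},
        /* flow dependences */
        {SUM0, SUM_ADD}, {SUM0, PRINT_SUM}, {SUM_ADD, SUM_ADD}, {SUM_ADD, PRINT_SUM},
        {I1, WHILE}, {I1, SUM_ADD}, {I1, I_INC}, {I1, PRINT_I},
        {I_INC, WHILE}, {I_INC, SUM_ADD}, {I_INC, I_INC}, {I_INC, PRINT_I},
    };
    memset(dep, 0, sizeof dep);
    for (size_t k = 0; k < sizeof edges / sizeof edges[0]; k++)
        dep[edges[k][0]][edges[k][1]] = true;
}

/* Marks the nodes that reach `crit` (backward slice) or that are reached
   from `crit` (forward slice), using a worklist. */
static void slice(int crit, bool backward, bool in_slice[NUM_NODES]) {
    int worklist[NUM_NODES], top = 0;
    memset(in_slice, 0, NUM_NODES * sizeof in_slice[0]);
    in_slice[crit] = true;
    worklist[top++] = crit;
    while (top > 0) {
        int n = worklist[--top];
        for (int m = 0; m < NUM_NODES; m++) {
            bool edge = backward ? dep[m][n] : dep[n][m];
            if (edge && !in_slice[m]) {
                in_slice[m] = true;          /* each node is pushed at most once */
                assert(top < NUM_NODES);
                worklist[top++] = m;
            }
        }
    }
}
```

Each node is marked before it is pushed, so the traversal visits every node and edge once. With adjacency lists instead of a matrix, it runs in time linear in the size of the PDG.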

CodeSurfer

Browsing a Dependence Graph
Pretend this is your favorite browser.
What does clicking on a link do?
You get a new page,
or you move to an internal tag.

Interprocedural Slice
    #include <stdio.h>

    int add(int x, int y);

    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum,i);
            i = add(i,1);
        }
        printf("%d\n",sum);
        printf("%d\n",i);
    }

    int add(int x, int y) {
        return x + y;
    }
Backward slice with respect to printf("%d\n",i)

Interprocedural Slice
For the sum program with add (above), the backward slice with respect to printf("%d\n",i) computed by Weiser's slicing algorithm [TSE 84] includes superfluous components. The algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90] leaves them out.

System Dependence Graph (SDG)
Figure: SDG schema: Enter main with two Call p nodes, each linked to Enter p

SDG for the Sum Program
Figure: SDG of the sum program. In main: Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), and two Call add nodes. The first call has actual parameters xin = sum, yin = i, sum = xout; the second has xin = i, yin = 1, i = xout. In add: Enter add, x = xin, y = yin, x = x + y, xout = x.

Interprocedural Backward Slice

Interprocedural Backward Slice (2)

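Figure: SDG schema (Enter main, two Call p sites, Enter p); the edges into and out of one call site are labeled ( and ), those of the other call site [ and ]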

Matched-Parenthesis Path
Figure: an SDG path and the parenthesis labels on its interprocedural edges

Slice Extraction
Figure: the slice extracted from the SDG schema (Enter main, Call p, Enter p)

Slice of the Sum Program
Figure: SDG of the slice: Enter main, i = 1, while(i < 11), printf(i), one Call add with xin = i, yin = 1, i = xout; Enter add, x = xin, y = yin, x = x + y, xout = x

CFL-Reachability [Yannakakis 90]
G: graph (N nodes, E edges)
L: a context-free language
L-path from s to t iff the labels along the path, read in order, spell a word in L
Running time: O(N³)

Interprocedural Slicing via CFL-Reachability
Graph: system dependence graph
L: L(matched) [roughly]
Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
CFL-reachability on the system dependence graph (N nodes, E edges): O(N³)
Exploiting the special structure of the system dependence graph: O(E + CallSites × MaxParams³)

Ordinary Graph Reachability
matched → matched matched | ( matched ) | [ matched ] | e
Figure: a graph from s to t with edges labeled (, ), [, ], and e. CFL-reachability considers only s-t paths whose labels spell a word in L(matched); ordinary graph reachability ignores the labels.

CFL-Reachability via Dynamic Programming
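Grammar production A → B C as a graph rule: whenever a B-edge runs from x to y and a C-edge runs from y to z, add an A-edge from x to z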

Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )
Is “(a + b) * c” ∈ L(exp)?
Figure: chain graph from s to t whose edge labels spell ( a + b ) * c

Degenerate Case: CFL-Recognition
exp → id | exp + exp | exp * exp | ( exp )
Is “a + b) * c +” ∈ L(exp)?
Figure: chain graph from s to t whose edge labels spell a + b ) * c +

CYK: Context-Free Recognition
M → M M | ( M ) | [ M ] | ( ) | [ ]
σ = “( [ ] ) [ ]”
Is σ ∈ L(M)?

CYK: Context-Free Recognition
M → M M | LPM ) | LBM ] | ( ) | [ ]
LPM → ( M
LBM → [ M
(normalized from M → M M | ( M ) | [ M ] | ( ) | [ ])

Is “( [ ] ) [ ]” ∈ L(M)?
Figure: CYK table for “( [ ] ) [ ]”, indexed by start position and length. The length-1 entries hold the terminals; M → [ ] puts {M} at both occurrences of [ ], and LPM → ( M puts {LPM} at the start of “( [ ]”.
Figure: a later stage of the same table, in which M → M M combines adjacent entries; the question is whether the full-length entry at the first position contains M.

CYK: Graphs vs. Tables
Is “( [ ] ) [ ]” ∈ L(M)?
Figure: the CYK table entries drawn as nonterminal-labeled edges (M, LPM, M, M, M) over the chain graph from s to t that spells ( [ ] ) [ ]
M → M M | LPM ) | LBM ] | ( ) | [ ]
LPM → ( M
LBM → [ M
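The graph view of CYK can be run directly as CFL-reachability. The following is a minimal C sketch of the worklist form of the dynamic-programming rule "a B-edge i→j and a C-edge j→k yield an A-edge i→k". It is hard-wired to the normalized grammar for M above, which has only binary productions, so no other rule forms are needed. The names cfl_add_edge, cfl_solve, and load_chain are hypothetical. Each derived edge is processed once, and processing an edge scans all nodes, so for a fixed grammar the sketch runs in O(N³) time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Terminals (, ), [, ] followed by the nonterminals of the normalized grammar. */
enum { LP, RP, LB, RB, M, LPM, LBM, NUM_SYMS };

/* Binary productions A -> B C, stored as {A, B, C}. */
static const int prods[][3] = {
    {M, M, M}, {M, LPM, RP}, {M, LBM, RB}, {M, LP, RP}, {M, LB, RB},
    {LPM, LP, M}, {LBM, LB, M},
};
#define NUM_PRODS (sizeof prods / sizeof prods[0])

static bool has_edge[NUM_SYMS][MAX_NODES][MAX_NODES]; /* has_edge[X][i][j]: X-edge i -> j */
static int work[NUM_SYMS * MAX_NODES * MAX_NODES][3];  /* pending (X, i, j) triples */
static int top, num_nodes;

static void cfl_init(int n) {
    assert(n <= MAX_NODES);
    num_nodes = n;
    top = 0;
    memset(has_edge, 0, sizeof has_edge);
}

/* Adds an X-edge i -> j if it is new, and schedules it for processing. */
static void cfl_add_edge(int x, int i, int j) {
    if (has_edge[x][i][j]) return;
    has_edge[x][i][j] = true;
    work[top][0] = x; work[top][1] = i; work[top][2] = j;
    top++;
}

/* Worklist closure: combine each new edge with existing edges on either side. */
static void cfl_solve(void) {
    while (top > 0) {
        top--;
        int x = work[top][0], i = work[top][1], j = work[top][2];
        for (size_t p = 0; p < NUM_PRODS; p++) {
            int a = prods[p][0], b = prods[p][1], c = prods[p][2];
            for (int k = 0; k < num_nodes; k++) {
                if (b == x && has_edge[c][j][k]) cfl_add_edge(a, i, k); /* i -x-> j -c-> k */
                if (c == x && has_edge[b][k][i]) cfl_add_edge(a, k, j); /* k -b-> i -x-> j */
            }
        }
    }
}

/* Builds the chain graph s = 0 -> 1 -> ... -> t = strlen(str) spelling str. */
static void load_chain(const char *str) {
    int n = (int)strlen(str);
    cfl_init(n + 1);
    for (int i = 0; i < n; i++) {
        int t = str[i] == '(' ? LP : str[i] == ')' ? RP : str[i] == '[' ? LB : RB;
        assert(t != RB || str[i] == ']');
        cfl_add_edge(t, i, i + 1);
    }
}
```

For example, load_chain("([])[]") followed by cfl_solve() derives an M-edge from node 0 to node 6, so “( [ ] ) [ ]” ∈ L(M). The interleaved string “( [ ) ]” yields no M-edge at all.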

Dynamic Transitive Closure ?!
Aiken et al.: set-constraint solvers, points-to analysis
Henglein et al.: type inference
But a CFL captures a non-transitive reachability relation [Valiant 75]

Program Chopping
Given source S and target T, what program points transmit effects from S to T?
Intersect the forward slice from S with the backward slice from T, right?

Non-Transitivity and Slicing
In the sum program with add (above):
Forward slice with respect to sum = 0: it reaches the body of add through the call sum = add(sum,i).
Backward slice with respect to printf("%d\n",i): it reaches the body of add through the call i = add(i,1).
Forward slice ∩ backward slice: the intersection therefore contains the body of add.
Chop with respect to sum = 0 and printf("%d\n",i): empty, because the value of i never depends on sum. Intersecting interprocedural slices is imprecise because the reachability relation they are built on is not transitive.

Figure: SDG of the sum program, highlighting a path from sum = 0 to printf(i). The path enters add at the first call site (label “(”) and leaves at the second (label “]”). Because the labels do not match, the path is not realizable.
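To see concretely why intersecting slices over-approximates, here is a small C sketch. It builds the SDG of the sum program by hand, with hypothetical node names such as XIN1 and X_ADD and an edge list derived by hand from the figure description. It ignores the ( and ] labels and intersects plain forward and backward reachability. The resulting naive chop contains sum = 0, the body of add, and printf(i), even though sum = 0 cannot affect printf(i). A precise chop must reject such unmatched paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical numbering of the SDG nodes of the sum program with add(). */
enum {
    ENTER_MAIN, SUM0, I1, WHILE,
    CALL1, XIN1, YIN1, SUM_XOUT,   /* sum = add(sum,i): xin = sum, yin = i, sum = xout */
    CALL2, XIN2, YIN2, I_XOUT,     /* i = add(i,1):     xin = i,   yin = 1, i = xout   */
    PRINT_SUM, PRINT_I,
    ENTER_ADD, X_XIN, Y_YIN, X_ADD, XOUT_X, /* x = xin, y = yin, x = x + y, xout = x */
    NUM_SDG
};

/* sdg[p][q]: an edge p -> q; the ( and ] labels on call/return edges are dropped. */
static bool sdg[NUM_SDG][NUM_SDG];

static void build_sdg(void) {
    static const int edges[][2] = {
        /* control dependences */
        {ENTER_MAIN, SUM0}, {ENTER_MAIN, I1}, {ENTER_MAIN, WHILE},
        {ENTER_MAIN, PRINT_SUM}, {ENTER_MAIN, PRINT_I},
        {WHILE, CALL1}, {WHILE, CALL2},
        {CALL1, XIN1}, {CALL1, YIN1}, {CALL1, SUM_XOUT},
        {CALL2, XIN2}, {CALL2, YIN2}, {CALL2, I_XOUT},
        {ENTER_ADD, X_XIN}, {ENTER_ADD, Y_YIN}, {ENTER_ADD, X_ADD}, {ENTER_ADD, XOUT_X},
        /* call edges */
        {CALL1, ENTER_ADD}, {CALL2, ENTER_ADD},
        /* flow dependences in main */
        {SUM0, XIN1}, {SUM0, PRINT_SUM}, {SUM_XOUT, XIN1}, {SUM_XOUT, PRINT_SUM},
        {I1, WHILE}, {I1, YIN1}, {I1, XIN2}, {I1, PRINT_I},
        {I_XOUT, WHILE}, {I_XOUT, YIN1}, {I_XOUT, XIN2}, {I_XOUT, PRINT_I},
        /* flow dependences in add */
        {X_XIN, X_ADD}, {Y_YIN, X_ADD}, {X_ADD, XOUT_X},
        /* parameter-in and parameter-out edges */
        {XIN1, X_XIN}, {YIN1, Y_YIN}, {XIN2, X_XIN}, {YIN2, Y_YIN},
        {XOUT_X, SUM_XOUT}, {XOUT_X, I_XOUT},
    };
    memset(sdg, 0, sizeof sdg);
    for (size_t k = 0; k < sizeof edges / sizeof edges[0]; k++)
        sdg[edges[k][0]][edges[k][1]] = true;
}

/* Label-blind reachability from src, following edges forward or backward. */
static void reach(int src, bool backward, bool out[NUM_SDG]) {
    int stack[NUM_SDG], top = 0;
    memset(out, 0, NUM_SDG * sizeof out[0]);
    out[src] = true;
    stack[top++] = src;
    while (top > 0) {
        int n = stack[--top];
        for (int m = 0; m < NUM_SDG; m++)
            if ((backward ? sdg[m][n] : sdg[n][m]) && !out[m]) {
                out[m] = true;
                assert(top < NUM_SDG);
                stack[top++] = m;
            }
    }
}

/* Naive chop: forward slice from s intersected with backward slice from t. */
static void naive_chop(int s, int t, bool chop[NUM_SDG]) {
    bool fwd[NUM_SDG], bwd[NUM_SDG];
    reach(s, false, fwd);
    reach(t, true, bwd);
    for (int n = 0; n < NUM_SDG; n++)
        chop[n] = fwd[n] && bwd[n];
}
```

The offending path is SUM0 → XIN1 → X_XIN → X_ADD → XOUT_X → I_XOUT → PRINT_I. It is the ( ] path from the figure: it enters add from the first call and returns to the second.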

Program Chopping
Given source S and target T, what program points transmit effects from S to T?
“Precise interprocedural chopping” [Reps & Rosay FSE 95]

CF-Recognition vs. CFL-Reachability
CF-recognition (chain graphs):
  General grammar: sub-cubic time [Valiant 75]
  LL(1), LR(1): linear time
CFL-reachability:
  General graphs: O(N³)
  LL(1): O(N³)
  LR(1): O(N³)
  Certain kinds of graphs: O(N+E)
  Regular languages: O(N+E)
Examples: gen/kill IDFA, GMOD, IDFA

Regular-Language Reachability [Yannakakis 90]
G: graph (N nodes, E edges)
L: a regular language
L-path from s to t iff the labels along the path spell a word in L
Running time: O(N+E) for a fixed L, vs. O(N³) for CFL-reachability
Ordinary reachability (= transitive closure): label each edge with e; L is e*
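The following is a sketch of regular-language reachability. It assumes L is given as a DFA over single-character edge labels; struct dfa, reg_reach, and ESTAR are hypothetical names. The search is breadth-first over pairs (graph node, DFA state). With L = e*, it reduces to ordinary reachability. For brevity the sketch rescans the edge list at each step. With adjacency lists, each (node, state) pair and each (edge, state) pair is handled once, which is O(N+E) for a fixed DFA.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 32
#define MAX_STATES 4

/* A graph edge labeled with a single character. */
struct edge { int from, to; char label; };

/* A DFA over edge labels; delta returns -1 when there is no transition. */
struct dfa {
    int nstates, start;
    bool accepting[MAX_STATES];
    int (*delta)(int state, char label);
};

/* Is there a path from s to t whose labels spell a word in L(d)?
   Breadth-first search over (graph node, DFA state) pairs. */
static bool reg_reach(const struct edge *edges, int nedges, int nnodes,
                      const struct dfa *d, int s, int t) {
    static bool seen[MAX_NODES][MAX_STATES];
    int queue[MAX_NODES * MAX_STATES][2], head = 0, tail = 0;
    assert(nnodes <= MAX_NODES && d->nstates <= MAX_STATES);
    memset(seen, 0, sizeof seen);
    seen[s][d->start] = true;
    queue[tail][0] = s; queue[tail][1] = d->start; tail++;
    while (head < tail) {
        int n = queue[head][0], q = queue[head][1];
        head++;
        if (n == t && d->accepting[q]) return true;
        for (int k = 0; k < nedges; k++) {   /* adjacency lists would avoid this scan */
            if (edges[k].from != n) continue;
            int q2 = d->delta(q, edges[k].label);
            if (q2 >= 0 && !seen[edges[k].to][q2]) {
                seen[edges[k].to][q2] = true;
                queue[tail][0] = edges[k].to; queue[tail][1] = q2; tail++;
            }
        }
    }
    return false;
}

/* DFA for e*: one accepting state that loops on 'e'. */
static int estar_delta(int state, char label) {
    return (state == 0 && label == 'e') ? 0 : -1;
}
static const struct dfa ESTAR = { 1, 0, { true }, estar_delta };
```

For a graph 0 -e-> 1 -e-> 2 -x-> 3, node 2 is e*-reachable from node 0, but node 3 is not, because the x edge has no transition in the e* automaton.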

Security of Crypto-Based Protocols for Distributed Systems
“Ping-pong” protocols:
(1) X sends EncryptY(M X) to Y
(2) Y sends EncryptX(M) to X
[Dolev & Yao 83]: O(N⁸) algorithm
[Dolev, Even, & Karp 83]: O(N³) algorithm; less well known than [Dolev & Yao 83]

[Dolev, Even, & Karp 83]
Id → EncryptX Id DecryptX
Id → DecryptX Id EncryptX
Figure: graph with nodes Message and Saboteur and edges labeled EY, AX, AZ; query: is there an Id-path?

Relationship to Other Analysis Paradigms
Dataflow analysis: reachability versus equation solving
Deduction
Set constraints

Dataflow Analysis
Goal: for each point in the program, determine a superset of the “facts” that could possibly hold during execution
Examples: constant propagation, reaching definitions, live variables, possibly uninitialized variables

Useful For . . .
Optimizing compilers
Parallelizing compilers
Tools that detect possible logical errors
Tools that show the effects of a proposed modification

Possibly Uninitialized Variables
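Figure: control-flow graph with nodes start, x = 3, if . . ., y = x, y = w, w = 8, printf(y), each annotated with its set of possibly uninitialized variables (subsets of {w, x, y}; the set after start is {w, x, y})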

Precise Intraprocedural Analysis
Figure: paths from start to a node n

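Figure: supergraph of a two-procedure example. Its nodes include start main, x = 3, p(x,y), return from p, printf(y), exit main, and start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p; call and return edges are labeled with parentheses.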

Precise Interprocedural Analysis
Figure: paths from start to n through call and ret edges labeled ( and )
[Sharir & Pnueli 81]

Representing Dataflow Functions
Figure: graph representations over facts a, b, c of the identity function and of a constant function

Representing Dataflow Functions
Figure: graph representations over facts a, b, c of a “gen/kill” function and of a non-“gen/kill” function

Figure: exploded supergraph of the two-procedure example, with a node for each pair of program point and fact (x, y in main; a, b in p)

Composing Dataflow Functions
Figure: composing two representation graphs over a, b, c by joining them end to end
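The following is a sketch of how these graph representations compose. It follows the representation used for interprocedural dataflow analysis via CFL-reachability, in which each graph has, besides the facts, a special node (written Λ here) with an edge Λ→Λ. An edge Λ→d means d is generated. An edge d1→d2 means that d1 holding before implies d2 holding after. Composition is then relational composition, joining edges end to end. The three-fact domain {a, b, c} and all helper names are hypothetical. assign_from models "dst = src" for possibly uninitialized variables, a non-gen/kill function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Node 0 is the special node Lambda; nodes 1..3 are hypothetical facts a, b, c. */
#define LAMBDA 0
#define K 4
#define FACT_A (1u << 1)
#define FACT_B (1u << 2)
#define FACT_C (1u << 3)

/* r[d1][d2]: an edge d1 -> d2 in the representation graph of a function. */
typedef struct { bool r[K][K]; } rep;

static rep identity(void) {
    rep f;
    memset(&f, 0, sizeof f);
    for (int d = 0; d < K; d++) f.r[d][d] = true;
    return f;
}

/* lambda S. (S - kill) | gen, where gen and kill are bit sets over facts 1..3. */
static rep gen_kill(unsigned gen, unsigned kill) {
    rep f = identity();
    for (int d = 1; d < K; d++) {
        if (kill & (1u << d)) f.r[d][d] = false;
        if (gen & (1u << d)) { f.r[d][d] = false; f.r[LAMBDA][d] = true; }
    }
    return f;
}

/* "dst = src" for possibly uninitialized variables: dst is uninitialized
   afterwards exactly when src was before. Not expressible as gen/kill. */
static rep assign_from(int dst, int src) {
    rep f = identity();
    f.r[dst][dst] = false;
    f.r[src][dst] = true;
    return f;
}

/* Representation of g after f: join f's edges to g's edges end to end. */
static rep compose(const rep *g, const rep *f) {
    rep h;
    memset(&h, 0, sizeof h);
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            if (f->r[i][j])
                for (int k = 0; k < K; k++)
                    if (g->r[j][k]) h.r[i][k] = true;
    return h;
}

/* Applies a represented function to a set of facts (bit d set = fact d holds). */
static unsigned apply(const rep *f, unsigned s) {
    unsigned out = 0;
    assert((s & 1u) == 0);  /* bit 0 is reserved for LAMBDA */
    for (int d2 = 1; d2 < K; d2++) {
        bool hit = f->r[LAMBDA][d2];
        for (int d1 = 1; d1 < K && !hit; d1++)
            hit = ((s >> d1) & 1u) && f->r[d1][d2];
        if (hit) out |= 1u << d2;
    }
    return out;
}
```

For instance, killing a and then generating b maps {a, c} to {b, c}. The composite of "kill a" followed by "b = a" maps {a, b} to the empty set.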

Figure: exploded supergraph of the two-procedure example. One query is answered YES! because of a path with matched labels ( ); the other is answered NO! because the only connecting path has mismatched labels ( ].
Might y be uninitialized here? Might b be uninitialized here?

Off Limits!
matched → matched matched
        | (i matched )i   ∀ i ∈ CallSites
        | edge
        | ε
Figure: stack diagram; a matched path never pops the stack below its starting height (Off Limits!)

Off Limits!
unbalLeft → matched unbalLeft
          | (i unbalLeft   ∀ i ∈ CallSites
          | ε
Figure: stack diagram; an unbalanced-left path may leave calls open, but popping below its starting height is still off limits

Interprocedural Dataflow Analysis via CFL-Reachability
Graph: exploded control-flow graph
L: L(unbalLeft)
Fact d holds at n iff there is an L(unbalLeft)-path from the exploded start node of main to the exploded node ⟨n, d⟩

Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
CFL-reachability on the exploded control-flow graph (ND nodes): O(N³D³)
Exploiting the special structure of the exploded control-flow graph: O(ED³)
Typically E ≈ N, hence O(ED³) ≈ O(ND³)
“Gen/kill” problems: O(ED)

Why Bother? “We’re only interested in million-line programs”
Know thy enemy!
“Any” algorithm must do these operations
Avoid pitfalls (e.g., claiming an O(N²) algorithm)
The essence of “context sensitivity”
Special cases: “gen/kill” problems, O(ED)
Compression techniques: basic blocks; SSA form, sparse evaluation graphs
Demand algorithms

The Need for Pointer Analysis
    #include <stdio.h>

    int add(int x, int y);

    int main() {
        int sum = 0;
        int i = 1;
        int *p = &sum;
        int *q = &i;
        int (*f)(int,int) = add;
        while (*q < 11) {
            *p = (*f)(*p,*q);
            *q = (*f)(*q,1);
        }
        printf("%d\n",*p);
        printf("%d\n",*q);
    }

    int add(int x, int y) {
        return x + y;
    }

The Need for Pointer Analysis
    #include <stdio.h>

    int add(int x, int y);

    int main() {
        int sum = 0;
        int i = 1;
        int *p = &sum;
        int *q = &i;
        int (*f)(int,int) = add;
        while (i < 11) {
            sum = add(sum,i);
            i = add(i,1);
        }
        printf("%d\n",sum);
        printf("%d\n",i);
    }

    int add(int x, int y) {
        return x + y;
    }

Flow-Sensitive Points-To Analysis
Figure: effect on a points-to graph (nodes p, q, r1, r2, s1, s2, s3) of each of the statements p = &q;, p = q;, p = *q;, and *p = q;

Flow-Sensitive  Flow-Insensitive
Figure: a control-flow graph from start main to exit main over statements 1 to 5, and the same five statements with the control-flow order dropped

Flow-Insensitive Points-To Analysis [Andersen 94, Shapiro & Horwitz 97]
Figure: points-to edges introduced by p = &q;, p = q;, p = *q;, and *p = q; (nodes p, q, r1, r2, s1, s2, s3)

Flow-Insensitive Points-To Analysis
a = &e;
b = a;
c = &f;
*b = c;
d = *a;
Figure: the resulting points-to graph over a, b, c, d, e, f

Flow-Insensitive Points-To Analysis
Andersen [Thesis 94]: formulated using set constraints; cubic-time algorithm
Shapiro & Horwitz (1995; [POPL 97]): re-formulated as a graph-grammar problem
Reps (1995; [unpublished]): re-formulated as a Horn-clause program
Melski (1996; see [Reps, IST98]): re-formulated via CFL-reachability

CFL-Reachability = Chain Programs
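Grammar production A → B C as a graph rule: a B-edge from x to y followed by a C-edge from y to z yields an A-edge from x to z
Chain program: a(X,Z) :- b(X,Y), c(Y,Z).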

Base Facts for Points-To Analysis
p = &q;   assignAddr(p,q).
p = q;    assign(p,q).
p = *q;   assignStar(p,q).
*p = q;   starAssign(p,q).

Rules for Points-To Analysis (I)
p = &q;   pointsTo(P,Q) :- assignAddr(P,Q).
p = q;    pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
Figure: points-to edges for each statement (nodes p, q, r1, r2)

Rules for Points-To Analysis (II)
p = *q;   pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
*p = q;   pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
Figure: points-to edges for each statement (nodes p, q, r1, r2, s1, s2, s3)
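The four rules can be run to a fixed point directly. The C sketch below does this naively on the example program a = &e; b = a; c = &f; *b = c; d = *a;. The enum, struct, and function names are hypothetical. A realistic solver would use a worklist or the CFL-reachability formulation rather than re-scanning every statement until nothing changes. Because the analysis is flow-insensitive, the order of the statements does not affect the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VARS 16

/* The four statement forms: p = &q, p = q, p = *q, *p = q. */
enum kind { ASSIGN_ADDR, ASSIGN, ASSIGN_STAR, STAR_ASSIGN };
struct stmt { enum kind k; int p, q; };

static bool pts[MAX_VARS][MAX_VARS];  /* pts[x][y] holds when pointsTo(x, y) */

static bool set_pt(int x, int y) {
    if (pts[x][y]) return false;
    pts[x][y] = true;
    return true;
}

/* Re-applies the pointsTo rules to every statement until nothing changes. */
static void andersen(const struct stmt *s, int n, int nvars) {
    assert(nvars <= MAX_VARS);
    memset(pts, 0, sizeof pts);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            int p = s[i].p, q = s[i].q;
            switch (s[i].k) {
            case ASSIGN_ADDR:  /* pointsTo(P,Q) :- assignAddr(P,Q). */
                changed |= set_pt(p, q);
                break;
            case ASSIGN:       /* pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R). */
                for (int r = 0; r < nvars; r++)
                    if (pts[q][r]) changed |= set_pt(p, r);
                break;
            case ASSIGN_STAR:  /* pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S). */
                for (int r = 0; r < nvars; r++)
                    if (pts[q][r])
                        for (int t = 0; t < nvars; t++)
                            if (pts[r][t]) changed |= set_pt(p, t);
                break;
            case STAR_ASSIGN:  /* pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S). */
                for (int r = 0; r < nvars; r++)
                    if (pts[p][r])
                        for (int t = 0; t < nvars; t++)
                            if (pts[q][t]) changed |= set_pt(r, t);
                break;
            }
        }
    }
}

/* The example program, with hypothetical variable numbering. */
enum { VAR_A, VAR_B, VAR_C, VAR_D, VAR_E, VAR_F, NUM_EXAMPLE_VARS };
static const struct stmt EXAMPLE[] = {
    {ASSIGN_ADDR, VAR_A, VAR_E},  /* a = &e; */
    {ASSIGN,      VAR_B, VAR_A},  /* b = a;  */
    {ASSIGN_ADDR, VAR_C, VAR_F},  /* c = &f; */
    {STAR_ASSIGN, VAR_B, VAR_C},  /* *b = c; */
    {ASSIGN_STAR, VAR_D, VAR_A},  /* d = *a; */
};
```

On the example, the fixed point contains a → e, b → e, c → f, e → f (from *b = c, since b points to e), and d → f (from d = *a).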

Creating a Chain Program
*p = q;
pointsTo(R,S) :- starAssign(P,Q), pointsTo(P,R), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo(P,R), starAssign(P,Q), pointsTo(Q,S).
pointsTo(R,S) :- pointsTo_bar(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsTo_bar(R,P) :- pointsTo(P,R).

Base Facts for Points-To Analysis
p = &q;   assignAddr(p,q).   assignAddr_bar(q,p).
p = q;    assign(p,q).       assign_bar(q,p).
p = *q;   assignStar(p,q).   assignStar_bar(q,p).
*p = q;   starAssign(p,q).   starAssign_bar(q,p).

Creating a Chain Program
pointsTo(P,Q) :- assignAddr(P,Q).
pointsTo_bar(Q,P) :- assignAddr_bar(Q,P).
pointsTo(P,R) :- assign(P,Q), pointsTo(Q,R).
pointsTo_bar(R,P) :- pointsTo_bar(R,Q), assign_bar(Q,P).
pointsTo(P,S) :- assignStar(P,Q), pointsTo(Q,R), pointsTo(R,S).
pointsTo_bar(S,P) :- pointsTo_bar(S,R), pointsTo_bar(R,Q), assignStar_bar(Q,P).
pointsTo(R,S) :- pointsTo_bar(R,P), starAssign(P,Q), pointsTo(Q,S).
pointsTo_bar(S,R) :- pointsTo_bar(S,Q), starAssign_bar(Q,P), pointsTo(P,R).

. . . and now to CFL-Reachability
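pointsTo → assignAddr
pointsTo → assign pointsTo
pointsTo → assignStar pointsTo pointsTo
pointsTo → pointsTo_bar starAssign pointsTo
pointsTo_bar → assignAddr_bar
pointsTo_bar → pointsTo_bar assign_bar
pointsTo_bar → pointsTo_bar pointsTo_bar assignStar_bar
pointsTo_bar → pointsTo_bar starAssign_bar pointsTo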

Structure-Transmitted Dependences [Reps 1995]
McCarthy’s equations:
car(cons(x,y)) = x
cdr(cons(x,y)) = y
w = cons(x,y);
v = car(w);
Figure: dependences among x, y, w, and v

Set Constraints
w = cons(x,y);
v = car(w);
McCarthy’s equations revisited
Semantics of set constraints

CFL-Reachability versus Set Constraints
Lazy languages: CFL-reachability is more natural
  car(cons(X,Y)) = X
Strict languages: set constraints are more natural
  car(cons(X,Y)) = X, provided I(Y) ≠ ∅
But SC and CFL-reachability are equivalent! [Melski & Reps 97]

Solving Set Constraints
Figure: inhabitation propagating among W, X, and Y (“W is inhabited”, “X is inhabited”, “Y is inhabited”)

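Simulating “Inhabited”
Figure: graph gadget for W
Simulating “Inhabited”
Figure: graph gadget over X, Y, W with inhab edges
Simulating “Provided I(Y) ≠ ∅”
Figure: graph gadget over inhab, X, Y, W, V encoding the condition provided I(Y) ≠ ∅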

Exhaustive Versus Demand Analysis
Exhaustive analysis: all facts at all points
Optimization: concentrate on inner loops
Program-understanding tools: only some facts are of interest

Does a given fact hold at a given point?
Which facts hold at a given point?
At which points does a given fact hold?
Demand analysis via CFL-reachability:
single-source/single-target CFL-reachability
single-source/multi-target CFL-reachability
multi-source/single-target CFL-reachability

All “appropriate” demands
“Semi-exhaustive”: all “appropriate” demands
Might y be uninitialized here? Might b be uninitialized here?
Figure: exploded supergraph of the two-procedure example, with the answers YES! and NO! and a path labeled ( )

Experimental Results [Horwitz, Reps, & Sagiv 1995]
53 C programs (200-6,700 lines)
For a single fact of interest: demand always better than exhaustive
All “appropriate” demands beats exhaustive when the percentage of “yes” answers is high
Live variables; truly live variables; constant predicates; . . .

A Related Result [Sagiv, Reps, & Horwitz 1996]
[Uses a generalized analysis technique]
38 C programs (300-6,000 lines)
Copy-constant propagation; linear-constant propagation
All “appropriate” demands always beats exhaustive, by a factor of 1.14 to about 6

Demand algorithms for:
interprocedural dataflow analysis
set constraints
points-to analysis

Demand Analysis and LP Queries (I)
Flow-insensitive points-to analysis
Does variable p point to q? Issue query ?- pointsTo(p, q). Solve a single-source/single-target L(pointsTo)-reachability problem.
What does variable p point to? Issue query ?- pointsTo(p, Q). Solve a single-source L(pointsTo)-reachability problem.
What variables point to q? Issue query ?- pointsTo(P, q). Solve a single-target L(pointsTo)-reachability problem.

Demand Analysis and LP Queries (II)
Flow-sensitive analysis
Does a given fact f hold at a given point p? ?- dfFact(p, f).
Which facts hold at a given point p? ?- dfFact(p, F).
At which points does a given fact f hold? ?- dfFact(P, f).
E.g., flow-sensitive points-to analysis: ?- dfFact(p, pointsTo(x, Y)). ?- dfFact(P, pointsTo(x, y)). etc.

Interprocedural Backward Slice
Figure: SDG schema (Enter main, two Call p sites, Enter p) with call and return edges labeled [ ] and ( )

Figure: exploded supergraph of the two-procedure example with a path whose labels interleave as ( [ ) ]; y may be uninitialized here

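Dependences + Matched Paths?
Figure: SDG in which main contains w = cons(x,y) and two calls to p, and p contains v = car(w); edges are labeled both with call/return parentheses ( ) and [ ] and with structure labels hd, hd⁻¹, and tl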

Undecidable! [Reps, TOPLAS 00]
Figure: a path whose structure labels hd, hd⁻¹ and call/return labels ( ) interleave
Interleaved parentheses!

Beyond CFL-Reachability: Composition of Linear Functions
Figure: λx.3x+5 followed by λx.2x+1 gives λx.6x+11
(λx.2x+1) ∘ (λx.3x+5) = λx.6x+11
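Linear functions are closed under composition, so a composed function is again just a pair of coefficients. The following is a minimal C sketch that assumes integer coefficients; struct linfn, compose_lin, and apply_lin are hypothetical names.

```c
#include <assert.h>

/* lambda x. a*x + b with integer coefficients (an assumption of this sketch). */
struct linfn { long a, b; };

/* (g o f)(x) = g.a * (f.a * x + f.b) + g.b */
static struct linfn compose_lin(struct linfn g, struct linfn f) {
    struct linfn h = { g.a * f.a, g.a * f.b + g.b };
    return h;
}

static long apply_lin(struct linfn f, long x) {
    return f.a * x + f.b;
}
```

Composing g = λx.2x+1 after f = λx.3x+5 yields the coefficients (6, 11), that is, λx.6x+11. Because composition never leaves this class, the effect of a procedure can be summarized by a fixed-size object.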

Beyond CFL-Reachability: Composition of Linear Functions
Interprocedural constant propagation [Sagiv, Reps, & Horwitz TCS 96]
Interprocedural path profiling: the number of path fragments contributed by a procedure is a function [Melski & Reps CC 99]

Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
Non-recursive HFSMs [Alur & Yannakakis 98]
Ordinary FSMs: T-reachability/circularity queries
Recursive HFSMs: matched-parenthesis T-reachability/circularity
Key observation: linear-time algorithms for matched-parenthesis T-reachability/cyclicity
  Single-entry/multi-exit [or multi-entry/single-exit]
  Deterministic, multi-entry/multi-exit

T-Cyclicity in Hierarchical Kripke Structures
             SN/SX        SN/MX        MN/SX    MN/MX
  non-rec    O(|k|)       O(|k|)       ?        ?
  rec        O(|k|³)      ?

             SN/SX        SN/MX        MN/SX    MN/MX
             O(|k|)       O(|k|)       O(|k|)   O(|k|³); O(|k||t|) [lin rec]; O(|k|) [det]

Recursive HFSMs: Data Complexity
         SN/SX              SN/MX              MN/SX   MN/MX
  LTL    non-rec: O(|k|)    non-rec: O(|k|)    ?       ?
         rec: P-time        rec: ?
  CTL    O(|k|)             bad                ?       bad
  CTL*   O(|k|²) [L2]       bad                ?       bad

Recursive HFSMs: Data Complexity
         SN/SX    SN/MX    MN/SX    MN/MX
  LTL    O(|k|)   O(|k|)   O(|k|)   O(|k|³); O(|k||t|) [lin rec]; O(|k|) [det]
  CTL    O(|k|)   bad      O(|k|)   bad
  CTL*   O(|k|)   bad      O(|k|)   bad
Not Dual Problems!

CFL-Reachability: Scope of Applicability
Static analysis: slicing, DFA, structure-transmitted dependences, points-to analysis
Verification: security of crypto-based protocols for distributed systems [Dolev, Even, & Karp 83]; model-checking recursive HFSMs
Formal-language theory: CF-, 2DPDA-, 2NPDA-recognition; attribute-grammar analysis

CFL-Reachability: Benefits
Algorithms: exhaustive & demand
Complexity: linear-time and cubic-time algorithms; PTIME-completeness; variants that are undecidable
Complementary to: equations, set constraints, types, . . .

Most Significant Contributions: 1987-2000
Asymptotically fastest algorithms: interprocedural slicing; interprocedural dataflow analysis
Demand algorithms: interprocedural dataflow analysis [CC94, FSE95]; all “appropriate” demands beats exhaustive
Tool for slicing and browsing ANSI C: slices programs as large as 75,000 lines; university research distribution; commercial product: CodeSurfer (GrammaTech, Inc.)

Most Significant Contributions: 1987-2000
Unifying conceptual model: [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . .
Identifies fundamental bottlenecks: cubic-time “barrier”; litmus test: quadratic-time algorithm?!; PTIME-complete ⇒ limits to parallelizability
Existence proofs for new algorithms: demand algorithm for set constraints; demand algorithm for points-to analysis

References (papers by Reps and collaborators)
CFL-reachability:
Yannakakis, M., Graph-theoretic methods in database theory, PODS 90.
Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

Slicing, chopping, etc.:
Horwitz, Reps, & Binkley, TOPLAS 90.
Reps, Horwitz, Sagiv, & Rosay, FSE 94.
Reps & Rosay, FSE 95.
Dataflow analysis:
Reps, Horwitz, & Sagiv, POPL 95.
Horwitz, Reps, & Sagiv, FSE 95, TR-1283.
Structure dependences; set constraints:
Reps, PEPM 95.
Melski & Reps, Theor. Comp. Sci. 00.

Complexity:
Undecidability: Reps, TOPLAS 00?
PTIME-completeness: Reps, Acta Inf. 96.
Verification:
Dolev, Even, & Karp, Inf. & Control 82.
Benedikt, Godefroid, & Reps, in prep.
Beyond CFL-reachability:
Sagiv, Reps, & Horwitz, Theor. Comp. Sci. 96.
Melski & Reps, CC 99, TR-1382.
