Program Representations Xiangyu Zhang. CS590Z Software Defect Analysis Program Representations  Static program representations Abstract syntax tree;

Slides:



Advertisements
Similar presentations
Overview Structural Testing Introduction – General Concepts
Advertisements

A Survey of Program Slicing Techniques A Survey of Program Slicing Techniques Sections 3.1,3.6 Swathy Shankar
Program Slicing – Based Techniques
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
 Program Slicing Long Li. Program Slicing ? It is an important way to help developers and maintainers to understand and analyze the structure.
Components of representation Control dependencies: sequencing of operations –evaluation of if & then –side-effects of statements occur in right order Data.
Program Representations. Representing programs Goals.
Program Slicing. 2 CS510 S o f t w a r e E n g i n e e r i n g Outline What is slicing? Why use slicing? Static slicing of programs Dynamic Program Slicing.
Program Slicing Mark Weiser and Precise Dynamic Slicing Algorithms Xiangyu Zhang, Rajiv Gupta & Youtao Zhang Presented by Harini Ramaprasad.
1 Program Slicing Purvi Patel. 2 Contents Introduction What is program slicing? Principle of dependences Variants of program slicing Slicing classifications.
Presented By: Krishna Balasubramanian
1 Cost Effective Dynamic Program Slicing Xiangyu Zhang Rajiv Gupta The University of Arizona.
CS590F Software Reliability What is a slice? S: …. = f (v)  Slice of v at S is the set of statements involved in computing v’s value at S. [Mark Weiser,
Introduction to Program Slicing Presenter: M. Amin Alipour Software Design Laboratory
Program Slicing Xiangyu Zhang. CS590F Software Reliability What is a slice? S: …. = f (v)  Slice of v at S is the set of statements involved in computing.
Program Representations Xiangyu Zhang. CS590F Software Reliability Why Program Representations  Initial representations Source code (across languages).
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Representing programs Goals. Representing programs Primary goals –analysis is easy and effective just a few cases to handle directly link related things.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
Program analysis Mooly Sagiv html://
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
PSUCS322 HM 1 Languages and Compiler Design II Basic Blocks Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU Spring.
ECE1724F Compiler Primer Sept. 18, 2002.
Chair of Software Engineering Fundamentals of Program Analysis Dr. Manuel Oriol.
1 Intermediate representation Goals: encode knowledge about the program facilitate analysis facilitate retargeting facilitate optimization scanning parsing.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
Comparison Caller precisionCallee precisionCode bloat Inlining context-insensitive interproc Context sensitive interproc Specialization.
Recap from last time: live variables x := 5 y := x + 2 x := x + 1 y := x y...
Reps Horwitz and Sagiv 95 (RHS) Another approach to context-sensitive interprocedural analysis Express the problem as a graph reachability query Works.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Survey of program slicing techniques
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
From last class. The above is Click’s solution (PLDI 95)
Precision Going back to constant prop, in what cases would we lose precision?
ECE355 Fall 2004Software Reliability1 ECE-355 Tutorial Jie Lian.
Software (Program) Analysis. Automated Static Analysis Static analyzers are software tools for source text processing They parse the program text and.
Bug Localization with Machine Learning Techniques Wujie Zheng
Dataflow Analysis Topic today Data flow analysis: Section 3 of Representation and Analysis Paper (Section 3) NOTE we finished through slide 30 on Friday.
1 Program Slicing Amir Saeidi PhD Student UTRECHT UNIVERSITY.
1 CS 201 Compiler Construction Introduction. 2 Instructor Information Rajiv Gupta Office: WCH Room Tel: (951) Office.
Chapter 11: Dynamic Analysis Omar Meqdadi SE 3860 Lecture 11 Department of Computer Science and Software Engineering University of Wisconsin-Platteville.
CASE/Re-factoring and program slicing
1 The System Dependence Graph and its use in Program Slicing.
1 Control Flow Analysis Topic today Representation and Analysis Paper (Sections 1, 2) For next class: Read Representation and Analysis Paper (Section 3)
Program Representations. Representing programs Goals.
Dead Code Elimination This lecture presents the algorithm Dead from EaC2e, Chapter 10. That algorithm derives, in turn, from Rob Shillner’s unpublished.
Program Slicing Techniques CSE 6329 Spring 2013 Parikksit Bhisay
CS 614: Theory and Construction of Compilers Lecture 15 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
CS412/413 Introduction to Compilers Radu Rugina Lecture 18: Control Flow Graphs 29 Feb 02.
1 Control Flow Graphs. 2 Optimizations Code transformations to improve program –Mainly: improve execution time –Also: reduce program size Can be done.
Control Flow Analysis Compiler Baojian Hua
1 Software Testing & Quality Assurance Lecture 13 Created by: Paulo Alencar Modified by: Frank Xu.
1 CS 201 Compiler Construction Lecture 2 Control Flow Analysis.
Static Slicing Static slice is the set of statements that COULD influence the value of a variable for ANY input. Construct static dependence graph Control.
Program Representation
White-Box Testing.
Dataflow Testing G. Rothermel.
A Survey of Program Slicing Techniques: Section 4
Program Slicing Baishakhi Ray University of Virginia
Factored Use-Def Chains and Static Single Assignment Forms
Dynamic Program Analysis
White-Box Testing.
Program Slicing Xiangyu Zhang.
Control Flow Analysis (Chapter 7)
Pointer analysis.
Static Single Assignment
Program Representation
Presentation transcript:

Program Representations Xiangyu Zhang

CS590Z Software Defect Analysis Program Representations  Static program representations Abstract syntax tree; Control flow graph; Program dependence graph; Call graph; Points-to relations.  Dynamic program representations Control flow trace, address trace and value trace; Dynamic dependence graph; Whole execution trace;

CS590Z Software Defect Analysis (1) Abstract syntax tree  An abstract syntax tree (AST) is a finite, labeled, directed tree, where the internal nodes are labeled by operators, and the leaf nodes represent the operands of the operators. Program chipping.

CS590Z Software Defect Analysis (2) Control Flow Graph (CFG)  Consists of basic blocks and edges A maximal sequence of consecutive instructions such that inside the basic block an execution can only proceed from one instruction to the next (SESE). Edges represent potential flow of control between BBs. Program path. B1 B2B3 B4  CFG =  V = Vertices, nodes (BBs)  E = Edges, potential flow of control E  V × V  Entry, Exit  V, unique entry and exit

CS590Z Software Defect Analysis (2) An Example of CFG 1: sum=0 2: i=1 3: while ( i<N) do 4:i=i+1 5:sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) BB- A maximal sequence of consecutive instructions such that inside the basic block an execution can only proceed from one instruction to the next (SESE).

CS590Z Software Defect Analysis (3) Program Dependence Graph (PDG)– Data Dependence  S data depends T if there exists a control flow path from T to S and a variable is defined at T and then used at S

CS590Z Software Defect Analysis (3) PDG – Control Dependence  X dominates Y if every possible program path from the entry to Y has to pass X. Strict dominance, dominator, immediate dominator. 1: sum=0 2: i=1 3: while ( i<N) do 4:i=i+1 5:sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) DOM(6)={1,2,3,6} IDOM(6)=3

CS590Z Software Defect Analysis (3) PDG – Control Dependence  X post-dominates Y if every possible program path from Y to EXIT has to pass X. Strict post-dominance, post-dominator, immediate post- dominance. 1: sum=0 2: i=1 3: while ( i<N) do 4:i=i+1 5:sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) PDOM(5)={3,5,6} IPDOM(5)=3

CS590Z Software Defect Analysis (3) PDG – Control Dependence  Intuitively, Y is control-dependent on X iff X directly determines whether Y executes (statements inside one branch of a predicate are usually control dependent on the predicate) there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y X is not strictly post-dominated by Y Sorin Lerner X Y Not post-dominated by Y Every node is post-dominated by Y

CS590Z Software Defect Analysis (3) PDG – Control Dependence 1: sum=0 2: i=1 3: while ( i<N) do 4:i=i+1 5:sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) CD(5)=3 CD(3)=3, tricky! A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y X is not strictly post-dominated by Y

CS590Z Software Defect Analysis (3) PDG – Control Dependence is not Syntactically Explicit 1: sum=0 2: i=1 3: while ( i<N) do 4:i=i+1 5:if (i%2==0) 6: continue; 7:sum=sum+i endwhile 8: print(sum) 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: if (i%2==0) 8: print (sum) 7: print (sum) A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y X is not strictly post-dominated by Y

CS590Z Software Defect Analysis (3) PDG – Control Dependence is Tricky! A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y X is not strictly post-dominated by Y  Can a statement control depends on two predicates?

CS590Z Software Defect Analysis (3) PDG – Control Dependence is Tricky! A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y X is not strictly post-dominated by Y 1: if ( p1 || p2 ) 2: s1; 3: s2; 1: ? p1  Can one statement control depends on two predicates? 1: ? p2 2: s1 3: s2 What if ? 1: if ( p1 && p2 ) 2: s1; 3: s2; Interprocedural CD, CD in case of exception,…

CS590Z Software Defect Analysis (3) PDG  A program dependence graph consists of control dependence graph and data dependence graph  Why it is so important to software reliability? In debugging, what could possibly induce the failure? In security p=getpassword( ); … if (p==“zhang”) { send (m); }

CS590Z Software Defect Analysis (4) Points-to Graph  Aliases: two expressions that denote the same memory location.  Aliases are introduced by: pointers call-by-reference array indexing C unions

CS590Z Software Defect Analysis (4) Points-to Graph  Aliases: two expressions that denote the same memory location.  Aliases are introduced by: pointers call-by-reference array indexing C unions

CS590Z Software Defect Analysis (4) Why Do We Need Points-to Graphs  Debugging x.lock();... y.unlock(); // same object as x?  Security F(x,y) { x.f=password; … print (y.f); } F(a,a); disaster!

CS590Z Software Defect Analysis (4) Points-to Graph  Points-to Graph at a program point, compute a set of pairs of the form p -> x, where p MAY/MUST points to x. m(p) { r = new C(); p->f = r; t = new C(); if (…) q=p; r->f = t; } r

CS590Z Software Defect Analysis (4) Points-to Graph  Points-to Graph at a program point, compute a set of pairs of the form p->x, where p MAY/MUST points to x. m(p) { r = new C(); p->f = r; t = new C(); if (…) q=p; r->f = t; } p r f

CS590Z Software Defect Analysis (4) Points-to Graph  Points-to Graph at a program point, compute a set of pairs of the form p->x, where p MAY/MUST points to x. m(p) { r = new C(); p->f = r; t = new C(); if (…) q=p; r->f = t; } p r f t

CS590Z Software Defect Analysis (4) Points-to Graph  Points-to Graph at a program point, compute a set of pairs of the form p->x, where p MAY/MUST points to x. m(p) { r = new C(); p->f = r; t = new C(); if (…) q=p; r->f = t; } p q r f t

CS590Z Software Defect Analysis (4) Points-to Graph  Points-to Graph at a program point, compute a set of pairs of the form p->x, where p MAY/MUST points to x. m(p) { r = new C(); p->f = r; t = new C(); if (…) q=p; r->f = t; } p q r f t f p->f->f and t are aliases

CS590Z Software Defect Analysis (5) Call Graph  Call graph nodes are procedures edges are calls  Hard cases for building call graph calls through function pointers Can the password acquired at A be leaked at G?

CS590Z Software Defect Analysis How to acquire and use these representations?  Will be covered by later lectures.

CS590Z Software Defect Analysis Program Representations  Static program representations Abstract syntax tree; Control flow graph; Program dependence graph; Call graph; Points-to relations.  Dynamic program representations Control flow trace; Address trace, Value trace; Dynamic dependence graph; Whole execution trace;

CS590Z Software Defect Analysis (1) Control Flow Trace 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) 1 1 : sum=0 3 1 : while ( i<N) do 5 1 : sum=sum+i 3 2 : while ( i<N) do 4 2 : i=i : while ( i<N) do 6 1 : print (sum) N=2: 2 1 : i=1 4 1 : i=i : sum=sum+i x is a program point, x i is an execution point

CS590Z Software Defect Analysis (1) Control Flow Trace 3: while ( i<N) do 1: sum=0 2: i=1 4: i=i+1 5: sum=sum+i 6: print (sum) 1 1 : sum=0 i=1 3 1 : while ( i<N) do 4 1 : i=i+1 sum=sum+i 3 2 : while ( i<N) do 4 2 : i=i+1 sum=sum+i 3 3 : while ( i<N) do 6 1 : print (sum) N=2: A More Compact CFT:

CS590Z Software Defect Analysis (2) Dynamic Dependence Graph (DDG) Input: N=2 5 1 : for i=1 to N do 6 1 : if (i%2==0) then 8 1 : a=a : z=0 2 1 : a=0 3 1 : b=2 4 1 : p=&b 1: z=0 2: a=0 3: b=2 4: p=&b 5: for i = 1 to N do 6: if ( i %2 == 0) then 7: p=&a endif endfor 8: a=a+1 9: z=2*(*p) 10: print(z)

CS590Z Software Defect Analysis (2) Dynamic Dependence Graph (DDG) Input: N=2 5 1 : for i=1 to N do 6 1 : if (i%2==0) then 7 1 : p=&a 8 1 : a=a : z=2*(*p) 10 1 : print(z) 1 1 : z=0 2 1 : a=0 3 1 : b=2 4 1 : p=&b 5 2 : for I=1 to N do 6 2 : if (i%2==0) then 8 2 : a=a : z=2*(*p) 1: z=0 2: a=0 3: b=2 4: p=&b 5: for i = 1 to N do 6: if ( i %2 == 0) then 7: p=&a endif endfor 8: a=a+1 9: z=2*(*p) 10: print(z) One use has only one definition at runtime; One statement instance control depends on only one predicate instance.

CS590Z Software Defect Analysis (3) Whole Execution Trace 5:for i=1 to N 6:if (i%2==0) then 7: p=&a 8: a=a+1 9: z=2*(*p) 10: print(z) T F 1: z=0 2: a=0 3: b=2 4: p=&b T Input: N=2 1 1 : z=0 2 1 : a=0 3 1 : b=2 4 1 : p=&b 5 1 : for i = 1 to N do 6 1 : if ( i %2 == 0) then 8 1 : a=a : z=2*(*p) 5 2 : for i = 1 to N do 6 2 : if ( i %2 == 0) then 7 1 : p=&a 8 2 : a=a : z=2*(*p) 10 1 : print(z) T (3,8) (2,7) (7,12) (11,13) (13,14) (4,8) (12,13) (5,6)(9,10) (10,11) (5,7)(9,12) (5,8)(9,13) ,9 6, ,12 8,13 14 F &b &a ,2 F,T 1,2 4,4 4

CS590Z Software Defect Analysis (3) Whole Execution Trace S1S1 Multiple streams of numbers.

CS590Z Software Defect Analysis Program Representations  Static program representations Abstract syntax tree; Control flow graph; Program dependence graph; Call graph; Points-to relations.  Dynamic program representations Control flow trace, address trace and value trace; Dynamic dependence graph; Whole execution trace;

CS590Z Software Defect Analysis What is a slice? S: …. = f (v)  Slice of v at S is the set of statements involved in computing v’s value at S. [Mark Weiser, 1982] Data dependence Control dependence Void main ( ) { int I=0; int sum=0; while (I<N) { sum=add(sum,I); I=add(I,1); } printf (“sum=%d\n”,sum); printf(“I=%d\n”,I);

CS590Z Software Defect Analysis How to do slicing?  Static analysis Input insensitive May analysis  Dependence Graph  Characteristics Very fast Very imprecise b=0 a=2 1 <=i <=N if ((i++)%2= =1) a=a+1 b=a*2 z=a+b print(z) TF T F

CS590Z Software Defect Analysis Why is a static slice imprecise? S1:a=…S2:b=… L1:…=*p Use of Pointers – static alias analysis is very imprecise Use of function pointers – hard to know which function is called, conservative expectation results in imprecision S1:x=…S2:x=… L1:…=x All possible program paths

CS590Z Software Defect Analysis Dynamic Slicing  Korel and Laski, 1988  Dynamic slicing makes use of all information about a particular execution of a program and computes the slice based on an execution history (trace) Trace consists control flow trace and memory reference trace  A dynamic slice query is a triple  Smaller, more precise, more helpful to the user

CS590Z Software Defect Analysis Dynamic Slicing Example - background 1: b=0 2: a=2 3: for i= 1 to N do 4: if ((i++)%2==1) then 5: a = a+1 else 6: b = a*2 endif done 7: z = a+b 8: print(z) For input N=2, 1 1 : b=0 [b=0] 2 1 : a=2 3 1 : for i = 1 to N do [i=1] 4 1 : if ( (i++) %2 == 1) then [i=1] 5 1 : a=a+1 [a=3] 3 2 : for i=1 to N do [i=2] 4 2 : if ( i%2 == 1) then [i=2] 6 1 : b=a*2 [b=6] 7 1 : z=a+b [z=9] 8 1 : print(z) [z=9]

CS590Z Software Defect Analysis Issues about Dynamic Slicing  Precision – perfect  Running history – very big ( GB )  Algorithm to compute dynamic slice - slow and very high space requirement.

CS590Z Software Defect Analysis Backward vs. Forward 1 main( ) 2 { 3 int i, sum; 4 sum = 0; 5 i = 1; 6 while(i <= 10) 7 { 8sum = sum + 1; 9++ i; 10 } 11Cout<< sum; 12Cout<< i; 13} An Example Program & its forward slice w.r.t.

CS590Z Software Defect Analysis Comments  Want to know more? Frank Tip’s survey paper (1995)  Static slicing is very useful for static analysis Code transformation, program understanding, etc. Points-to analysis is the key challenge Not as useful in reliability as dynamic slicing  Dynamic slicing Precise  good for defect analysis. Solution space is much larger. There exist hybrid techniques.

CS590Z Software Defect Analysis Efficiency  How are dynamic slices computed? Execution traces  control flow trace -- dynamic control dependences  memory reference trace -- dynamic data dependences Construct a dynamic dependence graph Traverse dynamic dependence graph to compute slices

CS590Z Software Defect Analysis How to Detect Dynamic Dependence  Dynamic Data Dependence Shadow space (SS)  Addr  Abstract State Virtual SpaceShadow Space [r2] s1 x : ST r1, [r2] SS(r2)=s1 x s1 x s2 y : LD [r1], r2 [r1] s2 y  SS(r1)=s1 x Dynamic control dependence is more tricky!