Program Representation

Program Representation
CS 510/10 Program Representation Xiangyu Zhang

Why Program Representations
Original representations Source code (across languages). Binaries (across machines and platforms). Source code / binaries + test cases. They are hard for machines to analyze. Software are translated into certain representations before analyses are applied. As we said earlier, 2/3 of the semester will be spent on algorithmic software engineering, which is to algorithmically analyze software to achieve goals such as dependability and productivity. That is to say, we want to look into software and understand its working, and try to transform it if needed. In contrast, religious software engineering conveys principles and experience, more in a form of rules, achieving the same goals by telling people do the right things. In order to do algorithmic software engineering, machines have to understand software in depth. What is software, from a machine point view. It is a piece of source code, which may be written in multiple languages. In many cases, software products are simply binary executables. For example, most close-source software provides only executables. In some occasions, a software is provided with a test suite too. Well, it is enough for a machine to run the software. But it is mostly not sufficient to analyze software. For example, assume I am concerned about if a software has any vulnerabilities, or hidden malicious logic like viruses, trojans. It is not adequate. Assume the software often hangs or runs very slow, inspecting the root causes of these problems demands a better view of the software. What do people do? Software are translated into more informative format which makes understanding and analyzing more amenable.

Outline Control flow graphs. Program dependence graphs.
Super control flow graphs.

Control Flow Graph Chapter 1.14 of “Foundations of Software Engineering” Available on the course website. The most commonly used program representation. It is widely used in program analysis, malware analysis, testing.

Program representation: Basic blocks
A basic block in program P is a sequence of consecutive statements with a single entry and a single exit point. Thus a block has unique entry and exit points. Control always enters a basic block at its entry point and exits from its exit point. There is no possibility of exit or a halt at any point inside the basic block except at its exit point. The entry and exit points of a basic block coincide when the block contains only one statement. © Aditya P. Mathur 2005 5

Basic blocks: Example Example: Computing x raised to y
Write the code on the board, and explain basic blocks and control flow graph © Aditya P. Mathur 2005 6

Basic blocks: Example (contd.)
© Aditya P. Mathur 2005 7

Control Flow Graph (CFG)
A control flow graph (or flow graph) G is defined as a finite set N of nodes and a finite set E of edges. An edge (i, j) in E connects two nodes ni and nj in N. We often write G= (N, E) to denote a flow graph G with nodes given by N and edges by E. © Aditya P. Mathur 2005 8

Control Flow Graph (CFG)
In a flow graph of a program, each basic block becomes a node and edges are used to indicate the flow of control between blocks. An edge (i, j) connecting basic blocks bi and bj implies that control can go from block bi to block bj. We also assume that there is a node labeled Start in N that has no incoming edge, and another node labeled End, also in N, that has no outgoing edge. © Aditya P. Mathur 2005 9

CFG Example N={Start, 1, 2, 3, 4, 5, 6, 7, 8, 9, End}
E={(Start,1), (1, 2), (1, 3), (2,4), (3, 4), (4, 5), (5, 6), (6, 5), (5, 7), (7, 8), (7, 9), (9, End)} © Aditya P. Mathur 2005 10

CFG Example N={Start, 1, 2, 3, 4, 5, 6, 7, 8, 9, End}
Same CFG with statements removed. N={Start, 1, 2, 3, 4, 5, 6, 7, 8, 9, End} E={(Start,1), (1, 2), (1, 3), (2,4), (3, 4), (4, 5), (5, 6), (6, 5), (5, 7), (7, 8), (7, 9), (9, End)} © Aditya P. Mathur 2005 11

Paths Consider a flow graph G= (N, E). A sequence of k edges, k>0, (e_1, e_2, … e_k) , denotes a path of length k through the flow graph if the following sequence condition holds. Given that np, nq, nr, and ns are nodes belonging to N, and 0< i<k, if ei = (np, nq) and ei+1 = (nr, ns) then nq = nr. } © Aditya P. Mathur 2005 12

Paths: sample paths through the exponentiation flow graph
Two feasible and complete paths: p1= ( Start, 1, 2, 4, 5, 6, 5, 7, 9, End) p2= (Start, 1, 3, 4, 5, 6, 5, 7, 9, End) Specified unambiguously using edges: p1= ( (Start, 1), (1, 2), (2, 4), (4, 5), (5, 6), (6, 5), (5, 7), (7, 9), (9, End)) Bold edges: complete path. Dashed edges: subpath. © Aditya P. Mathur 2005 13

Paths: infeasible paths
A path p through a flow graph for program P is considered feasible if there exists at least one test case which when input to P causes p to be traversed. p1= ( Start, 1, 3, 4, 5, 6, 5, 7, 8, 9, End) p2= (Start, 1, 2, 4, 5, 7, 9, End) © Aditya P. Mathur 2005 14

Number of paths There can be many distinct paths through a program. A program with no condition contains exactly one path that begins at node Start and terminates at node End. Use two diamonds to explain the idea of path explosion. Each additional condition in the program can increases the number of distinct paths by at least one. Depending on their location, conditions can have a multiplicative effect on the number of paths. © Aditya P. Mathur 2005 15

A Simplified Version of CFG
Each statement is represented by a node For readibility. Not for efficient implementation.

Dominator X dominates Y if all possible program path from START to Y has to pass X. 1: sum=0 2: i=1 1: sum=0 2: i=1 3: while ( i<N) do 4: i=i+1 5: sum=sum+i endwhile 6: print(sum) Note that a basic block is identified by the first statement in the block. 3: while ( i<N) do 4: i=i+1 5: sum=sum+i DOM(6)={1,3, 6} 6: print (sum)

Dominator X strictly dominates Y if X dominates Y and X!=Y 1: sum=0
3: while ( i<N) do 4: i=i+1 5: sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 4: i=i+1 5: sum=sum+i SDOM(6)={1,3} 6: print (sum)

Dominator X is the immediate dominator of Y if X is the last dominator of Y along a path from Start to Y. 1: sum=0 2: i=1 1: sum=0 2: i=1 3: while ( i<N) do 4: i=i+1 5: sum=sum+i endwhile 6: print(sum) Is it possible X is not the last dominator of Y along another path? Or is the immediate dominator of a node X unique? Yes 3: while ( i<N) do 4: i=i+1 5: sum=sum+i IDOM(6)={3} 6: print (sum)

Postdominator X post-dominates Y if every possible program path from Y to End has to pass X. Strict post-dominator, immediate post-dominance. 1: sum=0 2: i=1 1: sum=0 2: i=1 3: while ( i<N) do 4: i=i+1 5: sum=sum+i endwhile 6: print(sum) 3: while ( i<N) do 4: i=i+1 5: sum=sum+i SPDOM(4)={3,6} IPDOM(4)=3 6: print (sum)

Back Edges A back edge is an edge whose head dominates its tail
Back edges often identify loops 1: sum=0 2: i=1 1. while (p1) { 2. if (p2) { continue; } else { if (p3) s1 } 3: while ( i<N) do 4: i=i+1 5: sum=sum+i 6: print (sum)

Program Dependence Graph
Read Chapter 1.15 and 1.16 of “Foundations of Software Testing” The second widely used program representation. Nodes are constituted by statements instead of basic blocks. Two types of dependences between statements Data dependence Control dependence CFG focuses on the flow of control. PDG focuses on dependences. Used in debugging, security.

Data Dependence X is data dependent on Y if (1) there is a variable v that is defined at Y and used at X and (2) there exists a path of nonzero length from Y to X along which v is not re-defined. 1: sum=0 2: i=1 5 is data dependent on 1, 4, and 5. 3: while ( i<N) do 4: i=i+1 5: sum=sum+i 6: print (sum)

Computing Data Dependence is Hard in General
Aliasing A variable can refer to multiple memory locations/objects. 1: foo (ClassX x, ClassY y) { 2: x.field= …; 3: …=y.field; 4: } 1: int x, y, z …; 2: int * p; 3: x=…; 4: y=…; 5: p = & x; 6: p=p +z; 7: … = *p; Later in this class, we will see how alias information is computed and then used in computing data dependence. foo ( o, o); o1=new ClassX( ); o2= new ClassY( ); foo ( o1, o2);

Control Dependence Intuitively, Y is control-dependent on X iff X directly determines whether Y executes (statements inside one branch of a predicate are usually control dependent on the predicate) X is not strictly post-dominated by Y there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y There is a path from X to End that does not pass Y or X==Y The definition in Foundations of Software Engineering is not accurate. The intuition: X can control not to execute Y; X can direct the control along a path that Y is promised to be executed. No such paths for nodes in a path between X and Y. X Not post-dominated by Y Every node is post-dominated by Y Y

Control Dependence - Example
Y is control-dependent on X iff X directly determines whether Y executes X is not strictly post-dominated by Y there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y 1: sum=0 2: i=1 1: sum=0 2: i=1 3: while ( i<N) do 4: i=i+1 5: sum=sum+i endwhile 6: print(sum) On the right hand side, 3 is control dependent on 2, but not 1. Note that 2 is control dependent on 1. So the fact that 1 controls the execution of 3 is reflected by the transitive dependences between 1 and 3. 3: while ( i<N) do 4: i=i+1 5: sum=sum+i CD(5)=3 6: print (sum) CD(3)=3, tricky!

Note: Control Dependence is not Syntactically Explicit
Y is control-dependent on X iff X directly determines whether Y executes X is not strictly post-dominated by Y there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y 1: sum=0 2: i=1 1: sum=0 2: i=1 3: while ( i<N) do 4: i=i+1 5: if (i%2==0) 6: continue; 7: sum=sum+i endwhile 8: print(sum) 3: while ( i<N) do 4: i=i+1 5: if (i%2==0) 7: sum=sum+i 8: print (sum)

Control Dependence is Tricky!
Y is control-dependent on X iff X directly determines whether Y executes X is not strictly post-dominated by Y there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y Can a statement control depends on two predicates?

Control Dependence is Tricky!
Y is control-dependent on X iff X directly determines whether Y executes X is not strictly post-dominated by Y there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y Can one statement control depends on two predicates? 1: ? p1 1: if ( p1 || p2 ) 2: s1; 3: s2; 1: ? p2 What if ? 1: if ( p1 && p2 ) 2: s1; 3: s2; 2: s1 3: s2

The Use of PDG A program dependence graph consists of control dependence graph and data dependence graph Why it is so important to software reliability? In debugging, what could possibly induce the failure? In security Why I spend so much time in explaining control dependence. Control dependence still has a lot of open problems to solve. p=getpassword( ); … send (p); p=getpassword( ); … if (p==“zhang”) { send (m); }

Super Control Flow Graph (SCFG)
Besides the normal intraprocedural control flow graph, additional edges are added connecting Each call site to the beginning of the procedure it calls. The return statement back to the call site. It is also called interprocedural control flow graph (ICFG). The goal of SCFG/ICFG is to connect CFGs for individual functions into a big graph to reflects control flow both inside functions and between functions. The blue edges are called the calling edges. The green edges are called the return edges. 1 1: for (i=0; i<n; i++) { 2: t1= f(0); 3: t2 = f(243); 4: x[i] = t1 + t2 + t3; 5: } 6: int f (int v) { 7: return (v+1); 8: } 7 2 3 4

The Use of SCFG When reasoning across function boundaries is needed.
A mouse click suddenly drives a desktop application into a coma, and the operating system declares it “not responding”. While the application usually responds eventually, no user actions can be taken during the wait. Intuitively, if a time-critical invocation such as a UI action runs into a blocking function such as a synchronous socket operation, then this is a hang point. In other words, entry functions of time-critical threads cannot reach blocking functions in a call graph. The problem is reduced to finding a pattern in a SCFG.

Many Other Representations
Points-to Graph. Static single assignment (SSA).

Program Representation

Similar presentations

Presentation on theme: "Program Representation"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Program Representation

Similar presentations

Presentation on theme: "Program Representation"— Presentation transcript:

Similar presentations

About project

Feedback