Control Flow Analysis (Chapter 7)


1 Control Flow Analysis (Chapter 7)
Mooly Sagiv (with contributions by Hanne Riis Nielson), Lecture #3, 22-Mar-2001. Scribe: Shay Raz.
Control Flow Analysis (CFA) is the first step of optimization. We will use the Intermediate Representations (IRs) introduced in the previous lesson.

2 Outline
What is Control Flow Analysis?
Structure of an optimizing compiler
A motivating example
Constructing basic blocks
Depth-first search
Finding dominators
Reducibility
Interval and structural analysis
Conclusions

3 Control Flow Analysis
Input: a sequence of IR instructions
Output: a partition of the IR into basic blocks, a control flow graph, and the loop structure
The goal is to analyze the flow of control: to determine which points each statement can reach. The input is a sequence of IR instructions (usually MIR), from which we build the control flow graph. Control flow is easy to see in a high-level language because it is explicit in the source, but it is harder to recover from a low-level representation. The question that arises is: why not work from the high-level language? We will answer this question in the following slides.

4 Compiler Structure
String of characters → Scanner → tokens → Parser → AST → Semantic analyzer → IR → Code Generator → Object code
(The symbol table with its access routines and the OS interface serve all of the phases.)

5 Optimizing Compiler Structure
String of characters → Front-End → IR → Control Flow Analysis → CFG → Data Flow Analysis → CFG + information → Program Transformations → IR → Instruction selection → Object code
We saw in the previous lesson how the front-end creates the IR; in this lesson we learn how to build the Control Flow Graph (CFG). In future lessons we will see how to use the CFG for optimizations, and we will also replace the CFG with a more efficient representation that consumes less CPU time. Control flow analysis can start from MIR or LIR; it does not matter which.

6 An Example: Reaching Definitions
A definition is an assignment to a variable. An assignment d reaches a program point if there exists an execution path to that point along which the value assigned at d is still live. This is our motivating example for the technical algorithms to come: the reaching-definitions problem asks which program points, over all execution paths, each assignment can reach. For example:
A: x := 5 ... B: x := 7 ... C:
The assignment of 5 to x reaches label B and no further, because the assignment of 7 to x at B "kills" the earlier one. For branches such as an if, a more sophisticated analysis could estimate which branch will be taken, but compilers are conservative and prefer to generate safe, correct code rather than maximally aggressive code.
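As a minimal source-level illustration of the definition (the function, labels, and variable names here are our own, not from the lecture):

    /* our own tiny example of reaching definitions */
    int example(int m) {
        int x;
        x = 5;        /* d1: a definition of x                           */
        if (m > 1)
            x = 7;    /* d2: a definition of x; kills d1 along this path */
        return x;     /* both d1 and d2 reach this point: d1 via the     */
                      /* fall-through path, d2 via the then-branch       */
    }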

7 Running Example
unsigned int fib(unsigned int m) { unsigned int f0 = 0, f1 = 1, f2, i; if (m <= 1) { return m; } else { for (i = 2; i <= m; i++) { f2 = f0 + f1; f0 = f1; f1 = f2; } return f2; } }
1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m
Reaching-definition annotations from the slide: 1 | 1, 2 | 1, 2, 3 | 1, 2, 3, 5 | 1, 2, 3, 5, 8, 9, 10, 11 | 1, 2, 3, 5, 8, 9, 10, 11 | 1, 3, 5, 8, 9, 10, 11 | 1, 5, 8, 9, 10, 11 | 1, 8, 9, 10, 11
We look at C code that computes Fibonacci numbers (left) and the MIR representation produced by the front-end (right). The numbers on the far right of the slide are the line numbers of the assignments that reach each line. Note that on line 7 we include the loop assignments 8, 9, 10, 11: we are conservative and assume that line 7 can be reached from the loop via the goto on line 12. On line 9 there is a change: we omit number 2, because line 9 assigns to f0 and that assignment masks the assignment on line 2. The same happens on lines 10 and 11. We will say more about the uses of this analysis in the following lessons. One example is detecting uses of uninitialized variables: the compiler can assign a dummy value to each variable, and if it discovers a use of the dummy value it issues a warning; the warning might be a false one, because the compiler is conservative and checks all execution paths. Another use is detecting loop-invariant values, characterized by having only assignments outside the loop, which can be moved before the loop to improve performance. Since a significant part of execution happens inside loops, this optimization can be very effective.

8 entry 1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m 2, 3 2, 3, 5, 8, 9, 10, 11 2, 3, 5, 8, 9, 10, 11 2, 3, 5, 8, 9, 10, 11 We will concern ourselves only with control flow, not with values. We will analyze basic blocks, defined as sequences of commands that execute together, with no jumps into or out of the block. It is a sad fact that the more advanced the programming language (e.g. object-oriented), the shorter the basic blocks are, because of the excessive delegation of operations between objects and modules. We do the same analysis as before, this time on blocks. The question is: how does the compiler 'know' the structure of the program? 2, 3, 5, 8, 9, 10, 11 2, 3 exit

9 Approaches for Data Flow Analysis
Iterative: compute natural loops and iterate on the CFG
Interval based: reduce the CFG to a single node and inductively define the data flow solution
Structural: identify control flow structures in the CFG
There are different approaches to data flow analysis. The intuitive one is iteration over the basic-block graph: we initialize every point with "nothing reaches here" and iterate until we arrive at a fix-point; the iteration eventually stops because there is a finite number of assignments. This approach is simple and straightforward, but the computation is heavy. The second approach is to solve a set of equations: reduce the graph to a single node and then reverse the operations and calculate. The third approach is to move the computation to the construction of the compiler itself and teach it to find structures in the CFG.
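A minimal sketch of the iterative approach for reaching definitions, using bit vectors; the block/definition encoding and all names (gen, kill, in, out, pred) are our own assumptions, not code from the book:

    #define MAXPRED 8
    typedef unsigned long long bits;   /* a definition set; assumes at most 64 definitions */

    void reaching_defs(int NB, int pred[][MAXPRED], const int npred[],
                       const bits gen[], const bits kill[], bits in[], bits out[])
    {
        for (int b = 0; b < NB; b++)
            in[b] = out[b] = 0;                          /* nothing reaches yet        */
        int changed = 1;
        while (changed) {                                /* iterate to a fix-point     */
            changed = 0;
            for (int b = 0; b < NB; b++) {               /* best in reverse post order */
                bits newin = 0;
                for (int i = 0; i < npred[b]; i++)
                    newin |= out[pred[b][i]];            /* meet is set union          */
                bits newout = gen[b] | (newin & ~kill[b]);   /* transfer function      */
                if (newin != in[b] || newout != out[b]) {
                    in[b] = newin;
                    out[b] = newout;
                    changed = 1;
                }
            }
        }
    }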

10 entry 1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m 2, 3 2, 3, 5, 8, 9, 10, 11 2, 3, 5, 8, 9, 10, 11 2, 3, 5, 8, 9, 10, 11 An example of the iterative solution, visiting the nodes in reverse post order. In the worst case every assignment reaches every statement. At the end we reach a fix-point. The main disadvantage of this solution is the price in running time: it is a very heavy process. 2, 3, 5, 8, 9, 10, 11 2, 3 exit

11 entry exit {9, 10}, {1, 2, 3} {11}, {5} {2, 3, 5}, {8, 9, 10, 11}
1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m {9, 10}, {1, 2, 3} {11}, {5} {2, 3, 5}, {8, 9, 10, 11} An example of the second approach: we compute the effect of the program and then reverse the process. Next to each block we mark the assignments that the block kills and the assignments that it generates. In the right braces are the assignments that the block generates, and in the left braces the assignments that it kills or masks. For instance, the first block generates assignments 1, 2, 3 and kills assignments 9, 10. The assumption is that we can compute this from the front-end information. In the next slides we will reduce the CFG to a single node. exit

12 entry exit {11}, {5} , {8, 9, 10, 11} {9, 10}, {1, 2, 3}
We observe that we can eliminate the loop and replace it with a single node. We assume that if the loop is not executed it cannot kill any assignments, so we replace it with a node that generates the loop's assignments and kills nothing. exit

13 entry {9, 10}, {1, 2, 3} , {8, 9, 10, 11, 5} We can join the two blocks by saying that they don’t kill assignment 11 because the second block creates it. exit

14 entry {9, 10}, {1, 2, 3} , {8, 9, 10, 11, 5} This join has no effect. exit

15 entry {9, 10}, {1, 2, 3} , {8, 9, 10, 11, 5} This join also has no effect. exit

16 entry , {8, 9, 10, 11, 5} {9, 10}, {1, 2, 3} We have to paths that we will replace with one. The joined path will create the join of the two paths and kills the disjunction of them. exit

17 entry , {1, 2, 3, 8, 9, 10, 11, 5} A trivial join. exit

18 entry , {1, 2, 3, 8, 9, 10, 11, 5} This solution is different from the previous one. It’s the reaching from the entry to the exit, the value at the exit as a function of the entry. The advantage is that we don’t do any computation. We transferred the computation to the building of the compiler. We will discuss the operation on the CFG in order to do this process. We will identify loops and analyze the importance of the loops in order to find an effective solution. To work … exit

19 Finding Basic Blocks
A basic block is a maximal sequence of straight-line IR instructions with no fork or join.
A leader is an IR instruction that is:
- the entry of a routine
- the target of a branch instruction
- the instruction immediately following a branch
In other words, a basic block is a maximal sequence of IR instructions with no jumps into or out of its middle; the way to identify basic blocks is through the concept of a leader. The assumption is that we analyze only one procedure at a time. This is simple to do, but it produces less effective code than analyzing more than one procedure.

20 Constructing basic blocks
Input: a sequence of MIR instructions
Output: a list of basic blocks in which each MIR instruction occurs in exactly one block
Method: determine the leaders of the basic blocks:
- the first instruction in the procedure is a leader
- any instruction that is the target of a jump is a leader
- any instruction that immediately follows a branch is a leader
For each leader, its basic block consists of the leader and all instructions up to, but not including, the next leader or the end of the program.
Following this algorithm we can construct the basic blocks: we start by identifying the leaders, and then each basic block begins at a leader and contains all instructions up to the next leader.
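A sketch of the leader-marking step in C; the instruction encoding is our own assumption (is_branch[i] says whether MIR instruction i is a goto or conditional branch, and target[i] is the index of its target instruction):

    void find_leaders(int n, const int is_branch[], const int target[],
                      int leader[])
    {
        for (int i = 0; i < n; i++)
            leader[i] = 0;
        leader[0] = 1;                     /* first instruction of the procedure */
        for (int i = 0; i < n; i++) {
            if (is_branch[i]) {
                leader[target[i]] = 1;     /* the target of a jump is a leader   */
                if (i + 1 < n)
                    leader[i + 1] = 1;     /* the instruction after a branch     */
            }
        }
        /* A basic block is then a leader plus all instructions up to, but   */
        /* not including, the next leader or the end of the procedure.       */
    }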

21 Running Example unsigned int fib(unsigned int m)
{ unsigned int f0 = 0, f1 = 1, f2, i; if (m <= 1) { return m; } else { for (i = 2; i <= m; i++) { f2 = f0 + f1; f0 = f1; f1 = f2; } return f2; } } 1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m Let's find the leaders in this example.

22 Running Example unsigned int fib(unsigned int m)
{ unsigned int f0 = 0, f1 = 1, f2, i; if (m <= 1) { return m; } else { for (i = 2; i <= m; i++) { f2 = f0 + f1; f0 = f1; f1 = f2; } return f2; } } 1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m The leaders are marked in red. Line 1 begins the procedure. Lines 5 and 7 are leaders because they immediately follow a branch. Lines 6, 8, and 13 are targets of a jump.

23 Running Example B1 B2 B3 B4 B5 B6 unsigned int fib(unsigned int m)
{ unsigned int f0 = 0, f1 = 1, f2, i; if (m <= 1) { return m; } else { for (i = 2; i <= m; i++) { f2 = f0 + f1; f0 = f1; f1 = f2; } return f2; } } 1: receive m(val) 2: f0 ← 0 3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m B1 B2 B3 B4 These are the basic blocks we have found. The only thing left is to draw the edges between the blocks. B5 B6

24 Constructing Control Flow Graph (CFG)
A special entry block r with no predecessors and a special exit block with no successors. There is an edge m → n if:
- m = entry and the first instruction in n begins the procedure
- n = exit and the last instruction in m is a return or the last instruction in the procedure
- there is a branch from the last instruction in m to the first instruction in n
- the first instruction in n immediately follows the last instruction in m and m does not end in an unconditional branch (fall-through)
To simplify things we add the two special blocks, one for the entry and one for the exit, and then add edges following these rules.
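A sketch of the edge rules in C. Assumptions (ours, not the book's): blocks are numbered 0..NB-1 in program order, ENTRY/EXIT are extra node ids, and the helpers are supplied by the surrounding CFG code:

    /* prototypes for helpers assumed to exist elsewhere (hypothetical names) */
    void add_edge(int from, int to);
    int ends_in_return(int b), ends_in_goto(int b), is_branch_block(int b);
    int branch_target_block(int b);
    enum { ENTRY = -1, EXIT = -2 };

    void build_cfg_edges(int NB)
    {
        add_edge(ENTRY, 0);                           /* entry -> first block        */
        for (int b = 0; b < NB; b++) {
            if (ends_in_return(b) || b == NB - 1)
                add_edge(b, EXIT);                    /* return / last block -> exit */
            if (is_branch_block(b))
                add_edge(b, branch_target_block(b));  /* branch -> its target block  */
            if (b + 1 < NB && !ends_in_goto(b) && !ends_in_return(b))
                add_edge(b, b + 1);                   /* fall through to next block  */
        }
    }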

25 Running Example B1 B2 B3 B4 B5 B6 1: receive m(val) 2: f0 ← 0
3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m B1 B2 B3 B4 We now apply the rules to the example. B5 B6

26 entry exit 1: receive m(val) 2: f0 ← 0 3: f1 ← 1
4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m And these are the edges we get by following the rules. exit

27 How to treat call instructions?
A call is an atomic instruction
A call ends a basic block
Replace the call by the procedure body (inlining)
A call is a "goto" into the procedure
A call is handled in a special way
According to our previous definitions a function call does not end a basic block, but is this a good treatment of calls, or should we end a basic block after a call? The simplest way is to treat a call as an atomic operation that does not break the basic block; we rely on the compiler writers to ensure that a call does not affect the registers or variables of the calling procedure. Some potential difficulties with this assumption are listed on the next slide. A possibly more efficient way is to plant the procedure body at the call site, as an inline procedure; this affects the size of the generated code.

28 Potential Difficulties
Gotos outside procedure boundaries
Exit/trap calls
Exception handling
Computed gotos
setjmp()/longjmp() calls
In Pascal, goto statements may jump outside the procedure. Languages with exception handling can be problematic because a call can break a basic block. setjmp()/longjmp() calls in C can also break a basic block. Most compilers tend not to deal with sophisticated features like these and use the methods described on the previous slide, such as inlining.

29 Approaches for Data Flow Analysis
Iterative: compute natural loops and iterate on the CFG
Interval based: reduce the CFG to a single node and inductively define the data flow solution
Structural: identify control flow structures in the CFG
We now return to data flow analysis and investigate the different approaches.

30 Identifying Natural Loops
A basic block m dominates a basic block n if every path from the entry to n includes m. The domination relation is reflexive, transitive, and anti-symmetric, so it can be represented as a tree. A back edge is an edge m → n such that n dominates m; the natural loop of m → n contains the blocks on the paths from n to m.
This is the first, iterative, approach. How can we identify a loop? One suggestion is to look for backward jumps, but that is ambiguous: in the example on slide 34, which block jumps backward, B2 or B3? So this is not a trivial problem. To identify loops we define dominators, and we can build a tree that represents the domination relation (the slide draws it for the slide 34 example, over the nodes B1 B2 B3 B4 B5). Having defined back edges, we can see that the slide 34 example is "bad" because we cannot decide whether the loop belongs to B2 or to B3. Statistics show that such cases are very rare; there are not many irreducible graphs like this one in real software. We can now define a loop by its back edge.
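Once the DOM sets have been computed (the algorithm appears a few slides ahead), the back-edge test is immediate; this is a sketch with our own bit-set representation (at most 64 blocks):

    /* m -> n is a back edge iff n dominates m.                        */
    int is_back_edge(int m, int n, const unsigned long long DOM[])
    {
        return (DOM[m] >> n) & 1ULL;   /* DOM[m]: bit set of the blocks dominating m */
    }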

31 entry B0 B1 B2 B3 B4 B5 B7 B6 exit 1: receive m(val) 2: f0 ← 0
3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m B1 B2 B3 B4 We can now see that the edge from B5 to B3 is a back edge, because B3 dominates B5. B5 B7 B6 exit

32 Reducible Flow Graphs All the loops are natural
Can be "reduced" into a single node via a sequence of special transformations, for example the T1, T2 transformations
Every loop has a single entry
Result from "well structured" programs
Most programs compile into reducible flow graphs
The first three statements are equivalent characterizations of reducible flow graphs. The first refers to the definition of natural loops from the previous slides. The second is due to Tarjan, using the T1, T2 transformations shown on the next slide. The third is the intuitive one. The fourth is an informal observation: statistics show that about 98% of programs (100% in languages without gotos) compile into reducible flow graphs.

33 T1/T2 Transformations
T1: remove a self-loop, i.e. an edge n → n.
T2: if a node n has a unique predecessor p, merge n into p (p inherits n's outgoing edges).
These are the transformations defined by Tarjan.
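A sketch of reducibility checking by exhaustively applying T1 and T2; the bit-set representation (at most 64 nodes, all reachable from the entry) is our own, and succ[] is modified in place, so pass a copy of the successor sets:

    #define MAXB 64

    int reducible(int NB, int entry, unsigned long long succ[])
    {
        int dead[MAXB] = {0};
        int alive = NB, changed = 1;
        while (changed) {
            changed = 0;
            for (int n = 0; n < NB; n++) {
                if (dead[n]) continue;
                if ((succ[n] >> n) & 1ULL) {              /* T1: drop a self-loop  */
                    succ[n] &= ~(1ULL << n);
                    changed = 1;
                }
                if (n == entry) continue;
                int p = -1, preds = 0;
                for (int m = 0; m < NB; m++)
                    if (!dead[m] && ((succ[m] >> n) & 1ULL)) { p = m; preds++; }
                if (preds == 1) {                         /* T2: merge n into p    */
                    succ[p] &= ~(1ULL << n);              /* drop the edge p -> n  */
                    succ[p] |= succ[n];                   /* p inherits n's edges  */
                    dead[n] = 1;
                    alive--;
                    changed = 1;
                }
            }
        }
        return alive == 1;   /* reducible iff the graph collapses to a single node */
    }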

34 Bad Example B1 B2 B3 B4 B5 We can see that in this example we cannot reduce the graph with the T1/T2 transformations: after reducing as far as possible we get stuck and can do nothing else. Compilers must deal with every possible code, so what do we do with an irreducible graph? One option is to fall back to iteration: most compilers first check whether the graph is reducible and, if not, simply iterate. Another option is to use node splitting. B1 B2 B3

35 Node Splitting B1 B1 B2 B3 B3 B2 B4 B3a B5 B5 B4
Here we duplicated the code of B3 into B3a. The resulting program is equivalent, because all the executable paths are still in place, and in this way the graph becomes reducible. Intuitively, every loop now has a single entry point; we could also verify this by computing the dominators. The slide shows the reduction step by step, roughly: T2 merges B4 into B2 and then B3a into B2-B4, B3 merges into B1, T1 removes a self-loop, and further T2 steps fold B2-B4-B3a and finally B5 into B1-B3, leaving the single node B1-B3-B2-B4-B3a-B5.

36 Why can’t we construct loops from source?
Language dependent
Non uniform
Source-to-source transformations
Most programming languages support "wild" GOTOs
The obvious question is: why not construct loops from the source? Why not make everyone work in a language like Java, where there are no gotos? The answer is that there is no universal agreement on programming languages, and there are advantages to using gotos. If we worked on the source there would be many control statements to recognize and handle. In terms of sound software engineering there is much more flexibility in working on MIR and separating the compiler into front-end and back-end modules, and the algorithm we described is not too heavy or complicated to use. Working on a lower-level representation makes the transformations easier than working with a rigid high-level language.

37 Depth-first spanning tree
Input: a flow graph G = (N, E, r)
Output: a depth-first spanning tree (N, T)
Method: T := Ø; for each node n in N do mark n unvisited; call DFS(r)
Using: procedure DFS(n) is
  mark n visited;
  for each n → s in E do
    if s is not visited then
      add the edge n → s to T; call DFS(s)
We now describe the DFS algorithm, which is useful for several stages of the compiler. We build a spanning tree using DFS. We assume that the graph has a root, which is true because we added the special entry node.

38 Better DFS Implementations
Explicit stack instead of recursion
Pointer reversal
We can implement DFS more efficiently with a simple explicit stack, or with a more sophisticated technique that uses pointer reversal.

39 Pre-ordering Input: a flow graph G=(N,E,r)
Output: a depth-first spanning tree (N, T) and an ordering Pre of N
Method: T := Ø; for each node n in N do mark n unvisited; i := 1; call DFS(r)
Using: procedure DFS(n) is
  mark n visited; Pre(n) := i; i := i + 1;
  for each n → s in E do
    if s is not visited then
      add the edge n → s to T; call DFS(s)
We use the DFS algorithm to create a pre-ordering of the nodes. We will use this ordering to reduce the number of iterations of the iterative algorithm.
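A direct C rendering of the two procedures above (slides 37 and 39), using our own array-based graph representation; none of these names are prescribed by the book:

    #define MAXN 256
    static int nsucc[MAXN];                 /* number of successors of node n    */
    static int succ[MAXN][MAXN];            /* succ[n][k]: the k-th successor    */
    static int visited[MAXN], Pre[MAXN];
    static int tree_from[MAXN], tree_to[MAXN], ntree;   /* spanning-tree edges T */
    static int counter;

    static void dfs(int n)
    {
        visited[n] = 1;
        Pre[n] = counter++;                 /* preorder number of n              */
        for (int k = 0; k < nsucc[n]; k++) {
            int s = succ[n][k];
            if (!visited[s]) {
                tree_from[ntree] = n;       /* add the edge n -> s to T          */
                tree_to[ntree] = s;
                ntree++;
                dfs(s);
            }
        }
    }

    void spanning_tree(int N, int root)
    {
        for (int n = 0; n < N; n++)
            visited[n] = 0;
        ntree = 0;
        counter = 1;                        /* Pre numbers start at 1            */
        dfs(root);
    }

Numbering each node a second time when dfs returns gives a postorder; reversing that numbering gives the reverse post order used by the iterative algorithms.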

40 Computing dominators Input: a flow graph G=(N,E,r)
Output: for each node n, a set DOM(n) of dominators of n
Method: DOM(r) := { r };
  for each n in N \ { r } do DOM(n) := N;
  while changes in some DOM(n) do
    for each n in N \ { r } do
      DOM(n) := { n } ∪ ∩ { DOM(p) | p → n is in E }
This algorithm computes the dominators. It is based on the fact that a node m (other than n itself) dominates n exactly when m dominates every predecessor of n. We initialize every node to be dominated by "the whole world" and then iterate until we reach a fix-point. The best order is to traverse the nodes in reverse post order, because we want to have analyzed all the predecessors of a node before analyzing the node itself. The problem with this algorithm is that it can take quadratic time or worse; there are more efficient implementations that use special data structures such as bit vectors.
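A bit-vector sketch of the algorithm above; the representation (blocks 0..N-1 with N ≤ 64 so a dominator set fits in one word, r the entry block, pred[n] listing the predecessors of n) is our own assumption:

    #define MAXPRED 8
    typedef unsigned long long bits;

    void dominators(int N, int r, int pred[][MAXPRED], const int npred[], bits DOM[])
    {
        bits all = (N == 64) ? ~0ULL : ((1ULL << N) - 1);
        for (int n = 0; n < N; n++)
            DOM[n] = all;                              /* DOM(n) := N                 */
        DOM[r] = 1ULL << r;                            /* DOM(r) := { r }             */
        int changed = 1;
        while (changed) {                              /* iterate to a fix-point      */
            changed = 0;
            for (int n = 0; n < N; n++) {              /* best in reverse post order  */
                if (n == r) continue;
                bits d = all;
                for (int i = 0; i < npred[n]; i++)
                    d &= DOM[pred[n][i]];              /* intersect over predecessors */
                d |= 1ULL << n;                        /* plus n itself               */
                if (d != DOM[n]) {
                    DOM[n] = d;
                    changed = 1;
                }
            }
        }
    }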

41 entry B0 B1 B2 B3 B4 B5 B7 B6 exit 1: receive m(val) 2: f0 ← 0
3: f1 ← 1 4: if m <= 1 goto L3 5: i ← 2 6: L1: if i <= m goto L2 7: return f2 8: L2: f2 ← f0 + f1 9: f0 ← f1 10: f1 ← f2 11: i ← i + 1 12: goto L1 13: L3: return m B1 B2 B3 B4 In our example it is very simple: one iteration suffices to compute the dominators. B5 B7 B6 exit

42 Other Algorithms for Finding Dominators
Lengauer & Tarjan: an E log N algorithm
Harel: a linear-time algorithm
Thorup: a linear-time algorithm
Alstrup & Lauridsen: an incremental algorithm
Lengauer & Tarjan gave a very tricky E log N algorithm to compute dominators; remember that control flow graphs are usually sparse, so E is close to N. An error was later found in Harel's algorithm, and it is more than linear. More recently, Thorup found a linear-time algorithm, but in most cases it is slower than Lengauer & Tarjan's. There are also incremental algorithms that reuse previously computed information to update the dominators instead of recomputing them from scratch.

43 Computing natural loops
Input: a flow graph G = (N, E, r) and a back edge m → n
Output: a set, loop, of the nodes in the natural loop of m → n
Method: stack := empty; loop := { n }; call add(m);
  while stack is not empty do
    pop d from the stack;
    for each p with p → d in E do call add(p)
Using: procedure add(p) is
  if p is not in loop then
    loop := loop ∪ { p }; push p on the stack
After we have computed the dominators, this is the algorithm that computes the natural loop of a back edge. It also correctly identifies loops nested inside other loops.
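The same worklist in C, under the representation assumptions used in the earlier sketches (pred[n] lists the predecessors of block n; in_loop[] is the resulting membership array):

    #define MAXN 256
    #define MAXPRED 8

    void natural_loop(int m, int n, int N, int pred[][MAXPRED], const int npred[],
                      int in_loop[])
    {
        int stack[MAXN], sp = 0;
        for (int b = 0; b < N; b++)
            in_loop[b] = 0;
        in_loop[n] = 1;                           /* loop := { n }            */
        if (!in_loop[m]) {                        /* add(m)                   */
            in_loop[m] = 1;
            stack[sp++] = m;
        }
        while (sp > 0) {                          /* while stack is not empty */
            int d = stack[--sp];
            for (int i = 0; i < npred[d]; i++) {
                int p = pred[d][i];
                if (!in_loop[p]) {                /* add(p)                   */
                    in_loop[p] = 1;
                    stack[sp++] = p;
                }
            }
        }
    }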

44 Issues Natural loops with different headers are either disjoint or nested within each other. But what about loops which share a header? Loops with different headers are easy to deal with, but what do we do with loops that share the same header? For instance:

45 Two Loops with the same header
B1: i = 1
    if (i >= 100) goto B4
    else if ((i % 10) == 0) goto B3
    else
B2: i++; goto B1
B3:
B4: ...

B1: if (i < j) goto B2
    else if (i > j) goto B3
    else goto B4
B2: i++; goto B1
B3:
B4: ...

In these code examples there are loops that share a header. In the first example it is reasonable to assume that the B2 loop is nested inside the B3 loop. In the second example the relation between the loops is not clear. Most compilers will not dwell on this question; they simply treat the two loops as one big loop. The goal of the compiler is to create efficient code, not to settle the precise relation between the loops.

46 Strongly connected components
Input: a flow graph G = (N, E, r)
Output: a set of strongly connected components
Method: for all n in N do mark n unvisited;
  i := 1; stack := empty;
  while there exists an unvisited node n do call SCC(n)
Using: procedure SCC(n) is ... (next slide)
This algorithm (continued on the next slide) computes SCCs and is a good solution for irreducible graphs. The advantage of splitting the graph into SCCs is that we can deal with each SCC separately and afterwards assume that the reaching information of the nodes inside the SCC will never change. There is also a more efficient algorithm, due to Tarjan, for computing SCCs.

47 procedure SCC(n) is
  mark n visited;
  Pre(n) := i; Low(n) := i;      (lowest number of a node in n's SCC)
  i := i + 1;
  push n on the stack;
  for each n → s in E do
    if s is not visited then
      call SCC(s);
      Low(n) := min(Low(n), Low(s))
    else if Pre(s) < Pre(n) and s is on the stack then      (back or cross edge)
      Low(n) := min(Low(n), Pre(s));
  if Low(n) = Pre(n) then      (n is the root of an SCC)
    SCC := Ø;
    repeat
      pop d off the stack;
      SCC := SCC ∪ { d }
    until d = n;
    return SCC
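A compact C rendering of the procedure above, driven by the loop from slide 46; the array-based stack, the globals, and emit_scc_member (a placeholder for whatever the caller does with each component) are our own:

    #define MAXN 256
    static int nsucc[MAXN], succ[MAXN][MAXN];
    static int Pre[MAXN], Low[MAXN], visited[MAXN], on_stack[MAXN];
    static int stk[MAXN], sp, counter;

    static void emit_scc_member(int root, int member)
    {
        (void)root; (void)member;                    /* record or print here   */
    }

    void scc(int n)
    {
        visited[n] = 1;
        Pre[n] = Low[n] = counter++;
        stk[sp++] = n;
        on_stack[n] = 1;
        for (int k = 0; k < nsucc[n]; k++) {
            int s = succ[n][k];
            if (!visited[s]) {                       /* tree edge              */
                scc(s);
                if (Low[s] < Low[n]) Low[n] = Low[s];
            } else if (Pre[s] < Pre[n] && on_stack[s]) {   /* back or cross edge */
                if (Pre[s] < Low[n]) Low[n] = Pre[s];
            }
        }
        if (Low[n] == Pre[n]) {                      /* n is the root of an SCC */
            int d;
            do {
                d = stk[--sp];
                on_stack[d] = 0;
                emit_scc_member(n, d);               /* d is in n's component   */
            } while (d != n);
        }
    }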

48 Structural Analysis Identify "common" structures in the control flow graph (even irreducible ones) Reduce the CFG into "simple regions" Shift some data flow analysis from compile time to compiler-generation time Can be efficiently implemented via DFS Structural analysis works also on irreducible graphs. It is based on work by Micha Sharir. The idea is to shift computation from the compiler to compiler generation: the compiler writer identifies structures in the CFG and "teaches" the compiler their meaning in terms of data flow. The transformations are worked out at compiler-generation time.

49 Block Schema B1 B2 … Bn This is an example of a pattern: a sequence of blocks that can all be collapsed into a single block by a sequence of T2 transformations.

50 Conditionals B1 B2 B1 B2 B3 B0 B1 B2 Bn
These are conditional patterns. On the left an if-then pattern, on the right an if-then-else pattern, and on the bottom a switch-case pattern. There is a rich set of patterns that can be recognized.

51 Loops B1 B2 B1 B1 B2 B1 B2 B3 Here are some loop patterns.
On the upper left is a simple loop, on the upper right a while loop, on the bottom left a while loop with exits, and so on. We can see that even if our programming language allows goto statements, we can still recognize structure in the CFG. The benefit is that we can go over the code in one pass and identify all the patterns, and then in a second pass compute the data flow. The main disadvantage is that the implementation is very complicated and error prone; as evidence, every edition of the course book contains corrections to the code of the structural-analysis example.

