# Advanced Compiler Techniques: Control Flow Analysis & Local Optimizations

LIU Xianhua, School of EECS, Peking University


## Levels of Optimizations

- Local: inside a basic block
- Global (intraprocedural): across basic blocks; whole-procedure analysis
- Interprocedural: across procedures; whole-program analysis

## The Golden Rules of Optimization: Premature Optimization is Evil

- Donald Knuth: "premature optimization is the root of all evil"
- Optimization can introduce new, subtle bugs
- Optimization usually makes code harder to understand and maintain
- Get your code right first; then, if really needed, optimize it
  - Document optimizations carefully
  - Keep the non-optimized version handy, or even as a comment in your code

## The Golden Rules of Optimization: The 80/20 Rule

- In general, 80% of a program's execution time is spent executing 20% of the code (90%/10% for performance-hungry programs)
- Spend your time optimizing the important 10-20% of your program
- Optimize the common case, even at the cost of making the uncommon case slower

## The Golden Rules of Optimization: Good Algorithms Rule

- The best and most important way of optimizing a program is using good algorithms, e.g. O(n log n) rather than O(n^2)
- However, we still need lower-level optimization to get the most out of our programs
- In addition, asymptotic complexity is not always an appropriate metric of efficiency: the hidden constants may be misleading. E.g. a linear-time algorithm that runs in 100n + 100 time is slower than a cubic-time algorithm that runs in n^3 + 10 time if the problem size is small

## General Optimization Techniques

- Strength reduction: use the fastest version of an operation, e.g. `x >> 2` instead of `x / 4`, or `x << 1` instead of `x * 2`
- Common subexpression elimination: eliminate redundant calculations. E.g.

  ```java
  double x = d * (lim / max) * sx;
  double y = d * (lim / max) * sy;
  ```

  becomes

  ```java
  double depth = d * (lim / max);
  double x = depth * sx;
  double y = depth * sy;
  ```

## General Optimization Techniques

- Code motion: invariant expressions should be executed only once. E.g.

  ```java
  for (int i = 0; i < x.length; i++)
      x[i] *= Math.PI * Math.cos(y);
  ```

  becomes

  ```java
  double picosy = Math.PI * Math.cos(y);
  for (int i = 0; i < x.length; i++)
      x[i] *= picosy;
  ```

## General Optimization Techniques

- Loop unrolling: the overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop. E.g.

  ```java
  double picosy = Math.PI * Math.cos(y);
  for (int i = 0; i < x.length; i++)
      x[i] *= picosy;
  ```

  becomes (assuming `x.length` is even)

  ```java
  double picosy = Math.PI * Math.cos(y);
  for (int i = 0; i < x.length; i += 2) {
      x[i]   *= picosy;
      x[i+1] *= picosy;
  }
  ```

## Compiler Optimizations

- Compilers try to generate good (i.e. fast) code
- Code improvement is challenging: many problems are NP-hard
- Code improvement may slow down the compilation process; in some domains, such as just-in-time compilation, compilation speed is critical

## Phases of Compilation

- The first three phases are language-dependent
- The last two are machine-dependent
- The middle two depend on neither the language nor the machine

## Control Flow

- Control transfer = branch (taken or fall-through)
- Control flow is the branching behavior of an application: what sequences of instructions can be executed
- Execution (dynamic control flow): the direction of a particular instance of a branch; predict, speculate, squash, etc.
- Compiler (static control flow): not executing the program; the input is not known, so consider what could happen
- Control flow analysis: determining properties of the program's branch structure; determining instruction execution properties

## Basic Blocks

A basic block is a maximal sequence of consecutive three-address instructions with the following properties:

- The flow of control can only enter the basic block through the first instruction in the block (no jumps into the middle of the block)
- Control will leave the block without halting or branching, except possibly at the last instruction in the block

Basic blocks become the nodes of a flow graph, with edges indicating the order.

## Examples

```
 1) i = 1
 2) j = 1
 3) t1 = 10 * i
 4) t2 = t1 + j
 5) t3 = 8 * t2
 6) t4 = t3 - 88
 7) a[t4] = 0.0
 8) j = j + 1
 9) if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
```

Source:

```
for i from 1 to 10 do
    for j from 1 to 10 do
        a[i, j] = 0.0
for i from 1 to 10 do
    a[i, i] = 0.0
```

## Identifying Basic Blocks

- Input: a sequence of instructions instr(i)
- Output: a list of basic blocks
- Method:
  - Identify leaders: the first instruction of each basic block
  - Iterate: add subsequent instructions to the basic block until we reach another leader

## Identifying Leaders

Rules for finding leaders in code:

- The first instruction in the code is a leader
- Any instruction that is the target of a (conditional or unconditional) jump is a leader
- Any instruction that immediately follows a (conditional or unconditional) jump is a leader

## Basic Block Partition Algorithm

```
leaders = {1}                          // start of program
for i = 1 to n                         // all instructions
    if instr(i) is a branch
        leaders = leaders ∪ targets of instr(i) ∪ {i + 1}
worklist = leaders
while worklist not empty
    x = first instruction in worklist
    worklist = worklist - {x}
    block(x) = {x}
    for (i = x + 1; i <= n && i not in leaders; i++)
        block(x) = block(x) ∪ {i}
```
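The partition algorithm above can be sketched in Python. This is a minimal sketch, assuming a hypothetical instruction encoding in which `branches` maps each branch instruction's index to the list of target indices:

```python
def partition_into_blocks(n, branches):
    """Split instructions 1..n into basic blocks.

    branches: dict mapping a branch instruction's index to the list of
    indices it may jump to (hypothetical encoding for this sketch).
    """
    leaders = {1}                                  # rule 1: first instruction
    for i, targets in branches.items():
        leaders.update(targets)                    # rule 2: branch targets
        if i + 1 <= n:
            leaders.add(i + 1)                     # rule 3: instr after a branch
    starts = sorted(leaders)
    # each block runs from its leader up to the next leader (or the end)
    return [list(range(start, (starts[k + 1] if k + 1 < len(starts) else n + 1)))
            for k, start in enumerate(starts)]
```

Running it on the 17-instruction example above, with branches at instructions 9, 11, and 17, yields the six blocks of the earlier example.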

## Basic Block Example

Leaders are instructions 1, 2, 3, 10, 12, and 13, giving blocks A through F:

```
 1) i = 1                  <- leader, block A
 2) j = 1                  <- leader, block B
 3) t1 = 10 * i            <- leader, block C
 4) t2 = t1 + j
 5) t3 = 8 * t2
 6) t4 = t3 - 88
 7) a[t4] = 0.0
 8) j = j + 1
 9) if j <= 10 goto (3)
10) i = i + 1              <- leader, block D
11) if i <= 10 goto (2)
12) i = 1                  <- leader, block E
13) t5 = i - 1             <- leader, block F
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)
```

## Control-Flow Graphs

A control-flow graph:

- Node: an instruction or a sequence of instructions (a basic block); two instructions i, j are in the same basic block iff execution of i guarantees execution of j
- Directed edge: potential flow of control
- Distinguished start node (Entry) and end node (Exit): the first and last instructions in the program

## Control-Flow Edges

- Basic blocks = nodes
- Edges: add a directed edge from P to S if:
  - there is a jump/branch from the last statement of P to the first statement of S, or
  - S immediately follows P in program order and P does not end with an unconditional branch (goto/return/call)
- Terminology: P is a predecessor of S; S is a successor of P

## Control-Flow Edge Algorithm

Input: block(i), a sequence of basic blocks. Output: a CFG whose nodes are the basic blocks.

```
for i = 1 to the number of blocks
    x = last instruction of block(i)
    if instr(x) is a branch/jump
        for each target y of instr(x), create edge (i -> block containing y)
    if instr(x) is not an unconditional branch
        create edge (i -> i + 1)
```
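A Python sketch of this edge-construction step, continuing the same hypothetical encoding (blocks as lists of instruction indices, `branches` mapping a branch instruction to its targets, `unconditional` naming the unconditional jumps):

```python
def build_cfg_edges(blocks, branches, unconditional):
    """Return the set of CFG edges as pairs of block indices."""
    # map each block's first instruction (its leader) to the block's index
    leader_to_block = {b[0]: k for k, b in enumerate(blocks)}
    edges = set()
    for k, b in enumerate(blocks):
        last = b[-1]
        for target in branches.get(last, []):      # jump/branch edges
            edges.add((k, leader_to_block[target]))
        if last not in unconditional and k + 1 < len(blocks):
            edges.add((k, k + 1))                  # fall-through edge
    return edges
```

On the running example this produces the fall-through chain plus the two inner-loop self/back edges and the edge from the end of the outer loop back to block B.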

## Dominators

- Definition: given a CFG (V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x
- In the reverse direction, node x post-dominates node y if every path from y to the Exit has to pass through x
- Some properties of dominators:
  - Reflexivity, transitivity, antisymmetry (dominance is a partial order)
  - If x dominates z and y dominates z, then either x dominates y or y dominates x
- Intuition: given some basic block, which blocks are guaranteed to have executed prior to executing it?
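The definition suggests a simple iterative computation; the textbook fixed-point sketch below (not from the slides) starts with dom(Entry) = {Entry} and every other dominator set full, then repeatedly intersects over predecessors:

```python
def compute_dominators(nodes, preds, entry):
    """Iteratively solve dom(n) = {n} ∪ ⋂ dom(p) over predecessors p of n."""
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

For a diamond 1 -> {2, 3} -> 4 -> 5, node 4 is dominated only by 1 and itself, since each of 2 and 3 can be bypassed through the other.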

## Dominator Tree

- A block x immediately dominates block y if x dominates y and there is no intervening block P such that x dominates P and P dominates y. In other words, x is the last dominator on all paths from Entry to y. Each block has a unique immediate dominator.
- A dominator tree is a tree in which each node's children are the nodes it immediately dominates. Because the immediate dominator is unique, this forms a tree, with the start node as its root.

(Figure: a five-node CFG with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}, and its dominator tree.)

## Loops

- Loops come from while, do-while, for, goto, ...
- Many transformations depend on loops
- Back edge: an edge is a back edge if its head dominates its tail
- Loop definition: a set of nodes L in a CFG is a loop if
  1. There is a node called the loop entry: no other node in L has a predecessor outside L
  2. Every node in L has a nonempty path (within L) to the entry of L
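Given dominator sets, finding back edges is a one-line filter. A sketch, where `dom` maps each node to the set of nodes dominating it:

```python
def find_back_edges(edges, dom):
    """An edge (tail, head) is a back edge iff head dominates tail."""
    return {(t, h) for (t, h) in edges if h in dom[t]}
```

For the chain 1 -> 2 -> 3 with the loop edge 3 -> 2 and exit edge 3 -> 4, only 3 -> 2 qualifies, since 2 dominates 3.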

## Example: Back Edges

(Figure: the five-node example shown twice, once as the control-flow graph (CFG) and once as the directed acyclic graph (DAG) obtained by removing back edges; dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5} are annotated.)

## Loop Examples

(Figure: a CFG whose loops are {B3}, {B6}, and {B2, B3, B4}.)

## Identifying Loops

- Motivation: loops account for the majority of runtime, so focus optimization on loop bodies! Removing redundant code and replacing expensive operations there speeds up the program.
- Finding loops is easy in structured code:

  ```
  for i = 1 to 1000
      for j = 1 to 1000
          for k = 1 to 1000
              do something
  ```

  but harder with GOTOs:

  ```
  1     i = 1; j = 1; k = 1;
  2 A1: if i > 1000 goto L1;
  3 A2: if j > 1000 goto L2;
  4 A3: if k > 1000 goto L3;
  5     do something
  6     k = k + 1; goto A3;
  7 L3: j = j + 1; goto A2;
  8 L2: i = i + 1; goto A1;
  9 L1: halt
  ```

## Interval Analysis (T1/T2 Transformations)

(Figures: a sequence of slides showing the T1 transformation, which removes a self-loop from a node, and the T2 transformation, which merges a node into its unique predecessor, applied repeatedly to the five-node example CFG until it collapses to a single node 12345.)

## Structure Analysis

| # | Static Feature | Description |
|---|----------------|-------------|
| 1 | SS_No. | Unique identifier of the typical substructure |
| 2 | Edge_No. | Unique identifier of a control-flow edge within the typical substructure |
| 3 | I_last_of_head | Opcode of the last instruction in the edge's head basic block |
| 4 | Br_direction | Branch direction of the last instruction in the edge's head basic block |
| 5 | I_pre_last | Opcode of the instruction preceding the last instruction in the edge's head basic block |

## Weighted CFG

- Profiling: run the application on one or more sample inputs and record some behavior
  - Control flow profiling: edge profiles, block profiles
  - Path profiling
  - Cache profiling
  - Memory dependence profiling
- Annotate the control flow profile onto a CFG to obtain a weighted CFG
- Optimize more effectively with profile information: optimize for the common case; make educated guesses

(Figure: a CFG from Entry to Exit through BB1-BB7 with edge weights such as 20, 10, 200, and 0.)

## Local Optimization

Optimization of basic blocks (Dragon §8.5)

## Transformations on Basic Blocks

- Eliminating local common subexpressions
- Eliminating dead code
- Reordering statements that do not depend on one another
- Applying algebraic laws to reorder operands of three-address instructions

All of the above require symbolic execution of the basic block to obtain def/use information.

## Simple Symbolic Interpretation: Next-Use Information

- If x is computed in statement i and is an operand of statement j, j > i, its value must be preserved (in a register or in memory) until j
- If x is recomputed at statement k, k > i, with no intervening use, the value computed at i has no further use and can be discarded (i.e. its register reused)
- Next-use information is annotated on statements and in the symbol table
- It is computed in one backward pass over the statements

## Next-Use Information

Definitions. Suppose:

1. Statement i assigns a value to x;
2. Statement j has x as an operand;
3. Control can flow from i to j along a path with no intervening assignments to x.

Then statement j uses the value of x computed at statement i, i.e., x is live at statement i.

## Computing Next-Use

- Use the symbol table to annotate the status of variables
- Each operand in a statement carries additional information: operand liveness (a boolean) and operand next use (a later statement)
- On exit from the block, all temporaries are dead (no next use)

## Algorithm

- INPUT: a basic block B
- OUTPUT: at each statement i: x = y op z in B, the liveness and next-use information for x, y, z
- METHOD: for each statement in B, working backward:
  1. Retrieve the liveness & next-use info of x, y, z from the table and attach it to statement i
  2. Set x to "not live" and "no next use"
  3. Set y and z to "live" and their next uses to i
- Note: steps 2 and 3 cannot be interchanged, e.g. for x = x + y

## Example

```
1. x = 1
2. y = 1
3. x = x + y
4. z = y
5. x = y + z
```

Exit: x live (next use 6); y not live; z not live.

Table state after the backward step at each statement:

| After stmt | x | y | z |
|------------|---|---|---|
| 5 | not live | live, 5 | live, 5 |
| 4 | not live | live, 4 | not live |
| 3 | live, 3 | live, 3 | not live |
| 2 | live, 3 | not live | not live |
| 1 | not live | not live | not live |
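The backward pass can be sketched in Python. A minimal sketch with a hypothetical encoding: each statement is a pair of the defined variable and the list of variables it uses.

```python
def next_use_pass(block, live_on_exit):
    """Backward next-use pass over a basic block.

    block: list of (defined_var, [used_vars]) pairs, one per statement.
    Returns, for each 1-indexed statement i, the table state after the
    backward step at i: var -> (live?, next-use statement or None).
    """
    status = {v: (True, None) for v in live_on_exit}   # live beyond the block
    table = {}
    for i in range(len(block), 0, -1):
        x, uses = block[i - 1]
        status[x] = (False, None)      # step 2: x's old value is dead before i
        for v in uses:
            status[v] = (True, i)      # step 3: operands live, next use at i
        table[i] = dict(status)
    return table
```

Killing x before marking the uses is what makes x = x + y come out right: x ends up live with next use at that statement.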

## Computing Dependencies in a Basic Block: the DAG

- Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples
- Intermediate code optimization: basic block => DAG => improved block => assembly
- Leaves are labeled with identifiers and constants
- Internal nodes are labeled with operators and identifiers

## DAG Representation of Basic Blocks

To construct a DAG for a basic block:

1. There is a node in the DAG for each of the initial values of the variables appearing in the basic block.
2. There is a node N associated with each statement s within the block. The children of N are the nodes corresponding to statements that are the last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and attached to N is the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block; that is, their values may be used later, in another block of the flow graph.

## DAG Construction

Forward pass over the basic block:

- For x = y op z:
  - Find a node labeled y, or create one
  - Find a node labeled z, or create one
  - Create a new node for op, or find an existing one with children y, z (using a hash scheme)
  - Add x to the list of labels for that node
  - Remove the label x from the node on which it previously appeared
- For x = y:
  - Add x to the list of labels of the node that currently holds y

Example block:

```
a = b + c
b = a - d
c = b + c
d = a - d
```

(Figure: the resulting DAG over leaves b0, c0, d0, in which d shares the a - d node with b.)
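A compact sketch of the construction with hash-consing, using a `(label, children)` table for node identity; class and method names are illustrative, and for brevity the sketch tracks only each variable's current node, not full label lists:

```python
class BlockDAG:
    def __init__(self):
        self.nodes = []        # node id -> (label, children ids)
        self.index = {}        # (label, children) -> node id, for hash-consing
        self.var_node = {}     # variable -> id of the node holding its value

    def _find_or_make(self, label, children=()):
        key = (label, children)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def _value(self, y):
        # node for y's current value; create an initial-value leaf y0 if new
        if y not in self.var_node:
            self.var_node[y] = self._find_or_make(y + "0")
        return self.var_node[y]

    def assign(self, x, op, y, z):      # x = y op z
        n = self._find_or_make(op, (self._value(y), self._value(z)))
        self.var_node[x] = n            # x now labels this node

    def copy(self, x, y):               # x = y
        self.var_node[x] = self._value(y)
```

On the example block, the second a - d is found in the table, so b and d end up attached to the same node: a common subexpression.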

## Finding Local Common Subexpressions

In the block

```
a = b + c
b = a - d
c = b + c
d = a - d
```

the statements b = a - d and d = a - d compute the same value, so b and d label the same DAG node. If b is not live on exit, the block can be rewritten as:

```
a = b + c
d = a - d
c = d + c
```

If b is live on exit, a copy b = d must be kept:

```
a = b + c
d = a - d
b = d
c = d + c
```

## LCS: Another Example

```
a = b + c
b = b - d
c = c + d
e = b + c
```

(Figure: the DAG over leaves b0, c0, d0. Note that e = b + c uses the new values of b and c, so its node is distinct from the node for a = b + c even though the two expressions look identical.)

## Dead Code Elimination

- Delete any root (a node with no ancestors) that has no live variables attached
- Repeated application of this transformation removes all nodes from the DAG that correspond to dead code

For the previous block, if a and b are live on exit but c and e are not, the roots for e and then c can be deleted, leaving:

```
a = b + c
b = b - d
```

## The Use of Algebraic Identities

- Eliminate computations
- Reduction in strength
- Constant folding: 2 * 3.14 = 6.28, evaluated at compile time
- Other algebraic transformations:
  - x * y => y * x
  - x > y => x - y > 0
  - a = b + c; e = c + d + b => a = b + c; e = a + d

## Representation of Array References

An assignment to an array kills the DAG nodes for earlier reads of that array:

```
x = a[i]
a[j] = y
z = a[i]
```

The node for a[i] is killed by a[j] = y, so z = a[i] must create a new node; the compiler cannot replace it with z = x, since i and j may be equal.

## Representation of Array References

Here a is an array and b is a position in the array a; the node for x = b[i] is killed by b[j] = y:

```
b = a + 12
x = b[i]
b[j] = y
```

## Pointer Assignments & Procedure Calls

Consider the assignments x = *p and *q = y, where we do not know what p or q points to:

- x = *p is a use of every variable
- *q = y is a possible assignment to every variable
- Therefore the =* operator must take all nodes currently associated with identifiers as arguments (which is relevant for dead-code elimination), and the *= operator kills all other nodes constructed so far in the DAG
- Global pointer analyses can be used to limit the affected set of variables

Procedure calls behave much like assignments through pointers: assume a procedure uses and changes any data to which it has access. If variable x is in the scope of a procedure P, a call to P both uses the node with attached variable x and kills that node.

## Reassembling Basic Blocks From DAGs

(Figure: the same DAG reassembled into two different blocks, one for the case where b is not live on exit and one for the case where it is.)

## Reassembling Basic Blocks From DAGs

The rules of reassembling:

- The order of instructions must respect the order of nodes in the DAG
- Assignments to an array must follow all previous assignments to, or evaluations from, the same array
- Evaluations of array elements must follow any previous assignments to the same array
- Any use of a variable must follow all previous procedure calls or indirect assignments through a pointer
- Any procedure call or indirect assignment through a pointer must follow all previous evaluations of any variable

## Peephole Optimization (Dragon §8.7)

- Introduction to peephole optimization
- Common techniques
- Algebraic identities
- An example

## Peephole Optimization

- Simple compilers do not perform machine-independent code improvement; they generate naive code
- It is possible to examine the generated target code and optimize it: sub-optimal sequences of instructions that match an optimization pattern are transformed into better sequences
- This technique is known as peephole optimization
- Peephole optimization usually works by sliding a window of several instructions (a peephole) over the code

## Peephole Optimization

- Goals: improve performance, reduce memory footprint, reduce code size
- Method:
  1. Examine short sequences of target instructions
  2. Replace each sequence by a more efficient one
- Typical patterns: redundant-instruction elimination, algebraic simplifications, flow-of-control optimizations, use of machine idioms

## Peephole Optimization: Common Techniques

(Figures: four slides of tables showing common peephole patterns and their replacements.)

## Algebraic Identities

It is worth recognizing single instructions with a constant operand:

- Eliminate computations: A * 1 = A; A * 0 = 0; A / 1 = A
- Reduce strength: A * 2 = A + A; A / 2 = A * 0.5
- Constant folding: 2 * 3.14 = 6.28 (more delicate with floating point)

## Is This Ever Helpful?

- Why would anyone write x * 1? Why bother to correct such obvious junk code?
- In fact, one might write

  ```
  #define MAX_TASKS 1
  ...
  a = b * MAX_TASKS;
  ```

- Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

## Replace Multiply by Shift

- A := A * 4 can be replaced by a 2-bit left shift (signed or unsigned), but we must worry about overflow if the language does
- A := A / 4: if A is unsigned, this can be replaced by a shift right; for signed values, arithmetic shift right is a well-known problem, though the language may allow it anyway (traditional C)

## The Right Shift Problem

Arithmetic right shift: shift right and use the sign bit to fill the most significant bits.

```
-5 = 111111...1111111011
SAR  111111...1111111101 = -3
```

But in most languages -5 / 2 = -2, not -3, so the shift alone is wrong for negative operands.
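The discrepancy, and the standard compiler fix (bias a negative dividend by divisor - 1 before shifting), can be demonstrated in Python, whose `>>` is an arithmetic shift:

```python
def shift_div2(x):
    # plain arithmetic shift: rounds toward negative infinity
    return x >> 1

def trunc_div2(x):
    # compiler fix for signed division by 2: add (2 - 1) to negative
    # dividends first, so the result truncates toward zero like C's x / 2
    return (x + 1) >> 1 if x < 0 else x >> 1
```

With this bias, -5 divides to -2 rather than the -3 the bare shift produces.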

## Addition Chains for Multiplication

If multiply is very slow (or the machine has no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:

x * 125 = x * 128 - x * 4 + x

This takes two shifts, one subtract, and one add, which may be faster than one multiply. Note the similarity with the efficient exponentiation method.
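The x * 125 decomposition, as a small sketch with shifts standing in for the multiplies by powers of two:

```python
def mul_125(x):
    # x * 125 == x * (128 - 4 + 1): two shifts, one subtract, one add
    return (x << 7) - (x << 2) + x
```

The identity holds for negative operands as well, since the left shift here is just multiplication by a power of two.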

## Flow-of-Control Optimizations

```
    goto L1              goto L2
    ...           =>     ...
L1: goto L2          L1: goto L2
```

```
    if a < b goto L1         if a < b goto L2
    ...               =>     ...
L1: goto L2              L1: goto L2
```

```
    goto L1                  if a < b goto L2
    ...               =>     goto L3
L1: if a < b goto L2         ...
L3:                      L3:
```

## Peephole Optimization: an Example

Source code:

```
debug = 0
...
if (debug) {
    print debugging information
}
```

Intermediate code:

```
    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2:
```

## Eliminate Jump After Jump

Before:

```
    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
L1: print debugging information
L2:
```

After:

```
    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
L2:
```

## Constant Propagation

Before:

```
    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
L2:
```

After:

```
    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
L2:
```

## Unreachable Code (Dead Code Elimination)

Before:

```
    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
L2:
```

After:

```
    debug = 0
    ...
```

## Peephole Optimization Summary

- Peephole optimization is very fast: small overhead per instruction, since it uses a small, fixed-size window
- It is often easier to generate naïve code and run peephole optimization than to generate good code directly!

## Summary

- Introduction to optimization
- Control flow analysis basics: basic blocks, control-flow graphs
- Local optimizations, including peephole optimizations

## HW & Next Time

- Homework: Exercises 8.4.1, 8.5.1, 8.5.2
- Next time: dataflow analysis (Dragon §9.2)

## If You Want to Get Started...

- Go to http://llvm.org
- Download and install LLVM on your favorite Linux box; read the installation instructions to help you (you will need gcc 4.x)
- Try to run it on a simple C program

## Compiler Support for PSA Prediction

Compiler support: first, the compiler uses subroutine structure information and the static data-dependence graph to determine which data values are strongly correlated with each indirect branch instruction; then it inserts a guiding instruction for each such branch and schedules it.

- Virtual function calls: use the virtual-table base address as the correlated data value
- Switch-case statements: use the normalized case value as the correlated data value
- Function-pointer calls: use the pointer value or the pointer-array index as the correlated data value
