Advanced Compiler Techniques

Presentation on theme: "Advanced Compiler Techniques"— Presentation transcript:

1 Advanced Compiler Techniques
Control Flow Analysis & Local Optimizations
LIU Xianhua
School of EECS, Peking University

2 Levels of Optimizations
Local: inside a basic block
Global (intraprocedural): across basic blocks; whole-procedure analysis
Interprocedural: across procedures; whole-program analysis
“Advanced Compiler Techniques”

3 The Golden Rules of Optimization
Premature optimization is evil
  Donald Knuth: premature optimization is the root of all evil
  Optimization can introduce new, subtle bugs
  Optimization usually makes code harder to understand and maintain
Get your code right first; then, if really needed, optimize it
  Document optimizations carefully
  Keep the non-optimized version handy, or even as a comment in your code

4 The Golden Rules of Optimization
The 80/20 Rule
  In general, 80% of a program's execution time is spent executing 20% of the code
  90%/10% for performance-hungry programs
  Spend your time optimizing the important 10-20% of your program
  Optimize the common case, even at the cost of making the uncommon case slower

5 The Golden Rules of Optimization
Good Algorithms Rule
  The best and most important way of optimizing a program is using good algorithms
    E.g. O(n log n) rather than O(n^2)
  However, we still need lower-level optimization to get more out of our programs
  In addition, asymptotic complexity is not always an appropriate metric of efficiency
    Hidden constants may be misleading
    E.g. a linear-time algorithm that runs in 100*n + 100 time is slower than a cubic-time algorithm that runs in n^3 + 10 time if the problem size is small
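The hidden-constant example can be checked directly. The sketch below only evaluates the two cost formulas quoted on the slide; the function names are illustrative and not part of the lecture.

```python
def linear_cost(n):
    # Cost model of the linear-time algorithm quoted on the slide.
    return 100 * n + 100

def cubic_cost(n):
    # Cost model of the cubic-time algorithm quoted on the slide.
    return n ** 3 + 10

# Smallest problem size at which the linear algorithm is strictly cheaper.
crossover = next(n for n in range(1, 1000) if linear_cost(n) < cubic_cost(n))
```

Under these two cost models the cubic algorithm is cheaper for every n from 1 through 10, and the linear one takes over from n = 11.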

6 General Optimization Techniques
Strength reduction
  Use the fastest version of an operation
  E.g. (for non-negative integers)
    x >> 2 instead of x / 4
    x << 1 instead of x * 2
Common subexpression elimination
  Eliminate redundant calculations
  Before:
    double x = d * (lim / max) * sx;
    double y = d * (lim / max) * sy;
  After:
    double depth = d * (lim / max);
    double x = depth * sx;
    double y = depth * sy;

7 General Optimization Techniques
Code motion
  Invariant expressions should be executed only once
  Before:
    for (int i = 0; i < x.length; i++)
        x[i] *= Math.PI * Math.cos(y);
  After:
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;

8 General Optimization Techniques
Loop unrolling
  The overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop.
  Before:
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;
  After (unrolled by two; assumes x.length is even):
    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i += 2) {
        x[i] *= picosy;
        x[i+1] *= picosy;
    }
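An unrolled loop needs a cleanup step when the trip count is not a multiple of the unroll factor. The following is a minimal sketch of that shape in Python, mirroring the slide's Java example; the function names are hypothetical, and in Python the point is the structure of the transformation rather than any real speedup.

```python
import math

def scale_rolled(x, y):
    factor = math.pi * math.cos(y)   # loop-invariant value, computed once
    for i in range(len(x)):
        x[i] *= factor
    return x

def scale_unrolled(x, y):
    factor = math.pi * math.cos(y)
    n = len(x)
    i = 0
    # Main loop: two elements per iteration, half as many loop tests.
    while i + 1 < n:
        x[i] *= factor
        x[i + 1] *= factor
        i += 2
    # Cleanup: at most one leftover element when n is odd.
    if i < n:
        x[i] *= factor
    return x
```

Both versions perform the same multiplications on the same elements, so they produce identical results for any length, including odd lengths and the empty list.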

9 Compiler Optimizations
Compilers try to generate good code, i.e. fast code
Code improvement is challenging
  Many problems are NP-hard
Code improvement may slow down the compilation process
  In some domains, such as just-in-time compilation, compilation speed is critical

10 Phases of Compilation
The first three phases are language-dependent
The last two are machine-dependent
The middle two depend on neither the language nor the machine

11 Phases

12 Control Flow
Control transfer = branch (taken or fall-through)
Control flow
  Branching behavior of an application
  What sequences of instructions can be executed
Execution: dynamic control flow
  Direction of a particular instance of a branch
  Predict, speculate, squash, etc.
Compiler: static control flow
  Not executing the program
  Input not known, so what could happen
Control flow analysis
  Determining properties of the program branch structure
  Determining instruction execution properties

13 Basic Blocks
A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
  The flow of control can only enter the basic block through the first instruction in the block (no jumps into the middle of the block).
  Control will leave the block without halting or branching, except possibly at the last instruction in the block.
Basic blocks become the nodes of a flow graph, with edges indicating the order.

14 Examples
Source code:
  for i from 1 to 10 do
      for j from 1 to 10 do
          a[i,j] = 0.0
  for i from 1 to 10 do
      a[i,i] = 1.0
Three-address code:
   1) i = 1
   2) j = 1
   3) t1 = 10 * i
   4) t2 = t1 + j
   5) t3 = 8 * t2
   6) t4 = t3 - 88
   7) a[t4] = 0.0
   8) j = j + 1
   9) if j <= 10 goto (3)
  10) i = i + 1
  11) if i <= 10 goto (2)
  12) i = 1
  13) t5 = i - 1
  14) t6 = 88 * t5
  15) a[t6] = 1.0
  16) i = i + 1
  17) if i <= 10 goto (13)

15 Identifying Basic Blocks
Input: sequence of instructions instr(i)
Output: a list of basic blocks
Method:
  Identify leaders: the first instruction of each basic block
  Iterate: add subsequent instructions to the basic block until we reach another leader

16 Identifying Leaders
Rules for finding leaders in code:
  The first instruction in the code is a leader
  Any instruction that is the target of a (conditional or unconditional) jump is a leader
  Any instruction that immediately follows a (conditional or unconditional) jump is a leader

17 Basic Block Partition Algorithm
leaders = {1}                               // start of program
for i = 1 to |n|                            // all instructions
    if instr(i) is a branch
        leaders = leaders U targets of instr(i) U instr(i+1)
worklist = leaders
while worklist not empty
    x = first instruction in worklist
    worklist = worklist - {x}
    block(x) = {x}
    for i = x + 1; i <= |n| && i not in leaders; i++
        block(x) = block(x) U {i}

18 Basic Block Example
Leaders are marked with *; the resulting basic blocks are labeled A to F.
  A  *  1) i = 1
  B  *  2) j = 1
  C  *  3) t1 = 10 * i
        4) t2 = t1 + j
        5) t3 = 8 * t2
        6) t4 = t3 - 88
        7) a[t4] = 0.0
        8) j = j + 1
        9) if j <= 10 goto (3)
  D  * 10) i = i + 1
       11) if i <= 10 goto (2)
  E  * 12) i = 1
  F  * 13) t5 = i - 1
       14) t6 = 88 * t5
       15) a[t6] = 1.0
       16) i = i + 1
       17) if i <= 10 goto (13)
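The leader rules and the partition loop can be sketched in a few lines of Python. This is an illustrative version, not the lecture's code: the instruction encoding (a text plus an optional 1-based jump target) and the names `find_leaders`, `partition` and `PROGRAM` are assumptions, and the program uses the 17-instruction numbering that the goto targets (3), (2) and (13) imply.

```python
def find_leaders(instrs):
    """instrs: list of (text, target); target is a 1-based instruction
    number for jumps and None otherwise."""
    n = len(instrs)
    leaders = {1}                            # rule 1: first instruction
    for i, (_, target) in enumerate(instrs, start=1):
        if target is not None:
            leaders.add(target)              # rule 2: target of a jump
            if i + 1 <= n:
                leaders.add(i + 1)           # rule 3: instruction after a jump
    return sorted(leaders)

def partition(instrs):
    """Return basic blocks as lists of 1-based instruction numbers."""
    leaders = find_leaders(instrs)
    blocks = []
    for k, start in enumerate(leaders):
        # A block runs from its leader up to the instruction before the next leader.
        end = leaders[k + 1] - 1 if k + 1 < len(leaders) else len(instrs)
        blocks.append(list(range(start, end + 1)))
    return blocks

PROGRAM = [
    ("i = 1", None), ("j = 1", None), ("t1 = 10 * i", None),
    ("t2 = t1 + j", None), ("t3 = 8 * t2", None), ("t4 = t3 - 88", None),
    ("a[t4] = 0.0", None), ("j = j + 1", None), ("if j <= 10 goto (3)", 3),
    ("i = i + 1", None), ("if i <= 10 goto (2)", 2), ("i = 1", None),
    ("t5 = i - 1", None), ("t6 = 88 * t5", None), ("a[t6] = 1.0", None),
    ("i = i + 1", None), ("if i <= 10 goto (13)", 13),
]

LEADERS = find_leaders(PROGRAM)
BLOCKS = partition(PROGRAM)
```

On this program the leaders are instructions 1, 2, 3, 10, 12 and 13, giving six blocks, which matches the A to F labeling of the example.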

19 Control-Flow Graphs
Control-flow graph:
  Node: an instruction or sequence of instructions (a basic block)
    Two instructions i, j are in the same basic block iff execution of i guarantees execution of j
  Directed edge: potential flow of control
  Distinguished start node, Entry & Exit
    First & last instruction in program

20 Control-Flow Edges
Basic blocks = nodes
Edges: add a directed edge from P to S if:
  there is a jump/branch from the last statement of P to the first statement of S, or
  S immediately follows P in program order and P does not end with an unconditional branch (goto/return/call)
Definition of predecessor and successor:
  P is a predecessor of S
  S is a successor of P

21 Control-Flow Edge Algorithm
Input: block(i), sequence of basic blocks
Output: CFG where nodes are basic blocks
for i = 1 to the number of blocks
    x = last instruction of block(i)
    if instr(x) is a branch/jump
        for each target y of instr(x), create edge (i -> block whose leader is y)
    if instr(x) is not an unconditional branch, create edge (i -> i+1)
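A minimal Python sketch of the edge algorithm follows. The input encoding is an assumption made for illustration: blocks are lists of instruction numbers in program order, and `jumps` maps a jump instruction to its target and a flag saying whether it is conditional. The data reuses the six blocks of the running example, indexed from 0.

```python
def build_cfg(blocks, jumps):
    """blocks: list of lists of instruction numbers, in program order.
    jumps: {instr_number: (target_instr, is_conditional)}."""
    leader_to_block = {block[0]: idx for idx, block in enumerate(blocks)}
    edges = set()
    for idx, block in enumerate(blocks):
        last = block[-1]
        falls_through = True
        if last in jumps:
            target, is_conditional = jumps[last]
            # Jump edge: to the block whose leader is the jump target.
            edges.add((idx, leader_to_block[target]))
            falls_through = is_conditional
        # Fall-through edge, unless the block ends in an unconditional jump
        # or is the last block of the program.
        if falls_through and idx + 1 < len(blocks):
            edges.add((idx, idx + 1))
    return sorted(edges)

BLOCKS = [[1], [2], [3, 4, 5, 6, 7, 8, 9], [10, 11], [12], [13, 14, 15, 16, 17]]
JUMPS = {9: (3, True), 11: (2, True), 17: (13, True)}
EDGES = build_cfg(BLOCKS, JUMPS)
```

The self-edges (2, 2) and (5, 5) in the result are the two innermost loops, and (3, 1) closes the outer loop of the first nest.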

22 Dominator
Definition: given a CFG(V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x.
In the reverse direction, node x post-dominates node y if every path from y to the Exit has to pass through x.
Some properties of dominators:
  Reflexivity, transitivity, anti-symmetry
  If x dominates z and y dominates z, then either x dominates y or y dominates x
Intuition
  Given some BB, which blocks are guaranteed to have executed prior to executing the BB
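Dominator sets can be computed by a simple fixed-point iteration: dom(n) = {n} united with the intersection of dom(p) over all predecessors p of n. The slide does not prescribe an algorithm, so the sketch below is one common textbook formulation; the function name and the five-node graph are hypothetical.

```python
def dominators(succ, entry):
    """succ: {node: list of successors}; every node must appear as a key."""
    nodes = set(succ)
    pred = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            pred[t].add(n)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in sorted(nodes - {entry}):
            incoming = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Hypothetical graph: 1 branches to 2 and 4, 2 -> 3, and both 3 and 4 reach 5.
CFG = {1: [2, 4], 2: [3], 3: [5], 4: [5], 5: []}
DOM = dominators(CFG, entry=1)
```

Node 5 is reachable both through 3 and through 4, so only node 1 (and 5 itself) dominates it.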

23 Dominator Tree
A block x immediately dominates block y if x dominates y, x ≠ y, and there is no intervening block P, distinct from x and y, such that x dominates P and P dominates y. In other words, x is the last dominator on all paths from entry to y.
Each block other than the entry has a unique immediate dominator.
A dominator tree is a tree where each node's children are those nodes it immediately dominates. Because the immediate dominator is unique, it is a tree. The start node is the root of the tree.
[Figure: example CFG with nodes 1-5 and its dominator tree. Dominator sets: 1: {1}, 2: {1,2}, 3: {1,2,3}, 4: {1,4}, 5: {1,5}.]
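Given the dominator sets, the immediate dominator of a node is its closest strict dominator. Because the strict dominators of a node form a chain, the closest one is the one with the largest dominator set. The sketch below applies this to the five dominator sets listed for the slide's figure; the function names are illustrative.

```python
def immediate_dominators(dom, entry):
    """dom: {node: set of its dominators}. Returns {node: immediate dominator}."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue                      # the entry has no immediate dominator
        strict = ds - {n}
        # Strict dominators form a chain; the closest has the most dominators itself.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom

def dominator_tree(idom):
    """Turn the idom map into {parent: sorted list of children}."""
    children = {}
    for n, parent in sorted(idom.items()):
        children.setdefault(parent, []).append(n)
    return children

DOM = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 4}, 5: {1, 5}}
IDOM = immediate_dominators(DOM, entry=1)
TREE = dominator_tree(IDOM)
```

For these sets the root 1 has children 2, 4 and 5, and node 3 hangs under node 2.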

24 Loops
Loops come from while, do-while, for, goto...
Many transformations depend on loops
Back edge: an edge is a back edge if its head dominates its tail.
Loop definition: a set of nodes L in a CFG is a loop if
  there is a node in L called the loop entry, and no other node in L has a predecessor outside L;
  every node in L has a nonempty path (within L) to the entry of L.
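Back edges can be read off the dominator sets, and each back edge tail -> head yields a loop: the head plus every node that can reach the tail without passing through the head. This is the standard natural-loop construction, which the slide does not spell out; the graph, the precomputed dominator sets and the function names below are hypothetical.

```python
def back_edges(succ, dom):
    """Edges tail -> head whose head dominates the tail."""
    return [(tail, head)
            for tail in sorted(succ)
            for head in succ[tail]
            if head in dom[tail]]

def natural_loop(succ, tail, head):
    """Nodes of the loop defined by the back edge tail -> head."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    loop = {head, tail}
    stack = [tail] if tail != head else []
    # Walk predecessors backward from the tail; the head stops the walk
    # because it is already in the loop set.
    while stack:
        n = stack.pop()
        for p in pred[n]:
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

# Hypothetical CFG with one back edge 3 -> 2, and its dominator sets.
CFG = {1: [2, 4], 2: [3], 3: [2, 5], 4: [5], 5: []}
DOM = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 4}, 5: {1, 5}}
BACK = back_edges(CFG, DOM)
LOOP = natural_loop(CFG, *BACK[0])
```

Here node 2 is the loop entry: it is the only node of {2, 3} with a predecessor outside the loop.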

25 Example: Back Edges
[Figure: a CFG (control flow graph) over nodes 1-5 and the corresponding DAG (directed acyclic graph), each node annotated with its dominator set: 1: {1}, 2: {1,2}, 3: {1,2,3}, 4: {1,4}, 5: {1,5}.]

26 Loop Examples
[Figure: flow graph with its loops marked: {B3}, {B6}, {B2, B3, B4}.]

27 Identifying Loops
Motivation
  Loops account for the majority of runtime
  Focus optimization on loop bodies!
  Remove redundant code, replace expensive operations => speed up program
Finding loops: easy...
    for i = 1 to 1000
        for j = 1 to 1000
            for k = 1 to 1000
                do something
...or harder (GOTOs)
    i = 1; j = 1; k = 1;
    A1: if i > 1000 goto L1;
    A2: if j > 1000 goto L2;
    A3: if k > 1000 goto L3;
        do something
        k = k + 1; goto A3;
    L3: j = j + 1; goto A2;
    L2: i = i + 1; goto A1;
    L1: halt

28 Interval Analysis (T1/T2 Transformations)
[Figures, slides 28-32: the T1 transformation and the T2 transformation, followed by a step-by-step reduction of a five-node flow graph. Nodes 1 and 4 merge into 14, nodes 2 and 3 merge into 23, and the graph finally collapses into the single node 12345.]

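33 Structure Analysis
No.  Static feature    Feature description
1    SS_No.            Unique identifier of the typical substructure
2    Edge_No.          Unique identifier of a control-flow edge within the typical substructure
3    I_last_of_head    Opcode of the last instruction in the edge's head basic block
4    Br_direction      Branch direction of the last instruction in the edge's head basic block
5    I_pre_last        Opcode of the instruction preceding the last instruction in the edge's head basic block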

34 Weighted CFG
Profiling: run the application on one or more sample inputs and record some behavior
  Control flow profiling
    edge profile
    block profile
  Path profiling
  Cache profiling
  Memory dependence profiling
Annotate the control flow profile onto a CFG => weighted CFG
Optimize more effectively with profile info!
  Optimize for the common case
  Make educated guesses
[Figure: weighted CFG from Entry through BB1-BB7 to Exit, with edge weights of 10 and 20.]

35 Local Optimization
Optimization of basic blocks (Dragon §8.5)

36 Transformations on basic blocks
Eliminating local common subexpressions
Eliminating dead code
Reordering statements that do not depend on one another
Applying algebraic laws to reorder operands of three-address instructions
All of the above require symbolic execution of the basic block, to obtain def/use information

37 Simple symbolic interpretation: next-use information
If x is computed in statement i and is an operand of statement j, j > i, its value must be preserved (in a register or in memory) until j.
If x is recomputed at k, k > i, with no use of x between i and k, the value computed at i has no further use and can be discarded (i.e. its register can be reused).
Next-use information is annotated over the statements and the symbol table.
It is computed in one backward pass over the statements.

38 Next-Use Information
Definitions: suppose that
  statement i assigns a value to x;
  statement j has x as an operand;
  control can flow from i to j along a path with no intervening assignments to x.
Then statement j uses the value of x computed at statement i, i.e., x is live at statement i.

39 Computing next-use
Use the symbol table to annotate the status of variables
Each operand in a statement carries additional information:
  Operand liveness (boolean)
  Operand next use (a later statement)
On exit from the block, all temporaries are dead (no next-use)

40 Algorithm
INPUT: a basic block B
OUTPUT: at each statement i: x = y op z in B, liveness and next-use information for x, y, z
METHOD: for each statement i in B, scanning backward:
  1. Retrieve liveness & next-use info for x, y, z from the table and attach it to statement i
  2. Set x to “not live” and “no next-use”
  3. Set y, z to “live” and the next uses of y, z to “i”
Note: steps 2 & 3 cannot be interchanged, e.g. for x = x + y
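The backward pass can be sketched as follows. The statement encoding (a triple of result and up to two variable operands, with constants omitted) and the name `next_use` are assumptions for illustration; variables live on exit get a next use of "one past the block", the convention the example slide uses (x: live, 6 for a five-statement block).

```python
def next_use(block, live_on_exit):
    """block: list of (x, y, z) for 'x = y op z'; y or z is None when the
    operand is a constant or absent. Returns (annotations, final_table)."""
    table = {}
    for stmt in block:
        for v in stmt:
            if v is not None:
                table[v] = (False, None)          # (live, next use)
    for v in live_on_exit:
        table[v] = (True, len(block) + 1)
    annotations = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        operands = [v for v in (y, z) if v is not None]
        # Step 1: attach the current table entries to statement i.
        annotations[i] = {v: table[v] for v in [x] + operands}
        # Step 2: x is redefined here, so its old value is dead above this point.
        table[x] = (False, None)
        # Step 3: y and z are used here (statement numbers are 1-based).
        for v in operands:
            table[v] = (True, i + 1)
    return annotations, table

BLOCK = [("x", None, None),   # 1: x = 1
         ("y", None, None),   # 2: y = 1
         ("x", "x", "y"),     # 3: x = x + y
         ("z", "y", None),    # 4: z = y
         ("x", "y", "z")]     # 5: x = y + z
ANNOTATIONS, TABLE = next_use(BLOCK, live_on_exit={"x"})
```

Doing step 2 before step 3 is what makes `x = x + y` come out right: x is first marked dead as a result, then marked live again as an operand.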

41 Example
Block:
  1: x = 1
  2: y = 1
  3: x = x + y
  4: z = y
  5: x = y + z
Symbol table during the backward pass (liveness, next use):
  After processing   x              y              z
  Exit               live, 6        not live       not live
  5                  not live, no   live, 5        live, 5
  4                  not live, no   live, 4        not live, no
  3                  live, 3        live, 3        not live, no
  2                  live, 3        not live, no   not live, no
  1                  not live, no   not live, no   not live, no

42 Computing dependencies in BB: the DAG
Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples.
Intermediate code optimization: basic block => DAG => improved block => assembly
Leaves are labeled with identifiers and constants.
Internal nodes are labeled with operators and identifiers.

43 DAG Representation of Basic Blocks
Construct a DAG for a basic block:
1. There is a node in the DAG for each of the initial values of the variables appearing in the basic block.
2. There is a node N associated with each statement s within the block. The children of N are those nodes corresponding to statements that are the last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block; that is, their values may be used later, in another block of the flow graph.

44 DAG construction
Forward pass over the basic block
For x = y op z:
  Find node labeled y, or create one
  Find node labeled z, or create one
  Create new node for op, or find an existing one with descendants y, z (need hash scheme)
  Add x to list of labels for new node
  Remove label x from node on which it appeared
For x = y:
  Add x to list of labels of node which currently holds y
Example block:
  a = b + c
  b = a - d
  c = b + c
  d = a - d
[Figure: DAG for this block, with leaves b0, c0, d0.]
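The construction steps can be sketched with a dictionary as the hash scheme. This is an illustrative version that handles only statements of the form x = y op z; the tuple encoding and the name `build_dag` are assumptions, not the lecture's code.

```python
def build_dag(block):
    """block: list of (x, op, y, z) for 'x = y op z'.
    Returns (nodes, labels); node ids are indices into both lists."""
    nodes = []      # ("leaf", name0, None) or (op, left_id, right_id)
    labels = []     # variables currently attached to each node
    current = {}    # variable -> node id that holds its current value
    index = {}      # (op, left_id, right_id) -> node id (the hash scheme)

    def node_for(name):
        # Find the node holding 'name', or create a leaf for its initial value.
        if name not in current:
            nodes.append(("leaf", name + "0", None))
            labels.append([])
            current[name] = len(nodes) - 1
        return current[name]

    for x, op, y, z in block:
        key = (op, node_for(y), node_for(z))
        if key not in index:                 # no existing node computes this value
            nodes.append(key)
            labels.append([])
            index[key] = len(nodes) - 1
        n = index[key]
        # Remove label x from the node on which it appeared, then attach it to n.
        if x in current and x in labels[current[x]]:
            labels[current[x]].remove(x)
        labels[n].append(x)
        current[x] = n
    return nodes, labels

BLOCK = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
         ("c", "+", "b", "c"), ("d", "-", "a", "d")]
NODES, LABELS = build_dag(BLOCK)
```

For this block the fourth statement finds the existing `a - d` node, so that node ends up carrying both labels b and d, which is the local common subexpression.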

45 Finding Local Common Subexpr.
Block:
  a = b + c
  b = a - d
  c = b + c
  d = a - d
[Figure: DAG with leaves b0, c0, d0; the - node carries both labels b and d.]
If b is not live on exit:
  a = b + c
  d = a - d
  c = d + c
If b is live on exit:
  a = b + c
  d = a - d
  b = d
  c = d + c

46 LCS: another example
  a = b + c
  b = b - d
  c = c + d
  e = b + c
[Figure: DAG for this block, with leaves b0, c0, d0 and interior nodes labeled a, b, c, e.]

47 Dead Code Elimination
Delete any root that has no live variables attached
Repeated application of this transformation will remove all nodes from the DAG that correspond to dead code.
Block:
  a = b + c
  b = b - d
  c = c + d
  e = b + c
[Figure: DAG for this block, with leaves b0, c0, d0 and interior nodes labeled a, b, c, e.]
On exit: a, b live; c, e not live
Result:
  a = b + c
  b = b - d
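Repeated root deletion is easy to sketch once the DAG is given explicitly. The node names (`n_a`, `n_b`, ...) and the dictionary encoding below are hypothetical; the DAG itself is the one for the four-statement block on this slide.

```python
def eliminate_dead_roots(children, labels, live):
    """children: {node: list of child nodes} (leaves map to []).
    labels: {node: variables attached}. Returns the surviving nodes."""
    alive = set(children)
    changed = True
    while changed:
        changed = False
        has_parent = {c for n in alive for c in children[n]}
        for n in sorted(alive):
            is_root = n not in has_parent
            is_interior = len(children[n]) > 0
            # Delete a root only if none of its attached variables is live.
            if is_root and is_interior and not (set(labels.get(n, [])) & live):
                alive.discard(n)
                changed = True
    return alive

CHILDREN = {"b0": [], "c0": [], "d0": [],
            "n_a": ["b0", "c0"],     # a = b + c
            "n_b": ["b0", "d0"],     # b = b - d
            "n_c": ["c0", "d0"],     # c = c + d
            "n_e": ["n_b", "n_c"]}   # e = b + c
LABELS = {"n_a": ["a"], "n_b": ["b"], "n_c": ["c"], "n_e": ["e"]}
REMAINING = eliminate_dead_roots(CHILDREN, LABELS, live={"a", "b"})
```

The first pass removes the node for e; only then does the node for c become a root, so it is removed in the second pass, leaving the nodes for a and b.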

48 The Use of Algebraic Identities
Eliminate computations
Reduction in strength
Constant folding
  2 * 3.14 = 6.28, evaluated at compile time
Other algebraic transformations
  x*y => y*x
  x>y => x-y>0
  a=b+c; e=c+d+b; => a=b+c; e=a+d;

49 Representation of Array References
  x = a[i]
  a[j] = y
  z = a[i]
Can z = a[i] be replaced by z = x? Not in general: a[j] = y may overwrite a[i] (when i = j), so the assignment kills the node for x = a[i].

50 Representation of Array References
  b = a + 12
  x = b[i]
  b[j] = y
a is an array; b is a position in the array a.
x is killed by b[j] = y.

51 Pointer Assign. & Proc. Calls
Problem of the following assignments:
  x = *p
  *q = y
We do not know what p or q point to.
  x = *p is a use of every variable
  *q = y is a possible assignment to every variable
  The operator =* must take all nodes that are currently associated with identifiers as arguments, which is relevant for dead-code elimination.
  The *= operator kills all other nodes so far constructed in the DAG.
  Global pointer analyses can be used to limit the set of variables
Procedure calls behave much like assignments through pointers.
  Assume that a procedure uses and changes any data to which it has access.
  If variable x is in the scope of a procedure P, a call to P both uses the node with attached variable x and kills that node.

52 Reassembling BBs From DAG 's
[Figures: the reassembled block when b is not live on exit, and when b is live on exit.]

53 Reassembling BBs From DAG 's
The rules of reassembling:
  The order of instructions must respect the order of nodes in the DAG
  Assignments to an array must follow all previous assignments to, or evaluations from, the same array
  Evaluations of array elements must follow any previous assignments to the same array
  Any use of a variable must follow all previous procedure calls or indirect assignments through a pointer
  Any procedure call or indirect assignment through a pointer must follow all previous evaluations of any variable

54 Peephole Optimization
Dragon §8.7
Introduction to peephole
Common techniques
Algebraic identities
An example

55 Peephole Optimization
Simple compilers do not perform machine-independent code improvement
  They generate naive code
It is possible to take the target code and optimize it
  Sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences of instructions
  This technique is known as peephole optimization
  Peephole optimization usually works by sliding a window of several instructions (a peephole)

56 Peephole Optimization
Goals:
  - improve performance
  - reduce memory footprint
  - reduce code size
Method:
  1. Examine short sequences of target instructions
  2. Replace the sequence by a more efficient one:
     redundant-instruction elimination
     algebraic simplifications
     flow-of-control optimizations
     use of machine idioms

57 Peephole Optimization Common Techniques

61 Algebraic identities
Worth recognizing single instructions with a constant operand
Eliminate computations
  A * 1 = A
  A * 0 = 0
  A / 1 = A
Reduce strength
  A * 2 = A + A
  A / 2 = A * 0.5
Constant folding
  2 * 3.14 = 6.28
More delicate with floating-point
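A peephole pass can apply these identities one instruction at a time. The sketch below is illustrative only: the `(op, a, b)` triple encoding and the name `simplify` are assumptions, variables are written as strings, and the `A * 0 = 0` rule is exactly the kind of rewrite that needs care for floating-point operands.

```python
def simplify(op, a, b):
    """Simplify one 'a op b' instruction. Returns a constant, a single
    operand, or a (possibly cheaper) (op, a, b) triple."""
    both_constant = isinstance(a, (int, float)) and isinstance(b, (int, float))
    if both_constant and op in ("+", "*"):
        return a * b if op == "*" else a + b      # constant folding
    if op == "*" and b == 1:
        return a                                  # A * 1 = A
    if op == "*" and b == 0:
        return 0                                  # A * 0 = 0 (integers only)
    if op == "/" and b == 1:
        return a                                  # A / 1 = A
    if op == "*" and b == 2:
        return ("+", a, a)                        # A * 2 = A + A
    return (op, a, b)                             # nothing applies
```

For example, `simplify("*", "A", 2)` yields the addition `("+", "A", "A")`, while an instruction such as `("*", "A", 3)` is returned unchanged.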

62 Is this ever helpful?
Why would anyone write X * 1?
Why bother to correct such obvious junk code?
In fact one might write
  #define MAX_TASKS
  a = b * MAX_TASKS;
Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

63 Replace Multiply by Shift
A := A * 4;
  Can be replaced by a 2-bit left shift (signed/unsigned)
  But must worry about overflow if the language does
A := A / 4;
  If unsigned, can replace with shift right
  But shift right arithmetic is a well-known problem
  Language may allow it anyway (traditional C)

64 The Right Shift problem
Arithmetic right shift: shift right and use the sign bit to fill the most significant bits
  -5 SAR 1, which is -3, not -2
  In most languages, -5/2 = -2
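The mismatch is easy to reproduce, since Python's `>>` on integers is an arithmetic shift. The sketch below also shows one standard compiler fix that is not on the slide: add a bias of 2^k - 1 to negative operands before shifting, so the result rounds toward zero like truncating division. All function names are illustrative.

```python
def sar(x, k):
    # Arithmetic right shift: rounds toward negative infinity.
    return x >> k

def trunc_div(x, d):
    # Division that rounds toward zero, as in C or Java.
    q = abs(x) // abs(d)
    return q if (x >= 0) == (d > 0) else -q

def div_pow2(x, k):
    # Shift-based division by 2**k that matches truncating division:
    # negative operands are biased by 2**k - 1 before the shift.
    bias = (1 << k) - 1 if x < 0 else 0
    return (x + bias) >> k
```

With these definitions `sar(-5, 1)` is -3 while `trunc_div(-5, 2)` is -2, and `div_pow2(-5, 1)` agrees with the truncating result.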

65 Addition chains for multiplication
If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
  X * = x * x*4 + x
  two shifts, one subtract and one add, which may be faster than one multiply
Note the similarity with the efficient exponentiation method
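As an illustration of the idea, the sketch below multiplies by a constant using only shifts, adds and subtracts. The constant 125 is chosen here purely as an example of a "two shifts, one subtract and one add" decomposition (128 - 4 + 1) and is not necessarily the one on the slide; both function names are hypothetical.

```python
def mul_125(x):
    # 125 = 128 - 4 + 1: two shifts, one subtract, one add.
    return (x << 7) - (x << 2) + x

def mul_by_shifts(x, c):
    """Multiply x by a non-negative constant c using one shift and one add
    per set bit of c (plain binary decomposition, no subtraction)."""
    result = 0
    shift = 0
    while c:
        if c & 1:
            result += x << shift
        c >>= 1
        shift += 1
    return result
```

The binary decomposition of 125 has six set bits, so allowing subtraction, as `mul_125` does, gives a noticeably shorter sequence than `mul_by_shifts(x, 125)`.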

66 Flow-of-control optimizations
Jump to a jump
  Before:
        goto L1
        . . .
    L1: goto L2
  After:
        goto L2
        . . .
    L1: goto L2
Conditional jump to a jump
  Before:
        if a < b goto L1
        . . .
    L1: goto L2
  After:
        if a < b goto L2
        . . .
    L1: goto L2
Jump to a conditional jump
  Before:
        goto L1
        . . .
    L1: if a < b goto L2
    L3:
  After:
        if a < b goto L2
        goto L3
        . . .
    L3:
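The first two patterns amount to retargeting every jump whose destination is itself an unconditional goto. A minimal sketch follows, assuming a hypothetical encoding in which each instruction is a `(label, instr)` pair and `instr` is `("goto", L)`, `("if", cond, L)` or `("op", text)`; the third pattern (jump to a conditional jump) is not handled here.

```python
def thread_jumps(code):
    """Retarget jumps that land on an unconditional goto."""
    at_label = {label: instr for label, instr in code if label is not None}

    def final_target(label):
        # Follow chains of gotos; 'seen' guards against goto cycles.
        seen = set()
        while (label in at_label and at_label[label][0] == "goto"
               and label not in seen):
            seen.add(label)
            label = at_label[label][1]
        return label

    result = []
    for label, instr in code:
        if instr[0] == "goto":
            instr = ("goto", final_target(instr[1]))
        elif instr[0] == "if":
            instr = ("if", instr[1], final_target(instr[2]))
        result.append((label, instr))
    return result

CODE = [(None, ("goto", "L1")),
        (None, ("if", "a < b", "L1")),
        ("L1", ("goto", "L2")),
        ("L2", ("op", "x = 1"))]
THREADED = thread_jumps(CODE)
```

After threading, both the goto and the conditional jump go straight to L2. The instruction at L1 is left in place; it can be removed later if nothing else jumps to it.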

67 Peephole Opt: an Example
Source code:
    debug = 0
    . . .
    if (debug) {
        print debugging information
    }
Intermediate code:
    debug = 0
    . . .
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:

68 Eliminate Jump after Jump
Before:
    debug = 0
    . . .
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:
After:
    debug = 0
    . . .
    if debug ≠ 1 goto L2
    print debugging information
    L2:

69 Constant Propagation
Before:
    debug = 0
    . . .
    if debug ≠ 1 goto L2
    print debugging information
    L2:
After:
    debug = 0
    . . .
    if 0 ≠ 1 goto L2
    print debugging information
    L2:

70 Unreachable Code (dead code elimination)
Before:
    debug = 0
    . . .
    if 0 ≠ 1 goto L2
    print debugging information
    L2:
After:
    debug = 0
    . . .

71 Peephole Optimization Summary
Peephole optimization is very fast
  Small overhead per instruction, since it uses a small, fixed-size window
It is often easier to generate naïve code and run peephole optimization than to generate good code!

72 Summary
Introduction to optimization
Control Flow Analysis
  Basic knowledge
  Basic blocks
  Control-flow graphs
Local Optimizations
Peephole optimizations

73 HW & Next Time
Homework: EX 8.4.1, 8.5.1, 8.5.2
Next Time: Dataflow analysis (Dragon §9.2)

74 If You Want to Get Started …
Go to
Download and install LLVM on your favorite Linux box
  Read the installation instructions to help you
  Will need gcc 4.x
Try to run it on a simple C program

