Download presentation

Presentation is loading. Please wait.

Published byHailie Mucklow Modified about 1 year ago

1
1 Classical Optimization Types of classical optimizations Operation level: one operation in isolation Local: optimize pairs of operations in same basic block (with or without dataflow analysis), e.g. peephole optimization Global: optimize pairs of operations spanning multiple basic blocks and must use dataflow analysis in this case, e.g. reaching definitions, UD/DU chains, or SSA forms Loop: optimize loop body and nested loops

2
2 Local Constant Folding Goal: eliminate unnecessary operations Rules: 1. X is an arithmetic operation 2. If src1(X) and src2(X) are constant, then change X by applying the operation r7 = r5 = 2 * r4 r6 = r5 * 2 src1(X) = 4 src2(X) = 1

3
3 Local Constant Combining Goal: eliminate unnecessary operations First operation often becomes dead Rules: 1. Operations X and Y in same basic block 2. X and Y have at least one literal src 3. Y uses dest(X) 4. None of the srcs of X have defs between X and Y (excluding Y) r7 = 5 r5 = 2 * r4 r6 = r5 * 2 r6 = r4 * 4

4
4 Local Strength Reduction Goal: replace expensive operations with cheaper ones Rules (example): 1. X is an multiplication operation where src1(X) or src2(X) is a const 2 k integer literal 2. Change X by using shift operation 3. For k=1 can use add r7 = 5 r5 = 2 * r4 r6 = r4 * 4 r6 = r4 << 2 r5 = r4 + r4

5
5 Local Constant Propagation Goal: replace register uses with literals (constants) in single basic block Rules: 1. Operation X is a move to register with src1(X) literal 2. Operation Y uses dest(X) 3. There is no def of dest(X) between X and Y (excluding defs at X and Y) 4. Replace dest(X) in Y with src1(X) r1 = 5 r2 = _x r3 = 7 r4 = r4 + r1 r1 = r1 + r2 r1 = r1 + 1 r3 = 12 r8 = r1 - r2 r9 = r3 + r5 r3 = r2 + 1 r7 = r3 - r1 M[r7] = 0

6
6 Local Common Subexpression Elimination (CSE) Goal: eliminate recomputations of an expression More efficient code Resulting moves can get copy propagated (see later) Rules: 1. Operations X and Y have the same opcode and Y follows X 2. src(X) = src(Y) for all srcs 3. For all srcs, no def of a src between X and Y (excluding Y) 4. No def of dest(X) between X and Y (excluding X and Y) 5. Replace Y with move dest(Y) = dest(X) r1 = r2 + r3 r4 = r4 + 1 r1 = 6 r6 = r2 + r3 r2 = r1 - 1 r5 = r4 + 1 r7 = r2 + r3 r5 = r1 - 1

7
7 Dead Code Elimination Goal: eliminate any operation who’s result is never used Rules (dataflow required) 1. X is an operation with no use in DU chain, i.e. dest(X) is not live 2. Delete X if removable (not a mem store or branch) Rules too simple! Misses deletion of r4, even after deleting r7, since r4 is live in loop Better is to trace UD chains backwards from “critical” operations r4 = r4 + 1 r7 = r1 * r4 r3 = r3 + 1r2 = 0 r3 = r2 + r1 M[r1] = r3 r1 = 3 r2 = 10

8
8 Local Backward Copy Propagation Goal: propagate LHS of moves backward Eliminates useless moves Rules (dataflow required) 1. X and Y in same block 2. Y is a move to register 3. dest(X) is a register that is not live out of the block 4. Y uses dest(X) 5. dest(Y) not used or defined between X and Y (excluding X and Y) 6. No uses of dest(X) after the first redef of dest(Y) 7. Replace src(Y) on path from X to Y with dest(X) and remove Y r1 = r8 + r9 r2 = r9 + r1 r4 = r2 r6 = r2 + 1 r9 = r1 r7 = r6 r5 = r6 + 1 r4 = 0 r8 = r2 + r7

9
9 Global Constant Propagation Goal: globally replace register uses with literals Rules (dataflow required) 1. X is a move to a register with src1(X) literal 2. Y uses dest(X) 3. dest(X) has only one def at X for UD chains to Y 4. Replace dest(X) in Y with src1(X) r5 = 2 r7 = r1 * r5 r3 = r3 + r5r2 = 0 r3 = r2 + r1 r6 = r7 * r4 M[r1] = r3 r1 = 4 r2 = 10

10
10 Global Constant Propagation with SSA Goal: globally replace register uses with literals Rules (high level) 1. For operation X with a register src(X) 2. Find def of src(X) in chain 3. If def is move of literal, src(X) is constant: done 4. If RHS of def is an operation, including node, recurse on all srcs 5. Apply rule for operation to determine src(X) constant 6. Note: abstract values T (top) and (bottom) are often used to indicate unknown values r5 = 2 r7 = r1 * r5 r3 = r3 + r5r2 = 0 r3 = r2 + r1 r6 = r7 * r4 M[r1] = r3 r1 = 4 r2 = 10 Exercise: compute SSA form and propagate constants

11
11 Forward Copy Propagation Goal: globally propagate RHS of moves forward Reduces dependence chain May be possible to eliminate moves Rules (dataflow required) 1. X is a move with src1(X) register 2. Y uses dest(X) 3. dest(X) has only one def at X for UD chains to Y 4. src1(X) has no def on any path from X to Y 5. Replace dest(X) in Y with src1(X) r1 = r2 r3 = r4 r6 = r3 + 1r2 = 0 r5 = r2 + r3

12
12 Global Common Subexpression Elimination (CSE) Goal: eliminate recomputations of an expression Rules: 1. X and Y have the same opcode and X dominates Y 2. src(X) = src(Y) for all srcs 3. For all srcs, no def of a src on any path between X and Y (excluding Y) 4. Insert rx = dest(X) immediately after X for new register rx 5. Replace Y with move dest(Y) = rx r1 = r2 * r6 r3 = r4 / r7 r2 = r2 + 1r1 = r3 * 7 r5 = r2 * r6 r8 = r4 / r7 r9 = r3 * 7

13
13 Loop Optimizations Loops are the most important target for optimization Programs spend much time in loops Loop optimizations Invariant code removal (aka. code motion) Global variable migration Induction variable strength reduction Induction variable elimination

14
14 Code Motion Goal: move loop-invariant computations to preheader Rules: 1. Operation X in block that dominates all exit blocks 2. X is the only operation to modify dest(X) in loop body 3. All srcs of X have no defs in any of the basic blocks in the loop body 4. Move X to end of preheader 5. Note 1: if one src of X is a memory load, need to check for stores in loop body 6. Note 2: X must be movable and not cause exceptions r4 = M[r5] r7 = r4 * 3 r8 = r2 + 1 r7 = r8 * r4 r3 = r2 + 1 r1 = r1 + r7 M[r1] = r3 r1 = 0 preheader header

15
15 Global Variable Migration Goal: assign a global variable to a register for the entire duration of a loop Rules: 1. X is a load or store to M[x] 2. Address x of M[x] not modified in loop 3. Replace all M[x] in loop by new register rx 4. Add rx = M[x] to preheader 5. Add M[x] = rx to each loop exit 6. Memory disambiguation is required: all mem ops in loop whose address can equal x must use same address x r4 = M[r5] r4 = r4 + 1 r8 = M[r5] r7 = r8 * r4 M[r5] = r4 M[r5] = r7

16
16 Loop Strength Reduction (1) Goal: create basic IVs from derived IVs Rules 1. X is a *, <<, +, or - operation 2. src1(X) is a basic IV 3. src2(X) is invariant 4. No other ops modify dest(X) 5. dest(X) != src(X) for all srcs 6. dest(X) is a register r5 = r4 - 3 r4 = r4 + 1 r7 = r4 * r9 r6 = r4 << 2 preheader header src1(X) = r4 src2(X) = r9 dest(X) = r7 Basic IV r4 has triple ( r4, 1, ?)

17
17 Loop Strength Reduction (2) Transformation 1. Insert into the bottom of the preheader: new_reg = RHS(X) 2. If opcode(X) is not + or -, then insert into the bottom of the preheader: new_inc = inc(src1(X)) opcode(X) src2(X) 3. Else new_inc = inc(src1(X)) 4. Insert at each update of src1(X): new_reg += new_inc 5. Change X by: dest(X) = new_reg r5 = r4 - 3 r4 = r4 + 1 r1 = r1 + r2 r7 = r1 r6 = r4 << 2 r1 = r4 * r9 r2 = 1 * r9 Exercise: apply strength reduction to r5 and r6

18
18 IV Elimination (1) Goal: remove unnecessary basic IVs from the loop by substituting uses with another basic IV Rules for IVs with same increment and initial value: 1. Find two basic IV x and y 2. If x and y in same family and have same increment and initial values 3. Incremented at same place 4. x is not live at loop exit 5. For each basic block where x is defined, there are no uses of x between first/last def of x and last/first def of y 6. Replace uses of x with y r1 = r1 - 1 r2 = r2 - 1 r9 = r2 + r4r7 = r1 * r9 r4 = M[r1] M[r2] = r7 r1 = 0 r2 = 0 Exercise: apply IV elimination

19
19 IV Elimination (2) Many variants, from simple to complex: 1. Trivial cases: IV variable that is never used except by the increment operations and is not live at loop exit 2. IVs with same increment and same initial value 3. IVs with same increment and initial values are known constant offset from each other 4. IVs with same increment, but initial values unknown 5. IVs with different increments and no info on initial values Method 1 and 2 are virtually free, so always applied Methods 3 to 5 require preheader operations

20
20 IV Elimination (3) Example for method 4 r3 = M[r1+4] r4 = M[r2+8] … r1 = r1 + 4 r2 = r2 + 4 r1 = ? r2 = ? r3 = M[r1+4] r4 = M[r1+r5] … r1 = r1 + 4 r1 = ? r2 = ? r5 = r2-r1+8

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google