CSE P501 – Compiler Construction

CSE P501 – Compiler Construction
Optimization: Goals Eliminating Redundancy & Value Numbering Control flow graphs (CFGs) Optimization Scopes: local, regional, global, whole-program Dominators See Cooper&Torczon chapter 8, 9 Spring 2014 Jim Hogg - UW - CSE - P501

Code Improvement Pick a better algorithm(!)
Eg: Quicksort rather than BubbleSort Compilers ain't smart enough to choose better algorithms Use machine resources effectively Instruction selection & scheduling Register allocation More about these later… Spring 2014 Jim Hogg - UW - CSE - P501

“Optimization” No optimizations are actually “optimal”
Best we can do is to improve things Most (much?) (some?) of the time Realistically: try to do better for common idioms both in the code and on the machine Never to make it worse! Spring 2014 Jim Hogg - UW - CSE - P501

Example Optimization Classic example: Array references in a loop
for (k = 0; k < n; k++) a[k] = 0; Naive codegen for a[k] = 0 in loop body mov eax, 4 // elemsize = 4 bytes imul eax, [ebp+offsetk] // k * elemsize add eax, [ebp+offseta] // &a[0] + k * elemsize mov [eax], 0 // a[k] = 0 Better! mov eax, [ebp+offseta] // &a[0], once-off mov [eax], 0 // a[0]=0, a[1]=0, etc add eax, 4 // eax = &a[1], &a[2], etc Note: pointers allow a user to do this directly in C or C++ Eg: for (p = a; p < a + n; ) *p++ = 0; Spring 2014 Jim Hogg - UW - CSE - P501

Example Optimization: A[row, col]
column row-major layout 0,0 0,1 0,2 0,3 0,4 1,0 1,1 1,2 1,3 1,4 2,0 2,1 2,2 2,3 2,4 3,0 3,1 3,2 3,3 3,4 row 0,0 0,1 0,2 0,3 0,4 1,0 1,1 1,2 1,3 1,4 2,0 2,1 2,2 2,3 2,4 int A[row, col] + (row * numcols + col) * 4 // 0-based A[row, col] + (row – rowmin) * (colmax – colmin + 1) * elemsize + (col – colmin) * elemsize Spring 2014 Jim Hogg - UW - CSE - P501

Optimize A[,] = 0 A[i, j] = @A + (i * ncols + j) * 4
for (i = 0; i < ncols; i++) { for (j = 0; j < nrows; j++) { a[i, j] = 0; } A[i, j] + (i * ncols + j) * 4 Naive codegen for A[i, j] = 0 mov eax, [ebp+offseti] imul eax, [ebp+offsetncols] add eax, [ebp+offsetj] imul eax, 4 lea edx, [ebp+offsetA] add eax, edx mov [eax], 0 Better! lea eax, [ebp+offsetA] // &A[0], once-off mov [eax], 0 // A[i, j] = 0 add eax, 4 Strength Reduction C programmers can do this with pointers Spring 2014 Jim Hogg - UW - CSE - P501

Optimization Phase Goal
Discover, at compile time, information about the runtime behavior of the program, and use that information to improve the generated code Spring 2014 Jim Hogg - UW - CSE - P501

Common Problems in Optimization
Strategy is typical of most compiler optimizations Discover opportunities through program analysis Modify the IR to take advantage of the opportunities Historically, goal usually was to decrease execution time Other possibilities: reduce space, power, … Spring 2014 Jim Hogg - UW - CSE - P501

Issues (1) Safety – transformation must not change program meaning
Must generate correct results Can’t generate spurious errors (ie, report error when there is none) Optimizations must be conservative, and always correct never 'take a risk': 99% != 100% Eg: fetch &o.m and remember m's address if call to m then call remembered m, so long as obj.vtbl == o.vtbl Large part of analysis goes towards proving safety However, it’s not black&white. Eg: hoist bounds-check elide exception thru DCE Spring 2014 Jim Hogg - UW - CSE - P501

Issues (2) Profitability
If a transformation is possible, is it profitable? Example: loop unrolling Can increase efficiency by reducing loop overhead (fewer tests & branches) Can eliminate duplicate operations done on separate iterations But, increases code size Spring 2014 Jim Hogg - UW - CSE - P501

Issues (3) Downside risks
Even if a transformation is worthwhile, consider its side-effects Eg: Transformation might need more temporaries, increasing register pressure Increased code size could cause cache misses Increased code size could increase paging (a disaster) Spring 2014 Jim Hogg - UW - CSE - P501

Eliminating Redundancy
An expression x+y is redundant at a program point iff: along every path from the procedure’s entry, it has been evaluated and its constituent sub-expressions (x and y) have not been re-defined (ie, assigned new values) since that evaluation If the compiler can prove the expression is redundant Can store the result of the earlier evaluation: t123 = x + y; Can replace the redundant computation with a reference to the earlier (stored) result, as in = t Spring 2014 Jim Hogg - UW - CSE - P501

Value Numbering Technique for eliminating redundant expressions
Assign an identifying number VN(n) to each expression VN(x+y) == VN(j) if x+y and j have the same value Use hashing over value numbers for efficiency But hashing is not crucial to the algorithm Old idea (Balke 1968, Ershov 1954) Invented for low-level, linear IRs Equivalent methods exist for tree IRs. Eg: build a DAG Spring 2014 Jim Hogg - UW - CSE - P501

Redundancies? Basic Block 1 2 3 4 5 6
a = x * y b = x * y a = 7 c = x * y x = 3 d = x * y Are any of these IR instructions "redundant"? In 2, can I re-use a in place of b? In 4, can I re-use a in place of c? In 6, can I re-use a in place of d? Spring 2014 Jim Hogg - UW - CSE - P501

Local Value Numbering (LVN)
a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 VN of a variable is not the actual value of that variable at runtime! Two variables with same VN are guaranteed to have the same value at runtime Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Spring 2014 Jim Hogg - UW - CSE - P501

Transform Code Remove Redundancies VN'd Oops! a3 was overwritten a3 = x1 * y2 b3 = x1 * y2 a4 = 74 c3 = x1 * y2 x5 = 35 d6 = x5 * y2 a3 = x1 * y2 b3 = a3 a4 = 74 c3 = a3 x5 = 35 d6 = x5 * y2 Spring 2014 Jim Hogg - UW - CSE - P501

Solution - SSA Could remember a3 into a temporary But realize that a4 is really a different variable from a3 Eg: we could store a3 in register R7 and a4 in a different register R2; they are totally unrelated Use subscript to denote different instances of a variable Bump the lifetime subscript each time we assign into that variable This is the essence of SSA form - see later Fixed: a03 still alive a03 = x01 * y02 b03 = a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 Spring 2014

Extensions to LVN Constant Folding
Note whether each operand is a constant Evaluate any arithmetic of those constants at compile-time Replace instruction with a load-immediate (eg: mov eax, 42) Algebraic Identities x+0 = x, x - x = 0, etc, etc Humans don't write these - but may be created by index calculation, etc Commutativity x8 = a5 + b7 y9 = b7 + a5 Recognize a+b = b+a; eg: order by increasing VNs a03 = x01 * y02 b03 = a03 a14 = 74 c13 = a03 x15 = 35 d06 = x15 * y02 Spring 2014

LVN in Detail (1) Basic Block 1 2 3 4 5 6 7 8
g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v Are any of these IR instructions "redundant"? In 3 can I re-use already-calculated value of g? What about 5 - can I replace x with a re-use of h from 2? Any others? Spring 2014 Jim Hogg - UW - CSE - P501

LVN in Detail (2) Expression ValueNumber Basic Block x y <1,+,2>
3 Basic Block g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v u v <4,-,5> h 4 5 6 x y <1,+,2> i 1 2 3 VN of a variable is not the actual value of that variable at runtime! Two variables with same VN are guaranteed to have the same value at runtime Two variables with different VN may have same value at runtime, by happenstance, but it's not guaranteed Spring 2014 Jim Hogg CSE P 501

LVN in Detail (3) Expression ValueNumber x y <1,+,2> g 6 2 3 u v
<4,-,5> h 4 7 5 6 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v x y <1,+,2> i 1 2 3 3 7 Spring 2014 Jim Hogg CSE P 501

LVN in Detail (4) Expression ValueNumber Expression ValueNumber x y
<1,+,2> g 6 8 2 3 <7,-,5> 8 <3,+,6> 9 <3,+,8> 10 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v u v <4,-,5> h 4 7 9 5 10 6 <9,+,10> w 11 x y <1,+,2> i 1 2 3 3 7 Spring 2014 Jim Hogg CSE P 501

LVN in Detail: Summary Exp VN Basic Block g 3 h 6 i u g = x + y 4 7 9
w x y 3 <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> 3 6 4 7 9 5 10 11 1 8 2 7 9 10 8 g = x + y h = u - v i = x + y u = 3 x = u – v u = g + h v = i + x w = u + v Spring 2014 Jim Hogg CSE P 501

LVN in Detail - Superscript Notation
Exp Basic Block VN g h i u v w x y <1,+,2> <3,+,6> <3,+,8> <4,-,5> <7,-,5> <9,+,10> 3 6 4 7 9 5 10 11 1 8 2 9 10 8 g3 = x1 + y2 h6 = u4 - v5 i3 = x1 + y2 u7 = 37 x8 = u7 – v5 u9 = g3 + h6 v10 = i3 + x8 w11 = u9 + v10 Spring 2014 Jim Hogg CSE P 501

LVN Algorithm BasicBlock EVN-Table
foreach instr in BasicBlock, with instr ≡ lhs = <rhs1,op,rhs2> lookup rhs1 in EVN-Table, giving vn1 if vn1 < 0 then {vn1 = ++V; insert <rhs1,vn1> into EVN-Table} lookup rhs2 in EVN-Table, giving vn2 if vn2 < 0 then {vn2 = ++V; insert <rhs2,vn2> into EVN-Table} lookup <vn1,op,vn2> in EVN-Table, giving vn if vn < 0 then insert <<vn1,op,vn2>,vn> into EVN-Table else update EVN-Table to hold <<vn1,op,vn2>,vn> next V is a sequence counter, starting at 1 BasicBlock EVN-Table g = x + y h = u - v . . . g h ... <1,+,2> <3,+,6> 3 6 ... 9 For efficiency, use hash table for EVN-Table Spring 2014 Jim Hogg CSE P 501

Basic Block A basic block is a maximal length sequence of straight-line code A branch-free sequence of instructions Properties Statements are executed sequentially If any statement executes, they all do (barring exceptions) In linear IR, first instruction of a basic block is called a leader Procedure entry; jump targets; instruction following jump or call Implementation Array of quads (d = x op y) Linked list of quads Spring 2014 Jim Hogg - UW - CSE - P501

Reminder: Flowgraph B0 B2 B1 B3 B4 B5 B6 A Flowgraph is:
Directed graph of Nodes and Edges Node = Basic Block Edge = any possible flow between nodes B1 B3 B4 B5 B6 Spring 2014 Jim Hogg - UW - CSE - P501

Real Example Flowgraph
ENTRY Real Example Flowgraph B1 i = 1 B2 j = 1 Control Flow Graph ("CFG", again!) B3 t1 = 10 * i t2 = t1 + j t3 = 8 * t2 t4 = t3 - 88 a[t4] = 0 j = j + 1 if j <= 10 goto B3 3 loops total 2 of the loops are nested B4 i = i + 1 if i <= 10 goto B2 B5 i = 1 B6 t5 = i - 1 t6 = 88 * t5 a[t6] = 1 i = i + 1 if i <= 10 goto B6 Most of the executions likely spent in loop bodies; that's where to focus efforts at optimization Jim Hogg - UW - CSE - P501 EXIT

EBBs: Formal Definition
A set of blocks b1, b2, …, bn where b1 has multiple predecessors and each of the remaining blocks bi (2≤i≤n) have only bi-1 as its unique predecessor The EBB is entered only at b1, but may have multiple exits A single block bi can be the head of multiple EBBs (these EBBs form a tree rooted at bi) Spring 2014 Jim Hogg - UW - CSE - P501

Example Extended Basic Blocks (EBBs)
Intuition: Maximal straight-line sequence of Blocks, with no join points B2 EBBs cannot represent loops! B1 B3 B4 { B0, B1 } { B0, B2, B3 } { B0, B2, B4 } { B5 } { B6 } B5 B6 Spring 2014 Jim Hogg - UW - CSE - P501

Optimization Scopes Local Within a Basic Block Regional
Bigger than block, smaller than function Eg: loop or Extended Basic Block (EBB) Global Whole function Whole Program Spans two or more functions Maybe large part of a program (excludes libraries) BasicBlock < Region < Function < Program Spring 2014 Jim Hogg - UW - CSE - P501

Local Optimization Local optimizations ≡ within a Basic Block
Algebraic simplifications (eg: 2*x = x+x; x+0 = x) Constant folding (eg: FeetPerMile/MinutesPerHour = 88) Common sub-expression elimination (CSE) Dead code elimination (DCE) (unreachable or unused) Specialize computation based on context Etc Spring 2014 Jim Hogg - UW - CSE - P501

Regional Optimization
Bigger than a Basic-Block; less than whole function Eg: Loops Hoist invariant code out of loop Unroll-and-jam (unroll outer loop; merge inners) Loop Interchange, for better caching Array-index strength reduction Eg: Extended-Basic-Blocks (EBBs) - no-join paths Super-Local Value Numbering (SVN) Etc EBB Spring 2014 Jim Hogg - UW - CSE - P501

Global Optimization Misnomer - really means whole-function
Global common sub-expression elimination (CSE) Register allocation Choose which variables, for how long, are enregistered Choose which registers - not so important with 'orthogonal' ISAs Result memoization / caching Remember last 50, say, results calculated; re-use where possible Eg: can speed a naive Fibonacci calculation 10x or more Etc Spring 2014 Jim Hogg - UW - CSE - P501

Whole-Program Optimization (WPO)
a.k.a Inter-Procedural Optimization (IPO) Spans all, or much, of a program; certainly >1 function At-odds with "separate compilation" Inlining (assisted by profile 'training') Hot/Cold code separation Whole-Program DCE, or "tree shaking" No program uses an entire library! Remove unused classes and/or methods Can make drastic reductions in program size Spring 2014 Jim Hogg - UW - CSE - P501

Super-Local Value Numbering (SVN)
m = a + b n = a + b B C p = c + d r = c + d q = a + b r = c + d Local Value Numbering 1 block at a time Strong local results No cross-block effects LVN finds these redundancies LVN misses these D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f Au02: Cooper’s example F v = a + b w = c + d x = e + f G y = a + b z = c + d Spring 2014 Jim Hogg - UW - CSE - P501

Value Numbering over EBBs
Apply LVN to EBBs: {A,B}, {A,C,D}, {A,C,E} A m = a + b n = a + b Final info from A is initial info for B, C Final info from C is initial for D, E B C p = c + d r = c + d q = a + b r = c + d Avoid re-analyzing A, C Doesn’t help with F, G D E e = b + 18 s = a + b u = e + f e = a + 17 t = c + d u = e + f Au02: Cooper’s example F LVN SVN Missed v = a + b w = c + d x = e + f G y = a + b z = c + d Spring 2014 Jim Hogg - UW - CSE - P501

SSA Name Space (from before)
a03 = x01 * y02 b03 = a03 a14 = 74 c03 = a03 x15 = 35 d06 = x15 * y02 Unique name for each definition Subscript denotes variable's SSA number The variable's "generation" number c03 = a03 now works, since a1 = 7 happened to a 'different' variable Au02: Difference is second assignment is to a1, not a0. Spring 2014

SSA Name Space Two Principles Need to deal with merge points
Each name is defined (assigned-to) by exactly one instruction Each operand refers to exactly one definition Need to deal with merge points Add Φ functions at merge points to reconcile names Use subscripts on variable names for uniqueness Spring 2014 Jim Hogg - UW - CSE - P501

φ Functions a3 a6 A B a7 = φ(a3, a6) C
Result of a7 = φ(a3, a6) is either a3 or a6 depending on whether we reached node C from node A or from node B Not really a mathematical function at all Spring 2014 Jim Hogg - UW - CSE - P501

SVN with Bells & Whistles
A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 Insert φ functions at join nodes F, G Little extra cost Still does nothing for F and G Au02: Cooper’s example F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Spring 2014 Jim Hogg - UW - CSE - P501

Larger Scopes A B C Still have not helped F, G
m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 Still have not helped F, G Problem: multiple predecessors Must decide what facts hold in F and in G For G, combine B & F? Merging states is expensive Fall back on what we know D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 Au02: Cooper’s example F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Spring 2014 Jim Hogg - UW - CSE - P501

Dominators Definition By definition, x dominates x
x dominates y iff every path from the entry of the flowgraph to y includes x By definition, x dominates x Associate a Dom set with each node | Dom(x) | ≥ 1 Many uses in analysis and transformation Finding loops, building SSA form, code motion Spring 2014 Jim Hogg - UW - CSE - P501

Dominator Sets A B C n Dom(n) A B C D E F G {A} {A,B} {A,C} {A,C,D}
m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 n Dom(n) A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 Au02: Cooper’s example F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Spring 2014 Jim Hogg - UW - CSE - P501

Dominator Tree & IDom n Dom(n) IDom(n) A A B C D E F G {A} {A,B} {A,C} {A,C,D} {A,C,E} {A,C,F} {A,G} - A C B C G D E F Closest dominator is called the Immediate Dominator Denoted IDom(n) Spring 2014 Jim Hogg - UW - CSE - P501

Dominator Value Numbering
m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 Still looking for a way to handle F, G Use info from IDom(x) to start analysis of x Use C for F and A for G Dominator VN Technique (DVNT) Au02: Cooper’s example F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Spring 2014 Jim Hogg - UW - CSE - P501

DVN algorithm Use SVN on EBBs
Use scoped hash tables & SSA namespace as before Start each node with EVN-Table from its IDOM No values flow along back edges (i.e., loops) Constant folding, algebraic identities as before Spring 2014 Jim Hogg - UW - CSE - P501

Dominator Value Numbering
Advantages Finds more redundancies Little extra cost Shortcomings Misses some opportunities (common calculations in ancestors that are not IDOMs) Doesn’t handle loops or other back edges A m0 = a0 + b0 n0 = a0 + b0 B C p0 = c0 + d0 r0 = c0 + d0 q0 = a0 + b0 r1 = c0 + d0 D E e0 = b0 + 18 s0 = a0 + b0 u0 = e0 + f0 e1 = a0 + 17 t0 = c0 + d0 u1 = e1 + f0 Au02: Cooper’s example F e2 = Φ(e0,e1) u2 = Φ(u0,u1) v0 = a0 + b0 w0 = c0 + d0 x0 = e2 + f0 LVN SVN DVN Missed G r2 = Φ(r0,r1) y0 = a0 + b0 z0 = c0 + d0 Spring 2014 Jim Hogg - UW - CSE - P501

The Story So Far… Local algorithm: LVN Superlocal extension: SVN
Some local methods extend cleanly to superlocal scopes Dominator VN Technique (DVN) All of these propagate along forward edges None are global (can't cope with loops) Spring 2014 Jim Hogg - UW - CSE - P501

Next Dataflow analysis Transformations
Global solution to redundant expression analysis Transformations Catalog of transforms a compiler can do with dataflow analysis Spring 2014 Jim Hogg - UW - CSE - P501

CSE P501 – Compiler Construction

Similar presentations

Presentation on theme: "CSE P501 – Compiler Construction"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSE P501 – Compiler Construction

Similar presentations

Presentation on theme: "CSE P501 – Compiler Construction"— Presentation transcript:

Similar presentations

About project

Feedback