Code Generation Part II

Slides:



Advertisements
Similar presentations
Example of Constructing the DAG (1)t 1 := 4 * iStep (1):create node 4 and i 0 Step (2):create node Step (3):attach identifier t 1 (2)t 2 := a[t 1 ]Step.
Advertisements

Register Allocation Consists of two parts: Goal : minimize spills
Target Code Generation
7. Optimization Prof. O. Nierstrasz Lecture notes by Marcus Denker.
Lecture 11: Code Optimization CS 540 George Mason University.
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
CMPUT Compiler Design and Optimization1 CMPUT680 - Winter 2006 Topic 5: Peep Hole Optimization José Nelson Amaral
Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
1 CS 201 Compiler Construction Machine Code Generation.
1 Chapter 8: Code Generation. 2 Generating Instructions from Three-address Code Example: D = (A*B)+C =* A B T1 =+ T1 C T2 = T2 D.
Code optimization: –A transformation to a program to make it run faster and/or take up less space –Optimization should be safe, preserve the meaning of.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
Chapter 10 Code Optimization. A main goal is to achieve a better performance Front End Code Gen Intermediate Code source Code target Code user Machine-
1 Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
Lecture 11 – Code Generation Eran Yahav 1 Reference: Dragon 8. MCD
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.
1 Intermediate representation Goals: –encode knowledge about the program –facilitate analysis –facilitate retargeting –facilitate optimization scanning.
Lecture 23 Basic Blocks Topics Code Generation Readings: 9 April 17, 2006 CSCE 531 Compiler Construction.
4/23/09Prof. Hilfinger CS 164 Lecture 261 IL for Arrays & Local Optimizations Lecture 26 (Adapted from notes by R. Bodik and G. Necula)
Code Generation Professor Yihjia Tsai Tamkang University.
Lecture 25 Generating Code for Basic Blocks Topics Code Generation Readings: April 19, 2006 CSCE 531 Compiler Construction.
Intermediate Code. Local Optimizations
Improving Code Generation Honors Compilers April 16 th 2002.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Introduction For some compiler, the intermediate code is a pseudo code of a virtual machine. Interpreter of the virtual machine is invoked to execute the.
1 Code Generation Part II Chapter 8 (1 st ed. Ch.9) COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,
CSc 453 Final Code Generation Saumya Debray The University of Arizona Tucson.
1 Code Generation Part II Chapter 9 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.
Code Generation Ⅰ CS308 Compiler Theory1. 2 Background The final phase in our compiler model Requirements imposed on a code generator –Preserving the.
Compilers Modern Compiler Design
1 Control Flow Analysis Topic today Representation and Analysis Paper (Sections 1, 2) For next class: Read Representation and Analysis Paper (Section 3)
Final Code Generation and Code Optimization.
1 Control Flow Graphs. 2 Optimizations Code transformations to improve program –Mainly: improve execution time –Also: reduce program size Can be done.
Topic #9: Target Code EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Chapter10: Code generator. 2 Code Generator Source Program Target Program Semantic Analyzer Intermediate Code Generator Code Optimizer Code Generator.
Code Generation Part I Chapter 8 (1st ed. Ch.9)
Single Static Assignment Intermediate Representation (or SSA IR) Many examples and pictures taken from Wikipedia.
Code Analysis and Optimization Pat Morin COMP 3002.
More Code Generation and Optimization Pat Morin COMP 3002.
Chapter 8 Code Generation
Code Optimization Code produced by compilation algorithms can often be improved (ideally optimized) in terms of run-time speed and the amount of memory.
High-level optimization Jakub Yaghob
Code Optimization.
Material for course thanks to:
Optimization Code Optimization ©SoftMoore Consulting.
Code Generation Part I Chapter 9
Code Generation Part III
Unit IV Code Generation
Code Generation Part I Chapter 8 (1st ed. Ch.9)
CS 201 Compiler Construction
TARGET CODE GENERATION
Code Generation Part I Chapter 9
Topic 4: Flow Analysis Some slides come from Prof. J. N. Amaral
Code Optimization Overview and Examples Control Flow Graph
Code Generation Part III
Optimizations using SSA
Control Flow Analysis (Chapter 7)
Interval Partitioning of a Flow Graph
Final Code Generation and Code Optimization
TARGET CODE -Next Usage
8 Code Generation Topics A simple code generator algorithm
Optimization 薛智文 (textbook ch# 9) 薛智文 96 Spring.
Intermediate Code Generation
Compiler Construction
Target Code Generation
TARGET CODE GENERATION
CSc 453 Final Code Generation
Code Optimization.
Presentation transcript:

Code Generation Part II Chapter 8 (1st ed. Ch.9) COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013

Flow Graphs A flow graph is a graphical depiction of a sequence of instructions with control flow edges A flow graph can be defined at the intermediate code level or target code level MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 MOV 0,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1

Basic Blocks A basic block is a sequence of instructions Control enters through the first instruction Control leaves the block without branching, except possibly at the last instruction MOV 1,R0 MOV n,R1 JMP L2 MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1

Basic Blocks and Control Flow Graphs A control flow graph (CFG) is a directed graph with basic blocks Bi as vertices and with edges Bi  Bj iff Bj can be executed immediately after Bi MOV 1,R0 MOV n,R1 JMP L2 MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1

Successor and Predecessor Blocks Suppose the CFG has an edge B1  B2 Basic block B1 is a predecessor of B2 Basic block B2 is a successor of B1 MOV 1,R0 MOV n,R1 JMP L2 L1: MUL 2,R0 SUB 1,R1 L2: JMPNZ R1,L1

Partition Algorithm for Basic Blocks Input: A sequence of three-address statements Output: A list of basic blocks with each three-address statement in exactly one block Determine the set of leaders, the first statements if basic blocks The first statement is the leader Any statement that is the target of a goto is a leader Any statement that immediately follows a goto is a leader For each leader, its basic block consist of the leader and all statements up to but not including the next leader or the end of the program

Loops A loop is a collection of basic blocks, such that All blocks in the collection are strongly connected The collection has a unique entry, and the only way to reach a block in the loop is through the entry

Loops (Example) Strongly connected components: SCC={ {B2,B3}, {B4} } MOV 1,R0 MOV n,R1 JMP L2 Strongly connected components: SCC={ {B2,B3}, {B4} } B2: L1: MUL 2,R0 SUB 1,R1 B3: L2: JMPNZ R1,L1 B4: L3: ADD 2,R2 SUB 1,R0 JMPNZ R0,L3 Entries: B3, B4

Equivalence of Basic Blocks Two basic blocks are (semantically) equivalent if they compute the same set of expressions b := 0 t1 := a + b t2 := c * t1 a := t2 a := c * a b := 0 a := c*a b := 0 a := c*a b := 0 Blocks are equivalent, assuming t1 and t2 are dead: no longer used (no longer live)

Transformations on Basic Blocks A code-improving transformation is a code optimization to improve speed or reduce code size Global transformations are performed across basic blocks Local transformations are only performed on single basic blocks Transformations must be safe and preserve the meaning of the code A local transformation is safe if the transformed basic block is guaranteed to be equivalent to its original form

Common-Subexpression Elimination Remove redundant computations a := b + c b := a - d c := b + c d := a - d a := b + c b := a - d c := b + c d := b t1 := b * c t2 := a - t1 t3 := b * c t4 := t2 + t3 t1 := b * c t2 := a - t1 t4 := t2 + t1

Dead Code Elimination Remove unused statements b := a + 1 a := b + c … b := a + 1 … Assuming a is dead (not used) if true goto L2 b := x + y … Remove unreachable code

Renaming Temporary Variables Temporary variables that are dead at the end of a block can be safely renamed t1 := b + c t2 := a - t1 t1 := t1 * d d := t2 + t1 t1 := b + c t2 := a - t1 t3 := t1 * d d := t2 + t3 Normal-form block

Interchange of Statements Independent statements can be reordered t1 := b + c t2 := a - t1 t3 := t1 * d d := t2 + t3 t1 := b + c t3 := t1 * d t2 := a - t1 d := t2 + t3 Note that normal-form blocks permit all statement interchanges that are possible

Algebraic Transformations Change arithmetic operations to transform blocks to algebraic equivalent forms t1 := a - a t2 := b + t1 t3 := 2 * t2 t1 := 0 t2 := b t3 := t2 << 1

Next-Use Next-use information is needed for dead-code elimination and register assignment Next-use is computed by a backward scan of a basic block and performing the following actions on statement i: x := y op z Add liveness/next-use info on x, y, and z to statement i Before going up to the previous statement (scan up): Set x info to “not live” and “no next use” Set y and z info to “live” and the next uses of y and z to i

Next-Use (Step 1) i: b := b + 1 j: a := b + c k: t := a + b [ live(a) = true, live(b) = true, live(t) = true, nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ] Attach current live/next-use information Because info is empty, assume variables are live (Data flow analysis Ch.10 can provide accurate information)

Next-Use (Step 2) Compute live/next-use information at k i: b := b + 1 j: a := b + c live(a) = true nextuse(a) = k live(b) = true nextuse(b) = k live(t) = false nextuse(t) = none k: t := a + b [ live(a) = true, live(b) = true, live(t) = true, nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ] Compute live/next-use information at k

Next-Use (Step 3) Attach current live/next-use information to j i: b := b + 1 j: a := b + c [ live(a) = true, live(b) = true, live(c) = false, nextuse(a) = k, nextuse(b) = k, nextuse(c) = none ] k: t := a + b [ live(a) = true, live(b) = true, live(t) = true, nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ] Attach current live/next-use information to j

Next-Use (Step 4) Compute live/next-use information j live(a) = false nextuse(a) = none live(b) = true nextuse(b) = j live(c) = true nextuse(c) = j live(t) = false nextuse(t) = none i: b := b + 1 j: a := b + c [ live(a) = true, live(b) = true, live(c) = false, nextuse(a) = k, nextuse(b) = k, nextuse(c) = none ] k: t := a + b [ live(a) = true, live(b) = true, live(t) = true, nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ] Compute live/next-use information j

Next-Use (Step 5) Attach current live/next-use information to i i: b := b + 1 [ live(a) = false, live(b) = true, live(c) = true, nextuse(a) = none, nextuse(b) = j, nextuse(c) = j] j: a := b + c [ live(a) = true, live(b) = true, live(c) = false, nextuse(a) = k, nextuse(b) = k, nextuse(c) = none ] k: t := a + b [ live(a) = true, live(b) = true, live(t) = true, nextuse(a) = none, nextuse(b) = none, nextuse(t) = none ] Attach current live/next-use information to i

A Code Generator Generates target code for a sequence of three-address statements using next-use information Uses getreg to assign registers to variables For instruction x := y op z getreg(y, z) returns a location (register) for x Results are kept in registers as long as possible: Result is needed in another computation Register is kept up to a procedure call or end of block Check if operands of three-address code are available in registers

The Code Generation Algorithm For each statement x := y op z Set location L = getreg(y, z) to get register for x If y  L then generate MOV y’,L where y’ denotes one of the locations where the value of y is available (choose register if possible) Generate OP z’,L where z’ is one of the locations of z; Update register/address descriptor of x to include L If y and/or z has no next use and is stored in register, update register descriptors to remove y and/or z

Register and Address Descriptors A register descriptor keeps track of what is currently stored in a register at a particular point in the code, e.g. a local variable, argument, global variable, etc. MOV a,R0 “R0 contains a” An address descriptor keeps track of the location where the current value of the name can be found at run time, e.g. a register, stack location, memory address, etc. MOV a,R0 MOV R0,R1 “a in R0 and R1”

The getreg Algorithm To compute getreg(y, z) If y is stored in a register R and R only holds the value y, and y has no next use, then return R; Update address descriptor: value y no longer in R Else, return a new empty register if available Else, find an occupied register R; Store contents (register spill) by generating MOV R,M for every M in address descriptor of y; Return register R Return a memory location

Code Generation Example Statements Code Generated Register Descriptor Address Descriptor t := a - b u := a - c v := t + u d := v + u live(d)=true all other dead MOV a,R0 SUB b,R0 MOV a,R1 SUB c,R1 ADD R1,R0 ADD R1,R0 MOV R0,d Registers empty R0 contains t R0 contains t R1 contains u R0 contains v R1 contains u R0 contains d t in R0 t in R0 u in R1 u in R1 v in R0 d in R0 d in R0 and memory

Register Allocation and Assignment The getreg algorithm is simple but sub-optimal All live variables in registers are stored (flushed) at the end of a block Global register allocation assigns variables to limited number of available registers and attempts to keep these registers consistent across basic block boundaries Keeping variables in registers in looping code can result in big savings

Allocating Registers in Loops Suppose loading a variable x has a cost of 2 Suppose storing a variable x has a cost of 2 Benefit of allocating a register to a variable x within a loop L is BL ( use(x, B) + 2 live(x, B) ) where use(x, B) is the number of times x is used in B and live(x, B) = true if x is live on exit from B

Global Register Allocation with Graph Coloring When a register is needed but all available registers are in use, the content of one of the used registers must be stored (spilled) to free a register Graph coloring allocates registers and attempts to minimize the cost of spills Build a conflict graph (interference graph) Find a k-coloring for the graph, with k the number of registers

Register Allocation with Graph Coloring: Example a := read(); b := read(); c := read(); a := a + b + c; if (a < 10) { d := c + 8; write(c); } else if (a < 20) { e := 10; d := e + a; write(e); } else { f := 12; d := f + a; write(f); } write(d);

Register Allocation with Graph Coloring: Live Ranges a := read(); b := read(); c := read(); a := a+b+c; a f a b Live range of b b c d e c Interference graph: connected vars have overlapping ranges a < 10 d d := c+8; write(c); a < 20 e e := 10; d := e+a; write(e); f f := 12; d := f+a; write(f); d d write(d);

Register Allocation with Graph Coloring: Solution r2 := read(); r3 := read(); r1 := read(); r2 := r2 + r3 + r1; if (r2 < 10) { r2 := r1 + 8; write(r1); } else if (r2 < 20) { r1 := 10; r2 := r1 + r2; write(r1); } else { r1 := 12; r2 := r1 + r2; write(r1); } write(r2); f a b 1 2 3 d e c 2 1 1 Interference graph Solve Three registers: a = r2 b = r3 c = r1 d = r2 e = r1 f = r1

Peephole Optimization Examines a short sequence of target instructions in a window (peephole) and replaces the instructions by a faster and/or shorter sequence when possible Applied to intermediate code or target code Typical optimizations: Redundant instruction elimination Flow-of-control optimizations Algebraic simplifications Use of machine idioms

Peephole Opt: Eliminating Redundant Loads and Stores Consider MOV R0,a MOV a,R0 The second instruction can be deleted, but only if it is not labeled with a target label Peephole represents sequence of instructions with at most one entry point The first instruction can also be deleted if live(a)=false

Peephole Optimization: Deleting Unreachable Code Unlabeled blocks can be removed if 0==0 goto L2 goto L2 b := x + y … b := x + y …

Peephole Optimization: Branch Chaining Shorten chain of branches by modifying target labels if a==0 goto L2 if a==0 goto L3 b := x + y … b := x + y … L2: goto L3 L2: goto L3

Peephole Optimization: Other Flow-of-Control Optimizations Remove redundant jumps … goto L1 … L1: …

Other Peephole Optimizations Reduction in strength: replace expensive arithmetic operations with cheaper ones Utilize machine idioms Algebraic simplifications … a := x ^ 2 b := y / 8 … a := x * x b := y >> 3 … a := a + 1 … inc a … a := a + 0 b := b * 1 …