Lecture 10: Code Generation


You are here
Source text (txt) -> Compiler [Lexical Analysis -> Syntax Analysis -> Semantic Analysis -> Inter. Rep. (IR) -> Code Gen.] -> Executable code (exe)
Data passed between phases: characters -> tokens -> AST (Abstract Syntax Tree) -> Annotated AST

Target languages
- Input to Code Gen.: IR + Symbol Table
- Possible outputs: absolute machine code, relative machine code, assembly

From IR to ASM: Challenges
- mapping IR to ASM operations
  - what instruction(s) should be used to implement an IR operation?
  - how do we translate code sequences?
- call/return of routines
  - managing activation records
- memory allocation
- register allocation
- optimizations

Intel IA-32 Assembly
- Going from Assembly to Binary…
  - Assembling
  - Linking
- AT&T syntax vs. Intel syntax
- We will use AT&T syntax
  - matches GNU assembler (GAS)

AT&T versus Intel Syntax
- Parameter order
  - AT&T: source comes before the destination
  - Intel: destination comes before the source
- Parameter size
  - AT&T: mnemonics are suffixed with a letter indicating the size of the operands ("q" for qword, "l" for dword, "w" for word, "b" for byte)
  - Intel: derived from the name of the register that is used
- Sigils (immediate values and registers)
  - AT&T: immediate values are prefixed with a "$", and registers must be prefixed with a "%"
  - Intel: the assembler automatically detects the type of symbols, i.e., whether they are registers, constants, or something else
- Effective addresses
  - AT&T: general syntax DISP(BASE,INDEX,SCALE). Example: movl mem_location(%ebx,%ecx,4), %eax
  - Intel: address expressions are written in square brackets; additionally, size keywords like byte, word, or dword have to be used. [1] Example: mov eax, dword [ebx + ecx*4 + mem_location]

IA-32 Registers
- Eight 32-bit general-purpose registers
  - EAX: accumulator for operands and result data. Used to return values from function calls.
  - EBX: pointer to data. Often used as an array-base address.
  - ECX: counter for string and loop operations
  - EDX: data/general
  - ESI: GP and source pointer for string operations
  - EDI: GP and destination pointer for string operations
  - EBP: stack frame (base) pointer
  - ESP: stack pointer
- EFLAGS register
- EIP (instruction pointer) register
- Six 16-bit segment registers
- … (ignore the rest for our purposes)

Not all registers are born equal
- EAX
  - Required operand of MUL, IMUL, DIV and IDIV instructions
  - Contains the result of these operations
- EDX
  - Stores the remainder of a DIV or IDIV instruction (EAX stores the quotient)
- ESI, EDI
  - ESI: required source pointer for string instructions
  - EDI: required destination pointer for string instructions
- Destination registers of arithmetic operations
  - EAX, EBX, ECX, EDX
- EBP: stack frame (base) pointer
- ESP: stack pointer

Immediate and Register Operands
- Immediate
  - Value specified in the instruction itself
  - GAS syntax: immediate values are preceded by $
  - add $4, %esp
- Register
  - Register name is used
  - GAS syntax: register names are preceded by %
  - mov %esp, %ebp

Memory and Base Displacement Operands
- Memory operands
  - Value at given address
  - GAS syntax: parentheses
  - mov (%eax), %eax
- Base displacement
  - Value at computed address
  - Address computed out of base register, index register, scale factor, displacement
  - offset = base + (index*scale) + displacement
  - Syntax: disp(base,index,scale); the displacement is a plain constant, without the $ prefix
  - movl $42, 2(%eax)
  - movl $42, 1(%eax,%ecx,4)

Base Displacement Addressing
mov (%ecx,%ebx,4), %eax
(Figure: an array of 4-byte elements; %ecx = base points at the array base and %ebx = 3 selects the referenced element.)
offset = base + (index*scale) + displacement
offset = base + (3*4) + 0 = base + 12
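The address arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `effective_address` and the base value `0x1000` are made up for the example and are not part of the lecture.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Address denoted by the AT&T operand disp(base,index,scale)."""
    # IA-32 only encodes scale factors of 1, 2, 4, or 8.
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return base + index * scale + disp


# (%ecx,%ebx,4) with %ebx = 3; the base address is a hypothetical value.
base = 0x1000
addr = effective_address(base, index=3, scale=4)
print(addr - base)  # distance of the referenced element from the array base
```

With 4-byte elements, index 3 lands 12 bytes past the base, which is the `base + 12` computed on the slide.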

Basic Blocks
- An important notion.
- Start by breaking the IR into basic blocks.
- A basic block is a sequence of instructions with
  - a single entry (at the first instruction): no jumps into the middle of the block
  - a single exit (at the last instruction)
  - code that executes in sequence from the first instruction to the last, without any jumps in between
- There is an edge from basic block B1 to basic block B2 when the last statement of B1 may jump (or fall through) to B2.

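Example
(Figure: control-flow graph with blocks B1 to B4; the conditional jump at the end of the first block has a True and a False out-edge. The statement groups shown in the figure are:)
- t1 := 4 * i; t2 := a[t1]; if t2 <= 20 goto B3
- t5 := t2 * t4; t6 := prod + t5; prod := t6; goto B4
- t7 := i + 1; i := t2; goto B5
- t3 := 4 * i; t4 := b[t3]; goto B4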

Creating basic blocks
- Input: a sequence of three-address statements
- Output: a list of basic blocks, with each three-address statement in exactly one block
- Method
  - Determine the set of leaders (first statement of a block)
    - The first statement is a leader
    - Any statement that is the target of a conditional or unconditional jump is a leader
    - Any statement that immediately follows a goto or conditional jump statement is a leader
  - For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program

Control flow graph
- A directed graph G = (V, E)
  - nodes V = basic blocks
  - edges E = control flow
  - (B1, B2) ∈ E if control can flow from B1 to B2
- A loop is a strongly connected component of the graph that has a single entry point.
- An inner loop is a loop that has no sub-loop.
Example:
- B1: prod := 0; i := 1
- B2: t1 := 4 * i; t2 := a[t1]; t3 := 4 * i; t4 := b[t3]; t5 := t2 * t4; t6 := prod + t5; prod := t6; t7 := i + 1; i := t7; if i <= 20 goto B2

Example
Source:
  for i from 1 to 10 do
    for j from 1 to 10 do
      a[i, j] = 0.0;
  for i from 1 to 10 do
    a[i, i] = 1.0;
IR:
  1) i = 1
  2) j = 1
  3) t1 = 10*i
  4) t2 = t1 + j
  5) t3 = 8*t2
  6) t4 = t3 - 88
  7) a[t4] = 0.0
  8) j = j + 1
  9) if j <= 10 goto (3)
  10) i = i + 1
  11) if i <= 10 goto (2)
  12) i = 1
  13) t5 = i - 1
  14) t6 = 88*t5
  15) a[t6] = 1.0
  16) i = i + 1
  17) if i <= 10 goto (13)
CFG:
  B1: i = 1
  B2: j = 1
  B3: t1 = 10*i; t2 = t1 + j; t3 = 8*t2; t4 = t3 - 88; a[t4] = 0.0; j = j + 1; if j <= 10 goto B3
  B4: i = i + 1; if i <= 10 goto B2
  B5: i = 1
  B6: t5 = i - 1; t6 = 88*t5; a[t6] = 1.0; i = i + 1; if i <= 10 goto B6
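The leader rules and the edge definition can be tried out on this 17-statement IR. The following is a minimal sketch, not the lecture's implementation: the encoding of each statement as a `(text, target)` pair and the helper names `find_leaders`, `basic_blocks`, and `cfg_edges` are assumptions made for illustration.

```python
def find_leaders(instrs):
    """instrs: list of (text, target); target is the 1-based number of the
    statement jumped to, or None for a non-jump."""
    leaders = {1} if instrs else set()
    for i, (_, target) in enumerate(instrs, start=1):
        if target is not None:
            leaders.add(target)            # the target of a jump is a leader
            if i < len(instrs):
                leaders.add(i + 1)         # so is the statement after a jump
    return sorted(leaders)


def basic_blocks(instrs):
    """Each block runs from a leader up to (not including) the next leader."""
    leaders = find_leaders(instrs)
    bounds = leaders + [len(instrs) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


def cfg_edges(instrs, blocks):
    """Edges (Bi, Bj), looking only at the last statement of each block."""
    block_of = {blk[0]: b for b, blk in enumerate(blocks, start=1)}
    edges = set()
    for b, blk in enumerate(blocks, start=1):
        text, target = instrs[blk[-1] - 1]
        if target is not None:
            edges.add((b, block_of[target]))
        # Assumption: only statements starting with "goto" are unconditional.
        falls_through = target is None or not text.startswith("goto")
        if falls_through and b < len(blocks):
            edges.add((b, b + 1))
    return sorted(edges)


IR = [
    ("i = 1", None), ("j = 1", None), ("t1 = 10*i", None),
    ("t2 = t1 + j", None), ("t3 = 8*t2", None), ("t4 = t3 - 88", None),
    ("a[t4] = 0.0", None), ("j = j + 1", None), ("if j <= 10 goto (3)", 3),
    ("i = i + 1", None), ("if i <= 10 goto (2)", 2), ("i = 1", None),
    ("t5 = i - 1", None), ("t6 = 88*t5", None), ("a[t6] = 1.0", None),
    ("i = i + 1", None), ("if i <= 10 goto (13)", 13),
]
blocks = basic_blocks(IR)
print(find_leaders(IR))       # leaders: statements 1, 2, 3, 10, 12, 13
print(cfg_edges(IR, blocks))  # includes the self-loops (3, 3) and (6, 6)
```

The six leaders give exactly the blocks B1 to B6 of the example, and the back edges B3 -> B3, B4 -> B2, and B6 -> B6 correspond to the three loops in the source.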

Optimizations
- Performance improvements can be obtained by working inside basic block boundaries.
- Better accuracy is obtained when considering all blocks in a routine.
- It is best to consider all blocks in the program (whole-program analysis). But:
  - Costly
  - The whole program is usually unknown.
- Any optimization must start with program "understanding" = program analysis.
- An example: to allocate registers efficiently we must analyze the liveness of variables.

Variable Liveness
- A statement x = y + z
  - defines x
  - uses y and z
- A variable x is live at a program point if its value may be used at a later point, before x is redefined.
- If x is defined in instruction i, used in instruction j, and there is a computation path from i to j that does not modify x, then instruction j is using the value of x that is defined in instruction i.
Example (showing the state after each statement):
- y = 42: x undef, y live, z undef
- z = 73: x undef, y live, z live
- x = y + z: x is live, y dead, z dead
- print(x): x is dead, y dead, z dead

Computing Liveness Information
- between basic blocks: dataflow analysis (next lecture)
- within a single basic block?
- idea
  - use the symbol table to record next-use information
  - scan the basic block backwards
  - update next-use for each variable

Computing Liveness Information
- INPUT: A basic block B of three-address statements. The symbol table initially shows all non-temporary variables in B as being live on exit.
- OUTPUT: At each statement i: x = y + z in B, liveness and next-use information of x, y, and z at i.
- Algorithm: Start at the last statement in B and scan backwards. At each statement i: x = y + z in B, do the following:
  1. Attach to i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
  2. In the symbol table, set x to "not live" and "no next use."
  3. In the symbol table, set y and z to "live" and the next uses of y and z to i.

Computing Liveness Information
- Recall the backward scan: at each statement i: x = y + z, (1) attach the current symbol-table information to i, (2) set x to "not live" and "no next use," (3) set y and z to "live" with next use i.
- Can we change the order between steps 2 and 3?
- Consider the block:
  x = 1
  y = x + 3
  z = x * 3
  x = x * z
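The backward scan can be sketched in Python as follows. The tuple encoding `(x, y, z)` for `x = y op z`, the function name `next_use_info`, and the choice to treat x, y, and z as all live on exit are assumptions for this illustration. The last statement, `x = x * z`, shows why step 3 has to come after step 2: x is both defined and used there, so marking it "live" must be the final update, otherwise the use of x would be lost.

```python
def next_use_info(block, live_on_exit):
    """Backward scan over one basic block.

    block: list of (x, y, z) for statements x = y op z; operands that are
    constants (or None) are skipped.
    Returns, per statement, {var: (is_live, next_use_index)} as attached
    in step 1 of the algorithm.
    """
    table = {v: (True, None) for v in live_on_exit}
    attached = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        operands = [v for v in (y, z) if v is not None and v.isidentifier()]
        # Step 1: attach the current symbol-table information to statement i.
        attached[i] = {v: table.get(v, (False, None)) for v in [x] + operands}
        # Step 2: x is redefined here, so its old value is not live above i.
        table[x] = (False, None)
        # Step 3: the operands are live above i, with next use at i.
        for v in operands:
            table[v] = (True, i)
    return attached


block = [("x", "1", None), ("y", "x", "3"), ("z", "x", "3"), ("x", "x", "z")]
info = next_use_info(block, live_on_exit={"x", "y", "z"})
for stmt, facts in zip(block, info):
    print(stmt, facts)
```

Statement indices are 0-based here, so the information attached to `z = x * 3` (index 2) records that x is live with next use at index 3.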

Common-subexpression elimination
- Common subexpressions are easily identified by the DAG representation.
- Before:
  a = b + c
  b = a - d
  c = b + c
  d = a - d
- After:
  a = b + c
  b = a - d
  c = b + c
  d = b

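DAG Representation of Basic Blocks
  a = b + c
  b = a - d
  c = b + c
  d = a - d
(Figure: DAG with leaves b0, c0, d0; a "+" node over b0 and c0 labeled a; a "-" node over a and d0 labeled b,d; a "+" node over that node and c0 labeled c.)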

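DAG Representation of Basic Blocks
  a = b + c
  b = b - d
  c = c + d
  e = b + c
(Figure: DAG with leaves b0, c0, d0; a "+" node over b0 and c0 labeled a; a "-" node over b0 and d0 labeled b; a "+" node over c0 and d0 labeled c; a "+" node over the b and c nodes labeled e.)
- Also: discover dead code. Perform dead code elimination.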
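One way to see how the DAG exposes common subexpressions is to build it while scanning the block, reusing a node whenever the same operator is applied to the same operand nodes. The sketch below assumes a statement encoding `(x, op, y, z)` and a helper name `build_dag`; both are illustrative choices, not the lecture's notation.

```python
def build_dag(block):
    """block: list of (x, op, y, z) for x = y op z.
    Returns (nodes, current): nodes[i] is ("leaf", name) or (op, left, right);
    current maps each variable to the node holding its latest value."""
    nodes, index, current = [], {}, {}

    def node_for(key):
        # Reuse an existing node with the same operator and operands.
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def value_of(var):
        # A variable not yet assigned in the block refers to its initial value.
        if var not in current:
            current[var] = node_for(("leaf", var))
        return current[var]

    for x, op, y, z in block:
        left, right = value_of(y), value_of(z)
        current[x] = node_for((op, left, right))
    return nodes, current


first = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
         ("c", "+", "b", "c"), ("d", "-", "a", "d")]
nodes1, cur1 = build_dag(first)
print(cur1["b"] == cur1["d"])   # b and d share one node, so d = b

second = [("a", "+", "b", "c"), ("b", "-", "b", "d"),
          ("c", "+", "c", "d"), ("e", "+", "b", "c")]
nodes2, cur2 = build_dag(second)
print(cur2["a"] == cur2["e"])   # b + c is recomputed from new b, c: no sharing
```

In the first block the node for `a - d` is found again when processing `d = a - d`, which is what justifies rewriting it as `d = b`. In the second block `e = b + c` looks like `a = b + c` textually, but b and c have been redefined in between, so the DAG keeps separate nodes.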

Algebraic identities
- a = x^2 becomes a = x*x
- b = x*2 becomes b = x+x
- c = x/2 becomes c = x*0.5
- d = 1*x becomes d = x
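These rewrites can be expressed as a small pattern-matching function. This is a hedged sketch: the name `simplify` and the string-based operand encoding are invented for the example, and the `x/2` rule is only safe for floating-point operands (for integers, division truncates while multiplication by 0.5 does not).

```python
def simplify(op, left, right):
    """Rewrite 'left op right' using the identities above.
    Operands are strings; returns (op, left, right) or a bare operand."""
    if op == "^" and right == "2":
        return ("*", left, left)        # x^2 -> x*x
    if op == "*" and right == "2":
        return ("+", left, left)        # x*2 -> x+x
    if op == "*" and left == "2":
        return ("+", right, right)      # 2*x -> x+x
    if op == "/" and right == "2":
        return ("*", left, "0.5")       # x/2 -> x*0.5 (floating point only)
    if op == "*" and left == "1":
        return right                    # 1*x -> x
    if op == "*" and right == "1":
        return left                     # x*1 -> x
    return (op, left, right)            # no identity applies


for expr in [("^", "x", "2"), ("*", "x", "2"), ("/", "x", "2"), ("*", "1", "x")]:
    print(expr, "->", simplify(*expr))
```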

Simple code generation
- registers
  - used as operands of instructions
  - can be used to store temporary results
  - can (should) be used as loop indexes due to frequent arithmetic operations
  - used to manage administrative info (e.g., runtime stack)
- the number of registers is limited
  - need to allocate them in a clever way

Simple code generation
- assume machine instructions of the form
  - LD reg, mem
  - ST mem, reg
  - OP reg, reg, reg
- further assume that we have all registers available for our use
  - ignore registers allocated for stack management

Simple code generation
- translate each 3AC instruction separately
- A register descriptor keeps track of the variable names whose current value is in that register.
  - We use only those registers that are available for local use within a basic block, and we assume that initially all register descriptors are empty.
  - As code generation progresses, each register will hold the value of zero or more names.
- For each program variable, an address descriptor keeps track of the location or locations where the current value of that variable can be found.
  - The location may be a register, a memory address, a stack location, or some set of more than one of these.
  - The information can be stored in the symbol-table entry for that variable.

Simple code generation
For each three-address statement x := y op z:
1. Invoke getreg(x := y op z) to select registers Rx, Ry, and Rz.
2. If Ry does not contain y, issue "LD Ry, y'", for a location y' of y.
3. If Rz does not contain z, issue "LD Rz, z'", for a location z' of z.
4. Issue the instruction "OP Rx, Ry, Rz".
5. Update the address descriptors of x, y, z, if necessary.
   - Rx is the only location of x now, and Rx contains only x (remove Rx from other address descriptors).

Updating descriptors
1. For the instruction LD R, x
   a) Change the register descriptor for register R so it holds only x.
   b) Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R
   - Change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz, implementing a 3AC instruction x = y + z
   a) Change the register descriptor for Rx so that it holds only x.
   b) Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is not now in the address descriptor for x.
   c) Remove Rx from the address descriptor of any variable other than x.
4. When we process a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing descriptors as for all load statements (rule 1):
   a) Add x to the register descriptor for Ry.
   b) Change the address descriptor for x so that its only location is Ry.
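The four update rules can be sketched as methods on a small bookkeeping class. The class name `Descriptors` and its method names are hypothetical. Two details go slightly beyond the rules as stated and are assumptions added to keep both descriptor tables consistent: a load also removes the register from the address descriptors of whatever it held before, and when a register becomes the only location of x, x is dropped from every other register descriptor. The driver replays the instruction sequence `LD R1,A; LD R2,B; SUB R2,R1,R2; ...` for the block t = A - B; u = A - C; v = t + u; A = D; D = v + u.

```python
from collections import defaultdict


class Descriptors:
    def __init__(self, in_memory):
        self.reg = defaultdict(set)    # register -> names it currently holds
        self.addr = defaultdict(set)   # variable -> locations of its value
        for v in in_memory:
            self.addr[v].add(v)        # a variable's own memory location

    def _hold_only(self, r, x):
        # r now holds only x; its previous contents lose r as a location.
        for v in self.reg[r] - {x}:
            self.addr[v].discard(r)
        self.reg[r] = {x}

    def _only_location(self, x, r):
        # r becomes the only location of x (assumed consistency step).
        for other, names in self.reg.items():
            if other != r:
                names.discard(x)
        self.addr[x] = {r}

    def load(self, r, x):              # rule 1: LD r, x
        self._hold_only(r, x)
        self.addr[x].add(r)

    def store(self, x, r):             # rule 2: ST x, r
        self.addr[x].add(x)

    def op(self, rx, x):               # rule 3: OP rx, ry, rz for x = y op z
        self._hold_only(rx, x)
        self._only_location(x, rx)

    def copy(self, x, ry):             # rule 4: x = y with y already in ry
        self.reg[ry].add(x)
        self._only_location(x, ry)


d = Descriptors(["A", "B", "C", "D"])
d.load("R1", "A"); d.load("R2", "B"); d.op("R2", "t")   # t = A - B
d.load("R3", "C"); d.op("R1", "u")                       # u = A - C
d.op("R3", "v")                                          # v = t + u
d.load("R2", "D"); d.copy("A", "R2")                     # A = D
d.op("R1", "D")                                          # D = v + u
d.store("A", "R2"); d.store("D", "R1")                   # write back at exit
print(dict(d.reg))
print({v: locs for v, locs in d.addr.items() if locs})
```

At the end R1 holds D, R2 holds A, and R3 holds v, and both A and D are available in memory as well as in their registers.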

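Example
Block: t = A - B; u = A - C; v = t + u; A = D; D = v + u
A, B, C, D are live outside the block; t, u, v are temporaries in local storage.
Each state lists the register descriptors (R1, R2, R3) and then the address descriptors (A, B, C, D, t, u, v).
- Initially
  - R1: empty; R2: empty; R3: empty
  - A: A; B: B; C: C; D: D; t: none; u: none; v: none
- t = A - B
  - LD R1, A
  - LD R2, B
  - SUB R2, R1, R2
  - R1: A; R2: t; R3: empty
  - A: A,R1; B: B; C: C; D: D; t: R2; u: none; v: none
- u = A - C
  - LD R3, C
  - SUB R1, R1, R3
  - R1: u; R2: t; R3: C
  - A: A; B: B; C: C,R3; D: D; t: R2; u: R1; v: none
- v = t + u
  - ADD R3, R2, R1
  - R1: u; R2: t; R3: v
  - A: A; B: B; C: C; D: D; t: R2; u: R1; v: R3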

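Example (continued)
Block: t = A - B; u = A - C; v = t + u; A = D; D = v + u
A, B, C, D are live outside the block; t, u, v are temporaries in local storage.
- State after v = t + u
  - R1: u; R2: t; R3: v
  - A: A; B: B; C: C; D: D; t: R2; u: R1; v: R3
- A = D
  - LD R2, D
  - R1: u; R2: A,D; R3: v
  - A: R2; B: B; C: C; D: D,R2; t: none; u: R1; v: R3
- D = v + u
  - ADD R1, R3, R1
  - R1: D; R2: A; R3: v
  - A: R2; B: B; C: C; D: R1; t: none; u: none; v: R3
- exit
  - ST A, R2
  - ST D, R1
  - R1: D; R2: A; R3: v
  - A: A,R2; B: B; C: C; D: D,R1; t: none; u: none; v: R3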

Design of getReg
- many design choices
- simple rules
  - If y is currently in a register, pick a register already containing y as Ry. No need to load this register.
  - If y is not in a register, but there is a register that is currently empty, pick one such register as Ry.
- complicated case
  - y is not in a register, and there is no free register

Design of getReg
- instruction: x = y + z
- y is not in a register, no free register
- let R be a taken register holding the value of a variable v
- possibilities:
  - if the value of v is available somewhere other than R, we can allocate R to be Ry
  - if v is x, the value computed by the instruction, and x is not also the other operand z, we can use R as Ry (it is going to be overwritten anyway)
  - if v is not used later, we can use R as Ry
  - otherwise: spill the value to memory by ST v, R

Global register allocation
- so far we assumed that register values are written back to memory at the end of every basic block
- we want to save loads/stores by keeping frequently accessed values in registers
  - e.g., loop counters
- idea: compute a "weight" for each variable
  - for each use of v in B prior to any definition of v, add 1 point
  - for each block B in which v is assigned a value and is live on exit, add 2 points, as we save the store/load between blocks
  - $\mathrm{cost}(v) = \sum_{B} \left( \mathrm{use}(v,B) + 2 \cdot \mathrm{live}(v,B) \right)$
    - use(v,B) is the number of times v is used in B prior to any definition of v
    - live(v,B) is 1 if v is live on exit from B and is assigned a value in B
- after computing weights, allocate registers to the "heaviest" values

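Example
Flow graph with blocks B1 to B4:
- B1: a = b + c; d = d - b; e = a + f
- B2: f = a - d
- B3: b = d + f; e = a - c
- B4: b = d + c
(Figure: the edges are annotated with live-variable sets: bcdf, acde, acdf, cdef, acdef, bcdef; the exits are annotated "b,c,d,e,f live" and "b,d,e,f live".)
Costs:
- $\mathrm{cost}(a) = \sum_{B} \left( \mathrm{use}(a,B) + 2 \cdot \mathrm{live}(a,B) \right) = 4$
- cost(b) = 6
- cost(c) = 3
- cost(d) = 6
- cost(e) = 4
- cost(f) = 4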
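The weights in this example can be recomputed mechanically. The sketch below assumes a statement encoding `(x, y, z)` for `x = y op z` and a helper name `usage_costs`. The live-on-exit sets are one reading of the figure's annotations and should be treated as an assumption; only membership of the variables assigned in each block matters for the result.

```python
def usage_costs(blocks, live_out):
    """cost(v) = sum over blocks B of use(v, B) + 2 * live(v, B)."""
    cost = {}
    for name, stmts in blocks.items():
        defined = set()
        for x, y, z in stmts:
            for v in (y, z):
                if v not in defined:            # use before any definition in B
                    cost[v] = cost.get(v, 0) + 1
            defined.add(x)
        for v in defined & live_out[name]:      # assigned in B and live on exit
            cost[v] = cost.get(v, 0) + 2
    return cost


blocks = {
    "B1": [("a", "b", "c"), ("d", "d", "b"), ("e", "a", "f")],
    "B2": [("f", "a", "d")],
    "B3": [("b", "d", "f"), ("e", "a", "c")],
    "B4": [("b", "d", "c")],
}
# Assumed live-on-exit sets, read off the flow-graph figure.
live_out = {
    "B1": set("acdef"),
    "B2": set("cdef"),
    "B3": set("bcdef"),
    "B4": set("bcdef"),
}
costs = usage_costs(blocks, live_out)
print(sorted(costs.items()))
```

For a, for instance, B1 contributes 2 (a is assigned there and live on exit) and B2 and B3 contribute one use each, giving the total of 4 listed above.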

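Example
Generated code for the flow graph, with R0, R1, R2 holding a, b, d across blocks and R3 used as a scratch register:
- On entry (before B1): LD R1,b; LD R2,d
- B1: LD R3,c; ADD R0,R1,R3; SUB R2,R2,R1; LD R3,f; ADD R3,R0,R3; ST e,R3
- B2: SUB R3,R0,R2; ST f,R3
- B3: LD R3,f; ADD R1,R2,R3; LD R3,c; SUB R3,R0,R3; ST e,R3
- B4: LD R3,c; ADD R1,R2,R3
- At the two exits: ST b,R1; ST d,R2 and ST b,R1; ST a,R2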

Register Allocation by Graph Coloring
- Address register allocation by
  - liveness analysis
  - reduction to graph coloring
  - optimizations by program transformation
- Main idea
  - register allocation = coloring of an interference graph
  - every node is a variable
  - edge between variables that "interfere" = are both live at the same time
  - number of colors = number of registers

Example
(Figure: live ranges of the variables v1 to v8 plotted against time, and the resulting interference graph on the nodes V1 to V8.)

Example
Source:
  a = read();
  b = read();
  c = read();
  a = a + b + c;
  if (a<10) {
    d = c + 8;
    print(c);
  } else if (a<20) {
    e = 10;
    d = e + a;
    print(e);
  } else {
    f = 12;
    d = f + a;
    print(f);
  }
  print(d);
CFG:
- B1: a = read(); b = read(); c = read(); a = a + b + c; if (a<10) goto B2 else goto B3
- B2: d = c + 8; print(c);
- B3: if (a<20) goto B4 else goto B5
- B4: e = 10; d = e + a; print(e);
- B5: f = 12; d = f + a; print(f);
- B6: print(d);
(Figure: the live ranges of a, b, c, d, e, f are marked alongside the blocks.)

Example: Interference Graph
(Figure: interference graph over the nodes f, a, b, d, e, c, shown next to the CFG of the previous example with the live ranges of a to f marked on blocks B2 to B6.)

Register Allocation by Graph Coloring
- variables that interfere with each other cannot be allocated the same register
- graph coloring
  - classic problem: how to color the nodes of a graph with the lowest possible number of colors
  - bad news: the problem is NP-complete (and hard even to approximate)
  - good news: there are pretty good heuristic approaches

Heuristic Graph Coloring
- idea: color nodes one by one, coloring the "easiest" node last
- "easiest" nodes are the ones that have the lowest degree
  - fewer conflicts
- algorithm at a high level
  - find the least connected node
  - remove the least connected node from the graph
  - color the reduced graph recursively
  - re-attach the least connected node

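Heuristic Graph Coloring
Worked example on the interference graph with nodes f, a, b, d, e, c (the stack is written with its top on the left):
- Removal phase, least connected node first
  - remove b (stack: b)
  - remove c (stack: cb)
  - remove a (stack: acb)
  - remove e (stack: eacb)
  - remove d (stack: deacb)
  - remove f (stack: fdeacb)
- Coloring phase, re-attaching nodes in reverse order
  - f gets color 1 (stack: deacb)
  - d gets color 2 (stack: eacb)
  - e gets color 1 (stack: acb)
  - a gets color 2 (stack: cb)
  - c gets color 1 (stack: b)
  - b gets color 3 (stack: empty)
- Result: 3 registers for 6 variables
- Can we do with 2 registers?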
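The remove-then-recolor procedure can be sketched in Python. The edge list below is an assumption: it is one reading of the interference graph for the a to f example (two variables interfere when both are live at the same time), and the function name `color_graph` and the alphabetical tie-breaking are illustrative choices.

```python
def color_graph(adj):
    """adj: dict node -> set of neighbours. Returns dict node -> color."""
    graph = {n: set(ns) for n, ns in adj.items()}
    stack = []
    while graph:
        # Least connected node; ties broken alphabetically (an assumption).
        node = min(graph, key=lambda n: (len(graph[n]), n))
        stack.append(node)
        for m in graph.pop(node):
            graph[m].discard(node)
    colors = {}
    while stack:
        node = stack.pop()
        taken = {colors[m] for m in adj[node] if m in colors}
        # Lowest-numbered color not used by an already colored neighbour.
        colors[node] = next(c for c in range(1, len(adj) + 1) if c not in taken)
    return colors


# Assumed interference edges for the example with variables a..f.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("a", "e"), ("d", "e"), ("a", "f"), ("d", "f")]
adj = {v: set() for v in "abcdef"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

colors = color_graph(adj)
print(colors)
```

With these edges and this tie-breaking the removal order is b, c, a, e, d, f, matching the trace above. Two colors cannot suffice for this graph, because a, b, and c are all live at the same time in B1 and therefore pairwise interfere.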

Heuristic Graph Coloring
- two sources of non-determinism in the algorithm
  - choosing which of the (possibly many) nodes of lowest degree should be detached
  - choosing a free color from the available colors

Heuristic Graph Coloring
- The above heuristic gives a coloring of the graph.
- But what we really need is to color the graph with a given number of colors = number of available registers.
- Many times this is not possible.
  - (Telling whether it is possible is NP-hard.)
- We'd like to find the maximum sub-graph that can be colored.
- Vertices that cannot be colored will represent variables that will not be assigned a register.


Similar Heuristic
(Figure: interference graph on the nodes V1 to V8.)
1. Iteratively remove any vertex whose degree < k (with all of its edges).
2. Note: no matter how we color the other vertices, this one can be colored legitimately!
4. Now all vertices are of degree >= k (or the graph is empty).
5. If the graph is empty: color the vertices one by one as in the previous slides. Otherwise,
6. Choose any vertex and remove it from the graph. Implication: this variable will not be assigned a register. Repeat this step until we have a vertex with degree < k, and go back to (1).
Source of non-determinism: choosing which vertex to remove in (6). This decision determines the number of spills.
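A sketch of this k-color variant follows. The function name `allocate`, the alphabetical tie-breaking, and especially the spill choice in step (6) are assumptions: the slide allows any vertex, and this sketch picks one of highest remaining degree. The graph is the same assumed a to f interference graph used for the worked coloring example.

```python
def allocate(adj, k):
    """Try to color adj with k colors; returns (colors, spilled)."""
    graph = {n: set(ns) for n, ns in adj.items()}
    stack, spilled = [], []
    while graph:
        low = [n for n in graph if len(graph[n]) < k]
        if low:
            node = min(low)                  # step 1: degree < k, always colorable
            stack.append(node)
        else:
            # Step 6: every vertex has degree >= k, so give one up.
            # Assumed policy: spill a vertex of highest remaining degree.
            node = max(graph, key=lambda n: (len(graph[n]), n))
            spilled.append(node)
        for m in graph.pop(node):
            graph[m].discard(node)
    colors = {}
    while stack:
        node = stack.pop()
        taken = {colors[m] for m in adj[node] if m in colors}
        # Fewer than k colored neighbours remain, so a free color exists.
        colors[node] = next(c for c in range(1, k + 1) if c not in taken)
    return colors, spilled


# Assumed interference edges for the example with variables a..f.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("a", "e"), ("d", "e"), ("a", "f"), ("d", "f")]
adj = {v: set() for v in "abcdef"}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

colors3, spilled3 = allocate(adj, 3)
colors2, spilled2 = allocate(adj, 2)
print(colors3, spilled3)
print(colors2, spilled2)
```

With three registers nothing is spilled. With two, this particular policy spills a, after which the remaining five variables fit into two registers; a different choice in step (6) could lead to a different, possibly larger, set of spills.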

Summary: Code Generation
- Depends on the target language and platform.
  - GNU Assembly
  - IA-32 platform
- Basic blocks and the control flow graph show program execution paths.
- Determining variable liveness in a basic block
  - useful for many optimizations
  - most important use: register allocation
- Simple code generation.
- Better register allocation via a graph coloring heuristic.
