Code Generation Compiler Baojian Hua

Presentation on theme: "Code Generation Compiler Baojian Hua"— Presentation transcript:

1 Code Generation Compiler Baojian Hua bjhua@ustc.edu.cn

2 Middle and Back End [Diagram: AST, translation to IR1, translation to IR2, other IRs and translations, asm]

3 Back-end Structure [Diagram components: IR, instruction selector, Assem, register allocator, TempMap, instruction scheduler]

4 Recap CODE: Procedures, Control Flow, Statements, Data Access. DATA: Global Static Variables, Global Dynamic Data, Local Variables, Temporaries, Parameter Passing, Read-only Data. What about “CODE”?

5 A Simpler Target ISA To simplify the discussion, let's start with a much simpler ISA: a stack machine. Stack machines were once very popular but are rare today because of their low speed. We still discuss them because: generating code for a stack machine is simpler; many (virtual) stack machines are in wide use today: Pascal P-code, Java bytecode, PostScript, …

6 Code Generation for Stack Machines

7 Stack Machine [Diagram: Memory, Stack, ALU, Control] Stack-based: no registers; the ALU operates on the stack and the memory; the stack is used for expression calculation and function calls (it is also called the operand stack on the JVM).

8 Stack Machine ISA // ISA syntax s -> push NUM | pop x | unwind n | load x | store x | add | sub | mult | div | call f | ret Instruction groups: stack operations (push, pop, unwind); memory access (load, store); arithmetic (add, sub, mult, div); function call and return (call, ret). A subset of the Java virtual machine language (JVML)!

9 Frame and Stack Each function comes with two storages: a frame and a stack. frame: holds arguments, locals and control information. stack: used for computation.

10 ISA Semantics: push push NUM: top++; stack[top] = NUM; Example (push 3): before: stack = …; after: stack = … 3.

11 ISA Semantics: pop pop x: x = stack[top]; top--; Example: before: stack = … 3; after: stack = …, and x = 3 in the frame.

12 ISA Semantics: unwind unwind n: top -= n; Example: before: stack = … v … v (n values on top); after: stack = ….

13 ISA Semantics: load load x: top++; stack[top] = x; Example: before: stack = …; after: stack = … x (the value of x from the frame).

14 ISA Semantics: store store x: x = stack[top]; top--; Example: before: stack = … v; after: stack = …, and x = v in the frame.

15 ISA Semantics: add add: temp = stack[top-1] + stack[top]; top -= 2; push temp; Example: before: stack = … 5 1; after: stack = … 6.

16 ISA Semantics: sub sub: temp = stack[top-1] - stack[top]; top -= 2; push temp; Example: before: stack = … 5 1; after: stack = … 4.

17 ISA Semantics: mult mult: temp = stack[top-1] * stack[top]; top -= 2; push temp; Example: before: stack = … 5 2; after: stack = … 10.

18 ISA Semantics: call call f: // create a new frame for f // pop all arguments to f’s frame Example: before: caller stack = … 5 2, frame for f empty; after: caller stack = …, frame for f holds m = 5, n = 2.

19 ISA Semantics: ret ret: // pop callee’s value and push it onto the caller’s stack top Example: before: callee stack = … v; after: callee stack is empty and caller stack = … v.
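The instruction semantics on slides 10 to 19 can be sketched as a small interpreter. This is a minimal sketch, not the course's implementation: the tuple encoding of instructions, the names `run`, `ARITH` and `PROGRAM`, integer division for `div`, and a one-argument built-in `print` are all assumptions made for illustration. Code is straight-line, as the ISA has no jumps.

```python
import operator

# Arithmetic instructions; "div" is assumed to be integer division.
ARITH = {"add": operator.add, "sub": operator.sub,
         "mult": operator.mul, "div": operator.floordiv}


def run(funcs, entry="main"):
    """Interpret a program; return the values passed to the built-in print.

    funcs maps a function name to (params, code); code is a list of
    tuples such as ("push", 10), ("load", "m") or ("add",).
    """
    output = []

    def call(name, args):
        params, code = funcs[name]
        frame = dict(zip(params, args))      # frame: arguments and locals
        stack = []                           # operand stack of this call
        for instr in code:
            op = instr[0]
            if op == "push":
                stack.append(instr[1])
            elif op in ("pop", "store"):     # both move the stack top into x
                frame[instr[1]] = stack.pop()
            elif op == "load":
                stack.append(frame[instr[1]])
            elif op == "unwind":
                del stack[len(stack) - instr[1]:]
            elif op in ARITH:
                right = stack.pop()          # stack[top]
                left = stack.pop()           # stack[top-1]
                stack.append(ARITH[op](left, right))
            elif op == "call" and instr[1] == "print":
                output.append(stack.pop())   # one-argument print (assumption)
            elif op == "call":
                n = len(funcs[instr[1]][0])
                actuals = stack[len(stack) - n:]
                del stack[len(stack) - n:]   # pop the arguments into f's frame
                stack.append(call(instr[1], actuals))  # push f's result
            elif op == "ret":
                return stack.pop()
            else:
                raise ValueError(f"unknown instruction: {instr}")
        return None

    call(entry, [])
    return output


# The main/plus program used in the worked example of this deck.
PROGRAM = {
    "main": ([], [("push", 10), ("store", "m"), ("push", 5), ("store", "n"),
                  ("load", "m"), ("load", "n"), ("call", "plus"),
                  ("store", "z"), ("load", "z"), ("call", "print")]),
    "plus": (["x", "y"], [("load", "x"), ("load", "y"), ("add",),
                          ("store", "t"), ("load", "t"), ("ret",)]),
}
print(run(PROGRAM))  # prints [15]
```

Each call gets its own frame and operand stack, which mirrors the "two storages per function" picture on slide 9.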

20 Extended SLP // Extending SLP with functions (* is the Kleene closure): prog -> func* func -> id (x1, …, xn){ s } s -> s; s | x := e | print (es) | return e e -> n | x | e+e | e-e | e*e | e/e | f(es) es -> e, es | \eps

21 Sample Programs main (){ m := 10; n := 5; z := plus (m, n); print (z); } plus (x, y){ t := x+y; return t; }

22 Recursive Descent Code Generation // Invariant: expression’s value is on stack top gen_s (s1; s2) = gen_s (s1); gen_s (s2); gen_s (x := e) = gen_e (e); “store x” gen_s (print (es)) = gen_es (es); “call print” gen_s (return e) = gen_e (e); “ret” gen_e (n) = “push n” gen_e (x) = “load x” gen_e (e1+e2) = gen_e (e1); gen_e (e2); “add” gen_e (…) // similar for -, *, / gen_e (f(es)) = gen_es (es); “call f” gen_es (e, es) = gen_e (e); gen_es (es)
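The generation rules above translate almost line for line into code. The following is a minimal sketch under assumptions: the AST is encoded as nested tuples (`("num", n)`, `("var", x)`, `("+", e1, e2)`, `("call", f, [args])`, `("assign", x, e)`, and so on), and the generators return lists of instruction strings. That encoding is hypothetical; only the rules themselves come from the slide.

```python
BINOPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}


def gen_e(e):
    """Code that leaves the value of expression e on the stack top."""
    kind = e[0]
    if kind == "num":
        return [f"push {e[1]}"]
    if kind == "var":
        return [f"load {e[1]}"]
    if kind in BINOPS:
        return gen_e(e[1]) + gen_e(e[2]) + [BINOPS[kind]]
    if kind == "call":
        return gen_es(e[2]) + [f"call {e[1]}"]
    raise ValueError(f"unknown expression: {e!r}")


def gen_es(es):
    """Code that pushes each argument, left to right."""
    return [line for e in es for line in gen_e(e)]


def gen_s(s):
    """Code for a statement."""
    kind = s[0]
    if kind == "seq":
        return gen_s(s[1]) + gen_s(s[2])
    if kind == "assign":
        return gen_e(s[2]) + [f"store {s[1]}"]
    if kind == "print":
        return gen_es(s[1]) + ["call print"]
    if kind == "return":
        return gen_e(s[1]) + ["ret"]
    raise ValueError(f"unknown statement: {s!r}")


# plus (x, y){ t := x+y; return t; }
plus_body = ("seq",
             ("assign", "t", ("+", ("var", "x"), ("var", "y"))),
             ("return", ("var", "t")))
print(gen_s(plus_body))
# ['load x', 'load y', 'add', 'store t', 'load t', 'ret']
```

Because every `gen_e` case ends with exactly one new value on the stack, the invariant holds by induction on the expression tree.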

23 Example main (){ m := 10; n := 5; z := plus (m, n); print (z); } plus (x, y){ t := x+y; return t; } 0: push 10 // <- main 1: store m 2: push 5 3: store n 4: load m 5: load n 6: call plus 7: store z 8: load z 9: call print 10: load x // <- plus 11: load y 12: add 13: store t 14: load t 15: ret

24 Example (slides 24 to 42 repeat the listing of slide 23 with a pc marker and show the machine state) frame for main: m, n, z unset; operand stack: empty.

25 Example frame for main: m, n, z unset; operand stack: 10.

26 Example No change in the recorded state from slide 25.

27 Example frame for main: m = 10, n and z unset; operand stack: empty.

28 Example frame for main: m = 10, n and z unset; operand stack: 5.

29 Example No change in the recorded state from slide 28.

30 Example frame for main: m = 10, n = 5, z unset; operand stack: empty.

31 Example frame for main: m = 10, n = 5, z unset; operand stack: 10.

32 Example frame for main: m = 10, n = 5, z unset; operand stack: 10 5.

33 Example frame for main: m = 10, n = 5, z unset; operand stack: 10 5. A frame for plus is created: x, y, t unset; operand stack: empty.

34 Example frame for main: m = 10, n = 5, z unset; operand stack: empty. frame for plus: x = 10, y = 5, t unset; operand stack: 10.

35 Example frame for main: m = 10, n = 5, z unset. frame for plus: x = 10, y = 5, t unset; operand stack: 10 5.

36 Example frame for main: m = 10, n = 5, z unset. frame for plus: x = 10, y = 5, t unset; operand stack: 10 5, with the sum 15 being formed.

37 Example frame for main: m = 10, n = 5, z unset. frame for plus: x = 10, y = 5, t unset; operand stack: 15.

38 Example frame for main: m = 10, n = 5, z unset. frame for plus: x = 10, y = 5, t = 15; operand stack: 15.

39 Example No change in the recorded state from slide 38.

40 Example No change in the recorded state from slide 38.

41 Example frame for main: m = 10, n = 5, z = 15. frame for plus: x = 10, y = 5, t = 15; operand stack: 15.

42 Example No change in the recorded state from slide 41.

43 Running the stack machine code Run the code on a real stack machine, if one is lucky enough to buy one… Write an interpreter (virtual machine), just like the JVM. Mimic a stack machine on a non-stack machine: e.g., use the call stack on x86 as both the operand stack and the function frame, or create a customized software stack.

44 Mimic stack machine on x86 // gen_s as before gen_e (n) = “pushl $n” gen_e (x) = “pushl x” gen_e (e1+e2) = gen_e (e1) gen_e (e2) “addl 0(%esp), 4(%esp)” “addl $4, %esp” correct?

45 Mimic stack machine on x86 // gen_s as before gen_e (n) = “pushl $n” gen_e (x) = “pushl x” gen_e (e1+e2) = gen_e (e1) gen_e (e2) “popl %edx” “addl %edx, 0(%esp)”
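The slide-44 version is not correct because x86 arithmetic instructions accept at most one memory operand, so `addl 0(%esp), 4(%esp)` cannot be encoded; slide 45 fixes this by popping the right operand into `%edx` first. A minimal sketch of the slide-45 scheme follows. The tuple AST and the name `gen_e_x86` are assumptions, and the `subl` case is an extension by analogy that does not appear on the slide.

```python
X86_OPS = {"+": "addl", "-": "subl"}  # "-" is an assumed analogue of "+"


def gen_e_x86(e):
    """Code that leaves the value of e on top of the x86 call stack."""
    kind = e[0]
    if kind == "num":
        return [f"pushl ${e[1]}"]
    if kind == "var":
        return [f"pushl {e[1]}"]
    if kind in X86_OPS:
        # The right operand is popped into %edx; the left operand stays in
        # memory at 0(%esp) and is updated in place with the result.
        return (gen_e_x86(e[1]) + gen_e_x86(e[2])
                + ["popl %edx", f"{X86_OPS[kind]} %edx, 0(%esp)"])
    raise ValueError(f"unknown expression: {e!r}")


print(gen_e_x86(("+", ("var", "x"), ("num", 1))))
# ['pushl x', 'pushl $1', 'popl %edx', 'addl %edx, 0(%esp)']
```

In AT&T syntax the destination is the second operand, so `subl %edx, 0(%esp)` computes left minus right, matching the stack machine's `sub`.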

46 Better code generation Generating stack machine code for x86 reveals a serious defect: the generated code may be too slow. This is more severe on RISC machines, whose arithmetic instructions do not operate on memory directly, so there may be a lot of “load” and “store” instructions. A better idea is to introduce some registers and some more instructions into the stack machine.

47 Stack Machine with one Register [Diagram: Memory, Stack, ALU, Control, register r] Stack-based, but with one register: r.

48 Revised Stack Machine ISA // ISA syntax v -> NUM | x | r s -> push v | pop v | unwind n | load v | store v | add | sub | mult | div | call f | ret | mov v, v // ISA semantics (sample) add: r = stack[top]+r; top--; Example: with 1 and 2 as the stack top and r before “add”, r = 3 afterwards.

49 Recursive Descent Code Generation (revised) // Invariant: expression value is in register “r” gen_s (s1; s2) = gen_s (s1); gen_s (s2); gen_s (x := e) = gen_e (e); “mov r, x” gen_s (print (es)) = gen_es (es); “call print” gen_s (return e) = gen_e (e); “ret” gen_e (n) = “mov n, r” gen_e (x) = “mov x, r” gen_e (e1+e2) = gen_e (e1); “push r”; gen_e (e2); “add” gen_e (…) // similar for -, *, / gen_e (s, e) = gen_s (s); gen_e (e) gen_es (e, es) = gen_e (e); “push r”; gen_es (es)
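The revised rules can be sketched in the same way as the pure stack scheme. This is a minimal sketch with an assumed tuple AST and assumed function names (`gen_e_reg`, `gen_es_reg`, `gen_s_reg`); the slide gives no rule for function calls in expressions, so none is included here.

```python
REG_OPS = {"+": "add", "-": "sub", "*": "mult", "/": "div"}


def gen_e_reg(e):
    """Code that leaves the value of e in register r."""
    kind = e[0]
    if kind in ("num", "var"):
        return [f"mov {e[1]}, r"]
    if kind in REG_OPS:
        # Left operand is saved on the stack while the right one is
        # computed into r; the arithmetic instruction combines the two.
        return gen_e_reg(e[1]) + ["push r"] + gen_e_reg(e[2]) + [REG_OPS[kind]]
    raise ValueError(f"unknown expression: {e!r}")


def gen_es_reg(es):
    """Code that evaluates each argument into r and pushes it."""
    return [line for e in es for line in gen_e_reg(e) + ["push r"]]


def gen_s_reg(s):
    """Code for a statement."""
    kind = s[0]
    if kind == "seq":
        return gen_s_reg(s[1]) + gen_s_reg(s[2])
    if kind == "assign":
        return gen_e_reg(s[2]) + [f"mov r, {s[1]}"]
    if kind == "print":
        return gen_es_reg(s[1]) + ["call print"]
    if kind == "return":
        return gen_e_reg(s[1]) + ["ret"]
    raise ValueError(f"unknown statement: {s!r}")


# plus (x, y){ t := x+y; return t; }
plus_body = ("seq",
             ("assign", "t", ("+", ("var", "x"), ("var", "y"))),
             ("return", ("var", "t")))
print(gen_s_reg(plus_body))
# ['mov x, r', 'push r', 'mov y, r', 'add', 'mov r, t', 'mov t, r', 'ret']
```

Compared with the pure stack scheme, a leaf operand now costs one `mov` instead of a push followed later by a pop, and only the left operand of each binary operator travels through the stack.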

50 Example main (){ m := 10; n := 5; z := plus (m, n); print (z); } plus (x, y){ t := x+y; return t; } 0: mov 10, r // <- main 1: mov r, m 2: mov 5, r 3: mov r, n 4: load m 5: load n 6: call plus 7: mov r, z 8: load z 9: call print 10: mov x, r // <- plus 11: push r 12: mov y, r 13: add 14: mov r, t 15: load t 16: ret

51 More registers? Can we put all intermediate results in registers and thus not need a stack? For instance, if we have two extra registers, r1 and r2, is the following code generation scheme right? gen_e (e1+e2) = gen_e (e1); “mov r, r1”; gen_e (e2); “mov r, r2”; “add r1, r2, r”
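One way to answer the question is to execute the proposed scheme on a nested expression. The sketch below does not generate code; it directly performs the register moves the scheme would emit. The function name, the tuple encoding and the sample values are assumptions, and only `+` is modelled.

```python
def eval_two_regs(e, env, regs):
    """Run the two-extra-register scheme directly on a register file."""
    if isinstance(e, int):
        regs["r"] = e                          # mov n, r
    elif isinstance(e, str):
        regs["r"] = env[e]                     # mov x, r
    else:
        _, e1, e2 = e                          # only "+" is modelled
        eval_two_regs(e1, env, regs)
        regs["r1"] = regs["r"]                 # mov r, r1
        eval_two_regs(e2, env, regs)           # may overwrite r1!
        regs["r2"] = regs["r"]                 # mov r, r2
        regs["r"] = regs["r1"] + regs["r2"]    # add r1, r2, r
    return regs["r"]


env = {"a": 1, "b": 2, "c": 3}
left_nested = ("+", ("+", "a", "b"), "c")      # (a + b) + c
right_nested = ("+", "a", ("+", "b", "c"))     # a + (b + c)
print(eval_two_regs(left_nested, env, {}))     # 6, correct
print(eval_two_regs(right_nested, env, {}))    # 7, wrong: r1 was clobbered
```

So the scheme is not right in general: when `e2` is itself a sum, evaluating it reuses `r1` and destroys the saved value of `e1`. A fixed number of registers cannot hold an unbounded number of live intermediate results, which is why a stack (or spilling) is still needed.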

52 Code Generation for Register-based Machines

53 Register Machine [Diagram: Memory, register file r1 … rn, ALU, Control] Register-based: a set of registers (some machines have 16, typically 32); the ALU operates on registers; load/store instructions move data between memory and registers; registers hold all local variables, arguments, and temporaries.

54 Better code generator Recursive descent code generation is relatively old, but it is efficient and easy to implement; you’ll do this in lab3. Most modern compilers generate code for some register machine (IR). Next, we discuss a widely used IR: 3-address code, a register-based IR.

55 3-address-code v -> NUM | id s -> x = v1 ⊕ v2 // arith | x = v // move | x[v1] = v2 // store | x = y[v] // load | x = f (v1, …, vn) // call | Cjmp (v1, L1, L2) // conditional | Jmp L // uncond. jump | Label L // label | Return v // return

56 Recursive Descent Code Generation // Invariant: gen_e returns the register that holds the expression’s value gen_s (s1; s2) = gen_s (s1); gen_s (s2); gen_s (x := e) = r = gen_e (e); “x = r” gen_s (print (es)) = (r1, …, rn) = gen_es (es); “print(r1, …, rn)” gen_s (return e) = r = gen_e (e); “ret r” gen_e (n) = “r = n”, r gen_e (x) = “r = x”, r gen_e (e1+e2) = r1 = gen_e (e1); r2 = gen_e (e2); “r3 = r1+r2”, r3 gen_e (…) // similar for -, *, / gen_e (f(es)) = (r1, …, rn) = gen_es (es); “r = f(r1, …, rn)”, r gen_es (e, es) = r = gen_e (e); rs = gen_es (es); (r, rs)
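For 3-address code, each `gen_e` call both emits instructions and returns the name of the temporary that holds the result. A minimal sketch, assuming a tuple AST and a hypothetical `TacGen` class with a fresh-name counter (neither is from the slides):

```python
class TacGen:
    """Emit 3-address code; gen_e returns the temp holding the value."""

    def __init__(self):
        self.code = []
        self.count = 0

    def fresh(self):
        self.count += 1
        return f"r{self.count}"

    def gen_e(self, e):
        kind = e[0]
        if kind in ("num", "var"):
            r = self.fresh()
            self.code.append(f"{r} = {e[1]}")
            return r
        if kind in ("+", "-", "*", "/"):
            r1 = self.gen_e(e[1])
            r2 = self.gen_e(e[2])
            r3 = self.fresh()
            self.code.append(f"{r3} = {r1} {kind} {r2}")
            return r3
        if kind == "call":
            args = [self.gen_e(a) for a in e[2]]
            r = self.fresh()
            self.code.append(f"{r} = {e[1]}({', '.join(args)})")
            return r
        raise ValueError(f"unknown expression: {e!r}")

    def gen_s(self, s):
        kind = s[0]
        if kind == "seq":
            self.gen_s(s[1])
            self.gen_s(s[2])
        elif kind == "assign":
            self.code.append(f"{s[1]} = {self.gen_e(s[2])}")
        elif kind == "print":
            args = [self.gen_e(a) for a in s[1]]
            self.code.append(f"print({', '.join(args)})")
        elif kind == "return":
            self.code.append(f"ret {self.gen_e(s[1])}")
        else:
            raise ValueError(f"unknown statement: {s!r}")


# plus (x, y){ t := x+y; return t; }
g = TacGen()
g.gen_s(("seq",
         ("assign", "t", ("+", ("var", "x"), ("var", "y"))),
         ("return", ("var", "t"))))
print(g.code)
# ['r1 = x', 'r2 = y', 'r3 = r1 + r2', 't = r3', 'r4 = t', 'ret r4']
```

The scheme is deliberately naive: it copies every leaf into a fresh temporary. Hand-written listings usually skip such copies (for example `ret t` instead of `r4 = t; ret r4`); later passes such as copy propagation and register allocation remove them.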

57 Example main (){ m := 10; n := 5; z := plus (m, n); print (z); } plus (x, y){ t := x+y; return t; } 0: r1 = 10 // <- main 1: m = r1 2: r2 = 5 3: n = r2 4: z = plus(m, n) 5: call print(z) 6: r3 = x // <- plus 7: r4 = y 8: r5 = r3+r4 9: t = r5 10: ret t

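58 Tree pattern matching Consider this statement: z = x + y, whose tree is =(z, +(x, y)). One tiling gives: movl x, t; movl y, s; addl s, t; movl t, z. However, this is not optimal at all!

59 Tree pattern matching Consider this statement: z = x + y, whose tree is =(z, +(x, y)). A tiling with a larger tile gives: movl x, t; addl y, t; movl t, z.

60 Or better Consider this statement: z = x + y, whose tree is =(z, +(x, y)). A single tile for the whole tree gives: movl x, z; addl y, z.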

61 Best tiling? In practice, many different tilings exist. We want a tiling with “minimal cost”: usually the smallest code size, but the cost of individual instructions, etc., can also be taken into account. Two notions: optimum tiling and optimal tiling.

62 Optimal tilings Optimal tiling is easy: a simple greedy algorithm. The well-understood algorithm is maximal munch: start at the root; use the “biggest” match (in number of tree nodes).
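A greedy top-down tiler in the spirit of maximal munch can be sketched as follows. This is an illustrative sketch with a deliberately tiny tile set, not a full instruction selector: trees are nested tuples with string leaves, operands are abstract locations as in the movl/addl tilings of `z = x + y`, only `+` and `-` are covered, and the names `munch_move`, `munch_operand` and `select` are assumptions.

```python
import itertools

OPS = {"+": "addl", "-": "subl"}


def munch_move(dst, src, out, temps):
    """Tile the tree move(dst, src), appending instructions to out."""
    if isinstance(src, str):
        out.append(f"movl {src}, {dst}")     # tile: move(dst, leaf)
        return
    op, left, right = src                    # biggest tile: move(dst, op(_, _))
    r = munch_operand(right, out, temps)
    if r == dst:                             # dst is about to be overwritten,
        r = f"t{next(temps)}"                # so save its old value first
        out.append(f"movl {dst}, {r}")
    munch_move(dst, left, out, temps)        # compute the left operand in dst
    out.append(f"{OPS[op]} {r}, {dst}")


def munch_operand(e, out, temps):
    """Return a location holding e; a leaf is its own zero-instruction tile."""
    if isinstance(e, str):
        return e
    t = f"t{next(temps)}"
    munch_move(t, e, out, temps)
    return t


def select(dst, src):
    """Instruction list for the statement dst = src."""
    out = []
    munch_move(dst, src, out, itertools.count(1))
    return out


print(select("z", ("+", "x", "y")))
# ['movl x, z', 'addl y, z']
print(select("z", ("+", "x", ("-", "a", "b"))))
# ['movl a, t1', 'subl b, t1', 'movl x, z', 'addl t1, z']
```

At the root, the tiler prefers the tile that covers both the assignment and the operator, which is why `z = x + y` costs two instructions rather than four. Machine constraints such as the x86 limit of one memory operand per instruction are ignored here.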

63 Optimum tiling Optimum tiling is hard a dynamic programming problem start from the leaves, bottom up carefully calculate some cost

64 Maximal munch rules (sample) z = x + y => movl x, z; addl y, z. z = x - y => movl x, z; subl y, z. z = x * y => movl x, z; mult y, z. z = x / y => movl x, z; divl y, z. But one must take the machine constraints into account! What if both y and z are in memory? Solution: decide the memory layout before code generation. Multiplication and division make special use of registers. Solution: treat these instructions in an ad-hoc way.

65 Example Source: int f (int x, int y) { int a; int b; int c; int d; a = x + y; b = a + 4; c = b * 2; d = b / 8; return 0; } Argument positions: x: 8(%ebp), y: 12(%ebp). Positions for a, b, c, d cannot be decided now. After instruction selection: int f (int x, int y){ int a,b,c,d; int t1, t2; /* prologue */ pushl %ebp movl %esp, %ebp /* body */ movl 8(%ebp), t1 movl 12(%ebp), t2 movl t1, a addl t2, a movl a, b addl $4, b movl b, %eax imult $2 movl %eax, c movl b, %eax cltd idivl $8 movl %eax, d movl $0, %eax /* epilogue */ leave ret }

66 Register allocation After instruction selection, some variables still have no location: put as many of them as possible into registers (speed!) and the extras in memory (spilling). This requires liveness analysis. All of this will be discussed later.

67 Register Allocation Register allocation determines that: a => ecx b => ecx c => eax d => eax t1 => ecx t2 => eax int f (int x, int y){ int a,b,c,d; int t1, t2; pushl %ebp movl %esp, %ebp movl 8(%ebp), t1 movl 12(%ebp), t2 movl t1, a addl t2, a movl a, b addl $4, b movl b, %eax imult $2 movl %eax, c movl b, %eax cltd idivl $8 movl %eax, d movl $0, %eax leave ret }

68 Rewriting Register allocation determines that: a => ecx b => ecx c => eax d => eax t1 => ecx t2 => eax .globl f f: pushl %ebp movl %esp, %ebp movl 8(%ebp), %ecx movl 12(%ebp), %eax movl %ecx, %ecx addl %eax, %ecx movl %ecx, %ecx addl $4, %ecx movl %ecx, %eax imult $2 movl %eax, %eax movl %ecx, %eax cltd idivl $8 movl %eax, %eax movl $0, %eax leave ret

69 Peep-hole Optimization Register allocation determines that: a => ecx b => ecx c => eax d => eax t1 => ecx t2 => eax .globl f f: pushl %ebp movl %esp, %ebp movl 8(%ebp), %ecx movl 12(%ebp), %eax movl %ecx, %ecx addl %eax, %ecx movl %ecx, %ecx addl $4, %ecx movl %ecx, %eax imult $2 movl %eax, %eax movl %ecx, %eax cltd idivl $8 movl %eax, %eax movl $0, %eax leave ret Self-moves such as movl %ecx, %ecx and movl %eax, %eax have no effect and can be removed.

70 After Optimization .globl f f: pushl %ebp movl %esp, %ebp movl 8(%ebp), %ecx movl 12(%ebp), %eax addl %eax, %ecx addl $4, %ecx movl %ecx, %eax imult $2 movl %ecx, %eax cltd idivl $8 movl $0, %eax leave ret Source: int f (int x, int y) { int a; int b; int c; int d; a = x + y; b = a + 4; c = b * 2; d = b / 8; return 0; }
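The step from the rewritten code to the optimized code amounts to deleting moves whose source and destination are the same register. A minimal sketch of that single peephole rule follows. The function name `remove_self_moves` is an assumption, the instruction strings use the deck's own mnemonics, and operands that contain commas (such as scaled-index addressing) are not handled.

```python
def remove_self_moves(instrs):
    """Drop 'movl X, X' instructions, which have no effect."""
    kept = []
    for ins in instrs:
        parts = ins.split(None, 1)           # mnemonic, operand text
        if len(parts) == 2 and parts[0] == "movl":
            src, _, dst = parts[1].partition(",")
            if src.strip() == dst.strip():
                continue                     # self-move: skip it
        kept.append(ins)
    return kept


# Body of f after register allocation and rewriting.
rewritten = [
    "pushl %ebp", "movl %esp, %ebp",
    "movl 8(%ebp), %ecx", "movl 12(%ebp), %eax",
    "movl %ecx, %ecx", "addl %eax, %ecx",
    "movl %ecx, %ecx", "addl $4, %ecx",
    "movl %ecx, %eax", "imult $2", "movl %eax, %eax",
    "movl %ecx, %eax", "cltd", "idivl $8", "movl %eax, %eax",
    "movl $0, %eax", "leave", "ret",
]
for ins in remove_self_moves(rewritten):
    print(ins)
```

Four self-moves disappear, leaving the 14 instructions listed under "After Optimization". A real peephole optimizer would slide a small window over the code and apply several such rules repeatedly.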

