Intermediate Representation I: High-Level to Low-Level IR Translation
Where We Are...
Source code (character stream)
   -> Lexical Analysis (regular expressions) -> token stream
   -> Syntax Analysis (grammars) -> abstract syntax tree
   -> Semantic Analysis (static semantics) -> abstract syntax tree + symbol tables, types
   -> Intermediate Code Gen -> intermediate code
Intermediate Representation (aka IR)
v The compiler's internal representation
»Is language-independent and machine-independent
v Enables machine-independent and machine-dependent optimizations
[Figure: AST -> IR (optimize) -> Pentium, Java bytecode, Itanium, TI C5x, ARM]
What Makes a Good IR?
v Captures high-level language constructs
»Easy to translate from AST
»Supports high-level optimizations
v Captures low-level machine features
»Easy to translate to assembly
»Supports machine-dependent optimizations
v Narrow interface: small number of node types (instructions)
»Easy to optimize
»Easy to retarget
Multiple IRs
v Most compilers use 2 IRs:
»High-level IR (HIR): language independent, but closer to the language
»Low-level IR (LIR): machine independent, but closer to the machine
»A significant part of the compiler is both language and machine independent!
[Figure: C++, C, Fortran -> AST -> HIR (optimize) -> LIR (optimize) -> Pentium, Java bytecode, Itanium, TI C5x, ARM]
High-Level IR
v HIR is essentially the AST
»Must be expressive for all input languages
v Preserves high-level language constructs
»Structured control flow: if, while, for, switch
»Variables, expressions, statements, functions
v Allows high-level optimizations based on properties of the source language
»Function inlining, memory dependence analysis, loop transformations
Low-Level IR
v A set of instructions which emulates an abstract machine (typically RISC)
v Has low-level constructs
»Unstructured jumps, registers, memory locations
v Types of instructions
»Arithmetic/logic (a = b OP c), unary operations, data movement (move, load, store), function call/return, branches
Alternatives for LIR
v 3 general alternatives
»Three-address code or quadruples: a = b OP c
   Advantage: makes compiler analysis/optimization easier
»Tree representation
   Was popular for CISC architectures
   Advantage: easier to generate machine code
»Stack machine
   Like Java bytecode
   Advantage: easier to generate from AST
Three-Address Code
v a = b OP c
»Originally, because an instruction had at most 3 addresses or operands
   This is not enforced today, e.g., MAC: a = b * c + d
»May have fewer operands
v Also called quadruples: (a,b,c,OP)
v Example: a = (b+c) * (-e)
   t1 = b + c
   t2 = -e
   a = t1 * t2
   (t1 and t2 are compiler-generated temporary variables)
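One possible way to store quadruples in a compiler written in C is sketched below. The type name `Quad`, its field names, and the helper `quad_to_string` are illustrative assumptions, not part of the slides; the table `example` encodes the slide's a = (b+c) * (-e).

```c
#include <stdio.h>
#include <string.h>

/* One three-address instruction stored as a quadruple (dst, src1, src2, op).
   The type and field names are illustrative, not taken from the slides. */
typedef struct {
    const char *dst;   /* result operand */
    const char *src1;  /* first source operand */
    const char *src2;  /* second source operand, NULL for unary operations */
    const char *op;    /* operator spelling, e.g. "+", "*", "-" */
} Quad;

/* Render a quadruple in the "a = b OP c" or "a = OP b" textual form. */
static int quad_to_string(const Quad *q, char *buf, size_t n) {
    if (q->src2 != NULL)
        return snprintf(buf, n, "%s = %s %s %s", q->dst, q->src1, q->op, q->src2);
    return snprintf(buf, n, "%s = %s%s", q->dst, q->op, q->src1);
}

/* The example a = (b+c) * (-e), using temporaries t1 and t2. */
static const Quad example[3] = {
    { "t1", "b",  "c",  "+" },
    { "t2", "e",  NULL, "-" },
    { "a",  "t1", "t2", "*" },
};
```

Calling `quad_to_string(&example[i], buf, sizeof buf)` for i = 0, 1, 2 reproduces the three instructions of the example one at a time.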
IR Instructions
v Assignment instructions
»a = b OP c (binary op)
   arithmetic: ADD, SUB, MUL, DIV, MOD
   logic: AND, OR, XOR
   comparisons: EQ, NEQ, LT, GT, LEQ, GEQ
»a = OP b (unary op)
   arithmetic MINUS, logical NEG
»a = b : copy instruction
»a = [b] : load instruction
»[a] = b : store instruction
»a = addr b : symbolic address
v Flow of control
»label L : label instruction
»jump L : unconditional jump
»cjump a L : conditional jump
v Function call
»call f(a1,..., an)
»a = call f(a1,..., an)
v IR describes the instruction set of an abstract machine
IR Operands
v The operands in 3-address code can be:
»Program variables
»Constants or literals
»Temporary variables
v Temporary variables = new locations
»Used to store intermediate values
»Needed because 3-address code is not as expressive as high-level languages
Class Problem
Convert the following code segment to assembly code:
   n = 0;
   while (n < 10) {
      n = n+1;
   }
Translating High IR to Low IR
v May have nested language constructs
»E.g., while nested within an if statement
v Need an algorithmic way to translate
»Strategy for each high IR construct
»High IR construct -> sequence of low IR instructions
v Solution
»Start from the high IR (AST-like) representation
»Define a translation for each node in high IR
»Recursively translate nodes
Notation
v Use the following notation:
»[[e]] = the low IR representation of high IR construct e
v [[e]] is a sequence of low IR instructions
v If e is an expression (or statement expression), it represents a value
»Denoted as: t = [[e]]
»Low IR representation of e whose result value is stored in t
v For variable v: t = [[v]] is the copy instruction
»t = v
Translating Expressions
v Binary operations: t = [[e1 OP e2]]
»(arithmetic, logical operations and comparisons)
»Tree: OP node with children e1 and e2
   t1 = [[e1]]
   t2 = [[e2]]
   t = t1 OP t2
v Unary operations: t = [[OP e]]
»Tree: OP node with child e
   t1 = [[e]]
   t = OP t1
Translating Array Accesses
v Array access: t = [[ v[e] ]]
»(type of v is array[T] and S = size of T)
»Tree: array node with children v and e
   t1 = addr v
   t2 = [[e]]
   t3 = t2 * S
   t4 = t1 + t3
   t = [t4] /* i.e., load */
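The address arithmetic in this template can be mirrored step by step in C. The sketch below assumes an array of int, so S = sizeof(int); the function name `load_element` and the use of byte pointers for raw addresses are choices made only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Follows the template for t = [[ v[e] ]] on an int array.
   load_element is a hypothetical name used only in this sketch. */
static int load_element(const int *v, size_t e) {
    const unsigned char *t1 = (const unsigned char *)v;  /* t1 = addr v   */
    size_t t2 = e;                                       /* t2 = [[e]]    */
    size_t t3 = t2 * sizeof(int);                        /* t3 = t2 * S   */
    const unsigned char *t4 = t1 + t3;                   /* t4 = t1 + t3  */
    int t;
    memcpy(&t, t4, sizeof t);                            /* t = [t4], the load */
    return t;
}
```

For any in-bounds index, `load_element(v, e)` returns the same value as the source-level expression `v[e]`.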
Translating Structure Accesses
v Structure access: t = [[ v.f ]]
»(v is of type T, S = offset of f in T)
»Tree: struct node with children v and f
   t1 = addr v
   t2 = t1 + S
   t = [t2] /* i.e., load */
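In C the offset S is available through the standard `offsetof` macro, so the template can be written out directly. The record type `struct Point` and the function `load_y` are hypothetical names introduced for this sketch.

```c
#include <stddef.h>
#include <string.h>

struct Point { int x; int y; };   /* hypothetical record type T */

/* Follows the template for t = [[ v.f ]] with f = y. */
static int load_y(const struct Point *v) {
    const unsigned char *t1 = (const unsigned char *)v;        /* t1 = addr v */
    const unsigned char *t2 = t1 + offsetof(struct Point, y);  /* t2 = t1 + S */
    int t;
    memcpy(&t, t2, sizeof t);                                  /* t = [t2], the load */
    return t;
}
```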
Translating Short-Circuit OR
v Short-circuit OR: t = [[e1 SC-OR e2]]
»e.g., || operator in C/C++
»Tree: SC-OR node with children e1 and e2
   t = [[e1]]
   cjump t Lend
   t = [[e2]]
   Lend:
v Semantics:
   1. Evaluate e1
   2. If e1 is true, then done
   3. Else evaluate e2
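The SC-OR template can be tried out in C by writing the conditional jump as an explicit goto. This is only a sketch: `sc_or`, `eval_e2` and the counter `e2_evaluations` are made-up names, and the counter exists solely to make it visible when the right operand is skipped.

```c
/* Counts how often the right operand is evaluated. */
static int e2_evaluations = 0;

/* Stands in for the code [[e2]]; it just records that it ran. */
static int eval_e2(int value) {
    e2_evaluations++;
    return value;
}

/* t = [[e1 SC-OR e2]] written with an explicit conditional jump. */
static int sc_or(int e1, int e2) {
    int t;
    t = e1;                 /* t = [[e1]]   */
    if (t) goto Lend;       /* cjump t Lend */
    t = eval_e2(e2);        /* t = [[e2]]   */
Lend:
    return t;
}
```

With a true left operand the call returns without touching `eval_e2`, which is exactly step 2 of the semantics.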
Class Problem
v Short-circuit AND: t = [[e1 SC-AND e2]]
»e.g., && operator in C/C++
v Semantics:
   1. Evaluate e1
   2. If e1 is true, then evaluate e2
   3. Else done
Translating Statements
v Statement sequence: [[s1; s2;...; sN]]
»Tree: seq node with children s1, s2, ..., sN
v IR instructions of a statement sequence = concatenation of IR instructions of statements
   [[ s1 ]]
   [[ s2 ]]
   ...
   [[ sN ]]
Assignment Statements
v Variable assignment: [[ v = e ]]
   v = [[ e ]]
v Array assignment: [[ v[e1] = e2 ]]
   t1 = addr v
   t2 = [[e1]]
   t3 = t2 * S
   t4 = t1 + t3
   t5 = [[e2]]
   [t4] = t5 /* i.e., store */
»Recall S = sizeof(T) where v is array(T)
Translating If-Then [-Else]
v [[ if (e) then s ]]
   t1 = [[ e ]]
   t2 = not t1
   cjump t2 Lend
   [[ s ]]
   Lend:
v [[ if (e) then s1 else s2 ]]
   t1 = [[ e ]]
   t2 = not t1
   cjump t2 Lelse
   Lthen:
   [[ s1 ]]
   jump Lend
   Lelse:
   [[ s2 ]]
   Lend:
v How could I do this more efficiently??
While Statements
v [[ while (e) s ]]
»while-do translation
   Lloop:
   t1 = [[ e ]]
   t2 = NOT t1
   cjump t2 Lend
   [[ s ]]
   jump Lloop
   Lend:
»or: do-while translation
   t1 = [[ e ]]
   t2 = NOT t1
   cjump t2 Lend
   Lloop:
   [[ s ]]
   t3 = [[ e ]]
   cjump t3 Lloop
   Lend:
v Which is better and why?
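One way to compare the two translations is to run both on the loop `n = 0; while (n < limit) n = n + 1;` and count the jump instructions that actually execute. The C sketch below does this with gotos; the function names and the `jumps` counter are assumptions introduced for the comparison and are not part of the IR.

```c
/* Both functions execute "n = 0; while (n < limit) n = n + 1;" using only
   labels, gotos and conditional gotos, and count executed jump instructions. */

static int while_do(int limit, int *jumps) {
    int n = 0, t1, t2;
    *jumps = 0;
Lloop:
    t1 = n < limit;             /* t1 = [[ e ]]  */
    t2 = !t1;                   /* t2 = NOT t1   */
    (*jumps)++;                 /* cjump t2 Lend */
    if (t2) goto Lend;
    n = n + 1;                  /* [[ s ]]       */
    (*jumps)++;                 /* jump Lloop    */
    goto Lloop;
Lend:
    return n;
}

static int do_while(int limit, int *jumps) {
    int n = 0, t1, t2, t3;
    *jumps = 0;
    t1 = n < limit;             /* t1 = [[ e ]]                       */
    t2 = !t1;                   /* t2 = NOT t1                        */
    (*jumps)++;                 /* cjump t2 Lend, executed only once  */
    if (t2) goto Lend;
Lloop:
    n = n + 1;                  /* [[ s ]]        */
    t3 = n < limit;             /* t3 = [[ e ]]   */
    (*jumps)++;                 /* cjump t3 Lloop */
    if (t3) goto Lloop;
Lend:
    return n;
}
```

For limit = 10 the while-do form executes 21 jumps (two per iteration plus the exit test), while the do-while form executes 11 (the guard plus one per iteration), at the cost of emitting the code for e twice.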
Switch Statements
v [[ switch (e) case v1:s1,..., case vN:sN ]]
   t = [[ e ]]
   L1: c = t != v1
       cjump c L2
       [[ s1 ]]
       jump Lend /* if there is a break */
   L2: c = t != v2
       cjump c L3
       [[ s2 ]]
       jump Lend /* if there is a break */
   ...
   Lend:
v Can also implement switch as a table lookup
»Table contains target labels, i.e., L1, L2, L3; 't' is used to index the table
»Benefit: k branches reduced to 1
»Negative: target of the branch is hard to figure out in hardware
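Standard C has no first-class labels, so the sketch below approximates the table-lookup idea with a table of function pointers: each handler stands in for one case body, and the table index plays the role of 't'. The dense case values 1..3, the handler names and the returned numbers are all assumptions made for the illustration.

```c
#include <stddef.h>

/* Case bodies s1..s3 and a default, written as hypothetical handler functions. */
static int case_one(void)     { return 10; }
static int case_two(void)     { return 20; }
static int case_three(void)   { return 30; }
static int case_default(void) { return -1; }

/* Table of targets indexed by the switch value; stands in for labels L1, L2, L3. */
static int (*const table[])(void) = { case_one, case_two, case_three };

static int dispatch(int t) {
    size_t count = sizeof table / sizeof table[0];
    /* One range check replaces the chain of k compare-and-branch pairs. */
    if (t < 1 || (size_t)t > count)
        return case_default();
    return table[t - 1]();      /* a single indirect transfer through the table */
}
```

The indirect call is the single branch mentioned on the slide; its target depends on run-time data, which is why hardware has a harder time predicting it.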
Call and Return Statements
v [[ call f(e1, e2,..., eN) ]]
   t1 = [[ e1 ]]
   t2 = [[ e2 ]]
   ...
   tN = [[ eN ]]
   call f(t1, t2,..., tN)
v [[ return e ]]
   t = [[ e ]]
   return t
Nested Expressions
v Translation recurses on the expression structure
v Example: t = [[ (a - b) * (c + d) ]]
   /* [[ (a - b) ]] */
   t1 = a
   t2 = b
   t3 = t1 - t2
   /* [[ (c + d) ]] */
   t4 = c
   t5 = d
   t5 = t4 + t5
   /* [[ (a - b) * (c + d) ]] */
   t = t3 * t5
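The recursive scheme can be sketched as a small C routine that walks an expression tree and appends low IR text to a buffer. Everything named here (`Expr`, `translate`, `next_temp`, `out`) is an assumption for illustration. Unlike the slide, the sketch allocates a fresh temporary for every result instead of reusing t5 and writing the final value into t.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical high IR expression node: a variable leaf or a binary operation. */
typedef struct Expr {
    char op;                        /* 0 for a variable leaf, else '+', '-', '*', '/' */
    const char *name;               /* variable name when op == 0 */
    const struct Expr *lhs, *rhs;   /* operands when op != 0 */
} Expr;

static int next_temp = 1;           /* counter for compiler-generated temporaries */
static char out[512];               /* emitted low IR, one instruction per line */

/* Emit the code for tN = [[e]] and return N, the temporary holding the value. */
static int translate(const Expr *e) {
    char line[64];
    int t;
    if (e->op == 0) {
        t = next_temp++;
        snprintf(line, sizeof line, "t%d = %s\n", t, e->name);   /* copy instruction */
    } else {
        int t1 = translate(e->lhs);                              /* t1 = [[e1]] */
        int t2 = translate(e->rhs);                              /* t2 = [[e2]] */
        t = next_temp++;
        snprintf(line, sizeof line, "t%d = t%d %c t%d\n", t, t1, e->op, t2);
    }
    strncat(out, line, sizeof out - strlen(out) - 1);
    return t;
}
```

Building the tree for (a - b) * (c + d) and calling `translate` on its root leaves seven instructions in `out`, ending with `t7 = t3 * t6`.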
Nested Statements
v Same for statements: recursive translation
v Example: [[ if c then if d then a = b ]]
   /* [[ if c ... ]] */
   t1 = c
   t2 = NOT t1
   cjump t2 Lend1
      /* [[ if d ... ]] */
      t3 = d
      t4 = NOT t3
      cjump t4 Lend2
         /* [[ a = b ]] */
         t3 = b
         a = t3
      Lend2:
   Lend1:
Class Problem
Translate the following to the generic assembly code discussed:
   for (i=0; i<100; i++) {
      A[i] = 0;
   }
   if ((a > 0) && (b > 0))
      c = 2;
   else
      c = 3;
Issues v These translations are straightforward v But, inefficient: »Lots of temporaries »Lots of labels »Lots of instructions v Can we do this more intelligently? »Should we worry about it?
Intermediate Representation II: Storage Allocation and Management
Overview v Program Organization v Memory pools »Static »Automatic »Dynamic v Activation Records
Classes of Storage in Processor
v Registers
»Fast access, but only a few of them
»Address space not visible to programmer
   Doesn't support pointer access!
v Memory
»Slow access, but large
»Supports pointers
v Storage class for each variable is generally determined when mapping HIR to LIR
Distinct Regions of Memory
v Code space – Instructions to be executed
»Best if read-only
v Static (or Global) – Variables that retain their value over the lifetime of the program
v Stack – Variables that live only as long as the block within which they are defined (local)
v Heap – Variables that are defined by calls to the system storage allocator (malloc, new)
Virtual Address Space
v Traditional Organization (address 0x0 at the bottom, 0xffffffff at the top)
»Code Area at the bottom
»Static Data above
   Constants
   Static strings, variables
   Global variables
»Heap
   Grows upward
»Stack
   Grows downward
»Lots of free VM in between
Class Problem
Specify whether each variable is stored in a register or in memory. For memory, which area of the memory?
   int a;
   void foo(int b, double c) {
      int d;
      struct { int e; char f; } g;
      int h[10];
      char i = 5;
      float j;
   }
Zooming In
v A closer look at the code area
Execution Stack v A memory area at the top of the VM »Grows downward »Grows on demand (with OS collaboration) v Purpose »Automatic storage for local variables
Overview v Program Organization v Memory pools »Static »Automatic »Dynamic v Activation Records v Parameter Passing Modes v Symbol Table
Memory Pools
v Where does memory come from?
v Three pools
»Static
»Automatic
»Dynamic
Static Pool
v Content
»All the static "strings" that appear in the program
»All the static constants used in the program
»All the global/static variables declared in the program
   static int
   static arrays
   static records
   static ...
v Allocation?
»Well... it is static, i.e.,
   All the sizes are determined at compile time.
   Cannot grow or shrink
Dynamic Pool
v Content
»Anything allocated by the program at runtime
v Allocation
»Depends on the language
   C: malloc
   C++/Java/C#: new
   ML/Lisp/Scheme: implicit
v Deallocation
»Depends on the language
   C: free
   C++: delete
   Java/C#/ML/Lisp/Scheme: garbage collection
Automatic Pool v Content »Local variables »Actuals (arguments to methods/functions/procedures) v Allocation »Automatic when calling a method/function/procedure v Deallocation »Automatic when returning from a method/function/procedure v Management policy »Stack-like
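The three pools can be seen side by side in a few lines of C. The sketch below is illustrative only: the names `global_counter`, `next_id` and `make_cell` are invented, and the comments state which pool each object is assumed to come from under the classification used in these slides.

```c
#include <stdlib.h>

int global_counter = 0;            /* static pool: lives for the whole program */

static int next_id(void) {
    static int calls = 0;          /* static pool, but visible only inside next_id */
    int local = 100;               /* automatic pool: a fresh copy on every call */
    calls++;
    local++;
    global_counter++;
    return local + calls;          /* 101 plus the number of calls so far */
}

static int *make_cell(int value) {
    int *cell = malloc(sizeof *cell);   /* dynamic pool: outlives this call */
    if (cell != NULL)
        *cell = value;
    return cell;                        /* the caller is responsible for free() */
}
```

Successive calls to `next_id` return increasing values because `calls` keeps its value between calls, whereas `local` restarts at 100 each time; the cell returned by `make_cell` stays valid until it is freed.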
Activation Records
v Also known as "Frames"
»A record pushed on the execution stack
Creating the Frame
v Three actors
»The caller
»The CPU
»The callee
   int foo(int x,int y) {
      ...
   }
   bar() {
      ...
      x = foo(3,y);
      ...
   }
Closeup on management data
Returning From a Call
v Easy
»The RET instruction simply
   Accesses the MGMT area from FP
   Restores SP
   Restores FP
   Transfers control to the return address
Stack Frame Construction Example
   int f(int a) { int b, c; }
   void g(int a) { int b, c; ... b = f(a+c); ... }
   main() { int a, b; ... g(a+b); ... }
Stack contents while f is executing:
   main:  a                  local var
          b                  local var
   g:     a + b              parameter
          ret addr to main
          FP/SP for main
          b                  local var
          c                  local var
   f:     a + c              parameter
          ret addr to g
          FP/SP for g
          b                  local var
          c                  local var
Note: I have left out the temp part of the stack frame
Class Problem
For the following program:
   int foo(int a) {
      int x;
      if (a <= 1) return 1;
      x = foo(a-1) + foo(a-2);
      return (x);
   }
   main() {
      int y, z = 10;
      y = foo(z);
   }
1. Show the first 3 stack frames created when this program is executed (starting with main).
2. What's the maximum number of frames the stack grows to during the execution of this program?
Data Layout
v Naive layout strategies generally employed
»Place the data in the order the programmer declared it!
v 2 issues: size, alignment
v Size – How many bytes is the data item?
»Base types have some fixed size
   E.g., char, int, float, double
»Composite types (structs, unions, arrays)
   Overall size is sum of the components (not quite!)
   Calculate an offset for each field
Memory Alignment
v Cannot arbitrarily pack variables into memory
»Need to worry about alignment
v Golden rule – Address of a variable is aligned based on the size of the variable
»Char is byte aligned (any addr is fine)
»Short is halfword aligned
»Int is word aligned
»This rule is for C/C++; other languages may have slightly different rules
Structure Alignment (for C)
v Each field is laid out in the order it is declared, using the Golden Rule for aligning
v Identify the largest field (for an array field, use its element type)
»Starting address of the overall struct is aligned based on the largest field
»Size of the overall struct is a multiple of the size of the largest field
»The reason: every element in an array of structs must stay aligned
Structure Example
   struct {
      char w;
      int x[3];
      char y;
      short z;
   }
v Largest field type is int (4 bytes)
»Struct size is a multiple of 4
»Struct must start at a word-aligned address
v Field layout
»char w: 1 byte, can start anywhere
»x[3]: 12 bytes, but must start at a word-aligned address, so 3 empty bytes between w and x
»char y: 1 byte, can start anywhere
»short z: 2 bytes, but must start at a halfword-aligned address, so 1 empty byte between y and z
v Total size = 20 bytes!
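The layout rules can be checked mechanically. The following C sketch computes field offsets and the total size from a list of (size, alignment) pairs; `Field`, `align_up` and `layout_struct` are names invented here. It assumes the sizes used in this example (char = 1, short = 2, int = 4) and that an array field is aligned like its element type.

```c
#include <stddef.h>

/* One field: its total size in bytes and its required alignment in bytes.
   For an array field the alignment is that of the element type. */
typedef struct { size_t size; size_t align; } Field;

/* Round offset up to the next multiple of align (align must be nonzero). */
static size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Lay fields out in declaration order, store each offset, return the struct size. */
static size_t layout_struct(const Field *fields, size_t n, size_t *offsets) {
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, fields[i].align);   /* insert padding if needed */
        offsets[i] = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    /* Pad the end so that every element in an array of structs stays aligned. */
    return align_up(offset, max_align);
}
```

Describing the example as `{1,1}, {12,4}, {1,1}, {2,2}` yields offsets 0, 4, 16 and 18 and a total size of 20 bytes. A real compiler's answer can be compared using `sizeof` and `offsetof`, though the result depends on the target ABI.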
Class Problem
How many bytes of memory does the following sequence of C declarations require (int = 4 bytes)?
   short a[100];
   char b;
   int c;
   double d;
   short e;
   struct { char f; int g[1]; char h[2]; } i;