Chapter 8 ICS 412
Code Generation Code generation is the final phase of compilation. It generates executable code for a target machine. A compiler may instead generate some form of assembly code that must be processed further by an assembler, a linker, and a loader.
Intermediate Code An intermediate representation that looks like target code is called intermediate code. Intermediate code can take many forms. Two popular forms are:
1. Three-address code
2. P-code
Intermediate Code Intermediate code is particularly useful when the goal of the compiler is to produce efficient code.
– Producing efficient code requires analyzing the properties of the target code, and this analysis is easier to carry out on intermediate code.
Intermediate code can also be useful in making the compiler independent of the target machine.
– To generate code for a different target machine, we only need to write a translator from the intermediate code to the new target code.
Three-Address Code Three-address code has the following general form: x = y op z The name "three-address code" comes from this form of instruction.
– x, y, and z each represent an address in memory.
Example Consider the arithmetic expression: 2 * a + (b - 3) The corresponding three-address code is:
    t1 = 2 * a
    t2 = b - 3
    t3 = t1 + t2
Three-Address Code t1, t2, and t3 are temporaries that correspond to the interior nodes of the syntax tree and represent their computed values, with the final temporary (t3, in this example) representing the value of the root.
[Figure: syntax tree with root +; its left child * has leaves 2 and a, and its right child - has leaves b and 3.]
Three-Address Code The above three-address code represents a left-to-right linearization of the syntax tree. Another order is possible for this three-address code, namely (with a different meaning for the temporaries):
    t1 = b - 3
    t2 = 2 * a
    t3 = t2 + t1
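The left-to-right linearization can be sketched as a post-order walk of the syntax tree that allocates one temporary per interior node. This is a minimal illustration, not a full code generator: the tuple encoding of the tree and the function name gen_tac are assumptions made for the example.

```python
from itertools import count


def gen_tac(tree):
    """Return (code, result_name) for a tuple-encoded syntax tree.

    Interior nodes are (op, left, right); leaves are constants or names.
    This encoding is an assumption made for illustration.
    """
    code = []
    temps = count(1)

    def visit(node):
        # Leaves need no instruction; they are used directly as operands.
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        left_name = visit(left)    # left subtree first: left-to-right order
        right_name = visit(right)
        temp = f"t{next(temps)}"   # one temporary per interior node
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = visit(tree)
    return code, result


# 2 * a + (b - 3)
tree = ("+", ("*", 2, "a"), ("-", "b", 3))
code, result = gen_tac(tree)
print("\n".join(code))
# t1 = 2 * a
# t2 = b - 3
# t3 = t1 + t2
```

Visiting the right subtree before the left one would produce the alternative order, with t1 and t2 swapping meanings.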
Three-Address Code One form of three-address code is insufficient to represent all language features. For instance, unary operators use a two-address variation of three-address code: t2 = - t1 It is necessary to vary the form of three-address code to represent all programming constructs.
Factorial Example
Source program:
    read x;
    if 0 < x then
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact
    end
Three-address code:
    read x
    t1 = x > 0
    if_false t1 goto L1
    fact = 1
    label L2
    t2 = fact * x
    fact = t2
    t3 = x - 1
    x = t3
    t4 = x == 0
    if_false t4 goto L2
    write fact
    label L1
    halt
Factorial Example This code contains a number of different forms of three-address code:
– The built-in input/output operations read and write have been translated directly into one-address instructions.
– A conditional jump instruction, if_false, is used to translate both the if-statement and the repeat-statement; it contains two addresses.
– A one-address label instruction marks the position that a jump refers to.
– A halt instruction represents the end of the code.
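To see how these instruction forms behave together, the factorial code can be run through a small interpreter. The sketch below is only an illustration: the tuple encoding of instructions (with the tags "op" and "copy") and the name run_tac are assumptions, not part of the three-address notation itself.

```python
def run_tac(program, inputs):
    """Interpret a tiny three-address program and return the written values."""
    # Map each label name to the index of its label instruction.
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    ops = {
        "*": lambda a, b: a * b,
        "-": lambda a, b: a - b,
        ">": lambda a, b: a > b,
        "==": lambda a, b: a == b,
    }
    env, out, pc = {}, [], 0
    inputs = iter(inputs)

    def val(operand):
        # Integer operands are constants; strings name variables or temporaries.
        return operand if isinstance(operand, int) else env[operand]

    while True:
        ins = program[pc]
        pc += 1
        kind = ins[0]
        if kind == "read":            # one-address input instruction
            env[ins[1]] = next(inputs)
        elif kind == "write":         # one-address output instruction
            out.append(val(ins[1]))
        elif kind == "copy":          # x = y
            env[ins[1]] = val(ins[2])
        elif kind == "op":            # x = y op z
            _, x, y, op, z = ins
            env[x] = ops[op](val(y), val(z))
        elif kind == "if_false":      # if_false t goto L
            if not val(ins[1]):
                pc = labels[ins[2]]
        elif kind == "label":         # labels only mark positions
            pass
        elif kind == "halt":
            return out


factorial = [
    ("read", "x"),
    ("op", "t1", "x", ">", 0),
    ("if_false", "t1", "L1"),
    ("copy", "fact", 1),
    ("label", "L2"),
    ("op", "t2", "fact", "*", "x"),
    ("copy", "fact", "t2"),
    ("op", "t3", "x", "-", 1),
    ("copy", "x", "t3"),
    ("op", "t4", "x", "==", 0),
    ("if_false", "t4", "L2"),
    ("write", "fact"),
    ("label", "L1"),
    ("halt",),
]

print(run_tac(factorial, [5]))  # [120]
print(run_tac(factorial, [0]))  # [] because the if-test fails and control jumps to L1
```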
P-Code P-code began as a standard target assembly code produced by a number of Pascal compilers of the 1970s and early 1980s. It was designed to be the actual code for a hypothetical stack machine, called the P-machine, for which an interpreter was written on various actual machines. The idea was to make Pascal compilers easily portable by requiring only that the P-machine interpreter be rewritten for a new platform.
P-Code The P-machine consists of:
– A code memory.
– An unspecified data memory for named variables.
– A stack for temporary data, together with whatever registers are needed to maintain the stack and support execution.
Example 1 Consider the expression: 2*a+(b-3) P-code for this expression is as follows:
    ldc 2    ; load constant 2
    lod a    ; load value of variable a
    mpi      ; integer multiplication
    lod b    ; load value of variable b
    ldc 3    ; load constant 3
    sbi      ; integer subtraction
    adi      ; integer addition
Example 1 These instructions are to be viewed as representing the following P-machine operations:
– ldc 2 pushes the value 2 onto the temporary stack.
– lod a pushes the value of the variable a onto the stack.
– mpi pops these two values from the stack, multiplies them, and pushes the result onto the stack.
– lod b and ldc 3 push the value of b and the constant 3 onto the stack (there are now three values on the stack).
– sbi pops the top two values from the stack, subtracts them, and pushes the result.
– adi pops the remaining two values from the stack, adds them, and pushes the result.
– The code ends with a single value on the stack, representing the result of the computation.
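The stack behavior described above can be traced with a few lines of Python. This is a hedged sketch covering only the five instructions used in this example; the function name run_expr and the sample values a = 5 and b = 10 are assumptions chosen for illustration.

```python
ARITH = {
    "adi": lambda left, right: left + right,
    "sbi": lambda left, right: left - right,
    "mpi": lambda left, right: left * right,
}


def run_expr(code, variables):
    """Execute straight-line P-code; return the stack after each instruction."""
    stack, trace = [], []
    for instr in code:
        op, *args = instr.split()
        if op == "ldc":                 # push a constant
            stack.append(int(args[0]))
        elif op == "lod":               # push the value of a variable
            stack.append(variables[args[0]])
        else:                           # binary arithmetic on the top two values
            right = stack.pop()
            left = stack.pop()
            stack.append(ARITH[op](left, right))
        trace.append(list(stack))
    return trace


code = ["ldc 2", "lod a", "mpi", "lod b", "ldc 3", "sbi", "adi"]
trace = run_expr(code, {"a": 5, "b": 10})
for instr, snapshot in zip(code, trace):
    print(f"{instr:<6} {snapshot}")
# ldc 2  [2]
# lod a  [2, 5]
# mpi    [10]
# lod b  [10, 10]
# ldc 3  [10, 10, 3]
# sbi    [10, 7]
# adi    [17]
```

The snapshot after ldc 3 holds three values, and the final snapshot holds the single result, matching the description above.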
Example 2 Consider the assignment statement: x = y + 1 The corresponding P-code is:
    lda x    ; load address of x
    lod y    ; load value of y
    ldc 1    ; load constant 1
    adi      ; add
    sto      ; store top to address below top & pop both
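The order of these instructions follows from how sto works: the address must already be on the stack, below the value, when sto executes. A generator for assignments might therefore emit lda first, then the code for the right-hand side, then sto. The sketch below assumes a tuple-encoded expression tree; the names gen_expr and gen_assign are hypothetical.

```python
OPCODES = {"+": "adi", "-": "sbi", "*": "mpi"}


def gen_expr(node, code):
    """Append P-code that leaves the value of node on top of the stack."""
    if isinstance(node, int):
        code.append(f"ldc {node}")      # constant operand
    elif isinstance(node, str):
        code.append(f"lod {node}")      # variable operand
    else:
        op, left, right = node
        gen_expr(left, code)
        gen_expr(right, code)
        code.append(OPCODES[op])        # operator comes after both operands


def gen_assign(target, expr):
    """Return P-code for the assignment target = expr."""
    code = [f"lda {target}"]            # address first, so it ends up below the value
    gen_expr(expr, code)
    code.append("sto")                  # store value to address, popping both
    return code


code = gen_assign("x", ("+", "y", 1))
print("\n".join(code))
# lda x
# lod y
# ldc 1
# adi
# sto
```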
Example 3
    read x;
    if 0 < x then
      fact := 1;
      repeat
        fact := fact * x;
        x := x - 1
      until x = 0;
      write fact
    end
P-Code
    lda x       ; load address of x
    rdi         ; read an integer, store to address on top of stack (& pop it)
    lod x       ; load the value of x
    ldc 0       ; load constant 0
    grt         ; pop and compare top two values, push Boolean result
    fjp L1      ; pop Boolean value, jump to L1 if false
    lda fact    ; load address of fact
    ldc 1       ; load constant 1
    sto         ; pop two values, storing first to address of second
    lab L2      ; definition of label L2
    lda fact    ; load address of fact
    lod fact    ; load value of fact
    lod x       ; load value of x
    mpi         ; multiply
    sto         ; store top to address of second & pop
    lda x       ; load address of x
    lod x       ; load value of x
    ldc 1       ; load constant 1
    sbi         ; subtract
    sto         ; store (as before)
    lod x       ; load value of x
    ldc 0       ; load constant 0
    equ         ; test for equality
    fjp L2      ; jump to L2 if false
    lod fact    ; load value of fact
    wri         ; write top of stack & pop
    lab L1      ; definition of label L1
    stp         ; stop
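The complete listing can be checked by running it on a toy P-machine. The interpreter below is a minimal sketch under stated assumptions: instructions are encoded as tuples, an address is modelled simply as the variable's name, and only the instructions that appear in this example are supported. The name run_pcode is hypothetical.

```python
def run_pcode(program, inputs):
    """Interpret a small subset of P-code and return the written values."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "lab"}
    binary = {
        "adi": lambda a, b: a + b,
        "sbi": lambda a, b: a - b,
        "mpi": lambda a, b: a * b,
        "grt": lambda a, b: a > b,
        "equ": lambda a, b: a == b,
    }
    memory, stack, out, pc = {}, [], [], 0
    inputs = iter(inputs)
    while True:
        op, *arg = program[pc]
        pc += 1
        if op == "lda":
            stack.append(arg[0])             # address modelled as the variable name
        elif op == "lod":
            stack.append(memory[arg[0]])
        elif op == "ldc":
            stack.append(arg[0])
        elif op == "rdi":
            memory[stack.pop()] = next(inputs)
        elif op == "sto":
            value = stack.pop()              # value is on top
            memory[stack.pop()] = value      # address is just below it
        elif op in binary:
            right = stack.pop()
            left = stack.pop()
            stack.append(binary[op](left, right))
        elif op == "fjp":
            if not stack.pop():              # jump only when the popped value is false
                pc = labels[arg[0]]
        elif op == "wri":
            out.append(stack.pop())
        elif op == "lab":
            pass                             # labels only mark positions
        elif op == "stp":
            return out
        else:
            raise ValueError(f"unknown instruction: {op}")


factorial = [
    ("lda", "x"), ("rdi",),
    ("lod", "x"), ("ldc", 0), ("grt",), ("fjp", "L1"),
    ("lda", "fact"), ("ldc", 1), ("sto",),
    ("lab", "L2"),
    ("lda", "fact"), ("lod", "fact"), ("lod", "x"), ("mpi",), ("sto",),
    ("lda", "x"), ("lod", "x"), ("ldc", 1), ("sbi",), ("sto",),
    ("lod", "x"), ("ldc", 0), ("equ",), ("fjp", "L2"),
    ("lod", "fact"), ("wri",),
    ("lab", "L1"), ("stp",),
]

print(run_pcode(factorial, [5]))  # [120]
```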
P-Code vs. Three-Address Code
P-code
– Closer to machine code.
– Fewer addresses per instruction (1 or 0).
– The stack automatically handles temporaries, so the compiler does not need to generate names or locations for them.
Three-address code
– Fewer instructions.
– More complex instructions, so less code to generate.
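The difference in instruction count can be made concrete by generating both forms from the same syntax tree. The following is a rough sketch for the expression 2 * a + (b - 3); the tuple tree encoding and the names to_tac and to_pcode are assumptions made for the comparison.

```python
PCODE_OPS = {"+": "adi", "-": "sbi", "*": "mpi"}


def to_tac(node, code):
    """Append three-address code for node; return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                      # leaves are used directly as operands
    op, left, right = node
    left_name = to_tac(left, code)
    right_name = to_tac(right, code)
    temp = f"t{len(code) + 1}"                # the compiler must invent a name
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


def to_pcode(node, code):
    """Append P-code for node; the stack holds intermediate values implicitly."""
    if isinstance(node, tuple):
        op, left, right = node
        to_pcode(left, code)
        to_pcode(right, code)
        code.append(PCODE_OPS[op])            # zero-address instruction
    elif isinstance(node, int):
        code.append(f"ldc {node}")            # one-address instruction
    else:
        code.append(f"lod {node}")            # one-address instruction


tree = ("+", ("*", 2, "a"), ("-", "b", 3))
tac, pcode = [], []
to_tac(tree, tac)
to_pcode(tree, pcode)
print(len(tac), len(pcode))  # 3 7
```

For this expression the three-address form needs three instructions and three named temporaries, while the P-code form needs seven instructions and no temporaries.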