Code Generation Ⅱ CS308 Compiler Theory.


A Simple Code Generator
One of the primary issues in code generation is deciding how to use registers to best advantage. Registers have four principal uses:
1. In most machine architectures, some or all of the operands of an operation must be in registers in order to perform the operation.
2. Registers make good temporaries to hold the result of a subexpression or a variable that is used only within a single basic block.
3. Registers are used to hold (global) values that are computed in one basic block and used in other blocks.
4. Registers are often used to help with run-time storage management.

A Simple Code Generator
Assumptions of the code-generation algorithm in this section:
1. Some set of registers is available to hold the values that are used within the block.
2. The basic block has already been transformed into a preferred sequence of three-address instructions.
3. For each operator, there is exactly one machine instruction that takes the necessary operands in registers and performs that operation, leaving the result in a register.

Register and Address Descriptors
lec08-memoryorg April 17, 2017
Descriptors are needed to decide when variables must be loaded and stored.
Register descriptor: for each available register, keeps track of the variable names whose current value is in that register. Initially, all register descriptors are empty.
Address descriptor: for each program variable, keeps track of the location(s) where the current value of that variable can be found. It is stored in the symbol-table entry for that variable name.
Our code-generation algorithm considers each three-address instruction in turn and decides what loads are necessary to get the needed operands into registers. After generating the loads, it generates the operation itself. Then, if the result needs to be stored into a memory location, it also generates that store.

The Code-Generation Algorithm
Function getReg(I) selects registers for each memory location associated with the three-address instruction I. getReg has access to the register and address descriptors for all the variables of the basic block, and may also have access to certain useful data-flow information, such as the variables that are live on exit from the block.
Machine Instructions for Operations
For a three-address instruction such as x = y + z, do the following:
1. Use getReg(x = y + z) to select registers for x, y, and z. Call these Rx, Ry, and Rz.
2. If y is not in Ry (according to the register descriptor for Ry), issue an instruction LD Ry, y', where y' is one of the memory locations for y (according to the address descriptor for y).
3. Similarly, if z is not in Rz, issue an instruction LD Rz, z', where z' is a location for z.
4. Issue the instruction ADD Rx, Ry, Rz.
In a three-address instruction such as x = y + z, we treat + as a generic operator and ADD as the equivalent machine instruction. We therefore do not take advantage of the commutativity of +.

Machine Instructions for Copy Statements
For x = y, getReg always chooses the same register for both x and y. If y is not already in that register Ry, generate the instruction LD Ry, y. If y is already in Ry, do nothing. In either case, adjust the register descriptor for Ry so that it includes x as one of the values held there.
Ending the Basic Block
If x is live on exit from the block and its own memory location does not already hold its current value, generate the instruction ST x, R, where R is a register in which x's value exists at the end of the block.

As the code-generation algorithm issues load, store, and other machine instructions, it needs to update the register and address descriptors.
Managing Register and Address Descriptors
1. For the instruction LD R, x:
(a) Change the register descriptor for register R so it holds only x.
(b) Change the address descriptor for x by adding register R as an additional location.
2. For the instruction ST x, R, change the address descriptor for x to include its own memory location.
3. For an operation such as ADD Rx, Ry, Rz implementing x = y + z:
(a) Change the register descriptor for Rx so that it holds only x.
(b) Change the address descriptor for x so that its only location is Rx. Note that the memory location for x is no longer in the address descriptor for x.
(c) Remove Rx from the address descriptor of any variable other than x.
4. When processing a copy statement x = y, after generating the load for y into register Ry, if needed, and after managing descriptors as for all load statements (per rule 1):
(a) Add x to the register descriptor for Ry.
(b) Change the address descriptor for x so that its only location is Ry.
For x = y + z, the address descriptor for x does not list x's own memory location because the new value has so far only been computed in a register; it will be stored from the register to x later (for example, at the end of the block).
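The descriptor rules can be illustrated with a small Python sketch. It is not the course's implementation: the names Descriptors, gen_op, gen_copy, and end_block are hypothetical, a variable's own memory location is written as the variable's name, registers are passed in by the caller instead of being chosen by getReg, and two consistency steps not spelled out in the rules (removing a reloaded register from other variables' address descriptors, and removing x from registers that still list its old value) are assumed additions.

```python
class Descriptors:
    """Register and address descriptors for one basic block (sketch)."""

    def __init__(self, registers):
        self.reg = {r: set() for r in registers}  # register descriptors start empty
        self.addr = {}                            # variable -> set of locations

    def locations(self, x):
        # A variable not yet touched in the block lives only in memory,
        # written here as the variable's own name.
        return self.addr.setdefault(x, {x})

    def _drop_register(self, r, keep):
        # Assumed bookkeeping: r no longer holds any variable other than keep.
        for v, locs in self.addr.items():
            if v != keep:
                locs.discard(r)

    def _drop_variable(self, x, keep_reg):
        # Assumed bookkeeping: x's old value is gone from every other register.
        for r, names in self.reg.items():
            if r != keep_reg:
                names.discard(x)

    def load(self, r, x):               # rule 1: LD R, x
        self._drop_register(r, keep=x)
        self.reg[r] = {x}
        self.locations(x).add(r)

    def store(self, x, r):              # rule 2: ST x, R
        self.locations(x).add(x)

    def operation(self, x, rx):         # rule 3: OP Rx, Ry, Rz for x = y op z
        self.reg[rx] = {x}
        self.addr[x] = {rx}             # x's memory copy is now stale
        self._drop_register(rx, keep=x)
        self._drop_variable(x, keep_reg=rx)

    def copy(self, x, ry):              # rule 4: x = y with y already in ry
        self.reg[ry].add(x)
        self.addr[x] = {ry}
        self._drop_variable(x, keep_reg=ry)


def gen_op(d, code, op, x, y, z, rx, ry, rz):
    """Emit code for x = y op z into registers chosen by the caller."""
    if y not in d.reg[ry]:
        code.append(f"LD {ry}, {y}")
        d.load(ry, y)
    if z not in d.reg[rz]:
        code.append(f"LD {rz}, {z}")
        d.load(rz, z)
    code.append(f"{op} {rx}, {ry}, {rz}")
    d.operation(x, rx)


def gen_copy(d, code, x, y, ry):
    """Emit code for the copy x = y; x and y share register ry."""
    if y not in d.reg[ry]:
        code.append(f"LD {ry}, {y}")
        d.load(ry, y)
    d.copy(x, ry)


def end_block(d, code, live_on_exit):
    """Store each live variable whose memory location is not up to date."""
    for x in sorted(live_on_exit):
        locs = d.locations(x)
        if x not in locs:
            r = min(loc for loc in locs if loc in d.reg)  # some register holding x
            code.append(f"ST {x}, {r}")
            d.store(x, r)
```

For example, generating x = y + z into R3 and then w = x, with w and x live on exit, yields LD R1, y; LD R2, z; ADD R3, R1, R2; ST w, R3; ST x, R3.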


Design of the Function getReg
Pick a register Ry for y in x = y + z:
1. If y is currently in a register, pick that register.
2. If y is not in a register, but there is an empty register, pick that register.
3. If y is not in a register and there is no empty register, let R be a candidate register, and suppose v is one of the variables in the register descriptor for R. We need to make sure that v's value either is not needed, or that there is somewhere else we can go to get v's value.
(a) OK if the address descriptor for v says that v is somewhere besides R.
(b) OK if v is x, the value being computed by the instruction, and x is not also one of the other operands of the instruction (z in this example).
(c) OK if v is not used later.
(d) Otherwise, generate the store instruction ST v, R to place a copy of v in its own memory location. This operation is called a spill.
Since R may hold several variables, several stores may be needed; choose the candidate R that needs the fewest stores.

Design of the Function getReg
Pick a register Rx for x in x = y + z. The choices are almost the same as for y; the differences are:
1. Since a new value of x is being computed, a register that holds only x is always an acceptable choice for Rx.
2. If y is not used after the instruction, and Ry holds only y after being loaded, then Ry can also be used as Rx. A similar option holds for z and Rz.
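A sketch of getReg's choices in Python, under simplifying assumptions: descriptors are plain dictionaries (register to set of names, variable to set of locations, with a variable's own name standing for its memory location), live_after is the set of variables used after the instruction, and among occupied registers the one needing the fewest spill stores wins. The function names and the avoid parameter (used so that choosing Rz does not evict y from Ry) are hypothetical.

```python
def pick_operand_reg(v, reg, addr, result, other_operands, live_after, code,
                     avoid=(), check_current=True):
    """Choose a register for operand v of `result = y op z` (cases 1-3).

    reg:  register -> set of variable names it currently holds
    addr: variable -> set of locations; the variable's own name means memory
    """
    if check_current:
        for r, names in reg.items():          # case 1: v is already in a register
            if v in names:
                return r
    for r, names in reg.items():              # case 2: an empty register
        if r not in avoid and not names:
            return r
    best, best_spills = None, None            # case 3: score occupied registers
    for r, names in reg.items():
        if r in avoid:
            continue
        spills = []
        for w in sorted(names):
            if addr.get(w, {w}) - {r}:        # (a) w is also held somewhere else
                continue
            if w == result and w not in other_operands:  # (b) w is being overwritten
                continue
            if w not in live_after:           # (c) w is not used later
                continue
            spills.append(w)                  # (d) w must be spilled
        if best is None or len(spills) < len(best_spills):
            best, best_spills = r, spills     # keep the register with fewest stores
    for w in best_spills:
        code.append(f"ST {w}, {best}")
        addr.setdefault(w, set()).add(w)      # memory now holds w's current value
    return best


def pick_result_reg(x, y, z, ry, rz, reg, addr, live_after, code):
    """Choose Rx for x = y op z after Ry and Rz have been chosen and loaded."""
    for r, names in reg.items():
        if names == {x}:                      # a register holding only x
            return r
    if y not in live_after and reg[ry] == {y}:
        return ry                             # y is dead after this instruction
    if z not in live_after and reg[rz] == {z}:
        return rz
    # Otherwise proceed as for an operand, but a register merely containing x
    # (possibly alongside other variables) is not automatically acceptable.
    return pick_operand_reg(x, reg, addr, x, {y, z}, live_after, code,
                            check_current=False)
```

The caller is expected to issue the loads and update the descriptors between calls, as the descriptor rules above describe.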

Test yourself: Exercise 8.6.1, Exercise 8.6.3
Kinds of three-address instructions:
x = y op z
x = op y
x = y
goto L
if x goto L
ifFalse x goto L
if x relop y goto L
param x1; ...; param xn; call p, n
x = y[i]; x[i] = y
x = &y; x = *y; *x = y

Peephole Optimization
The peephole is a small, sliding window on a program. Peephole optimization examines a sliding window of target instructions and replaces instruction sequences within the peephole by a shorter or faster sequence whenever possible. Peephole optimization can also be applied directly after intermediate code generation to improve the intermediate representation.

Eliminating Redundant Loads and Stores
If the instruction sequence
LD R0, a
ST a, R0
appears in a target program, the ST instruction can be deleted: whenever it is executed, the first instruction has already ensured that the value of a is in register R0.
Exception: if the store instruction has a label (that is, it may be the target of a jump), we cannot be sure that the first instruction is always executed immediately before the second, so we cannot remove the store instruction.
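A minimal peephole pass for this case, assuming a hypothetical instruction record (Instr) with an opcode, an operand tuple in assembly order, and an optional label:

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str                        # opcode, e.g. "LD", "ST", "goto"
    args: tuple                    # operands in assembly order
    label: Optional[str] = None    # label attached to the instruction, if any


def remove_redundant_stores(code):
    """Delete ST a, R when it directly follows LD R, a and carries no label."""
    out = []
    for ins in code:
        prev = out[-1] if out else None
        if (prev is not None and prev.op == "LD" and ins.op == "ST"
                and ins.label is None
                and prev.args == (ins.args[1], ins.args[0])):
            continue  # R already holds a's value, so the store changes nothing
        out.append(ins)
    return out
```

A labeled store is kept because control may reach it without passing through the preceding load.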

Eliminating Unreachable Code
An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.
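A sketch of this rule on a hypothetical instruction record (Instr). One forward pass is enough, because every unlabeled instruction after an unconditional goto is dropped until the next labeled one:

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str                        # opcode, e.g. "goto", "="
    args: tuple                    # operands in order
    label: Optional[str] = None    # label attached to the instruction, if any


def remove_unreachable(code):
    """Drop unlabeled instructions that follow an unconditional jump."""
    out, unreachable = [], False
    for ins in code:
        if ins.label is not None:
            unreachable = False    # a labeled instruction may be a jump target
        if not unreachable:
            out.append(ins)
        if ins.op == "goto":
            unreachable = True     # control never falls through an unconditional jump
    return out
```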

Flow-of-Control Optimizations
Unnecessary jumps can be eliminated in either the intermediate code or the target code by peephole optimizations. Suppose there is only one jump to L1.
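A common flow-of-control peephole is jump threading: a jump to L1, where the instruction at L1 is goto L2, can jump straight to L2. The sketch below uses a hypothetical Instr record in which an "if" instruction carries its condition followed by its target label. After retargeting, if no jumps to L1 remain, the instruction L1: goto L2 may itself become removable when it is preceded by an unconditional jump.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str                        # "goto", "if", "=", ...
    args: tuple                    # for jumps, the target label is the last operand
    label: Optional[str] = None


def thread_jumps(code):
    """Retarget jumps to L1 when the instruction at L1 is goto L2."""
    target = {ins.label: ins.args[0] for ins in code
              if ins.label is not None and ins.op == "goto"}

    def resolve(label):
        seen = set()
        while label in target and label not in seen:  # stop on goto cycles
            seen.add(label)
            label = target[label]
        return label

    out = []
    for ins in code:
        if ins.op in ("goto", "if"):
            *cond, dest = ins.args
            ins = ins._replace(args=(*cond, resolve(dest)))
        out.append(ins)
    return out
```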

Algebraic Simplification and Reduction in Strength
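Algebraic identities can be used to eliminate three-address statements such as x = x + 0 and x = x * 1.
Reduction-in-strength transformations can be applied to replace expensive operations by cheaper ones. For example, x^2 is cheaper to compute as x * x than as power(x, 2). Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be approximated as multiplication by a constant.
Examples: a = a * 3 --> a = a * (2 + 1) --> a = (a << 1) + a; 12 / 3 = 12 * (1/3); 1/3 = 2/6 = 2 * (1/2 - 1/3).
Note on floating-point representation: a floating-point number a is represented by two numbers m and e as a = m × b^e. In any such system we choose a base b (the radix of the number system) and a precision p (how many digits are stored). m (the mantissa) is a p-digit number of the form ±d.ddd...ddd, where each digit is an integer between 0 and b-1 inclusive. If the first digit of m is nonzero, m is called normalized. Some descriptions use a separate sign bit (s, representing + or -), in which case m must be positive. e is the exponent.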
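A sketch of these rewrites on a hypothetical tuple form (dst, op, arg1, arg2) with string operands. Only the identities x = x + 0 and x = x * 1, power(x, 2), and multiplication by a power of two are handled; the shift rewrite assumes integer operands.

```python
def simplify(code):
    """Apply algebraic identities and reduction in strength to 3-address tuples."""
    out = []
    for dst, op, a, b in code:
        # Algebraic identities: x = x + 0 and x = x * 1 have no effect.
        if dst == a and ((op == "+" and b == "0") or (op == "*" and b == "1")):
            continue
        # Reduction in strength: power(a, 2) becomes a * a.
        if op == "power" and b == "2":
            out.append((dst, "*", a, a))
            continue
        # Integer multiplication by 2**k becomes a left shift by k.
        if op == "*" and b.isdigit():
            n = int(b)
            if n > 1 and n & (n - 1) == 0:
                out.append((dst, "<<", a, str(n.bit_length() - 1)))
                continue
        out.append((dst, op, a, b))
    return out
```

A constant such as 3 that is not a power of two is left alone here; the shift-and-add form (a << 1) + a would need a second instruction.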

Use of Machine Idioms
The target machine may have hardware instructions that implement certain specific operations efficiently. Using these instructions can reduce execution time significantly. For example, some machines have auto-increment and auto-decrement addressing modes. Using these modes greatly improves the quality of code when pushing or popping a stack, as in parameter passing. These modes can also be used in code for statements like x = x + 1.

Register Allocation and Assignment
Efficient utilization of registers is vitally important in generating good code. This section presents various strategies for deciding at each point in a program what values should reside in registers (register allocation) and in which register each value should reside (register assignment).

Global Register Allocation
A natural approach to global register assignment is to try to keep a frequently used value in a fixed register throughout a loop. One strategy for global register allocation is to assign some fixed number of registers to hold the most active values in each inner loop.

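Usage Counts
Keeping a variable x in a register for the duration of a loop L saves one unit of cost for each use of x, and saves two units if we can avoid a store of x at the end of a block. An approximate formula for the benefit to be realized from allocating a register to x within loop L is
$$\sum_{B \in L} \big(\mathit{use}(x, B) + 2 \cdot \mathit{live}(x, B)\big)$$
where use(x, B) is the number of times x is used in B prior to any definition of x, live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and live(x, B) is 0 otherwise.
Notes on the cost model:
Cost: if x is in a register rather than in memory, each access to x saves 1; if we avoid storing x to memory, we save 2.
use(x, B) counts only uses before a definition of x: normally, after x is defined it is left in a register, so later uses cost nothing extra anyway. With a fixed register, even the uses before the definition need not read memory, which is where the saving comes from. A definition also invalidates the value of x previously held in the register.
live(x, B): the saving is the store to memory. x must be assigned in the block and be live on exit for the saving of 2 to apply.
Note that the formula is approximate; it ignores two factors.
(1) If x is live on entry to the loop and is given a fixed register within the loop, its value must be loaded from memory into the register at loop entry, at a cost of 2. Also, suppose B is an exit block of the loop and C is a successor of B outside the loop; if x is live on entry to C, then at loop exit the current value of x must be stored from the register back to its memory location, again at a cost of 2. These two costs are incurred only once for the whole loop, whereas the formula counts its savings on every iteration, so they can be neglected.
(2) Not every block in the loop is necessarily executed on each iteration, and different iterations may execute different blocks. The formula ignores this too, treating every block as executed once per iteration.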
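The usage-count formula can be computed directly from the blocks of a loop. In this sketch each block is a list of (dst, srcs) instructions and live_out maps each block name to the variables live on exit; both formats, and the block contents in the example, are hypothetical.

```python
def usage_count(x, loop_blocks, live_out):
    """Approximate benefit of keeping x in a register throughout a loop:
    the sum over blocks B of use(x, B) + 2 * live(x, B)."""
    total = 0
    for name, instrs in loop_blocks.items():
        use, defined = 0, False
        for dst, srcs in instrs:
            if not defined:
                use += list(srcs).count(x)   # uses before any definition of x in B
            if dst == x:
                defined = True               # operands are read before dst is written
        live = 1 if defined and x in live_out[name] else 0
        total += use + 2 * live
    return total
```

For blocks B1 = [a = b + c, d = a + b] and B2 = [b = a + d] with live_out B1 = {a, d} and B2 = {b}, the counts are 3 for a and 4 for b.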

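Test yourself: usage counts of the variables
For b in B1, some books give 5 and others give 6. The difference is whether use(b, B1) = 1 or 2. The value 2 follows the definition of use strictly. The value 1 reasons that, even without a fixed register for b, b is loaded into a register on its first use and needs no load for its second use, so a fixed register saves only 1. However, if another instruction not involving b comes between the first and second instructions, the count should not be 1, because that instruction may need the register, and b would then have to be loaded again for its second use.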

Discuss: Register Assignment for Outer Loops
If an outer loop L1 contains an inner loop L2, the names allocated registers in L2 need not be allocated registers in L1 - L2. Similarly, if we choose to allocate x a register in L2 but not in L1, we must load x on entrance to L2 and store x on exit from L2.

Register Allocation by Graph Coloring
Graph coloring is a systematic technique for allocating registers and managing register spills. It works in two passes:
(1) Target-machine instructions are selected as though there were an infinite number of symbolic registers. In effect, names used in the intermediate code become names of registers, and the three-address instructions become machine-language instructions. If access to variables requires instructions that use stack pointers, display pointers, base registers, or other quantities that assist access, we assume that these quantities are held in registers reserved for each purpose. Normally, their use is directly translatable into an access mode for an address mentioned in a machine instruction. If access is more complex, it must be broken into several machine instructions, and one or more temporary symbolic registers may need to be created.
(2) For each procedure, a register-interference graph is constructed in which the nodes are symbolic registers and an edge connects two nodes if one is live at a point where the other is defined. In effect, a node represents a variable's lifetime, from its definition while it remains live until it is dead; two variables whose lifetimes overlap cannot use the same register. An attempt is then made to color the register-interference graph using k colors, where k is the number of assignable registers. A graph is said to be colored if each node has been assigned a color in such a way that no two adjacent nodes have the same color. A color represents a register, and the coloring makes sure that no two symbolic registers that can interfere with each other are assigned the same physical register. When no coloring is found, spills are needed, and they have a cost: the contents of a register are stored (spilled) into a memory location in order to free up the register.
Note that determining whether a graph is k-colorable is NP-complete in general.
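For straight-line code, the interference graph can be built with one backward liveness pass: at each definition of d, d interferes with every other symbolic register live just after that point. This sketch assumes a hypothetical (dst, srcs) instruction format and ignores control flow between blocks.

```python
def interference_graph(code, live_out):
    """Build (nodes, edges) for straight-line code.

    code: list of (dst, srcs); live_out: names live after the last instruction.
    Each edge is a frozenset {d, v} where v is live at the point d is defined.
    """
    nodes, edges = set(live_out), set()
    live = set(live_out)                  # names live after the current instruction
    for dst, srcs in reversed(code):
        nodes.add(dst)
        nodes.update(srcs)
        for v in live:
            if v != dst:
                edges.add(frozenset((dst, v)))
        live.discard(dst)                 # dst is not live before its definition
        live.update(srcs)                 # operands are live before the instruction
    return nodes, edges
```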

Register Allocation by Graph Coloring
Heuristic technique: suppose a node n in a graph G has fewer than k neighbors. Remove n and its edges from G to obtain a graph G'. A k-coloring of G' can be extended to a k-coloring of G by assigning n a color not assigned to any of its neighbors. By repeatedly eliminating nodes having fewer than k edges from G, either
1) we obtain the empty graph, in which case we can produce a k-coloring for G, or
2) we obtain a graph in which each node has k or more adjacent nodes. Then this heuristic cannot proceed (the graph may still be k-colorable, but the method cannot find a coloring). At this point a node is spilled by introducing code to store and reload the register.
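The simplification heuristic can be sketched as follows. Nodes with fewer than k remaining neighbors are pushed on a stack and colored in reverse order. When every remaining node has k or more neighbors, this sketch picks the node with the most remaining neighbors as the spill candidate; that choice is an assumption, not something the slides prescribe.

```python
def color_graph(nodes, edges, k):
    """k-color an interference graph by repeatedly removing nodes of degree < k.

    edges: iterable of 2-element frozensets. Returns (coloring, spilled), where
    coloring maps node -> color in range(k).
    """
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    remaining, stack, spilled = set(nodes), [], []
    while remaining:
        low = [n for n in sorted(remaining) if len(adj[n] & remaining) < k]
        if low:
            n = low[0]
            stack.append(n)              # n can always be colored later
        else:
            n = max(sorted(remaining), key=lambda m: len(adj[m] & remaining))
            spilled.append(n)            # spill candidate
        remaining.remove(n)
    coloring = {}
    for n in reversed(stack):
        used = {coloring[m] for m in adj[n] if m in coloring}
        coloring[n] = min(c for c in range(k) if c not in used)
    return coloring, spilled
```

A 4-cycle a-b-c-d-a shows the limitation: it is 2-colorable, but every node has degree 2, so with k = 2 the heuristic spills a node.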

Test yourself: Exercise 8.8.1

Instruction Selection by Tree Rewriting
Selecting target-language instructions to implement the operators in the intermediate representation is a large combinatorial task, especially for CISC machines. In this section, we treat instruction selection as a tree-rewriting problem.

Tree-Translation Schemes
Example: a tree for the assignment statement a[i] = b + 1, where the array a is stored on the run-time stack and the variable b is a global in memory location Mb. Rsp denotes the value of the stack-pointer register SP.
Throughout this section, the input to the code-generation process will be a sequence of trees at the semantic level of the target machine. The trees are what we might get after inserting run-time addresses into the intermediate representation. In addition, the leaves of the trees contain information about the storage types of their labels. The ind operator treats its argument as a memory address.

The target code is generated by applying a sequence of tree-rewriting rules to reduce the input tree to a single node. Each tree-rewriting rule has the form
replacement ← template { action }
where replacement is a single node, template is a tree, and action is a code fragment, as in a syntax-directed translation scheme.

Code Generation by Tiling an Input Tree
What if we use the tree-translation scheme above on the tree

Code Generation by Tiling an Input Tree
To implement the tree-reduction process, we must address some issues related to tree-pattern matching: How is tree-pattern matching to be done? What do we do if more than one template matches at a given time?

Pattern Matching by Parsing
This approach uses an LR parser to do the pattern matching. The input tree can be treated as a string by using its prefix representation (the preorder traversal of the tree).
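Producing the prefix string is a preorder traversal. In this sketch a tree is a nested tuple (operator, children...) and a leaf is a string. The tree for a[i] = b + 1 used in the example is an illustration that assumes i is also a local at offset Ci from Rsp and that a's offset is Ca.

```python
def prefix(tree):
    """Prefix (preorder) string of a tree given as nested tuples."""
    if isinstance(tree, str):            # leaf
        return tree
    op, *children = tree
    return " ".join([op] + [prefix(c) for c in children])


# a[i] = b + 1 with a and i on the run-time stack and b global at Mb (assumed layout)
tree = ("=",
        ("ind", ("+", ("+", "Ca", "Rsp"), ("ind", ("+", "Ci", "Rsp")))),
        ("+", "Mb", "C1"))
```

Here prefix(tree) gives "= ind + + Ca Rsp ind + Ci Rsp + Mb C1", the string an LR parser would scan.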

Pattern Matching by Parsing
The tree-translation scheme can be converted into a syntax-directed translation scheme. From the productions of the translation scheme, we build an LR parser using one of the LR-parser construction techniques. The target code is generated by emitting the machine instruction corresponding to each reduction.

Routines for Semantic Checking
Restrictions on attribute values: generic templates can be used to represent classes of instructions, and the semantic actions can then be used to pick instructions for specific cases. Parsing-action conflicts can be resolved by disambiguating predicates, which allow different selection strategies to be used in different contexts.

General Tree Matching
The LR-parsing approach to pattern matching based on prefix representations favors the left operand of a binary operator. In a prefix representation op E1 E2, the limited-lookahead LR parsing decisions must be made on the basis of some prefix of E1, since E1 can be arbitrarily long. Thus, pattern matching can miss nuances of the target-instruction set that are due to right operands. With a postfix representation, an LR-parsing approach to pattern matching would favor the right operand instead.
For a hand-written code generator, an ad hoc matcher can be written. A code-generator generator needs a general tree-matching algorithm; an efficient top-down algorithm can be developed by extending string pattern-matching techniques.

Test yourself: Exercise b), Exercise 8.9.2

Optimal Code Generation for Expressions
Objective: generate optimal code for an expression tree when there is a fixed number of registers with which to evaluate the expression.

Ershov Numbers
The following rules assign a number (label) to each node of an expression tree. The number tells how many registers are needed to evaluate that node without storing any temporaries.
1. Label every leaf 1.
2. The label of an interior node with one child is the label of its child.
3. The label of an interior node with two children is
(a) the larger of the labels of its children, if those labels are different;
(b) one plus the label of its children, if the labels are the same.
The label on a node is the minimum number of registers needed to evaluate its expression without storing temporaries.

Example: (a - b) + e * (c + d)
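The labeling rules translate into a short recursive function. Trees are represented here, by assumption, as nested tuples (op, child) or (op, left, right) with strings as leaves.

```python
def ershov(tree):
    """Ershov number of an expression tree given as nested tuples."""
    if isinstance(tree, str):            # rule 1: every leaf is labeled 1
        return 1
    if len(tree) == 2:                   # rule 2: one child, same label
        return ershov(tree[1])
    _, left, right = tree                # rule 3: two children
    l, r = ershov(left), ershov(right)
    return max(l, r) if l != r else l + 1
```

For the example, a - b and e * (c + d) are both labeled 2, so the root is labeled 3.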

Generating Code From Labeled Expression Trees
Algorithm: generating code from a labeled expression tree.
INPUT: A labeled tree with each operand appearing once (that is, no common subexpressions).
OUTPUT: An optimal sequence of machine instructions to evaluate the root into a register.
METHOD: Start at the root of the tree. If the algorithm is applied to a node with label k, then only k registers will be used. However, there is a "base" b >= 1 for the registers used, so that the actual registers used are R_b, R_{b+1}, ..., R_{b+k-1}; the result always appears in R_{b+k-1}.
1. To generate machine code for an interior node with label k whose two children have equal labels (which must be k-1), do the following:
(a) Recursively generate code for the right child, using base b+1. The result of the right child appears in register R_{b+k-1}.
(b) Recursively generate code for the left child, using base b; the result appears in R_{b+k-2}.
(c) Generate the instruction OP R_{b+k-1}, R_{b+k-2}, R_{b+k-1}, where OP is the appropriate operation for the interior node in question.
It can be proved that, in our machine model, where all operands must be in registers, and registers can be used by both an operand and the result of an operation, the label of a node is the fewest registers with which the expression can be evaluated using no stores of temporary results.

Generating Code From Labeled Expression Trees
Algorithm: generating code from a labeled expression tree (cont.)
2. Suppose we have an interior node with label k and children with unequal labels. Then one of the children (the "big" child) has label k, and the other (the "small" child) has label m < k. Do the following, using base b:
(a) Recursively generate code for the big child, using base b; the result appears in register R_{b+k-1}.
(b) Recursively generate code for the small child, using base b; the result appears in register R_{b+m-1}. Note that since m < k, neither R_{b+k-1} nor any higher-numbered register is used.
(c) Generate the instruction OP R_{b+k-1}, R_{b+m-1}, R_{b+k-1} or the instruction OP R_{b+k-1}, R_{b+k-1}, R_{b+m-1}, depending on whether the big child is the right or the left child, respectively.
3. For a leaf representing operand x, if the base is b, generate the instruction LD R_b, x.

Example: (a - b) + e * (c + d)
The label of the root is 3, so the result will appear in R3, and only R1, R2, and R3 will be used. The base for the root is b = 1.
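The algorithm can be sketched in Python, assuming binary operators only, trees as nested (op, left, right) tuples with string leaves, and a hypothetical OPS table mapping operators to opcodes.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}   # assumed opcode names


def label(t):
    """Ershov number of a binary expression tree."""
    if isinstance(t, str):
        return 1
    _, left, right = t
    a, b = label(left), label(right)
    return max(a, b) if a != b else a + 1


def gen(t, b, code):
    """Append code evaluating t into R(b + label(t) - 1) using R(b)..R(b+label-1)."""
    if isinstance(t, str):                         # rule 3: leaf
        code.append(f"LD R{b}, {t}")
        return
    op, left, right = t
    kl, kr = label(left), label(right)
    if kl == kr:                                   # rule 1: this node's label is kl + 1
        k = kl + 1
        gen(right, b + 1, code)                    # result in R(b+k-1)
        gen(left, b, code)                         # result in R(b+k-2)
        code.append(f"{OPS[op]} R{b+k-1}, R{b+k-2}, R{b+k-1}")
    else:                                          # rule 2: unequal labels
        k, m = max(kl, kr), min(kl, kr)
        big_is_right = kr > kl
        big, small = (right, left) if big_is_right else (left, right)
        gen(big, b, code)                          # result in R(b+k-1)
        gen(small, b, code)                        # result in R(b+m-1)
        if big_is_right:
            code.append(f"{OPS[op]} R{b+k-1}, R{b+m-1}, R{b+k-1}")
        else:
            code.append(f"{OPS[op]} R{b+k-1}, R{b+k-1}, R{b+m-1}")
```

For the example tree with base 1 this emits LD R3, d; LD R2, c; ADD R3, R2, R3; LD R2, e; MUL R3, R2, R3; LD R2, b; LD R1, a; SUB R2, R1, R2; ADD R3, R2, R3.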

Evaluating Expressions with an Insufficient Supply of Registers
Algorithm: generating code from a labeled expression tree with an insufficient supply of registers.
INPUT: A labeled tree and a number of registers r >= 2.
OUTPUT: An optimal sequence of machine instructions to evaluate the root into a register, using no more than r registers.
METHOD: Start at the root of the tree, with base b = 1. For a node N with label r or less, the algorithm is exactly the same as the previous algorithm. For an interior node N labeled k > r:
1. N has at least one child with label r or greater. Pick the larger child (or either one, if their labels are equal) to be the "big" child, and let the other child be the "little" child.
2. Recursively generate code for the big child, using base b = 1. The result of this evaluation will appear in register R_r.
3. Generate the machine instruction ST t_k, R_r, where t_k is a temporary variable used to hold the value of the big child.
4. Generate code for the little child as follows. If it has label r or greater, pick base b = 1. If its label is j < r, pick b = r - j + 1. Then recursively apply this algorithm to the little child; the result appears in R_r.
5. Generate the instruction LD R_{r-1}, t_k.
6. If the big child is the right child of N, generate the instruction OP R_r, R_r, R_{r-1}. If the big child is the left child, generate OP R_r, R_{r-1}, R_r.

Example (with r = 2 registers): the subtree whose value is saved in t3 is evaluated using the original algorithm.
Since we then need both registers for the left child of the root, we need to generate the instruction ST t3, R2.

Test yourself: Exercise 8.10.1 a) (in-class exercise), Exercise 8.10.3, Exercise 8.10.2

Dynamic Programming Code-Generation
The dynamic programming algorithm can be used to generate code for any machine with r interchangeable registers and load, store, and add instructions.

Contiguous Evaluation
The dynamic programming algorithm partitions the problem of generating optimal code for an expression into the subproblems of generating optimal code for the subexpressions of the given expression. For a root with subtrees T1 and T2:
Contiguous evaluation: complete the evaluations of T1 and T2, then evaluate the root.
Noncontiguous evaluation: first evaluate part of T1, leaving the value in a register, next evaluate T2, then return to evaluate the rest of T1.
The dynamic programming algorithm uses contiguous evaluation.

Contiguous Evaluation
For the register machine in this section, we can prove that given any machine-language program P to evaluate an expression tree T, we can find an equivalent program P' such that
1. P' is of no higher cost than P,
2. P' uses no more registers than P, and
3. P' evaluates the tree contiguously.
This implies that every expression tree can be evaluated optimally by a contiguous program.

The Dynamic Programming Algorithm
The dynamic programming algorithm proceeds in three phases (suppose the target machine has r registers):
1. Compute bottom-up, for each node n of the expression tree T, an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation, for 1 <= i <= r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be computed into memory.
3. Traverse each tree using the cost vectors and associated instructions to generate the final target code. The code for the subtrees computed into memory locations is generated first.

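Example: (a - b) + c * (d / e)
C[0], the cost of computing a into memory, is 0 since it is already there. C[1], the cost of computing a into a register, is 1 since we can load it into a register with the instruction LD R0, a. C[2], the cost of loading a into a register with two registers available, is the same as that with one register available. So the cost vector at leaf a is (0, 1, 1).
C[0] is special: when a subtree is computed into memory, there is no limit on the registers the subtree may use. For C[1], by contrast, the subtree may use at most one register.
For the node /, the cost vector is (3, 2, 2). C[0] = 3: computing the value into memory needs at least one register; with one register the cost is 0 + 1 (LD) + 1 (op R, R, M) + 1 (ST M, R). C[1] = 2: with one register the minimum cost is 0 + 1 + 1 (op), and no store is needed. C[2] = 2: two registers do no better than one.
For the node *, the cost vector is (5, 5, 4). C[0] = 5: the cheapest way to compute the value into memory uses one register for c and one for the / subtree, at cost 3, plus the multiplication and the store, 2 more. C[1] = 5: with one register, the / subtree is computed into that register at cost 2; since MUL R, R, M needs the / result in memory M and c in the register, storing the / result and loading c cost 2 more, and the multiplication costs 1. C[2] = 4: with two registers, 2 + 1 + 1.
For the root +, the cost vector is (8, 8, 7). C[0] = 8: computing into memory costs C[2] + 1 = 7 + 1. For C[2], the candidate sequences are:
1. Compute the left subtree with two registers available into register R0, compute the right subtree with one register available into register R1, and use the instruction ADD R0, R0, R1 to compute the root. This sequence has cost 8.
2. Compute the right subtree with two registers available into R1, compute the left subtree with one register available into R0, and use the instruction ADD R0, R0, R1. This sequence has cost 7.
3. Compute the right subtree into memory location M, compute the left subtree with two registers available into register R0, and use the instruction ADD R0, R0, M. This sequence has cost 8.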
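Phase 1 can be sketched as follows under a unit-cost model (every LD, ST, and operation costs 1, which matches the numbers in the example) with three ways to evaluate an interior node into a register: right child into memory, then OP Ri, Ri, M; left child first with i registers and right with i-1; or right first with i registers and left with i-1. Phases 2 and 3 (choosing memory subtrees and emitting code) are not shown, and the tuple tree format is an assumption.

```python
def costs(t, r):
    """Cost vector [C[0], C[1], ..., C[r]] for tree t (unit-cost instructions).

    t is a leaf name (a string) or an (op, left, right) tuple. C[0] is the cost
    of computing t into memory; C[i] is the cost of computing it into a
    register with i registers available.
    """
    if isinstance(t, str):
        return [0] + [1] * r                    # already in memory; one LD otherwise
    _, left, right = t
    cl, cr = costs(left, r), costs(right, r)
    c = [0] * (r + 1)
    for i in range(1, r + 1):
        options = [cr[0] + cl[i] + 1]           # right into memory, then OP Ri, Ri, M
        if i >= 2:
            options.append(cl[i] + cr[i - 1] + 1)   # left first, then right with i-1
            options.append(cr[i] + cl[i - 1] + 1)   # right first, then left with i-1
        c[i] = min(options)
    c[0] = min(c[1:]) + 1                       # best register evaluation, then ST
    return c
```

With r = 2, the example tree gets [8, 8, 7] at the root, and its subtrees d / e and c * (d / e) get [3, 2, 2] and [5, 5, 4].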

Test yourself: in-class exercise
