Code Generation
Steve Johnson
May 23, 2005
Copyright (c) Stephen C. Johnson 2005

The Problem
Given an expression tree and a machine architecture, generate a set of instructions that evaluate the tree
– Initially, consider only trees (no common subexpressions)
– Interested in the quality of the program
– Interested in the running time of the algorithm

The Solution
Over a large class of machine architectures, we can generate optimal programs in linear time
– A very practical algorithm
– But different from the way most compilers work today
– And the technique, dynamic programming, is powerful and interesting
Work done with Al Aho, published in the JACM

What is an Expression Tree?
Nodes represent
– Operators (including assignment)
– Operands (memory, registers, constants)
No flow-of-control operations
Example: the tree for A = B + C has "=" at the root, with A and a "+" node over B and C as its children
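As a concrete illustration (not from the original slides), such a tree might be encoded as follows; the Node class and its field names are assumptions of this sketch, and the leaves anticipate the MEM representation of the next slide.

class Node:
    # Illustrative encoding of an expression tree; Node is an assumed
    # name, not the talk's notation. Leaves carry an operand name,
    # interior nodes an operator.
    def __init__(self, op, *kids, name=None):
        self.op = op        # "=", "+", "MEM", ...
        self.kids = kids    # child subtrees (empty for leaves)
        self.name = name    # operand name, for leaves only

# The tree for  A = B + C :
tree = Node("=",
            Node("MEM", name="A"),
            Node("+", Node("MEM", name="B"), Node("MEM", name="C")))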

Representing Operands
In fact, we want the tree to represent where the operands are found
Example: the tree for A = B + C becomes "=" over MEM(A) and a "+" node over MEM(B) and MEM(C)

Possible Programs
load B,r1
load C,r2
add r1,r2,r1
store r1,A
or
load B,r1
add C,r1
store r1,A
or
add B,C,A

(Assembler Notation)
Data always moves left to right
load B,r1       r1 = MEM(B)
add r1,r2,r3    r3 = r1 + r2
store r1,A      MEM(A) = r1

Which is Better?
Not all sequences are legal on all machines
Longer sequences may be faster
The situation gets more complex when
– Complicated expressions run out of registers
– Some operations (e.g., call) take a lot of registers
– Instructions have complicated addressing modes

Example Code
A = 5*B + asin(C/2 + sin(D)) might generate (machine with 2 registers):

load B,r1            OR    load D,r1
mul r1,#5,r1               call sin
store r1,T1                load C,r2
load C,r1                  div r2,#2,r2
div r1,#2,r1               add r2,r1,r1
store r1,T2                call asin
load D,r1                  load B,r2
call sin                   mul r2,#5,r2
load T2,r2                 add r1,r2,r1
add r2,r1,r1               store r1,A
call asin
load T1,r2
add r2,r1,r1
store r1,A

What is an Instruction?
An instruction is a tree transformation:
– load A,r1 transforms MEM(A) into REG(r1)
– store r1,A transforms REG(r1) into MEM(A)
– load (r1),r2 transforms * (indirection) applied to REG(r1) into REG(r2)
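One way to encode such transformations, purely as an illustration: each instruction is a pattern over tree shapes, a result location, and a cost. The Instruction type, its fields, and the tiny table below are assumptions of this sketch, not the talk's notation.

from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    # Illustrative encoding of instructions as tree-rewrite patterns;
    # all names here are assumptions of this sketch.
    name: str          # assembler form, e.g. "add r1,r2,r1"
    root: str          # operator the pattern matches at the root
    operands: tuple    # required operand locations: "REG" or "MEM"
    result: str        # where the result lands: "REG" or "MEM"
    cost: int          # positive cost

TABLE = (
    Instruction("load M,r",   "MEM", (),             "REG", 1),
    Instruction("store r,M",  "REG", (),             "MEM", 1),
    Instruction("add r,r,r",  "+",   ("REG", "REG"), "REG", 1),
    Instruction("add M,r",    "+",   ("MEM", "REG"), "REG", 1),
)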

These can be Quite Complicated
Example: load r1(r2),r3 transforms the tree * applied to (REG(r1) + (REG(r2) << INT 2)) into REG(r3): an indexed, scaled memory reference matched as a single instruction

Types and Resources
Expression trees (and instructions) typically have types associated with them
– We'll ignore this
– Doesn't introduce any real problems
Instructions often need resources to work
– For example, a temporary register or a temporary storage location
– Will be discussed later

Programs
A program is a sequence of instructions
A program computes an expression tree if it transforms the tree according to the desired goal:
– Compute the tree into a register
– Compute the tree into memory
– Compute the tree for its side effects (condition codes, assignments)

Example
Goal: compute for side effects the tree MEM(A) = MEM(B) + MEM(C)
load B,r1
load C,r2
add r1,r2,r1
store r1,A

Example (cont.)
After load B,r1, the tree is MEM(A) = REG(r1) + MEM(C)
Then load C,r2 transforms it to MEM(A) = REG(r1) + REG(r2)

Example (cont.)
add r1,r2,r1 transforms MEM(A) = REG(r1) + REG(r2) into MEM(A) = REG(r1)

Example (concl.)
store r1,A completes the assignment MEM(A) = REG(r1) (side effect done)

Typical Code Generation
Some variables are assigned to registers, leaving a certain number of scratch registers
An expression tree is walked, producing instructions (a greedy algorithm)
An infinite number of temporary registers is assumed

Typical Code Generation (cont.)
A register allocation phase is run
– Assign temporary registers to scratch registers, often by graph coloring
– If you run out of scratch registers, spill: select a register, store it into a temporary, and reload it when it is needed again
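To make the typical scheme concrete, here is a minimal sketch of the greedy walk, under assumptions of this sketch alone (a simple Node type, 3-address operators, unbounded temporaries named t1, t2, ...); the later allocation/spill phase is not shown.

# A minimal sketch of the "typical" greedy tree walk: each operator gets
# a fresh temporary, and a separate register-allocation pass (not shown)
# must later map t1, t2, ... onto the real scratch registers, spilling
# when it runs out. The Node encoding is illustrative, not from the talk.
from itertools import count

class Node:
    def __init__(self, op, *kids, name=None):
        self.op, self.kids, self.name = op, kids, name

OPNAME = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def greedy_gen(node, code, fresh):
    """Emit code for node; return the temporary holding its value."""
    if node.op == "MEM":                     # leaf: load the operand
        t = f"t{next(fresh)}"
        code.append(f"load {node.name},{t}")
        return t
    a = greedy_gen(node.kids[0], code, fresh)
    b = greedy_gen(node.kids[1], code, fresh)
    t = f"t{next(fresh)}"                    # fresh temporary per operator
    code.append(f"{OPNAME[node.op]} {a},{b},{t}")
    return t

code = []
greedy_gen(Node("+", Node("MEM", name="B"), Node("MEM", name="C")),
           code, count(1))
print("\n".join(code))   # load B,t1 / load C,t2 / add t1,t2,t3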

Practical Observation
Many (most?) code generation bugs happen in this spill code
– It may choose a register that is really needed
– Very hard to test: create test cases that just barely fit, or just barely don't fit, to exercise the edge cases
It can also be quite inefficient ("thrashing" of scratch registers), and the resulting code may not be optimal

Complexity Results
Simple machine with 2-address instructions: r1 op r2 => r1
Cost = number of instructions
Allow common subexpressions only of the form A op B, where A and B are leaf nodes
Generating optimal code is NP-complete
– Even if there are an infinite number of registers!
– Implies, unless P = NP, exponential time for an expression with n nodes

Complexity Results (cont.)
Simple 3-address machine: r1 op r2 => r3
Cost = number of instructions
Allow arbitrary common subexpressions
Infinite number of registers
Can get optimal code in linear time
– Topological sort
– Each node in a different register
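A sketch of why this case is easy (the encoding below is an assumption of this sketch, not code from the talk): with unlimited registers, a post-order walk that emits each shared node exactly once is a topological sort of the DAG, and it is optimal because every node must be computed at least once.

# Sketch: optimal 3-address code for a DAG with unlimited registers.
# Post-order DFS (a topological order) emits each shared node exactly
# once, each result in its own register. Encoding is illustrative.
from itertools import count

class Node:
    def __init__(self, op, *kids, name=None):
        self.op, self.kids, self.name = op, kids, name

OPNAME = {"+": "add", "*": "mul"}

def emit_dag(node, code, reg_of, fresh):
    """Emit each DAG node once; return the register holding its value."""
    if id(node) in reg_of:                  # common subexpression: reuse
        return reg_of[id(node)]
    if node.op == "MEM":
        r = f"r{next(fresh)}"
        code.append(f"load {node.name},{r}")
    else:
        a = emit_dag(node.kids[0], code, reg_of, fresh)
        b = emit_dag(node.kids[1], code, reg_of, fresh)
        r = f"r{next(fresh)}"
        code.append(f"{OPNAME[node.op]} {a},{b},{r}")
    reg_of[id(node)] = r
    return r

# (A+B) * (A+B) as a DAG: the "+" node is shared, so it is emitted once.
shared = Node("+", Node("MEM", name="A"), Node("MEM", name="B"))
code = []
emit_dag(Node("*", shared, shared), code, {}, count(1))
print("\n".join(code))  # load A,r1 / load B,r2 / add r1,r2,r3 / mul r3,r3,r4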

Complexity Results (cont.)
In the 3-address model, finding optimal code that uses the minimal number of registers is NP-complete
But that's not what we are faced with in practice
– We have a certain number of registers
– We need to use them intelligently

Complexity Results (concl.)
For many practical machine architectures (including 2-address machines), we can generate optimal code in linear time when there are no common subexpressions (i.e., the input is a tree)
Can be extended to an algorithm exponential only in the amount of sharing
The optimal instruction sequence is not generated by a simple tree walk

Machine Model Restrictions
Resources (temporary registers) must be interchangeable; we will assume that we have N of them
Every instruction has a (positive) cost
The cost of a program is the sum of the costs of its instructions
No other constraints on the instruction shape or format (!)

Study Optimal Programs
Suppose we have an expression tree T that we wish to compute into a register
– For the moment, we assume T can be computed with no stores
– We assume we have N scratch registers
Suppose the root node of T is +
Then, in an optimal program, the last instruction must have a + at the root of the tree that it transforms
– We make a list of these instructions
– Each has some preconditions for it to be legal

Preconditions: Example
Suppose the last instruction was add r1,r2,r1
Suppose the tree T is + at the root, over subtrees T1 and T2
Then our optimal program must compute T1 into r1 and T2 into r2

Precondition Resources
If our optimal program ends in this add instruction, then we can assume that it contains two subprograms that compute T1 and T2 into r1 and r2, respectively

Precondition Resources (cont.)
Look at the first instruction
– If it computes part of T1, then (since there are no stores) at least one register is in use holding part of T1 from then on
– So T2 must be computed using at most N-1 registers
– Alternatively, if the first instruction computes part of T2, then T1 must be computed using at most N-1 registers

Reordering Lemma
Let P be an optimal program without stores that computes T, and suppose it ends in an instruction X that has k preconditions
Then we can reorder the instructions in P so it looks like
P1 P2 P3 ... Pk X
where the Pi compute the preconditions of X in some order
Moreover, P2 uses at most N-1 registers, P3 at most N-2, etc., and each Pi computes its precondition optimally using that number of registers

Cost Computation
Define C(T,n) to be the cost of the optimal program computing T using at most n registers
Suppose X is an instruction matching the root of T, with k preconditions corresponding to subtrees T1 through Tk
Then C(T,n) <= c(X) + C(T1,p1) + ... + C(Tk,pk)
where c(X) is the cost of instruction X, and p1,...,pk are a permutation of the numbers n, n-1, ..., n-k+1
In fact, C(T,n) equals the minimum of this sum over all such instructions X and permutations
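Restating the slide's recurrence in standard notation (this is just the formula above, written out):

% X ranges over instructions matching the root of T, with precondition
% subtrees T_1, ..., T_k; (p_1, ..., p_k) ranges over permutations of
% {n, n-1, ..., n-k+1}.
C(T,n) \;=\; \min_{X}\ \min_{(p_1,\dots,p_k)}
  \Bigl( c(X) \;+\; \sum_{i=1}^{k} C(T_i,\, p_i) \Bigr)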

Sketch of Proof
By the reordering lemma, we can write any optimal program as a sequence of subprograms computing the preconditions in order, with decreasing numbers of scratch registers, followed by some instruction X
If any subprogram is not optimal, we can replace it with a cheaper one, contradicting the optimality of the original program
Thus the optimal cost equals one of the sums (for some X and permutation)

How About Stores (Spills)?
We will now let C(T,n) represent the cost of computing T with n registers if stores (spills) are allowed
More notation: if T is a tree and S a subtree, T/S will represent T with S removed and replaced by a MEM node

Another Rearrangement Lemma
Suppose P is an optimal program computing a tree T, and suppose a subtree S is stored into a temporary location in this optimal program
Then P can be rewritten in the form P1 P2, where P1 computes S into memory and P2 computes T/S

Consequences
P1 can use all N registers, and after P1 runs, all registers are free again
Let C(S,0) be the cost of computing S into a temporary (MEM) location
Then C(T,n) <= C(S,0) + C(T/S,n)
One way to compute S into memory is to compute it into a register and store it (there may be other, cheaper, ways); thus C(S,0) <= C(S,N) + Cstore
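In symbols (again just the slide's two inequalities, with c_store the cost of a store):

C(T,n) \;\le\; C(S,0) \;+\; C(T/S,\, n), \qquad
C(S,0) \;\le\; C(S,N) \;+\; c_{\mathrm{store}}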

Optimal Algorithm
1. Recursively compute C(S,n) and C(S,0) for all subtrees S of T, bottom up, for all n <= N
2. Enumerate all instructions matching the root of T. Those that leave a result in memory contribute to C(T,0); those leaving a result in a register contribute to C(T,n). Apply the cost formula for each permutation of the preconditions of the instruction, and remember the minimal costs
3. Update C(T,0) using C(T,0) = min(C(T,0), C(T,N) + Cstore)
4. The result gives the minimal cost to compute the tree using n registers, or to compute it into memory
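Below is a compact, hedged sketch of this dynamic program. Everything in it is an assumption of the sketch, not the talk's code: a toy two-register machine, a tiny instruction table with unit costs, and binary operators only. Stores enter only through memory-operand instructions and the final store rule (step 3), which is enough to show the shape of the algorithm.

from itertools import permutations

N = 2          # scratch registers (assumed)
C_STORE = 1    # cost of storing a register into a temporary (assumed)
INF = float("inf")

class Node:
    def __init__(self, op, *kids, name=None):
        self.op, self.kids, self.name = op, kids, name

# Toy instruction table: (root op, operand locations, result, cost).
# Loads of leaf operands are folded into the base case below; "+" is
# commutative on this toy machine, hence two memory-operand patterns.
INSTRS = [
    ("+", ("REG", "REG"), "REG", 1),   # add r1,r2,r1
    ("+", ("REG", "MEM"), "REG", 1),   # add M,r1 (M on the right)
    ("+", ("MEM", "REG"), "REG", 1),   # add M,r1 (M on the left)
    ("*", ("REG", "REG"), "REG", 1),   # mul r1,r2,r1
]

def costs(t):
    """c[n] = optimal cost of t using at most n registers (n = 1..N);
    c[0] = optimal cost of computing t into memory."""
    if t.op == "MEM" and not t.kids:
        # a leaf is already in memory; one load reaches a register
        return {0: 0, **{n: 1 for n in range(1, N + 1)}}
    kid = [costs(k) for k in t.kids]            # step 1: bottom up
    c = {n: INF for n in range(0, N + 1)}
    for op, locs, result, icost in INSTRS:      # step 2: match the root
        if op != t.op or len(locs) != len(t.kids):
            continue
        # memory preconditions run ahead of time, all registers free
        mem = sum(kid[i][0] for i, l in enumerate(locs) if l == "MEM")
        regs = [i for i, l in enumerate(locs) if l == "REG"]
        for n in range(max(1, len(regs)), N + 1):
            # hand registers n, n-1, ... to the register preconditions
            # in every order (the permutations of the reordering lemma)
            best = min(sum(kid[i][n - j] for j, i in enumerate(p))
                       for p in permutations(regs))
            total = icost + mem + best
            if result == "REG":
                c[n] = min(c[n], total)
            else:
                c[0] = min(c[0], total)
    # step 3: any tree reaches memory by computing it into a register
    # with all N registers free, then storing it
    c[0] = min(c[0], c[N] + C_STORE)
    return c

t = Node("+", Node("MEM", name="A"),
              Node("*", Node("MEM", name="B"), Node("MEM", name="C")))
print(costs(t))   # {0: 5, 1: 6, 2: 4} with this toy table

With two registers the minimum is 4 (load B; load C; mul; add A,r), and with one register the spill of B*C "falls out" as a memory precondition, exactly as the later No Spills slide describes.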

Dynamic Programming
This bottom-up technique is called dynamic programming
It has a fixed cost per tree node because:
– There are a finite (usually small) number of instructions that match the root of each tree
– The number of permutations for each instruction is fixed (and typically small)
– The number of scratch registers N is fixed
So the optimal cost can be determined in time linear in the size of the tree

Unravelling
Going from the minimal cost back to the instructions can be done several ways:
– Remember the instruction and permutation that give the minimal value for each node
– At each node, recompute the desired minimal value until you find an instruction and permutation that attain it

Top-Down Memo Algorithm
Instead of computing bottom up, you can compute top down (in a lazy manner) and remember the results
This might be considerably faster for some architectures
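The top-down variant is the same recursion with a cache, evaluating a (node, register-count) pair only when some parent actually asks for it. A toy sketch (the reg-reg-only cost rule and all names here are assumptions of the sketch, not the talk's algorithm):

from functools import lru_cache

class Node:
    def __init__(self, op, *kids, name=None):
        self.op, self.kids, self.name = op, kids, name
    # default identity-based hashing: shared nodes share a cache entry

@lru_cache(maxsize=None)
def cost_reg(t, n):
    """Lazy, memoized cost of computing t into a register with n regs."""
    if t.op == "MEM" and not t.kids:
        return 1                            # one load
    if len(t.kids) == 2 and n >= 2:
        a, b = t.kids
        # evaluate one side with n registers, the other with n-1,
        # in whichever order is cheaper (the two permutations)
        return 1 + min(cost_reg(a, n) + cost_reg(b, n - 1),
                       cost_reg(b, n) + cost_reg(a, n - 1))
    return float("inf")                     # no matching instruction

t = Node("+", Node("MEM", name="B"), Node("MEM", name="C"))
print(cost_reg(t, 2))    # 3: load B,r1; load C,r2; add r1,r2,r1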

No Spills!
Note that we do not need separate spill code in this algorithm
The subtrees that are computed and stored "fall out" of the algorithm; they are computed ahead of the main computation, when all registers are available
The resulting instruction stream is not typically a tree walk of the input

Reality Check
Major assumptions:
– Cost is the sum of the costs of the instructions
  Assumes a single ALU, with no overlapping; many machines now have multiple ALUs and overlapping operations
– All registers are identical
  True of most RISC machines, but not of X86 architectures
But memory operations are getting more expensive
– Optimality for spills is important

Other Issues
Register allocation across multiple statements, flow control, etc.
– Can make a big difference in performance
– Can use this algorithm to evaluate possible allocations
– The cost of losing a scratch register to hold a variable

Common Subexpressions
A subtree S of T is used more than once (T is now not a tree, but a DAG)
Say there are 2 uses of S; then there are 4 strategies:
– Compute S and store it
– Compute one use and save the result until the second use (2 ways, depending on which use comes first)
– Ignore the sharing, and recompute S

Cost Computations
Ignoring the sharing is easy
Computing and storing is easy
Ordering the two uses implies an ordering of preconditions in some higher-level instruction selection
– And the number of free registers is affected, too
Do the problem twice, once for each order

Summary
Register spills are evil
– Complicated, error-prone, hard to test
If something is to be spilled, compute it ahead of time, with all registers free
The optimal spill points fall out of the dynamic programming algorithm