Register Allocation and Spilling via Graph Coloring G. J. Chaitin IBM Research, 1982.


Motivation
- Before the register allocation phase, the compiler assumes an unlimited number of general-purpose registers
- These symbolic registers must be mapped to real registers in a way that avoids conflicts
- Symbolic registers that cannot be mapped to real registers must be spilled to memory
- We need an algorithm that maps registers with minimal spilling cost

Paper Overview
- Register allocation overview
- Subsumption algorithm
- Interference graph coloring algorithm
- Spilling algorithm

Register Allocation Steps
1. Determine which registers are live at each point in the intermediate language (IL) program
2. Build a register interference graph
   - Nodes represent symbolic registers
   - Edges represent a conflict between symbolic registers
3. Subsumption: eliminate unnecessary register copies
4. Find a 32-coloring of the interference graph
5. Decide which registers to spill if necessary
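
Steps 1 and 2 can be sketched for straight-line code as a single backward pass. This is a minimal illustration, not Chaitin's implementation: the function name `build_interference` and the `(dest, sources)` instruction encoding are assumptions made for the example, and control flow and the special treatment of copy instructions are left out.

```python
def build_interference(instrs, live_out=()):
    """Build an interference graph for straight-line code.

    instrs:   list of (dest, sources); dest is None when an instruction
              only uses values (hypothetical encoding for this sketch).
    live_out: symbolic registers still live after the last instruction.
    Returns (nodes, edges), where each edge is a frozenset of two names.
    """
    live = set(live_out)
    nodes = set(live)
    edges = set()
    # Walk backwards so that `live` always holds the registers that are
    # live immediately after the instruction being examined.
    for dest, sources in reversed(instrs):
        if dest is not None:
            nodes.add(dest)
            # A newly defined register conflicts with everything else
            # that is live at its definition point.
            for other in live:
                if other != dest:
                    edges.add(frozenset((dest, other)))
            live.discard(dest)
        for src in sources:
            nodes.add(src)
            live.add(src)
    return nodes, edges


# The instruction sequence from the 3-coloring example slide.
example = [
    ("A", []), ("B", []), ("C", []), (None, ["A"]),
    ("D", []), (None, ["B"]), (None, ["C"]), (None, ["D"]),
]
nodes, edges = build_interference(example)
```

For this sequence, A is dead before D is defined, so the sketch produces every edge among A, B, C, and D except A-D.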

Subsumption
- If the source and destination of a register copy do not interfere, they may be coalesced into a single node
- For each register copy in the IL, determine whether the two registers interfere
- If not, coalesce the two nodes into one
- After the first pass, rewrite the IL code
- Repeat until no more coalescing is possible

Subsumption Example
Instruction   Live   Dead
A = 1         A
B = A         B
B = B + 1
C = B         C      B
D = A         D      A
…             C, D
[Figure: interference graph with nodes A, B, C, D]

Subsumption Example
Instruction    Live     Dead
AD = 1         AD
BC = AD        BC
BC = BC + 1
…              AD, BC
[Figure: interference graph with nodes AD, BC]
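
The coalescing pass can be sketched on the example above. This is a simplified illustration under stated assumptions: the name `coalesce` and its data layout are hypothetical, the interference edges (A-B, A-C, C-D) were worked out by hand on the assumption that the target of a copy does not interfere with its source, and merged nodes simply take the union of their neighbors instead of rewriting the IL and rebuilding the graph as the slide describes.

```python
def coalesce(adj, copies):
    """Merge the two ends of each copy when they do not interfere.

    adj:    dict mapping each node to the set of nodes it interferes with.
    copies: list of (dest, source) register copies, in program order.
    Returns (merged_adj, alias), where alias maps every original node to
    the node that now represents it.
    """
    adj = {n: set(nb) for n, nb in adj.items()}
    alias = {n: n for n in adj}

    def find(n):
        # Follow alias links to the current representative.
        while alias[n] != n:
            n = alias[n]
        return n

    for dest, src in copies:
        a, b = find(dest), find(src)
        if a == b or b in adj[a]:
            continue  # already merged, or the two registers interfere
        # Fold b into a: a inherits all of b's conflicts.
        for nb in adj.pop(b):
            adj[nb].discard(b)
            adj[nb].add(a)
            adj[a].add(nb)
        alias[b] = a
    return adj, {n: find(n) for n in alias}


graph = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
merged, alias = coalesce(graph, [("B", "A"), ("C", "B"), ("D", "A")])
```

Here the copy B = A is kept because A and B interfere, while C = B and D = A are removed, leaving two nodes that correspond to BC and AD on the second example slide.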

Finding a 32-Coloring
- Each symbolic register is assigned a color representing a real register
- The coloring succeeds if no two adjacent nodes have the same color
- Assume that G has a node N with degree < 32
- Then G is 32-colorable iff the reduced graph, from which N and all its edges have been omitted, is 32-colorable
- The algorithm repeatedly removes nodes of degree < 32 until all nodes have been removed
- The algorithm gets stuck if, at some point, every remaining node has degree ≥ 32 (this does not prove that the graph is uncolorable)

3-coloring example
Instruction   Live   Dead
A = 1         A
B = 2         B
C = 3         C
? = A                A
D = 4         D
? = B                B
? = C                C
? = D                D
[Figure: interference graph with nodes A, B, C, D]
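
The remove-and-reinsert procedure from the "Finding a 32-Coloring" slide can be sketched with the number of colors `k` as a parameter. The function name `color_graph` is hypothetical, and the example edges are derived from the instruction list above (A is dead before D is defined, so A and D do not interfere); the original figure's edges are not recoverable from the transcript.

```python
def color_graph(adj, k):
    """Try to k-color a graph by repeatedly removing low-degree nodes.

    adj: dict mapping each node to the set of its neighbors.
    Returns a dict node -> color in range(k), or None if the
    simplification gets stuck (every remaining node has degree >= k).
    """
    work = {n: set(nb) for n, nb in adj.items()}
    stack = []
    while work:
        # Pick any remaining node with fewer than k neighbors.
        node = next((n for n in sorted(work) if len(work[n]) < k), None)
        if node is None:
            return None  # stuck: a spill decision is needed
        for nb in work[node]:
            work[nb].discard(node)
        del work[node]
        stack.append(node)
    colors = {}
    # Reinsert in reverse order; each node had < k neighbors when it was
    # removed, so a free color always exists.
    for node in reversed(stack):
        used = {colors[nb] for nb in adj[node] if nb in colors}
        colors[node] = next(c for c in range(k) if c not in used)
    return colors


example_graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
three = color_graph(example_graph, 3)  # succeeds; A and D share a color
two = color_graph(example_graph, 2)    # None: no node has degree < 2
```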

Spilling
- If the 32-coloring fails, some nodes must be spilled to memory
- A spilled register is stored to memory and reloaded briefly whenever its value is needed
- Every time spill code is generated, the interference graph must be rebuilt
- Recoloring usually succeeds after spilling, but sometimes several passes are required

Spilling
- Choosing the optimal set of registers to spill is an NP-complete problem
- Heuristic: spill the node that minimizes
  – (cost of spilling) / (degree of node)
- Cost of spilling
  – Sum, over the register's definition points and use points, of the estimated execution frequency of each point
- In some cases, a reloaded value can stay in a register for an extended interval
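
The spill heuristic can be sketched as follows. The names `spill_cost` and `choose_spill` and the frequency lists are hypothetical; how the frequencies are estimated is not covered by the slides.

```python
def spill_cost(point_frequencies):
    """Cost of spilling one register: the summed estimated execution
    frequency of all its definition and use points."""
    return sum(point_frequencies)


def choose_spill(adj, point_freqs):
    """Pick the node with the smallest cost / degree ratio.

    adj:         dict node -> set of interfering nodes.
    point_freqs: dict node -> list of estimated frequencies, one entry
                 per definition or use point of that register.
    """
    def ratio(node):
        # Guard against isolated nodes; a stuck graph never has any.
        return spill_cost(point_freqs[node]) / max(len(adj[node]), 1)

    return min(sorted(adj), key=ratio)


# Four mutually interfering registers, e.g. when only three are available.
clique = {n: set("ABCD") - {n} for n in "ABCD"}
freqs = {"A": [1, 10, 10], "B": [1, 1], "C": [1, 100], "D": [1, 1, 1]}
victim = choose_spill(clique, freqs)
```

With these made-up frequencies, B is chosen: it is cheap to spill and removing it lowers the degree of three other nodes.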

Conclusion
- The graph coloring and spilling algorithms should produce faster code
- The register allocation algorithm is efficient
  – Graph coloring takes O(N) time
  – But uses O(N^2) space

Compile-time Copy Elimination Peter Schnorf Mahadevan Ganapathi John Hennessy Stanford, 1993

Motivation
- Single-assignment languages simplify dependency checking
- This in turn simplifies automatic detection and exploitation of parallelism
- But single-assignment languages require a large number of copies
- Previous implementations eliminate copies at run time
- Efficiency increases if copies can be eliminated at compile time

Paper Overview
- Single-assignment languages
- Code generation
- Compile-time copy elimination techniques
  – Substitution
  – Pattern matching
  – Substructure sharing
  – Substructure targeting
- Results: success!
  – Eliminated all copies in bubble sort

Single-assignment languages
- Functional languages (LISP, Haskell, SISAL)
- Simpler dependency checking
  – True dependencies (write, then read): b = f(c), a = f(b)
  – Anti-dependencies (read, then write): a = f(b), b = f(c)
  – Output dependencies (write, then write): a = f(b), a = f(c)
  – Aliasing: caused by pointers, array indexes
- To avoid aliasing, all inputs and outputs are passed by value
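
The three dependency kinds can be checked mechanically for a pair of statements. This is a small sketch; the function name `classify` and the `(written, reads)` encoding are assumptions made for the example, and aliasing is not modeled.

```python
def classify(first, second):
    """Classify the dependencies between two statements in program order.

    Each statement is (written_variable, set_of_variables_read).
    Returns a list drawn from "true", "anti", "output".
    """
    w1, r1 = first
    w2, r2 = second
    kinds = []
    if w1 in r2:
        kinds.append("true")    # second reads what first wrote
    if w2 in r1:
        kinds.append("anti")    # second overwrites what first read
    if w1 == w2:
        kinds.append("output")  # both write the same variable
    return kinds


# The three statement pairs from the slide.
true_dep = classify(("b", {"c"}), ("a", {"b"}))    # b = f(c); a = f(b)
anti_dep = classify(("a", {"b"}), ("b", {"c"}))    # a = f(b); b = f(c)
output_dep = classify(("a", {"b"}), ("a", {"c"}))  # a = f(b); a = f(c)
```

In a single-assignment program no variable is written twice, so only the "true" case can arise between two statements, which is why dependency checking becomes simpler.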

Example – Swap(A,i,j)
- Data flow diagram
  – Edges transport values
  – Simple nodes are operations
- Pick any feasible node evaluation order at random
- Naïve implementation
  – Each edge has its own memory
  – Swap uses 5 array copies!
- Optimized implementation
  – Swap array updates are done in place
[Figure: data-flow graph for Swap with Input, AElement, and AReplace nodes]

Example: BubbleSort(A)
- Compound nodes represent control flow
- Loops are implemented using recursion to avoid multiple assignment of the iteration variable
- Naïve implementation
  – Bubble sort requires O(n^2) array copies
- Optimized implementation
  – All array updates are done in place
  – But parallelism is decreased

Code Generation Overview
- Input is from the compiler front end
  – IF1: intermediate data-flow graph representation
- Code generator eliminates copies
- Output is in C
  – Compiled into machine code using an optimizing C compiler

Vertical Substitution
- If an input and an output have the same type and size, they can share memory
  – Updates are done in place
[Figure: data-flow graph with Input, AElement, and AReplace nodes]

Horizontal Substitution
- If an output has several destinations, the output edges can share memory
[Figure: data-flow graph with Input, AElement, and AReplace nodes]

Horizontal and Vertical Substitution
- Horizontal and vertical substitution can interfere with each other
  – A node along the substitution chain may modify the shared object before its last use
- Edges can be marked as read-only if they are shared and this is not the last use
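
The hazard can be illustrated with a swap built from two element reads and two replacements. This is only a sketch: `a_element` and `a_replace` are hypothetical stand-ins named after the AElement and AReplace nodes, and their exact semantics here (including the `in_place` flag) are assumptions, not the paper's definitions.

```python
def a_element(arr, i):
    """Read one element (stand-in for an AElement node)."""
    return arr[i]


def a_replace(arr, i, value, in_place):
    """Return arr with arr[i] replaced (stand-in for an AReplace node).

    With in_place=False the input is copied first, as in the naive
    by-value implementation; with in_place=True input and output share
    memory.
    """
    target = arr if in_place else list(arr)
    target[i] = value
    return target


def swap(arr, i, j, in_place):
    # Both reads happen before any in-place update, so sharing is safe.
    x = a_element(arr, i)
    y = a_element(arr, j)
    tmp = a_replace(arr, i, y, in_place)
    return a_replace(tmp, j, x, in_place)


def swap_bad_order(arr, i, j):
    # The shared array is modified before its last read: arr[i] has
    # already been overwritten when it is fetched.
    tmp = a_replace(arr, i, a_element(arr, j), in_place=True)
    x = a_element(arr, i)
    return a_replace(tmp, j, x, in_place=True)


good = swap([1, 2, 3], 0, 2, in_place=True)
bad = swap_bad_order([1, 2, 3], 0, 2)
```

In `swap_bad_order` the old value of the first slot is lost. Marking the edge into the late read as read-only, as the slide suggests, is what would force a copy (or a different order) in that case.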

Horizontal and Vertical Substitution
[Figure: two data-flow graphs, each with Input, AElement, and AReplace nodes]

Interprocedural Substitution
- The previous discussion concerned simple nodes that can be analyzed at compiler design time
- Information about a function is needed in order to use substitution
  – Does the function modify an input?
  – Will an input be chained to an output?

Intersubgraph Substitution
- Substitution analysis is done for each construct
- Same basic principles

Determining the Evaluation Order
- Evaluation order can impact the efficiency of substitution
- Naïve implementation selects the next node to evaluate at random
- Hints tell the algorithm which nodes should be evaluated before and after other nodes if possible
- Hints are ad hoc?
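
One way to picture "hints if possible" is as optional ordering edges added to the data-flow dependencies and dropped when they would create a cycle. This is an assumed representation for illustration only; the slides do not say how the paper encodes or applies hints. The function name `evaluation_order` and the node names are hypothetical, and the sketch uses Python's standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter


def evaluation_order(deps, hints):
    """Order nodes so that data dependencies always hold and hints hold
    whenever they do not contradict the dependencies.

    deps:  dict node -> set of nodes that must be evaluated first.
    hints: list of (before, after) pairs expressing a preference.
    """
    graph = {n: set(p) for n, p in deps.items()}
    for before, after in hints:
        trial = {n: set(p) for n, p in graph.items()}
        trial.setdefault(before, set())
        trial.setdefault(after, set()).add(before)
        try:
            # Force the sort to run so that a cycle is detected now.
            tuple(TopologicalSorter(trial).static_order())
        except CycleError:
            continue  # the hint cannot be honored; ignore it
        graph = trial
    return list(TopologicalSorter(graph).static_order())


deps = {
    "input": set(),
    "read_i": {"input"},
    "read_j": {"input"},
    "replace_i": {"read_j"},
    "replace_j": {"replace_i", "read_i"},
}
# First hint: do the read of slot i before slot i is overwritten.
# Second hint contradicts a real dependency and is dropped.
order = evaluation_order(deps, [("read_i", "replace_i"), ("replace_j", "read_i")])
```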

Pattern Matching
- Replace hard-to-optimize pieces of code
- Patterns are language-specific
- Patterns are detected using “ad hoc” methods

Substructure Sharing
- Allow substructures to be referenced without copies
- AElement can be treated as a NoOp
- Happens after substitution analysis, so it is less important
- Same principles as substitution analysis

Substructure Targeting
- Allow structures to be built from substructures without copies
- Similar to substructure sharing

Results
- Compared the optimizations against the naïve implementation
- The optimizations eliminate all copies for bubble sort
- An informal comparison with a run-time optimizer shows improvements

Results

Conclusions
- Substitution, pattern matching and substructure sharing can eliminate almost all unnecessary copies in a single-assignment language
- Copy elimination no longer has to be done at run time
- Single-assignment languages should be more efficient for parallel programs