

Register Allocation (via graph coloring and spilling)

Register allocation
LLVM IR uses an unbounded set of virtual registers. Register allocation rewrites the code in terms of hardware registers, which are limited. For example, x86_64 has 16 general-purpose 64-bit registers (plus 16 SSE registers, 2 status registers, 6 32-bit registers, and 8 FPU/MMX registers). Using registers whenever possible is crucial to performance: on an i7-4770, a register access takes ~1 CPU cycle, an L1 cache access ~4 cycles, an L2 cache access ~12 cycles, and an L3 cache access ~36-58 cycles. Register allocation is as old as intermediate languages: the first FORTRAN compiler (April 1957) had a primitive register allocator.

Register allocation
Register pressure arises when there are too few machine registers for the virtual registers in use. If too many virtual registers are live at once, some values must be spilled to memory, usually to extra space on the stack. The goals of performing register allocation well are to guarantee correct output code, to minimize spill code (added loads and stores), and to minimize added spill space on the stack.

Two approaches
The naïve approach: “local” register allocation. It allocates registers in each basic block separately, traversing the block while maintaining a map from each virtual register to either a machine register or an offset on the stack. When an instruction accesses a virtual register that is not currently held in a machine register, a machine register is spilled onto the stack with a store instruction and the required value is loaded into it. The map is updated for both virtual registers.
The graph-coloring approach: “global” register allocation. The new Chez Scheme produces 15-24% faster code.

The naïve approach
Assuming 3 hardware registers: rax, rbx, rcx

  ...
  %b = add %a, 1
  %c = mul %a, %b
  %d = add %c, %b
  %e = add %d, %a

Step 1 (on entry to the block): %a → rax

Step 2 (after %b = add %a, 1): %a → rax, %b → rbx

Step 3 (after %c = mul %a, %b): %a → rax, %b → rbx, %c → rcx

Step 4 (%d = add %c, %b): no register is free, so rax is spilled first with store rax, rsp+0. Afterwards: %a → rsp+0, %b → rbx, %c → rcx, %d → rax

Step 5 (%e = add %d, %a): %a must be reloaded and %e needs a register, so the allocator emits store rcx, rsp+16, then store rbx, rsp+8, then rbx = load rsp+0. Afterwards: %a → rbx, %b → rsp+8, %c → rsp+16, %d → rax, %e → rcx
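
The local strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the allocator from the slides: the function name allocate_block, the tuple encoding of instructions, and the eviction policy (evict the first register not needed by the current instruction) are all hypothetical choices, and the code assumes SSA input so that a value already stored to its slot never needs to be stored again. The block is the one from the walkthrough, with the last instruction as it appears in the final step.

```python
def allocate_block(instrs, regs, initial=None):
    """Naive local register allocation for one basic block (SSA assumed).

    instrs:  list of (dest, op, operands); operands are virtual-register
             names (str) or integer constants.
    regs:    machine registers in preference order.
    initial: optional {vreg: reg} for values already in registers on entry.
    Returns (emitted code lines, final {vreg: location}).
    """
    in_reg = dict(initial or {})                  # vreg -> machine register
    holder = {r: v for v, r in in_reg.items()}    # machine register -> vreg
    slot = {}                                     # vreg -> stack offset
    code = []

    def grab(pinned):
        # Prefer a free register.
        for r in regs:
            if r not in holder:
                return r
        # Otherwise evict the first register the current instruction
        # does not need (assumed policy).
        for r in regs:
            victim = holder[r]
            if victim not in pinned:
                if victim not in slot:            # first spill: store it
                    slot[victim] = 8 * len(slot)
                    code.append(f"store {r}, rsp+{slot[victim]}")
                del in_reg[victim]
                del holder[r]
                return r
        raise RuntimeError("too few registers for a single instruction")

    for dest, op, operands in instrs:
        pinned = {x for x in operands if isinstance(x, str)}
        for v in operands:
            if isinstance(v, str) and v not in in_reg:
                r = grab(pinned)                  # reload a spilled operand
                code.append(f"{r} = load rsp+{slot[v]}")
                in_reg[v], holder[r] = r, v
        r = grab(pinned)                          # register for the result
        in_reg[dest], holder[r] = r, dest
        args = ", ".join(in_reg[x] if isinstance(x, str) else str(x)
                         for x in operands)
        code.append(f"{r} = {op} {args}")

    where = {v: in_reg[v] if v in in_reg else f"rsp+{slot[v]}"
             for v in set(in_reg) | set(slot)}
    return code, where


block = [
    ("%b", "add", ["%a", 1]),
    ("%c", "mul", ["%a", "%b"]),
    ("%d", "add", ["%c", "%b"]),
    ("%e", "add", ["%d", "%a"]),
]
code, where = allocate_block(block, ["rax", "rbx", "rcx"], {"%a": "rax"})
```

With this policy the final map agrees with the last step of the walkthrough, although the spill instructions may be emitted in a different order. Note that three of the five values end up with stack slots, which is the cost the global approach tries to avoid.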

Graph-coloring approach
Interprets register allocation as a graph coloring problem: given a graph G and a number of colors k, a valid solution assigns each of G’s nodes a color, numbered [1..k], such that no two adjacent nodes have the same color. Deciding whether such a coloring exists is NP-hard for k > 2. The algorithm constructs a register interference graph where:
There is a node for each SSA register (or assignment).
There is an edge between two nodes when both registers may be live at the same point in the code.
k is the number of hardware registers in our allocation pool.
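
To make the problem statement concrete, here is a minimal sketch of a validity check together with an exhaustive search. The function names and the edge-list encoding are assumptions made for illustration. The search tries up to k^n assignments, which is why practical allocators fall back on heuristics.

```python
from itertools import product


def is_valid_coloring(edges, color, k):
    """True if every color is in 1..k and no edge joins two equal colors."""
    if any(not (1 <= c <= k) for c in color.values()):
        return False
    return all(color[u] != color[v] for u, v in edges)


def find_coloring(nodes, edges, k):
    """Exhaustive search: returns a valid coloring, or None if none exists."""
    for combo in product(range(1, k + 1), repeat=len(nodes)):
        color = dict(zip(nodes, combo))
        if is_valid_coloring(edges, color, k):
            return color
    return None


# A triangle: three mutually interfering registers.
triangle_nodes = ["a", "b", "c"]
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
two = find_coloring(triangle_nodes, triangle_edges, 2)    # not 2-colorable
three = find_coloring(triangle_nodes, triangle_edges, 3)  # 3-colorable
```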

Liveness & live ranges
A live range begins when a virtual register is assigned (for SSA this point is unique) and ends when that virtual register is last used.

  %a = …
  %b = add %a, 1
  %c = mul %b, 2
  store %a …

[Figure: the live ranges of %a, %b and %c drawn as bars alongside the code.]
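
For straight-line SSA code, the definition above translates directly into a single pass. The sketch below is an assumption-laden illustration: the function name live_ranges and the (dest, operands) encoding are hypothetical, ranges are reported as instruction indices, and code with branches would need a proper dataflow analysis instead.

```python
def live_ranges(instrs):
    """Live ranges for straight-line SSA code.

    instrs: list of (dest or None, operands), where operands lists the
            virtual registers the instruction reads.
    Returns {vreg: (index of definition, index of last use)}.
    """
    ranges = {}
    for i, (dest, operands) in enumerate(instrs):
        for u in operands:
            if u in ranges:               # ignore registers defined elsewhere
                start, _ = ranges[u]
                ranges[u] = (start, i)    # extend the range to this use
        if dest is not None:
            ranges[dest] = (i, i)         # a range starts at its definition
    return ranges


example = [
    ("%a", []),          # 0: %a = ...
    ("%b", ["%a"]),      # 1: %b = add %a, 1
    ("%c", ["%b"]),      # 2: %c = mul %b, 2
    (None, ["%a"]),      # 3: store %a ...
]
ranges = live_ranges(example)
```

Here %a is live from instruction 0 to 3, %b from 1 to 2, and %c only at its definition, since nothing in the fragment reads it.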

Register interference graph
At each instruction S in the procedure, add an edge (a, b) for every pair of registers %a and %b that are both live at S.
[Figure: two code fragments that assign and use %a and %b, each shown with the live ranges of %a and %b and the corresponding graph nodes a and b.]
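
One common way to build these edges is a backward scan that tracks the set of live registers and, at each definition, connects the defined register to everything still live after it. The sketch below assumes straight-line code and uses that formulation, which may differ in detail from the rule on the slide; the function name and instruction encoding are hypothetical.

```python
def interference_edges(instrs, live_out=()):
    """Interference edges for straight-line code via a backward scan.

    instrs:   list of (dest or None, operands).
    live_out: registers assumed live after the last instruction.
    Returns a set of frozenset({u, v}) edges.
    """
    live = set(live_out)
    edges = set()
    for dest, operands in reversed(instrs):
        if dest is not None:
            live.discard(dest)
            # dest is written while everything in `live` is still needed.
            edges.update(frozenset((dest, other)) for other in live)
        live.update(operands)
    return edges


# Overlapping live ranges: %b is assigned while %a is still needed.
overlapping = [("%a", []), ("%b", []), (None, ["%a"]), (None, ["%b"])]
# Disjoint live ranges: %a is dead before %b is assigned.
disjoint = [("%a", []), (None, ["%a"]), ("%b", []), (None, ["%b"])]

edges_overlapping = interference_edges(overlapping)
edges_disjoint = interference_edges(disjoint)
```

In the first fragment the two registers interfere and need different colors; in the second they do not, so they can share one machine register.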

Graph coloring via simplification
Solve using a heuristic that cannot guarantee an optimal coloring. First, reduce the complexity of the problem: while there exists a node R with degree < k, remove R and push it onto a stack of low-degree nodes. Afterwards, either all nodes have been removed from the graph, or every remaining node has degree at least k; one such virtual register must* be spilled to an address on the stack. Last, given an empty graph and a stack of nodes that each had degree < k when removed, pop each node and reinsert it into the graph with one of the k colors not used by any of its neighbors.
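
The simplify-then-select procedure can be sketched as follows. This is a minimal illustration, not a production allocator: the name color_graph is hypothetical, ties are broken alphabetically, and when every remaining node has degree at least k the sketch simply marks the highest-degree node as spilled (an assumed heuristic) and continues. The five-node example graph is chosen for illustration.

```python
def color_graph(adj, k):
    """Color an interference graph with k colors by simplification.

    adj: {node: set of neighbors} (symmetric).
    Returns (coloring as {node: color in 1..k}, list of spilled nodes).
    """
    remaining = {n: set(ns) for n, ns in adj.items()}
    stack, spilled = [], []

    # Simplify: repeatedly remove a node of degree < k.
    while remaining:
        low = [n for n in sorted(remaining) if len(remaining[n]) < k]
        if low:
            node = low[0]
            stack.append(node)
        else:
            # Every remaining node has degree >= k: choose a spill candidate.
            node = max(sorted(remaining), key=lambda n: len(remaining[n]))
            spilled.append(node)
        for m in remaining.pop(node):
            remaining[m].discard(node)

    # Select: pop nodes and give each a color unused by its colored neighbors.
    coloring = {}
    while stack:
        node = stack.pop()
        used = {coloring[m] for m in adj[node] if m in coloring}
        coloring[node] = min(c for c in range(1, k + 1) if c not in used)
    return coloring, spilled


graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"b", "d"},
}
coloring, spilled = color_graph(graph, 3)
```

A pushed node had fewer than k neighbors left when it was removed, and only those neighbors can already be colored when it is popped, so a free color always exists for it.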

Graph coloring via simplification
Assuming 3 hardware registers / 3 colors

[Figure: interference graph over the nodes a, b, c, d, e, animated across the slides.]

Simplification removes the nodes in the order a, c, d, e, b. The stack records each node together with the neighbors it still had when it was removed:

  a: b, c
  c: b, d
  d: b, e
  e: b
  b: (none)

The nodes are then popped in reverse order (b, e, d, c, a) and reinserted into the graph, each receiving a color not used by any of its neighbors.

Graph coloring via simplification
If, at some point, all remaining nodes have degree at least k, then we must spill one of those virtual registers to the stack. Pick the virtual register with the lowest spill cost according to some heuristic. There are various strategies for spilling. Two common ones:
Reserve a subset of registers for spilled values and wrap every use of the spilled register with a load and store.
Split the register into 2+ virtual registers (connected with a store & load) and recompute live ranges; the next iteration then begins with a larger but simpler graph in which the spilled node is replaced by 2+ lower-degree nodes.
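
A rough sketch of both steps follows, under clearly labeled assumptions. pick_spill uses a use-count-over-degree cost, one possible heuristic that the slides do not specify. rewrite_spill follows the second strategy by giving each definition and each use of the spilled register its own short-lived virtual register. The function names, the fresh-name scheme (%a.1, %a.2, ...) and the tuple encoding are hypothetical.

```python
def pick_spill(adj, use_count):
    """Choose the node with the lowest (assumed) cost: uses / degree."""
    return min(sorted(adj), key=lambda n: use_count[n] / max(len(adj[n]), 1))


def rewrite_spill(instrs, victim, slot):
    """Split `victim` into short-lived registers joined by store and load.

    instrs: list of (dest or None, op, operands).
    slot:   stack location for the spilled value, e.g. "rsp+0".
    """
    out, fresh = [], 0
    for dest, op, operands in instrs:
        if victim in operands:
            fresh += 1
            tmp = f"{victim}.{fresh}"
            out.append((tmp, "load", [slot]))            # reload before use
            operands = [tmp if x == victim else x for x in operands]
        if dest == victim:
            fresh += 1
            tmp = f"{victim}.{fresh}"
            out.append((tmp, op, operands))
            out.append((None, "store", [tmp, slot]))     # store after def
        else:
            out.append((dest, op, operands))
    return out


k4 = {n: {m for m in "abcd" if m != n} for n in "abcd"}
victim = pick_spill(k4, {"a": 10, "b": 2, "c": 5, "d": 7})

before = [
    ("%a", "const", [1]),
    ("%b", "add", ["%a", 1]),
    ("%c", "mul", ["%a", "%b"]),
]
after = rewrite_spill(before, "%a", "rsp+0")
```

After the rewrite, each piece of the spilled register is live for only one or two instructions, so the rebuilt interference graph has more nodes but lower degrees.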