CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.



8 December
Outline
  Motivation of the Study
  Register Allocation Problem
  Classical Methods (Chaitin & Briggs)
  Optimal Register Allocator
  Experimental Study

Motivation of the Study
  Challenges of compilers for embedded systems
    Power consumption and memory space limitations
    Small set of applications
  Such compilers can afford long compilation times to generate high-quality code in several phases:
    instruction selection
    instruction scheduling
    register allocation

Motivation of the Study (2)
  Instruction selection
    selects target machine instructions to implement primitive IR (intermediate representation) instructions
    affects the quality of the code
  Instruction scheduling
    orders the operations in the compiled code
    reduces the running time of the compiled code

Register Allocation Problem
  Assigns program variables to the available registers
  Shapes the runtime performance of the compiled code
  Failure to provide an efficient register allocation causes
    an increase in the number of memory accesses
    an increase in code size (affects memory capacity and the overall form factor of the device)
    an increase in power consumption (frequent memory accesses due to poor register allocation)

Register Allocation (2)
  NP-complete (Garey & Johnson, 1976)
  Approaches
    Graph coloring: Chaitin (1981)
    Integer programming: Goodwin and Wilken (1996)

Graph Coloring
  Traditional solution to the register allocation problem
  A graph models the values that compete for registers
  Each node represents a symbolic register (live range); an edge between two nodes means that both are live at the same point in the program
  Nodes joined by an edge must receive different colors, where each color stands for a real register

Graph Coloring (2)
  Spilling: when there are not enough registers, a variable is stored in memory for some or all of its lifetime
  Spill cost: the runtime cost of loading a variable from memory and storing it back, determined by
    address computation
    memory operation
    execution frequency
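
As a rough illustration of how the three spill-cost components could combine, the sketch below weights each load or store that spilling would insert by its execution frequency. The function name `spill_cost` and the unit costs `addr_cost` and `mem_cost` are assumptions made for this example; the slides do not give a formula.

```python
def spill_cost(access_frequencies, addr_cost=1, mem_cost=2):
    """Estimate the runtime cost of spilling one variable.

    access_frequencies: one execution frequency per load/store that
    spilling the variable would insert (hypothetical input format).
    addr_cost, mem_cost: assumed unit costs of the address computation
    and of the memory operation itself.
    """
    per_access = addr_cost + mem_cost
    return sum(freq * per_access for freq in access_frequencies)


# A variable stored once before a loop and reloaded in a loop body
# that runs 10 times: (1 + 10) accesses * 3 units each.
example_cost = spill_cost([1, 10])
```

A spill metric built on such a cost would prefer to spill variables that are rarely touched, which is why execution frequency dominates inside loops.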

Live Ranges
  A variable Vi is live at a point p in the program if it is defined above p and has not yet been used for the last time
  The live range LRi begins with the definition of Vi and ends with the last use of Vi
  If LRi and LRj are simultaneously live at p, then LRi interferes with LRj, and the two cannot be stored in the same register
  Interference graph G_I = G(V, E)
    V = set of individual live ranges
    E = set of edges that represent interferences
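
The definition of interference can be sketched in a few lines, under the simplifying assumption that the code is straight-line and each live range is a single half-open interval `(start, end)` of program points. Real live ranges span control flow, so this is only an illustration; `build_interference_graph` is a hypothetical helper name.

```python
from itertools import combinations


def build_interference_graph(live_ranges):
    """Build G_I = (V, E) from live ranges.

    live_ranges: dict mapping a variable name to (start, end), where the
    variable is live from its definition at `start` up to, but not
    including, `end` (assumed straight-line code).
    Returns an adjacency dict: name -> set of interfering names.
    """
    graph = {name: set() for name in live_ranges}
    for (u, (u_start, u_end)), (v, (v_start, v_end)) in combinations(
        live_ranges.items(), 2
    ):
        # Two intervals overlap when each one starts before the other ends.
        if u_start < v_end and v_start < u_end:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# a overlaps both b and c; b ends exactly where c begins, so no edge b-c.
g = build_interference_graph({"a": (0, 4), "b": (1, 3), "c": (3, 5)})
```

Here `b` and `c` never live at the same point, so they may share a register, while `a` needs a register of its own.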

Source Code

    int main(){
        int a; int b; int i;
        a=10; b=1; i=0;
        while (i<=a){
            b+=b*i;
            i++;
            if (b>=100) break;
        }
        return 0;
    }

GaS (GNU Assembler)

    main:
        pushl %ebp
        movl  %esp, %ebp
        subl  $24, %esp
        andl  $-16, %esp
        movl  $0, %eax
        subl  %eax, %esp
        movl  $10, -4(%ebp)
        movl  $1, -8(%ebp)
        movl  $0, -12(%ebp)
    .L2:
        movl  -12(%ebp), %eax
        cmpl  -4(%ebp), %eax
        jle   .L4
        jmp   .L

Extended Representation

    main:
        subl $4, t1 (t2)
        movl t3, t2
        movl t2, t3 (t4)
        subl $24, t2 (t5)
        andl $-16, t5 (t6)
        movl $0, t7
        subl t7, t6 (t8)
        movl $10, t4
        movl $1, t4
        movl $0, t4
    .L2:
        movl t4, t7 (t9)
        cmpl t4, t

[Figure: Interference Graph over the symbolic registers t1 to t15]

Classical Methods for Register Allocation
  Register allocators based on graph coloring
    Chaitin’s heuristic (limitations for diamond graphs)
    Optimistic coloring heuristic (Briggs)
    Stack-based methods

Chaitin’s Heuristic
  (k is the number of available colors, i.e. real registers; deg(v) is the degree of v in G_I)
  Initialize stack S to empty.
  while (G_I ≠ ∅) do
    while ∃ v in G_I such that deg(v) < k do
      Pick any vertex v such that deg(v) < k.
      Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
      Pick a vertex v based on the given spill metric.
      Spill the live range associated with v.
      Remove v and its edges from G_I.
  while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
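
The heuristic above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the graph is an adjacency dict, `spill_metric` is a hypothetical function returning a cost per node (lowest cost is spilled first), and ties among low-degree nodes are broken alphabetically so the run is deterministic.

```python
def chaitin_color(graph, k, spill_metric):
    """Chaitin-style simplify/spill/select on an interference graph.

    graph: dict node -> set of neighbours (not modified).
    k: number of colors (real registers), numbered 0..k-1.
    Returns (colors, spilled).
    """
    work = {v: set(ns) for v, ns in graph.items()}  # mutable copy of G_I
    stack, spilled = [], []

    def remove(v):
        for n in work.pop(v):
            work[n].discard(v)

    while work:
        low = sorted(v for v in work if len(work[v]) < k)
        if low:
            stack.append(low[0])        # simplify: trivially colorable
            remove(low[0])
        else:
            victim = min(work, key=spill_metric)
            spilled.append(victim)      # pessimistic: spill immediately
            remove(victim)

    colors = {}
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        colors[v] = next(c for c in range(k) if c not in used)
    return colors, spilled


# Assumed diamond: the 4-cycle A-B-C-D-A, with A the cheapest to spill.
diamond = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
cost = {"A": 1, "B": 2, "C": 3, "D": 4}
colors, spilled = chaitin_color(diamond, 2, cost.get)
```

With two colors every node of the cycle has degree 2, so the heuristic spills `A` even though the graph is 2-colorable; reading color 0 as r1 and color 1 as r2, the remaining nodes get B->r1, C->r2, D->r1.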

Chaitin-Briggs Heuristic (OCH)
  (k is the number of available colors, i.e. real registers; deg(v) is the degree of v in G_I)
  Initialize stack S to empty.
  while (G_I ≠ ∅) do
    while ∃ v in G_I such that deg(v) < k do
      Pick any vertex v such that deg(v) < k.
      Remove v and its edges from G_I and put v on S.
    if (G_I ≠ ∅) then
      Pick a vertex v based on the given spill metric.
      Push v on the stack.
      Remove v and its edges from G_I.
  while (S ≠ ∅) do
    v = pop(S)
    Color v with the lowest color not used by any neighbor of v.
    If node v cannot be colored, then pick an uncolored node to spill, spill it, and restart at step 1.
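
A sketch of the optimistic variant follows, under the same assumptions as before: adjacency-dict graph, a hypothetical `spill_metric` cost function, and alphabetical tie-breaking. When a popped node finds no free color, this sketch spills that node itself and restarts; choosing the failing node as the one to spill is an assumption of the example.

```python
def briggs_color(graph, k, spill_metric):
    """Optimistic (Chaitin-Briggs style) coloring with restart on failure.

    graph: dict node -> set of neighbours (not modified).
    k: number of colors (real registers), numbered 0..k-1.
    Returns (colors, spilled).
    """
    graph = {v: set(ns) for v, ns in graph.items()}  # private copy
    spilled = []
    while True:
        work = {v: set(ns) for v, ns in graph.items()}
        stack = []
        while work:
            low = sorted(v for v in work if len(work[v]) < k)
            # Optimistic step: a spill candidate is pushed, not spilled.
            v = low[0] if low else min(work, key=spill_metric)
            stack.append(v)
            for n in work.pop(v):
                work[n].discard(v)

        colors, failed = {}, None
        while stack:
            v = stack.pop()
            used = {colors[n] for n in graph[v] if n in colors}
            free = [c for c in range(k) if c not in used]
            if not free:
                failed = v
                break
            colors[v] = free[0]

        if failed is None:
            return colors, spilled
        spilled.append(failed)          # spill, then restart from step 1
        for n in graph.pop(failed):
            graph[n].discard(failed)


# Assumed diamond: the 4-cycle A-B-C-D-A, with A the cheapest to spill.
diamond = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
cost = {"A": 1, "B": 2, "C": 3, "D": 4}
colors, spilled = briggs_color(diamond, 2, cost.get)
```

On the 4-cycle no node is spilled: `A` is pushed as a candidate, and when it is popped last its neighbours `B` and `D` share a color, so one color is still free. The result pairs A with C and B with D, which is a 2-coloring of the cycle; which pair receives which register depends on the order in which nodes are popped.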

Comparison of Chaitin’s Heuristic and OCH
  Try to find a 2-coloring of a diamond graph with nodes A, B, C, D
  Chaitin: A spilled, B->r1, C->r2, D->r1
  OCH: A->r1, B->r2, C->r1, D->r2

Integer Programming (IP)
  Compared with graph coloring, IP
    increases program performance
    reduces code size
  The time to solve a register allocation problem can be significant
  The IP formulation should be as simple as possible

Optimal Register Allocator (ORA)
  ORA uses IP to solve the register allocation problem
  Proposed by Goodwin and Wilken (1996)
  The IP model is very complex because it contains many redundancies
  Solving the problem is slow

A Faster Optimal Register Allocator
  “A Faster Optimal Register Allocator” uses IP to solve the register allocation problem
  Fu, Wilken and Goodwin (2005)
  The proposed approach uses global and local analysis techniques to identify locations where spill and deallocation decisions are unnecessary
  Uses a simplified IP formulation
  Faster

Basic ORA Model

Control Flow Graph and ORA Graphs

Basic ORA Model
  Models register allocation as a set of network graphs
    Symbolic-register graphs
    Memory graphs
  An optimal allocation is obtained by selecting a set of graph edges whose total cost is minimal
  Cost = allocation overhead of a decision

IP Formulation

Redundancy

Global Reduction
  Eliminates unnecessary load, store and deallocation decisions placed at the diverge and merge edges in the live range graphs
  80% of the total decisions generated by the ORA model

Decision Placement

Diamond Region Reductions
  Four reduction techniques can eliminate unnecessary load, store and deallocation decisions:
  Void region coupling
    void region
    coupled decision
    paired decision
  Symmetric decision selection
  Jump-edge nullification
  Asymmetric decision elimination

Local Reduction
  Examines symbolic registers used in adjacent instructions to identify unnecessary load and deallocation decisions

Constraint Reduction
  Deallocation constraints
  Must-allocate constraint
  Single-symbolic constraint
  Liveness constraint

Deallocation Constraints
  Used to allow a real register to be deallocated from a symbolic register at the deallocation decision location
  $X^{r}_{s,p-1} \ge X^{r}_{s,p}$
  $X^{r}_{s,p-1}$ represents the allocation state of real register r to symbolic register s before the deallocation decision location p
  $X^{r}_{s,p}$ represents the allocation state after p

Must-allocate Constraint
  Used to ensure that a symbolic register is allocated to a real register at each definition and each use
  $\sum_{r} X^{r}_{s,p} \ge 1$
  For optimal allocation, if no deallocation exists between two must-allocate constraints for a symbolic register, then the second must-allocate constraint is redundant

Single-symbolic Constraint
  Used to ensure that a real register is allocated to at most one symbolic register
  $\sum_{s} X^{r}_{s,p} \le 1$
  For optimal allocation, if no deallocation exists between two adjacent single-symbolic constraints for a real register, then the first single-symbolic constraint is redundant

Liveness Constraint
  Used to ensure the liveness of a symbolic register
  $\sum_{r} X^{r}_{s,p} + X^{mem}_{s,p} \ge 1$
  $X^{mem}_{s,p}$ represents the allocation state of symbolic register s to memory at the liveness constraint location p
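
To make the four constraint forms concrete, the sketch below checks them against a candidate 0-1 assignment. It does not solve the IP. The data layout is an assumption made for this example: `x[(r, s, p)]` holds the allocation state of real register r to symbolic register s at location p, `x_mem[(s, p)]` holds the memory state, and missing keys count as 0.

```python
def deallocation_ok(x, r, s, p):
    """X[r,s,p-1] >= X[r,s,p]: the state may drop from 1 to 0 at p, not rise."""
    return x.get((r, s, p - 1), 0) >= x.get((r, s, p), 0)


def must_allocate_ok(x, regs, s, p):
    """Sum over real registers >= 1: s sits in some register at p."""
    return sum(x.get((r, s, p), 0) for r in regs) >= 1


def single_symbolic_ok(x, syms, r, p):
    """Sum over symbolic registers <= 1: r holds at most one of them at p."""
    return sum(x.get((r, s, p), 0) for s in syms) <= 1


def liveness_ok(x, x_mem, regs, s, p):
    """Register sum plus memory state >= 1: the value of s exists somewhere."""
    return sum(x.get((r, s, p), 0) for r in regs) + x_mem.get((s, p), 0) >= 1


# Hypothetical assignment: s1 stays in r1 over locations 0 and 1;
# s2 lives in memory at location 0 and in r2 at location 1.
regs, syms = ["r1", "r2"], ["s1", "s2"]
x = {("r1", "s1", 0): 1, ("r1", "s1", 1): 1, ("r2", "s2", 1): 1}
x_mem = {("s2", 0): 1}

checks = {
    "s1 allocated at 1": must_allocate_ok(x, regs, "s1", 1),
    "r1 holds one symbolic at 1": single_symbolic_ok(x, syms, "r1", 1),
    "s2 live at 0": liveness_ok(x, x_mem, regs, "s2", 0),
    "s1 kept in r1 at 1": deallocation_ok(x, "r1", "s1", 1),
}
```

In this example `deallocation_ok(x, "r2", "s2", 1)` is false, because s2 enters r2 at location 1: the deallocation form only permits a register to be given up at that location, so gaining a register has to be modelled by a different decision.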

Experimental Study
  Compares graph coloring, ORA and Faster ORA
  For ORA and Faster ORA, the SPEC CPU2000 and SPEC CPU92 integer benchmark suites are used with a RISC processor

SPEC CPU92 Benchmark Functions

Number of decision variables and constraints produced by basic ORA and Faster ORA

Dynamic spill-code saved using Faster ORA

Dynamic spill code components for SPEC CPU 2000

Conclusion
  Two different solutions to the register allocation problem
    Integer programming
    Graph coloring
  The formulations and uses of these solutions were shown
  Faster ORA reduces the number of register allocation IP decision variables compared with the basic IP formulation
  IP gives better results than graph coloring

References
  G. Chaitin and M. Auslander, “Register allocation via coloring,” Computer Languages, 1981
  D. Goodwin and K. Wilken, “Optimal and near-optimal global register allocation using 0-1 integer programming,” Software: Practice and Experience, 1996
  C. Fu, K. Wilken and D. Goodwin, “A Faster Optimal Register Allocator,” Journal of Instruction-Level Parallelism 7, 2005

Thank You
  Any questions?