

Register Tracking
Register tracking improves on a simple code generator
It uses a simple local register allocation scheme in which the contents of allocatable registers are tracked
It allows frequently accessed variables and temporaries to be allocated to registers
It is not optimal
We generate code from tuples
n ≥ 2 registers must be available for allocation

Machine Instructions and Cost
Load  Storage, Reg    -- Cost = 2
Store Storage, Reg    -- Cost = 2
OP    Storage, Reg    -- Reg = Reg OP Storage; Cost = 2
OP    Reg1, Reg2      -- Reg2 = Reg2 OP Reg1; Cost = 1
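
The cost table can be applied mechanically to a code sequence. The following is a minimal Python sketch, not part of the original slides. It assumes that registers are named r1, r2, and so on, that every other operand is storage, and that each instruction is written as "Opcode source, destination". The names `instruction_cost` and `sequence_cost` are hypothetical.

```python
import re

# Assumption: registers are named r1, r2, ... and everything else is storage.
REGISTER = re.compile(r"r\d+")


def instruction_cost(instr):
    """Cost of one instruction written as 'Opcode source, destination'."""
    _opcode, operands = instr.split(None, 1)
    source, destination = (part.strip() for part in operands.split(","))
    if not REGISTER.fullmatch(destination):
        raise ValueError(f"second operand must be a register: {instr!r}")
    # Register-to-register costs 1; any instruction that touches storage costs 2.
    return 1 if REGISTER.fullmatch(source) else 2


def sequence_cost(instructions):
    """Total cost of a list of instructions."""
    return sum(instruction_cost(instr) for instr in instructions)


example = ["Load b, r1", "Mul c, r1", "Load d, r2", "Mul e, r2", "Add r2, r1"]
print(sequence_cost(example))  # 2 + 2 + 2 + 2 + 1 = 9
```

The sketch only distinguishes register operands from storage operands, which is all the cost table above depends on.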

Status Flags
L (live) or D (dead)
  – A variable or temp that is live will be referenced again in the basic block
S (to be saved) or NS (not to be saved)
  – A variable should always be saved at the end of a basic block if it has not already been stored in memory
  – Temps are not saved after they become dead
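
The L/D flag can be derived by scanning forward through the rest of the basic block. The sketch below is an illustration under stated assumptions, not the slides' own algorithm: tuples are Python tuples of the form (op, left, right, result) or ("=", source, target), the block is the example basic block used later in these slides, and `is_live_after` is a hypothetical helper name.

```python
# The example basic block from these slides, written as Python tuples.
BLOCK = [
    ("*", "b", "c", "t1"), ("*", "d", "e", "t2"), ("+", "t1", "t2", "t3"),
    ("=", "t3", "a"), ("-", "d", "b", "t4"), ("+", "c", "t4", "t5"),
    ("=", "t5", "d"), ("+", "e", "a", "t6"), ("+", "t6", "c", "t7"),
    ("=", "t7", "f"), ("+", "d", "e", "t8"), ("=", "t8", "a"),
]


def is_live_after(quads, index, name):
    """True if `name` is read after quads[index] before being reassigned."""
    for quad in quads[index + 1:]:
        # An assignment reads one operand; every other tuple reads two.
        reads = quad[1:2] if quad[0] == "=" else quad[1:3]
        if name in reads:
            return True
        if quad[-1] == name:  # overwritten before any further read
            return False
    return False  # never referenced again in this basic block


print(is_live_after(BLOCK, 0, "c"))  # True: c is read again by tuple 6
print(is_live_after(BLOCK, 4, "b"))  # False: b is not referenced after tuple 5
```

Indices are zero-based, so index 4 is tuple 5 in the slides' numbering.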

Cost of Var/Temp
Cost 0 if status is (D,NS)
  – The next reference to it is an assignment of a new value
Cost 0 if status is (D,S)
  – The variable won’t be used again, but it hasn’t been saved
  – There is no added cost in freeing the register and doing the save immediately
Cost 2 if status is (L,NS)
  – A load is needed to restore it to a register
Cost 4 if status is (L,S)
  – A store is needed to save the value, and a load is needed to restore the value to a register
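
These four costs suggest a simple way to pick a register to free: choose the one whose current contents are cheapest to give up. The sketch below is illustrative only. It assumes that a register association list can be modeled as a dictionary from register name to a (name, L/D, S/NS) triple or None for a free register. `free_cost` and `cheapest_register` are hypothetical names.

```python
# Cost of giving up a register, keyed by (live flag, save flag).
FREE_COST = {("D", "NS"): 0, ("D", "S"): 0, ("L", "NS"): 2, ("L", "S"): 4}


def free_cost(entry):
    """Cost of freeing a register holding `entry`; a free register costs 0."""
    if entry is None:
        return 0
    _name, live, save = entry
    return FREE_COST[(live, save)]


def cheapest_register(assoc):
    """Register whose contents are cheapest to give up (first one on ties)."""
    return min(assoc, key=lambda reg: free_cost(assoc[reg]))


assoc = {
    "r1": ("a", "L", "S"),
    "r2": ("c", "L", "NS"),
    "r3": ("t2", "D", "NS"),
    "r4": ("e", "L", "NS"),
}
print(cheapest_register(assoc))  # r3
```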

Cost Algorithm
For a tuple (OP, U, V, W), where R1 is the register that holds U:
cost = (U is in a register ? 0 : get_reg_cost() + 2)   /* cost to load U into R1 */
     + cost(R1)                                        /* cost of losing U's value in R1 */
     + (V is in a register || U == V ? 1 : 2)          /* cost of register-to-register */
                                                       /* vs storage-to-register */
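
The cost formula can be written out as a small runnable sketch. It is one reading of the formula, with several assumptions: the register association list is a dictionary from register name to a (name, L/D, S/NS) triple or None, `get_reg_cost` returns the cost of the cheapest register to free, and cost(R1) is taken to be the status cost of U once the tuple has used it. All function names are hypothetical.

```python
FREE_COST = {("D", "NS"): 0, ("D", "S"): 0, ("L", "NS"): 2, ("L", "S"): 4}


def find_register(assoc, name):
    """Register currently associated with `name`, or None."""
    for reg, entry in assoc.items():
        if entry is not None and entry[0] == name:
            return reg
    return None


def get_reg_cost(assoc):
    """Assumed meaning: cost of the cheapest register to free (0 if one is free)."""
    return min(0 if e is None else FREE_COST[(e[1], e[2])] for e in assoc.values())


def tuple_cost(assoc, u, v, u_live_after):
    """Cost of (OP, u, v, w) when the result overwrites the register holding u."""
    reg = find_register(assoc, u)
    if reg is None:
        load = get_reg_cost(assoc) + 2  # obtain a register, then load u
        save = "NS"                     # a freshly loaded value matches memory
    else:
        load = 0
        save = assoc[reg][2]
    lose_u = FREE_COST[("L" if u_live_after else "D", save)]
    operate = 1 if (find_register(assoc, v) is not None or u == v) else 2
    return load + lose_u + operate


empty = {"r1": None, "r2": None, "r3": None, "r4": None}
print(tuple_cost(empty, "b", "c", True))  # 2 + 2 + 2 = 6

state = {"r1": ("a", "L", "S"), "r2": ("c", "L", "NS"),
         "r3": ("t4", "L", "S"), "r4": ("e", "L", "NS")}
print(tuple_cost(state, "c", "t4", True))   # 0 + 2 + 1 = 3
print(tuple_cost(state, "t4", "c", False))  # 0 + 0 + 1 = 1
```

The three printed values match the figures the slides give for cost(*,b,c,t1), cost(+,c,t4,t5) and cost(+,t4,c,t5).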

Example Basic Block
a = b * c + d * e;
d = c + (d – b);
f = e + a + c;
a = d + e;

Quadruples
1.  (*, b, c, t1)
2.  (*, d, e, t2)
3.  (+, t1, t2, t3)
4.  (=, t3, a)
5.  (–, d, b, t4)
6.  (+, c, t4, t5)
7.  (=, t5, d)
8.  (+, e, a, t6)
9.  (+, t6, c, t7)
10. (=, t7, f)
11. (+, d, e, t8)
12. (=, t8, a)
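
The quadruples above follow from a left-to-right, post-order walk of each expression tree, creating a new temporary for every operator. The sketch below shows one way to reproduce them. It is not the slides' own generator: it relies on the fact that the example statements happen to be valid Python and uses Python's standard `ast` module as a stand-in parser. The name `to_quadruples` is hypothetical.

```python
import ast

# Operators used in the example basic block.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

SOURCE = """
a = b * c + d * e
d = c + (d - b)
f = e + a + c
a = d + e
"""


def to_quadruples(source):
    """Translate simple assignment statements into quadruples."""
    quads = []
    counter = 0

    def operand(node):
        """Emit code for `node` and return the name holding its value."""
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = operand(node.left)    # left subtree first
            right = operand(node.right)  # then right subtree
            counter += 1
            temp = f"t{counter}"
            quads.append((OPS[type(node.op)], left, right, temp))
            return temp
        raise ValueError("unsupported expression")

    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise ValueError("only simple assignments are supported")
        value = operand(stmt.value)
        quads.append(("=", value, stmt.targets[0].id))
    return quads


quads = to_quadruples(SOURCE)
for number, quad in enumerate(quads, start=1):
    print(number, quad)
```

The sketch writes the subtraction operator as an ASCII hyphen, where the slides use a dash.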

Unoptimized Code
1.  Load b, r1
2.  Mul c, r1
3.  Load d, r2
4.  Mul e, r2
5.  Add r2, r1
6.  Store a, r1
7.  Load d, r1
8.  Sub b, r1
9.  Load c, r2
10. Add r1, r2
11. Store d, r2
12. Load e, r1
13. Add a, r1
14. Add c, r1
15. Store f, r1
16. Load d, r1
17. Add e, r1
18. Store a, r1

Register Tracking – 1
Tuple/Code Generated        r1        r2        r3        r4
(*,b,c,t1)
  cost(*,b,c,t1)=2+2+2
  cost(*,c,b,t1)=2+2+2
  Load b,r1                 b(L,NS)
  Load c,r2                           c(L,NS)
  Mul r2,r1                 t1(L,S)
(*,d,e,t2)                  t1(L,S)   c(L,NS)
  cost(*,d,e,t2)=2+2+2
  cost(*,e,d,t2)=2+2+2
  Load d,r3                                     d(L,NS)
  Load e,r4                                               e(L,NS)
  Mul r4,r3                                     t2(L,S)

Register Tracking – 2
Tuple/Code Generated        r1        r2        r3        r4
(+,t1,t2,t3)
  cost(+,t1,t2,t3)=0+0+1
  cost(+,t2,t1,t3)=0+0+1
  Add r3,r1                 t3(L,S)   c(L,NS)   t2(D,NS)  e(L,NS)
  -- (D,NS) can be immediately removed
(=,t3,a)                    a(L,S)    c(L,NS)             e(L,NS)
  -- The store is deferred
(–,d,b,t4)                  a(L,S)    c(L,NS)             e(L,NS)
  Load d,r3                                     d(D,NS)
  Sub b,r3                                      t4(L,S)
  -- b is dead after this tuple

Register Tracking – 3
Tuple/Code Generated        r1        r2        r3        r4
(+,c,t4,t5)
  cost(+,c,t4,t5)=0+2+1
  cost(+,t4,c,t5)=0+0+1
  Add r2,r3                 a(L,S)    c(L,NS)   t5(L,S)   e(L,NS)
(=,t5,d)                    a(L,S)    c(L,NS)   d(L,S)    e(L,NS)
  -- The store is deferred
(+,e,a,t6)
  cost(+,e,a,t6)=0+2+1
  cost(+,a,e,t6)=0+0+1
  -- a is dead after this
  Add r4,r1                 t6(L,S)   c(L,NS)   d(L,S)    e(L,NS)

Register Tracking – 4
Tuple/Code Generated        r1        r2        r3        r4
(+,t6,c,t7)
  cost(+,t6,c,t7)=0+0+1
  cost(+,c,t6,t7)=0+0+1
  Add r2,r1                 t7(L,S)   c(D,NS)   d(L,S)    e(L,NS)
(=,t7,f)
  Store f,r1                t7(D,NS)            d(L,S)    e(L,NS)
  -- Store since f is not live in block

Register Tracking – 5
Tuple/Code Generated        r1        r2        r3        r4
(+,d,e,t8)
  cost(+,d,e,t8)=0+0+1
  cost(+,e,d,t8)=0+0+1
  Store d,r3                                    d(L,NS)   e(L,NS)
  -- Store is unavoidable
  Add r4,r3                                     t8(L,S)
(=,t8,a)
  Store a,r3
  -- Store is unavoidable

Optimized Code
1.  Load b,r1
2.  Load c,r2
3.  Mul r2,r1
4.  Load d,r3
5.  Load e,r4
6.  Mul r4,r3
7.  Add r3,r1
8.  Load d,r3
9.  Sub b,r3
10. Add r2,r3
11. Add r4,r1
12. Add r2,r1
13. Store f,r1
14. Store d,r3
15. Add r4,r3
16. Store a,r3

Effects of Aliasing
Let N be a name that can alias data objects. N can be a:
  – reference parameter
  – pointer
  – indexed variable
For N we compute a set O of data objects that it may alias, i.e., the set of all:
  – variables of a given type
  – heap objects of a given type
  – array elements

Reference to N
When N is referenced:
Examine the register association list
If any data object o ∈ O appears with status S, the store must be completed before N is referenced

Assignment to N
When an assignment to N is made:
If any data object o ∈ O appears in the register association list, it must be removed.
Removal reflects that the assignment to N may have changed the value of o, invalidating the value currently held in the register associated with o.
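
The two aliasing rules (reference to N and assignment to N) can be sketched as operations on the register association list. This is an illustration that follows the slides' wording literally, under stated assumptions: the list is modeled as a dictionary from register name to a (name, L/D, S/NS) triple or None, and `may_alias` stands for the set O. The function names are hypothetical. Whether a pending store must be completed before an association is removed is not addressed by the slides and is not handled here.

```python
def before_alias_reference(assoc, may_alias):
    """Complete pending stores for objects that N may alias; return the code."""
    code = []
    for reg, entry in assoc.items():
        if entry is not None and entry[0] in may_alias and entry[2] == "S":
            name, live, _save = entry
            code.append(f"Store {name}, {reg}")
            assoc[reg] = (name, live, "NS")  # memory is now up to date
    return code


def after_alias_assignment(assoc, may_alias):
    """Remove associations whose register copy may now be stale."""
    for reg, entry in assoc.items():
        if entry is not None and entry[0] in may_alias:
            assoc[reg] = None


assoc = {"r1": ("a", "L", "S"), "r2": ("c", "L", "NS"), "r3": ("e", "L", "NS")}
print(before_alias_reference(assoc, {"a", "c"}))  # ['Store a, r1']
after_alias_assignment(assoc, {"a", "c"})
print(assoc)  # r1 and r2 are free; r3 still holds e
```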

Subprogram Calls
Allocatable registers are normally saved and restored across subprogram calls.

Caller Saves and Restores Registers
Save registers whose contents have status (L,S)
Don’t save registers whose contents have status (L,NS)
Then, free all registers
On return, only those variables that are needed are reloaded into registers

Callee Saves and Restores Registers
The callee knows how many registers it will use
Possibly, some of the caller’s registers will not be used and can therefore remain untouched across the subprogram call
Divide the variables the callee may access into two groups:
  – Def: set of variables that may be defined (updated)
  – Use: set of variables that may be used (referenced only)
Before the call, save all data objects o ∈ Use that appear in the register association list with status S
Similarly, remove all data objects o ∈ Def from the register association list
That is, save values that may be referenced during the call, and remove associations that will be invalidated by assignments during the call
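
The Use/Def rule can be sketched as a single pass over the register association list before the call. The code below is a literal, illustrative reading of the rule, not a complete calling convention. It assumes the list is a dictionary from register name to a (name, L/D, S/NS) triple or None. `prepare_call`, `use` and `define` are hypothetical names. An object that is in Def but not in Use is simply dropped, as the slide states.

```python
def prepare_call(assoc, use, define):
    """Apply the Use/Def rule before a call and return the stores to emit."""
    code = []
    for reg, entry in assoc.items():
        if entry is None:
            continue
        name, live, save = entry
        if name in use and save == "S":
            # The callee may read this value, so memory must be current.
            code.append(f"Store {name}, {reg}")
            assoc[reg] = (name, live, "NS")
        if name in define:
            # The callee may overwrite it, so the register copy is not trusted.
            assoc[reg] = None
    return code


assoc = {"r1": ("a", "L", "S"), "r2": ("c", "L", "NS"),
         "r3": ("d", "L", "S"), "r4": ("e", "L", "NS")}
print(prepare_call(assoc, use={"a", "c"}, define={"c"}))  # ['Store a, r1']
print(assoc["r2"], assoc["r3"])  # None ('d', 'L', 'S')
```

In the example, d is neither used nor defined by the callee, so its deferred store stays deferred across the call.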

Register Tracking Variations
Spill the register whose next reference is most distant
Consider the next two references
Coloring algorithm (Chaitin 82)
  – Register allocation becomes a problem of coloring a graph
  – Each color is a register
Use more precise costing based on instruction size or timing
Use additional address modes (immediate, indirect, indexed+base, etc.)
Consider different register classes and pairs of adjacent registers
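
The first variation, spilling the register whose next reference is most distant, can be sketched as follows. This is a minimal illustration with assumptions: tuples are Python tuples of the form (op, left, right, result) or ("=", source, target), the register map holds only a variable name per register, and `next_reference` and `choose_spill` are hypothetical names.

```python
BLOCK = [
    ("*", "b", "c", "t1"), ("*", "d", "e", "t2"), ("+", "t1", "t2", "t3"),
    ("=", "t3", "a"), ("-", "d", "b", "t4"), ("+", "c", "t4", "t5"),
    ("=", "t5", "d"), ("+", "e", "a", "t6"), ("+", "t6", "c", "t7"),
    ("=", "t7", "f"), ("+", "d", "e", "t8"), ("=", "t8", "a"),
]


def next_reference(quads, start, name):
    """Distance from quads[start] to the next tuple that reads `name`."""
    for offset, quad in enumerate(quads[start:]):
        reads = quad[1:2] if quad[0] == "=" else quad[1:3]
        if name in reads:
            return offset
    return float("inf")  # not read again in this block


def choose_spill(registers, quads, start):
    """Register whose contents are next read farthest in the future."""
    return max(registers, key=lambda reg: next_reference(quads, start, registers[reg]))


registers = {"r1": "a", "r2": "c", "r3": "d"}
# Starting at tuple 8 (index 7): a is read at once, c one tuple later, d three later.
print(choose_spill(registers, BLOCK, 7))  # r3
```

The sketch looks only at reads. A fuller version would also treat a value that is overwritten before its next read as dead.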

Cutting Cost
Trade: For:
load c, r1   load c, r2   sub a, r1
load c, r2   add b, r2
add b, r2   add r2, r1
add r2, r1
cost = 7   cost = 9

Global Register Tracking
Special operands such as loop indices or procedure arguments can be allocated to fixed registers that can span multiple blocks
We can carry register status information forward to basic blocks whose predecessors are unique:
  – i.e., if and case statements
  – It is not always a good idea to delay saves across block boundaries (we might do saves in several successor blocks instead of once in a single predecessor block)

Global Difficulties
We must agree in all blocks to keep a given data object in the same register, or do extra moves
We must decide which of the possibly thousands of variables and temps to keep in a small set of available registers
We need to do flow-of-control analysis and use some mechanism to estimate the frequency of reference to variables and temps

Peephole Optimization
Free an (L,NS) variable v1 at a cost of only 1 (instead of 2) if the next reference to it is made as OP v1, r2 instead of Load v1, r1 followed by OP r1, r2
Similarly, free an (L,S) variable v2 at a cost of only 3 (instead of 4) if we do a save and later reference the value as we did v1 above
This is a peephole optimization technique
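
The rewrite of a Load followed by a register-to-register operation into a single storage-to-register operation can be sketched as a pass over generated code. This is an illustrative sketch with assumptions: instructions are strings of the form "Opcode source, destination", and the loaded register counts as dead if nothing later in the sequence reads it before another Load overwrites it. The names `parse`, `register_dead` and `fold_loads` are hypothetical.

```python
def parse(instr):
    """Split 'Opcode source, destination' into its three parts."""
    opcode, operands = instr.split(None, 1)
    source, destination = (part.strip() for part in operands.split(","))
    return opcode, source, destination


def register_dead(reg, rest):
    """True if `reg` is not read in `rest` before a Load overwrites it."""
    for opcode, source, destination in rest:
        if source == reg:
            return False
        if destination == reg:
            # A Load overwrites the register; Store and OP read it.
            return opcode == "Load"
    return True  # assumption: nothing after this sequence needs the register


def fold_loads(code):
    """Replace 'Load v, r1; OP r1, r2' with 'OP v, r2' when r1 is dead."""
    parsed = [parse(instr) for instr in code]
    out = []
    i = 0
    while i < len(parsed):
        opcode, source, destination = parsed[i]
        if i + 1 < len(parsed):
            opcode2, source2, destination2 = parsed[i + 1]
            foldable = (
                opcode == "Load"
                and opcode2 not in ("Load", "Store")
                and source2 == destination
                and destination2 != destination
                and register_dead(destination, parsed[i + 2:])
            )
            if foldable:
                out.append(f"{opcode2} {source}, {destination2}")
                i += 2
                continue
        out.append(f"{opcode} {source}, {destination}")
        i += 1
    return out


print(fold_loads(["Load a, r1", "Add r1, r2", "Store f, r2"]))
# ['Add a, r2', 'Store f, r2']
```

Under the cost table in these slides, the folded pair costs 2 instead of 2 + 1 = 3, which is the saving of 1 described above.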