Microarchitecture of Superscalars (6): Register renaming
Dezső Sima, Spring 2008 (Ver. 2.0)



Overview
1 The principle of register renaming
2 Design space
2.1 Overview
2.2 Types of rename buffers
3 Operation of register renaming
4 Design parameters of register renaming
5 Implementation of renaming in superscalars
5.1 The chronology of introducing register renaming
5.2 Basic implementation schemes of register renaming
6 Examples

1. Principle of register renaming (1)
Aim: Eliminating false data dependencies to relieve the issue bottleneck.
False data dependencies: WAR and WAW.
Examples:
Write After Read (WAR, anti-dependency):
I1: mul r1, r2, r3
I2: add r2, r4, r5
Write After Write (WAW, output dependency):
I1: mul r1, r2, r3
I2: add r1, r4, r5
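The two examples above can be checked mechanically. The following is a minimal sketch, assuming the destination-first operand order used on the slide (`mul r1, r2, r3` writes r1 and reads r2 and r3). The names `Instr` and `dependencies` are illustrative and do not come from the slides.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def dependencies(first: Instr, second: Instr) -> set:
    """Classify how `second` depends on the earlier instruction `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")    # true dependency, not removed by renaming
    if second.dest in first.srcs:
        found.add("WAR")    # anti-dependency (false)
    if second.dest == first.dest:
        found.add("WAW")    # output dependency (false)
    return found


i1 = Instr("mul", "r1", ("r2", "r3"))
war = dependencies(i1, Instr("add", "r2", ("r4", "r5")))
waw = dependencies(i1, Instr("add", "r1", ("r4", "r5")))
# Writing the result to a fresh buffer (here called "rb0") removes the false dependency.
renamed = dependencies(i1, Instr("add", "rb0", ("r4", "r5")))
print(war, waw, renamed)
```

Only the destination of the second instruction is changed in the last call, which is enough to make both false dependencies disappear.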

1. Principle of register renaming (2)
Basic principle to eliminate false data dependencies:
False data dependencies are eliminated by writing generated results temporarily to buffers, called rename buffers (RB), instead of to the referenced architectural registers (AR).
Then
- referenced source operands need to be fetched from the RB file if they are actually renamed, else from the AR file,
- during dispatching a new rename buffer needs to be allocated to each instruction whose destination register causes a false data dependency (1),
- during retirement buffered results need to be transferred from the RB file to the AR file.
(1) Usually, processors allocate a rename buffer to each dispatched instruction without checking for the existence of false data dependencies, to reduce logic complexity.
Figure 1.1: The principle of register renaming (blocks: AR, RB, EU; flows: source register numbers, Ops., Results, Retirement)

2. Design space of register renaming
2.1 Overview
Register renaming: design aspects
- Scope of register renaming
- Type of rename buffers
- Layout of the rename buffers
- Layout of the register mapping
- Rename rate

2.2 Types of rename buffers
Rename register file: a rename register file (RR) separate from the architectural register file (AR).
[Figure labels: Reg. nrs., Ops., Res., Ret.]


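Future file (AR, FF)
The FF has as many entries as the AR and holds the most recent register values.
States of an FF entry: Valid (initial state), Not valid.
- Valid -> Not valid: the entry is invalidated when an instruction refers to the same register as its destination.
- Not valid -> Valid: the entry is updated when the instruction is finished.
[Figure labels: Reg. nrs., Ops., Res., Ret.]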
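The valid/not-valid behaviour of a future file entry can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any listed processor: the `last_writer` tag (used so that only the most recently dispatched writer revalidates an entry) and the `retire` signature are hypothetical additions that the slide does not spell out.

```python
class FutureFile:
    """Toy model: an AR plus a future file (FF) with one entry per architectural register."""

    def __init__(self, num_regs):
        self.ar = [0] * num_regs             # architectural register file
        self.ff = [0] * num_regs             # future file, same number of entries
        self.valid = [True] * num_regs       # entries start out valid
        self.last_writer = [None] * num_regs  # assumption: tag of the latest writer

    def dispatch(self, tag, dest):
        # An instruction names `dest` as its destination: invalidate the FF entry.
        self.valid[dest] = False
        self.last_writer[dest] = tag

    def finish(self, tag, dest, result):
        # Update the FF entry, but only for the most recent writer of `dest`.
        if self.last_writer[dest] == tag:
            self.ff[dest] = result
            self.valid[dest] = True

    def retire(self, dest, result):
        # Assumption: the result is held elsewhere until retirement and then written to the AR.
        self.ar[dest] = result

    def read(self, reg):
        # Operand fetch: a value if the entry is valid, else None (value still pending).
        return self.ff[reg] if self.valid[reg] else None
```

With two in-flight writers of the same register, the entry stays invalid until the younger one finishes, so a reader never picks up a stale result from the older instruction.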


Merged arch. and rename register file (AR, RR)
[Figure labels: Reg. nrs., Ops., Res.]
States of a physical register: Available; RB, not valid; RB, valid; AR.
- Initialized: a physical register starts out either as AR or as Available.
- Available -> RB, not valid: the entry is allocated to a dispatched instruction.
- RB, not valid -> RB, valid: the instruction is finished.
- RB, valid -> AR: the instruction is completed.
- AR -> Available: the architectural register is reclaimed when this architectural register becomes renamed anew.
- RB, not valid or RB, valid -> Available: the instruction is canceled.
It needs a large number of physical registers.
During completion no physical transfer is needed from the rename buffer to the referenced architectural register; instead, the former rename buffer changes its state and becomes the referenced architectural register.
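The state changes of a merged register file can be sketched as a small state machine. The code below is a simplified illustration and rests on several assumptions not stated on the slide: the first `num_arch` physical registers start in the AR state, a free register is found by linear search, the old architectural register is reclaimed when the newer instruction completes, and cancellation discards all in-flight instructions at once. All class and method names are hypothetical.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "Available"
    RB_NOT_VALID = "RB, not valid"
    RB_VALID = "RB, valid"
    AR = "AR"


class MergedRegisterFile:
    def __init__(self, num_arch, num_phys):
        assert num_phys > num_arch
        # Assumed initialization: the first num_arch physical registers are AR.
        self.state = [State.AR] * num_arch + [State.AVAILABLE] * (num_phys - num_arch)
        self.value = [0] * num_phys
        self.arch_map = list(range(num_arch))  # completed state: arch reg -> phys reg
        self.spec_map = list(range(num_arch))  # latest renaming: arch reg -> phys reg

    def dispatch(self, arch_dest):
        phys = self.state.index(State.AVAILABLE)  # ValueError if no register is free
        self.state[phys] = State.RB_NOT_VALID
        self.spec_map[arch_dest] = phys
        return phys

    def finish(self, phys, result):
        self.value[phys] = result
        self.state[phys] = State.RB_VALID

    def complete(self, arch_dest, phys):
        old = self.arch_map[arch_dest]
        self.state[phys] = State.AR          # state change only, no data transfer
        self.arch_map[arch_dest] = phys
        self.state[old] = State.AVAILABLE    # reclaim the previous architectural register

    def cancel_all(self):
        # Assumption: every in-flight instruction is canceled together.
        for phys, st in enumerate(self.state):
            if st in (State.RB_NOT_VALID, State.RB_VALID):
                self.state[phys] = State.AVAILABLE
        self.spec_map = list(self.arch_map)
```

Note that `complete` never copies a value: the architectural value of a register is simply whatever physical register `arch_map` points to.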


Holding renamed values in the ROB (ROB, AR)
ROB entries are extended to hold results as well. During dispatching a new ROB entry with its result field is allocated to each dispatched instruction. (The result field serves as the allocated rename buffer.)
States of a ROB entry: Available (initial state); Allocated, not valid; Allocated, valid.
- Allocate, if instruction is dispatched: Available -> Allocated, not valid.
- Update, if instruction is finished: Allocated, not valid -> Allocated, valid.
- Reclaim, if instruction is retired: Allocated, valid -> Available.
- Reclaim, if instruction is canceled: Allocated -> Available.
[Figure labels: Reg. nrs., Ops., Res., Ret.]

2.2 Types of rename buffers: summary with examples
- Rename register file (AR and RR): PowerPC 603 (1993), PowerPC 604 (1995), PowerPC 620 (1996), Power3 (1998), PA 8000 (1996), PA 8200 (1997), PA 8500 (1999)
- Future file (AR and FF): UltraSPARC III (1999), K7 (FX) (1999), K8 (FX) (2003)
- Merged arch. and rename register file (AR, RR): Power1 (1990), Power2 (1993), R10000 (1996), R12000 (1999), Alpha (1998), Pentium 4 (FP) (2000), K7 (FP) (1999), K8 (FP) (2003)
- Holding renamed values in the ROB (ROB and AR): K5 (1995), K6 (1997), Pentium Pro (1995), Pentium II (1997), Pentium III (1999), Pentium 4 (FX) (2000), Pentium M (2003), Core (2006)
[Figure labels in each variant: Reg. nrs., Ops., Res., Ret.]

3. Operation of register renaming (1)
The actual rename process depends on both the rename technique implemented and the underlying microarchitecture.
Assumptions:
- Rename technique: using rename registers and mapping tables

Rename registers: provide buffer space to temporarily hold instruction results.
Rename register file (RR): each entry carries a Valid bit (V).
- During dispatching, the Valid bit of the allocated rename register is cleared (V ← 0).
- When the instruction finishes, its result is transferred to the allocated rename buffer entry and the Valid bit is set (V ← 1) to indicate that the corresponding value is available.


Mapping table: It includes an entry for each architectural register. Each entry has an "Entry valid" bit that indicates whether or not the corresponding architectural register is renamed; if it is renamed, the entry holds the index of the associated rename buffer ("RB index").
- A new entry is created while an instruction is dispatched, by setting the "Entry valid" bit and writing the index of the allocated rename buffer ("RB index") into the entry that corresponds to the destination register of the dispatched instruction.
- A valid mapping is updated by writing a new "RB index" into it when the architectural register belonging to that entry is renamed again.
- An entry is invalidated when the instruction that actually belongs to that entry is retired.
In this way the mapping table continuously holds the latest allocations.
[Figure: mapping table with entries 0 to n-1, each with "Entry valid" and "RB index" fields; the example look-up of a register yields RB index = 12]
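The interplay of the mapping table and the rename register file can be sketched as below. This is a minimal model under assumptions of its own: rename registers are handed out from a simple free list, an instruction is identified at retirement by its destination register and rename buffer index, and the class name `Renamer` and its method names are hypothetical.

```python
class Renamer:
    def __init__(self, num_arch, num_rename):
        self.ar = [0] * num_arch               # architectural register file
        self.rr_value = [0] * num_rename       # rename register file: values
        self.rr_valid = [False] * num_rename   # rename register file: Valid bits (V)
        self.free = list(range(num_rename))    # assumption: simple free list
        # Mapping table: one entry per architectural register.
        self.entry_valid = [False] * num_arch
        self.rb_index = [0] * num_arch

    def lookup(self, arch_reg):
        """Return ("RR", index) if the register is renamed, else ("AR", arch_reg)."""
        if self.entry_valid[arch_reg]:
            return ("RR", self.rb_index[arch_reg])
        return ("AR", arch_reg)

    def dispatch(self, dest, srcs):
        renamed_srcs = [self.lookup(s) for s in srcs]  # sources see the older mapping
        rb = self.free.pop(0)
        self.rr_valid[rb] = False       # V <- 0
        self.entry_valid[dest] = True   # create or update the mapping
        self.rb_index[dest] = rb
        return rb, renamed_srcs

    def finish(self, rb, result):
        self.rr_value[rb] = result
        self.rr_valid[rb] = True        # V <- 1

    def retire(self, dest, rb):
        self.ar[dest] = self.rr_value[rb]
        # Invalidate only if the entry still belongs to the retiring instruction.
        if self.entry_valid[dest] and self.rb_index[dest] == rb:
            self.entry_valid[dest] = False
        self.free.append(rb)
```

Dispatching `mul r1, r2, r3` and then `add r1, r4, r5` gives the two writes of r1 different rename registers, so they can finish in either order; the mapping for r1 is invalidated only when the second instruction retires.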

Assumptions (continued):
- Underlying microarchitecture: in-order dispatching, dynamic instruction issue, split FX and FP register files
- Operand fetch policy: both alternatives (dispatch bound and issue bound) are discussed

3. Operation of register renaming (2)
The part of the microarchitecture considered for both dispatch bound and issue bound operand fetching executes only FX instructions and consists of an architectural register file (AR) and a single execution unit (EU).

3. Operation of register renaming (3)
Figure 3.1: An FX-core assuming buffered issue and dispatch bound operand fetching
[Figure blocks: mapping table, architectural register file (AR), rename register file (RR) with V bits, reservation station (RS), EU, bypassing. Decoded instructions carry OC, Rd, Rs1, Rs2; the mapping table yields Rd', Rs1', Rs2'. RS entry: OC, Rd', Op1/Rs1', V1, Op2/Rs2', V2. The EU returns Result, Rd'.]
Steps:
- Renaming destination and source registers
- Fetching operands if valid, else tags
- Issuing instructions when operands are ready
- After an instruction is executed, updating the RS and the RR
- When an instruction is retired, updating the AR
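Dispatch bound operand fetching can be sketched as follows: each reservation station operand field holds either a value (V = 1) or the tag of the rename register that will produce it (V = 0). The data layout is an assumption made for illustration: the mapping is a plain dict from architectural register number to rename register index, and the names `Operand`, `RSEntry`, `fetch_operand` and `broadcast` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    valid: bool  # V bit: True means `data` is a value, False means `data` is a tag
    data: int


@dataclass
class RSEntry:
    opcode: str      # OC
    dest_tag: int    # Rd'
    op1: Operand     # Op1/Rs1' with V1
    op2: Operand     # Op2/Rs2' with V2

    def ready(self):
        return self.op1.valid and self.op2.valid


def fetch_operand(src, mapping, rr_value, rr_valid, ar):
    """Fetch at dispatch time: the value if it is available, else the tag."""
    if src not in mapping:              # not renamed: read the AR
        return Operand(True, ar[src])
    tag = mapping[src]
    if rr_valid[tag]:                   # renamed and already produced: read the RR
        return Operand(True, rr_value[tag])
    return Operand(False, tag)          # renamed but still pending: keep the tag


def broadcast(rs, tag, result):
    """After execution: fill in every RS operand that waits for `tag`."""
    for entry in rs:
        for op in (entry.op1, entry.op2):
            if not op.valid and op.data == tag:
                op.valid, op.data = True, result
```

An entry becomes eligible for issue as soon as `ready()` is true, so the issue logic only inspects the reservation station and never has to read a register file.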

3. Operation of register renaming (4)
Figure 3.2: An FX-core assuming buffered issue and issue bound operand fetching
[Figure blocks: mapping table, rename register file (RR) with V bits, architectural register file (AR), reservation station (RS), EU, bypassing. Decoded instructions carry OC, Rd, Rs1, Rs2. RS entry: OC, Rd', Rs1', Rs2'. Availability of (Rs1') and (Rs2') is checked before issue. The EU returns Result, Rd'.]
Steps:
- Renaming destination and source registers
- Dispatching instructions into the RS
- Issuing instructions when operands are valid, fetching operands
- Executing instructions, updating the RR when an instruction is finished
- Updating the AR when an instruction retires
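For contrast, issue bound operand fetching can be sketched like this: the reservation station stores only renamed register identifiers, and values are read at issue time once the sources are available. Several simplifications are assumed: a renamed source is a pair such as `("RR", 3)` or `("AR", 5)`, the oldest ready entry is issued first, and retirement between dispatch and issue is ignored. The function names are hypothetical.

```python
def is_available(src, rr_valid):
    """A source is available if it lives in the AR or its rename register is valid."""
    kind, index = src
    return kind == "AR" or rr_valid[index]


def read_operand(src, rr_value, ar):
    kind, index = src
    return ar[index] if kind == "AR" else rr_value[index]


def try_issue(rs, rr_value, rr_valid, ar):
    """Issue the oldest entry whose sources are available; operands are fetched now.

    `rs` is a list of (opcode, dest_tag, src1, src2) tuples in dispatch order.
    Returns (opcode, dest_tag, op1, op2), or None if nothing can be issued.
    """
    for pos, (opcode, dest_tag, src1, src2) in enumerate(rs):
        if is_available(src1, rr_valid) and is_available(src2, rr_valid):
            del rs[pos]
            return (opcode, dest_tag,
                    read_operand(src1, rr_value, ar),
                    read_operand(src2, rr_value, ar))
    return None
```

Compared with the dispatch bound sketch, the reservation station entries are narrower because they hold register identifiers instead of values, at the price of a register file read on the issue path.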

4. Design parameters of register renaming (1)
Table columns: Processor type / year of volume shipment; Type of rename buffer; Number of rename buffers (FX, FP); Dispatch rate; Width of the issue window (wdw); Total number of rename buffers (nr); Reorder width (nROB)
RISC processors:
- PowerPC 603 (1993): ren. reg. file; na
- PowerPC 604 (1995): ren. reg. file
- PowerPC 620 (1996): ren. reg. file
- POWER3 (1998): ren. reg. file
- POWER4 (2001): merged; *5
- POWER5 (2004): merged; *5
- R10000 (1996): merged
- R12000 (1998): merged
- Alpha (1998): merged
- PA 8000 (1996): ren. reg. file
- PA 8200 (1997): ren. reg. file
- PA 8500 (1999): ren. reg. file
- PM1 (1996): merged
Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006

4. Design parameters of register renaming (2)
Table columns: Processor type / year of volume shipment; Type of rename buffer; Number of rename buffers (FX, FP); Dispatch rate; Width of the issue window (wdw); Total number of rename buffers (nr); Reorder width (nROB)
CISC (x86) processors:
- Pentium Pro (1995): in the ROB
- Pentium II (1997): in the ROB
- Pentium III (1999): in the ROB
- Pentium 4 (2000) (Willamette): merged; 12832n.a
- Pentium 4 (2002) Northwood: merged; 1283n.a.256?2*126?
- Pentium 4 (2004) Prescott: merged; 2563n.a.512?4*128?
- Pentium M (2003): in the ROB
- Core (2006): in the ROB
- K5 (1995): in the ROB; 164211(?)16
- K6 (1996): in the ROB
- K7 (1999): in the ROB / merged; 72 n.a *3
- K8 (2003): in the ROB / merged; *3
Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006

5. Implementation of renaming in superscalars
5.1 The chronology of introducing register renaming
Figure 5.1: Chronology of introducing register renaming
Source: Sima, D., "Register Renaming Techniques", Computer Engineering Handbook, CRC Press, 2006

5.2 The basic implementation schemes of register renaming
Classification: each type of rename buffers (merged arch. and rename register file, rename reg. file, future file, holding renamed values in the ROB) is combined with an operand fetch policy (dispatch bound or issue bound).
Proposals: Keller (75), Smith, Pleszkun (85), Sohi, Vajapeyam (87), Johnson (87)
Examples (in the order listed in the figure): PM1 (95) (SPARC 64), ES/9000 (92), POWER1 (90), POWER2 (93), Nx586 (94), R10000 (96), P2SC (96), R12000 (99), Pentium 4 (00), POWER4 (01), POWER5 (04), K7 (FP) (99), K8 (FP) (03), PowerPC 603 (93), PowerPC 604 (95), PowerPC 620 (96), POWER3 (98), PA 8000 (96), PA 8200 (97), PA 8500 (99), Pentium Pro (95), Pentium II (97), Pentium III (99), Pentium M (03), Core (06), K7 (FX) (99), K8 (FX) (03), Am29000 (95), K5 (95), Lightning* (91), K6* (97), UltraSPARC III (99)

6. Examples (1): Rename register file
Figure 6.1: The microarchitecture of the POWER3
Source: Song, P., "IBM's Power3 to Replace P2SC", Microprocessor Report, Nov. 17, 1997

6. Examples (2): Future file
Figure 6.2: The microarchitecture of the UltraSPARC-III
WARF: Working and Architectural Register File (future file)
Source: Horel, T., "UltraSPARC-III", IEEE MICRO, May-June 99, pp

6. Examples (3): Merged architectural and rename reg.
Figure 6.3: The microarchitecture of the Alpha
Source: Kessler, R.E. et al., "The Alpha Microprocessor Architecture", h18002.www1.hp.com/alphaserver

6. Examples (4): Holding renamed values in the ROB
Figure 6.4: The microarchitecture of the Core processor
Source: Kanter, D., "Intel's next Generation Microarchitecture Unveiled", Real World Tech., 2006 March 9.