Out-of-Order Machine State

Presentation transcript:

Out-of-Order Machine State
Instruction Sequence: R3 ← A, R7 ← B, R8 ← C, R7 ← D, R4 ← E, R3 ← F, R8 ← G, R3 ← H
Inorder State, Look-ahead State, Architectural State (remaining figure panels, in extraction order): R3 ← A, R8 ← C, R7 ← D, R4 ← E, R3 ← F, R8 ← G, R3 ← H, R7 ← D, R4 ← E, R8 ← G, R3 ← H
gray = dispatched but not yet executed instructions

Elements of Modern Micro-dataflow
Figure labels: inorder, out-of-order, inorder

Metaflow Datapath
Figure labels: ICache; Branch Pred.; issue; DRIS (Renaming + Reservation Stations + Reorder Buff.); Scheduler; Retire; Register File; Speculative State; In-order State

Logical vs. Physical Organization: Metaflow Example
Logical entry fields:
- Source 1: Lock1, RN1, ID1
- Source 2: Lock2, RN2, ID2
- Destination: latest, RD, Data
- Status: Dispatched, Fxn Unit, Executed
- PC
Physical structures: Rename Registers (RAM); Reorder Buffer (RAM/FIFO); Reservation Stations (CAM)
Other figure labels: forward (RAM)(CAM); issue (CAM); inverse map table

Elements of Register Renaming
- A pool of extra registers
  - managed as temporary, single-assignment registers (eliminates WAW and WAR)
  - logically separate from architectural registers named in ISA
  - many physical organizations are possible
- An allocation and mapping mechanism
  - given a logical/architectural register name, where is its current definition (value, location, ready or not)?
  - given a logical/architectural destination register name,
    1. how to find an unused rename register
    2. how to establish a new mapping for later references
  - when to reclaim a mapping? when to reclaim a register?
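The allocation and mapping questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Renamer`, its fields, and the choice of a FIFO free list with an identity initial mapping are hypothetical and not taken from the slides.

```python
from collections import deque


class Renamer:
    """Hypothetical rename stage: a map table plus a FIFO pool of free registers."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially lives in physical register i.
        self.map_table = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction; returns (new_dest, renamed_srcs, old_dest)."""
        # Sources look up the current definition before the destination is remapped.
        renamed_srcs = [self.map_table[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no unused rename register: dispatch would stall")
        new_dest = self.free_list.popleft()   # 1. find an unused rename register
        old_dest = self.map_table[dest]
        self.map_table[dest] = new_dest       # 2. establish the new mapping
        return new_dest, renamed_srcs, old_dest


r = Renamer(num_logical=4, num_physical=8)
a = r.rename(dest=1, srcs=[2, 3])  # A: R1 = R2 + R3
b = r.rename(dest=1, srcs=[1, 2])  # B: R1 = R1 + R2 (RAW on A, WAW with A)
```

In this sketch B reads A's result through the new physical name, while the two writes to R1 land in different physical registers, so the WAW hazard disappears.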

Register Map Table with Separate Architectural and Rename Registers
Figure labels: logical register name; ARF / Map Table (data, busy, tag); RRF (data, rdy); next to free; next to allocate; Operand Value/Tag
Issues:
- How large is RRF?
- How to manage RRF?
- How to recover the Map Table on br. mispredict. & exceptions?
Example: Pentium-Pro

How to recover the Rename Map
- Metaflow DRIS: remembers everything about all outstanding instructions
  - associative lookup always gives the right answer from the perspective of the youngest instruction
- Map Table: rename history of a logical register is overwritten after each WAW
Solutions:
1. Discard all speculative state and revert to default ARF mapping
2. Record the rename history
  - add another field to RRF to remember the last rename register mapped to the same logical register
  - map table can be rebuilt by tracing backwards sequentially in the active RRF region (recall RRF is "program-ordered")
3. Take snapshots of the entire rename table at strategic times, i.e. when predicting branches
What about exceptions?
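The snapshot option (solution 3) might look like the following sketch. The class name `CheckpointedMapTable`, the default limit of four snapshots, and the index-based `mispredict` interface are all assumptions made for illustration.

```python
class CheckpointedMapTable:
    """Hypothetical map table that snapshots itself at each predicted branch."""

    def __init__(self, num_logical, max_checkpoints=4):
        self.table = list(range(num_logical))
        self.stack = []                      # snapshots, oldest branch first
        self.max_checkpoints = max_checkpoints

    def predict_branch(self):
        """Take a snapshot; returns False if no checkpoint slot is free."""
        if len(self.stack) == self.max_checkpoints:
            return False                     # a real front end would stall here
        self.stack.append(list(self.table))  # copy, so later renames do not alias
        return True

    def mispredict(self, index):
        """Restore the snapshot of branch `index`; drop it and all younger ones."""
        self.table = self.stack[index]
        del self.stack[index:]


m = CheckpointedMapTable(num_logical=4)
m.table[1] = 4        # rename before the first branch
m.predict_branch()    # snapshot 0
m.table[1] = 5        # speculative renames after the branch
m.table[2] = 6
m.predict_branch()    # snapshot 1
m.table[3] = 7
m.mispredict(0)       # first branch was wrong: undo everything after it
```

Recovery costs a single table copy, but only the checkpointed points can be restored, which is why exceptions at arbitrary instructions need a different mechanism.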

Register Map Table with unified Physical Registers
Figure labels: logical register name; Map Table (tag); PRF (data, rdy; holds both inorder & speculative state); Operand Value/Tag
Issues:
- How large is PRF?
- How to manage PRF?
- How to recover the Map Table on br. mispredict. & exceptions?
Examples: R10000, P4

Allocation and De-allocation of PRF
- When speculative register state is committed, no copying is required (in fact, not much happens)
- PRF registers are not de/allocated in order. Why?
- Freelist: stores a pool of unused registers
- When to recycle a physical register? i.e. when do we know a register is no longer referred to (by logical name or by physical name)?
- Is every PR referenced by some data structure? What if it is not?
Figure labels: logical register name; Map Table (tag); PRF (data, rdy); Freelist (FIFO) 241; recycle freed registers
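One common answer to "when to recycle" is to free the previous mapping of a logical register when the next writer of that register commits. The sketch below assumes that policy; the entry fields `logical`, `new_phys`, and `old_phys` and the function name `commit_oldest` are hypothetical.

```python
from collections import deque


def commit_oldest(rob, free_list):
    """Commit the oldest ROB entry and recycle the register it displaced."""
    entry = rob.popleft()
    # No copy is needed: new_phys simply becomes the in-order value.
    # old_phys held the previous value of the same logical register. With
    # in-order commit, every instruction that could read it is older than
    # this one and has already committed, so it can go back to the pool.
    free_list.append(entry["old_phys"])
    return entry


# Two back-to-back writers of logical register 1, renamed to P4 and then P5.
rob = deque([
    {"logical": 1, "new_phys": 4, "old_phys": 1},
    {"logical": 1, "new_phys": 5, "old_phys": 4},
])
free_list = deque([6, 7])
commit_oldest(rob, free_list)  # frees P1
commit_oldest(rob, free_list)  # frees P4
```

Registers return to the free list in commit order of their overwriters, not in the order they were allocated, which is one way to see why the PRF is not de/allocated in order.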

How to recover the Rename Map
- Map Table: rename history of a logical register is overwritten after each WAW
Solutions:
1. Discard all speculative state and revert to default ARF mapping
2. Record the rename history
  - add another field to RRF to remember the last rename register mapped to the same logical register
  - map table can be rebuilt by tracing backwards, one at a time
  - Rename history must be stored with instruction entries in ROB
  - Also useful for reclaiming physical registers for reuse!
3. Take snapshots of the entire rename table at strategic times, i.e. when predicting branches
What about exceptions?
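Solution 2 can be sketched as a backward walk over ROB entries that each remember the mapping they displaced. The helper names `rename` and `squash_after` and the entry layout are assumptions for illustration only.

```python
def rename(logical, map_table, free_list, rob):
    """Allocate a new physical register and log the displaced mapping in the ROB."""
    new_phys = free_list.pop(0)
    rob.append({"logical": logical,
                "new_phys": new_phys,
                "old_phys": map_table[logical]})
    map_table[logical] = new_phys


def squash_after(rob, map_table, free_list, keep):
    """Undo the renames of every ROB entry beyond the oldest `keep` entries."""
    while len(rob) > keep:
        entry = rob.pop()                              # youngest first
        map_table[entry["logical"]] = entry["old_phys"]
        free_list.append(entry["new_phys"])            # reclaim for reuse


map_table = [0, 1, 2, 3]
free_list = [4, 5, 6, 7]
rob = []
rename(1, map_table, free_list, rob)   # R1 -> P4
rename(2, map_table, free_list, rob)   # R2 -> P5
rename(1, map_table, free_list, rob)   # R1 -> P6 (overwrites the P4 mapping)
squash_after(rob, map_table, free_list, keep=1)
```

Because the walk can stop at any entry, this approach can rewind to an arbitrary instruction, at the cost of one step per squashed instruction.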

Reservation Stations / Instruction Queues
- A place where instructions can wait for dataflow resolutions
  - unresolved operand values are represented by a unique tag (usually the name of the "renamed" operand register or the instruction to produce the operand value, sometimes synonymous)
  - a completing instruction can forward its result to all reservation stations holding the right tags
- Management policies
  - single vs. multi buffers
  - random access or FIFO
  - age-prioritized
  - value vs. tag only
  - when to deallocate
Reservation stations for Ld/St units are more complicated
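Tag-based waiting and result forwarding can be illustrated with a small sketch. It assumes the "value" policy (stations capture operand values, not just readiness); the names `ReservationStation`, `capture`, and `broadcast` and the tag strings are hypothetical.

```python
class ReservationStation:
    """Hypothetical entry: each operand is ("val", value) or ("tag", producer_tag)."""

    def __init__(self, op, operands):
        self.op = op
        self.operands = list(operands)

    def ready(self):
        """An instruction may issue once no operand is still a tag."""
        return all(kind == "val" for kind, _ in self.operands)

    def capture(self, tag, value):
        """Replace every operand waiting on `tag` with the forwarded value."""
        self.operands = [
            ("val", value) if kind == "tag" and x == tag else (kind, x)
            for kind, x in self.operands
        ]


def broadcast(stations, tag, value):
    """A completing instruction forwards its result to all waiting stations."""
    for rs in stations:
        rs.capture(tag, value)


add = ReservationStation("add", [("val", 3), ("tag", "P4")])
mul = ReservationStation("mul", [("tag", "P4"), ("tag", "P5")])
broadcast([add, mul], "P4", 10)
```

After the broadcast, `add` has both values and can issue, while `mul` still waits for the producer of `P5`.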

Queue Organizations
Figure labels: distributed queues / reservation stations (Int, FP, Mem); DRIS; statically aligned (inorder)

Dataflow Dependence & Scheduling
Figure labels: OpA Rdy, OpB Rdy, OpC Rdy; comparators (=) against write dest1, write dest2, write dest3; or; and; request; grant; priority scheduler; To PRF read ports

Reorder Buffer  An program-ordered record of all outstanding (out-of- order) instructions ­delay update of in-order program state by instructions completing out-of-order ­allow inorder commit of in-order program state in-order reg file Reorder Buffer in-order updateout-of-order result operand lookup Don’t forget about PC and memory writes!!

Tracking the Three States: Other Options
- Keeping history to rewind and restart execution
  - rewinds to special points (i.e. branch instructions) in the program by checkpointing
  - rewinds to arbitrary points (incremental history and recovery)
Figure labels (Checkpoint Stack): reg file; backup 0, backup 1, backup 2; out-of-order writeback; operand lookup
Figure labels (write history): reg file; write history (FIFO); out-of-order writeback; displaced reg name & values; sequential restoration; discard history of completed inst's in FIFO order

MIPS R10000 circa 1996
- 4-way superscalar (fetch, decode, dispatch and complete)
- 5 execution pipelines (2 Int, FP Add, FP Mult, Ld/St)
- Micro-dataflow instruction scheduling (16+16 instruction window)
- Register renaming + memory renaming
  - 64 physical integer registers to hold 33 logical registers + renamed registers
- Speculative execution past 4 unresolved branches
- Precise Interrupts

Design Choices  Register Renaming ­map table lookup + dependency check on simultaneous dispatches ­unified physical register file ­4-deep branch stack to backup the map table on branch predictions ­sequential (4-at-a-time) back-tracking to recover from exceptions  Instruction Queues ­separate 16-entry floating point and integer instruction queues ­prioritized, dataflow-ordered scheduling  Reorder Buffer ­one per outstanding instruction, FIFO ordered ­stores PC, logical destination number, old physical destination number Why not current physical destination number?

MIPS R
Figure labels: pre-decoded I-cache; xinst decode; map table; map table (16R4W); 8x4 entries Active List (ROB); 16-entry int. Q (R.S.); 16-entry FP. Q (R.S.); ALU1, ALU2; LD/ST; ALU1, ALU2; 64-entry Int GPR 7R3W; 64-entry FPR 5R3W

Flow Path Model of Superscalars
Figure labels: I-cache; Branch Predictor; FETCH; Instruction Buffer; DECODE; Integer, Floating-point, Media, Memory; EXECUTE; Reorder Buffer (ROB); Store Queue; COMMIT; D-cache
Flows: Instruction Flow; Register Data Flow; Memory Data Flow