CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Complex Pipelining II Steve Ko Computer Sciences and Engineering University at Buffalo.

Slides:



Advertisements
Similar presentations
Spring 2003CSE P5481 Out-of-Order Execution Several implementations out-of-order completion CDC 6600 with scoreboarding IBM 360/91 with Tomasulos algorithm.
Advertisements

Instruction-level Parallelism Compiler Perspectives on Code Movement dependencies are a property of code, whether or not it is a HW hazard depends on.
© Krste Asanovic, 2014CS252, Spring 2014, Lecture 5 CS252 Graduate Computer Architecture Spring 2014 Lecture 5: Out-of-Order Processing Krste Asanovic.
Computer Architecture: A Constructive Approach Six Stage Pipeline/Bypassing Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute.
A scheme to overcome data hazards
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture VLIW Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache III Steve Ko Computer Sciences and Engineering University at Buffalo.
2/28/2013 CS152, Spring 2013 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste.
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Snoopy Caches I Steve Ko Computer Sciences and Engineering University at Buffalo.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 8 Instruction-Level Parallelism – Part 1 Benjamin Lee Electrical and Computer Engineering Duke.
Dyn. Sched. CSE 471 Autumn 0219 Tomasulo’s algorithm “Weaknesses” in scoreboard: –Centralized control –No forwarding (more RAW than needed) Tomasulo’s.
COMP25212 Advanced Pipelining Out of Order Processors.
CMSC 611: Advanced Computer Architecture Scoreboard Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
February 28, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming.
Microprocessor Microarchitecture Dependency and OOO Execution Lynn Choi Dept. Of Computer and Electronics Engineering.
Instruction-Level Parallelism (ILP)
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP II Steve Ko Computer Sciences and Engineering University at Buffalo.
February 28, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction.
1 IF IDEX MEM L.D F4,0(R2) MUL.D F0, F4, F6 ADD.D F2, F0, F8 L.D F2, 0(R2) WB IF IDM1 MEM WBM2M3M4M5M6M7 stall.
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste Asanovic Electrical Engineering.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Snoopy Caches II Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP III Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Directory-Based Caches II Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Complex Pipelining I Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache II Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Pipelining III Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Virtual Memory I Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Cache IV Steve Ko Computer Sciences and Engineering University at Buffalo.
CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture ILP I Steve Ko Computer Sciences and Engineering University at Buffalo.
March 2, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer.
CS 152 Computer Architecture and Engineering Lecture 15 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.
March 4, 2010CS152, Spring 2010 CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste.
CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer Sciences University of California.
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue and Register Renaming Krste Asanovic Electrical Engineering and Computer Sciences.
EENG449b/Savvides Lec /20/04 February 12, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
CS 152 Computer Architecture and Engineering Lecture 12 - Complex Pipelines Krste Asanovic Electrical Engineering and Computer Sciences University of California.
March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.
1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.
ENGS 116 Lecture 71 Scoreboarding Vincent H. Berk October 8, 2008 Reading for today: A.5 – A.6, article: Smith&Pleszkun FRIDAY: NO CLASS Reading for Monday:
CS 252 Graduate Computer Architecture Lecture 4: Instruction-Level Parallelism Krste Asanovic Electrical Engineering and Computer Sciences University of.
Nov. 9, Lecture 6: Dynamic Scheduling with Scoreboarding and Tomasulo Algorithm (Section 2.4)
1 Sixth Lecture: Chapter 3: CISC Processors (Tomasulo Scheduling and IBM System 360/91) Please recall:  Multicycle instructions lead to the requirement.
ECE 552 / CPS 550 Advanced Computer Architecture I Lecture 9 Instruction-Level Parallelism – Part 2 Benjamin Lee Electrical and Computer Engineering Duke.
1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5.
CET 520/ Gannod1 Section A.8 Dynamic Scheduling using a Scoreboard.
CS 152 Computer Architecture and Engineering Lecture 15 - Out-of-Order Memory, Complex Superscalars Review Krste Asanovic Electrical Engineering and Computer.
04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;
Dataflow Order Execution  Use data copying and/or hardware register renaming to eliminate WAR and WAW ­register name refers to a temporary value produced.
COMP25212 Advanced Pipelining Out of Order Processors.
CSE431 L13 SS Execute & Commit.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 13: SS Backend (Execute, Writeback & Commit) Mary Jane.
IBM System 360. Common architecture for a set of machines
CS 152 Computer Architecture and Engineering Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming John Wawrzynek Electrical Engineering.
Dynamic Scheduling Why go out of style?
CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction John Wawrzynek Electrical Engineering.
Lecture 10 - Complex Pipelines, Out-of-Order Issue, Register Renaming
Step by step for Tomasulo Scheme
CS203 – Advanced Computer Architecture
Microprocessor Microarchitecture Dynamic Pipeline
High-level view Out-of-order pipeline
CMSC 611: Advanced Computer Architecture
A Dynamic Algorithm: Tomasulo’s
Out of Order Processors
Electrical and Computer Engineering
CS 152 Computer Architecture and Engineering Lecture 13 - Out-of-Order Issue, Register Renaming, & Branch Prediction Krste Asanovic Electrical Engineering.
Krste Asanovic Electrical Engineering and Computer Sciences
Krste Asanovic Electrical Engineering and Computer Sciences
Krste Asanovic Electrical Engineering and Computer Sciences
Static vs. dynamic scheduling
Tomasulo Organization
Presentation transcript:

CSE 490/590, Spring 2011 CSE 490/590 Computer Architecture Complex Pipelining II Steve Ko Computer Sciences and Engineering University at Buffalo

CSE 490/590, Spring Last time… Complex pipelining –Multiple functional units with variable access time Types of data hazards –RAW, WAR, WAW Dependency graph –How instructions are dependent on each other –Basis for out-of-order

CSE 490/590, Spring Complex Pipeline IF ID WB ALUMem Fadd Fmul Fdiv Issue GPR’s FPR’s Can we solve write hazards without equalizing all pipeline depths and without bypassing?

CSE 490/590, Spring When is it Safe to Issue an Instruction? Suppose a data structure keeps track of all the instructions in all the functional units The following checks need to be made before the Issue stage can dispatch an instruction Is the required function unit available? Is the input data available?  RAW? Is it safe to write the destination? WAR? WAW? Is there a structural conflict at the WB stage?

CSE 490/590, Spring Types of Data Hazards Consider executing a sequence of r k r i op r j type of instructions Data-dependence r 3  r 1 op r 2 Read-after-Write r 5  r 3 op r 4 (RAW) hazard Anti-dependence r 3  r 1 op r 2 Write-after-Read r 1  r 4 op r 5 (WAR) hazard Output-dependence r 3  r 1 op r 2 Write-after-Write r 3  r 6 op r 7 (WAW) hazard

CSE 490/590, Spring A Data Structure for Correct Issues Keeps track of the status of Functional Units The instruction i at the Issue stage consults this table FU available? check the busy column RAW?search the dest column for i’s sources WAR?search the source columns for i’s destination WAW?search the dest column for i’s destination An entry is added to the table if no hazard is detected; An entry is removed from the table after Write-Back NameBusyOpDestSrc1Src2 Int Mem Add1 Add2 Add3 Mult1 Mult2 Div

CSE 490/590, Spring Simplifying the Data Structure Assuming In-order Issue Suppose the instruction is not dispatched by the Issue stage if a RAW hazard exists or the required FU is busy, and that operands are latched by functional unit on issue: Can the dispatched instruction cause a WAR hazard ? WAW hazard ? NO: Operands read at issue YES: Out-of-order completion

CSE 490/590, Spring Simplifying the Data Structure... No WAR hazard  no need to keep src1 and src2 The Issue stage does not dispatch an instruction in case of a WAW hazard a register name can occur at most once in the dest column WP[reg#] : a bit-vector to record the registers for which writes are pending These bits are set to true by the Issue stage and set to false by the WB stage Each pipeline stage in the FU's must carry the dest field and a flag to indicate if it is valid “the (we, ws) pair”

CSE 490/590, Spring Scoreboard for In-order Issues Busy[FU#] : a bit-vector to indicate FU’s availability. (FU = Int, Add, Mult, Div) These bits are hardwired to FU's. WP[reg#] : a bit-vector to record the registers for which writes are pending. These bits are set to true by the Issue stage and set to false by the WB stage Issue checks the instruction (opcode dest src1 src2) against the scoreboard (Busy & WP) to dispatch FU available? RAW? WAR? WAW? Busy[FU#] WP[src1] or WP[src2] cannot arise WP[dest]

CSE 490/590, Spring Scoreboard Dynamics I 1 DIVDf6, f6,f4 I 2 LDf2,45(r3) I 3 MULTDf0,f2,f4 I 4 DIVDf8,f6,f2 I 5 SUBDf10,f0,f6 I 6 ADDDf6,f8,f2 Functional Unit Status Registers Reserved Int(1) Add(1) Mult(3) Div(4) WB for Writes t0 I 1 f6 f6 t1 I 2 f2 f6f6, f2 t2 f6 f2 f6, f2 I 2 t3 I 3 f0 f6 f6, f0 t4 f0 f6 f6, f0 I 1 t5 I 4 f0 f8 f0, f8 t6 f8 f0 f0, f8 I 3 t7 I 5 f10f8 f8, f10 t8 f8 f10 f8, f10 I 5 t9 f8 f8 I 4 t10 I 6 f6 f6 t11 f6 f6 I 6

CSE 490/590, Spring CSE 490/590 Administrivia Midterm on Friday, 3/4 Review on Wednesday, 3/2 Project 1 deadline: Friday, 3/11 Office hours this week: Wed after class until 2pm

CSE 490/590, Spring In-Order Issue Limitations: an example latency 1LDF2, 34(R2)1 2LDF4,45(R3)long 3MULTDF6,F4,F23 4SUBDF8,F2,F21 5DIVDF4,F2,F84 6ADDDF10,F6,F41 In-order: 1 (2,1) In-order restriction prevents instruction 4 from being dispatched

CSE 490/590, Spring Out-of-Order Issue Issue stage buffer holds multiple instructions waiting to issue. Decode adds next instruction to buffer if there is space and the instruction does not cause a WAR or WAW hazard. –Note: WAR possible again because issue is out-of-order (WAR not possible with in- order issue and latching of input operands at functional unit) Any instruction in buffer whose RAW hazards are satisfied can be issued (for now at most one dispatch per cycle). On a write back (WB), new instructions may get enabled. IFIDWB ALUMem Fadd Fmul Issue

CSE 490/590, Spring Issue Limitations: In-Order and Out-of-Order latency 1LDF2, 34(R2)1 2LDF4,45(R3)long 3MULTDF6,F4,F23 4SUBDF8,F2,F21 5DIVDF4,F2,F84 6ADDDF10,F6,F41 In-order: 1 (2,1) Out-of-order: 1 (2,1) Out-of-order execution did not allow any significant improvement!

CSE 490/590, Spring How many instructions can be in the pipeline? Which features of an ISA limit the number of instructions in the pipeline? Out-of-order dispatch by itself does not provide any significant performance improvement! Number of Registers

CSE 490/590, Spring Overcoming the Lack of Register Names Floating Point pipelines often cannot be kept filled with small number of registers. IBM 360 had only 4 floating-point registers Can a microarchitecture use more registers than specified by the ISA without loss of ISA compatibility ? Robert Tomasulo of IBM suggested an ingenious solution in 1967 using on-the-fly register renaming

CSE 490/590, Spring Instruction-level Parallelism via Renaming latency 1LDF2, 34(R2)1 2LDF4,45(R3)long 3MULTDF6,F4,F23 4SUBDF8,F2,F21 5DIVDF4’,F2,F84 6ADDDF10,F6,F4’1 In-order: 1 (2,1) Out-of-order: 1 (2,1) (3,5) X Any antidependence can be eliminated by renaming. (renaming  additional storage) Can it be done in hardware? yes!

CSE 490/590, Spring Acknowledgements These slides heavily contain material developed and copyright by –Krste Asanovic (MIT/UCB) –David Patterson (UCB) And also by: –Arvind (MIT) –Joel Emer (Intel/MIT) –James Hoe (CMU) –John Kubiatowicz (UCB) MIT material derived from course UCB material derived from course CS252