Princess Sumaya Univ. Computer Engineering Dept. Chapter 4:

Princess Sumaya University – Computer Arch. & Org (2) Computer Engineering Dept. 1 / 50 CPU Operation Review
[Figure: datapath with PC, the constant 4, two Adders, Instruction Memory, Register File (Sel A, Sel B, Sel C; Data A, Data B, Data C), Sign Extend, Shift Left, ALU, Data Memory, and multiplexers; instruction fields Rs, Rt, Rd, and Offset/Addr/Immediate]

2 / 50 CPU Operation Review: ADD R3, R1, R2 [Figure: datapath of slide 1]

3 / 50 CPU Operation Review: ADD R2, R1, +5 [Figure: datapath of slide 1]

4 / 50 CPU Operation Review: LD R2, M[R1 + 5] [Figure: datapath of slide 1]

5 / 50 CPU Operation Review: ST M[R1 - 4], R2 [Figure: datapath of slide 1]

6 / 50 CPU Operation Review: JMP +3 [Figure: datapath of slide 1]

7 / 50 CPU Operation Review: JE R1, R2, +3 [Figure: datapath of slide 1]
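The six instruction forms reviewed on slides 2 to 7 can be sketched as a tiny interpreter. This is only an illustration under stated assumptions: the destination is taken to be the first operand, and jump offsets are assumed to count instructions relative to PC + 4 (suggested by the constant 4 and the Shift Left block in the figure, but not stated on the slides). The tuple encoding and the name `step` are hypothetical.

```python
def step(instr, regs, mem, pc):
    """Execute one instruction tuple; return the next PC (byte address)."""
    op = instr[0]
    next_pc = pc + 4                      # default: fall through to the next instruction
    if op == "ADD":                       # ADD Rd, Rs, Rt
        _, rd, rs, rt = instr
        regs[rd] = regs[rs] + regs[rt]
    elif op == "ADDI":                    # ADD Rd, Rs, +imm (immediate form)
        _, rd, rs, imm = instr
        regs[rd] = regs[rs] + imm
    elif op == "LD":                      # LD Rd, M[Rs + off]
        _, rd, rs, off = instr
        regs[rd] = mem[regs[rs] + off]
    elif op == "ST":                      # ST M[Rs + off], Rt
        _, rs, off, rt = instr
        mem[regs[rs] + off] = regs[rt]
    elif op == "JMP":                     # JMP +off (assumed: instructions past PC + 4)
        _, off = instr
        next_pc = pc + 4 + 4 * off
    elif op == "JE":                      # JE Ra, Rb, +off: branch if equal
        _, ra, rb, off = instr
        if regs[ra] == regs[rb]:
            next_pc = pc + 4 + 4 * off
    else:
        raise ValueError(f"unknown opcode {op}")
    return next_pc


regs = {"R1": 8, "R2": 3, "R3": 0}
mem = {13: 99}
pc = 0
pc = step(("ADD", "R3", "R1", "R2"), regs, mem, pc)   # R3 = R1 + R2
pc = step(("LD", "R2", "R1", 5), regs, mem, pc)       # R2 = M[R1 + 5]
pc = step(("ST", "R1", -4, "R2"), regs, mem, pc)      # M[R1 - 4] = R2
```

Each branch of `step` corresponds to one path through the datapath: the ALU forms write the register file, LD and ST use the ALU for the address, and the jump forms replace PC + 4 with a computed target.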

8 / 50 Pipelining
● Non-Pipelined Process
Instr. 1: Fetch Instr. (τ_Mem), Get Operands (τ_Reg), Execute (τ_ALU), Store Result (τ_Mem)
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem.]

9 / 50 Pipelining
● Non-Pipelined Process
Instr. 2: Fetch Instr. (τ_Mem), Get Operands (τ_Reg), Execute (τ_ALU), Store Result (τ_Mem)
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem.]

10 / 50 Pipelining
● Non-Pipelined Process
   ● Clock Period =
   ● CPI (Clocks per Instruction) =
Stage delays: τ_Mem, τ_Reg, τ_ALU, τ_Mem
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem.]

11 / 50 Pipelining
● Pipelined Process
Stages: Fetch Instr. (τ_Mem), Get Operands (τ_Reg), Execute (τ_ALU), Store Result (τ_Mem)
Instr. 1 to Instr. 5 enter the pipeline one after another.
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem., separated by the pipeline registers IR, X, Y, and Result]

12 / 50 Pipelining
● Pipelined Process
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem., with pipeline registers IR, X, Y, Result]

Time  Stage 1          Stage 2          Stage 3     Stage 4
1     Fetch Instr. 1
2     Fetch Instr. 2   Get Operands 1
3     Fetch Instr. 3   Get Operands 2   Execute 1
4     Fetch Instr. 4   Get Operands 3   Execute 2   Store Result 1
5     Fetch Instr. 5   Get Operands 4   Execute 3   Store Result 2
6     Fetch Instr. 6   Get Operands 5   Execute 4   Store Result 3
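The stage-occupancy table on slide 12 follows a simple rule: at time t, stage s (counting from 0) holds instruction t - s, provided that number is at least 1. A minimal sketch, with the function name `pipeline_table` chosen here for illustration:

```python
STAGES = ["Fetch Instr.", "Get Operands", "Execute", "Store Result"]


def pipeline_table(num_cycles):
    """Return one row per clock cycle; each row lists what every stage holds."""
    rows = []
    for t in range(1, num_cycles + 1):
        row = []
        for s, name in enumerate(STAGES):
            instr = t - s                  # instruction number in stage s at time t
            row.append(f"{name} {instr}" if instr >= 1 else "")
        rows.append(row)
    return rows


for t, row in enumerate(pipeline_table(6), start=1):
    print(t, " | ".join(row))
```

From cycle 4 onward all four stages are busy, so one instruction completes every cycle.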

13 / 50 Pipelining
● Pipelined Process
   ● Clock Period =
   ● CPI =
Stage delays: τ_Mem, τ_Reg, τ_ALU, τ_Mem
[Figure: PC, Instr. Mem., Register File, ALU, Data Mem., with pipeline registers IR, X, Y, Result]
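The slides leave the clock-period and CPI entries blank. One common way to compare the two designs, sketched here under assumptions that are not from the slides: the non-pipelined machine completes all four steps in a single clock (period equal to the sum of the stage delays), the pipelined clock is set by the slowest stage, pipeline-register overhead is ignored by default, and there are no stalls. The delay values are hypothetical.

```python
def non_pipelined_period(delays):
    """One long clock that covers every step in sequence."""
    return sum(delays)


def pipelined_period(delays, register_overhead=0):
    """The clock must fit the slowest stage (plus optional latch overhead)."""
    return max(delays) + register_overhead


def total_time(num_instr, period, num_stages=1):
    """Cycles to fill the pipeline plus one cycle per remaining instruction."""
    return (num_instr + num_stages - 1) * period


# Hypothetical delays in ns for tau_Mem, tau_Reg, tau_ALU, tau_Mem.
delays = [2, 1, 2, 2]
t_single = total_time(1000, non_pipelined_period(delays))
t_pipe = total_time(1000, pipelined_period(delays), num_stages=len(delays))
speedup = t_single / t_pipe
```

With these numbers the speedup is below the stage count of 4 because the stages are not perfectly balanced: the register-read stage finishes early and waits for the clock.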

14 / 50 Pipelining Hazards
● Structural Hazards
The hardware cannot support the combination of instructions that are in the pipeline at a certain time.
Example:
[Figure: pipelined datapath with PC, Instr. Mem., Register File, ALU, Data Mem., and pipeline registers IR, X, Y, Result]

15 / 50 Pipelining Hazards
● Data Hazards
One instruction has to wait for another to complete.
Example:
[Figure: pipelined datapath as on slide 14]

16 / 50 Pipelining Hazards
● Data Hazards
One instruction has to wait for another to complete.
[Figure: pipelined datapath as on slide 14]

17 / 50 Pipelining Hazards
● Data Hazards
One instruction has to wait for another to complete.
Forwarding: ADD R3, R1, R2
[Figure: pipelined datapath as on slide 14]
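Forwarding can be sketched as an operand-selection rule: if the instruction just ahead in the pipeline is about to write the register being read, take its value from the Result pipeline register instead of the register file. The follow-up instruction SUB R4, R3, R1 and the name `read_operand` are hypothetical; only ADD R3, R1, R2 appears on the slide.

```python
def read_operand(reg, regfile, result_latch):
    """Return the value of `reg`, bypassing the register file when needed.

    result_latch is None or a (dest_reg, value) pair held in the Result
    pipeline register by the previous instruction.
    """
    if result_latch is not None and result_latch[0] == reg:
        return result_latch[1]            # forward the not-yet-stored result
    return regfile[reg]                   # no hazard: normal register read


regfile = {"R1": 7, "R2": 5, "R3": 0}     # R3 still holds its old value
latch = ("R3", regfile["R1"] + regfile["R2"])   # ADD R3, R1, R2 has executed

# Hypothetical next instruction: SUB R4, R3, R1
a = read_operand("R3", regfile, latch)    # forwarded from the latch
b = read_operand("R1", regfile, latch)    # read normally
```

Without the bypass, the dependent instruction would have to stall until the ADD had stored R3.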

18 / 50 Pipelining Hazards
● Control Hazards
The decision depends on the result of an unfinished instruction.
Example:
[Figure: pipelined datapath with PC, Instr. Mem., Register File, ALU, Data Mem., and pipeline registers IR, X, Y, Result]

Time  Stage 1           Stage 2           Stage 3      Stage 4
1     Fetch Instr. 24
2     Fetch Instr. 28   Get Operands 24
3     ?                 Get Operands 28   Execute 24
4     ?                                   Execute 28   Store Result 24

19 / 50 Pipelining Hazards
● Control Hazards
The decision depends on the result of an unfinished instruction.
   ● Stall
   ● Predict
   ● Delayed Branch
[Figure: pipelined datapath with PC, Instr. Mem., Register File, ALU, Data Mem., and pipeline registers IR, X, Y, Result]
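The cost of the Stall and Predict options can be estimated with the usual effective-CPI formulas. This is a sketch that assumes a base CPI of 1 and a fixed penalty per stalled or mispredicted branch; the branch fraction, penalty, and accuracy below are hypothetical values, and delayed branches are not modeled.

```python
def cpi_with_stall(branch_fraction, penalty_cycles):
    """Every branch pays the full penalty."""
    return 1 + branch_fraction * penalty_cycles


def cpi_with_prediction(branch_fraction, penalty_cycles, accuracy):
    """Only mispredicted branches pay the penalty."""
    return 1 + branch_fraction * (1 - accuracy) * penalty_cycles


# Hypothetical workload: 20% branches, 2-cycle penalty, 90% prediction accuracy.
stall_cpi = cpi_with_stall(0.20, 2)
predict_cpi = cpi_with_prediction(0.20, 2, 0.90)
```

Under these assumptions prediction removes most of the branch cost, since only one branch in ten still pays the penalty.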

20 / 50 Multiple Issue
● Multiple Instructions Execution (in a single clock)
   ● CPI < 1
   ● Static / Dynamic
   ● Speculation
Example:

21 / 50 Static Multiple Issue
● Compiler Assisted
● Issue Packet
   ● Set of instructions issued in a given clock cycle.
   ● Simply, one large instruction with multiple operations.
● Very Long Instruction Word (VLIW)

22 / 50 Single-Issue Datapath
[Figure: five-stage datapath (IF, ID, EX, ST) with PC, Instr. Mem., Register File (Sel A, Sel B, Sel C; Data A, Data B, Data C), Sign Extend, Shift Left 2, adders, ALU, Data Mem. (ADDR, Data), and Exception Address]

23 / 50 Two-Issue Datapath
[Figure: the single-issue datapath extended with a second set of register ports (Sel A1, Sel B1, Sel C1, Data A1, Data B1, Data C1 and Sel A2, Sel B2, Sel C2, Data A2, Data B2, Data C2) and a second Sign Extend unit]

24 / 50 Two-Issue Datapath Example
● Two 32-bit instructions: one ALU/JMP, one LD/ST
● NOP Replacement
Example:

        ALU/JMP            LD/ST
Loop:                      LD R1, M[R2]
        ADI R2, R2, -4
        ADD R1, R1, R3
        JNE R2, 0, Loop    ST M[R2+4], R1

25 / 50 Single-Issue Datapath Example
Loop: LD R1, M[R2]
      ADD R1, R1, R3
      ST M[R2], R1
      ADI R2, R2, -4
      JNE R2, 0, Loop
[Table: Clock, Fetch, Decode, Execute, Memory, Store; clock 1 fetches LD R1, M[R2]]
[Figure: single-issue datapath of slide 22]

26 / 50 Single-Issue Datapath Example
Original Code:
Loop: LD R1, M[R2]
      ADD R1, R1, R3
      ST M[R2], R1
      ADI R2, R2, -4
      JNE R2, 0, Loop
[Table: Clock, Fetch, Decode, Execute, Memory, Store; cell contents did not survive in the transcript]

27 / 50 Single-Issue Datapath Example
Optimized Code:
Loop: LD R1, M[R2]
      ADI R2, R2, -4
      ADD R1, R1, R3
      ST M[R2+4], R1
      JNE R2, 0, Loop
[Table: Clock, Fetch, Decode, Execute, Memory, Store; cell contents did not survive in the transcript]
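The optimized loop moves ADI ahead of ADD and compensates by storing to M[R2+4]. A quick way to convince yourself that the reordering preserves the result is to run both versions on the same data. The sketch below models each loop line by line; the memory contents, the starting value of R2, and the function names are hypothetical.

```python
def run_original(mem, r2, r3):
    mem = dict(mem)                       # work on a copy
    while True:
        r1 = mem[r2]                      # LD  R1, M[R2]
        r1 = r1 + r3                      # ADD R1, R1, R3
        mem[r2] = r1                      # ST  M[R2], R1
        r2 = r2 - 4                       # ADI R2, R2, -4
        if r2 == 0:                       # JNE R2, 0, Loop
            break
    return mem


def run_optimized(mem, r2, r3):
    mem = dict(mem)
    while True:
        r1 = mem[r2]                      # LD  R1, M[R2]
        r2 = r2 - 4                       # ADI R2, R2, -4 (moved up)
        r1 = r1 + r3                      # ADD R1, R1, R3
        mem[r2 + 4] = r1                  # ST  M[R2+4], R1 (offset compensates)
        if r2 == 0:                       # JNE R2, 0, Loop
            break
    return mem


data = {4: 10, 8: 20, 12: 30}             # hypothetical array at addresses 4, 8, 12
before = run_original(data, r2=12, r3=5)
after = run_optimized(data, r2=12, r3=5)
```

Both versions add R3 to every element; the optimized order simply places an independent instruction between the load and the instruction that uses the loaded value.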

28 / 50 Two-Issue Datapath Example

        ALU/JMP            LD/ST
Loop:                      LD R1, M[R2]
        ADI R2, R2, -4
        ADD R1, R1, R3
        JNE R2, 0, Loop    ST M[R2+4], R1

29 / 50 Two-Issue Datapath Example

        ALU/JMP            LD/ST
Loop:                      LD R1, M[R2]
        ADI R2, R2, -4
        ADD R1, R1, R3
        JNE R2, 0, Loop    ST M[R2+4], R1

[Table: Clock, Fetch, Decode, Execute, Memory, Store; LD is fetched in clock 1 and ADI in clock 2, and JNE and ST are fetched together; the remaining cells did not survive in the transcript]
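The two-slot schedule above can be reproduced with a simple in-order packer. This is a sketch of one possible greedy policy, not necessarily the method used to build the slide: an instruction joins the current packet only if its slot is free and it has no register conflict (RAW, WAR, or WAW) with the instruction already in that packet; otherwise a new packet is started. Load latency is ignored, and the `Instr` type and register sets are written out by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    slot: str                  # "ALU" (ALU/JMP slot) or "MEM" (LD/ST slot)
    reads: frozenset
    writes: frozenset


def conflicts(a, b):
    """True if a and b touch a common register and at least one writes it."""
    return bool(a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def pack(program):
    """Greedy in-order packing into {ALU, MEM} issue packets."""
    packets = []
    for ins in program:
        last = packets[-1] if packets else None
        fits = (
            last is not None
            and last[ins.slot] is None
            and not any(conflicts(o, ins) for o in last.values() if o is not None)
        )
        if not fits:
            last = {"ALU": None, "MEM": None}
            packets.append(last)
        last[ins.slot] = ins
    return packets


def fs(*regs):
    return frozenset(regs)


loop = [
    Instr("LD R1, M[R2]", "MEM", fs("R2"), fs("R1")),
    Instr("ADI R2, R2, -4", "ALU", fs("R2"), fs("R2")),
    Instr("ADD R1, R1, R3", "ALU", fs("R1", "R3"), fs("R1")),
    Instr("JNE R2, 0, Loop", "ALU", fs("R2"), fs()),
    Instr("ST M[R2+4], R1", "MEM", fs("R1", "R2"), fs()),
]

rows = [
    tuple(p[s].text if p[s] else "NOP" for s in ("ALU", "MEM"))
    for p in pack(loop)
]
```

Because only the most recent packet is ever considered, program order is preserved; empty slots come out as NOPs, which matches the NOP replacement mentioned on slide 24.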

30 / 50 Dynamic Multiple Issue
● Compiler Assisted (to move dependencies apart)
● Hardware Decided
   ● 0, 1 or more instructions issued in a given clock cycle.
● Superscalar Processors
   ● Compiled code runs correctly independent of the issue rate or pipeline structure.

31 / 50 Dynamic Pipeline Scheduling
● Extension to Dynamic Multiple Issue
● Hardware Decided
   ● Choose which instruction to execute in a given clock cycle.
   ● Compiled code runs correctly independent of the issue rate or pipeline structure.
Example:

32 / 50 Dynamic Pipeline Scheduling
● Instruction Fetch, Decode & Issue Unit
● Multiple Functional Units
● Commit Unit
[Figure: Instruction Fetch & Decode, Reservation Station, two Integer Functional Units, two Floating Point Functional Units, Commit Unit]

33 / 50 Dynamic Pipeline Scheduling
● Out-of-Order (O-o-O) Execution
An operand may be in a register, in the reorder buffer, or yet to be produced by a functional unit.
● In-Order Issue
● In-Order Commit
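In-order commit on top of out-of-order execution can be sketched as follows: each instruction may finish in any cycle, but it commits no earlier than the instruction before it. The sketch assumes at most one commit per cycle and that an instruction can commit in the cycle it finishes; both are simplifications, and the finish times are hypothetical.

```python
def commit_cycles(finish_cycles):
    """Given finish times in program order, return in-order commit times."""
    commits = []
    last = 0
    for finish in finish_cycles:
        last = max(finish, last + 1)      # wait for own result and for the previous commit
        commits.append(last)
    return commits


# Hypothetical finish times: instruction 2 finishes before instruction 1.
finished = [5, 2, 3, 9, 4]
committed = commit_cycles(finished)
```

Instructions that finish early wait in the reorder buffer, which is why an operand may have to be read from the reorder buffer rather than from a register.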

34 / 50 Speculative Execution
● Hardware-Based
   ● Branch Predictions
   ● Load Addresses
● In-Order Commit
   ● Assures correctness in case of a wrong prediction

35 / 50 Out-of-Order Scheduling
● Scoreboarding (CDC 6600)
Pipeline:
   ● IF
   ● IS
   ● RD
   ● EX
   ● WB
[Figure: Register File, two Floating Point Multiply units, Floating Point Divide, Floating Point Add, and Integer Unit, all supervised by the Scoreboard]

36 / 50 Out-of-Order Scheduling
● Scoreboarding (CDC 6600)
Pipeline: IF, IS, RD, EX, WB
Instruction Issue (IS): an instruction issues
   ● if the functional unit it needs is available (no structural hazard), and
   ● if no other active instruction has the same destination register (no WAW hazard).

37 / 50 Out-of-Order Scheduling
● Scoreboarding (CDC 6600)
Pipeline: IF, IS, RD, EX, WB
Read Operands (RD): an instruction reads its operands when
   ● no previously issued, still active instruction has one of its operands as a destination (no RAW hazard).

38 / 50 Out-of-Order Scheduling
● Scoreboarding (CDC 6600)
Pipeline: IF, IS, RD, EX, WB
Write Back Results (WB):
   ● An instruction stalls if its destination register is still waiting to be read by an earlier instruction (WAR hazard).
Example:
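The three scoreboard checks above can be sketched as predicates over the functional-unit status table. The field names follow the column headings used in the example slides (Busy, Fi, Fj, Fk, Rj, Rk); the `Unit` class and function names are hypothetical, and the sketch covers only the stall conditions, not the bookkeeping that updates the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    name: str
    busy: bool = False
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    rj: bool = False           # Fj is ready and has not been read yet
    rk: bool = False           # Fk is ready and has not been read yet


def can_issue(unit, dest, result_status):
    """IS: unit is free (structural) and nobody else will write dest (WAW)."""
    return not unit.busy and result_status.get(dest) is None


def can_read_operands(unit):
    """RD: both source operands are available (RAW resolved)."""
    return unit.rj and unit.rk


def can_write_back(unit, units):
    """WB: no other busy unit still has to read the old value of unit.fi (WAR)."""
    for other in units:
        if other is unit or not other.busy:
            continue
        if (other.fj == unit.fi and other.rj) or (other.fk == unit.fi and other.rk):
            return False
    return True


# Snapshot modeled on the example: MUL F1, F5, F2 wants to write F1 while
# DIV F6, F1, F3 has not yet read its F1 operand.
mul1 = Unit("Mul1", busy=True, fi="F1", fj="F5", fk="F2")
div = Unit("Div", busy=True, fi="F6", fj="F1", fk="F3", rj=True, rk=False)
add = Unit("Add")
units = [mul1, div, add]
result_status = {"F1": "Mul1", "F6": "Div"}
```

In this snapshot `can_write_back(mul1, units)` is false until the divide unit has read F1, which is exactly the WAR stall described on slide 38.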

39 / 50 Out-of-Order Scheduling Example
Example:
Instruction Status table columns: Instruction, Issue, Rd Opr, Exec, Wr Res
Functional Unit Status table columns: Unit, Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk

40 / 50 Out-of-Order Scheduling Example
Instructions:
   ADD F3, F1, 4
   SUB F4, F2, 5
   MUL F5, F1, F2
   DIV F6, F1, F3
   MUL F1, F5, F2
Pipeline stages: IF, IS, RD, EX, WB
Functional units: F.P. Mul1, F.P. Mul2, F.P. Div, F.P. Add, Integer (all idle)
Register Result Status: columns F1 to F6, row F.U. (empty)

41 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Add: Op +, Fi F3, Fj F1, Fk 4
Register Result Status (F.U. row): Add

42 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F5, Fj F1, Fk F2; F.P. Add: Op +, Fi F3, Fj F1, Fk 4
Register Result Status (F.U. row): Add, Add, Mul 1

43 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F5, Fj F1, Fk F2; F.P. Div: Op ÷, Fi F6, Fj F1, Fk F3, waiting on Add
Register Result Status (F.U. row): Add, Add, Mul 1, Div

44 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Div: Op ÷, Fi F6, Fj F1, Fk F3, waiting on Add; F.P. Add: Op -, Fi F4, Fj F2, Fk 5
Register Result Status (F.U. row): Add, Mul 1, Div

45 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F1, Fj F5, Fk F2, waiting on Mul; F.P. Div: Op ÷, Fi F6, Fj F1, Fk F3, waiting on Add; F.P. Add: Op -, Fi F4, Fj F2, Fk 5
Register Result Status (F.U. row): Mul 1, Add, Div

46 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F1, Fj F5, Fk F2, waiting on Mul
Register Result Status (F.U. row): Mul 1, Add, Div

47 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F1, Fj F5, Fk F2, waiting on Mul
Register Result Status (F.U. row): Mul 1

48 / 50 Out-of-Order Scheduling Example
Functional Unit Status: F.P. Mul1: Op ×, Fi F1, Fj F5, Fk F2, waiting on Mul
Register Result Status (F.U. row): Mul 1

49 / 50 Out-of-Order Scheduling Example
Functional Unit Status: all units idle
Register Result Status (F.U. row): Mul 1
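The stalls in this example come from register dependences among the five instructions. A short sketch that lists them follows; the (destination, sources) encoding is hypothetical, immediates are left out, and the pairwise check is naive in that it does not account for intervening writes (there are none that matter in this program).

```python
def dependences(program):
    """Return (earlier, later, kind, register) for every pair of instructions."""
    found = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            if dest_i in srcs_j:
                found.append((i, j, "RAW", dest_i))   # later reads what earlier writes
            if dest_j in srcs_i:
                found.append((i, j, "WAR", dest_j))   # later overwrites what earlier reads
            if dest_i == dest_j:
                found.append((i, j, "WAW", dest_i))   # both write the same register
    return found


program = [
    ("F3", ("F1",)),          # 0: ADD F3, F1, 4
    ("F4", ("F2",)),          # 1: SUB F4, F2, 5
    ("F5", ("F1", "F2")),     # 2: MUL F5, F1, F2
    ("F6", ("F1", "F3")),     # 3: DIV F6, F1, F3
    ("F1", ("F5", "F2")),     # 4: MUL F1, F5, F2
]
deps = dependences(program)
```

The two RAW entries match the waits shown in the status tables (DIV on the Add unit for F3, the second MUL on Mul1 for F5), and the WAR entries on F1 are what hold the second MUL back at write-back.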

50 / 50 Register Renaming
● Tomasulo's Algorithm (IBM 360/91)
   ● Architectural Registers
   ● Physical (Hardware) Registers
   ● Dynamic Remap
Example:
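Dynamic remapping can be sketched with a map table and a free list: every destination gets a fresh physical register, and sources are looked up through the current mapping. This is a generic map-table style of renaming, offered as an illustration rather than the exact mechanism of the 360/91; physical registers are never recycled here, and the program reuses the scoreboard example's instructions with immediates omitted.

```python
def rename(program, arch_regs, num_physical):
    """Rewrite (dest, sources) pairs in terms of physical registers P0, P1, ..."""
    mapping = {reg: f"P{i}" for i, reg in enumerate(arch_regs)}
    free = [f"P{i}" for i in range(len(arch_regs), num_physical)]
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(mapping[s] for s in srcs)   # look up sources before remapping dest
        if not free:
            raise RuntimeError("out of physical registers")
        new_dest = free.pop(0)                       # fresh register for every write
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


program = [
    ("F3", ("F1",)),          # ADD F3, F1, 4
    ("F4", ("F2",)),          # SUB F4, F2, 5
    ("F5", ("F1", "F2")),     # MUL F5, F1, F2
    ("F6", ("F1", "F3")),     # DIV F6, F1, F3
    ("F1", ("F5", "F2")),     # MUL F1, F5, F2
]
arch = ["F1", "F2", "F3", "F4", "F5", "F6"]
renamed = rename(program, arch, num_physical=12)
```

After renaming, the last MUL writes a new physical register while the earlier readers of F1 still refer to P0, so the WAR hazard on F1 disappears; only true (RAW) dependences remain.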

Chapter 4

● Exercise 4.12
● Exercise 4.13
● Exercise 4.14
● Exercise 4.16
● Exercise 4.17
● Exercise 4.20
● Exercise 4.21
● Exercise 4.22
● Exercise 4.23
● Exercise 4.25
● Exercise 4.28
● Exercise 4.29
● Exercise 4.30
● Exercise 4.31
● Exercise 4.32
● Exercise 4.33
● Exercise 4.35
● Exercise 4.39