CS203 – Advanced Computer Architecture

Tomasulo Algorithm

Cycle 1:
LD1 – Issue/GenAddr
Instructions shown (program order): LD F6, 34(R2); LD F2, 45(R3); MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 2:
LD1 – LD Buffer
LD2 – Issue/GenAddr
Instructions shown (program order): LD F6, 34(R2); LD F2, 45(R3); MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 3:
LD1 – Loading
LD2 – LD Buffer
MUL – Issue/Wait on F2
Instructions shown (program order): LD F6, 34(R2); LD F2, 45(R3); MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2
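The "Issue/Wait on F2" step can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation: the `Station` fields (`vj`, `vk`, `qj`, `qk`), the station name `MUL1`, the register-status table, and the register values are all assumptions chosen for the example. At issue, each source operand is either copied as a value or recorded as the tag of the unit that will produce it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station entry (field names are illustrative)."""
    name: str
    op: str
    vj: Optional[float] = None  # value of first operand, if available
    vk: Optional[float] = None  # value of second operand, if available
    qj: Optional[str] = None    # tag of the producer of the first operand
    qk: Optional[str] = None    # tag of the producer of the second operand


def issue(name, op, dest, src1, src2, regs, reg_status):
    """Fill a station: copy ready operands, record tags for pending ones."""
    rs = Station(name, op)
    tag1 = reg_status.get(src1)
    if tag1 is None:
        rs.vj = regs[src1]
    else:
        rs.qj = tag1
    tag2 = reg_status.get(src2)
    if tag2 is None:
        rs.vk = regs[src2]
    else:
        rs.qk = tag2
    # The destination register is now owned by this station (renaming).
    reg_status[dest] = name
    return rs


# Cycle 3: LD2 is still in flight and will write F2 (placeholder values).
regs = {"F0": 0.0, "F2": 0.0, "F4": 4.0}
reg_status = {"F2": "LD2"}
mul = issue("MUL1", "MLT", "F0", "F2", "F4", regs, reg_status)
print(mul)
```

Here `mul.qj` holds the tag `LD2` rather than a value, which is what "Wait on F2" means in the slide: the station listens for that tag instead of reading the register file.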

Cycle 4:
LD1 – Broadcast CDB
LD2 – Loading
MUL – Wait on F2
SUB – Issue/Wait on F2
Instructions shown (program order): LD F6, 34(R2); LD F2, 45(R3); MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 5:
LD2 – Broadcast CDB
MUL – F2 Ready/Exec
SUB – F2 Ready/Exec
DIV – Issue/Wait on F0
Instructions shown (program order): LD F2, 45(R3); MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2
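The wake-up in Cycle 5, where one CDB broadcast from LD2 makes both MUL and SUB ready, might look like the sketch below. The dictionary layout, the station names `MUL1` and `ADD1`, and all numeric values are hypothetical; the only behaviour taken from the slides is that every station waiting on the broadcast tag captures the value in the same cycle.

```python
def broadcast(tag, value, stations, regs, reg_status):
    """Deliver (tag, value) on the common data bus to every listener."""
    for rs in stations:
        if rs["qj"] == tag:
            rs["vj"], rs["qj"] = value, None
        if rs["qk"] == tag:
            rs["vk"], rs["qk"] = value, None
    # Registers still waiting on this tag take the value as well.
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]


def ready(rs):
    """A station can start executing once neither operand is pending."""
    return rs["qj"] is None and rs["qk"] is None


# State just before the Cycle 5 broadcast (placeholder values).
# SUB already holds F6, which LD1 delivered in Cycle 4.
mul = {"name": "MUL1", "vj": None, "qj": "LD2", "vk": 4.0, "qk": None}  # MLT F0, F2, F4
sub = {"name": "ADD1", "vj": 6.0, "qj": None, "vk": None, "qk": "LD2"}  # SUB F8, F6, F2
regs = {"F2": 0.0, "F4": 4.0, "F6": 6.0}
reg_status = {"F2": "LD2", "F0": "MUL1", "F8": "ADD1"}

broadcast("LD2", 2.0, [mul, sub], regs, reg_status)
print(ready(mul), ready(sub))
```

Because the value travels with its tag, the two waiting stations do not have to read the register file again after LD2 completes.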

Cycle 6:
MUL – Exec
SUB – Exec
DIV – Wait on F0
ADD – Issue/Wait on F8
Instructions shown (program order): MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 7:
MUL – Exec
SUB – Exec Done
DIV – Wait on F0
ADD – Wait on F8
Instructions shown (program order): MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 8:
MUL – Exec
SUB – Broadcast CDB
DIV – Wait on F0
ADD – F8 Ready/Exec
Instructions shown (program order): MLT F0, F2, F4; SUB F8, F6, F2; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 9:
MUL – Exec
DIV – Wait on F0
ADD – Exec
Instructions shown (program order): MLT F0, F2, F4; DIV F10, F0, F6; ADD F6, F8, F2

Cycle 10:
MUL – Exec
DIV – Wait on F0
ADD – Exec Done
Instructions shown (program order): MLT F0, F2, F4; DIV F10, F0, F6; ADD F6, F8, F2

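Cycle 11:
MUL – Exec
DIV – Wait on F0
ADD – Broadcast CDB
Note: ADD's write to F6 does not affect DIV. DIV copied the F6 value produced by LD1 into its reservation station when it issued (Cycle 5), so it still reads the old value and there is no WAR hazard.
Instructions shown (program order): MLT F0, F2, F4; DIV F10, F0, F6; ADD F6, F8, F2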
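The note on F6 can be illustrated with a short Python sketch. The helper `read_operand`, the tag `MUL1`, and the numeric values are assumptions made for this example. The point it shows is that DIV holds its own copy of F6 from issue time, so a later write to the register cannot change what DIV computes.

```python
regs = {"F0": 0.0, "F6": 6.0}   # placeholder values; F6 was loaded by LD1
reg_status = {"F0": "MUL1"}     # F0 is still being produced by the multiply


def read_operand(reg):
    """Return (value, tag): a copied value if ready, else the producer's tag."""
    tag = reg_status.get(reg)
    return (regs[reg], None) if tag is None else (None, tag)


# Cycle 5: DIV F10, F0, F6 issues. F0 is pending, F6 is copied by value.
div_rs = {"op": "DIV", "j": read_operand("F0"), "k": read_operand("F6")}

# Cycle 11: ADD F6, F8, F2 broadcasts and the register file gets a new F6.
regs["F6"] = 99.0

print(div_rs["k"])  # (6.0, None): DIV still holds the value from LD1
```

Copying operands at issue is what removes the write-after-read dependence between DIV and ADD without stalling ADD.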

Cycles 12-14:
MUL – Exec
DIV – Wait on F0
Cycle 15:
MUL – Exec Done
Instructions shown (program order): MLT F0, F2, F4; DIV F10, F0, F6

Cycle 16:
MUL – Broadcast CDB
DIV – F0 Ready/Exec
Instructions shown (program order): MLT F0, F2, F4; DIV F10, F0, F6

Cycles 17-55:
DIV – Exec
Cycle 56:
DIV – Exec Done
Instructions shown: DIV F10, F0, F6

Cycle 57:
DIV – Broadcast CDB
Instructions shown: DIV F10, F0, F6
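The broadcast cycles in this walkthrough can be reproduced with a small timing model, sketched here in Python. The model is an assumption read off the trace, not something the slides state directly. It assumes one instruction issues per cycle in program order. A load broadcasts 3 cycles after it issues. Any other instruction becomes ready in the cycle its last operand is broadcast (or its issue cycle, if later), finishes execution `LATENCY` cycles after that, and broadcasts one cycle later. The latencies (2 for ADD/SUB, 10 for MLT, 40 for DIV) are inferred from the trace. Structural hazards and CDB contention are not modelled.

```python
# Latencies inferred from the trace: cycles from "Ready" to "Exec Done".
LATENCY = {"SUB": 2, "ADD": 2, "MLT": 10, "DIV": 40}
LOAD_DELAY = 3  # Issue/GenAddr -> LD Buffer -> Loading -> Broadcast

# (opcode, destination register, source registers) in program order.
PROGRAM = [
    ("LD", "F6", []),
    ("LD", "F2", []),
    ("MLT", "F0", ["F2", "F4"]),
    ("SUB", "F8", ["F6", "F2"]),
    ("DIV", "F10", ["F0", "F6"]),
    ("ADD", "F6", ["F8", "F2"]),
]


def broadcast_cycles(program):
    """Return (op, dest, CDB cycle) for each instruction."""
    last_write = {}  # register -> CDB cycle of its most recent producer
    result = []
    for issue_cycle, (op, dest, srcs) in enumerate(program, start=1):
        if op == "LD":
            cdb = issue_cycle + LOAD_DELAY
        else:
            # Ready once the latest operand has been broadcast.
            ready = max([issue_cycle] + [last_write.get(s, 0) for s in srcs])
            cdb = ready + LATENCY[op] + 1
        last_write[dest] = cdb
        result.append((op, dest, cdb))
    return result


for op, dest, cdb in broadcast_cycles(PROGRAM):
    print(f"{op:3} {dest:3} broadcasts in cycle {cdb}")
```

Under these assumptions the model gives cycles 4 and 5 for the two loads, 16 for MLT, 8 for SUB, 57 for DIV, and 11 for ADD, matching the slides. Because instructions are processed in program order, DIV looks up F6 before ADD is recorded as its new producer, which mirrors the operand buffering noted in Cycle 11.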