Computer Architecture

Computer Architecture CS423: Lecture 12 Dynamic Scheduling Jahangir Ikram

COMPARISON FP PIPELINE VS SCOREBOARD

Revision

Multiple Cycle Floating Point Pipeline

Stages: IF, ID, then one execution path (EX; A1 A2 A3 A4; M1 .. M7; Divide), then Mem, WB.

Function Unit      Latency   Initiation/Re-Issue Interval
Integer ALU        1
Load/Store
FP Add             3
FP/Int Multiply    6
FP/Int Divide      24        25

Scoreboard of CDC 6600

Stages: ISSUE, then Read Operands, then one execution path (EX and Mem; A1 A2 A3 A4; M1 M2 .. M7; Divide), then WB to the Register File.

Checks at each stage:
ISSUE: check for WAW and for a free functional unit (FU)
Read Operands: check for RAW; read values from the Register File when free
WB: check for WAR
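The three checks can be sketched in code. This is a minimal illustration, not the CDC 6600 logic itself: the class and method names are hypothetical, and the bookkeeping fields (Fi, Fj, Fk, Qj, Qk, Rj, Rk) follow the common textbook description of a scoreboard, which is an assumption about what this slide intends.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitStatus:
    """Per-functional-unit bookkeeping (hypothetical layout, textbook style)."""
    busy: bool = False
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    qj: Optional[str] = None  # unit that will produce fj (None if not pending)
    qk: Optional[str] = None  # unit that will produce fk (None if not pending)
    rj: bool = False          # fj is ready and has not been read yet
    rk: bool = False          # fk is ready and has not been read yet


class Scoreboard:
    def __init__(self, unit_names):
        self.units = {name: UnitStatus() for name in unit_names}
        self.result = {}  # register -> unit that will write it

    def can_issue(self, unit, dest):
        # ISSUE: stall if the unit is busy (structural) or dest is pending (WAW).
        return not self.units[unit].busy and dest not in self.result

    def issue(self, unit, dest, src_j, src_k):
        qj, qk = self.result.get(src_j), self.result.get(src_k)
        self.units[unit] = UnitStatus(True, dest, src_j, src_k,
                                      qj, qk, qj is None, qk is None)
        self.result[dest] = unit

    def can_read_operands(self, unit):
        # Read Operands: stall on RAW until both sources are ready.
        status = self.units[unit]
        return status.rj and status.rk

    def read_operands(self, unit):
        status = self.units[unit]
        status.rj = status.rk = False  # operands have now been read

    def can_write_result(self, unit):
        # WB: stall on WAR while another unit still has to read the old value.
        dest = self.units[unit].fi
        return all((o.fj != dest or not o.rj) and (o.fk != dest or not o.rk)
                   for name, o in self.units.items()
                   if name != unit and o.busy)

    def write_result(self, unit):
        dest = self.units[unit].fi
        for other in self.units.values():
            if other.qj == unit:
                other.qj, other.rj = None, True
            if other.qk == unit:
                other.qk, other.rk = None, True
        del self.result[dest]
        self.units[unit] = UnitStatus()
```

For example, after issuing DIV.D F0,F2,F4, then ADD.D F10,F0,F8, then MUL.D F8,F2,F6 to three different units, `can_write_result` for the multiplier stays false until the adder has read F8, which is the WAR check described above.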

Exercise: fill in the cycle in which each instruction completes IS, RO, EX and WR, then compare with the pipeline two slides before.

Instruction           IS   RO   EX   WR
L.D    F0,0(R2)
L.D    F4,0(R3)
MUL.D  F0,F0,F4
ADD.D  F2,F0,F2
DADDUI R3,R3,8
DADDUI R3,R3,8
DSUBU  R5,R4,R2
BNEZ   R5, Loop

Data Hazards

RAW Hazard
  ADD.D F3, F1, F2
  SUB.D F5, F6, F3

WAW Hazard
  DIV.D F3, F1, F2
  SUB.D F3, F6, F5

WAR Hazard
  DIV.D F3, F1, F2
  SUB.D F5, F6, F3
  ADD.D F3, F6, F7

True and False Dependencies

Find the dependencies in this code:
  1  DIV.D F0,F2,F4
  2  ADD.D F6,F0,F8
  3  S.D   F6,0(R1)
  4  SUB.D F8,F10,F14
  5  MUL.D F6,F10,F8

Data Dependencies

Type    B/W    Register/FU
RAW     1,2    F0
RAW     2,3    F6
RAW     4,5    F8
WAW     2,5    F6
WAR     2,4    F8
WAR     3,5    F6
Struc   2,4    ADDER

  1  DIV.D F0,F2,F4
  2  ADD.D F6,F0,F8
  3  S.D   F6,0(R1)
  4  SUB.D F8,F10,F14
  5  MUL.D F6,F10,F8
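The register dependencies in this five-instruction sequence can be found mechanically. The sketch below is one simple way to do it, assuming each instruction is encoded by hand as a (destination, sources) pair; the function name and encoding are hypothetical. It compares every pair of instructions, so it also reports the WAR between instructions 3 and 5 on F6 (the store reads F6 before MUL.D overwrites it). Structural hazards are not covered.

```python
def find_dependencies(program):
    """Return (kind, earlier, later, register) tuples, 1-based instruction numbers.

    program: list of (dest, sources) in program order; dest is None for a store.
    """
    found = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            # Registers written strictly between instruction i and instruction j.
            between = [d for d, _ in program[i + 1:j]]
            # RAW: j reads the value i wrote, and nothing overwrote it in between.
            if dest_i is not None and dest_i in srcs_j and dest_i not in between:
                found.append(("RAW", i + 1, j + 1, dest_i))
            if dest_j is not None:
                # WAW: both write the same register.
                if dest_i == dest_j:
                    found.append(("WAW", i + 1, j + 1, dest_j))
                # WAR: j overwrites a register that i reads.
                if dest_j in srcs_i:
                    found.append(("WAR", i + 1, j + 1, dest_j))
    return found


program = [
    ("F0", ["F2", "F4"]),    # 1  DIV.D F0,F2,F4
    ("F6", ["F0", "F8"]),    # 2  ADD.D F6,F0,F8
    (None, ["F6", "R1"]),    # 3  S.D   F6,0(R1)
    ("F8", ["F10", "F14"]),  # 4  SUB.D F8,F10,F14
    ("F6", ["F10", "F8"]),   # 5  MUL.D F6,F10,F8
]

for kind, earlier, later, reg in find_dependencies(program):
    print(f"{kind} {earlier},{later} {reg}")
```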

Name Dependencies
WAW and WAR dependencies are also called name dependencies: they do not carry a value between two instructions.
They can be removed by not reusing the same name: rename the destination register whenever a new value is created.
Both the compiler (statically) and the processor (dynamically) can do this.

Register Renaming: Compiler
  DIV.D F0,F2,F4
  ADD.D F6,F0,F8
  S.D   F6,0(R1)
  SUB.D F20,F10,F14
  MUL.D F21,F10,F20
Only RAW or structural hazards are left.
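A small sketch of how such a renaming pass might work. It assumes a simple rule: give a destination a fresh name whenever that name was already read or written earlier in the block, and redirect later reads to the new name. The function name, the tuple encoding and the choice of F20 and F21 as free registers are illustrative assumptions. The sketch also assumes the old register names are not needed after the block.

```python
def rename(program, fresh):
    """Rename destinations whose name was already used earlier in the block.

    program: list of (op, dest, sources); dest is None for a store.
    fresh:   iterator yielding unused register names.
    """
    mapping = {}  # original name -> current (renamed) name
    used = set()  # original names read or written so far
    renamed = []
    for op, dest, srcs in program:
        # Reads follow the most recent renaming of each source.
        new_srcs = [mapping.get(s, s) for s in srcs]
        used.update(srcs)
        new_dest = dest
        if dest is not None:
            if dest in used:
                new_dest = next(fresh)
                mapping[dest] = new_dest
            used.add(dest)
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D", None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]

renamed = rename(program, iter(["F20", "F21"]))
for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

On this input the first three instructions are unchanged, SUB.D writes F20, and MUL.D writes F21 and reads F20, which matches the renamed code on the slide.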

Dynamic Register Renaming
Use architecturally invisible registers, called rename registers, as destinations; this avoids WAW.
Read and keep a copy of each available operand at issue time; this avoids WAR.
The copied values are stored in the reservation station.

Tomasulo’s Algorithm
[Figure: pipeline organization. ISSUE / rename to RS (check for RS); reservation stations wait for operands (check for RAW); execution paths: Integer EX; LD/ST with TAC and Mem Access; FP ADD A1 A2 A3 A4; M1 M2 .. M7; Divide. Tag and DATA travel on the CDB to the reservation stations and the Register File.]

MIPS FP Unit Using Tomasulo’s Algorithm
[Figure: block diagram. Labels: From Instruction Unit; Instruction Queue; FP Registers; Load/Store Unit; FP Operations; Operand Buses; Address Unit; Store Buffers; Reservation Stations; Address; Data; Memory Unit; FP Adders; FP Multipliers; Common Data Bus (CDB).]

Structure of Reservation Station
Busy
OpCode
Qj, Qk: as in the scoreboard
Vj, Vk: the values of the two operands; Vj is valid when Qj is zero, and Vk is valid when Qk is zero
A: the target address (TA) or immediate value
Registers have a Qi field as before.
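The fields above map naturally onto a small record. The sketch below is an illustration under assumptions, not the hardware design: the class and method names are hypothetical, `None` stands in for a zero Q field, and a CDB broadcast is modeled as a plain method call. It shows how copying operands at issue and tagging registers with Qi removes WAR and WAW.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of operand j (valid when qj is None)
    vk: Optional[float] = None  # value of operand k (valid when qk is None)
    qj: Optional[str] = None    # station that will produce operand j
    qk: Optional[str] = None    # station that will produce operand k
    a: Optional[int] = None     # target address or immediate (unused here)

    def ready(self):
        return self.busy and self.qj is None and self.qk is None


class Tomasulo:
    def __init__(self, station_names, registers):
        self.stations = {n: ReservationStation(n) for n in station_names}
        self.regs = dict(registers)  # register -> value
        self.qi = {}                 # register -> station that will write it

    def _operand(self, reg):
        # Copy the value if available, otherwise record the producer's tag.
        tag = self.qi.get(reg)
        return (None, tag) if tag is not None else (self.regs[reg], None)

    def issue(self, station, op, dest, src_j, src_k):
        if self.stations[station].busy:
            raise RuntimeError("structural hazard: station is busy")
        vj, qj = self._operand(src_j)
        vk, qk = self._operand(src_k)
        self.stations[station] = ReservationStation(station, True, op,
                                                    vj, vk, qj, qk)
        self.qi[dest] = station  # the latest writer wins, which removes WAW

    def broadcast(self, station, value):
        # CDB: every waiting station and register with this tag takes the value.
        for rs in self.stations.values():
            if rs.qj == station:
                rs.vj, rs.qj = value, None
            if rs.qk == station:
                rs.vk, rs.qk = value, None
        for reg, tag in list(self.qi.items()):
            if tag == station:
                self.regs[reg] = value
                del self.qi[reg]
        self.stations[station] = ReservationStation(station)
```

With DIV.D F3,F1,F2 in one station, SUB.D F5,F6,F3 in a second and ADD.D F3,F6,F7 in a third, the ADD.D can finish and write F3 first: the SUB.D is waiting on the divider's tag rather than on register F3, and the later divide result no longer updates F3.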

Tomasulo’s Example

Instruction status

Instruction   j     k     Issue   Execution complete   Write Result
L.D    F6     34+   R2    1       3                    4
L.D    F2     45+   R3    2                            5
MUL.D  F0           F4            15                   16
SUB.D  F8                         7                    8
DIV.D  F10                        55                   57
ADD.D                     6       10                   11