Kosarev Nikolay MIPT Apr, 2010

Presentation transcript:

Dynamic scheduling
Kosarev Nikolay, MIPT, Apr 2010

Agenda
  In-order execution
  Out-of-order execution. Tomasulo's algorithm
  Implementation in hardware
  Demo
  Hardware speculation

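In-order execution
[Pipeline diagram]
  DIV R1 = R2, R3
Example with a RAW hazard on R1:
  DIV R1 = R2, R3
  ADD R9 = R1, R4
  SUB R8 = R4, R5
Example with a WAW hazard on R1 (the code itself is not meaningful, because the DIV result is overwritten before anything reads it):
  DIV R1 = R2, R3
  ADD R1 = R2, R4
  SUB R6 = R1, R5
Data hazards: RAW, WAW. No WAR.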

Out-of-order execution
Split ID into 2 stages:
  Issue (IS): decode, check for structural hazards
  Read operands (RO): wait until there are no data hazards, then read operands
[Pipeline diagram]
Out-of-order execution implies out-of-order completion (WB)
Hazards: RAW, WAW, WAR
  DIV R0 = R2, R4
  ADD R6 = R0, R8
  SUB R8 = R10, R14
  MUL R6 = R10, R8
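
The three hazard kinds in the example above can be found mechanically. The following is a minimal sketch, not part of the original slides: the `Instr` tuple and the `hazards` function are hypothetical names, and the check is deliberately simplified (it compares every pair of instructions and does not account for intervening writes).

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dst: str      # destination register
    srcs: tuple   # source registers


def hazards(program):
    """Return (kind, earlier_index, later_index, register) for every pair.

    Simplification: all pairs are compared, so a dependence that is
    shadowed by an intervening write would still be reported.
    """
    found = []
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            if earlier.dst in later.srcs:      # later reads what earlier writes
                found.append(("RAW", i, j, earlier.dst))
            if earlier.dst == later.dst:       # both write the same register
                found.append(("WAW", i, j, earlier.dst))
            if later.dst in earlier.srcs:      # later overwrites what earlier reads
                found.append(("WAR", i, j, later.dst))
    return found


# The four-instruction example from the slide.
program = [
    Instr("DIV", "R0", ("R2", "R4")),
    Instr("ADD", "R6", ("R0", "R8")),
    Instr("SUB", "R8", ("R10", "R14")),
    Instr("MUL", "R6", ("R10", "R8")),
]

for kind, i, j, reg in hazards(program):
    print(f"{kind} on {reg}: {program[i].op} (#{i}) -> {program[j].op} (#{j})")
```

For this program the sketch reports a RAW on R0 (DIV, ADD), a WAR on R8 (ADD, SUB), a WAW on R6 (ADD, MUL) and a RAW on R8 (SUB, MUL).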

Tomasulo's algorithm
How are data hazards avoided?
  RAW: wait until the operands are available
  WAR, WAW: register renaming
Before renaming:
  DIV R0 = R2, R4
  ADD R6 = R0, R8
  ADD R9 = R6, R1
  SUB R8 = R10, R14
  MUL R6 = R10, R8
After renaming (A and B are temporary registers):
  DIV R0 = R2, R4
  ADD A = R0, R8
  ADD R9 = A, R1
  SUB B = R10, R14
  MUL R6 = R10, B
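
Register renaming can be sketched in a few lines. This is an illustration under stated assumptions, not the slide's exact scheme: the hypothetical `rename` function gives every destination a fresh name (P0, P1, ...), whereas the slide renames only the two registers that cause conflicts (A and B). The effect is the same: only RAW dependences remain.

```python
def rename(program):
    """Give every written register a fresh name; rewrite later reads to match.

    program: list of (op, dst, srcs) tuples in program order.
    The names P0, P1, ... are hypothetical physical registers.
    """
    mapping = {}   # architectural register -> its current renamed register
    renamed = []
    for counter, (op, dst, srcs) in enumerate(program):
        # Sources use the newest name; registers never written keep their own.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dst = f"P{counter}"
        mapping[dst] = new_dst
        renamed.append((op, new_dst, new_srcs))
    return renamed


# The five-instruction example from the slide.
program = [
    ("DIV", "R0", ("R2", "R4")),
    ("ADD", "R6", ("R0", "R8")),
    ("ADD", "R9", ("R6", "R1")),
    ("SUB", "R8", ("R10", "R14")),
    ("MUL", "R6", ("R10", "R8")),
]

for op, dst, srcs in rename(program):
    print(f"{op} {dst} = {', '.join(srcs)}")
```

After renaming, no two instructions write the same register (no WAW), and no instruction writes a register that an earlier instruction reads (no WAR).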

Implementation in HW

Demo
Tomasulo's algorithm for dynamic scheduling
  LD F6 = R2, 2
  LD F2 = R3, 4
  MUL F0 = F2, F4
  SUB F8 = F2, F6
  DIV F10 = F0, F6
  ADD F6 = F8, F2
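
The timing of the demo program can be approximated with a small dataflow model. This is a rough sketch, not a full Tomasulo simulator: it assumes one instruction issued per cycle, unlimited reservation stations and functional units, no contention for the result bus, and made-up latencies (the `LATENCY` table is an assumption, as is the `schedule` function name). Only RAW dependences delay an instruction, because each source is bound to its producer at issue time, which is the effect of renaming.

```python
# Assumed execution latencies in cycles (not taken from the slides).
LATENCY = {"LD": 2, "ADD": 2, "SUB": 2, "MUL": 10, "DIV": 40}


def schedule(program):
    """Return (op, dst, issue, start, done) cycles for each instruction."""
    last_writer = {}  # register -> index of the newest instruction writing it
    finish = []       # completion cycle of each instruction so far
    rows = []
    for i, (op, dst, srcs) in enumerate(program):
        issue = i + 1  # in-order issue, one instruction per cycle
        # Bind each source to its producer at issue time (renaming effect).
        producers = [last_writer[s] for s in srcs if s in last_writer]
        operands_ready = max((finish[p] for p in producers), default=0)
        start = max(issue, operands_ready) + 1
        done = start + LATENCY[op] - 1
        finish.append(done)
        last_writer[dst] = i
        rows.append((op, dst, issue, start, done))
    return rows


# The demo program; immediates are omitted from the source lists.
program = [
    ("LD", "F6", ("R2",)),
    ("LD", "F2", ("R3",)),
    ("MUL", "F0", ("F2", "F4")),
    ("SUB", "F8", ("F2", "F6")),
    ("DIV", "F10", ("F0", "F6")),
    ("ADD", "F6", ("F8", "F2")),
]

for op, dst, issue, start, done in schedule(program):
    print(f"{op:3} {dst:3} issue={issue} start={start} done={done}")
```

Under these assumptions ADD completes in cycle 8, long before DIV starts in cycle 15, even though DIV reads the old value of F6. DIV was bound to the first LD as its producer when it issued, so the later write to F6 cannot disturb it.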

Hardware speculation
Based on 3 key ideas:
  Dynamic branch prediction
  Speculative execution
  Dynamic scheduling
Extra stage: instruction commit
New buffer: ROB (reorder buffer)
[Pipeline diagram]
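
The role of the commit stage and the reorder buffer can be illustrated with a queue. This is a minimal sketch under simplifying assumptions: the function name `simulate_rob`, the instruction labels and the completion order are all hypothetical, and branch misprediction recovery (flushing the buffer) is not modeled. Instructions finish execution in any order but leave the buffer strictly in program order.

```python
from collections import deque


def simulate_rob(program_order, completion_order):
    """Return a log of (finished_instruction, instructions_committed_now)."""
    rob = deque(program_order)  # head of the queue is the oldest instruction
    done = set()                # instructions that have finished executing
    log = []
    for finished in completion_order:
        done.add(finished)
        committed = []
        # Commit from the head only while the oldest instruction is done.
        while rob and rob[0] in done:
            committed.append(rob.popleft())
        log.append((finished, committed))
    return log


program_order = ["LD1", "LD2", "MUL", "SUB", "DIV", "ADD"]
# Assumed out-of-order completion order, for illustration only.
completion_order = ["LD1", "LD2", "SUB", "ADD", "MUL", "DIV"]

for finished, committed in simulate_rob(program_order, completion_order):
    print(f"{finished} finished, committed: {committed}")
```

In this run SUB and ADD finish early but wait in the buffer: SUB commits only after MUL, and ADD only after DIV.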

Hardware speculation

Demo Reorder buffer