CSC 4250 Computer Architectures October 13, 2006 Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation


CPI Equation Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
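The equation can be checked with a trivial helper that adds up the stall components. The numbers below are assumed for illustration only; they are not taken from the slides.

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal pipeline CPI plus the average stall
    cycles per instruction from each hazard class."""
    return ideal_cpi + structural + data_hazard + control

# Assumed example: ideal CPI of 1, no structural stalls, 0.2 data
# hazard stalls and 0.15 control stalls per instruction on average.
cpi = pipeline_cpi(ideal_cpi=1.0, structural=0.0, data_hazard=0.2, control=0.15)
print(cpi)  # ~1.35
```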

Three Types of Dependences
1. Data dependences (also called true data dependences)
2. Name dependences
3. Control dependences

Data Dependences An instruction j is data dependent on instruction i if either of the following holds:
- Instruction i produces a result that may be used by instruction j, or
- Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
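The definition above can be sketched as a small dependence scan. This is a hypothetical helper, not from the slides: it records direct dependences by matching each source register against its most recent writer; the transitive case in the second bullet follows by chaining these direct edges.

```python
def data_dependences(instrs):
    """instrs: list of (dest, [sources]) per instruction, in program
    order; dest is None for instructions with no register result.
    Returns (i, j) pairs where j directly uses a result of i."""
    deps = []
    last_writer = {}                  # register -> index of latest writer
    for j, (dest, sources) in enumerate(instrs):
        for src in sources:
            if src in last_writer:
                deps.append((last_writer[src], j))
        if dest is not None:
            last_writer[dest] = j
    return deps

# Register operands of the loop body used on the next slide:
loop = [
    ("F0", ["R1"]),                   # L.D    F0,0(R1)
    ("F4", ["F0", "F2"]),             # ADD.D  F4,F0,F2
    (None, ["F4", "R1"]),             # S.D    F4,0(R1)
    ("R1", ["R1"]),                   # DADDUI R1,R1,#-8
    (None, ["R1", "R2"]),             # BNE    R1,R2,Loop
]
print(data_dependences(loop))         # [(0, 1), (1, 2), (3, 4)]
```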

Example of Data Dependences

  Loop: L.D     F0,0(R1)     ; F0 = array element
        ADD.D   F4,F0,F2     ; add scalar in F2
        S.D     F4,0(R1)     ; store result
        DADDUI  R1,R1,#-8    ; decrement pointer 8 bytes
        BNE     R1,R2,Loop   ; branch if R1 != R2

The data dependences involve both FP data (F0 and F4) and integer data (R1).

Name Dependences A name dependence occurs when two instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name

Example of Name Dependences

  Loop: L.D     F0,0(R1)     ; F0 = array element
        ADD.D   F4,F0,F2     ; add scalar in F2
        MUL.D   F0,F4,F6     ; multiply by scalar in F6
        SUB.D   F4,F0,F8     ; subtract scalar in F8
        S.D     F4,0(R1)     ; store result
        DADDUI  R1,R1,#-8    ; decrement pointer 8 bytes
        BNE     R1,R2,Loop   ; branch if R1 != R2

There are name dependences in F0 between ADD.D and MUL.D, in F4 between MUL.D and SUB.D, in F0 between L.D and MUL.D, and in F4 between ADD.D and SUB.D.

Two Types of Name Dependences Instruction i precedes instruction j in program order:
- An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value.
- An output dependence occurs when instruction i and instruction j write the same register or memory location. The ordering between the instructions must be preserved to ensure that the value finally written corresponds to instruction j.

Example of Name Dependences

  Loop: L.D     F0,0(R1)     ; F0 = array element
        ADD.D   F4,F0,F2     ; add scalar in F2
        MUL.D   F0,F4,F6     ; multiply by scalar in F6
        SUB.D   F4,F0,F8     ; subtract scalar in F8
        S.D     F4,0(R1)     ; store result
        DADDUI  R1,R1,#-8    ; decrement pointer 8 bytes
        BNE     R1,R2,Loop   ; branch if R1 != R2

Which are the antidependences? Which are the output dependences? Which are the true data dependences?

Register Renaming Since a name dependence is not a true dependence, instructions involved in a name dependence can execute simultaneously or be reordered, provided the name (register or memory location) used in the instructions is changed so that the instructions do not conflict. For register operands this renaming is easily done: register renaming. IBM 360 computer family: only four double-precision FP registers! F0, F2, F4, F6.
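A compiler-style version of this idea can be sketched in a few lines. This is an assumed illustration, not the slides' hardware mechanism: every write gets a fresh temporary name, and later reads are rewritten to use the newest name, which removes WAR and WAW conflicts while preserving true dependences.

```python
from itertools import count

def rename(instrs):
    """instrs: list of (op, dest, [sources]); dest is None for
    instructions with no register result. Returns the renamed list."""
    fresh = count()
    current = {}                          # architectural name -> newest temp
    out = []
    for op, dest, sources in instrs:
        srcs = [current.get(s, s) for s in sources]   # read newest names
        if dest is not None:
            current[dest] = f"T{next(fresh)}"         # fresh name per write
            out.append((op, current[dest], srcs))
        else:
            out.append((op, None, srcs))
    return out

# The register-renaming example from the slides:
code = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]
for ins in rename(code):
    print(ins)
```

As in the slides' hand renaming with S and T, the destinations of ADD.D and SUB.D receive new names, and MUL.D reads SUB.D's renamed result, so the WAR and WAW hazards disappear.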

Pipeline Data Hazards A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to the operand involved in the dependence. We must preserve the program order.

Three Types of Data Hazards Instruction i precedes instruction j in program order:
1. RAW: j tries to read a source before i writes it, so j may incorrectly get the old value. This is the most common hazard and corresponds to a true data dependence.
2. WAW: j tries to write an operand before it is written by i, so the operand may incorrectly end up with the value written by i. This hazard corresponds to an output dependence.
3. WAR: j tries to write a destination before it is read by i, so i may incorrectly get the new value. This hazard arises from an antidependence.
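The three cases can be expressed directly as set intersections over the registers each instruction reads and writes. This is an assumed helper for illustration, not from the slides.

```python
def classify_hazards(i_writes, i_reads, j_writes, j_reads):
    """i precedes j in program order; each argument is a set of
    register names. Returns the set of possible hazard types."""
    hazards = set()
    if i_writes & j_reads:
        hazards.add("RAW")     # j reads what i writes (true dependence)
    if i_writes & j_writes:
        hazards.add("WAW")     # both write the same name (output dependence)
    if i_reads & j_writes:
        hazards.add("WAR")     # j overwrites what i still reads (antidependence)
    return hazards

# ADD.D F6,F0,F8 followed by SUB.D F8,F10,F14: a WAR hazard on F8.
print(classify_hazards({"F6"}, {"F0", "F8"}, {"F8"}, {"F10", "F14"}))
```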

Example of Register Renaming (1)

  DIV.D   F0,F2,F4
  ADD.D   F6,F0,F8
  S.D     F6,0(R1)
  SUB.D   F8,F10,F14
  MUL.D   F6,F10,F8

There is an antidependence between ADD.D and SUB.D, and an output dependence between ADD.D and MUL.D, leading to two possible hazards: a WAR hazard on the use of F8 by ADD.D, and a WAW hazard since ADD.D may finish later than MUL.D. There are also three true data dependences: between DIV.D and ADD.D, between SUB.D and MUL.D, and between ADD.D and S.D.

Example of Register Renaming (2) Using two temporary registers S and T, the code can be rewritten without any name dependences:

  DIV.D   F0,F2,F4
  ADD.D   S,F0,F8
  S.D     S,0(R1)
  SUB.D   T,F10,F14
  MUL.D   F6,F10,T

- F6 in ADD.D is now S, eliminating the output dependence between ADD.D and MUL.D
- F8 in SUB.D is now T, eliminating the antidependence between ADD.D and SUB.D
- All subsequent uses of F8 must be replaced by T

Control Dependences A control dependence determines the ordering of an instruction with respect to a branch instruction so that the instruction is executed in correct program order and only when it should be. There are two constraints imposed by control dependences:
1. An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch.
2. An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch.

Violating Control Dependences Control dependence is not a critical property that must be preserved. We may be willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program. The two properties critical to program correctness, normally preserved by maintaining both data and control dependences, are exception behavior and data flow.

Speculation Consider the code:

            DADDU   R1,R2,R3
            BEQZ    R12,skipnext
            DSUBU   R4,R5,R6
            DADDU   R5,R4,R9
  skipnext: OR      R7,R8,R9

Suppose we know that the register destination R4 of DSUBU will be unused after the instruction labeled skipnext. Then changing the value of R4 just before the branch will not affect data flow, since R4 will be dead (rather than live) in the code region after skipnext. Thus, if R4 were dead and DSUBU could not generate an exception, we could move DSUBU before the branch. This type of scheduling is called speculation, since the compiler is betting on the branch outcome; in this case, the bet is that the branch will not be taken.
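The "R4 is dead" condition the compiler must establish can be sketched as a simple forward scan over the instructions from skipnext onward. This is an assumed representation, and the trailing instructions in the example below are hypothetical continuations of the code, added only to exercise both outcomes.

```python
def is_dead(reg, instrs_after):
    """instrs_after: list of (dest, [sources]) in program order,
    starting at the point of interest. reg is dead there if it is
    redefined (or never touched) before any instruction reads it."""
    for dest, sources in instrs_after:
        if reg in sources:
            return False       # read before any redefinition: live
        if dest == reg:
            return True        # rewritten first: the old value is dead
    return True                # never used again

# From skipnext: OR R7,R8,R9, then (hypothetically) R4 is overwritten.
print(is_dead("R4", [("R7", ["R8", "R9"]), ("R4", ["R5", "R6"])]))  # True
# If some later instruction read R4 first, R4 would be live instead.
print(is_dead("R4", [("R7", ["R8", "R9"]), ("R2", ["R4"])]))        # False
```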

Dynamic Scheduling Using Tomasulo's Algorithm CDC 6600 (1964): scoreboarding; 16 separate functional units. IBM 360/91 (1967): four double-precision FP registers; one FP adder and one FP multiplier. Tomasulo invented a scheme to reduce structural hazards (using reservation stations), RAW hazards (using tags), and WAW and WAR hazards (using register renaming).

Register Renaming in Tomasulo’s Scheme Register renaming is provided by the reservation stations, which buffer the operands of instructions waiting to issue, and by the issue logic. The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register. In addition, pending instructions designate the reservation station that will provide their input. Finally, when successive writes to a register overlap in execution, only the last one is actually used to update the register.

Reservation Stations Eliminate Hazards As instructions are issued, the register specifiers for pending operands are renamed to the names of the reservation stations; this provides register renaming. Since there can be more reservation stations than registers, the technique can eliminate hazards arising from name dependences that cannot be eliminated by a compiler. We will see how renaming occurs and how it eliminates WAR and WAW hazards.

Two Properties of Hardware Hazard detection and execution control are distributed: The information held in the reservation stations at each FU determines when an instruction can begin execution at that unit. Results are passed directly to FUs from the reservation stations where they are buffered, rather than going through the registers. This bypassing is done with a common data bus (CDB) that allows all units waiting for an operand to be loaded simultaneously.
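The tag-and-broadcast mechanism can be sketched with a toy model. This is a highly simplified, assumed illustration of the issue and CDB steps, not the actual IBM 360/91 design: a source operand is either a value (already available) or a tag naming the reservation station that will produce it, and a CDB broadcast lets every waiting station capture the result at once.

```python
class ReservationStation:
    def __init__(self, name):
        self.name = name              # tag broadcast on the CDB
        self.vj = self.vk = None      # operand values, once available
        self.qj = self.qk = None      # tags of the producing stations
        self.busy = False

def issue(rs, sources, reg_status, reg_file):
    """Rename both source operands at issue: copy the value if it is
    ready, otherwise record the tag of the station that will produce it."""
    rs.busy = True
    def operand(src):
        if src in reg_status:                 # a station will write src
            return None, reg_status[src]
        return reg_file[src], None            # value already in the registers
    (rs.vj, rs.qj), (rs.vk, rs.qk) = operand(sources[0]), operand(sources[1])

def broadcast(stations, tag, value):
    """Common data bus: every station waiting on `tag` captures the
    result simultaneously, clearing the tag."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None

# An adder station waits on F0 from a multiplier; F2 is already ready.
add1 = ReservationStation("Add1")
issue(add1, ["F0", "F2"], reg_status={"F0": "Mult1"}, reg_file={"F2": 3.5})
broadcast([add1], "Mult1", 7.0)
print(add1.vj, add1.vk)   # 7.0 3.5
```

Once both q fields are clear, the station has all its operands buffered and the instruction can begin execution at its functional unit, which is exactly the distributed hazard detection described above.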

MIPS FP Unit Using Tomasulo's Algorithm