Review of CS 203A, Laxmi Narayan Bhuyan, Lecture 2

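Review CS 203A - Pipelining

[Figure: pipeline diagram with instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) on the vertical axis and time (clock cycles) on the horizontal axis; each instruction passes through Mem, Reg, ALU, Mem, Reg.]

Structural hazard: with a single memory, the same memory cannot be read twice in the same clock cycle. In a 5-stage pipeline, the Load's data access in cycle 4 coincides with Instr 3's instruction fetch.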
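The conflict can be found mechanically by laying out which stage each instruction occupies in each cycle. The sketch below assumes an ideal 5-stage pipeline (IF, ID, EX, MEM, WB) with no stalls, one unified memory, and that only the Load accesses data memory; `stage_table` and `memory_conflicts` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_table(names):
    """Map each clock cycle to the (instruction, stage) pairs active in it,
    assuming instruction i enters IF in cycle i + 1 and nothing stalls."""
    table = {}
    for i, name in enumerate(names):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + 1 + s, []).append((name, stage))
    return table

def memory_conflicts(names, memory_ops):
    """Cycles in which an instruction fetch (IF) and a data access (MEM of a
    load/store) both need the single shared memory."""
    conflicts = []
    for cycle, uses in sorted(stage_table(names).items()):
        fetches = [n for n, st in uses if st == "IF"]
        accesses = [n for n, st in uses if st == "MEM" and n in memory_ops]
        if fetches and accesses:
            conflicts.append((cycle, accesses[0], fetches[0]))
    return conflicts

program = ["Load", "Instr 1", "Instr 2", "Instr 3", "Instr 4"]
conflicts = memory_conflicts(program, memory_ops={"Load"})
print(conflicts)  # [(4, 'Load', 'Instr 3')]
```

Either separate instruction and data memories (caches) remove the conflict, or the hardware stalls Instr 3's fetch by one cycle.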

Other Hazards
- Data hazards: due to data dependencies
- Control hazards: due to branches
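Data hazards come from the register dependences between an earlier and a later instruction. A minimal sketch that classifies them, assuming each instruction writes one destination register and reads a tuple of sources; the `Instr` type and `data_hazards` function are hypothetical, and the sample instructions are illustrative only.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple

def data_hazards(first, second):
    """Classify the dependences of `second` on the earlier instruction `first`."""
    kinds = []
    if first.dest in second.srcs:
        kinds.append("RAW")  # true dependence: second reads what first writes
    if second.dest in first.srcs:
        kinds.append("WAR")  # antidependence: second overwrites a source of first
    if second.dest == first.dest:
        kinds.append("WAW")  # output dependence: both write the same register
    return kinds

divd = Instr("DIVD F0,F2,F4", "F0", ("F2", "F4"))
addd = Instr("ADDD F10,F0,F8", "F10", ("F0", "F8"))
subd = Instr("SUBD F8,F8,F14", "F8", ("F8", "F14"))
print(data_hazards(divd, addd))  # ['RAW']
print(data_hazards(addd, subd))  # ['WAR']
```

Whether a dependence actually becomes a hazard depends on the pipeline: in an in-order 5-stage pipeline only RAW can occur, while out-of-order machines must also handle WAR and WAW.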

Getting CPI < 1: Issuing Multiple Instructions/Cycle
Superscalar MIPS: 2 instructions per cycle, 1 FP and 1 anything else
- Fetch 64 bits per clock cycle; integer instruction on the left, FP on the right
- Can issue the 2nd instruction only if the 1st instruction issues
- More ports on the FP registers are needed to do an FP load and an FP op in a pair

Type              Pipe stages (one column per clock cycle)
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB
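The pairing rule can be sketched as a simple issue counter. This assumes a simplified rule (an integer instruction immediately followed by an FP instruction issues as a pair; anything else issues alone) and ignores data hazards and stalls; `dual_issue_cycles` is a hypothetical name.

```python
def dual_issue_cycles(kinds):
    """Count issue cycles for a stream of 'int'/'fp' instructions on a 2-way
    superscalar with one integer slot (left) and one FP slot (right).
    Simplified rule (assumption): an int followed by an fp issues as a pair,
    otherwise one instruction issues alone. Hazards are ignored."""
    cycles, i = 0, 0
    while i < len(kinds):
        if kinds[i] == "int" and i + 1 < len(kinds) and kinds[i + 1] == "fp":
            i += 2  # int on the left, FP on the right: both issue
        else:
            i += 1  # no legal pair: issue a single instruction
        cycles += 1
    return cycles

print(dual_issue_cycles(["int", "fp"] * 3))           # 3 cycles for 6 instructions
print(dual_issue_cycles(["int", "int", "fp", "fp"]))  # 3
```

With perfectly paired code the ideal CPI is 6 / 3 = 0.5; any unpaired instruction wastes half of an issue cycle.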

MIPS R4000 Pipeline

Comparison of Issue Capabilities Courtesy of Susan Eggers; Used with Permission

VLIW and Superscalar
- VLIW: a sequential stream of long instruction words
- Instructions are scheduled statically by the compiler
- The number of simultaneously issued instructions is fixed at compile time
- Instruction issue is less complicated than in a superscalar processor
- Disadvantage: VLIW processors cannot react to dynamic events, e.g. cache misses, as flexibly as superscalars.
- The number of instructions in a VLIW instruction word is usually fixed. VLIW instructions must be padded with no-ops when the full issue bandwidth cannot be used, which increases code size. More recent VLIW architectures use a denser code format that allows the no-ops to be removed.
- VLIW is an architectural technique, whereas superscalar is a microarchitectural technique.
- VLIW processors take advantage of spatial parallelism.
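The compile-time scheduling and no-op padding can be sketched with a greedy packer. This assumes every operation has a latency of one bundle, that only true (producer-to-consumer) dependences matter, and a fixed bundle width; `pack_vliw` and the operation names are hypothetical.

```python
def pack_vliw(ops, deps, width, nop="NOP"):
    """Greedy static scheduler: place each op (in program order) into the
    earliest bundle after all of its producers, with at most `width` slots
    per bundle, then pad unused slots with no-ops.
    Assumes every op has a 1-bundle latency."""
    bundle_of = {}
    bundles = []
    for op in ops:
        earliest = max((bundle_of[d] + 1 for d in deps.get(op, ())), default=0)
        b = earliest
        while True:
            while len(bundles) <= b:
                bundles.append([])
            if len(bundles[b]) < width:
                break
            b += 1  # bundle full: try the next one
        bundles[b].append(op)
        bundle_of[op] = b
    return [bundle + [nop] * (width - len(bundle)) for bundle in bundles]

ops = ["ld1", "ld2", "add", "mul", "st"]
deps = {"add": ["ld1", "ld2"], "mul": ["add"], "st": ["mul"]}
code = pack_vliw(ops, deps, width=3)
for bundle in code:
    print(bundle)
print(sum(b.count("NOP") for b in code), "of", 3 * len(code), "slots are no-ops")
```

The dependence chain leaves most slots empty, so 7 of the 12 slots hold no-ops: this is the code-size cost that denser VLIW encodings try to remove.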

Multithreading
How can we guarantee that there are no dependencies between instructions in a pipeline?
- One way is to interleave the execution of instructions from different program threads on the same pipeline (micro context switching)

Interleave 4 threads, T1-T4, on a non-bypassed 5-stage pipe:
T1: LW   r1, 0(r2)
T2: ADD  r7, r1, r4
T3: XORI r5, r4, #12
T4: SW   0(r7), r5
T1: LW   r5, 12(r1)
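Why 4 threads suffice can be checked with a small timing model. This sketch assumes fixed round-robin fetch (one instruction per cycle), registers read in ID (IF + 1) and written in WB (IF + 4), and no split-cycle register file, so a read must come strictly after the write; `schedule` and `safe_without_bypass` are hypothetical names.

```python
def schedule(threads):
    """Fixed round-robin interleaving: thread t's k-th instruction is fetched
    in cycle k * N + t + 1, where N is the number of threads."""
    n = len(threads)
    slots = []
    for t, prog in enumerate(threads):
        for k, instr in enumerate(prog):
            slots.append((k * n + t + 1, t, instr))
    return sorted(slots)

def safe_without_bypass(threads):
    """True if every instruction reads registers in ID (IF + 1) strictly after
    the previous instruction of the same thread wrote back in WB (IF + 4)."""
    last_wb = {}
    for if_cycle, t, _ in schedule(threads):
        if t in last_wb and if_cycle + 1 <= last_wb[t]:
            return False
        last_wb[t] = if_cycle + 4
    return True

threads = [
    ["LW r1, 0(r2)", "LW r5, 12(r1)"],  # T1
    ["ADD r7, r1, r4"],                 # T2
    ["XORI r5, r4, #12"],               # T3
    ["SW 0(r7), r5"],                   # T4
]
print(safe_without_bypass(threads))      # True: 4 threads hide the 5-stage latency
print(safe_without_bypass(threads[:2]))  # False: T1's 2nd LW would read r1 too early
```

Each thread has its own register file, so T2's r1 is not T1's r1; the only dependences that matter are within a thread, and the interleaving spaces those instructions 4 cycles apart.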

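HW Schemes: Instruction Parallelism
Out-of-order execution divides the ID stage into two steps:
1. Issue: decode instructions and check for structural hazards. Issue in order if the functional unit is free and there is no WAW hazard.
2. Read operands (RO): wait until there are no data hazards, then read the operands.
For example, an ADDD that depends on a long-running earlier instruction would stall at RO, while an independent SUBD behind it could proceed with no stalls.
Scoreboards allow an instruction to execute whenever steps 1 and 2 hold, without waiting for prior instructions. (What about WAR hazards?)

[Figure: IF, then Issue, then separate paths RO, EX 1 ... EX m / RO, EX 1 ... EX n / RO, EX 1 ... EX p, each followed by WB]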
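The two checks can be sketched as a tiny scoreboard that tracks busy functional units and pending register writes. This is a sketch only: it omits the WAR check at write-back and all timing, and it assumes SUBD runs on a second add unit ("adder2") so that it does not conflict structurally with ADDD. The classes and unit names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(eq=False)  # compare by identity: two identical instructions are distinct
class Instr:
    text: str
    fu: str      # functional unit this instruction needs
    dest: str
    srcs: tuple

class Scoreboard:
    """Tracks only the state needed for the Issue and Read-operands checks."""

    def __init__(self):
        self.busy_units = set()  # functional units currently occupied
        self.pending = {}        # register -> in-flight instruction that writes it

    def can_issue(self, ins):
        # Step 1: a busy unit (structural) or a pending dest (WAW) blocks issue
        return ins.fu not in self.busy_units and ins.dest not in self.pending

    def issue(self, ins):
        assert self.can_issue(ins)
        self.busy_units.add(ins.fu)
        self.pending[ins.dest] = ins

    def can_read_operands(self, ins):
        # Step 2 (RAW): no *other* in-flight instruction still writes a source;
        # a source that is not pending defaults to `ins` and therefore passes
        return all(self.pending.get(s, ins) is ins for s in ins.srcs)

    def write_result(self, ins):
        # Free the unit and the destination (WAR check omitted in this sketch)
        self.busy_units.discard(ins.fu)
        if self.pending.get(ins.dest) is ins:
            del self.pending[ins.dest]

sb = Scoreboard()
divd = Instr("DIVD F0,F2,F4", "divider", "F0", ("F2", "F4"))
addd = Instr("ADDD F10,F0,F8", "adder", "F10", ("F0", "F8"))
subd = Instr("SUBD F12,F8,F14", "adder2", "F12", ("F8", "F14"))  # assumed 2nd adder
for ins in (divd, addd, subd):
    sb.issue(ins)
ready_before = [sb.can_read_operands(i) for i in (divd, addd, subd)]
print(ready_before)                   # [True, False, True]: ADDD waits at RO
sb.write_result(divd)
ready_after = sb.can_read_operands(addd)
print(ready_after)                    # True once DIVD writes F0
```
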

FP unit and load-store unit using Tomasulo’s algorithm (figure)

Four Steps of the Speculative Tomasulo Algorithm
1. Issue: get an instruction from the FP Op Queue. If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”).
2. Execution: operate on operands (EX). When both operands are ready, execute; if not, watch the CDB for the result. Execution starts once both operands are in the reservation station, which resolves RAW hazards (this stage is sometimes called “issue”).
3. Write result: finish execution (WB). Write the result on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
4. Commit: update the register with the reorder buffer result. When an instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called “graduation”).
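The four steps can be sketched as a small software model. The assumptions are loud: unlimited reservation stations and ROB entries, only ADD/SUB/MUL, execution triggered explicitly by calling `execute`, branches with no operands that are resolved by calling `resolve_branch`, and no stores. The class and method names are hypothetical; this illustrates the idea rather than modeling any real machine.

```python
import operator
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

@dataclass
class RobEntry:
    op: str
    dest: Optional[str]
    value: Optional[float] = None
    ready: bool = False
    mispredicted: bool = False

@dataclass
class RsEntry:
    op: str
    tag: int     # ROB entry that will receive the result
    vals: list   # operand values (None while waiting)
    tags: list   # ROB tags still awaited on the CDB (None once captured)

class SpeculativeTomasulo:
    def __init__(self, regs):
        self.regs = dict(regs)    # committed (architectural) register file
        self.rat = {}             # register -> ROB tag of its newest producer
        self.rob = OrderedDict()  # tag -> RobEntry, kept in program order
        self.rs = []              # occupied reservation stations
        self.next_tag = 0

    def issue(self, op, dest=None, *srcs):
        """Step 1: allocate a ROB entry (and an RS), capture or tag operands."""
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        vals, tags = [], []
        for s in srcs:
            producer = self.rat.get(s)
            if producer is None:              # value is in the register file
                vals.append(self.regs[s]); tags.append(None)
            elif self.rob[producer].ready:    # result is waiting in the ROB
                vals.append(self.rob[producer].value); tags.append(None)
            else:                             # must watch the CDB for this tag
                vals.append(None); tags.append(producer)
        self.rob[tag] = RobEntry(op, dest)
        if op in OPS:
            self.rs.append(RsEntry(op, tag, vals, tags))
        if dest is not None:
            self.rat[dest] = tag              # rename dest to this ROB entry
        return tag

    def execute(self, tag):
        """Steps 2-3: fire once both operands are present, then broadcast."""
        rs = next(r for r in self.rs if r.tag == tag)
        if any(t is not None for t in rs.tags):
            raise RuntimeError("operands not ready (RAW): keep watching the CDB")
        self.rs.remove(rs)                    # free the reservation station
        self.broadcast(tag, OPS[rs.op](*rs.vals))

    def broadcast(self, tag, value):
        """Common Data Bus: deliver the result to the ROB and every waiting RS."""
        self.rob[tag].value, self.rob[tag].ready = value, True
        for r in self.rs:
            for i, t in enumerate(r.tags):
                if t == tag:
                    r.vals[i], r.tags[i] = value, None

    def resolve_branch(self, tag, mispredicted):
        self.rob[tag].ready = True
        self.rob[tag].mispredicted = mispredicted

    def commit(self):
        """Step 4: retire only the ROB head; younger results stay speculative."""
        if not self.rob:
            return None
        tag, entry = next(iter(self.rob.items()))
        if not entry.ready:
            return None
        del self.rob[tag]
        if entry.mispredicted:                # squash everything younger
            self.rob.clear(); self.rs.clear(); self.rat.clear()
            return "flush"
        if entry.dest is not None:
            self.regs[entry.dest] = entry.value
            if self.rat.get(entry.dest) == tag:
                del self.rat[entry.dest]
        return entry.op

cpu = SpeculativeTomasulo({"F2": 2.0, "F4": 3.0, "F6": 10.0})
mul = cpu.issue("MUL", "F0", "F2", "F4")    # F0 <- F2 * F4
add = cpu.issue("ADD", "F8", "F0", "F6")    # RAW on F0: waits for tag `mul`
br = cpu.issue("BR")                        # predicted branch
spec = cpu.issue("SUB", "F10", "F6", "F2")  # issued speculatively past the branch
cpu.execute(spec)                           # finishes early, out of order
cpu.execute(mul)                            # CDB broadcast wakes up `add`
cpu.execute(add)
cpu.resolve_branch(br, mispredicted=True)
results = [cpu.commit() for _ in range(4)]
print(results)   # ['MUL', 'ADD', 'flush', None]
print(cpu.regs)  # F10 never reaches the register file
```

The SUB completes first but cannot commit ahead of the branch at the ROB head, so the misprediction discards its result without ever touching the architectural registers.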