Csci 136 Computer Architecture II – Superscalar and Dynamic Pipelining Xiuzhen Cheng



Announcement
- Homework assignment #11, due by April 8
  Reading: Section 6.8
  Problems: 6.30 – 6.31
- Project #3 is due on April 10, 2004
- Final: Tuesday, May 4th, 11:00 AM to 1:00 PM
- Note: you must pass the final to pass this course!

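SW is in EX Stage
[Figure: datapath fragment with the sign-extend unit; the instruction ahead of sw is labeled "R-type or lw" in one case and "R-type" in the other]
Forward from MEM/WB (producer is an R-type or lw):
  ID/EX.MemWrite and MEM/WB.RegWrite
  and MEM/WB.RegisterRd = ID/EX.RegisterRt
  and EX/MEM.RegisterRd != ID/EX.RegisterRt
  and MEM/WB.RegisterRd != 0
Forward from EX/MEM (producer is an R-type):
  ID/EX.MemWrite and EX/MEM.RegWrite
  and EX/MEM.RegisterRd = ID/EX.RegisterRt
  and EX/MEM.RegisterRd != 0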
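The two conditions above can be written as a small decision function. The following is a minimal Python sketch, not the lecture's implementation: the `PipelineRegs` class, its field names, and the returned labels (`"EX/MEM"`, `"MEM/WB"`, `"REG"`) are hypothetical names introduced here to mirror the pipeline-register fields on the slide.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Hypothetical snapshot of the pipeline-register fields named on the slide."""
    id_ex_mem_write: bool   # ID/EX.MemWrite (the instruction in EX is sw)
    id_ex_rt: int           # ID/EX.RegisterRt (register holding the store data)
    ex_mem_reg_write: bool  # EX/MEM.RegWrite
    ex_mem_rd: int          # EX/MEM.RegisterRd
    mem_wb_reg_write: bool  # MEM/WB.RegWrite
    mem_wb_rd: int          # MEM/WB.RegisterRd


def sw_forward_select(r: PipelineRegs) -> str:
    """Pick the source of the store data while sw is in the EX stage."""
    if not r.id_ex_mem_write:
        return "REG"  # not a store: nothing to decide here
    # Producer is one instruction ahead (now in MEM): forward from EX/MEM.
    if r.ex_mem_reg_write and r.ex_mem_rd != 0 and r.ex_mem_rd == r.id_ex_rt:
        return "EX/MEM"
    # Producer is two instructions ahead (now in WB): forward from MEM/WB,
    # unless a newer value of the same register sits in EX/MEM.
    if (r.mem_wb_reg_write and r.mem_wb_rd != 0
            and r.mem_wb_rd == r.id_ex_rt
            and r.ex_mem_rd != r.id_ex_rt):
        return "MEM/WB"
    return "REG"  # no hazard: use the value read from the register file


# add $2, ... immediately followed by sw $2, ...
print(sw_forward_select(PipelineRegs(True, 2, True, 2, False, 0)))
```

Register `$0` is excluded in both cases because it always reads as zero, so there is never a fresh value to forward.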

The Big Picture: Where are We Now?
The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
Current Topics: Superscalar and Dynamic Pipelining

Is a Faster Processor Possible?
Pipelining can potentially provide CPI = 1. Is it possible to design a faster processor? Yes:
- Superpipelining: longer pipelines
  Divide the washer into 3 machines: wash, rinse, spin
- Superscalar: replicate the internal components of the computer so that it can launch multiple instructions per clock cycle (CC)
  Buy 3 washers, 3 dryers, etc.
- Dynamic pipelining: use hardware to avoid pipeline hazards
  Out-of-order execution is possible
  More complicated pipeline control and instruction execution model

Issuing Multiple Instructions/Cycle
Two main variations: Superscalar and VLIW
- Superscalar: varying number of instructions per cycle (1 to 6)
  Parallelism and dependencies determined/resolved by HW
  Examples: IBM PowerPC 604, Sun UltraSparc, DEC Alpha 21164, HP 7100
- Very Long Instruction Word (VLIW): fixed number of instructions (16), parallelism determined by the compiler
  Pipeline is exposed; the compiler must schedule delays to get the right result
- Explicitly Parallel Instruction Computing (EPIC) / Intel
  128-bit packets containing 3 instructions (can execute sequentially)
  Can link 128-bit packets together to allow more parallelism
  Compiler determines parallelism; HW checks dependencies and forwards/stalls

Superscalar MIPS
Assume two instructions are issued per clock cycle:
- One ALU operation or branch
- One memory access instruction
Instruction type             Pipe stages
ALU or branch instruction    IF  ID  EX  MEM WB
Load or store instruction    IF  ID  EX  MEM WB
ALU or branch instruction        IF  ID  EX  MEM WB
Load or store instruction        IF  ID  EX  MEM WB
ALU or branch instruction            IF  ID  EX  MEM WB
Load or store instruction            IF  ID  EX  MEM WB
ALU or branch instruction                IF  ID  EX  MEM WB
Load or store instruction                IF  ID  EX  MEM WB

Additional Hardware Requirements
- Instructions must be paired and aligned
- Extra ports in the register file to serve 2 instructions
- Separate adder for lw/sw address computation
- What will happen for load-use instructions?

Simple Superscalar Example
How would this loop be scheduled on a superscalar pipeline for MIPS?
Loop: lw   $t0, 0($s1)
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      addi $s1, $s1, -4
      bne  $s1, $zero, Loop
Reorder the instructions to avoid as many pipeline stalls as possible.
Solution hints:
- Identify the instructions with data dependencies; these cannot be reordered relative to each other!
- Identify the load-use instructions that require pipeline stalls
- Is there any performance improvement (in CPI)?
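As a rough illustration of the last hint, the Python sketch below tallies the CPI of one candidate two-issue schedule for this loop. The schedule is an assumption offered for illustration, not the lecture's official answer: it moves `addi` above `sw` (so the store offset becomes 4) and leaves one cycle between `lw` and `addu` for the load-use delay. The variable names are hypothetical.

```python
# Each cycle is a pair: (ALU/branch slot, load/store slot); None marks an empty slot.
# Candidate schedule (an assumption, not the lecture's stated solution).
schedule = [
    (None,                    "lw   $t0, 0($s1)"),
    ("addi $s1, $s1, -4",     None),
    ("addu $t0, $t0, $s2",    None),
    ("bne  $s1, $zero, Loop", "sw   $t0, 4($s1)"),
]

# Count the real instructions issued and the cycles they occupy.
instructions = sum(1 for cycle in schedule for slot in cycle if slot is not None)
cycles = len(schedule)
cpi = cycles / instructions

print(f"{instructions} instructions in {cycles} cycles, CPI = {cpi:.2f}")
print("ideal two-issue CPI = 0.50")
```

Under these assumptions the loop issues 5 instructions in 4 cycles, which is better than the single-issue ideal of CPI = 1 but well short of the two-issue ideal of 0.5, because only one cycle fills both slots.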

Loop Unrolling
Purpose: to get more performance improvement out of loops
Idea: schedule multiple copies of the loop body together
The previous example: assume the loop index is a multiple of 4
What is the performance improvement?
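To make the idea concrete, here is a small Python sketch that models the loop from the previous slide and a version unrolled four times, then compares the results and the number of instructions executed. It is only a model under stated assumptions: memory is a dictionary keyed by byte address, `$s1` starts at a multiple of 16, and the function names `run_original` and `run_unrolled` are hypothetical.

```python
def run_original(mem, s1, s2):
    """Original loop: lw, addu, sw, addi, bne (5 instructions per element)."""
    executed = 0
    while True:
        t0 = mem[s1]      # lw   $t0, 0($s1)
        t0 = t0 + s2      # addu $t0, $t0, $s2
        mem[s1] = t0      # sw   $t0, 0($s1)
        s1 -= 4           # addi $s1, $s1, -4
        executed += 5     # the bne is counted here as well
        if s1 == 0:       # bne  $s1, $zero, Loop falls through
            return executed


def run_unrolled(mem, s1, s2):
    """Loop unrolled 4 times: 1 addi, 4 lw, 4 addu, 4 sw, 1 bne (14 instructions)."""
    executed = 0
    while True:
        s1 -= 16                        # addi $s1, $s1, -16
        for offset in (16, 12, 8, 4):   # four copies of lw / addu / sw
            mem[s1 + offset] += s2
        executed += 14
        if s1 == 0:                     # bne $s1, $zero, Loop falls through
            return executed


# Eight words at byte addresses 4, 8, ..., 32 (assumed layout).
base = {addr: addr // 4 for addr in range(4, 36, 4)}
mem_a, mem_b = dict(base), dict(base)
count_a = run_original(mem_a, 32, 10)
count_b = run_unrolled(mem_b, 32, 10)
print(mem_a == mem_b, count_a, count_b)
```

Unrolling removes three of every four `addi`/`bne` pairs, so the instruction count drops from 40 to 28 here. If the 14 instructions of one unrolled iteration can be packed into 8 two-issue cycles (an assumption about the schedule), the CPI would be 8/14, or about 0.57.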

Dynamic Pipeline Scheduling
The hardware performs the “scheduling”:
- Hardware tries to find instructions to execute
- Out-of-order execution is possible
- Speculative execution and dynamic branch prediction
Basic idea:
- Dynamic pipeline scheduling (DPS) tries to find later instructions to execute while waiting for a stall to be resolved
- The pipeline is divided into 3 major units:
  Instruction fetch and issue unit: IF, ID
  Execute unit: 5 to 10 independent functional units
  Commit unit: determines when to put the result back into a register or memory
- In-order completion vs. out-of-order completion
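The toy Python model below illustrates the three-unit idea: instructions start executing as soon as their operands are ready (possibly out of order), while the commit step retires them strictly in program order. It is a sketch under simplifying assumptions that are not from the lecture: unlimited functional units, register renaming (so only true dependencies matter), no branches, and made-up latencies. The `Instr` class and `simulate` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read
    latency: int             # execution cycles (assumed values)


def simulate(program):
    """Return (start, finish, commit) cycle lists for a straight-line program."""
    n = len(program)
    # For each instruction, find the earlier instructions that produce its sources.
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append([last_writer[s] for s in ins.srcs if s in last_writer])
        if ins.dest is not None:
            last_writer[ins.dest] = i

    start, finish = [None] * n, [None] * n
    cycle = 0
    while any(s is None for s in start):
        cycle += 1
        for i in range(n):
            ready = all(finish[p] is not None and finish[p] < cycle
                        for p in producers[i])
            if start[i] is None and ready:
                start[i] = cycle                      # may be out of program order
                finish[i] = cycle + program[i].latency - 1

    # Commit unit: retire in program order, no earlier than the cycle after finishing.
    commit, previous = [], 0
    for i in range(n):
        previous = max(finish[i] + 1, previous)
        commit.append(previous)
    return start, finish, commit


program = [
    Instr("lw   $t0, 0($s1)",   "$t0", ("$s1",),       3),  # slow load (assumed)
    Instr("addu $t1, $t0, $s2", "$t1", ("$t0", "$s2"), 1),  # waits for the load
    Instr("sub  $t2, $s3, $s4", "$t2", ("$s3", "$s4"), 1),  # independent
    Instr("or   $t3, $t2, $s5", "$t3", ("$t2", "$s5"), 1),  # depends on sub only
]
start, finish, commit = simulate(program)
for ins, s, c in zip(program, start, commit):
    print(f"{ins.text:22} starts in cycle {s}, commits in cycle {c}")
```

In this run the `sub` and `or` instructions execute while `addu` is still waiting for the load, yet nothing commits before the load does, which is the in-order completion case from the slide.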

Basic Idea

Summary
- All modern processors are very complicated
  DEC Alpha 21264: 9-stage pipeline, 6 instructions in parallel, 4 instructions per CC
  PowerPC and Pentium/Itanium: branch history table, dynamic pipelining
- Compiler technology is important
- Combining dynamic pipelining with branch prediction is very challenging
  The commit unit must know how to “roll back”, i.e., discard instructions when a prediction is wrong
- Dynamic execution is based on prediction:
  Hide memory latency
  Avoid stalls
  Execute instructions while waiting for hazards to be resolved

Exercise 6.20
lw $2, 100($5)
sw $2, 200($6)
In which stage should forwarding be done? How about hazard detection?

Forwarding Unit in EX Stage
[Figure: forwarding multiplexer (Mux) with inputs 0 and 1]
Conditions?

Forwarding Unit in MEM Stage
Is it possible? YES
Steps:
- Change the control unit so that RegDst is valid and selects ID/EX.RegisterRt for the sw instruction, even though sw does not require it
- Add a multiplexer to the write port of the data memory
- What conditions must the forwarding unit check to generate the selector signal?
[Figure labels: RegDst, RegisterRt, Mux]
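One plausible answer to the question above can be sketched in Python. This is an assumption, not the lecture's stated solution: with RegDst selecting Rt for sw, the register-number field carried in EX/MEM holds the store's Rt, so the forwarding unit can compare it against the destination of the instruction in WB. The function name and parameter names are hypothetical.

```python
def forward_store_data_in_mem(ex_mem_mem_write: bool,
                              ex_mem_rd: int,
                              mem_wb_reg_write: bool,
                              mem_wb_rd: int) -> bool:
    """Return True when the memory write-data mux should take the WB-stage value.

    ex_mem_rd is assumed to carry sw's Rt, because RegDst was set to select
    ID/EX.RegisterRt for sw in the previous stage.
    """
    return (ex_mem_mem_write           # the instruction in MEM is a store
            and mem_wb_reg_write       # the instruction in WB writes a register
            and mem_wb_rd != 0         # never forward register $0
            and mem_wb_rd == ex_mem_rd)


# lw $2, 100($5) is in WB while sw $2, 200($6) is in MEM.
print(forward_store_data_in_mem(True, 2, True, 2))
```

With this path in place, a load immediately followed by a store of the loaded register would not need a stall, since the store data is only required in the MEM stage.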

Hazard Detection
Conditions?
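As a starting point for this question, the Python sketch below adapts the usual load-use stall test to a pipeline that can forward store data in the MEM stage. It is a hedged sketch, not the lecture's answer: the function `must_stall`, its parameters, and the `mem_stage_store_forwarding` switch are hypothetical, and the test is conservative for instructions in ID that do not actually read Rt.

```python
def must_stall(id_ex_mem_read: bool,
               id_ex_rt: int,
               if_id_rs: int,
               if_id_rt: int,
               if_id_is_store: bool,
               mem_stage_store_forwarding: bool = True) -> bool:
    """Decide whether the instruction in ID must stall behind a load in EX."""
    if not id_ex_mem_read:
        return False                 # the instruction in EX is not a load
    if id_ex_rt == if_id_rs:
        return True                  # Rs is needed in EX (ALU operand or address base)
    if id_ex_rt == if_id_rt:
        # A store needs Rt only as write data in MEM, so the stall can be
        # skipped when MEM-stage forwarding is available (assumption).
        return not (if_id_is_store and mem_stage_store_forwarding)
    return False


# lw $2, 100($5) in EX, sw $2, 200($6) in ID.
print(must_stall(True, 2, 6, 2, True))
print(must_stall(True, 2, 6, 2, True, mem_stage_store_forwarding=False))
```

The first call reports no stall because the loaded value can reach the store in MEM; the second shows the same pair stalling when only EX-stage forwarding exists.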

Questions?