Throughput = #instructions per unit time (seconds/cycles etc.)

Presentation transcript:

Throughput = #instructions per unit time (seconds/cycles etc.)
The unit of time is determined by the units used for the denominator: cycles -> instructions/cycle, seconds -> instructions/second.
Throughput of an unpipelined machine = 1 / (time per instruction)
  Time per instruction = pipeline depth * time to execute a single stage
  The time to execute a single stage can be rewritten as:
  Time to execute a single stage = (time per instruction on unpipelined machine) / (pipeline depth)
Throughput of a pipelined machine = 1 / (time to execute a single stage), assuming all stages take the same time
Deriving the throughput equation for a pipelined machine:
  Throughput = (pipeline depth) / (time per instruction on unpipelined machine)
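
The two throughput formulas can be checked with a few lines of Python. This is only a sketch of the ideal case (equal stage times, no hazards); the function names and the 10 ns / 5-stage figures are made up for illustration and are not from the slides.

```python
def unpipelined_throughput(time_per_instruction):
    """Instructions per unit time when each instruction runs to completion alone."""
    return 1.0 / time_per_instruction


def pipelined_throughput(time_per_instruction, pipeline_depth):
    """Ideal pipelined throughput: one instruction completes every stage time."""
    stage_time = time_per_instruction / pipeline_depth
    return 1.0 / stage_time


# Hypothetical numbers: 10 ns per instruction unpipelined, 5 equal stages.
base = unpipelined_throughput(10.0)
piped = pipelined_throughput(10.0, 5)
print(base, piped, piped / base)
```

In this ideal model the ratio of the two throughputs equals the pipeline depth.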

Physics of Clock Skew
Basically caused because the clock edge reaches different parts of the chip at different times.
Capacitance charge/discharge rates
  All wires, leads, transistors, etc. have capacitance
  Longer wire, larger capacitance
  Repeaters are used to drive current and handle fan-out problems
  For a fixed I, the rate of change of V is inversely proportional to C
  Time to charge/discharge adds to delay
  Dominant problem at old integration densities
  For a fixed C, the rate of change of V is proportional to I
  Problem with this approach (driving more current): power requirements go up, and power dissipation becomes a problem
Speed-of-light propagation delays
  Dominate at current integration densities, as capacitances are now much lower
  But clock rates are now much faster (even small delays consume a large part of the clock cycle)
Current-day research -> asynchronous chip designs
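
The two proportionality statements about C, I, and V both follow from the ideal capacitor relation. The block below is a reminder under a simplifying assumption (a single lumped capacitance charged by a constant current), not a model taken from the slides.

```latex
I = C\,\frac{dV}{dt}
\quad\Longrightarrow\quad
\frac{dV}{dt} = \frac{I}{C},
\qquad
t_{\text{charge}} \approx \frac{C\,\Delta V}{I}
```

Under that assumption the charge time grows with C and shrinks with I, which is why driving more current speeds up the edge at the cost of power.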

Return to pipelining: It's Not That Easy for Computers
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  Structural hazards: HW cannot support this combination of instructions (single person to fold and put clothes away)
  Data hazards: instruction depends on the result of a prior instruction still in the pipeline (missing sock)
  Control hazards: pipelining of branches and other instructions that change the PC
Common solution is to stall the pipeline until the hazard is resolved, inserting one or more "bubbles" in the pipeline

Speedup = (average instruction time unpipelined) / (average instruction time pipelined)
Remember that average instruction time = CPI * clock cycle
And the ideal CPI for a pipelined machine is 1.
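
A small sketch of the speedup formula, with stalls added on top of the ideal CPI of 1. The parameter names and the sample values (an unpipelined CPI of 5, a 1 ns clock for both machines, 0.25 stall cycles per instruction) are assumptions chosen for illustration.

```python
def average_instruction_time(cpi, clock_cycle):
    """Average instruction time = CPI * clock cycle."""
    return cpi * clock_cycle


def pipeline_speedup(cpi_unpipelined, cycle_unpipelined,
                     stall_cycles_per_instruction, cycle_pipelined):
    """Speedup of a pipelined machine whose CPI is 1 plus its stall cycles."""
    cpi_pipelined = 1.0 + stall_cycles_per_instruction
    unpipelined = average_instruction_time(cpi_unpipelined, cycle_unpipelined)
    pipelined = average_instruction_time(cpi_pipelined, cycle_pipelined)
    return unpipelined / pipelined


# Hypothetical machine: 5 cycles per instruction unpipelined, same 1 ns clock.
print(pipeline_speedup(5.0, 1.0, 0.0, 1.0))   # no stalls
print(pipeline_speedup(5.0, 1.0, 0.25, 1.0))  # 0.25 stall cycles per instruction
```

With no stalls the speedup equals the unpipelined CPI; every stall cycle per instruction pulls it below that.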

Structural Hazards
Overlapped execution of instructions:
  Pipelining of functional units
  Duplication of resources
Structural hazard
  When the pipeline cannot accommodate some combination of instructions
Consequences
  Stall
  Increase of CPI from its ideal value (1)

Pipelining of Functional Units
[Figure: the IF ID EX MEM WB pipeline with an FP multiply unit of stages M1 to M5, drawn three ways: fully pipelined, partially pipelined, and not pipelined.]

To pipeline or Not to pipeline
Elements to consider
  Effects of pipelining and duplicating units
    Increased costs
    Higher latency (pipeline register overhead)
  Frequency of structural hazard
Example: unpipelined FP multiply unit in DLX
  Latency: 5 cycles
  Impact on mdljdp2 program?
    Frequency of FP instructions: 14%
    Depends on the distribution of FP multiplies
      Best case: uniform distribution
      Worst case: clustered, back-to-back multiplies
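
The best and worst cases can be bounded with a simple model. The sketch below assumes that each back-to-back multiply waits latency - 1 extra cycles for the unpipelined unit, and that in the uniform case multiplies are spaced 1/frequency instructions apart; both are modelling assumptions, and it treats the 14% figure as the multiply frequency.

```python
def worst_case_cpi(fp_mul_fraction, latency):
    """All multiplies clustered back to back: each one adds latency - 1 stall cycles."""
    return 1.0 + fp_mul_fraction * (latency - 1)


def best_case_cpi(fp_mul_fraction, latency):
    """Multiplies evenly spaced: stalls occur only if the spacing is shorter than the latency."""
    spacing = 1.0 / fp_mul_fraction          # average instructions between multiplies
    extra = max(0.0, latency - spacing)      # cycles the next multiply would have to wait
    return 1.0 + fp_mul_fraction * extra


print(worst_case_cpi(0.14, 5))
print(best_case_cpi(0.14, 5))
```

With a 14% frequency the multiplies are about 7 instructions apart, which is longer than the 5-cycle latency, so the uniform case adds no stalls, while the clustered case raises CPI to about 1.56.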

Resource Duplication
[Figure: pipeline diagram with rows Load, Inst 1, Inst 2, Stall, Inst 3, each passing through the M, Reg, and ALU stages.]


Three Generic Data Hazards
InstrI followed by InstrJ
Read After Write (RAW)
  InstrJ tries to read operand before InstrI writes it

Three Generic Data Hazards
InstrI followed by InstrJ
Write After Read (WAR)
  InstrJ tries to write operand before InstrI reads it
  Gets wrong operand
  Can't happen in MIPS 5 stage pipeline because:
    All instructions take 5 stages, and
    Reads are always in stage 2, and
    Writes are always in stage 5

Three Generic Data Hazards
InstrI followed by InstrJ
Write After Write (WAW)
  InstrJ tries to write operand before InstrI writes it
  Leaves wrong result (InstrI not InstrJ)
  Can't happen in DLX 5 stage pipeline because:
    All instructions take 5 stages, and
    Writes are always in stage 5
Will see WAR and WAW in later, more complicated pipes

Examples in more complicated pipelines
WAW - write after write
  LW  R1, 0(R2)     IF  ID  EX  M1  M2  WB
  ADD R1, R2, R3        IF  ID  EX  WB
WAR - write after read
  SW  0(R1), R2     IF  ID  EX  M1  M2  WB
  ADD R2, R3, R4        IF  ID  EX  WB
This is a problem if register writes are during the first half of the cycle and reads during the second half.

Data Hazards
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the sequence below; every instruction after the ADD reads R1.]
  ADD R1, R2, R3
  SUB R4, R1, R5
  AND R6, R1, R7
  OR  R8, R1, R9
  XOR R10, R1, R11

Forwarding
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the same sequence, with forwarding.]
  ADD R1, R2, R3
  SUB R4, R1, R5
  AND R6, R1, R7
  OR  R8, R1, R9
  XOR R10, R1, R11

Stalls in spite of forwarding
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the sequence below, where the load result is needed by the next instruction.]
  LW  R1, 0(R2)
  SUB R4, R1, R5
  AND R6, R1, R7
  OR  R8, R1, R9

Pipeline Interlocks
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) for the load sequence, with a bubble inserted after the LW.]
  LW  R1, 0(R2)     IF    ID    EX    MEM   WB
  SUB R4, R1, R5          IF    ID    stall EX    MEM   WB
  AND R6, R1, R7                IF    stall ID    EX    MEM   WB
  OR  R8, R1, R9                      stall IF    ID    EX    MEM   WB
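
The stall table can be regenerated with a short simulation. This is a sketch under stated assumptions: a 5-stage in-order pipeline with full forwarding, where the only interlock is a load followed immediately by an instruction that reads the loaded register. The tuple layout and the name `pipeline_rows` are invented for the example.

```python
def pipeline_rows(program):
    """Return (text, first_cycle, cells) per instruction, with 'stall' bubbles.

    program: list of (text, dest, srcs, is_load) tuples in program order.
    """
    rows = []
    prev = None
    for text, dest, srcs, is_load in program:
        if prev is None:
            start, if_c, id_c, ex_c = 1, 1, 2, 3
        else:
            start = prev["IF"] + 1       # cycle in which this row begins
            if_c = prev["ID"]            # fetch while the predecessor decodes
            id_c = prev["EX"]            # decode while the predecessor executes
            load_use = prev["is_load"] and prev["dest"] in srcs
            ex_c = prev["EX"] + (2 if load_use else 1)
        stage_at = {if_c: "IF", id_c: "ID", ex_c: "EX",
                    ex_c + 1: "MEM", ex_c + 2: "WB"}
        cells = [stage_at.get(c, "stall") for c in range(start, ex_c + 3)]
        rows.append((text, start, cells))
        prev = {"IF": if_c, "ID": id_c, "EX": ex_c,
                "dest": dest, "is_load": is_load}
    return rows


PROGRAM = [
    ("LW  R1, 0(R2)", "R1", ("R2",), True),
    ("SUB R4, R1, R5", "R4", ("R1", "R5"), False),
    ("AND R6, R1, R7", "R6", ("R1", "R7"), False),
    ("OR  R8, R1, R9", "R8", ("R1", "R9"), False),
]

for text, start, cells in pipeline_rows(PROGRAM):
    print(f"{text:<18}" + " " * 6 * (start - 1) + "".join(f"{c:<6}" for c in cells))
```

Only the SUB has a true load-use dependence; the AND and OR rows show a stall simply because they sit behind the SUB in the pipeline.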

Software Scheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
  LW  Rb,b
  LW  Rc,c
  ADD Ra,Rb,Rc
  SW  a,Ra
  LW  Re,e
  LW  Rf,f
  SUB Rd,Re,Rf
  SW  d,Rd

Fast code:
  LW  Rb,b
  LW  Rc,c
  LW  Re,e
  ADD Ra,Rb,Rc
  LW  Rf,f
  SW  a,Ra
  SUB Rd,Re,Rf
  SW  d,Rd

Effect of Software Scheduling
[Figure: pipeline timing diagrams for the two code sequences; the cycle alignment and stall positions did not survive extraction.]

Slow code:
  LW  Rb,b        IF ID EX MEM WB
  LW  Rc,c        IF ID EX MEM WB
  ADD Ra,Rb,Rc    IF ID EX MEM WB
  SW  a,Ra        IF ID EX MEM WB
  LW  Re,e        IF ID EX MEM WB
  LW  Rf,f        IF ID EX MEM WB
  SUB Rd,Re,Rf    IF ID EX MEM WB
  SW  d,Rd        IF ID EX MEM WB

Fast code:
  LW  Rb,b        IF ID EX MEM WB
  LW  Rc,c        IF ID EX MEM WB
  LW  Re,e        IF ID EX MEM WB
  ADD Ra,Rb,Rc    IF ID EX MEM WB
  LW  Rf,f        IF ID EX MEM WB
  SW  a,Ra        IF ID EX MEM WB
  SUB Rd,Re,Rf    IF ID EX MEM WB
  SW  d,Rd        IF ID EX MEM WB
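
To see what the scheduling buys, the load-use stalls in the slow and fast sequences can be counted directly. The sketch assumes a 5-stage pipeline with forwarding, where a load followed immediately by a consumer of the loaded register costs one stall cycle and nothing else stalls; the tuple encoding of each instruction is made up for the example.

```python
def load_use_stalls(program):
    """Count stalls from a load immediately followed by a user of its result."""
    stalls = 0
    for (p_op, p_dest, _), (_, _, c_srcs) in zip(program, program[1:]):
        if p_op == "LW" and p_dest in c_srcs:
            stalls += 1
    return stalls


def total_cycles(program, depth=5):
    """Cycles to drain the pipeline: fill time plus one per instruction plus stalls."""
    return (depth - 1) + len(program) + load_use_stalls(program)


# (opcode, destination register, source registers)
SLOW = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print(load_use_stalls(SLOW), total_cycles(SLOW))
print(load_use_stalls(FAST), total_cycles(FAST))
```

Under these assumptions the slow sequence stalls twice (before the ADD and before the SUB) and the fast sequence not at all, since each load is separated from its consumer by at least one other instruction.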

Compiler Scheduling
Eliminates load interlocks
Demands more registers
Simple scheduling
  Basic block (sequential segment of code)
  Good for simple pipelines
Percentage of loads that result in a stall
  FP: 13%
  Int: 25%
