Winter 2002CSE 141 - Topic Branch Hazards in the Pipelined Processor.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

COMP381 by M. Hamdi 1 (Recap) Pipeline Hazards. COMP381 by M. Hamdi 2 I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r3 and r6,r1,r7 or r8,r1,r9 xor r10,r1,r11.
1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Pipelining and Control Hazards Oct
Lecture Objectives: 1)Define branch prediction. 2)Draw a state machine for a 2 bit branch prediction scheme 3)Explain the impact on the compiler of branch.
Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
1 A few words about the quiz Closed book, but you may bring in a page of handwritten notes. –You need to know what the “core” MIPS instructions do. –I.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
EECS 470 Pipeline Hazards Lecture 4 Coverage: Appendix A.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Goal: Reduce the Penalty of Control Hazards
L17 – Pipeline Issues 1 Comp 411 – Fall /1308 CPU Pipelining Issues Finishing up Chapter 6 This pipe stuff makes my head hurt! What have you been.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Appendix A Pipelining: Basic and Intermediate Concepts
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Abstraction Question General purpose processors have an abstraction layer fixed at the ISA and have little control over the compilers or code run on the.
CS3350B Computer Architecture Winter 2015 Lecture 6.2: Instructional Level Parallelism: Hazards and Resolutions Marc Moreno Maza
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Automobile Manufacturing 1. Build frame. 60 min. 2. Add engine. 50 min. 3. Build body. 80 min. 4. Paint. 40 min. 5. Finish.45 min. 275 min. Latency: Time.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
CMPE 421 Parallel Computer Architecture
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
Computer Architecture and Design – ELEN 350 Part 8 [Some slides adapted from M. Irwin, D. Paterson. D. Garcia and others]
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
CECS 440 Pipelining.1(c) 2014 – R. W. Allison [slides adapted from D. Patterson slides with additional credits to M.J. Irwin]
Cs 152 L1 3.1 DAP Fa97,  U.CB Pipelining Lessons °Pipelining doesn’t help latency of single task, it helps throughput of entire workload °Multiple tasks.
Chap 6.1 Computer Architecture Chapter 6 Enhancing Performance with Pipelining.
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
2/15/02CSE Data Hazzards Data Hazards in the Pipelined Implementation.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
1/24/ :00 PM 1 of 86 Pipelining Chapter 6. 1/24/ :00 PM 2 of 86 Overview of Pipelining Pipelining is an implementation technique in which.
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell CS352H: Computer Systems Architecture Topic 9: MIPS Pipeline.
HazardsCS510 Computer Architectures Lecture Lecture 7 Pipeline Hazards.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)
CMPE 421 Parallel Computer Architecture Part 3: Hardware Solution: Control Hazard and Prediction.
Introduction to Computer Organization Pipelining.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
CSCE 212 Chapter 6 Enhancing Performance with Pipelining Instructor: Jason D. Bakos.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
Computer Organization CS224
Stalling delays the entire pipeline
Single Clock Datapath With Control
Pipeline Implementation (4.6)
Chapter 4 The Processor Part 4
Chapter 4 The Processor Part 3
Morgan Kaufmann Publishers The Processor
Pipelining review.
The processor: Pipelining and Branching
Pipelining in more detail
Chapter Six.
The Processor Lecture 3.6: Control Hazards
Control unit extension for data hazards
CSC3050 – Computer Architecture
Control unit extension for data hazards
Control unit extension for data hazards
Presentation transcript:

Winter 2002CSE Topic Branch Hazards in the Pipelined Processor

CSE Topic2 Dependences Data dependence: one instruction is dependent on another instruction to provide its operands. Control dependence (aka branch dependences): one instructions determines whether another gets executed or not. Control dependences are particularly critical with conditional branches. add $5, $3, $2 sub $6, $5, $2 beq $6, $7, somewhere and $9, $3, $1 data dependences control dependence

CSE Topic3 Branch Hazards Branch dependences can result in branch hazards (aka control hazards) when they are too close to be handled correctly in the pipeline.

CSE Topic4 When are branches resolved? Instruction Fetch Instruction DecodeExecute/ Address Calculation Memory AccessWrite Back Branch target address is put in PC during Mem stage. Correct instruction is fetched during branch’s WB stage.

CSE Topic5 Branch Hazards IMReg ALU DMReg IMReg ALU DM IMReg ALU DMRegIMReg ALU DMRegIMReg ALU DMReg CC1CC2CC3CC4CC5CC6CC7CC8 beq $2, $1, here here: lw... sub... lw... add... These instructions shouldn’t be executed! Finally, the right instruction

CSE Topic6 Dealing With Branch Hazards Software solution –insert no-ops (I don’t think any processors do this) Hardware solutions –stall until you know which direction branch goes –guess which direction, start executing chosen path (but be prepared to undo any mistakes!) static branch prediction: base guess on instruction type dynamic branch prediction: base guess on execution history –reduce the branch delay Software/hardware solution –delayed branch: Always execute instruction after branch. Compiler puts something useful (or a no-op) there.

CSE Topic7 Stalling for Branch Hazards beq $4, $0, there and $12, $2, $5 or... add... sw... IMReg DMReg IMReg IMReg DM IMReg DMReg IMReg DMReg CC1CC2CC3CC4CC5CC6CC7CC8 Bubble

CSE Topic8 Stalling for Branch Hazards All branches waste 3 cycles. –Seems wasteful, particularly when the branch isn’t taken. It’s better to guess whether branch will be taken –Easiest guess is “branch isn’t taken”

CSE Topic9 Assume Branch Not Taken beq $4, $0, there and $12, $2, $5 or... add... sw... IMReg DMReg IMReg IMReg DM IMReg DMReg IMReg DMReg CC1CC2CC3CC4CC5CC6CC7CC8 works pretty well when you’re right – no wasted cycles

CSE Topic10 Assume Branch Not Taken beq $4, $0, there and $12, $2, $5 or... add... there: sub $12, $4, $2 IMReg IMReg IM Reg IMReg DMReg CC1CC2CC3CC4CC5CC6CC7CC8 Flush same performance as stalling when you’re wrong Whew! none of these instruction have changed memory or registers.

CSE Topic11 Some other static strategies 1.Assume backwards branch is always taken, forward branch never is –“backwards” = negative displacement field –loops (which branch backwards) are usually executed multiple times. –“if-then-else” often takes the “then” (no branch) clause. 2.Compiler makes educated guess –sets “predict taken/not taken” bit in instruction

CSE Topic12 Reducing the Branch Delay it’s easy to reduce stall to 2-cycles

CSE Topic13 Reducing the Branch Delay it’s easy to reduce stall to 2-cycles

One-cycle branch misprediction penalty Target computation & equality check in ID phase. –This figure also shows flushing hardware.

CSE Topic15 Stalling for Branch Hazards with branching in ID stage beq $4, $0, there and $12, $2, $5 or... add... sw... IMReg DMReg IMReg IMReg DM IMReg DMReg IMReg DMReg CC1CC2CC3CC4CC5CC6CC7CC8 Bubble

CSE Topic16 Eliminating the Branch Stall There’s no rule that says we have to branch immediately. We could wait an extra instruction before branching. The original SPARC and MIPS processors used a branch delay slot to eliminate single- cycle stalls after branches. The instruction after a conditional branch is always executed in those machines, whether the branch is taken or not!

CSE Topic17 Branch Delay Slot beq $4, $0, there and $12, $2, $5 there: xor... add... sw... IMReg DMReg IMReg IMReg DM IMReg DMReg IMReg DMReg CC1CC2CC3CC4CC5CC6CC7CC8 Branch delay slot instruction (next instruction after a branch) is executed even if the branch is taken.

CSE Topic18 Filling the branch delay slot The branch delay slot is only useful if you can find something to put there. –Need earlier instruction that doesn’t affect the branch If you can’t find anything, you must put a nop to insure correctness. Worked well for early RISC machines. –Doesn’t help recent processors much –E.g. MIPS R10000, has a 5-cycle branch penalty, and executes 4 instructions per cycle. Meanwhile, delayed branch is a permanent part of the ISA.

CSE Topic19 Branch Prediction Static branch prediction isn’t good enough when mispredicted branches waste 10 or 20 instructions. Dynamic branch prediction keeps a brief history of what happened at each branch.

CSE Topic20 Branch Prediction program counter for (i=0;i<10;i++) {... }... add $i, $i, #1 beq $i, #10, loop This ‘1’ bit means, “the last time the program counter ended with 0100 and a beq instruction was seen, the branch was taken.” Hardware guesses it will be taken again Branch history table

CSE Topic21 Two-bit predictors are even better (Branch prediction is a hot research topic) This one means, “the last two branches at this location were not taken.” this state means, “the last two branches at this location were taken.”

CSE Topic22 Branch Hazards -- Key Points Branch (or control) hazards arise because we must fetch the next instruction before we know if we are branching or not. Branch hazards are detected in hardware. We can reduce the impact of branch hazards through: –computing branch target and testing early –branch delay slots –branch prediction – static or dynamic

CSE Topic23 Computer of the Day 1963: Seymour Cray’s CDC 6600 –First supercomputer. 10 MHz clock. (Individual transistors!) –Also first Register-Register (i.e. Load-Store) ISA machine. –10 multicycle functional units in the “Central” processor float + (4 cycle), 2 float x ’s (10 cyc), float divide (29 cyc), assorted boolean & integer units (most 3 cyc), branch (9 cyc) Unrelated instructions can be executed concurrently. –10 “Peripheral & Control” processors for I/O –60-bit words, 15-bit 3-address instructions (also has 30-bit inst’s) –60-bit general registers, plus 18-bit address & index regs –8 word instruction cache (no data cache) 28 or fewer instructions in loop for peak speed Programmer’s goal – provably optimal code

CSE Topic24 Quiz #2 You did well... Top quartile: 34 (out of 40) Median: 31.5 Third quartile: 27 I still grade on a curve... but average is about “B” Nobody got #6 right! –Yes, you can eliminate a MUX on the register write port –Yes, you need a MUX on the second register read port –But how do you set this MUX on the 2 nd cycle? If you choose “ rt ”, then you can’t execute R-type in 4 cycles. If you choose “ rd ”, then you can’t execute beq in 3 cycles. If you make it depend on instruction, you slow down control !!