CS152 – Computer Architecture and Engineering, Lecture 12 – Pipeline Wrap up: Control Hazards, RAW/WAR/WAW (Fall 2004, © UC Regents)


1 CS 152 L12 RAW Hazards, Interrupts (1), Fall 2004 © UC Regents
CS152 – Computer Architecture and Engineering
Lecture 12 – Pipeline Wrap up: Control Hazards, RAW/WAR/WAW
2004-10-07
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
Dave Patterson (www.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs152/

2 Pipelining Review
What makes it easy
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
Hazards limit performance
– Structural: need more HW resources
– Data: need forwarding, compiler scheduling
Data hazards must be handled carefully
MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)

3 Outline
Pipelined Control
Control Hazards
RAW, WAR, WAW
Brainstorm on pipeline bugs

4 MIPS Pipeline Data / Control Paths
[Figure: five-stage MIPS datapath (PC, Instruction Memory, Register File, Sign Extend, ALU, Data Memory) with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. The Control unit produces RegWrite, MemWrite, MemRead, MemtoReg, RegDst, ALUOp, ALUSrc, Branch, and PCSrc; the control fields carried down the pipeline are labeled EX, MEM, and WB.]

5 MIPS Pipeline Data / Control Paths (debug)
[Figure: the same five-stage datapath, with each of the ID/EX, EX/MEM, and MEM/WB pipeline registers labeled with Control and Instr fields for the EX, MEM, and WB stages.]

6 MIPS Pipeline Control (pipelined debug)
[Figure: the same five-stage datapath, with Instr and Control fields shown in the pipeline registers feeding the EX, MEM, and WB stages.]

7 Control Hazards
Control hazards arise when the flow of instruction addresses is not what the pipeline expects; they are incurred by change-of-flow instructions
– Conditional branches ( beq, bne )
– Unconditional branches ( j )
Possible solutions
– Stall
– Move decision point earlier in the pipeline
– Predict
– Delay decision (requires compiler support)
Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards

8 Datapath Branch and Jump Hardware
[Figure: five-stage MIPS datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, Control, ALU cntrl, and a Forward Unit, before the branch and jump hardware is added.]

9 Datapath Branch and Jump Hardware
[Figure: the same datapath with branch and jump hardware added: a Shift left 2 and Add for the branch target, a mux selected by Branch / PCSrc, and a second Shift left 2 combined with PC+4[31-28] and a mux selected by Jump for the jump target.]

10 Administrivia
Finish Lab 3; meet with TA Friday
Midterm Tue Oct 12, 5:30 - 8:30, in 101 Morgan
– Northwest corner of campus, near Arch and Hearst
– Midterm review Sunday Oct 10, 7 PM, 306 Soda
– Bring 1 page, handwritten notes, both sides
– Nothing electronic: no calculators, cell phones, pagers, …
– Meet at LaVal’s Northside afterwards for Pizza

11 Jumps Incur One Stall
[Figure: pipeline timing diagram (IM, Reg, ALU, DM, Reg) for the instruction order j, lw, and, with one stall.]
Jumps are not decoded until ID, so one stall is needed
Fortunately, jumps are very infrequent – only 2% of the SPECint instruction mix

12 Review: Branches Incur Three Stalls
[Figure: pipeline timing diagram (IM, Reg, ALU, DM, Reg) for the instruction order beq, lw, and, with stalls after the beq.]
Can fix the branch hazard by waiting (stalling), but this hurts throughput

13 Moving Branch Decisions Earlier in Pipe
Move the branch decision hardware back to the EX stage
– Reduces the number of stall cycles to two
– Adds an AND gate and a 2x1 mux to the EX timing path
Add hardware to compute the branch target address and evaluate the branch decision to the ID stage
– Reduces the number of stall cycles to one (like with jumps)
– Computing the branch target address can be done in parallel with the RegFile read (done for all instructions, only used when needed)
– Comparing the registers can’t be done until after the RegFile read, so comparing and updating the PC adds a comparator, an AND gate, and a 3x1 mux to the ID timing path
– Need forwarding hardware in the ID stage
For longer pipelines, decision points are later in the pipeline, incurring more stalls, so we need a better solution
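The stall counts above (three with the decision in MEM, two in EX, one in ID) translate directly into a CPI cost. The following is a minimal sketch, assuming every branch stalls until it is resolved and a base CPI of 1; the 15% branch fraction is a hypothetical instruction mix chosen for illustration, not a number from the lecture.

```python
# Stall cycles per branch, keyed by the stage where the branch is resolved
# (values taken from the slides: MEM -> 3, EX -> 2, ID -> 1).
STALLS_BY_STAGE = {"MEM": 3, "EX": 2, "ID": 1}


def effective_cpi(branch_fraction, decision_stage, base_cpi=1.0):
    """CPI when every branch stalls until resolved (no prediction, no delay slot)."""
    return base_cpi + branch_fraction * STALLS_BY_STAGE[decision_stage]


if __name__ == "__main__":
    # branch_fraction=0.15 is a hypothetical mix, used only to compare the stages.
    for stage in ("MEM", "EX", "ID"):
        print(stage, round(effective_cpi(0.15, stage), 2))
```

Under these assumptions, each stage the decision moves earlier removes one stall cycle per branch, which is why the remaining slides look for ways to avoid paying even the last one.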

14 Early Branch Forwarding Issues
Bypass of source operands from the EX/MEM
    if (IDcontrol.Branch
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == IF/ID.RegisterRs))
            ForwardC = 1
    if (IDcontrol.Branch
        and (EX/MEM.RegisterRd != 0)
        and (EX/MEM.RegisterRd == IF/ID.RegisterRt))
            ForwardD = 1
Forwards the result from the second previous instr. to either input of the Compare
MEM/WB dependency also needs to be forwarded
If the instruction 2 before the branch is a load, then a stall will be required since the MEM stage memory access is occurring at the same time as the ID stage branch compare operation
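The two EX/MEM bypass conditions can be written as a small executable check. This is only a sketch of the slide's pseudocode: the `PipelineState` container and its field names are hypothetical stand-ins for the pipeline-register fields, and the MEM/WB forwarding path and the load stall mentioned on the slide are not modeled.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Hypothetical snapshot of the fields the slide's conditions read."""
    id_branch: bool   # IDcontrol.Branch
    ex_mem_rd: int    # EX/MEM.RegisterRd
    if_id_rs: int     # IF/ID.RegisterRs
    if_id_rt: int     # IF/ID.RegisterRt


def early_branch_forwarding(s):
    """Return (ForwardC, ForwardD) for the ID-stage branch comparator."""
    # Register $0 is never forwarded: it always reads as zero.
    writes_real_reg = s.ex_mem_rd != 0
    forward_c = s.id_branch and writes_real_reg and s.ex_mem_rd == s.if_id_rs
    forward_d = s.id_branch and writes_real_reg and s.ex_mem_rd == s.if_id_rt
    return int(forward_c), int(forward_d)


if __name__ == "__main__":
    # A branch in ID compares $1 and $2; the instruction in MEM wrote $1.
    print(early_branch_forwarding(PipelineState(True, 1, 1, 2)))
```

In the example, only the Rs input of the comparator needs the bypassed value, so ForwardC is set and ForwardD is not.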

15 Branch Prediction
Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
1. Predict not taken: always predict that branches will not be taken and continue to fetch from the sequential instruction stream; the pipeline stalls only when a branch is taken
– If taken, flush the instructions in the pipeline after the branch
  – in IF, ID, and EX if the branch logic is in MEM (three stalls)
  – in IF if the branch logic is in ID (one stall)
– Ensure that the flushed instructions haven’t changed machine state; this is automatic in the MIPS pipeline since machine-state-changing operations are at the tail end of the pipeline (MemWrite or RegWrite)
– Restart the pipeline at the branch destination

16 Flushing with Misprediction (Not Taken)
[Figure: pipeline timing diagram (IM, Reg, ALU, DM, Reg) for the instructions
    4  beq $1,$2,2
    8  sub $4,$1,$5]
To flush the IF stage instruction, add an IF.Flush control line that zeros the instruction field of the IF/ID pipeline register (transforming it into a noop )

17 Flushing with Misprediction (Not Taken)
[Figure: pipeline timing diagram (IM, Reg, ALU, DM, Reg) with a flush, for the instructions
    4  beq $1,$2,2
    8  sub $4,$1,$5
   16  and $6,$1,$7
   20  or $8,$1,$9]
To flush the IF stage instruction, add an IF.Flush control line that zeros the instruction field of the IF/ID pipeline register (transforming it into a noop )
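The addresses in the flushing example can be checked with a short sketch. It assumes the standard MIPS branch target rule (address of the branch plus 4, plus the word offset shifted left by 2) and relies on the all-zero word decoding as `sll $0,$0,0`, which has no effect. The function names are hypothetical, and the IF/ID register is reduced to its instruction field.

```python
NOP = 0x00000000  # all-zero MIPS word: sll $0,$0,0, which changes no state


def branch_target(branch_pc, word_offset):
    """Target of a taken branch: (PC + 4) + (offset << 2)."""
    return branch_pc + 4 + (word_offset << 2)


def next_if_id_instruction(fetched_instruction, if_flush):
    """Value latched into the IF/ID instruction field at the end of IF."""
    # IF.Flush zeros the field, turning the wrongly fetched instruction into a noop.
    return NOP if if_flush else fetched_instruction


if __name__ == "__main__":
    # beq $1,$2,2 at address 4 from the slide
    print(branch_target(4, 2))  # 16
```

With the branch resolved in ID, the `sub` at address 8 is the only instruction fetched down the wrong path, so zeroing one IF/ID instruction field is enough before fetch restarts at address 16.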

18 Branch Prediction, cont’d
Resolve branch hazards by statically assuming a given outcome and proceeding
2. Predict taken: always predict that branches will be taken
– Predict taken always incurs a stall (if branch destination hardware has been moved to the ID stage)
As the branch penalty increases (for deeper pipelines), a simple static prediction scheme will hurt performance
With more hardware, it is possible to try to predict branch behavior dynamically during program execution
3. Dynamic branch prediction: predict branches at run-time using run-time information

19 Dynamic Branch Prediction
A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains a bit that tells whether the branch was taken the last time it was executed
– The bit may predict incorrectly (it may be from a different branch with the same low-order PC bits, or may be a wrong prediction for this branch), but that doesn’t affect correctness, just performance
– If the prediction is wrong, flush the incorrect instructions in the pipeline, restart the pipeline with the right instructions, and invert the prediction bit
The BHT predicts when a branch is taken, but does not tell where it’s taken to!
– A branch target buffer (BTB) in the IF stage can cache the branch target address (or even the branch target instruction) so that a stall can be avoided

20 1-bit Prediction Accuracy
A 1-bit predictor in a loop is incorrect twice for each not-taken loop exit
For 10 times through the loop we have 80% prediction accuracy for a branch that is taken 90% of the time
– Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code
1. First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert prediction bit (predict_bit = 1)
2. As long as the branch is taken (looping), the prediction is correct
3. Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken, falling out of the loop; invert prediction bit (predict_bit = 0)
    Loop: 1st loop instr
          2nd loop instr
          .
          .
          last loop instr
          bne $1,$2,Loop
          fall out instr
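The three steps above can be replayed in a few lines. This is a minimal sketch of a single 1-bit predictor entry, assuming the loop branch is taken nine times and then falls through once; the function name and the list encoding of outcomes (1 = taken, 0 = not taken) are illustrative choices.

```python
def one_bit_accuracy(outcomes, predict_bit=0):
    """Fraction of branches predicted correctly by a single 1-bit entry."""
    correct = 0
    for taken in outcomes:
        if predict_bit == taken:
            correct += 1
        else:
            # On a misprediction, invert the bit so it records the last outcome.
            predict_bit = 1 - predict_bit
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [1] * 9 + [0]  # taken 9 times, then falls out of the loop
    print(one_bit_accuracy(loop_branch))  # 0.8
```

The two mispredictions are the first iteration (the bit starts at 0) and the loop exit, which leaves the bit at 0 again, so the same pattern repeats the next time the loop runs.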

21 2-bit Predictors
A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed.
[Figure: four-state diagram with two Predict Taken states and two Predict Not Taken states, with transitions labeled Taken and Not taken.]
    Loop: 1st loop instr
          2nd loop instr
          .
          .
          last loop instr
          bne $1,$2,Loop
          fall out instr

22 2-bit Predictors
A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed
[Figure: four-state diagram with two Predict Taken states and two Predict Not Taken states, with transitions labeled Taken and Not taken. Annotations on the loop: right on 1st iteration, right 9 times, wrong on loop fall out.]
    Loop: 1st loop instr
          2nd loop instr
          .
          .
          last loop instr
          bne $1,$2,Loop
          fall out instr
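The 90% figure can be reproduced with a short simulation. This sketch assumes a 2-bit saturating counter (states 0 to 3, predict taken when the counter is 2 or 3) that starts in the strongest taken state; the slide's state diagram may use a slightly different transition scheme, so treat this as one common variant and not necessarily the exact machine drawn there.

```python
def two_bit_accuracy(outcomes, counter=3):
    """Accuracy of one 2-bit saturating counter (0..3, predict taken if >= 2)."""
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == bool(taken):
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [1] * 9 + [0]  # taken 9 times, then falls out of the loop
    print(two_bit_accuracy(loop_branch))      # 0.9
    print(two_bit_accuracy(loop_branch * 2))  # 0.9
```

Running the loop twice shows the benefit of the second bit: the single not-taken exit only weakens the prediction, so the first iteration of the next run is still predicted taken.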

23 Delayed Decision
First, move the branch decision hardware and target address calculation to the ID pipeline stage
A delayed branch always executes the next sequential instruction; the branch takes effect after that next instruction
– MIPS software moves an instruction that is not affected by the branch (a safe instruction) to immediately after the branch, thereby hiding the branch delay
As processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed
– Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
– Growth in available transistors has made dynamic approaches relatively cheaper

24 Scheduling Branch Delay Slots
A. From before branch
    add $1,$2,$3
    if $2=0 then
        [delay slot]
  becomes
    if $2=0 then
        add $1,$2,$3
B. From branch target
    add $1,$2,$3
    if $1=0 then
        [delay slot]
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6
C. From fall through
    add $1,$2,$3
    if $1=0 then
        [delay slot]
    sub $4,$5,$6
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6
A is the best choice: it fills the delay slot and reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B and C, it must be okay to execute sub when the branch fails

25 3 Generic Data Hazards: RAW, WAR, WAW
Read After Write (RAW): Instr J tries to read operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Forwarding handles many, but not all, RAW dependencies in the 5 stage MIPS pipeline

26 3 Generic Data Hazards: RAW, WAR, WAW
Write After Read (WAR): Instr J writes operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an “anti-dependence” by compiler writers. This results from “reuse” of the name “r1”.
Can’t happen in the MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Register Writes must be in stage 5

27 3 Generic Data Hazards: RAW, WAR, WAW
Write After Write (WAW): Instr J writes operand before Instr I writes it.
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
Called an “output dependence” by compiler writers. This also results from the “reuse” of name “r1”.
Can’t happen in the MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Register Writes must be in stage 5
Can see WAR and WAW in more complicated pipes
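The three definitions reduce to simple register-name comparisons between an earlier instruction I and a later instruction J. The sketch below classifies the dependences for the I/J pairs used on the slides; the tuple encoding `(dest, sources)` and the function name are illustrative assumptions. It reports dependences only; whether one becomes an actual hazard depends on the pipeline, as the slides note for WAR and WAW.

```python
def classify_dependences(instr_i, instr_j):
    """Name the dependences from earlier instr I to later instr J.

    Each instruction is (dest_register, (source_registers...)).
    """
    dest_i, srcs_i = instr_i
    dest_j, srcs_j = instr_j
    found = []
    if dest_i in srcs_j:
        found.append("RAW")  # J reads what I writes (true dependence)
    if dest_j in srcs_i:
        found.append("WAR")  # J writes what I reads (anti-dependence)
    if dest_i == dest_j:
        found.append("WAW")  # J writes what I writes (output dependence)
    return found


if __name__ == "__main__":
    # I: add r1,r2,r3   J: sub r4,r1,r3
    print(classify_dependences(("r1", ("r2", "r3")), ("r4", ("r1", "r3"))))  # ['RAW']
    # I: sub r4,r1,r3   J: add r1,r2,r3
    print(classify_dependences(("r4", ("r1", "r3")), ("r1", ("r2", "r3"))))  # ['WAR']
    # I: sub r1,r4,r3   J: add r1,r2,r3
    print(classify_dependences(("r1", ("r4", "r3")), ("r1", ("r2", "r3"))))  # ['WAW']
```

A test generator for pipeline bugs could use a check like this to pick instruction pairs that exercise each kind of dependence.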

28 Supporting ID Stage Branches
[Figure: five-stage MIPS datapath with the branch hardware moved to ID: a Compare unit and branch target Add after the register file read, a Forward Unit for the ID-stage compare in addition to the EX-stage Forward Unit, a Hazard Unit, Branch / PCSrc control, and the IF.Flush line into the IF/ID pipeline register.]

29 Brainstorm on pipeline bugs
Where are bugs likely to hide in a pipelined processor?
1.
2.
…
How can you write tests to uncover these likely bugs?
1.
2.
…
Once the design passes a test, is there never a need to run that test again in the design process?

30 Brainstorm on pipeline bugs
Depending on the branch solution (move to ID, delayed, static prediction, dynamic prediction), where are bugs likely to hide?
1.
2.
…
How can you write tests to uncover these likely bugs?
1.
2.
…
Once the design passes a test, is there no need to run that test again?

31 Peer Instruction
Suppose we use a 4 stage pipeline that combines the memory access and write back stages for all instructions but load, stalling when there are structural hazards. Impact?
1. The branch delay slot is now 0 instructions
2. Most loads cause a stall since there is often a structural hazard on reg. writes
3. Most stores cause a stall since they have a structural hazard
4. Both 2 & 3: most loads & stores cause a stall due to structural hazards
5. Most loads cause a stall, but there is no load-use hazard anymore
6. Both 2 & 3, but there is no load-use hazard anymore
7. None of the above
    Clock     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
    1st add   Ifetch   Reg/Dec  Exec     Mem/Wr
    2nd lw             Ifetch   Reg/Dec  Exec     Mem      Wr
    3rd add                     Ifetch   Reg/Dec  Exec     Mem/Wr

32 Peer Instruction (same question, answer choices, and pipeline diagram as slide 31)
Q: Why not say every load stalls?
A: Not all next instructions write in the Wr stage

33 Summary: Designing a Pipelined Processor
Go back and examine your data path and control diagram
Associate resources with states
– Be sure there are no structural hazards: one use / clock cycle
Add pipeline registers between stages to balance clock cycle
– Amdahl’s Law suggests splitting the longest stage
Resolve all data and control dependencies
– If backwards in time in pipeline drawing to registers => data hazard: forward or stall to resolve them
– If backwards in time in pipeline drawing to PC => control hazard: we’ll see next time
5 stage pipeline with reads early in same stage, writes later in same stage, avoids WAR/WAW hazards
Assert control in appropriate stage
Develop test instruction sequences likely to uncover pipeline bugs (If you don’t test it, it won’t work)

