CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.


1 CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics

2 The 5 Cycles in MIPS
MIPS steps:
1. Fetch the instruction from RAM
2. Decode and read the regs
3. Execute the operation or calculate the effective address
4. Read/write RAM; store the regs
5. Save a RAM read into regs
Pipelining principle: multiple instructions are overlapped in execution

3 Basic Pipelining History
CDC 6600
– One of the first pipelined processors
– Dates back to 1964
– Designed by Seymour Cray
Most modern CPUs, even in PCs and embedded chips, now include pipelining.

4 Performance Possibilities
Consider 1000 instructions to be pipelined
Single cycle machine / non-pipelined
– CCT = 8 ns due to longest datapath
– CPI = 1 but 8 ns per instruction
– 8 ns * 1000 = 8000 ns
Multi-cycle machine / pipelined
– CCT = 2 ns due to longest stage in datapath
– 5 stages → 10 ns per instruction
– 8 ns + 2 ns * 1000 = 2008 ns (the 8 ns is the time to “fill” the pipeline)
Speedup = 8000 / 2008 = 3.98 ≈ 4
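
The arithmetic on this slide can be checked with a short Python sketch. The function names are illustrative and not from the slides, and the model assumes no stalls: the pipeline needs (stages - 1) cycles to fill and then completes one instruction per cycle.

```python
def single_cycle_time_ns(n_instructions, cycle_ns):
    """Total time when every instruction takes one long cycle."""
    return n_instructions * cycle_ns


def pipelined_time_ns(n_instructions, stage_ns, n_stages):
    """Total time for an ideal pipeline (no stalls assumed)."""
    fill_ns = (n_stages - 1) * stage_ns  # time to "fill" the pipeline
    return fill_ns + n_instructions * stage_ns


single = single_cycle_time_ns(1000, 8)   # 8000 ns
piped = pipelined_time_ns(1000, 2, 5)    # 8 ns fill + 2000 ns = 2008 ns
speedup = single / piped
print(single, piped, round(speedup, 2))  # 8000 2008 3.98
```

As the instruction count grows, the fill time matters less and the speedup approaches 8 / 2 = 4.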

5 Pipeline Performance
A single instruction takes more (or the same amount of) time
A group / sequence of instructions takes less time
Pipelining increases throughput rather than decreasing execution time for an individual instruction
Design principle:
– Good designs demand good compromises

6 CPI Revisited
CPI = (total # of cycles) / (total # of instructions)
Ideally (pipeline full, no stalls), the CPI of a pipelined processor is 1
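
A minimal Python sketch of the CPI formula for a pipeline, assuming a 5-stage design in which the only extra cycles are the fill cycles plus any stalls. The function name and the `stall_cycles` parameter are illustrative, not from the slides.

```python
def pipelined_cpi(n_instructions, n_stages=5, stall_cycles=0):
    """CPI = total cycles / total instructions for a simple pipeline model."""
    total_cycles = (n_stages - 1) + n_instructions + stall_cycles
    return total_cycles / n_instructions


print(pipelined_cpi(1))       # 5.0   (one instruction still needs all 5 stages)
print(pipelined_cpi(1000))    # 1.004 (fill cost is amortized)
```

The CPI only approaches 1 for long instruction sequences, and every stall cycle pushes it back up.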

7 Hazards: Limits to Pipelined Performance

8 Roadblocks to Pipelining
Structural hazards
– Multiple instructions vying for a single shared resource
– Ex: RAM, ALU
– The instruction itself!
Data hazards
– Later instruction uses the result of an earlier instruction
– Ex: lw followed by an add that uses the loaded data
Control hazards
– Fetch of a later instruction relies on the result of an earlier instruction to determine the correct control path
– Ex: conditional branches that are taken

9 Structural Hazards
Suppose Princeton architecture: one RAM for both instructions and data
Structural hazard → two instructions require RAM in the same cycle
Need to use Harvard architecture to accommodate this
              1234567
lw            FDEMW
               FDEMW
                FDEMW
                 F
(In cycle 4, the M stage of lw and the F stage of the fourth instruction both need RAM.)
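
The conflict can be found mechanically. The sketch below is illustrative Python, not from the slides. It assumes one instruction is fetched per cycle, that an instruction fetched in cycle c reaches its M stage in cycle c + 3, and that only loads and stores touch RAM in M.

```python
def memory_conflicts(instructions):
    """Return the cycles in which a fetch and a data access both need the one RAM.

    `instructions` is a list of (fetch_cycle, uses_data_memory) pairs.
    """
    fetch_cycles = {fetch for fetch, _ in instructions}
    # F, D, E, M: the M stage is three cycles after the fetch
    data_cycles = {fetch + 3 for fetch, uses_mem in instructions if uses_mem}
    return sorted(fetch_cycles & data_cycles)


# lw fetched in cycle 1, followed by three instructions that do not access data memory
program = [(1, True), (2, False), (3, False), (4, False)]
print(memory_conflicts(program))  # [4]
```

With separate instruction and data memories (Harvard architecture), the two sets of cycles refer to different resources and the intersection no longer matters.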

10 Structural Hazards
Which “instruction” is coming from the I-MEM in any given cycle?
– Need to replicate it!
Structural hazards can (usually) be removed by adding duplicate hardware


12 More Structural Hazards
Which “instruction” is coming from the I-MEM in any given cycle?
– Need to replicate it!
Structural hazards can (usually) be removed by adding duplicate hardware
How do I read and write to the register file at the same time?!?

13 Pipelining Requirements
Decode is performed in the second half of the D stage
– D stage involves a read from the register file
Write back is performed in the first half of the W stage
– W stage involves a write to the register file
Not actually how it is implemented (but the concept works)
              1234567
lw 1 6 100    FDEMW
lw 2 6 101     FDEMW
lw 3 6 102      FDEMW

14 Control & Data Hazards: Solutions

15 Hazard #2 - Data Hazards
nand cannot read reg 1 until add has stored it
Because a register write and a register read can occur in the same cycle, nand must stall only 2 cycles here before it can proceed
              12345678
add 1 2 3     FDEMW
nand 5 1 4     F--DEMW
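
The stall count in this diagram can be derived rather than memorized. The Python sketch below is illustrative and not from the slides. It assumes the 5-stage F D E M W pipeline, no forwarding, and a register file that can be written and then read within one cycle, so the consumer's D stage may share a cycle with the producer's W stage.

```python
def stalls_without_forwarding(distance):
    """Stall cycles for a consumer issued `distance` instructions after its producer."""
    # Producer fetched in cycle c writes in cycle c + 4 (W).
    # Consumer fetched in cycle c + distance would decode in cycle c + distance + 1.
    return max(0, 3 - distance)


def pipeline_row(issue_cycle, stalls, total_cycles):
    """Render one diagram row; '-' marks a stall and '.' an idle cycle."""
    cells = ["."] * total_cycles
    cells[issue_cycle - 1] = "F"
    for i in range(stalls):
        cells[issue_cycle + i] = "-"
    for offset, stage in enumerate("DEMW"):
        cells[issue_cycle + stalls + offset] = stage
    return "".join(cells)


stalls = stalls_without_forwarding(1)  # nand directly follows add
print(pipeline_row(1, 0, 8))           # FDEMW...
print(pipeline_row(2, stalls, 8))      # .F--DEMW
```

An independent instruction placed between the producer and the consumer reduces the stall to 1 cycle, and two such instructions remove it.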

16 Solutions to Data Hazards
Forwarding / bypassing
– Data is forwarded, as soon as it is available, from one stage to another
– Forwarding occurs prior to the M/W stages
Result of add is forwarded from the E stage (output reg from ALU of add) to the E stage of nand (back to the ALU again, but for nand this time)
– reg 1 is not written until the W stage, but its value is used earlier anyway
              123456
add 1 2 3     FDEMW
nand 5 1 4     FDEMW
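
A forwarding decision for one ALU operand might be sketched as follows in Python. This is a hedged illustration rather than the datapath from the slides: the parameter names `dest_in_m` and `dest_in_w` (the destination register of the instruction currently in the M or W stage, or None if it writes no register) and the returned labels are assumptions.

```python
def select_operand_source(src_reg, dest_in_m, dest_in_w):
    """Choose where the instruction in E should take the value of `src_reg` from."""
    if dest_in_m is not None and dest_in_m == src_reg:
        return "FORWARD_FROM_M"  # newest value: produced one instruction earlier
    if dest_in_w is not None and dest_in_w == src_reg:
        return "FORWARD_FROM_W"  # produced two instructions earlier
    return "REGFILE"             # value already read from the register file in D


# Cycle 4 of the diagram: nand 5 1 4 is in E, add 1 2 3 (writes reg 1) is in M.
print(select_operand_source(1, dest_in_m=1, dest_in_w=None))  # FORWARD_FROM_M
print(select_operand_source(4, dest_in_m=1, dest_in_w=None))  # REGFILE
```

Checking the M stage first means that when two earlier instructions both write the same register, the most recent result wins.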

17 More Forwarding / Bypassing
              123456
add 1 2 3     FDEMW
sw 1 6 100     FDEMW

18 Data Hazards: Load Stalls
Cannot forward “back in time”: must permit a “load stall” to wait on the result of the load
– Forwarding can’t solve everything (unfortunately)
Without a stall (add reaches E in cycle 4, while lw is still reading RAM in M):
              123456
lw 1 6 100    FDEMW
add 2 1 3      FDEMW
With a 1 cycle load stall:
              1234567
lw 1 6 100    FDEMW
add 2 1 3      FD-EMW
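
The stall rules from the data hazard slides can be gathered into one function. This Python sketch is illustrative and not from the slides. It assumes the 5-stage F D E M W pipeline, that an ALU result is available at the end of E and a loaded value at the end of M, and that the consumer needs its operand at the start of its own E stage when forwarding is present.

```python
def stall_cycles(producer_is_load, distance, forwarding=True):
    """Stalls for a consumer issued `distance` instructions after its producer."""
    if not forwarding:
        # Wait until the producer's W stage lines up with the consumer's D stage.
        return max(0, 3 - distance)
    if producer_is_load:
        # Loaded value exists only after M, one cycle later than an ALU result.
        return max(0, 2 - distance)
    # ALU result can be forwarded straight from E to the next instruction's E.
    return 0


print(stall_cycles(producer_is_load=False, distance=1, forwarding=False))  # 2
print(stall_cycles(producer_is_load=False, distance=1))                    # 0
print(stall_cycles(producer_is_load=True, distance=1))                     # 1
```

The last case is the load stall: even with forwarding, an instruction that uses a loaded value immediately after the lw loses one cycle.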

19 Test Yourself
Consider the following instruction sequence:
              1234567
add 1 2 2     FDEMW
add 1 1 2      FDEMW
add 1 1 1       FDEMW
What are the forwarding paths required to correctly implement this sequence?
Are there any forwarding paths that conflict?

20 Hazard #3 - Control Hazards
The lw instruction should only complete if the branch fails!
                1234567
add 4 5 6       FDEMW
beq 1 2 loop     FDEMW
lw 3 0 300        FDEMW

21 Control Hazards (2)
Stalls are “bubbles” in the pipeline: no useful work is accomplished in a stall
The multi-cycle machine “resolves” branches in the E stage
– Branch resolution could be completed in the D stage if we pass rA and rB through a special “subtractor” and bypass the A and B regs
– Resolving branches in the D stage requires only a single cycle of stalling in the pipeline (vs 2 if we stick to branch resolution in E)
                    123456789
add 4 5 6           FDEMW
beq 1 2 loop         FDEMW
lw 3 0 300            FD
add 2 5 6              F
next instruction        FDEMW

22 Simple Solutions to Control Hazards
What to do about control hazards:
1. Always stall
– Resolve branches fast, in the D stage, to reduce the stall to 1 cycle
2. Guess! (ok, “predict”)
– Gamble on the most likely outcome of the branch test, and fetch the instruction that would be executed
– If wrong → undo the fetch, and get the correct instruction
– Ex: always predict branch failure, or always predict branch success
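
The cost of each policy can be compared with a small Python sketch. It is an illustration under stated assumptions, not part of the slides: the policy names are made up, every stall or misprediction is charged the same `penalty` in cycles (1 with resolution in D, 2 with resolution in E), and a correct guess costs nothing.

```python
def control_stall_cycles(outcomes, policy, penalty=1):
    """Cycles lost to branches; `outcomes` holds True for each taken branch."""
    if policy == "always_stall":
        return penalty * len(outcomes)
    if policy not in ("predict_not_taken", "predict_taken"):
        raise ValueError("unknown policy: " + policy)
    predicted_taken = policy == "predict_taken"
    mispredictions = sum(1 for taken in outcomes if taken != predicted_taken)
    return penalty * mispredictions


# Hypothetical branch history: taken 9 times, then not taken once
history = [True] * 9 + [False]
print(control_stall_cycles(history, "always_stall"))       # 10
print(control_stall_cycles(history, "predict_not_taken"))  # 9
print(control_stall_cycles(history, "predict_taken"))      # 1
```

Which static guess is better depends entirely on how the branch actually behaves, which motivates predicting from history.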

23 Branch Prediction Example (1)
Predict failure → if correct, this sequence proceeds without a stall
Branch failure is equivalent to nop since the branch instruction does nothing
                1234567
add 4 5 6       FDEMW
beq 1 2 loop     FDEMW
lw 3 0 300        FDEMW

24 Branch Prediction Example (2)
Predict failure → if incorrect, must clear out the incorrect lw instruction and refetch the correct next instruction instead
– Results in a 1 cycle stall when using early resolution
                      12345678
add 4 5 6             FDEMW
beq 1 2 loop           FDEMW
lw 3 0 300              F----
correct instruction      FDEMW

25 (Somewhat) Clever Solutions to Control Hazards
3. Dynamic branch prediction
– Predict the next instruction based on the past history of the branch instruction
– Requires a table of recent results of all branches encountered: the “branch prediction table”
Could predict branches with a 1 bit predictor model:
– Save the result of a branch in a 1 bit buffer
– The buffer is a table indexed by the low order bits of the address of the branch instruction
If buffer contents = 0 → predict branch not taken
If buffer contents = 1 → predict branch taken

26 1 Bit Dynamic Branch Prediction
for (int i = 0; i < 10; i++) { … }
becomes
      lw 1 0 ten
      add 2 0 0
loop: beq 2 1 exit
      …
If using simple “not taken” prediction on a branch that succeeds 90% of the time, we’re wrong 90% of the time!
With a 1-bit predictor:
– On first iteration, prediction is “not take branch” → incorrect
– On last iteration, prediction is “take branch” → incorrect
– 2 mispredictions out of 10 tests → 80% correct for 90% branch success
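
The 80% figure can be reproduced with a small simulation. The Python below is a sketch under assumptions, not the hardware from the slides: the class name, the 4-bit table index, and the example address are made up, and “90% branch success” is modeled as a branch that is taken 9 times and then not taken once.

```python
class OneBitPredictor:
    """Table of 1-bit entries indexed by the low order bits of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)  # 0 = predict not taken

    def predict(self, branch_address):
        return self.table[branch_address & self.mask] == 1

    def update(self, branch_address, taken):
        self.table[branch_address & self.mask] = 1 if taken else 0


def accuracy(predictor, branch_address, outcomes):
    """Fraction of outcomes predicted correctly, updating after each branch."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(branch_address) == taken:
            correct += 1
        predictor.update(branch_address, taken)
    return correct / len(outcomes)


loop_branch = [True] * 9 + [False]  # assumed history: 9 taken, 1 not taken
result = accuracy(OneBitPredictor(), 0x40, loop_branch)
print(result)  # 0.8
```

Because the final not-taken outcome flips the bit back to 0, every later run of the same loop starts with a misprediction again, so the accuracy stays at 80%.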

27 2 Bit Dynamic Branch Prediction
Could predict branches with a 2 bit predictor/corrector FSM:
– (basically a 2-bit “saturating” counter)
On the same example, we get 90% correct with 90% branch success
[Figure: FSM with “Predict taken” and “Predict not taken” states, each in a weak and a strong version; transitions are labeled “taken” and “not taken”.]
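
A 2-bit saturating counter can be simulated in a few lines of Python. This is a sketch under assumptions, not the exact FSM from the slide: states 0 and 1 predict not taken, states 2 and 3 predict taken, the counter starts at 0, and the loop branch is modeled as taken 9 times and then not taken once.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, initial_state=0):
        self.state = initial_state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)  # saturate at strong taken
        else:
            self.state = max(0, self.state - 1)  # saturate at strong not taken


def run_pass(predictor, outcomes):
    """Fraction of outcomes predicted correctly in one run of the loop."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


loop_branch = [True] * 9 + [False]
predictor = TwoBitPredictor(initial_state=0)
first_pass = run_pass(predictor, loop_branch)
second_pass = run_pass(predictor, loop_branch)
print(first_pass, second_pass)  # 0.7 0.9
```

Under these assumptions the 90% figure holds once the counter has warmed up: the single not-taken outcome at loop exit only weakens the prediction, so the next run of the loop is predicted taken from its first iteration.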

28 Modern Branch Prediction
Modern branch prediction is extremely important!
– Long pipelines → huge branch penalties
– We need to be right as much as possible.
Because of the importance, modern predictors also
– Are extremely complex (some mimic AI routines in hardware)
– Take up a lot of space (lots of memories to store historical information)

29 Branch Prediction: Some Stats
Predict:
– Not taken: ~50-60% accurate
– Not taken, but backward branches taken: ~65% accurate
– Same as last time: ~80% accurate
Actual Designs
– Pentium: ~85% accurate
– Pentium Pro: ~92% accurate
Researched Designs
– Papers have demonstrated 96-98% accuracy

