1 CSE 45432 SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.


2 Sequential Laundry
Sequential laundry takes 8 hours for 4 loads.
If they learned pipelining, how long would laundry take?
[Figure: task order (loads A, B, C, D) against time in 30-minute slots, 6 PM to 2 AM]

3 Pipelined Laundry
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Pipeline rate is limited by the slowest pipeline stage.
Unbalanced lengths of pipe stages reduce speedup.
Time to “fill” the pipeline and time to “drain” it reduce speedup.
Stall for dependencies.
[Figure: task order (loads A, B, C, D) overlapped in 30-minute slots, starting at 6 PM]
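The laundry arithmetic can be checked with a short sketch. It assumes four 30-minute stages per load (the figure marks 30-minute slots, and 4 loads × 4 stages × 30 minutes matches the 8 hours quoted for sequential laundry). The function names are illustrative, not from the slides, and loads are assumed to be able to wait between stages.

```python
def sequential_time(loads, stage_minutes):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(loads, stage_minutes):
    """The first load pays the full latency; every later load finishes one
    slowest-stage interval after the one before it."""
    return sum(stage_minutes) + (loads - 1) * max(stage_minutes)


# Assumption: four equal 30-minute stages per load.
stages = [30, 30, 30, 30]
seq = sequential_time(4, stages)
pipe = pipelined_time(4, stages)
print(f"sequential: {seq} min, pipelined: {pipe} min, speedup: {seq / pipe:.2f}")
```

Under these assumptions the four loads take 480 minutes sequentially and 210 minutes pipelined. The speedup is about 2.3 rather than 4 because of the fill and drain time.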

4 Single Stage vs. Pipeline Performance
Ideal speedup is the number of stages in the pipeline.
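A short derivation of why the ideal speedup approaches the number of stages. The symbols are assumptions introduced here: k balanced stages of duration t each, and n instructions, with no stalls.

```latex
T_{\text{single}} = n \, k \, t, \qquad
T_{\text{pipe}} = (k + n - 1)\, t, \qquad
\text{Speedup} = \frac{T_{\text{single}}}{T_{\text{pipe}}}
               = \frac{n k}{k + n - 1}
               \;\xrightarrow{\; n \to \infty \;}\; k
```

For small n the k − 1 fill cycles dominate, so the measured speedup is well below k.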

5 Pipelining
What makes it easy in MIPS?
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction
We’ll build a simple pipeline and look at these issues.

6 The Five Stages of Load
Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
Reg/Dec: Register Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
[Figure: Load occupies Ifetch, Reg/Dec, Exec, Mem, Wr in Cycles 1 through 5]

7 Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing for three implementations.
Single Cycle Implementation: Load, then Store with wasted time at the end of its cycle (Cycles 1 and 2).
Multiple Cycle Implementation: Load (Ifetch, Reg, Exec, Mem, Wr), Store (Ifetch, Reg, Exec, Mem), then R-type beginning with Ifetch, over Cycles 1 to 10.
Pipeline Implementation: Load, Store, and R-type, each passing through Ifetch, Reg, Exec, Mem, Wr, overlapped in time.]

8 Basic Idea
[Figure: datapath divided into pipeline stages; surviving stage labels are EX: Execute/address calculation, MEM: Memory access, WB: Write back]

9 Pipelined Datapath
Walk through the lw instruction.
Walk through the sw instruction.
The design

10 Corrected Datapath

11 Graphically Representing Pipelines
Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
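The two questions above can also be answered mechanically. The sketch below assumes an ideal five-stage pipeline with no stalls; the stage names IF, ID, EX, MEM, WB, the helper names, and the three-instruction program are illustrative choices, not taken from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(instructions):
    """Pair each instruction with a {cycle: stage} map.

    Instruction i (0-based) enters IF in cycle i + 1 and then advances one
    stage per cycle (ideal pipeline, no stalls).
    """
    return [
        (instr, {i + 1 + s: stage for s, stage in enumerate(STAGES)})
        for i, instr in enumerate(instructions)
    ]


def total_cycles(instructions):
    """Cycles needed to fill the pipeline and drain the last instruction."""
    return len(STAGES) + len(instructions) - 1


def in_stage(instructions, stage, cycle):
    """Return the instruction occupying `stage` during `cycle`, or None."""
    for instr, schedule in pipeline_table(instructions):
        if schedule.get(cycle) == stage:
            return instr
    return None


def render(instructions):
    """Draw the diagram as text: one row per instruction, one column per cycle."""
    n_cycles = total_cycles(instructions)
    rows = []
    for instr, schedule in pipeline_table(instructions):
        cells = "".join(f"{schedule.get(c, ''):>5}" for c in range(1, n_cycles + 1))
        rows.append(f"{instr:<18}{cells}")
    return "\n".join(rows)


# Hypothetical program used only to exercise the helpers.
program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]
print(render(program))
print("cycles:", total_cycles(program))
print("ALU in cycle 4:", in_stage(program, "EX", 4))
```

For this program the pipeline needs 7 cycles, and in cycle 4 the ALU (the EX stage) is working on the sub instruction.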

12 Why Pipeline?
Because the resources are there!
[Figure: instruction order (Inst 0 to Inst 4) against time in clock cycles; each instruction uses Im, Reg, ALU, Dm, Reg in successive cycles]

13 Pipeline Control

14 Pipeline Control
Pass control signals along just like the data.

15 Datapath with Control

16 Designing a Pipelined Processor
Go back and examine your datapath and control diagram.
Associate resources with states.
Ensure that flows do not conflict, or figure out how to resolve the conflicts.
Assert control in the appropriate stage.

17 Can pipelining get us into trouble?
Yes: Pipeline Hazards
– structural hazards: attempt to use the same resource two different ways at the same time
– data hazards: attempt to use an item before it is ready (an instruction depends on the result of a prior instruction still in the pipeline)
– control hazards: attempt to make a decision before the condition is evaluated (branch instructions)
Can always resolve hazards by waiting
– pipeline control must detect the hazard
– take action (or delay action) to resolve hazards

18 Data Hazards
Problem with starting the next instruction before the first is finished:
– dependencies that “go backward in time” are data hazards
[Figure: pipeline diagram tracking the value of register $2 across instructions]

19 Software Solution
Have the compiler guarantee no hazards.
Where should the compiler insert “nop” instructions?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Problem:
– It happens too often to rely on the compiler
– It really slows us down!
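One way to answer the nop question is the sketch below. It assumes a five-stage pipeline with no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple encoding (op, destination register, source registers) and the name insert_nops are hypothetical.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Insert nops so that every consumer is at least `min_distance` slots
    behind the instruction that produces one of its source registers.

    Each instruction is (op, dest_register_or_None, source_registers).
    """
    scheduled = []
    for instr in program:
        sources = instr[2]
        needed = 0
        # Only producers fewer than `min_distance` slots back can be a hazard.
        for distance in range(1, min_distance):
            if distance > len(scheduled):
                break
            dest = scheduled[-distance][1]
            # Register $0 is hardwired to zero, so it never causes a hazard.
            if dest is not None and dest != 0 and dest in sources:
                needed = max(needed, min_distance - distance)
        scheduled.extend([NOP] * needed)
        scheduled.append(instr)
    return scheduled


# The sequence from the slide; sw reads both $15 (data) and $2 (base address).
program = [
    ("sub", 2, (1, 3)),
    ("and", 12, (2, 5)),
    ("or", 13, (6, 2)),
    ("add", 14, (2, 2)),
    ("sw", None, (15, 2)),
]
for op, _, _ in insert_nops(program):
    print(op)
```

Under these assumptions two nops go between sub and and; after that, or, add, and sw are already far enough behind sub.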

20 Data Hazard Solution: Forwarding
Use temporary results; don’t wait for them to be written.
Also, write the register file during the 1st half of the clock cycle and read during the 2nd half.
[Figure: pipeline diagram showing the value of register $2 and the values held in EX/MEM and MEM/WB (–20) in each cycle, annotated “what if this $2 was $13?”]

21 Hazard Conditions
Steer the result from the previous instruction to the ALU.
EX hazard
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
MEM hazard
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
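The hazard conditions translate almost line for line into code. The sketch below models each pipeline register as a plain dictionary, which is an assumption of this example. It also lets the EX hazard override the MEM hazard when both match, so the most recent result wins; the slide does not state that priority.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) mux selects for the two ALU inputs.

    0b00 = value from the register file
    0b01 = forward from MEM/WB (memory or earlier ALU result)
    0b10 = forward from EX/MEM (prior ALU result)
    """
    forward_a = forward_b = 0b00

    # MEM hazard: result sits in the MEM/WB pipeline register.
    if mem_wb["RegWrite"] and mem_wb["RegisterRd"] != 0:
        if mem_wb["RegisterRd"] == id_ex["RegisterRs"]:
            forward_a = 0b01
        if mem_wb["RegisterRd"] == id_ex["RegisterRt"]:
            forward_b = 0b01

    # EX hazard: checked last so the newer EX/MEM result takes priority
    # (an assumption of this sketch).
    if ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0:
        if ex_mem["RegisterRd"] == id_ex["RegisterRs"]:
            forward_a = 0b10
        if ex_mem["RegisterRd"] == id_ex["RegisterRt"]:
            forward_b = 0b10

    return forward_a, forward_b


# sub $2, $1, $3 is in EX/MEM while and $12, $2, $5 is in ID/EX.
ex_mem = {"RegWrite": True, "RegisterRd": 2}
mem_wb = {"RegWrite": False, "RegisterRd": 0}
id_ex = {"RegisterRs": 2, "RegisterRt": 5}
print(forwarding_unit(ex_mem, mem_wb, id_ex))
```

In this example ForwardA selects the EX/MEM value (10) and ForwardB stays on the register file (00). The RegisterRd ≠ 0 test keeps a write to $0 from being forwarded.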

22 Forwarding
[Figure: datapath with forwarding muxes at the ALU inputs; labels include IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
Mux select values:
– 00: register file
– 01: memory or earlier ALU result
– 10: prior ALU result

23 Can’t always forward
lw can still cause a hazard:
– an instruction tries to read a register following a load instruction that writes to the same register
Thus, we need a hazard detection unit to “stall” the instruction that follows the load.

24 Stalling
We can stall the pipeline by keeping an instruction in the same stage.
The stalled instructions repeat in clock cycle 4 what they did in clock cycle 3.

25 Hazard Detection Unit
Stall by letting an instruction that won’t write anything go forward.
The unit controls writing of the PC and IF/ID registers, plus the MUX that selects between the real control values and zeros.
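A minimal sketch of the load-use check such a unit performs. The test itself (a load in ID/EX whose destination Rt matches a source register of the instruction in IF/ID) is the standard textbook condition. The dictionary layout and the output names PCWrite, IFIDWrite, and InsertBubble are assumptions of this example.

```python
def hazard_detection(id_ex, if_id):
    """Decide whether the instruction in IF/ID must stall for one cycle.

    Stall when the instruction in ID/EX is a load (MemRead) and its
    destination register (Rt for lw) is a source of the instruction in IF/ID.
    """
    stall = bool(
        id_ex["MemRead"]
        and id_ex["RegisterRt"] in (if_id["RegisterRs"], if_id["RegisterRt"])
    )
    return {
        "PCWrite": not stall,      # hold the PC so the same fetch repeats
        "IFIDWrite": not stall,    # hold IF/ID so the same decode repeats
        "InsertBubble": stall,     # MUX zeros the control signals (a nop)
    }


# lw $2, 20($1) is in ID/EX; and $4, $2, $5 is in IF/ID.
id_ex = {"MemRead": True, "RegisterRt": 2}
if_id = {"RegisterRs": 2, "RegisterRt": 5}
print(hazard_detection(id_ex, if_id))
```

Here the and instruction is held for one cycle while a bubble moves down the pipeline. After that, forwarding from MEM/WB can supply the loaded value.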

26 Branch Hazards
When we decide to branch, other instructions are in the pipeline!
We are predicting “branch not taken”
– need to add hardware for flushing instructions if we are wrong

27 Flushing Instructions
Reduce branch delay.

28 Improving Performance
Superpipelining: ideal maximum speedup is related to the number of stages
Superscalar: start more than one instruction in the same cycle
Dynamic pipeline scheduling
– Try to avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
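The effect of reordering can be checked by counting load-use stalls. The sketch assumes a pipeline with forwarding in which the only remaining stall is a load immediately followed by an instruction that reads the loaded register, and it treats a store's data register like any other source. The tuple encoding and the name load_use_stalls are hypothetical.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by a load immediately followed by a
    consumer of the loaded register.

    Each instruction is (op, dest_register_or_None, source_registers).
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in curr[2]:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ("$t1",)),
    ("lw", "$t2", ("$t1",)),
    ("sw", None, ("$t2", "$t1")),   # reads $t2 right after it is loaded
    ("sw", None, ("$t0", "$t1")),
]
# Swap the two stores; they write different addresses, 0($t1) and 4($t1),
# so the program's result is unchanged.
reordered = [original[0], original[1], original[3], original[2]]

print("stalls before:", load_use_stalls(original))
print("stalls after: ", load_use_stalls(reordered))
```

With the stores swapped, sw $t0 fills the slot after lw $t2, and the stall disappears.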

29 Dynamic Scheduling
The hardware performs the “scheduling”
– hardware tries to find instructions to execute
– out-of-order execution is possible
– speculative execution and dynamic branch prediction
All modern processors are very complicated
– DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
– PowerPC and Pentium: branch history table
– Compiler technology is important

30 Dynamic Scheduling in PowerPC 604 and Pentium Pro
Both: in-order issue, out-of-order execution, in-order commit.
Pentium Pro: central reservation station for any functional units, with one bus shared by a branch and an integer unit.

31 Dynamic Scheduling in Pentium Pro
PPro doesn’t pipeline 80x86 instructions.
PPro decode unit translates the Intel instructions into 72-bit micro-operations (similar to MIPS instructions).
Sends micro-operations to the reorder buffer and reservation stations.
Takes 1 clock cycle to determine the length of 80x86 instructions, plus 2 more to create the micro-operations.
Most instructions translate to 1 to 4 micro-operations.
Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations.

32 FYI: MIPS R3000 clocking discipline
2-phase non-overlapping clocks (phi1, phi2)
Edge-triggered
[Figure: alternating phi1 and phi2 clock phases]

