1. Chapter Six: Enhancing Performance with Pipelining
© 1998 Morgan Kaufmann Publishers


2. Definition
Pipelining is an implementation technique in which multiple instructions are overlapped in execution. We'll use a laundry analogy to explain the main concepts. There are four stages in doing the laundry:
– put dirty clothes in the washer (wash)
– place washed clothes in the dryer (dry)
– place the dry load on the table and fold it (fold)
– put the clothes away (store)
What are the corresponding stages for a MIPS instruction?

3. Single-Cycle vs. Pipelined Performance
Consider the instructions lw, sw, add, sub, and, or, slt, and beq. Operation times for the major functional units:
– 2 ns for a memory access
– 2 ns for an ALU operation
– 1 ns for a register file read or write
The single-cycle clock must accommodate the slowest instruction, lw (2 + 1 + 2 + 2 + 1 = 8 ns); the pipelined clock is set by the slowest stage (2 ns).
Total execution time for 3 instructions:
– 3 × 8 ns = 24 ns for a single-cycle, non-pipelined processor
– 14 ns (see the figure on the next slide) for a pipelined processor
Total execution time for 1003 instructions:
– 1000 × 8 ns + 24 ns = 8024 ns for a single-cycle, non-pipelined processor
– 1000 × 2 ns + 14 ns = 2014 ns for a pipelined processor
The speedup (8024 / 2014 ≈ 3.98) is less than the number of stages (5) because:
– the stages may be imperfectly balanced
– pipelining involves some overhead
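The arithmetic on this slide can be checked with a short Python sketch. It assumes the standard per-class breakdown (lw uses instruction memory, register read, ALU, data memory, and register write; sw skips the register write; R-format instructions skip data memory; beq stops after the ALU) and, as the slide does, that the pipelined clock equals the slowest stage. The function and constant names are hypothetical.

```python
# Latencies of the major functional units, in ns (from the slide).
MEM, ALU, REG = 2, 2, 1

# Units used by each instruction class in a single-cycle datapath.
INSTRUCTION_TIME = {
    "lw": MEM + REG + ALU + MEM + REG,   # fetch, reg read, address, data read, reg write
    "sw": MEM + REG + ALU + MEM,         # no register write
    "R-format": MEM + REG + ALU + REG,   # add, sub, and, or, slt: no data access
    "beq": MEM + REG + ALU,              # register compare only
}

SINGLE_CYCLE_CLOCK = max(INSTRUCTION_TIME.values())  # slowest instruction: 8 ns
PIPELINE_CLOCK = max(MEM, ALU, REG)                  # slowest stage: 2 ns
STAGES = 5

def single_cycle_time(n):
    """Every instruction takes one long clock cycle."""
    return n * SINGLE_CYCLE_CLOCK

def pipelined_time(n):
    """The first instruction needs all 5 stages; each later one
    completes one short clock cycle after its predecessor."""
    return (STAGES + n - 1) * PIPELINE_CLOCK

for n in (3, 1003):
    print(n, single_cycle_time(n), pipelined_time(n))
```

For 1003 instructions the ratio approaches 8 ns / 2 ns = 4 rather than 5, because the 2 ns stage time is longer than 8 ns / 5 = 1.6 ns.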

4. Pipelining
Pipelining improves performance by increasing instruction throughput. Each instruction still takes the same time to execute. The ideal speedup is the number of stages in the pipeline. Do we achieve this?
[Figure: three instructions overlapped in the pipeline, each stage (instruction fetch, Reg, ALU, data access, Reg) taking 2 ns; vertical axis: program execution order (in instructions)]
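Under ideal conditions (perfectly balanced stages, no overhead), the time between instruction completions shrinks by the number of stages. As a worked example with the numbers from the previous slide:

```latex
\[
\text{Time between instructions}_{\text{pipelined}}
  = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}
  = \frac{8\ \text{ns}}{5} = 1.6\ \text{ns}
\]
```

Because the slowest stage needs 2 ns rather than 1.6 ns, the actual pipelined clock is 2 ns, which limits the long-run speedup to 8 ns / 2 ns = 4.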

5. Pipelining in MIPS: What Makes It Easy
– All instructions are the same length (32 bits), so instruction fetch (1st pipeline stage) and decoding (2nd stage) are much easier.
– MIPS has just a few instruction formats, with the source register fields in the same location in each, so register file reads and instruction decoding can be done at the same time.
– Memory operands appear only in loads and stores (as opposed to 80x86, where instructions can operate on operands in memory).
– Operands must be aligned in memory, so we need not worry about a single data transfer instruction requiring two data memory accesses.
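Because every MIPS instruction is 32 bits and the rs and rt fields sit in the same bit positions in the R and I formats, the register numbers can be extracted before the opcode has been examined. A small Python sketch of field extraction, using the standard MIPS layout (opcode bits 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0, immediate 15-0); the function name is hypothetical.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fields.

    rs and rt come from the same bit positions whatever the format,
    so the register file can be read while the opcode is still being decoded.
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,     # meaningful only for R-format
        "shamt": (word >> 6) & 0x1F,   # meaningful only for R-format
        "funct": word & 0x3F,          # meaningful only for R-format
        "imm": word & 0xFFFF,          # meaningful only for I-format
    }

# add $s0, $t0, $t1  ($s0 = 16, $t0 = 8, $t1 = 9, funct 0x20)
r_fields = decode_fields(0x01098020)
# lw $t0, 4($t1)     (opcode 0x23, base rs = $t1 = 9, rt = $t0 = 8)
i_fields = decode_fields(0x8D280004)
print(r_fields["rs"], r_fields["rt"], r_fields["rd"], hex(r_fields["funct"]))
print(i_fields["rs"], i_fields["rt"], i_fields["imm"])
```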

6. Pipelining in MIPS: What Makes It Hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on the result of a previous instruction

7. Structural Hazards
What if there were a fourth instruction in the following figure? What happens between 6 and 8 ns?
[Figure: pipelined execution with 2 ns stages (instruction fetch, Reg, ALU, data access, Reg); vertical axis: program execution order (in instructions)]
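The conflict can be found by listing which instruction uses the single memory in each cycle. This Python sketch assumes, as in the figure, that every instruction is a load (so each accesses memory twice: instruction fetch in cycle i and data access in cycle i + 3, counting from 0) and that there is one memory with a single port; the names are hypothetical.

```python
STAGE_TIME_NS = 2

def memory_conflicts(num_instructions):
    """Return (cycle, fetching instruction, data-accessing instruction)
    for every cycle in which both want the single memory."""
    users = {}  # cycle -> list of (instruction index, purpose)
    for i in range(num_instructions):
        users.setdefault(i, []).append((i, "fetch"))     # IF in cycle i
        users.setdefault(i + 3, []).append((i, "data"))  # MEM in cycle i + 3
    conflicts = []
    for cycle in sorted(users):
        fetch = [i for i, kind in users[cycle] if kind == "fetch"]
        data = [i for i, kind in users[cycle] if kind == "data"]
        if fetch and data:
            conflicts.append((cycle, fetch[0], data[0]))
    return conflicts

for cycle, fetcher, loader in memory_conflicts(4):
    start = cycle * STAGE_TIME_NS
    print(f"{start}-{start + STAGE_TIME_NS} ns: instruction {fetcher + 1} fetch "
          f"collides with instruction {loader + 1} data access")
```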

8. Control Hazards
One possible solution:
– stall: pause before continuing the pipeline; this is not efficient if we have a long pipeline
– a pipeline stall is also known as a bubble
[Figure: stalling on a branch; time axis 2 to 16 ns; vertical axis: program execution order (in instructions)]
The figure assumes that extra hardware is in place to resolve the branch in the second stage. Otherwise the pause would be longer than 4 ns.

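9. Control Hazards
Another solution: predict. For example, always predict that branches are not taken; the pipeline proceeds at full speed when the guess is right and stalls only when it is wrong.
[Figure: predicting branches not taken, with 2 ns and 4 ns instruction spacing and pipeline bubbles when the prediction is wrong; vertical axis: program execution order (in instructions)]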
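The benefit of prediction can be estimated by counting lost cycles. This Python sketch assumes, as on the previous slide, that the branch is resolved in the second stage, so stalling costs one cycle per branch and a wrong predict-not-taken guess also costs one cycle; the outcome list and function names are made up for illustration.

```python
BRANCH_PENALTY_CYCLES = 1  # branch resolved in ID: one bubble

def lost_cycles_stall(outcomes):
    """Always stall after a branch, whatever its outcome."""
    return len(outcomes) * BRANCH_PENALTY_CYCLES

def lost_cycles_predict_not_taken(outcomes):
    """Keep fetching sequentially; pay the penalty only when the branch is taken."""
    return sum(BRANCH_PENALTY_CYCLES for taken in outcomes if taken)

# Hypothetical branch outcomes (True = taken).
outcomes = [False, True, False, False, True, False]
print(lost_cycles_stall(outcomes), lost_cycles_predict_not_taken(outcomes))
```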

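10. Control Hazards
Delayed branch: the instruction immediately after the branch (in the delayed branch slot) is always executed.
[Figure: delayed branch with 2 ns stages, showing the delayed branch slot after the branch; vertical axis: program execution order (in instructions)]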
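With a delayed branch, the instruction in the slot executes whether or not the branch is taken. A toy Python interpreter with a hypothetical three-operation instruction set (not real MIPS encodings) shows the resulting execution order.

```python
def run_delayed(program, regs):
    """Run a toy program whose branches have one delay slot.

    Instruction forms (hypothetical):
      ("addi", rd, rs, imm)      rd = rs + imm
      ("beq", rs, rt, target)    branch to index `target` if rs == rt
      ("halt",)
    Returns the indices of the executed instructions, in order.
    """
    trace = []
    pc = 0
    pending_target = None  # set by a taken branch, applied after the delay slot
    while 0 <= pc < len(program):
        op = program[pc]
        trace.append(pc)
        if op[0] == "halt":
            break
        redirect, pending_target = pending_target, None
        if op[0] == "addi":
            _, rd, rs, imm = op
            regs[rd] = regs[rs] + imm
        elif op[0] == "beq":
            _, rs, rt, target = op
            if regs[rs] == regs[rt]:
                pending_target = target  # the next instruction still executes
        pc = redirect if redirect is not None else pc + 1
    return trace

program = [
    ("beq", "zero", "zero", 3),  # always taken
    ("addi", "t0", "t0", 1),     # delay slot: executed anyway
    ("addi", "t1", "t1", 1),     # skipped by the taken branch
    ("halt",),
]
regs = {"zero": 0, "t0": 0, "t1": 0}
print(run_delayed(program, regs), regs)
```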

11. Data Hazards
Look at the following example:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
We need the result $s0 from the add instruction to do the subtraction. Is the data ready? We cannot rely on the compiler to remove all such hazards.
Solution: forwarding (or bypassing), i.e., getting the missing item early from internal resources instead of waiting for it to reach the register file.
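Spotting this kind of hazard amounts to checking whether an instruction reads a register that a recent, still-unfinished instruction writes. A Python sketch for simple three-register R-format instructions; the parsing is deliberately naive, and the window of two following instructions assumes a five-stage pipeline without forwarding whose register file is written in the first half of a cycle and read in the second half.

```python
def parse(line):
    """'add $s0, $t0, $t1' -> ('add', '$s0', ['$t0', '$t1']) (R-format only)."""
    op, operands = line.split(None, 1)
    dest, *sources = [r.strip() for r in operands.split(",")]
    return op, dest, sources

def raw_hazards(lines, window=2):
    """List (writer, reader, register) pairs where the reader is within
    `window` instructions after the writer and reads the writer's destination."""
    instrs = [parse(l) for l in lines]
    hazards = []
    for i, (_, dest, _) in enumerate(instrs):
        for j in range(i + 1, min(i + 1 + window, len(instrs))):
            if dest in instrs[j][2]:
                hazards.append((i, j, dest))
            if instrs[j][1] == dest:
                break  # a later write supersedes this one
    return hazards

print(raw_hazards(["add $s0, $t0, $t1", "sub $t2, $s0, $t3"]))
```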

12. Graphical Representation of the Instruction Pipeline
IF: instruction fetch
ID: instruction decode
EX: execution
MEM: memory access
WB: write back
Shading: element used; white: element not used. Right-half shading: read; left-half shading: write.
[Figure: add $s0, $t0, $t1 passing through IF, ID, EX, MEM, and WB; time axis 2 to 10 ns]

13. Forwarding
As soon as the ALU finishes the add, forward the result:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
[Figure: add and sub in the pipeline (IF, ID, EX, MEM, WB), with the add's ALU result forwarded to the sub's EX stage; time axis 2 to 10 ns; vertical axis: program execution order (in instructions)]

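14. Forwarding with Stall
When an R-format instruction immediately following a load tries to use the loaded data, a load-use data hazard occurs: the data is not available until the end of the load's MEM stage. Even with forwarding, we need to stall in this case.
[Figure: a load followed by a dependent R-format instruction, with a bubble inserted]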
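The number of bubbles a dependent instruction needs depends on what produced the value and how far apart the two instructions are. A Python sketch for the five-stage pipeline, assuming the register file is written in the first half of a cycle and read in the second half, and that forwarding can deliver a value produced at the end of one cycle to an EX stage in the next; "distance 1" means the consumer immediately follows the producer. Names are hypothetical.

```python
# Cycle (0-based, producer fetched in cycle 0) at whose end the value exists.
READY_END_OF_CYCLE = {"alu": 2, "load": 3}  # end of EX / end of MEM
WB_CYCLE = 4

def stall_cycles(producer, distance, forwarding):
    """Bubbles needed before a consumer `distance` instructions after the producer."""
    if forwarding:
        # The consumer's EX (cycle distance + 2) must come after the value is ready.
        return max(0, READY_END_OF_CYCLE[producer] + 1 - (distance + 2))
    # Without forwarding the consumer's ID (cycle distance + 1) must not precede WB;
    # a same-cycle read works because the write happens in the first half-cycle.
    return max(0, WB_CYCLE - (distance + 1))

for producer in ("alu", "load"):
    print(producer,
          "with forwarding:", [stall_cycles(producer, d, True) for d in (1, 2, 3)],
          "without:", [stall_cycles(producer, d, False) for d in (1, 2, 3)])
```

Only the load at distance 1 still needs a bubble once forwarding is in place, which is the load-use case on this slide.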

15. Reordering Code to Avoid Pipeline Stalls
Original code:
    # register $t1 has the address of v[k]
    lw $t0, 0($t1)    # reg $t0 = v[k]
    lw $t2, 4($t1)    # reg $t2 = v[k+1]
    sw $t2, 0($t1)    # v[k] = reg $t2
    sw $t0, 4($t1)    # v[k+1] = reg $t0
A data hazard occurs on register $t2 between the second lw and the first sw.
The modified code removes the hazard:
    # register $t1 has the address of v[k]
    lw $t0, 0($t1)    # reg $t0 = v[k]
    lw $t2, 4($t1)    # reg $t2 = v[k+1]
    sw $t0, 4($t1)    # v[k+1] = reg $t0
    sw $t2, 0($t1)    # v[k] = reg $t2
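Counting load-use stalls makes the effect of the reordering concrete. This Python sketch assumes full forwarding, so the only stall is one bubble when an instruction reads a register loaded by the instruction immediately before it; like the slide, it treats both registers of sw as reads, and its parsing handles only the lw/sw forms used here. The function names are hypothetical.

```python
import re

def regs_of(line):
    """Return (op, destination or None, registers read) for lw/sw lines."""
    code = line.split("#")[0]
    op = code.split()[0]
    regs = re.findall(r"\$\w+", code)
    if op == "lw":             # lw rt, off(rs): writes rt, reads rs
        return op, regs[0], [regs[1]]
    return op, None, regs      # sw rt, off(rs): reads both

def load_use_stalls(lines):
    instrs = [regs_of(l) for l in lines]
    stalls = 0
    for prev, cur in zip(instrs, instrs[1:]):
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1        # one bubble; forwarding covers everything else
    return stalls

original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
print(load_use_stalls(original), load_use_stalls(reordered))
```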

16. A Pipelined Datapath
What do we need to add to actually split the datapath into stages?
[Figure: datapath divided into five stages: IF: instruction fetch; ID: instruction decode/register file read; EX: execute/address calculation; MEM: memory access; WB: write back]

17. Pipelined Datapath
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?
[Figure: pipelined datapath with the ID/EX pipeline register, register file, sign-extend unit, ALU, data memory, and multiplexors]

18. Corrected Datapath
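The problem asked about on slide 17 is that, uncorrected, the write-register number used in WB comes from the instruction currently being decoded rather than from the instruction being written back; the corrected datapath carries the destination register number along through the pipeline registers. A Python sketch contrasting the two, assuming a five-stage pipeline with no stalls and a hypothetical list of destination registers (the first being a lw).

```python
def write_back_registers(dests, corrected):
    """For each instruction i, return the register number used when i reaches WB.

    Instruction i is in WB in cycle i + 4; at that time instruction i + 3 is in ID.
    """
    result = []
    for i in range(len(dests)):
        if corrected:
            result.append(dests[i])  # carried along in ID/EX, EX/MEM, MEM/WB
        else:
            in_id = i + 3            # whatever instruction is in ID in that cycle
            # None: no later instruction exists in this short example
            result.append(dests[in_id] if in_id < len(dests) else None)
    return result

# Hypothetical destination registers of five consecutive instructions.
dests = [10, 11, 12, 13, 14]
print(write_back_registers(dests, corrected=False))
print(write_back_registers(dests, corrected=True))
```

In the uncorrected version the load's data would be written into register 13, the destination of the fourth instruction, even though no instruction depends on another.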

19. Graphically Representing Pipelines
Graphical representations can help answer questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
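For a pipeline with no stalls, the diagram can be generated mechanically: instruction i (counting from 0) is in stage s during cycle i + s + 1 when cycles are numbered from 1 (an assumed numbering convention). A Python sketch that answers the slide's first two questions for a hypothetical five-instruction sequence with no hazards.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(instructions):
    """Map each cycle (numbered from 1) to {instruction: stage}, assuming no stalls."""
    table = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, {})[instr] = stage
    return table

code = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4",
        "lw $13, 24($1)", "add $14, $5, $6"]
table = pipeline_table(code)
print("total cycles:", len(table))  # 5 stages for the first, +1 per later instruction
print("ALU in cycle 4:", [ins for ins, st in table[4].items() if st == "EX"])
```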

20. Pipeline Control

21. Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back
How would control be handled in an automobile plant?
– a fancy control center telling everyone what to do?
– should we use a finite state machine?

22. Pipeline Control
Pass control signals along just like the data.
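One way to picture passing control signals along: the main control unit generates all signals in ID, and each later pipeline register carries only the groups still needed. This Python sketch uses the nine control lines of the MIPS five-stage datapath grouped by the stage that consumes them (EX, MEM, WB); None marks a don't-care value, and the dictionary layout and function name are only an illustration.

```python
# Control values produced in ID, grouped by the stage that uses them.
CONTROL = {
    "R-format": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":       {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
                 "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":       {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
                 "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":      {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
                 "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
                 "WB": {"RegWrite": 0, "MemtoReg": None}},
}

def control_in_pipeline_registers(kind):
    """Return the control groups stored in each pipeline register for one instruction."""
    signals = CONTROL[kind]
    return {
        "ID/EX": {g: signals[g] for g in ("EX", "MEM", "WB")},  # all still to come
        "EX/MEM": {g: signals[g] for g in ("MEM", "WB")},       # EX group consumed
        "MEM/WB": {g: signals[g] for g in ("WB",)},             # only write-back left
    }

print(control_in_pipeline_registers("lw")["MEM/WB"])
```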

23. Datapath with Control

24. Dependencies
Problem with starting the next instruction before the first is finished:
– dependencies that "go backward in time" are data hazards

25. Software Solution
Have the compiler guarantee no hazards. Where do we insert the nops?
    sub $2, $1, $3
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
Problem: this really slows us down!
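Where the nops go can be computed. With no forwarding, and assuming the register file is written in the first half of a cycle and read in the second half, a reader must be at least three instructions after its writer. A Python sketch that inserts just enough nops; its parsing covers only the numbered-register forms on the slide (sw is special-cased because its first register is a source), and the function names are hypothetical.

```python
import re

MIN_DISTANCE = 3  # writer's WB (cycle 4) must not come after the reader's ID

def dest_and_sources(line):
    """Return (destination or None, source registers) for the forms on the slide."""
    op = line.split()[0]
    regs = re.findall(r"\$\d+", line)
    if op == "sw":              # sw rt, off(rs): reads both, writes nothing
        return None, regs
    return regs[0], regs[1:]    # R-format: rd, rs, rt

def needs_nop(out, sources):
    """True if a writer of any source is fewer than MIN_DISTANCE slots back."""
    for back in range(1, min(MIN_DISTANCE, len(out) + 1)):
        prev = out[-back]
        if prev != "nop" and dest_and_sources(prev)[0] in sources:
            return True
    return False

def insert_nops(lines):
    out = []
    for line in lines:
        _, sources = dest_and_sources(line)
        while needs_nop(out, sources):
            out.append("nop")
        out.append(line)
    return out

program = ["sub $2, $1, $3", "and $12, $2, $5", "or $13, $6, $2",
           "add $14, $2, $2", "sw $15, 100($2)"]
print("\n".join(insert_nops(program)))
```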

26. Forwarding
Use temporary results; don't wait for them to be written:
– register file forwarding to handle a read and a write of the same register
– ALU forwarding
[Figure annotation: what if this $2 was $13?]
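The ALU forwarding decision can be written as a pair of comparisons on pipeline-register fields. This Python sketch follows the usual conditions for the five-stage MIPS pipeline: a result in EX/MEM takes priority over an older one in MEM/WB, and register 0 is never forwarded. The dictionary representation is an assumption for illustration, with "Rd" standing for the destination register number carried in that pipeline register.

```python
def forward_select(source_reg, ex_mem, mem_wb):
    """Choose where the ALU input for `source_reg` comes from.

    Returns "EX/MEM" (result computed one cycle ago), "MEM/WB" (two cycles ago),
    or "REG" (the value read from the register file in ID).
    """
    if ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == source_reg:
        return "EX/MEM"
    if mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == source_reg:
        return "MEM/WB"
    return "REG"

# sub $2, $1, $3 is in MEM; and $12, $2, $5 is in EX and reads $2 and $5.
ex_mem = {"RegWrite": True, "Rd": 2}
mem_wb = {"RegWrite": False, "Rd": 0}
print(forward_select(2, ex_mem, mem_wb), forward_select(5, ex_mem, mem_wb))
```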

27. Forwarding

28. Can't Always Forward
Load word can still cause a hazard:
– an instruction tries to read a register immediately after a load instruction that writes the same register.
Thus, we need a hazard detection unit to stall the instruction that uses the loaded value.

29. Stalling
We can stall the pipeline by keeping an instruction in the same stage.

30. Hazard Detection Unit
Stall by letting an instruction that won't write anything go forward.
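The hazard detection unit works in ID: if the instruction in EX is a load whose destination (rt) matches either source register of the instruction being decoded, it stalls. A Python sketch of that check and of how the stall is applied (hold the PC and IF/ID, and send zeroed control signals, i.e., an instruction that won't write anything, into ID/EX); the dictionary fields and signal names are illustrative.

```python
def must_stall(id_ex, if_id):
    """Load-use check: the load in EX writes a register the decoded instruction reads."""
    return id_ex["MemRead"] and id_ex["Rt"] in (if_id["Rs"], if_id["Rt"])

def next_cycle_actions(id_ex, if_id):
    """What the pipeline does at the end of this cycle."""
    if must_stall(id_ex, if_id):
        return {"PCWrite": False,          # fetch the same instruction again
                "IFIDWrite": False,        # keep the decoded instruction in ID
                "ID/EX control": "zero"}   # bubble: nothing written to regs or memory
    return {"PCWrite": True, "IFIDWrite": True, "ID/EX control": "normal"}

# lw $2, 20($1) is in EX; and $4, $2, $5 is in ID.
load_in_ex = {"MemRead": True, "Rt": 2}
and_in_id = {"Rs": 2, "Rt": 5}
print(next_cycle_actions(load_in_ex, and_in_id))
```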

31. Branch Hazards
When we decide to branch, other instructions are already in the pipeline! We are predicting branch not taken:
– need to add hardware for flushing instructions if we are wrong

32. Flushing Instructions

33. Improving Performance
– Try to avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
– Add a branch delay slot:
  – the next instruction after a branch is always executed
  – rely on the compiler to fill the slot with something useful

34. More on Improving Performance
– Superpipelining: decompose the stages further (not always practical)
– Superscalar: start more than one instruction in the same cycle (extra coordination required)
  – CPI can be less than 1
  – IPC: instructions per clock cycle
– Dynamic pipelining:
    lw   $t0, 20($s2)
    addu $t1, $t0, $t2
    sub  $s4, $s4, $t3
    slti $t5, $s4, 20
  – Combine extra hardware resources so later instructions can proceed in parallel (here, sub and slti do not depend on the lw, so they need not wait for the addu)
  – More complicated pipeline control
  – More complicated instruction execution model
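The four-instruction example can be scheduled both ways to see the effect. This Python sketch compares in-order and out-of-order issue, assuming a single issue per cycle, a load result usable two cycles after the load issues and one cycle for the others (hypothetical latencies), all four instructions already fetched, and no WAR/WAW hazards (real dynamic schedulers must handle those, for example with register renaming). The function name is hypothetical.

```python
def schedule(program, latency, in_order):
    """Return the issue cycle of each instruction, issuing at most one per cycle.

    program: list of (opcode, destination, [sources]).
    latency: cycles after issue until an opcode's result can be used.
    """
    # producers[i]: earlier instructions whose results instruction i reads
    producers, last_writer = [], {}
    for i, (_, dest, srcs) in enumerate(program):
        producers.append([last_writer[r] for r in srcs if r in last_writer])
        last_writer[dest] = i

    issue = [None] * len(program)
    cycle = 0
    while None in issue:
        waiting = [i for i, c in enumerate(issue) if c is None]
        if in_order:
            waiting = waiting[:1]  # only the oldest unissued instruction may go
        for i in waiting:          # the oldest ready instruction wins
            if all(issue[p] is not None and issue[p] + latency[program[p][0]] <= cycle
                   for p in producers[i]):
                issue[i] = cycle
                break
        cycle += 1
    return issue

PROGRAM = [
    ("lw",   "$t0", ["$s2"]),         # lw   $t0, 20($s2)
    ("addu", "$t1", ["$t0", "$t2"]),  # addu $t1, $t0, $t2
    ("sub",  "$s4", ["$s4", "$t3"]),  # sub  $s4, $s4, $t3
    ("slti", "$t5", ["$s4"]),         # slti $t5, $s4, 20
]
LATENCY = {"lw": 2, "addu": 1, "sub": 1, "slti": 1}  # hypothetical

print("in order:    ", schedule(PROGRAM, LATENCY, in_order=True))
print("out of order:", schedule(PROGRAM, LATENCY, in_order=False))
```

In this model the out-of-order version issues sub in cycle 1, while addu is still waiting for the load, and finishes issuing one cycle earlier.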

35. Superscalar MIPS
Assume two instructions are issued per clock cycle: one an integer ALU operation or branch, the other a load or store.
– We need to fetch and decode 64 bits of instructions per cycle.
– Extra resources are required.
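How close such a two-issue machine gets to a CPI of 0.5 depends on how often instructions can be paired. A Python sketch of the pairing rule on the slide (one ALU/branch instruction plus one load/store per cycle), issuing in program order; it deliberately ignores data dependences and any alignment or ordering restrictions a real implementation may add, and the instruction mix is made up.

```python
def dual_issue_cycles(kinds):
    """Count issue cycles when each cycle may take one 'alu' and one 'mem'
    instruction, in program order."""
    cycles = 0
    i = 0
    while i < len(kinds):
        # Pair the next two instructions only if they need different slots.
        if i + 1 < len(kinds) and kinds[i] != kinds[i + 1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles

mix = ["alu", "mem", "alu", "mem", "alu", "alu", "mem"]
cycles = dual_issue_cycles(mix)
print(cycles, "cycles, IPC =", round(len(mix) / cycles, 2),
      "CPI =", round(cycles / len(mix), 2))
```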

36. Dynamic Scheduling
The hardware performs the scheduling:
– hardware tries to find instructions to execute
– out-of-order execution is possible
– speculative execution and dynamic branch prediction
All modern processors are very complicated:
– DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
– PowerPC and Pentium: branch history table
– Compiler technology is important
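A branch history table keeps a small amount of state per branch to predict its next outcome. A Python sketch of one common scheme, a 2-bit saturating counter for a single branch (predict taken when the counter is 2 or 3); this is a generic illustration, not a description of the specific PowerPC or Pentium designs, and the loop-branch outcome pattern is made up.

```python
def two_bit_predictor_hits(outcomes, counter=0):
    """Count correct predictions of a 2-bit saturating counter for one branch.

    counter 0-1: predict not taken; 2-3: predict taken.
    """
    hits = 0
    for taken in outcomes:
        prediction = counter >= 2
        if prediction == taken:
            hits += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits

# Hypothetical loop branch: taken three times, then falls through; the loop runs 3 times.
outcomes = [True, True, True, False] * 3
print(two_bit_predictor_hits(outcomes), "of", len(outcomes))
```

After warming up, the counter mispredicts only the loop exit: a single not-taken outcome moves it from 3 to 2, which still predicts taken on the next loop entry.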
