CMPT 334 Computer Organization Chapter 4 The Processor (Pipelining) [Adapted from Computer Organization and Design 5th Edition, Patterson & Hennessy,


1 CMPT 334 Computer Organization Chapter 4 The Processor (Pipelining) [Adapted from Computer Organization and Design 5th Edition, Patterson & Hennessy, © 2014, MK]

2 Improving Performance
Ultimate goal: improve system performance
One idea: pipeline the CPU
Pipelining is a technique in which multiple instructions are overlapped in execution
It relies on the fact that the various parts of the CPU aren’t all used at the same time
Let’s look at an analogy

3 Sequential Laundry
Four roommates need to do laundry
How long does it take to do the laundry sequentially?
▫ Washer, dryer, “folder”, “storer” each take 30 minutes
▫ Total time: 4 loads × 4 steps × 30 minutes = 8 hours

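4 Pipelined Laundry
How long does it take if we can overlap tasks?
▫ Only 3.5 hours! (The first load takes 2 hours; each of the other three finishes 30 minutes after the one before it.)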
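The two laundry totals can be checked with a small sketch. The function names and parameters below are made up for illustration; the only inputs taken from the slides are four loads, four steps, and 30 minutes per step.

```python
def sequential_minutes(loads: int, steps: int, step_minutes: int) -> int:
    """Each load finishes every step before the next load starts."""
    return loads * steps * step_minutes


def pipelined_minutes(loads: int, steps: int, step_minutes: int) -> int:
    """A new load starts as soon as the first step is free.

    The first load occupies `steps` time slots; each later load adds one slot.
    """
    return (steps + loads - 1) * step_minutes


if __name__ == "__main__":
    seq = sequential_minutes(loads=4, steps=4, step_minutes=30)
    pipe = pipelined_minutes(loads=4, steps=4, step_minutes=30)
    print(seq / 60, "hours sequential")  # 8.0 hours sequential
    print(pipe / 60, "hours pipelined")  # 3.5 hours pipelined
```

Note that each individual load still takes 2 hours either way; only the total for the whole workload shrinks.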

5 Pipelining Notes
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
▫ How many instructions can we execute per second?
Potential speedup = number of stages

6 MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
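To see how the five stages overlap, here is a minimal sketch that builds a cycle-by-cycle occupancy chart. It assumes an ideal pipeline (one instruction issued per cycle, no stalls); `pipeline_diagram` is a hypothetical helper, not something from the textbook.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions: int) -> list[list[str]]:
    """Return one row per instruction; column c is the stage occupied in cycle c + 1.

    Assumes an ideal pipeline: instruction i enters IF in cycle i + 1, no stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i reaches stage s in cycle i + s + 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(3):
        print(" ".join(f"{cell:>3}" for cell in row))
```

Reading the chart column by column shows that, once the pipeline is full, every stage is working on a different instruction in the same cycle.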

7 Stages of the Datapath
Stage 1: Instruction Fetch
▫ No matter what the instruction, the 32-bit instruction word must first be fetched from memory
▫ Every time we fetch an instruction, we also increment the PC to prepare it for the next instruction fetch
  ▪ PC = PC + 4, to point to the next instruction

8 Stages of the Datapath
Stage 2: Instruction Decode
▫ First, read the opcode to determine instruction type and field lengths
▫ Second, read in data from all necessary registers
  ▪ For add, read two registers
  ▪ For addi, read one register
  ▪ For jal, no register read necessary
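The decode step can be sketched as bit-field extraction on the 32-bit instruction word. The field positions and the opcodes for add (R-format, 0x00), addi (0x08), and jal (0x03) are the standard MIPS encodings; the function names are illustrative, and the sketch treats every opcode-0 instruction like add, which is a simplification.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11
        "funct": word & 0x3F,           # bits 5..0
        "imm": word & 0xFFFF,           # bits 15..0
    }


def registers_to_read(word: int) -> list[int]:
    """Return the register numbers the ID stage must read for this word."""
    f = decode(word)
    if f["opcode"] == 0x00:   # R-format (treated like add): two source registers
        return [f["rs"], f["rt"]]
    if f["opcode"] == 0x08:   # addi: one source register
        return [f["rs"]]
    if f["opcode"] == 0x03:   # jal: no register read
        return []
    raise ValueError(f"opcode {f['opcode']:#x} not handled in this sketch")


if __name__ == "__main__":
    add_word = 0x02329820  # add $s3, $s1, $s2
    print(registers_to_read(add_word))  # [17, 18], i.e. $s1 and $s2
```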

9 Stages of the Datapath
Stage 3: Execution
▫ Uses the ALU
▫ The real work of most instructions is done here: arithmetic, logic, etc.
▫ What about loads and stores, e.g., lw $t0, 40($t1)?
  ▪ The address we are accessing in memory is 40 + contents of $t1
  ▪ We can use the ALU to do this addition in this stage

10 Stages of the Datapath
Stage 4: Memory Access
▫ Only the load and store instructions do anything during this stage; the others remain idle
Stage 5: Register Write
▫ Most instructions write the result of some computation into a register
▫ Examples: arithmetic, logical, shifts, loads, slt
▫ What about stores, branches, jumps?
  ▪ They don’t write anything into a register at the end
  ▪ These remain idle during this fifth stage

11 MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

12 Datapath Walkthrough: LW, SW
lw $s3, 17($s1)
▫ Stage 1: fetch this instruction, increment PC
▫ Stage 2: decode to find it’s a lw, then read register $s1
▫ Stage 3: add 17 to value in register $s1 (retrieved in Stage 2)
▫ Stage 4: read value from memory address computed in Stage 3
▫ Stage 5: write value read in Stage 4 into register $s3
sw $s3, 17($s1)
▫ Stage 1: fetch this instruction, increment PC
▫ Stage 2: decode to find it’s a sw, then read registers $s1 and $s3
▫ Stage 3: add 17 to value in register $s1 (retrieved in Stage 2)
▫ Stage 4: write value in register $s3 (retrieved in Stage 2) into memory address computed in Stage 3
▫ Stage 5: go idle (nothing to write into a register)
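The lw and sw walkthroughs can be traced in a few lines of code. This is a simplified model, not the real datapath: registers and memory are plain dictionaries, Stage 1 (fetch) is assumed to have already happened, and alignment is not checked. The names `run_lw` and `run_sw` are made up for this sketch.

```python
def run_lw(regs: dict, mem: dict, rt: str, base: str, offset: int) -> None:
    """Trace lw rt, offset(base) through Stages 2 to 5."""
    base_value = regs[base]         # Stage 2: read the base register
    address = base_value + offset   # Stage 3: ALU adds the offset
    loaded = mem[address]           # Stage 4: read data memory
    regs[rt] = loaded               # Stage 5: write the loaded value to rt


def run_sw(regs: dict, mem: dict, rt: str, base: str, offset: int) -> None:
    """Trace sw rt, offset(base) through Stages 2 to 5."""
    base_value = regs[base]         # Stage 2: read the base register...
    store_value = regs[rt]          # ...and the register holding the data
    address = base_value + offset   # Stage 3: ALU adds the offset
    mem[address] = store_value      # Stage 4: write data memory
    # Stage 5: idle, nothing to write into a register


if __name__ == "__main__":
    regs = {"$s1": 83, "$s3": 0}    # example values, chosen so 83 + 17 = 100
    mem = {100: 42}
    run_lw(regs, mem, "$s3", "$s1", 17)
    print(regs["$s3"])              # 42
    regs["$s3"] = 7
    run_sw(regs, mem, "$s3", "$s1", 17)
    print(mem[100])                 # 7
```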

13 Datapath Walkthrough: SLTI, ADD
slti $s3,$s1,17
▫ Stage 1: fetch this instruction, increment PC
▫ Stage 2: decode to find it’s an slti, then read register $s1
▫ Stage 3: compare value retrieved in Stage 2 with the integer 17
▫ Stage 4: go idle
▫ Stage 5: write the result of Stage 3 into register $s3
add $s3,$s1,$s2
▫ Stage 1: fetch this instruction, increment PC
▫ Stage 2: decode to find it’s an add, then read registers $s1 and $s2
▫ Stage 3: add the two values retrieved in Stage 2
▫ Stage 4: idle (nothing to write to memory)
▫ Stage 5: write result of Stage 3 into register $s3
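The slti and add walkthroughs follow the same pattern. In this sketch the register file is a dictionary, Stage 1 is assumed done, and Python integers stand in for 32-bit values, so overflow behavior is not modeled. `run_slti` and `run_add` are illustrative names.

```python
def run_slti(regs: dict, rt: str, rs: str, imm: int) -> None:
    """Trace slti rt, rs, imm through Stages 2 to 5."""
    value = regs[rs]                    # Stage 2: read the source register
    result = 1 if value < imm else 0    # Stage 3: ALU compares with the immediate
    # Stage 4: idle, no memory access
    regs[rt] = result                   # Stage 5: write 1 or 0 into rt


def run_add(regs: dict, rd: str, rs: str, rt: str) -> None:
    """Trace add rd, rs, rt through Stages 2 to 5."""
    a, b = regs[rs], regs[rt]           # Stage 2: read both source registers
    result = a + b                      # Stage 3: ALU adds them
    # Stage 4: idle, no memory access
    regs[rd] = result                   # Stage 5: write the sum into rd


if __name__ == "__main__":
    regs = {"$s1": 5, "$s2": 20, "$s3": 0}  # example values
    run_slti(regs, "$s3", "$s1", 17)
    print(regs["$s3"])                  # 1, because 5 < 17
    run_add(regs, "$s3", "$s1", "$s2")
    print(regs["$s3"])                  # 25
```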

14 Pipeline Performance
Assume time for stages is
▫ 100ps for register read or write
▫ 200ps for other stages
Compare pipelined datapath with single-cycle datapath
Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps          -               700ps
R-format  200ps        100ps          200ps   -              100ps           600ps
beq       200ps        100ps          200ps   -              -               500ps
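The totals in the table are sums of the stage times each instruction class actually uses. A short sketch, using the stage times assumed on this slide and the usual five stage names (the dictionary and function names are illustrative):

```python
# Stage times assumed on the slide: 100 ps for register read/write, 200 ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Which stages do useful work for each instruction class.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time_ps(instr: str) -> int:
    """Sum the times of the stages this instruction class uses."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


if __name__ == "__main__":
    for instr in USES:
        print(instr, total_time_ps(instr))
    # The single-cycle clock must fit the slowest instruction,
    # while the pipelined clock must fit the slowest stage.
    print("single-cycle clock:", max(total_time_ps(i) for i in USES))  # 800
    print("pipelined clock:", max(STAGE_PS.values()))                  # 200
```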

15 Pipeline Performance
Single-cycle (T_c = 800ps)
Pipelined (T_c = 200ps)
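With those two clock periods, total execution time for a run of instructions can be estimated as below. This is a sketch under the assumption of an ideal five-stage pipeline with no stalls; the function names are made up for illustration.

```python
NUM_STAGES = 5


def single_cycle_time_ps(n: int, clock_ps: int = 800) -> int:
    """Every instruction takes one full 800 ps clock cycle."""
    return n * clock_ps


def pipelined_time_ps(n: int, clock_ps: int = 200, stages: int = NUM_STAGES) -> int:
    """The first instruction needs `stages` cycles; each later one finishes a cycle later."""
    return (stages + n - 1) * clock_ps


if __name__ == "__main__":
    for n in (1, 3, 1_000_000):
        print(n, single_cycle_time_ps(n), pipelined_time_ps(n))
```

For a single instruction the pipelined design is actually slower (1000 ps versus 800 ps), while for three instructions it already wins (1400 ps versus 2400 ps). For long runs the ratio approaches 800/200 = 4.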

16 Pipeline Speedup
If all stages are balanced
▫ i.e., all take the same time
▫ $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
If not balanced, speedup is less
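As a worked check, the formula can be applied to the numbers from the Pipeline Performance slides (800 ps single-cycle, five stages of 200, 100, 200, 200, and 100 ps). This is only an illustration with those assumed stage times:

```latex
\begin{align*}
\text{Ideal time between instructions}_{\text{pipelined}}
  &= \frac{800\ \text{ps}}{5} = 160\ \text{ps} \\
\text{Actual time between instructions}_{\text{pipelined}}
  &= \max(200, 100, 200, 200, 100)\ \text{ps} = 200\ \text{ps} \\
\text{Speedup}
  &= \frac{800\ \text{ps}}{200\ \text{ps}} = 4 < 5
\end{align*}
```

Because the stages are not balanced, the clock is set by the slowest stage and the speedup falls short of the number of stages.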

17 Limits to Pipelining: Hazards
Situations that prevent starting the next instruction in the next cycle
Structural hazard
▫ A required resource is busy
Data hazard
▫ Need to wait for a previous instruction to complete its data read/write
Control hazard
▫ Deciding on control action depends on a previous instruction

18 Data Hazards
An instruction depends on completion of data access by a previous instruction
▫ add $s0, $t0, $t1
  sub $t2, $s0, $t3
▫ sub needs the $s0 value that add has not yet written back, so we stall the pipeline
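Spotting this kind of dependence can be sketched as a check of whether the second instruction reads the register the first one writes. The parser below is a hypothetical helper that only handles three-register instructions written like the two on this slide; it does not model how many cycles the stall lasts.

```python
def parse(instr: str) -> tuple[str, str, list[str]]:
    """Split 'add $s0, $t0, $t1' into (op, destination, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def has_data_hazard(first: str, second: str) -> bool:
    """True if `second` reads the register that `first` writes."""
    _, dest, _ = parse(first)
    _, _, sources = parse(second)
    return dest in sources


if __name__ == "__main__":
    print(has_data_hazard("add $s0, $t0, $t1", "sub $t2, $s0, $t3"))  # True
    print(has_data_hazard("add $s0, $t0, $t1", "sub $t2, $t4, $t3"))  # False
```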

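19 Exercise 4.8
Stage latencies:
IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps
Instruction mix:
R-type  beq  lw   sw
45%     20%  20%  15%
What is the clock cycle time in a pipelined and non-pipelined processor?
▫ Pipelined: 350 ps (the slowest stage, ID)
▫ Single-cycle: 1250 ps (the sum of all stages)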
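Both answers follow directly from the stage latencies. A minimal sketch, with illustrative function names:

```python
# Stage latencies given in the exercise, in picoseconds.
STAGE_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}


def pipelined_clock_ps(stage_ps: dict) -> int:
    """The pipelined clock must be long enough for the slowest stage."""
    return max(stage_ps.values())


def single_cycle_clock_ps(stage_ps: dict) -> int:
    """The single-cycle clock must cover all stages back to back."""
    return sum(stage_ps.values())


if __name__ == "__main__":
    print(pipelined_clock_ps(STAGE_PS))     # 350
    print(single_cycle_clock_ps(STAGE_PS))  # 1250
```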

20 Exercise 4.8
Stage latencies:
IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps
Instruction mix:
R-type  beq  lw   sw
45%     20%  20%  15%
What is the total latency of an lw instruction in a pipelined and non-pipelined processor?
▫ Pipelined: 5 stages × 350 ps = 1750 ps
▫ Single-cycle: 1250 ps
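The lw latency can be worked out from the stage latencies, assuming lw passes through all five stages and that each pipelined stage takes one full clock cycle of 350 ps. The function names are illustrative.

```python
# Stage latencies given in the exercise, in picoseconds.
STAGE_PS = {"IF": 250, "ID": 350, "EX": 150, "MEM": 300, "WB": 200}


def lw_latency_pipelined_ps(stage_ps: dict) -> int:
    """lw spends one clock cycle in each stage; the cycle is the slowest stage."""
    return len(stage_ps) * max(stage_ps.values())


def lw_latency_single_cycle_ps(stage_ps: dict) -> int:
    """lw completes in one long cycle that covers every stage."""
    return sum(stage_ps.values())


if __name__ == "__main__":
    print(lw_latency_pipelined_ps(STAGE_PS))     # 1750
    print(lw_latency_single_cycle_ps(STAGE_PS))  # 1250
```

The pipelined latency is longer than the single-cycle latency, which matches the earlier note that pipelining improves throughput, not the latency of a single instruction.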

22 Exercise 4.8
Stage latencies:
IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps
Instruction mix:
R-type  beq  lw   sw
45%     20%  20%  15%
What is the utilization of the data memory?
▫ 35% (lw + sw = 20% + 15%)

23 Exercise 4.8
Stage latencies:
IF     ID     EX     MEM    WB
250ps  350ps  150ps  300ps  200ps
Instruction mix:
R-type  beq  lw   sw
45%     20%  20%  15%
What is the utilization of the write-register port of the “Registers” unit?
▫ 65% (R-type + lw = 45% + 20%)
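Both utilization answers are sums over the instruction mix. The sketch below assumes no stalls, so each instruction occupies each stage for exactly one cycle. It also assumes lw is 20% of the mix, a value inferred from the stated answers and from the percentages summing to 100%. The names are illustrative.

```python
# Instruction mix in percent; the lw share is inferred, see the note above.
MIX = {"R-type": 45, "beq": 20, "lw": 20, "sw": 15}

USES_DATA_MEMORY = {"lw", "sw"}     # only loads and stores access data memory
WRITES_REGISTER = {"R-type", "lw"}  # only these write a result to a register


def utilization(mix: dict, users: set) -> int:
    """Percentage of cycles in which a resource is used by some instruction."""
    return sum(pct for instr, pct in mix.items() if instr in users)


if __name__ == "__main__":
    print(utilization(MIX, USES_DATA_MEMORY))  # 35
    print(utilization(MIX, WRITES_REGISTER))   # 65
```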

