# ECE369 Pipelining


1 ECE369 Pipelining

2 `addm (rs), rt` performs Memory[R[rs]] = R[rt] + Memory[R[rs]]. Assume that memory can be read and written in the same cycle. The register file works the same way, although this is unlikely to be efficient in a real machine. All instructions use the same format, but not every instruction uses every field. Assume that each unused field is set to 0.
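The following is a minimal behavioral sketch of what `addm` computes. It is only an illustration, and it makes two assumptions that do not come from the slide: registers and memory are modeled as Python dictionaries, and the sum wraps at 32 bits. It shows that one instruction needs a memory read, an addition, and a memory write to the same address R[rs]. Because there is no offset field, it writes no register.

```python
# Behavioral sketch of addm (rs), rt:  Memory[R[rs]] = R[rt] + Memory[R[rs]]
# Registers and memory are plain dicts (an assumption for illustration only).

MASK32 = 0xFFFFFFFF  # assumed 32-bit wraparound; overflow traps are ignored


def addm(regs: dict, mem: dict, rs: int, rt: int) -> None:
    """Add R[rt] to the memory word at address R[rs] and store the sum back there."""
    addr = regs[rs]                  # the address is R[rs] itself, with no offset field
    loaded = mem.get(addr, 0)        # memory read (unwritten words read as 0)
    mem[addr] = (regs[rt] + loaded) & MASK32  # memory write; no register is written


# Hypothetical example: R8 holds the address, R9 holds the value to add.
regs = {8: 0x1000, 9: 5}
mem = {0x1000: 37}
addm(regs, mem, rs=8, rt=9)
print(mem[0x1000])  # 42
```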

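3 Control signals for `addm` (x = don't care):

| Instr | RegDst | RegWrite | MemRead | MemWrite | ALUsrc | MemToALU | DataSrc | PCSrc | ALUOp |
|---|---|---|---|---|---|---|---|---|---|
| addm | x | 0 | 1 | 1 | 0 | 0 | 1 | 0 | Add |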

4 Pipelining: One CPU manufacturer has proposed a 10-stage pipeline. Its stages correspond to the MIPS pipeline as follows:

- Instructions are fetched in the FET stage.
- Registers are read in the REG stage.
- ALU operations and memory accesses are both done in the EXE stage.
- Branches are resolved in the DET stage.
- WRB is the writeback stage.
- Memory or the register file can be written and read in the same cycle.

Without forwarding, how many stall cycles are needed for the following code? Show your work to get credit: `lw $t0, 0($a0)` followed by `add $v1, $t0, $t0`.
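The transcript does not reproduce the full 10-stage order, so the sketch below takes the stage list as an input. It encodes the usual rule, under these assumptions: no forwarding, one instruction fetched per cycle, no other stalls, and a same-cycle write and read allowed. A consumer `distance` instructions behind its producer then stalls for max(0, w - r - distance) cycles. Here w is the 0-based position of the producer's write stage and r is that of the consumer's read stage. The function name `raw_stalls` is hypothetical, and the 5-stage MIPS list is used only as a familiar check.

```python
def raw_stalls(stages: list, read_stage: str, write_stage: str, distance: int = 1) -> int:
    """Stall cycles for a read-after-write hazard when there is no forwarding.

    Assumes an in-order pipeline that fetches one instruction per cycle, has no
    other hazards, and lets the register file be written and read in the same
    cycle. `distance` is how many instructions after the producer the consumer
    sits (1 = immediately after).
    """
    w = stages.index(write_stage)  # cycle of the write, counted from the producer's fetch
    r = stages.index(read_stage)   # cycle of the read, counted from the consumer's fetch
    # The consumer reaches its read stage at cycle distance + r (relative to the
    # producer's fetch); that must not come before the producer's write at cycle w.
    return max(0, w - r - distance)


mips5 = ["IF", "ID", "EX", "MEM", "WB"]
print(raw_stalls(mips5, read_stage="ID", write_stage="WB"))  # 2
```

For the slide's question, pass the ten stage names in the order given on the slide, with `read_stage="REG"` and `write_stage="WRB"`. Without forwarding, the loaded value reaches `add` only through the register file, even though memory is accessed in EXE.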

5 Solution

6 Assume that the initial value of R3 is R2 + 396. How many cycles does this loop take to execute?

Loop:

- `LW R1, 0(R2)`
- `ADDI R1, R1, #1`
- `SW R1, 0(R2)`
- `ADDI R2, R2, #4`
- `SUB R4, R3, R2`
- `BNEZ R4, Loop`

Assumptions:

- There is no forwarding or bypassing hardware.
- All memory and register writes occur during the first half of the clock cycle, and all reads occur during the second half. A register read and a register write in the same cycle therefore forward through the register file.
- Branches are handled by flushing the pipeline and are resolved in the MEM stage.

7 Branches are resolved in MEM, so the second iteration starts 17 clock cycles after the first instruction. The last iteration takes 18 cycles. R3 − R2 = 396 and R2 advances by 4 per iteration, so the loop executes 396 / 4 = 99 times. The total is 98 × 17 + 18 = 1684 cycles.
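The iteration count and the total can be checked mechanically. This small Python sketch replays only the three instructions that control the loop, then applies "(iterations − 1) × interval + last". The function names are illustrative. The 17-cycle interval and the 18-cycle last iteration are taken from the slide, not derived by the code.

```python
def loop_iterations(r2: int, r3: int, step: int = 4) -> int:
    """Count iterations by replaying the registers that control the loop."""
    if r3 <= r2 or (r3 - r2) % step != 0:
        raise ValueError("R3 - R2 must be a positive multiple of the step")
    count = 0
    while True:
        count += 1
        r2 += step       # ADDI R2, R2, #4
        r4 = r3 - r2     # SUB  R4, R3, R2
        if r4 == 0:      # BNEZ R4, Loop falls through once R4 == 0
            return count


def total_cycles(iterations: int, interval: int, last: int) -> int:
    """All but the last iteration cost `interval` cycles; the last runs to completion."""
    return (iterations - 1) * interval + last


n = loop_iterations(r2=0, r3=396)   # only the difference R3 - R2 = 396 matters
print(n)                             # 99
print(total_cycles(n, interval=17, last=18))  # 1684
```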

8 Assume that the initial value of R3 is R2 + 396. How many cycles does this loop take to execute?

Loop:

- `LW R1, 0(R2)`
- `ADDI R1, R1, #1`
- `SW R1, 0(R2)`
- `ADDI R2, R2, #4`
- `SUB R4, R3, R2`
- `BNEZ R4, Loop`

Assumptions:

- Forwarding and bypassing hardware is present.
- All memory and register writes occur during the first half of the clock cycle, and all reads occur during the second half. A register read and a register write in the same cycle therefore forward through the register file.
- Branches are resolved in the MEM stage and handled by predicting not taken. Mark each branch misprediction with (m) in the table.

9 Branches are resolved in MEM, so the second iteration starts 10 clock cycles after the first instruction. The last iteration takes 11 cycles. The loop executes 99 times, so the total is 98 × 10 + 11 = 991 cycles.
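A small timing model reproduces the 17/18 and 10/11 cycle counts of the last two solutions. The sketch below is a simplified, hypothetical Python model, not the course's own method. The names `Instr`, `LOOP` and `simulate` are illustrative, and the model's assumptions are listed in its docstring. The main simplification is that every stall happens in ID. The two data-hazard rules reflect the question's conditions. Without forwarding, a reader waits for its writer's WB. With forwarding, only a load followed immediately by a use costs a bubble.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    op: str
    dst: Optional[str]   # register written, or None (SW, BNEZ)
    srcs: tuple          # registers read


# The loop body from the slides, in its original order.
LOOP = [
    Instr("LW", "R1", ("R2",)),
    Instr("ADDI", "R1", ("R1",)),
    Instr("SW", None, ("R1", "R2")),
    Instr("ADDI", "R2", ("R2",)),
    Instr("SUB", "R4", ("R3", "R2")),
    Instr("BNEZ", None, ("R4",)),
]


def simulate(n_iters: int, forwarding: bool) -> int:
    """Return the cycle in which the last instruction of the last iteration leaves WB.

    Simplified model (assumptions):
    - classic 5-stage in-order pipeline, first instruction fetched in cycle 1;
    - every stall happens in ID; d is the cycle an instruction finally leaves ID,
      so EX = d + 1, MEM = d + 2, WB = d + 3;
    - registers are written in the first half of WB and read in the second half of ID;
    - the branch is resolved in MEM and the loop target is fetched the next cycle
      (for a taken branch, flushing and predict-not-taken give the same timing here).
    """
    producer = {}   # register -> (d of its latest writer, writer is a load)
    fetch = 1       # IF cycle of the next instruction
    prev_d = 0      # d of the previous instruction (in-order issue)
    last_wb = 0
    for _ in range(n_iters):
        for ins in LOOP:
            d = max(fetch + 1, prev_d + 1)
            for src in ins.srcs:
                if src not in producer:
                    continue  # e.g. R3, which nothing in the loop writes
                pd, is_load = producer[src]
                if not forwarding:
                    d = max(d, pd + 3)   # read no earlier than the writer's WB cycle
                elif is_load:
                    d = max(d, pd + 2)   # load-use hazard: one bubble even with forwarding
                else:
                    d = max(d, pd + 1)   # ALU result forwarded into EX, no bubble
            if ins.dst is not None:
                producer[ins.dst] = (d, ins.op == "LW")
            prev_d, last_wb = d, d + 3
            # After the branch, the target is fetched once MEM (cycle d + 2) resolves it.
            fetch = d + 3 if ins.op == "BNEZ" else fetch + 1
    return last_wb


print(simulate(99, forwarding=False))  # 1684
print(simulate(99, forwarding=True))   # 991
```

Running a single iteration gives the "last iteration" length (18 or 11). The difference between two and one iterations gives the interval between iteration starts (17 or 10).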

10 Assume that the initial value of R3 is R2 + 396. How many cycles does this loop take to execute?

Loop:

- `LW R1, 0(R2)`
- `ADDI R1, R1, #1`
- `SW R1, 0(R2)`
- `ADDI R2, R2, #4`
- `SUB R4, R3, R2`
- `BNEZ R4, Loop`

Assume the MIPS pipeline with a single-cycle delayed branch and normal forwarding and bypassing hardware. Schedule the instructions in the loop, including the branch delay slot. You may reorder the instructions and modify individual instruction operands. Do not undertake other loop transformations that change the number or opcodes of the instructions in the loop. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

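11 Total: 98 × 6 + 10 = 598 clocks. Each of the first 98 iterations occupies 6 cycles before the next one starts, and the last iteration takes 10 cycles.

Loop:

- `LW R1, 0(R2)`
- `ADDI R1, R1, #1`
- `SW R1, 0(R2)`
- `ADDI R2, R2, #4`
- `SUB R4, R3, R2`
- `BNEZ R4, Loop`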
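The transcript does not reproduce the scheduled version of the loop, so the schedule used below is an assumption. It is one plausible schedule that keeps the six instructions:

- `ADDI R2` and `SUB` move up to sit behind `LW`.
- `SW` fills the branch delay slot.
- `SW`'s offset changes from 0 to -4, because R2 has already been incremented.

The Python sketch checks only that this hypothetical schedule, run with delay-slot semantics, produces the same registers and memory as the original loop. It does not model timing. The tuple encoding and the `run` interpreter are illustrative, not a real MIPS simulator.

```python
# Toy interpreter for the loop body; instructions are tuples (illustrative encoding).
# LW/SW: (op, rt, offset, base)  ADDI: (op, rt, rs, imm)  SUB: (op, rd, rs, rt)
# BNEZ: (op, rs, target_index)


def run(program, regs, mem, delayed_branch):
    """Execute `program` from index 0 until it falls off the end; return (regs, mem)."""
    regs, mem = dict(regs), dict(mem)   # work on copies
    pc, pending = 0, None               # pending: taken-branch target awaiting its delay slot
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "LW":
            rt, off, base = args
            regs[rt] = mem.get(regs[base] + off, 0)
        elif op == "SW":
            rt, off, base = args
            mem[regs[base] + off] = regs[rt]
        elif op == "ADDI":
            rt, rs, imm = args
            regs[rt] = regs[rs] + imm
        elif op == "SUB":
            rd, rs, rt = args
            regs[rd] = regs[rs] - regs[rt]
        elif op == "BNEZ":
            rs, target = args
            if regs[rs] != 0:
                if delayed_branch:
                    pending = target        # jump only after the next instruction
                else:
                    next_pc = target
        if op != "BNEZ" and pending is not None:
            next_pc, pending = pending, None  # the delay-slot instruction just ran
        pc = next_pc
    return regs, mem


ORIGINAL = [
    ("LW", "R1", 0, "R2"),
    ("ADDI", "R1", "R1", 1),
    ("SW", "R1", 0, "R2"),
    ("ADDI", "R2", "R2", 4),
    ("SUB", "R4", "R3", "R2"),
    ("BNEZ", "R4", 0),
]

# Hypothetical schedule: SW fills the delay slot with its offset adjusted to -4.
SCHEDULED = [
    ("LW", "R1", 0, "R2"),
    ("ADDI", "R2", "R2", 4),
    ("SUB", "R4", "R3", "R2"),
    ("ADDI", "R1", "R1", 1),
    ("BNEZ", "R4", 0),
    ("SW", "R1", -4, "R2"),
]

regs0 = {"R1": 0, "R2": 1000, "R3": 1396, "R4": 0}   # R3 = R2 + 396
mem0 = {1000 + 4 * i: i for i in range(99)}           # arbitrary test data

a = run(ORIGINAL, regs0, mem0, delayed_branch=False)
b = run(SCHEDULED, regs0, mem0, delayed_branch=True)
print(a == b)  # True
```

Running `SCHEDULED` with `delayed_branch=False` gives a different result, because every taken branch would skip the store. This is why the reordering is valid only with the delayed-branch semantics the question specifies.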


