Presentation is loading. Please wait.

Presentation is loading. Please wait.

Daxia Ge Friday February 9th, 2007

Similar presentations


Presentation on theme: "Daxia Ge Friday February 9th, 2007"— Presentation transcript:

1 Daxia Ge Friday February 9th, 2007
EE 108B Review Session #4 Daxia Ge Friday February 9th, 2007

2 Admin Quiz #1 - feedback PA1 – already due. Nothing until next week.
Lab2 – due Tuesday HW3 – out next week Tape-ahead lecture Tuesday

3 Quiz #1

4 Pipeline: motivation

5

6 Pipelining Terms: Latency vs. Throughput Dependencies vs. Hazards
Forwarding vs. Bypassing Interlocking MIPS vs. MIPS

7 Test your understanding
1. Short Questions a. To increase throughput, the Intel Pentium 4 has a 20 stage pipeline while the AMD Athlon XP has only 9 stages. The Intel processor is therefore able to achieve a much higher throughput. In practice however, this is not necessarily the case. Give the most significant reason why the Intel Pentium 4 does not complete operations at twice the rate of the Athlon.

8 Test your understanding (cont’)
b. You need to make a choice between a pipelined MIPS implementation without any bypassing or interlocking and one with bypassing and interlocking. You know exactly which programs you are going to execute. Why might you get better performance from the implementation without any bypassing or interlocking?

9 MIPS: Pipelined MIPS was designed with pipelining in mind:
Split I & D memory Fixed length instructions Limited/fixed instruction formats Memory accesses limited to load/stores Memory access are aligned Write before Read*

10 Hazards Definition: anytime the next instruction cannot execute
Structural – “resource sharing” Data – dependence/forwarding/interlocking Can be helped out by compiler. Control – “Decisions Decisions.” (next week)

11 Structural Hazard To avoid structural hazards:
Use resource once per instruction Always in the same cycle IF RF/ID EX MEM WB 1 2 3 4 5 1 2 3 4 R-type IF RF/ID EX WB

12 Structural (cont’) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6
Clock Oops! We have a problem! R-type IF RF/ID EX WB R-type IF RF/ID EX WB Load IF RF/ID EX MEM WB ALU IF RF/ID EX WB R-type IF RF/ID EX WB

13 Data Dependency Dependencies for instruction j following instruction i
Read after Write (RAW) (true dependence) Instruction j tries to read before instruction i tries to write it Write after Write (WAW) Instruction j tries to write an operand before i writes its value Write after Read (WAR) Instruction j tries to write a destination before it is read by i

14 Dependencies vs. Hazards
Dependencies backwards in time are hazards Time (clock cycles) IF ID/RF EX MEM WB add r1,r2,r3 Im Reg ALU Reg Dm I n s t r. O r d e ALU sub r4, r1, r3 Im Reg Dm Reg ALU and r6, r1, r7 Im Reg Dm Reg or r8, r1, r9 Im Reg ALU Dm Reg ALU xor r10, r1, r11 Im Reg Dm Reg

15 Dependencies vs. Hazards
Dependencies backwards in time are hazards Time (clock cycles) IF ID/RF EX MEM WB add r1,r2,r3 Im Reg ALU Reg Dm I n s t r. O r d e sub r4, r1, r3 ALU Im Reg Dm and r6, r1, r7 ALU Im Reg or r8, r1, r9 Im Reg xor r10, r1, r11 Im

16 Data forwarding

17 Sample Problem 3 Consider executing the following code on the pipelined datapath add $5, $6, $7 lw $6, 100($7) sub $7, $6, $8 How many cycles will it take to execute this code? What dependencies need to be resolved? Illustrate how the code will actually be executed (incorporating any stalls or forwarding) so as to resolve the identified problems.

18 Data Hazard: Back in time Dependency
Do we really need $6 so early? Do we actually have $6 so late?

19 1 stall is enough!

20 Or re-order in an independent instruction!
1 stall is enough! Or re-order in an independent instruction!

21 Sample Problem 4 IF: fetch the instruction
Suppose the MIPS instruction set had one additional instruction: addm r1, (r2), r3 that added the contents of the memory location specified by r2 to the contents of r3 and placed the result in r1. With this instruction, a pipelined implementation might use the following 6-stage pipeline: IF: fetch the instruction RF: decode the instruction and read the registers EA: calculate the effective address using a special address adder MEM: access data memory if needed ALU: all ALU operations are performed here WB: write the result into the registers (assume that both the read and write of the registers take the entire clock cycle and the value is available only after the completion of this cycle).

22 Sample Problem 4 a. For this part assume that the machine has no bypassing or forwarding. Show that stalls would be encountered in executing the following code sequence, by annotating the following code sequence, if a stall is longer than one clock cycle, make sure you say how many cycles it is. add r1, r2,r3 addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)

23 Sample Problem 4a Cycle Instruction 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 add r1,r2,r3 addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)

24 Sample Problem 4a Cycle Instruction 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 add r1,r2,r3 IF RF EA ME AL WB addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)

25 Sample Problem 4 For this part assume that the machine has full bypassing or forwarding, and that results can be used as soon as they are available. Show what stalls would be encountered in executing the following code sequence, if a stall is longer than one clock cycle, make sure you say how many cycles it is. add r1, r2,r3 addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)

26 Sample Problem 4b Cycle Instruction 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 add r1,r2,r3 addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)

27 Sample Problem 4b Cycle Instruction 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15 16 17 add r1,r2,r3 IF RF EA ME AL WB addm r4,(r5),r1 addm r6,(r4),r1 lw r7,0(r4)


Download ppt "Daxia Ge Friday February 9th, 2007"

Similar presentations


Ads by Google