CSC 4250 Computer Architectures October 17, 2006 Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation


1 CSC 4250 Computer Architectures October 17, 2006 Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation

2 MIPS FP Unit using Tomasulo’s Algorithm

3 MIPS Processor with Scoreboard

4 Three Steps in Execution for Tomasulo’s Algorithm
1. Issue: if no structural hazards
2. Execute: if both operands are available
3. Write result: on the CDB (from there into the registers and any reservation stations waiting for the result)

Recall that for Scoreboard: Four Steps in Execution
1. Issue: if no structural or WAW hazards
2. Read operands: if no RAW hazards
3. Execute: if both operands are received
4. Write result: if no WAR hazards
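
The Execute and Write result steps for Tomasulo's algorithm can be sketched in a few lines of Python. This is only an illustrative sketch, not the hardware description: the names `Station`, `ready`, and `broadcast` are hypothetical, and the operand values 1.5 and 4.0 are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Station:
    """One reservation station entry (field names are illustrative)."""
    name: str
    op: str
    vj: Optional[float] = None  # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags of the stations still producing operands
    qk: Optional[str] = None

def ready(rs: Station) -> bool:
    """Step 2 (Execute): may begin only when neither operand is pending."""
    return rs.qj is None and rs.qk is None

def broadcast(tag: str, value: float, stations: Dict[str, Station],
              reg_status: Dict[str, str], regs: Dict[str, float]) -> None:
    """Step 3 (Write result): put (tag, value) on the CDB; all waiters capture it."""
    for rs in stations.values():
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            regs[reg] = value
            del reg_status[reg]

# SUB.D F8,F2,F6 sits in Add1 waiting for both loads (made-up data values).
stations = {"Add1": Station("Add1", "Sub", qj="Load2", qk="Load1")}
reg_status = {"F2": "Load2", "F6": "Load1", "F8": "Add1"}
regs: Dict[str, float] = {}

broadcast("Load1", 1.5, stations, reg_status, regs)
still_waiting = not ready(stations["Add1"])   # F2 is still pending
broadcast("Load2", 4.0, stations, reg_status, regs)
now_ready = ready(stations["Add1"])           # both operands captured
```

Note that the waiting station never reads the register file after issue: it receives its operands directly from the CDB, which is the generalized forwarding the algorithm relies on.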

5 How Hazards are Handled
Structural hazards: reservation stations allow more instructions to be issued
RAW hazards: an instruction is executed only when its operands are available
WAR and WAW hazards: register renaming eliminates these hazards by renaming all destination registers, including those with a pending read or write for an earlier instruction, so that an out-of-order write does not affect any instruction that depends on an earlier value of an operand
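
A small Python sketch may help show why renaming removes WAR and WAW hazards. It assumes a simplified model in which an instruction copies each source operand (or the tag of its producer) at issue time; the helper names `read_operand`, `rename_dest`, and `write_result` and all numeric values are hypothetical.

```python
from typing import Dict, Optional, Tuple

reg_values: Dict[str, float] = {"F6": 10.0}  # illustrative starting value
reg_status: Dict[str, str] = {}              # register -> tag of pending writer

def read_operand(reg: str) -> Tuple[Optional[float], Optional[str]]:
    """At issue: copy the value if it is available, else the producer's tag."""
    if reg in reg_status:
        return None, reg_status[reg]
    return reg_values[reg], None

def rename_dest(reg: str, tag: str) -> None:
    """At issue: the destination register now refers to the new station."""
    reg_status[reg] = tag

def write_result(tag: str, value: float) -> None:
    """On the CDB: only a register still waiting on this tag is updated."""
    for reg, pending in list(reg_status.items()):
        if pending == tag:
            reg_values[reg] = value
            del reg_status[reg]

# WAR: DIV.D reads F6 at issue; a later ADD.D writes F6 and finishes first.
div_vk, div_qk = read_operand("F6")   # DIV.D captures the old value
rename_dest("F6", "Add2")             # ADD.D F6,... issues
write_result("Add2", 99.0)            # ADD.D completes out of order
war_safe = (div_vk == 10.0 and reg_values["F6"] == 99.0)

# WAW: two pending writers to F6; only the later one may update the register.
rename_dest("F6", "Load1")            # earlier writer
rename_dest("F6", "Add2")             # later writer replaces the tag
write_result("Add2", 7.0)             # later writer completes first
write_result("Load1", 3.0)            # earlier writer's result is not written to F6
waw_safe = reg_values["F6"] == 7.0
```

In both cases the hazard disappears because consumers hold their own copy of the operand (or its tag), and the register file accepts a result only from the most recently issued writer.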

6 Tags
A tag is a 4-bit quantity that denotes one of five reservation stations or one of six load buffers
Tag fields are found in the reservation stations, the register file, and the store buffers
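
The arithmetic behind the 4-bit tag can be checked with a short sketch. The numbering below is an assumption made for illustration (the slide does not give the actual encoding), and reserving code 0 for "no pending producer" is borrowed from the usual textbook convention rather than stated here.

```python
STATIONS = ["Add1", "Add2", "Add3", "Mult1", "Mult2"]  # five reservation stations
LOAD_BUFFERS = [f"Load{i}" for i in range(1, 7)]       # six load buffers
NAMES = STATIONS + LOAD_BUFFERS
TAG_BITS = 4

# Hypothetical encoding: codes 1..11 name a producer, 0 means "value is ready".
ENCODE = {name: code for code, name in enumerate(NAMES, start=1)}
DECODE = {code: name for name, code in ENCODE.items()}

def encode(name):
    """Map a producer name (or None) to its tag code."""
    return 0 if name is None else ENCODE[name]

def decode(tag):
    """Map a tag code back to a producer name (or None for code 0)."""
    return None if tag == 0 else DECODE[tag]

# Eleven producers plus the "ready" code need 12 values: 3 bits are too few.
fits_in_three_bits = len(NAMES) + 1 <= 2 ** 3
fits_in_four_bits = len(NAMES) + 1 <= 2 ** TAG_BITS
```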

7 Example
L.D   F6,34(R2)
L.D   F2,45(R3)
MUL.D F0,F2,F4
SUB.D F8,F2,F6
DIV.D F10,F0,F6
ADD.D F6,F8,F2

8 Three Tables (the 1st table is not part of the hardware; the 2nd and 3rd tables are distributed)
1. Instruction status: indicates which of the three steps each instruction is in
2. Reservation stations: Busy, Op, Vj, Vk, Qj, Qk, A (V = operand value; Q = reservation station that will produce the operand; A = address information for a load or store)
3. Register status: indicates which reservation station will write each register
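
The reservation-station and register-status tables can be modeled with a short Python sketch of the Issue step. It is a simplification under stated assumptions: the caller picks the station explicitly (the structural-hazard check is omitted), operand values are kept as symbolic strings such as `Reg[F4]`, and the names `RS`, `source`, `issue_load`, and `issue_fp` are hypothetical. Issuing the six example instructions in order should reproduce the Q fields and register status shown in Figure 0.3.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class RS:
    """Reservation station fields: busy, op, Vj, Vk, Qj, Qk, A."""
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    a: Optional[str] = None

NAMES = ["Load1", "Load2", "Add1", "Add2", "Add3", "Mult1", "Mult2"]
stations: Dict[str, RS] = {name: RS() for name in NAMES}
reg_status: Dict[str, str] = {}  # register -> station that will write it

def source(reg: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (V, Q): a symbolic value if ready, else the producer's tag."""
    if reg in reg_status:
        return None, reg_status[reg]
    return f"Reg[{reg}]", None

def issue_load(name: str, dest: str, offset: int, base: str) -> None:
    rs = stations[name]
    rs.busy, rs.op, rs.a = True, "Load", f"{offset}+Reg[{base}]"
    reg_status[dest] = name

def issue_fp(name: str, op: str, dest: str, src_j: str, src_k: str) -> None:
    rs = stations[name]
    rs.busy, rs.op = True, op
    rs.vj, rs.qj = source(src_j)
    rs.vk, rs.qk = source(src_k)
    reg_status[dest] = name  # rename the destination last

issue_load("Load1", "F6", 34, "R2")            # L.D   F6,34(R2)
issue_load("Load2", "F2", 45, "R3")            # L.D   F2,45(R3)
issue_fp("Mult1", "Mult", "F0", "F2", "F4")    # MUL.D F0,F2,F4
issue_fp("Add1", "Sub", "F8", "F2", "F6")      # SUB.D F8,F2,F6
issue_fp("Mult2", "Div", "F10", "F0", "F6")    # DIV.D F10,F0,F6
issue_fp("Add2", "Add", "F6", "F8", "F2")      # ADD.D F6,F8,F2
```

Reading the sources before renaming the destination matters for an instruction such as ADD.D F6,F8,F2, and it is also why DIV.D keeps the tag Load1 for F6 even though F6 is later renamed to Add2.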

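9 Figure 0.0

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        -
L.D   F2,45(R3)  √      √        -
MUL.D F0,F2,F4   √      -        -
SUB.D F8,F2,F6   -      -        -
DIV.D F10,F0,F6  -      -        -
ADD.D F6,F8,F2   -      -        -

Reservation stations
Name   Busy  Op    Vj  Vk       Qj     Qk  A
Load1  Yes   Load  -   -        -      -   34+Reg[R2]
Load2  Yes   Load  -   -        -      -   45+Reg[R3]
Add1   No
Add2   No
Add3   No
Mult1  Yes   Mult  -   Reg[F4]  Load2  -   -
Mult2  No

Register status
Field  F0     F2     F4  F6     F8  F10  F12  …  F30
Qi     Mult1  Load2  -   Load1  -   -    -    …  -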

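10 Figure 0.1

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        -
L.D   F2,45(R3)  √      √        -
MUL.D F0,F2,F4   √      -        -
SUB.D F8,F2,F6   √      -        -
DIV.D F10,F0,F6  -      -        -
ADD.D F6,F8,F2   -      -        -

Reservation stations
Name   Busy  Op    Vj  Vk       Qj     Qk     A
Load1  Yes   Load  -   -        -      -      34+Reg[R2]
Load2  Yes   Load  -   -        -      -      45+Reg[R3]
Add1   Yes   Sub   -   -        Load2  Load1  -
Add2   No
Add3   No
Mult1  Yes   Mult  -   Reg[F4]  Load2  -      -
Mult2  No

Register status
Field  F0     F2     F4  F6     F8    F10  F12  …  F30
Qi     Mult1  Load2  -   Load1  Add1  -    -    …  -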

11 Figure 0.2 (Suppose LD is slow)

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        -
L.D   F2,45(R3)  √      √        -
MUL.D F0,F2,F4   √      -        -
SUB.D F8,F2,F6   √      -        -
DIV.D F10,F0,F6  √      -        -
ADD.D F6,F8,F2   -      -        -

Reservation stations
Name   Busy  Op    Vj  Vk       Qj     Qk     A
Load1  Yes   Load  -   -        -      -      34+Reg[R2]
Load2  Yes   Load  -   -        -      -      45+Reg[R3]
Add1   Yes   Sub   -   -        Load2  Load1  -
Add2   No
Add3   No
Mult1  Yes   Mult  -   Reg[F4]  Load2  -      -
Mult2  Yes   Div   -   -        Mult1  Load1  -

Register status
Field  F0     F2     F4  F6     F8    F10    F12  …  F30
Qi     Mult1  Load2  -   Load1  Add1  Mult2  -    …  -

12 Figure 0.3 (Suppose LD is slow)

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        -
L.D   F2,45(R3)  √      √        -
MUL.D F0,F2,F4   √      -        -
SUB.D F8,F2,F6   √      -        -
DIV.D F10,F0,F6  √      -        -
ADD.D F6,F8,F2   √      -        -

Reservation stations
Name   Busy  Op    Vj  Vk       Qj     Qk     A
Load1  Yes   Load  -   -        -      -      34+Reg[R2]
Load2  Yes   Load  -   -        -      -      45+Reg[R3]
Add1   Yes   Sub   -   -        Load2  Load1  -
Add2   Yes   Add   -   -        Add1   Load2  -
Add3   No
Mult1  Yes   Mult  -   Reg[F4]  Load2  -      -
Mult2  Yes   Div   -   -        Mult1  Load1  -

Register status
Field  F0     F2     F4  F6    F8    F10    F12  …  F30
Qi     Mult1  Load2  -   Add2  Add1  Mult2  -    …  -

13 Figure 3.3

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        √
L.D   F2,45(R3)  √      √        -
MUL.D F0,F2,F4   √      -        -
SUB.D F8,F2,F6   √      -        -
DIV.D F10,F0,F6  √      -        -
ADD.D F6,F8,F2   √      -        -

Reservation stations
Name   Busy  Op    Vj  Vk               Qj     Qk     A
Load1  No
Load2  Yes   Load  -   -                -      -      45+Reg[R3]
Add1   Yes   Sub   -   Mem[34+Reg[R2]]  Load2  -      -
Add2   Yes   Add   -   -                Add1   Load2  -
Add3   No
Mult1  Yes   Mult  -   Reg[F4]          Load2  -      -
Mult2  Yes   Div   -   Mem[34+Reg[R2]]  Mult1  -      -

Register status
Field  F0     F2     F4  F6    F8    F10    F12  …  F30
Qi     Mult1  Load2  -   Add2  Add1  Mult2  -    …  -

14 Figure 0.4 (2nd load just completes)

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        √
L.D   F2,45(R3)  √      √        √
MUL.D F0,F2,F4   √      √        -
SUB.D F8,F2,F6   √      √        -
DIV.D F10,F0,F6  √      -        -
ADD.D F6,F8,F2   √      -        -

Reservation stations
Name   Busy  Op    Vj               Vk               Qj     Qk  A
Load1  No
Load2  No
Add1   Yes   Sub   Mem[45+Reg[R3]]  Mem[34+Reg[R2]]  -      -   -
Add2   Yes   Add   -                Mem[45+Reg[R3]]  Add1   -   -
Add3   No
Mult1  Yes   Mult  Mem[45+Reg[R3]]  Reg[F4]          -      -   -
Mult2  Yes   Div   -                Mem[34+Reg[R2]]  Mult1  -   -

Register status
Field  F0     F2  F4  F6    F8    F10    F12  …  F30
Qi     Mult1  -   -   Add2  Add1  Mult2  -    …  -

15 Figure 3.4

Instruction status
Instruction      Issue  Execute  Write Result
L.D   F6,34(R2)  √      √        √
L.D   F2,45(R3)  √      √        √
MUL.D F0,F2,F4   √      √        -
SUB.D F8,F2,F6   √      √        √
DIV.D F10,F0,F6  √      -        -
ADD.D F6,F8,F2   √      √        √

Reservation stations
Name   Busy  Op    Vj               Vk               Qj     Qk  A
Load1  No
Load2  No
Add1   No
Add2   No
Add3   No
Mult1  Yes   Mult  Mem[45+Reg[R3]]  Reg[F4]          -      -   -
Mult2  Yes   Div   -                Mem[34+Reg[R2]]  Mult1  -   -

Register status
Field  F0     F2  F4  F6  F8  F10    F12  …  F30
Qi     Mult1  -   -   -   -   Mult2  -    …  -

16 Loop-Based Example
Loop: L.D    F0,0(R1)
      MUL.D  F4,F0,F2
      S.D    F4,0(R1)
      DADDIU R1,R1,#-8
      BNE    R1,R2,Loop
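
For readers less familiar with MIPS assembly, the loop above can be read as the following Python sketch. It assumes memory is modeled as a dictionary from byte address to double, and the function name `scale_in_place` and the sample addresses and values are hypothetical.

```python
from typing import Dict

def scale_in_place(mem: Dict[int, float], r1: int, r2: int, f2: float) -> None:
    """Multiply each 8-byte element by f2, walking down from r1 until r1 == r2."""
    while True:
        f0 = mem[r1]      # L.D    F0,0(R1)
        f4 = f0 * f2      # MUL.D  F4,F0,F2
        mem[r1] = f4      # S.D    F4,0(R1)
        r1 -= 8           # DADDIU R1,R1,#-8
        if r1 == r2:      # BNE    R1,R2,Loop (falls through when equal)
            break

# Three doubles at made-up addresses 8, 16, 24; R2 holds the stop address 0.
mem = {8: 1.0, 16: 2.0, 24: 3.0}
scale_in_place(mem, r1=24, r2=0, f2=2.0)
```

Each iteration touches a different memory location and F2 is never written, so the iterations are independent; only the reuse of the register names F0 and F4 links them, which is what renaming removes when several iterations are in flight.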

17 Figure 0.5. One active iteration of loop

Instruction status
Instruction     Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)  1          √      √        -
MUL.D F4,F0,F2  1          √      -        -
S.D   F4,0(R1)  1          √      -        -
L.D   F0,0(R1)  2          -      -        -
MUL.D F4,F0,F2  2          -      -        -
S.D   F4,0(R1)  2          -      -        -

Reservation stations
Name    Busy  Op     Vj  Vk       Qj     Qk     A
Load1   Yes   Load   -   -        -      -      Reg[R1]
Load2   No
Add1    No
Add2    No
Add3    No
Mult1   Yes   Mult   -   Reg[F2]  Load1  -      -
Mult2   No
Store1  Yes   Store  -   -        -      Mult1  Reg[R1]
Store2  No

Register status
Field  F0     F2  F4     F6  F8  F10  F12  …  F30
Qi     Load1  -   Mult1  -   -   -    -    …  -

18 Figure 0.6. One+ active iteration of loop

Instruction status
Instruction     Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)  1          √      √        -
MUL.D F4,F0,F2  1          √      -        -
S.D   F4,0(R1)  1          √      -        -
L.D   F0,0(R1)  2          √      -        -
MUL.D F4,F0,F2  2          -      -        -
S.D   F4,0(R1)  2          -      -        -

Reservation stations
Name    Busy  Op     Vj  Vk       Qj     Qk     A
Load1   Yes   Load   -   -        -      -      Reg[R1]
Load2   Yes   Load   -   -        -      -      Reg[R1]-8
Add1    No
Add2    No
Add3    No
Mult1   Yes   Mult   -   Reg[F2]  Load1  -      -
Mult2   No
Store1  Yes   Store  -   -        -      Mult1  Reg[R1]
Store2  No

Register status
Field  F0     F2  F4     F6  F8  F10  F12  …  F30
Qi     Load2  -   Mult1  -   -   -    -    …  -

19 Figure 0.7. One++ active iteration of loop

Instruction status
Instruction     Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)  1          √      √        -
MUL.D F4,F0,F2  1          √      -        -
S.D   F4,0(R1)  1          √      -        -
L.D   F0,0(R1)  2          √      √        -
MUL.D F4,F0,F2  2          √      -        -
S.D   F4,0(R1)  2          -      -        -

Reservation stations
Name    Busy  Op     Vj  Vk       Qj     Qk     A
Load1   Yes   Load   -   -        -      -      Reg[R1]
Load2   Yes   Load   -   -        -      -      Reg[R1]-8
Add1    No
Add2    No
Add3    No
Mult1   Yes   Mult   -   Reg[F2]  Load1  -      -
Mult2   Yes   Mult   -   Reg[F2]  Load2  -      -
Store1  Yes   Store  -   -        -      Mult1  Reg[R1]
Store2  No

Register status
Field  F0     F2  F4     F6  F8  F10  F12  …  F30
Qi     Load2  -   Mult2  -   -   -    -    …  -

20 Figure 3.6. Two active iterations of loop

Instruction status
Instruction     Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)  1          √      √        -
MUL.D F4,F0,F2  1          √      -        -
S.D   F4,0(R1)  1          √      -        -
L.D   F0,0(R1)  2          √      √        -
MUL.D F4,F0,F2  2          √      -        -
S.D   F4,0(R1)  2          √      -        -

Reservation stations
Name    Busy  Op     Vj  Vk       Qj     Qk     A
Load1   Yes   Load   -   -        -      -      Reg[R1]
Load2   Yes   Load   -   -        -      -      Reg[R1]-8
Add1    No
Add2    No
Add3    No
Mult1   Yes   Mult   -   Reg[F2]  Load1  -      -
Mult2   Yes   Mult   -   Reg[F2]  Load2  -      -
Store1  Yes   Store  -   -        -      Mult1  Reg[R1]
Store2  Yes   Store  -   -        -      Mult2  Reg[R1]-8

Register status
Field  F0     F2  F4     F6  F8  F10  F12  …  F30
Qi     Load2  -   Mult2  -   -   -    -    …  -

21 IBM 360/91
Great ideas:
• Data tagging
• Register renaming
• Dynamic detection of memory hazards
• Generalized forwarding
These ideas are now broadly used in microprocessors
Was the 360/91 successful commercially?

22 IBM 360/85 (1968)
First commercial computer with a cache. Compared with the 360/91:
• Slower clock cycle (80 ns versus 60 ns)
• Less memory interleaving (4 versus 16)
• Slower main memory (1.04 μs versus 0.75 μs)
• Cheaper in price
Which machine was faster on applications?

