CSC 4250 Computer Architectures, October 20, 2006. Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation


1 CSC 4250 Computer Architectures, October 20, 2006. Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation

2 One More Example on Tomasulo’s Algorithm
    L.D   F0,0(R0)
    ADD.D F0,F0,F2
    MUL.D F0,F0,F4
    ADD.D F0,F0,F2
    MUL.D F0,F0,F4
    S.D   F0,0(R0)
    ADD.D F0,F4,F2

3 IBM 360 Assembly Language
Only two operands. Advantage? Disadvantage?
Example:
    L.D   F0,0(R0)
    ADD.D F0,F2
    MUL.D F0,F4
    ADD.D F0,F2
    MUL.D F0,F4
    S.D   F0,0(R0)
    …

4 Figure 0.1

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    No
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Load1

5 Figure 0.2

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Add1

6 Figure 0.3

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    No
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   No
Store1  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Mult1

7 Figure 0.4

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   No
Store1  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Add2

8 Figure 0.5

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
S.D   F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Mult2

9 Figure 0.6

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
S.D   F0,0(R0)     √
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Mult2

10 Figure 0.7

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
S.D   F0,0(R0)     √
ADD.D F0,F4,F2     √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    Yes   Add    Reg[F4]  Reg[F2]
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Add3

11 Figure 0.8

Instruction        Issue  Execute  Write Result
L.D   F0,0(R0)     √      √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
ADD.D F0,F0,F2     √
MUL.D F0,F0,F4     √
S.D   F0,0(R0)     √
ADD.D F0,F4,F2     √      √        √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi
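
The issue-stage bookkeeping in Figures 0.1 through 0.7 can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm in full: it models only the issue step (no execution, no write-back, no common data bus), and it assumes that no instruction has written its result yet, which matches the figures up to 0.7. The tuple encoding of the program, the names `Station`, `POOLS`, `PROGRAM`, and `issue_all`, and the choice to track a store's pending value in Qk are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station; field names follow the slide tables."""
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None
    vk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    a: Optional[str] = None


# Station pools as listed in the figures.
POOLS = {
    "Load": ["Load1"],
    "Add": ["Add1", "Add2", "Add3"],
    "Mult": ["Mult1", "Mult2"],
    "Store": ["Store1"],
}

# Hypothetical encoding: (op, destination, source, second source or address).
PROGRAM = [
    ("Load", "F0", None, "0+Reg[R0]"),
    ("Add", "F0", "F0", "F2"),
    ("Mult", "F0", "F0", "F4"),
    ("Add", "F0", "F0", "F2"),
    ("Mult", "F0", "F0", "F4"),
    ("Store", None, "F0", "0+Reg[R0]"),
    ("Add", "F0", "F4", "F2"),
]


def issue_all(program):
    """Issue every instruction in order; nothing completes in this sketch."""
    stations = {n: Station(n) for names in POOLS.values() for n in names}
    qi = {}  # register -> name of the station that will produce its value

    def read(reg):
        # A pending register yields a tag (Q field), otherwise a value (V field).
        if reg in qi:
            return None, qi[reg]
        return f"Reg[{reg}]", None

    for op, dest, src, other in program:
        free = [n for n in POOLS[op] if not stations[n].busy]
        if not free:
            raise RuntimeError(f"structural hazard: no free {op} station")
        rs = stations[free[0]]
        rs.busy, rs.op = True, op
        if op == "Load":
            rs.a = other
        elif op == "Store":
            rs.vk, rs.qk = read(src)  # value to store (assumed to use Vk/Qk)
            rs.a = other
        else:
            rs.vj, rs.qj = read(src)
            rs.vk, rs.qk = read(other)
        if dest is not None:
            qi[dest] = rs.name  # rename: later readers wait on this station
    return stations, qi
```

Calling `issue_all(PROGRAM)` leaves F0 tagged with Add3 and chains Add1, Mult1, Add2, and Mult2 through their Qj fields, as in Figure 0.7. The last ADD.D reads F4 and F2 directly, so it has no pending tags, which is why it can finish first in Figure 0.8.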

12 Modified Loop-Based Example
Loop: L.D    F0,0(R1)
      MUL.D  F0,F0,F2
      ADD.D  F0,F0,F4
      S.D    F0,0(R1)
      DADDIU R1,R1,#-8
      BNE    R1,R2,Loop

13 Figure 0.1. One active iteration of loop

Instruction        Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)     1          √      √
MUL.D F0,F0,F2     1          √
ADD.D F0,F0,F4     1          √
S.D   F0,0(R1)     1          √
L.D   F0,0(R1)     2
MUL.D F0,F0,F2     2
ADD.D F0,F0,F4     2
S.D   F0,0(R1)     2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   Reg[R1]
Load2   No
Add1    Yes   Add             Reg[F4]  Mult1
Add2    No
Mult1   Yes   Mult            Reg[F2]  Load1
Mult2   No
Store1  Yes   Store                           Add1   Reg[R1]
Store2  No

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Add1

14 Figure 0.2. Two active iterations of loop

Instruction        Iteration  Issue  Execute  Write Result
L.D   F0,0(R1)     1          √      √
MUL.D F0,F0,F2     1          √
ADD.D F0,F0,F4     1          √
S.D   F0,0(R1)     1          √
L.D   F0,0(R1)     2          √      √
MUL.D F0,F0,F2     2          √
ADD.D F0,F0,F4     2          √
S.D   F0,0(R1)     2          √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   Reg[R1]
Load2   Yes   Load                                   Reg[R1]-8
Add1    Yes   Add             Reg[F4]  Mult1
Add2    Yes   Add             Reg[F4]  Mult2
Mult1   Yes   Mult            Reg[F2]  Load1
Mult2   Yes   Mult            Reg[F2]  Load2
Store1  Yes   Store                           Add1   Reg[R1]
Store2  Yes   Store                           Add2   Reg[R1]-8

Field  F0     F2  F4  F6  F8  F10  F12  …  F30
Qi     Add2

15 Figure 0.2. Two active iterations of loop (tables identical to slide 14)

16 Dynamic Branch Prediction
- Static branch prediction is covered in Appendix A.
- Branch prediction buffer: a small memory indexed by the low-order bits of the address of the branch instruction.
- Each entry holds a bit that says whether the branch was recently taken or not.
- Because only the low-order address bits select an entry, the prediction bit may have been placed there by a different branch that maps to the same entry.

17 Figure A Branch Prediction Buffer. Use the 4 low-order bits of the branch's word address to choose a row.
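
The indexing rule above can be made concrete with a short sketch of a 16-row, 1-bit buffer. It assumes 4-byte instructions, so the word address is the byte address shifted right by two; the names `bpb_index` and `OneBitBuffer` and the example addresses are hypothetical and chosen only to illustrate the idea.

```python
def bpb_index(byte_address: int, index_bits: int = 4) -> int:
    """Row chosen by the low-order bits of the branch's word address."""
    word_address = byte_address >> 2  # assumes 4-byte instructions
    return word_address & ((1 << index_bits) - 1)


class OneBitBuffer:
    """Untagged 1-bit branch prediction buffer (a sketch, not a real design)."""

    def __init__(self, index_bits: int = 4):
        self.index_bits = index_bits
        # One bit per row; False means "predict not taken".
        self.bits = [False] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        return self.bits[bpb_index(pc, self.index_bits)]

    def update(self, pc: int, taken: bool) -> None:
        # No tag check: whichever branch maps here overwrites the bit.
        self.bits[bpb_index(pc, self.index_bits)] = taken


buf = OneBitBuffer()
buf.update(0x1004, True)     # a branch at 0x1004 is taken
alias = buf.predict(0x1044)  # a different branch that maps to the same row
other = buf.predict(0x1008)  # a branch that maps to a different row
```

Here 0x1004 and 0x1044 share their four low-order word-address bits, so the second branch inherits the first branch's prediction, while 0x1008 selects a different row and still sees the initial "not taken" bit.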

18 Nested Loops
Loop1: L.D    F2,1600(R1)
       DADDIU R2,R0,#80
Loop2: L.D    F0,1000(R2)
       ADD.D  F0,F0,F2
       S.D    F0,1000(R2)
       DADDIU R2,R2,#-8
       BNEZ   R2,Loop2
       DADDIU R1,R1,#-8
       BNEZ   R1,Loop1
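
A rough way to see how a 1-bit predictor behaves on the inner branch (BNEZ R2,Loop2) is to replay its outcomes. R2 starts at 80 and drops by 8 each pass, so the inner loop runs 10 times per outer iteration: the branch is taken 9 times and then falls through once. The slide does not give the initial value of R1, so the number of outer iterations below is a hypothetical parameter, and the sketch ignores aliasing with other branches.

```python
def inner_branch_outcomes(trip_count=10, outer_iterations=5):
    """Yield taken (True) / not taken (False) for BNEZ R2,Loop2."""
    for _ in range(outer_iterations):
        for i in range(trip_count):
            # Taken on every pass except the last one of each inner loop.
            yield i < trip_count - 1


def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions of a 1-bit predictor that remembers the last outcome."""
    bit = initial  # assumed starting state: predict not taken
    misses = 0
    for taken in outcomes:
        if bit != taken:
            misses += 1
        bit = taken
    return misses


total = 10 * 5
misses = one_bit_mispredictions(inner_branch_outcomes(10, 5))
accuracy = 1 - misses / total
```

With these assumptions the predictor misses twice per execution of the inner loop, once on the exit and once on re-entry, giving 10 misses in 50 branches (80% accuracy) even though the branch is taken 90% of the time.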

19 Figure 3.7. States in 2-bit Prediction Scheme
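
As a rough companion to the 2-bit scheme, the sketch below uses a 2-bit saturating counter, which is one common way to realize it. This is an assumption: the state transitions drawn in Figure 3.7 may differ in detail from a plain saturating counter. The function name and the outcome sequence (an inner loop of 10 passes, executed 5 times) are hypothetical.

```python
def two_bit_mispredictions(outcomes, initial=0):
    """Count mispredictions of a 2-bit saturating counter.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """
    counter = initial
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move toward 3 on taken and toward 0 on not taken, saturating.
        if taken:
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return misses


# Loop branch taken 9 times, then not taken, repeated for 5 loop executions.
outcomes = ([True] * 9 + [False]) * 5
cold = two_bit_mispredictions(outcomes, initial=0)
warm = two_bit_mispredictions(outcomes, initial=3)
```

Starting from counter value 0 the predictor needs two taken branches to warm up, so it misses 3 times in the first loop execution and once (on the exit) in each later one, 7 in total. Starting from 3 it misses only on the five loop exits, because a single not-taken outcome is not enough to flip the prediction.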

20 Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks

21 Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89

