Published by Kamron Hinchman. Modified over 9 years ago.
1
CSC 4250 Computer Architectures
October 20, 2006
Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation
2
One More Example on Tomasulo’s Algorithm
L.D   F0,0(R0)
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D   F0,0(R0)
ADD.D F0,F4,F2
3
IBM 360 Assembly Language
Only two operands. Advantage? Disadvantage?
Example:
L.D   F0,0(R0)
ADD.D F0,F2
MUL.D F0,F4
ADD.D F0,F2
MUL.D F0,F4
S.D   F0,0(R0)
…
4
Figure 0.1

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    No
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Load1
5
Figure 0.2

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    No
Add3    No
Mult1   No
Mult2   No
Store1  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Add1
6
Figure 0.3

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    No
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   No
Store1  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Mult1
7
Figure 0.4

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   No
Store1  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Add2
8
Figure 0.5

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
S.D F0,0(R0)
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Mult2
9
Figure 0.6

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
S.D F0,0(R0)    √
ADD.D F0,F4,F2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Mult2
10
Figure 0.7

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
S.D F0,0(R0)    √
ADD.D F0,F4,F2  √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    Yes   Add    Reg[F4]  Reg[F2]
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Add3
11
Figure 0.8

Instruction     Issue  Execute  Write Result
L.D F0,0(R0)    √      √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
ADD.D F0,F0,F2  √
MUL.D F0,F0,F4  √
S.D F0,0(R0)    √
ADD.D F0,F4,F2  √      √        √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   0+Reg[R0]
Add1    Yes   Add             Reg[F2]  Load1
Add2    Yes   Add             Reg[F2]  Mult1
Add3    No
Mult1   Yes   Mult            Reg[F4]  Add1
Mult2   Yes   Mult            Reg[F4]  Add2
Store1  Yes   Store                           Mult2  0+Reg[R0]

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi
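The issue-stage bookkeeping in Figures 0.1 to 0.7 can be sketched in Python. This is a minimal illustration, not the slides' own material: the tuple encoding of instructions, the `Station` class, and the function names are assumptions made for the sketch, and it models only issue (no instruction executes or writes its result). Station names and counts are taken from the figures.

```python
from dataclasses import dataclass

# Station names follow the figures; the instruction encoding and all
# function/class names are assumptions made for this sketch.
POOLS = {"Load": ["Load1"], "Add": ["Add1", "Add2", "Add3"],
         "Mult": ["Mult1", "Mult2"], "Store": ["Store1"]}
KIND = {"L.D": "Load", "S.D": "Store", "ADD.D": "Add", "MUL.D": "Mult"}

@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    vj: str = ""
    vk: str = ""
    qj: str = ""
    qk: str = ""
    a: str = ""

def read_operand(reg, qi):
    """Return (V, Q): the register value if ready, else the producer's tag."""
    return ("", qi[reg]) if reg in qi else (f"Reg[{reg}]", "")

def issue_all(program):
    """Issue every instruction in order; no instruction writes its result."""
    stations = {n: Station(n) for names in POOLS.values() for n in names}
    qi = {}  # register status: register -> station that will write it
    for op, x, y, z in program:
        kind = KIND[op]
        # First free station of the right kind. Hardware would stall when
        # the pool is full; this sketch simply raises StopIteration.
        rs = next(stations[n] for n in POOLS[kind] if not stations[n].busy)
        rs.busy, rs.op = True, kind
        if kind == "Load":             # L.D x, y(z)
            rs.a = f"{y}+Reg[{z}]"
            qi[x] = rs.name
        elif kind == "Store":          # S.D x, y(z): value to store uses Vk/Qk
            rs.vk, rs.qk = read_operand(x, qi)
            rs.a = f"{y}+Reg[{z}]"
        else:                          # op x, y, z
            rs.vj, rs.qj = read_operand(y, qi)
            rs.vk, rs.qk = read_operand(z, qi)
            qi[x] = rs.name            # rename destination to this station
    return stations, qi

PROGRAM = [("L.D", "F0", "0", "R0"), ("ADD.D", "F0", "F0", "F2"),
           ("MUL.D", "F0", "F0", "F4"), ("ADD.D", "F0", "F0", "F2"),
           ("MUL.D", "F0", "F0", "F4"), ("S.D", "F0", "0", "R0"),
           ("ADD.D", "F0", "F4", "F2")]

stations, qi = issue_all(PROGRAM)
for s in stations.values():
    print(s)
print("Qi:", qi)
```

After all seven instructions issue, the station contents and the `Qi` entry for F0 agree with Figure 0.7: every instruction writes F0, yet each consumer waits on the specific station that produces its operand, which is how renaming removes the WAW and WAR hazards on F0.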
12
Modified Loop-Based Example
Loop: L.D    F0,0(R1)
      MUL.D  F0,F0,F2
      ADD.D  F0,F0,F4
      S.D    F0,0(R1)
      DADDIU R1,R1,#−8
      BNE    R1,R2,Loop
13
Figure 0.1. One active iteration of loop

Instruction     Iteration  Issue  Execute  Write Result
L.D F0,0(R1)    1          √      √
MUL.D F0,F0,F2  1          √
ADD.D F0,F0,F4  1          √
S.D F0,0(R1)    1          √
L.D F0,0(R1)    2
MUL.D F0,F0,F2  2
ADD.D F0,F0,F4  2
S.D F0,0(R1)    2

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   Reg[R1]
Load2   No
Add1    Yes   Add             Reg[F4]  Mult1
Add2    No
Mult1   Yes   Mult            Reg[F2]  Load1
Mult2   No
Store1  Yes   Store                           Add1   Reg[R1]
Store2  No

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Add1
14
Figure 0.2. Two active iterations of loop

Instruction     Iteration  Issue  Execute  Write Result
L.D F0,0(R1)    1          √      √
MUL.D F0,F0,F2  1          √
ADD.D F0,F0,F4  1          √
S.D F0,0(R1)    1          √
L.D F0,0(R1)    2          √      √
MUL.D F0,F0,F2  2          √
ADD.D F0,F0,F4  2          √
S.D F0,0(R1)    2          √

Name    Busy  Op     Vj       Vk       Qj     Qk     A
Load1   Yes   Load                                   Reg[R1]
Load2   Yes   Load                                   Reg[R1]-8
Add1    Yes   Add             Reg[F4]  Mult1
Add2    Yes   Add             Reg[F4]  Mult2
Mult1   Yes   Mult            Reg[F2]  Load1
Mult2   Yes   Mult            Reg[F2]  Load2
Store1  Yes   Store                           Add1   Reg[R1]
Store2  Yes   Store                           Add2   Reg[R1]-8

      F0     F2  F4  F6  F8  F10  F12  …  F30
Qi    Add2
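The way each loop iteration gets its own set of stations (dynamic unrolling) can be sketched as below. This is an illustrative simplification, not part of the slides: the function name and dictionary layout are assumptions, every BNE is assumed to be predicted taken, a free station of each kind is assumed to exist for every iteration, and the effect of DADDIU on R1 is tracked symbolically as in the figure's A column.

```python
def issue_iterations(n):
    """Issue n copies of the loop body; nothing writes its result.

    Assumptions: branches predicted taken, one free station of each
    kind per iteration, R1 updates tracked symbolically.
    """
    rows = []
    f0 = None  # station that will produce F0 (the register-status Qi entry)
    for i in range(1, n + 1):
        offset = -8 * (i - 1)  # accumulated DADDIU R1,R1,#-8
        addr = "Reg[R1]" if offset == 0 else f"Reg[R1]{offset}"
        rows.append({"name": f"Load{i}", "op": "Load", "A": addr})
        f0 = f"Load{i}"
        rows.append({"name": f"Mult{i}", "op": "Mult", "Vk": "Reg[F2]", "Qj": f0})
        f0 = f"Mult{i}"
        rows.append({"name": f"Add{i}", "op": "Add", "Vk": "Reg[F4]", "Qj": f0})
        f0 = f"Add{i}"
        # The value to be stored is tracked in Qk; F0 itself is not rewritten.
        rows.append({"name": f"Store{i}", "op": "Store", "Qk": f0, "A": addr})
    return rows, f0

rows, f0 = issue_iterations(2)
for row in rows:
    print(row)
print("Qi[F0] =", f0)
```

With two iterations the second L.D/MUL.D/ADD.D/S.D chain depends only on Load2, Mult2, and Add2, so it can proceed independently of the first iteration even though both use F0.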
16
Dynamic Branch Prediction
Static branch prediction is covered in Appendix A.
Branch prediction buffer: a small memory indexed by the low-order bits of the branch instruction's address.
Each entry holds a bit that says whether the branch was recently taken or not.
The prediction bit may have been placed there by another branch whose address has the same low-order bits.
17
Figure 3.14. A Branch Prediction Buffer
Use the 4 low-order bits of the branch's word address to choose a row.
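A minimal sketch of such a buffer, with one prediction bit per row and no tags. The class and method names are hypothetical, the initial prediction (not taken) is an assumption, and the word address is derived by assuming 4-byte instructions.

```python
class BranchPredictionBuffer:
    """1-bit, untagged prediction buffer (names and defaults are assumptions)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Assumed initial state: every row predicts not taken.
        self.taken = [False] * (1 << index_bits)

    def row(self, pc):
        # Assumes 4-byte instructions: drop the 2 byte-offset bits to get
        # the word address, then keep its low-order index bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.taken[self.row(pc)]

    def update(self, pc, taken):
        self.taken[self.row(pc)] = taken

bpb = BranchPredictionBuffer()
bpb.update(0x1000, True)        # a branch at 0x1000 was taken
aliased = bpb.predict(0x1040)   # different branch, 16 words later, same row
other = bpb.predict(0x1004)     # neighbouring word maps to a different row
print(aliased, other)
```

Because there are no tags, the branch at 0x1040 inherits the bit written by the branch at 0x1000: both word addresses end in the same 4 bits.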
18
Nested Loops
Loop1: L.D    F2,1600(R1)
       DADDIU R2,R0,#80
Loop2: L.D    F0,1000(R2)
       ADD.D  F0,F0,F2
       S.D    F0,1000(R2)
       DADDIU R2,R2,#−8
       BNEZ   R2,Loop2
       DADDIU R1,R1,#−8
       BNEZ   R1,Loop1
19
Figure 3.7. States in 2-bit Prediction Scheme
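To see why two bits help, the inner-loop branch of the nested-loops example can be replayed against a 1-bit and a 2-bit predictor. The sketch below models the predictor as a saturating counter, which is a common formulation; the exact transitions drawn in Figure 3.7 may differ in detail. The function name, the initial counter state (strongly taken), and the outer trip count of 5 are assumptions, since the slides do not give the initial value of R1. The inner pattern follows from the code: R2 runs from 80 down to 0 in steps of 8, so BNEZ R2 is taken 9 times and then falls through once.

```python
def count_mispredictions(outcomes, bits):
    """Mispredictions of one n-bit saturating counter on a single branch."""
    top = (1 << bits) - 1
    counter = top  # assumed initial state: strongest "taken"
    misses = 0
    for taken in outcomes:
        predict_taken = counter > top // 2  # upper half of the range
        if predict_taken != taken:
            misses += 1
        # Saturating update: move toward the actual outcome.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return misses

OUTER = 5  # assumed number of outer-loop iterations
inner_branch = ([True] * 9 + [False]) * OUTER  # BNEZ R2,Loop2 outcomes

one_bit = count_mispredictions(inner_branch, bits=1)
two_bit = count_mispredictions(inner_branch, bits=2)
print(one_bit, two_bit)
```

Under these assumptions the 1-bit predictor misses 9 of 50 branches (the loop exit, plus the first iteration of every later pass), while the 2-bit predictor misses 5 (the loop exit only), because a single not-taken outcome is not enough to flip its prediction.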
20
Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks
21
Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89