Figure 8.1. Basic idea of instruction pipelining.

1 Figure 8.1. Basic idea of instruction pipelining. [Panels: (a) Sequential execution: instructions I1, I2, I3 each complete their fetch (F) and execute (E) steps in turn, F1 E1 F2 E2 F3 E3; (b) Hardware organization: instruction fetch unit, interstage buffer B1, execution unit; (c) Pipelined execution over clock cycles 1 to 4, with time T on the horizontal axis.]
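The cycle counts behind Figure 8.1 can be checked with a short Python sketch. It assumes each step (F or E) takes exactly one clock cycle and that the pipeline never stalls; the function names are illustrative only.

```python
def sequential_cycles(n_instructions: int, n_steps: int) -> int:
    # Without pipelining, each instruction finishes all of its steps
    # before the next instruction is fetched.
    return n_instructions * n_steps


def pipelined_cycles(n_instructions: int, n_steps: int) -> int:
    # Ideal pipeline (no stalls assumed): the first instruction needs
    # n_steps cycles, then one more instruction completes every cycle.
    return n_steps + n_instructions - 1


# Figure 8.1: three instructions, two steps each (fetch F, execute E).
print(sequential_cycles(3, 2))  # 6
print(pipelined_cycles(3, 2))   # 4
```

For a long instruction stream the pipelined count approaches one instruction per cycle, which is the throughput gain the figure illustrates.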

4 [Figure panel (b): Position of the source and result registers in the processor pipeline. Stage label: E: Execute (ALU).]

5 Figure 8.9. Branch timing. [Timing diagrams for I1, I2 (Branch), I3, I4 and the branch targets Ik, Ik+1, each passing through F, D, E, W over clock cycles 1 to 8. (a) Branch address computed in Execute stage; (b) Branch address computed in Decode stage. X marks an instruction that was fetched and then discarded.]
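A small Python sketch of the timing in Figure 8.9: it computes the cycle in which the branch target Ik can be fetched and the resulting penalty. It assumes a four-stage F, D, E, W pipeline with one stage per clock cycle and the branch I2 fetched in cycle 2; the helper names are hypothetical.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write (assumed 4-stage pipeline)


def target_fetch_cycle(branch_fetch_cycle: int, resolve_stage: str) -> int:
    # The target can be fetched in the cycle after the stage that
    # computes the branch address (one stage per clock cycle).
    return branch_fetch_cycle + STAGES.index(resolve_stage) + 1


def branch_penalty(resolve_stage: str) -> int:
    # With no branch, the next instruction would be fetched one cycle after
    # the branch; every extra cycle is spent on fetches that get discarded.
    return target_fetch_cycle(0, resolve_stage) - 1


print(target_fetch_cycle(2, "E"), branch_penalty("E"))  # 5 2
print(target_fetch_cycle(2, "D"), branch_penalty("D"))  # 4 1
```

Under these assumptions, computing the branch address in the Decode stage instead of the Execute stage saves one cycle per taken branch, matching the one discarded fetch in panel (b) versus two in panel (a).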

6 Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12. [Each instruction has a fetch (F) and an execute (E) step; over clock cycles 1 to 8 the sequence is Decrement, Branch, Shift (delay slot), Decrement (branch taken), Branch, Shift (delay slot), Add (branch not taken).]
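To see why moving Shift into the delay slot preserves the loop's result, here is a minimal Python interpreter sketch with optional delayed-branch semantics: a taken branch redirects fetch only after the instruction that follows it has executed. The loop below is a hypothetical version of the one in Figure 8.12; the register names, operand order, and the "branch if not zero" condition are assumptions.

```python
def run(program, regs, delayed):
    """Execute a tiny register program and return the trace of executed ops.

    With delayed=True, a taken branch takes effect only after the
    instruction in the delay slot (the one right after the branch) executes.
    """
    pc = 0
    pending_target = None  # set by a taken delayed branch
    trace = []
    while pc < len(program):
        op, *args = program[pc]
        trace.append(op)
        redirect, pending_target = pending_target, None
        next_pc = pc + 1
        if op == "decrement":
            regs[args[0]] -= 1
        elif op == "shift_left":
            regs[args[0]] <<= 1
        elif op == "add":                     # add src, dst: dst <- dst + src
            regs[args[1]] += regs[args[0]]
        elif op == "branch_nz" and regs[args[0]] != 0:
            if delayed:
                pending_target = args[1]      # applied after the delay slot
            else:
                next_pc = args[1]
        pc = redirect if redirect is not None else next_pc
    return trace


# Hypothetical loop modeled on Figure 8.12.
original = [("shift_left", "R1"), ("decrement", "R2"),
            ("branch_nz", "R2", 0), ("add", "R1", "R3")]
reordered = [("decrement", "R2"), ("branch_nz", "R2", 0),
             ("shift_left", "R1"),             # delay slot
             ("add", "R1", "R3")]

a = {"R1": 1, "R2": 2, "R3": 0}
b = dict(a)
run(original, a, delayed=False)
trace = run(reordered, b, delayed=True)
print(a == b)  # True
print(trace)
```

The printed trace is decrement, branch_nz, shift_left, decrement, branch_nz, shift_left, add, the same order as Figure 8.13. Running the reordered loop without delayed branching would skip the Shift on the first pass and give a different R1.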

7 Figure 8.14. Timing when a branch decision has been incorrectly predicted as not taken. [Timing over clock cycles 1 to 6 for I1 (Compare), I2 (Branch>0), I3, I4 and the branch target Ik. The decode step of I2 is marked D2/P2 (decode and predict); I3 and I4, fetched on the not-taken prediction, are marked X (discarded).]

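8 Figure 8.15. State-machine representation of branch prediction algorithms. [(a) A 2-state algorithm with states LT (likely taken) and LNT (likely not taken); (b) a 4-state algorithm with states ST (strongly likely taken), LT, LNT and SNT (strongly likely not taken). Transitions are labeled BT (branch taken) and BNT (branch not taken).]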
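A Python sketch comparing the two kinds of predictor on a loop-closing branch. The 2-state machine follows panel (a): predict taken in LT, and move to LT or LNT according to the actual outcome. The 4-state machine is modeled here as a 2-bit saturating counter (SNT, LNT, LT, ST); that is an assumption, and its transitions may differ in detail from Figure 8.15(b). Both start in LNT, also an assumption.

```python
def two_state_misses(outcomes, state="LNT"):
    misses = 0
    for taken in outcomes:
        predicted_taken = state == "LT"
        misses += predicted_taken != taken
        state = "LT" if taken else "LNT"   # follow the last outcome
    return misses


FOUR_STATES = ["SNT", "LNT", "LT", "ST"]


def four_state_misses(outcomes, state="LNT"):
    i = FOUR_STATES.index(state)
    misses = 0
    for taken in outcomes:
        predicted_taken = i >= 2           # LT and ST predict taken
        misses += predicted_taken != taken
        # Saturating counter: step toward ST on BT, toward SNT on BNT.
        i = min(i + 1, 3) if taken else max(i - 1, 0)
    return misses


# A loop branch taken 9 times then not taken, with the loop entered twice.
loop_branch = ([True] * 9 + [False]) * 2
print(two_state_misses(loop_branch), four_state_misses(loop_branch))  # 4 3
```

The 2-state machine mispredicts both at loop exit and on the first iteration of the next entry; the extra hysteresis of the 4-state machine avoids the second miss on re-entry.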

9 Figure 8.16. Equivalent operations using complex and simple addressing modes. [Steps F, D, E, W over clock cycles 1 to 7. (a) Complex addressing mode: a single Load computes X + [R1], reads [X + [R1]], then reads [[X + [R1]]], and forwards the result to the next instruction. (b) Simple addressing mode: Add, Load, Load, followed by the next instruction.]
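The bracket notation in Figure 8.16 reads as nested memory accesses. This Python sketch models memory as a dict with hypothetical addresses and shows the single complex Load and the Add, Load, Load sequence producing the same value; the use of R2 as the intermediate register is an assumption.

```python
def load_complex(mem, regs, X, base):
    # One Load instruction: effective address X + [R1], then two memory reads.
    return mem[mem[X + regs[base]]]


def load_simple(mem, regs, X, base):
    # The same result with three simple instructions.
    r2 = X + regs[base]   # Add:  R2 <- X + [R1]
    r2 = mem[r2]          # Load: R2 <- [X + [R1]]
    r2 = mem[r2]          # Load: R2 <- [[X + [R1]]]
    return r2


mem = {1100: 2000, 2000: 42}   # hypothetical memory contents
regs = {"R1": 100}
print(load_complex(mem, regs, 1000, "R1"), load_simple(mem, regs, 1000, "R1"))  # 42 42
```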

10 Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU.

11 Figure 8.21. Instruction completion in program order. [Timing over clock cycles 1 to 7 for I1 (Fadd), I2 (Add), I3 (Fsub), I4 (Sub), each fetched (F), decoded (D), executed and written back (W). Fadd and Fsub take three execution cycles (E1A, E1B, E1C and E3A, E3B, E3C); Add and Sub take one (E2, E4). (a) Delayed write; (b) Using temporary registers, where I2 first writes its result to a temporary register (TW2).]
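A Python sketch of the delayed-write idea in panel (a): an instruction that finishes early must wait until every earlier instruction has written its result. The one-register-write-per-cycle limit is an assumption of this model. In the example, a hypothetical Fadd fetched in cycle 1 (D in 2, E1A to E1C in 3 to 5) could write in cycle 6, while the following Add (F in 2, D in 3, E in 4) could write in cycle 5.

```python
def in_order_writes(ready_cycles):
    """Delayed-write model: ready_cycles[i] is the earliest cycle in which
    instruction i could write its result. Results are written in program
    order, at most one per cycle (assumed register-file limit)."""
    writes = []
    for ready in ready_cycles:
        cycle = ready if not writes else max(ready, writes[-1] + 1)
        writes.append(cycle)
    return writes


print(in_order_writes([6, 5]))  # [6, 7]
```

Using a temporary register, as in panel (b), lets the Add's result leave the execution unit at its ready cycle, while the copy into the architectural register still follows program order.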

12 Figure 8.23. Main building blocks of the UltraSPARC II processor.

13 Figure 8.25. Example of instruction grouping.
(a) Program fragment
      ADDcc   R3,R4,R7    R7 ← [R3] + [R4], set condition codes
      BRZ,a   Label       Branch if zero, set annul bit to 1
      FCMP    F1,F5       FP: Compare [F1] and [F5]
      FADD    F2,F3,F6    FP: F6 ← [F2] + [F3]
      FMOVs   F3,F4       Move single-precision operand from F3 to F4
      ...
Label FSUB    F2,F3,F6    FP: F6 ← [F2] - [F3]
      LDSW    R3,R4,R7    Load single word at location [R3] + [R4] into R7
      ...
(b) Instruction grouping, branch taken: ADDcc R3,R4,R7; BRZ,a Label; FCMP F1,F5; FSUB F2,F3,F6
(c) Instruction grouping, branch not taken: ADDcc R3,R4,R7; BRZ,a Label; FCMP F1,F5; FADD F2,F3,F6
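A Python sketch of what the annul bit on BRZ,a changes, assuming SPARC's rule for conditional delayed branches: the delay-slot instruction executes if the branch is taken, and is annulled when the branch is not taken and the annul bit is 1. The unconditional branch-always case, which annuls differently, is not modeled; the function is hypothetical.

```python
def executed_after_branch(taken, annul, delay_slot, fall_through, target):
    """Instructions that take effect after a conditional delayed branch,
    up to the first instruction of the chosen path (simplified model)."""
    if taken:
        # Taken: the delay-slot instruction executes, then the target.
        return [delay_slot, target]
    if annul:
        # Not taken and annul bit = 1: the delay-slot instruction is annulled.
        return [fall_through]
    # Not taken and annul bit = 0: the delay slot still executes.
    return [delay_slot, fall_through]


# Fragment above: BRZ,a Label with FCMP in the delay slot.
print(executed_after_branch(True, True, "FCMP", "FADD", "FSUB"))   # ['FCMP', 'FSUB']
print(executed_after_branch(False, True, "FCMP", "FADD", "FSUB"))  # ['FADD']
```

The grouping in Figure 8.25 concerns which instructions are fetched and issued together; this sketch only models which of them take effect.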

14 Figure 8.30. Execution flow. [Diagram blocks: main memory, external cache, instruction cache, data cache, instruction buffer, elastic interface queue, load/store, internal registers and execution units; paths labeled Instructions and Data.]

15 Table 8.1 Examples of SPARC instructions.
Instruction              Description
ADD      R5,R6,R7        Integer add: R7 ← [R5] + [R6]
ADDcc    R2,R3,R5        R5 ← [R2] + [R3], set condition code flags
SUB      R5,Imm,R7       Integer subtract: R7 ← [R5] - Imm (sign-extended)
AND      R3,Imm,R5       Bitwise AND: R5 ← [R3] AND Imm (sign-extended)
XOR      R3,R4,R5        Bitwise Exclusive OR: R5 ← [R3] XOR [R4]
FADDq    F4,F12,F16      Floating-point add, quad precision: F16 ← [F4] + [F12]
FSUBs    F2,F5,F7        Floating-point subtract, single precision: F7 ← [F2] - [F5]
FDIVs    F5,F10,F18      Floating-point divide, single precision: F18 ← [F5] / [F10]
LDSW     R3,R5,R7        R7 ← 32-bit word at [R3] + [R5], sign-extended to a 64-bit value
LDX      R3,R5,R7        R7 ← 64-bit extended word at [R3] + [R5]
LDUB     R4,Imm,R5       Load unsigned byte from memory location [R4] + Imm; the byte is loaded into the least significant 8 bits of register R5, and all higher-order bits are filled with 0s
STW      R3,R6,R12       Store word from register R3 into memory location [R6] + [R12]
LDF      R5,R6,F3        Load a 32-bit word at address [R5] + [R6] into floating-point register F3
LDDF     R5,R6,F8        Load doubleword (two 32-bit words) at address [R5] + [R6] into floating-point registers F8 and F9
STF      F14,R6,Imm      Store word from floating-point register F14 into memory location [R6] + Imm
BLE      icc,Label       Test the icc flags and branch to Label if less than or equal to zero
BZ,pn    xcc,Label       Test the xcc flags and branch to Label if equal to zero; the branch is predicted not taken
BGT,a,pt icc,Label       Test the 32-bit integer condition codes and branch to Label if greater than zero; set the annul bit; the branch is predicted taken
FBNE,pn  Label           Test the floating-point status flags and branch if not equal; the annul bit is set to zero and the branch is predicted not taken
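The difference between the LDSW and LDUB rows of Table 8.1 is sign extension versus zero extension. This Python sketch shows the 64-bit register value each load would produce; the helper names are hypothetical, and register contents are returned as unsigned 64-bit bit patterns.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    # Treat the low `bits` bits as a two's-complement number.
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def ldsw(word32: int) -> int:
    # LDSW: 32-bit word sign-extended to a 64-bit register value.
    return sign_extend(word32, 32) & MASK64


def ldub(byte: int) -> int:
    # LDUB: byte goes into the low 8 bits; all higher-order bits are 0.
    return byte & 0xFF


print(hex(ldsw(0x8000_0000)))              # 0xffffffff80000000
print(hex(ldsw(0x7FFF_FFFF)))              # 0x7fffffff
print(hex(ldub(0xF0)))                     # 0xf0
print(hex(sign_extend(0xF0, 8) & MASK64))  # 0xfffffffffffffff0 (if the byte were sign-extended)
```

The same `sign_extend` helper covers the sign-extended Imm operands of SUB and AND; in SPARC the immediate field is 13 bits wide, so `sign_extend(imm, 13)` would apply.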

