Download presentation

Presentation is loading. Please wait.

Published byGabriella Hicken Modified about 1 year ago

1
ECE 2162 Tomasulo’s Algorithm

2
Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in IBM 360/91 (in the 60s) –Tracks when operands are available to satisfy data dependences –Removes name dependences through register renaming –Very similar to what is used today Almost all modern high-performance processors use a derivative of Tomasulo’s… much of the terminology survives to today. 2

3
Tomasulo’s Algorithm: The Picture 3

4
Issue (1) –Get next instruction from instruction queue. –Find a free reservation station for it (if none are free, stall until one is) –Read operands that are in the registers –If the operand is not in the register, find which reservation station will produce it –In effect, this step renames registers (reservation station IDs are “temporary” names) 4

5
Issue (2) F2=F4+F1 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File To-Do list (from last slide): Get next inst from IB’s Find free reservation station Read operands from RF Record source of other operands Update source mapping (RAT) To-Do list (from last slide): Get next inst from IB’s Find free reservation station Read operands from RF Record source of other operands Update source mapping (RAT) F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers

6
Issue (2) F2=F4+F1F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File To-Do list (from last slide): Get next inst from IB’s Find free reservation station Read operands from RF Record source of other operands Update source mapping (RAT) To-Do list (from last slide): Get next inst from IB’s Find free reservation station Read operands from RF Record source of other operands Update source mapping (RAT) (1)2.718 F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers

7
Execute (1) –Monitor results as they are produced –Put a result into all reservation stations waiting for it (missing source operand) –When all operands available for an instruction, it is ready (we can actually execute it) –Several ready instrs for one functional unit? Pick one. Except for load/store Load/Store must be done in the proper order to avoid hazards through memory (more loads/stores this in a later lecture) 7

8
Execute (2) F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs (1)2.718 F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File F2=F4+F1 8

9
3 Execute (2) F4=F1-F2 F1=F2+F3 F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) (4)(1) To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs (1)2.718 F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File F2=F4+F1 9

10
Execute (2) F4=F1-F2 F1=F2+F3 F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) (4)(1) To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs (1)2.718 F2=F4+F1 (1) F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File F2=F4+F1 10

11
Execute (2) F4=F1-F2 F1=F2+F3 F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) (4)(1) To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs To-Do list (from last slide): Monitor results from ALUs Capture matching operands Compete for ALUs (1)2.718 F2=F4+F1 (1) F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F3 Instruction Buffers F1 F2 F3 F4 RAT F1 F2 F3 F4 Reg File 11

12
Execute (3) More than one ready inst for the same unit F4=F3-F2 F1=F2+F3 F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) (1) (1) F2=F4+F1 (1) Common heuristic: oldest first You can do whatever: it only affects performance, not correctness 12

13
Write Result (1) –When result is computed, make it available on the “common data bus” (CDB), where waiting reservation stations can pick it up –Result stored in the register file –Stores write to memory –This step frees the reservation station –For our register renaming, this recycles the temporary name (future instructions can again find the value in the actual register, until it is renamed again) 13

14
Write Result (2) F4=F1-F2 F1=F2+F3 F1=F2/F3 AdderFP-Cmplx A1 (1) A2 (2) A3 (3) C1 (4) C2 (5) (4) To-Do list (from last slide): Broadcast on CDB Writeback to RF Update Mapping Free reservation station To-Do list (from last slide): Broadcast on CDB Writeback to RF Update Mapping Free reservation station F1 F2 F3 F4 Reg File F1 F2 F3 F4 RAT (1) F2=F4+F (1) F1 = F2 + F3 F4 = F1 – F2 F1 = F2 / F F2 = F4 + F Only update RAT (and RF) if RAT still contains your mapping! Only update RAT (and RF) if RAT still contains your mapping! X 14

15
Tomasulo’s Algorithm: Load/Store The reservation stations take care of dependences through registers. Dependences also possible through memory –Loads and stores not reordered in original IBM 360 –We’ll talk about how to do load-store reordering later 15

16
Detailed Example – Cycle 0 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F2 Is ExW BusyOpVjVkQjQkA Reservation Stations … F0F2F4F6F8F10F12 RAT: LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is … Architecture Reg File: 16

17
Detailed Example – Cycle 1 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F2 1 Is ExW 1L.D134 BusyOpVjVkQjQkA Reservation Stations LD1… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is 2.5 RAT: 2.5… Architecture Reg File: 17

18
Detailed Example – Cycle 2 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW 1L.D134 1L.D245 BusyOpVjVkQjQkA Reservation Stations LD2LD1… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles 2 Assume R2 is 100 R3 is 200 F4 is 2.5 RAT: 2.5… Architecture Reg File: 18

19
Detailed Example – Cycle 3 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW 1L.D134 1L.D245 1MUL.D2.5LD2 BusyOpVjVkQjQkA Reservation Stations ML1LD2LD1… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles 2 3 Assume R2 is 100 R3 is 200 F4 is RAT: 2.5… Architecture Reg File: 19

20
Detailed Example – Cycle 4 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW 0 1L.D245 1SUB.D0.5LD2 1MUL.D2.5LD2 BusyOpVjVkQjQkA Reservation Stations ML1LD2AD1… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 20

21
Detailed Example – Cycle 5 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW 0 0 1SUB.D MUL.D DIV.D0.5ML1 BusyOpVjVkQjQkA Reservation Stations ML1AD1ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 21

22
Detailed Example – Cycle 6 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW 0 0 1SUB.D ADD.D1.5AD1 1MUL.D DIV.D0.5ML1 BusyOpVjVkQjQkA Reservation Stations ML1AD2AD1ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 22

23
Detailed Example – Cycle 8 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW ADD.D MUL.D DIV.D0.5ML1 BusyOpVjVkQjQkA Reservation Stations ML1AD2ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 23

24
Detailed Example – Cycle 9 1. L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW ADD.D MUL.D DIV.D0.5ML1 BusyOpVjVkQjQkA Reservation Stations ML1AD2ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 24

25
Detailed Example – Cycle L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW MUL.D DIV.D0.5ML1 BusyOpVjVkQjQkA Reservation Stations ML1ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 25

26
Detailed Example – Cycle L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW DIV.D BusyOpVjVkQjQkA Reservation Stations ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 26

27
Detailed Example – Cycle L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW DIV.D BusyOpVjVkQjQkA Reservation Stations ML2… F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 27

28
Detailed Example – Cycle L.DF6, 34(R2) 2. L.DF2, 45(R3) 3. MUL.DF0, F2, F4 4. SUB.DF8, F2, F6 5. DIV.DF10,F0,F6 6. ADD.DF6, F8, F Is ExW BusyOpVjVkQjQkA Reservation Stations … F0F2F4F6F8F10F12 LD1 LD2 AD1 AD2 AD3 ML1 ML2 Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Assume R2 is 100 R3 is 200 F4 is RAT: … Architecture Reg File: 28

29
Timing Example Kind of hard to keep track with previous table-based approach Simplified version to track timing only F6,34(R2)124L.D OperandsIsExecWrCommentsInst F2, 45(R3)235L.D F0,F2,F43616MUL.D F8,F2,F6468SUB.D F10,F0,F651757DIV.D F6,F8,F26911ADD.D Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles Load: 2 cycles Add: 2 cycles Mult: 10 cycles Divide: 40 cycles 29

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google