Presentation is loading. Please wait.

Presentation is loading. Please wait.

04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;

Similar presentations


Presentation on theme: "04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;"— Presentation transcript:

1 04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8; stalls waiting for F0 SUBDF12,F8,F13; Let this instr. bypass the ADDD Enables out-of-order execution => out-of-order completion IFIDMWBEX Instr. gets stuck here Two historical schemes used in recent machines: Scoreboard dates back to CDC 6600 in 1963 Tomasulo in IBM 360/91 in 1967

2 04/03/2016 slide 2 Scoreboard pipeline IF Read operands Write Issue Int FP Add FP Mult1 FP Mult2 FP Div Mem Scoreboard ID stage Issue: Decode and check for structural hazards Read operands: wait until no data hazard, then read operands All data hazards are handled by the scoreboard mechanism

3 04/03/2016 slide 3 Scoreboard complications Out-of-order completion => WAR, WAW hazards WAR: instruction is stalled in the WB stage until a previous instruction has read the operand WAW: instruction is stalled in the Issue stage until a previous instruction has written its result Scoreboard keeps track of dependencies and state of operations

4 04/03/2016 slide 4 Scoreboard functionality Issue: Instruction is issued when: No structural hazard for a functional unit No WAW with an instruction in execution Read: Instruction reads operands when they become available (RAW) EX: normal execution Write: Instruction writes when all previous instructions have read this operand The scoreboard is updated when an instruction proceeds to a new stage

5 04/03/2016 slide 5 Data structures in the scoreboard 1.Instruction status—keeps track of in which stage an instruction is. 2.Functional unit status—Indicates the state of the functional unit (FU). 9 fields for each FU: Busy: Indicates whether the unit is busy or not Op: Operation to perform in the unit (e.g. add or sub) Fi: Destination register name Fj, Fk: Source register names Qj, Qk: Name of functional unit producing regs Fj, Fk Rj, Rk: Flags indicating when Fj and Fk are ready 3.Register result status—Indicates which functional unit will write to each register, if any.

6 04/03/2016 slide 6 Scoreboard example

7 04/03/2016 slide 7 Detailed Scoreboard Pipeline Control

8 04/03/2016 slide 8 Scoreboard example, cycle 1

9 04/03/2016 slide 9 Scoreboard example, cycle 2 Issue 2nd load?

10 04/03/2016 slide 10 Scoreboard example, cycle 3 Issue MULT?

11 04/03/2016 slide 11 Scoreboard example, cycle 4

12 04/03/2016 slide 12 Scoreboard example, cycle 5

13 04/03/2016 slide 13 Scoreboard example, cycle 6

14 04/03/2016 slide 14 Scoreboard example, cycle 7

15 04/03/2016 slide 15 Scoreboard example, cycle 8a

16 04/03/2016 slide 16 Scoreboard example, cycle 8

17 04/03/2016 slide 17 Scoreboard example, cycle 9 Read operands for MULT & SUB Issue ADDD?

18 04/03/2016 slide 18 Scoreboard example, cycle 11 SUBD completes execution

19 04/03/2016 slide 19 Scoreboard example, cycle 12 Read operands for DIVD?

20 04/03/2016 slide 20 Scoreboard example, cycle 13 Issue ADDD

21 04/03/2016 slide 21 Scoreboard example, cycle 14

22 04/03/2016 slide 22 Scoreboard example, cycle 16 Can ADDD write result?

23 04/03/2016 slide 23 Scoreboard example, cycle 17 ADDD stalls, waiting for DIVD to read F6 Resolves a WAR hazard!

24 04/03/2016 slide 24 Scoreboard example, cycle 19

25 04/03/2016 slide 25 Scoreboard example, cycle 20

26 04/03/2016 slide 26 Scoreboard example, cycle 21

27 04/03/2016 slide 27 Scoreboard example, cycle 22 Now ADDD can safely write its result in F6

28 04/03/2016 slide 28 Scoreboard example, cycle 61

29 04/03/2016 slide 29 Scoreboard example, cycle 62

30 04/03/2016 slide 30 Limitations with scoreboards The scoreboard technique is limited by: Number of scoreboard entries (window size) Number and types of functional units Number of ports to the register bank Hazards caused by name dependencies Tomasulo’s algorithm addresses the last two limitations

31 04/03/2016 slide 31 Tomasulo’s Algorithm In IBM 360/91, 4 years after the CDC 6600 Goal: High performance without compiler support Differences between Tomasulo & Scoreboard: Control & Buffers distributed with FUs (called reservation stations) vs. centralised in Scoreboard Register names in instructions replaced by pointers to reservation station buffer (HW register renaming) Common Data Bus broadcasts results to all FUs Loads and Stores treated as FUs as well This technique has been adopted in many recent machines (e.g. PowerPC)

32 04/03/2016 slide 32 Hardware Organization

33 04/03/2016 slide 33 Three stages of Tomasulo’s Alg. 1. Issue—get instruction from FP Op Queue Issue if no structural hazard for a reservation station 2.Execution—operate on operands (EX) Execute when both operands are available; if not ready, watch Common Data Bus (CDB) for result 3.Write result—finish execution (WB) Write on CDB to all awaiting functional units; mark reservation station available Normal bus: data + destination Common Data Bus: data + source (snooping)

34 04/03/2016 slide 34 Tomasulo example, cycle 0

35 04/03/2016 slide 35 Tomasulo example, cycle 1

36 04/03/2016 slide 36 Tomasulo example, cycle 2

37 04/03/2016 slide 37 Tomasulo example, cycle 3

38 04/03/2016 slide 38 Tomasulo example, cycle 4

39 04/03/2016 slide 39 Tomasulo example, cycle 5

40 04/03/2016 slide 40 Tomasulo example, cycle 6

41 04/03/2016 slide 41 Tomasulo example, cycle 7

42 04/03/2016 slide 42 Tomasulo example, cycle 8

43 04/03/2016 slide 43 Tomasulo example, cycle 10

44 04/03/2016 slide 44 Tomasulo example, cycle 11

45 04/03/2016 slide 45 Tomasulo example, cycle 15

46 04/03/2016 slide 46 Tomasulo example, cycle 16

47 04/03/2016 slide 47 Tomasulo example, cycle 56

48 04/03/2016 slide 48 Tomasulo example, cycle 57

49 04/03/2016 slide 49 Example of WAR hazards in Tomasulo’s Algorithm Example: LFF6, 34(R2) DIVFF10, F6, F0 ADDFF6, F8, F2 ADDF can safely finish before DIVF has read register F6 because: DIVF has renamed register F6 to point at LFs functional unit LF broadcasts its result on the Common Data Bus Register renaming can thus be done: statically by the compiler dynamically by the hardware


Download ppt "04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;"

Similar presentations


Ads by Google