CSCI 620 NOTE8 1 Instruction Level Parallelism and Tomasulo’s approach.


1 CSCI 620 NOTE8 1 Instruction Level Parallelism and Tomasulo’s approach

2 CSCI 620 NOTE8 2 Instruction Level Parallelism
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
– Reducing stalls reduces CPI; reducing CPI increases IPC (IPC = 1/CPI)
– Instruction-level parallelism (ILP) techniques seek to reduce stalls
– The importance of ILP is most visible in loop-level parallelism:
for (i = 1; i < 1000; i = i + 1) { x[i] = x[i] + y[i]; }
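As a rough illustration of the CPI equation above, the sketch below adds stall cycles per instruction to an ideal CPI and converts the result to IPC. The function name and the stall values are made up for illustration; they are not measurements from the course.

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI + stall cycles per instruction from each hazard class."""
    return ideal_cpi + structural + data_hazard + control

# Hypothetical average stall cycles per instruction (illustrative values only).
cpi = pipeline_cpi(ideal_cpi=1.0, structural=0.05, data_hazard=0.30, control=0.15)
ipc = 1.0 / cpi
print(f"CPI = {cpi:.2f}, IPC = {ipc:.2f}")  # CPI = 1.50, IPC = 0.67
```

Removing any one stall term lowers CPI and therefore raises IPC, which is the point of the ILP techniques in this note.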

3 CSCI 620 NOTE8 3 Major Techniques to increase ILP
Technique | Reduces
Forwarding and bypassing | Potential data hazard stalls
Delayed branches and simple branch scheduling | Control hazard stalls
Basic dynamic scheduling (scoreboarding) | Data hazard stalls from true dependences
Dynamic scheduling with renaming | Data hazard stalls and stalls from antidependences and output dependences
Dynamic branch prediction | Control stalls
Issuing multiple instructions per cycle | Ideal CPI
Speculation | Data hazard and control hazard stalls
Dynamic memory disambiguation | Data hazard stalls with memory
Loop unrolling | Control hazard stalls
Basic compiler pipeline scheduling | Data hazard stalls
Compiler dependence analysis | Ideal CPI, data hazard stalls
Software pipelining, trace scheduling | Ideal CPI, data hazard stalls
Compiler speculation | Ideal CPI, data, control stalls

4 CSCI 620 NOTE8 4 Instruction Level Parallelism
– ILP is exploited by SW (static) or HW (dynamic) techniques
– HW-intensive ILP dominates the desktop and server markets
– SW (compiler-intensive) approaches are more likely to be seen in embedded systems, although IA-64 also uses this approach

5 CSCI 620 NOTE8 5 Dependences
– Two instructions are parallel if they can execute simultaneously in a pipeline without causing any stalls (assuming no structural hazards) and can be reordered
– Two dependent instructions are not parallel and cannot be reordered: they must execute in order, even though their execution may partially overlap
Three types of dependences:
– Data dependences (= true data dependences)
– Name dependences
– Control dependences

6 CSCI 620 NOTE8 6 Dependences
– Dependences are properties of programs
– Whether a dependence results in an actual hazard, and the length of any stall, are properties of the pipeline organization
A dependence:
1) Indicates the potential for a hazard
2) Determines the order in which results must be calculated
3) Sets an upper bound on ILP
Problems caused by dependences can be addressed by:
1) Avoiding the hazard by rescheduling the code
2) Eliminating the dependence by transforming (altering) the code
The compiler is concerned with dependences in the program; whether a HW hazard actually occurs depends on the given pipeline

7 CSCI 620 NOTE8 7 Review of Data Hazards
Consider instructions i and j, where i occurs before j.
– RAW (read after write): j tries to read a source before i writes it, so j gets the old value
– WAW (write after write): j tries to write an operand before it is written by i, so the writes complete in the wrong order (only possible in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled)
– WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value (only possible when some instructions can write results early in the pipeline and other instructions can read sources late in the pipeline)

8 CSCI 620 NOTE8 8 (1) Data Dependences
(True) data dependences: instruction j is data dependent on instruction i if either
– Instruction i produces a result used by instruction j (directly), or
– Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i (indirectly)
Example of a direct dependence (through F0):
i: ADD.D F0, F2, F4
j: SUB.D F6, F0, F8
Easy to determine for registers (fixed names); harder to determine for memory:
– Does 100(R4) = 20(R6)?
– From different loop iterations, does 20(R4) = 20(R4)?
– Will see hardware technique in chap 2

9 CSCI 620 NOTE8 9 (2) Name Dependences
The second type of dependence is the name dependence: two instructions use the same name (the same register or memory location) but do not exchange data.
– Antidependence: instruction j writes a register or memory location that instruction i reads, and instruction i must be executed first; if not, a WAR hazard results
i: ADD.D F0, F2, F4
j: SUB.D F2, F6, F8
– Output dependence: instruction i and instruction j write the same register or memory location; the ordering between the instructions must be preserved; if not, a WAW hazard results
i: ADD.D F0, F2, F4
j: SUB.D F0, F6, F8
* Name dependences are harder to handle for memory accesses:
– Does 100(R4) = 20(R6)?
– From different loop iterations, does 20(R4) = 20(R4)?
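To make the three register dependence types concrete, here is a minimal sketch that classifies the dependences of a later instruction j on an earlier instruction i. The `Instr` tuple and the `dependences` function are hypothetical helpers, and the check is pairwise and register-only: it ignores memory and any instructions in between.

```python
from collections import namedtuple

# Simplified instruction model: dest is the register written, srcs the registers read.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])

def dependences(i, j):
    """Return the dependence types of later instruction j on earlier instruction i."""
    found = []
    if i.dest is not None and i.dest in j.srcs:
        found.append("true")    # j reads what i writes (RAW hazard if reordered)
    if j.dest is not None and j.dest in i.srcs:
        found.append("anti")    # j writes what i reads (WAR hazard if reordered)
    if i.dest is not None and i.dest == j.dest:
        found.append("output")  # both write the same name (WAW hazard if reordered)
    return found

add = Instr("ADD.D", "F0", ("F2", "F4"))
print(dependences(add, Instr("SUB.D", "F6", ("F0", "F8"))))  # ['true']
print(dependences(add, Instr("SUB.D", "F2", ("F6", "F8"))))  # ['anti']
print(dependences(add, Instr("SUB.D", "F0", ("F6", "F8"))))  # ['output']
```

The three calls use the ADD.D/SUB.D pairs from the data dependence and name dependence slides.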

10 CSCI 620 NOTE8 10 Register Renaming eliminates WAR & WAW
Assuming temporary registers S and T:
Original | Renamed
DIV.D F0, F2, F4 | DIV.D F0, F2, F4
ADD.D F6, F0, F8 | ADD.D S, F0, F8
S.D F6, 0(R1) | S.D S, 0(R1)
SUB.D F8, F10, F14 | SUB.D T, F10, F14
MUL.D F6, F10, F8 | MUL.D F6, F10, T
Which dependences are eliminated by renaming?
– (True) data dependences: (1) DIV.D to ADD.D (2) ADD.D to S.D (3) SUB.D to MUL.D
– Antidependences (WAR): ADD.D and SUB.D
– Output dependences (WAW): ADD.D and MUL.D
Notes:
– Every subsequent use of F8 must be replaced by T
– What about F6? Later uses do not need to be replaced as for F8, because MUL.D writes a new value to F6
WAR and WAW are eliminated by register renaming, which will be implemented in hardware
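The renaming idea can be sketched in a few lines. This is an assumption-laden illustration, not the hardware scheme: unlike the slide, which renames only the two conflicting destinations to S and T, the hypothetical `rename` function below gives every destination a fresh name (P0, P1, ...), which removes all WAR and WAW conflicts while keeping the true dependences.

```python
from collections import namedtuple

Instr = namedtuple("Instr", ["op", "dest", "srcs"])  # dest is None for a store

def rename(program):
    """Give each destination a fresh name and redirect later reads to it."""
    current = {}   # architectural register -> most recent fresh name
    renamed = []
    fresh = 0
    for ins in program:
        # Sources are looked up first, so an instruction reads the older mapping.
        srcs = tuple(current.get(r, r) for r in ins.srcs)
        dest = None
        if ins.dest is not None:
            dest = f"P{fresh}"
            fresh += 1
            current[ins.dest] = dest
        renamed.append(Instr(ins.op, dest, srcs))
    return renamed

program = [
    Instr("DIV.D", "F0", ("F2", "F4")),
    Instr("ADD.D", "F6", ("F0", "F8")),
    Instr("S.D",   None, ("F6", "R1")),
    Instr("SUB.D", "F8", ("F10", "F14")),
    Instr("MUL.D", "F6", ("F10", "F8")),
]
renamed = rename(program)
for ins in renamed:
    print(ins.op, ins.dest, ins.srcs)
```

After renaming, no two instructions write the same name and no instruction writes a name that an earlier instruction reads, so only the three true dependences remain.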

11 CSCI 620 NOTE8 11 (3) Control Dependence
The final kind of dependence is the control dependence.
Example:
if p1 {S1; }; if p2 {S2; }
S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1. Note that S2 could be data dependent on S1.

12 CSCI 620 NOTE8 12 Control Dependences
Two (obvious) constraints on control dependences:
– An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch
– An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch
Examples from the slide: if p1 {S1; }; if p2 {S2; } S1; if p1 {S1; }; if p2 {S2; } if p1 {S1; }; S3; if p2 {S2; } if p1 {S1; }; S3; if p2 {S2; } S3

13 CSCI 620 NOTE8 13 Limitations of Scoreboarding (scoreboard hardware on next slide)
– No forwarding hardware
– Limited to instructions in a basic block (small window)
– Small number of functional units (structural hazards), especially integer/load/store units: only one each
– Cannot issue on structural or WAW hazards
– Must wait until WAR hazards are resolved
– Imprecise exceptions due to out-of-order execution
Improvement? Tomasulo’s Approach

14 CSCI 620 NOTE8 14 Figure A.50 The basic structure of a MIPS processor with a scoreboard
[Figure: the scoreboard, registers, and functional units (integer unit, FP add, FP divide, FP mult), connected by data buses (data flows) and control/status lines (control/status flows)]
– The scoreboard was originally proposed for the CDC 6600 (Seymour Cray, 1964)
– Scoreboard hardware: centralized control by the scoreboard

15 CSCI 620 NOTE8 15 Functional unit status fields (scoreboard):
– Busy: indicates whether the unit is busy or not
– Op: operation to perform in the unit (e.g., add or subtract)
– Fi: destination register
– Fj, Fk: source-register numbers
– Qj, Qk: functional units producing source registers Fj, Fk
– Rj, Rk: flags indicating when Fj, Fk are available and not yet read

16 CSCI 620 NOTE8 16 Tomasulo’s Algorithm
– For the IBM 360/91, about 3 years after the CDC 6600 (late 1960s)
– Goal: high performance without special compilers
Differences between Tomasulo’s algorithm and the scoreboard (similar to scoreboarding, but with register renaming added):
– Control and buffers (called “reservation stations”) are distributed with the functional units rather than centralized in a scoreboard: the scoreboard/instruction buffer becomes reservation stations for each FU
– Registers in instructions are replaced by pointers to reservation station buffers
– HW renaming of registers avoids WAR and WAW hazards
– A common data bus (CDB) broadcasts results to the functional units
– Loads and stores are treated as functional units as well
Very importantly, Tomasulo’s algorithm has been adopted by many modern CPUs: Alpha 21264, HP PA-8000, MIPS R10K, Pentium III, Pentium 4, PowerPC 604, etc.

17 CSCI 620 NOTE8 17 Key concept: Reservation Stations (RS)
Distributed (rather than centralized) control scheme:
– Bypassing (data goes directly to the RS rather than via registers) is allowed via the Common Data Bus (CDB)
– Register renaming eliminates WAR/WAW hazards
Scoreboard/instruction buffer => reservation stations:
– Fetch and buffer operands as soon as they are available, which eliminates the need to always get values from registers at execute
– Pending instructions designate the reservation stations that will provide their inputs
– Successive writes to a register cause only the last one to update the register

18 CSCI 620 NOTE8 18 MIPS Floating-point unit using Tomasulo’s Algorithm

19 CSCI 620 NOTE8 19 Details
– Each reservation station holds an instruction that has been issued and is waiting for execution; the instruction may already have all of its operands, or it holds the name(s) of the RSs or load buffers that will provide them
– These name fields are called “tags”: 4 bits each, enough to denote one of 5 RSs and 6 load buffers; RSs are used for renaming
– Load buffers and store buffers behave almost exactly like RSs
– All results from the FUs and from memory are sent on the Common Data Bus, which is connected everywhere except the load buffers

20 CSCI 620 NOTE8 20 Three Stages of Tomasulo’s Algorithm
1. Issue: get the next instruction from the FP operation queue (FIFO)
– If a reservation station is free, issue the instruction and send the operands (if available in registers; otherwise provide the name of the FU that will produce them, which is the renaming step); if no station is free, stall (a structural hazard)
– Avoids WAR and WAW hazards
2. Execution: operate on operands (EX)
– When both operands are ready (already in Vj/Vk or captured from the CDB), execute; if not ready, watch the common data bus for the result
– RAW hazards are avoided
3. Write result: finish execution (WB)
– Write on the common data bus so that all waiting FUs can hear the result; mark the reservation station as available
Common data bus: 64 bits of data + 4 bits of source (“come from”)

21 CSCI 620 NOTE8 21 Data Buses in Tomasulo’s Algorithm
– A normal data bus carries data + destination (a “go to” bus)
– The CDB (Common Data Bus) carries data + source (a “come from” bus): 64 bits of data + 4 bits of functional unit source address (the RS’s number)
– Any receiving unit (store buffer, RSs, FP registers) accepts (writes) the data if the RS’s number matches the number it expects
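The “come from” behavior of the CDB can be sketched as a tag match. This is a simplified software model, not the hardware: the `broadcast` function, the dictionary layout, and the station names are all assumptions made for illustration.

```python
def broadcast(tag, value, stations, reg_status, regs):
    """Put (source tag, value) on the CDB; every unit expecting that tag captures the value."""
    for rs in stations:
        if rs["Qj"] == tag:              # first operand was waiting on this producer
            rs["Vj"], rs["Qj"] = value, None
        if rs["Qk"] == tag:              # second operand was waiting on this producer
            rs["Vk"], rs["Qk"] = value, None
    for reg, pending in reg_status.items():
        if pending == tag:               # register still expects this producer's result
            regs[reg] = value
            reg_status[reg] = None

# Hypothetical state: Add1 waits for Mult1; F0 is to be written by Mult1, F2 by Add1.
stations = [{"name": "Add1", "Vj": None, "Vk": 2.0, "Qj": "Mult1", "Qk": None}]
reg_status = {"F0": "Mult1", "F2": "Add1"}
regs = {"F0": 0.0, "F2": 0.0}

broadcast("Mult1", 6.0, stations, reg_status, regs)
print(stations[0], regs, reg_status)
```

The sender never names a destination; each receiver decides for itself by comparing the broadcast tag with the tag it is waiting for, which is why the comparison hardware grows with the number of receivers.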

22 CSCI 620 NOTE8 22 Reservation Station Components
– Op: operation to perform in the unit (e.g., + or –)
– Qj, Qk: the names of the reservation stations that will produce the source operands; no values are stored here
– Vj, Vk: registers that store the values of the source operands; these are the temporary registers used for renaming
– Busy: indicates that the reservation station and its FU are busy
Register result status: indicates which functional unit will write each register, if one exists; blank when no pending instruction will write that register
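A minimal sketch of how these fields are filled at issue time follows. It assumes a single shared pool of stations (real designs keep separate stations per functional unit type), and the `ReservationStation` class and `issue` function are hypothetical names, not part of the original slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    Vj: Optional[float] = None   # operand values, once known
    Vk: Optional[float] = None
    Qj: Optional[str] = None     # tags of the stations that will produce the operands
    Qk: Optional[str] = None

def issue(op, dest, src1, src2, stations, regs, reg_status):
    """Issue one instruction; return the station used, or None on a structural stall."""
    rs = next((s for s in stations if not s.busy), None)
    if rs is None:
        return None
    rs.busy, rs.op = True, op
    # For each source: copy the value if ready, else record the producer's tag.
    for src, v, q in ((src1, "Vj", "Qj"), (src2, "Vk", "Qk")):
        tag = reg_status.get(src)
        setattr(rs, v, regs[src] if tag is None else None)
        setattr(rs, q, tag)
    reg_status[dest] = rs.name   # renaming: dest now refers to this station's result
    return rs

regs = {"F0": 1.0, "F2": 2.0, "F6": 0.0, "F8": 8.0}
reg_status = {}
stations = [ReservationStation("Add1"), ReservationStation("Add2")]
first = issue("ADD.D", "F6", "F0", "F8", stations, regs, reg_status)
second = issue("SUB.D", "F8", "F6", "F2", stations, regs, reg_status)
third = issue("ADD.D", "F0", "F2", "F2", stations, regs, reg_status)
print(first, second, third, reg_status, sep="\n")
```

In this run SUB.D captures the tag Add1 for F6 instead of a value, and it can safely take over F8 because ADD.D already copied the old F8 value into its own Vk; the third instruction finds no free station and stalls.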

25 CSCI 620 NOTE8 25 Load & Store require 2 steps:
– Step 1: Compute the effective address (ea)
– Step 2: Place the ea in the buffer
Execution (load or store) can start when the memory unit is not busy

40 CSCI 620 NOTE8 40 Wait until DIVD finishes (divide takes 40 cycles)

44 CSCI 620 NOTE8 44 Why does the same code take longer on the CDC 6600 scoreboard?
– Structural hazards
– Lack of forwarding
Both use in-order issue and out-of-order execution:
– The scoreboard cannot eliminate WAR and WAW hazards; Tomasulo can, with register renaming
– Both stall on a branch instruction (Tomasulo with speculation is covered later)
Assumed latencies (for the scoreboard): add = 2 clock cycles, multiply = 10, divide = 40
[Figure: timing comparison, Scoreboard vs. Tomasulo]

45 CSCI 620 NOTE8 45 Let’s try this site: http://www.ecs.umass.edu/ece/koren/architecture/Tomasulo/AppletTomasulo.html

47 CSCI 620 NOTE8 47 Tomasulo’s Algorithm: A Loop-Based Example
Loop: LD F0 0(R1)
      MULTD F4 F0 F2
      SD F4 0(R1)
      SUBI R1 R1 #8
      BNEZ R1 Loop
– Multiply takes 4 clocks
– Assume the first load takes 8 clocks (cache miss) and the second load takes 1 clock (hit); on a cache miss, a block (several words) is brought into the cache
– Reality: integer instructions run ahead

49 CSCI 620 NOTE8 49 Cache miss occurs, so LD must wait for 8 cycles

50 CSCI 620 NOTE8 50 Cache miss occurs, so LD must wait for 8 cycles

51 CSCI 620 NOTE8 51 Cache miss occurs, so LD must wait for 8 cycles

52 CSCI 620 NOTE8 52 Cache miss occurs, so LD must wait for 8 cycles. Since SUBI is executed by the integer unit, it is not shown here; only the FP unit is shown

53 CSCI 620 NOTE8 53 Cache miss occurs, so LD must wait for 8 cycles. Since BNEZ is executed by the integer unit, it is not shown here; only the FP unit is shown

54 CSCI 620 NOTE8 54 Cache miss occurs, so LD must wait for 8 cycles. This is “register renaming”

55 CSCI 620 NOTE8 55 Cache miss occurs, so LD must wait for 8 cycles. This is “register renaming”

56 CSCI 620 NOTE8 56 Cache miss occurs, so LD must wait for 8 cycles. Higher ILP!

57 CSCI 620 NOTE8 57 Cache is finally ready, so read from memory

69 CSCI 620 NOTE8 69 Tomasulo Summary
Reservation stations: renaming to a larger set of registers + buffering of source operands
– Prevents registers from becoming a bottleneck
– Distributes RAW hazard detection to the RSs
– Avoids the WAR and WAW hazards of the scoreboard through register renaming
– Allows loop unrolling in HW
– Tag matching on the CDB requires many associative compares
– The Common Data Bus is the Achilles’ heel of Tomasulo: multiple writebacks (multiple CDBs) are expensive

70 CSCI 620 NOTE8 70 Tomasulo Summary
Lasting contributions (most modern processors employ the algorithm):
– Dynamic scheduling
– Register renaming
– Load/store disambiguation: the load address is compared with the store addresses in the store buffer; if a match is found, the load instruction is not sent to the load buffer. Which hazard does this avoid? RAW
360/91 descendants include Pentium III, Pentium 4; PowerPC 604; MIPS R10000; HP PA-8000; Alpha 21264
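The load/store disambiguation check described above can be sketched as a simple address comparison. This is an illustrative model only: `can_send_load` is a hypothetical name, and the store buffer is represented as a plain list of pending effective addresses.

```python
def can_send_load(load_addr, pending_store_addrs):
    """A load goes to the load buffer only if no earlier pending store uses the same address."""
    return all(addr != load_addr for addr in pending_store_addrs)

# Hypothetical effective addresses of stores still waiting in the store buffer.
store_buffer = [0x1000, 0x1008]

print(can_send_load(0x1010, store_buffer))  # True: no conflict, the load may proceed
print(can_send_load(0x1008, store_buffer))  # False: must wait for the store (RAW through memory)
```

Holding back the conflicting load ensures it reads the value written by the earlier store rather than stale memory contents.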

