Presentation is loading. Please wait.

Presentation is loading. Please wait.

CSE 586 Computer Architecture Lecture 3

Similar presentations


Presentation on theme: "CSE 586 Computer Architecture Lecture 3"— Presentation transcript:

1 CSE 586 Computer Architecture Lecture 3
Jean-Loup Baer CSE 586 Spring 00

2 Highlights from last week
Branch prediction Modern processors use dynamic branch prediction Becomes increasingly important because of deep pipes and multiple issue of instructions BPT (Branch prediction table) Prediction occurs during ID cycle BPT either indexed by some bits on the PC or organized cache-like BPT: either separate table or part of the “metadada” of the I-cache Use of 2-bit saturating counters for the prediction per se CSE 586 Spring 00

3 Highlights from last week (c’ed)
BTB (Branch Target Buffer) = BPT + Target address Prediction and target address “computation” occur during IF cycle Possibility of decoupling a (large) BPT and a (smaller) BTB Correlated -- or 2-level – branch prediction Relies on history of outcome of previous branches to predict current branch Many variations on the number of (shift) registers recording branch history and the number of Pattern History Tables (PHT) storing the 2-bit saturating counters CSE 586 Spring 00

4 Highlights from last week (c’ed)
Basic pipelining How to handle precise exceptions Single issue processor with multiple pipes How to handle sharing the WB stage How to avoid WAW hazards CSE 586 Spring 00

5 How to improve (decrease) CPI
Recall: CPI = Ideal CPI + CPI contributed by stalls Ideal CPI =1 for single issue machine even with multiple pipes Ideal CPI will be less than 1 if we have several pipes and we can issue (and “commit”) multiple instructions in the same cycle, i.e., Instruction Level Parallelism (ILP) CSE 586 Spring 00

6 Instruction Level parallelism (ILP)
ILP will increase throughput and decrease CPU execution time ILP will increase structural hazards Cannot issue 2 instructions to the same pipe ILP makes reduction in other stalls even more important A “bubble” costs more than the loss of a single instruction issue ILP will make the design more complex WAW and WAR hazards can occur Out-of-order completion can occur Precise exception handling is more difficult CSE 586 Spring 00

7 Where can we optimize? (control)
CPI contributed by control stalls can be decreased by compiler optimizations Reduce the number of branches: loop unrolling Speculative execution Static (software) or dynamic (hardware) branch prediction Code movement : execute instructions before knowing that they will need to be executed (beware of exceptions) Predication CSE 586 Spring 00

8 Where can we optimize? (data dependencies)
CPI contributed by data hazards can be decreased by Compiler optimizations Load scheduling, dependence analysis, software pipelining, trace scheduling Hardware (run-time) techniques Forwarding (RAW) Register renaming (WAW, WAR) CSE 586 Spring 00

9 Compiler optimizations
Basic blocks are small. Difficult to find parallelism Loop unrolling Software pipelining Avoid dependencies (data, name, control) at the instruction level to increase ILP Data dependence translates into RAW Load scheduling and code reorganization Name dependence (register allocation): Anti-dependence ( WAR) and output dependence (WAW) Control dependence = branches (predicated execution) CSE 586 Spring 00

10 Data dependencies (RAW)
Instruction (statement) Sj dependent on Si if Transitivity: Instruction j dependent on k and k dependent on i Dependence is a program property Hazards (RAW in this case) and their (partial) removals are a pipeline organization property Code scheduling goal Maintain dependence and avoid hazard (pipeline is exposed to the compiler) CSE 586 Spring 00

11 Loop dependences and optimizations
Dependence within one instance of the loop Not much opportunity to optimize Loop-carried dependences Dependence between different iterations of the loop Potential for loop unrolling and software pipelining The further the dependence distance between iterations, the more potential for loop unrolling Detecting these dependences is not easy (so-called memory dependence problem: recognizing that two addresses are the same) A compiler has to be conservative and in doubt assume the worse, i.e. assume a dependence. CSE 586 Spring 00

12 Loop-level parallelism
Parallelizing compilers have for main goal loop-level parallelism but can be used for enhancing ILP DO-all (all instances of the loop can be executed in parallel; no dependences): good for loop unrolling DO-sequential (iteration i has to finish before iteration i + 1 can start): not good for anything! DO-across (part of iteration i has to finish before iteration i + 1 can start): can apply software pipelining methods Detailed analysis outside the scope of this course CSE 586 Spring 00

13 Loop unrolling Pros Cons
Decrease loop overhead (branches, counter settings) Allows better scheduling Longer basic blocks hence better opportunities to hide latency of “long” operations and to prevent load delays Cons Increases register pressure Increases code length (I-cache occupancy) Requires prologue or epilogue CSE 586 Spring 00

14 Software pipelining Reorganize loops to avoid loop-carried dependences
new code : statements of distinct iterations of original code Take an “horizontal” slice of several DO-across iterations Original code New code Store Rx Use Rx Load Rx Iter. i Iter. i + 1 Iter. i + 2 CSE 586 Spring 00

15 Name dependence Anti dependence Si: …<- R1+ R2; ….; Sj: R1 <- …
At the instruction level, this is WAR hazard if instruction j finishes first Output dependence Si: R1 <- …; ….; Sj: R1 <- … At the instruction level, this is a WAW hazard if instruction j finishes first In both cases, not really a dependence but a “naming” problem Register renaming (compiler by register allocation, in hardware see later) CSE 586 Spring 00

16 Control dependencies Branches restrict the scheduling of instructions
Speculation (i.e., executing an instruction that might not be needed) must be: Safe (no additional exception) Legal (the end result should be the same as without speculation) Speculation can be implemented by: Compiler (code motion, predication) Hardware (branch prediction) Both (branch prediction, conditional operations) CSE 586 Spring 00

17 Static vs. dynamic scheduling
Assumptions (for now): 1 instruction issue / cycle Ideal CPI still 1, but real CPI will be closer to 1 Same techniques will be used when we look at multiple issue Several pipelines with a common IF and ID Static scheduling (optimized by compiler) When there is a stall (hazard) no further issue of instructions Of course, the stall has to be enforced by the hardware Dynamic scheduling (enforced by hardware) Instructions following the one that stalls can issue if they do not produce structural hazards or dependencies CSE 586 Spring 00

18 Dynamic scheduling Implies possibility of:
Out of order issue (recall: we say that an instruction is issued once it has passed the ID stage) and hence out of order execution Out of order completion (also possible in static scheduling but less frequent) Imprecise exceptions (will take care of it later) Example (different pipes for add/sub and divide) R1 = R2/ R (long latency) R2 = R1 + R (stall, no issue, because of RAW on R1) R6 = R7 - R (can be issued, executed and completed before the other 2) CSE 586 Spring 00

19 Issue and Dispatch Split the ID stage into: Example revisited.
Issue : decode instructions; check for structural hazards -- at least, maybe more hazards such as WAW depending on implementations -- . Stall if there are any. Instructions pass in this stage in order Dispatch: wait until no data hazards then read operands. At the next cycle a functional unit, i.e. EX of a pipe, can start executing Example revisited. R1 = R2/ R (long latency; in execution) R2 = R1 + R (issue but no dispatch because of RAW on R1) R6 = R7 - R (can be issued, executed and completed before the other 2) CSE 586 Spring 00

20 Implementations of dynamic scheduling
In order to compute correct results, need to keep track of : execution pipes register usage for read and write completion etc. Two major techniques Scoreboard (invented by Seymour Cray for the CDC 6600 in 1964) Tomasulo’s algorithm (used in the IBM 360/91 in 1967) CSE 586 Spring 00

21 Scoreboarding -- The example machine (cf. Figure 4.3 in your book)
Registers Data buses Functional units (pipes) scoreboard Control lines /status CSE 586 Spring 00

22 Scoreboard basic idea The scoreboard keeps a record of all data dependencies Keeps track of which registers are used as sources and destinations and which functional units use them The scoreboard keeps a record of all pipe occupancies The original CDC 6600 was not pipelined but conceptually the scoreboard does not depend on pipelining The scoreboard decides if an instruction can be issued Either the first time it sees it (no hazard) or, if not, at every cycle thereafter The scoreboard decides if an instruction can store its result This is to prevent WAR hazards CSE 586 Spring 00

23 An instruction goes through 5 steps
We assume that the instruction has been successfully fetched 1. Issue The execution unit must be free (no structural hazard) There should be no WAW hazard If either of these conditions is false the instruction stalls. No further issue is allowed There can be more fetches if there is an instruction fetch buffer (like there was in the CDC 6660) CSE 586 Spring 00

24 Execution steps under scoreboard control
2. Dispatch (Read operands) When the instruction is issued, the execution unit is reserved (becomes busy) Operands are read in the execution unit when they are both ready (i.e., are not results of still executing instructions). This prevents RAW hazards (this conservative approach was taken because the CDC 6600 was not pipelined) 3. Execution One or more cycles depending on functional unit latency When execution completes, the unit notifies the scoreboard it’s ready to write the result CSE 586 Spring 00

25 Execution steps under scoreboard control (c’ed)
4. Write result Before writing, check for WAR hazards. If one exists, the unit is stalled until all WAR hazards are cleared (note that an instruction in progress, i.e., whose operands have been read, won’t cause a WAR) 5. Delay Because forwarding is not implemented, there should be one unit of delay between writing and reading the same register (this restriction seems artificial to me and is “historical”). Similarly, it takes one unit of time between the release of a unit and its possible next occupancy CSE 586 Spring 00

26 Optimizations and Simplifications
There are opportunities for optimization such as: Forwarding Buffering for one copy of source operands in execution units (this allows reading of operands one at a time and minimizing the WAR hazards) We have assumed that there could be concurrent updates to (different) registers. Can be solved (dynamically) by grouping execution units together and preventing concurrent writes in the same group CSE 586 Spring 00

27 What is needed in the scoreboard
Status of each functional unit Free or busy Operation to be performed The names of the result Fi and source Fj, Fk registers Flags Rj, Rk indicating whether the source registers are ready Names Qj,Qk of the units (if any) producing values for Fj, Fk Status of result registers For each Fi the name of the unit (if any), say Pi that will produce its contents (redundant but easy to check) The instruction status Been issued, dispatched, in execution, ready to write, finished? CSE 586 Spring 00

28 Condition checking and scoreboard setting
Issue step Unit free, say Ua and no WAW Dispatch (Read operand )step Rj and Rk must be free Execution step At end ask for writing permission (no WAR) Write result Check if Pi is an Fj, Fk(Rj , Rk= no) in preceding instrs. Issue step Ua busy and record Fi,Fj,Fk Record Qj, Qk and Rj,Rk Record Pi = Ua Dispatch (Read operand) step Execution step Write result For preceding instrs, if Qj(Qk) = Ua, set Rj(Rk) to yes Ua free and Pi = 0 CSE 586 Spring 00

29 Example Load F6, 34(r2) Load f-p register F6
Load F2, 45(r3) Load latency 1 cycle MulF F0,F2,F Mult latency 10 cycles Sub F8, F6,F Add/sub latency 2 cycles DivF F10,F0,F Divide latency 40 cycles Add F6,F8,F2 Assume that the 2 Loads have been issued, the first one completed, the second ready to write. The next 3 instructions have been issued (but not dispatched). CSE 586 Spring 00

30 Instruction Issue Dispatch Executed Result written
Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes Mul F0, F2, F yes Sub F8, F6, F yes Div F10, F0, F yes Add F6,F8,F2 Functional Unit status No Name Busy Fi Fj Fk Qj Qk Rj Rk Int yes F2 r3 Mul yes F0 F2 F No Y Mul no Add yes F8 F6 F Y No Div yes F10 F0 F No Y Register result status F0 (2) F2 (1) F4 ( ) F6( ) F8 (4) F10 (5) F12 ... CSE 586 Spring 00

31 Instruction Issue Dispatch Executed Result written
Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes 1 cycle after 2nd load has written its result Sub F8, F6, F yes yes Div F10, F0, F yes Add F6,F8,F2 Functional Unit status No Name Busy Fi Fj Fk Qj Qk Rj Rk Int no Mul yes F0 F2 F Y Y Mul no Add yes F8 F6 F Y Y Div yes F10 F0 F No Y Register result status F0 (2) F2 ( ) F4 ( ) F6( ) F8 (4) F10 (5) F12 ... CSE 586 Spring 00

32 Instruction Issue Dispatch Executed Result written
Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes yes yes Div F10, F0, F yes Add F6,F8,F yes yes yes Functional Unit status Mul in execution; Sub has completed;Div issues; Add waits for writing No Name Busy Fi Fj Fk Qj Qk Rj Rk Int no Mul yes F0 F2 F Y Y Mul no Add yes F6 F8 F Y Y Div yes F10 F0 F No Y Register result status F0 (2) F2 ( ) F4 ( ) F6(4) F8 ( ) F10 (5) F12 ... CSE 586 Spring 00

33 Instruction Issue Dispatch Executed Result written
Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes yes yes Sub F8, F6, F yes yes yes yes Div F10, F0, F yes yes Add F6,F8,F yes yes yes Functional Unit status Mul is finished; Div can dispatch; Add will write at next cycle No Name Busy Fi Fj Fk Qj Qk Rj Rk Int no Mul no Mul no Add yes F6 F8 F Y Y Div yes F10 F0 F Y Y Register result status F0 ( ) F2 ( ) F4 ( ) F6(4) F8 ( ) F10 (5) F12 ... CSE 586 Spring 00

34 Instruction Issue Dispatch Executed Result written
Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes yes yes Sub F8, F6, F yes yes yes yes Div F10, F0, F yes yes Add F6,F8,F yes yes yes yes Functional Unit status No Name Busy Fi Fj Fk Qj Qk Rj Rk Only Div is not finished Int no Mul no Mul no Add no Div yes F10 F0 F Y Y Register result status F0 ( ) F2 ( ) F4 ( ) F6( ) F8 ( ) F10 (5) F12 ... CSE 586 Spring 00

35 Tomasulo’s algorithm “Weaknesses” in scoreboard:
Centralized control No forwarding (more RAW than needed) Tomasulo’s algorithm as implemented first in IBM 360/91 Control decentralized at each functional unit Forwarding Concept and implementation of renaming registers that eliminates WAR and WAW hazards CSE 586 Spring 00

36 Reservation stations With each functional unit, have a set of buffers or reservation stations Keep operands and function to perform Operands can be values or names of units that will produce the value (register renaming) with appropriate flags Not both operands have to be “ready” at the same time When both operands have values, functional unit can execute on that pair of operands When a functional unit computes a result, it broadcasts its name and the value. Might not store a result in a real register CSE 586 Spring 00

37 Tomasulo’s solution to resolve hazards
Structural hazards No free reservation station (stall at issue time). No further issue RAW dependency (detected in each functional unit --decentralized) The instruction with the dependency is stalled (but others might be issued) No WAR and WAW hazards Because of register renaming through reservation stations Forwarding Done at end of execution by use of a common (broadcast) data bus CSE 586 Spring 00

38 Example machine (cf. Figure 4.8)
From memory From I-unit Load buffers Fp registers Common data bus Store buffers Reservation stations To memory F-p units CSE 586 Spring 00

39 An instruction goes through 3 steps
Assume the instruction has been fetched 1. Issue, dispatch, and read operands Check for structural hazard (no free reservation station or no free load-store buffer for a memory operation). If there is one, stall until it is not present any longer Reserve the next reservation station Read source operands If they have values, put the values in the reservation station If they have names, store their names in the reservation station Rename result register with the name of the functional unit that will compute it CSE 586 Spring 00

40 An instruction goes through 3 steps (c‘ed)
2. Execute If any of the source operands is not ready (i.e., the reservation station holds a name), monitor the bus for broadcast of a result When both operands have values, execute 3. Write result Broadcast name of the unit and value computed. Note two more sources of structural hazard: Two reservation stations in the same functional unit are ready to execute in the same cycle: choose the “first” one Two functional units want to broadcast at the same time. Priority is encoded in the hardware CSE 586 Spring 00

41 Implementation All registers (except load buffers) contain a pair {value,tag} The tag (or name) can be: Zero (or a special pattern) meaning that the value is indeed a value The name of a load buffer The name of a reservation station within a functional unit A reservation station consists of : The operation to be performed 2 pairs (value,tag) (Vj,Qj) (Vk,Qk) A flag indicating whether the accompanying f-u is busy or not CSE 586 Spring 00

42 Instruction Issue Execute Write result
Load F6, 34(r2) yes yes yes Load F2, 45(r3) yes yes Mul F0, F2, F yes Sub F8, F6, F yes Initial Div F10, F0, F yes Add F6,F8,F yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add yes Sub (Load1) Load2 Add yes Add Add1 Load2 Add no Mul yes Mul (F4) Load2 Mul yes Div (Load1) Mul1 Register status F0 (Mul1) F2 (Load2) F4 ( ) F6(Add2 ) F8 (Add1) F10 (Mul2) F12... CSE 586 Spring 00

43 Instruction Issue Execute Write result
Load F6, 34(r2) yes yes yes Load F2, 45(r3) yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes Cycle after 2nd load has written its result Div F10, F0, F yes Add F6,F8,F yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add yes Sub (Load1) (Load2) Add yes Add (Load2) Add1 Add no Mul yes Mul (Load2) (F4) Mul yes Div (Load1) Mul1 Register status F0 (Mul1) F2 () F4 ( ) F6(Add2 ) F8 (Add1) F10 (Mul2) F12... CSE 586 Spring 00

44 Instruction Issue Execute Write result
Load F6, 34(r2) yes yes yes Load F2, 45(r3) yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes yes Cycle after sub has written its result Div F10, F0, F yes Add F6,F8,F yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add no Add yes Add (Add1) (Load2) Add no Mul yes Mul (Load2) (F4) Mul yes Div (Load1) Mul1 Register status F0 (Mul1) F2 () F4 ( ) F6(Add2 ) F8 () F10 (Mul2) F12... CSE 586 Spring 00

45 Instruction Issue Execute Write result
Load F6, 34(r2) yes yes yes Load F2, 45(r3) yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes yes Cycle after add has written its result Div F10, F0, F yes Add F6,F8,F yes yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add no Add no Add no Mul yes Mul (Load2) (F4) Mul yes Div (Load1) Mul1 Register status F0 (Mul1) F2 () F4 ( ) F6() F8 () F10 (Mul2) F12... CSE 586 Spring 00

46 Instruction Issue Execute Write result
Load F6, 34(r2) yes yes yes Load F2, 45(r3) yes yes yes Mul F0, F2, F yes yes yes Sub F8, F6, F yes yes yes Cycle after mul has written its result Div F10, F0, F yes yes Add F6,F8,F yes yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add no Add no Add no Mul no Mul yes Div (Mul1) (Load1) Register status F0 () F2 () F4 ( ) F6() F8 () F10 (Mul2) F12... CSE 586 Spring 00

47 Other checks/possibilities
In the example in the book there is no load/store dependencies but since they can happen Load/store buffers must keep the addresses of the operands On load, check if there is a corresponding address in store buffers. If so, get the value/tag from there (load buffers have tags) Better yet, have load/store functional units (still needs checking) The Tomasulo engine was intended only for f-p operations. We need to generalize to include Handling branches, exceptions etc In-order completion More general register renaming mechanisms Multiple instruction issues CSE 586 Spring 00

48 Conceptual execution on a processor with ILP
Instruction fetch and branch prediction Corresponds to IF in simple pipeline Complicated by multiple issue and the potential need to fetch more than one basic block at a time Instruction decode, dependence check, dispatch, issue Corresponds (many variations) to ID Although instructions are issued (i.e., assigned to functional units), they might not execute right away (cf. reservation stations) Instruction execution Corresponds to EX and/or MEM (with various latencies) Instruction commit Corresponds to WB but more complex because of speculation and out-of-order completion CSE 586 Spring 00

49 Register renaming - Generalities
Use a physical register file (or some other storage form, e.g., reservation station or reorder buffer) larger than the ISA logical one When instruction is decoded Give a new name to result register from the free list. The register is renamed Give source operands their physical names (from mapping table) Hence the need to: Keep a mapping table logical - physical correspondence Keep a free list of empty physical registers Note that several physical registers can be mapped to the same logical register (corresponding to instructions at different times) CSE 586 Spring 00

50 Register renaming – Scheme 1: File of physical registers
Extra set of registers At decode: Rename the result register (get from free list; update mapping table). If none available, we have a structural hazard When a physical register has been read for the last time, return it to the free list Have a counter associated with each physical register (+ when a source logical register is renamed to physical register; - when instruction uses physical register as operand; release when counter is 0) Simpler to wait till logical register has been assigned a new name by a later instruction and that later instruction has been committed CSE 586 Spring 00

51 Scheme 1: Example Before: add r3,r3,4 after add r17,r3,4
add r4,r7,r add r18,r7,r17 add r3, r2, r add r19,r2,r7 Free list r17,r18,r19 … At this point r3 is r2, r3, r4, r7 not renamed yet remapped from r17 to r19 When r19 commits, r17 will be returned to the free list (never needed again) CSE 586 Spring 00

52 Register renaming – Scheme 2; Reorder buffer
Use of a reorder buffer Reorder buffer = circular queue with head and tail pointers At issue (renaming time), an instruction is assigned an entry at the tail of the reorder buffer which becomes the name of (or a pointer to) the result register At end of functional-unit computation, value is put in the instruction reorder buffer’s position When the instruction reaches the head of the buffer, its value is stored in the logical or physical (other reorder buffer entry) register. Still need of a mapping table CSE 586 Spring 00

53 Scheme 2: Example Before: add r3,r3,4 after add rob6,r3,4
add r4,r7,r add rob7,r7,rob6 add r3, r2, r add rob8,r2,r7 Assume reorder buffer is initially at position 6 CSE 586 Spring 00

54 Data dependencies with register renaming
Register renaming does not get rid of RAW dependencies Still need for forwarding or for indicating whether a register has received its value Register renaming gets rid of WAW and WAR dependencies The reorder buffer, as its name implies, can be used for in-order completion CSE 586 Spring 00

55 More on reorder buffer Tomasulo’s scheme can be extended with the possibility of completing instructions in order Reorder buffer entry contains (this is not the only possible solution) Type of instruction (branch, store, ALU, or load) Destination (none, memory address, register) Value and its presence/absence Replaces load/store buffers Reservation station tags and “true” register tags are now ids of entries in the reorder buffer 5/19/2019 CSE 586 Spring 00

56 Example machine revisited
Reorder buffer From memory & CDB From I-unit To memory Fp registers Reservation stations To CDB F-p units 5/19/2019 CSE 586 Spring 00

57 Need for 4 stages In Tomasulo’s solution 3 stages: issue, execute, write Now 4 stages: issue, execute, write, commit Dispatch and Issue Check for structural hazards (reservation stations busy, reorder buffer full). If one exists, stall the instruction and those following If dispatch possible, send values to reservation station if the values are available in either the registers or the reorder buffer. Otherwise send tag. Allocate an entry in the reorder buffer and send its number to the reservation station When both operands are ready, issue to functional unit 5/19/2019 CSE 586 Spring 00

58 Need for 4 stages (c’ed) Execute Write Commit
Broadcast on common data bus the value and the tag (reorder buffer number). Reservation stations, if any match the tag, and reorder buffer (always) grab the value. Commit When instr. at head of the reorder buffer has its result in the buffer it stores it in the real register (for ALU) or memory (for store). The reorder buffer entry (and/or physical register) is freed. 5/19/2019 CSE 586 Spring 00

59 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes Mul F0, F2, F yes Sub F8, F6, F yes Div F10, F0, F yes Add F6,F8,F yes Reservation Stations Initial Name Busy Fm Vj Vk Qj Qk Add yes Sub (#1) (#2) Add yes Add (#4) (#2) Add no Mul yes Mul (F4) (#2) Mul yes Div (#1) (#3) Register status F0 (#3) F2 (#2) F4 ( ) F6(#6 ) F8 (#4) F10 (#5) F12... CSE 586 Spring 00

60 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes Cycle after 2nd load has committed Div F10, F0, F yes Add F6,F8,F yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add no Add yes Add (#2) (#4) Add no Mul yes Mul (#2) (F4) Mul yes Div (#1) (#3) Register status F0 (#3) F2( ) F4 ( ) F6(#6 ) F8 (#4) F10 (#5) F12... CSE 586 Spring 00

61 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes yes Div F10, F0, F yes Cycle after sub has written its result in reorder buffer but can’t commit yet Add F6,F8,F yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Add no Add yes Add (#2) (#4) Add no Mul yes Mul (#2) (F4) Mul yes Div (#1) (#3) Still waiting for #3 to commit Register status F0 (#3) F2( ) F4 ( ) F6(#6 ) F8 (#4) F10 (#5) F12... CSE 586 Spring 00

62 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes Sub F8, F6, F yes yes yes Div F10, F0, F yes Add F6,F8,F yes yes yes Reservation Stations Cycle after add has written its result in reorder buffer but cannot commit Name Busy Fm Vj Vk Qj Qk Add no Add no Add no Mul yes Mul (#2) (F4) Mul yes Div (#1) (#3) Still waiting for #3 to commit Register status F0 (#3) F2( ) F4 ( ) F6(#6 ) F8 (#4) F10 (#5) F12... Still waiting for #3,#4, #5 to commit CSE 586 Spring 00

63 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes yes yes Sub F8, F6, F yes yes yes Div F10, F0, F yes yes Add F6,F8,F yes yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Cycle after mul has written its result and committed Add no Add no Add no Mul no Mul yes Div (#3) (#1) Still waiting for #3 to commit Register status F0 () F2( ) F4 ( ) F6(#6 ) F8 (#4) F10 (#5) F12... Still waiting for #3,#4, #5 to commit CSE 586 Spring 00

64 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes yes yes Sub F8, F6, F yes yes yes yes Div F10, F0, F yes yes Add F6,F8,F yes yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk Now #3 can commit Add no Add no Add no Mul no Mul yes Div (#3) (#1) Register status F0 () F2( ) F4 ( ) F6(#6 ) F8 () F10 (#5) F12... Still waiting for #4, #5 to commit CSE 586 Spring 00

65 Entry # Instruction Issue Execute Write result Commit
Reorder buffer Entry # Instruction Issue Execute Write result Commit Load F6, 34(r2) yes yes yes yes Load F2, 45(r3) yes yes yes yes Mul F0, F2, F yes yes yes yes Sub F8, F6, F yes yes yes yes Div F10, F0, F yes yes Add F6,F8,F yes yes yes Reservation Stations Name Busy Fm Vj Vk Qj Qk The next “interesting event is completion of div; then commit of #5, then commit of #6 Add no Add no Add no Mul no Mul yes Div (#3) (#1) Register status F0 () F2( ) F4 ( ) F6(#6 ) F8 () F10 (#5) F12... Still waiting for #4, #5 to commit CSE 586 Spring 00

66 What’s next? Branches Exceptions
Impact of multiple issue on IF, ID, EX etc. CSE 586 Spring 00


Download ppt "CSE 586 Computer Architecture Lecture 3"

Similar presentations


Ads by Google