Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.

Similar presentations


Presentation on theme: "CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding."— Presentation transcript:

1 CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding

2 Idea: use intermediate data, do not wait for result to be finally written to the destination register. Two steps: 1. Detect data hazard 2. Forward intermediate data to resolve hazard

3 Review: MIPS Pipeline Data and Control Paths Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU Shift left 2 Add Data Memory Address Write Data Read Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Control ALU cntrl RegWrite MemWriteMemRead MemtoReg RegDst ALUOp ALUSrc Branch PCSrc How many bits wide is each pipeline register? PC – 32 bits IF/ID – 64 bits ID/EX – 9 + 32x4 + 10 = 147 EX/MEM – 5 + 1 + 32x3 + 5 = 107 MEM/WB – 2 + 32x2 + 5 = 71

4 Pipelined Datapath with Control II (as before) Control signals emanate from the control portions of the pipeline registers

5 Data Forwarding Plan: allow inputs to the ALU not just from ID/EX, but also later pipeline registers, and use multiplexors and control signals to choose appropriate inputs to ALU sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Fig 6.29 Dependencies between pipelines move forward in time

6 Possible Hazard Conditions can be detected by following notations during forwarding technique Hazard conditions: 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt Eg., in the earlier example (Fig. 6-29), first hazard between sub $2, $1, $3 and and $12, $2, $5 is detected when the and is in EX stage and the sub is in MEM stage because EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a) Similar to above this time dependency between “sub” & “or” can be detected as MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 (2b) The two dependencies on “sub”-”add” are not hazard Another form of forwarding but it occurs within the reg file There is no hazard between sub-sw

7 Remarks: -We don’t have any WB hazard Why? Assume that REG. file supplies the correct result if the next instruction in the ID stage can read the register written by the current instruction in the WB stage -Whether to forward also depends on: if the later instruction is going to write a register – if not, no need to forward, even if there is register number match as in conditions above If the destination register of the later instruction is $0 – in which case there is no need to forward value ($0 is always 0 and never overwritten)

8 Forwarding Hardware To Detect hazard forwarding unit should be added by inserting Mux’es to the ALU inputs (see Fig 6.30) For forwarding just the R-type instrucutions initially sub, add, and, …. Forward A and Forward B control lines to select MUX inputs that will go into ALU Forwarding unit will be in EX stage because the ALU forwarding MUX’es are found in that stage

9 Forwarding Hardware FIGURE 6.31 The control values for the forwarding multiplexors in Figure 6.30. The signed immediate that is another input to the ALU is described in the Elaboration at the end of this section.

10 Data Forwarding (Bypassing) Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle For ALU functional unit: the inputs can come from any pipeline register rather than just from ID/EX by adding multiplexors to the inputs of the ALU connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX’s stage Rs and Rt ALU mux inputs adding the proper control hardware to control the new muxes Other functional units may need similar forwarding logic (e.g., the DM) With forwarding can achieve a CPI of 1 even in the presence of data dependencies

11 Forwarding Hardware Datapath before adding forwarding hardware Datapath after adding forwarding hardware FIGURE 6.30 On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit. The new hardware is shown in color. This figure is a stylized drawing, how ever, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX. RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal. As in the earlier discussion, this ignores forwarding of a store value to a store instruction. Also note that this mechanism works for slt instructions as well.

12 Elaboration on Forwarding Hardware FIGURE 6.33 The datapath modified to resolve hazards via forwarding. Compared with the datapath in Figure 6.30, the additions are the multiplexors to the inputs to the ALU. This figure is a more stylized drawing, however, leaving out details from the full datapath, such as the branch hardware and the sign extension hardware.

13 stall Review: One Way to “Fix” a Data Hazard I n s t r. O r d e r add $1, ALU IM Reg DMReg sub $4,$1,$5 and $6,$7,$1 ALU IM Reg DMReg ALU IM Reg DMReg Fix data hazard by waiting – stall – but impacts CPI

14 Review: Another Way to “Fix” a Data Hazard I n s t r. O r d e r add $1, ALU IM Reg DMReg sub $4,$1,$5 and $6,$7,$1 ALU IM Reg DMReg ALU IM Reg DMReg Fix data hazards by forwarding results as soon as they are available to where they are needed sw $4,4($1) or $8,$1,$1 ALU IM Reg DMReg ALU IM Reg DMReg Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.

15 Data Forwarding Control Conditions 1. EX/MEM hazard: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 Forwards the result from the previous instr. to either input of the ALU Forwards the result from the second previous instr. to either input of the ALU 2. MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

16 Forwarding Illustration I n s t r. O r d e r add $1, sub $4,$1,$5 and $6,$7,$1 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg EX/MEM hazard forwarding MEM/WB hazard forwarding Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.

17

18 Yet Another Complication! I n s t r. O r d e r add $1,$1,$2 ALU IM Reg DMReg add $1,$1,$3 add $1,$1,$4 ALU IM Reg DMReg ALU IM Reg DMReg Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction – which should be forwarded?

19 Yet Another Complication! I n s t r. O r d e r add $1,$1,$2 ALU IM Reg DMReg add $1,$1,$3 add $1,$1,$4 ALU IM Reg DMReg ALU IM Reg DMReg Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction – which should be forwarded? Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded. Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.

20 Corrected Data Forwarding Control Conditions 2. MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01

21 Datapath with Forwarding Hardware PCSrc Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU Shift left 2 Add Data Memory Address Write Data Read Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Control ALU cntrl Branch Forward Unit ID/EX.RegisterRt ID/EX.RegisterRs EX/MEM.RegisterRd MEM/WB.RegisterRd MEM/WB.RegWrite EX/MEM.RegWrite

22 Forwarding Hardware with Control Datapath with forwarding hardware and control wires – certain details, e.g., branching hardware, are omitted to simplify the drawing Note: so far we have only handled forwarding to R-type instructions…! Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!

23 Example Consider the following code sequence in which the dependencies have been highlighted sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 We’ll try to keep the example short. — We’ll skip the first two cycles, since they’re the same as before.

24 Example: Forwarding sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Execution example: Clock cycle 3 Clock cycle 4

25 Execution example (cont.): sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Clock cycle 5 Clock cycle 6 Example: Forwarding

26 Memory-to-Memory Copies I n s t r. O r d e r lw $1,4($2) ALU IM Reg DMReg sw $1,4($3) ALU IM Reg DMReg For loads immediately followed by stores (memory-to- memory copies) can avoid a stall by adding forwarding hardware from the MEM/WB register to the data memory input. Would need to add a Forward Unit and a mux to the memory access stage What if lw was replaced with add $1, - is forwarding still needed? From where, to where? What if $1 was used to compute the effective address (it would be a load-use data hazard and would require a stall insertion between the lw and sw)

27

28

29

30

31 Load-use Hazard Detection Unit Forwarding is not the solution for all data hazard conditions, For ex. When an instruction tries to read a register following a “lw”, that writes the same register  Problem will occur In clk 4 the correct value of reg 2 is not available at the beginning Therefore, In addition to FU, we need hazard detection unit (HDU) is to be operated in during ID stage HDU will insert stall between the “load” and its use

32 Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register therefore, we need a hazard detection unit to stall the pipeline after the load instruction Data Hazards and Stalls lw $2, 20($1) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 Slt $1, $6, $7 As even a pipeline dependency goes backward in time forwarding will not solve the hazard

33 Stalling Resolves a Hazard Same instruction sequence as before for which forwarding by itself could not resolve the hazard: lw $2, 20($1) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 Slt $1, $6, $7 Hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward so then the forwarding unit can handle them and there are no more hazards AND instruction is turned into NOP all instructions beginning with AND instruction are delayed one cycle. Hazard forces the AND and OR instructions to repeat in clock cycle 4, what they did in clock cycle 3

34 Example: Forwarding with Load-use Data Hazards I n s t r. O r d e r lw $1,4($2) and $6,$1,$7 xor $4,$1,$5 or $8,$1,$9 ALU IM Reg DMReg ALU IM Reg DM ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg sub $4,$1,$5

35 stall Forwarding with Load-use Data Hazards I n s t r. O r d e r lw $1,4($2) sub $4,$1,$5 and $6,$1,$7 xor $4,$1,$5 or $8,$1,$9 ALU IM Reg DMReg ALU IM Reg DM ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg sub $4,$1,$5 and $6,$1,$7 xor $4,$1,$5 or $8,$1,$9 The one case where forwarding cannot save anything when an instruction tries to read a register following a load instruction that writes the same register.

36 Load-use Hazard Detection Unit Need a Hazard detection Unit in the ID stage that inserts a stall between the load and its use  The first line tests to see if the instruction now in the EX stage is a lw ; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction)  After this one cycle stall, the forwarding logic can handle the remaining data hazards Hazard detection unit implements the following check if to stall if ( ID/EX.MemRead // if the instruction in the EX stage is a load… and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs ) // and the destination register or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register // of the instruction in the ID stage, then… stall the pipeline

37 Stall Hardware Along with the Hazard Unit, we have to implement the stall Prevent the instructions in the IF and ID stages from progressing down the pipeline – done by preventing the PC register and the IF/ID pipeline register from changing Hazard detection Unit controls the writing of the PC ( PC.write ) and IF/ID ( IF/ID.write ) registers Insert a “bubble” between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage) (i.e., insert a noop in the execution stream) Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 ( noop ). The Hazard Unit controls the mux that chooses between the real control values and the 0’s. Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline

38 Adding the Hazard Hardware Read Address Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU Shift left 2 Add Data Memory Address Write Data Read Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Control ALU cntrl Branch PCSrc Forward Unit Hazard Unit 0 1 ID/EX.RegisterRt 0 ID/EX.MemRead

39 Mechanics of Stalling If the check to stall verifies, then the pipeline needs to stall only 1 clock cycle after the load as after that the forwarding unit can resolve the dependency What the hardware does to stall the pipeline 1 cycle: does not let the IF/ID register change (disable write!) – this will cause the instruction in the ID stage to repeat, i.e., stall therefore, the instruction, just behind, in the IF stage must be stalled as well – so hardware does not let the PC change (disable write!) – this will cause the instruction in the IF stage to repeat, i.e., stall changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop – a bubble is said to have been inserted into the pipeline note that we cannot turn that instruction into an nop by 0ing all the bits in the instruction itself – recall nop = 00…0 (32 bits) – because it has already been decoded and control signals generated

40 Pipelined Datapath with Control II (as before) Control signals emanate from the control portions of the pipeline registers

41 Hazard Detection Unit + Forwarding Unit Datapath with forwarding hardware, the hazard detection unit and controls wires – certain details, e.g., branching hardware are omitted to simplify the drawing

42 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2

43 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2

44 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2

45 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2

46 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2

47 Stalling Execution example: lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2


Download ppt "CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding."

Similar presentations


Ads by Google