Delayed Branching Explained Winter, 2005 Print a copy of these and handout, not the old one.


1 Delayed Branching Explained Winter, 2005 Print a copy of these and handout, not the old one

2 Chapter Example: Impact of Basic Static Branch Policies in Deeply Pipelined Processors
Notes: Assume that the target address is calculated at the end of the 3rd stage and that branches are resolved at the end of the 4th stage.
Pipeline stages: Since the processor has a deeper pipeline, we can name the first few stages, purely for illustration, IF, ID, TA (Target Address Calculation), BC (Branch Condition Evaluation), etc.

3 Exercise A3 Pipeline Example
Notes: The target address is calculated at the end of the ID stage and branches are resolved at the end of the EXE stage.
Assumption: The processor has no static branch penalty reduction technique (e.g., delayed branching, predict taken). Note that the processor's strategy is to stall until the branch condition (BC) is evaluated.

Case: Jump
  J label       IF     ID     EXE    MEM    WB
  Succ                 IF     flush
  Target                      IF     ID     EXE

Case: Branch Not Taken
  Branch label  IF     ID     EXE    MEM    WB
  Succ                 IF     stall  ID     EXE    MEM
  Succ+1                             IF     ID     EXE

Case: Branch Taken
  Branch label  IF     ID     EXE    MEM    WB
  Succ                 IF     flush
  Target                             IF     ID

4 Example Contd. (2). Case: Always Flush

Case: Jump
  J Instr.      IF     ID     TA     BC     ….
  Succ                 IF     flush
  Succ+1                      IF     flush
  Target                             IF     ID     EXE

Case: Branch Not Taken (3 instructions flushed)
  Branch Instr. IF     ID     TA     BC
  Succ                 IF     flush  flush
  Succ+1                      IF     flush  flush
  Succ+2                             IF     flush  flush
  Target                                    IF     ID     EXE    ….     …..

Case: Branch Taken (same 3 instructions flushed)
  Branch Instr. IF     ID     TA     BC
  Succ                 IF     flush  flush
  Succ+1                      IF     flush  flush
  Succ+2                             IF     flush  flush
  Target                                    IF     ID     EXE    ….     …..

5 Example Contd. (3). Case: Predict Taken

Case: Jump
  J Instr.              IF     ID     TA     BC     ….
  Succ                         IF     flush
  Succ+1                              IF     flush
  Target                                     IF     ID     EXE

Case: Branch Not Taken (3 instructions flushed)
  Branch Instr.         IF     ID     TA     BC
  Succ                         IF     flush  …..
  Succ+1                              IF     flush  …..
  Target                                     IF     flush  (wrong target fetched)
  Succ (Inst. Recall)                               IF     ID     EXE    ….     …..

Case: Branch Taken (2 instructions flushed)
  Branch Instr.         IF     ID     TA     BC     …
  Succ                         IF     flush
  Succ+1                              IF     flush
  Target                                     IF     ID     EXE    ….     …..

6 Example Contd. (4). Case: Predict Not Taken

Case: Jump (2 stalls)
  J Instr.      IF     ID     TA     BC     ….
  Succ                 IF     flush
  Succ+1                      IF     flush
  Target                             IF     ID     EXE

Case: Branch Not Taken (no flush or stall)
  Branch Instr. IF     ID     TA     BC
  Succ                 IF     ID     EXE    ….
  Succ+1                      IF     ID
  Succ+2                             IF

Case: Branch Taken (3 instructions flushed)
  Branch Instr. IF     ID     TA     BC     …
  Succ                 IF     ID     EXE    flush
  Succ+1                      IF     ID     flush
  Succ+2                             IF     flush
  Target                                    IF     ID     EXE    ….     …..

7 Summary: Simple Static Techniques
Penalty (stalls or flushed instructions) in each case:

Policy              Jump   Branch actually taken   Branch actually not taken
Always Flush        2      3                       3
Predict Taken       2      2                       3
Predict Not Taken   2      3                       0
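
The summary table can be turned into a rough cost estimate. The following is a minimal Python sketch, not part of the original slides: the penalty numbers are copied from the table, while the instruction mix (`mix`) and the helper name `average_penalty` are hypothetical and chosen only for illustration. It assumes a base CPI of 1 with no other hazards.

```python
# Penalty cycles per policy, copied from the summary table above.
PENALTY = {
    "always_flush":      {"jump": 2, "taken": 3, "not_taken": 3},
    "predict_taken":     {"jump": 2, "taken": 2, "not_taken": 3},
    "predict_not_taken": {"jump": 2, "taken": 3, "not_taken": 0},
}


def average_penalty(policy, mix):
    """Expected penalty cycles per instruction.

    `mix` maps "jump", "taken" and "not_taken" to the fraction of all
    instructions of that kind (hypothetical values, not from the slides).
    """
    return sum(PENALTY[policy][kind] * frac for kind, frac in mix.items())


# Assumed mix: 2% jumps, 10% taken branches, 5% not-taken branches.
mix = {"jump": 0.02, "taken": 0.10, "not_taken": 0.05}

for policy in PENALTY:
    # Assumed base CPI of 1; only control-hazard penalties are added.
    print(f"{policy:18s} CPI = {1 + average_penalty(policy, mix):.2f}")
```

With this assumed mix, predict-not-taken comes out cheapest only because not-taken branches cost nothing under it; a mix dominated by taken branches would favour predict-taken instead.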

8 Study of MIPS 5-stage Integer Pipeline: Simple Array Copy program

9 MIPS AGGRESSIVE BRANCHING CASE (Normal Behavior of Branch in Array Copy Program)

If taken:
  BNEZ   IF       ID (BC)  EXE      MEM      WB
  AND             IF (S)   flush
  L.D                      IF (T)   ID (T)   EXE (T)  MEM      WB

If not taken:
  BNEZ   IF       ID (BC)  EXE      MEM      WB
  AND             IF (S)   ID (S)   EXE (S)  MEM (S)  WB
  OR                       IF (S2)  ID (S2)  EXE (S2) MEM (S2) WB (S2)

10 Delayed Branching: One instruction (the delay slot) always executes between the branch and its target, hence the "delay".

Foo: L.D    F0,0(R4)
     S.D    F0,0(R5)      ; suppose R5 = 2000 initially
     DADD   R1,R1,#8
     DSLTUI R2,R1,#80     ; moved up here so that R2 is set in time for the later BNEZ
     DADD   R4,R4,#8
     BNEZ   R2,Foo
     DADD   R5,R5,#8      ; delay slot: always executes with the branch instruction
     AND    ……

Question: Is R5 = 80 + 2000 after the loop completes?
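
The question on this slide can be checked by stepping through the loop by hand or with a short simulation. The Python sketch below is an illustration rather than part of the slides; it assumes R1 = 0 and R4 = 1000 on entry (the slide only gives R5 = 2000), does not model memory, and uses the hypothetical function name `run_delayed_loop`.

```python
def run_delayed_loop(r1=0, r4=1000, r5=2000):
    """Simulate the Foo loop with an always-executed delay slot.

    r1 = 0 and r4 = 1000 are assumed starting values; only R5 = 2000
    comes from the slide. Returns (iterations, r1, r4, r5).
    """
    iterations = 0
    while True:
        iterations += 1
        # L.D / S.D copy one element; memory is not modelled here.
        r1 += 8                     # DADD   R1,R1,#8
        r2 = 1 if r1 < 80 else 0    # DSLTUI R2,R1,#80
        r4 += 8                     # DADD   R4,R4,#8
        taken = r2 != 0             # BNEZ   R2,Foo
        r5 += 8                     # delay slot: runs whether or not taken
        if not taken:
            return iterations, r1, r4, r5


print(run_delayed_loop())  # (10, 80, 1080, 2080)
```

Under these assumptions the loop runs 10 times and the delay slot executes on every pass, including the final not-taken one, so R5 ends at 2000 + 80 = 2080.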

11 Cancelled Branching: The instruction in the delay slot is NOT executed if the branch is not taken (MIPS BEQL, Branch if Equal Likely, is an example).

Foo: L.D    F0,0(R4)
     S.D    F0,0(R5)      ; suppose R5 = 2000 initially
     DADD   R1,R1,#8
     DSLTUI R2,R1,#80
     DADD   R4,R4,#8
     BGTZ   R2,Foo
     DADD   R5,R5,#8      ; delay slot: executes only if the branch is taken
     AND    ……

Question: R4, R5 = ? after the loop completes?
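
To see how cancelling changes the final register values, the loop can be simulated with the delay slot skipped whenever the branch falls through. This Python sketch is illustrative only; it assumes R1 = 0 and R4 = 1000 on entry (not stated on the slide), ignores memory, and the name `run_cancelling_loop` is hypothetical.

```python
def run_cancelling_loop(r1=0, r4=1000, r5=2000):
    """Simulate the Foo loop with a cancelling (branch-likely style) branch.

    r1 = 0 and r4 = 1000 are assumed starting values; only R5 = 2000
    comes from the slide. Returns (iterations, r4, r5).
    """
    iterations = 0
    while True:
        iterations += 1
        # L.D / S.D copy one element; memory is not modelled here.
        r1 += 8                     # DADD   R1,R1,#8
        r2 = 1 if r1 < 80 else 0    # DSLTUI R2,R1,#80
        r4 += 8                     # DADD   R4,R4,#8
        taken = r2 > 0              # BGTZ   R2,Foo
        if not taken:
            # Delay slot is cancelled on the final, not-taken pass.
            return iterations, r4, r5
        r5 += 8                     # delay slot: runs only when taken


print(run_cancelling_loop())  # (10, 1080, 2072)
```

Under these assumptions R4 advances by 80 as before, but R5 is incremented only on the 9 taken passes and ends at 2072, one element short of the delayed-branch version.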

12 Comparison
Normal: 10 stalls for 10 iterations of the loop.
Delayed Branching: no stalls.
Cancelled Branching: DADD R5,R5,#8 is flushed in the last iteration, hence 1 stall.

Normal                    Delayed Branching         Cancelled Branching
Foo: L.D    F0,0(R4)      Foo: L.D    F0,0(R4)      Foo: L.D    F0,0(R4)
     S.D    F0,0(R5)           S.D    F0,0(R5)           S.D    F0,0(R5)
     DADD   R4,R4,#8           DADD   R1,R1,#8           DADD   R1,R1,#8
     DADD   R5,R5,#8           DSLTUI R2,R1,#80          DSLTUI R2,R1,#80
     DADD   R1,R1,#8           DADD   R4,R4,#8           DADD   R4,R4,#8
     DSLTUI R2,R1,#80          BNEZ   R2,Foo             BNEZ   R2,Foo
     BNEZ   R2,Foo             DADD   R5,R5,#8           DADD   R5,R5,#8
     AND    ……                 ……                        ……

Instruction moved to the delay slot: DADD R5,R5,#8 (second and third columns).

