1 Lecture 7: Static ILP and branch prediction Topics: static speculation and branch prediction (Appendix G, Section 2.3)

2 Support for Speculation
In general, when we re-order instructions, register renaming can ensure we do not violate register data dependences.
However, we need hardware support:
 - to ensure that an exception is raised at the correct point
 - to ensure that we do not violate memory dependences
[Figure: instruction sequence st, br, ld, with the ld a candidate for hoisting above the br and st]
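A minimal C sketch of the kind of reordering involved (my own example, not from the slides): the load is hoisted above the branch and the store, which is only safe if hardware or extra check instructions can catch an exception that now fires too early, or a store that writes the load's address.

  /* Original: the load runs only if the branch falls through,
     and only after the store has completed.                        */
  void original(int *p, int *q, int cond, int *out) {
      *p = 42;               /* st */
      if (cond) return;      /* br */
      *out = *q;             /* ld */
  }

  /* Speculated: the ld is hoisted above the br and the st.
     If q aliases p, or if *q faults on a path where cond is true,
     the hoisted load is wrong or traps too early; hence the
     hardware support discussed on the next slides.                 */
  void speculated(int *p, int *q, int cond, int *out) {
      int tmp = *q;          /* speculative ld */
      *p = 42;               /* st */
      if (cond) return;      /* br */
      *out = tmp;
  }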

3 Detecting Exceptions
Some exceptions require that the program be terminated (memory protection violation), while other exceptions require execution to resume (page faults).
For a speculative instruction, in the latter case, servicing the exception only implies potential performance loss.
In the former case, you want to defer servicing the exception until you are sure the instruction is not speculative.
Note that a speculative instruction needs a special opcode to indicate that it is speculative.

4 Program-Terminate Exceptions
When a speculative instruction experiences an exception, instead of servicing it, it writes a special NotAThing value (NAT) in the destination register.
If a non-speculative instruction reads a NAT, it flags the exception and the program terminates (a drawback: the error may be caused by an array access, yet the core dump happens two procedures later).
Alternatively, an instruction (the sentinel) in the speculative instruction’s original location checks the register value and initiates recovery.
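A toy C sketch of the NAT idea (my own illustration; the real mechanism is a per-register NAT bit in IA-64 hardware): a faulting speculative load poisons its destination instead of trapping, and the exception is raised only when a non-speculative consumer, or a check instruction, reads the poisoned value.

  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* One architectural register extended with a NAT (NotAThing) bit. */
  typedef struct { long value; bool nat; } reg_t;

  /* Speculative load: on a fault, set NAT instead of trapping.      */
  reg_t speculative_load(const long *addr) {
      reg_t r = { 0, false };
      if (addr == NULL)            /* stand-in for a protection fault */
          r.nat = true;            /* defer the exception             */
      else
          r.value = *addr;
      return r;
  }

  /* Non-speculative use (or the sentinel check): only now is the
     deferred exception actually serviced.                           */
  long consume(reg_t r) {
      if (r.nat) {
          fprintf(stderr, "deferred exception raised\n");
          exit(1);
      }
      return r.value;
  }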

5 Memory Dependence Detection
If a load is moved before a preceding store, we must ensure that the store writes to a non-conflicting address; otherwise, the load has to re-execute.
When the speculative load issues, it stores its address in a table (the Advanced Load Address Table, or ALAT, in the IA-64).
If a store finds its address in the ALAT, it indicates that a violation occurred for that address.
A special instruction (the sentinel) in the load’s original location checks whether the address had a violation and re-executes the load if necessary.
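A simplified C model of the ALAT interplay (my own sketch; the real ALAT is a small hardware table with its own entry format and replacement policy): the advanced load records its address, every store checks the table, and the check at the load's original location re-executes the load if its entry was knocked out.

  #include <stdbool.h>
  #include <stdint.h>

  #define ALAT_ENTRIES 8                 /* toy size, for illustration */

  static const long *alat_addr[ALAT_ENTRIES];
  static bool        alat_valid[ALAT_ENTRIES];

  static unsigned slot(const long *addr) {
      return (unsigned)(((uintptr_t)addr >> 3) % ALAT_ENTRIES);
  }

  /* Advanced (speculative) load: remember the address it read from.  */
  long advanced_load(const long *addr) {
      unsigned i = slot(addr);
      alat_addr[i] = addr;
      alat_valid[i] = true;
      return *addr;
  }

  /* Every store looks up the ALAT; a matching entry is invalidated,
     marking the earlier speculative load as violated.                */
  void checked_store(long *addr, long v) {
      unsigned i = slot(addr);
      if (alat_valid[i] && alat_addr[i] == addr)
          alat_valid[i] = false;
      *addr = v;
  }

  /* Check at the load's original location: if the entry is gone,
     the speculation failed and the load re-executes.                 */
  long check_load(const long *addr, long speculative_value) {
      unsigned i = slot(addr);
      if (alat_valid[i] && alat_addr[i] == addr)
          return speculative_value;      /* speculation succeeded      */
      return *addr;                      /* violation: redo the load   */
  }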

6 Dynamic Vs. Static ILP
Static ILP:
 + The compiler finds parallelism → no scoreboarding → higher clock speeds and lower power
 + The compiler knows what is next → better global schedule
 - The compiler cannot react to dynamic events (cache misses)
 - Cannot re-order instructions unless you provide hardware and extra instructions to detect violations (eats into the low complexity/power argument)
 - Static branch prediction is poor → even statically scheduled processors use hardware branch predictors
 - Building an optimizing compiler is easier said than done
A comparison of the Alpha, Pentium 4, and Itanium (the statically scheduled IA-64 architecture) shows that the Itanium is not much better in terms of performance, clock speed, or power.

7 Control Hazards
In the 5-stage in-order processor: assume always taken or always not taken; if the branch goes the other way, squash the mis-fetched instructions (for now, forget about branch delay slots).
Modern in-order and out-of-order processors: dynamic branch prediction; instead of a default not-taken assumption, either predict not-taken, or predict taken-to-X, or predict taken-to-Y.
Branch predictor: a cache of recent branch outcomes.

8 Pipeline without Branch Predictor
[Figure: pipeline front end; IF selects the next PC between PC + 4 and Br-target, and the branch outcome is determined in the Reg Read / Compare stage.]
In the 5-stage pipeline, a branch completes in two cycles:
 → If the branch went the wrong way, one incorrect instr is fetched
 → One stall cycle per incorrect branch

9 Pipeline with Branch Predictor
[Figure: the same pipeline, with a Branch Predictor added to guide IF's choice of the next PC.]
In the 5-stage pipeline, a branch completes in two cycles:
 → If the branch went the wrong way, one incorrect instr is fetched
 → One stall cycle per incorrect branch

10 Branch Mispredict Penalty
Assume: no data or structural hazards; only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90%.
Slowdown = 1 / (1 + stalls per instruction)
Stalls per instruction = %branches x %mispredicts x penalty = 20% x 10% x 1 = 0.02
Slowdown = 1/1.02; if the penalty is 20 cycles, the slowdown is 1/1.4
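The arithmetic above, checked with a small C program (the function and parameter names are mine):

  #include <stdio.h>

  /* Relative performance = ideal CPI / actual CPI = 1 / (1 + stall CPI). */
  static double relative_perf(double branch_frac, double mispredict_rate,
                              double penalty_cycles) {
      double stalls_per_instr = branch_frac * mispredict_rate * penalty_cycles;
      return 1.0 / (1.0 + stalls_per_instr);
  }

  int main(void) {
      /* 20% branches, 10% mispredicted, 1-cycle penalty -> 1/1.02 */
      printf("%.4f\n", relative_perf(0.20, 0.10, 1.0));
      /* same rates, 20-cycle penalty                    -> 1/1.4  */
      printf("%.4f\n", relative_perf(0.20, 0.10, 20.0));
      return 0;
  }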

11 1-Bit Prediction
For each branch, keep track of what happened last time and use that outcome as the prediction.
What are the prediction accuracies for branches 1 and 2 below?
  while (1) {
    for (i = 0; i < 10; i++) {   /* branch-1 */
      ...
    }
    for (j = 0; j < 20; j++) {   /* branch-2 */
      ...
    }
  }
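A small C simulation of the question (my own sketch, modeling each branch as the loop-back branch at the bottom of its for loop): a 1-bit predictor mispredicts the first and last iteration of every pass, so branch-1 comes out near 80% and branch-2 near 90%.

  #include <stdio.h>

  /* 1-bit predictor: remember only the last outcome of the branch.
     Each branch is the loop-back branch of its for loop: taken on
     every iteration except the last one of each pass.               */
  static void simulate(const char *name, int iters, int passes) {
      int last = 0;                    /* 0 = not taken, 1 = taken   */
      int correct = 0, total = 0;
      for (int p = 0; p < passes; p++) {
          for (int i = 1; i <= iters; i++) {
              int actual = (i < iters);          /* taken until exit */
              if (actual == last) correct++;
              last = actual;                     /* 1-bit update     */
              total++;
          }
      }
      printf("%s: %.1f%% accuracy\n", name, 100.0 * correct / total);
  }

  int main(void) {
      simulate("branch-1", 10, 1000);            /* ~80% */
      simulate("branch-2", 20, 1000);            /* ~90% */
      return 0;
  }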

12 2-Bit Prediction
For each branch, maintain a 2-bit saturating counter:
  if the branch is taken: counter = min(3, counter + 1)
  if the branch is not taken: counter = max(0, counter - 1)
  if (counter >= 2), predict taken; else predict not taken
Advantage: a few atypical branches will not influence the prediction (a better measure of “the common case”).
Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor).
Can easily be extended to N bits (in most processors, N = 2).
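A compact C sketch of a bimodal predictor built from these 2-bit counters, indexed by low-order PC bits (the table size and index function here are illustrative choices, not from the slides):

  #include <stdbool.h>
  #include <stdint.h>

  #define PRED_ENTRIES 1024                /* illustrative table size */

  static uint8_t counters[PRED_ENTRIES];   /* each counter holds 0..3 */

  static unsigned pred_index(uint32_t branch_pc) {
      /* drop the instruction-offset bits, keep low-order index bits  */
      return (branch_pc >> 2) & (PRED_ENTRIES - 1);
  }

  /* Predict taken when the counter is in the upper half (2 or 3).    */
  bool predict(uint32_t branch_pc) {
      return counters[pred_index(branch_pc)] >= 2;
  }

  /* Saturating update with the actual outcome.                       */
  void update(uint32_t branch_pc, bool taken) {
      uint8_t *c = &counters[pred_index(branch_pc)];
      if (taken) {
          if (*c < 3) (*c)++;              /* counter = min(3, c + 1) */
      } else {
          if (*c > 0) (*c)--;              /* counter = max(0, c - 1) */
      }
  }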
