Lecture 4.5 Pipelines – Control Hazards Topics Control Hazards Branch Prediction Misprediction stalls Readings: Appendix C September 2, 2015 CSCE 513 Computer.

Presentation transcript:

Lecture 4.5 Pipelines – Control Hazards Topics Control Hazards Branch Prediction Misprediction stalls Readings: Appendix C September 2, 2015 CSCE 513 Computer Architecture

– 2 – CSCE 513 Fall 2015 Overview
Last Time: Review of single-cycle design; 5-stage pipeline; Lecture 3 slides 1-20
New: Slides of Lecture 3; IEEE 754 floating point; normal pipeline operations (the ideal world); hazards; data hazards: RAW, WAR, WAW, forwarding, load-use; control hazards; performance with stalls
References: Appendix C

– 3 – CSCE 513 Fall 2015 Review
A simple implementation of the MIPS pipeline, pages C-31 through C-33
5 stages – specify the register transfers in each stage
1. Instruction fetch (IF or F)
2. Instruction decode/register fetch (ID or D)
3. Execute
4. Memory
5. Write back – store results into the register indicated (rd or rt)

– 4 – CSCE 513 Fall 2015 Load-Use Hazard Must stall even with Full Forwarding

– 5 – CSCE 513 Fall 2015 Control Hazards – basics review
Loop: 800   LD      F2, 0(R1)     -- top of loop
      ….
      1000  BNEZ    R1, loop
      1004  LD      F4, 0(R3)
      1008  DADDIU  R3, R3, #4
      100C  SD      F2, #-4(R3)
      1010  …
Branch prediction – guess which way to go

– 6 – CSCE 513 Fall 2015 Branches predicted not taken correctly
• BNEZ R1, loop
• When the branch reaches the execute stage, R1 == 0, so BNEZ falls through
• Predict branch not taken and it is not taken
• No stalls. What a wonderful world!

– 7 – CSCE 513 Fall 2015 Figure C-22 revisited
• Branch target and branch taken
• What are they?
• When are they calculated?
• Where are they used from?

– 8 – CSCE 513 Fall 2015 Branches predicted correctly and not, assuming improved hardware
• BNEZ R1, loop
• Predict branch not taken and, whoops, it's taken
• The condition turns out to say we should take the branch
• i+1, i+2, i+3 were wrong
• Turn them into "NOPs" (no operations: instructions that do nothing)
• When does the pipeline find out?

Instr
BNEZ            F  D  E  M  W
i+1                F  D  E  M  W
i+2                   F  D  E  M  W
i+3                      F  D  E  M  W
Branch target               F  D  E  M  W

– 9 – CSCE 513 Fall 2015 Delays for Mispredicted Branches
• Figure C-22 revisited yet again

– 10 – CSCE 513 Fall 2015 Figure C.28 Avoiding some Branch Stalls Copyright © 2011, Elsevier Inc. All rights Reserved.

– 11 – CSCE 513 Fall 2015 Branches predicted correctly and not, assuming improved hardware (Fig C-12)
• BNEZ R1, loop
• Predict branch not taken and it's not taken
• Predict branch not taken and, whoops, it's taken
• The condition turns out to say we should take the branch

– 12 – CSCE 513 Fall 2015 Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.28 The stall from branch hazards can be reduced by moving the zero test and branch-target calculation into the ID phase of the pipeline. Notice that we have made two important changes, each of which removes 1 cycle from the 3-cycle stall for branches. The first change is to move both the branch-target address calculation and the branch condition decision to the ID cycle. The second change is to write the PC of the instruction in the IF phase, using either the branch-target address computed during ID or the incremented PC computed during IF. In comparison, Figure C.22 obtained the branch-target address from the EX/MEM register and wrote the result during the MEM clock cycle. As mentioned in Figure C.22, the PC can be thought of as a pipeline register (e.g., as part of ID/IF), which is written with the address of the next instruction at the end of each IF cycle. Returns; Unconditional branches

– 13 – CSCE 513 Fall 2015 Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.14 Scheduling the branch delay slot. The top box in each pair shows the code before scheduling; the bottom box shows the scheduled code. In (a), the delay slot is scheduled with an independent instruction from before the branch. This is the best choice. Strategies (b) and (c) are used when (a) is not possible. In the code sequences for (b) and (c), the use of R1 in the branch condition prevents the DADD instruction (whose destination is R1) from being moved after the branch. In (b), the branch delay slot is scheduled from the target of the branch; usually the target instruction will need to be copied because it can be reached by another path. Strategy (b) is preferred when the branch is taken with high probability, such as a loop branch. Finally, the branch may be scheduled from the not-taken fall-through as in (c). To make this optimization legal for (b) or (c), it must be OK to execute the moved instruction when the branch goes in the unexpected direction. By OK we mean that the work is wasted, but the program will still execute correctly. This is the case, for example, in (c) if R7 were an unused temporary register when the branch goes in the unexpected direction. Branch Delay slots Dumb hardware; smart compiler - scheduling

– 14 – CSCE 513 Fall 2015 Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.17 Misprediction rate on SPEC92 for a profile-based predictor varies widely but is generally better for the floating-point programs, which have an average misprediction rate of 9% with a standard deviation of 4%, than for the integer programs, which have an average misprediction rate of 15% with a standard deviation of 5%. The actual performance depends on both the prediction accuracy and the branch frequency, which vary from 3% to 24%.
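
The caption's point that performance depends on both prediction accuracy and branch frequency can be made concrete with a rough CPI estimate. This is a minimal sketch, not a formula taken from the slides: it assumes a base CPI of 1, that only mispredicted branches stall, and a 1-cycle misprediction penalty (the function name and parameters are hypothetical). Pairing the 3% branch frequency with the floating-point rate and the 24% frequency with the integer rate is also an assumption made purely for illustration.

```python
def cpi_with_branch_stalls(branch_freq, mispredict_rate, penalty_cycles, base_cpi=1.0):
    """Estimate CPI when only mispredicted branches cost extra cycles.

    Assumed model: CPI = base_cpi + branch_freq * mispredict_rate * penalty_cycles
    """
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles


# Average misprediction rates from the Figure C.17 caption: 9% (FP), 15% (integer).
# Branch frequencies 3% and 24% are the ends of the range quoted in the caption;
# which program type sits at which end is an assumption for this example.
fp_cpi = cpi_with_branch_stalls(branch_freq=0.03, mispredict_rate=0.09, penalty_cycles=1)
int_cpi = cpi_with_branch_stalls(branch_freq=0.24, mispredict_rate=0.15, penalty_cycles=1)

print(f"FP-like program:      CPI = {fp_cpi:.4f}")
print(f"integer-like program: CPI = {int_cpi:.4f}")
```

Under these assumptions the branch-heavy, harder-to-predict program loses over ten times as many cycles per instruction to branches as the other, even though its misprediction rate is less than twice as high.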

– 15 – CSCE 513 Fall 2015 Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.19 Prediction accuracy of a 4096-entry 2-bit prediction buffer for the SPEC89 benchmarks. The misprediction rate for the integer benchmarks (gcc, espresso, eqntott, and li) is substantially higher (average of 11%) than that for the floating-point programs (average of 4%). Omitting the floating-point kernels (nasa7, matrix300, and tomcatv) still yields a higher accuracy for the FP benchmarks than for the integer benchmarks. These data, as well as the rest of the data in this section, are taken from a branch-prediction study done using the IBM Power architecture and optimized code for that system. See Pan, So, and Rameh [1992]. Although these data are for an older version of a subset of the SPEC benchmarks, the newer benchmarks are larger and would show slightly worse behavior, especially for the integer benchmarks.

– 16 – CSCE 513 Fall 2015 Copyright © 2011, Elsevier Inc. All rights Reserved. Figure C.18 The states in a 2-bit prediction scheme. By using 2 bits rather than 1, a branch that strongly favors taken or not taken, as many branches do, will be mispredicted less often than with a 1-bit predictor. The 2 bits are used to encode the four states in the system. The 2-bit scheme is actually a specialization of a more general scheme that has an n-bit saturating counter for each entry in the prediction buffer. With an n-bit counter, the counter can take on values between 0 and $2^n - 1$: When the counter is greater than or equal to one-half of its maximum value ($2^n - 1$), the branch is predicted as taken; otherwise, it is predicted as untaken. Studies of n-bit predictors have shown that the 2-bit predictors do almost as well, thus most systems rely on 2-bit branch predictors rather than the more general n-bit predictors.
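
The n-bit saturating counter described in the caption can be sketched in a few lines. This is an illustrative sketch only: the `simulate` function, its starting counter value of 0, and the branch trace (a loop branch taken 9 times and then not taken, repeated 10 times) are all assumptions, not data from the lecture. The counter increments on a taken branch and decrements on a not-taken one, and predicts taken when its value is at least half of its maximum.

```python
def simulate(outcomes, n_bits, start=0):
    """Count mispredictions of a single n-bit saturating counter over a branch trace."""
    max_value = 2 ** n_bits - 1
    counter = start
    mispredictions = 0
    for taken in outcomes:
        # Predict taken when the counter is at least half of its maximum value.
        predicted_taken = counter >= max_value / 2
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: never leave the range 0..max_value.
        if taken:
            counter = min(counter + 1, max_value)
        else:
            counter = max(counter - 1, 0)
    return mispredictions


# Hypothetical trace: a loop-closing branch taken 9 times, then falling through,
# with the whole loop executed 10 times.
trace = ([True] * 9 + [False]) * 10

one_bit = simulate(trace, n_bits=1)
two_bit = simulate(trace, n_bits=2)
print(f"1-bit mispredictions: {one_bit}")  # 20
print(f"2-bit mispredictions: {two_bit}")  # 12
```

On this trace the 1-bit predictor misses twice per loop execution (at the exit and again on re-entry), while the 2-bit predictor, once warmed up, misses only at the exit.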

– 17 – CSCE 513 Fall 2015 2-bit Saturating Branch predictor
Consider a loop
      i = 0
Loop: …
      i = i + 4
      if i < 400 go to loop
Branch prediction trace
• What state do we start in?
• Assume something, say SNT
(SNT = strongly not taken, WNT = weakly not taken, WT = weakly taken, ST = strongly taken)

i   State   Pred   Actual   Next State
0   SNT     NT     Taken    WNT
1   WNT     NT     Taken    WT
2   WT      T      Taken    ST
3   ST      T      Taken    ST
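
The trace on this slide can be extended to the whole loop with a short simulation. This is a sketch under stated assumptions: the `trace_loop` helper is hypothetical, the predictor is a plain 2-bit saturating counter with states ordered SNT, WNT, WT, ST, and the branch is evaluated once per iteration after `i = i + 4`, as in the slide's loop.

```python
STATES = ["SNT", "WNT", "WT", "ST"]  # counter values 0..3


def trace_loop(start_state="SNT", step=4, limit=400):
    """Trace a 2-bit saturating predictor on the loop-closing branch `if i < limit`."""
    state = STATES.index(start_state)
    rows = []
    i = 0
    while True:
        i += step
        taken = i < limit                      # actual branch outcome
        prediction = "T" if state >= 2 else "NT"
        # Saturating update: move toward ST on taken, toward SNT on not taken.
        next_state = min(state + 1, 3) if taken else max(state - 1, 0)
        rows.append((STATES[state], prediction,
                     "Taken" if taken else "Not taken", STATES[next_state]))
        state = next_state
        if not taken:                          # loop exits on fall-through
            break
    return rows


rows = trace_loop()
mispredictions = sum(
    1 for _state, pred, actual, _next in rows if (pred == "T") != (actual == "Taken")
)

for row in rows[:4]:
    print(row)
print("branches executed:", len(rows))
print("mispredictions:", mispredictions)
```

Starting from SNT, the branch executes 100 times and is mispredicted 3 times: twice while the counter warms up and once at loop exit.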

– 18 – CSCE 513 Fall 2015 Dynamic Scheduling

– 19 – CSCE 513 Fall 2015

– 20 – CSCE 513 Fall 2015 Why it is really that this easy! Interrupts, faults, and exceptions
The terms interrupt, fault, and exception are used, although not in a consistent fashion. We use the term exception to cover all these mechanisms, including the following:
• I/O device request
• Invoking an operating system service from a user program
• Tracing instruction execution
• Breakpoint (programmer-requested interrupt)
• Integer arithmetic overflow
• FP arithmetic anomaly
• Page fault (not in main memory)
• Misaligned memory accesses (if alignment is required)
• Memory protection violation
• Using an undefined or unimplemented instruction
• Hardware malfunctions
• Power failure