Appendix A. Pipelining: Basic and Intermediate Concept


Rung-Bin Lin

What is Pipelining?
- Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.
- Key terms: pipe stage (pipe segment), throughput, machine cycle.
- Machine cycle: the time required to move an instruction one step down the pipeline. This time is equal to the time required for the slowest pipe stage. In a computer, the machine cycle is usually one clock cycle.
- The pipeline designer's goal is to balance the length of each pipe stage. If the stages are perfectly balanced, the time per instruction on the pipelined machine, under ideal conditions, is (time per instruction on the unpipelined machine) / (number of pipe stages).
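As a rough illustration of the machine-cycle definition above, the sketch below derives the cycle time from the slowest stage and compares it with the unpipelined time per instruction. The stage latencies and function names are made-up assumptions, not figures from the slides, and pipeline overhead and hazards are ignored.

```python
def cycle_time(stage_latencies_ns):
    """Machine cycle = latency of the slowest pipe stage."""
    return max(stage_latencies_ns)


def ideal_speedup(stage_latencies_ns):
    """Unpipelined time per instruction divided by the pipelined cycle time.

    Assumes an ideal pipeline: no hazards and no latch overhead.
    """
    unpipelined = sum(stage_latencies_ns)
    return unpipelined / cycle_time(stage_latencies_ns)


balanced = [2.0, 2.0, 2.0, 2.0, 2.0]    # hypothetical, perfectly balanced stages
unbalanced = [1.0, 2.0, 3.0, 2.0, 2.0]  # hypothetical, one slow stage

print(ideal_speedup(balanced))    # equals the number of stages
print(ideal_speedup(unbalanced))  # lower, limited by the 3 ns stage
```

With balanced stages the speedup equals the pipeline depth; any imbalance lowers it because every stage is clocked at the pace of the slowest one.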

A Simple Implementation of a RISC ISA
Five-cycle implementation:
- Instruction fetch cycle (IF)
- Instruction decode/register fetch cycle (ID)
  - Operand fetches.
  - Sign-extending the immediate field.
  - Decoding is done in parallel with reading registers. This technique is known as fixed-field decoding.
  - Test the branch condition and compute the branch address; the branch is finished at the end of this cycle.
- Execution/effective address cycle (EX)
  - Memory reference.
  - Register-register ALU instruction.
  - Register-immediate ALU instruction.
- Memory access/branch completion cycle (MEM)
- Write-back cycle (WB)
  - Load instruction.

Performance of the Five-Cycle Implementation
- Branch instructions (12%) take 2 cycles.
- Store instructions (10%) require 4 cycles.
- All other instructions (the remaining 78%) take 5 cycles.
- CPI = 0.12 x 2 + 0.10 x 4 + 0.78 x 5 = 4.54
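The CPI figure on this slide is a weighted average over the instruction mix. A minimal sketch of that arithmetic follows; the 78% share for "other" instructions is derived here as the remainder after branches and stores, and the names are illustrative.

```python
# (fraction of dynamic instructions, cycles per instruction)
mix = {
    "branch": (0.12, 2),
    "store": (0.10, 4),
    "other": (0.78, 5),  # remainder of the mix, derived as 1 - 0.12 - 0.10
}


def average_cpi(mix):
    """Weighted average of cycle counts over the instruction mix."""
    total = sum(fraction for fraction, _ in mix.values())
    assert abs(total - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(fraction * cycles for fraction, cycles in mix.values())


cpi = average_cpi(mix)
print(f"{cpi:.2f}")  # prints 4.54
```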

The Classic Five-Stage Pipeline for a RISC Processor

The RISC Pipeline with Registers

Instruction Issue
The process of letting an instruction move from the instruction decode stage (ID) into the execution stage (EX) of this pipeline.

Basic Performance Issues in Pipelining
- Pipelining increases instruction throughput, but it does not reduce the execution time of an individual instruction. Pipeline overhead in fact slightly increases it. The overhead comes from:
  - Register delay
  - Clock skew
- The limitation on pipeline depth is due to:
  - Pipeline latency
  - Pipe stage imbalance
  - Pipeline overhead
- Example in A-10.
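To see how pipeline overhead eats into the ideal speedup, here is a small sketch. All numbers are hypothetical (they are not the figures of the textbook example), and `overhead_ns` lumps register delay and clock skew into a single per-stage cost.

```python
def speedup_with_overhead(unpipelined_ns, slowest_stage_ns, overhead_ns):
    """Speedup when each pipelined cycle pays a fixed overhead.

    overhead_ns models register delay plus clock skew (an assumption of
    this sketch); hazards and stalls are ignored.
    """
    pipelined_ns = slowest_stage_ns + overhead_ns
    return unpipelined_ns / pipelined_ns


# Hypothetical machine: 5 ns per instruction unpipelined, 1 ns slowest stage.
print(round(speedup_with_overhead(5.0, 1.0, 0.0), 2))  # no overhead
print(round(speedup_with_overhead(5.0, 1.0, 0.2), 2))  # 0.2 ns overhead per stage
```

Because the overhead is paid once per stage, it becomes a larger share of the cycle as stages get shorter, which is one reason deeper pipelines give diminishing returns.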

The Major Hurdle of Pipelining - Pipelining Hazards
A hazard is a situation that prevents the next instruction in the instruction stream from executing during its designated clock cycle.
Three classes of hazards:
- Structural hazard: arises from resource conflicts.
- Data hazard: arises when an instruction depends on the result of a previous instruction.
- Control hazard: arises from branches and other instructions that change the PC.
A hazard can force the pipeline to stall. When an instruction is stalled to clear a hazard:
- Instructions issued later than the stalled instruction are also stalled.
- Instructions issued earlier than the stalled one must continue.
Note that a cache miss stalls the whole pipeline.

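Performance of Pipeline with Stalls
When pipelining is thought of as decreasing the CPI (ideal pipelined CPI of 1, same clock cycle):
    Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)

When pipelining is thought of as improving the clock cycle time:
    Speedup = [1 / (1 + Pipeline stall cycles per instruction)] x (Clock cycle unpipelined / Clock cycle pipelined)
With perfectly balanced stages and no overhead, the clock cycle ratio equals the pipeline depth, so both views give the same result.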
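A short numeric sketch of the stall relationship, assuming the commonly used form Speedup = depth / (1 + stall cycles per instruction), an ideal pipelined CPI of 1, and balanced stages. The function name and the sample stall rates are illustrative.

```python
def speedup_with_stalls(pipeline_depth, stall_cycles_per_instr):
    """Speedup over the unpipelined machine when stalls add to an ideal CPI of 1."""
    return pipeline_depth / (1.0 + stall_cycles_per_instr)


for stalls in (0.0, 0.25, 1.0):  # hypothetical stall rates
    print(stalls, speedup_with_stalls(5, stalls))
```

With no stalls the speedup equals the depth; one stall cycle per instruction halves it.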

Structural Hazards
Due to resource conflicts (example in A-14):
- Some functional unit is not fully pipelined.
- Some resource has not been duplicated enough.

Data Hazards
An operand access depends on the results of instructions that have not yet finished.

Forwarding (Bypassing) ALU Results To Minimize Hazards

Forwarding (Bypassing) Results to Store

Bypassing Results of LOAD

Data Hazard Classification
Consider two instructions i and j, with i occurring before j. The possible hazards are:
- RAW (read after write): j tries to read a source before i writes it.
- WAW (write after write): j tries to write an operand before it is written by i. For example:
    LW   R1, 0(R2)      IF  ID  EX  MEM1 MEM2 WB
    DADD R1, R2, R3         IF  ID  EX   WB
- WAR (write after read): j tries to write a destination before it is read by i. For example, if the read is done in the second half of MEM2 and the write is done in the first half of WB:
    SW   0(R1), R2      IF  ID  EX  MEM1 MEM2 WB
    DADD R2, R3, R4         IF  ID  EX   WB
- RAR (read after read): not a hazard.

Data Hazards Requiring Stalls
- Pipeline interlock: a piece of hardware that detects a hazard and stalls the pipeline until the hazard is cleared.
- Load interlock
- Example (Fig. A.10 at A-21)

Control Hazards
Caused by instructions that change the PC.
Some basics:
- If a branch changes the PC to its target address, it is a taken branch.
- If it does not change the PC, it falls through, or it is not taken.
Recall that if an instruction i is a taken branch, the PC is normally not changed until the end of ID. A stall cycle is required:
    Branch instruction     IF  ID  EX  MEM WB
    Branch successor           IF  IF  ID  EX  MEM WB
    Branch successor+1                 IF  ID  EX  MEM WB
    Branch successor+2                     IF  ID  EX  MEM WB

Branch Penalty
- Branch delay: the length of a control hazard.
- Branch penalty: the branch delay, unless it is dealt with, turns into a branch penalty.
- The deeper the pipeline, the worse the branch penalty.
- The number of branch stalls can be reduced in two steps:
  - Find out earlier in the pipeline whether the branch is taken or not taken.
  - Compute the taken PC (i.e., the address of the branch target) earlier.
- Branch behavior in programs:
  - Average frequency of taken branches: 67%.
  - 60% of the forward branches are taken.
  - 85% of the backward branches are taken.
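The cost of the branch penalty can be estimated as extra cycles added to an ideal CPI of 1. This is a sketch under that assumption; the 12% branch frequency is borrowed from the earlier five-cycle performance slide, the one-cycle penalty matches the stall shown for this pipeline, and `penalized_fraction` is a hypothetical parameter for schemes that only pay the penalty on some branches.

```python
def cpi_with_branches(branch_freq, penalty_cycles, penalized_fraction=1.0):
    """Effective CPI = ideal CPI of 1 + average branch stall cycles per instruction."""
    return 1.0 + branch_freq * penalized_fraction * penalty_cycles


# Every branch pays the one-cycle penalty.
always = cpi_with_branches(0.12, 1)
# Only taken branches pay it, using the 67% taken frequency from the slide.
taken_only = cpi_with_branches(0.12, 1, penalized_fraction=0.67)

print(round(always, 4), round(taken_only, 4))
```

A deeper pipeline raises `penalty_cycles`, which is why the slide notes that the penalty gets worse with depth.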

Reducing Pipeline Branch Penalties
Static branch prediction methods (compile-time guess):
- Freeze or flush the pipeline: hold or delete any instructions after the branch until the branch destination is known.
- Predict-not-taken (untaken) (Fig. A.12 in A-23)
- Predict-taken: does it have any advantage? Answer: no. In this pipeline the target address is not known any earlier than the branch outcome.
- Delayed branch: the execution order with a branch delay of length n is
    Branch instruction
    Sequential successor 1
    Sequential successor 2
    …
    Sequential successor n   (n = 1 for MIPS)
    Branch target if taken

Scheduling the Branch Delay Slot

Effectiveness of Scheduling Branch Delay Slots
Requirements for being effective (when the scheduled instruction improves performance):
- Scheduling from before: always.
- Scheduling from target: when the branch is taken.
- Scheduling from fall through: when the branch is not taken.
The limitation on delayed-branch scheduling arises from:
- The restrictions on the instructions that are scheduled into the delay slots.
- The ability to predict at compile time whether a branch is likely to be taken or not.
Using a canceling (nullifying) branch to relieve these limits:
- In a canceling branch, the instruction includes the direction in which the branch was predicted.
- When the branch behaves as predicted, the instruction in the branch delay slot is simply executed.
- Otherwise, the instruction in the branch delay slot is turned into a no-op.

How Is Pipelining Implemented? Unpipelined 5-cycle implementation

Simple Pipelining Implementation for MIPS

Implementing the Control for MIPS Pipeline
Implementing the control focuses on detecting hazards and generating control signals for forwarding.
Hazard detection:
- All the data hazards can be checked, and forwarding control signals can be set, during the ID phase. If a data hazard exists, the instruction is stalled before it is issued.
- Alternatively, hazards and forwarding are checked at the beginning of a clock cycle that uses an operand (EX and MEM for the MIPS pipeline).
Implementing the logic for hazard detection:
- Hazards are detected by comparing the destination and sources of adjacent instructions (fig. A.20 on page A-34).
- An example shows the detection of all load interlocks when the instruction using the load result is in the ID stage (fig. A.21 on page A-34).
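The load-interlock check described above compares the destination of a load in EX with the sources of the instruction in ID. A simplified sketch follows; it is not a transcription of the referenced figure. Register numbers are plain integers, the function and parameter names are invented, and register 0 is assumed to be hardwired to zero as in MIPS.

```python
def needs_load_interlock(id_ex_is_load, id_ex_dest, if_id_sources):
    """True if the instruction in ID must stall behind a load currently in EX.

    id_ex_is_load: whether the instruction in the ID/EX register is a load
    id_ex_dest:    destination register number of that instruction
    if_id_sources: set of source register numbers of the instruction in ID
    """
    if not id_ex_is_load:
        return False  # ALU results can be forwarded, no stall needed
    if id_ex_dest == 0:
        return False  # assumption: register 0 always reads as zero
    return id_ex_dest in if_id_sources


print(needs_load_interlock(True, 1, {1, 2}))   # load R1 followed by a use of R1
print(needs_load_interlock(True, 1, {3, 4}))   # independent instruction
print(needs_load_interlock(False, 1, {1, 2}))  # producer is not a load
```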

Implementing Forwarding Logic
- Forwarding sources: ALU or data memory output.
- Forwarding destinations: ALU input, data memory input, or zero detection unit (for BRANCH).
- Forwarding can be implemented by checking the following conditions:
    EX/MEM.IR.destination = ID/EX.IR.source ?
    MEM/WB.IR.destination = ID/EX.IR.source ?
    MEM/WB.IR.destination = EX/MEM.IR.source ?
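The first two comparisons above select where an ALU input should come from. The sketch below models just that choice for one operand; it gives priority to EX/MEM because it holds the most recent result. The function name, the string labels and the use of `None` for "no destination" are assumptions, and the third comparison (forwarding to the data memory input) is not modelled.

```python
def alu_input_source(id_ex_src, ex_mem_dest, mem_wb_dest):
    """Choose the source for one ALU operand of the instruction entering EX."""
    def matches(dest):
        # Register 0 is assumed hardwired to zero, so it is never forwarded.
        return dest is not None and dest != 0 and dest == id_ex_src

    if matches(ex_mem_dest):
        return "EX/MEM"   # result of the immediately preceding instruction
    if matches(mem_wb_dest):
        return "MEM/WB"   # result from two instructions back
    return "REGFILE"      # no forwarding needed


print(alu_input_source(3, ex_mem_dest=3, mem_wb_dest=3))     # EX/MEM wins
print(alu_input_source(3, ex_mem_dest=5, mem_wb_dest=3))     # MEM/WB
print(alu_input_source(3, ex_mem_dest=None, mem_wb_dest=7))  # REGFILE
```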

Forwarding Data to the Two ALU Inputs

Dealing with Branches in the Pipeline

What Makes Pipelining Hard to Implement
- Exceptions (interrupts, faults) make pipelining difficult to implement.
- Instruction set complications

Types of Exceptions
- I/O device request
- Invoking an OS service from a user program
- Tracing instruction execution
- Breakpoint
- Integer arithmetic overflow or underflow
- FP arithmetic anomaly
- Page fault
- Misaligned memory access
- Memory-protection violation
- Using an undefined instruction
- Hardware malfunction
- Power failure
Exceptions for different architectures (fig. A.26 on page A-40).

Classification of Exceptions
- Synchronous versus asynchronous: if the event occurs at the same place every time the program is executed with the same data and memory allocation, the event is called synchronous.
- User requested versus coerced
- User maskable versus nonmaskable
- Within versus between instructions: depends on whether the event prevents instruction completion by occurring in the middle of execution, or whether it is recognized between instructions.
- Resume versus terminate (fig. 3.40 on page 182).

Action Requirements for Different Exception Types (Fig. A.27 on page A-42)
Actions: resume or terminate.
The most difficult exceptions have two properties:
- They occur within instructions (i.e., at the EX or MEM stages).
- They must be restartable (the PC of the instruction at which to restart must be saved).

Exception Handling
Stopping and restarting execution:
- Force a trap instruction on the next IF.
- Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow in the pipeline.
- After the exception-handling routine in the operating system receives control, it immediately saves the PC of the faulting instruction.
    Faulting instruction   IF  ID  EX  MEM WB
                               IF  ID  EX  MEM WB
                                   IF  ID  EX  MEM WB
                                       IF  ID  EX  MEM WB
                                           IF  ID  EX  MEM
    Trap instruction                           IF  ID  EX
If delayed branches are used, we need to save and restore as many PCs as the length of the branch delay plus one.

Precise Interrupt
- A pipeline has precise interrupts (exceptions) if it can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch.
- Supporting precise interrupts is a requirement in many systems.
Exceptions in DLX
- With pipelining, multiple exceptions may occur in the same clock cycle (fig. A.28 on page A-44).

Implementations of Precise Exceptions
Principle:
- The pipeline should be able to handle the exceptions caused by instruction i prior to the exceptions caused by instruction i+1.
Implementation:
- Hardware posts all exceptions caused by a given instruction in a status vector associated with that instruction.
- Once an exception indication is set in the exception status vector, any control signal that may cause a data value to be written is turned off.
- When an instruction enters WB, the exception status vector is checked. If any exceptions are posted, they are handled in the order in which they would occur in time on an unpipelined machine.
- This guarantees that all exceptions will be seen on instruction i before any are seen on i+1.
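The effect of checking the status vector at WB can be sketched as an ordering rule: exceptions are seen by instruction in program order, and within one instruction by the stage that raised them, regardless of the clock cycle in which they were posted. The tuple layout, the names and the sample exceptions below are assumptions for illustration.

```python
STAGE_ORDER = ["IF", "ID", "EX", "MEM", "WB"]


def handling_order(posted):
    """Order in which posted exceptions are seen by a check made at WB.

    posted: list of (instr_index, stage, description) in the order the
    hardware happened to post them in time.
    """
    return sorted(posted, key=lambda e: (e[0], STAGE_ORDER.index(e[1])))


# Instruction 1 faults in IF one cycle before instruction 0 faults in MEM,
# so the later instruction's exception is posted first in time.
posted = [
    (1, "IF", "instruction page fault"),
    (0, "MEM", "data page fault"),
]

for instr, stage, what in handling_order(posted):
    print(instr, stage, what)
```

The data page fault of instruction 0 is handled first even though it was raised later in time, which is the behaviour an unpipelined machine would show.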

Instruction Committed
When an instruction is guaranteed to complete, it is called committed. In the MIPS pipeline, all instructions are committed when they reach the end of the MEM stage, and no instruction updates the state before that stage. Thus precise exceptions are straightforward.

Instruction Set Complications
Some machines have instructions that change the state in the middle of the instruction execution:
- VAX: autoincrement addressing mode.
- VAX or IBM 360: string copy.
Implicitly set condition codes:
- Cause difficulties in scheduling any pipeline delays between setting the condition code and the branch.
    ADD XXX       <--- Set condition code C.
    …             <--- Cannot place instructions that change C.
    BR  C, YYY    <--- Use C for branch.
- In fact, the condition code must be treated as an operand that requires hazard detection for RAW hazards with branches, no matter whether the condition code is set implicitly or explicitly.
Multicycle operations in VAX

Extending the MIPS Pipeline to Handle Multi-Cycle Operations
Assuming four separate functional units in our MIPS implementation:
- Integer unit: handles loads and stores, ALU operations and branches.
- FP and integer multiplier
- FP adder
- FP and integer divider
If an instruction cannot proceed to the EX stage, the entire pipeline behind that instruction will be stalled.

MIPS Pipeline with Multi-cycle Functional Units

Pipelining Multi-cycle Functional Units

Latency and Initiation (Repeat) Interval
- Latency: the number of intervening cycles between an instruction that produces a result and an instruction that uses the result.
- Initiation (repeat) interval: the number of cycles that must elapse between issuing two operations of a given type.
Latency and initiation interval for pipelining multi-cycle functional units:
    Functional unit         Latency   Initiation interval
    Integer ALU             0         1
    Data memory access      1         1
    FP add                  3         1
    FP (integer) multiply   6         1
    FP (integer) divide     24        25
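Using the two definitions and the table values above, stall counts for an isolated pair of instructions can be estimated as follows. This is a sketch that ignores every other source of stalls; the dictionary keys and function names are invented for the example.

```python
LATENCY = {"int_alu": 0, "load": 1, "fp_add": 3, "fp_mul": 6, "fp_div": 24}
INITIATION_INTERVAL = {"int_alu": 1, "load": 1, "fp_add": 1, "fp_mul": 1, "fp_div": 25}


def raw_stalls(producer, independent_between=0):
    """Stall cycles for a consumer that follows the producer.

    independent_between: number of unrelated instructions issued between
    producer and consumer, each hiding one cycle of latency.
    """
    return max(0, LATENCY[producer] - independent_between)


def structural_stalls(unit, cycles_since_last_issue):
    """Stall cycles before another operation can be issued to the same unit."""
    return max(0, INITIATION_INTERVAL[unit] - cycles_since_last_issue)


print(raw_stalls("int_alu"))            # 0: result available to the next instruction
print(raw_stalls("fp_mul"))             # 6: dependent instruction right behind
print(raw_stalls("fp_add", 2))          # 1: two independent instructions in between
print(structural_stalls("fp_div", 1))   # 24: back-to-back divides
print(structural_stalls("fp_add", 1))   # 0: adder is fully pipelined
```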

Hazards and Forwarding in Longer Latency Pipelines
Hazard detection and forwarding work as before, with these differences:
- Structural hazards can occur because the divide unit is not fully pipelined.
- The number of register writes required in a single cycle can be larger than 1, because instructions have varying running times.
- WAW hazards are possible, but WAR hazards are not, since register reads always occur in ID.
- Instructions can complete in a different order than they were issued, causing problems with exceptions.
- Stalls for RAW hazards will be more frequent because of longer latency.
Assuming all hazard detection is done in ID, three checks must be done before issuing an instruction:
- Check for structural hazards
- Check for a RAW data hazard
- Check for a WAW data hazard
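The three issue-time checks can be sketched as a single function that reports which of them block an instruction in ID. This is a deliberately simplified model: every name is hypothetical, forwarding is ignored (any pending destination blocks a reader), and the structural check covers only a busy functional unit and a register write port already reserved for the cycle in which this instruction would write back.

```python
def issue_blockers(sources, dest, unit, wb_cycle,
                   pending_dests, busy_units, reserved_wb_cycles):
    """Return the list of checks that prevent issuing the instruction in ID.

    sources, dest:        register names read and written by the instruction
    unit:                 functional unit it needs
    wb_cycle:             cycle in which it would write the register file
    pending_dests:        destinations of instructions still in flight
    busy_units:           units that cannot accept a new operation this cycle
    reserved_wb_cycles:   cycles whose write port is already claimed
    """
    blockers = []
    if unit in busy_units or wb_cycle in reserved_wb_cycles:
        blockers.append("structural")
    if pending_dests & set(sources):
        blockers.append("RAW")
    if dest in pending_dests:
        blockers.append("WAW")
    return blockers


# A divide is in flight writing F0, and the divider is still busy.
in_flight = {"F0"}
print(issue_blockers(["F0", "F8"], "F2", "fp_add", 9, in_flight, {"fp_div"}, set()))
print(issue_blockers(["F4", "F6"], "F0", "fp_mul", 12, in_flight, {"fp_div"}, set()))
print(issue_blockers(["F4", "F6"], "F10", "fp_div", 30, in_flight, {"fp_div"}, set()))
```

An instruction issues only when the returned list is empty; otherwise it waits in ID and the instructions behind it wait as well.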

RAW Hazards Caused by Longer Pipeline Fig. A.33

Structural Hazards in Longer Pipeline Fig. A.34

Maintaining Precise Exceptions (1)
Problems caused by out-of-order completion:
    DIV.D F0, F2, F4
    ADD.D F10, F10, F8
    SUB.D F12, F12, F14
Four possible approaches (the first three are listed here):
- Ignore the problem and settle for imprecise exceptions.
- Buffer the results of an operation until all the operations that were issued earlier are completed.
  - History file approach: buffer the original register values.
  - Future file approach: keep the newer values of registers.
- Allow the exceptions to become somewhat imprecise, but keep enough information so that the trap-handling routines can create a precise sequence for exceptions. This means knowing what operations were in the pipeline and their PCs.

Maintaining Precise Exceptions (2)
Worst-case scenario for the software-assisted approach:
    Instruction 1:   a long-running instruction that eventually interrupts.
    Instruction 2:   not completed.
    ….
    Instruction n-1: not completed.
    Instruction n:   completed.   <-- The latest completed instruction.
The software must simulate instruction 1 through instruction n-1 and restart execution at instruction n+1.
Fourth approach: allow instruction issue to continue only if it is certain that all the instructions before the issuing instruction will complete without causing an exception. This sometimes means stalling the machine to maintain precise exceptions.

Number of Stalls per FP Operation

Performance of a MIPS FP Pipeline

Overview of The MIPS R4000 Pipeline An implementation of MIPS64 Eight pipeline stages (superpipelining)

Load Delay in MIPS R4000

Branch Delay in MIPS R4000

CPI of MIPS R4000

Concluding Remarks We can spend a little money to buy a very powerful computer today.