1
S. Barua – CPSC 440 sbarua@fullerton.edu http://sbarua.ecs.fullerton.edu
CHAPTER 6 ENHANCING PERFORMANCE WITH PIPELINING
This chapter presents pipelining as a means of improving performance.
Topics to be covered:
- Pipeline concept, its potential for speedup, and the need for balance among the pipeline stages
- Pipeline hazards
- Techniques to resolve hazards
2
What is Pipelining
Pipelining is an implementation technique that overlaps the execution of multiple instructions.
- An instruction is broken into smaller steps.
- Each smaller step (a pipeline stage or pipeline segment) takes a fraction of the time needed to complete the entire instruction.
3
Example (without pipelining)
Consider the lw instruction for the multicycle implementation we discussed in Chapter 5. The operation times for the major functional units in the implementation are as follows:
- Memory units: 200 ps (for read and write)
- ALU: 200 ps
- Register file: 100 ps (for read and write)
Assume that the multiplexors, control unit, PC access, and sign-extension unit have no delay.
4
Example (without pipelining) - Continued
Five steps are involved in fetching and executing lw. The time taken to complete each step is as follows:
- Instruction fetch: 200 ps
- Register read: 100 ps (for base value)
- ALU: 200 ps (for memory address)
- Memory read: 200 ps (for reading data from memory)
- Register write: 100 ps (for register write)
Execution time for one lw instruction = 800 ps
Execution time for a sequence of 3 lw instructions = 3 x 800 ps = 2400 ps
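The arithmetic on this slide can be reproduced with a short sketch. The step delays are the ones listed above; the names `LW_STEPS_PS` and `nonpipelined_time_ps` are illustrative only and do not come from the slides.

```python
# Step delays for lw in picoseconds, as listed on the slide.
LW_STEPS_PS = {
    "instruction fetch": 200,
    "register read": 100,
    "ALU": 200,
    "memory read": 200,
    "register write": 100,
}


def nonpipelined_time_ps(num_instructions, steps=LW_STEPS_PS):
    """Total time when each instruction finishes before the next one starts."""
    per_instruction = sum(steps.values())
    return num_instructions * per_instruction


print(nonpipelined_time_ps(1))  # 800
print(nonpipelined_time_ps(3))  # 2400
```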
5
Example (with pipelining)
Since the lw instruction is divided into five steps, a 5-stage pipeline is employed.
- Each pipeline stage takes one clock cycle.
- The clock cycle must be long enough to accommodate the slowest operation (200 ps in our example).
From the pipelined example (Figure 6.3, nonpipelined versus pipelined execution of 3 lw instructions), we see that the first lw instruction takes 5 x 200 ps = 1000 ps to pass through the pipeline, because the 100 ps register stages also occupy a full 200 ps clock cycle. Each additional lw instruction adds 200 ps to the total execution time. Thus, the total execution time for the sequence of 3 lw instructions is 1400 ps, compared with 2400 ps without pipelining.
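A minimal sketch of the pipelined timing, assuming that every one of the five stages occupies a full clock cycle set by the slowest step (200 ps) and that no hazards occur. Under that assumption the first lw needs five cycles (1000 ps) and each further lw adds one cycle. The names below are illustrative, not from the slides.

```python
# Per-stage delays in picoseconds:
# instruction fetch, register read, ALU, memory read, register write.
STAGE_DELAYS_PS = [200, 100, 200, 200, 100]


def pipelined_time_ps(num_instructions, stage_delays=STAGE_DELAYS_PS):
    """Total time for a hazard-free pipeline with one cycle per stage."""
    if num_instructions == 0:
        return 0
    clock_ps = max(stage_delays)  # the slowest stage sets the clock cycle
    # The first instruction fills the pipeline; each later one adds one cycle.
    cycles = len(stage_delays) + (num_instructions - 1)
    return cycles * clock_ps


print(pipelined_time_ps(1))  # 1000
print(pipelined_time_ps(3))  # 1400
```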
6
Figure 6.3 Nonpipelined versus pipelined execution of 3 lw instructions
7
Pipeline Performance - Summary
- Pipelining does not reduce the execution time of an individual instruction.
- Pipelining improves performance by increasing the instruction throughput.
- The pipelined processor has a lower average CPI than a multicycle implementation with the same clock rate.
- The pipelined processor has a lower product of clock cycle time and CPI than the single-cycle implementation.
- The ideal speedup equals the number of pipeline stages, which requires balanced stages and no hazards.
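The link between speedup and the number of stages can be illustrated with a small sketch. It assumes perfectly balanced stages, no hazards, and an unpipelined baseline that spends one stage-time per step; the function name `pipeline_speedup` is hypothetical.

```python
def pipeline_speedup(num_instructions, num_stages):
    """Speedup of a hazard-free, balanced pipeline over unpipelined execution."""
    unpipelined_cycles = num_instructions * num_stages
    # Fill the pipeline once, then complete one instruction per cycle.
    pipelined_cycles = num_stages + (num_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


# The speedup approaches the stage count as the instruction count grows.
for n in (1, 10, 1000, 1_000_000):
    print(n, round(pipeline_speedup(n, 5), 3))
```

For short instruction sequences the fill time dominates, so the speedup falls well below the stage count.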
8
Need for Registers Between Pipeline Stages
Registers are needed between the pipeline stages to store the value(s) generated by each stage, so that the datapath can be shared by the other instructions in the pipeline.
During each clock cycle, all instructions advance from one pipeline register to the next.
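The way instructions advance from one pipeline register to the next can be sketched as a shift of slots on every clock edge. This is only an illustration: the stage abbreviations IF, ID, EX, MEM, WB are the conventional names for the five steps listed earlier, and `simulate` is a hypothetical helper that tracks instruction labels, not real register contents.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # conventional abbreviations (assumed)


def simulate(instructions):
    """Return, for each clock cycle, the instruction held in each stage."""
    pipeline = [None] * len(STAGES)  # None marks an empty stage
    pending = list(instructions)
    history = []
    while pending or any(slot is not None for slot in pipeline):
        # On each clock edge every instruction moves one stage forward
        # and a new instruction, if any, enters the first stage.
        next_instr = pending.pop(0) if pending else None
        pipeline = [next_instr] + pipeline[:-1]
        if any(slot is not None for slot in pipeline):
            history.append(list(pipeline))
    return history


for cycle, slots in enumerate(simulate(["lw1", "lw2", "lw3"]), start=1):
    print(cycle, dict(zip(STAGES, slots)))
```

Three instructions occupy the five-stage pipeline for 5 + 3 - 1 = 7 cycles in this model.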
9
Pipeline Hazards
Hazard: A situation in pipelining when the next instruction cannot execute in the next clock cycle.
Three types of hazards:
- Structural hazard
- Data hazard
- Control (branch) hazard
10
Structural Hazard
A structural hazard occurs when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
11
Data Hazards
A data hazard can occur when an instruction in the pipeline depends on data produced by an earlier instruction that is still in the pipeline.
Consider the following sequence of instructions:
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
The sub instruction depends on the result that the add instruction writes to register $s0.
Consider the following sequence of instructions:
    lw  $s0, 20($t1)
    sub $t2, $s0, $t3
The data required by the sub instruction is available only after the fourth stage (memory read) of the lw instruction.
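A minimal sketch of how the dependence in these two sequences could be detected. The `Instr` tuple and `adjacent_dependences` function are hypothetical names for illustration; the check only compares each instruction with the one immediately after it.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]        # register written, or None
    sources: Tuple[str, ...]   # registers read


def adjacent_dependences(program):
    """List (producer_index, consumer_index, register) for back-to-back pairs."""
    found = []
    for i, (producer, consumer) in enumerate(zip(program, program[1:])):
        if producer.dest is not None and producer.dest in consumer.sources:
            found.append((i, i + 1, producer.dest))
    return found


alu_case = [
    Instr("add", "$s0", ("$t0", "$t1")),
    Instr("sub", "$t2", ("$s0", "$t3")),
]
load_case = [
    Instr("lw", "$s0", ("$t1",)),
    Instr("sub", "$t2", ("$s0", "$t3")),
]
print(adjacent_dependences(alu_case))   # [(0, 1, '$s0')]
print(adjacent_dependences(load_case))  # [(0, 1, '$s0')]
```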
12
Data Hazard - Solutions
Two methods are used to resolve a data hazard.
- Forwarding or bypassing: Retrieves the missing data element from internal buffers instead of waiting for it to come from the registers or memory location specified by the instruction (Figure 6.5).
- Pipeline stall (bubble): Stalls the pipeline by the required number of stages. This guarantees correct execution, but can lower performance. In our example (lw followed by sub), we would have to stall by one stage even with forwarding (Figure 6.6).
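The number of bubbles needed for a back-to-back dependence can be sketched as follows. The forwarding cases follow the slide. The no-forwarding value of 2 is an assumption: it presumes the register file is written in the first half of a clock cycle and read in the second half, so the consumer can read a register in the same cycle the producer writes it. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(producer_op, forwarding):
    """Bubbles needed when the very next instruction reads the producer's result."""
    if forwarding:
        # An ALU result can be forwarded straight to the next instruction's ALU
        # stage, so no bubble is needed. Loaded data only exists after the
        # memory stage, so lw followed by a dependent instruction costs one bubble.
        return 1 if producer_op == "lw" else 0
    # ASSUMPTION: split-cycle register file (write first half, read second half).
    # The consumer's register read must wait for the producer's register write,
    # which is two stages later than its normal position.
    return 2


print(stall_cycles("add", forwarding=True))   # 0
print(stall_cycles("lw", forwarding=True))    # 1
print(stall_cycles("lw", forwarding=False))   # 2
```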
13
Figure 6.5 Forwarding or bypassing
14
Figure 6.6 Pipeline stall (bubble)
15
Control (Branch) Hazards
In a pipeline, an instruction is fetched every clock cycle to keep the pipeline full.
If the fetched instruction is a branch, the decision about whether to branch is not made until the memory pipeline stage, by which time the following instructions have already been fetched.
The delay in determining the proper instruction to fetch is called a “control hazard” or “branch hazard”.
16
Resolving Branch Hazards
Techniques employed are:
- Always stall: The pipeline is stalled until the branch is complete. The penalty is several clock cycles for every branch.
- Assume branch not taken: The pipeline keeps fetching the instructions that follow the branch, on the assumption that the branch will not be taken. If the branch is taken, the instructions that are being fetched and decoded are discarded (flushed).
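The cost of the two techniques can be compared with a rough CPI model. This is a sketch under stated assumptions: a base CPI of 1, a 3-cycle penalty (consistent with a branch decision made in the memory stage), and hypothetical workload figures of 20% branches, half of them taken. None of these numbers come from the slides.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, policy):
    """Average CPI for a pipeline whose only stalls come from branches."""
    base_cpi = 1.0
    if policy == "always_stall":
        # Every branch pays the full penalty.
        return base_cpi + branch_fraction * penalty_cycles
    if policy == "assume_not_taken":
        # Only taken branches flush the instructions fetched after them.
        return base_cpi + branch_fraction * taken_fraction * penalty_cycles
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical workload: 20% branches, 50% of them taken, 3-cycle penalty.
for policy in ("always_stall", "assume_not_taken"):
    print(policy, round(effective_cpi(0.2, 0.5, 3, policy), 2))
```

In this model, assuming not taken never does worse than always stalling, because the penalty is paid only on the taken fraction of branches.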
17
Performance of Pipelined Systems
Pipelining reduces the average execution time per instruction, thereby improving system performance.
Hazards limit the performance improvement, but appropriate hardware and software techniques can be devised to circumvent these limits.
18
Superscalar Technique
The internal components of the processor are replicated so that it can launch multiple instructions in every pipeline stage.
Launching multiple instructions per stage allows the instruction execution rate to exceed the clock rate (CPI < 1).
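A small sketch of why CPI drops below 1. It assumes an ideal machine that launches `issue_width` instructions every cycle with no hazards; the function names and the 1 GHz, 2-wide example are hypothetical.

```python
def ideal_cpi(issue_width):
    """Best-case cycles per instruction when issue_width instructions launch per cycle."""
    return 1.0 / issue_width


def ideal_instruction_rate(clock_hz, issue_width):
    """Best-case instructions per second for the given clock rate."""
    return clock_hz * issue_width


# Hypothetical 2-wide processor clocked at 1 GHz.
print(ideal_cpi(2))                    # 0.5
print(ideal_instruction_rate(1e9, 2))  # 2000000000.0
```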