# Goal: Describe Pipelining

A pipeline is like a conveyor belt: multiple instructions can be executed at the same time, each one in a different stage. Doing laundry is like pipelining:

1. Put a load of dirty clothing in the washer.
2. Move the wet wash into the dryer.
3. Move the dry wash onto a table and fold it.
4. Put the clothes away in a closet.

A single load can't be done any faster. But what if you have 3 washes (whites, colors, delicates)?

Computer Architecture - Pipelining 1/14

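Pipelined Laundry

Each wash takes 2 hours from start to finish, so 4 washes done one after another take 8 hours. 4 pipelined washes take 3.5 hours because the stages are overlapped: with four half-hour stages, a new wash finishes every 30 minutes once the first one is done. Pipelined laundry is potentially 4 times faster.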
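The laundry arithmetic can be checked with a short sketch. It assumes four stages of half an hour each, which is what the 2-hour and 3.5-hour figures imply; the function names are illustrative only.

```python
def sequential_time(loads, stages, stage_hours):
    """Total hours when each load finishes all stages before the next starts."""
    return loads * stages * stage_hours


def pipelined_time(loads, stages, stage_hours):
    """Total hours when a new load enters the first stage every stage_hours."""
    # The first load needs all stages; each later load adds one stage time.
    return (stages + loads - 1) * stage_hours


if __name__ == "__main__":
    for loads in (4, 1000):
        seq = sequential_time(loads, 4, 0.5)
        pipe = pipelined_time(loads, 4, 0.5)
        print(f"{loads} loads: {seq} h vs {pipe} h, speedup {seq / pipe:.2f}")
```

With only 4 loads the speedup is well below 4, and it approaches 4 as the number of loads grows, which is why the gain is described as potential.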

Pipelining Instructions
Executing an instruction is performed in stages:

1. Fetch the instruction from memory.
2. Decode the instruction and read the registers.
3. Execute the operation or calculate an address.
4. Access a word in data memory.
5. Write the result back into a register.

While the first instruction is being decoded, the second instruction is already being fetched. While the first instruction is being executed, the second is being decoded and the third ...
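The overlap of the five stages can be made visible with a small text diagram. This is only a sketch: the stage abbreviations (IF, ID, EX, MEM, WB) and the instruction names are labels chosen for the illustration, and it assumes one instruction enters the pipeline per cycle with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # fetch, decode, execute, memory, writeback


def pipeline_diagram(instructions):
    """Return one text row per instruction showing its stage in each cycle."""
    total_cycles = len(STAGES) + len(instructions) - 1
    rows = []
    for i, name in enumerate(instructions):
        # Instruction i enters the fetch stage in cycle i (0-based).
        cells = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(f"{name:<8}" + " ".join(f"{c:<3}" for c in cells).rstrip())
    return rows


if __name__ == "__main__":
    for row in pipeline_diagram(["instr1", "instr2", "instr3"]):
        print(row)
```

Reading any column from top to bottom shows which stage each instruction occupies in that cycle.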

Single-Cycle vs. Pipelined Performance
In the single-cycle design a lw takes 8ns. In the pipelined design each cycle is 2ns long, which is the time of the longest stage (memory access).

Speedup of Pipelining

Assuming ideal conditions:

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$

Time between instructions: 8ns. Number of pipe stages: 5, so 8ns/5 = 1.6ns. But the minimum pipeline stage is 2ns, so the best possible speedup is 8ns/2ns = 4.0.

So why is the speedup for 3 instructions only 24ns/14ns = 1.7 and not 4.0? Because with so few instructions the time needed to fill the pipeline dominates. For 1003 instructions:

single-cycle: 1000*8ns + 24ns = 8,024ns

pipelined: 1000*2ns + 14ns = 2,014ns

speedup: 8,024ns/2,014ns = 3.98
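The figures above can be reproduced with a few lines of code. This is a minimal sketch using the numbers from the text (8ns per instruction single-cycle, 5 stages, 2ns per pipeline cycle); the function names are made up for the illustration.

```python
def single_cycle_ns(n, instr_ns=8):
    """Total time for n instructions when each takes one long cycle."""
    return n * instr_ns


def pipelined_ns(n, stages=5, cycle_ns=2):
    """Total time for n instructions in an ideal pipeline without stalls."""
    # The first instruction needs all stages; each later one adds one cycle.
    return (stages + n - 1) * cycle_ns


if __name__ == "__main__":
    for n in (3, 1003):
        slow, fast = single_cycle_ns(n), pipelined_ns(n)
        print(f"n={n}: {slow}ns vs {fast}ns, speedup {slow / fast:.2f}")
```

As the instruction count grows, the fill time of the pipeline matters less and the speedup approaches the ratio of the two cycle times.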

The MIPS instruction set was designed for pipelining:
Each instruction is the same size, so an instruction can be fetched every cycle. In the 80x86, instruction lengths vary from 1 to 17 bytes; some instructions can be fetched in 1 cycle and others cannot, which complicates things.

MIPS has few instruction formats, and in each of them the source register fields are in the same place. The registers can therefore be read at the same time that the control is determining the type of instruction. If this weren't so, an extra pipeline stage would be needed to read the registers.

Only load and store instructions access memory. If ALU operations could access memory, stages 3 and 4 would have to be expanded.

Pipeline Hazards

When the next instruction can't execute in the next clock cycle, we say that a hazard has occurred. There are 3 types of hazards:

1. Structural hazards: The same unit is needed by two instructions.
2. Control hazards: The next instruction isn't known yet.
3. Data hazards: An instruction depends on the result of a previous instruction.

Structural Hazards

Laundry example: A washer/dryer combination is used, so both the wash and dry stages use the same unit. Or the folding table has all your school work on it.

Instruction example: A single memory is used, so the Fetch and Memory stages can't be executed at the same time. Or the register file can't be read and written in the same cycle.

Solution: Use two memories, one for data and one for instructions. Enable the register file to be read and written in the same cycle.
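The single-memory conflict can be sketched as a check for cycles in which one instruction is being fetched while an older one is in its memory stage. The sketch assumes the five-stage pipeline described earlier (memory access is the 4th stage), one instruction entering per cycle, and no stalls; the names are hypothetical.

```python
FETCH_STAGE = 0   # stage index of instruction fetch
MEMORY_STAGE = 3  # stage index of data-memory access (4th of 5 stages)


def memory_conflict_cycles(uses_data_memory):
    """Return the cycles in which a fetch and a data access need one shared memory.

    uses_data_memory[i] is True if instruction i is a load or store.
    Instruction i is assumed to enter the pipeline in cycle i with no stalls.
    """
    n = len(uses_data_memory)
    conflicts = []
    for cycle in range(n):  # an instruction is fetched in every cycle 0..n-1
        older = cycle - (MEMORY_STAGE - FETCH_STAGE)  # instruction now in memory stage
        if older >= 0 and uses_data_memory[older]:
            conflicts.append(cycle)
    return conflicts


if __name__ == "__main__":
    # A load followed by three instructions that do not touch data memory.
    print(memory_conflict_cycles([True, False, False, False]))
```

With separate instruction and data memories the two accesses go to different units, so the same check would never report a conflict.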

Control Hazards

Laundry example: Washing filthy uniforms. We might have to add more soap and wash again. Only after the dry stage can we tell whether the uniforms are clean or more soap is needed.

Instruction example: A branch instruction is being decoded. The next instruction fetched might be the wrong one. Even with a dedicated ALU in the decode stage, we still lose a cycle.

Solution: Stall. Wait until the branch direction is known and then continue fetching. This is known as a pipeline stall or bubble.

Control Hazard Example
If the branch test fails, the lw instruction is executed with a delay of one cycle. In many processors a delay of 2 cycles is necessary. This solution is too slow; we need a faster one. What if we predict the result of the branch?

Branch Prediction

Laundry solution: While drying the first load of uniforms, wash the second load. If the first load isn't clean enough, rewash the first and second loads.

Instruction solution: Predict that all branches are not taken and fetch the next instruction. If the branch is taken (a misprediction), stall.
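A rough way to compare always stalling with predicting not taken is to estimate the average number of cycles per instruction (CPI). The sketch below assumes an otherwise ideal pipeline with a CPI of 1; the branch frequency and the taken rate are hypothetical values chosen for illustration, not figures from the slides.

```python
def average_cpi(branch_fraction, penalty_cycles, penalized_fraction):
    """Average cycles per instruction for an otherwise ideal pipeline (CPI = 1).

    branch_fraction:    share of instructions that are branches (assumed value)
    penalty_cycles:     bubbles inserted when a branch pays the penalty
    penalized_fraction: share of branches that pay the penalty
                        (1.0 = always stall, taken rate = predict not taken)
    """
    return 1.0 + branch_fraction * penalized_fraction * penalty_cycles


if __name__ == "__main__":
    BRANCHES = 0.15  # hypothetical: 15% of instructions are branches
    TAKEN = 0.60     # hypothetical: 60% of branches are taken
    print("always stall:     ", average_cpi(BRANCHES, 1, 1.0))
    print("predict not taken:", average_cpi(BRANCHES, 1, TAKEN))
```

Under these assumptions prediction only pays the penalty for taken branches, so it can never be worse than stalling on every branch.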

3rd Solution: Delayed Decision
Laundry solution: When drying the first uniform load, wash a regular load.

Instruction solution: Switch around the order of instructions. After the branch instruction, execute an instruction that doesn't depend on the branch.

Data Hazards

Laundry example: The first load is mainly socks, and every sock has its pair in the second load. We can't fold the first load until the second load is dried.

Instruction example:

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

The 2nd instruction depends on the 1st: only during the 5th stage is the result written back into \$s0.

Solution: Stall. Wait until the 1st instruction ends. This results in 3 bubbles, which is too long to wait.

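Forwarding

The result is calculated in the 3rd stage, so why wait? It can be forwarded directly to the instruction that needs it instead of waiting for the writeback.

In the case of an R-format instruction that uses the result of the load immediately before it, a bubble is still added, because the loaded value is only available after the 4th stage. This is called a load-use data hazard.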
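The bubble counts for a dependent instruction that directly follows its producer can be summarized in a small lookup. This is a sketch that simply encodes the figures given in the text (3 bubbles without forwarding, none for an ALU result with forwarding, one after a load); the function name and the "alu"/"load" labels are made up for the illustration.

```python
def bubbles_needed(producer, forwarding):
    """Bubbles between a producer and a dependent instruction right after it.

    producer is "alu" (result ready after stage 3) or "load" (ready after stage 4).
    """
    if producer not in ("alu", "load"):
        raise ValueError(f"unknown producer kind: {producer!r}")
    if not forwarding:
        return 3  # figure from the text: wait for the producer to write back
    # With forwarding, an ALU result reaches the next instruction's execute
    # stage in time; a loaded value arrives one cycle too late.
    return 1 if producer == "load" else 0


if __name__ == "__main__":
    for kind in ("alu", "load"):
        for fwd in (False, True):
            print(kind, "forwarding" if fwd else "stall only", bubbles_needed(kind, fwd))
```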

Code Reordering

Find the hazard:

    # $t1 is the address of v[k]
    lw $t0, 0($t1)   # $t0 = v[k]
    lw $t2, 4($t1)   # $t2 = v[k+1]
    sw $t2, 0($t1)   # v[k] = $t2
    sw $t0, 4($t1)   # v[k+1] = $t0

The first sw uses \$t2 right after the lw that loads it, which is a load-use hazard.

Solution: Reorder the instructions.
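The hazard in this sequence can be found mechanically by checking whether any instruction reads a register that the lw directly before it loads. The sketch below is an illustration only: the tuple representation of instructions is hypothetical, and the reordering it checks (swapping the two stores) is one possible fix, assumed here because the transcript does not show the original solution.

```python
def load_use_hazards(program):
    """Return indices of instructions that use a register loaded just before.

    Each instruction is (op, dest, sources); dest is None for a store.
    """
    hazards = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        if prev_op == "lw" and prev_dest in sources:
            hazards.append(i)
    return hazards


# A store reads both the value register and the base address register.
original = [
    ("lw", "$t0", ("$t1",)),       # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),       # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),  # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),  # sw $t0, 4($t1)
]
# Assumed fix: swap the two stores so that $t0, loaded two instructions
# earlier, is stored first. The stores write different addresses, so the
# result in memory is unchanged.
reordered = [original[0], original[1], original[3], original[2]]

if __name__ == "__main__":
    print("original hazards at:", load_use_hazards(original))
    print("reordered hazards at:", load_use_hazards(reordered))
```

After the swap, no instruction reads a register loaded by the instruction directly before it, so the bubble is no longer needed.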