# PIPELINE AND VECTOR PROCESSING

CHAPTER 9: PIPELINE AND VECTOR PROCESSING

CONTENTS

- Parallel Processing
- Pipelining
- Arithmetic Pipeline
- Instruction Pipeline
- RISC Pipeline
- Vector Processing
- Array Processors

Figure 9-1 Processor with multiple functional units
Components shown: processor registers (connected to memory) and the functional units adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, and floating-point divide.

Classification by instruction stream and data stream (Flynn's classification).
Single instruction stream, single data stream (SISD). Single instruction stream, multiple data stream (SIMD). Multiple instruction stream, single data stream (MISD). Multiple instruction stream, multiple data stream (MIMD).

Figure 9-2 Example of Pipelining.
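The operation Ai * Bi + Ci is split into three segments separated by registers:

- R1 ← Ai, R2 ← Bi (input Ai and Bi)
- R3 ← R1 * R2, R4 ← Ci (multiply and input Ci)
- R5 ← R3 + R4 (add Ci to product)

Data path: R1 and R2 feed the multiplier, whose output goes to R3; R3 and R4 feed the adder, whose output goes to R5.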

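Table 9-1 Content of registers in pipeline example.

| Clock pulse number | Segment 1: R1 | Segment 1: R2 | Segment 2: R3 | Segment 2: R4 | Segment 3: R5 |
|---|---|---|---|---|---|
| 1 | A1 | B1 | -- | -- | -- |
| 2 | A2 | B2 | A1*B1 | C1 | -- |
| 3 | A3 | B3 | A2*B2 | C2 | A1*B1+C1 |
| 4 | A4 | B4 | A3*B3 | C3 | A2*B2+C2 |
| 5 | A5 | B5 | A4*B4 | C4 | A3*B3+C3 |
| 6 | A6 | B6 | A5*B5 | C5 | A4*B4+C4 |
| 7 | A7 | B7 | A6*B6 | C6 | A5*B5+C5 |
| 8 | -- | -- | A7*B7 | C7 | A6*B6+C6 |
| 9 | -- | -- | -- | -- | A7*B7+C7 |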
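
The register contents in Table 9-1 can be reproduced with a small simulation. The following is only a sketch: the function name `simulate_pipeline` and the use of `None` for an empty register are assumptions made for illustration, not part of the original slides.

```python
def simulate_pipeline(a, b, c):
    """Return (clock pulse, R1, R2, R3, R4, R5) tuples for R5 = Ai*Bi + Ci.

    None stands for an empty register (shown as -- in Table 9-1).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    history = []
    for clock in range(n + 2):  # two extra pulses drain the pipeline
        # All registers load on the same clock pulse, so the new values
        # are computed from the old contents before anything is updated.
        new_r5 = r3 + r4 if r3 is not None else None      # segment 3
        new_r3 = r1 * r2 if r1 is not None else None      # segment 2
        new_r4 = c[clock - 1] if 1 <= clock <= n else None
        new_r1 = a[clock] if clock < n else None          # segment 1
        new_r2 = b[clock] if clock < n else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        history.append((clock + 1, r1, r2, r3, r4, r5))
    return history


if __name__ == "__main__":
    for row in simulate_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        print(row)
```

With seven operand triples the loop runs for nine clock pulses, matching the nine rows of the table.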

Figure 9-3 Four segment pipeline.
Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4, with a common clock applied to all registers.

Figure 9-4 Space-time diagram for pipeline.
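| Segment \ Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | T1 | T2 | T3 | T4 | T5 | T6 |  |  |  |
| 2 |  | T1 | T2 | T3 | T4 | T5 | T6 |  |  |
| 3 |  |  | T1 | T2 | T3 | T4 | T5 | T6 |  |
| 4 |  |  |  | T1 | T2 | T3 | T4 | T5 | T6 |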
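
The space-time diagram follows a simple rule: task number t (counting from 1) occupies segment s during clock cycle t + s - 1, so n tasks in a k-segment pipeline finish in k + n - 1 cycles. A minimal sketch of that rule, where the function name `space_time` is an assumption for illustration:

```python
def space_time(k, n):
    """Build a k-by-(k + n - 1) grid; each cell holds a task number or None."""
    cycles = k + n - 1
    grid = [[None] * cycles for _ in range(k)]
    for task in range(n):          # task index, zero-based
        for seg in range(k):       # segment index, zero-based
            # A task enters segment `seg` exactly `seg` cycles after it starts.
            grid[seg][task + seg] = task + 1
    return grid


if __name__ == "__main__":
    for seg, row in enumerate(space_time(4, 6), start=1):
        cells = ["T%d" % t if t else "--" for t in row]
        print("Segment", seg, " ".join(cells))
```

For k = 4 and n = 6 this gives the nine clock cycles shown in Figure 9-4.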

Figure 9-5 Multiple functional units in parallel.
Labels in the figure: instructions Ii, Ii+1, Ii+2, Ii+3 and the parallel functional units P1, P2, P3.

Arithmetic Pipeline

Floating-point addition and subtraction are carried out in four steps:

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

Figure 9-6 Pipeline for floating-point addition and subtraction.

Inputs: exponents a and b, mantissas A and B. Registers (R) separate the segments.

- Segment 1: compare the exponents by subtraction, producing their difference.
- Segment 2: choose the exponent and align the mantissas.
- Segment 3: add or subtract the mantissas.
- Segment 4: adjust the exponent and normalize the result.
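
The four segments can be traced in software. This sketch assumes decimal numbers stored as (mantissa, exponent) pairs with a fractional mantissa, uses `Fraction` to avoid rounding error, and runs the steps sequentially rather than overlapped; the function name `fp_add` and the sample operands are illustrative assumptions.

```python
from fractions import Fraction


def fp_add(x, y, base=10):
    """Add two (mantissa, exponent) pairs; each mantissa is a Fraction with |m| < 1."""
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare the exponents by subtraction.
    diff = ea - eb
    # Segment 2: choose the larger exponent and align the other mantissa.
    if diff >= 0:
        exp = ea
        mb = mb / Fraction(base) ** diff
    else:
        exp = eb
        ma = ma / Fraction(base) ** (-diff)
    # Segment 3: add the mantissas (subtraction is adding a negative mantissa).
    m = ma + mb
    # Segment 4: normalize so that 1/base <= |m| < 1, adjusting the exponent.
    if m == 0:
        return Fraction(0), 0
    while abs(m) >= 1:
        m /= base
        exp += 1
    while abs(m) < Fraction(1, base):
        m *= base
        exp -= 1
    return m, exp


if __name__ == "__main__":
    # 0.9504 x 10^3 + 0.8200 x 10^2
    print(fp_add((Fraction(9504, 10000), 3), (Fraction(8200, 10000), 2)))
```

In a hardware pipeline each segment would work on a different operand pair during the same clock cycle; the function above only shows what each segment computes.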

Instruction Pipeline

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Figure 9-7 Four-segment CPU pipeline.
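Flowchart, by segment:

- Segment 1: fetch the instruction from memory.
- Segment 2: decode the instruction and calculate the effective address. Branch? If yes, update the PC and empty the pipe; if no, continue.
- Segment 3: fetch the operand from memory.
- Segment 4: execute the instruction. Interrupt? If yes, perform interrupt handling, update the PC, and empty the pipe; if no, update the PC and fetch the next instruction.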

Segments and their purpose.
FI is the segment that fetches an instruction. DA is the segment that decodes the instruction and calculates the effective address. FO is the segment that fetches the operand. EX is the segment that executes the instruction.

Figure 9-8 Timing of instruction pipeline.
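| Instruction \ Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | FI | DA | FO | EX |  |  |  |  |  |  |  |  |  |
| 2 |  | FI | DA | FO | EX |  |  |  |  |  |  |  |  |
| 3 (Branch) |  |  | FI | DA | FO | EX |  |  |  |  |  |  |  |
| 4 |  |  |  | FI | -- | -- | FI | DA | FO | EX |  |  |  |
| 5 |  |  |  |  | -- | -- | -- | FI | DA | FO | EX |  |  |
| 6 |  |  |  |  |  |  |  |  | FI | DA | FO | EX |  |
| 7 |  |  |  |  |  |  |  |  |  | FI | DA | FO | EX |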
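
The timing in Figure 9-8 can be generated by a short scheduling routine. This is a sketch under two assumptions taken from the figure: the instruction right behind the branch is fetched and then discarded, and fetching resumes only in the step after the branch completes EX. The name `schedule` and its parameters are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def schedule(num_instructions, branch_at):
    """Return {instruction number: {step: stage}} for a four-segment pipeline.

    branch_at is the number of the one instruction that branches.
    """
    timing = {}
    next_fetch = 1
    for instr in range(1, num_instructions + 1):
        row = {}
        if instr == branch_at + 1:
            # Fetched right behind the branch, then thrown away.
            row[next_fetch] = "FI"
            # The pipe refills only after the branch has finished EX.
            branch_ex_step = max(timing[branch_at])
            next_fetch = branch_ex_step + 1
        for offset, stage in enumerate(STAGES):
            row[next_fetch + offset] = stage
        timing[instr] = row
        next_fetch += 1
    return timing


if __name__ == "__main__":
    for instr, row in schedule(7, 3).items():
        print(instr, sorted(row.items()))
```

Calling `schedule(7, 3)` finishes the seventh instruction in step 13, three steps later than a pipeline with no branch would.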

Pipeline Conflicts

- Resource conflicts
- Data dependency conflicts
- Branch difficulties

Three-segment instruction pipeline
- I: Instruction fetch
- A: ALU operation
- E: Execute instruction

Figure 9-9 Three segment pipeline timing.
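Pipeline timing with data conflict:

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1. Load R1 | I | A | E |  |  |  |
| 2. Load R2 |  | I | A | E |  |  |
| 3. Add R1+R2 |  |  | I | A | E |  |
| 4. Store R3 |  |  |  | I | A | E |

Pipeline timing with delayed load:

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. Load R1 | I | A | E |  |  |  |  |
| 2. Load R2 |  | I | A | E |  |  |  |
| 3. No-operation |  |  | I | A | E |  |  |
| 4. Add R1+R2 |  |  |  | I | A | E |  |
| 5. Store R3 |  |  |  |  | I | A | E |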
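
The delayed-load fix shown in Figure 9-9 amounts to inserting a no-operation whenever an instruction reads a register that the immediately preceding load is still fetching. The pass below is a minimal sketch of that idea; the tuple encoding `(op, dest, sources)` and the function name `insert_delayed_loads` are assumptions for illustration, not a real compiler interface.

```python
def insert_delayed_loads(program):
    """Return a copy of program with a NOP after each load whose result
    is used by the very next instruction.

    Each instruction is a tuple (op, dest, sources).
    """
    out = []
    for instr in program:
        op, dest, sources = instr
        previous = out[-1] if out else None
        if previous and previous[0] == "LOAD" and previous[1] in sources:
            # The loaded value is not yet in the register when this
            # instruction reaches its ALU segment, so delay it one cycle.
            out.append(("NOP", None, ()))
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ()),
        ("LOAD", "R2", ()),
        ("ADD", "R3", ("R1", "R2")),
        ("STORE", None, ("R3",)),
    ]
    for instr in insert_delayed_loads(program):
        print(instr)
```

Applied to the four-instruction program of the figure, the pass yields the five-instruction sequence of the delayed-load timing.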

Figure 9-10 Examples of delayed branch.
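Using no-operation instructions:

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. Load | I | A | E |  |  |  |  |  |  |  |
| 2. Increment |  | I | A | E |  |  |  |  |  |  |
| 3. Add |  |  | I | A | E |  |  |  |  |  |
| 4. Subtract |  |  |  | I | A | E |  |  |  |  |
| 5. Branch to X |  |  |  |  | I | A | E |  |  |  |
| 6. No-operation |  |  |  |  |  | I | A | E |  |  |
| 7. No-operation |  |  |  |  |  |  | I | A | E |  |
| 8. Instruction in X |  |  |  |  |  |  |  | I | A | E |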

Figure 9-10 Examples of delayed branch.
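Rearranging the instructions:

| Clock cycles | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1. Load | I | A | E |  |  |  |  |  |
| 2. Increment |  | I | A | E |  |  |  |  |
| 3. Branch to X |  |  | I | A | E |  |  |  |
| 4. Add |  |  |  | I | A | E |  |  |
| 5. Subtract |  |  |  |  | I | A | E |  |
| 6. Instruction in X |  |  |  |  |  | I | A | E |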

Applications of Vector Processing

- Long-range weather forecasting
- Petroleum exploration
- Seismic data analysis
- Medical diagnosis
- Aerodynamics and space flight simulations

Figure 9-11 Instruction format for vector processor

Figure 9-12 Pipeline for calculating an inner product
Source A and Source B feed the multiplier pipeline, whose output feeds the adder.
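
One common way to keep a pipelined adder busy during an inner product is to accumulate several interleaved partial sums, one per adder segment, and combine them at the end. The sketch below illustrates that idea only; the four-segment adder (`adder_segments=4`) and the function name `inner_product` are assumptions not stated on the slide.

```python
def inner_product(a, b, adder_segments=4):
    """Return (total, partial_sums) for the inner product of a and b.

    Product number i is added to partial sum i mod adder_segments, which
    mimics an adder pipeline whose result from `adder_segments` cycles ago
    is fed back just as the next matching product arrives.
    """
    partial = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % adder_segments] += x * y   # multiplier output enters the adder
    # Final step: the partial sums are added together.
    return sum(partial), partial


if __name__ == "__main__":
    total, partial = inner_product([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
    print(total, partial)
```

The interleaving does not change the result; it only reflects the order in which a pipelined adder could produce it.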

Figure 9-13 Multiple module memory organization
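Each memory module has its own address register (AR), memory array, and data register (DR). All modules connect to a common address bus and data bus.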
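
With several memory modules, consecutive addresses can be spread across the modules so that successive accesses go to different arrays. The sketch below assumes low-order interleaving with four modules, where the low-order part of the address selects the module; the slide itself does not state which address bits are used, and the function name `module_and_offset` is hypothetical.

```python
def module_and_offset(address, num_modules=4):
    """Split an address into (module number, word offset inside the module).

    Assumes low-order interleaving: the remainder selects the module and
    the quotient selects the word within that module's memory array.
    """
    return address % num_modules, address // num_modules


if __name__ == "__main__":
    for address in range(8):
        print(address, module_and_offset(address))
```

Under this assumption, addresses 0 through 3 land in four different modules, so each module's AR and DR can be busy at the same time.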

Types of Array Processors
- Attached array processor
- SIMD array processor

Figure 9-14 Attached Array Processor with host computer
A general-purpose computer connects through an input-output interface to the attached array processor. The computer's main memory and the array processor's local memory are linked by a high-speed memory-to-memory bus.

Figure 9-15 SIMD array processor organization
A master control unit, connected to main memory, controls the processing elements PE1, PE2, PE3, ..., PEn, each with its own local memory M1, M2, M3, ..., Mn.