Chapter One Introduction to Pipelined Processors.


1 Chapter One Introduction to Pipelined Processors

2 Principles of Designing Pipeline Processors (Design Problems of Pipeline Processors)

3 Instruction Prefetch and Branch Handling
The instructions in computer programs can be classified into four types:
– Arithmetic/Load operations (60%)
– Store-type instructions (15%)
– Branch-type instructions (5%)
– Conditional branch-type instructions (Yes – 12%, No – 8%)

4 Instruction Prefetch and Branch Handling
Arithmetic/Load operations (60%):
– These operations require one or two operand fetches.
– The execution of different operations requires different numbers of pipeline cycles.

5 Instruction Prefetch and Branch Handling
Store-type instructions (15%):
– These require a memory access to store the data.
Branch-type instructions (5%):
– These correspond to an unconditional jump.

6 Instruction Prefetch and Branch Handling
Conditional branch-type instructions (Yes – 12%, No – 8%):
– The Yes path requires calculation of the new branch address.
– The No path proceeds to the next sequential instruction.

7 Instruction Prefetch and Branch Handling
Arithmetic/load and store instructions do not alter the execution order of the program. Branch instructions and interrupts, however, degrade the performance of pipeline computers.

8 Handling Example – Interrupt System of the Cray-1

9 Cray-1 System
The interrupt system is built around an exchange package. When an interrupt occurs, the Cray-1 saves 8 scalar registers, 8 address registers, the program counter and the monitor flags. These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.

10 Instruction Prefetch and Branch Handling In general, the higher the percentage of branch type instructions in a program, the slower a program will run on a pipeline processor.

11 Effect of Branching on Pipeline Performance
Consider a linear pipeline of 5 stages:
1. Fetch Instruction
2. Decode
3. Fetch Operands
4. Execute
5. Store Results

12 Overlapped Execution of Instructions without Branching
[Space-time diagram: instructions I1–I8 flowing through the five pipeline stages in overlapped fashion]

13 I5 Is a Branch Instruction
[Space-time diagram: execution of I1–I8 when I5 is a branch instruction]

14 Estimation of the effect of branching on an n-segment instruction pipeline

15 Estimation of the Effect of Branching
Consider an instruction cycle of n pipeline clock periods. Let
– p – probability that an instruction is a conditional branch (20%)
– q – probability that a conditional branch is successful (12/20 = 0.6, i.e. 60% of the 20%)

16 Estimation of the Effect of Branching
Suppose there are m instructions.
Then the number of successful branches = m × p × q (m × 0.2 × 0.6).
A delay of (n − 1)/n of an instruction cycle is required for each successful branch to flush the pipeline.

17 Estimation of the Effect of Branching
Thus, the total number of instruction cycles required for m instructions is
1 + (m − 1)/n + m·p·q·(n − 1)/n
(the first instruction takes one full cycle, each of the remaining m − 1 instructions adds 1/n of a cycle, and each of the m·p·q successful branches adds a delay of (n − 1)/n of a cycle).
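The arithmetic can be checked with a short script (a sketch, not part of the slides, using the slide's values n = 5, p = 0.2, q = 0.6 and the n-stage derivation above):

```python
# Total instruction cycles for m instructions on an n-stage pipeline with
# conditional-branch probability p and branch-success probability q.
# One instruction cycle = n clock periods; the first instruction costs a full
# cycle, each later one 1/n cycle, and each successful branch adds (n-1)/n.

def total_instruction_cycles(m, n=5, p=0.2, q=0.6):
    return 1 + (m - 1) / n + m * p * q * (n - 1) / n

print(total_instruction_cycles(100))         # with branching: about 30.4
print(total_instruction_cycles(100, p=0.0))  # no branches: about 20.8
```

With the slides' numbers, branching costs roughly 9.6 extra instruction cycles per 100 instructions on this 5-stage pipeline.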

19 Estimation of the Effect of Branching
As m becomes large, the average number of instructions per instruction cycle approaches
n / (1 + p·q·(n − 1))

20 Estimation of the Effect of Branching
When p = 0, the above measure reduces to n, which is the ideal case. In reality, it is always less than n.
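A quick check of the limiting value (a sketch with the slide's parameters; the function name is ours):

```python
# Average instructions completed per instruction cycle as m grows large:
# n / (1 + p*q*(n-1)). With p = 0 this reduces to n, the ideal case.

def avg_instructions_per_cycle(n=5, p=0.2, q=0.6):
    return n / (1 + p * q * (n - 1))

print(avg_instructions_per_cycle())        # degraded by branching, about 3.38
print(avg_instructions_per_cycle(p=0.0))   # ideal: equals n = 5
```

So even a modest branch frequency cuts the effective pipeline throughput from 5 to roughly 3.4 instructions per instruction cycle here.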

21 Solution: use multiple prefetch buffers.

22 Multiple Prefetch Buffers
Three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential buffers: hold instructions fetched in sequence (for in-sequence pipelining)
2. Target buffers: hold instructions fetched from a branch target (for out-of-sequence pipelining)

23 Multiple Prefetch Buffers
A conditional branch causes both the sequential and target buffers to fill; based on the branch condition, one buffer is selected and the other is discarded.

24 Multiple Prefetch Buffers
3. Loop buffers: hold the sequential instructions within a loop.
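The select-and-discard behaviour of the sequential and target buffers can be sketched as follows (all names are illustrative, not from any real machine):

```python
# Illustrative sketch: on a conditional branch, instructions are prefetched
# down both paths; once the condition resolves, one buffer is kept and the
# other is discarded.

def resolve_branch(sequential_buffer, target_buffer, branch_taken):
    """Return the buffer execution continues from; the other is discarded."""
    return target_buffer if branch_taken else sequential_buffer

seq = ["I6", "I7", "I8"]     # fall-through (sequential) instructions
tgt = ["I20", "I21", "I22"]  # instructions fetched from the branch target
print(resolve_branch(seq, tgt, branch_taken=True))   # target path survives
```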

25 Data Buffering and Busing Structures

26 Speeding up of pipeline segments
The processing speeds of pipeline segments are usually unequal. Consider the example given below:
S1 (T1) → S2 (T2) → S3 (T3)

27 Speeding up of pipeline segments
If T1 = T3 = T and T2 = 3T, S2 becomes the bottleneck and we need to remove it.
How? One method is to subdivide the bottleneck. Two possible subdivisions are:

28 Speeding up of pipeline segments
First method: subdivide S2 into sub-stages of T and 2T:
S1 (T) → (T) → (2T) → S3 (T)

30 Speeding up of pipeline segments
Second method: subdivide S2 into three sub-stages of T each:
S1 (T) → (T) → (T) → (T) → S3 (T)
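Both subdivision methods work because the pipeline clock period is set by the slowest segment; a minimal sketch (our own, with T = 1):

```python
# The pipeline clock period equals the slowest segment's processing time.
# Subdividing the 3T bottleneck S2 restores a clock period of T.

def clock_period(stage_times):
    return max(stage_times)

T = 1.0
before = clock_period([T, 3 * T, T])       # S1, S2 (bottleneck), S3
after = clock_period([T, T, T, T, T])      # S2 split into three T sub-stages
print(before, after)                       # clock shrinks from 3T to T
```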

31 Speeding up of pipeline segments
If the bottleneck is not subdivisible, we can duplicate S2 in parallel:
S1 (T) → three parallel copies of S2 (3T each) → S3 (T)
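The effect of duplication can be modelled simply: k parallel copies of a segment with service time s give that stage an aggregate throughput of k/s (an illustrative model, not from the slides):

```python
# Throughput of a pipeline stage with parallel copies: with k copies each
# taking time s, the stage completes k items every s time units.

def stage_throughput(service_time, copies=1):
    return copies / service_time

T = 1.0
print(stage_throughput(3 * T))             # single S2: 1/(3T)
print(stage_throughput(3 * T, copies=3))   # three parallel S2 copies: 1/T
print(stage_throughput(T))                 # S1 or S3: 1/T
```

With three copies, the S2 stage matches the 1/T rate of S1 and S3, so the bottleneck disappears, at the cost of the more complex control described next.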

32 Speeding up of pipeline segments
Control and synchronization are more complex with parallel segments.

33 Data Buffering
Instruction and data buffering provides a continuous flow of instructions and operands to the pipeline units.
Example: 4X TI ASC

34 In this system, a memory buffer unit (MBU) is used which:
– supplies the arithmetic unit with a continuous stream of operands
– stores results in memory
The MBU has three double buffers X, Y and Z (one octet per buffer):
– X and Y for input, Z for output

35 Example: 4X TI ASC
This provides pipeline processing at a high rate and alleviates the bandwidth mismatch between the memory and the arithmetic pipeline.
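The double-buffering idea behind the X, Y and Z buffers can be sketched as follows (a simplified sequential model; the structure and names are ours, not the actual MBU design):

```python
# Double-buffering sketch: while the arithmetic unit drains one half of a
# double buffer, memory refills the other half; the halves then swap roles,
# giving the consumer an uninterrupted operand stream.

def stream_operands(octets):
    """Yield operands continuously using two alternating buffer halves."""
    active, backing = [], []
    for octet in octets:
        backing = list(octet)               # memory fills the idle half
        active, backing = backing, active   # swap: filled half becomes active
        for operand in active:              # arithmetic unit drains active half
            yield operand

data = [[1, 2], [3, 4], [5, 6]]             # operand octets arriving from memory
print(list(stream_operands(data)))          # operands delivered in order
```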

36 Busing Structures
Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise, the pipeline must be halted until the dependency is removed.
Solution: an efficient internal busing structure.
Example: TI ASC

37 In the TI ASC, once an instruction dependency is recognized, an update capability is incorporated by transferring the contents of the Z buffer to the X or Y buffer.

