
Presentation on theme: "Parallelism Processing more than one instruction at a time. Pipelining"— Presentation transcript:

1 Parallelism Processing more than one instruction at a time: pipelining.
Speed-up due to pipelining. Briefly consider multi-processor systems.

2 Last time We looked at how instructions are executed in a computer.

3 Instruction parallelism
One method of improving the performance of machines is to increase the clock speed, but this is a limited and ‘brute force’ approach. So parallelism (doing more than one thing at a time) has been used as an approach to get more performance at a given clock speed.

4 Instruction-level parallelism
Parallelism takes two forms: instruction-level parallelism and processor-level parallelism. Instruction-level parallelism is used within individual instructions to get more instructions/sec out of a machine: pipelining.

5 Pipelining is a key technique used to make faster CPUs.
Processors allow instructions to be executed in stages, with each stage implemented using separate hardware. The stages are connected together to form an instruction pipeline, allowing more than one instruction to be processed at the same time.

6 Tanenbaum (2006) describes four stages:
The instruction fetch stage (IF): fetches instructions from memory or an instruction cache. It uses the fetch unit, which controls the PC and the buses, to gain access to memory.
The instruction decode stage (ID): uses the control unit (CU) to decode instructions and identify any source operands. Immediate operands and operands stored in the register file are moved into temporary ALU input registers during this stage.
The execution stage (EX): the Arithmetic Logic Unit (ALU) performs operations on the operands stored in its input registers, and stores the result in a temporary ALU output register.
The write-back stage (WB): the contents of the ALU's temporary output register are copied to the register file.
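The four stages above can be sketched for a single toy instruction. This is a minimal illustration in Python, not the slides' material; the register names, instruction format, and ADD opcode are all hypothetical.

```python
# A toy register machine stepping one instruction, ADD r2, r0, r1,
# through the four stages described above. Names are illustrative.

register_file = {"r0": 5, "r1": 7, "r2": 0}   # hypothetical register file
memory = [("ADD", "r2", "r0", "r1")]           # toy instruction memory
pc = 0

# IF: fetch the instruction at the PC
instruction = memory[pc]
pc += 1

# ID: decode and move source operands into temporary ALU input registers
op, dest, src1, src2 = instruction
alu_in_a, alu_in_b = register_file[src1], register_file[src2]

# EX: the ALU operates on its input registers into an output register
alu_out = alu_in_a + alu_in_b if op == "ADD" else None

# WB: copy the ALU output register back to the register file
register_file[dest] = alu_out

print(register_file["r2"])  # 12
```

In a real pipeline each of these four steps is separate hardware, so four different instructions can occupy the four steps at once.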

7 Fetch-execute cycle stages (numbers are clock cycles, without pipelining):

                fetch  decode  execute  write-back
instruction 1     1      2       3        4
instruction 2     5      6       7        8

8 Pipelining

9 An analogy of pipelining
Imagine a warehouse packing plant. Worker 1 puts a box on the conveyor belt; worker 2 puts the product into the box and seals it; worker 3 puts the address label on the box; and worker 4 picks up the box for delivery. After finishing a task, a worker does not wait for the whole procedure to be completed by the final worker; they start on the next job (e.g. putting another empty box on the conveyor belt). Pipelining instructions works like this: an instruction passes through several processing stages before it is completed, but the stages work in parallel.
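The conveyor-belt analogy can be checked with a short cycle count. This is a sketch in Python; the function names are illustrative, and each stage is assumed to take exactly one clock cycle.

```python
# With k stages of one cycle each, a new instruction can enter the
# pipeline every cycle once it is flowing, just as each worker starts
# the next box as soon as they hand the current one on.

def sequential_cycles(n, k):
    """Cycles if each instruction runs all k stages before the next starts."""
    return n * k

def pipelined_cycles(n, k):
    """Cycles to complete n instructions in a k-stage pipeline.
    Instruction i (0-based) enters stage 1 at cycle i+1 and leaves the
    last stage at cycle i+k, so the last one finishes at cycle k+n-1."""
    return k + (n - 1)

print(sequential_cycles(4, 4))  # 16
print(pipelined_cycles(4, 4))   # 7
```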

10 Speed up If a k-stage pipeline executes n instructions using a clock with cycle time t, then without overlapping instructions the total time to execute them will be Ts = n*k*t. So if 4 stages are used (k=4) for 4 instructions (n=4) with t=1 s, Ts = 16 s.

11 If instructions are executed in parallel, the total time is Tp = k*t + (n-1)*t = (k+n-1)*t, where k*t is the time to fill the pipeline up to the point where the first instruction completes, and (n-1)*t is the time taken for the remaining (n-1) instructions at a rate of one per clock cycle.

12 Speedup factor: S = Ts/Tp = (n*k*t)/((k+n-1)*t) = (n*k)/(k+n-1).
If n=50 (50 instructions in a sequence), S = (50*4)/(4+50-1) = 3.77.
If n=100, S = (100*4)/(4+100-1) = 3.88. As n grows, S approaches k.
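The speedup formula can be evaluated directly to reproduce the figures on the slide. A minimal sketch in Python; the function name is illustrative.

```python
def speedup(n, k):
    """S = Ts / Tp = (n*k*t) / ((k+n-1)*t); the cycle time t cancels."""
    return (n * k) / (k + n - 1)

print(round(speedup(50, 4), 2))    # 3.77
print(round(speedup(100, 4), 2))   # 3.88
print(round(speedup(10_000, 4), 2))  # close to k = 4
```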

13 Problems with Pipelining
Branch Hazards: Unfortunately, branching instructions alter the program flow, so a pipeline can become filled with instructions that are no longer needed. These must be flushed from the pipe so that it can be filled with a new stream of instructions. This ‘wastes’ clock cycles.
Resource Hazards: two stages may need to access the same resource (e.g. the ALU, or the same register). One solution is to duplicate the hardware, bearing in mind that delays can still occur depending on the order of the instructions.
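The cost of branch flushes can be made concrete with a rough cycle count. This is a sketch under a simplifying assumption not stated in the slides: each taken branch flushes the partially processed instructions behind it, wasting roughly k-1 cycles.

```python
# Assumed model: pipelined execution takes k+n-1 cycles, and every taken
# branch adds a flush penalty of k-1 wasted cycles while the pipe refills.

def cycles_with_branches(n, k, taken_branches):
    """Pipelined cycle count plus a (k-1)-cycle flush per taken branch."""
    return (k + n - 1) + taken_branches * (k - 1)

print(cycles_with_branches(100, 4, 0))   # 103: no branches taken
print(cycles_with_branches(100, 4, 10))  # 133: 30 cycles 'wasted'
```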

14 Pipelines are included in some CISC processors (e.g. the MC68040 and Intel 80486), but software written for these did not in general make effective use of pipelining, and therefore of the speed-up offered by the approach. In superpipelining, instructions are broken down into even finer steps by lengthening the pipeline (adding stages).

15 An alternative: if one pipeline is good, why not increase the number of pipelines, allowing multiple instructions to be issued in the same clock cycle? Complex rules are used to determine whether a pair of instructions can be executed in parallel, in a similar way to the scheduling discussed earlier.

16 Superscalar processors such as the Intel Pentium use multiple pipelines, with scheduling of variable-length blocks of instructions.

17 Superscalar Architecture
[Figure: modified version of figure 2-6, Tanenbaum (2006), pg 65]

18 The superscalar approach shown contains a single pipeline but with multiple functional units in the execution stage (e.g. multiple ALUs, a floating-point unit). The operand decode stage (S3) must issue instructions faster than the functional units can execute them, so that more than one functional unit can be busy at once. Functional units are implemented in hardware. The Pentium II had a structure similar to that shown previously.

19 Other forms of Parallelism
Multiprocessors: consist of a number of identical processors that operate in parallel, usually sharing a common memory. Since each CPU can read and write this memory, software is usually used to co-ordinate the CPUs to avoid clashes. One design gives each CPU local memory of its own, as well as access to the shared main memory. This extra memory can be used for program code and data not needed by other CPUs, reducing the amount of bus traffic to main memory.
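The software co-ordination mentioned above can be sketched with threads standing in for CPUs sharing one memory location. This is an illustration, not hardware: the names are hypothetical, and a lock plays the role of the co-ordination that prevents clashing updates.

```python
# Four "CPUs" (threads) increment one shared memory word. The lock
# serialises each read-modify-write so no update is lost to a clash.

import threading

shared_memory = {"counter": 0}   # stand-in for a shared memory word
lock = threading.Lock()          # software co-ordination between "CPUs"

def cpu_work(increments):
    for _ in range(increments):
        with lock:               # only one CPU updates at a time
            shared_memory["counter"] += 1

threads = [threading.Thread(target=cpu_work, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_memory["counter"])  # 40000
```

Without the lock, two CPUs could read the same old value and one increment would be lost; that is the clash the slide refers to.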

20 Multi-computers: CPUs communicate over networks. Each has private memory, not shared memory, though software can give the illusion of shared memory. Multiprocessor systems are easier to program, but multi-computer systems are easier to build.

21 Reference and further reading
Tanenbaum AS (2006) Structured Computer Organisation, pg 65.

