
1 5.1 Pipelined Computations

2 Problem divided into a series of tasks that have to be completed one after the other (the basis of sequential programming). Each task is executed by a separate process or processor.

TRADITIONAL PIPELINE CONCEPT: Laundry Example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes; dryer takes 40 minutes; "folder" takes 20 minutes.

TRADITIONAL PIPELINE CONCEPT: Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would the laundry take?

TRADITIONAL PIPELINE CONCEPT: Pipelined laundry takes only 3.5 hours for 4 loads.
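
A quick check of these timings (arithmetic added here, not from the original slides):

sequential: 4 loads x (30 + 40 + 20) min = 360 min = 6 hours
pipelined:  30 + 4 x 40 + 20 = 210 min = 3.5 hours

In the pipelined schedule the 40-minute dryer is the slowest stage, so it sets the pipeline rate.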

TRADITIONAL PIPELINE CONCEPT
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
The pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce the speedup.

7 Example: Add all the elements of array a to an accumulating sum:

for (i = 0; i < n; i++)
    sum = sum + a[i];

The loop could be "unfolded" to yield:

sum = sum + a[0];
sum = sum + a[1];
sum = sum + a[2];
sum = sum + a[3];
sum = sum + a[4];

8 Pipeline for an unfolded loop

9 Where pipelining can be used to good effect. Assuming the problem can be divided into a series of sequential tasks, the pipelined approach can provide increased execution speed under the following three types of computation:
1. If more than one instance of the complete problem is to be executed
2. If a series of data items must be processed, each requiring multiple operations
3. If information to start the next process can be passed forward before the process has completed all its internal operations

10 SIX-STAGE INSTRUCTION PIPELINE

11 TIMING DIAGRAM FOR INSTRUCTION PIPELINE OPERATION

12 “Type 1” Pipeline Space-Time Diagram

13 Alternative space-time diagram

14 “Type 2” Pipeline Space-Time Diagram

15 "Type 3" Pipeline Space-Time Diagram. Pipeline processing where information passes to the next stage before the previous stage has completed.

16 If the number of stages is larger than the number of processors in any pipeline, a group of stages can be assigned to each processor:

17 Computing Platform for Pipelined Applications Multiprocessor system with a line configuration

18 Example Pipelined Solutions (Examples of each type of computation)

19 Pipeline Program Examples: Adding Numbers. Type 1 pipeline computation.

20 Basic code for process Pi:

recv(&accumulation, Pi-1);
accumulation = accumulation + number;
send(&accumulation, Pi+1);

except for the first process, P0, which is

send(&number, P1);

and the last process, Pn-1, which is

recv(&number, Pn-2);
accumulation = accumulation + number;

21 SPMD program:

if (process > 0) {
    recv(&accumulation, Pi-1);
    accumulation = accumulation + number;
}
if (process < n-1)
    send(&accumulation, Pi+1);

The final result is in the last process. Instead of addition, other arithmetic operations could be done.
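
For concreteness, here is a minimal MPI sketch of this SPMD pipelined addition (an illustration added here, not from the original slides; it assumes one number per process and uses the example data number = rank + 1):

/* Pipelined addition, SPMD style: each process adds its number to the running
   sum received from its predecessor and passes the result to its successor. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int number = rank + 1;      /* the number this process contributes (example data) */
    int accumulation = 0;

    if (rank > 0)               /* not the first process: receive the running sum */
        MPI_Recv(&accumulation, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    accumulation = accumulation + number;

    if (rank < nprocs - 1)      /* not the last process: pass the sum onward */
        MPI_Send(&accumulation, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    else                        /* the final result is in the last process */
        printf("sum = %d\n", accumulation);

    MPI_Finalize();
    return 0;
}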

22 Pipelined addition of numbers: master process and ring configuration.

23 SORTING NUMBERS: INSERTION SORT ALGORITHM
For each array element from the second to the last (starting at nextPos = 1):
    Insert the element at nextPos where it belongs in the array, increasing the length of the sorted subarray by 1.
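
For reference, a short sequential C version of this insertion sort (a sketch added here, not from the original slides):

/* Insertion sort: grow the sorted subarray a[0..nextPos-1] one element at a time. */
void insertion_sort(int a[], int n) {
    for (int nextPos = 1; nextPos < n; nextPos++) {
        int value = a[nextPos];           /* element to insert */
        int j = nextPos - 1;
        while (j >= 0 && a[j] > value) {  /* shift larger elements one place right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = value;                 /* drop the element into its position */
    }
}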

24 Sorting Numbers A parallel version of insertion sort.

25 Pipeline for sorting using insertion sort Type 2 pipeline computation

26 The basic algorithm for process Pi is:

recv(&number, Pi-1);
if (number > x) {
    send(&x, Pi+1);
    x = number;
} else
    send(&number, Pi+1);

With n numbers, the number of values the ith process is to accept is n - i, and the number it passes onward is n - i - 1. Hence, a simple loop could be used.
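
That loop, written out in the slides' send/recv pseudocode style (a sketch added here; the first received number initializes x, and the n - i - 1 later arrivals are each compared and passed on):

recv(&x, Pi-1);                     /* first number received is held in x */
for (j = 0; j < n - i - 1; j++) {   /* remaining numbers: keep the largest, pass the rest */
    recv(&number, Pi-1);
    if (number > x) {
        send(&x, Pi+1);
        x = number;                 /* keep the larger value */
    } else
        send(&number, Pi+1);
}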

27 Insertion sort with results returned to master process using bidirectional line configuration

28 Insertion sort with results returned

PRIME NUMBER GENERATION: SIEVE OF ERATOSTHENES
Next_Prime = 2 ====> mark multiples of 2 as non-prime
Next_Prime = 3 ====> mark multiples of 3 as non-prime

30 Prime Number Generation: Sieve of Eratosthenes. A series of all integers is generated from 2. The first number, 2, is prime and kept. All multiples of this number are deleted, as they cannot be prime. The process is repeated with each remaining number. The algorithm removes non-primes, leaving only primes. Type 2 pipeline computation.
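
For comparison, a compact sequential sieve in C (a sketch added here, not from the original slides; the limit N = 100 is arbitrary):

/* Sequential Sieve of Eratosthenes: prime[i] stays 1 only if i is prime. */
#include <stdio.h>
#define N 100

int main(void) {
    char prime[N + 1];
    for (int i = 2; i <= N; i++) prime[i] = 1;
    for (int p = 2; p * p <= N; p++)           /* repeat with each remaining number */
        if (prime[p])
            for (int m = p * p; m <= N; m += p)
                prime[m] = 0;                  /* multiples cannot be prime */
    for (int i = 2; i <= N; i++)
        if (prime[i]) printf("%d ", i);
    printf("\n");
    return 0;
}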

31 The code for a process, Pi, could be based upon:

recv(&x, Pi-1);
/* repeat following for each number */
recv(&number, Pi-1);
if ((number % x) != 0)
    send(&number, Pi+1);

Each process will not receive the same quantity of numbers, and the quantity is not known beforehand. Use a "terminator" message, which is sent at the end of the sequence:

recv(&x, Pi-1);
for (i = 0; i < n; i++) {
    recv(&number, Pi-1);
    if (number == terminator) break;
    if ((number % x) != 0)
        send(&number, Pi+1);
}
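
The slides do not show where the stream of numbers comes from; one way to feed the pipeline is a generator process such as the following sketch (added here; the names terminator and n are assumptions):

/* Generator: send the integers 2, 3, ..., n into the pipeline, followed by the
   terminator; the first number P0 receives (2) becomes its x. */
for (number = 2; number <= n; number++)
    send(&number, P0);
send(&terminator, P0);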

32 GAUSSIAN ELIMINATION (GE) FOR SOLVING Ax = b

... for each column i, zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
    ... for each row j below row i
    for j = i+1 to n
        ... add a multiple of row i to row j
        tmp = A(j,i);
        for k = i to n
            A(j,k) = A(j,k) - (tmp/A(i,i)) * A(i,k)

(Figure: snapshots of the matrix after i = 1, i = 2, i = 3, ..., i = n-1.)
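
The same elimination loop written as plain, 0-indexed C (a sketch added here; like the pseudocode above it omits partial pivoting, and it also applies the row operations to b so the system stays consistent):

/* Gaussian elimination without pivoting: reduce A to upper-triangular form. */
void gaussian_elimination(int n, double A[n][n], double b[n]) {
    for (int i = 0; i < n - 1; i++)            /* for each column i           */
        for (int j = i + 1; j < n; j++) {      /* for each row j below row i  */
            double tmp = A[j][i] / A[i][i];    /* multiplier of row i         */
            for (int k = i; k < n; k++)
                A[j][k] = A[j][k] - tmp * A[i][k];
            b[j] = b[j] - tmp * b[i];          /* same operation on the right-hand side */
        }
}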

33 Solving a System of Linear Equations. Upper-triangular form, where the a's and b's are constants and the x's are the unknowns to be found:

a(n-1,0)x0 + a(n-1,1)x1 + ... + a(n-1,n-1)x(n-1) = b(n-1)
...
a(2,0)x0 + a(2,1)x1 + a(2,2)x2 = b2
a(1,0)x0 + a(1,1)x1 = b1
a(0,0)x0 = b0

34 Back Substitution. First, the unknown x0 is found from the last equation; i.e.,

x0 = b0 / a(0,0)

The value obtained for x0 is substituted into the next equation to obtain x1; i.e.,

x1 = (b1 - a(1,0)x0) / a(1,1)

The values obtained for x1 and x0 are substituted into the next equation to obtain x2:

x2 = (b2 - a(2,0)x0 - a(2,1)x1) / a(2,2)

and so on until all the unknowns are found.

35 Pipeline Solution. The first pipeline stage computes x0 and passes x0 on to the second stage, which computes x1 from x0 and passes both x0 and x1 on to the next stage, which computes x2 from x0 and x1, and so on. Type 3 pipeline computation.

36 The ith process (0 < i < n) receives the values x0, x1, x2, ..., xi-1 and computes xi from the equation:

xi = (bi - (a(i,0)x0 + a(i,1)x1 + ... + a(i,i-1)xi-1)) / a(i,i)

37 Sequential Code. Given constants a(i,j) and b(k) stored in arrays a[][] and b[], respectively, and values for the unknowns to be stored in array x[], the sequential code could be:

x[0] = b[0]/a[0][0];          /* computed separately */
for (i = 1; i < n; i++) {     /* for remaining unknowns */
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
}

38 Parallel Code. Pseudo-code of process Pi (1 <= i < n) could be:

for (j = 0; j < i; j++) {
    recv(&x[j], Pi-1);
    send(&x[j], Pi+1);
}
sum = 0;
for (j = 0; j < i; j++)
    sum = sum + a[i][j]*x[j];
x[i] = (b[i] - sum)/a[i][i];
send(&x[i], Pi+1);

Now there are additional computations to do after receiving and resending values.

39 Pipeline processing using back substitution