Potential for parallel computers/parallel programming

Potential for parallel computers/parallel programming Before we embark on using a parallel computer, we need to establish whether we can obtain increased execution speed with an application and what the constraints are. ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson. slides1a-2 Dec 27, 2012

Speedup Factor
S(p) = ts / tp = (execution time using one processor, using the best sequential algorithm) / (execution time using a multiprocessor with p processors)
where ts is execution time on a single processor, and tp is execution time on a multiprocessor. S(p) gives the increase in speed obtained by using the multiprocessor. Typically use best sequential algorithm for single processor system. Underlying algorithm for parallel implementation might be (and is usually) different.
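For illustration only (not part of the original slides), the following sketch times a simple array-sum kernel once on a single thread and once with OpenMP, then reports S(p) = ts/tp. The kernel, the problem size N, and the use of OpenMP are assumptions chosen just to show how ts and tp might be measured.

    /* speedup.c - illustrative measurement of S(p) = ts / tp.
       Assumed build command: gcc -O2 -fopenmp speedup.c -o speedup */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 20000000L   /* array size; arbitrary illustrative value */

    int main(void) {
        double *a = malloc(N * sizeof(double));
        if (!a) return 1;
        for (long i = 0; i < N; i++) a[i] = 1.0;

        double start = omp_get_wtime();             /* ts: one processor */
        double sum = 0.0;
        for (long i = 0; i < N; i++) sum += a[i];
        double ts = omp_get_wtime() - start;

        start = omp_get_wtime();                    /* tp: p processors */
        double psum = 0.0;
        #pragma omp parallel for reduction(+:psum)
        for (long i = 0; i < N; i++) psum += a[i];
        double tp = omp_get_wtime() - start;

        printf("check sums: %.0f %.0f\n", sum, psum);   /* keeps both loops live */
        printf("p = %d  ts = %.4f s  tp = %.4f s  S(p) = %.2f\n",
               omp_get_max_threads(), ts, tp, ts / tp);
        free(a);
        return 0;
    }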

Parallel time complexity Can extend sequential time complexity notation to parallel computations. Speedup factor can also be cast in terms of computational steps:
S(p) = (number of computational steps using one processor) / (number of parallel computational steps with p processors)
although several factors may complicate this matter, including:
- Individual processors usually do not operate in synchronism and do not perform their steps in unison
- Additional message passing overhead
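For instance (an assumed model, not from the slides), summing n numbers takes about n - 1 steps on one processor and roughly n/p + log2 p steps with p processors (p local partial sums followed by a combining tree), ignoring the synchronism and message-passing issues listed above. The small sketch below evaluates that assumed step-count ratio:

    /* step_speedup.c - step-count speedup for summing n numbers under an
       assumed model: n/p local additions plus a log2(p)-step combining tree.
       Compile with the math library, e.g. gcc step_speedup.c -lm */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double n = 1.0e6;                        /* number of values to sum */
        for (int p = 1; p <= 64; p *= 2) {
            double seq_steps = n - 1.0;          /* sequential additions */
            double par_steps = n / p + log2(p);  /* local sums + combining tree */
            printf("p = %2d   S(p) ~ %.2f\n", p, seq_steps / par_steps);
        }
        return 0;
    }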

Maximum Speedup Maximum speedup is usually p with p processors (linear speedup). It is possible to get superlinear speedup (greater than p), but usually there is a specific reason for it, such as:
- Extra memory in the multiprocessor system
- A nondeterministic algorithm

Superlinear Speedup example - Searching (a) Searching each sub-space sequentially [figure: timeline from Start; x sub-spaces are searched in turn, each taking ts/p, and the solution is then found Δt into the next sub-space; exactly where within a sub-space the solution lies is indeterminate]

(b) Searching each sub-space in parallel [figure: all p sub-spaces are searched simultaneously and the solution is found after time Δt]
Speed-up: S(p) = (x × (ts/p) + Δt) / Δt

The worst case for the sequential search is when the solution is found in the last sub-space search; this gives the greatest benefit for the parallel version:
S(p) = ((p - 1) × (ts/p) + Δt) / Δt → ∞ as Δt tends to 0
The least advantage for the parallel version is when the solution is found in the first sub-space search of the sequential search, i.e.
S(p) = Δt / Δt = 1
The actual speed-up depends upon which sub-space holds the solution but could be extremely large.
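Putting illustrative numbers into the formula (the values below are assumptions, not from the slides): with p = 8 sub-spaces, ts/p = 10 time units per sub-space search and Δt = 1, the speed-up runs from 1 when the solution is in the first sub-space (x = 0) up to 71 when it is in the last (x = 7):

    /* search_speedup.c - evaluate S(p) = (x * (ts/p) + dt) / dt
       for made-up values of p, ts/p and dt. */
    #include <stdio.h>

    int main(void) {
        int p = 8;                /* number of sub-spaces, one per processor */
        double ts_over_p = 10.0;  /* sequential time to search one sub-space */
        double dt = 1.0;          /* time into a sub-space at which the solution sits */
        for (int x = 0; x < p; x++)   /* x = sub-spaces fully searched before the right one */
            printf("x = %d   S(p) = %.1f\n", x, (x * ts_over_p + dt) / dt);
        return 0;
    }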

Why superlinear speedup should not be possible in other cases: Without such features as a nondeterministic algorithm or additional memory in the parallel computer, if we were to get superlinear speedup we could argue that the sequential algorithm used is sub-optimal, since the parallel parts of the parallel algorithm could simply be done in sequence on a single computer for a faster sequential solution.

Maximum speedup with parts not parallelizable (usual case): Amdahl's law (circa 1967). [figure: (a) one processor - a serial section taking f·ts followed by parallelizable sections taking (1 - f)·ts; (b) multiple processors - the serial section still takes f·ts, while the parallelizable part takes (1 - f)·ts/p on p processors] Note the serial section is drawn bunched at the beginning, but in practice it may be at the beginning, at the end, and in between.

Speedup factor is given by:
S(p) = ts / (f·ts + (1 - f)·ts/p) = p / (1 + (p - 1)f)
This equation is known as Amdahl's law.
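As a quick sanity check (my sketch, not part of the slides), the program below tabulates S(p) = p / (1 + (p - 1)f) for the serial fractions used in the plot on the next slide; as p grows, each row approaches 1/f.

    /* amdahl.c - Amdahl's law: S(p) = p / (1 + (p - 1) * f) */
    #include <stdio.h>

    static double amdahl(int p, double f) { return p / (1.0 + (p - 1) * f); }

    int main(void) {
        double fs[] = { 0.20, 0.10, 0.05, 0.00 };   /* serial fractions f */
        for (int i = 0; i < 4; i++) {
            printf("f = %2.0f%%:", fs[i] * 100.0);
            for (int p = 4; p <= 20; p += 4)
                printf("   S(%2d) = %5.2f", p, amdahl(p, fs[i]));
            printf("\n");
        }
        return 0;
    }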

Speedup against number of processors [figure: S(p) plotted against the number of processors p (up to 20), with one curve each for f = 20%, 10%, 5%, and 0%]

Even with an infinite number of processors, the maximum speedup is limited to 1/f. Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors. This is a very discouraging result. Amdahl used this argument to support the design of ultra-high-speed single-processor systems in the 1960s.

Gustafson’s law Later, Gustafson (1988) described how the conclusion of Amdahl’s law might be overcome by considering the effect of increasing the problem size. He argued that when a problem is ported onto a multiprocessor system, larger problem sizes can be considered, that is, the same problem but with a larger number of data values.

Gustafson's law The starting point for Gustafson's law is the computation on the multiprocessor rather than on a single computer. In Gustafson's analysis, the parallel execution time is kept constant, which we assume to be some acceptable time to wait for the solution.

Gustafson's law* Using the same approach as for Amdahl's law:
S'(p) = (f'·tp + (1 - f')·p·tp) / tp = p - (p - 1)f'
Conclusion - almost linear increase in speedup with increasing number of processors, but the fractional sequential part f' needs to diminish with increasing problem size. Example: if f' is 5%, the scaled speedup is 19.05 with 20 processors, whereas with Amdahl's law and f = 5% the speedup is 10.26. Gustafson quotes results obtained in practice of very high speedup, close to linear, on a 1024-processor hypercube.
* See Wikipedia http://en.wikipedia.org/wiki/Gustafson%27s_law for the original derivation of Gustafson's law.
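A quick numerical check of the example above (the program is an illustrative sketch, not from the slides), comparing the scaled speedup S'(p) = p - (p - 1)f' with Amdahl's fixed-size speedup for f = f' = 5% and p = 20; it prints 19.05 and 10.26, the values quoted in the slide.

    /* gustafson.c - compare Gustafson's scaled speedup with Amdahl's law. */
    #include <stdio.h>

    int main(void) {
        double f = 0.05;   /* serial fraction: f for Amdahl, f' for Gustafson */
        int p = 20;
        double scaled = p - (p - 1) * f;          /* Gustafson: p - (p-1)f'   */
        double fixed  = p / (1.0 + (p - 1) * f);  /* Amdahl: p / (1 + (p-1)f) */
        printf("p = %d, f = %.2f:  Gustafson S'(p) = %.2f   Amdahl S(p) = %.2f\n",
               p, f, scaled, fixed);
        return 0;
    }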

The next question to answer is: how does one construct a computer system with multiple processors to achieve this speed-up?