
INTEL CONFIDENTIAL
Predicting Parallel Performance
Introduction to Parallel Programming – Part 10

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Review & Objectives
Previously: designed and implemented a task decomposition solution
At the end of this part you should be able to:
– Define speedup and efficiency
– Use Amdahl's Law to predict maximum speedup

Speedup
Speedup is the ratio between sequential execution time and parallel execution time.
For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X.
[Figure: a typical speedup curve, speedup plotted against cores]

Efficiency
A measure of core utilization: speedup divided by the number of cores.
Example: a program achieves a speedup of 3 on 4 cores, so its efficiency is 3 / 4 = 75%.
[Figure: a typical efficiency curve, efficiency plotted against cores]
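These two definitions are easy to check empirically. The following is a minimal sketch (not from the original slides) that times the same array sum sequentially and with OpenMP, then reports speedup and efficiency; the workload and all names are illustrative.

  /* speedup_demo.c -- compile with: cc -fopenmp speedup_demo.c
     A sketch: times one array sum sequentially and in parallel, then
     reports speedup = t_seq / t_par and efficiency = speedup / cores. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <omp.h>

  int main(void) {
      const long n = 100000000L;               /* illustrative problem size */
      double *a = malloc(n * sizeof *a);
      if (!a) return 1;
      for (long i = 0; i < n; i++) a[i] = 1.0;

      double t0 = omp_get_wtime();             /* sequential run */
      double seq_sum = 0.0;
      for (long i = 0; i < n; i++) seq_sum += a[i];
      double t_seq = omp_get_wtime() - t0;

      t0 = omp_get_wtime();                    /* parallel run */
      double par_sum = 0.0;
      #pragma omp parallel for reduction(+:par_sum)
      for (long i = 0; i < n; i++) par_sum += a[i];
      double t_par = omp_get_wtime() - t0;

      int cores = omp_get_max_threads();
      double speedup = t_seq / t_par;
      printf("sums %.0f/%.0f, speedup %.2fX, efficiency %.0f%% on %d threads\n",
             seq_sum, par_sum, speedup, 100.0 * speedup / cores, cores);
      free(a);
      return 0;
  }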

Speedup and Efficiency: Example
Painting a picket fence:
– 30 minutes of preparation (serial)
– One minute to paint a single picket
– 30 minutes of cleanup (serial)
Thus, painting 300 pickets takes 360 minutes of serial time.

Computing Speedup

  Number of painters   Time (minutes)          Speedup
  1                    30 + 300 + 30 = 360     1.0X
  2                    30 + 150 + 30 = 210     1.7X
  10                   30 +  30 + 30 =  90     4.0X
  100                  30 +   3 + 30 =  63     5.7X
  Infinite             30 +   0 + 30 =  60     6.0X
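The table's times follow directly from the model: 30 minutes of prep, 300/p minutes of painting (rounded up) with p painters, and 30 minutes of cleanup. A short sketch (hypothetical code, not from the slides) that reproduces the rows:

  /* fence.c -- reproduces the speedup table above (a sketch).
     Time for p painters: 30 prep + ceil(300/p) painting + 30 cleanup. */
  #include <stdio.h>

  static double fence_time(int painters) {
      int rounds = (300 + painters - 1) / painters;   /* ceiling division */
      return 30.0 + rounds + 30.0;
  }

  int main(void) {
      int p[] = { 1, 2, 10, 100 };
      for (int i = 0; i < 4; i++)
          printf("%3d painters: %3.0f min, speedup %.1fX\n",
                 p[i], fence_time(p[i]), fence_time(1) / fence_time(p[i]));
      return 0;
  }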

Efficiency Example

  Number of painters   Time (minutes)   Speedup   Efficiency
  1                    360              1.0X      100%
  2                    210              1.7X      85%
  10                   90               4.0X      40%
  100                  63               5.7X      5.7%
  Infinite             60               6.0X      very low (approaches 0%)

Idea Behind Amdahl's Law
Let s be the portion of the computation that will be performed sequentially, and 1 − s the portion that will be executed in parallel.
[Figure: execution time versus cores; the sequential portion s stays constant while the parallel portion shrinks from 1 − s on one core to (1 − s)/2, (1 − s)/3, (1 − s)/4, (1 − s)/5 as cores are added]

Derivation of Amdahl's Law
Speedup is the ratio of execution time on 1 core to execution time on p cores.
Execution time on 1 core is s + (1 − s).
Execution time on p cores is at least s + (1 − s)/p.
Therefore:

  Speedup ≤ (s + (1 − s)) / (s + (1 − s)/p) = 1 / (s + (1 − s)/p)
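As a one-liner (a sketch, not library code), the bound is easy to encode:

  /* Amdahl's Law upper bound: serial fraction s, p cores.
     E.g. s = 0.05 on 8 cores gives at most about 5.9X. */
  double amdahl_max_speedup(double s, int p) {
      return 1.0 / (s + (1.0 - s) / p);
  }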

Amdahl's Law Is Too Optimistic
Amdahl's Law ignores parallel processing overhead.
Examples of this overhead include the time spent creating and terminating threads.
Parallel processing overhead is usually an increasing function of the number of cores (threads).

Graph with Parallel Overhead Added
[Figure: execution time versus cores; parallel overhead increases with the number of cores]

Other Optimistic Assumptions
Amdahl's Law assumes that the computation divides evenly among the cores.
In reality, the amount of work does not divide evenly among the cores, and core waiting time is another form of overhead.
[Figure: per-core task timeline showing task started, working time, waiting time, and task completed]

Graph with Workload Imbalance Added
[Figure: execution time versus cores, showing additional time lost to workload imbalance]

Illustration of the Amdahl Effect
[Figure: speedup versus cores for problem sizes n = 1,000, n = 10,000, and n = 100,000; larger problems track the linear-speedup line more closely]

Using Amdahl's Law
A program executes in 5 seconds. Profiling reveals that 80% of the time is spent in function alpha, which we can execute in parallel.
What would be the maximum speedup on 2 cores?
With serial fraction s = 0.2, Amdahl's Law gives speedup ≤ 1 / (0.2 + 0.8/2) = 1 / 0.6 ≈ 1.67.
New execution time ≥ 5 sec / 1.67 = 3 seconds.
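Plugging the slide's numbers into the bound (a sketch; the 5-second runtime and the 80% figure come from the slide, everything else is illustrative):

  #include <stdio.h>

  int main(void) {
      double s = 0.2;                                  /* 20% serial */
      int p = 2;
      double speedup = 1.0 / (s + (1.0 - s) / p);      /* 1/0.6 = 1.67 */
      printf("max speedup %.2fX, new time >= %.1f s\n",
             speedup, 5.0 / speedup);                  /* prints 3.0 s */
      return 0;
  }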

Superlinear Speedup
According to our general speedup formula, the maximum speedup a program can achieve on p cores is p.
Superlinear speedup is the situation where the speedup is greater than the number of cores used: the computational rate of the cores is higher while the parallel program is executing.
Superlinear speedup usually occurs because the parallel program has a higher cache hit rate: each core works on a smaller piece of the problem, so more of its working set fits in cache.

References
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).

More General Speedup Formula

  ψ(n,p) = (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))

where:
– ψ(n,p) is the speedup for a problem of size n on p cores
– σ(n) is the time spent in the sequential portion of the code for a problem of size n
– φ(n) is the time spent in the parallelizable portion of the code for a problem of size n
– κ(n,p) is the parallel overhead
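The formula maps directly to code. In this sketch, sigma, phi, and kappa stand for measured or modeled values of σ(n), φ(n), and κ(n,p); the sample numbers in the comment are illustrative only.

  /* More general speedup: (sigma + phi) / (sigma + phi/p + kappa).
     E.g. sigma = 1, phi = 9, p = 4, kappa = 0.5 gives 10/3.75 ~ 2.67X. */
  double general_speedup(double sigma, double phi, int p, double kappa) {
      return (sigma + phi) / (sigma + phi / p + kappa);
  }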

Amdahl's Law: Maximum Speedup
Setting the overhead term κ(n,p) to 0 and assuming the parallel work divides perfectly among the available cores gives the upper bound:

  ψ(n,p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)

The Amdahl Effect
As n → ∞, the φ(n) terms dominate σ(n) and κ(n,p), so speedup is an increasing function of problem size.
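One way to make this precise (a sketch, assuming σ(n) and κ(n,p) grow strictly more slowly than φ(n)): divide the numerator and denominator of the general formula by φ(n):

  \psi(n,p) = \frac{\sigma(n)/\varphi(n) + 1}{\sigma(n)/\varphi(n) + 1/p + \kappa(n,p)/\varphi(n)}
  \;\longrightarrow\; \frac{0 + 1}{0 + 1/p + 0} = p \quad \text{as } n \to \infty

So for a fixed number of cores, speedup approaches p as the problem size grows.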