Analyzing Parallel Performance Intel Software College Introduction to Parallel Programming – Part 6.


1 Analyzing Parallel Performance Intel Software College Introduction to Parallel Programming – Part 6

2 Objectives: At the end of this module, you should be able to define speedup and efficiency, use Amdahl's Law to predict maximum speedup, and use the Karp-Flatt metric to analyze parallel program performance and predict speedup with additional processors.

Copyright © 2006, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners. Intel® Software College.

3 Speedup: Speedup is the ratio of sequential execution time to parallel execution time. For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3. [Figure: speedup curves plotted against the number of processors, with the line y = x for reference.]

4 Efficiency: Efficiency is a measure of processor utilization, computed as speedup divided by the number of processors. Example: a program that achieves a speedup of 3 on 4 CPUs has an efficiency of 3 / 4 = 75%. [Figure: efficiency curves plotted against the number of processors, with the line y = 1.0 for reference.]
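As a minimal sketch, the definitions of speedup and efficiency can be written as two small Python functions. The function and parameter names are illustrative assumptions, not part of the course material; the numbers are the ones used in the slide examples.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


def efficiency(speedup_value: float, processors: int) -> float:
    """Speedup divided by the number of processors (1.0 is ideal)."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup_value / processors


# Slide examples: 6 s sequential vs. 2 s parallel, then speedup 3 on 4 CPUs.
s = speedup(6.0, 2.0)
e = efficiency(s, 4)
print(s, e)  # 3.0 0.75
```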

5 Idea Behind Amdahl's Law: [Figure: execution time plotted against the number of processors. Each bar is the sequential portion f plus the parallel portion, which shrinks from 1-f on one processor to (1-f)/2, (1-f)/3, (1-f)/4, and (1-f)/5.] Here f is the portion of the computation that will be performed sequentially, and 1-f is the portion that will be executed in parallel.

6 Derivation of Amdahl's Law: Speedup is the ratio of execution time on 1 processor to execution time on p processors. Execution time on 1 processor is f + (1-f) = 1. Execution time on p processors is at least f + (1-f)/p. Hence the speedup is bounded by $\psi \le \frac{1}{f + (1-f)/p}$.
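The bound that follows from these two execution times, 1 / (f + (1-f)/p), can be sketched in Python as below. The function name and the sample sequential fraction f = 0.1 are assumptions chosen for illustration only.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Upper bound on speedup when a fraction f of the work is sequential."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    # Time on 1 processor is f + (1 - f) = 1; time on p is at least f + (1 - f) / p.
    return 1.0 / (f + (1.0 - f) / p)


# Hypothetical program with 10% sequential work: the bound approaches 1 / f = 10.
for p in (1, 2, 4, 8, 16):
    print(p, round(amdahl_speedup(0.1, p), 2))
```

Even with many processors, the bound never exceeds 1 / f, which is why a small sequential fraction matters so much.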

7 Amdahl's Law Is Too Optimistic: Amdahl's Law ignores parallel processing overhead. Examples of this overhead include time spent creating and terminating threads. Parallel processing overhead is usually an increasing function of the number of processors.

8 Graph with Parallel Overhead Added: [Figure: execution time plotted against the number of processors, with parallel overhead added. The overhead increases with the number of processors.]

9 Other Optimistic Assumptions: Amdahl's Law assumes that the computation divides evenly among the processors. In reality, the amount of work does not divide evenly among the processors. Processor waiting time is another form of overhead. [Figure: timeline from task start to task completion, showing each processor's working time and waiting time.]

10 Graph with Workload Imbalance Added: [Figure: execution time plotted against the number of processors, with the time lost due to workload imbalance added.]
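A small sketch may help show why imbalance costs time: the parallel phase ends only when the most heavily loaded processor finishes, so everyone else waits. The work distributions below are hypothetical numbers chosen for illustration, and the function names are assumptions.

```python
from typing import List


def parallel_phase_time(work: List[float]) -> float:
    """The parallel phase lasts as long as the busiest processor works."""
    return max(work)


def waiting_times(work: List[float]) -> List[float]:
    """Idle time of each processor while it waits for the busiest one."""
    finish = parallel_phase_time(work)
    return [finish - w for w in work]


# Same total work (12 time units) divided two ways among 4 processors.
balanced = [3.0, 3.0, 3.0, 3.0]
imbalanced = [2.0, 5.0, 3.0, 2.0]  # hypothetical uneven split

print(parallel_phase_time(balanced), sum(waiting_times(balanced)))
print(parallel_phase_time(imbalanced), sum(waiting_times(imbalanced)))
```

With the even split the phase takes 3 time units and nobody waits; with the uneven split it takes 5 units and the processors spend 8 units in total waiting.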

11 More General Speedup Formula: $\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$, where ψ(n,p) is the speedup for a problem of size n on p CPUs, σ(n) is the time spent in the sequential portion of the code for a problem of size n, φ(n) is the time spent in the parallelizable portion of the code for a problem of size n, and κ(n,p) is the parallel overhead.
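The general formula divides total sequential-program time by the parallel time including overhead. A minimal Python sketch follows; the argument names sigma, phi, and kappa are labels chosen here for sequential time, parallelizable time, and parallel overhead, and the timing values and the overhead model are hypothetical.

```python
def general_speedup(sigma: float, phi: float, kappa: float, p: int) -> float:
    """Speedup bound: (sigma + phi) / (sigma + phi / p + kappa)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return (sigma + phi) / (sigma + phi / p + kappa)


# Hypothetical program: 1 s sequential, 9 s parallelizable.
sigma, phi = 1.0, 9.0
for p in (2, 4, 8):
    overhead = 0.1 * (p - 1)  # assumed overhead model: grows with processor count
    with_overhead = general_speedup(sigma, phi, overhead, p)
    without_overhead = general_speedup(sigma, phi, 0.0, p)  # Amdahl's Law case
    print(p, round(without_overhead, 2), round(with_overhead, 2))
```

Setting the overhead argument to 0 reproduces the Amdahl bound, so the second printed column is always at least as large as the third.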

12 Amdahl's Law: Maximum Speedup: In the general bound $\psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}$, with sequential time σ(n), parallelizable time φ(n), and parallel overhead κ(n,p), the overhead term κ(n,p) is set to 0, and the term φ(n)/p assumes parallel work divides perfectly among available CPUs. Writing f = σ(n) / (σ(n) + φ(n)) gives $\psi \le \frac{1}{f + (1-f)/p}$.

13 The Amdahl Effect: As n → ∞, the terms for the parallelizable work, φ(n) and φ(n)/p, dominate the sequential and overhead terms of the speedup formula. Speedup is therefore an increasing function of problem size.

14 Illustration of the Amdahl Effect: [Figure: speedup plotted against the number of processors for n = 1,000, n = 10,000, and n = 100,000, with linear speedup shown for reference.]
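The Amdahl effect can be reproduced numerically with a toy cost model. Everything in the sketch below is an assumption made for illustration: sequential time proportional to n, parallelizable time proportional to n squared, and overhead proportional to n log2(p). The slides do not give the model behind their figure.

```python
import math


def sigma(n: int) -> float:
    """Hypothetical sequential time: grows linearly with n."""
    return float(n)


def phi(n: int) -> float:
    """Hypothetical parallelizable time: grows quadratically with n."""
    return float(n) ** 2


def kappa(n: int, p: int) -> float:
    """Hypothetical parallel overhead: grows as n * log2(p)."""
    return n * math.log2(p)


def speedup(n: int, p: int) -> float:
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


for n in (1_000, 10_000, 100_000):
    row = [round(speedup(n, p), 2) for p in (2, 4, 8, 16)]
    print(n, row)
```

Because the parallelizable term grows fastest with n, each larger problem size gives a speedup curve closer to linear.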

15 Using Amdahl's Law: A program executes in 5 seconds. A profile reveals that 80% of the time is spent in function alpha, which we can execute in parallel. What would be the maximum speedup on 2 processors? $\psi \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67$. The new execution time is 5 sec / 1.67 = 3 seconds.
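The same calculation can be done directly on execution times. In this sketch the helper name and its parameters are illustrative assumptions; the inputs (5 seconds, 80% parallelizable, 2 processors) come from the slide.

```python
def amdahl_time(total_time: float, parallel_fraction: float, p: int) -> float:
    """Best-case execution time on p processors under Amdahl's Law."""
    sequential_part = total_time * (1.0 - parallel_fraction)
    parallel_part = total_time * parallel_fraction / p
    return sequential_part + parallel_part


total = 5.0
new_time = amdahl_time(total, 0.80, 2)  # 1 s sequential + 4 s / 2
max_speedup = total / new_time
print(round(max_speedup, 2), round(new_time, 2))  # 1.67 3.0
```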

16 The Karp-Flatt Metric: Suppose we benchmark a parallel program and get these speedup figures. Why is efficiency dropping? How much speedup could we expect on 8 processors? [Table: Processors, Speedup, Efficiency. Surviving row: 4 processors, speedup 2, efficiency 50%.]

17 Deriving the Karp-Flatt Metric: The denominator of the general speedup formula, σ(n) + φ(n)/p + κ(n,p), represents parallel execution time, where σ(n) is sequential time, φ(n) is parallelizable time, and κ(n,p) is parallel overhead. One processor does sequential code while the others idle. All processors incur overhead time. Wasted time = $(p-1)\,\sigma(n) + p\,\kappa(n,p)$. The experimentally determined serial fraction e is the wasted time divided by (p-1) times the sequential execution time: $e = \frac{(p-1)\,\sigma(n) + p\,\kappa(n,p)}{(p-1)\,(\sigma(n) + \varphi(n))}$.

18 Karp-Flatt Metric: The experimentally determined serial fraction e is a function of the speedup ψ and the number of processors p: $e = \frac{1/\psi - 1/p}{1 - 1/p}$. We can use e to determine whether efficiency decreases are due to the sequential component of the computation or to increases in overhead.

19 How to Interpret e: If e is constant as the number of processors increases, then speedup is constrained by the sequential component of the computation. If e is increasing as the number of processors increases, then speedup is constrained by parallel overhead, such as thread creation/termination time, contention for shared data structures, and cache-related inefficiencies. Often the cause is a combination of the two factors.

20 Going Back to Our Example: [Table: Processors, Speedup, Efficiency, e. Surviving value: e = 0.33 in the last row.] In this case, speedup is constrained by the relatively large amount of time spent in sequential code.
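The metric itself is a one-line computation from measured speedup and processor count, using the standard Karp-Flatt formula e = (1/ψ - 1/p) / (1 - 1/p). The function name below is an assumption; the sample input is the benchmark row that survives in this transcript (speedup 2 on 4 processors).

```python
def karp_flatt(psi: float, p: int) -> float:
    """Experimentally determined serial fraction e from speedup psi on p processors."""
    if p < 2:
        raise ValueError("the metric is defined for p >= 2")
    if psi <= 0:
        raise ValueError("psi must be positive")
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


# Benchmark row from the example: speedup 2 on 4 processors.
print(round(karp_flatt(2.0, 4), 2))  # 0.33
```

Computing e for every row of a benchmark table and checking whether it stays flat or grows is how the two causes of falling efficiency are told apart.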

21 Example: Rectangle Rule Program: Benchmark data from an OpenMP program computing π using the rectangle rule. [Table: Processors, Speedup, Efficiency, e. Surviving value: e = 0.089 in the last row.] We can predict the speedup on 6 processors: extrapolating e to be 0.11, the speedup would be 3.87.

22 Speedup Prediction Formula: $\psi = \frac{1}{e + (1-e)/p}$, obtained by solving the Karp-Flatt metric $e = \frac{1/\psi - 1/p}{1 - 1/p}$ for the speedup ψ.
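Solving the Karp-Flatt metric for speedup gives ψ = 1 / (e + (1-e)/p), which can be sketched as below. The function name is an assumption; the inputs e = 0.11 and p = 6 are the extrapolated values from the rectangle rule example.

```python
def predicted_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (e + (1.0 - e) / p)


# Rectangle rule example: e extrapolated to 0.11 on 6 processors.
print(round(predicted_speedup(0.11, 6), 2))  # 3.87
```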

23 Case Study: We benchmark a sequential program and find it spends 85% of its time in functions we believe we can make parallel. We make these functions multithreaded and execute the program on a dual-core system. The parallel program achieves a speedup of 1.67 on 2 processors. If we can get access to a quad-core system, what kind of speedup should we expect?

24 Prediction Based on Amdahl's Law: With a sequential fraction f = 0.15 and p = 4 processors, $\psi \le \frac{1}{0.15 + 0.85/4} \approx 2.76$.

25 Prediction Based on Karp-Flatt Metric: When p = 2, e = 0.20. We know 0.15 of e is the sequential component. The rest of e (0.05) is parallel overhead. If parallel overhead increases linearly with the number of processors, then it will be 0.15 when p = 4. We predict that when p = 4, e = 0.30. Hence when p = 4, we predict a speedup of $\frac{1}{0.30 + 0.70/4} \approx 2.11$.
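The case-study prediction can be traced step by step in code. This is a sketch under the slide's stated assumption that overhead grows linearly, taken here to mean one equal step per added processor; the function names are illustrative. Computing e from the measured speedup of 1.67 on 2 processors gives about 0.20, which splits into 0.15 sequential work plus 0.05 overhead.

```python
def karp_flatt(psi: float, p: int) -> float:
    """Experimentally determined serial fraction."""
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


def predicted_speedup(e: float, p: int) -> float:
    """Speedup implied by serial fraction e on p processors."""
    return 1.0 / (e + (1.0 - e) / p)


sequential_fraction = 0.15            # 85% of the time is parallelizable
e2 = round(karp_flatt(1.67, 2), 2)    # measured on the dual-core system
overhead2 = e2 - sequential_fraction  # part of e not explained by sequential code


def extrapolated_e(p: int) -> float:
    """Assumed model: overhead grows by the same step for each added processor."""
    return sequential_fraction + overhead2 * (p - 1)


e4 = extrapolated_e(4)
print(round(e2, 2), round(e4, 2))                           # 0.2 0.3
print(round(predicted_speedup(e4, 4), 2))                   # 2.11
print(round(predicted_speedup(sequential_fraction, 4), 2))  # 2.76
```

The last line uses only the sequential fraction, which is the Amdahl's Law bound; the gap between 2.76 and 2.11 is the cost attributed to growing overhead.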

26 Superlinear Speedup: According to our general speedup formula, the maximum speedup a program can achieve on p processors is p. Superlinear speedup is the situation where speedup is greater than the number of processors used. It means the computational rate of the processors is faster when the parallel program is executing. Superlinear speedup usually occurs because the cache hit rate of the parallel program is higher.

27 References: Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).


