# Performance Measurement

- Assignment?
- Timing: `#include <sys/time.h>`, then `double When() { struct timeval tp; gettimeofday(&tp, NULL); return ((double)tp.tv_sec + (double)tp.tv_usec * 1e-6); }`
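A minimal sketch of how `When()` might be used to time a region of code. The workload `sum_of_squares` and the wrapper `time_sum_of_squares` are hypothetical names introduced only for illustration; they are not part of the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, following the slide's When(). */
double When(void)
{
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
}

/* Hypothetical workload, used only to have something to time. */
double sum_of_squares(long n)
{
    double total = 0.0;
    for (long i = 0; i < n; i++)
        total += (double)i * (double)i;
    return total;
}

/* Times one call of sum_of_squares(n).
 * Stores the workload's result in *result and returns elapsed seconds. */
double time_sum_of_squares(long n, double *result)
{
    assert(result != NULL);
    double start = When();
    *result = sum_of_squares(n);
    return When() - start;
}
```

Note that `gettimeofday` reports wall-clock time, so the measurement includes anything else the machine was doing, and very short regions may measure as zero.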

- Paper Schedule
  - 22 students
  - 6 days
  - Look at the schedule and email me your preference. Quickly.

A Quantitative Basis for Design

- Parallel programming is an optimization problem.
- Must take into account several factors:
  - execution time
  - scalability
  - efficiency
- Must also take into account the costs:
  - memory requirements
  - implementation costs
  - maintenance costs, etc.
- Mathematical performance models are used to assess these costs and predict performance.

Defining Performance

- How do you define parallel performance?
- What do you define it in terms of?
- Consider:
  - Distributed databases
  - Image processing pipeline
  - Nuclear weapons testbed

Amdahl's Law

- Every algorithm has a sequential component.
- The sequential component limits speedup.
- With sequential component $s$, the maximum speedup is $S_{\max} = 1/s$.

Amdahl's Law [Figure: speedup plotted against the sequential component s]
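The slide gives only the limiting bound. A short sketch, assuming the usual finite-processor form of Amdahl's Law, $S(p) = 1 / (s + (1 - s)/p)$, which the slides do not state explicitly; the function names are hypothetical.

```c
#include <assert.h>

/* Speedup on p processors when a fraction s of the work is sequential.
 * Assumes the standard form S(p) = 1 / (s + (1 - s) / p). */
double amdahl_speedup(double s, int p)
{
    assert(s > 0.0 && s <= 1.0);
    assert(p >= 1);
    return 1.0 / (s + (1.0 - s) / (double)p);
}

/* Upper bound on speedup as p grows without limit: 1/s. */
double amdahl_max_speedup(double s)
{
    assert(s > 0.0 && s <= 1.0);
    return 1.0 / s;
}
```

For any finite `p`, `amdahl_speedup(s, p)` stays below `amdahl_max_speedup(s)`, which is the ceiling the slide refers to.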

What's wrong?

- Works fine for a given algorithm.
  - But what if we change the algorithm?
- We may change algorithms to increase parallelism and thus eventually increase performance.
  - May introduce inefficiency

Metrics for Performance

- Speedup
- Efficiency
- Scalability
- Others ...

Speedup

$S = \frac{\mathrm{Speed}_P}{\mathrm{Speed}_1}$

- What is Speed?
- What algorithm for $\mathrm{Speed}_1$?
- What is the work performed?
- How much work?

Two kinds of Speedup

- Relative
  - Uses parallel algorithm on 1 processor
  - Most common
- Absolute
  - Uses best known serial algorithm
  - Eliminates overheads in calculation.

Speedup

- Algorithm A
  - Serial execution time is 10 sec.
  - Parallel execution time is 2 sec.
- Algorithm B
  - Serial execution time is 2 sec.
  - Parallel execution time is 1 sec.
- What if I told you A = B?
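The numbers on this slide can be checked with a one-line helper. This is only a sketch: `speedup` is a hypothetical name, and the last comment rests on one possible reading of the slide's question, namely that A and B solve the same problem.

```c
#include <assert.h>

/* Speedup as the ratio of a serial time to a parallel time (both in seconds). */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Slide values:
 *   speedup(10.0, 2.0) is 5 for Algorithm A
 *   speedup(2.0, 1.0)  is 2 for Algorithm B
 * Assumption (one reading of "A = B"): if both solve the same problem,
 * the best serial time is 2 sec, so A measured against it gives
 * speedup(2.0, 2.0), which is 1. */
```

Under that reading, A reports the larger speedup while B finishes sooner, which is why the choice of serial baseline matters.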

Efficiency

$E = \frac{S}{p}$

The fraction of time a processor spends doing useful work

Cost (Processor-Time Product)

$C = p \, T_p$, where $p$ = # processors

$E = \frac{T_s}{C}$
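A small sketch of the efficiency and cost definitions, showing that the two expressions for efficiency agree. The function names are hypothetical, and $T_s$ and $T_p$ are taken to be serial and parallel execution times in seconds.

```c
#include <assert.h>

/* Efficiency from speedup: E = S / p. */
double efficiency(double speedup, int p)
{
    assert(p >= 1);
    return speedup / (double)p;
}

/* Cost (processor-time product): C = p * T_p. */
double cost(int p, double t_parallel)
{
    assert(p >= 1 && t_parallel > 0.0);
    return (double)p * t_parallel;
}

/* Efficiency from cost: E = T_s / C. */
double efficiency_from_cost(double t_serial, double c)
{
    assert(c > 0.0);
    return t_serial / c;
}
```

Since $S = T_s / T_p$, both routes give $E = T_s / (p \, T_p)$; for example, with made-up values $T_s = 12$, $T_p = 2$, $p = 8$, each yields 0.75.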

Performance Measurement

- Algorithm X achieved speedup of 10.8 on 12 processors.
  - What is wrong?
- A single point of reference is not enough!
- What about asymptotic analysis?

Performance Measurement

- There is no perfect way to measure and report performance.
- Wall clock time seems to be the best.
- But how much work do you do?
- Best bet:
  - Develop a model that fits experimental results.

Parallel Programming Steps

- Develop algorithm
- Develop a model to predict performance
- If the performance looks OK, then code
- Check actual performance against model
- Report the performance

Performance Evaluation

- Identify the data
  - Execution time
  - Be sure to examine a range of data points
- Design the experiments to obtain the data
  - Make sure the experiment measures what you intend to measure.
  - Remember: execution time is the max time taken.
  - Repeat your experiments many times
  - Validate data by designing a model
- Report data
  - Report all information that affects execution
  - Results should be separate from conclusions
  - Present the data in an easily understandable format.
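Two of the experiment-design points (execution time is the max time taken; repeat many times) can be sketched as follows. The names and the row-major layout of the timing array are assumptions made for illustration, and reporting the mean over runs is just one possible summary.

```c
#include <assert.h>
#include <stddef.h>

/* Execution time of one run: the slowest process determines it. */
double run_time(const double *proc_times, size_t nprocs)
{
    assert(proc_times != NULL && nprocs > 0);
    double max = proc_times[0];
    for (size_t i = 1; i < nprocs; i++)
        if (proc_times[i] > max)
            max = proc_times[i];
    return max;
}

/* Mean execution time over repeated runs.
 * times holds nruns rows of nprocs per-process times, row-major. */
double mean_run_time(const double *times, size_t nruns, size_t nprocs)
{
    assert(times != NULL && nruns > 0 && nprocs > 0);
    double sum = 0.0;
    for (size_t r = 0; r < nruns; r++)
        sum += run_time(times + r * nprocs, nprocs);
    return sum / (double)nruns;
}
```

Taking the max within each run before averaging across runs keeps the slowest process visible; averaging per-process times first would hide load imbalance.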

Finite Difference Example

- Finite difference code
- 512 x 512 x 5 elements
- 16 IBM RS6000 workstations
- Connected via Ethernet

Finite Difference Model

- Execution time
  - ExTime = (Tcomp + Tcomm)/P
- Communication time
  - Tcomm = 2*lat + 4*bw*n*z
- Computation time
  - Estimate using some sample runs
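The model above translates directly into code. In this sketch the function names are hypothetical, and `bw` is assumed to be a per-element transfer time in seconds (the slides do not give units); `tcomp` would come from sample runs, as the slide suggests.

```c
#include <assert.h>

/* Communication time from the slide: Tcomm = 2*lat + 4*bw*n*z.
 * Assumption: lat is in seconds and bw is seconds per element. */
double fd_tcomm(double lat, double bw, int n, int z)
{
    assert(lat >= 0.0 && bw >= 0.0 && n > 0 && z > 0);
    return 2.0 * lat + 4.0 * bw * (double)n * (double)z;
}

/* Execution time from the slide: ExTime = (Tcomp + Tcomm) / P. */
double fd_extime(double tcomp, double tcomm, int p)
{
    assert(tcomp >= 0.0 && tcomm >= 0.0 && p > 0);
    return (tcomp + tcomm) / (double)p;
}
```

For the example problem one would call `fd_tcomm(lat, bw, 512, 5)` with measured `lat` and `bw`, then evaluate `fd_extime` for each processor count of interest and compare against measured times.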

Estimated Performance

Finite Difference Example

What was wrong?

- Ethernet
- Change the computation of Tcomm
  - Reduce the bandwidth
  - Tcomm = 2*lat + 4*bw*n*z*P/2
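A sketch of the revised communication term. The function name is hypothetical, `bw` is again assumed to be seconds per element, and reading the factor P/2 as contention on a shared Ethernet is an interpretation rather than something the slides spell out.

```c
#include <assert.h>

/* Revised communication time from the slide:
 *   Tcomm = 2*lat + 4*bw*n*z*P/2
 * Assumption: the P/2 factor models reduced effective bandwidth as
 * more processors share the same Ethernet segment. */
double fd_tcomm_shared(double lat, double bw, int n, int z, int p)
{
    assert(lat >= 0.0 && bw >= 0.0 && n > 0 && z > 0 && p > 0);
    return 2.0 * lat
         + 4.0 * bw * (double)n * (double)z * (double)p / 2.0;
}
```

With `p = 2` this reduces to the original 2*lat + 4*bw*n*z, and the bandwidth term then grows linearly with the number of processors.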

Finite Difference Example
