
Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Supercomputers – David Bailey (1991)
Eileen Kraemer, August 25, 2002

1. Quote 32-bit performance results, not 64-bit results. 32-bit arithmetic is generally faster, but 64-bit arithmetic is often needed for the kinds of applications run on supercomputers.
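
A minimal sketch of how one might observe the difference, assuming a POSIX system; the kernel, array size, and constants are illustrative choices of mine, not from the slides:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + 1e-9 * ts.tv_nsec;
}

int main(void) {
    float  *a32 = malloc(N * sizeof *a32);
    double *a64 = malloc(N * sizeof *a64);
    for (long i = 0; i < N; i++) { a32[i] = 1.0f; a64[i] = 1.0; }

    /* time the same update loop in 32-bit precision... */
    double t0 = seconds();
    for (long i = 0; i < N; i++) a32[i] = 2.5f * a32[i] + 1.0f;
    double t32 = seconds() - t0;

    /* ...and in 64-bit precision */
    t0 = seconds();
    for (long i = 0; i < N; i++) a64[i] = 2.5 * a64[i] + 1.0;
    double t64 = seconds() - t0;

    /* print a value so the compiler cannot discard the loops */
    printf("32-bit: %.3fs  64-bit: %.3fs  (check: %f)\n",
           t32, t64, a32[N - 1] + a64[N - 1]);
    free(a32); free(a64);
    return 0;
}
```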

2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application. Although the application typically spends a good deal of time in the inner kernel, the kernel tends to exhibit greater parallelism than the application as a whole. Presenting kernel speedups as representative of overall application speedup is therefore misleading. See Amdahl's Law, illustrated below.
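
Amdahl's Law makes the gap concrete: if a fraction s of the run time is serial, the speedup on p processors is bounded by 1 / (s + (1 - s)/p). A small self-contained sketch, with numbers chosen purely for illustration:

```c
#include <stdio.h>

/* Amdahl's Law: serial fraction s caps the achievable speedup. */
static double amdahl(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    /* Suppose the kernel is 90% of run time and parallelizes
     * perfectly: even with unlimited processors, overall speedup
     * is capped at 10x, however fast the kernel itself scales. */
    for (int p = 2; p <= 1024; p *= 4)
        printf("p = %4d  speedup = %5.2f\n", p, amdahl(0.10, p));
    return 0;
}
```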

3. Quietly employ assembly code and other low-level language constructs. The compiler for a parallel supercomputer may not take full advantage of the system's hardware, so assembly code or other low-level constructs can make better use of it. However, the use of such constructs should be reported when providing performance results.
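
One illustration of the kind of low-level rewrite meant here – my example, assuming an x86 machine with GCC or Clang, not anything from the slides – is a plain C reduction next to a hand-written SSE-intrinsics version. If figures from the second are quoted, the hand-tuning should be disclosed:

```c
#include <immintrin.h>
#include <stdio.h>

/* Portable C version: the compiler may or may not vectorize this. */
static float sum_plain(const float *x, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++) s += x[i];
    return s;
}

/* Hand-tuned SSE version; assumes n is a multiple of 4 and x is
 * 16-byte aligned – exactly the sort of detail that belongs in the
 * fine print of any performance figures. */
static float sum_sse(const float *x, int n) {
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(x + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

int main(void) {
    float x[8] __attribute__((aligned(16))) = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%f %f\n", sum_plain(x, 8), sum_sse(x, 8));
    return 0;
}
```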

4. Scale up the problem size with the number of processors, but omit any mention of this fact. For a fixed problem size, the benefit of each additional processor drops off as overhead grows relative to the amount of computation done, so speedup is less than linear. Scaling up the problem size as processors are added improves the ratio of useful work to overhead. Failing to state how speedup was measured is misleading; see the fixed-size vs. scaled-size sketch below.
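
The two ways of measuring correspond to Amdahl's fixed-size speedup and Gustafson's scaled-size speedup; "Gustafson" is my label for the scaled formulation, as the slide does not name it. A toy comparison with an assumed serial fraction:

```c
#include <stdio.h>

int main(void) {
    double s = 0.05;  /* illustrative serial fraction */
    for (int p = 2; p <= 1024; p *= 4) {
        double fixed  = 1.0 / (s + (1.0 - s) / p);  /* Amdahl    */
        double scaled = s + (1.0 - s) * p;          /* Gustafson */
        printf("p = %4d  fixed-size: %7.2f  scaled-size: %7.2f\n",
               p, fixed, scaled);
    }
    return 0;
}
```

The scaled-size numbers grow nearly linearly with p, which is exactly why quietly switching to them flatters the results.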

5. Quote performance results projected to a full system. Such projections assume performance scales linearly with processor count, which is rarely true.
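
A toy illustration of how far a linear projection can drift from a model with even a small serial fraction; every figure below is invented:

```c
#include <stdio.h>

static double amdahl(double s, double p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double s = 0.02;           /* hypothetical serial fraction      */
    double measured64 = 50.0;  /* GFLOPS measured on 64 procs (made up) */
    for (double p = 128; p <= 1024; p *= 2) {
        double linear  = measured64 * p / 64.0;
        double modeled = measured64 * amdahl(s, p) / amdahl(s, 64.0);
        printf("p = %4.0f  linear projection: %6.1f  model: %6.1f GFLOPS\n",
               p, linear, modeled);
    }
    return 0;
}
```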

6. Compare your results against scalar, unoptimized code on Crays. You should compare your parallel code against the best known serial implementation – and likewise against the best implementation on whatever architecture you are comparing to, not a naïve or worst-case version.

7. When direct run time comparisons are required, compare with an old code on an obsolete system. The same idea as above: compare against the best available, not an outdated baseline.

8. If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best serial implementation. A parallel implementation typically performs more operations than the best serial algorithm (and its single-processor version runs slower than the serial code, due to added overhead), so counting its operations inflates the quoted rate.
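
A worked illustration of the two ways to quote the very same run; all numbers are invented:

```c
#include <stdio.h>

int main(void) {
    double serial_ops   = 1.0e9;  /* ops the best serial algorithm needs   */
    double parallel_ops = 1.4e9;  /* ops the parallel code actually does
                                     (40% redundant work, hypothetically)  */
    double runtime      = 2.0;    /* seconds, identical run either way     */

    /* honest rate uses the serial operation count; the inflated rate
     * credits the code for its own redundant arithmetic */
    printf("honest  : %7.1f MFLOPS\n", serial_ops   / runtime / 1e6);
    printf("inflated: %7.1f MFLOPS\n", parallel_ops / runtime / 1e6);
    return 0;
}
```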

9. Quote performance in terms of processor utilization, parallel speedups, or MFLOPS per dollar. Run time and MFLOPS, though likely more informative, don't make your codes look quite so impressive.
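
For instance, MFLOPS per dollar can flatter a machine that is slower on the actual job; the machines and figures here are hypothetical:

```c
#include <stdio.h>

int main(void) {
    /* machine A: fast and expensive; machine B: slow and cheap */
    double rateA = 1000.0, costA = 10.0;  /* MFLOPS, $M */
    double rateB =  300.0, costB =  1.0;

    /* B "wins" on this metric despite taking over 3x longer */
    printf("A: %6.1f MFLOPS per $M\n", rateA / costA);
    printf("B: %6.1f MFLOPS per $M\n", rateB / costB);
    return 0;
}
```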

10. Mutilate the algorithm used in the parallel implementation to match the architecture. For example, substitute an algorithm that performs more floating-point operations: the MFLOPS rate goes up even though the run time gets longer.
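
A numeric sketch of that trade, with invented figures: an efficient algorithm next to a brute-force variant that maps nicely onto the hardware, does far more arithmetic, posts a higher MFLOPS rate, and still takes longer to solve the same problem.

```c
#include <stdio.h>

int main(void) {
    /* e.g. an O(n log n) method vs. an O(n^2) rewrite that vectorizes
     * well; the operation counts and times are illustrative only */
    double smart_ops = 2.0e8, smart_time = 1.0;
    double brute_ops = 4.0e9, brute_time = 8.0;

    printf("smart: %8.1f MFLOPS in %.1f s\n",
           smart_ops / smart_time / 1e6, smart_time);
    printf("brute: %8.1f MFLOPS in %.1f s\n",
           brute_ops / brute_time / 1e6, brute_time);
    return 0;
}
```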

11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment. Again, you should be comparing "your best" to "their best".

12. If all else fails, show pretty pictures and animated videos, and don't talk about performance. … you get the idea.

