Tools for Engineering Analysis of High Performance Parallel Programs David Culler, Frederick Wong, Alan Mainwaring Computer Science Division U.C.Berkeley.


1 Tools for Engineering Analysis of High Performance Parallel Programs David Culler, Frederick Wong, Alan Mainwaring Computer Science Division U.C.Berkeley http://www.cs.berkeley.edu/~culler/talks

2 11/5/99 LLNL ASCI III
Traditional Parallel Programming Tools
Focus on showing “what the program did” and “when it did it”
– microscopic analysis of deterministic events
– oriented towards initial development of small programs on small data sets and small machines
Instrumentation
– traces, counters, profiles
Visualization
Examples
– AIMS, PTOOLS, PPP
– Pablo + Paradyn + ... => Delphi
– ACTS TAU (Tuning and Analysis Utilities)

3 Example: Pablo

4 Beyond Zeroth-Order Analysis
Basic level: get to a system design that is reasonable and behaves properly under “ideal conditions”
Subject the system to various stresses to understand its operating regime and gain deeper insight into its dynamic behavior
Combine empirical data with analytical models
Iterate from “What?” to “What if?”
[Figure: max displacement vs. wind speed]

5 Approach: Framework for Parameterized Sensitivity Analysis
The framework performs analysis over numerous runs
– statistical filtering
– vary the parameter of interest
It provides a means of combining data to isolate effects of interest => ROBUSTNESS
[Diagram: Well-developed Parallel Program; Study Parameter (Procs, Comm. perf., Cache, Scheduling, ...); Problem Data Set Generator; Instrumentation Tools; Machine Characterizers; visualization, modeling]
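
The sweep-and-filter loop can be sketched as follows. This is a minimal illustration, not the authors' framework: it assumes each run can be wrapped in a callable that takes the study-parameter value and returns an elapsed time in seconds, and both the name `sensitivity_sweep` and the choice of the median as the statistical filter are assumptions.

```python
import statistics
from typing import Callable, Dict, Iterable, List


def sensitivity_sweep(
    run: Callable[[int], float],
    values: Iterable[int],
    samples: int = 5,
) -> Dict[int, Dict[str, float]]:
    """Run `run(value)` `samples` times for each parameter value.

    Hypothetical helper: `run` stands in for launching the instrumented
    program with the study parameter (e.g. processor count) set to `value`
    and returning its elapsed time in seconds.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    summary: Dict[int, Dict[str, float]] = {}
    for value in values:
        times: List[float] = [run(value) for _ in range(samples)]
        summary[value] = {
            # The median acts as a simple statistical filter: a few runs
            # perturbed by daemons or other jobs barely move it.
            "median": statistics.median(times),
            "min": min(times),
            "max": max(times),
        }
    return summary
```

For instance, `sensitivity_sweep(run_app, [1, 2, 4, 8, 16])` (with a hypothetical `run_app`) would yield one filtered point per processor count; varying cache size or scheduling policy instead reuses the same loop.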

6 Simplest Example: Performance(P): NPB2.2 on NOW and Origin 2000 (250)

7 Where Time Is Spent (P)
Reveals basic processor and network loading (vs. P)
Basis for model derivation: comm(P)

8 Where Time Is Spent (P), cont.
Reveals basic processor and network loading (vs. P)
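
Deriving a comm(P) model from the per-P breakdown can be as simple as an ordinary least-squares fit. The sketch assumes a model of the form comm(P) = a + b·f(P), where the basis f is the analyst's choice; log2 P is only an example default, and the function name is hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple


def fit_comm_model(
    procs: Sequence[int],
    comm_times: Sequence[float],
    basis: Callable[[float], float] = math.log2,
) -> Tuple[float, float]:
    """Least-squares fit of comm(P) = a + b * basis(P); returns (a, b).

    The basis is an assumption: log2(P) suits tree-structured collectives,
    a linear basis (lambda p: p) suits all-to-all patterns, and so on.
    """
    if len(procs) != len(comm_times):
        raise ValueError("procs and comm_times must have the same length")
    xs = [basis(p) for p in procs]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(xs) / n
    mean_y = sum(comm_times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("basis values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, comm_times))
    b = sxy / sxx               # slope: cost growth per unit of basis(P)
    a = mean_y - b * mean_x     # intercept: P-independent communication cost
    return a, b
```

Comparing the residuals of a few candidate bases is one way to choose which growth law the measurements actually support.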

9 Communication Volume (P)

10 Communication Structure (P)
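
Both views can come from the same message trace. A minimal sketch, assuming a hypothetical trace record of (source rank, destination rank, bytes); the nonzero pattern of the resulting matrix gives the communication structure and its sum gives the volume.

```python
from typing import Iterable, List, Tuple

# Hypothetical trace record: (source rank, destination rank, bytes sent).
Message = Tuple[int, int, int]


def comm_matrix(messages: Iterable[Message], nprocs: int) -> List[List[int]]:
    """Aggregate point-to-point messages into a P x P byte-count matrix.

    Entry [src][dst] holds the total bytes that rank src sent to rank dst.
    """
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix


def total_volume(matrix: List[List[int]]) -> int:
    """Total bytes communicated by all ranks."""
    return sum(sum(row) for row in matrix)


def bytes_sent_per_rank(matrix: List[List[int]]) -> List[int]:
    """Per-rank send volume, useful for spotting load imbalance."""
    return [sum(row) for row in matrix]
```

Recomputing the matrix for each P in a sweep shows how both volume and pattern change with machine size.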

11 Understanding Efficiency (P, M)
Want to understand both what load the program places on the system and how well the system handles that load
=> characterize the capability of the system via simple benchmarks (rather than advertised peaks)
=> combine with the measured load for a predictive model, and compare

12 Communication Efficiency
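
Combining benchmarked capability with measured load might look like the sketch below. It assumes a simple linear cost model (per-message latency plus bytes over bandwidth) with parameters taken from microbenchmarks such as a ping-pong test; the class and function names are hypothetical, and the deck does not say which model the authors used.

```python
from dataclasses import dataclass


@dataclass
class MachineChar:
    """Network capability measured by simple microbenchmarks (hypothetical)."""
    latency_s: float               # per-message cost in seconds, e.g. from ping-pong
    bandwidth_bytes_per_s: float   # sustained bandwidth from a bulk-transfer test


@dataclass
class ProgramLoad:
    """Communication load of one process, as counted by instrumentation."""
    messages: int
    bytes_sent: int


def predicted_comm_time(machine: MachineChar, load: ProgramLoad) -> float:
    """Time the load would take if the system delivered its benchmarked capability.

    Assumed linear model: messages * latency + bytes / bandwidth.
    """
    return (load.messages * machine.latency_s
            + load.bytes_sent / machine.bandwidth_bytes_per_s)


def comm_efficiency(machine: MachineChar, load: ProgramLoad, measured_s: float) -> float:
    """Ratio of predicted to measured communication time.

    Values well below 1 mean the system handles this program's load worse
    than the microbenchmarks suggest (e.g. contention or poor overlap).
    """
    if measured_s <= 0:
        raise ValueError("measured time must be positive")
    return predicted_comm_time(machine, load) / measured_s
```

Plotting this ratio against P separates "the program asks for a lot" from "the system delivers poorly".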

13 Tools => Improvements in Run Time
Efficiency analysis (vs. parameters) gives insight into where to improve the system or the program
– use traditional profiling to see where in the program the ‘bad stuff’ happens
– or go back and tune the system to do better

14 Cache Behavior (P, $)
Combining trace generation with simulation provides new structural insight
Here: clear knees in the program working set ($); these shift with machine size (P)

15 Cache Behavior (P, $)
Clear knees in the program working set ($), not affected by P
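
The knees can be found from a single address trace with Mattson's LRU stack-distance algorithm, a standard technique for miss rate versus cache size. The deck does not say how its simulator works; this sketch assumes a fully associative LRU cache, and the 64-byte block size is an arbitrary choice.

```python
from typing import Dict, Iterable, List, Sequence


def stack_distances(trace: Iterable[int], block_size: int = 64) -> List[float]:
    """LRU stack distance of each reference (inf on first touch)."""
    stack: List[int] = []   # blocks ordered from most to least recently used
    dists: List[float] = []
    for addr in trace:
        block = addr // block_size
        if block in stack:
            depth = stack.index(block)   # distinct blocks used since last touch
            stack.pop(depth)
            dists.append(depth)
        else:
            dists.append(float("inf"))   # cold miss
        stack.insert(0, block)
    return dists


def miss_ratio_curve(
    trace: Sequence[int], cache_blocks: Iterable[int], block_size: int = 64
) -> Dict[int, float]:
    """Miss ratio for each fully associative LRU cache size (in blocks).

    A reference hits in a cache of C blocks iff its stack distance is < C,
    so one pass over the trace yields the whole curve.
    """
    dists = stack_distances(trace, block_size)
    if not dists:
        raise ValueError("empty trace")
    n = len(dists)
    return {c: sum(1 for d in dists if d >= c) / n for c in cache_blocks}
```

A knee shows up as a sharp drop between adjacent sizes: a loop cycling over four blocks misses on every reference with three blocks of cache, but only on the cold references with four.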

16 Sensitivity to Multiprogramming
Parallel machines are increasingly general purpose
– multiprogramming, or at least interrupts and daemons
Many ‘ideal’ programs are very sensitive to perturbations
– message passing is loosely coupled, but the implementation may not be!

17 Tools => Improvements in Run Time
The MPI implementation spin-waits on send until the network is available (or the queue is not full), or on receive until it completes
It should use two-phase spin-block waiting
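
Two-phase waiting spins for a bounded time and then blocks, so a waiter whose partner has been descheduled stops competing for the processor. A minimal sketch using Python's `threading.Event` as the completion flag; the function name and the 50 µs default spin budget are assumptions (a common rule of thumb is to spin for roughly the cost of blocking and being woken), and Python threads only illustrate the control flow, not an MPI library's progress engine.

```python
import threading
import time


def two_phase_wait(done: threading.Event, spin_seconds: float = 50e-6) -> bool:
    """Wait until `done` is set: poll briefly, then block.

    Returns True if the event arrived during the spin phase, False if the
    caller had to block.
    """
    # Phase 1: spin. Cheapest when completion is imminent, as on a
    # dedicated machine where the partner process is running.
    deadline = time.perf_counter() + spin_seconds
    while time.perf_counter() < deadline:
        if done.is_set():
            return True
    # Phase 2: block in the OS, freeing the processor for other work
    # (for example, the very process this one is waiting on).
    done.wait()
    return False
```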

18 Sensitivity to Seemingly Unrelated Activity
The mechanism for doing parameter studies extends naturally to statistically valid data through multiple samples at each point
– runs in the wee hours tend to give crisp, fast results
Extend the study outside the app
Example: two programs on a big Origin
                           alone        together on 64 P
– 8-processor IS run:      4.71 sec      6.18 sec
– 36-processor SP run:    26.36 sec     65.28 sec
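
The effect of the co-scheduled program is easiest to compare as a slowdown ratio, paired time over standalone time, using the numbers above:

```latex
\text{slowdown} = \frac{T_{\text{together}}}{T_{\text{alone}}}, \qquad
\text{IS (8 P)}: \frac{6.18}{4.71} \approx 1.31, \qquad
\text{SP (36 P)}: \frac{65.28}{26.36} \approx 2.48
```

So the 36-processor SP run took about 2.5 times as long alongside the other program, versus about 1.3 times for the 8-processor IS run.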

19 Repeatability
The variance across repeated runs is a key result for production codes: the real world is not ideal
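
A minimal sketch of summarizing repeated runs of one configuration. The function name is hypothetical; the coefficient of variation (standard deviation over mean) is used here because it is scale-free, so variability can be compared across problem and machine sizes.

```python
import statistics
from typing import Dict, Sequence


def repeatability(times: Sequence[float]) -> Dict[str, float]:
    """Summarize run-to-run variability for one configuration."""
    if len(times) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)   # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,               # coefficient of variation
        "spread": max(times) - min(times),
    }
```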

20 Plans
Integrate our instrumentation and analysis tools with ACTS TAU
– port to the UCB Millennium environment
– experiment with ASCI platforms
Refine and complete the automated sensitivity analysis framework
Backend performance data storage
– Pablo SPPF?
Next year
– integrate performance model development and prediction

