The Vampir Performance Analysis Tool
Hans-Christian Hoppe
Gesellschaft für Parallele Anwendungen und Systeme mbH
Pallas GmbH
Hermülheimer Straße 10
D-50321 Brühl, Germany
info@pallas.com
http://www.pallas.com
SCICOMP 2000 Tutorial, San Diego

© Pallas GmbH
Why performance tools?
CPUs and interconnects are getting faster all the time
Compilers are improving
“Abundance of computing power”
Shouldn’t it be sufficient to just write an application and let the system do the rest?

Why performance tools?
In reality, there remain severe performance bottlenecks
  – slow memory access (instructions and data)
  – cache consistency effects
  – starvation of instruction units
  – contention of interconnection systems
  – adverse interaction with schedulers

Why performance tools?
The application programmer does the rest
  – excessive sequential sections
  – bad load balance
  – non-optimized communication patterns
  – excessive synchronization
Performance analysis tools can
  – help to diagnose system-level performance problems
  – help to identify user-level performance bottlenecks
  – assist the users in improving their applications

Performance aspects
Sequential performance
  – Optimize memory accesses
  – Optimize instruction sequences
Parallel performance
  – Minimize sequential sections and replicated work
  – Optimize load balance and communication
  – Reduce synchronization
Parallel correctness
  – Analyze results
  – Analyze execution traces
  – Compare parallel vs. sequential code

Kinds of performance tools
Sequential performance
  – Profiling tools
  – Compiler- and hardware-specific
Parallel performance
  – Static code analysis
  – Automatic parallelisation
  – Counter-based profiling tools
  – Event tracing tools (analysis, prediction)
Parallel correctness
  – Static code analysis tools
  – Trace-based verification

Vendor-specific vs. portable tools
Vendor-specific tools
  – Superior support for platform specifics
  – Proprietary data formats, APIs and user interfaces
  – Very useful for sequential optimizations, vendor-specific parallel models
Portable tools
  – Concentrate on (portable) programming model
  – Open data formats and APIs
  – Useful for parallel optimizations, portable parallel models
Examples
  – Guide (counter-based profiling)
  – Vampir, Dimemas, jumpshot (event trace analysis)
  – Assure (trace-based code verification)

Performance tools – goals?
Holy grail
  – Automatic parallelisation and optimization
  – One code version for sequential and parallel
  – One code version for all platforms
  – Automatic code verification
  – Automatic performance verification
  – Automatic detection of performance problems
  – Integration of performance analysis and parallelisation

Performance tools – reality?
Open problems
  – Limited capabilities of automatic parallelisation
  – Performance portability
      portable sequential optimizations
      portable parallel optimizations
  – Code version maintenance
  – Verification of MPI applications
  – Scaling to large, hierarchical systems

MPI performance specifics
Static SPMD model, weak synchronization
No sequential sections: work is replicated or sequential communication patterns are used
Data distribution defined by communication
Work distribution determined by data distribution
Explicit communication and synchronization
Optimization areas
  – Load balancing (tune data distribution)
  – Parallelize replicated work
  – Tune communication patterns
  – Reduce synchronization

Event-based MPI analysis
Record trace of application execution
  – Calls to MPI and user routines
  – MPI communication events
  – Source locations
  – Values of performance registers or program variables
From a trace, a performance analysis tool can show
  – Protocol of execution over time
  – Statistics for MPI routine execution
  – Statistics for communication
  – Dynamic calling tree
Important advantage
  – Focus on any phase of the execution
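As a rough illustration of how routine statistics can be derived from an event trace, the sketch below pairs enter and leave events per process and sums the time spent in each routine, optionally restricted to a time interval. The record layout `(time, rank, kind, routine)` and the function name are assumptions made for this example; they are not the Vampirtrace trace format or API.

```python
from collections import defaultdict

# Hypothetical trace records: (timestamp_seconds, process_rank, kind, routine).
# This layout is an assumption for illustration, not the Vampirtrace format.
TRACE = [
    (0.0, 0, "enter", "MPI_Send"),
    (0.2, 0, "leave", "MPI_Send"),
    (0.0, 1, "enter", "MPI_Recv"),
    (0.5, 1, "leave", "MPI_Recv"),
    (1.0, 0, "enter", "MPI_Send"),
    (1.1, 0, "leave", "MPI_Send"),
]


def routine_statistics(trace, t_start=float("-inf"), t_end=float("inf")):
    """Return {routine: (call_count, total_time)}.

    Only calls that lie entirely within [t_start, t_end] are counted,
    which mimics focusing on one phase of the execution.
    """
    stacks = defaultdict(list)            # per-rank stack of open calls
    stats = defaultdict(lambda: [0, 0.0])
    for time, rank, kind, routine in sorted(trace, key=lambda r: r[0]):
        if kind == "enter":
            stacks[rank].append((routine, time))
        else:
            name, entered = stacks[rank].pop()
            if entered >= t_start and time <= t_end:
                stats[name][0] += 1
                stats[name][1] += time - entered
    return {name: (count, total) for name, (count, total) in stats.items()}


if __name__ == "__main__":
    print(routine_statistics(TRACE))
    print(routine_statistics(TRACE, t_start=0.9))
```

Calling `routine_statistics(TRACE, t_start=0.9)` restricts the statistics to the later part of the run, which is the kind of focusing a counter-based profile cannot offer after the fact.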

Vampirtrace details
Vampirtrace™
  – Instrumentation library producing traces for Vampir and Dimemas
  – Supports MPI-1 (incl. collective operations) and MPI-I/O
  – Exploits MPI profiling interface
  – Works with vendors' MPI implementations
  – API for user-level instrumentation
  – Capability to filter for event subsets
Developed, productized and marketed by Pallas
Available for IBM SP, PE 3.x

Vampir details
Vampir™
  – Event-trace visualization tool
  – Analyzes MPI and user routines
  – Analyzes point-to-point, collective and MPI-IO operations
  – Focus on arbitrary execution phases
  – Execution and communication statistics
  – Filter processes, messages, and user/MPI routines
Jointly developed by TU Dresden and Pallas
Productized and marketed by Pallas
Available for IBM RS6000, AIX 4.2/AIX 4.3

Dimemas details
Dimemas
  – Event-based performance prediction tool
  – Parameterized machine model
      CPU performance
      Communication and network performance
  – Predicts performance on modeled platform
  – What-if analysis determines influence of parameters
Jointly developed by UPC Barcelona and Pallas
Productized and marketed by Pallas
Available for IBM RS6000, AIX 4.2/AIX 4.3
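To make the idea of a parameterized machine model concrete, here is a minimal sketch of a what-if calculation. It assumes the textbook linear cost model (latency plus message size divided by bandwidth) and a single CPU speed factor; this is an illustration of the concept only and is not claimed to be the model Dimemas uses. All function and parameter names are hypothetical.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model for one message: startup latency plus transmission time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def predicted_runtime(compute_s, message_sizes, cpu_speedup,
                      latency_s, bandwidth_bytes_per_s):
    """Predict runtime of one process on a modeled machine.

    compute_s      measured computation time on the reference machine
    message_sizes  sizes in bytes of the messages taken from the trace
    cpu_speedup    relative CPU speed of the modeled machine (1.0 = same)
    Assumes computation and communication do not overlap.
    """
    communication = sum(
        transfer_time(size, latency_s, bandwidth_bytes_per_s)
        for size in message_sizes
    )
    return compute_s / cpu_speedup + communication


if __name__ == "__main__":
    messages = [1_000_000] * 4
    # What-if: how much does doubling the CPU speed help?
    for speedup in (1.0, 2.0):
        print(speedup, predicted_runtime(10.0, messages, speedup, 1e-4, 1e8))
```

Varying one parameter at a time while keeping the traced message sizes fixed shows which parameter the predicted runtime is most sensitive to.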

Vampirtrace options
Filter events for
  – Processes
  – Time interval or record count
  – Event type
Instrumentation (user routines, counters)
  – portable: by hand
  – some platforms (Fujitsu, Hitachi, NEC): by compiler
Limit memory use
  – Spill data to disk, store all events
  – Only store n first/last events
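The filtering and memory-limiting options listed above can be sketched in a few lines. The event layout `(time, rank, kind)` and all function names are hypothetical and only illustrate the idea; they do not reflect Vampirtrace's configuration syntax or API.

```python
from collections import deque
from itertools import islice


def filter_events(events, ranks=None, kinds=None,
                  t_start=float("-inf"), t_end=float("inf")):
    """Yield only events matching the process, event-type and time filters."""
    for event in events:
        time, rank, kind = event[0], event[1], event[2]
        if ranks is not None and rank not in ranks:
            continue
        if kinds is not None and kind not in kinds:
            continue
        if t_start <= time <= t_end:
            yield event


def keep_first(events, n):
    """Store only the first n events."""
    return list(islice(events, n))


def keep_last(events, n):
    """Store only the last n events using a bounded buffer."""
    return list(deque(events, maxlen=n))


if __name__ == "__main__":
    events = [(float(i), i % 2, "send" if i % 3 == 0 else "recv")
              for i in range(10)]
    print(list(filter_events(events, ranks={0}, kinds={"send"})))
    print(keep_first(events, 3))
    print(keep_last(events, 2))
```

The bounded `deque` never holds more than `n` events, so memory use stays constant no matter how long the run is.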

Vampir main window
[Figure: Vampir 2.5 main window]
Tracefile loading can be interrupted at any time
Tracefile loading can be resumed
Tracefile can be loaded starting at a specified time offset
Tracefile can be re-written

Vampir state model
User specifies activities and symbol grouping
Look at all/any activities or all symbols
[Figure: summary chart of activities and symbols; labels shown: Calculation, Tracing, MPI, MPI_Send, MPI_Recv, MPI_Wait, ssor, exchange]

Collective operations statistics
Statistics for collective operations:
  – operation counts, bytes sent/received
  – transmission rates
Filter for collective operation
[Figure: two statistics displays, "MPI_Gather only" and "All collective operations"]

Focus on a time interval
Choose a time interval by zooming in the timeline display
Enable the Show Timeline Portion option
All statistics windows are updated for the selected interval
Use to focus on one application phase or iteration!

Compare traces
Compare profiling information
  – To check load balance (between processes)
  – To evaluate scalability (different runs)
  – To look at optimization effects (different code versions)
[Figure: "Compare processes 6 and 19", "Comparison by routine"]
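A comparison of profiles boils down to simple arithmetic on per-process or per-routine times. The sketch below uses one common load-imbalance measure (maximum time over mean time) and a per-routine difference; both functions and the sample numbers are hypothetical and are not taken from Vampir.

```python
from statistics import mean


def load_imbalance(time_per_process):
    """Ratio of the slowest process to the average; 1.0 means perfect balance."""
    times = list(time_per_process.values())
    return max(times) / mean(times)


def compare_by_routine(profile_a, profile_b):
    """Per-routine time difference (b minus a) between two profiles.

    Routines missing from one profile are treated as taking zero time.
    """
    routines = sorted(set(profile_a) | set(profile_b))
    return {r: profile_b.get(r, 0.0) - profile_a.get(r, 0.0) for r in routines}


if __name__ == "__main__":
    # Made-up times in seconds for two processes of the same run.
    print(load_imbalance({6: 4.0, 19: 6.0}))
    print(compare_by_routine({"MPI_Send": 1.0, "solve": 3.0},
                             {"MPI_Send": 2.5, "MPI_Wait": 0.5}))
```

The same `compare_by_routine` call works for two processes of one run, two runs at different process counts, or two code versions, matching the three uses listed above.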

Vampir/Vampirtrace roadmap
Ongoing developments
  – Scalability enhancements
  – Functionality enhancements
  – Instrumentation enhancements
Will be first available commercially on NEC and Compaq platforms
  – Earth simulator
  – ASCI machines
PathForward developments for ASCI machines

Scalability challenges
Scalability in processor count
  – ASCI-class machines have 1000s of processors
  – High-end systems have 100s of processors
  – Applications use most of them
Scalability in time
  – Need to analyze actual production runs (hours/days)
Scalability in detail
  – Record and analyze system-specific performance data
  – Support for threaded and hybrid models

Scalability problems
Counter-based profiling tools are basically OK
  – Severely limited in the level of detail
  – Can’t focus on parts of the application run
Event-based tools have problems
  – Event traces get really large
  – Display tools use huge amounts of memory
  – Many displays do not scale
Example: Vampir tracefiles for NAS NPB-LU
  – 128 processes: 3,000,000 records (120 Mbyte)
  – 256 processes: 15,000,000 records (600 Mbyte)
  – 512 processes: 150,000,000 records (6 Gbyte)
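The NPB-LU figures can be checked with a little arithmetic. The sketch below divides tracefile size by record count and computes how the record count grows from one process count to the next. It assumes that "Mbyte" and "Gbyte" mean decimal units (10^6 and 10^9 bytes); under that assumption every row works out to 40 bytes per record, while the record count grows 5x and then 10x for each doubling of the process count.

```python
# Figures quoted on the slide: processes -> (records, tracefile size in bytes).
# Assumption: "Mbyte" and "Gbyte" are decimal units (10**6 and 10**9 bytes).
NPB_LU_TRACES = {
    128: (3_000_000, 120 * 10**6),
    256: (15_000_000, 600 * 10**6),
    512: (150_000_000, 6 * 10**9),
}


def bytes_per_record(records, size_bytes):
    """Average size of one trace record."""
    return size_bytes / records


def growth_factors(traces):
    """Record-count growth between consecutive process counts."""
    counts = sorted(traces)
    return {
        (small, large): traces[large][0] / traces[small][0]
        for small, large in zip(counts, counts[1:])
    }


if __name__ == "__main__":
    for processes, (records, size_bytes) in sorted(NPB_LU_TRACES.items()):
        print(processes, bytes_per_record(records, size_bytes))
    print(growth_factors(NPB_LU_TRACES))
```

Since the record size is constant, the tracefile growth comes entirely from the number of events, which rises much faster than the process count.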

Threaded programming models
Enhance Vampir to display
  – Thread fork/join
  – Thread synchronization
  – Show a timeline per thread / aggregate threads into single timeline
  – Display subroutine/code block execution for each thread
Create instrumentation library for thread packages
Integrate instrumentation capability into OpenMP systems

Cluster timeline display
Display node-level information
Show communication volume within nodes
Show communication between nodes as usual
Allow nodes to be expanded into processes
There may be more than two hierarchy levels...
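Aggregating process-level messages to node level is a simple grouping step. The sketch below splits traffic into a per-node volume for messages that stay inside a node and a per-node-pair volume for messages that cross nodes. The message layout `(sender_rank, receiver_rank, bytes)` and the rank-to-node mapping are assumptions made for this example.

```python
from collections import defaultdict


def node_traffic(messages, node_of):
    """Aggregate process-level messages to node level.

    messages  iterable of (sender_rank, receiver_rank, size_bytes)
    node_of   mapping from process rank to node name
    Returns (intra, inter): bytes exchanged within each node, and bytes
    sent between each ordered pair of distinct nodes.
    """
    intra = defaultdict(int)
    inter = defaultdict(int)
    for sender, receiver, size in messages:
        src, dst = node_of[sender], node_of[receiver]
        if src == dst:
            intra[src] += size
        else:
            inter[(src, dst)] += size
    return dict(intra), dict(inter)


if __name__ == "__main__":
    node_of = {0: "n0", 1: "n0", 2: "n1", 3: "n1"}
    messages = [(0, 1, 100), (1, 0, 50), (0, 2, 10), (3, 1, 20), (2, 3, 5)]
    print(node_traffic(messages, node_of))
```

Applying the same function again with a node-to-cabinet mapping would give the next hierarchy level.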

Structured tracefile format
Subdivide the tracefile into frames
  – Time intervals, thread/process/node subsets
Put frame data
  – All in one file (as today)
  – In multiple files (one per frame...)
  – On a parallel filesystem (exploit parallelism)
Frame index file holds
  – Location of frame start/end
  – Frame statistic data for immediate display
  – “Frame thumbnail”
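One way to picture the frame index is as a small list of entries, each recording where a frame's data lives and a few precomputed statistics. The sketch below builds such an index from events using fixed-length time frames and looks up the frames overlapping a time interval. The field names, the `frame-<k>.dat` naming, and the event layout `(time, routine, duration)` are all hypothetical; the actual structured tracefile format is not specified in this tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class FrameEntry:
    t_start: float
    t_end: float
    location: str                      # hypothetical: file holding the frame data
    event_count: int = 0
    time_per_routine: dict = field(default_factory=dict)


def build_index(events, frame_length):
    """Group (time, routine, duration) events into fixed-length time frames."""
    frames = {}
    for time, routine, duration in events:
        k = int(time // frame_length)
        entry = frames.setdefault(
            k,
            FrameEntry(k * frame_length, (k + 1) * frame_length, f"frame-{k}.dat"),
        )
        entry.event_count += 1
        entry.time_per_routine[routine] = (
            entry.time_per_routine.get(routine, 0.0) + duration
        )
    return [frames[k] for k in sorted(frames)]


def frames_overlapping(index, t_start, t_end):
    """Return index entries whose time range overlaps [t_start, t_end]."""
    return [f for f in index if f.t_start < t_end and f.t_end > t_start]


if __name__ == "__main__":
    events = [(0.5, "MPI_Send", 0.1), (1.5, "MPI_Recv", 0.2),
              (1.7, "MPI_Send", 0.1), (3.2, "MPI_Wait", 0.3)]
    index = build_index(events, frame_length=1.0)
    for entry in frames_overlapping(index, 1.2, 1.8):
        print(entry.location, entry.event_count, entry.time_per_routine)
```

A viewer would read only the index at startup, show the per-frame statistics immediately, and open a frame's data file only when the user asks for that frame.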

Structured tracefile format
Vampir loads the frame index
Displays immediately available
  – Global profiling/communication statistics
  – By-frame profiling/communication statistics
  – Thumbnail timeline
User gets overview of application run
  – Can load particular frame data
  – Can navigate between frames
User can refine instrumentation/tracing
  – Get detailed trace of interesting frames

Dynamic tracing control
What can be controlled
  – Definition of frames
  – Data to be recorded per frame
Control methods
  – Instrumentation with Vampirtrace API
  – Binary instrumentation (atom) or use of a debugger
  – Configuration file
  – Interactive control agent (debugger)
Tracing the right data is an iterative process!

Cluster timeline display
For very large systems, the complete system still cannot be displayed at once (too many nodes)
Display “interesting” nodes only
  – Regarding communication volume/delays
  – Regarding load imbalance
  – Regarding execution times of particular code modules
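Selecting the "interesting" nodes can be thought of as ranking nodes by some metric and keeping the top few. The sketch below does this for an arbitrary per-node metric such as communication volume or time spent in one code module. The function name and the sample values are made up for illustration.

```python
import heapq


def interesting_nodes(metric_per_node, k):
    """Return the k nodes with the largest metric value, largest first."""
    return heapq.nlargest(k, metric_per_node, key=metric_per_node.get)


if __name__ == "__main__":
    # Made-up communication delays in seconds per node.
    delays = {"n0": 5.0, "n1": 9.0, "n2": 1.0, "n3": 7.0}
    print(interesting_nodes(delays, 2))
```

`heapq.nlargest` avoids sorting every node, which matters when only a handful out of thousands are to be shown.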

Scalable Vampir structure
Scalable user interface
Scalable internals
[Figure: two components, Vampir SC and Vampir DC, exchanging data and control. Component labels: User Interaction, Trace Data Processing, Trace Data I/O, Trace Data Analysis, Display Handling, Structured Trace Data. Annotations: runs on WS, runs on parallel system, may exploit parallel FS]
