1  Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version]
   Adam Leko
   HCS Research Laboratory, University of Florida
2  Summary
   - Give characteristics of existing tools to aid our design discussions:
     - Metrics (what is recorded, any hardware counters, etc.)
     - Profiled entities
     - Visualizations
   - Most information & some slides taken from the tool evaluations
   - Tools overviewed: TAU, Paradyn, MPE/Jumpshot, Dimemas/Paraver/MPITrace, mpiP,
     Dynaprof, KOJAK, Intel Cluster Tools (old Vampir/VampirTrace), Pablo, MPICL/ParaGraph
3  TAU
   Metrics recorded:
   - Two modes: profile and trace
   - Profile mode:
     - Inclusive/exclusive time spent in functions
     - Hardware counter information via PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses,
       cycles, integer/floating-point/load/store instructions executed, stalls, wall clock time, virtual time
     - Other OS timers (gettimeofday, getrusage)
     - MPI message size sent
   - Trace mode:
     - Same as profile (minus hardware counters?)
     - Message send time, message receive time, message size, message sender/recipient(?)
   Profiled entities:
   - Functions (automatic & dynamic instrumentation), loops + regions (manual instrumentation; see the sketch after this slide)
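To make the manual instrumentation entry concrete, below is a minimal sketch of timing a region with TAU's C instrumentation macros. The macro names (TAU_PROFILE_TIMER, TAU_PROFILE_START, TAU_PROFILE_STOP) come from TAU's public API; header location and build flags depend on the TAU installation, so treat the details as assumptions.

```c
#include <TAU.h>   /* assumes a TAU installation and compilation via its instrumentation flags */

int sum_of_squares(int n)
{
    /* Declare and start a timer; TAU attributes inclusive/exclusive time
       (or the selected PAPI counters) to the "sum_of_squares loop" region. */
    TAU_PROFILE_TIMER(t, "sum_of_squares loop", "", TAU_USER);
    TAU_PROFILE_START(t);

    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += i * i;

    TAU_PROFILE_STOP(t);
    return sum;
}
```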
4  TAU
   Visualizations:
   - Profile mode:
     - Text-based: pprof, which shows a summary of profile information
     - Graphical: racy (old), jracy a.k.a. ParaProf
   - Trace mode:
     - No built-in visualizations
     - Can export to CUBE (see KOJAK), Jumpshot (see MPE), and Vampir format (see Intel Cluster Tools)
5  Paradyn
   Metrics recorded:
   - Number of CPUs, number of active threads, CPU time and inclusive CPU time
   - Function calls to and by
   - Synchronization (# operations, wait time, inclusive wait time)
   - Overall communication (# messages, bytes sent and received)
   - Collective communication (# messages, bytes sent and received)
   - Point-to-point communication (# messages, bytes sent and received)
   - I/O (# operations, wait time, inclusive wait time, total bytes)
   - All metrics recorded as "time histograms" (a fixed-size data structure; see the sketch after this slide)
   Profiled entities:
   - Functions only (but includes functions linked in from existing libraries)
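The "fixed-size data structure" above is Paradyn's time histogram: a constant number of bins that are folded (and the bin width doubled) once the run outgrows them, so memory use stays bounded regardless of run length. The sketch below is an illustrative reconstruction of that idea, not Paradyn's actual code; the names and the folding policy are assumptions.

```c
#define NBINS 1000   /* fixed number of bins, chosen arbitrarily for this sketch */

typedef struct {
    double bin_width;       /* seconds of execution covered by each bin */
    double values[NBINS];   /* metric value accumulated per bin */
} time_histogram;

/* Halve the resolution: merge adjacent bins and double the bin width. */
static void fold(time_histogram *h)
{
    for (int i = 0; i < NBINS / 2; i++)
        h->values[i] = h->values[2 * i] + h->values[2 * i + 1];
    for (int i = NBINS / 2; i < NBINS; i++)
        h->values[i] = 0.0;
    h->bin_width *= 2.0;
}

/* Accumulate a metric sample taken at time t (seconds since start). */
void add_sample(time_histogram *h, double t, double value)
{
    int bin = (int)(t / h->bin_width);
    while (bin >= NBINS) {          /* ran off the end: coarsen until t fits */
        fold(h);
        bin = (int)(t / h->bin_width);
    }
    h->values[bin] += value;
}
```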
6  Paradyn
   Visualizations:
   - Time histograms
   - Tables
   - Bar charts
   - "Terrains" (3-D histograms)
7  MPE/Jumpshot
   Metrics collected:
   - MPI message send time, receive time, size, message sender/recipient
   - User-defined event entry & exit
   Profiled entities:
   - All MPI functions
   - Functions or regions via manual instrumentation and custom events (see the sketch after this slide)
   Visualization:
   - Jumpshot: timeline view (space-time diagram overlaid on a Gantt chart), histogram
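As a sketch of the "custom events" entry: MPE's logging interface lets a program define its own state (a begin/end event pair) that Jumpshot then draws as a colored bar on the timeline. The calls below are from MPE's logging API; the state name, color, and log file name are arbitrary, and linking details (e.g. -llmpe) vary by installation.

```c
#include <mpi.h>
#include <mpe.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Define a custom state from a pair of user-defined events. */
    int ev_begin = MPE_Log_get_event_number();
    int ev_end   = MPE_Log_get_event_number();
    MPE_Describe_state(ev_begin, ev_end, "solver", "red");

    MPE_Log_event(ev_begin, 0, "solver begin");
    /* ... computation that will show up as a "solver" state in Jumpshot ... */
    MPE_Log_event(ev_end, 0, "solver end");

    MPE_Finish_log("solver_run");   /* writes the trace file Jumpshot reads */
    MPI_Finalize();
    return 0;
}
```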
8  Dimemas/Paraver/MPITrace
   Metrics recorded (MPITrace):
   - All MPI functions
   - Hardware counters (2 per run, drawn from the following two lists; uses PAPI; see the sketch after this slide)
   - Counter 1:
     - Cycles
     - Issued instructions, loads, stores, store conditionals
     - Failed store conditionals
     - Decoded branches
     - Quadwords written back from scache(?)
     - Correctable scache data array errors(?)
     - Primary/secondary I-cache misses
     - Instructions mispredicted from scache way prediction table(?)
     - External interventions (cache coherency?)
     - External invalidations (cache coherency?)
     - Graduated instructions
   - Counter 2:
     - Cycles
     - Graduated instructions, loads, stores, store conditionals, floating-point instructions
     - TLB misses
     - Mispredicted branches
     - Primary/secondary data cache miss rates
     - Data mispredictions from scache way prediction table(?)
     - External intervention/invalidation (cache coherency?)
     - Store/prefetch exclusive to clean/shared block
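The hardware counters above are read through PAPI. A minimal sketch of counting two preset events (total cycles and completed instructions, roughly matching the "cycles" and "graduated instructions" entries) with PAPI's low-level API follows; actual event availability depends on the processor, and error handling is mostly omitted.

```c
#include <stdio.h>
#include <papi.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;

    PAPI_create_eventset(&evset);
    PAPI_add_event(evset, PAPI_TOT_CYC);   /* total cycles */
    PAPI_add_event(evset, PAPI_TOT_INS);   /* completed ("graduated") instructions */

    PAPI_start(evset);
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++)      /* region of interest */
        x += i * 0.5;
    PAPI_stop(evset, counts);

    printf("cycles = %lld, instructions = %lld\n", counts[0], counts[1]);
    return 0;
}
```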
9  Dimemas/Paraver/MPITrace
   Profiled entities (MPITrace):
   - All MPI functions (message start time, message end time, message size, message recipient/sender)
   - User regions/functions via manual instrumentation
   Visualization:
   - Timeline display (like Jumpshot)
     - Shows Gantt chart and messages
     - Can also overlay hardware counter information
     - Clicking on the timeline brings up a text listing of events near where you clicked
   - 1D/2D analysis modules
10  mpiP
    Metrics collected:
    - Start time, end time, and message size for each MPI call
    Profiled entities:
    - MPI function calls, intercepted via a PMPI wrapper library (see the sketch after this slide)
    Visualization:
    - Text-based output, with a graphical browser that displays the statistics inline with the source
    - Displayed information:
      - Overall time (%) for each MPI node
      - Top 20 call sites by time (MPI%, App%, variance)
      - Top 20 call sites by message size (MPI%, App%, variance)
      - Min/max/average/MPI%/App% time spent at each call site
      - Min/max/average/sum of message sizes at each call site
    - Definitions:
      - App time = wall clock time between MPI_Init and MPI_Finalize
      - MPI time = all time consumed by MPI functions
      - App% = % of the metric relative to overall app time
      - MPI% = % of the metric relative to overall MPI time
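The PMPI wrapper entry refers to the MPI profiling interface: every MPI_* routine also has a PMPI_* name, so a tool like mpiP can supply its own MPI_* definitions that record timing and message sizes before forwarding to the real implementation. The sketch below shows the interception technique only; it is not mpiP's code, and mpiP additionally attributes the data to individual call sites.

```c
#include <stdio.h>
#include <mpi.h>

/* Totals for MPI_Send; a real tool keeps per-callsite statistics. */
static double send_seconds = 0.0;
static long   send_bytes   = 0;

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    double start = MPI_Wtime();
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);  /* the real send */
    send_seconds += MPI_Wtime() - start;

    int type_size = 0;
    PMPI_Type_size(datatype, &type_size);
    send_bytes += (long)type_size * count;
    return rc;
}

int MPI_Finalize(void)
{
    fprintf(stderr, "MPI_Send total: %.3f s, %ld bytes\n", send_seconds, send_bytes);
    return PMPI_Finalize();
}
```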
11  Dynaprof
    Metrics collected:
    - Wall clock time or one PAPI metric for each profiled entity
    - Collects inclusive, exclusive, and 1-level call tree % information
    Profiled entities:
    - Functions (dynamic instrumentation)
    Visualizations:
    - Simple text-based output
    - Simple GUI (shows the same information as the text output)
12  KOJAK
    Metrics collected:
    - MPI: message start time, receive time, size, message sender/recipient
    - Manual instrumentation: start and stop times
    - 1 PAPI metric per run (only FLOPS and L1 data misses are visualized)
    Profiled entities:
    - MPI calls (MPI wrapper library)
    - Function calls (automatic instrumentation, only available on a few platforms)
    - Regions and function calls via manual instrumentation
    Visualizations:
    - Can export traces to the Vampir trace format (see ICT)
    - Shows profile and analyzed data via CUBE
13  Intel Cluster Tools (ICT)
    Metrics collected:
    - MPI functions: start time, end time, message size, message sender/recipient
    - User-defined events: counter, start & end times
    - Code location for source-code correlation
    Instrumented entities:
    - MPI functions via wrapper library
    - User functions via binary instrumentation(?)
    - User functions & regions via manual instrumentation
    Visualizations:
    - Different types: timelines, statistics & counter information
14  Pablo
    Metrics collected:
    - Time inclusive/exclusive of a function
    - Hardware counters via PAPI
    - Summary metrics computed from timing info: min/max/avg/stdev/count
    Profiled entities:
    - Functions, function calls, and outer loops
    - All selected via GUI
    Visualizations:
    - Displays derived summary metrics color-coded and inline with the source code
15  MPICL/ParaGraph
    Metrics collected:
    - MPI functions: start time, end time, message size, message sender/recipient
    - Manual instrumentation: start time, end time, "work" done (up to the user to pass this in)
    Profiled entities:
    - MPI function calls via the PMPI interface
    - User functions/regions via manual instrumentation
    Visualizations:
    - Many, separated into 4 categories: utilization, communication, task, "other"
16  ParaGraph visualizations
    Utilization visualizations:
    - Display a rough estimate of processor utilization
    - Utilization is broken down into 3 states (see the classification sketch after this slide):
      - Idle: the program is blocked waiting for a communication operation (or has stopped execution)
      - Overhead: the program is performing communication but is not blocked (time spent inside the MPI library)
      - Busy: the program is executing anything other than communication
    - "Busy" doesn't necessarily mean useful work is being done, since anything that is not communication is counted as busy
    Communication visualizations:
    - Display different aspects of communication: frequency, volume, overall pattern, etc.
    - "Distance" is computed from the topology set in the options menu
    Task visualizations:
    - Display information about when processors start & stop tasks
    - Requires manually instrumented code to identify when processors start/stop tasks
    Other visualizations:
    - Miscellaneous displays
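To make the three-state breakdown (and the caveat about "busy") explicit, here is a small illustrative classification rule; the enum and function are hypothetical, not part of ParaGraph:

```c
/* Illustrative three-way split of a time interval, per the definitions above. */
typedef enum { STATE_BUSY, STATE_OVERHEAD, STATE_IDLE } util_state;

util_state classify(int in_mpi_library, int blocked_waiting)
{
    if (blocked_waiting)
        return STATE_IDLE;       /* blocked on a communication operation */
    if (in_mpi_library)
        return STATE_OVERHEAD;   /* inside the MPI library but making progress */
    return STATE_BUSY;           /* anything else counts as "busy", useful or not */
}
```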