1 Tool Visualizations, Metrics, and Profiled Entities Overview [Brief Version]
Adam Leko
HCS Research Laboratory, University of Florida

2 Summary
Give characteristics of existing tools to aid our design discussions
   Metrics (what is recorded, any hardware counters, etc.)
   Profiled entities
   Visualizations
Most information & some slides taken from tool evaluations
Tools overviewed
   TAU
   Paradyn
   MPE/Jumpshot
   Dimemas/Paraver/MPITrace
   mpiP
   Dynaprof
   KOJAK
   Intel Cluster Tools (old Vampir/VampirTrace)
   Pablo
   MPICL/ParaGraph

3 TAU
Metrics recorded
   Two modes: profile and trace
   Profile mode
     Inclusive/exclusive time spent in functions
     Hardware counter information
       PAPI/PCL: L1/L2/L3 cache reads/writes/misses, TLB misses, cycles, integer/floating-point/load/store instructions executed, stalls, wall-clock time, virtual time
       Other OS timers (gettimeofday, getrusage)
     MPI message sizes sent
   Trace mode
     Same as profile mode (minus hardware counters?)
     Message send time, receive time, size, and sender/recipient(?)
Profiled entities
   Functions (automatic & dynamic instrumentation); loops and regions (manual instrumentation, see the sketch below)
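The manual instrumentation mentioned above uses TAU's C macros. A minimal sketch of what that might look like; compute_step is a hypothetical function, and exact macro usage and build flags vary by TAU version and installation:

    /* Hedged sketch: manual region instrumentation with TAU's C macros.
     * Build with the TAU compiler wrappers (e.g., tau_cc.sh) so the macros
     * are enabled; details vary by installation. */
    #include <TAU.h>

    void compute_step(double *a, int n)
    {
        /* Declare and start a timer around the loop we want profiled. */
        TAU_PROFILE_TIMER(t, "compute_step loop", "", TAU_USER);
        TAU_PROFILE_START(t);
        for (int i = 0; i < n; i++)
            a[i] = a[i] * 2.0 + 1.0;
        TAU_PROFILE_STOP(t);
    }

    int main(int argc, char **argv)
    {
        TAU_PROFILE_INIT(argc, argv);
        TAU_PROFILE_SET_NODE(0);   /* non-MPI program: single "node" */
        double a[1024] = {0};
        compute_step(a, 1024);
        return 0;
    }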

4 TAU
Visualizations
   Profile mode
     Text-based: pprof, which shows a summary of the profile information
     Graphical: racy (old), jracy a.k.a. ParaProf
   Trace mode
     No built-in visualizations
     Can export to CUBE (see KOJAK), Jumpshot (see MPE), and the Vampir format (see Intel Cluster Tools)

5 Paradyn
Metrics recorded
   Number of CPUs, number of active threads, CPU time and inclusive CPU time
   Function calls to and by
   Synchronization (# operations, wait time, inclusive wait time)
   Overall, collective, and point-to-point communication (# messages, bytes sent and received)
   I/O (# operations, wait time, inclusive wait time, total bytes)
   All metrics recorded as "time histograms" (a fixed-size data structure; see the sketch below)
Profiled entities
   Functions only (but this includes functions in linked libraries)
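"Fixed-size" means the histogram keeps a constant number of buckets regardless of how long the program runs. The following is an illustrative sketch of one way such a structure can work, coarsening its buckets when the run outgrows them; it is not Paradyn's actual code:

    /* Illustrative sketch only: a fixed-size time histogram that doubles its
     * bucket width (folding adjacent buckets) whenever the run outgrows it. */
    #include <stdio.h>

    #define NBUCKETS 8              /* fixed storage, regardless of run length */

    typedef struct {
        double bucket[NBUCKETS];    /* accumulated metric value per interval */
        double width;               /* current seconds per bucket */
    } TimeHist;

    static void th_init(TimeHist *h, double initial_width) {
        for (int i = 0; i < NBUCKETS; i++) h->bucket[i] = 0.0;
        h->width = initial_width;
    }

    /* Add 'value' observed at time 't' (seconds since start). */
    static void th_add(TimeHist *h, double t, double value) {
        while (t >= NBUCKETS * h->width) {
            /* Fold: merge buckets pairwise and double the bucket width. */
            for (int i = 0; i < NBUCKETS / 2; i++)
                h->bucket[i] = h->bucket[2 * i] + h->bucket[2 * i + 1];
            for (int i = NBUCKETS / 2; i < NBUCKETS; i++)
                h->bucket[i] = 0.0;
            h->width *= 2.0;
        }
        h->bucket[(int)(t / h->width)] += value;
    }

    int main(void) {
        TimeHist h;
        th_init(&h, 1.0);             /* start with 1-second buckets */
        th_add(&h, 3.5, 0.2);         /* fits in the initial 8 seconds */
        th_add(&h, 20.0, 0.7);        /* forces two folds (width becomes 4 s) */
        for (int i = 0; i < NBUCKETS; i++)
            printf("[%4.1f-%4.1f s) %.2f\n",
                   i * h.width, (i + 1) * h.width, h.bucket[i]);
        return 0;
    }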

6 Paradyn
Visualizations
   Time histograms
   Tables
   Bar charts
   "Terrains" (3-D histograms)

7 MPE/Jumpshot
Metrics collected
   MPI message send time, receive time, size, and sender/recipient
   User-defined event entry & exit
Profiled entities
   All MPI functions
   Functions or regions via manual instrumentation and custom events (see the sketch below)
Visualization
   Jumpshot: timeline view (space-time diagram overlaid on a Gantt chart), histogram
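A minimal sketch of logging a user-defined state with MPE's C logging routines so it appears in Jumpshot alongside the MPI events; library names (commonly -llmpe -lmpe) and the exact output-file suffix depend on the MPE installation:

    /* Hedged sketch: logging a user-defined "compute" state with MPE. */
    #include <mpi.h>
    #include <mpe.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPE_Init_log();                       /* no-op if the MPI wrapper library already did it */

        int ev_start = MPE_Log_get_event_number();
        int ev_end   = MPE_Log_get_event_number();
        MPE_Describe_state(ev_start, ev_end, "compute", "red");

        MPE_Log_event(ev_start, 0, NULL);
        /* ... computation to be shown as a "compute" state in Jumpshot ... */
        MPE_Log_event(ev_end, 0, NULL);

        MPE_Finish_log("myrun");              /* writes the log file (name/format vary by MPE version) */
        MPI_Finalize();
        return 0;
    }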

8 Dimemas/Paraver/MPITrace
Metrics recorded (MPITrace)
   All MPI functions
   Hardware counters (two, chosen from the following two lists; uses PAPI, see the sketch below)
     Counter 1
       Cycles
       Issued instructions, loads, stores, store conditionals
       Failed store conditionals
       Decoded branches
       Quadwords written back from scache(?)
       Correctable scache data array errors(?)
       Primary/secondary I-cache misses
       Instructions mispredicted from scache way prediction table(?)
       External interventions (cache coherency?)
       External invalidations (cache coherency?)
       Graduated instructions
     Counter 2
       Cycles
       Graduated instructions, loads, stores, store conditionals, floating-point instructions
       TLB misses
       Mispredicted branches
       Primary/secondary data cache miss rates
       Data mispredictions from scache way prediction table(?)
       External interventions/invalidations (cache coherency?)
       Store/prefetch exclusive to clean/shared block
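The counters above are read through PAPI. A small sketch of reading two counters with PAPI's low-level API; the preset events shown (PAPI_TOT_CYC and PAPI_L1_DCM) are assumptions for illustration and their availability is platform-dependent:

    /* Hedged sketch: reading two hardware counters with PAPI's low-level API. */
    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        int evset = PAPI_NULL;
        long long counts[2];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            return 1;
        PAPI_create_eventset(&evset);
        PAPI_add_event(evset, PAPI_TOT_CYC);   /* cycles         */
        PAPI_add_event(evset, PAPI_L1_DCM);    /* L1 data misses */

        PAPI_start(evset);
        volatile double x = 0.0;
        for (int i = 0; i < 1000000; i++)      /* region being measured */
            x += i * 0.5;
        PAPI_stop(evset, counts);

        printf("cycles=%lld  L1 data misses=%lld\n", counts[0], counts[1]);
        return 0;
    }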

9 Dimemas/Paraver/MPITrace
Profiled entities (MPITrace)
   All MPI functions (message start time, end time, size, and recipient/sender)
   User regions/functions via manual instrumentation
Visualization
   Timeline display (like Jumpshot)
     Shows a Gantt chart and messages
     Can also overlay hardware counter information
     Clicking on the timeline brings up a text listing of events near the point clicked
   1D/2D analysis modules

10 mpiP
Metrics collected
   Start time, end time, and message size for each MPI call
Profiled entities
   MPI function calls, intercepted via the PMPI wrapper interface (see the sketch below)
Visualization
   Text-based output, plus a graphical browser that displays statistics inline with the source
   Displayed information:
     Overall time (%) for each MPI node
     Top 20 call sites by time (MPI%, App%, variance)
     Top 20 call sites by message size (MPI%, App%, variance)
     Min/max/average/MPI%/App% time spent at each call site
     Min/max/average/sum of message sizes at each call site
   Definitions:
     App time = wall-clock time between MPI_Init and MPI_Finalize
     MPI time = all time consumed by MPI functions
     App% = % of the metric relative to overall app time
     MPI% = % of the metric relative to overall MPI time
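The PMPI wrapper mechanism works because the MPI standard gives every MPI_* routine a second entry point named PMPI_*, so a tool can supply its own MPI_* definitions that record data and then forward to the real implementation, with no changes to the application. An illustrative sketch of that mechanism (not mpiP's actual code):

    /* Illustrative sketch of the MPI profiling (PMPI) interface: our MPI_Send
     * records timing and size, then calls PMPI_Send. Link this wrapper ahead
     * of the MPI library. */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        double t0 = PMPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);  /* real send */
        double t1 = PMPI_Wtime();

        PMPI_Type_size(type, &size);
        fprintf(stderr, "MPI_Send: %d bytes to rank %d in %.6f s\n",
                count * size, dest, t1 - t0);
        return rc;
    }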

11 Dynaprof
Metrics collected
   Wall-clock time or one PAPI metric for each profiled entity
   Collects inclusive, exclusive, and 1-level call tree % information
Profiled entities
   Functions (dynamic instrumentation)
Visualizations
   Simple text-based output
   Simple GUI (shows the same information as the text output)

12 KOJAK
Metrics collected
   MPI: message start time, receive time, size, and sender/recipient
   Manual instrumentation: start and stop times
   One PAPI metric per run (only FLOPS and L1 data misses are visualized)
Profiled entities
   MPI calls (MPI wrapper library)
   Function calls (automatic instrumentation, available only on a few platforms)
   Regions and function calls via manual instrumentation
Visualizations
   Can export traces to the Vampir trace format (see ICT)
   Shows profile and analyzed data via CUBE

13 Intel Cluster Tools (ICT)
Metrics collected
   MPI functions: start time, end time, message size, and sender/recipient
   User-defined events: counter, start & end times
   Code location for source-code correlation
Instrumented entities
   MPI functions via wrapper library
   User functions via binary instrumentation(?)
   User functions & regions via manual instrumentation
Visualizations
   Several types: timelines, statistics, and counter information

14 Pablo
Metrics collected
   Time inclusive/exclusive of a function
   Hardware counters via PAPI
   Summary metrics computed from timing info: min/max/avg/stdev/count
Profiled entities
   Functions, function calls, and outer loops
   All selected via the GUI
Visualizations
   Displays derived summary metrics color-coded and inline with the source code

15 MPICL/ParaGraph
Metrics collected
   MPI functions: start time, end time, message size, and sender/recipient
   Manual instrumentation: start time, end time, "work" done (up to the user to pass this in)
Profiled entities
   MPI function calls via the PMPI interface
   User functions/regions via manual instrumentation
Visualizations
   Many, separated into four categories: utilization, communication, task, and "other"

16 ParaGraph visualizations
Utilization visualizations
   Display a rough estimate of processor utilization
   Utilization is broken down into three states:
     Idle: the program is blocked waiting for a communication operation (or has stopped execution)
     Overhead: the program is performing communication but is not blocked (time spent inside the MPI library)
     Busy: the program is executing anything other than communication
   "Busy" does not necessarily mean useful work is being done, since anything that is not communication is counted as busy (see the accounting sketch below)
Communication visualizations
   Display different aspects of communication: frequency, volume, overall pattern, etc.
   "Distance" is computed by setting the topology in the options menu
Task visualizations
   Display information about when processors start & stop tasks
   Require manually instrumented code to identify when processors start/stop tasks
Other visualizations
   Miscellaneous displays
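An illustrative sketch of the busy/overhead/idle accounting described above, using a hypothetical per-processor trace; this is not ParaGraph's code:

    /* Illustrative sketch: every traced interval is attributed to exactly one
     * of the three states, and anything that is not communication counts as
     * "busy", whether or not it is useful work. */
    #include <stdio.h>

    typedef enum { BUSY, OVERHEAD, IDLE } State;

    typedef struct {
        double start, end;   /* interval in seconds */
        State  state;        /* IDLE = blocked in MPI, OVERHEAD = in MPI but not blocked */
    } Interval;

    int main(void)
    {
        /* Hypothetical trace for one processor. */
        Interval trace[] = {
            {0.0, 2.0, BUSY},      /* user code (assumed useful work) */
            {2.0, 2.3, OVERHEAD},  /* copying/packing inside the MPI library */
            {2.3, 4.0, IDLE},      /* blocked in a receive */
            {4.0, 6.0, BUSY},
        };
        int n = sizeof trace / sizeof trace[0];
        double total[3] = {0.0, 0.0, 0.0};

        for (int i = 0; i < n; i++)
            total[trace[i].state] += trace[i].end - trace[i].start;

        double wall = trace[n - 1].end - trace[0].start;
        printf("busy %.0f%%  overhead %.0f%%  idle %.0f%%\n",
               100 * total[BUSY] / wall, 100 * total[OVERHEAD] / wall,
               100 * total[IDLE] / wall);
        return 0;
    }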

