PAPI 3.0.8.1 on Blue Gene L Using network performance counters to layout tasks for improved performance.


1 PAPI 3.0.8.1 on Blue Gene L Using network performance counters to layout tasks for improved performance

2 Presentation overview: Project objectives; PAPI explanation; Blue Gene L explanation; Current state of research

3 Project objectives: Upgrade PAPI on BG/L; Provide an interface for the network counters; Give Lawrence Livermore National Lab users access to PAPI as well; Use the network counters to place tasks optimally on BG/L

4 PAPI – Intro Courtesy of http://icl.cs.utk.edu/papi/

5 PAPI – Intro PAPI is useful for profiling your own programs. Many tools are built on PAPI: PapiEx – command-line measurement tool; PerfSuite – aggregate measurement and statistical profiling package and API; HPCToolkit – statistical profiling package; and many more!

6 PAPI – Supported platforms: IBM POWER3, 604, 604e, POWER4; Cray T3E, Cray X1; AMD Athlon, Opteron; Intel Pentium (P1 to P4), Itanium I and II; UltraSPARC I, II and III; MIPS R10K, R12K, R14K; Alpha

7 PAPI – Generic Interface Call sequence for the generic interface: PAPI_library_init – initialize memory for PAPI's data structures; PAPI_create_eventset – create an empty list of events; PAPI_add_event – add events to be counted; PAPI_start – begin counting all events in the specified eventset; PAPI_stop – stop all counters and read their current values
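To make the call order concrete without depending on a real PAPI installation, here is a toy Python model of the eventset lifecycle. It is hypothetical and is not PAPI's API: the class ToyEventSet, its methods, the dict used as stand-in "hardware", and the event names are invented for illustration. Only the order of the steps (init, create, add, start, stop) follows the slide.

```python
# Toy model of the eventset call order from the slide. Hypothetical: this is NOT
# PAPI's API. It only mimics the sequence
#   PAPI_library_init -> PAPI_create_eventset -> PAPI_add_event -> PAPI_start -> PAPI_stop
# using a plain dict as stand-in "hardware" counters.


class ToyEventSet:
    def __init__(self, library_initialized):
        # PAPI_library_init must come first (it sets up PAPI's data structures).
        if not library_initialized:
            raise RuntimeError("initialize the library before creating an eventset")
        self.events = []  # a newly created eventset is an empty list of events
        self.running = False
        self._start_values = {}

    def add_event(self, name):
        # Like PAPI_add_event: add an event to be counted.
        if self.running:
            raise RuntimeError("cannot add events while the eventset is counting")
        self.events.append(name)

    def start(self, hardware):
        # Like PAPI_start: begin counting all events in the eventset.
        if not self.events:
            raise RuntimeError("eventset is empty")
        self._start_values = {e: hardware[e] for e in self.events}
        self.running = True

    def stop(self, hardware):
        # Like PAPI_stop: stop counting and report one value per event, in the order added.
        if not self.running:
            raise RuntimeError("eventset is not running")
        self.running = False
        return [hardware[e] - self._start_values[e] for e in self.events]


hw = {"TOT_INS": 1000, "TOT_CYC": 5000}  # hypothetical raw counter values
es = ToyEventSet(library_initialized=True)
es.add_event("TOT_INS")
es.add_event("TOT_CYC")
es.start(hw)
hw["TOT_INS"] += 250  # pretend the measured code ran
hw["TOT_CYC"] += 900
print(es.stop(hw))  # [250, 900]
```

In real PAPI code the same steps are C calls on an integer eventset handle, with return codes checked against PAPI_OK.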

8 PAPI – Events: Presets Presets are a list of predefined events, implemented on every system where they can be supported. Not all presets are available on every architecture (e.g., BG/L provides no cache counters below L3, so an L1 cache hit preset is not applicable). Native events form the basic building blocks for PAPI presets.
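A preset can be thought of as a formula over native events. The Python sketch below models that relationship under loudly hypothetical assumptions: the native event names (CYCLES, L3_READ_MISS, and so on) and the preset formulas are invented, not taken from BG/L or from PAPI's preset tables. It shows why a preset becomes unavailable on a platform that lacks one of its native building blocks.

```python
# Hypothetical illustration: each preset is a signed sum of native events, and a
# preset is available only if the platform provides every native event it needs.
# All event names and formulas below are made up for illustration.

PRESET_DEFINITIONS = {
    # preset name: list of (native_event, sign); value = sum(sign * native_count)
    "TOT_CYC": [("CYCLES", +1)],
    "L3_TCM": [("L3_READ_MISS", +1), ("L3_WRITE_MISS", +1)],
    "L1_DCH": [("L1_DATA_ACCESS", +1), ("L1_DATA_MISS", -1)],
}


def available_presets(native_events):
    """Presets whose native building blocks all exist on this platform."""
    return {
        preset
        for preset, terms in PRESET_DEFINITIONS.items()
        if all(native in native_events for native, _ in terms)
    }


def preset_value(preset, native_counts):
    """Combine native counts into the derived preset value."""
    return sum(sign * native_counts[native] for native, sign in PRESET_DEFINITIONS[preset])


# A hypothetical platform without L1 cache events loses the L1 preset.
platform = {"CYCLES", "L3_READ_MISS", "L3_WRITE_MISS"}
print(sorted(available_presets(platform)))  # ['L3_TCM', 'TOT_CYC']
```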

9 PAPI – Events: Presets Courtesy of http://icl.cs.utk.edu/papi/

10 PAPI – Events: Native In addition to the predefined PAPI preset events, the PAPI library also exposes most of the events native to each platform. Native events can be added to eventsets in the same manner as presets.

11 PAPI – Events: Native

12 PAPI – Internals The main internal data structure is an array of eventsets

13 PAPI – Other features Multiplexing: counts more events than there are hardware counters; Thread safety: profiling is thread safe; Overflow detection: needed because hardware counters have limited space
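The slide does not describe how PAPI implements multiplexing internally. The Python sketch below shows the common idea under stated assumptions: events take turns on the available counters in round-robin time slices, and each raw count is scaled by (total slices / slices in which the event was counted). The constant per-slice event rates and the function name simulate_multiplexing are hypothetical simplifications.

```python
# Sketch of multiplexing: fewer hardware counters than events, so events share
# counters over time and each raw count is extrapolated to the full run.
# Assumes each event occurs at a constant (hypothetical) rate per time slice.


def simulate_multiplexing(rates, num_counters, num_slices):
    """Return estimated totals per event after round-robin counter sharing."""
    events = list(rates)
    if not 0 < num_counters <= len(events):
        raise ValueError("need 1 <= num_counters <= number of events")
    raw = {e: 0 for e in events}
    slices_counted = {e: 0 for e in events}
    for k in range(num_slices):
        # The group of events that owns a hardware counter during slice k.
        for c in range(num_counters):
            e = events[(k * num_counters + c) % len(events)]
            raw[e] += rates[e]
            slices_counted[e] += 1
    # Scale each raw count by the fraction of the run during which it was counted.
    return {
        e: raw[e] * num_slices / slices_counted[e] if slices_counted[e] else 0.0
        for e in events
    }


# Three events, two counters, six time slices.
print(simulate_multiplexing({"A": 10, "B": 20, "C": 30}, num_counters=2, num_slices=6))
```

With uniform rates the estimate is exact; with bursty events it is only an approximation, which is the usual trade-off of multiplexing.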

14 PAPI – PAPI2 vs PAPI3 PAPI 3 significantly reduced overheads for starting, stopping and reading the counters Courtesy of http://icl.cs.utk.edu/papi/

15 PAPI – PAPI2 vs PAPI3 Improvements in PAPI3: better native event support; better thread support; overflow and profiling enhancements; myriad bug fixes and code cleanup

16 PAPI – PAPI2 vs PAPI3 Overlapping eventsets were supported in PAPI2. Minor changes in the API, mostly dereferencing variables

17 Blue Gene L – Intro 65,536 nodes connected in a 64 x 32 x 32 3D torus. Nodes are built from PowerPC 440 embedded processors. Smaller than most supercomputers; consumes less power

18 Blue Gene L

19 Blue Gene L – Networks 3D torus network (node to node); tree network (broadcasts)

20 Blue Gene L – HW counters 48 universal performance counters; 4 floating point unit counters. Counters are 32 bits wide, so virtual counters must be used to prevent overflow
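One common way to build a virtual counter is sketched below in Python, under the assumption that software reads each 32-bit hardware counter at least once per wraparound period: keep the last raw value, add the modular difference on every read, and accumulate into an unbounded software total. The class name VirtualCounter is hypothetical; the slide does not say how the BG/L virtual counters are implemented.

```python
# Virtual counter sketch: extend a wrapping 32-bit hardware counter into an
# unbounded software count. Assumes at most one wrap between consecutive reads.

COUNTER_BITS = 32
COUNTER_MOD = 1 << COUNTER_BITS


class VirtualCounter:
    def __init__(self, initial_raw=0):
        self.last_raw = initial_raw % COUNTER_MOD
        self.total = 0

    def update(self, raw):
        """Fold a new raw hardware reading into the running total."""
        raw %= COUNTER_MOD
        # The modular difference also handles the wrap from 0xFFFFFFFF back to 0.
        self.total += (raw - self.last_raw) % COUNTER_MOD
        self.last_raw = raw
        return self.total


vc = VirtualCounter(initial_raw=0xFFFFFF00)  # counter is close to wrapping
print(vc.update(0x00000100))  # 512: 256 counts before the wrap, 256 after
```

If the counter could wrap more than once between reads, the lost multiples of 2^32 would be invisible, which is why the sampling interval (or an overflow interrupt) matters.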

21 Blue Gene L – HW counters

22 Research – Overall goals Network hardware counters are new. Use the network counters to determine traffic between tasks. Try to optimize the placement of tasks to minimize communication latency. Given counts and distances: cost = sum of counts * distance over all communicating task pairs; minimize this cost over all placements of tasks on nodes
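The cost function on the slide can be made concrete for a 3D torus. The Python sketch below assumes that distance is the minimal hop count with wraparound in each dimension and that counts arrive as a task-to-task traffic matrix; both are modeling assumptions, and the function names are hypothetical. Exhaustive search is only feasible for a handful of tasks, but it shows exactly what is being minimized.

```python
from itertools import permutations

# Full BG/L torus from the slides: 64 * 32 * 32 = 65,536 nodes.
TORUS_DIMS = (64, 32, 32)


def torus_hops(a, b, dims=TORUS_DIMS):
    """Minimal hop count between node coordinates a and b, using wraparound links."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def placement_cost(traffic, placement, dims=TORUS_DIMS):
    """cost = sum over task pairs (i, j) of traffic[i][j] * hops(node of i, node of j).

    traffic[i][j]: counted traffic from task i to task j (e.g., from network counters).
    placement[i]: torus coordinate of the node running task i.
    """
    n = len(placement)
    return sum(
        traffic[i][j] * torus_hops(placement[i], placement[j], dims)
        for i in range(n)
        for j in range(n)
    )


def best_placement(traffic, nodes, dims=TORUS_DIMS):
    """Exhaustive search over assignments of tasks to distinct nodes (tiny inputs only)."""
    return min(
        permutations(nodes, len(traffic)),
        key=lambda p: placement_cost(traffic, p, dims),
    )


# Hypothetical example: tasks 0 and 1 exchange much more data than tasks 1 and 2.
traffic = [[0, 100, 0],
           [0, 0, 1],
           [0, 0, 0]]
nodes = [(0, 0, 0), (2, 0, 0), (4, 0, 0)]
best = best_placement(traffic, nodes, dims=(8, 8, 8))
print(best, placement_cost(traffic, best, dims=(8, 8, 8)))
# ((0, 0, 0), (2, 0, 0), (4, 0, 0)) 202
```

The heavily communicating pair ends up on neighboring nodes, and task 1 sits in the middle so that its light traffic to task 2 also travels a short distance.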

23 Research – Counting The first goal is to determine what is actually being counted

24 Research – Networks For each MPI call, determine which network counters are being used. The tree is supposed to be used for broadcasts; the torus is supposed to be used for point-to-point communication. There are ambiguities in the specification

25 Research – Future decisions How to profile a target application: manually insert PAPI instrumentation (a lot of work), or instrument binaries with counting code. What information to store: all counts on each node (a lot of data), or a sample of the nodes (not as accurate: what if the tasks behave or communicate differently?)

26 Research – Future decisions How to use the collected information: profile an application to obtain counter feedback that determines an optimized static task layout, or dynamically migrate tasks in response to the counters
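For realistic task counts exhaustive search is out of reach, so some heuristic is needed; the slides do not say which one. As one possible choice, the Python sketch below uses greedy pairwise-swap hill climbing for a static layout: starting from an initial layout, swap the nodes of two tasks whenever that lowers the counts-times-distance cost, until no swap helps. It finds a local, not necessarily global, optimum. The traffic matrix and function names are hypothetical.

```python
# Pairwise-swap hill climbing for a static task layout on a 3D torus.
# One possible heuristic, not the method from the slides.


def torus_hops(a, b, dims):
    """Minimal hop count between node coordinates a and b, using wraparound links."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def layout_cost(traffic, placement, dims):
    """Sum of traffic[i][j] * hops between the nodes running tasks i and j."""
    n = len(placement)
    return sum(
        traffic[i][j] * torus_hops(placement[i], placement[j], dims)
        for i in range(n)
        for j in range(n)
    )


def improve_layout(traffic, placement, dims):
    """Swap two tasks' nodes whenever it lowers the cost; stop when no swap helps."""
    placement = list(placement)
    best = layout_cost(traffic, placement, dims)
    improved = True
    while improved:
        improved = False
        for i in range(len(placement)):
            for j in range(i + 1, len(placement)):
                placement[i], placement[j] = placement[j], placement[i]
                cost = layout_cost(traffic, placement, dims)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    # Undo the swap: it did not lower the cost.
                    placement[i], placement[j] = placement[j], placement[i]
    return placement, best


# Hypothetical counter feedback: only tasks 0 and 3 communicate, heavily.
traffic = [[0, 0, 0, 100], [0] * 4, [0] * 4, [0] * 4]
start = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
layout, cost = improve_layout(traffic, start, dims=(8, 8, 8))
print(cost)  # 100: tasks 0 and 3 now run on neighboring nodes
```

Because swaps only permute the starting node set, the job keeps the same partition of the machine; only the task-to-node mapping changes.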

