Presentation is loading. Please wait.

Presentation is loading. Please wait.

Productive Performance Tools for Heterogeneous Parallel Computing

Similar presentations


Presentation on theme: "Productive Performance Tools for Heterogeneous Parallel Computing"— Presentation transcript:

1 Productive Performance Tools for Heterogeneous Parallel Computing
Allen D. Malony Department of Computer and Information Science University of Oregon Shigeo Fukuda, Lunch with a Helmet On

2 Parallel Performance Engineering and Productivity
Scalable, optimized applications deliver HPC promise Optimization through performance engineering process Understand performance complexity and inefficiencies Tune application to run optimally at scale How to make the process more effective and productive? What performance technology should be used? Performance technology part of larger environment Programmability, reusability, portability, robustness Application development and optimization productivity Process, performance technology, and use will change as parallel systems evolve and grow to extreme scale Goal is to deliver effective performance with high productivity value now and in the future

3 Heterogeneous Parallel Systems and Performance
What does it mean to be heterogeneous? Diverse in character or content (New Oxford America) Not homogeneous (Anonymous) Diversity in what? Hardware processors/cores, memory, interconnection, … different in computing elements and how they are used Software how heterogeneous hardware is programmed different software models, libraries, frameworks, … Diversity how? when? space? time? Variation is operation, interaction, concurrency, ...

4 Why Do We Care? Heterogeneity has been around for a long time
Different programmable components in computer systems Long history of specialized hardware (e.g., ASPs) Heterogeneous (computing) technology more accessible Multicore processors (e.g., 4-core, 6-core, 8-core, ...) Manycore accelerators (e.g., Tesla, Fermi, Larrabee) High-performance processing engines (e.g., IBM Cell BE) Performance is the main driving concern Heterogeneity is arguably the only path to extreme scale Heterogeneous software technology required Greater performance enables more powerful software Will give rise to more sophisticated software environments

5 Implications for Productive Performance Tools
Current status quo is comfortable Mostly homogeneous parallel systems and software Shared-memory multithreading – OpenMP Distributed-memory message passing – MPI Parallel computational models are relatively stable (simple) Corresponding performance models are relatively tractable Parallel performance tools are just keeping up Heterogeneity creates richer computational potential Results in greater performance diversity and complexity Heterogeneous systems will utilize more sophisticated programming and runtime environments Performance tools have to support richer computation models and more versatile performance perspectives

6 Heterogeneous Performance Views
Want to create performance views that capture heterogeneous concurrency and execution behavior Reflect execution logic beyond standard actions Capture performance semantics at multiple levels Allow for compatible perspectives that do not conflict Performance views may have to work with reduced data Consider “host” path and “external” external paths Want to capture performance for all execution paths External execution may be more difficult to measure What perspective does the host have of external entities? Determines the semantics of the measurement data Determines assumptions about entity behavior

7 Task-based Performance View
Consider the “task” abstraction for accelerator scenario Host regards external execution as a task Tasks operate concurrently with respect to the host Requires support for tracking asynchronous execution Host creates measurement perspective for external task Maintains local and remote performance data Tasks may have limited measurement support May depend on host for performance data I/O Performance data might be received from external task How to create a view of external performance Host GPU

8 Simple Master-Worker Scenario
Master sends tasks to N workers Workers report back their performance to master Done for each piece of work In general, the external performance data can be arbitrary single time value more complete representation of external performance Build a worker performance perspective in the master Each work task will appear as a separate “thread”

9 CPU – GPU Execution Scenarios

10 Heterogeneous Complexity Issues
Asynchronous execution (concurrency) Memory transfer and memory sharing Interactions between heterogeneous components Interference between heterogeneous components Different programming languages/libraries/tools Availability of performance information Performance measurement and analysis models ...

11 Trace of LINPACK with GPUs (TAUcuda, Jumpshot)

12 Trace of LINPACK with GPUs (TAUcuda, Vampir)

13 NAMD with CUDA NAMD is a molecular dynamics application (Charm++)
NAMD has been accelerated with CUDA TAU integrated in Charm++ Apply TAUcuda to NAMD Four processes with one Tesla GPU for each

14 NAMD with CUDA (4 processes)
GPU kernel

15 Scaling NAMD with CUDA good GPU performance

16 HMPP Workbench with TAUcuda
Host process Compute kernel Transfer kernel


Download ppt "Productive Performance Tools for Heterogeneous Parallel Computing"

Similar presentations


Ads by Google