SciDAC High-End Computer System Performance: Science and Engineering
Jack Dongarra
Innovative Computing Laboratory
University of Tennessee
http://www.cs.utk.edu/~dongarra/

2 Four Components for the University of Tennessee’s
- Performance Capturing Tools
  - PAPI
- Self adapting numerical software
  - Automatic performance enhancement
  - SANS/AEOS/ATLAS
- Performance repository for apps, kernels, machines, etc.
  - NETLIB, Repository in a Box (RIB)
- Modeling, predictability

3 Tools for Performance Evaluation
- Timing and performance evaluation has been an art
  - Resolution of the clock
  - Issues about cache effects
  - Different systems
  - Can be cumbersome and inefficient with traditional tools
- Situation about to change
  - Today’s processors have internal counters
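The clock-resolution and timing concerns on this slide can be illustrated with a minimal Python sketch. It is not part of the original talk: the function names `clock_resolution` and `time_kernel` are hypothetical, and the approach (a warm-up call, then repeating the kernel until one measurement is much longer than a clock tick) is just one common way to reduce the effects listed above.

```python
import time

def clock_resolution(clock=time.perf_counter, samples=1000):
    """Estimate the smallest observable tick of `clock` as the minimum
    non-zero difference between two successive readings."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:          # spin until the clock visibly advances
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest

def time_kernel(kernel, repeats=5, min_duration=0.01):
    """Return the best observed time per call of `kernel()` in seconds."""
    kernel()                     # warm-up call, so caches are not cold

    def run(n):
        t0 = time.perf_counter()
        for _ in range(n):
            kernel()
        return time.perf_counter() - t0

    # Grow the inner loop until one measurement spans at least
    # `min_duration`, which should be far above the clock resolution.
    inner = 1
    while run(inner) < min_duration:
        inner *= 2
    return min(run(inner) for _ in range(repeats)) / inner

if __name__ == "__main__":
    print("approximate clock resolution (s):", clock_resolution())
    print("seconds per call:", time_kernel(lambda: sum(range(1000))))
```

Wall-clock timing of this kind says how long a kernel took but not why, which is the gap the hardware counters on the following slides are meant to fill.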

4 Performance Counters
- Almost all high performance processors include hardware performance counters.
- Some are easy to access, others not available to users.
- On most platforms the APIs, if they exist, are not appropriate for the end user or well documented.
- Existing performance counter APIs
  - Compaq Alpha EV 6 & 6/7
  - SGI MIPS R10000
  - IBM Power Series
  - CRAY T3E
  - Sun Solaris
  - Pentium Linux and Windows
  - IA-64
  - HP-PA RISC
  - Hitachi
  - Fujitsu
  - NEC

5 Overview of PAPI
- Performance Application Programming Interface
- The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.

6 Performance Data from PAPI
- Execution Rate (MIPS, Flop/s)
- Bandwidth Utilization
  - Main Memory
  - L2 cache
  - L1 cache
- Cache Miss Statistics: Icache, Dcache, and L2 cache
- TLB misses
- Mispredicted Branches
- Instruction Mix (FP, branch, LD/ST, other)
- Load/store instruction issue rate
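Several of the quantities on this slide are simple ratios of raw counter values. The sketch below shows one way to derive them, assuming the raw counts have already been collected. The function `derived_metrics` is hypothetical, the dictionary keys borrow PAPI preset event names only for readability, and the sample numbers are invented for illustration, not measurements.

```python
def derived_metrics(counts, seconds):
    """Turn raw event counts into rates and ratios.

    `counts` maps event names to totals for one measured region;
    `seconds` is the elapsed wall-clock time of that region.
    """
    total = counts["PAPI_TOT_INS"]
    fp = counts["PAPI_FP_INS"]
    ldst = counts["PAPI_LD_INS"] + counts["PAPI_SR_INS"]
    branch = counts["PAPI_BR_INS"]
    return {
        "mips": total / seconds / 1e6,
        "mflops": counts["PAPI_FP_OPS"] / seconds / 1e6,
        "l1_dcache_misses_per_ldst": counts["PAPI_L1_DCM"] / ldst,
        "branch_mispredict_rate": counts["PAPI_BR_MSP"] / branch,
        "ldst_per_second": ldst / seconds,
        "instruction_mix": {
            "fp": fp / total,
            "branch": branch / total,
            "ldst": ldst / total,
            "other": (total - fp - branch - ldst) / total,
        },
    }

# Invented sample counts, for illustration only.
sample = {
    "PAPI_TOT_INS": 2_000_000_000,
    "PAPI_FP_INS": 500_000_000,
    "PAPI_FP_OPS": 600_000_000,
    "PAPI_LD_INS": 600_000_000,
    "PAPI_SR_INS": 200_000_000,
    "PAPI_BR_INS": 300_000_000,
    "PAPI_BR_MSP": 15_000_000,
    "PAPI_L1_DCM": 40_000_000,
}

if __name__ == "__main__":
    for name, value in derived_metrics(sample, seconds=2.0).items():
        print(name, value)
```

Which events can be counted, and how many at once, depends on the processor, so a real tool would have to check availability before relying on any of these names.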

7 Implementation
- Counters exist as a small set of registers that count events.
- PAPI provides three interfaces to the underlying counter hardware:
  1. The low level interface manages hardware events in user defined groups called EventSet.
  2. The high level interface simply provides the ability to start, stop and read the counters for a specified list of events.
  3. Graphical tools to visualize information.
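The split between a low-level, event-set-based interface and a high-level start/stop/read interface can be pictured with a toy model. This is not PAPI's API: the classes `SimulatedCounters` and `EventSet` and the function `count_events` are hypothetical, and the "hardware" is a plain dictionary of counters that only ever increase.

```python
class SimulatedCounters:
    """Stand-in for counter registers: values only ever increase."""

    def __init__(self, events):
        self._values = {name: 0 for name in events}

    def tick(self, **increments):
        for name, amount in increments.items():
            self._values[name] += amount

    def snapshot(self, names):
        return {name: self._values[name] for name in names}


class EventSet:
    """Low-level style: a user-defined group of events with explicit state."""

    def __init__(self, hardware):
        self._hw = hardware
        self._events = []
        self._base = None            # snapshot taken at start()

    def add_event(self, name):
        if self._base is not None:
            raise RuntimeError("cannot add events while counting")
        self._events.append(name)

    def start(self):
        self._base = self._hw.snapshot(self._events)

    def read(self):
        if self._base is None:
            raise RuntimeError("event set is not started")
        now = self._hw.snapshot(self._events)
        return [now[e] - self._base[e] for e in self._events]

    def stop(self):
        values = self.read()
        self._base = None
        return values


def count_events(hardware, events, workload):
    """High-level style: count a list of events around one call."""
    event_set = EventSet(hardware)
    for name in events:
        event_set.add_event(name)
    event_set.start()
    workload()
    return dict(zip(events, event_set.stop()))
```

In this model the high-level call is a thin convenience layer over the event set, while the low-level object exposes the state a tool needs to read counters at intermediate points.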

8 PAPI - Supported Processors
- Intel Pentium, Pro, II, III, 4
  - Linux 2.4, 2.2, 2.0 and perf kernel patch
- IBM Power 3, 604, 604e
  - For AIX 4.3 and pmtoolkit (in 4.3.4 available)
  - (laderose@us.ibm.com)
- Sun UltraSparc I, II, & III
  - Solaris 2.8
- MIPS R10K, R12K
- AMD Athlon
  - Linux 2.4 and perf kernel patch
- Cray T3E, SV1, SV2
- Soon: Windows 2K, Compaq Alpha EV6 & 67 and Intel IA-64

9 Go To Demo

10 PAPI’s Parallel Interface

11 PAPI Development
- Extensions to PAPI to support collection and analysis of hardware performance counter data in the context of shared and distributed memory parallel programs
  - Allowing for straightforward instrumentation of multithreaded and multiprocessor applications.
- Tools will include graphical tools extended with dynamic instrumentation capabilities.
  - Framework for using Dyninst with parallel programs, the Free Probe Class Server (FPCS) and IBM’s Dynamic Probe Class Library (DPCL)
- Port PAPI to Compaq Alpha and HP machines
- Summary information on problem spots within applications
- Integration with other tools, SvPablo, Dyninst, etc.
- Help with setting up PAPI at various sites.

12 Repository Development
- Repository of Tools and Data on Performance Evaluation
- A network-based catalog that will serve as a “road map” to important Performance Evaluation enabling technologies
- A methodology for evaluation and measurement of the success of the tools.
- SciDAC outreach: Start a community effort for the collection and dissemination of performance data

13 Self-Adapting Numerical Software (SANS)
- Today’s processors can achieve high performance, but this requires extensive machine-specific hand tuning.
- Simple operations like matrix-vector ops require many man-hours per platform
  - Software lags far behind hardware introduction
  - Only done if financial incentive is there
- Compilers not up to optimization challenge
- Hardware, compilers, and software have a large design space with many parameters
  - Blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
  - Complicated interactions with the increasingly sophisticated micro-architectures of new microprocessors.
- Need for quick/dynamic deployment of optimized routines.
- ATLAS - Automatically Tuned Linear Algebra Software
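The idea of searching the parameter space empirically, rather than hand tuning, can be sketched for a single parameter, the blocking size of a matrix multiply. This is a minimal illustration and not ATLAS's actual search: the function names `blocked_matmul` and `tune_block_size` are hypothetical, and in pure Python the timing differences mostly reflect interpreter overhead rather than cache behavior.

```python
import time

def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of a by one tile of b.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c

def tune_block_size(n, candidates, repeats=3):
    """Time every candidate block size and return (best, all timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - t0)
        timings[block] = best
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best, timings = tune_block_size(48, [4, 8, 16, 48])
    print("fastest block size on this machine:", best)
```

The point of the sketch is the structure: generate variants, time them on the target machine, and keep the winner, so the choice adapts to each platform without manual effort.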

14 SANS Extensions
- BLAS
- Sparse matrix operations
- Message passing
- Algorithm selection at a higher level

15 Repository In a Box (RIB)
- Metadata objects are stored in repositories.
- A repository automatically generates a web site for displaying customizable views of its metadata - search, browse, join, etc.
- Metadata objects are also made available to network applications via the RIB API.

16 Repository Interoperation
[Diagram; labels: My Repository, Your Repository, Our Virtual Repository, Metadata objects, HTML Catalog]

17 Tools Integration
- PAPI, Dyninst, SvPablo
- Intelligent Adaptation
- Rose and SANS (ATLAS)
- Repository-in-a-Box effort provides a toolkit for building and maintaining meta-data repositories

18 Interaction with Other Efforts
- SciDAC - TOPS
  - David Keyes, ICASE/ODU/LLNL
- SciDAC - Astrophysics
  - Tony Mezzacappa, ORNL
- DOE - Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
  - Bart Miller, U Wisc
  - Jeff H., U of Maryland

19 High-End Computer System Performance: Science and Engineering
- Activities for UTennessee
  - Performance Capturing Tools
    - PAPI
  - Automatic performance enhancement
    - SANS/AEOS/ATLAS
  - Performance repository for apps, kernels, machines, etc.
    - NETLIB, RIB
  - Modeling, predictability
