
Slide 1: Integrating Performance Analysis in Complex Scientific Software: Experiences with the Uintah Computational Framework
Allen D. Malony (malony@cs.uoregon.edu)
Department of Computer and Information Science, Computational Science Institute, University of Oregon
Presented at Research Centre Juelich, April 9, 2002

Slide 2: Acknowledgements
- Sameer Shende, Robert Bell, University of Oregon
- Steven Parker, J. Davison de St.-Germain, and Alan Morris, University of Utah
- Department of Energy (DOE), ASCI Academic Strategic Alliances Program (ASAP)
- Center for the Simulation of Accidental Fires and Explosions (C-SAFE), ASCI/ASAP Level 1 center, University of Utah (http://www.csafe.utah.edu)
- Computational Science Institute, ASCI/ASAP Level 3 projects with LLNL / LANL, University of Oregon (http://www.csi.uoregon.edu)

Slide 3: Outline
- Complexity and performance technology
- TAU performance system
- Scientific software engineering
- C-SAFE and Uintah Computational Framework (UCF)
  - Goals and design
  - Challenges for performance technology integration
  - Role of performance mapping
- Performance analysis integration in UCF
  - TAU performance mapping
  - XPARE
- Concluding remarks

Slide 4: Complexity - Where does it come from?
- Complexity in computing system architecture
  - Diverse parallel system architectures
    - shared / distributed memory, cluster, hybrid, NOW, Grid, ...
  - Sophisticated processor and memory architectures
  - Advanced network interface and switching architecture
  - Specialization of hardware components
- Complexity in the parallel software environment
  - Diverse parallel programming paradigms
  - Optimizing compilers and sophisticated runtime systems
  - Advanced numerical libraries and application frameworks
  - Hierarchical, multi-level software architectures
  - Multi-component, coupled simulation models

Slide 5: Complexity Determines Performance Requirements
- Performance observability requirements
  - Multiple levels of software and hardware
  - Different types and detail of performance data
  - Alternative performance problem solving methods
  - Multiple targets of software and system application
- Performance technology requirements
  - Broad scope of performance observation
  - Flexible and configurable mechanisms
  - Technology integration and extension
  - Cross-platform portability
  - Open, layered, and modular framework architecture

Slide 6: What is Parallel Performance Technology?
- Performance instrumentation tools
  - Different program code levels
  - Different system levels
- Performance measurement (observation) tools
  - Profiling and tracing of software (SW) and hardware (HW) performance events
  - Different SW and HW levels
- Performance analysis tools
  - Performance data analysis and presentation
  - Online and offline tools
- Performance experimentation and data management
- Performance modeling and prediction tools

Slide 7: Complexity Challenges for Performance Tools
- Computing system environment complexity
  - Observation integration and optimization
  - Access, accuracy, and granularity constraints
  - Diverse/specialized observation capabilities and technology
  - Restricted modes limit performance problem solving
- Sophisticated software development environments
  - Programming paradigms and performance models
  - Performance data mapping to software abstractions
  - Uniformity of performance abstraction across platforms
  - Rich observation capabilities and flexible configuration
  - Common performance problem solving methods

Slide 8: General Problems (Performance Technology)
- How do we create robust and ubiquitous performance technology for the analysis and tuning of parallel and distributed software and systems in the presence of (evolving) complexity challenges?
- How do we apply performance technology effectively for the variety and diversity of performance problems that arise in the context of complex parallel and distributed computer systems?

Slide 9: Computation Model for Performance Technology
- How to address the dual performance technology goals?
  - Robust capabilities + widely available methodologies
  - Contend with problems of system diversity
  - Flexible tool composition, configuration, and integration
- Approaches
  - Restrict computation types / performance problems
    - limited performance technology coverage
  - Base technology on an abstract computation model
    - general architecture and software execution features
    - map features/methods to existing complex system types
    - develop capabilities that can adapt and be optimized

Slide 10: General Complex System Computation Model
- Node: physically distinct shared-memory machine
- Message-passing node interconnection network
- Context: distinct virtual memory space within a node
- Thread: execution threads (user/system) within a context (a minimal data-structure sketch of this hierarchy follows below)
[Figure: physical view vs. model view of nodes with memory and SMP threads, contexts (VM spaces), and the inter-node message-passing interconnection network]
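To make the node / context / thread hierarchy concrete, here is a minimal C++ data-structure sketch of the model; the type names are illustrative assumptions and not part of TAU or Uintah.

  #include <vector>

  struct Thread  { int id; };                                // user/system execution thread
  struct Context { int id; std::vector<Thread> threads; };   // distinct virtual memory space
  struct Node    { int id; std::vector<Context> contexts; }; // shared-memory machine
  struct Machine { std::vector<Node> nodes; };               // nodes joined by a message-passing network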

Slide 11: Framework for Performance Problem Solving
- Model-based performance technology
  - Instrumentation / measurement / execution models
    - performance observability constraints
    - performance data types and events
  - Analysis / presentation model
    - performance data processing
    - performance views and model mapping
  - Integration model
    - performance tool component configuration and integration
- Can a performance problem solving framework be designed based on a general complex system model and with a performance technology model approach?

Slide 12: TAU Performance System Framework
- Tuning and Analysis Utilities (TAU)
- Performance system framework for scalable parallel and distributed high-performance computing
- Targets a general complex system computation model
  - nodes / contexts / threads
- Multi-level: system / software / parallelism
- Measurement and analysis abstraction
- Integrated toolkit for performance instrumentation, measurement, analysis, and visualization (a small instrumentation sketch follows this slide)
- Portable performance profiling/tracing facility
- Open software approach
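As a hedged illustration of how an application routine is instrumented with TAU's profiling API, the following C++ sketch uses the TAU macros as commonly documented; the exact macro names and signatures should be checked against the TAU version in use, and the routine itself is hypothetical.

  #include <TAU.h>

  void compute_step()
  {
    // per-routine timer: name, type-signature string, profile group
    TAU_PROFILE("compute_step()", "void ()", TAU_USER);
    // ... work measured by the timer above ...
  }

  int main(int argc, char** argv)
  {
    TAU_PROFILE_INIT(argc, argv);   // initialize the measurement system
    TAU_PROFILE_SET_NODE(0);        // node id in the node/context/thread model
    TAU_PROFILE("main()", "int (int, char**)", TAU_DEFAULT);
    for (int i = 0; i < 10; i++)
      compute_step();
    return 0;
  }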

Slide 13: TAU Performance System Architecture
[Figure: TAU framework architecture diagram; labels include EPILOG and Paraver trace formats]

Slide 14: Pprof Output (NAS Parallel Benchmark – LU)
- Intel quad PIII Xeon, RedHat Linux, PGI F90
- F90 + MPICH
- Profile shown per node / context / thread
- Application events and MPI events

Slide 15: jRacy (NAS Parallel Benchmark – LU)
[Figure: jRacy profile browser screenshots showing global profiles, an individual profile, and a routine profile across all nodes; n: node, c: context, t: thread]

Slide 16: TAU + PAPI (NAS Parallel Benchmark – LU)
- Floating point operation counts replace execution time in the profile
- Only requires re-linking to a different TAU measurement library

Slide 17: TAU + Vampir (NAS Parallel Benchmark – LU)
[Figure: Vampir views of the TAU trace: timeline display, communications display, parallelism display, and callgraph display]

Slide 18: Scientific Software (Performance) Engineering
- Modern scientific simulation software is complex
  - Large development teams of diverse expertise
  - Simultaneous development on different system parts
  - Iterative, multi-stage, long-term software development
- Need support for managing a complex software process
  - Software engineering tools for revision control, automated testing, and bug tracking are commonplace
  - In contrast, tools for performance engineering are not
    - evaluation (measurement, analysis, benchmarking)
    - optimization (diagnosis, tracking, prediction, tuning)
- Incorporate performance engineering methodology and support it with flexible and robust performance tools

Slide 19: Utah ASCI/ASAP Level 1 Center (C-SAFE)
- C-SAFE was established to build a problem-solving environment (PSE) for the numerical simulation of accidental fires and explosions
  - Combine fundamental chemistry and engineering physics
  - Integrate non-linear solvers, optimization, computational steering, visualization, and experimental data verification
  - Support very large-scale coupled simulations
- Computer science problems:
  - Coupling multiple scientific simulation codes with different numerical and software properties
  - Software engineering across diverse expert teams
  - Achieving high performance on large-scale systems

Slide 20: Example C-SAFE Simulation Problems
[Figure: heptane fire simulation and material stress simulation; a typical C-SAFE simulation has a billion degrees of freedom and non-linear time dynamics]

Slide 21: Uintah Problem Solving Environment
- Enhanced SCIRun PSE
  - Pure dataflow to component-based
  - Shared memory to scalable multi-/mixed-mode parallelism
  - Interactive only to interactive plus standalone
- Design and implement the Uintah component architecture
  - Application programmers provide
    - a description of the computation (tasks and variables)
    - code to perform a task on a single "patch" (sub-region of space)
  - Components for scheduling, partitioning, load balancing, ...
  - Follows the Common Component Architecture (CCA) model
- Design and implement the Uintah Computational Framework (UCF) on top of the component architecture

Slide 22: Uintah High-Level Component View

Slide 23: High-Level Architecture
[Figure: Uintah Parallel Component Architecture diagram. Labeled components include the simulation controller, scheduler, problem specification, numerical solvers, MPM, mixing model, fluid model, subgrid model, chemistry database controller, chemistry databases, material properties database, high energy simulations, Blazer database, data manager, visualization, post-processing and analysis, parallel services, resource management, checkpointing, and performance analysis, grouped into PSE and non-PSE components and connected through the UCF via data and control / light data paths; the C-SAFE performance analysis component is implicitly connected to all components.]

Slide 24: Uintah Computational Framework
- Execution model based on software (macro) dataflow
  - Exposes parallelism and hides data transport latency
- Computations are expressed as directed acyclic graphs of tasks
  - each task consumes input and produces output (input to future tasks)
  - inputs/outputs are specified for each patch in a structured grid
- Abstraction of global single-assignment memory: the DataWarehouse (a small self-contained sketch follows this slide)
  - Directory mapping names to values (array structured)
  - Values are written once, then communicated to awaiting tasks
- The task graph is mapped onto processing resources
  - Communication schedule approximates a global optimum
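The write-once DataWarehouse idea can be pictured with a small, self-contained C++ sketch; this only illustrates single-assignment semantics and is not the actual UCF DataWarehouse API, and the variable names are made up.

  #include <map>
  #include <stdexcept>
  #include <string>
  #include <vector>

  // Directory mapping variable names to values; each value may be written exactly once.
  class DataWarehouseSketch {
  public:
    void put(const std::string& name, std::vector<double> value) {
      if (!store_.emplace(name, std::move(value)).second)
        throw std::runtime_error("variable '" + name + "' already written");
    }
    const std::vector<double>& get(const std::string& name) const {
      auto it = store_.find(name);
      if (it == store_.end())
        throw std::runtime_error("variable '" + name + "' not yet computed");
      return it->second;
    }
  private:
    std::map<std::string, std::vector<double>> store_;
  };

  int main() {
    DataWarehouseSketch dw;
    dw.put("velocity/patch0", {1.0, 2.0, 3.0});                // producing task writes once
    const std::vector<double>& v = dw.get("velocity/patch0");  // consuming task reads later
    return v.size() == 3 ? 0 : 1;
  }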

Slide 25: Uintah Task Graph (Material Point Method)
- Diagram of named tasks (ovals) and data (edges)
- Imminent computation
- Dataflow-constrained
- MPM: Newtonian material point motion time step
  - Solid edges: values defined at material points (particles)
  - Dashed edges: values defined at vertices (grid)
  - Prime ('): values updated during the time step

Slide 26: Example Taskgraphs (MPM and Coupled)
[Figure: example task graphs for MPM and a coupled simulation]

Slide 27: Taskgraph Advantages
- Accommodates flexible integration needs
- Accommodates a wide range of unforeseen workloads
- Accommodates a mix of static and dynamic load balancing
- Manages the complexity of mixed-mode programming
- Avoids unnecessary transport abstraction overheads
- Simulation time/space coupling
  - Allows a uniform abstraction for coordinating coupled models' time and grid scales
- Allows application components and framework infrastructure (e.g., the scheduler) to evolve independently

Slide 28: Uintah PSE
- The UCF automatically sets up:
  - Domain decomposition
  - Inter-processor communication with aggregation/reduction
  - Parallel I/O
  - Checkpoint and restart
  - Performance measurement and analysis (stay tuned)
- Software engineering
  - Coding standards
  - CVS (commits: Y3 - 26.6 files/day, Y4 - 29.9 files/day)
  - Correctness regression testing with Bugzilla bug tracking
  - Nightly builds (parallel compiles)
  - 170,000 lines of code (Fortran and C++ tasks supported)

Slide 29: Performance Technology Integration
- Uintah presents challenges to performance integration
  - Software diversity and structure
    - UCF middleware, simulation code modules
    - component-based hierarchy
  - Portability objectives
    - cross-language and cross-platform
    - multi-parallelism: thread, message passing, mixed
  - Scalability objectives
  - High-level programming and execution abstractions
- Requires flexible and robust performance technology
- Requires support for performance mapping

Slide 30: Performance Analysis Objectives for Uintah
- Micro tuning
  - Optimization of simulation code (task) kernels for maximum serial performance
- Scalability tuning
  - Identification of parallel execution bottlenecks
    - overheads: scheduler, data warehouse, communication
    - load imbalance
  - Adjustment of task graph decomposition and scheduling
- Performance tracking
  - Understand performance impacts of code modifications
  - Throughout the course of software development
  - C-SAFE application and UCF software

Slide 31: Uintah Performance Engineering Approach
- Contemporary performance methodology focuses on control-flow (function) level measurement and analysis
- The C-SAFE application involves coupled models with task-based parallelism and dataflow control constraints
- Performance engineering on an algorithmic (task) basis
  - Observe performance based on algorithm (task) semantics
  - Analyze task performance characteristics in relation to other simulation tasks and UCF components
    - scientific component developers can concentrate on performance improvement at the algorithmic level
    - UCF developers can concentrate on bottlenecks not directly associated with simulation module code

Slide 32: Task Execution in Uintah Parallel Scheduler
- Profile methods and functions in the scheduler and in the MPI library
- Task execution time dominates (but which task?)
- MPI communication overheads (but where?)
- Need to map performance data!
[Figure: task execution time distribution from the profile]

Slide 33: Semantics-Based Performance Mapping
- Associate performance measurements with high-level semantic abstractions
- Need mapping support in the performance measurement system to assign data correctly

Slide 34: Hypothetical Mapping Example
- Particles distributed on the surfaces of a cube

  Particle* P[MAX];                     /* array of particles */

  int GenerateParticles() {
    /* distribute particles over all faces of the cube */
    for (int face = 0, last = 0; face < 6; face++) {
      /* number of particles on this face */
      int particles_on_this_face = num(face);
      for (int i = last; i < last + particles_on_this_face; i++) {
        /* particle properties are a function of face */
        P[i] = ... f(face) ...;
      }
      last += particles_on_this_face;
    }
  }

Slide 35: Hypothetical Mapping Example (continued)
- How much time (flops) is spent processing the particles of face i?
- What is the distribution of performance among faces?
- How is this determined if execution is parallel?

  int ProcessParticle(Particle* p) {
    /* perform some computation on p */
  }

  int main() {
    GenerateParticles();              /* create a list of particles */
    for (int i = 0; i < N; i++)       /* iterate over the list; N = total particles */
      ProcessParticle(P[i]);
  }

Slide 36: Semantic Entities/Attributes/Associations (SEAA)
- New dynamic mapping scheme (S. Shende, Ph.D. thesis)
  - Contrast with ParaMap (Miller and Irvin)
- Entities defined at any level of abstraction
- Attribute entities with semantic information
- Entity-to-entity associations
- Two association types (implemented in the TAU API)
  - Embedded: extends the data structure of the associated object to store the performance measurement entity
  - External: creates an external look-up table, using the address of the object as the key to locate the performance measurement entity (sketched below)
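A minimal, self-contained C++ sketch of the external association type: a look-up table keyed by an application object's address that locates its performance measurement entity. This only illustrates the idea and is not TAU's internal implementation; the class and field names are made up.

  #include <string>
  #include <unordered_map>

  struct TimerEntity { std::string name; double inclusive_time = 0.0; };

  class ExternalAssociation {
  public:
    // associate an application object (by address) with a timer entity
    TimerEntity& link(const void* object, const std::string& semantic_name) {
      TimerEntity& timer = table_[object];
      timer.name = semantic_name;           // semantic attribute of the entity
      return timer;
    }
    // later, locate the timer again from the object's address alone
    TimerEntity* find(const void* object) {
      auto it = table_.find(object);
      return it == table_.end() ? nullptr : &it->second;
    }
  private:
    std::unordered_map<const void*, TimerEntity> table_;
  };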

Slide 37: No Performance Mapping versus Mapping
- Typical performance tools report performance with respect to routines
  - They do not provide support for mapping
- Performance tools with SEAA mapping can observe performance with respect to the scientist's programming and problem abstractions
[Figure: side-by-side profile views, TAU without mapping vs. TAU with mapping]

Slide 38: Uintah Task Performance Mapping
- Uintah partitions individual particles across processing elements (processes or threads)
- Simulation tasks in the task graph work on particles
  - Tasks have a domain-specific character in the computation, e.g., "interpolate particles to grid" in the Material Point Method
- Task instances are generated for each partitioned particle set
  - Execution is scheduled with respect to task dependencies
- How to attribute execution time among different tasks?
  - Assign a semantic name (task type) to each task instance, e.g., SerialMPM::interpolateParticleToGrid
  - Map a TAU timer object to the (abstract) task (semantic entity)
  - Look up the timer object using the task type (semantic attribute)
  - Further partition along different domain-specific axes

Slide 39: Task Performance Mapping Instrumentation

  void MPIScheduler::execute(const ProcessorGroup* pc,
                             DataWarehouseP& old_dw, DataWarehouseP& dw)
  {
    ...
    // name, type, key, groupname, tid
    TAU_MAPPING_CREATE(task->getName(), "[MPIScheduler::execute()]",
                       (TauGroup_t)(void*)task->getName(), task->getName(), 0);
    ...
    TAU_MAPPING_OBJECT(tautimer)                  // create timer object
    // create external association (link) between timer and key
    TAU_MAPPING_LINK(tautimer, (TauGroup_t)(void*)task->getName());
    ...
    // create profiler object using timer
    TAU_MAPPING_PROFILE_TIMER(doitprofiler, tautimer, 0)
    TAU_MAPPING_PROFILE_START(doitprofiler, 0);
    task->doit(pc);                               // execute the task
    TAU_MAPPING_PROFILE_STOP(0);
    ...
  }

Slide 40: Task Performance Mapping (Profile)
[Figure: profile views showing performance mapping for different tasks and mapped task performance across processes]

Slide 41: Task Performance Mapping (Trace)
[Figure: trace timeline with work packet computation events colored by task type; distinct phases of computation can be identified based on task]

Slide 42: Task Performance Mapping (Trace - Zoom)
[Figure: zoomed trace view showing startup communication imbalance]

Slide 43: Task Performance Mapping (Trace - Parallelism)
[Figure: parallelism view showing communication / load imbalance]

Slide 44: Comparing Uintah Traces for Scalability Analysis
[Figure: trace comparison for runs with 8 processes and 32 processes]

Slide 45: Performance Tracking and Reporting
- Integrated performance measurement allows performance analysis throughout the development lifetime
- Apply performance engineering in the software design and development (software engineering) process
- Create a "performance portfolio" from regular performance experimentation (coupled with software testing)
- Use performance knowledge in making key software design decisions, prior to major development stages
- Use performance benchmarking and regression testing to identify irregularities
- Support automatic reporting of "performance bugs"
- Enable cross-platform (cross-generation) evaluation

Slide 46: XPARE - eXPeriment Alerting and REporting
- Experiment launcher automates measurement and analysis
  - Configuration and compilation of performance tools
  - Instrumentation control for the Uintah experiment type
  - Execution of multiple performance experiments
  - Performance data collection, analysis, and storage
  - Integrated into the Uintah software testing harness
- Reporting system conducts performance regression tests
  - Applies performance difference thresholds (alert ruleset); see the sketch after this list
  - Alerts users via email if thresholds have been exceeded
  - Web-based alerting setup and full performance data reporting
  - Historical performance data analysis
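The alert ruleset idea can be sketched as a simple relative-threshold comparison between a new experiment and a stored baseline; the C++ function below is an assumption about the general mechanism, not XPARE's actual code, and the names are illustrative.

  #include <map>
  #include <string>
  #include <vector>

  struct Alert { std::string event; double baseline; double current; };

  // flag events whose time regressed beyond a relative threshold (e.g., 0.10 = 10% slower)
  std::vector<Alert> checkRegressions(const std::map<std::string, double>& baseline,
                                      const std::map<std::string, double>& current,
                                      double threshold) {
    std::vector<Alert> alerts;
    for (const auto& entry : baseline) {
      auto cur = current.find(entry.first);
      if (cur == current.end()) continue;          // event not measured in this run
      if (cur->second > entry.second * (1.0 + threshold))
        alerts.push_back({entry.first, entry.second, cur->second});
    }
    return alerts;                                 // e.g., reported to developers by email
  }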

Slide 47: XPARE System Architecture
[Figure: system architecture diagram with experiment launch, performance database, regression analyzer, comparison tool, performance reporter, alerting setup, mail server, and web server components]

Slide 48: Experiment Results Viewing Selection

Slide 49: Web-Based Experiment Reporting

Slide 50: Web-Based Experiment Reporting (continued)

Slide 51: Web-Based Experiment Reporting (continued)

Slide 52: Alerting Setup

Slide 53: Scaling Performance Optimizations (Past)
- Last year: initial "correct" scheduler
- Reduced communication by 10x
- Reduced task graph overhead by 20x
[Figure: scaling results on the ASCI Nirvana SGI Origin 2000, Los Alamos National Laboratory]

Slide 54: Scalability to 2000 Processors (Current)
[Figure: scaling results up to 2000 processors on the ASCI Nirvana SGI Origin 2000, Los Alamos National Laboratory]

Slide 55: Concluding Remarks
- Complex systems pose challenging performance analysis problems that require robust methodologies and tools
- New performance problems will arise
  - Instrumentation and measurement
  - Data analysis and presentation
  - Diagnosis and tuning
- No one performance tool can address all concerns
- Look towards an integration of performance technologies
  - Support to link technologies to create performance problem solving environments
- Performance engineering methodology and tool integration with the software design and development process

Slide 56: Integrated Performance Evaluation Environment

Slide 57: References
- A. Malony and S. Shende, "Performance Technology for Complex Parallel and Distributed Systems," Proc. 3rd Workshop on Parallel and Distributed Systems (DAPSYS), pp. 37-46, Aug. 2000.
- S. Shende, A. Malony, and R. Ansell-Bell, "Instrumentation and Measurement Strategies for Flexible and Portable Empirical Performance Evaluation," Proc. Int'l Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA), CSREA, pp. 1150-1156, July 2001.
- S. Shende, "The Role of Instrumentation and Mapping in Performance Measurement," Ph.D. Dissertation, University of Oregon, Aug. 2001.
- J. de St. Germain, A. Morris, S. Parker, A. Malony, and S. Shende, "Integrating Performance Analysis in the Uintah Software Development Cycle," ISHPC 2002, Nara, Japan, May 2002.

