1 Enabling Adaptive Algorithms through Component-Based Software Engineering Boyana Norris Argonne National Laboratory http://www.mcs.anl.gov/~norris RWTH Aachen, July 10, 2007

2 2 Outline Motivation Goals and assumptions A brief introduction to components for high-performance computing Approach: A quality of service architecture Application examples Summary and future work

3 3 Motivation Overall simulation times for nonlinear PDE-based models often depend to a large extent on the robustness and efficiency of sparse linear solvers –the complexity of long-running parallel applications prevents any single tool or solver from performing best in all cases We are investigating parallel multimethod (non)linear solvers that incorporate dynamic algorithmic adaptivity to enhance robustness and decrease overall runtimes.

4 4 Target Applications Scientific simulation involving the solution of nonlinear PDEs –Nonlinear system solution embedded in time-dependent problems –Sparse (iterative) linear solvers needed, performance not normally known a priori –Multimethod algorithms for the solution of linear systems of equations, which consume the bulk of the execution time in nonlinear PDE solution and optimization Nonlinear optimization Parallel grid partitioning Multi-fidelity computations

5 5 Motivating Application Examples: PETSc-FUN3D 3D compressible Euler (used in this work; also supports incompressible Navier-Stokes) Fully implicit, steady-state Developed by D. Kaushik et al. (ANL) Based on FUN3D (developed by W.K. Anderson, NASA Langley) –Tetrahedral, vertex-centered unstructured mesh –Discretization: 1st- or 2nd-order Roe for convection and Galerkin for diffusion Pseudo-transient continuation –Backward Euler for nonlinear continuation toward steady-state solution –Switched Evolution/Relaxation (SER) approach of Van Leer and Mulder Newton-Krylov nonlinear solver –Matrix-free (2nd-order FD) –Preconditioner (1st-order analytical) Won Gordon Bell prize at SC99; ongoing enhancements and performance tuning

6 6 Motivating Application Examples (cont.): Transonic Flow 3D transonic flow over the ONERA M6 wing at a 3.06° angle of attack (exhibits a λ-shock at M = 0.839) Solve the steady compressible Euler equations (shown as a figure on the slide), where ρ = density, u = velocity, p = pressure, E = energy density Test problem parameters –357,900 vertices –Jacobian of dimension 1.8 million with about 129 million nonzeros

7 7 Pseudo-Transient Continuation Parameters SER Heuristic Parameters of Interest –Initial CFL number –Exponent in the power law: = 1 normally; > 1 for first-order discretization (1.5); < 1 at the outset of second-order discretization (0.75) –Switch-over ratio between first and second order: 0.01–0.001
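For reference, the SER (Switched Evolution/Relaxation) heuristic referred to above is commonly written as a CFL update driven by the decrease of the nonlinear residual; a standard form (stated here for context, not taken verbatim from the slide) is

  \mathrm{CFL}^{l} = \mathrm{CFL}^{0} \left( \frac{\lVert f(u^{0}) \rVert}{\lVert f(u^{l-1}) \rVert} \right)^{p}

where f is the nonlinear residual of the discretized system and p is the power-law exponent listed above (p = 1 normally, larger for first-order discretization, smaller at the outset of second-order discretization).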

8 8 Motivating Application Examples (cont.): Radiation Transport Based on Mousseau, Knoll, and Rider (LA-UR-99-4230) Governs the evolution of photon radiation in an optically thick medium Derived by integrating over all energy frequencies, assuming –Isotropy (angle dependence averaged out) –Small mean-free photon paths Very important in the simulation of forest fires, inertial confinement fusion (http://fusion.gat.com/icf), and astrophysical phenomena (Figures: solution at t = 1 and at t = 3)

9 9 Two-Equation Model Based on Mousseau, Knoll, and Rider (LA-UR-99-4230) Photon energy equation and material energy equation (shown as figures on the slide) The atomic number z depends on the location Solve the nonlinear equation R = 0 at every time step

10 10 Adaptive Newton-Krylov Solvers Solve f(u) = 0 using preconditioned inexact Newton’s methods: –Solve f'(u_{l-1}) du_l = -f(u_{l-1}) –Update u_l = u_{l-1} + λ_l du_l Observations: –Preconditioner quality dramatically impacts overall efficiency of pseudo-transient Newton-Krylov methods [Gropp et al., 2000] –Convergence tuning is also important; e.g., a relatively low-accuracy linear solution is sufficient for good nonlinear convergence in some phases of the pseudo-transient algorithm
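A minimal sketch of the inexact Newton loop just described, in plain Python; residual and jacobian_solve are hypothetical callables standing in for the application and its (preconditioned) Krylov solve, and the simple forcing-term rule is only an assumption illustrating the loose-then-tight linear tolerances mentioned above.

    import numpy as np

    def inexact_newton(residual, jacobian_solve, u, max_it=50, atol=1e-8):
        # Solve f(u) = 0: at each step solve f'(u_{l-1}) du = -f(u_{l-1}) inexactly,
        # then update u_l = u_{l-1} + lambda_l * du (here with fixed damping 1).
        f = residual(u)
        f0 = np.linalg.norm(f)
        for _ in range(max_it):
            if np.linalg.norm(f) < atol:
                break
            # Loose linear tolerance early in the iteration, tighter as the
            # nonlinear residual drops (cheap low-accuracy solves suffice early).
            rtol = min(0.1, np.linalg.norm(f) / f0)
            du = jacobian_solve(u, -f, rtol=rtol)   # hypothetical Krylov solve
            lam = 1.0                               # a line search would adjust this
            u = u + lam * du
            f = residual(u)
        return u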

11 11 Linear Solvers: Multimethod Approaches for Improved Robustness and Performance Composite Solvers (static): provide reliable linear solution by using a sequence of preconditioned iterative methods on a given system until convergence is achieved –Developed a combinatorial scheme for constructing a composite in terms of metrics such as execution time and mean failure rates –Experiments demonstrate significant improvements in overall time to solution, though many issues remain to be addressed Flow in driven cavity: Developed composites that required only about 60% of overall time to solution of the best single method Adaptive Solvers (dynamic): Solution method is selected dynamically to match the attributes of the linear systems as they change during the course of a simulation –Some performance illustrations in the following slides
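To make the static composite idea above concrete, a rough sketch follows: try an ordered sequence of preconditioned iterative methods on the same system until one converges. The solver objects and their try_solve method are hypothetical stand-ins, not the actual PETSc-based implementation.

    def composite_solve(A, b, ordered_solvers):
        # The ordering would come from the combinatorial scheme mentioned above,
        # built from metrics such as execution time and mean failure rate.
        for solver in ordered_solvers:
            x, converged = solver.try_solve(A, b)   # hypothetical interface
            if converged:
                return x
        raise RuntimeError("no method in the composite converged")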

12 12 More Motivating Applications Parallel mesh partitioning in combustion simulations: J. Ray et al. (Sandia) have developed a CCA toolkit for flame simulations using block-structured adaptive meshes, which must be partitioned to enable parallel computing. No single partitioner is optimal; thus, CQoS support for choosing an efficient partitioner and an appropriate configuration for a given mesh is desirable. Resource management in quantum chemistry: enabling dynamic adaptation to available resources (e.g., memory and time) during molecular wavefunction determination and other quantum chemical subproblems [Kenny:2004]; adaptively using MCMD and hybrid computing paradigms in quantum chemistry simulations [Manoj:2005]. Efficient solution of linear systems arising in accelerator and fusion simulations: The solution of linear algebraic systems of equations often dominates the overall runtime of large-scale PDE-based simulations, such as those proposed in the Accelerator Community Code Development and Discovery and Framework Application for Core-Edge Transport Simulations SciDAC-2 proposals. The focus of CQoS support is on automation of appropriate choices for algorithms and parameters of linear and nonlinear solver components.

13 13 Definitions and Assumptions Adaptivity –We consider adaptivity at a coarse level, usually for algorithms performing some nontrivial computation –In addition to runtime adaptivity, we also consider the problem of initial application configuration Prerequisites –A library or component implementation with well-defined interfaces

14 14 Goal Improve time to solution through automation of application configuration and runtime adaptation within given quality constraints.

15 15 Outline Motivation Goals and assumptions A brief introduction to components for high- performance computing Approach: A quality of service architecture Application examples Summary and future work

16 16 Basic Requirements for a High-Performance Component Architecture Simple/Flexible – to adopt – to understand – to use Support a composition mechanism that does not impede high-performance component interactions Permit the SPMD (also allow MPMD) paradigms in component form Be able to coexist with and rely on other commodity component frameworks to provide services... – e.g., EJB, CORBA, COM, ...

17 17 CCA Component Model Common Component Architecture (CCA) Forum (http://www.cca-forum.org) –Originated in 1998, led by Rob Armstrong (SNL) –Open to everyone interested in HPC components CCA provides a specification and supporting software for interoperability of high-performance scientific components developed by many different groups in different languages or frameworks. –Minimalist approach makes it easier to create components from existing software –Abstract interfaces called ports define the interactions between components –A set of tools to enhance the productivity of scientific programmers Make some things easier, make some intractable problems tractable Support & promote reuse – “The best software is code you don’t have to write.” [Steve Jobs] – Design reuse – Reuse of tested and optimized implementations – Skills reuse Support interoperability – Software – People Enable developer team specialization Existing component architecture standards such as CORBA, Java Beans, and COM do not provide support for parallel components. (Figure: latency between components – CCA/MPI vs. CORBA/Java, axis in seconds)

18 18 Object-Oriented vs. Component-Oriented Development Components can be implemented using OO techniques Component-oriented development can be viewed as augmenting OOD with certain policies, e.g., requiring that certain abstract interfaces be implemented Components, once compiled, require a special execution environment COD focuses on higher levels of abstraction, not particular to a specific OO language –abstract (common) interfaces –dynamic composability –language interoperability Can convert from OO to CO for a specific framework, possibly with some level of automation

19 19 CCA Delivers (Preserves) Performance Local –No CCA overhead within components –Small overhead between components –Small overhead for language interoperability (with some exceptions) Parallel –No CCA overhead on parallel computing –Use your favorite parallel programming model –Supports SPMD and MPMD approaches Distributed (remote) –No CCA overhead; performance depends on networks, protocols –CCA frameworks support OGSA/Grid Services/Web Services and other approaches (Figures: maximum 0.2% overhead for CCA vs. native C++ code for parallel molecular dynamics up to 170 CPUs; aggregate time for the linear solver component in an unconstrained minimization problem with PETSc)

20 20 Why Components? Component-oriented development enables automated application adaptation that relies on conformance to common well-defined interfaces, e.g., –Linear solver interfaces –Automatic generation of source code for proxy components for performance monitoring, debugging, checkpointing, etc. –Database access interfaces –… Component definitions can be augmented with metadata for specifying performance and quality attributes/requirements. Components are a “plug-and-play” model for building applications, enabling –Algorithm substitution –Automation of performance instrumentation/monitoring –Application adaptivity (automated or user-guided)

21 21 Major Events in a Component’s Lifetime Dynamic Library Loading Instantiation Port Connections (Port Configuration) Configuration Execution Destruction

22 22 Outline Motivation Goals and assumptions A brief introduction to components for high-performance computing Approach: A quality of service architecture Application examples Summary and future work

23 23 Approach Computational quality of service (CQoS) infrastructure for component applications, which –Uses metadata for describing non-functional properties and requirements, e.g., quality “metrics” –Supports automated performance instrumentation and monitoring –Enables off-line performance data analysis through machine learning, statistics, etc. for extracting essential performance characteristics for constructing composite algorithms for use in adaptive heuristics –Provides a framework for dynamic adaptive algorithm development (domain-specific) based on CQoS metadata from analytical performance models or synthesized from past performance history of the application.

24 24 Approach (Cont.) Consider a multi-component application A, composed of components C_1, C_2, ..., C_n. Each C_i can have multiple implementations C_{i,k}. Define a cost function P(C_{i,k}) = F_{i,k}(x, p), where –P(C_{i,k}) is the execution time of component C_i with implementation C_{i,k} –F_{i,k}(x, p) represents a mathematical model or data representing the simulation’s state x and parameters p While in some cases an analytical model exists for P(C_{i,k}), in most cases we have only a few data points indicating the performance of prior runs.
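Read operationally, the cost function supports a simple selection rule: at configuration or adaptation time, pick the implementation C_{i,k} with the smallest predicted cost for the current state and parameters. A toy sketch, with a hypothetical predicted_cost model attached to each implementation:

    def select_implementation(implementations, state_x, params_p):
        # Choose C_{i,k} minimizing P(C_{i,k}) = F_{i,k}(x, p); F may be an
        # analytical model or an interpolant fit to a few prior-run data points.
        return min(implementations,
                   key=lambda impl: impl.predicted_cost(state_x, params_p))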

25 25 CQoS Architecture Enable the implementation of CQoS-enhanced applications by providing the necessary performance data gathering, manipulation, and dynamic execution control infrastructure, e.g.: –Performance instrumentation, monitoring, and database management –Off-line access to performance data for analysis, uniform access to analysis engines: machine learning, statistics, etc. –Control law language for dynamic adaptation based on CQoS metadata from analytical performance models or synthesized from past performance history of the application –Databases [1] for performance experiment data and performance models Automation! [1] Kevin A. Huck, Allen D. Malony, Robert Bell, and Alan Morris, “Design and Implementation of a Parallel Performance Data Management Framework,” Proc. International Conference on Parallel Processing (ICPP 2005), IEEE Computer Society, 2005.

26 26 (Architecture diagram) Instrumented component application cases feed performance databases (historical & runtime). An analysis infrastructure performs performance monitoring, problem/solution characterization, and performance model building; the scientist can analyze data interactively and record decisions on substitution and reparameterization in a substitution assertion database. A control infrastructure interprets and executes control laws to modify the application’s behavior, driving a control system (parameter changes and component substitution from a component substitution set) that acts on the CQoS-enabled component application (components A, B, C).

27 27 CQoS Metadata Ideally, for each application execution, the metadata should provide enough information to be able to reproduce a particular application instance. Examples: –Input data (reduced representations) Matrix properties, condition number –Model parameters Physical constants –Algorithmic parameters Convergence tolerance CFL number Maximum number of iterations –Compilers Name, version, options –Hardware CPU type, memory and caches Parallel configuration
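As an illustration only, a per-run metadata record of the kind listed above might look like the following; the field names and values are invented for the example and are not a CQoS schema.

    run_metadata = {
        "input": {"matrix_dimension": 1_800_000, "nonzeros": 129_000_000},
        "algorithm": {"linear_solver": "FGMRES(30)", "preconditioner": "BJacobi/ILU(0)",
                      "convergence_tolerance": 1e-5, "max_iterations": 500,
                      "cfl_initial": 50.0},
        "compiler": {"name": "gcc", "version": "4.1.2", "options": "-O3"},
        "hardware": {"cpu": "2.4 GHz Pentium Xeon", "memory_gb": 2,
                     "parallel_configuration": "4 MPI processes"},
    }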

28 28 Sources of CQoS Metadata Performance (general) –Theoretical models –Source-based models –Historical performance data from different instances of the same application or related applications: Obtained through source instrumentation, e.g., TAU (U. Oregon) Binary instrumentation, e.g., HPCToolkit (Rice U.) Domain-specific –Provided by scientist/algorithm developer

29 29 Analysis Infrastructure (for Performance Monitoring and Modeling, Problem/Solution Characterization, and Machine Learning) (Diagram) An application driver and performance-related proxies feed a measurement component and component logger, which populate the performance databases (historical & runtime). Iterative analysis and model building (machine learning, verification & validation, interactive analysis) produce modeling information (metadata, models, visualization); the scientist can analyze data interactively, and results flow to the substitution assertion database and the interface to the control infrastructure.

30 30 Gathering and Analyzing Performance Data Challenges: whole-application profiles do not provide enough information; complete tracing is too resource-intensive Approach: –Generate component proxies that are instrumented to capture the performance of a given component (these can be generated, built, and loaded on demand) –Add finer-grain instrumentation for computationally intensive kernels (once they are identified), use context-sensitive monitoring –Use a database that allows the representation and storage of application metadata and performance models in addition to raw performance data Enable different analyses through well-defined APIs to the performance data and metadata in the performance database
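A bare-bones illustration of the proxy idea from the first bullet: a wrapper that forwards calls to the wrapped component and records per-call wall-clock times. The store_timing call stands in for writing to the performance database; everything here is a sketch, not the generated CCA proxy code.

    import time

    class TimingProxy:
        # Forwards attribute access to the wrapped component; callable attributes
        # are wrapped so that each invocation is timed and recorded.
        def __init__(self, component, perf_db):
            self._component = component
            self._perf_db = perf_db

        def __getattr__(self, name):
            target = getattr(self._component, name)
            if not callable(target):
                return target
            def timed(*args, **kwargs):
                t0 = time.perf_counter()
                result = target(*args, **kwargs)
                elapsed = time.perf_counter() - t0
                self._perf_db.store_timing(type(self._component).__name__,
                                           name, elapsed)   # placeholder DB call
                return result
            return timed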

31 31 Abstract Framework (Diagram) An application component calls a component proxy, generated by a proxy generator, which stands in front of alternative component implementations A, B, and C. A substitution & reparameterization decision service, guided by a characterization of the problem (characterization of component functionality with respect to the problem via application-provided heuristics and/or machine learning), a control law over the component substitution set and its parameters, the substitution assertion database, and the interface to the analysis infrastructure, drives a replacement service (provisioning & deployment). The scientist can provide decisions on substitution and reparameterization. Together these form the control infrastructure.

32 32 Initial Focus: Application Adaptation Scientific computations may consist of different phases, with different resource needs and performance characteristics –Identify those phases, first manually, then with automated heuristics Exploit component-based (and service-based) software engineering to enable automated solution method selection and substitution at runtime based on changing application characteristics and available resources Some initial efforts also on application composition by automatically deriving whole-application performance models from single-component ones [R. Armstrong, J. Ray, Sandia National Laboratory]

33 33 Example: Multimethod Linear Solver Components in Nonlinear PDE Solution (Diagram: a nonlinear solver component connected to mesh, physics, checkpointing, performance monitor, and adaptive heuristic components; the adaptive heuristic selects among linear solver components A, B, and C.) Example adaptive strategies based on: –CFL number –Rate of nonlinear convergence –Known phases –Matrix properties

34 34 Outline Motivation Goals and assumptions A brief introduction to components for high-performance computing Approach: A quality of service architecture Application examples Summary and future work

35 35 Adaptive Example: FUN3D Adapt1: (1) 1st order: BCGS / BJacobi with ILU(0); (25) 1st order: FGMRES(30) / BJacobi with ILU(0); (28) 2nd order: BCGS / BJacobi with ILU(0); (66) 2nd order: FGMRES(30) / BJacobi with ILU(0); (80) 2nd order: FGMRES(30) / BJacobi with ILU(1) Adapt2: (1) 1st order: GMRES(30) / BJacobi with SOR; (2) 1st order: BCGS / BJacobi with ILU(0); (25) 1st order: FGMRES(30) / BJacobi with ILU(0); (28) 2nd order: BCGS / BJacobi with ILU(0); (66) 2nd order: FGMRES(30) / BJacobi with ILU(0); (80) 2nd order: FGMRES(30) / BJacobi with ILU(1) (Figure: linear solution time per nonlinear iteration and cumulative time in seconds vs. nonlinear iteration number, base methods vs. adaptive)

36 36 Adaptive Example: FUN3D (cont.) Comparison of traditional fixed linear solvers and an adaptive scheme, which uses a different preconditioner during each of the phases of the pseudo-transient Newton-Krylov algorithm MCS Jazz cluster (2.4 GHz Pentium Xeon with 1 or 2 GB RAM), 4 nodes Adaptive methods provide reliable and robust solution and reduced the number of nonlinear iterations and overall time to solution. (Figures: convergence rates of base and adaptive methods; cumulative time in seconds vs. nonlinear iterations. Adapt1 and Adapt2 switch methods as listed on the previous slide.)

37 37 Example 2: 2D Driven Cavity Flow [1] Linear solver: GMRES(30), vary only the fill level of the ILU preconditioner Adaptive heuristic based on: –Previous linear solution convergence rate, nonlinear solution convergence rate, rate of increase of linear solution iterations 96x96 mesh, Grashof number = 10^5, lid velocity = 100 Intel P4 Xeon, dual 2.2 GHz, 4 GB RAM [1] T. S. Coffey, C. T. Kelley, and D. E. Keyes. Pseudo-transient continuation and differential algebraic equations. SIAM J. Sci. Comput., 25:553–569, 2003.
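A simplified rendering of the kind of heuristic described above, adjusting the ILU fill level between nonlinear iterations; the thresholds are made up for illustration and are not the values used in the experiments.

    def choose_fill_level(current_fill, linear_its, prev_linear_its, nonlinear_rate):
        # linear_its / prev_linear_its: Krylov iterations of the last two linear solves;
        # nonlinear_rate: ratio of successive nonlinear residual norms (near 1 = slow).
        if linear_its > 1.5 * prev_linear_its or nonlinear_rate > 0.9:
            return current_fill + 1      # solves getting harder: use a stronger ILU(k)
        if linear_its < 0.5 * prev_linear_its and current_fill > 0:
            return current_fill - 1      # solves comfortably easy: back off to cut cost
        return current_fill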

38 38 Example 3: Radiation Transport (Figure annotations: GMRES(30)/SOR, BCGS/ILU(1), FGMRES(30)/ILU(1), BCGS/ILU(1))

39 39 Example 3: Radiation Transport (cont.)

40 40 Some Related Multimethod Work Performance of several preconditioned Krylov iterations and stationary methods in nonlinear elliptic PDEs [Ern et al. 1994] Polyalgorithmic parallel linear system solution [Barrett et al. 1996] SALSA – Self-Adapting Large-scale Software Architecture [Dongarra and Eijkhout, 2004] Application of Machine Learning to Selecting Solvers for Sparse Linear Systems, S. Bhowmick, D. Keyes, Y. Freund, V. Eijkhout, and E. Fuentes, SIAM Conference on Parallel Processing for Scientific Computing, February 22–24, 2006, San Francisco, California. Also a number of research efforts in the area of QoS for components in general

41 41 Summary Adaptive linear solvers provide reliable & robust solution and can reduce overall time to solution. Observed a need for an expanded notion of quality for service- and component-based applications. Designed an infrastructure for performance analysis and adaptation of scientific applications, leading to a general CQoS architecture for component- or service-based applications. –Initial implementation: performance monitoring, database access, algorithm adaptation

42 42 Ongoing and Future Work (Incomplete List) Component implementation of CQoS architecture Some initial efforts also on application composition by automatically deriving whole-application performance models from single-component ones [R. Armstrong, J. Ray, Sandia National Laboratory] Incorporate more off-line performance analysis capabilities (machine learning, statistical analysis, etc.) Apply to more problem domains, implementing extensions as necessary Integration with grid services Integration of ongoing efforts in –Performance tools: common interfaces and data representation (leverage PERI tools, PerfDMF, TAU performance interfaces, and other efforts) –Numerical components: emerging common interfaces (e.g., TOPS solver interfaces) increase the choice of solution methods → automated composition and adaptation strategies

43 43 Collaborators and Funding Sanjukta Bhowmick, ANL/Columbia Dinesh Kaushik, Lois McInnes, Argonne National Laboratory Padma Raghavan (Penn State), Ivana Veljkovic Jaideep Ray, Rob Armstrong, Sandia National Laboratory Allen Malony, Sameer Shende, Alan Morris, U. Oregon Tamara Dahlgren, Lawrence Livermore National Laboratory Funding: –Department of Energy (DOE) Mathematical, Information, and Computational Science (MICS) program –DOE Scientific Discovery through Advanced Computing (SciDAC) program –National Science Foundation

44 44 Thank you!

45 45 Some Initial CQoS Infrastructure Implementations Goal: provide a framework for –Performance monitoring and analysis of numerical components –Initial application composition and dynamic adaptivity based on: Off-line analyses of past performance information Online analysis of current execution performance information Motivating application examples: –Driven cavity flow [Coffey et al., 2003], nonlinear PDE solution –FUN3D – incompressible and compressible Euler equations Prior work in multimethod linear solvers –McInnes et al. ’03, Bhowmick et al. ’03 and ’05, Norris et al. ’05

46 46 Prototype Software for Managing Adaptive Strategies (Diagram) Application component(s) in a component framework call provider components A, B, and C through a component proxy and an abstract interface; adaptive strategy components select among the providers, drawing on runtime monitoring, runtime database access, and a historical database.

47 47 Interface Example: Adaptive Method

Interface AdaptiveAlgorithm {
  int Adapt(inout opaque appData);
}

Interface AdaptiveContext {
  int apply();
  AdaptiveAlgorithm getAdaptiveAlgorithm();
  void setAdaptiveAlgorithm(in AdaptiveAlgorithm alg);
  Properties getProperties();
  void setProperties(in Properties prop);
}
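For illustration, a component implementing AdaptiveAlgorithm might look roughly like the Python sketch below; the appData contents and the solver-switching rule are purely notional, and generated language bindings would differ in detail.

    class SwitchOnCFL:
        # Hypothetical AdaptiveAlgorithm implementation: switch the linear solver
        # configuration once the CFL number grows past a threshold.
        def Adapt(self, appData):
            if appData.get("cfl", 0.0) > 100.0:          # notional appData field
                appData["linear_solver"] = "FGMRES(30)"
                appData["preconditioner"] = "BJacobi/ILU(1)"
            return 0                                     # 0 meaning success, by convention here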

48 48 Runtime Performance Monitoring Requirements: –Low overhead –Context information –Access to performance data from other components in the application

49 49 Runtime Support Implementation TAU performance measurement (provided by TAU developers, see http://www.cca-forum.org/components/profiling/tau.htm) Checkpointing component –Checkpoint and store collected data into runtime database; variable sampling rates supported –Efficient queries during the application’s execution –Can also be used by any other component in the application to collect and query context-dependent and high-level performance information for itself or for other checkpointed components

50 50 Runtime Support (Cont.) Performance metadata table: –Iteration range for which the metadata result applies (this is a two-element, one-dimensional array) –Average slope of change of metadata in this range –Input parameter range for which the metadata result applies (this is a two-dimensional array).
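One possible concrete rendering of that table, sketched with Python's built-in sqlite3 module; the column names are illustrative only, and the actual implementation used a PerfDMF-style performance database.

    import sqlite3

    conn = sqlite3.connect("cqos_runtime.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS performance_metadata (
            id               INTEGER PRIMARY KEY,
            iteration_start  INTEGER,   -- two-element iteration range to which the result applies
            iteration_end    INTEGER,
            avg_slope        REAL,      -- average slope of change of the metadata in this range
            param_ranges     TEXT       -- input parameter ranges (serialized 2-D array)
        )
    """)
    conn.commit()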

51 51 Offline Analysis Requirements: –Database access (query and write) –Context information: experiment metadata –Access to performance data for multiple application instances

