Loads Balanced with CQoS Nicole Lemaster, Damian Rouson, Jaideep Ray Sandia National Laboratories Sponsor: DOE CCA Meeting – January 22, 2009.


1 Loads Balanced with CQoS Nicole Lemaster, Damian Rouson, Jaideep Ray Sandia National Laboratories Sponsor: DOE CCA Meeting – January 22, 2009

2 Computational Quality of Service
Definition: The ability to change a simulation code (a collection of CCA components) on-the-fly in order to maintain optimality
  –Optimality: Determined by a user-defined cost function of simulation behavior and/or solution properties
Choose components to make the resultant code
  –Fast
  –Robust
  –Accurate
Component behavior/performance depends on the problem at hand (i.e. the input)
  –Requirement: quantification of problem “difficulty”
  –Requirement: metrics for component performance and input difficulty
Competing constraints
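As an illustration of the "user-defined cost function" idea, here is a minimal Python sketch. It is not taken from the presentation: the `ComponentEstimate` fields, the weighted-sum form of the cost, and the weights are all assumptions chosen to show how speed, robustness, and accuracy can compete.

```python
from dataclasses import dataclass


@dataclass
class ComponentEstimate:
    """Hypothetical predicted behavior of one candidate component."""
    name: str
    runtime_s: float     # predicted wall-clock time (lower is faster)
    failure_rate: float  # predicted fraction of failed runs (lower is more robust)
    error: float         # predicted solution error (lower is more accurate)


def cost(est, w_time=1.0, w_fail=1.0, w_err=1.0):
    """Assumed cost function: a weighted sum of the three competing metrics."""
    return w_time * est.runtime_s + w_fail * est.failure_rate + w_err * est.error


def choose_component(estimates, **weights):
    """Return the candidate with the lowest cost under the given weights."""
    return min(estimates, key=lambda e: cost(e, **weights))


candidates = [
    ComponentEstimate("A", runtime_s=10.0, failure_rate=0.0, error=1e-3),
    ComponentEstimate("B", runtime_s=5.0, failure_rate=0.5, error=1e-2),
]
fastest = choose_component(candidates)               # speed dominates
safest = choose_component(candidates, w_fail=100.0)  # robustness dominates
```

Changing the weights changes the winner, which is the sense in which the constraints compete.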

3 Theoretical Needs
What metrics define performance?
  –FLOPS, iterations to convergence, % load imbalance
What metrics define robustness?
  –Convergence failure, bad load-imbalance, etc.
What metrics define accuracy?
  –Global, local, statistical, deterministic, etc.
All metrics are functions of the component’s input
We need a model that, given component inputs and machine characteristics, predicts the component’s performance
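The sketch below makes two of these ideas concrete. The slides do not define "% load imbalance", so the formula used here (maximum load relative to mean load) is one common convention and is an assumption. The straight-line performance model is likewise only a stand-in for whatever statistical model would actually be fitted to measured data.

```python
def percent_load_imbalance(loads):
    """Assumed definition: 100 * (max load - mean load) / mean load."""
    if not loads:
        raise ValueError("need at least one processor load")
    mean = sum(loads) / len(loads)
    if mean == 0:
        raise ValueError("mean load is zero")
    return 100.0 * (max(loads) - mean) / mean


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) model at input x."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical measurements: problem size vs. runtime in seconds.
model = fit_linear([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
```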

4 Practical Needs
To make performance models, we need
  –A collection of components to choose from
  –A test harness for components
  –A performance measurement tool – TAU
  –A database for storing empirical performance data
  –Statistical tools for model making
To use performance models and make adaptive codes, we need a control system that contains
  –An optimization system to choose the “best” component
  –A feedback system that can take corrective action if a bad component is chosen by the optimization system
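One way to picture the optimization-plus-feedback pairing is the Python sketch below. It is an assumption-laden illustration, not the presenters' design: the `tolerance` factor, the fallback order, and the `measure` callback are hypothetical.

```python
def run_with_feedback(predicted, measure, tolerance=1.5):
    """Try components in order of predicted cost; fall back when one underperforms.

    predicted: dict mapping component name -> predicted cost
    measure:   callable taking a name and returning the measured cost
    tolerance: hypothetical factor by which a measurement may exceed its prediction
    """
    best_name, best_cost = None, float("inf")
    for name in sorted(predicted, key=predicted.get):  # optimization step
        actual = measure(name)
        if actual < best_cost:
            best_name, best_cost = name, actual
        if actual <= tolerance * predicted[name]:      # feedback check passed
            return name, actual
    # No candidate met its prediction: keep the best one actually observed.
    return best_name, best_cost


# Hypothetical numbers: component "A" is predicted best but performs badly.
predicted_cost = {"A": 1.0, "B": 2.0}
measured_cost = {"A": 5.0, "B": 2.2}
chosen = run_with_feedback(predicted_cost, measured_cost.get)
```

Here the optimizer first picks "A", the feedback check rejects it, and the loop moves on to "B".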

5 Our Interests
Problem: Create a control system that can choose the best load-balancer for a simulation
  –Load per grid cell varies in both space and time over Cartesian mesh
What numerical techniques lead to imbalance?
  –Adaptive Cartesian meshes
  –Some operator-split constructions
What applications show such behavior?
  –Hydrocarbon combustion
  –Astrophysics – Type II supernovae
Net result: Simulation becomes load-imbalanced

6 Considerations
Solution: Repartition frequently using a fast, dynamic load-balancer
  –Speed is achieved mainly by sacrificing partition quality
  –Some load-balancers favor load balance; others minimize communication time or data migration during repartitioning
Physics and numerics determine whether the simulation is computation- or communication-dominated, so
  –The same load-balancer may not work throughout the run
  –We need to choose load-balancers anew every time we repartition

7 Control System Configuration
What would a control infrastructure for an analytical control law look like?
[Figure: block diagram with the components Driver, Mesh, Mesh Characterizer, Meta-Partitioner (if-then-else control law), Partitioner-A, Partitioner-B, and Partitioner-C; outputs are labeled Mesh Partitions.]

8 Load-Balancer Selection
Model the simulation to formulate metrics that depend on the current state
  –e.g., communication/computation cost
Characterize the dynamic load-balancers with simplified metrics
  –e.g., communication time, data migration effort, grid shape, runtime
Develop rules to pair simulation state with appropriate partitioner
  –Implement a “meta-partitioner” to select a load-balancing partitioner using the rules
Essentially, the code adapts to the problem!
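A rule-based meta-partitioner of the if-then-else kind described here could look like the following Python sketch. Everything specific in it is hypothetical: the `SimulationState` fields, the threshold values, and the partitioner names are placeholders, not rules from the presentation.

```python
from dataclasses import dataclass


@dataclass
class SimulationState:
    """Hypothetical per-step measurements taken from the running simulation."""
    comm_time_s: float       # time spent communicating
    comp_time_s: float       # time spent computing
    migration_cost_s: float  # estimated cost of moving data on repartition


def select_partitioner(state, comm_threshold=1.0, migration_threshold=0.5):
    """Pair the simulation state with a partitioner using simple rules.

    The thresholds and the returned names are illustrative assumptions.
    """
    comm_ratio = state.comm_time_s / state.comp_time_s
    migration_ratio = state.migration_cost_s / state.comp_time_s
    if comm_ratio > comm_threshold:
        # Communication-dominated: favor a partitioner that cuts communication.
        return "min_communication"
    elif migration_ratio > migration_threshold:
        # Repartitioning is expensive: favor one that moves little data.
        return "min_migration"
    else:
        # Computation-dominated: favor the best load balance.
        return "best_load_balance"
```

Because the rules are re-evaluated at every repartitioning step, the chosen partitioner can change as the simulation moves between computation- and communication-dominated phases.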

9 Control Systems Research
Mostly done by J. Steensland & H. Johansson
  –Johansson H.; Design and Implementation of a Dynamic and Adaptive Meta-Partitioner for Parallel SAMR Grid Hierarchies
Have a set of parameterized load-balancers
Modeled the relationship between mesh characteristics and the load-balancer inputs that lead to optimal partitions
Performed tests to determine whether, given a mesh, the model can predict the best load-balancer
  –It cannot predict it reliably, but
  –It provides a set of (~10) candidates that contains the best one
  –Brute-force solution: test all candidates and select the best (takes ~10 seconds)
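The brute-force step (test every candidate, keep the best) can be sketched in Python as follows. The two toy partitioners and the scoring metric (largest per-part load) are assumptions made for illustration; they are not the parameterized load-balancers studied by Steensland and Johansson.

```python
def block_partition(loads, nparts):
    """Toy candidate: contiguous blocks with equal cell counts."""
    n = len(loads)
    return [min(i * nparts // n, nparts - 1) for i in range(n)]


def greedy_partition(loads, nparts):
    """Toy candidate: assign the heaviest cells first to the lightest part."""
    totals = [0.0] * nparts
    owner = [0] * len(loads)
    for i in sorted(range(len(loads)), key=lambda i: -loads[i]):
        p = totals.index(min(totals))
        owner[i] = p
        totals[p] += loads[i]
    return owner


def max_part_load(loads, owner, nparts):
    """Assumed quality metric: the load of the most heavily loaded part."""
    totals = [0.0] * nparts
    for load, p in zip(loads, owner):
        totals[p] += load
    return max(totals)


def pick_best(candidates, loads, nparts):
    """Run every candidate partitioner and return the best name plus all scores."""
    scores = {
        name: max_part_load(loads, partition(loads, nparts), nparts)
        for name, partition in candidates.items()
    }
    return min(scores, key=scores.get), scores


cell_loads = [9, 1, 1, 1, 1, 1, 1, 1]  # hypothetical per-cell work
best, scores = pick_best(
    {"block": block_partition, "greedy": greedy_partition}, cell_loads, 2
)
```

With roughly ten candidates, the cost of this loop is ten trial partitionings, which is consistent with the few-seconds overhead quoted on the slide.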

10 Required Components
Essentially, things in the CCA toolkit
  –Simulation components: a mesh, some integrators, some linear solvers, some physics components, etc.
  –A variety of load-balancers
And a control system to choose load-balancers

11 Mesh Component Status
2D mesh already exists
Part of the tutorial and toolkit
Parallel capabilities
Works in Bocca
Used in reaction-diffusion problems, with multiple integration techniques
Can accommodate slab-wise and block-wise decomposition
No connection to load-balancers yet
  –Does its own simple domain decomposition
Great for tutorials, but too simple for CQoS

12 Over the next 6 months...
Extend mesh to 3D to tackle harder problems
Lemaster, Stone, & Gardiner (2007)

13 Over the next 6 months...
Extend mesh to 3D to tackle harder problems
Extend it to incorporate domain decomposition beyond slab- and block-wise

14 Over the next 6 months...
Extend mesh to 3D to tackle harder problems
Extend it to incorporate domain decomposition beyond slab- and block-wise
  –Sub-domains consisting of a disjoint set of abutting rectangles
Design ports to load-balancers
Identify more interesting applications for use in CQoS testing
  –Construct any extra components needed
  –Solve the problem; quantify the degree of difficulty

15 Results to come!
Contact info:
Nicole Lemaster
Sandia National Labs
mnlemas@sandia.gov

