MS 15: Data-Aware Parallel Computing Data-Driven Parallelization in Multi-Scale Applications – Ashok Srinivasan, Florida State University Dynamic Data.

Presentation transcript:

MS 15: Data-Aware Parallel Computing
Data-Driven Parallelization in Multi-Scale Applications – Ashok Srinivasan, Florida State University
Dynamic Data Driven Finite Element Modeling of Brain Shape Deformation During Neurosurgery – Amitava Majumdar, San Diego Supercomputer Center
Dynamic Computations in Large-Scale Graphs – David Bader, Georgia Tech
Tackling Obesity in Children – Radha Nandkumar, NCSA

Data-Driven Parallelization in Multi-Scale Applications
Ashok Srinivasan, Computer Science, Florida State University
Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra

Outline
– Background
  – Limitations of conventional parallelization
  – Example application: carbon nanotube tensile test
    – Small time step size in molecular dynamics simulations
– Data-driven time parallelization
– Experimental results
  – Scaled efficiently to ~1000 processors, for a problem where conventional parallelization scales to just 2-3 processors
– Other time parallelization approaches
– Conclusions

Background
– Limitations of conventional parallelization
– Example application: carbon nanotube tensile test
  – Molecular dynamics simulations
– Problems with multiple time scales

Limitations of Conventional Parallelization
– Conventional parallelization decomposes the state space across processors
  – It is effective for a large state space
  – It is not effective when the computational effort arises from a large number of time steps, or when the granularity becomes very fine due to a large number of processors

Example Application: Carbon Nanotube Tensile Test
– Pull the CNT at a constant velocity
  – Determine the stress-strain response and the yield strain (when the CNT starts breaking) using MD
– The response is strain-rate dependent

A Drawback of Molecular Dynamics
– Molecular dynamics
  – In each time step, the forces that atoms exert on each other are modeled using some potential
  – After the forces are computed, update the positions
  – Repeat for the desired number of time steps
– The time step size is ~$10^{-15}$ seconds, due to physical and numerical considerations
  – The desired time range is much larger
  – A million time steps are required to reach $10^{-9}$ s
  – Around a day of computing for a 3000-atom CNT
– MD therefore uses unrealistically large strain rates
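The time-stepping loop described in this slide can be illustrated with a minimal sketch. It assumes a toy 1D chain of unit-mass atoms joined by harmonic springs (hypothetical constants `K` and `R0`, reduced units) and uses the velocity Verlet integrator; the slides do not specify the CNT potential or the integrator, so this only shows the structure of the loop: compute forces, update positions, repeat.

```python
"""Minimal MD time-stepping loop (velocity Verlet) on a toy 1D chain.

Assumptions, not taken from the slides: unit masses, reduced units, and
harmonic nearest-neighbour springs with hypothetical stiffness K and rest length R0.
"""

K = 1.0   # hypothetical spring stiffness
R0 = 1.0  # hypothetical rest length


def forces(x):
    """Force on each atom from the springs to its neighbours."""
    f = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = x[i + 1] - x[i] - R0
        f[i] += K * stretch      # a stretched spring pulls atom i forward
        f[i + 1] -= K * stretch  # and pulls atom i + 1 back
    return f


def energy(x, v):
    """Kinetic plus potential energy, used to sanity-check the integrator."""
    kinetic = sum(0.5 * vi * vi for vi in v)
    potential = sum(0.5 * K * (x[i + 1] - x[i] - R0) ** 2 for i in range(len(x) - 1))
    return kinetic + potential


def run_md(x, v, dt, n_steps):
    """Each step: update velocities and positions from the forces, then recompute forces."""
    f = forces(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]        # drift (position update)
        f = forces(x)                                     # forces at the new positions
        v = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]  # second half kick
    return x, v


x0 = [0.0, 1.1, 2.0, 3.05]
v0 = [0.0] * len(x0)
x1, v1 = run_md(x0, v0, dt=0.01, n_steps=1000)
print(energy(x0, v0), energy(x1, v1))  # nearly equal for a small dt
```

Each pass of the loop advances the system by one time step, so at ~$10^{-15}$ s per step about $10^6$ passes are needed for a nanosecond of physical time; only the force model and atom count change for a real CNT.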

Problems with multiple time scales
– Fine-scale computations (such as MD) are more accurate, but more time consuming
  – Many of the details at the finer scale are unimportant, but some are important
– [Figure: A simple schematic of multiple time scales]

Data-Driven Time Parallelization
– Time parallelization
– Data-driven prediction
  – Dimensionality reduction
  – Relate simulation parameters
  – Static prediction
  – Dynamic prediction
– Verification

Time Parallelization
– Each processor simulates a different time interval
– The initial state of each interval is obtained by prediction, except for processor 0
– Verify whether the prediction for the end state is close to the state computed by MD
– Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results
– If the time interval is sufficiently large, then the communication overhead is small
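The scheme on this slide can be emulated in a single process to show how prediction, concurrent simulation, and verification interact. This is a sketch under loud assumptions: the toy ODE $dy/dt = -y$ (integrated by `fine_solve`) stands in for MD, `predict` is a hypothetical fixed predictor rather than the database-driven one in the slides, and a scalar tolerance `tol` stands in for the fluctuation-based equivalence test. When a prediction fails verification, the next round restarts from the last verified state.

```python
import math


def fine_solve(y, t0, t1, n_sub=100):
    """Stand-in for an MD run over [t0, t1]: explicit Euler on dy/dt = -y."""
    h = (t1 - t0) / n_sub
    for _ in range(n_sub):
        y -= h * y
    return y


def time_parallel(y0, n_intervals, dt, predict, tol):
    """Emulate time parallelization in one process.

    Returns the accepted states at interval boundaries and the number of
    parallel rounds needed. predict(t) is a hypothetical predictor of the
    state at time t (in the slides, built from a database of prior runs).
    """
    states = [y0]
    start, rounds = 0, 0
    while start < n_intervals:
        rounds += 1
        n_active = n_intervals - start
        # Processor 0 starts from a verified state; the others start from predictions.
        inits = [states[start]] + [predict((start + p) * dt) for p in range(1, n_active)]
        # In a real run these solves execute concurrently, one per processor.
        ends = [fine_solve(y, (start + p) * dt, (start + p + 1) * dt)
                for p, y in enumerate(inits)]
        # Verification: interval p is kept only if its predicted start matches
        # (within tol) the end state computed for interval p - 1.
        accepted = 1
        while accepted < n_active and abs(ends[accepted - 1] - inits[accepted]) <= tol:
            accepted += 1
        states.extend(ends[:accepted])
        start += accepted  # the next round restarts after the last verified interval
    return states, rounds


def good(t):
    """Hypothetical predictor that is accurate everywhere."""
    return math.exp(-t)


def bad_late(t):
    """Hypothetical predictor that becomes wrong after t = 0.45."""
    return math.exp(-t) * (1.5 if t > 0.45 else 1.0)


print(time_parallel(1.0, 10, 0.1, good, 1e-2)[1])      # 1 round
print(time_parallel(1.0, 10, 0.1, bad_late, 1e-2)[1])  # 6 rounds
```

With accurate predictions all ten intervals are verified in one round; once predictions fail, progress drops to one interval per round, which is why prediction quality governs the attainable speedup.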

Dimensionality Reduction
– The movement of atoms in a 1000-atom CNT can be considered the motion of a point in 3000-dimensional space
– Find a lower-dimensional subspace close to which the points lie
– We use proper orthogonal decomposition (POD)
  – Find a low-dimensional affine subspace
– Motion may, however, be complex in this subspace
  – Use results for different strain rates: velocity = 10 m/s, 5 m/s, and 1 m/s
  – At five different time points
– [U, S, V] = svd(ShiftedData)
  – $\text{ShiftedData} = U S V^T$
  – States of the CNT are expressed as $\mu + c_1 u_1 + c_2 u_2$, where $\mu$ is the mean state and $u_1$, $u_2$ are the leading POD basis vectors
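The POD step can be sketched without external libraries. The slides use MATLAB's `svd` (NumPy's equivalent is `numpy.linalg.svd`); to stay dependency-free, this sketch swaps in power iteration, which finds only the leading basis vector $u_1$ of the mean-shifted snapshots, plus the projection giving $c_1$. Treating "shifted data" as mean-subtracted snapshots is an assumption, and `dominant_mode`/`coefficient` are hypothetical names.

```python
import math
import random


def mean_state(snapshots):
    """Mean state mu over all snapshots (each snapshot is a list of 3N coordinates)."""
    n = len(snapshots)
    return [sum(s[i] for s in snapshots) / n for i in range(len(snapshots[0]))]


def dominant_mode(snapshots, iters=200, seed=0):
    """Leading POD basis vector u1 of the mean-shifted snapshots.

    Power iteration on X X^T (X = shifted data, one column per snapshot)
    converges to the first left singular vector of X.
    """
    mu = mean_state(snapshots)
    shifted = [[si - mi for si, mi in zip(s, mu)] for s in snapshots]
    rng = random.Random(seed)
    u = [rng.random() for _ in mu]
    for _ in range(iters):
        w = [0.0] * len(u)
        for x in shifted:  # w = X X^T u = sum over snapshots of (x . u) x
            proj = sum(xi * ui for xi, ui in zip(x, u))
            for i, xi in enumerate(x):
                w[i] += proj * xi
        norm = math.sqrt(sum(wi * wi for wi in w))
        u = [wi / norm for wi in w]
    return mu, u


def coefficient(state, mu, u):
    """c1 in state ~ mu + c1 * u1 (projection onto the unit vector u1)."""
    return sum((s - m) * ui for s, m, ui in zip(state, mu, u))


# Toy data: 3-dimensional "states" moving along one direction d.
d = [1 / 3, 2 / 3, 2 / 3]
snaps = [[5.0 + t * di for di in d] for t in (-2, -1, 0, 1, 2)]
mu, u1 = dominant_mode(snaps)
print(u1, coefficient(snaps[4], mu, u1))
```

A second mode $u_2$ could be obtained the same way after removing each snapshot's $c_1 u_1$ component (deflation); for 3000-dimensional CNT data with many snapshots, a library SVD is the practical choice.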

Basis Vectors from POD
– CNT of ~100 Å with 1000 atoms at 300 K
– [Figure: $u_1$ (blue) and $u_2$ (red) for z; $u_1$ (green) for x is not "significant". Legend: Blue: z; Green, Red: x, y]

Relate strain rate and time
– [Figure: Coefficients of $u_1$. Blue: 1 m/s; Red: 5 m/s; Green: 10 m/s; Dotted line: same strain]
– Suggests that the behavior is similar at similar strains
– In general, clustering similar coefficients can give parameter-time relationships
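As a worked relation for the "same strain" line in this figure, assume (our assumption; the slides do not give the strain definition or boundary conditions) that a tube of initial length $L_0$ pulled at constant velocity $v$ has engineering strain $\varepsilon = vt/L_0$. Two runs then reach the same strain when:

```latex
\varepsilon(t; v) = \frac{v\,t}{L_0}
\quad\Longrightarrow\quad
\varepsilon(t_1; v_1) = \varepsilon(t_2; v_2)
\iff v_1 t_1 = v_2 t_2
\iff t_2 = \frac{v_1}{v_2}\, t_1
```

For example, under this assumption the 1 m/s run reaches the strain of the 10 m/s run at time $t$ only at time $10t$, so comparing the curves at equal strain amounts to rescaling the time axis by the velocity ratio.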

Prediction When v Is the Only Parameter
– Use precomputed results for 40 different time points each, for three different velocities
  – To predict for (t; v) not in the database:
    – Determine coefficients for nearby v at nearby strains
    – Fit a linear surface and interpolate/extrapolate to get coefficients $c_1$ and $c_2$ for (t; v)
    – Get the state as $\mu + c_1 u_1 + c_2 u_2$
– Dynamic prediction
  – Correct the above coefficients by determining the error between the previously predicted and computed states
– Direct predictor
  – Independently predict the change in each coordinate
– [Figure legend: Green: 10 m/s; Red: 5 m/s; Blue: 1 m/s; Magenta: 0.1 m/s; Black: 0.1 m/s through direct prediction]
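The "fit a linear surface and interpolate/extrapolate" step can be sketched as an ordinary least-squares plane fit. This is a minimal sketch under stated assumptions: the two surface coordinates are taken to be strain `s` and pulling velocity `v` (the slides say "nearby v at nearby strains" but do not give the exact parametrization), the sample triples are hypothetical, and the same fit would be done separately for $c_1$ and $c_2$.

```python
def solve(m, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(m, rhs)]
    n = len(a)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x


def fit_plane(samples):
    """Least-squares fit of c ~ p0 + p1 * s + p2 * v to (s, v, c) samples."""
    ata = [[0.0] * 3 for _ in range(3)]
    atc = [0.0] * 3
    for s, v, c in samples:
        row = (1.0, s, v)
        for i in range(3):
            atc[i] += row[i] * c
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve(ata, atc)  # normal equations (A^T A) p = A^T c


def predict_coefficient(samples, s, v):
    """Interpolate or extrapolate the fitted plane to (s, v)."""
    p0, p1, p2 = fit_plane(samples)
    return p0 + p1 * s + p2 * v


# Hypothetical coefficients of u1 from runs at 1, 5 and 10 m/s near the target strain.
samples = [(s, v, 0.5 + 2.0 * s - 0.1 * v) for s in (0.05, 0.10, 0.15) for v in (1.0, 5.0, 10.0)]
print(predict_coefficient(samples, 0.12, 0.1))  # extrapolates in v down to 0.1 m/s
```

Normal equations are adequate for a three-parameter fit on a handful of points; a QR-based least-squares solver would be the more robust choice if the samples were badly scaled.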

Verification of prediction
– Definition of equivalence of two states
  – Atoms vibrate around their mean position
  – Consider states equivalent if the differences in position, potential energy, and temperature are within the normal range of fluctuations
– [Figure: Mean position; Displacement (from mean)]
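The equivalence test on this slide can be sketched as a set of per-quantity tolerance checks. The choice of "normal range" as `k` population standard deviations of a quantity's observed fluctuations (with `k = 3`), the use of the largest per-atom position difference, and the dictionary layout are all assumptions for illustration, not the authors' exact criterion.

```python
import math
import statistics


def fluctuation_range(samples, k=3.0):
    """Hypothetical 'normal range': k population standard deviations of a quantity
    sampled while the system fluctuates around its mean."""
    return k * statistics.pstdev(samples)


def states_equivalent(pred, comp, tol):
    """Compare a predicted state with the MD-computed state.

    pred, comp: dicts with 'positions' (list of (x, y, z) tuples),
    'potential' (potential energy) and 'temperature'.
    tol: allowed deviations with keys 'position', 'potential', 'temperature'.
    """
    max_disp = max(math.dist(p, c) for p, c in zip(pred["positions"], comp["positions"]))
    return (max_disp <= tol["position"]
            and abs(pred["potential"] - comp["potential"]) <= tol["potential"]
            and abs(pred["temperature"] - comp["temperature"]) <= tol["temperature"])


predicted = {"positions": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "potential": -10.0, "temperature": 300.0}
computed = {"positions": [(0.0, 0.0, 0.0), (1.05, 0.0, 0.0)], "potential": -10.1, "temperature": 302.0}
tol = {"position": 0.1, "potential": 0.5, "temperature": 5.0}
print(states_equivalent(predicted, computed, tol))
```

In practice the tolerances would come from `fluctuation_range` applied to an equilibrated run at the same temperature, so that thermal vibration alone never causes a rejection.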

Experimental Results
– Relate simulations with different strain rates
  – Use the above strategy directly
– Relate simulations with different strain rates and different CNT sizes
  – Express basis vectors in a different functional form
– Relate simulations with different temperatures and strain rates
  – Dynamically identify different simulations that are similar in current behavior

Stress-strain response at 0.1 m/s
– Blue: Exact result
– Green: Direct prediction with interpolation/extrapolation
  – Points close to yield involve extrapolation in velocity and strain
– Red: Time-parallel results

Speedup
– Red line: Ideal speedup
– Blue: v = 0.1 m/s
– Green: The next predictor, v = 1 m/s, using v = 10 m/s
– CNT with 1000 atoms, Xeon/Myrinet cluster
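To read the speedup plot, the standard definitions (general parallel-computing usage, not specific to these slides) are below, with $T_1$ the sequential run time and $T_p$ the run time on $p$ processors; the red "ideal" line corresponds to $S(p) = p$.

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\text{ideal (linear) speedup: } S(p) = p \;\Longleftrightarrow\; E(p) = 1
```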

CNTs of varying sizes
– Use a 1000-atom CNT result
  – Parallelize 1200-, 1600-, and 2000-atom CNT runs
  – Observe that the dominant mode is approximately a linear function of the initial z-coordinate
– Normalize coordinates to be in [0, 1]
– $z_{t+\Delta t} = z_t + z'_{t+\Delta t}\,\Delta t$; predict $z'$
– [Figure: Speedup curves for the different CNT sizes (including 1200 atoms), with linear speedup for reference. Stress-strain for 2000 atoms: Blue: Exact; Red: 200 processors]
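The size-transfer idea on this slide can be sketched as follows, under assumptions the slides do not spell out: the dominant mode's z-component from the 1000-atom run is fitted as a straight line in the normalized initial coordinate $\hat z = (z_0 - z_{\min})/(z_{\max} - z_{\min})$, that line is evaluated at the normalized coordinates of a larger CNT, and positions are advanced with $z_{t+\Delta t} = z_t + z'_{t+\Delta t}\,\Delta t$ once a rate $z'$ has been predicted. All function names are hypothetical.

```python
def normalize(z):
    """Map coordinates linearly onto [0, 1]."""
    lo, hi = min(z), max(z)
    return [(zi - lo) / (hi - lo) for zi in z]


def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def transfer_mode(z0_small, mode_small, z0_large):
    """Fit the small CNT's dominant mode as a linear function of normalized initial z,
    then evaluate the fit at the larger CNT's normalized initial z."""
    a, b = fit_line(normalize(z0_small), mode_small)
    return [a + b * zh for zh in normalize(z0_large)]


def advance(z, z_rate, dt):
    """z_{t+dt} = z_t + z'_{t+dt} * dt, where z' comes from the predictor."""
    return [zi + ri * dt for zi, ri in zip(z, z_rate)]


# Toy example: a perfectly linear mode on a 10-"atom" tube transferred to 20 atoms.
z_small = [1.4 * i for i in range(10)]
mode_small = [0.1 + 0.5 * zh for zh in normalize(z_small)]
z_large = [1.4 * i for i in range(20)]
print(transfer_mode(z_small, mode_small, z_large)[:3])
```

Normalizing to [0, 1] is what makes the fitted line meaningful for a tube of a different length: the fit depends only on an atom's relative position along the axis.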

Predict change in coordinates
– Express $x'$ in terms of basis functions
  – Example: $x'_{t+\Delta t} = a_{0,t+\Delta t} + a_{1,t+\Delta t}\, x_t$
  – $a_{0,t+\Delta t}$ and $a_{1,t+\Delta t}$ are unknown
  – Express the changes, $y$, for the base (old) simulation similarly, in terms of coefficients $b$, and perform a least-squares fit
– Predict $a_{i,t+\Delta t}$ as $b_{i,t+\Delta t} + R_{t+\Delta t}$, where $R_{t+\Delta t} = (1-\alpha) R_t + \alpha\,(a_{i,t} - b_{i,t})$
  – Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of previous differences
– We use $\alpha = 0.5$
  – Gives more weight to the latest results
  – Does not let random fluctuations affect the predictor too much
– Velocity is estimated from the latest results known to be accurate
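The predictor on this slide combines a least-squares fit with an exponentially weighted correction. A minimal sketch, assuming one correction term per coefficient index $i$ (the slide writes a single $R$) and hypothetical names `fit_line` and `CoefficientPredictor`: fit $x' \approx a_0 + a_1 x$ on the current run and $y \approx b_0 + b_1 x$ on the base run, update $R \leftarrow (1-\alpha)R + \alpha(a - b)$ after each verified interval, and predict the next $a$ as $b_{\text{next}} + R$, with the slide's weight $\alpha = 0.5$.

```python
def fit_line(xs, ys):
    """Least-squares fit ys ~ c0 + c1 * xs; returns [c0, c1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return [my - c1 * mx, c1]


class CoefficientPredictor:
    """Predicts a_{i,t+dt} = b_{i,t+dt} + R_{i,t+dt}, with
    R_{i,t+dt} = (1 - alpha) * R_{i,t} + alpha * (a_{i,t} - b_{i,t})."""

    def __init__(self, alpha=0.5, n_coeffs=2):
        self.alpha = alpha           # 0.5 is the value used on the slide
        self.r = [0.0] * n_coeffs    # one correction per coefficient i (assumption)

    def update(self, a_t, b_t):
        """Fold in coefficients fitted at the latest verified time t."""
        self.r = [(1 - self.alpha) * r + self.alpha * (a - b)
                  for r, a, b in zip(self.r, a_t, b_t)]

    def predict(self, b_next):
        """Predicted current-run coefficients at t + dt from base-run coefficients."""
        return [b + r for b, r in zip(b_next, self.r)]


# Toy data: current run x' = 1 + 2x, base run y = 0 + 1x (both hypothetical).
a_t = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
b_t = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
predictor = CoefficientPredictor()
predictor.update(a_t, b_t)
print(predictor.predict(b_t))  # [0.5, 1.5]
```

With $\alpha = 0.5$, a single noisy difference moves the correction only halfway, while a persistent offset between the runs is absorbed within a few intervals.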

Temperature and velocity vary
– Use 1000-atom CNT results
  – Temperatures: 300 K, 600 K, 900 K, 1200 K
  – Velocities: 1 m/s, 5 m/s, 10 m/s
– Dynamically choose the closest simulation for prediction
– [Figure: Speedup at 450 K, 2 m/s, with linear speedup for reference. Stress-strain at 450 K: Blue: Exact; Red: 200 processors]
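Dynamically choosing the closest simulation can be sketched as a nearest-neighbour lookup. The similarity measure here, the Euclidean distance between the current run's recent values of some observable and each database run's values at the same points, is an assumption; the slides only say the closest simulation is chosen dynamically. Labels such as `(300, 1)` (temperature in K, velocity in m/s) and the data are hypothetical.

```python
import math


def closest_simulation(current_recent, database):
    """Return the label of the database run most similar to the current run.

    database maps a label, e.g. (temperature_K, velocity_m_s), to a list of the
    same observable sampled at the same points as current_recent.
    """
    def distance(series):
        return math.sqrt(sum((c - s) ** 2 for c, s in zip(current_recent, series)))

    return min(database, key=lambda label: distance(database[label]))


database = {
    (300, 1): [0.0, 1.0, 2.0],
    (600, 5): [0.0, 2.0, 4.0],
    (900, 10): [0.0, 3.0, 6.0],
}
print(closest_simulation([0.0, 1.9, 4.1], database))  # (600, 5)
```

Calling this again after each verified interval lets the chosen base simulation change as the current run's behavior evolves, which matters for a target such as 450 K, 2 m/s that is not itself in the database.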

Other time parallelization approaches
– Waveform relaxation
  – Repeatedly solve for the entire time domain
  – Parallelizes well, but convergence can be slow
  – Several variants exist to improve convergence
– Parareal approach
  – Features similar to ours and to waveform relaxation
    – Precedes our approach
  – Not data-driven
  – Sequential phase for prediction
  – Not very effective in practice so far
    – Has much potential to be improved
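For comparison with the data-driven scheme, the standard Parareal iteration $U^{k+1}_{n+1} = G(U^{k+1}_n) + F(U^k_n) - G(U^k_n)$ can be sketched on a toy problem. Here $dy/dt = -y$ is an assumed test equation, the coarse propagator `G` is one Euler step per interval and the fine propagator `F` is 100 Euler steps; the initial coarse sweep and the correction sweep are the sequential prediction phases the slide mentions, while the fine solves in each iteration are independent across intervals.

```python
def euler(y, t0, t1, n_steps, lam=-1.0):
    """Explicit Euler for dy/dt = lam * y over [t0, t1]."""
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        y += h * lam * y
    return y


def coarse(y, t0, t1):
    """Cheap propagator G: a single Euler step."""
    return euler(y, t0, t1, 1)


def fine(y, t0, t1):
    """Accurate propagator F: 100 Euler steps (stand-in for the expensive solver)."""
    return euler(y, t0, t1, 100)


def parareal(y0, t_end, n_intervals, n_iters):
    ts = [t_end * n / n_intervals for n in range(n_intervals + 1)]
    u = [y0]
    for n in range(n_intervals):  # sequential coarse prediction
        u.append(coarse(u[n], ts[n], ts[n + 1]))
    for _ in range(n_iters):
        # These fine solves are independent and would run on separate processors.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_intervals)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_intervals)]
        new = [y0]
        for n in range(n_intervals):  # sequential correction sweep
            new.append(coarse(new[n], ts[n], ts[n + 1]) + f_old[n] - g_old[n])
        u = new
    return u


print(parareal(1.0, 1.0, 10, 2)[-1])
```

After $k$ iterations the first $k$ intervals agree with the sequential fine solution, so Parareal gains only if it converges in far fewer iterations than there are intervals.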

Conclusions
– Data-driven time parallelization shows a significant improvement in speed, without significantly sacrificing accuracy
– Direct prediction is very effective when applicable
– The 980-processor simulation attained a flop rate of ~420 Gflops
  – Its rate of 420 Mflops/atom is likely the largest flop-per-atom rate in classical MD simulations
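The per-atom figure follows from dividing the aggregate rate by the atom count. The division below assumes a 1000-atom CNT, which is the size implied by the two numbers; the slide does not state the size of the 980-processor run explicitly.

```latex
\frac{420\ \text{Gflop/s}}{1000\ \text{atoms}} = 0.42\ \text{Gflop/s per atom} = 420\ \text{Mflop/s per atom},
\qquad
\frac{420\ \text{Gflop/s}}{980\ \text{processors}} \approx 0.43\ \text{Gflop/s per processor}
```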

Future Work
– More complex problems
– Better prediction
  – POD is good for representing data, but not necessarily for identifying patterns
  – Use better dimensionality reduction / reduced-order modeling techniques
  – Use experimental data for prediction
– Better learning
– Better verification
– In CP8: Application of Dimensionality Reduction Techniques to Time Parallelization, Yanan Yu, tomorrow, 2:30-3:00 pm