Xing Cai University of Oslo


Numerical Simulation of 3D Fully Nonlinear Water Waves on Parallel Computers Xing Cai University of Oslo

Outline of the Talk: mathematical model; numerical scheme (sequential); parallelization strategy (domain decomposition); object-oriented implementation; numerical experiment. PARA'98

Mathematical Model: fully nonlinear 3D water waves. Primary unknowns:

Numerical Scheme Physical domain: Transformation: (a fixed domain)

Numerical Scheme: operator splitting. At each time level: FDM for updating the free surface conditions; FEM solution of an elliptic boundary value problem in

Computational Cost. Elliptic boundary value problem: the most CPU-intensive part. Resulting system of linear equations. Preconditioning versus computational cost (N: number of unknowns).

The Question. Starting point: an object-oriented water wave simulator (built in Diffpack, a C++ environment for scientific computing). How to do the parallelization? Different approaches on different levels: automatic parallelization; parallelization on the low matrix-vector level; parallelization on the level of simulators.

Parallelization Strategy: Domain Decomposition. Divide and conquer: the original large problem is solved by iteratively solving many smaller subproblems, either as a solution method or as a preconditioner. Flexible: localized treatment of irregular geometries, singularities etc. Very efficient numerical methods, even on sequential computers. Suitable for coarse-grained parallelization.

Overlapping Domain Decomposition: alternating Schwarz method for two subdomains. Example: solving an elliptic boundary value problem in A sequence of approximations where

Numerical Foundation: Additive Schwarz Method. Subproblems are of the same form as the original large problem, with possibly different boundary conditions on artificial boundaries. Subproblems can be solved in parallel.
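
The slide's formula is not preserved here. For orientation, the textbook algebraic form of the one-level additive Schwarz preconditioner is sketched below. The notation (global system A u = b, restriction R_i onto the unknowns of subdomain i, M subdomains) is an assumption and may differ from the symbols used in the talk.

```latex
% Local operators and the one-level additive Schwarz preconditioner
A_i = R_i A R_i^{T}, \qquad
B_{\mathrm{AS}}^{-1} = \sum_{i=1}^{M} R_i^{T} A_i^{-1} R_i .
% Each term needs only a solve with A_i, so the M terms can be computed independently.
```

Because the terms of the sum do not depend on one another, the subdomain solves can proceed in parallel, which is the property the slide points to.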

Convergence of the Solution Example: Solving the Poisson problem on the unit square

Numerical Foundation: Coarse Grid Correction. Important for good DD convergence. Run on each processor, shared with subdomain simulators on the same processor.
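
One common way to write a coarse grid correction, given here as a sketch in assumed notation rather than the talk's own formulation, is to add a coarse-level term to the sum of subdomain solves. Here A is the global matrix, R_i restricts to subdomain i with A_i = R_i A R_i^T, R_0 restricts to a coarse grid, and A_0 is the coarse operator.

```latex
% Two-level additive Schwarz: coarse solve plus M subdomain solves
B^{-1} = R_0^{T} A_0^{-1} R_0 + \sum_{i=1}^{M} R_i^{T} A_i^{-1} R_i ,
\qquad A_0 = R_0 A R_0^{T} \ \text{(Galerkin choice; rediscretization is an alternative)} .
```

The coarse term propagates information across all subdomains in one step, which is why it matters for convergence as the number of subdomains grows.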

Some Observations. Parallel computing: efficiency relies on the parallelization. Domain decomposition: well suited to parallel computing, a good parallelization strategy. Object-oriented programming technique: flexible and efficient sequential simulators can be used in subdomain solves, the main ingredient of DD.

New Programming Model: a simulator-parallel model. Each processor hosts an arbitrary number of subdomains: a balance between numerical efficiency and load balancing. Each subdomain is assigned a sequential simulator. Flexibility: different types of grids, linear system solvers, preconditioners, convergence monitors etc. are allowed for different subproblems. Domain decomposition on the level of subdomain simulators!

Simulator-Parallel: reuse of existing sequential simulators; data distribution is implied; no need for global data; needs additional functionality for exchanging nodal values inside the overlapping region; needs some global administration.

A Generic Programming Framework: an add-on library (SPMD model); use of object-oriented programming technique; flexibility and portability; simplified parallelization process for the end user.

The Administrator. Parameter interface: solution method or preconditioner, max iterations, stopping criterion etc. DD algorithm interface: access to predefined numerical algorithms, e.g. CG. Operation interface (standard codes & UDC): access to subdomain simulators, matrix-vector product, inner product etc.

The Subdomain Simulator: a generic representation; C++ class hierarchy; interface of generic member functions.

Adaptation of Sequential Simulator. Class SubdomainSimulator: generic representation of a sequential simulator. Class SubdomainFEMSolver: generic representation of a sequential simulator using FEM. A new sequential wave simulator that fits in the framework is readily extended from the existing sequential simulator, also being a subclass of SubdomainFEMSolver. Classes in the diagram: SubdomainSimulator, SubdomainFEMSolver, WaveSimulator, NewWSimulator.

Performance. Algorithmic efficiency: efficiency of the original sequential simulator(s); efficiency of the domain decomposition method. Parallel efficiency: communication overhead (low); coarse grid correction overhead (normally low); synchronization overhead; load balancing (subproblem size, work on subdomain solves).

Parallel Simulation of Waves

Parallel Efficiency. Fixed number of subdomains M=16. Subdomain grids from partition of a global 41x41x41 grid. Simulation over 32 time steps. DD as preconditioner of CG for the Laplace equation. Multigrid V-cycle as subdomain solver.
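
The timing table for this slide is not preserved, and the slides do not state which definition of efficiency was used. The conventional definitions, assumed here, are based on the wall-clock time T(P) measured on P processors:

```latex
% Speedup and parallel efficiency on P processors
S(P) = \frac{T(1)}{T(P)}, \qquad \eta(P) = \frac{S(P)}{P} = \frac{T(1)}{P\,T(P)} .
```

With the number of subdomains fixed at M=16, the numerical work is the same for every P, so under these definitions any loss in efficiency reflects parallel overhead rather than a change in the algorithm.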

Overall Efficiency: number of subdomains equal to number of processors. *For P=2, parallel BiCGStab is used.

Summary: efficient solution of elliptic boundary value problems; parallelization based on DD; introduction of a simulator-parallel model; a generic framework for implementation. http://www.nobjects.com