- Department of Engineering Science and Ocean Engineering, National Taiwan University - Department of Mathematics, National Taiwan University - Center.

Slides:

Advertisements

Similar presentations

THE FINITE ELEMENT METHOD

Advertisements

SE263 Video Analytics Course Project Initial Report Presented by M. Aravind Krishnan, SERC, IISc X. Mei and H. Ling, ICCV’09.

Christopher McCabe, Derek Causon and Clive Mingham Centre for Mathematical Modelling & Flow Analysis Manchester Metropolitan University MANCHESTER M1 5GD.

Computational Modeling for Engineering MECN 6040

Chapter 8 Elliptic Equation.

Structure-Preserving B-spline Methods for the Incompressible Navier-Stokes Equations John Andrew Evans Institute for Computational Engineering and Sciences,

Point-wise Discretization Errors in Boundary Element Method for Elasticity Problem Bart F. Zalewski Case Western Reserve University Robert L. Mullen Case.

Solving Linear Systems (Numerical Recipes, Chap 2)

OpenFOAM on a GPU-based Heterogeneous Cluster

Modern iterative methods For basic iterative methods, converge linearly Modern iterative methods, converge faster –Krylov subspace method Steepest descent.

Coupled Fluid-Structural Solver CFD incompressible flow solver has been coupled with a FEA code to analyze dynamic fluid-structure coupling phenomena CFD.

12/21/2001Numerical methods in continuum mechanics1 Continuum Mechanics On the scale of the object to be studied the density and other fluid properties.

DCABES 2009 China University Of Geosciences 1 The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian.

CUDA Programming Lei Zhou, Yafeng Yin, Yanzhi Ren, Hong Man, Yingying Chen.

Monica Garika Chandana Guduru. METHODS TO SOLVE LINEAR SYSTEMS Direct methods Gaussian elimination method LU method for factorization Simplex method of.

MCE 561 Computational Methods in Solid Mechanics

Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.

PETE 603 Lecture Session #29 Thursday, 7/29/ Iterative Solution Methods Older methods, such as PSOR, and LSOR require user supplied iteration.

1 Parallel Simulations of Underground Flow in Porous and Fractured Media H. Mustapha 1,2, A. Beaudoin 1, J. Erhel 1 and J.R. De Dreuzy IRISA – INRIA.

To GPU Synchronize or Not GPU Synchronize? Wu-chun Feng and Shucai Xiao Department of Computer Science, Department of Electrical and Computer Engineering,

ME964 High Performance Computing for Engineering Applications “The real problem is not whether machines think but whether men do.” B. F. Skinner © Dan.

Slide 1/8 Performance Debugging for Highly Parallel Accelerator Architectures Saurabh Bagchi ECE & CS, Purdue University Joint work with: Tsungtai Yeh,

© Fluent Inc. 9/5/2015L1 Fluids Review TRN Solution Methods.

An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.

GPU-accelerated Evaluation Platform for High Fidelity Networking Modeling 11 December 2007 Alex Donkers Joost Schutte.

COLLABORATIVE EXECUTION ENVIRONMENT FOR HETEROGENEOUS PARALLEL SYSTEMS Aleksandar Ili´c, Leonel Sousa 2010 IEEE International Symposium on Parallel & Distributed.

An Effective Dynamic Scheduling Runtime and Tuning System for Heterogeneous Multi and Many-Core Desktop Platforms Authous: Al’ecio P. D. Binotto, Carlos.

Two Phase Flow using two levels of preconditioning on the GPU Prof. Kees Vuik and Rohit Gupta Delft Institute of Applied Mathematics.

Fast Thermal Analysis on GPU for 3D-ICs with Integrated Microchannel Cooling Zhuo Fen and Peng Li Department of Electrical and Computer Engineering, {Michigan.

CFD Lab - Department of Engineering - University of Liverpool Ken Badcock & Mark Woodgate Department of Engineering University of Liverpool Liverpool L69.

1 Variational and Weighted Residual Methods. 2 The Weighted Residual Method The governing equation for 1-D heat conduction A solution to this equation.

The Finite Element Method A Practical Course

Fast Support Vector Machine Training and Classification on Graphics Processors Bryan Catanzaro Narayanan Sundaram Kurt Keutzer Parallel Computing Laboratory,

GPU Architecture and Programming

Module 4 Multi-Dimensional Steady State Heat Conduction.

1 Complex Images k’k’ k”k” k0k0 -k0-k0 branch cut   k 0 pole C1C1 C0C0 from the Sommerfeld identity, the complex exponentials must be a function.

High Performance Computational Fluid-Thermal Sciences & Engineering Lab GenIDLEST Co-Design Virginia Tech AFOSR-BRI Workshop July 20-21, 2014 Keyur Joshi,

© Fluent Inc. 11/24/2015J1 Fluids Review TRN Overview of CFD Solution Methodologies.

QCAdesigner – CUDA HPPS project

Parallel Solution of the Poisson Problem Using MPI

HEAT TRANSFER FINITE ELEMENT FORMULATION

1)Leverage raw computational power of GPU  Magnitude performance gains possible.

Case Study in Computational Science & Engineering - Lecture 5 1 Iterative Solution of Linear Systems Jacobi Method while not converged do { }

Program Optimizations and Recent Trends in Heterogeneous Parallel Computing Dušan Gajić, University of Niš Program Optimizations and Recent Trends in Heterogeneous.

Discretization Methods Chapter 2. Training Manual May 15, 2001 Inventory # Discretization Methods Topics Equations and The Goal Brief overview.

Partial Derivatives Example: Find If solution: Partial Derivatives Example: Find If solution: gradient grad(u) = gradient.

Smoothed Particle Hydrodynamics Matthew Zhu CSCI 5551 — Fall 2015.

MULTIDIMENSIONAL HEAT TRANSFER  This equation governs the Cartesian, temperature distribution for a three-dimensional unsteady, heat transfer problem.

Programming Massively Parallel Graphics Multiprocessors using CUDA Final Project Amirhassan Asgari Kamiabad

F. Fairag, H Tawfiq and M. Al-Shahrani Department of Math & Stat Department of Mathematics and Statistics, KFUPM. Nov 6, 2013 Preconditioning Technique.

Multipole-Based Preconditioners for Sparse Linear Systems. Ananth Grama Purdue University. Supported by the National Science Foundation.

An Introduction to Computational Fluids Dynamics Prapared by: Chudasama Gulambhai H ( ) Azhar Damani ( ) Dave Aman ( )

Large-scale geophysical electromagnetic imaging and modeling on graphical processing units Michael Commer (LBNL) Filipe R. N. C. Maia (LBNL-NERSC) Gregory.

Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.

Model Anything. Quantity Conserved c  advect  diffuse S ConservationConstitutiveGoverning Mass, M  q -- M Momentum fluid, Mv -- F Momentum fluid.

Fermi National Accelerator Laboratory & Thomas Jefferson National Accelerator Facility SciDAC LQCD Software The Department of Energy (DOE) Office of Science.

Jun Doi IBM Research – Tokyo Early Performance Evaluation of Lattice QCD on POWER+GPU Cluster 17 July 2015.

Amit Amritkar & Danesh Tafti Eric de Sturler & Kasia Swirydowicz

Convection-Dominated Problems

Parallel Plasma Equilibrium Reconstruction Using GPU

Boundary Element Analysis of Systems Using Interval Methods

© Fluent Inc. 1/10/2018L1 Fluids Review TRN Solution Methods.

Convergence in Computational Science

Analytical Tools in ME Course Objectives

GENERAL VIEW OF KRATOS MULTIPHYSICS

Supported by the National Science Foundation.

Objective Numerical methods Finite volume.

Investigators Tony Johnson, T. V. Hromadka II and Steve Horton

Ph.D. Thesis Numerical Solution of PDEs and Their Object-oriented Parallel Implementations Xing Cai October 26, 1998.

Presentation transcript:

- Department of Engineering Science and Ocean Engineering, National Taiwan University - Department of Mathematics, National Taiwan University - Center for Advanced Studies on Theoretical Sciences (CASTS), National Taiwan University Tony Wen-Hann Sheu CASTS-LJLL workshop on Applied Mathematics and Mathematical Sciences, Taipei, May 26-29, 2014 Acknowledgement : Neo Shih-Chao Kao, Wei-Chung Wang

I.Computational challenges in solving 3D steady incompressible Navier-Stokes equations by finite element method II.In-house developed finite element code III.Code implementation in Kepler K20 GPU IV.Verification and numerical results V.Concluding remarks

nonlinear convective termPressure gradient Viscous term Source term Divergence-free constrain condition Elliptic Incompressible Navier-Stokes equations The Reynolds number comes out as the result of normalization  The mixed FEM formulation is adopted to get the solution simultaneously George Gabriel Stokes, ( ) Claude-Louis Navier ( ) Three-dimensional steady incompressible Navier-Stokes equations in primitive variables are as follows : on (1) (2) (3)

C1 – Rigorous implementation of boundary condition and assurance of divergence-free constrain for the elliptic differential system of equations C2 - Accurate approximation of convection term in fixed grid to avoid oscillatory velocity field C3- Proper storage of primitive variables and to avoid even-odd pressure solution C4- Effective calculation of from the unsymmetric and indefinite matrix equation using modern iterative solver C5- Parallel implementation of finite element program

O1- Adopt mixed formulation of equations cast in primitive variables O2- Derive a streamline upwinding Petrov-Galerkin formulation featuring the numerical wavenumber error reducing property for the convection term O3- Formulate finite element equations in weak sense in trilinear pressure /triquadratic element to satisfy the LBB compatibility condition O4- Apply preconditioner to the normalized symmetric and positive definite matrix equations to ensure fast unconditional convergence for high Reynolds number simulation O5- Implement CUDA programming finite element code in hybrid CPU- GPU (Graphic Process Unit ) platform to largely reduce the simulation time

(Numerical Heat Transfer ; in press )

The proposed Petrov-Galerkin finite element model is as follows where and are the weighting functions for the momentum and continuity equations, respectively

The weighted residual weak formulation for the convection term in one-dimensional Substituting the quadratic interpolation function into the above weak statement to derive the discrete equations at the center and corner nodes, respectively - Galerkin model : - Wavenumber optimized model : biased part

The discretization equation for the convection term in a quadratic element is expressed as follows : To minimize the numerical wavenumber error, the Fourier transform and its inverse are applied to derive the numerical wavenumber as

 The actual wavenumber can be regarded as the right hand side of the numerical wavenumber  We define the error function to minimize the discrepancy between and  The limiting condition is enforced to get the expression of as

 In multi-dimensional problem, the method of operator splitting is adopted to get the stabilization term along the streamline direction, where

For the smallest problem (2 3 elements), the indefinite matrix equations takes the following profile - unsymmetric - diagonal void - ill-conditioned

 The linear system derived from the FEM is usually written in the form  90% of computation time of FEM is used to solve (*) by GMRES or BICGSTAB  It is a grand challenge to solve (*) efficiently using iterative solution solver for three-dimensional large size problems => our proposed iterative solver given below Global matrix Solution vector Right hand side where (*)

 To avoid Lanczos or pivoting breakdown, the linear system has been normalized, leading to the following positive definite and symmetric matrix equation  Since the normalized linear system becomes symmetric and positive definite, the unconditionally congergent Conjugate Gradient (CG) iterative solver is applied  Preconditioning of matrix is adopted to reduce the corresponding increase of condition number  Two techniques will be adopted to accelerate matrix calculation ◦ The mesh coloring technique (To avoid the race-condition) ◦ The EBE technique (To avoid global matrix assembling) symmetric, positive definite symmetric and positive definite

Mesh coloring technique

 In the “Update step” of PCG algorithm, provided that any two threads update the same vector component simultaneously, it will result in the so-called race-condition  To avoid the race-condition, we divide the all elements into a finite number of subsets such that any two elements in a subset are not allowed to share the same node (If two men ride on a horse, one must ride behind) “ 一山不容二虎 ” * Different colors will be proceeded in sequence * Elements with the same color are proceeded simultaneously and in parallel

In FEM, the global matrix can be formulated as : Underlying this concept, we have => Time-consuming the matrix-vector product can be therefore implemented in an element-wise level

 Computation is carried out on GPUs (Graphics Processing Unit) to get a good performance in scientific and engineering computing Intel i7 4820K Processor ~ 3.7GHz NVIDIA Tesla K series Kepler K20 Processor clock rate = 705MHz Bandwidth = 208 GB/s Memory = 5 GB (DDR5) * Parallelism - CPU has 8 cores - GPU has more than 1000 cores * Memory - CPU can reach 59 GB/s bandwidth - GPU can reach 208 GB/s bandwidth * Performance - CPU can reach 59.2 GB (double) - GPU can reach 3.52 TB/s (single) 1.17 TB/s (double)

CUDA  ( C ompute U nified D evice A rchitecture)  CUDA is a general purpose parallel computational model released by NVIDIA in 2007  CUDA only runs on NVIDIA GPUs  Heterogeneous serial (CPU)-parallel (GPU) computing  Scalable programming model  CUDA permits Fortran (PGI) and C/C++ (NVIDIA NVCC) programming compliers

 CPU (host) plays the role of “master” and GPUs (device) play the role of “workers”  CPU maps computational tasks onto GPUs  CPU can carry out computations at the same time when code is run on GPUs  Basic flow chart ◦ CPU transfers the data to the GPU ◦ Parallel code (kernel) are run on the GPU ◦ GPU transfers the computational results back to CPU

1. Block is the smallest execution unit in GPU 2. All threads in a block are executed simultaneously Note Current CPU-GPU hardware architecture

1.Global memory (5GB DDR5) (D ouble D ata R ate ) 2.Shared memory (48KB/SMX) 3.Cache memory (1536KB L 2 ) 4.Texture memory (48KB)

 To implement the EBE on GPU, the new formulation will be adopted. Given an e-th element matrix, RHS and its product are as follows  One needs to compute the product, The resulting matrix-vector product results in a race-condition

Race-condition algorithm Non-race-condition algorithm Elements with the same color will be executed in parallel in a non-race-condition algorithm simultaneously (Traditional) (New operation) M S S S M : Multiplication operation S : Sum operation M M S M

 The analytic solutions which satisfy the steady-state 3D Navier-Stokes equations are given as follows : The corresponding exact pressure is given below  Problem setting ◦ Re=1,000 ◦ Non uniform grid size : 11 3,21 3,41 3,61 3 ◦ Matrix Solver : PCG solver (iterative)

 The L 2 error norms are listed as follows :  R.O.C. (Rate Of Convergence )  The predicted solutions show good agreement with the exact solutions L2U L2V L2W L2P R.O.C × × × × × × × × × × × × × × × ×

 Problem domain :  Boundary condition ◦ u=1,v=w=0 at upper plane ◦ Non-slip at other planes  Non-uniform grid number : 51 3  Re=100,400,1000,2000  Jacobi preconditioner is adopted  Newton-linearization is adopted Schematic of the lid-driven cavity flow problem - D. C. Lo, K. Murugesan, D. L. Young, Numerical solution of three-dimensional velocity Navier-Stokes equations by finite difference method, International Journal for numerical methods in Fluids,47, , C. Shu, L. Wang and Y. T. Chew Numerical computation of three-dimensional incompressible Navier–Stokes equations in primitive variable form by DQ method, International Journal for Numerical methods in Fluids, 43 :345–368, 2005.

Re=100Re=400 Re=2000Re=1000 Compare the velocity profiles u(0.5,y,0.5 ) w(x,0.5,0.5)

 Computations are implemented on the following two different platforms  The lid-driven cavity problem at different Reynolds and grid numbers is used to access the performance of the current hybrid CPU/GPU calculation Platform #1 specification Platform #2 specification Single CPU platform CPU : Intel i7-4820K GPU : Non Single CPU platform CPU : Intel i7-4820K GPU : Non Hybrid CPU/GPU platform CPU : Intel i7-4820K GPU : NVIDIA Kepler K20 Hybrid CPU/GPU platform CPU : Intel i7-4820K GPU : NVIDIA Kepler K20

Platform Re=100 Re=400 Re=600 Re=800 Platform # 1 (a) Platform # 2 (b) Speedup (b)/(a)  Grid number = 31 3  Grid number = 41 3 Re= Platform Re=100 Re=400 Re=600 Re=800 Platform # 1 (a) Platform # 2 (b) Speedup (b)/(a) Re=

Platform Re=100 Re=400 Re=600 Re=800 Platform # 1 (a) Platform # 2 (b) Speedup (b)/(a)  Grid number = 51 3  Grid number = 61 3 Re= Platform Re=100 Re=400 Re=600 Re=800 Platform # 1 (a) Platform # 2 (b) Speedup (b)/(a) Re=

 A new streamline upwind FEM model accommodating the optimized numerical wavenumber along the streamline has been developed  The unconditionally convergent finite element solution can be iteratively obtained from the normalized symmetric and positive definite matrix equation using the PCG solver  Novel non-race-condition and EBE techniques have been successfully implemented on GPU architecture  Computational time has been reduced more that ten times in a hybrid CPU/GPU platform

Thank for your attention ! Q/A ?