Mesh-free numerical methods for large-scale engineering problems.


Mesh-free numerical methods for large-scale engineering problems. Basics of GPU-based computing.
D. Stevens, M. Lees, P. Orsini. Supervisors: Prof. Henry Power, Herve Morvan. January 2008

Outline:
- Meshless numerical methods using Radial Basis Functions (RBFs)
  - Basic RBF interpolation
  - Brief overview of the work done in our research group, under the direction of Prof. Henry Power
  - Example results
  - Large-scale future problems
- GPU computing
  - Introduction to the principle of using graphics processors (GPUs) as floating-point co-processors
  - Current state of GPU hardware and software
  - Basic implementation strategy for numerical simulations

GABARDINE: EU project
GABARDINE (Groundwater Artificial recharge Based on Alternative sources of wateR: aDvanced INtegrated technologies and managEment) aims to investigate the viability of artificial recharge in groundwater aquifers, and to produce decision-support mechanisms.
Partners include Portugal, Germany, Greece, Spain, Belgium, Israel and Palestine.
The University of Nottingham is developing numerical methods to handle:
- phreatic aquifers (with associated moving surfaces)
- transport processes
- unsaturated zone problems (with nonlinear governing equations)

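RBF interpolation methods
The unknown field is approximated as a weighted sum of N radial basis functions centred at the data points, which gives N unknown weights. Applying this expansion at each of the N test locations gives an N × N linear system for the weights.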
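A minimal 1D sketch of global RBF interpolation, assuming the common expansion u(x) = Σ_j λ_j ψ(|x − x_j|) with a multiquadric basis ψ(r) = sqrt(r² + c²). The basis choice, the shape parameter c = 0.2 and the test function sin(πx) are illustrative assumptions, not taken from the slides; the dense solve is plain Gaussian elimination so the block needs only the standard library.

```python
import math

def multiquadric(r, c):
    """Multiquadric basis psi(r) = sqrt(r^2 + c^2); c is an assumed shape parameter."""
    return math.sqrt(r * r + c * c)

def solve_dense(A, b):
    """Dense Gaussian elimination with partial pivoting (O(N^3) work)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def rbf_fit(centres, values, c):
    """Build the dense N x N matrix A[i][j] = psi(|x_i - x_j|) and solve A lam = u."""
    A = [[multiquadric(abs(xi - xj), c) for xj in centres] for xi in centres]
    return solve_dense(A, values)

def rbf_eval(x, centres, weights, c):
    """Continuous reconstruction u(x) = sum_j lam_j psi(|x - x_j|)."""
    return sum(w * multiquadric(abs(x - xj), c) for xj, w in zip(centres, weights))

if __name__ == "__main__":
    nodes = [i / 10 for i in range(11)]
    data = [math.sin(math.pi * x) for x in nodes]
    lam = rbf_fit(nodes, data, 0.2)
    print(rbf_eval(0.55, nodes, lam, 0.2), math.sin(math.pi * 0.55))
```

Every point couples to every other point, which is why the matrix is dense.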

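Kansa’s method (diffusion operator)
Kansa’s unsymmetric collocation keeps the same RBF expansion, but collocates the governing (here, diffusion) operator at interior points and the boundary operator at boundary points, giving one equation per point for the N weights.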
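A minimal sketch of Kansa collocation for the 1D steady diffusion problem u'' = f on [0, 1] with Dirichlet ends, assuming a multiquadric basis. The test problem (f = −π² sin(πx), exact solution sin(πx)), the 21 points and c = 0.1 are illustrative choices, not the group's setup.

```python
import math

def mq(r, c):
    """Multiquadric basis psi(r) = sqrt(r^2 + c^2), with r = x - x_j."""
    return math.sqrt(r * r + c * c)

def mq_xx(r, c):
    """d^2/dx^2 of the multiquadric: c^2 / (r^2 + c^2)^(3/2)."""
    return c * c / (r * r + c * c) ** 1.5

def solve_dense(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def kansa_poisson_1d(n, c, f, g0, g1):
    """Kansa (unsymmetric) collocation for u'' = f on [0, 1], u(0) = g0, u(1) = g1."""
    x = [i / (n - 1) for i in range(n)]
    A, rhs = [], []
    for i, xi in enumerate(x):
        if i == 0 or i == n - 1:
            # Boundary point: collocate the (Dirichlet) boundary operator.
            A.append([mq(xi - xj, c) for xj in x])
            rhs.append(g0 if i == 0 else g1)
        else:
            # Interior point: collocate the diffusion operator d^2/dx^2.
            A.append([mq_xx(xi - xj, c) for xj in x])
            rhs.append(f(xi))
    lam = solve_dense(A, rhs)

    def u(t):
        return sum(l * mq(t - xj, c) for l, xj in zip(lam, x))
    return u

if __name__ == "__main__":
    u = kansa_poisson_1d(21, 0.1, lambda t: -math.pi ** 2 * math.sin(math.pi * t), 0.0, 0.0)
    print(max(abs(u(k / 100) - math.sin(math.pi * k / 100)) for k in range(101)))
```

Rows are built with different operators than columns, so the resulting matrix is not symmetric.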

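Hermitian method
The PDE and boundary operators are also applied to the basis functions themselves, so the expansion contains operator-applied basis functions alongside the plain ones. Collocating this expansion gives a symmetric system.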

RBF - features and drawbacks
- Partial derivatives are obtained cheaply and accurately by differentiating the (known) basis functions.
- This leads to a highly flexible formulation, allowing boundary operators to be implemented exactly and directly.
- Once the solution weights are obtained, a continuous solution can be reconstructed over the entire solution domain.
- However, a densely populated linear system must be solved to obtain the solution weights.
- This leads to a high computational cost (O(N²) storage and assembly, O(N³) for a direct solve) and to numerical conditioning issues with large systems, setting a practical limit of ~1000 points.
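To illustrate the first point, a minimal sketch assuming a multiquadric basis: once the weights are known, du/dx is obtained by summing the analytic derivative ψ'(r) = r / sqrt(r² + c²) of each basis function, with no finite differencing. The node set, c = 0.1 and the test function are illustrative assumptions.

```python
import math

def mq(r, c):
    """Multiquadric psi(r) = sqrt(r^2 + c^2), r = x - x_j (signed)."""
    return math.sqrt(r * r + c * c)

def mq_x(r, c):
    """Analytic d/dx of the multiquadric: r / sqrt(r^2 + c^2)."""
    return r / math.sqrt(r * r + c * c)

def solve_dense(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(nodes, values, c):
    A = [[mq(xi - xj, c) for xj in nodes] for xi in nodes]
    return solve_dense(A, values)

def derivative(x, nodes, lam, c):
    """du/dx(x) = sum_j lam_j psi'(x - x_j): differentiate the known basis."""
    return sum(l * mq_x(x - xj, c) for l, xj in zip(lam, nodes))

if __name__ == "__main__":
    nodes = [i / 20 for i in range(21)]
    lam = fit(nodes, [math.sin(math.pi * x) for x in nodes], 0.1)
    print(derivative(0.3, nodes, lam, 0.1), math.pi * math.cos(math.pi * 0.3))
```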

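Formulation: LHI method
LHI (local Hermitian interpolation) is applied to an initial value problem: the solution is interpolated locally by Hermitian interpolation, using the solution values and, where present, the boundary operator(s).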

LHI method formulation (continued)
Form N local systems, each based on a small local support, so each local interpolation matrix H is only about 10 × 10. The solution in the vicinity of local system k can then be reconstructed from the interpolant of that local system.

CV-RBF (modified control-volume scheme)
- Classic CV approach: polynomial interpolation is used to compute the flux across each control-volume face.
- CV-RBF approach: RBF interpolation is used to compute the flux, applying the internal operator, the Dirichlet operator or the boundary operator as appropriate.
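A 1D sketch contrasting the two flux computations at a single face, assuming a diffusive flux −K du/dx. The "classic" version uses linear interpolation between the two neighbouring cell centres; the RBF version fits a multiquadric over the 6 nearest cell centres and differentiates it at the face. The stencil size, c, K and the field exp(x) are illustrative; the operator-augmented (Hermitian) interpolation used in CV-RBF is not reproduced here.

```python
import math

def mq(r, c):
    return math.sqrt(r * r + c * c)

def mq_x(r, c):
    return r / math.sqrt(r * r + c * c)

def solve_dense(A, b):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def face_flux_linear(u_left, u_right, h, k_cond):
    """Classic CV: linear (polynomial) interpolation between neighbouring centres."""
    return -k_cond * (u_right - u_left) / h

def face_flux_rbf(x_face, centres, values, c, k_cond, size=6):
    """RBF interpolation over nearby cell centres, differentiated at the face."""
    idx = sorted(range(len(centres)), key=lambda j: abs(centres[j] - x_face))[:size]
    xs = [centres[j] for j in idx]
    A = [[mq(xi - xj, c) for xj in xs] for xi in xs]
    lam = solve_dense(A, [values[j] for j in idx])
    return -k_cond * sum(l * mq_x(x_face - xj, c) for l, xj in zip(lam, xs))

if __name__ == "__main__":
    h = 0.1
    centres = [(i + 0.5) * h for i in range(10)]
    u = [math.exp(x) for x in centres]
    print(face_flux_linear(u[4], u[5], h, 1.0), face_flux_rbf(0.5, centres, u, 0.2, 1.0), -math.exp(0.5))
```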

Simulation workflow: pre-processing (CAD; grid generation with GridGen and Triangle) → RBF-specific dataset generation → our code (CV-RBF and meshless solvers) → post-processing (TecPlot).

Convection-diffusion: validation
Both methods have been validated against a series of 1D and 3D, linear and nonlinear, advection-diffusion-reaction problems.

CV-RBF: infiltration well + pumping
Setup: well diameter; pumping location 25 m from the infiltration well (at height y = 15 m); infiltration and pumping rates; soil properties.

CV-RBF: infiltration well + pumping
3D model: mesh (60,000 cells) and boundary conditions (specified on selected boundaries, with a separate condition everywhere else).

CV-RBF: infiltration well + pumping
Figure: piezometric head contours and streamlines at t = 30 h, on the plane z = 25 m and the plane y = 29 m; length scale 100 m; maximum displacements.

LHI: infiltration model - setup
Solving Richards’ equation:
- infiltration over a 10 m × 10 m region at the ground surface
- infiltration pressure = 2 m
- zero-flux boundary at the base (‘solid wall’)
- fixed pressure distribution at the sides
- prescribed initial pressure
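For reference, a common pressure-head form of Richards’ equation with a storage term, together with the boundary conditions listed above, can be written as follows. This is an assumed standard form, not necessarily the exact formulation used in the LHI code; h is pressure head, z elevation, S_s storativity (specific storage), and h_side is a hypothetical symbol for the fixed side distribution.

```latex
\left( S_s \frac{\theta(h)}{\theta_s} + C(h) \right) \frac{\partial h}{\partial t}
  = \nabla \cdot \left[ K(h)\, \nabla (h + z) \right],
\qquad C(h) = \frac{d\theta}{dh}

h = 2\ \text{m} \quad \text{on the } 10\ \text{m} \times 10\ \text{m infiltration patch}, \qquad
K(h)\, \nabla (h + z) \cdot \mathbf{n} = 0 \quad \text{on the base}, \qquad
h = h_{\text{side}}(z) \quad \text{on the sides}
```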

LHI: infiltration model - soil properties
Saturated conductivity and storativity are specified, and the soil is represented using the van Genuchten model.
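A sketch of the standard van Genuchten retention curve with the Mualem conductivity model (m = 1 − 1/n), assuming pressure head h ≤ 0 is unsaturated. The parameter values in the demo are illustrative loam-like numbers, not the soil properties used in the presentation.

```python
import math

def effective_saturation(h, alpha, n):
    """Se = [1 + (alpha*|h|)^n]^(-m) for h < 0, with m = 1 - 1/n; Se = 1 when saturated."""
    if h >= 0.0:
        return 1.0
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)

def water_content(h, theta_r, theta_s, alpha, n):
    """theta(h) = theta_r + (theta_s - theta_r) * Se(h)."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)

def conductivity(h, k_sat, alpha, n):
    """Mualem: K = Ks * Se^0.5 * [1 - (1 - Se^(1/m))^m]^2."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(h, alpha, n)
    return k_sat * se ** 0.5 * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2

def moisture_capacity(h, theta_r, theta_s, alpha, n):
    """C(h) = d theta / dh (zero when saturated)."""
    if h >= 0.0:
        return 0.0
    m = 1.0 - 1.0 / n
    a = alpha * abs(h)
    return (theta_s - theta_r) * alpha * m * n * a ** (n - 1.0) * (1.0 + a ** n) ** (-(m + 1.0))

if __name__ == "__main__":
    # Illustrative parameters (hypothetical): alpha in 1/m, K in m/day.
    for h in (0.0, -0.1, -1.0, -10.0):
        print(h, water_content(h, 0.078, 0.43, 3.6, 1.56), conductivity(h, 0.25, 3.6, 1.56))
```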

LHI: infiltration model - results
11,585 points arranged in 17 layers. ‘Short’ runs: solution to 48 hours. ‘Long’ run: solution to 160 hours.

Richards’ equation - first example
We use the steady-state example given in Tracy (2006)* for the solution of Richards’ equation, with the domain and the boundary conditions (on the top face and on all other faces) defined there.
* F. T. Tracy, “Clean two- and three-dimensional analytical solutions of Richards’ equation for testing numerical solvers”.
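Tracy’s analytical solutions are built, as far as we are aware, on exponential (Gardner-type) constitutive relations, K(h) = K_s e^{αh} and θ(h) = θ_r + (θ_s − θ_r) e^{αh} for h ≤ 0. Treat that form as an assumption here. The sketch shows how quickly K decays for the three α values that appear in the error analysis (α in inverse units of h).

```python
import math

def gardner_k_ratio(h, alpha):
    """Assumed Gardner-type model: K(h)/Ks = exp(alpha*h) for h <= 0, 1 when saturated."""
    return math.exp(alpha * min(h, 0.0))

def gardner_theta(h, alpha, theta_r, theta_s):
    """Assumed companion model: theta = theta_r + (theta_s - theta_r) * exp(alpha*h)."""
    return theta_r + (theta_s - theta_r) * math.exp(alpha * min(h, 0.0))

if __name__ == "__main__":
    for alpha in (0.164, 0.328, 0.492):
        print(alpha, gardner_k_ratio(-10.0, alpha))
```

Larger α gives a more rapidly decaying K(h), which is the regime where the comparison below is most favourable.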

Richards’ equation - first example
Computed with (11 × 11 × 11) and (21 × 21 × 21) uniformly spaced points.
Figure panels: α = 0.164 with N = 11 and N = 21; α = 0.328 with N = 11 and N = 21.

Richards’ equation - first example (error analysis)

| α | Grid | L2 error norm | Max error | Finite volume max error (Tracy) | Improvement factor |
|---|---|---|---|---|---|
| 0.164 | 11 × 11 × 11 | 1.72E-02 | 3.16E-02 | | |
| 0.164 | 21 × 21 × 21 | 3.52E-03 | 7.09E-03 | 6.62E-02 | 9.34 |
| 0.328 | 11 × 11 × 11 | 1.61E-02 | 3.20E-02 | | |
| 0.328 | 21 × 21 × 21 | 4.24E-03 | 8.57E-03 | 1.11 | 129.5 |
| 0.492 | 11 × 11 × 11 | 1.77E-02 | 2.43E-02 | | |
| 0.492 | 21 × 21 × 21 | 5.16E-03 | 7.26E-03 | 5.13 | 706.6 |

Good improvement over the finite volume results from Tracy’s paper, particularly with rapidly decaying K and θ functions. Reasonable convergence rate with increasing point density.
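The derived columns can be checked from the tabulated errors alone. The improvement factor is the finite volume max error divided by the 21 × 21 × 21 max error. Assuming the grid spacing halves from 11 to 21 points per side (10 to 20 intervals), the observed convergence order is log(e_coarse / e_fine) / log 2. The numbers below are copied from the table; nothing else is assumed.

```python
import math

# Errors from the table: alpha -> (11^3 grid, 21^3 grid)
MAX_ERR = {0.164: (3.16e-2, 7.09e-3), 0.328: (3.20e-2, 8.57e-3), 0.492: (2.43e-2, 7.26e-3)}
L2_ERR = {0.164: (1.72e-2, 3.52e-3), 0.328: (1.61e-2, 4.24e-3), 0.492: (1.77e-2, 5.16e-3)}
FV_MAX_ERR = {0.164: 6.62e-2, 0.328: 1.11, 0.492: 5.13}

def improvement_factor(alpha):
    """Finite volume max error / meshless max error on the 21^3 grid."""
    return FV_MAX_ERR[alpha] / MAX_ERR[alpha][1]

def observed_order(e_coarse, e_fine, refinement=2.0):
    """Order p in e ~ h^p, assuming h is halved between the two grids."""
    return math.log(e_coarse / e_fine) / math.log(refinement)

if __name__ == "__main__":
    for a in sorted(MAX_ERR):
        print(a, round(improvement_factor(a), 2),
              round(observed_order(*MAX_ERR[a]), 2), round(observed_order(*L2_ERR[a]), 2))
```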

Future work
Future work will focus on large-scale problems, including:
- regional-scale models of real-world experiments in Portugal and Greece
- country-scale models of aquifer pumping and seawater intrusion across Israel and Gaza
The large problem size will require a parallel implementation for efficient solution, hence our interest in HPC and GPUs. Practical implementation will require the parallelisation of large, sparsely populated, iterative matrix solvers.
To our knowledge, we are the only group working on large-scale hydrology problems using meshless numerical techniques.
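A minimal sketch of the kind of sparse iterative solver that would need parallelising: conjugate gradient on a matrix stored in CSR (compressed sparse row) format. CG is used only because it is the simplest Krylov method and requires a symmetric positive-definite matrix; unsymmetric systems, such as those from Kansa-type collocation, would need e.g. GMRES or BiCGSTAB instead. The 1D Laplacian test matrix is illustrative.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A x for A in CSR form; rows are independent, so they map naturally to parallel threads."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        y[i] = s
    return y

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for symmetric positive-definite A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x with x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = csr_matvec(indptr, indices, data, p)
        step = rs / dot(p, ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def laplacian_1d_csr(n):
    """Tridiagonal [-1, 2, -1] matrix in CSR form (SPD test case)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1); data.append(-1.0)
        indices.append(i); data.append(2.0)
        if i < n - 1:
            indices.append(i + 1); data.append(-1.0)
        indptr.append(len(indices))
    return indptr, indices, data

if __name__ == "__main__":
    A = laplacian_1d_csr(50)
    x = conjugate_gradient(*A, [1.0] * 50)
    print(max(abs(bi - ai) for bi, ai in zip([1.0] * 50, csr_matvec(*A, x))))
```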

GPU computing
GPU: Graphics Processing Unit
- Originally designed to accelerate floating-point-heavy calculations in computer games, e.g.:
  - pixel shader effects (lighting, reflection/refraction, other effects)
  - geometry setup (character meshes etc.)
  - solid-body physics (not yet widely adopted)
- Massively parallel architecture: currently up to 128 floating-point processing units
- Recent hardware (from Nov 2006) and software (Feb 2007) advances have made the processing units generally programmable, rather than specialised for pixel or vertex processing
- This has led to ‘general-purpose GPUs’ (GPGPUs)

GPU computing
GPUs are extremely efficient at handling multiply-add instructions on small ‘packets’ of data, which are usually the main computational cost in numerical simulations. Their floating-point capacity outstrips that of CPUs, both in theoretical peak and in achieved efficiency (if properly implemented).

GPU computing
Modern GPUs effectively work like a shared-memory cluster. They have extremely fast (~1000 MHz, vs ~400 MHz for typical system memory), dedicated onboard memory; onboard memory sizes currently range up to 1.5 GB (in addition to system memory).

CUDA: BLAS and FFT libraries
The CUDA toolkit is a C-language development environment for CUDA-enabled GPUs. Two libraries implemented on top of CUDA provide pre-parallelised routines: Basic Linear Algebra Subprograms (BLAS) and the Fast Fourier Transform (FFT).
Available examples/demos: parallel bitonic sort; matrix multiplication; matrix transpose; performance profiling using timers; parallel prefix sum (scan) of large arrays; image convolution; 1D DWT using Haar wavelet; graphics interoperation examples; CUDA BLAS and FFT examples; CPU-GPU C and C++ code integration; binomial option pricing; Black-Scholes option pricing; Monte-Carlo option pricing; parallel Mersenne Twister; parallel histogram; image denoising; Sobel edge detection filter.

TESLA: GPUs for HPC
Deskside (2 GPUs, 3 GB) and rackmount (4 GPUs, 6 GB) options. With ~500 GFLOPS per GPU, that is ~2 TFLOPS per 4-GPU rackmount unit. The deskside unit is listed at around $4200.

GPU computing - some results
GPU specs:

| GPU model | Processors | GPU clock speed | Memory size | Memory bus | Cost |
|---|---|---|---|---|---|
| 8600GTS | 32 | 675 MHz | 512 MB | 256-bit | ~£75 |
| 8800GT | 112 | 600 MHz | | | ~£150 |
| 8800GTX | 128 | 575 MHz | 768 MB | 320-bit | ~£250 |

We use CUDA, nVidia’s C-based API for GPUs. Example: multiplication of densely populated matrices, an O(N³) algorithm. The matrices are broken down into vector portions and sent to the GPU stream processors.
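A CPU-side Python sketch of the tile decomposition commonly used for dense matrix multiplication on GPUs: the output is split into tile × tile blocks (one CUDA thread block each, in the usual mapping), each output entry is one thread, and the shared dimension is traversed one tile at a time. This illustrates the decomposition only; it is not the code used for the results, and the tile size of 4 is arbitrary.

```python
import random

def matmul_naive(A, B):
    """Reference O(n*m*p) triple loop."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def matmul_tiled(A, B, tile=4):
    """Same product, computed block by block as a GPU kernel would partition it."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for bi in range(0, n, tile):              # block row    (blockIdx.y analogue)
        for bj in range(0, p, tile):          # block column (blockIdx.x analogue)
            for bk in range(0, m, tile):      # one tile of the shared dimension at a time
                for i in range(bi, min(bi + tile, n)):      # threads within the block
                    for j in range(bj, min(bj + tile, p)):
                        s = 0.0
                        for k in range(bk, min(bk + tile, m)):
                            s += A[i][k] * B[k][j]
                        C[i][j] += s                        # accumulate partial sums
    return C

if __name__ == "__main__":
    random.seed(0)
    A = [[random.uniform(-1, 1) for _ in range(7)] for _ in range(10)]
    B = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(7)]
    print(matmul_tiled(A, B)[0][:3], matmul_naive(A, B)[0][:3])
```

Each output tile only needs one row-strip of A and one column-strip of B per step, which is what allows a GPU implementation to stage those strips in fast on-chip memory.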

Various CPUs vs GPU
Note: the performance of dual-core and quad-core CPUs is approximated assuming idealised parallelisation (i.e. 100% efficiency).

More GPU propaganda
Good anecdotal evidence of improvement in real-world simulations is available from those who have switched to GPU computing:
- Dr. John Stone, Beckman Institute for Advanced Science and Technology, NAMD virus simulations: 110 CPU hours on an SGI Itanium supercomputer => 47 minutes with a single GPU, reported as a 240-fold speedup.
- Dr. Graham Pullan, University of Cambridge, CFD with turbine blades (LES and RANS models): 40x absolute speedup switching from a CPU cluster to ‘a few’ GPUs; 10 million cells are used on the GPU, up from 500,000 on the CPU cluster.

Closing words
- GPU performance is advancing at a much faster rate than CPU performance, and this is expected to continue for some time yet.
- With CUDA and BLAS, exploiting the parallelism of GPUs is in some cases easier than with traditional MPI approaches.
- Later this year: 2-3 times the performance of current hardware (over 1 TFLOPS per card), and native 64-bit (double-precision) capability.
More info: www.nvidia.com/tesla, www.nvidia.com/cuda, www.GPGPU.org