GPU Cluster for Scientific Computing
Zhe Fan, Feng Qiu, Arie Kaufman, Suzanne Yoakum-Stover
Center for Visual Computing and Department of Computer Science, Stony Brook University

Presentation transcript:

Stony Brook Visual Computing Cluster
- GPU cluster: 35 nodes, each with an nVIDIA GeForce FX 5800 Ultra
- Gigabit Ethernet interconnect
- 70 Pentium Xeon 2.4 GHz CPUs
- 35 VolumePro boards
- 9 HP Sepia-2A with ServerNet II

LBM on the GPU
Application: large-scale CFD simulations using the Lattice Boltzmann Model (LBM).
LBM computation:
- Particles stream along lattice links
- Particles collide when they meet at a site
Mapping to the GPU:
- Pack the 3D lattice states into a series of 2D textures
- Update the lattice with fragment programs

Scaling up LBM to the GPU Cluster
- Each GPU computes a sub-lattice
- Particles stream out of the sub-lattice:
  1. Gather the particle distributions into a texture
  2. Read them out from the GPU in a single operation
  3. Transfer them through Gigabit Ethernet (MPI)
  4. Write them into the neighboring GPU nodes
Network performance optimization:
  1. Conduct network transfer while computing
  2. Schedule transfers to reduce the likelihood of interruption
  3. Simplify the connection pattern

Times Square Area of NYC: Flow Streamlines
- 0.31 second per step on 30 GPUs
- 4.6 times faster than the software version on 30 CPUs

GPU Cluster / CPU Cluster Speedup
- Each node computes an 80 x 80 x 80 sub-lattice
- GeForce FX 5800 Ultra vs. Pentium Xeon 2.4 GHz

Dispersion Plume: Large-Scale Simulation
- 1.66 km x 1.13 km area
- 91 blocks, 851 buildings
- 480 x 400 x 80 lattice

Acknowledgements
- NSF CCR
- Department of Homeland Security, Environmental Measurements Laboratory
- HP
- TeraRecon
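The "pack 3D lattice states into 2D textures" step can be illustrated with a common flat-3D-texture layout: tile the nz z-slices of an nx x ny x nz lattice into a grid inside one 2D texture. This is a hedged sketch of one plausible addressing scheme, not necessarily the exact layout the authors used; the function names are illustrative.

```python
import math

def pack_coords(x, y, z, nx, ny, nz):
    """Map a 3D lattice site (x, y, z) to a 2D texel (u, v).

    The nz z-slices are tiled left-to-right, top-to-bottom in a grid
    of tiles_x tiles per texture row; each tile is one nx-by-ny slice.
    """
    tiles_x = math.ceil(math.sqrt(nz))   # tiles per texture row
    u = (z % tiles_x) * nx + x           # tile column offset + in-slice x
    v = (z // tiles_x) * ny + y          # tile row offset + in-slice y
    return u, v

def unpack_coords(u, v, nx, ny, nz):
    """Inverse mapping: texel (u, v) back to lattice site (x, y, z)."""
    tiles_x = math.ceil(math.sqrt(nz))
    x, y = u % nx, v % ny
    z = (v // ny) * tiles_x + (u // nx)
    return x, y, z
```

With the 80 x 80 x 80 sub-lattice from the benchmark, the 80 slices tile into a 9 x 9 grid, so the whole sub-lattice state fits in a single 720 x 720-texel 2D texture that a fragment program can update in one pass.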
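The four streaming-out steps (gather into a texture, read back in one operation, transfer via MPI, write into the neighbor) amount to a halo exchange between sub-lattices. The sketch below models that pattern in plain Python for two neighbors along one axis, with no GPU or MPI involved; each sub-lattice is a list of rows whose first and last columns are halo cells. It illustrates the data movement only, under those simplifying assumptions.

```python
def exchange_halos(left, right):
    """One halo exchange between two neighboring sub-lattices.

    Columns 0 and -1 of each row are halo cells; columns 1..-2 are
    interior. The outgoing boundary layer is gathered into one
    contiguous buffer first (standing in for the texture that is read
    back from the GPU in a single operation), then written into the
    neighbor's halo (standing in for the MPI transfer over GigaE).
    """
    outgoing_right = [row[-2] for row in left]   # gather + single readback
    outgoing_left = [row[1] for row in right]
    for row, val in zip(right, outgoing_right):  # transfer + write into neighbor
        row[0] = val
    for row, val in zip(left, outgoing_left):
        row[-1] = val
```

Gathering the boundary into one buffer before the readback matters on this hardware: per-texel readbacks over AGP were far more expensive than one bulk transfer, which is why the poster stresses reading out "in a single operation".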
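The first network optimization, conducting the transfer while computing, is a standard compute/communication overlap. A minimal sketch of the control flow, using a background thread as a stand-in for the asynchronous network transfer (the callback names are illustrative, not the authors' API):

```python
import threading

def step_with_overlap(transfer_boundary, update_interior, update_boundary):
    """One simulation step that hides network latency.

    The boundary transfer runs on a background thread while the much
    larger interior update proceeds on the main thread; the boundary
    cells, which need the neighbor's data, are updated only after the
    transfer has completed.
    """
    t = threading.Thread(target=transfer_boundary)
    t.start()
    update_interior()   # bulk of the work, overlaps the transfer
    t.join()            # wait for the halo data to arrive
    update_boundary()   # finish the cells that needed neighbor data
```

If the interior update takes at least as long as the transfer, the network cost is effectively hidden, which is consistent with the reported 4.6x speedup holding up even over Gigabit Ethernet.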