Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Parallelism


Overview of GPU Clusters
GPUs are becoming high-performance accelerators for data-parallel computing.
– Modern GPU chips contain hundreds of processor cores.
– Each GPU chip is capable of achieving up to 1 Tflops for single-precision (SP) arithmetic, and more than 80 Gflops for double-precision (DP) calculations.
– Recent HPC-optimized GPUs contain up to 4 GB of on-board memory and can sustain memory bandwidths exceeding 100 GB/second.

GPU clusters are built with a large number of GPU chips. Most GPU clusters are homogeneous, using GPUs of the same hardware class, make, and model. The software stack of a GPU cluster includes the OS, GPU drivers, and a clustering API such as MPI.
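As an illustration of how these software layers fit together, the sketch below pairs MPI (the clustering API) with the CUDA runtime: each MPI rank binds itself to one GPU on its node. This is a minimal sketch, not part of either standard; the rank-to-device mapping assumes one process per GPU with node-local contiguous ranks, and error handling is omitted.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int ndev = 0;
    cudaGetDeviceCount(&ndev);            /* GPUs visible on this node */
    if (ndev > 0)
        cudaSetDevice(rank % ndev);       /* simple rank-to-GPU binding convention */

    printf("rank %d bound to GPU %d of %d\n", rank, ndev ? rank % ndev : -1, ndev);
    MPI_Finalize();
    return 0;
}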

GPU clusters have already demonstrated the capability to achieve Pflops performance in some of the Top 500 systems. The high performance of a GPU cluster is attributed mainly to the following factors:
– a massively parallel multicore architecture,
– high throughput in multithreaded floating-point arithmetic,
– significantly reduced time for massive data movement, using large on-chip cache memory.
GPU clusters deliver not only a quantum jump in speed, but also significantly reduced space, power, and cooling demands. These reductions in power, footprint, and management complexity make GPU clusters very attractive for future HPC applications.

Case Study – Echelon GPU Cluster
The NVIDIA Echelon GPU cluster is a state-of-the-art design for exascale computing. The Echelon project is led by Bill Dally at NVIDIA and is partially funded by DARPA under the Ubiquitous High-Performance Computing (UHPC) program. The Echelon design sketches the architecture of a future GPU accelerator.

Echelon GPU Chip Design

Echelon GPU Cluster Architecture (figure)

To achieve Eflops performance, at least N = 400 cabinets are needed, for a total of 327,680 processor cores. The Echelon system is supported by a self-aware OS and runtime system. It is also designed to preserve locality with the support of a compiler and autotuner.
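A quick check on what the 400-cabinet figure implies (this per-cabinet number is derived here from the slide's own figures, not stated on the slide):

    per-cabinet throughput = 10^18 flops (1 Eflops) / 400 cabinets
                           = 2.5 × 10^15 flops
                           ≈ 2.5 Pflops per cabinet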

CUDA Support for GPU Clusters
CUDA version 3.2 (released in 2010) targets a single GPU module per host process. CUDA version 4.0 allows a process to use multiple GPUs with a unified virtual address space of shared memory.
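A minimal sketch of the multi-GPU model that CUDA 4.0 enables: a single host process walks every visible device and launches a kernel on each. The scale kernel and the buffer size are illustrative placeholders, and error checking is omitted for brevity.

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void scale(float *x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;              /* scale each element in place */
}

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);              /* since CUDA 4.0, one host thread can drive many GPUs */
        float *buf;
        const int n = 1 << 20;
        cudaMalloc(&buf, n * sizeof(float));
        cudaMemset(buf, 0, n * sizeof(float));
        scale<<<(n + 255) / 256, 256>>>(buf, 2.0f, n);
        cudaDeviceSynchronize();       /* wait for this device before moving on */
        cudaFree(buf);
        printf("device %d finished\n", d);
    }
    return 0;
}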

Applications on GPU Clusters
– Distributed calculations to predict the native conformation of proteins
– Medical analysis simulations based on CT and MRI scan images
– Physical simulations in fluid dynamics and environmental statistics
– Accelerated 3D graphics, cryptography, compression, and interconversion of video file formats
– Building the single-chip cloud computer (SCC) through virtualization in a many-core architecture