Sobolev (+Node 6, 7) Showcase + K20m GPU Accelerator

Supercomputer
 The No. 1 (Tianhe-2) and the No. 7 (Stampede) use Intel Xeon Phi processors.
 The No. 2 (Titan) and the No. 6 (Piz Daint) use NVIDIA GPUs.
 Accelerator share in the TOP500 list:
  - GPU: NVIDIA 46, ATI Radeon 3
  - Xeon Phi: 21
  - Hybrid: 4

Hardware Specification
 Main module
  - 4 * Intel Xeon X7550: 2.0 GHz, 18 MB cache, 8 cores
  - Memory: 64 GB
  - QDR 40 Gb/s InfiniBand
 Sub-modules (*5)
  - 2 * Intel Xeon X5660: 2.8 GHz, 12 MB cache, 6 cores
  - Memory: 48 GB
  - QDR 40 Gb/s InfiniBand
 Sub-modules (*2)
  - 2 * Intel Xeon E: 2.6 GHz, 20 MB cache, 8 cores
  - Memory: 128 GB
  - QDR 40 Gb/s InfiniBand

Monitoring: sobolev.kaist.ac.kr
[Cluster diagram: Sobolev with Node1–Node5, plus GPU nodes Node6 and Node7]

Tesla K20m
 CUDA parallel processing cores: 2496
 Memory size: 5 GB GDDR5
 Processor core clock: 706 MHz
 Peak double-precision floating-point performance: 1.17 TFLOPS
 Thermal solution: passive
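These figures can be checked directly on a node with a short device-query program. Below is a minimal sketch using the standard CUDA runtime API; note that the 2496-core figure is derived rather than reported by the API (Kepler GPUs have 192 CUDA cores per multiprocessor, and the K20m has 13 multiprocessors).

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("Device %d: %s\n", d, prop.name);
        printf("  Compute capability : %d.%d\n", prop.major, prop.minor);
        // 2496 cores = 13 multiprocessors * 192 cores/SM on Kepler;
        // the runtime reports the SM count, not the core count.
        printf("  Multiprocessors    : %d\n", prop.multiProcessorCount);
        printf("  Core clock         : %.0f MHz\n", prop.clockRate / 1000.0);
        printf("  Global memory      : %.1f GB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}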

Test problem

1. Jacobi (GPU) vs. Block Jacobi (CPU)
[Table: mesh sizes h = 1/… (rows) against CUDA Jacobi (GPU) and Block Jacobi on MPI 3*3, 6*6, and 9*9 process grids (CPU); the numeric timings were not preserved in the transcript]

1. Jacobi (GPU) vs. Block Jacobi (CPU)
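The transcript does not preserve the definition of the test problem, but the mesh-size rows and the Jacobi/CG solver pair are the classic setup for the 2D Poisson equation discretized with the 5-point stencil. Under that assumption, here is a minimal sketch of one GPU Jacobi sweep (all names hypothetical):

#include <cuda_runtime.h>

// One Jacobi sweep for -Laplace(u) = f on an n x n grid with spacing h
// (assumed test problem; the transcript does not preserve the benchmark).
__global__ void jacobi_sweep(const double* u_old, double* u_new,
                             const double* f, int n, double h2) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // column index
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    if (i > 0 && i < n - 1 && j > 0 && j < n - 1) {
        // Average of the four neighbours plus the scaled source term;
        // every interior point updates independently, which is what
        // makes plain Jacobi map well onto thousands of CUDA cores.
        u_new[i * n + j] = 0.25 * (u_old[(i - 1) * n + j] +
                                   u_old[(i + 1) * n + j] +
                                   u_old[i * n + j - 1] +
                                   u_old[i * n + j + 1] +
                                   h2 * f[i * n + j]);
    }
}

// Usage: swap the two buffers between sweeps until convergence.
//   dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
//   jacobi_sweep<<<grid, block>>>(d_u, d_u_next, d_f, n, h * h);

The CPU Block Jacobi baseline presumably partitions the same grid across an MPI process grid (hence the 3*3, 6*6, and 9*9 labels), with halo exchange between sweeps.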

2. Conjugate Gradient
[Table: mesh sizes h = 1/… (rows) against CUDA (GPU) and MPI runs with 1, 2, 4, 8, 16, 32, and 64 processes (CPU); the numeric timings were not preserved in the transcript]

2. Conjugate Gradient
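Under the same assumed Poisson test problem, a conjugate gradient solve can run entirely on the GPU: the matrix is applied matrix-free by a stencil kernel, and the inner products and vector updates are delegated to cuBLAS. A minimal sketch (all names hypothetical; homogeneous Dirichlet boundary, right-hand side pre-scaled by h^2, handle created beforehand with cublasCreate):

#include <cmath>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Matrix-free operator y = A*x for the 5-point Poisson stencil
// (same assumed test problem as the Jacobi sketch).
__global__ void apply_stencil(const double* x, double* y, int n) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    if (i > 0 && i < n - 1 && j > 0 && j < n - 1) {
        y[i * n + j] = 4.0 * x[i * n + j]
                     - x[(i - 1) * n + j] - x[(i + 1) * n + j]
                     - x[i * n + j - 1]   - x[i * n + j + 1];
    }
}

// CG on the device. On entry: d_x = 0, d_r = d_p = b, and d_Ap
// zero-initialized once (the kernel never writes boundary entries).
void cg_solve(cublasHandle_t cb, double* d_x, double* d_r, double* d_p,
              double* d_Ap, int n, int max_it, double tol) {
    int N = n * n;
    dim3 block(16, 16), grid((n + 15) / 16, (n + 15) / 16);
    double rs_old, rs_new;                               // r.r, old and new
    cublasDdot(cb, N, d_r, 1, d_r, 1, &rs_old);
    for (int it = 0; it < max_it && sqrt(rs_old) > tol; ++it) {
        apply_stencil<<<grid, block>>>(d_p, d_Ap, n);    // Ap = A p
        double pAp;
        cublasDdot(cb, N, d_p, 1, d_Ap, 1, &pAp);
        double alpha = rs_old / pAp, neg_alpha = -alpha;
        cublasDaxpy(cb, N, &alpha, d_p, 1, d_x, 1);      // x += alpha p
        cublasDaxpy(cb, N, &neg_alpha, d_Ap, 1, d_r, 1); // r -= alpha Ap
        cublasDdot(cb, N, d_r, 1, d_r, 1, &rs_new);
        double beta = rs_new / rs_old;
        const double one = 1.0;
        cublasDscal(cb, N, &beta, d_p, 1);               // p = beta p
        cublasDaxpy(cb, N, &one, d_r, 1, d_p, 1);        // p += r
        rs_old = rs_new;
    }
}

Unlike Jacobi, each CG iteration needs global reductions (the two dot products), which is also what limits the MPI scaling of the CPU runs as the process count grows.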