Tesla: Fastest Processor Adoption in HPC History


1 Tesla: Fastest Processor Adoption in HPC History

2 GPU Computing: CPU + GPU Co-Processing; Heterogeneous Computing; 4 cores

3 Computation Discontinuity; Double Precision debut

4 50x – 150x
  146X Medical Imaging, U of Utah
  36X Molecular Dynamics, U of Illinois, Urbana
  18X Video Transcoding, Elemental Tech
  50X Matlab Computing, AccelerEyes
  100X Astrophysics, RIKEN
  149X Financial simulation, Oxford
  47X Linear Algebra, Universidad Jaime
  20X 3D Ultrasound, Techniscan
  130X Quantum Chemistry, U of Illinois, Urbana
  30X Gene Sequencing, U of Maryland

5 L1

6 Tesla GPU Computing Products

                                Tesla S1070 1U System    Tesla C1060 Computing Board
  GPUs                          4 Tesla GPUs             1 Tesla GPU
  Single Precision Performance  4.14 Teraflops           933 Gigaflops
  Double Precision Performance  346 Gigaflops            78 Gigaflops
  Memory                        16 GB (4 GB / GPU)       4 GB

7 New Class of Hybrid CPU-GPU Servers: 2 Tesla M1060 GPUs; up to 18 Tesla M1060 GPUs. Bull Bullx Blade Enclosure; SuperMicro 1U GPU Server

8 [Chart: cost (K$ to M$) versus performance (1x, 100x, 10,000x) for CPU Workstation, Traditional CPU Cluster, Tesla Personal Supercomputer, and Tesla Co-processing Cluster]

9 UPenn: Finding a Better Shampoo. Tesla PSC: 1 kWatt, ~$7 K. 32 CPU Servers: 19.2 kWatts, $128 K. No Data Center Required; 9.6x Lower Power; 13x Lower Cost; Equal Performance

10 Finance: Equity Pricing. 2 Tesla S1070s: 2.8 kWatts, $24 K. 500 CPU Servers: 37.5 kWatts, $250 K. 16x Less Space; 13x Lower Power; 10x Lower Cost; Equal Performance

11 Oil & Gas: Seismic Processing. Tesla S1070s: 45 kWatts, ~$400 K. 2000 CPU Servers: 1200 kWatts, ~$8 M. 31x Less Space; 27x Lower Power; 20x Lower Cost; Equal Performance

12 Workstation Supercomputing. Tesla Personal Supercomputer: ~5000 Customers. HRL Labs; Carnegie Mellon University; Korean Government; MIT Lincoln Lab; US Army; UC San Diego; Northrop Grumman; University of Wisconsin; Halliburton Energy Services; Oxford University; North Star Imaging; University of Michigan; Pacific Biosciences; Johns Hopkins; Kodak; Canada Genome Sciences Centre

13 Tesla Cluster Installations. CSIRO - Australia; Argonne National Labs; Tokyo Tech; NCSA; BNP-Paribas; Pacific Northwest Labs; Harvard; Oak Ridge Nat’l Laboratory; National Taiwan University; Ames Lab – Iowa State; Federal agencies; Cambridge; Petrobras; British Aerospace; TOTAL; Fermi Research Labs; Hess; HLRS – Germany; Max Planck Institute; University of Michigan; Daresbury Labs, UK; Chinese Academy of Sciences

14 Supercomputing for the Masses. Tesla Personal Supercomputer: < $5K, millions of researchers. Tesla Preconfigured Clusters: $50K-$1M, 100,000s of researchers. Large Clusters: $10M+, 100s

15 GPU Computing Applications. C; C++; Fortran; Java; Python; OpenCL(tm); DirectX Compute. NVIDIA GPU with the CUDA Parallel Computing Architecture. OpenCL is trademark of Apple Inc. used under license to the Khronos Group Inc.

16 CUDA: Widely Adopted Parallel Programming Model. Research Papers; 200+ universities teaching CUDA; 120 Million CUDA GPUs; 60,000+ Active Developers

17 CUDA Ecosystem
  Applications: Oil & Gas; Finance; Medical; Biophysics; Numerics; Imaging; CFD; DSP; EDA
  Libraries: FFT; BLAS; LAPACK; Image processing; Video processing; Signal processing; Vision
  Languages: C, C++; DirectX; Fortran; Java; OpenCL; Python
  Compilers: PGI Fortran; CAPs HMPP; MCUDA; MPI; NOAA Fortran2C; OpenMP
  Consultants: ANEO; GPU Tech
  OEMs
  Over 200 Universities Teaching CUDA: UIUC; MIT; Harvard; Berkeley; Cambridge; Oxford; …; IIT Delhi; Tsinghua; Dortmund; ETH Zurich; Moscow; NTU; …

18 Released Applications
  Bio-Sciences, Bio-Informatics, Medical Imaging, Defense, Oil and Gas: GROMACS using OpenMM; NAMD alpha; VMD, beta; HOOMD; GPU HMMER; MUMmerGPU: Sequence Alignment; Accelereyes: MATLAB plugin; GPULib: IDL acceleration; Acceleware CT Recon; Digisens CT Recon; Accelereyes: MATLAB plugin; GPU VSIPL: Signal Processing; GPULib: IDL acceleration; Ikena: Imagery Analysis, Video Forensics; GIS: Manifold; Accelereyes: MATLAB plugin; Acceleware: Time Migration; SeismicCity: Prestack; Headwave: Prestack; OpenGeoSolutions: Spectral Decomp; Mercury: 3D viz; ffA: 3D Seismic process; GIS: Manifold
  EDA, Weather & Ocean Modeling, Finance, Electro-magnetics: CST: 3D EM; Agilent: ADS SPICE; Synopsys: TCAD; WRF beta release; Particle simulation Boltzmann solver; Tsunami simulation: Tokyo Tech; NOAA new model being developed; Numerix: Counterparty; Scicomp: Derivative Pricing; Hanweck: Options Pricing; Exegy: Risk Analysis; Aqumin: 3D Viz; Acceleware: FDTD Solver; Quantum electrodynamics library; CST Microwave Studio; GPMAD: Particle beam dynamics simulator

19 More Information. Products; Vertical Solutions; CUDA GPU Programming Training. GPU Developer Conference, Sept 30 – Oct 2, 2009, San Jose, CA

20 Programming the GPU

21 Compiling C for CUDA Applications

  void serial_function(… ) {... }
  void other_function(int... ) {... }
  void saxpy_serial(float... ) {
      for (int i = 0; i < n; ++i)
          y[i] = a*x[i] + y[i];
  }
  void main( ) {
      float x;
      saxpy_serial(..);
      ...
  }

  C CUDA Key Kernels (Modify into Parallel CUDA code) -> NVCC (Open64) -> CUDA object files
  Rest of C Application -> CPU Compiler -> CPU object files
  CUDA object files + CPU object files -> Linker -> CPU-GPU Executable

22 C for CUDA : C with a few keywords

  Standard C Code

  void saxpy_serial(int n, float a, float *x, float *y)
  {
      for (int i = 0; i < n; ++i)
          y[i] = a*x[i] + y[i];
  }

  // Invoke serial SAXPY kernel
  saxpy_serial(n, 2.0, x, y);

  Parallel C Code

  __global__ void saxpy_parallel(int n, float a, float *x, float *y)
  {
      int i = blockIdx.x*blockDim.x + threadIdx.x;
      if (i < n) y[i] = a*x[i] + y[i];
  }

  // Invoke parallel SAXPY kernel with 256 threads/block
  int nblocks = (n + 255) / 256;
  saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
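The slide shows only the kernel and its launch line. As a minimal sketch of what a complete program around it might look like, the example below adds device memory allocation, host-to-device copies, and a result check using the CUDA runtime API. The helper names `blocks_for` and `check`, the problem size, and the input values are illustrative assumptions and do not come from the presentation.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <cuda_runtime.h>

// Kernel as on the slide: one thread per element, guarded against overrun.
__global__ void saxpy_parallel(int n, float a, float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: number of blocks needed to cover n elements,
// rounding up (the slide hard-codes this as (n + 255) / 256).
static int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    const int n = 1 << 20;                 // assumed problem size
    const size_t bytes = n * sizeof(float);

    // Host arrays with assumed test values: x = 1, y = 2.
    float *x = (float *)malloc(bytes);
    float *y = (float *)malloc(bytes);
    if (x == NULL || y == NULL) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Device arrays and host-to-device copies.
    float *d_x = NULL, *d_y = NULL;
    check(cudaMalloc((void **)&d_x, bytes), "cudaMalloc x");
    check(cudaMalloc((void **)&d_y, bytes), "cudaMalloc y");
    check(cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice), "copy y");

    // Launch with 256 threads per block, as on the slide.
    const int threads = 256;
    saxpy_parallel<<<blocks_for(n, threads), threads>>>(n, 2.0f, d_x, d_y);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost), "copy back");

    // With the inputs above every element should equal 2*1 + 2 = 4.
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) max_err = fmaxf(max_err, fabsf(y[i] - 4.0f));
    printf("max error: %f\n", max_err);

    cudaFree(d_x);
    cudaFree(d_y);
    free(x);
    free(y);
    return max_err == 0.0f ? 0 : 1;
}
```

Assuming the file is saved as `saxpy.cu` (a hypothetical name), it can be built with `nvcc saxpy.cu -o saxpy`. The `if (i < n)` guard matters because the rounded-up grid usually contains more threads than there are elements.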

23 CUDA Programming Effort / Performance. Source: MIT CUDA Course

24 Science; Manufacturing; Medical; Finance. Computed Tomography (CT). Sources: Ufimtsev, Martinez; Batenburg, Sijbers, et al; Tolke, Krafczyk; CUDA SDK, NAG

25 FFT Performance: CPU vs GPU. cuFFT 2.3: NVIDIA Tesla C1060 GPU. MKL 10.1r1: Quad-Core Intel Core i7 (Nehalem) 3.2GHz

26 BLAS Performance: CPU vs GPU. CUBLAS: CUDA 2.2, Tesla C1060. MKL: Intel Core2 Extreme, 3.00GHz

27 Heterogeneous Computing Domains. Oil & Gas; Finance; Medical; Biophysics; Numerics; Audio; Video; Imaging. GPU (Parallel Computing); Graphics; CPU (Sequential Computing); Highly Parallel Computation; Control and Communication; Productivity Application; Data Intensive Application

28 Customers / ISVs. Categories: Life Sciences & Medical Equipment; Productivity / Misc; Oil and Gas; EDA; Finance; CAE / Mathematical; Communication. Max Planck FDA Robarts Research Medtronic AGC Evolved machines Smith-Waterman DNA sequencing AutoDock NAMD/VMD Howard Hughes Medical CRIBI Genomics GE Healthcare Siemens Techniscan Boston Scientific Eli Lilly Silicon Informatics Stockholm Research Harvard Delaware Pittsburg ETH Zurich Institute Atomic Physics CEA NCSA WRF Weather Modeling OptiTex Tech-X Elemental Technologies Dimensional Imaging Manifold Digisens General Mills Rapidmind Rhythm & Hues xNormal Elcomsoft LINZIK Hess TOTAL CGG/Veritas Chevron Headwave Acceleware Seismic City P-Wave Seismic Imaging Mercury Computer ffA Geostar Synopsys Nascentric Gauda CST Agilent Symcor Level 3 SciComp Hanweck Quant Catalyst RogueWave BNP Paribas AccelerEyes MathWorks Wolfram National Instruments Ansys Access Analytics Tech-x RIKEN SOFA Renault Boeing Nokia RIM Philips Samsung LG Sony Ericsson NTT DoCoMo Mitsubishi Hitachi Radio Research Laboratory US Air Force

