© NVIDIA Corporation 2011 The ‘Super’ Computing Company From Super Phones to Super Computers CUDA 4.0.

CUDA 4.0: Application Porting Made Simpler
Rapid Application Porting: Unified Virtual Addressing
Faster Multi-GPU Programming: GPUDirect 2.0
Easier Parallel Programming in C++: Thrust

NVIDIA GPUDirect™: Towards Eliminating the CPU Bottleneck
Version 1.0, for applications that communicate over a network: direct access to GPU memory for 3rd-party devices; eliminates unnecessary system memory copies and CPU overhead; supported by Mellanox and QLogic; up to 30% improvement in communication performance.
Version 2.0, for applications that communicate within a node: peer-to-peer memory access, transfers and synchronization; MPI implementations natively support GPU data transfers; less code, higher programmer productivity.
Details: http://www.nvidia.com/object/software-for-tesla-products.html
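The peer-to-peer transfers listed for GPUDirect 2.0 can be sketched with the CUDA runtime API roughly as follows. This is a minimal illustration, not code from the presentation: the helper name `copy_between_gpus` is hypothetical, and it assumes both buffers were allocated with `cudaMalloc` on their respective devices.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: copies `bytes` from a buffer on src_dev to a buffer on
// dst_dev. Direct peer access is enabled first when the hardware allows it.
// Side effect: the calling thread's current device may become dst_dev.
cudaError_t copy_between_gpus(void* dst, int dst_dev,
                              const void* src, int src_dev,
                              std::size_t bytes) {
    int can_access = 0;
    cudaError_t err = cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev);
    if (err != cudaSuccess) return err;

    if (can_access) {
        err = cudaSetDevice(dst_dev);
        if (err != cudaSuccess) return err;
        // Let the current device (dst_dev) access memory on src_dev.
        err = cudaDeviceEnablePeerAccess(src_dev, 0);
        if (err == cudaErrorPeerAccessAlreadyEnabled) {
            cudaGetLastError();  // clear the benign "already enabled" error
            err = cudaSuccess;
        }
        if (err != cudaSuccess) return err;
    }

    // cudaMemcpyPeer is valid with or without peer access enabled; without
    // it, the runtime is expected to route the copy through host memory.
    return cudaMemcpyPeer(dst, dst_dev, src, src_dev, bytes);
}
```

On a machine with at least two GPUs, a caller would allocate one buffer per device and pass both pointers and device ordinals to the helper; whether the copy actually bypasses system memory depends on the GPUs and the PCI-e topology.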

Unified Virtual Addressing: Easier to Program with Single Address Space
[Figure: CPU with system memory, GPU 0 and GPU 1 with their own memories, connected over PCI-e.]
No UVA: multiple memory spaces. System memory, GPU 0 memory and GPU 1 memory each have their own 0x0000 to 0xFFFF address range.
UVA: single address space. System memory, GPU 0 memory and GPU 1 memory share one 0x0000 to 0xFFFF address range.
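One practical consequence of a single address space is that a copy no longer has to name its direction. The sketch below is an illustration under stated assumptions rather than code from the presentation: the helper name `uva_round_trip` is hypothetical, it uses device 0, and it stages the data in pinned host memory so that both pointers are known to the runtime.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: sends n floats host -> device -> host using
// cudaMemcpyDefault, so the runtime infers each direction from the pointers.
// Returns false if device 0 lacks unified addressing or any call fails.
bool uva_round_trip(const float* src, float* dst, std::size_t n) {
    const int device = 0;  // assumption: the first GPU is the one to use
    cudaDeviceProp prop;
    if (cudaSetDevice(device) != cudaSuccess) return false;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;
    if (!prop.unifiedAddressing) return false;

    const std::size_t bytes = n * sizeof(float);
    float* h_pinned = NULL;
    float* d_buf = NULL;
    bool ok = cudaMallocHost((void**)&h_pinned, bytes) == cudaSuccess &&
              cudaMalloc((void**)&d_buf, bytes) == cudaSuccess;
    if (ok) {
        std::memcpy(h_pinned, src, bytes);
        // No HostToDevice flag: the direction is derived from the addresses.
        ok = cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyDefault) == cudaSuccess;
    }
    if (ok) {
        std::memset(h_pinned, 0, bytes);  // make sure the data really returns
        ok = cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDefault) == cudaSuccess;
    }
    if (ok) std::memcpy(dst, h_pinned, bytes);

    if (d_buf) cudaFree(d_buf);
    if (h_pinned) cudaFreeHost(h_pinned);
    return ok;
}
```

Without UVA the same two copies would need `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost`, because a bare pointer value would not identify which memory it belongs to.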

C++ Templatized Algorithms & Data Structures (Thrust)
Powerful open source C++ parallel algorithms & data structures
Similar to the C++ Standard Template Library (STL)
Automatically chooses the fastest code path at compile time
Divides work between GPUs and multi-core CPUs
Parallel sorting 5x to 100x faster than STL and TBB
Data Structures: thrust::device_vector, thrust::host_vector, thrust::device_ptr, etc.
Algorithms: thrust::sort, thrust::reduce, thrust::exclusive_scan, etc.
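The containers and algorithms named on this slide combine much like their STL counterparts. The following is a minimal sketch, assuming a CUDA toolkit that ships Thrust; the helper names `sort_and_sum` and `exclusive_prefix_sums` are hypothetical and chosen only for illustration.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <vector>

// Hypothetical helper: sorts `values` on the GPU (the sorted data is written
// back into the host vector) and returns the sum of all elements.
int sort_and_sum(std::vector<int>& values) {
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<int> d(values.begin(), values.end());
    thrust::sort(d.begin(), d.end());
    int total = thrust::reduce(d.begin(), d.end(), 0);
    thrust::copy(d.begin(), d.end(), values.begin());  // device -> host
    return total;
}

// Hypothetical helper: returns the exclusive prefix sums of `values`,
// e.g. {1, 2, 3} becomes {0, 1, 3}.
std::vector<int> exclusive_prefix_sums(const std::vector<int>& values) {
    thrust::device_vector<int> d(values.begin(), values.end());
    // Scanning in place is permitted: input and output ranges may coincide.
    thrust::exclusive_scan(d.begin(), d.end(), d.begin());
    std::vector<int> out(values.size());
    thrust::copy(d.begin(), d.end(), out.begin());
    return out;
}
```

Both helpers pay for a host-to-device and a device-to-host copy on every call, so in real code the data would normally stay in a `thrust::device_vector` across several algorithm calls.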

CUDA 4.0: Highlights
Easier Parallel Application Porting: share GPUs across multiple threads; single thread access to all GPUs; no-copy pinning of system memory; new CUDA C/C++ features; Thrust templated primitives library; NPP image/video processing library; layered textures.
Faster Multi-GPU Programming: Unified Virtual Addressing; NVIDIA GPUDirect™ v2.0; peer-to-peer access; peer-to-peer transfers; GPU-accelerated MPI.
New & Improved Developer Tools: auto performance analysis; C++ debugging; GPU binary disassembler; cuda-gdb for MacOS.

GPU Technology Conference 2011
Oct. 11-14 | San Jose, CA
3rd annual GPU Technology Conference
New for 2011: co-located with Los Alamos HPC Symposium; 300+ research scientists from national labs.
2010 highlights: 280 hours of sessions; 100+ research posters; 42 countries represented.
www.gputechconf.com

NVIDIA CUDA Summary
New in CUDA 4.0
Libraries: Thrust C++ Library; Templated Performance Primitives.
  NVIDIA Library Support: complete math.h; complete BLAS Library (1, 2 and 3); Sparse Matrix Math Library; RNG Library; FFT Library (1D, 2D and 3D); Image Processing Library (NPP); Video Processing Library (NPP).
  3rd Party Math Libraries: CULA Tools; MAGMA; IMSL; VSIPL.
Tools: Parallel Nsight Pro.
  NVIDIA Tools Support: Parallel Nsight 1.0 IDE; cuda-gdb Debugger with multi-GPU; CUDA/OpenCL Visual Profiler; CUDA Memory Checker; CUDA C SDK; CUDA Disassembler.
  CUDA Partner Tools: Allinea DDT; RogueWave/TotalView; Vampir; Tau; CAPS HMPP.
Platform: GPUDirect 2.0 Fast Path to Data.
  Hardware Support: ECC memory; double precision; native 64-bit architecture; concurrent kernel execution; dual copy engines; multi-GPU support; 6GB per GPU supported.
  Operating System Support: MS Windows 32/64; Linux 32/64 support; Mac OSX support.
  Cluster Management: GPUDirect; Tesla Compute Cluster (TCC).
  Graphics Interoperability.
Programming Model: Unified Virtual Addressing; C++ new/delete; C++ virtual functions.
  C support: NVIDIA C Compiler; CUDA C Parallel Extensions; function pointers; recursion; atomics; malloc/free.
  C++ support: classes/objects; class inheritance; polymorphism; operator overloading; class templates; function templates; virtual base classes; namespaces.
  Fortran, OpenCL.

Automated Performance Analysis in Visual Profiler
Summary analysis & hints: session; device; context; kernel.
New UI for kernel analysis: identify limiting factor; analyze instruction throughput; analyze memory throughput; analyze kernel occupancy.

NVIDIA Parallel Nsight™
Professional features now available free of charge!
Key Features Professional Profiler Standard Microsoft Visual Studio 2010 support Single System Debugging Tesla Compute Cluster CUDA Toolkit 3.2

CUDA 3rd Party Ecosystem
Tools
  Parallel Debuggers: Visual Studio IDE with Parallel Nsight Pro; Allinea DDT Debugger; TotalView Debugger.
  Performance Tools: ParaTools; VampirTrace; TauCUDA Performance Tools; PAPI; HPC Toolkit.
Compute Platform Providers
  Cloud Compute: Amazon EC2; Peer 1.
  OEMs: Dell; HP; IBM.
Cluster Tools
  Cluster Management: Platform LSF Cluster Manager; Platform Symphony; Bright Cluster Manager.
  Job Scheduling: Altair PBS; Cluster Resources TORQUE.
MPI Libraries: MPI; OpenMPI; QLogic OFED.
Compilers: PGI CUDA Fortran; PGI Accelerators; PGI CUDA x86; CAPS HMPP; TidePowerd GPU.net; pyCUDA.

NVIDIA CUDA Developer Resources
ENGINES & LIBRARIES
  Math Libraries: CUFFT, CUBLAS, CUSPARSE, CURAND.
  3rd Party Libraries: CULA LAPACK, VSIPL.
  NPP Image Libraries: performance primitives for imaging.
  App Acceleration Engines: ray tracing with Optix, iRay.
  Video Libraries: NVCUVID / NVCUVENC.
DEVELOPMENT TOOLS
  CUDA Toolkit: complete GPU computing development kit.
  cuda-gdb: GPU hardware debugging.
  Visual Profiler: GPU hardware profiler for CUDA C and OpenCL.
  Parallel Nsight: integrated development environment for Visual Studio.
SDKs AND CODE SAMPLES
  GPU Computing SDK: CUDA C/C++, DirectCompute, OpenCL code samples and documentation.
  Books: CUDA by Example, GPU Gems.
  Optimization Guides: best practices for GPU computing and graphics development.
http://developer.nvidia.com

GPU Computing Research & Education
Proven Research Vision: Johns Hopkins University; Nanyang University; Technical University-Czech; CSIRO; SINTEF; HP Labs; ICHEC; Barcelona SuperComputer Center; Clemson University; Fraunhofer SCAI; Karlsruhe Institute of Technology.
World Class Research Leadership and Teaching: University of Cambridge; Harvard University; University of Utah; University of Tennessee; University of Maryland; University of Illinois at Urbana-Champaign; Tsinghua University; Tokyo Institute of Technology; Chinese Academy of Sciences; National Taiwan University; Georgia Institute of Technology.
http://research.nvidia.com
GPGPU Education: 350+ universities.
Academic Partnerships / Fellowships
Mass. Gen. Hospital/NE Univ; North Carolina State University; Swinburne University of Tech.; Technische Univ. Munich; UCLA; University of New Mexico; University of Warsaw-ICM; VSB-Tech University of Ostrava. And more coming shortly.

Today’s CUDA CAE Solutions
Structural Mechanics: ANSYS Mechanical; AFEA; Abaqus/Standard (beta).
Fluid Dynamics: AcuSolve; Moldflow; Culises (OpenFOAM); Particleworks.
Electromagnetics: Nexxim; EMPro; CST MS; XFdtd; SEMCAD X.
