Some key aspects of NVIDIA GPUs and CUDA. Silicon Usage.

Slides:



Advertisements
Similar presentations
Speed, Accurate and Efficient way to identify the DNA.
Advertisements

Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,
Instructor Notes We describe motivation for talking about underlying device architecture because device architecture is often avoided in conventional.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 28, 2011 GPUMemories.ppt GPU Memories These notes will introduce: The basic memory hierarchy.
Optimization on Kepler Zehuan Wang
GPU System Architecture Alan Gray EPCC The University of Edinburgh.
Exploiting Graphics Processors for High- performance IP Lookup in Software Routers Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
GPU Programming and CUDA Sathish Vadhiyar Parallel Programming.
GPUs on Clouds Andrew J. Younge Indiana University (USC / Information Sciences Institute) UNCLASSIFIED: 08/03/2012.
Acceleration of the Smith– Waterman algorithm using single and multiple graphics processors Author : Ali Khajeh-Saeed, Stephen Poole, J. Blair Perot. Publisher:
CS 179: GPU Computing Lecture 2: The Basics. Recap Can use GPU to solve highly parallelizable problems – Performance benefits vs. CPU Straightforward.
1 Threading Hardware in G80. 2 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA.
L15: Review for Midterm. Administrative Project proposals due today at 5PM (hard deadline) – handin cs6963 prop March 31, MIDTERM in class L15: Review.
L13: Review for Midterm. Administrative Project proposals due Friday at 5PM (hard deadline) No makeup class Friday! March 23, Guest Lecture Austin Robison,
Programming with CUDA, WS09 Waqar Saleem, Jens Müller Programming with CUDA and Parallel Algorithms Waqar Saleem Jens Müller.
Synergistic Execution of Stream Programs on Multicores with Accelerators Abhishek Udupa et. al. Indian Institute of Science.
CUDA and the Memory Model (Part II). Code executed on GPU.
Programming with CUDA WS 08/09 Lecture 3 Thu, 30 Oct, 2008.
Accelerating SYMV kernel on GPUs Ahmad M Ahmad, AMCS Division, KAUST
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
To GPU Synchronize or Not GPU Synchronize? Wu-chun Feng and Shucai Xiao Department of Computer Science, Department of Electrical and Computer Engineering,
Shekoofeh Azizi Spring  CUDA is a parallel computing platform and programming model invented by NVIDIA  With CUDA, you can send C, C++ and Fortran.
An Introduction to Programming with CUDA Paul Richmond
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 7: Threading Hardware in G80.
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
Extracted directly from:
CUDA 5.0 By Peter Holvenstot CS6260. CUDA 5.0 Latest iteration of CUDA toolkit Requires Compute Capability 3.0 Compatible Kepler cards being installed.
By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.
MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.
GPU Programming with CUDA – Optimisation Mike Griffiths
General Purpose Computing on Graphics Processing Units: Optimization Strategy Henry Au Space and Naval Warfare Center Pacific 09/12/12.
NVIDIA Fermi Architecture Patrick Cozzi University of Pennsylvania CIS Spring 2011.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CS 395 Winter 2014 Lecture 17 Introduction to Accelerator.
CUDA Optimizations Sathish Vadhiyar Parallel Programming.
GPU Architecture and Programming
Introducing collaboration members – Korea University (KU) ALICE TPC online tracking algorithm on a GPU Computing Platforms – GPU Computing Platforms Joohyung.
(1) Kernel Execution ©Sudhakar Yalamanchili and Jin Wang unless otherwise noted.
GPU Programming and CUDA Sathish Vadhiyar Parallel Programming.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 8: Threading Hardware in G80.
Jie Chen. 30 Multi-Processors each contains 8 cores at 1.4 GHz 4GB GDDR3 memory offers ~100GB/s memory bandwidth.
CUDA. Assignment  Subject: DES using CUDA  Deliverables: des.c, des.cu, report  Due: 12/14,
Introduction to CUDA (1 of n*) Patrick Cozzi University of Pennsylvania CIS Spring 2011 * Where n is 2 or 3.
CUDA Basics. Overview What is CUDA? Data Parallelism Host-Device model Thread execution Matrix-multiplication.
Introduction to CUDA CAP 4730 Spring 2012 Tushar Athawale.
Would'a, CUDA, Should'a. CUDA: Compute Unified Device Architecture OU Supercomputing Symposium Highly-Threaded HPC.
Computer Architecture Lecture 24 Parallel Processing Ralph Grishman November 2015 NYU.
GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.
GPU Performance Optimisation Alan Gray EPCC The University of Edinburgh.
GPU Programming and CUDA Sathish Vadhiyar High Performance Computing.
My Coordinates Office EM G.27 contact time:
1 ITCS 4/5010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 28, 2013 Branching.ppt Control Flow These notes will introduce scheduling control-flow.
Fast and parallel implementation of Image Processing Algorithm using CUDA Technology On GPU Hardware Neha Patil Badrinath Roysam Department of Electrical.
CS 179: GPU Computing LECTURE 2: MORE BASICS. Recap Can use GPU to solve highly parallelizable problems Straightforward extension to C++ ◦Separate CUDA.
GPGPU Programming with CUDA Leandro Avila - University of Northern Iowa Mentor: Dr. Paul Gray Computer Science Department University of Northern Iowa.
Computer Engg, IIT(BHU)
Introduction to CUDA Li Sung-Chi Taiwan Evolutionary Intelligence Laboratory 2016/12/14 Group Meeting Presentation.
Sathish Vadhiyar Parallel Programming
EECE571R -- Harnessing Massively Parallel Processors ece
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 17 NVIDIA GPU Computational Structures Prof. Zhang Gang
Presented by: Isaac Martin
Antonio R. Miele Marco D. Santambrogio Politecnico di Milano
Introduction to CUDA.
Antonio R. Miele Marco D. Santambrogio Politecnico di Milano
©Sudhakar Yalamanchili and Jin Wang unless otherwise noted
6- General Purpose GPU Programming
CS 295: Modern Systems GPU Computing Introduction
CIS 6930: Chip Multiprocessor: Parallel Architecture and Programming
Presentation transcript:

Some key aspects of NVIDIA GPUs and CUDA

Silicon Usage

Performance and Bandwidth

PCIe Connected Separate memory space on CPU (host) and GPU (device) Must malloc space on GPU and copy data from CPU memory to GPU memory Need means to copy executable to GPU and start it (kernel launch)

Hardware Architecture

CUDA Streaming Multiprocessor 32 cores assigned to 32 threads (a warp) Limited number of SFUs In-order execution Latency masked by swamping between warps of threads

Warp Processing (hide latency)

Memory Model

Blocks and Grids