© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 VSCSE Summer School 2008 Accelerators for Science and Engineering Applications:

Slides:

Advertisements

Similar presentations

EEL6686 Guest Lecture February 25, 2014 A Framework to Analyze Processor Architectures for Next-Generation On-Board Space Computing Tyler M. Lovelly Ph.D.

Advertisements

ECE 598HK Computational Thinking for Many-core Computing Lecture 2: Many-core GPU Performance Considerations © Wen-mei W. Hwu and David Kirk/NVIDIA,

GPUs. An enlarging peak performance advantage: –Calculation: 1 TFLOPS vs. 100 GFLOPS –Memory Bandwidth: GB/s vs GB/s –GPU in every PC and.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 Structuring Parallel Algorithms.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 ECE408 / CS483 Applied Parallel Programming.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Programming Massively Parallel Processors.

© David Kirk/NVIDIA and Wen-mei W. Hwu, , SSL 2014, ECE408/CS483, University of Illinois, Urbana-Champaign 1 ECE408 / CS483 Applied Parallel Programming.

Last time: Runtime infrastructure for hybrid (GPU-based) platforms  Task scheduling Extracting performance models at runtime  Memory management Asymmetric.

© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 11 Parallel.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 7: Threading Hardware in G80.

Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.

BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.

UIUC CSL Global Technology Forum © NVIDIA Corporation 2007 Computing in Crisis: Challenges and Opportunities David B. Kirk.

GPUs and Accelerators Jonathan Coens Lawrence Tan Yanlin Li.

Revisiting Kirchhoff Migration on GPUs Rice Oil & Gas HPC Workshop

By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.

Introduction to CUDA (1 of 2) Patrick Cozzi University of Pennsylvania CIS Spring 2012.

Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2012.

©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 1 Programming Massively Parallel Processors Lecture Slides for Chapter 1: Introduction.

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 10-14, VSCSE Summer School 2009 Many-core processors for Science and Engineering.

GPU Programming and Architecture: Course Overview Patrick Cozzi University of Pennsylvania CIS Spring 2012.

GPU Computing April GPU Outpacing CPU in Raw Processing GPU NVIDIA GTX cores 1.04 TFLOPS CPU GPU CUDA Architecture Introduced DP HW Introduced.

© David Kirk/NVIDIA and Wen-mei W. Hwu Taiwan, June 30-July 2, 2008 Taiwan 2008 CUDA Course Programming Massively Parallel Processors: the CUDA experience.

©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.

© David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al, University of Illinois, ECE408 Applied Parallel Programming Lecture 12 Parallel.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CS 395 Winter 2014 Lecture 17 Introduction to Accelerator.

1 ECE408/CS483 Applied Parallel Programming Lecture 10: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu ECE408/CS483/ECE498al University.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.

FIGURE 11.1 Mapping between OpenCL and CUDA data parallelism model concepts. KIRK CH:11 “Programming Massively Parallel Processors: A Hands-on Approach.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CMPS 5433 Dr. Ranette Halverson Programming Massively.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 Basic Parallel Programming Concepts Computational.

Multicore Computing Lecture 1 : Course Overview Bong-Soo Sohn Associate Professor School of Computer Science and Engineering Chung-Ang University.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Programming Massively Parallel Processors.

1 Ceng 545 GPU Computing. Grading 2 Midterm Exam: 20% Homeworks: 40% Demo/knowledge: 25% Functionality: 40% Report: 35% Project: 40% Design Document:

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lecture 18: Final Project Kickoff.

Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.

Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 8: Threading Hardware in G80.

Compiler and Runtime Support for Enabling Generalized Reduction Computations on Heterogeneous Parallel Configurations Vignesh Ravi, Wenjing Ma, David Chiu.

Introduction to CUDA (1 of n*) Patrick Cozzi University of Pennsylvania CIS Spring 2011 * Where n is 2 or 3.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 Final Project Notes.

GPU Programming Shirley Moore CPS 5401 Fall 2013

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483/498AL, University of Illinois, Urbana-Champaign 1 ECE 408/CS483 Applied Parallel Programming.

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 10-14, VSCSE Summer School 2009 Many-core Processors for Science and Engineering.

© David Kirk/NVIDIA and Wen-mei W. Hwu, University of Illinois, Urbana-Champaign 1 CS/EE 217 GPU Architecture and Parallel Programming Project.

CS6068 Week 2 Quiz. What are David Patterson’s Three Wall of Computer Architecture?

©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Spring 2010 Lecture 13: Basic Parallel.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.

© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lecture 15: Basic Parallel Programming Concepts.

University of Michigan Electrical Engineering and Computer Science Adaptive Input-aware Compilation for Graphics Engines Mehrzad Samadi 1, Amir Hormati.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, ECE 498AL, University of Illinois, Urbana-Champaign ECE408 / CS483 Applied Parallel Programming.

Introduction to CUDA 1 of 2 Patrick Cozzi University of Pennsylvania CIS Fall 2014.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Spring 2010 Programming Massively Parallel.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408/CS483, University of Illinois, Urbana-Champaign 1 Graphic Processing Processors (GPUs) Parallel.

© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.

Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator Paper Presentation Yifeng (Felix) Zeng University of Missouri.

©Wen-mei W. Hwu and David Kirk/NVIDIA Urbana, Illinois, August 2-5, 2010 VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture.

1 CS/EE 217 GPU Architecture and Parallel Programming Lecture 9: Tiled Convolution Analysis © David Kirk/NVIDIA and Wen-mei W. Hwu,

3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.

Heterogeneous Processing KYLE ADAMSKI. Overview What is heterogeneous processing? Why it is necessary Issues with heterogeneity CPU’s vs. GPU’s Heterogeneous.

Xen GPU Rider. Outline Target & Vision GPU & Xen CUDA on Xen GPU Hardware Acceleration On VM - VMGL.

Productive Performance Tools for Heterogeneous Parallel Computing

VSCSE Summer School Proven Algorithmic Techniques for Many-core Processors Lecture 5: Data Layout for Grid Applications ©Wen-mei W. Hwu and David Kirk/NVIDIA.

Programming Massively Parallel Processors Lecture Slides for Chapter 9: Application Case Study – Electrostatic Potential Calculation © David Kirk/NVIDIA.

Introduction to Heterogeneous Parallel Computing

Presentation transcript:

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 VSCSE Summer School 2008 Accelerators for Science and Engineering Applications: GPUs and Multi-cores Lecture 10 Future Directions and Conclusions

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 Why is programming many-core processors costly today? Separate structure from CPU –Data isolation and marshalling with pressure to reduce overhead –GMAC tool coming Lack of standardized programming interface –Each has its own app development models and tools –OpenCL a good start, Gluon will raise the level much higher Management of memory resources –CUDA-lite under Gluon coming Multi-dimensional optimizations required for achieving performance goals –CUDA-tune and ADAPT model tools coming Programming multiple GPUs and CPU cores still tedious –WorkForce runtime coming Too many people start projects from scartch –CUDA Research Zone coming, new application frameworks

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 A different approach from the past Simple parallelism –Focus on simple forms of parallelism for programmers –Trade some generality and performance for productivity Power tools –Leverage and strengthen app development frameworks –Empower tools with specification, analysis and domain information

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 Reduced Tuning Efforts Hwu, UIUC

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 To Learn More A set of more comprehensive lectures – (slides, voice recordings) A new book – (chapters 7-9 brand new!) CUDA Programming Guide and Related Documents – _0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdfhttp://developer.download.nvidia.com/compute/cuda/2 _0/docs/NVIDIA_CUDA_Programming_Guide_2.0.pdf

© David Kirk/NVIDIA and Wen-mei W. Hwu Urbana, Illinois, August 18-22, 2008 A Great Opportunity for Many GPU parallel computing allows –Drastic reduction in “time to discovery” –1 st principle-based simulation at meaningful scale –New, 3 rd paradigm for research: computational experimentation The “democratization” of power to discover –$2,000/Teraflop SPFP in personal computers today –$5,000,000/Petaflops DPFP in clusters in two years –HW cost will no longer be the main barrier for big science –You will make the difference! Join the international CUDA research, education and development community