GPU Programming Overview Spring 2011 류승택. What is a GPU? GPU stands for Graphics Processing Unit Simply – It is the processor that resides on your graphics.

Presentation transcript:

GPU Programming Overview Spring 2011 류승택

What is a GPU?
GPU stands for Graphics Processing Unit. Simply put, it is the processor that resides on your graphics card. GPUs allow us to achieve the unprecedented graphics capabilities now available in games (Demo: NVIDIA GTX 400).

Introduction
■ GPGPU (General-Purpose Computation on GPUs)
  - The first commodity, programmable parallel architecture
  - GPU evolution driven by the computer game market
  - Advantage of data parallelism: GPUs are >10x faster than CPUs for appropriate problems
  - Advantage of commodity: GPUs are inexpensive and ubiquitous (desktops, laptops, PDAs, cell phones)
  - Achieving this speedup requires a large amount of GPU-specific knowledge

Motivation
■ Challenge Statement
  - GPGPU signifies the dawn of the desktop parallel computing age

Why Program on the GPU ? Graph from:

Why Program on the GPU?
■ Compute
  - Intel Core i7: 4 cores, 100 GFLOPS
  - NVIDIA GTX280: 240 cores, 1 TFLOPS
■ Memory Bandwidth
  - System Memory: 60 GB/s
  - NVIDIA GT200: 150 GB/s
■ Install Base
  - Over 200 million NVIDIA G80s shipped

How did this happen?
■ Games demand advanced shading
■ Fast GPUs = better shading
■ Need for speed = continued innovation
■ The gaming industry has overtaken the defense, finance, oil and healthcare industries as the main driving factor for high-performance processors

NVIDIA GPU Evolution Slide from David Luebke:

Real-time Rendering
■ Real-time Rendering
  - Graphics hardware enables real-time rendering
  - Real-time means a display rate of more than 10 images per second
  - 3D Scene = collection of 3D primitives (triangles, lines, points)
  - Image = array of pixels

Graphics Review ■ Modeling ■ Rendering ■ Animation

Graphics Review: Modeling
■ Modeling
  - Polygons vs Triangles: how do you store a triangle mesh?
  - Implicit Surfaces
  - Height maps
  - …
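One common answer to the mesh-storage question is an indexed triangle list: vertex positions are stored once and each triangle refers to them by index. The sketch below is only an illustration under that assumption; the names `vertices`, `triangles`, and `triangle_area` are hypothetical and do not come from the slides.

```python
# Indexed triangle mesh: shared vertex positions plus index triples.
# Layout and names are illustrative assumptions, not taken from the slides.
vertices = [
    (0.0, 0.0, 0.0),  # index 0
    (1.0, 0.0, 0.0),  # index 1
    (1.0, 1.0, 0.0),  # index 2
    (0.0, 1.0, 0.0),  # index 3
]
# Two triangles forming a unit square; each entry indexes into `vertices`.
triangles = [(0, 1, 2), (0, 2, 3)]


def triangle_area(mesh_vertices, tri):
    """Area of one triangle: half the length of the cross product of two edges."""
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = (mesh_vertices[i] for i in tri)
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (nx * nx + ny * ny + nz * nz) ** 0.5


total_area = sum(triangle_area(vertices, t) for t in triangles)
```

Sharing vertices this way means the square needs four positions instead of six, and the saving grows with mesh size.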

Triangles Image courtesy of A K Peters, Ltd.

Triangles Image courtesy of A K Peters, Ltd. Imagery from NASA Visible Earth: visibleearth.nasa.gov.

Triangles

Implicit Surfaces Images from GPU Gems 3:

Height Maps Image courtesy of A K Peters, Ltd.

Graphics Review: Rendering
■ Rendering
  - Goal: assign color to pixels
■ Two Parts
  - Visible surfaces: what is in front of what for a given view
  - Shading: simulate the interaction of material and light to produce a pixel color
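As a concrete instance of shading, the sketch below evaluates a Lambertian (diffuse) term. The slides do not name a particular shading model, so the choice of Lambert's cosine law and the function names `normalize` and `lambert` are assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, to_light, albedo, light_color):
    """Diffuse color = albedo * light color * max(0, N . L)."""
    n = normalize(normal)
    l = normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_color))


# Surface facing the light receives the full diffuse color.
facing = lambert((0, 0, 1), (0, 0, 1), (1.0, 0.5, 0.25), (1.0, 1.0, 1.0))
# Surface facing away from the light receives none.
away = lambert((0, 0, 1), (0, 0, -1), (1.0, 0.5, 0.25), (1.0, 1.0, 1.0))
```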

Rasterization What about ray tracing?

Visible Surfaces Image courtesy of A K Peters, Ltd.

Visible Surfaces Z-Buffer / Depth Buffer Fragment vs Pixel Image courtesy of A K Peters, Ltd.
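A minimal sketch of the depth-buffer idea: every pixel remembers the depth of the nearest fragment seen so far, and a new fragment is kept only if it is nearer. It assumes that smaller depth means closer to the viewer; the buffer names and `write_fragment` are hypothetical.

```python
import math

WIDTH, HEIGHT = 4, 3
# Depth buffer starts at "infinitely far"; color buffer starts at the background.
depth_buffer = [[math.inf] * WIDTH for _ in range(HEIGHT)]
color_buffer = [["bg"] * WIDTH for _ in range(HEIGHT)]


def write_fragment(x, y, depth, color):
    """Keep the fragment only if it is nearer than what the pixel already holds."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


# Three fragments land on pixel (1, 1); the nearest one wins regardless of order.
write_fragment(1, 1, 0.8, "red")
write_fragment(1, 1, 0.3, "green")
write_fragment(1, 1, 0.5, "blue")  # rejected: 0.5 lies behind 0.3
```

This also illustrates the fragment vs pixel distinction: several fragments may compete for one pixel, and only the survivor becomes that pixel's color.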

Shading Images courtesy of A K Peters, Ltd.

Shading Image from GPU Gems 3:

Graphics Pipeline
■ Vertex Transforms
■ Primitive Assembly
■ Rasterization and Interpolation
■ Raster Operations
  - Scissor Test
  - Stencil Test
  - Depth Test
  - Blending
■ Frame Buffer
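The rasterization stage can be sketched with edge functions: a pixel is covered when its center lies on the inner side of all three triangle edges. This is one standard formulation, not necessarily what any given GPU does; `edge` and `rasterize` are hypothetical names, and the triangle is assumed to be counter-clockwise in 2D screen space.

```python
def edge(a, b, p):
    """Signed area term: non-negative when p is on or to the left of edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, width, height):
    """Return the pixels whose centers lie inside a counter-clockwise triangle."""
    covered = []
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            if edge(v0, v1, p) >= 0 and edge(v1, v2, p) >= 0 and edge(v2, v0, p) >= 0:
                covered.append((x, y))
    return covered


# Right triangle covering the lower-left half of a 4x4 pixel grid.
fragments = rasterize((0, 0), (4, 0), (0, 4), 4, 4)
```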

Graphics Pipeline Images courtesy of A K Peters, Ltd.


Graphics Review: Animation
■ Move the camera and/or agents, and re-render the scene
  - In less than 16.6 ms (60 fps)
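The 16.6 ms figure is simply the time available per frame at 60 frames per second. A small sketch of that arithmetic, with `frame_budget_ms` as a hypothetical helper name:

```python
def frame_budget_ms(frames_per_second):
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / frames_per_second


budget_60 = frame_budget_ms(60)  # roughly 16.67 ms
budget_10 = frame_budget_ms(10)  # 100 ms at a 10 images-per-second rate
```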

Evolution of the Programmable Graphics Pipeline ■ Pre GPU ■ Fixed function GPU ■ Programmable GPU ■ Unified Shader Processors

Early 90s – Pre GPU Slide from Mike Houston:

OpenGL Pipeline

GPU Shader ■ Fixed functionalities ■ Programmable functionalities ■ Flexible memory access

Stream Program => GPU ■ A stream is a sequence of data (could be numbers, colors, RGBA vectors, … )
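In the stream model, one kernel is applied independently to every element of an input stream, which is what makes the work easy to run in parallel. The sketch below imitates that on the CPU with `map`; the brightening `kernel` is a made-up example, not something from the slides.

```python
def kernel(rgba):
    """Hypothetical kernel: brighten one RGBA element, clamping color to [0, 1]."""
    r, g, b, a = rgba
    return (min(1.0, r * 1.5), min(1.0, g * 1.5), min(1.0, b * 1.5), a)


# A stream of RGBA vectors; each element is processed without looking at the others.
input_stream = [(0.2, 0.4, 0.6, 1.0), (0.8, 0.1, 0.0, 0.5)]
output_stream = list(map(kernel, input_stream))
```

Because the kernel never reads neighboring elements, a GPU is free to process all elements at the same time.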

Vertex Shader
■ Vertex transformation
■ Once per vertex
■ Input attributes
  - Normal
  - Texture coordinates
  - Colors
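A vertex program runs once per vertex and typically multiplies the position by a 4x4 matrix. The sketch below shows that step on the CPU; a real shader would be written in a shading language, and `mat_vec`, `vertex_program`, and the translation matrix are illustrative assumptions.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def vertex_program(position, matrix):
    """Transform one vertex position, promoted to homogeneous coordinates."""
    x, y, z = position
    return mat_vec(matrix, (x, y, z, 1.0))


# Example matrix: translate by (2, 3, 4).
translate = [
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
]
transformed = [vertex_program(p, translate) for p in [(0, 0, 0), (1, 1, 1)]]
```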

Geometry Shader
■ Geometry composition
■ Once per geometry
■ Input primitives
  - Points, lines, triangles
  - Lines and triangles with adjacency
■ Output primitives
  - Points, line strips or triangle strips
  - [0, n] primitives output
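A geometry program takes one input primitive and may emit zero or more output primitives. The sketch below expands a single point into a four-vertex triangle strip, which is the idea behind point sprites; `geometry_program` and its parameters are hypothetical.

```python
def geometry_program(point, half_size):
    """Expand one input point into a 4-vertex triangle strip (two triangles).

    Emits nothing for a non-positive size, showing that the output count may be zero.
    """
    if half_size <= 0:
        return []
    cx, cy = point
    return [
        (cx - half_size, cy - half_size),
        (cx + half_size, cy - half_size),
        (cx - half_size, cy + half_size),
        (cx + half_size, cy + half_size),
    ]


sprite = geometry_program((2.0, 2.0), 0.5)
culled = geometry_program((2.0, 2.0), 0.0)
```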

Fragment Shader
■ Per-pixel (or per-fragment) composition
■ Once per fragment
■ Operations on interpolated values
  - Vertex attributes
  - User-defined varying variables
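The "interpolated values" a fragment program receives can be sketched with barycentric weights: each fragment gets a blend of the three vertex attributes, and the program then operates on that blend. The function names and the grayscale operation are assumptions chosen for illustration.

```python
def barycentric(v0, v1, v2, p):
    """Barycentric weights of 2D point p with respect to triangle (v0, v1, v2)."""
    def edge(a, b, q):
        return (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])

    area = edge(v0, v1, v2)
    return (edge(v1, v2, p) / area, edge(v2, v0, p) / area, edge(v0, v1, p) / area)


def interpolate(attributes, weights):
    """Blend one per-vertex attribute (a tuple per vertex) using the weights."""
    size = len(attributes[0])
    return tuple(sum(w * a[i] for w, a in zip(weights, attributes)) for i in range(size))


def fragment_program(color):
    """Hypothetical per-fragment operation: convert the interpolated color to gray."""
    gray = 0.299 * color[0] + 0.587 * color[1] + 0.114 * color[2]
    return (gray, gray, gray)


tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
vertex_colors = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
weights = barycentric(*tri, (4.0 / 3.0, 4.0 / 3.0))  # centroid of the triangle
fragment_color = fragment_program(interpolate(vertex_colors, weights))
```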

GPU Shader

Programming Graphics Hardware

PC Architecture

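Bus Interface
■ ISA (Industry Standard Architecture)
  - Bus interface
  - In use since the XT and AT era of the 1980s
  - Theoretical maximum speed of 16 MB/s
  - A severe bottleneck for peripherals; now used only to connect devices where throughput matters little, such as sound cards and modems
■ PCI (Peripheral Component Interconnect)
  - Parallel connection
  - Successor to ISA, used as the interface for connecting peripherals
  - Smaller than an ISA slot, and supports IRQ sharing
  - The common 32-bit, 33 MHz variant runs at 133 MB/s; the 64-bit, 66 MHz variant at 533 MB/s
  - Most peripherals use the PCI interface
(Figure: ISA, PCI, and AGP slots)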

Bus Interface
■ AGP (Accelerated Graphics Port)
  - Dedicated parallel connection for the graphics card
  - Developed by Intel
  - Based on PCI, but with more than twice the transfer rate
  - Runs at a base clock of 66 MHz
  - AGP = 2 x PCI (AGP 2x = 2 x AGP): AGP 1x peaks at 266 MB/s, AGP 2x at 533 MB/s
  - Intended for 3D graphics cards
■ PCIe (PCI Express)
  - Serial connection (cheap, scalable)
  - Up to 8.0 GB/s of bandwidth (PCIe = 2 x AGP 8x)
  - Intel, ATI, and NVIDIA, which dominate the worldwide graphics market, have firmly endorsed this new standard as the next-generation graphics interface
  - AGP, which arose from the limits of PCI and had been unrivaled as the interface for graphics processing units (GPUs), is being replaced by PCI Express
(Figure: PCI, PCIe x1, and PCIe x16 slots; GeForce 7800 GTX on PCIe x16)
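The peak rates quoted for the parallel buses follow from bus width times clock rate times transfers per clock. The sketch below works through that arithmetic; it assumes nominal clocks of 33.33 MHz and 66.66 MHz and a 32-bit AGP data path, and `peak_mb_per_s` is a hypothetical helper.

```python
def peak_mb_per_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak throughput of a parallel bus, in MB/s."""
    return bus_width_bits / 8 * clock_mhz * transfers_per_clock


pci_32bit = peak_mb_per_s(32, 33.33)   # about 133 MB/s
agp_1x = peak_mb_per_s(32, 66.66)      # about 266 MB/s
agp_2x = peak_mb_per_s(32, 66.66, 2)   # about 533 MB/s
```

These are upper bounds; sustained throughput on a real system is lower.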

Generation I: 3dfx Voodoo (1996)
■ One of the first true 3D game cards
■ Worked by supplementing a standard 2D video card
■ Did not do vertex transformations: these were done on the CPU
■ Did do texture mapping and z-buffering
Pipeline diagram (CPU and GPU connected over PCI): Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer
Image from “7 years of Graphics”

: Texture Mapping and Z-Buffer - PCI: Peripheral Component Interconnect - 3dfx’s Voodoo

Texture Mapping

Texture Mapping : Perspective-Correct Interpolation
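Perspective-correct interpolation avoids the distortion of plain linear interpolation in screen space: interpolate u/w and 1/w linearly, then divide. The sketch below compares the two for one attribute along an edge; the function names and the sample w values are illustrative assumptions.

```python
def lerp(a, b, t):
    """Plain linear interpolation between a and b."""
    return a + (b - a) * t


def perspective_correct(u0, w0, u1, w1, t):
    """Interpolate attribute u at screen-space parameter t between two vertices.

    u/w and 1/w vary linearly in screen space, so interpolate those and divide.
    """
    u_over_w = lerp(u0 / w0, u1 / w1, t)
    one_over_w = lerp(1.0 / w0, 1.0 / w1, t)
    return u_over_w / one_over_w


# Near vertex: u = 0, w = 1. Far vertex: u = 1, w = 3. Sample at the screen midpoint.
affine = lerp(0.0, 1.0, 0.5)
correct = perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)
```

At the screen-space midpoint the affine result is 0.5, while the perspective-correct result is 0.25, because the midpoint on screen lies closer to the near vertex in 3D.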

Aside: Mario Kart 64 ■ High fragment load / low vertex load Image from:

Aside: Mario Kart Wii ■ High fragment load / low vertex load? Image from:

Generation II: GeForce/Radeon 7500 (1998)
■ Main innovation: shifting the transformation and lighting calculations to the GPU
■ Allowed multi-texturing: bump maps, light maps, and others
■ Faster AGP bus instead of PCI
Pipeline diagram (GPU, fed over AGP): Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer
Image from “7 years of Graphics”

1998: Multitexturing - AGP: Accelerated Graphics Port - NVIDIA’s TNT, ATI’s Rage

Multitexturing Light Mapping
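Light mapping is a typical use of multitexturing: the base texture color is multiplied, channel by channel, by a precomputed lighting texture. A minimal sketch of that modulate step, with `modulate` and the sample texel values as assumptions:

```python
def modulate(base_texel, light_texel):
    """Combine two texels by per-channel multiplication."""
    return tuple(b * l for b, l in zip(base_texel, light_texel))


# A base color sampled from the surface texture, dimmed by a 50% gray light map texel.
lit = modulate((0.8, 0.6, 0.4), (0.5, 0.5, 0.5))
```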

: Transform and Lighting - Register Combiner: Offers many more texture/color combinations - NVIDIA’s Geforce 256 and Geforce2, ATI’s Radeon 7500

Bump Mapping

Environment Mapping

Projective Texture Mapping

Generation III: GeForce3/Radeon 8500 (2001)
■ For the first time, allowed a limited amount of programmability in the vertex pipeline
■ Also allowed volume texturing and multi-sampling (for antialiasing)
Pipeline diagram (GPU, fed over AGP): Vertex Transforms with small vertex shaders, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer
Image from “7 years of Graphics”

2001: Programmable Vertex Shader - Z-Cull: Predicts which fragments will fail the Z test and discards them - Texture Shader: Offers more texture addressing and operations - NVIDIA’s Geforce3 and Geforce4 Ti, ATI’s Radeon 8500
A programmable processor for any per-vertex computation

Volume Texture Mapping

Generation IV: Radeon 9700/GeForce FX (2002)
■ The first generation of fully programmable graphics cards
■ Different versions have different resource limits on fragment/vertex programs
Pipeline diagram (GPU, fed over AGP): Vertex Transforms with programmable vertex shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor with Texture Memory, Raster Operations, Frame Buffer
Image from “7 years of Graphics”
Slide from Suresh Venkatasubramanian and Joe Kider

: Programmable Pixel Shader - MRT: Multiple Render Target - NVIDIA’s Geforce FX, ATI’s Radeon 9600 to 9800 A programmable processor for any per-pixel computation

Shader: Static vs. Dynamic flow control ■ Static flow control  Condition varies per batch of triangles ■ Dynamic flow control  Condition varies per vertex or pixel ■ Full flow control  Static and dynamic flow control

Generation IV.V: GeForce6/X800 (2004)
■ Simultaneous rendering to multiple buffers
■ True conditionals and loops
■ PCIe bus
■ Vertex texture fetch
Pipeline diagram (GPU, fed over PCIe): Vertex Transforms with programmable vertex shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor with Texture Memory, Raster Operations, Frame Buffer

2004: Shader Model 3.0 and 64 bit Color Support - PCIe: Peripheral Component Interconnect Express - NVIDIA’s Geforce 6800

Real-time Tone Mapping
■ The image is entirely computed in 64-bit color and tone-mapped for display
  - 64-bit color: a 16-bit floating-point value per channel (R, G, B, A)
  - Tone mapping: maps an HDRI (High Dynamic Range Image) to a low dynamic range device
(Figure: low to high exposure images of the same scene)
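The slide does not say which tone-mapping operator is used. As one simple, well-known possibility, the sketch below applies the Reinhard curve x / (1 + x) to compress unbounded HDR values into 8-bit display values; the operator choice and the names `reinhard` and `tone_map` are assumptions.

```python
def reinhard(hdr_value):
    """Compress a non-negative HDR value into the range [0, 1)."""
    return hdr_value / (1.0 + hdr_value)


def tone_map(pixel):
    """Map one HDR pixel (floats, any magnitude) to 8-bit display values."""
    return tuple(int(255 * reinhard(c) + 0.5) for c in pixel)


# Channel values far above 1.0 still land inside the displayable range.
ldr = tone_map((0.0, 1.0, 15.0))
```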

Generation V: GeForce8800/HD2900 (2006)
■ Ground-up GPU redesign
■ Support for Direct3D 10 / OpenGL 3
■ Geometry Shaders
■ Stream out / transform-feedback
■ Unified shader processors
■ Support for General GPU programming
Pipeline diagram (GPU, fed over PCIe): Input Assembler, Programmable Vertex Shader, Programmable Geometry Shader, Raster Operations, Programmable Pixel Shader, Output Merger

Geometry Shaders: Point Sprites

Geometry Shaders Image from David Blythe :

NVIDIA G80 Architecture Slide from David Luebke:

Why Unify Shader Processors? Slide from David Luebke:


Unified Shader Processors Slide from David Luebke:

Terminology
Shader Model | Direct3D | OpenGL | Video card example
2 | 9 | 2.x | NVIDIA GeForce 6800, ATI Radeon X800
… | ….x | 3.x | NVIDIA GeForce 8800, ATI Radeon HD 2900
… | ….x | 4.x | NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Evolution of the Programmable Graphics Pipeline Slide from Mike Houston:


Fixed-function pipeline (diagram)
■ CPU side: 3D Application or Game, 3D API Commands, 3D API (OpenGL or Direct3D)
■ CPU-GPU Boundary (AGP/PCIe): GPU Command & Data Stream
■ GPU side: GPU Front End, Vertex Index Stream and Pre-transformed Vertices, Programmable Vertex Processor, Transformed Vertices, Primitive Assembly, Assembled Primitives, Rasterization and Interpolation, Pixel Location Stream and Pre-transformed Fragments, Programmable Fragment Processor, Transformed Fragments, Raster Operations, Pixel Updates, Frame Buffer

Programmable pipeline (diagram)
■ CPU side: 3D Application or Game, 3D API Commands, 3D API (OpenGL or Direct3D)
■ CPU-GPU Boundary (AGP/PCIe): GPU Command & Data Stream
■ GPU side: GPU Front End, Vertex Index Stream and Pre-transformed Vertices, Programmable Vertex Processor, Transformed Vertices, Primitive Assembly, Assembled Primitives, Rasterization and Interpolation, Pixel Location Stream and Pre-transformed Fragments, Programmable Fragment Processor, Transformed Fragments, Raster Operations, Pixel Updates, Frame Buffer

The Future
■ Unified general programming model at primitive, vertex and pixel levels
■ Scary amount of:
  - Floating point horsepower
  - Video memory
  - Bandwidth between system and video memory
■ Lower chip costs and power requirements to make 3D graphics hardware ubiquitous
  - Automotive (gaming, navigation, head-up displays)
  - Home (remotes, media center, automation)
  - Mobile (PDAs, cell phones)

Programming the GPU

The Evolution of GPU Programming Language

Programmable Pipeline

GPU Programming
■ GPU Programming
  - Low-level language
    - Assembler-like
    - Best performance
    - Platform-dependent
    - Vertex programming, fragment programming
    - Examples: OpenGL extensions, DirectX 9
  - High-level shading language
    - Easier programming
    - Easier code reuse
    - Easier debugging
    - Easy to read
    - Examples: Cg, HLSL, GLSL

Assembly vs. High-Level Language

Data Flow through Pipeline

GPU Programming
■ GPU Programming
  - Low-level language
    - OpenGL extensions: GL_ARB_vertex_program, GL_ARB_fragment_program
    - DirectX 9: Vertex Shader 2.0, Pixel Shader 2.0
  - High-level shading language
    - Cg: “C for Graphics”, by NVIDIA
    - HLSL: “High-Level Shading Language”, part of DirectX 9 (Microsoft)
    - GLSL: “OpenGL 2.0 Shading Language”, proposal by 3D Labs
    - HLSL and Cg are much more similar to each other than they are to GLSL

Workflow in Cg

Reference
■ Reference
  - David Luebke, General-Purpose Computation on Graphics Hardware
  - Daniel Weiskopf, Basic of GPU-Based Programming
  - Cyril Zeller, Introduction to the Hardware Graphics Pipeline
  - Randy Fernando, Programming the GPU
  - Suresh Venkatasubramanian, GPU Programming and Architecture
  - GPGPU
  - GPU Programming
  - Shader::Tech
  - Nvidia Developer
  - GPGPU Developer Resources
  - CIS 665: GPU Programming and Architecture, University of Pennsylvania