Brook for GPUs
Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan
February 10th, 2003



February 11th, Brook: general purpose streaming language
developed for PCA Program/Merrimac
  – compiler: RStream (Reservoir Labs)
  – DARPA PCA Program
      Stanford: SmartMemories
      UT Austin: TRIPS
      MIT: RAW
  – Brook version 0.2 spec:
  – Brook for GPUs:
[Diagram labels: Stream Execution Unit, Stream Register File, Memory System, Network Interface, Scalar Execution Unit, DRDRAM, Network]

Brook: general purpose streaming language
stream programming model
  – enforce data parallel computing
      streams
  – encourage arithmetic intensity
      kernels
C with streams

Brook for GPUs
demonstrate GPU streaming coprocessor
  – make programming GPUs easier
      hide texture/pbuffer data management
      hide graphics-based constructs in Cg/HLSL
      hide rendering passes
      virtualize resources
  – performance!
      … on applications that matter
  – highlight GPU areas for improvement
      features required for general purpose stream computing

system outline
.br     Brook source files
brcc    source-to-source compiler
brt     Brook run-time library

Brook language: streams
streams
  – collection of records requiring similar computation
      particle positions, voxels, FEM cell, …
      float3 positions ;
      float3 velocityfield ;
  – encourage data parallelism

Brook language: kernels
kernels
  – functions applied to streams
      similar to for_all construct
        kernel void foo (float a<>, float b<>, out float result<>) {
          result = a + b;
        }
        float a ;
        float b ;
        float c ;
        foo(a,b,c);
        for (i=0; i<100; i++)
          c[i] = a[i]+b[i];
  – no dependencies between stream elements
      encourage high arithmetic intensity
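The kernel slide pairs the Brook call `foo(a,b,c)` with a plain C loop. As a rough illustration of what "no dependencies between stream elements" means, here is a minimal C sketch. The names `foo_element` and `foo_stream` are hypothetical helpers, not part of Brook or its runtime, and the sketch runs sequentially on the CPU rather than on a GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Per-element body, mirroring the slide's kernel: result = a + b. */
static float foo_element(float a, float b)
{
    return a + b;
}

/* Hypothetical helper (not a Brook API): applies the element body to
 * every index. Iteration i reads and writes only index i, so the
 * iterations are independent and could run in any order. */
void foo_stream(const float *a, const float *b, float *result, size_t n)
{
    assert(a != NULL && b != NULL && result != NULL);
    for (size_t i = 0; i < n; i++)
        result[i] = foo_element(a[i], b[i]);
}
```

Calling `foo_stream(a, b, c, 100)` on three 100-element arrays computes the same values as the loop on the slide.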

Brook language: kernels
Ray Triangle Intersection
    kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                     RayState oldraystate<>,
                                     GridTrilist trilist[],
                                     out Hit candidatehit<>) {
      float idx, det, inv_det;
      float3 edge1, edge2, pvec, tvec, qvec;
      if(oldraystate.state.y > 0) {
        idx = trilist[oldraystate.state.w].trinum;
        edge1 = tris[idx].v1 - tris[idx].v0;
        edge2 = tris[idx].v2 - tris[idx].v0;
        pvec = cross(ray.d, edge2);
        det = dot(edge1, pvec);
        inv_det = 1.0f/det;
        tvec = ray.o - tris[idx].v0;
        candidatehit.data.y = dot( tvec, pvec ) * inv_det;
        qvec = cross( tvec, edge1 );
        candidatehit.data.z = dot( ray.d, qvec ) * inv_det;
        candidatehit.data.x = dot( edge2, qvec ) * inv_det;
        candidatehit.data.w = idx;
      } else {
        candidatehit.data = float4(0,0,0,-1);
      }
    }
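The arithmetic in the ray-triangle kernel can be followed for a single ray and a single triangle in plain C. The sketch below is an illustration under assumptions, not the authors' code: the `vec3` and `hit` types and the function name are hypothetical, the fields `t`, `u`, `v` stand in for the slide's `data.x`, `data.y`, `data.z`, and the `assert` on `det` is an addition (the slide divides without a guard).

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;

static vec3 sub(vec3 a, vec3 b)
{
    return (vec3){ a.x - b.x, a.y - b.y, a.z - b.z };
}

static vec3 cross(vec3 a, vec3 b)
{
    return (vec3){ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

static float dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* t: distance along the ray (slide: data.x);
 * u, v: barycentric coordinates (slide: data.y, data.z). */
typedef struct { float t, u, v; } hit;

/* Hypothetical single-ray version of the kernel body. */
hit intersect_triangle(vec3 o, vec3 d, vec3 v0, vec3 v1, vec3 v2)
{
    vec3 edge1 = sub(v1, v0);
    vec3 edge2 = sub(v2, v0);
    vec3 pvec = cross(d, edge2);
    float det = dot(edge1, pvec);
    assert(det != 0.0f);            /* added guard: ray parallel to triangle */
    float inv_det = 1.0f / det;
    vec3 tvec = sub(o, v0);
    vec3 qvec = cross(tvec, edge1);

    hit h;
    h.u = dot(tvec, pvec) * inv_det;
    h.v = dot(d, qvec) * inv_det;
    h.t = dot(edge2, qvec) * inv_det;
    return h;
}
```

Like the kernel on the slide, this only produces candidate values; deciding whether `u`, `v`, and `t` describe an actual hit is left to the caller.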

Brook language: additional features
reductions
  – scalar
  – stream
stride & repeat
GatherOp & ScatterOp
  – a[i] += p
  – p = a[i]++
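The slide gives only the expression forms `a[i] += p` and `p = a[i]++`. The C sketch below shows one possible sequential reading of a scalar reduction, a scatter-with-add, and a gather-with-increment. The function names and the explicit index array are assumptions made for illustration and do not reflect Brook's actual signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: collapse a stream to a single value (here, a sum). */
float reduce_sum(const float *s, size_t n)
{
    float total = 0.0f;
    for (size_t k = 0; k < n; k++)
        total += s[k];
    return total;
}

/* ScatterOp sketch, "a[i] += p": each stream element k adds p[k]
 * into the array slot named by idx[k]. Repeated indices accumulate. */
void scatter_add(float *a, const size_t *idx, const float *p, size_t n)
{
    assert(a != NULL && idx != NULL && p != NULL);
    for (size_t k = 0; k < n; k++)
        a[idx[k]] += p[k];
}

/* GatherOp sketch, "p = a[i]++": each stream element k fetches the
 * current value of slot idx[k], then increments that slot. */
void gather_inc(float *a, const size_t *idx, float *p, size_t n)
{
    assert(a != NULL && idx != NULL && p != NULL);
    for (size_t k = 0; k < n; k++)
        p[k] = a[idx[k]]++;
}
```

Both operations modify the indexed array, so unlike a plain kernel their results depend on how repeated indices are combined; this sketch simply processes elements in order.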

brcc compiler infrastructure
based on ctool –
parser
  – build code tree
  – extend C grammar to accept Brook
convert
  – tree transformations
codegen
  – generate Cg & HLSL code
  – call cgc, fxc
  – generate stub function

Applications
Ray-tracer
FFT
Segmentation
Linear Algebra:
  – BLAS, LINPACK, LAPACK

Brook Performance

GPU Gotchas
[Plot axes: Time, Registers Used]

GPU Gotchas
NVIDIA NV3x: Register usage vs. Time
[Plot axes: Time, Registers Used]

GPU Gotchas
NVIDIA:
Register Penalty
Render to Texture Limitation
  – Requires explicit copy or heavy pbuffer solution
  – Superbuffer extension needed
      SIG03.pdf

GPU Gotchas
ATI Radeon 9800 Pro
Limited dependent texture lookup
96 instructions
24-bit floating point
  – s16e7
      Integers up to 131,072 (s23e8: 16,777,216)
[Diagram: alternating "Memory Refs" and "Math Ops" blocks, repeated four times]
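The integer limits on the ATI slide follow from the mantissa width: with m stored mantissa bits and an implicit leading one, every integer up to 2^(m+1) is exactly representable. The C sketch below reproduces both figures from the slide and shows the effect at the single-precision limit. The function names are hypothetical, and the implicit leading bit and sufficient exponent range are assumptions that happen to match the slide's numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Largest N such that every integer in [0, N] is exactly representable,
 * assuming mantissa_bits stored fraction bits plus an implicit leading
 * one, and an exponent range wide enough to reach 2^(mantissa_bits+1). */
uint32_t max_contiguous_int(unsigned mantissa_bits)
{
    assert(mantissa_bits < 31);
    return (uint32_t)1 << (mantissa_bits + 1);
}

/* On IEEE single precision (s23e8), adding 1 at the limit is lost:
 * 16,777,217 is not representable and rounds back to 16,777,216.
 * volatile forces each value to be stored as a true 32-bit float. */
int float_absorbs_one_at_limit(void)
{
    volatile float limit = 16777216.0f;
    volatile float next = limit + 1.0f;
    return next == limit;
}
```

`max_contiguous_int(16)` gives the s16e7 figure and `max_contiguous_int(23)` the s23e8 figure, which is why a 24-bit float is a much tighter constraint when floats are used as array indices.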

GPU Catch-Up!
Integer & Bit Ops & Double Precision
Memory Addressing
CGC/FXC Performance
  – Hand-code performance-critical code
No native reduction support
No native scatter support
  – p[i] = a (indirect write)
No programmable blend
  – GatherOp / ScatterOp
Limited 4x4 output
  – Brook virtualized kernel outputs
Readback still slow
  – NV35 OpenGL: 600 MB/sec Download, 170 MB/sec Readback
  – ATI DirectX: 550 MB/sec Download, 50 MB/sec Readback
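To put the readback figures in perspective, the small C sketch below converts a bandwidth into a transfer time and compares download with readback. The function names are hypothetical, MB is taken to mean 10^6 bytes (an assumption, since the slide does not say), and the buffer size in the usage note is an example rather than a measurement from the talk.

```c
#include <assert.h>

/* Time in seconds to move `bytes` at `mb_per_sec`,
 * assuming 1 MB = 1,000,000 bytes. */
double transfer_seconds(double bytes, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    return bytes / (mb_per_sec * 1.0e6);
}

/* How many times faster download is than readback. */
double download_to_readback_ratio(double download_mb_per_sec,
                                  double readback_mb_per_sec)
{
    assert(readback_mb_per_sec > 0.0);
    return download_mb_per_sec / readback_mb_per_sec;
}
```

For a hypothetical 16 MB buffer, the ATI DirectX readback rate gives 16e6 / 50e6 = 0.32 s, against roughly 0.094 s at the NV35 OpenGL readback rate. The download-to-readback ratios from the slide work out to 11 for ATI DirectX and about 3.5 for NV35 OpenGL.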

GPUs of the future (we hope)
Complete Instruction Sets
  – Integers, Bit Ops, Doubles, Mem Access
Integration
  – Streaming coprocessor, not just a rendering device
Streaming architectures
[Diagram labels: SDRAM, Stream Register File, ALU Cluster]

Brook for GPUs
Release v0.3 available on SourceForge
Project Page –
Source –
Over 4K downloads!
Questions?
Fly-fishing fly images from The English Fly Fishing Shop