Brook for GPUs
Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan
February 10th, 2003

2 Brook: general purpose streaming language (February 11th, 2004)
Developed for the PCA Program / Merrimac
 – compiler: RStream (Reservoir Labs)
 – DARPA PCA Program: Stanford (SmartMemories), UT Austin (TRIPS), MIT (RAW)
 – Brook version 0.2 spec: http://merrimac.stanford.edu
 – Brook for GPUs: http://brook.sourceforge.net
[Figure: stream processor block diagram (Scalar Execution Unit, Stream Execution Unit, Stream Register File, Memory System, Network Interface, DRDRAM, Network)]

3 Brook: general purpose streaming language
Stream programming model
 – enforce data-parallel computing: streams
 – encourage arithmetic intensity: kernels
C with streams
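To make "arithmetic intensity" concrete: one common way to measure it (an assumption here, since the slide gives no formula) is arithmetic operations per word of stream data read or written. For the elementwise-add kernel foo on the kernels slide, the ratio is low:

```latex
I = \frac{\text{arithmetic operations per element}}{\text{words read per element} + \text{words written per element}},
\qquad
I_{\mathtt{foo}} = \frac{1\ (\mathtt{a + b})}{2\ (\mathtt{a},\ \mathtt{b}) + 1\ (\mathtt{result})} = \frac{1}{3}
```

A kernel like foo is therefore likely limited by memory bandwidth rather than arithmetic; the programming model nudges programmers toward kernels that do more math per word fetched.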

4 Brook for GPUs
Demonstrate the GPU as a streaming coprocessor
 – make programming GPUs easier
   hide texture/pbuffer data management
   hide graphics-based constructs in Cg/HLSL
   hide rendering passes
   virtualize resources
 – performance!
   … on applications that matter
 – highlight GPU areas for improvement
   features required for general-purpose stream computing
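One piece of "texture data management" the runtime hides is where stream elements actually live. GPUs of this era store data in 2D textures with a maximum width, so a long 1D stream has to be folded into rows. Below is a minimal sketch of that address translation; the 2048-texel width limit, the row-major layout, and all names (TexShape, tex_shape_for_stream, stream_to_texel, texel_to_stream) are hypothetical choices for illustration, not a description of how brt actually lays out streams.

```c
#include <stddef.h>

/* Assumed maximum texture width, chosen for illustration only. */
#define MAX_TEX_WIDTH 2048u

typedef struct { size_t width, height; } TexShape;
typedef struct { size_t x, y; } TexCoord;

/* Pick a texture shape with at least n texels: rows of up to
   MAX_TEX_WIDTH elements, and as many rows as needed. */
TexShape tex_shape_for_stream(size_t n) {
    TexShape s = {0, 0};
    if (n == 0) return s;                       /* empty stream: no texture */
    s.width = n < MAX_TEX_WIDTH ? n : MAX_TEX_WIDTH;
    s.height = (n + s.width - 1) / s.width;     /* ceil(n / width) */
    return s;
}

/* Stream index -> texel coordinate (row-major wrap).
   Precondition: s.width > 0, i.e. the shape came from a nonempty stream. */
TexCoord stream_to_texel(size_t i, TexShape s) {
    TexCoord c = { i % s.width, i / s.width };
    return c;
}

/* Texel coordinate -> stream index; inverse of stream_to_texel. */
size_t texel_to_stream(TexCoord c, TexShape s) {
    return c.y * s.width + c.x;
}
```

For example, a 5000-element stream gets a 2048 x 3 texture, and element 4097 lands at texel (1, 2). The last row is only partly used, so a real runtime would also have to keep kernels from writing those padding texels.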

5 System outline
.br: Brook source files
brcc: source-to-source compiler
brt: Brook run-time library

6 Brook language: streams
Streams
 – collection of records requiring similar computation
   particle positions, voxels, FEM cells, …

    float3 positions;
    float3 velocityfield;

 – encourage data parallelism

7 Brook language: kernels
Kernels
 – functions applied to streams
 – similar to a for_all construct

    kernel void foo (float a<>, float b<>, out float result<>) {
      result = a + b;       // runs once per stream element
    }

    float a<100>;
    float b<100>;
    float c<100>;

    foo(a,b,c);

is equivalent to

    for (i=0; i<100; i++)
      c[i] = a[i]+b[i];

 – no dependencies between stream elements
 – encourage high arithmetic intensity
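Read as plain C, the call foo(a,b,c) means the kernel body runs once per element. The sketch below models that with an ordinary loop and then runs the same body in reverse order to make the "no dependencies between stream elements" point concrete. The helper names (foo_body, foo_apply, foo_apply_reversed) are hypothetical; this is not code that brcc generates.

```c
#include <stddef.h>

/* Hypothetical C model of the kernel foo: the body sees one element of
   each input stream and produces one element of the output stream. */
static float foo_body(float a, float b) {
    return a + b;
}

/* Apply the kernel body to every element, front to back. No iteration
   reads another iteration's output, so the order does not matter; this
   is what lets a GPU evaluate all elements in parallel. */
void foo_apply(const float *a, const float *b, float *result, size_t n) {
    for (size_t i = 0; i < n; i++)
        result[i] = foo_body(a[i], b[i]);
}

/* The same kernel applied back to front. Because elements are
   independent, it produces exactly the same output as foo_apply. */
void foo_apply_reversed(const float *a, const float *b, float *result, size_t n) {
    for (size_t i = n; i-- > 0; )
        result[i] = foo_body(a[i], b[i]);
}
```

A kernel whose body read result[i-1] would break this equivalence, which is exactly the kind of code Brook's kernel restrictions rule out.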

8 Brook language: kernels
Ray-triangle intersection

    kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                     RayState oldraystate<>,
                                     GridTrilist trilist[],
                                     out Hit candidatehit<>) {
      float idx, det, inv_det;
      float3 edge1, edge2, pvec, tvec, qvec;
      if (oldraystate.state.y > 0) {
        // gather: fetch this ray's triangle index from the triangle list
        idx = trilist[oldraystate.state.w].trinum;
        edge1 = tris[idx].v1 - tris[idx].v0;
        edge2 = tris[idx].v2 - tris[idx].v0;
        pvec = cross(ray.d, edge2);
        det = dot(edge1, pvec);
        inv_det = 1.0f/det;
        tvec = ray.o - tris[idx].v0;
        // data.x = t (distance along ray), data.y = u, data.z = v, data.w = triangle index
        candidatehit.data.y = dot(tvec, pvec) * inv_det;
        qvec = cross(tvec, edge1);
        candidatehit.data.z = dot(ray.d, qvec) * inv_det;
        candidatehit.data.x = dot(edge2, qvec) * inv_det;
        candidatehit.data.w = idx;
      } else {
        // no candidate for this ray: flag it with w = -1
        candidatehit.data = float4(0,0,0,-1);
      }
    }
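The arithmetic in this kernel is the Möller–Trumbore ray-triangle test (the edge1/edge2/pvec/tvec/qvec names match it). Below is a plain-C transcription for a single ray and triangle, handy for checking GPU results on the CPU. The Vec3 and Candidate types are assumptions, and so is candidate_is_hit: the slide shows only the candidate computation, not where the barycentric bounds are checked.

```c
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 sub(Vec3 a, Vec3 b) { Vec3 r = {a.x - b.x, a.y - b.y, a.z - b.z}; return r; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* Same quantities as candidatehit.data in the kernel:
   t = .x (distance along the ray), u = .y, v = .z (barycentrics). */
typedef struct { float t, u, v; } Candidate;

/* CPU transcription of krnIntersectTriangle's math for one ray
   (orig, dir) and one triangle (v0, v1, v2). Like the kernel, it does
   not guard against det == 0 (ray parallel to the triangle's plane). */
Candidate intersect_triangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2) {
    Vec3 edge1 = sub(v1, v0);
    Vec3 edge2 = sub(v2, v0);
    Vec3 pvec = cross(dir, edge2);
    float det = dot(edge1, pvec);
    float inv_det = 1.0f / det;
    Vec3 tvec = sub(orig, v0);
    Vec3 qvec = cross(tvec, edge1);
    Candidate c;
    c.u = dot(tvec, pvec) * inv_det;
    c.v = dot(dir, qvec) * inv_det;
    c.t = dot(edge2, qvec) * inv_det;
    return c;
}

/* Assumed acceptance test (the standard Moller-Trumbore bounds):
   inside the triangle and in front of the ray origin. */
bool candidate_is_hit(Candidate c) {
    return c.u >= 0.0f && c.v >= 0.0f && c.u + c.v <= 1.0f && c.t > 0.0f;
}
```

For a ray from (0.25, 0.25, -1) along +z against the triangle (0,0,0), (1,0,0), (0,1,0), this gives t = 1 and u = v = 0.25, which is a hit.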

9 Brook language: additional features
Reductions
 – scalar
 – stream
Stride & repeat
GatherOp & ScatterOp
 – a[i] += p
 – p = a[i]++
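The GPUs discussed here have no native reduction or scatter (see the "GPU Catch-Up!" slide), so these features have to be emulated. Below is a minimal CPU-side sketch of two of them. It assumes a pairwise multipass strategy for a scalar sum reduction and sequential "+=" semantics for a ScatterOp. The function names are hypothetical, and this is not Brook's runtime code.

```c
#include <stddef.h>

/* Model of a multipass sum reduction: each pass adds adjacent pairs,
   roughly halving the stream, until one value remains in data[0].
   Returns the number of passes (ceil(log2 n) for n >= 1). Works in
   place because pass results are written to indices no greater than
   the ones being read; a GPU would ping-pong between two buffers. */
int reduce_sum_passes(float *data, size_t n) {
    int passes = 0;
    if (n == 0) return 0;
    while (n > 1) {
        size_t half = n / 2;
        for (size_t i = 0; i < half; i++)
            data[i] = data[2 * i] + data[2 * i + 1];
        if (n % 2 != 0) {               /* odd length: carry last element */
            data[half] = data[n - 1];
            n = half + 1;
        } else {
            n = half;
        }
        passes++;
    }
    return passes;
}

/* Model of a ScatterOp with += semantics (a[index[k]] += p[k]): when
   several elements target the same slot, their values are combined
   instead of the last write winning. */
void scatter_add(float *a, const size_t *index, const float *p, size_t n) {
    for (size_t k = 0; k < n; k++)
        a[index[k]] += p[k];
}
```

Summing 1..8 this way takes 3 passes (8 → 4 → 2 → 1 values). Scattering the values 1, 2, 3, 4 to slots 0, 2, 0, 1 leaves {4, 4, 2}, because the two writes to slot 0 accumulate.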

10 brcc compiler infrastructure
Based on ctool
 – http://ctool.sourceforge.net
Parser
 – build code tree
 – extend C grammar to accept Brook
Convert
 – tree transformations
Codegen
 – generate Cg & HLSL code
 – call cgc, fxc
 – generate stub function

11 Applications
Ray tracer
FFT
Segmentation
Linear algebra:
 – BLAS, LINPACK, LAPACK

12 Brook Performance

13 GPU Gotchas
[Figure: time vs. registers used]

14 GPU Gotchas
NVIDIA NV3x: register usage vs. time
[Figure: time vs. registers used]

15 GPU Gotchas
NVIDIA: register penalty
Render-to-texture limitation
 – requires explicit copy or heavy pbuffer solution
 – superbuffer extension needed: http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf

16 GPU Gotchas
ATI Radeon 9800 Pro
 – limited dependent texture lookup
 – 96 instructions
 – 24-bit floating point (s16e7)
 – integers exact only up to 131,072 (s23e8: 16,777,216)
[Figure: dependency levels 1 to 4, each a phase of memory refs followed by math ops]
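The 131,072 limit follows from the mantissa width. With 16 stored mantissa bits plus an implicit leading 1, every integer up to 2^17 = 131,072 is exact, but 131,073 needs 18 significant bits. This matters whenever floats serve as stream or gather indices, as with float idx in the ray-triangle kernel. Below is a small checker; the function names are hypothetical, and it ignores the exponent range, which is ample at these magnitudes.

```c
/* Is the integer n exactly representable in a float format with
   mant_bits stored mantissa bits (plus one implicit leading bit)?
   Trailing zero bits can be absorbed by the exponent, so only the
   bits between the highest and lowest set bit must fit. */
int int_is_exact(unsigned long n, int mant_bits) {
    int len = 0;
    if (n == 0) return 1;
    while ((n & 1UL) == 0) n >>= 1;     /* strip trailing zeros */
    while (n != 0) { len++; n >>= 1; }  /* count significant bits */
    return len <= mant_bits + 1;
}

/* Largest N such that every integer 0..N is exact: 2^(mant_bits + 1).
   2^(mant_bits + 1) + 1 is odd and needs mant_bits + 2 bits.
   Valid for mant_bits <= 30 even where unsigned long is 32 bits. */
unsigned long max_contiguous_int(int mant_bits) {
    return 1UL << (mant_bits + 1);
}
```

With mant_bits = 16 (s16e7) this gives 131,072, and with mant_bits = 23 (s23e8, IEEE single precision) it gives 16,777,216, matching the two figures on the slide.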

17 GPU Catch-Up!
Integer & bit ops & double precision
Memory addressing
CGC/FXC performance
 – hand-code performance-critical code
No native reduction support
No native scatter support
 – p[i] = a (indirect write)
No programmable blend
 – GatherOp / ScatterOp
Limited 4x4 output
 – Brook virtualized kernel outputs
Readback still slow
 – NV35 OpenGL: 600 MB/sec download, 170 MB/sec readback
 – ATI DirectX: 550 MB/sec download, 50 MB/sec readback
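To see what these bandwidth figures mean in practice, here is a back-of-the-envelope calculation of how long a stream takes to move at a given rate. It assumes 1 MB = 10^6 bytes and that the quoted rates hold for the whole transfer; the function names are hypothetical.

```c
/* Seconds to move `bytes` at a sustained rate of mb_per_sec,
   assuming 1 MB = 10^6 bytes. */
double transfer_seconds(double bytes, double mb_per_sec) {
    return bytes / (mb_per_sec * 1e6);
}

/* Bytes in a stream of n float4 elements: 4 components x 4 bytes each. */
double float4_stream_bytes(double n) {
    return n * 4.0 * 4.0;
}
```

For a 1024x1024 stream of float4 (16,777,216 bytes), this works out to roughly 0.03 s to download on either card. Reading the same stream back takes about 0.1 s on the NV35 and about 0.34 s on the ATI board, so frequent readback can easily dominate a GPU computation.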

18 GPUs of the future (we hope)
Complete instruction sets
 – integers, bit ops, doubles, memory access
Integration
 – streaming coprocessor, not just a rendering device
Streaming architectures
[Figure: streaming architecture diagram (SDRAM, Stream Register File, ALU Cluster)]

19 Brook for GPUs
Release v0.3 available on SourceForge
Project page
 – http://graphics.stanford.edu/projects/brook
Source
 – http://www.sourceforge.net/projects/brook
Over 4K downloads!
Questions?
Fly-fishing fly images from The English Fly Fishing Shop
