Download presentation

Presentation is loading. Please wait.

Published bySavanna Nations Modified over 3 years ago

1
† Saarland University, Germany ‡ University of Utah, USA Estimating Performance of a Ray-Tracing ASIC Design Sven Woop † Erik Brunvand ‡ Philipp Slusallek †

2
Ray Tracing in Car Industry

3
Ray Tracing Games

4
Previous Work Ray Tracers for Static Scenes CPU based: [OpenRT], [MLRT SIGGRAPH05] GPU based: Purcell (Grids) [SIGGRAPH02], Foley et al. (KD Trees) [GH05] Custom Hardware: Commercial Hardware (ART-VPS) Schmittler (KD Trees) [GH04] RPU (KD Trees) [SIGGRAPH05] Ray Tracers for Dynamic Scenes CPU based: Wald (Grids) [SIGGRAPH06] Wald (AABVHs) [TOG / Tech. Rep. 2006] Custom Hardware: Woop (B-KD Trees) [GH06]

5
Outline Previous Work DRPU Architecture B-KD Trees Traversal Processor Prototype Implementations DRPU-FPGA DRPU-ASICs Conclusion

6
Definition of B-KD Trees B-KD Tree (Bounded KD-Tree) Binary Tree 1D bounding intervalls for each child Leaf nodes point to a single primitive

7
B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap Primitives in single leaf nodes More traversal steps as for KD Tree Support for dynamic scenes

8
B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap Primitives in single leaf nodes More traversal steps as for KD Tree Support for dynamic scenes

9
B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap Primitives in single leaf nodes More traversal steps as for KD Tree Support for dynamic scenes

10
B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap Primitives in single leaf nodes More traversal steps as for KD Tree Support for dynamic scenes

11
B-KD Tree Subdivision Bounding Volume Hierarchy (partially unbounded) Each node can be associated with a full bounding box Bounds may overlap Primitives in single leaf nodes More traversal steps as for KD Tree Support for dynamic scenes

12
Update of B-KD Trees Update Procedure Bounds updated on changed geometry B-KD tree structure remains constant Linear updating complexity

13
DRPU Architecture vertices from memory

14
DRPU Architecture Rendering Units Highly multi-threaded Higher hardware usage Synchronous execution of packets of 4 rays Memory bandwidth reduction First level caches Memory bandwidth reduction vertices from memory

15
DRPU Architecture Programmable Shading Processor Design similar to fragment processors on GPUs Improved Programming Model Add highly efficient recursion Add flexible memory access Programming Model Ray generation tasks Material shading Calls Ray Casting Units to cast rays vertices from memory

16
DRPU Architecture Programmable Shading Unit Ray Casting Units High-performance traversal and intersection Support for continous dynamic scenes B-KD Trees approach vertices from memory

17
DRPU Architecture Programmable Shading Unit Ray Casting Units Traversal Processor Efficient traversal of B-KD trees vertices from memory

18
DRPU Architecture Programmable Shading Unit Ray Casting Units Traversal Processor Efficient traversal of B-KD trees Geometry Unit Ray transformations Vertex-based ray/triangle intersection [Möller Trumbore] Shared vertices save memory 6x vertices from memory

19
DRPU Architecture Programmable Shading Unit Ray Casting Units Scene Changes Skinning Processor Skeleton Subspace Deformation Re-uses Geometry Unit Pure stream architecture vertices from memory

20
DRPU Architecture Programmable Shading Unit Ray Casting Units Scene Changes Skinning Processor (see paper) Skeleton Subspace Deformation Re-uses Geometry Unit Pure stream architecture Update Processor Stream-like architecture Partial breadth-first execution One B-KD node update per clock cycle peak vertices from memory

21
DRPU Architecture vertices from memory

22
Traversal of B-KD Trees Early ray termination Clipping of near/far interval against both bounding intervalls Take closer child, push farther child to stack Traversal order does not affect correctness Complexity 4x computational cost of KD tree traversal step 2x stack memory

23
Traversal Processor Stack control computes next address

24
Traversal Processing Unit Stack control computes next address Next node is fetched from cache

25
Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes

26
Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision

27
Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision Packet Decision Unit computes packet traversal decision Packet goes left if exists a that ray goes left Packet goes right if exists a ray that goes right Packet goes from left to right if exists a ray that goes into both children from left to right

28
Traversal Processing Unit Stack control computes next address Next node is fetched from cache 4 traversal slices compute 4x4 distances to bounding planes 4 Decision Units compute per ray traversal decision Packet Decision Unit computes packet traversal decision Packet goes left if exists a that ray goes left Packet goes right if exists a ray that goes right Packet goes from left to right if exists a ray that goes into both children from left to right Incoherent packets possible

29
FPGA Implementation Hardware Xilinx Virtex4 LX160 66 MHz 1.0 GB/s (limited to 0.5 GB/s) 7.5 Gflops 2,3 Gflops programmable 5,2 Gflops fixed function Implementation Packets of 4 rays 32 packets of rays 3x 8 KB caches, direct mapped 24 bit floating point Virtex4 Board

30
ASIC Design Synthesis Synopsys Synthesis UMC 130nm CMOS process Place & Route Cadence Encounter Some manual placements to achieve good results Only DRPU Core No chip interface designed (PCI Express, DRAM,...) No power estimation DRPU-ASIC

31
Hardware UMC 130nm process Die size: 49 mm 2 266 MHz clock 2.1 GB/s bandwidth 30 Gflops Implementation Differences Larger caches (3x 16 KB, 4-way associative) 32 bit floating point 7mm

32
GPU Complexity ATI R520 (October, 2005) 90nm process 288 mm 2 die 600 MHz clock speed 170 GFlops programmable? 44,8 GB/s memory bandwidth Implementation Packets of 4 fragments 16 fragment pipelines 8 vertex piplines 32 bit floating point 7mm

33
On-Chip Parallelization Thread Scheduler schedules packets High bandwidth memory interface to Rendering Units

34
DRPU4 ASIC Hardware UMC 130nm process 196 mm 2 die (4 x 49 mm 2 ) 266 MHz clock 8,5 GB/s 120 GFlops Implementation Differences 4x DRPU ASIC No high level control 14mm

35
DRPU8-ASIC Hardware 90nm process (extrapolated using constant field scaling) 186 mm 2 die 400 MHz clock speed 25,6 GB/s bandwidth 361 Gflops 110 Gflops programmable 471 Gflops fixed function Implementation Differences 8x DRPU-ASIC 9,6 mm 19,3 mm

36
Results 1024x768, shadows

37
Results 1024x768, shadows

38
Results for DRPU8 Performance sufficient for game play Room for improving image quality Gael 91.2 fpsDynGael 96.0 fps

39
Conclusions and Future Work Ray Tracing Hardware Design Support for programmable recursive shading Coherent scene changes Working Prototype Implementation Post layout ASIC Results Still no power results No direct performance comparison against GPU

40
Questions? Project Homepage: http://www.saarcor.de http://www.saarcor.de Computer Graphics Lab at Saarland University: http://graphics.cs.uni-sb.de http://graphics.cs.uni-sb.de

Similar presentations

OK

A SEMINAR ON 1 www.engineersportal.in. CONTENT 2 The Stream Programming Model The Stream Programming Model-II Advantage of Stream Processor Imagine’s.

A SEMINAR ON 1 www.engineersportal.in. CONTENT 2 The Stream Programming Model The Stream Programming Model-II Advantage of Stream Processor Imagine’s.

© 2018 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on electrical power transmission system Ppt on conservation of momentum lab Ppt on etiquettes for students Ppt on taj mahal pollution Ppt on natural resources class 8 Maths ppt on triangles for class 9 Ppt online open enrollment Mba ppt on body language Ppt on holographic technology software Ppt on rbi reforms for the completion