Accelerated Single Ray Tracing for Wide Vector Units


Valentin Fuetterling, Carsten Lojewski, Franz-Josef Pfreundt, Bernd Hamann and Achim Ebert

Introduction – Ray Tracing: Given a set of triangles, find the first intersection along a ray. Acceleration: a Bounding Volume Hierarchy (BVH), which requires traversal.

Motivation: Accelerating single ray traversal on data-parallel hardware. Why single rays? Ease of use, consistent performance, immediate results.

Dissection: Current Single Ray Traversal

State of the Art: Multi-branching BVHs [Dammertz et al. 2008; Ernst and Greiner 2008; Wald et al. 2008] expose data parallelism to vector instructions (SIMD), need fewer traversal iterations, and give more coherent memory access. (Figure: a binary BVH next to a BVH4.)

Building Blocks: Ray setup. Inner node iteration: bounding box intersection, node ordering, stack operation. Leaf node: triangle intersection, stack compaction.

Node Ordering: Heuristics. Ideal traversal order: gives minimal traversal iterations, but is unknown. Distance heuristic: order by intersection distance. Sign heuristic: pre-compute the order from the ray direction signs.
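The two heuristics can be contrasted with a minimal Python sketch. The function names and the table layout are illustrative assumptions, not taken from the slides; only the octant encoding (one bit per negative direction component, x in bit 0) follows the "+x and -y gives octant 2" example later in the deck.

```python
from typing import List, Sequence, Set, Tuple


def ray_octant(direction: Sequence[float]) -> int:
    """Pack the direction signs into 3 bits: bit i is set if component i is negative."""
    return sum(1 << axis for axis, d in enumerate(direction) if d < 0.0)


def order_by_distance(hits: List[Tuple[int, float]]) -> List[int]:
    """Distance heuristic: visit the hit children nearest-first by entry distance."""
    return [child for child, _ in sorted(hits, key=lambda h: h[1])]


def order_by_sign(table: Sequence[Sequence[int]],
                  direction: Sequence[float],
                  hit_children: Set[int]) -> List[int]:
    """Sign heuristic: take the precomputed order for this octant, keep only hit children."""
    return [c for c in table[ray_octant(direction)] if c in hit_children]


# Hypothetical two-child node split along x: swap the order when the ray travels in -x.
x_split_table = [[1, 0] if (octant & 1) else [0, 1] for octant in range(8)]
```

The distance heuristic needs the intersection distances before it can sort, while the sign heuristic is a table lookup that depends only on the ray direction.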

Data-parallel Limitations? Inner node: bounding box intersection, node ordering, stack operation. The number of active nodes N is variable. Treating N < 3 as special cases for efficiency leads to branchy code.

WiVe: Accelerated Single Ray Tracing for Wide Vector Units

Contributions: WiVe, a single ray traversal algorithm. Encoding for the sign-based ordering heuristic. Fully data-parallel traversal for multi-branching BVHs. Constant-time (i.e. branch-less) inner node iteration. (Figure: inner node with bounding box intersection, node ordering, stack operation.)

WiVe Key Ideas: Encode fixed traversal orders into the nodes, one order per combination of ray signs (8), each defined by a permute vector. Map node ordering to a single vector permute operation. Map the stack operation to a single vector compress operation.

WiVe Algorithm: Initially the nodes are in memory order. (A) Permute nodes. (B) Intersection test. (C) Compress. (D) Store to stack.
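Steps A to D can be emulated with scalar Python lists to show the data flow. This is a sketch under stated assumptions, not the authors' implementation: the `Node` layout, the helper names, the 4-wide example node, and the choice to push survivors in reverse so the nearest child is popped first are all hypothetical. A vector version would perform each step on all lanes at once.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    # Hypothetical layout: per-slot child boxes, child ids, one permute vector per octant.
    lo: List[Vec3]
    hi: List[Vec3]
    child: List[int]
    permute: List[List[int]]  # permute[octant] lists child slots front to back


def slab_test(lo: Vec3, hi: Vec3, origin: Vec3, inv_dir: List[float], t_far: float):
    """Ray/box slab test; returns (hit, entry distance)."""
    t_near, t_exit = 0.0, t_far
    for a in range(3):
        t0 = (lo[a] - origin[a]) * inv_dir[a]
        t1 = (hi[a] - origin[a]) * inv_dir[a]
        t_near = max(t_near, min(t0, t1))
        t_exit = min(t_exit, max(t0, t1))
    return t_near <= t_exit, t_near


def inner_node_step(node: Node, origin: Vec3, direction: Vec3,
                    t_far: float, stack: list) -> int:
    octant = sum(1 << a for a in range(3) if direction[a] < 0.0)
    inv_dir = [1.0 / d for d in direction]  # sketch assumes no zero components
    order = node.permute[octant]                                    # (A) permute
    tests = [slab_test(node.lo[s], node.hi[s], origin, inv_dir, t_far)
             for s in order]                                        # (B) intersect
    survivors = [(node.child[s], t)
                 for s, (hit, t) in zip(order, tests) if hit]       # (C) compress
    stack.extend(reversed(survivors))                               # (D) store
    return len(survivors)


# Hypothetical 4-wide node: boxes lined up along x, slot 2 lifted out of the ray's path.
node = Node(
    lo=[(0, 0, 0), (2, 0, 0), (4, 5, 0), (6, 0, 0)],
    hi=[(1, 1, 1), (3, 1, 1), (5, 6, 1), (7, 1, 1)],
    child=[10, 11, 12, 13],
    permute=[[3, 2, 1, 0] if (o & 1) else [0, 1, 2, 3] for o in range(8)],
)
stack: list = []
count = inner_node_step(node, (-1.0, 0.5, 0.5), (1.0, 0.01, 0.01), 100.0, stack)
```

No step branches on how many children were hit: the only data-dependent quantity is the number of entries appended to the stack.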

WiVe Node Order: (a) Binary treelet with split axis labels. Its leaves form the BVH8 node cluster, and the split hierarchy determines the permute vectors. (b) BVH8 node cluster in the spatial domain. Example: a ray with +x and -y signs (octant is 10b = 2). If signs[split axis] is negative, swap the default order. Permute_vector[2] = 1 6 7 3 2 5 4
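The rule "if signs[split axis] is negative, swap the default order" can be sketched as a recursive walk over the treelet that produces one permute vector per octant. The tuple encoding of the treelet and the example tree are hypothetical; the example is not the treelet from the slide figure and is not meant to reproduce its Permute_vector[2].

```python
X, Y, Z = 0, 1, 2


def traversal_order(treelet, octant: int) -> list:
    """Leaves are child slot indices; inner nodes are (split_axis, left, right)."""
    if isinstance(treelet, int):
        return [treelet]
    axis, left, right = treelet
    # Negative ray sign along the split axis: visit the right subtree first.
    first, second = (right, left) if (octant >> axis) & 1 else (left, right)
    return traversal_order(first, octant) + traversal_order(second, octant)


def build_permute_vectors(treelet) -> list:
    """One fixed traversal order per combination of ray signs (8 octants)."""
    return [traversal_order(treelet, octant) for octant in range(8)]


# Hypothetical treelet: split on x, then y, then z, with leaves 0..7 in memory order.
example = (X,
           (Y, (Z, 0, 1), (Z, 2, 3)),
           (Y, (Z, 4, 5), (Z, 6, 7)))
permute_vectors = build_permute_vectors(example)
```

Because the vectors depend only on the treelet, they can be computed once at build time and stored with the node, leaving a single lookup by octant during traversal.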

WiVe Configurations (BVH8-half): The BVH branching factor equals half the vector width, e.g. BVH8 with 16-wide vector instructions. The tmin and tmax values are interleaved in one register (the figure labels the even and odd lanes as -tmin i and tmax i for children 0 to 7). Child offset and tmin are interleaved on the stack.
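A scalar sketch of the interleaved layout follows. The slides do not say why tmin is stored negated; the sketch assumes a common motivation, namely that negating the entry distance lets a single `min` reduction update both lane types. The function names and the two-box example are hypothetical.

```python
def interleaved_slab(boxes, origin, inv_dir, t_near, t_far):
    """Return lanes [-tmin0, tmax0, -tmin1, tmax1, ...] for a list of (lo, hi) boxes."""
    lanes = []
    for lo, hi in boxes:
        neg_entry, exit_ = -t_near, t_far
        for a in range(3):
            t0 = (lo[a] - origin[a]) * inv_dir[a]
            t1 = (hi[a] - origin[a]) * inv_dir[a]
            # Assumed rationale: with the entry negated, both updates are a min.
            neg_entry = min(neg_entry, -min(t0, t1))
            exit_ = min(exit_, max(t0, t1))
        lanes += [neg_entry, exit_]  # even lane: -tmin, odd lane: tmax
    return lanes


def hit_mask(lanes):
    """A child is hit when tmin <= tmax, i.e. -(even lane) <= (odd lane)."""
    return [-lanes[i] <= lanes[i + 1] for i in range(0, len(lanes), 2)]


# Two hypothetical boxes: the first lies on the ray, the second is lifted above it.
boxes = [((0, 0, 0), (1, 1, 1)), ((4, 5, 0), (5, 6, 1))]
origin, direction = (-1.0, 0.5, 0.5), (1.0, 0.01, 0.01)
inv_dir = [1.0 / d for d in direction]
lanes = interleaved_slab(boxes, origin, inv_dir, 0.0, 100.0)
mask = hit_mask(lanes)
```

With eight children this layout fills exactly sixteen lanes, which matches the BVH8-half configuration of a BVH8 on 16-wide vectors.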

WiVe Configurations (BVH8-full): The BVH branching factor equals the full vector width, e.g. BVH8 with 8-wide vector instructions. The tmin and tmax values are kept in separate registers. Child offset and tmin are kept on separate stacks.

WiVe Evaluation

Experimental Setup: Compare WiVe to Embree 2.15.0 single ray traversal. WiVe is integrated into the Embree Protoray benchmark. The WiVe BVH is derived from the Embree BVH with the same topology; the BVH uses SAH-based binning (no spatial splits). Path tracing with 8 bounces. Two implementations of WiVe: BVH8-half for AVX-512 (16-wide) and BVH8-full for AVX2 (8-wide).

Input / Output: White environment light, diffuse shading, normal colors, 3840 x 2160 resolution. Scenes: Mazda (5.7M), Art Deco (10.5M), San Miguel (10.7M), Villa (37.5M), Powerplant (12.8M).

Results – Order Heuristic: Sign-based vs. distance-based heuristic, compared by per-ray average indicators. The two are very close, with a slight bias towards the distance heuristic. The worst case is San Miguel with a 6-7% discrepancy.

Results – Performance (BVH8-half): AVX-512 implementation with native permute and compress instructions. Measured on Intel® Xeon Phi™ 7250 @ 1.4GHz (Knights Landing). Timings include full (off-screen) rendering.

Results – Performance (BVH8-full): AVX2 implementation with a native permute instruction; compress is emulated with permute + shift + look-up table (1 KB). Measured on dual-socket Intel® Xeon™ E5 2680v3 @ 2.5GHz (Haswell).
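One way a compress could be emulated with a permute, a shift and a look-up table is sketched below in scalar Python. The table layout is an assumption, since the slides do not describe it: 256 entries, one per 8-bit hit mask, each packing eight 3-bit lane indices into a 32-bit word, which would come to 256 x 4 bytes = 1 KB.

```python
def build_compress_lut() -> list:
    """One packed entry per 8-bit mask: active lanes first, then the idle lanes."""
    lut = []
    for mask in range(256):
        active = [lane for lane in range(8) if (mask >> lane) & 1]
        idle = [lane for lane in range(8) if not ((mask >> lane) & 1)]
        packed = 0
        for slot, lane in enumerate(active + idle):
            packed |= lane << (3 * slot)  # 3 bits per lane index, 24 bits in total
        lut.append(packed)
    return lut


LUT = build_compress_lut()


def compress(values: list, mask: int) -> list:
    """Move the lanes selected by mask to the front, preserving their order."""
    packed = LUT[mask]                                            # look-up
    indices = [(packed >> (3 * slot)) & 7 for slot in range(8)]   # shift: unpack indices
    permuted = [values[i] for i in indices]                       # permute: gather lanes
    return permuted[:bin(mask).count("1")]                        # keep the active lanes


example = compress(list("abcdefgh"), 0b10100110)
```

In a vector implementation the unpacking would be a per-lane variable shift and the gather a single cross-lane permute, so the emulation costs a few instructions where AVX-512 has a native compress.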

Conclusion & Outlook: Introduced the WiVe algorithm: fully vectorized SIMD single ray traversal with branch-free traversal iteration for multi-branching BVHs; fastest on CPUs. Outlook: combine WiVe with compression; investigate WiVe on GPUs; trade throughput for latency.

Thank you! Email: valentin.fuetterling@itwm.fraunhofer.de Blog: http://rapt.technology