Dynamic Scenes
Paul Arthur Navrátil
Parallelism Just Isn’t Enough


Outline
Toward Rapid Reconstruction for Animated Ray Tracing. Lext and Akenine-Möller, Eurographics 2001
  – Lessons for parallel implementations?
Parallel Tree Building on a Range of Shared Address Space Multiprocessors: Algorithms and Application Performance. Shan and Singh, Proc. IPPS/SPDP 1998
  – Application to ray tracing acceleration structures?

Animated Ray Tracing: Motivation [Lext and Akenine-Möller, 2001]
Two competing goals in graphics processing
  – Generate photo-realistic images
  – Render at real-time rates (> 20 fps)
Can ray tracing give us both?
  – Parallelizing ray tracing and frameless rendering help
  – Latest efforts yield interactive rates for static scenes (e.g., [Wald et al., 01])
  – Dynamic scenes are still too computationally intensive

Why doesn’t parallelism solve the problem?
Data structure overhead
  – Reconstructing the acceleration data structures has worse complexity and less obvious parallelism
  – In a dynamic scene, all changed objects need to be updated in the acceleration structure
  – Traditionally, this means rebuild it!

Previous Work
Special-case animated objects [Parker et al., 1999]
  – Keeping objects outside the acceleration structure is not scalable
Reuse frame information to save render time [Adelson and Hodges, 1995]
  – Performance improved 92%, but only for scenes with eye movement
Use lazy evaluation to prune the acceleration structure [McNeill et al., 1992]
  – Evaluates only the part of the structure that is actually used
  – Only tested on static scenes

Insight: Only Part of the Scene is Dynamic!
Distinguish between static and dynamic objects in the acceleration structure
Dynamic objects exhibit spatial locality
Update transform matrices for each scene node, and transform rays before calculating intersections [Wald et al., 2002]
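The last point above (transform the ray, not the geometry) can be sketched as follows. This is a minimal illustration, not the method of either cited paper: it assumes a rigid transform made of a rotation about z followed by a translation, and a unit sphere as the object. The names `make_inverse_rigid`, `hit_unit_sphere`, and `intersect_moving_sphere` are hypothetical.

```python
import math

def make_inverse_rigid(angle_z, translation):
    """Build world-to-object mappings for an object that was rotated by
    angle_z about the z axis and then translated (an assumed transform)."""
    c, s = math.cos(angle_z), math.sin(angle_z)
    tx, ty, tz = translation

    def to_local_dir(v):
        x, y, z = v
        # The inverse of a rotation is its transpose.
        return (c * x + s * y, -s * x + c * y, z)

    def to_local_point(p):
        # Undo the translation first, then the rotation.
        return to_local_dir((p[0] - tx, p[1] - ty, p[2] - tz))

    return to_local_point, to_local_dir

def hit_unit_sphere(origin, direction):
    """Nearest positive ray parameter t for a unit sphere at the origin, or None."""
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - 1.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0.0 else None

def intersect_moving_sphere(ray_origin, ray_dir, angle_z, translation):
    """Intersect in object space: the geometry itself is never moved."""
    to_local_point, to_local_dir = make_inverse_rigid(angle_z, translation)
    return hit_unit_sphere(to_local_point(ray_origin), to_local_dir(ray_dir))
```

Because only the per-object transform changes from frame to frame, whatever acceleration structure is built in the object's local space can be reused unchanged. A rigid transform preserves lengths, so the returned ray parameter is also valid in world space.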

Dynamic versus Static Parts

Idea: Be Lazy!
If modifying the scene graph fails to provide significant speedup (or even if it does), use lazy evaluation of the acceleration structure
  – Evaluate a subsection only when a ray enters the voxel
Adapt the acceleration structure according to use
  – Simplify or eliminate the structure if usage is low
  – Do this at runtime, based on some feedback measure?
Neither of these ideas was in the tested system, but it can be extended to include them
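A minimal sketch of the lazy-evaluation idea, under simplifying assumptions that are not from the slides: a uniform grid instead of a recursive one, axis-aligned boxes as primitives, and a hypothetical `LazyGrid` class whose `enter` method stands in for "a ray enters the voxel".

```python
class LazyGrid:
    """Uniform grid whose cells are populated only on first access."""

    def __init__(self, boxes, cell_size):
        self.boxes = boxes          # list of (min_xyz, max_xyz) tuples
        self.cell_size = cell_size
        self.cells = {}             # cell index -> list of primitive ids
        self.built = 0              # how many cells were actually evaluated

    def _build(self, index):
        lo = tuple(i * self.cell_size for i in index)
        hi = tuple(v + self.cell_size for v in lo)
        # Keep the primitives whose box overlaps this cell on every axis.
        return [
            n for n, (bmin, bmax) in enumerate(self.boxes)
            if all(bmin[a] <= hi[a] and bmax[a] >= lo[a] for a in range(3))
        ]

    def enter(self, index):
        """Called when a ray enters the cell; builds the cell if needed."""
        if index not in self.cells:
            self.cells[index] = self._build(index)
            self.built += 1
        return self.cells[index]
```

Cells that no ray ever visits cost nothing. The `built` counter is one possible feedback measure for the "adapt according to use" idea, although that adaptation is not implemented here.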

Data Structure: Hierarchical Bounding Boxes
Surround each set of primitives with a minimum-area Oriented Bounding Box (OBB)
  – A set is defined as the primitives to which one transform is applied (static or dynamic)
  – Put a recursive grid in each OBB
Encapsulate all top-level dynamic OBBs in a special OBB-grid
  – These are recalculated every frame due to the movement of the contained grids

Algorithm Execution
Create OBB grids
  – One grid for static objects, the rest contain all dynamic objects
Update OBB grids
  – Transform to the root node coordinate system (CS), then to the original node CS (previous frame?)
Recurse if this contains subgrids
  – Apply incremental transformations to primitives
  – Create new OBB around subgrids (and primitives?)
  – Transform to new OBB CS

A Benchmark for Animated Ray Tracing (BART) [Lext et al., 2001]
[Video: BART Robots benchmark]

Results: A Silver Lining

Tree Building Methods: Motivation
Use tree structures to organize work solving N-body problems
  – Classic example: find positions of N bodies attracted by gravity after a period of time
  – Graphics corollary? Find positions of dynamic objects in a given frame
Studies 5 strategies across 4 systems
  – Physically distributed memory (SGI Origin 2000)
  – Bus-based shared memory MP (SGI Challenge)
  – Shared virtual memory in software (Intel Paragon)
  – Configurable memory in software with hardware assistance (Wisconsin Typhoon-zero)

Strategy Characteristics
ORIG
  – Global octree built by processors loading objects into a single shared tree
  – Split a cell into 8 subcells when the number of objects within it exceeds a threshold
  – Each processor operates on the cells it ‘created’
ORIG-LOCAL
  – Optimized version of ORIG
  – Uses different data structures for internal nodes than for leaves
  – Each processor allocates and manages its own cell and leaf arrays
  – Thus cells can be kept in contiguous memory
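The ORIG strategy can be sketched roughly as below. This is an illustration only: Python threads stand in for processors, a single coarse lock protects the whole tree (an assumption made for brevity, not necessarily the locking scheme in the paper), and `Cell` and `build_shared_tree` are hypothetical names. The ORIG-LOCAL memory layout is not modeled.

```python
import threading

class Cell:
    """Cubic octree cell: a leaf holding bodies, or an internal node with 8 children."""

    def __init__(self, center, half):
        self.center, self.half = center, half
        self.bodies = []
        self.children = None

    def child_index(self, p):
        # One bit per axis, set when the point is on the upper side of the center.
        return sum(1 << a for a in range(3) if p[a] >= self.center[a])

    def insert(self, p, threshold):
        cell = self
        while cell.children is not None:
            cell = cell.children[cell.child_index(p)]
        cell.bodies.append(p)
        # Split when the leaf exceeds the threshold (size guard stops runaway splits).
        if len(cell.bodies) > threshold and cell.half > 1e-6:
            cell.split(threshold)

    def split(self, threshold):
        h = self.half / 2
        self.children = [
            Cell(tuple(self.center[a] + (h if (i >> a) & 1 else -h) for a in range(3)), h)
            for i in range(8)
        ]
        bodies, self.bodies = self.bodies, []
        for p in bodies:
            self.insert(p, threshold)

    def count(self):
        if self.children is None:
            return len(self.bodies)
        return sum(child.count() for child in self.children)

def build_shared_tree(points, num_workers=4, threshold=4):
    """All workers load their share of the points into one shared tree."""
    root = Cell((0.5, 0.5, 0.5), 0.5)   # assumes points lie in the unit cube
    lock = threading.Lock()

    def worker(chunk):
        for p in chunk:
            with lock:                  # one lock acquisition per body
                root.insert(p, threshold)

    threads = [threading.Thread(target=worker, args=(points[i::num_workers],))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return root
```

Every insertion goes through the same lock and the same root node, which is the contention that the later strategies try to avoid.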

Strategy Characteristics
UPDATE
  – Insight: distributions evolve slowly over time (in animation too?)
  – The number of objects that move out of the cell in which they’re placed (think: location in the scene) is small
  – Update only as much of the tree as is necessary
  – Leverage the tree hierarchy to find the new cell (if cells are arranged according to scene space)
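A serial sketch of the UPDATE idea, under stated assumptions: each body remembers its leaf, each cell keeps a parent pointer, and positions stay inside the root cell. `Body`, `Cell`, and `update` are hypothetical names; merging of emptied cells and all parallelism are left out.

```python
class Body:
    def __init__(self, pos):
        self.pos = pos
        self.cell = None    # leaf that currently holds this body

class Cell:
    def __init__(self, center, half, parent=None):
        self.center, self.half, self.parent = center, half, parent
        self.bodies, self.children = [], None

    def contains(self, p):
        return all(abs(p[a] - self.center[a]) <= self.half for a in range(3))

    def child_index(self, p):
        return sum(1 << a for a in range(3) if p[a] >= self.center[a])

    def insert(self, body, threshold=2):
        cell = self
        while cell.children is not None:
            cell = cell.children[cell.child_index(body.pos)]
        cell.bodies.append(body)
        body.cell = cell
        if len(cell.bodies) > threshold and cell.half > 1e-6:
            cell.split(threshold)

    def split(self, threshold):
        h = self.half / 2
        self.children = [
            Cell(tuple(self.center[a] + (h if (i >> a) & 1 else -h) for a in range(3)),
                 h, parent=self)
            for i in range(8)
        ]
        bodies, self.bodies = self.bodies, []
        for b in bodies:
            self.insert(b, threshold)

def update(moves, threshold=2):
    """moves is a list of (body, new_pos). Returns how many bodies changed cell."""
    relocated = 0
    for body, new_pos in moves:
        body.pos = new_pos
        cell = body.cell
        if cell.contains(new_pos):
            continue                    # common case: the body stayed in its leaf
        cell.bodies.remove(body)
        # Climb only as far as needed, then descend from that ancestor.
        while cell.parent is not None and not cell.contains(new_pos):
            cell = cell.parent
        cell.insert(body, threshold)
        relocated += 1
    return relocated
```

When most bodies stay in their leaves, the per-frame cost is dominated by the cheap `contains` check rather than by a full rebuild.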

Strategy Characteristics
PARTREE
  – Insight: in previous algorithms, a lock is needed to ensure mutual exclusion on the single global tree
  – This causes synchronization overhead, contention, and remote access overhead for the root and high inner nodes of the tree
  – Solution: each processor creates a local tree, populates it, then merges it into the global tree
  – Uses a ‘tree template’ to simplify the merge
  – Global inserts and synchronizations are reduced
  – Redundant work is minimized if spatial locality is used to distribute objects to processors
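A rough sketch of the PARTREE idea: build local trees without locking, then merge each one into the global tree under a single lock acquisition. The assumptions are mine, not the paper's: Python threads stand in for processors, every local tree uses the same root bounds as the global tree (a crude stand-in for the ‘tree template’), and spatial locality is approximated by sorting points and handing out contiguous chunks. `Cell`, `merge`, and `build_partree` are hypothetical names.

```python
import threading

class Cell:
    def __init__(self, center, half):
        self.center, self.half = center, half
        self.bodies, self.children = [], None

    def child_index(self, p):
        return sum(1 << a for a in range(3) if p[a] >= self.center[a])

    def insert(self, p, threshold):
        cell = self
        while cell.children is not None:
            cell = cell.children[cell.child_index(p)]
        cell.bodies.append(p)
        if len(cell.bodies) > threshold and cell.half > 1e-6:
            cell.split(threshold)

    def split(self, threshold):
        h = self.half / 2
        self.children = [
            Cell(tuple(self.center[a] + (h if (i >> a) & 1 else -h) for a in range(3)), h)
            for i in range(8)
        ]
        bodies, self.bodies = self.bodies, []
        for p in bodies:
            self.insert(p, threshold)

    def count(self):
        if self.children is None:
            return len(self.bodies)
        return sum(child.count() for child in self.children)

def merge(dst, src, threshold):
    """Fold the local tree src into dst; both cover the same region of space."""
    if src.children is None:
        for p in src.bodies:
            dst.insert(p, threshold)
    else:
        if dst.children is None:
            dst.split(threshold)        # make the shapes match, then recurse
        for d, s in zip(dst.children, src.children):
            merge(d, s, threshold)

def build_partree(points, num_workers=4, threshold=4):
    root = Cell((0.5, 0.5, 0.5), 0.5)   # assumes points lie in the unit cube
    lock = threading.Lock()
    ordered = sorted(points)            # contiguous slabs along x as crude locality
    size = max(1, -(-len(ordered) // num_workers))

    def worker(chunk):
        local = Cell(root.center, root.half)
        for p in chunk:
            local.insert(p, threshold)  # private tree: no lock needed
        with lock:                      # one lock acquisition per worker
            merge(root, local, threshold)

    threads = [threading.Thread(target=worker, args=(ordered[i:i + size],))
               for i in range(0, len(ordered), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return root
```

Compared with locking on every insertion into a shared tree, the number of lock acquisitions drops from one per body to one per worker. The price is the extra work done inside `merge`.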

Strategy Characteristics
SPACE
  – Divide up space among processors, rather than objects (Pharr?)
  – Each processor loads the objects that are in its space; ideally, space units (voxels) map to tree cells
  – No need for locking, but high potential for load imbalance if the number of objects per space is unbounded
  – Can lose data locality, since processors don’t necessarily compute on the objects they put in the tree (true for graphics?)
  – No locking during global tree assembly, because only one processor has the cells for a given subtree
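A sketch of the SPACE idea under simple assumptions: space is divided into the 8 octants of the root cell, one worker per octant builds that subtree on its own, and the root is assembled by attaching the finished subtrees. Thread-pool workers stand in for processors; `octant`, `child_center`, `build`, `count`, and `build_by_space` are hypothetical names, and nodes are plain dictionaries for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

def octant(p, center):
    return sum(1 << a for a in range(3) if p[a] >= center[a])

def child_center(center, h, i):
    return tuple(center[a] + (h if (i >> a) & 1 else -h) for a in range(3))

def build(points, center, half, threshold=4):
    """Top-down octree build over one region, done by a single worker."""
    node = {"center": center, "half": half, "bodies": [], "children": None}
    if len(points) <= threshold or half <= 1e-6:
        node["bodies"] = list(points)
        return node
    bins = [[] for _ in range(8)]
    for p in points:
        bins[octant(p, center)].append(p)
    h = half / 2
    node["children"] = [build(bins[i], child_center(center, h, i), h, threshold)
                        for i in range(8)]
    return node

def count(node):
    if node["children"] is None:
        return len(node["bodies"])
    return sum(count(child) for child in node["children"])

def build_by_space(points, center=(0.5, 0.5, 0.5), half=0.5, threshold=4):
    h = half / 2

    def load_region(i):
        # Each worker keeps only the objects that fall inside its own region.
        mine = [p for p in points if octant(p, center) == i]
        return build(mine, child_center(center, h, i), h, threshold)

    with ThreadPoolExecutor(max_workers=8) as pool:
        subtrees = list(pool.map(load_region, range(8)))

    # Assembly needs no lock: each subtree has exactly one owner.
    root = {"center": center, "half": half, "bodies": [], "children": subtrees}
    loads = [count(t) for t in subtrees]    # per-worker load, to expose imbalance
    return root, loads
```

With a clustered input, the `loads` list makes the imbalance visible: one worker may hold nearly all the objects while the others sit idle.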