Cost-based Workload Balancing for Ray Tracing on a Heterogeneous Platform
Mario Rincón-Nigro
PhD Showcase, Feb 17th, 2012


Background
Heterogeneous Computing Platforms
– Widely available at all scales
Ray Tracing
– Most popular technique for photorealism
– Base of many rendering algorithms
– Computationally intensive
– Embarrassingly parallel

Background: BVH Ray Traversal
[Figure: BVH traversal examples labeled "7 box tests + 15 triangle tests" and "5 box tests + 9 triangle tests"]
The workload of ray tracing is irregular
Per-ray BVH traversals are highly variable
Cost is hard to know beforehand
Dynamic workload balancing
What if we could predict the traversal costs? How could we use them to improve balancing efficiency and reduce rendering times?

Overview of the Approach
Offline
– Build a BVH for the scene
– Compute expected number of primitive intersections
Online
– Predict costs of batch of rays
– Initialize workload balancer based on predicted costs
– Heterogeneous launch of ray tracer
– Repeat for generations of secondary rays
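The online steps above can be read as a loop over ray generations. The following is a minimal Python sketch of that control flow only. The callables `predict_costs`, `init_balancer`, and `launch` are hypothetical placeholders (none of these names come from the presentation), and the sketch assumes the launch step returns the next generation of secondary rays.

```python
from typing import Callable, List, Sequence, Tuple


def render_generations(
    primary_rays: Sequence,
    predict_costs: Callable[[Sequence], List[float]],           # hypothetical cost predictor
    init_balancer: Callable[[Sequence, List[float]], object],   # hypothetical balancer setup
    launch: Callable[[object], Sequence],                       # hypothetical heterogeneous launch
    max_generations: int = 3,                                   # assumed cap on ray generations
) -> List[Tuple[int, int]]:
    """Run the online phase once per ray generation.

    Returns a list of (generation index, number of rays traced).
    """
    rays = list(primary_rays)
    history = []
    for generation in range(max_generations):
        if not rays:
            break  # no secondary rays were produced
        costs = predict_costs(rays)            # predict costs of the batch
        balancer = init_balancer(rays, costs)  # initialize balancer from the costs
        history.append((generation, len(rays)))
        rays = list(launch(balancer))          # trace; assumed to yield secondary rays
    return history
```

Passing the three stages as callables keeps the sketch independent of any particular predictor or device runtime.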

Ray Traversal Cost Estimation
[Figure: BVH whose nodes are annotated with EB and ET values (for example EB = 4.2, ET = 9.65 at one node and EB = 0, ET = 7 at another), with a traversal boundary marked]
In this example C(r) = 7 K_B + 15 K_T
We traverse the BVH to 60% of its depth and sample 10% of the rays within each task
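The cost expression on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the presenter's implementation: it assumes KB and KT are per-test cost weights for box and triangle tests (the slide does not define them), and that each sampled ray already has an estimated count of box and triangle tests. How the 10% sample is extrapolated to a whole task is not stated in the slides; scaling the sample mean by the task size is an assumption made here, as are the function names.

```python
from typing import List, Tuple


def ray_cost(box_tests: float, tri_tests: float, k_box: float, k_tri: float) -> float:
    """C(r) = box_tests * K_B + tri_tests * K_T (weights are assumed inputs)."""
    return box_tests * k_box + tri_tests * k_tri


def estimate_task_cost(
    counts: List[Tuple[float, float]],  # per-ray (box tests, triangle tests) estimates
    k_box: float,
    k_tri: float,
    sample_fraction: float = 0.10,      # the slide samples 10% of the rays in a task
) -> float:
    """Estimate a task's cost from an evenly strided sample of its rays."""
    if not counts:
        return 0.0
    stride = max(1, round(1 / sample_fraction))
    sampled = counts[::stride]  # deterministic sample: every stride-th ray
    mean_cost = sum(ray_cost(b, t, k_box, k_tri) for b, t in sampled) / len(sampled)
    # Assumption: extrapolate the sample mean to every ray in the task.
    return mean_cost * len(counts)
```

With illustrative weights `k_box=1.0` and `k_tri=2.0`, `ray_cost(7, 15, 1.0, 2.0)` evaluates the slide's example ray to 37.0.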

Workload Balancing
Tasks for ray tracing are fixed-size groups of rays
Two-level workload balancing
– Inter-processor: need to split work between “big” processing units (CPUs, GPUs)
– Intra-processor: need to split work between “small” processing units (SPs within a GPU)
A variation of one of these strategies is commonly used:
– Centralized queue
– Distributed static assignment
– Distributed dynamic assignment with task stealing
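To make the inter-processor level concrete, here is one way predicted task costs could seed a distributed static assignment. The slides do not say how the costs initialize the balancer, so the heuristic below (largest task first, each task going to the processor that would finish it earliest) is a swapped-in, generic scheduling heuristic rather than the presenter's method. The function name, the processor names, and the relative `speeds` are all hypothetical.

```python
from typing import Dict, List, Tuple


def assign_by_cost(
    task_costs: List[float],   # predicted cost per task
    speeds: Dict[str, float],  # assumed relative throughput per processor
) -> Tuple[Dict[str, List[int]], Dict[str, float]]:
    """Greedy static assignment: largest task first, earliest finish wins."""
    queues: Dict[str, List[int]] = {name: [] for name in speeds}
    finish: Dict[str, float] = {name: 0.0 for name in speeds}
    # Visit tasks from most to least expensive (stable for equal costs).
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for task_id in order:
        cost = task_costs[task_id]
        # Pick the processor whose projected finish time after taking the task is lowest.
        best = min(speeds, key=lambda name: finish[name] + cost / speeds[name])
        queues[best].append(task_id)
        finish[best] += cost / speeds[best]
    return queues, finish
```

For example, with costs `[8, 4, 3, 2, 2]` and a hypothetical GPU four times faster than the CPU, the sketch gives the CPU only task 2 and projects finish times of 3.0 (CPU) and 4.0 (GPU).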

Balancing Strategies
[Figure: diagrams of Distributed Queues, Distributed Queues with Task Stealing, and Centralized Queue, grouped under Static Balancing and Dynamic Balancing]
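The difference between plain distributed queues and distributed queues with task stealing can be illustrated with a small single-threaded simulation. This is a sketch under simplifying assumptions, not the system evaluated in the presentation: stealing is free, an idle worker steals one task from the back of the longest queue, and the function name, worker names, and speeds are hypothetical.

```python
import heapq
from collections import deque
from typing import Dict, List


def simulate_makespan(
    queues: Dict[str, List[int]],  # initial task ids per worker
    costs: List[float],            # cost of each task
    speeds: Dict[str, float],      # assumed relative speed per worker
    steal: bool = True,
) -> float:
    """Return the time at which the last task finishes."""
    local = {w: deque(q) for w, q in queues.items()}
    # Event heap of (time the worker becomes idle, worker name).
    events = [(0.0, w) for w in sorted(local)]
    heapq.heapify(events)
    makespan = 0.0
    while events:
        now, worker = heapq.heappop(events)
        if local[worker]:
            task = local[worker].popleft()     # take from the front of own queue
        elif steal:
            victim = max(local, key=lambda v: len(local[v]))
            if not local[victim]:
                continue                       # nothing left anywhere: worker retires
            task = local[victim].pop()         # steal from the back of the victim's queue
        else:
            continue                           # static balancing: worker retires
        done = now + costs[task] / speeds[worker]
        makespan = max(makespan, done)
        heapq.heappush(events, (done, worker))
    return makespan
```

With four unit-cost tasks all placed on one of two equal-speed workers, the simulated makespan is 4.0 without stealing and 2.0 with it. A good initial assignment leaves little for stealing to fix.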

Experiments
Test platform
– AMAX machine with an Intel Xeon E5600 processor, 16 GB of RAM, and 3 NVIDIA Tesla C1060 GPUs
We compared regular and cost-initialized versions of the workload balancing policies over a number of test scenes
– For tasks of varying size
– For rays exhibiting different degrees of spatial coherency (high to medium)
In general, cost-initialized versions outperform regular versions for large task sizes
Results are not sensitive to the degree of spatial coherency of the tested rays

Conclusions and Future Work
Cost-based initialization of the task system can improve the balancing efficiency of the strategies.
– The static strategy benefits the most (becoming comparable to dynamic balancing)
– Dynamic strategies also showed improved balancing efficiency
The cost-based approach is particularly attractive for coarse-grained task systems.
– Best results were achieved for large tasks
The work is limited by the degree of variability that rays can have. Rays with low spatial coherency pose a challenge due to the estimation overhead they impose.
– The approach cannot be used in its current state for some rendering algorithms
– We believe that fast ray reordering can help in this regard
We have not yet considered using the costs directly for GPU workload balancing
– An implementation of distributed queues on GPUs might also benefit from the estimated costs