Introduction to Parallel Rendering: Sorting, Chromium, and MPI Mengxia Zhu Spring 2006.

Parallel Rendering
- The graphics rendering process is computationally intensive.
- Parallel computation is a natural way to achieve higher performance.
- Two levels of parallelism:
  - Functional parallelism: pipelining
  - Data parallelism: multiple results computed at the same time

Rendering Pipeline

Data Parallel Algorithms
- There are many taxonomies for categorizing parallel algorithms:
  - Image space vs. object space
  - Shared memory architecture vs. distributed memory architecture
  - MPI, OpenMP, …
- A uniform framework is needed to study and understand parallel rendering.

Sorting in Rendering
- Rendering can be viewed as a sorting process: primitives are sorted from object coordinates to screen coordinates.
- The key procedure is calculating the effect of each primitive on each pixel.
- This view is used to study computational and communication costs.

Sorting Categories
The location of this ‘sort’ determines the structure of the parallel algorithm.
- Sort-first: during geometry processing; distributes “raw” primitives
- Sort-middle: between geometry processing and rasterization; distributes screen-space primitives
- Sort-last: during rasterization; distributes pixels/fragments

Sorting (cont.)
A landmark paper: “A Sorting Classification of Parallel Rendering”, Molnar et al., IEEE CG&A, 1994.
[Figure: sort-first, sort-middle, and sort-last pipelines built from geometry processors (G), rasterizers (R), and, for sort-last, compositors (C)]

Sort First
- Primitives are initially assigned arbitrarily.
- A pre-transformation is done to determine which screen regions each primitive covers.
- Primitives are then redistributed over the network to the correct renderer.
- From that point on, the renderer performs the work of the entire pipeline for that primitive.
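The pre-transformation and redistribution steps can be sketched as follows. This is a minimal single-process illustration, assuming triangles whose vertices are already in normalized device coordinates, a screen of `width` × `height` pixels split into square tiles of `tile` pixels, and one renderer per tile numbered in row-major order. The names `tiles_covered` and `redistribute` are hypothetical, not from the slides, and the bounding-box test is a conservative stand-in for exact coverage.

```python
# Hypothetical sketch of sort-first pre-transformation and redistribution.
# Assumptions: triangles already in normalized device coordinates [-1, 1],
# square screen tiles of `tile` pixels, one renderer per tile (row-major ids).
from typing import Dict, List, Tuple

Vertex = Tuple[float, float]  # (x, y) in normalized device coordinates


def to_pixels(v: Vertex, width: int, height: int) -> Tuple[float, float]:
    """Viewport transform: map an NDC vertex to pixel coordinates."""
    x, y = v
    return (x + 1.0) * 0.5 * width, (y + 1.0) * 0.5 * height


def tiles_covered(tri: List[Vertex], width: int, height: int, tile: int) -> List[int]:
    """Row-major ids of the tiles overlapped by the triangle's screen-space
    bounding box (a conservative coverage test)."""
    pts = [to_pixels(v, width, height) for v in tri]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    # Convert the bounding box to tile indices, clamped to the screen.
    x0 = max(0, int(min(xs)) // tile)
    x1 = min(tiles_x - 1, int(max(xs)) // tile)
    y0 = max(0, int(min(ys)) // tile)
    y1 = min(tiles_y - 1, int(max(ys)) // tile)
    return [ty * tiles_x + tx for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]


def redistribute(prims: List[List[Vertex]], width: int, height: int,
                 tile: int) -> Dict[int, List[int]]:
    """Map each renderer (tile id) to the primitive ids it must receive.
    A primitive that spans several tiles is sent to each of them."""
    buckets: Dict[int, List[int]] = {}
    for pid, tri in enumerate(prims):
        for t in tiles_covered(tri, width, height, tile):
            buckets.setdefault(t, []).append(pid)
    return buckets
```

On a 128 × 128 screen with 64-pixel tiles there are four renderers: a small triangle in the lower-left corner is sent only to renderer 0, while a triangle straddling the center is sent to all four, which is the extra cost of primitives that overlap tile boundaries.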

Sort First (cont.)

Screen space is partitioned into non-overlapping 2D tiles, each of which is rendered independently by a tightly coupled pair of geometry and rasterization processors. The sub-images of the 2D tiles are then composited without depth comparison.
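Because the tiles are disjoint, composing the final frame reduces to copying each sub-image into place. A minimal sketch, assuming each sub-image is a list of `tile` rows of `tile` pixel values keyed by row-major tile id, and a screen that is an exact multiple of the tile size; `assemble` is a hypothetical name.

```python
# Hypothetical sketch: assembling sort-first tiles into the final frame.
# Assumptions: each sub-image is a list of `tile` rows of `tile` pixel values,
# keyed by row-major tile id, and the screen is an exact multiple of `tile`.
from typing import Dict, List

Image = List[List[int]]  # rows of pixel values


def assemble(tiles: Dict[int, Image], tiles_x: int, tiles_y: int, tile: int) -> Image:
    """Copy each tile's sub-image into its place in the full frame.
    Tiles are disjoint, so no depth comparison is needed."""
    frame = [[0] * (tiles_x * tile) for _ in range(tiles_y * tile)]
    for tid, sub in tiles.items():
        ty, tx = divmod(tid, tiles_x)
        for r in range(tile):
            frame[ty * tile + r][tx * tile:(tx + 1) * tile] = sub[r]
    return frame
```

For example, four 2 × 2 tiles filled with their own ids assemble into a 4 × 4 frame whose top rows read `[0, 0, 1, 1]` and bottom rows `[2, 2, 3, 3]`.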

Analysis Terms
Assume a dataset containing $n_r$ raw primitives with average size $a_r$. Primitives that result from tessellation are called display primitives. If $T$ is the tessellation ratio, there are $n_d = T n_r$ of these, with average size $a_d = a_r / T$. If there is no tessellation, $T = 1$, $n_d = n_r$, and $a_d = a_r$. Assume an image containing $A$ pixels, with $S$ samples computed per pixel, and assume that all primitives lie within the viewing frustum.
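A few quantities follow directly from these definitions, assuming primitive sizes are measured in pixels. Tessellation multiplies the primitive count by $T$ but leaves the total covered area unchanged, so the average depth complexity $d$ (a symbol introduced here for illustration: the mean number of primitives covering a pixel) is the same with or without tessellation, while oversampling multiplies both the final image and the rasterization work by $S$.

```latex
\begin{aligned}
n_d\,a_d &= (T\,n_r)\left(\frac{a_r}{T}\right) = n_r\,a_r \\
d &= \frac{n_d\,a_d}{A} = \frac{n_r\,a_r}{A} \\
\text{samples in the final image} &= A\,S \\
\text{samples rasterized} &= n_d\,a_d\,S = d\,A\,S
\end{aligned}
```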

Sort-first analysis
Pros:
- Low communication requirements when tessellation or oversampling is high, or when inter-frame coherence is exploited
- Processors implement the entire rendering pipeline for a given screen region
Cons:
- Susceptible to load imbalance (clumping of primitives into a few screen regions)
- Exploiting coherence is difficult
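The clumping problem can be made concrete by counting work per tile. A minimal sketch, assuming each primitive is charged to the tile containing its on-screen pixel-space centroid (a simplification of true coverage) and measuring imbalance as the busiest renderer's load divided by the mean load; `tile_loads` and `imbalance` are hypothetical names.

```python
# Hypothetical sketch: measuring sort-first load imbalance caused by clumping.
# Assumption: each primitive is charged to the tile containing its pixel-space
# centroid (centroids lie on screen), a simplification of true coverage.
from typing import List, Tuple


def tile_loads(centroids: List[Tuple[float, float]], width: int, height: int,
               tile: int) -> List[int]:
    """Number of primitives assigned to each tile (row-major order)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    loads = [0] * (tiles_x * tiles_y)
    for x, y in centroids:
        tx = min(int(x) // tile, tiles_x - 1)
        ty = min(int(y) // tile, tiles_y - 1)
        loads[ty * tiles_x + tx] += 1
    return loads


def imbalance(loads: List[int]) -> float:
    """Ratio of the busiest renderer's load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0
```

With four primitives spread one per tile the ratio is 1.0; if all four clump into a single tile it rises to 4.0, meaning one renderer does all the work while the other three sit idle.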

Sort Middle
- Primitives are initially assigned arbitrarily.
- Each primitive is fully transformed, lit, etc., by the geometry processor to which it was initially assigned.
- Transformed primitives are distributed over the network to the rasterizer assigned to their region of the screen.
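The three steps can be simulated in a single process. This sketch assumes round-robin initial assignment, rasterizers that own equal-width vertical screen strips, and a caller-supplied `transform` standing in for full geometry processing; `sort_middle` and its parameters are hypothetical, not part of any system named in the slides.

```python
# Hypothetical sketch of sort-middle distribution (single-process simulation).
# Assumptions: round-robin initial assignment, rasterizers own equal-width
# vertical screen strips, and `transform` stands in for full geometry processing.
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Triangle = List[Point]


def sort_middle(prims: List[Triangle], n_geom: int, n_rast: int, width: float,
                transform: Callable[[Point], Point]) -> Dict[int, List[Tuple[int, Triangle]]]:
    """Return, for each rasterizer, the (geometry processor id, screen-space
    triangle) pairs it receives over the network."""
    strip = width / n_rast
    inbox: Dict[int, List[Tuple[int, Triangle]]] = {r: [] for r in range(n_rast)}
    for pid, tri in enumerate(prims):
        g = pid % n_geom                           # arbitrary initial assignment
        screen_tri = [transform(v) for v in tri]   # geometry processing on g
        xs = [x for x, _ in screen_tri]
        first = max(0, int(min(xs) // strip))
        last = min(n_rast - 1, int(max(xs) // strip))
        for r in range(first, last + 1):           # send to each overlapped strip
            inbox[r].append((g, screen_tri))
    return inbox
```

Because redistribution happens after geometry processing, each display primitive is sent separately; if geometry processing tessellated every raw primitive, at least $n_d = T n_r$ messages would cross the network, which is the communication cost the analysis attributes to sort-middle.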

Sort Middle

Sort Middle Analysis
Pros:
- Redistribution occurs at a “natural” place
Cons:
- High communication cost if $T$ is high
- Susceptible to load imbalance in the same way as sort-first
Overhead:
- Display primitive distribution cost
- Tessellation factor

Sort Last

- Defers sorting until the end of the pipeline (the imaging phase)
- Renderers operate independently until the visibility stage
- Fragments are transmitted over the network to compositing processors to resolve visibility
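For opaque geometry, a compositing processor can resolve visibility with a per-pixel depth comparison. A minimal sketch, assuming every renderer delivers a full-screen partial image as a list of (depth, color) pixels, smaller depth meaning nearer, and empty pixels carrying infinite depth; `z_composite` and `composite_all` are hypothetical names.

```python
# Hypothetical sketch of sort-last visibility resolution by depth compositing.
# Assumptions: every renderer delivers a full-screen partial image as a list of
# (depth, color) pixels; smaller depth is nearer; empty pixels have depth inf.
import math
from typing import List, Tuple

Pixel = Tuple[float, int]  # (depth, color)
EMPTY: Pixel = (math.inf, 0)


def z_composite(a: List[Pixel], b: List[Pixel]) -> List[Pixel]:
    """Merge two partial images, keeping the nearer fragment at each pixel."""
    return [pa if pa[0] <= pb[0] else pb for pa, pb in zip(a, b)]


def composite_all(images: List[List[Pixel]]) -> List[Pixel]:
    """Resolve visibility across all renderers' partial images."""
    result = images[0]
    for img in images[1:]:
        result = z_composite(result, img)
    return result
```

The depth test suits opaque geometry; for semi-transparent data, as in volume rendering, an order-dependent blending operator such as "over" would take its place.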

Sort Last Analysis
Pros:
- Renderers implement the full pipeline and are independent until pixel merging
- Less prone to load imbalance
- Very scalable
Cons:
- Pixel traffic can be extremely high

Image Composition
A naïve approach is binary compositing: each disjoint pair of processors composites its two images into a new subimage, so with N processors, N/2 subimages are left after the first stage. At the next level of compositing, only half of the original processors are paired up, so the other half sit idle.

The binary-swap compositing method makes sure that every processor participates in all stages of the process. The key idea: at each compositing stage, the two processors involved in a composite operation split the image plane into two pieces, and each processor composites one of them.
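The difference in utilization can be tallied stage by stage. This sketch assumes P is a power of two, every processor starts with a full image of A pixels, and A is divisible by P; `tree_schedule` and `swap_schedule` are hypothetical names that only count work and do not composite pixels.

```python
# Hypothetical sketch comparing processor utilization in naive binary (tree)
# compositing and binary-swap compositing. Assumptions: P is a power of two,
# every processor starts with a full image of A pixels, and A is divisible by P.
from typing import List, Tuple


def tree_schedule(p: int, a: int) -> List[Tuple[int, int]]:
    """Per stage: (active processors, pixels sent by each sender).
    Pairs of surviving processors merge full images; one of each pair drops out."""
    stages = []
    active = p
    while active > 1:
        stages.append((active, a))  # senders ship their whole image
        active //= 2
    return stages


def swap_schedule(p: int, a: int) -> List[Tuple[int, int]]:
    """Per stage: (active processors, pixels sent by each processor).
    Partners split their current region in half and exchange one half."""
    stages = []
    region = a
    group = 1
    while group < p:
        region //= 2
        stages.append((p, region))  # every processor sends half its region
        group *= 2
    return stages
```

For eight processors and a 1024-pixel image, the tree schedule keeps 8, then 4, then 2 processors busy, while binary swap keeps all 8 busy and each sends 512, 256, then 128 pixels, that is A(1 - 1/P) = 896 pixels in total.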

Binary Swap Example
[Figure: the binary-swap compositing algorithm for four processors]
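The four-processor case can be simulated sequentially in one process. This is a sketch under stated assumptions: images are 1D lists of (depth, color) pixels, the partner at each stage is the rank that differs in one bit (`rank ^ stage`, one common pairing, assumed here rather than taken from the slides), the lower rank keeps the first half of the shared region, and a final gather stitches the pieces together. `binary_swap` is a hypothetical name.

```python
# Hypothetical single-process simulation of binary-swap compositing.
# Assumptions: P = len(images) is a power of two; pixels are (depth, color)
# with smaller depth nearer; partners are paired by rank ^ stage; the lower
# rank keeps the first half of the shared region.
import math
from typing import List, Tuple

Pixel = Tuple[float, int]


def binary_swap(images: List[List[Pixel]]) -> List[Pixel]:
    """Composite P partial images with binary swap; return the gathered image."""
    p = len(images)
    n = len(images[0])
    own: List[Tuple[int, int]] = [(0, n)] * p   # region [lo, hi) each rank owns
    data: List[List[Pixel]] = [list(img) for img in images]
    stage = 1
    while stage < p:
        new_data = [list(d) for d in data]
        new_own = list(own)
        for rank in range(p):
            partner = rank ^ stage              # partners differ in one bit
            lo, hi = own[rank]                  # partners own the same region
            mid = (lo + hi) // 2
            keep = (lo, mid) if rank < partner else (mid, hi)
            # Composite my half with the partner's pixels (depth test).
            for i in range(*keep):
                mine, theirs = data[rank][i], data[partner][i]
                new_data[rank][i] = mine if mine[0] <= theirs[0] else theirs
            new_own[rank] = keep
        data, own = new_data, new_own
        stage *= 2
    # Final gather: each rank contributes its fully composited piece.
    final: List[Pixel] = [(math.inf, 0)] * n
    for rank in range(p):
        lo, hi = own[rank]
        final[lo:hi] = data[rank][lo:hi]
    return final
```

With four processors, stage 1 pairs (0, 1) and (2, 3) on half images, and stage 2 pairs (0, 2) and (1, 3) on quarter images, so each processor ends up owning one fully composited quarter. In a distributed implementation each exchange would be a message between the two partners (for example with MPI's `MPI_Sendrecv`), carrying only the half the partner keeps.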

Which to choose?
It depends on which approach can best be matched to the hardware's capabilities. The number of primitives, the tessellation factor, coherence, etc., are all considerations, and there are many tradeoffs.

Load Balancing
For better load balancing:
- Task queuing: the task queue can be ordered by decreasing task size, so that the granularity of work becomes finer as the queue is exhausted.
- Load stealing: nodes that have completed their own tasks steal smaller tasks from other nodes.
- Time stamps: each task is given a timeout stamp; if a node cannot finish its task before the timeout, the remnant of the task is taken, re-partitioned, and re-distributed.
Hierarchical data structures, such as octrees and k-d trees, are commonly used.
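The task-queuing strategy can be simulated with a heap of worker finish times. This sketch assumes task sizes are known costs in arbitrary time units and that an idle worker always takes the next task in the queue; `queue_makespan` is a hypothetical name, and load stealing and timeouts are not modeled.

```python
# Hypothetical sketch of a shared task queue ordered by decreasing task size.
# Assumptions: task costs are known in advance (arbitrary time units) and an
# idle worker always takes the next task; stealing and timeouts are not modeled.
import heapq
from typing import List


def queue_makespan(sizes: List[float], workers: int, largest_first: bool = True) -> float:
    """Simulate workers pulling tasks from the queue; return the time at which
    the last worker finishes."""
    queue = sorted(sizes, reverse=True) if largest_first else list(sizes)
    finish = [0.0] * workers          # min-heap of worker finish times
    heapq.heapify(finish)
    for size in queue:
        t = heapq.heappop(finish)     # the earliest idle worker takes the task
        heapq.heappush(finish, t + size)
    return max(finish)
```

With tasks of sizes 1, 1, 1, 1, 4 on two workers, the unsorted queue leaves the large task for last and finishes at time 6, while the largest-first queue finishes at time 4 because the small tasks fill the gap at the end. This is the classic longest-processing-time-first heuristic for list scheduling.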

References
These slides reference content from:
- Jian Huang, University of Tennessee at Knoxville
- William Gropp and Ewing Lusk, Argonne National Laboratory