Partitioning Screen Space for Parallel Rendering


Thomas Funkhouser, JP Singh, Jiannan Zheng

Goal
- Parallel rendering utilizing many PCs
- Communication via a network
[Figure labels: SHRIMP, Frame Buffers, Projectors]

Parallel Rendering Challenge
- Basic problem: multiple rasterizers cannot write the same pixel simultaneously
[Figure labels: Processor A, Processor B, Pixel, Image]

Screen Space Partitioning
- Partition the screen into “tiles”
  - Tiles can be any shape, even disjoint, but cannot overlap
  - Tiles are usually not one-to-one with projector regions
- Render each tile on a separate processor
  - Each processor renders all primitives overlapping its tile
  - Primitives are not split at tile boundaries, so a primitive may be rendered redundantly by more than one processor

Rendering with Virtual Tiles
[Figure labels: Wall, Physical Tiles (1 to 4), virtual tiles (A to D), Rasterization, Frame Buffers]

Virtual Tile Selection
Investigate shapes and arrangements that ...
- Partition primitives among virtual tiles evenly
  - Complex tiles (concave regions)
- Minimize overlap of primitives with virtual tiles
  - Match scene geometry (non-rectilinear)
- Sort primitives among virtual tiles rapidly
  - Simple tiles (grids, boxes)
- Minimize communication between processors
  - Match physical tiles as much as possible

Load Balancing Problem
Given:
- N: Set of 2D primitives
- P: Number of processors
Find:
- T: Partition of 2D space with exactly P tiles
Minimizing:
- F(N,T): Objective function encoding the factors on the previous slide
[Figure: example primitives labeled with weights]

Load Balancing Problem
Given:
- Set of 2D primitives with weights
Problem:
- Partition 2D space into P tiles so that the overall estimated rendering time is minimized
- That is, the cumulative weight of all primitives overlapping any single tile is minimized
[Figure: example primitives with weights 10, 7, 1, 2, 5]
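
The objective on this slide can be sketched as a small function. This is a minimal illustration, not the authors' code: it assumes primitives are approximated by axis-aligned bounding boxes, tiles are rectangles, and the names `overlaps`, `tile_loads`, and `max_tile_load` are hypothetical. A primitive that straddles a tile boundary is counted in every tile it overlaps, which matches the redundant rendering described earlier.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def tile_loads(prims: List[Tuple[Rect, float]], tiles: List[Rect]) -> List[float]:
    """Cumulative weight of the primitives overlapping each tile."""
    return [sum(w for box, w in prims if overlaps(box, t)) for t in tiles]


def max_tile_load(prims: List[Tuple[Rect, float]], tiles: List[Rect]) -> float:
    """Load of the most heavily loaded tile: the quantity a good tiling keeps small."""
    return max(tile_loads(prims, tiles))


# Three weighted primitives, screen split into a left and a right tile.
prims = [((0, 0, 2, 2), 10.0), ((1, 0, 3, 1), 7.0), ((3, 3, 4, 4), 1.0)]
tiles = [(0, 0, 2, 4), (2, 0, 4, 4)]
loads = tile_loads(prims, tiles)
worst = max_tile_load(prims, tiles)
```

The second primitive spans both tiles, so its weight is charged to both; moving the cut changes which primitives are duplicated as well as how they are divided.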

Possible Tilings
Boundaries:
- On grid
- Axis-aligned
- Linear
- Piecewise linear
Tiles:
- Rectangles
- Convex
- Concave
- Disjoint

Approaches to Partitioning
- Start with constraints imposed by the system, and adjust
  - start with a static partition that matches the projector assignment
  - based on profiled workload, move work around to balance, in units that match hardware rendering capabilities
  - task stealing or task pushing
  - the previous frame's partition can be used as a starting point
- Treat as a general partitioning problem; constraints may refine
  - repartition from scratch, or use the previous frame as a starting point
- Focus on the latter approach for now, ignoring system constraints

The General Partitioning Problem
- Goal: contiguous partitions that are load balanced
- General class of problems: mesh partitioning
  - Partition the elements of an irregular mesh such that load is balanced and communication among partitions is minimized
- Dual of mesh partitioning: graph partitioning
  - e.g. nodes of the graph are elements that have computation costs; edges denote connectivity and have comm. costs when cut
  - goal: partition to balance and reduce computation and comm. costs
- Problem: NP-complete, so use heuristics
  - want them to be cheap and effective; exploit the structure of the problem
- In polygon rendering:
  - polygons are the elements
  - comm. is represented by adjacency, to ensure contiguous partitions

Approaches to Partitioning Irregular Meshes
Some also apply to many other irregular computations
- Merge
  - Start with many pieces, then merge
- Partition
  - Global partitioning methods
  - Multi-level methods
- Optimization
  - Dynamic adjustment: start with some partition, then steal or donate dynamically
  - Local refinement methods: start with a guess, and adjust based on localized criteria
- Hybrids

Merge Methods
- Random Assignment
- Scattered Assignment
- The Greedy Algorithm
  - “grow” partitions from starting points
  - starting points must be well chosen

Merging of Regular Grid Tiles
- Start from the four corners
- At each step, merge the tile that makes the maximum partition weight grow as little as possible
[Figure: successive merge steps on a grid over primitives with weights 10, 7, 1, 2, 5, annotated Max = 10, Max = 10, Max = 18, Max = 20]
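
The corner-seeded greedy merge can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each grid cell carries its own non-negative weight and weights are simply added, whereas the slides charge a primitive once per partition it overlaps. The function name `greedy_merge` and the tie-breaking rule (prefer the lighter resulting partition) are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col)


def greedy_merge(weights: List[List[float]]) -> Tuple[Dict[Cell, int], List[float]]:
    """Grow four partitions from the grid corners, one cell at a time.

    At each step the unassigned cell adjacent to some partition is merged
    into that partition if doing so yields the smallest maximum load.
    Assumes non-negative per-cell weights (a simplification).
    """
    rows, cols = len(weights), len(weights[0])
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 grid for four distinct corner seeds")
    corners = [(0, 0), (0, cols - 1), (rows - 1, 0), (rows - 1, cols - 1)]
    owner: Dict[Cell, int] = {c: p for p, c in enumerate(corners)}
    load = [weights[r][c] for r, c in corners]
    while len(owner) < rows * cols:
        best = None  # (key, cell, partition)
        for (r, c), p in owner.items():
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in owner:
                    new_load = load[p] + weights[nr][nc]
                    # Primary criterion: resulting maximum partition load.
                    key = (max(max(load), new_load), new_load)
                    if best is None or key < best[0]:
                        best = (key, (nr, nc), p)
        _, cell, p = best
        owner[cell] = p
        load[p] += weights[cell[0]][cell[1]]
    return owner, load


grid = [[1, 2, 1],
        [2, 10, 2],
        [1, 2, 1]]
owner, load = greedy_merge(grid)
```

Because the grid is connected, some unassigned cell always borders an existing partition, so the loop assigns every cell.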

Merging of Irregular Tiles
- Irregular initial tiles can also be used. For example, create the initial tiles according to primitive geometry.
[Figure: irregular tiles built around weighted primitives, annotated Max = 10]

Partition Methods
- Direct P-way
- Recursive
  - Geometry based: partition the mesh/domain recursively
  - Graph based: partition the graph representation recursively

Direct P-way Partition Methods
- Random or Scattered Assignment
- Linear, with Bandwidth Reduction
  - order nodes for contiguity, then partition linearly
  - e.g. Morton ordering, Peano/Hilbert ordering
- Tree partitioning
  - represent spatial contiguity hierarchically using a tree
  - inorder traversal of the tree yields an ordering
  - partitioning the tree “linearly” achieves the above effect
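
The "order for contiguity, then partition linearly" idea can be illustrated with Morton ordering. This is a minimal sketch: the integer cell coordinates, the `(x, y, weight)` item layout, and the names `morton_key` and `partition_linear` are assumptions made for the example.

```python
from typing import List, Tuple

Item = Tuple[int, int, float]  # (x, y, weight) with non-negative integer coordinates


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def partition_linear(items: List[Item], parts: int) -> List[List[Item]]:
    """Sort items along the Morton curve, then cut the sequence into `parts`
    contiguous runs of roughly equal total weight."""
    ordered = sorted(items, key=lambda it: morton_key(it[0], it[1]))
    total = sum(w for _, _, w in ordered)
    result: List[List[Item]] = [[] for _ in range(parts)]
    acc = 0.0
    p = 0
    for it in ordered:
        # Move on to the next part once the running weight reaches this part's share.
        while p < parts - 1 and acc >= total * (p + 1) / parts:
            p += 1
        result[p].append(it)
        acc += it[2]
    return result


# A 4x4 grid of unit-weight cells split four ways.
cells = [(x, y, 1.0) for y in range(4) for x in range(4)]
quads = partition_linear(cells, 4)
```

On this uniform grid each run of the Morton curve is one 2x2 quadrant, which shows why a locality-preserving ordering tends to give contiguous partitions.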

Recursive Partition Methods
- Geometry based
  - Coordinate Partitioning: along the X, Y, Z axes
  - Inertial Partitioning: choose axes intelligently according to measures of inertia
- Graph based
  - Layered Partitioning: recursive, using a greedy-like approach on the graph
  - Spectral Partitioning
    - find the matrix that represents the structure of the graph (Laplacian matrix)
    - find the first nontrivial eigenvector of this matrix (Fiedler vector)
    - use this as the separator field for partitioning (e.g. bisection)
    - very good results, but quite expensive to compute

Recursive Partition
Whelan’s median-cut method
- each primitive is represented by its centroid
- the number of primitives falling in each region is used as the load estimate
- recursively divide the longer dimension of the screen at the median until the number of tiles equals the number of processors
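
A median-cut recursion along these lines might look like the sketch below. It is an illustration only: the function name `median_cut` is hypothetical, centroids are assumed to lie inside the region with distinct coordinates, and the proportional split used when the tile count is not a power of two is an assumption not stated on the slide.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def median_cut(points: List[Point], region: Rect, parts: int) -> List[Rect]:
    """Split `region` into `parts` tiles holding roughly equal numbers of centroids."""
    xmin, ymin, xmax, ymax = region
    if parts == 1:
        return [region]
    # Cut across the longer screen dimension.
    axis = 0 if (xmax - xmin) >= (ymax - ymin) else 1
    lo, hi = (xmin, xmax) if axis == 0 else (ymin, ymax)
    pts = sorted(points, key=lambda p: p[axis])
    left_parts = parts // 2
    k = len(pts) * left_parts // parts  # centroids that should fall on the low side
    if 0 < k < len(pts):
        cut = (pts[k - 1][axis] + pts[k][axis]) / 2.0
    else:
        cut = (lo + hi) / 2.0  # too few centroids to place a median: halve the region
    left_pts = [p for p in pts if p[axis] < cut]
    right_pts = [p for p in pts if p[axis] >= cut]
    if axis == 0:
        left_region, right_region = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
    else:
        left_region, right_region = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
    return (median_cut(left_pts, left_region, left_parts)
            + median_cut(right_pts, right_region, parts - left_parts))


centroids = [(1, 1), (2, 5), (3, 2), (9, 9), (12, 3), (14, 7), (15, 1), (13, 8)]
tiles = median_cut(centroids, (0, 0, 16, 10), 4)
counts = [sum(1 for x, y in centroids if t[0] <= x < t[2] and t[1] <= y < t[3])
          for t in tiles]
```

Counting centroids is cheap, but it ignores primitive size and overlap across the cut, which is the gap the mesh-based method on the next slide addresses.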

Mueller’s mesh-based hierarchical decomposition method
- Rasterize each primitive’s bounding box onto a fine mesh, adding 1/A to every cell it overlaps (A is the total number of cells it overlaps)
- Sum the cell weights into a summed-area table
- Recursively divide the screen using binary search
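
The three steps above can be sketched in a few functions. This is a hedged illustration rather than Mueller's actual code: bounding boxes are assumed to be given as inclusive cell-index ranges, only a single vertical cut is shown rather than the full recursion, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # inclusive cell ranges: (col0, row0, col1, row1)


def build_cost_grid(boxes: List[Box], cols: int, rows: int) -> List[List[float]]:
    """Each box adds 1/A to every cell it covers, so it contributes 1 in total."""
    grid = [[0.0] * cols for _ in range(rows)]
    for c0, r0, c1, r1 in boxes:
        area = (c1 - c0 + 1) * (r1 - r0 + 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] += 1.0 / area
    return grid


def summed_area_table(grid: List[List[float]]) -> List[List[float]]:
    """sat[r][c] holds the sum of all cells with row < r and col < c."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def region_sum(sat: List[List[float]], c0: int, r0: int, c1: int, r1: int) -> float:
    """Cost of the cells with c0 <= col < c1 and r0 <= row < r1, in constant time."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def balanced_column_cut(sat: List[List[float]], c0: int, r0: int, c1: int, r1: int) -> int:
    """Binary-search the smallest cut column whose left side holds at least
    half of the region's cost. Requires the region to be at least 2 columns wide."""
    half = region_sum(sat, c0, r0, c1, r1) / 2.0
    lo, hi = c0 + 1, c1 - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if region_sum(sat, c0, r0, mid, r1) < half:
            lo = mid + 1
        else:
            hi = mid
    return lo


boxes = [(0, 0, 1, 1), (0, 0, 0, 0), (6, 2, 7, 3)]
sat = summed_area_table(build_cost_grid(boxes, cols=8, rows=4))
total = region_sum(sat, 0, 0, 8, 4)
cut = balanced_column_cut(sat, 0, 0, 8, 4)
```

The binary search is valid because the cost left of the cut never decreases as the cut moves right, and the table makes each probe a constant-time lookup.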

Optimization Methods
- Develop a cost function (sum of comp and comm costs)
- Minimize the function, subject to constraints
- Difficult search problem: many local minima, so a good starting guess is needed
- Refinement based on Global Criteria
  - Simulated Annealing
  - Chained Local Optimization
  - Genetic Algorithms
- Refinement based on Local Criteria
  - Kernighan-Lin
  - Jostle

Local Refinement Methods
- Kernighan-Lin
  - swap elements with neighbors to improve matters
  - try all pairs to see which gives the best gain in a sweep
  - iterate over sweeps until convergence
- Jostle
  - similar, but swap in chunks and preferentially swap elements at boundaries
  - can be implemented in parallel

Multilevel and Hybrid Methods
- Multilevel methods
  - Construct a coarse graph/mesh as an approximation
  - Partition the coarse mesh
  - Project to the fine mesh
  - Refine
  - Can do hierarchically
- Hybrid methods
  - e.g. combine multilevel with local refinement at each level
  - e.g. spectral may be better than inertial, but inertial plus KL may be better and faster than pure spectral

Our Approach
1D case:
- Partition the screen into vertical strips
- Define the cost of a tile as the number of primitives that overlap it
- Start from any tile assignment; move each cut so that the tiles on both sides of it have costs as balanced as possible; repeat until no cut can be moved
[Figure: a cut moved step by step, annotated Left = 20 and Right = 40, Right = 30, Right = 20]
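
The 1D cut-moving loop can be sketched as below. This is one possible reading, not the authors' implementation: primitives are reduced to weighted x-intervals, cuts sit on integer positions, and "as balanced as possible" is interpreted as minimizing the larger of the two adjacent strip costs, which is an assumption. The names `strip_cost` and `balance_strips` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float, float]  # (x0, x1, weight): x-extent of a primitive


def strip_cost(prims: List[Interval], a: float, b: float) -> float:
    """Total weight of the primitives overlapping the strip (a, b)."""
    return sum(w for x0, x1, w in prims if x0 < b and a < x1)


def balance_strips(prims: List[Interval], bounds: List[int]) -> List[int]:
    """Move interior cuts until no single move lowers the heavier of its two strips.

    `bounds` must be strictly increasing integers, e.g. [0, 4, 8, 16]; the first
    and last entries are the screen edges and stay fixed.
    """
    bounds = list(bounds)
    moved = True
    while moved:
        moved = False
        for i in range(1, len(bounds) - 1):
            left, right = bounds[i - 1], bounds[i + 1]

            def pair_max(cut: int) -> float:
                return max(strip_cost(prims, left, cut), strip_cost(prims, cut, right))

            best = min(range(left + 1, right), key=pair_max)
            # Only accept strict improvements so the loop cannot cycle.
            if pair_max(best) < pair_max(bounds[i]):
                bounds[i] = best
                moved = True
    return bounds


unit8 = [(i, i + 1, 1.0) for i in range(8)]
two_strips = balance_strips(unit8, [0, 1, 8])

unit9 = [(i, i + 1, 1.0) for i in range(9)]
three_strips = balance_strips(unit9, [0, 1, 2, 9])
three_costs = [strip_cost(unit9, a, b) for a, b in zip(three_strips, three_strips[1:])]
```

In the three-strip example the loop stops at costs 2, 3, 4 rather than 3, 3, 3, because no single cut move reduces the heavier neighbor; this is the kind of local minimum that motivates a good starting guess.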

Our Approach: 2D case
[Figure: 2D partition of weighted primitives, annotated with per-tile costs]

Tile Swapping
- Start from a static assignment and swap cells on the boundary
[Figure: cells swapped across tile boundaries, annotated with per-tile costs before and after]

Applying Tree Partitioning to Parallel Rendering
- Divide the image plane into small cells
- For each bounding box, increment the cost of the corresponding cells
- Build a cost tree with these cells as leaves
- Each tree cell holds:
  - total pixel cost for that cell
  - total polygon cost for all polygons fully contained in the cell
  - list of polygons (with costs) that are partly contained in the cell
- Partition using costzones
  - but traverse the partial-polygon list to see if a polygon is already in the partition
- For display wall:
  - doesn’t (yet) consider static projector assignment
  - doesn’t consider hw rendering unit, unless it is the basic cell

Static Plus Refinement Approach
- Divide into regions that match projectors
  - a node is responsible for all tiles in its region
- Use KL or Jostle refinement to rebalance at boundaries
  - use a tile or basic cell as the unit of refinement
  - a tile can match the hardware rendering unit
- Polygon cost of a tile
  - keep track of polygons that cross different faces of the tile
  - if they cross an “internal” face for the current partition, no need to subtract this cost from this partition when the tile is moved out of it
  - if they cross an “external” face, no need to add this cost to the new partition when the tile is moved to it
- Use the current partition as the initial partition for the next frame

Taxonomy of Partition Algorithms
- What types of splits?
- How to choose where to split?
- Merging
  - How to determine the initial tiles?
  - How to choose tiles to merge?
- Optimization
  - What is the state space?
  - What are the operators?
  - What is the objective function?
- Can partition …
  - Prior to rendering
  - While rendering

Previous Approaches
Parallel rendering classifications (Molnar94):
- Sort-last (object load-balance, sort each pixel)
- Sort-middle (sort between geometry and rasterization)
- Sort-first (sort before geometry processing)
Usually tightly-coupled processors
[Figure: pipeline Database Traversal -> Geometry Processing -> Rasterization -> Frame Buffers, with labels Sort first (3D Primitives), Sort middle (2D Primitives), Sort last (Pixel Primitives)]