GI 2006, Québec, June 9th 2006. Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware. Edgar Velázquez-Armendáriz, Eugene Lee, Bruce Walter, Kavita Bala.

GI 2006, Québec, June 9th 2006
Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware
Edgar Velázquez-Armendáriz, Eugene Lee, Bruce Walter, Kavita Bala

Motivation
High-quality shading is still too slow.
– Not ready for interactivity.
– It is slow even on the GPU.
Potential applications:
– Architecture.
– Modeling.
– Movies.

Overview
GPU acceleration of the Render Cache and the Edge-and-Point Image (EPI).
[Figure: Points; Render Cache reconstruction; EPI reconstruction; Edges and Points]

Render Cache overview
[Figure: pipeline stages Projection, Depth cull, Interpolation]
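To make the three stages on this slide concrete, here is a minimal CPU sketch of projection, depth culling and interpolation over a point image. It is not the authors' Cg code: the struct names, the pinhole camera, the 3x3 neighborhoods, the 10% depth tolerance and the unweighted average used for interpolation are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A cached sample: camera-space position (z > 0 is in front of the eye) and a shade value.
struct Point { float x, y, z, shade; };

// One pixel of the point image; an infinite depth marks an empty pixel.
struct Pixel { float depth = INFINITY; float shade = 0.0f; };

struct PointImage {
    int w, h;
    std::vector<Pixel> px;
    PointImage(int w_, int h_) : w(w_), h(h_), px(std::size_t(w_) * h_) {}
    Pixel& at(int x, int y) { return px[std::size_t(y) * w + x]; }
    bool inside(int x, int y) const { return x >= 0 && y >= 0 && x < w && y < h; }
};

// Stage 1 (projection): pinhole camera with focal length f in pixels; the
// nearest point wins each pixel, like a z-buffer.
PointImage project(const std::vector<Point>& cloud, int w, int h, float f) {
    PointImage img(w, h);
    for (const Point& p : cloud) {
        if (p.z <= 0.0f) continue;  // behind the eye
        int x = int(std::floor(f * p.x / p.z + 0.5f * w));
        int y = int(std::floor(f * p.y / p.z + 0.5f * h));
        if (!img.inside(x, y)) continue;
        Pixel& dst = img.at(x, y);
        if (p.z < dst.depth) { dst.depth = p.z; dst.shade = p.shade; }
    }
    return img;
}

// Stage 2 (depth cull): drop a point lying noticeably behind the mean depth of the
// occupied pixels in its 3x3 neighborhood; such points belong to surfaces that
// should be hidden but show through gaps. The 10% tolerance is an assumed value.
void depthCull(PointImage& img, float tolerance = 0.1f) {
    std::vector<Pixel> out = img.px;  // read from img, write to out
    for (int y = 0; y < img.h; ++y)
        for (int x = 0; x < img.w; ++x) {
            float d = img.at(x, y).depth;
            if (std::isinf(d)) continue;
            float sum = 0.0f;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (img.inside(x + dx, y + dy) && !std::isinf(img.at(x + dx, y + dy).depth)) {
                        sum += img.at(x + dx, y + dy).depth;
                        ++n;
                    }
            if (d > (1.0f + tolerance) * sum / float(n))  // n >= 1: the pixel itself counts
                out[std::size_t(y) * img.w + x] = Pixel();
        }
    img.px = out;
}

// Stage 3 (interpolation): empty pixels take the mean shade of their occupied
// 3x3 neighbors. An unweighted average stands in for the real filter here.
std::vector<float> interpolate(PointImage& img) {
    std::vector<float> color(img.px.size(), 0.0f);
    for (int y = 0; y < img.h; ++y)
        for (int x = 0; x < img.w; ++x) {
            const Pixel& c = img.at(x, y);
            float& out = color[std::size_t(y) * img.w + x];
            if (!std::isinf(c.depth)) { out = c.shade; continue; }
            float sum = 0.0f;
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (img.inside(x + dx, y + dy) && !std::isinf(img.at(x + dx, y + dy).depth)) {
                        sum += img.at(x + dx, y + dy).shade;
                        ++n;
                    }
            if (n > 0) out = sum / float(n);
        }
    return color;
}
```

The cull pass reads the original image and writes a copy, so its result does not depend on pixel visiting order; the same read/write separation is what forces extra passes on the GPU.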

Edge-and-Point Image overview
Alternative display representation.
Edge-constrained interpolation preserves sharp features.
Fast anti-aliasing.
[Figure: EPI vs. naive reconstruction]

Presented work
Mapping to the hardware:
– The algorithm’s components differ from standard hardware rendering.
– Overcome GPU limitations.
Results:
– GPU strategies.
– Better interactivity.

Related Work
Interactive:
– Shading cache [Tole02].
– Corrective texturing [Stamminger00].
– Tapestry [Simmons00].
– Adaptive frameless rendering [Dayal05].
– Distance impostors [Szirmay-Kalos05].
Non-interactive:
– Irradiance caching [Smky05].
Pure hardware implementations:
– Ray tracing [Purcell02, Carr06].
– Photon mapping [Purcell03].

Talk overview
Algorithm overview.
Mapping to the hardware: strategies and challenges.
Results.
Discussion.

Overview

Public availability
The complete Cg source of the shaders is available online:


Mapping to the hardware
Sections are grouped by computational similarity:
– Point processing.
– Edge finding.
– Edge-constrained interpolation.
Most of the processing has been moved to the GPU.

Point processing
The point cloud is stored both as a Vertex Buffer Object (VBO) and as a texture.
Multiple Render Targets (MRT) are used to write all information in a single pass.
Simplified predicted projection:
– Not as accurate as the regular projection.
[Figure: 4 one-pixel points vs. 1 splat point using one quarter of the point cloud]

Point processing: Update
The Render Cache’s data structures are complex to map.
We cannot modify pipelined GPU data:
– Use additional passes.
[Figure: Point Cloud; Vertex and Pixel shaders; Point Image; Point projector]
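One common way to "use additional passes" when a pass cannot modify the data it reads is ping-pong double buffering. The sketch below models that idea on the CPU; the `CachedPoint` layout (position plus an age counter) and the `PingPong` class are hypothetical names for illustration, not the authors' data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-point record; the layout (position plus an age counter) is an
// assumption for illustration, not the Render Cache's actual storage format.
struct CachedPoint { float x, y, z; int age; };

// A GPU pass cannot read from and write to the same buffer, so the point data is
// double-buffered: each pass reads front_, writes back_, and the two swap roles.
class PingPong {
public:
    explicit PingPong(const std::vector<CachedPoint>& init) : front_(init), back_(init.size()) {}

    template <class Pass>
    void run(Pass pass) {
        for (std::size_t i = 0; i < front_.size(); ++i)
            back_[i] = pass(front_[i]);  // read-only source, write-only target
        std::swap(front_, back_);        // O(1): swaps buffer handles, not contents
    }

    const std::vector<CachedPoint>& current() const { return front_; }

private:
    std::vector<CachedPoint> front_, back_;
};
```

Each update step costs one full pass over the buffer, which is why the number of passes matters for a bandwidth-limited stage.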

Point processing: Bandwidth issues
Point projection is bandwidth-limited:
– Point cloud update.
– New sample requests.
Write only the new samples to the point cloud:
– We use vertex scatter.
– Faster than replacing the whole point cloud.
A static VBO is projected three times faster than a constantly modified one.
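The difference between writing only new samples and replacing the whole cloud can be sketched as a scatter update keyed by slot index. Everything here is an assumption for illustration: the `NewSample` record, the row-major slot-to-texel mapping, and the reading of "vertex scatter" as drawing each new sample as a one-pixel point at its texel address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample { float x, y, z, shade; };

// Hypothetical update record: the point-cloud slot a freshly shaded sample replaces.
struct NewSample { std::size_t slot; Sample value; };

// 2D texel that a slot maps to when the point cloud is stored in a texture of
// width texWidth (row-major layout is an assumption).
struct Texel { std::size_t x, y; };
Texel slotToTexel(std::size_t slot, std::size_t texWidth) { return {slot % texWidth, slot / texWidth}; }

// Scatter update: only the slots that received new samples are written, so the
// per-frame traffic scales with the number of new samples, not the cloud size.
// On the GPU each NewSample would be drawn as a one-pixel point at slotToTexel(slot).
std::size_t scatterUpdate(std::vector<Sample>& cloud, const std::vector<NewSample>& fresh) {
    for (const NewSample& s : fresh) {
        assert(s.slot < cloud.size());
        cloud[s.slot] = s.value;
    }
    return fresh.size();
}
```

If only a few hundred samples arrive per frame, this touches a few hundred slots instead of the entire cloud, which is consistent with the slide's observation that an untouched VBO projects faster.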

Silhouette detection
The original EPI uses hierarchical trees:
– They do not map well to the GPU.
Brute-force method on the GPU:
– Avoids transferring edges every frame.
– Faster than hierarchical structures!
Shadow edge detection is left on the CPU.
[Figure: Edge texture; Model edges]
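A brute-force silhouette test checks every mesh edge independently: the edge is a silhouette when exactly one of its two adjacent faces faces the eye. The sketch below uses that standard criterion; the `Edge` record (one endpoint plus both face normals) is a hypothetical layout, and on the GPU the loop body would run once per edge, writing a flag into a results texture.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical edge record: one endpoint plus the normals of its two adjacent faces.
struct Edge { Vec3 p; Vec3 n0, n1; };

// A face faces the eye when its normal points toward the eye. The edge endpoint
// lies on both adjacent faces, so it serves as the point on either face.
bool frontFacing(Vec3 n, Vec3 pointOnFace, Vec3 eye) { return dot(n, sub(eye, pointOnFace)) > 0.0f; }

// Brute force: every edge is tested independently, with no hierarchy to traverse.
// An edge is a silhouette edge when exactly one adjacent face is front-facing.
std::vector<unsigned char> detectSilhouettes(const std::vector<Edge>& edges, Vec3 eye) {
    std::vector<unsigned char> flags(edges.size());
    for (std::size_t i = 0; i < edges.size(); ++i) {
        const Edge& e = edges[i];
        flags[i] = frontFacing(e.n0, e.p, eye) != frontFacing(e.n1, e.p, eye);
    }
    return flags;
}
```

Because each edge is independent, the work is proportional to the edge count, which fits the slide that follows: the GPU version is limited by fill rate rather than by tree traversal.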

Silhouette detection: Limitations
GPU silhouette detection is limited by the fill rate.
Texture memory constraints:
– We need to keep all vertices as a VBO.
– Vertices and normals as textures.
– One results texture.
Normals are stored as fp16 to reduce space.
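Storing normals as fp16 halves their footprint: a three-component normal takes 6 bytes instead of 12. The conversion below is a minimal sketch of IEEE 754 binary32 to binary16 with round-to-nearest-even; as a stated simplification, values too small for a normal half are flushed to signed zero, which is harmless for unit-normal components. On the GPU this conversion happens in hardware when writing to a 16-bit float texture.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// float -> half with round-to-nearest-even; tiny values are flushed to zero (simplification).
std::uint16_t toHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::uint32_t sign = (bits >> 16) & 0x8000u;
    std::uint32_t rawExp = (bits >> 23) & 0xFFu;
    std::uint32_t mant = bits & 0x7FFFFFu;
    if (rawExp == 0xFFu)                                  // Inf or NaN
        return std::uint16_t(sign | 0x7C00u | (mant ? 0x200u : 0u));
    int exp = int(rawExp) - 127 + 15;                     // re-bias the exponent
    if (exp <= 0) return std::uint16_t(sign);             // flush to signed zero
    if (exp >= 31) return std::uint16_t(sign | 0x7C00u);  // overflow to infinity
    std::uint32_t h = (std::uint32_t(exp) << 10) | (mant >> 13);
    std::uint32_t rest = mant & 0x1FFFu;                  // the 13 discarded bits
    if (rest > 0x1000u || (rest == 0x1000u && (h & 1u))) ++h;  // may carry into exponent
    return std::uint16_t(sign | h);
}

// half -> float, handling zero/subnormal, Inf/NaN and normal values.
float fromHalf(std::uint16_t h) {
    int exp = (h >> 10) & 0x1F;
    int mant = h & 0x3FF;
    float mag;
    if (exp == 0)       mag = std::ldexp(float(mant), -24);
    else if (exp == 31) mag = mant ? NAN : INFINITY;
    else                mag = std::ldexp(float(1024 + mant), exp - 25);
    return (h & 0x8000) ? -mag : mag;
}
```

With 10 mantissa bits, a component near 0.58 is recovered to within about 2.5e-4, which is usually adequate for silhouette tests that only need the sign of a dot product.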

Edge Raster
Rasterize edges with subpixel precision.
Cost depends on model complexity.
Extended lines as described in [Sen03].
Filtered depth as a read-only depth buffer:
– Free occlusion culling!
[Figure: No depth texture vs. With depth texture]
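The "free occlusion culling" comes from depth-testing edge fragments against the point image's filtered depth without writing depth. The sketch below shows that idea with a simple DDA line walk; the DDA, the screen-space depth interpolation and the `bias` tolerance are assumptions, and the subpixel edge positions the EPI stores are omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Frag { int x, y; };

// Rasterizes a screen-space edge (x, y in pixels, z = depth) with a simple DDA and
// keeps only fragments passing a read-only depth test against `depth`, the filtered
// depth of the point image (w x h, row-major). `bias` is an assumed tolerance so
// edges on visible surfaces are not rejected by their own surface.
std::vector<Frag> rasterEdge(float x0, float y0, float z0, float x1, float y1, float z1,
                             const std::vector<float>& depth, int w, int h, float bias) {
    std::vector<Frag> out;
    int steps = int(std::ceil(std::max(std::fabs(x1 - x0), std::fabs(y1 - y0))));
    steps = std::max(steps, 1);
    for (int i = 0; i <= steps; ++i) {
        float t = float(i) / float(steps);
        int x = int(std::floor(x0 + t * (x1 - x0)));
        int y = int(std::floor(y0 + t * (y1 - y0)));
        float z = z0 + t * (z1 - z0);  // linear in screen space, for simplicity
        if (x < 0 || y < 0 || x >= w || y >= h) continue;
        // Depth is only read, never written: occluded edge fragments are discarded.
        if (z <= depth[std::size_t(y) * w + x] + bias)
            out.push_back({x, y});
    }
    return out;
}
```

Because the depth buffer is never updated, edges cannot occlude one another; only the reconstructed surfaces hide edges, which is the behavior the slide's comparison figure illustrates.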

Edge Constrained Interpolation
Multi-pass pixel shaders:
– Very long.
– Many texture accesses.
Image-resolution dependent.
Look-up tables encoded as textures:
– Avoid control code in shaders.
– Encode the original EPI operations.
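The look-up-table idea can be illustrated with a toy edge-constrained blend: a 4-bit mask records which of a pixel's four neighbors are separated from it by an edge, and a precomputed 16-entry table supplies normalized weights for the unblocked ones, so the per-pixel code is a lookup plus a dot product with no branches on edge configuration. The mask layout and equal weights are assumptions; this is not the EPI's actual set of operations.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Neighbor order used by the bitmask: bit 0 = left, 1 = right, 2 = up, 3 = down.
// A set bit means an edge separates the pixel from that neighbor.
using Weights = std::array<float, 4>;

// Built once; on the GPU this table would be a small 16-texel texture.
// Each entry holds equal, normalized weights for the unblocked neighbors.
std::array<Weights, 16> buildTable() {
    std::array<Weights, 16> table{};
    for (int mask = 0; mask < 16; ++mask) {
        int open = 0;
        for (int k = 0; k < 4; ++k)
            if (!(mask & (1 << k))) ++open;
        for (int k = 0; k < 4; ++k)
            table[mask][k] = (open > 0 && !(mask & (1 << k))) ? 1.0f / float(open) : 0.0f;
    }
    return table;
}

// Branch-free per-pixel blend: one table lookup and a dot product.
float edgeConstrainedBlend(const std::array<Weights, 16>& table, std::uint8_t mask,
                           const std::array<float, 4>& neighbors) {
    const Weights& w = table[mask & 0xF];
    return w[0] * neighbors[0] + w[1] * neighbors[1] + w[2] * neighbors[2] + w[3] * neighbors[3];
}
```

Moving the case analysis into data keeps every pixel on the same instruction path, which matters on GPUs whose dynamic branching is coarse-grained.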

Future trends
Branching granularity:
– Some filters require fine granularity to take advantage of dynamic branching.
– This issue is being solved by newer cards, beginning with the ATI X1000 series.
Bit operations are not directly supported:
– DirectX 10 will support them.
Bottom line: the GPU implementation will get better and faster.
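Without integer bit operators, flag bits packed into a float channel have to be decoded arithmetically. The function below mirrors what a shader could do with `floor` and `fmod`, written in C++ so it can be checked; it is a sketch of the general workaround, not the authors' shader code, and it is only exact while the packed value is a non-negative integer below 2^24.

```cpp
#include <cassert>
#include <cmath>

// Emulates ((packed >> bit) & 1) using only floating-point operations.
// Dividing by a power of two is exact in binary floating point, so floor()
// performs the shift and fmod(..., 2) extracts the lowest remaining bit.
float testBit(float packed, float bit) {
    float shifted = std::floor(packed / std::exp2(bit));  // packed >> bit
    return std::fmod(shifted, 2.0f);                      // ... & 1
}
```

For example, 5 is binary 101, so bits 0 and 2 read as 1 and bits 1 and 3 read as 0.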

Limitations
Fill rate and texture access:
– These characteristics keep improving as newer hardware adds pipelines and raises clock frequencies.
Improve by shortening shaders:
– The number of registers used is still important.
– On our GPU, a 180-instruction shader with 25 registers performs 50% slower than a 215-instruction shader with 24 registers.


Test platform
Test environment:
– Software written in C++, Cg 1.4rc, and Java (through JNI) under Windows XP.
– Pentium 4 EE 3.2 GHz dual core, 2 GB RAM, dual NVIDIA GeForce 7800 GTX (driver 81.85).
Test scenes:
– Cornell Box
– Chains
– Mackintosh Room
– David Head
– Dragon

Results: FPS
The GPU version is 60–110% faster than the original.
– The speedup increases with scene complexity.

Results: Speed increase from CPU

Results: Rendering times


Discussion
Point projection, even though it maps straightforwardly to the GPU, is the bottleneck.
Image filters are very fast despite their many texture accesses and multiple passes.
We originally thought the opposite would be true!

Discussion
Projection is not optimal.
We wanted to use Vertex Texture Fetch (VTF) for the point cloud update, but it was slower than Render to Vertex Array (RTV).
Dual-GPU rendering with Scalable Link Interface (SLI) showed marginal gains.

Future performance
Texture accesses are very fast and efficient.
Transferring vertex data on the GPU is too slow to be fully useful.
Scatter writes in pixel shaders and geometry shaders may allow complete data management on the GPU.

Conclusions
We presented a hybrid GPU/CPU system for the Render Cache and the EPI using commodity graphics hardware.
Our implementation is 60–110% faster than a pure CPU implementation and frees up the CPU for other operations.
The system’s performance is likely to improve with current GPU trends.

Questions?
Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware