Filtering Approaches for Real-Time Anti-Aliasing

Slides:



Advertisements
Similar presentations
Destruction Masking in Frostbite 2 using Volume Distance Fields
Advertisements

Accelerating Real-Time Shading with Reverse Reprojection Caching Diego Nehab 1 Pedro V. Sander 2 Jason Lawrence 3 Natalya Tatarchuk 4 John R. Isidoro 4.
Grass, Fur and all things hairy Nicolas ThibierozKarl Hillesland Gaming Engineering Manager, AMDSenior Research Engineer, AMD.
The Framebuffer February 6, A Configurable Pixel Cache for Fast Image Generation, Gorris et al. Problem Processor speeds have increased to the point.
Exploration of advanced lighting and shading techniques
Deferred Shading Optimizations
Lecture 8 Transparency, Mirroring
An Optimized Soft Shadow Volume Algorithm with Real-Time Performance Ulf Assarsson 1, Michael Dougherty 2, Michael Mounier 2, and Tomas Akenine-Möller.
Filtering Approaches for Real-Time Anti-Aliasing
A Micro-benchmark Suite for AMD GPUs Ryan Taylor Xiaoming Li.
CS123 | INTRODUCTION TO COMPUTER GRAPHICS Andries van Dam © 1/16 Deferred Lighting Deferred Lighting – 11/18/2014.
DSPs Vs General Purpose Microprocessors
Normal Map Compression with ATI 3Dc™ Jonathan Zarge ATI Research Inc.
Frame Buffer Postprocessing Effects in DOUBLE-S.T.E.A.L (Wreckless)
Ray tracing. New Concepts The recursive ray tracing algorithm Generating eye rays Non Real-time rendering.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
Filtering Approaches for Real-Time Anti-Aliasing
The Art and Technology Behind Bioshock’s Special Effects
GI 2006, Québec, June 9th 2006 Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware Edgar Velázquez-Armendáriz Eugene Lee Bruce.
Filtering Approaches for Real-Time Anti-Aliasing
Week 10 - Monday.  What did we talk about last time?  Global illumination  Shadows  Projection shadows  Soft shadows.
Fast GPU Histogram Analysis for Scene Post- Processing Andy Luedke Halo Development Team Microsoft Game Studios.
Rasterization and Ray Tracing in Real-Time Applications (Games) Andrew Graff.
1 Audio Compression Techniques MUMT 611, January 2005 Assignment 2 Paul Kolesnik.
Skin Rendering GPU Graphics Gary J. Katz University of Pennsylvania CIS 665 Adapted from David Gosselin’s Power Point and article, Real-time skin rendering,
Enhancing and Optimizing the Render Cache Bruce Walter Cornell Program of Computer Graphics George Drettakis REVES/INRIA Sophia-Antipolis Donald P. Greenberg.
Z-Buffer Optimizations Patrick Cozzi Analytical Graphics, Inc.
Hand Movement Recognition By: Tokman Niv Levenbroun Guy Instructor: Todtfeld Ari.
© John A. Stratton, 2014 CS 395 CUDA Lecture 6 Thread Coarsening and Register Tiling 1.
© 2004 Tomas Akenine-Möller1 Shadow Generation Hardware Vision day at DTU 2004 Tomas Akenine-Möller Lund University.
Introduction to 3D Graphics John E. Laird. Basic Issues u Given a internal model of a 3D world, with textures and light sources how do you project it.
Computer Graphics: Programming, Problem Solving, and Visual Communication Steve Cunningham California State University Stanislaus and Grinnell College.
Input: Original intensity image. Target intensity image (i.e. a value sketch). Using Value Images to Adjust Intensity in 3D Renderings and Photographs.
Tal Mor  Create an automatic system that given an image of a room and a color, will color the room walls  Maintaining the original texture.
Shadows Computer Graphics. Shadows Shadows Extended light sources produce penumbras In real-time, we only use point light sources –Extended light sources.
Computer Graphics Mirror and Shadows
Filtering Approaches for Real-Time Anti-Aliasing /
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware Nolan GoodnightGreg HumphreysCliff WoolleyRui Wang University of Virginia.
Tiger Woods 2008: Advancements in Environments Peter Arisman Technical Art Director Tiger Woods 2008.
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Fall 2013.
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
Finding Body Parts with Vector Processing Cynthia Bruyns Bryan Feldman CS 252.
Computer Graphics 2 Lecture 7: Texture Mapping Benjamin Mora 1 University of Wales Swansea Pr. Min Chen Dr. Benjamin Mora.
Hardware-accelerated Rendering of Antialiased Shadows With Shadow Maps Stefan Brabec and Hans-Peter Seidel Max-Planck-Institut für Informatik Saarbrücken,
Xbox MB system memory IBM 3-way symmetric core processor ATI GPU with embedded EDRAM 12x DVD Optional Hard disk.
Using wavelets on the XBOX360 For current and future games San Francisco, GDC 2008 Mike Boulton Senior Software Engineer Rare/MGS
GPU Accelerated MRI Reconstruction Professor Kevin Skadron Computer Science, School of Engineering and Applied Science University of Virginia, Charlottesville,
Mobile Graphics Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Igor Jánoš. Goal of This Project Decode and process a full-HD video clip using only software resources Dimension – 1920 x 1080 pixels.
CSCI 440.  So far we have learned how to  build shapes  create movement  change views  add simple lights  But, our objects still look very cartoonish.
Advances in Real-Time Rendering in Games. Dynamic lighting in GOW3.
October 1, 2013Computer Vision Lecture 9: From Edges to Contours 1 Canny Edge Detector However, usually there will still be noise in the array E[i, j],
WORKING WITH SELECTIONS MASKS and CHANNELS 3D IMAGES LAYER BASICS PHOTO.
Canny Edge Detection Using an NVIDIA GPU and CUDA Alex Wade CAP6938 Final Project.
Postmortem: Deferred Shading in Tabula Rasa Rusty Koonce NCsoft September 15, 2008.
Real-Time Soft Shadows with Adaptive Light Source Sampling
Graphics Processing Unit
Deferred Lighting.
Real-Time Ray Tracing Stefan Popov.
The Graphics Rendering Pipeline
SoC and FPGA Oriented High-quality Stereo Vision System
Digital Image Processing
Static Image Filtering on Commodity Graphics Processors
UMBC Graphics for Games
RADEON™ 9700 Architecture and 3D Performance
Intensity Transformation
Frame Buffer Applications
Frame Buffers Fall 2018 CS480/680.
Presentation transcript:

Filtering Approaches for Real-Time Anti-Aliasing

Filtering Approaches for Real-Time Anti-Aliasing Hybrid CPU/GPU MLAA (on the Xbox 360) Pete Demoreuille

MLAA Algorithm well described by this point As well as benefits and drawbacks This talk focuses on –Shipping hybrid CPU/GPU implementation –Assorted edge detection routines –Integration and use in engine

Example Results

Motivations Engine uses variant of deferred rendering GPU time at a premium, prefer fixed cost SSAA (!) originally used, needed alternative Complicated by time pressure –Added right before shipping Costume Quest –Aspects of implementation show this

Starting Points Reference implementation from paper –Took unspeakable amount of time on PPC First-pass fixed-point implementation –Took ~90ms (not include untiling!) –~21ms for edges, ~4.5ms blending –~65ms for blend weight calculations

Starting Points Fully on cpu, fixed point ~4fps (90ms aa) Gpu edges + rest on Cpu, optimized masks ~68fps (8ms aa+untile) Sample Art Courtesy of Microsoft

Hybrid MLAA: Overview GPU edge detection Variety of color/depth/id data used CPU blend weight computation Fast transpose using tiling and VMX 128 GPU blending

Hybrid MLAA: Timeline GPU CPUs GPU Image From PIX for Xbox 360

Blend Mask CPU blend mask generation –Contains weights for GPU use when filtering GPU to provide input data –Flags for horizontal / vertical edges –Intensity values for blending calculations Hypothesis: calculation bearable, bandwidth not –Fist reduce size as much as possible

Blend Mask Input Use 8bpp: 6-bit luminance, 2-bit H+V edge Our implementation actually uses 16bpp, 1 channel for other stuff

Blend Mask Input Large reduction, bandwidth still an issue –Vertical edges obliterate cache –Transpose image! Do Not Want Do Want

An Aside: Tiling GPU wants CPU wants annoyance? Images Courtesy of Microsoft

Not an Annoyance DXT blocks - One read for 4x4 pixels 64 bits in memory 4x4 pixels VMX Untile

Multiple tiles Into blocks of vector registers Read from few cache lines Store original and transpose

Blend Mask: Transpose Untile creates horizontal, vertical images –We convert from 16bpp -> 8bpp here as well

Blend Mask Now run horizontal mask code twice –Massive bandwidth reduction Horizontal edgesVertical edges

Blend Mask: Threading Last major step: threads –Process interleaved horizontal blocks –Jobs kicked as untiling of blocks completes L2 still warm when mask generation starts –Wait for complete untile before vertical mask Final cost ~4-8ms per thread –Tons of potential optimizations left

Blending GPU reads horizontal and vertical masks –Linear textures, but shader is ALU bound –Performed with color correction, etc

Edge Detection GPU offers additional options –Augment color with stencil, depth, normals –Cheaper to use linear intensity values, if desired Still must work to avoid overblurring! –Image quality suffers (toon-like images, fonts, etc) –Uses excessive CPU, forcing throttling

Start with technique from Brutal Legend Our best results are with raw/projected Z –But hard to tune absolute tolerance –Use ratio of gradient and center depth plus bias Depth-based Edges

Absolute ToleranceRelative Tolerance

Material-based Edges Use a few stencil bits for per-material values Material edgesDepth Edges

Even combinations fail –Choose best for your app (or add more sources) Neither a panacea DepthMaterial

Edge Detection Costume Quest used a blend –Material edges used to increase color tolerance –Stopped short of using normal edges (GPU cost) Stacking went simpler –Pure color/luminance thresholds, with tweaks –Skip gamma, use x ^2 to save fetch cost (or x ^1 …)

Adjust tolerance based on local contrast –Similar to depth approach Avoiding Overblurring

Cutoff MLAA where fully out of focus Skip Unneeded work

Plan for the worst case –Close view of high frequency materials –Enforce wall-clock CPU budget –Can adaptively change threshold Interior flag could help Backup Plan

Integration Memory required: 4x 8bpp buffers –Edge input, blend masks, two temporary buffers Could reduce using pool of tile buffers Latency hidden behind post, parts of next frame Applied after lighting, before DOF+Post+UI GPU cost varies, ms plus z-buffer reload –Lower when work can be added to existing passes

Future Work Better edge detection Quality improvements Many optimizations to code possible –And some to GPU passes Many ideas described in this course applicable!

Thank You! Questions: