Technology Behind AMD’s “Leo Demo” Jay McKee MTS Engineer, AMD

Presentation transcript:


Why Forward Rendering?
- Complex materials
- Multiple light types
- Supports hardware anti-aliasing
- Efficient memory usage
- Supports transparency
- BUT, previously could not support a large number of lights

Forward+ Rendering
- Modified forward renderer.
- Add a compute shader for light culling.
- Modify the main light loop.
- Lighting and shading are done in the same place, so all information is preserved.

Forward+ Rendering (continued)
- No limits on parameters for lights and materials
  - Omni
  - Spot
  - Cinematic (arbitrary falloffs, barndoor)
  - BRDF per material instance
- Simple design: concentrate on rendering, not engine maintenance.

Important DX11 features
- Compute Shaders
- UAV support

Compute Shaders
- The Leo demo uses two compute shaders:
  - One for culling lights.
  - Another for spawning Virtual Point Lights (VPLs) for indirect lighting.
- Culling 3,072 lights takes 1.7 ms on a high-end GPU.

UAVs
- Array(s) of scene light information.
- Array of u32 light indices for storing start/end lights per tile.
- Array of material instance data.

Algorithm summary
- Depth pre-pass
- Light culling
  - The screen is divided into tiles.
  - Launch the compute shader per tile.
  - Light info such as position, radius, direction, and length is passed to the light culling compute shader.
  - The light culling shader projects light bounds to screen-space tiles.
  - It uses scene depth from the z pre-pass for z testing against light volumes.
  - It outputs a UAV describing the per-tile light list start/end, along with a large UAV holding a u32 array of light indices.
  - The output UAVs are passed to the main light shaders for looping through lights per pixel.

Algorithm summary (continued)
- Render scene materials
  - Base light accumulation function
    - Use the screen x, y location to determine the tileID.
    - From the tileID, get the light start and end indices.
    - Loop from the start index to the end index; each entry is an index into the light array.
    - Accumulate the light hitting the pixel.
    - Returns the total direct and indirect light hitting the pixel.
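The base light accumulation steps above can be sketched on the CPU. This is a minimal C++ sketch under stated assumptions, not the demo's shader code: TileLightLists, accumulateLight, and the intensity array are hypothetical names, and a single float per light stands in for real direct and indirect shading.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the per-tile light lists.
struct TileLightLists {
    std::vector<std::uint32_t> start;    // first entry of each tile in `indices`
    std::vector<std::uint32_t> end;      // one past the last entry of each tile
    std::vector<std::uint32_t> indices;  // indices into the scene light array
};

// Sums a per-light contribution for one pixel that falls in tile `tileId`.
// `intensity` is an assumed placeholder for the real lighting math.
float accumulateLight(const TileLightLists& lists, std::uint32_t tileId,
                      const std::vector<float>& intensity) {
    assert(tileId < lists.start.size() && tileId < lists.end.size());
    float total = 0.0f;
    for (std::uint32_t i = lists.start[tileId]; i < lists.end[tileId]; ++i) {
        std::uint32_t lightIdx = lists.indices[i];  // entry is an index into the light array
        total += intensity[lightIdx];
    }
    return total;
}
```

Only the lights listed for the pixel's tile are visited, which is what keeps the per-pixel loop short even when the scene holds thousands of lights.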

Algorithm summary (continued)
- Material shader
  - Decides what to do with the total incoming light, for example passing it into the material's BRDF.
  - Uses light accumulation building blocks: env. lighting, base light accumulation, BRDF, etc. are put together for the final pixel color.

Light Culling Shader Details (1/3)
    // 1. prepare
    float4 frustum[4];
    float minZ, maxZ;
    {
        ConstructFrustum( frustum );
        minZ = thread_REDUCE(MIN, depth );
        maxZ = thread_REDUCE(MAX, depth );
        ldsMinZ = SIMD_REDUCE(MIN, minZ );
        ldsMaxZ = SIMD_REDUCE(MAX, maxZ );
        minZ = ldsMinZ;
        maxZ = ldsMaxZ;
    }
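The min/max reduction in this step produces the depth range covered by a tile. A serial C++ reference, assuming the tile's depth samples are available as a flat array of floats, might look like the following; tileDepthBounds is a hypothetical helper, and the shader performs the same reduction in parallel through local data share (LDS) rather than in a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Returns {minZ, maxZ} over the depth samples of one tile.
std::pair<float, float> tileDepthBounds(const std::vector<float>& tileDepths) {
    assert(!tileDepths.empty());
    auto mm = std::minmax_element(tileDepths.begin(), tileDepths.end());
    return {*mm.first, *mm.second};
}
```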

Light Culling Shader Details (2/3)
    __local u32 ldsNLights = 0;
    __local u32 ldsLightBuffer[MAX];
    // 2. overlap check, accumulate in LDS
    for(int i=threadIdx; i<nLights; i+=WG_SIZE)
    {
        Light light = fetchAndTransform( lightBuffer[ i ] );
        if( overlaps( light, frustum ) && overlaps ( light, minZ, maxZ ) )
            AtomicAppend( ldsLightBuffer, i );
    }
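The slide does not show the two overlaps() tests. One plausible form, sketched here in C++, treats each light as a view-space bounding sphere, the tile frustum as four inward-facing normalized side planes, and depth as increasing away from the camera. The Plane and Sphere types and both function names are assumptions made for illustration, not the demo's implementation.

```cpp
#include <array>
#include <cassert>

struct Plane  { float nx, ny, nz, d; };    // inward-facing, normalized: n.p + d >= 0 is inside
struct Sphere { float x, y, z, radius; };  // view-space bounds of a light

// True unless the sphere lies entirely outside one of the four side planes.
bool overlapsFrustum(const Sphere& s, const std::array<Plane, 4>& frustum) {
    for (const Plane& p : frustum) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// True if the sphere's depth extent intersects the tile's [minZ, maxZ] range.
bool overlapsDepth(const Sphere& s, float minZ, float maxZ) {
    assert(minZ <= maxZ);
    return s.z + s.radius >= minZ && s.z - s.radius <= maxZ;
}
```

The plane test is conservative: a sphere near a frustum corner can pass all four planes without touching the tile, which costs some extra shading work but never drops a light that matters.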

Light Culling Shader Details (3/3)
    // 3. export to global
    __local u32 ldsOffset;
    if( threadIdx == 0 )
    {
        ldsOffset = AtomAdd( ldsNLights );
        globalLightStart[tileIdx] = ldsOffset;
        globalLightEnd[tileIdx] = ldsOffset + ldsNLights;
    }
    for(int i=threadIdx; i< ldsNLights; i+=WG_SIZE)
    {
        int dstIdx = ldsOffset + i;
        globalLightIndexBuffer[dstIdx] = ldsLightBuffer[i];
    }
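The export step reserves a contiguous range in the global index buffer for each tile and records where it starts and ends. A CPU sketch of that idea follows, assuming AtomAdd bumps a shared counter and returns its previous value; GlobalLightLists and exportTile are hypothetical names, and std::atomic stands in for the GPU atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GlobalLightLists {
    std::atomic<std::uint32_t> counter{0};   // next free slot in indexBuffer
    std::vector<std::uint32_t> start, end;   // per-tile range into indexBuffer
    std::vector<std::uint32_t> indexBuffer;  // packed light indices for all tiles
    GlobalLightLists(std::size_t tiles, std::size_t capacity)
        : start(tiles), end(tiles), indexBuffer(capacity) {}
};

// Copies one tile's culled light list into the packed global buffer.
void exportTile(GlobalLightLists& g, std::uint32_t tileIdx,
                const std::vector<std::uint32_t>& tileLights) {
    std::uint32_t n = static_cast<std::uint32_t>(tileLights.size());
    std::uint32_t offset = g.counter.fetch_add(n);  // returns the value before the add
    assert(offset + n <= g.indexBuffer.size());
    g.start[tileIdx] = offset;
    g.end[tileIdx] = offset + n;
    for (std::uint32_t i = 0; i < n; ++i) g.indexBuffer[offset + i] = tileLights[i];
}
```

Because ranges are handed out in whatever order tiles finish, the packed buffer is not sorted by tile; the start/end arrays are what make each tile's slice addressable.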

Light Accumulation Pseudo-code
    // BaseLighting.inc
    // THIS INC FILE IS ALL THE COMMON LIGHTING CODE
    StructuredBuffer<float4> LightParams : register(u0);
    StructuredBuffer<uint> LowerBoundLights : register(u1);
    StructuredBuffer<uint> UpperBoundLights : register(u2);
    StructuredBuffer<int2> LightIndexBuffer : register(u3);

    uint GetTileIndex(float2 screenPos)
    {
        float tileRes = (float)m_tileRes;
        uint numCellsX = (m_width + m_tileRes - 1)/m_tileRes;
        uint tileIdx = floor(screenPos.x/tileRes)+floor(screenPos.y/tileRes)*numCellsX;
        return tileIdx;
    }
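The arithmetic in GetTileIndex is easy to check on the CPU. The C++ sketch below mirrors it with integer math, valid for non-negative screen positions; tileIndex is a hypothetical name, and the 8-pixel tile size and 1280-pixel width used in the worked example are assumptions, since the slides do not state m_tileRes.

```cpp
#include <cassert>
#include <cstdint>

// Maps a pixel position to its tile index in a row-major tile grid.
std::uint32_t tileIndex(float screenX, float screenY,
                        std::uint32_t width, std::uint32_t tileRes) {
    assert(tileRes > 0);
    std::uint32_t numCellsX = (width + tileRes - 1) / tileRes;  // round up to cover partial tiles
    std::uint32_t tx = static_cast<std::uint32_t>(screenX) / tileRes;
    std::uint32_t ty = static_cast<std::uint32_t>(screenY) / tileRes;
    return tx + ty * numCellsX;
}
```

With a width of 1280 and 8-pixel tiles there are 160 tiles per row, so the pixel at (100.5, 20.5) lands in column 12 of row 2, giving index 12 + 2 * 160 = 332. The round-up matters when the width is not a multiple of the tile size: a width of 1283 gives 161 tiles per row.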

Light Accumulation (2):
    StartHLSL BaseLightLoopBegin // THIS IS A MACRO, INCLUDED IN MATERIAL SHADERS
    uint tileIdx = GetTileIndex( pixelScreenPos );
    uint startIdx = LowerBoundLights[tileIdx];
    uint endIdx = UpperBoundLights[tileIdx];
    [loop]
    for ( uint lightListIdx = startIdx; lightListIdx < endIdx; lightListIdx++ ) {
        int lightIdx = LightIndexBuffer[lightListIdx];
        // Set common light parameters
        float ndotl = max(0, dot(normal, lightVec));
        float3 directLight = 0;
        float3 indirectLight = 0;

Light Accumulation (3):
        if( lightIdx >= numDirectLightsThisFrame ) {
            CalculateIndirectLight(lightIdx , indirectLight);
        } else {
            if( IsConeLight( lightIdx ) ) { // <<== Can add more light types here
                CalculateDirectSpotlight(lightIdx , directLight);
            } else {
                CalculateDirectSpherelight(lightIdx , directLight);
            }
        }
        float3 incomingLight = (directLight + indirectLight)*ndotl;
        float shadowTerm = CalcShadow();
    EndHLSL
    StartHLSL BaseLightLoopEnd

Material Shader Template:
    #include "BaseLighting.inc"
    float4 PS ( PSInput i ) : SV_TARGET
    {
        float3 totalDiffuse = 0;
        float3 totalSpec = GetEnvLighting();
        $include BaseLightLoopBegin
        // unique material code goes here!! Light accumulation on the pixel for a given light
        // we have total incoming light and direct/indirect light components as well as material params and shadow term
        // use these building blocks to integrate lighting terms
        totalDiffuse += GetDiffuse(incomingLight);
        totalSpec += CalcPhong(incomingLight);
        $include BaseLightLoopEnd
        float3 finalColor = totalDiffuse + totalSpec;
        return float4( finalColor, 1 );
    }

Debug Mode Demo

Benchmark 3k dynamic lights

Compute-based Deferred vs. Forward+
Takahiro Harada, Jay McKee, Jason C. Yang, Forward+: Bringing Deferred Lighting to the Next Level, Eurographics Short Paper (2012)

Depth Pre-Pass Critical
- Pixel overdraw cripples this technique, so a depth pre-pass is required.
- The depth pre-pass is a good opportunity to use MRT to generate other full-screen data needed for post-fx and other render fx (optional).

Other important points
- The Xbox 360 has good bandwidth, so given the limitations of forward rendering, deferred makes a lot of sense there.
- However, ALU computation is growing at a faster rate than bandwidth, so it is more and more feasible to just do the calculations than to read/write so much data.
- Dynamic branching penalties are not nearly as bad as before. As an optimization, the compute shader can sort by light type, for example, to minimize penalties.
- All the CPU-side "light management" code that decides which lights hit each object for setting constant registers can be ditched!
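The sort-by-light-type optimization mentioned above can be illustrated on the CPU. The sketch below is an assumption about what such a sort could look like, not the demo's compute shader: the LightType enum and sortTileLightsByType are hypothetical, and std::stable_sort stands in for whatever sorting the GPU pass would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class LightType : std::uint8_t { Sphere = 0, Cone = 1, Indirect = 2 };

// Reorders one tile's light indices so lights of the same type are adjacent.
// Neighbouring loop iterations then tend to take the same branch.
void sortTileLightsByType(std::vector<std::uint32_t>& tileLights,
                          const std::vector<LightType>& typeOfLight) {
    std::stable_sort(tileLights.begin(), tileLights.end(),
        [&](std::uint32_t a, std::uint32_t b) {
            assert(a < typeOfLight.size() && b < typeOfLight.size());
            return typeOfLight[a] < typeOfLight[b];
        });
}
```

Grouping by type means the per-light branch in the accumulation loop changes direction only a few times per tile instead of potentially on every iteration.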

Summary
- Modified forward renderer that handles scenes with thousands of lights.
- Hardware anti-aliasing (MSAA) is “automatic”.
- Bandwidth friendly.
- Makes the most of the GPU's ALU power (which is growing faster than bandwidth).

Thanks!
Contact: Takahiro.Harada@amd.com, jay.mckee@amd.com, jasonc.yang@amd.com
Leo Demo website: http://developer.amd.com/samples/demos/pages/AMDRadeonHD7900SeriesGraphicsReal-TimeDemos.aspx
Eurographics 2012: 'Forward+: Bringing Deferred Lighting to the Next Level'