The Intersection of Game Engines & GPUs: Current & Future

Presentation transcript:

The Intersection of Game Engines & GPUs: Current & Future 2.5 Johan Andersson Rendering Architect

Agenda. Goal: Share and discuss current & future graphics use cases in our games and implications for graphics hardware. Areas: Engine overview, Shaders, Parallelization, Texturing, Raytracing, GPU compute. Conclusions. Q & A. Very big topic! The last time I talked about these topics was with one of the IHVs and that lasted 12 hours straight; I will try to keep this a little bit shorter. Coffee break afterwards, so the repercussions of me rambling on for too long shouldn't be too bad. Parallelization, multi-core dispatch

Frostbite. DICE proprietary engine. Xbox 360, PS3, Windows (Direct3D 10). Focus: Large outdoor environments, singleplayer & multiplayer, destruction! New: Content workflows. Started developing in 2004 as a way to transition to the next-generation consoles, Xbox 360 and PS3. BFBC pilot project. Release next week.

BFBC screenshot

BFBC screenshot

Graph-based surface shaders. Rich high-level shading framework, used by all content & systems. Artist-friendly: Easy to create, tweak & manage. Flexible: Programmers & artists can extend & expose features. Data-centric: Encapsulates resources. Transformable. Independent of lighting & environment. Rich data-centric control flow: No need to manually specialize shaders to enable/disable features. Calculations can be done on any level: per-pixel, per-vertex, per-object, per-frame. Split to multiple passes.

Shader permutations. Generate shader permutations for each used combination of features/data. HLSL vertex & pixel shaders. Many features = permutation explosion: shader graphs, lighting, geometry. Balance perf. vs permutations vs features: Dynamic branching. Live with many permutations.
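To make the permutation explosion on this slide concrete, here is a minimal sketch that enumerates every combination of a few feature switches and then keeps only the combinations that content actually uses. The feature names and values are hypothetical and are not taken from the Frostbite shader system.

```python
from itertools import product

# Hypothetical feature switches; names and values are illustrative only.
FEATURES = {
    "lighting": ["none", "point", "directional"],
    "fog": [False, True],
    "skinning": [False, True],
    "normal_map": [False, True],
}


def all_permutations(features):
    """Enumerate every combination of feature values as a dict."""
    names = list(features)
    return [dict(zip(names, combo)) for combo in product(*features.values())]


def used_permutations(requests):
    """Keep only the distinct combinations that were actually requested."""
    return {tuple(sorted(req.items())) for req in requests}


perms = all_permutations(FEATURES)
print(len(perms))  # 3 * 2 * 2 * 2 = 24 candidate shaders

# Only the combinations that content uses need to be compiled.
requests = [perms[0], perms[0], perms[5]]
print(len(used_permutations(requests)))  # 2 distinct shaders
```

Each added two-state feature doubles the candidate count, which is why generating shaders only for used combinations matters.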

Shader subroutines. Next step: Static subroutine linking. Inline in all subroutines at call site. Similar to a switch statement. Reduces # permutations. Implementation moved to driver or GPU. Doesn't work with instancing. Future step: Dynamic subroutines. Control function pointers inside shader. Problem solved, but coherency important. Static: Virtual call to evaluateLighting. Select linking of shader before draw call. Separate fog and lighting permutations. Dynamic subroutines: Direct connection with our instanced shader graph. Good coherency for good performance. Use cases: Vary per instance, texel, and more.

Rendering & Parallelization Waterfall: Client game -> cull -> visible entity -> render primitive -> system -> shader backend

Jobs. Must utilize multi-core: 6 HW threads on Xbox 360, 6 SPUs on PS3, 2-8 cores on PC. Job definition: Fully independent stateless function (PS3 SPU requirement). Graph dependencies. Task-parallel and data-parallel. Better code structure! Gustafson's Law: Fixed 33 ms/f.
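The reference to Gustafson's Law can be read as follows: with the frame time fixed at 33 ms, extra cores are spent on doing more work per frame, not on finishing the same work sooner. The scaled speedup is S(N) = s + (1 - s) * N, where s is the serial fraction of the frame and N the number of cores. The sketch below evaluates it for the core counts listed on the slide; the 10% serial fraction is an assumed value chosen only for illustration.

```python
def gustafson_speedup(cores, serial_fraction):
    """Scaled speedup under Gustafson's Law for a fixed time budget."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return serial_fraction + (1.0 - serial_fraction) * cores


# Assumed serial fraction of 10%; core counts are the ones listed on the slide.
for cores in (2, 6, 8):
    print(cores, gustafson_speedup(cores, 0.10))
```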

Rendering jobs. Refactor rendering systems to jobs. Most will move to GPU eventually: One-way data flow. Compute shaders & stream output. Jobs: Decal projection. Particle simulation. Terrain geometry processing. Undergrowth generation [2]. Frustum culling. Occlusion culling. Command buffer generation. PS3: Triangle culling. Divided up all the different rendering systems into individual parallelizable jobs. Jobs: functions, standalone. One-way data flow, CPU to GPU and not back. Decal projection: GS & stream out.

Parallel command buffer recording. Dispatch draw calls and state to multiple command buffers in parallel. Scales linearly with # cores. 1500-4000 draw calls per frame. Super-important for all platforms, used on: Xbox 360, PS3 (SPU-based). No support in DX10! One of the first optimizations for multi-core that we did was to move all rendering dispatch to a separate thread. This includes all the draw calls and states that we set to D3D. This helps a lot, but it doesn't scale that well as we only utilize a single extra core. Gather
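The structure described on this slide can be sketched as follows: the frame's draw calls are split into contiguous chunks, each chunk is recorded into its own command buffer by a worker, and the buffers are then submitted in the original order. This is only an illustration of the data flow. The command tuples and the dict layout of a draw call are made up for the example, and Python threads will not show the linear scaling that native jobs on console hardware would.

```python
from concurrent.futures import ThreadPoolExecutor


def record_chunk(draw_calls):
    """Record one chunk of draw calls into its own command buffer (a list)."""
    buffer = []
    for call in draw_calls:
        buffer.append(("set_state", call["state"]))
        buffer.append(("draw", call["mesh"]))
    return buffer


def record_parallel(draw_calls, workers):
    """Split draw calls into contiguous chunks and record them concurrently."""
    chunk = max(1, (len(draw_calls) + workers - 1) // workers)
    chunks = [draw_calls[i:i + chunk] for i in range(0, len(draw_calls), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so draw order is kept.
        return list(pool.map(record_chunk, chunks))


def submit(buffers):
    """Play back the command buffers one after another."""
    return [command for buffer in buffers for command in buffer]


calls = [{"state": i % 3, "mesh": f"mesh{i}"} for i in range(10)]
buffers = record_parallel(calls, workers=4)
print(len(buffers), len(submit(buffers)))
```

Because each chunk writes only to its own buffer, the workers need no locking; ordering is restored at submission time.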

DX10 parallel command buffer rec. Single most important DX10 issue, for us and many others (in the future). Until future API support: Reduce draw calls with instancing (trade GPU performance for CPU performance). Reduce state & constant updates (slow dynamic constant path). Manual software command buffers: Difficult to update dynamic resources efficiently in parallel due to API. 20 ms DX10 dispatch for 2000 draw calls. Semi-static constant buffer pain. Constant buffer as resource.
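The figure quoted on this slide works out to 20 ms / 2000 = 0.01 ms (10 microseconds) of CPU dispatch per draw call. A rough sketch of the instancing workaround follows: draw calls that share a mesh and material are merged into one instanced call. The tuple layout is hypothetical, and the cost estimate assumes an instanced call costs the same CPU time as a single ordinary call, which the slides do not state.

```python
from collections import defaultdict

# From the slide: 20 ms of dispatch for 2000 draw calls.
MS_PER_DRAW_CALL = 20.0 / 2000


def batch_instances(draw_calls):
    """Group (mesh, material, transform) draws into per-(mesh, material) batches."""
    batches = defaultdict(list)
    for mesh, material, transform in draw_calls:
        batches[(mesh, material)].append(transform)
    return dict(batches)


def dispatch_cost_ms(call_count):
    """Estimated CPU dispatch time, assuming a constant cost per call."""
    return call_count * MS_PER_DRAW_CALL


draws = [("tree", "bark", i) for i in range(500)] + [("rock", "stone", i) for i in range(300)]
batches = batch_instances(draws)
print(dispatch_cost_ms(len(draws)), dispatch_cost_ms(len(batches)))
```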

PS3 geometry processing (1/2). Slow GPU triangle & vertex setup. Unique situation with "free" processors: Not fully utilized. Solution: SPU triangle culling. Trade SPU time for GPU performance. Cull back faces, micro-triangles, frustum. Sony PS3 EDGE library. 5 jobs process frame geometry in parallel. Output is new index buffer for each draw call: Only visible triangles.
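As an illustration of the "new index buffer with only visible triangles" idea, the sketch below culls back-facing and zero-area triangles and emits a compacted index list. It is not the EDGE library API. It assumes vertex positions are already projected to 2D screen space and that counter-clockwise winding means front-facing; micro-triangle and frustum culling are left out.

```python
def signed_area2(a, b, c):
    """Twice the signed area of a 2D triangle; positive when counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def cull_triangles(positions, indices):
    """Return a new index list without back-facing or zero-area triangles."""
    visible = []
    for i in range(0, len(indices) - 2, 3):
        i0, i1, i2 = indices[i:i + 3]
        if signed_area2(positions[i0], positions[i1], positions[i2]) > 0:
            visible.extend((i0, i1, i2))
    return visible


positions = [(0, 0), (1, 0), (0, 1), (1, 1)]
indices = [0, 1, 2,  0, 2, 1,  1, 3, 2,  0, 0, 1]
print(cull_triangles(positions, indices))  # [0, 1, 2, 1, 3, 2]
```

The second triangle is the first one with reversed winding, and the fourth is degenerate, so both are dropped before the GPU sees them.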

PS3 geometry processing (2/2). Great flexibility and programmability! Custom processing: Partition bounding box culling. Triangle part culling. Clip plane triangle trivial accept & reject. Triangle cull volumes (inverse clip planes). Future: No vertex & geometry shaders. DIY compute shaders with fixed-func tessellation and triangle setup units. Output buffer streaming still important. Initially very skeptical. Intrinsics problematic.

Occlusion culling. Buildings occlude objects. Tons of objects. Difficult to implement: Building destruction. Dynamic occludees. Heavy GPU occlusion queries. Invisible objects still have to: Update logic & animations. Generate command buffer. Processed on CPU & GPU.

Software occlusion culling. Solution: Rasterize coarse z-buffer on SPU/CPU. Low-poly occluder meshes: 100 m view distance. Max 10000 vertices/frame. Manually conservative. 256x114 float z-buffer. Created for PS3, now on all platforms. Cull all objects against z-buffer before they are passed to all other systems = big savings. Screen-space bbox test. Conservative.
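A minimal sketch of the screen-space bounding-box test against a coarse z-buffer follows. To stay short it simplifies heavily: occluders are drawn as axis-aligned screen rectangles at their farthest depth, where the slide describes rasterized low-poly meshes, and smaller depth means closer to the camera. Both choices are assumptions made for this example. Only the 256x114 buffer size comes from the slide.

```python
import math

WIDTH, HEIGHT = 256, 114  # z-buffer resolution quoted on the slide


def make_zbuffer():
    """Create a z-buffer cleared to 'infinitely far away'."""
    return [[math.inf] * WIDTH for _ in range(HEIGHT)]


def clamp_rect(rect):
    """Clamp a half-open rectangle (x0, y0, x1, y1) to the buffer bounds."""
    x0, y0, x1, y1 = rect
    return max(0, x0), max(0, y0), min(WIDTH, x1), min(HEIGHT, y1)


def draw_occluder(zbuf, rect, far_depth):
    """Write the occluder's farthest depth so the buffer stays conservative."""
    x0, y0, x1, y1 = clamp_rect(rect)
    for y in range(y0, y1):
        row = zbuf[y]
        for x in range(x0, x1):
            row[x] = min(row[x], far_depth)


def is_visible(zbuf, rect, near_depth):
    """Object is visible if its nearest depth passes at any covered texel."""
    x0, y0, x1, y1 = clamp_rect(rect)
    for y in range(y0, y1):
        row = zbuf[y]
        for x in range(x0, x1):
            if near_depth <= row[x]:
                return True
    return False


zbuf = make_zbuffer()
draw_occluder(zbuf, (10, 10, 100, 100), far_depth=50.0)
print(is_visible(zbuf, (20, 20, 40, 40), near_depth=80.0))    # hidden behind the occluder
print(is_visible(zbuf, (90, 90, 110, 110), near_depth=80.0))  # sticks out past its edge
```

Using the occluder's farthest depth and the object's nearest depth means the test can only err toward keeping an object, never toward wrongly culling it.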

GPU occlusion culling. Want GPU rasterization & testing, but: Occlusion queries introduce overhead & latency (can be manageable, not ideal). Conditional rendering only helps GPU, not CPU, frame memory or draw calls. Future 1: Low-latency extra GPU exec context. Rasterization and testing done on GPU. Lockstep with CPU. Future 2: Move entire cull & rendering to GPU: scene graph, cull, systems, dispatch. End goal. Want to rasterize on GPU, not CPU. CPU <-> GPU job dependencies.

Texturing

Texture formats. Using: DXT1/5 color maps, sRGB. BC5 (3Dc) normal maps. BC4 (DXT5A) for grayscale masks. sRGB support for BC4/5 would be nice. DXT1 replacement needed: Low quality. 565 color bleeding. RG/RGB masks compress badly. HDR envmaps & lightmaps. DXT color bleed. BC4/5 sRGB for orthogonality and grayscale colormaps. Color bleeding on upsampling, lowres colormaps. Transcode BC7 to DXT1 on low-spec. HDR format, BC6? Not using RGBE yet. RGB DXT1 mask.

Future texture sampling. Texture sampling derivatives: 1st order texel derivatives. 2nd order as well? Implement in sampler unit: Bad performance or quality with shader sampling. Artifacts with ddx/ddy technique. Replace normalmaps with easily compressed bumpmaps. Bicubic upsampling: Terrain masks. Terrain heightmap. When it comes to the texture samplers on the GPUs, that is one fixed-function unit that I would actually want to extend some more. The terrain geometry in Frostbite is represented as a 16-bit integer heightfield to be able to easily support destruction, where we can just render into the heightfield. To light the terrain correctly we C1 continuity Shader sampling vs sampler HW: What matters to us is performance. Derived normals [2].
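Deriving a normal from a heightfield, as mentioned on this slide, amounts to taking first-order height derivatives. The sketch below uses central differences on the CPU purely for illustration; it is not the shader from [2]. The z-up convention, the `spacing` and `height_scale` parameters, and the clamping at the borders are assumptions, and the grid must be at least 2x2.

```python
import math


def heightfield_normal(heights, x, y, spacing=1.0, height_scale=1.0):
    """Unit normal at texel (x, y) from central differences, z pointing up."""
    rows, cols = len(heights), len(heights[0])
    # Clamp neighbours at the borders (falls back to one-sided differences).
    xl, xr = max(x - 1, 0), min(x + 1, cols - 1)
    yd, yu = max(y - 1, 0), min(y + 1, rows - 1)
    dhdx = height_scale * (heights[y][xr] - heights[y][xl]) / ((xr - xl) * spacing)
    dhdy = height_scale * (heights[yu][x] - heights[yd][x]) / ((yu - yd) * spacing)
    nx, ny, nz = -dhdx, -dhdy, 1.0
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


# A ramp that rises by one height unit per texel along x.
ramp = [[float(x) for x in range(4)] for _ in range(4)]
print(heightfield_normal(ramp, 1, 1))
```

With 16-bit integer heights, `height_scale` would convert stored values to world units before the slope is computed.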

Current sparse textures. Save memory for terrain: Static quadtree mask texture. Dynamic sparse destruction mask. Implementation: Indirection texture lookup in atlas. Arrays too small, want 8192 slices. Correct bilinear filtering by borders. Siggraph'07 course for details [2]. Source mask. Atlas, texture arrays too small. Atlas texture.
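The indirection lookup on this slide can be sketched like this: a small indirection table maps each virtual tile to a slot in the atlas, and the texel offset inside the tile is carried over unchanged. The tile size, atlas layout and the use of `None` for unallocated tiles are assumptions for the example, and the border texels needed for correct bilinear filtering are omitted.

```python
TILE = 64                # tile size in texels (assumed)
ATLAS_TILES_PER_ROW = 4  # atlas layout (assumed)


def atlas_texel(indirection, u, v):
    """Map a virtual texel (u, v) to atlas texel coordinates, or None if empty."""
    tile_x, tile_y = u // TILE, v // TILE
    slot = indirection[tile_y][tile_x]
    if slot is None:
        return None  # tile not allocated: nothing stored for this region
    atlas_x = (slot % ATLAS_TILES_PER_ROW) * TILE + u % TILE
    atlas_y = (slot // ATLAS_TILES_PER_ROW) * TILE + v % TILE
    return atlas_x, atlas_y


# 2x2 virtual tiles, of which only two are backed by atlas slots.
indirection = [[None, 0],
               [5, None]]
print(atlas_texel(indirection, 70, 3))    # (6, 3)
print(atlas_texel(indirection, 10, 100))  # (74, 100)
```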

HW sparse textures. Virtual texture: HW texture filtering & mipmapping. Fallback on non-resident tile access: Lower mipmap, default value or shader bool. At least 32k x 32k, fp issues with larger? Application-controlled tile commit/free: ~128 x 128 tiles. Feedback mechanism for referenced tiles. Easy view-dependent allocation. Future: Latency-free allocation & generation. Alt 1: CPU thread callback & block. Alt 2: Keep everything on GPU. "Command" shader? GPU control = No frame latency.
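With the numbers on this slide, a 32k x 32k texture split into 128 x 128 tiles has 256 x 256 = 65,536 tiles at the top mip level. The sketch below illustrates the "lower mipmap" fallback: when the tile for a texel is not resident, the lookup walks to coarser mip levels until it finds one that is. The set-of-keys residency table and the function names are hypothetical, not a hardware or API interface.

```python
TEXTURE_SIZE = 32 * 1024  # texels per side, from the slide
TILE_SIZE = 128           # texels per tile side, from the slide


def tiles_per_side(mip):
    """Number of tiles along one side of the given mip level."""
    return max(1, (TEXTURE_SIZE >> mip) // TILE_SIZE)


def resolve_tile(resident, x, y, max_mip):
    """Finest resident (mip, tile_x, tile_y) covering mip-0 texel (x, y)."""
    for mip in range(max_mip + 1):
        key = (mip, (x >> mip) // TILE_SIZE, (y >> mip) // TILE_SIZE)
        if key in resident:
            return key
    return None  # caller falls back to a default value


print(tiles_per_side(0) ** 2)  # 65536 tiles at mip 0
resident = {(2, 0, 0)}
print(resolve_tile(resident, 300, 200, max_mip=8))  # (2, 0, 0)
```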

Cached Procedural Unique Texturing. Unique dynamic sparse texture on all objects. Defined by texture shader graph. Combine procedurals, compositing, streaming and uv-space geometry. Dynamically commit & render visible tiles. Highly complex compositing, thanks to high frame-to-frame coherency. Upsample and refine. New dynamic effects made possible: Affect every surface. Motivation: Superset of Megatexture. Affect surfaces with destruction, decals, paint in uv-space.

Raytracing

Raytracing. Much recent debate & interest in RTRT. What we are interested in: Performance!! Rasterization for primary rays. Deterministic. Easy integration into engines: Just another method for certain effects & objects, not replace whole pipeline. Efficient dynamic geometry: Procedural & manual animation (foliage, characters). Destruction (foliage, buildings, objects). Middleware? Going in the direction of having more and more dynamic geometry.

Mirror’s Edge

Raytraced reflections wanted. Glass & metal: Mostly planar surfaces. Reflection locality: Correct reflections for important objects (main character). Simplified world geometry & shading for rest. Common for games. Brickmaps? [3] Cars not so important. Ratatouille.

Mirror’s Edge Soft reflections Soft reflections for floors & metal surfaces Generally more useful than sharp reflections

GPGPU

GPGPU uses. Effect physics: Particle vs world soft collision. AI pathfinding. AI visibility: View rasterization. Obstruction from smoke & foliage. Procedural animation: Trees, undergrowth, hair. Post-processing.

CUDA DOF post-process filter. Thesis work at DICE [4]. Test CUDA and performance. Poisson disc blur. Multi-passed diffusion. Separable diffusion. Good: Easy to learn (C). Map complex algorithms. Thread & memory control. Bad: Performance vs shaders. Beta interop. Vendor-specific. Circle of confusion map. Output.

GPU Compute programming model. Wanted: Easy & efficient Direct3D 10 interop. Low-latency compute tasks. Vendor-independent base interface: OpenCL? Efficient CPU multi-core backend: Server, older GPUs, debugging. MCUDA [5]. Eventually platform-independent: Future consoles. Mipmapping?

Conclusions Shader subroutines More software-controlled pipeline More texture sampler functionality Limited-case raytracing GPU compute for games

Questions? Contact: johan.andersson@dice.se

References
[1] Tatarchuk, Natasha & Andersson, Johan. "Rendering Architecture and Real-time Procedural Shading & Texturing Techniques". GDC 2007.
[2] Andersson, Johan. "Terrain Rendering in Frostbite using Procedural Shader Splatting". Siggraph 2007.
[3] Christensen, Per H. & Batali, Dana. "An Irradiance Atlas for Global Illumination in Complex Production Scenes". Eurographics Symposium on Rendering 2004.
[4] Lonroth, Per & Unger, Mattias. "Advanced Real-time Post-Processing using GPGPU techniques". Master thesis, 2008.
[5] Stratton, John, Stone, Sam & Hwu, Wen-mei. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report, University of Illinois at Urbana-Champaign, IMPACT-08-01, March 2008.

Bonus slides

Real-time REYES. Very interesting: Displacement mapping & procedurals. Stochastic sampling. Potentially more efficient & general, compared to maxed out rasterization & tessellation on everything = pixel-sized triangles. But: No experience. More research & experimentation needed.

Terrain detail Deriving normal from heightfield good in distance Future: HW tessellation & procedural displacement shaders for up close ground detail

Texture arrays. Use cases: Everything! Rich parameterized shaders: Vary slice index per instance, triangle or texel. Instancing without compromising on variation or perf. Cascaded shadow maps: HW PCF only in DX 10.1. Stable Cascaded Bounding Box Shadow Maps. Sparse textures: More slices plz, for tile pools. 64x64x8192. Tiling. Merge shaders, single brdf.
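For a sense of scale, the 64x64x8192 tile pool on this slide holds 64 * 64 * 8192 = 33,554,432 texels. The sketch below estimates its memory footprint for block-compressed formats. DXT1 stores a 4x4 block in 8 bytes and DXT5 or BC5 in 16 bytes; which format the pool would use is not stated on the slide, and mip levels are ignored.

```python
def tile_pool_bytes(tile_size, slices, bytes_per_block):
    """Memory for a texture-array tile pool of 4x4 block-compressed tiles."""
    blocks_per_tile = (tile_size // 4) ** 2
    return blocks_per_tile * slices * bytes_per_block


MIB = 1024 * 1024
print(tile_pool_bytes(64, 8192, 8) // MIB)   # DXT1: 16 MiB
print(tile_pool_bytes(64, 8192, 16) // MIB)  # DXT5 / BC5: 32 MiB
```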

Other raytracing uses. Global Illumination & Ambient Occlusion: Incremental Photon Mapping? Async collision raycasts: AI pathfinding, gameplay, sound obstruction. Separate collision world from visual world. CPU job-based now. Semi-static environment with destruction.