Visibility Optimization for Games


1 Visibility Optimization for Games
Sampo Lappalainen Lead Programmer Umbra Software Ltd.

2 Introduction
Background in graphics programming: Hybrid Graphics, NVIDIA, Umbra Software. With Umbra since 2008. Umbra: graphics middleware for console and PC games, with an emphasis on visibility.

3 Roadmap: Motivation, Theory, Practice, Other applications, Demo

4 Why is visibility optimization important?
Motivation

5 Game World

6 Our Villain

7 3/25/2017 9:14 AM Our Hero
We build a graphics engine that draws stuff on the screen: easy. An artist makes models, and the models are handed to the graphics engine and drawn. Engineers optimize the graphics engine and artists optimize the art until performance is acceptable. Then we end up in this situation...

8 Screen Shot
How did we get from the original situation to this? The technology limited what we could do so much that this was the best we could achieve.

9 Game Worlds: game developers want to make impressive game worlds
Hardware sets limits on what can and can't be done, so game developers need to push the hardware to its limits. Game developers build technology so that games can be made to look good, and artists could make even fancier content. The problem isn't drawing great-looking graphics; the problem is drawing great-looking graphics fast enough.

10 Visibility Optimization
The most effective way to gain performance in games. There are two basic ways to do visibility optimization: art and level design, and technology. Games use a mix of both.

11 Visibility Optimization by Level Design
Artists design game worlds so that performance is acceptable. This can be done in numerous ways, e.g.: limiting view distance, limiting polygon or object count, modeling portals and cells.

12 Visibility Optimization by Level Design

13 Visibility Optimization by Level Design
Time-consuming and usually boring work. Sets huge limits on what can and cannot be done. May lead to monotonous level design. Manual, one-off work.

14 Visibility Optimization by Technology

15 Visibility Optimization by Technology

16 Visibility Optimization by Technology
Gains: no time wasted on rendering objects that don't contribute to the output image (no state changes, no draw calls, etc.). AI, physics, game logic, etc. can be done at lower accuracy (or skipped altogether) for hidden objects.

17 Walkthrough of the key concepts
Theory

18 Terminology
Culling: removing hidden objects from rendering
Target: an object that can be hidden by others
Occluder: an object that blocks visibility
Rendering artifact: an unintended glitch in the output image

19 Metrics for comparison
GPU cost, CPU cost, overall frame time, memory usage, precomputation time, manual work, culling power

20 Backface culling: taken care of by the hardware
Culls entire triangles based on their winding (the order in which their vertices appear on screen). No need to render the inside surfaces of a closed object.
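A minimal sketch of the winding test, assuming vertices already projected to screen space in a y-up coordinate system and counter-clockwise front faces (both are conventions chosen for illustration, not taken from the slides). The sign of the triangle's signed area gives its winding:

```cpp
#include <cassert>

// Hypothetical screen-space vertex (after projection).
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c). Positive means the vertices
// are ordered counter-clockwise in a y-up coordinate system.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Assumes counter-clockwise is the front-facing winding.
// Degenerate triangles (zero area) are culled as well.
bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea2(a, b, c) <= 0.0f;
}
```

With a y-down (window) coordinate system the sign flips, which is one reason graphics APIs let you choose which winding counts as front-facing.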

21 Depth buffering: taken care of by the hardware
A two-dimensional buffer that stores a z-value for each screen pixel. Before running the shaders for a pixel, test its z-value against the stored one. Allows drawing unsorted geometry; however, sorting front to back still greatly improves performance, because hidden pixels then fail the test before they are shaded.
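The software sketch below mimics the per-pixel depth test and counts how often a stand-in pixel shader runs, to illustrate why sorting still pays off. `DepthBuffer` and `drawLayer` are hypothetical names, and smaller z is assumed to mean closer:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Minimal software depth buffer, a sketch of the per-pixel hardware test.
// Smaller z means closer to the camera (an assumed convention).
struct DepthBuffer {
    int width, height;
    std::vector<float> z;
    long shadedPixels = 0;  // how many times the (expensive) shader ran

    DepthBuffer(int w, int h)
        : width(w), height(h),
          z(static_cast<std::size_t>(w) * h, std::numeric_limits<float>::infinity()) {}

    // Depth test first: only "shade" and write the pixel if it is closer.
    void drawPixel(int x, int y, float depth) {
        float& stored = z[static_cast<std::size_t>(y) * width + x];
        if (depth >= stored) return;  // hidden: skip the shader entirely
        ++shadedPixels;               // stand-in for running the pixel shader
        stored = depth;
    }

    // Hypothetical helper: draw a full-screen layer at a constant depth.
    void drawLayer(float depth) {
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x) drawPixel(x, y, depth);
    }
};
```

Drawing three full-screen layers at depths 1, 2 and 3 produces the same final depth buffer in either order, but front to back shades each pixel once, while back to front shades it three times.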

22 Hierarchical depth buffering
Replace the depth buffer with a depth pyramid. Bottom of the pyramid: the full-resolution depth buffer. Higher levels: lower-resolution depth buffers, where a single pixel stores the maximum z-value of a group of pixels in the level below. Rasterize a polygon hierarchically, starting from the highest level. If the polygon is farther than the recorded pixel, exit early. If the polygon is closer, test the lower levels hierarchically. If the bottom of the pyramid is reached and the polygon is still closer, write it and propagate the new value up the pyramid.
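A simplified sketch of the pyramid, assuming a square power-of-two depth buffer and smaller z meaning closer. It builds the max-z levels and runs the hierarchical early-exit test for a screen rectangle with a known nearest depth; unlike the full scheme on the slide, it only tests and never writes new values back up the pyramid. All names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Max-z depth pyramid (assumes a square power-of-two resolution). Level 0 is
// the full-resolution buffer; each texel of level i+1 stores the maximum
// (farthest) z of the 2x2 texels below it.
struct DepthPyramid {
    std::vector<std::vector<float>> levels;
    std::vector<int> sizes;

    DepthPyramid(const std::vector<float>& depth, int size) {
        levels.push_back(depth);
        sizes.push_back(size);
        while (sizes.back() > 1) {
            int s = sizes.back() / 2, bs = s * 2;
            const std::vector<float>& below = levels.back();
            std::vector<float> level(static_cast<std::size_t>(s) * s);
            for (int y = 0; y < s; ++y)
                for (int x = 0; x < s; ++x)
                    level[y * s + x] = std::max({below[2 * y * bs + 2 * x], below[2 * y * bs + 2 * x + 1],
                                                 below[(2 * y + 1) * bs + 2 * x], below[(2 * y + 1) * bs + 2 * x + 1]});
            levels.push_back(std::move(level));
            sizes.push_back(s);
        }
    }

    // Is the rectangle [x0,x1]x[y0,y1] (level-0 pixels, inclusive) with nearest
    // depth nearestZ hidden? Conservative: never hides a visible rectangle.
    bool isOccluded(int x0, int y0, int x1, int y1, float nearestZ) const {
        return occluded(static_cast<int>(levels.size()) - 1, 0, 0, x0, y0, x1, y1, nearestZ);
    }

private:
    bool occluded(int level, int tx, int ty, int x0, int y0, int x1, int y1, float nearestZ) const {
        int span = 1 << level;  // level-0 pixels covered by one texel per axis
        if (tx * span > x1 || (tx + 1) * span - 1 < x0 || ty * span > y1 || (ty + 1) * span - 1 < y0)
            return true;  // texel does not overlap the rectangle
        if (nearestZ > levels[level][ty * sizes[level] + tx]) return true;  // early exit: behind
        if (level == 0) return false;  // closer than the stored depth: potentially visible
        for (int cy = 0; cy < 2; ++cy)
            for (int cx = 0; cx < 2; ++cx)
                if (!occluded(level - 1, tx * 2 + cx, ty * 2 + cy, x0, y0, x1, y1, nearestZ)) return false;
        return true;
    }
};
```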

23 Spatial hierarchies: enable culling large portions of the game world with a single quick test. Dynamic objects can be moved in the hierarchy at runtime. Examples: BSP tree, kd-tree.

24 Spatial hierarchies

25 View frustum culling: culling objects that are outside the camera's view cone (the frustum). Test using object bounds. Tremendous speed-up when using a hierarchy.
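A sketch of frustum culling against axis-aligned bounding boxes, assuming the frustum is given as planes whose normals point inwards; `Node` is a hypothetical stand-in for whatever spatial hierarchy an engine uses. The hierarchy is where the speed-up comes from: one failed test skips an entire subtree.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };
// Plane n.p + d = 0; points with n.p + d >= 0 are on the inside.
struct Plane { Vec3 n; float d; };

// A box is outside if it lies entirely on the outer side of some plane.
// Testing only the corner farthest along the plane normal suffices.
// Conservative: some boxes just outside the frustum's corners pass the test.
bool outsideFrustum(const AABB& b, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        Vec3 v{p.n.x >= 0 ? b.max.x : b.min.x,
               p.n.y >= 0 ? b.max.y : b.min.y,
               p.n.z >= 0 ? b.max.z : b.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0) return true;
    }
    return false;
}

// Hypothetical bounding volume hierarchy node.
struct Node {
    AABB bounds;
    std::vector<Node> children;
    std::vector<int> objects;  // ids of objects stored in this node
};

// One test per visited node; a node outside the frustum culls its whole subtree.
void cullHierarchy(const Node& node, const std::vector<Plane>& planes,
                   std::vector<int>& visible, int& tests) {
    ++tests;
    if (outsideFrustum(node.bounds, planes)) return;
    visible.insert(visible.end(), node.objects.begin(), node.objects.end());
    for (const Node& child : node.children) cullHierarchy(child, planes, visible, tests);
}
```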

26 View Frustum Culling
There is still a lot to cull inside the view frustum.

27 View Frustum Culling

28 Potentially Visible Set (PVS)
A data structure that defines from-region visibility for a scene. Computed in a pre-process: the scene is divided into cells, and a bit matrix lists all the visible objects for each cell. At runtime, visibility is a simple matrix lookup. Open question: how to find a good subdivision for a scene? Cannot handle dynamic occluders. Target volumes: an extension to handle dynamic targets.
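A sketch of the runtime side of a PVS, assuming the pre-process has already assigned object ids and produced one bit per (cell, object) pair. The class name and memory layout are illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Precomputed PVS: one row of bits per cell, one bit per object.
// How cells and bits are computed offline is out of scope here.
class PotentiallyVisibleSet {
public:
    PotentiallyVisibleSet(int cellCount, int objectCount)
        : objects_(objectCount), words_((objectCount + 63) / 64),
          bits_(static_cast<std::size_t>(cellCount) * words_, 0) {}

    // Pre-process: object is visible from somewhere inside cell.
    void markVisible(int cell, int object) { bits_[index(cell, object)] |= bit(object); }

    // Runtime: a single lookup, no geometry involved.
    bool isVisible(int cell, int object) const {
        return (bits_[index(cell, object)] & bit(object)) != 0;
    }

    // Hypothetical helper: every object visible from the camera's cell.
    std::vector<int> visibleObjects(int cell) const {
        std::vector<int> out;
        for (int o = 0; o < objects_; ++o)
            if (isVisible(cell, o)) out.push_back(o);
        return out;
    }

private:
    std::size_t index(int cell, int object) const {
        return static_cast<std::size_t>(cell) * words_ + object / 64;
    }
    static std::uint64_t bit(int object) { return std::uint64_t{1} << (object % 64); }

    int objects_;
    int words_;
    std::vector<std::uint64_t> bits_;
};
```

Because the bits are computed offline for static geometry, a moving occluder such as a closing door cannot change them, which is the dynamic occluder limitation mentioned above.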

29 Portals: place portals in the scene that connect the cells to form a portal graph. At runtime, find the portals of the current cell that are in the frustum. Traverse through each found portal to the adjacent cell and find all of its portals that are visible to the camera through the original portal, recursively. Same limitations with dynamic objects as PVS systems.
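A simplified sketch of the recursive traversal, assuming each portal has already been projected to a screen-space rectangle for the current camera (real implementations usually clip the portal polygons instead). The visible screen region narrows at every portal, and traversal stops when it becomes empty. All type and function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Screen-space rectangle; empty when min > max on either axis.
struct Rect {
    float minX, minY, maxX, maxY;
    bool empty() const { return minX > maxX || minY > maxY; }
};

Rect intersect(const Rect& a, const Rect& b) {
    return {std::max(a.minX, b.minX), std::max(a.minY, b.minY),
            std::min(a.maxX, b.maxX), std::min(a.maxY, b.maxY)};
}

// Assumption: portals are pre-projected to screen rectangles for this camera.
struct Portal { int targetCell; Rect screenRect; };
struct Cell { std::vector<Portal> portals; };

// Visit 'cell' through the screen region 'view', then recurse through every
// portal that is still visible inside that region, narrowing it each time.
void traversePortals(const std::vector<Cell>& cells, int cell, const Rect& view,
                     std::vector<int>& path, std::set<int>& visibleCells) {
    visibleCells.insert(cell);
    path.push_back(cell);
    for (const Portal& portal : cells[cell].portals) {
        // Don't walk back into a cell already on the current path.
        if (std::find(path.begin(), path.end(), portal.targetCell) != path.end()) continue;
        Rect narrowed = intersect(view, portal.screenRect);
        if (!narrowed.empty())
            traversePortals(cells, portal.targetCell, narrowed, path, visibleCells);
    }
    path.pop_back();
}
```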

30 Rasterization-based: render occluder geometry into a software coverage buffer. Test visibility using test geometry. Use temporal coherence to determine the initial set to be rendered. Handles both dynamic targets and dynamic occluders, as long as they have occluder geometry.
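A minimal sketch of a software coverage buffer, assuming occluder triangles already transformed to pixel coordinates with a depth per vertex (smaller = closer). Targets are tested conservatively using their screen-space rectangle and nearest depth; the class and method names are illustrative, not Umbra's API:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct ScreenVertex { float x, y, z; };  // pixel coordinates plus depth

class CoverageBuffer {
public:
    CoverageBuffer(int w, int h)
        : w_(w), h_(h), depth_(static_cast<std::size_t>(w) * h, std::numeric_limits<float>::infinity()) {}

    // Rasterize one occluder triangle by testing pixel centers with edge functions.
    void drawOccluder(ScreenVertex a, ScreenVertex b, ScreenVertex c) {
        float area = edge(a, b, c.x, c.y);
        if (area == 0.0f) return;  // degenerate triangle covers no pixels
        int x0 = std::max(0, static_cast<int>(std::floor(std::min({a.x, b.x, c.x}))));
        int x1 = std::min(w_ - 1, static_cast<int>(std::ceil(std::max({a.x, b.x, c.x}))));
        int y0 = std::max(0, static_cast<int>(std::floor(std::min({a.y, b.y, c.y}))));
        int y1 = std::min(h_ - 1, static_cast<int>(std::ceil(std::max({a.y, b.y, c.y}))));
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                // Barycentric weights; dividing by area makes this winding-independent.
                float wa = edge(b, c, px, py) / area;
                float wb = edge(c, a, px, py) / area;
                float wc = edge(a, b, px, py) / area;
                if (wa < 0 || wb < 0 || wc < 0) continue;  // pixel center outside
                float z = wa * a.z + wb * b.z + wc * c.z;
                float& stored = depth_[static_cast<std::size_t>(y) * w_ + x];
                stored = std::min(stored, z);
            }
    }

    // A target is hidden if every occluder depth over its screen rectangle
    // is closer than the target's nearest depth.
    bool isHidden(int x0, int y0, int x1, int y1, float nearestZ) const {
        for (int y = std::max(0, y0); y <= std::min(h_ - 1, y1); ++y)
            for (int x = std::max(0, x0); x <= std::min(w_ - 1, x1); ++x)
                if (depth_[static_cast<std::size_t>(y) * w_ + x] >= nearestZ) return false;
        return true;
    }

private:
    // Twice the signed area of (a, b, p): tells which side of edge ab p lies on.
    static float edge(const ScreenVertex& a, const ScreenVertex& b, float px, float py) {
        return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
    }
    int w_, h_;
    std::vector<float> depth_;
};
```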

31 Testing from coverage buffer

32 Testing from coverage buffer

33 Testing from coverage buffer

34 Testing from coverage buffer

35 Testing from coverage buffer

36 Testing from coverage buffer

37 Testing from coverage buffer

38 Testing from coverage buffer

39 Occlusion Queries: supported by GPUs since 2001
The GPU answers the question: “how many pixels would have been visible if this object had been rendered?” Instead of rasterizing your own depth buffer, use the GPU depth buffer. Normally the query is done using bounding volumes (effective, but not required). No need for artist-generated occluder geometry. GPU-CPU synchronization is needed.

40 Occlusion Queries Determine the set of visible objects against the actual rendered geometry: all pixels can be used as occluding material!

41 Using Occlusion Queries
Occlusion queries are a really powerful tool for visibility optimization. Like all other GPU features, occlusion queries can be used ineffectively. Special tricks are needed to get the most out of them.

42 Issuing Occlusion Queries
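disableColorWrite();
disableDepthWrite();
startQueryCounter();
renderObjectBounds();      // rasterize only the bounding volume, no visible output
stopQueryCounter();
enableColorWrite();
enableDepthWrite();
// Reading the result waits until the GPU has processed the query
if (query->getResult() > 0)   // number of pixels that passed the depth test
    renderObject();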

43 CPU-GPU synchronization
With normal draw calls, the CPU issues a command to the GPU and can continue processing as usual (parallel processing). With occlusion queries, the CPU needs the query results back to know whether the object was visible. The CPU has to wait for the results to become available: no parallel processing (which is really bad).

44 Issuing Occlusion Queries
Object 3 has just become visible.

45 Issuing Occlusion Queries

46 Issuing Occlusion Queries

47 Issuing Occlusion Queries
Fortunately, GPU design has a solution for this problem: GPUs can store multiple occlusion query results, so occlusion queries can be batched. Some GPUs limit how many query results can be stored.

48 Batching Occlusion Queries
disableColorWrite();
disableDepthWrite();
for (each query) {
    startQueryCounter();
    renderObjectBounds();
    stopQueryCounter();
}
enableDepthWrite();
enableColorWrite();
// Read results only after all queries have been issued
for (each query) {
    if (query->getResult() > 0)
        renderObject();
}

49 Batching Occlusion Queries

50 Latent Occlusion Queries
Some stalls may still be introduced between frames: the last query result needs to be read back before continuing. Avoid GPU stalls by using the query results from the previous frame, reading them back at the beginning of each frame. Sounds like a perfect solution?

51 Latent Occlusion Queries

52 Latent Occlusion Queries
There are downsides to this. Visible popping artifacts appear when objects become visible. If the camera is moving slowly and the FPS is good, this is no problem. But when many objects become visible at once, the FPS typically drops (there is a lot more to render), so each one-frame delay lasts longer. A typical example is opening a door.
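The frame delay can be sketched with a mocked GPU, assuming a hypothetical `LatentCuller` that decides what to draw from the previous frame's results and draws objects it has no result for yet. The scenario in the test mirrors the door example: an object that was hidden last frame is skipped for one frame after it becomes visible.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Sketch of latent (one-frame-late) occlusion queries. The GPU is mocked by a
// callback that reports whether an object's bounds passed the depth test.
class LatentCuller {
public:
    // Decide what to draw this frame using last frame's results.
    // Objects without a result yet are drawn, to avoid popping on first sight.
    std::vector<int> objectsToRender(const std::vector<int>& inFrustum) const {
        std::vector<int> out;
        for (int id : inFrustum) {
            auto it = lastResults_.find(id);
            if (it == lastResults_.end() || it->second) out.push_back(id);
        }
        return out;
    }

    // Issue this frame's queries; results are consumed only next frame,
    // so the CPU never waits for the GPU.
    template <typename GpuQuery>
    void issueQueries(const std::vector<int>& inFrustum, GpuQuery query) {
        pending_.clear();
        for (int id : inFrustum) pending_[id] = query(id);
    }

    // Beginning of the next frame: the results are now available.
    void readBackResults() { lastResults_ = pending_; }

private:
    std::unordered_map<int, bool> lastResults_, pending_;
};
```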

53 Latent Occlusion Queries

54 Latent Occlusion Queries

55 Latent Occlusion Queries

56 Latent Occlusion Queries
Queries done to hierarchy nodes produce even larger artifacts. Growing the bounds helps, but is difficult to get working with hierarchical queries. The stall from using occlusion query results in the same frame may be as short as 0.1 ms (on Xbox 360). Is this a price developers are ready to pay for artifact-free occlusion culling?

57 Parallelism: most gaming platforms today come with more than one CPU
Using the same algorithm for multiple cameras (split screen, AI bots, light sources). Tile-based rasterization. Parallel data structure traversal.
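A sketch of the first idea, running the same read-only culling routine for several cameras in parallel with `std::thread`. The 1D interval "frustum" is a placeholder for a real visibility query, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder camera: a 1D interval stands in for a real view frustum.
struct Camera { float minX, maxX; };

std::vector<int> cullForCamera(const Camera& cam, const std::vector<float>& objectX) {
    std::vector<int> visible;
    for (int i = 0; i < static_cast<int>(objectX.size()); ++i)
        if (objectX[i] >= cam.minX && objectX[i] <= cam.maxX) visible.push_back(i);
    return visible;
}

// Same algorithm, one thread per camera. The scene is shared read-only and
// each thread writes only its own result slot, so no locking is required.
std::vector<std::vector<int>> cullAllCameras(const std::vector<Camera>& cameras,
                                             const std::vector<float>& objectX) {
    std::vector<std::vector<int>> results(cameras.size());
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < cameras.size(); ++c)
        workers.emplace_back([&, c] { results[c] = cullForCamera(cameras[c], objectX); });
    for (std::thread& t : workers) t.join();
    return results;
}
```

A production system would more likely hand these jobs to an existing task scheduler than spawn one thread per camera each frame.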

58 What kind of systems have really been used?
Practice

59 Binary Space Partitioning
As made famous by Doom and the Quake series. A tree data structure for representing the scene. Gordon and Chen's 1991 paper was used in Doom; Teller's 1992 PhD thesis was used in Quake.

60 Binary space partitioning
Before Doom, BSP trees were used to sort polygons for the painter's algorithm (back-to-front). The painter's algorithm is too slow for large scenes, because every hidden polygon is still drawn. Solution: change the order to front-to-back and keep track of which pixels have already been drawn. Quake introduced a pre-process step for computing a PVS based on the BSP model.
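A sketch of front-to-back ordering in 2D, assuming a hypothetical node layout where each interior node splits space with a line and each leaf lists object ids. Visiting the viewer's side of every split first yields front-to-back order; swapping the two recursive calls gives the painter's back-to-front order:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Minimal 2D BSP: interior nodes split the plane with the line
// a*x + b*y + c = 0; leaves hold the ids of objects in their region.
struct BspNode {
    float a = 0, b = 0, c = 0;
    std::unique_ptr<BspNode> front, back;  // front side: a*x + b*y + c >= 0
    std::vector<int> objects;              // only used by leaves
    bool isLeaf() const { return !front && !back; }
};

// Front-to-back: at each split, first visit the side the viewer is on.
void frontToBack(const BspNode* node, float vx, float vy, std::vector<int>& order) {
    if (!node) return;
    if (node->isLeaf()) {
        order.insert(order.end(), node->objects.begin(), node->objects.end());
        return;
    }
    bool viewerInFront = node->a * vx + node->b * vy + node->c >= 0;
    const BspNode* nearSide = viewerInFront ? node->front.get() : node->back.get();
    const BspNode* farSide = viewerInFront ? node->back.get() : node->front.get();
    frontToBack(nearSide, vx, vy, order);
    frontToBack(farSide, vx, vy, order);
}
```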

61 Umbra 1: used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2. A data structure that supports both dynamic and static visibility. Both a software rasterizer and occlusion queries are supported.

62 Umbra 1
Database: a spatial bounding volume hierarchy, kept current by user updates.
Visibility traverse: input is the camera parameters; output is the visible object set. Hierarchical visibility testing: a single query can hide large parts of the scene.

63 Hierarchical Culling: in typical game scenes, most of the scene is hidden from any given point of view. Problem: the size of the whole scene affects performance (an input-sensitive system). Only the visible objects should affect performance (an output-sensitive system).

64 Hierarchical Culling

65 Hierarchical Culling Solution:
Build a spatial hierarchy for the objects in the scene. Cull a hidden part of the scene in constant time, regardless of how many objects it contains. Occlude groups of objects: if a hierarchy node is hidden, all nodes below it are also hidden.

66 Hierarchy Traversal: traverse the hierarchy to determine visibility
Use temporal coherence. On the first frame, start from the root. Store the nodes where traversal ended and start traversing from them on the next frame. These nodes form a visibility barrier.
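A sketch of traversal that starts from the visibility barrier, with a mocked visibility query and a tree stored in a vector (both assumptions). Hidden nodes and visible leaves become the next frame's barrier, so nodes above the barrier are not re-tested. For brevity the barrier only moves down here; pulling it back up by collapsing nodes is left out:

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct TreeNode { std::vector<int> children; };  // leaves have no children

// Start from last frame's barrier instead of the root. Hidden nodes and
// visible leaves end the traversal and form the barrier for the next frame.
std::vector<int> traverseFromBarrier(const std::vector<TreeNode>& tree, const std::vector<int>& barrier,
                                     const std::function<bool(int)>& isVisible,
                                     std::vector<int>& visibleLeaves) {
    // Reversed so that popping from the back processes the barrier in order.
    std::vector<int> next, stack(barrier.rbegin(), barrier.rend());
    while (!stack.empty()) {
        int n = stack.back();
        stack.pop_back();
        if (!isVisible(n)) { next.push_back(n); continue; }  // hidden: subtree culled
        if (tree[n].children.empty()) { visibleLeaves.push_back(n); next.push_back(n); continue; }
        for (auto it = tree[n].children.rbegin(); it != tree[n].children.rend(); ++it)
            stack.push_back(*it);  // visible interior node: descend
    }
    return next;
}
```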

67 Hierarchy Traversal

68 Hierarchy Traversal

69 Hierarchy Traversal

70 Dynamic Objects: object geometry may change (e.g. due to LODing), and objects may move
If the object geometry changes, it may no longer fit into its old bounds. Move the object upwards in the hierarchy until its bounds fit inside a node. Push the object back down once there is idle time. An example follows.
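A sketch of moving an object up the hierarchy and pushing it back down later, using hypothetical 2D nodes stored in a vector with parent links, and assuming the root encloses the whole world:

```cpp
#include <cassert>
#include <vector>

struct Box { float minX, minY, maxX, maxY; };  // 2D for brevity

bool contains(const Box& outer, const Box& inner) {
    return inner.minX >= outer.minX && inner.maxX <= outer.maxX &&
           inner.minY >= outer.minY && inner.maxY <= outer.maxY;
}

// Hypothetical hierarchy node; parent == -1 marks the root.
struct HNode { Box bounds; int parent; std::vector<int> children; };

// Object bounds changed: walk up until a node encloses them (or the root is reached).
int moveUp(const std::vector<HNode>& nodes, int node, const Box& objectBounds) {
    while (nodes[node].parent != -1 && !contains(nodes[node].bounds, objectBounds))
        node = nodes[node].parent;
    return node;
}

// Idle time: push the object down to the deepest node that still encloses it.
int pushDown(const std::vector<HNode>& nodes, int node, const Box& objectBounds) {
    for (bool descended = true; descended;) {
        descended = false;
        for (int child : nodes[node].children)
            if (contains(nodes[child].bounds, objectBounds)) {
                node = child;
                descended = true;
                break;
            }
    }
    return node;
}
```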

71 Dynamic Objects: if the object moves, temporal bounding volumes (TBVs) can be used. Use history information to predict the object's movement. The TBV doesn't have to be updated every frame.
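A sketch of a temporal bounding volume, assuming the object's speed is bounded: growing its bounds by `maxSpeed * horizon` gives a box the object cannot leave before the horizon expires. Deriving `maxSpeed` from recent positions with a safety factor is one hypothetical way to use the history information; all names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Bounds { float minX, minY, maxX, maxY; };

// A box guaranteed to contain the object for the next `horizon` seconds,
// assuming it never moves faster than `maxSpeed`.
Bounds temporalBounds(const Bounds& now, float maxSpeed, float horizon) {
    float r = maxSpeed * horizon;
    return {now.minX - r, now.minY - r, now.maxX + r, now.maxY + r};
}

// Hypothetical use of history: largest observed speed between positions
// sampled every dt seconds, scaled by a safety factor.
float predictMaxSpeed(const std::vector<float>& xs, const std::vector<float>& ys, float dt, float safety) {
    float best = 0;
    for (std::size_t i = 1; i < xs.size(); ++i)
        best = std::max(best, std::hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]) / dt);
    return best * safety;
}

// The TBV only needs recomputing once it expires or the object escapes it.
bool needsUpdate(const Bounds& tbv, const Bounds& current, float elapsed, float horizon) {
    bool inside = current.minX >= tbv.minX && current.minY >= tbv.minY &&
                  current.maxX <= tbv.maxX && current.maxY <= tbv.maxY;
    return elapsed >= horizon || !inside;
}
```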

72 Dynamic Objects

73 Dynamic Objects

74 Umbra 2 Multi-core version of the previous tech
Used in, e.g., Mass Effect 2, the Dragon Age series, Alan Wake

75 Multi-core culling: two subtasks, rendering and visibility traversal
Rendering issues draw calls and occlusion queries. Visibility processing takes care of hierarchy processing and high-level culling (e.g. view frustum culling).

76 Multi-core culling: the game thread needs to do its updates (camera and object updates) before the visibility thread can continue. The visibility thread updates the hierarchy. After the update, the hierarchy can be traversed.

77 Multi-core culling

78 Multi-core culling: while the visibility thread is idle, it can update the hierarchy: lazy hierarchy building, collapsing nodes, visibility barrier updates, moving dynamic objects down, etc.

79 Umbra 3: used by Unity 3D, Secret Studio
A collection of visibility algorithms: Umbra 1-2 feature sets; automatic portal generation in a pre-process; CPU rasterization and ray-tracing based portal culling algorithms; PVS culling for low-end systems.

80 Umbra 3: uses real geometry, so there is no need for artists to create occluder geometry. Support for streaming, distance queries and intersection queries.

81 Automatic portal generation
Works with both outdoor and indoor scenes. Conservative occlusion. The output is a graph where the nodes are cells and the edges are portals. Optionally, a PVS can be computed. Incremental updates.

82 Umbra 3 recursive portal culling
Recursive traversal of the portal graph from the camera viewpoint, using ray tracing. Very accurate culling results. Too slow for whole-scene culling; currently used as a reference and for dynamic object culling.

83 Video

84 Umbra 3 optimized portal culling
Rasterize the portals into a coverage buffer. Fast enough even for outdoor scenes. In some cases overestimates the visible set.

85

86 Umbra 3 PVS culling: extremely fast
Needed for low-end systems such as smartphones. Can be used to determine visibility for, e.g., hundreds of AI bots. The longer the computation time, the more accurate the result.

87 Killzone 3: see ”Practical occlusion culling for PS3”. A solution implemented specifically for PlayStation 3. Rasterizes a 720p tiled depth buffer on the SPUs. Performs occlusion tests against a downsampled depth buffer using object bounds. Occluder mesh selection is done by artists.

88 Battlefield 3: see ”Culling the Battlefield”. A cross-platform (Xbox 360, PS3, PC) solution. SIMD-optimized frustum culling. Software rasterizer for occlusion culling into a 256x116 depth buffer. Occluder geometry is hand-made by artists.

89 What else can I use it for?
Other applications

90 Lighting & shadows: when applied from a light source's point of view, a visibility algorithm can be used to find shadow casters. See ”Shadow Caster Occlusion Culling for Efficient Shadow Mapping”.

91 Streaming: large game worlds have so much content that it cannot fit in the memory of a gaming platform. Loading screens between zones break immersion. A from-region visibility algorithm can be used for visibility-based streaming over the network or from storage media.
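A sketch of visibility-based streaming, assuming a precomputed table of the content chunks visible from each cell. The chunks visible from the cells around the camera are kept resident, and comparing that set with what is currently loaded gives the load and unload lists. All names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

struct StreamingPlan { std::set<int> toLoad, toUnload; };

// Needed chunks = union of the chunks visible from the cells near the camera.
StreamingPlan planStreaming(const std::vector<std::set<int>>& visibleChunksPerCell,
                            const std::vector<int>& cellsNearCamera,
                            const std::set<int>& resident) {
    std::set<int> needed;
    for (int cell : cellsNearCamera)
        needed.insert(visibleChunksPerCell[cell].begin(), visibleChunksPerCell[cell].end());
    StreamingPlan plan;
    std::set_difference(needed.begin(), needed.end(), resident.begin(), resident.end(),
                        std::inserter(plan.toLoad, plan.toLoad.end()));
    std::set_difference(resident.begin(), resident.end(), needed.begin(), needed.end(),
                        std::inserter(plan.toUnload, plan.toUnload.end()));
    return plan;
}
```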

92 AI A visibility algorithm can be used to drive AI logic
Data structures used in visibility determination can be adapted for distance or intersection testing.

93 Sound occlusion: distance and intersection tests can be used to simulate the behavior of sound. Precomputing visibility and precomputing audio overlap a lot, which makes for an interesting field of study.

94 Sampo Lappalainen sampo@umbrasoftware.com http://www.umbra3.com
FIN Sampo Lappalainen

