7. Our Hero (3/25/2017 9:14 AM)
- We build a graphics engine that draws stuff on the screen -> easy.
- An artist makes models -> the models are handed to the graphics engine and drawn.
- Engineers optimize the graphics engine and artists optimize the graphics until performance is acceptable.
- Then we end up in this situation...
8. Screen Shot
- How did we get from the original situation to this?
- The tech limited what could be done so much that this was the best we could achieve.
9. Game Worlds
- Game developers want to make impressive game worlds.
- Hardware sets limits on what can and can’t be done.
- Game developers need to push the hardware to its limits.
- Game developers build tech so that games can look good and artists can make better-looking content.
- The problem is not drawing great graphics; the problem is drawing great graphics fast enough.
10. Visibility Optimization
- The most effective way to gain performance in games.
- Two basic ways to do visibility optimization:
  - art and level design
  - technology
- Games use a mix of both.
11. Visibility Optimization by Level Design
- Artists design game worlds so that performance is acceptable.
- Can be done in numerous ways, e.g.:
  - limiting view distance
  - limiting polygon or object count
  - modeling portals and cells
13. Visibility Optimization by Level Design
- Time-consuming and usually boring work.
- Sets huge limits on what can and cannot be done.
- May lead to monotonous level design.
- Manual and non-recurring work.
16. Visibility Optimization by Technology
- Gains:
  - No time wasted on rendering objects that don’t contribute to the output image (no state changes, no draw calls etc.).
  - AI, physics, game logic etc. can be done at lower accuracy (or skipped altogether) for hidden objects.
18. Terminology
- Culling – removing hidden objects from rendering
- Target – an object that can be hidden by others
- Occluder – an object that blocks visibility
- Rendering artifact – an unintended glitch in the output image
19. Metrics for comparison
- GPU cost
- CPU cost
- Overall frame time
- Memory usage
- Precomputation time
- Manual work
- Culling power
20. Backface culling
- Taken care of by the HW
- Culling entire triangles based on their winding
- No need to render the insides of an object
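The winding test can be sketched in a few lines. This is a minimal illustration, not how any particular GPU implements it; the function names and the convention that counter-clockwise triangles face the camera (with the y axis pointing up) are assumptions.

```python
def signed_area(a, b, c):
    """Twice the signed area of a 2D screen-space triangle.

    Positive for counter-clockwise winding, negative for clockwise
    (assuming the y axis points up).
    """
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def is_backfacing(a, b, c):
    # Assumption: counter-clockwise triangles are front-facing.
    # Degenerate (zero-area) triangles are culled as well.
    return signed_area(a, b, c) <= 0
```

Reversing the vertex order flips the sign, which is why the same triangle is drawn from one side and culled from the other.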
21. Depth buffering
- Taken care of by the HW
- A two-dimensional buffer that stores a z-value for each screen pixel
- Before running shaders for a pixel to be rendered, test the z-value.
- Allows drawing of unsorted geometry; however, sorting front-to-back still greatly improves performance because hidden pixels are rejected before shading.
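A toy software version of the depth test shows both points: draw order does not change the image, but it does change how many pixels get shaded. The fragment format `(x, y, z, color)` and the function name are assumptions made for this sketch.

```python
def draw_fragments(width, height, fragments, far=1.0):
    """Apply a 'less than' depth test to a list of (x, y, z, color) fragments.

    Returns the color buffer and the number of fragments that passed
    the test (i.e. the number that would have been shaded).
    """
    depth = [[far] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    shaded = 0
    for x, y, z, c in fragments:
        if z < depth[y][x]:      # closer than what is stored: keep it
            depth[y][x] = z
            color[y][x] = c
            shaded += 1
    return color, shaded
```

Feeding the same fragments back-to-front and front-to-back produces the same image, but the front-to-back order shades fewer fragments.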
22. Hierarchical depth buffering
- Replace the depth buffer with a depth pyramid
  - Bottom of the pyramid: full-resolution depth buffer
  - Higher levels: lower-resolution depth buffers where a single pixel stores the maximum z-value of a group of pixels in the level below
- Hierarchically rasterize the polygon starting from the highest level
  - If the polygon is farther than the recorded pixel, exit early
  - If the polygon is closer, hierarchically test the lower levels
  - If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid
- TODO picture
- TODO reference
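The pyramid test can be sketched as follows. This is a simplified illustration under stated assumptions: the buffer is square with a power-of-two size, the "polygon" is reduced to a screen-space rectangle with a single nearest depth `z_near`, and only the query is shown (the propagation step after drawing is left out). All names are hypothetical.

```python
def build_pyramid(depth):
    """Build a max-depth pyramid; levels[0] is the full-resolution buffer."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        n = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(n)]
            for y in range(n)
        ])
    return levels


def is_hidden(levels, rect, z_near, level=None, cx=0, cy=0):
    """True if rect = (x0, y0, x1, y1) at depth z_near is fully occluded."""
    if level is None:
        level = len(levels) - 1          # start from the top of the pyramid
    size = 1 << level                    # pixels covered by one cell side
    x0, y0, x1, y1 = rect
    px0, py0 = cx * size, cy * size
    if px0 >= x1 or px0 + size <= x0 or py0 >= y1 or py0 + size <= y0:
        return True                      # cell does not overlap the rect
    if z_near >= levels[level][cy][cx]:
        return True                      # farther than everything here: early exit
    if level == 0:
        return False                     # a full-resolution pixel is closer
    return all(
        is_hidden(levels, rect, z_near, level - 1, 2 * cx + dx, 2 * cy + dy)
        for dy in (0, 1) for dx in (0, 1)
    )
```

Large hidden regions are rejected near the top of the pyramid, so the full-resolution buffer is only touched where the answer is still uncertain.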
23. Spatial hierarchies
- Enable culling large portions of the game world with a single quick test
- Dynamic objects can be moved in the hierarchy at runtime
- BSP-tree, kd-tree
- TODO rename slide
- TODO pictures from Teppo’s presentation
28. Potentially Visible Set - PVS
- A data structure that defines from-region visibility for a scene
- Computed in a pre-process step
- The scene is divided into cells
- Compute a bit matrix that lists all the visible objects for each cell
- At runtime, a simple matrix lookup
- How to find a good subdivision for a scene?
- Cannot handle dynamic occluders
- Target volume: an extension to handle dynamic targets
- TODO sources
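The bit matrix and its runtime lookup can be sketched like this. The layout (one integer bit mask per cell, one bit per object) and the function names are assumptions; the pre-process that actually decides which pairs are visible is not shown.

```python
def build_pvs(num_cells, visible_pairs):
    """Build one bit mask per cell from (cell, object) visibility pairs."""
    rows = [0] * num_cells
    for cell, obj in visible_pairs:
        rows[cell] |= 1 << obj
    return rows


def visible_objects(pvs, cell, num_objects):
    """Runtime lookup: list the objects whose bit is set for this cell."""
    row = pvs[cell]
    return [obj for obj in range(num_objects) if (row >> obj) & 1]
```

At runtime the only work is finding the camera's cell and reading its row, which is why the approach suits low-end hardware.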
29. Portals
- Place portals in the scene that connect the cells to form a portal graph
- At runtime, find the portals of the current cell that are in the frustum
- Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal
- Same limitations with dynamic objects as with PVS systems
- TODO sources
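The traversal can be sketched in a heavily simplified form. The assumptions here are significant: the view and every portal are reduced to a one-dimensional angular interval as seen from the camera (in degrees, already computed), and cycles are avoided by never re-entering a cell on the current path. The graph format and names are hypothetical.

```python
def visible_cells(portals, start_cell, view):
    """Collect the cells reachable through portals that overlap the view.

    portals: dict mapping cell -> list of (target_cell, (lo, hi)), where
    (lo, hi) is the portal's angular extent as seen from the camera.
    view: (lo, hi) angular extent of the view frustum.
    """
    visible = set()

    def visit(cell, window, path):
        visible.add(cell)
        for target, (lo, hi) in portals.get(cell, []):
            if target in path:
                continue                     # do not walk back through the path
            clipped = (max(lo, window[0]), min(hi, window[1]))
            if clipped[0] < clipped[1]:      # still visible through the window
                visit(target, clipped, path | {target})

    visit(start_cell, view, {start_cell})
    return visible
```

Each portal narrows the window passed to the next cell, so a portal that lies outside what the previous portal reveals is never followed.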
30. Rasterization-based
- Render occluder geometry into a software coverage buffer
- Test visibility using test geometry
- Use temporal coherence to determine the initial set to be rendered
- Handles both dynamic targets and occluders as long as they have occluder geometry
39. Occlusion Queries
- Supported by GPUs since 2001.
- The GPU answers the question: “how many pixels would have been visible if this object had been rendered?”
- Instead of rasterizing your own depth buffer, use the GPU depth buffer
- Normally the query is done using bounding volumes (effective but not necessary).
- No need for artist-generated occluder geometry
- GPU-CPU synchronization needed
40. Occlusion Queries
- Determine the set of visible objects against the actual rendered geometry: all pixels can be used as occluding material!
41. Using Occlusion Queries
- Occlusion queries are a really powerful tool for visibility optimization.
- Like any other GPU feature, occlusion queries can be used ineffectively.
- Special tricks are needed to get the most out of occlusion queries.
43. CPU-GPU synchronization
- With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (parallel processing).
- With occlusion queries the CPU needs the query results back to know whether the object was visible or not.
- The CPU needs to wait for the query results to become available.
- No parallel processing (which is really bad).
44. Issuing Occlusion Queries
- Object 3 has just become visible.
47. Issuing Occlusion Queries
- Fortunately GPU design has a solution for this problem.
- GPUs can store multiple occlusion query results.
- Occlusion queries can be batched.
- Some GPUs have a limit on how many query results can be stored.
50. Latent Occlusion Queries
- Some stalls may be introduced between frames.
- The last query result needs to be read back before continuing.
- Avoid GPU stalls by using the query results from the previous frame.
- Read back the query results at the beginning of each frame.
- Sounds like a perfect solution?
52. Latent Occlusion Queries
- There are downsides to this.
- Visible popping artifacts when objects become visible.
- If the camera is moving slowly and FPS is good, no problem.
- When multiple objects become visible, FPS typically drops (there’s a lot more to render).
- For example, when a door is opened.
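The one-frame delay behind the popping artifact can be illustrated with a small simulation. No GPU API is involved: the per-frame query results are given as plain sets, and the assumption that the first frame simply uses its own results (i.e. it waits for them) is made only to have a starting point.

```python
def run_latent_frames(query_results_per_frame):
    """Return the set of objects drawn in each frame when every frame
    renders with the query results read back from the previous frame."""
    drawn = []
    previous = None
    for results in query_results_per_frame:
        if previous is None:
            render_set = set(results)    # assumption: first frame waits for its own results
        else:
            render_set = set(previous)   # latent: last frame's answers
        drawn.append(render_set)
        previous = results               # read back at the start of the next frame
    return drawn
```

If object 3 comes into view in the second frame, it is drawn one frame late, and at a low frame rate that missing frame is long enough to be noticed.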
56. Latent Occlusion Queries
- Queries done on hierarchy nodes produce even larger artifacts
- Growing the bounds helps, but is difficult to get to work with hierarchical queries
- The stall from using occlusion query results on the same frame may be as short as 0.1 ms (on XBOX 360)
- Is this a price developers are ready to pay for artifact-free occlusion culling?
57. Parallelism
- Most gaming platforms today come with more than one CPU
- Using the same algorithm for multiple cameras (splitscreen, AI bots, light sources)
- Tile-based rasterization
- Parallel data structure traversal
- TODO note about SIMD?
- TODO MORE BEEF!
58. Practice: what kind of systems have really been used?
59. Binary Space Partitioning
- As made famous by Doom and the Quake series
- A tree data structure for representing the scene
- Gordon and Chen 1991 paper used in Doom (http://www.rothschild.haifa.ac.il/~gordon/ftb-bsp.pdf)
- Teller’s 1992 PhD thesis used in Quake (http://people.csail.mit.edu/seth/pubs/pubs.html)
- TODO rethink
60. Binary space partitioning
- Before Doom, BSPs were used to sort geometry for the painter’s algorithm (back-to-front)
- The painter’s algorithm is too slow for large scenes
- Solution: change the order to front-to-back and keep track of which pixels have been drawn
- Quake introduced a pre-process step for computing a PVS based on the BSP model
- TODO rethink
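The front-to-back ordering falls out of the tree with a short recursive traversal. This is a minimal 2D sketch with hypothetical names: each node stores a splitting line `a*x + b*y + c = 0` and the items lying on it, and tracking which pixels have already been drawn is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BspNode:
    plane: tuple                        # (a, b, c) of the line a*x + b*y + c = 0
    items: tuple = ()                   # geometry lying on the splitting line
    front: Optional["BspNode"] = None   # subtree on the positive side
    back: Optional["BspNode"] = None    # subtree on the negative side


def front_to_back(node, eye):
    """Yield items in front-to-back order as seen from the 2D point eye."""
    if node is None:
        return
    a, b, c = node.plane
    side = a * eye[0] + b * eye[1] + c
    # Visit the subtree containing the eye first, then the splitter, then the rest.
    near, far = (node.front, node.back) if side >= 0 else (node.back, node.front)
    yield from front_to_back(near, eye)
    yield from node.items
    yield from front_to_back(far, eye)
```

Swapping `near` and `far` gives the back-to-front order that the painter’s algorithm needs.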
61. Umbra 1
- Used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2
- A data structure that supports dynamic and static visibility
- Software rasterizer and occlusion queries supported
62. Umbra 1
- [Diagram: Database; Visibility traverse; Spatial bounding volume hierarchy; User updates]
- Visibility traversal
  - Input: camera parameters
  - Output: visible object set
- Hierarchical visibility testing: a single query can hide large parts of the scene
- TODO describe how it works
63. Hierarchical Culling
- In typical game scenes most of the scene is hidden from any given viewpoint
- Problem:
  - The size of the whole scene affects performance (input-sensitive system).
  - Only the visible objects are supposed to affect performance (output-sensitive system).
65. Hierarchical Culling
- Solution: build a spatial hierarchy for the objects in the scene
- Culling hidden parts of the scene in constant time
- Occlude groups of objects: if a hierarchy node is hidden, all nodes below it are also hidden
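The idea that one hidden node hides its whole subtree can be sketched as below. The hierarchy type, the `is_hidden` callback (standing in for whatever occlusion test is used) and the test counter are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchyNode:
    name: str
    children: list = field(default_factory=list)


def collect_visible(node, is_hidden, stats):
    """Return the names of visible leaf nodes, counting visibility tests."""
    stats["tests"] += 1
    if is_hidden(node):
        return []                 # the whole subtree is skipped with one test
    if not node.children:
        return [node.name]        # a visible leaf (an object)
    visible = []
    for child in node.children:
        visible.extend(collect_visible(child, is_hidden, stats))
    return visible
```

With a hidden inner node the number of tests depends on what is visible rather than on the size of the scene, which is the output-sensitive behaviour the hierarchy is meant to provide.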
66. Hierarchy Traversal
- Traverse the hierarchy to determine visibility
- Use temporal coherence
  - On the first frame, start from the root
  - Store the nodes where the traversal ended and start traversing from them on the next frame
  - These nodes form a visibility barrier
70. Dynamic Objects
- Object geometry may change (e.g. due to LODing).
- Objects may move.
- If the object geometry changes, it may not fit into its old bounds
  - Move the object upwards in the hierarchy so that its bounds fit inside a node
  - Push the object back down once there is idle time
- An example follows.
71. Dynamic Objects
- If the object moves, temporal bounding volumes (TBVs) can be used.
- Use history info to predict the object movement.
- The TBV doesn’t have to be updated every frame.
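One simple way to build such a volume is sketched below. The assumptions are loud ones: the object is a sphere, its recent velocity is taken as the prediction, and it is assumed to move in a straight line at that velocity for `valid_for` seconds. The function name and parameters are hypothetical.

```python
import math


def temporal_bounding_sphere(center, radius, velocity, valid_for):
    """Sphere enclosing a moving sphere for the next valid_for seconds.

    Assumes straight-line motion at the given velocity. The result is
    centered halfway along the predicted path and grown by half the
    travelled distance.
    """
    travelled = math.sqrt(sum(v * v for v in velocity)) * valid_for
    tbv_center = tuple(c + v * valid_for / 2 for c, v in zip(center, velocity))
    return tbv_center, radius + travelled / 2
```

As long as the object stays inside this sphere, its place in the hierarchy does not need to be touched; the volume is rebuilt only when the object leaves it or the validity period ends.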
74. Umbra 2
- Multi-core version of the previous tech
- Used in e.g. Mass Effect 2, Dragon Age series, Alan Wake
75. Multi-core culling
- Two subtasks: rendering and visibility traversal
- Rendering issues rendering calls and occlusion queries.
- Visibility processing takes care of hierarchy processing and high-level culling (e.g. view frustum culling).
76. Multi-core culling
- The game thread needs to do its updates (camera and object updates) before the visibility thread can continue
- The visibility thread updates the hierarchy
- After the update the hierarchy can be traversed
78. Multi-core culling
- While the visibility thread is idle it can update the hierarchy:
  - lazy hierarchy building
  - collapsing nodes
  - visibility barrier updates
  - moving dynamic objects down etc.
79. Umbra 3
- Used by Unity 3D, Secret Studio
- Collection of visibility algorithms
  - Umbra 1-2 feature sets
  - Automatic portal generation in pre-process
  - CPU rasterization and ray-tracing based portal culling algorithms
  - PVS culling for low-end systems
80. Umbra 3
- Uses real geometry, no need for artists to create occluder geometry
- Support for streaming, distance queries, intersection queries
81. Automatic portal generation
- Works with both outdoor and indoor scenes
- Conservative occlusion
- The output is a graph where the nodes are cells and the edges are the portals
- Optionally a PVS can be computed
- Incremental updates
82. Umbra 3 recursive portal culling
- Recursive traversal of the portal graph from the camera viewpoint, using ray tracing
- Very accurate culling results
- Too slow for whole-scene culling; currently used for reference and for dynamic object culling
86. Umbra 3 PVS culling
- Extremely fast
- Needed for low-end systems such as smartphones
- Can be used to determine visibility for e.g. hundreds of AI bots
- The more time spent computing, the more accurate the result
- TODO picture: portal vs. PVS culling
87. Killzone 3
- See “Practical occlusion culling for PS3”:
  - Solution implemented specifically for PlayStation 3
  - Rasterizes a 720p tiled depth buffer on the SPUs
  - Performs occlusion tests against a downsampled depth buffer using object bounds
  - Occluder mesh selection done by artists
- TODO link to paper
88. Battlefield 3
- See “Culling the Battlefield”:
  - A cross-platform (XBOX360, PS3, PC) solution
  - SIMD-optimized frustum culling
  - Software rasterizer for occlusion culling, rendering to a 256x116 depth buffer
  - Occluder geometry hand-made by artists
90. Lighting & shadows
- When applied from a light source’s point of view, a visibility algorithm can be used for finding shadow casters
- “Shadow Caster Occlusion Culling for Efficient Shadow mapping” (http://www.cg.tuwien.ac.at/research/publications/2011/bittner-2011-scc/bittner-2011-scc-paper.pdf)
91. Streaming
- Large game worlds have so much content that it cannot fit in the memory of a gaming platform
- Loading between zones breaks immersion
- A from-region visibility algorithm can be used to do visibility-based streaming over the network or from a storage medium
92. AI
- A visibility algorithm can be used to drive AI logic
- Data structures used in visibility determination can be modified to be used for distance or intersection testing
93. Sound occlusion
- Distance and intersection tests can be used to simulate the behaviour of sound
- Precomputed visibility and audio have a lot of overlap and make for an interesting field of study