Advanced Virtual Texture Topics

2 Advanced Virtual Texture Topics
Martin Mittring, Lead Graphics Programmer
A virtual texture is like virtual memory, but instead of stalling on a missing page, a lower-resolution mip can be used. A page is a 2D section of a mip with constant memory size. (= Mega Texture? Possibly.)

3 Presentation Overview
Motivation
Virtual Texture*
Implementation
Related Topics
Combo Texture*
Summary / Future
* Demo

4 Motivation: Texture Streaming
[Diagram: memory hierarchy by amount and speed. Peripherals (hard drive, network, DVD/CD/…), computer (main memory), GPU (video memory, texture, GPU texture cache).]

5 Per mip-map texture streaming*
Streaming is needed: large worlds, loading time, memory limits.
HW / API support:
No asynchronous single mip-map update: the upload is delayed until the texture is used, and all mip-maps are uploaded.
Create/Release breaks MultiGPU.
Unpredictable, less stable performance: mip-maps differ in size, and fragmentation in video memory might trigger the garbage collector.
We used this in Crysis to stay within the 32-bit limits; on 64-bit systems with enough memory, or with low-res textures, it is not active.
* Used in Crysis™ to overcome 32-bit limits with high-resolution textures

6 What is a Virtual Texture*?
Emulates a mip-mapped texture that can be high resolution, runs in real time on consumer graphics hardware, and only partly resides in texture memory.
Implications on engine design, content creation, performance and image quality.
For the virtual texture we keep only the relevant parts of the texture in fast memory and asynchronously request missing parts from a slower memory (while using the content of a lower mip-map as fall-back). This implies that we need to keep parts of the lower mip-maps in memory, and that these parts need to be loaded first.
To allow efficient texture lookups, coherent memory access is achieved by slicing the mip-mapped texture into reasonably sized pieces. With a fixed size for all pieces (now called texture tiles), the cache can be managed more easily, and all tile operations, e.g. reading or copying, have constant time and memory characteristics.
Mip-maps smaller than the tile size are not handled by our implementation. This is often acceptable for applications like terrain rendering, where the texture is never that far away and a little aliasing is barely visible. If required, this can be solved by packing multiple mip-maps into one tile. For distant objects it is even possible to fall back to normal mip-mapped texture mapping, but that requires a separate system to manage it.
* “Virtual Texture” is derived from the OS/CPU feature “Virtual Memory”
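The fall-back to a lower mip-map can be illustrated with a small C++ sketch. It is not taken from the presentation: `TileId`, `parentTile` and `findResidentFallback` are hypothetical names, and the sketch assumes a square virtual texture whose coarsest mip level is a single tile that is always resident.

```cpp
#include <set>
#include <tuple>

// Hypothetical tile address: mip 0 is the full-resolution level.
struct TileId {
    int mip;
    int x, y;
    bool operator<(const TileId& o) const {
        return std::tie(mip, x, y) < std::tie(o.mip, o.x, o.y);
    }
};

// Number of tiles along one axis of a mip level, for a square virtual
// texture with `tilesAtMip0` tiles per axis (assumed to be a power of two).
inline int tilesPerAxis(int tilesAtMip0, int mip) {
    int n = tilesAtMip0 >> mip;
    return n > 0 ? n : 1;
}

// The tile one mip level lower in resolution that covers the same area.
inline TileId parentTile(TileId t) {
    return TileId{t.mip + 1, t.x / 2, t.y / 2};
}

// Walk towards lower-resolution mips until a resident tile is found.
// `coarsestMip` is the level with a single tile; it is assumed to be loaded
// first and kept resident, so the returned tile is always usable.
inline TileId findResidentFallback(TileId wanted,
                                   const std::set<TileId>& resident,
                                   int coarsestMip) {
    TileId t = wanted;
    while (t.mip < coarsestMip && resident.count(t) == 0) {
        t = parentTile(t);
    }
    return t;
}
```

A renderer would use the returned tile immediately and queue an asynchronous request for the tile that was actually wanted.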

7 Virtual Texture example usage
1. Prepare mip-maps, store them in equal-sized tiles on HD
2. Compute required tiles and request them from HD
3. Update indirection texture and tile cache
4. Render 3D scene
Both the tile texture cache and the indirection texture are dynamic and adapt to one or multiple views.
A typical virtual texture is created from some source image format in a pre-process, without imposing the graphics hardware limits on texture size. The data is stored on any slower device, e.g. the hard drive. Unused areas of the virtual texture can be dropped, which saves memory as a nice bonus. The texture in video memory (now called tile cache) consists of the tiles required to render the 3D view. To efficiently reconstruct the virtual texture layout, the indirection information is required as well.
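One possible way to manage the dynamic tile cache is a least-recently-used policy. The following C++ sketch is an assumption for illustration only; the presentation does not state which replacement policy was used, and `TileCache`, `touch` and `insert` are hypothetical names.

```cpp
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical LRU tile cache: maps a 64-bit tile key to a slot index in the
// tile cache texture and evicts the least recently used tile when full.
class TileCache {
public:
    explicit TileCache(int slotCount) : m_slotCount(slotCount) {}

    // Returns the slot of a resident tile and marks it as recently used,
    // or -1 if the tile is not in the cache.
    int touch(std::uint64_t key) {
        auto it = m_lookup.find(key);
        if (it == m_lookup.end()) return -1;
        m_lru.splice(m_lru.begin(), m_lru, it->second);
        return it->second->slot;
    }

    // Makes a tile resident and returns the slot to upload into. If the
    // cache is full, the least recently used tile is evicted and its key is
    // written to *evicted so the caller can patch the indirection texture.
    int insert(std::uint64_t key, std::uint64_t* evicted, bool* didEvict) {
        *didEvict = false;
        int slot = touch(key);
        if (slot >= 0) return slot;
        if (static_cast<int>(m_lru.size()) < m_slotCount) {
            slot = static_cast<int>(m_lru.size());  // next unused slot
        } else {
            const Entry& victim = m_lru.back();
            slot = victim.slot;
            *evicted = victim.key;
            *didEvict = true;
            m_lookup.erase(victim.key);
            m_lru.pop_back();
        }
        m_lru.push_front(Entry{key, slot});
        m_lookup[key] = m_lru.begin();
        return slot;
    }

private:
    struct Entry { std::uint64_t key; int slot; };
    int m_slotCount;
    std::list<Entry> m_lru;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> m_lookup;
};
```

Calling `touch` for every tile that the current views need, before inserting new tiles, keeps visible tiles from being evicted first.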

8 Virtual Texture Demo Take a pause
Show tile cache in action and UV space with borders

9 Virtual Texture in the Pixel Shader
Idea: a single unfiltered lookup into the indirection texture (scale & offset), then a single filtered lookup into the tile cache texture -> GDC 2008, Sean Barrett, “Sparse Virtual Textures”
No mips, memory-coherent access
Precision problem: 24/32-bit float / integer
D3D9: half-texel offset to get stable results
Alternatives: indirection per draw call, …

10 Virtual Texture Pixel Shader (HLSL)
float4 g_vIndir;             // w,h,1/w,1/h indirection texture extent
float4 g_Cache;              // w,h,1/w,1/h tilecachetexture extent
float4 g_CacheMulTilesize;   // w,h,1/w,1/h tilecachetexture extent * tilesize

sampler IndirMap = sampler_state
{
    Texture = <IndirTexture>;
    MipFilter = POINT;
    MinFilter = POINT;
    MagFilter = POINT;
    // MIPMAPLODBIAS = 7;    // using mip-mapped indirection texture, here 128x128
};

float2 AdjustTexCoordforAT( float2 vTexIn )
{
    float fHalf = 0.5f;      // half texel for DX9, 0 for DX10

    // split the virtual UV into the indirection texel and the position inside it
    float2 TileIntFrac = vTexIn*g_vIndir.xy;
    float2 TileFrac = frac(TileIntFrac)*g_vIndir.zw;
    float2 TileInt = vTexIn - TileFrac;

    // unfiltered lookup: rg = tile offset in the tile cache, b = scale
    float4 vTiledTextureData = tex2D(IndirMap,TileInt+fHalf*g_vIndir.zw);
    float2 vScale = vTiledTextureData.bb;
    float2 vOffset = vTiledTextureData.rg;

    float2 vWithinTile = frac( TileIntFrac * vScale );

    float2 vTileCacheUV = vOffset + vWithinTile*g_CacheMulTilesize.zw + fHalf*g_Cache.zw;
    return vTileCacheUV;
}
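To make the address translation easier to follow, here is a CPU-side C++ sketch of the same idea with the half-texel offset set to 0 and borders ignored. It is an illustration, not the author's code: `IndirEntry`, `Float2` and `virtualToCacheUV` are hypothetical names, and `tileUV` is assumed to be the tile size divided by the tile cache size.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct IndirEntry {
    float offsetU, offsetV;  // top-left corner of the tile in tile cache UV space
    float scale;             // (tiles per axis at the tile's level) / indirection extent
};

struct Float2 { float x, y; };

inline float fracf(float v) { return v - std::floor(v); }

// indir:  row-major table with indirExtent x indirExtent entries.
// tileUV: size of one tile in tile cache UV space (assumption, see lead-in).
// uv is expected in [0,1); the index is clamped for uv == 1.
inline Float2 virtualToCacheUV(Float2 uv, const std::vector<IndirEntry>& indir,
                               int indirExtent, float tileUV) {
    // Point-sampled (unfiltered) lookup into the indirection table.
    int ix = std::min(static_cast<int>(uv.x * indirExtent), indirExtent - 1);
    int iy = std::min(static_cast<int>(uv.y * indirExtent), indirExtent - 1);
    const IndirEntry& e = indir[iy * indirExtent + ix];

    // Position inside the tile at the tile's own quad-tree level.
    float withinX = fracf(uv.x * indirExtent * e.scale);
    float withinY = fracf(uv.y * indirExtent * e.scale);

    return Float2{e.offsetU + withinX * tileUV, e.offsetV + withinY * tileUV};
}
```

With a 2x2 indirection table where only the top-left entry points at a full-resolution tile, a lookup at (0.25, 0.25) lands in the middle of that tile, while the other entries resolve into the single coarse tile that covers the whole texture.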

11 Bilinear filtering
Efficient only with a redundant border: 1 texel is the minimum, DXT needs 4 pixels (e.g ); centered tiles can improve quality.
With a border, texture updates become unaligned and the texture size becomes non-power-of-two.
The power-of-two property can be retained by reducing the usable tile size, but quality suffers (e.g ).
Non-power-of-two is bad: if you have non-power-of-two tiles, it is probably better to add some padding to the tile cache so that the texture is created with a power-of-two extent. Otherwise you might face undefined memory and performance characteristics from the graphics card driver.
Power-of-two is best for the hardware implementation, but with borders the quality can suffer a lot. That loss is due to aliasing in the mip-maps, caused by down-sampling the source texture to slightly less than half its size. A good down-sampling algorithm can limit the aliasing in the lower mip-maps, but even the top mip-map is affected by this design decision, and that can be visible, especially with regular patterns in the texture.
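The two border options can be compared with a little arithmetic. The C++ sketch below uses a 128-texel tile and a 4-texel border purely as assumed example values; `TileLayout`, `paddedTile`, `shrunkTile` and the other names are hypothetical.

```cpp
struct TileLayout {
    int storedSize;   // texels per axis stored in the cache, border included
    int payloadSize;  // texels per axis that carry unique texture data
    int border;       // redundant texels on each side
};

// Option A: keep the payload size; the stored tile grows by the border.
constexpr TileLayout paddedTile(int payload, int border) {
    return TileLayout{payload + 2 * border, payload, border};
}

// Option B: keep the stored size (e.g. a power of two); the payload shrinks.
constexpr TileLayout shrunkTile(int stored, int border) {
    return TileLayout{stored, stored - 2 * border, border};
}

constexpr bool isPowerOfTwo(int v) { return v > 0 && (v & (v - 1)) == 0; }

// How many whole tiles fit along one axis of a tile cache texture.
constexpr int cacheTilesPerAxis(int cacheExtent, const TileLayout& t) {
    return cacheExtent / t.storedSize;
}

// Maps a position inside the payload (0..1) to a position inside the stored
// tile (0..1), so that filtered lookups never start inside the border.
constexpr float payloadToStored(float p, const TileLayout& t) {
    return (t.border + p * t.payloadSize) / t.storedSize;
}
```

With these example values, option A stores 136-texel tiles (not a power of two, 15 per axis in a 2048 cache), while option B stores 128-texel tiles with only 120 usable texels (16 per axis), which is the case where the source has to be down-sampled to slightly less than half its size per mip level.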

12 Indirection Texture
Efficient representation of the dynamic (view-dependent) quad-tree.
Texture format: 32-bit ARGB (precision issues on some HW), or 64-bit FP16 for precision and less PS math.
A free channel can be used to fade tiles: bilinear filtered, to limit the max per-pixel LOD.
Using a 32-bit texture format with 8-bit channels is possible, but you have to adjust the values in the pixel shader with additional instructions. Beware that scaling might not return the values you expect: storing a value from 0 to 255 in the texture is guaranteed to give values from 0.0 to 1.0 in the pixel shader, and scaling these values back up should return integer values, but it might not. Floating-point math can be an issue, but even worse, on some hardware the precision seems to be lower, roughly comparable to 16-bit floats. In [Barrett08] the problem was solved by rounding [1], but the author admits that this might not be the most efficient approach.
[1] To get the integer value of an 8-bit channel, this shader code was used: exp2(floor(channel* ))
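The rounding problem can be reproduced on the CPU. The following C++ sketch is only an illustration under the assumption of a standard 8-bit normalized channel (value / 255); the function names are hypothetical and the code is not the shader from [Barrett08].

```cpp
#include <cmath>

// Value a pixel shader would see for an 8-bit normalized channel.
inline float unormToFloat(unsigned char v) { return v / 255.0f; }

// Naive decode: truncation turns 2.97 into 2 when the scaled value ends up
// slightly below the intended integer.
inline int decodeTruncate(float channel) {
    return static_cast<int>(channel * 255.0f);
}

// Rounded decode: tolerates small errors in the sampled value.
inline int decodeRound(float channel) {
    return static_cast<int>(std::floor(channel * 255.0f + 0.5f));
}

// Checks that every 8-bit value survives a slightly perturbed round trip
// when the rounded decode is used.
inline bool roundedDecodeIsStable(float error) {
    for (int v = 1; v < 256; ++v) {
        float sampled = unormToFloat(static_cast<unsigned char>(v)) - error;
        if (decodeRound(sampled) != v) return false;
    }
    return true;
}
```

The extra add and floor are the "additional instructions" cost mentioned above; a floating-point indirection format avoids them.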

13 Indirection Texture Update
Quad-tree modifications only at leaf level
2D block updates of the texture (CPU/GPU)
A mip-mapped texture offers per-pixel LOD
Many indirection textures at full resolution in memory are wasteful: use run-time Create/Release, the max memory, or pools with different sizes
Multiple indirection texture updates per frame: distribute them over multiple frames

14 Indirection Texture Update (C/C++)
// float to fp16 (s1e5m10) conversion
// (does not handle all cases, e.g. zero, denormals and overflow)
WORD float2fp16( float x )
{
    uint32 dwFloat = *((uint32 *)&x);                      // raw IEEE 754 bits
    uint32 dwMantissa = dwFloat & 0x7fffff;
    int iExp = (int)((dwFloat>>23) & 0xff) - (int)0x7f;    // unbiased exponent
    uint32 dwSign = dwFloat>>31;

    return (WORD)( (dwSign<<15) | (((uint32)(iExp+0xf))<<10) | (dwMantissa>>13) );
}

WORD texel[4];          // texel output
RECT recTile;           // in texels in the tilecache texture
int iLod;               // 0=full domain, 1=2x2, 2=4x4, ...
int iSquareExtend;      // indirection texture size in texels
float fInvTileCache;    // tile.Width / texCacheTexture.Width

texel[0] = float2fp16(recTile.left*fInvTileCache);
texel[1] = float2fp16(recTile.top*fInvTileCache);
texel[2] = float2fp16((float)(1<<iLod) / iSquareExtend);   // float division; an integer division would truncate
texel[3] = 0;           // unused
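Because the conversion above skips several cases, a slightly more defensive variant may be useful. The C++ sketch below is an assumption about how those cases could be handled (flush small values to zero, clamp large values to the largest finite half); it truncates the mantissa like the original and `floatToHalf` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// float -> fp16 (s1e5m10) with truncated mantissa.
// Zero, denormals and underflow become signed zero; overflow, infinity and
// NaN are clamped to the largest finite half (0x7bff). Both are choices made
// for this sketch, not IEEE 754 conversion rules.
inline std::uint16_t floatToHalf(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));  // well-defined way to read the bits

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::int32_t exponent =
        static_cast<std::int32_t>((bits >> 23) & 0xffu) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x7fffffu;

    if (exponent <= 0) {
        return static_cast<std::uint16_t>(sign);
    }
    if (exponent >= 31) {
        return static_cast<std::uint16_t>(sign | 0x7bffu);
    }
    return static_cast<std::uint16_t>(
        sign | (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13));
}
```

The zero case matters in practice, because a tile in the top-left corner of the tile cache has an offset of exactly 0.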

15 Tile Texture Cache Update 1/2
Requirements:
Fast in latency and throughput
Stall free (no wait) but correct state
Minimal bandwidth and memory overhead
Memory layout and type for fast texture lookups
O(n) or better for n tiles
Predictable performance: fast in latency and throughput, bandwidth efficient (copy only the required part), little memory overhead.
Updates should happen stall free (no wait) but correctly synchronized to get the right texture state. When the content is updated by the CPU, there should be no copy from GPU memory to CPU memory (discard should be used).
For fast texturing from the tile cache texture, it should be in the appropriate memory layout (swizzled [1]) and memory type (video memory). Note that on some hardware compressed textures are stored in linear form (not swizzled).
Multiple tile updates should have linear or better performance.
[1] A swizzled memory layout is a cache-friendly layout for position-coherent texture lookups.

16 Tile Texture Cache Update 2/2
Direct CPU update. D3D9: D3DPOOL_MANAGED, LockRect()
Small intermediate tile texture. D3D9: LockRect(), StretchRect(), not for compressed textures
Large intermediate tile cache texture. D3D9: D3DPOOL_SYSTEMMEM, LockRect(), UpdateTexture()
GPU update (render to texture): fastest, flexibility limited
There are three basic methods to update a part of a mip-map from the CPU. Depending on the graphics API, driver version and graphics card, the performance behaviour you get is not defined. This is actually the major problem of the implementation, and on some configurations it might even make the method unusable. We haven’t made enough tests to clarify this. The best test would be a finished game with heavy asset usage and dynamic content, because you need a realistic workload to see problems with the bandwidth and memory limits. Experience shows that such driver issues often get addressed after a major game has shipped using the technology.
Method 1: Direct CPU update. The destination texture needs to be in D3DPOOL_MANAGED, and a section of the texture is updated via the LockRect() function. This method wastes main memory, and most likely the transfer is deferred until a draw call uses the texture. The method is simple but likely to be less optimal than the next two methods.
Method 2: With a (small) intermediate tile texture. Here we need one lockable intermediate texture that can hold one tile (including border). With the LockRect() function this texture is updated as a whole. With the StretchRect() function the texture is then copied into the destination texture to replace one tile only. This does not work with compressed textures (DXT), as they cannot be used as a render target format. For the StretchRect() function the source and the destination are required to be in D3DPOOL_DEFAULT, and that requires the source to be D3DUSAGE_DYNAMIC to be lockable. To find out if the driver supports dynamic textures, you ought to check the caps bits for D3DCAPS2_DYNAMICTEXTURES, but according to the list in the DirectX SDK even the lowest SM20 cards support this feature.
Method 3: With a (large) intermediate tile cache texture. This method requires a lockable intermediate texture in D3DPOOL_SYSTEMMEM with the full texture cache extent. With LockRect() the intermediate texture is updated only where required, and a following UpdateTexture() call transfers the data to the destination texture. UpdateTexture() requires the destination to be in D3DPOOL_DEFAULT.

17 Limits
Maximum virtual texture size = extent of the indirection texture * extent of a tile without border (additionally limited by math precision)
Maximum tile cache texture size:
Storing different attributes in the tile caches
Splitting the tile cache over multiple textures
Different object types or multiple cache levels
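The size limit is easy to evaluate for concrete numbers. The C++ sketch below is a rough illustration: the 128-texel indirection extent echoes the size mentioned in the shader comment, the 120-texel payload is an assumed example, and `exactIntegerRange` only models the integer range of a float format, not the extra bits needed for sub-texel filtering.

```cpp
#include <cstdint>

// Maximum virtual texture extent per axis, in texels.
constexpr std::uint64_t maxVirtualExtent(std::uint32_t indirectionExtent,
                                         std::uint32_t tilePayload) {
    return static_cast<std::uint64_t>(indirectionExtent) * tilePayload;
}

// All integers up to this value are exactly representable in a binary float
// format with `mantissaBits` explicit mantissa bits (23 for fp32, 10 for fp16).
constexpr std::uint64_t exactIntegerRange(int mantissaBits) {
    return std::uint64_t{1} << (mantissaBits + 1);
}
```

For example, a 128x128 indirection texture with 120 usable texels per tile addresses 15360 texels per axis, which already exceeds the 2048 integer values an fp16 number can represent exactly.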

18 Possible Tile Sources
Hard drive, CD, …: use IO completion ports, not memory-mapped files
Network
Procedural content generation: mathematics or example based
Blending materials
Decals / roads / vector graphics / text / shadows
Compression? DXT on-load? O(1)?
Hard drive: IO completion ports, disable the OS cache.
CD / DVD / BD / HD DVD: latency accumulates with many small tile requests; alignment matters.
Network: Wouldn’t it be nice to join a multiplayer match without the need to download a map? You can join an ongoing game and even see all the detailed changes, like tire tracks and decals, that happened to the map. The normal game server or some special extra server can bake decals and distribute them to the clients on demand. Compression might be more important here, but it seems that downstream bandwidth will be much less of a problem in the future.
O(1): constant performance with HD streaming of uncompressed content, much worse with on-the-fly baking of decals.

19 Examples from Crysis™ *
Terrain material blending (terrain detail objects have been removed)
Decals (roads, tire tracks, dirt) used on top of terrain material blending
* not using the virtual texture pixel shader or the combo texture method!

20 Mip-map generation
Even kernel size (e.g. 2x2, 4x4): the mip-map is offset relative to the next higher one; fits normal GPU behavior
Odd kernel size (e.g. 3x3, 5x5): mip-map texels are aligned; good for rectangular chart images
The choice affects the implementation in other areas
Non-power-of-two: harder, slower, less quality
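The even-kernel case corresponds to the familiar 2x2 box filter. The C++ sketch below shows it for a single 8-bit channel; it is a generic illustration rather than the filter used by the author, it assumes even image dimensions, and it averages the stored values directly without any sRGB handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Produces the next lower mip level of a row-major, single-channel image by
// averaging each 2x2 block (width and height are assumed to be even).
inline std::vector<std::uint8_t> downsample2x2(const std::vector<std::uint8_t>& src,
                                               int width, int height) {
    const int w = width / 2;
    const int h = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int sum = src[(2 * y) * width + 2 * x]
                          + src[(2 * y) * width + 2 * x + 1]
                          + src[(2 * y + 1) * width + 2 * x]
                          + src[(2 * y + 1) * width + 2 * x + 1];
            dst[y * w + x] = static_cast<std::uint8_t>((sum + 2) / 4);  // rounded average
        }
    }
    return dst;
}
```

Each destination texel sits at the centre of four source texels, which is the half-texel offset between levels mentioned for even kernels.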

21 Out of Core mip-map Generation
In memory often not feasible (32-bit)
Tile-based HD layout, no redundancy, uncompressed data, sRGB?
Functions to request any rectangular block
Intermediate mip-maps stored in tiles
Texture compression in the export step
Lazy / on-the-fly mip-map generation

22 Computing the local LOD
Conservative or precise? View-point or view-cone? UV mapping dependent?
min(ddx,ddy) or max(ddx,ddy) or Euclidean?
Many methods, e.g.
World space: quad-tree of AABBs (tessellation dependent)
View space (occlusion problem, single camera)
Texture space (GPU, overdraw, resolution?, conservative?)
D3D9: OcclusionQuery() or CPU readback?
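The min/max question can be made concrete with a small C++ sketch. It assumes the UV derivatives are already expressed in texels per screen pixel and uses the Euclidean length of each derivative vector; `mipLevel` and its `preferDetail` flag are hypothetical names, and "conservative" is interpreted here as requesting the more detailed level.

```cpp
#include <algorithm>
#include <cmath>

// Mip level from screen-space UV derivatives (in texels per pixel).
// preferDetail = true uses the shorter derivative (min), which requests a
// more detailed mip; false uses the longer one (max), which is what
// isotropic filtering commonly does.
inline float mipLevel(float dudx, float dvdx, float dudy, float dvdy,
                      bool preferDetail) {
    const float lenX = std::sqrt(dudx * dudx + dvdx * dvdx);
    const float lenY = std::sqrt(dudy * dudy + dvdy * dvdy);
    const float rho = preferDetail ? std::min(lenX, lenY) : std::max(lenX, lenY);
    // Magnification (rho <= 1) maps to the most detailed level 0.
    return std::max(0.0f, std::log2(rho));
}
```

For a surface seen at a grazing angle the two choices can differ by several mip levels, which directly changes how many tiles are requested.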

23 Mesh Parameterization
Unique unwrapping: many applications, workflow issues, stable memory requirements, shading in texture space
Rectangular charts
Quad-tree aware unwrapping (less wasted area)
Sparse textures (e.g. masks)
Rectangular charts: subdivision surfaces like Catmull-Clark. Tessellation?

24 Attributes stored in the Tiles
Similar to the attributes for deferred shading:
Diffuse/specular colour, normal, specular power
Material ID => no blending
Material masks in sparse textures => blending, compression
Combo texture => lossy but static compression

25 Combo Texture Demo Take a pause

26 Combo Texture
A 3D position in the RGB cube can be used to blend between up to 8 materials. The blend coefficients can be stored efficiently in a virtual texture.

27 Combo Texture RGB Cube Example
8 materials are defined at the cube corners RGB Combo Texture 8 materials are blended by the Combo Texture

28 Combo Texture Properties
Clean blend only on the cube edges (material placement)
Bilinear filtering (custom filtering possible but expensive), DXT
Dynamic or shader-driven combo texture colour (e.g. frost)
Blending 2/4/8/16 or n materials by using an RGB/RGBA texture
Best: RGB with 2..8 materials, alpha left for other use
Single pass only possible in simple cases (e.g. detail texture)

29 Combo Texture Multi pass Blending
4 materials: one opaque pass, 3 passes with alpha lerp
Alpha needs to be adjusted to compensate for the following passes
Additive blending would be simpler, but precision requires FP16
Less overdraw: index buffer per material
Alternative to per-triangle material assignment
Material LOD (e.g. golden buttons on a jacket)

30 Combo Texture Pixel Shader (HLSL)
float3 g_ComboMask;                // RGB material combo colour
                                   // (3 channels for 8 materials)
                                   // 000,100,010,110,001,101,011,111
float4 g_ComboSum0,g_ComboSum1;    // RGBA sum of the masks blended so far
                                   // including the current
                                   // (8 channels for 8 materials)
                                   // , , ,
                                   // , , ,

float ComputeComboAlpha( BETWEENVERTEXANDPIXEL_Unified InOut )
{
    float3 cCombo = tex2D(Sampler_Combo,InOut.vBaseTexPos).rgb;

    // weight of the current material: per-channel weights multiplied together
    float3 fSrcAlpha3 = g_ComboMask*cCombo + (1-g_ComboMask)*(1-cCombo);
    float fSrcAlpha = fSrcAlpha3.r*fSrcAlpha3.g*fSrcAlpha3.b;

    // weights of the four R/G corner combinations, then blended along B
    float4 vComboRG = float4(1-cCombo.r,cCombo.r,1-cCombo.r,cCombo.r)
                    * float4(1-cCombo.g,1-cCombo.g,cCombo.g,cCombo.g);
    float fSum = dot(vComboRG,g_ComboSum0)*(1-cCombo.b)
               + dot(vComboRG,g_ComboSum1)*(cCombo.b);

    // + small numbers to avoid DivByZero
    return (fSrcAlpha f)/( f+fSum);
}

31 Summary
Many uncovered topics and details (e.g. anisotropy, prediction)
Virtual Texture is an old idea
Recommended read: Sean Barrett, “Sparse Virtual Textures”, GDC 2008
Course PDF for details and references
We [game developers] need asynchronous partial texture updates and mip-level adjustment with predictable performance. It seems we will get virtualized memory for graphics card memory, which is nice from an OS perspective, but for games we don’t want stalling virtual memory. The application or the engine can deal with missing data at a higher level and can provide fallbacks until the request is resolved. Whatever methods are used, and the described virtual texture method is just one, we need to be able to deliver frame rate quality and see this as a valuable feature (Quality of Service).

32 Future …
Better HW / API support: render from/to virtual texture, Quality of Service (MultiGPU?), local LOD feedback, compression
Many applications: adaptive shadow maps, TileTrees, PolyCube-Maps, …

34 Acknowledgments
Efgeni Malachewitsch for the creature 3D model
Anton Kaplanyan, Nick Kasyan, Michael Kopietz and all others that helped me with this text
Special thanks to Natasha Tatarchuk, Kev Gee, Miguel Sainz, Yury Uralsky, Henry Morton, Carsten Dachsbacher and the many others from the industry
Questions?

