1
Partially Resident Textures on Next-Generation GPUs
Bill Bilodeau, Graham Sellers, Karl Hillesland - AMD
2
Agenda for Today’s Talk
Part 1 – Introduction to HD7970 and Partially Resident Textures, Bill Bilodeau
Part 2 – Implementation in OpenGL, Graham Sellers
Part 3 – Ptex, an example PRT application, Karl Hillesland
3
Part 1 Introduction to the Radeon HD7970 and Partially Resident Textures
4
What are Partially Resident Textures?
Partially Resident Textures (PRTs) are textures that have only portions of the texture stored in GPU video memory
The best-known example of virtual texturing (software implementation) is John Carmack’s “MegaTextures”
[Image from id Software’s Rage]
5
Radeon HD7970 Overview
World’s first GPU to have dedicated hardware for Partially Resident Textures
Completely new shader architecture
Improved cache and memory bandwidth
World’s first Direct3D® 11.1 GPU
6
Previous Shader Architecture
Previous AMD GPUs used a VLIW (Very Long Instruction Word) architecture
Combines instructions into a 4-wide VLIW that gets executed on a SIMD
[Diagram: shader instructions packed into the X, Y, Z and W slots of a VLIW instruction for threads 0 to 63. The dependent sequence a = b + c; b = a + c; c = b + a; d = c + d; fills one slot per instruction and leaves the other slots idle. The independent sequence a = b + c; b = c + d; c = d + e; d = e + f; fills all four slots.]
7
New Shader Architecture
64-wide SIMD architecture without VLIW instructions
No need to combine instructions, since multiple threads can run in parallel
[Diagram: the shader instructions a = b + c; b = a + c; c = b + a; d = c + d; each issue across ALUs S0, S1, S2 ... S63, one thread per ALU. No idle ALUs!]
8
Compute Units Are the New Basic Building Block for Shaders
Each Compute Unit consists of 4 SIMDs and one scalar unit
  Higher execution efficiency
  Simplified logic design
  Simplified assembly language
HD7970 has 32 Compute Units
  4 SIMDs per CU
[Diagram: Compute Unit]
9
Additional Features of the HD7970
Improved tessellation performance
Improved geometry shader performance
Fast depth accept for fully visible triangles, depth bounds testing support
384-bit memory bus
DX11.1
And of course, Partially Resident Texture support!
10
Introduction to Partially Resident Textures
Enables the application to manage more texture data than can physically fit in a fixed footprint
  A.k.a. virtual texturing or sparse texturing
The principle behind PRT is that not all texture content is likely to be needed at any given time
  The current render view may only require selected portions of the texture to be resident in memory
  Or selected MIP levels
PRT textures only have a portion of their data mapped into GPU-accessible memory at a given time
11
PRT Tiles
The PRT texture is chunked into 64 KB tiles
  Fixed memory size
  Not dependent on texture type or format
Highlighted areas represent texture data that needs the highest resolution
[Figures: chunked texture; texture tiles needing to be resident in GPU memory. Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
12
Translation Table
The GPU virtual memory page table translates tiles into a resident texture tile pool
[Diagram: Texture Map, Page Table (mapped and unmapped page entries), Texture Tile Pool in video memory (linear storage of 64 KB tiles). Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
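The translation step can be pictured with a small CPU-side model. This is only a sketch of the idea, not how the hardware page table is laid out: the `PageTable` struct, the `UNMAPPED` marker, and `tile_pool_offset` are hypothetical names, and the only figure taken from the talk is the 64 KB tile size.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_BYTES 65536u     /* 64 KB tile size stated in the talk */
#define UNMAPPED   UINT32_MAX /* hypothetical marker for an unmapped page entry */

/* Hypothetical page table: one entry per virtual tile, holding a pool slot index. */
typedef struct {
    const uint32_t *entries;
    uint32_t tiles_x; /* tiles per row of the texture */
    uint32_t tiles_y; /* tiles per column of the texture */
} PageTable;

/* Returns the byte offset of tile (tx, ty) inside the linear tile pool,
   or -1 if that tile is not resident. */
int64_t tile_pool_offset(const PageTable *pt, uint32_t tx, uint32_t ty)
{
    assert(tx < pt->tiles_x && ty < pt->tiles_y);
    uint32_t slot = pt->entries[ty * pt->tiles_x + tx];
    if (slot == UNMAPPED)
        return -1;
    return (int64_t)slot * TILE_BYTES;
}
```

An unmapped entry corresponds to the case where a shader fetch would report a failed texel fetch.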
13
Translation Table - MIPMaps
MIP maps can be included in the Texture Tile Pool
[Diagram: Texture Map, Page Table (mapped and unmapped page entries), Texture Tile Pool in video memory (64 KB tiles). Images from “Sparse Virtual Textures”, Sean Barrett, GDC 2008]
14
“Failed” Texel Fetch Condition
How does the application know which texture tiles to upload?
Answer: PRT-specific texture fetch instructions in a shader
  Return a “Failed” texel fetch condition when sampling a PRT pixel whose tile is currently not in the pool
This information is then stored in a render target or UAV
  [Figure: texel fetch failed for a given (x,y) tile location]
...and then copied to the CPU so that the application can upload the required tiles
  GPU-to-CPU copies have a delay of a few frames
App chooses what to render until missing data gets uploaded
Note: in OpenGL there is a “sparse” version of virtually all texture fetch instructions.
15
“LOD Warning” Texel Fetch Condition
PRT fetch condition code can also indicate an “LOD Warning”
The minimum LOD warning is specified by the application on a per-texture basis
If a fetched pixel’s LOD is below the specified LOD warning value, the condition code is returned
This functionality is typically used to try to predict when higher-resolution MIP levels are going to be needed
  E.g. camera getting closer to PRT-mapped geometry
16
Example Usage
1) App allocates PRT (e.g. 16k x 16k BC1, also known as DXT1) using PRT API
   No video memory is allocated at this stage
2) App uploads MIP levels using API calls
3) Shader fetches PRT data at specified texcoords. Two possibilities:
   3a) Texel data belongs to a resident (64 KB) tile
       - Valid color returned, no error code
   3b) Texel data points to a non-resident tile or the specified LOD
       - Error/LOD Warning code returned
       - Shader writes tile location and error code to RT or UAV
4) App reads RT or UAV and uploads/releases new tiles as needed
   UAV reading on the CPU is subject to latency, typically a couple of frames when copying GPU memory to CPU-accessible memory
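Step 4 can be sketched on the CPU side as a scan over the feedback data that the shader wrote. This is a minimal sketch under stated assumptions: the one-byte-per-tile feedback layout, the `FB_*` codes, and the `TileRequest` and `collect_missing_tiles` names are all hypothetical and are not part of any PRT API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tile feedback codes written by the shader. */
enum { FB_OK = 0, FB_MISS = 1, FB_LOD_WARNING = 2 };

typedef struct { uint16_t tile_x, tile_y; } TileRequest;

/* Scans a tiles_x * tiles_y feedback buffer (one byte per tile, row-major)
   and records every tile whose fetch failed. Returns the number of requests
   written, at most max_out. */
size_t collect_missing_tiles(const uint8_t *feedback,
                             uint32_t tiles_x, uint32_t tiles_y,
                             TileRequest *out, size_t max_out)
{
    assert(feedback != NULL && out != NULL);
    size_t n = 0;
    for (uint32_t ty = 0; ty < tiles_y; ++ty) {
        for (uint32_t tx = 0; tx < tiles_x; ++tx) {
            if (feedback[ty * tiles_x + tx] == FB_MISS && n < max_out)
                out[n++] = (TileRequest){ (uint16_t)tx, (uint16_t)ty };
        }
    }
    return n;
}
```

The application would then upload the listed tiles; because the feedback arrives a couple of frames late, the list describes what was missing a few frames ago rather than the current view.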
17
PRT Advantages vs Software Implementation
Ease of implementation
  Eliminates the complexity and limitations of SW solutions (e.g. Carmack’s MegaTexture)
  Reduced overhead, no need for texture borders
Full filtering support
  Includes anisotropic filtering
Full-speed filtering
  SW solution requires “manual” filtering in pixel shader
  Can be quite costly if anisotropic filtering is used
Don’t go overboard with PRT allocation!
  Page table entry size is 4 DWORDs (16 bytes)
  Page table entries have to be resident in video memory
  With one entry per 64 KB tile, the maximum texture size of 16k x 16k x 8k x 32 bits = 8 TB needs 2 GB of page table entries
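The page table figure quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers from the slide (64 KB tiles, 16-byte entries); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_BYTES       65536u /* 64 KB per tile */
#define PAGE_ENTRY_BYTES 16u    /* 4 DWORDs per page table entry */

/* Total bytes of texel data for a w x h x d texture at the given bits per texel. */
uint64_t texture_bytes(uint64_t w, uint64_t h, uint64_t d, uint32_t bits_per_texel)
{
    assert(bits_per_texel % 8 == 0);
    return w * h * d * (bits_per_texel / 8);
}

/* Bytes of page table entries needed to cover that much virtual texture data. */
uint64_t page_table_bytes(uint64_t tex_bytes)
{
    uint64_t tiles = (tex_bytes + TILE_BYTES - 1) / TILE_BYTES; /* round up */
    return tiles * PAGE_ENTRY_BYTES;
}
```

For 16384 x 16384 x 8192 texels at 32 bits, `texture_bytes` gives 2^43 bytes (8 TB), which is 2^27 tiles and therefore 2^31 bytes (2 GB) of page table entries.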
18
Part 2 Implementation in OpenGL AMD_sparse_texture Extension
19
OpenGL Extension | AMD_sparse_texture
Partially Resident Textures are exposed in OpenGL via an extension
Two design goals for the extension
  Minimally invasive to the API
    Easy to retrofit into an existing application
    Plays well with non-sparse textures
  Easy fallback path
    Most of the same code will work in the absence of the extension
Two parts to the extension
  Update to the API – 1 function, a handful of tokens
  Update to the shading language
20
Upload Textures | Example Using Existing OpenGL API
Use of immutable texture storage
This is the existing OpenGL immutable storage API – declare storage, specify image data

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorage2D(GL_TEXTURE_2D, 10, GL_RGBA8, 1024, 1024);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
21
Upload Textures | Example Using New OpenGL Extension
Use of sparse texture storage
glTexStorageSparseAMD is the one new function in the extension
Notice very little difference to the previous API

    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
22
Make Pages Resident | Reuse Existing API
Previous example used glTexSubImage2D
  Upload a sub-region of the texture
  Physical pages allocated on demand by the OpenGL driver
  Unused pages remain free
Here, enough storage for two 256x256 regions is allocated

    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 10, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data1);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 768, 768, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, data2);
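To estimate how many physical pages such an upload makes resident, one can count the pages that the sub-region overlaps. The helper below is a hypothetical sketch, not part of the extension; in a real application the page dimensions would come from the driver query for GL_VIRTUAL_PAGE_SIZE_{X,Y}_AMD, and the 128 x 128 page mentioned afterwards is an assumption for a 32-bit format.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages overlapped by the texel region [x, x+w) x [y, y+h),
   given a page size of page_w x page_h texels. */
uint32_t pages_touched(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                       uint32_t page_w, uint32_t page_h)
{
    assert(w > 0 && h > 0 && page_w > 0 && page_h > 0);
    uint32_t first_col = x / page_w, last_col = (x + w - 1) / page_w;
    uint32_t first_row = y / page_h, last_row = (y + h - 1) / page_h;
    return (last_col - first_col + 1) * (last_row - first_row + 1);
}
```

With an assumed 128 x 128 page, each of the two page-aligned 256x256 uploads touches 4 pages. A region that is not page-aligned touches more, for example 9 pages for a 256x256 region starting at (100, 100).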
23
Free Physical Pages | Again, Reuse Existing API
Passing NULL to glTexSubImage2D makes pages non-resident
Driver returns physical pages to the pool

    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
24
Page Sizes | Determining Page Sizes
Sparse textures rely on the VM subsystem
Pages are 64 KB in size on Southern Islands
  Note that size is measured in bytes, not texels
  Texel size of a page depends on texture format

    BPP    Texels per page
    128    4096
    64     8192
    32     16384
    16     32768
    8      65536

[Table: tile width and tile height per format (BPP 128, 32, 16, 8, BC2/3/5/6H/7, BC1/4). Only the values 64 (128 BPP), 256 (BC2/3/5/6H/7) and 512 (BC1/4) survive in this transcript.]
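The texels-per-page column follows directly from the 64 KB page size. The sketch below reproduces it; the split into tile width and height is an assumption (power-of-two dimensions, width at least as large as height), so the authoritative values are the ones the driver reports for GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 65536u /* 64 KB page on Southern Islands, per the talk */

/* Number of texels that fit in one page at the given bits per texel. */
uint32_t texels_per_page(uint32_t bits_per_texel)
{
    assert(bits_per_texel > 0);
    return (uint32_t)(((uint64_t)PAGE_BYTES * 8u) / bits_per_texel);
}

/* Assumed layout: split a power-of-two texel count into a w x h tile
   with w == h or w == 2 * h. The real split is hardware-defined. */
void tile_dims(uint32_t texels, uint32_t *w, uint32_t *h)
{
    assert(texels > 0 && w != 0 && h != 0);
    uint32_t log2n = 0;
    while ((1u << (log2n + 1)) <= texels)
        log2n++;
    *h = 1u << (log2n / 2);
    *w = 1u << (log2n - log2n / 2);
}
```

Under that assumption a 128 BPP format gives a 64 x 64 tile, an 8 BPP format gives 256 x 256, and a 4 BPP block-compressed format such as BC1 gives 512 x 256.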
25
Page Size | Retrieving Page Size from OpenGL
Reuse existing API: glGetInternalformativ
New OpenGL tokens – GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD
Given a target (texture dimensionality) and format, returns the page size
It is not necessary to create a texture to get this information

    GLint page_size_x;
    /* bufSize is the number of integers that may be written, not a byte count */
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA8, GL_VIRTUAL_PAGE_SIZE_X_AMD, 1, &page_size_x);
26
MipMaps | Dealing With Small Textures
Highest resolution LOD requires multiple pages
Each LOD requires fewer and fewer pages
Eventually, one LOD does not fill a page
  Now what?
At some point, we must make all LODs resident
  But which LOD?
Use glGetInternalformativ to retrieve the lowest sparse level for a given target/format
  All levels below this reside in the same page and share residency

    GLint min_sparse_level;
    glGetInternalformativ(GL_TEXTURE_2D, GL_RGBA16F, GL_MIN_SPARSE_LEVEL_AMD, 1, &min_sparse_level);
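The shrinking page count per LOD is easy to tabulate. The sketch below is illustrative only: the helper names are hypothetical, and the level it finds is simply the first one narrower than a page, which need not match what the driver reports for GL_MIN_SPARSE_LEVEL_AMD.

```c
#include <assert.h>
#include <stdint.h>

/* Dimension of a MIP level: halves per level, never below 1. */
static uint32_t level_dim(uint32_t base, uint32_t level)
{
    assert(level < 32);
    uint32_t d = base >> level;
    return d ? d : 1;
}

/* Pages needed to hold one MIP level of a 2D texture. */
uint32_t pages_for_level(uint32_t base_w, uint32_t base_h, uint32_t level,
                         uint32_t page_w, uint32_t page_h)
{
    assert(page_w > 0 && page_h > 0);
    uint32_t w = level_dim(base_w, level), h = level_dim(base_h, level);
    return ((w + page_w - 1) / page_w) * ((h + page_h - 1) / page_h);
}

/* First level that is smaller than a page in either dimension, or -1 if none. */
int first_subpage_level(uint32_t base_w, uint32_t base_h,
                        uint32_t page_w, uint32_t page_h, uint32_t levels)
{
    for (uint32_t l = 0; l < levels; ++l) {
        if (level_dim(base_w, l) < page_w || level_dim(base_h, l) < page_h)
            return (int)l;
    }
    return -1;
}
```

For a 1024 x 1024 texture with an assumed 128 x 128 page, level 0 needs 64 pages, level 3 needs exactly one, and level 4 is the first that no longer fills a page.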
27
LOD Warning | Low Water Mark
To assist in streaming we include a per-texture low water mark
  Set this to the highest resolution LOD that’s fully resident
Once you hit this, you’ll get a signal in the shader
  Returned data is still valid
  Signal says it’s time to start streaming the next mip
Exposed using the glTexParameter API
  Here, an LOD warning will be returned to the shader if hardware attempts to access LOD 4 or lower
  More on residency returns later...

    glTexParameteri(GL_TEXTURE_2D, GL_MIN_WARNING_LOD_AMD, 4);
28
Rendering to PRT | Attach PRT to FBO
It is possible to render to a PRT using an FBO
Writes to unmapped regions are simply dropped

    GLuint prt, fbo;
    glGenTextures(1, &prt);
    glBindTexture(GL_TEXTURE_2D, prt);
    glTexStorageSparseAMD(GL_TEXTURE_2D, GL_RGBA, 1024, 1024, 1, 1, GL_TEXTURE_STORAGE_SPARSE_BIT_AMD);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
29
Reading from PRT | Retrieving Data from PRTs
Applications can read PRTs to CPU memory using existing APIs
Call glGetTexImage to read the entire content back

    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, data);

Bind to FBO and use glReadPixels or glBlitFramebuffer
  Reads to system memory or into another FBO, respectively

    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, prt, 0);
    glReadPixels(0, 0, 1024, 1024, GL_RGBA, GL_UNSIGNED_BYTE, data);
    glBlitFramebuffer(0, 0, 1024, 1024, 0, 0, 128, 128, GL_COLOR_BUFFER_BIT, GL_LINEAR);
30
Restrictions | Mostly Everything Works
There are some restrictions on the use of sparse textures
Dimensions of the base level must be integer multiples of the page size (GL_VIRTUAL_PAGE_SIZE_{X,Y,Z}_AMD)
  This means... no sparse textures below this size
No buffer textures or “TBOs” – another extension is coming for that!
No depth or stencil textures, nor MSAA textures
31
Managing Failure | Memory is not Unlimited
Virtual address space is extremely large – tens to hundreds of gigabytes
  You will run out eventually, but it’ll take a while
Physical memory is still limited
  glTexSubImage2D etc. may fail
  Draw calls may fail
Feel free to create a 4k x 4k x 4k volume texture
  Don’t try to make it all resident at the same time!
  4k x 4k x 4k x 32bpp is 256 GB
There are no sparse read-backs
  glGetTexImage could read gigabytes of data back
  This will fail
32
Sparse Textures in Shaders | Extending GLSL
First and most important: IT IS NOT NECESSARY TO MAKE SHADER CHANGES TO USE SPARSE TEXTURES
33
Sparse Textures in Shaders | Extending GLSL
Basic type for textures in GLSL is the ‘sampler’
  Several types of samplers exist... sampler2D, sampler3D, samplerCube, sampler2DArray, etc.
  We didn’t add any new sampler types
  PRTs look like regular textures in the shader
Textures are read using the ‘texture’ built-in function, its overloads and variants
  We didn’t add any overloads

    gvec4 texture(gsampler1D sampler, float P [, float bias]);
    gvec4 texture(gsampler2D sampler, vec2 P [, float bias]);
    gvec4 texture(gsampler2DArray sampler, vec3 P [, float bias]);
    gvec4 textureLod(gsampler2D sampler, vec2 P, float lod);
    gvec4 textureProj(gsampler2D sampler, vec4 P [, float bias]);
    gvec4 textureOffset(gsampler2D sampler, vec2 P, ivec2 offset [, float bias]);
    // ... etc.
34
Extending GLSL | New Built-in Functions
Adding more overloads to existing functions was difficult
  Need to return a status code and a texel
  Need user-specified defaults with conditional-move-like functionality
  Optional parameters in existing overloads made this very difficult
Added new built-in functions
  New built-in functions return status code
  New built-in functions return texel data via inout parameters
  Most existing texture functions have a sparseTexture equivalent
Non-PRTs work with new functions
  Will appear as fully-resident PRT

    int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
    int sparseTextureLod(gsampler2D sampler, vec2 P, float lod, inout gvec4 texel);
    // ... etc.
35
Extending GLSL | sparseTexture Functions
All sparseTexture functions return two pieces of data:
  Texel data via inout parameter
  Residency status code
Texel data returned in inout parameter
  If texel fetch fails, old data remains in variable
  Think of it as a CMOV type operation
Return code is hardware-dependent bit-field information
  More built-in functions for decoding status codes
  This allows us to extend this further in the future, or to change the implementation

    int sparseTexture(gsampler2D sampler, vec2 P, inout gvec4 texel [, float bias]);
36
sparseTexture Functions | Texture Data Return
Texel data is returned in inout parameter
No direct support for ‘default value’ behavior
This is emulated in the shader:

    vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value
    sparseTexture(s, texCoord, texel);
    // On success, texel contains texture data. On failure, it has the shader-supplied
    // default value in it (pinkish magenta here).

Note that regular texture fetch functions work on PRTs too:

    vec4 texel = texture(s, texCoord);

  Value of texel is undefined if you miss ...
  ... but feel free to use on known-resident data (atlases, explicit LoD, etc.)
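The conditional-move behaviour of the inout parameter can be mimicked on the CPU to make the semantics concrete. This is a hypothetical model, not the GLSL built-in: `sparse_fetch_model`, `Vec4`, and the `resident` flag are stand-ins for what the hardware decides during the fetch.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float r, g, b, a; } Vec4;

/* Hypothetical CPU-side model of the inout behaviour: 'fetched' stands in for
   the data the hardware would read, 'resident' for the residency test. */
bool sparse_fetch_model(bool resident, Vec4 fetched, Vec4 *texel)
{
    assert(texel != 0);
    if (resident)
        *texel = fetched; /* success: overwrite the caller's value */
    return resident;      /* failure: the caller's default is left untouched */
}
```

Initialising the output to a default colour before the call therefore gives the default-value behaviour for free, which is the pattern shown in the shader snippet above.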
37
sparseTexture Functions | Residency Data Return
Residency data is bit-packed into the return value from the fetch

    vec4 texel = vec4(1.0, 0.0, 0.7, 1.0); // Default value
    int code;
    code = sparseTexture(s, texCoord, texel);

After this, code can be interpreted by three additional functions:

    bool sparseTexelResident(int code);
    bool sparseTexelMinLodWarning(int code);
    int sparseTexelLodWarningFetch(int code);
38
Residency Data | sparseTexelResident
sparseTexelResident simply indicates whether the data fetched is valid
  Returns true if data is valid, false otherwise
Texel miss is generated if any required sample is not resident, including:
  Texels required for bilinear or trilinear sampling
  Missing mip maps
  Anisotropic filter taps
It is up to the shader to ‘do the right thing’
  Fall back to lower mips
  Write out to an image or framebuffer attachment
  etc., etc.

    bool sparseTexelResident(int code);
39
Residency Data | sparseTexelMinLodWarning
sparseTexelMinLodWarning returns true if a min LOD warning was generated
  This occurs when generating the returned texel required a fetch from an LOD lower than the low-water mark specified by the application
  This can be a signal to the application to start streaming more mip levels

    bool sparseTexelMinLodWarning(int code);
40
Residency Data | sparseTexelLodWarningFetch
Returns the LOD that caused the low-watermark warning to be generated
  This also causes sparseTexelMinLodWarning to return true
sparseTexelLodWarningFetch returns 0 if the warning was not hit

    int sparseTexelLodWarningFetch(int code);
41
Example Use Cases | What Can I Use This For?
Drop-in replacement for traditional 2D Sparse Virtual Texture (SVT)
  Well, almost – maximum texture size hasn’t increased
Very large texture arrays
  Sparsely populate array
  Can almost eliminate texture binds in some applications
Volume textures + ray marching
  Sparse or homogeneous media
  Default value is maximum step distance for ray marching distance fields
Arrays of variable-sized textures
  Make a large array, but populate different mip levels in each slice
  Store LOD bias per array slice in an auxiliary array (UBO, for example)
Etc., etc., etc.
42
Part 3 PRT Ptex: Ptex Using Sparse Textures
43
Ptex | Introduction
Ptex: Per-face Texture Mapping for Production Rendering [Burley and Lacewell, 2008]
No UV setup (it’s implicit)
No Seams
Per-Patch Resolution Control
Out-of-core Performance Advantages
44
Ptex | Introduction
Per-face textures + MIPs
Adjacency for filtering
45
Borders for Filtering
[Diagram: Face Texture A and Face Texture B, each with a border of neighbouring texels]
46
Manual Trilinear Filtering
[Diagram: resolution lookup (ddx, ddy); sample the floor level and the floor + 1 level; lerp between them by the frac part]
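The floor, frac and lerp steps in the diagram can be written out as follows. This is a sketch under assumptions: the level of detail is taken as an input (in a shader it would be derived from ddx/ddy), and the `MipBlend`, `select_mips` and `lerpf` names are illustrative rather than taken from the talk.

```c
#include <assert.h>

typedef struct {
    int   level0; /* floor(lod) */
    int   level1; /* floor(lod) + 1, clamped to the coarsest level */
    float weight; /* frac(lod), the blend factor between the two levels */
} MipBlend;

/* Splits a level of detail into the two levels to sample and their blend weight. */
MipBlend select_mips(float lod, int max_level)
{
    assert(max_level >= 0);
    if (lod < 0.0f)
        lod = 0.0f;
    if (lod > (float)max_level)
        lod = (float)max_level;
    MipBlend b;
    b.level0 = (int)lod; /* truncation equals floor for non-negative values */
    b.level1 = b.level0 < max_level ? b.level0 + 1 : max_level;
    b.weight = lod - (float)b.level0;
    return b;
}

/* Linear interpolation between two filtered samples. */
float lerpf(float a, float b, float t)
{
    return a + (b - a) * t;
}
```

A caller would sample the texture at `level0` and `level1` and combine the two results with `lerpf(sample0, sample1, weight)`.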
47
PRT Ptex
Packed in one texture array
Slice per resolution
Resolution includes MIPs
Cannot fit in standard MIP chain
Easy lookups
Easy resolution management
Still one texture
48
PRT Ptex Pragmatics
Better organization possibilities
  Pack pages
  Scaled squares
Other Methods
  Packed Ptex – all in one texture slice
  Face per slice, array per resolution
49
Multires Slices
50
MIP Fallback
[Diagram: resolution lookup (ddx, ddy); floor and floor + 1 levels; lerp by the frac part]
51
Demo
52
Trademark Attribution
AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. ©2012 Advanced Micro Devices, Inc. All rights reserved.