Vulkan Memory Types Ashley Smith November 2018.

Vulkan Memory Types Ashley Smith November 2018

Agenda Heaps and types Tips and tricks The VMA library Conclusion

Effort vs control Effort required Legacy APIs Level of control
Achieve smaller footprint Better optimize for specific platforms Aliasing! Effort required Legacy APIs Level of control We have more control with Vulkan But we require more effort

Heaps and types

Figure out what type, alignment and size
Allocation Figure out what type, alignment and size Allocate memory Create a resource Resource Memory Bind With legacy APIs, applications take the lower path With lower level APIs, applications take the higher path More operations required

Allocating some memory
VkResult vkAllocateMemory( VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMemory); typedef struct VkMemoryAllocateInfo { VkStructureType sType; const void* pNext; VkDeviceSize allocationSize; uint32_t memoryTypeIndex; } VkMemoryAllocateInfo; Allocate memory with a call to vkAllocateMemory()

VkResult vkAllocateMemory( VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMemory); typedef struct VkMemoryAllocateInfo { VkStructureType sType; const void* pNext; VkDeviceSize allocationSize; uint32_t memoryTypeIndex; } VkMemoryAllocateInfo; But what do we pass in to memoryTypeIndex ?

vkGetPhysicalDeviceMemoryProperties( VkPhysicalDevice physicalDevice, VkPhysicalDeviceMemoryProperties* pMemoryProperties); typedef struct VkPhysicalDeviceMemoryProperties { uint32_t memoryTypeCount; VkMemoryType memoryTypes[VK_MAX_MEMORY_TYPES]; uint32_t memoryHeapCount; VkMemoryHeap memoryHeaps[VK_MAX_MEMORY_HEAPS]; } VkPhysicalDeviceMemoryProperties; We need to query the memory types and heaps that are present on the device Select an appropriate place for our resource

vkGetPhysicalDeviceMemoryProperties( VkPhysicalDevice physicalDevice, VkPhysicalDeviceMemoryProperties* pMemoryProperties); typedef struct VkPhysicalDeviceMemoryProperties { uint32_t memoryTypeCount; VkMemoryType memoryTypes[VK_MAX_MEMORY_TYPES]; uint32_t memoryHeapCount; VkMemoryHeap memoryHeaps[VK_MAX_MEMORY_HEAPS]; } VkPhysicalDeviceMemoryProperties; Next we will explain the memory types available

typedef enum VkMemoryPropertyFlagBits { VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT = 0x , VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT = 0x , VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x , VK_MEMORY_PROPERTY_HOST_CACHED_BIT = 0x , VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT = 0x , VK_MEMORY_PROPERTY_PROTECTED_BIT = 0x , } VkMemoryPropertyFlagBits; We will only be focusing on these memory properties today

Memory types vs. heaps Memory Type Heap 0 Type 0 Heap 1 Type 1 Heap 2
(RX Vega 64) Memory Type Heap 0 Type 0 Heap 1 Type 1 Heap 2 Type 2 Type 3 Keep in mind the size of each heap

Memory types cheat sheet
(RX Vega 64) Size Memory Type $ $ R W R W Most of VRAM 1 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow

(RX Vega 64) Memory Type $ $ R W R W Size Most of VRAM Maps to VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT in VkMemoryPropertyFlagBits. 1 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow

(RX Vega 64) $ $ R W R W Size Memory Type Most of VRAM 1 Maps to VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_CACHED_BIT respectively. 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow

(RX Vega 64) Memory Type $ $ R W R W Size Most of VRAM 1 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow Type 0: Video memory. Not visible by CPU at all, you can’t map it. Fast for access on the GPU. Straightforward. Type 1: Not in GPU memory, instead it’s in CPU memory, however, like all memory types it’s still visible and cached by the GPU. It is however, *not* cached by the CPU. So that means reads will be uncached. Writes may be write-combined to achieve decent write speed. Type 2: We’re now back in GPU memory again. Except this time the memory is visible by the CPU. However, it’s not cached - Write it and never read it from CPU. Type 3: We’re now back over in host memory again, with Type 3. This type is similar to Type 1, except it’s cached by the CPU (write-back). This means the GPU reads have to snoop CPU caches. Making both R&W quite slow.

(RX Vega 64) Memory Type $ $ R W R W Size Most of VRAM Okay, not that bad since we benefit from GPU caches, but certainly worse than just reading from DEVICE_LOCAL. 1 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow

(RX Vega 64) On current GPUs & drivers, PC Windows® everything that is HOST_VISIBLE is also marked COHERENT. Memory Type $ $ R W R W Size Most of VRAM 1 On other architectures you may need: vkInvalidateMappedMemoryRanges before reads and vkFlushMappedMemoryRanges after writes. BEWARE: Unmapping won’t do this for you! 2 Fixed 256MiB 3 $ - Storage - Visible - Cached - Fast - Slow

Memory types vs. heaps Memory Type Usage Notes Heap 0 Type 0
(RX Vega 64) Memory Type Usage Notes Heap 0 Type 0 Use for “upload-once” resources, render targets, depth buffers, etc. Heap 1 Type 1 Staging for uploads to VRAM. Fixed size of 256MiB. Use for dynamic, write once, read once resources. Heap 2 Type 2 Type 3 Written by GPU, read by CPU. E.g.: Downsampled luminance.

Warning VkMemoryAllocateInfo x = { .memoryTypeIndex = 2 }; vkAllocateMemory(&x, &mem);

Warning vkGetPhysicalDeviceMemoryProperties(&properties); VkMemoryAllocateInfo x = { .memoryTypeIndex = findAppropriateMemoryHeap(properties); }; vkAllocateMemory(&x, &mem);

Warning Not a good idea to hardcode the memory type indices
Driver may change in future May be different on other/newer hardware Magic numbers Query the info using vkGetPhysicalDeviceMemoryProperties Map to engine-specific enums

Tips and tricks

Allocation strategies
Is this a good idea? for( int i = 0; i < 1000; i++ ) { my_new_objects[i] = new x; }

Is this a good idea? //for( int i = 0; i < 1000; i++ ) { // my_new_objects[i] = new x; //} my_new_objects = new x[1000];

Same idea on GPU for similar reasons: Fragmentation Performance Data locality Allocate reasonably large chunks of memory (256MiB) Just 16 allocations fills 4GiB of VRAM Good balance between flexibility and performance On Windows®7 Vulkan memory allocations have larger overhead Sub-allocate the memory for resources from these blocks

Linear allocator Stack allocator Double stack allocator Block allocator Ring buffer - Used memory - Free memory

Over subscription When you run out of memory:
VK_ERROR_OUT_OF_DEVICE_MEMORY VK_SUCCESS

Over subscription VK_ERROR_OUT_OF_DEVICE_MEMORY Allocation fails
Application must handle out-of-memory conditions Out-of-memory potentially changes per driver/hardware

Over subscription VK_SUCCESS Allocation succeeds
Some blocks are silently migrated to system memory Why would you want this? Useful for development purposes - Artists don’t always stick to budgets Some of your blocks might get paged anyway (you’re not alone on the machine) Application doesn’t have to handle out-of-memory Accessing blocks migrated to system memory can degrade GPU performance

Over subscription No way is exposed to control residency manually
No way is exposed to query the used/free memory To make things worse, there are other implicit resources which need memory too: Swap chains Command buffers Descriptors Shaders / PSOs Query results Use VkMemoryHeap::size then apply some “informed adjustments”: Flags Adjustment DEVICE_LOCAL VkMemoryHeap::size * 0.8f DEVICE_LOCAL | HOST_VISIBLE VkMemoryHeap::size * 0.66f Can now be controlled with VK_AMD_memory_overallocation_behavior However memory can still be migrated – still suggest “fudges” on limits

Vulkan memory allocator (VMA)

Free Open source MIT license Single header Simple, C99 interface. Same style as Vulkan™ Battle tested, already getting some love in the community

Function that help to choose the correct and optimal memory type based on intended usage Functions that allocate memory blocks, reserve and return parts of them to the user Allocation tracker, look at used/unused, and fragmentation Respects alignment and buffer/image granularity

VkBufferCreateInfo bufferInfo = { VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO }; bufferInfo.size = 65536; bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT; VmaAllocationCreateInfo allocInfo = {}; allocInfo.usage = VMA_MEMORY_USAGE_GPU_ONLY; VkBuffer buffer; VmaAllocation allocation; vmaCreateBuffer(allocator, &bufferInfo, &allocInfo, &buffer, &allocation, NULL);

Even has some tooling! VMA can dump allocator state to JSON Python script generates PNG file which shows the allocator contents Defragmentation support Unintialised memory checks, Out of bounds checks

Conclusion Vulkan is lower-level and requires explicit memory management Creating resources is a multi-stage process Former driver magic is now under your control You need to deal with differences between GPUs By following good practices you can achieve optimal performance on any GPU Vulkan Memory Allocator (VMA) is battle-tested and can really help a lot

Thank-you Adam Sawicki Dominik Baumeister Lou Kramer
Matthäus G. Chajdas Rys Sommefeldt Timothy Lottes Nicolas Thibieroz Alon Or-Bach

Memory aliasing As resolutions get larger, render targets follow suit
As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes:

As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check Once lighting is done, we don’t need the g-buffer anymore. Let’s use it! G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes:

As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check Post processing often does a lot of ping-ponging of RTs. Why not use again here? G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes:

As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check Maybe some compute shader in here needs a nice big UAV for something. No need to allocate, alias with a Render target. G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes: Showing resource lifetime Frame graph style

As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check For second, third, etc. use of aliased resource best to assume it contains garbage G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes: VK_IMAGE_LAYOUT_UNINITIALIZED

As many resources are transient, aliasing can be a solution to keep render target/UAV memory in check For second, third, etc. use of aliased resource best to assume it contains garbage >50% memory saved in some titles [ODonnell17] G-Buffers: Render G-Buffer Lighting Render Alphas Post-Processing Render passes: [ODonnell17] Yuriy O’Donnell – FrameGraph: Extensible Rendering Architecture in Frostbite

Conclusion Vulkan is lower-level and requires explicit memory management Creating resources is a multi-stage process Former driver magic is now under your control You need to deal with differences between GPUs By following good practices you can achieve optimal performance on any GPU Vulkan Memory Allocator (VMA) is battle-tested and can really help a lot

Thank-you Adam Sawicki Dominik Baumeister Lou Kramer
Matthäus G. Chajdas Rys Sommefeldt Timothy Lottes Nicolas Thibieroz Alon Or-Bach

Vulkan Memory Types Ashley Smith November 2018.

Similar presentations

Presentation on theme: "Vulkan Memory Types Ashley Smith November 2018."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Vulkan Memory Types Ashley Smith November 2018.

Similar presentations

Presentation on theme: "Vulkan Memory Types Ashley Smith November 2018."— Presentation transcript:

Similar presentations

About project

Feedback