Memory Management Internals. August 14-15 2006. Steve Smith, Software Design Engineer, Game Technology Group, Microsoft.

2 Memory Management Internals: Allocation Strategies for High Performance
Steve Smith, Software Design Engineer, Game Technology Group, Microsoft

3 Presentation Overview
What this talk is about:
- Windows and Xbox 360 memory management
- How memory allocation functions work
- Performance consequences of different allocation schemes
- Common pitfalls in managing memory
What this talk is not about:
- How to write your own custom allocators

4 Virtual Memory
- “Virtualizes” physical memory
- Non-contiguous memory presented as contiguous
- 4K native page size (Windows); 4K or 64K native page size (Xbox 360)
- Can allocate without committing RAM
- Per-page control over access rights

5 Virtual Memory
[Diagram: address space layout]
- 0x00000000 to 0x7FFFFFFF (2GB): application code, DLL code, application data, stacks
- 0x80000000 to 0xFFFFFFFF: system

6 Virtual Memory
[Diagram: address ranges by page size]
- Virtual ranges (2GB):
  - 0x00000000 to 0x3FFFFFFF: virtual 4-KB page range
  - 0x40000000 to 0x7FFFFFFF: virtual 64-KB page range
- Code ranges (512MB):
  - 0x80000000 to 0x9FFFFFFF: code 64-KB range and code 4-KB range
- Physical ranges (1.5GB):
  - 0xA0000000 to 0xBFFFFFFF: physical 64-KB range
  - 0xC0000000 to 0xDFFFFFFF: physical 16-MB range
  - 0xE0000000 to 0xFFFFFFFF: physical 4-KB range

7 Virtual Memory Access Rights
- Avoid giving pages execution rights …unless you really need to
  - Enforce via PAGE_EXECUTE flag
- Use read-only pages (PAGE_READONLY)
  - Can catch bugs
  - Performance benefit to loading asset data into memory, then marking as read only
- Access rights controlled using the VirtualProtect API
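A minimal sketch of the "load, then mark read-only" pattern from this slide, not the presenter's code. The helper names (vm_reserve, vm_commit, vm_make_read_only, vm_release) are hypothetical. On Windows they call VirtualAlloc, VirtualProtect and VirtualFree; the mmap/mprotect branch is an added assumption so the sketch also builds on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

// Hypothetical helper: reserve address space without committing RAM.
inline void* vm_reserve(std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(nullptr, bytes, MEM_RESERVE, PAGE_NOACCESS);
#else
    void* p = mmap(nullptr, bytes, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
#endif
}

// Hypothetical helper: back a reserved range with read/write memory.
inline bool vm_commit(void* p, std::size_t bytes) {
#ifdef _WIN32
    return VirtualAlloc(p, bytes, MEM_COMMIT, PAGE_READWRITE) != nullptr;
#else
    return mprotect(p, bytes, PROT_READ | PROT_WRITE) == 0;
#endif
}

// Hypothetical helper: after loading asset data, drop write access
// so that stray writes fault instead of silently corrupting assets.
inline bool vm_make_read_only(void* p, std::size_t bytes) {
#ifdef _WIN32
    DWORD oldProtect = 0;
    return VirtualProtect(p, bytes, PAGE_READONLY, &oldProtect) != 0;
#else
    return mprotect(p, bytes, PROT_READ) == 0;
#endif
}

// Hypothetical helper: give the whole range back to the system.
inline void vm_release(void* p, std::size_t bytes) {
#ifdef _WIN32
    (void)bytes;  // MEM_RELEASE requires a size of 0
    VirtualFree(p, 0, MEM_RELEASE);
#else
    munmap(p, bytes);
#endif
}
```

Typical use would be reserve, commit, copy the asset data in, then call vm_make_read_only on the same range. Sizes should be multiples of the page size.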

8 Virtual Memory: Beware!
- Do not request very large contiguous areas of virtual memory
  - Address space can be fragmented
  - 32-bit VM: only 2GB available
  - Competing with EXEs, DLLs, stack, heaps, memory-mapped I/O, etc.
- Do not allocate more memory than is available (no paging!)
- Prefer 64K page sizes over 4K pages

9 Virtual Memory: Beware!
- Third-party DLLs are often based at an arbitrary address
  - Fragments VM space
  - Rebase DLLs to fix this
- Some D3D9 drivers map all VRAM into VM
  - Takes a significant portion of VM as video memory sizes increase
  - Can result in strange crashes when restoring the device
  - Make sure you are aware of this
  - Is not an issue with 64 bit

10 Virtual Memory Best Practices
- Be careful about VM address space fragmentation
- Keep custom heap allocations limited to 256 MB or so …or less…
- Be careful about physical memory fragmentation

11 General Purpose Allocation
- Global standard unbounded heap created automatically
  - GetProcessHeap to access heap handle
- Aligned allocation – HeapAlloc
  - 8-byte aligned on 32-bit Windows
  - 16-byte aligned on 64-bit Windows
  - 16-byte aligned on Xbox 360
- Recommended for small to medium size allocations

12 Minimize Allocations
- Allocate what you use up front
- Avoid allocations in-frame!
- Don’t process data on load (other than decompression)
  - Block load data/flatten trees
- Avoid allocating small chunks of memory
  - Prefer growable arrays/vectors over linked lists
- …Or avoid allocations altogether… ;)
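As an illustration of "allocate what you use up front", here is a small sketch that reserves a fixed budget once and then refuses to grow during the frame. The Particle type and ParticleBuffer class are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-frame payload.
struct Particle {
    float x, y, z, life;
};

// Hypothetical fixed-budget container: one allocation at startup,
// none while the game is running.
class ParticleBuffer {
public:
    explicit ParticleBuffer(std::size_t max_particles) : max_(max_particles) {
        items_.reserve(max_particles);  // the only heap allocation
    }

    // Returns false instead of reallocating when the budget is used up.
    bool spawn(const Particle& p) {
        if (items_.size() >= max_) return false;
        items_.push_back(p);  // stays within reserved capacity
        return true;
    }

    // Drops all particles but keeps the reserved storage for reuse.
    void clear() { items_.clear(); }

    std::size_t size() const { return items_.size(); }
    const Particle* data() const { return items_.data(); }

private:
    std::size_t max_;
    std::vector<Particle> items_;
};
```

Because the elements live in one contiguous block, iterating them each frame is also friendlier to the cache than walking a linked list of individually allocated nodes.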

13 “Hidden” Allocations
- STL
  - Lots of cases of allocations “under the hood”
- D3DX
  - Internal allocations
  - Not high performance
- XAudio
  - Does physical allocations under the hood
- Many other potential cases – be aware!

14 Wrap It Up…
- Write a general wrapper to sit on top of platform-specific APIs
  - Removes need to worry about implementation details
  - Can plug in your own custom allocation scheme without changing code
  - Override new, delete, malloc, free…
- Add debug features
  - Assert on debug
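One possible shape for such a wrapper is sketched below. Everything here (AllocHooks, the mem namespace, live_count) is a hypothetical design for illustration; the defaults forward to malloc and free, where a real title would plug in its platform-specific or custom allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical hook table: swap these pointers to change the
// allocation scheme without touching any calling code.
struct AllocHooks {
    void* (*alloc)(std::size_t bytes);
    void (*release)(void* p);
};

namespace mem {

inline void* default_alloc(std::size_t bytes) { return std::malloc(bytes); }
inline void default_release(void* p) { std::free(p); }

inline AllocHooks& hooks() {
    static AllocHooks h = {default_alloc, default_release};
    return h;
}

// Debug feature: number of allocations not yet freed.
// Not thread-safe; a real wrapper would use an atomic or a lock.
inline std::size_t& live_count() {
    static std::size_t n = 0;
    return n;
}

inline void* allocate(std::size_t bytes) {
    void* p = hooks().alloc(bytes);
    assert(p != nullptr && "out of memory");  // assert on debug builds
    if (p != nullptr) ++live_count();
    return p;
}

inline void deallocate(void* p) {
    if (p == nullptr) return;
    assert(live_count() > 0 && "free without a matching allocate");
    --live_count();
    hooks().release(p);
}

}  // namespace mem
```

Global operator new/delete overrides can then forward to mem::allocate and mem::deallocate, so that C++ allocations are routed through the same layer.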

15 Multi-core Considerations
- Synchronization
- Overhead
- Complexity
- Memory usage efficiency

16 One Heap Owned per Thread
- Memory allocation managed in one place
- Heap synchronization at a higher level
  - Use HEAP_NO_SERIALIZE for heaps created with HeapCreate(…) API
- Single heap - simplification
- Potential problem with contention

17 One Heap Per Thread
- Each thread creates its own heap
  - Reserves a chunk of virtual memory
  - Commits on demand (as heap grows)
  - Data locality!
- Each thread manages its own allocations
  - No synchronization required
  - Use HEAP_NO_SERIALIZE for heaps allocated with HeapCreate(…) API
  - Assert thread ID on allocate/deallocate
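The "assert thread ID on allocate/deallocate" advice can be sketched with a small bump arena that remembers which thread created it. ThreadArena is a hypothetical class written for this example; it uses a std::vector as backing storage for brevity, whereas the slide describes reserving virtual memory and committing it on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical unsynchronized arena owned by exactly one thread.
class ThreadArena {
public:
    explicit ThreadArena(std::size_t bytes)
        : storage_(bytes), offset_(0), owner_(std::this_thread::get_id()) {}

    // Bump-allocates 'bytes' with the given power-of-two alignment.
    // Returns nullptr when the arena is exhausted.
    void* allocate(std::size_t bytes, std::size_t align = 16) {
        assert(std::this_thread::get_id() == owner_ && "arena used from wrong thread");
        assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");

        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align - 1);
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);

        if (start > storage_.size() || bytes > storage_.size() - start) return nullptr;
        offset_ = start + bytes;
        return reinterpret_cast<void*>(aligned);
    }

    // Frees everything at once; individual frees are not supported.
    void reset() {
        assert(std::this_thread::get_id() == owner_ && "arena used from wrong thread");
        offset_ = 0;
    }

    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t offset_;
    std::thread::id owner_;
};
```

Declaring one instance per thread, for example as a thread_local variable, gives each thread its own arena with no locking. The asserts catch a pointer to the arena leaking to another thread in debug builds.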

18 Low Fragmentation Heap
- Built on top of existing heap
- Windows XP and Vista only
- Modify the default heap…
- …or a heap created with HeapCreate(…)
    ULONG ulInfo = 2;  // 2 selects the low-fragmentation heap
    HANDLE hDefHeap = GetProcessHeap();
    HeapSetInformation( hDefHeap, HeapCompatibilityInformation, &ulInfo, sizeof( ulInfo ) );

19 LFH – How It Works
[Diagram: allocation sizes mapped to buckets]
- 8-byte granularity:
  - Bucket 1: 1 byte to 8 bytes
  - Bucket 32: 248 bytes to 256 bytes
- 16-byte granularity:
  - Bucket 33: 257 bytes to 262 bytes
  - Bucket 127: 15858 bytes to 15873 bytes
- Largest:
  - Bucket 128: 15874 bytes to 16384 bytes
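For the 8-byte-granularity range only, the size-to-bucket mapping is plain round-up division. The sketch below models just buckets 1 to 32 (sizes 1 to 256 bytes) and makes no claim about how the larger buckets are laid out or how the real LFH computes this internally; small_bucket_index is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kGranularity = 8;     // bytes per step, from the slide
constexpr std::size_t kMaxSmallSize = 256;  // upper bound of bucket 32

// Returns the bucket number (1..32) for a request of 'bytes',
// or 0 if the size falls outside the range modelled here.
constexpr std::size_t small_bucket_index(std::size_t bytes) {
    return (bytes == 0 || bytes > kMaxSmallSize)
               ? 0
               : (bytes + kGranularity - 1) / kGranularity;
}
```

With strict 8-byte steps, a 1-byte and an 8-byte request both land in bucket 1, and bucket 32 covers 249 to 256 bytes. Any request is rounded up to its bucket's upper bound, which is where the internal waste of a size-class allocator comes from.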

20 The C Run-Time
- Uses process default heap
  - Uses this heap for temporary allocations
- new and new[] throw on out-of-memory condition by default
  - Checking for NULL doesn’t buy anything
  - Can link with nothrownew.obj to shut this off
  - Watch out though – STL will then crash on OOM
- Can also use std::nothrow
  - …e.g. Foo *pFoo = new( std::nothrow ) Foo;
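The difference between the two forms of new can be shown side by side. Foo and the two helper functions are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <new>

// Hypothetical type used only for this example.
struct Foo {
    int value = 42;
};

// nothrow form: failure is reported as a null pointer,
// so the NULL check is meaningful.
inline Foo* try_make_foo() {
    return new (std::nothrow) Foo;
}

// Default form: failure is reported by throwing std::bad_alloc,
// so a NULL check after plain 'new' never fires.
inline Foo* make_foo_or_null() {
    try {
        return new Foo;
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```

Objects created with either form are released with the ordinary delete expression.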

21 CRT Allocation
- Uses HeapAlloc(…) under the hood
- App compat modes to emulate older CRT
- Allocations 8-byte aligned
- _mm_malloc for aligned allocations (16 byte)
- 16-byte aligned on Win64
- 8–15 bytes of overhead on Win32
- 16–31 bytes of overhead on Win64
- Overhead adjacent to allocation block

22 CRT Allocation
- Uses global default heap
- CRT manages allocation
- 16-byte aligned
- 16–31 byte overhead per allocation
- Overhead is adjacent to allocation block
- This can be bad…

23 Bad…! Very bad…!
[Diagram: heap blocks laid out across cache lines. Legend: overhead or unused (cold); used memory (hot); cache line boundary. Labels: 16 bytes; 50% of cache line unused!; > 80% of cache line unused!]
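A rough way to reason about the diagram's percentages: if every block carries a header next to its payload and is padded to the heap's alignment, the share of each block that is cold can be computed directly. The function names and the example sizes are assumptions for illustration, not values read off the slide's figure.

```cpp
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of align.
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Percentage (rounded down) of a block that is header or padding,
// for a payload of 'payload' bytes with a 'header'-byte header
// and blocks padded to 'align' bytes.
constexpr std::size_t cold_percent(std::size_t payload,
                                   std::size_t header,
                                   std::size_t align) {
    const std::size_t stride = round_up(header + payload, align);
    return (stride - payload) * 100 / stride;
}
```

Under these assumptions, 16-byte payloads with a 16-byte header waste half of every byte fetched into the cache, and 4-byte payloads waste well over 80%. Packing small objects into arrays or a small-block allocator removes the per-object header from the hot data.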

24 Better…

25 Best…
- Investigate writing custom small block allocator
- Investigate custom heap options

26 Platform Recommendations

27 Recommendations
- Use VirtualAlloc(…) to reserve custom heaps
  - Reasonably sized: <= 256MB
  - Partition custom heaps to give asset data read-only access after load
- Memory-mapped files:
  - Keep size under control in Win32
  - Be more aggressive in Win64
- Be careful with execute rights

28 GlobalAlloc
- Old 16-bit memory model allocation
- Included only for:
  - Emulating old Win3.x behavior
  - Clipboard interaction
- Don’t use this in your game!

29 Xbox 360 Memory APIs
- XPhysicalAlloc(…)
- XMemAlloc(…)
- XPhysicalProtect(…)
  - To change memory protection on a previously allocated chunk of memory

30 XPhysicalAlloc
- Not recommended for performance-critical code…
  - Walks through pages in memory to find space
  - Potentially rebases used VM pages in range (defragments)
- Recommended for infrequent allocations
  - Once only for asset data on asset load
  - Allocation for asset data space on startup
  - Use write-combined memory

31 XPhysicalAlloc: The Algorithm
1. Enters spin-lock to block other allocations
2. Checks range & page availability
3. Linear search of pages for valid range
4. Search each candidate range for immovable pages until none found
5. Relocate all used VM pages in range
6. Flush Virtual Address List
7. Flush & invalidate processor caches
8. Flush & invalidate TLB

32 XPhysicalProtect
- Not designed for performance
- Look into using D3D macros instead: GPU_CONVERT_*()
  - Defined in d3d9gpu.h
  - Example: GPU_CONVERT_GPU_TO_CPU_ADDRESS_64KB
- Look for the MemoryViews sample (coming in the September XDK)

33 XMemAlloc/XMemFree
- Allows custom memory management
- Used by XDK for many allocations
- Developers may provide own implementation
- XMemAllocDefault(…) is the “default”…
  - Wraps XPhysicalAlloc(…) for write-combined or physical memory allocations
  - Wraps LocalAlloc(…) for heap allocations

34 XMemAlloc/XMemFree
- Overriding default recommended
  - XAudio uses XMemAlloc(…) a lot internally to allocate physical memory
  - XMemAlloc(…) used throughout XDK
  - By overriding, you can significantly reduce some performance bottlenecks
- To override: just implement your own XMemAlloc(…)
  - Call XMemAllocDefault(…) for cases you don’t want/need to handle yourself

35 Caring for the Cache
- Keep related data together in memory
- Avoid small pages
  - TLB contains only 1024 entries
  - Lots of 4KB pages can cause TLB misses
  - TLB misses are expensive
  - Prefer 64KB pages
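The page-size advice follows from simple arithmetic on the 1024-entry figure above. The sketch assumes each TLB entry maps exactly one page and ignores any sharing of entries with code or other data; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kTlbEntries = 1024;  // figure quoted on the slide

// Number of pages needed to cover 'bytes' with pages of 'page_size'.
constexpr std::size_t pages_needed(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Amount of memory the TLB can map at once for a given page size.
constexpr std::size_t tlb_reach(std::size_t page_size) {
    return kTlbEntries * page_size;
}

// True if a working set of 'bytes' can be fully mapped at once.
constexpr bool fits_in_tlb(std::size_t bytes, std::size_t page_size) {
    return pages_needed(bytes, page_size) <= kTlbEntries;
}
```

With 4KB pages the TLB covers 4MB at a time; with 64KB pages it covers 64MB. A 32MB working set therefore thrashes the TLB on 4KB pages but fits entirely on 64KB pages.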

36 Select Types Wisely
- CPU write-only data (e.g. asset data): write-combined memory, 64-KB pages
- Large read/write datasets: virtual memory, 64-KB pages
- Small/temporary allocations: application heap (CRT/HeapAlloc/custom allocator)
  - Be aware of overhead and cache issues from generic allocators

37 Write-Combined RAM
- Use for write-only data
  - Reading is very expensive
- Always write in order, minimum 4-byte chunks
- Double-check what the compiler is doing
  - May reorder writes, which kills perf

38 In Summary
- Avoid allocations in frame!
- Minimize allocations where possible
- Avoid VM/physical memory fragmentation
- Think about overhead of standard APIs
- Consider custom heap solution
- Prefer 64K pages

39 © 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.
Resources: DirectX Developer Center; Game Development MSDN Forums; Xbox 360 Central; XNA Web site
