Reconsidering Custom Memory Allocation Emery D. Berger Benjamin G. Zorn Kathryn S. McKinley November 2002 Proceedings of the Conference on Object-Oriented.

1 Reconsidering Custom Memory Allocation Emery D. Berger Benjamin G. Zorn Kathryn S. McKinley November 2002 Proceedings of the Conference on Object-Oriented Programming: Systems, Languages, and Applications (OOPSLA) 2002

2 Lecture Topics Custom memory allocators General purpose allocators Regions (good performance) Reaps (very good performance and more) Results and Conclusions

3 Key Contributions of the paper A comprehensive evaluation of custom allocators Custom allocators vs. general-purpose allocators (memory consumption and performance) Most programmers seeking faster memory allocation should use the Lea allocator rather than writing their own

4 Key Contributions of the paper – Cont. The custom allocators that do provide higher performance use regions Reaps are even better

5 Key Contributions of the paper – Cont. If you need fast regions, use reaps Otherwise use the Lea allocator rather than any custom allocator

6 Related Works Articles in the trade press claim custom allocators are a good idea “Effective C++” “The C++ Programming Language” Benjamin Zorn in 1993 claims it to be a waste of time Articles on region allocation (arenas, groups, zones) We find that all of them are true

7 General-purpose memory allocators Windows XP allocator Lea allocator (Linux)

8 Lea Allocator An approximate best-fit allocator with different behavior based on object size Small objects (<64 bytes) – allocated from exact-size quicklists Medium objects (<128K) – coalesce quicklists Large objects – allocate and free by mmap The best allocator known
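The size-based dispatch on the slide can be pictured with a small classifier. This is only a sketch of the thresholds quoted above, not the Lea allocator's actual code: the names `SizeClass` and `classify` are made up for illustration, and "128K" is assumed to mean 128 * 1024 bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names; only the two thresholds come from the slide.
enum class SizeClass { Small, Medium, Large };

// Picks the allocation path for a request of `bytes` bytes.
inline SizeClass classify(std::size_t bytes) {
    if (bytes < 64) return SizeClass::Small;           // exact-size quicklists
    if (bytes < 128 * 1024) return SizeClass::Medium;  // coalescing path (assumes K = 1024)
    return SizeClass::Large;                           // served directly by mmap
}
```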

9 Our Benchmarks

10 Emulating Custom Semantics Custom allocators often support different semantics from C interface Region emulator Full region semantics General allocator Records a pointer to each allocated object to allow region deletion The pointer recorded in an out-of-band array – no impact on drag
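One way to read the region emulator described on this slide is as a thin wrapper that hands every request to the general-purpose allocator and remembers the returned pointer in a side array, so the whole "region" can be released later. The sketch below assumes exactly that; `EmulatedRegion` and its member names are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region emulator: region semantics on top of malloc/free.
class EmulatedRegion {
    std::vector<void*> objects_;  // out-of-band record, kept outside the objects
public:
    EmulatedRegion() = default;
    EmulatedRegion(const EmulatedRegion&) = delete;
    EmulatedRegion& operator=(const EmulatedRegion&) = delete;
    ~EmulatedRegion() { freeAll(); }

    // Allocate from the general-purpose allocator and record the pointer.
    void* malloc(std::size_t size) {
        void* p = std::malloc(size);
        if (p) objects_.push_back(p);
        return p;
    }

    // Region deletion: free every recorded object in one call.
    void freeAll() {
        for (void* p : objects_) std::free(p);
        objects_.clear();
    }

    std::size_t liveObjects() const { return objects_.size(); }
};
```

Because the pointers live in a separate array rather than in per-object headers, the objects themselves have the same size and layout they would have under the plain allocator.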

11 Custom memory allocators - Definition Memory allocation mechanism that differs from general-purpose allocator in at least one of two: May provide more than one object for every allocated chunk of memory May not immediately return objects to the system/general-purpose allocator No wrappers

12 Custom allocators – widespread use Recommended as an optimization technique in the trade press Apache web server, GCC, C++ STL Direct support by C++ (by overloading new and delete operators)

13 Why do programmers use custom allocators? Improving runtime performance Reducing memory consumption Improving software engineering (?)

14 Improving runtime performance On average, 16% of the run time is spent in the memory allocator The reason given in most of our benchmarks Per-operation cost of general-purpose allocators Matters in programs that use the allocator intensively

15 Improving runtime performance – Cont.

16 Reducing memory consumption

17 Improving software engineering (?) Memory allocated by a custom allocator can't be managed by another allocator Calling free on a custom-allocated object may cause a segmentation fault Difficult to understand the source of memory consumption in the program No Purify No parallel allocator for SMP scalability No GC No shared multi-language heap

18 Improving software engineering (!) Region-based allocator simplifies memory management Memory area can be deleted by a single call Separate memory areas Regions are good for multithreaded server applications Memory spaces isolation Memory leaks prevention Apache web server

19 A Taxonomy of Custom Allocators Apply your knowledge about some set of objects Use regions to free objects dead at the same time Take advantage of object sizes Use known allocation patterns

20 Benchmark allocators characteristics Per-class allocators Regions Nested regions Obstack Custom patterns

21 Per-class allocators Objects of the same size (type) Eliding size checks Freelist with objects of the specific type Same API as malloc and free
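The points on this slide can be sketched as a fixed-size pool: since every block has the same size, allocation skips any size lookup and a free is a single list push. This is a minimal illustration, not one of the benchmark allocators; `FixedSizePool` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-size pool for objects of one size.
template <std::size_t ObjectSize>
class FixedSizePool {
    // A free block is reused to hold the link to the next free block.
    struct FreeNode { FreeNode* next; };
    FreeNode* free_list_ = nullptr;

    // Blocks must be large enough to hold the freelist link.
    static constexpr std::size_t blockSize() {
        return ObjectSize < sizeof(FreeNode) ? sizeof(FreeNode) : ObjectSize;
    }

public:
    FixedSizePool() = default;
    FixedSizePool(const FixedSizePool&) = delete;
    FixedSizePool& operator=(const FixedSizePool&) = delete;

    ~FixedSizePool() {
        // Hand cached blocks back to the general-purpose allocator.
        while (free_list_) {
            FreeNode* n = free_list_;
            free_list_ = n->next;
            std::free(n);
        }
    }

    void* allocate() {
        if (free_list_) {                 // fast path: pop a recycled block
            FreeNode* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        return std::malloc(blockSize());  // slow path: fall back to malloc
    }

    void deallocate(void* p) {
        if (!p) return;
        FreeNode* n = static_cast<FreeNode*>(p);
        n->next = free_list_;             // push; the block is not returned to malloc
        free_list_ = n;
    }
};
```

Note that freed blocks stay in the pool until it is destroyed, which is the "may not immediately return objects" property from the definition of a custom allocator given earlier.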

22 Regions Allocation by incrementing a pointer into a large chunk of memory Only entire region deletion - no deletion of individual objects freeAll function Nested regions Nested object lifetime Obstack (“Object Stack”) Deletion of every object allocated after a certain object

23 Custom patterns A general-purpose allocator optimized for a particular pattern of object behavior
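A bare-bones region along the lines of this slide might look like the sketch below: allocation bumps an offset inside the newest chunk, and the only way to release memory is `freeAll`. The class name, the 4096-byte chunk size and the alignment policy are all assumptions made for the example; nested regions and obstacks are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kChunkSize = 4096;  // assumed chunk payload size

inline std::size_t roundUp(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Hypothetical region: pointer-bump allocation, whole-region deletion only.
class Region {
    struct Chunk { Chunk* next; std::size_t used; std::size_t capacity; };
    Chunk* chunks_ = nullptr;
    static std::size_t headerSize() { return roundUp(sizeof(Chunk)); }

public:
    Region() = default;
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { freeAll(); }

    void* malloc(std::size_t size) {
        size = roundUp(size ? size : 1);
        if (!chunks_ || chunks_->capacity - chunks_->used < size) {
            // Current chunk is full (or absent): get a new one.
            std::size_t cap = size > kChunkSize ? size : kChunkSize;
            Chunk* c = static_cast<Chunk*>(std::malloc(headerSize() + cap));
            if (!c) return nullptr;
            c->next = chunks_;
            c->used = 0;
            c->capacity = cap;
            chunks_ = c;
        }
        // Bump the offset; no per-object header is written.
        char* p = reinterpret_cast<char*>(chunks_) + headerSize() + chunks_->used;
        chunks_->used += size;
        return p;
    }

    // There is deliberately no free(void*): objects die together.
    void freeAll() {
        while (chunks_) {
            Chunk* next = chunks_->next;
            std::free(chunks_);
            chunks_ = next;
        }
    }

    std::size_t chunkCount() const {
        std::size_t n = 0;
        for (Chunk* c = chunks_; c; c = c->next) ++n;
        return n;
    }
};
```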

24 Custom allocators characteristics – Cont.

25 Problems with regions Excessive memory retention Unbounded memory consumption Unbounded buffers Dynamic arrays Producer-consumer patterns Complicated programming of server applications (Apache)

26 The ideal allocator Region Semantics + General-Purpose Allocation (heap) = Reaps

27 Heaps malloc free Regions malloc freeAll Reaps malloc free freeAll

28 Reaps - Example reapCreate(r); x1 = reapMalloc(r, 8); x2 = reapMalloc(r, 8); x3 = reapMalloc(r, 16); x4 = reapMalloc(r, 8); reapFree(r, x3); (Diagram labels: r, region chunks, heap, x1, x2, x3, x4)

29 Implementation Issues Initially, region-like behavior Allocation by bumping a pointer Geometrically-increasing chunks of memory threaded onto a linked list Header for every allocated object Freed objects (reapFree) are placed in an associated heap Later allocations use memory from this heap

30 Reap allocation interface void reapCreate (void ** reap, void ** parent); void reapDestroy (void ** reap); void reapFreeAll (void ** reap); //clear void * reapMalloc (void ** reap, size_t size); void reapFree (void ** reap, void * object);
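The behavior described on the last two slides can be approximated in a few dozen lines. The sketch below is a deliberate simplification and not the paper's implementation: it replaces the associated heap with a first-fit free list, does no coalescing, ignores parent reaps, and wraps the calls in a hypothetical `Reap` class instead of the `void **` interface above. The initial chunk size is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kFirstChunk = 4096;  // assumed initial chunk size

inline std::size_t roundUp(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Hypothetical reap: bump allocation plus individual free plus freeAll.
class Reap {
    struct Chunk { Chunk* next; };
    struct Header { std::size_t size; Header* nextFree; };  // per-object header

    Chunk* chunks_ = nullptr;     // linked list of chunks
    char* bump_ = nullptr;        // next unused byte in the newest chunk
    char* limit_ = nullptr;       // end of the newest chunk
    std::size_t nextChunk_ = kFirstChunk;
    Header* freeList_ = nullptr;  // stand-in for the associated heap

    static std::size_t headerSize() { return roundUp(sizeof(Header)); }
    static std::size_t chunkHeader() { return roundUp(sizeof(Chunk)); }

public:
    Reap() = default;
    Reap(const Reap&) = delete;
    Reap& operator=(const Reap&) = delete;
    ~Reap() { freeAll(); }

    void* malloc(std::size_t size) {
        size = roundUp(size ? size : 1);
        // 1. Reuse an individually freed object if one is large enough.
        for (Header** link = &freeList_; *link; link = &(*link)->nextFree) {
            if ((*link)->size >= size) {
                Header* h = *link;
                *link = h->nextFree;
                return reinterpret_cast<char*>(h) + headerSize();
            }
        }
        // 2. Otherwise bump-allocate, adding a larger chunk when needed.
        std::size_t need = headerSize() + size;
        if (!bump_ || static_cast<std::size_t>(limit_ - bump_) < need) {
            std::size_t payload = nextChunk_ > need ? nextChunk_ : need;
            char* raw = static_cast<char*>(std::malloc(chunkHeader() + payload));
            if (!raw) return nullptr;
            Chunk* c = reinterpret_cast<Chunk*>(raw);
            c->next = chunks_;
            chunks_ = c;
            bump_ = raw + chunkHeader();
            limit_ = bump_ + payload;
            nextChunk_ *= 2;  // geometric growth
        }
        Header* h = reinterpret_cast<Header*>(bump_);
        h->size = size;
        h->nextFree = nullptr;
        bump_ += need;
        return reinterpret_cast<char*>(h) + headerSize();
    }

    // Individual deletion: the object becomes available for reuse.
    void free(void* p) {
        if (!p) return;
        Header* h = reinterpret_cast<Header*>(static_cast<char*>(p) - headerSize());
        h->nextFree = freeList_;
        freeList_ = h;
    }

    // Region-style deletion: drop every chunk and forget the free list.
    void freeAll() {
        while (chunks_) {
            Chunk* next = chunks_->next;
            std::free(chunks_);
            chunks_ = next;
        }
        bump_ = limit_ = nullptr;
        freeList_ = nullptr;
        nextChunk_ = kFirstChunk;
    }
};
```

Replaying the sequence from the example slide against this sketch, a 16-byte request issued after `free(x3)` is served from the slot that `x3` occupied rather than from fresh chunk space.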

31 Design issues Heap Layers Mixins

32 Design issues – Cont. RegionHeap CoalesceableHeap ClearOptimizedHeap NestedHeap LeaHeap Sbrk

33 Design issues – Layers LeaHeap layer high speed low fragmentation NestedHeap layer ClearOptimizedHeap layer nothingOnHeap flag Fast allocations by pointer bumping on first heap Second heap – after freeing an object CoalesceableHeap layer adds per-object metadata RegionHeap layer Linked list of allocated objects clear()

34 Benchmark allocation statistics

35 Benchmark allocation statistics – Cont. Programs with general-purpose allocators Not allocation-intensive Spend little time in memory allocator Programs with custom allocators Tend to allocate many small objects More time in memory allocator Correct pinpointing of memory manager as a significant factor in the performance

36 Results Different memory management policies compared (general, custom, reaps) Execution time Memory consumption

37 Results - technicalities Runtime – the best of three runs Compiled with Visual C++ 6.0 Pentium III 600 MHz, 320 MB RAM, under Windows XP

38 Runtime Performance

39 Runtime Performance – Cont. Custom Vs Windows – justifies the use of custom allocator Lea provides almost the same performance as custom - except regions Reaps are comparable to Lea and to custom

40 Memory Consumption

41 Memory Consumption – Cont. No Windows XP – no equivalent way to keep track of memory consumption Reaps – don't use individual deletion Mixed results Region space advantage - misleading

42 Evaluating Region Allocation Total drag – an average ratio of heap sizes with and without immediate object deallocation Immediate free of every dead object – total drag of 1 Non-region allocators – minimal drag Region allocators – high drag, substantial increase in memory consumption
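To make the metric concrete, here is one possible reading of the definition on this slide: take the heap size sampled at the same points in time under the measured allocator and under an idealized run that frees each object the moment it dies, then divide the mean of the first trace by the mean of the second. The function name and the trace representation are assumptions, and the paper's exact averaging may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Ratio of mean measured heap size to mean heap size with immediate frees.
// Both traces are hypothetical inputs sampled at the same instants.
inline double totalDrag(const std::vector<double>& measured,
                        const std::vector<double>& immediateFree) {
    assert(!measured.empty() && measured.size() == immediateFree.size());
    double m = std::accumulate(measured.begin(), measured.end(), 0.0);
    double i = std::accumulate(immediateFree.begin(), immediateFree.end(), 0.0);
    assert(i > 0.0);
    return m / i;  // equal lengths, so the ratio of sums equals the ratio of means
}
```

Under this reading an allocator that frees every object immediately has identical traces and a drag of exactly 1, matching the slide, while a region that holds dead objects until `freeAll` pushes the ratio above 1.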

43 Evaluating Region Allocation – Cont.

44 Experimental Comparison to previous work

45 Reaps in Apache Gaining a space advantage by allowing individual deletion bc – an arbitrary-precision calculator language Apache region calls rerouted to reaps, plus a reapFree (ap_pfree) call malloc and free redefined in bc Computing the 1000th prime consumes 7.4 MB without ap_pfree and 240 KB with it

46 Why programmers use custom allocators to no effect Recommended practice Premature optimization Drift Improved competition

47 Conclusions Despite widespread belief, a custom allocator doesn't always improve performance The Lea allocator is as fast or even faster The exception is region-based allocators Reaps – high performance and reduced memory consumption

48 Future plans Reaps integration with the Hoard scalable memory allocator Reaps integration into a garbage-collected setting

49 Questions ?

50 The End

51 Custom Allocator implementation Standard C++ way (inheritance) Significant overhead of virtual method dispatch Limits compiler optimizations Fixed relations between classes, single inheritance structure – difficult reuse

52 Mixins Can be reparented template <class Super> class Mixin : public Super {}; No single class hierarchy class Composition1 : public A {}; class Composition2 : public A {};
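A small, self-contained illustration of the mixin idiom may help here. The layer names `Base`, `Logged` and `Counted` are invented for the example and have nothing to do with the heap layers in the paper; the point is only that the same layers can be stacked in different orders at compile time, without a fixed class hierarchy and without virtual dispatch.

```cpp
#include <cassert>
#include <string>

// Hypothetical bottom layer.
struct Base {
    std::string describe() const { return "Base"; }
};

// Each mixin inherits from its template parameter, so its parent is
// chosen by whoever composes it.
template <class Super>
struct Logged : public Super {
    std::string describe() const { return "Logged(" + Super::describe() + ")"; }
};

template <class Super>
struct Counted : public Super {
    std::string describe() const { return "Counted(" + Super::describe() + ")"; }
};

// Same layers, two different hierarchies.
class Composition1 : public Logged<Counted<Base>> {};
class Composition2 : public Counted<Logged<Base>> {};
```

Calls such as `Super::describe()` are resolved statically, so the compiler is free to inline through the whole stack.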

53 Heap Layers Mixin Provides Malloc and Free Coding Guidelines Handle NULL returned by malloc() correctly Destructor must free any memory held by layer Top heaps – system-provided memory wrappers

54 Example – Composing a Per-Class Allocator Per-class pool of memory Same-sized objects Singly-linked freelist for memory management No change of source code for the original class PerClassHeap Utility Class - to adapt a class to use heap layer as its allocator FreeListHeap Heap Layer

55 Example - PerClassHeap template <class Object, class SuperHeap> class PerClassHeap : public Object { public: inline void * operator new (size_t sz) { return getHeap().malloc (sz); } inline void operator delete (void * ptr) { getHeap().free (ptr); } private: static SuperHeap& getHeap () { static SuperHeap theHeap; return theHeap; } };

56 Example - FreeListHeap

57 Example - Combination Foo subclass that uses per-class pools Class FasterFoo: public PerClassHeap > {};
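Putting the last few slides together, a complete composition could look roughly like the sketch below. It is written under several assumptions: `MallocHeap` is a made-up top heap wrapping the system allocator, the body of `FreeListHeap` is a guess at what a freelist layer does (its slide has no text), and `Foo` is a placeholder class. Only the shape of `PerClassHeap` follows the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Assumed top heap: a thin wrapper over the system allocator.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p) { std::free(p); }
};

// Assumed freelist layer. Only valid when all requests have the same
// size, which holds for a per-class pool.
template <class SuperHeap>
class FreeListHeap : public SuperHeap {
    struct FreeObject { FreeObject* next; };
    FreeObject* list_ = nullptr;
public:
    ~FreeListHeap() {
        // A layer's destructor must release any memory it still holds.
        while (list_) {
            FreeObject* next = list_->next;
            SuperHeap::free(list_);
            list_ = next;
        }
    }
    void* malloc(std::size_t sz) {
        if (list_) {  // reuse a cached object
            void* p = list_;
            list_ = list_->next;
            return p;
        }
        return SuperHeap::malloc(sz < sizeof(FreeObject) ? sizeof(FreeObject) : sz);
    }
    void free(void* p) {
        if (!p) return;
        FreeObject* o = static_cast<FreeObject*>(p);
        o->next = list_;  // cache instead of returning to the parent heap
        list_ = o;
    }
};

// Utility class from the slide: routes new/delete to a per-class heap.
template <class Object, class SuperHeap>
class PerClassHeap : public Object {
public:
    static void* operator new(std::size_t sz) {
        void* p = getHeap().malloc(sz);
        if (!p) throw std::bad_alloc();  // handle NULL from malloc
        return p;
    }
    static void operator delete(void* p) { getHeap().free(p); }
private:
    static SuperHeap& getHeap() { static SuperHeap theHeap; return theHeap; }
};

// Placeholder class; its source is not modified.
struct Foo { int x = 0; double y = 0.0; };

class FasterFoo : public PerClassHeap<Foo, FreeListHeap<MallocHeap>> {};
```

With this composition, deleting a `FasterFoo` and then creating another one hands back the same block from the freelist without calling into the system allocator.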

58 The End!!!

