Presentation is loading. Please wait.

Presentation is loading. Please wait.

Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 1 Emery Berger Memory Management for High-Performance Applications Department.

Similar presentations


Presentation on theme: "Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 1 Emery Berger Memory Management for High-Performance Applications Department."— Presentation transcript:

1 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 1 Emery Berger Memory Management for High-Performance Applications Department of Computer Sciences Advisor: Kathryn S. McKinley

2 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 2 High-Performance Applications Web servers, search engines, scientific codes C or C++ Run on one or cluster of server boxes software compiler runtime system operating system hardware Needs support at every level

3 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 3 New Applications, Old Memory Managers Applications and hardware have changed Multiprocessors now commonplace Object-oriented, multithreaded  Increased pressure on memory manager ( malloc, free ) But memory managers have not changed Inadequate support for modern applications

4 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 4 Current Memory Managers Limit Scalability As we add processors, program slows down Caused by heap contention Larson server benchmark on 14-processor Sun

5 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 5 The Problem Current memory managers inadequate for high-performance applications on modern architectures Limit scalability, application performance, and robustness

6 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 6 Contributions Building memory managers Heap Layers framework Problems with current memory managers Contention, false sharing, space Solution: provably scalable memory manager Hoard Extended memory manager for servers Reap

7 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 7 Implementing Memory Managers Memory managers must be Space efficient Very fast Heavily-optimized code Hand-unrolled loops Macros Monolithic functions  Hard to write, reuse, or extend

8 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 8 Building Modular Memory Managers Classes Rigid hierarchy Overhead Mixins Flexible hierarchy No overhead

9 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 9 A Heap Layer template class GreenHeapLayer : public SuperHeap {…}; Mixin with malloc & free methods

10 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 10 Example: Thread-Safe Heap Layer LockedHeap protect the superheap with a lock LockedMallocHeap

11 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 11 Empirical Results Heap Layers vs. originals: KingsleyHeap vs. BSD allocator LeaHeap vs. DLmalloc 2.7  Competitive runtime and memory efficiency

12 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 12 Overview Building memory managers Heap Layers framework Problems with memory managers Contention, space, false sharing Solution: provably scalable allocator Hoard Extended memory manager for servers Reap

13 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 13 Problems with General-Purpose Memory Managers Previous work for multiprocessors Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92] Impractical Multiple heaps [Larson 98, Gloger 99] Reduce contention but cause other problems: P-fold or even unbounded increase in space Allocator-induced false sharing we show

14 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 14 Multiple Heap Allocator: Pure Private Heaps One heap per processor: malloc gets memory from its local heap free puts memory on its local heap STL, Cilk, ad hoc x1= malloc(1) free(x1)free(x2) x3= malloc(1) x2= malloc(1) x4= malloc(1) processor 0processor 1 = in use, processor 0 = free, on heap 1 free(x3) free(x4) Key:

15 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 15 Problem: Unbounded Memory Consumption Producer-consumer: Processor 0 allocates Processor 1 frees Unbounded memory consumption Crash! free(x1) x2= malloc(1) free(x2) x1= malloc(1) processor 0processor 1 x3= malloc(1) free(x3)

16 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 16 Multiple Heap Allocator: Private Heaps with Ownership free returns memory to original heap Bounded memory consumption No crash! “Ptmalloc” (Linux), LKmalloc x1= malloc(1) free(x1) free(x2) x2= malloc(1) processor 0processor 1

17 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 17 Problem: P-fold Memory Blowup Occurs in practice Round-robin producer- consumer processor i mod P allocates processor (i+1) mod P frees Footprint = 1 (2GB), but space = 3 (6GB) Exceeds 32-bit address space: Crash! free(x2) free(x1) free(x3) x1= malloc(1) x2= malloc(1) x3=malloc(1) processor 0processor 1processor 2

18 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 18 Problem: Allocator-Induced False Sharing False sharing Non-shared objects on same cache line Bane of parallel applications Extensively studied All these allocators cause false sharing! processor 0processor 1 x2= malloc(1)x1= malloc(1) cache line thrash…

19 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 19 So What Do We Do Now? Where do we put free memory? on central heap: on our own heap: (pure private heaps) on the original heap: (private heaps with ownership) How do we avoid false sharing?  Heap contention  Unbounded memory consumption  P-fold blowup

20 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 20 Overview Building memory managers Heap Layers framework Problems with memory managers Contention, space, false sharing Solution: provably scalable allocator Hoard Extended memory manager for servers Reap

21 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 21 Hoard: Key Insights Bound local memory consumption Explicitly track utilization Move free memory to a global heap  Provably bounds memory consumption Manage memory in large chunks  Avoids false sharing  Reduces heap contention

22 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 22 Overview of Hoard Manage memory in heap blocks Page-sized Avoids false sharing Allocate from local heap block Avoids heap contention Low utilization  Move heap block to global heap Avoids space blowup global heap … processor 0processor P-1

23 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 23 Hoard: Under the Hood select heap based on size malloc from local heap, free to heap block get or return memory to global heap

24 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 24 Summary of Analytical Results Space consumption: near optimal worst-case Hoard:O(n log M/m + P) {P ¿ n} Optimal: O(n log M/m) [Robson 70]: ≈ bin-packing Private heaps with ownership: O(P n log M/m) Provably low synchronization n = memory required M = biggest object size m = smallest object size P = processors

25 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 25 Empirical Results Measure runtime on 14-processor Sun Allocators Solaris (system allocator) Ptmalloc (GNU libc) mtmalloc (Sun’s “MT-hot” allocator) Micro-benchmarks Threadtest:no sharing Larson: sharing (server-style) Cache-scratch:mostly reads & writes (tests for false sharing) Real application experience similar

26 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 26 Runtime Performance: threadtest speedup(x,P) = runtime(Solaris allocator, one processor) / runtime(x on P processors) Many threads, no sharing Hoard achieves linear speedup

27 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 27 Runtime Performance: Larson Many threads, sharing (server-style) Hoard achieves linear speedup

28 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 28 Runtime Performance: false sharing Many threads, mostly reads & writes of heap data Hoard achieves linear speedup

29 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 29 Hoard in the “Real World” Open source code www.hoard.org 13,000 downloads Solaris, Linux, Windows, IRIX, … Widely used in industry AOL, British Telecom, Novell, Philips Reports: 2x-10x, “impressive” improvement in performance Search server, telecom billing systems, scene rendering, real-time messaging middleware, text-to-speech engine, telephony, JVM  Scalable general-purpose memory manager

30 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 30 Custom Memory Allocation Programmers replace malloc / free Attempt to increase performance Provide extra functionality (e.g., for servers) Reduce space (rarely) Empirical study of custom allocators: Lea allocator often as fast or faster Custom allocation ineffective, except for regions.

31 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 31 Overview of Regions regionmalloc(r, sz) regiondelete(r) Regions: separate areas, deletion only en masse regioncreate(r) r Used in parsers, server applications

32 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 32 Overview Building memory managers Heap Layers framework Problems with memory managers Contention, space, false sharing Solution: provably scalable allocator Hoard Extended memory manager for servers Reap

33 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 33 Server Support Certain servers need additional support “Process” isolation Multiple threads, many transactions per thread Minimize accidental overwrites of unrelated data Avoid resource leaks Tear down all memory associated with terminated connections or transactions Current approach (e.g., Apache): regions

34 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 34 Regions: Pros and Cons + Fast + Pointer-bumping allocation + Deletion of chunks + Convenient + One call frees all memory regionmalloc(r, sz) regiondelete(r) Regions: separate areas, deletion only en masse regioncreate(r) - Space - Can’t free objects - Drag - Can’t use for all allocation patterns r

35 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 35 Regions Are Limited Can’t reclaim memory in regions  unbounded memory consumption Long-running computations Producer-consumer patterns Current situation for Apache: vulnerable to denial-of-service limits runtime of connections limits module programming  Regions: wrong abstraction

36 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 36 Reap = region + heap Adds individual object deletion & heap Reap Hybrid Allocator reapmalloc(r, sz) reapdelete(r) reapcreate(r) r reapfree(r,p) Can reduce memory consumption Fast Adapts to use (region or heap style) Cheap deletion

37 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 37 Using Reap as Regions Reap performance nearly matches regions

38 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 38 Reap In Progress Incorporate Reap in Apache Rewrite modules to use Reap Measure space savings  Simplifies module programming & adds robustness against denial-of-service

39 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 39 Overview Building memory managers Heap Layers framework Problems with memory managers Contention, space, false sharing Solution: provably scalable allocator Hoard Extended memory manager for servers Reap

40 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 40 Open Questions Grand Unified Memory Manager? Hoard + Reap Integration with garbage collection Effective Custom Allocators? Exploit sizes, lifetimes, locality and sharing Challenges of newer architectures SMT/CMP

41 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 41 Contributions Memory management for high-performance applications Framework for building high-quality memory managers (Heap Layers) [Berger, Zorn & McKinley, PLDI-01] Provably scalable memory manager (Hoard) [Berger, McKinley, Blumofe & Wilson, ASPLOS-IX] Study of custom memory allocation; Hybrid high-performance memory manager for server applications (Reap) [Berger, Zorn & McKinley, OOPSLA-2002]

42 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 42 Backup Slides

43 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 43 Empirical Results, Runtime

44 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 44 Empirical Results, Space

45 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 45 Robust Resource Management User processes can bring down systems (= DoS) Current solutions Kill processes (Linux) Die (Linux, Solaris, Windows) Proposed solutions limit utilization Quotas, proportional shares Insight: As resources becomes scarce, make them “cost” more (apply economic model) Fork bomb – use all process ids Malloc all memory – exhausts swap

46 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 46 Future Work “Performance, scalability, and robustness” Short-term Memory management False sharing Robust garbage collection for multiprogrammed systems (with McKinley, Blackburn & Stylos) Locality – self-reorganizing data structures Compiler-based static error detection [Guyer, Berger & Lin, in preparation] Longer term Safety & security as dataflow problems Integration of OS/runtime/compiler

47 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 47 Rockall: Larson

48 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 48 Rockall: Threadtest

49 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 49 Hoard Conclusions As fast as uniprocessor allocators Performance linear in number of processors Avoids false sharing Worst-case provably near optimal  Scalable general-purpose memory manager

50 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 50 Conceptually Modular malloc free

51 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 51 A Real Memory Manager Modular design and implementation manage objects on freelist add size info to objects select heap based on size malloc free KingsleyHeap

52 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 52 Conclusion Memory management for high-performance applications Heap Layers framework [PLDI 2001] Reusable components, no runtime cost Hoard scalable memory manager [ASPLOS-IX] High-performance, provably scalable & space-efficient Reap hybrid memory manager [in preparation] Provides speed & robustness for server applications Future work: memory management, resource management, static error detection [in preparation]

53 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 53 Regions Waste Memory Drag = wasted memory caused by unreclaimed dead objects

54 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 54 Reap: Under the Hood manage pool of free chunks just add headers or manage as a heap select region or heap behavior support for nesting (hierarchy of reaps)

55 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 55 Memory Management Unifying Reap & Hoard Comprehensive evaluation of custom allocators [in preparation] Not only bad software engineering, also largely ineffective GC to reduce working set size (with Steve Blackburn & Jeffrey Stylos @UMass, Kathryn McKinley) Multiprogrammed systems Twofold increase in WS  half as many programs resident, lots of paging Cooperation between GC & VM Examples: VM about to pageout – trigger GC, GC about to scan – query VM for residency

56 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 56 Robust Resource Management User processes can bring down systems (= DoS) Current solutions Kill processes (Linux) Die (Linux, Solaris, Windows) Proposed solutions limit utilization Quotas, proportional shares Insight: As resources becomes scarce, make them “cost” more (apply economic model) Fork bomb – use all process ids Malloc all memory – exhausts swap

57 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 57 Static Error Detection Many errors we’d like to detect statically: Usage errors (double locks, sockets, files) Information leaks Security Lots of recent work: Syntactic state machines (Metal) [Engler] Program abstraction + theorem proving (SLAM) [Ball] Type systems [Shankar, Foster] What’s right approach?

58 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 58 Error Detection: Observations Programmer time is expensive Annotations or specifications are a lot of work Type theory approaches require intervention Computer time is cheap (but not unbounded!) Theorem-based & flow-sensitive techniques still exponential False positives must be near zero to be useful Sound analysis invaluable for security Lexical approaches unsound

59 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 59 Detecting Errors with Configurable Whole-Program Dataflow Analysis Insight: model errors as dataflow analysis problems on aggressive compiler framework (with Samuel Guyer, Calvin Lin) Interprocedural, flow-sensitive, precise pointer analysis Drive analysis with simple dataflow language Very promising results Captures extremely general class of errors Fast (7 minutes for 200KLOC), works on unmodified C programs Lowest reported false positive rates (none!) on format string vulnerability problems

60 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 60 Example: Format String Vulnerability Subject of numerous CERT advisories Improper use of printf() family Enables “stack-smashing” attacks Solution: “Taintedness” analysis [Foster et al., Perl] Data from untrusted sources is “tainted” Ensure tainted data may not end up in format string fgets(buffer, size, file); printf(buffer);

61 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 61 Taintedness Dataflow Analysis Taintedness lattice Transfer functions Functions that produce tainted data: scanf(), getenv(), read() Functions that pass on taintedness: strdup(), strcpy(), sprintf() Error conditions Functions with format string arguments printf() family, syslog() property Taint : { Tainted { Untainted }} (top) Untainted Tainted (bottom)

62 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 62 Test programs Five programs bftpd : FTP daemon muh : IRC proxy named : Name server in the BIND package lpd : Print daemon cfengine : System administration tool All programs have the format string vulnerability Notice: these are real programs Mature versions were distributed with the bug Exploits are known and have been used to compromise machines

63 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 63 Results Run on Pentium 4 2Ghz, 512 MB RAM ProgramLinesProceduresTimeErrorsFoundFalse Positives bftpd1,0171800:01110 muh5,0022280:06110 named25,8204441:11110 lpd38,17472623:57110 cfengine45,1027006:38660

64 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 64 Research Methodology Transparently improve program and programmer efficiency Compilers & runtime systems, OS involvement Apply theory and rigorous experimentation to systems development Seek “economically insensitive” problems Brains, not money

65 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 65 Conclusions Memory management support for high- performance applications Framework for building high-quality allocators (Heap Layers) [Berger, Zorn & McKinley, PLDI-01] Scalable & high-performance general-purpose allocator (Hoard) [Berger, McKinley, Blumofe & Wilson, ASPLOS-IX] Extended high-performance allocator for server applications (Reap) [Berger, Zorn & McKinley, to be submitted]

66 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 66 Reap Conclusions Custom allocation often not much faster Use better general-purpose allocator Important exception: Regions Fast but waste space Reap: best of both worlds Extends general-purpose allocation Fast, space-efficient & flexible Eliminates the need for most custom allocators

67 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 67 Experimental Methodology Built analogous allocators using heap layers KingsleyHeap (BSD allocator) LeaHeap (based on Lea allocator 2.7.0) Three weeks to develop 500 lines vs. 2,000 lines in original Compared performance with originals SPEC2000 & standard allocation benchmarks

68 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 68 Custom Allocators Are Effective Average 30% faster than system allocator + wrapper

69 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 69 Custom Allocators Not Effective Lea allocator + wrapper: as fast as custom, except for regions

70 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 70 Reap Runtime

71 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 71 Runtime Compared to General-Purpose Allocators

72 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 72 Space Compared to General-Purpose Allocators

73 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 73 A General-Purpose Memory Manager for High-Performance Applications To support high-performance applications, memory manager requires: Speed Fast malloc / free Scalability Performance linear in number of processors Space-efficiency Worst-case and average-case

74 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 74 Contributions Heap Layers framework for building memory managers Reusable components, no runtime cost Simplifies implementations, good experimental platform Hoard scalable general-purpose memory manager High-performance, provably scalable & space-efficient Comprehensive evaluation of custom allocators Not only bad software engineering, also largely ineffective Reap hybrid general-purpose memory manager Combines regions and heaps Provides robustness for server applications

75 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 75 Uniprocessor Memory Allocators Standard on many operating systems Scalability: Poor Heap contention Single lock protects heap Space: Excellent Nearly optimal for most programs [Wilson & Johnstone, 2000]

76 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 76 Example: Debugging Heap Layer DebugHeap Protects against invalid & multiple frees. DebugLockedMallocHeap

77 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 77 Hoard Example malloc from heap block on local heap free returns memory to its heap block local heap “too empty”:  move heap block to global heap x1= malloc(1) processor 0global heap free(x7) …some mallocs …some frees Empty fraction = 1/3

78 Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 78 Hoard Details “Segregated size class” allocator Size classes above 4K are logarithmically-spaced Superblocks hold objects of one size class empty superblocks are “recycled” Approximately radix-sorted: Allocate from mostly-full superblocks Fast removal of mostly-empty superblocks 8 1624 32 40 48 sizeclass bins radix-sorted superblock lists (emptiest to fullest) superblocks


Download ppt "Memory Management for High-Performance Applications - Ph.D. defense - Emery Berger 1 Emery Berger Memory Management for High-Performance Applications Department."

Similar presentations


Ads by Google