Automatic Pool Allocation: Compile-Time Control Over Complete Pointer-Based Data Structures Vikram Adve University of Illinois at Urbana-Champaign Joint.

1 Automatic Pool Allocation: Compile-Time Control Over Complete Pointer-Based Data Structures Vikram Adve University of Illinois at Urbana-Champaign Joint work with: Chris Lattner, Dinakar Dhurjati, Sumant Kowshik Thanks: NSF (CAREER, Embedded02, NGS00, NGS99, OSC99), Marco/DARPA

2 Why Does Data Layout Matter?
Performance: working sets, spatial locality, temporal locality, heap allocation overheads
Security: buffer overruns, dangling pointers, uninitialized pointers
S/w Reliability: dangling pointers, checkpointing, static bug detection, static data race detection
… and complex heap-based data structures are ubiquitous.

3 Compiling Pointer-Intensive Codes Today
Current analyses and transformations focus on primitives:
- disambiguate individual loads and stores
- optimize individual loads and stores
- reorder, split, or merge individual data types
Q. Can compilers manipulate entire logical data structures? A list? A tree of linked lists? A hashtable? A graph?

4 [Figure: the heap holding List 1 nodes, List 2 nodes, and Tree nodes, shown three ways: what the program creates, what the compiler sees, and what the compiler SHOULD create and see.]

5 Why Segregate Data Structures into Pools?
Programs are designed around data structures.
Direct benefit of segregation: better performance
- Smaller working sets
- Improved spatial locality
- Sometimes convert irregular to regular strides
Primary goal: better compiler information & control
- Compiler knows where (sets of) data structures live in memory
- Compiler knows order of data in memory (in some cases)
- Compiler knows type information
- Runtime points-to graph
- Compiler knows which pools point to which other pools
- Compiler knows bounds on pool lifetimes

6 Outline
Automatic Pool Allocation [LA:PLDI05]
Using Pool Allocation to Improve Performance
- Use 1: Improving heap locality, performance
- Use 2: Transparent pointer compression [LA:MSP05]
Using Pool Allocation for Bug Detection, Security
- Use 3: Detecting buffer overruns fast and transparently [DA:ICSE06]
- Use 4: Detecting all dangling pointer errors fast [DA:Submitted]
- Use 5: SAFECode...
SAFECode: A Safe Execution Environment for C/C++
- Sound program analysis, memory safety for full C [DKA:PLDI06]
- Memory safety for “type-safe” C [DKAL:TECS05]

7 Automatic Pool Allocation The transformation algorithm [Lattner and Adve, PLDI 2005] (Best Paper Award)

8 Pool Allocation: Current Approaches
Current manual pool allocation:
- Via library: by class (e.g., C++ STL), scope, or data structure
- Via language support: by scope or data structure
Goal is memory management, not layout control, not DS separation. Compiler has no information about pool properties.
Automatic Region Inference for ML (Tofte & Birkedal, Aiken):
- By lifetime only, e.g., stack of regions
- Limited destructive updates
Never automated before:
1. Imperative languages including C, C++, …
2. Pool allocation by logical data structures

9 Pool Allocation: The Key Insight Partition heap objects according to the results of some pointer analysis. The pointer analysis representation we use is called a Data Structure Graph (DS Graph).

10 DS Graph Properties
int G;
void twoLists() {
  list *X = makeList(10);
  list *Y = makeList(100);
  addGToList(X);
  addGToList(Y);
  freeList(X);
  freeList(Y);
}
[Figure: DS graph. X and Y each point to their own “list: HMRC” node with fields list* and int; G is an “int: GMRC” node. Callouts: object type; {G,H,S,U}: storage class.]
Field-sensitive for “type-safe” nodes
Each pointer field has a single outgoing edge
These data structures have been proven (a) disjoint; (b) confined within twoLists()

11 DS Graph for Olden MST Benchmark
Key Insight: a “fully context-sensitive” points-to graph identifies data structure instances.
“Fully context-sensitive” means objects are identified by full acyclic call paths.

12 DS Graph for Olden EM3D Benchmark

13 DS Graph for Olden Power Benchmark
build_tree(): t = malloc(…); t->l = build_lateral(…);
build_lateral(): l = malloc(…); l->next = build_lateral(…); l->b = build_branch(…);

14 Automatic Pool Allocation Overview
Segregate memory according to points-to graph
N graph nodes → 1 pool (default: 1-to-1)
Retain explicit free() for objects
[Figure: points-to graph (two disjoint linked lists); pool labels Pool 1, Pool 2 and Pool 1, Pool 2, Pool 3, Pool 4.]

15 Points-to Graph Assumptions
Specific assumptions:
- Separate points-to graph for each function
- Unification-based graph
- Can be used to compute escape info
Use any points-to analysis that satisfies the above.
Our implementation uses DSA [Lattner:PhD]:
- Infers C type info for many objects
- Context-sensitive
- Field-sensitive analysis
- Results show that it is very fast: DSA + pool allocation time < 3% of GCC -O3 for all tested programs.
[Figure: linked list DS graph: head points to a “list: HMR” node with fields list* and int.]

16 Pool Allocation: Example
list *makeList(int Num) {
  list *New = malloc(sizeof(list));
  New->Next = Num ? makeList(Num-1) : 0;
  New->Data = Num;
  return New;
}
int twoLists( ) {
  list *X = makeList(10);
  list *Y = makeList(100);
  GL = Y;
  addGToList(X);
  addGToList(Y);
  freeList(X);
  freeList(Y);
}
Slide overlays (fragments of the transformed code): Pool P1; poolinit(&P1); pooldestroy(&P1); added parameters Pool* P and Pool* P2; malloc replaced by poolalloc(P); added call arguments P, &P1, P1, P2.
Change calls to free into calls to poolfree, which retains explicit deallocation.
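The transformation on this slide can be sketched end to end in C. The pool runtime below is a toy written only for illustration (each pool tracks its live blocks in a linked list); the real runtime is not shown in the slides, and the `size` argument to `poolalloc` is an assumption. The sketch follows the simpler variant in which both lists stay confined to `twoLists` (no `GL = Y`), omits `addGToList`, and makes `twoLists` return the node count so the result can be checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool runtime (assumption, for illustration only): every block carries
   a small header linking it into its pool's list of live blocks. */
typedef struct PoolBlock { struct PoolBlock *prev, *next; } PoolBlock;
typedef struct Pool { PoolBlock *live; size_t nlive; } Pool;

void poolinit(Pool *P) { P->live = NULL; P->nlive = 0; }

void *poolalloc(Pool *P, size_t size) {
    PoolBlock *b = malloc(sizeof(PoolBlock) + size);
    if (!b) return NULL;
    b->prev = NULL;
    b->next = P->live;
    if (P->live) P->live->prev = b;
    P->live = b;
    P->nlive++;
    return b + 1;                      /* user data follows the header */
}

void poolfree(Pool *P, void *ptr) {
    if (!ptr) return;
    PoolBlock *b = (PoolBlock *)ptr - 1;
    if (b->prev) b->prev->next = b->next; else P->live = b->next;
    if (b->next) b->next->prev = b->prev;
    P->nlive--;
    free(b);
}

/* Releases whatever the program did not poolfree explicitly. */
void pooldestroy(Pool *P) {
    while (P->live) poolfree(P, P->live + 1);
}

typedef struct list { struct list *Next; int Data; } list;

/* Transformed makeList: takes the pool to allocate from. */
list *makeList(int Num, Pool *P) {
    list *New = poolalloc(P, sizeof(list));
    assert(New != NULL);
    New->Next = Num ? makeList(Num - 1, P) : NULL;
    New->Data = Num;
    return New;
}

/* Transformed freeList: free becomes poolfree on the same pool. */
void freeList(list *L, Pool *P) {
    while (L) { list *Next = L->Next; poolfree(P, L); L = Next; }
}

/* Transformed twoLists: one pool per disjoint list, created on entry
   and destroyed on exit. Returns the total number of nodes built. */
int twoLists(void) {
    Pool P1, P2;
    poolinit(&P1);
    poolinit(&P2);
    list *X = makeList(10, &P1);
    list *Y = makeList(100, &P2);
    int nodes = 0;
    for (list *n = X; n; n = n->Next) nodes++;
    for (list *n = Y; n; n = n->Next) nodes++;
    freeList(X, &P1);
    freeList(Y, &P2);
    pooldestroy(&P1);
    pooldestroy(&P2);
    return nodes;
}
```

Because `makeList(Num)` also allocates a node for `Num == 0`, the two lists hold 11 and 101 nodes, each set living entirely inside its own pool.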

17 Pool Allocation Algorithm Details
Indirect function calls:
call fp1 arg1 … argN, where fp1 → { F1, F2 }
call fp2 arg1 … argN, where fp2 → { F2, F3 }
- Must pass same pool arguments to F1, F2 and F3
- Partition functions into equivalence classes: if F1, F2 have a common call site, they are in the same class
- Merge points-to graphs for each equivalence class
- Apply previous transformation unchanged
Pools reachable from global variables:
- Such a pooldesc is a “runtime constant,” so make it global also
- See paper for details [LA:PLDI05]
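The partitioning step for indirect calls can be illustrated with a small union-find structure. This is only a sketch under assumptions: functions are represented by small integer ids, and `classes_init`, `class_of`, `merge_classes` and `add_call_site` are hypothetical names, not part of the authors' implementation.

```c
#include <assert.h>

enum { MAX_FUNCS = 16 };            /* hypothetical limit for the sketch */
static int parent[MAX_FUNCS];

/* Every function starts in its own equivalence class. */
void classes_init(void) {
    for (int i = 0; i < MAX_FUNCS; i++) parent[i] = i;
}

/* Representative of f's class, with path halving. */
int class_of(int f) {
    assert(f >= 0 && f < MAX_FUNCS);
    while (parent[f] != f) {
        parent[f] = parent[parent[f]];
        f = parent[f];
    }
    return f;
}

void merge_classes(int a, int b) {
    parent[class_of(a)] = class_of(b);
}

/* All possible targets of one indirect call site must take the same
   pool arguments, so they are merged into a single class. */
void add_call_site(const int *targets, int n) {
    for (int i = 1; i < n; i++) merge_classes(targets[0], targets[i]);
}
```

With the two call sites from the slide, fp1 → {F1, F2} and fp2 → {F2, F3}, the shared target F2 pulls F1, F2 and F3 into one class, while an unrelated function stays in its own.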

18 Two Further Refinements
(1) Eliminating poolfree()
- poolfree() “just before” pooldestroy() is redundant
- This is effectively Static Garbage Collection!
DS = Create(P);
ProcessData(DS);
Free(DS, P); // redundant if...
pooldestroy(&P);
(2) Reducing Pool Lifetimes
- Pools need not be created / destroyed at function boundaries
- Intraprocedural flow analysis to create later, destroy earlier
- Can be extended interprocedurally [Aiken et al., PLDI 96]

19 Pool Allocation Properties
Strengths:
- Transparent: Fully automatic for any LLVM program
- Static Map: Every pointer var/field points to unique, known pool
- Pool Type Information: Many type-homogeneous pools
- Lifetimes: Lifetime of every pool is bounded
- Pool Points-to Graph: Compiler knows which pools contain pointers to every pool, and vice versa
Limitations:
1. No deallocation: No automatic deallocation of items in pools
2. Unsafe: No guarantee of memory safety
3. Lifetimes: Pools reachable from global vars have global lifetime
4. Missing type info: Type-unsafe objects (DS nodes)

20 Use 1 of Pool Allocation Improving performance of heap-intensive codes [Lattner and Adve, PLDI 2005]

21 Simple Pool Allocation Statistics
Programs from SPEC CINT2K, Ptrdist, FreeBench & Olden suites, plus unbundled programs
DSA is able to infer that most static pools are type-homogeneous [table residue: 91]
DSA + pool allocation compile time is small: less than 3% of GCC compile time for all tested programs. See paper for details.

22 Pool Allocation Speedup Several programs unaffected by pool allocation 10-20% speedup across many pointer intensive programs Some programs (ft, chomp) order of magnitude faster Most programs are 0% to 20% faster with pool allocation alone Two are 10x faster, one is almost 2x faster

23 Cache/TLB miss reduction
Sources:
- Defragmented heap
- Reduced inter-object padding
- Segregating the heap!
Miss rates measured with perfctr on AMD Athlon 2100+

24 Chomp Access Pattern with Malloc Allocates three object types (red, green, blue) Spends most time traversing green/red nodes Each traversal sweeps through all of memory Blue nodes are interspersed with green/red nodes

25 Chomp Access Pattern with PoolAlloc

26 FT Access Pattern With Malloc
Heap segregation has a similar effect on FT:
- See Lattner’s Ph.D. thesis for details

27 Different Data Structures Have Different Properties
Pool allocation segregates heap:
- Optimize using pool-specific properties
Examples of properties we look for:
- Pool is type-homogeneous
- Pool contains data that only requires 4-byte alignment
- Opportunities to reduce allocation overhead
[Figure: Pool Specific Optimizations. Three copies of the DS graph node “list: HMR” (fields list*, int; pointer head), with labels: build, traverse, destroy; complex allocation pattern.]

28 Looking closely: Anatomy of a heap
Fully general malloc-compatible allocator:
- Supports malloc/free/realloc/memalign etc.
- Standard malloc overheads: object header, alignment
- Allocates slabs of memory with exponential growth
- By default, all returned pointers are 8-byte aligned
In memory, things look like this (16-byte allocs):
[Figure: one 32-byte cache line holding 16-byte user data blocks, each preceded by a 4-byte object header and 4-byte padding for user-data alignment.]

29 Pool-Specific Optimizations
1. Selective Pool Allocation
- Don’t pool allocate when not profitable
2. PoolFree Elimination
- poolfree redundant if followed by pooldestroy
3. “Bump-pointer” allocation if pool has no poolfree:
- Eliminate per-object header
- Eliminate freelist overhead (faster object allocation)
4. Type-safe pools infer a type for the pool:
- Use 4-byte alignment for pools we know don’t need 8-byte alignment

30 PAOpts (3/4): Bump Pointer Optimization
If a pool has no poolfree calls:
- Eliminate per-object header
- Eliminate freelist overhead (faster object allocation)
Eliminates 4 bytes of inter-object padding
- Pack objects more densely in the cache
Interacts with poolfree elimination (PAOpt 2/4)!
- If poolfree elimination deletes all frees, BumpPtr can apply
[Figure: one 32-byte cache line holding 16-byte user data blocks packed back to back.]
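A bump-pointer pool of the kind described here might look like the following minimal sketch. It assumes a single fixed-size slab rather than the exponentially growing slabs mentioned earlier, and the names `BumpPool`, `bump_init`, `bump_alloc` and `bump_destroy` are hypothetical. There is no per-object header and no free list; memory is released only when the whole pool is destroyed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump-pointer pool: one slab, one cursor. */
typedef struct BumpPool { char *base, *next, *end; } BumpPool;

int bump_init(BumpPool *P, size_t capacity) {
    P->base = malloc(capacity);
    if (!P->base) return -1;
    P->next = P->base;
    P->end = P->base + capacity;
    return 0;
}

/* Allocation just rounds the cursor up to `align` and advances it.
   Offsets are aligned relative to the slab base, which malloc already
   aligns for any fundamental type. */
void *bump_alloc(BumpPool *P, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
    size_t capacity = (size_t)(P->end - P->base);
    size_t used = (size_t)(P->next - P->base);
    size_t start = (used + align - 1) & ~(align - 1);
    if (start > capacity || size > capacity - start) return NULL;
    P->next = P->base + start + size;
    return P->base + start;
}

/* The only way to release memory: drop the whole pool at once. */
void bump_destroy(BumpPool *P) {
    free(P->base);
    P->base = P->next = P->end = NULL;
}
```

With 16-byte objects and 4-byte alignment, consecutive allocations land exactly 16 bytes apart, so two objects fill one 32-byte cache line.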

31 PAOpts (4/4): Alignment Analysis
Malloc must return 8-byte aligned memory:
- It has no idea what types will be used in the memory
- Some machines raise a bus error, others suffer performance problems for unaligned memory
Type-safe pools infer a type for the pool:
- Use 4-byte alignment for pools we know don’t need 8-byte alignment
- Reduces inter-object padding
[Figure: one 32-byte cache line holding 16-byte user data blocks, each preceded only by a 4-byte object header.]
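The effect of header and alignment on density can be worked out with a little arithmetic. The helper below is a hypothetical illustration, not part of the pool runtime: it computes the distance between the starts of consecutive objects when each object carries a header and the stride is rounded up to the required alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes from the start of one object slot to the start of the next:
   header plus user data, rounded up to the alignment (a power of two). */
size_t object_stride(size_t user_size, size_t header_size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t raw = header_size + user_size;
    return (raw + align - 1) & ~(align - 1);
}
```

For the 16-byte objects used in these slides, `object_stride(16, 4, 8)` gives 24 bytes for the general malloc layout (4-byte header plus 4 bytes of padding), `object_stride(16, 4, 4)` gives 20 bytes once 4-byte alignment is known to suffice, and `object_stride(16, 0, 4)` gives 16 bytes when the header is dropped as well.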

32 Pool Optimization Speedup (FullPA)
Baseline 1.0 = run time with pool allocation [chart label: PA Time]
Optimizations help all of these programs:
- Despite being very simple, they make a big impact
Most are 5-15% faster with optimizations than with pool allocation alone. One is 44% faster, another is 29% faster.
The effect of pool optimizations can be additive with the pool allocation effect.
Pool optimizations help some programs that pool allocation itself doesn’t.

33 Use 3 of Pool Allocation Detecting buffer overruns fast and transparently [Dhurjati and Adve, ICSE 2006, to appear]

34 Array Bounds Errors
Most common reason for security attacks
- Over 50% of attacks reported by CERT
1988: First exploited … 2006: Continues to get exploited
Key problem: Tracking target object of each pointer is very expensive (without “fat pointers”)

35 Jones-Kelly: Transparent Bounds Checking
p = malloc(n * sizeof(int)); … q = ...; … r = q + i;
Inserted check: ref = lookup(q); Check(ref, r);
[Figure: global splay tree with entries (…, …), (p, n*4), (…, …); lookup of q returns (p, n*4).]
Idea: Register all array objects in a global splay tree; lookup on every pointer calculation
Advantage: Backwards-compatible: no wrappers needed
Problem: 4-5x slowdowns (up to 12x for Ruwase-Lam extension)

36 Separate search tree per pool
p = malloc(n * sizeof(int)); … q = ...; … r = q + i;
Inserted check: ref = lookup(P1, q); Check(ref, r);
[Figure: pool P1 holds (p, n*4); pool P2 holds (…, …).]
3 Key Insights:
1. Splay tree for a pool should be (very) small. In fact, a 2-element cache works great!
2. Pool for each pointer is known!
3. In type-homogeneous pools, can distinguish (and ignore) scalars.
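The per-pool lookup with a 2-element cache can be sketched as follows. This is an illustration under assumptions: a small array with linear search stands in for the per-pool splay tree, the names `PoolBounds`, `bounds_register` and `bounds_check` are hypothetical, and a one-past-the-end result is accepted because C allows forming such a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MAX_OBJS = 64 };                 /* hypothetical per-pool limit */

typedef struct Range { uintptr_t start, end; } Range;   /* [start, end) */

typedef struct PoolBounds {
    Range objs[MAX_OBJS];   /* stand-in for the per-pool splay tree */
    size_t nobjs;
    Range cache[2];         /* two most recently matched objects */
} PoolBounds;

void bounds_init(PoolBounds *P) {
    P->nobjs = 0;
    P->cache[0] = P->cache[1] = (Range){0, 0};   /* empty ranges */
}

/* Called when an array object is allocated from this pool. */
int bounds_register(PoolBounds *P, const void *p, size_t size) {
    if (P->nobjs == MAX_OBJS) return -1;
    uintptr_t s = (uintptr_t)p;
    P->objs[P->nobjs++] = (Range){s, s + size};
    return 0;
}

static int contains(Range r, uintptr_t a) { return a >= r.start && a < r.end; }

/* Find the object containing q: try the cache first, then the full set. */
static const Range *bounds_lookup(PoolBounds *P, const void *q) {
    uintptr_t a = (uintptr_t)q;
    if (contains(P->cache[0], a)) return &P->cache[0];
    if (contains(P->cache[1], a)) {
        Range t = P->cache[0];
        P->cache[0] = P->cache[1];
        P->cache[1] = t;
        return &P->cache[0];
    }
    for (size_t i = 0; i < P->nobjs; i++) {
        if (contains(P->objs[i], a)) {
            P->cache[1] = P->cache[0];
            P->cache[0] = P->objs[i];
            return &P->cache[0];
        }
    }
    return NULL;
}

/* Check for r = q + i: r must stay within the object that q points
   into (or one past its end). Returns 1 if in bounds, 0 otherwise. */
int bounds_check(PoolBounds *P, const void *q, const void *r) {
    const Range *ref = bounds_lookup(P, q);
    if (!ref) return 0;
    uintptr_t a = (uintptr_t)r;
    return a >= ref->start && a <= ref->end;
}
```

Because the pool for each pointer is known statically, the check only ever searches the objects of that one pool, which is what keeps the search structure small.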

37 Experimental Results
Dramatic improvement in lookup overheads
- Average overhead: 12% for Olden (34%, 69% for 2 cases)
- < 4% for 2 system daemons
Compares with 5x-6x for original Jones-Kelly, and up to 11x-12x for the Ruwase-Lam extension (which we use).
Effective in finding bugs
- Zitser’s suite: models 14 buffer overruns in sendmail (7), wu-ftpd (4), bind (3)
- All 14 detected successfully.
Caveat: Like J-K, doesn’t work for casts from pointers to int and back

38 Use 5: SAFECode A Safe Compilation Strategy for C/C++ Programs Sound analysis [Dhurjati and Adve, PLDI 2006, to appear] Formal proof of soundness is in accompanying technical report [TR: UIUCDCS-R-2005-2657]. Memory safety [Dhurjati et al., PLDI 2006, TECS 2005]

39 Safe Languages Provide Basic Guarantees
1. Prevent memory access violations
2. Detect errors during development
3. Enable sound compile-time analyses
- e.g., in tools for safety checking, model checking, program verification
Examples: Java, C#, Modula-3, ML
Weakly typed languages like C, C++ do not provide any of these benefits.
Often ignored

40 Why care about C/C++?
Huge body of essential legacy software
Dominant in critical domains: OS kernels, embedded systems, daemons, language run-time systems.
Example: Microsoft Longhorn (basis of Vista)?
- Less than 25% in C# [Amitabh Srivastava, CGO 04 keynote address]
- Mostly high level components, e.g., windowing system
- Performance critical code still in C/C++
The features that make C/C++ popular for system software are the features that make C/C++ unsafe: nested structs; stack-allocated objects; untagged unions; explicit free; custom allocators.

41 Current Solutions
Solution | Overhead | No memory violations | Error checking | Sound static analysis
Purify, Valgrind | Several 100x | - | some | -
SafeC | 5x | - | some | -
Jones-Kelly | 5-6x | - | some | -
SFI | Over 2x | y | - | -
FisherPatil | 2x-6x | Y | Y | -
Yong | Over 2x | - | some | -
SAFECode | 0-30% | Y | some | Y
CCured | Up to 1.87x | Y | some | Y
Cyclone | 1x-2x | Y | some | y
Row-group labels on the slide: Modified C, Pure C

42 SAFECode Compiler and Run-time System
A typed assembly language (LLVM)
- Language-independent
- Simple, transparent runtime system
Sound analysis and memory safety
- Heap safety: via Automatic Pool Allocation + run-time checks
- Stack safety: via Data Structure Analysis (DSA) + heap conversion
- Array safety: via pool checks or precise array bounds checks
Initially, for “type-safe” C, with restricted pointer casts [TECS 2005]
Now, for nearly arbitrary, unmodified C programs [PLDI 2006]

43 Guaranteeing Static Analysis
Many program verification tools build on alias analysis, call graph, assumed type information
- E.g., SLAM, ESP, BLAST
Memory errors can invalidate these analyses
Detecting all memory errors is expensive
- Dangling pointer errors
- Precise array bounds errors
Solution: Enforce key analyses in the presence of some memory errors: alias analysis, call graph, type information.

44 What is Alias Analysis
A static summary of memory objects and their connectivity
int P[4]; P[i] = …; struct List *Q = (struct List *)P; Q->val = …
struct List* head = makeList(20);
[Figure: P and Q point to a TU node (S,A; field0); head points to a “struct List (TK)” node (H; fields next, val).]
TK: Type Known, TU: Type Unknown

45 Memory errors invalidate alias analysis
int B[4];
struct List tail, head;
head.field1 = &tail;
Tmp = (struct List*)B;
Tmp->field6 = ..; // could corrupt head.field1
[Figure: B is an “Int S,A” node; &tail and &head are “Struct List s” nodes with Field1; Tmp points to a TU node marked “??”.]
head.field1 could point anywhere in memory, so the pointer analysis is incorrect.
head.field1 could corrupt memory of another TK node.

46 Enforcing Alias Analysis
Problem 1:
- Must ensure that tmp points to an object in this points-to set
With normal allocation:
- Objects are scattered in memory
- Checking set membership at run-time is extremely expensive
Insight 1: Automatic Pool Allocation partitions the heap corresponding to nodes in the graph. These partitions are compact and can be checked efficiently!
Caveat: Currently only flow-insensitive, unification based
[Figure: points-to graph with tmp, a “struct List (TK)” node (H; fields next, val), and a TU node (S,A; field0).]

47 Enforcing Alias Analysis
Problem 2:
- Checking every pointer access or initialization is still very expensive
Insight 2: Ignoring memory errors, any pointer obtained from a TK pool already has correct aliasing behavior.
Pointers obtained from other pools will be explicitly checked with Poolcheck(PP, p, align):
- Mask lower k bits of p, look in hash table of page addresses in PP
- Alignment check if array references in TK pool
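The Poolcheck operation can be illustrated with a minimal sketch. Several details are assumptions made for the example: k is fixed at 12 bits, the hash table is a small open-addressing table with linear probing, and the alignment check is modeled as "offset within the page is a multiple of align" (the slide does not spell out how alignment is checked). `PagePool`, `pagepool_init` and `pool_add_page` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAGE_BITS = 12, TABLE_SIZE = 64 };   /* k = 12: an assumption */
static const uintptr_t PAGE_MASK = ((uintptr_t)1 << PAGE_BITS) - 1;

/* Per-pool hash set of the page addresses that belong to the pool.
   Slot value 0 means "empty"; page 0 is never a valid pool page. */
typedef struct PagePool { uintptr_t pages[TABLE_SIZE]; } PagePool;

void pagepool_init(PagePool *PP) {
    for (size_t i = 0; i < TABLE_SIZE; i++) PP->pages[i] = 0;
}

static size_t slot_of(uintptr_t page) {
    return (size_t)((page >> PAGE_BITS) % TABLE_SIZE);
}

/* Record that `page` (a page-aligned address) belongs to the pool. */
int pool_add_page(PagePool *PP, uintptr_t page) {
    assert(page != 0 && (page & PAGE_MASK) == 0);
    size_t s = slot_of(page);
    for (size_t i = 0; i < TABLE_SIZE; i++, s = (s + 1) % TABLE_SIZE) {
        if (PP->pages[s] == page) return 0;
        if (PP->pages[s] == 0) { PP->pages[s] = page; return 0; }
    }
    return -1;   /* table full */
}

/* Mask off the low k bits of p and look the page up in the pool's table.
   If align is nonzero, also require the in-page offset to be a multiple
   of align. Returns 1 if the check passes, 0 otherwise. */
int poolcheck(const PagePool *PP, const void *p, size_t align) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t page = addr & ~PAGE_MASK;
    if (page == 0) return 0;
    size_t s = slot_of(page);
    for (size_t i = 0; i < TABLE_SIZE; i++, s = (s + 1) % TABLE_SIZE) {
        if (PP->pages[s] == 0) return 0;
        if (PP->pages[s] == page)
            return align == 0 || (addr & PAGE_MASK) % align == 0;
    }
    return 0;
}
```

The check costs one mask and, in the common case, one table probe, which is only practical because pool allocation keeps each points-to partition in its own set of pages.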

48 Tolerating Dangling Pointers
Problem 3:
- But memory errors (dangling pointer errors, array bounds violations) could corrupt locations in TK pools
Insight 3 (also used for “type-safe” C w/o GC): Reallocating a freed block to a new request of the same type cannot cause any type violation or (in the same pool) aliasing violation, despite dangling pointers.
Only array references in TK pools must be checked (can optimize): Poolcheck(PP, p, align).

49 Evaluation of Run-time Overhead
Programs: Olden, Ptrdist, 3 system daemons
No source changes necessary
Compared Olden with CCured.
Program | SAFECode ratio | CCured ratio
bh | 1.03 | 1.31
bisort | 1.00 | 0.97
em3d | 1.27 | 1.49
treeadd | 0.99 | 2.72
tsp | 0.99 | 1.23
yacr2 | 1.30 | -
ftpd | 1.00 | -
fingerd | 1.03 | -
Max | 1.30 | 2.72

50 Summary

51 What Could You Do With Pool Allocation?
Embedded Systems
- Pointer compression, data compression for embedded codes
- Data partitioning for explicit local memories / buffers / tiles
- Power savings for dead / dormant pools
Dependable Systems
- Efficient checkpointing by ignoring unmodified pools
- Efficient replicated execution for servers
- Focusing instrumentation for program testing
High Performance Systems
- Data-structure-centric profiling
- Linked pointer prefetching
- …

52 Summary Automatic Pool Allocation Gives compilers information about data structure layouts, lifetimes, points-to information SAFECode A sound execution strategy for C, C++ programs: enable sound analysis, enforce memory safety.

