Automatic Pool Allocation: Compile-Time Control Over Complete Pointer-Based Data Structures Vikram Adve University of Illinois at Urbana-Champaign Joint.

Presentation transcript:

Automatic Pool Allocation: Compile-Time Control Over Complete Pointer-Based Data Structures Vikram Adve University of Illinois at Urbana-Champaign Joint work with: Chris Lattner, Dinakar Dhurjati, Sumant Kowshik Thanks: NSF (CAREER, Embedded02, NGS00, NGS99, OSC99), Marco/DARPA

Why Does Data Layout Matter?
- Performance: working sets, spatial locality, temporal locality, heap allocation overheads
- Security: buffer overruns, dangling pointers, uninitialized pointers
- S/w Reliability: dangling pointers, checkpointing, static bug detection, static data race detection
… and complex heap-based data structures are ubiquitous.

Compiling Pointer-Intensive Codes Today
Current analyses and transformations focus on primitives:
- disambiguate individual loads and stores
- optimize individual loads and stores
- reorder, split, or merge individual data types
Q. Can compilers manipulate entire logical data structures? A list? A tree of linked lists? A hashtable? A graph?

[Figure: three views of the heap, captioned "What the program creates", "What the compiler sees", and "What the compiler SHOULD create and see". Labels: List 1 Nodes, List 2 Nodes, Tree Nodes.]

Why Segregate Data Structures into Pools?
Programs are designed around data structures.
Direct benefit of segregation: better performance
- Smaller working sets
- Improved spatial locality
- Sometimes convert irregular to regular strides
Primary goal: better compiler information & control
- Compiler knows where (sets of) data structures live in memory
- Compiler knows order of data in memory (in some cases)
- Compiler knows type information ⇒ runtime points-to graph
- Compiler knows which pools point to which other pools
- Compiler knows bounds on pool lifetimes

Outline
Automatic Pool Allocation [LA:PLDI05]
Using Pool Allocation to Improve Performance
- Use 1: Improving heap locality, performance
- Use 2: Transparent pointer compression [LA:MSP05]
Using Pool Allocation for Bug Detection, Security
- Use 3: Detecting buffer overruns fast and transparently [DA:ICSE06]
- Use 4: Detecting all dangling pointer errors fast [DA:Submitted]
- Use 5: SAFECode...
SAFECode: A Safe Execution Environment for C/C++
- Sound program analysis, memory safety for full C [DKA:PLDI06]
- Memory safety for “type-safe” C [DKAL:TECS05]

Automatic Pool Allocation The transformation algorithm [Lattner and Adve, PLDI 2005] (Best Paper Award)

Pool Allocation: Current Approaches
Current manual pool allocation:
- Via library: by class (e.g., C++ STL), scope, or data structure
- Via language support: by scope or data structure
- Goal is memory management, not layout control, not DS separation
- Compiler has no information about pool properties
Automatic region inference for ML (Tofte & Birkedal, Aiken):
- By lifetime only, e.g., stack of regions
- Limited destructive updates
Never automated before:
1. Imperative languages including C, C++, …
2. Pool allocation by logical data structures

Pool Allocation: The Key Insight Partition heap objects according to the results of some pointer analysis. The pointer analysis representation we use is called a Data Structure Graph (DS Graph).

DS Graph Properties
    int G;
    void twoLists() {
      list *X = makeList(10);
      list *Y = makeList(100);
      addGToList(X);
      addGToList(Y);
      freeList(X);
      freeList(Y);
    }
[Figure: DS graph for twoLists(). X and Y each point to their own "list: HMRC" node with fields list* and int; G is an "int: GMRC" node. Each node is labeled with its object type and flags; {G,H,S,U} is the storage class.]
- Field-sensitive for “type-safe” nodes
- Each pointer field has a single outgoing edge
- These data structures have been proven (a) disjoint; (b) confined within twoLists()

DS Graph for Olden MST Benchmark
Key Insight: a “fully context-sensitive” points-to graph identifies data structure instances.
“Fully context-sensitive” ⇒ identify objects by full acyclic call paths

DS Graph for Olden EM3D Benchmark

DS Graph for Olden Power Benchmark
    build_tree():    t = malloc(…); t->l = build_lateral(…);
    build_lateral(): l = malloc(…); l->next = build_lateral(…); l->b = build_branch(…);

Automatic Pool Allocation Overview
- Segregate memory according to points-to graph
- N graph nodes → 1 pool (default: 1-to-1)
- Retain explicit free() for objects
[Figure: points-to graph (two disjoint linked lists) mapped to Pool 1 and Pool 2; a second graph mapped to Pool 1, Pool 2, Pool 3, Pool 4.]

Points-to Graph Assumptions
Specific assumptions:
- Separate points-to graph for each function
- Unification-based graph
- Can be used to compute escape info
Use any points-to analysis that satisfies the above.
Our implementation uses DSA [Lattner:PhD]:
- Infers C type info for many objects
- Context-sensitive
- Field-sensitive analysis
- Results show that it is very fast: DSA + pool allocation time < 3% of GCC -O3 for all tested programs.
[Figure: linked list, head pointing to a "list: HMR" node with fields list* and int.]

Pool Allocation: Example
    list *makeList(int Num) {
      list *New = malloc(sizeof(list));
      New->Next = Num ? makeList(Num-1) : 0;
      New->Data = Num;
      return New;
    }
    int twoLists( ) {
      list *X = makeList(10);
      list *Y = makeList(100);
      GL = Y;
      addGToList(X);
      addGToList(Y);
      freeList(X);
      freeList(Y);
    }
[Slide overlays, fragments of the transformed code: "Pool P1; poolinit(&P1); pooldestroy(&P1);", ", &P1)", ", Pool* P)", "poolalloc(P);", ", P)", ", &P1)", "P1, P2)", "Pool* P2)", ", P2)", "P2".]
Change calls to free into calls to poolfree ⇒ retain explicit deallocation
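The overlay fragments on this slide suggest what the transformed program looks like. The following is a sketch of that result, not the authors' exact output. The pool runtime here is a hypothetical, simplified stand-in (the `poolalloc` signature with a size argument, the `PoolObj` header, and the `live` counter are assumptions), and the bodies of `addGToList` and `freeList` are invented for illustration, since the slide only shows their calls. The list built for `X` stays local, so its pool `P1` is created and destroyed inside `twoLists`; the list built for `Y` escapes through the global `GL`, so its pool is passed in by the caller as `P2`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified pool runtime: each object carries a list header
   so that pooldestroy can release whatever is still live in the pool. */
typedef struct PoolObj { struct PoolObj *prev, *next; } PoolObj;
typedef struct Pool { PoolObj head; size_t live; } Pool;

void poolinit(Pool *P) { P->head.prev = P->head.next = &P->head; P->live = 0; }

void *poolalloc(Pool *P, size_t size) {
  PoolObj *o = malloc(sizeof(PoolObj) + size);
  if (!o) abort();
  o->next = P->head.next; o->prev = &P->head;  /* link in after the sentinel */
  o->next->prev = o; P->head.next = o;
  P->live++;
  return o + 1;                                /* user data follows the header */
}

void poolfree(Pool *P, void *ptr) {
  PoolObj *o = (PoolObj *)ptr - 1;
  o->prev->next = o->next; o->next->prev = o->prev;
  P->live--;
  free(o);
}

void pooldestroy(Pool *P) {
  while (P->head.next != &P->head) poolfree(P, P->head.next + 1);
}

typedef struct list { struct list *Next; int Data; } list;
list *GL;  /* global that makes Y's list escape twoLists */
int G;

/* malloc became poolalloc; the pool descriptor is an extra argument. */
list *makeList(int Num, Pool *P) {
  list *New = poolalloc(P, sizeof(list));
  New->Next = Num ? makeList(Num - 1, P) : 0;
  New->Data = Num;
  return New;
}

/* Helper bodies are assumptions; the slide only shows the calls. */
void addGToList(list *L) { for (; L; L = L->Next) L->Data += G; }
void freeList(list *L, Pool *P) {
  while (L) { list *Next = L->Next; poolfree(P, L); L = Next; }
}

void twoLists(Pool *P2) {
  Pool P1;
  poolinit(&P1);                 /* X's list never escapes: local pool */
  list *X = makeList(10, &P1);
  list *Y = makeList(100, P2);   /* Y escapes via GL: caller's pool */
  GL = Y;
  addGToList(X);
  addGToList(Y);
  freeList(X, &P1);
  freeList(Y, P2);
  pooldestroy(&P1);
}
```

A caller would create the pool for the escaping list itself, for example `Pool P2; poolinit(&P2); twoLists(&P2); pooldestroy(&P2);`. As in the slide, `GL` is left pointing at freed nodes after `freeList(Y)`, so it must not be dereferenced afterwards.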

Pool Allocation Algorithm Details
Indirect function calls:
    call fp1 arg1 … argN    fp1 ∈ { F1, F2 }
    call fp2 arg1 … argN    fp2 ∈ { F2, F3 }
- Must pass same pool arguments to F1, F2 and F3
- Partition functions into equivalence classes: if F1, F2 have a common call site ⇒ same class
- Merge points-to graphs for each equivalence class
- Apply previous transformation unchanged
Pools reachable from global variables:
- Such a pooldesc is a “runtime constant,” so make it global also
- See paper for details [LA:PLDI05]
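The partitioning step for indirect calls can be sketched with a union-find structure. This is only an illustration of the "common call site ⇒ same class" rule; the choice of union-find, the integer function IDs, and the names `classes_init`, `class_of`, and `merge_call_site` are assumptions, not the paper's implementation.

```c
#include <assert.h>

#define MAX_FUNCS 16
static int parent[MAX_FUNCS];   /* union-find forest over function IDs */

/* Start with every function in its own equivalence class. */
void classes_init(void) {
  for (int i = 0; i < MAX_FUNCS; i++) parent[i] = i;
}

/* Representative of the class that function f belongs to. */
int class_of(int f) {
  assert(f >= 0 && f < MAX_FUNCS);
  while (parent[f] != f) {
    parent[f] = parent[parent[f]];  /* path halving */
    f = parent[f];
  }
  return f;
}

/* All possible targets of one indirect call site must share a class,
   because they must all accept the same pool arguments. */
void merge_call_site(const int *targets, int n) {
  for (int i = 1; i < n; i++) {
    int a = class_of(targets[0]), b = class_of(targets[i]);
    if (a != b) parent[b] = a;
  }
}
```

With the slide's two call sites, merging {F1, F2} and then {F2, F3} puts F1, F2, and F3 in one class, while an unrelated function keeps its own class.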

Two Further Refinements
(1) Eliminating poolfree()
- poolfree() “just before” pooldestroy() is redundant
- This is effectively Static Garbage Collection!
    DS = Create(P);
    ProcessData(DS);
    Free(DS, P);        // redundant if...
    pooldestroy(&P);
(2) Reducing Pool Lifetimes
- Pools need not be created / destroyed at function boundaries
- Intraprocedural flow analysis to create later, destroy earlier
- Can be extended interprocedurally [Aiken et al., PLDI 96]

Pool Allocation Properties
Strengths:
- Transparent: fully automatic for any LLVM program
- Static map: every pointer var/field points to a unique, known pool
- Pool type information: many type-homogeneous pools
- Lifetimes: lifetime of every pool is bounded
- Pool points-to graph: compiler knows which pools contain pointers to every pool, and vice versa
Limitations:
1. No deallocation: no automatic deallocation of items in pools
2. Unsafe: no guarantee of memory safety
3. Lifetimes: pools reachable from global vars have global lifetime
4. Missing type info: type-unsafe objects (DS nodes)

Use 1 of Pool Allocation Improving performance of heap-intensive codes [Lattner and Adve, PLDI 2005]

Simple Pool Allocation Statistics
- Programs from SPEC CINT2K, Ptrdist, FreeBench & Olden suites, plus unbundled programs
- DSA is able to infer that most static pools are type-homogeneous
- DSA + pool allocation compile time is small: less than 3% of GCC compile time for all tested programs. See paper for details.

Pool Allocation Speedup
- Several programs unaffected by pool allocation
- 10-20% speedup across many pointer-intensive programs
- Some programs (ft, chomp) an order of magnitude faster
- Most programs are 0% to 20% faster with pool allocation alone
- Two are 10x faster, one is almost 2x faster

Cache/TLB miss reduction
Sources:
- Defragmented heap
- Reduced inter-object padding
- Segregating the heap!
Miss rates measured with perfctr on AMD Athlon 2100+

Chomp Access Pattern with Malloc
- Allocates three object types (red, green, blue)
- Spends most time traversing green/red nodes
- Each traversal sweeps through all of memory
- Blue nodes are interspersed with green/red nodes

Chomp Access Pattern with PoolAlloc

FT Access Pattern With Malloc
Heap segregation has a similar effect on FT:
- See Lattner’s Ph.D. thesis for details

Pool Specific Optimizations
Different data structures have different properties.
Pool allocation segregates heap:
- Optimize using pool-specific properties
Examples of properties we look for:
- Pool is type-homogeneous
- Pool contains data that only requires 4-byte alignment
- Opportunities to reduce allocation overhead
[Figure: three linked lists (head pointing to a "list: HMR" node with fields list* and int), labeled "build", "traverse", "destroy", and "complex allocation pattern".]

Looking closely: Anatomy of a heap
Fully general malloc-compatible allocator:
- Supports malloc/free/realloc/memalign etc.
- Standard malloc overheads: object header, alignment
- Allocates slabs of memory with exponential growth
- By default, all returned pointers are 8-byte aligned
In memory, things look like (16-byte allocs):
[Figure: one 32-byte cache line; each 16-byte block of user data is preceded by a 4-byte object header and 4 bytes of padding for user-data alignment.]

Pool-Specific Optimizations
1. Selective Pool Allocation
- Don’t pool allocate when not profitable
2. PoolFree Elimination
- poolfree redundant if followed by pooldestroy
3. “Bump-pointer” allocation if pool has no poolfree:
- Eliminate per-object header
- Eliminate freelist overhead (faster object allocation)
4. Type-safe pools infer a type for the pool:
- Use 4-byte alignment for pools we know don’t need 8-byte alignment

PAOpts (3/4): Bump Pointer Optzn
If a pool has no poolfree’s:
- Eliminate per-object header
- Eliminate freelist overhead (faster object allocation)
Eliminates 4 bytes of inter-object padding:
- Pack objects more densely in the cache
Interacts with poolfree elimination (PAOpt 2/4)!
- If poolfree elim deletes all frees, BumpPtr can apply
[Figure: one 32-byte cache line holding two adjacent 16-byte blocks of user data, with no headers or padding.]
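A bump-pointer pool of the kind this slide describes can be sketched as follows. This is a minimal illustration under stated assumptions: the pool uses one fixed-size slab rather than slabs with exponential growth, and the names `BumpPool`, `bump_init`, `bump_alloc`, and `bump_destroy` are hypothetical. Because the pool never frees individual objects, it needs no per-object header and no free list, so consecutive 16-byte objects sit exactly 16 bytes apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One slab; `next` is the bump pointer. */
typedef struct BumpPool { char *base, *next, *end; } BumpPool;

int bump_init(BumpPool *P, size_t capacity) {
  P->base = malloc(capacity);
  if (!P->base) return 0;
  P->next = P->base;
  P->end = P->base + capacity;
  return 1;
}

/* Allocate `size` bytes aligned to `align` (a power of two).
   No header is written: allocation is an add and a compare. */
void *bump_alloc(BumpPool *P, size_t size, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  uintptr_t cur = (uintptr_t)P->next;
  uintptr_t end = (uintptr_t)P->end;
  uintptr_t start = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
  if (start > end || size > end - start) return NULL;  /* slab exhausted */
  char *obj = P->next + (start - cur);
  P->next = obj + size;
  return obj;
}

/* All objects die together when the pool is destroyed. */
void bump_destroy(BumpPool *P) {
  free(P->base);
  P->base = P->next = P->end = NULL;
}
```

A 64-byte slab therefore holds exactly four 16-byte objects, which is the dense packing the slide's cache-line picture shows.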

PAOpts (4/4): Alignment Analysis
Malloc must return 8-byte aligned memory:
- It has no idea what types will be used in the memory
- Some machines raise a bus error, others suffer performance problems for unaligned memory
Type-safe pools infer a type for the pool:
- Use 4-byte alignment for pools we know don’t need 8-byte alignment
- Reduces inter-object padding
[Figure: one 32-byte cache line; each 16-byte block of user data is preceded by a 4-byte object header, with no alignment padding.]
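The layouts in the three cache-line pictures follow from one piece of arithmetic: the distance between consecutive objects is the header plus the user data, rounded up to the alignment. The helper below is a hypothetical sketch of that calculation (the name `object_stride` is an assumption), using the slides' numbers of 16-byte objects and a 4-byte header.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between consecutive objects when each object has
   `header` bytes in front of `size` bytes of user data and the user data
   must start on an `align`-byte boundary. */
size_t object_stride(size_t header, size_t size, size_t align) {
  assert(align != 0);
  size_t raw = header + size;
  return (raw + align - 1) / align * align;  /* round up to the alignment */
}
```

For 16-byte objects this gives 24 bytes per object for general malloc (4-byte header, 8-byte alignment, hence 4 bytes of padding), 20 bytes once the pool is known to need only 4-byte alignment, and 16 bytes for a bump-pointer pool with no header. Four objects thus shrink from 96 bytes to 80 or 64 bytes, that is, from three 32-byte cache lines toward two.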

Pool Optimization Speedup (FullPA)
Baseline 1.0 = run time with pool allocation (PA time)
Optimizations help all of these programs:
- Despite being very simple, they make a big impact
- Most are 5-15% faster with optimizations than with pool allocation alone
- One is 44% faster, another is 29% faster
- The effect of the pool optimizations can be additive with the pool allocation effect
- Pool optimizations help some programs that pool allocation itself doesn’t

Use 3 of Pool Allocation Detecting buffer overruns fast and transparently [Dhurjati and Adve, ICSE 2006, to appear]

Array Bounds Errors
Most common reason for security attacks:
- Over 50% of attacks reported by CERT
- 1988: First exploited … 2006: Continues to get exploited
Key problem: tracking the target object of each pointer is very expensive (without “fat pointers”)

Jones-Kelly: Transparent Bounds Checking
    p = malloc(n * sizeof(int));
    …
    q = ...;
    …
    ref = lookup(q);
    r = q + i;
    Check(ref, r);
[Figure: global splay tree of registered objects with entries (…, …), (p, n*4), (…, …); lookup of q returns (p, n*4).]
Idea: register all array objects in a global splay tree; lookup on every pointer calculation
Advantage: backwards-compatible, no wrappers needed
Problem: 4-5x slowdowns (up to 12x for Ruwase-Lam extension)
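The register, lookup, and check steps can be sketched as below. This is an illustration of the idea only: a sorted array with binary search stands in for the splay tree, addresses are handled as `uintptr_t` values, and the names `ObjRange`, `register_object`, `lookup`, and `check` are hypothetical rather than the Jones-Kelly API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } ObjRange;

#define MAX_OBJS 128
static ObjRange objects[MAX_OBJS];  /* kept sorted by base address */
static size_t num_objects;

/* Register an allocated object, keeping the table sorted. */
int register_object(const void *p, size_t size) {
  if (num_objects == MAX_OBJS) return 0;
  uintptr_t base = (uintptr_t)p;
  size_t i = num_objects++;
  for (; i > 0 && objects[i - 1].base > base; i--) objects[i] = objects[i - 1];
  objects[i].base = base;
  objects[i].size = size;
  return 1;
}

/* Find the registered object that contains address q, or NULL. */
const ObjRange *lookup(uintptr_t q) {
  size_t lo = 0, hi = num_objects;  /* binary search: last base <= q */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (objects[mid].base <= q) lo = mid + 1; else hi = mid;
  }
  if (lo == 0) return NULL;
  const ObjRange *o = &objects[lo - 1];
  return (q - o->base < o->size) ? o : NULL;
}

/* A derived address r is legal if it stays inside the object that the
   source pointer referred to, or is exactly one past its end. */
int check(const ObjRange *ref, uintptr_t r) {
  return ref != NULL && r >= ref->base && r - ref->base <= ref->size;
}
```

For `r = q + i`, the instrumented code would call `lookup` on `q` and then `check` on the address of `r`. Every pointer calculation pays for a search over all registered objects in the program, which is the cost the per-pool variant attacks.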

Separate search tree per pool
    p = malloc(n * sizeof(int));
    …
    q = ...;
    …
    ref = lookup(P1, q);
    r = q + i;
    Check(ref, r);
[Figure: pool P1 with its own tree containing (p, n*4); pool P2 with its own tree containing (…, …).]
3 Key Insights:
1. Splay tree for a pool should be (very) small. In fact, a 2-element cache works great!
2. Pool for each pointer is known!
3. In type-homogeneous pools, can distinguish (and ignore) scalars.
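A per-pool lookup with a 2-element cache might look like the sketch below. The structure is an assumption made for illustration: a small array with linear search stands in for the per-pool splay tree, and `CheckedPool`, `pool_register`, `pool_lookup`, and the `slow_lookups` counter are hypothetical names. The point it shows is that the pool descriptor is already known at the check, so only that pool's objects are searched, and repeated accesses to the same one or two arrays never reach the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } ObjRange;

#define POOL_MAX_OBJS 32
typedef struct {
  ObjRange objs[POOL_MAX_OBJS];  /* stands in for the per-pool splay tree */
  size_t count;
  const ObjRange *cache[2];      /* two most recently found objects */
  unsigned long slow_lookups;    /* how often the cache missed */
} CheckedPool;

static int contains(const ObjRange *o, uintptr_t q) {
  return q >= o->base && q - o->base < o->size;
}

int pool_register(CheckedPool *P, const void *p, size_t size) {
  if (P->count == POOL_MAX_OBJS) return 0;
  P->objs[P->count].base = (uintptr_t)p;
  P->objs[P->count].size = size;
  P->count++;
  return 1;
}

/* Look up q in pool P only: try the 2-element cache first, then fall
   back to searching the pool's own objects. */
const ObjRange *pool_lookup(CheckedPool *P, uintptr_t q) {
  for (int i = 0; i < 2; i++)
    if (P->cache[i] && contains(P->cache[i], q)) return P->cache[i];
  P->slow_lookups++;
  for (size_t i = 0; i < P->count; i++)
    if (contains(&P->objs[i], q)) {
      P->cache[1] = P->cache[0];  /* keep the previous hit as second entry */
      P->cache[0] = &P->objs[i];
      return P->cache[0];
    }
  return NULL;
}
```

A loop that alternates between two arrays in the same pool takes the slow path once per array and hits the cache afterwards.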

Experimental Results
Dramatic improvement in lookup overheads:
- Average overhead: 12% for Olden (34%, 69% for 2 cases)
- < 4% for 2 system daemons
- Compares with 5x-6x for original Jones-Kelly, and up to 11x-12x for Ruwase-Lam extension (which we use)
Effective in finding bugs:
- Zitser’s suite: models 14 buffer overruns in sendmail (7), wu-ftpd (4), bind (3)
- All 14 detected successfully
Caveat: like J-K, doesn’t work for casts from pointers to int and back

Use 5: SAFECode A Safe Compilation Strategy for C/C++ Programs Sound analysis [Dhurjati and Adve, PLDI 2006, to appear] Formal proof of soundness is in accompanying technical report [TR: UIUCDCS-R ]. Memory safety [Dhurjati et al., PLDI 2006, TECS 2005]

Safe Languages Provide Basic Guarantees
1. Prevent memory access violations
2. Detect errors during development
3. Enable sound compile-time analyses (often ignored)
- e.g., in tools for safety checking, model checking, program verification
Safe languages: e.g., Java, C#, Modula-3, ML
Weakly typed languages like C, C++ do not provide any of these benefits

Why care about C/C++?
- Huge body of essential legacy software
- Dominant in critical domains: OS kernels, embedded systems, daemons, language run-time systems
Example: Microsoft Longhorn (basis of Vista)?
- Less than 25% in C# [Amitabh Srivastava, CGO 04 keynote address]
- Mostly high-level components, e.g., windowing system
- Performance-critical code still in C/C++
The features that make C/C++ popular for system software are the features that make C/C++ unsafe: nested structs; stack-allocated objects; untagged unions; explicit free; custom allocators.

Current Solutions
Solution        | Overhead     | No memory violations | Error checking | Sound static analysis
Purify, Valgrind| Several 100x | -                    | some           | -
SafeC           | 5x           | -                    | some           | -
Jones-Kelly     | 5-6x         | -                    | some           | -
SFI             | Over 2x      | Y                    | -              | -
FisherPatil     | 2x-6x        | Y                    | Y              | -
Yong            | Over 2x      | -                    | some           | -
SAFECode        | 0-30%        | Y                    | some           | Y
CCured          | Up to 1.87x  | Y                    | some           | Y
Cyclone         | 1x-2x        | Y                    | some           | Y
[Row-group labels on the slide: Pure C, Modified C.]

SAFECode Compiler and Run-time System
A typed assembly language (LLVM):
- Language-independent
- Simple, transparent runtime system
Sound analysis and memory safety:
- Heap safety: via Automatic Pool Allocation + run-time checks
- Stack safety: via Data Structure Analysis (DSA) + heap conversion
- Array safety: via pool checks or precise array bounds checks
Initially, for “type-safe” C, with restricted pointer casts [TECS 2005]
Now, for nearly arbitrary, unmodified C programs [PLDI 2006]

Guaranteeing Static Analysis
Many program verification tools build on alias analysis, call graph, assumed type information:
- E.g., SLAM, ESP, BLAST
Memory errors can invalidate these analyses.
Detecting all memory errors is expensive:
- Dangling pointer errors
- Precise array bounds errors
Solution: enforce key analyses in the presence of some memory errors: alias analysis, call graph, type information.

What is Alias Analysis
A static summary of memory objects and their connectivity
    int P[4];
    P[i] = ….
    struct List *Q = (struct List *)P;
    Q->val = …
    struct List* head = makeList(20);
[Figure: P and Q point to a TU node (flags S,A; field0); head points to a "struct List (TK)" node (flag H) with fields next and val. TK: Type Known, TU: Type Unknown.]

Memory errors invalidate alias analysis
    int B[4];
    struct List tail, head;
    head.field1 = &tail;
    Tmp = (struct List*)B;
    Tmp->field6 = ..   // could corrupt head.field1
[Figure: points-to graph before and after the error. Node B (int; flags S,A); struct List nodes for head and tail (Field1); Tmp pointing to a TU node marked "??".]
- head.field1 could point anywhere in memory: pointer analysis incorrect
- head.field1 could corrupt memory of another TK node

Enforcing Alias Analysis
Problem 1:
- Must ensure that tmp points to an object in this points-to set
With normal allocation:
- Objects are scattered in memory
- Checking set membership at run-time is extremely expensive
Insight 1: Automatic Pool Allocation partitions the heap corresponding to nodes in the graph. These partitions are compact and can be checked efficiently!
Caveat: currently only flow-insensitive, unification-based
[Figure: tmp pointing to a TU node (flags S,A; field0), next to a "struct List (TK)" node (flag H) with fields next and val.]

Enforcing Alias Analysis
Problem 2:
- Checking every pointer access or initialization is still very expensive
Insight 2: Ignoring memory errors, any pointer obtained from a TK pool already has correct aliasing behavior.
Pointers obtained from other pools will be explicitly checked: Poolcheck(PP, p, align)
- Mask lower k bits of p, look in hash table of page addresses in PP
- Alignment check if array references in TK pool
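The check described here, masking the low k bits of p and probing a hash table of the pool's page addresses, can be sketched as follows. The details are assumptions made for illustration: 4 KiB pages (k = 12), a fixed-size open-addressing table, the names `PagePool` and `pool_add_page`, and an alignment test that presumes objects are packed from the start of each page at multiples of `align`. The real SAFECode runtime may differ on all of these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12  /* k: assume 4 KiB pool pages */
#define PAGE_MASK (((uintptr_t)1 << PAGE_BITS) - 1)
#define TABLE_SLOTS 64  /* power of two */

/* Hash set of the page addresses owned by one pool; 0 marks an empty slot. */
typedef struct { uintptr_t pages[TABLE_SLOTS]; } PagePool;

static size_t slot_for(uintptr_t page) {
  return (size_t)(page >> PAGE_BITS) & (TABLE_SLOTS - 1);
}

/* Record that `page` (a page-aligned, nonzero address) belongs to the pool. */
int pool_add_page(PagePool *PP, uintptr_t page) {
  assert(page != 0 && (page & PAGE_MASK) == 0);
  size_t s = slot_for(page);
  for (size_t n = 0; n < TABLE_SLOTS; n++, s = (s + 1) & (TABLE_SLOTS - 1)) {
    if (PP->pages[s] == page) return 1;
    if (PP->pages[s] == 0) { PP->pages[s] = page; return 1; }
  }
  return 0;  /* table full */
}

/* Returns 1 if p lies in a page of pool PP and, when align != 0, if its
   offset within the page is a multiple of align. */
int poolcheck(const PagePool *PP, uintptr_t p, size_t align) {
  uintptr_t page = p & ~PAGE_MASK;  /* mask the lower k bits */
  if (page == 0) return 0;
  if (align != 0 && (p & PAGE_MASK) % align != 0) return 0;
  size_t s = slot_for(page);
  for (size_t n = 0; n < TABLE_SLOTS; n++, s = (s + 1) & (TABLE_SLOTS - 1)) {
    if (PP->pages[s] == page) return 1;
    if (PP->pages[s] == 0) return 0;  /* empty slot ends the probe */
  }
  return 0;
}
```

The cost is a mask, a hash, and usually one probe, which is why membership in a compact pool is cheap to test while membership in a set of objects scattered by malloc is not.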

Tolerating Dangling Pointers
Problem 3:
- But memory errors (dangling pointer errors, array bounds violations) could corrupt locations in TK pools
Insight 3 (also used for “type-safe” C w/o GC): Reallocating a freed block to a new request of the same type cannot cause any type violation or (in the same pool) aliasing violation, despite dangling pointers.
Only array references in TK pools must be checked (can optimize): Poolcheck(PP, p, align).

Evaluation of Run-time Overhead
- Programs: Olden, Ptrdist, 3 system daemons
- No source changes necessary
- Compared Olden with CCured
[Table: columns Program, SAFECode ratio, CCured ratio; rows bh, bisort, em3d, treeadd, tsp, yacr, ftpd, fingerd, Max.]

Summary

What Could You Do With Pool Allocation?
Embedded Systems:
- Pointer compression, data compression for embedded codes
- Data partitioning for explicit local memories / buffers / tiles
- Power savings for dead / dormant pools
Dependable Systems:
- Efficient checkpointing by ignoring unmodified pools
- Efficient replicated execution for servers
- Focusing instrumentation for program testing
High Performance Systems:
- Data-structure-centric profiling
- Linked pointer prefetching
- …

Summary
- Automatic Pool Allocation: gives compilers information about data structure layouts, lifetimes, points-to information
- SAFECode: a sound execution strategy for C, C++ programs: enable sound analysis, enforce memory safety
llvm.cs.uiuc.edu