Automatic Pool Allocation for Disjoint Data Structures Presented by: Chris Lattner Joint work with: Vikram Adve ACM.


Automatic Pool Allocation for Disjoint Data Structures
Presented by: Chris Lattner
Joint work with: Vikram Adve
ACM SIGPLAN Workshop on Memory System Performance (MSP 2002), June 16,

Slide #2 The Problem
Memory system performance is important!
– Fast CPU, slow memory, not enough cache
“Data structures” are bad for compilers
– Traditional scalar optimizations are not enough
– Memory traffic is the main bottleneck for many apps
Fine-grain approaches have limited gains:
– Prefetching recursive structures is hard
– Transforming individual nodes gives limited gains

Slide #3 Our Approach: Fully Automatic Pool Allocation
Disjoint Logical Data Structure Analysis
– Identify the data structures used by the program
Automatic Pool Allocation
– Convert data structures into a form that is easily analyzable
High-Level Data Structure Optimizations!
– Analyze and transform entire data structures
– Use a macroscopic approach for the biggest gains
– Handle arbitrarily complex data structures: lists, trees, hash tables, ASTs, etc.

Slide #4 Talk Overview
› Problems, approach
› Data Structure Analysis
› Fully Automatic Pool Allocation
› Potential Applications of Pool Allocation

Slide #5 LLVM Infrastructure
Strategy for Link-Time/Run-Time Optimization
Low Level Representation with High Level Types
Code retained in LLVM form until final link
[Figure: Static Compilers 1 to N (C, C++, Java, Fortran) emit LLVM; the Linker, IP Optimizer, and Codegen combine it with Libraries to produce LLVM or machine code, with a Runtime Optimizer operating on the result.]

Slide #6 Logical Data Structure Analysis
Identify disjoint logical data structures
– Entire lists, trees, heaps, graphs, hash tables, ...
Capture the data structure graph concisely
Context-sensitive, flow-insensitive analysis
– Related to heap shape analysis and pointer analysis
– Very fast: only one visit per call site

Slide #7 Data Structure Graph
Each node represents a memory object
– malloc(), alloca(), and globals
– Each node contains a set of fields
Edges represent “may point to” sets
– Edges point from fields to fields
Scalar nodes (lighter boxes)
– Track points-to for scalar pointers
– We completely ignore non-pointer scalars
[Figure: example graph with scalar node reg107 and heap nodes new root, new branch, new leaf, and new lateral.]
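To make the node/field/edge vocabulary concrete, here is a minimal C sketch of one possible in-memory representation, assuming a small fixed number of pointer fields per node and a bounded may-point-to set per field. The names `DSNode`, `DSLink`, `ds_add_edge`, `build_list_graph` and the size limits are hypothetical illustrations, not the representation used in the authors' implementation. The example builds the graph for a `List` whose `next` field points back to `List` nodes and whose `data` field points to `Patient` nodes.

```c
#include <assert.h>

/* Hypothetical representation: every node (memory object) has a few
   pointer fields, and each field has its own may-point-to set. */
#define MAX_FIELDS  4
#define MAX_TARGETS 4

typedef enum { NODE_HEAP, NODE_STACK, NODE_GLOBAL, NODE_SCALAR } NodeKind;

struct DSNode;

/* One edge target: a particular field of a particular node. */
typedef struct {
    struct DSNode *node;
    int field;
} DSLink;

typedef struct DSNode {
    const char *name;
    NodeKind kind;       /* malloc() / alloca() / global object, or scalar pointer */
    int nfields;         /* only pointer fields are tracked */
    int ntargets[MAX_FIELDS];
    DSLink targets[MAX_FIELDS][MAX_TARGETS];
} DSNode;

enum { LIST_DATA = 0, LIST_NEXT = 1 };   /* pointer fields of struct List */

static void ds_init(DSNode *n, const char *name, NodeKind kind, int nfields) {
    assert(nfields > 0 && nfields <= MAX_FIELDS);
    n->name = name;
    n->kind = kind;
    n->nfields = nfields;
    for (int f = 0; f < MAX_FIELDS; ++f)
        n->ntargets[f] = 0;
}

/* Record "from.field may point to to.tofield"; returns 1 if the set grew. */
static int ds_add_edge(DSNode *from, int field, DSNode *to, int tofield) {
    assert(field >= 0 && field < from->nfields);
    for (int i = 0; i < from->ntargets[field]; ++i)
        if (from->targets[field][i].node == to && from->targets[field][i].field == tofield)
            return 0;
    assert(from->ntargets[field] < MAX_TARGETS);
    from->targets[field][from->ntargets[field]++] = (DSLink){ to, tofield };
    return 1;
}

static int ds_may_point_to(const DSNode *from, int field, const DSNode *to) {
    for (int i = 0; i < from->ntargets[field]; ++i)
        if (from->targets[field][i].node == to)
            return 1;
    return 0;
}

/* Graph for a list: scalar 'list' -> List; List.next -> List; List.data -> Patient. */
static void build_list_graph(DSNode *list, DSNode *lst, DSNode *pat) {
    ds_init(list, "list", NODE_SCALAR, 1);
    ds_init(lst, "new List", NODE_HEAP, 2);
    ds_init(pat, "new Patient", NODE_HEAP, 1);   /* placeholder field; Patient's layout is not modeled */
    ds_add_edge(list, 0, lst, 0);
    ds_add_edge(lst, LIST_NEXT, lst, 0);
    ds_add_edge(lst, LIST_DATA, pat, 0);
}
```

Keeping one set per field (rather than one per node) is what lets the graph distinguish "next points to more List nodes" from "data points to Patient nodes".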

Slide #8 Analysis Overview
Intraprocedural Analysis (separable)
– Initial pass over the function: creates nodes in the graph
– Worklist processing phase: adds edges to the graph
Interprocedural Analysis
– Resolve “call” nodes to a cloned copy of the invoked function’s graph

Slide #9 Intraprocedural Analysis
[Figure: the graph built for addList: scalar nodes for list, b, nlist, and data; a shadow List node and a new List node, each with next and data fields; and a shadow Patient node.]

    #include <stdlib.h>

    typedef struct Patient Patient;
    typedef struct List { Patient *data; struct List *next; } List;

    void addList(List *list, Patient *data) {
      List *b = NULL, *nlist;
      while (list != NULL) {
        b = list;
        list = list->next;
      }
      nlist = malloc(sizeof(List));
      nlist->data = data;
      nlist->next = NULL;
      b->next = nlist;   /* assumes list is non-NULL on entry, so b is set */
    }
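The two phases of the intraprocedural analysis can be illustrated on `addList` with the C sketch below. It is a simplified stand-in, not the authors' algorithm: it computes field-sensitive, flow-insensitive points-to sets as bitsets, and it replaces the worklist with a loop that re-applies every statement until nothing changes. The fixed enumeration of objects plays the role of the node-creation pass, and `solve` adds the edges. All names (`Stmt`, `PointsTo`, `solve`, `analyze_addList`) are hypothetical, and the incoming `list` and `data` arguments are modeled as pointing to shadow nodes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Memory objects (graph nodes) for addList; the shadow nodes stand for
   whatever the caller passed in through 'list' and 'data'. */
enum { OBJ_SHADOW_LIST, OBJ_NEW_LIST, OBJ_SHADOW_PATIENT, NUM_OBJS };
/* Scalar pointer variables (scalar nodes). */
enum { V_LIST, V_DATA, V_B, V_NLIST, NUM_VARS };
/* Pointer fields of struct List. */
enum { F_DATA, F_NEXT, NUM_FIELDS };

typedef uint32_t PtsSet;   /* bit o set => may point to object o */

/* ADDR: dst = &object(src)   COPY: dst = src
   LOAD: dst = src->field     STORE: dst->field = src */
typedef enum { ADDR, COPY, LOAD, STORE } StmtKind;
typedef struct { StmtKind kind; int dst, src, field; } Stmt;

typedef struct {
    PtsSet var[NUM_VARS];                /* edges out of scalar nodes */
    PtsSet field[NUM_OBJS][NUM_FIELDS];  /* edges out of object fields */
} PointsTo;

/* Union 'from' into '*into'; returns 1 if the set grew. */
static int merge(PtsSet *into, PtsSet from) {
    PtsSet old = *into;
    *into |= from;
    return *into != old;
}

/* Flow-insensitive solving: apply every statement, in any order,
   until no set changes. */
static void solve(const Stmt *s, int n, PointsTo *g) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; ++i) {
            const Stmt *st = &s[i];
            assert(st->field >= 0 && st->field < NUM_FIELDS);
            switch (st->kind) {
            case ADDR:
                changed |= merge(&g->var[st->dst], (PtsSet)1u << st->src);
                break;
            case COPY:
                changed |= merge(&g->var[st->dst], g->var[st->src]);
                break;
            case LOAD:
                for (int o = 0; o < NUM_OBJS; ++o)
                    if (g->var[st->src] & ((PtsSet)1u << o))
                        changed |= merge(&g->var[st->dst], g->field[o][st->field]);
                break;
            case STORE:
                for (int o = 0; o < NUM_OBJS; ++o)
                    if (g->var[st->dst] & ((PtsSet)1u << o))
                        changed |= merge(&g->field[o][st->field], g->var[st->src]);
                break;
            }
        }
    }
}

/* The pointer statements of addList; "= NULL" creates no edges. */
static void analyze_addList(PointsTo *g) {
    static const Stmt body[] = {
        { ADDR,  V_LIST,  OBJ_SHADOW_LIST,    0 },       /* incoming 'list' argument */
        { ADDR,  V_DATA,  OBJ_SHADOW_PATIENT, 0 },       /* incoming 'data' argument */
        { COPY,  V_B,     V_LIST,             0 },       /* b = list */
        { LOAD,  V_LIST,  V_LIST,             F_NEXT },  /* list = list->next */
        { ADDR,  V_NLIST, OBJ_NEW_LIST,       0 },       /* nlist = malloc(...) */
        { STORE, V_NLIST, V_DATA,             F_DATA },  /* nlist->data = data */
        { STORE, V_B,     V_NLIST,            F_NEXT },  /* b->next = nlist */
    };
    memset(g, 0, sizeof *g);
    solve(body, (int)(sizeof body / sizeof body[0]), g);
}
```

Because the statements are treated as an unordered set, the result covers every order in which they could run: `list` and `b` may point to either the shadow List or the new List, and the new List's `next` field may point to itself.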

Slide #10 Interprocedural Closure
[Figure: step-by-step interprocedural closure for ProcessLists: each call node for addList is resolved by cloning addList's graph, so L1/tmp1 and L2/tmp2 end up with separate new List and new Patient nodes.]

    void addList(List *list, Patient *data);

    void ProcessLists(int N) {
      List *L1 = calloc(1, sizeof(List));
      List *L2 = calloc(1, sizeof(List));
      /* populate lists */
      for (int i = 0; i != N; ++i) {
        Patient *tmp1 = malloc(sizeof(Patient));
        addList(L1, tmp1);
        Patient *tmp2 = malloc(sizeof(Patient));
        addList(L2, tmp2);
      }
    }
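The bottom-up closure step can be sketched as graph cloning: at each call site, copy the callee's graph into the caller, bind the callee's shadow (argument) nodes to the caller's actual nodes, and give every node allocated inside the callee a fresh copy. The C sketch below is a hypothetical illustration (the names `Graph`, `inline_call`, and `build_addList_graph` are not from the slides) and assumes the callee's shadow nodes are its first `nargs` nodes. Applied to the two `addList` calls in `ProcessLists`, it yields one `new List` clone per call site, which is what keeps the two lists disjoint.

```c
#include <assert.h>
#include <string.h>

#define MAX_NODES 32
#define MAX_EDGES 64
enum { F_DATA, F_NEXT };

typedef struct { int from, field, to; } Edge;   /* from.field -> to */

typedef struct {
    int nnodes, nedges;
    const char *name[MAX_NODES];
    Edge edge[MAX_EDGES];
} Graph;

static void graph_init(Graph *g) { memset(g, 0, sizeof *g); }

static int add_node(Graph *g, const char *name) {
    assert(g->nnodes < MAX_NODES);
    g->name[g->nnodes] = name;
    return g->nnodes++;
}

static int has_edge(const Graph *g, int from, int field, int to) {
    for (int i = 0; i < g->nedges; ++i)
        if (g->edge[i].from == from && g->edge[i].field == field && g->edge[i].to == to)
            return 1;
    return 0;
}

static void add_edge(Graph *g, int from, int field, int to) {
    if (has_edge(g, from, field, to))
        return;
    assert(g->nedges < MAX_EDGES);
    g->edge[g->nedges++] = (Edge){ from, field, to };
}

/* addList's graph; nodes 0 and 1 are shadow nodes for its two arguments. */
static void build_addList_graph(Graph *callee) {
    graph_init(callee);
    int shadow_list = add_node(callee, "shadow List");
    int shadow_patient = add_node(callee, "shadow Patient");
    int new_list = add_node(callee, "new List");
    add_edge(callee, shadow_list, F_NEXT, new_list);
    add_edge(callee, new_list, F_NEXT, new_list);
    add_edge(callee, new_list, F_DATA, shadow_patient);
}

/* Inline a copy of 'callee' at one call site: shadow node i maps to
   actuals[i]; every other callee node gets a fresh clone in the caller. */
static void inline_call(Graph *caller, const Graph *callee, const int *actuals, int nargs) {
    int map[MAX_NODES];
    assert(nargs <= callee->nnodes);
    for (int v = 0; v < callee->nnodes; ++v)
        map[v] = (v < nargs) ? actuals[v] : add_node(caller, callee->name[v]);
    for (int e = 0; e < callee->nedges; ++e)
        add_edge(caller, map[callee->edge[e].from], callee->edge[e].field,
                 map[callee->edge[e].to]);
}
```

Each call site is visited once, and because callee-allocated nodes are cloned rather than shared, nothing links the L1 structure to the L2 structure.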

Slide #11 Important Analysis Properties
Intraprocedural Algorithm
– Only executed once per function
– Flow insensitive
Interprocedural
– Only one visit per call site
– Resolve calls from bottom up
– Inlines a copy of the called function’s graph
Overall
– Efficient algorithm to identify disjoint data structures
– Graphs are very compact in practice

Slide #12 Talk Overview
› Problems, approach
› Data Structure Analysis
› Fully Automatic Pool Allocation
› Potential Applications of Pool Allocation

Slide #13 Automatic Pool Allocation
Pool allocation is often applied manually
– … but never fully automatically for imperative programs that use malloc & free
We use a data-structure-driven approach
Pool allocation accuracy is important
– Accurate pool allocation enables aggressive transformations
– Heuristic-based approaches are not sufficient

Slide #14 Pool Allocation Strategy
We have already identified logical DS’s
– Allocate each node to a different pool
– Disjoint data structures use distinct pools
Pool allocate a data structure when it is safe to do so:
– All nodes of the data structure subgraph are allocations
– We can identify a function F whose lifetime contains the DS
   (escape analysis for the entire data structure)
Pool allocate the data structure into F!
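The safety condition can be read as a reachability test on the graph of a candidate function F. The C sketch below is a hypothetical illustration under stated assumptions: nodes flagged `escape_root` are globals or nodes visible to F's callers (through arguments or the return value), and `is_alloc` marks malloc/calloc nodes. `FuncGraph` and `can_pool_allocate_in_F` are invented names, and the sketch only checks one candidate F; choosing which F to use is not shown.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

typedef struct {
    int n;
    bool is_alloc[MAX_NODES];     /* node comes from malloc/calloc */
    bool escape_root[MAX_NODES];  /* global, or visible to F's callers */
    bool edge[MAX_NODES][MAX_NODES];
} FuncGraph;

/* Depth-first marking of everything reachable from v. */
static void mark_reachable(const FuncGraph *g, int v, bool *seen) {
    if (seen[v])
        return;
    seen[v] = true;
    for (int w = 0; w < g->n; ++w)
        if (g->edge[v][w])
            mark_reachable(g, w, seen);
}

/* The structure reachable from 'root' may be pool-allocated in F if every
   node in it is an allocation and none of it is reachable from outside F. */
static bool can_pool_allocate_in_F(const FuncGraph *g, int root) {
    bool escaping[MAX_NODES] = { false };
    bool in_ds[MAX_NODES] = { false };
    for (int v = 0; v < g->n; ++v)
        if (g->escape_root[v])
            mark_reachable(g, v, escaping);
    mark_reachable(g, root, in_ds);
    for (int v = 0; v < g->n; ++v)
        if (in_ds[v] && (!g->is_alloc[v] || escaping[v]))
            return false;
    return true;
}
```

For `ProcessLists`, the list and its patients are reachable only from locals, so the test succeeds; a single global pointing at the list would make the whole structure escape.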

Slide #15 Pool Allocation Transformation
[Figure: graph for ProcessLists: L1 → new List (fields next, data) → new Patient; tmp → new Patient.]
Original code (L1 is contained by ProcessLists!):

    void ProcessLists(unsigned N) {
      List *L1 = malloc(sizeof(List));
      for (unsigned i = 0; i != N; ++i) {
        Patient *tmp = malloc(sizeof(Patient));
        addList(L1, tmp);
      }
    }

After pool allocation:

    void ProcessLists(unsigned N) {
      PoolDescriptor_t L1Pool, PPool;        /* allocate pool descriptors */
      poolinit(&L1Pool, sizeof(List));       /* initialize memory pools */
      poolinit(&PPool, sizeof(Patient));
      List *L1 = poolalloc(&L1Pool);         /* transform function body */
      for (unsigned i = 0; i != N; ++i) {
        Patient *tmp = poolalloc(&PPool);
        pa_addList(L1, tmp, &L1Pool);        /* transform called function */
      }
      pooldestroy(&PPool);                   /* destroy pools on exit */
      pooldestroy(&L1Pool);
    }
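To make the slide's transformed code runnable, the C sketch below supplies a minimal pool runtime. Only the names `PoolDescriptor_t`, `poolinit`, `poolalloc`, `pooldestroy`, and `pa_addList` come from the slide; their bodies, the bump-pointer slab design, the slab size of 64 objects, and the contents of `Patient` are assumptions, not the authors' runtime. Here `ProcessLists` returns the list length only so the sketch can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A slab holds per_slab objects of one size. Assumes mem ends up suitably
   aligned after the header, as it does on common 64-bit targets. */
typedef struct Slab {
    struct Slab *next;
    size_t used;              /* objects handed out from this slab */
    unsigned char mem[];      /* per_slab * elem_size bytes */
} Slab;

typedef struct {
    size_t elem_size;         /* every object in a pool has the same type */
    size_t per_slab;
    Slab *slabs;              /* newest slab first */
} PoolDescriptor_t;

void poolinit(PoolDescriptor_t *pd, size_t elem_size) {
    pd->elem_size = elem_size;
    pd->per_slab = 64;
    pd->slabs = NULL;
}

void *poolalloc(PoolDescriptor_t *pd) {
    Slab *s = pd->slabs;
    if (s == NULL || s->used == pd->per_slab) {
        s = malloc(sizeof(Slab) + pd->per_slab * pd->elem_size);
        if (s == NULL)
            return NULL;
        s->next = pd->slabs;
        s->used = 0;
        pd->slabs = s;
    }
    return s->mem + pd->elem_size * s->used++;   /* bump allocation */
}

/* Frees every object in the pool at once. */
void pooldestroy(PoolDescriptor_t *pd) {
    while (pd->slabs != NULL) {
        Slab *next = pd->slabs->next;
        free(pd->slabs);
        pd->slabs = next;
    }
}

typedef struct Patient { int id; } Patient;   /* hypothetical contents */
typedef struct List { Patient *data; struct List *next; } List;

/* Transformed callee: new list nodes come from the pool passed in. */
void pa_addList(List *list, Patient *data, PoolDescriptor_t *ListPool) {
    List *b = NULL;
    while (list != NULL) {
        b = list;
        list = list->next;
    }
    List *nlist = poolalloc(ListPool);
    assert(nlist != NULL && b != NULL);   /* callers pass a non-NULL header node */
    nlist->data = data;
    nlist->next = NULL;
    b->next = nlist;
}

unsigned ProcessLists(unsigned N) {
    PoolDescriptor_t L1Pool, PPool;             /* allocate pool descriptors */
    poolinit(&L1Pool, sizeof(List));            /* initialize memory pools */
    poolinit(&PPool, sizeof(Patient));
    List *L1 = poolalloc(&L1Pool);              /* list header node */
    assert(L1 != NULL);
    L1->data = NULL;
    L1->next = NULL;
    for (unsigned i = 0; i != N; ++i) {
        Patient *tmp = poolalloc(&PPool);
        assert(tmp != NULL);
        tmp->id = (int)i;
        pa_addList(L1, tmp, &L1Pool);
    }
    unsigned len = 0;                           /* for checking only */
    for (const List *p = L1->next; p != NULL; p = p->next)
        ++len;
    pooldestroy(&PPool);                        /* destroy pools on exit */
    pooldestroy(&L1Pool);
    return len;
}
```

Because each pool holds objects of a single type and hands them out consecutively, successive list nodes end up adjacent in memory, and `pooldestroy` releases the whole structure without walking it.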

Slide #16 Pool Allocation Properties
Each node gets a separate pool
– Each pool has homogeneous objects
– Good for locality and analysis of the pool
Related pool descriptors are linked
– “Isomorphic” to the data structure graph
   (actually contains a superset of its edges)
Disjoint Data Structures
– Each has a separate set of pools
– e.g. two disjoint lists in two distinct pools
[Figure: the example data structure graph (reg107, new root, new branch, new leaf, new lateral) with its nodes assigned to pools P1, P2, P3, and P4.]

Slide #17 Preliminary Results
Pool allocation works for most Olden benchmarks
– Most build only a single large data structure
Analysis fails for some benchmarks
– Not type-safe: e.g. “mst” uses a void* hash table
– Work in progress to enhance the LLVM type system

Slide #18 Talk Overview
› Problems, approach
› Data Structure Analysis
› Fully Automatic Pool Allocation
› Potential Applications of Pool Allocation

Slide #19 Applications of Pool Allocation
Pool allocation enables novel transformations:
– Pointer Compression (briefly described next)
– New prefetching schemes:
   Allocation order prefetching for free
   History prefetching using compressed pointers
– More aggressive structure reordering, splitting, …
– Transparent garbage collection
Critical feature: accurate pool allocation provides important information at both compile time and runtime!
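Allocation-order prefetching can be sketched as follows, assuming a pool that stores nodes contiguously in allocation order: while visiting the node in slot i, prefetch slot i + DIST, betting that the traversal order follows the allocation order. `NodePool`, `DIST`, and `sum_with_alloc_order_prefetch` are hypothetical names, and this is an illustration of the idea rather than the scheme the authors built. `__builtin_prefetch` is a GCC/Clang builtin that only hints the hardware; on other compilers the code falls back to a plain traversal.

```c
#include <assert.h>
#include <stddef.h>

#define DIST 4   /* hypothetical prefetch distance, in pool slots */

typedef struct Node { long value; struct Node *next; } Node;

/* Hypothetical pool: nodes stored contiguously in allocation order. */
typedef struct { const Node *slots; size_t used; } NodePool;

static long sum_with_alloc_order_prefetch(const NodePool *pool, const Node *head) {
    long sum = 0;
    for (const Node *n = head; n != NULL; n = n->next) {
        size_t i = (size_t)(n - pool->slots);   /* slot index of n in its pool */
        assert(i < pool->used);
#if defined(__GNUC__) || defined(__clang__)
        /* Guess that the node allocated DIST slots later is visited soon. */
        if (i + DIST < pool->used)
            __builtin_prefetch(&pool->slots[i + DIST]);
#endif
        sum += n->value;
    }
    return sum;
}
```

No extra per-node state is needed: the pool layout itself supplies the prediction, and a wrong guess only wastes a prefetch.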

Slide #20 Pointer Compression
Pointers are large and very sparse
– They consume cache space & memory bandwidth
How does pool allocation help?
– Pool indices are denser than node pointers!
Replace 64-bit pointer fields with 16- or 32-bit indices
– Identify all external pointers to the data structure
– Find all data structure nodes at runtime
   If overflow is detected at runtime, rewrite the pool
   Grow indices as required: 16 → 32 → 64 bit
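A minimal C sketch of the idea, assuming a pool that stores all list nodes in one array: the `next` field holds a 32-bit slot index, with `UINT32_MAX` as the null value. The type and function names are hypothetical, and the runtime overflow check, pool rewrite, and 16 → 32 → 64-bit index growth described on the slide are not shown (the sketch simply asserts that the pool has room).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NIL UINT32_MAX   /* index value playing the role of NULL */

/* Uncompressed node: a full machine pointer per link (16 bytes on LP64). */
typedef struct WideNode { int32_t value; struct WideNode *next; } WideNode;

/* Compressed node: the link is a 32-bit index into the node's pool (8 bytes). */
typedef struct { int32_t value; uint32_t next; } PackedNode;

typedef struct {
    PackedNode *slots;   /* all nodes of one data structure */
    uint32_t used, cap;
} PackedPool;

static int pool_init(PackedPool *p, uint32_t cap) {
    p->slots = malloc((size_t)cap * sizeof *p->slots);
    p->used = 0;
    p->cap = cap;
    return p->slots != NULL;
}

/* Returns the new node's index. A real implementation would grow the pool
   (and widen indices on overflow) instead of asserting. */
static uint32_t pool_alloc(PackedPool *p, int32_t value, uint32_t next) {
    assert(p->used < p->cap);
    p->slots[p->used] = (PackedNode){ value, next };
    return p->used++;
}

/* Following a link is base + index instead of a pointer load. */
static int64_t sum_list(const PackedPool *p, uint32_t head) {
    int64_t sum = 0;
    for (uint32_t i = head; i != NIL; i = p->slots[i].next)
        sum += p->slots[i].value;
    return sum;
}
```

On a typical LP64 target this halves the node size (16 bytes for `WideNode`, 8 for `PackedNode`), which is the cache-space and bandwidth saving the slide describes.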

Slide #21 Contributions
Disjoint logical data structure analysis
Fully Automatic Pool Allocation
Macroscopic Data Structure Transformations