Generational Stack Collection and Profile-Driven Pretenuring. Perry Cheng, Robert Harper, Peter Lee. Presented by Moti Alperovitch.

The problem: Some data die young and some die old. In recursive programs, the deep part of the stack is unwound very infrequently. Scanning unchanged roots can take a dominant share of collection time.

We compare the following collectors: Semispace copying collection (Cheney). Generational collection. Generational collection with stack markers. Pretenuring with stack markers.

Semispace copying collection: Scan the stack for roots and copy the data reachable from the roots to an unused area (Nursery, Survive). Disadvantage: – All live data is copied at every collection, even though some data die young and some die old.
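
The copying step can be illustrated with a minimal Python sketch of Cheney-style collection. It is not the TIL collector: the heap model (dicts with a "fields" list) and the names cheney_collect, to_space and forward are assumptions made for illustration only.

```python
def cheney_collect(roots):
    """Copy everything reachable from roots into a fresh to-space.

    Hypothetical heap model: an object is a dict {"fields": [...]} whose
    fields are either ints (non-pointers) or references to other objects.
    """
    to_space = []   # the unused area that receives the survivors
    forward = {}    # id(old object) -> its copy (forwarding pointer)

    def copy(value):
        if not isinstance(value, dict):      # non-pointer: keep unchanged
            return value
        if id(value) not in forward:         # first visit: copy the object
            clone = {"fields": list(value["fields"])}
            forward[id(value)] = clone
            to_space.append(clone)
        return forward[id(value)]

    new_roots = [copy(r) for r in roots]
    scan = 0                                 # Cheney's scan pointer
    while scan < len(to_space):              # to-space doubles as the work queue
        obj = to_space[scan]
        obj["fields"] = [copy(f) for f in obj["fields"]]
        scan += 1
    return new_roots, to_space
```

Every reachable object is copied exactly once per collection, whatever its age, which is the disadvantage named on the slide. Unreachable objects are never visited.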

Generational collection: Based on semispace copying collection. Heap areas are arranged according to object lifetime. Disadvantages: – For programs with deep call chains, stack scanning can take a lot of time. – Long-lived objects are typically copied several times before they are tenured.

Generational stack collection: Use stack markers to cache the result of the root scan. Disadvantage: – Long-lived objects are typically copied several times before they are tenured.

Pretenuring: Perform a profiling run to build a lifetime profile for objects according to their allocation site.

TIL compiler: An optimizing compiler for ML (SML). Intensional polymorphism. Nearly tag-free garbage collection. Conventional functional-language optimizations. Loop optimizations.

Stack scanning: At any execution point, data is live if it will be accessed as the program continues to execute. The collector must retain all data that is reachable by following pointers from the roots. The roots are the registers and stack slots.

Difficulties: Determining the root set accurately. With callee-save registers, the contents of a register or stack slot can come from a caller's frame, so stack frames cannot be decoded in isolation. With polymorphism, the compiler cannot always determine statically whether a value is a pointer or not.

Finding the roots: When the GC is called from the mutator, the return address (RA) indicates the current execution point. Using the RA as a key into a table, we can determine the layout of the frame that called the GC. Continuing in this way frame by frame, we can find the roots.

Finding the roots: Determine the root set starting from the initial frame, by scanning downwards. The stack must be scanned in both directions because the type of some stack slots depends on an earlier stack slot.

Trace table information: The return address (RA). The stack frame size. For each stack slot we record its trace: – Pointer: the compiler determined statically that the value is a pointer. – Non Pointer: the value is not a root. – Callee-save (plus a register): the slot holds a saved callee-save register, so its status comes from the caller.

Trace table information - 2: – Compute: the compiler could not determine statically whether the value is a pointer. The entry carries additional information that says where the type of the value resides.
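
The two-direction scan over the trace table can be sketched in Python as follows. This is a simplified model, not TIL's actual table format: the flat word-array stack, the convention that slot 0 of a frame stores the return address into its caller, the "regs" field, and the "boxed" runtime tag used for Compute slots are all assumptions made for illustration.

```python
def find_roots(stack, gc_ra, table):
    """Return the indices of the stack words that are roots.

    stack : flat list of words, oldest frame first (hypothetical layout)
    gc_ra : return address of the call into the collector
    table : RA -> {"size": int, "slots": [trace, ...], "regs": {reg: trace}}
    A trace is "pointer", "nonpointer", ("callee", reg) or ("compute", k).
    """
    # Pass 1, newest to oldest: each RA selects the layout of one frame.
    frames, ra, top = [], gc_ra, len(stack)
    while top > 0:
        entry = table[ra]
        base = top - entry["size"]
        frames.append((base, entry))
        ra, top = stack[base], base   # slot 0 holds the RA into the caller

    # Pass 2, oldest to newest: callee-save status flows from caller to callee.
    reg_is_ptr, roots = {}, []
    for base, entry in reversed(frames):
        for i, trace in enumerate(entry["slots"]):
            if trace == "pointer":
                is_ptr = True
            elif trace == "nonpointer":
                is_ptr = False
            elif trace[0] == "callee":
                # Saved register: a pointer only if the caller left one there.
                is_ptr = reg_is_ptr.get(trace[1], False)
            else:
                # ("compute", k): slot k of this frame holds a runtime type tag.
                is_ptr = stack[base + trace[1]] == "boxed"
            if is_ptr:
                roots.append(base + i)
        # Record what this frame leaves in each register at its call site.
        for reg, trace in entry["regs"].items():
            if trace != "callee":
                reg_is_ptr[reg] = (trace == "pointer")
    return roots
```

In this model a slot traced as ("callee", "$10") cannot be classified from its own frame: the answer depends on what an older frame left in register $10, which is why the second pass runs from the oldest frame upwards.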

Stack frames and the corresponding table entry. [Figure: a six-slot stack frame with its return address, shown beside its trace table entry: RA = 0x2001c718, frame size = 6, one trace per slot (for example Non Pointer, Pointer, Compute: Stack 4), followed by trace information on registers.]

Semispace versus generational collection

Stack marking: When the stack is deep, scanning the roots can take a dominant share of GC time. Most of the stack usually does not change between the previous GC and the current one. Marking the stack frames that did not change can significantly speed up root scanning.

Marking the stack - 1st method: Add a flag to each stack frame that records whether the frame has changed. The collector resets the flag when it passes the frame, and the mutator sets it. Disadvantages: – The mutator is involved in the GC process. – The compiled code must perform extra operations for the GC on every return, although most of the time the GC is not running.

Marking the stack - 2nd method: When scanning the roots, replace the RA of every n-th stack frame with the address of a special stub function. The stub function keeps a table of the original RAs. When a marked frame returns, the stub notes that the frame was deactivated and then continues to the original RA.

Marking the stack - 2nd method, problems: – Functions do not always return normally. – When an exception is raised, handlers are tried in stack order until one matches, so frames can be popped without running the stub. – Fortunately, we can maintain a value M, updated on every exception, that holds the shallowest stack pointer reached as a result of a raised exception.
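
The interplay between the stub markers and the value M can be simulated with a small Python sketch. It models only the bookkeeping, not real return-address patching: the class MarkedStack, its method names, and the choice to reuse cached roots only for frames strictly older than the deepest valid marker are assumptions made for illustration.

```python
class MarkedStack:
    """Toy stack whose frames are lists of roots, oldest frame first."""

    def __init__(self, every_n):
        self.frames = []
        self.every_n = every_n
        self.markers = set()   # depths whose RA was replaced by the stub
        self.watermark = 0     # M: shallowest stack top reached by an exception
        self.cached = []       # cached[d] = roots of frame d at the last GC
        self.scanned = 0       # number of frames the last GC had to scan

    def call(self, roots):
        self.frames.append(list(roots))

    def ret(self):
        # Normal return: if the frame was marked, the stub runs and
        # records that the frame was deactivated.
        self.markers.discard(len(self.frames) - 1)
        self.frames.pop()

    def raise_to(self, handler_depth):
        # Exception: control jumps to the handler frame and the stubs of the
        # skipped frames never run, so only M records the unwinding.
        del self.frames[handler_depth + 1:]
        self.watermark = min(self.watermark, handler_depth)

    def collect_roots(self):
        # A marker is trustworthy only if no exception unwound past it.
        valid = [d for d in self.markers if d <= self.watermark]
        reuse = max(valid, default=0)   # frames older than this are unchanged
        per_frame = self.cached[:reuse] + [list(f) for f in self.frames[reuse:]]
        self.scanned = len(self.frames) - reuse
        self.cached = per_frame
        # Re-mark every n-th frame and reset M for the next cycle.
        self.markers = set(range(self.every_n, len(self.frames), self.every_n))
        self.watermark = len(self.frames) - 1
        return [r for frame in per_frame for r in frame]
```

Without the watermark check, a marker skipped by an exception would still look intact and the collector would reuse stale cached roots for frames that have since been replaced.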

Stack Marker improvement

Pretenuring: Use profile data to predict the survival rate of an object. We speculate that objects allocated from the same place in the program have similar lifetimes. To check this hypothesis, we divide the program's heap allocations by allocation site.

Pretenuring - 2: The compiler is modified so that each allocation updates a table of allocation sites. During garbage collection the entries are updated. After each collection we scan the allocation area to locate dead objects and update the entries for their allocation sites.

Pretenuring - 3: With this information we can compute statistics on the number, size and average age of the objects created at each allocation site. We report only allocation sites that account for at least 1% of the allocations or 1% of the copied data.
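
The per-site bookkeeping and the 1% reporting cutoff can be sketched in Python as follows. The class SiteProfile, its method names and the byte-based accounting are assumptions made for illustration; the slides do not say how the actual profiler stores its table.

```python
from collections import defaultdict


class SiteProfile:
    """Hypothetical per-allocation-site profile table."""

    def __init__(self):
        self.allocated = defaultdict(int)   # site -> bytes allocated
        self.copied = defaultdict(int)      # site -> bytes copied by the GC

    def on_alloc(self, site, size):
        # Called by the (modified) mutator at each allocation.
        self.allocated[site] += size

    def on_copy(self, site, size):
        # Called by the collector each time it copies an object.
        self.copied[site] += size

    def report(self, cutoff=0.01):
        """Keep sites with at least `cutoff` of allocations or of copied data."""
        total_alloc = sum(self.allocated.values())
        total_copied = sum(self.copied.values())
        if total_alloc == 0:
            return {}
        rows = {}
        for site, alloc in self.allocated.items():
            copied = self.copied.get(site, 0)
            alloc_share = alloc / total_alloc
            copy_share = copied / total_copied if total_copied else 0.0
            if alloc_share >= cutoff or copy_share >= cutoff:
                rows[site] = {
                    "alloc_share": alloc_share,
                    "copy_share": copy_share,
                    # Bytes copied per byte allocated: a rough longevity signal.
                    "copied_per_allocated": copied / alloc if alloc else 0.0,
                }
        return rows
```

A site with a small allocation share can still pass the cutoff through its share of the copied data, which is how a few long-lived sites become visible in the profile.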

The profile results

The results: The results show that 90% of the allocations have a very short lifetime, but % of the copied data comes from 4 sites.

Using the profile data: Objects created at allocation sites with long lifetimes are allocated directly in the older generation. Problem: An object allocated directly in the older generation may hold a reference to an object in the younger generation.

Solutions? Allocate that kind of object in the young generation. – May lead to a lot more copying. Remember the area of the older generation that may hold references to the young generation, and scan it at each minor collection. – Scanning without copying does not take much time.
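
The second solution can be sketched in Python as a minor collection that scans the remembered pretenured area for extra roots without copying it. The Obj class, the "young"/"old" labels and the policy of promoting every survivor immediately are assumptions made for illustration, not details taken from the slides.

```python
class Obj:
    """Hypothetical heap object; every field is a reference to another Obj."""

    def __init__(self, name, gen, fields=()):
        self.name, self.gen, self.fields = name, gen, list(fields)


def minor_collect(stack_roots, pretenured_region):
    """Return the young objects that survive a minor collection."""
    work = [o for o in stack_roots if o.gen == "young"]
    # Scan, but do not copy, the objects pretenured since the last collection:
    # their fields may be the only references to some young objects.
    for old in pretenured_region:
        work.extend(f for f in old.fields if f.gen == "young")

    survivors, seen = [], set()
    while work:
        obj = work.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        survivors.append(obj)
        work.extend(f for f in obj.fields if f.gen == "young")

    for obj in survivors:
        obj.gen = "old"           # assumed policy: promote every survivor
    pretenured_region.clear()     # the area now points only at old objects
    return survivors
```

The work done on the pretenured area is proportional to its size even though none of it is copied.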

Improvement of pretenuring (ms)

Improvement of pretenuring (bytes copied)

Comparison of all the methods

Conclusion for pretenuring: The reduction in GC time is smaller than expected from the reduction in data copied. Because we still have to check for references into the younger generation, GC time remains proportional to the live data (with a smaller constant).

Suggestion to improve the speed: Perform control-flow and data-flow analysis on objects.

Conclusions: The generational collector is about twice as fast in GC time. It also helps because it improves cache locality. For programs that use a deep stack, caching the root data can improve GC time by up to 74%. Profiling the heap can improve speed by 50% in some cases.

The End