On-the-Fly Garbage Collection Using Sliding Views Erez Petrank Technion – Israel Institute of Technology Joint work with Yossi Levanoni, Hezi Azatchi,


On-the-Fly Garbage Collection Using Sliding Views Erez Petrank Technion – Israel Institute of Technology Joint work with Yossi Levanoni, Hezi Azatchi, and Harel Paz

Erez Petrank | GC via Sliding Views
Garbage Collection
The user allocates space dynamically; the garbage collector automatically frees the space when it is “no longer needed”.
Usually “no longer needed” = unreachable by a path of pointers from program local references (roots).
The programmer does not have to decide when to free an object. (No memory leaks, no dereferencing of freed objects.)
Built into Java, C#.

Garbage Collection: Two Classic Approaches
Reference counting [Collins 1960]: keep a reference count for each object, reclaim objects with count 0.
Tracing [McCarthy 1960]: trace reachable objects, reclaim objects not traced.
Traditional wisdom (slide labels): “Good”, “Problematic”.

What (was) Bad about RC?
Does not reclaim cycles.
A heavy overhead on pointer modifications.
Traditional belief: “Cannot be used efficiently with parallel processing”.
(Figure: objects A and B.)

What’s Good about RC?
Reference counting work is proportional to work on creations and modifications. Can tracing deal with tomorrow’s huge heaps?
Reference counting has good locality.
The Challenge:
RC overhead on pointer modification seems too expensive.
RC seems impossible to “parallelize”.

Garbage Collection Today
Today’s advanced environments: multiprocessors + large memories.
Dealing with multiprocessors: single-threaded stop-the-world.

Garbage Collection Today (continued)
Dealing with multiprocessors: concurrent collection and parallel collection.

Terminology (stop the world, parallel, concurrent, …)
(Figure: timelines of program and GC activity for four collector styles: Stop-the-World, Parallel (STW), Concurrent, On-the-Fly.)

Benefits & Costs
Informal pause times (figure labels): 200 ms, 2 ms, 20 ms.
Throughput loss: 10-20%.
(Figure: timelines of program and GC activity for Stop-the-World, Parallel (STW), Concurrent, On-the-Fly.)

This Talk
Introduction: RC and tracing, coping with SMPs.
RC introduction and the parallelization problem.
Main focus: a novel concurrent reference counting algorithm (suitable for Java).
Concurrent made on-the-fly, based on “sliding views”.
Extensions: cycle collection, mark and sweep, generations, age-oriented.
Implementation and measurements on Jikes: extremely short pauses, good throughput.

Basic Reference Counting
Each object has an rc field; new objects get o.rc := 1.
When p, which points to o1, is modified to point to o2, execute: o2.rc++, o1.rc--.
If then o1.rc == 0:
    Delete o1.
    Decrement o.rc for all children o of o1.
    Recursively delete objects whose rc is decremented to 0.
(Figure: pointer p moving from o1 to o2.)

An Important Term:
A write barrier is a piece of code executed with each pointer update.
“p ← o2” implies:
    Read p; (see o1)
    p ← o2;
    o2.rc++;
    o1.rc--;
(Figure: pointer p moving from o1 to o2.)
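The naive barrier and the recursive deletion from the basic scheme can be sketched in C as follows. This is a minimal single-threaded illustration, not the talk's implementation: the names `rc_new`, `rc_release`, `rc_update`, the fixed fan-out `MAX_FIELDS`, and the `live_objects` counter are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FIELDS 2              /* illustrative fixed fan-out (assumption) */

typedef struct Object {
    int rc;                       /* reference count */
    struct Object *field[MAX_FIELDS];
} Object;

static int live_objects = 0;      /* bookkeeping for the example only */

Object *rc_new(void) {
    Object *o = calloc(1, sizeof *o);
    if (o == NULL) abort();
    o->rc = 1;                    /* as on the slide: new objects start at 1 */
    live_objects++;
    return o;
}

/* Drop one reference; at 0, release the children recursively and free. */
void rc_release(Object *o) {
    if (o == NULL || --o->rc > 0) return;
    for (int i = 0; i < MAX_FIELDS; i++) rc_release(o->field[i]);
    free(o);
    live_objects--;
}

/* Naive write barrier: runs on every pointer store. */
void rc_update(Object **slot, Object *new_target) {
    Object *old = *slot;                        /* read p (see o1) */
    *slot = new_target;                         /* p <- o2 */
    if (new_target != NULL) new_target->rc++;   /* o2.rc++ */
    rc_release(old);                            /* o1.rc--, delete at 0 */
}
```

Incrementing the new target before releasing the old one keeps the sketch safe when a slot is overwritten with the value it already holds.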

Deferred Reference Counting
Problem: overhead on updating program variables (locals) is too high.
Solution [Deutsch & Bobrow 76]:
    Don’t update rc for local variables (roots).
    “Once in a while”: collect all objects with o.rc = 0 that are not referenced from local variables.
Deferred RC reduces overhead by 80%. Used in most modern RC systems.
Still, the “heap” write barrier is too costly.
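A rough C sketch of the deferred scheme, under simplifying assumptions: counts cover heap references only (so objects start at 0 here, unlike the basic scheme), objects carry no pointer fields (so no recursive decrement is needed), and a `freed` flag stands in for returning memory. The names `heap_store` and `deferred_collect` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    int  rc;        /* counts heap references only */
    bool freed;     /* stands in for returning memory to the allocator */
} Object;

/* Heap store: the only kind of store that pays for the barrier.
   Stores to local variables are plain assignments. */
void heap_store(Object **slot, Object *new_target) {
    Object *old = *slot;
    *slot = new_target;
    if (new_target != NULL) new_target->rc++;
    if (old != NULL) old->rc--;       /* may reach 0; reclamation is deferred */
}

static bool in_roots(const Object *o, Object *const roots[], size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        if (roots[i] == o) return true;
    return false;
}

/* "Once in a while": reclaim zero-count objects that no root points to.
   Returns the number of objects reclaimed. */
size_t deferred_collect(Object *heap[], size_t nheap,
                        Object *const roots[], size_t nroots) {
    size_t reclaimed = 0;
    for (size_t i = 0; i < nheap; i++) {
        Object *o = heap[i];
        if (!o->freed && o->rc == 0 && !in_roots(o, roots, nroots)) {
            o->freed = true;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

An object referenced only from a local variable has count 0 but survives the collection because it appears in the root set.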

Multithreaded RC?
Traditional wisdom: the write barrier must be synchronized!

Multithreaded RC?
Problem 1: reference-count updates must be atomic.
Fortunately, this can be easily solved: each thread logs the required updates in a local buffer, and the collector applies all the updates during GC (as a single thread).

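Multithreaded RC?
Problem 2: parallel updates confuse the counters. A.next initially points to B, and two threads update it at the same time:
    Thread 1: Read A.next; (see B)   A.next ← C;   B.rc--;   C.rc++
    Thread 2: Read A.next; (see B)   A.next ← D;   B.rc--;   D.rc++
B.rc is decremented twice, and both C.rc and D.rc are incremented although A.next ends up pointing to only one of them.
(Figure: objects A, B, C, D.)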
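The bad interleaving can be replayed deterministically in C. This is an illustrative sketch rather than real multithreaded code: the barrier is split into a read half and a write half (`barrier_read`, `barrier_write`, both hypothetical names) so that two "threads" can be interleaved by hand.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int rc;
    struct Node *next;
} Node;

/* What one thread remembers between reading the slot and writing it. */
typedef struct { Node *seen; Node *target; } Pending;

/* First half of the unsynchronized barrier: read A.next. */
Pending barrier_read(Node *a, Node *target) {
    Pending p;
    p.seen = a->next;
    p.target = target;
    return p;
}

/* Second half: overwrite the slot and adjust both counters. */
void barrier_write(Node *a, Pending p) {
    a->next = p.target;                /* A.next <- target */
    if (p.seen != NULL) p.seen->rc--;  /* old.rc-- */
    p.target->rc++;                    /* target.rc++ */
}

/* Both threads read A.next before either one writes it. */
void replay_race(Node *a, Node *c, Node *d) {
    Pending t1 = barrier_read(a, c);   /* thread 1 sees B */
    Pending t2 = barrier_read(a, d);   /* thread 2 also sees B */
    barrier_write(a, t1);
    barrier_write(a, t2);
}
```

Starting from B.rc = 1 (the reference from A), the replay leaves B.rc at -1 and C.rc at 1 even though nothing points to C.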

Known Multithreaded RC [DeTreville 1990, Bacon et al. 2001]:
Compare-and-swap for each pointer modification.
Each thread records its updates in a buffer.

To Summarize Problems…
Write barrier overhead is high, even with deferred RC.
Using RC with multithreading seems to bear a high synchronization cost: a lock or “compare & swap” with each pointer update.

Reducing RC Overhead:
We start by looking at the “parent’s point of view”.
We are counting rc for the child, but rc changes when a parent’s pointer is modified.
(Figure: a parent object pointing to a child object.)

An Observation
Consider a pointer p that takes the following values between GCs: O0, O1, O2, …, On.
All RC algorithms perform 2n operations: O0.rc--; O1.rc++; O1.rc--; O2.rc++; O2.rc--; … ; On.rc++;
But only two operations are needed: O0.rc-- and On.rc++.
(Figure: pointer p moving through O0, O1, O2, O3, O4, …, On.)
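The observation is a telescoping sum. As a short worked form, write e_X for "add one to X.rc" (notation introduced here for illustration only). Update i contributes one increment for O_i and one decrement for O_{i-1}, so the net effect of all n updates is:

```latex
\[
\Delta\mathrm{rc}
  \;=\; \sum_{i=1}^{n}\bigl(e_{O_i} - e_{O_{i-1}}\bigr)
  \;=\; e_{O_n} - e_{O_0}
\]
```

Every intermediate object receives one increment and one decrement that cancel, which also holds when the same object appears more than once in the sequence.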

Use of Observation
Only the first modification of each pointer is logged.
Timeline:
    Garbage collection
    p ← O1; (record p’s previous value O0)
    p ← O2; (do nothing)
    …
    p ← On; (do nothing)
    Garbage collection: for each modified slot p:
        Read p to get On, read records to get O0.
        O0.rc--, On.rc++
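A minimal single-threaded C sketch of this logging scheme follows. The per-slot `dirty` flag, the fixed-size `log_buf`, and the names `store` and `apply_log` are assumptions for the example; the talk itself logs per object rather than per slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; } Object;

typedef struct Slot {
    Object *value;
    bool    dirty;          /* set on the first store since the last collection */
} Slot;

typedef struct { Slot *slot; Object *old; } LogEntry;

#define LOG_CAPACITY 64     /* illustrative; a real buffer would grow */
static LogEntry log_buf[LOG_CAPACITY];
static size_t   log_len = 0;

/* Store through the barrier: only the first store to a slot is logged. */
void store(Slot *s, Object *new_value) {
    if (!s->dirty) {
        assert(log_len < LOG_CAPACITY);
        log_buf[log_len].slot = s;
        log_buf[log_len].old  = s->value;   /* O_0 */
        log_len++;
        s->dirty = true;
    }
    s->value = new_value;
}

/* Collection: one decrement and one increment per modified slot. */
void apply_log(void) {
    for (size_t i = 0; i < log_len; i++) {
        Slot *s = log_buf[i].slot;
        if (log_buf[i].old != NULL) log_buf[i].old->rc--;   /* O_0.rc-- */
        if (s->value != NULL)       s->value->rc++;         /* O_n.rc++ */
        s->dirty = false;
    }
    log_len = 0;
}
```

Three stores to the same slot produce one log entry, and the collection touches only the counts of the first and last values.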

Some Technical Remarks
When a pointer is first modified, it is marked “dirty” and its previous value is logged.
We actually log each object that gets modified (and not just a single pointer).
    Reason 1: we don’t want a dirty bit per pointer.
    Reason 2: an object’s pointers tend to be modified together.
Only non-null pointer fields are logged.
New objects are “born dirty”.

Effects of Optimization
RC work is significantly reduced: the number of logging and counter updates is reduced by a factor of […] for typical Java benchmarks!

Elimination of RC Updates

Benchmark    No. of stores   No. of “first” stores   Ratio of “first” stores
jbb             71,011,357                 264,115                     1/269
Compress            64,905                      51                    1/1273
Db              33,124,780                  30,696                    1/1079
Jack           135,174,775                   1,546                   1/87435
Javac           22,042,028                 535,296                      1/41
Jess            26,258,107                  27,333                     1/961
Mpegaudio        5,517,795                      51                  1/108192
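The ratio column can be checked directly from the two count columns. As a worked example for the benchmarks with the largest and smallest share of first stores in the table above:

```latex
\[
\text{ratio} = \frac{\text{first stores}}{\text{stores}}, \qquad
\text{jbb: } \frac{264{,}115}{71{,}011{,}357} \approx \frac{1}{268.9}, \qquad
\text{Javac: } \frac{535{,}296}{22{,}042{,}028} \approx \frac{1}{41.2}
\]
```

Rounded to whole numbers these give the 1/269 and 1/41 entries of the table.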

Effects of Optimization (continued)
Write barrier overhead is dramatically reduced: the vast majority of the write barriers run a single “if”.
Last but not least: the task has changed! We need to record only the first update.

Reducing Synchronization Overhead
Our second contribution: a carefully designed write barrier (and an observation) does not require any synchronization operation.

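The write barrier

    Update(Object **slot, Object *new) {
        Object *old = *slot;        /* read the value about to be overwritten */
        if (!IsDirty(slot)) {       /* first store to this slot since the dirty bits were cleared */
            log(slot, old);         /* record the old value in the thread's buffer */
            SetDirty(slot);
        }
        *slot = new;
    }

Observation: if two threads (1) invoke the write barrier in parallel, and (2) both log an old value, then both record the same old value.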

Running Write Barrier Concurrently

Thread 1:
    Update(Object **slot, Object *new) {
        Object *old = *slot;
        if (!IsDirty(slot)) {
            /* if we got here, Thread 2 has not */
            /* yet set the dirty bit, thus, has */
            /* not yet modified the slot.       */
            log(slot, old);
            SetDirty(slot);
        }
        *slot = new;
    }

Thread 2:
    Update(Object **slot, Object *new) {
        Object *old = *slot;
        if (!IsDirty(slot)) {
            /* if we got here, Thread 1 has not */
            /* yet set the dirty bit, thus, has */
            /* not yet modified the slot.       */
            log(slot, old);
            SetDirty(slot);
        }
        *slot = new;
    }
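The observation can be replayed in C by splitting the barrier at the point where the other thread may run. This is a hand-interleaved, single-threaded sketch under stated assumptions: the `Slot`, `ThreadLog`, `barrier_begin` and `barrier_finish` names are hypothetical, and the sketch ignores hardware memory ordering, which a real implementation has to take into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; } Object;
typedef struct Slot { Object *value; bool dirty; } Slot;
typedef struct { Slot *slot; Object *old; } LogEntry;

#define LOG_CAPACITY 16
typedef struct ThreadLog {             /* one private buffer per thread */
    LogEntry entry[LOG_CAPACITY];
    size_t   len;
} ThreadLog;

/* State a thread carries between the two halves of the barrier. */
typedef struct { Object *old; bool must_log; } Step1;

/* First half: read the old value, then test the dirty flag. */
Step1 barrier_begin(const Slot *s) {
    Step1 r;
    r.old = s->value;
    r.must_log = !s->dirty;
    return r;
}

/* Second half: log if needed, set the flag, then perform the store. */
void barrier_finish(ThreadLog *log, Slot *s, Step1 r, Object *new_value) {
    if (r.must_log) {
        assert(log->len < LOG_CAPACITY);
        log->entry[log->len].slot = s;
        log->entry[log->len].old  = r.old;
        log->len++;
        s->dirty = true;
    }
    s->value = new_value;
}
```

If both threads run `barrier_begin` before either runs `barrier_finish`, each logs an entry, and both entries hold the same old value, so the collector can use either one.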

Concurrent Algorithm:
Use the write barrier with program threads.
To collect:
    Stop all threads.
    Scan roots (local variables).
    Get the buffers with modified slots.
    Clear all dirty bits.
    Resume threads.
    For each modified slot: decrement rc for the old value (written in the buffer), increment rc for the current value (“read heap”).
    Reclaim non-local objects with rc 0.

Timeline
While threads are stopped: stop threads; scan roots; get buffers; erase dirty bits; resume threads.
After threads resume: decrement values in read buffers; increment “current” values; collect dead objects.

Timeline (continued)
Where the collector finds the “current” values after the threads resume: unmodified current values are in the heap; modified ones are in the new buffers.
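The lookup described here can be sketched in C. This is an assumption-laden illustration: the function name `value_at_collection_start`, the per-slot `dirty` flag, and the linear search over a single new buffer are choices made for the example, and the sketch is single-threaded, so it does not address races with running program threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object { int rc; } Object;
typedef struct Slot { Object *value; bool dirty; } Slot;
typedef struct { Slot *slot; Object *old; } LogEntry;

/* Value the slot held when the dirty bits were cleared.
   new_log is the buffer filled by the barrier since the threads resumed. */
Object *value_at_collection_start(const Slot *s,
                                  const LogEntry new_log[], size_t new_len) {
    Object *v = s->value;                  /* read the heap first */
    if (!s->dirty) return v;               /* untouched since: heap value is it */
    for (size_t i = 0; i < new_len; i++)   /* modified since: use the new log */
        if (new_log[i].slot == s) return new_log[i].old;
    return v;  /* only reached if a dirty slot has no entry in this buffer */
}
```

A slot that a program thread has already overwritten is still counted with the value it had at the start of the collection, because the barrier logged that value before the store.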

Concurrent Algorithm (toward on-the-fly):
The collection steps are as before; two of them are targets for improvement.
Goal 1: clear dirty bits during the program run (instead of while the threads are stopped).
Goal 2: stop one thread at a time (instead of stopping all threads together).

The Sliding Views “Framework”
Develop a concurrent algorithm in which there is a short time when all the threads are stopped simultaneously to perform some task.
Avoid stopping the threads together. Instead, stop one thread at a time.
Tricky part: “fix” the problems created by this modification.
Idea borrowed from the Distributed Computing community [Lamport].

Graphically
(Figure: two panels, “A Snapshot” and “A Sliding View”, each plotting heap address against time; time-axis marks: t, t1, t2.)

Fixing Correctness
The way to do this in our algorithm is to use snooping:
    While collecting the roots, record objects that get a new pointer.
    Do not reclaim these objects.
No details…
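Since the slide gives no details, the following C sketch shows only one plausible shape for snooping, with every name an assumption: a single global `snoop_enabled` flag that the collector would raise while roots are gathered, a `snooped` mark on each object, and a reclamation check that skips marked objects. The logging half of the barrier is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Object {
    int  rc;
    bool snooped;     /* received a new pointer while roots were being collected */
} Object;

/* Hypothetical flag: on while the collector gathers roots. */
static bool snoop_enabled = false;

/* Pointer store with the snooping part of the barrier only. */
void store_with_snoop(Object **slot, Object *new_value) {
    *slot = new_value;
    if (snoop_enabled && new_value != NULL)
        new_value->snooped = true;    /* record the object that got a new pointer */
}

/* Snooped objects are not reclaimed in this collection. */
bool may_reclaim(const Object *o) {
    return o->rc == 0 && !o->snooped;
}
```

An object that gains a reference during root collection is kept for this cycle even if its count still reads 0.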

Cycles Collection
Our initial solution: use a tracing algorithm infrequently.
More about this tracing collector and about cycle collectors later…

Performance Measurements
Implementation for Java on the Jikes Research JVM.
Compared collectors:
    Jikes parallel stop-the-world (STW)
    Jikes concurrent RC (Jikes concurrent)
Benchmarks:
    SPECjbb2000: a server benchmark that simulates business-like transactions.
    SPECjvm98: a client benchmark suite of mostly single-threaded benchmarks.

(Chart: Pause Times vs. STW)

(Chart: Pause Times vs. Jikes Concurrent)

(Chart: SPECjbb2000 Throughput)

(Chart: SPECjvm98 Throughput)

(Chart: SPECjbb2000 Throughput)

(Chart: A Glimpse into Subsequent Work: SPECjbb2000 Throughput)

Subsequent Work
Cycle Collection [CC’05]
A Mark and Sweep Collector [OOPSLA’03]
A Generational Collector [CC’03]
An Age-Oriented Collector [CC’05]

Related Work
It’s not clear where to start… RC, concurrent, generational, etc.
Some more relevant work was mentioned.

Conclusions
A study of concurrent garbage collection with a focus on RC.
Novel techniques obtaining short pauses and high efficiency.
The best approach: age-oriented collection with concurrent RC for old and concurrent tracing for young.
Implementation and measurements on Jikes demonstrate non-obtrusiveness and high efficiency.

Project Building Blocks
A novel reference counting algorithm.
State-of-the-art cycle collection.
Generational: RC (for old) and tracing (for young).
A concurrent tracing collector.
An age-oriented collector: fitting generations with concurrent collectors.