Reducing Pause Time of Conservative Collectors
Toshio Endo (National Institute of Informatics), Kenjiro Taura (Univ. of Tokyo)

Incremental GC for soft real-time applications [Steele 75] [Yuasa 90] [Doligez 93]
- Targets: multimedia, games, etc.
  - Pauses should be < 10 ms
- Collection work is divided into small pieces
- Success: pauses of < 5 ms [Cheng 01]
  - But these collectors assume compiler cooperation
- Pause reduction for "conservative" GCs remains insufficient

Conservative GC
- Conservative GC [Boehm et al. 88]
  - Mark-sweep GC for C/C++ programs
  - No compiler cooperation (e.g., no compiler-inserted write barriers)
- Mostly parallel GC [Boehm et al. 91]
  - Incremental and conservative
  - Pauses > 100 ms are fairly common
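Because a conservative collector gets no help from the compiler, it cannot tell a pointer from an integer that merely looks like one; it treats any word whose value falls inside an allocated object as a reference. A minimal sketch of that test, under simplifying assumptions: the heap is modeled as a hypothetical sorted list of (start, size) extents, interior pointers are accepted, and names such as `find_object` are illustrative rather than taken from any real collector.

```python
import bisect
from typing import List, Optional, Tuple

Extent = Tuple[int, int]  # hypothetical heap model: (start address, size in bytes)

def find_object(heap: List[Extent], word: int) -> Optional[int]:
    """Return the start of the allocated object containing `word`, or None.

    `heap` must be sorted by start address with non-overlapping extents.
    Interior pointers (into the middle of an object) are accepted.
    """
    starts = [start for start, _ in heap]
    i = bisect.bisect_right(starts, word) - 1  # last object starting at or before `word`
    if i < 0:
        return None
    start, size = heap[i]
    return start if word < start + size else None

def conservative_roots(heap: List[Extent], stack_words: List[int]) -> List[int]:
    """Objects kept alive by stack words that merely *look* like pointers."""
    found: List[int] = []
    for word in stack_words:
        obj = find_object(heap, word)
        if obj is not None and obj not in found:
            found.append(obj)
    return found

heap = [(0x1000, 32), (0x1040, 64), (0x2000, 16)]
stack = [42, 0x1048, 0x1010, 0x3000, 0x1040]   # integers and pointers look alike
print([hex(a) for a in conservative_roots(heap, stack)])  # ['0x1040', '0x1000']
```

The word 0x1048 keeps the object at 0x1040 alive even if the program only stored it as an integer; this imprecision is the price of needing no compiler cooperation.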

Write barriers in conservative GCs
- No fine-grained write barrier from the compiler
- Instead, use the VM's write protection
  - Coarse-grained: page level
  - Detects only the first update after protection
- This restricts the collector's design
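Page-level write protection behaves like a write barrier that fires only once per page: the first store into a protected page faults, the handler records the page as dirty and unprotects it, and later stores to that page run untrapped. The sketch below simulates this in pure Python instead of calling `mprotect` and handling SIGSEGV; `PAGE_SIZE` and the class name are illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class SimulatedVMBarrier:
    """Simulates a page-protection write barrier (no real mprotect calls)."""

    def __init__(self) -> None:
        self.protected = set()  # page numbers currently write-protected
        self.dirty = set()      # pages updated since they were protected
        self.faults = 0         # number of simulated write faults

    def protect_all(self, pages) -> None:
        """Start of a GC cycle: write-protect every heap page."""
        self.protected = set(pages)
        self.dirty.clear()

    def write(self, addr: int) -> None:
        """Mutator store to `addr`; only a store to a protected page traps."""
        page = addr // PAGE_SIZE
        if page in self.protected:
            # Trap handler: remember the dirty page and unprotect it, so that
            # later stores to the same page proceed without a fault.
            self.faults += 1
            self.dirty.add(page)
            self.protected.discard(page)
        # The store itself would happen here.

barrier = SimulatedVMBarrier()
barrier.protect_all(range(4))
barrier.write(0x10)                # first store to page 0: traps
barrier.write(0x20)                # second store to page 0: NOT trapped
barrier.write(2 * PAGE_SIZE + 8)   # first store to page 2: traps
print(barrier.faults, sorted(barrier.dirty))  # 2 [0, 2]
```

The collector learns only which pages changed, not which pointers, which is why the design choices on the following slides are constrained.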

Incremental mark-sweep algorithms
- Snapshot-at-beginning and DLG [Yuasa 90] [Doligez 93]
  - Make a (conceptual) heap snapshot before marking
  - Promise short pauses
  - Large space overhead with a VM write barrier
- Incremental update [Steele 75] [Dijkstra 78]
  - Maintain consistency after marking
  - Needs a final marking before finishing, which can be unboundedly long!
  - The only viable choice with a VM write barrier
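For contrast, the two families correspond to two classic fine-grained barriers that a compiler would insert on every pointer store. A minimal sketch, assuming a tri-color abstraction with an explicit gray list (all names are hypothetical): the incremental-update barrier shades the new target of a store, while the snapshot barrier shades the overwritten one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)
class Obj:
    name: str
    refs: Dict[str, Optional["Obj"]] = field(default_factory=dict)
    marked: bool = False

gray: List[Obj] = []  # marked but not yet scanned

def shade(obj: Optional[Obj]) -> None:
    """White -> gray: mark the object and queue it for scanning."""
    if obj is not None and not obj.marked:
        obj.marked = True
        gray.append(obj)

def write_incremental_update(src: Obj, key: str, new: Optional[Obj]) -> None:
    """Dijkstra-style: shade the NEW target so no scanned object points to white."""
    shade(new)
    src.refs[key] = new

def write_snapshot(src: Obj, key: str, new: Optional[Obj]) -> None:
    """Yuasa-style: shade the OLD target so everything reachable at GC start is kept."""
    shade(src.refs.get(key))
    src.refs[key] = new

a, q, r = Obj("a"), Obj("q"), Obj("r")
a.refs["f"] = q
write_snapshot(a, "f", r)             # q was reachable in the snapshot: shaded
write_incremental_update(a, "g", r)   # r is the newly stored target: shaded
print([o.name for o in gray])         # ['q', 'r']
```

With only page-level write protection, neither function can run on every store: the collector learns only that a page was written, so incremental update has to fall back on rescanning dirty pages in a final marking.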

Contributions
- Analyze why previous algorithms fail
- Propose techniques to bound pauses and guarantee progress
- Present a "stress-test" benchmark: iukiller
- Demonstrate experimental results
  - < 5 ms in applications
  - < 12 ms in the stress-test benchmark (constant across all heap sizes)
- (This talk omits parallel issues)

Overview of presentation
- Mostly parallel GC
- Techniques to reduce pause time
- Experimental results
- Related work
- Summary

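Mostly parallel garbage collector (1)
- Start GC: write-protect the heap
- Incremental marking, interleaved with the user program
  - On a user write fault, the trap handler remembers the address of the dirty (= updated) page and unprotects it
- Final marking
- Incremental sweeping, interleaved with the user program
- End GC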

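Mostly parallel garbage collector (2)
- A second update to the same page is not trapped (the page is already unprotected)
  - So r must be marked in the final phase
- Final marking is therefore needed
- (Figure: a writer updating pointers among objects p, q and r)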

Final marking
1. Scan all dirty pages and the roots
2. Mark all unmarked objects reachable from the scanned regions
- The amount of work is unbounded:
  - the number of dirty pages
  - the objects reachable from a dirty page
- This makes pauses > 100 ms
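The cost of final marking can be made concrete by counting its work. A minimal sketch over a hypothetical object graph (string names, a `page_of` map, and a `marked` set left over from incremental marking are all assumptions): the collector rescans the roots and every marked object on a dirty page, then traces transitively. A single dirty page is enough to make the pause grow with the length of the chain hidden behind it.

```python
from collections import deque
from typing import Dict, List, Set

def final_marking(pointers: Dict[str, List[str]], page_of: Dict[str, int],
                  roots: List[str], dirty: Set[int], marked: Set[str]) -> int:
    """Stop-the-world final marking; returns objects scanned (a proxy for pause length)."""
    # 1. Rescan the roots and every already-marked object on a dirty page.
    start = set(roots) | {o for o in marked if page_of.get(o) in dirty}
    marked |= set(roots)
    work = deque(start)
    scanned = 0
    # 2. Mark all unmarked objects reachable from the scanned region.
    while work:
        obj = work.popleft()
        scanned += 1
        for child in pointers.get(obj, []):
            if child not in marked:
                marked.add(child)
                work.append(child)
    return scanned

def hidden_chain(n: int) -> int:
    """After its page was scanned, the mutator stored into marked object p
    (on page 0) a pointer to an unmarked chain c0 -> ... -> c{n-1}."""
    pointers = {"root": ["p"], "p": ["c0"]}
    pointers.update({f"c{i}": [f"c{i + 1}"] for i in range(n - 1)})
    page_of = {"p": 0, **{f"c{i}": 1 + i // 100 for i in range(n)}}
    return final_marking(pointers, page_of, ["root"], dirty={0}, marked={"p"})

print(hidden_chain(10), hidden_chain(10_000))  # 12 10002
```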

Goal of our collector
- Bound pause time (below a constant)
  - Mutator utilization is also important, but we focus on pauses
- Guarantee progress of collection
- Combine two techniques:
  - Bounding dirty pages (BD)
  - Retrying incremental marking (RI)

Bounding dirty pages (1)
- The basic collector produces many dirty pages
- Keep the number of dirty pages below a given limit
  - If it exceeds the limit, choose a dirty page
  - Re-protect it, scan it, and mark it clean
- Good: reduces the work of final marking
- Bad: more protection cost
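Bounding dirty pages can be sketched as a dirty-page set with a cap. The victim-selection policy (oldest first), the `scan_page` callback and the class name are illustrative assumptions; the slides only say that some dirty page is chosen. The default limit of 16 mirrors the setting used in the experiments.

```python
from collections import OrderedDict
from typing import Callable, List

class BoundedDirtyPages:
    """Keeps the number of dirty pages below `limit` (hypothetical helper)."""

    def __init__(self, scan_page: Callable[[int], None], limit: int = 16) -> None:
        self.limit = limit
        self.scan_page = scan_page  # marks objects referenced from the given page
        self.dirty: "OrderedDict[int, None]" = OrderedDict()  # oldest first
        self.reprotections = 0

    def on_write_fault(self, page: int) -> None:
        """Trap handler: `page` was just unprotected and is now dirty."""
        self.dirty[page] = None
        while len(self.dirty) >= self.limit:
            victim, _ = self.dirty.popitem(last=False)  # pick the oldest dirty page
            self.reprotections += 1   # write-protect it again (extra protection cost)
            self.scan_page(victim)    # scan it now; it is clean from here on

scanned: List[int] = []
bd = BoundedDirtyPages(scanned.append, limit=4)
for page in range(10):
    bd.on_write_fault(page)
print(list(bd.dirty), scanned)  # [7, 8, 9] [0, 1, 2, 3, 4, 5, 6]
```

Re-protecting before scanning matters: any store that lands after the scan faults again and puts the page back in the dirty set, so no update is lost. The price is the extra protection work counted by `reprotections`.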

Bounding dirty pages (2)
- Is the pause now bounded? … No!
- The number of unmarked objects reachable from a dirty page is not bounded

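Retrying incremental marking (1)
- Keep the work of each final marking below a given limit
- Start GC: write-protect the heap
- Incremental marking (user program and trap handler run interleaved)
- Final marking: finished before the limit?
  - Yes: proceed to incremental sweeping, then end GC
  - No: abort and retry incremental marking!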
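The retry loop can be sketched as a final marking that carries a work budget: if tracing does not finish within the budget, it gives up, keeps whatever it has marked, and the collector goes back to incremental marking. In this sketch the budget is counted in objects scanned rather than milliseconds, no mutator is simulated (so no page is dirtied between attempts, and the rescan of dirty pages and roots is omitted), and all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

def trace_bounded(work: Deque[str], pointers: Dict[str, List[str]],
                  marked: Set[str], limit: int) -> bool:
    """Scan at most `limit` gray objects; True if the gray set became empty."""
    scans = 0
    while work:
        if scans == limit:
            return False  # budget exhausted; remaining gray objects stay in `work`
        obj = work.popleft()
        scans += 1
        for child in pointers.get(obj, []):
            if child not in marked:
                marked.add(child)
                work.append(child)
    return True

def collect_with_retries(roots: List[str], pointers: Dict[str, List[str]],
                         budget: int, step: int) -> Tuple[Set[str], int]:
    """Alternate incremental marking with bounded final-marking attempts."""
    marked, work = set(roots), deque(roots)
    attempts = 0
    while True:
        # Incremental marking: interleaved with the mutator, so not a pause.
        trace_bounded(work, pointers, marked, step)
        # Final marking: a pause of at most `budget` scans; abort means retry.
        attempts += 1
        if trace_bounded(work, pointers, marked, budget):
            return marked, attempts

chain = {f"o{i}": [f"o{i + 1}"] for i in range(49)}  # o0 -> o1 -> ... -> o49
marked, attempts = collect_with_retries(["o0"], chain, budget=10, step=15)
print(len(marked), attempts)  # 50 2
```

Each pause is capped at `budget` scans no matter how long the chain is; the open question, taken up next, is whether every attempt makes progress once a real mutator keeps dirtying pages.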

Retrying incremental marking (2)
- Good: bounds the length of a single final marking
- Bad: risk of starvation (no progress)
  - A final marking may abort before it finishes scanning the (unbounded number of) dirty pages
  - Unmarked objects may "escape" from the collector

The worst case
- Every final marking aborts without making progress
- (Timeline: incremental marking finishes; final marking aborts; the mutator writes; repeated)

Ensuring bounded pause and progress
- Either technique alone is insufficient…
- Two techniques are needed:
  - Bounding dirty pages (BD)
  - Retrying incremental marking (RI)
- BD ⇒ every final marking can scan all dirty pages ⇒ it finds some unmarked objects, if any
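The implication chain above can be written as a simple bound. The symbols are assumptions introduced here, not taken from the slides: $D$ is the dirty-page limit enforced by BD, $c_p$ an upper bound on the cost of rescanning one page, $C_R$ an upper bound on the cost of scanning the roots, and $T$ the work limit of a single final marking.

```latex
\begin{aligned}
W_{\mathrm{rescan}} &\le D\,c_p + C_R, \\
T > D\,c_p + C_R \;&\Longrightarrow\; \text{every final marking completes its rescan before it can abort.}
\end{aligned}
```

So each attempt either finishes or shades at least one unmarked object reachable from a dirty page or a root, if one exists, which is the progress property claimed above. Without BD, $D$ has no bound and no fixed $T$ yields this guarantee.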

Experimental environments
- 400 MHz UltraSPARC, Solaris 8
- Four GCs:
  - Stop: stop-the-world GC
  - Basic: basic incremental GC
  - BD: uses bounding dirty pages
  - BD+R: uses bounding dirty pages + retrying incremental marking
- Basic/BD/BD+R: GC starts when heap usage > 75%
- BD/BD+R: number of dirty pages < 16

The iukiller synthetic benchmark
- A "stress-test" benchmark for mostly parallel GC
- Trees tend to escape from the collector
- Final marking tends to be long
- (Figure: large binary trees reachable from a root, created repeatedly)

Results of the iukiller benchmark: maximum pause time
- Previous collectors fail
  - > 1.8 seconds
  - The larger the heap, the longer the pause
- BD+R achieves < 12 ms pauses
  - Independent of heap size

Application benchmarks (programs written in C/C++)
- deltablue: an incremental constraint solver (25 MB)
- espresso: a logic optimizer for PLA (10 MB)
- N-Body: an N-body solver using Barnes-Hut (15 MB)
- CKY: a context-free grammar parser (40 MB)
- Cube: a Rubik's cube puzzle solver (8 MB)

Results of application benchmarks: maximum pause time
- BD+R achieves < 5 ms pauses in all five applications
- BD is also OK (< 16 ms)
- (Figure: maximum pause time per application and collector; two bars are labeled 215 ms and 283 ms)

Results of application benchmarks: overhead
- (Figure: total execution times, normalized so that Stop = 1)
- BD/BD+R are < 9% slower than Basic
  - More protection
- All incremental GCs are 1% to 53% slower than Stop, due to
  - the VM write barrier
  - floating garbage
  - more GC cycles

Related work
- [Appel et al. 88]: copying GC with a VM read barrier; slower than a write barrier
- [Furuso et al. 91]: snapshot-at-beginning on VM; large space overhead
- Recent version of [Boehm et al. 91]: time limit on final marking; risk of starvation
- [Printezis et al. 00] [Ossia et al. 02]: keep the number of dirty cards small; final marking is still unbounded

Summary
- An incremental conservative GC with
  - short pauses (< 5 ms in 5 applications)
  - guaranteed GC progress
- Uses both techniques:
  - Bounding dirty pages
  - Retrying incremental marking

Future directions
- Reducing the overhead of BD
  - A strategy for choosing a proper limit on dirty pages
- Bounding the roots to be scanned
  - e.g., protecting stacks partially

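Mostly parallel garbage collector (cont. 1)
- Stop-the-world GC: the user program stops for the whole GC cycle (marking, then sweeping)
- Mostly parallel GC: a GC cycle consists of initialization & protection, concurrent marking, final marking, and concurrent sweeping
- (Figure: timelines of user-program and GC activity for both collectors)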

Mostly parallel garbage collector (cont. 2)
- Protect the heap and start marking from the roots
- Proceed with concurrent marking
- The user program may
  - update pointers
  - create new objects
- Concurrent marking finishes
  - But some reachable objects are still unmarked!!
- Perform final marking atomically, starting from
  - marked objects in dirty pages
  - the roots

Technique 2: Retrying concurrent marking
- Instead of a single final marking, we repeat concurrent marking and a termination check
  - If the termination check takes longer than a given limit, it aborts and concurrent marking restarts
- Boehm's implementation (available on the Web) repeats the termination check up to twice
- (Figure: GC cycle timeline: initialization, concurrent marking and termination checks, concurrent sweeping)

Discussion on techniques
- Neither technique is novel by itself, but combining the two is essential
- Without retrying, final marking may still be long
- Without bounding, a termination check may make insufficient progress
- (Figure: without bounding, termination checks are aborted; with bounding, a termination check finds unmarked objects)

Other techniques
- Concurrent protecting
  - Atomic protecting takes O(heap size) time!
- Allocating black in later stages of a GC cycle
  - Always allocating black retains many short-lived objects
  - Always allocating white (unmarked) may prevent GC progress
  - So we allocate white first, and black in later stages
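The allocation policy can be expressed as a function of how far the current cycle has progressed. In this sketch the switch point `SWITCH_POINT` is hypothetical (the slides do not say when the collector switches), and `progress` is assumed to be a value between 0 and 1 that the collector estimates.

```python
from enum import Enum

class Color(Enum):
    WHITE = "white"  # unmarked: can be reclaimed by this cycle if it dies
    BLACK = "black"  # marked: survives this cycle regardless

SWITCH_POINT = 0.5  # hypothetical fraction of the cycle after which new objects are black

def allocation_color(gc_active: bool, progress: float) -> Color:
    """Color for a newly allocated object.

    Early in a cycle, white lets short-lived objects be reclaimed by this
    cycle; later, black stops new allocations from adding marking work
    that could keep the cycle from ever terminating.
    """
    if not gc_active or progress < SWITCH_POINT:
        return Color.WHITE
    return Color.BLACK

print(allocation_color(True, 0.1).value, allocation_color(True, 0.8).value)  # white black
```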

Results: minimum mutator utilization (MMU)
- (Figure: MMU vs. window size; window sizes are on a log scale)
- The optimized collector shows good MMUs for small windows
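Minimum mutator utilization for a window size w is the smallest fraction of any time window of length w during which the mutator, rather than the collector, runs. A small sketch that computes it from a pause log, assuming pauses are given as non-overlapping (start, end) pairs in milliseconds within a run of length `total`; the function names are illustrative.

```python
from typing import List, Tuple

Pause = Tuple[float, float]  # (start, end) of a GC pause, in ms

def gc_time_in(pauses: List[Pause], lo: float, hi: float) -> float:
    """Total pause time overlapping the window [lo, hi]."""
    return sum(max(0.0, min(e, hi) - max(s, lo)) for s, e in pauses)

def mmu(pauses: List[Pause], total: float, window: float) -> float:
    """Minimum mutator utilization over all windows of length `window`.

    GC time inside a sliding window changes slope only when a window edge
    crosses a pause boundary, so it suffices to check windows that start
    or end at a boundary, plus the first and last window of the run.
    """
    if window >= total:
        return 1.0 - gc_time_in(pauses, 0.0, total) / total
    starts = {0.0, total - window}
    for s, e in pauses:
        for t in (s, e, s - window, e - window):
            if 0.0 <= t <= total - window:
                starts.add(t)
    worst = max(gc_time_in(pauses, t, t + window) for t in starts)
    return 1.0 - worst / window

pauses = [(10.0, 15.0), (20.0, 22.0)]
print(mmu(pauses, 100.0, 5.0), mmu(pauses, 100.0, 10.0))  # 0.0 0.5
```

A single 5 ms pause already drives MMU to zero for 5 ms windows, which is why bounding the longest pause is a precondition for good MMU at small window sizes.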

Results of application benchmarks: number of repetitions
- BD+R: repetitions of incremental marking per GC cycle
  - Usually < 2 times
  - No infinite loop
  - The worst case is 5 times; does this need improvement?