Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala University, Sweden

Goals of this work
- Efficiently implement concurrency through asynchronous message-passing
- Memory management with real-time characteristics:
  - Short stop-times
  - High mutator utilization
- Design for multithreading

Our context: Erlang
- Designed for highly concurrent applications
- Soft real-time
- Light-weight processes
- No destructive updates
- Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries (heap data)

Our context: the Erlang/OTP system
- Industrial-strength implementation
- Used in embedded applications
- Three memory architectures [ISMM'02]:
  - Private
  - Shared
  - Hybrid

Private heaps
(figure: each process P has its own stack and heap)

(figure: sending a message between private heaps requires an O(|message|) copy)

Private heaps
- Garbage collection is a private business
- Fast memory reclamation of terminated processes

Shared heap
- O(1) message passing
- Global synchronization
- Longer stop-times
- No fast reclamation of process-local data
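The trade-off between the two architectures can be caricatured in a few lines of Python (an illustrative sketch, not BEAM code; the heap/mailbox lists are invented stand-ins):

```python
import copy

def send_private_heaps(receiver_heap, msg):
    # Private heaps: the message is deep-copied into the receiver's
    # heap, so the send costs O(|message|), but each process can be
    # collected, and reclaimed on exit, independently.
    receiver_heap.append(copy.deepcopy(msg))
    return receiver_heap[-1]

def send_shared_heap(receiver_mailbox, msg):
    # Shared heap: only a reference to the message is passed, so the
    # send itself is O(1), at the price of global synchronization
    # when the shared heap is collected.
    receiver_mailbox.append(msg)
    return receiver_mailbox[-1]
```

In the private-heap case the receiver gets an equal but distinct copy; in the shared-heap case it gets the very same object the sender built.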

Hybrid architecture
- Process-local heaps
- Message area
- Big objects area

Allocating messages in the message area
Several possible methods:
- User annotations
- Dynamic monitoring [Petrank et al. ISMM'02]
- Static analysis-guided allocation

Static message analysis [SAS'03]
- Similar to escape analysis
- Allocation is process-local by default
  - Possible messages are allocated in the message area
  - Copy on demand
- The analysis is quite precise: it typically finds 99% of all messages
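Conceptually, the analysis turns each allocation site into a placement decision like the following (the two-region model and all names are illustrative, not the actual compiler interface):

```python
def allocate(term, may_be_sent, local_heap, message_area):
    # The static message analysis marks each allocation site: data
    # that may end up in a message goes straight to the shared
    # message area; everything else stays in the process-local heap
    # (and is copied on demand if the analysis was too conservative).
    if may_be_sent:
        message_area.append(term)
        return ('message_area', len(message_area) - 1)
    local_heap.append(term)
    return ('local', len(local_heap) - 1)
```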

Garbage collection in the hybrid architecture
- Process-local heaps: a private business, no synchronization required
- Message area: two generations
  - Copying collector in the young generation: fast allocation
  - Mark-and-sweep in the old generation: prevents repeated copying of old objects
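The young generation's copying collector can be illustrated with a minimal Cheney-style sketch (a Python model, not the actual C implementation; the dict-based object representation is invented for the example):

```python
def copy_collect(roots, from_space):
    # Evacuate all objects reachable from the roots into a fresh
    # to-space; whatever remains in from-space is garbage. Objects
    # are dicts whose 'fields' hold indices into the heap.
    to_space = []
    forward = {}  # from-space index -> to-space index (forwarding)

    def evacuate(i):
        if i not in forward:
            forward[i] = len(to_space)
            to_space.append({'val': from_space[i]['val'],
                             'fields': list(from_space[i]['fields'])})
        return forward[i]

    new_roots = [evacuate(r) for r in roots]
    # Cheney's breadth-first scan: fix up fields of copied objects,
    # evacuating their children as they are discovered.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj['fields'] = [evacuate(f) for f in obj['fields']]
        scan += 1
    return new_roots, to_space
```

Allocation in such a nursery is just a pointer bump, which is why the slide credits the copying generation with fast allocation.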

GC of the message area is a bottleneck
The root-set for the message area consists of all stacks and process-local heaps. Two mitigations:
1. Generational process scanning
2. Remembered sets in the local heaps
This is not enough... we need an incremental collector in the message area!

Properties of the incremental collector
- No overhead on the mutator
- No space overhead on heap objects
- Short stop-times
- High mutator utilization

Organization of the Message Area
- Young generation: nursery and from-space, which always have a constant size Σ (= 100k words)
- Fwd: storage area for forwarding pointers; size bound by Σ (currently = Σ)
- Black-map: bit-array used to mark objects in mark-and-sweep
- Old generation: list of arbitrary-sized areas; free-list, first-fit allocation
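The old generation's free-list, first-fit allocation scheme can be sketched as follows (a simplified Python model in words, with no coalescing of adjacent blocks; the class name is illustrative):

```python
class FreeListAllocator:
    # Old-generation allocator: free memory is a list of
    # arbitrary-sized (start, size) blocks; allocation takes the
    # first block large enough (first fit).
    def __init__(self, heap_size):
        self.free = [(0, heap_size)]

    def alloc(self, need):
        for idx, (start, size) in enumerate(self.free):
            if size >= need:
                if size == need:
                    del self.free[idx]          # exact fit: consume block
                else:
                    self.free[idx] = (start + need, size - need)
                return start
        return None                             # no fit: would trigger GC

    def free_block(self, start, size):
        # The sweep phase of mark-and-sweep returns unmarked
        # blocks to the free list.
        self.free.append((start, size))
```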

Organization of the Message Area
(figure: the nursery with its N_top and N_limit pointers and the allocation limit)

Incremental collector
Two approaches to choose from:
- Work-based: reclaim n live words each step
- Time-based: a step takes no more than t ms
(n and t are user-specified)

Work-based collection
The mutator wants to allocate need words:
  reclaim = max(n, need)
  allocation limit = N_top + reclaim
(figure: the nursery with N_top, N_limit, and the allocation limit)
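The work-based allocation limit can be expressed directly (a one-line sketch; n_top stands for the slide's N_top):

```python
def work_based_limit(n_top, n, need):
    # Each incremental step reclaims at least n live words (the
    # user-specified work unit), or `need` words if the pending
    # allocation request is larger; the mutator may then allocate
    # up to the returned limit before the next GC step.
    reclaim = max(n, need)
    return n_top + reclaim
```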

Time-based collection
How much can the mutator allocate per step? How much live data is there? Two ways to decide:
1. User annotations (as in Metronome)
2. Dynamic worst-case calculation

Time-based collection
  ΔGC = reclaimed after GC step − reclaimed before GC step
  GC steps = (Σ − reclaimed after GC) / ΔGC
  w_M = N_free / GC steps
  allocation limit = N_top + w_M
(figure: the nursery with N_top, N_limit, and the allocation limit)
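Read as code, the dynamic worst-case calculation might look like this (a sketch; the variable names mirror the slide's symbols, with sigma the constant nursery size Σ):

```python
def time_based_limit(n_top, n_free, sigma, reclaimed_before, reclaimed_after):
    # delta_gc: words reclaimed by the GC step that just ran.
    delta_gc = reclaimed_after - reclaimed_before
    # gc_steps: estimated steps left until the whole nursery
    # (sigma words) has been processed at the current rate.
    gc_steps = (sigma - reclaimed_after) / delta_gc
    # w_m: how much the mutator may allocate per remaining step so
    # that free memory lasts until the collection cycle finishes.
    w_m = n_free / gc_steps
    return n_top + w_m
```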

Collecting the Message Area
(animation frames: processes P1, P2, P3 above the Fwd area, nursery, and from-space; a Process Queue holds the processes whose roots remain to be scanned, and the allocation limit advances between collector steps)
Cheap write barrier: link the receiver to a list in the send operation
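The cheap write barrier from the preceding slides can be sketched as follows (the Process class and list-based queue are illustrative stand-ins, not the actual Erlang/OTP data structures):

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.mailbox = []

def send(receiver, msg, process_queue):
    # The only write barrier the incremental collector needs sits in
    # the send operation: deliver the message, then link the receiver
    # into the collector's process queue so its new message is found
    # as a root during the ongoing collection of the message area.
    receiver.mailbox.append(msg)
    if receiver not in process_queue:
        process_queue.append(receiver)
```

Because sends are the only way new references into the message area appear, no barrier is needed on ordinary heap writes, which is why the collector imposes no overhead on the mutator.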

Performance evaluation: Settings
- Intel Xeon 2.4 GHz, 1 GB RAM, Linux
- Start with small process-local heaps (233 words, grown when needed)
- Measure active CPU time using hardware performance monitors

Performance evaluation: Benchmarks
- Mnesia – distributed database system: 1,109 processes, 2,892,855 messages
- Yaws – HTTP web server: 420 processes, 2,275,467 messages
- Adhoc – data mining application: 137 processes, 246,021 messages

Stop-times – Time-based, t = 1 ms
(figures: Mnesia and Yaws)

Stop-times – Work-based, n = 2 words
(figures) Adhoc: mean 3, geo. mean 2; Yaws: mean 9, geo. mean 1

Stop-times – Work-based, n = 100 words
(figures) Adhoc: mean 53, geo. mean 46; Yaws: mean 268, geo. mean 36; axis: Time (µs)

Message area total GC times, incremental vs. non-incremental (times in ms):

Benchmark   n = 2 MA GC   n = 100 MA GC   n = 1000 MA GC   Non-inc. MA GC
Mnesia
Yaws
Adhoc

Runtimes – Incremental (times in ms):

Benchmark   Mutator   Local GC   MA n = 2   MA n = 100   MA n = 1000
Mnesia       52,906      4,439
Yaws        237,629     11,728
Adhoc        61,045      8,194

Minimum Mutator Utilization (MMU)
The smallest fraction of time that the mutator executes in any time window [Cheng & Blelloch PLDI 2001]
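MMU can be computed from a trace of GC pause intervals; the brute-force sliding-window sketch below is only illustrative (the talk's measurements come from actual runs, not this code):

```python
def mmu(pauses, total_time, window):
    # Minimum Mutator Utilization: the smallest fraction of any
    # window of the given size during which the mutator (rather
    # than the collector) runs. `pauses` is a list of (start, end)
    # GC pause intervals within [0, total_time].
    def mutator_time(t0, t1):
        paused = sum(max(0.0, min(e, t1) - max(s, t0))
                     for s, e in pauses)
        return (t1 - t0) - paused

    worst = window
    t = 0.0
    step = window / 100.0   # slide the window in small increments
    while t + window <= total_time:
        worst = min(worst, mutator_time(t, t + window))
        t += step
    return worst / window
```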

Mutator Utilization – Work-based, n = 100 words
(figures: Adhoc and Yaws)

Concluding Remarks
- The memory allocator is guided by the intended use of data
- Incremental garbage collector:
  - High mutator utilization
  - Small overhead on total runtime
  - No mutator overhead
  - Small space overhead
  - Really short stop-times!

Runtimes, incremental vs. non-incremental (times in ms):

Benchmark   Inc. Mutator   Non-inc. Mutator
Mnesia           52,906             53,276
Yaws            237,629            240,985
Adhoc            61,045             61,578

Total GC times, incremental vs. non-incremental (times in ms):

Benchmark   Inc. Local GC   Non-inc. Local GC
Mnesia            4,439               4,487
Yaws             11,728              11,359
Adhoc             8,194               7,848

Mutator Utilization – Time-based, t = 1 ms
(figures: Mnesia and Yaws)