David F. Bacon, Perry Cheng, and V.T. Rajan


1 A Real-Time Garbage Collector with Low Overhead and Consistent Utilization
David F. Bacon, Perry Cheng, and V.T. Rajan
IBM T.J. Watson Research Center
Presented by Jason VanFickell (thanks to Srilakshmi Swati Pendyala for the 2009 slides)

2 Motivation
- Real-time systems are growing in importance
- Higher-level programming languages are desirable
- Constraints for real-time systems
  - Hard constraints for continuous performance (low pause times): maximum pause time < required response time
  - CPU utilization sufficient to accomplish the task, measured with minimum mutator utilization (MMU)
  - Memory constraints (less memory in embedded systems): memory requirement < resource limit, an important constraint in embedded systems
- Need for a real-time garbage collector with low memory usage

3 Problems with Previous Work
- Fragmentation
  - Early work (Baker's Treadmill) handles only a single object size
  - Fragmentation was not a major problem for a family of C and C++ benchmarks (Johnstone's paper), but ignoring it is unsustainable for long-running programs
  - Using a single (large) block size increases memory requirements and internal fragmentation
- High space overhead
  - Copying algorithms avoid fragmentation but increase space overhead
- Uneven mutator utilization (the fraction of the processor devoted to mutator execution)
  - Copying algorithms suffer from uneven mutator utilization, with long low-utilization periods
- Inability to handle large data structures

4 Components and Concepts in Metronome
- Segregated free-list allocator
  - Geometric size progression limits internal fragmentation
- Mostly non-copying: objects are usually not moved
- Defragmentation
  - Moves objects to a new page when a page is fragmented due to GC
- Read barrier: to-space invariant [Brooks]
  - New techniques with only 4% overhead
- Incremental mark-sweep collector
  - Mark phase fixes stale pointers
- Arraylets: bound fragmentation and support large-object operations
- Time-based scheduling

5 Segregated Free List Allocator
- Heap divided into fixed-size pages
- Each page divided into fixed-size blocks
- Objects allocated in the smallest block that fits
(Figure: pages with block sizes 12, 16, and 24.)
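A minimal sketch of the allocation path described above follows. It assumes an illustrative page size of 96 bytes and uses the three block sizes from the figure as the only size classes. `SegregatedHeap`, `PAGE_SIZE`, and `SIZE_CLASSES` are hypothetical names, not Metronome's. A page is committed to a size class only when that class runs out of free blocks.

```python
# Hypothetical sketch of a segregated free-list allocator.
# PAGE_SIZE and SIZE_CLASSES are illustrative values, not Metronome's.
PAGE_SIZE = 96                 # bytes per page (assumed)
SIZE_CLASSES = [12, 16, 24]    # block sizes, smallest first (from the figure)


class SegregatedHeap:
    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))          # pages not yet given to a size class
        self.free_lists = {s: [] for s in SIZE_CLASSES}   # block size -> free (page, offset) blocks
        self.page_class = {}                              # page id -> block size used on that page

    def size_class(self, nbytes):
        """Return the smallest block size that fits a request of nbytes."""
        for s in SIZE_CLASSES:
            if nbytes <= s:
                return s
        raise ValueError("request too large for the small-object size classes")

    def allocate(self, nbytes):
        s = self.size_class(nbytes)
        if not self.free_lists[s]:
            self._carve_page(s)   # take a fresh page only when this class runs dry
        return self.free_lists[s].pop()

    def free(self, block):
        page, _offset = block
        self.free_lists[self.page_class[page]].append(block)

    def _carve_page(self, s):
        if not self.free_pages:
            raise MemoryError("out of pages")
        page = self.free_pages.pop(0)
        self.page_class[page] = s
        offsets = range(0, PAGE_SIZE - s + 1, s)   # PAGE_SIZE // s blocks of size s
        # Push in reverse so pop() hands out the lowest offset first.
        self.free_lists[s].extend((page, off) for off in reversed(offsets))
```

For example, `SegregatedHeap(4).allocate(20)` rounds the request up to a 24-byte block on page 0. Requests larger than the biggest class simply raise here. A real allocator would route them to a separate large-object path.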

6 Limiting Internal Fragmentation
- Choose page size $P$ and block sizes $s_k$ such that $s_k = s_{k-1}(1+\rho)$
- How do we choose a small $s_0$ and $\rho$?
  - $s_0 \approx$ minimum block size
  - $\rho$ small enough to avoid internal fragmentation
  - Too small a $\rho$ leads to too many size classes, hence too many partially filled pages and wasted space, though this should be acceptable for long-running processes
  - Too large a $\rho$ leads to internal fragmentation
- Memory for a page should be allocated only when there is at least one object in that page
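The trade-off above can be checked numerically. Suppose an object just larger than $s_{k-1}$ is placed in a block of size $s_k$. It then wastes at most $(s_k - s_{k-1})/s_k = \rho/(1+\rho) < \rho$ of the block. Covering sizes from $s_0$ to some maximum needs roughly $\log(s_{\max}/s_0)/\log(1+\rho)$ classes. The sketch below uses exact, unrounded sizes as a simplifying assumption, and its function names are hypothetical.

```python
import math


def size_classes(s0, rho, count):
    """First `count` block sizes of the progression s_k = s_{k-1} * (1 + rho).
    Real allocators round these to word multiples; exact values keep the bound visible."""
    return [s0 * (1 + rho) ** k for k in range(count)]


def worst_case_waste(sizes):
    """Largest fraction of a block wasted when an object just larger than
    sizes[k-1] must be placed in a block of size sizes[k]."""
    return max((sizes[k] - sizes[k - 1]) / sizes[k] for k in range(1, len(sizes)))


def classes_needed(s0, rho, max_size):
    """Number of size classes needed to cover block sizes from s0 up to max_size."""
    return math.ceil(math.log(max_size / s0) / math.log(1 + rho)) + 1
```

With $s_0 = 16$ and $\rho = 0.125$, the worst-case waste is $0.125/1.125 \approx 11\%$. Raising $\rho$ to 0.5 cuts the number of classes needed to reach 2048 bytes from 43 to 13, at the cost of up to a third of each block being wasted.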

7 Defragmentation: When Do We Move Objects?
- At the end of the sweep phase, when there are not enough free pages for the mutator to execute, that is, when there is fragmentation
- Usually, programs exhibit locality of size: dead objects are re-used quickly
- Defragment when either
  - dead objects are not re-used for a GC cycle, or
  - free pages fall below the limit needed to perform a GC
- In practice, we move 2-3% of the data traced
- A major improvement over a copying collector
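A simplified sketch of defragmenting one size class follows. It assumes the policy is to sort pages by occupancy and then move objects from the emptiest pages into free blocks on the fullest pages until the two meet. Emptied pages return to the free-page pool. The function name and data layout are hypothetical. The sketch only records moves, whereas a real collector would also install forwarding pointers.

```python
def defragment(pages, capacity):
    """pages: dict page_id -> list of live objects (all in one size class).
    capacity: blocks per page.
    Returns (new_layout, moves, freed_page_ids); moves are (obj, from_page, to_page)."""
    order = sorted(pages, key=lambda p: len(pages[p]), reverse=True)  # fullest first
    layout = {p: list(pages[p]) for p in pages}
    moves = []
    lo, hi = 0, len(order) - 1   # lo: page being filled, hi: page being emptied
    while lo < hi:
        target, source = order[lo], order[hi]
        if len(layout[target]) >= capacity:
            lo += 1              # target page full; fill the next-fullest page
            continue
        if not layout[source]:
            hi -= 1              # source page empty; evacuate the next-sparsest page
            continue
        obj = layout[source].pop()
        layout[target].append(obj)
        moves.append((obj, source, target))
    freed = [p for p in order if not layout[p]]
    return layout, moves, freed
```

Moving objects from the sparsest pages tends to free whole pages while copying few objects. This is consistent with the slide's observation that only a small fraction of traced data is moved.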

8 Read Barrier: To-space Invariant
- Problem: the collector moves objects (defragmentation) while the mutator is finely interleaved with it
- Solution: a read barrier ensures consistency
  - Each object contains a forwarding pointer [Brooks]
  - The read barrier unconditionally forwards all pointers
  - The mutator never sees old versions of objects
- Does the read barrier affect mutator utilization?
(Figure: BEFORE and AFTER views of from-space and to-space, with objects X, Y, Z, A, and the copy A′.)
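A minimal model of a Brooks-style forwarding pointer follows. Every object carries a `forward` field that points to itself until the object is moved, and every field access goes through the barrier without testing whether the object has moved. The class and function names are hypothetical, and Python objects stand in for heap cells.

```python
class Obj:
    """Heap object with a Brooks-style forwarding pointer in its header."""
    def __init__(self, **fields):
        self.forward = self          # points to itself until the object is moved
        self.fields = dict(fields)


def read_barrier(ref):
    # Unconditional: always follow the forwarding pointer (no "has it moved?" test).
    return ref.forward


def read_field(obj, name):
    return read_barrier(obj).fields[name]


def write_field(obj, name, value):
    read_barrier(obj).fields[name] = value


def move(obj):
    """Collector copies obj to to-space and redirects the old copy.
    Assumes obj has not been moved before."""
    copy = Obj(**obj.fields)
    obj.forward = copy
    return copy
```

Even when the mutator holds a stale from-space pointer, for example one still stored in another object's field, both reads and writes reach the to-space copy. That is the to-space invariant.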

9 Read Barrier Optimization
- Previous studies: 20-40% overhead [Zorn, Nielsen]
- Several optimizations, including an eager read barrier, reduce the read barrier's overhead to under 10%
- An "eager" read barrier is preferred over a "lazy" one
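A lazy barrier lets registers and locals hold from-space pointers and forwards them at each use. An eager barrier forwards a pointer once, as soon as it is loaded, so later uses need no barrier. The sketch below counts barrier executions under each scheme. It assumes no object moves during the loop. An eager scheme only stays correct if the collector updates any from-space pointers held in the mutator's registers and stack whenever it moves objects. All names are hypothetical.

```python
class Obj:
    def __init__(self, value):
        self.forward = self   # Brooks forwarding pointer
        self.value = value


class CountingBarrier:
    """Read barrier that counts how many times it executes."""
    def __init__(self):
        self.count = 0

    def forward(self, ref):
        self.count += 1
        return ref.forward


def sum_lazy(refs, uses_per_ref, barrier):
    # Lazy: locals may still hold from-space pointers, so forward at every use.
    total = 0
    for r in refs:
        for _ in range(uses_per_ref):
            total += barrier.forward(r).value
    return total


def sum_eager(refs, uses_per_ref, barrier):
    # Eager: forward once when the pointer is loaded into a local; the local
    # then points into to-space, so later uses skip the barrier.
    total = 0
    for r in refs:
        local = barrier.forward(r)
        for _ in range(uses_per_ref):
            total += local.value
    return total
```

With two objects each used three times, both versions compute the same sum. The lazy version runs the barrier six times and the eager version only twice.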

10 Incremental Mark-Sweep
- Mark/sweep finely interleaved with the mutator
- Write barrier: snapshot-at-the-beginning [Yuasa]
  - Ensures no objects are lost
  - Treats objects in the write buffer as roots
- Read barrier ensures consistency
  - Marker always traces the correct object
  - Simpler interleaving
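A minimal sketch of incremental marking with a Yuasa-style snapshot-at-the-beginning write barrier follows. Before a pointer field is overwritten, the barrier shades its old target by pushing it onto the gray list, which plays the role of the write buffer treated as roots. This keeps everything reachable at the start of collection from being lost, even if the mutator hides it behind an already-marked object. Objects allocated during marking would normally be allocated marked, which this sketch omits. All names are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = {}
        self.marked = False


class IncrementalMarker:
    def __init__(self, roots):
        self.gray = list(roots)      # roots at GC start; also receives barrier entries
        self.marking = True

    def write_barrier(self, obj, field, new_value):
        # Yuasa: shade the OLD target before overwriting it, so the snapshot
        # of reachable objects taken at GC start is fully traced.
        old = obj.fields.get(field)
        if self.marking and old is not None and not old.marked:
            self.gray.append(old)
        obj.fields[field] = new_value

    def step(self, budget):
        """Do at most `budget` units of marking work; return True when marking is done."""
        while budget > 0 and self.gray:
            o = self.gray.pop()
            if not o.marked:
                o.marked = True
                self.gray.extend(c for c in o.fields.values()
                                 if c is not None and not c.marked)
            budget -= 1
        if not self.gray:
            self.marking = False
        return not self.marking
```

In the scenario below, root `r` is marked first. The mutator then copies a pointer to `b` into `r` and deletes the only other path to `b`. Without the barrier, `b` would never be traced.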

11 Pointer Fix-up During Mark
- When can a moved object be freed?
  - When there are no more pointers to it
- The mark phase updates pointers
  - It redirects forwarded pointers as it marks them
- An object moved in collection n can be freed at the end of the mark phase of collection n+1
(Figure: from-space and to-space, with objects X, Y, Z, A, and the copy A′.)
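A minimal sketch of fixing up pointers during marking follows. The tracer checks whether each field it visits points at an object whose forwarding pointer no longer refers to itself. If so, it rewrites the field to the to-space copy. Once marking finishes, nothing reachable refers to the from-space original, which is why it can be freed only after the next mark phase. Object layout and names are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.forward = self      # Brooks forwarding pointer
        self.fields = {}
        self.marked = False


def evacuate(obj):
    """Copy obj to to-space (collection n) and redirect the old copy."""
    copy = Obj(obj.name)
    copy.fields = dict(obj.fields)
    obj.forward = copy
    return copy


def mark_and_fix(roots):
    """Mark from roots (collection n+1), redirecting stale pointers as we go."""
    roots[:] = [r.forward for r in roots]        # fix root pointers too
    stack = list(roots)
    while stack:
        o = stack.pop()
        if o.marked:
            continue
        o.marked = True
        for name in list(o.fields):
            child = o.fields[name]
            if child is None:
                continue
            if child.forward is not child:       # stale pointer to an evacuated object
                child = child.forward
                o.fields[name] = child           # fix up the pointer in place
            if not child.marked:
                stack.append(child)
```

After `mark_and_fix`, every field that pointed at `A` points at `A′`, and `A` itself is never marked. The sweep can therefore reclaim it.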

12 Arraylets
- Large arrays create problems
  - They fragment the memory space
  - They cannot be moved in a short, bounded time
- Solution: break large arrays into arraylets
  - Access via indirection; move one arraylet at a time
(Figure: an array split into arraylets A1, A2, A3.)
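A minimal sketch of arraylet indexing follows. It assumes a large array is stored as a spine of fixed-size chunks. Element `i` lives in chunk `i // arraylet_size` at offset `i % arraylet_size`, which adds one indirection per access. Relocating the array then means copying one chunk at a time, so each step's cost is bounded by the arraylet size rather than the array length. The class and method names are hypothetical.

```python
class ArrayletArray:
    """A large array stored as a spine of fixed-size chunks (arraylets)."""

    def __init__(self, length, arraylet_size, fill=0):
        self.length = length
        self.arraylet_size = arraylet_size
        n = -(-length // arraylet_size)          # ceiling division: number of arraylets
        self.spine = [[fill] * arraylet_size for _ in range(n)]

    def _locate(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return divmod(i, self.arraylet_size)     # (arraylet index, offset within it)

    def __getitem__(self, i):
        chunk, off = self._locate(i)
        return self.spine[chunk][off]

    def __setitem__(self, i, value):
        chunk, off = self._locate(i)
        self.spine[chunk][off] = value

    def move_arraylet(self, chunk):
        """Relocate one arraylet: copy it and update its spine entry.
        The work is bounded by arraylet_size, not by the array length."""
        self.spine[chunk] = list(self.spine[chunk])
```

For a 10-element array with 4-element arraylets, the spine has three entries and element 9 is at offset 1 of the third arraylet.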

13 Program Start
(Figure: stack and heap; the heap uses one size class only.)

14 Program Is Allocating
(Figure: stack and heap; legend: free, allocated.)

15 GC Starts
(Figure: stack and heap; legend: free, unmarked.)

16 Program Allocating and GC Marking
(Figure: stack and heap; legend: free, unmarked, marked or allocated.)

17 Sweeping Away Blocks
(Figure: stack and heap; legend: free, unmarked, marked or allocated.)

18 GC Moving Objects and Installing Redirection
(Figure: stack and heap; legend: free, evacuated, allocated.)

19 2nd GC Starts Tracing and Redirection Fixup
(Figure: stack and heap; legend: free, evacuated, unmarked, marked or allocated.)

20 2nd GC Complete
(Figure: stack and heap; legend: free, allocated.)

21 Scheduling the Collector
- Scheduling issues
  - Poor CPU utilization and space usage
  - Loose coupling between program and collector
- Competing options
  - Time-based: run the collector for $C_T$ seconds whenever the mutator has run for $Q_T$ seconds
  - Work-based: have the collector perform $C_W$ units of work whenever the mutator allocates $Q_W$ bytes
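The time-based option can be evaluated with minimum mutator utilization. $\mathrm{MMU}(w)$ is the smallest fraction of any window of length $w$ during which the mutator runs. The sketch below builds a strictly alternating schedule from $Q_T$ and $C_T$ and computes MMU by brute force over window start times. It assumes integer time ticks, which keeps the minimum exact. The function names are hypothetical.

```python
def time_based_schedule(q_t, c_t, total):
    """Alternate mutator slices of q_t ticks with collector slices of c_t ticks."""
    intervals, t = [], 0
    while t < total:
        end = min(t + q_t, total)
        intervals.append((t, end, "mutator"))
        t = end
        if t < total:
            end = min(t + c_t, total)
            intervals.append((t, end, "collector"))
            t = end
    return intervals


def mutator_time(intervals, lo, hi):
    """Ticks of mutator execution inside the window [lo, hi)."""
    return sum(max(0, min(e, hi) - max(s, lo))
               for s, e, who in intervals if who == "mutator")


def mmu(intervals, window):
    """Minimum mutator utilization over all windows of the given length."""
    total = intervals[-1][1]
    return min(mutator_time(intervals, t, t + window) / window
               for t in range(0, total - window + 1))
```

With $Q_T = C_T = 10$ ticks, the utilization is 0.5 for windows of 20 ticks or longer multiples of the period. It drops to 1/3 for 30-tick windows and to 0 for windows no longer than a single collector slice. So the time quantum bounds how short a window can still guarantee useful mutator progress.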

22 Scheduling: Time-Based vs. Work-Based
Time-based:
- Very predictable mutator utilization
- Memory allocation does not need to be monitored
Work-based:
- Uneven mutator utilization due to bursty allocation
- Memory allocation rates need to be monitored to make sure real-time performance is obtained
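The uneven utilization of work-based scheduling can be illustrated with a toy model. Assume that in each tick the mutator runs for a fixed time and allocates some number of bytes, and that every $Q_W$ bytes allocated trigger $C_W$ time units of collector work in the same tick. A burst of allocation then produces a burst of collection and a dip in utilization. The model and its names are hypothetical simplifications.

```python
def work_based_utilization(alloc_per_tick, q_w, c_w, mutator_time):
    """Hypothetical model: each tick the mutator runs for `mutator_time` and
    allocates alloc_per_tick[i] bytes; every q_w bytes allocated triggers
    c_w time units of collector work in that tick."""
    pending = 0          # bytes allocated since the last trigger
    utils = []
    for allocated in alloc_per_tick:
        pending += allocated
        triggers, pending = divmod(pending, q_w)
        gc_time = triggers * c_w
        utils.append(mutator_time / (mutator_time + gc_time))
    return utils
```

With steady allocation of 100 bytes per tick, $Q_W = 100$, and $C_W = 5$, each tick's utilization is $10/15 \approx 0.67$. A single tick that allocates 1000 bytes drops to $10/60 \approx 0.17$. A time-based schedule would instead hold every tick at $Q_T/(Q_T+C_T)$ regardless of allocation.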

23 Experimental Results
- IBM RS/6000 Enterprise Server F80, AIX 5.1
- 500 MHz PowerPC RS64 III, 4 GB RAM, 4 MB L2 cache
- Jikes Research Virtual Machine (RVM) 2.1.1
- Adaptive compilation disabled

24 Pause Time Distribution for javac (Time-Based vs. Work-Based)

25 Utilization vs. Time for javac (Time-Based vs. Work-Based)

26 Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

27 Space Usage for javac (Time-Based vs. Work-Based)

28 Conclusions
- The Metronome provides true real-time GC
- First collector to do so without major sacrifice
  - Short pauses (12.4 ms)
  - Copying limited to 4% overhead
  - High MMU during collection (50%)
  - Low memory consumption (2.5× maximum live data)
- Critical features
  - Time-based scheduling
  - Hybrid, mostly non-copying approach
  - Integration with the compiler

29 Discussion
- What are the downsides of incremental real-time collection?
- What does Metronome preserve that Baker's algorithm does not?
- Was the architecture used for the experiments appropriate?
- Were the performance characteristics adequately explored?

