
1 A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization
David Bacon, Perry Cheng (presenting), V.T. Rajan
IBM T.J. Watson Research

2 Roadmap
- What is Real-Time Garbage Collection? Pause time, CPU utilization (MMU), and space usage
- Heap Architecture: types of fragmentation, incremental compaction
- Read Barriers: barrier performance
- Scheduling: Time-Based vs. Work-Based
- Empirical Results: pause time distribution, minimum mutator utilization (MMU), pause times
- Summary and Conclusion

3 Problem Domain
- Real-time embedded systems
- Memory usage important
- Uniprocessor

4 Three Styles of Uniprocessor Garbage Collection: Stop-the-World vs. Incremental vs. Real-Time
[Figure: GC activity over time for the STW, Incremental, and Real-Time collectors]

5 Pause Times (Average and Maximum)
[Figure: average and maximum pause times for the STW, Incremental, and Real-Time collectors]

6 Coarse-Grained Utilization vs. Time
[Figure: utilization over time for the STW, Incremental, and Real-Time collectors, measured with a 2.0 s window]

7 Fine-Grained Utilization vs. Time
[Figure: utilization over time for the STW, Incremental, and Real-Time collectors, measured with a 0.4 s window]

8 Minimum Mutator Utilization (MMU)
[Figure: MMU curves for the STW, Incremental, and Real-Time collectors]

9 Space Usage over Time
[Figure: heap space over time, annotated with the max live size, the collection trigger, and 2X max live]

10 Problems with Existing RT Collectors
[Figure: space usage of a non-moving collector and a replicating collector, plotted against 1X-4X max live]
- Not fully incremental
- Tight coupling
- Work-based scheduling

11 Our Collector
Goals and results:
- Real-time: ~10 ms
- Low space overhead: ~2X
- Good utilization during GC: ~40%
Solution:
- Incremental mark-sweep collector
- Write barrier: snapshot-at-the-beginning [Yuasa]
- Segregated free-list heap architecture
- Read barrier to support defragmentation [Brooks]
- Incremental defragmentation
- Segmented arrays to bound fragmentation
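To make the write-barrier bullet concrete, here is a minimal Java sketch of a Yuasa-style snapshot-at-the-beginning barrier, assuming objects are modeled as arrays of reference slots; the class and field names (Collector, markBuffer, tracing) are illustrative assumptions, not the paper's code. While the collector is tracing, the barrier records the overwritten pointer so every object reachable at the start of the collection still gets marked.

    // Minimal sketch (not the paper's implementation) of a snapshot-at-the-beginning
    // write barrier: the old value of every overwritten pointer field is preserved
    // for marking while a collection cycle is in progress. Uniprocessor sketch,
    // so no synchronization is shown.
    final class Collector {
        static volatile boolean tracing;        // true while the collector is marking
        static final java.util.ArrayDeque<Object> markBuffer = new java.util.ArrayDeque<>();

        // Called by the mutator in place of the plain store obj[fieldIndex] = newValue.
        static void writeBarrier(Object[] obj, int fieldIndex, Object newValue) {
            if (tracing) {
                Object old = obj[fieldIndex];
                if (old != null) {
                    markBuffer.push(old);       // keep the snapshot target reachable
                }
            }
            obj[fieldIndex] = newValue;         // the actual store
        }
    }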

12 Roadmap (repeated from slide 2)

13 Fragmentation and Compaction
- Intuitively: available but unusable memory
  - Avoidance and coalescing: no guarantees
  - Compaction
[Figure: heap diagram with used, needed, and free space]

14 Heap Architecture
- Segregated free lists:
  - heap divided into pages
  - each page has equal-sized blocks (1 object per block)
  - large arrays are segmented
[Figure: used and free pages carved into size-24 and size-32 blocks, illustrating external, internal, and page-internal fragmentation]
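As a rough sketch of this layout (all names and constants below are assumptions, not the paper's): the heap is a set of fixed-size pages, each page is dedicated to a single block size, and an allocation request is served from the free list of the smallest size class that fits.

    // Hypothetical sketch of a segregated free-list heap: pages of equal-sized blocks,
    // one free list per size class. Large arrays would be handled separately as segments.
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;

    final class SegregatedHeap {
        static final int PAGE_SIZE = 16 * 1024;                       // assumed page size
        static final int[] SIZE_CLASSES = {24, 32, 40, 48, 64, 80, 104, 128, /* ... */ 2048};

        private final List<ArrayDeque<byte[]>> freeLists = new ArrayList<>();

        SegregatedHeap() {
            for (int i = 0; i < SIZE_CLASSES.length; i++) freeLists.add(new ArrayDeque<>());
        }

        // Round the request up to the smallest size class that fits; if that class's
        // free list is empty, carve a fresh page into blocks of that one size.
        byte[] allocate(int bytes) {
            int k = 0;
            while (k < SIZE_CLASSES.length && SIZE_CLASSES[k] < bytes) k++;
            if (k == SIZE_CLASSES.length)
                throw new IllegalArgumentException("too large: allocate as a segmented array");
            if (freeLists.get(k).isEmpty()) {
                int blocksPerPage = PAGE_SIZE / SIZE_CLASSES[k];      // page tail is page-internal waste
                for (int i = 0; i < blocksPerPage; i++)
                    freeLists.get(k).push(new byte[SIZE_CLASSES[k]]); // each byte[] models one block
            }
            return freeLists.get(k).pop();
        }
    }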

15 Controlling Internal and Page-Internal Fragmentation
- Choose the page size (page) and the block sizes (s_k)
- If s_k = s_{k-1}(1 + ρ), internal fragmentation ≤ ρ
- Page-internal fragmentation ≤ s_max / page
- E.g. with page = 16 KB, ρ = 1/8, and s_max = 2 KB, the maximum non-external fragmentation is limited to 12.5%
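A small worked check of these bounds with the slide's numbers (a sketch only; the derivation is just arithmetic on the stated parameters):

    // Worked check of the fragmentation bounds with page = 16 KB, rho = 1/8, s_max = 2 KB.
    public class FragmentationBound {
        public static void main(String[] args) {
            double rho = 1.0 / 8.0;     // block sizes grow geometrically: s_k = s_{k-1} * (1 + rho)
            int page = 16 * 1024;       // page size in bytes
            int sMax = 2 * 1024;        // largest block size in bytes

            // Internal: a request of n bytes lands in a class s_k with
            // s_{k-1} < n <= s_k = s_{k-1}(1 + rho), so the waste is < rho * n.
            double internalBound = rho;                      // 12.5%

            // Page-internal: the unusable tail of a page is smaller than its block
            // size, which is at most s_max, so the wasted fraction is < s_max / page.
            double pageInternalBound = (double) sMax / page; // 2048 / 16384 = 12.5%

            System.out.printf("internal <= %.1f%%, page-internal <= %.1f%%%n",
                    100 * internalBound, 100 * pageInternalBound);
        }
    }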

16 Fragmentation in a Small Heap (ρ = 1/8 vs. ρ = 1/2)
[Figure: fragmentation for ρ = 1/8 versus ρ = 1/2]

17 Incremental Compaction
- Compact only a part of the heap
  - Requires knowing what to compact ahead of time
- Key problems:
  - Popular objects
  - Determining references to moved objects

18 Incremental Compaction: Redirection
- Access all objects via per-object redirection pointers
- Redirection is initially self-referential
- Move an object by updating ONE redirection pointer
[Figure: original and replica objects linked by the redirection pointer]
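A rough Java model of the redirection mechanism (class and field names are assumptions): every object carries one extra reference that starts out pointing at the object itself, and moving the object is a copy followed by a single pointer update.

    // Hypothetical model of per-object redirection (Brooks-style forwarding) pointers.
    class RedirectedObject {
        RedirectedObject redirect = this;   // initially self-referential
        Object[] fields;                    // the object's payload slots

        RedirectedObject(int slots) { fields = new Object[slots]; }
    }

    final class Defragmenter {
        // Evacuate obj into a fresh block: copy the payload, then publish the move
        // by updating ONE pointer; all barriered accesses now reach the replica.
        static RedirectedObject move(RedirectedObject obj) {
            RedirectedObject replica = new RedirectedObject(obj.fields.length);
            System.arraycopy(obj.fields, 0, replica.fields, 0, obj.fields.length);
            obj.redirect = replica;
            return replica;
        }
    }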

19 Consistency via Read Barrier [Brooks]
- Correctness requires always using the replica
- E.g. field selection must be modified:
  - normal access: x[offset]
  - read barrier access: x[redirect][offset]

20 Some Important Details
- Our read barrier is decoupled from collection
- Complication: in Java, any reference might be null, so the actual read barrier for GetField(x, offset) must be augmented:
  tmp = x[offset]; return (tmp == null) ? null : tmp[redirect]
  - Optimizations: CSE, code motion (LICM and sinking), null-check combining
- Barrier variants (when to redirect):
  - lazy: easier for the collector
  - eager: better for optimization
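Using the same hypothetical RedirectedObject model as in the sketch above (so again a sketch, not the compiler's emitted code), the two variants can be written out as follows; the eager form is exactly the slide's snippet with the null check in place.

    // Barrier sketches over the hypothetical RedirectedObject model shown earlier.
    final class ReadBarriers {
        // Eager variant: redirect a reference as soon as it is loaded, guarding against
        // null (the slide's tmp = x[offset]; return (tmp == null) ? null : tmp[redirect]).
        static RedirectedObject getField(RedirectedObject x, int offset) {
            RedirectedObject tmp = (RedirectedObject) x.fields[offset];
            return (tmp == null) ? null : tmp.redirect;
        }

        // Lazy variant: leave loaded references unredirected and follow the
        // redirection pointer of the base object at each use.
        static Object getFieldLazily(RedirectedObject x, int offset) {
            return x.redirect.fields[offset];
        }
    }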

21 Barrier Overhead to Mutator
- Conventional wisdom says read barriers are too expensive
  - Studies found overheads of 20-40% (Zorn, Nielsen)
  - Our barrier has 4-6% overhead with optimizations

22 Program start [diagram: heap (one size only) and stack]

23 Program is allocating [heap blocks: free, allocated]

24 GC starts [heap blocks: free, unmarked]

25 Program allocating and GC marking [heap blocks: free, unmarked, marked or allocated]

26 Sweeping away blocks [heap blocks: free, unmarked, marked or allocated]

27 GC moving objects and installing redirection [heap blocks: free, allocated, evacuated]

28 2nd GC starts tracing and redirection fixup [heap blocks: free, unmarked, evacuated, marked or allocated]

29 2nd GC complete [heap blocks: free, allocated]

30 Roadmap (repeated from slide 2)

31 Scheduling the Collector
- Scheduling issues:
  - bad CPU utilization and space usage
  - loose program and collector coupling
- Time-based: trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds
- Work-based: trigger the collector to perform C_W work whenever the program allocates Q_W bytes
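As a loose model of the two policies (a sketch with assumed names, not the collector's actual scheduler): time-based alternates fixed time quanta between mutator and collector, while work-based charges C_W units of collection work for every Q_W bytes allocated.

    // Hypothetical sketches of the two scheduling triggers.
    final class TimeBasedTrigger {
        final double qT;   // Q_T: the mutator runs for this many seconds...
        final double cT;   // C_T: ...then the collector runs for this many seconds
        TimeBasedTrigger(double qT, double cT) { this.qT = qT; this.cT = cT; }

        // The collector becomes due once the mutator has used up its quantum.
        double collectorBudget(double mutatorSecondsSinceLastCollect) {
            return mutatorSecondsSinceLastCollect >= qT ? cT : 0.0;
        }
    }

    final class WorkBasedTrigger {
        final long qW;     // Q_W: allocation quantum in bytes
        final long cW;     // C_W: collection work owed per quantum
        private long allocatedSinceLastIncrement;
        WorkBasedTrigger(long qW, long cW) { this.qW = qW; this.cW = cW; }

        // Called from the allocator; returns how much collection work is now owed.
        long onAllocation(long bytes) {
            allocatedSinceLastIncrement += bytes;
            long owed = 0;
            while (allocatedSinceLastIncrement >= qW) {
                allocatedSinceLastIncrement -= qW;
                owed += cW;
            }
            return owed;
        }
    }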

32 Time-Based Scheduling
- Trigger the collector to run for C_T seconds whenever the program runs for Q_T seconds
[Figures: space (MB) vs. time (s), and MMU (CPU utilization) vs. window size (s)]

33 Work-Based Scheduling
- Trigger the collector to collect C_W bytes whenever the program allocates Q_W bytes
[Figures: space (MB) vs. time (s), and MMU (CPU utilization) vs. window size (s)]

34 Roadmap (repeated from slide 2)

35 Pause Time Distribution for javac (Time-Based vs. Work-Based)
[Figure: pause time distributions; 12 ms is marked]

36 Utilization vs. Time for javac (Time-Based vs. Work-Based)
[Figure: utilization (0 to 1.0) vs. time (s) for both schedulers; the 0.45 utilization level is marked]

37 Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

38 Space Usage for javac (Time-Based vs. Work-Based)

39 Intrinsic Tradeoff
- Three inter-related factors:
  - Space bound (tradeoff)
  - Utilization (tradeoff)
  - Allocation rate (lower is better)
- Other factors:
  - Collection rate (higher is better)
  - Pointer density (lower is better)

40 Summary: Mostly Non-Moving RT GC
- Read barriers
  - Permit incremental defragmentation
  - Overhead is 4-6% with compiler optimizations
- Low space overhead
  - Space usage is only about 2X max live data
  - Fragmentation still bounded
- Consistent utilization
  - Always at least 45% at 12 ms resolution

41 Conclusions
- Real-time GC is real
- There are tradeoffs, just like in traditional GC
- Scheduling should be primarily time-based
  - Fall back to work-based when the user's parameter estimates are incorrect
- Incremental defragmentation is possible
- Compiler support is important!

42 Future Work
- Lowering the real-time resolution
  - Sub-millisecond worst-case pause
  - Main issue: breaking up the stack scan
- Segmented array optimizations
  - Reduce segmented array cost below ~2%
  - Opportunistic contiguous layout
  - Type-based specialization with invalidation
  - Strip-mining

