380C Where are we & where we are going – Managed languages Dynamic compilation Inlining Garbage collection What else can you do when you examine the heap.


1 380C Where are we & where we are going – Managed languages: dynamic compilation, inlining, garbage collection, what else can you do when you examine the heap a lot? – Why you need to care about workloads – Alias analysis – Dependence analysis – Loop transformations – EDGE architectures

2 380C lecture 18: Garbage Collection – Why use garbage collection? – What is garbage? Reachable vs live, stack maps, etc. – Allocators and their collection mechanisms: Semispace, Marksweep, performance comparisons, Mark Region – Incremental age based collection: write barriers (friend or foe?), Generational, Beltway

3 Mark Region and Other Advances in Garbage Collection – Kathryn S. McKinley (University of Texas at Austin), Stephen M. Blackburn (Australian National University) – PLDI’08: Immix: A Mark-Region Collector With Space Efficiency, Fast Collection, and Mutator Performance

4 Isn’t GC a bit retro? “Languages without automated garbage collection are getting out of fashion. The chance of running into all kinds of memory problems is gradually outweighing the performance penalty you have to pay for garbage collection.” Paul Jansen, managing director of TIOBE Software, in Dr Dobbs, April 2008 – Mark-Sweep: McCarthy, 1960 – Mark-Compact: Styger, 1967 – Semi-Space: Cheney, 1970

5 GC Fundamentals: The Time–Space Tradeoff

6 Our Goal

7 GC Fundamentals: Algorithmic Components – Allocation: Bump Allocation, Free List – Identification: Tracing (implicit), Reference Counting (explicit) – Reclamation: Sweep-to-Free, Compact, Evacuate

8 GC Fundamentals: Canonical Garbage Collectors – Mark-Sweep [McCarthy 1960]: free-list + trace + sweep-to-free – Mark-Compact [Styger 1967]: bump allocation + trace + compact – Semi-Space [Cheney 1970]: bump allocation + trace + evacuate
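The two allocation styles named on this slide can be contrasted in a few lines. This is a minimal sketch, not the MMTk implementation: the class names, the first-fit policy, and the toy 64-byte heap are all assumptions made for illustration (production mark-sweep collectors typically use size-segregated free lists).

```python
class BumpAllocator:
    """Contiguous allocation: advance a cursor through one region."""

    def __init__(self, size):
        self.cursor = 0
        self.limit = size

    def alloc(self, nbytes):
        if self.cursor + nbytes > self.limit:
            return None  # region exhausted; a real system would trigger GC
        addr = self.cursor
        self.cursor += nbytes
        return addr


class FreeListAllocator:
    """First-fit allocation from a sorted list of (address, size) holes."""

    def __init__(self, size):
        self.holes = [(0, size)]

    def alloc(self, nbytes):
        for i, (addr, size) in enumerate(self.holes):
            if size >= nbytes:
                if size == nbytes:
                    del self.holes[i]
                else:
                    self.holes[i] = (addr + nbytes, size - nbytes)
                return addr
        return None

    def free(self, addr, nbytes):
        self.holes.append((addr, nbytes))
        self.holes.sort()


bump = BumpAllocator(64)
a, b = bump.alloc(16), bump.alloc(16)   # neighbours in allocation order

fl = FreeListAllocator(64)
x, y, z = fl.alloc(16), fl.alloc(16), fl.alloc(16)
fl.free(y, 16)                          # punch a hole in the middle
w = fl.alloc(8)                         # reuses the hole left by y
```

Bump allocation places objects next to each other in allocation order, which is where the locality advantage comes from; the free list reuses holes in place, which is where the space efficiency comes from.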

9 Mark-Sweep: Free List Allocation + Trace + Sweep-to-Free – ✓ Space efficient ✓ Simple, very fast collection ✗ Poor locality – Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
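The trace and sweep-to-free steps can be sketched on a toy heap. The representation below (a dict from object id to the ids it references) and the function names are assumptions for illustration only.

```python
def mark(heap, roots):
    """Trace from the roots; return the set of reachable object ids."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    return marked


def sweep(heap, marked):
    """Free every unmarked object; survivors are not moved."""
    freed = sorted(obj for obj in heap if obj not in marked)
    for obj in freed:
        del heap[obj]
    return freed


# "c" and "d" reference each other but are unreachable from the roots.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": []}
roots = ["a", "e"]
freed = sweep(heap, mark(heap, roots))
```

Because tracing starts from the roots, the unreachable cycle between "c" and "d" is reclaimed, and the surviving objects stay where they were allocated.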

10 Mark-Compact: Bump Allocation + Trace + Compact – ✓ Space efficient ✓ Good locality ✗ Expensive multi-pass collection – Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo

11 Semi-Space: Bump Allocation + Trace + Evacuation – ✓ Good locality ✗ Space inefficient – Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo
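Evacuation in the style of Cheney's algorithm can be sketched as follows. The heap encoding (a list of objects whose fields are indices into from-space, or None) and the function name are assumptions chosen to keep the example short.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from roots into a fresh to-space.

    from_space: list of objects; each object is a list of fields, and each
    field is an index into from_space or None.
    Returns (to_space, new_roots) with all references updated.
    """
    to_space = []
    forwarding = {}  # from-space index -> to-space index

    def forward(ref):
        if ref is None:
            return None
        if ref not in forwarding:
            forwarding[ref] = len(to_space)          # bump allocate
            to_space.append(list(from_space[ref]))   # copy; fields fixed later
        return forwarding[ref]

    new_roots = [forward(r) for r in roots]
    scan = 0
    while scan < len(to_space):   # to-space doubles as the work queue
        obj = to_space[scan]
        for i, field in enumerate(obj):
            obj[i] = forward(field)
        scan += 1
    return to_space, new_roots


# Objects 1 and 4 are unreachable from the single root (object 2).
from_space = [[2, None], [0], [0, 3], [], [1]]
to_space, new_roots = cheney_collect(from_space, roots=[2])
```

Survivors end up contiguous in to-space, which gives the good locality noted on the slide; the cost is that half of the heap must be held in reserve as the copy target.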

12 Mark-Region with Sweep-To-Region – Reclamation: Sweep-to-Free, Compact, Evacuate, Sweep-to-Region – Mark-Sweep: free-list + trace + sweep-to-free – Mark-Compact: bump allocation + trace + compact – Semi-Space: bump allocation + trace + evacuate – Mark-Region: bump + trace + sweep-to-region

13 Mark-Region: Bump Allocation + Trace + Sweep-to-Region – ✓ Simple, very fast collection ✓ Space efficient ✓ Good locality ✓ Excellent performance – Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo

14 Naïve Mark-Region – Contiguous allocation into regions: excellent locality; for simplicity, objects cannot span regions – Simple mark phase (like mark-sweep): mark objects and their containing region – Unmarked regions can be freed
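The naïve scheme on this slide can be sketched directly: trace as in mark-sweep, set a mark on each live object's region, and free whatever regions stay unmarked. The object table layout and the function name are assumptions for illustration.

```python
def mark_region_collect(objects, roots, num_regions):
    """objects: id -> (region_index, [referent ids]).

    Returns (live_ids, free_regions).
    """
    live = set()
    region_marked = [False] * num_regions
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        region, refs = objects[obj]
        region_marked[region] = True   # mark the containing region as well
        stack.extend(refs)
    free_regions = [r for r, m in enumerate(region_marked) if not m]
    return live, free_regions


objects = {
    "a": (0, ["b"]),
    "b": (0, []),
    "c": (1, []),       # dead, alone in region 1
    "d": (2, ["a"]),
    "e": (2, []),       # dead, but shares region 2 with live "d"
}
live, free_regions = mark_region_collect(objects, roots=["d"], num_regions=4)
```

Region 2 stays unavailable even though "e" is dead, because one live object keeps the whole region marked. That is the false-marking fragmentation that motivates finer-grained line marks.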

15 Immix: Efficient Mark-Region Garbage Collection

16 Lines and Blocks – Small regions vs. large regions: ✗ Fragmentation (can’t fill blocks); ✓ More contiguous allocation; ✗ Fragmentation (false marking); ✓ Less fragmentation; ✓ Fast common case; ✗ Increased metadata o/h; ✗ Constrained object sizes – Lines & Blocks: block = N pages, line ≈ 1 cache line – Objects span lines – Lines marked with objects – TLB locality, cache locality – Block > 4 × max object size – Free, recyclable lines
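Line marks inside a block can be sketched as a small boolean array. The constants below are toy values picked for the example (the slide says only "N pages" and "approx 1 cache line"), and the state name "unavailable" is a label assumed here for a block with no unmarked lines.

```python
LINE_SIZE = 128        # bytes per line; assumed toy value
LINES_PER_BLOCK = 8    # toy value; real blocks span several pages


def mark_lines(line_marks, addr, size):
    """Mark every line touched by an object at block offset addr."""
    first = addr // LINE_SIZE
    last = (addr + size - 1) // LINE_SIZE
    for line in range(first, last + 1):
        line_marks[line] = True


def classify(line_marks):
    """Classify a block by its line marks."""
    if not any(line_marks):
        return "free"
    if all(line_marks):
        return "unavailable"
    return "recyclable"


def free_runs(line_marks):
    """Return (start_line, length) for each run of unmarked lines."""
    runs, start = [], None
    for i, marked in enumerate(line_marks + [True]):  # sentinel ends last run
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


marks = [False] * LINES_PER_BLOCK
mark_lines(marks, addr=0, size=64)      # small object inside line 0
mark_lines(marks, addr=300, size=200)   # object spanning lines 2 and 3
state = classify(marks)
holes = free_runs(marks)
```

The runs of unmarked lines are the holes a bump allocator can reuse within a recyclable block.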

17 Allocation Policy (Recycling) – Recycle partially marked blocks first: minimizes fragmentation; maximizes sharing of freed blocks – Recycle in address order (we explored other options) – Allocate into free blocks last
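The ordering described on this slide can be sketched as a simple sort. The block states and the function name are assumptions; visiting free blocks in address order is also an assumption, since the slide specifies address order only for recycling.

```python
def allocation_order(blocks):
    """blocks: list of (address, state) pairs.

    Returns block addresses in the order an allocator would try them:
    recyclable blocks in address order first, completely free blocks last.
    Blocks with no free lines are never candidates.
    """
    recyclable = sorted(a for a, s in blocks if s == "recyclable")
    free = sorted(a for a, s in blocks if s == "free")
    return recyclable + free


blocks = [
    (0x3000, "free"),
    (0x1000, "recyclable"),
    (0x4000, "unavailable"),
    (0x0000, "free"),
    (0x2000, "recyclable"),
]
order = allocation_order(blocks)
```

Keeping fully free blocks until last leaves them available for other uses, which matches the slide's point about maximizing sharing of freed blocks.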

18 Opportunistic Defragmentation – Identify source and target blocks (see paper for heuristics) – Evacuate objects in source blocks: allocate into target blocks – Opportunistic: leave in place if no space, or object pinned – Opportunistically evacuate fragmented blocks: lightweight, uses same allocation mechanism; no cost in common case (specialized GC)
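The "leave in place if no space, or object pinned" rule can be sketched as a single decision per object. The record fields, the byte budget standing in for the target blocks, and the function name are all assumptions; the paper's heuristics for choosing source and target blocks are not modeled.

```python
def evacuate(objects, source_blocks, target_free_bytes):
    """Try to move live objects out of fragmented source blocks.

    objects: list of dicts with keys 'id', 'block', 'size', 'pinned'.
    Returns (moved_ids, left_in_place_ids).
    """
    moved, stayed = [], []
    for obj in objects:
        if obj["block"] not in source_blocks:
            continue                       # not a defragmentation candidate
        if obj["pinned"] or obj["size"] > target_free_bytes:
            stayed.append(obj["id"])       # opportunistic: mark in place
        else:
            target_free_bytes -= obj["size"]
            obj["block"] = "target"
            moved.append(obj["id"])
    return moved, stayed


objects = [
    {"id": "a", "block": 1, "size": 32, "pinned": False},
    {"id": "b", "block": 1, "size": 16, "pinned": True},
    {"id": "c", "block": 2, "size": 64, "pinned": False},
    {"id": "d", "block": 1, "size": 48, "pinned": False},
]
moved, stayed = evacuate(objects, source_blocks={1}, target_free_bytes=64)
```

Nothing fails when the target space runs out: the object simply stays where it is and is marked like any other survivor.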

19 Other Optimizations – Implicit Marking: small objects implicitly mark next line; large objects mark lines exactly; ✓ Most objects small ✓ Very fast common case – Overflow Allocation: multi-line objects may skip many small holes; overflow allocation (used on failure); ✓ Large objects uncommon ✓ Very effective solution – [Figure legend: implicit line mark, line mark]
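Implicit marking can be sketched as follows: a small object sets only the mark for the line it starts in, and the allocator treats the line after any marked line as occupied. The line size and function names are assumptions, and the threshold "small means at most one line" is an assumption made for this example.

```python
LINE_SIZE = 128  # bytes per line; assumed toy value


def mark_object(line_marks, addr, size):
    """Mark lines for a live object at block offset addr."""
    first = addr // LINE_SIZE
    if size <= LINE_SIZE:
        # Small object: mark the start line only. It may spill into the
        # next line, which is covered by the implicit mark below.
        line_marks[first] = True
    else:
        # Larger object: mark exactly the lines it covers.
        last = (addr + size - 1) // LINE_SIZE
        for line in range(first, last + 1):
            line_marks[line] = True


def usable_lines(line_marks):
    """Unmarked lines that do not directly follow a marked line."""
    return [
        i for i, marked in enumerate(line_marks)
        if not marked and (i == 0 or not line_marks[i - 1])
    ]


marks = [False] * 8
mark_object(marks, addr=100, size=64)    # spills from line 0 into line 1
mark_object(marks, addr=512, size=300)   # covers lines 4, 5 and 6 exactly
usable = usable_lines(marks)
```

The small object never computes its end address during marking, which keeps the common case cheap; the price is one conservatively skipped line at the start of every hole.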

20 Results – Complete data available at: http://cs.anu.edu.au/~Steve.Blackburn/pubs

21 Evaluation – 20 Benchmarks: DaCapo, SPECjvm98, SPEC jbb2000 – Methodology: MMTk; Jikes RVM 2.9.3 (Perf ≈ HotSpot 1.5); replay compiler; discard outliers; report 95th %ile – Collectors: Full Heap (Immix, MarkSweep, MarkCompact, SemiSpace); Generational (GenIX, GenMS, GenCopy); Sticky (StickyIX, StickyMS) – Hardware: Core 2 Duo 2.4GHz, 32KB L1, 4MB L2, 2GB RAM; AMD Athlon 3500+ 2.2GHz, 64KB L1, 512KB L2, 2GB RAM; PowerPC 970 1.6GHz, 32KB L1, 512KB L2, 2GB RAM – Please see the paper for details.

22 Mutator Time – Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

23 Minimum Heap

24 GC Time – Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

25 Total Performance – Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

26 Generational Performance – Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

27 Sticky Performance – Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

28 PseudoJBB 2000 – On 2.4GHz Core 2 Duo

29 PseudoJBB 2000 – On 2.4GHz Core 2 Duo

30 Prior Work – http://www.ibm.com/developerworks/ibm/library/i-garbage1/ – IBM product collector: Mark-Region not characterized; collector not evaluated; product and basis for other research [Domani et al 2000][Kermany & Petrank 2006]

31 Mark-Region Collection – Reclamation: Sweep-to-Free, Compact, Evacuate, Sweep-to-Region – Mark-Sweep: free-list + trace + sweep-to-free – Mark-Compact: bump allocation + trace + compact – Semi-Space: bump allocation + trace + evacuate – Mark-Region: bump allocation + trace + sweep-to-region

32 Immix: Efficient Mark-Region Collection – ✓ Simple, very fast collection ✓ Space efficient ✓ Good locality ✓ Excellent performance – Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo

33 Open Source – Code available in JikesRVM 2.9.3 onward: http://www.jikesrvm.org – Complete data available at: http://cs.anu.edu.au/~Steve.Blackburn/pubs

34 Research History – PLDI 1998: Clinger & Hansen postulated the radioactive decay model for object lifetimes – Genesis of Older-First [Stefanovic, McKinley, Moss OOPSLA’99]
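The radioactive decay model named on this slide can be illustrated numerically. This is a minimal sketch assuming an exponential survival function with a hypothetical half-life parameter; the function name and the values are illustrative and do not come from the slides.

```python
def survival(age, half_life):
    """Fraction of objects still live after `age` units of allocation."""
    return 0.5 ** (age / half_life)


h = 4.0                      # hypothetical half-life
s4 = survival(4.0, h)        # after one half-life
s8 = survival(8.0, h)        # after two half-lives

# Memoryless property: the chance of surviving 4 more units is the same
# for an object of age 8 as for a newly allocated object.
cond = survival(12.0, h) / survival(8.0, h)
```

Under this model an object's age says nothing about its remaining lifetime, so collecting the youngest objects first has no built-in advantage. That is the contrast with the generational hypothesis that this line of work explores.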

35 Garbage Collection Hypotheses – Generational hypothesis: younger objects die quickly, so collect them first – Older-first hypothesis: the collector can collect less the longer it waits – [Figure: survival function s(v) for the object lifetime distribution, plotted over an age-ordered heap from younger to older, with positions 0, 1/2V, and V]

36 Older-first Algorithm

37 Next Steps – Beltway [BJMM PLDI’02]: increments; belts; combines generational and older-first – Ulterior Reference Counting [BM OOPSLA’03]: reference count on a per-object basis; responsiveness and throughput – MMTk [BCM SIGMETRICS’04 ICSE’04]: toolkit for building & understanding GC; motivated today’s work

38 Garbage Collection is the Answer to All Your Problems – Improves data and code locality [Huang et al. OOPSLA’02 ISMM’04, VEE’04] – Cooperative GC optimizations: Colocation [Guyer OOPSLA’05]; Free-me [Guyer et al. PLDI’06] – Finds leaks [Bond ASPLOS’06, Jump POPL’07] – Tolerates leaks [Bond OOPSLA’08] – Helps with dynamic software updating! [Subramanian, Hicks ??’08] – DaCapo Benchmarks [Blackburn et al. OOPSLA’06 CACM’08]

39 380C Where are we & where we are going – Why you need to care about workloads – Managed languages: dynamic compilation, inlining, garbage collection – Opportunity to improve data locality on-the-fly – Read: X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng, The Garbage Collection Advantage: Improving Program Locality, ACM Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 69-80, Vancouver, Canada, October 2004. – Alias analysis – Dependence analysis – Loop transformations – EDGE architectures

