The Garbage Collection Advantage: Improving Program Locality


1 The Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Presented by Na Meng
Many thanks to the authors and to the anonymous speaker in the MM course last time

2 Motivation
Memory gap problem
OO programs exacerbate the memory gap problem
–Automatic memory management
  Pointer data structures
Goal: improve OO program locality

3 Opportunity
A copying garbage collector reorders objects at runtime

4 Copying of Linked Objects
[Figure: a graph of seven linked objects, numbered 1 to 7, and the order in which a breadth-first copy lays them out]

5 Copying of Linked Objects
[Figure: the same seven-object graph, comparing the breadth-first and depth-first copy orders]

6 Copying of Linked Objects
[Figure: the same seven-object graph, comparing the breadth-first, depth-first, and Online Object Reordering copy orders]
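
The difference between the two baseline copy orders on slides 4 to 6 can be sketched in a few lines of Java. This is only an illustration, not the collector from the talk: `Node`, `CopyOrder`, and `sampleTree` are hypothetical names, the seven-node tree is an assumed shape (the exact graph in the figure is not recoverable from the transcript), and a real copying collector would use forwarding pointers to handle shared objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical heap object: an id plus outgoing references.
class Node {
    final int id;
    final List<Node> children = new ArrayList<>();
    Node(int id) { this.id = id; }
}

class CopyOrder {
    // Breadth-first: a FIFO work queue, so siblings end up adjacent.
    static List<Integer> breadthFirst(Node root) {
        List<Integer> toSpace = new ArrayList<>();
        Deque<Node> queue = new ArrayDeque<>();
        queue.addLast(root);
        while (!queue.isEmpty()) {
            Node n = queue.removeFirst();
            toSpace.add(n.id);
            for (Node c : n.children) queue.addLast(c);
        }
        return toSpace;
    }

    // Depth-first: a LIFO stack, so a parent is followed by its first child.
    static List<Integer> depthFirst(Node root) {
        List<Integer> toSpace = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            toSpace.add(n.id);
            // Push in reverse so the first child is popped (copied) next.
            for (int i = n.children.size() - 1; i >= 0; i--) {
                stack.push(n.children.get(i));
            }
        }
        return toSpace;
    }

    // Assumed example: a complete binary tree with nodes 1..7 (no sharing).
    static Node sampleTree() {
        Node[] n = new Node[8];
        for (int i = 1; i <= 7; i++) n[i] = new Node(i);
        n[1].children.add(n[2]); n[1].children.add(n[3]);
        n[2].children.add(n[4]); n[2].children.add(n[5]);
        n[3].children.add(n[6]); n[3].children.add(n[7]);
        return n[1];
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(sampleTree())); // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(depthFirst(sampleTree()));   // [1, 2, 4, 5, 3, 6, 7]
    }
}
```

Neither order looks at how the program actually traverses the objects, which is the gap that Online Object Reordering targets.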

7 Outline
Motivation
Online Object Reordering (OOR)
Methodology
Experimental Results
Conclusion

8 Online Object Reordering
Where are the cache misses?
How to identify hot field accesses at runtime?
How to reorder the objects?

9 Where Are The Cache Misses?
[Figure: heap structure (not to scale), with regions labeled VM Objects, Stack, Older Generation, and Nursery]

10 Where Are The Cache Misses?

11 Where Are The Cache Misses?
Two opportunities to reorder objects in the older generation
–Promote nursery objects
–Full heap collection

12 How to Find Hot Fields?
Runtime info (intercept every read)?
Compiler analysis?
Runtime information + compiler analysis
Key: low-overhead estimation

13 Which Classes Need Reordering?
Step 1: Compiler analysis
–Excludes cold basic blocks
–Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
–Marks the field accesses in hot methods as hot

14 Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b; …
  } catch(Exception e){
    …a.c
  }
}
Compiler: for a hot BB, collect access info; for a cold BB, ignore
Compiler Access List: 1. A.b 2. …. ….

15 Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b; …
  } catch(Exception e){
    …a.c
  }
}
Adaptive sampling: Foo is hot
Foo Accesses: 1. A.b 2. …. ….
A.b is hot
[Figure: object A with fields b and c, where b refers to object B, alongside A's type information]
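
The two steps on slides 13 to 15 can be sketched as a small lookup structure. This is a minimal sketch under stated assumptions, not the Jikes RVM implementation: `HotFieldDatabase`, `recordAccess`, `methodBecameHot`, and `isHot` are hypothetical names, and method and class identities are simplified to strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the access info database on the slides.
class HotFieldDatabase {
    // method name -> "Class.field" accesses found outside cold basic blocks
    private final Map<String, List<String>> accessLists = new HashMap<>();
    // class name -> fields currently marked hot
    private final Map<String, Set<String>> hotFields = new HashMap<>();

    // Step 1 (compiler analysis): keep accesses from non-cold blocks only.
    void recordAccess(String method, String className, String field, boolean inColdBlock) {
        if (inColdBlock) return; // e.g. a.c inside the catch block is ignored
        accessLists.computeIfAbsent(method, m -> new ArrayList<>())
                   .add(className + "." + field);
    }

    // Step 2 (adaptive sampling): when a method is found hot,
    // every access on its list is marked hot in the class's info.
    void methodBecameHot(String method) {
        for (String access : accessLists.getOrDefault(method, List.of())) {
            int dot = access.lastIndexOf('.');
            hotFields.computeIfAbsent(access.substring(0, dot), c -> new HashSet<>())
                     .add(access.substring(dot + 1));
        }
    }

    // Queried by the collector when it decides the copy order.
    boolean isHot(String className, String field) {
        return hotFields.getOrDefault(className, Set.of()).contains(field);
    }

    public static void main(String[] args) {
        HotFieldDatabase db = new HotFieldDatabase();
        db.recordAccess("Foo", "A", "b", false); // try block
        db.recordAccess("Foo", "A", "c", true);  // catch block (cold)
        db.methodBecameHot("Foo");
        System.out.println(db.isHot("A", "b") + " " + db.isHot("A", "c")); // true false
    }
}
```

The per-access cost is paid at compile time and at sampling time only, which is in the spirit of the low-overhead estimate the slides ask for; no read is intercepted while the program runs.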

16 Copying of Linked Objects: Online Object Reordering
[Figure: the seven-object graph copied with the help of type information, with objects placed in a hot space and a cold space]
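
One way to read slide 16 is that the collector follows references stored in hot fields first and defers everything reached through cold fields. The sketch below illustrates that reading only. It assumes a LIFO worklist for hot referents and a FIFO queue for cold ones, and it keeps the hot-field set on each object for brevity where the slides keep it in per-class type information. `Obj`, `HotFirstCopier`, and `sample` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical heap object with named reference fields.
class Obj {
    final int id;
    final Map<String, Obj> fields = new LinkedHashMap<>();
    final Set<String> hotFieldNames = new HashSet<>(); // stand-in for class type info
    boolean copied;                                    // stand-in for a forwarding pointer
    Obj(int id) { this.id = id; }
}

class HotFirstCopier {
    // Returns the order in which objects would be laid out in to-space.
    static List<Integer> copy(Obj root) {
        List<Integer> order = new ArrayList<>();
        Deque<Obj> hot = new ArrayDeque<>();  // LIFO: hot referents follow their parent
        Deque<Obj> cold = new ArrayDeque<>(); // FIFO: cold referents are deferred
        hot.push(root);
        while (!hot.isEmpty() || !cold.isEmpty()) {
            Obj o = !hot.isEmpty() ? hot.pop() : cold.removeFirst();
            if (o.copied) continue;
            o.copied = true;
            order.add(o.id);
            for (Map.Entry<String, Obj> e : o.fields.entrySet()) {
                Obj child = e.getValue();
                if (child == null) continue;
                if (o.hotFieldNames.contains(e.getKey())) hot.push(child);
                else cold.addLast(child);
            }
        }
        return order;
    }

    // Assumed example: field b is hot and field c is cold on objects 1 and 2.
    static Obj sample() {
        Obj o1 = new Obj(1), o2 = new Obj(2), o3 = new Obj(3), o4 = new Obj(4), o5 = new Obj(5);
        o1.fields.put("b", o2); o1.fields.put("c", o3); o1.hotFieldNames.add("b");
        o2.fields.put("b", o4); o2.fields.put("c", o5); o2.hotFieldNames.add("b");
        return o1;
    }

    public static void main(String[] args) {
        System.out.println(copy(sample())); // [1, 2, 4, 3, 5]
    }
}
```

In the sample, the chain 1, 2, 4 reached through the hot field `b` ends up contiguous, while objects 3 and 5, reachable only through `c`, are copied afterwards.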

17 OOR System Overview
[Figure: block diagram. Component labels: Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Hot Methods, Optimizing Compiler, Access Info Database, Register Hot Field Accesses, Advice, GC: Copies Objects. Edge labels: Adds Entries, Look Up, Affects Locality, Improves Locality. Legend: OOR addition, JikesRVM component, Input/Output]

18 Outline
Motivation
Online Object Reordering
Methodology
Experimental Results
Conclusion

19 Virtual Machine
Jikes RVM
–VM written in Java
–High performance
–Timer-based adaptive sampling
–Dynamic optimization
Experiment setup
–Pseudo-adaptive
–2nd iteration [Eeckhout et al.]

20 Memory Management
Memory Management Toolkit (MMTk)
–Allocators and garbage collectors
–Multi-space heap
  Boot image
  Large object space (LOS)
  Immortal space
Experiment setup
–Generational copying GC with 4M bounded nursery

21 Overhead: OOR Analysis Only
Benchmark    Base Execution Time (sec)    w/ only OOR Analysis (sec)    Overhead
jess         4.39                         4.43                           0.84%
jack         5.79                         5.82                           0.57%
raytrace     4.63                         4.61                          -0.59%
mtrt         4.95                         4.99                           0.70%
javac        12.83                        12.70                         -1.05%
compress     8.56                         8.54                           0.20%
pseudojbb    13.39                        13.43                          0.36%
db           18.88                        [one time value missing in the transcript]    -0.03%
antlr        0.94                         0.91                          -2.90%
hsqldb       160.56                       158.46                        -1.30%
ipsixql      41.62                        42.43                          1.93%
jython       37.71                        37.16                         -1.44%
ps-fun       129.24                       128.04                        -1.03%
Mean                                                                    -0.19%

22 Detailed Experiments
Separate application and GC time
Vary thresholds for method heat
Vary thresholds for cold basic blocks
Three architectures
–x86, AMD, PowerPC
x86 performance counters:
–DL1, trace cache, L2, DTLB, ITLB

23 Performance: javac

24 Performance: db

25 Performance: jython
Is the improvement significant?

26 Phase Changes

27 Algorithm: Decay Field Heat
DECAY-HEAT(method)
  for each fieldAccess in method do
    if PotentiallyHot(fieldAccess) then
      hotField ← fieldAccess.field
      class ← hotField.instantiatingClass
      class.hasHotField ← true
      for each field in class do
        period ← Now() – class.lastUpdate
        decay ← HI/(HI + period)
        field.heat ← field.heat * decay
        if field.heat < LO then
          field.heat ← 0
      hotField.heat ← HI
      class.lastUpdate ← Now()
Will the latest access pattern erase the earlier access pattern(s)?
m1() {
  for (… …) {
    … …
    a.b = …
  }
}
m2() {
  for (… …) {
    … … = a.c;
  }
}
for (… …) {
  m1(); // GC works
  m2(); // GC works
}
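
A direct Java transcription of DECAY-HEAT helps with the question on this slide. It is a sketch under assumptions: the values of `HI` and `LO` are invented for illustration, the clock is passed in as a parameter so the example is deterministic, and `ClassInfo`, `FieldAccess`, and `HeatDecay` are hypothetical names, not Jikes RVM types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-class record of field heat.
class ClassInfo {
    final Map<String, Double> heat = new LinkedHashMap<>();
    boolean hasHotField;
    double lastUpdate;
    ClassInfo(String... fieldNames) {
        for (String f : fieldNames) heat.put(f, 0.0);
    }
}

// Hypothetical record of one field access found in a method.
class FieldAccess {
    final ClassInfo owner;
    final String field;
    final boolean potentiallyHot;
    FieldAccess(ClassInfo owner, String field, boolean potentiallyHot) {
        this.owner = owner;
        this.field = field;
        this.potentiallyHot = potentiallyHot;
    }
}

class HeatDecay {
    static final double HI = 100.0; // assumed value, not from the slides
    static final double LO = 1.0;   // assumed value, not from the slides

    static void decayHeat(List<FieldAccess> method, double now) {
        for (FieldAccess fa : method) {
            if (!fa.potentiallyHot) continue;
            ClassInfo cls = fa.owner;
            cls.hasHotField = true;
            double period = now - cls.lastUpdate;
            double decay = HI / (HI + period);
            // Cool every field of the class; drop heat below LO to zero.
            for (Map.Entry<String, Double> e : cls.heat.entrySet()) {
                double h = e.getValue() * decay;
                e.setValue(h < LO ? 0.0 : h);
            }
            cls.heat.put(fa.field, HI); // the accessed field becomes fully hot
            cls.lastUpdate = now;
        }
    }

    public static void main(String[] args) {
        ClassInfo a = new ClassInfo("b", "c");
        decayHeat(List.of(new FieldAccess(a, "b", true)), 0.0);     // m1 is hot: a.b
        decayHeat(List.of(new FieldAccess(a, "c", true)), 100.0);   // m2 is hot: a.c
        System.out.println(a.heat); // {b=50.0, c=100.0}
        decayHeat(List.of(new FieldAccess(a, "c", true)), 10000.0); // much later, still a.c
        System.out.println(a.heat); // {b=0.0, c=100.0}
    }
}
```

Under these assumed constants, a newer access pattern does not erase the earlier one at once: `b` keeps half its heat after one period of length `HI`, and falls to zero only after it has gone unaccessed for long enough that its decayed heat drops below `LO`.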

28 OOR without vs. with phase change
Almost all hot fields within an object are visited around the same time.
The standard benchmarks have few, if any, traversal order phases.

29 Copying Advantage (javac)
GenCopy vs. MS
Mutator time? GC time? Total time?

30 A Possible Comparison
GenCopy vs. GenOOR?

31 Discussion
Is there any other way to improve locality while doing copying collection?

32 Questions?
Thank you!

