380C lecture 19 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Opportunity to improve data locality.


1 380C Lecture 19
Where are we & where we are going
–Managed languages
  Dynamic compilation
  Inlining
  Garbage collection
  –Opportunity to improve data locality on-the-fly
  –Other opportunities?
–Why you need to care about workloads
–Alias analysis
–Dependence analysis
–Loop transformations
–EDGE architectures
CS380C Lecture 19

2 Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)

3 Today: Advanced Topics
Generational Garbage Collection
Copying objects is an opportunity
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM), “The Garbage Collection Advantage: Improving Program Locality,” OOPSLA 2004.

4 Motivation
Memory gap problem
OO programs are becoming more popular
OO programs exacerbate the memory gap problem
–Automatic memory management
–Pointer data structures
–Many small methods
Goal: improve OO program locality

5 Allocation Mechanisms
Bump-Pointer
–Fast (increment & bounds check)
–Contemporaneous object locality
–Can't incrementally free & reuse: must free en masse

7 Allocation Mechanisms
Bump-Pointer
–Fast (increment & bounds check)
–Contemporaneous object locality
–Can't incrementally free & reuse: must free en masse
Free-List
–Slightly slower (consult list for fit)
–Mystery locality
–Can incrementally free & reuse cells
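The trade-off between the two allocators can be sketched in a few lines of Java. This is only an illustration under stated assumptions: addresses are plain ints into an imaginary heap, the free-list uses a single fixed cell size, and the class and method names (`AllocatorSketch`, `BumpPointer`, `FreeList`) are hypothetical, not MMTk code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of the two allocation mechanisms on the slide.
class AllocatorSketch {

    /** Bump-pointer: one increment and one bounds check per allocation. */
    static final class BumpPointer {
        private final int limit;
        private int cursor;

        BumpPointer(int start, int size) {
            this.cursor = start;
            this.limit = start + size;
        }

        /** Returns the start address of the new cell, or -1 if the region is full. */
        int alloc(int bytes) {
            if (cursor + bytes > limit) return -1; // bounds check
            int result = cursor;
            cursor += bytes;                       // increment
            return result;
        }

        /** Individual cells cannot be freed; the whole region is released at once. */
        void freeAll(int start) { cursor = start; }
    }

    /** Free-list with one fixed cell size (an assumption made to keep the sketch short). */
    static final class FreeList {
        private final Deque<Integer> freeCells = new ArrayDeque<>();

        FreeList(int start, int cellSize, int cellCount) {
            for (int i = 0; i < cellCount; i++) freeCells.addLast(start + i * cellSize);
        }

        /** Consults the list for a free cell; returns -1 if none is left. */
        int alloc() { return freeCells.isEmpty() ? -1 : freeCells.pollFirst(); }

        /** A single cell can be returned and reused immediately. */
        void free(int address) { freeCells.addFirst(address); }
    }

    public static void main(String[] args) {
        BumpPointer bump = new BumpPointer(0, 64);
        int a = bump.alloc(16);
        int b = bump.alloc(16);
        System.out.println("bump: " + a + ", " + b); // objects allocated together sit together

        FreeList list = new FreeList(1000, 16, 4);
        int x = list.alloc();
        int y = list.alloc();
        list.free(x);
        int z = list.alloc();                        // reuses the cell that x occupied
        System.out.println("free-list: " + x + ", " + y + ", " + z);
    }
}
```

The bump-pointer keeps objects allocated at the same time next to each other, while the free-list places a new object wherever a cell happens to be free, which is the "mystery locality" noted on the slide.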

8 Copying Generational GC
State-of-the-art throughput
Requirements
–Write barrier to track inter-generation pointers (remsets, cards)
–Copy reserve
Advantages:
–Minimizes copying of older objects
–Compaction of long-lived objects
Problems:
–Not very incremental
–Very youngest objects always copied
–What order should GC use to copy objects?
etc. etc …
[Heap diagram labels: ‘nursery’, ‘older generation’]
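The write-barrier requirement can be illustrated with a minimal sketch. It assumes a remembered set that stores whole source objects and a single pointer field per object; `WriteBarrierSketch`, `Obj`, and `writeField` are hypothetical names, and a real collector would more likely record slots or cards.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a generational write barrier with a remembered set.
class WriteBarrierSketch {

    static final class Obj {
        final String name;
        final boolean inNursery; // false means the object lives in the older generation
        Obj field;               // one pointer field, to keep the sketch small

        Obj(String name, boolean inNursery) {
            this.name = name;
            this.inNursery = inNursery;
        }
    }

    // Older-generation objects that may hold a pointer into the nursery.
    static final Set<Obj> remset = new HashSet<>();

    /** Every pointer store goes through this method instead of writing src.field directly. */
    static void writeField(Obj src, Obj target) {
        if (!src.inNursery && target != null && target.inNursery) {
            remset.add(src); // remember the old-to-young pointer
        }
        src.field = target;
    }

    public static void main(String[] args) {
        Obj old = new Obj("old", false);
        Obj young = new Obj("young", true);
        writeField(young, new Obj("other", true)); // young-to-young: nothing recorded
        writeField(old, young);                    // old-to-young: recorded
        System.out.println("remset size: " + remset.size());
    }
}
```

With the remembered set in place, a nursery collection can treat its entries as extra roots and avoid scanning the whole older generation.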

9 Opportunity
Generational copying garbage collector reorders objects at runtime

10 Copying of Linked Objects
[Figure: objects 1–7 linked by pointers, and the order in which a breadth-first copy lays them out]

11 Copying of Linked Objects
[Figure: the same objects 1–7 laid out by a breadth-first copy and by a depth-first copy]

12 Copying of Linked Objects
[Figure: the same objects 1–7 laid out by breadth-first copy, depth-first copy, and Online Object Reordering, which places objects 1 and 4 next to each other]
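How the traversal order changes the resulting layout can be sketched with a small worklist. The object graph below is a hypothetical binary tree of seven nodes, not the exact graph from the slides, and the `copied` set stands in for forwarding pointers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the order in which a copying collector would lay out objects.
class CopyOrderSketch {

    static final class Node {
        final int id;
        final List<Node> children = new ArrayList<>();
        Node(int id) { this.id = id; }
    }

    /** Returns object ids in copy order, using a FIFO queue (breadth-first) or a stack (depth-first). */
    static List<Integer> copyOrder(Node root, boolean breadthFirst) {
        List<Integer> order = new ArrayList<>();
        Set<Node> copied = new HashSet<>(); // stands in for forwarding pointers
        Deque<Node> work = new ArrayDeque<>();
        work.addLast(root);
        while (!work.isEmpty()) {
            Node n = breadthFirst ? work.pollFirst() : work.pollLast();
            if (!copied.add(n)) continue;   // already copied via another pointer
            order.add(n.id);
            if (breadthFirst) {
                for (Node c : n.children) work.addLast(c);
            } else {
                // Push in reverse so the first child is popped first.
                for (int i = n.children.size() - 1; i >= 0; i--) work.addLast(n.children.get(i));
            }
        }
        return order;
    }

    /** Assumed example graph: 1 -> (2, 3), 2 -> (4, 5), 3 -> (6, 7). */
    static Node sampleTree() {
        Node[] n = new Node[8];
        for (int i = 1; i <= 7; i++) n[i] = new Node(i);
        n[1].children.add(n[2]); n[1].children.add(n[3]);
        n[2].children.add(n[4]); n[2].children.add(n[5]);
        n[3].children.add(n[6]); n[3].children.add(n[7]);
        return n[1];
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println("breadth-first: " + copyOrder(root, true));  // [1, 2, 3, 4, 5, 6, 7]
        System.out.println("depth-first:   " + copyOrder(root, false)); // [1, 2, 4, 5, 3, 6, 7]
    }
}
```

Breadth-first puts siblings together, while depth-first puts each parent next to its first child; which one helps depends on how the program walks the structure.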

13 Outline
Motivation
Online Object Reordering (OOR)
Methodology
Experimental Results
Conclusion

14 Cache Performance Matters

15 Online Object Reordering
Where are the cache misses?
How to identify hot field accesses at runtime?
How to reorder the objects?

16 Where Are The Cache Misses?
Heap structure (not to scale): VM Objects, Stack, Older Generation, Nursery

17 Where Are The Cache Misses?

18 Where Are The Cache Misses?
Two opportunities to reorder objects in the older generation
–Promote nursery objects
–Full heap collection

19 How to Find Hot Fields?
Runtime info (intercept every read)?
Compiler analysis?
Runtime information + compiler analysis
Key: Low overhead estimation

20 Which Classes Need Reordering?
Step 1: Compiler analysis
–Excludes cold basic blocks
–Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
–Mark field accesses in hot methods as hot
Key: Low overhead estimation

21 Example: Compiler Analysis
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Compiler: hot BB, collect access info; cold BB, ignore
Access List:
1. A.b
2. ….
….

22 Example: Adaptive Sampling
Method Foo {
  Class A a;
  try {
    …=a.b;
    …
  } catch(Exception e){
    …a.c
  }
}
Adaptive Sampling: Foo is hot
Foo Accesses:
1. A.b
2. ….
….
A.b is hot
[Diagram: A’s type information, with fields b … c]
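The two steps (compiler analysis, then adaptive sampling) can be sketched as a small lookup structure. This is a rough illustration only: `HotFieldSketch`, `recordAccess`, and `methodBecameHot` are hypothetical names, and fields are identified by plain strings such as "A.b" rather than by any Jikes RVM type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of hot-field identification in two steps.
class HotFieldSketch {

    // Step 1 output: method name -> fields accessed in its non-cold basic blocks.
    private final Map<String, List<String>> accessList = new HashMap<>();
    // Step 2 output: fields considered hot.
    private final Set<String> hotFields = new HashSet<>();

    /** Step 1: called by the compiler for each field access it sees. */
    void recordAccess(String method, String field, boolean inColdBlock) {
        if (inColdBlock) return; // e.g. an access inside a catch block is ignored
        accessList.computeIfAbsent(method, m -> new ArrayList<>()).add(field);
    }

    /** Step 2: called when sampling reports that a method is hot. */
    void methodBecameHot(String method) {
        hotFields.addAll(accessList.getOrDefault(method, Collections.emptyList()));
    }

    boolean isHot(String field) { return hotFields.contains(field); }

    public static void main(String[] args) {
        HotFieldSketch analysis = new HotFieldSketch();
        analysis.recordAccess("Foo", "A.b", false); // access in the try body
        analysis.recordAccess("Foo", "A.c", true);  // access in the catch block (cold)
        analysis.methodBecameHot("Foo");
        System.out.println("A.b hot? " + analysis.isHot("A.b")); // true
        System.out.println("A.c hot? " + analysis.isHot("A.c")); // false
    }
}
```

No individual field read is instrumented here; the cost is one record per compiled access and one set update per hot method, which matches the slide's emphasis on low-overhead estimation.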

23 Copying of Linked Objects
[Figure: Online Object Reordering uses type information to copy objects 1–7 into a hot space and a cold space]
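One way to picture a collector that follows this advice is a traversal that copies the referents of hot fields before anything else. The sketch below is a simplification under stated assumptions: each object already carries its references split into hot and cold lists, the graph is hypothetical, and the split into separate hot and cold spaces shown on the slide is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: copy referents of hot fields first so they land next to their parent.
class HotFirstCopySketch {

    static final class Obj {
        final int id;
        final List<Obj> hotRefs = new ArrayList<>();  // referents of fields marked hot
        final List<Obj> coldRefs = new ArrayList<>(); // referents of all other fields
        Obj(int id) { this.id = id; }
    }

    /** Returns object ids in copy order; pending hot referents always go before cold ones. */
    static List<Integer> copyOrder(Obj root) {
        List<Integer> order = new ArrayList<>();
        Set<Obj> copied = new HashSet<>(); // stands in for forwarding pointers
        Deque<Obj> hot = new ArrayDeque<>();
        Deque<Obj> cold = new ArrayDeque<>();
        cold.addLast(root);
        while (!hot.isEmpty() || !cold.isEmpty()) {
            Obj o = !hot.isEmpty() ? hot.pollFirst() : cold.pollFirst();
            if (!copied.add(o)) continue;
            order.add(o.id);
            hot.addAll(o.hotRefs);
            cold.addAll(o.coldRefs);
        }
        return order;
    }

    /** Assumed example: 1 has a hot field to 4 and cold fields to 2, 3; 4 has a hot field to 6 and a cold field to 5. */
    static Obj sampleGraph() {
        Obj[] o = new Obj[7];
        for (int i = 1; i <= 6; i++) o[i] = new Obj(i);
        o[1].coldRefs.add(o[2]); o[1].coldRefs.add(o[3]); o[1].hotRefs.add(o[4]);
        o[4].coldRefs.add(o[5]); o[4].hotRefs.add(o[6]);
        return o[1];
    }

    public static void main(String[] args) {
        System.out.println(copyOrder(sampleGraph())); // [1, 4, 6, 2, 3, 5]
    }
}
```

In this example object 4 is copied directly after object 1 because the field linking them is hot, whereas a plain breadth-first copy would have placed objects 2 and 3 in between.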

24 OOR System Overview
[Diagram; legend: JikesRVM component, OOR addition, Input/Output]
Labels: Source Code, Baseline Compiler, Executing Code, Adaptive Sampling, Hot Methods, Optimizing Compiler, Access Info Database, Adds Entries, Look Up, Register Hot Field Accesses, Advice, GC: Copies Objects, Affects Locality, Improves Locality

25 Outline
Motivation
Online Object Reordering
Methodology
Experimental Results
Conclusion

26 Methodology: Virtual Machine
Jikes RVM
–VM written in Java
–High performance
–Timer-based adaptive sampling
–Dynamic optimization
Experiment setup
–Pseudo-adaptive
–2nd iteration [Eeckhout et al.]

27 Methodology: Memory Management
Memory Management Toolkit (MMTk):
–Allocators and garbage collectors
–Multi-space heap
  Boot image
  Large object space (LOS)
  Immortal space
Experiment setup
–Generational copying GC with 4M bounded nursery

28 Overhead: OOR Analysis Only
Benchmark    Base Execution Time (sec)    w/ only OOR Analysis (sec)    Overhead
jess           4.39                         4.43                         0.84%
jack           5.79                         5.82                         0.57%
raytrace       4.63                         4.61                        -0.59%
mtrt           4.95                         4.99                         0.70%
javac         12.83                        12.70                        -1.05%
compress       8.56                         8.54                         0.20%
pseudojbb     13.39                        13.43                         0.36%
db            18.88                                                     -0.03%
antlr          0.94                         0.91                        -2.90%
hsqldb       160.56                       158.46                        -1.30%
ipsixql       41.62                        42.43                         1.93%
jython        37.71                        37.16                        -1.44%
ps-fun       129.24                       128.04                        -1.03%
Mean                                                                    -0.19%
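The overhead column is presumably the relative change in execution time, (time with analysis − base time) / base time. That formula is an assumption, not something the slide states, and `OverheadSketch` is a hypothetical name. Recomputing from the two-decimal timings shown will not reproduce the table's last digits exactly, which suggests the slide's percentages came from unrounded measurements.

```java
// Hypothetical sketch of the overhead calculation, assuming relative change versus the base run.
class OverheadSketch {

    /** Percentage change in run time; negative means the run with analysis was faster. */
    static double overheadPercent(double baseSeconds, double withAnalysisSeconds) {
        return (withAnalysisSeconds - baseSeconds) / baseSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Timings for jess and javac as printed (rounded) on the slide.
        System.out.println(String.format("jess:  %.2f%%", overheadPercent(4.39, 4.43)));
        System.out.println(String.format("javac: %.2f%%", overheadPercent(12.83, 12.70)));
    }
}
```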

29 Detailed Experiments
Separate application and GC time
Vary thresholds for method heat
Vary thresholds for cold basic blocks
Three architectures
–x86, AMD, PowerPC
x86 performance counters:
–DL1, trace cache, L2, DTLB, ITLB

30 Performance: javac

31 Performance: db

32 Performance: jython
Any static ordering leaves you vulnerable to pathological cases.

33 Phase Changes

34 Related Work
Evaluate static orderings [Wilson et al.]
–Large performance variation
Static profiling [Chilimbi et al., and others]
–Lack of flexibility
Instance-based object reordering [Chilimbi et al.]
–Too expensive

35 Conclusion
Static traversal orders have up to 25% variation
OOR improves or matches best static ordering
OOR has very low overhead
Past predicts future

36 380C
Where are we & where we are going
–Managed languages
  Dynamic compilation
  Inlining
  Garbage collection
–Why you need to care about workloads & methodology
  Read: Blackburn et al., Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century, Communications of the ACM, 51(8): 83–89, August 2008.
–Alias analysis
–Dependence analysis
–Loop transformations
–EDGE architectures

