The Garbage Collection Advantage: Improving Program Locality. Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss.

Presentation transcript:

1 The Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)
Presented by Na Meng
Many thanks to the authors and to the anonymous speaker who presented this paper in the MM course last time

2 Motivation
Memory gap problem
OO programs exacerbate the memory gap problem
–Automatic memory management
–Pointer data structures
Goal: improve OO program locality

3 Opportunity
A copying garbage collector reorders objects at runtime

Copying of Linked Objects
[Figure: the same linked objects copied in three orders: breadth first, depth first, and online object reordering]
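The three copy orders named on these slides can be illustrated with a small Java sketch. It is only a toy model under stated assumptions: `HeapNode`, `CopyOrderSketch`, the seven-node tree, and the choice of `second` as the hot field are all hypothetical, and the "hot field first" traversal is one plausible reading of online object reordering, not necessarily the exact policy the authors implement.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy heap object with two reference fields.
final class HeapNode {
    final int id;
    HeapNode first;   // declared first; assumed cold for this sketch
    HeapNode second;  // declared second; assumed hot for this sketch
    HeapNode(int id) { this.id = id; }
}

final class CopyOrderSketch {
    // Breadth-first (Cheney-style) order: copied objects are scanned in FIFO order.
    static String breadthFirst(HeapNode root) {
        StringBuilder order = new StringBuilder();
        Set<HeapNode> copied = new HashSet<>();
        Deque<HeapNode> queue = new ArrayDeque<>();
        copied.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            HeapNode n = queue.poll();
            order.append(n.id).append(' ');
            for (HeapNode child : new HeapNode[] { n.first, n.second }) {
                if (child != null && copied.add(child)) queue.add(child);
            }
        }
        return order.toString().trim();
    }

    // Depth-first order. With hotFieldFirst set, the assumed hot field is
    // followed before the cold one, so hot referents land next to their parent.
    static String depthFirst(HeapNode root, boolean hotFieldFirst) {
        StringBuilder order = new StringBuilder();
        visit(root, hotFieldFirst, new HashSet<>(), order);
        return order.toString().trim();
    }

    private static void visit(HeapNode n, boolean hotFieldFirst,
                              Set<HeapNode> copied, StringBuilder order) {
        if (n == null || !copied.add(n)) return;
        order.append(n.id).append(' ');
        visit(hotFieldFirst ? n.second : n.first, hotFieldFirst, copied, order);
        visit(hotFieldFirst ? n.first : n.second, hotFieldFirst, copied, order);
    }

    // Complete binary tree: node i points to 2i (first) and 2i+1 (second).
    static HeapNode sampleTree() {
        HeapNode[] n = new HeapNode[8];
        for (int i = 1; i <= 7; i++) n[i] = new HeapNode(i);
        for (int i = 1; i <= 3; i++) {
            n[i].first = n[2 * i];
            n[i].second = n[2 * i + 1];
        }
        return n[1];
    }

    public static void main(String[] args) {
        HeapNode root = sampleTree();
        System.out.println("breadth first:   " + breadthFirst(root));
        System.out.println("depth first:     " + depthFirst(root, false));
        System.out.println("hot field first: " + depthFirst(root, true));
    }
}
```

In this toy tree the hot chain 1, 3, 7 ends up contiguous only in the hot-field-first order, which is the locality effect the slides are pointing at.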

7 Outline
Motivation
Online Object Reordering (OOR)
Methodology
Experimental Results
Conclusion

8 Online Object Reordering
Where are the cache misses?
How to identify hot field accesses at runtime?
How to reorder the objects?

9 Where Are The Cache Misses?
[Figure, not to scale: heap structure showing VM objects, stack, older generation, and nursery]

10 Where Are The Cache Misses?

11 Where Are The Cache Misses?
Two opportunities to reorder objects in the older generation
–Promote nursery objects
–Full heap collection

12 How to Find Hot Fields?
Runtime info (intercept every read)?
Compiler analysis?
Runtime information + compiler analysis
Key: low-overhead estimation

13 Which Classes Need Reordering?
Step 1: Compiler analysis
–Excludes cold basic blocks
–Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
–Marks the field accesses in hot methods as hot

14 Example: Compiler Analysis
Method Foo {
    Class A a;
    try {
        … = a.b;
        …
    } catch (Exception e) {
        … a.c
    }
}
Compiler, hot BB: collect access info
Compiler, cold BB: ignore
Compiler access list: 1. A.b 2. …

15 Example: Adaptive Sampling
Method Foo {
    Class A a;
    try {
        … = a.b;
        …
    } catch (Exception e) {
        … a.c
    }
}
Adaptive sampling: Foo is hot
Foo accesses: 1. A.b 2. …
Result: A.b is hot
[Figure: object A with fields b … c, object B, and A's type information]
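The two-step scheme from the preceding slides (compiler analysis that skips cold basic blocks, then adaptive sampling that flags hot methods) could be modeled roughly as follows. This is a minimal sketch, not Jikes RVM code: `HotFieldSketch`, `BasicBlock`, `compile`, and `methodSampledHot` are hypothetical names, and field accesses are represented as plain strings such as "A.b".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HotFieldSketch {
    // Hypothetical model of a basic block: a cold flag plus the fields it touches.
    static final class BasicBlock {
        final boolean cold;
        final List<String> fieldAccesses;
        BasicBlock(boolean cold, String... fieldAccesses) {
            this.cold = cold;
            this.fieldAccesses = Arrays.asList(fieldAccesses);
        }
    }

    // Per-method access lists produced at compile time.
    private final Map<String, List<String>> accessLists = new LinkedHashMap<>();
    // Fields currently considered hot.
    private final Set<String> hotFields = new LinkedHashSet<>();

    // Step 1: record field accesses, ignoring blocks the compiler marks cold.
    void compile(String method, List<BasicBlock> blocks) {
        List<String> accesses = new ArrayList<>();
        for (BasicBlock bb : blocks) {
            if (!bb.cold) accesses.addAll(bb.fieldAccesses);
        }
        accessLists.put(method, accesses);
    }

    // Step 2: when sampling reports a hot method, its recorded accesses become hot.
    void methodSampledHot(String method) {
        hotFields.addAll(accessLists.getOrDefault(method, Collections.<String>emptyList()));
    }

    boolean isHot(String field) {
        return hotFields.contains(field);
    }

    // Mirrors the Foo example: a.b is read in the try body, a.c only in the handler.
    static HotFieldSketch fooExample() {
        HotFieldSketch sketch = new HotFieldSketch();
        sketch.compile("Foo", Arrays.asList(
                new BasicBlock(false, "A.b"),   // try body, treated as hot
                new BasicBlock(true, "A.c")));  // catch handler, treated as cold
        sketch.methodSampledHot("Foo");
        return sketch;
    }

    public static void main(String[] args) {
        HotFieldSketch sketch = fooExample();
        System.out.println("A.b hot? " + sketch.isHot("A.b"));
        System.out.println("A.c hot? " + sketch.isHot("A.c"));
    }
}
```

The point of splitting the work this way is that the expensive part (walking the method body) happens once at compile time, while the runtime part is a lookup triggered by samples that the adaptive system already collects.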

Copying of Linked Objects
[Figure: online object reordering; labels: type information, hot space, cold space]

17 OOR System Overview
[Figure: system diagram]
Legend: OOR addition, JikesRVM component, input/output
Components: source code, baseline compiler, executing code, adaptive sampling, hot methods, optimizing compiler, access info database, GC (copies objects)
Edge labels: adds entries, register hot field accesses, look up, advice, affects locality, improves locality

18 Outline
Motivation
Online Object Reordering
Methodology
Experimental Results
Conclusion

19 Virtual Machine
Jikes RVM
–VM written in Java
–High performance
–Timer-based adaptive sampling
–Dynamic optimization
Experiment setup
–Pseudo-adaptive
–2nd iteration [Eeckhout et al.]

20 Memory Management
Memory Management Toolkit (MMTk)
–Allocators and garbage collectors
–Multi-space heap: boot image, large object space (LOS), immortal space
Experiment setup
–Generational copying GC with 4 MB bounded nursery

21 Overhead: OOR Analysis Only
Columns: Benchmark | Base execution time (sec) | With only OOR analysis (sec) | Overhead (%)
Benchmarks: jess, jack, raytrace, mtrt, javac, compress, pseudojbb, db, antlr, hsqldb, ipsixql, jython, ps-fun
Mean overhead: -0.19%

22 Detailed Experiments
Separate application and GC time
Vary thresholds for method heat
Vary thresholds for cold basic blocks
Three architectures
–x86, AMD, PowerPC
x86 performance counters:
–DL1, trace cache, L2, DTLB, ITLB

23 Performance javac

24 Performance db

25 Performance jython
Is the improvement significant?

26 Phase Changes

27 Algorithm: Decay Field Heat
DECAY-HEAT(method)
    for each fieldAccess in method do
        if PotentiallyHot(fieldAccess) then
            hotField ← fieldAccess.field
            class ← hotField.instantiatingClass
            class.hasHotField ← true
            for each field in class do
                period ← Now() - class.lastUpdate
                decay ← HI / (HI + period)
                field.heat ← field.heat * decay
                if field.heat < LO then
                    field.heat ← 0
            hotField.heat ← HI
            class.lastUpdate ← Now()
Will the latest access pattern erase the earlier access pattern(s)?
m1() {
    for (… …) {
        … …
        a.b = …
    }
}
m2() {
    for (… …) {
        … … = a.c;
    }
}
for (… …) {
    m1(); // GC works
    m2(); // GC works
}
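To make the decay rule concrete, here is a small Java sketch of the inner part of DECAY-HEAT for a single class. The names `DecayHeatSketch`, `ClassInfo`, and `recordHotAccess`, the values chosen for `HI` and `LO`, and the use of plain numbers for time are all assumptions for illustration; the slide does not give the actual constants or time units. The decay factor is computed once per update because it does not change inside the field loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DecayHeatSketch {
    static final double HI = 100.0; // assumed value; heat assigned to a freshly hot field
    static final double LO = 1.0;   // assumed value; heat below this is reset to zero

    // Hypothetical per-class record: field heats plus the time of the last update.
    static final class ClassInfo {
        final Map<String, Double> heat = new LinkedHashMap<>();
        boolean hasHotField;
        double lastUpdate;
        ClassInfo(String... fields) {
            for (String f : fields) heat.put(f, 0.0);
        }
    }

    // Handles one potentially hot access to hotField observed at time now.
    static void recordHotAccess(ClassInfo cls, String hotField, double now) {
        cls.hasHotField = true;
        double period = now - cls.lastUpdate;
        double decay = HI / (HI + period);     // near 1 for recent updates, near 0 for old ones
        for (Map.Entry<String, Double> e : cls.heat.entrySet()) {
            double decayed = e.getValue() * decay;
            e.setValue(decayed < LO ? 0.0 : decayed);
        }
        cls.heat.put(hotField, HI);            // the field just seen becomes fully hot
        cls.lastUpdate = now;
    }

    public static void main(String[] args) {
        ClassInfo a = new ClassInfo("b", "c");
        recordHotAccess(a, "b", 0);      // phase 1: m1 touches a.b
        recordHotAccess(a, "c", 100);    // phase 2: m2 touches a.c soon after
        System.out.println(a.heat);      // b has decayed but is still warm
        recordHotAccess(a, "c", 20000);  // much later, only a.c is still in use
        System.out.println(a.heat);      // b has dropped below LO and been reset
    }
}
```

Under these assumed constants, an earlier pattern is not erased at once: a.b keeps half its heat after a short gap and is zeroed only after a long period in which it is never reported hot again.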

28 OOR Without vs. With Phase Change
Almost all hot fields within an object are visited around the same time
The standard benchmarks have few, if any, traversal order phases.

29 Copying Advantage (javac)
GenCopy vs. MS
Mutator time? GC time? Total time?

30 A Possible Comparison
GenCopy vs. GenOOR?

31 Discussion
Are there other ways to improve locality during copying collection?

32 Questions? Thank you!