380C Lecture 19: Where are we & where we are going. Managed languages (dynamic compilation, inlining, garbage collection); opportunity to improve data locality.

1 CS380C Lecture 19: Where are we & where we are going
– Managed languages: dynamic compilation, inlining, garbage collection
– Opportunity to improve data locality on the fly
– Other opportunities?
– Why you need to care about workloads
– Alias analysis
– Dependence analysis
– Loop transformations
– EDGE architectures

2 Garbage Collection Advantage: Improving Program Locality
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM)

3 Today: Advanced Topics
– Generational garbage collection
– Copying objects is an opportunity
Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT), J Eliot B Moss (UMass), Zhenlin Wang (MTU), Perry Cheng (IBM), “The Garbage Collection Advantage: Improving Program Locality,” OOPSLA.

4 Motivation
– Memory gap problem
– OO programs are becoming more popular
– OO programs exacerbate the memory gap problem:
  – Automatic memory management
  – Pointer data structures
  – Many small methods
Goal: improve OO program locality

7 Allocation Mechanisms
Bump-pointer
  – Pro: fast (increment & bounds check)
  – Pro: contemporaneous object locality
  – Con: can't incrementally free & reuse: must free en masse
Free-list
  – Con: slightly slower (consult list for fit)
  – Con: mystery locality
  – Pro: can incrementally free & reuse cells
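A minimal sketch of the two allocation mechanisms, in Python purely for illustration. The class and method names are hypothetical, addresses are plain integers rather than real memory, and first-fit is an assumed free-list policy (the slide does not name one). The bump-pointer path is one add plus one bounds check; the free-list path has to walk the list to find a cell that fits.

```python
from typing import List, Optional, Tuple


class BumpPointerAllocator:
    """Hypothetical bump-pointer allocator over one contiguous region."""

    def __init__(self, start: int, limit: int) -> None:
        self.start = start
        self.cursor = start
        self.limit = limit

    def alloc(self, size: int) -> Optional[int]:
        # Fast path: one increment plus one bounds check.
        if self.cursor + size > self.limit:
            return None  # region exhausted; a real VM would collect here
        addr = self.cursor
        self.cursor += size
        return addr

    def free_all(self) -> None:
        # No per-object free: the whole region is reclaimed en masse.
        self.cursor = self.start


class FreeListAllocator:
    """Hypothetical first-fit free-list allocator over (address, size) cells."""

    def __init__(self, start: int, limit: int) -> None:
        self.free: List[Tuple[int, int]] = [(start, limit - start)]

    def alloc(self, size: int) -> Optional[int]:
        # Slower path: walk the list until a cell is large enough.
        for i, (addr, cell) in enumerate(self.free):
            if cell >= size:
                if cell == size:
                    del self.free[i]
                else:
                    self.free[i] = (addr + size, cell - size)
                return addr
        return None

    def free_cell(self, addr: int, size: int) -> None:
        # Cells can be freed individually and reused later.
        self.free.append((addr, size))
```

Consecutive bump-pointer allocations land at adjacent addresses, which is the contemporaneous-object locality credited above; the free list can hand a freed cell straight back, which the bump pointer can only do by resetting its entire region.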

8 State-of-the-Art Throughput: Copying Generational GC
Requirements
  – Write barrier to track inter-generation pointers (remsets, cards)
  – Copy reserve
Advantages
  – Minimizes copying of older objects
  – Compacts long-lived objects
Problems
  – Not very incremental
  – The very youngest objects are always copied
  – What order should the GC use to copy objects?
  – etc.
(Figure: heap divided into a ‘nursery’ and an ‘older generation’.)
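The write-barrier requirement can be sketched as below. This is a simplified model under stated assumptions: `Obj`, `GenerationalHeap`, and their methods are hypothetical names, "promotion" only flips a flag instead of copying bytes and forwarding pointers, and the remembered set stores (object, field) slots rather than cards.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can sit in sets
class Obj:
    name: str
    in_nursery: bool = True
    fields: Dict[str, Optional["Obj"]] = field(default_factory=dict)


class GenerationalHeap:
    """Hypothetical two-generation heap with a slot-based remembered set."""

    def __init__(self) -> None:
        # Old-generation slots (object, field name) that may point into the nursery.
        self.remset: Set[Tuple[Obj, str]] = set()

    def write_field(self, src: Obj, name: str, target: Optional[Obj]) -> None:
        src.fields[name] = target
        # Write barrier: remember old-to-young pointers so a nursery
        # collection need not scan the whole older generation.
        if target is not None and not src.in_nursery and target.in_nursery:
            self.remset.add((src, name))

    def collect_nursery(self, stack_roots: List[Obj]) -> List[Obj]:
        """Promote every nursery object reachable from the roots."""
        work = [r for r in stack_roots if r.in_nursery]
        for src, name in self.remset:  # remembered slots act as extra roots
            target = src.fields.get(name)
            if target is not None and target.in_nursery:
                work.append(target)
        promoted = []
        while work:
            obj = work.pop()
            if not obj.in_nursery:
                continue  # already promoted on this pass
            obj.in_nursery = False  # stands in for copying into the older generation
            promoted.append(obj)
            work.extend(t for t in obj.fields.values()
                        if t is not None and t.in_nursery)
        # Every survivor is now old, so no old-to-young pointers remain.
        self.remset.clear()
        return promoted
```

If an old object's field `f` points at a young object that in turn points at another young object, only `f` is remembered; both young objects survive through it, while an unreferenced young object stays behind to be reclaimed with the nursery.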

9 Opportunity
– A generational copying garbage collector reorders objects at runtime

12 Copying of Linked Objects
(Figure: the same linked objects copied in breadth-first order, depth-first order, and Online Object Reordering order.)
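To see why traversal order matters, the sketch below copies one small object graph in the two static orders. The graph and node names are hypothetical; the breadth-first version uses the to-space itself as the scan queue, in the style of Cheney's algorithm, and the depth-first version uses an explicit stack.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # object -> objects its fields point to, in field order


def copy_breadth_first(graph: Graph, roots: List[str]) -> List[str]:
    """Cheney-style copy: the to-space doubles as the FIFO scan queue."""
    to_space: List[str] = []
    copied = set()

    def copy(obj: str) -> None:
        if obj not in copied:
            copied.add(obj)
            to_space.append(obj)

    for root in roots:
        copy(root)
    scan = 0
    while scan < len(to_space):  # scan pointer chases the allocation pointer
        for child in graph[to_space[scan]]:
            copy(child)
        scan += 1
    return to_space


def copy_depth_first(graph: Graph, roots: List[str]) -> List[str]:
    """Copy an object, then its first child's subtree, before any sibling."""
    to_space: List[str] = []
    copied = set()
    stack = list(reversed(roots))
    while stack:
        obj = stack.pop()
        if obj in copied:
            continue
        copied.add(obj)
        to_space.append(obj)
        stack.extend(reversed(graph[obj]))  # reversed so the first field pops next
    return to_space


graph: Graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
                "D": [], "E": [], "F": [], "G": []}
```

Breadth-first yields A B C D E F G while depth-first yields A B D E C F G, so a program that walks from B to D finds those two objects adjacent only under depth-first order. Neither order is right for every access pattern.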

13 Outline
– Motivation
– Online Object Reordering (OOR)
– Methodology
– Experimental Results
– Conclusion

14 Cache Performance Matters

15 Online Object Reordering
– Where are the cache misses?
– How can hot field accesses be identified at runtime?
– How should the objects be reordered?

16 Where Are the Cache Misses?
(Figure: heap structure, not to scale: stack, VM objects, nursery, older generation.)

17 Where Are the Cache Misses?

18 Where Are the Cache Misses?
– Two opportunities to reorder objects in the older generation:
  – Promoting nursery objects
  – Full-heap collection

19 How to Find Hot Fields?
– Runtime information (intercept every read)?
– Compiler analysis?
– Runtime information + compiler analysis
Key: low-overhead estimation

20 Which Classes Need Reordering?
Step 1: Compiler analysis
  – Excludes cold basic blocks
  – Identifies field accesses
Step 2: JIT adaptive sampling identifies hot methods
  – Marks field accesses in hot methods as hot
Key: low-overhead estimation

21 Example: Compiler Analysis
– Hot basic block: the compiler collects access info
– Cold basic block: ignored
Compiler access list: 1. A.b 2. …
Method Foo {
    Class A a;
    try {
        … = a.b;
        …
    } catch (Exception e) {
        … a.c
    }
}
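Step 1 of this example can be modeled as a pass over basic blocks that skips the cold ones. This is a hypothetical sketch: `BasicBlock`, its `cold` flag, and `compile_access_list` are illustrative names, and coldness is supplied as input here rather than computed the way an optimizing compiler would.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BasicBlock:
    label: str
    cold: bool  # assumed given, e.g. exception handlers treated as cold
    field_accesses: List[str] = field(default_factory=list)  # "Class.field"


def compile_access_list(blocks: List[BasicBlock]) -> List[str]:
    """Collect field accesses from non-cold blocks, deduplicated, in order."""
    accesses: List[str] = []
    for bb in blocks:
        if bb.cold:
            continue  # cold code is ignored to keep the analysis cheap
        for acc in bb.field_accesses:
            if acc not in accesses:
                accesses.append(acc)
    return accesses


# Method Foo from the slide: the try body reads a.b, the catch handler touches a.c.
foo = [
    BasicBlock("try", cold=False, field_accesses=["A.b"]),
    BasicBlock("catch", cold=True, field_accesses=["A.c"]),
]
```

Running `compile_access_list(foo)` keeps `A.b` and drops `A.c`, matching the access list on the slide.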

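22 Example: Adaptive Sampling
Method Foo (same code as in the compiler-analysis example)
  – Adaptive sampling: Foo is hot
  – Foo's accesses: 1. A.b 2. …
  – Therefore A.b is hot, and this is recorded in A’s type information
(Figure: an object of class A whose field b points to a B object, alongside A’s type information for fields c and b.)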
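Step 2 can then be sketched as counting timer samples per method and copying the compiler's access list of every hot method into per-class type information. The sample trace, the 0.5 threshold, and the extra methods `Bar` and `Baz` are made up for illustration; Jikes RVM's actual hotness heuristics are not reproduced here.

```python
from collections import Counter
from typing import Dict, List, Set


def hot_methods(samples: List[str], threshold: float) -> Set[str]:
    """Methods whose share of timer samples reaches `threshold` (hypothetical knob)."""
    if not samples:
        return set()
    counts = Counter(samples)
    return {m for m, c in counts.items() if c / len(samples) >= threshold}


def hot_fields_by_class(access_lists: Dict[str, List[str]],
                        hot: Set[str]) -> Dict[str, Set[str]]:
    """Mark every compiler-recorded field access in a hot method as hot."""
    type_info: Dict[str, Set[str]] = {}
    for method in hot:
        for acc in access_lists.get(method, []):
            cls, fld = acc.split(".", 1)
            type_info.setdefault(cls, set()).add(fld)
    return type_info


# Access lists from Step 1; Bar, Baz, and C.d are made up for the example.
access_lists = {"Foo": ["A.b"], "Bar": ["C.d"]}
# Method observed at each timer tick (hypothetical trace).
samples = ["Foo", "Foo", "Bar", "Foo", "Baz", "Foo"]
hot = hot_methods(samples, threshold=0.5)
type_info = hot_fields_by_class(access_lists, hot)
```

Foo accounts for four of six samples, so it is the only hot method, and A's type information ends up recording field b as hot. The cost is one counter per sampled method plus a lookup in a small table, which is the kind of low-overhead estimation the slides call for.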

23 Copying of Linked Objects: Online Object Reordering
(Figure: guided by type information, the collector copies objects into a hot space and a cold space.)
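One way to read the figure is sketched below. Objects reached through hot fields are copied into a hot space, following hot fields depth-first, and everything else goes to a cold space in scan order. This is a simplified illustration under stated assumptions (dictionary-based heap, hypothetical `oor_copy` name, no pointer forwarding), not the paper's exact copying algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# object name -> (class name, {field name: target object name or None})
Heap = Dict[str, Tuple[str, Dict[str, Optional[str]]]]


def oor_copy(heap: Heap, roots: List[str],
             hot_fields: Dict[str, Set[str]]) -> Tuple[List[str], List[str]]:
    """Copy reachable objects; hot-field targets go to a separate hot space."""
    hot_space: List[str] = []
    cold_space: List[str] = []
    copied: Set[str] = set()
    scan: deque = deque()  # copied objects whose cold fields still need scanning

    def copy(obj: Optional[str], space: List[str]) -> bool:
        if obj is None or obj in copied:
            return False
        copied.add(obj)
        space.append(obj)
        scan.append(obj)
        return True

    def copy_hot_chain(obj: str) -> None:
        # Follow hot fields depth-first so hot objects are clustered together.
        cls, fields = heap[obj]
        for fname, target in fields.items():
            if fname in hot_fields.get(cls, set()) and copy(target, hot_space):
                copy_hot_chain(target)

    for root in roots:
        if copy(root, cold_space):
            copy_hot_chain(root)
    while scan:
        obj = scan.popleft()
        cls, fields = heap[obj]
        for fname, target in fields.items():
            if fname not in hot_fields.get(cls, set()) and copy(target, cold_space):
                copy_hot_chain(target)
    return hot_space, cold_space


# Class Node's field b is assumed hot; field c is cold.
heap: Heap = {
    "A": ("Node", {"b": "B", "c": "C"}),
    "B": ("Node", {"b": "D", "c": None}),
    "C": ("Node", {"b": None, "c": None}),
    "D": ("Node", {"b": None, "c": None}),
}
hot_space, cold_space = oor_copy(heap, ["A"], {"Node": {"b"}})
```

The root A and the cold-field target C land in the cold space, while B and D, reached only through the hot field b, end up contiguous in the hot space. Because the hot-field set comes from runtime sampling rather than a fixed rule, the order can change with the program's behavior.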

24 OOR System Overview
(Figure: Jikes RVM components (baseline compiler, optimizing compiler, adaptive sampling, GC) plus the OOR additions. Source code is compiled by the baseline compiler into executing code; adaptive sampling finds hot methods for the optimizing compiler; the optimizing compiler adds entries to an access info database, adaptive sampling registers hot field accesses in it, and the GC looks up this advice when it copies objects. Without OOR, GC copying merely affects locality; with OOR, it improves locality.)

25 Outline
– Motivation
– Online Object Reordering
– Methodology
– Experimental Results
– Conclusion

26 Methodology: Virtual Machine
Jikes RVM
  – VM written in Java
  – High performance
  – Timer-based adaptive sampling
  – Dynamic optimization
Experiment setup
  – Pseudo-adaptive
  – 2nd iteration [Eeckhout et al.]

27 Methodology: Memory Management
Memory Management Toolkit (MMTk):
  – Allocators and garbage collectors
  – Multi-space heap: boot image, large object space (LOS), immortal space
Experiment setup
  – Generational copying GC with a 4 MB bounded nursery

28 Overhead: OOR Analysis Only
Columns: Benchmark | Base execution time (sec) | Execution time with only OOR analysis (sec) | Overhead
Benchmarks: jess, jack, raytrace, mtrt, javac, compress, pseudojbb, db, antlr, hsqldb, ipsixql, jython, ps-fun
Mean overhead: -0.19%
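The overhead column is presumably the relative change in execution time, (t_OOR - t_base) / t_base. The sketch below computes it; the timings are hypothetical, and taking an arithmetic mean is an assumption about how the Mean row was derived. A negative mean, like -0.19%, means the analysis-only runs were on average no slower than the base runs, which suggests the cost is within measurement noise.

```python
from typing import List, Tuple


def overhead_percent(base_sec: float, with_analysis_sec: float) -> float:
    """Relative change in execution time when OOR analysis is enabled, in percent."""
    return (with_analysis_sec - base_sec) / base_sec * 100.0


def mean_overhead(runs: List[Tuple[float, float]]) -> float:
    """Arithmetic mean of per-benchmark overheads (assumed aggregation)."""
    values = [overhead_percent(base, oor) for base, oor in runs]
    return sum(values) / len(values)


# Hypothetical timings (seconds), not taken from the slide.
runs = [(10.0, 10.1), (20.0, 19.8)]
```

With these made-up timings, one benchmark shows +1% and the other -1%, so the mean is roughly zero.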

29 Detailed Experiments
– Separate application and GC time
– Vary thresholds for method heat
– Vary thresholds for cold basic blocks
– Three architectures: x86, AMD, PowerPC
– x86 performance counters: DL1, trace cache, L2, DTLB, ITLB

30 Performance: javac

31 Performance: db

32 Performance: jython
– Any static ordering leaves you vulnerable to pathological cases.

33 Phase Changes

34 Related Work
– Evaluating static orderings [Wilson et al.]
  – Large performance variation
– Static profiling [Chilimbi et al., and others]
  – Lack of flexibility
– Instance-based object reordering [Chilimbi et al.]
  – Too expensive

35 Conclusion
– The performance of static traversal orders varies by up to 25%
– OOR improves on or matches the best static ordering
– OOR has very low overhead
– The past predicts the future

36 CS380C: Where are we & where we are going
– Managed languages: dynamic compilation, inlining, garbage collection
– Why you need to care about workloads & methodology
  Read: Blackburn et al., “Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century,” ACM CACM, 51(8), August.
– Alias analysis
– Dependence analysis
– Loop transformations
– EDGE architectures