Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel


ISMM Outline
- Is tracing GC ready for the many-core?
  - How is the heap shape related?
- Evaluating the heap shape scalability
  - Idealized Trace Utilization
- Improving the heap shape scalability
  - Solution 1: Reshaping with Shortcut References
  - Solution 2: Tracing with Speculative Roots
- Related work & conclusion

Is Tracing GC Ready for Many-core?
[Figure: heap with objects a to m and the roots]
- GC tracing
  - Traverse lots of objects
- Sequential trace
  - Each live object is touched (BFS, DFS)
- Parallel trace
  - Load balancing
  - 1K cores really soon

Can Heaps Spoil the Scalability?
[Figure: heap with a single linked list hanging off the roots; node labels 1, 2, 3, 4K, 4M]
- 4M live objects
  - Single linked list
- Sequential trace
  - 4M steps
- Parallel trace
  - Not any faster

Deep Object Graphs Can Be Evil
Definition:
- Object depth: length of the minimal path from some root object
- Object-graph depth: maximal live object depth
Example: [Figure: heap annotated with object depths]
- How deep are the object graphs of Java programs?
  - SpecJVM, DaCapo, SpecJBB
  - Instrumented BFS trace
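The two definitions above can be illustrated with a breadth-first traversal. This is only a minimal sketch, not the instrumented collector trace used for the measurements. It assumes the heap is given as a dictionary mapping each object to the objects it references and that root objects have depth 0; the names `object_depths` and `graph_depth` are hypothetical.

```python
from collections import deque


def object_depths(roots, refs):
    """Map every object reachable from the roots to its depth.

    Assumption: `refs` maps an object to the list of objects it references,
    and root objects themselves are at depth 0.
    """
    depth = {root: 0 for root in roots}
    queue = deque(depth)
    while queue:
        obj = queue.popleft()
        for child in refs.get(obj, ()):
            if child not in depth:
                # BFS reaches each object first along a minimal path.
                depth[child] = depth[obj] + 1
                queue.append(child)
    return depth


def graph_depth(roots, refs):
    """Object-graph depth: the maximal depth over all live objects."""
    return max(object_depths(roots, refs).values(), default=0)
```

For a heap where a references b and c, both b and c reference d, and d references e, the depths from root a are 0, 1, 1, 2 and 3, so the object-graph depth is 3. Objects that are not reachable from any root get no depth at all.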

Object-Graph Depths of Java Benchmarks
Columns: Name | Description | Heap Size (MB) | GC Cycles | Max Depth
SpecJVM
- javac | Java compiler run 3 times | ,234
- mtrt | 3D raytracer | 328 1,416
DaCapo
- bloat | Java byte code analyzer | ,195
- pmd | Java code analyzer | ,482
- xalan | Transforms XML into HTML | ,476
Other 15 benchmarks | 128

Not All Deep Object Graphs Are Evil
[Figure: heap with many linked lists hanging off the roots; node labels 1, 2, 3, ..., 4K]
- Object graph
  - 1K same-sized linked lists of 4K objects
- Sequential trace
  - 4M steps
- Parallel trace
  - Scales well for up to 1K processors

Deep and Narrow Object Graphs Are Evil
Definition:
- Object depths distribution: number of objects at each depth
Example: [Figure: heap annotated with the number of objects per depth]
Graphical representation (object-graph shape): [Plot: # objects vs. depth]
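To make the difference between a deep, narrow graph and a deep, wide one concrete, here is a small sketch that computes the object depths distribution level by level. It assumes a heap represented as a dictionary from each object to its referents and roots at depth 0; `depth_distribution` and `linked_lists` are hypothetical helper names, and the list sizes are scaled down from the slides' 4K and 4M for readability.

```python
def depth_distribution(roots, refs):
    """Return a list whose entry d is the number of live objects at depth d."""
    frontier = list(dict.fromkeys(roots))  # de-duplicated roots, order kept
    seen = set(frontier)
    counts = []
    while frontier:
        counts.append(len(frontier))
        next_frontier = []
        for obj in frontier:
            for child in refs.get(obj, ()):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return counts


def linked_lists(count, length):
    """Build `count` disjoint linked lists of `length` objects each."""
    roots = [(i, 0) for i in range(count)]
    refs = {(i, j): [(i, j + 1)]
            for i in range(count) for j in range(length - 1)}
    return roots, refs


# One list of 6 objects: deep and narrow, one object per depth.
narrow = depth_distribution(*linked_lists(1, 6))
# Three lists of 4 objects: equally regular, but three objects per depth.
wide = depth_distribution(*linked_lists(3, 4))
```

Here `narrow` is `[1, 1, 1, 1, 1, 1]` and `wide` is `[3, 3, 3, 3]`. The width of each level bounds how many tracers can be kept busy at that point of a level-by-level traversal.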

Object-Graph Shapes of Java Benchmarks
[Plots: # objects vs. depth for jython and xalan]

Object-Graph Shapes of Java Benchmarks
[Plots: # objects (log 10) vs. depth (log 10) for bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, lusearch]

The Idealized Trace Utilization
- Simulate the idealized traversal by N threads
  - Perfect load balancing
  - Perfect cache behavior
  - BFS traversal
  - Single time tick object scan
- During the traversal, count
  - Objects available to be scanned at every time tick
  - Processor slots: some are busy and some are wasted
- At the end, report the utilization (ITU)

$$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\%$$

Idealized Trace Utilization Example
[Figure: heap objects scheduled on 4 tracers (Core 1 to Core 4)]
- Time ticks: 8
- Scanned objects: 15

$$\mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\% = \frac{15}{8 \cdot 4} \times 100\% \approx 47\%$$
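The idealized traversal can be approximated in a few lines. The sketch below is one possible reading of the simulation, not the authors' prototype: it assumes the heap is a dictionary from each object to its referents, that every scan takes exactly one tick, and that objects discovered during a tick become available in the next one. The function name `idealized_trace_utilization` is hypothetical.

```python
from collections import deque


def idealized_trace_utilization(roots, refs, n_tracers):
    """Return the ITU in percent for a trace by `n_tracers` ideal tracers."""
    if n_tracers < 1:
        raise ValueError("at least one tracer is required")
    available = deque(dict.fromkeys(roots))  # objects ready to be scanned
    marked = set(available)
    scanned = 0
    ticks = 0
    while available:
        ticks += 1
        # Perfect load balancing: any tracer may take any available object.
        batch_size = min(n_tracers, len(available))
        batch = [available.popleft() for _ in range(batch_size)]
        for obj in batch:
            scanned += 1
            for child in refs.get(obj, ()):
                if child not in marked:
                    marked.add(child)
                    available.append(child)  # FIFO order approximates BFS
    if ticks == 0:
        return 0.0  # nothing to trace, so no processor slots were offered
    return 100.0 * scanned / (ticks * n_tracers)
```

With a single linked list of 8 objects and 4 tracers, only one object is available per tick, so the trace takes 8 ticks and the ITU is 8 / (8 * 4) = 25%. With four such lists hanging off four roots, all 4 tracers stay busy for 8 ticks and the ITU is 100%.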

Graphical Representation
1. Simulate and compute
2. Draw the graph
[Plot: # objects vs. depth]

Worst Case ITU for Java Benchmarks

Average ITU for Java Benchmarks

What’s Next?
- Problematic heaps exist
  - javac, mtrt, pmd, bloat, xalan
- Can we improve the trace scalability without modifying the benchmarks?
  - Reshape with Shortcut References
  - Trace with Speculative Roots

Reshape with Shortcut References
[Figure: heap with a linked list hanging off the roots and added shortcut references; labels 1, 2, 3, 4, 4K, 16K]
- Sequential trace
  - 16K steps
- New references are added
  - Invisible to the program
  - Useful for the tracers
- Parallel trace
  - Scales for 4 processors

Evaluation Prototype
- Devise a shortcut strategy
  - Where shortcuts are needed
- When the program is stopped for GC
  - Compute the Idealized Trace Utilization
  - Run the shortcut-adding algorithm
  - Compute the ITU for the modified heap
- Report
  - ITU improvement
  - Number of shortcuts added

Shortcut Strategy and Parameters
- Identify candidate subgraphs
  - With at least Size objects
  - With a depth-to-size ratio no less than Ratio
- Add shortcut to the root of the subgraph
  - Leading to the objects Length pointers away
  - Next shortcut introduced no closer than Distance pointers away
[Figure: example subgraph with Size=5, Depth=4, Ratio=0.8; Length (4), Distance (2)]
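The candidate test in the first step is simple arithmetic, sketched below. How the prototype measures a subgraph's size and depth is not stated on the slide, so both are assumed to be given; `is_shortcut_candidate`, `min_size` and `min_ratio` are hypothetical names standing in for the Size and Ratio parameters.

```python
def is_shortcut_candidate(subgraph_size, subgraph_depth, min_size, min_ratio):
    """Check the two candidate conditions: enough objects, and deep for its size.

    Assumption: size is the number of objects in the subgraph and depth is
    the longest minimal path from the subgraph root, both computed elsewhere.
    """
    if subgraph_size < min_size:
        return False
    return subgraph_depth / subgraph_size >= min_ratio
```

The subgraph in the slide's figure has 5 objects and depth 4, giving a ratio of 4 / 5 = 0.8. With thresholds such as Size=50 and Ratio=0.2, a 100-object subgraph of depth 30 (ratio 0.3) would qualify, while one of depth 10 (ratio 0.1) would not.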

Results for SpecJVM mtrt
- ~500K live objects
- Max shortcuts: 110
- Avg shortcuts: 94
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

Results for DaCapo xalan
- ~400K live objects
- Max shortcuts: 888
- Avg shortcuts: 536
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

Results for DaCapo bloat
- ~400K live objects
- Max shortcuts: 940
- Avg shortcuts: 378
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25

Results for DaCapo pmd
- ~434K live objects
- Max shortcuts: 5,874
- Avg shortcuts: 432
- Parameters: Size=600, Ratio=0.1, Length=120, Distance=40

Results for SpecJVM javac
- ~383K live objects
- Max shortcuts: 292
- Avg shortcuts: 16
- Parameters: Size=500, Ratio=0.1, Length=100, Distance=50

Trace with Speculative Roots
[Figure: heap with a linked list hanging off the roots; labels 4K, 4M]
- Sequential trace
  - 16M steps
- Helper tracers
  - Pick random roots
  - Trace using custom colors
- Parallel trace
  - Scales for 4 processors

Speculative Trace
- Helper tracer
  - Pick a root
  - Pick a color, e.g. red
  - Trace; if a blue object is discovered, mark blue as reachable from red
- Regular trace
  - Trace from the roots; if a blue object is discovered, mark blue as live
- Complete trace
  - All colors reachable from live colors are marked live
  - All objects marked by live colors survive the collection
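The three phases above can be sketched sequentially. This is an illustrative simulation under stated assumptions, not the authors' prototype: the helpers run one after another instead of concurrently, colors are plain integers, the heap is a dictionary from each object to its referents, and `speculative_trace` is a hypothetical name.

```python
def speculative_trace(roots, refs, speculative_roots):
    """Return the set of objects that survive the collection."""
    color_of = {}     # object -> color given to it by a helper tracer
    color_edges = {}  # color -> colors discovered while tracing with it

    # Helper tracers: each one colors what it reaches from its own root.
    for color, start in enumerate(speculative_roots):
        if start in color_of:
            continue  # this speculative root was already claimed
        color_edges[color] = set()
        color_of[start] = color
        stack = [start]
        while stack:
            obj = stack.pop()
            for child in refs.get(obj, ()):
                other = color_of.get(child)
                if other is None:
                    color_of[child] = color
                    stack.append(child)
                elif other != color:
                    # "Blue is reachable from red": record it and stop here.
                    color_edges[color].add(other)

    # Regular trace from the real roots; colored objects are not re-traced.
    live_objects, live_colors = set(), set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        color = color_of.get(obj)
        if color is not None:
            live_colors.add(color)
        elif obj not in live_objects:
            live_objects.add(obj)
            stack.extend(refs.get(obj, ()))

    # Completion: colors reachable from live colors are live as well.
    pending = list(live_colors)
    while pending:
        for reachable in color_edges.get(pending.pop(), ()):
            if reachable not in live_colors:
                live_colors.add(reachable)
                pending.append(reachable)

    colored_survivors = {o for o, c in color_of.items() if c in live_colors}
    return live_objects | colored_survivors
```

If a helper starts from a dead object that points into the live graph and colors live objects with its own color, that color becomes live and the dead starting object survives too. This is exactly how floating garbage arises in the scheme.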

Evaluation Prototype
- Useful helpers' work
  - Live objects colored by live colors
- Wasted helpers' work
  - Dead objects colored by dead colors
- Floating garbage
  - Dead objects colored by live colors
[Figure: heap with objects a to m]
- 4 regular tracers, 4 helper tracers
- Speculative roots: random unmarked objects
- ITU before and after the colored trace

Limit the Floating Garbage
- Maximal number of objects colored by a single color
  - Helpers must save discovered but not traced objects
  - Trace completion phase takes care of the saved fronts
- Make the random root choices smarter
  - To avoid choosing dead objects
  - To reach deeper parts of the live object graph
- Filter for the recursive objects
  - Objects with referents of their own type

Results
- Lots of floating garbage
  - Even with the filter
- Hard to find good roots
  - Progressively harder as the live objects are getting marked
- Trace completion phase is complex
  - Can defeat the purpose
- Modest improvement in the Idealized Trace Utilization scores

Results for DaCapo xalan
[Chart: worst case ITU improvement, with the random choices filter]

Results for DaCapo bloat
[Chart: worst case ITU improvement, with the random choices filter]

Related Work
- Parallel Garbage Collection
- Folklore
  - There are heap structures that can foil any clever load balancing scheme
- Siebert (ISMM’08)
  - Reported object graph depths for SpecJVM benchmarks
  - Proposed upper bound on the worst case scalability as a way to compute RT guarantees for the GC tracing
- Random tracing originally proposed by Click

Summary
- Studied the heap shape properties of Java benchmarks
  - Out of twenty considered benchmarks, five had non-scalable heap shapes during the run
- Devised a measure to quantify the heap shape scalability
  - Idealized Trace Utilization
- Proposed, prototyped and evaluated two approaches to improve the tracing scalability
  - Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots

Thank You!