Heap Shape Scalability
Scalable Garbage Collection on Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel
ISMM
Outline
- Is tracing GC ready for many-core?
- How is the heap shape related?
- Evaluating heap shape scalability
  - Idealized Trace Utilization
- Improving heap shape scalability
  - Solution 1: Reshaping with Shortcut References
  - Solution 2: Tracing with Speculative Roots
- Related work and conclusion
Is Tracing GC Ready for Many-core?
[Figure: heap with objects a through m and the roots]
- GC tracing: traverses lots of objects
- Sequential trace: each live object is touched (BFS, DFS)
- Parallel trace: load balancing
- 1K cores really soon
Can Heaps Spoil the Scalability?
[Figure: heap with the roots and a single linked list of objects 1, 2, 3, ..., 4K, ..., 4M]
- 4M live objects in a single linked list
- Sequential trace: 4M steps
- Parallel trace: not any faster
Deep Object Graphs Can Be Evil
Definition:
- Object depth: length of the minimal path from some root object
- Object-graph depth: maximal live object depth
Example: [Figure: heap annotated with object depths]
How deep are the object graphs of Java programs?
- Benchmarks: SpecJVM, DaCapo, SpecJBB
- Measured with an instrumented BFS trace
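The two definitions above can be sketched as a breadth-first traversal. This is a minimal illustration, not the instrumented collector used in the talk: the heap is modeled as a hypothetical dictionary from object to referenced objects, and root objects are assumed to have depth 0.

```python
from collections import deque

def object_depths(heap, roots):
    """Map every object reachable from `roots` to its depth (root objects have depth 0)."""
    depths = {root: 0 for root in roots}
    queue = deque(depths)  # BFS visits objects in order of increasing depth
    while queue:
        obj = queue.popleft()
        for child in heap.get(obj, ()):
            if child not in depths:  # the first visit is along a minimal path
                depths[child] = depths[obj] + 1
                queue.append(child)
    return depths

def object_graph_depth(heap, roots):
    """Maximal depth over all live (reachable) objects."""
    return max(object_depths(heap, roots).values(), default=0)

# Hypothetical heap: "x" is unreachable from the root, so it gets no depth.
heap = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": [], "x": ["a"]}
print(object_depths(heap, ["a"]))       # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
print(object_graph_depth(heap, ["a"]))  # 3
```

Object "d" is referenced from both "b" and "c", but BFS assigns it the depth of the shorter path, which is what the definition asks for.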
Object-Graph Depths of Java Benchmarks
Columns: Name, Description, Heap Size (MB), GC Cycles, Max Depth
SpecJVM
  javac   Java compiler run 3 times   ,234
  mtrt    3D raytracer                328   1,416
DaCapo
  bloat   Java byte code analyzer     ,195
  pmd     Java code analyzer          ,482
  xalan   Transforms XML into HTML    ,476
Other 15 benchmarks                   128
Not All Deep Object Graphs Are Evil
[Figure: heap with the roots and linked lists of objects 1, 2, 3, ..., 4K]
- Object graph: 1K same-sized linked lists of 4K objects each
- Sequential trace: 4M steps
- Parallel trace: scales well for up to 1K processors
Deep and Narrow Object Graphs Are Evil
Definition:
- Object depths distribution: the number of objects at each depth
Example: [Figure: heap annotated with the number of objects per depth]
Graphical representation (object-graph shape): [Plot: # objects against depth]
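As a rough illustration of the depth distribution, the sketch below counts objects per depth during a BFS and compares a deep, narrow heap with a deep but wider one. The dictionary heap model and the `linked_lists` helper are assumptions made for this example only.

```python
from collections import deque

def shape(heap, roots):
    """Object-graph shape: counts[d] is the number of live objects at depth d."""
    depth = {root: 0 for root in roots}
    queue = deque(depth)
    counts = []
    while queue:
        obj = queue.popleft()
        d = depth[obj]
        if d == len(counts):  # BFS reaches depths in increasing order
            counts.append(0)
        counts[d] += 1
        for child in heap.get(obj, ()):
            if child not in depth:
                depth[child] = d + 1
                queue.append(child)
    return counts

def linked_lists(n_lists, length):
    """Hypothetical heap made of `n_lists` separate lists, each rooted at its head."""
    heap = {(i, j): ([(i, j + 1)] if j + 1 < length else [])
            for i in range(n_lists) for j in range(length)}
    roots = [(i, 0) for i in range(n_lists)]
    return heap, roots

print(shape(*linked_lists(1, 12)))  # deep and narrow: one object per depth
print(shape(*linked_lists(3, 4)))   # same object count, three objects per depth
```

Both heaps hold 12 objects, but only the second offers more than one object to scan at each depth, which is the property the shape plots are meant to expose.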
Object-Graph Shapes of Java Benchmarks
[Plots: # objects against depth for jython and xalan]
Object-Graph Shapes of Java Benchmarks
[Plots: # objects (log 10) against depth (log 10) for bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, lusearch]
The Idealized Trace Utilization
- Simulate the idealized traversal by N threads
  - Perfect load balancing
  - Perfect cache behavior
  - BFS traversal
  - Object scan takes a single time tick
- During the traversal, count
  - Objects available to be scanned at every time tick
  - Processor slots: some are busy and some are wasted
- At the end, report the utilization (ITU)
  ITU = (Total Scanned Objects / Total Processor Slots) * 100%
Idealized Trace Utilization Example
[Figure: heap objects scheduled over time ticks onto 4 tracers (Core 1, Core 2, Core 3, Core 4)]
- Tracers: 4
- Time ticks: 8
- Scanned objects: 15
ITU = (Total Scanned Objects / Total Processor Slots) * 100% = 15 / (8 * 4) * 100% = 47%
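The simulation can be sketched as follows, under stated assumptions: the heap is a hypothetical dictionary from object to referenced objects, every tick scans up to N available objects in FIFO (BFS) order, and newly discovered objects become available on the next tick. The example heap is invented so that it has 15 objects and needs 8 ticks on 4 tracers; it is not the heap drawn on the slide.

```python
from collections import deque

def idealized_trace_utilization(heap, roots, n_tracers):
    """Return the ITU in percent: scanned objects over available processor slots."""
    seen = set(roots)
    available = deque(dict.fromkeys(roots))  # objects ready to be scanned
    scanned = ticks = 0
    while available:
        ticks += 1
        # Perfect load balancing: every tracer takes one object if any is available.
        batch = [available.popleft() for _ in range(min(n_tracers, len(available)))]
        scanned += len(batch)
        for obj in batch:
            for child in heap.get(obj, ()):
                if child not in seen:
                    seen.add(child)
                    available.append(child)  # scannable from the next tick on
    if ticks == 0:
        return 100.0  # nothing to trace, nothing wasted (a convention of this sketch)
    return 100.0 * scanned / (ticks * n_tracers)

# Hypothetical heap with 1, 2, 4, 2, 2, 2, 1, 1 objects at depths 0 to 7.
heap = {
    "a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"],
    "d": ["h"], "e": ["i"], "f": [], "g": [],
    "h": ["j"], "i": ["k"], "j": ["l"], "k": ["m"],
    "l": ["n"], "m": [], "n": ["o"], "o": [],
}
print(idealized_trace_utilization(heap, ["a"], 4))  # 46.875, i.e. 15 / (8 * 4)

# A single linked list keeps only one of the four tracers busy.
chain = {i: [i + 1] for i in range(15)}
chain[15] = []
print(idealized_trace_utilization(chain, [0], 4))   # 25.0
```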
Graphical Representation
1. Simulate and compute
2. Draw the graph
[Plot: # objects against depth]
Worst Case ITU for Java Benchmarks
Average ITU for Java Benchmarks
What's Next?
- Problematic heaps exist: javac, mtrt, pmd, bloat, xalan
- Can we improve the trace scalability without modifying the benchmarks?
  - Reshape with Shortcut References
  - Trace with Speculative Roots
Reshape with Shortcut References
[Figure: heap with the roots and a linked list of objects 1, 2, 3, 4, ..., 4K, ..., 16K]
- Sequential trace: 16K steps
- New references are added
  - Invisible to the program
  - Useful for the tracers
- Parallel trace: scales for 4 processors
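A small-scale sketch of the idea, with assumptions labeled: a 16-object list stands in for the 16K-object list, shortcuts are kept in a separate hypothetical `shortcuts` table that only the tracer reads (so they stay invisible to the program), and the idealized tick model gives every tracer one object per tick.

```python
from collections import deque

def trace_ticks(heap, roots, n_tracers, shortcuts=None):
    """Idealized parallel trace; returns (ticks used, objects scanned)."""
    shortcuts = shortcuts or {}
    seen = set(roots)
    available = deque(dict.fromkeys(roots))
    ticks = 0
    while available:
        ticks += 1
        batch = [available.popleft() for _ in range(min(n_tracers, len(available)))]
        for obj in batch:
            # The tracer follows program references and shortcut references alike.
            for child in list(heap.get(obj, ())) + list(shortcuts.get(obj, ())):
                if child not in seen:
                    seen.add(child)
                    available.append(child)
    return ticks, len(seen)

def itu(heap, roots, n_tracers, shortcuts=None):
    ticks, scanned = trace_ticks(heap, roots, n_tracers, shortcuts)
    return 100.0 * scanned / (ticks * n_tracers) if ticks else 100.0

# Linked list 0 -> 1 -> ... -> 15, rooted at object 0.
heap = {i: [i + 1] for i in range(15)}
heap[15] = []
# Hypothetical shortcuts from the head into the list, one per extra tracer.
shortcuts = {0: [4, 8, 12]}

print(trace_ticks(heap, [0], 4))             # (16, 16): no faster than sequential
print(trace_ticks(heap, [0], 4, shortcuts))  # (5, 16)
print(itu(heap, [0], 4), itu(heap, [0], 4, shortcuts))  # 25.0 80.0
```

The shortcuts split the list into four segments that the four tracers walk concurrently; the one extra tick is spent scanning the head before the segments become available.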
Evaluation Prototype
- Devise a shortcut strategy
  - Where shortcuts are needed
- When the program is stopped for GC
  - Compute the Idealized Trace Utilization
  - Run the shortcut-adding algorithm
  - Compute the ITU for the modified heap
- Report
  - ITU improvement
  - Number of shortcuts added
Shortcut Strategy and Parameters
- Identify candidate subgraphs
  - With at least size objects
  - With a depth-to-size ratio no less than ratio
- Add a shortcut to the root of the subgraph
  - Leading to the objects length pointers away
  - Next shortcut introduced no closer than distance pointers away
[Figure: example subgraph with Size=5, Depth=4, Ratio=0.8, Length (4), Distance (2)]
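The candidate test from the first bullet can be sketched as below. It covers only the size and ratio check, not the placement of shortcuts, and it assumes that a subgraph means everything reachable from its root and that the root sits at depth 0; both assumptions are mine, chosen so that a 5-object chain matches the figure's Size=5, Depth=4, Ratio=0.8.

```python
from collections import deque

def subgraph_size_and_depth(heap, root):
    """Size and depth of the subgraph reachable from `root` (root at depth 0)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for child in heap.get(obj, ()):
            if child not in depth:
                depth[child] = depth[obj] + 1
                queue.append(child)
    return len(depth), max(depth.values())

def is_candidate(heap, root, size, ratio):
    """True if the subgraph is big enough and deep relative to its size."""
    n_objects, depth = subgraph_size_and_depth(heap, root)
    return n_objects >= size and depth / n_objects >= ratio

# Hypothetical subgraphs: a 5-object chain and a 7-object binary tree.
chain = {1: [2], 2: [3], 3: [4], 4: [5], 5: []}
tree = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}

print(subgraph_size_and_depth(chain, 1))           # (5, 4), ratio 0.8
print(is_candidate(chain, 1, size=5, ratio=0.8))   # True
print(subgraph_size_and_depth(tree, 1))            # (7, 2), ratio about 0.29
print(is_candidate(tree, 1, size=5, ratio=0.8))    # False
```

The ratio separates narrow, list-like subgraphs, which starve parallel tracers, from bushy ones of similar size that already expose enough parallelism.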
Results for SpecJVM mtrt
- ~500K live objects
- Max shortcuts: 110
- Avg shortcuts: 94
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo xalan
- ~400K live objects
- Max shortcuts: 888
- Avg shortcuts: 536
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo bloat
- ~400K live objects
- Max shortcuts: 940
- Avg shortcuts: 378
- Parameters: Size=50, Ratio=0.2, Length=50, Distance=25
Results for DaCapo pmd
- ~434K live objects
- Max shortcuts: 5,874
- Avg shortcuts: 432
- Parameters: Size=600, Ratio=0.1, Length=120, Distance=40
Results for SpecJVM javac
- ~383K live objects
- Max shortcuts: 292
- Avg shortcuts: 16
- Parameters: Size=500, Ratio=0.1, Length=100, Distance=50
Trace with Speculative Roots
[Figure: heap with the roots and a linked list (labels 4K, 4M)]
- Sequential trace: 16M steps
- Helper tracers
  - Pick random roots
  - Trace using custom colors
- Parallel trace: scales for 4 processors
Speculative Trace
- Helper tracer
  - Pick up the root
  - Pick up the color, e.g. red
  - Trace; if a blue object is discovered, mark blue as reachable from red
- Regular trace
  - Trace from the root; if a blue object is discovered, mark blue as live
- Complete trace
  - All colors reachable from live colors are marked live
  - All objects marked by live colors survive the collection
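The three phases above can be sketched sequentially. This is a simplified model, not the prototype from the talk: helpers run one after another instead of concurrently, each helper gets its own hypothetical color name (`helper-0`, `helper-1`, ...), the regular tracer uses the color `live`, and the heap is a plain dictionary.

```python
def trace(heap, start, color, color_of, reaches):
    """Color every still-uncolored object reachable from `start` with `color`."""
    if start in color_of:
        return
    color_of[start] = color
    stack = [start]
    while stack:
        obj = stack.pop()
        for child in heap.get(obj, ()):
            other = color_of.get(child)
            if other is None:
                color_of[child] = color
                stack.append(child)
            elif other != color:
                # Another tracer already owns this object: record color reachability.
                reaches.setdefault(color, set()).add(other)

def speculative_collect(heap, roots, speculative_roots):
    """Return the set of objects that survive the collection."""
    color_of, reaches = {}, {}
    for i, obj in enumerate(speculative_roots):      # helper tracers
        trace(heap, obj, f"helper-{i}", color_of, reaches)
    for root in roots:                               # regular trace
        if root in color_of:
            reaches.setdefault("live", set()).add(color_of[root])
        else:
            trace(heap, root, "live", color_of, reaches)
    live_colors, pending = {"live"}, ["live"]        # complete trace
    while pending:
        for color in reaches.get(pending.pop(), ()):
            if color not in live_colors:
                live_colors.add(color)
                pending.append(color)
    return {obj for obj, color in color_of.items() if color in live_colors}

# Hypothetical heap: live list r -> a -> b -> c -> d; dead x points to c; dead y.
heap = {"r": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": [], "x": ["c"], "y": []}
print(sorted(speculative_collect(heap, ["r"], ["c", "x"])))  # ['a', 'b', 'c', 'd', 'r']
print(sorted(speculative_collect(heap, ["r"], ["x"])))       # ['a', 'b', 'c', 'd', 'r', 'x']
```

In the second call the helper starts from the dead object x and colors c and d along with it; once the regular trace reaches c that color becomes live, so x survives the collection even though it is dead.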
Evaluation Prototype
- Useful helper work: live objects colored by live colors
- Wasted helper work: dead objects colored by dead colors
- Floating garbage: dead objects colored by live colors
[Figure: heap with objects a through m]
- 4 regular tracers, 4 helper tracers
- Speculative roots: random unmarked objects
- ITU before and after the colored trace
- Limit the floating garbage
  - Maximal number of objects colored by a single color
  - Helpers must save discovered but not traced objects
  - Trace completion phase takes care of the saved fronts
- Make the random root choices smarter
  - To avoid choosing dead objects
  - To reach deeper parts of the live object graph
  - Filter for the recursive objects: objects with referents of their own type
Results
- Lots of floating garbage
  - Even with the filter
- Hard to find good roots
  - Progressively harder as the live objects are getting marked
- Trace completion phase is complex
  - Can defeat the purpose
- Modest improvement in the Idealized Trace Utilization scores
Results for DaCapo xalan
[Chart: worst case ITU improvement, with the random choices filter]
Results for DaCapo bloat
[Chart: worst case ITU improvement, with the random choices filter]
Related Work
- Parallel garbage collection
- Folklore
  - There are heap structures that can foil any clever load balancing scheme
- Siebert (ISMM'08)
  - Reported object graph depths for SpecJVM benchmarks
  - Proposed an upper bound on the worst case scalability as a way to compute RT guarantees for the GC tracing
- Random tracing was originally proposed by Click
Summary
- Studied the heap shape properties of Java benchmarks
  - Out of twenty benchmarks considered, five had non-scalable heap shapes during the run
- Devised a measure to quantify heap shape scalability
  - Idealized Trace Utilization
- Proposed, prototyped, and evaluated two approaches to improve tracing scalability
  - Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots
Thank You!