
Heap Shape Scalability: Scalable Garbage Collection on Highly Parallel Platforms. Kathy Barabash, Erez Petrank, Computer Science Department, Technion, Israel.


1 Heap Shape Scalability
Scalable Garbage Collection on Highly Parallel Platforms
Kathy Barabash, Erez Petrank
Computer Science Department, Technion, Israel

2 Outline (ISMM 2010)
- Is tracing GC ready for the many-core?
  - How is the heap shape related?
- Evaluating the heap shape scalability
  - Idealized Trace Utilization
- Improving the heap shape scalability
  - Solution 1: Reshaping with Shortcut References
  - Solution 2: Tracing with Speculative Roots
- Related work & conclusion

3 Is Tracing GC Ready for Many-core?
[Figure: heap with objects a through m and the roots]
- GC tracing
  - Traverse lots of objects
- Sequential trace
  - Each live object is touched (BFS, DFS)
- Parallel trace
  - Load balancing
  - 1K cores really soon

4 Can Heaps Spoil the Scalability?
[Figure: heap with roots and a linked list; labels 1, 2, 3, 4M]
- 4M live objects
  - Single linked list
- Sequential trace
  - 4M steps
- Parallel trace
  - Not any faster

5 Deep Object Graphs Can Be Evil
Definition:
- Object depth: length of the minimal path from some root object
- Object-graph depth: maximal live object depth
Example:
[Figure: heap with object depths 0, 1, 2, 3]
How deep are the object graphs of Java programs?
- SpecJVM, DaCapo, SpecJBB
- Instrumented BFS trace
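The two definitions on this slide can be computed with a breadth-first traversal. The following is a minimal Python sketch, not the instrumented trace used by the authors: the heap is modeled as a dict from object id to referenced object ids, root objects are assumed to have depth 0, and the names `object_depths` and `graph_depth` are hypothetical.

```python
from collections import deque

def object_depths(heap, roots):
    """Return {object: depth} for every object reachable from the roots.

    heap maps each object id to the list of object ids it references.
    Root objects get depth 0 (an assumption of this sketch); BFS visits
    objects in order of increasing depth, so each depth is minimal.
    """
    depths = {r: 0 for r in roots}
    queue = deque(depths)
    while queue:
        obj = queue.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depths:
                depths[ref] = depths[obj] + 1
                queue.append(ref)
    return depths

def graph_depth(heap, roots):
    """Object-graph depth: the maximal depth over all live objects."""
    return max(object_depths(heap, roots).values(), default=0)

# Hypothetical heap: a chain a -> b -> c plus a second reference a -> d.
heap = {"a": ["b", "d"], "b": ["c"], "c": [], "d": []}
print(object_depths(heap, ["a"]))
print(graph_depth(heap, ["a"]))  # 2
```

Unreachable objects never enter `depths`, so dead objects do not affect the object-graph depth.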

6 Object-Graph Depths of Java Benchmarks
Name     Description               Heap Size (MB)  GC Cycles  Max Depth
SpecJVM
  javac  Java compiler run 3 times             32         15      1,234
  mtrt   3D raytracer                          32          8      1,416
DaCapo
  bloat  Java byte code analyzer               48        344      1,195
  pmd    Java code analyzer                    48         59     18,482
  xalan  Transforms XML into HTML             128        129      8,476
Other 15 benchmarks: 128


9 Not all Deep Object Graphs are Evil
[Figure: heap with roots and several linked lists; labels 1, 2, 3, 4K]
- Object graph
  - 1K same-sized linked lists of 4K objects
- Sequential trace
  - 4M steps
- Parallel trace
  - Scales well for up to 1K processors

10 Deep and Narrow Object Graphs are Evil
Definition:
- Object depths distribution: number of objects at different depths
Example:
[Figure: example heap; # objects per depth: 2, 4, 3, 1, 1]
Graphical representation (object-graph shape):
[Figure: plot of # objects against depth]
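The object depths distribution can be derived from the same breadth-first traversal that yields object depths. This is a minimal Python sketch under assumptions: the heap is a dict from object id to referenced ids, roots have depth 0, and `depth_distribution` is a hypothetical name. It contrasts a deep and narrow heap with a shallow and wide one.

```python
from collections import Counter, deque

def depth_distribution(heap, roots):
    """Return a list whose entry d is the number of live objects at depth d."""
    depths = {r: 0 for r in roots}
    queue = deque(depths)
    while queue:
        obj = queue.popleft()
        for ref in heap.get(obj, ()):
            if ref not in depths:
                depths[ref] = depths[obj] + 1
                queue.append(ref)
    counts = Counter(depths.values())
    if not counts:
        return []
    return [counts[d] for d in range(max(counts) + 1)]

# Deep and narrow: a single chain of 7 objects (one object per depth).
chain = {i: [i + 1] for i in range(6)}
print(depth_distribution(chain, [0]))  # [1, 1, 1, 1, 1, 1, 1]

# Shallow and wide: one root referencing 6 objects.
wide = {0: [1, 2, 3, 4, 5, 6]}
print(depth_distribution(wide, [0]))   # [1, 6]
```

Both heaps hold 7 live objects, but only the second offers more than one object to scan at any depth.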

11 Object-Graph Shapes of Java Benchmarks
[Figure: object-graph shapes of jython and xalan; axes: depth, # objects]

12 Object-Graph Shapes of Java Benchmarks
[Figure: object-graph shapes of bloat, javac, mtrt, xalan, pmd, db, hsqldb, antlr, jython, jess, jack, lusearch; axes: depth (log 10), # objects (log 10)]

13 The Idealized Trace Utilization
- Simulate the idealized traversal by N threads
  - Perfect load balancing
  - Perfect cache behavior
  - BFS traversal
  - Each object scan takes a single time tick
- During the traversal, count
  - Objects available to be scanned at every time tick
  - Processor slots: some are busy and some are wasted
- At the end, report the utilization (ITU)
\[ \mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\% \]
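The simulation described on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' prototype: the heap is a dict from object id to referenced ids, objects discovered during a tick only become available in the next tick, and the name `idealized_trace_utilization` is hypothetical.

```python
from collections import deque

def idealized_trace_utilization(heap, roots, n_threads):
    """Simulate an idealized BFS trace and return the ITU in percent.

    Every tick offers n_threads processor slots. Each slot scans one
    available object; slots with no object to scan are wasted.
    """
    marked = set(roots)
    available = deque(dict.fromkeys(roots))  # FIFO order gives BFS
    scanned = 0
    ticks = 0
    while available:
        ticks += 1
        # Only objects available at the start of the tick can be scanned.
        for _ in range(min(n_threads, len(available))):
            obj = available.popleft()
            scanned += 1
            for ref in heap.get(obj, ()):
                if ref not in marked:
                    marked.add(ref)
                    available.append(ref)
    if ticks == 0:
        return 0.0
    return 100.0 * scanned / (ticks * n_threads)

# One linked list of 8 objects, 4 tracers: 8 of 32 slots are busy.
single = {i: [i + 1] for i in range(7)}
print(idealized_trace_utilization(single, [0], 4))  # 25.0

# Four linked lists of 8 objects each, 4 tracers: every slot is busy.
lists = {(l, i): [(l, i + 1)] for l in range(4) for i in range(7)}
print(idealized_trace_utilization(lists, [(l, 0) for l in range(4)], 4))  # 100.0
```

The two heaps have the same depth, yet only the second keeps all four tracers busy, which is the distinction the ITU is meant to capture.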

14 Idealized Trace Utilization Example
[Figure: heap objects scheduled over time ticks on Core 1 to Core 4]
- Tracers: 4
- Time ticks: 8
- Scanned objects: 15
\[ \mathrm{ITU} = \frac{\text{Total Scanned Objects}}{\text{Total Processor Slots}} \times 100\% = \frac{15}{8 \times 4} \times 100\% \approx 47\% \]

15 Graphical Representation
1. Simulate and compute
2. Draw the graph
[Figure: plot axes: depth, # objects]

16 Worst Case ITU for Java Benchmarks

17 Average ITU for Java Benchmarks

18 What's Next?
- Problematic heaps exist
  - javac, mtrt, pmd, bloat, xalan
- Can we improve the trace scalability without modifying the benchmarks?
  - Reshape with Shortcut References
  - Trace with Speculative Roots

19 Reshape with Shortcut References
[Figure: heap with roots and a linked list; labels 1, 2, 3, 4, 4K, 16K]
- Sequential trace
  - 16K steps
- New references are added
  - Invisible to the program
  - Useful for the tracers
- Parallel trace
  - Scales for 4 processors
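The effect of shortcut references on a linked list can be illustrated with a scaled-down Python sketch. It assumes the shortcuts run from the head of the list to evenly spaced objects, which the slide's figure suggests but does not confirm. The names `trace_ticks` and `with_shortcuts` are hypothetical, and shortcuts are kept in a separate dict so they stay invisible to the program.

```python
from collections import deque

def trace_ticks(heap, roots, n_threads):
    """Ticks needed by an idealized BFS trace (one object scan per thread per tick)."""
    marked = set(roots)
    available = deque(dict.fromkeys(roots))
    ticks = 0
    while available:
        ticks += 1
        # Objects discovered in this tick are scanned in a later tick.
        for _ in range(min(n_threads, len(available))):
            obj = available.popleft()
            for ref in heap.get(obj, ()):
                if ref not in marked:
                    marked.add(ref)
                    available.append(ref)
    return ticks

def with_shortcuts(heap, shortcuts):
    """Tracer's view of the heap: program references plus shortcut references."""
    view = {obj: list(refs) for obj, refs in heap.items()}
    for src, targets in shortcuts.items():
        view.setdefault(src, []).extend(targets)
    return view

n = 16  # scaled-down stand-in for the 16K-object list on the slide
plain = {i: [i + 1] for i in range(n - 1)}
shortcuts = {0: [4, 8, 12]}  # head -> start of each remaining quarter

print(trace_ticks(plain, [0], 4))                             # 16
print(trace_ticks(with_shortcuts(plain, shortcuts), [0], 4))  # 5
```

With three shortcuts the four tracers each walk one quarter of the list, so the trace takes one tick for the head plus four ticks for the quarters.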

20 Evaluation Prototype
- Devise a shortcut strategy
  - Where shortcuts are needed
- When the program is stopped for GC
  - Compute the Idealized Trace Utilization
  - Run the shortcut-adding algorithm
  - Compute the ITU for the modified heap
- Report
  - ITU improvement
  - Number of shortcuts added

21 Shortcut Strategy and Parameters
- Identify candidate subgraphs
  - With at least size objects
  - With depth-to-size ratio no less than ratio
- Add shortcut to the root of the subgraph
  - Leading to the objects length pointers away
  - Next shortcut introduced not closer than distance pointers away
[Figure: example with objects 1 to 9; Distance (2), Length (4); Size=5, Depth=4, Ratio=0.8]
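The four parameters can be made concrete with a small Python sketch. It is one possible reading of the strategy, not the authors' algorithm: `is_candidate` applies the size and ratio thresholds, and `plan_chain_shortcuts` handles only the simplest case of a single chain, assuming shortcut sources are spaced `distance` pointers apart and each shortcut targets the object `length` pointers ahead. Both function names are hypothetical.

```python
def is_candidate(n_objects, depth, min_size, min_ratio):
    """A subgraph qualifies if it is large enough and deep relative to its size."""
    return n_objects >= min_size and depth / n_objects >= min_ratio

def plan_chain_shortcuts(chain_len, length, distance):
    """Return (source, target) index pairs for a chain of chain_len objects.

    Assumed reading: sources are spaced `distance` pointers apart and each
    shortcut leads to the object `length` pointers ahead of its source.
    """
    if length <= 0 or distance <= 0:
        raise ValueError("length and distance must be positive")
    shortcuts = []
    src = 0
    while src + length < chain_len:
        shortcuts.append((src, src + length))
        src += distance
    return shortcuts

# The slide's example subgraph: Size=5, Depth=4, so the ratio is 0.8.
print(is_candidate(5, 4, min_size=5, min_ratio=0.8))  # True

# A 9-object chain with Length=4 and Distance=2, indices starting at 0.
print(plan_chain_shortcuts(9, length=4, distance=2))  # [(0, 4), (2, 6), (4, 8)]
```

Larger `length` values let a shortcut skip further ahead, while larger `distance` values reduce the number of shortcuts added.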

22 Results for SpecJVM mtrt
- ~500K live objects
- Max shortcuts: 110
- Avg shortcuts: 94
- Size=50, Ratio=0.2, Length=50, Distance=25

23 Results for DaCapo xalan
- ~400K live objects
- Max shortcuts: 888
- Avg shortcuts: 536
- Size=50, Ratio=0.2, Length=50, Distance=25

24 Results for DaCapo bloat
- ~400K live objects
- Max shortcuts: 940
- Avg shortcuts: 378
- Size=50, Ratio=0.2, Length=50, Distance=25

25 Results for DaCapo pmd
- ~434K live objects
- Max shortcuts: 5,874
- Avg shortcuts: 432
- Size=600, Ratio=0.1, Length=120, Distance=40

26 Results for SpecJVM javac
- ~383K live objects
- Max shortcuts: 292
- Avg shortcuts: 16
- Size=500, Ratio=0.1, Length=100, Distance=50

27 Trace with Speculative Roots
[Figure: heap with roots; labels 4K, 4M]
- Sequential trace
  - 16M steps
- Helper tracers
  - Pick random roots
  - Trace using custom colors
- Parallel trace
  - Scales for 4 processors

28 Speculative Trace
- Helper tracer
  - Pick a root
  - Pick a color, e.g. red
  - Trace; if a blue object is discovered, mark blue as reachable from red
- Regular trace
  - Trace from a root; if a blue object is discovered, mark blue as live
- Complete trace
  - All colors reachable from live colors are marked live
  - All objects marked by live colors survive the collection
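The three phases on this slide can be sketched sequentially in Python. This is a simplified illustration, not the authors' prototype: the helper tracers run one after another and finish before the regular trace starts, whereas real helpers would run concurrently. The heap is a dict from object id to referenced ids, and the names `colored_trace` and `LIVE` are hypothetical.

```python
LIVE = "live"  # pseudo-color used by the regular tracer (an assumption of this sketch)

def colored_trace(heap, roots, speculative_roots):
    """Return the set of objects that survive a colored trace."""
    color_of = {}  # object -> color that first marked it
    reaches = {}   # helper color -> other helper colors it ran into

    # Helper tracers: each picks a speculative root and a fresh color.
    for color, start in enumerate(speculative_roots):
        if start in color_of:
            continue  # already claimed by an earlier helper
        color_of[start] = color
        reaches[color] = set()
        stack = [start]
        while stack:
            obj = stack.pop()
            for ref in heap.get(obj, ()):
                other = color_of.get(ref)
                if other is None:
                    color_of[ref] = color
                    stack.append(ref)
                elif other != color:
                    reaches[color].add(other)  # "blue is reachable from red"

    # Regular trace: starts at the real roots and stops at colored objects.
    live_colors = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        color = color_of.get(obj)
        if color is None:
            color_of[obj] = LIVE
            pending.extend(heap.get(obj, ()))
        elif color != LIVE:
            live_colors.add(color)  # a helper's color was reached from a root

    # Completion: colors reachable from live colors become live as well.
    worklist = list(live_colors)
    while worklist:
        color = worklist.pop()
        for other in reaches.get(color, ()):
            if other not in live_colors:
                live_colors.add(other)
                worklist.append(other)

    return {obj for obj, c in color_of.items() if c == LIVE or c in live_colors}

# Hypothetical heap: live chain a -> b -> c -> d, dead x -> y, dead z -> c.
heap = {"a": ["b"], "b": ["c"], "c": ["d"], "x": ["y"], "z": ["c"]}
print(sorted(colored_trace(heap, ["a"], ["c", "b"])))  # ['a', 'b', 'c', 'd']
print(sorted(colored_trace(heap, ["a"], ["z", "x"])))  # ['a', 'b', 'c', 'd', 'z']
```

In the second call the dead object z shares a color with the live objects c and d, so it survives the collection even though it is unreachable from the roots.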

29 Evaluation Prototype
- Useful helper work
  - Live objects colored by live colors
- Wasted helper work
  - Dead objects colored by dead colors
- Floating garbage
  - Dead objects colored by live colors
[Figure: heap with objects a through m]
- 4 regular tracers, 4 helper tracers
- Speculative roots: random unmarked objects
- ITU before and after the colored trace

30 Limit the Floating Garbage
- Maximal number of objects colored by a single color
  - Helpers must save discovered but not traced objects
  - Trace completion phase takes care of the saved fronts
- Make the random root choices smarter
  - To avoid choosing dead objects
  - To reach deeper parts of the live object graph
- Filter for the recursive objects
  - Objects with referents of their own type

31 Results
- Lots of floating garbage
  - Even with the filter
- Hard to find good roots
  - Progressively harder as live objects get marked
- Trace completion phase is complex
  - Can defeat the purpose
- Modest improvement in the Idealized Trace Utilization scores

32 Results for DaCapo xalan
[Figure: worst case ITU improvement, with the random choices filter]

33 Results for DaCapo bloat
[Figure: worst case ITU improvement, with the random choices filter]

34 Related Work
- Parallel Garbage Collection Folklore
  - There are heap structures that can foil any clever load balancing scheme
- Siebert (ISMM'08)
  - Reported object graph depths for SpecJVM benchmarks
  - Proposed upper bound on the worst case scalability as a way to compute RT guarantees for the GC tracing
- Random tracing originally proposed by Click

35 Summary
- Studied the heap shape properties of Java benchmarks
  - Out of twenty considered benchmarks, five had non-scalable heap shapes during the run
- Devised a measure to quantify the heap shape scalability
  - Idealized Trace Utilization
- Proposed, prototyped and evaluated two approaches to improve the tracing scalability
  - Reshaping with Shortcuts appears to be more promising than Tracing from Speculative Roots

36 Thank You!

