Published by Austin Anthony. Modified over 8 years ago.
1 Influence of Program Inputs on the Selection of Garbage Collectors
Feng Mao, Eddy Zheng Zhang and Xipeng Shen
The College of William and Mary
2 Introduction
GC determines the efficiency of
  Memory management
  Collection time
  Data locality
Various garbage collectors perform differently on different applications.
3 GC Selection
Selecting the best garbage collector for an execution
Application-specific selection [Fitzgerald & Tarditi: ISMM’00, Soman et al.: ISMM’04, Singer et al.: ISMM’07]
  Selecting a GC for each application
  Based on offline profiling
4 Influence of Inputs
An important but under-explored dimension
Determines the robustness of profiling-based selection
Preliminarily covered in previous work [Soman et al.: ISMM’04, Singer et al.: ISMM’07]
  Few inputs per application
  Different observations
5 Objective
A comprehensive understanding of the influence of inputs on the selection of garbage collectors.
6 Overview
A systematic measurement
  1580 inputs
  10 programs
  316,000 executions
  5 garbage collectors
  4 heap size ratios
A statistical analysis to address non-determinism
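The execution count is consistent with one reading of the design: every input is run under every collector and every heap size ratio, repeated 10 times (the repetition count comes from the statistical-analysis slide). A quick arithmetic check under that assumption:

```python
# Numbers taken from the slides; the full cross-product is an assumption.
inputs = 1580
collectors = 5
heap_size_ratios = 4
runs_per_config = 10  # repeated runs per configuration

total_executions = inputs * collectors * heap_size_ratios * runs_per_config
print(total_executions)  # 316000
```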
7 Overview
Findings
  Top collectors vary across inputs
  Cross-input consistency exists
  Heap size ratio matters
  Heap size ratio is predictable
8 Outline
Measurement Methodology
Statistical performance analysis
Top collectors vary across inputs
Cross-input consistency exists
Heap size ratio matters
Heap size ratio is predictable
9 Measurement Environment
Intel Xeon E5310
Linux 2.6.9
Jikes RVM 2.9.1
5 garbage collectors (included in MMTk): GenCopy, GenMS, MarkSweep, RefCount, SemiSpace
10 Heap Size Ratio
r = heap size / min possible heap size
4 heap size ratios: 1, 2, 4, 8
The min possible heap size differs across applications and inputs.
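Because the minimum heap size changes with the input, the same ratio r corresponds to different absolute heap sizes. A minimal sketch of that relationship; the function name and the two sample minimum heap sizes are illustrative assumptions, not values prescribed by the talk:

```python
RATIOS = (1, 2, 4, 8)  # the four heap size ratios used in the measurement

def heap_size_for_ratio(min_heap_mb, ratio):
    """Heap size (MB) such that r = heap size / min possible heap size."""
    return ratio * min_heap_mb

# Two hypothetical inputs of one program with different minimum heap sizes.
for min_heap_mb in (20, 98):
    sizes = [heap_size_for_ratio(min_heap_mb, r) for r in RATIOS]
    print(min_heap_mb, sizes)
```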
11 Benchmarks
Benchmark       Min heap size (MB)   Number of inputs
Compress (j)    20-98                18
Db (j)          16-31                100
Mpegaudio (j)   16-20                30
Mtrt (j)        15-49                100
Bloat (d)       22-23976
Fop (d)         72-86224
Euler (g)       16-55                14
MoDyn (g)       18-21                15
MonteCarlo (g)  39-74                30
Search (g)      21-21                8
12 Metrics
End-to-end execution time
  Including start-up time
  No replay
Challenge: non-determinism in performance
  JIT compilation
  Thread scheduling
  Noise from the environment
Average time? Min time? Max time?
13 Statistical Analysis
Thanks to Georges et al. [OOPSLA’07]
10 repeated runs
Compute the confidence interval using Student’s t-distribution
A 90% confidence interval means the interval contains the true running time with 90% probability
Interval overlap => not significantly different in performance
14 Example
GC1 (s): { 22, 22.1, 21.9, 22.2, 21.8 }; values shown on the plot: 20.5, 23.5, 22
GC2 (s): { 21.1, 20.8, 20.7, 20.7, 22.8 }; values shown on the plot: 19.7, 22.8, 21.2
Overlap => not significantly different in performance
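The interval test can be sketched with the standard library alone. This is an illustrative sketch, not the authors' scripts: the helper names are made up, and the critical values of Student's t are hard-coded for the two sample sizes that appear in the talk (5 and 10 runs).

```python
from math import sqrt
from statistics import mean, stdev

# 0.95 quantiles of Student's t (two-sided 90% interval), keyed by degrees
# of freedom. Only the sample sizes used here are tabulated.
T_CRIT_90 = {4: 2.132, 9: 1.833}

def confidence_interval(times):
    """Return the 90% confidence interval (low, high) for the mean run time."""
    n = len(times)
    half_width = T_CRIT_90[n - 1] * stdev(times) / sqrt(n)
    center = mean(times)
    return center - half_width, center + half_width

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

gc1 = confidence_interval([22, 22.1, 21.9, 22.2, 21.8])
gc2 = confidence_interval([21.1, 20.8, 20.7, 20.7, 22.8])
print(gc1, gc2, overlaps(gc1, gc2))
```

With the sample times from the example slide, the intervals come out at roughly [21.85, 22.15] and [20.36, 22.08]. These are narrower than the values drawn on the slide, but they still overlap, so the two collectors are treated as not significantly different.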
16 Top Sets of GC
A top set of collectors for an execution contains
  The collectors performing the best
  Their confidence intervals overlap with one another
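One way to turn this definition into a procedure is sketched below. The greedy rule (start from the fastest collector and keep adding the next-fastest while it overlaps every current member) is an assumption made for illustration; the talk does not spell out the exact clustering rule, and the intervals are hypothetical.

```python
def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def top_set(intervals):
    """intervals: dict mapping collector name -> (low, high) execution time.

    Collectors are visited in order of interval midpoint (fastest first) and
    added while they overlap every collector already in the top set.
    """
    by_speed = sorted(intervals, key=lambda gc: sum(intervals[gc]) / 2)
    top = [by_speed[0]]
    for gc in by_speed[1:]:
        if all(overlaps(intervals[gc], intervals[member]) for member in top):
            top.append(gc)
        else:
            break
    return set(top)

# Hypothetical confidence intervals (seconds) for one execution.
example = {
    "GenCopy": (20.4, 22.1),
    "GenMS": (21.8, 22.2),
    "MarkSweep": (23.0, 24.5),
    "RefCount": (30.1, 31.0),
    "SemiSpace": (22.3, 23.2),
}
print(top_set(example))
```

Here GenMS overlaps GenCopy and joins the top set, while SemiSpace overlaps GenMS only and is left out.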
17 Variations of Top Sets
{GC2, GC3} {GC3} {GC2}
18 Mtrt in Detail
19 Implication
Risk of profiling-based GC selection
21 Coverage of a collector
coverage of GC i = (# of inputs for which GC i is a top collector) / (total number of inputs)
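The coverage ratio is straightforward to compute once a top set is known for every input. A small sketch with hypothetical top sets (the data and the function name are illustrative only):

```python
def coverage(top_sets, collector):
    """Fraction of inputs whose top set contains the given collector."""
    hits = sum(1 for top in top_sets if collector in top)
    return hits / len(top_sets)

# Hypothetical top sets for four inputs of one program.
top_sets = [{"GenMS", "GenCopy"}, {"GenMS"}, {"GenCopy"}, {"GenMS", "SemiSpace"}]
for gc in ("GenCopy", "GenMS", "SemiSpace"):
    print(gc, coverage(top_sets, gc))
```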
22 Coverage
23 Risk of Using Top Collector
24 Implication
Profile on a spectrum of inputs and select the top collector.
Is it enough?
26 Coverage Changes
27 Implication
Profile on many inputs and multiple heap size ratios
Select the top collector for each heap size ratio
r = heap size / min possible heap size, where the min possible heap size is input-sensitive
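The per-ratio selection could look like the sketch below. It assumes the profiling results are stored as one list of top sets per heap size ratio, and it picks the collector that appears in the most top sets; both the data layout and the tie-breaking rule are assumptions for illustration, not the procedure from the paper.

```python
from collections import Counter

def best_collector_per_ratio(profile):
    """profile: dict mapping ratio -> list of top sets, one per profiled input.

    For each ratio, return the collector that appears in the most top sets
    (ties broken alphabetically).
    """
    choice = {}
    for ratio, top_sets in profile.items():
        counts = Counter(gc for top in top_sets for gc in top)
        choice[ratio] = min(counts, key=lambda gc: (-counts[gc], gc))
    return choice

# Hypothetical profiling results for two heap size ratios.
profile = {
    1: [{"MarkSweep"}, {"MarkSweep", "GenMS"}, {"MarkSweep"}],
    4: [{"GenCopy"}, {"GenCopy", "GenMS"}, {"GenMS", "GenCopy"}],
}
print(best_collector_per_ratio(profile))
```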
29 Cross-Input Prediction
Machine learning technique: regression trees
minSize = f(input)
Details in our paper.
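To make minSize = f(input) concrete, here is a deliberately small regression tree over a single numeric input feature. Everything in it is an assumption for illustration: the feature (input file size), the training data (synthetic), and the splitting rule (minimize the summed squared error of the two sides). The model in the paper may use different features and a different learner configuration.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(points, min_leaf=2):
    """points: list of (feature, min_heap_mb) pairs.

    Returns ("leaf", prediction) or ("split", threshold, left, right).
    """
    points = sorted(points)
    ys = [y for _, y in points]
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None or best[0] >= sse(ys):
        return ("leaf", mean(ys))
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return ("split", threshold,
            build_tree(points[:i], min_leaf),
            build_tree(points[i:], min_leaf))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's mean."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

# Synthetic training data: (input file size in MB, measured min heap in MB).
train = [(1, 20), (2, 20), (3, 21), (10, 40), (12, 41), (14, 42),
         (50, 96), (60, 98)]
tree = build_tree(train)
print(predict(tree, 2.5), predict(tree, 11), predict(tree, 55))
```

The predicted minimum heap size can then be combined with the actual heap size to estimate the heap size ratio r for an unseen input.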
30 Prediction Accuracy
Benchmark       GC1    GC2    GC3    GC4    GC5
Compress (j)    99.8   100    99.9
Db (j)          98.1   97.4   98.2   97.0   98.2
Mpegaudio (j)   100    98.1   96.3   96.0   96.8
Mtrt (j)        86.1   90.5   87.4   90.5   90.7
Bloat (d)       99.9   100    99.7   99.4   99.9
Fop (d)         98.2   97.2   96.6   97.7   98.3
Euler (g)       91.3   92.7   91.4   90.4   93.9
MoDyn (g)       98.6   99.0   98.1   99.3   98.6
MonteCarlo (g)  98.9   99.1   99.4   99.5   99.3
Search (g)      100
Average         97.1   97.4   96.7   97.4   97.5
31 Conclusions
The top garbage collector is consistent across inputs for a fixed heap size ratio.
But the heap size ratio is input-sensitive.
Cross-input adaptation is necessary for GC selection.
The promise is suggested by the predictability of min heap size ratios.
32 Acknowledgement
Steve Blackburn
Anonymous reviewers
NSF CSR & CCF
33 Questions?
Feng Mao
fmao@cs.wm.edu
The College of William and Mary
Mar 2009
34 Cluster Intervals
[Figure: confidence interval for each GC along the execution-time axis, in the order 1, 3, 2, 5, 4. Labels: Top set; Set1: {3, 1}; Set2: {2}; Set3: {4, 5}]
35 FAQ
The practical use of this technique?
Profiling overhead and input coverage?