The College of William and Mary 1 Influence of Program Inputs on the Selection of Garbage Collectors Feng Mao, Eddy Zheng Zhang and Xipeng Shen.

2 Introduction  GC determines the efficiency of memory management  Collection time  Data locality  Various garbage collectors  Perform differently on different applications

3 GC Selection  Selecting the best garbage collector for an execution  Application-specific selection [Fitzgerald & Tarditi: ISMM’00, Soman et al.: ISMM’04, Singer et al.: ISMM’07]  Selecting a GC for each application  Based on offline profiling

4 Influence of Inputs  An important but under-explored dimension  Determines the robustness of profiling-based selection  Covered only preliminarily in previous work [Soman et al.: ISMM’04, Singer et al.: ISMM’07]  Few inputs per application  Different observations

5 Objective  A comprehensive understanding of the influence of inputs on the selection of garbage collectors.

6 Overview  A systematic measurement  1580 inputs  10 programs  316,000 executions  5 garbage collectors  4 heap size ratios  A statistical analysis to address indeterminism
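The execution count on this slide is the product of the experimental dimensions. A quick arithmetic check in Python, assuming the 10 repeated runs per configuration that the Statistical Analysis slide mentions (the variable names are only for illustration):

```python
# Dimensions of the measurement as listed on the slides.
inputs = 1580        # total inputs across the 10 programs
collectors = 5       # garbage collectors
heap_ratios = 4      # heap size ratios
runs = 10            # repeated runs per configuration (Statistical Analysis slide)

executions = inputs * collectors * heap_ratios * runs
print(executions)    # 316000
```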

7 Overview  Findings  Top collectors vary across inputs  Cross-input consistency exists  Heap size ratio matters  Heap size ratio is predictable

8 Outline  Measurement  Methodology  Statistical performance analysis  Top collectors vary across inputs  Cross-input consistency exists  Heap size ratio matters  Heap size ratio is predictable

9 Measurement  Environment  Intel Xeon E5310  Linux 2.6.9  Jikes RVM 2.9.1  5 Garbage collectors (included in MMTk)  GenCopy, GenMS, MarkSweep, RefCount, SemiSpace

10 Heap Size Ratio  $r = \frac{\text{heap size}}{\text{min possible heap size}}$  4 heap size ratios: 1, 2, 4, 8  The min possible heap size differs across applications and inputs
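Since r is the heap size divided by the min possible heap size, the heap size used for a run is r times that minimum. A minimal sketch, assuming a hypothetical input whose minimum is 20 MB (the function name `heap_sizes_mb` is made up for this example):

```python
def heap_sizes_mb(min_heap_mb, ratios=(1, 2, 4, 8)):
    """Return {ratio: heap size in MB} for one (program, input) pair."""
    return {r: r * min_heap_mb for r in ratios}

# Hypothetical input whose min possible heap size is 20 MB.
print(heap_sizes_mb(20))   # {1: 20, 2: 40, 4: 80, 8: 160}
```

Because the minimum differs per input, the same ratio maps to a different absolute heap size for each input.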

11 Benchmarks
Benchmark     Min heap size (MB)  Number of inputs
Compress j    20-98               18
Db j          16-31               100
Mpegaudio j   16-20               30
Mtrt j        15-49               100
Bloat d       22-23               976
Fop d         72-86               224
Euler g       16-55               14
MoDyn g       18-21               15
MonteCarlo g  39-74               30
Search g      21-21               8

12 Metrics  End-to-end execution time  Including start-up time  No replay  Challenge  Non-determinism in performance  JIT compilation  Thread scheduling  Noise from the environment  Average time? Min time? Max time?

13 Statistical Analysis  Thanks to Georges et al. [OOPSLA’07]  10 repeated runs  Compute confidence interval  Student’s t-distribution  A 90% confidence interval contains the true running time with 90% probability  Interval overlap => Not significantly different in performance

14 Example  GC1 (s): { 22, 22.1, 21.9, 22.2, 21.8 }, mean 22, interval 20.5 to 23.5  GC2 (s): {21.1, 20.8, 20.7, 20.7, 22.8}, mean 21.2, interval 19.7 to 22.8  Overlap => Not significantly different in performance
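The comparison on this slide can be sketched in a few lines of Python. This is an illustrative calculation rather than the authors' scripts: it builds a 90% confidence interval for the mean from Student's t-distribution and tests whether two intervals overlap. The small table `T_CRIT_90` holds standard two-sided critical values for the two sample sizes used here, and the endpoints it computes come from this sketch's own arithmetic, so they need not equal the values drawn on the slide.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 90% critical values of Student's t-distribution,
# keyed by degrees of freedom (n - 1). Only the sizes used here are listed.
T_CRIT_90 = {4: 2.132, 9: 1.833}

def confidence_interval(samples):
    """90% confidence interval for the mean running time."""
    n = len(samples)
    half_width = T_CRIT_90[n - 1] * stdev(samples) / sqrt(n)
    m = mean(samples)
    return (m - half_width, m + half_width)

def overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Running times (s) listed on the slide.
gc1 = [22, 22.1, 21.9, 22.2, 21.8]
gc2 = [21.1, 20.8, 20.7, 20.7, 22.8]
ci1, ci2 = confidence_interval(gc1), confidence_interval(gc2)
print(ci1, ci2, overlap(ci1, ci2))
```

With these samples the two intervals overlap, so the sketch reaches the same verdict as the slide: the two collectors are not significantly different in performance.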

16 Top Sets of GC  A top set of collectors for an execution contains  The collectors performing the best  Their confidence intervals overlap with one another
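One way to compute such a top set from per-collector confidence intervals is sketched below. The greedy rule (visit collectors from fastest to slowest and keep each one whose interval overlaps every collector already kept) is an assumption of this sketch, not necessarily the authors' exact procedure, and the interval values are hypothetical.

```python
def overlap(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def top_set(intervals):
    """intervals: {collector name: (low, high)} of execution time in seconds.

    Greedy rule (an assumption of this sketch): visit collectors in order
    of increasing interval midpoint and keep each one whose interval
    overlaps the interval of every collector already kept.
    """
    ordered = sorted(intervals, key=lambda gc: sum(intervals[gc]) / 2)
    top = [ordered[0]]
    for gc in ordered[1:]:
        if all(overlap(intervals[gc], intervals[kept]) for kept in top):
            top.append(gc)
    return set(top)

# Hypothetical intervals, not measured data.
intervals = {
    "GenCopy": (20.0, 21.0),
    "GenMS": (20.5, 21.5),
    "MarkSweep": (23.0, 24.0),
}
print(top_set(intervals))
```

Here GenCopy and GenMS form the top set because their intervals overlap, while MarkSweep is clearly slower.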

17 Variations of Top Sets {GC2, GC3} {GC3} {GC2}

18 Mtrt in Detail

19 Implication  Risk of profiling-based GC selection

21 Coverage of a collector  $\text{coverage}(GC_i) = \frac{\text{number of inputs for which } GC_i \text{ is a top collector}}{\text{total number of inputs}}$
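The coverage ratio is straightforward to compute once a top set is known for each input. A minimal sketch, assuming for illustration that the three top sets shown on the "Variations of Top Sets" slide belong to three different inputs (the function name `coverage` is made up here):

```python
from collections import Counter

def coverage(top_sets):
    """top_sets: one set of top collectors per input.

    Returns {collector: fraction of inputs for which it is in the top set}.
    """
    counts = Counter(gc for s in top_sets for gc in s)
    return {gc: n / len(top_sets) for gc, n in counts.items()}

# Assumed here: each top set belongs to one input.
top_sets = [{"GC2", "GC3"}, {"GC3"}, {"GC2"}]
print(coverage(top_sets))
```

In this toy case GC2 and GC3 each cover two of the three inputs, so neither collector alone is a top collector for every input.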

22 Coverage

23 Risk of Using Top Collector

24 Implication  Profile on a spectrum of inputs and select the top collector. Is it enough?

26 Coverage Changes

27 Implication  Profile on many inputs and multiple heap size ratios  Select the top collector for each heap size ratio  $r = \frac{\text{heap size}}{\text{min possible heap size}}$, where the min possible heap size is input-sensitive

29 Cross-Input Pred.  Machine learning technique... Regression Trees minSize = f (input) Details in our paper.
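The slide only names the technique (regression trees for minSize = f(input)) and defers details to the paper. To illustrate the idea, here is a generic one-feature regression tree in plain Python. It is not the authors' model: the single feature (input size in KB), the training data, and the function names `fit` and `predict` are all made up for this sketch.

```python
def sse(values):
    """Sum of squared errors around the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit(points, min_leaf=2):
    """Fit a one-feature regression tree.

    points: list of (feature, target) pairs.
    Returns a float leaf or a (threshold, left, right) node.
    """
    ys = [y for _, y in points]
    leaf = sum(ys) / len(ys)
    points = sorted(points)
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between equal feature values
        cost = sse([y for _, y in points[:i]]) + sse([y for _, y in points[i:]])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None or best[0] >= sse(ys):
        return leaf  # no split reduces the error
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return (threshold, fit(points[:i], min_leaf), fit(points[i:], min_leaf))

def predict(tree, x):
    """Walk from the root to a leaf and return its mean target."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

# Made-up training data: (input size in KB, min heap size in MB).
training = [(10, 16), (12, 16), (50, 24), (55, 24), (200, 48), (220, 48)]
tree = fit(training)
print(predict(tree, 11), predict(tree, 52), predict(tree, 210))
```

A predicted minimum heap size can then be combined with the actual heap size to estimate the heap size ratio of a new input before running it.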

30 Prediction Acc.
Benchmark GC1 GC2 GC3 GC4 GC5
Compress j 99.8 100 99.9
Db j 98.1 97.4 98.2 97.0 98.2
Mpegaudio j 100 98.1 96.3 96.0 96.8
Mtrt j 86.1 90.5 87.4 90.5 90.7
Bloat d 99.9 100 99.7 99.4 99.9
Fop d 98.2 97.2 96.6 97.7 98.3
Euler g 91.3 92.7 91.4 90.4 93.9
MoDyn g 98.6 99.0 98.1 99.3 98.6
MonteCarlo g 98.9 99.1 99.4 99.5 99.3
Search g 100
Average 97.1 97.4 96.7 97.4 97.5

31 Conclusions  The top garbage collector is consistent across inputs for a fixed heap size ratio.  But the heap size ratio is input-sensitive.  Cross-input adaptation is necessary for GC selection.  The promise is suggested by the predictability of min heap size ratios.

32 Acknowledgement  Steve Blackburn  Anonymous reviewers  NSF CSR & CCF

33 Questions? Feng Mao fmao@cs.wm.edu The College of William and Mary Mar 2009

34 Cluster Intervals  [Figure: confidence interval for each GC along the execution-time axis, in the order 1, 3, 2, 5, 4; the intervals are clustered into the sets {3, 1}, {2}, and {4, 5}, with the top set marked]

35 FAQ  The practical use of this technique?  Profiling overhead and input coverage?
