Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century May 4th 2017 Ben Lenard.


1 Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century
May 4th 2017 Ben Lenard

2 Introduction
Methodology is the foundation that determines whether an experiment yields good or bad results.
Like anything else, methodology needs to keep in line with current technology.
The article contrasts established testing methods for C/C++ with those for Java, showing how outdated benchmarks can lead to the wrong conclusions.
DaCapo is a benchmarking suite for Java.

3 Workload Design and Use
DaCapo was created in 2003, after the authors pointed out to an NSF panel the need for realistic Java benchmarks.
Having been provided additional NSF funds, the group continued to develop the benchmark suite, since the existing benchmarks were dated.
Relevant and diverse workload: a wide range of current applications.
Suitable for research: controlled and easy to use.

4 Relevance and Diversity
The authors used 'real world' applications, such as Eclipse, a Java IDE.
The DaCapo suite supports repeatable runs with various parameters; each run takes about a minute.
In addition to standard metrics, the authors also collected Java heap metrics, such as allocation rate, GC behavior, and heap growth.
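As a sketch of how such heap metrics can be read from inside a running JVM, the standard java.lang.management API exposes current heap usage and per-collector GC counts (this is illustrative, not the DaCapo suite's own instrumentation):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

public class HeapStats {
    public static void main(String[] args) {
        // Current heap occupancy, in bytes.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long used = mem.getHeapMemoryUsage().getUsed();
        System.out.println("heap used (bytes): " + used);

        // One MXBean per garbage collector: collection count and time.
        List<GarbageCollectorMXBean> gcs =
                ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcs) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```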

5 Suitable for Research
Easy-to-control workloads.
Easy-to-use instrumentation and packaging, to encourage adoption and make multiple runs straightforward.
The ability to use one host rather than a whole infrastructure.
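On a single host, the DaCapo harness is driven from one command line; a hypothetical invocation (the jar name, version, and iteration count below are placeholders) might look like:

```shell
# Run the 'eclipse' benchmark for 5 iterations in one JVM invocation;
# dacapo-9.12-bach.jar is a placeholder for whichever release you have.
java -jar dacapo-9.12-bach.jar -n 5 eclipse
```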

6 The Researcher / Do Not Cherry-Pick!
Workloads need to be relevant to the experiment; if a suitable one does not exist, create one with a consortium.
A well-designed benchmark reflects a range of application behaviors, and all results should be shown so the findings are not skewed.

7 Experimental Design / Gaming Your Results
In addition to selecting a baseline when conducting an experiment, one must also identify the parameters that are relevant to the experiment.
Make sure your results don't mislead people.
For example, the authors cite people who compare Java garbage collectors without comparing different heap sizes.
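The heap-size pitfall can be avoided by sweeping heap sizes rather than picking a single one; a hedged shell sketch (the jar and benchmark names are placeholders) might be:

```shell
# Compare two collectors across several fixed heap sizes, so that no
# single heap choice favors one collector. dacapo.jar and 'lusearch'
# are placeholders for a real harness and benchmark.
for heap in 64m 128m 256m 512m; do
  for gc in UseSerialGC UseG1GC; do
    java -Xms$heap -Xmx$heap -XX:+$gc -jar dacapo.jar lusearch
  done
done
```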

8 Control in a Changing World
In C/C++ and Fortran, the most important variables are the host, the compiler, and the runtime libraries.
In Java you have more variables:
Heap size and its parameters
Warm-up of the JVM or runtime environment
Nondeterminism
The Java/JIT compiler itself
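Each of these extra variables can be pinned down with HotSpot command-line flags; a sketch (app.jar is a placeholder, and exact flags vary by JVM version):

```shell
# Fix the heap so it cannot grow, and pick a specific collector,
# removing two sources of variation between runs.
java -Xms512m -Xmx512m -XX:+UseSerialGC -jar app.jar

# The extreme option: -Xint disables the JIT entirely, removing
# compiler nondeterminism at a large performance cost.
java -Xint -jar app.jar
```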

9 A Case Study
The authors designed a study to evaluate garbage collection in a JVM:
The space-time tradeoff in the heap
The relationship between the collector and the application itself
Meaningful baseline: needed to make sure the study is 'apples-to-apples'
Host platform: architecture-dependent performance properties
Language runtime: libraries and the JIT compiler behave differently and should be controlled

10 A Case Study (cont.)
Heap size: since the authors are studying GC, several heap sizes should be used, because GC can behave differently at each.
Warm-up: as more iterations occur, less compiling and class loading occurs, yielding more stable results.
Controlling nondeterminism:
use deterministic replay of optimization plans
take multiple measurements in a single JVM invocation, after warm-up
generate sufficient data points and apply suitable statistical analysis
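The warm-up and repeated-measurement discipline above can be sketched in plain Java: discard some iterations while the JIT warms up, then time several iterations in the same JVM invocation and summarize them. The workload here (summing an array of integers) is a stand-in for a real benchmark:

```java
public class WarmMeasure {
    // Placeholder workload standing in for a real benchmark iteration.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    // Run discarded warm-up iterations, then timed ones; returns times in ms.
    public static double[] measure(int warmup, int measured) {
        for (int i = 0; i < warmup; i++) workload();   // discarded: JIT warms up
        double[] times = new double[measured];
        for (int i = 0; i < measured; i++) {
            long t0 = System.nanoTime();
            workload();
            times[i] = (System.nanoTime() - t0) / 1e6;
        }
        return times;
    }

    public static void main(String[] args) {
        double[] t = measure(10, 20);
        double mean = 0;
        for (double x : t) mean += x;
        mean /= t.length;
        double var = 0;
        for (double x : t) var += (x - mean) * (x - mean);
        double sd = Math.sqrt(var / (t.length - 1));
        System.out.printf("mean %.3f ms, sd %.3f ms (n=%d)%n", mean, sd, t.length);
    }
}
```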

11 Analysis
Data analysis is:
Looking at repeated experiments to defeat experimental noise
Looking at diverse experiments to draw conclusions
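One common way to summarize repeated experiments against noise is a confidence interval over the run times; a minimal sketch using the normal approximation (z = 1.96, which assumes a reasonable number of repetitions; the sample values are made up):

```java
public class ConfInterval {
    // 95% confidence interval for the mean, normal approximation.
    public static double[] ci95(double[] samples) {
        double mean = 0;
        for (double s : samples) mean += s;
        mean /= samples.length;
        double var = 0;
        for (double s : samples) var += (s - mean) * (s - mean);
        double se = Math.sqrt(var / (samples.length - 1))
                / Math.sqrt(samples.length);        // standard error of the mean
        return new double[] { mean - 1.96 * se, mean + 1.96 * se };
    }

    public static void main(String[] args) {
        double[] runtimes = { 101, 98, 103, 99, 100, 102 }; // ms, made-up data
        double[] ci = ci95(runtimes);
        System.out.printf("95%% CI: [%.2f, %.2f] ms%n", ci[0], ci[1]);
    }
}
```

If the intervals of two configurations overlap heavily, the data does not support claiming one is faster.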

12 Conclusion
Sound methodology relies on:
relevant workloads
principled experimental design
rigorous analysis
The underlying point of the article is to control the variables within the experiment's environment.

