Adaptive Optimization in the Jalapeño JVM
Matthew Arnold, Stephen Fink, David Grove, Michael Hind, Peter F. Sweeney
Source: UIUC

Talk overview
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Background
Three waves of JVMs:
– First: Compile each method when first encountered; use a fixed set of optimizations
– Second: Determine hot methods dynamically and compile them with more advanced optimizations
– Third: Feedback-directed optimizations
Jalapeño JVM targets the third wave, but the current implementation is second wave

Jalapeño JVM
Written in Java (core services precompiled to native code in a boot image)
Compiles at four levels: baseline, 0, 1, & 2
Compile-only strategy (no interpretation)
Yield points for quasi-preemptive thread switching

Talk progress
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Adaptive Optimization System

AOS: Design
"Distributed, asynchronous, object-oriented design" useful for managing lots of data, say the authors
Each successive pipeline stage (from raw data to compilation decisions) performs increasingly complex analysis on decreasing amounts of data
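
A minimal sketch of that staged shape, assuming hypothetical names (AosPipelineSketch, rawSamples, hotMethods); the real AOS components and queues differ, but the idea is that each stage consumes a queue filled by the previous one and forwards far less data than it reads:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch: raw samples flow through successively more selective stages.
    class AosPipelineSketch {
        // Stage 1 -> 2: many raw samples (method ids).
        static final BlockingQueue<Integer> rawSamples = new LinkedBlockingQueue<>();
        // Stage 2 -> 3: few hot-method candidates {methodId, sampleCount}.
        static final BlockingQueue<int[]> hotMethods = new LinkedBlockingQueue<>();

        // Organizer: cheap aggregation over a large volume of raw data.
        static final Thread organizer = new Thread(() -> {
            Map<Integer, Integer> counts = new HashMap<>();
            try {
                while (true) {
                    int id = rawSamples.take();
                    int c = counts.merge(id, 1, Integer::sum);
                    if (c % 100 == 0) {            // forward only plausible candidates
                        hotMethods.put(new int[]{id, c});
                    }
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        // Controller: expensive cost-benefit analysis on the few candidates.
        static final Thread controller = new Thread(() -> {
            try {
                while (true) {
                    int[] candidate = hotMethods.take();
                    // ...run the cost-benefit model, maybe enqueue a compilation plan...
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
    }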

Talk progress
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Multi-level recompilation

Multi-level recompilation: Sampling
Sampling occurs on thread switch
Thread switch triggered by clock interrupt
Thread switch can occur only at yield points
Yield points are method invocations and loop back edges
Discussion: Is this approach biased?
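
A rough sketch of what yield-point sampling could look like, with hypothetical names (YieldPointSketch, threadSwitchRequested); in the real system the compiler emits the flag test inline at method invocations and loop back edges rather than calling a helper:

    // Sketch: a clock interrupt sets a flag; compiled code tests the flag
    // at yield points. When the flag is set, the currently running method
    // is recorded as a sample and the thread switches.
    class YieldPointSketch {
        static final int MAX_METHODS = 4096;
        static volatile boolean threadSwitchRequested = false;  // set by the clock interrupt
        static final int[] methodSamples = new int[MAX_METHODS];

        // Conceptually emitted by the compiler at every yield point:
        static void yieldPoint(int currentMethodId) {
            if (threadSwitchRequested) {
                threadSwitchRequested = false;
                methodSamples[currentMethodId]++;  // attribute the sample to this method
                Thread.yield();                    // quasi-preemptive thread switch
            }
        }
    }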

Multi-level recompilation: Biased sampling
[Figure: sampling-bias examples: code with no method calls or back edges; a short method vs. a long method containing a method call]

Multi-level recompilation: Cost-benefit analysis
Method m compiled at level i; estimate:
– T_i, the expected time the program will spend executing m if m is not recompiled
– C_j, the cost of recompiling m at optimization level j, for i ≤ j ≤ N
– T_j, the expected time the program will spend executing m if m is recompiled at level j
– If, for the best j, C_j + T_j < T_i, recompile m at level j

Multi-level recompilation: Cost-benefit analysis (continued)
Estimate T_i: T_i = T_f * P_m
T_f is the future running time of the program
We estimate that the program will run for as long as it has run so far

Multi-level recompilation: Cost-benefit analysis (continued)
P_m is the percentage of T_f spent in m
P_m estimated from sampling
Sample frequencies decay over time
– Why is this a good idea?
– Could it be a disadvantage in certain cases?
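
One plausible way to maintain a decayed estimate of P_m, assuming a periodic decay pass; the decay factor and all names here are assumptions, not values from the paper:

    // Sketch: periodically multiply all sample counts by a decay factor so
    // that old behavior is forgotten; P_m is then m's share of the total.
    class DecayingSampleEstimator {
        static final double DECAY = 0.95;          // assumed decay factor
        static final double[] samples = new double[4096];
        static double total = 0.0;

        static void recordSample(int methodId) {
            samples[methodId] += 1.0;
            total += 1.0;
        }

        static void decayAll() {                   // invoked periodically
            total = 0.0;
            for (int i = 0; i < samples.length; i++) {
                samples[i] *= DECAY;
                total += samples[i];
            }
        }

        static double estimatePm(int methodId) {   // P_m: m's share of recent execution
            return total == 0.0 ? 0.0 : samples[methodId] / total;
        }
    }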

Multi-level recompilation: Cost-benefit analysis (continued)
Statically-measured speedups S_i and S_j used to determine T_j: T_j = T_i * S_i / S_j
– Statically-measured speedups?!
– Is there any way to do better?

Multi-level recompilation: Cost-benefit analysis (continued)
C_j (cost of recompilation) estimated using a linear model of compilation speed for each optimization level: C_j = a_j * size(m), where a_j is a constant for level j
Is it reasonable to assume a linear model?
OK to use statically-determined a_j?
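
Combining the last three slides, the recompilation decision can be sketched as below. Only the formulas come from the slides; the speedup table S, cost constants a, and all names are illustrative assumptions:

    // Sketch of the controller's decision rule:
    //   T_i = T_f * P_m          (future time spent in m at the current level i)
    //   T_j = T_i * S_i / S_j    (future time in m if recompiled at level j)
    //   C_j = a_j * size(m)      (cost of recompiling m at level j)
    // Recompile at the best j, i <= j <= N, iff C_j + T_j < T_i.
    class CostBenefitSketch {
        // Index 0 = baseline, 1..3 = optimization levels 0..2.
        static final double[] S = {1.0, 4.0, 5.0, 5.5};  // assumed speedups
        static final double[] a = {0.0, 0.2, 0.8, 1.6};  // assumed cost constants

        /** Returns the level to recompile at, or i if recompilation doesn't pay. */
        static int chooseLevel(int i, double timeSoFar, double pM, int methodSize) {
            double tF = timeSoFar;        // assume the program runs as long again
            double tI = tF * pM;          // expected future time in m at level i
            int best = i;
            double bestCost = tI;         // cost of doing nothing
            for (int j = i + 1; j < S.length; j++) {  // j = i never pays: C_i > 0
                double tJ = tI * S[i] / S[j];
                double cJ = a[j] * methodSize;
                if (cJ + tJ < bestCost) {
                    bestCost = cJ + tJ;
                    best = j;
                }
            }
            return best;
        }
    }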

Multi-level recompilation: Results

Multi-level recompilation: Results (continued)

Multi-level recompilation: Discussion
Adaptive multi-level compilation does better than JIT at any level in the short term
But in the long run, performance is slightly worse than JIT compilation
The primary target is server applications, which tend to run for a long time

Multi-level recompilation: Discussion (continued)
So what's so great about Jalapeño's AOS?
– The current AOS implementation gives good results for both the short and the long term
– A JIT compiler can't do both cases well because its optimization level is fixed
– The AOS can be extended to support feedback-directed optimizations, such as:
  fragment creation (as in Dynamo)
  determining if an optimization was effective

Talk progress
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Miscellaneous issues: Multiprocessing
Authors say that if a processor is idle, recompilation can be done almost for free
– Why almost for free?
– Are there situations when you could get free recompilation on a uniprocessor?

Miscellaneous issues: Models vs. heuristics
Authors moving toward an "analytic model of program behavior" and elimination of ad-hoc tuning parameters
Tuning parameters proved difficult because of "unforeseen differences in application behavior"
Is it believable that ad-hoc parameters can be eliminated and replaced with models?

Miscellaneous issues: More intrusive optimizations
The future of Jalapeño is more intrusive optimizations, such as compiler-inserted instrumentation for profiling
Advantages and disadvantages compared with the current system?
– Advantages:
  Performance gains in the long term
  Adjusts to phased behavior
– Disadvantages:
  Unlike with sampling, you can't profile all the time
  Harder to adaptively throttle overhead
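
For contrast with sampling, a hedged sketch of what compiler-inserted instrumentation amounts to (hypothetical names; real instrumentation would be emitted inline in compiled code, not as a call):

    // Sketch: the optimizing compiler plants a counter increment at each
    // instrumented program point. The profile is exact and reflects phased
    // behavior immediately, but the increment runs every time the code
    // does, which is why the overhead is harder to throttle than sampling.
    class InstrumentationSketch {
        static final long[] counters = new long[1 << 16];

        // Conceptually inserted by the compiler, e.g. on a branch edge:
        static void instrumentedPoint(int pointId) {
            counters[pointId]++;
        }
    }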

Miscellaneous issues: Stack frame rewriting
In the future, Jalapeño will support rewriting a baseline stack frame as an optimized stack frame
Authors say that rewriting an optimized stack frame as another optimized stack frame is more difficult
– Why?

Talk progress
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Feedback-directed inlining

Feedback-directed inlining: More cost-benefit analysis
Boost factor estimated:
– Boost factor b is a function of:
  1. The fraction f of dynamic calls attributed to the call edge in the sampling-approximated call graph
  2. An estimate s of the benefit (i.e., speedup) from eliminating virtually all calls from the program
– Presumably something like b = f * s
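
A tiny sketch of that guess, with hypothetical names. Worked example: if an edge receives 10% of the sampled calls (f = 0.10) and s = 1.2, then b = 0.12:

    // Sketch of the guessed boost-factor computation for one call edge:
    //   f = fraction of sampled dynamic calls attributed to the edge
    //   s = assumed speedup from eliminating virtually all calls
    //   b = f * s  (the slide's "presumably")
    class InliningBoostSketch {
        static double inliningBoost(long edgeSamples, long totalCallSamples, double s) {
            double f = (double) edgeSamples / (double) totalCallSamples;
            return f * s;   // e.g. f = 0.10, s = 1.2  =>  b = 0.12
        }
    }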

Feedback-directed inlining: Results
Discussion: Why?

Talk progress
Introduction: Background & Jalapeño JVM
Adaptive Optimization System (AOS)
Multi-level recompilation
Miscellaneous issues
Feedback-directed inlining
Conclusion

Conclusion
AOS designed to support feedback-directed optimizations (third wave)
Current AOS implementation only supports selective optimizations (second wave)
– Improves short-term performance without hurting long-term performance
– Uses a mix of cost-benefit models and ad-hoc methods
Future work will use more intrusive performance monitoring (e.g., instrumentation for path profiling, checking that an optimization improved performance)