Adaptive Optimization in the Jalapeño JVM
M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney
Presented by Andrew Cove, 15-745, Spring 2006


Jalapeño JVM
–Research JVM developed at IBM T.J. Watson Research Center
–Extensible system architecture based on a federation of threads that communicate asynchronously
–Supports adaptive multi-level optimization with low overhead via statistical sampling

Contributions
–Extensible adaptive optimization architecture that enables online feedback-directed optimization
–Adaptive optimization system that uses multiple optimization levels to improve performance
–Implementation and evaluation of feedback-directed inlining based on low-overhead sample data
–Requires no programmer directives

Jalapeño JVM - Details
Written in Java
–Optimizations apply not only to the application and libraries, but to the JVM itself
–Bootstrapped: the boot image contains core Jalapeño services precompiled to machine code, so it doesn't need to run on top of another JVM
Subsystems
–Dynamic class loader
–Dynamic linker
–Object allocator
–Garbage collector
–Thread scheduler
–Profiler (online measurement system)
–Two compilers

Jalapeño JVM - Details
Two compilers
–Baseline: translates bytecodes directly into native code by simulating Java's operand stack; no register allocation
–Optimizing: converts bytecodes into an IR, on which it performs optimizations; linear-scan register allocation; three levels of optimization
Compile-only strategy: all methods are compiled to native code before execution
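The baseline compiler's approach can be sketched in a few lines. This is an illustrative toy (not Jalapeño's actual code): it turns a stack-bytecode sequence into naive three-address code by simulating the operand stack, mapping each stack slot to a spill location, with no register allocation at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of baseline-style code generation: simulate the Java operand
// stack, emitting one naive instruction per bytecode. Every stack slot
// gets its own storage location; nothing is kept in registers.
public class BaselineSketch {
    public static List<String> translate(String[] bytecodes) {
        Deque<Integer> stack = new ArrayDeque<>(); // slots holding live values
        List<String> code = new ArrayList<>();
        int nextSlot = 0;
        for (String bc : bytecodes) {
            String[] parts = bc.split(" ");
            switch (parts[0]) {
                case "iconst": { // push a constant
                    int s = nextSlot++;
                    code.add("slot" + s + " = " + parts[1]);
                    stack.push(s);
                    break;
                }
                case "iadd": { // pop two operands, push their sum
                    int b = stack.pop(), a = stack.pop();
                    int s = nextSlot++;
                    code.add("slot" + s + " = slot" + a + " + slot" + b);
                    stack.push(s);
                    break;
                }
                case "ireturn":
                    code.add("return slot" + stack.pop());
                    break;
                default:
                    throw new IllegalArgumentException("unhandled: " + bc);
            }
        }
        return code;
    }
}
```

The emitted code is correct but slow, which is exactly the trade-off: baseline compilation is nearly free, and the optimizing compiler is reserved for methods that prove to matter.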

Jalapeño JVM - Details
Optimizing Compiler (without online feedback)
–Level 0: optimizations performed during conversion to IR
  Copy, constant, type, and non-null propagation
  Constant folding, arithmetic simplification
  Dead code elimination
  Inlining
  Unreachable code elimination
  Elimination of redundant null checks
–Level 1:
  Common subexpression elimination
  Array bounds check elimination
  Redundant load elimination
  Inlining (size heuristics)
  Global flow-insensitive copy and constant propagation, dead assignment elimination
  Scalar replacement of aggregates and short arrays
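Two of the Level-0 transformations are easy to demonstrate concretely. This is a minimal sketch on an invented toy expression IR (names are not from Jalapeño), showing constant folding and arithmetic simplification (0 + x → x, 1 * x → x):

```java
// Toy sketch of constant folding and arithmetic simplification over a
// small expression tree. The Expr hierarchy is invented for illustration.
public class FoldSketch {
    interface Expr {}
    record Const(int v) implements Expr {}
    record Var(String name) implements Expr {}
    record Add(Expr l, Expr r) implements Expr {}
    record Mul(Expr l, Expr r) implements Expr {}

    public static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.l()), r = fold(a.r());
            if (l instanceof Const cl && r instanceof Const cr)
                return new Const(cl.v() + cr.v());        // constant folding
            if (l instanceof Const c && c.v() == 0) return r; // 0 + x -> x
            if (r instanceof Const c && c.v() == 0) return l; // x + 0 -> x
            return new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = fold(m.l()), r = fold(m.r());
            if (l instanceof Const cl && r instanceof Const cr)
                return new Const(cl.v() * cr.v());        // constant folding
            if (l instanceof Const c && c.v() == 1) return r; // 1 * x -> x
            if (r instanceof Const c && c.v() == 1) return l; // x * 1 -> x
            return new Mul(l, r);
        }
        return e; // constants and variables are already in simplest form
    }
}
```

Because these run during IR conversion, they are cheap enough to apply even at the lowest optimization level.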

Jalapeño JVM - Details
Optimizing Compiler (without online feedback)
–Level 2:
  SSA-based flow-sensitive optimizations
  Array SSA optimizations

Jalapeño JVM - Details

Jalapeño Adaptive Optimization System (AOS)
–Sample-based profiling drives optimized recompilation
–Exploits runtime information beyond the scope of a static model
–Multi-level, adaptive optimization: balances optimization effectiveness against compilation overhead to maximize performance
Three component subsystems (asynchronous threads)
–Runtime measurement
–Controller
–Recompilation
–Database (3 + 1 = 3?)

Jalapeño Adaptive Optimization System (AOS)

Subsystems – Runtime Measurement
Sample-driven program profile; possible sources: instrumentation, hardware monitors, VM instrumentation, sampling
–Sampling: timer interrupts trigger yields between threads; method-associative counters are updated at yields
–The controller is triggered when counters reach threshold levels
Data processed by organizers
–Hot method organizer: tells the controller which time-dominant methods are not yet fully optimized
–Decay organizer: decreases sample weights to emphasize recent data

Hotness
–A hot method is one in which the program spends much of its time
–Hot edges are used later to determine good call sites to inline
–In both cases, hotness is a function of the number of samples taken (in a method, or in a given callee from a given caller)
–The system can adaptively adjust hotness thresholds
  To reduce optimization during startup
  To encourage optimization of more methods
  To reduce analysis time when too many methods are hot
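The interplay of sampling, decay, and an adjustable threshold can be sketched compactly. This is a minimal sketch with invented names, not Jalapeño's implementation: samples accumulate per method, a decay organizer periodically scales all weights down so recent behavior dominates, and "hot" means the decayed weight exceeds a threshold the system may raise or lower.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: sample-based hotness with periodic decay and an adaptive
// threshold. Weights are per-method sample counts, scaled down over time.
public class HotnessSketch {
    private final Map<String, Double> weights = new HashMap<>();
    private double threshold;

    public HotnessSketch(double threshold) { this.threshold = threshold; }

    // Called at a timer-driven yield point: attribute one sample to m.
    public void sample(String m) { weights.merge(m, 1.0, Double::sum); }

    // Called periodically by a decay organizer: old samples fade,
    // so the profile emphasizes recent program behavior.
    public void decay(double factor) {
        weights.replaceAll((m, w) -> w * factor);
    }

    // The system can raise this during startup, or lower it to
    // encourage optimization of more methods.
    public void setThreshold(double t) { threshold = t; }

    public boolean isHot(String m) {
        return weights.getOrDefault(m, 0.0) > threshold;
    }
}
```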

Subsystems – Controller
Orchestrates and conducts the other components of the AOS
–Directs data monitoring
–Creates organizer threads
–Chooses whether to recompile based on profile data and a cost/benefit model

Subsystems – Controller
To recompile or not to recompile?
–For a method m at current optimization level i: let Ti be m's expected future running time if left at level i, Cj the cost of recompiling m at level j, and Tj its expected future running time after recompilation at level j
–Find the level j that minimizes Cj + Tj; if Cj + Tj < Ti, recompile m at level j
–Assume, arbitrarily, that the program will run for twice its current duration; with Pm the estimated percentage of future time spent in m, Ti = Tf · Pm

Subsystems – Controller
–The system estimates the effectiveness of each optimization level as a constant, based on offline measurements
–It models compilation speed at each optimization level as a linear function of method size
  (Is compilation time at higher optimization levels really linear?)
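The controller's decision procedure follows directly from the model above. This is an illustrative sketch: the speedup constants and compile rates below are invented stand-ins for the values the real system derives from offline measurements, and the linear compile-time model appears as size / rate.

```java
// Sketch of the controller's cost/benefit recompilation decision.
// SPEEDUP and RATE values are invented for illustration.
public class ControllerSketch {
    // speedup[k]: estimated speed of code at level k, relative to level 0
    static final double[] SPEEDUP = {1.0, 1.5, 2.0, 2.2};
    // rate[k]: compilation speed at level k (units of size per ms);
    // the linear model gives compile cost Cj = size / rate[j]
    static final double[] RATE = {1000, 300, 100, 30};

    // futureTime is Ti = Tf * Pm, the expected future time (ms) spent in
    // the method at its current level i. Returns the best level j, or i
    // itself if no recompilation pays off.
    public static int chooseLevel(int i, double futureTime, int methodSize) {
        int best = i;
        double bestTotal = futureTime; // cost of doing nothing: Ti
        for (int j = i + 1; j < SPEEDUP.length; j++) {
            double tj = futureTime * SPEEDUP[i] / SPEEDUP[j]; // Tj
            double cj = methodSize / RATE[j];                 // Cj
            if (cj + tj < bestTotal) {
                bestTotal = cj + tj;
                best = j;
            }
        }
        return best;
    }
}
```

Note the model's intended behavior: a method with little expected future time stays at its current level, because no compile cost can be recouped, while a long-running hot method justifies the highest level.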

Subsystems – Recompilation
In theory
–Multiple compilation threads invoke the compilers
–Compilation can occur in parallel with the application
In practice
–A single compilation thread: some JVM services require the master lock, so multiple compilation threads are not effective, and there is lock contention between compilation and application threads (left as a footnote!)
–Recompilation times are stored to improve the time estimates used in the cost/benefit analysis

Feedback-Directed Inlining
–Statistical samples of method calls are used to build a dynamic call graph (the call stack is traversed at yield points)
–Hot edges are identified, and caller methods are recompiled with the callee inlined (even if the caller was already optimized)
–Old edges decay over time
Adaptive Inlining Organizer
–Determines which hot edges and hot methods are worth recompiling with the call inlined
–Weights inlining rules with a boost factor, based on the number of calls along the edge and a previous study of the effects of removing call overhead; a more sophisticated heuristic is left as future work
–Seems obvious: new inlining optimizations don't eliminate old inlines
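The dynamic call graph machinery above can be sketched as a map of weighted edges. This is a toy with invented names, not Jalapeño's data structure: each stack sample increments a caller→callee edge weight, edges decay on a half-life schedule, and edges above a threshold become candidates for recompiling the caller with the callee inlined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a weighted dynamic call graph built from stack samples.
// Decay keeps stale call patterns from driving inlining decisions.
public class DcgSketch {
    private final Map<String, Double> edges = new HashMap<>();

    // Called when a stack sample observes caller invoking callee.
    public void sample(String caller, String callee) {
        edges.merge(caller + "->" + callee, 1.0, Double::sum);
    }

    // Periodic decay, analogous to the edge-weight half-life.
    public void decay(double factor) {
        edges.replaceAll((e, w) -> w * factor);
    }

    // Edges above the threshold: candidates for recompiling the
    // caller with the callee inlined.
    public List<String> hotEdges(double threshold) {
        List<String> hot = new ArrayList<>();
        edges.forEach((e, w) -> { if (w > threshold) hot.add(e); });
        return hot;
    }
}
```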

Experimental Methodology
–System: dual 333 MHz PowerPC processors, 1 GB memory
–Timer interrupts at 10 ms intervals
–Recompilation organizer runs between 2 times per second and once every 4 s
–DCG and adaptive inlining organizers run every 2.5 seconds
–Method sample half-life: 1.7 seconds; edge weight half-life: 7.3 seconds
–Benchmarks: SPECjvm98, the Jalapeño Optimizing Compiler, and the Volano chat room simulator
–Both startup and steady-state measurements

Results
–Compile-time overhead plays a large role in startup performance

Results
–Multi-level adaptive does well (and the JIT configurations carry no recompilation overhead here)

Results
–During startup, methods don't reach a high enough optimization level to benefit

Questions
–Assuming execution time will be twice the current duration is completely arbitrary, but has a nice outcome (less optimization at startup, more at steady state)
–Measurements of optimizations across phase shifts are meaningless, due to the execution-time estimation

Questions
Does it scale?
–More online-feedback optimizations mean more threads needing cycles (organizer threads, recompilation threads) and more data to measure
–Especially slow if there can be only one recompilation thread
–A more complicated cost/benefit analysis must estimate potential speedups and compilation times

Questions