Published by Jorden Branson
Feedback Directed Dynamic Recompilation for Statically Compiled Languages
Dorit Nuzman, Sergei Dyshel, Revital Eres
IBM Research, Haifa
Thematic Session on Dynamic Compilation, HiPEAC Computing Systems Week
Paris, May 3rd, 2013
© 2009 IBM Corporation
Motivating Scenario
[Diagram: (IBM's) customer; Independent Software Vendor (ISV); Computer System Vendor (e.g., IBM); third-party software owned by some ISV; Power780 server; performance problem]
Options considered:
– Increase target platform level?
– Increase optimization level?
– Apply feedback-directed optimization?
Answers shown on the slide: No. Nope. Can't do.
Motivating Scenario (continued)
[Diagram: Program Source Code; Static Compiler (opt = -O2, arch = common, no-profile); Fat Binary (Native machine code, Intermediate Representation); Dynamic execution stage (Runtime Engine, Profiler, JIT compiler)]
Parties: (IBM's) customer; Independent Software Vendor; Computer System Vendor (e.g., IBM)
Setting: Power780 server, performance problem
Our approach: Fat Binary based, feedback-directed, dynamic recompilation
[Diagram: Program Source Code; Static Compiler; Fat Binary (Native machine code, Intermediate Representation); Dynamic execution stage (Runtime Engine, Profiler, JIT compiler); selective profile-driven recompilation]
– Used for years in dynamic languages & Java
– Needed also for static languages
– Opposed to dynamic binary optimization:
  – includes high-level semantic information
  – allows aggressive, speculative transformations
Background
Modern compilers provide sophisticated optimizations:
– O3 (O4, O5)
– Inter-procedural
– Auto-vect/par
– Feedback-directed
– Hardware-specific
These optimizations are usually not used.
– Only in benchmarking and HPC
Reasons listed on the slide:
– Complicates build process
– Prolongs development & testing cycle
– Requires per-customer tuning: too costly
– No representative input
We can gain back the lost performance benefit by applying the optimizations dynamically, at runtime.
Dynamic Recompilation
Solves the static-compiler usability issue
– Transparent feedback-directed optimization for current workload
– Tuning for current hardware
– Separation of optimization from software production
Allows adaptive optimization.
Allows iterative optimization.
Virtualization & Cloud: physical resources are known only at runtime, and change continuously.
Dynamic Recompilation for Static Languages
Other approaches:
– Focus only on very long running programs with heavy workloads, to compensate for time spent profiling
– Focus on optimization across consecutive runs of repetitive programs
– Domain specific (focus on a specific optimization, applied to a small pre-selected part of the code)
– Trace-based binary optimization
Our goal: Demonstrate an execution environment with overheads that are low enough to allow the dynamic optimizer to speed up execution of the current invocation, for regular programs/workloads.
Our approach: Fat Binary based, feedback-directed, dynamic recompilation
[Diagram: Program Source Code; Static Compiler; Fat Binary (Native machine code, Split-IR); Dynamic execution stage (Runtime Engine, Profiler, JIT compiler)]
Runtime Monitoring and Recompilation timeline
[Figure: timeline from t0 to t9 with two threads, an execution and sampling thread and a recompilation thread. Method versions shown: original, instrumented, optimized. Phases shown: sampling-based profiling for method hotness, instrumentation, instrumentation-based profiling, optimization.]
Costs marked on the timeline:
– Startup cost (loading & mapping)
– Monitoring overhead
– Recompilation cost
– Slow instrumented execution
– Synchronization cost
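The progression of a method through the three versions named on the timeline can be sketched as a small state machine. This is only an illustration, not the authors' runtime engine: the names `MethodState`, `on_sample`, `on_instrumented_call` and both threshold values are assumptions made for the example, and a real system would perform the recompilation on a separate thread.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Version(Enum):
    ORIGINAL = auto()      # statically compiled code shipped in the fat binary
    INSTRUMENTED = auto()  # recompiled with profiling counters (slower)
    OPTIMIZED = auto()     # recompiled using the collected profile


@dataclass
class MethodState:
    name: str
    version: Version = Version.ORIGINAL
    samples: int = 0         # timer samples that landed in this method
    profiled_calls: int = 0  # invocations observed while instrumented


# Hypothetical thresholds, chosen only for illustration.
HOT_SAMPLES = 3
PROFILE_CALLS = 2


def on_sample(m: MethodState) -> None:
    """Sampling side: count a hit and promote a hot method to instrumented."""
    m.samples += 1
    if m.version is Version.ORIGINAL and m.samples >= HOT_SAMPLES:
        m.version = Version.INSTRUMENTED


def on_instrumented_call(m: MethodState) -> None:
    """Instrumented code: gather profile data, then switch to optimized."""
    if m.version is not Version.INSTRUMENTED:
        return
    m.profiled_calls += 1
    if m.profiled_calls >= PROFILE_CALLS:
        m.version = Version.OPTIMIZED
```

Keeping the instrumented phase short (a small `PROFILE_CALLS`) is one way to bound the "slow instrumented execution" cost listed on the slide.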
SPECint2006: Dynamic Optimization Overheads – ref dataset
Stress test 1: using a highly statically-optimized executable (-O3 -qhot)
Overall not degrading performance.
SPECint2006: Dynamic Optimization Overheads – train dataset
Stress test 2: using a highly statically-optimized executable (-O3 -qhot)
Works also for very short running programs.
Currently limited gain from FDO alone.
Optimization effect (isolated from overheads)
(1) Using a sampled profile gives a similar impact to using a perfect profile; the problem is not in the profile quality.
(2) The offline optimizer applies link-time FDO (across methods and modules). Our optimizer is currently limited to a single module.
[Diagram: Program Source Code; Static Compiler (opt = -O2, arch = common, no-profile); Fat Binary (Native machine code, Intermediate Representation); Dynamic execution stage (Runtime Engine, Profiler, JIT compiler); (IBM's) customer; Independent Software Vendor; Computer System Vendor (e.g., IBM); Power780 server]
Programs are statically under-optimized / moderately-optimized.
SPECint2006: Overall Effect of Dynamic Execution (ref)
Moderately-optimized scenario (program statically compiled with -O2)
Selected methods from the program are dynamically recompiled using a higher optimization level.
Overall 7% improvement on average
Recompilation Statistics
Moderately-optimized scenario (program statically compiled with -O2)
Selected methods from the program are dynamically recompiled using a higher optimization level.
– Default recompilation mode (default method hotness threshold): overall 7% improvement on average
– Aggressive recompilation mode (lower method hotness threshold): overall 8% improvement on average
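The difference between the two recompilation modes comes down to where the hotness threshold sits. A minimal sketch, assuming hotness is measured as a method's share of all samples; the function name, the sample counts and both threshold values are hypothetical and are not taken from the slides:

```python
def select_hot_methods(sample_counts, threshold):
    """Return methods whose share of all samples is at least `threshold`,
    hottest first."""
    total = sum(sample_counts.values())
    if total == 0:
        return []
    hot = [name for name, n in sample_counts.items() if n / total >= threshold]
    return sorted(hot, key=lambda name: sample_counts[name], reverse=True)


# Hypothetical sample counts per method.
samples = {"parse": 60, "eval": 25, "emit": 10, "init": 5}

# Hypothetical thresholds: the aggressive mode simply uses a lower one.
DEFAULT_THRESHOLD = 0.20
AGGRESSIVE_THRESHOLD = 0.08

default_set = select_hot_methods(samples, DEFAULT_THRESHOLD)
aggressive_set = select_hot_methods(samples, AGGRESSIVE_THRESHOLD)
```

Lowering the threshold recompiles more methods, which trades extra recompilation cost for a chance at more speedup.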
More Benchmarks: SQLite
SQLite:
– Static version compiled with default compiler options: -O2 warm.
– Using 1G of TPC-H tables (smallest dataset)
– Using TPC-H queries: stream of 13 instances of query #1
13% improvement from dynamic FDO
Most improvement comes from the higher optimization level.
Summary and Conclusions
Overall cost of the runtime optimization environment, including
– environment startup cost
– recompilation
– profiling overheads
is less than 2% on average (SPECint2006).
For highly optimized native binaries, on average, there is no overall degradation.
These low overheads imply that the fat-binary based approach is practical for real-world use-cases and workloads.
– Feedback directed optimization can easily surpass these costs
Aggressive optimization level for selected methods at runtime brings up to 20% speedup, and an 8% average speedup.
Much more potential available:
– more aggressive optimizations: loop-nest, memory-hierarchy, parallelization
– more profiling (event based?)
– more synergy with the static compiler
– more synergy with the underlying (virtual) environment, to adapt to changes
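The claim that optimization gains can surpass the runtime costs can be checked with simple arithmetic. The sketch below uses an Amdahl-style model that is an assumption of this example, not a formula from the talk: a fraction `hot_fraction` of the baseline time is spent in recompiled methods, those methods run `hot_speedup` times faster, and `overhead` is the runtime environment cost as a fraction of baseline time. Only the 2% overhead figure comes from the slides; the other inputs are hypothetical.

```python
import math


def net_speedup(hot_fraction, hot_speedup, overhead):
    """Whole-program speedup after recompiling the hot part and paying overhead."""
    new_time = (1.0 - hot_fraction) + hot_fraction / hot_speedup + overhead
    return 1.0 / new_time


def break_even_hot_speedup(hot_fraction, overhead):
    """Smallest speedup of the hot part that cancels out the overhead."""
    if overhead >= hot_fraction:
        return math.inf  # the hot part is too small to ever pay back the cost
    return hot_fraction / (hot_fraction - overhead)


# Hypothetical workload: half the time is in recompiled methods, which
# become 1.25x faster; overhead is the 2% figure quoted on the slide.
example = net_speedup(hot_fraction=0.5, hot_speedup=1.25, overhead=0.02)
needed = break_even_hot_speedup(hot_fraction=0.5, overhead=0.02)
```

Under these assumed inputs the hot methods only need to run about 4% faster for the dynamic environment to pay for itself.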
Thematic Session on Dynamic Compilation
1) What is the dynamic optimization stage? During program execution.
2) What triggers the dynamic compilation cycle? A method gets warm.
3) How are these triggers being detected? Sampling execution/PCs (via timer interrupts & code instrumentation) to monitor application behavior.
4) How/when are the above triggers being inserted? At run-time.
5) What is the recompilation scope/granularity? Method.
6) What is the target application domain? General purpose/commercial applications.
7) What is the input code for the dynamic optimization? Fat binary (binary + IR).
8) What is the programming language of the target applications? Statically compiled languages (C/C++...).
9) What specific adaptation / optimization / code-transformation is applied? General feedback-directed optimizations (BB ordering, …).
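Detecting warm methods by sampling program counters (answer 3) requires mapping each sampled PC back to the method that contains it. A minimal sketch of that attribution step, assuming a sorted table of method address ranges; the table contents, addresses and function names are hypothetical, and the timer interrupt that would deliver the PCs is not modeled here:

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start, end, name), sorted by start address,
# with `end` exclusive. Gaps between methods are allowed.
METHODS = [
    (0x1000, 0x1400, "parse"),
    (0x1400, 0x1A00, "eval"),
    (0x2000, 0x2100, "emit"),
]
STARTS = [start for start, _, _ in METHODS]


def method_for_pc(pc):
    """Return the name of the method containing `pc`, or None."""
    i = bisect_right(STARTS, pc) - 1  # last method starting at or before pc
    if i < 0:
        return None
    start, end, name = METHODS[i]
    return name if pc < end else None


def attribute_samples(pcs):
    """Count samples per method, ignoring PCs outside any known method."""
    counts = Counter()
    for pc in pcs:
        name = method_for_pc(pc)
        if name is not None:
            counts[name] += 1
    return counts
```

The binary search keeps the per-sample cost logarithmic in the number of methods, which matters when the goal is low monitoring overhead.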
Session one:
1. "Asynchronous Dynamic Code Adaptation for Generic Data-Parallel Array Programming", Clemens Grelck (U. of Amsterdam)