Fast, Effective Code Generation in a Just-In-Time Java Compiler Rejin P. James & Roshan C. Subudhi CSE Department USC, Columbia.

Slides:



Advertisements
Similar presentations
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Advertisements

Part IV: Memory Management
Chapter 16 Java Virtual Machine. To compile a java program in Simple.java, enter javac Simple.java javac outputs Simple.class, a file that contains bytecode.
Target Code Generation
Instruction Set Design
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
Introduction to Memory Management. 2 General Structure of Run-Time Memory.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
1 Overview Assignment 5: hints  Garbage collection Assignment 4: solution.
C Chuen-Liang Chen, NTUCS&IE / 321 OPTIMIZATION Chuen-Liang Chen Department of Computer Science and Information Engineering National Taiwan University.
JAVA Processors and JIT Scheduling. Overview & Literature n Formulation of the problem n JAVA introduction n Description of Caffeine * Literature: “Java.
Hastings Purify: Fast Detection of Memory Leaks and Access Errors.
Chapter 8 Runtime Support. How program structures are implemented in a computer memory? The evolution of programming language design has led to the creation.
Memory Management Memory Areas and their use Memory Manager Tasks:
Java for High Performance Computing Jordi Garcia Almiñana 14 de Octubre de 1998 de la era post-internet.
Run time vs. Compile time
Assembly Language for Intel-Based Computers Chapter 2: IA-32 Processor Architecture Kip Irvine.
JVM-1 Introduction to Java Virtual Machine. JVM-2 Outline Java Language, Java Virtual Machine and Java Platform Organization of Java Virtual Machine Garbage.
Catriel Beeri Pls/Winter 2004/5 environment 68  Some details of implementation As part of / extension of type-checking: Each declaration d(x) associated.
2005 International Symposium on Code Generation and Optimization Progressive Register Allocation for Irregular Architectures David Koes
The environment of the computation Declarations introduce names that denote entities. At execution-time, entities are bound to values or to locations:
1 Copy Propagation What does it mean? – Given an assignment x = y, replace later uses of x with uses of y, provided there are no intervening assignments.
Assembly תרגול 8 פונקציות והתקפת buffer.. Procedures (Functions) A procedure call involves passing both data and control from one part of the code to.
Chapter 16 Java Virtual Machine. To compile a java program in Simple.java, enter javac Simple.java javac outputs Simple.class, a file that contains bytecode.
1 Run time vs. Compile time The compiler must generate code to handle issues that arise at run time Representation of various data types Procedure linkage.
PSUCS322 HM 1 Languages and Compiler Design II IR Code Optimization Material provided by Prof. Jingke Li Stolen with pride and modified by Herb Mayer PSU.
1 Software Testing and Quality Assurance Lecture 31 – SWE 205 Course Objective: Basics of Programming Languages & Software Construction Techniques.
CSc 453 Interpreters & Interpretation Saumya Debray The University of Arizona Tucson.
Memory Management ◦ Operating Systems ◦ CS550. Paging and Segmentation  Non-contiguous memory allocation  Fragmentation is a serious problem with contiguous.
Intro to Java The Java Virtual Machine. What is the JVM  a software emulation of a hypothetical computing machine that runs Java bytecodes (Java compiler.
Addressing Modes Chapter 11 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S.
Just-In-Time Java Compilation for the Itanium Processor Tatiana Shpeisman Guei-Yuan Lueh Ali-Reza Adl-Tabatabai Intel Labs.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 The Java Virtual Machine Yearly Programming Project.
High level & Low level language High level programming languages are more structured, are closer to spoken language and are more intuitive than low level.
Dr. José M. Reyes Álamo 1.  The 80x86 memory addressing modes provide flexible access to memory, allowing you to easily access ◦ Variables ◦ Arrays ◦
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 8: Main Memory.
CS 403: Programming Languages Lecture 2 Fall 2003 Department of Computer Science University of Alabama Joel Jones.
JIT in webkit. What’s JIT See time_compilation for more info. time_compilation.
What’s in an optimizing compiler?
Lecture 10 : Introduction to Java Virtual Machine
O VERVIEW OF THE IBM J AVA J UST - IN -T IME C OMPILER Presenters: Zhenhua Liu, Sanjeev Singh 1.
1 Introduction to JVM Based on material produced by Bill Venners.
Code Generation Gülfem Savrun Yeniçeri CS 142 (b) 02/26/2013.
CSc 453 Runtime Environments Saumya Debray The University of Arizona Tucson.
CMPE 511 Computer Architecture A Faster Optimal Register Allocator Betül Demiröz.
The x86 Architecture Lecture 15 Fri, Mar 4, 2005.
Virtual Machines, Interpretation Techniques, and Just-In-Time Compilers Kostis Sagonas
Addressing Modes Chapter 6 S. Dandamudi To be used with S. Dandamudi, “Introduction to Assembly Language Programming,” Second Edition, Springer,
Code Generation Ⅰ CS308 Compiler Theory1. 2 Background The final phase in our compiler model Requirements imposed on a code generator –Preserving the.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
Compiler Construction Code Generation Activation Records
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
CS412/413 Introduction to Compilers and Translators April 2, 1999 Lecture 24: Introduction to Optimization.
RealTimeSystems Lab Jong-Koo, Lim
©SoftMoore ConsultingSlide 1 Code Optimization. ©SoftMoore ConsultingSlide 2 Code Optimization Code generation techniques and transformations that result.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Single Static Assignment Intermediate Representation (or SSA IR) Many examples and pictures taken from Wikipedia.
More Code Generation and Optimization Pat Morin COMP 3002.
A Closer Look at Instruction Set Architectures
Compiler Construction (CS-636)
Optimization Code Optimization ©SoftMoore Consulting.
Unit IV Code Generation
CSc 453 Interpreters & Interpretation
8 Code Generation Topics A simple code generator algorithm
Systems Architecture I (CS ) Lecture 5: MIPS Instruction Set*
RUN-TIME STORAGE Chuen-Liang Chen Department of Computer Science
CSc 453 Interpreters & Interpretation
(via graph coloring and spilling)
Presentation transcript:

Fast, Effective Code Generation in a Just-In-Time Java Compiler Rejin P. James & Roshan C. Subudhi CSE Department USC, Columbia

Introduction What is JIT compilation? Java source code -> Byte code -> Native machine code Static compiler(JVM) -> interpreted (faster) JIT -> converts (comparatively slower) Efficient Java JIT compiler through various optimization techniques explained later. The Java JIT compiler for version 2.5 Intel’s Vtune for Java 2 CSCE 531 Presentation

Phases 5 major phases of Intel JIT Prepass Global Register Allocation Code Generation Code Emission Code and Patching 3 CSCE 531 Presentation

Pre-Pass phase Collects following information: depth of Java operand stack at entry of each basic block static reference count of each local variable Java operand stack locations containing references at each point where garbage collection may occur a list of those variables that alternately hold reference and non-reference values at different points in the method. 4 CSCE 531 Presentation

Code Generation Phase Lazy code selection goals: to keep intermediate values (i.e., Java operand stack values) in scratch registers to reduce register pressure and take advantage of the IA32 addressing modes by folding loads of immediate operands and accesses to memory operands into the compute instructions that use them. Auxiliary data structure called Mimic Stack. 5 CSCE 531 Presentation

Code Generation Phase 6 Operand class hierarchy CSCE 531 Presentation

Code Generation Phase Lazy code selection optimizations include: Strength Reduction Converts compare followed by branch instructions to one compare and branch instruction Elimination of redundant load after store 7 CSCE 531 Presentation

Code Generation Phase Common Sub-Expression Elimination by a fast and lightweight algorithm that focuses on extended basic blocks. expression tag represented by the pair expressions lengths of 16 bytes are compared, determined empirically Expressions in R are ‘killed’ when: Instructions that modify R Assignments that modify value of R 8 CSCE 531 Presentation

Code Generation Phase Common Sub-Expression Elimination limitations: “x+y” and “y+x” are different, If “x=w”, “x+y” and “w+y” are different, Non-contiguous byte code streams not applicable 9 CSCE 531 Presentation

Code Generation Phase IA32 Architecture: Code selector interfaces between code generation and register allocation, and the register manager. Register manager handles: 7 general-purpose scratch registers 3 caller-saved (eax, ecx, edx) – (local) 4 callee-saved (ebx, ebp, esi, edi) – (global) 10 CSCE 531 Presentation

Code Generation Phase Local Register Allocation LRU (circular allocation strategy), based on ‘mimic’ stack Global Register Allocation Algorithm 1: highest static reference counts Algorithm 2: priority-based scheme 11 CSCE 531 Presentation

Code Generation Phase Benefits of global register allocation (in prepass phase) less spill code is generated, callee-saved registers are used as spill locations (reducing stack frame accesses), more CS are found (registers with live expressions live longer) 12 CSCE 531 Presentation

Code Generation Phase Array Bounds Check Elimination Bounds checks can be eliminated if it can be proven that index is always within the correct range Eg: bounds check for A[7] is computed => A[5] bounds check need not be recomputed Also, this can be applied to the “newarray” bytecode 13 CSCE 531 Presentation

Code Generation Phase Array Bounds Check Elimination is limited as follows: applied only locally and not globally, only constant operands are used (symbolic information could further reduce bounds checks) 14 CSCE 531 Presentation

Code Generation Phase Out-of-Line Exception Throws is similar to the previous technique, but it differs as follows: subscripts are checked to be within the bounds of the array This is infrequently used and is referred to as “cold code” 15 CSCE 531 Presentation

Sample Test Scenario We will look at a code sequence from the MPEG player program, on the following slide, illustrating the following: Lazy code selection Common subexpression elimination Out-of-Line exception throwing Register allocation 16 CSCE 531 Presentation

17 Illustration of optimizations in Intel JIT CSCE 531 Presentation

Garbage Collection Freeing up heap space by deallocating memory of unreferenced variables Live reference analysis done by traversing a graph of reachable nodes from the root set Root set calculated by JIT because it has access to stack locations and registers 18 CSCE 531 Presentation

Garbage Collection Problem is “ambiguous types” which is solved by the JIT being able to precisely enumerate the complete set of variables containing valid references. This is done through: Maintaining an extra bit in the stack frame In general, not possible to analyze all variables and so a dynamic tagging approach is used Also, analyzing when a variable a) is initialized b) is live c) contains a reference 19 CSCE 531 Presentation

Experimental Analysis 20 CSCE 531 Presentation Performance of Lisp

Conclusions Optimization techniques include: Lazy code selection Common subexpression elimination Global register allocation Array bounds check elimination (memory is used sparingly because no explicit intermediate memory representation is used, only byte code is used) 21 CSCE 531 Presentation

Conclusions On the whole, the Intel JIT generates code in lesser compilation time and requires smaller memory space, which are the most important criteria to ensure fast performance, and yet high quality of the IA32 code is guaranteed. 22 CSCE 531 Presentation

References Adl-Tabatabai, Ali-Reza, Michał Cierniak, Guei-Yuan Lueh, Vishesh M. Parikh, and James M. Stichnoth. "Fast, effective code generation in a just-in-time Java compiler." ACM SIGPlAN Notices 33, no. 5 (1998): CSCE 531 Presentation

24 CSCE 531 Presentation

25 CSCE 531 Presentation