R Byte Code Optimization Compiler (1), March 7, 2012


Outline

• Basic Compiling Process
• Compiler Structure
• Decoding Pass
• Type Annotation Pass
• Unbox Opportunity Identification Pass

Basic Byte-Code Optimization Compiling Process

• Typical compiler structure
  [Figure: INTSXP (original byte-code seq) → decode pass → BC_STMTP (base stmts) → several passes, driven by the profile table and the optimization rules → BC_STMTP (opt stmts) → encode pass → INTSXP (new byte-code seq)]
  Byte-code seq shown in the figure: Addr of GETVAR, 1, Addr of LDCONST, 2, Addr of ADD, 3, Addr of SETVAR, 4, Addr of POP, Addr of GETFUN, 5, Addr of MAKEPROM, 6, Addr of CALL, 7, Addr of RETURN
• The passes work on a new IR for the R byte-code

Compiler Structure – New Components Required

• A new IR is needed
  – The current R byte-code is just instruction opcodes and operands in an int array
      No additional information is attached
      Hard to manipulate
  – A very simple IR for the current optimization requirements
      Attach profile information
      Attach type information
      Support simple type inference
      Support simple unbox opportunity identification
• Basic compiler infrastructure
  – Pass definitions
  – Engine for running all the passes
  – Stmt printer: prints stmts as human-readable text

Compiler Structure – IR

• IR and Stmts

    typedef struct {
        enum OP_CODE opcode;    //the op_code
        char* op_name;          //name of the instruction
        unsigned operand_num;   //number of operands
        unsigned stack_use;     //number of operands consumed on stack
        unsigned stack_gen;     //number of operands produced on stack
        int need_profile;       //whether the instruction need profile
        void* addr;             //used for decode and encode
    } BC_INSTR, *BC_INSTRP;     //the instruction

    typedef struct {
        BC_INSTRP instr;        //pointer to the instruction
        unsigned pc;            //pc value, relative pc value
        int operands[4];        //we only support max 4 operands. In fact, only switch has 4 operands
        int type;               //e.g. 0, unknown, logic, int, real, non-scalar. If the code gen more than one stac
        int type_source;        //e.g. fixed or from const, profile based, derived(from reasoning)
        int output_shape;       //Whether the output need box/unbox. 2bits, [need box][may unbox]
    } BC_STMT, *BC_STMTP;

Compiler Structure – Passes and Engine

• Pass function type and example

    int (*pass_fun)(BC_STMTP, SEXP, unsigned*, PT_STACKP)

    int roc_type_annotate_pass(BC_STMTP stmt,             //The current statement
                               SEXP constants,            //Constant table
                               unsigned * profile_table,  //Profile table
                               PT_STACKP type_stack);     //Current simulated stack

• Skeleton code to run a pass

    void roc_run_pass(PT_LISTP stmts, SEXP constants, unsigned * profile_table,
                      int (*pass_fun)(BC_STMTP, SEXP, unsigned*, PT_STACKP)) {
        PT_STACKP type_stack = rou_create_pointer_stack(); //Prepare the simulated stack
        for (unsigned i = 0; i < stmts->length; i++) {
            BC_STMTP stmt = rou_pointer_arraylist_get(stmts, i);
            (*pass_fun)(stmt, constants, profile_table, type_stack); //call the pass function
            //Update the stack: pop what the stmt consumes, push the stmt as the source of what it produces
            int stack_use = stmt->instr->stack_use;
            int stack_gen = stmt->instr->stack_gen;
            for (int j = 0; j < stack_use; j++) { rou_pointer_stack_pop(type_stack); }
            for (int j = 0; j < stack_gen; j++) { rou_pointer_stack_push(type_stack, stmt); }
        }
        rou_remove_pointer_stack(type_stack);
    }

Compiler Structure – One Pass

• Skeleton code for one pass

    int roc_type_annotate_pass(BC_STMTP stmt, SEXP constants,
                               unsigned * profile_table, PT_STACKP type_stack) {
        SEXP arg0;
        unsigned* prof_cell;
        BC_STMTP op1_stmt, op2_stmt;
        enum OP_CODE op_code = stmt->instr->opcode;
        switch (op_code) {
        case LDCONST_OP:
            ...
            break;
        case GETVAR_OP:
            ...
            break;
        case DDVAL_OP:
            ...
            break;
        default:
            break;
        }
        return xxx;
    }

Decoding Pass

• Transform the original int array into stmts
  – Transform each addr back to its opcode
  – Organize opcodes and operands into stmts

Input:

    #Instructions Vec
    7L,                        # code version
    Addr of LDCONST.OP, 1L,
    Addr of SETVAR.OP, 2L,
    Addr of POP.OP,
    Addr of GETVAR.OP, 2L,
    Addr of LDCONST.OP, 3L,
    Addr of ADD.OP, 4L,
    Addr of SETVAR.OP, 5L,
    Addr of POP.OP,
    Addr of GETFUN.OP, 6L,
    Addr of MAKEPROM.OP, 7L,
    Addr of CALL.OP, 8L,
    Addr of RETURN.OP

Output:

    PC  STMT
     1  LDCONST, 1    Type:Unknown, Type Source:Fixed
     3  SETVAR, 2     Type:Unknown, Type Source:Fixed
     5  POP           Type:Unknown, Type Source:Fixed
     6  GETVAR, 2     Type:Unknown, Type Source:Fixed
     8  LDCONST, 3    Type:Unknown, Type Source:Fixed
    10  ADD, 4        Type:Unknown, Type Source:Fixed
    12  SETVAR, 5     Type:Unknown, Type Source:Fixed
    14  POP           Type:Unknown, Type Source:Fixed
    15  GETFUN, 6     Type:Unknown, Type Source:Fixed
    17  MAKEPROM, 7   Type:Unknown, Type Source:Fixed
    19  CALL, 8       Type:Unknown, Type Source:Fixed
    21  RETURN        Type:Unknown, Type Source:Fixed
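The two decoding steps above (address back to opcode, then grouping operands into stmts) can be sketched as follows. This is only an illustration under stated assumptions, not the author's code: the `enum op_code` values, the `InstrInfo` table, the `Stmt` struct and the function names are hypothetical, and plain integers stand in for the handler addresses stored in the real byte-code vector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode subset, for illustration only. */
enum op_code { OP_LDCONST, OP_SETVAR, OP_POP, OP_GETVAR, OP_ADD, OP_RETURN, OP_COUNT };

/* One entry per opcode: a stand-in handler address and the operand count. */
typedef struct { int addr; unsigned operand_num; } InstrInfo;

static const InstrInfo instr_table[OP_COUNT] = {
    [OP_LDCONST] = {1000, 1},
    [OP_SETVAR]  = {1001, 1},
    [OP_POP]     = {1002, 0},
    [OP_GETVAR]  = {1003, 1},
    [OP_ADD]     = {1004, 1},
    [OP_RETURN]  = {1005, 0},
};

/* Simplified statement: opcode, position in the vector, operands. */
typedef struct { enum op_code opcode; unsigned pc; int operands[4]; } Stmt;

/* Map an address back to its opcode by linear search; -1 if unknown. */
static int addr_to_opcode(int addr) {
    for (int op = 0; op < OP_COUNT; op++)
        if (instr_table[op].addr == addr) return op;
    return -1;
}

/* Decode code[0..len) into out[]; code[0] is the version word.
 * Returns the number of stmts, or -1 on malformed input or overflow of out[]. */
int decode(const int *code, size_t len, Stmt *out, size_t max_out) {
    assert(code != NULL && out != NULL);
    size_t n = 0;
    size_t pc = 1;                       /* skip the version word at index 0 */
    while (pc < len) {
        int op = addr_to_opcode(code[pc]);
        if (op < 0 || n == max_out) return -1;
        unsigned k = instr_table[op].operand_num;
        if (pc + k >= len) return -1;    /* operands would run past the end */
        Stmt s = { (enum op_code)op, (unsigned)pc, {0, 0, 0, 0} };
        for (unsigned j = 0; j < k; j++) s.operands[j] = code[pc + 1 + j];
        out[n++] = s;
        pc += 1 + k;                     /* opcode slot plus its operands */
    }
    return (int)n;
}
```

Because every stmt records the index of its opcode slot, the PC column falls out of the walk directly, which matches how the PC values in the listing above advance by one plus the operand count.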

Type Annotation Pass

• A very simple type inference engine
  – Input: the profile table and some simple rules
  – Output: the type of the object on top of the stack after executing the stmt
• Optimize the runtime profiling with simple type inference
  – Reduces the runtime profiling requirements

Three cases from the figure:
  – LDCONST, 1: just check the constant's type statically.
  – GETVAR, 2 and CALL, 8: the type on top of the stack is unknown ([???]), so we must profile to get the type.
  – ADD, 4: if we know the types of the objects on top of the stack before the add, we can infer the output's type statically.
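The ADD case above can be sketched as a small static rule. The type tags and the function name are hypothetical, and the rule itself is an assumption modelled on R's arithmetic coercion for scalars (logical + logical yields integer, anything mixed with real yields real); the slides do not spell out the exact rule the author uses.

```c
#include <assert.h>

/* Hypothetical type tags, ordered so that a wider scalar type has a larger value. */
typedef enum { T_UNKNOWN = 0, T_LOGICAL, T_INT, T_REAL, T_NON_SIMPLE } SimpleType;

/* Assumed rule for ADD on the two stack operands:
 *  - an unknown operand means the result is unknown (fall back to profiling);
 *  - a non-simple operand means the result is non-simple;
 *  - otherwise the result is the wider scalar type, with logical promoted to int. */
SimpleType infer_add_type(SimpleType lhs, SimpleType rhs) {
    assert(lhs <= T_NON_SIMPLE && rhs <= T_NON_SIMPLE);
    if (lhs == T_UNKNOWN || rhs == T_UNKNOWN) return T_UNKNOWN;
    if (lhs == T_NON_SIMPLE || rhs == T_NON_SIMPLE) return T_NON_SIMPLE;
    SimpleType wider = lhs > rhs ? lhs : rhs;
    return wider == T_LOGICAL ? T_INT : wider;
}
```

With a rule like this, only stmts such as GETVAR and CALL need a profile cell; the type of ADD is derived from its two sources on the simulated stack.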

Type Annotation Pass – Output Example

    PC  STMT
     1  LDCONST, 1    Type:Real Scalar, Type Source:Constant
     3  SETVAR, 2     Type:Real Scalar, Type Source:Derived
     5  POP           Type:Non-Simple Type, Type Source:Derived
     6  GETVAR, 2     Type:Real Scalar, Type Source:Profiled[0, 0, 1, 0]
     8  LDCONST, 3    Type:Real Scalar, Type Source:Constant
    10  ADD, 4        Type:Real Scalar, Type Source:Derived
    12  SETVAR, 5     Type:Real Scalar, Type Source:Derived
    14  POP           Type:Non-Simple Type, Type Source:Derived
    15  GETFUN, 6     Type:Non-Simple Type, Type Source:Fixed
    17  MAKEPROM, 7   Type:Non-Simple Type, Type Source:Fixed
    19  CALL, 8       Type:Real Scalar, Type Source:Profiled[0, 0, 1, 0]
    21  RETURN        Type:Real Scalar, Type Source:Derived

Notes:
  – Profiled[...]: from the profile; the values are counts of [Logical Scalar, Int Scalar, Real Scalar, Non-Simple Type].
  – ADD, 4: derived type, because the two objects on the stack are both real scalars.
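One way to turn a profile cell such as [0, 0, 1, 0] into a type is sketched below. The slot order follows the note above; the decision rule (use a type only when every observation fell into a single slot, otherwise treat the value as non-simple) is an assumption, since the slides only show the single-slot case. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Slot order as printed in the profile: [Logical, Int, Real, Non-simple]. */
enum { PROF_LOGICAL, PROF_INT, PROF_REAL, PROF_NON_SIMPLE, PROF_SLOTS };

/* Returns the slot index when all observations agree, PROF_NON_SIMPLE when
 * observations are mixed (assumed conservative choice), or -1 when the cell
 * has no observations at all. */
int type_from_profile(const unsigned counts[PROF_SLOTS]) {
    assert(counts != NULL);
    int seen = -1;
    for (int i = 0; i < PROF_SLOTS; i++) {
        if (counts[i] == 0) continue;
        if (seen >= 0) return PROF_NON_SIMPLE;  /* a second non-zero slot: mixed types */
        seen = i;
    }
    return seen;
}
```

For the cell [0, 0, 1, 0] this yields `PROF_REAL`, matching the "Real Scalar" annotation on GETVAR, 2 and CALL, 8.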

Unbox Opportunity Identification Pass

• Two bits mark whether an opcode's output should be boxed or unboxed
  – [Must box, May unbox]
      If only "May unbox" is set, we can add an unbox after the stmt
• Identify opcodes (e.g. ADD) that can work on unboxed objects
  – Identify the source stmts that generate the objects on top of the stack
  – Mark those stmts as "May unbox"
• Identify opcodes (e.g. SETVAR, RETURN) that must work on boxed objects
  – Identify the source stmts that generate the object on top of the stack
  – Mark those stmts as "Must box"

Example from the figure: for ADD, 4 the objects on top of the stack come from GETVAR, 2 and LDCONST, 3. ADD can work on unboxed objects because the two objects on the stack are real scalars, so the source stmts are marked as "May unbox".
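The marking scheme above can be sketched for a single basic block with a simulated stack of source-stmt indices. The `Kind` enum, the `Stmt` layout and the function names are hypothetical, and the stack effects are simplified assumptions (SETVAR is treated as inspecting the top of the stack without popping it, which fits the SETVAR/POP pairs in the listings).

```c
#include <assert.h>
#include <stddef.h>

enum { MAY_UNBOX = 1, MUST_BOX = 2 };   /* the two output-shape bits */
enum { MAX_DEPTH = 64 };                /* arbitrary bound for the sketch */

/* Hypothetical statement kinds; K_PRODUCER stands for LDCONST or GETVAR. */
typedef enum { K_PRODUCER, K_ADD, K_SETVAR, K_POP, K_RETURN } Kind;

typedef struct { Kind kind; int is_real_scalar; unsigned shape; } Stmt;

/* Walk one basic block, tracking which stmt produced each stack slot, and
 * mark the producers. Returns 0 on success, -1 on stack under/overflow. */
int mark_unbox(Stmt *stmts, size_t n) {
    assert(stmts != NULL);
    size_t stack[MAX_DEPTH];            /* indices of the producing stmts */
    size_t sp = 0;
    for (size_t i = 0; i < n; i++) {
        int pushes = 0;
        switch (stmts[i].kind) {
        case K_PRODUCER:
            pushes = 1;
            break;
        case K_ADD: {
            if (sp < 2) return -1;
            Stmt *a = &stmts[stack[sp - 1]];
            Stmt *b = &stmts[stack[sp - 2]];
            /* ADD accepts unboxed inputs only when both are real scalars. */
            unsigned mark = (a->is_real_scalar && b->is_real_scalar) ? MAY_UNBOX : MUST_BOX;
            a->shape |= mark;
            b->shape |= mark;
            sp -= 2;
            pushes = 1;
            break;
        }
        case K_SETVAR:                  /* needs a boxed value; leaves it on the stack */
            if (sp < 1) return -1;
            stmts[stack[sp - 1]].shape |= MUST_BOX;
            break;
        case K_RETURN:                  /* needs a boxed value; consumes it */
            if (sp < 1) return -1;
            stmts[stack[sp - 1]].shape |= MUST_BOX;
            sp -= 1;
            break;
        case K_POP:
            if (sp < 1) return -1;
            sp -= 1;
            break;
        }
        if (pushes) {
            if (sp == MAX_DEPTH) return -1;
            stack[sp++] = i;            /* this stmt is the source of the new top */
        }
    }
    return 0;
}

/* A stmt's output may be left unboxed only if "May unbox" is the only bit set. */
int can_unbox(const Stmt *s) { return s->shape == MAY_UNBOX; }
```

Using two independent bits rather than a single flag lets a later consumer veto an earlier "May unbox" mark: a value that feeds both an ADD and a SETVAR ends up with both bits set and stays boxed.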

Unbox Opportunity Identification Pass – Output Example

    PC  STMT
     1  LDCONST, 1    Type:Real Scalar, Type Source:Constant Output shape:Box
     3  SETVAR, 2     Type:Real Scalar, Type Source:Derived
     5  POP           Type:Non-Simple Type, Type Source:Derived
     6  GETVAR, 2     Type:Real Scalar, Type Source:Profiled[0, 0, 1, 0] Output shape:Unbox
     8  LDCONST, 3    Type:Real Scalar, Type Source:Constant Output shape:Unbox
    10  ADD, 4        Type:Real Scalar, Type Source:Derived Output shape:Box
    12  SETVAR, 5     Type:Real Scalar, Type Source:Derived
    14  POP           Type:Non-Simple Type, Type Source:Derived
    15  GETFUN, 6     Type:Non-Simple Type, Type Source:Fixed
    17  MAKEPROM, 7   Type:Non-Simple Type, Type Source:Fixed
    19  CALL, 8       Type:Real Scalar, Type Source:Profiled[0, 0, 1, 0] Output shape:Box
    21  RETURN        Type:Real Scalar, Type Source:Derived

Callouts on the slide:
  – Could unbox, because the value is only used in an unbox situation.
  – Must box after the add, because it will be used by "SETVAR".
  – Must box after the add, because it will be used by "RETURN".

Implementation Status and Challenges

• Status
  – Implemented the simple compiler infrastructure as described
  – Implemented the three passes as described
  – Works well on the first example (RealAdd)
• Challenges
  – There is no such infrastructure in R
      Need to implement it all
      R is implemented in pure ANSI C: not C++, no OO, no STL, …
  – My current simple run-pass engine only supports a single basic block
      Handling control flow needs a lot of additional effort
      Identify all basic blocks,
      Reverse postorder traversal,
      Iterate until stable, …
• Questions?
  – Is there a stack-VM-based compiler infrastructure available?
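Of the control-flow steps listed above, reverse postorder traversal is the easiest to sketch in plain C. The version below assumes the basic blocks have already been identified and stored in a hypothetical `Block` array with at most two successors each (fall-through and branch target); none of these names come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 32
#define MAX_SUCCS  2   /* assumed: fall-through and one branch target */

typedef struct {
    int succ[MAX_SUCCS];   /* indices of successor blocks */
    int nsucc;             /* number of valid entries in succ[] */
} Block;

/* Depth-first walk that records each block after all of its successors. */
static void dfs(const Block *blocks, int b, int *visited, int *post, int *count) {
    visited[b] = 1;
    for (int i = 0; i < blocks[b].nsucc; i++) {
        int s = blocks[b].succ[i];
        if (!visited[s]) dfs(blocks, s, visited, post, count);
    }
    post[(*count)++] = b;
}

/* Fills order[] with the blocks reachable from entry in reverse postorder
 * and returns how many there are. */
int reverse_postorder(const Block *blocks, int nblocks, int entry, int *order) {
    assert(blocks != NULL && order != NULL);
    assert(nblocks > 0 && nblocks <= MAX_BLOCKS && entry >= 0 && entry < nblocks);
    int visited[MAX_BLOCKS] = {0};
    int post[MAX_BLOCKS];
    int count = 0;
    dfs(blocks, entry, visited, post, &count);
    for (int i = 0; i < count; i++) order[i] = post[count - 1 - i];
    return count;
}
```

Visiting blocks in this order means that, apart from loop back edges, every block is processed after its predecessors, so a forward pass such as type annotation typically needs few repetitions of the "iterate until stable" step.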