Code Gen of Expr Eval in Shark

Outline
o CG Examples
o Performance Comparison (CG Expr Eval vs. Hive Expr Eval)
o CG Design & Major Class Diagram
o Implemented UDFs/Generic UDFs
o Future Work

CG Examples
Set shark.expr.cg to true or false in hive-site.xml to enable or disable the feature; the default is true.
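As a sketch, the setting could be written as an ordinary property entry in hive-site.xml. Only the property name and its default come from the slide; the surrounding layout is the standard Hadoop-style configuration format.

```xml
<configuration>
  <!-- Turns code-generated expression evaluation on (true) or off (false). -->
  <property>
    <name>shark.expr.cg</name>
    <value>true</value>
  </property>
</configuration>
```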

Performance Comparison (CG Expr Eval vs. Hive Expr Eval)
747,747,840 records / 66,909,023,675 bytes / RCFile (with LzoCodec) on 4 slave machines

Performance Comparison (CG Expr Eval vs. Hive Expr Eval) (2)
Why is CG Expr Eval faster than Hive Expr Eval?
In Hive Expr Eval:
A. Common sub-expressions are re-evaluated. E.g. in the expression concat(year(date_add(visitDate,7)), '/', month(date_add(visitDate,7)), '/', day(date_add(visitDate,7))), "date_add(visitDate,7)" is evaluated 3 times.
B. Data types are checked repeatedly at runtime. The parameter types of the "evaluate" method in GenericUDFs are unknown until runtime, so Hive Expr Eval has to keep checking the value types inside "evaluate", e.g. GenericUDFOPGreaterThan.evaluate, GenericUDFPrintf.evaluate etc.
C. Unnecessary type conversion. E.g. in the expression (duration ), the variable "duration" is first converted into a new FloatWritable object in Hive Expr Eval, which creates lots of small temporary objects (GenericUDFBridge.conversionHelper).
D. A large number of virtual function calls at runtime. Hive Expr Eval always works through base-class objects, particularly for the UDF objects and the field value objects.
E. Java reflection is used to call the UDF evaluate() method. Hive Expr Eval invokes the UDF (in class GenericUDFBridge) through the Java Reflection API, which causes another performance issue.
CG Expr Eval generates source code with concrete objects and concrete execution branches.
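Point A can be illustrated with a minimal sketch. It is not Shark's generated code: the class and method names are hypothetical, and java.time.LocalDate stands in for Hive's date functions. The first method evaluates date_add(visitDate, 7) at every occurrence, as a tree-walking evaluator would; the second hoists it into a local variable, as generated straight-line code could.

```java
import java.time.LocalDate;

// Hypothetical sketch; none of these names come from Shark or Hive.
class CommonSubExprSketch {
    // Counts how many times the date_add stand-in is evaluated.
    static int dateAddCalls = 0;

    // Stand-in for the date_add UDF.
    static LocalDate dateAdd(LocalDate date, int days) {
        dateAddCalls++;
        return date.plusDays(days);
    }

    // Interpreter style: each occurrence of date_add(visitDate, 7) is evaluated again.
    static String interpreted(LocalDate visitDate) {
        return dateAdd(visitDate, 7).getYear() + "/"
             + dateAdd(visitDate, 7).getMonthValue() + "/"
             + dateAdd(visitDate, 7).getDayOfMonth();
    }

    // Generated style: the shared sub-expression is computed once and reused.
    static String generated(LocalDate visitDate) {
        LocalDate shifted = dateAdd(visitDate, 7);
        return shifted.getYear() + "/"
             + shifted.getMonthValue() + "/"
             + shifted.getDayOfMonth();
    }

    public static void main(String[] args) {
        LocalDate visitDate = LocalDate.of(2013, 1, 28);

        dateAddCalls = 0;
        String a = interpreted(visitDate);
        System.out.println(a + " using " + dateAddCalls + " date_add calls");

        dateAddCalls = 0;
        String b = generated(visitDate);
        System.out.println(b + " using " + dateAddCalls + " date_add call");
    }
}
```

Both methods return the same string; only the number of date_add evaluations differs (three versus one).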

CG Design & Major Class Diagram

CG Design & Major Class Diagram (2)
Why not generate the bytecode directly?
A. The generated content is quite complicated; source code is much easier to debug and troubleshoot.
B. The Java compiler can apply further optimizations when it compiles the source code.
Why generate the evaluation source code from the ExprNodeDesc tree rather than from the Hive ExprNodeEvaluator tree?
A. The ExprNodeEvaluator tree loses some information that may be helpful for further optimization (e.g. evaluating common sub-expressions).
B. Extracting information from the ExprNodeEvaluator tree is rather difficult, as most of the variables in ExprNodeEvaluator are protected / private.
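The source-first approach can be sketched as follows: emit Java source as text, compile it at runtime, then load and call the result. This sketch uses the JDK's standard javax.tools API, which is an assumption; the slides do not say which compiler Shark invokes. The class names and the expression (a + b) * 2 are purely illustrative.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch; not Shark's actual code generator.
class SourceCodeGenSketch {
    // Emits readable source for the expression (a + b) * 2 with concrete int types.
    static String generateSource(String className) {
        return "public class " + className + " {\n"
             + "    public static int eval(int a, int b) {\n"
             + "        return (a + b) * 2;\n"
             + "    }\n"
             + "}\n";
    }

    static int compileAndRun(int a, int b) throws Exception {
        String className = "GeneratedEval";
        Path dir = Files.createTempDirectory("cg-sketch");
        Path src = dir.resolve(className + ".java");
        // The source file stays on disk, so it can be inspected when debugging.
        Files.write(src, generateSource(className).getBytes(StandardCharsets.UTF_8));

        // Returns null when running on a JRE without the compiler module.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("a JDK is required");
        }
        int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load the freshly compiled class from the temporary directory.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass(className);
            // Reflection is used here only to reach the new class from this demo.
            Method eval = cls.getMethod("eval", int.class, int.class);
            return (Integer) eval.invoke(null, a, b);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compileAndRun(3, 4)); // prints 14
    }
}
```

Because the intermediate artifact is plain source, a failing expression can be inspected or stepped through like hand-written code, which is the debugging benefit named in point A.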

Implemented UDFs/Generic UDFs
Supported Features:
o Relational Operators (=, !=, <, <= etc.)
o Arithmetic Operators (+, -, *, /, % etc.)
o Logical Operators (AND, OR, NOT etc.)
o Built-in Functions (UDF) and existing User-Defined Functions
o Part of the generic UDFs:
  GenericUDFBetween
  GenericUDFPrintf
  GenericUDFInstr
  GenericUDFBridge
Unsupported Features:
o Conditional Functions (if/case/when etc.)
o Map/Array
o UDAF
o UDTF
o Misc. Functions (java_method/reflect/hash etc.)

Future Work
o Compile the generated Java source once and distribute it across the cluster
o Reuse the generated .class for the same queries
o Support more generic UDFs (case/when/if etc.)
o Support collection types (Array/Map etc.)
o Code gen in aggregations