Software and Services Group SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)

Presentation transcript:


2 What SQL support is needed?
More SQL-92 support for analytics
Complete SQL data type system
–Data types (e.g., Datetime, fixed precision numbers), type conversion rules & function (CAST), Datetime expressions and functions (e.g., extract, +/- interval), etc.
Full subquery support
–Subquery in WHERE clauses, correlated subquery, scalar subquery, etc.
–New expressions (EXISTS, ALL, ANY, etc.)
Complete Set operators
–DISTINCT UNION, INTERSECT, EXCEPT, etc.
Multiple-table SELECT statement
Update/delete?
–On HBase only?
(Almost) SQL-92 compliance?
How about transaction?
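As an illustration of the "multiple-table SELECT statement" item, here is a minimal sketch of the SQL-92 form (several tables in FROM, join condition in WHERE) next to the explicit JOIN form. It runs against SQLite through Python's standard sqlite3 module rather than Hive; the table names, columns, and rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory SQLite database stands in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# SQL-92 multiple-table SELECT: tables listed in FROM, join condition in WHERE.
multi_table = conn.execute("""
    SELECT c.name, o.total
    FROM customers c, orders o
    WHERE c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# The same query written with an explicit JOIN ... ON clause.
explicit_join = conn.execute("""
    SELECT c.name, o.total
    FROM customers c JOIN orders o ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

print(multi_table)
print(multi_table == explicit_join)
```

Both statements return the same rows, which is why a translator can map the first form onto the second.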

3 What SQL support is needed (continued)?
Additional analytics support (beyond SQL-92)
Advanced OLAP functions for analysis & reporting
–E.g., rank, rollup, cube, window function (SQL 2003), etc.
Advanced SQL syntax
–E.g., WITH clause (SQL-99)
Procedural extensions
–E.g., Begin, End, If…Then...Else, Loop/Exit/Continue, etc.
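The WITH clause item can be illustrated with a small sketch: a named subquery is defined once and referenced twice, and the same result is obtained by repeating the subquery inline. The example uses SQLite via Python's sqlite3 module, not Hive, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 5), ('north', 50);
""")

# WITH clause (SQL-99): region_totals is defined once and used in two places.
with_rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > (SELECT AVG(total) FROM region_totals)
    ORDER BY region
""").fetchall()

# Without WITH, the aggregation has to be spelled out twice as derived tables.
inline_rows = conn.execute("""
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    WHERE total > (
        SELECT AVG(total)
        FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    )
    ORDER BY region
""").fetchall()

print(with_rows)
```

The regions whose totals exceed the average regional total are returned by both forms.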

4 Workload Analysis
Columns: TPC-H, TPC-DS
Complex Subquery: Y, Y
Multiple-table SELECT: Y, Y
Set operators: Y
SQL data types (especially Datetime): Y, Y
Advanced OLAP functions (e.g., rank, grouping and window functions): Y
WITH clause (SQL-99): Y
UPDATE/DELETE: Y

5 Let’s Get Our Hands Dirty
[Diagram: Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution]
(Almost) SQL-compliant Hive parser
A lot of work: SQL much more complex than HiveQL
–HiveQL grammar file: ~61KB with 2487 lines
–SQL (with PL/SQL extensions) grammar file: ~524KB with 8583 lines
Also complex: many existing Hive grammar rules need to be changed
–To support more complex SQL constructs (e.g., subquery)
UDF/UDAF/UDTF
–For some operators (e.g., rank)
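To make the "UDF for some operators (e.g., rank)" point concrete, the sketch below shows the semantics such a function would need to reproduce: SQL RANK assigns tied values the same rank and then skips ahead. This is plain Python for illustration only. It is not Hive's UDF API (which is Java-based), and the function name `sql_rank` is hypothetical.

```python
from typing import Iterable, List, Tuple


def sql_rank(values: Iterable[float], descending: bool = True) -> List[Tuple[float, int]]:
    """Return (value, rank) pairs following SQL RANK semantics (ties share a rank, gaps follow)."""
    ordered = sorted(values, reverse=descending)
    ranked: List[Tuple[float, int]] = []
    for position, value in enumerate(ordered, start=1):
        if ranked and ranked[-1][0] == value:
            # Tie with the previous row: reuse its rank.
            rank = ranked[-1][1]
        else:
            # New value: rank is the 1-based position, which leaves a gap after ties.
            rank = position
        ranked.append((value, rank))
    return ranked


scores = [70, 90, 90, 50]
print(sql_rank(scores))
```

With two rows tied at 90, both receive rank 1 and the next row receives rank 3.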

6 Let’s Get Our Hands Dirty
[Diagram: Query → Parser → AST (Abstract Syntax Tree) → Semantic Analyzer (Optimizer) → Execution Plan → Execution]
Analysis, transformation & optimization
–SQL data type system
–Subquery support (incl. subquery unnesting)
–Multiple-table SELECT
–Set operations
–Advanced OLAP functions
–…
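Subquery unnesting can be sketched as a rewrite from a correlated EXISTS subquery to a join against the de-duplicated inner result. The example below checks on SQLite (via Python's sqlite3 module) that both forms return the same rows. The schema and data are hypothetical, and this is one textbook rewrite rather than a description of the transformation the project actually implements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER, name TEXT);
    CREATE TABLE employees (id INTEGER, dept_id INTEGER, salary INTEGER);
    INSERT INTO departments VALUES (1, 'eng'), (2, 'ops'), (3, 'sales');
    INSERT INTO employees VALUES (1, 1, 150), (2, 1, 200), (3, 2, 80), (4, 3, 120);
""")

# Nested form: correlated subquery in the WHERE clause.
nested = conn.execute("""
    SELECT d.name
    FROM departments d
    WHERE EXISTS (
        SELECT 1 FROM employees e
        WHERE e.dept_id = d.id AND e.salary > 100
    )
    ORDER BY d.name
""").fetchall()

# Unnested form: join against the distinct matching keys.
# DISTINCT keeps 'eng' from appearing twice (it has two matching employees).
unnested = conn.execute("""
    SELECT d.name
    FROM departments d
    JOIN (SELECT DISTINCT dept_id FROM employees WHERE salary > 100) e
      ON e.dept_id = d.id
    ORDER BY d.name
""").fetchall()

print(nested)
print(nested == unnested)
```

The join form needs only constructs that a join-based engine already executes, which is the motivation for unnesting during semantic analysis.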

7 How to Leverage Existing Works?
Project Panthera: Our open source efforts to enable better analytics capabilities on Hadoop/HBase
A SQL engine for Hive MapReduce
Goal: full analytical SQL support for OLAP
–Subquery in WHERE clause
–Correlated subquery
–Multiple-table SELECT statement
–…
[Diagram: two front ends feeding the Hive Semantic Analyzer and Hadoop MR. HiveQL path: Hive Parser, Hive-AST. SQL path: (Open Source) SQL Parser*, SQL-AST, SQL-AST Analyzer & Translator (Multi-Table SELECT, Subquery Unnesting, …), Hive-AST. Hive Semantic Analyzer additions: INTERSECT Support, MINUS Support, … Other labels: Driver, Query.]
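The INTERSECT and MINUS items can be illustrated by showing that, for non-NULL columns, both operators have join-based equivalents. The sketch below compares them on SQLite through Python's sqlite3 module. SQLite spells MINUS as EXCEPT, the tables `a` and `b` are hypothetical, and the slides do not say that the project implements set operators through this rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER NOT NULL);
    CREATE TABLE b (x INTEGER NOT NULL);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Native set operators.
intersect_rows = conn.execute(
    "SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x"
).fetchall()
except_rows = conn.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x"
).fetchall()

# INTERSECT as an inner join with duplicate elimination.
intersect_via_join = conn.execute("""
    SELECT DISTINCT a.x
    FROM a JOIN b ON a.x = b.x
    ORDER BY a.x
""").fetchall()

# MINUS/EXCEPT as a left outer join that keeps only unmatched rows.
except_via_join = conn.execute("""
    SELECT DISTINCT a.x
    FROM a LEFT OUTER JOIN b ON a.x = b.x
    WHERE b.x IS NULL
    ORDER BY a.x
""").fetchall()

print(intersect_rows, except_rows)
```

The NOT NULL constraints matter: set operators treat NULLs as equal to each other while join predicates do not, so nullable columns would need extra handling.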

8 How to Leverage Existing Works?
NextR Hive UDFs
–UDFs for Oracle db extensions (rank, decode, nvl, etc.)
SQL windowing functions for Hive
