Page 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What's new in Hive 2.0 Sergey Shelukhin.

Slides:



Advertisements
Similar presentations
Oracle to MySQL Database Migration SQLWays - Migration Software Presentation Copyright (c) Ispirer Systems Ltd. All Rights Reserved.
Advertisements

Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Hive - A Warehousing Solution Over a Map-Reduce Framework.
The New Data Pump Caleb Small Next generation Import / Export New features Better performance Improved security Versatile interfaces.
SQL: Standards and Flavors A presentation for CS157B by David Wortham.
Spark 1.1 and Beyond Patrick Wendell.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 13 Introduction to SQL Programming Techniques.
Optinuity Confidential. All rights reserved. C2O Configuration Requirements.
Putting the Sting in Hive Page 1 Alan F.
Gavin Payne Oracle for SQL Server DBAs. Why Oracle? Installation Physical Storage Backup and Recovery 20 slides in 50 minutes Inside the database Programmability.
Week 5 – Chap. 5 Data Transfer DBAs often must transfer data to and from text files, Excel spreadsheets, Access, Oracle or other SQL Server databases This.
Apache Jakarta Tomcat Suh, Junho. Road Map Tomcat Overview Tomcat Overview History History What is Tomcat? What is Tomcat? Servlet Container.
Raghav Ayyamani. Copyright Ellis Horowitz, Why Another Data Warehousing System? Problem : Data, data and more data Several TBs of data everyday.
Hive – A Warehousing Solution Over a Map-Reduce Framework Presented by: Atul Bohara Feb 18, 2014.
SQL Server 2005 SP2 Israeli SQL Server User Group March 2005 Ami Levin
SQL on Hadoop. Todays agenda Introduction Hive – the first SQL approach Data ingestion and data formats Impala – MPP SQL.
Phil Brewster  One of the first steps – identify the proper data types  Decide how data (in columns) should be stored and used.
HADOOP ADMIN: Session -2
Software and Services Group SQL (92 and Beyond) Support for Hive Jason Dai Principal Engineer Intel SSG (Software and Services Group)
Chris Hyzer University of Pennsylvania
SQL Server Integration Services (SSIS) Presented by Tarek Ghazali IT Technical Specialist Microsoft SQL Server (MVP) Microsoft Certified Technology Specialist.
SQL Server to MySQL Database Migration SQLWays - Migration Software Presentation March 2009 Copyright (c) Ispirer Systems Ltd.
Database Design for DNN Developers Sebastian Leupold.
9/10/20151 Hyperion Enterprise 6.5 New Features & Functionality Robert Cybulski, CPA Finit Solutions.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
What’s new in Stack 3.2 Michael Youngstrom. Disclaimer This IS a presentation – So sit back and relax Please ask questions.
State of the Elephant Hadoop yesterday, today, and tomorrow Page 1 Owen
MySQL. Dept. of Computing Science, University of Aberdeen2 In this lecture you will learn The main subsystems in MySQL architecture The different storage.
Hive Facebook 2009.
OpenACS: Porting Oracle Applications to PostgreSQL Ben Adida
CVSQL 2 The Revenge of the SQL. The present Read-only access to CVS repository logs Language is a subset of SQL XML interface for returning results Built-in.
Performance Dash A free tool from Microsoft that provides some quick real time information about the status of your SQL Servers.
© Hortonworks Inc MR / Tez Query Comparison Page 1 HW = 20 Node (48 GB RAM, 6x disk) SW = Hive Trunk (Nov ) + ORCFile + Vectorization Hive.
Data and SQL on Hadoop. Cloudera Image for hands-on Installation instruction – 2.
Stored Procedures. Definition a stored procedure is a set of Structured Query Language (SQL) statements with an assigned name that's stored in the database.
DAT 332 SQL Server 2000 Data Transformation Services (DTS) Best Practices Euan Garden Product Unit Manager SQL Server Development Microsoft Corporation.
Hive. What is Hive? Data warehousing layer on top of Hadoop – table abstractions SQL-like language (HiveQL) for “batch” data processing SQL is translated.
Page 1 © Hortonworks Inc – All Rights Reserved Bay Area Hive Contributor Meetup 16-Nov-2015 LLAP: Live Long and Process!
Capybara Hive Integration Testing. Issues We’ve Seen at Hortonworks Many tests for different permutations –e.g. does it work with Orc, with Parquet, with.
Impala. Impala: Goals General-purpose SQL query engine for Hadoop High performance – C++ implementation – runtime code generation (using LLVM) – direct.
Monitoring Hive: Metrics and WebUI
Lens Server REST API for querying and schema update JDBC Client Java Client CLI Applications – Reporting, Ad Hoc Queries OLAP Cube Metastore Hive (MR)
Cognos 8 BI Configuration, Administration, and Upgrade Cognos 8 BI.
Best Practices in Loading Large Datasets Asanka Padmakumara (BSc,MCTS) SQL Server Sri Lanka User Group Meeting Oct 2013.
Copyright © 2016 Ramez Elmasri and Shamkant B. Navathe.
SQL Injection By Wenonah Abadilla. Topics What is SQL What is SQL Injection Damn Vulnerable Web App SQLI Demo Prepared Statements.
A S P. Outline  The introduction of ASP  Why we choose ASP  How ASP works  Basic syntax rule of ASP  ASP’S object model  Limitations of ASP  Summary.
16 Copyright © 2004, Oracle. All rights reserved. Testing the Migrated Oracle Database.
C Copyright © 2009, Oracle. All rights reserved. Using SQL Developer.
Session Name Pelin ATICI SQL Premier Field Engineer.
Everything you've ever wanted to know about using Control-M to integrate any application workload September 9, 2016 David Fernandez Senior Presales Consultant.
Cameron Blashka | Informer Implementation Specialist
Jean-Philippe Baud, IT-GD, CERN November 2007
HIVE A Warehousing Solution Over a MapReduce Framework
Web Technologies IT230 Dr Mohamed Habib.
Troubleshooting SQL Server When You Cannot Access The Machine
Hardening Hive John Pullokkaran 11/16/2015
Presented by: Warren Sifre
Introduction of Week 6 Assignment Discussion
T-SQL Transact - Structured Query Language (Microsoft)
Overview of Azure Data Lake Store
Advanced PL/SQL Programing
Server & Tools Business
SSDT and Database Project Basics
February 11-13, 2019 Raleigh, NC.
Dynamic Binary Translators and Instrumenters
Introduction to Azure Data Lake
Samir Behara, Senior Developer, EBSCO
Pig Hive HBase Zookeeper
Presentation transcript:

Page 1 © Hortonworks Inc – All Rights Reserved What's new in Hive 2.0 Sergey Shelukhin

Page 2 © Hortonworks Inc – All Rights Reserved What is Hive 2.0? Split in 2015 Hive 1.* is the "more stable" line Receives the bugfixes, some features and improvements Keeps everything backward compatible Hive 2.* is the "more ambitious" line Receives the bugfixes and improvements Also receives all the major new features Deprecates the support for some older features Doesn't mean Hive 2 is unstable Where is Hive 1.3?!!

Page 3 © Hortonworks Inc – All Rights Reserved When is Hive 2.0 coming? The original plan was Dec 2015 Unrealistic – too many blockers, too many features wanting to get in blocker left (hello Eugene )! Some features and improvements about to get in RC 0 expected this week

Page 4 © Hortonworks Inc – All Rights Reserved Hive 2.0 at a (rather blurry) glance project = HIVE AND fixVersion in (2.0.0, llap, spark-branch, hbase-metastore- branch) AND fixVersion not in (1.3.0, 1.2.2, 1.2.1, 1.0.1, 1.1.1, 1.2.0, 1.1.0) AND resolution = Fixed 764 tickets (Hive 2.0 only) – 333 Sub-tasks (remember all those new features?) – 313 bugs (but we mark everything as Bug) – 99 Improvements and Tasks project = HIVE AND fixVersion in (2.0.0, llap, spark-branch, hbase-metastore- branch) AND fixVersion not in (1.2.1, 1.0.1, 1.1.1, 1.2.0, 1.1.0) AND resolution = Fixed 1193 tickets (Hive future Hive 1.3/1.2.2)

Page 5 © Hortonworks Inc – All Rights Reserved Upgraded versions Log4j 1 -> Slf4j/log4j 2 (perf gain – logging doesn't block the thread!) Calcite 1.2 -> 1.5 (new features for CBO) Tez 0.5 -> (perf gains, new features, plugins) Spark > 1.5 (perf gains, new features) (also in Hive 1.3) DataNucleus 3 -> 4, Kryo 2 -> 3, Hbase > 1.1 Parquet 1.6 -> 1.8 (1.7 is also in Hive 1.3) Thrift > 0.9.3, Avro > (also in 1.3), etc.

Page 6 © Hortonworks Inc – All Rights Reserved Breaking things Java 6 no longer supported Hadoop 1 no longer supported on Hive 2 line (is it older than Java 6?) MR is deprecated, but still supported (use Spark or Tez!) Better defaults (enforce.bucketing, metastore schema verification, etc. on by default) Tightened safety settings (fails on some unsafe casts, etc.)

Page 7 © Hortonworks Inc – All Rights Reserved New features #1 HPLSQL LLAP (beta) HBase metastore (alpha) Improvements to Hive-on-Spark Improvements to CBO

Page 8 © Hortonworks Inc – All Rights Reserved New features #2 SQL Standard Auth is the default authorization (actually works) CLI mode for beeline (WIP to replace and deprecate CLI in Hive 2.*) Codahale-based metrics (also in 1.3) HS2 Web UI Stability Improvements and bugfixes for ACID (almost production ready now) Native vectorized mapjoin, vectorized reducesink, improved vectorized GBY, etc. Improvements to Parquet performance (PPD, memory manager, etc.) ORC schema evolution (beta) Improvement to windowing functions, refactoring ORC before split, SIMD optimizations, new LIMIT syntax, parallel compilation in HS2, improvements to Tez session management, many more Did I forget something?

Page 9 © Hortonworks Inc – All Rights Reserved HPLSQL HPL/SQL is a hybrid and heterogeneous language that understands syntaxes and semantics of almost any existing procedural SQL dialect Compatible with Oracle PL/SQL, ANSI/ISO SQL/PSM (IBM DB2, MySQL, Teradata etc.), PostgreSQL PL/pgSQL (Netezza), Transact-SQL (Microsoft SQL Server and Sybase) Key SQL features Flow of Control Statements Built-in Functions Stored Procedures, Functions and Packages Exception and Condition Handling Merged into Hive as hplsql module See hplsql command, docs at

Page 10 © Hortonworks Inc – All Rights Reserved LLAP (beta in 2.0) Sub-second query execution in Hive via persistent daemons Parallel execution and IO optimizations, JIT, etc. Reduces fixed costs like container scheduling Data caching Some limitations in 2.0 (mostly worked around gracefully) Not tested well in secure clusters Tez only (API and Spark integration in progress) User guide shortly after release Demo (in 25 seconds at the end)

Page 11 © Hortonworks Inc – All Rights Reserved HBase metastore (alpha in 2.0) Getting rid of DataNucleus/RDBMS Writes that actually scale! Reads that actually scale without "direct SQL"! No more bizarre errors from different RDBMSes and different JDBC drivers! No need for separate backup solution for metadata No need to maintain upgrade scripts in future New features in progress File metadata cache in HBase with PPD inside HBase, etc. Limitations on 2.0 – rough around the edges Major limitation - no cross-entity transactions (future work with Omid) See

Page 12 © Hortonworks Inc – All Rights Reserved Hive-on-Spark improvements Dynamic partition pruning Make use of spark persistence for self-join union Vectorized mapjoin and other mapjoin improvements Parallel order by Container pre-warm Did I miss anything?

Page 13 © Hortonworks Inc – All Rights Reserved CBO New optimizations More join improvements LIMIT pushdown CBO now supplants many native Hive optimizers PPD, constant propagation, etc. Performance improvements Calcite return path – avoid repeated op tree conversions (alpha)

Page 14 © Hortonworks Inc – All Rights Reserved 30-second demo (in case you missed the previous meetup)

Page 15 © Hortonworks Inc – All Rights Reserved Questions?