Hive Mr. Sriram Email: hadoopsrirama@gmail.com

Objectives
Hive Introduction
Hive Concepts
Understand what Hive is and its use cases
Analyse the difference between Hive & Pig
Understand Hive architecture and Hive components
Analyse the limitations of Hive
Implement primitive and complex types in Hive
Understand the Hive data model
Perform basic Hive operations
Execute Hive scripts and Hive UDFs
Implement joins in Hive
Implement dynamic partitioning
Hive indexes and views
Analyze custom Map/Reduce scripts
Create Hive UDFs

Hive Introduction
Why Hive?
Hive Architecture
Hive Configuration
Limitations
HiveQL: Data Types and Table Types
Managed Table
External Table
Storage Formats
Queries
Views
Hive Data Model
The Metastore
User Defined Functions

Hive Introduction
Hive is a data warehousing infrastructure built on top of Apache Hadoop.
Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data.
Hive provides a simple query language called HiveQL, which is similar to SQL.
HiveQL allows traditional map/reduce programmers to plug in their custom mappers and reducers.
Suitable for structured and semi-structured data.
Originally developed at Facebook.
No need to know Java or the Hadoop API.
To invoke Hive:
$ hive
hive>

Hive Introduction – Why Hive / Metastore
Why Hive?
Need a multi-petabyte warehouse.
Files are insufficient data abstractions - need tables, schemas, partitions, indices.
Need for an open data format; RDBMSs have a closed data format.
Flexible schema.
Metastore
Hive stores the schema of Hive tables in the Hive Metastore.
The Metastore holds all the information about the tables and partitions in the warehouse.
By default, the Metastore runs on an embedded Derby database.
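A minimal sketch of how the schema held in the Metastore can be inspected from the Hive shell, assuming the txnrecords and txnrecbycat tables created later in this deck:
hive> SHOW TABLES;                    -- tables registered in the Metastore
hive> DESCRIBE FORMATTED txnrecords;  -- column types, HDFS location, and table properties
hive> SHOW PARTITIONS txnrecbycat;    -- partitions recorded for a partitioned table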

Hive Introduction - Architecture
Hive Architecture
The default metastore DB is Derby | other supported databases are MySQL, Oracle, and SQL Server

Hive Introduction – Architecture..

Hive Introduction – Architecture..

Hive Introduction – Architecture..

Hive Introduction - Configuration
Configuring Hive
Download a release at ftp://ftp.nextgen.com
Unpack the tarball in a suitable place on your workstation:
% tar xzf hive-x.y.z-dev.tar.gz
Put Hive on your path:
% export HIVE_HOME=/home/EmpID/hive-x.y.z-dev
% export PATH=$PATH:$HIVE_HOME/bin
Type hive to launch the shell:
% hive
hive>

Hive Introduction - Limitations
Limitations of Hive
Sub-queries are supported only in the FROM clause, not in the WHERE clause.
Hive supports overwriting or appending data, but not updates or deletes.
Hive is designed for OLAP, not OLTP.
What Hive is not:
Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates.
Latency for Hive queries is generally high (minutes), even when the data sets involved are very small (say, a few hundred megabytes).

Hive Introduction - HiveQL

Hive Introduction – HiveQL Data Types & Table Types
Hive Data Types
Hive supports both primitive and complex data types.
Primitive Data Types
Signed integers - TINYINT, SMALLINT, INT, BIGINT
Floating point - FLOAT, DOUBLE
BOOLEAN
STRING
Complex Data Types
ARRAY, MAP and STRUCT
Hive Table Types
Hive tables are of two types:
Managed Tables
External Tables
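A minimal sketch of a table that uses the complex types listed above; the table name, columns, and delimiters are illustrative, not from the deck:
hive> CREATE TABLE employee_profile (
    >   name    STRING,
    >   skills  ARRAY<STRING>,
    >   phones  MAP<STRING,STRING>,
    >   address STRUCT<city:STRING, state:STRING>
    > )
    > ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY ','
    >   COLLECTION ITEMS TERMINATED BY '|'
    >   MAP KEYS TERMINATED BY ':';
Here '|' separates ARRAY elements and MAP entries, while ':' separates each MAP key from its value.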

Hive Introduction – HiveQL Managed / External Table

Hive Introduction – HiveQL DDL / DML
DDL Commands: CREATE, ALTER, DROP, TRUNCATE, SHOW, DESCRIBE
DML Commands: LOAD, INSERT, SELECT

Hive Introduction – HiveQL DDL Syntax
Table creation
hive> CREATE TABLE <table name> (<column name> <data type>, ...) ROW FORMAT DELIMITED FIELDS TERMINATED BY '<character>';
Alter a table (add columns)
hive> ALTER TABLE <table name> ADD COLUMNS (<column name> <data type>);
Drop a table
hive> DROP TABLE <table name>;

Hive Introduction – HiveQL DDL Syntax
Describe a table's structure
hive> DESCRIBE <table name>;
Show all tables in the database
hive> SHOW TABLES;
Load data into a Hive table
hive> LOAD DATA INPATH '<file path>' INTO TABLE <table name>;
Retrieve data from a Hive table
hive> SELECT * FROM <table name>;

Hive Introduction – HiveQL Sub Query
Hive supports sub-queries only in the FROM clause.
The columns in the sub-query's select list are available in the outer query just like the columns of a table.
Example
hive> SELECT col FROM (SELECT col1 + col2 AS col FROM table1) table2;

Hive Introduction – HiveQL Joins
Hive supports only equality joins, outer joins, and left semi joins.
Hive does not support non-equality join conditions, as such conditions are very difficult to express as a map/reduce job.
More than two tables can be joined in Hive.
Example
hive> SELECT table1.*, table2.*
    > FROM table1 JOIN table2 ON (table1.col1 = table2.col1);

Hive Introduction – HiveQL View
A view is a sort of "virtual table" defined by a SELECT statement.
Views can be used to present data to users in a different way from how it is actually stored on disk.
Syntax
hive> CREATE VIEW <view name> AS SELECT * FROM <table name> WHERE <condition>;
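A minimal sketch of a concrete view, assuming the txnrecords table created later in this deck; the view name and amount threshold are illustrative:
hive> CREATE VIEW big_txns AS
    > SELECT txnno, custno, amount, category
    > FROM txnrecords
    > WHERE amount > 100.0;
hive> SELECT * FROM big_txns LIMIT 10;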

Hive Introduction – Hive Data Model

Hive Introduction – Hive Data Model - Partitions
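The partition diagram itself is not in the transcript; as a placeholder, a minimal sketch of a statically partitioned table (table, column, and file names are illustrative):
hive> CREATE TABLE txns_by_state (txnno INT, amount DOUBLE)
    > PARTITIONED BY (state STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/user/txns_ka.txt'
    > INTO TABLE txns_by_state PARTITION (state = 'Karnataka');
Each partition is stored as its own subdirectory under the table's directory in the warehouse.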

Hive Introduction – Hive Data Model - Buckets
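Likewise, a minimal sketch of a bucketed table (names are illustrative); the hive.enforce.bucketing setting applies to older Hive releases, newer releases enforce bucketing automatically:
hive> SET hive.enforce.bucketing = true;
hive> CREATE TABLE txns_bucketed (txnno INT, custno INT, amount DOUBLE)
    > CLUSTERED BY (custno) INTO 4 BUCKETS
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> INSERT OVERWRITE TABLE txns_bucketed
    > SELECT txnno, custno, amount FROM txnrecords;
Rows are hashed on custno into 4 files, which helps with sampling and bucketed map joins.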

Hive Introduction – Hive Data Model – Buckets..

Hive Introduction – Metastore

Hive Introduction – Configuring Hive to use MySQL as the Metastore DB

Hive Introduction – Configuring Hive to use MySQL as the Metastore DB

Hive Introduction – User Defined Functions (UDF)
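A minimal sketch of registering and calling a UDF from the Hive shell; the jar path and Java class name are hypothetical:
hive> ADD JAR /home/user/myudfs.jar;   -- hypothetical jar containing the compiled UDF class
hive> CREATE TEMPORARY FUNCTION to_upper AS 'com.example.hive.udf.ToUpper';
hive> SELECT to_upper(product) FROM txnrecords LIMIT 5;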

Hive Introduction – User Defined Functions (UDF)

Hive Concepts
Understand what Hive is and its use cases
Analyse the difference between Hive & Pig
Understand Hive architecture and Hive components
Analyse the limitations of Hive
Implement primitive and complex types in Hive
Understand the Hive data model
Perform basic Hive operations
Execute Hive scripts and Hive UDFs

Hive Background

Hive Use Case @ Facebook

What is Hive?

What is Hive?..

Where to use Hive?..

Why go for Hive when Pig is there?

Why go for Hive when Pig is there?

Hive Architecture

Hive Components

MetaStore

Limitations of Hive

Abilities of Hive Query Language

Differences between Traditional RDBMS and Hive

Type System

Operators

Hive Data Models

Partitions

Buckets

Create Database and Table

Create Database and Table..

External Tables

Load Data

Queries

Managing Outputs

Hive Commands
Data Definition Language (DDL)
DDL statements are used to build and modify tables and other objects in the database.
Examples: CREATE, DROP, TRUNCATE, ALTER, SHOW, DESCRIBE statements
Data Manipulation Language (DML)
DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Examples: LOAD, INSERT statements

Hive Commands
Create a database
hive> create database retail;
Select a database
hive> use retail;
Create a table for storing transactional records
hive> create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile;
hive> create external table externaltxnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile location '/hivetable';
List the databases
hive> show databases;

Hive Commands
Load data into the table [from the local Linux file system, not HDFS]
hive> LOAD DATA LOCAL INPATH '/home/edureka/txns' OVERWRITE INTO TABLE txnrecords;
Describe the metadata or schema of the table
hive> describe txnrecords;
Count the number of records
hive> select count(*) from txnrecords;
Total spending by category of products
hive> select category, sum(amount) from txnrecords group by category;
Spending per customer, limited to 10 rows
hive> select custno, sum(amount) from txnrecords group by custno limit 10;
Filtering example: table A has 1000 rows and a column called region with the values [usa, uk, india]
hive> select col1, col2 from A where region = 'india';

Hive Commands
Copy the input data to HDFS from the local file system using the copyFromLocal command (run from the shell, not the Hive prompt)
$ hadoop fs -copyFromLocal /home/cloudera/Desktop/txn1.txt /
External Table
The CREATE EXTERNAL keyword is used to create a table and provide a location for it, so that Hive does not use its default warehouse location for this table. An EXTERNAL table points to any HDFS location for its storage rather than the default storage.
hive> create external table example_customer(custno string, firstname string, lastname string, age int, profession string) row format delimited fields terminated by '\t' LOCATION '/user/external';
Insert command
The INSERT command is used to load data into a Hive table. Insertion can be done into a table or a partition, in two ways:
INSERT OVERWRITE is used to overwrite the existing data in the table or partition.
INSERT INTO is used to append data to the existing data in a table.
hive> from customer cus insert overwrite table example_customer select cus.custno, cus.firstname, cus.lastname, cus.age, cus.profession;

Hive Commands
Partitioning & Clustering
PARTITIONED BY is used to divide a table into partitions, and each partition can be further divided into buckets using CLUSTERED BY.
hive> create table txnrecbycat(txnno INT, txndate STRING, custno INT, amount DOUBLE, product STRING, city STRING, state STRING, spendby STRING) partitioned by (category STRING) clustered by (state) INTO 10 buckets row format delimited fields terminated by ',' stored as textfile;
Enable dynamic partitioning before running the insert:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> from txnrecords txn INSERT OVERWRITE TABLE txnrecbycat PARTITION(category) select txn.txnno, txn.txndate, txn.custno, txn.amount, ..., txn.category;
Drop Table
hive> drop table customer;
DROP deletes the data and metadata for a managed table; in the case of external tables, only the metadata is deleted.

Hive Commands
Aggregation
hive> select count(DISTINCT category) from tablename;
Grouping: GROUP BY is used to group the result set by one or more columns.
hive> select category, sum(amount) from txnrecords group by category;
Duplicate table: the result of one table is stored into another table.
hive> create table newtablename as select * from oldtablename;

Hive Commands
Join commands – first create the employee and mailid tables and load the data
hive> create table employee(name string, salary float, city string) row format delimited fields terminated by ',';
hive> load data local inpath 'emp.txt' into table employee;
hive> select * from employee where name='tarun';
hive> create table mailid(name string, email string) row format delimited fields terminated by ',';
hive> load data local inpath 'email.txt' into table mailid;
emp.txt
swetha,250000,Chennai
anamika,200000,Kanyakumari
tarun,300000,Pondi
anita,250000,Salem
email.txt
swetha,swetha@gmail.com
tarun,tarun@edureka.in
nagesh,nagesh@yahoo.com
venkatesh,venki@gmail.com

Hive Commands
Join
hive> select a.name, a.city, a.salary, b.email from employee a join mailid b on a.name = b.name;
Left Outer Join
hive> select a.name, a.city, a.salary, b.email from employee a left outer join mailid b on a.name = b.name;
Right Outer Join
hive> select a.name, a.city, a.salary, b.email from employee a right outer join mailid b on a.name = b.name;
Full Outer Join
hive> select a.name, a.city, a.salary, b.email from employee a full outer join mailid b on a.name = b.name;

Hive Script

Hive Script..

Hive Script..
Step 1: Writing a script
Create a table 'product' in Hive:
create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';
Describe the table:
describe product;
Load the data into the table: create an input file that contains the records to be inserted into the table.
sudo gedit input.txt
Create a few records in the input text file, then:
load data local inpath '/home/cloudera/input.txt' into table product;

Hive Script..
Step 2: Execute the Hive script
Retrieving the data: to retrieve the data, use the select command.
command: select * from product;
Execute the Hive script using the following command:
command: hive -f /home/cloudera/sample.sql
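Putting the steps together, a minimal sketch of what /home/cloudera/sample.sql could contain; the deck does not show the file itself, so this is an assumption assembled from Step 1 and Step 2:
-- sample.sql: create the table, load the input file, and query it
create table if not exists product (productid int, productname string, price float, category string)
row format delimited fields terminated by ',';
load data local inpath '/home/cloudera/input.txt' into table product;
select * from product;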

Hive Script..

Joining Two Tables

Joining Two Tables..

Joining Two Tables..

Joining Two Tables..

Hive User Defined Functions (UDF)

Revisiting Use Cases in Healthcare

Healthcare UDF

Healthcare UDF..

Healthcare UDF..

Healthcare UDF..

Healthcare UDF..

Healthcare UDF..

Assignment

Advanced Hive
Implement joins in Hive
Implement dynamic partitioning
Hive indexes and views
Analyze custom Map/Reduce scripts
Create Hive UDFs

HiveQL: Joining two tables – Sample Tables

HiveQL: Joining two tables – Sample Tables

HiveQL: Joining two tables – Sample Tables

HiveQL: Joining two tables – Sample Tables

HiveQL: Dynamic Partitioning– Configuration
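The configuration slide content is not in the transcript; these are the settings referenced earlier in the deck for enabling dynamic partitioning:
hive> set hive.exec.dynamic.partition=true;            -- allow dynamic partition inserts
hive> set hive.exec.dynamic.partition.mode=nonstrict;  -- allow every partition column to be dynamic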

HiveQL: Dynamic Partitioning – Example
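A minimal sketch of a dynamic-partition insert, reusing the txnrecords and txnrecbycat tables defined earlier in the deck; the partition column (category) must come last in the select list:
hive> from txnrecords txn
    > INSERT OVERWRITE TABLE txnrecbycat PARTITION (category)
    > select txn.txnno, txn.txndate, txn.custno, txn.amount, txn.product,
    >        txn.city, txn.state, txn.spendby, txn.category;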

HiveQL: Dynamic Partitioning – Example

HiveQL: Running Custom Map/Reduce Scripts
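The script slides are not in the transcript; a minimal sketch of streaming rows through a custom script with TRANSFORM, where the script name and output columns are hypothetical:
hive> ADD FILE /home/user/my_mapper.py;
hive> SELECT TRANSFORM (custno, amount)
    >        USING 'python my_mapper.py'
    >        AS (custno, amount_bucket)
    > FROM txnrecords;
Hive pipes the selected columns to the script on stdin (tab-delimited) and reads the emitted columns back from stdout.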

HiveQL: Running Custom Map/Reduce Scripts

HiveQL: Running Custom Map/Reduce Scripts

HiveQL: Running Custom Map/Reduce Scripts

HiveQL: Running Custom Map/Reduce Scripts

HiveQL: User Defined Function (UDF)

Hive Index
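The index slides are not in the transcript; a minimal sketch of a compact index (names illustrative). Note that Hive's built-in indexing was removed in Hive 3.0, so this applies to older releases:
hive> CREATE INDEX txn_cat_idx ON TABLE txnrecords (category)
    > AS 'COMPACT' WITH DEFERRED REBUILD;
hive> ALTER INDEX txn_cat_idx ON txnrecords REBUILD;
hive> SHOW INDEX ON txnrecords;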

Hive Index

Hive Index

Hive Index

Hive Views

Hive Views

Hive Views

Hive: Java Client through Thrift Server

Project: Hive Scripting

Project: HDFS to MapReduce Phase

Project: MapReduce to Pig Phase

Project: Pig to Hive Phase

Thank You !!!!!!!!!!!