Hive Mr. Sriram Email: hadoopsrirama@gmail.com.


Objectives

Hive Introduction

Hive Concepts
- Understand what Hive is and its use cases
- Analyse the difference between Hive and Pig
- Understand Hive architecture and Hive components
- Analyse the limitations of Hive
- Implement primitive and complex types in Hive
- Understand the Hive data model
- Perform basic Hive operations
- Execute Hive scripts and Hive UDFs

Advanced Hive
- Implement joins in Hive
- Implement dynamic partitioning
- Hive indexes and views
- Analyze custom Map/Reduce scripts
- Create Hive UDFs

Hive Introduction
- Hive Introduction
- Why Hive?
- Hive Architecture
- Hive Configuration
- Limitations
- HiveQL
- Data Types and Table Types
- Managed Table
- External Table
- Storage Formats
- Queries
- View
- Hive Data Model
- The Metastore
- User Defined Functions

Hive Introduction
Hive is a data warehousing infrastructure built on top of Apache Hadoop.
Hive is designed to enable:
- Easy data summarization
- Ad-hoc querying
- Analysis of large volumes of data
Hive provides a simple, SQL-like query language called HiveQL.
HiveQL also lets traditional map/reduce programmers plug in their custom mappers and reducers.
Suitable for structured and semi-structured data.
Originally developed at Facebook.
Queries can be written without Java or the Hadoop API.
To invoke Hive:
$ hive
hive>

Hive Introduction – Why Hive / Metastore
Why Hive?
- Need for a multi-petabyte warehouse
- Files are insufficient data abstractions: tables, schemas, partitions and indices are needed
- Need for an open data format (RDBMSs have a closed data format)
- Flexible schema
Metastore
- Hive stores the schema of Hive tables in the Hive Metastore.
- The Metastore holds all the information about the tables and partitions that are in the warehouse.
- By default, the Metastore runs on an embedded Derby database.

Hive Introduction - Architecture
Hive Architecture
The default Metastore database is Derby. Other supported databases include MySQL, Oracle and SQL Server.

Hive Introduction – Architecture..

Hive Introduction - Configuration
Configuring Hive
1. Download a release at ftp://ftp.nextgen.com
2. Unpack the tarball in a suitable place on your workstation:
% tar xzf hive-x.y.z-dev.tar.gz
3. Put Hive on your path:
% export HIVE_HOME=/home/EmpID/hive-x.y.z-dev
% export PATH=$PATH:$HIVE_HOME/bin
4. Type hive to launch the shell:
% hive
hive>

Hive Introduction - Limitations
Limitations of Hive
- Subqueries are supported only in the FROM clause.
- Hive supports overwriting or appending data, but not row-level update or delete.
- Hive is designed for OLAP, not OLTP.
Why is Hive not suited to OLTP?
- Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates.
- Latency for Hive queries is generally very high (minutes), even when the data sets involved are very small (say, a few hundred megabytes).

Hive Introduction - HiveQL

Hive Introduction – HiveQL Data Types & Table Types
Hive Data Types
Hive supports both primitive and complex data types.
Primitive data types:
- Signed integer: TINYINT, SMALLINT, INT, BIGINT
- Floating point: FLOAT, DOUBLE
- BOOLEAN
- STRING
Complex data types:
- ARRAY, MAP and STRUCT
Hive Table Types
Hive tables are of two types:
- Managed tables
- External tables
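As an illustration of the complex types listed above, here is a minimal sketch of a table that mixes primitive and complex columns. The table name, column names and delimiters are assumptions made for this example and do not come from the slides.

```sql
-- Hypothetical table: all names and delimiter choices are illustrative.
CREATE TABLE employee_profile (
  name    STRING,
  skills  ARRAY<STRING>,                     -- ordered list of values
  phones  MAP<STRING, STRING>,               -- key/value pairs, e.g. 'home' -> number
  address STRUCT<city:STRING, state:STRING>  -- named fields
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';

-- Access an array element by index, a map value by key, a struct field by name.
SELECT name,
       skills[0],
       phones['home'],
       address.city
FROM employee_profile;
```

Array elements are indexed from 0, map values are looked up by key, and struct fields use dot notation.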

Hive Introduction – HiveQL Managed / External Table

Hive Introduction – HiveQL DDL / DML
DDL commands: CREATE, ALTER, DROP, TRUNCATE, SHOW, DESCRIBE
DML commands: LOAD, INSERT, SELECT

Hive Introduction – HiveQL DDL Syntax
Table creation
hive> CREATE TABLE <table name> (<column name> <data type>, ...) ROW FORMAT DELIMITED FIELDS TERMINATED BY '<character>';
Alter a table
hive> ALTER TABLE <table name> ADD COLUMNS (<column name> <data type>);
Drop a table
hive> DROP TABLE <table name>;

Hive Introduction – HiveQL DDL Syntax
Describe table structure
hive> DESCRIBE <table name>;
Show all tables in the database
hive> SHOW TABLES;
Load data into Hive tables
hive> LOAD DATA INPATH '<file path>' INTO TABLE <table name>;
Retrieve data from Hive tables
hive> SELECT * FROM <table name>;

Hive Introduction – HiveQL Sub Query
Hive supports subqueries only in the FROM clause. The columns in the subquery select list are available in the outer query just like columns of a table.
Example
SELECT col FROM (SELECT col1 + col2 AS col FROM table1) table2;
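A slightly larger sketch of the same pattern, using the txnrecords table that appears later in this deck. The threshold of 1000 is an arbitrary value chosen for illustration.

```sql
-- The subquery computes one total per category; the outer query
-- filters on the aliased column as if it were a table column.
SELECT t.category, t.total
FROM (
  SELECT category, SUM(amount) AS total
  FROM txnrecords
  GROUP BY category
) t                        -- the subquery needs an alias
WHERE t.total > 1000;      -- 1000 is an arbitrary example threshold
```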

Hive Introduction – HiveQL Joins
Hive supports only equality joins (equi-joins), including outer joins and left semi joins. Hive does not support join conditions that are not equality conditions, because such conditions are very difficult to express as a map/reduce job. More than two tables can be joined in a single query.
Example
hive> SELECT table1.*, table2.*
    > FROM table1 JOIN table2 ON (table1.col1 = table2.col1);
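The slide mentions left semi joins and joins over more than two tables without showing them. The following is a minimal sketch of both, reusing table1 and table2 from the example above; table3 and the columns col2 and col3 are hypothetical names introduced only for illustration.

```sql
-- Left semi join: return rows of table1 that have at least one match in table2.
-- The right-hand table may be referenced only in the ON clause,
-- so only table1 columns appear in the select list.
SELECT table1.*
FROM table1
LEFT SEMI JOIN table2 ON (table1.col1 = table2.col1);

-- Joining three tables with equality conditions (table3 is hypothetical).
SELECT a.col1, b.col2, c.col3
FROM table1 a
JOIN table2 b ON (a.col1 = b.col1)
JOIN table3 c ON (b.col1 = c.col1);
```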

Hive Introduction – HiveQL View
A view is a "virtual table" defined by a SELECT statement. Views can be used to present data to users in a way that differs from how it is actually stored on disk.
Syntax
CREATE VIEW <view name> AS SELECT * FROM <table name> WHERE <condition>;
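A concrete sketch of the syntax, assuming the txnrecords table defined later in this deck. The view name high_value_txns and the cut-off of 100 are made up for this example.

```sql
-- Define a view over a subset of columns and rows (names are illustrative).
CREATE VIEW high_value_txns AS
SELECT txnno, custno, amount, category
FROM txnrecords
WHERE amount > 100;

-- Query the view like an ordinary table.
SELECT category, COUNT(*)
FROM high_value_txns
GROUP BY category;

-- Dropping the view removes only its definition, not the underlying data.
DROP VIEW high_value_txns;
```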

Hive Introduction – Hive Data Model

Hive Introduction – Hive Data Model - Partitions

Hive Introduction – Hive Data Model - Buckets

Hive Introduction – Hive Data Model – Buckets..

Hive Introduction – Metastore

Hive Introduction – Configuring hive to have Mysql as Metastore DB

Hive Introduction – User Defined Functions (UDF)

Hive Concepts
- Understand what Hive is and its use cases
- Analyse the difference between Hive and Pig
- Understand Hive architecture and Hive components
- Analyse the limitations of Hive
- Implement primitive and complex types in Hive
- Understand the Hive data model
- Perform basic Hive operations
- Execute Hive scripts and Hive UDFs

Hive Background

Hive Use Case @ Facebook

What is Hive?

What is Hive?..

Where to use Hive?..

Why go for Hive When Pig is there ?

Hive Architecture

Hive Components

MetaStore

Limitations of Hive

Abilities of Hive Query Language

Differences with Traditional RDBMS with Hive

Type System

Operators

Hive Data Models

Partitions

Buckets

Create Database and Table

Create Database and Table..

External Tables

Load Data

Queries

Managing Outputs

Hive Commands
Data Definition Language (DDL)
DDL statements are used to build and modify the tables and other objects in the database.
Examples: CREATE, DROP, TRUNCATE, ALTER, SHOW, DESCRIBE statements
Data Manipulation Language (DML)
DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Examples: LOAD, INSERT statements

Hive Commands
Create a database
hive> create database retail;
Select the database
hive> use retail;
Create a table for storing transactional records
hive> create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile;
Create an external table with the same columns at an explicit location
hive> create external table externaltxnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile location '/hivetable';
List the databases
hive> show databases;

Hive Commands
Load the data into the table (from the local Linux file system, not HDFS)
hive> LOAD DATA LOCAL INPATH '/home/edureka/txns' OVERWRITE INTO TABLE txnrecords;
Describe the metadata or schema of the table
hive> describe txnrecords;
Count the number of records
hive> select count(*) from txnrecords;
Total spending by category of products
hive> select category, sum(amount) from txnrecords group by category;
Total spending for 10 customers
hive> select custno, sum(amount) from txnrecords group by custno limit 10;
Filter on a column value: table A has 1000 rows and a column called region with the values usa, uk and india.
hive> select col1,col2 from A where region='india';

Hive Commands
Copy the input data from the local file system to HDFS using the copyFromLocal command
$ hadoop fs -copyFromLocal /home/cloudera/Desktop/txn1.txt /
External table
The external keyword creates a table and supplies a location for it, so that Hive does not use its default location for this table. An EXTERNAL table points to any HDFS location for its storage rather than the default storage.
hive> create external table example_customer(custno string, firstname string, lastname string, age int, profession string) row format delimited fields terminated by '\t' location '/user/external';
Insert command
The insert command loads data into a Hive table. Insertion can be done into a table or a partition, in two ways:
- INSERT OVERWRITE overwrites the existing data in the table or partition.
- INSERT INTO appends data to the existing data in a table.
hive> from customer cus insert overwrite table example_customer select cus.custno,cus.firstname,cus.lastname,cus.age,cus.profession;
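To contrast the two insert modes side by side, here is a minimal sketch. It assumes a source table customer with the same five columns as example_customer; the slides do not show that table's definition, and the age filter is an arbitrary example.

```sql
-- Assumption: customer(custno, firstname, lastname, age, profession) exists.

-- INSERT OVERWRITE replaces whatever example_customer currently holds.
INSERT OVERWRITE TABLE example_customer
SELECT custno, firstname, lastname, age, profession
FROM customer;

-- INSERT INTO keeps the existing rows and appends the new ones.
INSERT INTO TABLE example_customer
SELECT custno, firstname, lastname, age, profession
FROM customer
WHERE age > 60;   -- arbitrary filter, for illustration only
```

The select list has to supply one value per column of the target table, in the target table's column order.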

Hive Commands
Partitioning & Clustering
PARTITIONED BY divides a table into partitions; each partition can be further divided into buckets using CLUSTERED BY.
hive> create table txnrecbycat(txnno INT, txndate STRING, custno INT, amount DOUBLE, product STRING, city STRING, state STRING, spendby STRING) partitioned by (category STRING) clustered by (state) into 10 buckets row format delimited fields terminated by ',' stored as textfile;
Enable dynamic partitioning before running the insert:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> from txnrecords txn INSERT OVERWRITE TABLE txnrecbycat PARTITION(category) select txn.txnno,txn.txndate,txn.custno,txn.amount,...txn.category;
Drop Table
hive> drop table customer;
Drop deletes the data and metadata for a table. For external tables, only the metadata is deleted.
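Once txnrecbycat has been populated, its partitions can be inspected and queried. The sketch below also shows a static-partition insert for comparison with the dynamic one above. The category value 'Team Sports' is a hypothetical example, not taken from the slides.

```sql
-- List the partitions Hive created, one per distinct category value.
SHOW PARTITIONS txnrecbycat;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT state, SUM(amount)
FROM txnrecbycat
WHERE category = 'Team Sports'   -- hypothetical category value
GROUP BY state;

-- Static partitioning: the partition value is fixed in the statement,
-- so the partition column is left out of the select list.
INSERT OVERWRITE TABLE txnrecbycat PARTITION (category = 'Team Sports')
SELECT txnno, txndate, custno, amount, product, city, state, spendby
FROM txnrecords
WHERE category = 'Team Sports';
```

With dynamic partitioning the partition column comes last in the select list and Hive derives the partition from each row. With static partitioning the value is named once in the PARTITION clause.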

Hive Commands
Aggregation
hive> select count(DISTINCT category) from tablename;
Grouping: GROUP BY groups the result set by one or more columns.
hive> select category, sum(amount) from txnrecords group by category;
Duplicate table: the result of a query on one table is stored in another table.
hive> create table newtablename as select * from oldtablename;

Hive Commands
Join commands: first create the tables employee and mailid and load the data
hive> create table employee(name string, salary float, city string) row format delimited fields terminated by ',';
hive> load data local inpath 'emp.txt' into table employee;
hive> select * from employee where name='tarun';
hive> create table mailid(name string, <column name> string) row format delimited fields terminated by ',';
hive> load data local inpath '<file name>.txt' into table mailid;
emp.txt
swetha,250000,Chennai
anamika,200000,Kanyakumari
tarun,300000,Pondi
anita,250000,Salem
<file name>.txt

Hive Commands
Join
select a.name,a.city,a.salary,b.<column name> from employee a join mailid b on a.name = b.name;
Left Outer Join
select a.name,a.city,a.salary,b.<column name> from employee a left outer join mailid b on a.name = b.name;
Right Outer Join
select a.name,a.city,a.salary,b.<column name> from employee a right outer join mailid b on a.name = b.name;
Full Outer Join
select a.name,a.city,a.salary,b.<column name> from employee a full outer join mailid b on a.name = b.name;

Hive Script

Hive Script..

Hive Script.. Step 1: Writing a script
Create a table 'product' in Hive:
create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';
Describe the table:
describe product;
Load the data into the table:
To load the data into the table, create an input file that contains the records to be inserted into the table.
sudo gedit input.txt
Create a few records in the input text file, then load them:
load data local inpath '/home/cloudera/input.txt' into table product;

Hive Script.. Retrieving the data:
To retrieve the data, use the select command.
Command: select * from product;
Step 2: Execute the Hive Script
Execute the Hive script using the following command:
Command: hive -f /home/cloudera/sample.sql
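The slides do not show the contents of sample.sql. A minimal sketch of what such a script might contain, assembled from the Step 1 commands, is given below. The IF NOT EXISTS clause is an addition made here so the script can be run more than once without failing on the create statement.

```sql
-- sample.sql: hypothetical contents, pieced together from the Step 1 commands.

-- Create the table (IF NOT EXISTS is an addition for repeatable runs).
CREATE TABLE IF NOT EXISTS product (
  productid   INT,
  productname STRING,
  price       FLOAT,
  category    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Show the table schema.
DESCRIBE product;

-- Load the comma-separated records from the local input file.
LOAD DATA LOCAL INPATH '/home/cloudera/input.txt' INTO TABLE product;

-- Retrieve the loaded rows.
SELECT * FROM product;
```

Each statement ends with a semicolon, and the file runs top to bottom when passed to hive -f. Running it again appends the input file a second time, because LOAD DATA without OVERWRITE adds to the existing data.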

Hive Script..

Joining Two Tables

Joining Two Tables..

Hive User Defined Functions (UDF)
HIVE UDF

Revisiting Use Cases in Healthcare

Healthcare UDF

Healthcare UDF..

Assignment

Advanced Hive
- Implement joins in Hive
- Implement dynamic partitioning
- Hive indexes and views
- Analyze custom Map/Reduce scripts
- Create Hive UDFs

HiveQL: Joining two tables – Sample Tables

HiveQL: Dynamic Partitioning– Configuration

HiveQL: Dynamic Partitioning – Example

HiveQL: Running Custom Map/Reduce Scripts

HiveQL: User Defined Function (UDF)

Hive Index

Hive Views

Hive: Java Client through Thrift Server

Project: Hive Scripting

Project: HDFS to MapReduce Phase

Project: MapReduce to Pig Phase

Project: Pig to Hive Phase

Thank You !!!!!!!!!!!
