Hive Mr. Sriram Email: hadoopsrirama@gmail.com.


1 Hive Mr. Sriram

2 Objectives
Hive Introduction
Hive Concepts
- Understand what Hive is and its use cases
- Analyse the difference between Hive and Pig
- Understand Hive architecture and Hive components
- Analyse limitations of Hive
- Implement primitive and complex types in Hive
- Understand the Hive data model
- Perform basic Hive operations
- Execute Hive scripts and Hive UDFs
Advanced Hive
- Implement joins in Hive
- Implement dynamic partitioning
- Hive indexes and views
- Analyze custom Map/Reduce scripts
- Create Hive UDF

3 Hive Introduction
- Hive Introduction
- Why Hive?
- Hive Architecture
- Hive Configuration
- Limitations
- HiveQL: Data Types and Table Types, Managed Table, External Table, Storage Formats, Queries, View
- Hive Data Model
- The Metastore
- User Defined Functions

4 Hive Introduction
Hive is a data warehousing infrastructure built on top of Apache Hadoop.
Hive is designed to enable:
- Easy data summarization
- Ad-hoc querying
- Analysis of large volumes of data
Hive provides a simple query language called HiveQL, which is similar to SQL.
HiveQL also lets traditional map/reduce programmers plug in their custom mappers and reducers.
Hive is suitable for structured and semi-structured data.
Hive was originally developed at Facebook.
There is no need to write Java or use the Hadoop API directly.
To invoke the Hive shell:
$ hive
hive>

5 Hive Introduction – Why Hive / Metastore
Why Hive?
- Need for a multi-petabyte warehouse
- Files are insufficient data abstractions: tables, schemas, partitions, and indices are needed
- Need for an open data format: RDBMSs have a closed data format
- Flexible schema
Metastore
- Hive stores the schema of Hive tables in the Hive Metastore.
- The Metastore holds all the information about the tables and partitions that are in the warehouse.
- By default, the Metastore runs on an embedded Derby database.

6 Hive Introduction - Architecture
Hive Architecture
The default Metastore database is Derby. Other supported databases include MySQL, Oracle, and SQL Server.

7 Hive Introduction – Architecture..

8 Hive Introduction – Architecture..

9 Hive Introduction – Architecture..

10 Hive Introduction - Configuration
Configuring Hive
Download a release at ftp://ftp.nextgen.com
Unpack the tarball in a suitable place on your workstation:
% tar xzf hive-x.y.z-dev.tar.gz
Put Hive on your path:
% export HIVE_HOME=/home/EmpID/hive-x.y.z-dev
% export PATH=$PATH:$HIVE_HOME/bin
Type hive to launch the shell:
% hive
hive>

11 Hive Introduction - Limitations
Limitations of Hive
- Sub-queries are supported only in the FROM clause.
- Hive supports overwriting or appending data, but not row-level update or delete.
- Hive is designed for OLAP, not OLTP.
Why is Hive not suited to OLTP?
- Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates.
- Latency for Hive queries is generally very high (minutes), even when the data sets involved are very small (say, a few hundred megabytes).

13 Hive Introduction - HiveQL

14 Hive Introduction – HiveQL Data Types & Table Types
Hive Data Types
Hive supports both primitive and complex data types.
Primitive Data Types
- Signed integer: TINYINT, SMALLINT, INT, BIGINT
- Floating point: FLOAT, DOUBLE
- BOOLEAN
- STRING
Complex Data Types
- ARRAY, MAP, and STRUCT
Hive Table Types
Hive tables are of two types:
- Managed tables
- External tables
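As an illustration of the complex types listed above, here is a minimal sketch of a table that mixes primitive and complex columns. The table name employee_profile, its columns, and the delimiter choices are hypothetical and do not come from the slides.

```sql
-- Hypothetical table: all names and delimiters are assumptions for illustration.
CREATE TABLE employee_profile (
  name    STRING,                               -- primitive type
  skills  ARRAY<STRING>,                        -- ordered list of values
  phones  MAP<STRING, STRING>,                  -- key/value pairs, e.g. home -> number
  address STRUCT<street:STRING, city:STRING>    -- named fields
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY '|'
  MAP KEYS TERMINATED BY ':';

-- Array elements are read by index, map values by key, struct fields by dot notation.
SELECT name, skills[0], phones['home'], address.city
FROM employee_profile;
```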

15 Hive Introduction – HiveQL Managed / External Table

16 Hive Introduction – HiveQL DDL / DML
DDL Commands: CREATE, ALTER, DROP, TRUNCATE, SHOW, DESCRIBE
DML Commands: LOAD, INSERT, SELECT

17 Hive Introduction – HiveQL DDL Syntax
Table Creation
hive> CREATE TABLE <table name> (<column name> <data type>, ...) ROW FORMAT DELIMITED FIELDS TERMINATED BY '<character>';
Alter a Table
hive> ALTER TABLE <table name> ADD COLUMNS (<column name> <data type>);
Drop a Table
hive> DROP TABLE <table name>;

18 Hive Introduction – HiveQL DDL Syntax
Describe the table structure
hive> DESCRIBE <table name>;
Show all tables in the database
hive> SHOW TABLES;
Load data into Hive tables
hive> LOAD DATA INPATH '<file path>' INTO TABLE <table name>;
Retrieve data from Hive tables
hive> SELECT * FROM <table name>;

19 Hive Introduction – HiveQL Sub Query
Hive supports sub-queries only in the FROM clause. The columns in the sub-query select list are available in the outer query just like columns of a table.
Example
SELECT col FROM (SELECT col1 + col2 AS col FROM table1) table2;

20 Hive Introduction – HiveQL Joins
Hive supports only equality joins, outer joins, and left semi joins. Hive does not support join conditions that are not equality conditions, because such conditions are very difficult to express as a map/reduce job. More than two tables can be joined in Hive.
Example
hive> SELECT table1.*, table2.*
    > FROM table1 JOIN table2 ON (table1.col1 = table2.col1);
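The slide names left semi joins and multi-table joins without showing them. The sketch below reuses table1 and table2 from the example; table3 and col2 are hypothetical names added only for illustration.

```sql
-- Left semi join: returns rows of table1 that have at least one match in table2.
-- Columns of the right-hand table may appear only in the ON clause.
SELECT table1.*
FROM table1 LEFT SEMI JOIN table2 ON (table1.col1 = table2.col1);

-- Joining more than two tables (table3 and col2 are assumed to exist).
SELECT table1.col1, table3.col2
FROM table1
JOIN table2 ON (table1.col1 = table2.col1)
JOIN table3 ON (table2.col1 = table3.col1);
```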

21 Hive Introduction – HiveQL View
A view is a sort of “virtual table” that is defined by a SELECT statement. Views can be used to present data to users in a way that differs from how it is actually stored on disk.
Syntax
CREATE VIEW <ViewName> AS SELECT * FROM <TableName> WHERE <Condition>;
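A concrete instance of the syntax might look like the sketch below. It assumes the txnrecords table created on a later slide; the view name high_value_txns and the amount threshold are hypothetical.

```sql
-- Hypothetical view: exposes only a few columns of the larger transactions.
CREATE VIEW high_value_txns AS
SELECT txnno, custno, amount
FROM txnrecords
WHERE amount > 100;

-- The view is queried like a table; the underlying SELECT runs at query time.
SELECT custno, sum(amount)
FROM high_value_txns
GROUP BY custno;

-- Dropping the view does not touch the data in txnrecords.
DROP VIEW high_value_txns;
```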

22 Hive Introduction – Hive Data Model

23 Hive Introduction – Hive Data Model - Partitions

24 Hive Introduction – Hive Data Model - Buckets

25 Hive Introduction – Hive Data Model – Buckets..

26 Hive Introduction – Metastore

27 Hive Introduction – Configuring hive to have Mysql as Metastore DB

28 Hive Introduction – Configuring hive to have Mysql as Metastore DB

29 Hive Introduction – User Defined Functions (UDF)

30 Hive Introduction – User Defined Functions (UDF)

31 Hive Concepts
- Understand what Hive is and its use cases
- Analyse the difference between Hive and Pig
- Understand Hive architecture and Hive components
- Analyse limitations of Hive
- Implement primitive and complex types in Hive
- Understand the Hive data model
- Perform basic Hive operations
- Execute Hive scripts and Hive UDFs

32 Hive Background

33 Hive Use Case @ Facebook

34 What is Hive?

35 What is Hive?..

36 Where to use Hive?..

37 Why go for Hive When Pig is there ?

38 Why go for Hive When Pig is there ?

39 Hive Architecture

40 Hive Components

41 MetaStore

42 Limitations of Hive

43 Abilities of Hive Query Language

44 Differences with Traditional RDBMS with Hive

45 Type System

46 Operators

47 Hive Data Models

48 Partitions

49 Buckets

50 Create Database and Table

51 Create Database and Table..

52 External Tables

53 Load Data

54 Queries

55 Managing Outputs

56 Hive Commands
Data Definition Language (DDL)
DDL statements are used to build and modify the tables and other objects in the database.
Example: CREATE, DROP, TRUNCATE, ALTER, SHOW, DESCRIBE statements
Data Manipulation Language (DML)
DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Example: LOAD, INSERT statements

57 Hive Commands
Create database
hive> create database retail;
Select database
hive> use retail;
Create table for storing transactional records
hive> create table txnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile;
hive> create external table externaltxnrecords(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, city STRING, state STRING, spendby STRING) row format delimited fields terminated by ',' stored as textfile location '/hivetable';
To list the databases
hive> show databases;

58 Hive Commands
Load the data into the table (from the local Linux file system, not HDFS)
hive> LOAD DATA LOCAL INPATH '/home/edureka/txns' OVERWRITE INTO TABLE txnrecords;
Describe the metadata or schema of the table
hive> describe txnrecords;
Count the number of records
hive> select count(*) from txnrecords;
Total spending by category of products
hive> select category, sum(amount) from txnrecords group by category;
Total spending for 10 customers
hive> select custno, sum(amount) from txnrecords group by custno limit 10;
Filtering rows: table A has 1000 rows and a column called region with the values usa, uk, and india
hive> select col1, col2 from A where region='india';
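Without an ORDER BY, the LIMIT 10 query above returns ten customer totals in no defined order. If the ten highest-spending customers are wanted, a sketch along these lines could be used; the alias total_spent is a name introduced here for illustration.

```sql
-- Top 10 customers by total spending, using the txnrecords table from the slides.
SELECT custno, sum(amount) AS total_spent
FROM txnrecords
GROUP BY custno
ORDER BY total_spent DESC   -- sort the aggregated totals, highest first
LIMIT 10;
```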

59 Hive Commands
Copy the input data from the local file system to HDFS using the copyFromLocal command
$ hadoop fs -copyFromLocal /home/cloudera/Desktop/txn1.txt /
External Table
The EXTERNAL keyword creates a table with an explicit location, so that Hive does not use its default location for this table. An EXTERNAL table points to any HDFS location for its storage rather than the default storage.
hive> create external table example_customer(custno string, firstname string, lastname string, age int, profession string) row format delimited fields terminated by '\t' LOCATION '/user/external';
Insert command
The INSERT command is used to load data into a Hive table. Insertion can be done into a table or a partition. There are two ways:
- INSERT OVERWRITE overwrites the existing data in the table or partition.
- INSERT INTO appends data to the existing data in a table.
hive> from customer cus insert overwrite table example_customer select cus.custno, cus.firstname, cus.lastname, cus.age, cus.profession;
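The slide shows only the INSERT OVERWRITE form. A sketch of the appending form, INSERT INTO, follows. It assumes that the customer table has the same five columns as example_customer, including profession; the age filter is a hypothetical condition.

```sql
-- Appends rows to example_customer instead of replacing its contents.
-- Assumption: customer has custno, firstname, lastname, age, and profession columns.
INSERT INTO TABLE example_customer
SELECT cus.custno, cus.firstname, cus.lastname, cus.age, cus.profession
FROM customer cus
WHERE cus.age > 60;   -- hypothetical filter, for illustration only
```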

60 Hive Commands Partitioning & Clustering
PARTITIONED BY divides the table into partitions, and each partition can be further divided into buckets using CLUSTERED BY.
hive> create table txnrecbycat(txnno INT, txndate STRING, custno INT, amount DOUBLE, product STRING, city STRING, state STRING, spendby STRING) partitioned by (category STRING) clustered by (state) INTO 10 buckets row format delimited fields terminated by ',' stored as textfile;
Enable dynamic partitioning, then insert into the partitioned table
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> from txnrecords txn INSERT OVERWRITE TABLE txnrecbycat PARTITION(category) select txn.txnno,txn.txndate,txn.custno,txn.amount,...txn.category;
Drop Table
hive> drop table customer;
DROP deletes the data and metadata for a table. In the case of external tables, only the metadata is deleted.
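Once the partitioned table is loaded, the partitions and buckets can be inspected and used. The following is a sketch against the txnrecbycat table from the slide; the category value 'Games' is a hypothetical example and may not exist in the data.

```sql
-- List the partitions that the dynamic-partition insert created.
SHOW PARTITIONS txnrecbycat;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT state, sum(amount)
FROM txnrecbycat
WHERE category = 'Games'    -- hypothetical partition value
GROUP BY state;

-- Sample one of the 10 buckets, using the column the table is clustered by.
SELECT *
FROM txnrecbycat TABLESAMPLE(BUCKET 1 OUT OF 10 ON state) t;
```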

61 Hive Commands
Aggregation
hive> select count(DISTINCT category) from tablename;
Grouping: the GROUP BY clause groups the result set by one or more columns.
hive> select category, sum(amount) from txnrecords group by category;
Duplicate Table: the result of a query on one table is stored in another table.
hive> create table newtablename as select * from oldtablename;

62 Hive Commands
Join Commands – first create the tables employee and mailid, and load the data
hive> create table employee(name string, salary float, city string) row format delimited fields terminated by ',';
hive> load data local inpath 'emp.txt' into table employee;
hive> select * from employee where name='tarun';
hive> create table mailid (name string, string) row format delimited fields terminated by ',';
hive> load data local inpath ' .txt' into table mailid;
emp.txt
swetha,250000,Chennai
anamika,200000,Kanyakumari
tarun,300000,Pondi
anita,250000,Salem
.txt

63 Hive Commands
Join
select a.name,a.city,a.salary,b. from employee a join mailid b on a.name = b.name;
Left Outer Join
select a.name,a.city,a.salary,b. from employee a left outer join mailid b on a.name = b.name;
Right Outer Join
select a.name,a.city,a.salary,b. from employee a right outer join mailid b on a.name = b.name;
Full Outer Join
select a.name,a.city,a.salary,b. from employee a full outer join mailid b on a.name = b.name;

64 Hive Script

65 Hive Script..

66 Hive Script.. Step 1: Writing a script
Create a table 'product' in Hive:
create table product (productid int, productname string, price float, category string) row format delimited fields terminated by ',';
Describe the table:
describe product;
Load the data into the table:
To load the data into the table, create an input file which contains the records that need to be inserted into the table.
sudo gedit input.txt
Create a few records in the input text file, then load it:
load data local inpath '/home/cloudera/input.txt' into table product;

67 Hive Script..
Retrieving the data:
To retrieve the data, use the select command.
Command: select * from product;
Step 2: Execute the Hive Script
Execute the Hive script using the following command:
Command: hive -f /home/cloudera/sample.sql
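Putting the statements from the two steps together, the script file could look roughly like the sketch below. The file name sample.sql and the input path come from the slides; the exact contents of the script are an assumption, and IF NOT EXISTS is added here only so the script can be re-run.

```sql
-- sample.sql: hypothetical contents assembled from the statements on the slides.

-- Create the table (IF NOT EXISTS is an addition, not in the slides).
CREATE TABLE IF NOT EXISTS product (
  productid   INT,
  productname STRING,
  price       FLOAT,
  category    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Show the table schema.
DESCRIBE product;

-- Load the comma-separated records from the local input file.
LOAD DATA LOCAL INPATH '/home/cloudera/input.txt' INTO TABLE product;

-- Retrieve the loaded data.
SELECT * FROM product;
```

Running hive -f on this file executes the statements in order and then exits, instead of opening the interactive hive> prompt.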

68 Hive Script..

69 Joining Two Tables

70 Joining Two Tables..

71 Joining Two Tables..

72 Joining Two Tables..

73 Hive User Defined Functions (UDF)
HIVE UDF

74 Revisiting Use Cases in Healthcare

75 Healthcare UDF

76 Healthcare UDF..

77 Healthcare UDF..

78 Healthcare UDF..

79 Healthcare UDF..

80 Healthcare UDF..

81 Assignment

82 Advanced Hive
- Implement joins in Hive
- Implement dynamic partitioning
- Hive indexes and views
- Analyze custom Map/Reduce scripts
- Create Hive UDF

83 HiveQL: Joining two tables – Sample Tables

84 HiveQL: Joining two tables – Sample Tables

85 HiveQL: Joining two tables – Sample Tables

86 HiveQL: Joining two tables – Sample Tables

87 HiveQL: Dynamic Partitioning– Configuration

88 HiveQL: Dynamic Partitioning – Example

89 HiveQL: Dynamic Partitioning – Example

90 HiveQL: Running Custom Map/Reduce Scripts

91 HiveQL: Running Custom Map/Reduce Scripts

92 HiveQL: Running Custom Map/Reduce Scripts

93 HiveQL: Running Custom Map/Reduce Scripts

94 HiveQL: Running Custom Map/Reduce Scripts

95 HiveQL: User Defined Function (UDF)

96 Hive Index

97 Hive Index

98 Hive Index

99 Hive Index

100 Hive Views

101 Hive Views

102 Hive Views

103 Hive: Java Client through Thrift Server

104 Project: Hive Scripting

105 Project: HDFS to MapReduce Phase

106 Project: MapReduce to Pig Phase

107 Project: Pig to Hive Phase

108 Thank You !!!!!!!!!!!

