
1 Hive Installation Guide and Practical Example Lecturer: Prof. Kyungbaek Kim Presenter: Alvin Prayuda Juniarta Dwiyantoro

2 Installation Guide (1)
How to install Hive v0.13.1.
Requirements:
Java 1.6 (this example uses java-7-openjdk)
Hadoop 0.20.x, 0.23.x, or 2.0.x (this example uses Hadoop 2.5.1 in pseudo-distributed mode)
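A quick way to confirm these prerequisites before installing (a minimal sketch; it assumes java and hadoop are already on the PATH):
# Check the Java version (1.6 or newer; this guide uses java-7-openjdk)
java -version
# Check the Hadoop version (this guide uses Hadoop 2.5.1 in pseudo-distributed mode)
hadoop version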

3 Installation Guide (2)
Download Hive from a stable release:
http://apache.mirror.cdnetworks.com/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
Extract the tar file and move it to the preferred location (this example uses /usr/local/hive):
tar -xvzf hive-x.y.z.tar.gz
mv hive-x.y.z /usr/local/hive
Modify ~/.bashrc and add the following statements at the end:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
Reload the file so the changes take effect:
source ~/.bashrc
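To confirm the new variables are picked up, a couple of checks can be run in a fresh shell (a minimal sketch, assuming the tarball was moved to /usr/local/hive as above):
# Should print /usr/local/hive
echo $HIVE_HOME
# Should resolve to $HIVE_HOME/bin/hive
which hive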

4 Configuration Guide (1)
Hive uses Hadoop, so modify ~/.bashrc to add Hadoop to the PATH, or add the following statement (this example uses /usr/local/hadoop):
export HADOOP_HOME=/usr/local/hadoop
Start Hadoop DFS and YARN:
start-dfs.sh
start-yarn.sh
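A quick way to confirm the daemons came up (a minimal sketch; jps ships with the JDK, and the process names below are those of a Hadoop 2.x pseudo-distributed setup):
# Expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
jps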

5 Configuration Guide (2)
Create /tmp and /user/hive/warehouse in HDFS and set them to chmod g+w:
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
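If the parent directories do not exist yet, the -p flag of hadoop fs -mkdir creates them in one step, and -ls verifies the result (a minimal sketch under that assumption):
# Create both directories with any missing parents, then check group write permission
hadoop fs -mkdir -p /tmp /user/hive/warehouse
hadoop fs -chmod g+w /tmp /user/hive/warehouse
hadoop fs -ls / /user/hive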

6 Configuration Guide (3)
Go to /usr/local/hive/conf:
cd /usr/local/hive/conf
Rename these configuration file templates by dropping the .template suffix:
hive-env.sh.template → hive-env.sh
hive-default.xml.template → hive-default.xml
hive-exec-log4j.properties.template → hive-exec-log4j.properties
hive-log4j.properties.template → hive-log4j.properties
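The renames can also be done with cp, which keeps the original templates as a backup (a minimal sketch, assuming the four template files are present in /usr/local/hive/conf):
cd /usr/local/hive/conf
# Copy each template to its active name, leaving the .template originals in place
for f in hive-env.sh hive-default.xml hive-exec-log4j.properties hive-log4j.properties; do
  cp "$f.template" "$f"
done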

7 Configuration Guide (4)
Create a new file, add the statements below, and save it as hive-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:50030</value>
  </property>
</configuration>
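The fs.defaultFS value should match the address the running NameNode uses; one way to cross-check it against the live Hadoop configuration (a minimal sketch, assuming the hdfs command is on the PATH):
# Prints the filesystem URI Hadoop is configured with;
# it should agree with fs.defaultFS in hive-site.xml
hdfs getconf -confKey fs.defaultFS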

8 Configuration Guide (5)
Open the file hive-env.sh.
Uncomment HADOOP_HOME and HIVE_CONF_DIR and modify them as below:
export HADOOP_HOME=/usr/local/hadoop
export HIVE_CONF_DIR=/usr/local/hive/conf
Run the Hive CLI:
hive
Note: if the configuration is correct, every table created will appear in HDFS under /user/hive/warehouse.
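A short smoke test of the setup from inside the Hive CLI (a minimal sketch; the table name smoke_test is only an illustration):
-- Create a throwaway table, list tables, then drop it
create table smoke_test (id int);
show tables;
drop table smoke_test;
While smoke_test exists, running hadoop fs -ls /user/hive/warehouse from another shell should show a smoke_test directory.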

9 Practical Example (1)
Download the example data from http://seanlahman.com/files/database/lahman591-csv.zip
Extract the archive; we will use the Batting.csv data.
Copy the data into HDFS:
hadoop fs -put /home/hduser/Downloads/Batting.csv /user/hive
Enter the Hive CLI.
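The download and extraction can also be done from the shell (a minimal sketch; it assumes wget and unzip are installed and uses the same paths as above):
cd /home/hduser/Downloads
wget http://seanlahman.com/files/database/lahman591-csv.zip
# Extracts Batting.csv among the other CSV files
unzip lahman591-csv.zip
hadoop fs -put Batting.csv /user/hive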

10 Practical Example (2)
Create the table temp_batting:
create table temp_batting (col_value string);
Load the data from Batting.csv into temp_batting:
load data inpath '/user/hive/Batting.csv' overwrite into table temp_batting;
To see the data format:
select * from temp_batting;
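On a large file it helps to preview only a few rows (a minimal HiveQL sketch):
-- Show the first raw CSV lines that were loaded
select * from temp_batting limit 5;
Note that load data inpath moves the file from /user/hive into the table's warehouse directory rather than copying it, so /user/hive/Batting.csv will no longer be there afterwards.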

11 Practical Example (2)
Create the new table batting:
create table batting (player_id string, year int, runs int);
Extract information from temp_batting into batting:
insert overwrite table batting
select
  regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) player_id,
  regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) year,
  regexp_extract(col_value, '^(?:([^,]*)\,?){9}', 1) runs
from temp_batting;
View the resulting table:
select * from batting;
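Each regexp_extract call repeats the group (?:([^,]*)\,?) n times and returns the capture from the last repetition, i.e. the n-th comma-separated field; in this dataset fields 1, 2 and 9 hold the player id, year and runs. A small check on a single raw row (a minimal HiveQL sketch):
-- Extract the three fields from one line of the raw CSV
select
  regexp_extract(col_value, '^(?:([^,]*)\,?){1}', 1) as player_id,
  regexp_extract(col_value, '^(?:([^,]*)\,?){2}', 1) as year,
  regexp_extract(col_value, '^(?:([^,]*)\,?){9}', 1) as runs
from temp_batting limit 1;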

12 Practical Example (3)
Find the highest run total for each year:
select year, max(runs) from batting group by year;
Find the corresponding player with the highest run total in each year:
select a.year, a.player_id, a.runs
from batting a
join (select year, max(runs) runs from batting group by year) b
on (a.year = b.year and a.runs = b.runs);
Delete the table temp_batting:
drop table temp_batting;
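For readability, the per-year maxima can be sorted (a minimal HiveQL sketch):
-- Highest run total per year, in chronological order
select year, max(runs) as max_runs
from batting
group by year
order by year;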

13 Screenshot of Practical Example (slides 13–28 contain screenshots only, with no transcript text)

