Server & Tools Business

Presentation on theme: "Server & Tools Business"— Presentation transcript:

1 Server & Tools Business
11/21/2018 Microsoft Big Data Essentials Module 2 - Introduction to Hive and HiveQL Saptak Sen, Microsoft Bill Ramos, Advaiya Hello, this is Saptak Sen again. In this presentation, you’ll learn how to take advantage of your knowledge of Transact-SQL or other SQL languages to run MapReduce jobs on an HDInsight cluster using Hive. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Server & Tools Business
Agenda: Hive architecture; Hive operations; Demos
Let’s get started with an overview of the Hive architecture and how it works on top of a Hadoop cluster. After that we’ll go over the data model for Hive tables. Then, we’ll follow up with the common operations you can accomplish with Hive and how to connect client tools like Microsoft Excel using the Hive ODBC driver. Finally, we’ll run through a series of demos that will help put everything into perspective.

3 Server & Tools Business
Working with Hive In this demo, we’ll show you how to create a Mapper program in C#.

4 Hive architecture
Built on top of Hadoop to provide data management, querying, and analysis
Access and query data through simple SQL-like statements, called Hive queries
In short, Hive compiles, Hadoop executes
Diagram labels: Hive (ODBC, JDBC, Hive web interface (HWI), command line interface (CLI), metastore, Thrift server, compiler, optimizer, executor); Hadoop (head node, name node, data nodes/task nodes)
Talk Track: Hive was initially developed by Facebook so that their developers could process data across their Hadoop file system using a SQL-like query language that they called HiveQL. HiveQL looks a lot like ANSI SQL. That means if you know Transact-SQL you’ll feel comfortable learning Hive. Because the results look like a standard relational database result set, various vendors have created ODBC drivers that interact with Hive results. With Hive, you can execute statements using either the Hive command-line interface on the Hadoop cluster or an interactive web console like the Hive Interactive console in HDInsight. Applications can send queries and retrieve results via either ODBC or JDBC drivers. Internally, the Hive compiler, optimizer, and executor translate HiveQL statements into a directed graph of MapReduce jobs that are submitted to the Hadoop cluster’s head node for execution. As part of the optimization phase, developers can create customized mappers and reducers to extend the functionality of HiveQL. In short, Hive compiles and Hadoop executes.
Key Points: HiveQL (Hive Query Language) is a T-SQL-like language for querying Hadoop data in Hive. Hive can store and access data in text files directly in the Azure Storage Account.
References: Using Hive with HDInsight: hive-with-hdinsight/
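The "Hive compiles, Hadoop executes" idea from the talk track can be illustrated with a small sketch. This is not Hive's actual compiler output; it is a plain-Python imitation, under stated assumptions, of how a single GROUP BY query could break down into a map step, a shuffle step, and a reduce step. The table name `visits`, the column `country`, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical HiveQL query being imitated:
#   SELECT country, COUNT(*) FROM visits GROUP BY country;


def map_phase(rows):
    """Emit one (key, 1) pair per input row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["country"], 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key, which yields COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def run_group_by_count(rows):
    """Chain the three steps in the order a MapReduce job would run them."""
    return reduce_phase(shuffle(map_phase(rows)))


# Made-up sample rows standing in for the contents of the visits table.
visits = [{"country": "US"}, {"country": "IN"}, {"country": "US"}]
result = run_group_by_count(visits)
print(result)
```

On a real cluster the map and reduce steps run in parallel across the data nodes, and a more complex query may compile into several such jobs chained together.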

5 Create, load, and query Hive tables
HiveQL includes data definition language, data import/export, and data manipulation language statements
See display/Hive/LanguageManual
Diagram labels: create table; import data into Hive table; query data using SQL-like statements
Talk Track: Now let’s take a look at a standard workflow for using Hive. First, you create a table on top of your data. If you want your data to be preserved at the location of the source table, you can create an EXTERNAL table. If you exclude the EXTERNAL clause, the data for the table is moved into the Hive data warehouse. Data can be serialized as delimited text files or as binary sequence files. The advantage of using text files is that tools like Excel can directly load the data from the Hadoop file system or from the Azure Blob Storage used by the HDInsight cluster. Sequence files provide compression and performance advantages over text files, but they can only be consumed by external programs using the JDBC or ODBC interface. .NET developers can also consume Hive data in the form of a table using the LINQ to Hive library. HiveQL also has CREATE VIEW syntax; a view can be referenced in a SELECT command like a table, which helps simplify developer queries. Data can be imported into and exported from a table with the IMPORT and EXPORT commands. You can also use the LOAD DATA INPATH command to load data files that are already in your Hadoop cluster into a table. Finally, Hive offers a rich set of SELECT command features for querying data, such as ORDER BY, GROUP BY, JOIN, UNION, and subquery clauses. For guidance on syntax and usage, refer to the Hive language manual on Apache.org.
Key Points: In Hive EXTERNAL tables, data remains at its original location, outside the Hive data warehouse. T-SQL skills can be used to query Hive data.
References: Using Hive with HDInsight: hive-with-hdinsight/
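The create, load, and query workflow described above might look like the following. This is a minimal sketch that only assembles HiveQL statement text in Python; it does not connect to a cluster. The table name, column names, and paths are hypothetical, and the helper functions are illustrative rather than part of any Hive library.

```python
def create_table_sql(table, columns, location=None, delimiter=","):
    """Build a CREATE TABLE statement for a delimited text table.

    Passing a location produces an EXTERNAL table, so the data stays where
    it already is. Values are interpolated directly into the statement, so
    this sketch is only suitable for trusted, hard-coded inputs.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    external = "EXTERNAL " if location else ""
    sql = (
        f"CREATE {external}TABLE IF NOT EXISTS {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS TEXTFILE"
    )
    if location:
        sql += f" LOCATION '{location}'"
    return sql


def load_data_sql(table, path):
    """Build a LOAD DATA INPATH statement for files already in the cluster."""
    return f"LOAD DATA INPATH '{path}' INTO TABLE {table}"


# Hypothetical schema and paths, chosen only for illustration.
columns = [("country", "STRING"), ("visit_time", "STRING")]

external_ddl = create_table_sql("visits_ext", columns, location="/data/visits")
managed_ddl = create_table_sql("visits", columns)
load_stmt = load_data_sql("visits", "/staging/visits.csv")
query = (
    "SELECT country, COUNT(*) AS visit_count "
    "FROM visits GROUP BY country ORDER BY visit_count DESC"
)

for statement in (external_ddl, managed_ddl, load_stmt, query):
    print(statement + ";")
```

The printed statements could then be pasted into the Hive command-line interface or the Hive Interactive console.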

6 Demo 1: Create and Load Hive Tables
Diagram labels: Batch layer, Speed layer, Serving layer; Windows Azure HDInsight, Hive, HDInsight Hive Console
Talk Track: In this demo, Bill will show you how to use Hive and HDInsight as part of the Batch layer of the Lambda architecture to create a new master dataset and then validate the results using the HDInsight Hive console. [---CLICK---] End users can create Hive tables using the Hive Interactive console of the HDInsight cluster. New Hive tables are created inside the HDInsight cluster. Depending on requirements, a table can be partitioned or bucketed, and queries against it can use CASE statements. The results are visible in the Hive terminal (remote access to the HDInsight cluster) or in the Hive Interactive console (the HDInsight web interface).
Diagram labels: table partitioning, partitioned Hive table, Hive table, CASE statement, query results, bucketed table, “Cluster by” clause, join
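To make the difference between partitioned and bucketed tables concrete, here is a rough sketch of how rows could be placed on disk. It relies on two points: Hive stores each partition in a subdirectory named column=value, and bucketing assigns a row to one of a fixed number of buckets by hashing the bucketing column. The modulo rule below is a simplification that assumes non-negative integer keys, and the warehouse path, table, and column names are hypothetical.

```python
from collections import defaultdict


def partition_path(table_dir, partition_col, value):
    """Return the partition subdirectory, named column=value."""
    return f"{table_dir}/{partition_col}={value}"


def bucket_for(key, num_buckets):
    """Pick a bucket index; simplified to key modulo bucket count (assumption)."""
    return key % num_buckets


def layout(rows, table_dir, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket index)."""
    placed = defaultdict(list)
    for row in rows:
        path = partition_path(table_dir, partition_col, row[partition_col])
        bucket = bucket_for(row[bucket_col], num_buckets)
        placed[(path, bucket)].append(row)
    return dict(placed)


# Made-up rows; partition by country, bucket by user_id into 2 buckets.
rows = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "US"},
    {"user_id": 3, "country": "IN"},
    {"user_id": 5, "country": "US"},
]
placed = layout(rows, "/hive/warehouse/visits", "country", "user_id", 2)

for (path, bucket), members in sorted(placed.items()):
    print(path, "bucket", bucket, [m["user_id"] for m in members])
```

A query that filters on the partition column only needs to read the matching directory, while bucketing on a join key lets rows with the same key land in the same bucket on both sides of a join.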

7 Server & Tools Business
Connecting Hive Data to Excel We’ll now look at working with Hive data in Excel

8 Using the Hive ODBC driver
Connector to HDInsight Hive available as part of HDInsight Hadoop clusters
Enable business intelligence, analytics, and reporting on data in Hive
Diagram labels: configure Hive ODBC driver; Hive ODBC data source; load Hive tables into PowerPivot for Excel
Talk Track: The Hive ODBC Driver is a software library that implements the Open Database Connectivity (ODBC) API standard for the Hive database management system. This enables ODBC-compliant applications to interact seamlessly with Hive through a standard interface. The Microsoft Hive ODBC driver is a connector to Hive running on HDInsight clusters; it enables business intelligence, analytics, and reporting on data in Apache Hive. Now let’s check out a demo.
Key Points: The Hive ODBC driver allows access to Hive data via ODBC connections.
References: How to Connect Excel to Windows Azure HDInsight via HiveODBC
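The deck uses Excel as the ODBC client, but any ODBC-compliant application can use the same data source. As a hedged illustration, the sketch below swaps in a Python client that uses the third-party pyodbc package instead of Excel. It assumes an ODBC data source name has already been configured for the Hive ODBC driver; the DSN name, credentials, and table name are hypothetical placeholders.

```python
def build_connection_string(dsn, user, password):
    """Assemble a DSN-based ODBC connection string from its parts."""
    return f"DSN={dsn};UID={user};PWD={password}"


def fetch_rows(conn_str, query):
    """Run a query through ODBC and return all result rows.

    pyodbc is imported here rather than at module level so that the rest of
    this sketch can be read and run without the package installed.
    """
    import pyodbc  # third-party package, assumed to be installed

    connection = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()


# Hypothetical DSN and credentials; replace with values for a real cluster.
conn_str = build_connection_string("HiveDSN", "admin", "secret")
print(conn_str)

# Example call, left commented out because it needs a live cluster:
# rows = fetch_rows(conn_str, "SELECT * FROM visits LIMIT 10")
```

Excel and PowerPivot go through the same ODBC data source, which is why the driver only needs to be configured once per machine.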

9 Demo 2: Using the Hive ODBC driver
Diagram labels: Batch layer, Speed layer, Serving layer; Hive, Microsoft Excel, PowerPivot
Talk Track: Now that we have our data in a Hive table, let’s see how to install, configure, and use the Hive ODBC driver to load a table into PowerPivot for Excel. [---CLICK---] After establishing the ODBC connection, end users can access the Hive data using PowerPivot from within the Excel workbook. The Hive data is moved into the Excel workbook via the Hive ODBC connection.
Diagram labels: PowerPivot for Excel, Hive table, Hive ODBC connection

10 Demo 3: Using Power Query with Hive Results
Diagram labels: Batch layer, Speed layer, Serving layer; Azure Blob storage, Microsoft Excel, Power Query
Talk Track: The new Power Query preview for Excel lets you import data from a variety of sources and shape it so that you can analyze the results using familiar Excel features. In this example, I’ll show how you can use Power Query to import Hive data stored in Windows Azure Blob storage into Excel. [---CLICK---] After establishing the ODBC connection, end users can access the Hive data using Power Query from within the Excel worksheet. The Hive data is moved into the Excel workbook via the ODBC connection.
Diagram labels: Azure Blob storage files, Power Query for Excel with PivotChart

11 Server & Tools Business
Learn more: HDInsight Interactive JavaScript and Hive Console; How to Connect Excel to Windows Azure HDInsight via HiveODBC
Talk Track: Check out the link to learn more about the HDInsight Interactive JavaScript and Hive Consoles and to get started using them. More information is also available on connecting Excel to Windows Azure HDInsight via HiveODBC.
References: HDInsight Interactive JavaScript and Hive Consoles: us/manage/services/hdinsight/interactive-javascript-and-hive-consoles/#createhivetable How to Connect Excel to Windows Azure HDInsight via HiveODBC

12 Questions?
