07 | Analyzing Big Data with Excel
Graeme Malcolm | Data Technology Specialist, Content Master
Pete Harris | Learning Product Planner, Microsoft
Module Overview
  Microsoft Excel as a Data Analysis Tool
  The Hive ODBC Driver
  Importing Data from HDFS with Power Query
  Transferring Data to a Database with Sqoop
Why Excel?
  World's most popular data analysis tool
  Built-in data modeling support and PowerPivot
  Supports a huge range of data sources for mash-up analysis
  Great data visualization tools
    Data bars and conditional formatting
    Charts
    PivotTables and PivotCharts
    Slicers and timelines
    Power View and Power Map
  Enterprise and cloud sharing capabilities with Power BI
Excel Data Models
  All Excel workbooks support a tabular data model
  PowerPivot enables complex self-service data modeling
  PowerPivot workbooks can be shared:
    SharePoint Server
    Office 365 Power BI
Power View
  Interactive data visualization
  Charts update dynamically as data is added or filters applied
Power Map
  Animated tours of geographic data
  Show data changes over time
Importing Data from HDInsight
  [Diagram labels: Database, HDFS, Hive, Power Query, OLE DB / Power Query, Excel, ODBC]
  Query Hive tables using ODBC
  Import data from HDFS with Power Query
  Export data from HDInsight to a relational database
Hive ODBC Driver
  Download and install the Hive ODBC Driver for HDInsight
  Create a data source name (DSN) for your HDInsight cluster
  Use the Data Connection Wizard in Excel to import data
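
Once a DSN exists, any ODBC client can query Hive through it, not only Excel. The following is a minimal sketch of that idea in Python, not part of the course material. It assumes the third-party pyodbc package is installed; the DSN name, credentials, and table name in the usage note are hypothetical placeholders.

```python
def build_connection_string(dsn, user=None, password=None):
    """Assemble an ODBC connection string that refers to an existing DSN."""
    parts = ["DSN=" + dsn]
    if user is not None:
        parts.append("UID=" + user)
    if password is not None:
        parts.append("PWD=" + password)
    return ";".join(parts)


def fetch_rows(connection_string, query):
    """Run a HiveQL query over ODBC and return all result rows.

    Assumes the pyodbc package and a Hive ODBC driver are installed.
    """
    import pyodbc  # imported lazily so build_connection_string works without it

    # Hive ODBC drivers generally do not support transactions,
    # so the connection is opened in autocommit mode.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

For example, fetch_rows(build_connection_string("HiveDSN", "admin", "secret"), "SELECT * FROM weblogs LIMIT 10") would return the first ten rows of a hypothetical weblogs table, given a DSN named HiveDSN.
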
Demo: Hive ODBC
  In this demonstration, you will see how to:
    Create an ODBC DSN for Hive
    Import a Hive table into a PowerPivot Data Model
    Visualize Data from Hive with Power View
What is Power Query?
  An Excel add-in that enables users to:
    Find and import data from external sources
    Search public data
    Combine and shape data from multiple sources
    Filter, sort, and group data
    Add data to a workbook data model
    Save queries in a workbook for reuse
    Share queries with other Power BI users*
  * Requires a Microsoft Office 365 Power BI account
Using Power Query with HDInsight
  Windows Azure HDInsight source
    Browses Windows Azure Storage
    Cluster does not need to be active
  Typically used to access output files from MapReduce processing
  Further filtering and shaping can be performed in Power Query
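
To make "filtering and shaping" concrete, here is a rough Python sketch of the kind of work done on a MapReduce output file after import. It is an illustration only, not Power Query itself. It assumes the output is tab-delimited key/value text with integer values, which is the usual layout of Hadoop text output; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def parse_output(lines, delimiter="\t"):
    """Yield (key, value) pairs from key<TAB>value lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        key, value = line.split(delimiter, 1)
        yield key, int(value)


def total_by_key(pairs, minimum=0):
    """Group by key, sum the values, drop small totals, sort largest first."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    kept = [(key, total) for key, total in totals.items() if total >= minimum]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical sample: page-hit counts as a reducer might write them.
sample = ["/home\t120\n", "/about\t15\n", "/home\t30\n", "\n", "/contact\t4\n"]
result = total_by_key(parse_output(sample), minimum=10)
```

Here result holds the grouped totals for /home and /about, with /contact filtered out because its total is below the minimum.
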
Demo: Power Query
  In this demonstration, you will see how to:
    Import Data from HDInsight with Power Query
    Visualize Data from HDInsight with Power Map
What is Sqoop?
  Database integration service in Windows Azure HDInsight
  Open Source Hadoop technology
  Uses JDBC to connect to databases
  Initiate Sqoop jobs from:
    Hadoop command line
    PowerShell
    .NET SDK for Hadoop
    Actions in Oozie workflows
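
As a sketch of what a Sqoop export started from the Hadoop command line involves, the Python helper below assembles the argument list for a sqoop export call. It is not taken from the demo. The JDBC URL, table name, and HDFS directory in the usage note are hypothetical, and the sketch assumes tab-delimited source files and a destination table that already exists.

```python
import subprocess


def sqoop_export_command(jdbc_url, table, export_dir, field_delimiter="\\t"):
    """Build the argument list for exporting an HDFS directory to a database table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,          # JDBC connection string for the target database
        "--table", table,               # destination table (assumed to exist already)
        "--export-dir", export_dir,     # HDFS directory holding the files to export
        "--input-fields-terminated-by", field_delimiter,
    ]


def run_export(command):
    """Run the export; only meaningful on a machine where sqoop is on the PATH."""
    subprocess.run(command, check=True)
```

A call such as sqoop_export_command("jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb", "weblogs", "/data/weblogs") yields a list that can be passed to run_export on a cluster node. Credentials are left out of the sketch; in practice they would be supplied in the JDBC URL or through Sqoop's --username and --password options.
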
Demo: Exporting Data with Sqoop
  In this demonstration, you will see how to:
    Provision Windows Azure SQL Database
    Export Data with Sqoop
    Import Data from SQL Database into Excel
Module Summary
  Excel is a comprehensive tool for self-service data analysis
  Import data from Hive using the ODBC driver
  Import data from HDFS using Power Query
  Use Sqoop to transfer data from HDFS to a relational database