Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory.

Presentation transcript:

Analytics Map Reduce Query Insight Hive Pig Hadoop SQL Map Reduce Business Intelligence Predictive Operational Interactive Visualization Exploratory Data Warehouse Cloud Scale Real-Time Batch Machine Learning Self-Service Dremel Big Query Hadoop DB Unstructured Reporting Ad-Hoc Pivot Drill Social Data Mining Text Analytics Data Science The Digital Shoebox

Data Acquisition: Streaming/Trickle/Bulk Transfer
Data Refinement: Aggregation/Compression/Transformation/Extraction
Data Consumption: Analysis/Modeling/Query/Reporting/Visualization

Analyzing Flight Delays
Components: Gzip Files & Transfer to ASV via AZCopy; Windows Azure Blob Storage; Windows Azure HDInsight Service (Hive); SQL Server Analysis Services; Excel, Power View, PowerPivot, Data Explorer
Connections: ODBC, HTTP

Hadoop architecture: Distributed Storage (HDFS) and Distributed Processing (MapReduce)

Storage Infrastructure HDInsight Compute Nodes (Large VMs) Azure Storage Vault (ASV) Azure Blob Storage Azure Flat Network Storage

Storage Infrastructure: HDInsight Compute Nodes (Large VMs), Azure Storage Vault (ASV), Azure Blob Storage, Azure Flat Network Storage
Stream data to compute; push data back to storage
Processing stages: map, sort, shuffle, reduce
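The map, sort, shuffle, and reduce stages named on this slide can be illustrated in a few lines of single-process Python. This is only a sketch of the idea, not how Hadoop or HDInsight implements it; the function names and the sample flight rows are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for carrier, delay_minutes in records:
        yield carrier, delay_minutes

def shuffle_phase(pairs):
    """Sort + shuffle: order pairs by key and group the values under each key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each key's values to one result (here, the mean)."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

# Hypothetical sample rows: (carrier code, arrival delay in minutes).
flights = [("AA", 10), ("DL", 0), ("AA", 30), ("UA", 5), ("DL", 20)]
average_delay = reduce_phase(shuffle_phase(map_phase(flights)))
print(average_delay)
```

In a real cluster the map and reduce steps run on many nodes in parallel, and the shuffle moves data across the network between them.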

fs.azure.account.key.<storage account name>.blob.core.windows.net

Thur 1pm – 2:15pm DBI-B334 Data Management in Microsoft HDInsight: How to Move and Store Your Data

Data Movement to the Cloud: Compress Files
  Saves about 80-90% space
  HDInsight supports Gzip, BZ2, and Deflate (Hive)
  Reduces disk I/O and network traffic
  Lowers direct storage costs
  See the Microsoft whitepaper on compression in HDInsight
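As a rough illustration of compressing delimited text before upload, the following sketch uses Python's standard gzip module. The sample rows are synthetic and far more repetitive than real flight data, so the printed ratio should not be read as confirming the 80-90% figure above.

```python
import gzip

# Hypothetical sample: repetitive comma-separated flight rows.
rows = ["2012,10,{:02d},AA,ORD,SEA,15".format(day % 28 + 1) for day in range(5000)]
raw = "\n".join(rows).encode("utf-8")

# Compress in memory; gzip.open() would do the same for files on disk.
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2%}")

# Round trip to confirm the compression is lossless.
restored = gzip.decompress(compressed)
```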

Data Movement to the Cloud: Move Files
Microsoft Solutions:
  AZCopy
  Portal UI (Small Files)
  Hadoop Command Line Interface (CLI)
Third Party:
  Aspera
  Attunity CloudBeam

Data Preparation with Hive & Pig
Hive:
  Create structure over files
  Process and refine data with SQL syntax
  Generates/runs MapReduce
  "Data Warehouse" focused
Pig:
  Process & shape data
  Scripting language for ETL/ELT
  Generates/runs MapReduce

HIVE ARCHITECTURE Hive Hadoop

Data Preparation with Hive
Use EXTERNAL when:
  Data is used outside Hive
  You need data to be updatable in real time
  Data must survive dropping the cluster or the table
  Hive should not own the data or control settings, directories, etc.
Use INTERNAL when:
  You want Hive to manage the data and storage
  Usage is short term
  You are creating a table based on an existing table (AS SELECT)

CREATE EXTERNAL TABLE flights(…column definitions…)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'asv://cluster.blob.core.windows.net/flights_raw';

Data Preparation with Hive
Statement-level compression:

set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

Partition:

CREATE EXTERNAL TABLE flights(…column definitions…)
PARTITIONED BY (Year string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 'asv://storage.blob.core.windows.net/flights';

ALTER TABLE flights ADD PARTITION (Year = '10') LOCATION 'asv://storage.blob.core.windows.net/flights/flights_raw_10';
ALTER TABLE flights ADD PARTITION (Year = '11') LOCATION 'asv://storage.blob.core.windows.net/flights/flights_raw_11';
…
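When many partitions have to be registered, the ALTER TABLE statements can be generated rather than typed by hand. The helper below is a hypothetical sketch that only builds the statement text; it assumes the partition column is Year, as declared in the CREATE statement, and that each partition lives in a folder named with the flights_raw_ prefix, as in the slide.

```python
def add_partition_statements(table, column, values, base_location,
                             folder_prefix="flights_raw_"):
    """Build one ALTER TABLE ... ADD PARTITION statement per partition value."""
    statements = []
    for value in values:
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION ({column} = '{value}') "
            f"LOCATION '{base_location}/{folder_prefix}{value}';"
        )
    return statements

# Table name, column, and location are taken from the slide; the helper
# itself is an illustration and not part of Hive or HDInsight.
statements = add_partition_statements(
    "flights", "Year", ["10", "11"],
    "asv://storage.blob.core.windows.net/flights",
)
print("\n".join(statements))
```

The generated text would still need to be submitted through whatever Hive client the cluster exposes.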

Hive Best Practices
Performance:
  Fewer, larger files are better
  Partition for range searches
  Order of tables and columns in queries can affect performance; put the largest table last
  Indexes may help some queries, but have limitations
  Compress where possible, but be sure that user tools can read the compressed format
Operations:
  Supports textfile, sequence file, RCFile, Avro
  For XML files, see "Hive and XML File Processing"
  Remove headers before loading
  Partition for loading
Configuration:
  Configure your SmallFileSize and number of reducers to match your workload
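The advice to remove headers before loading matters because a table defined over plain text files treats every line as data, so a header line would show up as a row. A minimal sketch of stripping the first line, using hypothetical in-memory sample data in place of real files:

```python
import io

def strip_header(source, target):
    """Copy every line except the first (the header row) from source to target."""
    next(source, None)  # discard the header; does nothing on an empty input
    count = 0
    for line in source:
        target.write(line)
        count += 1
    return count

# Hypothetical sample; with real data, pass file objects from open() instead.
source = io.StringIO("Year,Month,Carrier,ArrDelay\n2012,10,AA,15\n2012,10,DL,0\n")
target = io.StringIO()
rows_written = strip_header(source, target)
print(target.getvalue())
```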

Tuning Your Hive
Know before you go!
  Leverage best practices (partitioning, compression, etc.)
  Know your join types
  What size tables are you joining?
  Did you update your configuration files correctly?
How to use EXPLAIN:
  Add EXPLAIN before the last query you run in your batch to generate the Abstract Syntax Tree
  Be careful with the LIMIT statement
  Watch for bottlenecks between mappers and reducers; you may need more nodes
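Table size matters for joins because one side usually has to be buffered in memory while the other is streamed past it. The sketch below is a generic in-memory hash join, not Hive's actual implementation; the function name and sample tables are hypothetical. It shows why it is cheaper to buffer the small input and stream the large one.

```python
def hash_join(small_rows, large_rows):
    """Inner-join two iterables of (key, value) pairs on key."""
    # Build phase: buffer the smaller input in memory, indexed by join key.
    buffered = {}
    for key, value in small_rows:
        buffered.setdefault(key, []).append(value)
    # Probe phase: stream the larger input one row at a time.
    for key, value in large_rows:
        for match in buffered.get(key, []):
            yield key, match, value

# Hypothetical sample tables: a small lookup table and a larger fact table.
carriers = [("AA", "American"), ("DL", "Delta")]
flights = [("AA", 10), ("DL", 0), ("AA", 30), ("UA", 5)]
joined = list(hash_join(carriers, flights))
print(joined)
```

Memory use grows with the buffered input only, so swapping the two arguments would give the same matches at a much higher memory cost when the second table is large.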

Performance Guide: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/10/10/analysis-services-2008-r2-performance-guide.aspx
Operations Guide: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/06/01/sql-server-2008r2-analysis-services-operations-guide.aspx

PowerShell to Create an Azure VM Running SQL Server Business Intelligence

Microsoft Big Data: Denny Lee, Carl Nolan, Cindy Gross
Big Data Resources: Hadoop: The Definitive Guide by Tom White; SQL Server Sqoop; JavaScript; Twitter; Hive; Excel to Hadoop via Hive ODBC; Hadoop On Azure Videos; Klout; HortonWorks Sandbox; Azure Data Marketplace (http://datamarket.azure.com/); Top 50 Big Data Influencers; PASS Big Data Virtual Chapter
