Apache Hadoop on Windows Azure Avkash Chauhan

Presentation transcript:

Agenda: Presentation and Demos
–Apache Hadoop Scaling in the Cloud
–Apache Hadoop on Windows Azure: Architecture, Demo
–Connecting Hadoop using HiveODBC: Excel, PowerPivot, Demo
Apache™ Hadoop™-based services on Windows Azure™

Project’s Current Status and Availability: A limited CTP release (refresh 2) of Apache Hadoop on Windows Azure is available now. Visit: Hadooponazure.com. There are no further details about any release date at this time. The details shown here are part of the limited CTP release, which is available to a limited number of users depending on available resources. You might have heard about Apache Hadoop on Windows Server; however, that is not part of this presentation.

Apache Hadoop: Hadoop is an open source (Java-based), “scalable”, “fault tolerant” platform for storing and processing large amounts of unstructured data, distributed across machines. (Diagram labels: Data, Hadoop, Intelligence)
–Flexibility: a single repository for storing and analyzing any kind of data, not bounded by schema
–Scalability: scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
–Low Cost: deployed on commodity hardware and an open source platform
–Fault Tolerant**: continues working even if node(s) go down
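
The scale-out and fault-tolerance points above can be illustrated with a toy model. The sketch below is not Hadoop code: it assumes a file already split into numbered blocks and uses a simple round-robin placement with a replication factor of 3 (the function names, node names, and placement rule are all hypothetical, chosen only for illustration). It shows why data stays readable when a node goes down.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy placement rule for illustration only; real HDFS placement is
    rack-aware and considerably more involved.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive offsets modulo len(nodes) are always distinct nodes.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    ]


# Hypothetical four-node cluster holding a six-block file.
nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(6, nodes, replication=3)

# With one node down, every block is still readable from another replica.
survivors = readable_blocks(placement, failed={"node2"})
print(len(survivors), "of", len(placement), "blocks readable")
```

Adding nodes to the list spreads the same blocks over more machines, which is the scale-out idea in miniature.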

How Hadoop Works? (diagram labels)
–Data sources: Local FS, Amazon S3, Azure Blob
–Nodes: NameNode, JobTracker, and three DataNode/TaskTracker pairs
–Layers: Hadoop Common, HDFS, Map/Reduce
–Output: Intelligence
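
To make the Map/Reduce layer in the diagram concrete, here is a minimal in-process sketch of the map, shuffle, and reduce steps using the classic word-count example. It runs on a single machine and does not use any Hadoop API; the function names are hypothetical. On a real cluster, the map and reduce steps would run as tasks on the TaskTracker nodes under the JobTracker's coordination.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop on Azure", "hadoop in the cloud"])
print(counts["hadoop"])  # "Hadoop" and "hadoop" are counted together
```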

Scaling in Cloud (diagram labels, repeated across the figure: Resources, Data, Analysis)

Usage Scenarios (diagram labels): Data Scientist; Administration and Monitoring; Visual Studio; Windows Deployment; Windows Azure Cloud; Data Market Services; Analytics and Data Warehousing & PDW; Private Cloud Clusters; Active Directory; SCOM; EXCEL & PowerPivot; Enterprise Integration; Developer; Business User; EXCEL; Web Shell for Hadoop; Write Hadoop Jobs in Shell; Visualization; Interactive JS; Consume Cloud/Storage; Consume BI Platform; Large Developer Toolset & Ecosystem; Infrastructure Support

Components: Apache Hadoop on Windows Azure
Azure Blob Store, Amazon S3, HiveODBC, Office Excel, PowerPivot, Hive, Pig, Mahout, Zookeeper, Flume, Web Shell, Monitoring by SCOM, Integration with AD, Interactive JS, Sqoop, Avro, SQL Azure, SQL Server

Running Apache Hadoop in Cloud
–Reduce the complexity
–Dramatically lower costs
–Enable flexible connectivity and delivery
–Business can focus on data and logic, not infrastructure
–Instant availability
–Leverage existing cloud services

Running Apache Hadoop in Windows Azure
–Apache Hadoop on Windows Azure Portal (hadooponazure.com)
–The customer logs in and asks for a Hadoop cluster with N nodes
–Customers can connect to existing cloud services
–Customers can connect different data sources, e.g. S3 or Azure Storage, or copy the data to HDFS
–HOT instances are ready to use
–The Hadoop service provisions an N-node Hadoop cluster in X amount of time
–Upload the Map/Reduce job to the cluster and start the job

Apache Hadoop on Azure Portal:

Apache Hadoop on Windows Azure Demo

Connecting Excel to Hadoop Cluster on Windows Azure
–Have a HadooponAzure cluster ready
–Install the HiveODBC driver (64-bit) on the machine that will connect to the Hadoop cluster
–Configure the Hadoop cluster in System DSN
–Be sure to have the HiveODBC port open in the Hadoop cluster
–Verify that Microsoft Excel (64-bit) shows the Hive panel in the Data tab
–Connect to the Hadoop cluster from Excel 2010
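
Once a System DSN exists, any ODBC client can reach the cluster through it, not only Excel. The sketch below builds an ODBC connection string from a DSN name using the standard `DSN`, `UID`, and `PWD` keywords. The DSN name `HadoopOnAzure`, the credentials, and the table name in the sample query are hypothetical placeholders and do not come from the presentation. The actual connection step is left as a comment because it needs a live cluster and a third-party ODBC package.

```python
def build_connection_string(dsn, user=None, password=None):
    """Build an ODBC connection string for a configured System DSN."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


# Hypothetical DSN name and credentials, for illustration only.
conn_str = build_connection_string("HadoopOnAzure", user="admin", password="secret")

# Hypothetical Hive table name; replace with a table that exists on the cluster.
sample_query = "SELECT * FROM sample_table LIMIT 10"

# With the third-party pyodbc package installed and the HiveODBC port open,
# the connection would look roughly like this (assumption, not run here):
#   import pyodbc
#   connection = pyodbc.connect(conn_str, autocommit=True)
#   rows = connection.cursor().execute(sample_query).fetchall()
print(conn_str)
```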

Connecting PowerPivot to Hadoop Cluster on Windows Azure
–Have a HadooponAzure cluster ready
–Install the HiveODBC driver (64-bit) on the machine that will connect to the Hadoop cluster
–Configure the Hadoop cluster connection using the HiveODBC 64-bit driver
–Be sure to have the HiveODBC port open in the Hadoop cluster
–Launch PowerPivot from Microsoft Excel (64-bit)
–Import Hive tables into PowerPivot

Resources:
–Hadoop-based Services For Windows (en-US) on TechNet: s/articles/6204.hadoop-based-services-for-windows-en-us.aspx
–My Hadoop on Azure Specific Blog on MSDN: an/archive/tags/hadoop/
–Denny Lee Blog:
–Twitter: