Apache Hadoop on Windows Azure Avkash Chauhan

1 Apache Hadoop on Windows Azure Avkash Chauhan (@avkashchauhan)

2 Agenda
Presentation and Demos
– Apache Hadoop Scaling in the Cloud
– Apache Hadoop on Windows Azure Architecture Demo
– Connecting Hadoop using HiveODBC, Excel, PowerPivot Demo
Apache™ Hadoop™-based services on Windows Azure™

3 Project’s Current Status and Availability
– A limited CTP release (refresh 2) of Apache Hadoop on Windows Azure is available now. Visit: Hadooponazure.com
– There are no further details about any release date at this time. http://blogs.technet.com/b/dataplatforminsider/archive/2012/04/02/sql-server-2012-is-generally-available.aspx
– The details shown here are part of the limited CTP release, which is open to a limited number of users depending on available resources.
– You might have heard about Apache Hadoop on Windows Server; however, that is not part of this presentation.

4 Apache Hadoop
Hadoop is an open source (Java-based), “scalable”, “fault-tolerant” platform for storing and processing large amounts of unstructured data, distributed across machines.
– Flexibility: a single repository for storing and analyzing any kind of data, not bounded by schema
– Scalability: a scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
– Low Cost: deployed on commodity hardware and an open source platform
– Fault Tolerant**: continues working even if node(s) go down
[Figure labels: Data, Hadoop, Intelligence]

5 How Hadoop Works?
[Diagram labels. Data sources: Local FS, Amazon S3, Azure Blob. Nodes: NameNode, JobTracker, DataNode and TaskTracker (repeated). Layers: Hadoop Common, HDFS, Map/Reduce. Output: Intelligence]
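
The Map/Reduce layer named on this slide can be illustrated with a minimal, single-process word count. This is only a sketch of the programming model, not how the deck's cluster runs jobs: real Hadoop jobs of this era are typically written in Java and spread across TaskTrackers, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["Hadoop on Azure", "hadoop on Windows"])` counts "hadoop" and "on" twice each. On a cluster, the map and reduce steps would run in parallel on different nodes, with the framework handling the shuffle between them.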

6 Scaling in Cloud
[Figure: repeated panels, each labeled Resources, Data, Analysis]

7 Usage Scenarios
[Diagram labels: Data Scientist; Developer; Business User; Administration and Monitoring; Visual Studio; Windows Deployment; Windows Azure Cloud; Data Market Services; Analytics and Data Warehousing & PDW; Private Cloud Clusters; Active Directory; SCOM; Excel & PowerPivot; Enterprise Integration; Excel; Web Shell for Hadoop; Write Hadoop Jobs in Shell; Visualization; Interactive JS; Consume Cloud/Storage; Consume BI Platform; Large Developer Toolset & Ecosystem; Infrastructure Support]

8 Components: Apache Hadoop on Windows Azure
Azure Blob Store, Amazon S3, HiveODBC, Office Excel, PowerPivot, Hive, Pig, Mahout, Zookeeper, Flume, Web Shell, Monitoring by SCOM, Integration with AD, Interactive JS, Sqoop, Avro, SQL Azure, SQL Server

9 Running Apache Hadoop in Cloud
– Reduce the complexity
– Dramatically lower costs
– Enable flexible connectivity and delivery
– Business can focus on data and logic, not infrastructure
– Instant availability
– Leverage existing cloud services

10 Running Apache Hadoop in Windows Azure
– Apache Hadoop on Windows Azure Portal (http://www.hadooponazure.com)
– Customer logs in and asks for a Hadoop cluster with N nodes
– Customer can connect to existing cloud services
– Customers can connect different data sources, e.g. S3 or Azure Storage, or copy the data to HDFS
– HOT instances are ready to use
– The Hadoop service provisions an N-node Hadoop cluster in X amount of time
– Upload the Map/Reduce job to the cluster and start the job

11 Apache Hadoop on Azure Portal:

12 Apache Hadoop on Windows Azure Demo

13 Connecting Excel to Hadoop Cluster on Windows Azure
– Have a HadooponAzure cluster ready
– Install the HiveODBC driver (64-bit) on the machine that will connect to the Hadoop cluster
– Configure the Hadoop cluster as a System DSN
– Be sure HiveODBC port 10000 is open on the Hadoop cluster
– Verify that Microsoft Excel 2010 (64-bit) shows the Hive panel in the Data tab
– Connect to the Hadoop cluster from Excel 2010
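
Before configuring the DSN, it can help to confirm that port 10000 is actually reachable from the client machine. The following is a minimal sketch using only the Python standard library; the function name `is_port_open` is made up for illustration, and the host you pass in is whatever address your own cluster exposes.

```python
import socket


def is_port_open(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_port_open("yourcluster.example.com")` (a placeholder host name) returns False if the port has not been opened on the cluster. A True result only shows that something accepts TCP connections on that port, not that the Hive service behind it is healthy.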

14 Connecting Excel to Hadoop Cluster on Windows Azure

15 Connecting PowerPivot to Hadoop Cluster on Windows Azure
– Have a HadooponAzure cluster ready
– Install the HiveODBC driver (64-bit) on the machine that will connect to the Hadoop cluster
– Configure the Hadoop cluster connection using the HiveODBC 64-bit driver
– Be sure HiveODBC port 10000 is open on the Hadoop cluster
– Launch PowerPivot from Microsoft Excel 2010 (32/64-bit)
– Import Hive tables into PowerPivot
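
The same ODBC connection that PowerPivot uses can be exercised from a script to sanity-check it before importing tables. This is a hedged sketch, not part of the original deck: it assumes the third-party `pyodbc` package is installed and that a DSN has already been configured with the HiveODBC driver. The helper names `build_connection_string` and `fetch_rows`, and any DSN name, credentials, or table name you pass in, are hypothetical.

```python
def build_connection_string(dsn, user=None, password=None):
    """Build an ODBC connection string from a DSN name.

    DSN, UID, and PWD are standard ODBC connection-string keywords.
    """
    parts = [f"DSN={dsn}"]
    if user is not None:
        parts.append(f"UID={user}")
    if password is not None:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def fetch_rows(connection_string, query):
    """Run one HiveQL query over ODBC and return all result rows."""
    # Third-party dependency, imported lazily so the helper above works without it.
    import pyodbc

    # autocommit=True is an assumption: Hive ODBC drivers commonly do not
    # support transactions, so pyodbc's default transactional mode may fail.
    connection = pyodbc.connect(connection_string, autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

A call might look like `fetch_rows(build_connection_string("MyHiveDsn", "user", "secret"), "SELECT * FROM some_table LIMIT 10")`, where the DSN, credentials, and table are placeholders for your own values.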

16 Connecting PowerPivot to Hadoop Cluster on Windows Azure http://dennyglee.com/2012/01/21/connecting-powerpivot-to-hadoop-on-azure-self-service-bi-to-big-data-in-the-cloud/

17 Resources
– Hadoop-based Services For Windows (en-US) on TechNet: http://social.technet.microsoft.com/wiki/contents/articles/6204.hadoop-based-services-for-windows-en-us.aspx
– My Hadoop on Azure specific blog on MSDN: http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop/
– Denny Lee blog: http://dennyglee.com/
– Email: avkashc@microsoft.com
– Twitter: @avkashchauhan
