Hadoop and the Modern Data Architecture

Presentation transcript:

Hadoop and the Modern Data Architecture Josh Fennessy Principal, BlueGranite © 2016 BlueGranite, Inc. All rights reserved.

Agenda
- The Modern Data Architecture
- What is Hadoop?
- HDInsight 101
- Customer Use Cases
- Tips from the Real World
- Q & A

Traditional data architecture
4/27/2017
Data sources: Server logs, Medical Devices, Plant-floor Equipment, Search, Sales, Finance, Inventory, CRM, HR, Social, Email, Comments, Documents, Resumes
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Modern data architecture
Data sources: Server logs, Medical Devices, Plant-floor Equipment, Search, Sales, Finance, Inventory, CRM, HR, Social, Email, Comments, Documents, Resumes

Traditional data architecture
Diagram components: Operational BI, OLAP, Data Marts, Self-service BI, Governance, EDW, Integration, Staging
- Structured Data: Sales, Finance, Inventory, CRM, HR
- Unstructured Data: Social, Email, Comments, Documents, Resumes
- Streaming Data: Server logs, Medical Devices, Plant-floor Equipment, Search

Modern data architecture
Diagram components: Self-service BI, Operational BI, OLAP, Data Marts, EDW, Hadoop Data Lake, Staging, Data Science, Governance, Operational, Integration
Data sources: Structured Data, Unstructured Data, Streaming Data
Additional diagram labels: Curated, Active Archive, Analytics Sandbox, Persistent

What is Hadoop?
- Distributed Processing Framework
  - Storage, processing and analytics for large-scale data problems
  - Optimized for petabyte scale
- Designed to handle all data
  - Schema on read vs. schema on write
  - Enable all users to get to insights faster
- Built to run on commodity hardware
  - From a few to 1,000s of servers
  - Fault tolerant

Why learn Hadoop?

Deployment Options
- On-premises: Apache Hadoop; Commercial Distribution (Hortonworks / Cloudera / MapR)
- Managed Cloud Service: Microsoft HDInsight; Amazon Elastic MapReduce; Google Cloud
- Infrastructure as a Service: Microsoft Azure; Amazon Web Services; Cloudbreak

Cloud vs. On-premises
Why go Cloud?
- Simple, fast configuration
- Cost management
- Dev/Test
- Proof-of-Concept
- Needs-based deployment
- Integration
Why NOT go Cloud?
- Slower release cycle
- Reduced control of configuration
- Data security / security policies
- Connectivity
- Cost
- Performance

HDInsight 101: What

Hadoop History
- 2004 – Two research papers written at Google
  - Big Table – A method for storing very large data tables as collections of key-value pairs
  - Map Reduce – A method for scaling out processing operations over many servers
- 2006 – Yahoo develops the Hadoop framework
- 2008 – Hadoop becomes a top-level Apache project
- 2009 – Cloudera announces the first commercial distribution of Hadoop
- 2011 – Hortonworks formed to provide pure open-source Hadoop for the enterprise
- 2012 – Microsoft and Amazon release cloud-based Hadoop distributions

HDInsight – What It Is
- Hortonworks HDP (2.2 or 2.3)

HDInsight – What It Is: Key Points
- Fully managed environment. No need to administer individual cluster machines.
- Optimized for Azure Storage Blobs or Data Lake Store
- Can run on a Windows or Linux cluster
  - Linux preferred
- Can deploy non-standard configurations and customize the cluster
  - Must script with PowerShell (Windows) or Bash (Linux)
- Online Scale Out
  - Add new machines to the cluster without downtime. However, individual machine scale-up (more memory/procs) requires downtime.

HDInsight 101: What, How

HDInsight – How to Get It

HDInsight – How to Get It (Recap)
1. Log in to the Windows Azure Portal (New)
2. Click on Create -> Data + Analytics
3. Select HDInsight
4. Enter configuration options: Cluster Name, Cluster Type, Cluster OS
5. Wait for a bit
6. Log in to Ambari and get to work!

HDInsight 101: What, How, Why

Why Use HDInsight
- Ease of Use
  - Virtually no administration
  - Very fast deployment
  - No need to learn a new OS
- Price
  - Cost relatively low compared to on premises
  - Can control costs based on processing needs
- Maintenance
  - All maintenance handled by Microsoft
  - No downtime for upgrades
  - Self-healing cluster and hardware

HDInsight 101: What, How, Why, Uses

How I’ve Used HDInsight with My Customers: Data Warehouse Extension
- Constrained by limits in their Netezza Appliance
- Generate 3 - 5 TB per day for their largest customers. 25 - 30 TB total per day.
- Only able to keep about 3 months of raw transactions online

How I’ve Used HDInsight with My Customers: Data Warehouse Extension
SUCCESS! 3 years of transaction data! 150TB+
Diagram label: Raw Data Files

How I’ve Used HDInsight with My Customers: Manufacturing Optimization
Fortune 100 auto manufacturer looking to optimize its manufacturing process
- Scrap – Machines pick up parts using a vacuum tool. Except when they don’t. Missing a part pickup costs a lot of money in aggregate.
- Cycle Time – Analysis of how much time it takes the machine to start and finish one process on one part.

How I’ve Used HDInsight with My Customers: Manufacturing Optimization
SUCCESS! Identified many opportunities to improve processes, with $1,000,000s in savings
Diagram label: Machine Events

HDInsight 101: What, How, Why, Uses, Tips from the real world

Tip #1: Script Cluster Creation
You may think you have your cluster configured correctly the first time. You don’t.
- Script it with PowerShell to make edits and redeployments a breeze
- Added bonus – use Azure Automation to drop the cluster during unused times. The data is still available.

Tip #2: Cluster Customizations Need to Be Scripted
Just because you CAN log in to a server and make a change doesn’t mean you should.
- Use Script Actions to make your custom configurations during cluster execution.
- If you don’t, your configurations WILL be lost at some point.

Tip #3: Use Linux
HDInsight is available in Windows or Linux. Ignore Windows.
- Linux deployment is much more complete. You get Ambari.
- No *useable* UI with Windows deployment
- Not just my tip. The HDInsight Product Team and Premier Support suggest it as well.

Tip #4: Remember HDInsight is Hadoop
- It’s not a Microsoft tool
- It’s not an RDBMS
- Use it for what you can’t do easily with an RDBMS.
- Don’t replace your RDBMS with it. It won’t perform as well.

Simplified Hadoop Architecture
Files copied and distributed to multiple nodes
Diagram labels: Add file to HDFS, Name Node, Network Transfer, Data Nodes
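
The "files copied and distributed to multiple nodes" idea can be sketched in a few lines of Python. This is a toy model, not how the NameNode is implemented: the function name `place_blocks`, the node names, and the round-robin placement are all assumptions made for illustration (real HDFS uses a rack-aware placement policy). The 128 MB block size and replication factor of 3 are the usual Hadoop 2.x defaults.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # common HDFS default block size (128 MB)
REPLICATION = 3                 # common HDFS default replication factor


def place_blocks(file_size, data_nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} for a file of file_size bytes.

    Hypothetical helper: round-robin placement is a simplification,
    not the real NameNode placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Each replica of a block lands on a different data node.
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # made-up node names
    for block, replicas in place_blocks(300 * 1024 * 1024, nodes).items():
        print(block, replicas)
```

A 300 MB file splits into three blocks here, and each block is stored on three different nodes, which is why losing a single data node does not lose data.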

MapReduce
The basis of how data is retrieved from Hadoop. All (most) other protocols translate to MapReduce.
Input: "I would not like them here nor there. I would not like them anywhere."
- Mapper 1 output: I 1, would 1, not 1, like 1, them 1, here 1, nor 1, there 1
- Mapper 2 output: I 1, would 1, not 1, like 1, them 1, anywhere 1
- Reducer 1 output: I 2, would 2, not 2, like 2, them 2, here 1, nor 1, there 1, anywhere 1
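
The word-count flow on this slide can be imitated in plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop's Java API; the function names `mapper`, `shuffle`, `reducer`, and `word_count` are chosen here for illustration.

```python
import re
from collections import defaultdict


def mapper(text):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in re.findall(r"[A-Za-z']+", text)]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(splits):
    """Run every split through a mapper, then reduce the grouped pairs."""
    mapped = [pair for split in splits for pair in mapper(split)]
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


# One sentence per mapper, as on the slide.
splits = [
    "I would not like them here nor there.",
    "I would not like them anywhere.",
]
counts = word_count(splits)
print(counts)
```

In a real cluster each split would be processed on the node that holds the data, and the shuffle would move pairs across the network to the reducers.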

MapReduce vs. Tez
Diagram comparing how MapReduce and Tez execute the same query. Step labels: Group t1 by a, Group t2 by a, Join t1 t2, Order by. Stage labels: Mapper, Reducer, HDFS.

It can get a little confusing…
Atlas, Sentry, Ambari, ZooKeeper, Drill, Falcon, Lucene, LLAMA, Tez, Mahout, Cascading, Knox, Ganglia, Oozie, MapReduce, Solr, Crunch, Ranger, Parquet, Kite, Impala, Presto, HDFS, HBase, Flume, Hue, Sqoop, Pig, Hive, Spark, Jupyter, YARN, Slider, Flink, Accumulo, DataFu, Storm, HCatalog, Avro, Kafka, Nagios, Phoenix, Zeppelin

Advanced Use Cases
- Financial Services: Predict trading risk; Customer sentiment analysis; Fraud Detection
- Retail: 360-degree view of the customer; Personalized promotions; Supply-chain optimization
- Healthcare: Real-time patient monitoring; Genomic Analysis; Outbreak identification
- Manufacturing: Scrap analysis; Failure prediction; Quality Control

Demo – Twitter Data Analytics
- Introduction to Hive
- Parsing JSON data
- Shaping and querying data
- Using BI tools

Hive Overview
SQL for Hadoop! PB Scale. If you know SQL, you know Hive!
Hive properties:
- Easier than Map Reduce
- Schema on read
- Supports user code
- Reads multiple formats
- HiveQL
- Ad-hoc querying
- SerDes
- No OLTP
Best use: Batch over large data sets

Hive
Known as the data warehouse of Hadoop, Hive is a protocol and metadata layer that allows for the creation of MapReduce code with a SQL-like language called HiveQL.
Diagram labels: Hive, Table, Query Submission, Metadata, Partitions, Indexes, Views, UDFs, Usage, MapRed Translation, Execution/Results, HDFS or WASB

Comparing RDBMS and Hive
- Structure: Schema On Write (RDBMS), Schema On Read (Hive)
- Access: SQL
- Indexes: Yes
- Updates: Yes (new in 0.14)
- Referential Integrity: No
- Cost Based Query Optimization
- OLTP
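
To make "schema on read" concrete, here is a small Python sketch. It is not Hive and does not use any Hive API: the raw JSON lines, the column names, and the `read_with_schema` helper are all made up for illustration. The point is that the raw records are stored untouched and a schema is applied only when the data is queried.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
RAW_TWEETS = [
    '{"id": 1, "user": {"name": "ann"}, "text": "Hadoop is fun", "lang": "en"}',
    '{"id": 2, "user": {"name": "bob"}, "text": "Hive looks like SQL"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time.

    schema maps a column name to the path of JSON keys to follow.
    A missing field becomes None instead of failing the load.
    """
    for line in lines:
        record = json.loads(line)
        row = {}
        for column, path in schema.items():
            value = record
            for key in path:
                value = value.get(key) if isinstance(value, dict) else None
            row[column] = value
        yield row


# Two different "tables" over the same raw data, defined after the fact.
authors = {"tweet_id": ("id",), "author": ("user", "name"), "lang": ("lang",)}
texts = {"tweet_id": ("id",), "text": ("text",)}

rows = list(read_with_schema(RAW_TWEETS, authors))
print(rows)
print(list(read_with_schema(RAW_TWEETS, texts)))
```

With schema on write, the second record would have to be rejected or fixed before loading because it lacks `lang`; with schema on read it loads as-is and the gap only shows up as an empty value at query time.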

Q & A
This is an AMA section. Questions are free range. Any topic.

Thank you! Josh Fennessy Principal, BlueGranite

Business Insights. Delivered.
BlueGranite provides end-to-end business analytics solutions using the Microsoft platform.
Helping clients gain insights from their data
- Data Management: Enable the organization to store and analyze large volumes of structured and non-structured data with optimized systems that can scale to meet demand.
- Business Analytics: Help your team understand your performance and prescribe actions through interactive dashboards, reports and predictive analysis.
- Data Science: Capitalize on your data assets using the latest machine learning and predictive techniques to garner new insights faster.
About BlueGranite
- Founded in 1997, BlueGranite partners with Microsoft and HP to deploy analytics platform, data analytics, and predictive solutions
- BlueGranite delivers the modern data platform for business analytics built on SQL Server, Analytics Platform System and Apache Hadoop
- Our clients utilize robust data visualizations to collaborate on data discoveries, integrating analytics into existing business applications
- We have multiple team members with the Microsoft Partner TSP (P-TSP) designation that work out of Microsoft Technology Centers
- Our team’s certifications include SCRUM Master, SQL Business Intelligence and Data Platform, and Microsoft SQL Server MVP as well as Apache Hadoop Developer and Administrator
- Serving the U.S. with offices in Midwest, Heartland, Northeast, South Central, and Southeast Districts
877-817-0736 www.blue-granite.com

Engage with BlueGranite
- Strategic Roadmap: Our senior team of architects and consultants will lead you through discovery and planning workshops to design a strategy for analytics across the organization. Our experienced consultants have learned to navigate through complex requirements, system dependencies, and data quality issues that may be facing your organization.
- Analytics Solutions: Our team of data scientists and BI developers work directly with your team to design, build, and deploy analytics and reporting solutions using the latest BI technologies. We create robust, automated reporting and analysis tools that help our clients harness the full potential of their data, providing insights your team needs to perform their best.
- Training Workshops: Through briefings, discussions, scripted labs and live analysis of your own data, we will teach your team the skills they need to gain insights more quickly and efficiently. Workshop components include both technical and business oriented content, helping teams become self-reliant with their BI tools.
- Managed Services: Expert support for data warehouse, BI and analytics systems. We provide an efficient, reliable, lower cost alternative to maintain and enhance your portfolio of solutions. Our onsite team responds to requests within 1 hour, and our automated support management system tracks activities and status to ensure you have visibility into our shared success.
BlueGranite, Inc.