Hadoop and the Modern Data Architecture

Presentation on theme: "Hadoop and the Modern Data Architecture"— Presentation transcript:

1 Hadoop and the Modern Data Architecture
Josh Fennessy Principal, BlueGranite © 2016 BlueGranite, Inc. All rights reserved.

2 Agenda
The Modern Data Architecture
What is Hadoop?
HDInsight 101
Customer Use Cases
Tips from the Real World
Q & A

3 Traditional data architecture
4/27/2017
Data sources shown: Server logs, Medical Devices, Plant-floor Equipment, Search, Sales, Finance, Inventory, CRM, HR, Social, Comments, Documents, Resumes.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

4 Modern data architecture
Data sources shown: Server logs, Medical Devices, Plant-floor Equipment, Search, Sales, Finance, Inventory, CRM, HR, Social, Comments, Documents, Resumes.

5

6 Traditional data architecture
Diagram labels: Operational BI, OLAP, Data Marts, Self-service BI, Governance, EDW, Integration, Staging.
Structured Data: Sales, Finance, Inventory, CRM, HR.
Unstructured Data: Social, Comments, Documents, Resumes.
Streaming Data: Server logs, Medical Devices, Plant-floor Equipment, Search.

7 Modern data architecture
Diagram labels: Self-service BI, Operational BI, OLAP, Data Marts, EDW, Hadoop Data Lake, Staging, Data Science, Governance, Operational, Integration, Structured Data, Unstructured Data, Streaming Data, Curated, Active Archive, Analytics Sandbox, Persistent.

8 What is Hadoop?
Distributed Processing Framework: storage, processing and analytics for large-scale data problems; optimized for petabyte scale.
Designed to handle all data: schema on read vs. schema on write; enables all users to get to insights faster.
Built to run on commodity hardware: from a few to thousands of servers; fault tolerant.

9 4/27/2017 7:04 PM Why learn Hadoop? © 2014 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

10 Deployment Options
On-premises: Apache Hadoop; Commercial Distribution (Hortonworks / Cloudera / MapR).
Managed Cloud Service: Microsoft HDInsight; Amazon Elastic MapReduce; Google Cloud.
Infrastructure as a Service: Microsoft Azure; Amazon Web Services; Cloudbreak.

11 Cloud vs. On-premises
Why go Cloud? Simple, fast configuration; cost management; Dev/Test; Proof-of-Concept; needs-based deployment; integration.
Why NOT go Cloud? Slower release cycle; reduced control of configuration; data security / security policies; connectivity; cost; performance.

13 HDInsight 101 What

14 Hadoop History
2004 – Two research papers written at Google:
Big Table – A method for storing very large data tables as collections of key-value pairs
Map Reduce – A method for scaling out processing operations over many servers
2006 – Yahoo develops the Hadoop framework
2008 – Hadoop becomes a top-level Apache project
2009 – Cloudera announces the first commercial distribution of Hadoop
2011 – Hortonworks formed to provide pure open-source Hadoop for the enterprise
2012 – Microsoft and Amazon release cloud-based Hadoop distributions

15 HDInsight – What It Is
Hortonworks HDP (2.2 or 2.3)

16 HDInsight – What It Is: Key Points
Fully managed environment. No need to administer individual cluster machines.
Optimized for Azure Storage Blobs or Data Lake Store.
Can run on a Windows or Linux cluster; Linux preferred.
Can deploy non-standard configurations and customize the cluster; must script with PowerShell (Windows) or Bash (Linux).
Online scale-out: add new machines to the cluster without downtime. However, individual machine scale-up (more memory/processors) requires downtime.

17 HDInsight 101 What How

18 HDInsight – How to Get It

19 HDInsight – How to Get It (Recap)
Log in to the Windows Azure Portal (New).
Click Create -> Data + Analytics.
Select HDInsight.
Enter configuration options: Cluster Name, Cluster Type, Cluster OS.
Wait for a bit.
Log in to Ambari and get to work!

20 HDInsight 101 What How Why

21 Why Use HDInsight
Ease of Use: virtually no administration; very fast deployment; no need to learn a new OS.
Price: cost relatively low compared to on-premises; can control costs based on processing needs.
Maintenance: all maintenance handled by Microsoft; no downtime for upgrades; self-healing cluster and hardware.

22 HDInsight 101 What How Why Uses

23 How I’ve Used HDInsight with My Customers: Data Warehouse Extension
Constrained by limits in their Netezza Appliance.
Generate TB per day for their largest customers TB total per day.
Only able to keep about 3 months of raw transactions online.

24 How I’ve Used HDInsight with My Customers: Data Warehouse Extension
SUCCESS! 3 years of transaction data! 150TB+
Diagram label: Raw Data Files

25 How I’ve Used HDInsight with My Customers: Manufacturing Optimization
Fortune 100 auto manufacturer looking to optimize its manufacturing process.
Scrap – Machines pick up parts using a vacuum tool. Except when they don’t. Missing a part pickup costs a lot of money in aggregate.
Cycle Time – Analysis of how much time it takes the machine to start and finish one process on one part.

26 How I’ve Used HDInsight with My Customers: Manufacturing Optimization
SUCCESS! Identified many opportunities to improve processes, with $1,000,000s in savings.
Diagram label: Machine Events

27 HDInsight 101 What How Why Uses Tips from the real world

28 Tip #1: Script Cluster Creation
You may think you have your cluster configured correctly the first time. You don’t.
Script it with PowerShell to make edits and redeployments a breeze.
Added bonus – use Azure Automation to drop the cluster during unused times. The data is still available.

29 Tip #2: Cluster Customizations Need to Be Scripted
Just because you CAN log in to a server and make a change doesn’t mean you should.
Use Script Actions to make your custom configurations during cluster execution. If you don’t, your configurations WILL be lost at some point.

30 Tip #3: Use Linux
HDInsight is available in Windows or Linux. Ignore Windows.
Linux deployment is much more complete. You get Ambari. No *usable* UI with Windows deployment.
Not just my tip. The HDInsight Product Team and Premier Support suggest it as well.

31 Tip #4: Remember HDInsight is Hadoop
It’s not a Microsoft tool. It’s not an RDBMS.
Use it for what you can’t do easily with an RDBMS.
Don’t replace your RDBMS with it. It won’t perform as well.

32 Simplified Hadoop Architecture
Add file to HDFS: files are copied and distributed to multiple nodes.
Diagram labels: Name Node, Data Nodes, Network Transfer
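The "copied and distributed to multiple nodes" idea can be illustrated with a toy sketch. This is not how HDFS actually chooses nodes (the real placement policy is rack-aware); the round-robin rule, the function name `place_blocks`, the node names, and the sizes are all assumptions made up for illustration. It only shows the bookkeeping a name node keeps: which data nodes hold a copy of each block.

```python
def place_blocks(file_size, block_size, data_nodes, replication=3):
    """Toy name-node bookkeeping: map each block index to the nodes holding a copy.

    Assumption: simple round-robin placement, NOT the real HDFS policy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(num_blocks):
        # Start at a different node for each block, then take the next
        # (replication - 1) nodes so every copy lands on a distinct node.
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement


# Hypothetical cluster of four data nodes; sizes are in arbitrary units.
nodes = ["dn1", "dn2", "dn3", "dn4"]
layout = place_blocks(file_size=300, block_size=128, data_nodes=nodes)
for block, holders in layout.items():
    print(block, holders)
```

Because every block lives on several nodes, losing one node still leaves copies elsewhere, which is the basis of the fault tolerance mentioned earlier in the deck.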

33 MapReduce
The basis of how data is retrieved from Hadoop. All (most) other protocols translate to MapReduce.
Input: I would not like them here nor there. I would not like them anywhere.
Mapper 1 output (first sentence): I 1, would 1, not 1, like 1, them 1, here 1, nor 1, there 1
Mapper 1 output (second sentence): I 1, would 1, not 1, like 1, them 1, anywhere 1
Reducer 1 output: I 2, would 2, not 2, like 2, them 2, here 1, nor 1, there 1, anywhere 1
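The word count on this slide can be reproduced with a minimal in-process sketch. It runs on one machine and uses plain Python rather than the Hadoop API; the function names `mapper`, `shuffle`, and `reducer` are illustrative choices, not Hadoop identifiers. Each input sentence stands in for one input split.

```python
import re
from collections import defaultdict


def mapper(sentence):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in re.findall(r"[A-Za-z']+", sentence)]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reducer(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


splits = [
    "I would not like them here nor there.",
    "I would not like them anywhere.",
]
mapped = [pair for split in splits for pair in mapper(split)]
counts = reducer(shuffle(mapped))
print(counts)
```

In a real cluster the mappers run in parallel on the nodes that hold the data, and only the grouped pairs travel over the network to the reducers.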

34 MapReduce vs. Tez
Diagram labels, MapReduce: Mapper, Reducer, HDFS. Diagram labels, Tez: Mapper, Reducer.
Query steps shown for both: Group t1 by a, Group t2 by a, Join t1 t2, Order by.

35 It can get a little confusing…
Atlas, Sentry, Ambari, Zookeeper, Drill, Falcon, Lucene, LLAMA, Tez, Mahout, Cascading, Knox, Ganglia, Oozie, MapReduce, Solr, Crunch, Ranger, Parquet, Kite, Impala, Presto, HDFS, HBase, Flume, Hue, Sqoop, Pig, Hive, Spark, Jupyter, YARN, Slider, Flink, Accumulo, DataFu, Storm, HCatalog, Avro, Kafka, Nagios, Phoenix, Zeppelin

36 Advanced Use Cases
Financial Services: predict trading risk; customer sentiment analysis; fraud detection.
Retail: 360-degree view of the customer; personalized promotions; supply-chain optimization.
Healthcare: real-time patient monitoring; genomic analysis; outbreak identification.
Manufacturing: scrap analysis; failure prediction; quality control.

37 Demo – Twitter Data Analytics
Introduction to Hive
Parsing JSON data
Shaping and querying data
Using BI tools

38 Hive Overview
SQL for Hadoop! PB Scale. If you know SQL, you know Hive!
Hive properties: easier than Map Reduce; schema on read; supports user code; reads multiple formats; HiveQL; ad-hoc querying; SerDes; no OLTP.
Best use: Batch over large data sets.

39 Hive
Known as the data warehouse of Hadoop, Hive is a protocol and metadata layer that allows for the creation of MapReduce code with a SQL-like language called HiveQL.
Diagram labels: Hive Table, Query Submission, Metadata, Partitions, Indexes, Views, UDFs, Usage, MapRed Translation, Execution / Results, HDFS or WASB.

40 Comparing RDBMS and Hive
Structure: Schema On Write (RDBMS), Schema On Read (Hive)
Access: SQL
Indexes: Yes
Updates: Yes (new in 0.14)
Referential Integrity: No
Cost Based Query Optimization
OLTP
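The difference between schema on write and schema on read can be sketched in a few lines. This is a plain-Python analogy, not Hive or RDBMS code; the sample records and the function names `write_with_schema` and `read_with_schema` are made up for illustration. The first function rejects a record at load time if it does not fit the declared columns, while the second keeps the raw lines untouched and applies the column list only when the data is queried.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
RAW_LINES = [
    '{"user": "a", "text": "hello", "retweets": 3}',
    '{"user": "b", "text": "hi"}',
    '{"user": "c", "lang": "en", "retweets": "7"}',
]


def write_with_schema(line, columns):
    """Schema on write: validate at load time and reject records that do not fit."""
    record = json.loads(line)
    missing = [c for c in columns if c not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return {c: record[c] for c in columns}


def read_with_schema(lines, columns):
    """Schema on read: project the columns at query time; absent fields become None."""
    for line in lines:
        record = json.loads(line)
        yield {c: record.get(c) for c in columns}


rows = list(read_with_schema(RAW_LINES, ["user", "retweets"]))
for row in rows:
    print(row)
```

The trade-off is visible in the second record: schema on read accepts it and returns a missing value, whereas `write_with_schema` would have refused to load it.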

41 Q & A
This is an AMA section. Questions are free range. Any topic.

42 Thank you! Josh Fennessy Principal, BlueGranite

43 Business Insights. Delivered.
BlueGranite provides end-to-end business analytics solutions using the Microsoft platform, helping clients gain insights from their data.
Data Management: Enable the organization to store and analyze large volumes of structured and non-structured data with optimized systems that can scale to meet demand.
Business Analytics: Help your team understand your performance and prescribe actions through interactive dashboards, reports and predictive analysis.
Data Science: Capitalize on your data assets using the latest machine learning and predictive techniques to garner new insights faster.
Founded in 1997, BlueGranite partners with Microsoft and HP to deploy analytics platform, data analytics, and predictive solutions.
BlueGranite delivers the modern data platform for business analytics built on SQL Server, Analytics Platform System and Apache Hadoop.
Our clients utilize robust data visualizations to collaborate on data discoveries, integrating analytics into existing business applications.
We have multiple team members with the Microsoft Partner TSP (P-TSP) designation who work out of Microsoft Technology Centers.
Our team’s certifications include SCRUM Master, SQL Business Intelligence and Data Platform, and Microsoft SQL Server MVP, as well as Apache Hadoop Developer and Administrator.
Serving the U.S. with offices in the Midwest, Heartland, Northeast, South Central, and Southeast Districts.

44 Engage with BlueGranite
Strategic Roadmap: Our senior team of architects and consultants will lead you through discovery and planning workshops to design a strategy for analytics across the organization. Our experienced consultants have learned to navigate through complex requirements, system dependencies, and data quality issues that may be facing your organization.
Analytics Solutions: Our team of data scientists and BI developers works directly with your team to design, build, and deploy analytics and reporting solutions using the latest BI technologies. We create robust, automated reporting and analysis tools that help our clients harness the full potential of their data, providing insights your team needs to perform their best.
Training Workshops: Through briefings, discussions, scripted labs and live analysis of your own data, we will teach your team the skills they need to gain insights more quickly and efficiently. Workshop components include both technical and business-oriented content, helping teams become self-reliant with their BI tools.
Managed Services: Expert support for data warehouse, BI and analytics systems. We provide an efficient, reliable, lower-cost alternative to maintain and enhance your portfolio of solutions. Our onsite team responds to requests within 1 hour, and our automated support management system tracks activities and status to ensure you have visibility into our shared success.

