Big Data with Azure where to begin?

1 Big Data with Azure where to begin?
Concepts and best practices
25th February 2017, Pordenone, Italy
Satya SK Jayanty, Principal Architect & Managing Consultant

2 Organizers

3 Speaking Engagements

4 Authored: http://www.manning.com/delaney/

5 Agenda… what agenda? No agenda!
You like small data, big data, all data! That’s why you are here today.

6 What differentiates today’s thriving organizations?
System Center Marketing 9/10/2018
Data.
Key Points: Data is currency in the twenty-first century. Companies that take advantage of data opportunities have the potential to outperform those that do not.
Talk Track: What asset is most leveraged by today’s thriving companies? Data. We believe data will be a key differentiator for businesses today and in the future. You constantly hear in the news about new ways in which businesses are using data as a competitive advantage. You hear how people in those organizations are making fast, informed decisions like never before. So the question is, what are these thriving companies doing with data?
Data in all forms and sizes is being generated faster than ever before. Capture and combine it for new insights and better, faster decisions.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

7 Strategic opportunity with Big Data…
Mobile, Big data, Cloud, Social.
Business: Customer growth, Embrace new models, Increased productivity, Real-time insights.
IT: How do you use technology innovation to architect business innovation?
Why this slide: You are trying to set up the struggle between IT and the business: you have to run a huge portfolio of apps, the business always wants more apps, but you are struggling just to keep running what you have. In part this also cements that you are an expert and that you understand their challenges.
Key Points: Set up that CHANGE is constant: you are being asked to do more with less. Business: pressure to innovate. IT: the challenge of just operating IT takes most of the time, budget, and resources.
Transition to next slide: How will you put yourself in a position to rapidly deploy new technology to drive business innovation?

8 Be prepared to blow your mind!?!

9 Big Data Eco-system Copyright: IBM

10 Big Data Components with Hadoop

11 Big Data Mountain

12 Handling everything from traditional data to big data?
Data volume axis: Megabytes, Gigabytes, Terabytes, Petabytes. Data complexity axis: Variety and Velocity.
ERP/CRM: Payables, Payroll, Inventory, Contacts, Deal Tracking, Sales Pipeline
Web 2.0: Web Logs, Digital Marketing, Search Marketing, Recommendations, Advertising, Mobile, Collaboration, eCommerce
Big Data: Log files, Spatial and GPS coordinates, Data market feeds, eGov feeds, Weather, Text/image, Click stream, Wikis/blogs, Sensors, RFID, Devices, Social sentiment, Audio/video

13 Data Warehouse traditional approach
Source systems (OLTP, ERP, CRM, LOB) → ETL (with staging) → Data warehouse → BI & analytics (Dashboards, Reporting)

14 Peak points of traditional approach
1. Increasing data volumes: 50x data growth; 1 trillion web pages; 40 ZB digital universe (2020).
Source systems (OLTP, ERP, CRM, LOB) → ETL (with staging) → Data warehouse → BI & analytics (Dashboards, Reporting)

15 Breaking points of traditional approach
1. Increasing data volumes; 2. Real-time data
Real-time data: 204M emails sent every minute; 231B US e-commerce in 2012; 340M tweets sent every day.
Source systems (OLTP, ERP, CRM, LOB) → ETL (with staging) → Data warehouse → BI & analytics (Dashboards, Reporting)

16 Added points of traditional approach
1. Increasing data volumes; 2. Real-time data; 3. New data types
New data types: 15x machine-generated data by 2020; 2.4M pieces of Facebook content per minute; 1.3M hours on Skype per hour.
Components: Source systems (OLTP, ERP, CRM, LOB), New data (Devices, Web, Sensors, Social), ETL, Staging, Data warehouse, BI & analytics (Dashboards, Reporting)

17 Added points of traditional approach
1. Increasing data volumes; 2. Real-time data; 3. New data types; 4. Cloud-born data
Cloud-born data: $100B spend on cloud; 40% of CRM sold as SaaS; 50% of large organizations have hybrid cloud by 2017.
Components: Source systems (OLTP, ERP, CRM, LOB), New data (Devices, Web, Sensors, Social), ETL, Staging, Data warehouse, BI & analytics (Dashboards, Reporting)

18 Evolving Approaches to Analytics
Traditional (Extract, Transform, Load): OLTP, ERP, LOB → ETL tool (SSIS, etc.) → EDW (SQL Server, Teradata, etc.) → Data marts → BI tools, Dashboards. Original data becomes transformed data before it is loaded.
Big data (Ingest, then Transform & Load): Social, Devices, Apps, Sensors, Web, Streaming data → Ingest (EL) of the original data → Data lake(s) on scale-out storage & compute (HDFS, Blob Storage, etc.) → Transform & Load

19 Introducing Big Data
“Big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.” (Wikipedia)
Enormous amounts of data: online behavior of social networking users; samples of medical ailments; purchasing habits of grocery shoppers; crime statistics of cities; the “internet of things” (IoT); 24/7 out-patient monitoring; real-time telemetric devices.
Device explosion: > 5.5 billion (> 70% of global population)
Social networks: > 2 billion users
Sensor networks: > 10 billion
Cheap storage: $100 gets you 3 million times more storage in 30 years
Ubiquitous connection: web traffic of 130 exabytes (10^18 bytes) in 2010 and 1.6 zettabytes (10^21 bytes) in 2015
Inexpensive computing: 10 MIPS/$ in 1980; 10M MIPS/$ in 2005
90% of the data in the world has been created in the last 2 years.
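The growth figures on this slide are easier to compare as annual rates. A minimal sketch, using only the slide's own numbers, that derives the implied compound annual growth rate (CAGR) and the storage doubling time; the helper names `cagr` and `doubling_time` are hypothetical.

```python
import math


def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


def doubling_time(growth_factor: float, years: float) -> float:
    """Years needed to double, given a total `growth_factor` reached over `years`."""
    return years * math.log(2) / math.log(growth_factor)


# Web traffic: 130 exabytes (2010) -> 1.6 zettabytes = 1600 exabytes (2015)
traffic_rate = cagr(130, 1600, 5)
# Computing: 10 MIPS/$ (1980) -> 10M MIPS/$ (2005)
mips_rate = cagr(10, 10_000_000, 25)
# Storage: $100 buys ~3 million times more storage after 30 years
storage_doubling = doubling_time(3_000_000, 30)

print(f"Web traffic CAGR: {traffic_rate:.0%}")
print(f"MIPS per dollar CAGR: {mips_rate:.0%}")
print(f"Storage per dollar doubles every {storage_doubling:.1f} years")
```

Worked through, the slide implies roughly 65% a year for web traffic, about 74% a year for MIPS per dollar, and storage per dollar doubling about every 1.4 years.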

20 5 V’s

21 Cloud Computing Patterns
On and Off (e.g. start/end of semester): on-and-off workloads (e.g. batch jobs); over-provisioned capacity is wasted; time to market can be cumbersome.
Growing Fast (e.g. research project): successful services need to grow/scale; keeping up with growth is a big IT challenge; you cannot provision hardware fast enough.
Unpredictable Bursting (e.g. web demand): unexpected/unplanned peak in demand; a sudden spike impacts performance; you can’t over-provision for extreme cases.
Predictable Bursting (e.g. registration): services with micro-seasonality trends; peaks due to periodic increased demand; IT complexity and wasted capacity.
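To make "over-provisioned capacity is wasted" concrete, the sketch below compares idle node-hours when capacity is fixed at peak demand versus when it tracks demand hour by hour. The demand series and the `min_nodes` floor are hypothetical, and the elastic case idealizes away cluster spin-up time.

```python
from typing import List


def fixed_waste(demand: List[int]) -> int:
    """Idle node-hours when capacity is provisioned once, at peak demand."""
    peak = max(demand)
    return sum(peak - d for d in demand)


def elastic_waste(demand: List[int], min_nodes: int = 0) -> int:
    """Idle node-hours when capacity follows demand each hour, never below min_nodes.

    Idealized: assumes nodes can be added or removed instantly.
    """
    return sum(max(d, min_nodes) - d for d in demand)


# Hypothetical hourly node demand for an "on and off" batch workload
on_off = [0, 0, 0, 20, 20, 20, 0, 0]
# Hypothetical "unpredictable bursting" web workload
bursty = [4, 4, 5, 4, 40, 6, 4, 4]

for name, demand in [("on/off", on_off), ("bursty", bursty)]:
    print(name, "fixed:", fixed_waste(demand), "elastic:", elastic_waste(demand, min_nodes=1))
```

In this toy data the fixed cluster idles for 100 and 249 node-hours respectively, while the elastic one idles only where the one-node floor exceeds demand.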

22 Application building blocks
Building blocks: Big data, Database, Storage, Traffic, Caching, Messaging, Identity, Media, CDN, Networking.
Speaking Points: This is a small sampling. We’ve talked about a few of these building-block services. In addition to Database, Storage, Caching, Messaging, and Identity:
Big data: We also have services for managing big data…
Traffic Manager: …
Media Services: … Provides a managed service that allows you to create, manage, and distribute media. You can target any type of device. We’ll provide full analytics on top of it.
CDN: A content delivery network for putting your content closer to end users.
We’ll drill into more details on several of these services later today, and you will see this list grow in the weeks and months ahead.

23 Introducing Apache Hadoop
Apache open source project
Highly scalable distributed file system (HDFS)
Distributed processing on data nodes
Hadoop stores files in a distributed file system: storage and computation are distributed across many servers, and files can be spread out over multiple nodes.
Hadoop can store very large amounts of data: the combined storage resource can grow with demand from a few nodes to thousands of nodes, and it scales out linearly.
Very large files are supported, including those larger than the capacity of a single node.
Data volumes, data velocity, data variety
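How can a file be larger than any single node? HDFS splits it into fixed-size blocks and stores several replicas of each block on different nodes. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); its round-robin placement is a deliberate simplification of HDFS's real rack-aware policy, and `place_blocks` is a hypothetical helper.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128  # Hadoop 2.x default dfs.blocksize
REPLICATION = 3      # HDFS default dfs.replication


def place_blocks(file_size_mb: int, nodes: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Split a file into blocks and assign each block to `replication` distinct nodes.

    Simplified round-robin placement; real HDFS uses a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# A 1000 MB file on a hypothetical 4-node cluster
placement = place_blocks(1000, ["n1", "n2", "n3", "n4"])
for block, replicas in placement.items():
    print(f"block {block}: {replicas}")
```

The 1000 MB file becomes 8 blocks, each held by 3 of the 4 nodes, so no node needs to hold the whole file and losing one node loses no data.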

24 Industry use cases of Hadoop
Financial services: New account risk screens; Fraud prevention; Trading risk; Maximize deposit spread; Insurance underwriting; Accelerate loan processing
Retail: 360° view of the customer; Analyze brand sentiment; Localized, personalized promotions; Website optimization; Optimal store layout
Telecom: Call detail records (CDRs); Infrastructure investment; Next product to buy (NPTB); Real-time bandwidth allocation; New product development
Manufacturing: Supplier consolidation; Supply chain and logistics; Assembly line quality assurance; Proactive maintenance; Crowd source quality assurance
Healthcare: Genomic data for medical trials; Monitor patient vitals; Reduce re-admittance rates; Store medical research data; Recruit cohorts for pharmaceutical trials
Utilities, oil and gas: Smart meter stream analysis; Slow oil well decline curves; Optimize lease bidding; Compliance reporting; Proactive equipment repair; Seismic image processing
Public sector: Analyze public sentiment; Protect critical networks; Prevent fraud and waste; Crowd source reporting for repairs to infrastructure; Fulfill open records requests

25 Introducing Hadoop Comparison to Traditional RDBMS
Aspect | Traditional RDBMS | Hadoop
Data size | Gigabytes (Terabytes) | Petabytes (even Exabytes)
Access | Interactive and batch | Batch
Updates | Read/write many times | Write once, read many times
Structure | Static schema | Dynamic schema
Integrity | High (ACID) | Low
Scaling | Nonlinear | Linear
DBA ratio | 1:40 | 1:3000

26 Data variety: Hadoop stores files (non-relational store)
Files can hold a variety of semi-structured or unstructured data. Previously, these files may not have been seen as providing value or insights; today, new business questions and insights are being uncovered through data science.
Sentiment: Understand how your customers feel about your brand and products, right now.
Clickstream: Capture and analyze website visitors’ data trails and optimize your website.
Sensors: Discover patterns in data streaming automatically from remote sensors and machines.
Geographic: Analyze location-based data to manage operations where they occur.
Server logs: Research logs to diagnose process failures and prevent security breaches.
Unstructured: Understand patterns in files across millions of web pages, emails, and documents.
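Storing raw files and interpreting them later is often called "schema on read", the flip side of the RDBMS "static schema" row in the comparison above. A plain-Python sketch (not a Hadoop API) with hypothetical clickstream records: each record may carry different fields, the schema is chosen only at query time, and malformed lines are skipped instead of being rejected at load.

```python
import json
from collections import Counter
from typing import Iterable, List, Optional

# Hypothetical raw clickstream lines; records do not share a fixed schema
RAW_LINES = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "page": "/product/42", "ts": 2, "referrer": "search"}',
    'not valid json',
    '{"user": "u1", "page": "/checkout", "ts": 3, "device": "mobile"}',
]


def read_with_schema(lines: Iterable[str], fields: List[str]) -> List[dict]:
    """Apply a schema at read time: keep requested fields, fill missing ones with None."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of failing the whole load
        rows.append({f: record.get(f) for f in fields})
    return rows


rows = read_with_schema(RAW_LINES, ["user", "page", "referrer"])
views_per_user: Counter = Counter(row["user"] for row in rows)
print(rows)
print(views_per_user)
```

A different question could reread the same raw lines with a different field list (for example `device`) without any reload or schema change.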

27 Data velocity: Hadoop can ingest live data streams and process them in near real time
Hadoop can act as scalable event stream ingestion.
Hadoop can do near-real-time in-stream processing.
Flow: Devices and applications send data input over HTTP → Event broker (incoming) → Stream processing → Outgoing results to applications
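A typical in-stream computation is counting events per source in fixed time windows. The sketch below shows a tumbling-window count over a hypothetical list of `(timestamp, device_id)` events; it is a batch illustration of the windowing logic only, and a real stream processor would also handle out-of-order and late events and keep state across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per device in fixed, non-overlapping time windows.

    Each event is (timestamp_seconds, device_id); keys are (window_start, device_id).
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, device in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, device)] += 1
    return dict(counts)


# Hypothetical sensor events as they might arrive from an event broker
events = [(0.5, "sensor-a"), (3.2, "sensor-b"), (4.9, "sensor-a"),
          (5.1, "sensor-a"), (9.8, "sensor-b")]
print(tumbling_window_counts(events, window_seconds=5))
```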

28 Hadoop is a platform with portfolio of projects
Data access (on YARN, the data operating system): Batch: MapReduce; Script: Pig; SQL: Hive/Tez, HCatalog; NoSQL: HBase, Accumulo; Stream: Storm; Search: Solr; Others: Spark, in-memory, ISV engines
Data management: HDFS (Hadoop Distributed File System)
Governance and integration: data workflow, lifecycle and governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
Security: authentication, authorization, accounting, data protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
Operations: provision, manage, and monitor (Ambari, ZooKeeper); scheduling (Oozie)
Hadoop is governed by the Apache Software Foundation (ASF) and comprises the core services of MapReduce, HDFS, and YARN. In addition to the core, it includes functions across:
Data services, which allow you to manipulate and move data (Hive, HBase, Pig, Flume, Sqoop)
Operational services, which help manage the cluster (Ambari, Falcon, and Oozie)
Key components:
Hadoop Common: utilities that support the other modules
HDFS (Hadoop Distributed File System): high-throughput storage
YARN: job scheduling and cluster resource management
MapReduce: YARN-based system for parallel processing
Spark: compute engine
Pig: data-flow language and execution framework
Oozie: workflow scheduler
Ambari: provisioning, managing and monitoring clusters
Sqoop: bulk data transfer between Hadoop and relational databases
Hadoop is batch-processing-centric, using the MapReduce processing paradigm.
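The MapReduce paradigm mentioned above has three stages: map each input split to key/value pairs, shuffle so that all values for a key meet, then reduce each group. A single-process Python sketch of the classic word count shows the data flow only; on a real cluster each split would be mapped on a different data node, and the framework (not user code) performs the shuffle.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Each string stands in for an input split stored on a different data node
splits = ["big data with Azure", "Azure HDInsight runs Hadoop", "Hadoop processes big data"]
print(word_count(splits))
```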

29 A look at SQL and NoSQL

30 Getting Started with HDInsight
Introducing Azure HDInsight 100% Apache Hadoop Powered by the cloud Immersive insights

31 Position in the cloud: PaaS (at the risk of repeating a slide…)
HDInsight is Platform as a Service (PaaS): it provides infrastructure and capacity for your own varying computing requirements. It is not IaaS, because you do not have to manage configuration or OS-level integrity. It is not SaaS, because the workload varies and is user-specified.

32 A Holistic Big Data Solution from Microsoft Spanning relational and non-relational worlds
Insights: self-service, operational, predictive, mobile, real-time, collaborative
Marketplace: external data and services
Data enrichment: share and govern; discover and recommend; transform and clean
Data management: relational, non-relational, multidimensional, streaming

33 Hadoop on Windows Insights to all users by activating new types of data
Differentiation:
Insights: integrate with Microsoft Business Intelligence
Enterprise ready: choice of deployment on Windows Server + Windows Azure; integrate with Windows components (AD, System Center)
Broader access: easy installation and configuration of Hadoop on Windows; simplified programming with .NET & JavaScript integration; integrate with SQL Server data warehousing
Contributions proposed back to the community distribution

34 HDInsight supports Hive
SQL-like queries on Hadoop data in HDInsight
HDInsight provides an easy-to-use graphical query interface for Hive.
HiveQL is a SQL-like language (a subset of SQL).
Hive structures include well-understood database concepts such as tables, rows, columns, and partitions.
Queries are compiled into MapReduce jobs that are executed on Hadoop.
Dramatic performance gains with Stinger/Tez: Stinger is a Microsoft, Hortonworks and OSS-driven initiative to bring interactive queries to Hive. It brings query execution engine technology from Microsoft SQL Server to Hive, with performance gains of up to 100x, as a Microsoft contribution to the Apache code.
Sample query: Hive 10: 1400 s; HDP 1.3 / Hive 11: 44.3 s (32x speedup); HDP 2.0 (Hadoop 2.0): 35.1 s (40x speedup); HDP 2.1: 15 s (100x).
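The speedup labels can be checked directly against the sample-query timings reported on this slide. A small sketch, where `speedups` is a hypothetical helper and the timings are copied from the slide:

```python
from typing import Dict

# Sample-query timings in seconds, as reported on the slide
TIMINGS: Dict[str, float] = {
    "Hive 10": 1400.0,
    "HDP 1.3 / Hive 11": 44.3,
    "HDP 2.0": 35.1,
    "HDP 2.1": 15.0,
}


def speedups(timings: Dict[str, float], baseline: str = "Hive 10") -> Dict[str, float]:
    """Speedup of each release relative to the baseline query time."""
    base = timings[baseline]
    return {release: base / seconds for release, seconds in timings.items()}


for release, factor in speedups(TIMINGS).items():
    print(f"{release:<18} {TIMINGS[release]:>7.1f} s  {factor:5.1f}x")
```

The 32x and 40x labels match the timings; the HDP 2.1 figure works out to about 93x, which the slide rounds to 100x.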

35 HDInsight supports HBase
NoSQL database on data in HDInsight
Columnar (column-family), NoSQL database
Runs on top of the Hadoop Distributed File System (HDFS)
Provides flexibility in that new columns can be added to column families at any time
Architecture diagram: Name Node, Job Tracker, Data Nodes with Task Trackers, HMaster, Region Servers, coordination
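HBase's flexibility comes from its data model: a row key maps to cells addressed as `family:qualifier`, and each cell keeps timestamped versions. The toy in-memory class below mirrors that layout to show why a new qualifier needs no schema change while column families are fixed up front. It is an illustration only, not the HBase client API, and `ToyColumnFamilyTable` is a hypothetical name.

```python
import time
from collections import defaultdict
from typing import Dict, Iterable, Optional


class ToyColumnFamilyTable:
    """Toy model of HBase's layout: row key -> "family:qualifier" -> {timestamp: value}.

    Column families are declared when the table is created; qualifiers can be added any time.
    """

    def __init__(self, families: Iterable[str]):
        self.families = set(families)
        self.rows: Dict[str, Dict[str, Dict[int, str]]] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, column: str, value: str, ts: Optional[int] = None) -> None:
        family, _, qualifier = column.partition(":")
        if family not in self.families or not qualifier:
            raise KeyError(f"unknown column family or empty qualifier: {column}")
        ts = ts if ts is not None else time.time_ns()
        self.rows[row_key][column][ts] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if the cell does not exist."""
        versions = self.rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]


table = ToyColumnFamilyTable(["profile", "activity"])
table.put("user#1", "profile:name", "Andy", ts=1)
table.put("user#1", "profile:name", "Andrew", ts=2)                # newer version of the same cell
table.put("user#1", "activity:last_login", "2017-02-25", ts=1)     # new qualifier, no schema change
print(table.get("user#1", "profile:name"))
```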

36 HDInsight supports Mahout
Machine learning library: a library of machine learning algorithms to execute on data in HDFS
Algorithms are not dependent on the size of the data and can scale with large datasets
The library includes: collaborative filtering, classification, clustering, dimensionality reduction, topic models
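Collaborative filtering, the first family in Mahout's list, recommends items that are similar to what a user already liked. A tiny in-memory sketch of item-based collaborative filtering with cosine similarity; it illustrates the idea only and is not Mahout's API, and the users, items, and ratings are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]

# Hypothetical user -> {item: rating} data
RATINGS: Ratings = {
    "ann": {"hadoop_book": 5, "spark_book": 4, "storm_book": 4},
    "bob": {"hadoop_book": 4, "spark_book": 5},
    "cara": {"r_book": 5, "spark_book": 1},
    "dan": {"r_book": 4, "hadoop_book": 1},
}


def item_vector(ratings: Ratings, item: str) -> Dict[str, float]:
    """Every user's rating for one item (a column of the user-item matrix)."""
    return {user: items[item] for user, items in ratings.items() if item in items}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings: Ratings, user: str) -> Optional[str]:
    """Score each unrated item by its rating-weighted similarity to the user's rated items."""
    rated = ratings[user]
    all_items = {item for items in ratings.values() for item in items}
    scores = {
        candidate: sum(
            rating * cosine(item_vector(ratings, candidate), item_vector(ratings, item))
            for item, rating in rated.items()
        )
        for candidate in all_items - set(rated)
    }
    return max(scores, key=scores.get) if scores else None


print(recommend(RATINGS, "bob"))
```

Bob is steered towards `storm_book` because the user who rated it also rated his favourite items highly; at scale, the same similarity computation is what a distributed library spreads across the cluster.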

37 HDInsight supports Storm
Stream analytics for near-real-time processing
Consumes millions of real-time events from a scalable event broker (e.g., Apache Kafka, Azure Event Hub)
Performs time-sensitive computation
Outputs to persistent stores, dashboards or devices
Topology building blocks: spouts and bolts
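In Storm, a topology wires spouts (sources that emit tuples) into bolts (steps that consume tuples and emit new ones). Real topologies are typically written in Java and run distributed across a cluster; the Python generators below only mimic the data flow of the classic streaming word count in one process, and every function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    """Spout: the stream source; emits one tuple (here, a sentence) at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt: turns each incoming sentence into outgoing word tuples."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running state and emits an updated count for every word it sees."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# Wire the topology: spout -> split bolt -> count bolt
topology = count_bolt(split_bolt(sentence_spout(["Storm processes streams", "streams never end"])))
for word, running_count in topology:
    print(word, running_count)
```

Because each stage pulls one tuple at a time, results appear as soon as each word arrives rather than after the whole input has been read, which is the property the slide means by time-sensitive computation.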

38 TCO, Deployment & Geo-Redundancy
HDInsight in the cloud bypasses the need for deployment expertise: Hadoop is non-trivial to install and get up and running, and there is an education gap in the IT community regarding Hadoop.
Deployed in minutes: spin up any number of Hadoop nodes on demand; up and running in a few clicks (and within minutes).
Billed by usage: you pay for what you use, and clusters can be deleted when no longer needed.
No additional price for support: Azure support includes Hadoop support, so what usually costs thousands of dollars per node is included.
Auto geo-redundant: HDInsight automatically geo-replicates data, and data only replicates within the same geo-political boundary (e.g., country, region).

39 Connect cloud Hadoop with on-premises
Hybrid = on-premises + cloud
Cloud: HDInsight
On-premises: Hadoop software (Hortonworks) and appliances (APS)
Hortonworks on-premises Hadoop moves data to HDInsight.
Analytics Platform System (APS) can query HDInsight and join the results with on-premises data.

40 Hybrid compatibility
Microsoft is the only vendor with both enterprise on-premises and cloud big data offerings: Hadoop on premises and HDInsight in Azure.
Example: a record such as Name=Andy is converted on premises to a private name identifier (Pnid=123456); the data is transmitted to Windows Azure over a VPN, and a cloud service assembles it into blobs.
HDInsight works with anonymised data only, so data at rest in the cloud carries no directly identifying fields, minimising the risk of a sensitive data leak.
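The slide does not say how the private name identifier is produced. One common approach, assumed here, is a keyed hash (HMAC-SHA256) computed on premises with a secret that never leaves the local network, so the cloud only ever sees the identifier. `SECRET_KEY`, `pseudonymize` and `prepare_record` are hypothetical, and the hex identifier differs in form from the slide's numeric `Pnid=123456`. Strictly this is pseudonymization: whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac

# Hypothetical secret held on premises only; it is never sent to the cloud
SECRET_KEY = b"replace-with-a-key-kept-on-premises"


def pseudonymize(name: str, key: bytes = SECRET_KEY) -> str:
    """Replace a name with a stable private identifier (HMAC-SHA256, truncated to 16 hex chars)."""
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def prepare_record(record: dict) -> dict:
    """Copy of a record that is safe to upload: the Name field is replaced by a Pnid field."""
    cleaned = {k: v for k, v in record.items() if k != "Name"}
    cleaned["Pnid"] = pseudonymize(record["Name"])
    return cleaned


local_record = {"Name": "Andy", "Purchases": 3}
print(prepare_record(local_record))
```

Because the same name always yields the same identifier, HDInsight jobs can still join and aggregate by customer without ever seeing the name itself.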

41 Just as Big Data is more than Hadoop,
Windows Azure is more than HDInsight: HDP, Spark, Storm…

42 The overall architectural challenge of Big Data is that just as data can vary, so must architectures.
Batch: Hadoop can operate in O(n) over petabytes of data.
Interactive: Drill, Stinger and Tez bring much quicker querying.
Real time: Storm allows enormously scalable throughput.

43 Bringing Hadoop to a billion people
Excel as the BI tool for everyone; Power BI for collaboration and new experiences
1 billion Microsoft Office users; Office 365 is our fastest-growing commercial product ever
Connect to HDInsight, analyze, visualize, share, ask, access: scalable, manageable, trusted

44 Introducing the zoo: HDInsight/Hadoop ecosystem
Legend: Red = Core Hadoop; Blue = Data processing; Green = Packages; Purple = Microsoft integration points and value adds; Orange = Data movement
Components: Distributed Storage (HDFS); Distributed Processing (MapReduce); Metadata (HCatalog); Query (Hive); Scripting (Pig); NoSQL Database (HBase); Pipeline / workflow (Oozie); Event Pipeline (Flume); Machine Learning (Mahout); Stats processing (RHadoop); Graph (Pegasus); Data Integration (ODBC / SQOOP / REST); Relational (SQL Server); APS (formerly PDW) with PolyBase; Event Driven Processing; Business Intelligence (Excel, Power View, SSAS); PowerShell; C#, F#, .NET; Monitoring & Deployment (System Center); World's Data (Azure Data Marketplace); Azure Storage Vault (ASV); Active Directory (Security)
1) The core of Hadoop: distributed storage and compute.
2) A layer of abstraction between you and native Java MapReduce jobs, enabling metadata cataloging with HCatalog, SQL-like queries with Hive, and scripting with Pig.
3) Oozie enables workflows, Flume enables the aggregation of log files, and the NoSQL database HBase enables analytics.
4) Predictive analytics with machine learning using Mahout, highly customizable analytic “R” language queries using RHadoop, and graph mining with Pegasus, which enables you to map relationships in social networking data.
5) Tie it all together with the data integration capabilities of ODBC, SQOOP and the REST services API.
6) Microsoft differentiates its Hadoop offering through integration with PDW (PolyBase).
7) Active Directory and System Center offer tight authentication and authorization (A&A) integration, along with the automation needed to provision and de-provision clusters at will.

45 Programming HDInsight
Since HDInsight is a service-based implementation, you get immediate access to the tools you need to program against HDInsight/Hadoop:
Existing ecosystem: Hive, Pig, Sqoop, Mahout, Cascading, Scalding, Scoobi, Pegasus, etc.
.NET: C#, F# Map/Reduce, LINQ to Hive, .NET management clients, etc.
JavaScript: JavaScript Map/Reduce, browser-hosted console, Node.js management clients
DevOps/IT Pros: PowerShell, cross-platform CLI tools

46 Other Microsoft data science tools
HDInsight: Hadoop in the cloud, plus Storm (real-time analytics), HBase (NoSQL) and Mahout (ML)
Azure Stream Analytics: streaming data originating in the cloud; based on HDInsight/Hadoop
Also useful:
Power BI: Power Query, Power View, and dashboards
Excel
Azure Data Factory (ETL in the cloud)
Analytics Platform System (SQL Server on steroids + Hadoop + hardware)

47 Azure ML: machine learning platform in the Azure cloud
Pre-process data; engineer features; modelling ≣ machine learning ≣ data mining; run R; run Python
Experiments (modelling) + web services (deployment)
Free tier: limited data size, experiment duration, scalability, speed
Paid tier: relatively inexpensive, can be free

48 Challenges with implementing Hadoop
Up-front hardware costs, capacity planning, Hadoop expertise
Barriers to Hadoop: skills gap, weak business support, security concerns, data management hurdles, tool deficiencies, containing costs
Big data on-premises concerns include:
• Hardware costs
• IT and operational costs in setting up a machine cluster and supporting it
• Cost of personnel to work on the ecosystem

49 Why Hadoop in the cloud? Benefits of cloud
No hardware costs ($0)
Unlimited elastic scale
Pay only for what you need
Deployed in minutes
Auto geo-redundancy

50 Data Insights Conversation
The Microsoft data platform: Data
Capture + manage: Relational, Non-relational, NoSQL, Internal & external, Streaming
Transform + analyze: Orchestration, Machine learning, Modeling, Information management, Complex event processing
Visualize + decide: Mobile, Reports, Natural language query, Dashboards, Applications

51 Cortana Analytics Suite Transform data into intelligent action
Build 2015
Cortana Analytics Suite: transform data into intelligent action
Information Management: Azure Data Factory, Data Catalog, Event Hub
Big Data Stores: Azure Data Lake Store, SQL Data Warehouse
Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop and Spark), Stream Analytics, Azure Data Lake Analytics service
Dashboards and Visualizations: Power BI
Intelligence: Personal digital assistant (Cortana); Perceptual intelligence (face, vision, speech, text)
Action: People and automated systems, via business apps, custom apps, sensors and devices
Business scenarios: recommendations, customer churn, forecasting, etc.
Flow: Data → Intelligence → Action
Cortana Analytics Suite delivers an end-to-end platform with an integrated and comprehensive set of tools and services to help you build intelligent applications that let you easily take advantage of advanced analytics.
First, Cortana Analytics Suite provides services to bring data in so that you can analyze it. It provides information management capabilities like Azure Data Factory, so that you can pull data from any source (relational databases like SQL, or non-relational ones like your Hadoop cluster) in an automated and scheduled way, while performing the necessary data transforms (like setting certain data columns as dates vs. currency, etc.). Think ETL (Extract, Transform, Load) in the cloud. Event Hub does the same for IoT-type ingestion of data that streams in from lots of endpoints.
The data brought in can then be persisted in flexible big data storage services like Data Lake and Azure SQL DW. You can then use a wide range of analytics services, from Azure ML to Azure HDInsight to Azure Stream Analytics, to analyze the data stored there. This means you can create analytics services and models specific to your business need (say, real-time demand forecasting).
The resulting analytics services and models can then be surfaced as interactive dashboards and visualizations via Power BI. The same services and models can also be integrated into various UIs (web apps, mobile apps, or rich client apps), as well as with Cortana, so end users can interact with them naturally via speech, etc., and can be proactively notified by Cortana if the analytics model finds a new anomaly (such as unusual growth in certain product purchases, in the real-time demand forecasting example above) or anything else that deserves the attention of business users.

52 Azure Data Factory: a managed cloud service for building and operating data pipelines, part of the Cortana Analytics Suite

53 PolyBase: queries across RDBMS and Hadoop
PolyBase provides a scalable, T-SQL-compatible query processing framework for combining data from both universes. Access any data.

54 Agnostic architecture
PolyBase is agnostic: no vendor lock-in.
PolyBase supports Hadoop on Linux and Windows.
PolyBase integrates with the cloud.
PolyBase supports HDInsight in APS as well as external Hadoop clusters.

55 PolyBase builds the bridge
Just-in-time data integration across relational and non-relational data
High-performance parallel architecture: fast, simple data loading
Best of both worlds: uses computational power at the source for both relational data and Hadoop
Opportunity for new types of analysis
Uses existing analytical skills: familiar SQL semantics and behaviour
Query with familiar tools: SSDT, Power BI
PolyBase = run-time integration

56 PolyBase
User perspective: External Table, External Data Source, External File Format
Systems perspective: PDW Engine Service, PDW Bridge

57 What is R?
Tool for statistical computing; open source implementation; extensible via packages; talented community of contributors
In-memory analytics; big data analytics; high-accuracy ML classifiers
Top tool for machine learning; industry standard for computational mining
Amazing data-visualization capabilities

58 What is R?
A language and interpreter with a poor IDE. 5000+ packages of statistical software. Better IDE: RStudio; Rattle makes it even easier. Open source, free, multiplatform. Core R: the purest version. Revolution Analytics: parallelism and performance. Azure ML: built in.

59 Why is R famous?
R plotting: box plot, bar plot, histogram, contour, dot plot, mosaic, scatter, Latticist.

60 Revolution R Enterprise and SQL
Big data analytics platform based on open source R. High-performance, scalable, full-featured: statistical and machine-learning algorithms are performant, scalable, and distributable. Write once, deploy anywhere: scripts and models can be executed on a variety of platforms, including non-Microsoft ones (Hadoop, Teradata in-database). Integration with the R ecosystem: analytic algorithms are accessed via R functions with syntax familiar to R users, and arbitrary R functions and packages can be used alongside them. Advanced analytics.

61 SQL Server 2016 R integration scenario
Exploration: use RRE (Revolution R Enterprise) from an R IDE to analyze large datasets and build predictive and embedded models, with the compute happening on the SQL Server machine (SQL Server compute context).
Operationalization: developers can operationalize an R script or model over SQL Server data by using T-SQL constructs, and DBAs can manage resources, secure, and govern R runtime execution in SQL Server.
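In SQL Server 2016 R Services, the T-SQL construct used to run R is the `sp_execute_external_script` stored procedure. It requires the `external scripts enabled` option to be turned on with `sp_configure`. By default, the input query's rows reach R as `InputDataSet`, and R returns results through `OutputDataSet`. The Python sketch below only builds the statement text. The table, column and helper names are hypothetical. Embedded single quotes are doubled so that the script and query survive as T-SQL `N'...'` literals.

```python
def n_literal(text: str) -> str:
    """Wrap text as a T-SQL Unicode literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def r_exec_statement(r_script: str, input_query: str, result_columns: str) -> str:
    """Build an EXEC sp_execute_external_script call that runs an R script.

    input_query feeds InputDataSet; the script must assign OutputDataSet,
    whose shape is declared by result_columns in WITH RESULT SETS.
    """
    return (
        "EXEC sp_execute_external_script\n"
        "    @language = N'R',\n"
        f"    @script = {n_literal(r_script)},\n"
        f"    @input_data_1 = {n_literal(input_query)}\n"
        f"WITH RESULT SETS (({result_columns}));"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(r_exec_statement(
        "OutputDataSet <- data.frame(avg_reading = mean(InputDataSet$reading))",
        "SELECT reading FROM dbo.SensorReadings",
        "avg_reading FLOAT",
    ))
```

Because the R code travels inside an ordinary T-SQL statement, a DBA can run it from a stored procedure or a job and govern it like any other workload.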

62 R script library in Microsoft Azure Marketplace
Microsoft Azure Machine Learning Marketplace: new R scripts and extensibility.
R integration built into SQL Server (diagram): a T-SQL interface launches an external process that runs the R analytic library against relational data. The data scientist interacts directly with the data; the data developer or DBA manages data and analytics together.
Benefits: faster deployment of ML models; faster performance (compute moves close to the data); improved scalability.
Near-database analytic scenarios: fraud detection, customer-churn analysis, product recommendations.
Example solutions: fraud detection, sales forecasting, warehouse efficiency, predictive maintenance.

63 Machine learning tools
Open source: R (considered the best fit), Python, Monte Carlo Machine Learning Library, H2O, Weka, Octave-Forge, Apache Mahout.
Commercial: Microsoft Azure Machine Learning, SAS Enterprise Miner, IBM SPSS Modeler, RapidMiner, MATLAB, Oracle Data Mining.

64 Why should I use Microsoft Azure?
Heterogeneity: languages, CMS, devices, databases, operating systems. Rich services: IaaS, PaaS, SaaS. Integrate with on-premises: extend your network to the cloud with a site-to-site VPN; manage private and public clouds with App Controller. Lower your risk: highly available and redundant; data protected by 3 local and 3 geo-replicated copies; high-tech security; 99.95% monthly SLA.
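A 99.95% monthly SLA is easier to reason about as a downtime budget. The sketch below is plain arithmetic and assumes a 30-day month. Each Azure service's SLA document defines how uptime is actually measured, and this sketch does not model that.

```python
def allowed_downtime_minutes(sla_percent: float, days_in_month: int = 30) -> float:
    """Maximum downtime per month that still meets an uptime SLA percentage."""
    total_minutes = days_in_month * 24 * 60          # 43,200 minutes in 30 days
    return total_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    print(f"{allowed_downtime_minutes(99.95):.1f}")      # 21.6 (30-day month)
    print(f"{allowed_downtime_minutes(99.95, 31):.2f}")  # 22.32 (31-day month)
```

So 99.95% leaves roughly 21.6 minutes of downtime per 30-day month.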

65 Scaling
Scale when needed: automatic or manual scaling.
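Azure autoscale is configured with minimum, maximum and default instance counts plus metric-based rules, such as adding an instance when average CPU exceeds a threshold. The Python sketch below shows a simple threshold rule of that kind. `HypotheticalScaleRule`, its thresholds and its bounds are assumptions for illustration, not Azure defaults or an Azure API. Real rules also evaluate metrics over a time window and apply a cool-down period, and this sketch omits both.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalScaleRule:
    """Illustrative rule; all values are assumptions, not Azure defaults."""
    min_instances: int = 2
    max_instances: int = 10
    scale_out_cpu: float = 70.0   # average CPU % above which we add instances
    scale_in_cpu: float = 30.0    # average CPU % below which we remove instances
    step: int = 1


def clamp(count: int, rule: HypotheticalScaleRule) -> int:
    # Keep any instance count inside the configured bounds.
    return max(rule.min_instances, min(rule.max_instances, count))


def next_instance_count(current: int, avg_cpu: float, rule: HypotheticalScaleRule) -> int:
    """Automatic scaling: apply the threshold rule to the observed CPU load."""
    if avg_cpu > rule.scale_out_cpu:
        target = current + rule.step
    elif avg_cpu < rule.scale_in_cpu:
        target = current - rule.step
    else:
        target = current
    return clamp(target, rule)


def manual_instance_count(requested: int, rule: HypotheticalScaleRule) -> int:
    """Manual scaling: an operator picks a count, still bounded by the rule."""
    return clamp(requested, rule)


if __name__ == "__main__":
    rule = HypotheticalScaleRule()
    print(next_instance_count(2, 85.0, rule))   # 3: scale out under load
    print(next_instance_count(2, 10.0, rule))   # 2: cannot go below the minimum
    print(manual_instance_count(25, rule))      # 10: capped at the maximum
```

Keeping the scale-out and scale-in thresholds well apart helps prevent the instance count from flapping when load hovers near a single threshold.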

66 Azure: a bird's-eye view
Platform services
Compute: Cloud Services, Batch, RemoteApp, Service Fabric
Web and mobile: Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs
Data: SQL Database, DocumentDB, Redis Cache, Azure Search, Storage Tables, SQL Data Warehouse
Security, management and hybrid operations: Portal, Azure Active Directory, Health Monitoring, AD Privileged Identity Management, Azure AD B2C, Domain Services, Multi-Factor Authentication, Key Vault, Scheduler, Automation, Backup, Operational Analytics, Import/Export, Azure Site Recovery, StorSimple, Store/Marketplace, VM Image Gallery & VM Depot
Integration: BizTalk Services, Hybrid Connections, Service Bus, Storage Queues
Analytics and IoT: HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Data Lake, IoT Hub, Data Catalog
Developer services: Visual Studio, App Insights, Azure SDK, VS Online
Media and CDN: Content Delivery Network (CDN), Media Services
Infrastructure services
Compute: Virtual Machines, Container Service
Storage: BLOB Storage, Azure Files, Premium Storage
Networking: Virtual Network, Load Balancer, DNS, ExpressRoute, Traffic Manager, VPN Gateway, App Gateway
Datacenter infrastructure: 24 regions, 22 online
Speaker notes. Why this slide: it shows we have a very broad platform. It is about both IaaS and PaaS, and these work together. It shows that we continue to lead in world-class IT capabilities and that there's really nothing missing. Key points: we have already seen how the Azure platform is IaaS + PaaS, but I want you to understand that this is a huge number of capabilities, IT building blocks if you will. You can provision every one of these blocks at any time, self-service, anywhere in the world, 24x7. You pay for what you use, you can get more or less at any time, and you can fully automate everything. Don't spend too much time on this slide; you are going to demo (aren't you?). Don't go through each block. Transition to next slide: make the build go backwards to show just IaaS, and then go to the demo to show it.

67

68 Summary
Big Data refers to data sets so large and/or complex that they become awkward to work with in conventional ways. Hadoop and HDInsight are Microsoft's answer to Big Data: Hadoop can store petabytes of data reliably and execute huge distributed computations. However, Big Data query results often involve significant latency, so preload data in advance of business users' queries. Power BI includes authoring add-ins to query, analyze and visualize data sourced from Azure HDInsight. Big Data is just another data source!
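Hadoop's "huge distributed computations" follow the MapReduce model. A map step emits key/value pairs, a shuffle groups them by key, and a reduce step aggregates each group. The Python sketch below runs the classic word count in a single process. It is not Hadoop code. On a real cluster, map tasks run in parallel on the nodes holding the input splits, the shuffle moves data across the network, and reduce tasks run in parallel per key partition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Aggregate the values for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big", "data lake"]))  # {'big': 2, 'data': 2, 'lake': 1}
```

Each map call and each reduce call is independent of the others. That independence is what lets Hadoop spread the work over many machines, and it also explains some of the batch latency the summary mentions.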

69 Resources
Microsoft Big Data web site
Azure HDInsight web site
Hortonworks tutorials: numerous tutorials are available to learn about Big Data by using the Hortonworks Sandbox.
Follow me @SQLMaster

70 Sponsors

71 #sqlsat589 Q&A Thanks!

