Published by Egbert Dennis. Modified over 9 years ago.
1
Microsoft Big Data Essentials Module 1 - Introduction to Big Data
Server & Tools Business 4/19/2017
Saptak Sen, Microsoft
Bill Ramos, Advaiya
My name is Saptak Sen, and welcome to this introduction session for the Microsoft Big Data Boot Camp. This session sets the stage for the three days of training. Each session follows a similar format: I’ll introduce the topic and then provide a set of demonstrations showing how the technology works. Let’s get started.
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
In this introduction session, I’ll first give you a broad overview of the Microsoft Cloud OS data platform story and walk through the three pillars of the upcoming SQL Server 2014 release, along with the new features that relate to the Big Data story. Next, I’ll introduce the Lambda Architecture, a community-driven architecture that provides a framework for how various Big Data components work together in specific scenarios. I’ll also show how Microsoft Big Data platform components such as HDInsight fit into the Lambda Architecture. I’ll then go over Windows Azure’s high-level architecture and components and give an overview of the Table and Blob storage components that relate to Big Data solutions. At the end, I’ll demo how to create a Windows Azure storage account and an HDInsight cluster.
3
The Business Imperative
1. Human Fault Tolerance
2. Minimize CapEx
3. Hyper Scale on Demand
4. Low Learning Curve
4
CAP Theorem
C: Consistency
A: Availability
P: Partition Tolerance
5
Big Data Lambda Architecture
Let’s now look at the Big Data Lambda Architecture.
6
Big Data Lambda Architecture
Batch layer
• Stores master dataset
• Compute arbitrary views
Speed layer
• Fast, incremental algorithms
• Batch layer eventually overrides speed layer
Serving layer
• Random access to batch views
• Updated by batch layer
[Diagram labels: Batch Layer, Speed Layer, Serving Layer]
Talk Track: To make sense of how various Big Data technologies fit together, the open source community has developed what is known as the Big Data Lambda Architecture. The Lambda Architecture provides an architectural model that scales and that combines the advantages of long-term batch processing with the freshness of a real-time system, with data updated within seconds. It solves the problem of computing arbitrary functions on arbitrary data in real time by decomposing the problem into three layers: the batch layer, the speed layer, and the serving layer. Let’s take a look at each of the three layers.
Batch layer: Stores the master dataset for your solution, typically in append mode so that it can handle new data coming in. The batch layer is usually a read-only database; no random writes are required. It is horizontally scalable, with unrestrained computation and high latency.
Speed layer: Stream processing and continuous computation. It provides fast, incremental algorithms. The batch layer eventually overrides the speed layer. All the complexity is isolated in the speed layer; if anything goes wrong, it’s auto-corrected. The views are stored in a read-and-write database, which is much more complex than a read-only view. Examples:
• MS SQL Server
• Column Store
• Cassandra
• …
Serving layer: Provides the merged outcome of the data streams coming from the batch layer and the speed layer. This layer queries the batch and real-time views and merges the results. PolyBase is a great fit.
Key Points:
• The Lambda Architecture has three layers.
• The batch layer stores the master dataset.
• The speed layer performs stream processing for the real-time view.
• The serving layer provides the merged outcome of the data streams coming from the batch layer and the speed layer.
References: Big Data Lambda Architecture: Speaker notes from:
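To make the three-layer decomposition concrete, here is a minimal in-memory sketch in Python. It is only an illustration under stated assumptions: the page-view events, the function names, and the use of plain counters are all hypothetical stand-ins, not anything from the boot camp demos. It shows a query merging the batch and real-time views, and the batch layer later overriding the speed layer without changing the answer.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of (timestamp, page) events.
master_dataset = [(1, "home"), (2, "about"), (3, "home")]

# Events that arrived after the last batch run; only the speed layer has seen them.
recent_events = [(4, "home"), (5, "pricing")]


def compute_batch_view(events):
    """Batch layer: recompute the view from scratch over all stored events."""
    return Counter(page for _, page in events)


def update_realtime_view(view, event):
    """Speed layer: fold a single new event into the real-time view."""
    _, page = event
    view[page] += 1
    return view


def query(page, batch_view, realtime_view):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


batch_view = compute_batch_view(master_dataset)
realtime_view = Counter()
for event in recent_events:
    update_realtime_view(realtime_view, event)

before = query("home", batch_view, realtime_view)
print(before)  # 3

# The next batch run absorbs the recent events, so the speed layer's
# contribution for them is discarded ("batch eventually overrides speed").
batch_view = compute_batch_view(master_dataset + recent_events)
realtime_view = Counter()
after = query("home", batch_view, realtime_view)
print(after)  # 3
```

The answer is the same before and after the batch run catches up, which is the property that lets errors in the speed layer be corrected by the next batch recomputation.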
7
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
[Diagram labels: Incoming data streams, Master dataset, Batch views]
Talk Track: The portion of the Lambda Architecture that precomputes the batch views is called the batch layer. The batch layer stores the master copy of the dataset and precomputes batch views on that master dataset. The master dataset can be thought of as a very large list of records. The batch layer needs to be able to do two things to do its job: store an immutable, constantly growing master dataset, and compute arbitrary functions on that dataset. The key word here is arbitrary. If you’re going to precompute views on a dataset, you need to be able to do so for any view and any dataset.
The nice thing about the batch layer is that it’s simple to use. Batch computations are written like single-threaded programs yet automatically parallelize across a cluster of machines. This implicit parallelization lets batch layer computations scale to datasets of any size, so it’s easy to write robust, highly scalable computations on the batch layer. The batch view lets you get the values you need from it very quickly because it’s indexed.
Think of technologies like Hadoop and Pig/Hive for use on the batch layer. Data warehouse database technologies can also be associated with the batch layer.
Key Points:
• The batch layer stores the master dataset and precomputes batch views on it.
• It stores an immutable, constantly growing master dataset and computes arbitrary functions on that dataset.
• Read-only database; no random writes required.
References: Big Data Lambda Architecture: lambda-architecture/
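The idea that batch computations are "written like single-threaded programs yet automatically parallelize" can be sketched in a few lines of Python. This is a toy stand-in for a framework such as Hadoop, not how Hadoop itself is programmed: the partition contents and the function names are hypothetical, and a local thread pool plays the role of the cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical master dataset, already split into partitions as it might be
# across cluster nodes. Records are only ever appended, never updated.
partitions = [
    ["home", "about", "home"],
    ["pricing", "home"],
    ["about"],
]


def map_partition(records):
    """Written like a single-threaded program: count pages in one partition."""
    return Counter(records)


def merge_counts(left, right):
    """Combine two partial results into one."""
    return left + right


def compute_batch_view(parts):
    """Run the per-partition function in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, parts))
    return reduce(merge_counts, partials, Counter())


batch_view = compute_batch_view(partitions)
print(batch_view["home"])  # 3
```

The author of `map_partition` never mentions threads or machines; the surrounding runner decides how the work is spread out, which is the property the talk track attributes to the batch layer.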
8
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
[Diagram labels: Incoming data streams, Process stream, Increment views, Real-time increments, Real-time views]
Talk Track: You can think of the speed layer as similar to the batch layer in that it produces views based on the data it receives. There are some key differences, though. One big difference is that, to achieve the lowest latencies possible, the speed layer doesn’t look at all the new data at once. Instead, it updates the real-time views incrementally as new data arrives rather than recomputing them as the batch layer does. The speed layer typically requires databases that support random reads and random writes. Because these databases support random writes, they are more complex than the databases you use in the serving layer, both in implementation and in operation. Most of the application complexity tends to be isolated in the speed layer. Technologies typically considered for the speed layer include in-memory transaction databases and complex event processing engines.
Key Points:
• Stream processing; continuous computation.
• Transactional.
• Stores a limited window of data, compensating for the last few hours of data.
• All the complexity is isolated in the speed layer. If anything goes wrong, it’s auto-corrected.
• Some algorithms are hard to implement in real time.
References: Big Data Lambda Architecture: lambda-architecture/
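The "limited window" and "increment views" ideas can be illustrated with a small Python class. This is a sketch under assumptions, not a description of any product named in the deck: the class name, the page-view events, and the rule that events arrive in timestamp order are all hypothetical simplifications.

```python
from collections import Counter, deque


class RealtimeView:
    """Keeps incremental counts for events newer than the last batch run."""

    def __init__(self):
        self.events = deque()    # (timestamp, page), assumed to arrive in time order
        self.counts = Counter()  # the real-time view itself

    def add(self, timestamp, page):
        """Update the view in place instead of recomputing it."""
        self.events.append((timestamp, page))
        self.counts[page] += 1

    def expire_up_to(self, batch_timestamp):
        """Drop events the batch layer has now absorbed into its own views."""
        while self.events and self.events[0][0] <= batch_timestamp:
            _, page = self.events.popleft()
            self.counts[page] -= 1
            if self.counts[page] == 0:
                del self.counts[page]


view = RealtimeView()
view.add(10, "home")
view.add(11, "about")
view.add(12, "home")
print(view.counts["home"])   # 2

view.expire_up_to(11)        # batch views now cover everything up to t=11
print(view.counts["home"])   # 1
print(view.counts["about"])  # 0
```

Note that both `add` and `expire_up_to` are random writes against the view, which is why the talk track says speed layer stores are more complex to build and operate than the read-only serving stores.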
9
The Serving Layer
• Queries the batch and real-time views
• Merges the results
[Diagram labels: Batch views, Real-time views, Querying and merging, Output]
Talk Track: Finally, the serving layer indexes the batch view and loads it so that it can be queried efficiently for particular values. The serving layer is typically a specialized distributed database that loads in batch views, makes them queryable, and continuously swaps in new versions of a batch view as the batch layer computes them. A serving layer database requires only batch updates and random reads; most notably, it does not need to support random writes. The serving layer’s job is to query the batch and real-time views and merge the results. Technologies typically associated with the serving layer include online analytical processing databases such as Analysis Services and PowerPivot. The serving layer can also be considered the “last mile” technology for producing usable results for your solutions.
Key Points:
• The serving layer queries the batch and real-time views and merges the results.
References: Big Data Lambda Architecture: lambda-architecture/
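The "batch updates and random reads, but no random writes" contract can be sketched as follows. The `ServingLayer` class and its method names are hypothetical, and a Python dictionary stands in for the indexed, distributed store the talk track describes.

```python
class ServingLayer:
    """Holds one read-only batch view and swaps in new versions wholesale."""

    def __init__(self):
        self._batch_view = {}
        self.version = 0

    def load_batch_view(self, new_view):
        """Batch update: replace the whole view; there are no per-key writes."""
        self._batch_view = dict(new_view)
        self.version += 1

    def get(self, key, realtime_view=None):
        """Random read that merges the batch value with any real-time increment."""
        realtime_view = realtime_view or {}
        return self._batch_view.get(key, 0) + realtime_view.get(key, 0)


serving = ServingLayer()
serving.load_batch_view({"home": 2, "about": 1})
print(serving.get("home", {"home": 1}))  # 3

# A later batch run has absorbed the recent event, so no real-time view is passed.
serving.load_batch_view({"home": 3, "about": 1})
print(serving.get("home"))  # 3
print(serving.version)      # 2
```

Because the only mutation is a whole-view swap, readers always see either the old version or the new one, never a half-updated view.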
10
Microsoft Lambda Architecture Support
Batch Layer: Windows Azure HDInsight; Azure Blob storage; MapReduce, Hive, Pig, Oozie, SSIS
Speed Layer: Federations in Windows Azure SQL Database; Azure tables; Memcached/MongoDB; SQL Server database engine; SQL Server VM: Columnstore indexes; Analysis Services; StreamInsight
Serving Layer: Azure Storage Explorer; Microsoft Excel; Power Query; PowerPivot; Power View; Power Map; Reporting Services; LINQ to Hive; Analysis Services
Talk Track: Microsoft’s data platform stack supports each of the layers in the Big Data Lambda Architecture. For the batch layer, Microsoft provides multiple options for storing and processing batch-oriented data. These include Windows Azure HDInsight, with Azure Blob storage to hold the input data. The SQL Server data warehousing capabilities can also be associated with the batch layer. For processing the data and managing views, Microsoft supports processing of Hadoop data through MapReduce jobs along with Hive, Pig, and Oozie. For data warehousing, you can use traditional SQL views and stored procedures.
For the speed layer, Microsoft supports real-time processing of data through technologies like Federations in Windows Azure SQL Database, Azure tables, Memcached/MongoDB, the SQL Server database engine, and SQL Server VM, along with columnstore indexes, Analysis Services, and StreamInsight.
Finally, for the serving layer, which provides the merged outcome of the data streams coming from the batch layer and the speed layer, you can use tools like PowerPivot, Power View, Power Query, Power Map, Reporting Services, LINQ to Hive, and Analysis Services.
Key Points:
• Microsoft provides a complete BI solution that can be aligned with all three layers of the Lambda Architecture.
References: Big Data Lambda Architecture: lambda-architecture/
11
Yahoo!
[Diagram labels: Batch Layer, Speed Layer, Serving Layer; Apache Hadoop; Hadoop data; SQL Server Connector (Hadoop Hive ODBC); staging database (third-party database); SQL Server Analysis Services (SSAS cube); Microsoft Excel and PowerPivot for Excel; other BI tools and custom applications]
Talk Track: Using SQL Server 2008 R2, Yahoo! enhanced its Targeting, Analytics and Optimization (TAO) infrastructure, a powerful, scalable advertising analytics tool. TAO now takes data from a Hadoop cluster into a third-party database, from which it is loaded into a SQL Server 2008 R2 Analysis Services cube. The cube then connects to client applications such as Tableau Desktop business analytics software and in-house custom applications. Employees use this software to create interactive data dashboards and perform ad hoc analysis. Microsoft has developed the SQL Server Connector for Apache Hadoop, which is designed to facilitate efficient data transfer between Hadoop and SQL Server 2008 R2.
Key Points: With Big Data technology, Yahoo! experienced the following benefits:
• Improved ad campaign effectiveness and increased advertiser spending.
• A cube producing 24 terabytes of data quarterly, making it the world’s largest SQL Server Analysis Services cube.
• Ability to handle more than 3.5 billion daily ad impressions, with hourly refresh rates.
References: Microsoft case study: Yahoo! Improves Campaign Effectiveness, Boosts Ad Revenue with Big Data Solution:
12
Ferranti Computer Systems
[Diagram labels: Batch Layer, Speed Layer, Serving Layer; Data feed from smart meters; Reactive Extensions (Rx); Windows Azure HDInsight; SQL Server database (In-Memory OLTP); Microsoft Dynamics AX; SQL Server Analysis Services; SQL Server Reporting Services]
Talk Track: Ferranti and Microsoft designed a solution that uses Windows Azure HDInsight Service and nonrelational technologies to perform fast searches on business data and provide the information to the business processes in MECOMS™ (a business support system for the energy and utility industry) and Microsoft Dynamics AX. Searches of the memory-optimized tables are distributed between groups of computers, called clusters, which are managed by HDInsight.
In-Memory OLTP makes access to SQL Server databases dramatically faster by optimizing queries and procedures and by moving heavily used tables into application memory; these are referred to as memory-optimized tables. Reactive Extensions (Rx) was implemented to verify and process the incoming raw data and then send the aggregated data to SQL Server for quick storage in memory-optimized tables. SQL Server analyzes the aggregated data and sends the results of the analysis to Microsoft Dynamics AX for demand-side business processes such as scheduling service calls, terminating service, and invoicing.
HDInsight also offers full compatibility with Microsoft business intelligence technology such as SQL Server 2012 Analysis Services and SQL Server 2012 Reporting Services.
Key Points: With Big Data technology, Ferranti experienced the following benefits:
• Increased sustained database write speed to 200 million rows in 15 minutes.
• Discovered ways to access and analyze more of the data generated by the smart meters, providing new business opportunities.
References: Microsoft Case Studies: Ferranti Computer Systems - Utilities ISV Scales to Meet Customer Needs for Storage and Analysis of Big Data /Ferranti-Computer-Systems/Utilities-ISV-Scales-to-Meet-Customer-Needs-for-Storage-and-Analysis-of-Big-Data/
13
Windows Azure Storage
Let’s now look at Windows Azure storage.
14
Demo 1: Setting up the Windows Azure storage account
[Diagram labels: Batch Layer: Azure Blob storage; Serving Layer: Azure Storage Explorer]
Talk Track: That’s enough talk for now. Let’s get to this session’s demo. For each of the boot camp demos, I’ll put the technologies that I show in context with the Big Data Lambda Architecture. At the end of each presentation, you will get a chance to try out the demos yourself as hands-on lab exercises.
Here, we will set up a Windows Azure storage account that will be used for the batch layer. The blob store information will be served up using the Azure Storage Explorer, available on CodePlex, and I’ll show how to access the storage account with it.
In this demo, you will set up a Windows Azure storage account for your storage-related activities. You will also discover some of the new features that the Windows Azure storage account has to offer, and you will learn to use Azure Storage Explorer to explore Windows Azure storage. End users interact with Windows Azure Blob storage through the Azure Storage Explorer tool as a front-end interface.
15
Blob Storage Concepts
• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
[Diagram: Account (Contoso) > Container (Images, Video) > Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) > Pages/Blocks (Block/Page)]
Talk Track: Let’s now take a look at the hierarchy of Blob storage. The Blob service provides storage for entities such as binary files and text files. The REST API for the Blob service exposes two resources: containers and blobs. A container is a set of blobs; every blob must belong to a container. The Blob service defines two types of blobs: block blobs, which are optimized for streaming, and page blobs, which are optimized for random read/write operations and provide the ability to write to a range of bytes in a blob.
Blobs can be read by calling the Get Blob operation. A client may read the entire blob or an arbitrary range of bytes. Block blobs less than or equal to 64 MB in size can be uploaded by calling the Put Blob operation. Block blobs larger than 64 MB must be uploaded as a set of blocks, each of which must be less than or equal to 4 MB in size. Page blobs are created and initialized with a maximum size by a call to Put Blob. To write content to a page blob, you call the Put Page operation. The maximum size currently supported for a page blob is 1 TB.
CodePlex tools like the Azure Storage Explorer make managing blobs easy. There is also a rich API for managing storage with PowerShell via the REST-based API.
Key Points:
• The Blob service defines two types of blobs: block blobs and page blobs.
• Accessible via REST APIs, the Windows Azure Storage Client library, or Windows Azure drives.
• Stores large amounts of unstructured text or binary data with the fastest read performance.
• Highly scalable, durable, and available file system.
References: Data Management and Business Analytics: storage/#blob
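The block blob upload rule in the talk track (a single Put Blob up to 64 MB, otherwise a set of blocks of at most 4 MB each) is simple arithmetic, and can be sketched in Python as below. The function name and the return labels `"put_blob"` and `"put_blocks"` are hypothetical, no storage service is called, and the two limits are the ones quoted in this presentation, so they should be checked against current documentation before being relied on.

```python
MB = 1024 * 1024
SINGLE_PUT_LIMIT = 64 * MB  # limit quoted in the talk track, as of this presentation
MAX_BLOCK_SIZE = 4 * MB     # likewise taken from the talk track


def plan_block_blob_upload(size_bytes, block_size=MAX_BLOCK_SIZE):
    """Return ("put_blob", []) or ("put_blocks", [(offset, length), ...])."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if not 0 < block_size <= MAX_BLOCK_SIZE:
        raise ValueError("block_size must be between 1 byte and 4 MB")
    if size_bytes <= SINGLE_PUT_LIMIT:
        return "put_blob", []
    blocks = []
    offset = 0
    while offset < size_bytes:
        length = min(block_size, size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return "put_blocks", blocks


print(plan_block_blob_upload(10 * MB)[0])  # put_blob
operation, blocks = plan_block_blob_upload(100 * MB)
print(operation, len(blocks))              # put_blocks 25
```

A 100 MB file therefore needs 25 blocks of 4 MB; a file one byte over the 64 MB limit needs 17 blocks, the last of which holds a single byte.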
16
Getting started with HDInsight Service
Let’s now look at how to get started with the Windows Azure HDInsight Service.
17
Demo 2: Setting up the Windows Azure HDInsight cluster
[Diagram labels: Batch Layer, Speed Layer, Serving Layer; Windows Azure HDInsight; Windows Azure Blob storage; HDInsight Console]
Talk Track: In this demo, I’ll show you how easy it is to set up an HDInsight cluster that uses Blob storage as a Hadoop file system. Here, the HDInsight cluster will be part of the batch layer, and I’ll show you the essentials for accessing the cluster using the HDInsight console. A Microsoft HDInsight cluster is associated with a Windows Azure storage account or some affinity group. End users can use the HDInsight Console to interact with the HDInsight cluster and with the Windows Azure storage account associated with that cluster.
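When Blob storage backs the cluster's file system, Hadoop jobs address blobs by URI rather than by local path. The sketch below builds such a URI in Python, assuming the `wasb://<container>@<account>.blob.core.windows.net/<path>` form described in HDInsight documentation (the demo itself does not show the scheme, and older previews used a different prefix). The helper name and the account, container, and file names are hypothetical.

```python
def wasb_uri(account, container, path="", secure=False):
    """Build a wasb:// (or wasbs://) URI for a blob path in a storage container."""
    if not account or not container:
        raise ValueError("account and container are required")
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical account, container, and file names.
uri = wasb_uri("contosostorage", "hdinsightdata", "input/sales.csv")
print(uri)
# wasb://hdinsightdata@contosostorage.blob.core.windows.net/input/sales.csv
```

The container plays the role of the file system root, so a path such as `input/sales.csv` maps to a blob of that name inside the container.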
18
Demo 3: Loading data into Windows Azure storage for use with HDInsight
[Diagram labels: Batch Layer, Speed Layer, Serving Layer; CSV files from local disk; HDInsight Console; Windows Azure HDInsight; Windows Azure Blob storage]
Talk Track: In the last demo for this presentation, I’ll show how you can prepare and upload data into the Hadoop cluster, specifically into the Windows Azure Blob storage that is associated with our HDInsight cluster. As described in the earlier demo, the HDInsight cluster is associated with a Windows Azure storage account or some affinity group. End users can use the HDInsight Console to interact with the HDInsight cluster and with the Windows Azure storage account associated with that cluster.
19
Easy Access to Data, Big & Small
Let’s now see how Microsoft Big Data solutions allow you to work with any data.
20
Easy Access to Data, Big & Small
Search, Access & Shape
• Simplify access to public & corporate data
• Easily preview, shape, & format your data
Combine with Unstructured
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
Easily Manage & Query
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with single T-SQL query
Key Features
• Power Query
• Windows Azure Marketplace
• Windows Azure HDInsight Service
• Parallel Data Warehouse with PolyBase
Talk Track: Let’s now talk about how technologies like Power Query in Excel, Windows Azure Marketplace, the HDInsight service, and PolyBase can provide you with easy access to all data, both big and small.
With Power Query, you have an intuitive and consistent experience for discovering, combining, and refining any data, including relational, structured and semi-structured, OData, Web, Hadoop, Azure Marketplace, and more. Power Query also lets you search for public data from sources such as Wikipedia.
The Windows Azure HDInsight Service makes Apache Hadoop available as a service in the cloud and provides a software framework designed to manage, analyze, and report on Big Data. As a cloud-based service, it makes these resources available in a simpler, more scalable, and more cost-efficient environment.
As part of Microsoft’s overall Big Data strategy, SQL Server 2012 Parallel Data Warehouse includes PolyBase, a new technology that simplifies combining non-relational data and traditional relational data for analysis. Normally, organizations would need to burden IT with pre-populating the data warehouse with Hadoop data, or undergo extensive training on MapReduce, in order to query non-relational data. PolyBase removes that burden, enabling you to rapidly query massive data sets by combining MPP data warehousing performance with Hadoop.
Key Points:
• Power Query: Discover, search, transform, and combine data (relational, structured, and semi-structured) from across multiple sources.
• Windows Azure HDInsight Service: Framework to manage, analyze, and report on Big Data, using Apache Hadoop services in the cloud.
• SQL Server 2012 Parallel Data Warehouse (PolyBase): Faster ways to combine non-relational data and traditional relational data for analysis.
21
Learn more
• Getting Started with HDInsight
• Azure HDInsight and Azure Storage /21/azure-hdinsight-and-azure-storage.aspx
Talk Track: That’s it for this session. To learn more about what I just showed, check out these two resource links: Getting Started with HDInsight, and Azure HDInsight and Azure Storage. Thank you!
END OF PRESENTATION
22
Questions?