Microsoft Analytics Platform System


1 Microsoft Analytics Platform System
12/3/2017 Data Warehousing Technical data deck © 2016 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

2 Contents
Microsoft Analytics Platform System overview
Enterprise-ready big data
Next-generation performance at scale
The modern data warehouse
Reliability through security
Conclusion

3 Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Enterprise-ready big data: enterprise-ready Hadoop with Azure HDInsight and the simplicity of PolyBase
Performance at scale: optimized performance with MPP technology and in-memory columnstore
Data structures made simple: improved data access, querying, and movement
Reliability through security: enhanced value through high availability and disaster recovery
Key Points
The Microsoft Analytics Platform System (APS) is the only no-compromise, modern data warehouse solution that brings together Hadoop and a relational database management system (RDBMS) in a single, pre-built appliance, with tier-one performance, the lowest TCO in the industry, and accessibility for all users through some of the most widely used BI tools in the industry. APS was built to scale to the highest data requirements and the newest data types stored in Hadoop while delivering performance that meets today's near-real-time requirements. APS combines Microsoft's industry-leading RDBMS platform, the SQL Server Parallel Data Warehouse (PDW) appliance, with the Microsoft Hadoop distribution, Azure HDInsight, for non-relational data, offering an all-in-one big data analytics appliance. Tying together and integrating the worlds of relational and non-relational data is PolyBase, the Microsoft-integrated query tool available only in APS.
Supporting Points
A modern data warehouse is progressive, meeting broad needs and requirements:
Hadoop integrates and operates seamlessly with your relational data warehouses.
Data is easily queried by SQL users, without additional skills or training.
Enterprise-ready, meaning it is secure and easily managed by IT.
Insights accessible to everyone.

4 Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Enterprise-ready big data: enterprise-ready Hadoop with Azure HDInsight and the simplicity of PolyBase
Performance at scale: optimized performance with MPP technology and in-memory columnstore
Data structures made simple: improved data access, querying, and movement
Reliability through security: enhanced value through high availability and disaster recovery
Key Points
A high-performance massively parallel processing (MPP) relational data warehouse, SQL Server PDW, and the Azure HDInsight Hadoop solution come together in the same appliance. Query Hadoop data without IT having to pre-load it into the warehouse first.
Supporting Points
Native Microsoft BI integration that allows analysis of relational and non-relational data with familiar tools like Microsoft Excel.
Standard SQL queries (instead of MapReduce) to access and join Hadoop data with relational data.
PolyBase provides a fundamental breakthrough in data processing by enabling seamless integration between traditional data warehouses and "big data" deployments.

5 Big data insights for anyone
New insights with familiar tools through native Microsoft BI integration
[Diagram: audience tiers, from everyone using Microsoft BI tools, to power users, to data scientists]
Takes advantage of the high adoption of Microsoft Excel, Power View, PowerPivot, and SQL Server Analysis Services
Minimizes IT intervention for discovering data with tools such as Excel
Offers Hadoop tools like MapReduce, Hive, and Pig for data scientists
Enables DBAs and power users to join relational and Hadoop data with T-SQL
Key Points
Big data adds value to the business when it is accessible to BI users with tools that are easy to use and consume for IT and business users alike. While some Hadoop solutions provide BI tools or require customers to find third-party BI solutions, these often see low adoption due to learning curves. Surveys from Gartner, the BI Survey, and Intelligent Enterprise have found abysmal BI adoption of current solutions (about 8 percent) due to complaints about the complexity of the tools and the cost of the solution. The BI solution must be provided to users in tools they already know and can consume. APS is the only data warehouse and Hadoop solution that has native end-to-end Microsoft BI integration with PolyBase, allowing users to create new insights themselves by using tools they already know. Every Microsoft BI client (SSAS, SSRS, PowerPivot, and Power View) has native integration with APS and ubiquitous connectivity across the entire SQL Server ecosystem. With native BI integration, Microsoft is unique in offering an end-to-end big data solution with no barriers in the journey from acquiring raw data of all types to displaying high-value insights to all users. By providing the capability to connect to Hadoop in APS, with PolyBase for querying and joining any type of data in T-SQL, and by democratizing access to data insight through familiar BI tools, Microsoft is prepared to provide big data insights to any user.

6 End-to-end solution
[Diagram: big data sources (raw, unstructured; sensors, devices, bots, crawlers) and source systems (ERP, CRM, LOB apps) are summarized and loaded via PolyBase into Microsoft APS; Hortonworks or Cloudera Hadoop, Azure HDInsight on Windows Server and on Azure, the SQL Server Analytics Platform System, and SQL Server FTDW data marts feed alerts and notifications, business insights, interactive reports through SQL Server Reporting Services, and performance scorecards; integration and enrichment run through the Data Management Gateway/Enterprise Gateway, SQL Server Analysis Services, and ETL with SSIS, DQS, and MDS]
Key Points
The role of big data and Azure HDInsight within the Microsoft data platform. The figure does not include all of the Microsoft data-related products, and it doesn't attempt to show physical dataflow. For example, data can be ingested into Azure HDInsight without going through an integration process, and a data store could be the data source for another process. Instead, the figure illustrates as layers the applications, services, tools, and frameworks that work together to allow you to capture, store, and process data, and then visualize the information it contains. Notice that the big data technologies span both the Integration and Data stores layers.
Supporting Points
Microsoft implements Hadoop-based big data solutions using the Hortonworks Data Platform (HDP), which is built on open source components in conjunction with Hortonworks. HDP is 100 percent compatible with Apache Hadoop and with open source community distributions. All components are tested in typical scenarios to ensure that they work together correctly and that there are no versioning or compatibility issues. Developments are fed back into the community through Hortonworks to maintain compatibility and support the open source effort. Microsoft and Hortonworks offer three distinct solutions based on HDP:
Azure HDInsight. This is a cloud-hosted service available to Azure subscribers that uses Azure clusters to run HDP and integrates with Azure storage. For more information, see What is Microsoft Azure HDInsight? and the Azure HDInsight page on the Azure website.
Hortonworks Data Platform (HDP) for Windows. This is a complete package that you can install on Windows Server to build your own fully configurable big data clusters based on Hadoop. It can be installed on physical on-premises hardware or in virtual machines in the cloud. For more information, see Microsoft Server and Cloud Platform on the Microsoft website and Hortonworks Data Platform.
Microsoft Analytics Platform System. This is a combination of the massively parallel processing (MPP) engine in Microsoft Parallel Data Warehouse (PDW) with Hadoop-based big data technologies. It uses HDP to provide an on-premises solution that contains a region for Hadoop-based processing, together with PolyBase, a connectivity mechanism that integrates the MPP engine with HDP, Cloudera, and remote Hadoop-based services such as Azure HDInsight. It allows data in Hadoop to be queried and combined with on-premises relational data, and data to be moved into and out of Hadoop. For more information, see Microsoft Analytics Platform System.

7 APS delivers enterprise-ready Hadoop with Azure HDInsight
Manageable, secured, and highly available Hadoop integrated into the appliance
100 percent Apache Hadoop
End-user authentication with Active Directory
Accessible insights for everyone with Microsoft BI tools
High performance, tuned within the appliance
Managed and monitored using Microsoft System Center
Key Points
Communicate what Hadoop is, Hadoop with HDInsight capabilities in Azure, and how it integrates with APS.
Supporting Points
Azure HDInsight is an enterprise-ready, Hadoop-based distribution from Microsoft that brings a 100 percent Apache Hadoop solution to the data warehouse. APS gives customers Hadoop with the simplicity of a single appliance, and Microsoft integrates Hadoop data processing directly into the architecture of the appliance for optimum performance. The Azure HDInsight node has "shared nothing" access to CPU, memory, and storage. Azure HDInsight for APS is the most enterprise-ready Hadoop distribution on the market, offering enterprise-class security, scalability, and manageability. Thanks to a dedicated secure node, Azure HDInsight helps you secure your Hadoop cluster. Azure HDInsight also simplifies management through System Center, and organizations can provide multiple users with simultaneous access to Azure HDInsight because the appliance deploys with Active Directory.

8 SQL Server analysis cubes
Can configure an SSAS cube to use multidimensional online analytical processing (MOLAP) storage or relational online analytical processing (ROLAP) storage:
MOLAP: SSAS extracts data and stores it in separate structures.
ROLAP: SSAS leaves data in the source database (the MPP data warehouse). Optimizing for the ROLAP storage model may conflict with the APS best practice of using minimal indexes, and some ROLAP features, such as materialized views, are not supported.
Fact tables (distributed, real time): use ROLAP storage mode.
Fact tables (distributed, historical): use aggregated tables in MOLAP storage mode (during drill-through, logic could change to use raw data in ROLAP mode).
Dimension tables (replicated): use MOLAP storage mode.
Dimension tables (large or slowly changing): for distribution, use round-robin (AU4 and later) or determine on an individual basis by testing (before AU4); for storage mode, use ROLAP, or possibly MOLAP if not updated often, honoring requirements for how current the report needs to be as well as performance.
Use multiple small measure groups: use "proactive caching" (polling mode), with MOLAP storage mode for the proactively cached data.
Key Points
One customer found that, to meet load-speed requirements, large dimensions had to be loaded into a distributed table. For cube processing performance, however, it was best to have this table as a REPLICATED table in MOLAP storage mode. The customer loaded the large dimension into a distributed table and used CTAS to copy it into the replicated table used by the cube in MOLAP storage mode (see the sketch below). This met all of the requirements.
Supporting Points
Dimension tables: if dimension tables are ROLAP, SSAS will send multiple queries to the APS for each query: one for the fact table's result, where all needed dimension tables are joined for filtering, and one for each needed dimension. Using MOLAP for the dimensions considerably reduces the number of queries sent to the APS.
Slowly changing or large dimensions: testing will find the right balance between the number of queries sent to the APS and the desire for very up-to-date (real-time) reports.
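The CTAS pattern described above can be sketched in PDW T-SQL; table, column, and index names here are hypothetical:

```sql
-- Load fast into a distributed table, then CTAS into the replicated copy
-- used by the MOLAP cube (APS/PDW syntax; names are illustrative)
CREATE TABLE dbo.DimCustomer_Replicated
WITH
(
    DISTRIBUTION = REPLICATE,          -- full copy on every compute node (APS only)
    CLUSTERED INDEX (CustomerKey)
)
AS
SELECT CustomerKey, CustomerName, Region
FROM dbo.DimCustomer_Distributed;      -- the fast-loading distributed table
```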

9 SSAS recommendations – Partitioned
Use a combination of MOLAP and ROLAP:
ROLAP for near real-time data
MOLAP for historical data summarized in SSAS (don't try to store in the cube the level of detail that is stored in APS/SQL DW)
Key Points
SSAS supports three standard storage modes (MOLAP, ROLAP, HOLAP) as well as proactive caching, which lets you combine the best of both worlds (ROLAP and MOLAP storage) for both frequency of data refresh and OLAP query performance.
MOLAP: This is the default and most frequently used storage mode. In this mode, when you process the cube, the source data is pulled from the relational store, the required aggregation is performed, and finally the data is stored on the Analysis Services server in a compressed and optimized multidimensional format. After processing, once the data has been retrieved from the underlying relational database, no connection to the relational data stores remains. Any subsequent changes in the relational data will not be reflected in the cube unless the cube is reprocessed. This is called offline data-set mode.
ROLAP: In comparison with MOLAP, ROLAP does not pull data from the underlying relational database source to the OLAP server; both cube detail data and aggregations stay at the relational database source. To store the calculated aggregations, the database server creates additional database objects (indexed views). In other words, ROLAP mode does not copy the detail data to the OLAP server. When a query result cannot be obtained from the query cache, the indexed views are accessed to provide the results.
Remember: use proactive caching to keep the near-real-time partition updated, and use drill-through reporting to get to more detailed data in APS/SQL DW.

10 Data Management/Enterprise Gateway ‒ APS
Microsoft Data Management/Enterprise Gateway is pre-installed on the control node (since AU3)
The gateway can be registered with the Power BI cloud admin using a generated gateway key
Key Points
Data Management/Enterprise Gateway for APS (since AU3): connects APS data with Power BI for Office 365 users. It is client software for connecting on-premises data sources, including APS, to cloud services.
Value proposition
Allows hybrid data access
Behaves as a security proxy
Enables monitoring and logging
Features
Refreshes workbook data
Connects to various data sources
Supports Power Query mashups
No change to the corporate firewall
Requires corpnet when configuring the credential
Encrypts the credential with a certificate you own
Moves data efficiently
Supporting Points
How to add a new data source: name, description, gateway; data source type: SQL Server; connection provider: SQL Server Native Client; server name: PDW IP address and port (17001); database name.

11 Unifying all of your data assets, cloud and on-premises
[Diagram: the Data Management/Enterprise Gateway service connects APS and other on-premises data sources (SQL Server, Oracle, Hadoop, OData feeds) to Office 365 and Azure services, including Power BI site reports, Azure DB, Azure Machine Learning, Azure HDInsight, and Azure Stream Analytics]
Connecting APS data with Power BI for Office 365 users:
Tier 1 enterprise data hub
Query on-premises data via Power BI
Advanced analytics with Power BI combined with APS PolyBase
Improved resiliency, high availability, and management
Key Points
Tier 1 enterprise data hub: provides secure connections among your on-premises data, including the Analytics Platform System SQL Server Parallel Data Warehouse, SQL Server, and Oracle databases. Secured credentials can be stored on your Office 365 tenant if you need to switch your on-premises Data Management/Enterprise Gateway client to another server.
Query on-premises data via Power BI: with the Data Management/Enterprise Gateway, your Power BI reports can use Power Query to access the three supported types of database servers or any on-premises OData feed (sources like web services connecting to Hadoop).
Advanced analytics with Power BI combined with APS PolyBase: with APS, you can leverage PolyBase access to Azure HDInsight Hadoop data on your appliance, on-premises clusters of Hortonworks or Cloudera Hadoop data, and delimited files in Azure storage. You can then join that data with your PDW data, and your Power BI reports can access the results through persisted views on PDW.
Improved resiliency, high availability, and management: your IT organization has all the tools needed to make sure that your Power BI solutions are always available within Office 365.
Supporting Points
Expose on-premises SQL tables or views as OData feeds: requires a primary key or unique index; automatically registered in the data catalog.
Create a cloud endpoint: currently used for discovery and authentication only; serves actual OData results from a connection to the Data Management Gateway.
Make SQL Server and Oracle data consumable from Power Query: provides organizational ID authentication and redirection from cloud to on-premises.

12 Connecting islands of data with PolyBase
Bringing Hadoop point solutions and the data warehouse together for users and IT:
Provides a single T-SQL query model for PDW and Hadoop with the rich features of T-SQL, including joins without ETL
Uses the power of MPP to enhance query execution performance
Supports Windows Azure HDInsight to enable new hybrid cloud scenarios
Provides the ability to query non-Microsoft Hadoop distributions, such as Hortonworks (for Windows and Linux) and Cloudera
Key Points
Communicate conceptually how companies are managing big data in current data warehouse environments, both by setting up side-by-side Hadoop and by ETL-ing data into the existing data warehouse. PolyBase is available only within the Microsoft Analytics Platform System.
Supporting Points
Many companies have responded to the explosion of big data by setting up side-by-side Hadoop ecosystems. However, these companies are learning the limitations of this approach, including:
Steep learning curve of MapReduce and other Hadoop ecosystem tools
Cost of installing, maintaining, and tooling side-by-side ecosystems to support two separate query models
Many Hadoop solutions do not integrate into enterprise or other data warehouse systems, which creates complexity and cost and slows time-to-insight
Some Hadoop solutions feature vendor lock-in, which creates long-term obligations
Other companies set up costly extract, transform, and load (ETL) operations to move non-relational data directly into the data warehouse. This requires IT to modify or create new data schemas for all new data, which is also time consuming and costly. As a result, performance is degraded, and it is often more expensive to integrate new data, build new applications, or access key BI insights.

13 PolyBase = runtime integration
[Diagram: the user perspective (external table, data source, file format, reporting) and the systems perspective (servers, the PolyBase bridge, and the PDW engine service)]
Key Points
PolyBase simplifies this by allowing Hadoop data to be queried with the standard Transact-SQL (T-SQL) query language, without the need to learn MapReduce and without the need to move the data into the data warehouse. PolyBase unifies relational and non-relational data at the query level.
Integrated query: PolyBase accepts a standard T-SQL query that joins tables containing a relational source with tables in a Hadoop cluster referencing a non-relational source, then seamlessly returns the results to the user. PolyBase can query Hadoop data in other Hadoop distributions such as Hortonworks or Cloudera.
No difficult learning curve: standard T-SQL can be used to query Hadoop data. Users are not required to learn MapReduce to execute the query.
Cloud-hybrid scenario options: PolyBase can also query across Windows Azure HDInsight, providing a hybrid cloud solution for the data warehouse.
Talk Track
Just-in-time data integration across relational and non-relational data
High-performance parallel architecture
Fast, simple data loading
Best of both worlds: uses computational power at the source for both relational data and Hadoop
Opportunity for new types of analysis
Uses existing analytical skills: familiar SQL semantics and behavior
Query with familiar tools, including SSDT and Power BI
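A hedged sketch of the user-perspective objects named above (data source, file format, external table), using the PolyBase DDL surface of APS/SQL Server; the cluster address and all object names are assumptions:

```sql
-- External data source pointing at a Hadoop cluster (hypothetical address)
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (TYPE = HADOOP, LOCATION = 'hdfs://hadoop-head-node:8020');

-- File format describing the raw files in HDFS
CREATE EXTERNAL FILE FORMAT PipeDelimitedText
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));

-- External table: a schema over files in Hadoop; no data moves at create time
CREATE EXTERNAL TABLE dbo.WebClicks
(
    UserId    INT,
    Url       VARCHAR(1000),
    ClickTime DATETIME2
)
WITH (LOCATION = '/logs/clicks/',
      DATA_SOURCE = HadoopCluster,
      FILE_FORMAT = PipeDelimitedText);

-- Plain T-SQL join across relational and Hadoop data
SELECT c.CustomerName, COUNT(*) AS Clicks
FROM dbo.WebClicks AS w
JOIN dbo.DimCustomer AS c ON c.CustomerId = w.UserId
GROUP BY c.CustomerName;
```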

14 Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Enterprise-ready big data: enterprise-ready Hadoop with Azure HDInsight and the simplicity of PolyBase
Performance at scale: optimized performance with MPP technology and in-memory columnstore
Data structures made simple: improved data access, querying, and movement
Reliability through security: enhanced value through high availability and disaster recovery
Key Points
Go beyond your traditional SQL Server deployment with PDW's massively parallel processing appliance that can handle the extremes of your largest mission-critical requirements.
Supporting Points
Up to 100x faster than legacy warehouses with in-memory and updateable columnstore.
Massively parallel processing architecture that parallelizes and distributes computing for high query concurrency and complexity.
Built-in hardware redundancies for fault tolerance.
Microsoft as a single point of contact for hardware and software support.

15 Scaling out your data to petabytes
Scale-out technologies in Analytics Platform System:
Multiple nodes with dedicated CPU, memory, and storage
Ability to incrementally add hardware for near-linear scale to multiple petabytes
Ability to handle query complexity and concurrency at scale
No "forklift" of the prior warehouse to increase capacity
Ability to scale out PDW or Azure Blob Storage
Key Points
Communicate that the Microsoft Modern Data Warehouse can scale out to petabytes of relational data.
Supporting Points
SQL Server 2012 PDW is a scale-out, massively parallel processing (MPP) architecture that represents the most powerful distributed computing and scale; this type of technology powers supercomputers to achieve raw computing horsepower. As more scale is needed, more resources can be added to scale out to the largest data warehousing projects. PDW uses a shared-nothing architecture in which there are multiple physical nodes, each running its own instance of SQL Server with dedicated CPU, memory, and storage. As queries go through the system, they are broken up to run simultaneously over each physical node. The benefit is the highest performance at scale through parallel execution. You need only add new resources to continue scaling out. This means that if you have high concurrency and complex queries at scale, PDW can handle them with ease, and that PDW can be optimized for mixed workloads and near-real-time data analysis. Enjoy faster data loading at more than 2 TB per hour.
Other benefits of scale-out technologies:
Start small and scale out to petabytes of data
Optimized for "mixed workload" and "near real time" data analysis
Support for high concurrency
Query while you load
No hardware bottlenecks
No "forklifting" when you want to scale your system

16 Azure Storage Blobs in APS
[Diagram: structured and unstructured data flows via ETL into the parallel data warehouse and an analysis server cube, with Azure Storage blobs as a source]
Key Points
Azure HDInsight provides the ability to access data that is stored in Azure Blob Storage. Hadoop supports the notion of a default file system, which implies a default scheme and authority and can be used to resolve relative paths. During the Azure HDInsight creation process, an Azure Storage account and a specific Azure Blob Storage container from that account are designated as the default file system. In addition to this storage account, you can add storage accounts from the same or different Azure subscriptions during the creation process.
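For blob-backed data, PolyBase addresses containers with the wasb[s] scheme. A minimal sketch, assuming a hypothetical storage account and the credential-based DDL of later PolyBase releases (APS itself can also hold the account key in appliance configuration):

```sql
-- Assumes a database master key already exists; the secret is a placeholder
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCred
WITH IDENTITY = 'user', SECRET = '<storage-account-key>';

-- External data source over an Azure Blob Storage container
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://data@mystorageacct.blob.core.windows.net',
      CREDENTIAL = AzureStorageCred);
```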

17 Blazing-fast performance
MPP and in-memory columnstore for next-generation performance:
Updateable clustered columnstore vs. a table with customary indexing
Up to 100x faster queries
Up to 15x more compression
Key Points
Use an interesting story to show how the new modern data warehouse can handle real-time performance with in-memory technologies.
Supporting Points
The biggest issue with traditional data warehouses is that data is stored in rows: the values comprising one row are stored contiguously on a page. Rowstores are not optimal for many queries issued to the data warehouse, because a query returns the entire row of data, including fields that may not be needed. By changing the primary storage engine to a new, updateable version of in-memory columnstore, data is grouped and stored one column at a time. The benefits are as follows:
Only the columns needed must be read. Therefore, less data is read from disk to memory and later moved from memory to processor cache.
Columns are heavily compressed, which reduces the number of bytes that must be read and moved.
Most queries do not touch all columns of the table, so many columns are never brought into memory. This, combined with excellent compression, improves buffer pool usage, which reduces total I/O.
PDW parallel data loading performance has improved by up to 60 percent. One scale unit can import as much as 480 GB/hour, and 9.4 TB/hour was validated for a 20-scale-unit configuration. The result is massive compression (sometimes as much as 10x) as well as massive performance gains (as much as 100x). Columnstore also leverages your existing hardware instead of requiring you to purchase a new appliance.
Parallel query execution: the MPP architecture of PDW takes advantage of having data distributed across the servers, and each processor in a server can further parallelize execution of the sub-queries on that server. With a Symmetric Multi-Processing (SMP) architecture, queries can execute in parallel but can only work within one server, so SMP is limited to the size of one server.
In summary: parallel query execution; data storage in columnar format for massive compression; data loading into or out of memory for next-generation performance, with up to 60% improvement in data loading speed; updateable and clustered for real-time trickle loading.
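As a concrete example, a PDW fact table defined as an updateable clustered columnstore might look like this (table and column names are hypothetical):

```sql
CREATE TABLE dbo.FactSales
(
    DateKey     INT   NOT NULL,
    StoreKey    INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    Quantity    INT,
    SalesAmount MONEY
)
WITH
(
    DISTRIBUTION = HASH (ProductKey),  -- spread rows across all distributions
    CLUSTERED COLUMNSTORE INDEX        -- columnar, compressed, updateable storage
);
```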

18 The power of MPP
Symmetric multi-processing (SMP/NUMA): multiple processors share the same memory and one copy of the operating system, the applications in use, and the data, reducing transaction time by dividing the workload. Multiple CPUs are used to complete individual processes simultaneously. All CPUs share the same memory (SMP), or different groups of CPUs use different sets of memory on the same machine (NUMA). All SQL Server implementations until now have been SMP/NUMA.
Massively parallel processing (MPP): numerous processors, each with its own RAM and its own copy of the operating system, application code, and part of the data, process independently of the others. Multiple nodes (computers) are used to process a single task; many separate CPUs run in parallel across multiple nodes to execute it. Each set of CPUs has its own memory. Applications must be segmented, using high-speed communications among nodes.
Key Points
What is MPP? MPP stands for "massively parallel processing": a divide-and-conquer strategy that takes one big problem and breaks it up for individual execution. A team approach: "many hands make light work."
Supporting Points
Requirements: a method for scheduling tasks; a communication plan to maximize efficiency; a distribution method for the exchange of goods.
SMP/NUMA: all packaged SQL Server implementations up until now have been SMP and/or NUMA. SQL Server is highly optimized for NUMA architectures. This shared architecture can struggle with high-concurrency, large scanning workloads.

19 Full parallelism vs. partial parallelism
Full parallelism: performed in parallel across compute nodes and in parallel across distributions. Used to create new objects and to read data from a distributed table. Examples: CTAS (CREATE TABLE AS SELECT), SELECT.
Partial parallelism: performed in parallel across compute nodes but in series across distributions; guarantees transactional behavior. Examples: INSERT, UPDATE, DELETE. (See the sketch after the notes below.)
Key Points
Now that we've laid the groundwork for APS, let's dive into how we load and process data at such high performance and scale. The PDW region of APS is a scale-out version of SQL Server that enables parallel query execution across multiple nodes simultaneously. The effect is the ability to break what appears to be a very large operation into tasks that can be managed at a smaller scale. For example, a query against 100 billion rows in a SQL Server SMP environment would require processing all of the data in a single execution space. With MPP, the work is spread across many nodes, breaking the problem into more manageable, easier-to-execute tasks. In a four-node appliance (see picture), each node is only asked to process roughly 25 billion rows, a much quicker task.
Supporting Points
Parallel execution dramatically reduces response time for data-intensive operations on large databases typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution is sometimes called parallelism. Simply expressed, parallelism is the idea of breaking down a task so that, instead of one process doing all of the work in a query, many processes do part of the work at the same time.
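A sketch contrasting the two modes, with hypothetical table and column names:

```sql
-- Fully parallel: CTAS creates and populates a new distributed table,
-- running in parallel across compute nodes and across distributions
CREATE TABLE dbo.FactSales_2016
WITH (DISTRIBUTION = HASH (OrderKey))
AS
SELECT *
FROM dbo.FactSales
WHERE OrderDate >= '2016-01-01';

-- Partially parallel: parallel across compute nodes but serial across
-- distributions, preserving transactional behavior
UPDATE dbo.FactSales_2016
SET OrderStatus = 'Closed'
WHERE OrderDate < '2016-06-01';
```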

20 Azure SQL Data Warehouse
A relational data warehouse as a service, fully managed by Microsoft. The industry's first elastic cloud data warehouse with enterprise-grade capabilities. Supports your smallest to largest data storage needs while handling queries up to 100x faster.
Elastic scale and performance: massively parallel processing; scale to petabytes of data; instant-on compute scales in seconds; query relational and non-relational data.
Market-leading price and performance: simple billing for compute and storage; pay for what you need, when you need it, with dynamic pause; bring your data warehouse to the cloud without rewriting.
Powered by the cloud: get started in minutes; integrate with Azure ML, Power BI, and ADF; become enterprise-ready.
Key Points
What are the four pillars of the service?
Supporting Points
Elastic scale and performance
Separating compute and storage forces a focus on compute as a premium and storage as a commodity. Separating compute enables near-instant movement from one compute size to another, with little to no impact on customers. Storage can scale to PB+ easily, with proven Azure storage as the backing system. Separate compute allows for cloud scenarios such as service pausing with minimal cost (storage only, at commodity rates). This fulfills the premise of the cloud as a source of massive compute infrastructure. Premium possibilities include an add-on SLA for restarts and AI-based scaling for workloads based on historical usage data (e.g., learned system scalability and pausing based on optimal configurations such as need for data, past usage, and expected usage, like pausing for weekends).
Market-leading price/performance
Moving the conversation away from nodes and disks to storage and compute as a way to provide insight. Customers don't care that a DW has x spindles and x cores; they really just want insights into their data. Amazon targets disks and storage without a way for the customer to see how fast they can get an answer from their data. SQL DW will target Amazon Redshift as a price competitor and will provide on-par or better performance for a matched price. As a key differentiator, Amazon requires that you decide what system you'd like up front (dense compute or dense storage). Separating the two means that, as a customer, I can decide how much data vs. compute to use, adjust over time, and simply be billed for what I use. This differentiates the service, as Amazon requires you to "buy/reserve" nodes and storage whether you use the capacity or not. One is a cost calculation (optimizing for usage of data center resources); the other is customer-centric, focusing on what the customer gets and pays for, with the same or better performance.
True SQL Server experience
A SQL Server 2014-based MPP architecture with a control node and n compute nodes that has been proven over four or more years in the appliance market, serving regional customers (Dixons, DenizBank, Helmerich & Payne), mid-sized companies (T-Mobile, Progressive, Tangerine/ING, LG), and Forbes 100 corporations (Walmart, Johnson & Johnson, Comcast, MetLife). The T-SQL language surface simplifies workload migrations from SQL Server SMP and SQL DB customers to the service, minimizing conversion costs, which means a large addressable market (the SQL SMP data warehouse base is estimated at 220k units with multiple workloads per unit). Extends the SQL Server brand into the largest DW scenarios without rewrites (a premium version of the SMP box and cloud services).
Launching with approximately 60 targeted ISV partners (Tableau, Informatica, Revolution Analytics, SAP, MicroStrategy, Attunity, Tibco, and others) covering data loading, BI and reporting, and analytics (including predictive).
Query across all data types and locations
The service will integrate with key first-party Azure offerings such as Power BI, Azure Machine Learning, and Azure Data Factory to provide a model for exposing and aggregating data to customers through a full experience (data ingestion, aggregation, analytics, and reporting). PolyBase as an integration technology enables surfacing data from Hadoop (Hortonworks Data Platform 1.x/2.x, Cloudera 4.x/5.x, HDInsight) and Azure storage as a remote table in the service. Using both MapReduce/YARN query engines and internal SQL Server query optimization, PolyBase can efficiently determine how and where to query the data (either a push-down query or streaming data back to SQL DW). This enables hybrid scenarios that bring structured data (say, POS and inventory data) and unstructured data (web logs, social media sentiment analysis) into a single world, with the SQL DW service as a hub for all data insights.
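Elastic compute in practice can be sketched as follows; the database name and DWU level are illustrative, and pausing is driven from outside T-SQL:

```sql
-- Scale compute up or down independently of storage (run from master)
ALTER DATABASE MySqlDw MODIFY (SERVICE_OBJECTIVE = 'DW400');

-- Dynamic pause/resume is done from the portal or PowerShell, e.g.:
--   Suspend-AzureRmSqlDatabase -ResourceGroupName rg -ServerName srv -DatabaseName MySqlDw
--   Resume-AzureRmSqlDatabase  -ResourceGroupName rg -ServerName srv -DatabaseName MySqlDw
```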

21 Better together: Azure SQL DW service and APS
Test/dev: test new ideas in SQL Data Warehouse before rolling them out to production in APS
Age data: age data out to SQL Data Warehouse, but maintain full MPP power
Company policy restrictions: store data in APS that company policy prohibits from being in the cloud
Disaster recovery: use SQL Data Warehouse or APS as a disaster recovery solution with dual load
Key Points
Why is a hybrid service important?
Supporting Points
Distributes data and query processing across multiple servers, which eliminates bottlenecks inherent in SMP architecture.
Integrated storage is more cost-effective than most SAN systems, with the lowest DW appliance $/TB in the industry.
Linear scale-out to 6 petabytes of usable storage for DW.
Significantly less performance tuning than an SMP solution.
Seamless RDBMS and Hadoop integration with PolyBase.

22 Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Enterprise-ready big data: enterprise-ready Hadoop with Azure HDInsight and the simplicity of PolyBase
Performance at scale: optimized performance with MPP technology and in-memory columnstore
Data structures made simple: improved data access, querying, and movement
Reliability through security: enhanced value through high availability and disaster recovery
Key Points
Unlike other vendors in the data warehousing space that deliver a high-end appliance at a high price, Microsoft engineered PDW for optimal value through software innovations, resulting in a lower-cost appliance.
Supporting Points
Resilient, scalable, and high-performance storage features built into software, which lowers hardware costs.
Reduced data center and management costs by combining a relational data warehouse with Hadoop in one appliance.
Data compression up to 15x with the in-memory updateable columnstore, saving up to 70% of storage requirements.
Start small with a quarter rack, allowing you to right-size the appliance rather than over-acquiring capacity.
Use the same tools and knowledge as SQL Server for scale-out data warehousing or big data.
Co-engineered with hardware partners offering the highest level of product integration, and shipped to your door for the fastest time-to-value.
The lowest price per terabyte of any data warehouse appliance.

23 To distribute or replicate?
Use distributed tables for: large tables, generally larger than 5 GB (rule of thumb); fact/detail tables; cases where full table scans do not provide acceptable performance.
Distributed: the table's rows are spread across all MPP nodes of the data warehouse database.
HASH: the value of a single column is hashed to determine the distribution into which each record is inserted.
ROUND_ROBIN: records are distributed in round-robin fashion across all distributions (new in AU4).
Use replicated tables (APS only) for: small, fairly static tables (lookup or dimension tables), generally smaller than 5 GB.
Replicated: the table exists as a full copy within each MPP node (not supported in Azure SQL Data Warehouse). See the DDL sketch after the notes below.
Key Points
SQL Server Parallel Data Warehouse (PDW) is a massively parallel processing (MPP) appliance that follows the shared-nothing architecture. This means that data is spread across the compute nodes to benefit from the storage and query processing of the MPP architecture (the divide-and-conquer paradigm). SQL Server PDW provides two options for defining how data is distributed: distributed and replicated tables.
Supporting Points
A distributed table may have only one distribution column: HASH distributed, or otherwise ROUND_ROBIN distributed.
"Rule of thumb" means there should be exceptions made when required. Not every scenario is the same, and this slide cannot cover every scenario.
Temporary tables can be used to distribute data to multiple locations for query optimization and aggregation.
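The three geometries side by side, as hedged DDL sketches (APS syntax; all names are illustrative):

```sql
-- Large fact table: hash-distributed on a high-cardinality column
CREATE TABLE dbo.FactOrders
(OrderKey BIGINT NOT NULL, DateKey INT, Amount MONEY)
WITH (DISTRIBUTION = HASH (OrderKey));

-- Table with no good distribution key: round-robin (AU4 and later)
CREATE TABLE dbo.StageOrders
(OrderKey BIGINT, DateKey INT, Amount MONEY)
WITH (DISTRIBUTION = ROUND_ROBIN);

-- Small, fairly static dimension: full copy on every node (APS only)
CREATE TABLE dbo.DimRegion
(RegionKey INT NOT NULL, RegionName VARCHAR(50))
WITH (DISTRIBUTION = REPLICATE);
```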

24 Table geometries (MPP DWH engine)
A distributed table maintains only a single copy of the data, which is divided into separate physical tables on each compute node
Each compute node has eight physical tables to store distributed data, known as distributions
A six-node appliance has 48 distributions (6 x 8) in which to store records from a distributed table
A 10 GB distributed table on a six-node appliance requires 10 GB of disk space; the number of nodes is irrelevant when calculating distributed table sizing
The MPP engine uses a hash or round-robin algorithm to determine the distribution in which to store each record; the hash algorithm is based on a value in the table column
Typically, large fact tables are distributed
(Note: storage is not reflected in the diagram)
Key Points
A distributed table is a table in which all rows have been spread across the SQL Server PDW compute nodes based upon a row hash function. Each row of the table is placed on a single distribution, as assigned by a deterministic hash algorithm taking as input the value contained in the defined distribution column. The following slide depicts how rows would typically be stored within a distributed table.
Talking Track
Distributed tables: each distribution is a separate physical table in the DBMS; related to, but different from, partitioned tables.

25 Table geometries (APS only)
A replicated table is an identical table on every compute node in the appliance
A six-node appliance has six identical physical instances of a table, one per node
A 1 GB replicated table on a six-node appliance requires 6 GB of disk space
It is not possible to define a replicated table that exists on only some nodes within the appliance
The MPP engine uses specific data movement processes when creating or modifying replicated tables to ensure the data is consistent across all nodes
Typically, smaller dimension tables are replicated
Ultimately, the decision to replicate should be driven by query requirements, not storage
(Note: storage is not reflected in the diagram)
Key Points
The diagram depicts how a row would be stored within a replicated table. A replicated table is striped across all of the disks assigned to each of the distributions within a compute node. Because all data for each replicated table is duplicated on each compute node, extra storage is required, equivalent to the size of a single table multiplied by the number of compute nodes in the appliance. For example, a table containing 100 MB of data on a PDW appliance with 10 compute nodes will require 1 GB of storage.
Supporting Points
Replicated tables are ideal for small dimension tables. To preserve "shared nothing" behavior, small sets of data can be stored more efficiently as fully replicated tables.

26 MPP SQL table geometries
[Diagram: date, product, store, and marketing campaign dimension tables replicated to the SQL Server instance on each compute node, with the sales fact table (date, store, product, and campaign keys plus quantity sold and dollars sold) hash-distributed across the nodes]
Key Points
Table distribution example:
Uses both kinds of tables for co-location purposes
Dimension tables are typically replicated
The appliance maintains data integrity across all nodes
Fact tables are distributed on a single column
Temporal or time-based elements are not recommended for table distributions
Distribution columns should be based on the data model and utilization
Choose a column with high cardinality and low variance

27 More table geometry for MPP Data Warehouse
Physical table structures: heap, clustered index, clustered columnstore index
Additional table geometries: partitions, non-clustered index
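These structures combine in a single definition; a sketch of a partitioned, hash-distributed clustered columnstore table (boundary values and names are illustrative):

```sql
CREATE TABLE dbo.FactSales_Partitioned
(
    DateKey INT NOT NULL,
    Amount  MONEY
)
WITH
(
    DISTRIBUTION = HASH (DateKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (DateKey RANGE RIGHT FOR VALUES (20160101, 20160401, 20160701))
);
```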

28 Non-clustered indexes
Use "judiciously":
Will become fragmented with DML; use ALTER INDEX to defragment
Will affect the performance of DML operations
Will take disk space
Key Points
Non-clustered indexes are fully independent of the underlying table, and up to 999 can be applied to both heap and clustered-index tables. Unlike a clustered index, a non-clustered index is a completely separate storage structure; the index leaf pages hold pointers to the data pages.
Supporting Points
Non-clustered indexes are compressed (like all MPP DWH data) and can be multi-column.
Non-clustered indexes are generally not recommended for use with PDW. Because they are a separate structure from the underlying table, loading a table with non-clustered indexes is slower due to the additional I/O required to update the indexes. While some situations may benefit from a covering index, better overall performance can usually be obtained with a clustered columnstore index (described in the next section).
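A maintenance sketch with hypothetical table and index names:

```sql
-- Create sparingly: every non-clustered index adds I/O to each load
CREATE INDEX IX_FactSales_StoreKey ON dbo.FactSales (StoreKey);

-- Defragment after heavy DML
ALTER INDEX IX_FactSales_StoreKey ON dbo.FactSales REBUILD;
```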

29 Columnstore overview
A clustered columnstore index is composed of two parts: the columnstore and the deltastore (row store).
Data is compressed into segments, ideally about 1 million rows (subject to system resource availability) per distribution, per partition.
A collection of segments representing a set of entire rows is called a row group.
The minimum unit of I/O between disk and memory is a segment.
The batch execution model (as opposed to traditional row mode) moves multiple rows among iterators: approximately 1,000 rows per distribution, per partition.
Dictionaries (primary and secondary) store additional metadata about segments.
Key Points
Columnstore indexes are designed for data warehouse-type queries where only a portion of the table columns are required.
Supporting Points
Users can isolate required data far more efficiently than with traditional row-based storage.
Columnstore typically provides higher compression ratios, because tables generally contain more duplicate values within a column than within a row: page compression is normally ~2.5x to 3.5x depending on the data, while columnstore is normally ~5x to 15x. Higher compression ratios contribute to a greater ROI on raw storage.
The new batch-mode processing lowers CPU utilization for the same number of rows processed.
Important existing data warehouse functionality is supported, such as partition switching, splitting, and merging.
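Row-group health can be inspected from catalog views. This sketch uses the SMP SQL Server view; APS exposes pdw_nodes-prefixed equivalents:

```sql
-- One row per row group: its state (OPEN deltastore vs. COMPRESSED columnstore)
-- and how close it is to the ~1M-row ideal
SELECT OBJECT_NAME(object_id) AS table_name,
       partition_number,
       row_group_id,
       state_description,
       total_rows
FROM sys.column_store_row_groups
ORDER BY table_name, row_group_id;
```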

30 Columnstore: more than compression (batch mode)
SQL Server 2012 implements batch-mode processing, which handles a batch of rows at a time in addition to a row at a time; SQL Server 2008 and earlier versions only had row processing
Typically, batches of about 1,000 rows are moved among iterators
Significantly less CPU is required because the average number of instructions per row decreases
Batch-mode processing is only available for some operators: Hash Join/Aggregate are supported; Merge Join, Nested Loop Join, and Stream Aggregate are not
Row-mode scan example: SELECT COUNT(*) FROM FactInternetSales_Row (6704 ms)
Batch-mode scan example: SELECT COUNT(*) FROM FactInternetSales_Column (… ms)
Key Points
It is possible for SQL Server to use columnstore to scan the data but execute in row mode. SQL Server may estimate batch mode but actually run in row mode (due to memory or threads; there is no batch mode at MAXDOP 1). SQL Server will never estimate row mode but actually run in batch mode. The example queries were executed against a cold cache.

31 Data loading options
[Diagram: load file, bulk insert, each batch sorted in memory or tempdb, partitioned staging table, insert-select, partitioned final table]
Partner solutions: a large ecosystem of powerful ETL tools; direct loading from a variety of sources; transparently parallelized loads; guaranteed consistency and stability
DWS Loader: blazingly fast custom loader for APS/DWS; BulkLoad API; seamless loading to and from files/SQL SMP
SSIS: parity with the on-premises abilities of a powerful loading suite
PolyBase: advanced data movement and deep integration with Hadoop
Attunity: replication of data from first- and third-party storage worldwide
Informatica: migration of advanced Informatica packages directly to Azure
Key Points
Tools used:
DWLoader utility (APS)
BCP (SQL DW)
SQL Server Integration Services (SSIS with the PDW destination adaptor for APS; SSIS with the ODBC destination for SQL DW)
CREATE TABLE AS SELECT (CTAS) – APS or SQL DW
Standard SQL DML statements (INSERT/SELECT) – APS or SQL DW; already discussed in slide deck "08-MPPDWH-Data_Access_and_Querying" (see the staged-load sketch below)
PolyBase – APS or SQL DW
Third-party tools:
Informatica PowerCenter (versions up to 9.5.1): Windows environments only; default operation is row by row; the MPP DWH loader uses bulk functionality
SAP BusinessObjects Data Integrator
Attunity Replicate: trickle loading using the loader under the hood. Replicate offers fast setup and streamlined loading from a wide variety of sources for APS, empowering APS users to quickly integrate data from heterogeneous data sources and maintain changed data continuously.
Process:
APPEND, RELOAD, UPSERT: staging tables are locked during the entire load process; the destination table is locked only in the final step, at row level.
FASTAPPEND: locks the destination table with an exclusive (U) lock at table level.
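The staged-load path in the diagram reduces to two steps once the loader has landed the file; a sketch with illustrative table names:

```sql
-- 1. dwloader (or SSIS) bulk-inserts the file into a staging table
-- 2. INSERT...SELECT moves the rows into the partitioned final table in parallel
INSERT INTO dbo.FactSales
SELECT s.OrderKey, s.DateKey, s.Quantity, s.Amount
FROM dbo.Stage_FactSales AS s;
```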

32 Microsoft Analytics Platform System
No-compromise modern data warehouse solution
Enterprise-ready big data: enterprise-ready Hadoop with Azure HDInsight and the simplicity of PolyBase
Performance at scale: optimized performance with MPP technology and in-memory columnstore
Data structures made simple: improved data access, querying, and movement
Reliability through security: enhanced value through high availability and disaster recovery
Key Points
Using commodity servers, storage, drives, and networking devices from our three hardware partners (Dell, HP, and Quanta), Microsoft is able to offer a high-performance scale-out data warehouse solution that can grow to very large data sets while providing redundancy of each component to ensure high availability. Starting with standard servers and JBOD (Just a Bunch of Disks) storage arrays, APS can grow from a simple two-node-plus-storage solution to 60 nodes. At scale, that means a warehouse housing 720 cores, 14 TB of RAM, 6 PB of raw storage, and ultra-high-speed networking using Ethernet and InfiniBand networks, while offering the lowest price per terabyte of any data warehouse appliance on the market (Value Prism Consulting).

33 Logical architecture
1. The optimizer creates a parallel query plan
2. Each compute server runs its portion of the query in parallel
3. Data is combined and returned to the user
[Diagram: a user query enters the control node (metadata, statistics, optimizer, Data Movement Service) and fans out to compute servers, each running DMS over balanced storage]
Key Points
APS follows the shared-nothing architecture: each processor has its own set of disks. Data in a table can be "distributed" across nodes, such that each node has a subset of the rows from the table in the database. Each node is then responsible for processing only the rows on its own disks. In addition, every node maintains its own lock table and buffer pool, eliminating the need for complicated locking and software or hardware consistency mechanisms. Because shared nothing does not typically suffer severe bus or resource contention, it can be made to scale massively.
Supporting Points
APS components:
MPP Engine: runs on the control node. It is the brain of SQL Server Parallel Data Warehouse (PDW) and delivers the MPP capabilities. It generates the parallel query execution plan and coordinates parallel query execution across compute nodes. It also stores metadata and configuration data for the PDW region.
Data Movement Service (DMS): moves data between compute nodes and between the compute nodes and the control node. It bridges the shared-nothing world with the shared world.
SQL Server databases: each compute node runs an instance of SQL Server to process queries and manage user data.

34 Customer architecture
[Diagram: source systems flow through ETL and a staging server into the parallel data warehouse and out to delivery channels; an MDM application and master data MDM hub feed the warehouse; device files and transaction data are staged and loaded via ETL; an Analysis Services cube serves Power BI and PowerPivot data discovery; a SharePoint extranet (Phase 2) hosts Power View reports and dashboards, SSRS standard reports, and Excel Services; device files in Azure Blob Storage feed Azure ML + HDInsight for a big data POC (Phase 3)]
Key Points
We worked with the customer to develop three phases for the solution deployment.
Talking Points
Phase 1: The solution architecture starts with the cloud-born, consumer-facing device data. Data is loaded from the devices into a staging environment before it is transformed and loaded into a data warehouse. From there, the data is aggregated into an Analysis Services cube and reported on through Excel.
Phase 2: We introduce a customer-facing SharePoint extranet where the customer can analyze data with standard reports, Power View dashboards, and Excel.
Phase 3: We take the device files stored in Azure Blob Storage and leverage HDInsight and Azure Machine Learning to provide predictive analytics for customers.


36 How TDE works on APS
The user creates a master key in master; PDW creates the master key on CTL01 and a separate master key on all compute nodes (CMP01-CMP06)
The user enables appliance encryption; PDW encrypts tempdb and pdwtempdb
The user creates a certificate in master; PDW creates the certificate on CTL01, then exports it and imports it on all compute nodes
The user creates a database encryption key; PDW creates the database encryption key on CTL01 and a different database encryption key on each compute node
The user initiates database encryption; PDW encrypts the user database
Key Points
Consider segregating appliances that are encrypted from those that are not. Points to make clear: the DMK and DEK are unique per node; only the password is the same.
Supporting Points
Appliance-level encryption enables PDW to initiate encryption/decryption of pdwtempdb. Decryption requires the following:
Drop or decrypt all user databases on the appliance
Disable appliance-level encryption (decrypts pdwtempdb)
Stop and start the appliance (decrypts tempdb)
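The slide's sequence maps onto standard TDE T-SQL plus the APS appliance-encryption step; a hedged sketch in which the password, certificate, and database names are placeholders:

```sql
-- In master: create the master key and the certificate that protects DEKs
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';
EXEC sp_pdw_database_encryption 1;  -- enable appliance encryption (tempdb/pdwtempdb)
CREATE CERTIFICATE TdeCert WITH SUBJECT = 'DEK protection certificate';

-- Connected to the user database: create its encryption key, then encrypt
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TdeCert;

ALTER DATABASE MyUserDb SET ENCRYPTION ON;
```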

37 APS high availability
No single point of failure
[Diagram: redundant InfiniBand and Ethernet networks connecting the control host (FAB, AD, VMM, and CTL VMs), compute hosts 1 and 2 (compute VMs), and a failover host]
Key Points
The fabric layer is built using technologies from the Microsoft portfolio that enable rock-solid reliability, management, and monitoring without having to learn anything new. Starting with Microsoft Windows Server 2012, the appliance builds a solid foundation for each workload by providing a virtual environment based on Hyper-V that also offers high availability via failover clustering, all managed by Active Directory. Combining this base technology with Cluster Shared Volumes (CSV) and Windows Storage Spaces, the appliance offers a large and expandable base fabric for each of the workloads while reducing the cost of the appliance by not requiring specialized or proprietary hardware. Each of the components offers full redundancy to ensure high availability in failure cases.

38 Analytics Platform System
Concurrency that fuels rapid adoption; great performance with mixed workloads
[Diagram: source systems (ERP, CRM, LOB apps) feed the appliance through ETL/ELT with SSIS, DQS, and MDS (intra-day), CTAS and link tables from SQL Server SMP, ETL/ELT with DWLoader (near real-time), and real-time trickle loads; within the appliance, PDW with columnstore and Hortonworks Hadoop are joined by PolyBase; consumption via reporting and cubes (ROLAP/MOLAP, DirectQuery), BI tools over SNAC, and fast ad hoc queries]
Key Points
Building upon the fabric layer, the current release of APS offers two distinct workload types: structured data through SQL Server Parallel Data Warehouse (PDW), and unstructured data through Hadoop. These workloads can be mixed within a single appliance, offering customers the flexibility to tailor the appliance to their business needs.
Supporting Points
PDW is a massively parallel processing, shared-nothing scale-out solution for Microsoft SQL Server that eliminates the need to "forklift" additional very large and very expensive hardware into your datacenter as the volume of data flowing into your warehouse increases. Instead of having to expand from a large multi-processor and connected storage system to a massive multi-processor and SAN-based solution, PDW uses the commodity hardware model with distributed execution to scale out to a wide footprint. This scale-wide execution model has been proven as a highly effective and economical way to grow your workload.

39 Conclusion
[Diagram: the APS high-availability fabric (control host, compute hosts, and failover host over redundant InfiniBand and Ethernet); PolyBase spanning the SQL Server engine service, compute nodes, and Hadoop VMs/Azure Storage; and Azure SQL Data Warehouse scale-out compute behind a control node]
The Microsoft Analytics Platform System can meet the demands of your evolving data warehouse environment with its scale-out, massively parallel processing integrated system supporting hybrid data warehouse scenarios. It provides the ability to query across relational and non-relational data by leveraging Microsoft PolyBase and industry-leading big data technologies. Azure SQL Data Warehouse enables APS customers with different workloads to leverage a cloud-based MPP engine and cloud-based analytics by supporting a hybrid architecture or ecosystem with APS + Azure SQL Data Warehouse.


41 Top ISV solutions in Data Warehousing (EDW)
Converged data platform
MapR provides the industry's only converged data platform, integrating the power of Hadoop and Spark with global event streaming, real-time database capabilities, and enterprise storage, enabling customers to harness the enormous power of their data. A majority of customers achieve payback in fewer than 12 months and realize greater than 5x ROI.
Key use cases/benefits:
MapR's NoSQL database requires less administration
MapR supports the broadest array of Hadoop components
MapR is 2-5x faster than HDFS-based distributions
MapR was designed to scale to exabytes
Availability: global. Average deal revenue: $200K. Link to Marketplace.
Analytic database
EXASOL's high-performance, in-memory technology allows companies to run analytic projects today that previously weren't possible with legacy database technology. It enables companies to turn their staff into data experts and data heroes by unearthing business value from day one.
Key use cases/benefits:
Combines in-memory, columnar storage and massively parallel processing technologies to provide unrivaled performance
Grows with your data volumes; extend your system and increase performance by adding nodes
Compatible with leading ETL and BI products such as Tableau, as well as Hadoop
Availability: global. Average deal revenue: $8K/quarter. Link to Marketplace.

42 Top ISV solutions in Data Warehousing (EDW)
Podium Data
Podium is a data lake management software platform that radically improves the way enterprises manage, prepare, deliver, and use business-critical information. Purpose-built to leverage the performance and economic advantages of Hadoop, Podium helps organizations deliver data securely to the enterprise at a fraction of the time and cost associated with traditional data management approaches.
Key use cases/benefits:
Gives business users self-service access to business-ready data in a secure repository
Replaces today's costly web of redundant databases and data flows with a consolidated, scalable management platform
Enables agile collaboration
Availability: global. Average deal revenue: $50K. Link to Marketplace.
XtremeData dbX
A simple, fast, massively scalable data warehouse to ingest, integrate, and analyze big data in the shortest time and at the lowest cost. It substantially reduces development time, supports existing data models and BI/ETL tools, and supports augmentation with new big data sources without data model redesign. It dramatically lowers TCO and supports a "pause mode" to avoid spend when not in use.
Key use cases/benefits:
Enables customers to evolve to a massively scalable SaaS platform
Reduces cycle time from a week to hours for integrating massive amounts of online/offline data
Accelerates implementations and eliminates data warehouse development by leveraging OLTP data models for BI/reporting
Availability: global. Average deal revenue: $65K. Link to Marketplace.

