Microsoft Analytics Platform System Stefan Cronjaeger, Microsoft.


1 Microsoft Analytics Platform System Stefan Cronjaeger, Microsoft

2 APS Success Story

2011 Gartner DW MQ: Microsoft released its own MPP data warehouse appliance, the SQL Server 2008 R2 PDW, in November 2010, but the date of its availability did not allow us to consider it when deciding Microsoft's position in the present Magic Quadrant. The lack of attention to high availability, clustering and management, coupled with a late-to-market MPP solution (PDW), shows that Microsoft has generally not understood the market's direction and needs before other vendors.

2012 Gartner DW MQ: The best summary of issues probably comes from one of Microsoft's customer references, "Easy to use. Hard to make perfect." Put simply, based on the strengths and weaknesses cited by references, Microsoft offers all of the "parts" of a solution, but it is difficult to assemble and use those parts out of the box. Microsoft maintains these issues are mitigated by the Reference Architectures in Fast Track (which does receive high praise in the market) and appliances such as Parallel Data Warehouse.

2013 Gartner DW MQ: By offering DBMS software, providing reference architectures, prebuilding and preloading implementations of reference architectures, offering an appliance and offering professional services (and partner connections), Microsoft offers almost every configuration of data warehouse deployment. Moreover, Microsoft's Parallel Data Warehouse appliance, despite a slow start, has been adopted by approximately 100 organizations in the past 18 months.

2014 Gartner DW MQ: Gartner estimates Microsoft's relational DBMS revenue grew 13.6% during 2013, faster than the overall market. Strength: Microsoft offers appliances, reference architectures including a variety of hardware, prebuilt offerings built to customer selections then delivered ready to run, software licensing and managed services data warehouses. Microsoft has taken steps in pursuing the LDW with HDInsight (HDP for Windows), PolyBase and Microsoft Cloud (Windows Azure Infrastructure Services can be used to deploy a data warehouse).

3 APS Overview: Massively Parallel Data Distribution, Columnar Storage, Hadoop Integration, Analytics Handling

4 Analytics Platform System at a Glance

5 Logical Architecture for Parallel Data Warehouse
[Diagram: client queries arrive at the control node on the control host; database "host" servers run the compute nodes, each with direct-attached storage; one server is a virtualization spare; all are linked by a fast InfiniBand interconnect.]
- All servers are virtualization hosts running Windows Server 2012 R2.
- Control and compute nodes are virtual machines, all running SQL Server 2014.
- The control node spreads data and workload across the compute nodes.
- Data loads run in parallel and take advantage of the power of all nodes.
- Fast InfiniBand interconnect.
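The control node's job of spreading workload across compute nodes can be pictured as scatter-gather: the same step runs on every node's local slice, and the control node merges the partial results. The Python below is a minimal sketch under stated assumptions, not how APS is implemented: threads stand in for compute nodes, and the data slices are hypothetical.

```python
"""Sketch: a control node fanning an aggregate out to compute nodes.

Assumptions (hypothetical): each compute node holds a disjoint slice of
the rows; threads stand in for nodes; the control node merges partials.
"""
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Runs on one compute node: returns (sum, count) for its local rows."""
    return sum(rows), len(rows)


def control_node_avg(node_slices):
    """Sends the same step to every node in parallel, then merges results."""
    with ThreadPoolExecutor(max_workers=len(node_slices)) as pool:
        partials = list(pool.map(partial_aggregate, node_slices))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # AVG cannot be averaged per node; combine SUM and COUNT instead.
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical data: 12 sales amounts spread over 3 compute nodes.
    slices = [[10, 20, 30, 40], [5, 15], [100, 1, 2, 3, 4, 50]]
    print(control_node_avg(slices))
```

The merge step is why distributed engines ship SUM and COUNT rather than per-node averages: averaging the three node averages here would weight a 2-row node the same as a 6-row node.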

6 Scalability: Massively Parallel and Shared Nothing
From smallest (0 TB) to largest (5 PB): start small with a warehouse of a few terabytes and add capacity up to 5 petabytes. Grow simply by adding scale units; an SMP system would have needed to be completely reconfigured.

7 Distributed and Replicated Tables
Example star schema:
- dimTime: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
- dimStore: Store Dim ID, Store Name, Store Mgr, Store Size
- dimProduct: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
- DimMktCampaign: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
- factSales: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
[Diagram: a full copy of each dimension table (TD = dimTime, SD = dimStore, PD = dimProduct, MD = DimMktCampaign) on every compute node.]
Smaller dimension tables are replicated on every compute node.

8 Distributed and Replicated Tables
Example star schema: factSales (Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold) with the dimension tables dimTime, dimStore, dimProduct and DimMktCampaign.
[Diagram: factSales rows split across the compute nodes, while every node keeps its own copy of the dimension tables (TD, SD, PD, MD).]
Large fact tables are split and distributed among the compute nodes.
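Replicating the small dimensions while distributing the large fact table means each node can join its slice of facts to a local dimension copy without moving data between nodes. The Python below is a minimal sketch of that idea. The store names, amounts and two-node split are hypothetical, and the join is a plain dictionary lookup rather than anything APS-specific.

```python
"""Sketch: why replicated dimensions make star joins node-local.

Assumptions (hypothetical data): factSales rows are split across two
compute nodes, while a full copy of dimStore lives on every node.
"""

# Full copy of the small dimension table, replicated to every node.
DIM_STORE = {1: "Store A", 2: "Store B", 3: "Store C"}

# The large fact table split across 2 nodes: (store_dim_id, dollars_sold).
FACT_SALES_BY_NODE = [
    [(1, 100.0), (2, 40.0), (1, 10.0)],
    [(3, 70.0), (2, 60.0)],
]


def local_join_and_sum(fact_rows, dim_store):
    """Runs on one node: joins local fact rows to the local dimension copy."""
    totals = {}
    for store_id, dollars in fact_rows:
        name = dim_store[store_id]  # no data movement: the dimension is local
        totals[name] = totals.get(name, 0.0) + dollars
    return totals


def dollars_by_store():
    """Control-node side: merges the per-node partial totals."""
    merged = {}
    for rows in FACT_SALES_BY_NODE:
        for name, amount in local_join_and_sum(rows, DIM_STORE).items():
            merged[name] = merged.get(name, 0.0) + amount
    return merged


print(dollars_by_store())
```

If dimStore were distributed instead, rows for a given store could sit on a different node from the facts that reference it, and the join would need a data-movement step first.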

9 Create Table Example

    CREATE TABLE SalesFact (
        DateKey      INT NOT NULL,
        CustomerKey  INT,
        DollarAmount MONEY
    )
    WITH (DISTRIBUTION = HASH(CustomerKey));

[Diagram: on each PDW node (Node 1, …, Node 8), the statement creates 8 tables: _a, _b, … _h.]
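The effect of `DISTRIBUTION = HASH(CustomerKey)` can be simulated: a hash of the distribution column picks one of the 8 × 8 local tables, so every row for a given customer lands in the same place. This is a sketch under assumptions. `zlib.crc32` stands in for PDW's internal hash function, which this example does not reproduce, and the 8-node layout simply mirrors the slide.

```python
"""Sketch: how a hash-distributed table spreads rows across nodes.

Assumptions: 8 compute nodes with 8 distributions each (local tables
_a to _h), as on the slide. zlib.crc32 is a stand-in for PDW's own
internal hash function; only the principle is illustrated.
"""
import zlib
from collections import Counter

NODES = 8
DISTRIBUTIONS_PER_NODE = 8
SUFFIXES = "abcdefgh"  # one local table per distribution: _a ... _h


def distribution_for(customer_key):
    """Maps a CustomerKey to a global distribution number (0..63)."""
    digest = zlib.crc32(str(customer_key).encode("utf-8"))
    return digest % (NODES * DISTRIBUTIONS_PER_NODE)


def placement(customer_key):
    """Returns (node number 1..8, local table suffix) for a row."""
    d = distribution_for(customer_key)
    node = d // DISTRIBUTIONS_PER_NODE + 1
    suffix = SUFFIXES[d % DISTRIBUTIONS_PER_NODE]
    return node, suffix


if __name__ == "__main__":
    # The same key always lands in the same place, so all rows for one
    # customer are co-located on one node.
    print(placement(42), placement(42))
    counts = Counter(placement(k)[0] for k in range(100_000))
    print(sorted(counts.items()))  # roughly even row counts per node
```

Because placement depends only on the key, a column with few distinct values or heavy skew would pile rows into a few distributions. This is why the choice of distribution column matters.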

10 Columnstore Overview
Advantages:
- Data is stored in columns: one data type per column, so better compression.
- Only the columns a query needs are read: column elimination.
- Metadata about the contents of each segment (such as its minimum and maximum values) is stored, so queries can skip segments that cannot match: segment elimination.
- Better compression + less data to read: faster I/O.
- In-memory caching (VertiPaq) + less data: more data handled in RAM.
- Execution in batch mode (as opposed to traditional row mode) moves multiple rows, about 1,000 at a time, between iterators: efficient use of hardware acceleration.
[Diagram: columns C1 to C6 divided into row groups; each column of a row group is stored as a segment; new rows go to a delta (row) store before moving into the columnstore.]
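Column elimination and segment elimination can be shown together on a toy layout: each row group stores one list per column plus min/max metadata, and a filtered SUM reads only two columns and skips row groups whose range cannot match. This is a minimal Python sketch under assumptions. Compression, dictionaries, the delta store and batch mode are all omitted, and the data is hypothetical.

```python
"""Sketch: column elimination and segment elimination in a columnstore.

Assumptions: a toy in-memory layout where each row group stores one list
per column (a "segment") plus min/max metadata per segment. Real SQL
Server columnstores also compress and encode segments; omitted here.
"""
from dataclasses import dataclass, field


@dataclass
class Segment:
    values: list
    min_value: object = field(init=False)
    max_value: object = field(init=False)

    def __post_init__(self):
        # Per-segment metadata used later for segment elimination.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def build_row_groups(rows, columns, row_group_size):
    """Splits rows into row groups and stores each column as its own segment."""
    groups = []
    for start in range(0, len(rows), row_group_size):
        chunk = rows[start:start + row_group_size]
        groups.append({c: Segment([r[i] for r in chunk]) for i, c in enumerate(columns)})
    return groups


def sum_where_between(groups, sum_col, filter_col, low, high):
    """SELECT SUM(sum_col) WHERE filter_col BETWEEN low AND high.

    Only two columns are touched (column elimination), and row groups whose
    min/max range cannot match are skipped (segment elimination).
    Returns (total, number of row groups actually scanned).
    """
    total, scanned = 0, 0
    for group in groups:
        seg = group[filter_col]
        if seg.max_value < low or seg.min_value > high:
            continue  # whole row group skipped without reading its values
        scanned += 1
        for f, v in zip(seg.values, group[sum_col].values):
            if low <= f <= high:
                total += v
    return total, scanned


if __name__ == "__main__":
    cols = ["DateKey", "CustomerKey", "DollarAmount"]
    # Hypothetical rows loaded in DateKey order, so segments have tight ranges.
    rows = [(20140100 + d, d % 7, d * 10) for d in range(1, 31)]
    groups = build_row_groups(rows, cols, row_group_size=10)
    print(sum_where_between(groups, "DollarAmount", "DateKey", 20140105, 20140108))
```

Here the filter matches only the first of three row groups, so two-thirds of the data is never read. Elimination works best when data is loaded roughly sorted on the filter column; with random order every segment's min/max range would overlap the filter.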

11 Query Hadoop Data with T-SQL Using PolyBase
- Define a file in Hadoop as an external table: just metadata in Parallel Data Warehouse.
- Single T-SQL query model for PDW and Hadoop, including joins.
- Parallel access between compute nodes in PDW and data nodes in Hadoop.
- Use existing BI tools without any adaptation to Hadoop, just T-SQL.
- Query Windows Azure HDInsight and Hadoop distributions such as Hortonworks and Cloudera.
[Diagram: a SELECT sent to SQL Server Parallel Data Warehouse; through PolyBase it reaches Cloudera, Hortonworks (Windows, Linux), Windows Azure HDInsight and Microsoft HDInsight, and returns a single result set.]

12 Big Data Insights for Any User
No IT intervention required:
- Analyze PDW and Hadoop data in the same view.
- Allow any user to create new insights with familiar tools.
- Leverage the high adoption of Excel, Power View, Power Pivot, and SSAS.
Audience: power users, data scientists, and everyone else using Microsoft BI tools.

13 APS Overview: Massively Parallel Data Distribution, Columnar Storage, Hadoop Integration, Analytics Handling

14 Analytics Process, Example Telecom (1/2)
Call detail records (CDRs): 5 million users × 100 CDRs per day × 2 kB each × 90 days = 90 TB, 45 billion rows.
Analytical data set for user classification: 5 million users × 100 key indicators, 1 kB per row = 5 GB, 5 million rows.
Example indicators: number of phone calls per day, number of calls in the evening, number of SMS, number of international calls, average payment delay, …
Computed in APS with SUM, AVG, COUNT_BIG, WHERE, GROUP BY, HAVING, NTILE, …
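The sizing figures and the CDR-to-indicator step can be checked with a short sketch. The sizes use decimal units (1 kB = 1,000 bytes), which reproduces the slide's 90 TB and 5 GB. The aggregation is a Python stand-in for the GROUP BY and NTILE work done in APS. The CDR fields, the evening cutoff of 18:00 and the indicator names are hypothetical assumptions.

```python
"""Sketch: sizing arithmetic and the CDR-to-indicator aggregation step.

Sizes use decimal units (1 kB = 1,000 bytes). The CDR layout, the 18:00
evening cutoff, and the indicator names are hypothetical assumptions.
"""
from collections import defaultdict

USERS = 5_000_000
CDRS_PER_DAY = 100
CDR_BYTES = 2_000
DAYS = 90

cdr_rows = USERS * CDRS_PER_DAY * DAYS           # 45 billion rows
cdr_terabytes = cdr_rows * CDR_BYTES / 10**12    # 90 TB
ads_gigabytes = USERS * 1_000 / 10**9            # 5 GB (1 kB per user row)


def key_indicators(cdrs):
    """GROUP BY user: calls, evening calls, international calls, SMS count.

    Each CDR is (user_id, kind, hour, is_international); kind is "call" or "sms".
    """
    out = defaultdict(lambda: {"calls": 0, "evening_calls": 0, "intl_calls": 0, "sms": 0})
    for user, kind, hour, intl in cdrs:
        row = out[user]
        if kind == "sms":
            row["sms"] += 1
        else:
            row["calls"] += 1
            row["evening_calls"] += int(hour >= 18)
            row["intl_calls"] += int(intl)
    return dict(out)


def ntile(values_by_user, buckets):
    """Like T-SQL NTILE(buckets) OVER (ORDER BY value): assigns 1..buckets.

    When rows do not divide evenly, the larger buckets come first, as in T-SQL.
    """
    ordered = sorted(values_by_user, key=values_by_user.get)
    base, extra = divmod(len(ordered), buckets)
    result, pos = {}, 0
    for b in range(1, buckets + 1):
        size = base + (1 if b <= extra else 0)
        for user in ordered[pos:pos + size]:
            result[user] = b
        pos += size
    return result


if __name__ == "__main__":
    print(cdr_rows, cdr_terabytes, ads_gigabytes)
```

The reduction is the point of this step: 45 billion raw rows collapse to one 1 kB row per user. That 5 GB analytical data set is small enough to hand to a modeling tool.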

15 Analytics Process, Example Telecom (2/2)
Analytical data set for user classification: 5 million users, 100 key indicators, 1 kB per row (5 GB, 5 million rows).
In AzureML, R (Revolution Analytics), …:
- Build model (clustering, random forest, support vector machine, …) on an exported sample, e.g. 100k customers.
- Apply model to the analytical data set: assign each customer to a group, for all 5 million customers.
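The "build on a sample, apply to everyone" flow can be illustrated with plain k-means, one of the clustering options the slide lists. This is a pure-Python sketch under assumptions. The two indicators, the customer counts and the sample size are hypothetical, and it does not stand in for AzureML or R.

```python
"""Sketch: build a clustering model on a sample, then score every customer.

Assumptions: plain k-means (Lloyd's algorithm) on two hypothetical
indicators, in pure Python. In the talk this step runs in AzureML or R;
this only illustrates the sample/build/apply flow.
"""
import random


def dist2(a, b):
    """Squared Euclidean distance between two indicator vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid: the "assign customer to group" step."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Fits k centroids with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical customers: (calls per day, international calls per day).
    customers = [(rng.gauss(5, 1), rng.gauss(0.2, 0.1)) for _ in range(5000)]
    customers += [(rng.gauss(20, 2), rng.gauss(3, 0.5)) for _ in range(5000)]
    sample = random.Random(2).sample(customers, 1000)  # export sample
    model = kmeans(sample, k=2)                         # build model
    groups = [nearest(c, model) for c in customers]     # apply to all
    print(model, groups.count(0), groups.count(1))
```

Fitting is the iterative, expensive part, so it runs on the sample. Applying the model is a single nearest-centroid lookup per customer, cheap enough to run over the full customer base.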

16 APS Differentiators
- Part of a product family: from standalone SQL Server to cloud service offerings.
- TCO: very low, especially when looking at the whole bundle: ETL (SSIS), PDW, data marts (SQL Server) and analytics (SSAS, SSRS).
- Appliance: much lower effort for DBAs.
- Microsoft product stack integration: SSIS, SSAS, SSRS, PowerPivot, System Center, integration with cloud services.
- Linear scaling via shared-nothing architecture.
- xVelocity: columnstore and in-memory execution.
- PolyBase: integration with big data and Hadoop.
- HDInsight integrated: fast InfiniBand interconnect, management and security.
"Microsoft exhibits one of the best value propositions on the market with a low cost and a highly favorable price/performance ratio" - Gartner, February 2012

17 We are rewarding you with 100 WinCoin points for attending this session. Earn an additional 100 WinCoin points by filling out the official questionnaire. THANK YOU!

18 MVA (http://www.microsoftvirtualacademy.com)
Successful professionals never stop learning. Microsoft Virtual Academy offers online Microsoft training led by experts to help professionals upgrade their knowledge. Courses are prepared by leading experts from different technology areas, and after you take a course, you can test your knowledge. To better understand this session, I recommend the following course: Big Data with the Microsoft Analytics Platform System, http://www.microsoftvirtualacademy.com/training-courses/big-data-with-the-microsoft-analytics-platform-system
