Building BI Solutions with SQL Server PDW AU3 Ruwen Hess Senior Program Manager Microsoft Corporation DBI321.


2 Building BI Solutions with SQL Server PDW AU3
Ruwen Hess, Senior Program Manager, Microsoft Corporation
DBI321

4 "Data Warehousing has shifted almost entirely towards the appliance model due to speed of the balanced appliance and scalability of scale out (MPP) solutions."
Jim Kobielus, Forrester Research
Source: TDWI Report – Next Generation DW

5 [Chart: CAGR and 2015 share by market segment; the segment labels are not preserved in this transcript.]
Source: MS internal analysis, DBSMIT Cloud Market Opportunity Forecast

7 Scale out: Scalable, Standards Based, Flexible, Cost Effective

8 CONTROL RACK / DATA RACK
Control Node (query submitted here), Management Node, Landing Zone, Backup Node
- Query is executed on all nodes
- Multiple queries are simultaneously executed across all nodes
- PDW supports querying while data is loading

9 PDW Compute Nodes
Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold

10 Smaller Dimension Tables are Replicated on Every Compute Node
Time Dim (TD): Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim (SD): Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim (PD): Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim (MD): Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
Each of the four compute nodes holds: PD, TD, MD, SD

11 Larger Fact Table is Hash Distributed Across All Compute Nodes
Time Dim (TD): Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
Store Dim (SD): Store Dim ID, Store Name, Store Mgr, Store Size
Product Dim (PD): Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
Mktg Campaign Dim (MD): Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
Sales Facts (SF): Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp Id, Qty Sold, Dollars Sold
Each of the four compute nodes holds: PD, TD, MD, SD, plus one slice of the fact table (SF-1, SF-2, SF-3, SF-4)
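
The layout on slides 10 and 11 can be illustrated with a small Python sketch. It is only a simulation under stated assumptions: the hash function (crc32), the choice of prod_dim_id as the distribution column, the sample rows, and the helper names are all hypothetical and are not taken from PDW.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # the slides show four compute nodes (SF-1 .. SF-4)


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node.

    crc32 is a stand-in; the talk does not describe PDW's real hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Hash-distribute fact rows: each row lands on exactly one node."""
    slices = defaultdict(list)
    for row in rows:
        slices[node_for(row[key_column], num_nodes)].append(row)
    return slices


def replicate(rows, num_nodes=NUM_NODES):
    """Replicate a small dimension table: every node gets a full copy."""
    return {node: list(rows) for node in range(num_nodes)}


# Hypothetical sample data, loosely following the slide's column names.
store_dim = [
    {"store_dim_id": 1, "store_name": "North"},
    {"store_dim_id": 2, "store_name": "South"},
]
sales_facts = [
    {"store_dim_id": 1 + i % 2, "prod_dim_id": i, "qty_sold": 10 * i}
    for i in range(1, 9)
]

fact_slices = distribute(sales_facts, "prod_dim_id")
dim_copies = replicate(store_dim)


def local_join(node):
    """Join a node's fact slice with its local dimension copy (no data movement)."""
    names = {d["store_dim_id"]: d["store_name"] for d in dim_copies[node]}
    return [
        (names[f["store_dim_id"]], f["qty_sold"])
        for f in fact_slices.get(node, [])
    ]


joined = [row for node in range(NUM_NODES) for row in local_join(node)]
```

Because every node already holds a full copy of the dimension, each node can join its own fact slice locally, which is the reason for replicating the smaller tables.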

12 SQL Server PDW Appliance

13 Shuffle Movement
DMS redistributes the data by color values in parallel.
Example: Select [color], SUM([qty]) from [Store Sales] group by [color];
Distributed table Store Sales (ss_id, color, qty):
- Compute Node 1: (1, Red, 5), (3, Blue, 11), (5, Red, 12), (7, Green, 7)
- Compute Node 2: (2, Red, 8), (4, Blue, 10), (6, Yellow, 12)
Hash on color into Temp_1 (color, qty):
- one node: (Red, 5), (Red, 12), (Red, 8), (Green, 7)
- other node: (Blue, 11), (Yellow, 12), (Blue, 10)
Parallel Merge and Aggregate, then Return (color, qty): (Blue, 21), (Red, 25), (Green, 7), (Yellow, 12)
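
The shuffle on slide 13 can be replayed in a few lines of Python. This is a sketch, not DMS itself: the rows come from the slide, but the TARGET_NODE mapping is an assumption that simply reproduces the slide's Temp_1 layout in place of a real hash function, and the function names are made up for illustration.

```python
from collections import defaultdict

# Rows of the distributed table Store Sales as shown on the slide:
# (ss_id, color, qty) per compute node.
node_rows = {
    1: [(1, "Red", 5), (3, "Blue", 11), (5, "Red", 12), (7, "Green", 7)],
    2: [(2, "Red", 8), (4, "Blue", 10), (6, "Yellow", 12)],
}

# Assumption: a fixed color-to-node mapping standing in for the hash on color.
TARGET_NODE = {"Red": 1, "Green": 1, "Blue": 2, "Yellow": 2}


def shuffle(rows_by_node):
    """Redistribute rows so that all rows of one color end up on one node."""
    temp = defaultdict(list)
    for rows in rows_by_node.values():
        for _ss_id, color, qty in rows:
            temp[TARGET_NODE[color]].append((color, qty))
    return temp


def aggregate(rows):
    """Local SUM(qty) GROUP BY color on a single node."""
    totals = defaultdict(int)
    for color, qty in rows:
        totals[color] += qty
    return dict(totals)


temp_1 = shuffle(node_rows)
partials = {node: aggregate(rows) for node, rows in temp_1.items()}

# After the shuffle no color spans two nodes, so merging is a plain union.
result = {}
for partial in partials.values():
    result.update(partial)
```

After the shuffle each color lives on exactly one node, so the per-node sums are already final and the merge step does not need to add anything across nodes.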

14 Control Rack
- Control Node: Client Interface (JDBC, ODBC, OLE-DB, ADO.NET), DMS Manager, PDW Engine
- Landing Zone Node: Bulk Data Loader, PDW Agent
- Management Node: Active Directory, PDW Agent
- ETL Interface
Data Rack (up to 4)
- Compute Node 1, Compute Node 2, … Compute Node 10: DMS Core, PDW Agent on each
Legend: PDW service; DMS = Data Movement Service; PDW = Parallel Data Warehouse

16 SQL Server Compatibility | BI, Analytics, & ETL Integration | Performance At Scale
- Broader functionality
- Full Alignment
- Less work for the same results
- Do the same work more efficiently
- Native Support for Analysis Services, Reporting Services, PowerPivot
- Lay the foundation for broad connectivity support

17 Control Node

18 Control Node: Engine Service, Shell Appliance (SQL Server)
Compute Nodes (SQL Server)
Labels: SELECT foo, Plan Steps

19 Control Node: Engine Service, Shell Appliance (SQL Server)
Compute Nodes (SQL Server)
Labels: SELECT, MEMO, Plan Steps, Return

20
1. Simplification and space exploration
   - Query standardization and simplification (e.g. column reduction, predicate push-down)
   - Logical space exploration (e.g. join re-ordering, local/global aggregation)
   - Space expansion (e.g. bushy trees – dealing with intermediate resultsets)
   - Physical space exploration
   - Serializing MEMO into binary XML (logical plans)
   - De-serializing binary XML into PDW Memo
2. Parallel optimization and pruning
   - Injecting data move operations (expansion)
   - Costing different alternatives
   - Pruning and selecting lowest cost distributed plan
3. SQL Generation
   - Generating SQL Statements to be executed

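22 SELECT * from orders JOIN lineitem on (o_orderkey = l_orderkey) JOIN part on (l_partkey = p_partkey) WHERE p_name like '%smoke%';
Tables: O (o_o), LI (l_o), P (p_pk)
Join (l_o = o_o): O (o_o) with LI (l_o)
Join (l_pk = p_pk), two alternatives:
- shuffle (l_pk) the result of (l_o = o_o), then join with P (p_pk)
- broadcast P (p_pk), then join with the result of (l_o = o_o)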
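
Choosing between the shuffle and broadcast alternatives can be sketched as a tiny cost comparison. The cost model below (rows moved across the network), the node count, and the row counts are all assumptions made for illustration; the talk does not describe the actual PDW cost formulas.

```python
NUM_NODES = 8  # hypothetical appliance size


def shuffle_cost(rows):
    """Assumed cost of a shuffle: every row is sent to at most one other node."""
    return rows


def broadcast_cost(rows, num_nodes):
    """Assumed cost of a broadcast: every row is copied to every node."""
    return rows * num_nodes


def pick_move(join_result_rows, filtered_part_rows, num_nodes=NUM_NODES):
    """Cost both alternatives and keep the cheapest, pruning the other."""
    alternatives = {
        "shuffle (l_pk)": shuffle_cost(join_result_rows),
        "broadcast P": broadcast_cost(filtered_part_rows, num_nodes),
    }
    best = min(alternatives, key=alternatives.get)
    return best, alternatives


# A selective filter on part (few rows match '%smoke%') favors broadcasting P.
selective_choice, selective_costs = pick_move(6_000_000, 10_000)

# A non-selective filter makes the broadcast more expensive than the shuffle.
broad_choice, broad_costs = pick_move(6_000_000, 2_000_000)
```

Under this simplified model the decision depends on how many part rows survive the filter, which is why a cost-based optimizer needs row estimates before it can pick a data movement.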

23 [Chart: seconds per query]
5x improvement in terms of total elapsed time out of the box

24 Goal:
- Eliminate CPU utilization spent on data conversions
- Further parallelize operations during data moves
Functionality:
- Using ODBC instead of ADO.NET for reading and writing data
- Minimizing appliance resource utilization for data moves
Benefits:
- Better resource (CPU) utilization
- 6x or more faster move operations
- Increased concurrency
- Mixed workload (loads + queries)

26 SQL PDW Clients (ODBC, OLE-DB, ADO.NET): SequeLink, Server: 10.217.165.13, 17000
SQL Server Clients (ADO.NET, ODBC, OLE-DB, JDBC): TDS, Server: 10.217.165.13, 17001
Goal:
- 'Look' just like a normal SQL Server
- Better integration with other BI tools
Functionality:
- Use existing SQL Server drivers to connect to SQL Server PDW
- Implement SQL Server TDS protocol
- Named Parameter support
- SQLCMD connectivity to PDW
Benefits:
- Use known tools and proven technology stack
- Existing SQL Server 'eco-system'
- 2x performance improvement for return operations
- 5x reduction of connection time

27 Goal:
- Support common scenarios of code encapsulation and reuse in Reporting and ETL
Functionality:
- System and user-defined stored procedures
- Invocation using RPC or EXECUTE
- Control flow logic, input parameters
Benefits:
- Enables common logic re-use
- Big impact for Reporting Services scenarios
- Allows porting existing scripts
- Increases compatibility with SQL Server
Syntax:
    CREATE { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [,...n ]
    AS { [ BEGIN ] sql_statement [;] [...n ] [ END ] } [;]

    ALTER { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [,...n ]
    AS { [ BEGIN ] sql_statement [;] [...n ] [ END ] } [;]

    DROP { PROC | PROCEDURE } { [dbo.]procedure_name } [;]

    [ { EXEC | EXECUTE } ]
        { { [database_name.][schema_name.]procedure_name }
          [{ value | @variable }] [,...n ] } [;]

    { EXEC | EXECUTE } ( { @string_variable | [ N ]'tsql_string' } [ +...n ] ) [;]
Unsupported Functionality:
- Stored Proc Nesting
- Output Params
- Return
- Try-Catch

28 Goal:
- Support local and international data
Functionality:
- Fixed server level collation
- User-defined column level collation
- Supporting all Windows collations
- Allow COLLATE clauses in Queries and DML
Benefits:
- Store all the data in PDW w/ additional querying flexibility
- Existing T-SQL DDL and Query scripts
- SQL Server alignment and functionality
Syntax:
    CREATE TABLE T ( c1 varchar(3) COLLATE traditional_Spanish_ci_ai, c2 varchar(10) COLLATE …)
    SELECT c1 COLLATE Latin1_General_Bin2 FROM T
    SELECT * FROM T ORDER BY c1 COLLATE Latin1_General_Bin2
Unsupported Functionality:
- Cannot specify DB collation during DB creation
- Cannot alter column collations for existing tables
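
To show why the collation used in ORDER BY matters, the following Python sketch contrasts a code-point ordering (roughly what a binary collation does) with a case-insensitive, accent-insensitive ordering. It is only an analogy: the key function is a hypothetical approximation, it ignores language-specific rules such as those in traditional Spanish collations, and it does not reproduce SQL Server's collation algorithms.

```python
import unicodedata


def ci_ai_key(text):
    """Approximate a case-insensitive, accent-insensitive sort key.

    Decompose characters, drop the combining accent marks, then casefold.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


# Hypothetical sample values; "\u00c1lamo" is "Alamo" with an accented capital A.
words = ["Zebra", "apple", "\u00c1lamo", "banana"]

# Code-point order: uppercase ASCII sorts before lowercase, accented letters last.
binary_order = sorted(words)

# Case- and accent-insensitive order: closer to what a reader would expect.
ci_ai_order = sorted(words, key=ci_ai_key)
```

The same four values come back in two different orders, which is the kind of difference a COLLATE clause on a column or in a query controls.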

29 Connector for Hadoop
- Bi-directional (import/export) interface between MSFT Hadoop and PDW
- Delimited file support
- Adapter uses existing PDW tools (bulk loader, dwsql)
- Low cost solution that handles all the data: structured and unstructured
- Additional agility, flexibility and choice
Connector for Informatica
- Connector providing PDW source and target (mappings, transformations)
- Informatica uses PDW bulk loader for fast loads
- Leverage existing toolset and knowledge
Connector for Business Objects

31 [Chart: seconds per query]

32 Portal, ETL, PDW, Operational DBs

35 Infiniband GBit link

37 Demo: PowerPivot with SQL Server PDW … just like any other SQL Server

41 Sensor/RFID Data; Blogs, Docs; Web Data; HADOOP

43 Data sources: Sensor/RFID Data; Blogs, Docs; Web Data
SQOOP
SQL Server PDW
Interactive BI/Data Visualization
Users: Application Programmers, DBMS Admin, Power BI Users

44 PDW Hadoop Connector
Linux/Hadoop: HDFS, PDW Hadoop Connector, PDW configuration file
Windows/PDW: Landing Zone, Control Node, Compute Nodes (Compute Node 1 … Compute Node 8)
1. SQOOP export with source (HDFS path) & target (PDW DB & table)
2. Read HDFS data via mappers
3. FTP Server: copies incoming data on Landing Zone
4. Telnet Server: invokes 'DWLoader'
5. (no caption preserved)

45 Demo: Hadoop Sqoop Connector with SQL Server PDW … integrating unstructured data into your end-to-end DW/BI solution

47 Timeline: CALENDAR YEAR 2011 (Q1 Q2 Q3 Q4), CALENDAR YEAR 2012 (Q1 Q2 Q3 Q4)
Releases: Appliance Update 1 (Shipped), Appliance Update 2 (Shipped), Appliance Update 3, V-Next
Feature groups:
- Improved node manageability; Better performance and reduced overhead; OEM requests
- Programmability (Batches, Control flow, Variables, Temp tables); QDR infiniband switch; Onboard Dell
- Cost based optimizer; Native SQL Server drivers, including JDBC; Collations; More expressive query language; Data Movement Services performance; SCOM pack; Stored procedures (subset); Half-rack; 3rd party integration (Informatica, MicroStrategy, Business Objects, HADOOP)
- Columnar store index; Stored procedures; Integrated Authentication; PowerView integration; Workload management; LZ/BU redundancy; Windows 8; SQL Server 2012; Hardware refresh

49
- DBI209 – Big Data, Big Deal
- Lots of BI Tool Specific Related Sessions (PowerPivot, Analysis Services, etc.)
- Breakthrough Insights: Big Data Analytics & Data Warehousing Demo Station
- PDW Deep Dive Session Online from TechEd 2010

50 @sqlserver, @TechEd_europe, #msTechEd
- mva: Microsoft Virtual Academy
- SQL Server 2012 Eval Copy
- Get Certified!
- Hands-On Labs

51
- Connect. Share. Discuss. http://europe.msteched.com
- Learning: Microsoft Certification & Training Resources www.microsoft.com/learning
- TechNet: Resources for IT Professionals http://microsoft.com/technet
- Resources for Developers http://microsoft.com/msdn

52 Evaluations: Submit your evals online at http://europe.msteched.com/sessions
