
1 2012 © Trivadis BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
TechTalk: Best Scalability Through Massively Parallel Processing with "Parallel Data Warehouse" (PDW)
Meinrad Weiss

2 Agenda
1. Overview of Microsoft Data Warehousing Solutions
2. Parallel Data Warehouse (PDW): What is it? MPP vs. SMP
3. Hardware Architecture: Control Rack and Data Rack
4. Tools (Management Dashboard, Nexus Query Tool, DWSQL)
5. Distribution and Replication of Data
6. Table Constraints and Data Type Limitations
7. Comparison: Load Speed with SMP versus PDW
8. Basic Shared Nothing / Shuffle Moves
9. Concrete Offerings

3 Microsoft Data Warehousing Solutions
High-performance and cost-effective data warehouse systems. Attributes are listed in the order they appear on the slide.
• Positioning: Scalable and reliable platform for data warehousing on any hardware | Reference Architectures offering best price performance for data warehousing | Scalable and reliable platform for data warehousing on any hardware | Appliance for high-end data warehousing requiring highest scalability, performance, or complexity
• Best suited for: data marts or small to mid-sized EDWs | data marts or small to mid-sized DWs with scan-centric workloads | large data marts or mid-sized EDWs
• Offers flexibility in hardware and architecture
• Delivery: Software only | Reference Architectures (software and hardware) | Software only | DW appliance (fully integrated software and hardware)
• Scaling model: Scale-up DW | Scale-out DW with MPP
• Data volume: 10s of TB | 2–80 TB | 10s of TB | 10s–100s of TB

4 Data Warehouse: Products Positioning (diagram)
Labels: Appliance Simplicity, Scale, Complexity, HA by default, SW-HW integration.
Products shown: SQL Server 2008 R2 Fast Track, SQL Server 2008 R2 Enterprise, PDW, SQL Server 2008 R2 Datacenter, PDW with Distributed Data Architecture.

5 Data Warehouse: Products Positioning (diagram)
Labels: 100% SQL Server 2008 R2 Compatibility, Scale, Complexity, HA by default, SW-HW integration.
Products shown: SQL Server 2008 R2 with Fast Track Reference Architecture, SQL Server 2008 R2 Enterprise, PDW, SQL Server 2008 R2 Datacenter, PDW with Distributed Data Architecture.

6 MPP vs. SMP
MPP (Massively Parallel Processing)
• Uses many separate CPUs running in parallel to execute a single program
• Each CPU has its own memory and disks
• High-speed communications between nodes
• Applications must be segmented
SMP (Symmetric Multiprocessing)
• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers
• All SQL Server implementations up until now have been SMP

7 Two Hardware Vendors: HP and Dell
• Microsoft + Dell: Parallel Data Warehouse Appliance
• Microsoft + HP: Enterprise Data Warehouse Appliance

8 Appliance layout (diagram): a Control Rack containing the Control Node, Management Node, Landing Zone, and Backup Node, plus one or more Data Racks.


10 Control Node (Control Rack)
• Client connections always go through the control node
• Windows Failover Cluster for availability
• Contains no persistent user data
• Processes SQL requests
• Prepares the execution plan
• Orchestrates distributed execution
• Local SQL Server processes the final query plan and aggregates results

11 Management Node (Control Rack)
• Provides support and patching for the appliance
• Holds the image for re-deployment of a compute node
• Holds Active Directory

12 Landing Zone (Control Rack)
• Provides high-capacity storage for data files from ETL processes
• Is available as a sandbox for other applications and scripts that run on the internal network
• Provides SQL Server Integration Services
Load path (diagram): Source, Landing Zone Files, Data Loader (DWLoader or SQL Server Integration Services), Compute Nodes.

13 Backup Node (Control Rack)
• Provides an integrated backup solution
• Integrates with third-party backup products
• Orderable in different sizes

14 Data Rack(s)
• Data rack servers: 5/10 active + 1 passive per rack
• InfiniBand, FC, and Ethernet switching
• Expansion: grow from 1/2 to 4 data racks; storage options; test/dev system
• Consists of compute nodes and storage nodes
• Shared nothing
• Spare node provides failover in case of node failure

15 Compute Node and Storage Node
• Each MPP node is a highly tuned symmetric multiprocessing (SMP) node with standard interfaces
• More or less multiple Fast Track servers
• Provides dedicated hardware, database, and storage
• Runs SQL Server 2008
• Local drives are configured as RAID 1

16 Connectivity and Tools: Nexus Query Chameleon, DWSQL

17 Web-Based Management Dashboard

18 System Center (SCOM)

19 Distribution and Replication of Data
Star schema used in the example:
• Sales Facts: Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold
• Time Dim: Date Dim ID, Calendar Year, Calendar Qtr, Calendar Mo, Calendar Day
• Store Dim: Store Dim ID, Store Name, Store Mgr, Store Size
• Product Dim: Prod Dim ID, Prod Category, Prod Sub Cat, Prod Desc
• Mktg Campaign Dim: Mktg Camp ID, Camp Name, Camp Mgr, Camp Start, Camp End
The larger (> 10 B) fact table is hash distributed across all compute nodes as SF-1, SF-2, SF-3, and SF-4.

20 Distribution and Replication of Data
• The Sales Facts table (Date Dim ID, Store Dim ID, Prod Dim ID, Mktg Camp ID, Qty Sold, Dollars Sold) is split into SF-1, SF-2, SF-3, and SF-4, one part per compute node.
• Smaller (< 5 GB) dimension tables are replicated on every compute node: each node holds a full copy of Time Dim (TD), Product Dim (PD), Store Dim (SD), and Mktg Campaign Dim (MD).
Result: fact-dimension joins can be performed locally.

21 Creating a Database
    CREATE DATABASE PDW
    WITH (AUTOGROW = ON,
          REPLICATED_SIZE = 1024 GB,   -- per node
          DISTRIBUTED_SIZE = GB,       -- whole system
          LOG_SIZE = 1024 GB);

22 Distribution on a PDW
    CREATE TABLE myTable (column Defs)
    WITH (DISTRIBUTION = HASH (id));
• The statement is executed on every compute node (PDW Node 1, PDW Node 2, … PDW Node 10).
• Each node creates 8 tables: Create Table _a, Create Table _b, … Create Table _h.
Final result: 80 individual tables across a 10-node (1 data rack) appliance.
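The placement of rows onto those 80 tables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say which hash function PDW uses, so CRC32 stands in for it here, and the names `place_row` and `rows_per_node` are hypothetical.

```python
import zlib
from collections import Counter

NODES = 10                   # compute nodes in one data rack (from the slide)
DISTRIBUTIONS_PER_NODE = 8   # tables _a .. _h on each node (from the slide)
TOTAL = NODES * DISTRIBUTIONS_PER_NODE  # 80 physical tables


def place_row(key: int) -> tuple[int, str]:
    """Map a distribution-key value to (node number, table suffix).

    CRC32 is a stand-in hash chosen for illustration only.
    """
    bucket = zlib.crc32(str(key).encode("utf-8")) % TOTAL
    node = bucket // DISTRIBUTIONS_PER_NODE + 1              # 1..10
    suffix = "_" + "abcdefgh"[bucket % DISTRIBUTIONS_PER_NODE]
    return node, suffix


def rows_per_node(keys) -> Counter:
    """Count how many of the given key values land on each node."""
    return Counter(place_row(k)[0] for k in keys)


if __name__ == "__main__":
    print(place_row(42))
    print(sorted(rows_per_node(range(100_000)).items()))
```

Because the target depends only on the key value, two tables hashed on the same key place matching rows on the same node, which is what makes node-local joins possible.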

23 Create Replicated Table
    CREATE TABLE DimProduct(
        ProductId   BIGINT NOT NULL,
        Description VARCHAR(50),
        CategoryId  INT NOT NULL,
        ListPrice   DECIMAL(12,2))
    WITH (DISTRIBUTION = REPLICATE);
• Creates tables on each of the individual compute nodes and assigns them to the REPLICATED file group.
• Data compression is automatically turned on.
• CREATE TABLE statement syntax varies slightly from its syntax in standard Transact-SQL.

24 Data Type Limitations
• Most scalar data types supported by SQL Server 2008 are supported by PDW
• Main exceptions:
  • Text (and related BLOB data types)
  • XML
  • SQL Variant
  • Timestamp
  • System and CLR UDTs
• IDENTITY/DEFAULT constraints are not supported
• Character data types are case sensitive
  • PDW uses collation Latin1_General_BIN
PDW / SQL Server data types listed on the slide: bigint, binary, bit, char/nchar, date, time, datetime, datetime2, datetimeoffset, decimal, float, geography/geometry, hierarchyid, image, int, money, numeric, real, smalldatetime, smallint, smallmoney, sql_variant, sysname, text/ntext, timestamp, tinyint, uniqueidentifier, varbinary, varchar/nvarchar, xml.

25 Performance Tests: Data Load on SMP System
Load a single 75 GB flat file with 600 million rows on SQL Server SMP using bulk copy (rows/sec chart).
Elapsed time: 1 hour 48 min.

26 Data Load on PDW
Loading the same single flat file into PDW (75 GB / 600 million rows):
    Option       Load time       MB/sec
    Reload       09 min 35 sec   133
    Append       09 min 42 sec   131
    FastAppend   02 min 23 sec   534
    dwloader.exe -i D:\TPCH\lineItem.tbl -M Fastappend -E -m -d tpch_100gb -E -c -b rt value -rv 100 -R LineItem.tbl.rejects -e ascii -t "|" -r \r\n -U sa -P {password} -T tpch_100gb.dbo.lineitem_Load
Result: 45 times faster.
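The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that "75 GB" means 75 × 1024 MB (the exact file size is not given, so the computed throughput differs slightly from the slide) and takes the SMP bulk-copy time of 1 hour 48 min as the baseline; the function names are illustrative.

```python
def seconds(hours=0, minutes=0, secs=0):
    """Convert an elapsed time to seconds."""
    return hours * 3600 + minutes * 60 + secs


FILE_MB = 75 * 1024  # assumption: "75 GB" = 75 * 1024 MB

SMP_BULK_COPY = seconds(hours=1, minutes=48)   # SMP baseline from the deck
PDW_LOADS = {                                  # load times from the slide
    "Reload": seconds(minutes=9, secs=35),
    "Append": seconds(minutes=9, secs=42),
    "FastAppend": seconds(minutes=2, secs=23),
}


def throughput_mb_per_sec(elapsed):
    """Average load throughput for the whole file."""
    return FILE_MB / elapsed


def speedup_vs_smp(elapsed):
    """How many times faster than the SMP bulk copy."""
    return SMP_BULK_COPY / elapsed


if __name__ == "__main__":
    for option, elapsed in PDW_LOADS.items():
        print(option,
              round(throughput_mb_per_sec(elapsed)),
              round(speedup_vs_smp(elapsed), 1))
```

Under these assumptions FastAppend comes out at roughly 45 times the SMP speed, in line with the slide, while Reload and Append are about 11 times faster.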

27 Data Load on PDW: Read Throughput
    dwloader.exe -i D:\TPCH\lineItem.tbl -M Fastappend -E -m -d tpch_100gb -E -c -b rt value -rv 100 -R LineItem.tbl.rejects -e ascii -t "|" -r \r\n -T tpch_100gb.dbo.lineitem_Load
Single file: 600 MB/sec read.

28 Copy Table Within PDW
Table with 600 million rows (LineItem):
    SELECT * INTO lineitem_copy FROM tpch_100gb.dbo.lineitem
Result: … min 07 sec (SMP) versus 2 min 12 sec on PDW, 14 times faster.

29 Hub and Spoke (diagram)
• Hub: Central EDW Hub, fed through the Landing Zone by ETL Tools
• Spokes: Departmental Reporting, Regional Reporting, High-Performance Reporting, Regional Reporting with Business Decision Appliance, Third-Party RDBMS, Third-Party Data Integration, Mobile Applications

30 Remote Table Copy
Create a heap table on the SMP destination server:
    CREATE REMOTE TABLE tpch_Henk.dbo.LineItem_test *)
    AT ('Data Source = NYCPDW-LZ01,1433; User ID = sa; Password = x;')
    AS SELECT * FROM tpch_100gb.dbo.lineitem_load
Check the status of the copy operation:
    SELECT * FROM sys.dm_pdw_dms_workers
    WHERE type = 'PARALLEL_COPY_READER'
      AND destination_info = '[skypdw_Henk].[dbo].[LineItem_test]'
*) Requires an InfiniBand HCA card in the remote SQL Server SMP.

31 Result: 600 million rows, remote table copy: 21:25 minutes

32 Basic Shared Nothing Join (Replicated/Distributed)
    SELECT ss_key, Cost
    FROM item_dim a JOIN store_sales b ON a.color = b.color
    WHERE a.color = 'Yellow'
Item Dim (replicated table, identical on Node 1 and Node 2):
    Color    Cost
    Red      10
    Green    15
    Blue     25
    Yellow   5
Store Sales (distributed table):
    Node 1: ss_key Color Qty     Node 2: ss_key Color Qty
            1      Red    5              2      Red    3
            3      Blue   10             4      Blue   11
            5      Yellow 12             6      Yellow 17
            7      Green  7              8      Green  1
Result sets: Node 1: 5,5; Node 2: 6,5; final result set: 5,5 : 6,5
Join type: Shared Nothing
Distribution: Compatible
• Replication satisfies compatibility for inner joins
• Store Sales distribution key not used
Streaming results
• Results streamed to client
• No aggregation (processing) on Control Node required
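The slide's example can be replayed as a small Python simulation. It is a sketch of the idea only, not of PDW internals: each node joins its own Store Sales rows against its local copy of Item Dim, and the per-node results are simply concatenated. The function names are illustrative.

```python
# Replicated table: every node holds the same full copy (color -> cost).
ITEM_DIM = {"Red": 10, "Green": 15, "Blue": 25, "Yellow": 5}

# Distributed table: each node holds only its own rows (ss_key, color, qty).
STORE_SALES = {
    1: [(1, "Red", 5), (3, "Blue", 10), (5, "Yellow", 12), (7, "Green", 7)],
    2: [(2, "Red", 3), (4, "Blue", 11), (6, "Yellow", 17), (8, "Green", 1)],
}


def local_join(sales_rows, item_dim, color):
    """Join one node's sales rows with its local Item Dim copy."""
    return [
        (ss_key, item_dim[row_color])
        for ss_key, row_color, _qty in sales_rows
        if row_color == color and row_color in item_dim
    ]


def run_query(color):
    """Run the join on every node and stream the results back in node order."""
    results = []
    for node in sorted(STORE_SALES):
        results.extend(local_join(STORE_SALES[node], ITEM_DIM, color))
    return results


if __name__ == "__main__":
    print(run_query("Yellow"))  # [(5, 5), (6, 5)]
```

No rows cross node boundaries, and the combining step does no aggregation, which matches the slide's "no processing on Control Node" note.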

33 Basic Shared Nothing Join (Distributed/Distributed)
    SELECT a.color, b.Qty
    FROM web_sales a JOIN store_sales b ON ws_key = ss_key
    WHERE a.color = 'Red'
Web Sales (distributed table):
    Node 1: ws_key Color Qty     Node 2: ws_key Color Qty
            1      Red    15             2      Red    13
            3      Blue   20             4      Blue   21
            5      Yellow 22             6      Yellow 27
            7      Green  17             8      Green  11
Store Sales (distributed table):
    Node 1: ss_key Color Qty     Node 2: ss_key Color Qty
            1      Red    5              2      Red    3
            3      Blue   10             4      Blue   11
            5      Yellow 12             6      Yellow 17
            7      Green  7              8      Green  1
Result sets: Node 1: Red,5; Node 2: Red,3; final result set: Red,5 : Red,3
Join type: Shared Nothing
Distribution: Compatible
• Join includes compatible distribution keys with compatible data types
Streaming results
• Results streamed to client
• No aggregation (processing) on Control Node required
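A short Python sketch of why this join needs no data movement: both tables are distributed on the join key, so matching rows already sit on the same node. The placement rule `node_for` (odd keys on Node 1, even keys on Node 2) is an assumption read off the slide's sample data and stands in for the real hash function.

```python
# Rows are (key, color, qty); both tables are distributed on their key column.
WEB_SALES = {
    1: [(1, "Red", 15), (3, "Blue", 20), (5, "Yellow", 22), (7, "Green", 17)],
    2: [(2, "Red", 13), (4, "Blue", 21), (6, "Yellow", 27), (8, "Green", 11)],
}
STORE_SALES = {
    1: [(1, "Red", 5), (3, "Blue", 10), (5, "Yellow", 12), (7, "Green", 7)],
    2: [(2, "Red", 3), (4, "Blue", 11), (6, "Yellow", 17), (8, "Green", 1)],
}


def node_for(key):
    """Stand-in for the distribution hash: odd keys -> node 1, even -> node 2."""
    return 1 if key % 2 else 2


def is_colocated(table):
    """True if every row sits on the node its key maps to."""
    return all(node_for(row[0]) == node
               for node, rows in table.items() for row in rows)


def run_query(color):
    """Join web_sales and store_sales on key, node by node, with no data movement."""
    results = []
    for node in sorted(WEB_SALES):
        store_qty = {ss_key: qty for ss_key, _c, qty in STORE_SALES[node]}
        for ws_key, ws_color, _qty in WEB_SALES[node]:
            if ws_color == color and ws_key in store_qty:
                results.append((ws_color, store_qty[ws_key]))
    return results


if __name__ == "__main__":
    print(is_colocated(WEB_SALES), is_colocated(STORE_SALES))
    print(run_query("Red"))  # [('Red', 5), ('Red', 3)]
```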

34 Distribution Incompatible: Shuffle Move
Shuffle: looks for a table that is a candidate for redistribution through a Shuffle operation:
• At least one table in the query plan uses a distribution key in its join criteria.
• Any table that is not joined on its distribution key is targeted for Shuffle first.
• The leftmost table is chosen if multiple tables meet these criteria.
• Data types for the join keys must be compatible.
• The join must always be true (for example, A = B OR B = C is not valid).

35 Redistribution Join: Shuffle
    SELECT vs_key, a.ord, b.qty
    FROM vendor_sales a JOIN store_sales b ON a.vs_key = b.VID
    WHERE a.color = 'Red'
Vendor Sales (distributed table):
    Node 1: vs_key Color Ord     Node 2: vs_key Color Ord
            11     Red    15             2      Red    13
            32     Blue   20             4      Blue   21
            54     Yellow 22             6      Yellow 27
            78     Green  17             8      Green  11
Store Sales (distributed table, columns ss_key, VID, Qty): shown on both nodes before and after the shuffle.
Result sets: Node 1: 11,15,5; Node 2: 2,13,3; final result set: 11,15,5 : 2,13,3
Join type: Redistribution
• Tables are not co-located on their respective distribution keys
Distribution: Incompatible
• Distribution used from left table (vendor_sales) only
Shuffle-Move operation
• Data from right table (Store_Sales) is rebuilt: DK = VID
• Query is now distribution compatible
Streaming results
• Results streamed to client
• No aggregation (processing) on Control Node required
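The shuffle step can be illustrated with a short Python simulation. The Vendor Sales rows come from the slide, but the Store Sales rows are hypothetical: the transcript does not preserve them, so they were chosen only to reproduce the slide's result set. The placement rule `node_for` is likewise a stand-in that mimics the slide's vs_key layout, not PDW's real hash function.

```python
# From the slide: (vs_key, color, ord), distributed on vs_key.
VENDOR_SALES = {
    1: [(11, "Red", 15), (32, "Blue", 20), (54, "Yellow", 22), (78, "Green", 17)],
    2: [(2, "Red", 13), (4, "Blue", 21), (6, "Yellow", 27), (8, "Green", 11)],
}

# HYPOTHETICAL rows: (ss_key, VID, qty), initially distributed on ss_key.
STORE_SALES = {
    1: [(1, 2, 3), (3, 54, 8)],
    2: [(2, 11, 5), (4, 6, 4)],
}


def node_for(key):
    """Stand-in for the distribution hash, matching the slide's vs_key layout."""
    return 1 if key > 10 else 2


def shuffle(table, key_index):
    """Redistribute every row to the node that owns its new distribution key."""
    moved = {node: [] for node in table}
    for rows in table.values():
        for row in rows:
            moved[node_for(row[key_index])].append(row)
    return moved


def run_query(color):
    """Shuffle store_sales on VID, then join locally on each node."""
    store_by_vid = shuffle(STORE_SALES, key_index=1)
    results = []
    for node in sorted(VENDOR_SALES):
        for vs_key, vs_color, ord_ in VENDOR_SALES[node]:
            if vs_color != color:
                continue
            for _ss_key, vid, qty in store_by_vid[node]:
                if vid == vs_key:
                    results.append((vs_key, ord_, qty))
    return results


if __name__ == "__main__":
    print(shuffle(STORE_SALES, key_index=1))
    print(run_query("Red"))  # [(11, 15, 5), (2, 13, 3)]
```

Only the table that is not joined on its distribution key is moved; Vendor Sales stays in place, as the slide describes.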

36 Appliance Update (AU)
• Performance: up to 10x improvement
• Data Movement Services
• New cost-based Query Optimizer
• New Data Movement Service
• 1/2 rack appliances from HP and Dell
• System Center 2012 integration (SCOM pack)
• And YES … support for stored procedures (subset)
• Collations: full support for international data
• Native SQL Server drivers

37 Enterprise Data Warehouse
Mark Wunderli, Infra2Apps Technology Consultant, Hewlett-Packard (Switzerland) GmbH

38 HP & Microsoft Data Warehousing Continuum (Techtalk Trivadis, 9 May 2012)
Scales from 1 TB to hundreds of TBs
• Balanced solutions ideal for data marts through EDW with scan-centric workloads
• Packaged and custom support
• From SMB to Enterprise
• Built on HP ProLiant G7
Reference Architectures and Appliances:
• Business Data Warehouse: ProLiant DL370 (2P), up to 12 cores, internal HDD (6 TB)
• Basic RA: ProLiant DL38x G7 (2P), up to 24 cores, P2000 G3 (up to 20 TB)
• Mainstream RAs: ProLiant DL58x G7 (4P), up to 48 cores, P2000 G3 (up to 60 TB)
• Premium RA: ProLiant DL980 G7 (8P), up to 80 cores, P2000 G3 (up to 95+ TB)
• HP Enterprise Data Warehouse: per rack 11 x ProLiant DL360 (2P) and 10 x P2000 G3 (56–150 TB); up to 4 data racks (600 TB)

39 Enterprise Data Warehouse Components (diagram)
Control Nodes, Management Nodes, Landing Zone, Backup, and Data Racks (… compute nodes per rack), connected by InfiniBand, Fibre Channel, and Ethernet.

40 High-Level PDW Architecture (diagram)
• Control Rack: Control Node (active/passive), Management Node (active/passive), Landing Zone(s), Backup Node
• Data Rack: Database Server Nodes (Compute Nodes), Spare Node, Storage Nodes (MSA) attached via Fibre Channel
• Interconnect: InfiniBand
• External interfaces: Client Drivers, Corporate Backup Solution, ETL Load Interface

41 Enterprise Data Warehouse Appliance: ½ Rack Configurations
Entry-level option for MPP technology
Data rack configuration:
• Lower capacity requirements, same components (4+1 compute / 4 storage nodes)
• HDD capacities: 300 GB (LFF/SFF), 600 GB SFF, and 1 TB LFF
• Control rack unchanged
Upgrade to full data rack:
• At introduction, a maximum of 1 half data rack EDW is orderable
• Upgrade to a full data rack is possible
Diagram: half-rack EDW configuration ("For Backup" label).

42 Total Solution Support
Microsoft and HP work together to provide a seamless support experience. Customers choose the service level from Microsoft and from HP to meet their business needs.
Microsoft
• Premier Support*: 24x7 reactive support with on-site response; proactive services; technical account management
• Premier Mission Critical: all features of the underlying Premier plan above, plus faster reactive support response time with on-site solution engineering support; prioritized access to Microsoft product groups; solution supportability review and architectural guidance for maximum performance or upgrade through add-on
HP
• Support Plus 24: reactive 24x7 hardware and software support for HP appliance components with a 4-hour on-site hardware response
• Proactive 24 Service: integrated hardware and software support including proactive and reactive services to improve stability and availability across your IT environment
• Critical Service: comprehensive support solution designed to help minimize the business impact of downtime for mission-critical applications
* A Premier support plan (Standard level or above) is a prerequisite for PDW customers.
HP and Microsoft Converged Systems

43 Merci! Thank you! Grazie mille! Vielen Dank!
Hewlett-Packard (Switzerland) GmbH, Trivadis AG

