Data Warehousing: SQL Server Parallel Data Warehouse AU3 update

Presentation on theme: "Data Warehousing: SQL Server Parallel Data Warehouse AU3 update"— Presentation transcript:

1 Data Warehousing: SQL Server Parallel Data Warehouse AU3 update
Dandy Weyn Sr. Technical Product Manager Microsoft Corporation @ilikesql

2 Fast-Growing Industry and Enterprise Data
Problem:
- Data warehousing systems continue to grow at a fast pace
- New types of large data sets and sources have emerged
- Data is not in a uniform format and shape
What is needed? A solution that:
- Scales from a few TBs to PBs of data
- Allows adding capacity/power as needed
- Offers a variety of choices tailored to custom needs
- Handles all the data: structured, semi-structured, and unstructured; Unicode and non-Unicode

3 Microsoft Data Warehouse Offerings
[Chart: Microsoft data warehouse offerings, including BDWA and Fast Track Data Warehouse, compared by effort to build (Very Low, Moderate, Very High), capacity (Variable; 5 TB, 14 TB, 20 TB, 40 TB, 80 TB, 500 TB+), concurrency (Light, Medium, High), and query complexity.]

4 SQL Server | Appliances
- HP Enterprise Data Warehouse Appliance
- HP Business Data Warehouse Appliance
- Dell Parallel Data Warehouse Appliance
- HP Enterprise Database Consolidation Appliance
- HP Business Decision Appliance

5 SQL Server Parallel Data Warehouse
Microsoft and Hewlett-Packard: HP Enterprise Data Warehouse Appliance, optimized for Microsoft SQL Server 2008 R2 Parallel Data Warehouse
Tier-1 Enterprise Data Warehouse Appliance Offering
- High scalability from tens to hundreds of terabytes
- High performance through the MPP system
Flexibility and Choice
- Choice of deployment options through distributed architecture
Most Comprehensive Solution
- Complete data warehouse solution spanning desktop, enterprise data warehouse, and data marts

6 PDW – Client Connectivity
[Diagram labels: SQL; Client Drivers; Support/Patching; ETL Load Interface; Corporate Backup Solution; CONTROL RACK; DATA RACK]

7 Microsoft PDW Appliance – powered by Dell
[Diagram labels: PowerEdge R610 Database Servers; MD3620f Storage Nodes; Control Nodes (R710), Active/Passive; Client Drivers; Management Servers (R610); Data Center Monitoring; Dual InfiniBand; Dual Fibre Channel; Landing Zone (R510); ETL Load Interface; Backup Node (R710 and MD3600f with MD1200s); Corporate Backup Solution; Spare Database Server; Corporate Network; Private Network]

8 PDW – Query Processing
[Diagram: a SQL query processed across the CONTROL RACK and the DATA RACK]

9 CONTROL RACK and DATA RACK
[Diagram labels: SQL; CONTROL RACK; DATA RACK]

10 CONTROL RACK CONTROL NODE
Control node:
- Client connections always go through the control node
- Contains no persistent user data
Parallel Data Warehouse advantages:
- Processes SQL requests
- Prepares execution plan
- Orchestrates distributed execution
- Local SQL Server processes final query plan and aggregates results
Client drivers, provided by DataDirect:
- Open Database Connectivity (ODBC), Object Linking and Embedding Database (OLE DB), Java Database Connectivity (JDBC), and ActiveX Data Objects (ADO.NET)
- Wire protocol (SequeLink)
- Drivers are available in 32-bit and 64-bit versions
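The control node's role described on this slide (orchestrate distributed execution, then aggregate results locally) can be pictured with a minimal scatter-gather sketch. Everything here is an illustrative assumption: the `COMPUTE_NODES` data, the function names, and the choice of a SUM-by-region query are hypothetical and are not PDW internals.

```python
from collections import defaultdict

# Hypothetical data: each inner list is the slice of a sales table
# held by one compute node, as (region, amount) rows.
COMPUTE_NODES = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("north", 2)],
]


def node_partial_sums(rows):
    """Work done on one compute node: aggregate only its local rows."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def control_node_query(nodes):
    """Work done on the control node: fan out, then merge partial results."""
    # On a real appliance these calls would run in parallel on separate servers.
    partials = [node_partial_sums(rows) for rows in nodes]
    final = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] += subtotal
    return dict(sorted(final.items()))


result = control_node_query(COMPUTE_NODES)
print(result)
```

The point of the sketch is that only small partial aggregates travel back to the control node, which matches the slide's statement that the control node holds no persistent user data.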

11 CONTROL RACK MANAGEMENT NODE
Management node:
- Provides support and patching for the appliance
- Holds the image for re-deployment of a compute node
- Holds Active Directory

12 CONTROL RACK LANDING ZONE
Landing zone:
- Provides high-capacity storage for data files from ETL processes
- Is available as a sandbox for other applications and scripts that run on the internal network
- Provides SQL Server Integration Services
[Diagram labels: Source; Landing Zone; Files; Data Loader; Compute Nodes; DWLoader or SQL Server Integration Services]

13 DATA RACK
- Data rack servers: 10 active + 1 passive
- InfiniBand, FC, and Ethernet switching
- Expansion: grow from 1–4 data racks, storage options, test/dev system
- Consists of compute nodes and storage nodes

14 DATA RACK COMPUTE NODE
Data rack:
- Data rack servers: 10 active + 1 passive
- InfiniBand, FC, and Ethernet switching
- Expansion: grow from 1–4 data racks, storage options, test/dev system
Compute node:
- Each MPP node is a highly tuned symmetric multiprocessing (SMP) node with standard interfaces
- Provides dedicated hardware, database, and storage
- Runs SQL Server
- Spare node provides failover in case of node failure
- Drives are configured as RAID 1

15 CONTROL RACK BACKUP NODE
- Provides an integrated backup solution
- Integrates with third-party backup options
- Orderable in different sizes

17 Data Layout Approaches
Replicated: A table structure exists as a full copy within each discrete Parallel Data Warehouse node.
Distributed: A table structure is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the database management system (DBMS).
Ultra Shared-Nothing: Provides the ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
- Small sets of data can be more efficiently stored in full (replicated).
- Certain set operations (such as single-node operations) are more efficient against full sets of data.
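The two layouts can be illustrated with a small sketch: a large table is split by hashing one column, a small table is copied to every node, and each node then joins using only its local data. This is a toy model under stated assumptions: `NUM_NODES`, the CRC32 hash, and the `orders`/`regions` tables are hypothetical and do not reflect PDW's actual hash function or node counts.

```python
import zlib

NUM_NODES = 4  # illustrative only; not a PDW default


def node_for(key):
    """Pick a node from a stable hash of the distribution column value."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


def distribute(rows, column):
    """Split rows across nodes by hashing one column (a distributed table)."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_for(row[column])].append(row)
    return nodes


def replicate(rows):
    """Give every node a full copy of the rows (a replicated table)."""
    return [list(rows) for _ in range(NUM_NODES)]


def local_join(fact_slice, dim_copy):
    """Join on one node using only data already stored on that node."""
    names = {d["region_id"]: d["name"] for d in dim_copy}
    return [(f["order_id"], names[f["region_id"]]) for f in fact_slice]


# Hypothetical tables: a larger fact table and a small lookup table.
orders = [{"order_id": i, "region_id": i % 3} for i in range(12)]
regions = [
    {"region_id": 0, "name": "east"},
    {"region_id": 1, "name": "west"},
    {"region_id": 2, "name": "north"},
]

order_nodes = distribute(orders, "order_id")
region_nodes = replicate(regions)

joined = []
for fact_slice, dim_copy in zip(order_nodes, region_nodes):
    joined.extend(local_join(fact_slice, dim_copy))

print(sorted(joined))
```

Because every node already holds the whole `regions` table, the join needs no rows to cross the network, which is the data-movement saving the slide attributes to mixing distributed and replicated tables.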

18 Ultra Shared-Nothing Architecture
Extends Traditional Shared-Nothing Design
- Pushes shared-nothing architecture into the SMP node; there is I/O and CPU affinity within SMP nodes
- Eliminates contention for user queries
- Uses full resources for each user query
Provides multiple physical instances of tables
- Distributes large tables
- Replicates small tables
- Redistributes rows as needed
Provides Fault Tolerance
- All hardware components have redundancy (including CPUs, disks, networks, power, and storage processors)
- Control and compute nodes use failover clustering
- Management nodes have active and standby states

19 SQL Server 2008 R2 Parallel Data Warehouse Appliance Update 3
- Improve Performance: Cost-Based Optimizer
- Broaden Functionality: Collations and Stored Procedures
- Expand Flexibility: Entry Appliances

20 Theme: Performance at Scale Cost-based optimizer
Goal: Generate better execution plans
Functionality:
- Large space of execution alternatives explored
- Best alternative picked based on the costing
- Cost model that is sensitive to the amount of data to be moved
Benefits:
- Leverages the existing SQL Server optimizer and years of development
- 10X or more performance improvement compared to AU2
- Plan adaptable to heuristics change
[Chart labels: 3X; 6X]
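A cost model that is sensitive to data movement can be sketched as follows: enumerate a few ways to bring the rows of a join together, estimate the bytes each would move, and pick the cheapest. This is a deliberately simplified illustration, not the PDW optimizer; the `Table` class, the three strategies, the byte estimates, and the example sizes are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    size_bytes: int
    distributed_on: str  # column the table is hash-distributed on


def candidate_plans(left, right, join_column, num_nodes):
    """Enumerate data-movement strategies with a rough bytes-moved estimate."""
    # Broadcast: copy one whole table to every node (simplified as size * nodes).
    plans = [
        (f"broadcast {left.name}", left.size_bytes * num_nodes),
        (f"broadcast {right.name}", right.size_bytes * num_nodes),
    ]
    # Redistribute: re-hash only the tables not already distributed on the join column.
    moved = sum(
        t.size_bytes for t in (left, right) if t.distributed_on != join_column
    )
    label = "co-located join, no movement" if moved == 0 else "redistribute on join column"
    plans.append((label, moved))
    return plans


def cheapest_plan(left, right, join_column, num_nodes):
    """Pick the alternative with the smallest estimated data movement."""
    return min(candidate_plans(left, right, join_column, num_nodes), key=lambda p: p[1])


GB = 10**9
orders = Table("orders", 500 * GB, "order_id")
small_customers = Table("customers", 2 * GB, "region")
large_customers = Table("customers", 200 * GB, "region")

print(cheapest_plan(orders, small_customers, "customer_id", num_nodes=10))
print(cheapest_plan(orders, large_customers, "customer_id", num_nodes=10))
```

With these made-up sizes, broadcasting wins when one side is small, and redistribution wins once the smaller side grows, which is the kind of trade-off a movement-aware cost model is meant to capture.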

21 Theme: Performance at Scale Zero data conversions
Goal: Eliminate CPU utilization spent on data conversions
Functionality:
- Uses ODBC instead of ADO.NET for reading and writing data
- Minimizes appliance resource utilization for data moves
Benefits:
- Better resource (CPU) utilization
- Move operations 6x or more faster than in AU2 *
* Improvement factor calculated based on PDW PGQL

22 Theme: Performance at Scale PDW entry appliance ("… for the right price …")
Goal: Appliance for the lower end of the market
Functionality:
- ~40% less processing power (4+1 compute nodes)
- Up to 50 TB disk capacity (4 storage arrays)
- Dell-based hardware reference architecture
- Complete PDW functionality (no less, no more)
Benefits:
- ~40% cheaper than a 1-rack appliance
- The lowest cost/TB on the market
- Increased flexibility and choice (appliances for different needs)

23 Theme: SQL Server Compatibility Stored procedures
Goal: Common code encapsulation and reuse
Functionality:
- System and user-defined stored procedures
- Invocation using RPC or EXECUTE
- Support for control flow logic and input parameters
Benefits:
- Enables common logic reuse
- Allows porting existing scripts
- Increases compatibility with SQL Server
Syntax:
    CREATE { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [ ,...n ]
    AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] }
    [;]

    ALTER { PROC | PROCEDURE } [dbo.]procedure_name
        [ { @parameter data_type } [ = default ] ] [ ,...n ]

    DROP { PROC | PROCEDURE } { [dbo.]procedure_name } [;]

    [ { EXEC | EXECUTE } ]
        {
          { [database_name.][schema_name.]procedure_name }
            [{ value }] [ ,...n ]
        }
    [;]

    { EXEC | EXECUTE }
        ( { @string_variable | [ N ]'tsql_string' } [ + ...n ] )
    [;]

24 Theme: Improved Integration Hadoop connector
Goal: Handle both structured and unstructured data
Functionality:
- Bi-directional (import/export) interface between MSFT Hadoop and PDW
- Delimited file support
- Adapter uses existing PDW tools (bulk loader, dwsql)
- Data transfer to/from the PDW Landing Zone node over an FTP channel
Benefits:
- Low-cost solution that handles all the data
- Additional agility, flexibility, and choice
[Diagram labels: Hadoop; HDFS; Config file; SQOOP-based adapter; HDFS; Landing Zone Node; Bulk Data Loader; PDW agent; PDW; dwsql]

25 Theme: Improved Integration
Goal: Support local and international customers / data
Functionality:
- Fixed server-level collation
- User-defined column-level collation
- Supports all Windows collations
- Allows COLLATE clauses in queries and DML
Benefits:
- Store all the data in PDW with additional querying flexibility
- Existing DDLs and query scripts
- SQL Server alignment and functionality
Examples:
    CREATE TABLE T ( c1 varchar(3) COLLATE traditional_Spanish_ci_ai, c2 varchar(10) COLLATE …)
    SELECT c1 COLLATE Latin1_General_Bin2 FROM T
    SELECT * FROM T ORDER BY c1 COLLATE Latin1_General_Bin2

26 Distributed Architecture / Hub - Spoke
[Diagram labels: Fast Track; SSRS; Excel/Excel Services; SharePoint; SSIS; PDW; PerformancePoint Services; SSAS; Source Systems; PowerPivot]

27 Flexible Business Alignment
- Parallel database copy technology enables rapid data movement and consistency between the EDW and data marts
- Supports user groups with very different service-level agreements (SLAs): performance, capacity, loading, concurrency
- Create SQL Server 2012, Fast Track Data Warehouse for SQL 2012, and SQL Server Analysis Services data marts
- A distributed architecture gives you the flexibility to add or change diverse workloads or user groups while maintaining data consistency across the enterprise
