Data Warehousing: SQL Server Parallel Data Warehouse AU3 update

Dandy Weyn, Sr. Technical Product Manager, Microsoft Corporation (@ilikesql)

Fast-Growing Industry and Enterprise Data
Problem:
- Data warehousing systems continue to grow at a fast pace
- New types of large data sets and sources have emerged
- Data is not in a uniform format and shape
What is needed? A solution that:
- Scales from a few TBs to PBs of data
- Allows adding capacity/power as needed
- Offers a variety of choices tailored to custom needs
- Handles all the data: structured, semi-structured, and unstructured; Unicode and non-Unicode

Microsoft Data Warehouse Offerings (chart comparing BDWA and the Fast Track Data Warehouse by effort to build [very high, moderate, very low], capacity [variable; 5 TB, 14 TB, 20 TB, 40 TB, 80 TB, 500 TB+], concurrency [light, medium, high], and query complexity)

SQL Server | Appliances
- HP Enterprise Data Warehouse Appliance
- HP Business Data Warehouse Appliance
- Dell Parallel Data Warehouse Appliance
- HP Enterprise Database Consolidation Appliance
- HP Business Decision Appliance

SQL Server Parallel Data Warehouse
Microsoft and Hewlett-Packard: HP Enterprise Data Warehouse Appliance, optimized for Microsoft SQL Server 2008 R2 Parallel Data Warehouse
SQL Server Parallel Data Warehouse: a tier-1 enterprise data warehouse appliance offering
- High scalability: from tens to hundreds of terabytes
- High performance: through the MPP (massively parallel processing) system
- Flexibility and choice: a choice of deployment options through a distributed architecture
- Most comprehensive solution: a complete data warehouse solution spanning desktop, enterprise data warehouse, and data marts

PDW: Client Connectivity (diagram: control rack and data rack, with external interfaces for SQL client drivers, support/patching, the ETL load interface, and the corporate backup solution)

Microsoft PDW Appliance, powered by Dell PowerEdge
- Database servers (R610) with MD3620f storage nodes
- Control nodes (R710), active/passive: client drivers
- Management servers (R610): data center monitoring
- Landing zone (R510): ETL load interface
- Backup node (R710 and MD3600f with MD1200s): corporate backup solution
- Spare database server
- Dual InfiniBand and dual Fibre Channel interconnects
- Corporate network and private network

PDW: Query Processing (diagram: a SQL query, the control rack, and the data rack)

CONTROL RACK: CONTROL NODE
- Client connections always go through the control node
- Contains no persistent user data
Parallel Data Warehouse advantages:
- Processes SQL requests
- Prepares the execution plan
- Orchestrates distributed execution
- The local SQL Server instance processes the final query plan and aggregates results
Client drivers, provided by DataDirect:
- Open Database Connectivity (ODBC), Object Linking and Embedding Database (OLE DB), Java Database Connectivity (JDBC), and ActiveX® Data Objects (ADO.NET) drivers
- Wire protocol (SequeLink)
- Drivers are available for 32-bit and 64-bit systems
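The control node's role (prepare a plan, fan work out to the compute nodes, then aggregate the partial results locally) can be illustrated with a minimal Python sketch of two-phase aggregation. This is not PDW's engine: the shard lists stand in for compute-node data, and the function names `local_aggregate` and `control_node_query` are hypothetical. It assumes a `GROUP BY region` query computing SUM, COUNT, and AVG.

```python
from collections import defaultdict

def local_aggregate(rows):
    """Phase 1, run on each compute node: partial SUM and COUNT per group."""
    partial = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        partial[region][0] += amount
        partial[region][1] += 1
    return dict(partial)

def control_node_query(node_shards):
    """Phase 2, run on the control node: merge partials into the final result."""
    # In a real MPP system the phase-1 calls would run in parallel on each node.
    partials = [local_aggregate(shard) for shard in node_shards]
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for region, (total, count) in partial.items():
            merged[region][0] += total
            merged[region][1] += count
    # AVG is derived from the merged SUM and COUNT; averaging per-node
    # averages would weight small shards incorrectly.
    return {region: (total, count, total / count)
            for region, (total, count) in merged.items()}

# Hypothetical shards of a (region, amount) sales table on three compute nodes
shards = [
    [("EU", 10), ("US", 5)],
    [("EU", 30)],
    [("US", 15), ("US", 10)],
]
result = control_node_query(shards)
print(result)  # {'EU': (40, 2, 20.0), 'US': (30, 3, 10.0)}
```

Shipping SUM and COUNT from each node, rather than a per-node AVG, is what keeps the final merge correct.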

CONTROL RACK: MANAGEMENT NODE
- Provides support and patching for the appliance
- Holds the image for redeploying compute nodes
- Hosts Active Directory

CONTROL RACK: LANDING ZONE
- Provides high-capacity storage for data files from ETL processes
- Is available as a sandbox for other applications and scripts that run on the internal network
- Provides SQL Server Integration Services
Load flow: source → landing zone files → data loader (DWLoader or SQL Server Integration Services) → compute nodes
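As a rough illustration of the load path (files in the landing zone, a loader, rows delivered to compute nodes), here is a hypothetical Python sketch that parses a pipe-delimited file, sets aside malformed rows, and batches the rest by the compute node chosen from a hash of a distribution column. The delimiter, the CRC32 hash, the 10-node count, and the helper names are assumptions for illustration; DWLoader's real file options and hash function are not described in this deck.

```python
import zlib
from collections import defaultdict

NUM_COMPUTE_NODES = 10  # assumption: one data rack with 10 active compute nodes

def target_node(key: str, num_nodes: int = NUM_COMPUTE_NODES) -> int:
    """Pick a compute node from the distribution-column value.

    zlib.crc32 is used because it is stable across runs; Python's built-in
    hash() for str is randomized per process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def load_file(lines, expected_columns, dist_col, num_nodes=NUM_COMPUTE_NODES):
    """Split delimited lines into per-node batches plus a list of rejected rows."""
    batches = defaultdict(list)
    rejected = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("|")
        if len(fields) != expected_columns:
            rejected.append((line_no, line))  # wrong column count: set aside
            continue
        batches[target_node(fields[dist_col], num_nodes)].append(fields)
    return dict(batches), rejected

# Hypothetical landing-zone file contents: id|name|amount
landing_zone_file = ["1|alice|10\n", "2|bob|20\n", "bad line\n", "3|carol|30\n"]
batches, rejected = load_file(landing_zone_file, expected_columns=3, dist_col=0)
print(rejected)  # [(3, 'bad line\n')]
```

Because the target node depends only on the distribution-column value, rows with the same key always land on the same node, which is what later lets joins on that column run locally.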

DATA RACK
- Data rack servers: 10 active + 1 passive
- InfiniBand, Fibre Channel, and Ethernet switching
- Expansion: grow from 1 to 4 data racks; storage options; test/dev system
- Consists of compute nodes and storage nodes

DATA RACK: COMPUTE NODE
- Each MPP node is a highly tuned symmetric multiprocessing (SMP) node with standard interfaces
- Provides dedicated hardware, database, and storage
- Runs SQL Server
- A spare node provides failover in case of node failure
- Drives are configured as RAID 1
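The "10 active + 1 passive" layout means one spare can take over for any single failed compute node. The hypothetical Python class below models only that bookkeeping: which server currently owns each node's storage. It assumes the spare can reach the failed node's storage over the shared Fibre Channel fabric, and it ignores how failover clustering actually detects the failure and restarts SQL Server.

```python
class DataRack:
    """Toy model of N active compute nodes plus passive spares (hypothetical)."""

    def __init__(self, active_nodes, spare_nodes):
        # Each active node initially owns the storage attached to it.
        self.owner = {f"storage-{i}": node for i, node in enumerate(active_nodes)}
        self.spares = list(spare_nodes)

    def fail(self, node):
        """Move every storage unit owned by `node` to the next free spare."""
        if node not in self.owner.values():
            raise ValueError(f"{node} is not an active node")
        if not self.spares:
            raise RuntimeError("no passive node left to take over")
        spare = self.spares.pop(0)
        for storage, current in list(self.owner.items()):
            if current == node:
                self.owner[storage] = spare
        return spare

rack = DataRack([f"node-{i}" for i in range(10)], ["node-spare"])
print(rack.fail("node-3"))      # node-spare
print(rack.owner["storage-3"])  # node-spare
```

A second failure in the same rack raises `RuntimeError`, because the only spare is already in use; that is the limit of an N+1 design.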

CONTROL RACK: BACKUP NODE
- Provides an integrated backup solution
- Integrates with third-party backup options
- Orderable in different sizes

Data Layout Approaches
- Replicated: a full copy of the table exists on each Parallel Data Warehouse compute node.
- Distributed: the table is hashed on a single column and distributed across all nodes on the appliance (uniformly, provided the column has many distinct, evenly spread values). Each distribution is a separate physical table in the database management system (DBMS).
- Ultra shared-nothing: lets you design a schema of both distributed and replicated tables to minimize data movement between nodes. Small data sets can be stored more efficiently in full (replicated), and certain set operations (such as single-node operations) are more efficient against full sets of data.
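The two layouts can be contrasted with a small Python sketch. It is a model, not PDW internals: `NUM_NODES`, the CRC32 hash, and the helper names are assumptions. A distributed table puts each row on exactly one node, so a lookup on the distribution column touches one node while a lookup on any other column must scan them all; a replicated table gives every node a full copy.

```python
import zlib

NUM_NODES = 4  # illustrative only; not the appliance's real node count

def node_for(value) -> int:
    """Hash the distribution-column value to a node number (CRC32 is an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_NODES

def distribute(rows, dist_column):
    """Distributed layout: each row is stored on exactly one node."""
    nodes = [[] for _ in range(NUM_NODES)]
    for row in rows:
        nodes[node_for(row[dist_column])].append(row)
    return nodes

def replicate(rows):
    """Replicated layout: every node stores a full copy of the table."""
    return [list(rows) for _ in range(NUM_NODES)]

def lookup(nodes, column, value, dist_column):
    """Return (matching rows, number of nodes scanned)."""
    if column == dist_column:
        candidates = [nodes[node_for(value)]]  # the hash names the only possible node
    else:
        candidates = nodes                      # no locality: scan every node
    matches = [row for node in candidates for row in node if row[column] == value]
    return matches, len(candidates)

orders = [{"order_id": i, "customer": f"c{i % 7}"} for i in range(100)]
by_order = distribute(orders, "order_id")
print(lookup(by_order, "order_id", 42, "order_id")[1])    # 1
print(lookup(by_order, "customer", "c3", "order_id")[1])  # 4
```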

Ultra Shared-Nothing Architecture
Extends traditional shared-nothing design:
- Pushes the shared-nothing architecture into the SMP node: there is I/O and CPU affinity within SMP nodes
- Eliminates contention for user queries
- Uses full resources for each user query
Provides multiple physical instances of tables:
- Distributes large tables
- Replicates small tables
- Redistributes rows as needed
Provides fault tolerance:
- All hardware components have redundancy (including CPUs, disks, networks, power, and storage processors)
- Control and compute nodes use failover clustering
- Management nodes have active and standby states
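"Redistributes rows as needed" comes down to deciding, per join, whether rows must move. The hypothetical Python function below encodes common shared-nothing rules for an inner equi-join; it is a simplified sketch, not PDW's actual decision logic. It assumes that two tables distributed on their join columns (with the same hash function and compatible data types) can be joined locally, and that a replicated table can be joined locally with anything.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Table:
    name: str
    replicated: bool = False
    dist_column: Optional[str] = None  # hash-distribution column, if distributed

def plan_inner_join(left: Table, right: Table, left_key: str, right_key: str) -> str:
    """Pick a data-movement strategy for `left JOIN right ON left_key = right_key`."""
    if left.replicated or right.replicated:
        # Every node already holds a full copy of one side: no movement.
        return "local join (replicated side)"
    left_aligned = left.dist_column == left_key
    right_aligned = right.dist_column == right_key
    if left_aligned and right_aligned:
        # Matching keys already hash to the same node.
        return "local join (distribution-compatible)"
    if left_aligned:
        return f"shuffle {right.name} on {right_key}"
    if right_aligned:
        return f"shuffle {left.name} on {left_key}"
    return "shuffle both sides on the join keys"

sales = Table("sales", dist_column="customer_id")
customer = Table("customer", dist_column="customer_id")
product = Table("product", replicated=True)
visits = Table("web_visits", dist_column="visit_id")

print(plan_inner_join(sales, product, "product_id", "product_id"))     # local join (replicated side)
print(plan_inner_join(sales, customer, "customer_id", "customer_id"))  # local join (distribution-compatible)
print(plan_inner_join(sales, visits, "customer_id", "customer_id"))    # shuffle web_visits on customer_id
```

Under these rules, replicating a small table removes data movement for every join against it, which is why the slide pairs "Distributes large tables" with "Replicates small tables".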

SQL Server 2008 R2 Parallel Data Warehouse Appliance Update 3
- Cost-based optimizer: improve performance
- Collations and stored procedures: broaden functionality
- Entry appliances: expand flexibility

Theme: Performance at Scale
Cost-based optimizer
Goal: generate better execution plans
Functionality:
- A large space of execution alternatives is explored
- The best alternative is picked based on costing
- The cost model is sensitive to the amount of data to be moved
Benefits:
- Leverages the existing SQL Server optimizer and years of development
- 10X or more performance improvement compared to AU2
- Plans adapt when heuristics change
(chart: 3X, 6X)
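A cost model that is sensitive to the amount of data moved can be sketched with a toy Python example that prices three join strategies purely by bytes crossing the network and picks the cheapest. The formulas assume both inputs start hash-distributed on other columns across `num_nodes` nodes; they are illustrative only, not the formulas PDW's optimizer uses, which this deck does not describe.

```python
def movement_cost(strategy: str, left_bytes: float, right_bytes: float, num_nodes: int) -> float:
    """Estimated bytes sent over the network for one join strategy."""
    if strategy == "broadcast_right":
        # Each node already holds 1/num_nodes of the right input and
        # receives the rest: (num_nodes - 1) * right_bytes in total.
        return (num_nodes - 1) * right_bytes
    if strategy == "broadcast_left":
        return (num_nodes - 1) * left_bytes
    if strategy == "shuffle_both":
        # Re-hashing sends about (num_nodes - 1)/num_nodes of every row elsewhere.
        return (left_bytes + right_bytes) * (num_nodes - 1) / num_nodes
    raise ValueError(f"unknown strategy: {strategy}")

def choose_plan(left_bytes, right_bytes, num_nodes,
                strategies=("broadcast_right", "broadcast_left", "shuffle_both")):
    """Cost every alternative and return the cheapest, plus all the costs."""
    costs = {s: movement_cost(s, left_bytes, right_bytes, num_nodes) for s in strategies}
    return min(costs, key=costs.get), costs

GB, MB = 10**9, 10**6
print(choose_plan(100 * GB, 10 * MB, num_nodes=10)[0])  # broadcast_right
print(choose_plan(50 * GB, 50 * GB, num_nodes=10)[0])   # shuffle_both
```

Because the choice comes from the estimated sizes rather than a fixed rule, the same query shape gets a broadcast plan when one side is small and a shuffle plan when both sides are large.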

Theme: Performance at Scale
Zero data conversions
Goal: eliminate CPU time spent on data conversions
Functionality:
- Uses ODBC instead of ADO.NET for reading and writing data
- Minimizes appliance resource utilization for data moves
Benefits:
- Better resource (CPU) utilization
- 6x or more faster move operations, compared to AU2*
* Improvement factor calculated based on PDW PGQL

Theme: Performance at Scale
PDW entry appliance ("… for the right price …")
Goal: an appliance for the lower end of the market
Functionality:
- ~40% less processing power (4+1 compute nodes)
- Up to 50 TB disk capacity (4 storage arrays)
- Dell-based hardware reference architecture
- Complete PDW functionality (no less, no more)
Benefits:
- ~40% cheaper than a 1-rack appliance
- The lowest cost/TB on the market
- Increased flexibility and choice (appliances for different needs)

Theme: SQL Server Compatibility
Stored procedures
Goal: common code encapsulation and reuse
Functionality:
- System and user-defined stored procedures
- Invocation using RPC or EXECUTE
- Support for control-flow logic and input parameters
Benefits:
- Enables reuse of common logic
- Allows porting existing scripts
- Increases compatibility with SQL Server
Syntax:
CREATE { PROC | PROCEDURE } [dbo.]procedure_name
    [ { @parameter data_type } [ = default ] ] [ ,...n ]
AS { [ BEGIN ] sql_statement [;] [ ...n ] [ END ] } [;]
ALTER { PROC | PROCEDURE } [dbo.]procedure_name
    [ { @parameter data_type } [ = default ] ] [ ,...n ]
DROP { PROC | PROCEDURE } { [dbo.]procedure_name } [;]
[ { EXEC | EXECUTE } ]
    { { [database_name.][schema_name.]procedure_name }
      [ { value | @variable } ] [ ,...n ] } [;]
{ EXEC | EXECUTE }
    ( { @string_variable | [ N ]'tsql_string' } [ + ...n ] ) [;]

Theme: Improved Integration
Hadoop connector
Goal: handle both structured and unstructured data
Functionality:
- Bidirectional (import/export) interface between Microsoft Hadoop and PDW
- Delimited file support
- The adapter uses existing PDW tools (bulk loader, dwsql)
- Data is transferred to and from the PDW Landing Zone node over an FTP channel
Benefits:
- A low-cost solution that handles all the data
- Additional agility, flexibility, and choice
(diagram: Hadoop HDFS, config file, Sqoop-based adapter, Landing Zone node, bulk data loader, PDW agent, PDW dwsql)
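Because the Hadoop connector exchanges data as delimited files, quoting matters whenever a value contains the delimiter. A minimal Python sketch using the standard `csv` module shows a lossless round trip with a pipe delimiter; the delimiter choice and the helper names are assumptions, and whether the PDW bulk loader accepts quoted fields is not stated in this deck.

```python
import csv
import io

def export_delimited(rows, delimiter="|"):
    """Write rows as delimited text; fields containing the delimiter get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def import_delimited(text, delimiter="|"):
    """Parse delimited text back into rows (every field comes back as a string)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows = [["1", "widget", "9.99"], ["2", "left|right", "3.50"]]
text = export_delimited(rows)
print(text, end="")
# 1|widget|9.99
# 2|"left|right"|3.50
```

Every field comes back as a string, so the loading side still has to convert values to the target column types.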

Theme: Improved Integration
Collations
Goal: support local and international customers and data
Functionality:
- Fixed server-level collation
- User-defined column-level collation
- Supports all Windows collations
- Allows COLLATE clauses in queries and DML
Benefits:
- Store all the data in PDW, with additional querying flexibility
- Existing DDL and query scripts
- SQL Server alignment and functionality
Examples:
CREATE TABLE T ( c1 varchar(3) COLLATE traditional_Spanish_ci_ai, c2 varchar(10) COLLATE …)
SELECT c1 COLLATE Latin1_General_Bin2 FROM T
SELECT * FROM T ORDER BY c1 COLLATE Latin1_General_Bin2
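The practical difference between a binary collation such as `Latin1_General_Bin2` and a case- and accent-insensitive one (`_ci_ai`) is how strings compare and sort. A rough Python approximation, with hypothetical helper names: order by Unicode code point for BIN2 (roughly how BIN2 behaves on Unicode data), and case-fold plus strip accents for CI_AI. This only approximates SQL Server; language-specific rules, such as those of `Traditional_Spanish`, are not modeled.

```python
import unicodedata

def bin2_key(s: str) -> list:
    """BIN2-style key: compare strings code point by code point."""
    return [ord(ch) for ch in s]

def ci_ai_key(s: str) -> str:
    """CI_AI-style approximation: ignore case, then drop combining accent marks."""
    decomposed = unicodedata.normalize("NFD", s.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

words = ["zeta", "Árbol", "apple", "Zulu"]
print(sorted(words, key=bin2_key))               # ['Zulu', 'apple', 'zeta', 'Árbol']
print(sorted(words, key=ci_ai_key))              # ['apple', 'Árbol', 'zeta', 'Zulu']
print(ci_ai_key("Árbol") == ci_ai_key("ARBOL"))  # True
```

Under the binary ordering, uppercase letters sort before all lowercase ones and accented letters sort last; the insensitive key treats "Árbol" and "ARBOL" as equal.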

Distributed Architecture: Hub and Spoke (diagram components: source systems, SSIS, PDW as the hub, Fast Track, SSAS, SSRS, Excel/Excel Services, SharePoint, PerformancePoint Services, PowerPivot)

Flexible Business Alignment
- Parallel database copy technology enables rapid data movement and consistency between the EDW and data marts
- Supports user groups with very different service-level agreements (SLAs) for performance, capacity, loading, and concurrency
- Create SQL Server 2012, Fast Track Data Warehouse for SQL Server 2012, and SQL Server Analysis Services data marts
- A distributed architecture gives you the flexibility to add or change diverse workloads or user groups while maintaining data consistency across the enterprise