Azure SQL Data Warehouse for SQL Server DBAs


1 Azure SQL Data Warehouse for SQL Server DBAs
June 2018. Warner Chaves, SQL MCM / Data Platform MVP

2 Thanks to our sponsors
Sponsor levels: Global, Gold, Silver, Bronze
Sponsors: Microsoft, JetBrains, Rubrik, Delphix, Solution OMD

3 Bio
- DBA and consultant for 11 years.
- Previously L3 DBA at HP in Costa Rica, now Principal Consultant at Pythian in Ottawa, Ontario.
- Microsoft Data Platform MVP.
- Blog: Sqlturbo.com
- Company: Pythian.com

4 Agenda
Objective: cover Azure SQL Data Warehouse in a way that is easy for SQL Server DBAs to understand and adopt.
We will go over:
- Why Data Warehousing in the cloud?
- Service cost and model
- Fundamental differences with SQL Server
- Loading and querying data

5 Pre-requisites
- SQL Server experience.
- Basic Data Warehousing concepts.

6 Cloud vs Traditional Data Warehousing
Traditional:
- Significant upfront investment
- Capacity is forecasted and fixed
- Client needs to manage the solution
- Static or semi-static software
- Client needs to complete the ecosystem
Cloud:
- Predictable recurring bill
- Dynamic capacity
- Solution managed by the provider
- Software in continuous improvement
- Tightly integrated with the rest of the cloud services

7 So what is Azure SQL DW?
- Microsoft Azure service
- Successor to the on-premises appliance known as APS/PDW
- Targeted at running multi-TB Data Warehousing workloads
- It’s a PaaS service: DWaaS (comparable to AWS Redshift and Google BigQuery)
- It’s an MPP (Massively Parallel Processing) system
- Compute and storage are distributed and independent

8 SMP vs MPP
SMP: Symmetric Multiprocessing
MPP: Massively Parallel Processing

9 Azure SQL DW – Gen1
[Architecture diagram: Connection Client, Control Node, Data Movement Service, Compute Nodes, Distributions, Azure Premium Storage]

10 Azure SQL DW – Compute Optimized (Gen 2)
[Architecture diagram: Control Node, Data Movement Service, Compute Nodes, NVMe Cache, Distributions, Azure Premium Storage for files and Columnstore segments]

11 Service Model
- Compute and storage are scaled and billed separately.
- Compute is measured in Data Warehousing Units (DWUs).
- The DWUs control the capacity of the Compute Nodes.
- Storage is billed in 1TB increments.
- The service allows you to PAUSE compute and stop getting charged for it.
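As a rough illustration of scaling compute independently of storage, the sketch below checks the current DWU level and then changes it. It assumes a data warehouse named MyDW (a hypothetical name) and a connection to the master database of the logical server; 'DW400' is only an example level.

```sql
-- Run while connected to the master database of the logical server.
-- MyDW is a hypothetical data warehouse name.

-- Show the current service objective (DWU level) of each database.
SELECT db.name, ds.edition, ds.service_objective
FROM sys.database_service_objectives AS ds
JOIN sys.databases AS db
    ON ds.database_id = db.database_id;

-- Scale compute to a different DWU level; storage is not affected.
ALTER DATABASE MyDW
MODIFY (SERVICE_OBJECTIVE = 'DW400');
```

Pausing and resuming compute is not a T-SQL operation; it is done from outside the database, for example through the Azure portal, PowerShell, or the REST API.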

12 Backup and Recovery
- The service keeps backups for 7 days.
- A snapshot is made every 4 to 8 hours.
- In case of DR, you can do a geo-restore to a ‘paired datacenter’ with the daily backup.
- If you need to retain a copy for more than 7 days, right now the option is to do a restore and then pause compute so you’re only paying for storage (we’re hoping for improvements in this regard…).

13 How is the engine different from SQL Server?

14 Distribution Method
- The most important concept for good performance in Azure SQL DW.
- It determines the way ASDW will distribute the records into different buckets.
- There are three methods:
  - HASH distribution
  - Round-Robin distribution
  - Replicated

15 Hash Distribution
- Same values end up in the same bucket.
- If the distribution column is used in joins or in a GROUP BY, then no data movement is necessary.
- If a particular value is dominant in the table, then one distribution can be overloaded compared to the others, lowering system performance.
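A minimal sketch of a hash-distributed table follows. The table name and the Quantity column come from the query label example later in the deck; the other columns and the choice of ProductID as the distribution column are assumptions made for illustration.

```sql
-- Column list and distribution column are hypothetical.
CREATE TABLE dbo.FactTransactionHistory
(
    TransactionID   BIGINT        NOT NULL,
    ProductID       INT           NOT NULL,
    TransactionDate DATE          NOT NULL,
    Quantity        INT           NOT NULL,
    ActualCost      DECIMAL(19,4) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(ProductID),   -- rows with the same ProductID share a distribution
    CLUSTERED COLUMNSTORE INDEX
);

-- Grouping on the distribution column can be resolved inside each
-- distribution, so no data movement should be needed.
SELECT ProductID, SUM(Quantity) AS TotalQuantity
FROM dbo.FactTransactionHistory
GROUP BY ProductID;
```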

16 Overloaded distribution
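One way to look for an overloaded distribution is to compare row counts per distribution. The sketch below assumes the dbo.FactTransactionHistory table used elsewhere in this deck; the command returns one row per distribution.

```sql
-- Returns rows and space used for each of the table's distributions.
-- A distribution with far more rows than the rest suggests a skewed hash column.
DBCC PDW_SHOWSPACEUSED('dbo.FactTransactionHistory');
```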

17 Round Robin Distribution
- ASDW simply does a Round-Robin over the records and puts each record in a different bucket.
- The values in the record don’t matter when assigning a bucket.
- Data movement is required for most operations.
- If a table doesn’t have a good HASH column and is too big to be a replicated table, then this can be the best option.
- If a value is skewed, the distribution will still be uniform.
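For comparison, a round-robin table can be sketched as below. The staging table name and columns are hypothetical, and the heap is just one reasonable choice for a table that is loaded and then read once.

```sql
-- Hypothetical staging table: rows are spread evenly regardless of their values.
CREATE TABLE dbo.StageTransactionHistory
(
    TransactionID   BIGINT        NOT NULL,
    ProductID       INT           NOT NULL,
    TransactionDate DATE          NOT NULL,
    Quantity        INT           NOT NULL,
    ActualCost      DECIMAL(19,4) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
);
```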

18 Replicated Distribution
- The table is copied to each compute node.
- Recommended for tables smaller than 2GB.
- Suited to small tables that are usually part of joins.
- Suited to queries with simple predicates such as equality or inequality.
- The storage used is the table size × the number of compute nodes, so don’t abuse it.
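A small dimension table is the typical candidate for replication. The sketch below uses a hypothetical dbo.DimProduct table to show the option.

```sql
-- Hypothetical small dimension: a full copy is kept on every compute node,
-- so joins against it do not need data movement.
CREATE TABLE dbo.DimProduct
(
    ProductID   INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50)  NULL
)
WITH
(
    DISTRIBUTION = REPLICATE,
    CLUSTERED INDEX (ProductID)
);
```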

19 T-SQL Differences
- ASDW encourages the use of the CTAS (CREATE TABLE AS SELECT) construct:
  - Fully parallel
  - Logging is minimized
- Joins on UPDATE and DELETE are not supported (there are workarounds).
- MERGE is not supported (for now at least).
- Some of the complex data types are not present (geography, geometry, hierarchyid, xml).
- Full list here:
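One common shape for the workaround is to rebuild the table with CTAS instead of running an UPDATE with a join, then swap the names. The sketch below is an illustration only: dbo.ProductCost and its StandardCost column are hypothetical, and the column list of the fact table is assumed.

```sql
-- Build a new copy of the table with the "updated" values applied.
CREATE TABLE dbo.FactTransactionHistory_New
WITH
(
    DISTRIBUTION = HASH(ProductID),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT f.TransactionID,
       f.ProductID,
       f.TransactionDate,
       f.Quantity,
       COALESCE(p.StandardCost, f.ActualCost) AS ActualCost  -- new value where a match exists
FROM dbo.FactTransactionHistory AS f
LEFT JOIN dbo.ProductCost AS p
    ON p.ProductID = f.ProductID;

-- Swap the new table in and drop the old one.
RENAME OBJECT dbo.FactTransactionHistory TO FactTransactionHistory_Old;
RENAME OBJECT dbo.FactTransactionHistory_New TO FactTransactionHistory;
DROP TABLE dbo.FactTransactionHistory_Old;
```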

20 DEMO: Portal and Metadata

21 Data Warehouse Design
- Good service to consider if your DW is at 1TB+ and growing.
- Default table type is Clustered Columnstore.
- Ideal columnstore segment is 1 million records (same as SQL Server).
- ASDW uses 60 distributions.
- Fact tables: Columnstores (optionally with partitioning) with HASH distribution (if possible).
- Dimension tables: B-Tree, or Columnstore if it’s a large dimension; HASH, Replicated or Round-Robin distribution.
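Putting the fact table guidance together, a partitioned, hash-distributed columnstore table might look like the sketch below. The dbo.FactSales table, its columns, and the monthly boundary values are all hypothetical.

```sql
-- Hypothetical fact table: columnstore, hash-distributed, partitioned by month.
CREATE TABLE dbo.FactSales
(
    OrderDateKey INT            NOT NULL,  -- yyyymmdd integer key
    ProductKey   INT            NOT NULL,
    CustomerKey  INT            NOT NULL,
    Quantity     INT            NOT NULL,
    SalesAmount  DECIMAL(19,4)  NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(ProductKey),
    PARTITION
    (
        OrderDateKey RANGE RIGHT FOR VALUES (20180101, 20180201, 20180301)
    )
);
```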

22 The thing about Partitioning: Daily
365 partitions × 60 distributions = 21,900 partitions
The ideal segment holds 1 million records.

23 Partitioning usually at the weekly or monthly level if necessary

24 The thing about Partitioning: Monthly
12 partitions × 60 distributions = 720 partitions
The ideal segment holds 1 million records.
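The arithmetic behind the daily and monthly comparison can be worked out directly. The query below is only a calculator: it multiplies the partition count by the 60 distributions and by the 1 million row target per segment, and touches no tables.

```sql
-- Rows needed so that every partition in every distribution
-- can fill at least one ideal (1 million row) columnstore segment.
SELECT p.Scheme,
       p.PartitionCount,
       p.PartitionCount * 60 AS PhysicalPartitions,
       CAST(p.PartitionCount AS BIGINT) * 60 * 1000000 AS RowsForOneFullSegmentEach
FROM (
    SELECT 'Daily' AS Scheme, 365 AS PartitionCount
    UNION ALL
    SELECT 'Monthly', 12
) AS p;
```

Daily partitioning gives 21,900 physical partitions and needs about 21.9 billion rows per year of data to fill one segment in each; monthly partitioning gives 720 and needs about 720 million.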

25 Data Loading
Two ways of loading data:
- Through the Control Node
- PolyBase
Control Node methods:
- SSIS
- BCP
- For large data loads the Control Node can become a bottleneck.
PolyBase:
- Loads from Blob Storage or Azure Data Lake.
- Parallel multi-threaded load that does not go through the Control Node.
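A PolyBase load from Blob Storage generally follows the steps sketched below: define where the files live, describe their format, expose them as an external table, and then load with CTAS. Every object name, the storage account, the container, the folder, and the column list are hypothetical placeholders.

```sql
-- A master key is needed to protect the credential secret.
CREATE MASTER KEY;

-- Credential holding the storage account access key (placeholder value).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'loader', SECRET = '<storage-account-access-key>';

-- Where the files live.
CREATE EXTERNAL DATA SOURCE TransactionBlobStorage
WITH
(
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer@myaccount.blob.core.windows.net',
    CREDENTIAL = BlobStorageCredential
);

-- How the files are formatted.
CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

-- External table over the files; no data is loaded yet.
CREATE EXTERNAL TABLE dbo.TransactionHistory_External
(
    TransactionID   BIGINT        NOT NULL,
    ProductID       INT           NOT NULL,
    TransactionDate DATE          NOT NULL,
    Quantity        INT           NOT NULL,
    ActualCost      DECIMAL(19,4) NOT NULL
)
WITH
(
    LOCATION = '/transactions/',
    DATA_SOURCE = TransactionBlobStorage,
    FILE_FORMAT = CsvFileFormat
);

-- Parallel load into a distributed table with CTAS.
CREATE TABLE dbo.FactTransactionHistory_Loaded
WITH
(
    DISTRIBUTION = HASH(ProductID),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM dbo.TransactionHistory_External;
```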

26 DEMO: Loading data with PolyBase

27 Querying Data
Azure SQL DW has some differences in terms of query execution:
- There are concurrency limits depending on the DWUs.
- There are transaction size limits per distribution, also based on the DWUs.
- Each user gets assigned a resource class that determines how much compute they get.
- Some DMVs keep historical information.
- The use of Query Labels is recommended for troubleshooting and monitoring.

28 Concurrency Limits
DWUs                 100   200   300   400   500   600   1000+
Concurrent queries   4     8     12    16    20    24    32 (Gen1), 128 (Gen2)
Query execution is queued if necessary.
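To see whether queries are waiting in the queue, the requests DMV can be filtered on status. This is a sketch that assumes queued requests are reported with a status of 'Suspended'.

```sql
-- Requests that have been submitted but are not yet running.
SELECT request_id,
       status,
       submit_time,
       resource_class,
       command
FROM sys.dm_pdw_exec_requests
WHERE status = 'Suspended'
ORDER BY submit_time;
```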

29 Resource Classes
Class     Default   Memory
SMALL     X         100MB
MEDIUM              Up to 3200MB
LARGE               Up to 6400MB
X-LARGE             Up to 12800MB
Memory assigned is per distribution. The classes can also be static.
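Resource classes are assigned through database role membership. The sketch below assumes a database user named LoadUser (hypothetical) and uses the built-in largerc and staticrc20 roles as examples.

```sql
-- Give a hypothetical load user a larger dynamic resource class.
EXEC sp_addrolemember 'largerc', 'LoadUser';

-- Return the user to the default (small) class.
EXEC sp_droprolemember 'largerc', 'LoadUser';

-- Static classes are assigned the same way.
EXEC sp_addrolemember 'staticrc20', 'LoadUser';
```

A larger class gives each query more memory but uses more of the concurrency capacity, so fewer queries can run at once.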

30 Query Label
SELECT SUM(Quantity)
FROM FactTransactionHistory
OPTION (LABEL = 'QuantitySum');
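Once a query carries a label, it can be found in the requests DMV without searching through the command text. A minimal sketch, using the 'QuantitySum' label from the slide:

```sql
-- Find executions of the labelled query, newest first.
SELECT request_id,
       status,
       submit_time,
       total_elapsed_time,
       command
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'QuantitySum'
ORDER BY submit_time DESC;
```

The request_id returned here can then be used to filter sys.dm_pdw_request_steps and inspect the individual steps of that execution.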

31 DEMO: Querying Data

32 Questions?

33 Thanks!!

