SQL Server as a Data Warehousing Platform


1 SQL Server as a Data Warehousing Platform

2 Sponsors Gold Sponsors: Bronze Sponsors: Swag Sponsors:

3 About me
DW / BI Consultant @
7 years of professional experience: ASP.NET, C# web development; ETL and Business Intelligence
Contacts

4 Agenda
Overview
Basic Data Warehouse Architectures
Dimensional Modeling
SQL Server DW Best Practices
SSIS Performance Boost
A quick glance at Microsoft BI Tools

5 OVERVIEW

6 Why Do I Need a Data Warehouse?
Where is my data? It is physically stored across multiple (potentially heterogeneous) platforms, each holding a different subset of the organization’s data.
When did it change? Online applications keep a very limited set of historical data.
How can I access my data? Online systems do not allow power users to access it.
How long would it take? Transactional applications are modeled to optimize the speed of inserts and updates, not of analytical queries.

7 What is a Data Warehouse?
Definition: A centralized store of business data that can be used for reporting and analysis to inform business decisions.
Characteristics: subject-oriented; integrated; time-variant; non-volatile; in support of management’s decisions

8 Online Applications vs. the EDW
Aspect          Online Applications          EDW environment
User            Front-office personnel       Back-office personnel
Function        Day-to-day operations        Decision support
Unit of work    Short transactions           Complex queries
DB design       App-oriented, normalized     Subject-oriented, de-normalized
Data            Current, detailed            Historical, summarized
Usage           Real time                    Structured and ad hoc
Access          Read and write               Read only
# of records    Tens                         Millions
# of users      Thousands                    Hundreds
DB size         100 MB to 1 GB               100 GB to 10 TB
Tools           Application-specific         Various reporting products

9 Data Mart
Definition: A set of processes and structures that support business requirements that are specific to a business unit.
“Hello, I am a data warehouse.” “And I am a data mart.”

10 BASIC ARCHITECTURES

11 Hub and Spoke
‘You can catch all the minnows in the ocean and stack them together and they still do not make a whale.’ (Bill Inmon)
Normalized, relational data warehouse, typically in 3NF (atomic data)
Dependent data marts (mostly summarized data)
Flow: Source Systems > Data Acquisition > Data Integration > Data Warehouse > Data Marts > End-user access and applications

12 Data Warehouse Bus
“The data warehouse is nothing more than the union of all the data marts” (Ralph Kimball)
Dimensional data marts linked by conformed dimensions
One mart is created for a single business process
Flow: Source Systems > Data Acquisition > Data Integration > Data Marts (linked by the DW Bus) > End-user access and applications

13 Independent Data Marts
Data marts are loaded independently
Data marts are not consolidated via a DW bus
Flow: Source Systems > Data Acquisition > Data Integration > Data Marts > End-user access and applications

14 How to Decide?
Inmon’s Approach: enterprise view; easy maintenance; high initial cost; longer start-up time
Kimball’s Approach: low initial cost; shorter start-up time; low reusability; difficult maintenance
Independent DM: shortest start-up time; low cost; incompatibility issues; low reusability; no enterprise view

15 DIMENSIONAL MODELING

16 Dimensional Model
Denormalized; organized for understandability and ease of reporting rather than for updates
Dimensions store properties for a specific entity
Facts describe the business events regarding the entities
Business questions: cost by product; order quantity by product; sales revenue by customer; profit by region
Dimensions in the example: Time, Customer, Product Line, Region, Product, Salesperson
Measures in the example: Cost, Quantity, Revenue, Profit

17 Star Schema
Fact table in the center, surrounded by dimension tables (in the shape of a star)
Denormalized dimensions
Simple queries
Simple design
Optimized for fast aggregations
Surrogate keys are used

18 Snowflake Schema
Fact tables and dimensions are related to further dimensions
Normalized dimension tables
Complex levels of relationship
Complex queries
Optimized for less disk space usage
Surrogate keys are used

19 Surrogate Keys
Definition: A unique identifier that is used as a substitute for a natural key, preferably a simple integer
Advantages: eliminates the need for composite keys in dimension tables; optimizes the JOIN process; limits the impact caused by changes in the natural key format
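The star schema and surrogate key ideas above can be sketched as follows. This is a minimal illustration, not the presenter's code: it uses Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere, and the table names, product codes and figures are all made up for the example.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny star schema: one fact table joined to one dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DimProduct (
            ProductKey  INTEGER PRIMARY KEY,  -- surrogate key (simple integer)
            ProductCode TEXT NOT NULL,        -- business (natural) key, retained
            ProductName TEXT NOT NULL
        );
        CREATE TABLE FactOrders (
            ProductKey INTEGER NOT NULL REFERENCES DimProduct (ProductKey),
            Quantity   INTEGER NOT NULL,
            Cost       REAL NOT NULL
        );
        """
    )
    # The surrogate keys 1 and 2 are assigned by the database, not by the source system.
    conn.executemany(
        "INSERT INTO DimProduct (ProductCode, ProductName) VALUES (?, ?)",
        [("BK-ROAD-01", "Road Bike"), ("BK-MTN-07", "Mountain Bike")],
    )
    conn.executemany(
        "INSERT INTO FactOrders (ProductKey, Quantity, Cost) VALUES (?, ?, ?)",
        [(1, 2, 500.0), (1, 1, 250.0), (2, 3, 900.0)],
    )
    return conn


def cost_by_product(conn: sqlite3.Connection) -> list:
    """Answer 'cost by product' with a single integer-key join and an aggregate."""
    return conn.execute(
        """
        SELECT d.ProductName, SUM(f.Quantity), SUM(f.Cost)
        FROM FactOrders AS f
        JOIN DimProduct AS d ON d.ProductKey = f.ProductKey
        GROUP BY d.ProductName
        ORDER BY d.ProductName
        """
    ).fetchall()
```

Because the fact table references only the integer ProductKey, a change in the format of ProductCode would touch the dimension row alone and leave the fact rows untouched.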

20 Dimensions
Definition: Dimensions store properties for a specific entity. No records are deleted; instead, they are expired. Keep data at its lowest level of granularity
Keys: the primary key is a surrogate key; the business key is also retained
Attributes: typically textual fields used for filtering and for labeling query result sets
Examples: Product, Employee, Geography, Date

21 Slowly Changing Dimensions (SCD)
Definition: A set of techniques for handling dimension attributes that change infrequently, storing their values over time in an efficient way
Types: 0, 1, 2, 3, 4 and hybrids

22 SCD Types (0-2)
Type 0: Passive approach. No effort is made to handle changes; the original value is typically retained.
Type 1: Keeps only the latest version of any record. Changes overwrite the previous instance.
Type 2: Keeps track of historical data by creating multiple records for the same business key, each with a different surrogate key.
Example: Change of postal code from M3T 8L0 to M3T 8L1
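The difference between Type 1 and Type 2 can be sketched with the postal code change from the slide. This is an illustrative model in plain Python, not SSIS or T-SQL; the CustomerRow class, the validity columns and the customer id "C-100" are assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_key: int                 # surrogate key
    customer_id: str                  # business key
    postal_code: str
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_type1(rows: List[CustomerRow], customer_id: str, new_postal_code: str) -> None:
    """Type 1: overwrite in place, so the previous value is lost."""
    for row in rows:
        if row.customer_id == customer_id:
            row.postal_code = new_postal_code


def apply_type2(rows: List[CustomerRow], customer_id: str,
                new_postal_code: str, change_date: date) -> CustomerRow:
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r.customer_id == customer_id and r.valid_to is None)
    current.valid_to = change_date
    new_row = CustomerRow(
        customer_key=max(r.customer_key for r in rows) + 1,
        customer_id=customer_id,
        postal_code=new_postal_code,
        valid_from=change_date,
    )
    rows.append(new_row)
    return new_row
```

With Type 2, facts loaded before the change keep pointing at the old surrogate key, so historical reports still show M3T 8L0, while new facts reference the new row.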

23 SCD Types (3-6)
Type 3: Changes are tracked by using separate columns. The number of changes is limited to the number of columns designated to store historical data.
Type 4: This method uses two tables: one for the current records and the other for all or some of the changes.
Type 6: Uses a combination of Types 1, 2 and 3.
Example: Change of postal code from M3T 8L0 to M3T 8L1
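Types 3 and 4 can be sketched in the same spirit. Again this is a plain Python model under assumed names (CustomerType3, previous_postal_code, a dictionary as the "current" table and a list as the "history" table), not a SQL Server implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CustomerType3:
    customer_key: int
    customer_id: str
    postal_code: str
    previous_postal_code: Optional[str] = None  # the single history column


def apply_type3(row: CustomerType3, new_postal_code: str) -> None:
    """Type 3: shift the current value into the history column, then overwrite."""
    row.previous_postal_code = row.postal_code
    row.postal_code = new_postal_code


def apply_type4(current: Dict[str, str], history: List[Tuple[str, str]],
                customer_id: str, new_postal_code: str) -> None:
    """Type 4: keep only the latest value in `current`, append old values to `history`."""
    history.append((customer_id, current[customer_id]))
    current[customer_id] = new_postal_code
```

With one history column, a second change in Type 3 pushes the original value out, which is the limitation the slide describes; the Type 4 history table keeps growing instead.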

24 Dimension Types
Regular dimension; Conformed dimensions; Time dimensions; Degenerate dimensions; Parent-child dimension; Snowflake dimension; Junk dimension; Role-playing dimensions; Mini dimensions; Inferred dimensions; Monster dimensions; Static dimension; Multi-value dimension; Shrunk dimension

25 Conformed Dimensions
Definition: Dimensions that are shared by two or more fact tables. According to Kimball, two dimensions are conformed when they are exactly the same, or one is a perfect subset of the other
Advantages: saves time and effort; fact tables sharing conformed dimensions can be joined together; data consistency between data marts is ensured
Diagram: Fact1 (Subject Area 1) and Fact2 (Subject Area 2) share the conformed dimensions
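One way to picture how two fact tables are combined through a conformed dimension is the query below. It is a sketch under assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, and DimCustomer, FactSales and FactReturns are hypothetical tables. Each fact table is aggregated to the dimension's level first and only then joined, which avoids multiplying rows.

```python
import sqlite3


def drill_across() -> list:
    """Combine measures from two fact tables through one shared dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, CustomerName TEXT NOT NULL);
        CREATE TABLE FactSales   (CustomerKey INTEGER NOT NULL, Revenue REAL NOT NULL);
        CREATE TABLE FactReturns (CustomerKey INTEGER NOT NULL, Refund  REAL NOT NULL);
        INSERT INTO DimCustomer VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO FactSales   VALUES (1, 100.0), (1, 50.0), (2, 80.0);
        INSERT INTO FactReturns VALUES (1, 20.0);
        """
    )
    return conn.execute(
        """
        SELECT d.CustomerName,
               COALESCE(s.Revenue, 0.0),
               COALESCE(r.Refund, 0.0)
        FROM DimCustomer AS d
        LEFT JOIN (SELECT CustomerKey, SUM(Revenue) AS Revenue
                   FROM FactSales GROUP BY CustomerKey) AS s
               ON s.CustomerKey = d.CustomerKey
        LEFT JOIN (SELECT CustomerKey, SUM(Refund) AS Refund
                   FROM FactReturns GROUP BY CustomerKey) AS r
               ON r.CustomerKey = d.CustomerKey
        ORDER BY d.CustomerName
        """
    ).fetchall()
```

Joining FactSales directly to FactReturns on CustomerKey would instead repeat Alice's refund once per sales row, which is why the pre-aggregation step matters.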

26 Time Dimensions
Definition: Provide consistent granularity for temporal analysis and reporting
Temporal hierarchies: Year > Quarter > Month > Week
Business-specific attributes: fiscal periods; public holidays
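A date dimension is usually generated rather than loaded from a source system. The sketch below shows one possible generator in Python; the column names, the yyyymmdd integer key, the July fiscal-year start and the convention of naming a fiscal year after the calendar year in which it ends are all assumptions chosen for illustration.

```python
from datetime import date, timedelta
from typing import Dict, FrozenSet, List


def build_date_dimension(start: date, end: date,
                         fiscal_year_start_month: int = 7,
                         holidays: FrozenSet[date] = frozenset()) -> List[Dict]:
    """Return one row per calendar day between start and end (inclusive)."""
    rows = []
    day = start
    while day <= end:
        # Assumed convention: a fiscal year is named after the year it ends in.
        if fiscal_year_start_month > 1 and day.month >= fiscal_year_start_month:
            fiscal_year = day.year + 1
        else:
            fiscal_year = day.year
        rows.append({
            "DateKey": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20240630
            "Date": day,
            "Year": day.year,
            "Quarter": (day.month - 1) // 3 + 1,
            "Month": day.month,
            "IsoWeek": day.isocalendar()[1],
            "FiscalYear": fiscal_year,
            "IsPublicHoliday": day in holidays,
        })
        day += timedelta(days=1)
    return rows
```

Storing these attributes once per day means every report groups by the same definition of quarter, week and fiscal year.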

27 Facts and Measures
Definition: Facts describe the business events regarding the entities
Keys: FKs from all the dimensional tables in the star; the PK is usually a composite key that contains the dimension FKs
Numeric measures: additive; non-additive; semi-additive
Degenerate dimensions
Factless facts
Example FactOrders: CustomerKey, SalesPersonKey, TimeKey; Quantity, Cost, Profit (additive); DiscountRate (non-additive)
Example FactAccountTran: CustomerKey, AccountTypeKey, CreditDebitAmount; AccountBalance (semi-additive)
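The distinction between additive and semi-additive measures is easiest to see with numbers. The sketch below uses made-up rows shaped like the FactAccountTran example, with an added TimeKey column that is an assumption of this illustration: the transaction amount can be summed across every dimension, while the balance can be summed across customers but not across time.

```python
from typing import Dict, List, Tuple

# Hypothetical rows: (TimeKey, CustomerKey, CreditDebitAmount, AccountBalance)
fact_account_tran: List[Tuple[int, int, float, float]] = [
    (20240101, 1, 100.0, 100.0),
    (20240102, 1, -30.0, 70.0),
    (20240101, 2, 50.0, 50.0),
    (20240102, 2, 25.0, 75.0),
]


def total_amount(rows) -> float:
    """Additive measure: a plain SUM is valid across all dimensions."""
    return sum(amount for _, _, amount, _ in rows)


def closing_balance(rows) -> float:
    """Semi-additive measure: take the latest balance per customer, then sum."""
    latest: Dict[int, Tuple[int, float]] = {}
    for time_key, customer_key, _, balance in rows:
        if customer_key not in latest or time_key > latest[customer_key][0]:
            latest[customer_key] = (time_key, balance)
    return sum(balance for _, balance in latest.values())


def naive_balance_sum(rows) -> float:
    """What goes wrong when a balance is summed over time as if it were additive."""
    return sum(balance for _, _, _, balance in rows)
```

A non-additive measure such as DiscountRate would need yet another treatment, for example recomputing the rate from additive components instead of summing or averaging the stored rates.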

28 Grain (Granularity)
The grain represents the level of detail of a fact table.
The grain of the fact table should always represent the lowest level for each corresponding dimension.
Create multiple fact tables if multiple grains are required.
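Why the lowest grain is preferred can be sketched in a few lines: rows stored at order-line grain can always be rolled up to a coarser grain, but rows stored as daily totals cannot be split back into order lines. The rows and column names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact rows at the lowest grain: one row per order line.
order_lines: List[Dict[str, int]] = [
    {"DateKey": 20240101, "ProductKey": 1, "OrderLine": 1, "Quantity": 2},
    {"DateKey": 20240101, "ProductKey": 2, "OrderLine": 2, "Quantity": 3},
    {"DateKey": 20240102, "ProductKey": 1, "OrderLine": 1, "Quantity": 4},
]


def roll_up(rows: List[Dict[str, int]], grain: Sequence[str],
            measure: str) -> Dict[Tuple[int, ...], int]:
    """Aggregate an additive measure to a coarser grain given by `grain` columns."""
    totals: Dict[Tuple[int, ...], int] = defaultdict(int)
    for row in rows:
        totals[tuple(row[column] for column in grain)] += row[measure]
    return dict(totals)
```

Calling `roll_up(order_lines, ("DateKey",), "Quantity")` gives the daily totals, and `roll_up(order_lines, ("ProductKey",), "Quantity")` gives totals per product from the same stored rows.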

29 SQL SERVER DW BEST PRACTICES

30 Indexing
Dimension table indexing: clustered index on the business key; non-clustered indexes on the surrogate key and on frequently searched columns; columnstore indexes on large dimensions
Fact table indexing: DISABLE indexes before the load, then REBUILD; clustered index on one of: identity column, time dimension key, combination of all the dimension FKs; non-clustered indexes on frequently searched dimension keys; columnstore indexes on fact tables

31 Columnstore index
Columnstore index: column-based in-memory data storage; high compression rates; improves query performance; must be partition-aligned (on partitioned tables)
Non-clustered Columnstore Index: read-only in SQL Server 2012 and 2014 (DISABLE and REBUILD, or a partition swap, is required to load data); can be combined with other indexes on the table
Clustered Columnstore Index (introduced in SQL Server 2014): updatable; the only index on the table; best for bulk-loaded tables; cannot be created on tables with XML, varchar(max) columns, etc.

32 Partitioning
Definition: Splitting the data in a single table or index into partitions that can be placed on one or more filegroups
Best practices: avoid partitioning dimension tables; partition large fact tables (on a date key); use partition-aligned indexed views; filter on the partitioning key (WHERE Date = ….)
Benefits: improved query performance (achieved by parallelism); faster data loading and deletion (by the sliding window approach); increased backup and restore flexibility (by multiple filegroups); index management at the partition level
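How a date key is mapped to a partition, and what a sliding window does to the boundaries, can be modelled in a few lines of Python. This is a conceptual sketch only: it mimics the RANGE LEFT and RANGE RIGHT boundary semantics of a partition function, while the function names and boundary values are invented for the example and no SQL Server call is involved.

```python
from bisect import bisect_left, bisect_right
from typing import List


def partition_number(boundaries: List[int], value: int, range_right: bool = True) -> int:
    """Return the 1-based partition for `value`, given sorted boundary values.

    RANGE RIGHT: a boundary value belongs to the partition on its right.
    RANGE LEFT:  a boundary value belongs to the partition on its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


def slide_window(boundaries: List[int], new_boundary: int) -> List[int]:
    """Model a sliding window: drop the oldest boundary and add a new one."""
    return sorted(boundaries[1:] + [new_boundary])
```

Three boundaries always yield four partitions; filtering on the partitioning key lets the engine touch only the partitions whose range can contain the requested values.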

33 Efficient Initial Data Load
Recovery model: Simple or Bulk-logged
Populate the staging tables in parallel: use multiple BULK INSERT, BCP or SSIS tasks
Boost your INSERT INTO … SELECT performance: use the TABLOCK or TABLOCKX table hints
Avoid enforcing foreign key relationships: create foreign key constraints with the WITH NOCHECK option

34 SSIS PERFORMANCE BOOST

35 SQL Server Integration Services
SSIS: a feature of SQL Server used for ETL (Extract, Transform, Load)
Data Flow Engine: defines the movement and transformation of data
Control Flow Engine: controls the execution of the Data Flow tasks; executes SQL scripts; error handling; looping; etc.

36 SSIS Performance Tuning Tips
Optimize queries: select only the columns that you need
Choose better performing components when possible: LOOKUP components are usually faster than MERGE JOIN; use a SQL Server Destination instead of an OLE DB Destination (possible only when the package runs on the same server as the target database); don’t use the SCD Transformation for large-scale data
Fine-tune the components if applicable: set the IsSorted property (avoid unnecessary sorting); use caching in your LOOKUP components; use the fast load data access mode in the OLE DB Destination
Use parallel processing
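The reasoning behind the lookup and sorting tips can be illustrated outside SSIS. The Python sketch below is a conceptual model, not SSIS code: a fully cached lookup builds a hash table over the reference data once and probes it per row, whereas a merge join walks two inputs in step and therefore needs both to arrive sorted on the join key. The function names and row shapes are assumptions.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def cached_lookup(rows: List[Row], reference: List[Row],
                  key: str) -> Tuple[List[Row], List[Row]]:
    """Lookup with a full cache: one dictionary probe per incoming row."""
    cache = {ref[key]: ref for ref in reference}  # built once, before the rows flow
    matched, unmatched = [], []
    for row in rows:
        ref = cache.get(row[key])
        if ref is None:
            unmatched.append(row)          # comparable to a no-match output
        else:
            matched.append({**ref, **row})
    return matched, unmatched


def merge_join(left_sorted: List[Row], right_sorted: List[Row], key: str) -> List[Row]:
    """Inner merge join; both inputs must be sorted by key, right keys unique."""
    result = []
    j = 0
    for row in left_sorted:
        while j < len(right_sorted) and right_sorted[j][key] < row[key]:
            j += 1
        if j < len(right_sorted) and right_sorted[j][key] == row[key]:
            result.append({**right_sorted[j], **row})
    return result
```

If the inputs are not already sorted, the merge join pays for a sort first, which is the cost that marking pre-sorted sources with IsSorted is meant to avoid.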

37 SSIS Parallel Processing
Control Flow Parallelism: ‘MaxConcurrentExecutables’ package property (default -1, meaning the number of logical processors plus 2)
Data Flow Parallelism: ‘EngineThreads’ property (default 10)
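The effect of a cap on concurrently running tasks can be mimicked with a thread pool. The sketch below is an analogy in Python, not SSIS: the helper names are invented, and the only SSIS behaviour it models is that a setting of -1 resolves to the processor count plus two.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def effective_max_concurrent(setting: int) -> int:
    """Resolve a MaxConcurrentExecutables-style setting to a worker count."""
    if setting == -1:
        return (os.cpu_count() or 1) + 2   # -1 means processors plus 2
    if setting < 1:
        raise ValueError("setting must be -1 or a positive integer")
    return setting


def run_tasks(tasks: List[Callable[[], object]],
              max_concurrent_executables: int = -1) -> List[object]:
    """Run independent tasks in parallel, never more than the cap at once."""
    workers = effective_max_concurrent(max_concurrent_executables)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of the submitted tasks in its results.
        return list(pool.map(lambda task: task(), tasks))
```

Only tasks without precedence constraints between them can run side by side, so the gain depends on how the control flow is laid out as much as on the setting itself.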

38 Choosing between T-SQL and SSIS
Performance: T-SQL is processed within the SQL engine (faster); SSIS tasks are processed in the SSIS memory space; T-SQL MERGE is faster than the SCD task in SSIS
Multiple heterogeneous sources and destinations: SSIS is designed to work with different types of sources (easier)
Features: some features exist only in T-SQL or only in SSIS
Skill set: which one are you more familiar with?
Ease of maintenance: SSIS is graphical

39 A QUICK GLANCE AT MICROSOFT BI TOOLS

40 Microsoft BI Reporting Tools
PerformancePoint: high-level reporting; dashboards. Target audience: executives
Reporting Services (SSRS): high-volume static reporting. Target audience: end-users (daily operations); managers
Self-service Business Intelligence: ad hoc reporting. Target audience: business analysts and casual users. Tools: Pivot Table (part of Excel); Power Pivot (Excel add-in and SharePoint integration); Power View (Excel add-in and SharePoint integration); Power BI (part of Office 365, stand-alone)

41 Happy Customers

42 THANK YOU FOR YOUR ATTENTION!
Any questions?

43 Reference list
Books:
Implementing a Data Warehouse with Microsoft SQL Server 2012, D. Sarka, M. Lah, G. Jerkič, © 2012 by SolidQuality Europe GmbH
The Data Warehouse Toolkit, Third Ed., © 2013 by R. Kimball, M. Ross
