1
Business Intelligence
The Baker’s Dozen: 13 Productivity Tips for Data Warehouse Patterns in SQL Server/Business Intelligence
Kevin S. Goff, Microsoft SQL Server MVP
2
Kevin S. Goff: 30 seconds of Shameless Promotion
Developer/architect since 1987; Microsoft SQL Server MVP. Columnist for CoDe Magazine since 2004 (“The Baker’s Dozen” productivity series: 13 tips on a SQL/BI/DW topic). Wrote a book and collaborated on a second. Frequent speaker at SQL Server/SharePoint community events. My site/blog: Launched a new SQL/BI webcast/radio show: Releasing some SQL/BI video courseware in 2014.
12/2/2018 Data Warehousing w/SQL-BI
3
Overview/Objectives
Today: 13 tips for data warehousing with SQL and BI tools. Much of successful data warehousing is like running a medical or legal practice: there are many best/recommended practices, proven methodologies, and patterns. CoDe Magazine article on this: If you have not read this book, go out and buy it NOW! It is not tied to any one technology, and although it was written several years ago, about 99.9% of it is still relevant today (a third edition is also out). Amazon link: Kimball Group Website
4
Topics for today
1-Goals of a Data Warehouse/Analytic Database
2-Major Components of a Data Warehouse (Facts and Dimensions)
3-Cumulative Transactional Fact Tables
4-Factless Fact Tables
5-Periodic Snapshot Fact Tables
6-Dimension Tables in General
7-Role-Playing Dimensions
8-Junk Dimensions
9-Dimension Outriggers
10-Many-to-Many Bridge Relationships
11-Type 2 Slowly Changing Dimensions
12-Storing NULL Values in Fact Tables: DON'T!!!
13-Storing Ratios in Fact Tables: DON'T!!!
5
1-Goals of a Data Warehouse/Analytic Database
The customer (a manufacturer) wants to look at: customer orders in tons; material production in tons; defects (the count), irregulars, damages, and shortages; amount of material regraded in tons; regraded % with respect to regrade thresholds; and amount of material reworked in tons.
And they want to look at these numbers “by”: customer and order; material type, name, line, size, width, and thickness; responsible department; defect type; date/week/month/quarter/year; disposition; and regrade type.
What is our “end game”? You might build a relational data mart/data warehouse using denormalized star-schema models according to the Kimball Methodology. You might use self-service BI tools (Power Pivot and Power View) so that power users can “get at” the data. You might create analytic OLAP cubes or SSAS Tabular models from the data mart for more powerful/advanced analytics.
6
1-Goals of a Data Warehouse/Analytic Database
Fact/Dimension usage (key area): fact tables = measure groups. Note that a single date serves many purposes (roles). Note that material production is related to Reporting/Responsible Units “through” a Unit-to-Material-Line bridge table. Note that a business unit serves as either a Reporting or a Responsible Unit.
7
1-Goals of a Data Warehouse/Analytic Database
Common dimensionality
8
1-Goals of a Data Warehouse/Analytic Database
What do users want out of the data?
9
1-Goals of a Data Warehouse/Analytic Database
10
1-Goals of a Data Warehouse/Analytic Database
We will need to calculate percentages of aggregated data; there are some rules about this later.
11
1-Goals of a Data Warehouse/Analytic Database
All numbers are based on tallies (the number of instances where dimension member values come together to form an event).
12
1-Goals of a Data Warehouse/Analytic Database
13
1-Goals of a Data Warehouse/Analytic Database
Calculation of week number based on current date
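The slide does not say which week-numbering rule it uses. As a minimal sketch, assuming ISO-8601 weeks (weeks start on Monday, and week 1 contains the year's first Thursday), Python's standard library can derive the week number from the current date; the function name `iso_year_week` is hypothetical:

```python
from datetime import date

def iso_year_week(d=None):
    """Return the ISO-8601 (year, week) pair for d, defaulting to today's date."""
    d = d or date.today()
    iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return iso[0], iso[1]

print(iso_year_week(date(2018, 12, 2)))   # (2018, 48)
print(iso_year_week(date(2018, 12, 31)))  # (2019, 1): the ISO year can differ from the calendar year
```

Many businesses use fiscal or Sunday-start weeks instead. Whichever rule you pick, computing it once in the date dimension keeps every report consistent.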
14
1-Goals of a Data Warehouse/Analytic Database
Lowest level of detail
15
1-Goals of a Data Warehouse/Analytic Database
People want drilldown!!! A user might want to know: for these 29 regrades, what is all the underlying detail data? The cube (together with Excel) lets the user right-click on the measure (either the regrade tons or the count) and, under “Additional Actions”, drill through to the lowest level. That launches a second Excel sheet (see next slide) with all the underlying detail.
16
1-Goals of a Data Warehouse/Analytic Database
Lowest-level details for the 29 regrades in August for Caster/CC1. The user can scroll to the right for more details. Drilldown can be one of the most important actions a user needs. Don’t overlook this!!!
17
1-Goals of a Data Warehouse/Analytic Database
Fact tables represent “what happened”, with measurements we can aggregate. We shape each business activity into a fact table and related dimension structures; sometimes it is best to prototype a few at a time. The key point in Dimensional Modeling is the relationships between fact and dimension tables. Sometimes these are very easy and clean, and sometimes they are more complicated. Dimension tables provide business context to fact tables. Loosely speaking, they are the “business master tables”.
18
1-Goals of a Data Warehouse/Analytic Database
Role-playing relationship: a date can serve multiple roles in a fact table. Self-join relationship (often seen in organization hierarchies).
19
1-Goals of a Data Warehouse/Analytic Database
Each single fact table might seem easy, but things get more involved when we want to analyze measures across fact tables whose dimensionality/grain is not identical.
20
1-Goals of a Data Warehouse/Analytic Database
21
1-Goals of a Data Warehouse/Analytic Database
We need trend-based calculations.
22
1-Goals of a Data Warehouse/Analytic Database
Key Performance Indicators
23
1-Goals of a Data Warehouse/Analytic Database
Key Performance Indicators: we need to define rules and data for thresholds/goals.
24
2-Major Components of a Data Warehouse
What’s the overall story? Everyone should have this.
25
2-Major Components of a Data Warehouse
10 different companies could have 7 different variations. Some companies might use OLAP (or other analytic databases); others might not. A data warehouse/data mart is a relational database; it just might not be normalized. In most Kimball databases, data is “flattened” and denormalized. A topology-like picture is also needed. There could be an operational data store (ODS) in between. Often this is reversed: data marts are used to accumulate a DW, not vice versa. There may also be rules for what rows a user can see.
26
2-Major Components of a Data Warehouse
A topology-like picture is also needed.
27
2-Major Components of a Data Warehouse
Data warehouses consist of two main elements: fact tables and dimension tables. Again, these are relational tables. Fact tables contain measures that businesses aggregate/evaluate. Dimension tables provide business context for the facts; loosely speaking, dimensions are often the “master tables” from OLTP systems. Facts are related to dimensions in PK/FK relationships with integer surrogate keys, a big paradigm shift from the OLTP/normalized platform. The measures in a fact table share a common “grain” (dimension granularity). The process of identifying facts/dimensions and establishing direct (or indirect) relationships between them is what we call Dimensional Modeling.
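As a minimal sketch of the fact/dimension shape described above: a dimension with an integer surrogate key plus a business key, and a fact table holding only integer foreign keys and numeric measures. The table and column names are hypothetical, and the sketch uses SQLite through Python's built-in sqlite3 module rather than SQL Server, so the DDL is SQLite dialect:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DimProduct (
    ProductKey   INTEGER PRIMARY KEY,  -- surrogate integer key
    ProductCode  TEXT NOT NULL,        -- business key from the OLTP system
    Category     TEXT NOT NULL         -- attribute users slice by
);
CREATE TABLE FactSales (
    DateKey      INTEGER NOT NULL,     -- e.g. 20180801
    ProductKey   INTEGER NOT NULL REFERENCES DimProduct(ProductKey),
    SalesAmount  REAL NOT NULL,        -- numeric measures only
    UnitsSold    INTEGER NOT NULL
);
INSERT INTO DimProduct VALUES
    (1, 'SKU-100', 'Steel'), (2, 'SKU-200', 'Steel'), (3, 'SKU-300', 'Aluminum');
INSERT INTO FactSales VALUES
    (20180801, 1, 100.0, 10), (20180801, 2, 50.0, 5), (20180802, 3, 75.0, 3);
""")

# Aggregate the fact measures "by" a dimension attribute through the PK/FK join.
rows = con.execute("""
    SELECT d.Category, SUM(f.SalesAmount), SUM(f.UnitsSold)
    FROM FactSales AS f
    JOIN DimProduct AS d ON d.ProductKey = f.ProductKey
    GROUP BY d.Category
    ORDER BY d.Category
""").fetchall()
print(rows)  # [('Aluminum', 75.0, 3), ('Steel', 150.0, 15)]
```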
28
2-Major Components of a Data Warehouse
From the Kimball methodology: the Data Warehouse Dimension Usage Matrix (“BUS” architecture). It reflects the fact tables across different business processes (the “value chain”) and their intersection points with dimensions.
29
2-Major Components of a Data Warehouse
So before you begin, make sure you’ve shaped your data into star-schema fact/dimension tables, using surrogate integer keys. Fact tables should ideally contain only numeric measures (dollars, units sold) and foreign-key integer values that relate to business dimension master tables. Database engine features like xVelocity and the columnstore index can optimize these structures. Recommendation: use the Kimball Methodology. Read this book, and read it again, and again! The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling.
30
2-Major Components of a Data Warehouse
From the Kimball methodology: the Data Warehouse Dimension Usage Matrix (“BUS” architecture). (Image above from the Ralph Kimball book.) This is a key up-front deliverable: it helps communication about both project management and technical design. Next, let’s look at fact tables: cumulative transactional, factless, and periodic snapshot.
31
3-Cumulative Transactional Fact Tables
Measures are fully additive across all related dimensions. Each fact table must have a fully understood “grain statement” (level of detail, level of granularity). Sometimes facts are at a very low level (product SKU, ship-to account), and sometimes much higher (by region, by market, by month, etc.). These tables are populated by ETL processes that run daily, weekly, monthly, or even throughout the day. Some people might store an identity column in a fact table. The best fact tables contain only numeric data (which suits the columnstore index in SQL Server 2012).
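“Fully additive” means that a measure can be summed along any related dimension and every roll-up reconciles to the same grand total. A minimal sketch with hypothetical fact rows (one row per date, product, and region):

```python
from collections import defaultdict

# Hypothetical transactional fact rows at the grain: one row per date, product, region.
facts = [
    {"date": "2018-08-01", "product": "A", "region": "East", "sales": 100.0},
    {"date": "2018-08-01", "product": "B", "region": "West", "sales": 40.0},
    {"date": "2018-08-02", "product": "A", "region": "West", "sales": 60.0},
    {"date": "2018-08-02", "product": "B", "region": "East", "sales": 25.0},
]

def roll_up(rows, dimension):
    """Sum the sales measure by a single dimension attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["sales"]
    return dict(totals)

grand_total = sum(r["sales"] for r in facts)
for dim in ("date", "product", "region"):
    by_dim = roll_up(facts, dim)
    # Fully additive: every roll-up reconciles to the same grand total.
    assert sum(by_dim.values()) == grand_total
    print(dim, by_dim)
```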
32
4-Factless Fact Tables
A special type of transactional fact table. Each row represents an “event” (such as a person signing up for a course, sold by a sales rep). We are looking to tally the number of instances where dimensions come together. In some databases, the roll-up of these tallies can be very critical!!!
33
4-Factless Fact Tables
Tally of visits for a health care provider by year, gender, visit type, and physician group.
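A factless fact table has dimension keys but no measures, so the “measure” is simply a row count. A minimal sketch of the visit tally, using hypothetical visit rows; in SQL this is `COUNT(*)` with `GROUP BY`:

```python
from collections import Counter

# Hypothetical factless fact rows: one row per visit, only dimension values, no measures.
visits = [
    (2017, "F", "Office", "Cardiology"),
    (2017, "M", "Office", "Cardiology"),
    (2017, "F", "Office", "Cardiology"),
    (2018, "F", "Telehealth", "Pediatrics"),
    (2018, "M", "Office", "Pediatrics"),
]

# Tally every combination of year, gender, visit type, and physician group.
by_all = Counter(visits)
# Roll the same tallies up to a coarser level: year and visit type only.
by_year_type = Counter((year, visit_type) for year, _gender, visit_type, _group in visits)

print(by_all[(2017, "F", "Office", "Cardiology")])  # 2
print(by_year_type)
```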
34
4-Factless Fact Tables
35
5-Periodic Snapshot Fact Tables
Populated on some interval/period. Measures represent a “point in time” count, balance, or value. Unlike transactional fact tables, the measures in snapshot fact tables are SEMI-ADDITIVE: they can be rolled up by some dimensions, and perhaps averaged across others, but they are NOT fully additive/fully aggregatable. End-of-month or end-of-period ETL processes load these tables. A variation discussed in the Kimball methodology is the accumulating snapshot fact table: think of a mortgage fact table, with dollar values and dates for initial approval, underwriting approval, and final approval.
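To make “semi-additive” concrete, here is a minimal sketch with a hypothetical month-end inventory snapshot. Balances add up across warehouses, but across time you take the closing (last) value or an average, because summing months double-counts the same stock:

```python
from collections import defaultdict

# Hypothetical month-end inventory snapshot: (month, warehouse, units_on_hand).
snapshot = [
    ("2018-07", "North", 100), ("2018-07", "South", 50),
    ("2018-08", "North", 120), ("2018-08", "South", 60),
]

# Additive across warehouses: total units on hand at each month end.
by_month = defaultdict(int)
for month, _warehouse, units in snapshot:
    by_month[month] += units

# NOT additive across time: summing months counts the same stock twice.
wrong_total = sum(by_month.values())               # 330 (meaningless)
closing = by_month[max(by_month)]                  # last period's balance: 180
average = sum(by_month.values()) / len(by_month)   # average balance: 165.0
print(dict(by_month), wrong_total, closing, average)
```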
36
6-Dimension Tables
General contents of dimension tables. Many people use a script to create them. A dimension can have multiple hierarchies (fiscal as well), and an attribute might be a range of values. Dimensions represent the context by which users want to aggregate or “slice and dice” data. Each dimension should have a surrogate integer key, a business key, one or more descriptions, and one or more attributes (which might form one or more parent-child hierarchy relationships). ETL rules populate them.
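As a minimal sketch of the kind of script that builds a date dimension with both a calendar and a fiscal hierarchy: the fiscal year is assumed to start July 1 and to be named after the calendar year in which it ends. Both are assumptions, and the column names are hypothetical:

```python
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 7  # assumption: fiscal year starts July 1

def build_date_dimension(start, end):
    """Generate one row per day with a YYYYMMDD surrogate key and two hierarchies."""
    rows = []
    d = start
    while d <= end:
        # Assumption: the fiscal year is named for the calendar year in which it ends.
        fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
        fiscal_month = (d.month - FISCAL_YEAR_START_MONTH) % 12 + 1
        rows.append({
            "DateKey": d.year * 10000 + d.month * 100 + d.day,  # surrogate integer key
            "FullDate": d,                                      # business key
            "DayName": d.strftime("%A"),                        # description
            # Calendar hierarchy: Year > Quarter > Month > Date
            "CalendarYear": d.year,
            "CalendarQuarter": (d.month - 1) // 3 + 1,
            "CalendarMonth": d.month,
            # Fiscal hierarchy: FiscalYear > FiscalQuarter > FiscalMonth > Date
            "FiscalYear": fiscal_year,
            "FiscalQuarter": (fiscal_month - 1) // 3 + 1,
            "FiscalMonth": fiscal_month,
        })
        d += timedelta(days=1)
    return rows

dim_date = build_date_dimension(date(2018, 6, 30), date(2018, 7, 1))
for row in dim_date:
    print(row["DateKey"], row["CalendarYear"], row["FiscalYear"], row["FiscalQuarter"])
```

In practice the same loop would run over many years, and the rows would be bulk-loaded into the dimension table.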
37
6-Dimension Tables
Snowflake dimension schemas. As a general rule, build fact-dimension relationships in flat, denormalized structures (star schema). Sometimes, however, the repetition of data might be so high that you make an exception and normalize one or more dimensions (a snowflake schema based on a dimension outrigger). Snowflake schemas are not “horrible”, but they can introduce complications (sometimes minor) in ETL processes and for end users who build reports against a snowflake model.
38
7-Role-Playing Dimension Relationships
A single dimension might serve multiple purposes, or “roles”, in a fact table. Example: an order might have an order date, a due date, a ship date, etc. There is no need to create three versions of a date dimension: create just one, with three relationships. Products like Analysis Services will automatically create three “views” into the date dimension. I once had a client with SIX roles!!!
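A minimal sketch of the role-playing idea, with hypothetical rows: one physical date dimension, and a fact row with three foreign keys into it. Each role is just a separate lookup against the same table. In SQL, this is the same date table joined three times under different aliases:

```python
# One physical date dimension (hypothetical rows keyed by YYYYMMDD surrogate keys).
dim_date = {
    20180801: {"date": "2018-08-01", "month": "Aug 2018", "weekday": "Wednesday"},
    20180803: {"date": "2018-08-03", "month": "Aug 2018", "weekday": "Friday"},
    20180810: {"date": "2018-08-10", "month": "Aug 2018", "weekday": "Friday"},
}

# One fact row carries three foreign keys into that same dimension.
order = {"OrderDateKey": 20180801, "ShipDateKey": 20180803, "DueDateKey": 20180810, "Amount": 500.0}

# Each role is just a different join (lookup) against the same table.
roles = {"Order": "OrderDateKey", "Ship": "ShipDateKey", "Due": "DueDateKey"}
resolved = {role: dim_date[order[fk]] for role, fk in roles.items()}
for role, attrs in resolved.items():
    print(f"{role} date: {attrs['date']} ({attrs['weekday']})")
```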
39
7-Role-Playing Dimension Relationships
Limitation in SSAS Tabular! You must either create three views of the Date dimension, or extend the Tabular model with DAX formulas that activate the inactive ship-date relationship, for example:
Sum of ShipSalesAmount Reseller := CALCULATE( SUM( ResellerSales[SalesAmount] ), USERELATIONSHIP( 'Date'[DateKey], ResellerSales[ShipDateKey] ) )
40
7-Role-Playing Dimension Relationships
SSAS OLAP handles this natively. One single date can be used as an invoice date, a paid date, or a PO date in a transactional spending table. An account might be the invoicing account or the PO account. A department might be the invoicing department or the PO department.
41
8-Junk Dimensions
This “works”, but we need to maintain several small tables (5 rows, 4 rows, 6 rows, and 7 rows). If you have several dimensions that each contain a small number of rows, consider creating a Cartesian product of them in a single junk dimension (here, 5 × 4 × 6 × 7 = 840 rows). This is cleaner, and users can still aggregate and slice sales by any of the attributes. There is no “absolute rule”; it is more of a judgment call.
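A minimal sketch of building the junk dimension as a Cartesian product with `itertools.product`. The four attribute lists are hypothetical stand-ins sized 5, 4, 6, and 7 to match the row counts on the slide:

```python
from itertools import product

# Hypothetical small "flag" dimensions with 5, 4, 6 and 7 members.
payment_types = ["Cash", "Check", "Credit", "Debit", "Wire"]
order_channels = ["Web", "Phone", "Store", "EDI"]
ship_modes = ["Ground", "2-Day", "Overnight", "Freight", "Pickup", "International"]
priorities = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]

# Cartesian product: one junk dimension row per combination, with a surrogate key.
junk_dim = [
    {"JunkKey": key, "PaymentType": p, "OrderChannel": c, "ShipMode": s, "Priority": r}
    for key, (p, c, s, r) in enumerate(
        product(payment_types, order_channels, ship_modes, priorities), start=1)
]
print(len(junk_dim))  # 840 = 5 * 4 * 6 * 7

# Lookup the ETL can use to stamp each fact row with a single JunkKey.
junk_lookup = {(r["PaymentType"], r["OrderChannel"], r["ShipMode"], r["Priority"]): r["JunkKey"]
               for r in junk_dim}
print(junk_lookup[("Cash", "Web", "Ground", "P1")])  # 1
```

The fact table then carries one `JunkKey` column instead of four separate foreign keys.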
42
9-Dimension Outriggers
The image above is straight from the Kimball dimensional modeling book. You might have millions of customers that fall into hundreds of counties, and each county has a number of attributes specific to that county. If we stored the county attributes directly in the customer dimension, the same county values would be repeated across millions of customer rows, even though they vary only by county, not by customer. An outrigger moves those attributes into a separate county table that the customer dimension references. This can happen, but it isn’t common.
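A minimal sketch of an outrigger, with hypothetical rows: the county attributes are stored once, the customer dimension keeps only a `CountyKey`, and the attributes are resolved through that second hop. The fact table still references only the customer key:

```python
# Hypothetical county outrigger: county attributes are stored once per county.
dim_county = {
    1: {"CountyName": "County A", "Region": "North", "DemographicBand": "Urban"},
    2: {"CountyName": "County B", "Region": "South", "DemographicBand": "Rural"},
}

# Customer dimension rows carry only a CountyKey, not the county attributes.
dim_customer = {
    1001: {"CustomerName": "Customer 1", "CountyKey": 1},
    1002: {"CustomerName": "Customer 2", "CountyKey": 1},
    1003: {"CustomerName": "Customer 3", "CountyKey": 2},
}

def customer_with_county(customer_key):
    """Resolve a customer plus its outrigger attributes (customer -> county)."""
    customer = dim_customer[customer_key]
    return {**customer, **dim_county[customer["CountyKey"]]}

print(customer_with_county(1002))
```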
43
10-Many-to-many Bridge Relationships
This is one of the more complicated model relationships, and it often involves rates or ratios. Consider another example: a book could be written by multiple authors (who each contribute a percentage), and an author can write multiple books.
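A minimal sketch of allocating book-grain sales to authors through a bridge table with share percentages. The books, authors, shares, and amounts are all hypothetical. Because each book's shares sum to 1, the author totals reconcile to total sales. Without the shares, each multi-author book would be counted once per author:

```python
from collections import defaultdict

# Hypothetical bridge table: (book, author, share of the book's sales).
books_x_authors = [
    ("Book 1", "Author X", 0.5), ("Book 1", "Author Y", 0.5),
    ("Book 2", "Author X", 1.0),
    ("Book 3", "Author Y", 0.25), ("Book 3", "Author Z", 0.75),
]
# Fact sales are at the book grain; they cannot be sliced by author directly.
book_sales = {"Book 1": 200.0, "Book 2": 80.0, "Book 3": 400.0}

# Walk the bridge: each author gets book sales multiplied by that author's share.
author_dollars = defaultdict(float)
for book, author, share in books_x_authors:
    author_dollars[author] += book_sales[book] * share

print(dict(author_dollars))  # {'Author X': 180.0, 'Author Y': 200.0, 'Author Z': 300.0}
```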
44
10-Many-to-many Bridge Relationships
Another example: we want to look at sales by day, but across different currencies for different countries, where the exchange rate varies by day.
45
10-Many-to-many Bridge Relationships
Suppose the bridge table represents conversion rates from a base volumetric unit (Lbs) to other units of measure. FactShipments cannot be sliced directly by unit of measure, but it can be related to a fact table (FactUOMConversionRates) that is also related to a common dimension: product. The bridge table lets us take core shipments for a given product and apply the conversion rate for whatever unit of measure we wish to view.
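A minimal sketch of the conversion bridge. Shipments are stored once in the base unit (Lbs), and a per-product rate table converts them to the requested unit at query time. The products, units, and rates are hypothetical; the variable names merely mirror FactShipments and FactUOMConversionRates from the slide:

```python
# Hypothetical shipments at the fact grain, always recorded in the base unit (Lbs).
fact_shipments = [("Product 1", 2000.0), ("Product 1", 1000.0), ("Product 2", 4000.0)]

# Hypothetical conversion rates from Lbs to other units, by product (the bridge).
fact_uom_conversion_rates = {
    ("Product 1", "Lbs"): 1.0, ("Product 1", "Tons"): 0.0005, ("Product 1", "Cases"): 0.02,
    ("Product 2", "Lbs"): 1.0, ("Product 2", "Tons"): 0.0005, ("Product 2", "Cases"): 0.01,
}

def shipped_in(uom):
    """Total shipments expressed in the requested unit of measure, product by product."""
    totals = {}
    for product, lbs in fact_shipments:
        rate = fact_uom_conversion_rates[(product, uom)]  # joined via the common product
        totals[product] = totals.get(product, 0.0) + lbs * rate
    return totals

print(shipped_in("Tons"))   # {'Product 1': 1.5, 'Product 2': 2.0}
print(shipped_in("Cases"))  # {'Product 1': 60.0, 'Product 2': 40.0}
```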
46
10-Many-to-many Bridge Relationships
Limitation in SSAS Tabular! SSAS Tabular does not natively support bridge-table (many-to-many) relationships. To express author dollars (using each author's share), you must implement a DAX calculation:
AuthorDollars := SUMX( 'DimBookPrice', CALCULATE( SUM( 'FactBookSales'[SalesDollars] ) * SUM( 'BooksXAuthors'[AuthorShare] ) ) )
47
10-Many-to-many Bridge Relationships
SSAS OLAP handles this natively. Regrades are by Responsible/Reporting Unit, and material production is by line. We want to look at regrades and the associated production tons. We have an M2M bridge table between material line and unit (one line can span many units, and a unit can span many lines). We tell the relationship editor that we look at material production “by” unit “through” the M2M bridge table. Now we have “common dimensionality”.
48
11-Type 2 Slowly Changing Dimensions
When dimension attributes change and we care about tracking the history associated with the change, this is known as a Type 2 Slowly Changing Dimension. In the book sales application, we might have a FactSales table and want to track sales of a book based on its historical price point. Any time a book price changes, we “retire” the dimension row that has been the “current row” (by setting an end date) and then insert a new dimension row. When we write out the fact row, we use the BookPK that is “in effect” at the time of the sale: the surrogate key from the dimension goes into the fact table, based on the effective date of the sale with respect to the StartDate and EndDate in the dimension table. This allows the fact data to be easily joined to the correct dimension data for the corresponding effective date. It also allows us to report on sales by the book as a whole, or based on the book’s sales history.
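A minimal sketch of the two Type 2 operations described above: retiring the current row and inserting a new one, then looking up the BookPK in effect on a sale date. Two conventions here are assumptions. The EndDate is treated as exclusive, and a 9999-12-31 sentinel marks the current row. The business key and prices are hypothetical. Because the lookup searches by date range, it also handles a late-arriving sale:

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumption: sentinel EndDate marking the current row

# Hypothetical book dimension: BookPK is the surrogate key, BookID the business key.
dim_book = [
    {"BookPK": 1, "BookID": "BK-001", "Price": 39.99,
     "StartDate": date(2018, 1, 1), "EndDate": OPEN_END},
]

def change_price(book_id, new_price, effective):
    """Type 2 change: retire the current row, then insert a new current row."""
    current = next(r for r in dim_book if r["BookID"] == book_id and r["EndDate"] == OPEN_END)
    current["EndDate"] = effective  # retire the old row (EndDate is exclusive here)
    new_row = {**current, "BookPK": max(r["BookPK"] for r in dim_book) + 1,
               "Price": new_price, "StartDate": effective, "EndDate": OPEN_END}
    dim_book.append(new_row)
    return new_row["BookPK"]

def book_pk_in_effect(book_id, sale_date):
    """Surrogate key whose [StartDate, EndDate) range covers the sale date."""
    for r in dim_book:
        if r["BookID"] == book_id and r["StartDate"] <= sale_date < r["EndDate"]:
            return r["BookPK"]
    raise LookupError(f"no dimension row for {book_id} on {sale_date}")

change_price("BK-001", 29.99, date(2018, 6, 1))
print(book_pk_in_effect("BK-001", date(2018, 3, 15)))  # 1 (old price point)
print(book_pk_in_effect("BK-001", date(2018, 8, 1)))   # 2 (new price point)
```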
49
11-Type 2 Slowly Changing Dimensions
Items to take into account when designing a Type 2 SCD: clearly defining the business process; accounting for all necessary columns and relationships in the data model; capturing the change to the attribute (using database triggers, Change Data Capture, or the SSIS SCD task); determining the correct dimension surrogate key to use when populating the fact table; and dealing with early- and late-arriving data (for example, a row posted into a dimension table for a change that won’t take effect for another month, or late-arriving sales data that occurred months earlier, where we need to determine the correct product PK “at that point in time”).
50
12-Storing NULL values in Fact Tables – DON’T!!!
Suppose you have a fact table with sales measures and a CostCenterFK (which relates to a CostCenterPK in a Cost Center master table). Suppose that on 5% of the sales rows, there is no cost center. Although databases will optionally permit it, you should NOT store a NULL for the foreign key! This is a very bad idea. Instead, store an “Unclassified Cost Center” or “N/A Cost Center” row in the Cost Center master (perhaps with a key of -1), and use that key in the fact table. This allows users to aggregate sales by the valid cost centers and also see the sales where there was no cost center.
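A minimal sketch of the unknown-member pattern, with hypothetical data. The ETL substitutes the -1 key for a missing cost center, so every fact row joins to the master table and the unassigned sales still appear in the totals. With a NULL foreign key, an inner join would silently drop those sales:

```python
from collections import defaultdict

UNKNOWN_COST_CENTER = -1  # key of the "N/A Cost Center" row in the master table

dim_cost_center = {
    UNKNOWN_COST_CENTER: "N/A Cost Center",
    100: "Manufacturing",
    200: "Sales",
}

# Hypothetical source rows: some arrive with no cost center (None).
source_sales = [(100, 500.0), (200, 300.0), (None, 50.0), (100, 150.0)]

# ETL: substitute the unknown-member key so the fact FK is never NULL.
fact_sales = [(cc if cc is not None else UNKNOWN_COST_CENTER, amt) for cc, amt in source_sales]

by_cost_center = defaultdict(float)
for cc_key, amount in fact_sales:
    by_cost_center[dim_cost_center[cc_key]] += amount  # every key resolves in the master

print(dict(by_cost_center))
# {'Manufacturing': 650.0, 'Sales': 300.0, 'N/A Cost Center': 50.0}
```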
51
13-Storing ratios in Fact Tables – DON’T!!!
In fact tables, you can store measures that are either fully additive (sales) or semi-additive (end-of-month inventory count). You can also store measures derived from simple math (Net Revenue = Gross Revenue less Returns, less Damages, etc.). But DON’T store measures that represent percentages or ratios derived from division; calculate them “on the fly”. Why? Because they won’t aggregate to any sensible value. For instance: Store A has $10 in returns and $20 in sales (a returns % of 50%); Store B has $10 in returns and $100 in sales (a returns % of 10%). We want to roll up the returns % for the region, and we obviously want a “weighted” returns %. Averaging the two stored percentages gives 30%, but the true regional returns % is $20 / $120 ≈ 16.7%. Storing the returns % is meaningless when we want to aggregate/roll up. Best practice: store the numbers that represent the numerator and denominator, and calculate the ratio “on the fly”.
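The store example can be checked directly. This minimal sketch compares averaging stored ratios against aggregating the numerator and denominator first and then dividing:

```python
# Store-level numbers from the example: (returns, sales).
stores = {"Store A": (10.0, 20.0), "Store B": (10.0, 100.0)}

# Wrong: averaging stored ratios ignores each store's sales volume.
stored_ratios = [returns / sales for returns, sales in stores.values()]
naive_pct = sum(stored_ratios) / len(stored_ratios)

# Right: aggregate numerator and denominator first, then divide "on the fly".
total_returns = sum(r for r, _ in stores.values())
total_sales = sum(s for _, s in stores.values())
weighted_pct = total_returns / total_sales

print(f"average of stored ratios: {naive_pct:.1%}")    # 30.0%
print(f"weighted (correct) ratio: {weighted_pct:.1%}")  # 16.7%
```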
52
Book recommendation
The best book available on the Tabular Model. Get this book, read it, and read it again. The “Three Wise Men” of SSAS Tabular and SSAS OLAP: just as Mosha Pasumansky was the MDX expert, Chris Webb, Alberto Ferrari, and Marco Russo are the Tabular and DAX gurus. Blogs for all three authors (who have written a great deal on both SSAS OLAP and SSAS Tabular): Chris Webb's Blog, Alberto Ferrari's Blog, Marco Russo's Blog.
53
Book recommendation for SSAS OLAP
The best book available on SSAS. Amazon link: