INNOVATIVE DATA MODELING Make Data Warehousing Cool Again

1 INNOVATIVE DATA MODELING: Make Data Warehousing Cool Again
Leslie Weed, Architect, RevGen Partners

2 Leslie Weed
Architect, RevGen Partners
/leslieweedsql @weederbug
All those Data things: While starting as an app developer 20 years ago, I quickly navigated to the data space and have enjoyed every minute of it.
Colorado: Love living right next to some of the best parts of the Rocky Mountains. Enjoying both sun and snow.
Data Modeling is Fun: The best part of the job is organizing data and helping others organize their data for great performance and usage.

3 Keyed Instance and Reference Tables
Hello Data Vault
Tables and Rules
Business Keys
Raw and Business Layer
Keyed Instance and Reference Tables

4 Problems in Data Warehouses
Takes too long to build
Once it is up, it is hard to add to or modify
It simply hasn't been maintained and is outdated
No History/Archive/Storage Plan
No well-defined usage (datamart vs views vs tabular vs reporting)
There has got to be a better way

5 Hello Data Vault

6 Ensemble Patterns
Ensemble: Focal Point, Anchor, Data Vault, Your Style
DV 2.0, Hyper Agility, Temporal
Image concept from

7 Enterprise Data Warehouse
Diagram labels: Sources, Stage, Data Warehouse (EDW: Raw, BDV), Data Marts, Cubes, Reports
Also known as decomposed modeling
Different flavors of ensemble modeling
Strong rules
Image concept from

8 Enterprise Data Warehouse
Diagram labels: Sources, Stage, Data Lake, Abstraction Layer, Data Warehouse (EDW: Raw, BDV), Data Marts, Cubes, Reports
Also known as decomposed modeling
Different flavors of ensemble modeling
Strong rules
Image concept from

9 Ensembles
Defines an associated set of data
Holds the Core Business Concepts of Event, Person, Thing, Place, Concept
Breaking a Unit of Work apart will cause associations between source system entities to be lost
Example ensembles: Player, Game, Season

10 Ensemble Modeling/Data Vault
The Data Vault is a detail-oriented, historical-tracking and uniquely linked set of normalized tables that support one or more functional areas of business.
The design is flexible, scalable, consistent and adaptable to the needs of the enterprise.
Extremely Agile (iterative and incremental) in nature
Strongly pattern-based, which suits automated builds and works well with BIML

11 Data Vault Pros
Better real-time load capabilities - mostly inserts
Incremental builds = Easy
Provides audit history and traceability
The ability to respond to changes rapidly in your physical model
Iterative development
Keeping control of and reporting on data quality issues

12 Data Vault Cons
Con: It is suggested that the extra joins introduced with Data Vault modeling will impact query performance.
Response: Depends on size, hardware, database and indexing strategy.
Con: Ad hoc reporting is difficult.
Response: Use views or another abstraction layer concept.
Con: Two data warehouses - twice the cost?
Response: With well-defined usage and purpose, the longevity of the systems quickly outruns the cost of implementation. The BDV is NOT a full duplication of the RDV.

13 Tables and Rules

14 Terms you need to know and some rules
SK (or PK or SQN, e.g. CustomerSK) = Surrogate Key
LDTS = Load Date Time Stamp
LEDTS = Load End Date Time Stamp
RS = Record Source

15 Data Vault Objects
Hubs – Ensemble identifiers
Links – Relationships
Satellites – Descriptive information

16 Ensembles and Relationships
Elements: Hub, Link, Satellite
Hubs (each with Satellites): Player, Season, Game
Link: records a history of the interaction
Image from LearnDataVault.com; Dan Linstedt

17 Hub
A hub is based on an identifiable business element
An identifiable business element is an attribute that is used in the source systems to locate data, otherwise known as an ensemble identifier
The ensemble identifier has a very low propensity to change, and usually is not editable on the source systems
Hubs are loaded first – they are the matcher

18 Example: Finding the Ensemble Identifier
TEAM_ID  TEAM_ABBREV  TEAM_NAME      TEAM_NICKNAME
323      Atl          Atlanta        Falcons
324      Buf          Buffalo        Bills
325      Hou          Houston        Texans
326      Chi          Chicago        Bears
327      Cin          Cincinnati     Bengals
329      Cle          Cleveland      Browns
331      Dal          Dallas         Cowboys
332      Den          Denver         Broncos
334      Det          Detroit        Lions
335      GB           Green Bay      Packers
336      Ten          Tennessee      Titans
338      Ind          Indianapolis   Colts
339      KC           Kansas City    Chiefs
341      Oak          Oakland        Raiders
343      StL          St. Louis      Rams
345      Mia          Miami          Dolphins
347      Min          Minnesota      Vikings
348      NE           New England    Patriots
350      NO           New Orleans    Saints
351      NYG          New York       Giants
352      NYJ          New York       Jets
354      Phi          Philadelphia   Eagles
355      Ari          Arizona        Cardinals
356      Pit          Pittsburgh     Steelers
357      SD           San Diego      Chargers
359      SF           San Francisco  49ers
361      Sea          Seattle        Seahawks
362      TB           Tampa Bay      Buccaneers
363      Was          Washington     Redskins
364      Car          Carolina       Panthers
365      Jac          Jacksonville   Jaguars
366      Bal          Baltimore      Ravens
The ensemble identifier can be thought of as the business key

19 HUB Example
TeamSK  TeamNickName  LDTS             RS
1       Falcons       1/14/13 9:18 PM  STATS Sports Database
2       Bills
3       Texans
4       Bears
5       Bengals
6       Browns
7       Cowboys
8       Broncos
Ensemble Identifier, Business Key, Load Date, Record Source are mandatory
All attributes in the business key form a UNIQUE index
NEVER directly join a HUB to another HUB table
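The hub rules above (mandatory surrogate key, business key, LDTS and RS; a unique index on the business key; insert-only loading) can be sketched roughly as follows. This is a minimal illustration, not the presenter's load package: it uses Python's built-in sqlite3 as a stand-in for SQL Server, and the function name `load_hub_team` and the ISO timestamp format are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE HubTeam (
        TeamSK       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        TeamNickName TEXT NOT NULL UNIQUE,               -- business key, unique index
        LDTS         TEXT NOT NULL,                      -- load date time stamp
        RS           TEXT NOT NULL                       -- record source
    )
""")

def load_hub_team(conn, nicknames, record_source):
    """Insert only business keys the hub has not seen yet (hypothetical helper).

    Returns the number of new hub rows.
    """
    ldts = datetime.now(timezone.utc).isoformat()
    before = conn.execute("SELECT COUNT(*) FROM HubTeam").fetchone()[0]
    conn.executemany(
        """
        INSERT INTO HubTeam (TeamNickName, LDTS, RS)
        SELECT ?, ?, ?
        WHERE NOT EXISTS (SELECT 1 FROM HubTeam WHERE TeamNickName = ?)
        """,
        [(n, ldts, record_source, n) for n in nicknames],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM HubTeam").fetchone()[0]
    return after - before

# First load brings in three keys; the second load only adds the unseen one.
first_load = load_hub_team(conn, ["Falcons", "Bills", "Texans"], "STATS Sports Database")
second_load = load_hub_team(conn, ["Bills", "Bears"], "STATS Sports Database")
hub_rows = conn.execute("SELECT TeamSK, TeamNickName FROM HubTeam ORDER BY TeamSK").fetchall()
```

Because existing keys are never updated, rerunning the same load is harmless, which is one reason hubs can be loaded first and act as the matcher.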

20 Link
A Link is an association of two or more business keys
It is based on identifiable business element relationships
It can contain Hub keys and other Link keys
A Link's business key is a composite unique index
A Link records a history of the interaction

21 Link Example
Sequence Number, Business Key, Load Date, Record Source are mandatory
The relationship shouldn't change over time. It is established as a fact that occurred at a specific point in time and will remain that way forever
Diagram: link TEAM_GAME (LNK) connecting hubs SEASON, TEAM, and TEAM (Opponent)
TeamGameSK  GameDate   SeasonSK  TeamSK  OpponentSK  LDTS             RS
1           9/27/2012  33        6       32          1/15/13 7:11 PM  STATS Sports Database
2 3 9/30/2012 18 4 5 11 7 13 25 8
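A link like the one above can be sketched as a table whose hub keys form a composite unique index, so the same relationship is recorded once and hubs are only ever joined through the link. The sketch below uses sqlite3 as a stand-in for SQL Server; the trimmed-down `HubTeam`, the helper `load_link`, the ISO date strings, and the assumption that TeamSK 32 is the Ravens are all illustrative, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HubTeam (
        TeamSK       INTEGER PRIMARY KEY,
        TeamNickName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE LnkTeamGame (
        TeamGameSK INTEGER PRIMARY KEY AUTOINCREMENT,
        GameDate   TEXT NOT NULL,
        SeasonSK   INTEGER NOT NULL,
        TeamSK     INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
        OpponentSK INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
        LDTS       TEXT NOT NULL,
        RS         TEXT NOT NULL,
        UNIQUE (GameDate, SeasonSK, TeamSK, OpponentSK)  -- composite business key
    );
    -- TeamSK 32 = Ravens is an assumption for this example.
    INSERT INTO HubTeam VALUES (6, 'Browns'), (32, 'Ravens');
""")

def load_link(conn, row):
    """Insert the relationship once; row = (GameDate, SeasonSK, TeamSK, OpponentSK, LDTS, RS)."""
    before = conn.execute("SELECT COUNT(*) FROM LnkTeamGame").fetchone()[0]
    conn.execute(
        """
        INSERT INTO LnkTeamGame (GameDate, SeasonSK, TeamSK, OpponentSK, LDTS, RS)
        SELECT ?, ?, ?, ?, ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM LnkTeamGame
            WHERE GameDate = ? AND SeasonSK = ? AND TeamSK = ? AND OpponentSK = ?
        )
        """,
        row + row[:4],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM LnkTeamGame").fetchone()[0] - before

game = ("2012-09-27", 33, 6, 32, "2013-01-15T19:11:00", "STATS Sports Database")
inserted_first = load_link(conn, game)
inserted_again = load_link(conn, game)  # same relationship, nothing new to record

# Hubs are joined through the link, never directly to each other.
matchups = conn.execute("""
    SELECT t.TeamNickName, o.TeamNickName, l.GameDate
    FROM LnkTeamGame AS l
    JOIN HubTeam AS t ON t.TeamSK = l.TeamSK
    JOIN HubTeam AS o ON o.TeamSK = l.OpponentSK
""").fetchall()
```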

22 Satellite
A Satellite is based on non-identifying business elements ("descriptive data")
Satellite data changes, sometimes rapidly, sometimes slowly
Satellites are separated by type of information and rate of change

23 SAT Example
TeamSK  LDTS             STATSTeamID  TeamAbbrev  TeamName    LEDTS            RS
1       1/14/13 9:24 PM  323          Atl         Atlanta     NULL             STATS Sports Database
2                        324          Buf         Buffalo
3                        325          Hou         Houston
4                        326          Chi         Chicago
5                        327          Cin         Cincinnati
6                        329          Cle         Cleveland
7                        331          Dal         Dallas
8                        332          Den         Denver      4/1/18 12:01 AM
                                                  Donkeys     4/1/18 3:15 PM   Upset Fan
Satellite is dependent on the Hub or Link key as a parent
The Satellite is never dependent on more than one parent table
The Satellite is not a parent table to any other table
Sequence Number, Business Key, Load Date, Load End Date, Descriptive Data and Record Source are mandatory
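One way to picture how LDTS and LEDTS keep history in a satellite: when the descriptive data for a hub key changes, the current row is end-dated and a new current row is inserted. The sketch below is an assumption-laden illustration using sqlite3 in place of SQL Server; `load_sat_team` is a hypothetical helper, timestamps are ISO strings, and the values only loosely follow the Denver rows on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SatTeam (
        TeamSK      INTEGER NOT NULL,   -- parent hub key (exactly one parent)
        LDTS        TEXT NOT NULL,
        STATSTeamID INTEGER,
        TeamAbbrev  TEXT,
        TeamName    TEXT,
        LEDTS       TEXT,               -- NULL marks the current row
        RS          TEXT NOT NULL,
        PRIMARY KEY (TeamSK, LDTS)
    )
""")

def load_sat_team(conn, team_sk, ldts, attrs, record_source):
    """End-date the current row and insert a new one, but only if attrs changed.

    attrs = (STATSTeamID, TeamAbbrev, TeamName). Returns True if a row was added.
    """
    current = conn.execute(
        "SELECT STATSTeamID, TeamAbbrev, TeamName FROM SatTeam "
        "WHERE TeamSK = ? AND LEDTS IS NULL",
        (team_sk,),
    ).fetchone()
    if current == attrs:
        return False  # nothing changed, keep the existing current row
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE SatTeam SET LEDTS = ? WHERE TeamSK = ? AND LEDTS IS NULL",
            (ldts, team_sk),
        )
        conn.execute(
            "INSERT INTO SatTeam (TeamSK, LDTS, STATSTeamID, TeamAbbrev, TeamName, LEDTS, RS) "
            "VALUES (?, ?, ?, ?, ?, NULL, ?)",
            (team_sk, ldts, *attrs, record_source),
        )
    return True

added_initial = load_sat_team(conn, 8, "2013-01-14T21:24:00", (332, "Den", "Denver"), "STATS Sports Database")
added_unchanged = load_sat_team(conn, 8, "2014-01-01T00:00:00", (332, "Den", "Denver"), "STATS Sports Database")
added_renamed = load_sat_team(conn, 8, "2018-04-01T00:01:00", (332, "Den", "Donkeys"), "Upset Fan")

history = conn.execute(
    "SELECT TeamName, LDTS, LEDTS FROM SatTeam WHERE TeamSK = 8 ORDER BY LDTS"
).fetchall()
```

Filtering on `LEDTS IS NULL` returns the current description, while the full table keeps the audit trail.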

24 Business Key

25 One source – One UK
SELECT CustomerId, CustomerName FROM Customers
sp_help Customers
CREATE TABLE [dbo].[HUBCustomer](
    [CustomerSK] [smallint] IDENTITY(1,1) NOT NULL,
    [CustomerID] [int] NULL,
    [LDTS] [datetime] NULL,
    [RS] [varchar](150) NULL,
    CONSTRAINT [PK_HubCustomer] PRIMARY KEY CLUSTERED ( [CustomerSK] ASC )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
    CONSTRAINT [UK_HubCustomer] UNIQUE NONCLUSTERED ( [CustomerID] ASC )
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

26 More than one source – Multiple UK
All columns: CustomerID, CompanyID, CustomerType
Concatenate columns: Customerid|Companyid|CustomerType
JSON

27 Hash Key
Numerical representation of a column or set of columns that represents the uniqueness of a record
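As a rough illustration of the idea, a hash key can be derived by concatenating the business key columns with a delimiter (as in the Customerid|Companyid|CustomerType example) and hashing the result. The slides do not say which algorithm or normalization the presenter uses; SHA-256, trimming and upper-casing, and the function name `hash_key` are assumptions for this sketch.

```python
import hashlib

def hash_key(*business_key_parts, delimiter="|"):
    """Return a hex digest for a (possibly multi-column) business key.

    Parts are trimmed and upper-cased first (an assumed normalization) so that
    cosmetic differences between sources do not produce different keys.
    """
    normalized = [
        "" if part is None else str(part).strip().upper()
        for part in business_key_parts
    ]
    return hashlib.sha256(delimiter.join(normalized).encode("utf-8")).hexdigest()

# Same customer arriving with different casing and padding from two sources.
key_a = hash_key(1001, 7, "Retail")
key_b = hash_key(" 1001", "7", "retail")
# A different CompanyID gives a different key.
key_c = hash_key(1001, 77, "Retail")
```

The delimiter matters: without it, the parts ("ab", "c") and ("a", "bc") would concatenate to the same string and collide.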

28 Keyed Instance

29 Reference Tables
Referenced by SATs
Not bound by FK
May or may not have history
Uses: translations, translating keys, providing descriptive information

30 Loading the DV

31 Master Package Overview

32 JSON for RecordSource
Easy to parse with built-in server functions
Extends information about the where, why or how of that record
Use to report:
Data lineage
Source information
Pretty much any tracking information you need!
{
  "Source": { "DB": "Salesforce", "TBL": "Customer", "OriginalFieldName": "CUID" },
  "ETL": { "JobName": "LoadSALES", "PartofStep": "LoadHubCustomer" }
}
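To show what parsing such a RecordSource value can look like, here is a small sketch using Python's json module; on SQL Server the same lookups would typically go through built-in JSON functions such as JSON_VALUE. The helper name `lineage` and the output format are assumptions for illustration, and the JSON is the slide's example with its closing brace in place.

```python
import json

# RecordSource value as it might be stored in the RS column.
record_source = json.dumps({
    "Source": {"DB": "Salesforce", "TBL": "Customer", "OriginalFieldName": "CUID"},
    "ETL": {"JobName": "LoadSALES", "PartofStep": "LoadHubCustomer"},
})

def lineage(rs_json):
    """Build a one-line lineage description from a JSON RecordSource (hypothetical helper)."""
    rs = json.loads(rs_json)
    source, etl = rs["Source"], rs["ETL"]
    origin = f'{source["DB"]}.{source["TBL"]}.{source["OriginalFieldName"]}'
    return f'{origin} via {etl["JobName"]}/{etl["PartofStep"]}'

description = lineage(record_source)
print(description)  # Salesforce.Customer.CUID via LoadSALES/LoadHubCustomer
```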

33 Sat Temporal Tables
Automation of creating the historical record
Replaces development time for History (Type 2) table creation
Reduces code for loading data
Easier to read current records
Maintains the data audit trail with the history table
BEST PART = the RDBMS is doing this for us!!

34 How it Works

35 Master Data

36 Enterprise Code Example
Enterprise Code (Master Data Management System): "Chargers"
Source codes (Source 1, Source 2, Source 3): "LAC", "L.A. Chargers", "LA Chargers", "San Diego Chargers"
Diagram labels: Data Vault, Data Marts, Reporting Tools

37 Data Vault Linking Codes
Enterprise Link Records
Chargers <- LA Chargers
Chargers <- L.A. Chargers
Chargers <- LAC
Chargers <- San Diego Chargers
Same As Link Records
LA Chargers <- L.A. Chargers
LA Chargers <- LAC
Tables
HubTeam: Team SK
LnkTeam_Enterprise: Team SK, Team SK ENT
LnkTeam_SameAs: Team SK, Team SK SAS
SatTeam_Source1: Team SK, LDTS
SatTeam_Source2: Team SK, LDTS
SatTeam_Source3: Team SK, LDTS
SatTeam_Enterprise: Team SK, LDTS
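The enterprise link records above can be read as a lookup from each source code to the enterprise code. The sketch below models that with sqlite3 as a stand-in for SQL Server; the column names `TeamCode`, `EnterpriseTeamSK` and `SourceTeamSK`, the surrogate key values, and the helper `enterprise_code` are assumptions for illustration, and LDTS/RS columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HubTeam (
        TeamSK   INTEGER PRIMARY KEY,
        TeamCode TEXT NOT NULL UNIQUE
    );
    CREATE TABLE LnkTeam_Enterprise (
        EnterpriseTeamSK INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
        SourceTeamSK     INTEGER NOT NULL REFERENCES HubTeam (TeamSK),
        UNIQUE (EnterpriseTeamSK, SourceTeamSK)
    );
    INSERT INTO HubTeam VALUES
        (1, 'Chargers'), (2, 'LA Chargers'), (3, 'L.A. Chargers'),
        (4, 'LAC'), (5, 'San Diego Chargers');
    -- Chargers <- each source code
    INSERT INTO LnkTeam_Enterprise VALUES (1, 2), (1, 3), (1, 4), (1, 5);
""")

def enterprise_code(conn, source_code):
    """Map a source code to its enterprise code; unlinked codes are returned unchanged."""
    row = conn.execute(
        """
        SELECT ent.TeamCode
        FROM HubTeam AS src
        JOIN LnkTeam_Enterprise AS l ON l.SourceTeamSK = src.TeamSK
        JOIN HubTeam AS ent ON ent.TeamSK = l.EnterpriseTeamSK
        WHERE src.TeamCode = ?
        """,
        (source_code,),
    ).fetchone()
    return row[0] if row else source_code

resolved = {code: enterprise_code(conn, code)
            for code in ["LAC", "L.A. Chargers", "San Diego Chargers", "Chargers"]}
```

Every source keeps its own code in its own satellite, while reporting can roll all of them up to the single enterprise value.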

38 Resources https://hanshultgren.wordpress.com/
Lots on YouTube

39 THANK YOU, SPONSORS! Rockstar Sponsors!

40 THANK YOU, SPONSORS! Gold Sponsors After Party Sponsor
Breakfast Sponsor

41 THANK YOU, SPONSORS! Silver Sponsors Bronze Sponsors

42 Hub Table Loading

43 Sat Table Loading

44 Link Table Loading

