Get the facts, or the facts will get you. And when you get them, get them right, or they will get you wrong. Dr. Thomas Fuller, Gnomologia, 1732 British.


1 Get the facts, or the facts will get you. And when you get them, get them right, or they will get you wrong. Dr. Thomas Fuller (British physician), Gnomologia, 1732

2 Data Vault: The New Datawarehouse Supermodel
Martijn Evers, Datawarehouse Architect, Radboud University Nijmegen & President, Dutch Data Vault User Group

3 Introduction
- Welcome
- Who am I?
- My Job
- My Employer
- Data Vault

4 This Presentation
Basic Introduction
- Core Concepts
- Not enough for deploying a working Data Vault!
Giving Directions
- Understanding Usability
- Further Study
Fun
- Alas, no demos
- Contains bonus slides
- Do ask questions!

5 Agenda
- Cosmology of Data Warehousing
- Data Vault
  - Modelling
  - Loading
- Data Vault Considerations & Comparisons
- Example of a Data Vault Analysis & Transformation (METIS)
- Example DWH Data Vault Architecture
- Conclusion & Wrap-up

6 Stars (Star Schemas)
- Aggregates as Planets
- Data Marts as Constellations
- Galaxies as (Conformed) Data Marts
Where is the Data Vault?


8
- Information Paradox
- Event Horizon
- Holographic Universe
- Elementary Particles

9 Data Vault vs. Black Hole
- Data is retained indefinitely → Vault → Matter is trapped
- Temporal, accessible → Information, holographic → 'Visible' and frozen
- Elementary facts → Elementary → Elementary particles
- Integration points → Integrated → Singularity
- Flexible, extensible → Expandable → Expands on matter/information
- Central EDW → Central point → Spinning point of the galaxy

10 Historic Overview
- Created by Dan Linstedt
- Released in 2000
- Formally introduced in the Netherlands in 2007
- First DV book: The Business of Data Vault Modeling (2008)
- First (Dutch) user group in 2010
- Technical book from Dan Linstedt in 2011
Image © (Linstedt, Graziano, & Hultgren, The New Business Supermodel, The Business of Data Vault Modeling, 2008, p. 36)

11 Data Vault Components
- Modeling
- ETL/Load Architecture

12 ETL/Load Architecture
Data Vault ETL
- 100% of the data (within scope), 100% of the time
- Source driven/auditable: "fact oriented"
- Template/metadata driven
- No business rules
Kimball or Inmon ETL
- Complex ETL
- Truth oriented
- Business rules applied before the EDW
Pictures: Dan Linstedt ©

13 Data Vault Architecture
- Central EDW
- No business rules
- Incremental/non-destructive loading
- 100% of the data (within scope), 100% of the time
- Auditable/source driven

14 Dualistic Approach for the Central EDW
Should the DWH be source driven or demand driven?
- Source driven
- Goal oriented
Neither may dominate!

15 Dualistic Approach = Realistic Approach
- No problematic assumptions
- Detailed approach
- Clear principles
- User visible

16 Modeling a Data Vault
Legend (based on pictures by Dan Linstedt)

17 Data Vault Modelling
Primary Entity Types
- HUB: unique list of business keys (customer number, order number, part number)
- LINK: unique list of business key combinations
- SATELLITE: tracks associated attributes through time
Secondary Entity Types
- Hierarchical LINK
- Transactional LINK
Helper Tables
- PIT (point-in-time)
- Bridge
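
To make the three primary entity types concrete, here is a minimal sketch of one hub pair, one link and one satellite, using Python's built-in sqlite3 module as a stand-in database (an assumption; the slides do not name a platform). The metadata columns follow slide 19; all table and column names are hypothetical, and customer_no replaces customer# because # is not a portable identifier character.

```python
import sqlite3

# Hypothetical minimal Data Vault DDL (illustrative names, not a standard).
DDL = """
CREATE TABLE customer_hub (
    id          INTEGER PRIMARY KEY,      -- surrogate Data Vault ID (dv_id)
    customer_no TEXT NOT NULL UNIQUE,     -- business key
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE contact_hub (
    id          INTEGER PRIMARY KEY,
    contact_no  TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
-- A link is a unique list of business key combinations (via hub IDs).
CREATE TABLE custcontact_link (
    id          INTEGER PRIMARY KEY,
    cust_id     INTEGER NOT NULL REFERENCES customer_hub(id),
    contact_id  INTEGER NOT NULL REFERENCES contact_hub(id),
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL,
    UNIQUE (cust_id, contact_id)
);
-- A satellite tracks descriptive attributes of its parent through time.
CREATE TABLE customer_sat (
    hub_id       INTEGER NOT NULL REFERENCES customer_hub(id),
    load_dts     TEXT NOT NULL,
    load_dts_end TEXT,                    -- optional end date
    name         TEXT,
    record_src   TEXT NOT NULL,
    PRIMARY KEY (hub_id, load_dts)
);
"""

def create_vault(conn: sqlite3.Connection) -> None:
    """Create the hypothetical hub, link and satellite tables."""
    conn.executescript(DDL)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_vault(conn)
    print([r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
```

Note that only the hub carries the business key; links and satellites refer to it through the hub's surrogate ID.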

18 Loading a Data Vault
- Metadata
- Load Templates
  - Hub
  - Link
  - Satellite
- Loading Phases
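
Because every hub load has the same shape, the load template can be filled from metadata instead of being written by hand. The sketch below shows the idea for hubs only; the HubMeta record, its fields and the template text are hypothetical, and identifiers are pasted in unquoted, so it assumes trusted metadata.

```python
from dataclasses import asdict, dataclass

# Hypothetical metadata record: one entry per hub in a metadata repository.
@dataclass(frozen=True)
class HubMeta:
    hub_table: str     # target hub, e.g. "customer_hub"
    business_key: str  # business key column in the hub
    source_table: str  # staging table feeding the hub
    source_key: str    # business key column in the staging table

# One generic template serves every hub; only the metadata changes.
HUB_TEMPLATE = (
    "INSERT INTO {hub_table} ({business_key}, load_dts, record_src)\n"
    "SELECT src.{source_key}, CURRENT_TIMESTAMP, '{source_table}'\n"
    "FROM (SELECT DISTINCT {source_key} FROM {source_table}) AS src\n"
    "WHERE NOT EXISTS (SELECT 1 FROM {hub_table} AS hub\n"
    "                  WHERE hub.{business_key} = src.{source_key})"
)

def render_hub_load(meta: HubMeta) -> str:
    """Generate the hub load statement for one metadata record."""
    return HUB_TEMPLATE.format(**asdict(meta))

if __name__ == "__main__":
    repository = [
        HubMeta("customer_hub", "customer_no", "source_customer", "customer_no"),
        HubMeta("contact_hub", "contact_no", "source_contact", "contact_no"),
    ]
    for meta in repository:
        print(render_hub_load(meta) + ";\n")
```

The same approach extends to link and satellite templates; slide 29's "(be careful)" on generating ETL still applies.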

19 Common Minimal Metadata
- Load Sequence / Data Vault ID: dv_id, DV_SQN
- Load Date Time Stamp: load_dts
- Load End Date Time Stamp: load_dts_end (optional)
- Record Source: record_src
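
load_dts_end can be left out because it is derivable from the data already stored. One common convention, assumed here, is that a satellite row is valid until the next row for the same key arrives (a half-open interval) and the latest row stays open. A minimal sketch for the rows of a single parent key:

```python
from datetime import datetime
from typing import List, Optional, Tuple

def end_date(load_dts: List[datetime]) -> List[Tuple[datetime, Optional[datetime]]]:
    """Pair each satellite row's load_dts with a derived load_dts_end.

    Sketch for the rows of ONE parent key: a row stays valid until the next
    row's load_dts (half-open interval); the latest row stays open (None).
    """
    ordered = sorted(load_dts)
    ends: List[Optional[datetime]] = [*ordered[1:], None]
    return list(zip(ordered, ends))

if __name__ == "__main__":
    history = [datetime(2011, 3, 1), datetime(2011, 1, 1), datetime(2011, 2, 1)]
    for start, end in end_date(history):
        print(start.date(), "->", end.date() if end else "open")
```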

20 Loading a HUB (Pictures: Dan Linstedt ©)
    INSERT INTO customer_hub (customer#, load_dts, record_src)
    SELECT DISTINCT source.customer#, CURRENT_TIMESTAMP, 'source_customer'
    FROM source_customer AS source
    WHERE NOT EXISTS (SELECT * FROM customer_hub AS hub
                      WHERE hub.customer# = source.customer#)
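
A runnable sketch of the hub load above, again with sqlite3 as a stand-in database and hypothetical demo data (customer_no instead of customer#). It shows the property that makes hub loads restartable: running the same load twice adds nothing the second time, and a key that appears twice in staging is stored once.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """In-memory staging table plus an empty hub (hypothetical demo schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_customer (customer_no TEXT, name TEXT);
        CREATE TABLE customer_hub (
            id          INTEGER PRIMARY KEY,   -- surrogate key (dv_id)
            customer_no TEXT NOT NULL UNIQUE,  -- business key
            load_dts    TEXT NOT NULL,
            record_src  TEXT NOT NULL
        );
        -- C1 arrives twice in staging; the hub must still hold it once.
        INSERT INTO source_customer VALUES ('C1', 'Ann'), ('C2', 'Bob'), ('C1', 'Ann');
    """)
    return conn

def load_customer_hub(conn: sqlite3.Connection, record_src: str) -> int:
    """Insert business keys that are not yet in the hub; return the rows added."""
    cur = conn.execute(
        """
        INSERT INTO customer_hub (customer_no, load_dts, record_src)
        SELECT src.customer_no, CURRENT_TIMESTAMP, ?
        FROM (SELECT DISTINCT customer_no FROM source_customer) AS src
        WHERE NOT EXISTS (SELECT 1 FROM customer_hub AS hub
                          WHERE hub.customer_no = src.customer_no)
        """,
        (record_src,),
    )
    return cur.rowcount

if __name__ == "__main__":
    conn = make_demo_db()
    print(load_customer_hub(conn, "CRM"))  # first run inserts C1 and C2
    print(load_customer_hub(conn, "CRM"))  # rerun inserts nothing
```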

21 Loading a Link Pictures: Dan Linstedt ©

22 Link Load Query
    INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
    SELECT DISTINCT cust.id, contact.id, CURRENT_TIMESTAMP, 'source_table'
    FROM source_table AS source
    INNER JOIN contact_hub AS contact ON contact.contact# = source.contact#
    INNER JOIN customer_hub AS cust ON cust.customer# = source.customer#
    WHERE NOT EXISTS (SELECT * FROM custcontact_link AS link
                      WHERE link.contact_id = contact.id AND link.cust_id = cust.id)
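
The same link load as a runnable sqlite3 sketch with hypothetical data. It also illustrates why hubs are loaded before links: the inner joins silently drop any source row whose business key has no hub row yet (C3 below).

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo: two loaded hubs, an empty link and a staging table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT NOT NULL UNIQUE,
                                   load_dts TEXT NOT NULL, record_src TEXT NOT NULL);
        CREATE TABLE contact_hub  (id INTEGER PRIMARY KEY, contact_no TEXT NOT NULL UNIQUE,
                                   load_dts TEXT NOT NULL, record_src TEXT NOT NULL);
        CREATE TABLE custcontact_link (
            id         INTEGER PRIMARY KEY,
            cust_id    INTEGER NOT NULL REFERENCES customer_hub(id),
            contact_id INTEGER NOT NULL REFERENCES contact_hub(id),
            load_dts   TEXT NOT NULL,
            record_src TEXT NOT NULL,
            UNIQUE (cust_id, contact_id)
        );
        CREATE TABLE source_table (customer_no TEXT, contact_no TEXT);
        INSERT INTO customer_hub (customer_no, load_dts, record_src)
            VALUES ('C1', '2011-01-01', 'CRM'), ('C2', '2011-01-01', 'CRM');
        INSERT INTO contact_hub (contact_no, load_dts, record_src)
            VALUES ('P1', '2011-01-01', 'CRM');
        -- ('C1','P1') is duplicated; C3 has no hub row, so the inner join drops it.
        INSERT INTO source_table VALUES ('C1', 'P1'), ('C2', 'P1'), ('C1', 'P1'), ('C3', 'P1');
    """)
    return conn

def load_custcontact_link(conn: sqlite3.Connection, record_src: str) -> int:
    """Insert customer/contact combinations not yet in the link; return rows added."""
    cur = conn.execute(
        """
        INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
        SELECT cust.id, contact.id, CURRENT_TIMESTAMP, ?
        FROM (SELECT DISTINCT customer_no, contact_no FROM source_table) AS src
        INNER JOIN contact_hub  AS contact ON contact.contact_no = src.contact_no
        INNER JOIN customer_hub AS cust    ON cust.customer_no   = src.customer_no
        WHERE NOT EXISTS (SELECT 1 FROM custcontact_link AS link
                          WHERE link.contact_id = contact.id
                            AND link.cust_id    = cust.id)
        """,
        (record_src,),
    )
    return cur.rowcount

if __name__ == "__main__":
    conn = make_demo_db()
    print(load_custcontact_link(conn, "CRM"))  # C1-P1 and C2-P1
    print(load_custcontact_link(conn, "CRM"))  # rerun inserts nothing
```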

23 Loading a Satellite Pictures: Dan Linstedt ©

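24 Satellite Load Query
    INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
    SELECT hub.id, CURRENT_TIMESTAMP, source.name, 'source_customer'
    FROM source_customer AS source
    INNER JOIN customer_hub AS hub ON hub.customer# = source.customer#
    LEFT JOIN customer_sat AS sat
           ON sat.hub_id = hub.id
          AND sat.load_dts = (SELECT MAX(s2.load_dts) FROM customer_sat AS s2
                              WHERE s2.hub_id = hub.id)   -- "is most recent"
    WHERE sat.hub_id IS NULL        -- no satellite row yet for this key
       OR sat.name <> source.name   -- attribute changed since the last load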
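
A runnable sqlite3 sketch of the delta-driven satellite load, with hypothetical data. Two assumptions: staging holds at most one row per business key per load, and load_dts is passed in explicitly so that successive demo loads get distinct timestamps. The "changed" test uses SQLite's NULL-safe IS NOT, so a change to or from NULL is also detected (plain <> would miss it).

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical demo: a loaded hub, an empty satellite and a staging table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT NOT NULL UNIQUE,
                                   load_dts TEXT NOT NULL, record_src TEXT NOT NULL);
        CREATE TABLE customer_sat (hub_id INTEGER NOT NULL REFERENCES customer_hub(id),
                                   load_dts TEXT NOT NULL, name TEXT,
                                   record_src TEXT NOT NULL,
                                   PRIMARY KEY (hub_id, load_dts));
        -- Assumption: staging holds at most one row per business key per load.
        CREATE TABLE source_customer (customer_no TEXT PRIMARY KEY, name TEXT);
        INSERT INTO customer_hub (customer_no, load_dts, record_src)
            VALUES ('C1', '2011-01-01', 'CRM'), ('C2', '2011-01-01', 'CRM');
        INSERT INTO source_customer VALUES ('C1', 'Ann'), ('C2', 'Bob');
    """)
    return conn

def load_customer_sat(conn: sqlite3.Connection, load_dts: str, record_src: str) -> int:
    """Insert a satellite row for new keys and changed attributes only."""
    cur = conn.execute(
        """
        INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
        SELECT hub.id, ?, src.name, ?
        FROM source_customer AS src
        INNER JOIN customer_hub AS hub ON hub.customer_no = src.customer_no
        LEFT JOIN customer_sat AS sat
               ON sat.hub_id = hub.id
              AND sat.load_dts = (SELECT MAX(s2.load_dts) FROM customer_sat AS s2
                                  WHERE s2.hub_id = hub.id)
        WHERE sat.hub_id IS NULL          -- key has no satellite history yet
           OR sat.name IS NOT src.name    -- SQLite's NULL-safe "changed" test
        """,
        (load_dts, record_src),
    )
    return cur.rowcount

if __name__ == "__main__":
    conn = make_demo_db()
    print(load_customer_sat(conn, "2011-01-01", "CRM"))  # initial rows
    print(load_customer_sat(conn, "2011-01-02", "CRM"))  # nothing changed
    conn.execute("UPDATE source_customer SET name = 'Bobby' WHERE customer_no = 'C2'")
    print(load_customer_sat(conn, "2011-01-03", "CRM"))  # only C2's new version
```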

25 Data Vault Loading Phases (where possible!)
Pictures: Dan Linstedt ©

26 Parallel Loading
Synchronization Points/Dependencies
- Staging
- Hubs
  - Hub Satellites
- Links
  - Link Satellites
- Link on Link
  - Link on Link Satellites
- Data Mart Feed
  - Full/Partial Refresh
  - Incremental Loads
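
The dependencies above form a directed graph, so one way to derive which loads may run in parallel, and where the synchronization points fall, is a topological sort processed in waves. A minimal sketch using Python's graphlib (Python 3.9+); the table names and the exact dependency set are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency graph: each load maps to the loads it must wait for.
DEPENDENCIES: Dict[str, Set[str]] = {
    "staging": set(),
    "customer_hub": {"staging"},
    "contact_hub": {"staging"},
    "customer_sat": {"customer_hub"},                     # hub satellite needs hub IDs
    "custcontact_link": {"customer_hub", "contact_hub"},  # link needs both hubs
    "custcontact_link_sat": {"custcontact_link"},
    "data_mart_feed": {"customer_sat", "custcontact_link_sat"},
}

def load_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group loads into waves; every load in a wave may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all loads whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # synchronization point before the next wave
    return waves

if __name__ == "__main__":
    for i, wave in enumerate(load_waves(DEPENDENCIES), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

With per-table dependencies, a hub satellite can run alongside a link load instead of waiting for a whole "links" phase; whether to use such fine-grained scheduling or the coarser phases of the slide is a design choice.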

27 Geology of a Data Vault
- (Batch) Loading: (Daily) Batch
- Real Time/Transactional Loading: Micro Batch, Continuous Batch
Pictures: Dan Linstedt ©

28 Data Vault Considerations & Comparisons
- Pros
- Cons
- Versus 3NF
- Versus Dimensional Modelling

29 Data Vault Pros
Scalability
- Provides for multi-terabyte storage
- Delta-driven information loading
Auditability
- Easier detection of "dead data"
- Generation of audit trails
- Quality feedback loops
- Truth vs. facts
Standardization
- Standard implementation architecture
- Restartable, consistent loading patterns
- Generate ETL & data model (be careful)
Flexibility
- Rapid build of data marts
- Handles combinations of different arrival speeds
- Flexible and incremental implementation & deployment (Agile BI)
Robustness
- Isolated development
- Restartable loading

30 Data Vault Cons
End-user Access & Aggregation Performance
- Not friendly for direct exploration and user access
- Not conducive to today's BI tools
- Not conducive to OLAP processing
Requires a Firm Architect
- Business keys
- Truth vs. facts
- DV standards
Additional Layer
- Might require additional processing

31 But…
End-user Access & Aggregation Performance
- Semantic layers & helper tables/views
- Segregation of storage & access
Requires a Firm Architect
- Ignore at your own peril
- Business keys
- Auditability
- Standardization
Additional Layer
- Adds flexibility & robustness
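
Slide 17 lists the PIT (point-in-time) table among the helper tables. One way to picture a PIT row: for one hub key and one snapshot moment, it records per satellite the load_dts of the row valid at that moment, so queries can equi-join to each satellite instead of searching for the latest row. A sketch under the assumption of sorted ISO-8601 load_dts strings per satellite; address_sat is a hypothetical second satellite.

```python
from bisect import bisect_right
from typing import Dict, List, Optional

def pit_row(sat_loads: Dict[str, List[str]], snapshot: str) -> Dict[str, Optional[str]]:
    """For ONE hub key, return per satellite the load_dts valid at `snapshot`.

    sat_loads maps satellite name -> sorted load_dts values (ISO strings, so
    string order equals time order). None means no row existed yet.
    """
    result: Dict[str, Optional[str]] = {}
    for sat, loads in sat_loads.items():
        i = bisect_right(loads, snapshot)  # rows loaded at or before the snapshot
        result[sat] = loads[i - 1] if i > 0 else None
    return result

if __name__ == "__main__":
    loads = {"customer_sat": ["2011-01-01", "2011-03-01"], "address_sat": ["2011-02-15"]}
    for snapshot in ["2011-02-01", "2011-03-01"]:
        print(snapshot, pit_row(loads, snapshot))
```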

32 Data Vault vs. 3NF
3NF pros:
- Many-to-many linkages
- Handles lots of information
- Tightly integrated information
- Highly structured
- Reasonably conducive to near-real-time loads
- Relatively easy to extend
3NF cons:
- Time-driven PK issues
- Parent-child complexities
- Cascading change impacts
- Difficult to load
- Not conducive to BI tools
- Not conducive to drill-down
- Difficult to architect for an enterprise
- Not conducive to spiral/scope

33 Data Vault vs. Star Schema
Star schema pros:
- Good for multi-dimensional analysis
- Subject-oriented answers
- Excellent for aggregation points
- Fewer landing zones
- Great for some historical storage
- Great for BI tools
- Minimizes data landing zones
Star schema cons:
- No data mining
- No real-time loading
- No ODS/exploration
- Expensive updates (type 1, 2 and 3)
- Inflexible modelling of basic elements like history, structure and key distribution
- Grain issues difficult to resolve
- High-impact changes
- Latency issues with late- or early-arriving facts
- Complex loading and changing of history
- Fails under very heavy loads
- Difficult to automate...

34 Data Vault: Conclusion
Go!
- F: Flexible/Agile approach
- A: Auditable/Historic
- S: Scalable
- S: Standardized/Automatable/Repeatable
- R: Robust/Stable/Dependable
No Go?
- E: Experience/Familiarity
- N: No Direct Access
- E: Extra Layer
- D: Data Modelling

35 Example: University Research Publications Information System (METIS)
Picture: Paul Kidby ©

36 Transforming a Data Model to a Data Vault in 5 Easy Steps
1. Create a working and complete source/business model(s) ("Technical-Functional" model)
2. Analyze and classify keys & columns
3. Classify entities and relationships
4. Combine the information of steps 2 & 3
5. Transform to a DV
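
The five steps are a manual analysis. Purely as an illustration of steps 4 and 5, here is a naive rule that maps one classified source table to hub, link and satellite names. The SourceTable structure, the naming scheme and the METIS-like tables (publication, authorship) are hypothetical, and real transformations need more rules than these two cases.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SourceTable:
    """Result of steps 2-3 for one (hypothetical) source table."""
    name: str
    business_key: List[str]                                      # own business key columns
    foreign_keys: Dict[str, str] = field(default_factory=dict)   # column -> referenced table
    attributes: List[str] = field(default_factory=list)          # descriptive columns

def to_data_vault(t: SourceTable) -> Dict[str, List[str]]:
    """Naive step 5: derive hub, link and satellite names for one table."""
    refs = sorted(set(t.foreign_keys.values()))
    if t.business_key:
        # Entity with its own business key: a hub, one link per reference,
        # and a satellite for its descriptive attributes.
        return {
            "hubs": [f"{t.name}_hub"],
            "links": [f"{t.name}_{ref}_link" for ref in refs],
            "satellites": [f"{t.name}_sat"] if t.attributes else [],
        }
    # No business key of its own: treat the table as a relationship, i.e. one
    # link over all referenced hubs, with a link satellite for any attributes.
    link = "_".join(refs) + "_link"
    return {
        "hubs": [],
        "links": [link] if refs else [],
        "satellites": [f"{link}_sat"] if refs and t.attributes else [],
    }

if __name__ == "__main__":
    print(to_data_vault(SourceTable("publication", ["pub_no"], {"journal_id": "journal"}, ["title", "year"])))
    print(to_data_vault(SourceTable("authorship", [], {"pub_id": "publication", "person_id": "person"}, ["author_order"])))
```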


44 A Data Vault Oriented Datawarehouse Architecture (Example)
- Staging & CDC/Replication/Real Time/SOA feeds
- Central EDW: Data Vault core
- Business Rule Layer
  - Non-source oriented & DV structured
  - Business rule results & calculations/aggregations
- Virtualized Data Mart Layer
  - Star schemas encoded in semantic layers (UDM/BISM/views/Universes)
  - No or only partial physical star schemas
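
To make the virtualized data mart layer concrete: a dimension can be a plain view over a hub and its most recent satellite rows, with nothing stored physically. A sketch in sqlite3 as a stand-in database; the tables, data and the type 1 style (current values only) dimension are hypothetical.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Hypothetical vault tables plus a virtual 'current' customer dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT NOT NULL UNIQUE,
                                   load_dts TEXT NOT NULL, record_src TEXT NOT NULL);
        CREATE TABLE customer_sat (hub_id INTEGER NOT NULL REFERENCES customer_hub(id),
                                   load_dts TEXT NOT NULL, name TEXT,
                                   record_src TEXT NOT NULL,
                                   PRIMARY KEY (hub_id, load_dts));
        INSERT INTO customer_hub VALUES (1, 'C1', '2011-01-01', 'CRM'),
                                        (2, 'C2', '2011-01-01', 'CRM');
        INSERT INTO customer_sat VALUES (1, '2011-01-01', 'Ann',   'CRM'),
                                        (2, '2011-01-01', 'Bob',   'CRM'),
                                        (2, '2011-01-03', 'Bobby', 'CRM');
        -- The "data mart" is only a view: one row per business key with the
        -- attributes from its most recent satellite row (type 1 style).
        CREATE VIEW dim_customer AS
        SELECT hub.id AS customer_key, hub.customer_no, sat.name
        FROM customer_hub AS hub
        LEFT JOIN customer_sat AS sat
               ON sat.hub_id = hub.id
              AND sat.load_dts = (SELECT MAX(s2.load_dts) FROM customer_sat AS s2
                                  WHERE s2.hub_id = hub.id);
    """)
    return conn

if __name__ == "__main__":
    conn = make_demo_db()
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_no"):
        print(row)
```

If such a view proves too slow, the same query can be materialized into a physical star schema, which matches the "none or partial physical star schemas" option above.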

45 Advanced Concept: Business Data Vault
- Data Vault structured layer
- System driven instead of source driven
- Centralization
- Performance
Picture: Dan Linstedt ©

46 Datawarehouse Architecture (diagram)
Components: Staging (optional); Central DWH: Data Vault (temporal) and Business (Rule) Vault; (Virtual) Data Marts: 3NF views, dimensional views, OLAP data marts; BI Apps (SAP-BO): Universe, Reports, Voyager

47 MS Fast Track 2.0/3.0
- SQL Server 2008 R2 Enterprise Edition
- Microsoft Fast Track 2.0/3.0 DWH architecture with Data Vault
- Virtual data marts
- Challenges
- Benefits

48 Questions?
- Anchor Oriented Modeling?
- Metadata?
- Change Data Capture?
- Theory?
- Fast Track?

49 Information about Data Vault
- Data Vault Book:
- Website creator:

50 Additional Information
Data Vault Generators
- BIReady:
- Quipu:
- Several others
Blogs & Resources
- www.prudenza.nl
- Facebook: datavaultdirectory
LinkedIn Groups
- Data Vault Discussions, Temporal Data Modeling
- Dutch Data Vault Subgroup

51 Contact
- MSN/LinkedIn:
- Twitter: DM_Unseen
- Blog:
- LinkedIn Group: Temporal Data Modeling
- Facebook: datavaultdirectory

52 Dutch Data Vault User Group
- Hashtags: #NLDVGG #DDVGG
- Website:
- Windows Live:
- Facebook: datavaultdirectory
- Dutch Data Vault User group:
Belgium
- Contact person: Yves Mulkes / BI-community.org

53 Recap & Checklist
1. Understand selling points
   - Check out (online) Data Vault resources
   - Training/coaching/seminars
2. Evaluate
   - Understand architecture requirements
   - Prototyping
   - Consultancy
3. Implement
   - Small increments
