Get the facts, or the facts will get you. And when you get them, get them right, or they will get you wrong. Dr. Thomas Fuller, Gnomologia, 1732 (British physician)
Data Vault, The new Datawarehouse Supermodel Martijn Evers Datawarehouse Architect Radboud University Nijmegen & President Dutch Data Vault User group.
Introduction Welcome Who is ME? My Job My Employer Data Vault
This presentation Basic Introduction -Core Concepts -Not enough for deploying a working Data Vault! Giving Directions -Understanding Usability -Further study Fun Alas no demos Contains bonus slides Do ask questions!
Agenda Cosmology of Data warehousing Data Vault -Modelling -Loading Data Vault Considerations & Comparisons Example of a Data Vault Analysis & Transformation(METIS) Example DWH Data Vault Architecture Conclusion & wrap-up
Star (schemas) -Aggregates as Planets Data Marts as Constellations Galaxies as (Conformed) Data Marts Where is the Data Vault?
Information paradox Event Horizon Holographic Universe Elementary Particles
Data Vault vs. Black Hole
Aspect: Data Vault / Black Hole
-Vault: Data is retained indefinitely / Matter is trapped
-Holographic: Temporal, Accessible Information / ‘Visible’ and Frozen
-Elementary: Elementary facts / Elementary Particles
-Integrated: Integration points / Singularity
-Expandable: Flexible, extensible / Expands on Matter/Information
-Central Point: Central EDW / Spinning point of the Galaxy
Data Vault Architecture Central EDW No Business Rules Incremental/Non destructive Loading 100% of the data (within scope) 100% of the time Auditable/Source Driven
Dualistic approach for central EDW DWH source driven or demand driven? Source driven Goal oriented Neither may dominate!
Dualistic approach = realistic approach No problematic assumptions Detailed approach Clear principles User visible
Modeling a Data Vault Legend Based on pictures by Dan Linstedt
Data Vault Modelling
Primary Entity Types
-HUB: Unique list of business keys (customer number, order number, part number)
-LINK: Unique list of business key combinations
-SATELLITE: Tracks associated attributes through time
Secondary Entity Types
-Hierarchical LINK
-Transactional LINK
Helper Tables
-PIT
-Bridge
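The three primary entity types can be sketched as tables. This is a minimal illustration, not the presenter's schema: it runs on SQLite through Python's sqlite3 module, the table names follow the load queries later in the deck, and the column names customer_no and contact_no are assumed stand-ins for the slide's customer# and contact# business keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- HUB: unique list of business keys
CREATE TABLE customer_hub (
    id          INTEGER PRIMARY KEY,   -- surrogate key (dv_id)
    customer_no TEXT NOT NULL UNIQUE,  -- business key (assumed name)
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE contact_hub (
    id          INTEGER PRIMARY KEY,
    contact_no  TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
-- LINK: unique list of business key combinations
CREATE TABLE custcontact_link (
    id          INTEGER PRIMARY KEY,
    cust_id     INTEGER NOT NULL REFERENCES customer_hub(id),
    contact_id  INTEGER NOT NULL REFERENCES contact_hub(id),
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL,
    UNIQUE (cust_id, contact_id)
);
-- SATELLITE: attributes tracked through time, keyed by parent and load time
CREATE TABLE customer_sat (
    hub_id       INTEGER NOT NULL REFERENCES customer_hub(id),
    load_dts     TEXT NOT NULL,
    load_dts_end TEXT,                 -- optional end date
    name         TEXT,
    record_src   TEXT NOT NULL,
    PRIMARY KEY (hub_id, load_dts)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The UNIQUE constraints carry the "unique list" part of the hub and link definitions; the satellite's composite key (parent id plus load timestamp) is what allows several versions of the same customer's attributes to coexist.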
Loading a Data Vault Metadata Load Templates -Hub -Link -Satellite Loading Phases
Common Minimal Metadata
-Load Sequence Data Vault ID – dv_id, DV_SQN
-Load Date Time Stamp – load_dts
-Load End Date Time Stamp – load_dts_end (optional)
-Record Source – record_src
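A hub load template that stamps these metadata columns might look like the sketch below. It is an illustration under assumptions, not the presenter's template: it uses SQLite via Python's sqlite3, a hypothetical staging table source_customer, and customer_no standing in for the business key. The helper load_customer_hub is likewise made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_hub (
    id          INTEGER PRIMARY KEY,   -- dv_id: surrogate key / load sequence
    customer_no TEXT NOT NULL UNIQUE,  -- business key (assumed name)
    load_dts    TEXT NOT NULL,         -- load date time stamp
    record_src  TEXT NOT NULL          -- record source
);
-- Hypothetical staging table; note the duplicate row for C1.
CREATE TABLE source_customer (customer_no TEXT, name TEXT);
INSERT INTO source_customer VALUES ('C1', 'Alice'), ('C1', 'Alice'), ('C2', 'Bob');
""")

HUB_LOAD = """
INSERT INTO customer_hub (customer_no, load_dts, record_src)
SELECT DISTINCT source.customer_no, :load_dts, :record_src
FROM source_customer AS source
WHERE NOT EXISTS (
    SELECT 1 FROM customer_hub AS hub
    WHERE hub.customer_no = source.customer_no
)
"""

def load_customer_hub(load_dts, record_src):
    """Insert business keys not yet in the hub; return how many were added."""
    count = "SELECT COUNT(*) FROM customer_hub"
    before = conn.execute(count).fetchone()[0]
    conn.execute(HUB_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return conn.execute(count).fetchone()[0] - before

first = load_customer_hub("2024-01-01T00:00:00", "crm")
second = load_customer_hub("2024-01-02T00:00:00", "crm")  # rerun adds nothing
```

Because existing keys are skipped rather than updated, load_dts keeps recording when a business key was first seen, and rerunning the load is harmless.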
Link Load query
INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
SELECT DISTINCT cust.id, contact.id, CURRENT_TIMESTAMP, 'source_table' -- load_dts and record_src values are assumed; the slide omits the SELECT list
FROM source_table AS source
INNER JOIN contact_hub AS contact ON contact.contact# = source.contact#
INNER JOIN customer_hub AS cust ON cust.customer# = source.customer#
WHERE NOT EXISTS (SELECT * FROM custcontact_link AS link WHERE link.contact_id = contact.id AND link.cust_id = cust.id)
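To see the link load pattern run end to end, here is a small sketch on SQLite via Python's sqlite3. The hub contents and staging rows are made-up sample data, customer_no and contact_no are assumed stand-ins for customer# and contact#, and load_link is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE contact_hub  (id INTEGER PRIMARY KEY, contact_no  TEXT UNIQUE);
CREATE TABLE custcontact_link (
    cust_id INTEGER, contact_id INTEGER, load_dts TEXT, record_src TEXT,
    UNIQUE (cust_id, contact_id)
);
CREATE TABLE source_table (customer_no TEXT, contact_no TEXT);
INSERT INTO customer_hub VALUES (1, 'C1'), (2, 'C2');
INSERT INTO contact_hub  VALUES (10, 'K1'), (11, 'K2');
INSERT INTO source_table VALUES ('C1', 'K1'), ('C1', 'K1'), ('C2', 'K2');
""")

LINK_LOAD = """
INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
SELECT DISTINCT cust.id, contact.id, :load_dts, :record_src
FROM source_table AS source
INNER JOIN contact_hub  AS contact ON contact.contact_no = source.contact_no
INNER JOIN customer_hub AS cust    ON cust.customer_no   = source.customer_no
WHERE NOT EXISTS (
    SELECT 1 FROM custcontact_link AS link
    WHERE link.contact_id = contact.id AND link.cust_id = cust.id
)
"""

def load_link(load_dts, record_src="crm"):
    """Run the link load once and return the link table's contents."""
    conn.execute(LINK_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return conn.execute(
        "SELECT cust_id, contact_id, load_dts FROM custcontact_link "
        "ORDER BY cust_id, contact_id").fetchall()

after_first = load_link("2024-01-01")
conn.execute("INSERT INTO source_table VALUES ('C1', 'K2')")  # new combination arrives
after_second = load_link("2024-01-02")
```

DISTINCT removes duplicate combinations within one batch, while NOT EXISTS skips combinations loaded earlier, so the second run only adds the newly arrived pair.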
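Satellite Load query
INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
SELECT hub.id, CURRENT_TIMESTAMP, source.name, 'source_customer' -- load_dts and record_src values are assumed; the slide omits the SELECT list
FROM source_customer AS source
INNER JOIN customer_hub AS hub ON hub.customer# = source.customer#
INNER JOIN customer_sat AS sat ON sat.hub_id = hub.id
AND sat.load_dts = (SELECT MAX(s2.load_dts) FROM customer_sat AS s2 WHERE s2.hub_id = hub.id) -- “is most recent”
AND sat.name <> source.name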
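The delta detection in the satellite load can be tried out with the sketch below (SQLite via Python's sqlite3). It is a variation on the slide's query, not a copy: a LEFT JOIN replaces the second INNER JOIN so that customers with no satellite row yet are loaded too, and "is most recent" is expressed as the maximum load_dts per hub key. Sample data, customer_no and the helper load_sat are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE customer_sat (
    hub_id INTEGER, load_dts TEXT, name TEXT, record_src TEXT,
    PRIMARY KEY (hub_id, load_dts)
);
CREATE TABLE source_customer (customer_no TEXT, name TEXT);
INSERT INTO customer_hub VALUES (1, 'C1'), (2, 'C2');
INSERT INTO source_customer VALUES ('C1', 'Alice'), ('C2', 'Bob');
""")

SAT_LOAD = """
INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
SELECT hub.id, :load_dts, source.name, :record_src
FROM source_customer AS source
INNER JOIN customer_hub AS hub ON hub.customer_no = source.customer_no
LEFT JOIN customer_sat AS sat
       ON sat.hub_id = hub.id
      AND sat.load_dts = (SELECT MAX(s2.load_dts)      -- "is most recent"
                          FROM customer_sat AS s2
                          WHERE s2.hub_id = hub.id)
WHERE sat.hub_id IS NULL            -- no satellite row yet
   OR sat.name <> source.name       -- attribute changed (NULL names not handled)
"""

def load_sat(load_dts, record_src="crm"):
    """Run one satellite load; return the number of new versions written."""
    count = "SELECT COUNT(*) FROM customer_sat"
    before = conn.execute(count).fetchone()[0]
    conn.execute(SAT_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return conn.execute(count).fetchone()[0] - before

initial = load_sat("2024-01-01")      # both customers are new
unchanged = load_sat("2024-01-02")    # nothing changed, nothing written
conn.execute("UPDATE source_customer SET name = 'Alicia' WHERE customer_no = 'C1'")
changed = load_sat("2024-01-03")      # one new version for C1

history = [row[0] for row in conn.execute(
    "SELECT name FROM customer_sat WHERE hub_id = 1 ORDER BY load_dts")]
```

Existing rows are never updated; a changed name produces a new row with a later load_dts, which is what keeps the load non-destructive and the history auditable.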
Data Vault Considerations & Comparisons Pros Cons Versus 3NF Versus Dimensional Modelling
Data Vault Pros
Scalability
-Provides for Multi-Terabyte storage
-Delta Driven Information
-Loading
Auditability
-Easier Detection of “Dead Data”
-Generation of Audit Trails
-Quality Feedback loops
-Truth vs. Facts
Standardization
-Standard Implementation Architecture
-Restartable, Consistent Loading Patterns
-Generate ETL & Data model (be careful)
Flexible
-Rapid Build of Data Marts
-Handle combinations of different arrival speeds
-Flexible and incremental implementation & Deployment (Agile BI)
Robustness
-Isolated Development
-Restartable Loading
Data Vault Cons
End-user Access & aggregation performance
-Not friendly for direct exploration and user access
-Not conducive to today’s BI tools
-Not conducive to OLAP processing
Requires firm Architect
-Business Keys
-Truth vs. Facts
-DV Standards
Additional Layer
-Might require additional processing
But…
End-user Access & aggregation performance
-Semantic layers & Helper tables/views
-Segregation of storage & access
Requires firm Architect
-Ignore at your own peril
-Business Keys
-Auditability
-Standardization
Additional Layer
-Adds flexibility & robustness
Data Vault vs. 3NF Many to Many Linkages Handle lots of information Tightly integrated information Highly structured Reasonably conducive to near-real time loads Relatively easy to extend Time Driven PK issues Parent-Child Complexities Cascading Change Impacts Difficult to load Not conducive to BI tools Not conducive to Drill-down Difficult to architect for an Enterprise Not conducive to Spiral/scope
Data Vault vs. Star Schema Good for Multi-Dimensional Analysis Subject Oriented Answers Excellent for Aggregation Points Fewer landing zones Great for Some Historical Storage Great for BI Tools Minimize data landing zones No Data mining. No Real-time loading. No ODS/Exploration Expensive updates (type 1, 2 and 3) Inflexible modelling of basic elements like history, structure and key distribution Grain issues difficult to resolve High impact changes Latency Issues with late or early arriving facts Complex loading and changing of history Fails under very heavy loads Difficult to automate...
Data Vault: Conclusion
Go!
-F: Flexible/Agile approach
-A: Auditable/Historic
-S: Scalable
-S: Standardized/Automatable/Repeatable
-R: Robust/Stable/Dependable
No Go?
-E: Experience/Familiarity
-N: No Direct Access
-E: Extra layer
-D: Data Modelling
Transforming a data model to a Data Vault in 5 easy steps
1. Create a working and complete source/business model(s) (“Technical-Functional” Model)
2. Analyze and classify Keys & Columns
3. Classify Entities and Relationships
4. Combine information of steps 2 & 3
5. Transform to a DV
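As a rough illustration of the classification and transformation steps, the sketch below applies one common heuristic: an entity with its own business key becomes a hub, an entity identified only by foreign keys becomes a link, a foreign key on a hub-like entity becomes a link between two hubs, and descriptive columns go to a satellite. This is an assumed rule set for illustration, not necessarily the method behind the slide. SourceTable, to_data_vault and the sample model are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SourceTable:
    """Result of classifying one source entity's keys and columns."""
    name: str
    business_keys: List[str] = field(default_factory=list)
    foreign_keys: Dict[str, str] = field(default_factory=dict)  # column -> referenced table
    attributes: List[str] = field(default_factory=list)

def to_data_vault(tables):
    """Map classified source tables to hub, link and satellite definitions."""
    hubs, links, sats = [], [], []
    for t in tables:
        if t.business_keys:
            # Entity with its own business key: hub, plus a link per foreign key.
            parent = f"{t.name}_hub"
            hubs.append(parent)
            for ref in t.foreign_keys.values():
                links.append((f"{t.name}_{ref}_link", sorted([t.name, ref])))
        else:
            # Pure relationship entity: a single link over the referenced hubs.
            parent = f"{t.name}_link"
            links.append((parent, sorted(set(t.foreign_keys.values()))))
        if t.attributes:
            # Descriptive columns hang off the hub or link as a satellite.
            sats.append((f"{t.name}_sat", parent))
    return hubs, links, sats

model = [
    SourceTable("customer", business_keys=["customer_no"], attributes=["name"]),
    SourceTable("contact", business_keys=["contact_no"], attributes=["phone"]),
    SourceTable("customer_contact",
                foreign_keys={"customer_no": "customer", "contact_no": "contact"}),
    SourceTable("order", business_keys=["order_no"],
                foreign_keys={"customer_no": "customer"}, attributes=["amount"]),
]
hubs, links, sats = to_data_vault(model)
```

Real models need judgment calls this heuristic does not cover, such as choosing between competing business keys or splitting satellites by rate of change.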
A Data Vault oriented Datawarehouse Architecture Staging & CDC/Replication/Real Time/SOA feeds Central EDW Data Vault Core Business Rule Layer Non Source oriented & DV structured Business Rule results & calculations/aggregations Virtualized Data Mart Layer Star Schemas encoded in semantic layers (UDM/BISM/views/Universes) None/Partial Physical star schemas EXAMPLE
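A virtualized data mart can be as simple as a view over a hub and its satellite. The sketch below (SQLite via Python's sqlite3) exposes a current-state customer dimension without storing it physically. The view name dim_customer, its column aliases and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE customer_sat (
    hub_id INTEGER, load_dts TEXT, name TEXT,
    PRIMARY KEY (hub_id, load_dts)
);
INSERT INTO customer_hub VALUES (1, 'C1'), (2, 'C2');
INSERT INTO customer_sat VALUES
    (1, '2024-01-01', 'Alice'),
    (1, '2024-01-03', 'Alicia'),
    (2, '2024-01-01', 'Bob');

-- Virtual dimension: one row per customer with its most recent attributes.
CREATE VIEW dim_customer AS
SELECT hub.id       AS customer_key,
       hub.customer_no,
       sat.name,
       sat.load_dts AS valid_from
FROM customer_hub AS hub
INNER JOIN customer_sat AS sat ON sat.hub_id = hub.id
WHERE sat.load_dts = (SELECT MAX(s2.load_dts)
                      FROM customer_sat AS s2
                      WHERE s2.hub_id = hub.id);
""")

rows = conn.execute(
    "SELECT customer_no, name FROM dim_customer ORDER BY customer_no").fetchall()
```

Keeping the dimension as a view separates storage from access: the Data Vault tables stay untouched, and the view can be redefined (or materialized later for performance) without reloading any data.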
Datawarehouse Architecture [diagram labels] Staging (Optional), Central DWH, Data Vault (Temporal), Business (Rule) Vault, 3NF views, Dimensional views, (Virtual) Data Marts, OLAP Data Marts, BI Apps: SAP-BO, Universe, Voyager, Reports
MS Fast Track 2.0/3.0 SQL Server 2008 R2 Enterprise Edition Microsoft Fast Track 2.0/3.0 DWH Architecture with Data Vault Virtual Data marts Challenges Benefits
Questions? Anchor Oriented Modeling? Metadata? Change Data Capture? Theory? Fast Track?
Information about Data Vault Data Vault Book: Website creator:
Additional Information Data Vault Generators -BIReady: -Quipu: -Several others Blogs & Resources -www.prudenza.nl -Facebook: datavaultdirectory LinkedIn groups -Data Vault Discussions, Temporal Data Modeling -Dutch Data Vault Subgroup