Get the facts, or the facts will get you.

Presentation on theme: "Get the facts, or the facts will get you."— Presentation transcript:

1 Get the facts, or the facts will get you.
And when you get them, get them right, or they will get you wrong. Dr. Thomas Fuller, British physician, Gnomologia, 1732

2 Data Vault, The new Datawarehouse Supermodel
Martijn Evers, Datawarehouse Architect, Radboud University Nijmegen & President Dutch Data Vault User group. Presentation: A central data warehouse is often quite a challenge. For years, Radboud University has wanted to build out its existing data warehouse incrementally and flexibly, which often means letting go of the traditional boundaries and design of a data warehouse. Data Vault provides a clear handle for the design and incremental development of such a data warehouse. This is a new step toward a standardized, flexible and efficient university-wide data warehouse. Along the way, a lot of new knowledge and best practices are being developed, among others in the area of Data Vault.

3 Welcome Who is ME? My Job My Employer Data Vault Introduction
Who is ME? Martijn Evers AKA DM Unseen; 15 yrs information system design & development; Analyst, Architect or Consultant; Married & Father of 2 boys
My Job: Datawarehouse Architect (Design, Modeling, ETL, whatnot); Independent Consultant at DataMasters(Unseen); Infrequent, independent & specialized consultancy: (Temporal) Data modeling, Architecture, DWH
My Employer: Radboud University Nijmegen; Nijmegen: the oldest Dutch city; Generic & Academic University; Almost (satisfied) Students; Has 20 research institutes; Serves 121 educational programs; Recent Nobel prize for Physics; Datawarehouse: operational since 2005; different revisions and additions; since 2008 working toward central architecture
Data Vault: Dan Linstedt’s, not the other backup solutions; Research; Presentations; (Higher) Education; Architecture & Design; Modeling & Development; (Online) Data Vault community; President Dutch Data Vault User group; Member of Data Vault Standard’s Board

4 Not enough for deploying a working Data Vault! Giving Directions
This presentation: Basic Introduction; Core Concepts; Not enough for deploying a working Data Vault! Giving Directions; Understanding; Usability; Further study; Fun; Alas, no demos; Contains bonus slides; Do ask questions!

5 Cosmology of Data warehousing Data Vault
Agenda
Cosmology of Data warehousing: Stars, Constellations, Galaxies and ….?
Data Vault: Architecture, Modelling, Loading, Benefits
Data Vault Considerations & Comparisons: Pros and Cons, vs. 3NF, Dimensional Modelling
Example of a Data Vault Analysis & Transformation (METIS)
Example DWH Data Vault Architecture: Business Data Vault, Radboud University, Fast Track 2.0
Conclusion & wrap-up: Conclusions, Questions, Further Reading

6 The Cosmology of Data warehousing
Star (schemas); Aggregates as Planets; Data Marts as Constellations; Galaxies as (Conformed) Data Marts; Where is the Data Vault?

7 Stars, Constellations and Galaxies as BI terms. Now black holes too
A supermassive black hole at the center of every galaxy? The one in our Milky Way is called Sagittarius A and is 3 million times heavier than our sun.

8 Black Holes as INFORMATION VAULTS
Information paradox: information and particles cannot disappear. Event horizon: a black hole is a sphere; light ‘freezes’. Holographic principle: apparent universe; CERN asking for new storage (string theory). Elementary particles: gravity pulls everything apart. Singularity.

9 Data Vault vs. Black Hole
Data is retained indefinitely → Vault → matter is trapped. Temporal, accessible → Information → holographic, ‘visible’ and frozen. Elementary facts → Elementary → elementary particles. Integration points → Integrated → singularity. Flexible, extensible → Expandable → expands on matter/information. Central EDW → Central point → spinning point of the galaxy.
Data ‘Black Hole’: once loaded, always loaded, no ‘Undo’ button. Information: holographic, ‘visible’ and frozen; temporal, time aware. Elementary: save as ‘raw’ elementary fact (types); conceptual level: Data Vault, 6NF, FCO-IM(t). Integrated: integration points (Hubs). Spinning point of the galaxy: center of gravity.

10 Historic Overview Created By Dan Linstedt Released in 2000
© (Linstedt, Graziano, & Hultgren, The New Business Supermodel, The Business of Data Vault Modeling, 2008, p. 36) Created By Dan Linstedt Released in 2000 Formally Introduced in the Netherlands in 2007 First DV Book: The Business of Data Vault Modeling 2008 First (Dutch) User group in 2010 Technical book from Dan Linstedt in 2011

11 ETL/Load architecture
Data Vault Components: Modeling; ETL/Load architecture. Architecture or approach? Inmon architecture; Kimball architecture; Data Vault only for the central EDW, not for Data Marts

12 ETL/Load Architecture
Kimball or Inmon ETL: complex ETL; truth oriented; business rules before the EDW. Architecture or approach? Inmon architecture; Kimball architecture; Data Vault only for the central EDW. KINSTEDT. ETL/Load Architecture: 100% of the data (within scope), 100% of the time; source driven/auditable: “fact oriented”; template/metadata driven; primary load process: only simple elements; no business rules. Pictures: Dan Linstedt ©
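To make "template/metadata driven" a little more concrete, here is a minimal sketch of how one load statement per hub could be rendered from a metadata list. The template text, the metadata keys (`hub`, `bk`, `source`) and the table names are illustrative assumptions, not the tooling used at Radboud University.

```python
from string import Template

# Hypothetical load template: one INSERT pattern shared by every hub.
HUB_TEMPLATE = Template(
    "INSERT INTO ${hub} (${bk}, load_dts, record_src)\n"
    "SELECT DISTINCT src.${bk}, :load_dts, '${source}'\n"
    "FROM ${source} AS src\n"
    "WHERE NOT EXISTS (SELECT 1 FROM ${hub} AS hub WHERE hub.${bk} = src.${bk})"
)

# Hypothetical metadata: which business key in which source feeds which hub.
HUB_METADATA = [
    {"hub": "customer_hub", "bk": "customer_no", "source": "source_customer"},
    {"hub": "contact_hub", "bk": "contact_no", "source": "source_contact"},
]


def render_hub_loads(metadata):
    """Render one INSERT statement per metadata entry."""
    return [HUB_TEMPLATE.substitute(entry) for entry in metadata]


statements = render_hub_loads(HUB_METADATA)
for statement in statements:
    print(statement, end="\n\n")
```

Because the statement carries no business rules, adding a hub only means adding a metadata row; the template itself stays unchanged.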

13 Data Vault Architecture
Central EDW No Business Rules Incremental/Non destructive Loading 100% of the data (within scope) 100% of the time Auditable/Source Driven

14 Dualistic approach for central EDW
DWH source driven or demand driven? Source driven and goal oriented. Top-down architecture and approach; Kimball approach; classic Inmon approach; bottom-up architecture and approach; ODS. Source driven: sources determine scope and data; source models help determine the data model; “single version of the facts”. Goal oriented: the goal helps determine the scope; business models help determine the data model; they determine BI architecture and development. Neither may dominate: if the goal dominates completely, the result is a Business Data Vault; if the source dominates completely, an Operational Data Vault (Super ODS). Decoupling source and goal gives flexibility.

15 Dualistic approach = realistic approach
No problematic assumptions: 1 version of the truth; correct integration; entire data set; business information model. Detailed approach: modeling; ETL; architecture. Clear principles: (strong) auditability; repeatable; restartable; scalable; standardization; robustness. Visible to users: feedback; business rules.

16 Modeling a Data Vault
Primary entity types. HUB: unique list of business keys (customer number, order number, part number). LINK: unique list of business key combinations. SATELLITE: tracks associated attributes through time. Secondary entity types: hierarchical LINK; transactional LINK. Helper tables: PIT; Bridge. Legend based on pictures by Dan Linstedt

17 Data Vault Modelling
Primary entity types. HUB: unique list of business keys (customer number, order number, part number). LINK: unique list of business key combinations. SATELLITE: tracks associated attributes through time. Secondary entity types: hierarchical LINK; transactional LINK. Helper tables: PIT; Bridge

18 Metadata Load Templates Hub Link Satellite Loading Phases
Loading a Data Vault Metadata Load Templates Hub Link Satellite Loading Phases

19 Common Minimal Metadata
Load Sequence / Data Vault ID: dv_id, DV_SQN. Load Date Time Stamp: load_dts. Load End Date Time Stamp: load_dts_end (optional). Record Source: record_src
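As a rough illustration of how this minimal metadata could sit on the three primary entity types, the sketch below creates one hub, one link and one satellite in an in-memory SQLite database. Table names follow the customer/contact example used in the load queries; `customer_no` and `contact_no` are stand-ins for the slides' `customer#` and `contact#`, and the constraints are assumptions rather than part of the deck.

```python
import sqlite3

# Hypothetical DDL: every table carries the common metadata columns.
DDL = """
CREATE TABLE customer_hub (
    id          INTEGER PRIMARY KEY,   -- Data Vault ID (dv_id)
    customer_no TEXT NOT NULL UNIQUE,  -- business key
    load_dts    TEXT NOT NULL,         -- load date time stamp
    record_src  TEXT NOT NULL          -- record source
);
CREATE TABLE contact_hub (
    id          INTEGER PRIMARY KEY,
    contact_no  TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE custcontact_link (
    id          INTEGER PRIMARY KEY,
    cust_id     INTEGER NOT NULL REFERENCES customer_hub(id),
    contact_id  INTEGER NOT NULL REFERENCES contact_hub(id),
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL,
    UNIQUE (cust_id, contact_id)       -- unique combination of business keys
);
CREATE TABLE customer_sat (
    hub_id       INTEGER NOT NULL REFERENCES customer_hub(id),
    load_dts     TEXT NOT NULL,
    load_dts_end TEXT,                 -- optional load end date time stamp
    name         TEXT,                 -- descriptive attribute tracked over time
    record_src   TEXT NOT NULL,
    PRIMARY KEY (hub_id, load_dts)
);
"""


def create_vault(conn):
    """Create the example hub, link and satellite tables."""
    conn.executescript(DDL)


conn = sqlite3.connect(":memory:")
create_vault(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The satellite key is (hub_id, load_dts), so each change of an attribute becomes a new row instead of an update.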

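20 Loading a HUB
INSERT INTO customer_hub (customer#, load_dts, record_src)
-- The slide omits the select list; the load date and record source values below are assumed.
SELECT DISTINCT source.customer#, CURRENT_TIMESTAMP, 'source_customer'
FROM source_customer AS source
WHERE NOT EXISTS (SELECT * FROM customer_hub AS hub WHERE hub.customer# = source.customer#)
Pictures: Dan Linstedt ©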
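The hub load pattern can be tried out end to end with a small sketch like the one below. It runs against an in-memory SQLite database; `customer_no` stands in for the slide's `customer#`, and the sample rows, timestamps and the `load_hub` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_customer (customer_no TEXT, name TEXT);
CREATE TABLE customer_hub (
    id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE, load_dts TEXT, record_src TEXT);
INSERT INTO source_customer VALUES ('C1', 'Alice'), ('C2', 'Bob'), ('C1', 'Alice');
""")

# Insert only business keys that the hub does not know yet.
HUB_LOAD = """
INSERT INTO customer_hub (customer_no, load_dts, record_src)
SELECT DISTINCT source.customer_no, :load_dts, :record_src
FROM source_customer AS source
WHERE NOT EXISTS (SELECT 1 FROM customer_hub AS hub
                  WHERE hub.customer_no = source.customer_no)
"""


def load_hub(conn, load_dts, record_src="source_customer"):
    """Run the hub load once and return the number of new business keys."""
    cur = conn.execute(HUB_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return cur.rowcount


first = load_hub(conn, "2011-01-01T00:00:00")
second = load_hub(conn, "2011-01-02T00:00:00")  # rerun: nothing new to insert
print(first, second)
```

Running the load twice adds no rows the second time, which is what makes the pattern restartable.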

21 Loading a Link Pictures: Dan Linstedt ©

22 Link Load query
INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
-- The slide omits the select list; the load date and record source values below are assumed.
SELECT DISTINCT cust.id, contact.id, CURRENT_TIMESTAMP, 'source_table'
FROM source_table AS source
INNER JOIN contact_hub AS contact ON contact.contact# = source.contact#
INNER JOIN customer_hub AS cust ON cust.customer# = source.customer#
WHERE NOT EXISTS (SELECT * FROM custcontact_link AS link WHERE link.contact_id = contact.id AND link.cust_id = cust.id)
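A runnable sketch of the link load, under the same assumptions as before: in-memory SQLite, `customer_no` and `contact_no` standing in for `customer#` and `contact#`, and invented sample rows. The hubs are pre-filled because a link can only be loaded once both of its hubs hold the business keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (customer_no TEXT, contact_no TEXT);
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE contact_hub (id INTEGER PRIMARY KEY, contact_no TEXT UNIQUE);
CREATE TABLE custcontact_link (
    id INTEGER PRIMARY KEY, cust_id INTEGER, contact_id INTEGER,
    load_dts TEXT, record_src TEXT, UNIQUE (cust_id, contact_id));
INSERT INTO customer_hub (customer_no) VALUES ('C1'), ('C2');
INSERT INTO contact_hub (contact_no) VALUES ('P1'), ('P2');
INSERT INTO source_table VALUES ('C1', 'P1'), ('C1', 'P2'), ('C2', 'P1'), ('C1', 'P1');
""")

# Resolve both business keys to hub ids, then insert unseen combinations only.
LINK_LOAD = """
INSERT INTO custcontact_link (cust_id, contact_id, load_dts, record_src)
SELECT DISTINCT cust.id, contact.id, :load_dts, :record_src
FROM source_table AS source
JOIN contact_hub AS contact ON contact.contact_no = source.contact_no
JOIN customer_hub AS cust ON cust.customer_no = source.customer_no
WHERE NOT EXISTS (SELECT 1 FROM custcontact_link AS link
                  WHERE link.contact_id = contact.id AND link.cust_id = cust.id)
"""


def load_link(conn, load_dts, record_src="source_table"):
    """Run the link load once and return the number of new key combinations."""
    cur = conn.execute(LINK_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return cur.rowcount


first = load_link(conn, "2011-01-01T00:00:00")
second = load_link(conn, "2011-01-02T00:00:00")  # rerun: no new combinations
print(first, second)
```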

23 Loading a Satellite Pictures: Dan Linstedt ©

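24 Satellite Load query
INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
-- Select list, LEFT JOIN and the "is most recent" test are filled in as assumptions; the slide shows them only in part.
SELECT hub.id, CURRENT_TIMESTAMP, source.name, 'source_customer'
FROM source_customer AS source
INNER JOIN customer_hub AS hub ON hub.customer# = source.customer#
LEFT JOIN customer_sat AS sat ON sat.hub_id = hub.id AND sat.load_dts = (SELECT MAX(s.load_dts) FROM customer_sat AS s WHERE s.hub_id = hub.id)
WHERE sat.hub_id IS NULL OR sat.name <> source.name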
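The satellite load is delta driven: a row is added only when the hub key has no satellite row yet or when the attribute differs from the most recent row. The sketch below shows one possible reading of that pattern in SQLite. The LEFT JOIN, the MAX(load_dts) test for "is most recent", the `customer_no` column and the sample data are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_customer (customer_no TEXT, name TEXT);
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE customer_sat (
    hub_id INTEGER, load_dts TEXT, name TEXT, record_src TEXT,
    PRIMARY KEY (hub_id, load_dts));
INSERT INTO customer_hub (customer_no) VALUES ('C1'), ('C2');
INSERT INTO source_customer VALUES ('C1', 'Alice'), ('C2', 'Bob');
""")

# New row when there is no satellite row yet, or the latest one has another name.
SAT_LOAD = """
INSERT INTO customer_sat (hub_id, load_dts, name, record_src)
SELECT hub.id, :load_dts, source.name, :record_src
FROM source_customer AS source
JOIN customer_hub AS hub ON hub.customer_no = source.customer_no
LEFT JOIN customer_sat AS sat
       ON sat.hub_id = hub.id
      AND sat.load_dts = (SELECT MAX(s.load_dts) FROM customer_sat AS s
                          WHERE s.hub_id = hub.id)
WHERE sat.hub_id IS NULL OR sat.name <> source.name
"""


def load_sat(conn, load_dts, record_src="source_customer"):
    """Run the satellite load once and return the number of new versions."""
    cur = conn.execute(SAT_LOAD, {"load_dts": load_dts, "record_src": record_src})
    return cur.rowcount


initial = load_sat(conn, "2011-01-01T00:00:00")    # first versions for C1 and C2
unchanged = load_sat(conn, "2011-01-02T00:00:00")  # same source data: no delta
conn.execute("UPDATE source_customer SET name = 'Robert' WHERE customer_no = 'C2'")
changed = load_sat(conn, "2011-01-03T00:00:00")    # only C2 gets a new version
print(initial, unchanged, changed)
```

History is kept by inserting new versions only; existing satellite rows are never updated here. NULL attribute values would need extra handling that this sketch leaves out.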

25 Data Vault Loading Phases
Typical Load Phases of a Data Vault DWH Extracting Sources Staging Load Hubs Load Hub Satellites & Links Load Link satellites Load Dimensions Load Facts Loading in parallel where feasible Where possible ! Pictures: Dan Linstedt ©

26 Parallel Loading Synchronization Points/ Dependencies
Staging Hubs Hub Satellites Links Link Link on Link Link on Link Satellites Data Mart Feed Full/Partial Refresh Incremental loads Parallel loading, but not on same entity, except transactional links.
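One way to reason about these synchronization points is to write the phases down as a dependency graph and let a topological sort group them into waves that can run in parallel. The dependency table below is my reading of the load phases, not a definition taken from the slides, and it leaves out link-on-link structures.

```python
from graphlib import TopologicalSorter

# phase -> phases that must be finished first (assumed dependencies)
DEPENDS_ON = {
    "staging": set(),
    "hubs": {"staging"},
    "hub_satellites": {"hubs"},
    "links": {"hubs"},
    "link_satellites": {"links"},
    "data_marts": {"hub_satellites", "link_satellites"},
}


def parallel_waves(depends_on):
    """Group phases into waves; every phase in a wave can load in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = parallel_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(number, wave)
```

With these assumptions, hub satellites and links end up in the same wave, which matches the "Load Hub Satellites & Links" phase.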

27 Geology of a Data Vault (Batch) Loading
Real Time/Transactional Loading; Micro Batch, Continuous Batch; (Daily) Batch; Granularity of Loading. Pictures: Dan Linstedt ©

28 Data Vault Considerations & Comparisons
Pros; Cons; Versus 3NF; Versus Dimensional Modelling

29 Data Vault Pro’s Scalability Flexible Auditability Robustness
Restartable, consistent loading patterns; provides for multi-terabyte storage; extreme scalability, both loading and storage; generate ETL & data model (be careful); delta-driven information; flexible loading; auditability; rapid build of Data Marts; a Data Vault can handle any combination of different arrival speeds; a Data Vault can handle in-database data mining operations, which can assign weights of relevance to the associations (link tables) between hubs; easier detection of “dead data”; flexible and incremental implementation & deployment (Agile BI); generation of audit trails; quality feedback loops; truth vs. facts; robustness; standardization; isolated development; standard implementation architecture

30 End-user Access & aggregation performance
Data Vault Cons End-user Access & aggregation performance Not friendly for direct exploration and user access Not conducive to today’s BI tools. Not conducive to OLAP processing. Requires firm Architect Business Keys Truth vs. Facts DV Standards Additional Layer Might require additional processing

31 End-user Access & aggregation performance
But… End-user Access & aggregation performance Semantical layers & Helper tables/views Segregation of storage & access Requires firm Architect Ignore at your own peril Business Keys Auditability Standardization Additional Layer Adds flexibility & robustness
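A helper view is one way to separate storage from access: the hub and satellite stay as loaded, while a view presents the most recent satellite row per business key in a dimension-like shape. The sketch below also shows a point-in-time query. Table names, the view name `dim_customer_current` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_hub (id INTEGER PRIMARY KEY, customer_no TEXT UNIQUE);
CREATE TABLE customer_sat (
    hub_id INTEGER, load_dts TEXT, name TEXT, PRIMARY KEY (hub_id, load_dts));
INSERT INTO customer_hub VALUES (1, 'C1'), (2, 'C2');
INSERT INTO customer_sat VALUES
    (1, '2011-01-01', 'Alice'), (2, '2011-01-01', 'Bob'), (2, '2011-02-01', 'Robert');

-- Hypothetical helper view: latest satellite row per business key.
CREATE VIEW dim_customer_current AS
SELECT hub.customer_no, sat.name, sat.load_dts AS valid_from
FROM customer_hub AS hub
JOIN customer_sat AS sat ON sat.hub_id = hub.id
WHERE sat.load_dts = (SELECT MAX(s.load_dts) FROM customer_sat AS s
                      WHERE s.hub_id = hub.id);
""")

# Point-in-time variant: the row that was valid on a given date.
AS_OF = """
SELECT hub.customer_no, sat.name
FROM customer_hub AS hub
JOIN customer_sat AS sat ON sat.hub_id = hub.id
WHERE sat.load_dts = (SELECT MAX(s.load_dts) FROM customer_sat AS s
                      WHERE s.hub_id = hub.id AND s.load_dts <= :as_of)
ORDER BY hub.customer_no
"""

current = conn.execute(
    "SELECT customer_no, name FROM dim_customer_current ORDER BY customer_no").fetchall()
mid_january = conn.execute(AS_OF, {"as_of": "2011-01-15"}).fetchall()
print(current, mid_january)
```

End users and BI tools query the view; the vault tables underneath keep their full history.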

32 Data Vault vs. 3NF
Time-driven PK issues; parent-child complexities; cascading change impacts; difficult to load; not conducive to BI tools; not conducive to drill-down; difficult to architect for an enterprise; not conducive to spiral/scope; many-to-many linkages; handles lots of information; tightly integrated information; highly structured; reasonably conducive to near-real-time loads; relatively easy to extend

33 Data Vault vs. Star Schema
Star schema weaknesses: not conducive to data mining; not conducive to real-time loading (latency, real-time loading impractical); can’t handle ODS or Exploration Warehouse requirements; expensive updates (type 1, 2 and 3); inflexible combination of basic elements like history, structure and key distribution; grain issues difficult to resolve; high-impact changes; issues with late or early arriving facts; complex loading and changing of history; begins to fail under very heavy loads; difficult to automate transformation & ETL generation; large data volumes/large dimensions. Star schema strengths: good for multi-dimensional analysis; subject-oriented answers; excellent for aggregation points; great for some historical storage; great for BI tools; minimizes data landing zones

34 Data Vault: Conclusion
Go ! Flexible/Agile approach Auditable/Historic Scalable Standardized/Automatable/Repeatable Robust/Stable/Dependable No Go? Experience/Familiarity No Direct Access Extra layer Data Modelling Go! Flexible/Agile approach Incremental Data Modelling Truth vs. Facts Loading Data Mart Generation Auditable/Historic Temporal information Scalable Data Querying Standardized/Automatable/Repeatable Standardization Templating Transformation Robust/Stable Simple Isolated Incremental Less testing/auditing Restartable No Go! Experience/Familiarity Training Practicing Researching No Direct Access Isolation benefit Semantical layer & Helper Tables Extra layer Should already be in place (more or less) Data Modelling Required anyway

35 University Research Publications Information System (METIS)
EXAMPLE University Research Publications Information System (METIS) Picture: Paul Kidby ©

36 Transforming a data model to a Data Vault in 5 easy steps
1. Create a working and complete source/business model(s) (“Technical-Functional” model), including FKs, PKs, SKs & AKs.
2. Analyze and classify keys & columns: which PKs become BKs? Which PKs are actually SKs? Which key drives the entity & attributes? Find the UOW (unit of work) to create LINKs.
3. Classify entities and relationships: which source entity will supply a Data Vault HUB? Which source entity will supply a Data Vault LINK? Which source entity has non-key attributes to supply to satellites?
4. Combine the information of steps 2 & 3.
5. Transform to a DV: BKs to HUBs, UOW to LINKs, attributes to SATs; “denormalize” LINK-to-LINK relationships.
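The classification and transformation steps can be sketched as a small function that takes classified source entities and proposes Data Vault objects. The rules below are a simplification I am assuming for illustration: an entity with a business key becomes a hub, an entity identified only by its references becomes a link, and non-key attributes go to a satellite. The entity details are invented; only the entity names echo the METIS example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SourceEntity:
    """A source entity after key and column classification (hypothetical shape)."""
    name: str
    business_key: Optional[str] = None  # None: identified only by its references
    references: List[str] = field(default_factory=list)
    attributes: List[str] = field(default_factory=list)


def to_data_vault(entities):
    """Propose hubs, links and satellites for the classified entities."""
    model = {"hubs": [], "links": [], "satellites": []}
    for entity in entities:
        if entity.business_key is not None:
            parent = f"{entity.name}_hub"        # BK -> HUB
            model["hubs"].append(parent)
            for ref in entity.references:        # FK from a hub entity -> LINK
                model["links"].append(f"{entity.name}_{ref}_link")
        else:
            parent = f"{entity.name}_link"       # unit of work -> LINK
            model["links"].append(parent)
        if entity.attributes:                    # non-key attributes -> SAT
            model["satellites"].append(f"{parent}_sat")
    return model


model = to_data_vault([
    SourceEntity("employee", business_key="employee_no", attributes=["name"]),
    SourceEntity("organization", business_key="org_code", attributes=["name"]),
    SourceEntity("employment", references=["employee", "organization"],
                 attributes=["start_date"]),
])
print(model)
```

A real transformation also has to separate surrogate keys from business keys and handle link-to-link relationships, which this sketch does not attempt.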

37 RESEARCH CONTRIBUTION EMPLOYMENT RESEARCH ORGANIZATION EMPLOYEE

38

39

40

41 Theory development; Splitting, Satellites; Rules; Stabilization and homogeneous; Hub minimization rule; External (temporal) keys; Hub elimination rule; grouping; Entities; Unknown values and timelines; (Surrogate) Key Satellite; Surrogate key analysis; Derived entities; Dummy deletes; Inheritance; Integration; Interweavings; Fact integration; From-To Link; Degenerate satellites and links; FK links (links with a timeline); Model analysis; (Entity) relationships; Standard approach for; Algorithms; Source models; (Cost-based) join optimization; Business models; Grain/fact table optimization; Star schemas; Model segmentation; Data Vault models; Models; Consists of; Standardized; Conceptual DV; Metadata; Logical DV; Physical DV; Transformation; Formalization; Visualization; FCO-IM(t); Tools; Temporalization; PD add-in; Relation(s) with Anchor Modeling etc.; Input for analysis; Evaluation and, where needed, correction of analysis; ER analysis; Fact analysis; Central repository; Transformations; End-user feedback; (Strong) anchoring; Object to role (and back); Role attributes

42

43

44 A Data Vault oriented Datawarehouse Architecture
EXAMPLE A Data Vault oriented Datawarehouse Architecture Staging & CDC/Replication/Real Time/SOA feeds Central EDW Data Vault Core Business Rule Layer Non Source oriented & DV structured Business Rule results & calculations/aggregations Virtualized Data Mart Layer Star Schema’s encoded in semantical layers (UDM/BISM/views/Universes) None/Partial Physical star schema’s

45 Advanced Concept: Business Data Vault
Data Vault structured layer Centralization System Driven instead of Source Driven Performance Data Vault structured layer Hubs , Links & Satellites Incremental Auditable System Driven instead of Source Driven Complex Business Rules Aggregations Cleansing Centralization Central place for calculations Central place for DQ initiatives Traceability to Data Marts Performance Picture: Dan Linstedt ©

46 Datawarehouse Architecture
BI Apps: SAP-BO; Central DWH; Universe; Staging (optional); (Temporal) 3NF views; (Virtual) Data Marts; Data Vault; metadata-driven ETL; Views (PIT). BUSINESS DATA VAULT: business rules on the way to the Data Marts; in a central repository; almost always needed; poor data quality. It comes in a number of forms: full (separate) Business Data Vault, physical or logical; Business Rule Vault, separate or integrated; BusVault, the Kimball DWH bus in DV format. DBMS ARCHITECTURE: SQL Server 2008 R2 Enterprise Edition; Microsoft Fast Track 2.0 DWH architecture with Data Vault; SQL Server ‘Appliance’: hardware + configuration are fixed; no indexes; no updates; etc. I/O and data are aligned with each other; ultimately no I/O bottleneck on sequential access to data; appliance performance with SQL Server and standard hardware. Virtual Data Marts: Views (PIT), latest Business Rules (Vault). Reports; Dimensional views; BR; Voyager; OLAP; Data Marts; Business (Rule) Vault

47 SQL Server 2008 R2 Enterprise Edition
MS Fast Track 2.0/3.0: based on SQL Server 2008 R2 Enterprise Edition; Microsoft Fast Track 2.0/3.0 DWH architecture with Data Vault. SQL Server ‘Appliance’: hardware + configuration fixed; light indexing; light on updates; etc. IO + data paths: eliminate IO bottlenecks. Virtual Data Marts: no indexes on the DV; views to form Data Marts; Business Rules (Vault) for specific transformations. Challenges: DV instead of star schema loading. Benefits: extreme flexibility

48 Questions? Change Data Capture? Metadata? Fast Track? Theory?
Anchor Oriented Modeling?

49 Information about Data Vault
Data Vault Book: Website creator:

50 Additional Information
Data Vault Generators BIReady: Quipu: Several others Blogs & Resources Facebook: datavaultdirectory Linkedin groups Data Vault Discussions, Temporal Data Modeling Dutch Data Vault Subgroup

51 LinkedIn: http://www.linkedin.com/in/dmunseen Twitter: DM_Unseen
Contact MSN/ LinkedIn: Twitter: DM_Unseen Blog: LinkedIn Group: Temporal Data Modeling Facebook: datavaultdirectory

52 Dutch Data Vault User group
HASHTAGS: #NLDVGG #DDVGG Website: Windows Live: Facebook: datavaultdirectory Dutch Data Vault User group: Belgium Contact person: Yves Mulkes / BI-community.org

53 Understand selling points
Recap & Checklist Understand selling points Check out (online) Data Vault Resources Training/Coaching/Seminars Evaluate Understand architecture requirements Prototyping Consultancy Implement Small increments

54 Thank You! There is no escape with the modelling of stars.
You’ll use Data Vault if you did BI on Mars! Thank You!

