Best practice case: Comparing the implementations of the Irish CDM and the Dutch DSC ESSnet on microdata linking and data warehousing in statistical production.


1 Best practice case: Comparing the implementations of the Irish CDM and the Dutch DSC
ESSnet on microdata linking and data warehousing in statistical production
Harry Goossens, Statistics Netherlands
Head Data Service Centre / ESSnet Coordinator
hct.goossens@cbs.nl

2 ESSnet Data Warehousing: The CSO Corporate Data Model (CDM)
Underlying principle: 4 data stores
1. INPUT: raw data
2. CLEAN UNIT: cleaned data
3. AGGREGATE: aggregated data
4. DISSEMINATION: published data
- The CDM was seen as ≈ an active DWH

3 The CSO Corporate Data Model (CDM)
Main characteristics:
- All (statistical) processes must use the 4 data stores
- Processing systems interact on the data stores
- At certain moments snapshots are taken, which build the next data store
- It is possible to continue working on the same (snapshotted) data store
- Simultaneous updating of data is mainly an organisational issue
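The snapshot mechanism described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the four store names come from the slides, while the class `CorporateDataModel`, its methods, and the cleaning and aggregation rules are hypothetical and not taken from the CSO implementation.

```python
from copy import deepcopy

# Store names follow the slide; everything else here is a hypothetical illustration.
STORES = ["INPUT", "CLEAN UNIT", "AGGREGATE", "DISSEMINATION"]


class CorporateDataModel:
    """Toy model of four data stores chained by snapshots."""

    def __init__(self):
        self.stores = {name: [] for name in STORES}

    def load_input(self, records):
        self.stores["INPUT"].extend(records)

    def snapshot(self, source, transform):
        """Freeze a copy of `source`, transform it, and build the next store from it."""
        target = STORES[STORES.index(source) + 1]
        frozen = deepcopy(self.stores[source])  # independent of later edits to `source`
        self.stores[target] = transform(frozen)
        return target


cdm = CorporateDataModel()
cdm.load_input([{"unit": "A", "turnover": 10}, {"unit": "B", "turnover": None}])

# INPUT -> CLEAN UNIT: drop records with missing values (assumed cleaning rule).
cdm.snapshot("INPUT", lambda rows: [r for r in rows if r["turnover"] is not None])

# Work can continue on the snapshotted store without affecting the next one.
cdm.load_input([{"unit": "C", "turnover": 5}])

# CLEAN UNIT -> AGGREGATE: sum turnover over all clean units (assumed aggregation).
cdm.snapshot("CLEAN UNIT", lambda rows: [{"total_turnover": sum(r["turnover"] for r in rows)}])
```

Because each snapshot is a copy, the record added to INPUT after the first snapshot does not reach CLEAN UNIT until a new snapshot is taken, which is why simultaneous updating becomes a question of organisation rather than of technology.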

4 The CSO Corporate Data Model
[Diagram. Labels: INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION; 2 operational implementations: DATA MANAGEMENT STORE (surveys) and ADMINISTRATIVE DATA CENTRE (admin data)]

5 Data Management Store (DMS)
- First implementation of the CDM
- Only survey data
- Data tables are created and populated through the DMS applications
- Metadata must be entered as the data tables are created
- Metadata capturing is minimal (a bottleneck)
- The BR is outside the DMS (stand-alone)

6 CDM – Data Management Store
[Diagram. Labels: DATA COLLECTION ACTIVITIES (mainly surveys); INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION; DMS with APP layer (incl. I/O interfaces) and DMS meta layer (basic descriptions); SHARED INPUT, SHARED CLEANED UNIT, AGGREGATE STORE; SNAPSHOTS; BI; SYS 1, SYS 2, SYS n]

7 Administrative Data Centre (ADC)
- Developed for organisational reasons
- Only admin data
- A catalyst to exploit administrative data for statistical purposes
- Interface with public authorities on admin data flows to the CSO
- Clearing house inside the CSO for admin data
- Data governance with respect to admin data

8 Administrative Data Centre (ADC)
- Has an analysis layer
- R&D on available data
- To develop new datasets
- Without specific needs / demands from statistics

9 CDM – Administrative Data Centre
[Diagram. Labels: DATA COLLECTION ACTIVITIES; SOURCES; ADC Front Door (LEAN INTERFACE, only admin data); ETL; ADC with ADC meta layer; INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION; Data Products; BI; SYS 1, SYS 2, SYS n]

10 Corporate Data Model CSO - Ireland
[Diagram combining the DMS and ADC diagrams. Labels: DATA COLLECTION ACTIVITIES; SOURCES; ADC Front Door (LEAN INTERFACE); ETL; INPUT, CLEANED DATASETS, AGGREGATE DATASETS, DISSEMINATION; DMS with APP layer (incl. I/O interfaces) and DMS meta layer (basic descriptions); ADC with ADC meta layer; SHARED INPUT, SHARED CLEANED UNIT, AGGREGATE STORE; SNAPSHOTS; Data Products; BI; SYS 1, SYS 2, SYS n]

11 The CBS Data Service Centre (DSC)
The concept:
- No data without metadata
- Dedicated metadata model as basis
- Strict distinction between:
  - Statistical data (facts & figures)
  - Conceptual metadata (definitions, description of quality, process activities etc.)
- Steady states explicitly designed for re-use
- All metadata (of steady states) are generally accessible and are standardised as much as possible
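The rule "no data without metadata" can be read as a precondition on storage. The following Python sketch illustrates that reading only; the names `ConceptualMetadata`, `Vault`, `store` and `describe`, and the metadata fields, are hypothetical and do not describe the actual DSC software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptualMetadata:
    # Hypothetical fields; the slide only names definitions, quality and process activities.
    definition: str
    quality_note: str = ""


@dataclass
class Vault:
    """Toy registry that refuses data arriving without metadata."""
    _items: dict = field(default_factory=dict)

    def store(self, name, rows, metadata):
        if not isinstance(metadata, ConceptualMetadata) or not metadata.definition.strip():
            raise ValueError(f"no data without metadata: {name!r} rejected")
        # Data and metadata are kept as separate objects, linked by name.
        self._items[name] = {"data": list(rows), "metadata": metadata}

    def describe(self, name):
        """Metadata is accessible without touching the data itself."""
        return self._items[name]["metadata"].definition


vault = Vault()
vault.store(
    "turnover_2012",
    [("A", 10), ("B", 20)],
    ConceptualMetadata(definition="Annual turnover per enterprise"),
)
```

Keeping the statistical data and the conceptual metadata in separate objects mirrors the strict distinction on the slide, while the check in `store` makes it impossible to register one without the other.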

12 The CBS Data Service Centre (DSC)
What is it?
- Fundamental cornerstone of the CBS Business Architecture
- Central 'vault' with Steady States, linking:
  - statistical data (facts & figures)
  - conceptual metadata (description)
  - technical metadata (user's guide)
- Documentation
- Implementation of the Dutch metadata model

13 The CBS Data Service Centre (DSC)
What does it offer?
Generic services:
- Metadata coordination
- Centralised data distribution
- Authorisation management
- Automatic process interfacing (in development)
- Archiving of statistical datasets

14 The CBS Data Service Centre (DSC)
Why do we do it?
- Data sharing / re-using data: intermediary, archive and distribution, CBS data vault. Maximally efficient use of data and metadata
- Process guarantee / security: safety net in case of calamity, static 'frozen' data
- Process standardisation: transparency & efficiency
- Coordination of metadata & classifications: one single source with elements for the statistical process
- Process chain support: Steady States as data hubs
- Generic process for data linking: the DSC structure enables linking datasets with equal object type
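The last point, linking datasets with equal object type, can be illustrated with a small Python sketch. It assumes that each dataset carries an `object_type` label and a shared key column; the function `link`, the dictionary layout and the sample values are hypothetical and only show the idea, not the DSC linking process.

```python
def link(left, right, key):
    """Join two datasets on `key`, but only if they describe the same object type."""
    if left["object_type"] != right["object_type"]:
        raise ValueError("datasets describe different object types")
    right_by_key = {row[key]: row for row in right["rows"]}
    linked = [
        {**row, **right_by_key[row[key]]}  # merge the variables of matching units
        for row in left["rows"]
        if row[key] in right_by_key
    ]
    return {"object_type": left["object_type"], "rows": linked}


# Made-up sample data for illustration.
income = {
    "object_type": "person",
    "rows": [{"person_id": 1, "income": 30000}, {"person_id": 2, "income": 45000}],
}
education = {
    "object_type": "person",
    "rows": [{"person_id": 1, "level": "secondary"}, {"person_id": 3, "level": "tertiary"}],
}
businesses = {"object_type": "business", "rows": []}

linked = link(income, education, "person_id")
```

Here `linked` contains only the unit present in both person datasets, and a call such as `link(income, businesses, "person_id")` is refused because the object types differ.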

15 CBS Business Architecture: Layers
[Diagram. Labels: Strategy; Design; Chain management; Statistics Production; Steady States; DSC - Data Storage; DSC - Metadata Catalogue]

16 CBS Business Architecture: Steady States

17 DSC: What are Steady States?
- A steady state is a dataset together with information for its correct interpretation
- Rectangular
- Rows represent units (micro) or classes of units (macro)
- Columns represent variables
- Heading: population, time
- Dataset design is like a template of a table: only borders and heading
- 1 dataset design, n datasets
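The relation "1 dataset design, n datasets" can be sketched as two Python types. This is a minimal illustration under assumptions: the class names `DatasetDesign` and `Dataset`, their fields, and the sample values are hypothetical and are not the Dutch metadata model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDesign:
    """The 'template': borders and heading only, no data."""
    population: str
    variables: tuple  # column names


@dataclass(frozen=True)
class Dataset:
    """One filled-in instance of a design for a given period."""
    design: DatasetDesign
    period: str
    rows: tuple  # each row is one unit (micro) or one class of units (macro)

    def __post_init__(self):
        # A steady state is rectangular: every row has one value per variable.
        width = len(self.design.variables)
        for row in self.rows:
            if len(row) != width:
                raise ValueError("dataset is not rectangular for this design")


# One design, several datasets (made-up values).
design = DatasetDesign(population="enterprises", variables=("enterprise_id", "turnover"))
y2011 = Dataset(design, "2011", ((1, 100), (2, 250)))
y2012 = Dataset(design, "2012", ((1, 120), (2, 240)))
```

Both datasets share the same design object, so the population and the variables are described once and reused for every period.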

18 DSC: Why Steady States?
- Reduce storage:
  - Store once
  - Re-use many times
- Secure the statistical process:
  - Each steady state is a guaranteed fallback point
- Improve consistency:
  - Every following process uses the same dataset
- Improve flexibility:
  - Enables independent, generic process design

19 Conclusions
Both CSO & CBS
- Use the same basic principle of 4 (static) stages/bases
- Had the same 'drivers' to start a DWH:
  - re-use of data
  - decoupling input from output (= getting rid of stove pipes)
CSO
- Strong focus on practical results, (successful) quick wins
- 2 different implementations of the CDM
- Organisational driver for the ADC
CBS
- Strong focus on the metadata model
- DSC = essential element of the business architecture
- 1 implementation supporting all processes

20 Conclusions
Regarding the DWH ESSnet
- The S-DWH architecture covers both best practices
- The ESSnet identified the right issues to focus on:
  - metadata
  - role/position of the BR
- Strong desire for knowledge exchange, learning from other NSIs
- CSO = very helpful best practice case
- The CSO acknowledges the importance of the ESSnet and wants to stay closely involved

