ESTP WORKSHOP ON SDMX IN NATIONAL ACCOUNTS

1 ESTP WORKSHOP ON SDMX IN NATIONAL ACCOUNTS
Challenges at Statistics Netherlands

2 Challenges at Statistics Netherlands - 1
- SDMX implementation concurrent with the ESA2010 revision
- Focus on the NA domain (not BoP, etc.)
- One system for all transmissions to international organisations, extendable to other domains
- Full NA domain
- SDMX 2.0 (Eurostat) and 2.1 (ECB)
- Different header requirements
- Validation on-site

3 Fall 2013: start of the project
Two parallel implementation projects:
- SDMX converter: straightforward, easy to plan (strict deadline!)
  - January 2014: implementation of the converter + Excel templates
  - September 2014: first SDMX delivery to international organisations (IOs), using the converter
- SDMX-RI: complex, IT-intensive
  - Spring-summer 2014: experimentation with the RI
  - Autumn-winter 2014/2015: requirements + development of RI tooling
  - Spring-summer 2015: testing + further development
  - Fall-winter 2015/2016: deployment

4 Implementation problems
- Data flow definitions:
  - Some artefacts were missing
  - Many only available in a test registry
- Different requirements for SDMX header specifics from different international organisations
  - ECB more stringent
- DSD matrix is needed for validation, but has no official status and contains some inconsistencies
- Key set constraints:
  - As yet unavailable
  - Validation tools still to be tested with those artefacts

5 ESTP WORKSHOP ON SDMX IN NATIONAL ACCOUNTS
Implementation of SDMX-RI at Statistics Netherlands

6 Who are we?
Statistics Netherlands, National Accounts and Government Finance Statistics
- Harm Melief (NA): statistics project leader; requirements and validation
- Vincent Ohm (GFS): requirements and testing
- Dick Windmeijer: RI expertise and methodology
- Wilma Triepels: IT development (National Accounts)
- Anne Reedijk: IT development (CBS IT department)
- Olav ten Bosch/Hans Beneker: IT project leader

7 SDMX at Statistics Netherlands
- Goal: controlled introduction of SDMX at Statistics Netherlands
- Duration: 2009 to present
- Strategy:
  - SDMX for external data communication, not (yet) for internal processes
  - Generic SDMX services/tools/processes wherever possible
- Projects:
  - Census Hub (SDMX-RI)
  - ICT-Hub (pilot, SDMX-RI)
  - Fishery, Waste, Trade, Education, etc. (mostly SDMX converter)
  - STES / SDDS (SDMX-RI + Statistics Netherlands dissemination database StatLine)
  - National Accounts (SDMX-RI)

8 National Accounts project
IT project at Statistics Netherlands:
- Business case (justification): implement SDMX for the National Accounts domain using the RI; the converter is supported until 2016
- Basic assumptions
- General requirements (business architecture)
- General functionality (use case model)
- Specifications (use cases)
- Development, testing, deployment

9 Basic assumptions
- One system for all transmissions to international organisations
- Exports both SDMX 2.0 (Eurostat) and 2.1 (ECB)
- Flexible with respect to header requirements
- Applicable to the full NA domain
- Generic system, reusable for other domains
- Local validation
- Using the RI software for generating SDMX files

10 Architecture: two interacting systems
[Diagram labels: DSDs, NA output system, dissemination system, dataflows, data files from specialists, SDMX files, definitions, codelists]
- NA output system for collecting and validating all NA dataflows (domain-specific)
- Dissemination tooling (generic) for:
  - interpreting DSDs to derive dataflow definitions and codelists
  - importing, mapping and SDMX conversion

11 Why two systems?
The choice was made to separate generic from domain-specific functionality.
- Dissemination tooling:
  - Does not use all RI components
  - IT-intensive → not for statisticians
  - Extendable to other domains (generic)
- NA output database:
  - Contains data + (staged) validations
  - Statistical expertise needed → not for IT people
  - Domain-specific

12 General requirements for the IT project
The following needs were defined:
- Delivery of SDMX files
- Technical validation of data
- Redelivery of data to correct mistakes
- Viewing data prior to SDMX conversion
- Implementing changes in DSDs
- Viewing older SDMX files (rejected)
- Administrative data: delivery times, versions, etc.
- Delivering metadata to the NA output system (added later on)

13 Functionality
Five (four + one) use cases defined:
- Implementation of new DSD versions
- Importing data files into the dissemination system
- Generating SDMX files
- Viewing SDMX files prior to export
- Exporting SDMX metadata to the NA output system

14 Example of Use Case

15 ESTP WORKSHOP ON SDMX IN NATIONAL ACCOUNTS
CBS-System overview and demo

16 Architecture revisited
[Diagram labels: DSDs, NA output system, dissemination system, dataflows, data files from specialists, SDMX files, definitions, codelists]
- NA output system for collecting and validating all NA dataflows (domain-specific)
- Dissemination tooling (generic) for:
  - interpreting DSDs to derive dataflow definitions and codelists
  - importing, mapping and SDMX conversion

17 Design choices
- Separate generic (dissemination) tooling from domain-specific statistics (output) tooling
- The NA output system delivers dataflows already mapped to SDMX dimensions and codes
These choices allow:
- Statistical validation in the NA output system
- Automation of the dissemination system

18 Dissemination tool architecture

19 Dissemination system – RI components
- Mapping Assistant:
  - Imports the DSD
  - Translates DSD information into the Mapping Store
  - Not used for (manual) mapping of data to the DSD
- Mapping Store database:
  - Separate database
  - Stores DSD artefacts
  - Contains mappings for all dataflows

20 Dissemination system – RI components
- Web service:
  - Imports mappings from the Mapping Store
  - Imports dataflows from the local repository
  - Converts data to SDMX
- Web client:
  - Used for inspection of converted data
- RI components may be accessed through the SOAP or REST interface of the RI web service
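As a hedged illustration of the REST side, the sketch below builds a data query URL following the general path pattern of the SDMX 2.1 RESTful API, `data/{flowRef}/{key}/{providerRef}`, with optional `startPeriod`/`endPeriod` parameters. The function name, base URL, agency, dataflow ID and key are hypothetical placeholders, not the actual Statistics Netherlands configuration; no network call is made.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url: str, agency: str, flow_id: str, version: str,
                  key: str, provider: str = "all", **params: str) -> str:
    """Build an SDMX 2.1 RESTful data query URL: data/{flowRef}/{key}/{providerRef}."""
    flow_ref = f"{agency},{flow_id},{version}"  # flowRef = agencyID,resourceID,version
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}/{provider}"
    if params:  # optional query parameters, e.g. startPeriod / endPeriod
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint and identifiers, for illustration only.
url = sdmx_data_url(
    "https://sdmx-ri.example.org/rest",
    agency="ESTAT", flow_id="NA_EXAMPLE_FLOW", version="1.0",
    key="A.NL.B1GQ", startPeriod="2010", endPeriod="2014",
)
print(url)
```

The key is a dot-separated list of dimension values in DSD order; an empty position acts as a wildcard for that dimension.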

21 Dissemination system – Generic I/O shell
- CBS mapping generator:
  - Imports the DSD + dataflow definitions
  - Generates the dissemination DB from DSD artefacts
  - Generates the data mapping: one-to-one
- Dissemination DB:
  - Generated in DSD format
  - Contains dataflows imported from the NA output database
- Built on SQL Server/C# .NET and the REST interface
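The idea of generating the dissemination database from the DSD, with a one-to-one mapping between DSD components and table columns, can be sketched as follows. This is a minimal sketch under loudly stated assumptions: one table per dataflow, text columns for every component, SQL Server bracket quoting, and hypothetical function and dataflow names. The actual CBS generator is written in C# .NET and is not reproduced here.

```python
from typing import Dict, List


def generate_table_ddl(dataflow_id: str, dimensions: List[str],
                       attributes: List[str]) -> str:
    """Emit a SQL Server CREATE TABLE statement with one column per DSD component."""
    columns = [f"    [{dim}] NVARCHAR(50) NOT NULL" for dim in dimensions]  # dimensions form the key
    columns += [f"    [{attr}] NVARCHAR(255) NULL" for attr in attributes]  # attributes are optional
    columns.append("    [OBS_VALUE] NVARCHAR(50) NULL")  # kept as text in this sketch
    return f"CREATE TABLE [{dataflow_id}] (\n" + ",\n".join(columns) + "\n);"


def generate_mapping(dimensions: List[str], attributes: List[str]) -> Dict[str, str]:
    """One-to-one mapping: every DSD component maps to the column with the same name."""
    return {comp: comp for comp in [*dimensions, *attributes, "OBS_VALUE"]}


# Illustrative component lists (hypothetical subset, not a real NA DSD).
dims = ["FREQ", "REF_AREA", "STO", "TIME_PERIOD"]
attrs = ["OBS_STATUS", "CONF_STATUS"]
print(generate_table_ddl("NA_EXAMPLE_FLOW", dims, attrs))
```

Because the mapping is an identity, the NA output system only has to deliver data already coded in SDMX dimensions, which is what makes the dissemination side automatable.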

22 Dissemination system – Generic I/O shell
- Post-processing:
  - Defining the header through standard RI tooling is complex
  - Easier and more flexible solution: adjust the header after conversion
  - Important for deliveries to the ECB
  - The body of the SDMX file is not changed
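Adjusting the header after conversion, while leaving the body alone, can be sketched as a splice on the serialized file: only the text between `<message:Header>` and `</message:Header>` is rewritten. This is a minimal sketch assuming SDMX-ML 2.1 with the `message:` namespace prefix; the function names, the trimmed sample message (not a schema-valid file) and the sender/receiver IDs are hypothetical placeholders.

```python
import re
from xml.sax.saxutils import escape

# Assumption: the header uses the "message:" prefix, as is common in SDMX-ML 2.1.
HEADER_RE = re.compile(r"<message:Header>.*?</message:Header>", re.DOTALL)


def _set_id(header: str, element: str, new_id: str) -> str:
    """Replace the id attribute of the first <message:{element} ... id="..."> in the header."""
    pattern = re.compile(rf'(<message:{element}\b[^>]*?\bid=")[^"]*(")')
    safe_id = escape(new_id, {'"': "&quot;"})
    return pattern.sub(lambda m: m.group(1) + safe_id + m.group(2), header, count=1)


def patch_header(sdmx_text: str, sender_id: str, receiver_id: str) -> str:
    """Rewrite Sender/Receiver ids inside the header; everything else is left byte-for-byte."""
    match = HEADER_RE.search(sdmx_text)
    if match is None:
        raise ValueError("no <message:Header> element found")
    header = _set_id(match.group(0), "Sender", sender_id)
    header = _set_id(header, "Receiver", receiver_id)
    return sdmx_text[: match.start()] + header + sdmx_text[match.end():]


sample = (
    '<message:StructureSpecificData '
    'xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message">'
    '<message:Header><message:ID>NA_EXAMPLE_1</message:ID>'
    '<message:Sender id="UNSET"/><message:Receiver id="UNSET"/></message:Header>'
    '<message:DataSet><Obs TIME_PERIOD="2014" OBS_VALUE="100.0"/></message:DataSet>'
    '</message:StructureSpecificData>'
)
patched = patch_header(sample, sender_id="SENDER_ID", receiver_id="RECEIVER_ID")
```

Working on the text rather than re-serializing a parsed XML tree keeps the body identical, in line with the requirement that the body of the SDMX file is not changed; a production tool might still prefer a namespace-aware parser for the header itself.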

23 Dissemination system – User interface
- Controls the I/O shell
- Exports DSD metadata (e.g. codelists) to the NA output database
- Manages data flows:
  - Uploading from the NA output database
  - Monitoring properties (e.g. number of records)
  - Initiating conversion and export of SDMX
  - Allows specifying header information
- Based on an ASP.NET MVC web app, using a WCF service

24 Dissemination system – Externals
- The REST interface allows only SDMX 2.1
- The command-line version of the converter is used to convert SDMX 2.1 to SDMX 2.0
- The dissemination database pulls data from, and delivers metadata to, the NA output database, not vice versa

25 Demo

26 NA-output tool architecture
Two goals:
- Storing separate dataflows, delivered by the specialists
- Validating dataflows, using DSD information (code lists)
[Diagram labels: specialist data, collection, output database, keys validation, code list validation, other validations, upload to dissemination system]

27 NA-output database - 2
- SQL Server database:
  - Storing unvalidated data
  - Validation scripts (stored procedures)
  - Storing validated data
- Access/VBA user interface:
  - General process management
  - Scripts for importing specialist data
  - Running validation scripts

28 Keys validation - 1
Join the keylist to the dataflow:
- Keys with no corresponding data indicate missing records
- Data with no corresponding keys indicate superfluous records
[Diagram: dataflow joined to keys; outcomes: missing, match, superfluous]

29 Keys validation - 2
Keys validation:
- Each record in an SDMX dataflow can be uniquely identified by the values of its relevant dimensions: the key
- Each dataflow uses a (relevant) subset of the DSD dimensions
- The relevant dimensions are defined in the DSD matrix
- A correct dataflow consists of a set of unique keys
- The output database contains stored procedures to compare the unvalidated data with its key set:
  - Missing records imply an incomplete transmission
  - Extra records are superfluous

30 Code list validation
Code validation. Fields in the NA DSDs:
- Values (OBS_VALUE)
- Generic format, e.g. datetime
- Categorical (using code lists)
The DSD links dimensions to code lists (where applicable).
The output database contains stored procedures to compare the unvalidated data with the linked code lists.
No plausibility checks, for instance not whether the country code matches the transmitting country.

31 Content validation
Currently not implemented; planned for 2016.
- Monitoring progress and results of the Validation TF (task force)
- Wish: provide all keylists in a central repository
- Some validation checks are planned:
  - Country code (dimension in the key)
  - Combinations of Conf_Status and embargo date
  - Combinations of Obs_Status and Obs_Value
- Build up the checks as more are defined by the TF

32 Demo

33 Implementation problems
- Data flow definitions:
  - Some were missing
  - Many only available in a test registry
- Different requirements on header specifics from different international organisations
  - ECB more stringent
- DSD matrix is needed for validation, but has no official status and contains some inconsistencies
- Key constraints:
  - As yet unavailable
  - Possibly not to be used by the validation task force

34 Current / future work
- Collaborate with Eurostat to fill in missing artefacts:
  - Data flow definitions
  - DSD matrix
- Monitor the work of the validation task force
- Possibly implement validation rules
- Extend to other domains, for instance ESSPROS

