Slide 1: Eurostat Unit B3 – Statistical Information Technologies
SDMX Training for users, 29 November 2007
SDMX training session on basic principles, data structure definitions and data file implementation

Slide 2: A – Introduction

Slide 3: Purpose of the training session
– Provide an understanding of the basic SDMX principles (DSD and dataset implementation)
– Provide knowledge of the SDMX standard and its XML implementation
– Present ESTAT tools as case studies illustrating their scope and usage

Slide 4: Current practices
Current practices in data and metadata exchange:
– Legal framework (Commission Regulations, Council Regulations, etc.)
– Data and metadata files, questionnaires, quality reports, etc.
– Format (paper form, EDIFACT, XML, structured files, etc.)
– Media (file upload, Web form, removable media, dial-up, etc.)

Slide 5: The need for a standard
– Enhance electronic data and metadata exchange
– Improve the availability of statistical data and metadata for users
– Promote interoperability between different systems
– Improve the quality of transmitted data (timeliness and punctuality, accessibility and clarity, accuracy, comparability)

Slide 6: SDMX (Statistical Data and Metadata eXchange)
– An initiative to standardise the exchange of statistical data and metadata
– 7 sponsors (BIS, ECB, ESTAT, IMF, OECD, UN, WB)
– Supports both “push” and “pull” modes of exchange
– Uses XML technologies to promote interoperability
– Basic principles:
   – Data Structure Definitions (DSD) and Metadata Structure Definitions (MSD)
   – SDMX registries
   – Data on the Web using SDMX

Slide 7: SDMX (cont.)
– Exchange and sharing of statistical information:
   – statistical data
   – statistical metadata (structural metadata and reference metadata)
– Emphasis on macro-data (aggregated statistics)
– Promotes a “data sharing” model offering:
   – low cost
   – high quality of transmitted data
   – interoperability between (otherwise) incompatible systems

Slide 8: B – SDMX Core Elements

Slide 9: Example Dataset 1

Slide 10: Example Dataset 2

Slide 11: SDMX Information Model
The SDMX Information Model (SDMX-IM) is a conceptual model from which syntax-specific implementations are developed. The SDMX-IM supports the structuring not only of data but also of “reference” metadata. The model is built as a set of structures that aid the understanding, re-use and maintenance of the model:
– Data Structure Definition and Metadata Structure Definition
– Dataflows and Datasets
– Data Provisioning
– …

Slide 12: Structures in the SDMX-IM
Structure                        | Components
Concept Scheme                   | Concept
Code List                        | Code
Category Scheme                  | Category
Organisation Scheme              | Organisation; Organisation Role (DataProvider, DataConsumer, MaintenanceAgency)
Data Structure Definition (DSD)  | Dimensions, Attributes, Measures, Groups

Slide 13: Structures in the SDMX-IM (cont.)
Fundamental parts:
1. Structural metadata (DSD, concepts, code lists)
2. Observational data (an organised set of numeric observations)
3. Reference metadata
Definitions:
– Data Structure Definition (DSD): the set of structural metadata needed to understand the structure of a dataset
– Dataflow Definition: a description of a dataset that identifies, categorises and constrains its allowable content
– Dataset:
   – an organised collection of statistical data
   – the ‘container’ for an instance of the data described by a Dataflow Definition

Slide 14: Structures in the SDMX-IM (cont.)
– Code lists and codes: lists of predefined values used within a DSD
   – Code lists enumerate the set of values used to represent several structural components of SDMX.
– Concept schemes and concepts: a concept is a statistical characteristic used within a DSD
   – Additional properties can be defined for concepts:
      – a name/description in various locales
      – a default representation (coded or uncoded)
      – semantic hierarchies of concepts
– Category schemes and categories: a category scheme is a hierarchy of categories (subject-matter domains), which in SDMX may include any useful classification for organising data and metadata
   – A Dataflow may be linked to many Categories
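To make the link between concepts, code lists and representations concrete, here is a minimal Python sketch. The class names Codelist and Concept, the codes in CL_FREQ and the concept IDs FREQ and TITLE are illustrative assumptions, not part of the SDMX specification or of any Eurostat tool. A coded concept only accepts values from its code list, while an uncoded concept accepts free values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Codelist:
    """A code list: predefined values used to represent a component."""
    id: str
    codes: Dict[str, str]  # code -> description

    def is_valid(self, value: str) -> bool:
        return value in self.codes


@dataclass
class Concept:
    """A statistical concept with multilingual names and an optional coded representation."""
    id: str
    names: Dict[str, str] = field(default_factory=dict)  # locale -> name
    codelist: Optional[Codelist] = None  # None means an uncoded representation

    def accepts(self, value: str) -> bool:
        # Uncoded concepts accept any value in this sketch; coded ones need a listed code.
        return self.codelist is None or self.codelist.is_valid(value)


# Hypothetical content, loosely modelled on the slide's examples.
cl_freq = Codelist("CL_FREQ", {"A": "Annual", "Q": "Quarterly", "M": "Monthly"})
freq = Concept("FREQ", {"en": "Frequency", "it": "Frequenza"}, cl_freq)
title = Concept("TITLE", {"en": "Title"})  # uncoded

print(freq.accepts("M"), freq.accepts("X"), title.accepts("Any free text"))  # True False True
```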

Slide 15: DSD components
– Dimension (e.g. frequency, reference area):
   – a classificatory variable used to identify subsets of data or single observations
   – together, the dimensions define the key descriptor used for reporting datasets
– Attribute (e.g. title, observation status):
   – adds metadata about the observations
   – can be attached at four levels: observation, time series / cross-section, group, dataset
– Measure (e.g. turnover index, number of births, number of deaths):
   – the (uncoded, unclassified) data that can be reported, i.e. the observation value
   – primary measure (time series) or cross-sectional measures (cross-sectional data)
– Group:
   – a grouping of dimensions used to attach group-level attributes (e.g. a sibling group)
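The attachment levels can be illustrated with a small Python sketch. It assumes a simplified, hypothetical DSD (HYPOTHETICAL_STS_DSD, with concept IDs FREQ, REF_AREA, INDICATOR, TIME_PERIOD, OBS_VALUE, TITLE and OBS_STATUS) and the level names DataSet, Group, Series and Observation. Because an attribute attached at a higher level applies to every observation below it, the attributes describing one observation are the union of what is reported at all four levels.

```python
from dataclasses import dataclass
from typing import Dict, List

# Attachment levels, from the outermost (dataset) to the innermost (observation).
LEVELS = ("DataSet", "Group", "Series", "Observation")


@dataclass
class Attribute:
    concept: str
    level: str  # one of LEVELS


@dataclass
class DSD:
    id: str
    dimensions: List[str]  # key descriptor, in DSD order
    time_dimension: str
    primary_measure: str
    attributes: List[Attribute]

    def attributes_at(self, level: str) -> List[str]:
        """Return the concepts of the attributes attached at the given level."""
        if level not in LEVELS:
            raise ValueError(f"unknown attachment level: {level}")
        return [a.concept for a in self.attributes if a.level == level]


def effective_attributes(reported: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Collect every attribute value that applies to one observation.

    Each attribute is attached at exactly one level, so merging the levels
    does not need to resolve conflicts.
    """
    merged: Dict[str, str] = {}
    for level in LEVELS:
        merged.update(reported.get(level, {}))
    return merged


# Hypothetical DSD and reported values.
dsd = DSD(
    id="HYPOTHETICAL_STS_DSD",
    dimensions=["FREQ", "REF_AREA", "INDICATOR"],
    time_dimension="TIME_PERIOD",
    primary_measure="OBS_VALUE",
    attributes=[Attribute("TITLE", "Series"), Attribute("OBS_STATUS", "Observation")],
)
obs_attrs = effective_attributes({
    "Series": {"TITLE": "Retail trade turnover"},
    "Observation": {"OBS_STATUS": "A"},
})
print(dsd.attributes_at("Observation"), obs_attrs)
```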

Slide 16: Data Structure Definition examples
– Time-series dataset: STS domain, Turnover Index for Retail Trade and Repair DSD
– Cross-sectional dataset: Demography domain, Rapid Questionnaire DSD

Slide 17: STS sample dataset (figure; labelled parts: dimensions, measure, attributes)

Slide 18: STS DSD components
Dataflow: STSRTD_TURN_M

Slide 19: Demography sample dataset (figure; labelled parts: measures, dimensions, attributes)

Slide 20: Demography DSD components
Dataflow: DEMOGRAPHY_RQ

Slide 21: Data Provisioning
– A Data Provider can provide data/metadata for many Dataflows using an agreed data structure.
– A Dataflow may incorporate data coming from more than one Data Provider.
– A Provision Agreement records which data provider supplies what data to which dataflow.
– A Dataflow may be linked to one or more Categories (subject-matter domains) from different Category Schemes.
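Provision agreements can be viewed as (data provider, dataflow) pairs, which is enough to answer both questions above: who supplies a given dataflow, and which dataflows a given provider supplies. In this Python sketch the provider names NSI_A and NSI_B and their assignments are hypothetical; only the two dataflow IDs come from this training material.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Each provision agreement links one data provider to one dataflow.
# Provider names and assignments are hypothetical.
agreements: List[Tuple[str, str]] = [
    ("NSI_A", "STSRTD_TURN_M"),
    ("NSI_B", "STSRTD_TURN_M"),
    ("NSI_A", "DEMOGRAPHY_RQ"),
]

providers_of: Dict[str, Set[str]] = defaultdict(set)  # dataflow -> providers
flows_of: Dict[str, Set[str]] = defaultdict(set)      # provider -> dataflows
for provider, flow in agreements:
    providers_of[flow].add(provider)
    flows_of[provider].add(flow)

print(sorted(providers_of["STSRTD_TURN_M"]))  # ['NSI_A', 'NSI_B']
print(sorted(flows_of["NSI_A"]))  # ['DEMOGRAPHY_RQ', 'STSRTD_TURN_M']
```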

Slide 22: Identification, Versioning & Maintenance
– Identification: every structural element must have a semantic identifier (e.g. CL_UNIT)
– Versioning: a given element may have several versions (updates of the element)
– Maintenance: some structures must be maintained by an organisation (agency)
   – Unique identification is id + version + agency; for example, these are two different code lists:
      id: CL_UNIT, version: 1.0, agency: ESTAT
      id: CL_UNIT, version: 1.0, agency: ECB
– Internationalisation: multiple languages can be used to describe any element
– The SDMX-IM covers aggregate data and metadata in all domains (it is not domain-specific)
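Because a maintained structure is identified by agency, id and version together, a registry-style lookup can use all three as a composite key. In this Python sketch the ArtefactRef name and the descriptions are hypothetical; it shows that the two CL_UNIT code lists above are distinct entries, and that a version that was never stored is simply not found.

```python
from typing import NamedTuple


class ArtefactRef(NamedTuple):
    """Unique identification of a maintained structure: agency + id + version."""
    agency: str
    id: str
    version: str


# Hypothetical store of structures keyed by their full identification.
store = {
    ArtefactRef("ESTAT", "CL_UNIT", "1.0"): "unit code list maintained by ESTAT",
    ArtefactRef("ECB", "CL_UNIT", "1.0"): "unit code list maintained by the ECB",
}

# Same id and version but a different maintenance agency: two distinct artefacts.
print(len(store))  # 2
print(ArtefactRef("ESTAT", "CL_UNIT", "2.0") in store)  # False
```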

Slide 23: SDMX high-level view (diagram; relationships as labelled)
– Category Scheme: comprises subject or reporting categories; a Category can have child categories
– Data or Metadata Flow: uses a specific Data or Metadata Structure Definition; can be linked to categories in multiple category schemes; can get data from multiple data providers
– Data Provider: can provide data or metadata for many data or metadata flows using the agreed data or metadata structure
– Provision Agreement: links a Data Provider to a Data or Metadata Flow
– Data or Metadata Set: conforms to the business rules of the data/metadata flow
– Registered Data or Metadata Set: is registered for a Provision Agreement

Slide 24: Tools Demonstration

Slide 25: SDMX Registry
– A repository for keeping:
   – structural metadata (e.g. code lists, concept schemes, DSDs)
   – provisioning information (e.g. Dataflows, Provision Agreements)
– The repository is accessible via a Web Service that accepts SDMX-ML messages
– A graphical user interface (GUI) allows user interaction over the Web

Slide 26: Data Structure Wizard
DSW: a “standalone” application (replacing the AccessDB tool)
Main functionalities:
– Manage data structures (create, modify, delete, query)
– Import/export SDMX-ML structures (validate structure messages)
– Import/export GESMES/TS structure files
– Create data messages
– Query the SDMX Registry
– Submit data structures to the SDMX Registry

Slide 27: Example – DSD creation using the DSW

Slide 28: Example
Dimensions:
– Frequency (CL_FREQ)
– Reference Area (CL_AREA_EE)
– Time period
– Product (CL_PRODUCT)
Attributes:
– Compilation
– Confidentiality
– Status
– Availability
Group
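A DSD like this one determines how each series is identified. The sketch below assumes the common convention of writing a series key as the values of the non-time dimensions joined by dots, in DSD order. The concept IDs FREQ, REF_AREA and PRODUCT and the codes in each list are hypothetical; only the code list IDs come from the slide.

```python
from typing import Dict, List, Set

# Hypothetical concept IDs for the non-time dimensions, in DSD order.
KEY_DIMENSIONS: List[str] = ["FREQ", "REF_AREA", "PRODUCT"]

# Hypothetical codes standing in for CL_FREQ, CL_AREA_EE and CL_PRODUCT.
CODELISTS: Dict[str, Set[str]] = {
    "FREQ": {"A", "M"},
    "REF_AREA": {"EE"},
    "PRODUCT": {"P1", "P2"},
}


def series_key(values: Dict[str, str]) -> str:
    """Build a dot-separated series key, checking each value against its code list."""
    parts = []
    for dim in KEY_DIMENSIONS:
        value = values[dim]  # raises KeyError if a dimension is missing
        if value not in CODELISTS[dim]:
            raise ValueError(f"{value!r} is not a valid code for {dim}")
        parts.append(value)
    return ".".join(parts)


print(series_key({"FREQ": "M", "REF_AREA": "EE", "PRODUCT": "P1"}))  # M.EE.P1
```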

Slide 29: C – SDMX-ML Data Sets

Slide 30: Syntaxes for SDMX data
Both syntaxes are based on a common Information Model:
– SDMX-EDI (GESMES/TS)
   – EDIFACT syntax
   – time-series oriented; one format for datasets
– SDMX-ML
   – XML syntax
   – four different formats for datasets
   – easier validation (XML-based)
Tools make it possible to work in whichever format is needed.

Slide 31: SDMX-ML Data Messages
Equivalent representations for reporting datasets:
– Generic message: a single schema that is not domain-specific
– Compact message: format for large-volume data exchange; the schema is specific to a DSD
– Utility message: format for advanced validation; the schema is specific to a DSD
– Cross-Sectional message: format for non-time-series data; the schema is specific to a DSD
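The difference between the generic and compact messages can be sketched with Python's xml.etree.ElementTree. This is a simplified illustration, not schema-valid SDMX-ML: namespaces, the message header, the DataSet wrapper and the DSD reference are omitted, the element names are modelled on the SDMX-ML 2.0 generic and compact layouts (check them against the official schemas), and the concept IDs and data are hypothetical. The generic form spells out every concept as a concept/value pair, which is why one schema can serve every DSD; the compact form turns concepts into XML attributes named after the concept IDs, which is shorter but needs a DSD-specific schema.

```python
import xml.etree.ElementTree as ET

# One hypothetical series: key values, a series-level attribute and two observations.
key = {"FREQ": "M", "REF_AREA": "EE", "PRODUCT": "P1"}
series_attrs = {"TITLE": "Example series"}
observations = [("2007-01", "101.2"), ("2007-02", "102.5")]


def generic_series() -> ET.Element:
    """Generic style: every concept is an explicit concept/value pair."""
    series = ET.Element("Series")
    series_key = ET.SubElement(series, "SeriesKey")
    for concept, value in key.items():
        ET.SubElement(series_key, "Value", concept=concept, value=value)
    attributes = ET.SubElement(series, "Attributes")
    for concept, value in series_attrs.items():
        ET.SubElement(attributes, "Value", concept=concept, value=value)
    for period, value in observations:
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "Time").text = period
        ET.SubElement(obs, "ObsValue", value=value)
    return series


def compact_series() -> ET.Element:
    """Compact style: concepts become XML attributes named after the concept IDs."""
    series = ET.Element("Series", {**key, **series_attrs})
    for period, value in observations:
        ET.SubElement(series, "Obs", TIME_PERIOD=period, OBS_VALUE=value)
    return series


print(ET.tostring(generic_series(), encoding="unicode"))
print(ET.tostring(compact_series(), encoding="unicode"))
```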

Slide 32: The SDMX-ML time-series format
– Used for representing time-series data
– Contains related metadata as defined in the DSD
– Three different (equivalent) representations are available:
   – Generic message
   – Compact message
   – Utility message

Slide 33: Generic Dataset

Slide 34: Compact Dataset

Slide 35: Utility Dataset

Slide 36: The SDMX-ML cross-sectional data format
– Used for representing non-time-series data
– Contains related metadata as defined in the DSD
– Two different representations are available:
   – Generic message
   – Cross-Sectional message

Slide 37: Cross-Sectional Dataset

Slide 38: Conversions
– The formats are equivalent:
   – any SDMX-ML format can be converted to another
   – they are based on the same Information Model
   – exception: a Cross-Sectional DSD that does NOT contain a time dimension cannot be converted to the time-series formats
– Conversions:
   – between the SDMX-ML formats
   – can be extended to other formats (e.g. CSV, GESMES)
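Conversion to a flat format such as CSV works because every observation can be expanded into its full series key plus its own values. Here is a minimal Python sketch, assuming a simplified, namespace-free compact-style fragment with hypothetical concept IDs and data (real compact SDMX-ML messages use DSD-specific namespaces and a message header).

```python
import csv
import io
import xml.etree.ElementTree as ET

# Simplified, namespace-free compact-style fragment (hypothetical data).
COMPACT = """<DataSet>
  <Series FREQ="M" REF_AREA="EE" PRODUCT="P1">
    <Obs TIME_PERIOD="2007-01" OBS_VALUE="101.2"/>
    <Obs TIME_PERIOD="2007-02" OBS_VALUE="102.5"/>
  </Series>
</DataSet>"""


def compact_to_csv(xml_text: str) -> str:
    """Flatten each observation into one CSV row: series key values + observation fields."""
    rows = []
    for series in ET.fromstring(xml_text).iter("Series"):
        for obs in series.iter("Obs"):
            rows.append({**series.attrib, **obs.attrib})
    if not rows:
        return ""
    # Union of column names in first-seen order, in case some rows carry extra attributes.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty strings
    return out.getvalue()


print(compact_to_csv(COMPACT))
```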

Slide 39: D – Producing SDMX-ML Data Sets

Slide 40: Reporting and Dissemination Guidelines
– Define and classify all the underlying concepts of a dataset
– Provide the specification of the DSD:
   – name and identifier
   – list of statistical concepts
   – list of metadata concepts
   – list of code lists
– Provide the related Dataflows (e.g. STSRTD_TURN_M, DEMOGRAPHY_RQ)
– List the mandatory attributes (e.g. reference area, frequency) and the conditional ones
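Listing mandatory and conditional attributes in the guidelines makes a simple completeness check possible before transmission. In this Python sketch the attribute concept IDs and their assignment statuses are hypothetical; it reports the mandatory attributes that are missing or empty and ignores conditional ones.

```python
from typing import Dict, List

# Hypothetical assignment status per attribute concept for one DSD.
ASSIGNMENT_STATUS: Dict[str, str] = {
    "OBS_STATUS": "Mandatory",
    "UNIT_MULT": "Mandatory",
    "COMMENT": "Conditional",
}


def missing_mandatory(reported: Dict[str, str]) -> List[str]:
    """Return mandatory attributes that are absent or empty in a reported record."""
    return [
        concept
        for concept, status in ASSIGNMENT_STATUS.items()
        if status == "Mandatory" and not reported.get(concept)
    ]


print(missing_mandatory({"OBS_STATUS": "A", "COMMENT": "provisional"}))  # ['UNIT_MULT']
```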

Slide 41: Message Implementation Guidelines (MIG)
A MIG comprises:
– DSD details (id, version, agencyID)
– Dimensions (concepts, representations, dimension types such as frequency, entity or count, attachment level)
– Measure (primary or cross-sectional)
– Attributes (concept, representation, assignment status [mandatory or conditional], attachment level, attribute type, attachment measure)
– Groups (subset of dimensions)

Slide 42: Structure of a MIG document
1. DSD table
2. Dataflows table
3. Referenced concept schemes
4. Referenced code lists
5. Detailed explanation of the Generic SDMX-ML sample dataset
6. Detailed explanation of the Compact (or Cross-Sectional) SDMX-ML sample dataset

Slide 43: Example – Data Set creation using the DSW

Slide 44: SDMX Converter
Main functionality:
– Reading the input message: parsing the message and populating the tool's data model (based on the SDMX v2.0 Information Model)
– Writing the converted message: using the data model to write the output message in the required target format
Information retrieved from the Registry:
– The dataflow ID is used to retrieve the dataflow definition from the Registry.
– The DSD reference is taken from the dataflow definition and used to retrieve the DSD.

Slide 45: SDMX Converter (cont.)
Why use the tool:
– You may already have data in a format other than SDMX-ML (e.g. CSV, GESMES/TS): CSV → Compact SDMX-ML
– You may want further validation of your data: Compact SDMX-ML → Utility SDMX-ML
Conversions:
– From CSV to any type
– From SDMX-ML to any type
– From SDMX-EDI to any type
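The reader / data model / writer design described for the converter can be sketched in Python for the CSV → Compact SDMX-ML case. This is not the SDMX Converter's code: the reader groups CSV rows into series using a hypothetical key (FREQ, REF_AREA, PRODUCT), and the writer emits a simplified, namespace-free compact-style DataSet. Supporting another output format would only need another writer over the same model.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical DSD key order; the remaining CSV columns become observation fields.
KEY_DIMENSIONS = ["FREQ", "REF_AREA", "PRODUCT"]

CSV_INPUT = """FREQ,REF_AREA,PRODUCT,TIME_PERIOD,OBS_VALUE
M,EE,P1,2007-01,101.2
M,EE,P1,2007-02,102.5
M,EE,P2,2007-01,98.7
"""

Model = Dict[Tuple[str, ...], List[Dict[str, str]]]


def read_csv(text: str) -> Model:
    """Reader: populate a neutral model mapping series key -> observations."""
    model: Model = {}  # dicts keep insertion order, so series stay in input order
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[dim] for dim in KEY_DIMENSIONS)
        obs = {name: value for name, value in row.items() if name not in KEY_DIMENSIONS}
        model.setdefault(key, []).append(obs)
    return model


def write_compact(model: Model) -> ET.Element:
    """Writer: emit the model as a simplified compact-style DataSet element."""
    dataset = ET.Element("DataSet")
    for key, observations in model.items():
        series = ET.SubElement(dataset, "Series", dict(zip(KEY_DIMENSIONS, key)))
        for obs in observations:
            ET.SubElement(series, "Obs", obs)
    return dataset


print(ET.tostring(write_compact(read_csv(CSV_INPUT)), encoding="unicode"))
```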

Slide 46: Conversion Example
