ESTP Training Course 8 & 9 April 2014 Fabien JACQUET Eurostat B5


1 ESTP Training Course 8 & 9 April 2014 Fabien JACQUET Eurostat B5
SDMX and METADATA ESTP Training Course 8 & 9 April 2014 Fabien JACQUET Eurostat B5

2 Types of metadata Structural metadata
Structural metadata act as identifiers and descriptors of the data, such as dimensions of statistical cubes, variables, titles of tables and nomenclatures (code lists). They must always be associated with the data to allow their identification, retrieval and browsing.

3 Example for structural metadata

4 Types of metadata Reference metadata
Reference metadata act only as descriptors of the data; they do not help to actually identify the data. They can be of different kinds: conceptual metadata, methodological metadata and quality metadata (process and output). They can be exchanged independently from the data they are related to, but are nevertheless often linked to them.

5 Example for reference metadata

6 Metadata and the ESS vision
The ESS vision is based on the Commission Communication 404/2009 “on the production methods of EU statistics: a vision for the next decade”. Some main ideas of this vision are: From statistical ‘stove pipes’ to more integrated statistical production processes; Better integration of the ESS in terms of IT infrastructure, IT tools, data quality, metadata, methodology etc. (both in terms of horizontal and vertical integration); Broader use of administrative data sources in the statistical data production processes; Statistical legislation should also be cross-cutting in covering larger statistical domains (first cross-cutting legislation drafted).

7 Standardisation of structural metadata
Code lists describe dimensions in data tables, giving a meaning to the data. Code lists are based on: official statistical classifications such as NACE, NUTS, ISCO…; the SDMX Content-Oriented Guidelines; domain-specific codifications. A standard code list is a code list that has already been harmonised. Standard code lists should be used throughout the statistical business process: data design, collection, aggregation, dissemination, archiving…

8 Example of a harmonised code list (NACE Rev. 1.1)
Old version (before harmonisation) compared with new version (after harmonisation). Each old entry reads: Domains | Old code | Old label_en (where the slide shows one).
New code D, new label_en "Manufacturing":
hrst, htec | MA_TOTAL | Manufacturing sector
fats | MAN | Manufacturing industries
theme3 | RD | Manufacturing industry
theme4 | B0200
theme8 | SE0_4
theme9 | TOT_MANUF
New code D_LTC, new label_en "Low-technology manufacturing":
ds, hrst, htec | MA_LOW_TEC | Low technology manufacturing sector
fats / inn | LOT | Low Technology (incl. following NACE codes: 15-22; 36, 37)
inn | I_LOW_TEC | Low tech industries: NACE Rev.1 codes 15 to 22, 36 and 37
New code G-Q, new label_en "Services":
SE_TOTAL | Services: NACE Rev. 1.1 sections G to Q = 50 to 99
SER | Services sector
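The harmonisation shown in the example table amounts to a transcoding step from domain-specific codes to one shared code. The following is a minimal sketch of that idea, assuming the old-to-new pairings read off the slide; the function name `transcode` and the choice of a dictionary keyed by (domain, old code) are illustrative and not part of any Eurostat tool.

```python
# Old (domain, code) pairs mapped to harmonised codes.
# The pairings are an assumption based on the example table on the slide.
HARMONISED = {
    ("hrst", "MA_TOTAL"): "D",
    ("htec", "MA_TOTAL"): "D",
    ("fats", "MAN"): "D",
    ("hrst", "MA_LOW_TEC"): "D_LTC",
    ("inn", "I_LOW_TEC"): "D_LTC",
}


def transcode(domain, old_code):
    """Return the harmonised code for an old domain-specific code."""
    try:
        return HARMONISED[(domain, old_code)]
    except KeyError:
        raise KeyError(
            f"no harmonised code for {old_code!r} in domain {domain!r}"
        ) from None


# Two domains that used different codes now share one code.
same_concept = transcode("hrst", "MA_TOTAL") == transcode("fats", "MAN")
```

Once every domain uses the harmonised code directly, a lookup table like this is no longer needed, which is the efficiency gain the next slide describes.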

9 Impact on the statistical business processes
Better comparability: same codes for the same concepts. Increased efficiency: less transcoding; fewer code lists; clean lists. Improved accuracy: data management and exchange are facilitated and the number of errors is reduced. Re-usability and integration of the data: data warehouses are only possible if codes corresponding to the same concept are the same. SDMX implementation: standard code lists are essential for the implementation of an SDMX data/metadata exchange process. The ESS standard code lists will also be made available in the Euro SDMX Registry (they are currently available in Ramon).

10 Ramon

11 Standard Code Lists in RAMON

12 Standardisation of Reference Metadata
ESMS: Euro SDMX Metadata Structure. ESQRS: ESS Standard for Quality Reports Structure. EPMS: Eurostat Process Metadata Structure.

13 Standardisation of reference metadata
The Euro SDMX Metadata Structure (ESMS)

14 Standardisation of reference metadata
The ESS Standard for Quality Reports Structure (ESQRS)

15 Dissemination of reference metadata

16 Dissemination of national reference metadata

17 The ESS Metadata Handler
The business process. Diagram labels: input from national metadata; metadata from the Eurostat domain manager; Eurostat as main administrator; ESS Metadata Handler (ESS-MH) IT application; Euro SDMX Registry; RAMON; CODED; common user interface; output produced for the Eurostat web; other output for Eurostat or external users.

18 Impact on the statistical business processes
ESS reference metadata standards are integrated into the ESS-MH, which supports the exchange and dissemination of reference metadata in the ESS. This allows: more automatic production of the reference metadata in the ESS; information collected only once and reused (ESQRS -> ESMS; NSI -> Eurostat; Eurostat -> IMF/OECD); more harmonised and more readily available metadata on quality; full SDMX compliance (metadata creation, exchange and dissemination); cost and resource savings in the ESS.

19 Reference Metadata in the SDMX Information model

20 Modeling the data exchange
Who? When? How? Where? What? When statistical data are exchanged, other useful information is exchanged as well: reference metadata.

21 Reference Metadata - PURPOSE
In the SDMX model, objects can have explanatory texts explaining the main features of data. This is typically described as “reference” metadata. Reference metadata can be stored and exchanged without being embedded in the data message. In other words, those metadata are normally linked to the object by a simple “reference” to the object. Another important point is that very often these metadata are associated not with specific observations or series of data, but with entire collections of data or even with the institutions providing the data.

22 Reference Metadata - REPORT
From a content point of view, reference metadata can provide information on the statistical concepts used, on the methods used for the production of the data or on data quality. Reference metadata may be organised in Metadata Reports, which provide a structured (sometimes hierarchical) presentation of specific metadata items, such as “Contact”, “Metadata update” or “Classification System”.

23 Reference Metadata - STRUCTURE
A reference metadata report has a metadata structure definition which describes how it is organised. This is similar to a data structure definition describing how a data set is organised.

24 SDMX Information Model - DSD vs MSD
Diagram labels, DSD side: Dataset, Group of series, Series, Observation, Data Attributes. Diagram labels, MSD side: Target Identifier, Target Object, Metadata attributes. The SDMX information model intentionally models the metadata structure definition in a way that is similar to the data structure definition. Where the dimension list (and to some extent groups) in the DSD contains dimensions which define how a data set describes what is being measured, the full and partial target identifiers in the MSD contain identifier components which define how a metadata set identifies what object is being described by the reference metadata. In the DSD, the attribute list and measure list contain attributes, measures, and attachment information which describe what information is in the data set and how it is presented. Similarly, the report structures in the MSD contain metadata attributes which describe what concepts are included in the reference metadata set.

25 Reference Metadata - STRUCTURE
A Metadata Structure Definition identifies what metadata concepts are being reported, how these concepts relate to each other (typically as hierarchies), what their presentational structure is, i.e. how they may be represented (as free text, as coded values, etc.), and to which data object they are attached.

26 Example of Reference Metadata
This is an example taken from the Eurostat web site, in the tourism statistics tables. The figure is an extract of the Eurostat publication tree, in which the data sets are organised in categories. Next to some nodes in the tree, an icon with ‘M’ indicates the existence of metadata associated with a data set or with a collection of datasets for particular categories. Here there is one metadata report for a collection of datasets.

27 Example of Reference Metadata
Clicking in the tree on an icon next to a category level opens a new window displaying the reference metadata pertaining to the category. In this example, the information applies to all the datasets in the Tourism category. The figure shows that reference metadata contains descriptive information, organised in a structured way, such as contact details, last updates of the reference metadata, and information on statistical presentation. The reference metadata window can present the information in a clean and organised manner because the metadata structure definition defines the structure of the report and provides useful names and descriptions for the information being reported. In short, this example demonstrates the utility of metadata structure definitions in support of quality frameworks.

28 Practical Example
Cross-domain concepts (CDC) in the SDMX framework describe metadata concepts relevant to several statistical domains. SDMX recommends using these concepts whenever feasible to promote re-usability and exchange of statistical information and their related metadata between national and international organisations. Whenever used, these concepts should conform to the specified names, roles, and representations defined in the SDMX Content-Oriented Guidelines. The set of cross-domain concepts is a dynamic asset: initial growth, then relative stabilisation expected; continuous maintenance.

29 Example for reference metadata
This source contains metadata about the Tourism datasets.

30 Let's select a specific topic from which we'll create a metadata report

31 Content of the selected topic

32 Content of the selected topic

33 metadata set & Metadata Structure Definition
A reference metadata set has a set of structural metadata which describes how it is organised. This structural metadata identifies what reference metadata concepts are being reported, how these concepts relate to each other (typically as hierarchies), how they may be represented (as free text, as coded values, etc.), what their usage role is (mandatory or conditional), and with which formal SDMX object types they are associated. An MSD comprises two fundamental parts: the Object Type(s) to which metadata can be attached; and the Concepts for which metadata have to be reported, grouped under one (or more) Report Structure(s). A Metadata Structure Definition (MSD) defines: the valid content of a Metadata Set in terms of the Concepts comprising the structure of the Metadata Set; how these concepts are related in terms of their role in the Metadata Set; and the valid content of each of the Concepts when used in a Metadata Set. An MSD is similar in structure and intent to a Data Structure Definition (DSD): whereas the DSD defines the structure of a DataSet, the MSD defines the structure of a MetadataSet.
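The two fundamental parts of an MSD can be pictured as a small data model. The sketch below is only an illustration of that description, not the SDMX Information Model itself: the class names, field names and the identifiers `EXAMPLE_MSD` and `EXAMPLE_REPORT` are hypothetical, chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataAttribute:
    concept_id: str
    representation: str = "text"   # e.g. free text or coded values
    mandatory: bool = False        # usage role: mandatory or conditional
    children: List["MetadataAttribute"] = field(default_factory=list)


@dataclass
class ReportStructure:
    id: str
    attributes: List[MetadataAttribute]


@dataclass
class MetadataStructureDefinition:
    id: str
    object_types: List[str]                   # part 1: where metadata attach
    report_structures: List[ReportStructure]  # part 2: concepts to report


def concept_ids(attributes):
    """Yield concept ids depth-first, parents before their children."""
    for attribute in attributes:
        yield attribute.concept_id
        yield from concept_ids(attribute.children)


msd = MetadataStructureDefinition(
    id="EXAMPLE_MSD",  # hypothetical identifier
    object_types=["Category", "DataProvider"],
    report_structures=[
        ReportStructure(
            id="EXAMPLE_REPORT",  # hypothetical identifier
            attributes=[
                MetadataAttribute(
                    "CONTACT",
                    children=[MetadataAttribute("CONTACT_ORG")],
                )
            ],
        )
    ],
)
```

Nesting attributes inside attributes is one simple way to express the "typically as hierarchies" relationship between concepts.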

34 Defining a Metadata Structure Definition
The tasks: (1) Analyse the entire set of metadata in order to identify and document the “Concepts” for which metadata are to be reported or disseminated. (2) Determine the structure of the “Metadata Report” in terms of the concepts used, the hierarchy of the concepts when used in the report, and their “representation” (e.g. is a code list used, is the format free text?). (3) Specify the “object type” to which the metadata are to be attached, and how this object type is identified: knowledge of the SDMX Information Model is useful here (as the metadata can only be attached to object types that can be identified in terms of the object types that exist in the information model).

35 Defining a Metadata Structure Definition
Which metadata are to be reported or disseminated? Contact information and content metadata.

36 Metadata Report Structure – Contact Information
Usually, the information about the organisation and the unit that publishes the metadata needs to be reported. This information is available on the Eurostat web site. The following screenshots show some of the metadata reported by Eurostat on the Eurostat statistical web site. We identify the different pieces of information using concepts: CONTACT, CONTACT_ORG, CONTACT_ORG_UNIT, CONTACT_MAIL_ADDRESS. From these, the report structure and its underlying concepts can be derived.

37 Metadata Report Structure – Contact Information
Two levels of hierarchy in the report. Columns: Role | Concept ID | Concept name.
Attribute | CONTACT | Contact
Sub-Attribute | CONTACT_ORG | Contact organisation
Sub-Attribute | CONTACT_ORG_UNIT | Contact organisation unit
Sub-Attribute | CONTACT_MAIL_ADDRESS | Contact mail address
The Format column of the slide shows “Text”. The actual definition of the concept will be stored in the Concept Scheme. The usage of the concept, its place in the hierarchy, representation, and attachment are defined in the “Metadata Attribute” part of the MSD (called Attribute in the table).

38 Metadata Report Structure – Content Metadata
The following attributes can be derived from these examples: BASIC_METH_ISSUES, POS_ACC_TOUR, STAT_UNIT.

39 Metadata Report Structure – Content Metadata
The following attributes can be derived from these examples: SCOPE_OBS, TOUR_ACC_ESTAB, NACE_55_1.

40 Metadata Report Structure – Content Metadata
Columns: Role | Concept ID | Concept name.
Attribute | BASIC_METH_ISSUES | Basic methodological issues
Sub-Attribute | POS_ACC_TOUR | Position of accommodation statistics
Sub-Attribute | STAT_UNIT | Statistical unit
Sub-Attribute | SCOPE_OBS | Scope of observation
Sub-Attribute | TOUR_ACC_ESTAB | Tourist accommodation establishment
Sub-Attribute | NACE_55_1 | NACE 55.1 – Hotel and similar accommodation
The Format column of the slide shows “Text”.

41 Metadata Report Structure – Concept Scheme
The following concepts are derived from the previous tables: CONTACT, CONTACT_ORG, CONTACT_ORG_UNIT, CONTACT_MAIL_ADDRESS, BASIC_METH_ISSUES, POS_ACC_TOUR, STAT_UNIT, SCOPE_OBS, TOUR_ACC_ESTAB, NACE_55_1. The concepts in the concept scheme can be defined in a hierarchy where there is a semantic link between the parent and child concepts, the child concept(s) having a more fine-grained semantic meaning of (a part of) the parent.
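The parent and child links in a concept scheme can be held as a simple child-to-parent map. The sketch below assumes, following the two report tables, that every sub-attribute hangs directly under CONTACT or BASIC_METH_ISSUES; the helper names `children_of` and `path_to` are illustrative.

```python
# Child concept -> parent concept, as suggested by the two report tables.
PARENT = {
    "CONTACT_ORG": "CONTACT",
    "CONTACT_ORG_UNIT": "CONTACT",
    "CONTACT_MAIL_ADDRESS": "CONTACT",
    "POS_ACC_TOUR": "BASIC_METH_ISSUES",
    "STAT_UNIT": "BASIC_METH_ISSUES",
    "SCOPE_OBS": "BASIC_METH_ISSUES",
    "TOUR_ACC_ESTAB": "BASIC_METH_ISSUES",
    "NACE_55_1": "BASIC_METH_ISSUES",
}


def children_of(parent_id):
    """List the concepts whose parent is parent_id, in declaration order."""
    return [child for child, parent in PARENT.items() if parent == parent_id]


def path_to(concept_id):
    """Return the chain of concepts from the top-level parent down to concept_id."""
    path = [concept_id]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))
```

For example, `path_to("NACE_55_1")` walks up from the child to its top-level concept, which is how a report viewer could build an indented outline.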

42 Metadata Report Structure – The Attachment Object Type
The Metadata Set which is reported (i.e. the actual metadata content) is intended to be metadata about “something”. The “something” is the object type and in an MSD it is necessary to declare the object type and to define how it is identified in terms of its constituent components. For instance, a Code would be identified by a combination of the Code List identifier and the Code identifier.
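Identifying a Code by the combination of its Code List identifier and its Code identifier can be sketched as a composite key. The type `CodeRef`, the `attach` helper, the code list id `OTHER_LIST` and the note texts are all hypothetical and serve only to show why both parts of the identifier are needed.

```python
from typing import NamedTuple


class CodeRef(NamedTuple):
    """Identifies a code by its code list id plus its own id."""
    codelist_id: str
    code_id: str


# Reference metadata kept separately from the data, keyed by target object.
notes = {}


def attach(target, text):
    """Attach a piece of descriptive text to an identified object."""
    notes.setdefault(target, []).append(text)


attach(CodeRef("TOURISM_CATEGORY", "tour_occ"), "hypothetical note A")
attach(CodeRef("OTHER_LIST", "tour_occ"), "hypothetical note B")
```

The same code id in two different code lists yields two distinct targets, so the notes never collide.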

43 Metadata Report Structure – The Attachment Object Type
The attachment object type must be definable using the identifiable object types in the SDMX Information Model. The XML schema demands this and lists the following object types: Agency, Category, Hierarchy, ConceptScheme, OrganisationScheme, StructureSet, Concept, DataProvider, StructureMap, Codelist, MetadataStructure, ComponentMap, Code, FullTargetIdentifier, CodelistMap, KeyFamily, PartialTargetIdentifier, CodeMap, Component, MetadataAttribute, CategorySchemeMap, KeyDescriptor, DataFlow, CategoryMap, MeasureDescriptor, ProvisionAgreement, OrganisationSchemeMap, AttributeDescriptor, MetadataFlow, OrganisationRoleMap, GroupKeyDescriptor, ContentConstraint, ConceptSchemeMap, Dimension, AttachmentConstraint, ConceptMap, Measure, DataSet, Process, Attribute, XSDataSet, ProcessStep, CategoryScheme, MetadataSet, ReportingTaxonomy, HierarchicalCodelist.

44 Metadata Report Structure – The Attachment Object Type
The object type is the Data Category (called “Category” in the SDMX Information Model). If the intent of the MSD is to define where the metadata are to be attached in the Eurostat dissemination environment then this is all that is required.

45 Metadata Report Structure – The Attachment Object Type
If Eurostat wishes to publish these metadata and make them available to other organisations (e.g. as a downloadable file), then it is also necessary to identify the Data Provider (which in this case is Eurostat).

46 Metadata Report Structure – The Attachment Object Type
The object types Category and Data Provider could each be associated with a code list. There would certainly be a list for all of the data categories (this would be a “Category Scheme”), but the Data Provider could be declared as non-enumerated (i.e. text). Code list TOURISM_CATEGORY: tour_cap = Capacity of tourist accommodation establishments; tour_occ = Occupancy of tourist accommodation establishments; tour_emp = Employment in the accommodation sector. One metadata report per category.
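The difference between an enumerated and a non-enumerated representation can be shown with a small check. This is a sketch under the assumptions of the slide: the category must come from the TOURISM_CATEGORY list, while the data provider is free text; the function name `check_target` is hypothetical.

```python
# Enumerated representation: the category scheme from the slide.
TOURISM_CATEGORY = {
    "tour_cap": "Capacity of tourist accommodation establishments",
    "tour_occ": "Occupancy of tourist accommodation establishments",
    "tour_emp": "Employment in the accommodation sector",
}


def check_target(category, data_provider):
    """Return a list of problems; an empty list means the target is valid."""
    errors = []
    if category not in TOURISM_CATEGORY:
        errors.append(f"unknown category: {category!r}")
    # Non-enumerated representation: any non-empty text is accepted.
    if not isinstance(data_provider, str) or not data_provider.strip():
        errors.append("data provider must be non-empty text")
    return errors
```

With one metadata report per category, a check like this would run once for each report before it is accepted.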

47 Metadata Report Structure – Bringing it Together
Report Structure - General Structure defined within a Metadata Structure Definition. The Metadata Report is given an id and a name, and the Metadata Attributes that comprise the report are defined. Each attribute must reference a Concept and the Concept Scheme within which it is maintained. Thus the Metadata Attributes can be based on concepts from different concept schemes. The format or permitted value list (e.g. a code list), and whether its presence in a Metadata Set is mandatory, are declared for the Metadata Attribute.
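Because each Metadata Attribute references a Concept together with its Concept Scheme, the references can be checked for consistency. The sketch below uses the scheme id ESTAT_METADATA_CS from the later schematics; the tuple layout, the mandatory flags and the function `unresolved` are assumptions made for illustration.

```python
# Concept schemes and the concept ids each one maintains.
CONCEPT_SCHEMES = {
    "ESTAT_METADATA_CS": {
        "CONTACT",
        "CONTACT_ORG",
        "CONTACT_ORG_UNIT",
        "CONTACT_MAIL_ADDRESS",
    },
}

# (concept scheme, concept, format, mandatory); the flags are illustrative.
ATTRIBUTES = [
    ("ESTAT_METADATA_CS", "CONTACT", None, True),
    ("ESTAT_METADATA_CS", "CONTACT_ORG", "text", True),
    ("ESTAT_METADATA_CS", "CONTACT_ORG_UNIT", "text", False),
    ("ESTAT_METADATA_CS", "CONTACT_MAIL_ADDRESS", "text", False),
]


def unresolved(attributes, schemes):
    """Return (scheme, concept) pairs that do not exist in the given schemes."""
    return [
        (scheme, concept)
        for scheme, concept, _format, _mandatory in attributes
        if concept not in schemes.get(scheme, set())
    ]
```

Keying the reference on the scheme as well as the concept is what lets one report mix concepts from several concept schemes.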

48 Metadata Report Structure – Bringing it Together
Report Structure - Contact Report. For the Contact report the following schematic shows the definition. Schematic labels: MSD ESTAT_MSD; report structure CATEGORY_REPORT; concept scheme ESTAT_METADATA_CS; metadata attributes CONTACT, CONTACT_ORG, CONTACT_ORG_UNIT, CONTACT_MAIL_ADDRESS.

49 Metadata Report Structure – Bringing it Together
Report Structure - Quality Report. Schematic labels: MSD ESTAT_MSD; report structure CATEGORY_REPORT; concept scheme ESTAT_METADATA_CS; metadata attributes BASIC_METH_ISSUES, POS_ACC_TOUR, STAT_UNIT, SCOPE_OBS, TOUR_ACC_ESTAB, NACE_55_1.

50 Metadata Report Structure – Bringing it Together
Defining the Attachment Object Type: schematic. The intent here is to define the object types and identifiers of the object types. These object types (Target Object Type in the diagram) must be in the list in the schema (this list is shown earlier in this presentation). There can be many such objects identified in a single Metadata Structure Definition. The Full Target Identifier defines all of the possible object types that are within the scope of the MSD, and links each with a representation scheme (which can be any of the “Item Schemes” in the Information Model, e.g. Code List, Concept Scheme, Category Scheme, Organisation Scheme). The Partial Target Identifier references a subset of the Identifier Components of the Full Target Identifier. As will be seen below, a Report Structure is linked to either the Full Target Identifier or one of the Partial Target Identifiers.

51 Metadata Report Structure – Bringing it Together
Defining the Attachment Object Type. Attachment object types: Data Category and Data Provider.

52 Metadata Report Structure – Bringing it Together
Defining the Attachment Object Type. In this MSD (ESTAT_MSD), the Full Target Identifier (called CATEGORY) comprises the Category and the DataProvider object types. The Partial Target Identifier (AGENCY) references just the Identifier Component linked to the DataProvider (called AGENCY). Schematic labels: identifier component CATEGORY, object type Category, represented by ESTAT_CATEGORY_SCHEME; identifier component AGENCY, object type Data Provider.
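The rule that a Partial Target Identifier references a subset of the Full Target Identifier's components can be expressed as a set comparison. The dictionary layout and the function `is_valid_partial` below are hypothetical; only the ids CATEGORY and AGENCY and the object types come from the example.

```python
# Full Target Identifier: identifier component id -> object type.
FULL_TARGET = {
    "id": "CATEGORY",
    "components": {"CATEGORY": "Category", "AGENCY": "DataProvider"},
}

# Partial Target Identifier: references components of the full one.
PARTIAL_TARGET = {"id": "AGENCY", "components": ["AGENCY"]}


def is_valid_partial(partial, full):
    """True if the partial identifier only uses components of the full one."""
    referenced = set(partial["components"])
    return bool(referenced) and referenced <= set(full["components"])
```

A report structure linked to PARTIAL_TARGET would then carry metadata about the data provider alone, independent of any category.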

53 Metadata Report Structure – Bringing it Together
Defining the Attachment Object Type. Note that this metadata is attached at a fairly high level, the level of the subject domain category, for the data provider. If there are metadata at a lower level of granularity, for instance at the level of the “table”, then this can also be specified in an MSD. In order to attach metadata to each of the tables, each of these tables can be defined as a “Dataflow” and the metadata attached to the provision of the data by a data provider for this dataflow. It is also possible, using an MSD, to define metadata attachment to data structure definitions, components of data structure definitions (e.g. a dimension), and even values of dimensions.

54 Metadata Report Structure – Bringing it Together
Link of the Report Structures to the relevant Target Identifiers. The final piece of the jigsaw in this example is to link the Report Structures to the relevant Full or Partial Target Identifier. This has already been done in the XML examples. Schematic labels: ESTAT_MSD, CATEGORY_REPORT, CATEGORY, AGENCY, ESTAT_CATEGORY_SCHEME.

55 Metadata Report Structure – Bringing it Together
Link of the Report Structures to the relevant Target Identifiers. The XML that makes this link is the target attribute in the Report Structure.

56 Metadata Set: Structure
A Metadata Set references: a Metadata Structure Definition (MSD); a Report Structure; a Target Identifier. It defines the actual values of the target objects. It comprises the Reported Attributes and their corresponding values. These attributes may be: coded, text, date/time, number, etc.

57 Metadata Set – General Schematic
Metadata is reported in a Metadata Set. Each metadata report is reported in a separate Metadata Set, and there can be many such sets in an SDMX Message. Schematic labels: MSD ESTAT_MSD; report CATEGORY_REPORT; Category = tour_occ; Data Provider = EUROSTAT.
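The parts of a Metadata Set (references, target values, reported attributes) can be pictured as a nested structure. The sketch below is a plain Python dictionary, not SDMX-ML; the key names and the `flatten` helper are assumptions, while the identifiers and values are those shown in the schematics.

```python
metadata_set = {
    # References to the structure that governs this set.
    "msd": "ESTAT_MSD",
    "report": "CATEGORY_REPORT",
    # Actual values of the target objects.
    "target": {"CATEGORY": "tour_occ", "AGENCY": "EUROSTAT"},
    # Reported attributes and their values, nested as in the report.
    "attributes": {
        "CONTACT": {
            "CONTACT_ORG": "Eurostat, Statistical Office of the European Communities",
            "CONTACT_ORG_UNIT": "Unit G3 Short-term statistics; tourism",
        },
    },
}


def flatten(attributes, prefix=()):
    """Yield (path, value) pairs for every leaf attribute."""
    for concept_id, value in attributes.items():
        path = prefix + (concept_id,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield "/".join(path), value


reported = dict(flatten(metadata_set["attributes"]))
```

Flattening the hierarchy into paths such as `CONTACT/CONTACT_ORG` is one convenient way to compare a reported set against the attributes its report structure declares.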

58 Metadata Set – General Schematic
Schematic: CATEGORY_REPORT contains the attributes CONTACT and BASIC_METH_ISSUES. CONTACT_ORG = Eurostat, Statistical Office of the European Communities. CONTACT_ORG_UNIT = Unit G3 Short-term statistics; tourism. Other attributes shown: CONTACT_MAIL_ADDRESS, POS_ACC_TOUR.

59 Metadata Set – General Schematic
CATEGORY_REPORT STAT_UNIT SCOPE_OBS

60 Metadata Set – General Schematic
CATEGORY_REPORT TOUR_ACC_ESTAB NACE_55_1

61 Metadata Set – Metadata file
The SDMX-ML for this, reported in the generic format for a Metadata Set, is displayed here.

62 Metadata Set – ESMS example
Metadata reports are structured using a set of main Concepts. In this example, taken from the Eurostat standard metadata format, these concepts and sub-concepts represent the metadata elements for which a text needs to be reported. In the example, “Contact organisation” has the value “Statistical Office of the European Communities (Eurostat)”.

63 Metadata Set – ESMS Example
In our example, the concepts used in the metadata report can easily be found in the metadata set: the attribute “Contact” with its three sub-concepts (organisation, organisation unit and address), and the attribute “Metadata update” with its three sub-concepts (last certified, last posted, and last updated).

64 METADATA STRUCTURE Changes
With the SDMX 2.1 release of the standards some changes were introduced. Using more meaningful terms, the Full Target Identifier and Partial Target Identifier are replaced by the single Metadata Target. The related Identifier Component is renamed Target Reference and contains the identification elements of a Target Object. Another important change is that, in version 2.1, the target object to which metadata can be attached is not limited to simple objects such as a Code or a Category, but may be identified at any level. Furthermore, it is even possible to specify a URN as the identification mechanism, and the Metadata Report can also be specified as XHTML.

65 Metadata Message SYNTAX Changes
Message names by version. Generic metadata set: GenericMetadata in both 2.0 and 2.1. Structure-specific metadata set: MetadataReport in 2.0, StructureSpecificMetadata in 2.1. Some changes have also been made for the reference metadata messages. The structure-specific metadata set, previously called MetadataReport, is renamed StructureSpecificMetadata to be more consistent with the data messages. The XML structures of the generic and structure-specific metadata sets have also been aligned. In version 2.0, the generic metadata message was structured very differently from the structure-specific metadata message. In 2.1, the two formats are quite similar, with the same tags being used for each level of the metadata set structure, and there is a better equivalence between data messages and metadata messages.
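The renamings between versions 2.0 and 2.1 described on the last two slides can be collected in one lookup table, which may help when reading older documentation. This is only a summary of the terms named in the slides, not a migration tool; the names `RENAMED_IN_2_1` and `to_2_1` are hypothetical.

```python
# SDMX 2.0 term -> SDMX 2.1 term, as stated in the slides.
RENAMED_IN_2_1 = {
    "Full Target Identifier": "Metadata Target",
    "Partial Target Identifier": "Metadata Target",
    "Identifier Component": "Target Reference",
    "MetadataReport": "StructureSpecificMetadata",
}


def to_2_1(term):
    """Return the 2.1 name for a 2.0 term; unchanged terms pass through."""
    return RENAMED_IN_2_1.get(term, term)
```

Terms that kept their name, such as GenericMetadata, are returned as they are.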

66 Questions?

