Eurostat Unit B3 – IT and standards for data and metadata exchange


1 Eurostat Unit B3 – IT and standards for data and metadata exchange
SDMX for IT Experts – Mapping Process. Jorge Nunes, Raynald Palmieri. 18-20 February 2014. Eurostat Unit B3 – IT and standards for data and metadata exchange.

2 Table of Contents Mapping Process
Introduction
Architecture of the Pull Method: Data Provider Side, Data Consumer Side
Data Producer Architecture
Census-Hub
When to do a mapping and why
Step-by-Step Example

3 Advantages of Pull method
Introduction: the data sharing model. Push method: the data provider is the main actor. Pull method: the data consumer is the main actor. Advantages of the pull method: reduced reporting burden; the data consumer is guaranteed to receive the most recent data. In the framework of a data sharing model, a group of organizations agrees on a common way to provide access to data according to standard processes, formats and technologies. There are two ways of sharing statistical data: the "push" method, in which the data provider takes action to send data to the data consumer; and the "pull" method[1], in which data are taken directly from the data provider by the data consumer. The pull method has the advantage of reducing the reporting burden for European and national institutions, as the NSIs (National Statistical Institutes) do not need to send data to each consumer. Data consumers are also guaranteed to receive the most recent data, as the data are taken directly from the source. [1] Reference source: Student book – "Introduction to SDMX"

4 Architecture of Pull Method
Architectures: database-driven architecture and hub architecture. Both architectures implement a notification mechanism. Two actors: data provider and data consumer. Data sharing using the pull mode is well adapted to the database-driven and data hub architectures. Both architectures benefit data producers because they lessen the burden of publishing the data to multiple counterparties. In both architectures it is necessary to implement a notification mechanism, providing provisioning metadata in order to alert collecting organisations that data and metadata sets have been made available by data providers. The provisioning metadata include details about the online mechanism for getting the data (for example, a queryable online database or a simple URL) and constraints regarding the allowable content of the data sets that will be provided. There are two actors in the pull method: the data provider (the organization providing the data) and the data consumer (the organization using the data made available by the data provider). For this reason, the process will be described from both sides:

5 Architecture of Pull Method Data Provider
Using SDMX-compliant files. The data provider can make data available in two different ways. The first is using SDMX-compliant files, with the following steps: the data provider creates a new SDMX-ML file with new data or updates to old data; the data provider makes the SDMX-ML file available at a specific URL; the data provider alerts the data consumer to the new SDMX-ML data file by sending the URL where the file can be retrieved.
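The three steps above can be sketched as a tiny protocol. This is a minimal illustration, not part of the SDMX standard: the notification payload shape, the dataflow id and the URL are all made-up assumptions.

```python
# Sketch of the file-based pull exchange. The payload structure,
# dataflow id and URL below are illustrative assumptions.

def build_notification(dataflow_id: str, url: str) -> dict:
    """Provider side: alert describing where the new SDMX-ML file sits."""
    return {"dataflow": dataflow_id, "url": url}

def handle_notification(notification: dict) -> str:
    """Consumer side: extract the URL from which to pull the SDMX-ML file."""
    # In a real system the consumer would now fetch this URL over HTTP(S).
    return notification["url"]

alert = build_notification("STS_PROD", "https://example.org/data/sts_prod.xml")
print(handle_notification(alert))  # the URL the consumer will pull
```

The point of the sketch is the division of labour: the provider only publishes a file and announces its location; all retrieval work happens on the consumer side.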

6 Architecture of Pull Method Data Provider
Using a web service. In this case, the web service controls and manages access to the data provider's dissemination database. When new data are available in the dissemination database, the data provider alerts the data consumer by sending the end-point of the web service through which the new data can be accessed. To retrieve the new data, the data consumer sends an SDMX query to the web service; the web service translates it into an SQL query and runs it against the database. The web service then translates the result of the SQL query into an SDMX-ML data file and returns it to the data consumer in response to the request.
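The translation step (SDMX query in, SQL out, results back) can be sketched as follows. The table name, column names and in-memory database are illustrative assumptions; a real service would parse an SDMX-ML Query message rather than take a plain dict.

```python
import sqlite3

# Sketch of the web-service step that turns a parsed SDMX query into SQL
# and runs it against the dissemination database. Table and column names
# ("observations", "period", "value") are illustrative assumptions.

def sdmx_query_to_sql(dimensions, start, end):
    """Build a parameterised SQL statement from SDMX dimension filters."""
    clauses = [f"{dim} = ?" for dim in dimensions]
    params = list(dimensions.values()) + [start, end]
    sql = ("SELECT period, value FROM observations WHERE "
           + " AND ".join(clauses) + " AND period BETWEEN ? AND ?")
    return sql, params

# Demo against an in-memory dissemination database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE observations (FREQ TEXT, ADJUSTMENT TEXT, period TEXT, value REAL)")
db.executemany("INSERT INTO observations VALUES (?,?,?,?)",
               [("M", "W", "200601", 101.1), ("M", "W", "200602", 102.3)])
sql, params = sdmx_query_to_sql({"FREQ": "M", "ADJUSTMENT": "W"}, "200601", "200601")
print(db.execute(sql, params).fetchall())  # only the 200601 observation
```

Parameterised queries are used so that dimension values coming from the consumer's request never get spliced directly into SQL text.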

7 Architecture of Pull Method Data Consumer
Database-driven architecture. On the data consumer side there are two different designs from the architectural point of view: the database-driven architecture and the hub architecture. The database-driven architecture is implemented by collecting organisations that periodically need to fetch data and load them into their own database. In general, a batch process is used to automate the flow, handling a whole or partial dataset, including incremental updates. The pull approach within a database-driven architecture includes the following steps, based on a provision agreement: When new data are available, the data provider creates an SDMX-ML file containing the new data set, or provides a web service (WS) that builds SDMX-ML messages upon request. Notification to data consumers about the new data, and the details on how to obtain them, can be performed with an RSS web feed. The data collector's Pull Requestor reads the new RSS feed entry (or receives the information about the new data by other means). It can then retrieve the SDMX-ML file from the specified URL, or use the "Query Message" included in the RSS feed to query the data provider's web service.
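The Pull Requestor's first task, reading the notification feed, can be sketched with the standard library. The feed content below is a made-up example; real provisioning feeds carry more metadata (and possibly an embedded SDMX Query message).

```python
import xml.etree.ElementTree as ET

# Sketch of the data collector's "Pull Requestor" reading an RSS entry
# that announces new data. The feed text and URL are illustrative.
RSS = """<rss version="2.0"><channel>
  <item>
    <title>New STS production data</title>
    <link>https://nsi.example.org/data/sts_prod.xml</link>
  </item>
</channel></rss>"""

def latest_data_url(rss_text):
    """Return the link of the first (most recent) feed item."""
    root = ET.fromstring(rss_text)
    return root.find("./channel/item/link").text

print(latest_data_url(RSS))
```

From here the batch process would fetch the linked SDMX-ML file and load it into the local database, typically on a schedule.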

8 Architecture of Pull Method Data Consumer
Hub architecture. SDMX also supports the "data hub" concept/architecture, where users obtain data from a central hub which itself automatically assembles the required dataset by querying other data sources. Data providers can notify the hub of new sets of data and the corresponding structural metadata (measures, dimensions, code lists, etc.) and make data available directly from their systems through querying means. Data users can browse the hub to define a dataset of interest via this structural metadata and retrieve the desired dataset. From the data management point of view, the hub is also based on specific datasets, which, contrary to the database-driven architecture, are not kept locally at the central hub system. Instead, the following process operates: A user identifies a dataset through the web interface of the central hub using the structural metadata, and requests it. The central hub translates the user request into one or more queries and sends them to the related data providers' systems. The data providers' systems process the query and send the result to the central hub in a standard format. The central hub puts together all the results originating from the interested data providers' systems and presents them in a human-readable format.
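The fan-out/assemble behaviour of the hub can be sketched in a few lines. The provider data, country codes and result shape are all illustrative assumptions; in reality each call would be an SDMX Query sent to an NSI web service over HTTP.

```python
# Sketch of the hub's fan-out/assemble step: one query per data
# provider, partial results merged. All data below are illustrative.
PROVIDERS = {
    "IT": [{"TIME_PERIOD": "200601", "OBS_VALUE": 101.1}],
    "FR": [{"TIME_PERIOD": "200601", "OBS_VALUE": 98.4}],
}

def query_provider(area, period):
    """Stand-in for querying one NSI's web service."""
    return [dict(obs, REF_AREA=area)
            for obs in PROVIDERS[area] if obs["TIME_PERIOD"] == period]

def hub_request(areas, period):
    """Hub step: fan out to every requested provider, merge the results."""
    merged = []
    for area in areas:
        merged.extend(query_provider(area, period))
    return merged

print(hub_request(["IT", "FR"], "200601"))
```

Because nothing is stored centrally, the hub's job reduces to translating the request, dispatching it, and concatenating the standard-format answers for presentation.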

9 Census-HUB Key Issues (1)
Dissemination of 2011 data on population and housing censuses in the European Union. Data structured according to "hypercubes" agreed with Member States (Census Regulation). Harmonised concepts and definitions. Management of massive amounts of data by Member States. For the first time, the European Union has legislation aiming at the availability of harmonised, high-quality data from the population and housing censuses conducted in the EU Member States in 2011. It is a common ambition of the ESS to disseminate the results in a way that provides users with easy access to detailed census data that are methodologically comparable between the EU Member States and structured in the same way. The volume of census data is particularly high. This situation calls for an innovative technical solution for the transmission and dissemination of census data. The dissemination system shall provide the user with high data accessibility. The added value of the census, namely the high geographical resolution and the possibility to cross-tabulate harmonised census data, should be offered to the user to the maximum possible extent. Based on harmonised concepts, definitions and specifications of the data transmitted, the tool should be designed to allow maximum flexibility to cross-tabulate data. Despite these requirements, the dissemination tool should be easy to use, and problems linked to speedy access to massive amounts of data should be overcome.

10 Census-HUB Key Issues (2)
Users have easy access to detailed census data. High accessibility to data and metadata. Flexibility to cross-tabulate data from different sources. Easy to use.

11 Census-HUB Concept (1) Based on the concept of data sharing.
A group of partners agree on providing access to their data according to standard processes, formats and technologies. The Census Hub is a new system for disseminating the 2011 census data, based on the concept of data sharing, where a group of partners agree on providing access to their data according to standard processes, formats and technologies. Vision: one of the first crucial tests of a practical implementation of the Commission Communication on "a vision for the next decade". Data sharing: rather than duplicating existing data warehouses at national level, the Hub is based on the concept of data sharing, where a group of NSIs open up access to their data repositories, organised according to standard processes and formats. SDMX ensures the use of standard formats and techniques for the retrieval, exchange and processing of data and metadata. Limited investment: in some cases (e.g. in Poland) the whole national architecture could be set up in just a few days, as the country could use, in part or as a whole, solutions already developed by Eurostat (directly or indirectly through the ESSnet) or by other institutes for building the SDMX infrastructure. Reusability: the solutions built for the Census Hub project can easily be re-used in other subject-matter domains: invest once, use several times. Capacity-building: Eurostat has put in place a set of capacity-building actions (training, workshops and a specific working group for NSIs' IT staff involved in the Census Hub exercise), in addition to technical advice through bilateral meetings.

12 Census-HUB Concept (2) Advantages of the hub approach:
Decoupling of NSIs' systems from the central hub via standard formats and techniques for the exchange. NSIs are free to provide more information than is contained in the agreed hypercubes, without additional effort. Limited investment, re-usability (with the advantage of using recognized international standards).

13 Census-HUB – aims and objectives
To develop the central applications hosted at Eurostat: the Hub, and a tool allowing NSIs to update the LAU code list directly. To support the implementation of the national SDMX IT infrastructures: advice on how to implement the SDMX infrastructure; SDMX training and technical workshops; a technical IT working group. To facilitate software sharing between countries: an NSI SDMX infrastructure developed by Eurostat and made available for free download; sharing through the ESSnet on SDMX. Local Administrative Units (LAU) are the basic components of NUTS regions (NUTS: Nomenclature of Territorial Units for Statistics). At the local level, two levels of Local Administrative Units (LAU) have been defined. The upper LAU level (LAU level 1, formerly NUTS level 4) is defined for most, but not all, of the countries. The second LAU level (formerly NUTS level 5) consists of municipalities or equivalent units in the 27 EU Member States (situation of 2007).

14 Census-HUB – a user can…
Browse the Hub to define a dataset of interest, navigating via structural metadata: select a hypercube or search by topic (filters); select data (level of detail, breakdowns); select the layout (axes); view a table; save a query; export a file (CSV, Excel, SDMX-ML).

15 Census-HUB – architecture
A data hub is a pull-mode architecture for common data sharing. Data are not collected and stored in a central repository beforehand; they are accessed directly from the data providers' databases through a central hub upon request of a data collector: The data collector browses the Hub to define the dataset of interest via its structural metadata. The Hub converts the user's request into an SDMX Query message and sends it to a National Statistical Institute's web service. The NSI web service converts the SDMX Query into a set of SQL queries, fetches the data from the NSI database, dynamically constructs the SDMX-ML file and sends it back to the Hub. If the request concerns data from several NSIs, these steps are executed simultaneously for each of them. The Hub assembles all the SDMX-ML files received from the NSIs and presents the result to the user in a readable format. The workflow is: Step 1: A "data user" browses the Hub to define a dataset of interest via structural metadata. He browses the dimensions and selects a dataset, then chooses the output layout, specifying which dimensions will match the X-axis and Y-axis and which dimension will vary item after item to generate new tables. Step 2: The Hub converts the user request into an SDMX Query and sends it to the interested NSI web service. Step 3: The NSI web service converts the SDMX Query into a set of SQL queries and sends them to the NSI data warehouse. Step 4: The NSI data warehouse sends the result to the NSI web service. Step 5: The NSI web service converts the result into an SDMX-ML Data message and sends it to the Hub. Step 6: The same steps are repeated if the user has requested data from different Member States. Step 7: The Hub puts together all the SDMX-ML data messages coming from the interested NSIs and presents the result to the "data user" in the web browser in a readable format.

16 Census-HUB how to participate - 1
NSIs may announce their interest in participating at any time by contacting Eurostat. Eurostat prepares the implementation kit (DSDs, dummy data files, SDMX message examples, guidelines, etc.). Eurostat interacts with the census IT contact in order to define a common working plan. NSIs may ask Eurostat for technical advice (bilateral meetings).

17 Census-HUB – how to participate – 2
CIRCA → Eurostat → X-DIS Census Hub → Public documents. Read the experiences of other NSIs. Contact Eurostat unit B5 for technical advice. SDMX training is available. Technical bilateral meetings can be arranged upon request.

18 Census-HUB – how to participate – 3
SDMX structure files: concepts; code lists (excluding GEO); key families (one for each hypercube); partial GEO code list. SDMX data message example (cross-sectional), SDMX Query message example, and a dummy hypercube in CSV format. MIG XML schema (XSD file). Explanatory notes: Census Hub WS Implementation Guideline; Census Hub WS Security Guideline; Census 2011 Regulation; Census 2011 Explanatory notes.

19 Census-HUB – how to participate – 4
Hardware: one server (acting as both web and database server) exposed on the Internet, or a web server exposed on the Internet plus one database server. Application solution: in-house, or the SDMX NSI Reference Infrastructure by Eurostat. Software: web server (IIS, Apache, etc.); application server (.NET, Tomcat, WebLogic, etc.); DBMS (Oracle, MS SQL Server, MySQL); web service framework (ASP.NET, JAX-WS).

20 Census-HUB – how to participate – 5
Evaluate the possibility of re-using or adapting software already developed: the SDMX NSI Reference Infrastructure developed by Eurostat, or tools available from the SDMX web site. Set up the necessary hardware and software. Agree with Eurostat on a work plan. When the system is up and running, contact Eurostat to start testing.

21 Census-HUB - Environment

22 Data Producer Architecture
In order to implement an SDMX IT architecture for data sharing using the pull mode, several steps must be accomplished by a data producer and several questions must be considered: Which statistical domains are involved, and where are the data currently stored? Which structural metadata are involved, and where are they currently stored? What is the business process behind the data flow involved in the exercise? Will the SDMX data producer architecture be part of a data warehouse architecture, of a data hub architecture, or of both?

23 Data Producer Architecture
Generally, the data and structural metadata involved in the new SDMX information system are stored either in databases or in files, and the two cases lead to different architectural approaches: a. Data and structural metadata continue to be stored in files (for example XLS, CSV, etc.), and the only need is to translate those files into SDMX-ML data files to be pulled by the data collector. b. Data and structural metadata are already stored in a database, and it is necessary to build suitable software interfaces in order to make the system "SDMX-compliant". c. One separate special-purpose database is set up to store data and structural metadata. This database is designed with the main aim of being part of an SDMX-compliant system; in this case it can be modelled using the SDMX Information Model. Cases (b) and (c) make it possible: to extract SDMX-ML files from the database, to be made available for pulling by data collectors; and to allow the database to be queried directly through a web service. Whichever type of data producer architecture is involved, a mapping process between structural metadata may be necessary, as explained below.

24 When to do a mapping and why
A mapping process is necessary when the data provider needs to disseminate data in SDMX format described by local[1] structural metadata. Some assumptions will be made: The data of the data provider are described by local concepts that are different[2] from the concepts of the corresponding DSD. The local concepts and DSD concepts are in a one-to-one relationship, meaning that the local concepts can be compared with DSD concepts. To make the examples easier, the same letters will be used for the same concepts on both sides (e.g. local concept Cloc corresponds to the DSD concept Cdsd). The mapping is used to: translate the SQL query resulting from the parsing of an SDMX Query Message into local concepts (when the data provider shares data using a web service); and create an SDMX data file. [1] "Local" means defined by the data provider. [2] The names, the code lists or the values of the local concepts differ from the names, code lists and values of the DSD concepts.

25 Step by Step Example Step 1: List of Local Concepts
Primary table:
Domain | Set | Category | Type | Activity | Freq | um
e | ip | 63 | m | DA | 12 | pe
e | ip | 63 | m | DB | 12 | pe

Secondary table:
Domain | Set | Category | Type | Activity | Month | Year | Value
e | ip | 63 | m | DA | 01 | 2006 | 101.1
e | ip | 63 | m | DB | 01 | 2006 | 77.0

A dissemination database will be used as an example to explain the mapping process. This database is managed in the web application ConIstat[1] for short-term statistics in Italy. The storage used in this database comprises more than one table. The primary table has a primary key comprising the columns "Domain", "Set", "Category", "Type", "Vs" and "Activity", and is completed by the columns "Freq" and "um". The secondary table is joined to the first one by the primary key of the primary table; it contains the columns concerning time data, "Month" and "Year", and the column "Value". [1] ConIstat is a database of time-series of short-term indicators produced by ISTAT (Italian National Institute of Statistics).

26 Step by Step Example Step 1: List of Local Concepts
Local columns description:
Domain – set of time series grouped by topic
Set – grouping of second level
Category – grouping of third level
Type – adjustment
Activity – NACE code
Freq – frequency of the series
um – unit of measure of the data, unit multiplier and base year for the index
Year – year of measurement of the data
Month – month of measurement of the data

The list of local concepts is as follows. "Domain", "Set" and "Category" represent three levels of grouping information. The first one ("Domain") represents the topic of the series, for example "Consumer prices", "Services", "Industry", etc. The second one ("Set") represents a subgroup of "Domain"; for example, for Domain="Industry": "Industrial turnover", "Industrial production", etc. The last one ("Category") represents a subgroup of "Set"; for example, for Set="Industrial production": "raw index of industrial production", "industrial production working-day adjusted". Type: defines whether the data are raw or adjusted, and the kind of adjustment; for example "m" represents "working-day adjusted". Activity: represents version 1.1 of the classification of economic activities, NACE Rev. 1.1[1]; for example "DA" represents "Manufacture of food products, beverages and tobacco products" and "DB" "Manufacture of textiles, apparel, leather and related products". Freq: represents the frequency at which the data are produced; in this database the value "12" indicates a monthly frequency (twelve periods per year). Um: the unit of measure, also including the base year and unit multiplier; for example "pe" represents "Index number (base 2000)". Year and Month: represent the period of the survey. [1] NACE Rev. 1.1 is the classification of economic activities.

27 Step 2: Associate all the local concepts to the DSD concepts
Id | Description | Type of concept | Obligatoriness
ADJUSTMENT | Adjustment indicator | Dimension |
FREQ | Frequency | Dimension |
OBS_STATUS | Observation status | Attribute | Mandatory
REF_AREA | Reference area | Dimension |
STS_ACTIVITY | Economic activity code | Dimension |
STS_BASE_YEAR | Series variation in short-term statistics | Dimension |
STS_INDICATOR | STS indicator | Dimension |
STS_INSTITUTION | Institution originating the STS dataflow to the ECB | Dimension |
TIME_FORMAT | Time format code | Attribute | Mandatory
TIME_PERIOD | Time period or range | TimeDimension |
UNIT | Unit | Attribute | Conditional
UNIT_MULT | Unit multiplier | Attribute | Conditional

The DSD used is "EUROSTAT_STS version 2.0". As with the DSD of the Demography rapid survey, this paragraph provides a brief description of the DSD; the concepts used are the ones in the table.

28 Step 2: Associate all the local concepts to the DSD concepts
Id | Code list used | Code(s) used | Description of code(s)
ADJUSTMENT | CL_ADJUSTMENT | W | Working-day adjusted, not seasonally adjusted
FREQ | CL_FREQ | M | Monthly
OBS_STATUS | CL_OBS_STATUS | A | Normal value
REF_AREA | CL_AREA_EE | IT | Italy
STS_ACTIVITY | CL_STS_ACTIVITY | N100DA | Manufacture of food products, beverages and tobacco
STS_ACTIVITY | CL_STS_ACTIVITY | N100DB | Manufacture of textiles and textile products
STS_BASE_YEAR | CL_STS_BASE_YEAR | 2000 | Year 2000
STS_INDICATOR | CL_STS_INDICATOR | PROD | Production (variables 110, 115, 116)
STS_INSTITUTION | CL_STS_INSTITUTION | 1 | National Statistical Office(s)
TIME_FORMAT | CL_TIME_FORMAT | P1M |
TIME_PERIOD | - | |
UNIT | CL_UNIT | PURE_NUMB | Dimensionless value other than PC, PCPA and POINTS
UNIT_MULT | CL_UNIT_MULT | | Units

The names of the corresponding code lists and the values used in the example are given in the table above.

29 Step 2: Associate all the local concepts to the DSD concepts
Local concepts | DSD concepts
Freq | FREQ
- | REF_AREA
Type | ADJUSTMENT
Domain, Set, Category | STS_INDICATOR
Activity | STS_ACTIVITY
- | STS_INSTITUTION
um | STS_BASE_YEAR, UNIT, UNIT_MULT
Year, Month | TIME_PERIOD
- | TIME_FORMAT
- | OBS_STATUS

The following are considered in the mapping: The "Domain", "Set" and "Category" fields are used by the web application to navigate between different levels of aggregation, and together they represent the indicator. This means they can be associated with the DSD concept "STS_INDICATOR", which represents the type of indicator, such as production, turnover, etc. The "Type" local concept can be associated with the DSD concept "ADJUSTMENT". "um" can be associated with the DSD concepts "UNIT", "UNIT_MULT" and "STS_BASE_YEAR", because it contains information that needs to be mapped to all three. The "Freq" local concept can be associated with the DSD concept "FREQ". The "Year" and "Month" local concepts can be associated with the DSD concept "TIME_PERIOD"; in the table there are the values "2006" for "Year" and "01" for "Month". The table above shows the associations. The mapping table of local concepts and DSD concepts indicates that the mapping between concepts is not always one-to-one. We have these possibilities in the example we are working with: One local concept corresponds to one DSD concept: local concept "Type" with DSD concept "ADJUSTMENT"; local concept "Activity" with DSD concept "STS_ACTIVITY". One DSD concept corresponds to more than one local concept: "Domain", "Set" and "Category" with the DSD concept "STS_INDICATOR"; local concepts "Year" and "Month" with the DSD concept "TIME_PERIOD". One local concept corresponds to more than one DSD concept: "um" with the DSD concepts "STS_BASE_YEAR", "UNIT" and "UNIT_MULT". One DSD concept does not correspond to any local concept: "REF_AREA", "STS_INSTITUTION", "TIME_FORMAT" and "OBS_STATUS". In this case the values of these concepts must be added according to the data, because they are mandatory in the SDMX data file. In the example, the concept REF_AREA is equal to "IT" ("Italy"), the concept STS_INSTITUTION is equal to "1" ("National Statistical Office(s)"), and OBS_STATUS is equal to "A" ("Normal value"). These values will be reported in the mapping of codes, the process presented next. A local concept that does not correspond to any DSD concept: there is no such case in the example.

30 Step 3: Mapping of Codes

LOCAL CONCEPTS | LOCAL CODES | DSD CODES | DSD CONCEPTS
Freq | 12 | M | FREQ
- | | IT | REF_AREA
Type | m | W | ADJUSTMENT
Domain, Set, Category | e, ip, 63 | PROD | STS_INDICATOR
Activity | DA, DB | N100DA, N100DB | STS_ACTIVITY
- | | 1 | STS_INSTITUTION
um | pe | 2000 | STS_BASE_YEAR
um | pe | PURE_NUMB | UNIT
um | pe | | UNIT_MULT
Year, Month | | | TIME_PERIOD
- | | P1M | TIME_FORMAT
Value | | | OBS_VALUE
- | | A | OBS_STATUS

To understand the mapping of codes better, we need the descriptions of the different values that the concepts of the example can take. Once the values have been described adequately, the result of mapping the local codes to the DSD codes is shown in the table above.
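The code mapping above can be written out as plain data structures, which makes the four cardinality cases from Step 2 explicit. The dictionary layout is an illustrative assumption; the values are the ones from the slide.

```python
# The code-mapping table expressed as plain Python data (layout is an
# illustrative assumption; values come from the example).

CODE_MAP = {
    # one local concept -> one DSD concept
    "Type": {"m": ("ADJUSTMENT", "W")},
    "Activity": {"DA": ("STS_ACTIVITY", "N100DA"),
                 "DB": ("STS_ACTIVITY", "N100DB")},
    "Freq": {"12": ("FREQ", "M")},
    # one local concept -> several DSD concepts
    "um": {"pe": [("STS_BASE_YEAR", "2000"), ("UNIT", "PURE_NUMB")]},
}

# several local concepts -> one DSD concept
COMPOSITE_MAP = {("e", "ip", "63"): ("STS_INDICATOR", "PROD")}

# DSD concepts with no local counterpart: fixed values must be supplied,
# since they are mandatory in the SDMX data file
DEFAULTS = {"REF_AREA": "IT", "STS_INSTITUTION": "1",
            "OBS_STATUS": "A", "TIME_FORMAT": "P1M"}

print(CODE_MAP["Activity"]["DB"])
```

Keeping the mapping as data rather than code means the same tables can drive both directions: query translation (DSD to local) and dataset translation (local to DSD).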

31 Step 4: Use of the mapping
Once the mapping has been created, it can be used for two purposes: translation of an SDMX Query, and translation of a dataset.

32 Step 4: Use of the mapping
Translation of an SDMX Query. Consider the following "query:DataWhere" element from an SDMX Query:

<query:DataWhere>
  <query:And>
    <query:Dimension id="STS_INDICATOR">PROD</query:Dimension>
    <query:Dimension id="ADJUSTMENT">W</query:Dimension>
    <query:Time>
      <query:StartTime>200601</query:StartTime>
      <query:EndTime>200601</query:EndTime>
    </query:Time>
    <query:Or>
      <query:Dimension id="STS_ACTIVITY">N100DA</query:Dimension>
      <query:Dimension id="STS_ACTIVITY">N100DB</query:Dimension>
    </query:Or>
  </query:And>
</query:DataWhere>

33 Step 4: Use of the mapping
Translation of an SDMX Query. The parsing of the SDMX query gives the WHERE clause of the SQL query below:

"STS_INDICATOR = 'PROD' AND ADJUSTMENT = 'W' AND (STS_ACTIVITY = 'N100DA' OR STS_ACTIVITY = 'N100DB') AND STARTTIME = '200601' AND ENDTIME = '200601'"

Using the mapping, the translation of the DSD elements to the local elements we have already mapped is:

"Domain = 'e' AND Set = 'ip' AND Category = '63' AND Type = 'm' AND (Activity = 'DA' OR Activity = 'DB') AND Year = '2006' AND Month = '01'"

Where: STS_INDICATOR = 'PROD' corresponds to Domain = 'e' AND Set = 'ip' AND Category = '63'. ADJUSTMENT = 'W' corresponds to Type = 'm'. STS_ACTIVITY = 'N100DA' OR STS_ACTIVITY = 'N100DB' corresponds to Activity = 'DA' OR Activity = 'DB'. STARTTIME = '200601' AND ENDTIME = '200601' corresponds to Year = '2006' AND Month = '01'.
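The WHERE-clause translation above can be sketched with a reverse mapping from DSD codes back to local codes. The dictionaries and the condition-list format are illustrative assumptions; a real implementation would parse the DataWhere XML, and here a single-period request (start = end) is assumed.

```python
# Sketch of the DSD-to-local WHERE-clause translation. The mapping
# dictionaries and input format are illustrative assumptions.

DSD_TO_LOCAL = {
    ("STS_INDICATOR", "PROD"): "Domain='e' AND Set='ip' AND Category='63'",
    ("ADJUSTMENT", "W"): "Type='m'",
    ("STS_ACTIVITY", "N100DA"): "Activity='DA'",
    ("STS_ACTIVITY", "N100DB"): "Activity='DB'",
}

def translate(conditions, period):
    """conditions: list of (concept, code) pairs; OR-groups as sub-lists.
    period: a single YYYYMM value (start = end is assumed)."""
    parts = []
    for cond in conditions:
        if isinstance(cond, list):  # OR group -> parenthesised disjunction
            parts.append("(" + " OR ".join(DSD_TO_LOCAL[c] for c in cond) + ")")
        else:
            parts.append(DSD_TO_LOCAL[cond])
    parts.append(f"Year='{period[:4]}' AND Month='{period[4:]}'")
    return " AND ".join(parts)

where = translate(
    [("STS_INDICATOR", "PROD"), ("ADJUSTMENT", "W"),
     [("STS_ACTIVITY", "N100DA"), ("STS_ACTIVITY", "N100DB")]],
    "200601")
print(where)
```

Note how the one-to-many and many-to-one cases from Step 2 surface here: a single DSD condition may expand into several local conditions joined by AND.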

34 Step 4: Use of the mapping
Translation of a dataset. Take as an example the following local data table, expressed in local concepts:

Year | Month | CN | Domain | Set | Category | Type | freq | umis | Activity
2006 | 01 | 84.5 | e | ip | 63 | m | 12 | pe | DA
2006 | 01 | 85.6 | e | ip | 63 | m | 12 | pe | DB

35 Step 4: Use of the mapping
Translation of a dataset. The local concepts and local codes are translated to the mapped values of the DSD:

TIME_PERIOD | OBS_VALUE | STS_INDICATOR | ADJUSTMENT | STS_ACTIVITY | FREQ | STS_INSTITUTION | OBS_STATUS | TIME_FORMAT | REF_AREA | STS_BASE_YEAR
200601 | 84.5 | PROD | W | N100DA | M | 1 | A | P1M | IT | 2000
200601 | 85.6 | PROD | W | N100DB | M | 1 | A | P1M | IT | 2000

36 Step 4: Use of the mapping
Translation of a dataset. Local table:

Year | Month | CN | Domain | Set | Category | Type | freq | umis | Activity
2006 | 01 | 84.5 | e | ip | 63 | m | 12 | pe | DA
2006 | 01 | 85.6 | e | ip | 63 | m | 12 | pe | DB

Translated table:

TIME_PERIOD | OBS_VALUE | STS_INDICATOR | ADJUSTMENT | STS_ACTIVITY | FREQ | STS_INSTITUTION | OBS_STATUS | TIME_FORMAT | REF_AREA | STS_BASE_YEAR
200601 | 84.5 | PROD | W | N100DA | M | 1 | A | P1M | IT | 2000
200601 | 85.6 | PROD | W | N100DB | M | 1 | A | P1M | IT | 2000

The local concepts and local codes are translated to the mapped values of the DSD.
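The row-by-row translation shown in the two tables can be sketched as a single function. The inline mapping dictionaries are illustrative assumptions; the values match the example, and fixed values are supplied for the DSD concepts that have no local counterpart.

```python
# Sketch of the local-row to DSD-row translation. Mapping dictionaries
# are illustrative; the values come from the worked example.

def to_dsd_row(local):
    """Translate one local observation into DSD concepts and codes."""
    return {
        "TIME_PERIOD": local["Year"] + local["Month"],
        "OBS_VALUE": local["CN"],
        "STS_INDICATOR": "PROD",          # (Domain,Set,Category) = (e,ip,63)
        "ADJUSTMENT": {"m": "W"}[local["Type"]],
        "STS_ACTIVITY": {"DA": "N100DA", "DB": "N100DB"}[local["Activity"]],
        "FREQ": {"12": "M"}[local["freq"]],
        "STS_BASE_YEAR": {"pe": "2000"}[local["umis"]],
        # fixed values: no local counterpart, but mandatory in the DSD
        "STS_INSTITUTION": "1",
        "OBS_STATUS": "A",
        "TIME_FORMAT": "P1M",
        "REF_AREA": "IT",
    }

row = to_dsd_row({"Year": "2006", "Month": "01", "CN": "84.5",
                  "Type": "m", "freq": "12", "umis": "pe", "Activity": "DA"})
print(row["TIME_PERIOD"], row["STS_ACTIVITY"])  # 200601 N100DA
```

Applying the function to every row of the local table produces exactly the translated table above.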

37 Step 4: Use of the mapping
Translation of a dataset.

<estat_sts:DataSet>
  <estat_sts:Series FREQ="M" REF_AREA="IT" ADJUSTMENT="W" STS_INDICATOR="PROD" STS_ACTIVITY="N100DA" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M">
    <estat_sts:Obs TIME_PERIOD="200601" OBS_VALUE="101.10" OBS_STATUS="A" OBS_CONF="F" />
  </estat_sts:Series>
  <estat_sts:Series FREQ="M" REF_AREA="IT" ADJUSTMENT="W" STS_INDICATOR="PROD" STS_ACTIVITY="N100DB" STS_INSTITUTION="1" STS_BASE_YEAR="2000" TIME_FORMAT="P1M">
    <estat_sts:Obs TIME_PERIOD="200601" OBS_VALUE="77" OBS_STATUS="A" OBS_CONF="F" />
  </estat_sts:Series>
</estat_sts:DataSet>

Once the mapping is finished, the SDMX data file can be created from the table expressed with DSD concepts.
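The final serialisation step can be sketched with the standard library. This is a simplified illustration: the real file uses the estat_sts namespace prefix and a full attribute set, whereas here plain element names and a reduced set of series-level attributes are assumed.

```python
import xml.etree.ElementTree as ET

# Sketch of serialising mapped observations into a compact SDMX-ML-style
# DataSet. Element names are simplified (no namespace) on purpose.
SERIES_KEYS = ["FREQ", "REF_AREA", "ADJUSTMENT", "STS_INDICATOR",
               "STS_ACTIVITY", "STS_INSTITUTION", "STS_BASE_YEAR",
               "TIME_FORMAT"]

def to_sdmx(observations):
    """Build a DataSet element: one Series per observation, one Obs each."""
    dataset = ET.Element("DataSet")
    for obs in observations:
        series = ET.SubElement(dataset, "Series",
                               {k: obs[k] for k in SERIES_KEYS})
        ET.SubElement(series, "Obs", {"TIME_PERIOD": obs["TIME_PERIOD"],
                                      "OBS_VALUE": obs["OBS_VALUE"],
                                      "OBS_STATUS": obs["OBS_STATUS"]})
    return ET.tostring(dataset, encoding="unicode")

obs = {"FREQ": "M", "REF_AREA": "IT", "ADJUSTMENT": "W",
       "STS_INDICATOR": "PROD", "STS_ACTIVITY": "N100DA",
       "STS_INSTITUTION": "1", "STS_BASE_YEAR": "2000",
       "TIME_FORMAT": "P1M", "TIME_PERIOD": "200601",
       "OBS_VALUE": "101.1", "OBS_STATUS": "A"}
print(to_sdmx([obs]))
```

In practice, consecutive observations sharing the same series key would be grouped under a single Series element rather than one Series per observation.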

38 Mapping process – Evaluation
1. In the hub architecture, datasets are stored: (a) locally in the central hub system; (b) nowhere; (c) in the data providers' systems.
2. How can the data and metadata information be stored? (a) In files; (b) in databases; (c) both answers are correct.
3. In the mapping process, what happens if one DSD concept does not correspond to any local concept? (a) Values must be added according to the data, because they are mandatory in the data file; (b) the mapping process can continue without these values, since they are not needed for the final result; (c) the selected DSD is not correct, since there is not a complete correspondence of the concepts; (d) none of the answers is correct.

