GSIM implementation in the Istat Metadata System: focus on structural metadata and on the joint use of GSIM and SDMX Mauro Scanu


1 GSIM implementation in the Istat Metadata System: focus on structural metadata and on the joint use of GSIM and SDMX
Mauro Scanu (scanu@istat.it), ISTAT – Italian National Institute of Statistics
Geneva, UNECE, 5 May 2015

2 Background

- Istat has been disseminating data and metadata through a Single Exit Point since 2009. The SDMX registry was the first centralized structural metadata system in Istat. It contains the structural metadata related to disseminated data.
- In 2010 Istat decided to build a centralized system (SUM) that contains structural metadata for all the data produced by Istat, from data collection up to data dissemination. The system aims at easing:
  - data retrieval
  - metadata harmonization/integration
  - data traceability
- In the meantime GSIM began to be discussed: GSIM claims to be the first internationally endorsed reference framework for statistical information.
- The idea was that SUM should be GSIM compliant. How can the (mandatory) SDMX infrastructure be combined with a GSIM-compliant metadata system?
- In 2013 UNECE organized workgroups on GSIM and other standards. One of them compared GSIM and SDMX (as well as DDI).

3 Focus of this talk

1. Relationship between SDMX and GSIM
2. What information should be available for a complete and correct definition of the meaning of data, and how to structure it in a unique way for every theme

SUM and GSIM

SUM adopted GSIM terminology and definitions. Most of the concepts came from the GSIM groups Concepts and Structure, and some from the Production group.

Apologies: I do not describe this graph or its differences from GSIM.

4 SDMX and statistics (and GSIM)

[Diagram; labels as they appear on the slide]

SDMX data structure definitions: Dimensions, Time dimension, Frequency dimension, Measure dimension (SDMX version 2.1), Primary measure, Attributes

Concept schemes: Statistical variables; Time related concepts; Other concept (operative/transformation); Data content

Code list: Classifications; Time related code lists: CL_FREQ,…; List of transformation methods: CL_ADJUSTMENT,…; List of data contents

GSIM COMMENT: GSIM useful for harmonizing DSD’s content

5 Data Content 1

GSIM defines everything except “data content”. The problem is: what is the complete set of information that defines the meaning of a figure in a table?

In SUM, we introduced the “data content” concept because
- it plays the role of the “title” in the old data tables
- it is defined according to the GSIM specification (lines 47-50): each data item is the result of a Process Step through the application of a Process Method on the necessary Inputs.

Hence it is modelled by specifying
- Statistical Program and Statistical Program Cycle
- Process Step (phase)
- Process Method
- Inputs

Examples:

Monthly average household expenditures
- Statistical Program: Household budget survey
- Statistical Program Cycle: Time dimension and Freq
- Process Step: Dissemination
- Process Method: Average
- Inputs: Validated sample of the HBS (Population: households; Num. Variable: monthly expenditures)

Activity rate
- Statistical Program: Labour Force survey
- Statistical Program Cycle: Time dimension and Freq
- Process Step: Dissemination
- Process Method: Ratio
- Inputs: DC1: Active pop; DC2: Pop
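The four-part model of a data content can be sketched as a small record type. This is only an illustration, not the SUM schema: the class name `DataContent`, its field names, and the `describe` helper are assumptions introduced here, and the values are copied from the two examples on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataContent:
    """Hypothetical record for one "data content" item (names are assumptions)."""
    title: str
    statistical_program: str      # Statistical Program
    program_cycle: str            # Statistical Program Cycle
    process_step: str             # Process Step (phase)
    process_method: str           # Process Method
    inputs: Tuple[str, ...]       # Inputs the method is applied to

    def describe(self) -> str:
        # Build a one-line, human-readable summary of the item.
        joined_inputs = ", ".join(self.inputs)
        return (f"{self.title}: {self.process_method} of {joined_inputs} "
                f"({self.statistical_program}, {self.process_step})")


# The two examples from the slide, expressed with the hypothetical record.
hbs_expenditure = DataContent(
    title="Monthly average household expenditures",
    statistical_program="Household budget survey",
    program_cycle="Time dimension and Freq",
    process_step="Dissemination",
    process_method="Average",
    inputs=("Validated sample of the HBS",),
)

activity_rate = DataContent(
    title="Activity rate",
    statistical_program="Labour Force survey",
    program_cycle="Time dimension and Freq",
    process_step="Dissemination",
    process_method="Ratio",
    inputs=("Active pop", "Pop"),
)

print(hbs_expenditure.describe())
print(activity_rate.describe())
```

Note that the inputs of the activity rate are themselves data contents (DC1 and DC2), which is what makes the reuse of data across computations traceable.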

6 Data Content 2

Data Content feeds the GSIM concept Measure in a Data Structure. We are not aligned with GSIM on this line: “measures correspond to Represented Variables with uncoded Value Domains (Described Value Domains)”.

- SUM maintains a code list “Data Content” where each item contains all the previous details.
- In this way a user can find the meaning of the data in a hypercube in a single place.
- Furthermore, a data producer has a form to complete when describing a new data content.
- Every data producer should describe the “Data Content” according to the same “model”.
- If the “Data Content” item has less information than needed, further dimensions or attributes must be included in the data structure to make it complete.
- If the “Data Content” has more information than needed, some dimensions become useless.
- These two deviations from a standard data content correspond to a content “stove pipe” and to the massive use of mappers.
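The completeness rule above can be illustrated with a toy code list and a check that reports which details an item lacks. Everything here is a sketch under assumptions: the codes (`DC_ACTIVITY_RATE`, `DC_EXPENDITURE`), the key names, and the function `missing_details` are hypothetical and do not come from SUM.

```python
from typing import Dict, List

# Details every item is expected to carry (key names are assumptions).
REQUIRED_FIELDS = ("statistical_program", "process_step", "process_method", "inputs")

# Hypothetical "Data Content" code list: code -> description of the item.
DATA_CONTENT_CODE_LIST: Dict[str, Dict[str, object]] = {
    "DC_ACTIVITY_RATE": {
        "label": "Activity rate",
        "statistical_program": "Labour Force survey",
        "process_step": "Dissemination",
        "process_method": "Ratio",
        "inputs": ["Active pop", "Pop"],
    },
    "DC_EXPENDITURE": {
        "label": "Household expenditures",
        "statistical_program": "Household budget survey",
        "process_step": "Dissemination",
        # No process method given: "average" would then have to be carried
        # by an extra dimension or attribute in the data structure.
        "inputs": ["Validated sample of the HBS"],
    },
}


def missing_details(code: str) -> List[str]:
    """Return the required details that the given item does not provide."""
    item = DATA_CONTENT_CODE_LIST[code]
    return [name for name in REQUIRED_FIELDS if not item.get(name)]


for code in DATA_CONTENT_CODE_LIST:
    gaps = missing_details(code)
    status = "complete" if not gaps else "needs extra dimensions/attributes for: " + ", ".join(gaps)
    print(code, "->", status)
```

An item with gaps is the "less information than needed" case: the missing detail has to move into the data structure, which is where descriptions start to diverge between producers.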

7 Conclusions

- SDMX has been a real success for the harmonization of the IT infrastructure for the exchange of data and metadata.
- However, SDMX is a real puzzle for those who have (only) a statistical background (in terms of concepts, organization of a data structure, …).
- GSIM could greatly help statisticians use SDMX by assigning a concrete statistical role to concepts before their use in a DSD.
- Using GSIM concepts before their use in a DSD helps in harmonizing the description of a data cube.
- In addition to the concepts already available in GSIM, a further concept (the “Data Content”) could be useful to feed, in a standard and complete way, a Measure of a Data Structure (of macrodata).
- This is what we have done in Istat. The corporate DWH (I.Stat) has almost 3000 “data contents”. In SUM it is possible to search data through different facets:
  - Statistical program
  - Reference population of the data
  - Numerical variables used for the production of a data content
  - Categorical variables used to cross cut data contents
  - Categories of a categorical variable used in data structures
- Furthermore, it is easy to reconstruct the relationships between statistical programs (reuse of data for the computation of other data).
- This year we are including micro data (for the data collection and validation steps).
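Searching by facet, as described above, amounts to filtering a catalogue of data contents on one or more described properties. The sketch below is illustrative only: the catalogue entries, the facet key names, and the `search` function are assumptions, and the reference population given for the activity rate is a placeholder, not a value taken from SUM.

```python
from typing import Dict, List

# Hypothetical catalogue; facet names and the second population are assumptions.
CATALOGUE: List[Dict[str, object]] = [
    {
        "data_content": "Monthly average household expenditures",
        "statistical_program": "Household budget survey",
        "population": "households",
        "numerical_variables": ["monthly expenditures"],
    },
    {
        "data_content": "Activity rate",
        "statistical_program": "Labour Force survey",
        "population": "persons (placeholder)",
        "numerical_variables": [],
    },
]


def search(catalogue: List[Dict[str, object]], **facets: str) -> List[str]:
    """Return the data contents whose entries match every requested facet."""

    def matches(entry: Dict[str, object], key: str, wanted: str) -> bool:
        value = entry.get(key)
        if isinstance(value, list):
            # Multi-valued facet: match if the wanted value is one of them.
            return wanted in value
        return value == wanted

    return [
        str(entry["data_content"])
        for entry in catalogue
        if all(matches(entry, key, wanted) for key, wanted in facets.items())
    ]


print(search(CATALOGUE, statistical_program="Labour Force survey"))
print(search(CATALOGUE, numerical_variables="monthly expenditures"))
```

Combining several keyword arguments narrows the result, since an entry must match every facet passed in.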

9 Additional slide – the whole SUM system

[Diagram of the whole SUM system; legend: Almost completed, Started 2015]
