GSIM implementation in the Istat Metadata System: focus on structural metadata and on the joint use of GSIM and SDMX Mauro Scanu

Presentation transcript:

GSIM implementation in the Istat Metadata System: focus on structural metadata and on the joint use of GSIM and SDMX Mauro Scanu ISTAT – Italian National Institute of Statistics Geneva, UNECE - 5 May 2015

Background
• Istat has been disseminating data and metadata through a Single Exit Point since [year missing in the source]. The SDMX registry was the first centralized structural metadata system in Istat. It consists of structural metadata related to disseminated data.
• In 2010 Istat decided to build a centralized system (SUM) that contains structural metadata for all the data produced by Istat, from data collection up to data dissemination. The system aims at easing:
  - data retrieval
  - metadata harmonization/integration
  - data traceability
• In the meantime GSIM began to be discussed: GSIM claims to be the first internationally endorsed reference framework for statistical information.
• The idea was that SUM should be GSIM compliant. How can the (mandatory) SDMX infrastructure be combined with a GSIM-compliant metadata system?
• In 2013 UNECE organized workgroups on GSIM and other standards. One of them compared GSIM with SDMX (as well as DDI).

Focus of this talk
1. Relationship between SDMX and GSIM
2. What information should be available for a complete and correct definition of the meaning of data, and how to structure it in a unique way for every theme

SUM and GSIM
SUM adopted GSIM terminology and definitions. Most of the concepts came from the GSIM groups Concepts and Structure, and some from the Production group.
Apologies: I do not describe this graph or its differences from GSIM.

SDMX and statistics (and GSIM)
[Diagram relating SDMX artefacts to their statistical (GSIM) content. Labels recovered from the slide:]
SDMX: Data structure definitions (Dimensions, Time dimension, Frequency dimension, Measure dimension (SDMX version 2.1), Primary measure, Attributes); Concept schemes; Code list
GSIM: Statistical variables; Time related concepts; Other concepts (operative/transformation); Data content; Classifications; Time related code lists: CL_FREQ, …; List of transformation methods: CL_ADJUSTMENT, …; List of data contents
COMMENT: GSIM is useful for harmonizing the content of DSDs.

Data Content 1
GSIM defines everything except “data content”. The problem is: what is the complete set of information that defines the meaning of a figure in a table?
In SUM, we introduced the “data content” concept because
• it plays the role of the “title” in the old data tables
• it is defined according to the GSIM specification (lines 47-50): each datum is the result of a Process Step, obtained by applying a Process Method to the necessary Inputs. Hence it is modelled by specifying
  - Statistical Program and Statistical Program Cycle
  - Process Step (phase)
  - Process Method
  - Inputs

Examples:
Monthly average household expenditures
  - Statistical Program: Household budget survey
  - Statistical Program Cycle: Time dimension and Freq
  - Process Step: Dissemination
  - Process Method: Average
  - Inputs: Validated sample of the HBS; Population: households; Num. variable: monthly expenditures
Activity rate
  - Statistical Program: Labour Force survey
  - Statistical Program Cycle: Time dimension and Freq
  - Process Step: Dissemination
  - Process Method: Ratio
  - Inputs: DC1: Active pop; DC2: Pop
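The model on this slide can be sketched as a small record type. The Python below is a minimal illustration, not the SUM implementation: the class name `DataContent`, its field names, and the `describe` helper are assumptions introduced here. Only the field values come from the two examples on the slide, and placing “Time dimension and Freq” in the program-cycle field is an assumption about how the slide's rows line up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataContent:
    """Hypothetical record for one "data content" (all names are assumptions)."""
    name: str
    statistical_program: str
    program_cycle: str          # assumed to hold the "Time dimension and Freq" entry
    process_step: str
    process_method: str
    inputs: Tuple[str, ...]

    def describe(self) -> str:
        """Return a one-line, title-like summary of the data content."""
        joined = "; ".join(self.inputs)
        return (f"{self.name} = {self.process_method} of [{joined}] "
                f"({self.statistical_program}, {self.process_step})")


household_expenditure = DataContent(
    name="Monthly average household expenditures",
    statistical_program="Household budget survey",
    program_cycle="Time dimension and Freq",
    process_step="Dissemination",
    process_method="Average",
    inputs=("Validated sample of the HBS",
            "Population: households",
            "Num. variable: monthly expenditures"),
)

activity_rate = DataContent(
    name="Activity rate",
    statistical_program="Labour Force survey",
    program_cycle="Time dimension and Freq",
    process_step="Dissemination",
    process_method="Ratio",
    inputs=("DC1: Active pop", "DC2: Pop"),
)

print(activity_rate.describe())
```

Keeping the record immutable (`frozen=True`) reflects the idea that a data content acts like a fixed title for a figure; whether SUM treats items this way is not stated in the slides.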

Data Content 2
Data Content feeds the GSIM concept Measure in a Data Structure. We are not aligned with GSIM on this line: “measures correspond to Represented Variables with uncoded Value Domains (Described Value Domains)”.
• SUM maintains a code list “Data Content” in which each item contains all the details listed on the previous slide.
• In this way a user can find the meaning of the data in a hypercube in a single place.
• Furthermore, a data producer has a form to complete when describing a new data content.
• Every data producer should describe the “Data Content” according to the same “model”.
• If the “Data Content” item carries less information than needed, further dimensions or attributes must be included in the data structure to make the description complete.
• If the “Data Content” item carries more information than needed, some dimensions become useless.
• These two deviations from a standard data content correspond to a content “stove pipe” and to the massive use of mappers.
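The idea of a single form that every producer completes in the same way can be illustrated with a small completeness check. This is a sketch under stated assumptions: the field names, the `missing_fields` and `register` helpers, and the code `ACTIVITY_RATE` are hypothetical and are not taken from SUM.

```python
from typing import Dict, List

# Field names are assumptions that mirror the elements listed for a data content.
REQUIRED_FIELDS = (
    "statistical_program",
    "program_cycle",
    "process_step",
    "process_method",
    "inputs",
)


def missing_fields(form: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a submitted form."""
    return [field for field in REQUIRED_FIELDS if not form.get(field)]


def register(code_list: Dict[str, Dict[str, object]], code: str,
             form: Dict[str, object]) -> None:
    """Add a described data content to the code list, rejecting incomplete forms."""
    missing = missing_fields(form)
    if missing:
        raise ValueError(f"{code}: incomplete description, missing {missing}")
    code_list[code] = dict(form)


data_content_codes: Dict[str, Dict[str, object]] = {}

register(data_content_codes, "ACTIVITY_RATE", {   # hypothetical code
    "statistical_program": "Labour Force survey",
    "program_cycle": "Time dimension and Freq",
    "process_step": "Dissemination",
    "process_method": "Ratio",
    "inputs": ["DC1: Active pop", "DC2: Pop"],
})

# A form with gaps: the check names what still has to be supplied.
incomplete = {"statistical_program": "Household budget survey",
              "process_method": "Average"}
print(missing_fields(incomplete))
```

Rejecting incomplete items at registration time is one way to avoid the first deviation described above, where missing information has to be compensated by extra dimensions or attributes in the data structure.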

Conclusions
• SDMX has been a real success for the harmonization of the IT infrastructure for the exchange of data and metadata.
• However, SDMX is a real puzzle for those who have (only) a statistical background (in terms of concepts, organization of a data structure, …).
• GSIM could help statisticians considerably in using SDMX, by assigning a concrete statistical role to concepts before their use in a DSD.
• Using GSIM concepts before they enter a DSD helps in harmonizing the description of a data cube.
• Alongside the concepts already available in GSIM, an additional concept (the “Data Content”) could be useful in order to feed, in a standard and complete way, a Measure of a Data Structure (of macrodata).
• This is what we have done in Istat. The corporate DWH (I.Stat) has almost 3000 “data contents”. In SUM it is possible to search data through different facets:
  - Statistical program
  - Reference population of the data
  - Numerical variables used for the production of a data content
  - Categorical variables used to cross-cut data contents
  - Categories of a categorical variable used in data structures
• Furthermore, it is easy to reconstruct the relationships between statistical programs (reuse of data for the computation of other data).
• This year we are including microdata (for the data collection and validation steps).
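The faceted search mentioned in the conclusions can be pictured with a short sketch. Everything here is illustrative: the catalogue layout, the facet names, and the `search` function are assumptions, and the values marked as invented do not appear in the slides.

```python
from typing import Dict, List

# Illustrative catalogue; the facet names are assumptions.
CATALOGUE: List[Dict[str, object]] = [
    {
        "data_content": "Monthly average household expenditures",
        "statistical_program": "Household budget survey",
        "population": "households",
        "numerical_variables": ["monthly expenditures"],
        "categorical_variables": ["region"],          # invented for illustration
    },
    {
        "data_content": "Activity rate",
        "statistical_program": "Labour Force survey",
        "population": "resident population",          # invented for illustration
        "numerical_variables": [],
        "categorical_variables": ["region", "sex"],   # invented for illustration
    },
]


def search(catalogue: List[Dict[str, object]], **facets: str) -> List[str]:
    """Return the names of data contents that match every requested facet value."""
    results = []
    for item in catalogue:
        for facet, wanted in facets.items():
            value = item.get(facet)
            # List-valued facets match when they contain the wanted value.
            matched = wanted in value if isinstance(value, list) else value == wanted
            if not matched:
                break
        else:
            results.append(item["data_content"])
    return results


print(search(CATALOGUE, categorical_variables="region"))
print(search(CATALOGUE, statistical_program="Labour Force survey",
             categorical_variables="sex"))
```

Combining facets with a logical AND is a design choice made for this sketch; the slides do not say how SUM combines them.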

Additional slide: the whole SUM system
[Diagram of the SUM system. Legend: Almost completed; Started 2015]