Petr Elias Czech Statistical Office

Presentation transcript:

WHAT IS METADATA? Petr Elias, Czech Statistical Office

The main goal of statistics? To produce statistical data, interpret them and make them available to users.

Example of data to be published: Average household size, Netherlands: 2.7. Survey: Census 2011 (preliminary results). Published: January 2012.

Statistical metadata
Data about statistical data (OECD definition).
Information about data, processes (of producing and using data) and tools involved (UNECE definition).
Example: Average household size, Netherlands: 2.7. Survey: Census 2011 (preliminary results). Published: January 2012.

Users of metadata
Data providers and interviewers
Statisticians: questionnaires
Statisticians: data processing and analyses
End-users: search for and understanding of data

Metadata coverage (GSBPM*) * GSBPM = Generic Statistical Business Process Model Source: http://www.unece.org

Categories of metadata
Structural metadata = identification and description of data
Reference metadata = description of the content and the quality of data

Structural metadata = names of columns / dimensions
Must be associated with data.
Necessary for:
identification, retrieval and navigation through the data
understanding the data from matrices and data cubes

Structural metadata – example
Nights spent by non-EU residents inside EU – per 1,000 population

              2005   2006   2007   2008   2009   2010
EU-27          427    459    458    441    419    473
Austria       1109   1175   1162   1186   1121   1223
Finland        319    379    417    446    401    422
Germany        220    249    246    251    234    268
Hungary        266    270    253    248    213    243
Italy          726    777    772    734    693    755
Luxembourg     472    497    533    519    447      :
Slovakia       134    152    151    135    103    117
Slovenia       481    529    585    653    579    632

Estonia*: 287, 359, 356, 374, 483
Netherlands*: 310, 327, 329, 289, 316
* Only five of the six yearly values survive in the transcript for these two countries; which year is missing is unclear.
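As an illustration added to this transcript (not part of the original slides), the following minimal Python sketch shows how structural metadata, here the dimension names, lets a user address one cell of the table. The dimension names `geo` and `year` and the helper `lookup` are hypothetical; the figures are a few cells taken from the table above.

```python
# Structural metadata: names of the dimensions and of the measure.
# "geo", "year" and the function name are illustrative assumptions.
STRUCTURE = {
    "dimensions": ("geo", "year"),
    "measure": "Nights spent by non-EU residents inside EU, per 1,000 population",
}

# Observations keyed by dimension values (a few cells from the slide).
OBSERVATIONS = {
    ("EU-27", 2005): 427,
    ("EU-27", 2010): 473,
    ("Austria", 2005): 1109,
    ("Austria", 2010): 1223,
    ("Slovenia", 2010): 632,
}


def lookup(**key):
    """Return one observation, addressed by dimension name."""
    missing = [d for d in STRUCTURE["dimensions"] if d not in key]
    if missing:
        raise KeyError(f"missing dimension(s): {missing}")
    return OBSERVATIONS[tuple(key[d] for d in STRUCTURE["dimensions"])]


print(lookup(geo="Austria", year=2010))  # 1223
```

Without the dimension names, the number 1223 could not be identified or retrieved; that is what "must be associated with data" means in practice.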

Reference metadata
"Documentation" covering:
Concepts, e.g. definitions, practical implementation of concepts
Methodology, e.g. sampling, collection methods, editing processes
Quality, e.g. timeliness, accuracy

Reference metadata – example
ESMS reference metadata structure used by Eurostat (for Census)
Contact: organisation; organisation unit; name; person function; mail address; email address; phone number; fax number
Metadata update: last certified; last posted; last update
Statistical presentation: data description; classification system; coverage – sector; statistical concepts and definitions; statistical unit; statistical population; reference area; coverage – time; base period
Unit of measure
Reference period
Institutional mandate: legal acts and other agreements; data sharing
Confidentiality: policy; data treatment
Release policy: release calendar; release calendar access; release policy – user access
Frequency of dissemination
Dissemination format: news release; publications; online database; microdata access; other
Accessibility of documentation: documentation on methodology; documentation on quality management
Quality management: quality assurance; assessment
Relevance: user needs; user satisfaction; completeness
Accuracy: overall accuracy; sampling error; non-sampling error
Timeliness and punctuality: timeliness; punctuality
Comparability: geographical; over time
Coherence: cross domain; internal
Cost and burden
Data revision practice
Statistical processing: source data; frequency of data collection; data collection; data validation; data compilation; adjustment
Comment
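One practical use of a fixed reference metadata structure is checking that a report covers the expected concepts. The sketch below is an added illustration, not Eurostat's tooling: the selection of "required" concepts, the report content and the function name `missing_concepts` are all assumptions; only the concept names come from the ESMS structure above.

```python
# A handful of concept names from the ESMS structure on the slide.
# Which concepts are treated as required here is an assumption.
REQUIRED_CONCEPTS = [
    "Contact",
    "Statistical presentation",
    "Unit of measure",
    "Reference period",
    "Accuracy",
]

# An invented, deliberately incomplete reference metadata report.
report = {
    "Contact": {"organisation": "Example Statistical Office"},
    "Unit of measure": "Persons per household",
    "Reference period": "2011",
}


def missing_concepts(report, required):
    """Return the required concepts that are absent or empty in the report."""
    return [concept for concept in required if not report.get(concept)]


print(missing_concepts(report, REQUIRED_CONCEPTS))
# ['Statistical presentation', 'Accuracy']
```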

Standardisation of metadata – Exercise 1
Team A: Why standardisation?
Team B: Who develops and implements standards?
Team C: What can be standardised?

Standardisation of metadata – WHY?
Common vocabulary
Comparability
Data exchange – compatibility
Reduction of costs

Standardisation of metadata – WHO?
Major players:
International organisations
World: UNECE, OECD, IMF, WCO, WHO, World Bank, BIS...
EU: European Commission (Eurostat), ECB...
National statistical institutes: participation in international standardisation projects

Standardisation of metadata – WHAT?
Content: code lists & classifications (Neuchâtel model, SDMX); variables
Technology: file structure & format (XML); editing applications (Metadata handler); transmission standards (SDMX)

Common vocabulary
Population, Survey sample, Statistical unit / person, ...
Statistical measure: Total, Index, Median, ...
Variables: Economic activity, Marital status, ...
Measurement unit: Piece, %, Euro, ...
Classifications: NACE Rev.2, ...
Code lists: Marital status value

Common vocabulary – Terminology (1/2)
Population: A population is the total membership or "universe" of a defined class of people, objects or events.
Target population (= scope of the survey): A target population is the population outlined in the survey, i.e. the objects about which information is to be sought.
Survey population (= coverage of the survey): A survey population is the population from which information can be obtained in the survey.
Survey sample: A sample is a subset of a population where elements are selected based on a randomised process with a known probability of selection.
Statistical unit: A statistical unit is an object of a statistical survey and the bearer of statistical characteristics. It is the basic unit of statistical observation within a statistical survey.
Observation unit: Observation units are those entities on which information is received and statistics are compiled (e.g. establishment, person).
Reporting unit: Reporting units are units that supply the data for a given survey instance (e.g. enterprise, person).
Analytical unit: Analytical units represent real or artificially constructed units for which statistics are compiled (e.g. corporation, person).

Common vocabulary – Terminology (2/2)
Variable: A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (e.g. income, age, weight, etc. and "occupation", "industry", "disease").
Measurement unit: A measurement unit has a type (e.g. currency: Euro, …) and provides the level of detail (e.g. Euro, 1000 Euro) for the value of the variable.
Classification: A classification is a set of discrete, exhaustive and mutually exclusive observations, which can be assigned to one or more variables to be measured in the collation and/or presentation of data.
Code list: A code list is a predefined list from which some statistical concepts (coded concepts) take their values.
Statistical measure: A summary (means, mode, total, index etc.) of the individual quantitative variable values for the statistical units in a specific group (study domains).
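To show how a variable, a code list and a statistical measure fit together, here is a small added Python sketch. The codes (`S`, `M`, `W`, `D`), the observed values and the function name `invalid_codes` are invented for illustration and do not come from any official code list.

```python
from collections import Counter

# Hypothetical code list for the variable "marital status".
MARITAL_STATUS = {
    "S": "Single",
    "M": "Married",
    "W": "Widowed",
    "D": "Divorced",
}


def invalid_codes(values, code_list):
    """Return the observed values that are not defined in the code list."""
    return [v for v in values if v not in code_list]


# Invented observations for five statistical units (persons).
observed = ["S", "M", "M", "D", "X"]

invalid = invalid_codes(observed, MARITAL_STATUS)
valid = [v for v in observed if v in MARITAL_STATUS]

# A simple statistical measure: the total number of units per category.
totals = Counter(valid)

print(invalid)      # ['X']
print(totals["M"])  # 2
```

Because the coded concept may only take values from the predefined list, the stray value `X` can be detected before any measure is computed.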

Common vocabulary – Neuchâtel model Developed by Neuchâtel Group RuN Software Werkstatt

Common vocabulary – Neuchâtel model Purpose to define common language & common perception of the structure of classifications and links among them

Common vocabulary – Neuchâtel model
Object types: Classification Family, Classification, Classification Version, Classification Variant, Classification Level, Classification Item, Correspondence Table, Correspondence Item, Classification Index, Classification Index Entry, Case Law

Common vocabulary – Neuchâtel model – Terminology
Classification family: A classification family comprises a number of classifications which are related from a certain point of view.
Classification: A classification describes the ensemble of one or several consecutive classification versions. It is a "name" which serves as an umbrella for the classification version(s).
Classification version: A classification version is a list of mutually exclusive categories representing the version-specific values of the classification variable. A classification version has a certain normative status and is valid for a given period of time.
Classification level: A classification structure (classification version or classification variant) is composed of one or several levels. In a hierarchical classification the items of each level but the highest are aggregated to the nearest higher level. A linear classification has only one level.
Classification variant: A classification variant is based on a classification version. In a variant, the categories of the classification version are split, aggregated or regrouped to provide additions or alternatives to the standard order and structure of the base version.
Correspondence table: A correspondence table expresses the relationship between different versions or variants of the same classification or between versions or variants of different classifications.
Classification index: A classification index is an ordered list (alphabetical, in code order etc.) of classification index entries. A classification index relates to one particular classification version or variant.
Case law: Case law is an agreed assignment of a classification item to a phenomenon that is not easy for users to classify. The aim is to have a standardised explanation of classifications.
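The relationship between levels, items and correspondence tables can be sketched in a few lines of Python. This is an added illustration under stated assumptions, not an implementation of the Neuchâtel model: the class `ClassificationItem`, the function `aggregate` and all codes (`A`, `A1`, `X1`, ...) are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    level: int             # 1 = highest level
    parent: Optional[str]  # code of the item at the nearest higher level


# A hypothetical two-level classification version (all codes are invented).
ITEMS = {
    item.code: item
    for item in [
        ClassificationItem("A", 1, None),
        ClassificationItem("B", 1, None),
        ClassificationItem("A1", 2, "A"),
        ClassificationItem("A2", 2, "A"),
        ClassificationItem("B1", 2, "B"),
    ]
}


def aggregate(values, items):
    """Sum the values of lower-level items into their parent items."""
    totals = {}
    for code, value in values.items():
        parent = items[code].parent
        if parent is None:
            raise ValueError(f"{code} is already at the highest level")
        totals[parent] = totals.get(parent, 0) + value
    return totals


# A hypothetical correspondence table to another (invented) version:
# one source item may correspond to one or several target items.
CORRESPONDENCE = {"A1": ["X1"], "A2": ["X1", "X2"], "B1": ["X3"]}

print(aggregate({"A1": 10, "A2": 5, "B1": 7}, ITEMS))  # {'A': 15, 'B': 7}
```

The `parent` link is what makes the classification hierarchical; a linear classification would simply have every item at level 1 with no parent.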

Common vocabulary – SDMX

Common vocabulary – SDMX
Standardisation of:
Metadata common vocabulary
Cross-domain concepts
Cross-domain code lists
Data structure definitions (structural metadata)
Metadata structure definitions (reference metadata)
File format (XML)
Tools
More information: www.sdmx.org

Any questions?