Petr Elias Czech Statistical Office


1 WHAT IS METADATA?
Petr Elias, Czech Statistical Office

2 The main goal of statistics
...to produce statistical data, interpret them and make them available to users.

3 Example of data to be published
Average household size, Netherlands: 2,7
Survey: Census 2011
Preliminary results
Published: January 2012

4 Statistical metadata
Data about statistical data (OECD definition)
Information about (UNECE definition):
data
processes (of producing and using data)
tools involved
Example:
Average household size, Netherlands: 2,7
Survey: Census 2011
Preliminary results
Published: January 2012
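The household-size example can be sketched in code to show how little the bare number says without its metadata. This is only an illustration: the class name `PublishedFigure`, its field names and the `describe` helper are assumptions made for this sketch, not part of any OECD or UNECE definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedFigure:
    """One published number together with the metadata that explains it."""
    value: float          # the data itself
    indicator: str        # everything below is metadata
    reference_area: str
    survey: str
    status: str
    published: str


figure = PublishedFigure(
    value=2.7,
    indicator="Average household size",
    reference_area="Netherlands",
    survey="Census 2011",
    status="Preliminary results",
    published="January 2012",
)


def describe(f: PublishedFigure) -> str:
    """Render the figure the way a user needs to see it: value plus context."""
    return (f"{f.indicator}, {f.reference_area}: {f.value} "
            f"({f.survey}, {f.status}, published {f.published})")


print(describe(figure))
```

Dropping every field except `value` leaves just 2.7, which a user cannot search for or interpret.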

5 Users of metadata
Data providers and interviewers: questionnaires
Statisticians: data processing and analyses
End-users: search for and understanding of data

6 Metadata coverage (GSBPM*)
* GSBPM = Generic Statistical Business Process Model Source:

7 Categories of metadata
Structural metadata = identification and description of data
Reference metadata = description of the content and the quality of data

8 Structural metadata
Must be associated with data
= Names of columns / dimensions
Necessary for:
identification, retrieval and navigation through the data
understanding the data from matrices and data cubes

9 Structural metadata – example
Nights spent by non-EU residents inside EU – per population

              2005   2006   2007   2008   2009   2010
EU-27          427    459    458    441    419    473
Austria       1109   1175   1162   1186   1121   1223
Finland        319    379    417    446    401    422
Germany        220    249    246    251    234    268
Hungary        266    270    253    248    213    243
Italy          726    777    772    734    693    755
Slovakia       134    152    151    135    103    117
Slovenia       481    529    585    653    579    632

One value is missing in each of the following rows, so the remaining values cannot be aligned to years:
Estonia        287    359    356    374    483
Luxembourg     472    497    533    519    447
Netherlands    310    327    329    289    316
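A minimal sketch of why the column and row names matter: once each number is keyed by named dimensions, a single cell can be retrieved unambiguously. The names `DIMENSIONS`, `observations` and `lookup` are assumptions made for this illustration; the values are a few cells copied from the table above.

```python
# Structural metadata: the names of the dimensions that identify each cell.
DIMENSIONS = ("reference_area", "year")
MEASURE = "Nights spent by non-EU residents inside EU, per population"

# A few cells from the table, keyed in the order given by DIMENSIONS.
observations = {
    ("EU-27", 2005): 427,
    ("EU-27", 2010): 473,
    ("Austria", 2005): 1109,
    ("Austria", 2010): 1223,
    ("Germany", 2005): 220,
    ("Germany", 2010): 268,
}


def lookup(**key):
    """Retrieve one cell; every dimension must be named explicitly."""
    missing = [d for d in DIMENSIONS if d not in key]
    if missing:
        raise KeyError(f"missing dimension(s): {', '.join(missing)}")
    return observations[tuple(key[d] for d in DIMENSIONS)]


print(MEASURE)
print(lookup(reference_area="Austria", year=2010))
```

Without `DIMENSIONS` and `MEASURE`, the dictionary is just a set of unlabelled numbers.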

10 Reference metadata
"Documentation" covering:
Concepts, e.g. definitions, practical implementation of concepts
Methodology, e.g. sampling, collection methods, editing processes
Quality, e.g. timeliness, accuracy

11 Reference metadata – example
ESMS reference metadata structure used by Eurostat (for Census)
Contact: organisation, organisation unit, name, person function, mail address, email address, phone number, fax number
Metadata update: last certified, last posted, last update
Statistical presentation: data description, classification system, coverage – sector, statistical concepts and definitions, statistical unit, statistical population, reference area, coverage – time, base period
Unit of measure
Reference period
Institutional mandate: legal acts and other agreements, data sharing
Confidentiality: policy, data treatment
Release policy: release calendar, release calendar access, release policy – user access
Frequency of dissemination
Dissemination format: news release, publications, online database, microdata access, other
Accessibility of documentation: documentation on methodology, documentation on quality management
Quality management: quality assurance, assessment
Relevance: user needs, user satisfaction, completeness
Accuracy: overall accuracy, sampling error, non-sampling error
Timeliness and punctuality: timeliness, punctuality
Comparability: geographical, over time
Coherence: cross domain, internal
Cost and burden
Data revision: practice
Statistical processing: source data, frequency of data collection, data collection, data validation, data compilation, adjustment
Comment
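A structure like this can also be used as a checklist. The sketch below checks a draft reference metadata report against a small subset of the concepts listed above. The subset, the function name `missing_items` and the report content are all invented for illustration; this is not a Eurostat tool or the full ESMS structure.

```python
# Illustrative subset of the concepts listed above; not the full structure.
ESMS_SUBSET = {
    "Contact": ["organisation", "organisation unit", "name"],
    "Timeliness and punctuality": ["timeliness", "punctuality"],
    "Comparability": ["geographical", "over time"],
}


def missing_items(report):
    """Return 'concept / sub-item' labels that are absent or blank in the report."""
    gaps = []
    for concept, sub_items in ESMS_SUBSET.items():
        filled = report.get(concept, {})
        for sub_item in sub_items:
            if not str(filled.get(sub_item, "")).strip():
                gaps.append(f"{concept} / {sub_item}")
    return gaps


# Invented report content, for illustration only.
draft_report = {
    "Contact": {
        "organisation": "Example NSI",
        "organisation unit": " ",
        "name": "A. Example",
    },
    "Timeliness and punctuality": {
        "timeliness": "placeholder text",
        "punctuality": "placeholder text",
    },
}

gaps = missing_items(draft_report)
for gap in gaps:
    print("missing:", gap)
```

Because every report follows the same structure, the same check works for any survey, which is one practical benefit of a standard reference metadata structure.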

12 Standardisation of metadata
Exercise 1
Team A: Why standardisation?
Team B: Who develops and implements standards?
Team C: What can be standardised?

13 Standardisation of metadata – WHY?
Common vocabulary
Comparability
Data exchange – compatibility
Reduction of costs

14 Standardisation of metadata – WHO?
Major players
International organisations
World: UNECE, OECD, IMF, WCO, WHO, World Bank, BIS...
EU: European Commission (Eurostat), ECB...
National statistical institutes: participation in international standardisation projects

15 Standardisation of metadata – WHAT?
Content: code lists & classifications (Neuchâtel model, SDMX), variables
Technology: file structure & format (XML), editing applications (Metadata handler), transmission standards (SDMX)

16 Common vocabulary
Population: Survey sample, Statistical unit / person, ...
Statistical measure: Total, Index, Median, ...
Variables: Economic activity, Marital status, ...
Measurement unit: Piece, %, Euro, ...
Classifications: NACE Rev.2, ...
Code lists: Marital status value

17 Common vocabulary Terminology (1/2)
Population: Population is the total membership or population or "universe" of a defined class of people, objects or events.
Target population (= scope of the survey): A target population is the population outlined in the survey, i.e. the objects about which information is to be sought.
Survey population (= coverage of the survey): A survey population is the population from which information can be obtained in the survey.
Survey sample: A sample is a subset of a population where elements are selected based on a randomised process with a known probability of selection.
Statistical unit: An object of a statistical survey and the bearer of statistical characteristics. The statistical unit is the basic unit of statistical observation within a statistical survey.
Observation unit: Observation units are those entities on which information is received and statistics are compiled (e.g. establishment, person).
Reporting unit: Reporting units are units that supply the data for a given survey instance (e.g. enterprise, person).
Analytical unit: Analytical units represent real or artificially constructed units for which statistics are compiled (e.g. corporation, person).
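The difference between target population, survey population and survey sample can be sketched in a few lines. Everything here is hypothetical: the ten household identifiers, the two unreachable units and the helper `draw_sample` are invented for illustration, and simple random sampling without replacement is only one possible randomised process with a known selection probability.

```python
import random

# Target population: the units the survey wants to describe (hypothetical).
target_population = {f"household-{i}" for i in range(1, 11)}

# Survey population: the units that can actually be reached.
unreachable = {"household-3", "household-8"}
survey_population = sorted(target_population - unreachable)


def draw_sample(units, n, seed):
    """Simple random sample without replacement.

    Every unit has the same, known inclusion probability n / len(units).
    """
    rng = random.Random(seed)
    sample = rng.sample(units, n)
    inclusion_probability = n / len(units)
    return sample, inclusion_probability


sample, probability = draw_sample(survey_population, n=4, seed=2011)
print(len(target_population), len(survey_population), len(sample), probability)
```

The two unreachable households are in scope but not covered, which is exactly the gap between the target population and the survey population.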

18 Common vocabulary Terminology (2/2)
Variable: A variable is a characteristic of a unit being observed that may assume more than one of a set of values to which a numerical measure or a category from a classification can be assigned (e.g. income, age, weight, etc. and "occupation", "industry", "disease").
Measurement unit: A measurement unit has a type (e.g. currency: Euro, ...) and provides the level of detail (e.g. Euro, 1000 Euro) for the value of the variable.
Classification: A classification is a set of discrete, exhaustive and mutually exclusive observations, which can be assigned to one or more variables to be measured in the collation and/or presentation of data.
Code list: A code list is a predefined list from which some statistical concepts (coded concepts) take their values.
Statistical measure: A summary (means, mode, total, index etc.) of the individual quantitative variable values for the statistical units in a specific group (study domains).
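These terms fit together as follows in a small sketch: a coded variable takes its values from a code list, a quantitative variable carries a measurement unit, and a statistical measure summarises it. The codes `S`, `M`, `D`, `W`, the four records and the function names are invented for illustration and do not come from any official code list.

```python
from statistics import median

# Hypothetical code list for the variable "marital status".
MARITAL_STATUS = {"S": "Single", "M": "Married", "D": "Divorced", "W": "Widowed"}

# Measurement unit of the variable "income": type Euro, level of detail 1000 Euro.
INCOME_UNIT_MULTIPLIER = 1000

# Invented records; each one describes a statistical unit (a person).
records = [
    {"marital_status": "S", "income": 18.0},
    {"marital_status": "M", "income": 25.5},
    {"marital_status": "X", "income": 31.0},  # "X" is not in the code list
    {"marital_status": "W", "income": 40.0},
]


def invalid_codes(rows):
    """Return the records whose coded value is not in the code list."""
    return [r for r in rows if r["marital_status"] not in MARITAL_STATUS]


def median_income_in_euro(rows):
    """Statistical measure: median income, converted from 1000 Euro to Euro."""
    return median(r["income"] for r in rows) * INCOME_UNIT_MULTIPLIER


print(invalid_codes(records))
print(median_income_in_euro(records))
```

Reading the median without the measurement unit would understate it by a factor of 1000, which is why the unit belongs to the metadata of the variable.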

19 Common vocabulary – Neuchâtel model
Developed by: Neuchâtel Group, RuN Software Werkstatt

20 Common vocabulary – Neuchâtel model
Purpose: to define a common language and a common perception of the structure of classifications and the links among them

21 Common vocabulary – Neuchâtel model
Classification Family
Classification
Classification Version
Classification Variant
Classification Level
Classification Item
Correspondence Table
Correspondence Item
Classification Index
Classification Index Entry
Case Law

22 Common vocabulary – Neuchâtel model
Terminology
Classification family: A classification family comprises a number of classifications, which are related from a certain point of view.
Classification: A classification describes the ensemble of one or several consecutive classification versions. It is a "name" which serves as an umbrella for the classification version(s).
Classification version: A classification version is a list of mutually exclusive categories representing the version-specific values of the classification variable. A classification version has a certain normative status and is valid for a given period of time.
Classification level: A classification structure (classification version or classification variant) is composed of one or several levels. In a hierarchical classification the items of each level but the highest are aggregated to the nearest higher level. A linear classification has only one level.
Classification variant: A classification variant is based on a classification version. In a variant, the categories of the classification version are split, aggregated or regrouped to provide additions or alternatives to the standard order and structure of the base version.
Correspondence table: A correspondence table expresses the relationship between different versions or variants of the same classification or between versions or variants of different classifications.
Classification index: A classification index is an ordered list (alphabetical, in code order etc.) of classification index entries. A classification index relates to one particular classification version or variant.
Case law: Case law is an agreed assignment of a classification item to a phenomenon that is not easy for users to classify. The aim is to have a standardised explanation of classifications.
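Two of these definitions can be sketched in code: aggregating items to the nearest higher level, and a correspondence table between versions. The classification `DEMO_V1`, its codes and labels, and the target codes `X1`, `Y1`, `Y2` are hypothetical; the class and function names are assumptions for this illustration and are not taken from the Neuchâtel model documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassificationItem:
    code: str
    label: str
    level: int                    # 1 = highest level
    parent: Optional[str] = None  # code of the item one level up


# Hypothetical two-level (hierarchical) classification version.
DEMO_V1 = {
    item.code: item
    for item in [
        ClassificationItem("A", "Group A", 1),
        ClassificationItem("B", "Group B", 1),
        ClassificationItem("A1", "Item A1", 2, parent="A"),
        ClassificationItem("A2", "Item A2", 2, parent="A"),
        ClassificationItem("B1", "Item B1", 2, parent="B"),
    ]
}


def aggregate_to_parent(counts, version):
    """Aggregate counts per item to the nearest higher level."""
    totals = {}
    for code, count in counts.items():
        parent = version[code].parent
        if parent is None:
            raise ValueError(f"{code} is already at the highest level")
        totals[parent] = totals.get(parent, 0) + count
    return totals


# Hypothetical correspondence table: DEMO_V1 code -> codes in a later version.
CORRESPONDENCE_V1_TO_V2 = {"A1": ["X1"], "A2": ["X1"], "B1": ["Y1", "Y2"]}

# Items that are split in the later version need extra information to recode.
split_items = [c for c, targets in CORRESPONDENCE_V1_TO_V2.items() if len(targets) > 1]

print(aggregate_to_parent({"A1": 3, "A2": 4, "B1": 5}, DEMO_V1))
print(split_items)
```

In this sketch `A1` and `A2` are merged into one later code while `B1` is split into two, the two typical cases a correspondence table has to express.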

23 Common vocabulary – SDMX

24 Common vocabulary – SDMX
Standardisation of:
Metadata common vocabulary
Cross-domain concepts
Cross-domain code lists
Data structure definitions (structural metadata)
Metadata structure definitions (reference metadata)
File format (XML)
Tools
More information –

25 Any questions?

