Presentation on theme: "CLASSIFICATIONS – a key element in the process of harmonization", Isabel Valente, Statistics Portugal/Metadata Unit. Presentation transcript:
CLASSIFICATIONS – a key element in the process of harmonization
Isabel Valente (email@example.com)@ine.pt, Statistics Portugal/Metadata Unit
Work Session on Quality management systems (Q2010), Helsinki, 3–6 May 2010
Fig. 1 Macro architecture of the Statistical Metadata System (1)
(1) In: Morgado, Isabel, “Metadata and survey documentation Portuguese NSI experience”, European Conference on Quality and Methodology in Official Statistics (Q2004), 24-26 May 2004, Mainz, Germany.
Integrated System of Statistical Classifications (SINE)
- conceptual model developed by the Neuchâtel group
SINE main phases
2002-2004
- development of the consultation application
- replacement of the existing information on classifications in the Portal
2004-2005
- expansion of the information made available
- start of the gradual incorporation of code lists
- start of the development of the management application
SINE main phases
2006-2007
- consolidation of the management application
- small adjustments and improvements in the consultation application
Current phase (2008)
- consolidation and improvement of the existing model
- harmonization of the existing information
SINE main purposes
1. be a reference base for national, Community and international classifications for statistical purposes
2. be a reference instrument for the management of classifications
3. be an instrument for the harmonization and coordination of classifications
SINE structure
- Family
- Classification
- Version
- Level
- Item
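The five elements of the SINE structure can be read as a containment hierarchy. The following is a minimal sketch, assuming the nesting used in the Neuchâtel model (a family groups classifications, a classification has versions, a version has levels, a level holds items). The class names, field names and the "Labour market" family are illustrative assumptions, not SINE's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    code: str
    label: str


@dataclass
class Level:
    number: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Version:
    identifier: str  # e.g. "V00253"
    name: str
    levels: List[Level] = field(default_factory=list)


@dataclass
class Classification:
    name: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class Family:
    name: str
    classifications: List[Classification] = field(default_factory=list)


# Hypothetical example data; only the version identifier and name come from the slides.
version = Version(
    "V00253",
    "Activity status, 2005",
    levels=[Level(1, [Item("1", "Actives"), Item("2", "Inactives")])],
)
family = Family("Labour market", [Classification("Activity status", [version])])
```

Keeping items under a level, rather than directly under a version, makes it possible to publish the same version at different depths of detail.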
- Classifications
- Code lists for observation
- Code lists for dissemination
What’s the difference between a classification and a code list?
General ideas
Classifications
- more conceptual
- have a formal basis
- complex structures
- large size
- have a coding system
- have formalized rules about revisions and changes
- versions are defined
Code lists
- less conceptual
- do not have a formal basis
- simple structures
- small size
- may or may not have a coding system
- do not have formalized rules about revisions and changes
- are not based on the idea of a version
- operational lists for the internal use of the institution
Examples:
- Marital status
- Degree of relationship with the representative of the household
- Ranks of turnover
- Size classes of persons employed
- Sex
What to do? Should those cases be considered classifications or code lists?
Classifications: structures that are based on
- Community or national regulations
- methodological manuals
- Community or international recommendations
Reference structures
Consequence
- The remaining structures (code lists) were, whenever possible, aligned with those structures.
Problem encountered
- Access to the code lists for the dissemination of data comes in 1st place.
- Access to the classification structure that is part of a recommendation or regulation comes in 2nd place.
Another problem
How to distinguish standard classifications or reference structures from those code lists?
Solution
- Find distinctive elements in the version names
- Rules for the writing of names (naming convention)
Constituent elements of the version name
General form
Main part [+ “,” + formal qualifier] [+ “(” + informal qualifier + “)”] [+ “-” + variant n]
Qualifier
Examples:
- Nomenclature of territorial units for statistics, 2002 version
- International standard classification of education, 1997 (levels of education)
- Types of dwellings (4)
Specific form: variant
The variant is always the last part of the name and is formed by: “–” + the word “variant” + the variant number
Examples:
- CAE Rev.2 (sections C to E) – variant 1
- Classes of net monthly wages (IEFA, €) - variant 1
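The general form of a version name can be sketched as a small function that assembles the optional parts in a fixed order. This is only an illustration of the naming convention: the function and parameter names are assumptions, and a plain hyphen is used before "variant" although the slides show both "-" and "–".

```python
from typing import Optional


def build_version_name(
    main_part: str,
    formal_qualifier: Optional[str] = None,
    informal_qualifier: Optional[str] = None,
    variant: Optional[int] = None,
) -> str:
    """Assemble a version name from its constituent elements."""
    name = main_part
    if formal_qualifier:
        # Formal qualifier follows a comma, e.g. ", 2002 version".
        name += f", {formal_qualifier}"
    if informal_qualifier:
        # Informal qualifier goes in parentheses.
        name += f" ({informal_qualifier})"
    if variant is not None:
        # The variant is always the last part of the name.
        name += f" - variant {variant}"
    return name


print(build_version_name("Nomenclature of territorial units for statistics",
                         formal_qualifier="2002 version"))
print(build_version_name("Classes of net monthly wages",
                         informal_qualifier="IEFA, €", variant=1))
```

Building names from parts, rather than typing them freely, is one way to keep the order of the elements identical across all versions.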
Rules for the writing of names
Reference structures
- keep the original and official name
- keep the word “nomenclature” or “classification” in the name
- informal qualifiers are added to distinguish national classifications from Community ones
Code lists
- may or may not keep the original name
- must not have the word “nomenclature” or “classification” in the name
- informal qualifiers are added to distinguish the code lists
- if they are variants of a reference structure, they keep the name or acronym of that classification
- the names should be general
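The rule on the words "nomenclature" and "classification" lends itself to an automatic check. The sketch below is a simplification under stated assumptions: it only looks for the two reserved words, and it would wrongly reject a reference structure that is known by its acronym alone. The function names are hypothetical.

```python
RESERVED_WORDS = ("nomenclature", "classification")


def uses_reserved_word(name: str) -> bool:
    """Return True if the name contains "nomenclature" or "classification"."""
    lowered = name.lower()
    return any(word in lowered for word in RESERVED_WORDS)


def check_name(name: str, is_code_list: bool) -> bool:
    """Reference structures keep a reserved word; code lists must avoid it."""
    return uses_reserved_word(name) != is_code_list


# A reference structure and a code list that both follow the rule.
print(check_name("Nomenclature of territorial units for statistics, 2002 version",
                 is_code_list=False))
print(check_name("Types of dwellings (4)", is_code_list=True))
```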
Another problem
Lack of harmonization in the way classifications and code lists are written, as well as in their contents
1. Harmonization of the names of
– classifications
– versions
– item labels
SINE internal rules for the writing of classification and version names
Names begin with a capital letter, followed by lower-case letters. Exceptions: acronyms, proper names and words that follow a full stop.
Examples:
- V00011 - Statistical classification of products by activity in the European Economic Community, 2002 version
- V00021 - International standard industrial classification of all economic activities, revision 3.1
SINE internal rules for the writing of classification and version names
The names of code lists should use the plural form.
Example:
- V01610 - Types of primary and lower secondary education
Code lists derived from a standard classification must keep the acronym or name of the standard classification in their own name.
Examples:
- V01675 - CAE Rev. 3 (total, sections C to N) - variant 2
- V01717 - CPA 2008 (legal services) - variant 7
SINE internal rules for the writing of classification and version names
Those code lists must include the word “variant” in their name.
Example:
- V02023 - Activity status (IEFA) - variant 4
Cumulative structures must include the expression “cumulative” in their name.
Example:
- V02069 - Countries (cumulative - air transport companies)
SINE internal rules for the writing of classification and version names
Item labels should be written in full. Abbreviations should be avoided. Exceptions: acronyms or proper names.
Item labels begin with a capital letter, followed by lower-case letters.
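Because the variant is always the last part of the name, it can be recovered with a pattern anchored at the end of the string. The following is a sketch only: the regular expression and the function name are assumptions, and it accepts either "-" or "–" before the word "variant".

```python
import re
from typing import Optional

# Separator, the word "variant" and a number, anchored at the end of the name.
VARIANT_SUFFIX = re.compile(r"\s[-–]\s*variant\s+(\d+)$")


def variant_number(name: str) -> Optional[int]:
    """Return the variant number at the end of a name, or None if there is none."""
    match = VARIANT_SUFFIX.search(name)
    return int(match.group(1)) if match else None


print(variant_number("Activity status (IEFA) - variant 4"))
print(variant_number("Countries (cumulative - air transport companies)"))
```

Anchoring the pattern at the end avoids mistaking a hyphen inside an informal qualifier, as in the cumulative example, for the start of a variant.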
Problems with the names
People give different names to the same things, depending on the perspective they take.
We should harmonize the expressions used and avoid naming the same things in different ways.
Problems with the names
Types of flow
- Type of rail freight traffic
- Type of movement in port
- Type of traffic on the enterprise
Version  Code  Label
00811    T     Total
00811    1     National
00811    2     International
Problems with the names
However, when we have too many versions of the same classification, we need elements to distinguish between them.
Lists of countries
- compulsory harmonization of the codes and labels of the items according to the ISO alpha-2 standard
- the names of countries in Portuguese must be in accordance with the version approved by the Statistical Council
- groupings of countries used in code lists were centrally created and are centrally managed, in order to establish a consistent and harmonized reference base for this purpose
- codes are always independent of the language used, so they remain unchanged in translations
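The idea that codes stay fixed while labels change with the language can be illustrated with one label table per language, all keyed by the same two-letter codes. This is a hedged sketch: the data layout, the function name and the two example countries are assumptions, and the Portuguese labels shown are ordinary usage, not necessarily the wording approved by the Statistical Council.

```python
import re
from typing import Dict

# Shape of an ISO alpha-2 code: exactly two upper-case letters.
ALPHA2 = re.compile(r"^[A-Z]{2}$")

# Illustrative labels only.
labels: Dict[str, Dict[str, str]] = {
    "en": {"PT": "Portugal", "FI": "Finland"},
    "pt": {"PT": "Portugal", "FI": "Finlândia"},
}


def same_codes(translations: Dict[str, Dict[str, str]]) -> bool:
    """Check that every language uses exactly the same set of codes."""
    code_sets = [set(table) for table in translations.values()]
    return all(codes == code_sets[0] for codes in code_sets)


print(all(ALPHA2.match(code) for code in labels["en"]))
print(same_codes(labels))
```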
Activities or products code lists
- code lists derived from standard classifications must keep the same codes and labels as the standard classification where the categories are the same; categories that differ should have different codes and labels
- for the aggregation of consecutive categories, codes are connected by a hyphen (e.g. C-D)
- for the aggregation of non-consecutive categories, codes are connected by “+” (e.g. A+C)
Other code lists
For code lists that belong to the same classification and have no standard classification for reference, the most comprehensive structure is sought. Once found, that structure becomes the reference structure. New code lists that appear are aligned with that structure.
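The two aggregation rules can be sketched as a function that decides between a hyphen and "+" by looking at the position of each code in the standard classification. The function name and its arguments are assumptions for illustration. As a simplification, a selection that mixes consecutive and non-consecutive categories is joined entirely with "+".

```python
from typing import Sequence


def aggregate_code(codes: Sequence[str], ordered_codes: Sequence[str]) -> str:
    """Build the code of an aggregate of categories.

    `ordered_codes` is the full, ordered list of codes at that level
    of the standard classification.
    """
    positions = sorted(ordered_codes.index(code) for code in codes)
    selected = [ordered_codes[p] for p in positions]
    consecutive = all(b - a == 1 for a, b in zip(positions, positions[1:]))
    if len(selected) > 1 and consecutive:
        # Consecutive categories: first and last code joined by a hyphen.
        return f"{selected[0]}-{selected[-1]}"
    # Non-consecutive categories: every code joined by "+".
    return "+".join(selected)


sections = list("ABCDEFG")  # hypothetical ordered section codes
print(aggregate_code(["C", "D"], sections))
print(aggregate_code(["A", "C"], sections))
```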
V00253 - Activity status, 2005
Code  Label
1     Actives
11    Employed
12    Unemployed
121   Unemployed seeking first job
122   Unemployed seeking new job
2     Inactives
21    Pupils/students
22    Homemakers
23    Retired
24    Permanent disabled for work
25    Others
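In the V00253 listing, each code appears to extend the code of its parent category by one digit. Under that assumption (it is an inference from the listing, not a stated SINE rule), the hierarchy can be derived from the codes alone, as in this sketch with a hypothetical `parent_code` helper.

```python
from typing import Dict, Optional

activity_status: Dict[str, str] = {
    "1": "Actives",
    "11": "Employed",
    "12": "Unemployed",
    "121": "Unemployed seeking first job",
    "122": "Unemployed seeking new job",
    "2": "Inactives",
    "21": "Pupils/students",
    "22": "Homemakers",
    "23": "Retired",
    "24": "Permanent disabled for work",
    "25": "Others",
}


def parent_code(code: str, items: Dict[str, str]) -> Optional[str]:
    """Return the longest proper prefix of `code` that is itself a code."""
    for length in range(len(code) - 1, 0, -1):
        if code[:length] in items:
            return code[:length]
    return None


print(parent_code("121", activity_status))
print(parent_code("1", activity_status))
```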
Other code lists
For other code lists where no standard can be found and where the categories vary little, the aim is to keep unchanged the codes and labels of the categories that have not changed.
Use of certain codes for certain situations in code lists
- the total is coded with T
- residual values are preferably coded with 9, or with a code ending in 9
- the reuse of codes and labels from structures already in SINE is encouraged, in preference to new codings and formulations
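The conventions for the total and for residual values can be turned into a simple review aid. The sketch below rests on a loud assumption: it recognises a total or a residual category only from its English label ("Total", or a label starting with "Other"), which is a hypothetical heuristic. It returns warnings rather than errors because the rule for residual values is a preference.

```python
from typing import Dict, List

TOTAL_CODE = "T"


def check_special_codes(items: Dict[str, str]) -> List[str]:
    """Return warnings for totals not coded T and residuals not ending in 9."""
    warnings: List[str] = []
    for code, label in items.items():
        text = label.strip().lower()
        if text == "total" and code != TOTAL_CODE:
            warnings.append(f"{code}: total should be coded {TOTAL_CODE}")
        if text.startswith("other") and not code.endswith("9"):
            warnings.append(f"{code}: residual category preferably coded 9 or ending in 9")
    return warnings


print(check_special_codes({"T": "Total", "1": "National", "2": "International"}))
print(check_special_codes({"0": "Total", "25": "Others"}))
```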
Age groups
- UN, Standard international age classification
- five-year and ten-year age groups, with the boundaries generally beginning at multiples of five and ten and ending at ages four and nine
- ages separated by a hyphen, preceded and followed by a space, which avoids the use of particles and makes the labels more general
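The age-group convention (boundaries at multiples of five or ten, ages joined by a hyphen with a space on each side) can be generated rather than typed. This is a minimal sketch; the function name and its arguments are illustrative assumptions.

```python
from typing import List


def age_groups(start: int, stop: int, width: int) -> List[str]:
    """Labels for consecutive age groups covering ages start to stop - 1."""
    return [f"{low} - {low + width - 1}" for low in range(start, stop, width)]


print(age_groups(0, 20, 5))
print(age_groups(0, 30, 10))
```

With a width of 5 each group ends at an age finishing in 4 or 9, and with a width of 10 at an age finishing in 9, as the rule describes.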
Other size classes
- consecutive classes should be clearly delimited, so the same value should not be repeated in different classes
- every item should state explicitly what is being quantified (e.g. years, euro, persons)
- minimum and maximum thresholds should use standardized expressions:
– in the lowest class, “Less than” (e.g. Less than 30 years)
– in the highest class, “and more” immediately following the last value used (e.g. 65 and more years)
– the signs “<”, “>”, “≤” and “≥” should not be used
Other size classes
- numerical values above one thousand must have their groups of digits separated by a space, to make hundreds, thousands, tens of thousands, millions, etc. easier to read (10 000 000); alternatively, powers of 10 may be used instead (10^6)
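The rules for size classes can be sketched as a label generator that never repeats a value across classes and always names the unit. The function name and its arguments are assumptions, the thresholds are assumed to be whole numbers, and the " - " separator for intermediate classes is borrowed from the age-group convention.

```python
from typing import List, Sequence


def size_class_labels(thresholds: Sequence[int], unit: str) -> List[str]:
    """Build class labels from the ascending lower bounds of every class but the first."""
    labels = [f"Less than {thresholds[0]} {unit}"]
    for low, high in zip(thresholds, thresholds[1:]):
        # The upper bound is high - 1, so no value appears in two classes.
        labels.append(f"{low} - {high - 1} {unit}")
    labels.append(f"{thresholds[-1]} and more {unit}")
    return labels


print(size_class_labels([30, 65], "years"))
```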
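Grouping digits with a space is easy to apply consistently in code. The sketch below uses Python's built-in thousands separator and swaps the comma for a space; the function name is an assumption.

```python
def group_thousands(value: int) -> str:
    """Format an integer with a space between groups of three digits."""
    return f"{value:,}".replace(",", " ")


print(group_thousands(10000000))
print(group_thousands(999))
```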
Conclusions
SINE
- made known what exists in terms of classifications
- widened the term to include code lists
- makes classification structures available:
– in a standardized format
– in an easy way
– at any time
– in accordance with users' needs
Conclusions
Because of that, the following became possible:
– detection and correction of writing errors
– harmonization of the way codes and labels are written
– implementation of some harmonization procedures and rules
– improved clarity and precision of the terms used
– improved integration between code lists and standard classifications
– harmonization of codes and labels between code lists
– reduction of the number of code lists needed, through the creation of generic and cross-cutting structures
– time savings
– greater integration between the different metadata subsystems
Conclusions
Classification systems are a key element for improving the quality and coherence of the existing metadata and the existing information