Introduction to DDI Mogens Grosen Nielsen,


1 Introduction to DDI
Mogens Grosen Nielsen, Statistics Denmark; Alessio Cardacino, ISTAT. ESTP Training Course “Information standards and technologies for describing, exchanging and disseminating data and metadata”, Rome, 19–22 June 2018. The contractor is acting under a framework contract concluded with the Commission.

2 Agenda
Part 1: Objectives and program; Metadata on statistical information and processes; Vision and strategy; Users, principles and architecture; Introduction to DDI.
Part 2: DDI use cases: DDI used for questionnaires; DDI used to describe a unit dataset; DDI used for cubes; DDI used for editing and reporting quality.

3 The Vision
Statistical information must help users in the “turbulent information-sea”.
Metadata about content and quality must a) help users in their knowledge processes and b) give users precise information about our products.
International standards and standard software must enable a) cost-efficient solutions with few resources, b) sustainable long-term solutions and c) common terminology.

4 Strategy on quality and metadata
Fulfil user needs, comply with quality requirements and increase efficiency.
Principles: a) metadata integrated into GSBPM, b) reuse of metadata, c) metadata used actively.
Standards: GSBPM, GSIM, DDI, SDMX/SIMS.

5 Reusable and active metadata
Active use and reuse of metadata requires improved understanding of: the role of metadata in relation to users; metadata in relation to production processes; metadata terminology.

6 Users of a Statistical Metadata System

7 Business Principles – Code of Practice and Quality Assurance Framework
Institutional environment: P1 professional independence; P2 mandate for data collection; P3 adequacy of resources; P4 quality commitment; P5 statistical confidentiality; P6 impartiality and objectivity.
Statistical procedures: P7 sound methodology; P8 appropriate statistical procedures; P9 non-excessive burden on respondents; P10 cost effectiveness.
Statistical results: P11 relevance; P12 accuracy and reliability; P13 timeliness and punctuality; P14 coherence and comparability; P15 accessibility and clarity.

8 Principle 7. Sound methodology
Indicator 7.1: The overall methodological framework used for European Statistics follows European and other international standards, guidelines, and good practices.
Standard methodological document: the methodological framework and the procedures for implementing statistical processes are integrated into a standard methodological document and periodically reviewed.
Explanation of divergence from international recommendations: divergences from existing European and international methodological recommendations are explained and justified.

9 Principle 7. Sound methodology
Indicator 7.2: Procedures are in place to ensure that standard concepts, definitions and classifications are consistently applied throughout the statistical authority.
Concepts, definitions, and classifications are defined by the Statistical Authority, are applied in accordance with European and/or national legislation and are documented.
A methodological infrastructure.

10 Principle 7. Sound methodology
Indicator 7.4: Detailed concordance exists between national classification systems and the corresponding European systems.
Consistency of national classifications: national classifications are consistent with the corresponding European classification systems.
Correspondence tables: correspondence tables are documented and kept up-to-date. Explanatory notes or comments are made available to the public.

11 Principle 10 Cost effectiveness
Indicator 10.4: Statistical authorities promote and implement standardized solutions that increase effectiveness and efficiency.
Standardization programmes and procedures for statistical processes.
A strategy to adopt or develop standards: there is a strategy to adopt or develop standards in various fields, e.g. quality management, process modeling, software development, software tools, project management and document management.

12 Principle 15. Accessibility and Clarity
Indicator: Statistics and the corresponding metadata are presented, and archived, in a form that facilitates proper interpretation and meaningful comparisons.
Dissemination policy. Consultations of users about dissemination. Training courses for writing interpretations and press releases. A policy for archiving statistics and metadata.

13 Principle 15. Accessibility and Clarity
Indicator: Metadata are documented according to standardized metadata systems.
Dissemination of statistical results and metadata. Metadata linked to the statistical product. Accordance of metadata with European Standards. Metadata independent of the format of publication. Procedures to update and publish metadata. Ability to clarify metadata issues. Training courses for staff on metadata.

14 Selected business Principles on metadata
Reuse: reuse metadata where possible, for statistical integration as well as efficiency reasons.
Statistical business process model: manage metadata with a focus on the overall statistical business process model (GSBPM).
Active metadata: metadata-driven production ensures metadata are up-to-date.

15 Business goals (for the on-going project)
General purpose: to support the modernization and integration of work at EU and national level through the use of GSBPM, GSIM, SDMX and DDI.
Specific objectives: improve and standardise work; improved metadata system through the use of GSBPM, GSIM, DDI and SDMX; improved exchange of statistical documentation with EU.

16 Solution concept
[Diagram: metadata portal at Statistics Denmark. Labels: Metadata internal; Metadata dissemination; Research portal; Edit and use metadata (Subject matter); Edit and use metadata (Customer and research service); Integration in CMS (dst.dk); Intranet (internal metadata portal); Integration of metadata in applications; Quality reporting to Eurostat.]

17 Enterprise architecture: Users, Business processes, applications and technology

18 Simplified definition of statistical metadata (from SDMX glossary)
Reference metadata: conceptual metadata (e.g. definition of income); methodological and processing metadata (e.g. description of data processing); quality metadata (e.g. availability).
Structural metadata: metadata that act as identifiers and descriptors of the data (e.g. names of variables, datasets etc.).

19 Information objects in GSIM

20 Selected information objects from GSIM*)
*) From Standardisation of Variables and Concept Systems in European Social Statistics

21 Selected information objects from GSIM
[Diagram of GSIM information objects and their relationships. Objects: Concept, Variable, Category List, Category, Population, Unit Type (e.g. Person), Value Domain, Represented Variable, Codelist, Code Element, Identification Component, Measure Component, Instance Variable, Logical Record, Unit Dataset, Unit Data Structure, Register. Relationship labels: defined by; consists of; described by; representation described by.]

22 Introduction to DDI

23 DDI: Data Documentation Initiative
What is it? A documentation standard, expressed in the open XML standard. Many years of experience, including use in NSIs.
Advantages: common language and understanding; integration of concepts, variables, classifications and quality; covers both schema- and register-based statistics; model currently used in Australia, New Zealand, Canada etc. (together with SDMX); tools available.

24 Why DDI
Reusability in the definition of metadata. Referenced metadata.
Support to: metadata banks (Questions, Variables, Codelists, Concepts, ...); statistical metadata-driven processes; survey lifecycle; statistical information discovery and documentation; multi-language approach in documenting metadata.

25 Statistics and DDI in 60 seconds
[Diagram: a Study, using Survey Instruments made up of Questions, measures Concepts about Universes.]

26 Statistics and DDI in 60 seconds
[Diagram, continued: Questions collect Responses, resulting in Data Files made up of Variables, with values of Categories/Codes or Numbers. Variables are used for the Dimensions, Measures and Attributes of N-Cubes.]

27 History
Concept of DDI and definition of needs grew out of the data archival community. Established in 1995.
Members: Social Science Data Archives (US, Canada, Europe); statistical data producers (including the US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada).
February 2003: formation of the DDI Alliance, a membership-based alliance with formalized development procedures.

28 DDI-C and DDI-L
DDI has 2 development lines: DDI Codebook (DDI-C) and DDI Lifecycle (DDI-L). Both lines will continue to be improved.
DDI-C focuses just on single-study codebook structures.
DDI-L focuses on a more inclusive lifecycle model and support for machine actionability.

29 Early DDI: Characteristics of DDI-C
Focuses on the static object of a codebook.
Designed for limited uses: end-user data discovery via the variable or high-level study identification (bibliographic).
Only heavily structured content relates to information used to drive statistical analysis.
Coverage is focused on single study, single data file, simple survey and aggregate data files.
Variable contains the majority of information (question, categories, data typing, physical storage information, statistics).

30 Limitations in DDI-C
Treated as an “add on” to the data collection process.
Focus is on the data end product and end users (static).
Limited tools for creation or exploitation.
The Variable must exist before metadata can be created.
Producers are hesitant to take up DDI creation because it is a cost and does not support their development or collection process.

31 DDI-L: Designed for Modern Metadata Systems
DDI-L was designed to meet a broad set of requirements typical of modern practices for metadata management and use.
These practices involve: centralization of metadata systems (registries, repositories); emphasis on reuse of metadata for consistency and quality; leveraging metadata assets using “metadata-driven” systems and processes.

32 DDI-L: model From DDI Alliance

33 Types of metadata in DDI-L
Metadata types:
Concepts (“terms”)
Studies (“surveys”, “collections”, “data sets”, “samples”, “censuses”, “trials”, “experiments”, etc.)
Variables: instance, represented and conceptual (“data elements”, “columns”)
Codes & categories (“classifications”, “codelists”)
Universes (“populations”, “samples”)
N-Cube (“cubes”, “matrices”)
Data files (“data sets”, “databases”)
For questionnaires:
Survey instruments (“questionnaire”, “form”)
Questions (“observations”)
Responses

34 Identification, versioning and maintainability
Identification and versioning are a prerequisite for active use and reuse of metadata.
DDI (and SDMX) follow ISO 11179.
All items have a globally unique identifier composed of 1) Agency Identifier, 2) Item Identifier and 3) Item Version.
E.g. Codelist: Agency ‘dk.dst’, a GUID (Global Unique Identifier) and a version.
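The three-part identifier described on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `ItemIdentifier`, the helper `parse_urn`, the GUID value and the `urn:ddi:agency:id:version` text layout are assumptions made for the example and should be checked against the DDI-L specification before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemIdentifier:
    """Hypothetical holder for the three identifying parts of an item."""
    agency: str    # 1) Agency Identifier, e.g. "dk.dst"
    item_id: str   # 2) Item Identifier, here a GUID
    version: int   # 3) Item Version

    def urn(self) -> str:
        # Assumed text form: urn:ddi:<agency>:<id>:<version>
        return f"urn:ddi:{self.agency}:{self.item_id}:{self.version}"

    def next_version(self) -> "ItemIdentifier":
        # A changed item keeps agency and id; only the version moves on.
        return ItemIdentifier(self.agency, self.item_id, self.version + 1)


def parse_urn(urn: str) -> ItemIdentifier:
    """Split the assumed URN form back into its three parts."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[:2] != ["urn", "ddi"]:
        raise ValueError(f"not a recognised identifier: {urn!r}")
    return ItemIdentifier(parts[2], parts[3], int(parts[4]))


# Made-up GUID, for illustration only.
codelist = ItemIdentifier("dk.dst", "3f2c9a4e-1b7d-4c55-9e0a-6d2f8b1c7a90", 1)
```

Because every reference carries the version, a questionnaire that points at version 1 of a codelist keeps pointing at exactly that version after `next_version()` is published, which is what makes reuse safe.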

35 DDI in Colectica - a Glance

36 Use cases
Use case study 1: How to build metadata for a simple questionnaire
Use case study 2: How to build metadata for a unit dataset
Use case study 3: How to build metadata and data for an aggregated dataset using N-cube
Use case study 4: How metadata can be used to support work on quality-reporting

37 Questions and discussion

38 Use case 1: metadata for implementing a questionnaire
Credits: input from meeting at Eurostat July 2014 and part of presentations by Bryan Fitzpatrick and by Colectica

39 Why use metadata for questionnaires?
Define metadata once.
Generate documentation: PDF, Word, HTML.
Populate CAI systems. Out of the box: Blaise, CASES, CSPro, REDCap, queXML. Custom systems: possible with add-ins.

40 Using DDI Metadata for Questionnaires
DDI has metadata for Questions:
A simple question goes in a Question Item, e.g. “What is your age in years?”
A complex question goes in a Multiple Question Item, e.g. “Did you do paid work last week? Full Time or Part Time? How many hours?”
A Multiple Question Item can contain Question Items or other Multiple Question Items.
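The nesting rule on this slide (a Multiple Question Item may hold Question Items or further Multiple Question Items) is a tree. A minimal Python sketch follows; the class and field names are invented for illustration and are not the DDI element names.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class QuestionItem:
    """A simple question (hypothetical, simplified structure)."""
    name: str
    text: str


@dataclass
class MultipleQuestionItem:
    """A complex question made of sub-questions, which may nest."""
    name: str
    text: str
    sub_questions: List[Union[QuestionItem, "MultipleQuestionItem"]] = field(
        default_factory=list
    )


def flatten(question) -> List[QuestionItem]:
    """Return the simple Question Items of a question, in order."""
    if isinstance(question, QuestionItem):
        return [question]
    items: List[QuestionItem] = []
    for sub in question.sub_questions:
        items.extend(flatten(sub))
    return items


age = QuestionItem("Age", "What is your age in years?")
paid_work = MultipleQuestionItem(
    "PaidWork",
    "Did you do paid work last week?",
    [
        QuestionItem("FullOrPartTime", "Full Time or Part Time?"),
        QuestionItem("Hours", "How many hours?"),
    ],
)
```

`flatten(paid_work)` walks the tree and yields the two simple questions, which is roughly what a documentation generator would need to list every answerable item.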

41 Using DDI Metadata for Questionnaires
Questions can link to one or more Concepts: to indicate what the question is seeking to cover (Age, Sex, Country, Income, Occupation, ...), and perhaps to qualify what is being covered (e.g. Non-farm income, Tertiary qualifications).

42 Using DDI Metadata for Questionnaires
Questions have:
Name: just a multi-lingual name, not used in questionnaires.
Text: the question that is asked; can be conditional, multi-lingual, formatted; can even have mixed language.
Question Intent: some elaboration about what is being sought; multi-lingual, formatted.

43 Using DDI Metadata for Questionnaires
Questions have Response Domains: what sort of answer is expected or valid.
Numeric domain: can specify integer or decimal, valid formats and ranges, etc.
Text domain: can specify format, length.
Category Domain: valid list of multi-lingual values (not really very much use).
Code Domain: valid list of multi-lingual values with codes (a classification).
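A response domain can be read as a validation rule for an answer. The sketch below models three of the domain types in Python; the class names, fields and the `is_valid` method are assumptions for illustration, not part of the DDI specification.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NumericDomain:
    """Numeric answers, optionally integer-only and range-limited."""
    low: Optional[float] = None
    high: Optional[float] = None
    integer_only: bool = False

    def is_valid(self, answer: str) -> bool:
        try:
            value = float(answer)
        except ValueError:
            return False
        if not math.isfinite(value):
            return False
        if self.integer_only and not value.is_integer():
            return False
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value > self.high:
            return False
        return True


@dataclass
class TextDomain:
    """Free-text answers with a maximum length."""
    max_length: int

    def is_valid(self, answer: str) -> bool:
        return len(answer) <= self.max_length


@dataclass
class CodeDomain:
    """Answers restricted to the codes of a code list (code -> label)."""
    codes: Dict[str, str]

    def is_valid(self, answer: str) -> bool:
        return answer in self.codes


age_domain = NumericDomain(low=0, high=120, integer_only=True)
sex_domain = CodeDomain({"1": "Male", "2": "Female"})
```

Keeping the rule with the question metadata means the same definition can drive both the printed documentation and the checks in a data-collection tool.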

44 Using DDI Metadata for Questionnaires
Questions do not go directly into a questionnaire.
DDI calls a questionnaire an Instrument.
Questions constitute a library available for use: a “Question Bank”.
Questions are selected and assembled into an Instrument.
The assembling of questions is done with Control Constructs.
An Instrument identifies a single Control Construct that builds the questionnaire.

45 Control constructs
Control Constructs are the critical component in building a questionnaire:
they select the questions;
they control the flow of the questions (branching and looping);
they insert non-question text, e.g. “Now I want to ask you about other people in the household”;
they can compute values;
they link to Interviewer Instructions (structured DDI Interviewer Instructions, or unstructured external interviewer instructions material).

46 Control constructs
Several types of Control Constructs:
Question Construct: selects a Question Item or Multiple Question Item.
Sequence: selects a sequence of other control constructs of any type.
If-Then-Else: defines an If condition with optional (multiple) ElseIf clauses and an optional Else clause; each condition selects a single Control Construct to include.

47 Control constructs
Several types of Control Constructs (continued):
Loop, Repeat-Until, Repeat-While: e.g. to loop over people in a household.
Statement Item: inserts non-question multi-lingual text (conditional, formatted).
Computation Item: a calculation in some language that is assigned to a Variable.
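How control constructs route a respondent through an instrument can be illustrated with a very small interpreter. The sketch below covers only Question Construct, Statement Item, Sequence and a simplified If-Then-Else (no ElseIf, loops or computations); all class and function names are hypothetical, and conditions are plain Python functions rather than DDI command code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

Answers = Dict[str, str]


@dataclass
class QuestionConstruct:
    question: str  # name of a Question Item in the question bank


@dataclass
class StatementItem:
    text: str  # non-question text shown to the respondent


@dataclass
class Sequence:
    children: List["Construct"]


@dataclass
class IfThenElse:
    condition: Callable[[Answers], bool]
    then: "Construct"
    otherwise: Optional["Construct"] = None


Construct = Union[QuestionConstruct, StatementItem, Sequence, IfThenElse]


def route(construct: Construct, answers: Answers) -> List[str]:
    """List what is shown, in order, given a respondent's answers."""
    if isinstance(construct, QuestionConstruct):
        return [construct.question]
    if isinstance(construct, StatementItem):
        return [f"({construct.text})"]
    if isinstance(construct, Sequence):
        shown: List[str] = []
        for child in construct.children:
            shown.extend(route(child, answers))
        return shown
    if isinstance(construct, IfThenElse):
        branch = construct.then if construct.condition(answers) else construct.otherwise
        return route(branch, answers) if branch is not None else []
    raise TypeError(f"unknown construct: {construct!r}")


# The instrument points at one top-level construct, here a Sequence.
instrument = Sequence([
    QuestionConstruct("Age"),
    QuestionConstruct("PaidWork"),
    IfThenElse(
        lambda a: a.get("PaidWork") == "yes",
        Sequence([QuestionConstruct("FullOrPartTime"), QuestionConstruct("Hours")]),
    ),
    StatementItem("Now I want to ask you about other people in the household"),
])
```

Calling `route(instrument, {"PaidWork": "no"})` skips the two follow-up questions, which is the branching behaviour the slides describe.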

48 Instrument
Identifies a single Control Construct to assemble the questionnaire, probably a Sequence construct.
Instruments can have multiple Software specifications, basically just identifying “software” used with the instrument.
Colectica: generate code for Blaise, REDCap etc.

49 Interviewer instructions
A formal DDI metadata type.
Organised, structured instructions: formatted multi-lingual text; may be conditional.
May link to external, non-DDI material, e.g. PDF, Word documents.

51 Questionnaire template
*from UNECE

52 DDI modelling in practice: study unit
*from UNECE

53 DDI questionnaire modelling in practice: resource package
*from UNECE

54 DDI questionnaire modelling in practice: module and submodule
*from UNECE

55 DDI questionnaire modelling in practice: statements
Comment Instruction *from UNECE

56 DDI questionnaire modelling in practice: statements
Help Warning *from UNECE

57 DDI modelling in practice: statements
Conditional statement *from UNECE

58 DDI modelling in practice: questions with a single response domain
*from UNECE

59 DDI modelling in practice: questions with a multiple response domain
*from UNECE

60 DDI modelling in practice: questions with a single choice
*from UNECE

61 Steps for creating and publishing a questionnaire
1) Create check-out and go to metadata package
2) Define concepts (i.e. Gender, Age, Education level and Schooltype)
3) Define categories and codes (used as response domains)
4) Create questions and insert reference to response domains
5) Create instrument and insert defined questions in a simple sequence
6) Connect questions to concepts
7) Generate documentation (for survey designer etc.)
8) Show in portal
9) Publish survey: paper form, Blaise etc.

62 Use case study 2: metadata for unit dataset
Credits: input from presentations by Colectica from European DDI conference, Copenhagen, 2015

63 Variable cascade in GSIM, DDI and Colectica
ConceptualVariable RepresentedVariable Variable Variable
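The cascade named on this slide can be read as three levels of increasing specificity: what is measured, how it is represented, and where it actually sits in a dataset. A minimal Python sketch follows; the field names, the `InstanceVariable` class name and the example datasets are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    """What is measured, and about which kind of unit."""
    name: str
    concept: str
    unit_type: str


@dataclass(frozen=True)
class RepresentedVariable:
    """Adds how values are represented, e.g. a code list."""
    conceptual: ConceptualVariable
    representation: str


@dataclass(frozen=True)
class InstanceVariable:
    """Adds where the variable appears: a column in a concrete dataset."""
    represented: RepresentedVariable
    dataset: str
    column: str


sex = ConceptualVariable("Sex of person", concept="Sex", unit_type="Person")
sex_coded = RepresentedVariable(sex, representation="Codelist: 1 = Male, 2 = Female")

# Two hypothetical datasets reuse the same represented variable.
sex_in_survey = InstanceVariable(sex_coded, dataset="survey_2018", column="SEX")
sex_in_register = InstanceVariable(sex_coded, dataset="population_register", column="KOEN")
```

Defining the upper two levels once and pointing to them from each dataset is one way to get the reuse the earlier slides ask for: both columns are documented by the same concept and the same code list.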

64 Selected elements from DDI

65 Logical record
A Logical Record consists of a sequence of Variables that groups data values for a purpose.
Data from a questionnaire goes into one or more Logical Records.
Logical Records can be linked, e.g. Households and Persons.
Logical Records are independent of any storage or stored format.

66 Physical Instance
Holds information about actual data sets produced.
Links to Physical Structures, Record Layouts, and Logical Records.
Provides a central management of data from a collection.
Physical Instance is used to manage data.

67 Simple classifications and code lists
DDI holds Classifications as linked Code Schemes and Category Schemes.
A Category Scheme is a list of Categories: a flat list of multi-lingual names and descriptions, e.g. Country names, Occupation names, etc.
A Code Scheme selects Categories from Category Schemes, assigns a Code (not multi-lingual), and may specify a hierarchy.
A Code Scheme may select Categories from multiple Category Schemes.
Multiple Code Schemes may select the same Categories.
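The split between Categories (multi-lingual labels) and Codes (language-neutral values that point at Categories) can be sketched as below. The class names and fields are simplified assumptions, and hierarchy is reduced to an optional parent code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Category:
    """A multi-lingual label, independent of any code."""
    name: Dict[str, str]  # language -> label


@dataclass
class Code:
    """A language-neutral code value that selects one Category."""
    value: str
    category: Category
    parent: Optional[str] = None  # value of the parent code, if hierarchical


@dataclass
class CodeScheme:
    codes: List[Code]

    def label(self, value: str, lang: str = "en") -> str:
        for code in self.codes:
            if code.value == value:
                return code.category.name[lang]
        raise KeyError(value)


# One category scheme (a flat list of country names)...
denmark = Category({"en": "Denmark", "da": "Danmark"})
italy = Category({"en": "Italy", "da": "Italien"})

# ...selected by two different code schemes.
alpha2 = CodeScheme([Code("DK", denmark), Code("IT", italy)])
numeric = CodeScheme([Code("208", denmark), Code("380", italy)])
```

Both code schemes share the same `Category` objects, so a corrected or newly translated label appears under every coding without being maintained twice.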

68 GSIM compliant classification
The GSIM Classification model was drawn from the terminology in the Neuchâtel model.
In 2012, the first GSIM model including classifications was released (version 1.0).
In December 2013, a version 1.1 update to GSIM was released.
The Neuchâtel model is now an annex to the GSIM model, and released with it.

69 Codebook example

70 Codebook example

71 Study description

72 Study Description: NESSTAR Publisher

73 Study Description: NESSTAR Publisher

74 Study Description: NESSTAR Publisher

75 File Description: Variables groups

76 File Description: Variables groups - NESSTAR Publisher

77 Variable description

78 Variable description NESSTAR Publisher

79 Use case 3: DDI used for cubes
Credits: input from presentations by Colectica from European DDI conference, London, 2014

80 3 Dimensional NCube

81 2 Dimensional NCube

82 Properties of an aggregate
Dimensions, Measures, Attributes.
Can append footnotes to the aggregate: attach to the overall structure, to individual cells or to groups of cells.

83 NCubes and variables
NCube: re-usable definition of an aggregate structure.
Dimensions: ordered list of Variable references.
Measures: list of measures for each intersection of Dimensions; each has a Variable reference and a Type (count, %, mean, etc.).
Attributes: attributes that are applicable to the re-usable NCube definition.
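An NCube definition is separate from the data that fills it. The Python sketch below defines a two-dimensional cube and counts unit records per cell; the class names, the `tabulate` helper and the sample records are invented for illustration, and only a count measure is handled.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass
class Dimension:
    variable: str     # reference to a Variable, by name
    codes: List[str]  # the codes that span this dimension


@dataclass
class Measure:
    variable: str
    kind: str         # "count", "mean", ...


@dataclass
class NCube:
    """Re-usable definition of an aggregate structure."""
    dimensions: List[Dimension]  # ordered
    measures: List[Measure]

    def cells(self) -> List[Tuple[str, ...]]:
        # One cell per intersection of the dimension codes.
        return list(product(*(d.codes for d in self.dimensions)))


def tabulate(cube: NCube, records: List[Dict[str, str]]) -> Dict[Tuple[str, ...], int]:
    """Fill the cube with counts of unit records (count measure only)."""
    counts = {cell: 0 for cell in cube.cells()}
    for record in records:
        key = tuple(record[d.variable] for d in cube.dimensions)
        if key in counts:  # ignore records outside the defined cells
            counts[key] += 1
    return counts


cube = NCube(
    dimensions=[Dimension("Sex", ["M", "F"]), Dimension("Region", ["North", "South"])],
    measures=[Measure("Persons", "count")],
)
records = [
    {"Sex": "M", "Region": "North"},
    {"Sex": "F", "Region": "North"},
    {"Sex": "F", "Region": "North"},
    {"Sex": "F", "Region": "South"},
]
table = tabulate(cube, records)
```

Because the dimensions are only references to variables, the same cube definition can be filled again from next year's unit dataset without redefining the structure.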

84 Use case 4: exchange of reference metadata

85 Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS

86 Single Integrated Metadata Structure (SIMS) and reporting formats: ESMS and ESQRS
ESMS: Euro SDMX Metadata Structure, oriented towards users.
ESQRS: ESS Standard for Quality Reports Structure.

87 Where do I find more information about DDI?
DDI Alliance website:
Specification: find user guide, technical documentation guide, online field documentation and more (both on DDI-L and DDI-C).
Tools: find tools, searching by purpose, DDI version and availability.
Training: find use cases, glossary etc.
Colectica support: find information about Colectica tools, how to manage content in Colectica Designer etc.

