

1 DDI 3.0 Overview Sanda Ionescu, ICPSR

2 DDI Background: Development History
1995 – A grant-funded project initiated and organized by ICPSR proposes to create a new standard for documenting social science data, to replace OSIRIS tagged codebooks. A small international group of interested parties is convened at IASSIST in Quebec City to jumpstart the effort.

3 DDI Background: Development History
2000 – DDI Version 1.0 published as a mainly document- and codebook-centric standard.
Versions 1.0 through 2.1 (2005):
– Backwards compatible (additions, but no deletions: instances from all earlier versions validate against the 2.1 DTD)
– Based on the same structure

4 DDI Background: Development History
February 2003 – Formation of the DDI Alliance:
– Self-sustaining membership organization
– 32 institutional members around the world: http://www.ddialliance.org/DDI/org/structure.html#members
Ever-growing number of DDI users

5 DDI Alliance Members

6 DDI Background: Development History
Version 3.0:
– 2004–2006: Planning and Development
– November 2006: Internal Review
– February 2007: Public Review
– July 2007: Candidate Draft Release
– April 2008: DDI 3.0 officially published (http://www.ddialliance.org/ddi3/index.html)

7 Why DDI 3.0?
DDI 3.0 presents new features in response to:
Perceived needs of:
- Data users: need more detailed, contextual information, especially from the development stages
- Data producers: need better incentives to document their work in DDI
- Data archivists/librarians: need to document complex data collections (hierarchical, longitudinal, etc.)
Developments in documenting and archiving data:
- Growing interest in meta-resources (metadata registries)
Advances in XML technology:
- Development of XML schemas (began around 2001)

8 DDI 3.0: New Features – DDI 3.0 and the Data Life Cycle Model
DDI Versions 1/2 were codebook-centric:
- Closely followed the structure of traditional print codebooks.
- Captured data documentation at a single, “frozen” point in time: archiving.

9 DDI 3.0 Life Cycle Orientation
DDI 3.0 documents all stages in the life cycle of a data collection:
- pre-production
- production
- post-production
- new research effort
- secondary use

10 DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
- Allows capture and preservation of metadata generated by different agents at different points in time.
- Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability.

11 DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
- Facilitates tracking changes and updates in both data and documentation.
- Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.

12 DDI 3.0: New Features – Modular Structure
Versions 1/2:
- DDI Instance: single file, hierarchical design
Version 3.0:
- DDI Instance: modular design; the building blocks are modules and schemes

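13 DDI Version 3.0 Modules: Structural Overview
[Diagram: a DDI Instance containing a Study Unit, a Group, or a Resource Package; a Group containing Study Units and Subgroups; each Study Unit or (Sub)Group containing modules such as Concepts, Data Collection, Logical Product, etc.]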

14 DDI Version 3.0 Modules: Structural Overview
[Diagram: a DDI Instance with a Study Unit and a Group. Modules shown for the Study Unit: Conceptual Component, Data Collection, Logical Product, Physical Data Product, Physical Instance, Archive, Organizations. Modules shown for the Group: Conceptual Component, Data Collection, Logical Product, Archive, Comparative, plus nested Study Units.]

15 DDI 3.0 – Structural Overview
Modules:
- Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Component, Data Collection, Logical Product, Physical Instance, etc.)
- Can live independently (have their own schemas) or may be connected to one another within a hierarchical structure

16 DDI 3.0 – Structural Overview
Schemes:
- Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes. Also individuals and organizations, geographic structures/locations, physical structures, record layouts, etc.
- Live within modules (do not have their own schemas) but may be referenced / reused as separate entities
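The idea that scheme items are defined once and then referenced, rather than repeated inside every variable description, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `CodeScheme`, `Variable`, `resolve`, and the IDs are hypothetical simplifications, not DDI 3.0 element names; a real instance expresses the same idea in XML through identifiers and references.

```python
from dataclasses import dataclass, field


@dataclass
class CodeScheme:
    """Hypothetical stand-in for a reusable scheme of codes and category labels."""
    id: str
    codes: dict[str, str] = field(default_factory=dict)  # code -> category label


@dataclass
class Variable:
    """A variable that points to a code scheme by ID instead of copying it."""
    id: str
    name: str
    code_scheme_ref: str  # ID of a CodeScheme, resolved on demand


def resolve(var: Variable, schemes: dict[str, CodeScheme]) -> CodeScheme:
    """Look up the scheme a variable references; fail loudly if the reference dangles."""
    try:
        return schemes[var.code_scheme_ref]
    except KeyError:
        raise KeyError(f"{var.id} references unknown scheme {var.code_scheme_ref!r}") from None


# One agreement scale, defined once ...
agree = CodeScheme("CS-AGREE", {"1": "Agree", "2": "Neutral", "3": "Disagree"})
schemes = {agree.id: agree}

# ... and reused by two different variables.
q1 = Variable("V1", "TRUST_GOV", "CS-AGREE")
q2 = Variable("V2", "TRUST_MEDIA", "CS-AGREE")
```

Because both variables resolve to the same scheme object, a correction to a category label is made in one place and is seen by every variable that references it.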

17 DDI 3.0 Schemes * Courtesy of Achim Wackerow, GESIS-ZUMA

18 DDI 3.0: Modular Structure
Version 3.0 – modular design (modules and schemes):
- Allows flexibility in organizing the DDI Instance
- Supports the life cycle model
- Facilitates versioning and maintenance
- Facilitates reuse
- Supports creation of metadata registries
- Supports grouping and comparing studies

19 New / Extended Functionalities in DDI 3.0: Questionnaire
Versions 1/2:
- No instrument coverage.
- Question text only as part of the variable description.
- No documentation for question flow / conditions.
Version 3.0:
- Full description of the instrument as a separate entity.
- Documents specific use of questions: flow, conditions, loops.
- Compatible with Computer Assisted Interviewing software.
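Documenting flow, conditions, and loops means an instrument description can be "run" to find which questions a given respondent sees. The Python sketch below models those three constructs; the class names, question IDs, and the `route` helper are hypothetical and do not reproduce DDI 3.0's control-construct schema. Responses are supplied up front rather than collected interactively.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Question:
    id: str


@dataclass
class IfThen:
    condition: Callable[[dict], bool]  # evaluated against the responses
    then: list                          # constructs asked only if condition holds


@dataclass
class Loop:
    times: Callable[[dict], int]  # iteration count, e.g. taken from an earlier answer
    body: list


Construct = Union[Question, IfThen, Loop]


def route(constructs: list[Construct], responses: dict) -> list[str]:
    """Return the question IDs a respondent would be asked, in order."""
    asked: list[str] = []
    for c in constructs:
        if isinstance(c, Question):
            asked.append(c.id)
        elif isinstance(c, IfThen):
            if c.condition(responses):
                asked.extend(route(c.then, responses))
        elif isinstance(c, Loop):
            for _ in range(c.times(responses)):
                asked.extend(route(c.body, responses))
    return asked


# Hypothetical instrument: a skip condition followed by a per-member loop.
instrument = [
    Question("Q1_EMPLOYED"),
    IfThen(lambda r: r.get("Q1_EMPLOYED") == "yes", [Question("Q2_HOURS")]),
    Question("Q3_HOUSEHOLD_SIZE"),
    Loop(lambda r: int(r.get("Q3_HOUSEHOLD_SIZE", 0)), [Question("Q4_MEMBER_AGE")]),
]
```

For a respondent who is not employed and lives in a two-person household, `route` skips `Q2_HOURS` and repeats `Q4_MEMBER_AGE` twice.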

20 New / Extended Functionalities in DDI 3.0: Complex Data
Versions 1/2:
- Inadequate representation of complex / hierarchical data
Version 3.0:
- Detailed documentation for complex / hierarchical data:
  Logical structure of records: record types and relationships; relevant variables (key-link, case identification, record type locator)
  Physical layout of records: single “hierarchical” file for all records, multiple rectangular files, relational database, etc.
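To see why the record type locator and key-link variables matter, consider reading a single "hierarchical" file that mixes household and person records. The column positions and record codes below are an assumed, hypothetical layout for illustration; in DDI 3.0 this kind of information would come from the documented logical and physical record structure rather than being hard-coded.

```python
from collections import defaultdict

# Hypothetical layout, for illustration only:
#   column 1     record type locator ("H" = household, "P" = person)
#   columns 2-5  household ID (key-link shared by household and person records)
#   columns 6-   record-specific fields
LAYOUT = {
    "H": {"region": slice(5, 7)},
    "P": {"person_no": slice(5, 7), "age": slice(7, 10)},
}


def parse_hierarchical(lines):
    """Group person records under their household using the key-link variable.

    Person records whose household ID has no household record are dropped.
    """
    households = {}
    persons = defaultdict(list)
    for line in lines:
        rec_type = line[0]   # record type locator
        hh_id = line[1:5]    # key-link / case identification
        fields = {name: line[pos].strip() for name, pos in LAYOUT[rec_type].items()}
        if rec_type == "H":
            households[hh_id] = fields
        else:
            persons[hh_id].append(fields)
    return {hh: {**h, "persons": persons[hh]} for hh, h in households.items()}


data = [
    "H000107",
    "P000101034",
    "P000102031",
    "H000212",
    "P000201067",
]
result = parse_hierarchical(data)
```

The same logical structure could instead be stored as two rectangular files or as database tables; only the physical reading step would change.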

21 New / Extended Functionalities in DDI 3.0: Aggregate Data
Versions 1/2:
- Initially designed for microdata only
- Aggregate data section added in Version 2.1 to support limited representation (Census-type data, delimited files)
Version 3.0:
- Adds support for tabular (spreadsheet-type) representation of aggregate data
- Aggregate data transport option: cell content may be included inline with the data item description

22 New / Extended Functionalities in DDI 3.0: Data Transport
Versions 1/2:
- None
Version 3.0:
- In-line inclusion enabled for both aggregate data and microdata
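When cell values travel inline with the description of the table, a program can rebuild the table without a separate data file. The sketch below uses Python's standard `xml.etree.ElementTree`; the markup (`Table`, `Dimension`, `Cell`, and their attributes) and the figures are hypothetical and deliberately simpler than the DDI 3.0 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: element and attribute names are illustrative,
# not the DDI 3.0 schema. Each cell carries its coordinates and its value inline.
XML = """
<Table id="T1" measure="population">
  <Dimension name="sex"/>
  <Dimension name="region"/>
  <Cell sex="F" region="North">5120</Cell>
  <Cell sex="M" region="North">4980</Cell>
  <Cell sex="F" region="South">3310</Cell>
  <Cell sex="M" region="South">3295</Cell>
</Table>
"""


def read_inline_table(xml_text):
    """Return the dimension names and a {(coordinates...): int} cell map."""
    root = ET.fromstring(xml_text.strip())
    dims = [d.get("name") for d in root.findall("Dimension")]
    cells = {}
    for cell in root.findall("Cell"):
        key = tuple(cell.get(d) for d in dims)  # coordinates in dimension order
        cells[key] = int(cell.text)
    return dims, cells


dims, cells = read_inline_table(XML)
```

Keying cells by their coordinates keeps the reader independent of the order in which cells appear in the markup.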

23 New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability
Versions 1/2:
- None
Version 3.0:
- Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.), as well as their comparability
- Use of inheritance increases markup efficiency and simplifies DDI Instances
- Relational information is embedded in the inheritance structure, which makes comparison machine-actionable
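The benefit of inheritance can be shown with Python's `collections.ChainMap`: metadata shared by a group of studies is stated once, each study unit records only what differs, and comparing two waves becomes a mechanical diff. The field names, values, and the `differences` helper are hypothetical, and a flat dictionary only approximates DDI 3.0, where inheritance applies to whole modules and schemes.

```python
from collections import ChainMap

# Hypothetical group-level metadata, stated once for the whole series.
group = {
    "series": "Example Panel Survey",
    "universe": "Adults aged 18+ living in private households",
    "concepts": ("income", "employment"),
}

# Each wave (study unit) records only what differs from the group;
# lookups fall through to the group for everything else.
wave_2001 = ChainMap({"year": 2001}, group)
wave_2005 = ChainMap(
    {"year": 2005, "universe": "Adults aged 16+ living in private households"},
    group,
)


def differences(a, b):
    """Fields whose effective values differ between two study units."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}
```

Here `differences(wave_2001, wave_2005)` reports only `universe` and `year`; everything inherited unchanged from the group is, by construction, comparable.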

24 Grouping and Inheritance in DDI 3.0 * Courtesy of Achim Wackerow, GESIS-ZUMA

25 New / Extended Functionalities in DDI 3.0: Comparability
Comparison Module: maps comparable items from two different schemes:
– Concepts, variables, coding schemes, categories, questions, universes
– More planned in future versions
The comparison is one-to-one, from source to target (i.e., the harmonized variable).
*Courtesy of Achim Wackerow, GESIS-ZUMA

26 New / Extended Functionalities in DDI 3.0: Comparison Module
Comparison Module content:
- Relationship between source and target items
- Textual description of the common aspects of the two items
- Textual description of the differing aspects of the two items
- Formal description of the relationship according to a controlled vocabulary
- A value between 0 and 1 expressing the degree of commonality
- A user-defined property defining the correspondence
- Description of the derivation process (coding schemes)
*Courtesy of Achim Wackerow, GESIS-ZUMA
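Several parts of a comparison map are constrained (a vocabulary term, a value between 0 and 1), which is what lets software check and use them. The Python sketch below captures a subset of the content listed above; the field names and the relationship vocabulary are hypothetical and are not DDI 3.0 element names or its actual controlled vocabulary.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for the formal relationship.
RELATIONSHIPS = {"exactMatch", "closeMatch", "broader", "narrower"}


@dataclass(frozen=True)
class ItemMap:
    """One source-to-target correspondence (hypothetical field names)."""
    source_id: str
    target_id: str
    commonality: str           # textual description of what the items share
    difference: str            # textual description of how they differ
    relationship: str          # term from the controlled vocabulary
    commonality_weight: float  # 0 = nothing in common, 1 = identical

    def __post_init__(self):
        if not 0.0 <= self.commonality_weight <= 1.0:
            raise ValueError("commonality_weight must lie between 0 and 1")


def check_maps(maps):
    """Reject maps whose relationship is not in the controlled vocabulary."""
    for m in maps:
        if m.relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship {m.relationship!r}")
    return True


# Example: a study's age variable mapped to a harmonized age variable.
age_map = ItemMap("A:V12", "H:AGE", "age in completed years",
                  "source is top-coded at 90", "closeMatch", 0.9)
```

The free-text fields remain for human readers, while the vocabulary term and the numeric weight can be filtered or ranked automatically.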

27 New / Extended Functionalities in DDI 3.0: Increased Multilingual Support
Versions 1/2:
- Limited
Version 3.0:
- Support for multiple language use and translations (example: the same label in German, “Geburtsjahr”, and in English, “Year of Birth”)
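Multilingual XML content is commonly marked with the standard `xml:lang` attribute, and software can then pick a label in the user's language. The sketch below assumes that convention and uses hypothetical `Variable`/`Label` elements (not DDI 3.0 element names); Python's `ElementTree` exposes `xml:lang` under the reserved XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical variable markup; element names are illustrative only.
VARIABLE = """<Variable id="V7">
  <Label xml:lang="de">Geburtsjahr</Label>
  <Label xml:lang="en">Year of Birth</Label>
</Variable>"""


def label_for(xml_text, preferred, fallback="en"):
    """Return the label in the preferred language, else in the fallback language."""
    root = ET.fromstring(xml_text)
    labels = {lab.get(XML_LANG): lab.text for lab in root.findall("Label")}
    return labels.get(preferred, labels.get(fallback))
```

A German interface gets "Geburtsjahr"; a request for a language with no translation falls back to English.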

28 DDI 3.0 Specification: Schema-based
Versions 1/2:
- DTD-based
Version 3.0:
- Schema-based: data typing supports machine actionability
- Use of namespaces supports:
  Modularity
  Extensibility and reuse
  Alignment with / use of other standards

29 DDI 3.0 Specification: Machine-actionable
Versions 1/2:
- Machine-readable
Version 3.0:
- Machine-actionable:
  1. Data typing: increased use of controlled vocabularies and standard codes
  2. Larger set of required elements
Predictable content = a more consistent base for programming
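The difference between machine-readable and machine-actionable shows up as soon as a program has to act on a value: free text such as "by telephone, spring 2007" can be read but not reliably processed, while a code from a controlled vocabulary and a typed date can be checked and computed with. The vocabulary and field names below are hypothetical; the sketch only illustrates the principle.

```python
from datetime import date

# Hypothetical controlled vocabulary for mode of data collection.
MODE_OF_COLLECTION = {"CAPI", "CATI", "CAWI", "PAPI"}


def check_collection_event(event):
    """Return a list of problems; an empty list means software can rely on the record."""
    problems = []
    if event.get("mode") not in MODE_OF_COLLECTION:
        problems.append(f"mode {event.get('mode')!r} is not in the controlled vocabulary")
    try:
        # Typed dates (ISO 8601) can be compared; free-text dates cannot.
        start = date.fromisoformat(event.get("start", ""))
        end = date.fromisoformat(event.get("end", ""))
        if end < start:
            problems.append("end date precedes start date")
    except ValueError:
        problems.append("dates must be ISO 8601 (YYYY-MM-DD)")
    return problems
```

A record with `"mode": "CATI"` and ISO dates passes; the same record with `"mode": "phone"` is flagged, because the program cannot know what "phone" is meant to match.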

30 DDI 3.0: Alignment with Other Metadata Standards
Versions 1/2:
- MARC, Dublin Core (bibliographic standards)
Version 3.0:
- MARC, DC, but also:
- SDMX (Statistical Data and Metadata Exchange)
- ISO 11179 (Metadata Registries)
- FGDC (Digital Geospatial Metadata)
- ISO 19115 (Geographic Information Metadata)
- PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission): forthcoming…

31 DDI 3.0: Producing DDI 3.0 Markup and 2.x to 3.0 Mapping
Practical considerations:
- Modular design = heavy reliance on internal/external referencing
- Identification information is mandatory for all identifiable elements (not all elements are identifiable)
- Schema validation does not check for uniqueness of IDs or accuracy of references (needed for XSLT transformations)
- Considerably larger number of mandatory fields (these need to be included even in partial mappings)
- Machine-actionability = improved processing, but instances are not so easy to produce: increased content intelligence makes it more challenging to automate production
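Since schema validation lets duplicate IDs and broken references through, a separate check is needed before references are resolved. The sketch below does this with Python's `ElementTree` under a deliberately simplified, hypothetical convention (an `id` attribute on identifiable elements, `Ref` elements whose text is the target ID); real DDI 3.0 identification and referencing also involve agency and version, which this ignores.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical convention for this sketch: identifiable elements carry an "id"
# attribute and references are "Ref" elements whose text is the target ID.
INSTANCE = """<Instance>
  <ConceptScheme id="CS1">
    <Concept id="C1"/>
    <Concept id="C1"/>
  </ConceptScheme>
  <Variable id="V1"><Ref>C1</Ref></Variable>
  <Variable id="V2"><Ref>C9</Ref></Variable>
</Instance>"""


def check_ids_and_refs(xml_text):
    """Report duplicate IDs and references that point to no element."""
    root = ET.fromstring(xml_text)
    counts = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    dangling = sorted(r.text for r in root.iter("Ref") if r.text not in counts)
    return duplicates, dangling
```

For the sample instance this reports `C1` as duplicated and `C9` as dangling; the document is still well-formed XML, which is exactly why such errors slip past a parser.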

32 Converting DDI 2.x to 3.0
Practical challenges:
- Limited conversions (a selection of fields) are doable.
- Full conversions (all fields) are more problematic:
  – DDI 3.0 relies heavily on reuse, whereas DDI 2.x has an inclusive, declarative structure: agency information and variable-level universe statements are difficult to map to schemes
  – Physical representation (and possibly other sections) may map incompletely, reducing the degree of machine-actionability of converted files
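One part of the mismatch is that DDI 2.x repeats universe text inside each variable, while DDI 3.0 expects each universe to be stated once in a scheme and referenced. The Python sketch below shows a naive version of that step; the input format, the generated IDs (`U1`, `U2`, ...), and the function name are hypothetical.

```python
def universes_to_scheme(variables):
    """Collapse inline universe statements into a scheme plus references.

    `variables` is a list of (variable_name, universe_text) pairs, standing in
    for variable-level universe statements extracted from a DDI 2.x codebook.
    The generated IDs (U1, U2, ...) are hypothetical.
    """
    scheme = {}      # normalized universe text -> generated ID
    references = {}  # variable name -> universe ID
    for name, text in variables:
        key = " ".join(text.split())  # normalize whitespace before comparing
        if key not in scheme:
            scheme[key] = f"U{len(scheme) + 1}"
        references[name] = scheme[key]
    return {uid: text for text, uid in scheme.items()}, references


vars_2x = [
    ("V1", "All respondents"),
    ("V2", "All  respondents"),
    ("V3", "Respondents currently employed"),
]
scheme, refs = universes_to_scheme(vars_2x)
```

Only statements that are identical after whitespace normalization are merged; two differently worded statements describing the same population stay separate, which illustrates why full conversions are harder than limited ones.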

33 DDI 3.0 Use Cases
- Documenting an ongoing, original research project (life cycle orientation: first stages)
- Documenting secondary use of data (life cycle orientation: secondary analysis/repurposing)
- Creating concept/question/variable banks
- Generating multiple delivery formats for data dissemination/discovery
- Metadata mining for comparison, etc.

34 DDI 3.0 to Document an On-going Research Project
DDI 3.0 can be used to document a research project in “real time”, from its inception (study proposal, design) through data collection, processing, and initial data production.

35 [Diagram: elements documented over a research project’s life cycle: Submitted Proposal, Purpose, Concepts, Universe, Geography, People/Orgs, Funding ($ € £), Questions, Instrument, Data Collection, Data Processing, Variables, Physical Stores, Data, Revisions, Archive/Repository, Publication; contributors: Principal Investigator, Collaborators, Research Staff]

36 DDI 3.0 to Document an On-going Research Project
Advantages:
- Richer, contextual information made available and preserved.
- Increased accuracy, as life cycle stages are documented “at the source”.
- No loss of information as the study progresses through its life cycle.
- Changes in documentation preserved through versioning.
- Ultimately gives data analysts more information to understand and assess data quality.

37 DDI 3.0 to Document an On-going Research Project
User-friendly editing/production tools are the best incentive for generating DDI 3.0 instances.
See http://www.icpsr.umich.edu/DDI/ddi3/workshop/ for a markup example.

38 DDI Editor-Lite
“Smart” tool:
– Assigns unique IDs in the background
– Stores lists of entries and creates references as needed when relevant items are invoked
– Can create multiple DDI fields from a single interface entry, as appropriate
– Allows editing/deleting entries
– Prompts the user to ensure valid output (mandatory nature of entries, special formats required for dates, language codes, etc.)
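The first two behaviors listed for the editor (background ID assignment, and storing each entry once so later uses become references) can be sketched as a small registry. This is not the Editor-Lite implementation; `EntryRegistry`, its methods, and the `PREFIX-n` ID format are hypothetical and only mirror the behavior described on the slide.

```python
import itertools


class EntryRegistry:
    """Hypothetical sketch: unique IDs generated automatically, entries stored once,
    and repeated uses of the same entry returning a reference (its ID)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)
        self._by_value = {}  # entry text -> ID
        self.entries = {}    # ID -> entry text

    def reference(self, value):
        """Return the ID for `value`, creating and storing a new entry if needed."""
        if value not in self._by_value:
            new_id = f"{self.prefix}-{next(self._counter)}"
            self._by_value[value] = new_id
            self.entries[new_id] = value
        return self._by_value[value]

    def delete(self, entry_id):
        """Remove an entry by ID (callers must also drop references to it)."""
        value = self.entries.pop(entry_id)
        del self._by_value[value]


orgs = EntryRegistry("ORG")
first = orgs.reference("ICPSR")
second = orgs.reference("GESIS")
again = orgs.reference("ICPSR")  # same organization invoked again -> same ID
```

The user types an organization name as often as needed; the registry guarantees one stored entry and a stable ID that markup can point to.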

39 DDI 2.x or DDI 3.0?
- Not all DDI 1/2 markup will have to be migrated to Version 3.0.
- DDI 2.1 will continue to be maintained for the foreseeable future.
- DDI 3.0 is the obvious immediate choice for documenting complex files, aggregate data, and cross-study comparability, as well as for creating/supporting metadata registries.

40 DDI at ICPSR
Currently using DDI 2.1:
– Metadata records: study-level descriptions
– Hermes output: variable descriptions
  Codebook
  Archival copy
  SSVD for cross-study variable-level searches

41 DDI at ICPSR
Moving to DDI 3.0? Main factors to consider:
– Specific needs: what does DDI 3.0 do for us that DDI 2.x doesn’t?
– Available tools: solid production tools are needed in addition to processing tools.

42 Moving to DDI 3.0?
Current efforts:
– ICPSR stylesheet for metadata record conversion
– ICPSR stylesheet for displaying basic DDI 3 markup (http://www.icpsr.umich.edu/DDI/ddi3/workshop/)
– Variable description mapping DDI 2.1 -> DDI 3.0; will result in a conversion stylesheet
– SSVD: compatible with both DDI 2.x and DDI 3.0; currently uploads DDI 2.1 files and outputs DDI 3.0 variable descriptions. May ultimately be used as a conversion tool.

43 SRO-ICPSR “Data Documentation and Dissemination” project
[Diagram: a common relational database model for data documentation, compliant with DDI 3.0. Elements shown: Blaise output, MQDS, SAS/SPSS/Stata files, DDI 2.x, DDI 3.0, other formats, client applications, web applications, ICPSR variable-level search; SRO and ICPSR sides.]
ICPSR projects will be able to use documentation generated by SRO projects…

44 Developing DDI 3.0 Tools
DDI Foundation Tools Program (http://tools.ddialliance.org):
– DExT (Data Exchange Tools Initiative)
– StatsProgs2DDI
– URN generator
– DDI 3.0 parser/validator

45 Questions?

