DDI 3.0 Overview Sanda Ionescu, ICPSR


DDI Background: Development History
- 1995: A grant-funded project initiated and organized by ICPSR proposes to create a new standard for documenting social science data, to replace OSIRIS tagged codebooks. A small international group of interested parties is convened at IASSIST in Quebec City to jumpstart the effort.

DDI Background: Development History
- 2000: DDI Version 1.0 published as a mainly document- and codebook-centric standard.
- Versions 1.0 through 2.1 (2005):
  - Backwards compatible (additions, but no deletions: all versions validate against the 2.1 DTD)
  - Based on the same structure

DDI Background: Development History
- February 2003: Formation of the DDI Alliance
  - Self-sustaining membership organization
  - 32 institutional members around the world
- Ever-growing number of DDI users

DDI Alliance Members

DDI Background: Development History
Version 3.0:
- Planning and Development
- November 2006: Internal Review
- February 2007: Public Review
- July 2007: Candidate Draft Release
- April 2008: DDI 3.0 officially published

Why DDI 3.0?
DDI 3.0 presents new features in response to:
- Perceived needs of:
  - Data users: need more detailed, contextual information, especially from the development stages
  - Data producers: need better incentives to document their work in DDI
  - Data archivists/librarians: need to document complex data collections (hierarchical, longitudinal, etc.)
- Developments in documenting and archiving data:
  - Growing interest in meta-resources (metadata registries)
- Advances in XML technology:
  - Development of schemas (began around 2001)

DDI 3.0: New Features
DDI 3.0 and the Data Life Cycle Model
DDI Versions 1/2 were codebook-centric:
- Closely followed the structure of traditional print codebooks.
- Captured data documentation at a single, “frozen” point in time: archiving.

DDI 3.0 Life Cycle Orientation
DDI 3.0 documents all stages in the life cycle of a data collection:
- pre-production
- production
- post-production
- new research effort
- secondary use

DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
- Allows capture and preservation of metadata generated by different agents at different points in time.
- Enables investigators, data collectors and producers to document their work directly in DDI, thus increasing the metadata’s visibility and usability.

DDI 3.0 and the Data Life Cycle Model
Advantages of Life Cycle orientation:
- Facilitates tracking changes and updates in both data and documentation.
- Benefits data users, who need information from the full data life cycle for optimal discovery, evaluation, interpretation, and re-use of data resources.

DDI 3.0: New Features
Modular Structure
- Versions 1/2: the DDI Instance is a single file with a hierarchical design.
- Version 3.0: the DDI Instance has a modular design, assembled from building blocks: modules and schemes.

DDI Version 3.0 Modules: Structural Overview
[Diagram labels: DDI Instance; Study Unit; Group; Resource Package; Subgroup; Concepts; Data Coll.; Logical Pr.; etc.]

DDI Version 3.0 Modules: Structural Overview
[Diagram labels: DDI Instance; Study Unit; Group; Conceptual Component; Data Collection; Logical Product; Physical Data Product; Physical Instance; Archive; Organizations; Comparative]

DDI 3.0: Structural Overview
Modules:
- Document different aspects of a study, or group of studies, following the data through their life cycle (Conceptual Components, Data Collection, Logical Product, Physical Instance, etc.)
- Can live independently (have their own schemas) or may be connected to one another within a hierarchical structure.

DDI 3.0: Structural Overview
Schemes:
- Include collections of sibling “objects” that are traditionally components of a variable description: Concepts, Universes, Questions, Variable Labels and Names, Categories, Codes.
- Also include individuals and organizations, geographic structures/locations, physical structures, record layouts, etc.
- Live within modules (do not have their own schemas) but may be referenced / reused as separate entities.

DDI 3.0 Schemes * Courtesy of Achim Wackerow, GESIS-ZUMA

DDI 3.0: Modular Structure
Version 3.0 modular design (modules and schemes):
- Allows flexibility in organizing the DDI Instance
- Supports life cycle model
- Facilitates versioning and maintenance
- Facilitates reuse
- Supports creation of metadata registries
- Supports grouping and comparing studies

New / Extended Functionalities in DDI 3.0: Questionnaire
Versions 1/2:
- No instrument coverage.
- Question text only as part of variable description.
- No documentation for question flow / conditions.
Version 3.0:
- Full description of instrument as a separate entity.
- Documents specific use of questions: flow, conditions, loops.
- Compatible with Computer Assisted Interviewing software.

New / Extended Functionalities in DDI 3.0: Complex Data
Versions 1/2:
- Inadequate representation of complex / hierarchical data
Version 3.0:
- Detailed documentation for complex / hierarchical data
  - Logical structure of records
    - Record types and relationships
    - Relevant variables: key-link, case identification, record type locator
  - Physical layout of records
    - Single “hierarchical” file for all records, multiple rectangular files, relational database, etc.
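
The roles of the record type locator and the key-link variable can be sketched in Python. This is an illustration only: the record types (household, person), the variable names `RECTYPE`, `HHID` and `PID`, and the sample values are hypothetical and are not taken from DDI 3.0 or from any real data file.

```python
from collections import defaultdict

# Hypothetical single "hierarchical" file: every row carries a record type
# locator (RECTYPE: "H" = household, "P" = person) and a key-link (HHID).
rows = [
    {"RECTYPE": "H", "HHID": 1, "REGION": "North"},
    {"RECTYPE": "P", "HHID": 1, "PID": 1, "AGE": 34},
    {"RECTYPE": "P", "HHID": 1, "PID": 2, "AGE": 36},
    {"RECTYPE": "H", "HHID": 2, "REGION": "South"},
    {"RECTYPE": "P", "HHID": 2, "PID": 1, "AGE": 58},
]


def split_by_record_type(rows, locator="RECTYPE"):
    """Split one mixed file into one rectangular table per record type."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[locator]].append(row)
    return dict(tables)


def link_children(parents, children, key="HHID"):
    """Group child records under their parent via the key-link variable."""
    by_key = defaultdict(list)
    for child in children:
        by_key[child[key]].append(child)
    return {parent[key]: by_key[parent[key]] for parent in parents}


tables = split_by_record_type(rows)
members = link_children(tables["H"], tables["P"])
print({hhid: len(persons) for hhid, persons in members.items()})
```

The point of the sketch is that software can only perform this split and link automatically if the documentation states which variable is the locator and which is the key-link.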

New / Extended Functionalities in DDI 3.0: Aggregate Data
Versions 1/2:
- Initially designed for microdata only
- Aggregate data section added in V 2.1 to support limited representation (Census-type data, delimited files)
Version 3.0:
- Adds support for tabular, spreadsheet-type representation of aggregate data
- Aggregate data transport option: cell content may be included inline with the data item description

New / Extended Functionalities in DDI 3.0: Data Transport
Versions 1/2:
- None
Version 3.0:
- In-line inclusion enabled for both aggregate data and microdata

New / Extended Functionalities in DDI 3.0: Longitudinal / Time Series / Cross-national Data Comparability
Versions 1/2:
- None
Version 3.0:
- Grouping structure documents studies related on one or several dimensions (time, geography, language, etc.) as well as their comparability
- Use of inheritance increases markup efficiency and simplifies DDI Instances
- Relational information is embedded in the inheritance structure, which makes comparison machine-actionable
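
A minimal Python sketch of the inheritance idea follows. It assumes the simplest possible rule (a study inherits every group-level value unless it states its own), which is a simplification and not the actual DDI 3.0 inheritance mechanism. The field names and the two waves are hypothetical.

```python
def resolve(group_metadata, study_overrides):
    """Effective metadata for one study: inherit group values, apply local ones."""
    effective = dict(group_metadata)
    effective.update(study_overrides)
    return effective


def differences(a, b):
    """Names of the fields on which two resolved studies differ."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Hypothetical group of two waves of the same survey.
group = {"universe": "Adults 18+", "language": "en", "mode": "telephone"}
waves = {
    "wave1": {"year": 2005},
    "wave2": {"year": 2006, "mode": "web"},
}

effective = {name: resolve(group, local) for name, local in waves.items()}
print(differences(effective["wave1"], effective["wave2"]))
```

Each wave records only what is specific to it, which is the markup saving the slide refers to, and the differences between waves can be computed rather than read off by hand.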

Grouping and Inheritance in DDI 3.0 * Courtesy of Achim Wackerow, GESIS-ZUMA

New / Extended Functionalities in DDI 3.0: Comparability
Comparison Module:
- Maps comparable items from two different schemes:
  - Concepts, variables, coding schemes, categories, questions, universes
  - More planned in future versions
- The comparison is one-to-one, from source to target (i.e. harmonized variable)
* Courtesy of Achim Wackerow, GESIS-ZUMA

New / Extended Functionalities in DDI 3.0: Comparison Module
Comparison Module content:
- Relationship between source and target items
- Textual description of the common aspects of the two items
- Textual description of the differing aspects of the two items
- Formal description of the relationship according to a controlled vocabulary
- A value between 0 and 1 expressing the degree of commonality
- A user-defined property defining the correspondence
- Description of the derivation process (coding schemes)
* Courtesy of Achim Wackerow, GESIS-ZUMA
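
The content listed above can be pictured as one record per source-to-target pair. The Python sketch below covers only some of the listed items; the class name `ItemMap`, its field names, the vocabulary terms and the example variables are all hypothetical and do not reproduce the DDI 3.0 schema.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary; the real DDI 3.0 terms are not given here.
RELATIONSHIPS = {"identical", "similar", "derived"}


@dataclass(frozen=True)
class ItemMap:
    """One one-to-one mapping from a source item to a target item."""
    source: str
    target: str
    common: str          # textual description of the common aspects
    different: str       # textual description of the differing aspects
    relationship: str    # term from the controlled vocabulary
    commonality: float   # degree of commonality, between 0 and 1

    def __post_init__(self):
        if self.relationship not in RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {self.relationship!r}")
        if not 0.0 <= self.commonality <= 1.0:
            raise ValueError("commonality must be between 0 and 1")


mapping = ItemMap(
    source="survey_a.EDUC",
    target="harmonized.EDUC3",
    common="Both record the highest completed level of education.",
    different="Source uses 7 categories; target collapses them to 3.",
    relationship="derived",
    commonality=0.8,
)
print(mapping.relationship, mapping.commonality)
```

Constraining the relationship term and the numeric value is what lets a program filter or rank mappings, whereas the two textual descriptions remain for human readers.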

New / Extended Functionalities in DDI 3.0: Increased Multilingual Support
Versions 1/2:
- Limited
Version 3.0:
- Support for multiple language use and translations
- Example: the same label in German and English: “Geburtsjahr” / “Year of Birth”
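
As a rough illustration of how parallel translations such as “Geburtsjahr” / “Year of Birth” can be stored and selected, the Python sketch below tags each label with the standard XML attribute `xml:lang`. The element names `Variable` and `Label` and the helper `label_for` are simplified, hypothetical stand-ins, not the actual DDI 3.0 element names.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified markup (not real DDI 3.0 element names).
snippet = """
<Variable>
  <Label xml:lang="de">Geburtsjahr</Label>
  <Label xml:lang="en">Year of Birth</Label>
</Variable>
"""


def label_for(variable, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    labels = {el.get(XML_LANG): el.text for el in variable.findall("Label")}
    return labels.get(lang, labels.get(fallback))


variable = ET.fromstring(snippet.strip())
print(label_for(variable, "de"))
print(label_for(variable, "fr"))
```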

DDI 3.0 Specification: Schema-based
Versions 1/2:
- DTD-based
Version 3.0:
- Schema-based:
  - Data typing supports machine actionability
  - Use of namespaces supports:
    - Modularity
    - Extensibility and reuse
    - Alignment with / use of other standards

DDI 3.0 Specification: Machine-actionable
Versions 1/2:
- Machine-readable
Version 3.0:
- Machine-actionable:
  1. Data typing: increased use of controlled vocabularies and standard codes
  2. Larger set of required elements
- Predictable content = a more consistent base for programming

DDI 3.0: Alignment with other metadata standards
Versions 1/2:
- MARC, Dublin Core (bibliographic standards)
Version 3.0:
- MARC, DC, but also…
- SDMX (Statistical Data and Metadata Exchange)
- ISO (Metadata Registries)
- FGDC (Digital Geospatial Metadata)
- ISO (Geographic Information Metadata)
- PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission): forthcoming…

DDI 3.0: Producing DDI 3.0 markup and 2.x to 3.0 mapping
Practical considerations:
- Modular design = heavy reliance on internal/external referencing
- Identification information is mandatory for all identifiable elements (not all elements are identifiable)
- Schema validation does not check for uniqueness of IDs or accuracy of references (needed for XSLT transformations)
- Considerably larger number of mandatory fields (need to be included in partial mappings)
- Machine-actionability = improved processing, but instances are not so easy to produce: increased content intelligence makes it more challenging to automate production
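
Because schema validation leaves ID uniqueness and reference accuracy unchecked, a separate pass over the instance is needed. The Python sketch below shows one way such a pass might look. The element names and the `id` / `ref` attribute names are hypothetical simplifications; real DDI 3.0 identification and reference structures are more elaborate.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified instance with one duplicate ID and one bad reference.
doc = """
<Instance>
  <CategoryScheme id="cs1">
    <Category id="c1"/>
    <Category id="c1"/>
  </CategoryScheme>
  <Variable id="v1"><CategoryRef ref="c1"/></Variable>
  <Variable id="v2"><CategoryRef ref="c9"/></Variable>
</Instance>
"""


def check_ids_and_refs(root, id_attr="id", ref_attr="ref"):
    """Return (duplicate IDs, references that point at no existing ID)."""
    ids = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    duplicates = sorted(i for i, count in ids.items() if count > 1)
    dangling = sorted(
        {
            el.get(ref_attr)
            for el in root.iter()
            if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
        }
    )
    return duplicates, dangling


duplicates, dangling = check_ids_and_refs(ET.fromstring(doc.strip()))
print("duplicate IDs:", duplicates)
print("dangling references:", dangling)
```

Running a check like this before any stylesheet transformation catches the two failure modes named on the slide.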

Converting DDI 2.x to 3.0
Practical challenges:
- Limited conversions (selection of fields) are doable.
- Full conversions (all fields) are more problematic:
  - DDI 3.0 heavy reliance on reuse vs. DDI 2.x inclusive, declarative structure: agency, variable-level universe statements difficult to map to schemes
  - Physical representation (and other sections?) may map incompletely, reducing degree of machine-actionability on converted files
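
The universe-statement problem can be made concrete with a small Python sketch: inline statements repeated on each variable are collapsed into one shared list, and each variable keeps only a reference. The function name, the generated IDs (`U1`, `U2`, …) and the sample variables are hypothetical; this is not the ICPSR conversion stylesheet.

```python
def build_universe_scheme(variables):
    """Collapse per-variable universe text into a shared scheme plus references."""
    ids_by_text = {}
    converted = []
    for var in variables:
        # Normalize whitespace so trivially different copies share one entry.
        text = " ".join(var["universe"].split())
        if text not in ids_by_text:
            ids_by_text[text] = f"U{len(ids_by_text) + 1}"
        converted.append({"name": var["name"], "universe_ref": ids_by_text[text]})
    scheme = {uid: text for text, uid in ids_by_text.items()}
    return scheme, converted


# Hypothetical 2.x-style variables, each carrying its own universe statement.
variables = [
    {"name": "AGE", "universe": "All respondents"},
    {"name": "JOB", "universe": "Employed respondents"},
    {"name": "SEX", "universe": "All  respondents"},
]

scheme, converted = build_universe_scheme(variables)
print(scheme)
print(converted)
```

Exact matching after whitespace cleanup is the easy case. Statements that mean the same thing but are worded differently would still produce separate entries, which is one reason full conversions are harder to automate.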

DDI 3.0 Use Cases
- Documenting an on-going, original research project (life cycle orientation: first stages)
- Documenting secondary use of data (life cycle orientation: secondary analysis/repurposing)
- Creating concept/question/variable banks
- Generating multiple delivery formats for data dissemination/discovery
- Metadata mining for comparison, etc.

DDI 3.0 to Document an On-going Research Project
DDI 3.0 can be used to document a research project in “real time”, from its inception (study proposal, design) through data collection, processing, and initial data production.

[Diagram labels: Purpose; Concepts; Universe; Geography; People/Orgs; Questions; Instrument; Data Collection; Data Processing; Funding; Revisions; Submitted Proposal; $ € £; Archive/Repository; Publication; Variables; Physical Stores; Principal Investigator; Collaborators; Research Staff; Data]

DDI 3.0 to Document an On-going Research Project
Advantages:
- Richer, contextual information made available and preserved.
- Increased accuracy, as life cycle stages are documented “at the source”.
- No loss of information as study progresses through its life cycle.
- Changes in documentation preserved through versioning.
- Ultimately gives data analysts more information to understand and assess data quality.

DDI 3.0 to Document an On-going Research Project
- User-friendly editing/production tools are the best incentive for generating DDI 3.0 instances.
- See … for markup example.

DDI Editor-Lite
“Smart” tool:
- Assigns unique IDs in the background
- Stores lists of entries and creates references as needed when relevant items are invoked
- Can create multiple DDI fields from a single interface entry, as appropriate
- Allows editing/deleting entries
- Provides prompts to user to ensure valid output (mandatory nature of entries, special formats required for dates, language codes, etc.)

DDI 2.x or DDI 3.0?
- Existing DDI 1/2 markup will not necessarily have to be migrated to Version 3.0.
- DDI 2.1 will continue to be maintained for the foreseeable future.
- DDI 3.0 is the obvious immediate choice for documenting complex files, aggregate data, and cross-study comparability, as well as for creating and supporting metadata registries.

DDI at ICPSR
Currently using DDI 2.1:
- Metadata records: study-level descriptions
- Hermes output: variables description
  - Codebook
  - Archival copy
  - SSVD for cross-study variable-level searches

DDI at ICPSR
Moving to DDI 3.0? Main factors to consider:
- Specific needs: what does DDI 3.0 do for us that DDI 2.x doesn’t?
- Available tools: solid production tools are needed in addition to processing tools.

Moving to DDI 3.0?
Current efforts:
- ICPSR stylesheet for metadata record conversion
- ICPSR stylesheet for displaying basic DDI 3 markup
- Variable description mapping DDI 2.1 -> DDI 3.0; will result in conversion stylesheet
- SSVD: compatible with both DDI 2.x and DDI 3.0; currently uploads DDI 2.1 files and outputs DDI 3.0 variable descriptions. May ultimately be used as a conversion tool.

SRO-ICPSR “Data Documentation and Dissemination” project
[Diagram: a common relational database model for data documentation, compliant with DDI, shown for both SRO and ICPSR. Labels: Blaise output; SAS/SPSS/Stata files; DDI 2.x; DDI 3.0; Other…; Client Applications…; Web Applications…; MQDS; ICPSR: Variable-level Search]
ICPSR projects will be able to use documentation generated by SRO projects…

Developing DDI 3.0 tools
DDI Foundation Tools Program:
- DExT: Data Exchange Tools Initiative
- StatsProgs2DDI
- URN generator
- DDI 3.0 parser/validator

Questions?