Workshop on Metadata Standards and Best Practices, November 19-20, 2007. Session 4: The Data Documentation Initiative Technical Overview. Pascal Heus, Open Data Foundation.


Workshop on Metadata Standards and Best Practices, November 19-20, 2007
Session 4: The Data Documentation Initiative Technical Overview
Pascal Heus, Open Data Foundation

Open Data Foundation – IZA 2007/11
Outline
  XML refresher
  DDI Background
  DDI 1/2.x
    – Status / Tools
  DDI 3.0
    – Use cases
    – Need for tools
  Conclusions / Q&A
  Thanks to the DDI Alliance and GESIS for slides on DDI

XML to the rescue!
  XML is driving today's web-service-oriented architecture of the Internet and intranets
  Using XML, we can capture, structure, transform, discover, exchange, query, edit and secure metadata and data
  XML is platform and language independent and can be used by everyone
  XML is both machine and human readable
  XML is non-proprietary and in the public domain, and many open tools exist
  Domain-specific standards are available!

XML Technical Overview
  Capture: XML
  Structure: DTD, XML Schema
  Transform: XSL, XSLT, XSL-FO
  Discover: registries, databases
  Exchange: Web Services, SOAP, REST
  Search: XPath, XQuery
  Manage: software, XForms
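
As a rough illustration of the "capture" and "search" roles listed on this slide, the following Python sketch parses a small metadata record and queries it with the XPath subset supported by the standard library. The element names (`survey`, `variable`, `label`) and the sample record are invented for this example and do not come from DDI or any other standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record: element and attribute names are
# illustrative only and are not taken from any published standard.
SAMPLE = """
<survey id="s1">
  <title>Household Survey 2007</title>
  <variable name="age" type="numeric"><label>Age in years</label></variable>
  <variable name="sex" type="coded"><label>Sex of respondent</label></variable>
</survey>
"""


def variable_labels(xml_text):
    """Return a {variable name: label} mapping for every <variable> child."""
    root = ET.fromstring(xml_text)
    return {v.get("name"): v.findtext("label") for v in root.findall("./variable")}


def coded_variables(xml_text):
    """Use an XPath attribute predicate to select only the coded variables."""
    root = ET.fromstring(xml_text)
    return [v.get("name") for v in root.findall(".//variable[@type='coded']")]


if __name__ == "__main__":
    print(variable_labels(SAMPLE))
    print(coded_variables(SAMPLE))
```

The same document could equally be transformed with XSLT or queried with XQuery; the standard-library parser is used here only to keep the sketch self-contained.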

What is DDI?
  The Data Documentation Initiative (DDI) is an XML format for capturing metadata about survey data and register data
  Data files may remain in their native formats (ASCII files, which may be delimited or fixed-width) or may be captured as XML
  It was originally designed to describe codebooks, and was mainly useful for data archives and libraries
    – Versions 1.*/2.*
  Now, it can be used for any type of data collection
    – Version 3.0
    – Focus on survey instrumentation and microdata, but it can also describe aggregates

Background
  Concept of DDI and definition of needs grew out of the data archival community
  Established in 1995 as a grant-funded project initiated and organized by ICPSR
  Members:
    – Social science data archives (US, Canada, Europe)
    – Statistical data producers (including the US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada)
  February 2003 – Formation of the DDI Alliance
    – Membership-based alliance
    – Formalized development procedures

Growth of the DDI Structure
  2000 – DDI 1.0
    – Simple survey
    – Archival data formats
    – Microdata only
  2003 – DDI 2.0
    – Aggregate data (based on matrix structure)
    – Added geographic material to aid geographic search systems and GIS users

Characteristics of DDI 1.0/2.0
  Focuses on the static object of a codebook
  Designed for limited uses
    – End-user data discovery via the variable or high-level study identification (bibliographic)
    – Only heavily structured content relates to information used to drive statistical analysis
  Coverage is focused on single study, single data file, simple survey and aggregate data files
  Variable contains the majority of information (question, categories, data typing, physical storage information, statistics)

1/2.x Schema
  Organized in 5 sections
    – docDscr: information about the XML document itself (metadata preparation, version)
    – stdyDscr: detailed information about the survey (title, year, coverage, sampling, data collection/cleaning, quality, contact, access policy)
    – fileDscr: describes the files in the dataset
    – dataDscr: describes the data structure
        Variable: name, label, code, summary statistics, definitions, literal question, interviewer instructions, weights, grouping, etc.
        Cubes: aggregated data
    – otherMat: additional documentation
  See examples
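
The five-section layout can be sketched in Python as follows. This is a minimal skeleton only: it assumes the DDI 2.x spellings `codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`, `citation`, `titlStmt`, `titl`, `var` and `labl`, it omits most required content, and it has not been validated against the DDI DTD. The helper name `build_codebook` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, variables):
    """Build a bare five-section codebook skeleton.

    `variables` is a list of (name, label) tuples. Element names are
    assumed from DDI 2.x; the result is illustrative, not DTD-valid.
    """
    root = ET.Element("codeBook")

    # 1. Document description: metadata about the XML document itself.
    ET.SubElement(root, "docDscr")

    # 2. Study description: here reduced to just the study title.
    stdy = ET.SubElement(root, "stdyDscr")
    titl_stmt = ET.SubElement(ET.SubElement(stdy, "citation"), "titlStmt")
    ET.SubElement(titl_stmt, "titl").text = title

    # 3. File description: left empty in this sketch.
    ET.SubElement(root, "fileDscr")

    # 4. Data description: one <var> per variable, with its label.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables:
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label

    # 5. Other material: additional documentation.
    ET.SubElement(root, "otherMat")
    return root


if __name__ == "__main__":
    book = build_codebook("Household Survey 2007", [("age", "Age in years")])
    print(ET.tostring(book, encoding="unicode"))
```

Note how nearly all substantive content hangs off the variable, which matches the variable-centric character of the early versions described above.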

Limitations
  Treated as an add-on to the data collection process
  Focus is on the data end product and end users (static)
  Limited tools for creation or exploitation
  The variable must exist before metadata can be created
  Producers are hesitant to take up DDI creation because it is a cost and does not support their development or collection process

Requirements for 3.0
  Improve and expand the machine-actionable aspects of the DDI to support programming and software systems
  Support CAI instruments through expanded description of the questionnaire (content and question flow)
  Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.)
  Support comparison, in particular comparison by design but also comparison after the fact (harmonization)
  Improve support for describing complex data files (record and file linkages)
  Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.)

Approach
  Shift from the codebook-centric model of early versions of DDI to a lifecycle model
    – Providing metadata support from study conception through analysis and repurposing of data
  Shift from an XML Document Type Definition (DTD) to an XML Schema model
    – To support the lifecycle model, reuse of content and increased controls to support programming needs
  Redefine a single DDI instance to include a simple instance (similar to DDI 1.*/2.*, which covered a single study) and complex instances covering groups of related studies
    – Allow a single study description to contain multiple data products (for example, a microdata file and aggregate products created from the same data collection)
  Incorporate the requested functionality in the first published edition

Designing to support registries
  Resource package
    – Structure to publish non-study-specific materials for reuse
  Extracting specified types of information into schemes
    – Universe, Concept, Category, Code, Question, Instrument, Variable
  Allowing for either internal or external references
  Providing comparison mapping
    – Target can be an external harmonized structure
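
The idea of extracting reusable items into schemes and pointing at them by internal or external reference might be modeled as in the Python sketch below. All class names, fields and identifiers (`Scheme`, `Reference`, `resolve`, `example.org`) are hypothetical and chosen for illustration; this is not the DDI 3.0 identification or reference syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Scheme:
    """A package of reusable items (e.g. categories) kept by one agency."""
    agency: str
    scheme_id: str
    items: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Reference:
    """Pointer to an item; agency=None marks an internal reference."""
    scheme_id: str
    item_id: str
    agency: Optional[str] = None


def resolve(ref: Reference,
            local_schemes: Dict[str, Scheme],
            registry: Dict[Tuple[str, str], Scheme]) -> str:
    """Look the item up locally (internal) or in a registry (external)."""
    if ref.agency is None:
        scheme = local_schemes.get(ref.scheme_id)
    else:
        scheme = registry.get((ref.agency, ref.scheme_id))
    if scheme is None or ref.item_id not in scheme.items:
        raise KeyError(f"unresolved reference: {ref}")
    return scheme.items[ref.item_id]


# Hypothetical data: one scheme held inside the study's own metadata,
# one published by an outside agency in a shared registry.
LOCAL = {"yesno": Scheme("my.agency", "yesno", {"y": "Yes", "n": "No"})}
REGISTRY = {
    ("example.org", "concepts"): Scheme(
        "example.org", "concepts", {"employment": "Employment status"}
    )
}

if __name__ == "__main__":
    print(resolve(Reference("yesno", "y"), LOCAL, REGISTRY))
    print(resolve(Reference("concepts", "employment", "example.org"), LOCAL, REGISTRY))
```

Keeping the lookup key identical for both cases is the design point: a study can start with local schemes and later switch to registry-hosted ones without changing how items are referenced.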

Relationship to Other Standards
  Dublin Core
    – Mapping of citation elements
    – Option for DC namespace basic entry
  ISO – Geography
    – Search requirements
    – Support for GIS users
  METS
    – Designed to support profile development
  OAIS
    – Reference model for the archival lifecycle
  SDMX
    – Complete mapping to and from DDI NCubes
    – Designed to be used with registries
  ISO/IEC
    – Variable linking representation to concept and universe
    – Optional data element construct in ConceptualComponent that allows for complete ISO/IEC structure as a maintained item

Development of DDI
    – Acceptance of a new DDI paradigm
    – Lifecycle model
    – Shift from the codebook-centric / variable-centric model to capturing the lifecycle of data
    – Agreement on expanded areas of coverage
  2005
    – Presentation of schema structure
    – Focus on points of metadata creation and reuse
  2006
    – Presentation of first complete 3.0 model
    – Internal and public review
  2007
    – Vote to move to Candidate Version
    – Establishment of a set of use cases to test application and implementation
  2008
    – March: anticipated vote to publish DDI 3.0

XML Schemas and 3.0 Modules (one is not necessarily the other)
  XML Schemas
    – Each .xsd file is an XML schema
    – Some XML schemas are modules
    – Some XML schemas are substitution sets or sub-modules
    – Some XML schemas simply contain elements that are used by multiple schemas or may require more frequent updates
    – Some XML schemas are external

XML Schemas and 3.0 Modules (one is not necessarily the other)
  Modules
    – Reflect closely related sets of information, similar to the sections of the DDI 1.*/2.* DTD
    – Modules can be held as separate XML instances and be included in a larger instance by either inclusion or reference
    – All modules are maintainable, identifiable packages
    – Each module has its own XML namespace
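
Because each module has its own XML namespace, software reading an instance has to qualify element names by namespace. The Python sketch below shows the general mechanism with the standard library. The namespace URIs (`urn:example:...`) and the element names are placeholders invented for this example; they are not the real DDI 3.0 namespaces or schema content.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces, one per hypothetical module.
NS = {
    "i": "urn:example:instance",
    "s": "urn:example:studyunit",
    "r": "urn:example:reusable",
}

DOC = """<i:Instance xmlns:i="urn:example:instance"
            xmlns:s="urn:example:studyunit"
            xmlns:r="urn:example:reusable">
  <s:StudyUnit id="su1"><r:Title>Labour Force Survey</r:Title></s:StudyUnit>
  <s:StudyUnit id="su2"><r:Title>Household Budget Survey</r:Title></s:StudyUnit>
</i:Instance>"""


def study_titles(xml_text):
    """Map each study unit id to its title, using prefixed lookups."""
    root = ET.fromstring(xml_text)
    return {
        su.get("id"): su.findtext("r:Title", namespaces=NS)
        for su in root.findall("s:StudyUnit", NS)
    }


def namespaces_used(xml_text):
    """Return the set of namespace URIs that elements in the document use."""
    root = ET.fromstring(xml_text)
    # ElementTree stores qualified tags as "{namespace-uri}localname".
    return {el.tag[1:].split("}")[0] for el in root.iter() if el.tag.startswith("{")}


if __name__ == "__main__":
    print(study_titles(DOC))
    print(sorted(namespaces_used(DOC)))
```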

XML SCHEMAS
  archive, comparative, conceptualcomponent, datacollection, dataset, dcelements, DDIprofile, ddi-xhtml11, ddi-xhtml11-model-1, ddi-xhtml11-modules-1, group, inline_ncube_recordlayout, instance, logicalproduct, ncube_recordlayout, organization, physicaldataproduct, physicalinstance, reusable, simpledc, studyunit, tabular_ncube_recordlayout, xml
  (ddi-xhtml11*: set of XML schemas to support XHTML)

Basic Structure/Organization
  DDI 3.0 is divided into modules
  Each contains a set of related metadata
  Reusable metadata is divided into schemes
  Modules reflect the steps of the data lifecycle

DDI 3.0 Modules
  Main modules are:
    – Study Unit (contains a simple study description)
    – Conceptual Component
    – Data Collection (survey instruments, questions, sources)
    – Logical Product (concepts, variables, codes, categories)
    – Physical Storage (describes patterns of storage and physical instances/files)
    – Archive (organizations and processing events)
    – Group (comparing and grouping study units)
    – Comparative (allows for explicit comparisons between grouped studies)
  See also descriptions.html

Maintainable Schemes (that's with an "e", not an "a")
  Concept Scheme
  Universe Scheme
  Question Scheme
  Control Construct Scheme
  Category Scheme
  Code Scheme
  Variable Scheme
  Packages of reusable metadata maintained by a single agency

DDI 3.0
  Look at schema (Candidate release 2)
  Look at examples (prototype XML)

DDI Lifecycle View and Use Cases

Our Initial Thinking…
  The metadata payload from version 2.* DDI was reorganized to cover these areas.

Wrapper
  For later parts of the lifecycle, metadata is reused heavily from earlier modules.
  The discovery and analysis itself creates data and metadata, reused in future cycles.

Realizations
  Many different organizations and individuals are involved throughout this process
    – This places an emphasis on versioning and exchange between different systems
  There is potentially a huge amount of metadata reuse throughout an iterative cycle
    – We needed to make the metadata as reusable as possible
  Every organization acts as an "archive" (that is, a maintainer and disseminator) at some point in the lifecycle
    – When we say "archive" in DDI 3.0, it refers to this function

DDI 3.0 Lifecycle Model
  Metadata Reuse

Use Cases
  Study design/survey instrumentation
  Questionnaire generation/data collection and processing
  Data recoding, aggregation and other processing
  Data dissemination/discovery
  Archival ingestion/metadata value-add
  Question/concept/variable banks
  DDI for use within a research project
  Capture of metadata regarding data use
  Metadata mining for comparison, etc.
  Generating instruction packages/presentations

Study Design/Survey Instrumentation
  This use case concerns how DDI 3.0 can support the design of studies and survey instrumentation
    – Without benefit of a question or concept bank

Questionnaire Generation, Data Collection, and Processing
  This use case concerns how DDI 3.0 can support the creation of various types of questionnaires/CAI, and the collection and processing of raw data into microdata.
  Algenta is working on DDI 3.0-based software

Data Recoding, Aggregation, etc.
  This use case concerns how DDI 3.0 can describe recodes, aggregation, and similar types of data processing.
  Relevant to both producer and researcher
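
To make "describing a recode" concrete, the Python sketch below keeps the recode rules as plain data, separate from the code that applies them, so the same description can both document and drive the processing. The structure of `RECODE_AGE_GROUP` and the function names are assumptions for illustration and do not reflect how DDI 3.0 actually encodes recodes.

```python
from collections import Counter

# Hypothetical machine-actionable recode description: source variable,
# derived variable, and inclusive (low, high, label) ranges.
RECODE_AGE_GROUP = {
    "source": "age",
    "target": "age_group",
    "rules": [(0, 14, "child"), (15, 64, "working age"), (65, 200, "senior")],
}


def apply_recode(records, recode):
    """Return copies of the records with the derived variable added."""
    out = []
    for rec in records:
        value = rec[recode["source"]]
        # First matching range wins; unmatched values are recoded to None.
        label = next(
            (lab for low, high, lab in recode["rules"] if low <= value <= high),
            None,
        )
        out.append({**rec, recode["target"]: label})
    return out


def aggregate(records, variable):
    """Count records per category of `variable` (a simple aggregate)."""
    return dict(Counter(rec[variable] for rec in records))


if __name__ == "__main__":
    microdata = [{"age": 10}, {"age": 30}, {"age": 70}, {"age": 40}]
    recoded = apply_recode(microdata, RECODE_AGE_GROUP)
    print(aggregate(recoded, "age_group"))
```

Because the rules are data rather than code, a producer and a researcher can exchange the same description and reproduce the derived variable independently.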

Data Dissemination/Data Discovery
  This use case concerns how DDI 3.0 can support the discovery and dissemination of data.
  Highly relevant to researchers

Archival Ingestion and Metadata Value-Add
  This use case concerns how DDI 3.0 can support the ingest and migration functions of data archives and data libraries.

Question/Concept/Variable Banks
  This use case describes how DDI 3.0 can support question, concept, and variable banks. These are often termed "registries" or "metadata repositories" because they contain only metadata; links to the data are optional, but provide implied comparability. The focus is metadata reuse.
  Concept classification is very important to researchers

DDI for Use within a Research Project
  This use case concerns how DDI 3.0 can support various functions within a research project, from the conception of the study through collection and publication of the resulting data.
  Direct use in RDC

Capture of Metadata Regarding Data Use
  This use case concerns how DDI 3.0 can capture information about how researchers use data, which can then be added to the overall metadata set about the data sources they have accessed.
  Data use and user feedback are crucial to improving overall quality and future data production (relevance)

Metadata Mining for Comparison, etc.
  This use case concerns how collections of DDI 3.0 metadata can act as a resource to be explored, providing further insight into the comparability and other features of a collection of data.

Generating Instruction Packages/Presentations
  This use case concerns how DDI 3.0 can support automation around the instruction of students and others.

Tools
  DDI 1/2.x
    – IHSN Microdata Management Toolkit
    – Nesstar
    – Dextris
  DDI 3.0
    – Foundation Tools Platform
    – UKDA DExT
    – DDI 3.0 Use case
    – Algenta SurveyWiz
    – Dextris

DDI for RDC
  A small set of DDI 1/2.x tools are available today
    – Users can generate DDI from internal databases, use conversion utilities (see DDI web site) or use software like Nesstar Publisher and the IHSN Microdata Management Toolkit
  DDI 3.0 has a much broader scope and provides both core and advanced functionalities that will require management tools
    – The next-generation metadata framework is being built as the standard is being finalized
    – The DDI Foundation Tools Program is an umbrella for an implementers' startup toolkit

Conclusions
  The first generation of DDI is suitable for data archives interested in the preservation of metadata and discovery by users
  DDI 3.0 focuses on the entire lifecycle of the survey and is suitable for many different uses
    – More relevant to the RDC environment
  DDI 3.0 calls for coordinated efforts to build relevant tools for producers, archives, researchers and other users