
Metadata Management and Tools August 1, 2013 Data Curation Course.


1 Metadata Management and Tools August 1, 2013 Data Curation Course

2 Outline General information about metadata Metadata and the data life cycle DDI – a specification for documenting social, behavioral and economic data Exercises

3 Defining Metadata Metadata are commonly described as “data about data” Metadata serve as “bridge” between data producer and data user Metadata bring data to life, helping user to interpret and understand data

4 Simple Example Bad Better… Best (Rich, Structured)

5 Importance of Metadata John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.* http://www.youtube.com/watch?v=xlQMVV7VJtA * Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/

6 Concerns About Creating Metadata
Concern: workload required to capture accurate, robust metadata. Solution: incorporate metadata creation into the data development process and distribute the effort.
Concern: time and resources to create, manage, and maintain metadata. Solution: include in grant budget and schedule.
Concern: readability / usability of metadata. Solution: use a standardized metadata format.
Concern: discipline-specific information and ontologies. Solution: ‘profile’ the standard to require specific information and use specific values.
Source: DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013

7 Metadata Types Types of metadata, by content:* –Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data –Technical: Physical and digital features of a data resource –Structural: Configuration of a resource, connections and relationships among parts, or among related resources * Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe

8 Metadata and the Data Life Cycle Metadata–driven life cycle: Metadata are created, but also used and reused at every stage of the data life cycle Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset

9 Metadata and the Data Life Cycle Rich metadata = smooth life cycle, high quality data

10 Structured Metadata Enhances the value and usability of metadata A consistent, predictable metadata structure enables –More effective searches –Automated management and processing –Resource sharing –Interoperability Standardization leads to greater efficiency
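To illustrate why a consistent, predictable structure enables more effective searches, here is a minimal Python sketch. The `StudyRecord` fields, the sample records, and the `find_studies` helper are all hypothetical and are not drawn from any real catalog; the point is only that named fields can be queried reliably, whereas free-text notes cannot.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    """A hypothetical structured metadata record (field names are illustrative)."""
    title: str
    keywords: list = field(default_factory=list)
    start_year: int = 0
    end_year: int = 0


def find_studies(records, keyword=None, year=None):
    """Return titles of records matching an optional keyword and coverage year."""
    hits = []
    for rec in records:
        if keyword is not None:
            lowered = [k.lower() for k in rec.keywords]
            if keyword.lower() not in lowered:
                continue
        if year is not None and not (rec.start_year <= year <= rec.end_year):
            continue
        hits.append(rec.title)
    return hits


# Made-up sample records, used only to exercise the search.
RECORDS = [
    StudyRecord("Household Health Survey", ["health", "households"], 2005, 2007),
    StudyRecord("Voter Attitudes Panel", ["elections", "attitudes"], 2008, 2012),
    StudyRecord("Community Health Panel", ["Health", "community"], 2010, 2011),
]

if __name__ == "__main__":
    print(find_studies(RECORDS, keyword="health"))
    print(find_studies(RECORDS, keyword="health", year=2010))
```

Because every record exposes the same fields, the same query logic works across the whole collection, which is the property that automated management and interoperability also rely on.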

11 Metadata Standards Examples: Dublin Core; Data Documentation Initiative (DDI); Ecological Metadata Language (EML); Astronomy Visualization Metadata (AVM); Darwin Core; FGDC Content Standard for Digital Geospatial Metadata (CSDGM); ISO 19115/19139 Geographic information

12 Standards Cartoon courtesy of XKCD.com

13 What is DDI? A metadata standard of and for the community Two major development lines –DDI Codebook –DDI Lifecycle Metadata for both human and machine consumption Additional specifications: –Controlled vocabularies –RDF vocabularies for use with Linked Data

14 DDI Background and History Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation First version published in February 2000

15 Background and History Continued The DDI Alliance (http://www.ddialliance.org/) was formed in 2003 to support and develop the DDI standard Ever-growing number of DDI users; large multinational projects –CESSDA data portal (20 European data archives) –International Household Survey Network – IHSN (developing countries from Africa, Asia, former Soviet Union, and more recently, Latin America)

16 DDI Members and Projects Worldwide

17 DDI Specification The first versions of DDI (1.0 through 2.1) were document- and codebook-centric Version 3.0 was published in April 2008 to document the data life cycle

18 RDF Vocabularies for Semantic Web DDI-RDF Discovery Vocabulary o For publishing metadata about datasets into the Web of Linked Data o Based on DDI Codebook and DDI Lifecycle XKOS o RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary Publication expected in second half of 2013

19 DDI of the Future Robust and persistent data model (for the metadata), with extension possibilities, variety of technical expressions Complete data life cycle coverage Broadened focus for new research domains Simpler specification that is easier to understand and use including better documentation

20 Benefits of DDI Approach Rich content (currently over 800 items) Metadata reuse across the life cycle Machine-actionability Data management and curation Support for longitudinal data and comparison

21 Metadata Reuse

22 DDI Alignment with Other Metadata Standards MARC: DDI-C, DDI-L Dublin Core: DDI-C, DDI-L SDMX (Statistical Data and Metadata Exchange): DDI-L ISO 11179 (Metadata Registries): DDI-L FGDC (Digital Geospatial Metadata): DDI-L ISO 19115 (Geographic Information Metadata): DDI-L PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission Standard): under consideration
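Alignment between standards is usually put into practice as a crosswalk, that is, a field-to-field mapping. The Python sketch below shows the idea for a study-level record mapped to Dublin Core element names. The source field names and the mapping itself are assumptions made for illustration; this is not the official DDI to Dublin Core crosswalk, which should be taken from the DDI Alliance documentation.

```python
# Hypothetical study-level field names mapped to Dublin Core element names.
# The pairing is an illustrative assumption, not an official crosswalk.
CROSSWALK = {
    "title": "title",
    "principal_investigator": "creator",
    "abstract": "description",
    "keywords": "subject",
    "geographic_coverage": "coverage",
    "access_conditions": "rights",
}


def to_dublin_core(study):
    """Map a study-level record (a dict) to Dublin Core style element lists."""
    dc = {}
    for field_name, value in study.items():
        target = CROSSWALK.get(field_name)
        if target is None:
            # No equivalent in this sketch; richer source detail is lost here.
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(values)
    return dc


if __name__ == "__main__":
    sample = {  # made-up record
        "title": "Example Household Survey",
        "principal_investigator": ["Doe, Jane", "Roe, Richard"],
        "keywords": ["health", "income"],
        "response_rates": "not mapped in this sketch",
    }
    print(to_dublin_core(sample))
```

The unmapped `response_rates` field shows the usual cost of moving from a richer standard to a simpler one: some detail has no place to go.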

23 DDI-L or DDI-C? DDI-L –Complex data (hierarchical, longitudinal, comparative) –Metadata-driven survey design (building questionnaires) –Multiple languages –Detailed geographic information –Metadata reuse across the data life cycle –Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.

24 DDI-L or DDI-C? DDI-C –Documentation of simple, survey-type data –Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level) Both DDI-C and DDI-L may be used within the same organization ICPSR uses DDI-C but has translation to DDI-L for study-level records

25 DDI-C Structure and Contents DDI-C main sections:
1. Document Description: Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed
2. Study Description: General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information
3. File Description: Describes physical characteristics of data file(s) – name, format, structure, dimensions
4. Data Description: Detailed description of each variable, including variable groups if applicable. Special subsection for documenting census-type aggregate data
5. Other (Study Related) Materials: References or contains materials used in the production of the study or useful in the analysis of the data
For complete content and Tag Library see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
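The five sections can be sketched as an empty XML skeleton using Python's standard library. The element names used here (`codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, `otherMat`) are the DDI Codebook names as best recalled and should be checked against the Tag Library linked above. The result is a structural outline only: it omits namespaces and required child elements, so it is not a valid DDI instance.

```python
import xml.etree.ElementTree as ET

# (element name, section title) pairs; element names are assumed to match
# the DDI Codebook Tag Library and should be verified there.
SECTIONS = [
    ("docDscr", "Document Description"),
    ("stdyDscr", "Study Description"),
    ("fileDscr", "File Description"),
    ("dataDscr", "Data Description"),
    ("otherMat", "Other (Study Related) Materials"),
]


def build_codebook_skeleton():
    """Return a codeBook element containing one empty child per main section."""
    root = ET.Element("codeBook")
    for tag, title in SECTIONS:
        section = ET.SubElement(root, tag)
        # Record the human-readable section title as an XML comment.
        section.append(ET.Comment(" " + title + " "))
    return root


def section_tags(root):
    """List the tags of the top-level sections, in document order."""
    return [child.tag for child in root]


if __name__ == "__main__":
    skeleton = build_codebook_skeleton()
    print(ET.tostring(skeleton, encoding="unicode"))
```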

26 Study-level DDI Elements at ICPSR Study ID (Number, DOI) Title, Alternate Title Author/Primary Investigator Bibliographic Citation Funding Information Abstract Keywords/Topic Classification Series Information Geographic Coverage Time Period Covered Time Method Date(s) of Collection Mode of Collection Universe Sampling Unit of Analysis Response Rates Weighting Information Data Type Extent of Processing Access Conditions/Restrictions Version History

27 Study-level DDI at ICPSR Leveraged in several ways o Data discovery -- Forms basis of Solr/Lucene faceted search o Repurposing -- Record is reused across ICPSR’s topical archive sites o Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives o Study Overview -- Becomes PDF overview bundled with each download Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103
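Faceted search depends on study-level fields holding consistent values that can be counted. The Python sketch below computes facet counts in memory as a stand-in for what a search engine such as Solr does at scale; it does not use Solr, and the field names and records are made up for illustration.

```python
from collections import Counter


def facet_counts(records, field_name):
    """Count how many records carry each value of a study-level field."""
    counts = Counter()
    for rec in records:
        values = rec.get(field_name, [])
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-item list
        counts.update(values)
    return counts


# Made-up study-level records with hypothetical field names.
STUDIES = [
    {"title": "Study A", "geography": "United States", "data_type": "survey data"},
    {"title": "Study B", "geography": ["United States", "Canada"], "data_type": "survey data"},
    {"title": "Study C", "geography": "Canada", "data_type": "administrative records data"},
]

if __name__ == "__main__":
    print(facet_counts(STUDIES, "geography"))
    print(facet_counts(STUDIES, "data_type"))
```

Counts like these only make sense when values come from a controlled list; "United States" and "USA" entered by hand would split into two facets.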

28 DDI at ICPSR: Study-level Metadata Editor

29

30 Variable-level DDI elements at ICPSR Variable name and ID Variable label Question text Descriptive variable text Category labels and values (responses) Category statistics (frequencies) Summary statistics Variable format Notes
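A variable description carrying several of these elements might be assembled as in the Python sketch below. The element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`, `catStat`) are DDI Codebook names as best recalled and should be confirmed against the Tag Library; the variable, question text, and frequencies are invented for illustration, and the fragment is not a complete or validated DDI instance.

```python
import xml.etree.ElementTree as ET


def build_variable(name, label, question, categories):
    """Build a var element; categories is a list of (value, label, frequency)."""
    var = ET.Element("var", {"ID": name, "name": name})
    ET.SubElement(var, "labl").text = label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question
    for value, cat_label, freq in categories:
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(value)
        ET.SubElement(catgry, "labl").text = cat_label
        ET.SubElement(catgry, "catStat", {"type": "freq"}).text = str(freq)
    return var


def category_frequencies(var):
    """Read category labels and frequencies back out of a var element."""
    freqs = {}
    for catgry in var.findall("catgry"):
        freqs[catgry.findtext("labl")] = int(catgry.findtext("catStat"))
    return freqs


if __name__ == "__main__":
    # Hypothetical variable with invented frequencies.
    v1 = build_variable(
        "V1",
        "Self-rated health",
        "How would you rate your health?",
        [(1, "Good", 60), (2, "Poor", 40)],
    )
    print(ET.tostring(v1, encoding="unicode"))
    print(category_frequencies(v1))
```

Reading the frequencies back from the XML is the same step that lets a system generate a codebook with frequencies or index question text for search.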

31 Variable-level DDI at ICPSR Variable-level DDI leveraged in several ways o Search -- Permits search of variables within a dataset/series o Search across ICPSR -- Serves as foundation for Social Science Variables Database o Integration with online analysis o Codebook with frequencies -- Enables generation of PDF documentation Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25

32 Tools for generating DDI metadata Nesstar Publisher –DDI-C, study, file, and variable level Colectica –DDI-L configuration, study and variable level –Both DDI-C and DDI-L compatible (import and export) –Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats) Colectica for Excel

33 Tools continued XCONVERT (SDA Berkeley) –DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies StatTransfer (v. 11) –DDI-L, variable level: no frequencies MQDS tool –Exports Blaise to DDI-L to create study documentation

34 Tools continued More DDI tools can be found here: http://www.ddialliance.org/resources/tools

35 Questions?

