Metadata Issues Underlying the Development of a Data Repository for Evolutionary Biology
Sarah Carrier, SILS, Master’s Student
Jackson Dube, Visiting Scholar, SILS/MRC
Jane Greenberg, Associate Professor, Director SILS/Metadata Research Center, UNC-CH
Ruth Monnig, Doctoral Research Assistant, SILS/MRC

Overview
1. Metadata defined
2. Role of metadata in a repository
3. Range of metadata standards
4. Issues
5. Discussion

The Knowledge Network for Biocomplexity (KNB)
Metadata Example for a specimen
Family: Pinaceae
Species: Pinus serotina
Date identified:
County: Pasquotank County
Location collected: Woodland Border, 2.3 miles north east of Nisonton
Collected by: Harry E. Ahles
Pinus serotina

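The specimen label above can be read as a small structured record. The following is a minimal sketch in Python, assuming hypothetical field names (`family`, `species`, `date_identified`, and so on) chosen only for illustration and not taken from any metadata standard. It also reports which fields have no value, such as the date identified, which is blank on the slide.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """Illustrative specimen metadata; field names are hypothetical."""
    family: str
    species: str
    county: str
    location_collected: str
    collected_by: str
    date_identified: Optional[str] = None  # blank on the slide


def missing_fields(record: SpecimenRecord) -> List[str]:
    """Return the names of fields that have no value."""
    return [name for name, value in asdict(record).items() if not value]


record = SpecimenRecord(
    family="Pinaceae",
    species="Pinus serotina",
    county="Pasquotank County",
    location_collected="Woodland Border, 2.3 miles north east of Nisonton",
    collected_by="Harry E. Ahles",
)

# Print each field, marking the ones that were never filled in.
for name, value in asdict(record).items():
    print(f"{name}: {value if value else '(not recorded)'}")
print("Missing:", missing_fields(record))
```

A check like `missing_fields` is one simple way to surface incomplete metadata before a record is deposited; what counts as a required field would depend on the repository.
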
Metadata for a Water Quality Study
1. Jordan Lake Study
Study Procedure Information
5. The following 15 variables were used to measure water quality over a two-year period.
6. Salinity is described as XXXXX

Metadata
Data about the content, quality, condition, and other characteristics of data (FGDC Glossary, 1992)
Additional information necessary for data to be useful (Musik, 1997)
Resource = data = object = entity = document = data object

Why metadata?
Facilitate discovery
Permit use – intellectual and technical
Manage and preserve
Secure
Help advance the field of evolutionary biology

Range of published data objects
Table, graph
Dataset
Research methods / procedures
  –Bayesian inference of phylogeny
  –Meta-analysis
  –Computational biology

Metadata continuum
FGDC/CSDGM, EML, Dublin Core, DDI

The Knowledge Network for Biocomplexity (KNB) *
The Knowledge Network for Biocomplexity (KNB) *
Ontologies
Data structures

Issues
Cost
  –More metadata, more cost to produce
  –Less metadata, cost to users
Metadata creation
  –Who, when, how?
  –Incentivizing
Preservation, sustainability
  –Data object and associated metadata
Open access (“a loaded word”)
  –What levels of access/rights should be supported

Discussion Topics
Range of data objects
Granularity (metadata)
Users: Needs, greater use
Additional issues…

Metadata types and properties
Metadata “type”                                     Property, etc.
*Resource/data discovery                            Title, subject
Provenance                                          Creator, source
Terms and conditions metadata (intellectual use)    Access rights, manipulation rights
Structural metadata (technical use)                 Software and hardware needs
*Resource = data = object = entity = document = data object

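The pairing of metadata types with example properties can be expressed as a simple lookup. The sketch below is illustrative only: the dictionary transcribes the slide's rows, the function name `type_of` is hypothetical, and splitting "software and hardware needs" into two entries is an assumption made for the example.

```python
from typing import Dict, List, Optional

# Metadata types and example properties, transcribed from the slide.
# Splitting "software and hardware needs" into two entries is an assumption.
METADATA_TYPES: Dict[str, List[str]] = {
    "Resource/data discovery": ["title", "subject"],
    "Provenance": ["creator", "source"],
    "Terms and conditions (intellectual use)": [
        "access rights",
        "manipulation rights",
    ],
    "Structural (technical use)": ["software needs", "hardware needs"],
}


def type_of(prop: str) -> Optional[str]:
    """Return the metadata type a property belongs to, or None if unlisted."""
    wanted = prop.strip().lower()
    for metadata_type, properties in METADATA_TYPES.items():
        if wanted in properties:
            return metadata_type
    return None


print(type_of("Creator"))
print(type_of("access rights"))
print(type_of("salinity"))
```

A property that is not listed, such as a measured variable like salinity, returns `None` here; it describes the data's content rather than one of the four metadata types on the slide.
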
Range of metadata standards
Schemes (just a few…)
  LSID
  TEI Header
  MARC bibliographic format
  Dublin Core
  EAD
  FGDC/CSDGM
  NBII
  EML
  DDI
  ODRL (Creative Commons Profile)
  A Core
  PREMIS
Characteristics
  Objectives and principles
  Domains
    –Environment
    –Object type/format
  Architectural Layout
    –Extent
    –Level of Complexity: flat, hierarchical
    –Granularity

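The difference between a flat and a hierarchical layout can be shown with a small sketch. The nested record below is hypothetical; its values are borrowed from the water quality slide purely for illustration, and the dotted-key convention used by `flatten` is an assumption, not a feature of any scheme listed above.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Collapse a nested (hierarchical) record into flat, dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested elements, carrying the parent path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical hierarchical record; the structure is illustrative only.
hierarchical = {
    "title": "Jordan Lake Study",
    "methods": {
        "variables": 15,
        "duration": "two years",
    },
}

print(flatten(hierarchical))
```

A flat scheme keeps every element at one level, which is simpler to create and index, while a hierarchical scheme can group related elements at the cost of added complexity.
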
Range of metadata standards
Data structure standards
Data communication standards
Data value standards
  –Content representation, ontologies, authority files
Data syntax standards
Data models, architectures/packaging