Presentation is loading. Please wait.

Presentation is loading. Please wait.

“Classifying Scientific Data Objects with Bibliographic Relationship

Similar presentations

Presentation on theme: "“Classifying Scientific Data Objects with Bibliographic Relationship"— Presentation transcript:

1 “Classifying Scientific Data Objects with Bibliographic Relationship
Taxonomies” ~~~~~~ ASIS&T 2007 Annual Meeting Sunday, October 21, 2007 Jane Greenberg, Frances McColl Professor; Director, SILS/Metadata Research Center School of Information and Library Science Univ. of North Carolina at Chapel Hill

2 Overview DRIADE – (Digital Repository of Information and Data for Evolution) Accomplishments: basic metadata challenges Functional requirements, underlying metadata Instantiation Motivation Research design Preliminary results Conclusions -adaptation natural selection

3 Small science, open access
Small science repositories (SSR) Knowledge Network for Biocomplexity (KNB), Marine Metadata Initiative (MMI) Evolutionary biology Publication process Supplementary data (Molecular Biology and Evolution, American Naturalists) “Author,” “deposition date,” not “subject” “species,” ”geo. locator” Data deposition (Genbank, TreeBase, Morphbank) NESCent and SILS/Metadata Research Center ecology, paleontology, population genetics, physiology, systematics + genomics

4 DRIADE’s Goals DRIADE Team
NESCent PI: Todd Vision, Director of Informatics and Associate Professor, Biology, UNC Hilmar Lapp, Assistant Director of Informatics Ryan Scherle, Data Repository Architect UNC/SILS/MRC PI: Jane Greenberg, Associate Professor, SILS and MRC Sarah Carrier, Research Assistant Abbey Thompson, DRIADE R.A./SILS Masters Student Hollie White, Doctoral Fellow Amy Bouck, Biology, Ph.d.student DRIADE’s Goals One-stop deposition and shopping for data objects Support the acquisition, preservation, resource discovery, and reuse of heterogeneous digital datasets Balance a need for low barriers, with higher-level … data synthesis

5 One stop shopping— an option
DRIADE (Dryad) Depositor/s One stop deposition Specialized Repositories Genbank TreeBase Morphbank PaleoDB DRIADE -Data objects supporting published research Journals & journal repositories One stop shopping— an option Researcher/s

6 Accomplishments: basic metadata challenges

7 Accomplishments: addressing basic metadata challenges
Functional requirements Consensus building (Dec. 06) + SSR workshops (May ‘07) First class object, **SHARING & REUSE** Analysis metadata functions in existing repositories (JCDL, 2007) Functional model OAIS (Open Archival Information System), DSpace Level one application profile Namespace schemas: Dublin Core Data Documentation Initiative (DDI) Ecological Metadata Language (EML) PREMIS Darwin Core Modular scheme: Journal citation Data objects (Carrier, et al., 2007)

8 Functional requirements
Project Goals/priorities GBIF KNB NSDL ICPSR MMI Heterogeneous digital datasets Long-term data stewardship Tools and incentives to researchers Minimize technical expertise and time required Intellectual property rights Datasets coupled w/published research

9 <DRIADE application profile>
Bibliographic Citation Module dcterms:bibliographicCitation/Citation information DOI Data Object Module dc:creator/Name* dc:title/Data Set # dc:identifier/Data Set Identifier PREMIS:fixity/(hidden) dc:relation/DOI of Published Article* DDI:<depositr>/Depositor DDI:<contact>/Contact Information dc:rights/Rights Statement dc:description/Description # dc:subject/Keywords dc:coverage / Locality Required* dc:coverage/Date Range Required* dc:software/Software* dc:format/File Format dc:format/File Size dc:date/(Hidden) Required dc:date/Date Modified* Darwin Core: species/ Species, or Scientific* Key * = semi-automatic # = manual Everything else is automatic

10 DRIADE metadata application profile….organic…
Level 1 – initial repository implementation Application profile: Preservation, access/resource discovery, (limited use of CVs) Level 2 – full repository implementation Level 1++ expanded usage, interoperability, preservation; administration; greater use of CV and authority control; data sharing and reuse Level 3 – “next generation” implementation Considering Web 2.0 functionalities, Semantic Web

11 Instantiation (Classifying Scientific Data Objects with Bibliographic Relationships Taxonomies)

12 Motivation DRIADE goals: Data objects = first class objects
Data preservation, sharing, use/re-use, validation, repeatability Is it important to know history of a data object? How can we support accurate and effective tracking of the life-cycle of data object? Data objects = first class objects Data structures as works (Coleman, 2002) Units of analysis, intellectual products of activity Work = “propositions expressed (ideational content)”, and “expressions of the propositions” (Smiraglia, 2001) Data objects are content carriers (Greenberg, 2007)

13 Motivation Research and…to explain a“work”; bibliographic families; and instantiation Metadata and Bibliographic Control Models FRBR (Functional Requirements for Bibliographic Records) DCAM (Dublin Core Abstract Model) RDF (Resource Description Framework) Coleman (2002) Cutter (1904) Leaser (1999) Ranganathan: embodied/expressed Smiraglia (1999,2001, 2002,etc.) Tillet (1991, 1992) Vellucci (1995) Wilson (1983) Yee (1995)

14 Research design Research questions Method
Objective: Build an open access repository that accurately reflects the “disciplinary knowledge structures” (relationship among data objects”) Research questions Do scientists (developing scientists) view data objects as works? Do they understand different “instantiations”? Do they think instantiation tracking is important? Method Instantiation identification test and survey Participants: Scientists, research and publication

15 Procedures, etc. Verify participant suitability, contextual introduction Read instantiation definitions and view instantiation diagram (next slide) Example Instantiation scenario Scenario: Sherry collects data on the survival and growth of the plant Borrichia frutescens (the bushy seaside tansy)… Question: What is the relationship between Sherry’s paper data sheet and her excel spreadsheet? Answer: Equivalent | Derivative | Whole-part | Sequential (circle one) 8 scenarios (randomized); survey

16 Data object relationships
Equivalence Derivative Whole-part Sequential B (=data set A annotated) A (=data set) A (=data set in Excel) A (=same data set in SAS) A (=same data set on paper) C (=data set A revised) A1 (=part 1 of a data set) A2 (=part 2 of a data set) A (=data set) A1 (=a subset of A) Smiraglia, Tillet

17 Results 7 participants (Biochemistry, Biology, Botany/Plant Ecology)
3 Faculty/research associates; 21+ yrs. research/pub. 4 Research assistants; 3 (1-3 yrs.), 1 (5 yrs.) Instant./ rel. A B Failing A Failing B Equivalent 100% n/a Derivative 6 of 7 (86%) 1 BS 1 MS Whole/part 1 PhD Sequential 3 of 7 (43%) 2 BS,

18 Results Use of data by others? ht=halftime; s=seldom; na=no answer
Instant./ rel. Created X data before Apply to Work “freq.” Other app. Support last publ. y/n Equivalent 7 6 (86%) 1 (h/t) 6/1 Derivative 5 (71%) 2 (h/t) Whole/part 2 (s) 4/2 (1 na) Sequential 5 3 (43%) 1 (h/t),1(s) 2/3 (2 na) Use of data by others? 1-5, 1 person 6-20, 3 people 21-99, 1 person 100+, 2 people (PhD) How important is it to track data life-cycle (relationships)? 0, not 3 very 4 extremely (2Ph.D, 1 MS, 1 BS)

19 Results (participant comments)
Validity: “Whenever anything changes, there’s the ability to make mistakes” Sheer quantity of data: “We have 30 years worth of data, tracking changes is important” The impact of changes in scientific nomenclature: “As time goes on, datasets must be maintained to reflect current understanding of taxonomy and nomenclature, to allow connection of old data and attributes of that data to be associated correctly with new data and attributes of that data. This is a giant problem.”

20 Conclusions and next steps
Use of data created by others increases over time In general, more seasoned scientists better grasp Sequential data presented the most difficulty (less seasoned sci.) Unanimous support, “very  extremely important” Bibliographic control relationship are applicable to describing the relationships between data objects Next steps: Continue to collect data (30-50) Study additional types of instantiations Develop applications that effectively track data object life-cycle How do we encode these relationships? Manually/automatically? Who should create them? Curator, scientists, collaboratively? Should we maintain a vocabulary? Consider implications for other disciplines

21 Thank you!! Jane Greenberg, Associate Professor, and Director, SILS Metadata Research Center

Download ppt "“Classifying Scientific Data Objects with Bibliographic Relationship"

Similar presentations

Ads by Google