Making Data Accessible Yolanda Gil USC/ISI February 20, 2015 "To deposit or not to deposit, that is the question - journal.pbio.1001779.g001"

Presentation transcript:

Making Data Accessible
Yolanda Gil, USC/ISI
February 20, 2015
Image: "To deposit or not to deposit, that is the question - journal.pbio g001" by Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014). Source: Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e doi: /journal.pbio Licensed under CC BY 4.0 via Wikimedia Commons - _journal.pbio g001.png#mediaviewer/File:To_deposit_or_not_to_deposit,_that_is_the_question_-_journal.pbio g001.png

Current Practice: Problematic
- Data is often not made available in publications
  - Lack of reproducibility
- Data made available through investigator's URL
  - URL does not resolve (i.e., it is "rotten")

Other Motivations

Current Practice: Better
- Data paper
- Data published in a repository

Best Practices

Goals Today
1. Understand what those best practices mean
2. Understand how to implement those best practices

Best Practices (1 of 5)

Popular Data Repositories
Not Curated / Curated
Image: "Pangaea logo hg" by Hannes Grobe/AWI - Own work. Licensed under CC BY 3.0 via Wikimedia Commons - _logo_hg.png

Directories of Research Data Repositories adwiki/Data_repositories

Best Practices (2 of 5)

Recommended Metadata
General
- Dataset name/title
- Description
- Creator(s)
- Publication date
- License
- Publisher/contact
- Version
- Resource type
- Location of the data
Domain Specific
- Categories
- Keywords/tags
- Related links
Typical of digital libraries, e.g., the Dublin Core standard
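As a minimal sketch of how the general fields listed above could be checked before depositing a dataset, the snippet below reports which of them are missing from a record. The snake_case key names and every example value are assumptions made for illustration; they are not taken from Dublin Core or from any repository's schema.

```python
# Keys mirror the slide's "General" list; the exact names are illustrative
# assumptions, not a standard schema.
REQUIRED_FIELDS = [
    "title", "description", "creators", "publication_date", "license",
    "publisher", "version", "resource_type", "location",
]


def missing_fields(record):
    """Return the general fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record; every value here is a placeholder.
example = {
    "title": "Example dataset",
    "description": "Illustrative record only.",
    "creators": ["A. Researcher"],
    "publication_date": "2015-02-20",
    "license": "CC-BY",
    "publisher": "Example repository",
    "version": "1.0",
    "resource_type": "Dataset",
    "location": "",              # left empty on purpose
    "keywords": ["example"],     # domain-specific, not required by this check
}

print(missing_fields(example))
```

A check like this is only a convenience; a repository's own deposit form decides which fields are actually mandatory.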

Choose a License
Recommended: CC-BY and CC0

Best Practices (3 of 5)

Manual Accessibility
- UNIQUE ID & METADATA: connected_drug_file/
- DATA: ghlConnectedDrugs.txt

Machine Accessibility
- Data model specifies how to query the data available

Best Practices (4 of 5)

Main Types of Unique Identifiers
1. URL
2. Digital Object Identifier (DOI)
3. Persistent URL (PURL)
Image: "Fingerprint detail on male finger" by Frettie - Own work. Licensed under CC BY 3.0 via Wikimedia Commons - er/File:Fingerprint_detail_on_male_finger.jpg

URL/URI
- Minimal effort to create
- No guarantee of persistence
  - i.e., almost guaranteed it will not have persistence
  - e.g., gradstudents/joesmith/awesome data/
Image: "Internet1" by Rock Own work. Licensed under GFDL via Wikimedia Commons -

Persistent URL (PURL)
- The same PURL can be resolved to different Web addresses over time
- You always refer to your data with the same PURL: data.html
- Today you are in grad school and tell purl.org to resolve it to: roup/awesomedata.html
- Tomorrow you have graduated and tell purl.org to resolve it to: roup/awesomedata.html
- It is easy to create your own PURLs; just remember to update them whenever you move the data
- Go to (and others)
Image: "Internet1" by Rock Own work. Licensed under GFDL via Wikimedia Commons -
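The idea behind a PURL, one stable name whose target can be updated, can be sketched with a small in-memory lookup table. This is a toy model under stated assumptions: `ToyResolver`, the `purl.example` name, and both target addresses are hypothetical, and the class does not represent the purl.org service or its interface.

```python
class ToyResolver:
    """In-memory stand-in for a PURL service (not the purl.org API)."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target):
        """Create the PURL, or point an existing PURL at a new address."""
        self._targets[purl] = target

    def resolve(self, purl):
        """Return the address the PURL currently points to."""
        return self._targets[purl]


resolver = ToyResolver()
purl = "purl.example/awesomedata"  # hypothetical persistent name

# Today: the data lives on a grad-school server (hypothetical address).
resolver.register(purl, "https://gradschool.example/awesomedata.html")
before = resolver.resolve(purl)

# After moving: update the target; the PURL cited in papers stays the same.
resolver.register(purl, "https://newjob.example/awesomedata.html")
after = resolver.resolve(purl)

print(before)
print(after)
```

The citation never changes because only the table entry does, which is also why a PURL goes stale if its owner forgets to update it after moving the data.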

Best Practices (5 of 5)

Data Citation
- Time of retrieval
- Authors
- Date of publication
- Permanent unique identifier
- Repository
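To make the components above concrete, here is a minimal sketch that assembles them into a single citation string. The ordering and punctuation are assumptions rather than a prescribed style, the `title` argument is an addition not listed on the slide, and the example values (including the `doi:10.0000/example` identifier) are hypothetical.

```python
def format_data_citation(authors, year, title, repository, identifier, retrieved):
    """Join the citation components into one line (illustrative style only)."""
    author_text = ", ".join(authors)
    return (
        f"{author_text} ({year}). {title}. {repository}. "
        f"{identifier}. Retrieved {retrieved}."
    )


# Hypothetical values; none of these refer to a real dataset.
citation = format_data_citation(
    authors=["Smith, J.", "Doe, A."],
    year=2015,
    title="Example dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
    retrieved="20 February 2015",
)
print(citation)
```

In practice the repository or the target journal usually dictates the exact format, so a helper like this is mainly useful for checking that no component is left out.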

Data Papers

What if…
- … I have several datasets in several files?
  - Create a DOI for each file and a DOI for the whole set
- … the data is from a public repository?
  - Create a DOI+metadata and mention the original source in the metadata, point to the original data source
- … the data is from a colleague?
  - Get permission in advance and make an agreement, then do as with the data from a public repository
- … the data comes from many sources?
  - Credit each source, create URIs as needed
  - Can create a table with "microattributions" that summarize each data source
- … the data comes from a database?
  - Create a file (or files) from it
- … the data has many versions?
  - Create a DOI for either each slice or each snapshot
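A "microattributions" table for data drawn from many sources could look like the sketch below, which prints one row per source. The column choice (source, identifier, record count) and all of the example entries are assumptions made for illustration.

```python
def microattribution_table(sources):
    """Render one fixed-width row per data source, below a header row."""
    header = f"{'Source':<20}{'Identifier':<24}{'Records':>8}"
    rows = [
        f"{s['source']:<20}{s['identifier']:<24}{s['records']:>8}"
        for s in sources
    ]
    return "\n".join([header] + rows)


# Hypothetical sources and identifiers.
sources = [
    {"source": "Lab A survey", "identifier": "doi:10.0000/a", "records": 120},
    {"source": "Public archive B", "identifier": "doi:10.0000/b", "records": 4500},
]

table = microattribution_table(sources)
print(table)
```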

Goals Today
1. Understand what those best practices mean
2. Understand how to implement those best practices

Suggested Approach
1. Create a public entry for your dataset with a permanent unique identifier
   - Go to figshare.com, create an account
   - Create an entry for your dataset
2. Specify the metadata
   - Including a license (choose from rg/licenses)
3. Upload/point to the data
Voilà! Figshare will give you a data citation

How to use the data citation in your paper?
- Citation goes in the References section
- How to cite the data? You choose:
  - With an in-text pointer as you would cite any other paper
  - With an in-text pointer in a special "Data Resources" section
  - With an in-text pointer in the Acknowledgements section