Introduction to DataCite Adam Farquhar, PhD Head of Digital Library Technology, The British Library President, DataCite June, 2010
2 The British Library Exists for everyone who wants to do research – for academic, personal, and commercial purposes. Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences… Receives a copy of every item published in the UK. Holds over 150 million items, with 3 million items added each year. Used by over 16,000 people each day (on site and online).
3 Data and the Digital Landscape Seismic measurements taken by a geologist. Genetic data collected by a medical researcher. A survey of public opinions collected by a sociologist.
4 Data: The Foundation of Research Data is a crucial component of the scholarly record Re-acquisition may be impossible Datasets are essential to the British Library's mission to advance the world's knowledge
5 Widening Gap No effective way to link between datasets and articles No widely used method to identify datasets No widely used method to cite datasets Articles Underlying data
6 As a result… Datasets are Difficult to discover Difficult to access Being lost
7 Datasets – First Class Citizens? Data is difficult to manage after project funding ceases Informal networks provide the primary means of sharing Only 21% use a national or international facility Datasets are not included in impact analysis Good luck finding it or getting permission to use it (your discipline may vary) Source: UKRDS Study
8 DataCite – An Award Winning Global Consortium DataCite aims to: Establish easier access to scientific research data Increase acceptance of research data Support archiving of data for verification and re-use
9 DataCite – Supporting the Research Community DataCite: Supports researchers by enabling them to locate, identify, and cite research datasets with confidence Supports data centres by providing persistent identifiers for datasets, workflows and standards for data publication Supports publishers by enabling research articles to be linked to the underlying data
10 Digital Object Identifiers (DOIs) offer a solution Most widely used identifier for scientific articles Researchers, authors, publishers know how to use them Put datasets on the same playing field as articles DataCite uses DOIs for Data: DataCite : Data Centres :: CrossRef : Publishers Dataset Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi: /PANGAEA URLs are not persistent (e.g. Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, Jun 1;24(11):1381-5).
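As a rough illustration of why a DOI is more durable than a URL: the DOI name stays fixed, and a resolver turns it into whatever landing page is currently registered. The sketch below only builds the resolver link from a DOI name. It assumes the public proxy at https://doi.org/, and the DOI used in the usage note is a made-up placeholder, not the PANGAEA identifier from the slide.

```python
from typing import Tuple
from urllib.parse import quote

# Assumption: the public DOI proxy is used as the resolver.
RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first slash."""
    doi = doi.strip()
    # Accept the common "doi:" label in front of the name.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes start with "10."; both parts must be present.
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Build a resolver URL; the suffix is percent-encoded where needed."""
    prefix, suffix = split_doi(doi)
    return RESOLVER + prefix + "/" + quote(suffix, safe="/:;()._-")
```

For a hypothetical name such as `doi:10.1234/example.dataset`, `doi_to_url` returns a link under the resolver. If the data centre later moves the dataset, only the registered target changes, not the citation.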
11 Membership From Canada to Australia Currently twelve members across nine countries Over 800,000 records registered with DOI names so far
12 Rapid Progress Builds on Foundational Work TIB begins to issue DOIs for datasets Paris Memorandum DataCite Association founded in London 7 members 12 members All members assigned DOIs Over 800,000 items registered Pilot projects with Data Centres Production services with Data Centres Shared technical infrastructure Integrated services with key partners
13 DataCite – Roles and Responsibilities The DataCite registration agency Maintains the resolution infrastructure Maintains a searchable database of metadata Manages identifiers over the long term Establishes and shares best practice Publishing agents (data centres, research institutes, publishers) are responsible for Quality assurance Content storage and access Creating the identifier Creating and updating metadata
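To make the publishing agent's metadata responsibility concrete, here is a minimal sketch of a dataset record and a citation built from it, following the pattern of the example citation on the DOI slide (creators, year, title, publisher, DOI). The field names and the `DatasetRecord` and `format_citation` helpers are assumptions for illustration, not DataCite's actual metadata schema or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata a data centre might keep per dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(rec: DatasetRecord) -> str:
    """Render 'Creators (Year). Title. Publisher. doi:...'."""
    # Shorten long author lists to 'First et al', as in the slide's example.
    if len(rec.creators) > 2:
        authors = f"{rec.creators[0]} et al"
    else:
        authors = " and ".join(rec.creators)
    return f"{authors} ({rec.year}). {rec.title}. {rec.publisher}. doi:{rec.doi}"
```

Because the citation is derived from the record, an update to the metadata by the data centre carries through to every citation generated afterwards.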
14 DataCite Structure DataCite Member Institution Data Centre Member Institution Data Centre … Carries Works with International DOI Foundation Global Handle System Member Associate Stakeholder
15 Strengths and Weaknesses of DOI DOIs have some strong advantages Accepted by researchers and scientists Mature infrastructure Put datasets on the same playing field as articles But perceived as Expensive The current IDF business model favours larger registration agencies Publisher oriented The largest registration agency is the publisher-oriented CrossRef
16 The Cost of Visibility 0.01 – 1 Collection Production Storage Quality Assurance Metadata 50 – 500 (approx 1% of data creation cost) 5,000 – 5,000,000 DOI Assignment Management
17 BL – Search Our Catalogue
18 DE Service – Elsevier Science Direct
19 Research Data in Articles
20 Publishing Primary Data
21 Rapidly Growing Ecosystem Microsoft works with CDL to embed DataCite into Excel plug-in UK National Sound Archive assigns DataCite DOIs to archival recordings Dryad integrates DataCite DOIs into publisher workflows for supplementary material and datasets in US ANDS integrates DataCite DOIs into dataset services Thieme Publishing Group uses DataCite DOIs to link articles and primary research data (at FIZ) Active discussions with key research information service providers and data centres
22 What Next? Require clear unambiguous citations for datasets Integrate links to datasets into delivery platforms Integrate into workflows for researchers, data centres, and publishers Collaborate to understand roles and responsibilities among publishers, data centres, and libraries Improve attribution and credit for data producers Roll out services DataCite supports researchers by enabling them to locate, identify, and cite research datasets with confidence We welcome your comments, questions, and ideas! Contact: adam.farquhar bl.uk jan.brase tib.uni-hannover.de