Because good research needs good data
Perspectives on Digital Curation, Data and Publishing: Why, How, Where?
Kevin Ashley, Director, DCC

4th Bloomsbury Conference on e-publishing, UCL, London – Kevin Ashley, DCC, CC-BY-SA
DATA PUBLISHING

[Diagram: idea → funding → collection → analysis → publish]

Overview
- Why care about citing and/or referencing data?
- Data is different – and that matters
- Approaches, and their strengths and weaknesses

WHY CARE? #1

LIDAR and RADAR images of ice cloud – H. Russchenberg

WHY CARE? #2

University funding – the future is scary

The Data Behind the Graph
- Data supporting a publication should be as accessible as the publication itself
- This allows challenge, replication and understanding
- Often undertaken by publishers, or by ventures associated with them
- Sometimes the data are items associated with the paper's DOI; sometimes they are objects with their own DOI
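The two models in the last point differ in what a reader can cite. Here is a minimal sketch, assuming made-up placeholder DOIs and a hypothetical file name. The field names in the second record mirror DataCite's `relatedIdentifier` properties, with `IsSupplementTo` as the relation type. `supplements` and `resolver_url` are illustrative helpers, not part of any DataCite library.

```python
# Placeholder DOIs: made up for illustration, not real registrations.
PAPER_DOI = "10.5555/example-paper"
DATASET_DOI = "10.5555/example-dataset"

# Model 1: the data is an item associated with the paper's own DOI.
paper_with_component = {
    "doi": PAPER_DOI,
    "supplementary_files": ["figure3_data.csv"],  # hypothetical file name
}

# Model 2: the data is an object with its own DOI that points back at the
# paper. Field names mirror DataCite's relatedIdentifier properties.
dataset_record = {
    "doi": DATASET_DOI,
    "relatedIdentifiers": [
        {
            "relatedIdentifier": PAPER_DOI,
            "relatedIdentifierType": "DOI",
            "relationType": "IsSupplementTo",
        }
    ],
}


def supplements(record: dict, paper_doi: str) -> bool:
    """True if the record declares itself a supplement to paper_doi."""
    return any(
        rel["relationType"] == "IsSupplementTo"
        and rel["relatedIdentifierType"] == "DOI"
        and rel["relatedIdentifier"] == paper_doi
        for rel in record.get("relatedIdentifiers", [])
    )


def resolver_url(doi: str) -> str:
    """Turn a DOI into an actionable link via the doi.org resolver."""
    return "https://doi.org/" + doi


print(supplements(dataset_record, PAPER_DOI))  # True
print(resolver_url(DATASET_DOI))  # https://doi.org/10.5555/example-dataset
```

Only the second model gives the data an identifier of its own. That is what lets someone cite the data without citing the paper.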

Integrity goes further…
- The data that I publish is not always the data that I collected
- Sometimes, that matters

WHY CARE? #3

Impact
- Making data accessible increases citation rates
- Better for authors; better for publishers
- Piwowar, Day & Fridsma (2007): the 45% of studies that made their data accessible received 85% of the citations
- Caution: correlation is not causation
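To get a per-study comparison from the quoted figures, divide each group's share of citations by its share of studies. The total numbers of studies and citations cancel out. The function below is a hypothetical helper, not part of the cited study. It takes the 45% and 85% figures at face value.

```python
def per_study_citation_ratio(share_of_studies: float, share_of_citations: float) -> float:
    """Mean citations per data-sharing study divided by mean per non-sharing study.

    Only the two shares are needed: total studies and total citations cancel.
    """
    sharing = share_of_citations / share_of_studies
    non_sharing = (1 - share_of_citations) / (1 - share_of_studies)
    return sharing / non_sharing


ratio = per_study_citation_ratio(0.45, 0.85)
print(f"{ratio:.1f}")  # 6.9
```

On these figures, a data-sharing study drew roughly seven times the citations of a non-sharing one. This raw ratio does not adjust for journal, publication date or field, which is why the caution about correlation applies.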

WHY CARE? #4

I ♥ your data! I don't ♥ what you said about it.

E.g.…
- Your data on the rock types of Crete supported a publication in a geological journal
- Your data on the rock types of Crete supports my theory about the sources of pigments used on Minoan pottery
- I won't be publishing in a geological journal
- Your conclusions about events 1 billion years ago have no relevance for me

WHY CARE? #5

Understanding Biodiversity
- We don't understand what drives it
- What helps or hinders speciation
- We believe it to be good
- No one project or data source is enough

Research on Biodiversity…
- Requires many different data sources
- Not all will be published
- Not all publications are for similar research reasons, so citing the publication is (often) irrelevant
- Some is research data, some is government or reference data
- There are probably gaps that need filling

WHY CARE? #6

People's lives may depend on it
Watch Josh – Sage Bionetworks

Data Is Different

Where can it happen?
- Global, international
- Nationally
- Institution
- By subject
- Research group

Approaches at publication
- Giving data digital object identifiers (DOIs), e.g. DataCite
- Capturing data subsets at the point of publication
- Freezing those subsets somewhere
- A publication-led approach
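Freezing a subset at the point of publication amounts to copying the selected rows and recording a fingerprint, so that anyone holding the copy can later check it is unchanged. The sketch below assumes hypothetical rows and a SHA-256 digest of a canonical JSON serialisation. Registering a DOI for the frozen copy (for example through DataCite) is a separate step that is not modelled here.

```python
import hashlib
import json


def freeze_subset(rows, keep):
    """Copy the rows selected by `keep` and fingerprint the frozen copy.

    The fingerprint is a SHA-256 digest of a canonical JSON serialisation
    (sorted keys, fixed separators), so the same subset always yields the
    same value and any later change to the copy is detectable.
    """
    subset = [dict(row) for row in rows if keep(row)]
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return subset, hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical source rows, e.g. the part of a dataset behind one figure.
rows = [
    {"site": "A", "value": 2.3},
    {"site": "B", "value": 1.0},
    {"site": "C", "value": 3.0},
]
frozen, fingerprint = freeze_subset(rows, lambda r: r["value"] >= 2)
print(len(frozen))  # 2
```

The fingerprint is the kind of value that could go into the metadata registered alongside the DOI. It only proves the copy is unchanged. It does nothing about the problems of size and change raised later in the talk.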

Who are the actors?
- Scholars, librarians, publishers…
- … aren't enough
- They aren't even the only people doing this now
- They are a very big part of the answer, though
- Curation happens before, after and without publication

One data center: NOAA
General form for citing published World Data Center for Paleoclimatology Data:
Anderson, D.W., W.L. Prell, and N.J. Barratt Estimates of sea surface temperature in the Coral Sea at the last glacial maximum. Paleoceanography 4(6): Data archived at the World Data Center for Paleoclimatology, Boulder, Colorado, USA.
General form for citing unpublished World Data Center for Paleoclimatology Data:
Rind, D General Circulation Model Output Data Set. IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series # NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.
Citation for data archived via a Data Cooperative:
McAndrews, J.H Martin Pond pollen record. In E.C. Grimm et al., editors, North American Pollen Database. IGBP PAGES/World Data Center for Paleoclimatology. NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.
Citation for a group of contributors that is too large to cite individually:
Contributors of the International Tree-Ring Data Bank, IGBP PAGES/World Data Center for Paleoclimatology, NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.
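A data center that offers several citation forms is in effect offering templates. Below is a hypothetical formatter modelled on two of the patterns shown: published data, and data contributed to a cooperative database. The punctuation only approximates the examples, the record values are invented, and NOAA's own guidance remains the authority on the exact form.

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    authors: str
    year: str
    title: str


# Archive statement taken from the slide's "published data" form.
ARCHIVE = "World Data Center for Paleoclimatology, Boulder, Colorado, USA"


def published_form(c: DataCitation, journal_ref: str) -> str:
    """Data that supported a journal article: cite the article, then the archive."""
    return f"{c.authors} {c.year}. {c.title}. {journal_ref}. Data archived at the {ARCHIVE}."


def cooperative_form(c: DataCitation, editors: str, database: str, program: str) -> str:
    """Data contributed to a cooperative database: cite the record within the database."""
    return f"{c.authors} {c.year}. {c.title}. In {editors}, editors, {database}. {program}."


# Invented example values, not a real record.
record = DataCitation("Doe, J.", "2010", "Example lake pollen record")
print(published_form(record, "Example Journal 1(1): 1-10"))
```

Keeping the forms as functions of one record makes the same dataset citable in whichever form suits its situation.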

But data changes, and is big
- When many publications use bits of one dataset…
- When a dataset changes hourly…
- … and is petabyte-sized…
- … snapshots don't cut it
- They also lose the context of the original
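Some back-of-the-envelope arithmetic shows why snapshots don't cut it for a petabyte-sized dataset that changes hourly. The sketch assumes a 1 PB dataset (decimal petabyte), one full snapshot per hour and a 365-day year. For comparison, it assumes a change log in which 0.1% of the data (1,000 parts per million) is rewritten each hour. That rate is an illustrative assumption, not a measured figure.

```python
PB = 10**15  # bytes in a decimal petabyte
HOURS_PER_YEAR = 24 * 365


def snapshot_bytes_per_year(dataset_bytes: int, snapshots_per_hour: int = 1) -> int:
    """Storage needed to keep a full copy at every snapshot for a year."""
    return dataset_bytes * snapshots_per_hour * HOURS_PER_YEAR


def changelog_bytes_per_year(dataset_bytes: int, changed_ppm: int) -> int:
    """Storage for a year of change records; changed_ppm is the assumed
    parts per million of the dataset rewritten each hour."""
    return dataset_bytes * changed_ppm // 1_000_000 * HOURS_PER_YEAR


print(snapshot_bytes_per_year(PB) // PB)  # 8760 (petabytes per year)
print(changelog_bytes_per_year(PB, 1000) // PB)  # 8 (petabytes per year)
```

Integer arithmetic keeps the byte counts exact. Under these assumptions, hourly full snapshots need over a thousand times more storage than a change log.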

[Diagram: Publications A, B, C and D; Data Objects A, B, C and D; Original Source Data]

Data Management vs Curation: the UC3 view
Curation value can be added… by enabling creative use and reuse, in whole, in part or in aggregation…
Facilitated by:
- Persistent citation and actionable reference
- Discovery of content and contextual description
- Annotation for enriched description

Citing big data accurately
- Don't make copies – keep change records
- Create reference mechanisms that allow reference to a specific change point
- Cf. the Memento technique for referencing web pages
- Requires cooperation between curator and referencer
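A reference to a specific change point can be sketched with one stored copy of the data plus an append-only change log. A citation then names a position in the log rather than a separate snapshot. `VersionedTable` is a hypothetical class, and integer change points stand in for the datetimes that Memento (RFC 7089) uses for web pages. Past states are rebuilt on demand rather than stored as copies.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """One stored copy of the data plus an append-only change log.

    Hypothetical design: a citation names a change point (an index into the
    log) instead of pointing at a separate snapshot.
    """

    base: dict  # row id -> {column: value} at change point 0
    changes: list = field(default_factory=list)  # (row_id, column, new_value)

    def update(self, row_id, column, value) -> int:
        """Record a change and return the new change point."""
        self.changes.append((row_id, column, value))
        return len(self.changes)

    def as_of(self, change_point: int) -> dict:
        """Rebuild the table as it stood at the given change point."""
        state = {rid: dict(row) for rid, row in self.base.items()}
        for row_id, column, value in self.changes[:change_point]:
            state[row_id][column] = value
        return state


table = VersionedTable({1: {"F1": 2.3, "F4": 0}})
cited_at = len(table.changes)  # a reader cites the data at change point 0
table.update(1, "F4", 300)  # the curator later changes the row
print(table.as_of(cited_at)[1]["F4"])  # 0: the cited state is still recoverable
```

This is where cooperation between curator and referencer comes in. The citation only stays meaningful if the curator never rewrites or truncates the log.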

[Slide graphic: nine copies of the same table]
Row  F1   F2  F3  F4
1    2.3  Y   M   0
2    1    N   F   0
3    3    N   F   0
4    2    Y   M   300
5    4    N   f   0

On Citing Data
Peter Buneman. How to cite curated databases and how to make them citable. In Proceedings of the 18th Conference on Scientific and Statistical Database Management, July 2006.
Some serious computer science, and some for a very general audience

Buneman's desiderata
Let C be a citation and $[\![C]\!]$ the thing cited.
- D1: For any citation C, $[\![C]\!]$ shall be fixed
- D2: Any citable thing T should contain a citation C such that $[\![C]\!] = T$
- D3: Databases should be citable at multiple levels of coarseness
- D4: If C and P are citations and $[\![P]\!]$ is coarser than $[\![C]\!]$, then the location information in P should be in C
- D5: Versioning is done at the database level
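Some of these desiderata map naturally onto a simple data structure. The sketch below is hypothetical and is not Buneman's own scheme.
- A citation is a database name, one version label for the whole database (D5) and a path whose length sets the level of coarseness (D3).
- A finer citation is built by extending a coarser one, so it always carries the coarser one's location information (D4).
- Pinning the version supports D1 only if the database actually keeps old versions, which this class cannot enforce.
- D2 is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    database: str
    version: str  # D5: one version label for the whole database
    path: tuple = ()  # () cites the whole database; longer paths are finer (D3)

    def refine(self, *steps: str) -> "Citation":
        """D4: a finer citation keeps all location info of the coarser one."""
        return Citation(self.database, self.version, self.path + steps)

    def is_coarser_than(self, other: "Citation") -> bool:
        """True if this citation covers a strict superset of what `other` cites."""
        return (
            self.database == other.database
            and self.version == other.version
            and len(self.path) < len(other.path)
            and other.path[: len(self.path)] == self.path
        )


whole = Citation("ExampleDB", "v42")  # hypothetical database and version
entry = whole.refine("family", "F123")  # hypothetical path into it
print(whole.is_coarser_than(entry))  # True
print(entry.is_coarser_than(whole))  # False
```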

Acknowledgements: De Spullenmannen & Harmen G Zijp; Jaxa.jp (satellite image); NOAA; NDAD/Crown Copyright/Happy Computers Ltd

Summary
- Thinking of data solely as an adjunct to publication is too narrow a view
- Current practice may not extend easily
- Data is often living – treat it as such
- There's more to the world than scholarly research
- Hidden data is wasted data