Because good research needs good data Funded by: Perspectives on Digital Curation, Data and Publishing Why, How, Where? Kevin Ashley Director, DCC


1 Because good research needs good data Funded by: Perspectives on Digital Curation, Data and Publishing Why, How, Where? Kevin Ashley Director, DCC director@dcc.ac.uk High Heid Yin,

2 4th Bloomsbury Conference on e-publishing, UCL, London – 20100624 Kevin Ashley, DCC CC-BY-SA DATA PUBLISHING

3 TURN into a diagram: Idea – funding – collection – analysis – publish

4

5 Overview: Why care about citing and/or referencing data? Data is different – and that matters. Approaches, and their strengths & weaknesses.

6 WHY CARE? #1

7 LIDAR & RADAR images of ice cloud – H. Ruschennberg

8 WHY CARE? #2

9 University funding – the future is scary

10 The Data Behind The Graph: Data in support of publication should be as accessible as the publication itself. Allows challenge, replication, understanding. Often undertaken by publishers, or ventures associated with them. Sometimes items associated with the DOI of the paper, sometimes objects with their own DOI.

11 Integrity goes further… The data that I publish is not always the data that I collected. Sometimes, that matters.

12 WHY CARE? #3

13 Impact: Making data accessible increases citation rates. Better for authors; better for publishers. Piwowar, Day & Fridsma (2007): 45% of studies make data accessible; they receive 85% of citations. Caution: correlation is not causation.

14 WHY CARE? #4

15 I [image] your data! I don't [image] what you said about it.

16 E.g.: Your data on rock types of Crete supported a publication in a geological journal. Your data on rock types of Crete supports my theory about the sources of pigments used on Minoan pottery. I won't be publishing in a geological journal. Your conclusions about events 1 billion years ago have no relevance for me.

17 WHY CARE? #5

18 Understanding Biodiversity: We don't understand what drives it, or what helps or hinders speciation. We believe it to be good. No one project or data source is enough.

19 Research on Biodiversity… Requires many different data sources. Not all will be published. Not all publications are for similar research reasons, so citing the publication is (often) irrelevant. Some is research data, other government or reference data. There are probably gaps that need filling.

20 WHY CARE? #6

21 People's lives may depend on it. Watch Josh Sommer @ SAGE Bionetworks: http://sagecongress.org/WP/presentations

22 Data Is Different

23

24

25

26

27

28 Where can it happen? Global, international; nationally; institution; by subject; research group.

29

30 Approaches at publication: Giving data digital object identifiers (DOIs), e.g. DataCite. Capturing data subsets at point of publication. Freezing those subsets somewhere. Publication-led approach.

31 Who are the actors? Scholars, librarians, publishers… aren't enough. They aren't even the only people doing this now. They are a very big part of the answer, though. Curation happens before, after, and without publication.

32 One datacenter: NOAA
General form for citing published World Data Center for Paleoclimatology Data:
Anderson, D.W., W.L. Prell, and N.J. Barratt. 1989. Estimates of sea surface temperature in the Coral Sea at the last glacial maximum. Paleoceanography 4(6):615-627. Data archived at the World Data Center for Paleoclimatology, Boulder, Colorado, USA.
General form for citing unpublished World Data Center for Paleoclimatology Data:
Rind, D. 1994. General Circulation Model Output Data Set. IGBP PAGES/World Data Center for Paleoclimatology Data Contribution Series #1994-012. NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.
Citation for Data archived via a Data Cooperative:
McAndrews, J.H. 1996. Martin Pond pollen record. In E.C. Grimm et al., editors, North American Pollen Database. IGBP PAGES/World Data Center for Paleoclimatology. NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.
Citation for group of contributors that is too large to cite individually:
Contributors of the International Tree-Ring Data Bank, IGBP PAGES/World Data Center for Paleoclimatology, NOAA/NCDC Paleoclimatology Program, Boulder, Colorado, USA.

33 But data changes, and is big. When many publications use bits of one dataset… when a dataset changes hourly… and is petabyte-sized… snapshots don't cut it. They also lose the context of the original.

34 [Diagram: Publications A, B, C, D; Data Objects A, B, C, D; Original Source Data]

35 Data Management vs Curation (UC3 view): Curation value can be added by enabling creative use and reuse, in whole, in part, or in aggregation. Facilitate by: persistent citation & actionable reference; discovery of content & contextual description; annotation for enriched description.

36 Citing big data accurately: Don't make copies; keep change records. Create reference mechanisms that allow reference to a specific change point (cf. the Memento technique for referencing web pages). Requires cooperation between curator & referencer.
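
The idea on slide 36 (keep change records rather than copies, and let a reference point at a specific change point) can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `ChangeLoggedDataset` class, its method names, the dataset name, and the sample values are assumptions, not part of the talk or of any real curation system.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One recorded edit: set `key` to `value` (None means delete the key)."""
    key: str
    value: Optional[Any]


@dataclass
class ChangeLoggedDataset:
    """Hypothetical dataset that stores an append-only change log, not copies."""
    name: str
    log: List[Change] = field(default_factory=list)

    def record(self, key: str, value: Optional[Any]) -> int:
        """Append a change and return the change point it creates."""
        self.log.append(Change(key, value))
        return len(self.log)

    def cite(self) -> Tuple[str, int]:
        """Reference the dataset as it stands now: (name, change point)."""
        return (self.name, len(self.log))

    def resolve(self, change_point: int) -> Dict[str, Any]:
        """Rebuild the state a reference pointed at by replaying the log."""
        if not 0 <= change_point <= len(self.log):
            raise ValueError("unknown change point")
        state: Dict[str, Any] = {}
        for change in self.log[:change_point]:
            if change.value is None:
                state.pop(change.key, None)
            else:
                state[change.key] = change.value
        return state


ds = ChangeLoggedDataset("hypothetical-survey")
ds.record("row1", 2.3)
ds.record("row2", 1)
ref = ds.cite()            # reference taken at publication time
ds.record("row1", 2.4)     # the dataset keeps changing afterwards
ds.record("row2", None)

cited_state = ds.resolve(ref[1])            # what the publication saw
current_state = ds.resolve(ds.cite()[1])    # what the dataset holds now
```

The reference is just a name plus a position in the log, so the curator has to keep the log and offer a way to resolve it. That is one reading of the cooperation between curator and referencer that the slide mentions.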

37 [Slide shows the same table repeated nine times]
Row  F1   F2  F3  F4
1    2.3  Y   M   0
2    1    N   F   0
3    3    N   F   0
4    2    Y   M   300
5    4    N   f   0

38 On Citing Data: Peter Buneman. How to cite curated databases and how to make them citable. In Proceedings of the 18th Conference on Scientific and Statistical Database Management, pages 195-203, July 2006. Some serious computer science, some for a very general audience.

39 Buneman's desiderata [the symbol for "the thing cited" was lost in extraction and is spelled out in words here]: Let C be a citation. D1: For any citation C, the thing cited shall be fixed. D2: Any citable thing T should contain a C such that the thing C cites = T. D3: Databases should be citable at multiple levels of coarseness. D4: If C and P are citations and the thing one cites is coarser than the thing the other cites [which is which was lost in extraction], then location info in P should be in C. D5: Versioning is done at database level.
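
One way to picture desiderata D3 to D5 is a citation made of a database name, a database-level version, and a path into the database. The Python sketch below is only an illustration under stated assumptions: the `Citation` class, the `resolve` helper, the `ExampleDB` name, and the sample contents are all hypothetical, and the reading of D4 used here (a finer citation carries the location information of every coarser one as a prefix of its path) is an assumption, since the symbols on the slide did not survive extraction.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation: database, database-level version, and location."""
    database: str
    version: str                 # D5: versioning is done at database level
    path: Tuple[str, ...] = ()   # D3: () cites the whole database

    def is_coarser_than(self, other: "Citation") -> bool:
        """True if what self cites contains what `other` cites."""
        return (
            self.database == other.database
            and self.version == other.version
            and len(self.path) < len(other.path)
            and other.path[: len(self.path)] == self.path
        )

    def __str__(self) -> str:
        return "/".join((f"{self.database}@{self.version}",) + self.path)


def resolve(citation: Citation, versions: Dict[Tuple[str, str], Any]) -> Any:
    """Look up the thing cited in a store of frozen database versions."""
    node = versions[(citation.database, citation.version)]
    for step in citation.path:
        node = node[step]
    return node


# Hypothetical store holding one frozen version of one database.
versions = {
    ("ExampleDB", "2010-06"): {"proteins": {"P1": {"name": "example"}}},
}

whole = Citation("ExampleDB", "2010-06")
table = Citation("ExampleDB", "2010-06", ("proteins",))
entry = Citation("ExampleDB", "2010-06", ("proteins", "P1"))
```

Because the version is part of every citation and each stored version is never modified, resolving the same citation twice yields the same thing, which is the property D1 asks for.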

40 Acknowledgements: De Spullenmannen & Harmen G Zijp (http://www.spullenmannen.nl/); Jaxa.jp (satellite image); NOAA; NDAD/Crown Copyright/Happy Computers Ltd

41

42 Summary: Thinking of data solely as an adjunct to publication is too narrow a view. Current practice may not extend easily. Data is often living; treat it as such. There's more to the world than scholarly research. Hidden data is wasted data.

