
1 a centre of expertise in data curation and preservation. Funded by: [not captured in transcript]. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Curation: making data suitable for re-use. Chris Rusbridge. Presentation at FIBS Seminar

2 a centre of expertise in data curation and preservation FIBS January 2007 Contents Science and digital curation What to do with your data: frontiers of practice Repository frontiers

3 Digital Curation Centre Mission: “The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”

4 SDSS (Visual); TWOMASS (Infrared). Slide from Rajendra Bose

5 Slide from Rajendra Bose

6 New discovery… National Virtual Observatory. Johns Hopkins press release: “Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.”
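The press release does not say how the search across the two databases worked. One common way to combine two sky catalogues is positional cross-matching, sketched below under that assumption. The catalogue layout (name, right ascension, declination in degrees), the function names and the matching radius are all hypothetical, and the brute-force loop is only practical for toy data.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, radius_deg):
    """Return (name_a, name_b) pairs lying within radius_deg of each other.

    Each catalogue is a list of (name, ra_deg, dec_deg) tuples (assumed layout).
    """
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        for name_b, ra_b, dec_b in cat_b:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((name_a, name_b))
    return matches

def unmatched(cat_a, cat_b, radius_deg):
    """Names in cat_a with no counterpart in cat_b within radius_deg."""
    matched = {name_a for name_a, _ in cross_match(cat_a, cat_b, radius_deg)}
    return [name for name, _, _ in cat_a if name not in matched]
```

Objects that appear in one catalogue but not the other (for example, detected in the infrared but absent in the visual) are then candidates for closer inspection. Real services index the sky spatially rather than comparing every pair.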

7 Curation. Data increasingly important as evidence: Key part of the scholarly record; Experimental verifiability (the basis of science); Allows additional interpretations; Unrepeatable observations & experiments (particularly environmental in broadest sense); Legal, compliance & transactions; Cultural resources

8 What kinds of data? Observations, e.g. UARS (Upper Atmosphere) Level 0: telemetry; UARS Level 1: measured physical parameters (post calibration?). Derived data: UARS Level 2: calculated geophysical? profiles; UARS Level 3: gridded, interpolated? Combined data. Crafted data, e.g. annotated gene/protein databases. Descriptive (meta)data

9 What to do with it? Keep as part of experiment. Deposit in institutional or discipline repository: Possible time-limited embargoes. Cite it. “Publish” in support of articles

10 Internet Archaeology: publication with data (sadly, a preservation nightmare!)

11 What are the reusability issues? Data not neutral to hypothesis. Hard to know the risks & pitfalls of a particular dataset. Data not self-describing: hard to find appropriate data. Hard to “understand” data once found. Hard to use data once understood

12 What to do about it? Build curation/reusability into your workflow: Curation begins before creation; What’s easy at first becomes (impossibly) hard later. Describe your data (metadata): Keep experimental parameters (technical, who, what, when, where etc); Keep data descriptions (schemas, “representation information”, etc). Keep data! Use standard/agreed formats for data. Make ownership & restrictions clear. Explain how to cite your data
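As an illustration of the advice on this slide, here is a minimal sketch of a description written alongside a data file at creation time. The field names, the `DatasetRecord` class and the `describe` function are hypothetical and not taken from the talk or from any metadata standard; a real project would follow its discipline's agreed schema.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class DatasetRecord:
    """Hypothetical minimal description kept next to a data file."""
    title: str
    creator: str      # who
    created: str      # when (ISO 8601 date)
    location: str     # where
    instrument: str   # technical/experimental parameters
    data_format: str  # standard/agreed format of the data file
    schema: dict      # column name -> unit or meaning
    rights: str       # ownership and restrictions
    citation: str     # how to cite the data

def describe(data: bytes, record: DatasetRecord) -> str:
    """Return a JSON description including a checksum of the data itself."""
    entry = asdict(record)
    # The checksum ties the description to one exact version of the file.
    entry["sha256"] = hashlib.sha256(data).hexdigest()
    return json.dumps(entry, indent=2, sort_keys=True)
```

Writing this record when the data is created, rather than reconstructing it later, reflects the slide's point that what is easy at first becomes hard later.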

14 Data resource stages. Curated data is created… Observations? Fixed! Or acquired… Data brought/bought from outside. Ingest. Development: Derived, refined, combined, processed data; Potentially many stages

15 Context. Data meaningless without context. Linkage. Metadata of many kinds. Workflow! Provenance. Authenticity. Computational lineage

16 Diagram labels: NASA; University; research group1; research group3; local decision-making body; University; research group2. Slide from Rajendra Bose

17 Access and re-use. Ethics and rights control access: Weak in expressing this long-term. Collaboration tools: Annotation, discussion, review; Re-use leading to change and development. “Publication”: Not just in “print”; Underlying data should be “published”, too. Citation…

18 Citation needs… An efficient way to reference and access “archived” past states of a changing dataset (work in progress, Buneman et al). Not important for original observations: Don’t mess with those data. Less important for incremental datasets: Later stuff should not invalidate earlier. Very important for revisable datasets: e.g. Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change; e.g. Mapping… OS maps represent a huge database that changes on a daily basis
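To make the citation requirement concrete, the sketch below shows a naive way to cite and later retrieve a past state of a revisable dataset. It is not the approach of Buneman et al, which the slide only mentions as work in progress: this version simply stores a full frozen copy at each publication, which is inefficient for large datasets. The `VersionedDataset` class and the citation string format are hypothetical.

```python
import hashlib
import json

class VersionedDataset:
    """Toy revisable dataset whose published states stay citable."""

    def __init__(self, name):
        self.name = name
        self._current = {}    # working copy that curators keep revising
        self._snapshots = []  # frozen JSON text of each published state

    def revise(self, key, value):
        """Change the working copy; published states are unaffected."""
        self._current[key] = value

    def publish(self):
        """Freeze the current state and return a citation string for it."""
        frozen = json.dumps(self._current, sort_keys=True)
        digest = hashlib.sha256(frozen.encode("utf-8")).hexdigest()[:12]
        self._snapshots.append(frozen)
        return f"{self.name}, version {len(self._snapshots)}, sha256:{digest}"

    def resolve(self, version):
        """Return the dataset exactly as it was at a published version (1-based)."""
        return json.loads(self._snapshots[version - 1])
```

A paper that cites version 1 can still be checked against the data it actually used, even after curators have changed an annotation in version 2.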

19 Who are the curation players?

20 Curation: Individual. “Small science 2-3 times more data than Big science”, but much more at risk. PhD student? RA? PI? Administrator? IT support? Data potentially on local hard drives, or at best shared network drives: May be inadequately protected; Liable for policy-led deletion on resignation. Individual “knows” too much: Documentation/metadata unlikely to be adequate. Future: gone!

21 Department: eCrystals. Partnership with Institutional Repository. Specialist department archive (& national service). Workflow recording of lab parameters (R4L). Public & private elements. Trying to build eCrystals federation (eBank 3). Future: likely to continue

22 Institution: Cambridge Chemistry. 175,000 small molecule structures in CML. Alongside Archaeology, Manuscripts, Learning Materials, etc. No library curation skills; dependent on research group enthusiast. Collection isolated from other Chemistry. (Only 5 UK institutional repositories claim to hold data.) Future: assured…

23 Community: LOCKSS? Self-selected group of collectors: closest to genuine open activity (despite Alliance)? Traditionally libraries collecting eJournals. Model respects IPR. No domain expertise; rely on origins. Data limitations… Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)

24 Discipline: Atmospheric Science. Strong believer in need for domain scientists as curators. Significant participant in “community proxy” agenda-setting activities. Internationally fragmented resources. Future: mostly dependent on grant funding (but strong commitment)

25 Discipline: Pharmacology. International Scientific Union. Attempting to build credit for data contributions. Future: extremely limited funding

26 Discipline: Bio/Health. UK PubMedCentral! (you heard about this earlier)

27 Issues: Nature article 23 June 05, “Databases in Peril”. 51 out of 89 biological databases contacted reported they were struggling financially. 7 have closed. Several being updated in owner’s spare time. (Notes that not all deserve long term support.) [Nucleic Acids Research reports 858 databases in 2006!] Major issue: money

28 Publisher: Crystallography. Publisher and Scientific Union. Created key domain crystallographic standard (CIF). Strong motivator for deposit of structure data. Consistent quality checks. DOIs used for structure data. Future: publishing business model. Slide from IUCr

29 National bodies: British Library. Serious and robust approach. Legal deposit powers & responsibilities as driver. Oriented primarily towards “cultural heritage” (broadly interpreted). Little data, no science domain experience. Future: strong future commitment

30 National bodies: TNA/NDAD. Specialist archive for government datasets. Understand government regulations, dynamics & requirements. Subject generalists; disconnected from associated science. Technology specialists (understand databases). Future: likely to pass eventually to The National Archives

31 3rd parties: Portico. Specific area: eJournals. Depends on publisher agreements. No data or domain science expertise. Future: commitment from Mellon + publishers + subscriptions, good funding mix

32 3rd parties: Iron Mountain? Records management IS a curation problem. Organisations like this very likely to branch out. No domain science expertise. Future: business case, viability, stock market…

33 Institutions & the network. Institutions have fundamental sustainability. Disciplines have domain knowledge advantage but sustainability is an issue. Can we get the best of both?

34 Intersections… Institution 1 Institution 2 Institution 3 etc Discipline 1 XX Discipline 2 XX Discipline 3 XX etc

35 Who are the curation players again?

36 BEWARE WEB 2.0!!!

37 Thank you

