1 a centre of expertise in data curation and preservation
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
Tomorrow, and tomorrow, and tomorrow: the players on the curation stage
Chris Rusbridge
Presentation at OCLC

2 OCLC October 2006
"To-morrow, and to-morrow, and to-morrow,
Creeps in this petty pace from day to day,
To the last syllable of recorded time;
And all our yesterdays have lighted fools
The way to dusty death. Out, out, brief candle!
Life's but a walking shadow; a poor player,
That struts and frets his hour upon the stage,
And then is heard no more: it is a tale
Told by an idiot, full of sound and fury,
Signifying nothing."
Shakespeare: Macbeth

3 Dunsinane Hill
Photo by Fabrice

6 Contents
- Curation and the Digital Curation Centre
- Science and Data Citations
- The poor players of data curation
- Sustainability of curated data
- Macbeth again…

7 Curation
- Data increasingly important as evidence
- Experimental verifiability (the basis of science)
- Unrepeatable observations & experiments (particularly environmental in broadest sense)
- Legal, compliance & transactions
- Cultural resources
- Preservation view vs Publishing view

8 Lynch remarks
- Closing the Curation Conference
- 3 views of digital curation
  - Finite process, handover to preservation
  - Whole life process, evolving object(s)
  - Collection as a living thing

9 Digital curation?
Digital preservation
Static
For later use

10 Digital curation?
Digital preservation | Digital curation
Static | Dynamic
Long-term
For later use | In use now (and the future)

11 Digital curation
Digital curation & preservation
Static | Dynamic
Long-term
For later use | In use now (and the future)
maintaining and adding value to a trusted body of digital information for current and future use

12 Mission
The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation

13 Organisation to Engage & Collaborate
Industry research collaborators standards bodies testbeds & tools communities of practice: users community support & outreach research development co-ordination service definition & delivery management & admin support Associates Network curation organisations eg DPC

14 Organisation to Engage & Collaborate: Leads
Industry research collaborators standards bodies testbeds & tools communities of practice: users Bath Edinburgh CCLRC Glasgow Edinburgh Associates Network curation organisations eg DPC

15 Associated work
- DCC LOCKSS Technical Support Service (Lots of Copies Keep Stuff Safe)
- DCC SCARP Project
  - Disciplinary approaches to sharing, curation, re-use and preservation
- EU projects associated
  - CASPAR
  - Digital Preservation Europe
  - PLANETS

16 Phase 2
- Externally-moderated, reflective self-evaluation completed
- Phase 2 proposal (2007/10) to JISC
  - Accepted: focus on science data, reduced scale
- EPSRC-funded Research continues until 2007/8

17 2nd International Digital Curation Conference
- Research & invited presentations
- Glasgow, 21/22 November, 2006
- Please register at: http://www.dcc.ac.uk/events/dcc-2006/

19 Data resource stages
- Curated data is created…
  - Observations? Fixed!
- Or Acquired…
  - Data brought/bought from outside
- Ingest
- Development
  - Derived, refined, combined, processed data
  - Potentially many stages

20 SDSS (Visual), TWOMASS (Infrared)
Slide from Rajendra Bose

21 Slide from Rajendra Bose

22 New discovery… National Virtual Observatory
Johns Hopkins press release: Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.

23 Context
- Data meaningless without context
- Linkage
- Metadata of many kinds
- Workflow!
- Provenance
- Computational lineage
- Authenticity

24 NASA University research group1 research group3 local decision-making body University research group2
Slide from Rajendra Bose

25 Access and re-use
- Ethics and rights control access
  - Weak in expressing this long-term
- Collaboration tools
  - Annotation, discussion, review
- Re-use leading to change and development
- Publication
  - Not just in print
  - Underlying data should be published, too
- Citation…

26 CLADDIER citation investigation
My last example was an MST data set held at the BADC, and I was suggesting something like this (for a citation):
Natural Environment Research Council Mesosphere-Stratosphere-Troposphere Radar at Aberystwyth Internet British Atmospheric Data Centre (BADC) 1990 badc.nerc.ac.uk/data/mst/v3/upd15032006 http://featuretype.registry/verticalProfile 200409031205 Sep 21 2006 http://badc.nerc.ac.uk/data/mst/v3/
(Made up tags!)
Bryan Lawrence Weblog

27 CLADDIER 2: Version of record
Role of Publisher: add value
- provision of catalogue metadata
- some commitment to maintenance of the resource at the AvailableAt url
- some commitment to the resource being conformant to the description of the Feature
- some commitment to the maintenance of the mapping between the identifier [LocalID] and the resource.
Bryan Lawrence Weblog

28 CLADDIER 3: persistence
- Wayback Machine
  - Only snapshots (eg only 2004 version of Bryan's home page!)
- WebCite allows the creator of content to submit URLs for [archiving], thus ensuring when one writes an academic document, the material will be archived, and the citation will be persistent
  - But no real help for data…
- … only allow [data citation] when we believe in the persistence of the organisation making the data available…
Bryan Lawrence Weblog

30 Citation
OWL Web Ontology Language Reference
W3C Proposed Recommendation 15 December 2003
This version: http://www.w3.org/TR/2003/PR-owl-ref-20031215/
Latest version: http://www.w3.org/TR/owl-ref/
Previous version: http://www.w3.org/TR/2003/CR-owl-ref-2003081
Needs a stable resource to cite… (FRBR works & expressions?)

31 Citation…
- The date alone (as in common web citation approaches) is not enough!
- Cited object likely to have changed…
- Citation should link to the cited object as it was!
[6] The CIA World Factbook. www.cia.gov/cia/publications/factbook/. Retrieved on 8 Jan 2006.

32 Citation needs…
- An efficient way to reference and access archived past states of a changing dataset (work in progress, Buneman et al)
- Not important for original observations
  - Don't mess with those data
- Less important for incremental datasets
  - Later stuff should not invalidate earlier
- Very important for revisable datasets
  - Eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change
  - Eg Mapping… OS maps represent a huge database that changes on a daily basis
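
The need stated above, a citation that resolves to the dataset as it was when cited, might be sketched roughly as follows. This is an illustration only: the `SnapshotArchive` class, its method names and the sample records are hypothetical, and it stores whole snapshots, which is far less efficient than the archiving approach the slide attributes to Buneman et al.

```python
from bisect import bisect_right
from datetime import date


class SnapshotArchive:
    """Hypothetical store of dated snapshots of one changing dataset."""

    def __init__(self):
        self._dates = []   # snapshot dates, kept in ascending order
        self._states = []  # dataset state archived on the matching date

    def archive(self, when, state):
        """Record the dataset state as of `when` (dates must increase)."""
        if self._dates and when <= self._dates[-1]:
            raise ValueError("snapshots must be archived in date order")
        self._dates.append(when)
        self._states.append(dict(state))

    def as_cited(self, retrieved_on):
        """Return the state a citation with this retrieval date refers to."""
        # Index of the first snapshot dated after the retrieval date.
        i = bisect_right(self._dates, retrieved_on)
        if i == 0:
            raise LookupError("no archived state on or before that date")
        return dict(self._states[i - 1])


# Made-up records, purely to exercise the sketch.
factbook = SnapshotArchive()
factbook.archive(date(2005, 7, 1), {"entry": "text as of mid-2005"})
factbook.archive(date(2006, 3, 1), {"entry": "revised text"})

# A citation "Retrieved on 8 Jan 2006" resolves to the mid-2005 state,
# not to whatever the resource says today.
cited = factbook.as_cited(date(2006, 1, 8))
```

The design point is that the retrieval date in the citation becomes a lookup key into the archive rather than a note to the reader, which only works if someone commits to keeping the past states.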

33 XMLArch: System Architecture
XML Archiver; Relational Database; XML Archive at time t - 1; XML Archive at time t; Pre-processor; Version Merger; Data Extractor; XML Snapshot at time t
Carwyn Edwards
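
The slide names the components but does not describe their internals. As a rough sketch of the general idea of merging a snapshot at time t into the archive from time t - 1, the Python below keeps, for each keyed record, the time intervals during which each value was current. Plain dictionaries stand in for XML, integer timestamps and all function names are assumptions made for illustration, and this should not be read as the actual XMLArch implementation.

```python
def merge_snapshot(archive, snapshot, t):
    """Merge the snapshot taken at time t into the archive, in place.

    `archive` maps each key to a list of [start, end, value] intervals;
    `end` is None while the value is still current.
    """
    for key, value in snapshot.items():
        history = archive.setdefault(key, [])
        current = history[-1] if history and history[-1][1] is None else None
        if current is not None and current[2] == value:
            continue  # unchanged since the last snapshot: store nothing new
        if current is not None:
            current[1] = t  # the old value stops being valid at t
        history.append([t, None, value])
    for key, history in archive.items():
        if key not in snapshot and history[-1][1] is None:
            history[-1][1] = t  # key disappeared: close its open interval
    return archive


def extract(archive, t):
    """Rebuild the dataset as it stood at time t."""
    state = {}
    for key, history in archive.items():
        for start, end, value in history:
            if start <= t and (end is None or t < end):
                state[key] = value
    return state


# Three successive snapshots of a small, made-up keyed dataset.
archive = {}
merge_snapshot(archive, {"a": 1, "b": 2}, t=1)
merge_snapshot(archive, {"a": 1, "b": 3, "c": 4}, t=2)
merge_snapshot(archive, {"a": 1, "c": 4}, t=3)

state_at_2 = extract(archive, 2)
```

Because unchanged records add nothing to the archive, storage grows with the amount of change rather than with the number of snapshots, while any past state can still be extracted on demand.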

34 Who are the curation players?

35 Curation: Individual
- Small science 2-3 times more data than Big science, but much more at risk
- PhD student? RA? PI? Administrator? IT support?
- Data potentially on local hard drives, or at best shared network drives
- May be inadequately protected
- Liable for policy-led deletion on resignation
- Individual knows too much
- Documentation/metadata unlikely to be adequate
- Tomorrow: gone!

36 Department: eCrystals
- Specialist department archive (& national service)
- Workflow recording of lab parameters (R4L)
- Public & private elements
- Trying to build eCrystals federation (eBank 3)
- But… ReciprocalNet? French COD efforts? Fragmented discipline!
- Tomorrow: likely to continue

37 Institution: Cambridge
- Chemistry: 175,000 small molecule structures in CML
- Alongside Archaeology, Manuscripts, Learning Materials, etc
- No library curation skills; dependent on research group enthusiast
- Collection isolated from other Chemistry
- Tomorrow: assured…

38 Community: CDL
- Shared effort from group of institutions
- Comparison OhioLink?
- Document tradition, not data
- Passive role re collections
- Rely on departmental & domain expertise
- Tomorrow: assured…

39 Community: SDSC?
- Data specialists
- Multiple disciplines
- Distinct from domains; curation dependent on external expertise
- Research ethos
- Tomorrow: dependent on grant/contract income & research priorities

40 Community: LOCKSS?
- Self-selected group of collectors: closest to genuine open activity (despite Alliance)?
- Traditionally libraries collecting eJournals
- Model respects IPR
- No domain expertise; rely on origins
- Data limitations…
- Tomorrow: potentially very persistent (low cost, high reliability, attack resistance, distributed)

41 Discipline: Archaeology
- Staffed by archaeologist curators
- Understand special legal issues
- Strong relationship with community & peers
- Internationally still fragmented?
- Tomorrow: dependent on research council grants + deposit funding

42 Discipline: Astronomy
- Part of major international effort
- Expensive shared facilities, global reach
- Well integrated into community
- Enable new science
- Tomorrow: assured by community (another large facility)

43 Discipline: Atmosphere
- Strong believer in need for domain scientists as curators
- Significant participant in community proxy agenda-setting activities
- Internationally fragmented resources
- Tomorrow: mostly dependent on grant funding (but strong commitment)

44 Discipline: Pharmacology
- International Scientific Union
- Attempting to build credit for data contributions
- DB ownership rotates
- Tomorrow: extremely limited funding

45 Discipline: Social Sciences
- Mature!
- Staffed by Social Science curators
- Alert to opportunities
- Able to appraise material offered
- Strong relationship to discipline
- Tomorrow: assured through broad mix of funding streams

46 Publisher: Crystallography
- Publisher and Scientific Union
- Created key domain crystallographic standard (CIF)
- Strong motivator for deposit of structure data
- Consistent quality checks
- DOIs used for structure data
- Tomorrow: publishing business model
Slide from IUCr

47 National bodies: British Library
- Serious and robust approach
- Legal deposit powers & responsibilities as driver
- Oriented primarily towards cultural heritage (broadly interpreted)
- Little data, no science domain experience
- Tomorrow: strong future commitment

48 National bodies: TNA/NDAD
- Specialist archive for government datasets
- Understand government regulations, dynamics & requirements
- Subject generalists; disconnected from associated science
- Technology specialists (understand databases)
- Tomorrow: likely to pass eventually to The National Archives

49 National bodies: NOAA (etc)
- Government body making serious data available
- Domain scientists curate data
- Operates in current political context (!)
- Tomorrow: reasonably assured but some unfunded mandates?

50 3rd parties: OCLC?
- Should this be community?
- Demand driven
- No domain science expertise: rely on origins
- Tomorrow: business case

51 3rd parties: Portico
- Specific area: eJournals
- Depends on publisher agreements
- No data or domain science expertise
- Tomorrow: commitment from Mellon + publishers + subscriptions, good funding mix

52 3rd Parties: Iron Mountain
- Records management IS a curation problem
- Organisations like this very likely to branch out
- No domain science expertise
- Tomorrow: business case, viability, stock market…

53 Institutions & the network
- Institutions have some fundamental sustainability
- Disciplines live in the network; sustainability is an issue
- Can we get the best of both?

54 Intersections…
Institution 1 Institution 2 Institution 3 etc
Discipline 1 XX
Discipline 2 XX
Discipline 3 XX
etc

55 Who are the curation players again?

56 Project StORe findings
Discipline commonality from survey (Miller, UKDA, 2006):
- 2-way links between data & publication useful
- Barriers to actual deposit of data/outputs
- Sharing data important, likely between colleagues
- Perceived inconsistency across repositories
- Most common searching: Google type
- Researchers favour self-reliance rather than library support
- Recognise need for common minimum metadata
Aim for pilot linking middleware demonstrator
"Creating small scale silos of information with institutional repositories is not … a compelling information management strategy in the Google age" (Heery & Anderson for JISC, 2005)

57 Sustainability: tomorrow is the emerging worry
- Sustainability work package in DCC (new grant!)
- JISC/NDIIPP meeting addressed it
- AHRC report draft soon
- Research Information Network report draft
- JISC study on sustainable IT systems for HE
- Recent ARL/NSF workshop, NSF strategy

58 Sustainability of what?
- Repository as an organisation
- Repository as a service
- Repository as a system
- Repositories as a network (federation?)
- Collections and objects supported by repositories
- Commit to collection: contract the manager!

59 Social factors
- Commitment essential… much more than anything else (cf persistent identifiers)
- Funder requirements express social determination
- Policy & grant application forms, selection criteria
- Monitoring essential
- Legal, ethical, IPR impacts all significant
- Public good questions
- Academic credit (citations?)
- Free-loaders (embargos?)
- Disciplines are different!
- Workforce skills: researcher, data librarian/scientist

60 Sustainability a function of...
- Commitment
- Goals
- Value and cost
- Business model
- Time
- Environment
- Domain knowledge and information
- Dimensions (how much stuff)
- Technical approaches
- Usage

61 So, tomorrow…
- Digital data repositories already sustained > 30 years
  - How? Vision, leadership, commitment
- Libraries, archives, museums sustained 100s of years
  - How? Aggregate value proposition
  - Perception now under threat!
- Collectively we need to identify the next steps toward digital data sustainability, for tomorrow, and tomorrow, and tomorrow!

62 Macbeth again…
"To-morrow, and to-morrow, and to-morrow,
Creeps in this petty pace from day to day,
To the last syllable of recorded time;
…it is a tale
Told by an idiot, full of sound and fury,
Signifying nothing."

63 Mission (impossible?)
To that last syllable of recorded time
Keep our tales forever full of significance!
Thank you

