
1 a centre of expertise in data curation and preservation. Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/; or (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. Standing on the Shoulder? Curation and the Record of Science. Chris Rusbridge. JISC/CNI 2006

2 Contents: Curation; Sustainability; Data resources; Context; Access and re-use; Citation, archiving and preserving; Breaking news: OAIS Review

3 "If I have seen a little further it is by standing on the shoulders of giants." Newton's letter to Hooke (1676); possibly a snide remark linked to Hooke's stature. The metaphor is attributed to Bernard of Chartres by John of Salisbury, 1159 (Metalogicon). Citation of evidence base fundamental.

4 Curation. Data increasingly important as evidence: experimental verifiability (the basis of science); unrepeatable observations & experiments (particularly environmental, in the broadest sense); legal, compliance & transactions; cultural resources. For evidential value, data must be curated.

5 Curation: maintaining and adding value to a trusted body of digital information for current and future use.

6 Lynch remarks, closing the 2005 Curation Conference. 3 views of digital curation: collection as a living thing; whole life process, evolving object(s); finite process, handover to preservation.

8 Sustainability and exit strategy. Most critical resource for curation: present and future money supply! Plan for the long term, but have a succession plan. Sustained approach, not project mentality.

9 Data resource stages. Curated data is created… (Observations? Fixed!) Or acquired… (data brought/bought from outside). Ingest. Development: derived, refined, combined, processed data. Potentially many stages.

10 [Diagram labels: NASA; University research group 1; research group 3; local decision-making body; University research group 2.] Slide from Rajendra Bose

11 Some illustrations: UK census. 1881 census (UKDA): hand-written individual return forms; data conversion issue (reference form available); digitisation and access issues. 1961 census (TNA/NDAD): first to be analysed using computers (first major UK-wide computer project?); individual returns closed until 2062; data preservation issue!!! 2001 census (ONS/CDU): data corrections and adjustments; curation issue.

12 Khosrow Hejazian

13 Student databases. Glasgow: 1960s flat files; converted to Indexed Sequential; converted to IDMS-X ~1983; converted to Ingres ~1994, still current. All students since 1960s; all prior students who have returned; all General Council <100 years. Think of what has changed in that time! Faculties, depts, grade structures, regulations… Curation problem!

14 Another university. Also 3rd or 4th generation system. Previous data not carried forward; available on tapes. Let's hope they are properly looked after, re-tensioned, metadata & documentation available… Dataset preservation nightmare! (Urban myth? Told by senior manager!)

15 Curation of e-mails. Lots of metadata and context (RFC 822). Often highly distributed: split conversations; unknown numbers of copies; personal choice of clients. Legal requirements! Controlled filing and controlled deletion needed…
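As a hedged illustration of the "metadata and context" that an RFC 822-style message carries, the sketch below uses Python's standard `email` package to pull out the headers that tie a message to its sender, recipients, and conversation thread. The message text, the addresses, and the helper name `curation_metadata` are all invented for illustration; none of them come from the talk.

```python
from email import message_from_string

# Hypothetical message, invented for illustration only.
RAW = """\
From: alice@example.org
To: bob@example.org
Cc: carol@example.org
Subject: Re: Census data corrections
Date: Mon, 09 Oct 2006 10:15:00 +0100
Message-ID: <msg2@example.org>
In-Reply-To: <msg1@example.org>
References: <msg1@example.org>

Agreed, see the adjusted figures.
"""


def curation_metadata(raw):
    """Return the header fields that help place one message in context."""
    msg = message_from_string(raw)
    wanted = ["From", "To", "Cc", "Date", "Subject",
              "Message-ID", "In-Reply-To", "References"]
    # Headers that are absent come back as None and are skipped.
    return {name: msg[name] for name in wanted if msg[name] is not None}


meta = curation_metadata(RAW)
for name, value in meta.items():
    print(f"{name}: {value}")
```

The `In-Reply-To` and `References` fields are what allow a split conversation to be stitched back together, and `Message-ID` is one way to recognise multiple copies of the same message held in different places.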

17 [Images: SDSS (Visual); TWOMASS (Infrared).] Slide from Rajendra Bose

18 Slide from Rajendra Bose

19 Example… National Virtual Observatory. Johns Hopkins press release: Scientists working to create the NVO, an online portal for astronomical research unifying dozens of large astronomical databases, confirmed discovery of [a] new brown dwarf recently. The star emerged from a computerized search of information on millions of astronomical objects in two separate astronomical databases. Thanks to an NVO prototype, that search, formerly an endeavor requiring weeks or months of human attention, took approximately two minutes.

20 Context. Data meaningless without context. Linkage. Metadata of many kinds. Workflow! Provenance. Computational lineage. Authenticity.

21 Access and re-use. Ethics and rights control access (weak in expressing this long-term). Collaboration tools: annotation, discussion, review; re-use leading to change and development. Publication: not just in print; underlying data should be published, too. Citation…

22 Citation. OWL Web Ontology Language Reference, W3C Proposed Recommendation, 15 December 2003. This version: Latest version: Previous version: Needs a stable resource to cite…

23 Citation… The date alone (as in common web citation approaches) is not enough! Cited object likely to have changed… Citation should link to the cited object as it was! [6] The CIA World Factbook. Retrieved on 8 Jan 2006.

24 Citation needs… An efficient way to reference and access archived past states of a changing dataset (work in progress, Buneman et al). Less important for original observations: don't mess with those data. Less important for incremental datasets: later stuff should not invalidate earlier. Very important for revisable datasets, e.g. genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change.

25 XMLArch: System Architecture. [Diagram labels: XML Archiver; Relational Database; XML Archive at time t - 1; XML Archive at time t; Pre-processor; Version Merger; Data Extractor; XML Snapshot at time t.] Carwyn Edwards
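The idea of merging each new snapshot into a single archive, so that any past state can be cited and retrieved, can be sketched roughly as follows. This is a much-simplified illustration and not the XMLArch system itself: plain dictionaries stand in for XML, values are assumed to be hashable, and the class name `VersionedArchive`, its methods, and the sample gene records are all hypothetical.

```python
from collections import defaultdict


class VersionedArchive:
    """Merge successive snapshots; each (key, value) records the versions holding it."""

    def __init__(self):
        # key -> value -> list of version numbers at which key had that value
        self._history = defaultdict(dict)
        self.latest = 0

    def merge(self, snapshot):
        """Fold a new snapshot (a dict of key -> value) into the archive."""
        self.latest += 1
        for key, value in snapshot.items():
            # An unchanged value is stored once; only its version list grows.
            self._history[key].setdefault(value, []).append(self.latest)
        return self.latest

    def snapshot_at(self, version):
        """Reconstruct the dataset exactly as it stood at a past version."""
        if not 1 <= version <= self.latest:
            raise ValueError(f"unknown version {version}")
        result = {}
        for key, values in self._history.items():
            for value, versions in values.items():
                if version in versions:
                    result[key] = value
        return result


# Hypothetical revisable dataset: an annotation is revised between versions.
archive = VersionedArchive()
v1 = archive.merge({"gene1": "kinase", "gene2": "unknown"})
v2 = archive.merge({"gene1": "kinase", "gene2": "transporter", "gene3": "unknown"})

print(archive.snapshot_at(v1))
print(archive.snapshot_at(v2))
```

A citation could then carry the version number alongside the dataset name, so that a reader retrieves the cited object as it was rather than as it is now.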

26 Preservation. Use preserves. Money preserves. Redundancy good, monoculture bad? LOCKSS-type & other approaches… Bits are fragile and robust: don't rely on portable media; look after them well. Technology changes… How fast? What impact? Metadata matters! (Know what you've got)
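One way to picture why redundancy helps with fragile bits is a simple audit that compares checksums across copies and flags the odd one out. The sketch below is only loosely inspired by that idea and is not the LOCKSS protocol; the function name `audit_replicas`, the site names, and the sample bytes are assumptions made for illustration.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas):
    """Compare copies of one object; return (majority digest, names of disagreeing copies)."""
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {name: hashlib.sha256(data).hexdigest()
               for name, data in replicas.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        # Without a strict majority there is no basis for choosing a "good" copy.
        raise ValueError("no majority among replicas; manual inspection needed")
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


# Hypothetical copies held at three sites; one has suffered simulated bit rot.
copies = {
    "site_a": b"1881 census return, page 1",
    "site_b": b"1881 census return, page 1",
    "site_c": b"1881 census return, pagf 1",
}
digest, damaged = audit_replicas(copies)
print(damaged)  # ['site_c']
```

With a single copy, or with identical copies kept on identical technology, there is nothing independent to compare against, which is one reading of "monoculture bad".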

27 Preservation. We can't do it alone: collective responsibility. We can't rely on anyone else: institutional responsibility.

28 It's about time… From the very short: good management (don't under-estimate but don't over-estimate). Through the medium term: curation (use it or lose it). Gather ye metadata while ye may! Preservation relay. To the very long term: high commitment, high cost, high risk; harder to do en masse.

29 OAIS. Announcement of a Comment Period for the Five Year Review of the Reference Model for an Open Archival Information System (OAIS). Standard … must be reviewed every five years and a determination made to reaffirm, modify, or withdraw the existing standard. … any revision must remain backward compatible with regard to major terminology and concepts. … we do not plan to expand the general level of detail … reduce ambiguities and fill in any missing or weak concepts. Make suggestions and express interest until 30/10/06.

30 Are we standing on the hard shoulder (the road side) waiting for a ride? Or are we supporting the shoulders of giants (building the evidence bases for future science)?

