A centre of expertise in data curation and preservation Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike.


Curation of Scientific Data: Challenges for Institutions & their Repositories
Chris Rusbridge
The Adaptable Repository, 3 May 2007, Sydney

A centre of expertise in data curation and preservation. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit the Creative Commons website; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.

A centre of expertise in data curation and preservation, APSR 2007

Contents
- Science and digital curation
- Why are data important?
- What kinds of data?
- What to do with data: frontiers of practice
- Repository challenges
- Changing practice

Digital Curation Centre Mission
“The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”


Records of science
- Data increasingly important as evidence
- Key part of the scholarly record (a public good)
- Unrepeatable observations & experiments
- Experimental verifiability (the basis of science)
  - Would there have been fewer Chang retractions if his original data had been available?
- Allows additional interpretations
- Legal and compliance
See the APSR/AERES report for good examples.
CHANG, G., ROTH, C. B., REYES, C. L., PORNILLOS, O., CHEN, Y.-J. & CHEN, A. P. (2006) Retraction of: Pornillos et al., Science 310 (5756); Reyes and Chang, Science 308 (5724); Chang and Roth, Science 293 (5536). Science Magazine.

What kinds of data?
- Observations, e.g. UARS (Upper Atmosphere Research Satellite)
  - UARS Level 0: telemetry
  - UARS Level 1: measured physical parameters (post calibration?)
- Derived data
  - UARS Level 2: calculated geophysical profiles (?)
  - UARS Level 3: gridded, interpolated (?)
- Combined data
- Crafted data, e.g. annotated gene/protein databases
- Descriptive (meta)data

Retaining research data means…
- Data secure against loss (within group)
- Communal repository (secure data store)
- Re-usable, sharable information: as above, plus active curation (e.g. bioinformatics)
- Long-term preservation of information
Be clear what you are trying to do!

… or the data trajectory is…
- Hard drive → lost (crash)
- Hard drive → DVD → cardboard box → loft → skip/dumpster → lost
Sometimes this is a very bad thing; sometimes these are the right options!
© Marita Bushell

Long-term bit storage…
- A solved problem? Does it just require well-understood good data management practices?
- Wrong! For very large datasets kept over very long times, there are significant problems…
BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.

How Well Must We Preserve?
- Keep a petabyte for a century
  - with a 50% chance of remaining completely undamaged
- Consider each bit decaying independently
  - analogy with radioactive decay
- That's a bit half-life of 10^18 years
  - one hundred million times the age of the universe
- That's a very demanding requirement
  - hard to measure
  - even very unlikely faults will matter a lot
Slide from David Rosenthal, LOCKSS
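
The half-life figure on this slide follows from a short calculation. If each bit survives a time t with probability 2^(-t/h), then all N bits survive with probability 2^(-N t/h); setting that equal to 0.5 gives h = N t. The sketch below reproduces the arithmetic under stated assumptions: 1 PB is taken as 10^15 bytes (8 × 10^15 bits) and the age of the universe as roughly 13.8 billion years; the function name is illustrative only.

```python
import math


def required_bit_half_life(n_bits: float, years: float, p_intact: float) -> float:
    """Half-life (years) each bit needs so that all n_bits survive `years`
    with probability p_intact.

    Model, as on the slide: bits decay independently, like radioactive atoms,
    so P(one bit survives t) = 2 ** (-t / h) and
    P(all n bits survive t) = 2 ** (-n * t / h).
    """
    # Solve 2 ** (-n * t / h) = p_intact for h.
    return n_bits * years / math.log2(1.0 / p_intact)


# Assumption: 1 PB = 10**15 bytes = 8 * 10**15 bits.
PETABYTE_BITS = 8 * 10**15
# Assumption: approximate age of the universe, used only for the comparison.
AGE_OF_UNIVERSE_YEARS = 13.8e9

h = required_bit_half_life(PETABYTE_BITS, years=100, p_intact=0.5)
print(f"required bit half-life: {h:.1e} years")
print(f"ratio to age of universe: {h / AGE_OF_UNIVERSE_YEARS:.1e}")
```

This gives about 8 × 10^17 years, i.e. of order 10^18 as on the slide, and a ratio of roughly 6 × 10^7, the same order of magnitude as the slide's "one hundred million".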

What to do about curation
- Build curation/reusability into the science workflow
  - Curation begins before creation
  - What's easy at first becomes (impossibly) hard later
- Describe data (metadata schemas, “representation info”, etc.)
- Keep experimental parameters (technical, who, what, when, where)
- Keep the ability to process
- Keep data!
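
One way to make "curation begins before creation" concrete is to capture who, what, when, where and the technical parameters at the moment a data file is written, rather than reconstructing them later. The sketch below is a hypothetical illustration, not a metadata standard: the `CaptureRecord` fields and the `.meta.json` sidecar convention are assumptions chosen for clarity.

```python
import json
import os
import platform
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class CaptureRecord:
    """Hypothetical metadata captured when a dataset is created.

    Field names are illustrative only, not taken from any standard schema.
    """
    who: str    # person or instrument operator
    what: str   # short description of the data
    where: str  # lab, site or instrument location
    when: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    parameters: dict = field(default_factory=dict)  # experimental/technical settings
    # Record the processing environment to help "keep the ability to process".
    software: dict = field(default_factory=lambda: {"python": platform.python_version()})


def write_sidecar(data_path: str, record: CaptureRecord) -> str:
    """Write the record as JSON next to the data file; return the sidecar path."""
    sidecar = data_path + ".meta.json"
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump(asdict(record), fh, indent=2, sort_keys=True)
    return sidecar


# Usage: all values below are made up for illustration.
tmpdir = tempfile.mkdtemp()
rec = CaptureRecord(who="A. Researcher", what="spectra, run 12", where="Lab 3",
                    parameters={"temperature_K": 293, "exposure_s": 0.5})
path = write_sidecar(os.path.join(tmpdir, "run12.csv"), rec)
with open(path, encoding="utf-8") as fh:
    loaded = json.load(fh)
print(loaded["parameters"])
```

A plain JSON sidecar keeps the description human-readable and travels with the file; a repository would more likely map such fields onto an agreed schema at deposit.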

What to do about curation - 2
- Use standard/agreed formats for data
- Make ownership & restrictions clear, & explain how to cite data
- Offer for deposit in an institutional or discipline repository
  - Appraisal and selection essential
  - Possible time-limited embargoes
- “Publish” data in support of articles
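
The points on this slide (clear ownership and licence, agreed format, a way to cite, an optional time-limited embargo) can be carried together in a single deposit record. The sketch below assumes a hypothetical `Deposit` structure, a made-up identifier scheme (`repo:…`) and a simple citation layout; none of these come from a particular repository platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Deposit:
    """Hypothetical deposit record for a dataset offered to a repository."""
    title: str
    owner: str
    licence: str       # e.g. a Creative Commons licence name
    fmt: str           # agreed/standard format, e.g. "text/csv"
    identifier: str    # hypothetical persistent identifier
    year: int
    embargo_until: Optional[date] = None  # None means no embargo

    def is_open(self, today: date) -> bool:
        """Open to access once any time-limited embargo has expired."""
        return self.embargo_until is None or today >= self.embargo_until

    def citation(self) -> str:
        """Human-readable citation string (illustrative layout only)."""
        return f"{self.owner} ({self.year}). {self.title} [dataset]. {self.identifier}"


# Usage with made-up values.
d = Deposit(title="Survey responses, wave 1", owner="Example Group",
            licence="CC BY-NC-SA 2.5", fmt="text/csv",
            identifier="repo:example/123", year=2007,
            embargo_until=date(2008, 1, 1))
print(d.is_open(date(2007, 6, 1)))  # False: still under embargo
print(d.is_open(date(2008, 1, 1)))  # True: embargo has expired
print(d.citation())
```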

Internet Archaeology: publication with data

Database as book…
- Buneman (early pilot) work on the IUPHAR database
  - MySQL to XML database
  - Historic to logical schema
  - XML via XSLT to LaTeX
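
The "database as book" pipeline (relational database, then XML, then typeset LaTeX) can be sketched in a few lines. This is not the pilot's code: to stay self-contained it swaps in SQLite for MySQL and a small Python renderer for the XSLT stylesheet, skips the historic-to-logical schema step, and uses a hypothetical `receptor` table with placeholder rows rather than real IUPHAR content.

```python
import sqlite3
import xml.etree.ElementTree as ET

# LaTeX special characters that commonly appear in names.
# (Backslash, ~ and ^ are omitted to keep the sketch short.)
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
                  "_": r"\_", "{": r"\{", "}": r"\}"}


def latex_escape(text: str) -> str:
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def table_to_xml(conn: sqlite3.Connection) -> ET.Element:
    """Step 1 (stand-in for MySQL -> XML): export each row as an XML element."""
    root = ET.Element("database", name="example_db")
    for rid, name, family in conn.execute(
            "SELECT id, name, family FROM receptor ORDER BY id"):
        rec = ET.SubElement(root, "receptor", id=str(rid))
        ET.SubElement(rec, "name").text = name
        ET.SubElement(rec, "family").text = family
    return root


def xml_to_latex(root: ET.Element) -> str:
    """Step 2 (stand-in for the XSLT stylesheet): render the XML as LaTeX."""
    lines = [r"\section*{" + latex_escape(root.get("name")) + "}",
             r"\begin{description}"]
    for rec in root.iter("receptor"):
        name = latex_escape(rec.findtext("name"))
        family = latex_escape(rec.findtext("family"))
        lines.append(rf"  \item[{name}] {family}")
    lines.append(r"\end{description}")
    return "\n".join(lines)


# Hypothetical table and placeholder rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receptor (id INTEGER PRIMARY KEY, name TEXT, family TEXT)")
conn.executemany("INSERT INTO receptor VALUES (?, ?, ?)",
                 [(1, "receptor_A", "family X"), (2, "receptor B", "family Y")])
root = table_to_xml(conn)
latex = xml_to_latex(root)
print(latex)
```

Keeping the XML as an intermediate, format-neutral layer means the same export could feed other renderings (HTML, a printed volume) without touching the database.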

The StORe vision
- Seamless transport from research data to research publications and vice versa
- Bi-directional links proven in social science e-research, but capable of export to other disciplines
Slide from Graham Pryor
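
The essential property of "bi-directional links" is that a link recorded once can be followed from either side: from a dataset to the publications that use it, and from a publication back to its data. The sketch below shows that idea with a hypothetical in-memory `LinkRegistry` and made-up identifiers; it is not the StORe implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class LinkRegistry:
    """Hypothetical registry that keeps data <-> publication links symmetric."""

    def __init__(self) -> None:
        self._links: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, dataset_id: str, publication_id: str) -> None:
        # Store the link in both directions so either side can be navigated.
        self._links[dataset_id].add(publication_id)
        self._links[publication_id].add(dataset_id)

    def related(self, item_id: str) -> List[str]:
        """Everything linked to item_id, in sorted order (empty if unknown)."""
        return sorted(self._links.get(item_id, set()))


# Usage with made-up identifiers.
reg = LinkRegistry()
reg.link("data:survey-2006", "pub:article-1")
reg.link("data:survey-2006", "pub:article-2")
print(reg.related("pub:article-1"))     # the dataset behind the article
print(reg.related("data:survey-2006"))  # both articles that use the dataset
```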

What are the reusability issues?
- Data not neutral to hypothesis
- Hard to know the risks & pitfalls of a particular dataset
- Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChI etc.)
- Hard to “understand” data once found
  - Really need information, not data!
- Hard to use data once understood

Context
- Data meaningless without context
- Metadata of many kinds
- Representation information… from data to information
- Linkage and connection between datasets
- Provenance
  - Authenticity/integrity
  - Computational lineage
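
Integrity and computational lineage can be recorded together by storing a checksum of each derived file alongside the checksums of the inputs it was computed from. The sketch below is a hypothetical lineage record, not a standard provenance model; SHA-256 is assumed as the fixity algorithm. Note that a checksum only shows the bits are unchanged; on its own it does not prove authenticity unless the checksum itself is stored and protected separately.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Fixity value used to check integrity later."""
    return hashlib.sha256(data).hexdigest()


def provenance_entry(output: bytes, inputs: Dict[str, bytes], step: str) -> dict:
    """Hypothetical lineage record: which inputs (by checksum) and which
    processing step produced this output."""
    return {
        "output_sha256": sha256_hex(output),
        "inputs": {name: sha256_hex(blob) for name, blob in sorted(inputs.items())},
        "step": step,
    }


def verify(data: bytes, expected_sha256: str) -> bool:
    """True if the data still matches its recorded checksum."""
    return sha256_hex(data) == expected_sha256


# Usage with made-up data.
raw = b"t,value\n0,1.0\n1,2.0\n"
derived = b"mean\n1.5\n"
entry = provenance_entry(derived, {"raw.csv": raw}, step="mean(value)")
print(json.dumps(entry, indent=2))
print(verify(derived, entry["output_sha256"]))         # True: unchanged
print(verify(derived + b" ", entry["output_sha256"]))  # False: one byte added
```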

Access and re-use
- Ethics and rights control access
  - weak in expressing this long-term
- Collaboration tools
  - annotation, discussion, review (see DART…)
- Re-use leading to change and development
- “Publication”
  - not just in “print”
  - underlying data should be “published”, too

Data citation issues…
- Citation for human readers and machine use cases
- Granularity: database, record, item
- Citation of changing objects
  - Version change (e.g. W3C practice: no version = latest, vs bibliographic practice: no version = first)
  - An efficient way to reference and access “archived” past states of a more rapidly changing dataset, e.g. genomics…
  - Datasets that result from the combined work of curators, or that contain opinions or facts likely to change (work in progress, Buneman et al.)
- Standards conflicting and immature (NLM best?)
- Citation ESSENTIAL for motivating quality academic work on data management and curation
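
The version-change problem is easy to state in code: a citation that omits a version resolves to different content depending on which convention the reader assumes. The sketch below is a hypothetical `VersionedDataset` that keeps every published state and resolves unversioned citations under either convention, at database or record granularity. It stores full copies of each version, which is simple but is not the efficient archiving approach the slide calls for; the dataset name and record values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedDataset:
    """Hypothetical dataset that keeps every published state of its records.

    versions[0] is version 1; versions[-1] is the current state.
    """
    name: str
    versions: List[Dict[str, str]] = field(default_factory=list)

    def resolve(self, version: Optional[int] = None,
                convention: str = "latest") -> Dict[str, str]:
        """Return the dataset state a citation refers to."""
        if version is not None:
            if not 1 <= version <= len(self.versions):
                raise ValueError(f"{self.name} has no version {version}")
            return self.versions[version - 1]
        # No version given: the answer depends on the citation convention.
        if convention == "latest":  # W3C-style: unversioned means current
            return self.versions[-1]
        if convention == "first":   # bibliographic-style: unversioned means first
            return self.versions[0]
        raise ValueError(f"unknown convention: {convention!r}")

    def cite_record(self, record_id: str, version: Optional[int] = None,
                    convention: str = "latest") -> str:
        """Record-level citation: finer granularity than the whole database."""
        return self.resolve(version, convention)[record_id]


# Usage with placeholder content.
ds = VersionedDataset("example-annotations", [
    {"rec1": "annotation as first published"},
    {"rec1": "annotation after curator revision"},
])
print(ds.cite_record("rec1"))                      # latest convention
print(ds.cite_record("rec1", convention="first"))  # first convention
print(ds.cite_record("rec1", version=1))           # explicit version
```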

Who does data curation?
- Individuals
- Departments or groups
- Institutions, often through libraries
- Communities
- Disciplines
- Publishers
- National services
- Other third parties…

Who are the curation players?

Repository challenges
- Data are different: you’ll need some domain knowledge
- Appraisal/selection harder
- Broader range of formats
  - Appropriate “standards” for longevity? XML-based?
- What metadata are needed?
  - Descriptive, to find the dataset
  - Context and background
  - Provenance
  - “Representation information” to connect data to information (whatever gives meaning to data)
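
Why "representation information" matters can be shown with four bytes. The same bit pattern yields entirely different values depending on how it is declared to be encoded, so a repository that keeps the bytes but loses the description has kept data without information. The two representation records below are hypothetical; they use Python `struct` format codes to stand in for a real format description.

```python
import struct

# Four bytes as they might sit on disk, with no description attached.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Two hypothetical representation-information records for the same bytes.
REP_INFO = {
    "as_float": {"struct_format": ">f", "meaning": "big-endian IEEE 754 float32"},
    "as_int": {"struct_format": ">i", "meaning": "big-endian signed 32-bit integer"},
}


def interpret(data: bytes, rep_info: dict):
    """Turn bits into a value using the supplied representation information."""
    (value,) = struct.unpack(rep_info["struct_format"], data)
    return value


for label, info in REP_INFO.items():
    print(label, "(" + info["meaning"] + ") ->", interpret(raw, info))
```

Read as a float the bytes give 42.0; read as an integer they give 1109917696. Nothing in the bytes themselves says which is correct.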

Data from MIT DSpace Political Science

Repository challenges - 2
- May distort your repository: size, number of objects, rate of deposit, nature of use
- Databases may be dynamic
- Databases may need to be accessed in situ
- Rights and ethical limitations hard to describe and enforce
- Need to build links to publications (cf. StORe)
- Need to build discipline links across repositories…

Cultural change
- If we build it, will they come? NO!!
- Outreach important: communication with scientists and researchers is hard graft
- Cultural change to a new approach requires more:
  - incentives, rewards and mandates
  - successful exemplars (well publicised)
  - a discipline-oriented approach (one size does not fit all)

Australian context?
In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!

Thank you