Research Data – Preserve, Share, Reuse, Publish, or Perish
Mark Gahegan, Director, Centre for eResearch, The University of Auckland

Presentation transcript:

Centre for eResearch, The University of Auckland
Research Data – Preserve, Share, Reuse, Publish, or Perish
Mark Gahegan, Director, Centre for eResearch
24 Symonds St

Outline
- The need
- The approach
- The progress to date

What do we need?
- Data storage services that are reliable
- And that are quick to provision
- Clarity over what we are entitled to
- Services that integrate well into research workflows and practices

What do others need?
- Open data
- Discoverable data
- Well documented or self-describing data
- Reliable data (or at least the quality is assessed)
- Data that can automatically configure itself to our needs

Turning the question around
- Not: How do I want to describe the data?
- But rather: What does a future data consumer need to know?
- So, as well as asking: How do we share our data?
- …we should also be asking: What kinds of data descriptions are demonstrably useful to facilitate data reuse?

Andrew Treloar, Australian National Data Service (ANDS)

Storage Solutions
- Backup Service: taking a snapshot of the current state, just in case…
- File Sharing Service: (like Dropbox) for collaborative work on ‘active data’
- Data Publishing Service: (like figshare) to publish & discover data, capture metadata and track impact
- Archive Service: planned, long-term data preservation for important resources
- Fast Data Transfer: for accessing remote science equipment

Data Lifecycle Services
- Creating Data Management Plans (DMPs): good practice, and increasingly required by funders
- Data Ethics Advice: privacy, encryption, access considerations, disposal
- Database Design: for ease of data management and preservation
- Data Publication Advice: including metadata, ownership and licensing

Data Management Plans (DMPs)
- They describe how (and when) you will take care of, and share, your data
- They show funding agencies that they can trust you to turn their money into data
- They help IT support groups understand your needs
- They allow the institution to know what we have, and when we can delete it

figshare

Discover data

Manage data

Publish data

Long term Research Data Challenges
1. Storing unprecedented volumes of data (and accelerating)
   - Data production passed storage capacity in 2007
   - Cost differential is increasing, rate of data production is increasing
2. Describing what we have in ways that are helpful to future users (and our future selves)
   - Metadata and semantics for describing content (this tends to be producer-focused)
   - But also use-case metadata and emergent relationships (this tends to be consumer-focused)
3. Finding what we need, in the context of our current task
   - Semantically-enabled search engines that can use the above descriptions (ideally from within analytical tools and workflows)
4. Working out what we do not need to keep
   - Because it will not be used again or offers no ‘information gain’
   - Because it is easier to recreate than to store
5. Governing data collections well, within their communities of use
   - Effective governance of data resources
   - Quality control strategies, including peer review and rewarding excellence

Manifesto for Open Science
1. Remove restrictions on data re-use
2. Data and metadata should be persistent and linked
3. Build or learn strong descriptions of data to aid automated discovery and human comprehension
4. Expose the provenance and uncertainty where possible
5. Develop indicators of data quality
6. Encourage and support secondary use of data
   - Provide a contextual measure of Fitness For Purpose (FFP), to connect researchers with useful resources

Questions?

1. Online submission of data for publication with basic metadata *
2. Editor verifies that the data is within the scope of the collection
3. Automated tools check data for obvious omissions and errors
4. Online tools ingest and integrate data & generate tables of statistics **
5. Potential errors and omissions reported to data author and/or editors
6. Data author acts on this feedback
7. Automated checks verify that data set is complete and standardised ***
8. Data editor confirms that resubmitted data and metadata are correct
9. Independent peer review of data
10. Author responds to referees’ comments
11. Editor makes a publishing decision based on quality standard achieved by data set (including reject, and revise and resubmit) ****
12. Data and metadata are published online. The data has its own identity that tracks its use, or is integrated into the authoritative subject databases *****
13. Papers are published that consumed the data and any errors found have been corrected

Source: Biodiversity data should be published, cited, and peer reviewed. Mark J. Costello, William K. Michener, Mark Gahegan, Zhi-Qiang Zhang, Philip E. Bourne
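
The automated checks in steps 3 and 7 of the workflow above could look something like the following minimal sketch. The slide does not say which fields are checked, so the field names in `REQUIRED_FIELDS` and the `find_omissions` helper are hypothetical and for illustration only.

```python
# Hypothetical required metadata fields; the slide does not specify any.
REQUIRED_FIELDS = ("title", "author", "licence", "description")


def find_omissions(metadata: dict) -> list:
    """Return the required fields that are missing, None, or blank."""
    omissions = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            omissions.append(field)
    return omissions


# Illustrative submission with one blank field and one missing field.
submission = {"title": "Example survey", "author": "A. Researcher", "licence": ""}
omissions = find_omissions(submission)
print(omissions)
```

A list of omissions like this is the kind of feedback that step 5 would report back to the data author or editors.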

Storage devices: Cost vs Value vs Use
(device labels were not preserved in the transcript)

Cost per GB     2PB =       IOPS
$0.60/GB        $1.2m       80K-100K
$0.026/GB       $52K        0
$0.05/GB        $100K       200
>$1/GB          $2m         250K-2m
$0.12/GB/pa     $240K/pa    0

Tim Chaffe, UoA Chief Architect: Valid as at 8/4/2015, with the proviso that this market is changing monthly

Near Future
- 3D 2.5” SSD form factor, 10TB, 20TB by 2017, $0.05/GB
- Also 3D mSSD form factors aiming for 2TB (laptop dimensions)
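
The 2PB figures on this slide appear to follow directly from the per-GB prices. The sketch below reproduces that arithmetic, assuming decimal units (1 PB = 1,000,000 GB), which is an assumption that happens to match the slide's totals; the function name is illustrative.

```python
# Assumption: 1 PB = 1,000,000 GB (decimal units), which matches the slide's totals.
GB_PER_PB = 1_000_000


def cost_for_capacity(price_per_gb: float, capacity_pb: float = 2.0) -> float:
    """Return the total cost in dollars for the given capacity in PB."""
    return price_per_gb * capacity_pb * GB_PER_PB


# Per-GB prices as listed on the slide (device names were not preserved).
# The last entry is a per-annum price, so its total is also per annum.
prices_per_gb = [0.60, 0.026, 0.05, 0.12]

totals = [round(cost_for_capacity(p)) for p in prices_per_gb]
for price, total in zip(prices_per_gb, totals):
    print(f"${price}/GB -> ${total:,} for 2 PB")
```

The ">$1/GB" entry is a lower bound, so its 2PB figure of $2m is likewise a minimum rather than an exact total.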