An Introduction to the Merritt Curation Repository University of California Curation Center Team California Digital Library June 9, 2011 UC3 Summer Webinar.

Presentation transcript:

An Introduction to the Merritt Curation Repository University of California Curation Center Team California Digital Library June 9, 2011 UC3 Summer Webinar Series

First, a word about the webinar series… A forum for timely topics of interest to the UC community – Highlighting projects, services, and developments in the areas of digital preservation, web archiving, and data curation – Intended to raise awareness of issues, and provide information on useful resources and services available to the UC community – 2nd and 4th Thursday of the month, and as scheduled, featuring UC3 staff and UC librarians, content managers, and technologists Teleconference: +1 (866) , access code # Webconference: http://bit.ly/jdjMAP

First, a word about the webinar series… Some logistics… – Participant phones will be muted during the formal presentation, but we will be monitoring the online chat – Slides, Q & A, and web and voice recordings will be posted after each presentation – Schedule available at – Please suggest additional topics! – Take the short survey

Now on with the show… Today’s topic is an introduction to the Merritt curation repository – Who is it for? – What can it do? – Why use it? – What does it cost? – Next steps? – Q & A

What keeps you up at night? Are there standards or best practices I should be aware of? How much will it cost? How can I transfer my content to an appropriate curation environment? How do I know my content is safe? What’s the best strategy to ensure permanent availability? Do I need to create new derivatives just for preservation purposes? How can I get a persistent reference to my content? What if my content needs to evolve over time? Can I control who can see my content? I have a good discovery platform; how can I add preservation services?

“There’s an app for that” Are there standards or best practices I should be aware of? How much will it cost? How can I transfer my content to an appropriate curation environment? How do I know my content is safe? What’s the best strategy to ensure permanent availability? Do I need to create new derivatives just for preservation purposes? How can I get a persistent reference to my content? What if my content needs to evolve over time? Can I control who can see my content? I have a good discovery platform; how can I add preservation services? – Automatic replication and high-availability redundancy – Periodic fixity audit – Simple submission UI/API; METS “feeder” duplicates existing DPR workflow – Model free: no packaging, format, or metadata requirements – Strongly versioned – Integration with EZID and DataCite – Curator-defined access control rules – Modular micro-services “toolkit” – UC3 consultation – Storage at $1.04/GB/year

Merritt repository Merritt is available for use by all members of the UC community – Libraries/archives/museums – ORU/MRUs – Faculty/staff Centrally hosted by UC3/CDL on behalf of the UC community – Economies of scale – Shared experience and expertise Mediated through campus libraries

Modes of use: dark archive Pro-active preservation, but no expectation of direct end user access – Legacy DPR content contributed by campus libraries – Cultural heritage texts, master images, sound, moving image, data sets – All DPR content will be automatically migrated to Merritt

Modes of use: bright archive Provide preservation and end user access – NIH Healthy Pathways project on bio-demographics Multi-institutional: UC Davis, University of Colorado, University of Virginia, Syddansk University (Denmark) Need to restrict access to project partners initially, with eventual public access

Modes of use: bright archive Content discovery: search

Modes of use: bright archive Content discovery: browse

Modes of use: preservation “back end” Preservation only; content discovery/delivery provided by well-known external systems – Using direct hooks into Merritt to retrieve content – eScholarship Open access publishing – Open Context Archaeological data publishing – Investigating integration with Islandora/Drupal and Alfresco

Modes of use: distributed data grids DataONE “Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it”

More information Online help FAQ User’s guide UC3 contact

Merritt cost model UC3 provides technical infrastructure, data center hosting, staff, monitoring, maintenance, enhancements, help, outreach, consultation, etc. – Contributors are charged only for storage used, at the UC3 recovery rate of $1.04/GB/year – Developing an “endowment” model: Pay once, preserve forever – Will soon extend model for non-UC contributors How does this compare? – Cost of a physical book in RLF †: $4.62/year – Cost of a digital book in HathiTrust ‡: $0.15/year – Cost of a digital book in Merritt: $0.06/year † Gary Lawrence (2007), Internal analysis, CDL; ‡ Paul Courant and Matthew Nielsen (2010), On the cost of keeping a book, HathiTrust.
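The storage charge described above is a flat per-gigabyte rate, so an annual estimate is a single multiplication. The following is a minimal sketch of that arithmetic, not the UC3 cost calculator itself; the function name `annual_storage_cost` and the rounding to whole cents are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# UC3 recovery rate quoted on the slide: $1.04 per GB per year.
RATE_PER_GB_YEAR = Decimal("1.04")


def annual_storage_cost(size_gb):
    """Return the yearly charge in dollars for size_gb gigabytes.

    Hypothetical helper: rounding to cents is an assumption,
    not a documented UC3 billing rule.
    """
    cost = Decimal(str(size_gb)) * RATE_PER_GB_YEAR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Print the estimate for a few collection sizes.
    for size_gb in (1, 100, 1000):
        print(f"{size_gb:>5} GB -> ${annual_storage_cost(size_gb)}/year")
```

Decimal arithmetic is used here only to avoid binary floating-point rounding in currency values; a plain float multiplication would serve for rough estimates.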

Average collection sizes and costs Collection / Objects / Size / Annual cost: CA DOE reports 8, GB $; Cal Cultures GB $; eScholarship 46, GB $ A “cost calculator” spreadsheet is available at

Average ETD size and cost Campus / ETD titles / Size / Annual cost: Berkeley GB $; Davis GB $; Irvine GB $ 6.30; Los Angeles GB $; Riverside GB $ 3.10; San Diego GB $ 9.02; San Francisco * GB $ 9.05; Santa Barbara GB $ 5.25; Santa Cruz GB $ 2.50 Based on 2009 holdings in ProQuest * UCSF based on total ETD holdings in Merritt

Average research data size and cost Almost 50% of all research data is less than 1 GB Source: Science 331:6018 (February 11, 2011): Size / Percentage / Annual cost: < 1 GB, 48.3 %, < $ ; – 100 GB, 32.0 %, $ 1.04 – ; GB – 1 TB, 12.1 %, $ – 1, ; > 1 TB, 7.6 %, > $ 1,040.00

Next steps UC3 is working with campus partners to determine ongoing development and collection priorities Annotation Notification Transformation Characterization Fixity / Linked data Replication IdM/Authn/Authz Ingest, Access Inventory, Queuing Storage and Identity Technology watch Metadata standards Policy and business model Data management guidelines Object and collection modeling New content acquisition

Next steps In production – Model-free objects – Submission via UI and API – Persistent identifiers – Format identification – Version provenance – Automated replication – Automated fixity audit – Role-based access control – Collections – Semantic index and search – Object/version/file download In progress – Simplified update – Enhanced characterization (JHOVE2) – Faceted search and browse (XTF) – CMS/DAMS-like function (Islandora) In planning – Simplified batch – UCTrust integration – Linked data – Transformation – Notification – Annotation – Support for NGTS/DLSTF recommendations We welcome your feedback on needs and priorities!

Simplified update Variant form of object update requiring the submission of only the changed components Client-side tools to simplify the creation of batch manifests #%checkm_0.7 #%profile | #%prefix | mrt: | #%prefix | nfo: | #%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash | m | md5 | #%eof
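A client-side tool of the kind mentioned above might compare the digests recorded for the previous version of an object against the current files, then write a manifest that lists only the components that are new or changed. The sketch below illustrates that idea under stated assumptions: the helpers `md5_hex`, `changed_components`, and `build_manifest` are hypothetical, the `example.org` URLs are placeholders, and the `#%profile` and `#%prefix` header lines are left out because their values do not survive in the slide. It is not the Merritt client, and it does not address deleted components.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def changed_components(previous, current):
    """Find components that are new or whose content has changed.

    previous: dict mapping file URL -> MD5 hex digest of the last version
    current:  dict mapping file URL -> current file content (bytes)
    Returns a dict mapping file URL -> new MD5 hex digest.
    """
    changed = {}
    for url, data in current.items():
        digest = md5_hex(data)
        if previous.get(url) != digest:
            changed[url] = digest
    return changed


def build_manifest(rows):
    """Render rows (URL -> digest) using the three fields shown on the slide."""
    lines = [
        "#%checkm_0.7",
        "#%fields | nfo:fileUrl | nfo:hashAlgorithm | nfo:hash",
    ]
    for url in sorted(rows):
        lines.append(f"{url} | md5 | {rows[url]}")
    lines.append("#%eof")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URLs and content, for illustration only.
    previous = {"http://example.org/obj/a.txt": md5_hex(b"alpha")}
    current = {
        "http://example.org/obj/a.txt": b"alpha",  # unchanged, so omitted
        "http://example.org/obj/b.txt": b"beta",   # new, so listed
    }
    print(build_manifest(changed_components(previous, current)))
```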

Enhanced characterization JHOVE2 next-generation framework for format-aware characterization – Automated extraction and inference of extensive technical metadata significant for preservation analysis and planning "Module": { "scope": "ICCModule", "Header": { "scope": "ICCHeader", "ProfileSize": { "unit": "byte", "value": }, "ProfileVersionNumber": " ", "ProfileDeviceClass_raw": "spac", "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile", "ColourSpace_raw": "RGB ", "ColourSpace_descriptive": "rgbData", "ProfileConnectionSpace_raw": "Lab ", "ProfileConnectionSpace_descriptive": "labData"
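Nested characterization output like the fragment above is often easier to query once it is flattened into dotted property paths. The sketch below shows one way to do that; the `flatten` helper is hypothetical, and the sample dictionary is hand-built from the keys visible on the slide, leaving out the two properties whose values are missing there. It does not call JHOVE2 and makes no claim about the complete output format.

```python
def flatten(node, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hand-built sample mirroring the slide fragment (assumption: not real tool output).
sample = {
    "Module": {
        "scope": "ICCModule",
        "Header": {
            "scope": "ICCHeader",
            "ProfileDeviceClass_raw": "spac",
            "ProfileDeviceClass_descriptive": "ColorSpace Conversion profile",
            "ColourSpace_raw": "RGB ",
            "ColourSpace_descriptive": "rgbData",
            "ProfileConnectionSpace_raw": "Lab ",
            "ProfileConnectionSpace_descriptive": "labData",
        },
    }
}

if __name__ == "__main__":
    # List every property with its full path.
    for path, value in flatten(sample).items():
        print(f"{path} = {value!r}")
```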

Enhanced discovery via XTF eXtensible Text Framework – CDL developed/supported open source discovery platform – Robust, scalable faceted search and browse

CMS/DAMS-like function Many campuses are looking for CMS/DAMS solutions Investigating integration with Islandora to provide a Drupal CMS/DAMS front-end to Merritt

Questions?

Upcoming webinars (Date/time: Topic) – Wednesday, June 15, 12:30 pm: Data Sharing by Scientists: Practices and Perceptions (Carol Tenopir, Univ. Tennessee; Mike Frame, USGS) – Thursday, June 30, 2:00 pm: The Data Management Planning Tool (DMP Tool) (Trisha Cruse, UC3) – Thursday, July 14, 2:00 pm: Data as Publication (John Kunze, UC3; Catherine Mitchell, CDL Publishing Program) – Thursday, July 28, 2:00 pm: Merritt: Depositing Content and Providing Access – Thursday, August 11, 2:00 pm: DCXL (Data Curation Excel) Please take the webinar survey

For more information UC Curation Center: Stephen Abrams, Margaret Low, Lisa Colvin, David Loy, Patricia Cruse, Mark Reyes, Scott Fisher, Tracy Seneca, Erik Hetzner, Joan Starr, Greg Janée, Marisa Strong, John Kunze, Perry Willett UC3 webinar series Merritt repository