Long-term preservation aspects in the eSciDoc project Natasa Bulatovic Max-Planck Digital Library

Slides:



Advertisements
Similar presentations
Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010.
Advertisements

DuraSpace: Digital Information All Ways, Always Pretoria, South Africa May 14 th, 2009.
An Introduction to Repositories Thornton Staples Director of Community Strategy and Alliances Director of the Fedora Project.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Persistent identifiers – an Overview Juha Hakala The National Library of Finland
CNRIS CNRIS 2.0 Challenges for a new generation of Research Information Systems.
Fedora 3.0 and METS: A Partnership for the Organization, Presentation and Preservation of Digital Objects Open Repositories Georgia Tech, Atlanta,
ISO & OAI-PMH By Neal Harmeyer, Amy Hatfield, and Brandon Beatty PURDUE UNIVERSITY RESEARCH REPOSITORY.
The current state of Metadata - as far as we understand it - Peter Wittenburg The Language Archive - Max Planck Institute CLARIN Research Infrastructure.
MIT’s DSpace A good fit for ETDs Margret Branschofsky Keith Glavash MIT LIBRARIES.
R.Jantz, August 31, Two-day forum on PREMIS Preservation Metadata and the Trusted Digital Repositories August 31, September 1 National Library of.
Using eSciDoc Turn-key access?. Lodewijk Bogaards Software Architect and project leader Easy On Fedora DORSDL2 Generic a.relating to or characteristic.
Versioning of Digital Objects in a Fedora-based Repository Matthias Razum FIZ Karlsruhe DORSDL Workshop Alicante September 21, 2006.
Kristin Eberle Monica Hampton Carmen Velasquez Kristin Eberle Monica Hampton Carmen Velasquez Knowledge Management.
THE RUTGERS WORKFLOW MANAGEMENT SYSTEM Mary Beth Weber Cataloging and Metadata Services Rutgers University Libraries August 3, 2007.
Digital Asset Management for All? Visualising a Flexible DAMS Solution for Small and Medium Scale Institutions Paul Bevan Llyfrgell Genedlaethol Cymru.
ETD Repositories Using DSpace Software Andrew Penman The Robert Gordon University 27 th September 2004.
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
This work is licensed under a Creative Commons Attribution 2.0 Germany License eSciDoc, VIRR and Digitization.
Chinese-European Workshop on Digital Preservation, Beijing July 14 – Network of Expertise in Digital Preservation 1 Trusted Digital Repositories,
SobekCM’s Community Ecosystems & Socio-Technical Practices Presented by Mark V. Sullivan June 10 th, 2014 Sobek image created by Jeff Dahl and is shared.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Using IESR Ann Apps MIMAS, The University of Manchester, UK.
DSpace. TM 2 Agenda  Introduction to DSpace  DSpace community  Institutional Repository  Easy to add/find content in DSpace  Building Online Communities.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Preserving Digital Collections for Future Scholarship Oya Y. Rieger Cornell University
One Body, Many Heads for Repository-Powered Digital Content Applications Hydra Europe Symposium, Trinity College, Dublin, 7 th April 2014 Chris Awre Head.
Metadata: Essential Standards for Management of Digital Libraries ALI Digital Library Workshop Linda Cantara, Metadata Librarian Indiana University, Bloomington.
The DiVA System: Current Status and Ongoing Development Uwe Klosa Electronic Publishing Centre, Uppsala University, Sweden Eva Müller.
Lifecycle Metadata for Digital Objects (INF 389K) September 18, 2006 The Big Metadata Picture, Web Access, and the W3C Context.
Extensible Markup Language (XML) Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).ISO 8879 XML is a.
Metadata Lessons Learned Katy Ginger Digital Learning Sciences University Corporation for Atmospheric Research (UCAR)
This work is licensed under a Creative Commons Attribution 2.0 Germany License eSciDoc NIMS Malte Dreyer.
Archival Information Packages for NASA HDF-EOS Data R. Duerr, Kent Yang, Azhar Sikander.
Creating Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards Ruth Duerr, NSIDC MiQun Yang, THG Azhar Sikander,
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
Use & Access 26 March Use “Proof of Concept” Model for General Libraries & IS faculty Model for General Libraries & IS faculty Test bed for DSpace.
Recent Developments in CLARIN-NL Jan Odijk P11 LREC, Istanbul, May 23,
JISC Information Environment Service Registry (IESR) Ann Apps MIMAS, The University of Manchester, UK.
Integrating Access to Digital Content Sarah Shreeves University of Illinois at Urbana-Champaign Visual Resources Association 23 rd Annual Conference Miami.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
DSpace - Digital Library Software
Interoperability How to Build a Digital Library Ian H. Witten and David Bainbridge.
Implementing PREMIS in DigiTool Michael Kaplan ALA 2007 Update.
Institutional Repositories July 2007 DIGITAL CURATION creating, managing and preserving digital objects Dr D Peters DISA Digital Innovation South.
FACES General Overview ViRR (Virtueller Raum Reichsrecht) Software Solutions Kristina Büchner and Bastien Saquet Contact:Kristina Buechner:
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Open Science and Research – Services for Research Data Management © 2014 OKM ATT 2014–2017 initiative Licenced under.
Developing a Dark Archive for OJS Journals Yu-Hung Lin, Metadata Librarian for Continuing Resources, Scholarship and Data Rutgers University 1 10/7/2015.
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
21st October 2008 eSciDoc – A Service Infrastructure for Cultural Heritage Content VSMM 2008 – Digital Archives Online Natasa Bulatovic, Ulla Tschida,
About FACES Collection of images of adult emotional facial stimuli (171 women and men) 6 emotions: neutrality, sadness, disgust, fear, anger and happiness.
An Overview of Data-PASS Shared Catalog
VI-SEEM Data Repository
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
VI-SEEM Data Repository
eSciDoc –Object and content modelling experiences
eSciDoc Abstractions, Functionality and Services
eSciDoc – Service infrastructure and solutions development
eSciDoc – Content model requirements
ESciDoc Overview Malte Dreyer.
ESciDoc Introduction M. Dreyer.
ESciDoc Introduction M. Dreyer.
Malte Dreyer – Matthias Razum
Institutional Repositories
Presentation transcript:

Long-term preservation aspects in the eSciDoc project Natasa Bulatovic Max-Planck Digital Library

eSciDoc: Background information  Joint Project between Max Planck Society and FIZ Karlsruhe  Funded by German Federal Ministry for Education and Research (BMBF) until Mid 2009  Additional substantial own efforts by both partners  Both Partners committed themselves to ongoing efforts until 2011

eSciDoc project landscape

Diversity of MPG supported by a service infrastructure ●People ●Librarians, researchers, developers, general public ●Data ●Publications, research data, supplementary material etc. ●Processes ●Various institutes apply various workflows supporting their data management

eSciDoc: what it addresses and provides?  Organizational aspects Business processes: introspection into the “research information and research data life-cycle” Modelling of the “research workflows” (usage, scenarios, use cases, process worfklows) “SOA” is not only technical infrastructure - it is driven by organization-wide processes and requirements  Technical aspects service infrastructure instead of silos applications solutions to visualize, publish, relate and manage data easy composition of services and data mashups and repurposing Both organizational and technical activities facilitate the growth of the organization and the dissemination, accessibility and easy reuse of the research results across disciplines

eSciDoc.PubMan: Management of publications

eSciDoc.Virr: Management of digitized textual resources

eSciDoc.FACES: Management of image collections

Challenges And facts for eSciDoc project related to the digital preservation and long-term archiving

Which data are managed? Support for variety of data (publications, old manuscripts, microscopic images, patents, datasets)

Abstraction: Data structure Abstraction in data modelling and implemented data services (items, containers).xml Specialization through content models gives contextual information on what data it is (publication, scanned book page, lexical resource, collection of “digital things”) Atomistic model

Atomistic model - items Seite 11

Atomistic model - containers Seite 12

Communities and workflows Core workflow Support for variety of workflows for different user-communities Community specific workflow

Persistent identification Each resource and resource component (metadata, files) must be persistently identified Users may decide what to identify (resource, resource version) Users may configure when to assign the persistent identifier on a repository level Keep all persistent identifiers of the resource (if such existed before the resource has been created in the system) Different communities decide what to identify and when in a different manner and at a different state of the completeness of their data

Persistent identification system  But which PID system to choose out of many ?  Handle System (CNRI - Corporation for National Research Initiatives)  Digital Object Identifiers - DOI (IDF – International DOI Foundation)  Archival Resource Key –ARK (California Digital Library)  the Uniform Resource Name - URN (IANA)  National Bibliography Numbers - NBNs  the persistent URL – PURL (OCLC, Online Computer Library Center)  the Open URL  Most important is the organizational commitment for PID and PID infrastructure, PIDs are rather organizational then technical effort  EPIC – European Persistent Identifier Consortium (GWDG Germany, SARA Netherlands, CSC Finland, )

Resource metadata  Standards and Frameworks – have to be used so data are known  Dublin Core, MODS, other  Where no standards exist, describe the profile in standard format  Dublin Core Singapore Framework  Services, metadata quality and metadata extraction  Metadata Validation service  CoNE – Control of Named Entities service  JHove – technical metadata extraction service  Supports and important aspect: quality of data, clear identification of resources, format migration  Interoperability – data redundancy  Ensures resources are disseminated via RSS, OAI-PMH Support for variety of data (publications, old manuscripts, microscopic images, patents, datasets) and their metadata

Metadata standards and frameworks “The Singapore Framework for Dublin Core Application Profiles is a framework for designing metadata applications for maximum interoperability and for documenting such applications for maximum reusability”

Metadata profiles ●DCAP based (Dublin Core Application Profile) ●DC terms (identified URIs) ●eSciDoc solution specific terms (identified by URIs) ●METS/MODS ●Publicly available ●Functional description ●Application profiles schemas (xsd) ●Vocabulary encoding schemas ●Interoperability levels ●Shared term definitions (done) ●Semantic interoperability (done) ●Description set syntactic interoperability (started) ●Description set profile interoperability (started)

Services CoNE – Control of named entities Journals, Persons, Dewey Decimal Classification (3 public levels), Creative Commons Licenses (CC licenses), ISO Languages, MIME Types, PACS classification, Custom classifications “Vocabulary Encoding schemas for resources ” Validation service Semantic metadata validation based on Shibboleth If the publication is of type “Journal Article” then the “Journal” information has to be present Rules are preserved in XML – and are not encoded in the business logic code Jhove service Technical metadata are extracted prior to binary content upload Technical metadata are kept together with the binary content Example

Data provenance  Support for event logs and version history (what happened with the content?)  PREMIS v1 created for each resource (events info) Example:  Upcoming: upgrade to PREMIS v2

Software, standards, documentation  Software all used software is an open source software eSciDoc software available under CDDL license (open source)  Documentation Accessible (still a lot of work to do) Specifications and some design are available via MPDL Colab e.g.  Standards and formats XML, XSD used for interface and service operation messages Use of metadata standards PREMIS Open formats wherever possible e.g. metadata, lexical resources TEI-XML etc. Containment: Fedora XML objects for content resources OAIS reference model – compatible OAIS Bitstream preservation

OAIS Archive: What we will put for LTA? XML XML/RDF XSDs Open source code Binary content with technical metadata (XML) and calculated MD5 checksums

Lessons learned  Have we covered all aspects?  They must be considered from the very start in infrastructural design and implementation  Things are getting more complicated in dynamic environment (data are changed often)  Selection process  What – all or only selected research data?  When – whenever data is created or after the finalization of curration activities?  Why - “institutional memory”, future research?  Data redundancy  As many copies as possible  Share and replicate as much as possible - redundancy does not hurt (it costs )  Organizational commitment  Sustainable  Long-term  Technical  Collaboration and cooperation with partners necessary (its not a single unit effort)

Thank you!   

Resources  eSciDoc project web pages, Infrastructure download   MPDL collaboratory network   MPDL software download     