www.kb.se Depositing e-material to The National Library of Sweden.

Presentation transcript:

Depositing e-material to The National Library of Sweden

KB - Overview
1661 – First legal deposit law
1877 – Becomes a government institution
1996 – First steps in digitization
1997 – Kulturarw3 - the first collection of the Swedish web
20?? – Deposit law expanded to include electronically published documents

KB – Aim of repository
Be able to receive different kinds of data in different kinds of formats
Be able to handle large amounts of incoming data (scalability)
Have a flexible and modular design
Be able to utilize services that can receive data from organizations with different technical capabilities
A system for long term preservation and presentation

Overview - Architecture

Reality – Types of material
Will receive widely different kinds of materials
– Different:
    file formats
    metadata formats
    structure of data
    naming schemas
From a lot of different sources
– Local file system, FTP, Database, URL on the web
– Should still try to use the same services
Solution:
– Normalize received material to an internal format
– Represent data + metadata as DIDL XML

Overview – Deposit system

Fundamentals of deposit system
Modular design
One internal format for representing packages
Keep the interfaces between services as simple as possible
– REST services (HTTP + XML)
– Message queue into which packages are dropped for the system
– This makes the system independent of platform and programming framework
Each module should be highly configurable with smaller sub-components
– Build services as chains of simple components, each concerned with just one task
– Use Spring Framework for configuration

Internal package format
Uses Digital Item Declaration Language (DIDL)
– An MPEG-21 standard
– An XML format for both data and metadata
Do not inline data, just metadata
Store datastreams centrally and reference them
1 DIDL file = 1 "object"
One package has:
– ID
– Type
– List of Attributes (name/value pairs)
– List of Metadata (as XML)
– List of Resources (as references)

Internal package format
Represent a package as a DIDL file
– Parser to read a DIDL file into a Java object
– Serializer to write a Java object to a DIDL file
Services usually work with the package as a Java object, BUT:
– Only plain XML is sent between services
– This decouples services from the programming language; anything that can handle XML is fine
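The parser/serializer pair could look roughly like the sketch below, which uses the JDK's built-in DOM and transformer APIs. It only round-trips a package ID and a list of resource references. The class name `DidlRoundTrip`, the choice to carry the package ID in the `id` attribute of `didl:Item`, and the example URL are assumptions made for illustration; the slides do not show how KB maps its package fields onto DIDL elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: serialize a package to DIDL XML and parse it back.
class DidlRoundTrip {
    // Namespace of the MPEG-21 Digital Item Declaration Language.
    static final String DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS";

    public static void main(String[] args) {
        List<String> refs = new ArrayList<>();
        refs.add("http://resource-store.example/page-1.tif"); // placeholder URL
        String xml = serialize("example-001", "image/tiff", refs);
        System.out.println(xml);
        System.out.println(parseReferences(xml));
    }

    private static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    // Serializer: Java values -> DIDL XML. Data is referenced, never inlined.
    static String serialize(String packageId, String mimeType, List<String> refs) {
        try {
            Document doc = newBuilder().newDocument();
            Element didl = doc.createElementNS(DIDL_NS, "didl:DIDL");
            didl.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:didl", DIDL_NS);
            doc.appendChild(didl);
            Element item = doc.createElementNS(DIDL_NS, "didl:Item");
            item.setAttribute("id", packageId); // assumed mapping of the package ID
            didl.appendChild(item);
            for (String ref : refs) {
                Element component = doc.createElementNS(DIDL_NS, "didl:Component");
                Element resource = doc.createElementNS(DIDL_NS, "didl:Resource");
                resource.setAttribute("mimeType", mimeType);
                resource.setAttribute("ref", ref);
                component.appendChild(resource);
                item.appendChild(component);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize package", e);
        }
    }

    // Parser: DIDL XML -> list of datastream references, in document order.
    static List<String> parseReferences(String xml) {
        try {
            Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
            NodeList resources = doc.getElementsByTagNameNS(DIDL_NS, "Resource");
            List<String> refs = new ArrayList<>();
            for (int i = 0; i < resources.getLength(); i++) {
                refs.add(((Element) resources.item(i)).getAttribute("ref"));
            }
            return refs;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse DIDL", e);
        }
    }
}
```

Because only the XML string crosses a service boundary, a service written in another language would need nothing more than an XML parser to consume it.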

Internal package format - Attributes
Attributes
– Name/value pairs (Example: page-number = 5)
– Flexible way of representing additional information about a package

Internal package format - Metadata
Metadata
– Name
– Description (optional)
– XML that represents the metadata

Internal package format - Resource
Resource
– ID
– Mimetype
– List of Attributes (for this Resource only)
– List of Metadata (for this Resource only)
– Reference to the datastream (a URL)
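Taken together, the package, attribute, metadata and resource slides describe a small object model. A minimal sketch of it as plain Java classes follows; all class and field names (`DepositPackage`, `MetadataRecord`, `PackageResource`) and the sample values are hypothetical, chosen only to mirror the bullet points.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: builds a package holding one scanned page.
class PackageModelDemo {
    public static void main(String[] args) {
        DepositPackage pkg = samplePackage();
        System.out.println(pkg.id + " has " + pkg.resources.size() + " resource(s)");
    }

    static DepositPackage samplePackage() {
        DepositPackage pkg = new DepositPackage("example-001", "PagedObject");
        pkg.metadata.add(new MetadataRecord("DC", null,
                "<dc:title xmlns:dc=\"http://purl.org/dc/elements/1.1/\">Example</dc:title>"));
        PackageResource page = new PackageResource(
                "page-1", "image/tiff", "http://resource-store.example/page-1.tif");
        page.attributes.put("page-number", "5"); // the attribute example from the slide
        pkg.resources.add(page);
        return pkg;
    }
}

// One package corresponds to one "object" and one DIDL file.
class DepositPackage {
    final String id;
    final String type;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<MetadataRecord> metadata = new ArrayList<>();
    final List<PackageResource> resources = new ArrayList<>();

    DepositPackage(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

// Metadata is carried inline as XML.
class MetadataRecord {
    final String name;
    final String description; // optional, may be null
    final String xml;

    MetadataRecord(String name, String description, String xml) {
        this.name = name;
        this.description = description;
        this.xml = xml;
    }
}

// A resource only references its datastream; the bytes live in the central store.
class PackageResource {
    final String id;
    final String mimeType;
    final String reference; // URL of the datastream
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<MetadataRecord> metadata = new ArrayList<>();

    PackageResource(String id, String mimeType, String reference) {
        this.id = id;
        this.mimeType = mimeType;
        this.reference = reference;
    }
}
```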

Package normalizer

Package normalizer
Takes data in one format and creates an internal package
– Creates the DIDL file and writes the datastreams to the Resource Store
Places the package on a queue for further processing
One normalizer per type of data package delivered
– Has to know the contract for the delivered data
Looks in an inbox at regular intervals for new packages
– File system directory
    Data could be delivered via FTP or file copy on local file system
– URL
    OAI-PMH server with metadata that has links to actual resources
    OAI-ORE fits in nicely here
– Database
– Web form operated by human
– Anything else?
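One pass of a normalizer over its inbox might be sketched as follows. To stay self-contained, the inbox and the Resource Store are modelled as in-memory maps and the queue as a `BlockingQueue`; in the system described above these would be a directory (or URL, database, web form), a central store, and a message queue. `PdfNormalizer`, `QueuedPackage` and the `store://` reference scheme are invented for this example.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo of one polling pass.
class NormalizerDemo {
    public static void main(String[] args) {
        Map<String, byte[]> inbox = new HashMap<>();
        inbox.put("thesis-42.pdf", new byte[] {'%', 'P', 'D', 'F'});
        Map<String, byte[]> resourceStore = new HashMap<>();
        BlockingQueue<QueuedPackage> queue = new LinkedBlockingQueue<>();

        int queued = new PdfNormalizer(resourceStore, queue).pollInbox(inbox);
        System.out.println(queued + " package(s) queued, " + inbox.size() + " left in inbox");
    }
}

// Minimal stand-in for the internal package placed on the queue.
class QueuedPackage {
    final String id;
    final String type;
    final String resourceRef;

    QueuedPackage(String id, String type, String resourceRef) {
        this.id = id;
        this.type = type;
        this.resourceRef = resourceRef;
    }
}

// One normalizer per delivery contract; this one only understands "*.pdf" deliveries.
class PdfNormalizer {
    private final Map<String, byte[]> resourceStore;
    private final BlockingQueue<QueuedPackage> queue;

    PdfNormalizer(Map<String, byte[]> resourceStore, BlockingQueue<QueuedPackage> queue) {
        this.resourceStore = resourceStore;
        this.queue = queue;
    }

    // Processes every delivery this normalizer understands and returns how many were queued.
    int pollInbox(Map<String, byte[]> inbox) {
        int count = 0;
        Iterator<Map.Entry<String, byte[]>> deliveries = inbox.entrySet().iterator();
        while (deliveries.hasNext()) {
            Map.Entry<String, byte[]> delivered = deliveries.next();
            String name = delivered.getKey();
            if (!name.endsWith(".pdf")) {
                continue; // left for a normalizer that knows this contract
            }
            // Write the datastream to the store and keep only a reference in the package.
            String ref = "store://" + name;
            resourceStore.put(ref, delivered.getValue());
            String id = name.substring(0, name.length() - ".pdf".length());
            queue.add(new QueuedPackage(id, "PDF", ref));
            deliveries.remove(); // the delivery has been consumed
            count++;
        }
        return count;
    }
}
```

A scheduler would call `pollInbox` at regular intervals; deliveries that no normalizer recognizes simply stay in the inbox.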

Enricher

Enriching a package
REST service
– POST a DIDL file and get it back enriched
Implemented with Spring and a chain of enrichers
– Each doing one specific task, for example adding a urn:nbn
– Some only make sense for a specific kind of package
– Can be a different set of enrichers for different package types
Examples of enrichers
– Adding urn:nbn
– Updating MARCXML to reflect that it is an electronic copy
– Adding extracted technical metadata from JHOVE or DROID
– And so on...
Possible to have enrichers that involve human intervention
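The chain-of-enrichers idea can be sketched in a few lines of Java. This is an illustration under assumptions, not KB's code: the `Enricher` interface, both enricher classes, the attribute names and the `urn:nbn:example:` placeholder value are invented here, and the list that Spring would normally assemble from configuration is built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: run the chain over one package.
class EnricherChainDemo {
    public static void main(String[] args) {
        EnrichablePackage pkg = new EnrichablePackage("example-001", "PagedObject");
        EnricherChain.defaultChain().run(pkg);
        System.out.println(pkg.attributes);
    }
}

class EnrichablePackage {
    final String id;
    final String type;
    final Map<String, String> attributes = new LinkedHashMap<>();

    EnrichablePackage(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

// Each enricher does exactly one task and says which packages it makes sense for.
interface Enricher {
    boolean appliesTo(EnrichablePackage pkg);
    void enrich(EnrichablePackage pkg);
}

// Applies to every package. The identifier format is a placeholder, not a real urn:nbn scheme.
class UrnNbnEnricher implements Enricher {
    public boolean appliesTo(EnrichablePackage pkg) {
        return true;
    }

    public void enrich(EnrichablePackage pkg) {
        pkg.attributes.put("urn-nbn", "urn:nbn:example:" + pkg.id);
    }
}

// Only makes sense for one kind of package.
class DigitizedFlagEnricher implements Enricher {
    public boolean appliesTo(EnrichablePackage pkg) {
        return "PagedObject".equals(pkg.type);
    }

    public void enrich(EnrichablePackage pkg) {
        pkg.attributes.put("digitized", "true");
    }
}

class EnricherChain {
    private final List<Enricher> enrichers;

    EnricherChain(List<Enricher> enrichers) {
        this.enrichers = new ArrayList<>(enrichers);
    }

    // In a Spring setup this list would come from configuration instead.
    static EnricherChain defaultChain() {
        return new EnricherChain(Arrays.<Enricher>asList(
                new UrnNbnEnricher(), new DigitizedFlagEnricher()));
    }

    // Runs the applicable enrichers in order and returns the same, now enriched, package.
    EnrichablePackage run(EnrichablePackage pkg) {
        for (Enricher enricher : enrichers) {
            if (enricher.appliesTo(pkg)) {
                enricher.enrich(pkg);
            }
        }
        return pkg;
    }
}
```

Adding a new enrichment step then means writing one small class and adding it to the list, without touching the REST layer.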

Validator

Validating a package
Similar in design to Enricher
REST service
– POST a DIDL file and get back a status report
Implemented with Spring and a chain of tests
– Each test doing one specific task
– Some only make sense for a specific kind of package
– Can be a different set of tests for different package types
Examples of tests
– Verifying that a PDF is readable
– Validating metadata
– And so on...
Possible to have tests that involve human intervention
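A validator built the same way differs mainly in its result: instead of a modified package it returns a status report. The sketch below assumes hypothetical names throughout (`PackageTest`, `StatusReport`, `ValidatorChain`), and its PDF test only checks for the `%PDF-` file header, which is a much weaker stand-in for "verifying that a PDF is readable".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: validate one package and print the report.
class ValidatorDemo {
    public static void main(String[] args) {
        byte[] content = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        StatusReport report = ValidatorChain.defaultChain()
                .validate(new CandidatePackage("example-001", "PDF", content));
        System.out.println(report.isValid() ? "OK" : "FAILED: " + report.failures);
    }
}

class CandidatePackage {
    final String id;
    final String type;
    final byte[] content;

    CandidatePackage(String id, String type, byte[] content) {
        this.id = id;
        this.type = type;
        this.content = content;
    }
}

// Each test checks one thing; check() returns null on success or a failure message.
interface PackageTest {
    boolean appliesTo(CandidatePackage pkg);
    String check(CandidatePackage pkg);
}

class HasIdTest implements PackageTest {
    public boolean appliesTo(CandidatePackage pkg) {
        return true;
    }

    public String check(CandidatePackage pkg) {
        return (pkg.id == null || pkg.id.isEmpty()) ? "package has no ID" : null;
    }
}

// Stand-in for a real readability check: only looks at the file header.
class PdfHeaderTest implements PackageTest {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    public boolean appliesTo(CandidatePackage pkg) {
        return "PDF".equals(pkg.type);
    }

    public String check(CandidatePackage pkg) {
        if (pkg.content.length < MAGIC.length) {
            return "content too short to be a PDF";
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (pkg.content[i] != MAGIC[i]) {
                return "content does not start with %PDF-";
            }
        }
        return null;
    }
}

class StatusReport {
    final List<String> failures = new ArrayList<>();

    boolean isValid() {
        return failures.isEmpty();
    }
}

class ValidatorChain {
    private final List<PackageTest> tests;

    ValidatorChain(List<PackageTest> tests) {
        this.tests = new ArrayList<>(tests);
    }

    static ValidatorChain defaultChain() {
        return new ValidatorChain(Arrays.<PackageTest>asList(new HasIdTest(), new PdfHeaderTest()));
    }

    // Runs every applicable test and collects all failures rather than stopping at the first.
    StatusReport validate(CandidatePackage pkg) {
        StatusReport report = new StatusReport();
        for (PackageTest test : tests) {
            if (!test.appliesTo(pkg)) {
                continue;
            }
            String failure = test.check(pkg);
            if (failure != null) {
                report.failures.add(failure);
            }
        }
        return report;
    }
}
```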

Ingest

Ingest
REST service
– PUT a DIDL file and get back an id pointing into the repository
In future:
– Perhaps add possibility to update or delete package in repository using POST and DELETE
Abstraction that hides the actual repository used
– Can change repository without affecting rest of the system
– Repository-dependent enrichments and tests can be done here
We use Fedora as our repository
The same principle is used for ingestion into the long-term preservation archive
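The repository abstraction can be illustrated with a small interface. In this sketch the names `Repository`, `InMemoryRepository` and `IngestService` and the `demo:` identifier format are assumptions; a Fedora-backed implementation would sit behind the same interface, and the HTTP layer that receives the PUT is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: ingest one package into a stand-in repository.
class IngestDemo {
    public static void main(String[] args) {
        IngestService ingest = new IngestService(new InMemoryRepository());
        String id = ingest.put("<didl:DIDL xmlns:didl=\"urn:mpeg:mpeg21:2002:02-DIDL-NS\"/>");
        System.out.println("stored as " + id);
    }
}

// The only thing the rest of the system knows about the repository.
interface Repository {
    // Stores the package and returns an id pointing into the repository.
    String store(String didlXml);
}

// Stand-in used for the demo; a Fedora or preservation-archive adapter would replace it.
class InMemoryRepository implements Repository {
    final Map<String, String> objects = new HashMap<>();
    private int next = 1;

    public String store(String didlXml) {
        String id = "demo:" + next++;
        objects.put(id, didlXml);
        return id;
    }
}

// What the REST endpoint would delegate to when it receives a PUT.
class IngestService {
    private final Repository repository;

    IngestService(Repository repository) {
        this.repository = repository;
    }

    String put(String didlXml) {
        if (didlXml == null || didlXml.trim().isEmpty()) {
            throw new IllegalArgumentException("empty DIDL document");
        }
        // Repository-dependent enrichments and tests would run here before storing.
        return repository.store(didlXml);
    }
}
```

Swapping the repository then only changes which `Repository` implementation is passed to `IngestService`.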

Fedora
Fedora is used as the repository
– Reasons why:
    Open-source
    Actively developed
    Large (and growing) user base
    Good design and nice features
– We use version 2.2; obviously going to move to 3.0 in the future
Used for storage and presentation
– Stores both relevant datastreams and metadata
– Has relations between datastreams (e.g. sequence-number)
Possible to search against the repository
– By default, search is against DC fields

Fedora – Content Models
Content Model – A contract of available Datastreams and Behaviour Definitions in a Fedora record
In Fedora 2.x just an informal agreement
But from Fedora 3.0 a new mechanism exists for this
– Called Content Model Architecture (CMA)
– A Content Model could involve multiple Fedora records
    Atomistic versus Compound model
– Also specifies relations
    Both between datastreams and Fedora records
    Using RDF in the RELS-EXT datastream

Fedora - An example Content Model
PagedObject Content Model
– Used for digitized material where each page is an image
– Atomistic, i.e. one page becomes one Fedora record
– Also has one Fedora record for the object as a whole
Record for the object
– Datastreams
    DC
    MODS
    MARCXML
– Behaviour Definitions
    view
    list
    getPreview
– Relations
    member of a collection
    member of OAI-PMH set
Record for an individual page
– Datastreams
    WEBIMAGE
    THUMBNAIL
– Behaviour Definitions
    getImage
    getZoom
– Relations
    member of the object
    sequence-number
    etc.

Fedora - Ingest
Gets a DIDL package and creates corresponding FOXML
– Different FOXML for different Content Models
– Which Content Model is used depends on the Type of the package
– A Content Model can result in multiple FOXML files (and accordingly multiple Fedora records)
Uses Fedora's Web Services to ingest the FOXML to the repository
The datastreams are also transferred to the Fedora repository
(Also a urn:nbn is mapped to the object's location in Fedora)
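How one package can fan out into several Fedora records is easiest to see for the atomistic PagedObject model: one record for the object plus one per page. The sketch below only plans the records; it does not generate FOXML or call Fedora. `RecordPlan`, `PagedObjectModel`, the PID pattern and the relation keys `memberOf` and `sequenceNumber` are illustrative assumptions, while the datastream names come from the PagedObject example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo: plan the records for a two-page object.
class ContentModelDemo {
    public static void main(String[] args) {
        for (RecordPlan record : PagedObjectModel.plan("demo:book1", 2)) {
            System.out.println(record.pid + " " + record.datastreams + " " + record.relations);
        }
    }
}

// What would be turned into one FOXML file, and thus one Fedora record.
class RecordPlan {
    final String pid;
    final List<String> datastreams;
    final Map<String, String> relations = new LinkedHashMap<>();

    RecordPlan(String pid, List<String> datastreams) {
        this.pid = pid;
        this.datastreams = datastreams;
    }
}

class PagedObjectModel {
    // Atomistic model: one record for the whole object, then one record per page.
    static List<RecordPlan> plan(String objectPid, int pageCount) {
        List<RecordPlan> records = new ArrayList<>();
        records.add(new RecordPlan(objectPid, Arrays.asList("DC", "MODS", "MARCXML")));
        for (int page = 1; page <= pageCount; page++) {
            RecordPlan pageRecord = new RecordPlan(
                    objectPid + "-page" + page, Arrays.asList("WEBIMAGE", "THUMBNAIL"));
            // Relations that would end up as RDF in the page record's RELS-EXT datastream.
            pageRecord.relations.put("memberOf", objectPid);
            pageRecord.relations.put("sequenceNumber", String.valueOf(page));
            records.add(pageRecord);
        }
        return records;
    }
}
```

A dispatcher keyed on the package Type would pick the planning class, so a different Content Model can produce a different number and shape of records from the same DIDL input.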

Fedora - Access
Built-in search system
– Search for DC terms and some Fedora terms
Built-in OAI-PMH provider
– We give access to DC, MODS and MARCXML
Built-in RDF Query Server
– Query against the RDF in RELS-EXT
In future: OAI-ORE provider for Fedora
We provide our own viewer for digitized objects
– Developed with Google Web Toolkit (GWT)
– Has one tab with an overview of all pages
– Another tab with an individual page with zooming functionality and the ability to navigate between pages
– Some simple metadata displayed

Example
A demo of viewing e-material from our Fedora repository.
Accessing SOT from LIBRIS.