SOAPI: a flexible toolkit for implementing ingest and preservation workflows Mark Hedges Centre for e-Research, King’s College London Arts and Humanities.

Presentation transcript:

SOAPI: a flexible toolkit for implementing ingest and preservation workflows Mark Hedges Centre for e-Research, King’s College London Arts and Humanities Data Service

Background: Arts & Humanities Data Service: activities included management and preservation of research outputs from UK researchers in the arts and humanities. Centre for e-Research, King’s College London (CeRch): activities will include management and preservation of research outputs from KCL researchers in all disciplines, among other things …

Context: Ingestion and preservation of complex material into a digital repository (Fedora-based). Unpredictable structures. Many formats. Formalised but manual procedures. Not scalable. Functional limitations (e.g. preservation metadata, provenance).

Schematic ingest process (simplified)

Requirements: Handles complex/compound objects. Distributed architecture. Scalable. Automated processing and user input. Able to integrate specialised third-party tools (e.g. format conversion). Preservation metadata management. Audit trail/provenance metadata.

Approach: Workflow management tool to create and execute workflows (jBPM). Generic interfaces defining common preservation and ingest actions. Implementations of these interfaces encapsulating units of functionality. Generic interfaces to wrap third-party tools. Web service (SOAP & REST) and local implementations.

jBPM: Chains together automated actions and user tasks to form a workflow or “Business Process”. Open source, flexible, extensible workflow management system. Bridges the gap between users and developers by giving them a common language. Packaged as a J2EE application, so it can run on any J2EE application server such as JBoss.

jBPM (design view)

jBPM (XML view) A jPDL (XML) fragment defining (part of) a workflow

jBPM (Nodes and Action Handlers)
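The node and action-handler pattern named on this slide can be sketched in plain Java. In jBPM an automated node delegates its work to an action-handler class that receives an execution context. The sketch below uses a stand-in `ActionHandler` interface and a `Map` as the context so that it runs without jBPM on the classpath. The node names, handlers and the `LinearWorkflow` class are hypothetical illustrations, not SOAPI code, and a real jBPM process also supports branching and user tasks, which are omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for an action-handler contract (assumption: simplified, not the jBPM API).
interface ActionHandler {
    void execute(Map<String, Object> context);
}

// A workflow node: a name plus the handler that does the work.
final class Node {
    final String name;
    final ActionHandler handler;

    Node(String name, ActionHandler handler) {
        this.name = name;
        this.handler = handler;
    }
}

// Runs nodes strictly in order and returns the names of the nodes executed,
// which serves as a very small audit trail.
final class LinearWorkflow {
    private final List<Node> nodes = new ArrayList<>();

    LinearWorkflow then(String name, ActionHandler handler) {
        nodes.add(new Node(name, handler));
        return this;
    }

    List<String> run(Map<String, Object> context) {
        List<String> audit = new ArrayList<>();
        for (Node node : nodes) {
            node.handler.execute(context);
            audit.add(node.name);
        }
        return audit;
    }
}

class WorkflowDemo {
    // Hypothetical two-step ingest: guess a format, then note it as provenance.
    static LinearWorkflow ingestWorkflow() {
        return new LinearWorkflow()
            .then("identify-format", ctx -> {
                String file = (String) ctx.get("file");
                ctx.put("format", file.endsWith(".tif") ? "image/tiff" : "unknown");
            })
            .then("record-provenance", ctx ->
                ctx.put("provenance", "format identified as " + ctx.get("format")));
    }

    public static void main(String[] args) {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("file", "scan-001.tif");
        List<String> audit = ingestWorkflow().run(context);
        System.out.println(audit + " -> " + context);
    }
}
```

Each handler reads and writes shared context variables, which is how one step's output (here the format) becomes the next step's input.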

jBPM (execution view)

Architecture (1)

Architecture (2)

Interfaces: Local (Java), SOAP and REST options. Coarse-grained, e.g.: create file characterisation, identify file format, migrate file format, normalise file format, check file integrity, …
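As a minimal sketch of one such coarse-grained interface, the "check file integrity" action could look like the following. The names `IntegrityChecker` and `Sha256IntegrityChecker` and the choice of SHA-256 are assumptions made for illustration; the slides do not say which digest or method signatures SOAPI uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical coarse-grained interface for the "check file integrity" action.
interface IntegrityChecker {
    boolean matches(byte[] content, String expectedHexDigest);
}

// Local (in-process) implementation. A SOAP or REST client could implement
// the same interface and forward the call to a remote service instead.
final class Sha256IntegrityChecker implements IntegrityChecker {

    // Returns the SHA-256 digest of the content as lowercase hex.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a standard algorithm name, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean matches(byte[] content, String expectedHexDigest) {
        return sha256Hex(content).equalsIgnoreCase(expectedHexDigest);
    }
}

class IntegrityDemo {
    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String stored = Sha256IntegrityChecker.sha256Hex(content);
        IntegrityChecker checker = new Sha256IntegrityChecker();
        System.out.println(checker.matches(content, stored));
    }
}
```

Because the workflow depends only on the interface, a local implementation can be swapped for a web-service one without changing the workflow definition.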

Service implementations: Configure use of particular implementations, e.g. format validation (JHOVE and others), format identification (JHOVE, DROID, XENA), format conversion (various), metadata capture (PREMIS).
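Configuring which implementation backs an action could be sketched as a small registry keyed by a configuration value. Everything below is hypothetical: the two identifiers are toy stand-ins (they do not call JHOVE, DROID or XENA), and the registry keys are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for the "identify file format" action.
interface FormatIdentifier {
    String identify(String fileName, byte[] content);
}

// Toy stand-in that looks only at the file extension.
final class ExtensionIdentifier implements FormatIdentifier {
    @Override
    public String identify(String fileName, byte[] content) {
        return fileName.toLowerCase().endsWith(".pdf")
            ? "application/pdf" : "application/octet-stream";
    }
}

// Toy stand-in that looks at the leading bytes of the content.
final class SignatureIdentifier implements FormatIdentifier {
    @Override
    public String identify(String fileName, byte[] content) {
        int length = Math.min(content.length, 5);
        String head = new String(content, 0, length, StandardCharsets.ISO_8859_1);
        return head.equals("%PDF-") ? "application/pdf" : "application/octet-stream";
    }
}

// Maps a configured key to the implementation a workflow should call.
final class ServiceRegistry {
    private final Map<String, FormatIdentifier> identifiers = new HashMap<>();

    void register(String key, FormatIdentifier implementation) {
        identifiers.put(key, implementation);
    }

    FormatIdentifier lookup(String key) {
        FormatIdentifier found = identifiers.get(key);
        if (found == null) {
            throw new IllegalArgumentException("No implementation configured for: " + key);
        }
        return found;
    }
}

class RegistryDemo {
    static ServiceRegistry configuredRegistry() {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register("extension", new ExtensionIdentifier());
        registry.register("signature", new SignatureIdentifier());
        return registry;
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 sample".getBytes(StandardCharsets.ISO_8859_1);
        // The key would normally come from configuration rather than code.
        FormatIdentifier chosen = configuredRegistry().lookup("signature");
        System.out.println(chosen.identify("report.dat", pdf));
    }
}
```

The example file has a misleading extension, so the two implementations disagree; this is the kind of difference that makes the choice of implementation worth configuring per workflow.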

Workflow inputs & outputs

Re-use example – SHERPA DP 2: Project objectives: Investigate methods for the provision of distributed preservation services and alternative methods of content-service provider interaction. Provide archiving for varied software repositories and web resources. Perform curatorial activities for diverse types of content, ranging from simple objects to highly structured research data. Website: Contact:

Re-use example – SHERPA DP 2: Content providers supported: repositories (Fedora, CDS Invenio, DSpace, EPrints, DigiTool) and websites (large dynamic sites, static sites). Automated ingest methods: OAI-PMH (METS, MPEG21-DIDL, MarcXML, Dublin Core and other metadata formats supported) and SWORD (an Atom application profile). Content types supported: a wide variety, including image collections, static and dynamic web sites, datasets and other types of research data. Website: Contact:
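For the OAI-PMH ingest route, a harvester starts by issuing a `ListRecords` request with a `metadataPrefix` argument. The sketch below only builds that request URL. The base URL is a placeholder, the `OaiPmhRequest` class is hypothetical, and the prefix a given repository uses for METS is repository-specific (`oai_dc` is the one prefix every OAI-PMH repository must support).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds OAI-PMH ListRecords request URLs.
final class OaiPmhRequest {

    // fromDate is optional (pass null to omit it) and limits the harvest
    // to records changed on or after that date.
    static String listRecords(String baseUrl, String metadataPrefix, String fromDate) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("verb", "ListRecords");
        params.put("metadataPrefix", metadataPrefix);
        if (fromDate != null) {
            params.put("from", fromDate);
        }

        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(param.getKey())
               .append('=')
               .append(encode(param.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }
}

class OaiPmhDemo {
    public static void main(String[] args) {
        // Placeholder endpoint, not a real repository.
        String base = "https://repository.example.org/oai";
        System.out.println(OaiPmhRequest.listRecords(base, "oai_dc", "2009-01-01"));
    }
}
```

A full harvester would also follow the `resumptionToken` returned in each response until the list is exhausted; that loop is left out of this sketch.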

Issues: Lack of suitable tools in some areas (expensive, outputs unreliable). Preserving content: what do we actually want to preserve? Significant properties: a soft concept, hard to quantify (InSPECT). Problems with jBPM.

Further work: Make code more robust and fill in gaps. Integrate task screens with other identity management systems (e.g. Shibboleth federation). Incorporate content model-specific processing. Incorporate disseminators. Integrate service registry for selecting services to invoke. Resource discovery metadata generation.

Questions Contact: