SOAPI: a flexible toolkit for implementing ingest and preservation workflows Mark Hedges Centre for e-Research, King’s College London Arts and Humanities.

1 SOAPI: a flexible toolkit for implementing ingest and preservation workflows
Mark Hedges
Centre for e-Research, King’s College London
Arts and Humanities Data Service

2 Background
- Arts & Humanities Data Service
  - Activities included management and preservation of research outputs from UK researchers in arts and humanities
- Centre for e-Research, King’s College London (CeRch)
  - Activities will include management and preservation of research outputs from KCL researchers in all disciplines
  - Among other things …

3 Context
- Ingestion and preservation of complex material into a digital repository (Fedora-based)
  - Unpredictable structures
  - Many formats
- Formalised but manual procedures
  - Not scalable
  - Functional limitations (e.g. preservation metadata, provenance)

4 Schematic ingest process (simplified)

5 Requirements
- Handles complex/compound objects
- Distributed architecture
- Scalable
- Automated processing and user input
- Able to integrate specialised third-party tools (e.g. format conversion)
- Preservation metadata management
- Audit trail/provenance metadata

6 Approach
- Workflow management tool to create and execute workflows (jBPM)
- Generic interfaces defining common preservation and ingest actions
- Implementations of these interfaces encapsulating units of functionality
- Generic interfaces to wrap third-party tools
- Web service (SOAP & REST) and local implementations
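
The split between a generic interface and interchangeable implementations can be illustrated with a minimal Java sketch. The names `FormatIdentifier` and `ExtensionFormatIdentifier` are hypothetical and are not taken from SOAPI; the lookup table stands in for a real identification tool.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical generic interface for one preservation action.
// The name and signature are assumptions, not SOAPI's actual API.
interface FormatIdentifier {
    Optional<String> identify(String fileName);
}

// A trivial local implementation that guesses the format from the extension.
// A web service implementation would expose the same interface remotely.
class ExtensionFormatIdentifier implements FormatIdentifier {
    private static final Map<String, String> KNOWN = Map.of(
            "pdf", "application/pdf",
            "tif", "image/tiff",
            "xml", "application/xml");

    @Override
    public Optional<String> identify(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to inspect
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(KNOWN.get(ext));
    }
}
```

A workflow step written against `FormatIdentifier` does not need to know whether the implementation behind it runs locally or wraps a third-party tool.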

7 jBPM
- Chain together automated actions and user tasks to form a workflow or “Business Process”
- Open source, flexible, extensible workflow management system
- Bridges the gap between users and developers by giving them a common language
- Packaged as a J2EE application: can run on any J2EE application server such as JBoss
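
The idea of chaining automated actions and user tasks can be sketched in plain Java. This is a hypothetical miniature engine written for illustration only; it is not the jBPM API, and the class `MiniWorkflow` and its methods are assumptions. Automated steps run straight through, while a user task pauses the workflow until someone completes it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical miniature workflow engine; it is NOT the jBPM API.
// Each step transforms a status string so the effect of the chain is visible.
class MiniWorkflow {
    // A step is either automated or waits for a person to act.
    static final class Step {
        final String name;
        final boolean userTask;
        final UnaryOperator<String> action;

        Step(String name, boolean userTask, UnaryOperator<String> action) {
            this.name = name;
            this.userTask = userTask;
            this.action = action;
        }
    }

    private final List<Step> steps = new ArrayList<>();
    private int next = 0;

    MiniWorkflow automated(String name, UnaryOperator<String> action) {
        steps.add(new Step(name, false, action));
        return this;
    }

    MiniWorkflow userTask(String name, UnaryOperator<String> action) {
        steps.add(new Step(name, true, action));
        return this;
    }

    // Runs automated steps until a user task (or the end) is reached.
    String runUntilBlocked(String state) {
        while (next < steps.size() && !steps.get(next).userTask) {
            state = steps.get(next++).action.apply(state);
        }
        return state;
    }

    // Completes the pending user task, then continues automatically.
    String completeUserTask(String state) {
        if (next >= steps.size() || !steps.get(next).userTask) {
            throw new IllegalStateException("no user task is pending");
        }
        state = steps.get(next++).action.apply(state);
        return runUntilBlocked(state);
    }

    boolean finished() {
        return next == steps.size();
    }
}
```

A workflow built as `automated("identify", ...)`, `userTask("approve", ...)`, `automated("ingest", ...)` stops after identification and only proceeds to ingest once `completeUserTask` is called.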

8 jBPM (design view)

9 jBPM (XML view)
A jPDL (XML) fragment defining (part of) a workflow

10 jBPM (Nodes and Action Handlers)
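
The slide's diagram is not reproduced in this transcript. As a rough illustration of the relationship, a node in the workflow delegates its work to a handler class that reads and writes process variables. The sketch below uses stand-in types (`SketchContext`, `SketchActionHandler`) so that it runs without jBPM on the classpath; these names, and the `SizeCheckHandler` example, are assumptions and not the real jBPM types.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the workflow engine's execution context (assumption):
// just a bag of named process variables.
class SketchContext {
    private final Map<String, Object> variables = new HashMap<>();

    Object get(String name) {
        return variables.get(name);
    }

    void set(String name, Object value) {
        variables.put(name, value);
    }
}

// Stand-in for an action handler contract: one method, called when
// execution reaches the node the handler is attached to.
interface SketchActionHandler {
    void execute(SketchContext context) throws Exception;
}

// Hypothetical handler for a "check size" node. It reads one process
// variable and records its verdict in another for later nodes to use.
class SizeCheckHandler implements SketchActionHandler {
    private final long maxBytes;

    SizeCheckHandler(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    @Override
    public void execute(SketchContext context) {
        long size = (Long) context.get("sizeInBytes");
        context.set("withinLimit", size <= maxBytes);
    }
}
```

Keeping each handler this small means the workflow definition decides the order of steps, while the handler only encapsulates one unit of functionality.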

11 jBPM (execution view)

12 Architecture (1)

13 Architecture (2)

14 Interfaces
- Interfaces: local (Java), SOAP and REST options
- Coarse-grained, e.g.:
  - Create file characterisation
  - Identify file format
  - Migrate file format
  - Normalise file format
  - Check file integrity
  - …
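
As one example of a coarse-grained action, "check file integrity" could be expressed as a single call that compares a file against a recorded checksum. The interface `IntegrityChecker` and the class `Sha256IntegrityChecker` below are hypothetical, and the choice of SHA-256 is an assumption; the slides do not say which algorithm or signature SOAPI uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical coarse-grained interface: one call per preservation action.
interface IntegrityChecker {
    boolean matches(Path file, String expectedHex);
}

// Local implementation using the JDK's SHA-256 digest (algorithm is an assumption).
class Sha256IntegrityChecker implements IntegrityChecker {

    // Returns the lowercase hexadecimal SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] content) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    @Override
    public boolean matches(Path file, String expectedHex) {
        try {
            // Reads the whole file into memory; adequate for a sketch,
            // a streaming digest would suit large files better.
            return sha256Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedHex);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller sees only a yes/no answer, which is what makes the same action easy to expose through a local, SOAP or REST binding.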

15 Service implementations
- Configure use of particular implementations, e.g.
  - Format validation: JHOVE and others
  - Format identification: JHOVE, DROID, XENA
  - Format conversion: various
  - Metadata capture: PREMIS
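
Choosing an implementation by configuration, rather than hard-coding it in the workflow, might look like the following sketch. `ImplementationRegistry` is a hypothetical class, and the registered functions are placeholders; a real setup would plug in wrappers around the tools named above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: which implementation serves which action is
// decided by configuration, not by the workflow definition.
class ImplementationRegistry {
    // action name -> (implementation name -> implementation)
    private final Map<String, Map<String, Function<String, String>>> available = new HashMap<>();
    // action name -> name of the implementation currently configured
    private final Map<String, String> configured = new HashMap<>();

    void register(String action, String implName, Function<String, String> impl) {
        available.computeIfAbsent(action, k -> new HashMap<>()).put(implName, impl);
    }

    // Selects one of the registered implementations for an action.
    void configure(String action, String implName) {
        if (!available.getOrDefault(action, Map.of()).containsKey(implName)) {
            throw new IllegalArgumentException(
                    "unknown implementation " + implName + " for " + action);
        }
        configured.put(action, implName);
    }

    // Invokes whichever implementation is configured for the action.
    String invoke(String action, String input) {
        String implName = configured.get(action);
        if (implName == null) {
            throw new IllegalStateException("no implementation configured for " + action);
        }
        return available.get(action).get(implName).apply(input);
    }
}
```

Swapping one format identification tool for another then becomes a single `configure` call, leaving the workflow untouched.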

16 Workflow inputs & outputs

17 Re-use example – SHERPA DP 2
Project objectives:
- Investigate methods for the provision of distributed preservation services and alternative methods of content-service provider interaction
- Provide archiving for varied software repositories and web resources
- Perform curatorial activities for diverse types of content, ranging from simple objects to highly structured research data
Website: http://www.sherpadp.org.uk
Contact: stephen.grace@kcl.ac.uk; gareth.knight@kcl.ac.uk

18 Re-use example – SHERPA DP 2
Content providers supported:
- Repositories: Fedora, CDS Invenio, DSpace, EPrints, DigiTool
- Websites: large dynamic sites, static sites
Automated ingest methods:
- OAI-PMH: METS, MPEG21-DIDL, MarcXML, Dublin Core and other metadata formats supported
- SWORD: an Atom application profile
Content types supported:
- Wide variety of content types: image collections, static and dynamic web sites, datasets and other types of research data
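
For the OAI-PMH route, harvesting starts with an HTTP request that names a verb and a metadata format. The sketch below only builds such a request URL; the class `OaiPmhRequest` and the base URL in the usage note are hypothetical, and it does not show how SHERPA DP 2 actually issues or processes requests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds an OAI-PMH ListRecords request URL.
class OaiPmhRequest {
    // baseUrl is the repository's OAI-PMH endpoint; metadataPrefix names
    // the metadata format to harvest, e.g. "oai_dc" for Dublin Core.
    static String listRecordsUrl(String baseUrl, String metadataPrefix) {
        return baseUrl
                + "?verb=ListRecords&metadataPrefix="
                + URLEncoder.encode(metadataPrefix, StandardCharsets.UTF_8);
    }
}
```

For instance, `OaiPmhRequest.listRecordsUrl("http://repository.example.org/oai", "oai_dc")` asks the (made-up) endpoint for all records in Dublin Core.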

19 Issues
- Lack of suitable tools in some areas – expensive, outputs unreliable
- Preserving content – what do we actually want to preserve?
- Significant properties – soft concept, hard to quantify (InSPECT)
- Problems with jBPM

20 Further work
- Make code more robust and fill in gaps
- Integrate task screens with other identity management systems (e.g. Shibboleth federation)
- Incorporate content model-specific processing
- Incorporate disseminators
- Integrate service registry for selecting services to invoke
- Resource discovery metadata generation

21 Questions
Contact: mark.hedges@kcl.ac.uk

