Published by Thomas Anderson. Modified over 8 years ago.
1
SCAPE Project
- EU project aimed at building a scalable platform for planning and executing computation-intensive processes for the ingestion or migration of large data sets, in order to help automate digital preservation
- Digital preservation: standards + policies + technologies to ensure access to digital objects over time
- “Preservation workflows”, “Digital objects 4 ever”
- 42 months, in the period 2011-2014
- 16 project partners, 22 WPs, 55 deliverables, 88 milestones, zillion mailing lists
2
The Problem
- Scale of data sets involved in digital preservation:
  - large number of objects involved in data sets
  - the objects can be large in size or complex in structure
  - the data collections can contain heterogeneous objects (objects of different types)
- Data formats change over time and become obsolete
- Migrating digital objects – must ensure success
- Reproducibility of preservation processes and collection of provenance data over the digital object’s entire lifecycle
3
The Solution – From Project Proposal
- The preservation processes are realised as data pipelines and described formally as Taverna workflows
- Workflows will invoke various services for planning and execution of institutional preservation and quality assurance strategies
- Workflows will be deployed on a large scale (using clouds) and executed over large, distributed and heterogeneous collections of complex digital objects
- The execution of workflows will be controlled by a policy-based system, which will ensure the workflows are in line with the state of the art in digital object representation, file formats, rendering tools, etc., and will detect and report any errors in a preservation process
4
The Solution – In Practice
- Preservation services are written in various languages
- Use Taverna’s External Tools or Beanshells to invoke them from inside Taverna workflows
- Preservation services need to run locally so that they can be deployed to a cluster, avoiding the bottleneck of invoking a Web service
- Convert Taverna workflows to workflows that are executable and parallelizable on Hadoop MapReduce
- Compile Taverna workflows to the intermediate language Jaql, which can be optimized and executed on MapReduce
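To illustrate why locally running services parallelize well, here is a minimal Python sketch of the map/reduce pattern over a collection of objects. It is not SCAPE, Taverna, Jaql or Hadoop code. The `identify_format` function is a hypothetical stand-in for a local preservation tool, and a thread pool stands in for the cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath


def identify_format(object_name):
    """Map step: hypothetical stand-in for a locally installed tool.

    A real service would inspect the object's contents; here the file
    extension is used so that the sketch stays self-contained.
    """
    suffix = PurePath(object_name).suffix.lower() or "(none)"
    return suffix, 1


def count_formats(object_names, workers=4):
    """Run the map step in parallel, then reduce the pairs to totals."""
    # Each object is processed independently, so no worker waits on a
    # shared remote service (the bottleneck the slide refers to).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(identify_format, object_names))

    # Reduce step: sum the counts emitted for each format key.
    totals = Counter()
    for fmt, n in pairs:
        totals[fmt] += n
    return dict(totals)


if __name__ == "__main__":
    # Assumed sample collection with heterogeneous object types.
    collection = ["a.tiff", "b.TIFF", "c.pdf", "README"]
    print(count_formats(collection))
```

Because the map step touches only its own input, the same function could in principle be distributed across cluster nodes, which is the property the conversion to MapReduce relies on.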
5
Benefits to Us
- Strengthened External Tools plugin and improved support for running external services
- Taverna workflow (potentially containing only local services) -> parallelizable Jaql workflow executable on a MapReduce cloud
- App4Andy-style applications that process large data, use local scripts and need parallelization/optimization
- Some extensions to myExperiment (“run wf on a cloud”) / BioCatalogue – not sure how reusable
6
Other Projects Affecting SCAPE
- External Tools plugin for Taverna
- Provenance in Taverna
  - Browsing, exporting
  - We design a Taverna wf, but actually run a Jaql wf – so provenance is not being captured by Taverna?
- Next Generation Workbench – could do with a more advanced UI
- SCUFL2 – for conversion to Jaql workflows
  - Easier to manipulate than the current t2flow?
7
Summary
Contributions
- Taverna Workbench for workflow design
- myExperiment VRE for sharing workflows
- BioCatalogue catalogue for curating preservation services
- Ontology development
Expectations
- Scalability in workflow execution
- Experiences with new domain – digital libraries