Dr. Ross King AIT Austrian Institute of Technology GmbH SCAPE/OPF Executive Seminar: Managing Digital Preservation The Hague, April 2, 2014 SCAPE Tools.

1 SCAPE Tools and Solutions
Dr. Ross King, AIT Austrian Institute of Technology GmbH
SCAPE/OPF Executive Seminar: Managing Digital Preservation
The Hague, April 2, 2014

2 Outline
SCAPE Project
SCAPE Tools
SCAPE Solutions
SCAPE and Preservation Management
SCAPE Additional Information
  Online Resources
  Events
  Contact Information

This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐ (Grant Agreement number ).

3 SCAPE – what is it about?
Planning and executing computing-intensive digital preservation processes such as the large-scale ingestion, characterisation or migration of large (multi-Terabyte) and complex data sets
SCAPE results include
  Preservation scenarios
  Preservation tools
  Preservation workflows
  Preservation infrastructure
  Preservation best-practices
SCAPE is a follow-up to the highly successful FP6 IP Planets.

4 SCAPE Project Data
Project instrument: FP7 Collaborative Project
20 Partners from 11 countries
6. Call Objective ICT : Digital Libraries and Digital Preservation
  Target outcome (a) Scalable systems and services for preserving digital content
10. Call Objective ICT : Supplements to Strengthen Cooperation in ICT R&D in an Enlarged European Union
Duration: 44 months (February 2011 – September 2014)
Budget: 12.0 Million Euro
Funded: 9.2 Million Euro

5 SCAPE Consortium

6 SCAPE Tools

7 Scalable Tools
Toolwrapper
  Application that adapts existing tools to the SCAPE Platform
  https://github.com/openplanets/scape-toolwrapper
  Enhances wrapped tools
    Standard naming scheme for CC, AS and QA tools
    Standard invocation method (CLI)
    Debian packages for easy deployment on the cluster
    Support for data streaming (useful for Hadoop jobs)
  Generates Preservation Components
    Taverna workflows with embedded metadata for easy discovery
    Automatic publication of components on myExperiment (to support discoverability)
    Standard ports to enable composition of Preservation Components (based on well defined component profiles, CC, AS & QA)
Digital Preservation Toolkit
  Software suite that contains a large set of DP tools
  77 operations in total
  Easy to deploy on Linux machines (via apt-get)
  apt-get install digital-preservation-tools

8 Scalable Tools
Jpylyzer
  JP2 (JPEG 2000 Part 1) validator and properties extractor
Pagelyzer
  Suite of tools for detecting changes in web pages and their rendering
xcorrSound
  Suite of tools for automated quality assurance of audio migration processes
  https://github.com/openplanets/scape-xcorrsound
Matchbox
  Duplicate image detection tool
ToMaR
  Supports the scalable execution of preinstalled tools or other applications
  Wraps command-line invocation of a tool into a MapReduce program
  https://github.com/openplanets/scape/tree/master/pt-mapred
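The idea of wrapping a command-line invocation into a map step can be sketched roughly as follows. This is not ToMaR's code or input format: the function names `build_command` and `map_record` and the `${input}` placeholder syntax are assumptions made for illustration, and the "tool" is a stand-in that simply upper-cases its argument. Each input record names one file; the map step fills a command template and runs the tool on that file.

```python
import subprocess
import sys
from string import Template


def build_command(template, input_path):
    """Fill a command template (a list of argument templates) for one input record."""
    return [Template(arg).safe_substitute(input=input_path) for arg in template]


def map_record(template, input_path):
    """Map step: run the wrapped tool on one record and emit (key, value)."""
    command = build_command(template, input_path)
    result = subprocess.run(command, capture_output=True, text=True)
    return input_path, (result.returncode, result.stdout.strip())


if __name__ == "__main__":
    # Stand-in "tool": the Python interpreter printing its argument in upper case.
    template = [sys.executable, "-c",
                "import sys; print(sys.argv[1].upper())", "${input}"]
    for record in ["a.tif", "b.tif"]:
        print(map_record(template, record))
```

In a real cluster job the records would come from the framework's input splits rather than a Python list, and the emitted pairs would be collected by a reduce step.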

9 Planning and Watch Tools
SCOUT: an automated preservation watch system
  Enables planning tools and decision makers to monitor the world and the organisation
  Collects relevant knowledge and enables automated notification
  Open and extensible
c3po: scalable content profiling
  c3po analyses characterisation data based on FITS
  Scale-out MongoDB (100k/min/node)
  Visual drill-down and well-documented profile
  Automated sample selection
PLATO 4.4: scalable preservation planning
  Technology upgrade - refactored, rebuilt, standardised, tested
  New features
    Groups allow collaborative planning
    Integration of control policies for group
    Quality domain – measures

10 Repositories
Fedora
  All REST, no SOAP
  RDF as first class objects
  JCR 2.0 Implementation (ModeShape)
  Infinispan distributed NoSQL datastore
RODA
  KEEP Solutions’ open source repository
  Implements all SCAPE APIs
Rosetta
  Ex Libris’ commercial long-term preservation system
  Implements SCAPE Data Connector API

11 SCAPE Architecture

12 SCAPE Platform
[Architecture diagram. Legend: SCAPE Components; 3rd Party Components; 3rd Party Components with SCAPE contributions. Labelled elements: 1, 2, ... n; HDFS; Hadoop; Pig; ToMaR; STAGER; LOADER; Fedora 4; Rosetta; RODA; Taverna; Data Connector API; SCAPE APIs; PPL; toolspec; Digital Objects.]

13 SCAPE Platform + Preservation Components
[Architecture diagram. Legend: SCAPE Components; 3rd Party Components; 3rd Party Components with SCAPE contributions. Labelled elements: 1, 2, ... n; HDFS; Hadoop; Pig; ToMaR; STAGER; LOADER; Fedora 4; Rosetta; RODA; Taverna; Data Connector API; SCAPE APIs; PPL; toolspec; Tool wrapper; Components; Digital Objects; Preservation Tools.]

14 SCAPE Planning and Watch
[Architecture diagram. Legend: SCAPE Components; 3rd Party Components; 3rd Party Components with SCAPE contributions. Labelled elements: 1, 2, ... n; HDFS; Hadoop; Pig; ToMaR; STAGER; LOADER; Fedora 4; Rosetta; RODA; PLATO 4; Taverna; Data Connector API; Report API; Plan Management API; SCOUT; SCAPE APIs; PPL; toolspec; Tool wrapper; Components; Digital Objects; Preservation Tools.]

15 SCAPE Solutions
see also

16 Migration: Large Scale Image Migration
User Story
  As a curator of image files, I need a digital preservation system that can migrate a large number of images from one format to another, ensuring that the migrated images conform to our institutional profile, that no image data is lost and that the migration is cost effective (saving storage for example).
SCAPE Solution
  SCAPE Platform
  ImageMagick (with SCAPE toolspec description)
  Jpylyzer
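The three requirements in this user story (conformance, no data loss, storage saving) can be expressed as a simple acceptance check per migrated file. The sketch below is illustrative only: `MigrationResult`, `convert_command`, `storage_saving` and `accept` are hypothetical names, and the validity and pixel-comparison verdicts are assumed to be produced elsewhere (for example by a validator such as Jpylyzer) and passed in as booleans. The `convert` call is the basic ImageMagick form in which the target format follows from the file extension; newer ImageMagick releases expose the same behaviour through the `magick` command.

```python
from dataclasses import dataclass


@dataclass
class MigrationResult:
    source_bytes: int        # size of the original file
    target_bytes: int        # size of the migrated file
    target_is_valid: bool    # verdict of a format validator (assumed input)
    pixels_identical: bool   # verdict of a pixel-wise comparison (assumed input)


def convert_command(source, target):
    """Basic ImageMagick call; the output format is taken from the extension."""
    return ["convert", source, target]


def storage_saving(result):
    """Fraction of storage saved by the migration (negative if the file grew)."""
    return 1.0 - result.target_bytes / result.source_bytes


def accept(result, min_saving=0.0):
    """Accept only valid, lossless migrations that save at least min_saving."""
    return (result.target_is_valid
            and result.pixels_identical
            and storage_saving(result) >= min_saving)
```

For example, `accept(MigrationResult(1000, 600, True, True), min_saving=0.25)` passes, while the same result with `target_is_valid=False` is rejected regardless of the saving.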

17 Migration: Large Scale Audio Migration
User Story
  As the owner of a large audio collection, I need a digital preservation system that can migrate large numbers of audio files from one format to another and ensure that the migration is a good and complete copy of the original.
SCAPE Solution
  SCAPE Platform
  xcorrSound
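One general way to check that a migrated audio file is "a good and complete copy" is to cross-correlate the decoded waveforms and look for a strong peak at a small offset. The sketch below illustrates that technique in plain Python; it is not xcorrSound's implementation, and the function names and the lag window are assumptions chosen for the example.

```python
import math


def normalized_xcorr(a, b, lag):
    """Normalized correlation of a[i] with b[i + lag] over their overlap."""
    pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm = math.sqrt(sum(x * x for x, _ in pairs) * sum(y * y for _, y in pairs))
    return dot / norm if norm else 0.0


def best_match(a, b, max_lag):
    """Return (lag, score) with the highest correlation within +/- max_lag."""
    candidates = ((lag, normalized_xcorr(a, b, lag))
                  for lag in range(-max_lag, max_lag + 1))
    return max(candidates, key=lambda item: item[1])
```

A score close to 1 at some lag suggests the two sample sequences carry the same signal, shifted by that many samples; a quality-assurance step would still have to decide which score and which offset are acceptable for the collection.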

18 Analysis: File Format Identification and Characterisation of Web Archives
User Story
  As a Web Archive I need a Digital Preservation System that can process both ARC and WARC files, and identify the file formats of and characterise the items they contain, so that I can assess preservation risks and plan which tools will be required for access to those formats.
SCAPE Solution
  SCAPE Platform
  ARC Unpacker
  FITS Tool (with SCAPE toolspec description)
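Once every item in an archive file has been identified, risk assessment typically starts from a format distribution. The following sketch assumes the identification results are already available as `(item, format)` pairs; that record shape and the function names `format_profile` and `rare_formats` are hypothetical and do not reflect the output of FITS or any other SCAPE tool.

```python
from collections import Counter


def format_profile(records):
    """Count items per identified format; unidentified items are grouped together."""
    return Counter(fmt or "unidentified" for _, fmt in records)


def rare_formats(profile, max_count=1):
    """Formats seen at most max_count times, sorted by name."""
    return sorted(fmt for fmt, n in profile.items() if n <= max_count)
```

Rarely seen or unidentified formats are one possible starting point for deciding where access tools may be missing; the threshold is a policy choice, not something the sketch can determine.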

19 Quality Control: Comparison of Web Snapshots
User Story
  In order to be confident that we have preserved a website, we need a digital preservation system that can automate the comparison of two web snapshots, for example a harvested copy and a previously harvested copy that has been manually verified as an accurate representation of the site. This will enable us to ensure Web content has been successfully harvested and inform harvesting policies.
SCAPE Solution
  Pagelyzer
  Hadoop Platform
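As a much simplified illustration of automated snapshot comparison, the sketch below scores two snapshots by the similarity of their markup, line by line, using Python's standard `difflib`. It is not Pagelyzer's method, which also considers how pages render; the function names and the 0.9 threshold are assumptions for the example.

```python
from difflib import SequenceMatcher


def snapshot_similarity(reference_html, harvested_html):
    """Similarity in [0, 1] between two snapshots, compared line by line."""
    matcher = SequenceMatcher(None,
                              reference_html.splitlines(),
                              harvested_html.splitlines())
    return matcher.ratio()


def needs_review(reference_html, harvested_html, threshold=0.9):
    """Flag a harvested snapshot whose markup differs too much from the reference."""
    return snapshot_similarity(reference_html, harvested_html) < threshold
```

Snapshots flagged by `needs_review` would go to a curator, while the rest can be accepted automatically; the right threshold depends on how much legitimate change is expected between harvests.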

20 Solving Preservation Problems the SCAPE Way
Open Source Development
  And/or implementation of open APIs
Uniform Deployment
  Use the SCAPE Toolspec+Toolwrapper to publish tools
    As Advanced Package Tool (APT) packages
    As SCAPE Components
Preservation Planning
  Use PLATO to test tools (as SCAPE Components) and make policy-based plans
Process Modelling
  Use Taverna to model preservation workflows
  Taverna works directly with SCAPE components for experimental workflows
  Taverna workflows can be converted to Hadoop/Pig workflows in some cases
Hadoop Deployment
  Use APT packages to deploy to a Hadoop environment
Scalable Execution
  SCAPE ToMaR can directly access tools through the toolspec
[Image from digitalbevaring.dk]

21 SCAPE and Preservation Management

22 The Wall
Research and Development
  Focus on innovation
  Services are prototypes
    Unstable
    Buggy
  Maintenance pool limited to a few (or one) expert(s)
Production
  Focus on daily business needs
  Service availability is a priority
  Services are stable
  Enjoy a large maintenance pool

23 The Wall
[Diagram: Research and Development on one side of The Wall, Production on the other. Labelled elements: 1, 2, ... n; HDFS; Hadoop; Pig; ToMaR; Fedora; Rosetta; RODA; Digital Objects.]

24 The Wall
[Diagram: Research and Development on one side of The Wall, Production on the other. Labelled elements: 1, 2, ... n; HDFS; Hadoop; Pig; ToMaR; STAGER; LOADER; Fedora; Rosetta; RODA; Data Connector API; Digital Objects.]

25 The Wall
Other problems with The Wall?
How can we break through The Wall?

26 SCAPE Additional Information

27 Additional Resources of Interest
Development Infrastructure
  Code repository hosted by the Open Planets Foundation and GitHub
  https://github.com/openplanets/scape/
  Development Wiki
Experimental Workflows
Publications
Public Deliverables
Tools

28 SCAPE Events
DL2014: Joint SCAPE/APARSEN Workshop
September 8, 2014, London
Registration:
See

29 SCAPE Contact Information
Twitter: #scapeproject
Dr. Ross King
AIT Austrian Institute of Technology GmbH
Donau-City-Strasse 1
A-1220 Wien

30 Thank you for your attention! Questions?

