A Weekend with Nanite: Large scale characterisation of web archives
Per Møldrup-Dalum, State and University Library
SCAPE Information Day, State and University Library, Denmark
Agenda
A short introduction to the experiment
A live demonstration
A look at the data for characterisation
A look at the input for the job
Run the job
Analysis of the output and of the run itself
This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐ (Grant Agreement number ).
Task at Hand
Performance-testing the tools
SCAPE User Story: As a Web Archive, I need a Digital Preservation System that can process both ARC and WARC files and identify the file formats of, and characterise, the items they contain, so that I can assess preservation risks and plan which tools will be required for access to those formats.
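To make the user story concrete, the following is a minimal sketch of what "processing a WARC file" involves at the lowest level: walking the records and pulling out each payload. It is not the code used in the experiment. It assumes an uncompressed WARC stream and `resource` records whose block is the content itself; the names `iter_warc_records` and `make_record` are hypothetical helpers, and the sample records omit headers that the WARC specification requires (such as WARC-Record-ID and WARC-Date).

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the record block in bytes.
        payload = stream.read(int(headers["content-length"]))
        yield headers, payload


def make_record(uri, body):
    """Build a simplified record for illustration (hypothetical helper)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


sample = (
    make_record("http://example.org/a.png", b"\x89PNG\r\n\x1a\n")
    + make_record("http://example.org/b.pdf", b"%PDF-1.4\n")
)
records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["warc-target-uri"], len(payload))
```

Real web archive files are usually gzip-compressed, and `response` records wrap the content in an HTTP response, so a production reader has more to do than this sketch shows.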
Tools at Hand
Apache Tika
DROID from The National Archives
(libmagic)
Not a word on FITS...
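The tools above all rely, at least in part, on matching known byte signatures at the start of a file. The sketch below illustrates that general idea and how identification results can be tallied across many items. It is only an illustration: the signature table is a tiny hand-picked sample, the names `SIGNATURES`, `identify` and `tally` are hypothetical, and none of this reproduces the actual matching logic of Tika, DROID or libmagic.

```python
from collections import Counter

# A tiny, illustrative signature table; real tools ship far larger ones.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def identify(payload):
    """Return a MIME type guess based on the leading bytes of the payload."""
    for magic, mime in SIGNATURES:
        if payload.startswith(magic):
            return mime
    return "application/octet-stream"  # fallback for unknown content


def tally(payloads):
    """Count how often each identified MIME type occurs."""
    return Counter(identify(p) for p in payloads)


payloads = [b"%PDF-1.4\n", b"GIF89a\x01\x00", b"%PDF-1.7\n", b"plain text"]
counts = tally(payloads)
for mime, n in counts.most_common():
    print(mime, n)
```

A per-format count of this kind is the sort of output a characterisation run produces, and comparing such counts between tools is one way to see where they disagree.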
Nanite
Created and maintained by the British Library
Improved by SCAPE and sustained by Open Planets Foundation
Tika and libmagic support added
Advanced Tika support through a "persistent" Tika server
ARC header extraction added
More to come…
References
SCAPE User Story for web archive data: http://wiki.opf-labs.org/display/SP/File+Format+Identification+and+Characterisation+of+Web+Archives
Nanite:
A Weekend With Nanite blog post: weekend-nanite
Open Planets Blogs: