Automatic Evaluation of Migration Quality in Distributed Networks of Converters
Miguel Ferreira
Supervisors: Ana Alice Baptista, José Carlos Ramalho
ECDL 05 Doctoral Consortium
Contents
– Introductory concepts
– Research problems
– Proposed system
– Methodology
– Topics for discussion
Introductory concepts
Digital preservation
– The set of processes and activities that ensure continued access to information and all kinds of cultural heritage existing in digital formats
Digital object
– An information object, of any type of information or any format, that is expressed in digital form
– Examples: text documents, digital photos, vector graphics, databases, Web pages, software
Strategies for digital preservation
Emulation
– Reproduction of the behaviour of a hardware/software platform in a different technological environment
Encapsulation
– Storing information about how the objects should be interpreted
Migration
– Periodic transfer of digital materials from one hardware/software configuration to another
Others
– Computer museums, viewers, Universal Virtual Computer
Migration
Advantages
– Updated formats that users can read and edit
Disadvantages
– Requires continuous diligence
– Data loss
Variants
– Migration on request
– Normalisation
– Distributed migration
Distributed migration
A network of remote conversion services supported by a semantic layer [Hunter et al.]
Advantages
– Platform independent
– Redundancy
– Multiple migration paths
– Cost reduction
– Compatible with other migration strategies
Disadvantages
– Bandwidth
– Slow
Examples
– PANIC
– MyMorph (NLMed)
– TOM
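The "multiple migration paths" advantage can be illustrated with a small sketch. It assumes a hypothetical registry of converters, each described by a service name, a source format and a target format, and enumerates every chain of services that takes an object from one format to another. The service names, formats and the `max_hops` limit are illustrative assumptions, not part of PANIC, MyMorph or TOM.

```python
from collections import defaultdict

# Hypothetical registry: (service name, source format, target format).
CONVERTERS = [
    ("svc-a", "doc", "odt"),
    ("svc-b", "odt", "pdf"),
    ("svc-c", "doc", "pdf"),
    ("svc-d", "doc", "rtf"),
    ("svc-e", "rtf", "odt"),
]


def migration_paths(converters, source, target, max_hops=3):
    """Return every loop-free chain of services leading from source to target.

    Each path is a list of (service, resulting format) steps; paths are
    returned shortest first.
    """
    # Build an adjacency list: format -> [(service, next format), ...].
    graph = defaultdict(list)
    for service, src, dst in converters:
        graph[src].append((service, dst))

    paths = []

    def walk(fmt, visited, steps):
        if fmt == target:
            paths.append(list(steps))
            return
        if len(steps) == max_hops:
            return  # give up on chains longer than max_hops
        for service, nxt in graph[fmt]:
            if nxt not in visited:  # never revisit a format on the same path
                steps.append((service, nxt))
                walk(nxt, visited | {nxt}, steps)
                steps.pop()

    walk(source, {source}, [])
    return sorted(paths, key=len)


if __name__ == "__main__":
    for path in migration_paths(CONVERTERS, "doc", "pdf"):
        print(" -> ".join(f"{fmt} (via {service})" for service, fmt in path))
```

Having several paths to the same target is what gives the network its redundancy: if one service is unavailable, another chain may still reach the target format, possibly at a different cost in data loss.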
How to choose a preservation strategy?
Many preservation alternatives
Lack of universal acceptance
Distinct preservation requirements
– Satisfaction of the designated community
– Characteristics of the collection
– Budget
Framework for evaluating preservation strategies [Rauber]
– Utility Analysis
Evaluation of preservation strategies
1. Definition of the objective tree
2. Assignment of measurement units (e.g. millimetre, Mb, Euro)
3. Identification of preservation alternatives
4. Execution of preservation alternatives and evaluation of the outcome
5. Weighting of criteria in the objective tree
6. Calculation of partial and total values
7. Ranking of alternatives
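Steps 5 to 7 can be sketched as a weighted sum over the objective tree. The tree, the weights, the alternatives and their scores below are all invented for illustration, and the sketch assumes that the measurements from step 4 have already been normalised to a common 0 to 1 scale; it is not the framework's reference implementation.

```python
# Hypothetical objective tree: inner nodes have weighted children,
# leaves are criteria that receive a score for each alternative.
OBJECTIVE_TREE = {
    "children": {
        "object characteristics": {
            "weight": 0.6,
            "children": {
                "content preserved": {"weight": 0.7},
                "layout preserved": {"weight": 0.3},
            },
        },
        "costs": {
            "weight": 0.4,
            "children": {"cost per object": {"weight": 1.0}},
        },
    },
}

# Hypothetical normalised scores (0 = worst, 1 = best) per alternative.
ALTERNATIVES = {
    "migrate to PDF": {
        "content preserved": 0.9,
        "layout preserved": 0.8,
        "cost per object": 0.5,
    },
    "migrate to plain text": {
        "content preserved": 0.7,
        "layout preserved": 0.0,
        "cost per object": 0.9,
    },
}


def utility(name, node, scores):
    """Partial value of a node: the leaf score, or the weighted sum of children."""
    children = node.get("children")
    if not children:
        return scores[name]
    return sum(
        child["weight"] * utility(child_name, child, scores)
        for child_name, child in children.items()
    )


def rank(tree, alternatives):
    """Total value per alternative, best first."""
    totals = {alt: utility("root", tree, s) for alt, s in alternatives.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for alternative, total in rank(OBJECTIVE_TREE, ALTERNATIVES):
        print(f"{alternative}: {total:.3f}")
```

Because the weights at each level sum to 1, the total value stays on the same scale as the leaf scores, which keeps alternatives directly comparable.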
Objective tree (example)
Research problems
– Automation of preservation processes
– Authenticity issues
– Cost management
– Evaluation of preservation alternatives
Research questions
Is it feasible to design and implement a system that is able to automatically:
– determine the amount of data loss that occurred in a migration and generate detailed migration reports for inclusion in the objects’ preservation metadata?
– provide recommendations of migration paths or target formats that will best suit users’ requirements?
Proposed System
Methodology - proof of concept
The concepts
1. Automatic quantification of the data loss that occurred in a migration and generation of preservation metadata
2. Automatic recommendation of migration strategies as well as target formats
The proof (empirical validation)
1. Evaluator versus human experts
2. Advisor versus evaluation framework
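One possible way to carry out the "Evaluator versus human experts" comparison is to check whether the automatic data-loss scores order a set of migrated objects the same way the experts do. The sketch below uses Spearman's rank correlation for that purpose; the choice of statistic and the sample scores are assumptions made for illustration, since the slides do not prescribe a comparison method. The simple formula used here assumes there are no tied scores.

```python
def ranks(values):
    """Rank positions (1 = smallest) for a list of values without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equally long, tie-free sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical data-loss scores for five migrated objects (0 = no loss).
    evaluator_scores = [0.05, 0.40, 0.10, 0.75, 0.20]
    expert_scores = [0.10, 0.35, 0.05, 0.80, 0.30]
    print(f"rank agreement: {spearman(evaluator_scores, expert_scores):.2f}")
```

A value close to 1 would indicate that the Evaluator and the experts agree on which migrations lost the most data, even if their absolute scores differ.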
Key contributions
For individual preservers, digital archives and libraries:
– Outsourcing and automation of digital preservation
– Generation of preservation metadata (authenticity)
– Ranking of migration alternatives
For designers and programmers of converters:
– Possibility of publishing their converters as services
For metadata creators and users:
– Increase adoption
– Help to improve future versions
– Accelerate the development of XML bindings
Round-up
Service oriented architecture (SOA)
– Automatic quantification of data loss
– Provides recommendations on which migration paths or target formats are best suited for each user
– Simplifies the creation of preservation metadata
– Based on migration
Methodology
– Proof of concept with empirical validation
  Evaluator versus human experts
  Advisor versus evaluation framework
Topics for discussion
Relevance of research
Research methodology
System architecture
Format registry vocabulary
– e.g. MIME types, TOM type descriptors, Global Digital Format Registry, PRONOM, etc.
Preservation metadata schema
– e.g. PREMIS data dictionary (event entity)