Controlled vocabularies, metadata standards and linked data for digital preservation The case of Sapienza Digital Library Mariella Guercio, University.

1 Controlled vocabularies, metadata standards and linked data for digital preservation: the case of Sapienza Digital Library
Mariella Guercio, University of Rome Sapienza, Digilab (maria.guercio@uniroma1.it)

2 digital preservation and linked data: remarks and open questions

3 digital preservation & linked data - 1
Source: http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Digital_Preservation
From the final report of the W3C Library Linked Data Incubator Group (and its small section on digital preservation):
“Preservation of digital objects in the long term is a challenging activity which is not limited to storage and back-up: it involves complex strategies aiming at providing a trusted environment where digital objects can evolve along with the changes in technology, hardware and software environments. To manage these evolutions, strategies such as emulation and migration have to be proceeded. In order to do so, it is necessary to collect, store and manage all the information relevant to preserve a digital object through its lifecycle. This is usually done by collecting preservation metadata, i.e. metadata about digital objects, their formats, the events they have met throughout their lifecycle, etc. […] Linked data provides a global environment for describing the objects and their significant properties, also allowing to avoid duplication of efforts when describing for instance data formats”

4 digital preservation & linked data - 2
Source: http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Digital_Preservation
Use case scenario (mainly on formats, but a good, simple starting point):
“A librarian needs to undertake a preservation action on a subset of his collection which is subject to obsolescence. For instance, he wants to transform all TIFF files in his data registry into JPEG2000 files […]
Application of linked data for the given use case: A lot of metadata is needed to realize this use case:
– metadata about the object
– metadata about file formats and associated tools
– metadata about events and agents involved in the events
– metadata about versions of objects”
“The most technical and/or non institution-specific bits of metadata, such as description of formats, tools or events, can be mutualized and shared across the library community globally, so as to avoid duplication of effort between institutions actually undertaking similar actions”
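
The use case can be illustrated with a minimal sketch in plain Python, assuming a toy representation of triples as tuples rather than a real RDF library. It shows the split the quotation describes: format descriptions live in a shared registry, local records only point at registry URIs, and selecting the TIFF files to migrate becomes a join across both. The registry URIs, the `riskStatus` and `hasFormat` properties and the object identifiers are all hypothetical.

```python
# Triples are (subject, predicate, object) tuples; all URIs, properties and
# identifiers below are hypothetical placeholders.
TIFF = "http://registry.example.org/format/tiff"
JP2 = "http://registry.example.org/format/jpeg2000"

# Format descriptions that could be shared across institutions.
shared_registry = [
    (TIFF, "label", "Tagged Image File Format"),
    (TIFF, "riskStatus", "at-risk"),
    (JP2, "label", "JPEG 2000"),
    (JP2, "riskStatus", "preferred"),
]

# Institution-specific statements: each object points at a registry URI
# instead of repeating the format description locally.
local_collection = [
    ("obj:001", "hasFormat", TIFF),
    ("obj:002", "hasFormat", JP2),
    ("obj:003", "hasFormat", TIFF),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def migration_candidates(graph):
    """Objects whose format the shared registry flags as at-risk."""
    at_risk = {s for s, _, _ in match(graph, p="riskStatus", o="at-risk")}
    return sorted(s for s, _, fmt in match(graph, p="hasFormat") if fmt in at_risk)


graph = shared_registry + local_collection
print(migration_candidates(graph))  # ['obj:001', 'obj:003']
```

If the registry later marks JPEG 2000 as at-risk too, every institution that links to it sees the change without editing its own records.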

5 existing work
Source: http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Digital_Preservation
“Digital preservation repository systems such as Fedora Commons [1] and SPAR (Scalable Preservation and Archiving System, developed at the national library of France) are using RDF as their standard to store preservation metadata. The P2 Registry project aims at providing a registry for the description of formats, based on Pronom, in a Linked Data form. The Library of Congress has started to provide preservation vocabularies as Linked Data, notably to describe preservation events and preservation level roles. The California Digital Library is using principles of Linked Data and the REST architecture in the implementation of its digital repository micro-services known collectively as Merritt. Linked Data is used for integration/coordination of functionally distinct curation services”
What else? Can we develop a list for future cooperation?

6 supporting LOD: some remarks - 1
Preservation requires interconnecting a massive quantity of information, much as Internet services do, but for different goals: not just to extend the network itself and its links, but to support trusted data dissemination and usability, and to collect and make available the documentation that supports the reliability, the persistence of content and context, and the presumption of authenticity of digital resources.
Preservation requires a sufficient quantity of qualified and consistent information that is well structured (according to the OAIS PDI typology), standardized and interoperable.
Consistency implies common vocabularies and well-defined ontologies.
These results are feasible only if automated tools are implemented and a life-cycle approach is supported from an early stage, ideally based on standardized guidelines (cross-domain for the general methodology, domain-driven for detailed information):
– a common analysis of the main concepts (provenance, identity, context…) is required first
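
The requirement that information be well structured according to the OAIS PDI typology can be made checkable by software, which also fits the call for automated tools. A minimal sketch, assuming the five PDI categories of the 2012 edition of OAIS (reference, provenance, context, fixity and access rights) with a plain dictionary for each; the field contents and the identifier are hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PDI:
    """Preservation Description Information grouped by OAIS category."""
    reference: dict = field(default_factory=dict)      # e.g. persistent identifiers
    provenance: dict = field(default_factory=dict)     # origin and custody history
    context: dict = field(default_factory=dict)        # relations to other objects
    fixity: dict = field(default_factory=dict)         # e.g. checksums
    access_rights: dict = field(default_factory=dict)  # access and rights conditions


def missing_categories(pdi: PDI) -> list[str]:
    """Names of PDI categories that are still empty, in declaration order."""
    return [f.name for f in fields(pdi) if not getattr(pdi, f.name)]


# Hypothetical record: only reference and fixity have been captured so far.
record = PDI(
    reference={"identifier": "hypothetical-pid-0001"},
    fixity={"algorithm": "sha256", "digest": "placeholder"},
)
print(missing_categories(record))  # ['provenance', 'context', 'access_rights']
```

Running such a check at ingest, rather than years later, makes gaps in provenance or context visible while the people who can fill them are still available.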

7 supporting LOD: some remarks - 2
Significant properties are crucial in many digital preservation scenarios, but the term is ambiguous and is normally used only for format properties and/or persistent identifiers, a small portion of the information prescribed by the OAIS PDI:
– they are relevant concepts but not at all sufficient for planning and managing digital preservation
Existing work on LOD for preservation is limited in quantity and mainly focused on very basic information.
Vocabularies for preservation still lack consistency and quality, and are neither recognized nor preserved (with the exception of OAIS as a basic glossary, PREMIS for technical information and PRONOM for formats).
The relevance of using vocabularies standardized at international or national level (as much as possible: e.g. VIAF, FOAF, TGN) is not sufficiently recognized, often even by the national memory institutions.
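
One practical way to encourage the use of standardized vocabularies is to flag descriptive values that are not authority URIs. A minimal sketch, assuming simplified URI patterns for VIAF (`http://viaf.org/viaf/<number>`) and the Getty TGN (`http://vocab.getty.edu/tgn/<number>`); the record and its identifiers are hypothetical, and real authority URIs may take other forms.

```python
import re
from typing import Optional

# Simplified URI patterns for two authority files named on the slide.
AUTHORITY_PATTERNS = {
    "VIAF": re.compile(r"^https?://viaf\.org/viaf/\d+$"),
    "TGN": re.compile(r"^https?://vocab\.getty\.edu/tgn/\d+$"),
}


def authority_of(uri: str) -> Optional[str]:
    """Return the name of the authority a URI belongs to, or None."""
    for name, pattern in AUTHORITY_PATTERNS.items():
        if pattern.match(uri):
            return name
    return None


def uncontrolled_values(record: dict) -> list[str]:
    """Fields whose value is not a URI from a recognized authority."""
    return [key for key, value in record.items() if authority_of(value) is None]


record = {
    "creator": "http://viaf.org/viaf/123456",       # hypothetical VIAF ID
    "place": "http://vocab.getty.edu/tgn/1234567",  # hypothetical TGN ID
    "publisher": "Sapienza (free text)",
}
print(uncontrolled_values(record))  # ['publisher']
```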

8 from potentiality to implementation: (some) general questions
– Which requirements must be met to make this technology efficient, meaningful and widely accepted by memory institutions?
– Which level of mediation is implied?
– How relevant will domain knowledge be, and what role will memory institutions play as the only developers and owners of qualified and trusted metadata schemas?
– How can memory institutions be made aware of this potential and their role be increased (this is not a technical question and is not limited to the techniques for LOD publication)?
– Which degree of granularity?
– Which level of standardization is required and possible? Which specific standards should be considered or further developed?

9 from potentiality to implementation: (some) questions on digital preservation
– How much effort is required to build ontologies and LOD to support digital preservation?
– Is OAIS a sufficiently detailed model (specifically with reference to PDI) to ensure feasibility and efficiency for cross-domain outputs? If not, which steps must be undertaken?
– How can cooperation among research projects be improved and given a more productive direction?
– Is it possible to integrate OAIS PDI with general guidelines and propose a standardized approach for building common structures of information packages, where provenance, context, reference, fixity and rights information are supported by common protocols?
– Are the CASPAR-APARSEN proposals on authenticity evidence records a viable perspective, and can they be supported by linked open data technologies?
– Which communities, beyond the traditional digital heritage curators, could be fruitfully involved in cooperation in these directions?
– Which existing standards could be relevant? Only ISO 16363? Which other standardization initiatives can be undertaken or recognized?

10 a focus on Sapienza Digital Library: a challenging project for access and preservation

11 digital library projects: critical aspects and challenges for access and preservation - 1
Digital libraries provide the most suitable environment for LOD implementation (good intermediation and workflows are available, and massive open access is provided), but:
– provenance and context are not always identified as crucial components (see the case of EUROPEANA and its lack of attention, in its first phase, to the complex interrelationships among digital objects)
– archival standards are often recognized for their general value in handling and describing complex relations and aggregations and in supporting standardized (but also flexible) life-cycle management systems, but they are not yet fully implemented outside archival information systems

12 digital library projects: critical aspects and challenges for access and preservation - 2
Compliance with standards is generally stated and partially ensured, but mainly as a static and flat model of representation; the differentiation of digital resources is not usually covered:
– many digital libraries are limited to the identification of individual resources and have developed simple (poor) search interfaces
The integration (not convergence) of heterogeneous cultural information access systems is a key question, but a balance is required between specificity, detail and a general perspective:
– retrieval functionality must be easy to use but not trivial, and new forms of intermediation are required (particularly when digital resources are complex and articulated).

13 the Sapienza Digital Library – SDL project - 1
An advanced prototype of a digital library with ambitious goals:
ensuring digital accessibility of cross-domain information content and its complex aggregations (collections, sub-collections, fonds, series, subseries, folders, collection partitions at various levels, etc.) created by Sapienza University research communities, archivists, librarians and museum experts, or made available by corporate bodies or individuals connected with the academic environment
developing and implementing a cross-domain harmonization of descriptive practices for new resources by adopting and adapting national and international standards and recommendations (not only those of the library sector): MODS and Dublin Core, but also EAD/EAC and discipline-oriented standards
supporting wide use of controlled vocabularies (based on trusted international or national initiatives and to be developed in the form of linked open data):
– PICO (Portale della Cultura Italiana, Ministry for cultural heritage) and Nuovo soggettario di Firenze for subjects
– TGN (Getty Thesaurus of Geographic Names)
– VIAF (Virtual International Authority File)
– other internationally recognized vocabularies for specific disciplinary domains
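
Complex aggregations matter for preservation because an item's provenance and context often live at a higher level of the hierarchy. A minimal sketch, assuming a simplified archival hierarchy (fonds, series, subseries, folder, item) and a creator stored only where it is known; the titles and the VIAF-style URI are hypothetical, and the ordering of levels is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified aggregation levels, broadest first (an assumption for this sketch).
LEVELS = ["fonds", "series", "subseries", "folder", "item"]


@dataclass
class Unit:
    title: str
    level: str
    parent: Optional["Unit"] = None
    creator: Optional[str] = None  # e.g. an authority URI

    def __post_init__(self):
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        if self.parent is not None and (
            LEVELS.index(self.parent.level) >= LEVELS.index(self.level)
        ):
            raise ValueError("a unit must be at a narrower level than its parent")

    def path(self) -> list[str]:
        """Titles from the top-level aggregation down to this unit."""
        return (self.parent.path() if self.parent else []) + [self.title]

    def inherited_creator(self) -> Optional[str]:
        """Nearest creator found walking up the hierarchy, or None."""
        node = self
        while node is not None:
            if node.creator:
                return node.creator
            node = node.parent
        return None


fonds = Unit("Hypothetical Fonds", "fonds", creator="http://viaf.org/viaf/123456")
series = Unit("Correspondence", "series", parent=fonds)
item = Unit("Letter 1", "item", parent=series)
print(item.path())               # ['Hypothetical Fonds', 'Correspondence', 'Letter 1']
print(item.inherited_creator())  # http://viaf.org/viaf/123456
```

A flat model that describes "Letter 1" alone would lose both the path and the creator; keeping the parent link preserves them.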

14 the Sapienza Digital Library – SDL project - 2
Supporting interoperability with other kinds of information management systems (libraries/archives/museums/universities, open access repositories...)
Supporting a life-cycle management system by developing a metadata framework able to meet the following requirements at the creation phase (for later preservation):
– compliance with OAIS
– acquisition of a comprehensive set of descriptive information that can be exchanged with other digital library systems or other information management systems
– acquisition of essential metadata enabling long-term management (such as persistent identifiers, automatically created fixity information, and provenance and context information based on descriptive standards and on the preservation information collected during the digitization process and at the submission phase)
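
Fixity information "automatically created" at submission can be as simple as a checksum computed when the file arrives and recomputed later. A minimal sketch using Python's standard `hashlib` and `uuid` modules; the URN-from-UUID identifier is a hypothetical stand-in for whatever persistent identifier scheme a repository actually uses.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def submission_metadata(payload: bytes, algorithm: str = "sha256") -> dict:
    """Create identifier and fixity metadata automatically at submission."""
    digest = hashlib.new(algorithm, payload).hexdigest()
    return {
        # Hypothetical identifier scheme: a URN built from a random UUID.
        "identifier": uuid.uuid4().urn,
        "fixity": {"algorithm": algorithm, "digest": digest},
        "submitted": datetime.now(timezone.utc).isoformat(),
    }


def verify_fixity(payload: bytes, fixity: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    return hashlib.new(fixity["algorithm"], payload).hexdigest() == fixity["digest"]


meta = submission_metadata(b"scan of page 1")
print(verify_fixity(b"scan of page 1", meta["fixity"]))           # True
print(verify_fixity(b"scan of page 1 (edited)", meta["fixity"]))  # False
```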

15 the SDL project: the preservation model
It is based on the Open Archival Information System (OAIS) functional model.
It aims to develop OAIS-compliant services able to support the requirements established by ISO 16363 for certification and audit.
The SIPs' preservation metadata are encoded using the PREMIS standard semantics:
– actions are treated as Events in the digital curation workflow, performed under specific conditions, formally defined and linked to the relevant Rights statements
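
Treating actions as Events "performed under specific conditions" and "linked to the relevant Rights statements" can be sketched as a guard. An event is only recorded if the linked rights statement grants the act. This is a minimal sketch with fields loosely named after PREMIS semantic units (event type, date/time, outcome, linking object and agent). The direct event-to-rights link is a simplification for this sketch rather than the PREMIS structure, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RightsStatement:
    identifier: str
    basis: str               # e.g. "copyright" or "license"
    granted_acts: frozenset  # acts the statement permits, e.g. {"migration"}


@dataclass(frozen=True)
class Event:
    identifier: str
    event_type: str
    date_time: str
    outcome: str
    linking_object: str
    linking_agent: str
    linking_rights: str      # simplification: direct event-to-rights link


def record_event(event_id, event_type, obj, agent, rights, outcome="success"):
    """Create an Event only if the linked rights statement grants the act."""
    if event_type not in rights.granted_acts:
        raise PermissionError(f"{rights.identifier} does not grant {event_type!r}")
    return Event(
        identifier=event_id,
        event_type=event_type,
        date_time=datetime.now(timezone.utc).isoformat(),
        outcome=outcome,
        linking_object=obj,
        linking_agent=agent,
        linking_rights=rights.identifier,
    )


# Hypothetical identifiers throughout.
rights = RightsStatement("rights:001", "license", frozenset({"migration", "replication"}))
event = record_event("event:042", "migration", "obj:001", "agent:sdl", rights)
print(event.event_type, event.linking_rights)  # migration rights:001
```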

16 the SDL project: relevance of a linked data approach
According to our stakeholders (the internal working group on linked data for SDL), in the first phase linked data will concern rights analysis and local controlled vocabularies (in terms of collection responsibilities, such as creator and preserver, and their provenance and context information).
The identified benefits will include:
– the capacity to make content available and to share data (social marketing)
– increased discoverability and a richer presentation of both SDL content and external information
– better integration and connection of multiple sources
– the creation of a larger community
– the possibility of maximizing data usefulness and reusability
– new forms of conceptual browsing and alternative visualizations
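
Integration from multiple sources follows largely from reusing the same URIs. A minimal sketch, assuming two hypothetical sources that record collection responsibilities (creator, preserver) as (collection, role, agent) triples and share a VIAF-style agent URI; all identifiers are invented. Indexing by agent gives a simple form of conceptual browsing across both sources with no source-specific mapping.

```python
from collections import defaultdict

# Hypothetical sources describing collection responsibilities.
sdl = [
    ("sdl:coll/1", "creator", "http://viaf.org/viaf/111"),
    ("sdl:coll/1", "preserver", "local:agent/library"),
]
external = [
    ("ext:coll/9", "creator", "http://viaf.org/viaf/111"),
]


def browse_by_agent(*sources):
    """Group (collection, role) pairs by agent URI across all sources."""
    index = defaultdict(list)
    for source in sources:
        for collection, role, agent in source:
            index[agent].append((collection, role))
    return dict(index)


view = browse_by_agent(sdl, external)
print(view["http://viaf.org/viaf/111"])
# [('sdl:coll/1', 'creator'), ('ext:coll/9', 'creator')]
```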

17 the SDL project: challenges not yet solved or faced
– the development of new and effective organizational models and workflows
– easier LOD creation
– a more comprehensive and systematic strategy for rights management
... but we are just starting: tests have been done, and a prototype will be available for use at the end of July

