IBM Experiences in Digital Collection Building Virtual Libraries


1 IBM Experiences in Digital Collection Building Virtual Libraries

2 Night Watch by Rembrandt, 1642 A few years ago the painting was damaged by a mentally disturbed man with a knife and had to be repaired. Digital Collection Building

3 Long Term Preservation
What is digital preservation?
Digital preservation can be defined as: "The act of maintaining information in a correct and independently understandable form, over the long term" (OAIS).
Safe storage and permanent access are both part of digital preservation.
Connection between the research area of digital preservation and a national digital repository for digitized cultural heritage.

4 Long Term Preservation challenge
The sense of urgency to resolve the problem is increasing:
More and more objects are only available in digital form ("born digital")
There is no mechanism for systematic preservation of information published on the Web
National digital repository for digitized cultural heritage
No complete integrated solutions exist yet, but:
there are many initiatives in the academic world
the topic is on the agenda of the European Commission
The DIAS system can be a solution:
to structure the complex problem
to resolve, already today, a part of the problem
to facilitate gradual incorporation of further partial solutions

5 Technology Preservation
Within Long Term Preservation (LTP) three areas of interest can be identified: medium, technology, and intellectual preservation.
Medium Preservation
Medium preservation is the concern for preserving the medium on which information is stored, such as tapes, disks, optical disks, CD-ROMs and the like.
Technology Preservation
Technology obsolescence is even more of a problem than medium decay, so steps of technology preservation must be undertaken. Rather than simply refreshing, we also need migration and emulation:
migrating information forward through technology / format stages as they become available and as the old technologies / formats cease being supported by vendors and the user community
emulating old and obsolete technologies / formats on current technology platforms
Intellectual Preservation
There remains a third preservation requirement, intellectual preservation, which addresses the integrity and authenticity of the information as originally recorded.
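The integrity side of intellectual preservation is commonly supported by a fixity check: a digest is recorded when an object enters the archive and recomputed later to detect silent changes. The slides do not say how DIAS does this, so the following is only a minimal sketch under that assumption; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib


def compute_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte stream (assumed fixity algorithm)."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """True if the stored bytes still match the digest recorded at ingest."""
    return compute_digest(data) == recorded_digest


# Hypothetical object: the digest is recorded once, at ingest time.
original = b"example publication bytes"
recorded = compute_digest(original)
```

A matching digest only shows that the bits are unchanged; it says nothing about whether the object can still be rendered, which is what migration and emulation address.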

6 Introduction Koninklijke Bibliotheek – Library of the Netherlands
KB was one of the first libraries to develop an electronic deposit with long-term preservation as the key objective
Koninklijke Bibliotheek (KB)
Medium-sized national library, founded in 1798
Financed by the Ministry of Education, Culture and Science
Annual budget: € 50 million
Digital archiving and R&D
Permanent preservation: € 1.1 million structurally allotted to staff and system maintenance; € 1.2 million permanently dedicated to research
Digital archiving and preservation embedded in the organization
Departments: e-Depot (electronic publications), Digital Preservation, IT

7 Mission The KB is the National Library of the Netherlands
We give researchers and students access to research information
We enable everyone to share in the riches of our cultural heritage
We foster the national infrastructure for scientific information
We further permanent access to digital information within an international context

8 E-Depot – Digital Information Archiving System
KB is responsible for all publications appearing in the Netherlands (and international publications from 2002).
At the heart of the e-Depot system is a technical component called DIAS (Digital Information Archiving System), based on the OAIS (Open Archival Information System) Reference Model.
This Reference Model establishes a common framework of terms and concepts which comprise an Open Archival Information System (OAIS). It is also a technical recommendation for use in developing a broader consensus on what is required for an archive to provide permanent, or indefinite long-term, preservation of digital information.
The e-Depot system at the KB is one of the first operational digital archives based on the OAIS worldwide.

9 Digital preservation research
KB's research initiatives are geared towards ensuring long-term access and focus on two digital preservation strategies: emulation and migration.
Emulation aims to render the digital objects in their original form and to preserve all functionality of the objects.
The migration strategy is focused on converting the digital objects to provide access to them in a future computer environment.
The digital preservation department is looking into properties of file formats, preservation metadata and the development of a module called the Preservation Manager.
The Preservation Manager contains information about how to render digital objects on the file format level.
The Preservation Manager will provide the information necessary to be able to plan actions before a specific file type becomes inaccessible.
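The planning role described for the Preservation Manager can be pictured as a registry of file formats that is queried for formats approaching obsolescence. The sketch below is purely illustrative: the record fields, the `support_ends` year, the planning horizon and the format names are all assumptions, not part of the actual Preservation Manager.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FormatRecord:
    """Hypothetical registry entry describing how one file format is rendered."""
    name: str
    rendering_tool: str
    support_ends: Optional[int] = None  # assumed year support ends; None if unknown


def formats_needing_action(records: List[FormatRecord],
                           current_year: int,
                           horizon_years: int = 5) -> List[str]:
    """List formats whose assumed end of support falls within the planning horizon."""
    at_risk = []
    for record in records:
        if record.support_ends is None:
            continue  # no obsolescence estimate recorded for this format
        if record.support_ends - current_year <= horizon_years:
            at_risk.append(record.name)
    return at_risk


# Made-up registry contents, for illustration only.
registry = [
    FormatRecord("format-a", "viewer-a", support_ends=2030),
    FormatRecord("format-b", "viewer-b", support_ends=2050),
    FormatRecord("format-c", "viewer-c"),
]
```

Calling `formats_needing_action(registry, current_year=2027)` would flag only `format-a`, which is the point at which a migration or emulation action could be scheduled.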

10 Scale
Current activities are focused on e-publications based on PDF
Volume: 8 million e-publications currently; 10 million e-publications eventually (based on current agreements)
Size: 1 e-publication equals 1 MB on average; 1 terabyte for every 1 million publications
Capacity: 5,000 – 50,000 e-publications ingested per day
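The sizing figures on this slide can be checked with a few lines of arithmetic. This is a sketch that assumes decimal units (1 TB = 1,000,000 MB), which is what the slide's "1 terabyte for every 1 million publications" implies; the function names are illustrative.

```python
MB_PER_PUBLICATION = 1   # average size stated on the slide
MB_PER_GB = 1_000        # decimal units, assumed
MB_PER_TB = 1_000_000    # decimal units, assumed


def storage_tb(publications: int, mb_each: float = MB_PER_PUBLICATION) -> float:
    """Total archive size in terabytes for a given number of publications."""
    return publications * mb_each / MB_PER_TB


def daily_ingest_gb(publications_per_day: int,
                    mb_each: float = MB_PER_PUBLICATION) -> float:
    """Daily ingest volume in gigabytes."""
    return publications_per_day * mb_each / MB_PER_GB


current_tb = storage_tb(8_000_000)     # 8 million e-publications today
eventual_tb = storage_tb(10_000_000)   # 10 million under current agreements
low_gb, high_gb = daily_ingest_gb(5_000), daily_ingest_gb(50_000)
```

Under these assumptions the archive holds about 8 TB today and 10 TB eventually, and the ingest capacity corresponds to roughly 5 to 50 GB per day.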

11 Requirements of a national digital repository
A function for depositing collections and corresponding metadata
A way of searching in deposited collections
A request function for specific TIFF files or a complete collection of TIFF files
A delivery function for the requested files
A security system that should prevent the unwarranted use of collections by other institutions or third parties
The service should be available on a daily basis and independent of the location of the institutions
Development of a data model that facilitates long-term preservation and permanent access to digital images

12 Implementation of research results:
Upcoming challenges
Growth will increase dramatically with the introduction of new collections
Increasing capacity: ingest of digitised master images, ingest of websites, 10 TB to 500 TB
Implementation of research results: characterisation, operational Preservation Manager, migration module, emulator

13 Digitization The KB is involved in many national and international digitization projects
The Memory of the Netherlands Metamorfoze IMPACT: IMProving ACcess to Text Atlantic World: Dutch-American relationship since the 17th Century Atlas Van der Hagen and Atlas Beudeker Bibliotheca Universalis- Linschoten Bookbindings The Anglo-Dutch heritage Transatlantic Digital Library

14 Digitization
KB has looked into the potential strategies to store digitized material in its electronic deposit system
Each digitized page results in:
One high quality master image (TIFF or PNG or JPEG2000)
One or more derived lower quality images (e.g. JPG for access)
Multiple machine-readable text files (e.g. obtained through OCR)
Descriptive metadata
Technical metadata about the digitization process
Structural metadata describing the layout of the page (e.g. separate articles)
In most cases, each digitized page is part of a larger whole
Most books and newspapers are multipage items
This requires additional structural metadata to describe images as part of the whole
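The per-page outputs and the page-to-item relationship listed above can be written down as a small data model. This is a hedged sketch, not the KB's actual data model: the class names, field names and file names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitizedPage:
    """Hypothetical record of the files produced for one digitized page."""
    sequence: int                      # structural metadata: position in the item
    master_image: str                  # e.g. a TIFF, PNG or JPEG2000 file name
    access_images: List[str] = field(default_factory=list)
    text_files: List[str] = field(default_factory=list)   # e.g. OCR output
    technical_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DigitizedItem:
    """Hypothetical multipage item (book, newspaper issue) built from pages."""
    title: str                         # stands in for the descriptive metadata
    pages: List[DigitizedPage] = field(default_factory=list)

    def ordered_masters(self) -> List[str]:
        """Master images in reading order, using the structural sequence number."""
        return [p.master_image for p in sorted(self.pages, key=lambda p: p.sequence)]


item = DigitizedItem(
    title="Example newspaper issue",
    pages=[
        DigitizedPage(2, "page2.tif", ["page2.jpg"], ["page2.txt"]),
        DigitizedPage(1, "page1.tif", ["page1.jpg"], ["page1.txt"]),
    ],
)
```

The single descriptive record at item level, with structural order kept per page, mirrors the point that a page is usually part of a larger whole.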

15 Digitization Open questions
How to link and manage the different representations of the same object?
One physical representation, one digital master, multiple derived files, but only one bibliographic description
MPEG-21 DIDL
Which file format to choose?
TIFF is very inefficient for storage; JPEG2000 and PNG are more efficient
Certainly not JPG (lossy compression)
Is lossless compression allowed to reduce storage volume?
At what level? Compressed file formats, zip/tar packages, storage hardware compression
Which files constitute an AIP?
A single master image (not a correct representation of a multipage intellectual entity; relations to be resolved outside the AIP)
Full set of master images for each intellectual entity
Including OCR text
Are access copies to be considered a new AIP obtained through migration?
Access copies vs. preservation copies
Storing a full set of access copies vs. generating access copies on the fly
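The question of whether lossless compression is acceptable rests on one property: the original bytes can be recovered exactly. The sketch below demonstrates that property with Python's standard `zlib` module on made-up data; it illustrates the principle only and is not a recommendation for any particular level (file format, package or storage hardware).

```python
import zlib

# Made-up, highly repetitive "image" data standing in for an uncompressed master.
master = bytes(range(256)) * 400

compressed = zlib.compress(master, 9)   # lossless DEFLATE compression
restored = zlib.decompress(compressed)  # must reproduce the master bit for bit

saved_fraction = 1 - len(compressed) / len(master)
```

Because `restored` equals `master` exactly, no information is lost; a lossy format such as JPG offers no such guarantee, which is why the slide rules it out for masters.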

16 Component Business Model
Conceptual approach: a Business Component Map is a tabular view of the business components in the scope of interest (Source: Internal 2004)
Columns are Business Competencies, defined as large business areas with characteristic skills and capabilities, for example, product development or supply chain.
A Business Component is a part of an enterprise that has the potential to operate independently, in the extreme as a separate company, or as part of another company.
An Accountability Level characterizes the scope and intent of activity and decision-making. The three levels used in CBM are Directing, Controlling and Executing.
Directing is about strategy, overall direction and policy.
Controlling is about monitoring, managing exceptions and tactical decision making.
Executing is about doing the work.
[Figure: example Business Component Map with rows Directing, Controlling and Executing; cells hold business components such as Business Planning, Sector Planning, Portfolio Planning, Account Planning, Sales Planning, Fulfilment Planning, Credit Assessment, Reconciliation, Compliance, General Ledger and Document Management]

17 KB's Component Business Model The competencies are clustered around 5 business areas
Service Management includes all the competencies needed to deliver collection objects and associated services to the customers of the KB across the supported channels.
Acquisitions Management involves the competencies needed to acquire, process and catalogue all publications, both for the research collection and the deposit collection.
Preservation Management focuses on the competencies needed to facilitate access to the different collections over time, addressing decay or obsolete technology support associated with a collection.
Business Management identifies all the competencies associated with the management of any business.
IT Management relates to the competencies needed to manage the overall IT infrastructure.
KB differs from other organizations because its electronic deposit solution should be operational for hundreds of years without major interruptions.

18 KB's Component Business Model 51 competencies have been identified in order for the KB to operate

19 Digital Information Archiving System Architecture

20 Complexity and data types
Web sites: dynamically generated pages; volatile external references; no generally accepted notion of versioning; security hurdles on web sites; needed web server environment
Applications / CD-ROMs: dependent on operating system and peripheral devices; require user interaction; potential dependencies on additional software (drivers, DLLs)
Static data format: only dependent upon viewer application; format migration will preserve the digital object; autonomous digital object
[Figure axis: Data type]

21 Depending on the main focus different implementation strategies can be identified:
Supplier Focus: quality assurance; automatic ingest; security (identification, authentication, authorization)
Consumer Focus: delivery channels; content metadata; security (identification, authentication, authorization)
Decompose into components: AIP composition; SIP requirements; self-describing; content (bibliographical) metadata; preservation (technical) metadata
Preservation Focus: media preservation; technical metadata; migration tooling; emulation tooling

22 Descriptive Information about the Package
Information Packages are the building blocks of every electronic deposit
Submission Information Package (SIP): the Information Package identified by the producer in the submission agreement with the OAIS.
Archival Information Package (AIP): Content Information and the associated Preservation Description Information required to preserve the Content Information over the long term. This information includes the related Packaging Information.
Dissemination Information Package (DIP): an Information Package that contains part or all of one or more AIPs and that is distributed to the consumer as requested.
[Figure: an Information Package, made up of Content Information and Preservation Description Information, with Descriptive Information about the Package]
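The relationship between the three package types can be sketched as a pair of transformations: ingest turns a SIP into an AIP by adding preservation description information, and access assembles a DIP from parts of one or more AIPs. The classes and functions below are illustrative assumptions, not the DIAS or OAIS interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """Hypothetical base: named content files plus descriptive information."""
    content: Dict[str, bytes]
    descriptive_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class SIP(InformationPackage):
    """Package as submitted by the producer."""


@dataclass
class AIP(InformationPackage):
    """Package as archived, with Preservation Description Information added."""
    preservation_description: Dict[str, str] = field(default_factory=dict)


@dataclass
class DIP(InformationPackage):
    """Package as delivered to the consumer."""


def ingest(sip: SIP, provenance: str) -> AIP:
    """Build an AIP from a SIP, recording provenance as preservation description."""
    return AIP(
        content=dict(sip.content),
        descriptive_info=dict(sip.descriptive_info),
        preservation_description={"provenance": provenance},
    )


def disseminate(aips: List[AIP], wanted: List[str]) -> DIP:
    """Build a DIP holding only the requested files from one or more AIPs."""
    selected = {
        name: data
        for aip in aips
        for name, data in aip.content.items()
        if name in wanted
    }
    return DIP(content=selected)


sip = SIP(content={"article.pdf": b"%PDF", "cover.tif": b"II*"},
          descriptive_info={"title": "Example"})
aip = ingest(sip, provenance="hypothetical publisher")
dip = disseminate([aip], wanted=["article.pdf"])
```

Here the DIP contains part of a single AIP; passing several AIPs to `disseminate` covers the "one or more AIPs" case in the definition.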

23 Producer Focus:
Infrastructure to support different ingest streams (Batch Builder)
Stable SIP and DIP definitions
Alignment of QA processes between supplier and deposit organization
[Figure: DIAS functional blocks (Delivery & Capture, Ingest, Archival Storage, Data Management, Preservation, Administration, Access, Packaging, Monitoring & Logging, Query) with SIP, AIP and DIP flows]

24 Consumer Focus:
Solid identification, authentication and authorization infrastructure
Support for multi-channel delivery (Packaging & Delivery)
Collection building (value-add of deposit organization)
[Figure: DIAS functional blocks (Delivery & Capture, Ingest, Archival Storage, Data Management, Preservation, Administration, Access, Packaging, Monitoring & Logging, Query) with SIP, AIP and DIP flows]

25 Preservation Focus: Management of technical metadata
Strong focus on providing accessibility (identification, performing technical preservation, building rendering environment)
Definition of authenticity criteria
[Figure: DIAS functional blocks (Delivery & Capture, Ingest, Archival Storage, Data Management, Preservation, Administration, Access, Packaging, Monitoring & Logging, Query) with SIP, AIP and DIP flows]

26 Typical DIAS implementation scenario
Requirements analysis: the project starts with a requirements analysis and a fit-gap analysis to see how far DIAS-Core meets the requirements
Application development: possible new application components have to be designed, built and tested
Infrastructure design and implementation: definition and implementation in the customer's environment, conforming to the DIAS-compliant infrastructure
Delivery: through Fast Deploy, deliver the DIAS solution in the different environments (DTA)
Acceptance: acceptance of the system by the customer

27 Interoperability
Interoperability will be key to the success of the next generation of long-term electronic deposit systems

28 Reference Model for an "Open Archival Information System" – ISO 14721

29 IBM's DIAS is based on some OAIS modifications developed in the Networked European Deposit Library (EU project, 2000 – 2002)
Delivery & Capture handles the pre-processing of digital objects to be ingested. It receives or captures digital objects and offers a working space for verification in conformance with the specifications for ingestion into the electronic deposit system.
Packaging & Delivery is the output interface of the deposit system. It handles the post-processing of digital objects retrieved from the electronic deposit system. It negotiates access requests, delivers and installs electronic publications along with the appropriate software for viewing or running the electronic publication, and handles the metadata for direct access by the requestor.
[Figure: DIAS functional blocks (Delivery & Capture, Ingest, Archival Storage, Data Management, Preservation, Administration, Access, Packaging, Monitoring & Logging, Query) with SIP, AIP and DIP flows]

30 DIAS-Core: Technical Components
[Figure: DIAS-Core technical components, labelled: ISIP, IDIP, Ingest, Access, Archival Storage, Preservation Planning, Administration, CM Resource Manager, TSM Storage Server, Loader, Retriever, Data Management, AccessManager, CM Library Server, Logging, Reporting, Preservation Manager, Monitoring and Control]

31 DIAS-Core: Application Architecture
[Figure: DIAS-Core application architecture in four layers (Client Layer, Midtier Layer, Server Layer, Physical Storage Layer), labelled: Admin Client, Admin, Access Manager Server, Retriever Client, Retriever, DB2, CM LS, CM RM1..n, Tape, Magnetic Disk, SAN, Loader, TSM, Optical, Access Manager, Logger, Persistent Identifier Generator, Monitoring & Control]

32 WebSphere Application Server
Content Manager, together with other members of the DB2, Tivoli and WebSphere product families, covers the majority of components in the e-Depot architecture
[Figure: e-Depot functional blocks (Delivery & Capture, Ingest, Archival Storage, Data Management, Preservation, Administration, Access, Packaging, Monitoring, External System) mapped to Content Manager, DB2, WebSphere Application Server and TSM]

33 Delivery&Capture and Package&Delivery
Java applications running on WebSphere Application Server take care of specialized DIAS functions and customer-specific requirements
Ingest and Access: implement SIP and DIP interfaces; translate and validate customer metadata to DIAS metadata
Delivery & Capture and Package & Delivery: customer functions to create SIPs and extract DIPs
Customer functions for Preservation Manager: implement functions to monitor file formats and support migration of file formats

34 Reference on long-term preservation
Raymond Lorie and Raymond J. van Diessen: Long-Term Preservation of Complex Processes, in: IS&T Archiving Conference, Washington, DC, April 26-29, 2005.
J.R. van der Hoeven, R.J. van Diessen and K. van der Meer: Development of a Universal Virtual Computer (UVC) for long-term preservation of digital objects, Journal of Information Science, vol. 31(3), p , 2005.
Raymond van Diessen and Raymond Lorie: UVC: A Universal Computer for Long-Term Preservation of Digital Information, RJ 10338, IBM Almaden Research Center, San Jose, CA, 2005.
Eric Oltmans, Raymond J. van Diessen and Hilde van Wijngaarden: Preservation Functionality in a Digital Archive, in: ACM/IEEE Joint Conference on Digital Libraries, Tucson, AZ, June 7-11, 2004.
Raymond J. van Diessen and Titia van der Werf-Davelaar: Authenticity in a Digital Environment, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN:
Raymond J. van Diessen: Preservation Requirements in a Deposit System, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN:
Raymond J. van Diessen and Ben J. van Rijnsoever: Managing Media Migration in a Deposit System, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN:
Raymond J. van Diessen and Johan F. Steenbakkers: The Long-Term Preservation Study of the DNEP Project - an Overview of the Results, in: IBM / KB Long-term Preservation Study Report Series, IBM Global Services Netherlands, 2002, ISBN/ISSN: X

35 DIAS Solution: http://www.ibm.com/nl/dias/
Important sites related to DIAS and IBM's long-term preservation effort:
DIAS Solution:
IBM Alphaworks UVC:
KB:
Kopal:

