Archiving Digital Resources for Future

Presentation on theme: "Archiving Digital Resources for Future"— Presentation transcript:

1 Archiving Digital Resources for Future
Shigeo Sugimoto
Research Center for Knowledge Communities
Grad. School of Library, Information and Media Studies
University of Tsukuba, Japan

2 Personal Backgrounds
Born in Osaka, Japan in 1953
Education: BE, ME and PhD from Dept. Information Science, Faculty of Engineering, Kyoto University
  Software Engineering and Programming Languages
Job: Faculty at a LIS school in Tsukuba since 1983
Research: Digital Libraries, Digital Archives, Metadata
International activities:
  Dublin Core Metadata Initiative
  Digital libraries, preservation, metadata research conferences
  Consortium of information Schools in Asia-Pacific (CiSAP)
Governmental Committee works: Records Management, Digital Archive and Publishing Digital Resources for National Diet Library, National Archives of Japan

3 Goal of this Talk
Discuss issues for long-term use of digital resources as an important asset of our knowledge-centric society
View digital archives as well-organized collections of information and knowledge resources in our networked information society
Understand issues for further development of digital archives from broad perspectives

4 Outline
Introduction
Terms
Digital Archive
Digital Preservation
Metadata
Concluding Remarks

5 Introduction
Our information environment is already “paperless”, e.g.
  Create, deliver, store and access documents via the Internet
  Use paper as a one-time medium for reading and discard it once the contents are read
  Scan in prints, store contents electronically and discard prints
  Use on-line dictionaries more frequently than printed dictionaries
Resources are born in a digital form and used in a digital environment
  Digital Cameras, Smart Phones
  Digital Books and Mobile Reading Devices

6 Introduction
Information resources are easily lost unless special attention is paid to them
  Deterioration of Papers, Films, CDs/DVDs
  Software Obsolescence
Archiving and preservation of digital resources is indispensable for
  Keeping the resources searchable, accessible and usable
  Maintaining the resources for future users

7 Introduction
Long-term use of digital resources is a crucial issue for the networked information society because
  We rely so heavily on the networked information environment today,
  So many information and knowledge resources are published and consumed digitally,
  So many government and corporate records are created and stored digitally, and
  We need to keep our information and knowledge resources for future users, but
  The lifetime of digital media is shorter than that of paper.

8 Introduction
The goal of this talk is to give an overview of digital archives and their long-term use, covering
  Terms and concepts,
  Typical digital archive services,
  Preservation of digital resources,
  Metadata issues, and
  Personal perspectives for the future

9 Terms and Concepts
Resource (Information Resource): Any instance from which we get information, typically a book, a paper, a file or a set of files.
Born Digital Resource: A digital resource created natively in a digital form
Digitized Resource: A resource created by converting a physical resource or non-digital resource into a digital format
Turned Digital Resource: Same as Digitized Resource
Metadata: Data about data (or data about a resource)

10 Terms and Concepts
Digital Archive:
  Collection of digital resources organized and preserved for long-term use of the resources
  Collecting, organizing and preserving digital resources for long-term use
Digital Preservation: Preserving digital objects for long-term use
Digital Curation: Similar to Digital Archive
  Maintaining, preserving and adding value to digital research data throughout its lifecycle (definition by the Digital Curation Centre, UK)

11 Digital Archive – Typical Digital Archives
Web Archive - Archived collection of Web resources, e.g. Internet Archive
Institutional Repository, Scholarly Archive - Archived collection of scholarly resources, e.g. academic institutional repositories, preprints and technical reports archives, electronic theses and dissertations
Digitized collection of cultural and historical resources, e.g. digitized collection of library and museum holdings such as American Memory, World Digital Library
Digital collection of records of governments and corporate bodies

12 Digital Archive – Some Examples
Very High-Tech High-Quality Digitization of Physical Objects
  3D sensing + Virtual Reality technology, e.g. digitization of Bayon at the ruins of Angkor
Massive Digitization of Books and Documents
  Google Books project
  Book digitization by National Diet Library, Japan
    240K books online for off-library use, 570K books for in-house use
  Records Database at Japan Center for Asian Historical Records
    22M images as of , 1.6M catalog records of Japanese Government before World War II from Meiji Era

13 Digital Archive – Some Examples
Collaborative Archives
  Europeana
    “Paintings, music, films and books from Europe's galleries, libraries, archives and museums”
  World Digital Library
    “The World Digital Library (WDL) makes available on the Internet, free of charge and in multilingual format, significant primary materials from countries and cultures around the world.”
  National Digital Archive Project, Taiwan (NDAP)
  Taiwan e-Learning and Digital Archive Program (TELDAP)
    Multi-Disciplinary National Archives

14 Digital Archive – Why Digital Archive?
Easy and flexible access to important resources
  Collect and organize information resources for users in the networked information environment
  Geographical distance has been a fundamental barrier for the general public to access valuable resources stored at major memory institutions, e.g. national libraries, national archives, national museums
  Equal access for anyone to valuable resources is crucial to empower the progress of our knowledge-centric society
  Encourage inter- and cross-disciplinary use of resources

15 Digital Archive – Why Digital Archive?
Preserving digital resources for future users
  Many important resources already exist only in digital forms
  Adding value by maintaining valuable resources for a long period of time
Preserving non-digital resources using digital technologies for future users
  Physical resources may be broken or lost by disaster

16 Digital Archive – Why Digital Archive?
Preserving born digital resources
  Preparation for the growth of digital publishing and electronic records of governments
    Legal deposit of digitally published resources
    Digital archives of e-government records
  Preserving database contents for future use
    Scientific databases, statistics databases, etc.
  Preserving Web and Internet resources
    Open Web and Hidden Web
    Institutional Web (Intranet Web)

17 Preservation: Keep Resources Accessible and Usable
[Diagram: Archival Functions. Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Preservation (Keep Resources Accessible and Usable)]

18 Preservation: Keep Resources Accessible and Usable
[Diagram: Archival Functions. Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Trusted Preservation (Keep Resources Accessible and Usable)]

19 Sharing Preservation Repository
[Diagram: Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Trusted Repository; Sharing Preservation Function]

20 Digital Preservation - Fundamental Issues -
In general, the lifetime of digital resources is short
  Rapid progress and change of technologies
  Hardware issues: Short lifetime of electronic memory media and their players, e.g. Floppy, CD, DVD, LD, Video Tapes/Cassettes, Audio Tapes/Cassettes, Magnetic Tapes, etc.
  Software issues: Frequent version changes of software tools and dependency on their running environment, e.g., word processors, authoring tools, spreadsheets, browsers, PC operating systems, etc.

21 Digital Preservation - Fundamental Issues -
The diversity of hardware and software is always increasing
  Special purpose software is used for specific contents, which are usually high-end contents
    3D graphics, Virtual Reality, Interactive Contents
The volume of database contents is always increasing
Network oriented digital publishing is growing – paradigm shift toward digital publishing

22 Digital Preservation - Basic Solution -
Migration and Emulation
  Migration: migrate the preserved resources to a new system environment
  Emulation: build an emulator to realize a working environment for the preserved resources
Metadata
  Fundamental component for preservation to record information about a preserved resource and to keep track of its preservation history
  Descriptive, administrative and technical metadata for archiving and preserving resources
  Preservation of metadata and its schema is required

23 Digital Preservation - Basic Solution -
Open Archival Information System (OAIS) [1]
  International Standard Reference Model for Archival Systems: a system framework for archival systems
  Information Package: a package structure to keep a data object for the long term
    Information Object
    Preservation Description Information: metadata for preservation
    Package information: metadata for finding and managing the information package
[1] CCSDS Reference Model for an Archival Information System

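24 Digital Preservation: OAIS
[Diagram: OAIS functional model. The PRODUCER submits a SIP to Ingest; Ingest passes the AIP to Archival Storage and Descriptive Info to Data Management; Access receives queries and orders from the CONSUMER and returns result sets and DIPs; Preservation Planning and Administration span these functions; MANAGEMENT oversees the archive]
IP: Information Package
SIP: Submission IP, AIP: Archival IP, DIP: Dissemination IP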

25 OAIS: Information Object
[Diagram: Data Object + Representation Information = Information Object]
Data Object: Physical or Digital Object
Representation Information: Information required to represent a data object in a meaningful way for users
  Structural, Semantic, Technological information

26 OAIS: Information Package and Content Information
[Diagram: Information Package containing Information Object, Preservation Description Information (PDI) and Packaging Information]
Description about package
Content Information
  Set of information that is the original target of preservation
  Content Data Object together with its Representation Information, i.e. Information Object
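
As a rough illustration of the package structure on this slide, the sketch below models an Information Package in Python. The class and field names are hypothetical, chosen for readability rather than taken from the OAIS text, and a real archive would carry much richer Representation Information.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InformationObject:
    """Content Information: a data object plus what is needed to interpret it."""
    data_object: bytes
    representation_info: Dict[str, str]  # e.g. format name, character encoding


@dataclass
class InformationPackage:
    """Hypothetical container mirroring the three parts named on the slide."""
    content_information: InformationObject
    pdi: Dict[str, str] = field(default_factory=dict)  # Preservation Description Information
    packaging_information: Dict[str, str] = field(default_factory=dict)


package = InformationPackage(
    content_information=InformationObject(
        data_object=b"Hello, archive",
        representation_info={"format": "text/plain", "encoding": "ascii"},
    ),
    pdi={"provenance": "created for illustration"},
    packaging_information={"layout": "single file"},
)

# Without representation information the bytes are only a bit stream;
# with it, a future reader knows how to turn them back into text.
encoding = package.content_information.representation_info["encoding"]
text = package.content_information.data_object.decode(encoding)
```

The point of the design is that the data object never travels alone: the information needed to decode it is stored in the same package.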

27 OAIS: Preservation Description Information
[Diagram: Information Package containing Info. Object, PDI and Packaging Information]
Reference: one or more mechanisms used to provide assigned identifiers for the Content Information, e.g. taxonomic systems, reference systems, registration systems
Context: relationships of the Content Information to its environment
Provenance: history of the Content Information, i.e., origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated
Fixity: Data Integrity checks or Validation/Verification keys used to ensure that the particular Content Information object has not been altered in an undocumented manner
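
Of the four PDI categories, Fixity maps most directly onto code. The following is a minimal sketch, assuming a SHA-256 digest is an acceptable integrity check for the archive in question; the function names `compute_fixity` and `verify_fixity` are illustrative and not part of OAIS.

```python
import hashlib


def compute_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Record which algorithm was used together with the resulting digest."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify_fixity(data: bytes, fixity: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    current = hashlib.new(fixity["algorithm"], data).hexdigest()
    return current == fixity["digest"]


original = b"content information to be preserved"
fixity = compute_fixity(original)

unchanged_ok = verify_fixity(original, fixity)
altered_ok = verify_fixity(original + b" (edited)", fixity)
```

Storing the algorithm name next to the digest matters for long-term use: a later custodian can still verify the object even after the archive has moved to a different default algorithm.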

28 Digital Preservation: Metadata Issues
Metadata for Digital Preservation
  Metadata schema projects based on OAIS PDI
    Cedars, NEDLIB, OCLC/RLG
  METS: Metadata Encoding and Transmission Standard
    Standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library
    Container standard which has seven categories of description
  PREMIS: Preservation Metadata: Implementation Strategies
    PREMIS Data Model
    PREMIS Data Dictionary

29 PREMIS Data Model
Data model shows entities and their relationships
  Intellectual Entities
  Objects
  Events
  Rights
  Agents

30 PREMIS Data Model
[Diagram: Intellectual Entities, Objects, Events, Rights, Agents]
Information objects to be preserved
  Separation of Intellectual Entity and Objects
  Files of the same content and in different formats

31 PREMIS Data Model
[Diagram: Intellectual Entities, Objects, Events, Rights, Agents]
Entities associated with preservation tasks
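
The separation between an Intellectual Entity and its Objects, and the role of Events and Agents, can be sketched as below. This is only an illustration under simplifying assumptions: the class names, field names and the `migrate` helper are hypothetical, they do not reproduce the semantic units of the PREMIS Data Dictionary, and the Rights entity is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str  # person, organization or software that performs an event


@dataclass
class Event:
    event_type: str
    agent: Agent
    detail: str = ""


@dataclass
class PreservedObject:
    identifier: str
    file_format: str
    events: List[Event] = field(default_factory=list)


@dataclass
class IntellectualEntity:
    """One piece of content, possibly held as several files in different formats."""
    title: str
    objects: List[PreservedObject] = field(default_factory=list)


def migrate(entity: IntellectualEntity, source: PreservedObject,
            new_format: str, agent: Agent) -> PreservedObject:
    """Add a new object for the same entity and record the migration as an event."""
    event = Event("migration", agent, f"from {source.file_format}")
    migrated = PreservedObject(f"{source.identifier}.{new_format}", new_format, [event])
    entity.objects.append(migrated)
    return migrated


report = IntellectualEntity("Annual report")
original = PreservedObject("report-001", "doc")
report.objects.append(original)
pdf_copy = migrate(report, original, "pdf", Agent("hypothetical-converter"))
```

After the call, the entity holds two objects with the same content in different formats, and the newer one carries the event that explains where it came from.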

32 Digital Preservation - Fundamental Issues Again-
There is no perfect solution for preservation of digital resources
  There is no perfect solution for preservation of non-digital resources either
Preservation of conventional non-digital resources is mainly preservation of the information media
Digital preservation is primarily preservation of the information contents but not the information media, i.e. preservation of contents but not container
  For example, electronic journals published on the Web use no tangible media, s at governmental sectors could be an official record.

33 Digital Preservation - Fundamental Issues Again-
How do we preserve?
  Preserve the original content in the original binary data in the original format
    Keep the functionality and look-and-feel
    Risk of obsolescence of software and hardware to render and interact with the content
  Convert the original content into a format suitable for long-term use, i.e. widely used standard formats are preferable
    May lose some functionality of digital resources, e.g. hyperlinks, dynamic contents
    Need to identify the important content that has to be preserved

34 Digital Preservation - Fundamental Issues Again-
Confidentiality, Integrity, Authenticity
  Crucial aspects for preservation, especially preservation of official documents
  Confidentiality changes over time
Rights Issues
  Copyright issues
  Privacy issues
Metadata Issues
  Metadata has to be preserved with the primary resources, otherwise the resources would lose their value
  Metadata schema has to be preserved as well, otherwise metadata will lose interpretability
    Semantics of metadata terms have to be recorded and preserved

35 Digital Preservation - Fundamental Issues Again-
Proper preservation management
  Preservation planning based on risk management
    Obsolescence of software and hardware
    Degradation of memory media
Digital preservation is a management issue rather than a technological issue, because
  There is no perfect technological solution to preserve anything forever
  We need to determine what and how information resources should be preserved
  We need to cope with organizational changes of archives and also manage archives under changing social circumstances

36 Digital Preservation - Fundamental Issues Again-
A personal perspective
  We are responsible for preserving resources for the next generation.
  It is not realistic for us to foresee changes in technology and the social environment over 100 or 1000 years.
  Digital technologies change very rapidly, which is disadvantageous for preservation from the viewpoint of stability.
  However, digital resources are easily and flexibly copiable, which is a significant advantage for preservation.

37 Metadata
(Structured) Data about Data
Description about a resource from a certain point of view in accordance with the requirements in the domain
[Diagram: resource, Metadata]

38 Metadata
Users search, access, evaluate a resource, and pay money for the resource on the network
  These tasks are carried out in virtual space, not physical space
  Metadata is required in all tasks of this process
We need to use metadata technology suitable to our applications and also to our network environment
[Diagram: Tasks over the Net]

39 Metadata
Interoperability is a key issue for metadata
  Interoperability across communities
  Interoperability over time --- Preservation
A fundamental barrier is the semantic gap between communities
  Same word for different concepts
  Different words for the same concept
Linked Open Data – Sharing concepts expressed as data, i.e., terms, phrases, etc.

40 Metadata
Promote sharing and reuse of metadata vocabularies
  Metadata vocabulary – a controlled set of terms used to express metadata – is the semantic basis of metadata
  Sharing metadata vocabulary means sharing concepts
Application Profile concept of Dublin Core
  Mixing and matching metadata vocabularies
  Clear separation of metadata vocabularies and structural constraints in a metadata schema

41 Application Profile
A metadata schema (conceptual)
  Title: Mandatory
  Subject: Optional, Repeatable
  Author: Mandatory
  Publisher: Mandatory if applicable

42 Application Profile
Choose appropriate terms for an application scheme
[Diagram: chosen elements Title, Subject, Author, Publisher; source elements Title, Date, Subject, Author, Type, Publisher in Metadata Vocabulary 1 (Metadata Element Set) and Metadata Vocabulary 2 (Metadata Element Set)]

43 Application Profile
Structural constraints for every element
  Title: Mandatory
  Subject: Optional, Repeatable
  Author
  Publisher: If applicable
[Diagram: source elements Title, Date, Subject, Author, Type, Publisher in Metadata Vocabulary 1 (Metadata Element Set) and Metadata Vocabulary 2 (Metadata Element Set)]
Define encoding scheme for implementation
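
The idea of keeping structural constraints separate from the vocabulary terms can be sketched as a small validator. This is a minimal illustration, not a Dublin Core tool: the `PROFILE` table and `validate` function are hypothetical, the constraints follow the conceptual schema shown earlier (Title and Author mandatory, Subject optional and repeatable), and "mandatory if applicable" for Publisher is treated as optional because applicability cannot be checked mechanically.

```python
from typing import Dict, List

# Structural constraints only; the meaning of each term comes from the
# vocabulary it was chosen from, not from this table.
PROFILE = {
    "title": {"mandatory": True, "repeatable": False},
    "subject": {"mandatory": False, "repeatable": True},
    "author": {"mandatory": True, "repeatable": False},
    "publisher": {"mandatory": False, "repeatable": False},  # "if applicable"
}


def validate(record: Dict[str, List[str]], profile=PROFILE) -> List[str]:
    """Return a list of constraint violations; an empty list means the record conforms."""
    errors = []
    for element, rule in profile.items():
        values = record.get(element, [])
        if rule["mandatory"] and not values:
            errors.append(f"missing mandatory element: {element}")
        if not rule["repeatable"] and len(values) > 1:
            errors.append(f"element not repeatable: {element}")
    for element in record:
        if element not in profile:
            errors.append(f"element not in profile: {element}")
    return errors


good = {"title": ["Archiving Digital Resources"], "author": ["S. Sugimoto"],
        "subject": ["digital archives", "metadata"]}
bad = {"title": ["A", "B"], "date": ["2011"]}

good_errors = validate(good)
bad_errors = validate(bad)
```

Because the constraints live in plain data, the same vocabulary terms can be reused under a different profile without touching the validation code.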

44 Some Remarks before Conclusion
A personal perspective learned from the Quake and Tsunami
  Physical things are easily lost.
    Many heritage resources were lost.
    Many PCs and servers were lost.
  More robust infrastructure is required to keep important resources safe and preserve them for the future
    A robust Cloud environment looks advantageous, however
    The current Cloud is too simple to be adopted for archiving important resources

45 Some Remarks before Conclusion
Archival Cloud – a layered architecture
[Diagram: Application Systems / Services; Archiving as a Service, with Collect (Collect, Organize, Re-format, Rights Management) and Access (Search and Access, Browse, Access Control); Preservation as a Service, with Preserve]

46 Some Remarks before Conclusion
A personal perspective for promoting the digital environment at Memory Institutions, e.g. Museums, Libraries, and Archives (MLAs)
  High-tech, high-quality digitization is crucial to increase the potential of MLAs
  Adoption of digital technologies, which should be really usable but need not be high-tech, is crucial to promote usability of resources at MLAs
  Human resource development is crucial to further develop MLAs for the future networked information society

47 Some Remarks before Conclusion
A personal perspective learned from governmental committee works
  Paradigm shift in publishing environment
    Shrinking print publishing market, expanding digital publishing market in Japan
      Print Publishing: 2600 billion yen (1996) → 1900 billion yen (2009)
      Digital Publishing: 0.4 billion yen (2004) → 50 billion yen (2009)
    E-publishing business
      Manga on mobile phones has been growing
      E-book readers and smart phones may expand the market
    Piracy issue for Manga (Comics) and Novels
      Illegal scanlation of weekly Manga magazines
    Rights issues – relationship between publishers and creators

48 Some Remarks before Conclusion
Governmental records management
  Promotion of e-Gov but real change is slow
  New national law for official records management, effective since
  Need improvement of records management and archival services
  Hope to promote national infrastructure for records management and archives
Book digitization at NDL
  NDL, which is a national legal deposit library, is allowed to convert books into digital format for preservation purposes
  NDL and publishers have agreed to make digitized books accessible at public libraries for those books which are not obtainable in the market even if their copyrights are still alive

49 Conclusion
Digital Archive is an important function and service for our society to maintain valuable intellectual resources and preserve them for the future
There are many different types of digital archives but their mission is to select, collect, organize, preserve and provide access to valuable resources
Digital preservation is a challenging task but we have to find appropriate solutions.
  There is no unique solution. We need to find an appropriate solution in accordance with requirements of the archiving task and the community.

50 Conclusion
Metadata is an important component for archiving and preservation.
Preservation of metadata is a challenging task, as is preservation of primary resources

51 Thank you very much for your attention and patience
For Your Information
  iPres 2011: Int’l Conf. on Preservation of Digital Objects, November 1-4, Singapore
Any questions:

