Archiving Digital Resources for Future


Archiving Digital Resources for Future
Shigeo Sugimoto
Research Center for Knowledge Communities
Grad. School of Library, Information and Media Studies
University of Tsukuba, Japan
sugimoto@slis.tsukuba.ac.jp

Personal Background
Born in Osaka, Japan, in 1953
Education: BE, ME and PhD from the Dept. of Information Science, Faculty of Engineering, Kyoto University
  Software Engineering and Programming Languages
Job: Faculty at a LIS school in Tsukuba since 1983
Research: Digital Libraries, Digital Archives, Metadata
International activities:
  Dublin Core Metadata Initiative
  Digital libraries, preservation, metadata research conferences
  Consortium of information Schools in Asia-Pacific (CiSAP)
Governmental committee work:
  Records Management, Digital Archive and Publishing Digital Resources for National Diet Library, National Archives of Japan

Goal of this Talk
Discuss issues for long-term use of digital resources as an important asset of our knowledge-centric society
View digital archives as well-organized collections of information and knowledge resources in our networked information society
Understand issues for further development of digital archives from broad perspectives

Outline
Introduction
Terms
Digital Archive
Digital Preservation
Metadata
Concluding Remarks

Introduction
Our information environment is already “paperless”, e.g.
  Create, deliver, store and access documents via the Internet
  Use paper as a one-time medium for reading and discard it once the contents are read
  Scan prints, store the contents electronically and discard the prints
  Use online dictionaries more frequently than printed dictionaries
Resources are born in a digital form and used in a digital environment
  Digital cameras, smartphones
  Digital books and mobile reading devices

Introduction
Information resources are easily lost unless they are given special attention
  Deterioration of paper, film, CDs/DVDs
  Software obsolescence
Archiving and preservation of digital resources is indispensable for
  Keeping the resources searchable, accessible and usable
  Maintaining the resources for future users

Introduction
Long-term use of digital resources is a crucial issue for the networked information society because
  We rely so heavily on the networked information environment today,
  So many information and knowledge resources are published and consumed digitally,
  So many government and corporate records are created and stored digitally, and
  We need to keep our information and knowledge resources for future users, but
  The lifetime of digital media is shorter than that of paper.

Introduction
The goal of this talk is to give an overview of digital archives and their long-term use, covering
  Terms and concepts,
  Typical digital archive services,
  Preservation of digital resources,
  Metadata issues, and
  Personal perspectives for the future

Terms and Concepts
Resource (Information Resource): Any instance from which we get information, typically a book, a paper, a file or a set of files
Born Digital Resource: A digital resource created natively in a digital form
Digitized Resource: A resource created by converting a physical or non-digital resource into a digital format
Turned Digital Resource: Same as Digitized Resource
Metadata: Data about data (or data about a resource)

Terms and Concepts
Digital Archive: Collection of digital resources organized and preserved for long-term use of the resources
  Collecting, organizing and preserving digital resources for long-term use
Digital Preservation: Preserving digital objects for long-term use
Digital Curation: Similar to Digital Archive
  Maintaining, preserving and adding value to digital research data throughout its lifecycle (definition by the Digital Curation Centre, UK, http://www.dcc.ac.uk/)

Digital Archive – Typical Digital Archives
Web Archive: archived collection of Web resources, e.g. Internet Archive
Institutional Repository, Scholarly Archive: archived collection of scholarly resources, e.g. academic institutional repositories, preprints and technical reports archives, electronic theses and dissertations
Digitized collection of cultural and historical resources, e.g. digitized collections of library and museum holdings such as American Memory, World Digital Library
Digital collection of records of governments and corporate bodies

Digital Archive – Some Examples
Very high-tech, high-quality digitization of physical objects
  3D sensing + Virtual Reality technology, e.g. digitization of Bayon at the ruins of Angkor (http://www.cvl.iis.u-tokyo.ac.jp/projects.html)
Massive digitization of books and documents
  Google Books project
  Book digitization by National Diet Library, Japan (http://www.ndl.go.jp/en/data/endl.html)
    240K books online for off-library use, 570K books for in-house use
  Records Database at Japan Center for Asian Historical Records (http://www.jacar.go.jp/english/index.html)
    22M images as of April 2011, 1.6M catalog records of the Japanese Government from the Meiji Era until World War II

Digital Archive – Some Examples
Collaborative Archives
  Europeana (http://europeana.eu/portal/): “Paintings, music, films and books from Europe's galleries, libraries, archives and museums”
  World Digital Library (http://www.wdl.org/en/): “The World Digital Library (WDL) makes available on the Internet, free of charge and in multilingual format, significant primary materials from countries and cultures around the world.”
  National Digital Archive Project, Taiwan (NDAP) (http://www.ndap.org.tw/index_en.php)
  Taiwan e-Learning and Digital Archive Program (TELDAP) (http://www.teldap.tw/en/): Multi-Disciplinary National Archives

Digital Archive – Why Digital Archive?
Easy and flexible access to important resources
  Collect and organize information resources for users in the networked information environment
  Geographical distance has been a fundamental barrier for the general public to access valuable resources stored at major memory institutions, e.g. national libraries, national archives, national museums
  Equal access for anyone to valuable resources is crucial to empower the progress of our knowledge-centric society
  Encourage inter- and cross-disciplinary use of resources

Digital Archive – Why Digital Archive?
Preserving digital resources for future users
  Many important resources already exist only in digital form
  Adding value by maintaining valuable resources for a long period of time
Preserving non-digital resources using digital technologies for future users
  Physical resources may be damaged or lost in a disaster

Digital Archive – Why Digital Archive?
Preserving born digital resources
  Preparation for the growth of digital publishing and electronic records of governments
    Legal deposit of digitally published resources
    Digital archives of e-government records
  Preserving database contents for future use
    Scientific databases, statistics databases, etc.
  Preserving Web and Internet resources
    Open Web and Hidden Web
    Institutional Web (Intranet Web)

Archival Functions
[Diagram: Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Preservation: Keep Resources Accessible and Usable]

Archival Functions
[Diagram: Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Trusted Preservation: Keep Resources Accessible and Usable]

Sharing Preservation Function
[Diagram: Resources to be archived; Collection (Collect, Organize, Re-format, Rights Management); Access (Search and Access, Browse, Access Control); Resources for Users; Trusted Repository: Sharing Preservation Repository]

Digital Preservation - Fundamental Issues -
In general, the lifetime of digital resources is short
  Rapid progress and change of technologies
  Hardware issues: Short lifetime of electronic storage media and their players, e.g. Floppy, CD, DVD, LD, Video Tapes/Cassettes, Audio Tapes/Cassettes, Magnetic Tapes, etc.
  Software issues: Frequent version changes of software tools and dependency on their running environment, e.g. word processors, authoring tools, spreadsheets, browsers, PC operating systems, etc.

Digital Preservation - Fundamental Issues -
The diversity of hardware and software is always increasing
Special-purpose software is used for specific contents, which are usually high-end contents
  3D graphics, Virtual Reality, Interactive Contents
The volume of database contents is always increasing
Network-oriented digital publishing is growing: a paradigm shift toward digital publishing

Digital Preservation - Basic Solution -
Migration and Emulation
  Migration: migrate the preserved resources to a new system environment
  Emulation: build an emulator to realize a working environment for the preserved resources
Metadata
  Fundamental component for preservation, used to record information about a preserved resource and to keep track of its preservation history
  Descriptive, administrative and technical metadata for archiving and preserving resources
  Preservation of metadata and its schema is required

Digital Preservation - Basic Solution -
Open Archival Information System (OAIS) [1]
  International Standard Reference Model for Archival Systems: a system framework for archival systems
  Information Package: a package structure to keep a data object for the long term
    Information Object
    Preservation Description Information: metadata for preservation
    Packaging Information: metadata for finding and managing the information package
[1] CCSDS, Reference Model for an Open Archival Information System (OAIS), http://public.ccsds.org/publications/archive/650x0b1.PDF

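Digital Preservation: OAIS
[Diagram: OAIS functional model. Functional entities: Ingest, Archival Storage, Data Management, Administration, Preservation Planning, Access. The PRODUCER submits a SIP to Ingest; Ingest passes the AIP to Archival Storage and Descriptive Info to Data Management; the CONSUMER sends queries and orders to Access and receives result sets and DIPs; MANAGEMENT oversees the archive.]
IP: Information Package; SIP: Submission IP; AIP: Archival IP; DIP: Dissemination IP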
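
The package flow in the OAIS diagram can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the class names, the in-memory dictionaries standing in for Archival Storage and Data Management, and the identifier scheme are all hypothetical and are not defined by the OAIS reference model.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package handed over by a producer."""
    producer: str
    files: dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package kept in archival storage."""
    identifier: str
    files: dict[str, bytes]
    descriptive_info: dict[str, str]


@dataclass
class DIP:
    """Dissemination Information Package delivered to a consumer."""
    identifier: str
    files: dict[str, bytes]


class Archive:
    def __init__(self) -> None:
        self._storage: dict[str, AIP] = {}             # stands in for Archival Storage
        self._catalog: dict[str, dict[str, str]] = {}  # stands in for Data Management

    def ingest(self, sip: SIP, title: str) -> str:
        """Ingest: turn a SIP into an AIP and register its descriptive info."""
        identifier = f"aip-{len(self._storage) + 1:04d}"  # hypothetical ID scheme
        info = {"title": title, "producer": sip.producer}
        self._storage[identifier] = AIP(identifier, dict(sip.files), info)
        self._catalog[identifier] = info
        return identifier

    def query(self, word: str) -> list[str]:
        """Access: return the identifiers whose title contains the word."""
        return [i for i, info in self._catalog.items()
                if word.lower() in info["title"].lower()]

    def order(self, identifier: str) -> DIP:
        """Access: build a DIP from the stored AIP."""
        aip = self._storage[identifier]
        return DIP(aip.identifier, dict(aip.files))


archive = Archive()
aip_id = archive.ingest(SIP("University Library", {"thesis.txt": b"full text"}),
                        title="Thesis on preservation metadata")
result_set = archive.query("metadata")   # consumer query
dip = archive.order(result_set[0])       # consumer order
print(aip_id, result_set, sorted(dip.files))
```

Keeping the three package types as separate classes mirrors the point of the diagram: what the producer submits, what the archive stores and what the consumer receives need not be the same thing.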

OAIS: Information Object
Data Object + Representation Information → Information Object
Data Object: Physical or Digital Object
Representation Information: Information required to represent a data object in a meaningful way for users
  Structural, Semantic, Technological information
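
A small Python sketch may help show why a Data Object needs Representation Information. It assumes, purely for illustration, that the representation information is reduced to a character encoding plus a language tag; the class and field names are hypothetical and real representation information is far richer.

```python
from dataclasses import dataclass


@dataclass
class RepresentationInformation:
    encoding: str   # structural: how the bytes map to characters
    language: str   # semantic: the language of the text


@dataclass
class InformationObject:
    data_object: bytes
    representation: RepresentationInformation


def interpret(obj: InformationObject) -> str:
    """Render the data object using its representation information."""
    return obj.data_object.decode(obj.representation.encoding)


# The same bit sequence, paired with correct and incorrect representation information.
bits = "日本".encode("shift_jis")
good = InformationObject(bits, RepresentationInformation("shift_jis", "ja"))
wrong = InformationObject(bits, RepresentationInformation("latin-1", "ja"))

print(interpret(good))                      # the original text
print(interpret(wrong) == interpret(good))  # False: the bits alone are not enough
```

The bytes are identical in both objects; only the representation information decides whether a future user gets the original text back.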

OAIS: Information Package and Content Information
[Diagram: an Information Package consisting of an Information Object, Preservation Description Information (PDI) and Packaging Information (description about the package)]
Content Information
  Set of information that is the original target of preservation
  Content Data Object together with its Representation Information, i.e. Information Object

OAIS: Preservation Description Information
[Diagram: an Information Package consisting of Info. Object, PDI and Packaging Information]
Reference: one or more mechanisms used to provide assigned identifiers for the Content Information, e.g. taxonomic systems, reference systems, registration systems
Context: relationships of the Content Information to its environment
Provenance: history of the Content Information, i.e. origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated
Fixity: Data Integrity checks or Validation/Verification keys used to ensure that the particular Content Information object has not been altered in an undocumented manner
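
The four PDI categories, and Fixity in particular, can be sketched as follows. This is a minimal example, assuming SHA-256 as the integrity check and a plain byte string standing in for the Content Information; the class names, field names, identifier and helper functions are hypothetical, not part of OAIS.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PDI:
    reference: str          # assigned identifier for the content
    context: str            # relationship of the content to its environment
    provenance: list[str]   # history: origin, changes, custody
    fixity: str             # SHA-256 hex digest recorded at packaging time


@dataclass
class InformationPackage:
    content: bytes          # stands in for the Content Information
    pdi: PDI
    packaging_info: str     # description of how the package is put together


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_package(content: bytes, reference: str, context: str,
                  origin: str) -> InformationPackage:
    """Create a package and record the fixity value for its content."""
    pdi = PDI(reference, context, [f"created by {origin}"], sha256_hex(content))
    return InformationPackage(content, pdi, "single file, no container")


def fixity_ok(package: InformationPackage) -> bool:
    """Recompute the digest and compare it with the recorded value."""
    return sha256_hex(package.content) == package.pdi.fixity


package = build_package(b"Minutes of the 2011 committee meeting",
                        reference="example:record/0001",   # hypothetical identifier
                        context="part of a hypothetical committee records series",
                        origin="records office")
intact = fixity_ok(package)
package.content += b" (edited)"           # an undocumented alteration
altered_detected = not fixity_ok(package)
print(intact, altered_detected)
```

A documented change would instead append an entry to `provenance` and record a new fixity value, which is how the two categories work together.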

Digital Preservation: Metadata Issues
Metadata for Digital Preservation
  Metadata schema projects based on OAIS PDI
    Cedars, NEDLIB, OCLC-RLG
  METS: Metadata Encoding and Transmission Standard (http://www.loc.gov/standards/mets/)
    Standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library
    Container standard which has seven categories of description
  PREMIS (http://www.loc.gov/standards/premis/)
    Preservation Metadata: Implementation Strategies
    PREMIS Data Model
    PREMIS Data Dictionary

PREMIS Data Model
Data model shows entities and their relationships
[Diagram: Intellectual Entities, Objects, Events, Rights, Agents]

PREMIS Data Model
[Diagram: Intellectual Entities, Objects, Events, Rights, Agents]
Information objects to be preserved
Separation of Intellectual Entity and Objects: files of the same content and in different formats

PREMIS Data Model
[Diagram: Intellectual Entities, Objects, Events, Rights, Agents]
Entities associated with preservation tasks
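
The five entity types and their relationships can be sketched with plain Python dataclasses. This is a simplified illustration only: the field names are hypothetical and are not the semantic units of the PREMIS Data Dictionary, and the `migrate` helper is an assumption used to show how an Event links Objects and an Agent.

```python
from dataclasses import dataclass


@dataclass
class IntellectualEntity:
    title: str                   # the content as an intellectual unit


@dataclass
class Agent:
    name: str                    # person, organization or software


@dataclass
class PreservedObject:
    identifier: str
    entity: IntellectualEntity   # the content this file represents
    file_format: str


@dataclass
class Event:
    event_type: str
    source: PreservedObject
    outcome: PreservedObject
    agent: Agent


@dataclass
class RightsStatement:
    applies_to: PreservedObject
    granted_to: Agent
    permitted_act: str


def migrate(obj: PreservedObject, new_format: str,
            agent: Agent) -> tuple[PreservedObject, Event]:
    """Create a new object for the same entity and record the event."""
    new_obj = PreservedObject(f"{obj.identifier}-{new_format}", obj.entity, new_format)
    return new_obj, Event("migration", obj, new_obj, agent)


report = IntellectualEntity("Annual Report 2010")
archive_staff = Agent("Archive staff")
original = PreservedObject("obj-1", report, "doc")
rights = RightsStatement(original, archive_staff, "migrate")
derived, event = migrate(original, "pdf", archive_staff)
print(derived.identifier, derived.entity.title, event.event_type)
```

Both `original` and `derived` point to the same `IntellectualEntity`, which is the separation the slide describes: one piece of content, several files in different formats.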

Digital Preservation - Fundamental Issues Again -
There is no perfect solution for preservation of digital resources
  There is no perfect solution for preservation of non-digital resources either
Preservation of conventional non-digital resources is mainly preservation of the information media
Digital preservation is primarily preservation of the information contents rather than the information media, i.e. preservation of the contents, not the container
  For example, electronic journals published on the Web use no tangible media, and emails in governmental sectors can be official records

Digital Preservation - Fundamental Issues Again -
How do we preserve?
  Preserve the original content as the original binary data in the original format
    Keeps the functionality and look-and-feel
    Risk of obsolescence of the software and hardware needed to render and interact with the content
  Convert the original content into a format suitable for long-term use, i.e. widely used standard formats are preferable
    May lose some functionality of digital resources, e.g. hyperlinks, dynamic contents
    Need to identify the important content that has to be preserved

Digital Preservation - Fundamental Issues Again -
Confidentiality, Integrity, Authenticity
  Crucial aspects for preservation, especially preservation of official documents
  Confidentiality changes over time
Rights Issues
  Copyright issues
  Privacy issues
Metadata Issues
  Metadata has to be preserved with the primary resources, otherwise the resources would lose their value
  Metadata schema has to be preserved as well, otherwise metadata will lose interpretability
  Semantics of metadata terms have to be recorded and preserved

Digital Preservation - Fundamental Issues Again -
Proper preservation management
  Preservation planning based on risk management
    Obsolescence of software and hardware
    Degradation of storage media
Digital preservation is a management issue rather than a technological issue, because
  There is no perfect technological solution to preserve anything forever
  We need to determine what information resources should be preserved and how
  We need to cope with organizational changes of archives and also manage archives under changing social circumstances

Digital Preservation - Fundamental Issues Again -
A personal perspective
  We are responsible for preserving resources for the next generation.
  It is not realistic for us to foresee changes in technology and the social environment over 100 or 1000 years.
  Digital technologies change very rapidly, which is a disadvantage for preservation from the viewpoint of stability.
  However, digital resources can be copied easily and flexibly, which is a significant advantage for preservation.

Metadata
(Structured) Data about Data
Description about a resource from a certain point of view in accordance with the requirements in the domain
[Diagram: resource, Metadata]

Metadata
Users search, access, evaluate a resource, and pay money for the resource on the network
  These tasks are carried out in the virtual space, not in the physical space
  Metadata is required in all tasks of this process
We need to use metadata technology suitable to our applications and also to our network environment
[Diagram: Tasks over the Net]

Metadata
Interoperability is a key issue for metadata
  Interoperability across communities
  Interoperability over time: Preservation
A fundamental barrier is the semantic gap between communities
  Same word for different concepts
  Different words for the same concept
Linked Open Data: sharing concepts expressed as data, i.e. terms, phrases, etc.

Metadata
Promote sharing and reuse of metadata vocabularies
  A metadata vocabulary, a controlled set of terms used to express metadata, is the semantic basis of metadata
  Sharing a metadata vocabulary means sharing concepts
Application Profile concept of Dublin Core
  Mixing and matching metadata vocabularies
  Clear separation of metadata vocabularies and structural constraints in a metadata schema

Application Profile
A metadata schema (conceptual)
  Title: Mandatory
  Subject: Optional, Repeatable
  Author: Mandatory
  Publisher: Mandatory if applicable

Application Profile
Choose appropriate terms for an application schema
[Diagram: the schema elements Title, Subject, Author and Publisher are chosen from the terms (Title, Date, Subject, Author, Type, Publisher) offered by Metadata Vocabulary 1 (Metadata Element Set) and Metadata Vocabulary 2 (Metadata Element Set)]

Application Profile
Structural constraints for every element
  Title: Mandatory
  Subject: Optional, Repeatable
  Author
  Publisher: If applicable
[Diagram: terms (Title, Date, Subject, Author, Type, Publisher) drawn from Metadata Vocabulary 1 (Metadata Element Set) and Metadata Vocabulary 2 (Metadata Element Set)]
Define encoding scheme for implementation
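
The separation between vocabulary terms and structural constraints can be sketched as a small validator. This is an illustration under assumptions: the `v1:`/`v2:` prefixes, the assignment of terms to the two vocabularies, and the treatment of "mandatory if applicable" as optional are all hypothetical choices made for the example, not part of the Dublin Core Application Profile specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    term: str          # vocabulary-qualified term, e.g. "v1:title"
    mandatory: bool
    repeatable: bool


# Terms mixed from two hypothetical vocabularies, each with its own constraint.
PROFILE = [
    Constraint("v1:title", mandatory=True, repeatable=False),
    Constraint("v1:subject", mandatory=False, repeatable=True),
    Constraint("v2:author", mandatory=True, repeatable=False),
    # "Mandatory if applicable" cannot be checked mechanically here,
    # so the sketch treats it as optional.
    Constraint("v2:publisher", mandatory=False, repeatable=False),
]


def validate(record: dict[str, list[str]], profile: list[Constraint]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for c in profile:
        values = record.get(c.term, [])
        if c.mandatory and not values:
            problems.append(f"missing mandatory element: {c.term}")
        if not c.repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {c.term}")
    allowed = {c.term for c in profile}
    for term in record:
        if term not in allowed:
            problems.append(f"term not in profile: {term}")
    return problems


record_ok = {
    "v1:title": ["Archiving Digital Resources for Future"],
    "v1:subject": ["digital preservation", "metadata"],
    "v2:author": ["Shigeo Sugimoto"],
}
record_bad = {"v1:title": ["A", "B"], "v1:date": ["2011"]}

print(validate(record_ok, PROFILE))
print(validate(record_bad, PROFILE))
```

Because the constraints live in `PROFILE` rather than in the vocabularies, another application could reuse the same terms with different rules, which is the point of the mix-and-match approach.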

Some Remarks before Conclusion
A personal perspective learned from the earthquake and tsunami of March 11, 2011
  Physical objects are easily lost. Many heritage resources were lost. Many PCs and servers were lost.
  More robust infrastructure is required to keep important resources safe and preserve them for the future
  A robust Cloud environment looks advantageous, however
  The current Cloud is too simple to be adopted for archiving important resources

Some Remarks before Conclusion
Archival Cloud – a layered architecture
[Diagram: layers. Application Systems / Services; Archiving as a Service (Collect: Collect, Organize, Re-format, Rights Management; Access: Search and Access, Browse, Access Control); Preservation as a Service (Preserve)]

Some Remarks before Conclusion
A personal perspective for promoting the digital environment at Memory Institutions, e.g. Museums, Libraries, and Archives (MLAs)
  High-tech, high-quality digitization is crucial to increase the potential of MLAs
  Adoption of digital technologies, which should be really usable but need not be high-tech, is crucial to promote usability of resources at MLAs
  Human resource development is crucial to further develop MLAs for the future networked information society

Some Remarks before Conclusion
A personal perspective learned from governmental committee work
Paradigm shift in the publishing environment
  Shrinking print publishing market, expanding digital publishing market in Japan
    Print publishing: 2,600 billion yen (1996) → 1,900 billion yen (2009)
    Digital publishing: 0.4 billion yen (2004) → 50 billion yen (2009)
  E-publishing business
    Manga on mobile phones has been growing
    E-book readers and smartphones may expand the market
  Piracy issue for manga (comics) and novels
    Illegal scanlation of weekly manga magazines
  Rights issues: relationship between publishers and creators

Some Remarks before Conclusion
Governmental records management
  Promotion of e-Gov, but real change is slow
  New national law for official records management, effective since April 2011
    Need improvement of records management and archival services
    Hope to promote national infrastructure for records management and archives
Book digitization at NDL
  NDL, which is a national legal deposit library, is allowed to convert books into digital format for preservation purposes
  NDL and publishers have agreed to make digitized books accessible at public libraries for those books which are not obtainable in the market, even if they are still under copyright

Conclusion
Digital Archive is an important function and service for our society to maintain valuable intellectual resources and preserve them for the future
  There are many different types of digital archives, but their mission is to select, collect, organize, preserve and provide access to valuable resources
Digital preservation is a challenging task, but we have to find appropriate solutions.
  There is no unique solution. We need to find an appropriate solution in accordance with the requirements of the archiving task and the community.

Conclusion
Metadata is an important component for archiving and preservation.
  Preservation of metadata is as challenging a task as preservation of primary resources

Thank you very much for your attention and patience
For Your Information
  iPres 2011: Int’l Conf. on Preservation of Digital Objects, November 1-4, Singapore, http://ipres2011.sg/
Any questions: sugimoto@slis.tsukuba.ac.jp