International Atomic Energy Agency Digital Preservation Session Tue, 4 Nov 2008 34th INIS Liaison Officers’ Meeting 3-5 Nov 2008, Vienna, Austria S. Rieder,


International Atomic Energy Agency
Digital Preservation Session
Tue, 4 Nov 2008, 34th INIS Liaison Officers’ Meeting, 3-5 Nov 2008, Vienna, Austria
S. Rieder, G. St-Pierre, Y. Reynaud-Pulido, T. Kalapurackal
Database Production and Imaging Group, INIS Unit, INIS & NKM Section

Digital Preservation at INIS
INIS Mission:
- preservation of nuclear knowledge
- serving as a reservoir of nuclear information
- provision of quality information services
- promotion of a culture of “information and knowledge sharing”

Digital Preservation at INIS
INIS Non-Conventional Literature (NCL)
- Production of the INIS electronic Full Text Database
Digital Preservation Activities
- Digitization projects
  - at IAEA
  - at Member States

Digital Preservation at INIS
Objectives:
- Consistent, high level of image quality
- Interoperability and accessibility of digitized resources
- Long-term preservation of digital resources for future generations
  - Member States
  - IAEA
- Develop good practices for digital preservation
  - ‘Overview of INIS Digital Preservation Practices’: INIS Information Letter No. 253 & Attachment ( )

INIS Digital Preservation Principles
- INIS principles and workflow are based on Cornell University’s digital imaging tutorial:
  - available in English, French, Spanish

INIS Workflow
- Document Benchmarking
- Document Preparation
- Scanning
- Quality Control
- Image Enhancement
- Metadata Creation/Validation
- Export including Compression
- Completeness Check
- Back-up
- Post-processing
- Storage and dissemination

Benchmarking & Document Preparation
Benchmarking:
- Is the ‘original’ content adequately captured in digital form?
- Do the physical format & condition meet digitizing requirements?
- What is the type of material to be digitized?
- Which resolution?
- At which bit-depth?
- Which compression parameters?
- Estimated accuracy level for OCR?
- Other considerations?
Preparation:
- Physically (unbind, remove staples/clips, etc.)
- Structurally (add/remove barcodes, separate chapters, parts, etc.)
- Characteristics of paper (e.g. size, thickness, glossy/matt, condition)

Scanning – Capture Modes & Optical Resolution
Capture modes: depend on the physical form of the original
- Bitonal: 1 bit/pixel – black & white (printed text)
- Greyscale: 8 bits/pixel – 256 grey shades (black & white photographs)
- Colour: 24 bits/pixel – 16 million colours & grey shades (continuous tone & colour)
Optical resolution:
- “dots per inch” (DPI) or “pixels per inch” (PPI)
- High resolution → fine detail → large file size
Bit depth: amount of information captured
- Greater bit depth → more accurate representation
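The trade-off between resolution, bit depth and file size can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes an A4 page and ignores file headers, row padding and compression, and the function and variable names are hypothetical rather than part of any INIS tool.

```python
# Approximate uncompressed size of one scanned page.
# Assumptions (not from the slides): A4 paper, no headers, no row padding.
MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)

# Bits per pixel for the three capture modes named on the slide.
CAPTURE_MODES = {"bitonal": 1, "greyscale": 8, "colour": 24}


def raw_page_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Return the approximate uncompressed size in bytes of one page."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


def size_table(dpi=300):
    """Map each capture mode to the raw size of an A4 page at `dpi`."""
    return {
        mode: raw_page_bytes(*A4_MM, dpi, bits)
        for mode, bits in CAPTURE_MODES.items()
    }


if __name__ == "__main__":
    for mode, size in size_table().items():
        print(f"{mode:10s} {size / 1_000_000:6.2f} MB")
```

At 300 dpi this gives roughly 1.1 MB for bitonal, 8.7 MB for greyscale and 26.1 MB for colour, which shows why the choice of capture mode matters so much for storage.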

Scanning at INIS – Capture & Optical Resolution
INIS practice:
- Standard scanner settings (for plain b/w text):
  - bitonal (black & white)
  - 300 dpi
- Special cases (colour, pictures):
  - greyscale and colour
  - 200 – 300 dpi with 8 bit depth (256 colours/tones)
- IMPORTANT: post-processing image compression needed to reduce file size
- NEVER use colour settings to scan B/W documents

Quality Control - QC
Retain:
- value
- utility
- integrity of resources
Verify:
- quality
- accuracy
- consistency
INIS verifies:
- accuracy & completeness (e.g. same number of pages?)
- data integrity
- correctness of metadata
- form and validity
- correct matching of metadata and image files
- ‘checksum’ algorithm (authenticity & integrity of digitized files)
- number & order of bytes (e.g. after move, copy, transfer, burn)
- visual inspection: resolution, colour, tone, appearance (attention: these change with lighting and monitors)
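A checksum comparison of the kind mentioned above might look like the following minimal sketch. The slides do not name an algorithm, so SHA-256 is an assumption here, and the helper names `file_digest` and `verify` are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to limit memory use.

    SHA-256 is an assumed default; the source does not specify an algorithm.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Return True when the file still matches the recorded digest."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

The digest would be recorded when the master file is created and checked again after every move, copy, transfer or burn; a single changed, missing or reordered byte produces a different value.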

Image Enhancement
Definition: Any process applied to the raw scan to improve quality or legibility of the resource
Image Enhancement at INIS:
- despeckling
- deskewing
- noise reduction
- black border removal
- colour and tone adjustment, etc.
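To illustrate what despeckling means in practice, here is a deliberately simple sketch for a bitonal image held as a list of rows. It is not the method used by INIS or by any particular scanning product; it only removes black pixels that have no black neighbour, and the function name is hypothetical.

```python
def despeckle(image):
    """Return a copy of a bitonal image with isolated black pixels removed.

    `image` is a list of rows; 1 means black (ink), 0 means white (paper).
    A black pixel counts as a speck when none of its 8 neighbours is black.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_black_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                cleaned[y][x] = 0
    return cleaned
```

Production tools typically use larger windows or connected-component sizes so that punctuation and diacritics are not mistaken for noise; the single-pixel rule here only shows the idea.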

Quality Control and Image Enhancement (1)
- Skewed?

Quality Control and Image Enhancement (2)
- Noisy (e.g. unnecessary dots)?

Quality Control and Image Enhancement (3)
- Black border?

Quality Control and Image Enhancement (4)
IMPORTANT
- Paper size must match the document hard copy: A4 ≠ Letter size
- Text cut off = RESCAN
- If noticed during QC of an incoming PDF, INIS will request the Input Centre to resend the page
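The A4 versus Letter mismatch shows up directly in the pixel dimensions of a scan, so it can be caught automatically. The following is a sketch under assumptions: the tolerance value and the function names are hypothetical, and only the two paper sizes named on the slide are covered.

```python
MM_PER_INCH = 25.4
# Nominal paper sizes in millimetres (width, height), portrait orientation.
PAPER_SIZES_MM = {"A4": (210.0, 297.0), "Letter": (215.9, 279.4)}


def expected_pixels(paper, dpi):
    """Return the (width, height) in pixels of a full page at `dpi`."""
    width_mm, height_mm = PAPER_SIZES_MM[paper]
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def guess_paper(width_px, height_px, dpi, tolerance_px=15):
    """Return the matching paper name, or None if no size fits.

    `tolerance_px` is an assumed allowance for scanner cropping.
    """
    for paper in PAPER_SIZES_MM:
        width, height = expected_pixels(paper, dpi)
        if (abs(width_px - width) <= tolerance_px
                and abs(height_px - height) <= tolerance_px):
            return paper
    return None
```

At 300 dpi an A4 page is 2480 x 3508 pixels and a Letter page is 2550 x 3300 pixels, so an A4 original scanned with a Letter setting loses about 200 rows at the bottom of the page, which is the "text cut off" case.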

File Formats
- Very important: prefer ‘non-proprietary’ formats
- Several standard file formats exist, with different resolution, bit-depth, colour capabilities, etc.
INIS Digital Collection:
1. From ‘Paper’ or ‘Microfiche’ to ‘Digital’:
  - Master images in TIFF Group IV (b/w), in JPEG (colour)
  - Majority full-text searchable PDF
2. Digital files received from INIS National Centres:
  - PDF
  - Compression: JBIG2 (b/w), JPEG (colour)

Preservation Formats
- PDF: open standard – official ISO :2008
- PDF/A: long-term archiving of electronic documents
  - Creation of PDF documents whose visual appearance will remain the same over the course of time
  - Official ISO standard: ISO :2005
  - Further development ongoing
- INIS:
  - is considering adopting PDF/A for efficient preservation and long-term archiving of the Agency’s and Member States’ nuclear information resources
  - pilot project in 2009

OCR – Optical Character Recognition
Printed text → searchable as electronic text
Primary objective for INIS digitization projects: creation of ‘searchable full text’
INIS:
- major tool for mass production: ABBYY FineReader 8
- ~ 98% accuracy: printed text in Latin & Cyrillic characters
Satisfactory testing with Script and Arabic Characters:
- Adobe Acrobat Professional 8.0: Chinese (Simplified), Japanese, Korean
- ABBYY FineReader Pro9: Hebrew, Thai
- VERUS™ Professional: Arabic
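The slide quotes an accuracy figure without saying how it is measured. One common convention, assumed here for illustration, is character accuracy: one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Return accuracy in [0, 1] relative to a trusted reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this measure, 98% accuracy means about two wrong characters per hundred, which on a page of 2,000 characters is still around 40 errors. That is usually acceptable for full-text search but not for text that must be quoted verbatim.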

Various OCR types
- Typewritten
- Hand print and cursive
- Fraktur
- Music scores
- MICR (Magnetic Ink Character Recognition)

OCR process 1 (no or wrong dictionary)

OCR process 2 (proper dictionary)

OCR - value added
- Image layer: scanned (raster) image; visual representation of the original document
- Hidden text: enables full-text search; extra information for search engines
- Diagram label: errors

Storage of Digital Files
- Mandatory: reliable & controlled environment
- Storage of master files: high quality, industry standard devices, e.g. CD-R, DVD, or other contemporary reliable media
- Backup of master files: regularly, off-site, secure location
- RAID: Redundant Array of Independent Disks
  - several drives act collectively as a single storage system
  - consider RAID for large production environments
INIS: THECUS N5200B PRO, 5x3,5" SATA Raid
- 5 disks
- 1 TB each
- configured as local network data storage
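How much of a five-disk array is actually usable depends on the RAID level, which the slide does not state. The sketch below therefore treats the level as an assumption and simply compares a few standard levels; the names are hypothetical.

```python
# Number of disks' worth of space used for redundancy, per RAID level.
PARITY_DISKS = {0: 0, 5: 1, 6: 2}
# Minimum number of disks each level needs.
MIN_DISKS = {0: 2, 5: 3, 6: 4}


def usable_capacity_tb(disks, disk_tb, level):
    """Return usable capacity in TB for equally sized disks.

    The RAID level is an input because the source does not specify it.
    """
    if level not in PARITY_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    return (disks - PARITY_DISKS[level]) * disk_tb
```

For 5 disks of 1 TB each this gives 5 TB at RAID 0 (no redundancy), 4 TB at RAID 5 (survives one disk failure) and 3 TB at RAID 6 (survives two). Whatever the level, RAID protects against disk failure only and does not replace the off-site back-up.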

Back-up and Off-Site Storage
- Create: regular back-ups of master files
- Store: remote from the original source in a secure location
INIS:
- 1970 to 1997: ‘microfiche’
  - NCL full text: paper → microfiche → safe, long-term storage
  - INIS National Centres
  - full set of NCL microfiche
  - Austrian Central Lib. of Physics
- From 1997: ‘digital’
  - NCL on CD: INIS Document Delivery Centres (National Centres)
  - Secure “off site” & back-up: Austrian Central Lib. of Physics
- 2008: microfiche to PDF
  - Austrian Central Lib. of Physics
  - INIS National Centres
  - INIS Online Database

Preservation Planning
- Contents of digital files must remain ‘meaningful’
Different processes:
1. Refreshing: copy files from one storage medium to another
  - verify authenticity & integrity of the files (e.g. checksum)
2. Migration: transfer files from one HW & SW configuration to another, or from one computer generation to the next
  - format-based: move files from an ‘obsolete’ format to a ‘new’ format
3. Emulation: re-create the technical environment
  - maintain information about HW & SW = system reengineered
INIS:
- Refreshing: CD to DVD (until 2007); from 2008: copy to Thecus storage device
- When PDF/A is implemented: ‘migration’
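Refreshing, the first of the three processes, can be sketched as a copy followed by a byte-for-byte comparison. This is an illustration only, not the INIS procedure: the directory arguments and the function name `refresh` are hypothetical, and the comparison uses Python's `filecmp` rather than a stored checksum.

```python
import filecmp
import shutil
from pathlib import Path


def refresh(source_dir, target_dir):
    """Copy every file to the new medium and report copies that differ.

    Returns the relative paths of files whose copy is not byte-identical
    to the original; an empty list means the refresh succeeded.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    mismatches = []
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source in files:
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies content and timestamps
        # shallow=False forces a comparison of the actual file contents.
        if not filecmp.cmp(source, target, shallow=False):
            mismatches.append(relative)
    return mismatches
```

Comparing against the source only proves that the copy is faithful; it does not show that the source itself was still intact, which is why a digest recorded at creation time remains useful alongside this check.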

Metadata
- Key role for digital resources:
  - Describe, process, manage, track, access, preserve
INIS:
- comprehensive ‘bibliographic’ metadata
  - describe the intellectual content of full text
  - bibliographic elements to identify & retrieve resources
- INIS Database: digital resources with bibliographic metadata
- Technical metadata for digital resources: automatic creation with PDF files
- Future: more sophisticated approach with implementation of PDF/A

Thank you for your attention!
Your INIS Digital Preservation Team