Published by Jordan Clarke. Modified over 8 years ago.
1
International Atomic Energy Agency
Digital Preservation
Session Tue, 4 Nov 2008
34th INIS Liaison Officers’ Meeting, 3-5 Nov 2008, Vienna, Austria
S. Rieder, G. St-Pierre, Y. Reynaud-Pulido, T. Kalapurackal
Database Production and Imaging Group, INIS Unit, INIS & NKM Section
2
International Atomic Energy Agency, 34th ILO Meeting, 3-5 Nov 2008, Vienna, AT
Digital Preservation at INIS
INIS Mission:
- preservation of nuclear knowledge serving as a reservoir of nuclear information
- provision of quality information services
- promotion of a culture of “information and knowledge sharing”
3
Digital Preservation at INIS
- INIS Non-Conventional Literature (NCL)
- Production of the INIS electronic Full Text Database
- Digital Preservation Activities
- Digitization projects: at IAEA, at Member States
4
Digital Preservation at INIS
Objectives:
- Consistent, high level of image quality
- Interoperability and accessibility of digitized resources
- Long-term preservation of digital resources for future generations
Member States and IAEA: develop good practices for digital preservation
‘Overview of INIS Digital Preservation Practices’: INIS Information Letter No. 253 & Attachment (2008-10-03)
http://www.iaea.org/inisnkm/marea/restricted/restrictedpdf/2008/infoletter253_attachment.pdf
5
INIS Digital Preservation Principles
INIS principles and workflow are based on Cornell University’s digital imaging tutorial:
http://www.library.cornell.edu/preservation/tutorial/index.html
(available in English, French, Spanish)
6
INIS Workflow
- Document Benchmarking
- Document Preparation
- Scanning
- Quality Control
- Image Enhancement
- Metadata Creation/Validation
- Export including Compression
- Completeness Check
- Back-up
- Post-processing
- Storage and dissemination
7
Benchmarking & Document Preparation
Benchmarking:
- Can the ‘original’ content be adequately captured in digital form?
- Do physical format & condition meet digitizing requirements?
- What is the type of material to be digitized?
- Which resolution? At which bit depth? Which compression parameters?
- Estimated accuracy level for OCR?
- Other considerations?
Preparation:
- Physically (unbind, remove staples/clips, etc.)
- Structurally (add/remove barcodes, separate chapters, parts, etc.)
- Characteristics of paper (e.g. size, thickness, glossy/matt, condition)
8
Scanning – Capture Modes & Optical Resolution
Capture modes (depend on the physical form of the original):
- Bitonal: 1 bit/pixel – black & white (printed text)
- Greyscale: 8 bits/pixel – 256 grey shades (black & white photographs)
- Colour: 24 bits/pixel – 16 million colours & grey shades (continuous tone & colour)
Optical resolution: “dots per inch” (DPI) or “pixels per inch” (PPI)
- High resolution: fine detail, large file size
Bit depth: amount of information captured
- Greater bit depth: more accurate representation
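The trade-off between capture mode, resolution and file size can be illustrated with a little arithmetic. The sketch below is not part of the original slides: it assumes an A4 page (8.27 x 11.69 inches), ignores any row padding a real image format may add, and uses hypothetical function names.

```python
# Rough uncompressed size of one scanned page.
# Assumptions (not from the slides): A4 page, no row padding, no compression.

A4_WIDTH_IN = 8.27
A4_HEIGHT_IN = 11.69

# Bits per pixel for the three capture modes named on the slide.
CAPTURE_MODES = {"bitonal": 1, "greyscale": 8, "colour": 24}


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the raw size in bytes of a scan at the given resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def size_table(dpi=300):
    """Map each capture mode to the raw size of one A4 page at `dpi`."""
    return {
        mode: uncompressed_bytes(A4_WIDTH_IN, A4_HEIGHT_IN, dpi, bpp)
        for mode, bpp in CAPTURE_MODES.items()
    }


if __name__ == "__main__":
    for mode, size in size_table(300).items():
        print(f"{mode:10s} {size / 1_000_000:5.1f} MB")
```

Under these assumptions, one A4 page at 300 dpi needs roughly 1.1 MB in bitonal mode, 8.7 MB in greyscale and 26.1 MB in colour before compression.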
9
Scanning at INIS – Capture & Optical Resolution
INIS practice:
- Standard scanner settings (for plain b/w text): bitonal (black & white), 300 dpi
- Special cases (colour, pictures): greyscale and colour, 200 – 300 dpi with 8 bit depth (256 colours/tones)
IMPORTANT:
- post-processing image compression needed to reduce file size
- NEVER use colour settings to scan B/W documents
10
Quality Control - QC
Retain: value, utility, integrity of resources
Verify: quality, accuracy, consistency
INIS verifies:
- accuracy & completeness (e.g. same number of pages?)
- data integrity
- correctness of metadata form and validity
- correct matching of metadata and image files
- ‘checksum’ algorithm (authenticity & integrity of digitized files)
- number & order of bytes (e.g. after move, copy, transfer, burn)
- visual inspection: resolution, colour, tone, appearance (attn: changeable light & monitors)
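The checksum and byte-count checks could be sketched as follows. The slides do not name an algorithm or a tool, so SHA-256 and the helper names (`sha256_of`, `build_manifest`, `find_problems`) are assumptions made only for illustration: a manifest is recorded before a move, copy, transfer or burn, and compared afterwards.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file (relative path) to its (size in bytes, checksum)."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): (path.stat().st_size, sha256_of(path))
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def find_problems(folder, manifest):
    """Return the names of files that are missing or no longer match."""
    current = build_manifest(folder)
    return sorted(
        name for name, entry in manifest.items() if current.get(name) != entry
    )
```

An empty list from `find_problems` means every file recorded in the manifest is still present with the same size and content.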
11
Image Enhancement
Definition: Any process applied to the raw scan to improve quality or legibility of the resource
Image Enhancement at INIS:
- despeckling
- deskewing
- noise reduction
- black border removal
- colour and tone adjustment, etc.
12
Quality Control and Image Enhancement (1)
Skewed?
13
Quality Control and Image Enhancement (2)
Noisy (e.g. unnecessary dots)?
14
Quality Control and Image Enhancement (3)
Black border?
15
Quality Control and Image Enhancement (4)
IMPORTANT:
- Paper size must match document hard copy (A4 ≠ Letter size)
- Text cut = RESCAN
- If noticed during QC of incoming PDF, INIS will request the Input Centre to resend the page
16
File Formats
Very important: prefer ‘non-proprietary’ formats
Several standard file formats exist: different resolution, bit-depth, colour capabilities, etc.
INIS Digital Collection:
1. From ‘Paper’ or ‘Microfiche’ to ‘Digital’:
   - Master images in TIFF Group IV (b/w), in JPEG (colour)
   - Majority: full-text searchable PDF
2. Digital files received from INIS National Centres:
   - PDF
   - Compression: JBIG2 (b/w), JPEG (colour)
17
Preservation Formats
PDF: open standard – official ISO 32000-1:2008
PDF/A: long-term archiving of electronic documents
- Creation of PDF documents whose visual appearance will remain the same over the course of time
- Official ISO standard: ISO 19005-1:2005
- Further development ongoing: http://www.pdfa.org
INIS: is considering adopting PDF/A for
- efficient preservation and long-term archival of the Agency’s and Member States’ nuclear information resources
- pilot project in 2009
18
OCR – Optical Character Recognition
Makes printed text searchable as electronic text
Primary objective for INIS digitization projects: creation of ‘searchable full text’
INIS major tool for mass production: ABBYY FineReader 8
- ~98% accuracy: printed text in Latin & Cyrillic characters
Satisfactory testing with Script and Arabic Characters:
- Adobe Acrobat Professional 8.0: Chinese (Simplified), Japanese, Korean
- ABBYY FineReader Pro 9: Hebrew, Thai
- VERUS™ Professional: Arabic
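The slides do not say how the ~98% figure was measured. One common way to express OCR accuracy is at character level, as one minus the edit distance between a hand-checked reference text and the OCR output, divided by the reference length. The sketch below assumes that definition; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(reference, ocr_output):
    """Share of reference characters recognised correctly (0.0 to 1.0)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))
```

With this measure, 98% accuracy corresponds to about two character errors per hundred characters of reference text.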
19
Various OCR types
- Typewritten
- Hand print and cursive
- Fraktur
- Music scores
- MICR (Magnetic Ink Character Recognition)
20
OCR process 1 (no or wrong dictionary)
21
OCR process 2 (proper dictionary)
22
OCR - value added
- Image Layer: scanned (raster) image; visual representation of the original document
- Hidden Text: enables full-text search; extra information for search engines
- Diagram label: errors
23
Storage of Digital Files
Mandatory: reliable & controlled environment
- Storage of master files: high quality, industry standard devices, e.g. CD-R, DVD, or other contemporary reliable media
- Backup of master files: regularly, off-site, secure location
- RAID (Redundant Array of Independent Disks): several drives act collectively as a single storage system; consider RAID for large production environments
INIS: THECUS N5200B PRO, 5 x 3.5" SATA disks of 1 TB each in RAID 5, configured as local network data storage
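As a worked example of what the RAID 5 configuration implies: RAID 5 spreads one disk's worth of parity across the array, so the usable capacity is that of all disks but one, and the array survives the failure of a single disk. The helper below is a hypothetical illustration and ignores file-system overhead.

```python
def raid5_usable_tb(disk_count, disk_size_tb):
    """Usable capacity of a RAID 5 array: one disk's worth goes to parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_size_tb


if __name__ == "__main__":
    # Five 1 TB disks, as in the INIS storage device described above.
    print(raid5_usable_tb(5, 1.0), "TB usable")
```

For five disks of 1 TB each this gives 4 TB of usable space.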
24
Back-up and Off-Site Storage
Create: regular back-ups of master files
Store: remote from the original source in a secure location
INIS:
- 1970 to 1997: ‘microfiche’
  - NCL full text: paper to microfiche (safe, long-term storage)
  - Full set of NCL microfiche: INIS National Centres, Austrian Central Lib. of Physics
- From 1997: ‘digital’
  - NCL on CD: INIS Document Delivery Centres (National Centres)
  - Secure “off site” & back-up: Austrian Central Lib. of Physics
- 2008: microfiche to PDF
  - Austrian Central Lib. of Physics
  - INIS National Centres
  - INIS Online Database
25
Preservation Planning
Contents of digital files must remain ‘meaningful’
Different processes:
1. Refreshing:
   - copy files from one storage medium to another
   - verify authenticity & integrity of the files (e.g. checksum)
2. Migration:
   - transfer files from one HW & SW to another, or from one computer generation to the next
   - format-based: move files from ‘obsolete’ format to ‘new’ format
3. Emulation:
   - re-create technical environment
   - maintain information about HW & SW = system reengineered
INIS:
- Refreshing: CD to DVD (until 2007); from 2008: copy to Thecus storage device
- When PDF/A implemented: ‘migration’
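A refreshing run, copying files to a new medium and then verifying each copy by checksum, might look like the sketch below. It is an illustration only and not the INIS procedure: the choice of SHA-256, the directory arguments and the function names are all assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_dir, target_dir):
    """Copy every file to the new medium; return files whose copy differs."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    failures = []
    for source in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        target = target_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies content and timestamps
        if sha256_of(source) != sha256_of(target):
            failures.append(source)
    return failures
```

An empty result means every copy matched its source; any path returned should be copied again before the old medium is retired.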
26
Metadata
Key role for digital resources: describe, process, manage, track, access, preserve
INIS: comprehensive ‘bibliographic’ metadata
- describe the intellectual content of full text
- bibliographic elements to identify & retrieve resources
- INIS Database: digital resources with bibliographic metadata
Technical metadata for digital resources:
- automatic creation with PDF files
- Future: more sophisticated approach with implementation of PDF/A
27
Thank you for your attention!
Your INIS Digital Preservation Team