
1 Digital Preservation Session
International Atomic Energy Agency
Tue, 4 Nov 2008
34th INIS Liaison Officers’ Meeting, 3-5 Nov 2008, Vienna, Austria
S. Rieder, G. St-Pierre, Y. Reynaud-Pulido, T. Kalapurackal
Database Production and Imaging Group, INIS Unit, INIS & NKM Section

2 Digital Preservation at INIS
International Atomic Energy Agency, 34th ILO Meeting, 3-5 Nov 2008, Vienna, AT
INIS Mission:
- preservation of nuclear knowledge
- serving as a reservoir of nuclear information
- provision of quality information services
- promotion of a culture of “information and knowledge sharing”

3 Digital Preservation at INIS
INIS Non-Conventional Literature (NCL)
- Production of the INIS electronic Full Text Database
Digital Preservation Activities
- Digitization projects
  - at IAEA
  - at Member States

4 Digital Preservation at INIS
Objectives:
- Consistent, high level of image quality
- Interoperability and accessibility of digitized resources
- Long-term preservation of digital resources for future generations
  - Member States
  - IAEA
- Develop good practices for digital preservation
  - ‘Overview of INIS Digital Preservation Practices’: INIS Information Letter No. 253 & Attachment (2008-10-03)
    http://www.iaea.org/inisnkm/marea/restricted/restrictedpdf/2008/infoletter253_attachment.pdf

5 INIS Digital Preservation Principles
- INIS principles and workflow are based on Cornell University’s digital imaging tutorial:
  http://www.library.cornell.edu/preservation/tutorial/index.html
  - available in English, French, Spanish

6 INIS Workflow
- Document Benchmarking
- Document Preparation
- Scanning
- Quality Control
- Image Enhancement
- Metadata Creation/Validation
- Export including Compression
- Completeness Check
- Back-up
- Post-processing
- Storage and dissemination

7 Benchmarking & Document Preparation
Benchmarking:
- Adequately capture the ‘original’ content in digital form?
- Physical format & condition meet digitizing requirements?
- What is the type of material to be digitized?
- Which resolution?
- At which bit-depth?
- Which compression parameters?
- Estimated accuracy level for OCR?
- Other considerations?
Preparation:
- Physically (unbind, remove staples/clips, etc.)
- Structurally (add/remove barcodes, separate chapters, parts, etc.)
- Characteristics of paper (e.g. size, thickness, glossy/matt, condition)

8 Scanning – Capture Modes & Optical Resolution
Capture modes: depend on the physical form of the original
- Bitonal: 1 bit/pixel – black & white (printed text)
- Greyscale: 8 bits/pixel – 256 grey shades (black & white photographs)
- Colour: 24 bits/pixel – 16 million colours & grey shades (continuous tone & colour)
Optical Resolution:
- “dots per inch” (DPI) or “pixels per inch” (PPI)
- High resolution -> fine detail -> large file size
- Bit depth: amount of information captured
  - Greater bit depths -> more accurate representation
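The link between resolution, bit depth and file size can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the A4 page size and the choice to count raw pixel data (no file headers, no compression) are assumptions, not part of the INIS workflow.

```python
import math

# Bits per pixel for the three capture modes listed on the slide.
BITS_PER_PIXEL = {"bitonal": 1, "greyscale": 8, "colour": 24}

# A4 page in inches (210 mm x 297 mm); assumed here purely for illustration.
A4_INCHES = (210 / 25.4, 297 / 25.4)


def uncompressed_bytes(mode, dpi, page_inches=A4_INCHES):
    """Return the raw (uncompressed) pixel data size of one scanned page."""
    width_px = round(page_inches[0] * dpi)
    height_px = round(page_inches[1] * dpi)
    total_bits = width_px * height_px * BITS_PER_PIXEL[mode]
    return math.ceil(total_bits / 8)


if __name__ == "__main__":
    for mode in BITS_PER_PIXEL:
        size_mb = uncompressed_bytes(mode, 300) / 1_000_000
        print(f"{mode:9s} at 300 dpi: {size_mb:.1f} MB")
```

Under these assumptions an A4 page at 300 dpi needs roughly 1.1 MB as a bitonal scan but about 26 MB in 24-bit colour, before any compression is applied.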

9 Scanning at INIS – Capture & Optical Resolution
INIS practice:
- Standard scanner settings (for plain b/w text):
  - bitonal (black & white)
  - 300 dpi
- Special cases (colour, pictures):
  - greyscale and colour
  - 200 – 300 dpi with 8 bit depth (256 colours/tones)
- IMPORTANT: post-processing image compression needed to reduce file size
- NEVER use colour settings to scan B/W documents

10 Quality Control - QC
Retain:
- value
- utility
- integrity of resources
Verify:
- quality
- accuracy
- consistency
INIS verifies:
- accuracy & completeness (e.g. same number of pages?)
- data integrity
- correctness of metadata
  - form and validity
  - correct matching of metadata and image files
- ‘checksum’ algorithm (authenticity & integrity of digitized files)
  - number & order of bytes (e.g. after move, copy, transfer, burn)
- visual inspection: resolution, colour, tone, appearance
  - attention: perception changes with lighting and monitors
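The checksum step can be sketched as follows. The slide does not name an algorithm or a tool, so SHA-256 and the function names `file_checksum` and `verify` are assumptions chosen for illustration.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so that large scans need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """Return True if the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

A checksum recorded when the master file is created can be compared again after every move, copy, transfer or burn; any change in the number or order of bytes produces a different digest.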

11 Image Enhancement
Definition: Any process applied to the raw scan to improve quality or legibility of the resource
Image Enhancement at INIS:
- despeckling
- deskewing
- noise reduction
- black border removal
- colour and tone adjustment, etc.
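As a toy illustration of despeckling, the sketch below removes isolated black pixels from a bitonal image held as a list of rows (1 = black, 0 = white). This is a deliberately simple filter under assumed conventions; it is not the method used by INIS or by any particular scanning software, which typically apply more robust filters.

```python
def despeckle(image):
    """Return a copy of a bitonal image with isolated black pixels removed.

    A black pixel counts as a speckle when none of its eight neighbours
    is black. The input image is left unchanged.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            has_black_neighbour = any(
                image[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_black_neighbour:
                cleaned[r][c] = 0
    return cleaned
```

Because neighbours are always read from the original image, the result does not depend on the order in which pixels are visited.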

12 Quality Control and Image Enhancement (1)
- Skewed?

13 Quality Control and Image Enhancement (2)
- Noisy (e.g. unnecessary dots)?

14 Quality Control and Image Enhancement (3)
- Black border?

15 Quality Control and Image Enhancement (4)
IMPORTANT
- Paper size must match document hard copy: A4 ≠ Letter size
- Text cut = RESCAN
- If noticed during QC of incoming PDF, INIS will request the Input Centre to resend the page

16 File Formats
- Very important: prefer ‘non-proprietary’ formats
- Several standard file formats exist
  - different resolution, bit-depth, colour capabilities, etc.
INIS Digital Collection:
1. From ‘Paper’ or ‘Microfiche’ to ‘Digital’:
   - Master images in TIFF Group IV (b/w), in JPEG (colour)
   - Majority full-text searchable PDF
2. Digital files received from INIS National Centres
   - PDF
   - Compression: JBIG2 (b/w), JPEG (colour)

17 Preservation Formats
- PDF: open standard, official ISO 32000-1:2008
- PDF/A: long-term archiving of electronic documents
  - Creation of PDF documents whose visual appearance will remain the same over the course of time
  - Official ISO standard: ISO 19005-1:2005
  - Further development ongoing: http://www.pdfa.org
INIS:
- is considering adopting PDF/A for efficient preservation and long-term archival of the Agency’s and Member States’ nuclear information resources
- pilot project in 2009

18 OCR – Optical Character Recognition
Printed text -> searchable as electronic text
Primary objective for INIS digitization projects: creation of ‘searchable full text’
INIS:
- major tool for mass production: ABBYY FineReader 8
- ~ 98% accuracy: printed text in Latin & Cyrillic characters
Satisfactory testing with Script and Arabic Characters:
- Adobe Acrobat Professional 8.0: Chinese (Simplified), Japanese, Korean
- ABBYY FineReader Pro 9: Hebrew, Thai
- VERUS™ Professional: Arabic

19 Various OCR types
- Typewritten
- Hand print and cursive
- Fraktur
- Music scores
- MICR (Magnetic Ink Character Recognition)

20 OCR process 1 (no or wrong dictionary)

21 OCR process 2 (proper dictionary)

22 OCR - value added
- Image layer: scanned (raster) image, visual representation of the original document
- Hidden text: enables full-text search, extra information for search engines
- (diagram label: errors)

23 Storage of Digital Files
- Mandatory: reliable & controlled environment
- Storage of master files: high quality, industry standard devices, e.g. CD-R, DVD, or other contemporary reliable media
- Backup of master files: regularly, off-site, secure location
- RAID: Redundant Array of Independent Disks
  - several drives act collectively as a single storage system
  - consider RAID for large production environment
INIS: THECUS N5200B PRO, 5 x 3.5" SATA RAID
- 5 disks
- 1 TB each
- configured as local network data storage
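How much of a 5 x 1 TB array is usable depends on the RAID level, which the slide does not state. The sketch below compares a few common levels under textbook formulas; the function name is hypothetical, and file system overhead and the difference between TB and TiB are ignored.

```python
def usable_capacity_tb(disks, disk_tb, level):
    """Nominal usable capacity of an array of equal-sized disks.

    Supported levels (textbook formulas only):
      0: striping, no redundancy        -> disks * disk_tb
      1: every disk mirrors the same data -> disk_tb
      5: one disk's worth of parity     -> (disks - 1) * disk_tb
      6: two disks' worth of parity     -> (disks - 2) * disk_tb
    """
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return disks * disk_tb
    if level == 1:
        return disk_tb
    if level == 5:
        return (disks - 1) * disk_tb
    return (disks - 2) * disk_tb


if __name__ == "__main__":
    for level in (0, 5, 6):
        print(f"RAID {level}: {usable_capacity_tb(5, 1, level)} TB usable")
```

With five 1 TB disks this gives 5 TB at RAID 0, 4 TB at RAID 5 and 3 TB at RAID 6, so redundancy is paid for in capacity. RAID protects against disk failure only; it does not replace the off-site back-up of master files.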

24 Back-up and Off-Site Storage
Create: regular back-ups of master files
Store: remote from the original source in a secure location
INIS:
- 1970 to 1997: ‘microfiche’
  - NCL full text: paper -> microfiche
  - safe, long-term storage -> INIS National Centres
  - full set of NCL microfiche -> Austrian Central Lib. of Physics
- From 1997: ‘digital’
  - NCL on CD: INIS Document Delivery Centres (National Centres)
  - Secure “off site” & back-up: Austrian Central Lib. of Physics
- 2008: microfiche to PDF
  - Austrian Central Lib. of Physics
  - INIS National Centres
  - INIS Online Database

25 Preservation Planning
- Contents of digital files must remain ‘meaningful’
Different processes:
1. Refreshing: copy files from one storage medium to another
   - verify authenticity & integrity of the files (e.g. checksum)
2. Migration: transfer files from one HW & SW configuration to another, or from one computer generation to the next
   - format-based: move files from ‘obsolete’ format to ‘new’ format
3. Emulation: re-create technical environment
   - maintain information about HW & SW = system reengineered
INIS: Refreshing
- CD to DVD (until 2007)
- from 2008: copy to Thecus storage device
- When PDF/A implemented: ‘migration’
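Refreshing, as described above, amounts to copying files to a new medium and confirming that every copy is byte-identical. A minimal sketch follows, assuming both media are mounted as ordinary directories; the function names and the use of SHA-256 are illustrative choices, not the INIS procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_dir, target_dir):
    """Copy every file to the new medium and verify each copy.

    Returns the relative paths of files whose copy does not match the
    source, so an empty list means the refresh succeeded.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    failures = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source_dir)
        dst = target_dir / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            failures.append(relative)
    return failures
```

Comparing against checksums recorded when the masters were created, rather than against the source read at copy time, would additionally detect damage that occurred on the old medium before the refresh.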

26 Metadata
- Key role for digital resources:
  - Describe, process, manage, track, access, preserve
INIS:
- comprehensive ‘bibliographic’ metadata
  - describe the intellectual content of full text
  - bibliographic elements to identify & retrieve resources
- INIS Database: digital resources with bibliographic metadata
- Technical metadata for digital resources: automatic creation with PDF files
- Future: more sophisticated approach with implementation of PDF/A

27 Thank you for your attention!
Your INIS Digital Preservation Team

