
These ain’t “Old News”! Creating access to historic newspapers Christine Guenther OCLC Product Manager, Digital Services Preservation Service Centers Bethlehem, PA CALIFA – January 9, 2009

Objective: Learn about the workflow for turning historical newspapers into a searchable online collection, starting from preservation microfilm or original paper. Prepare for the key decisions that lead to success and help you define your vision and expectations.

Outline: Scanning workflow; Metadata decisions; Access options; Your digital newspaper project

Getting Started: Develop a vision and plan (goal, scope, budget/funding, stakeholders, …). Select content (titles, date range, page count, quality/completeness, copyright, …). Select the format to digitize: film or original? Assess film quality (imaging, collation, film generation). No film available? Consider analog preservation as part of the digital project.

Film generations: Archive Master (1st generation); Print Master (2nd generation); Service Copies (3rd generation). Best choice

Example: Heavily scratched Service Copy (callouts: heavy scratches; the line that separates columns)

Example: Uneven lighting – lost text

Acetate or Polyester? Acetate is not stable for long-term preservation! (Caring for your film may be a byproduct of a digital project.) Image labels: Polyester; Acetate (light blocked).

Scanning from Paper. Key points: bound or disbound? Collation?! Conservation? Cost of imaging. Color!

The Roadmap: Selection; Image Quality; Metadata Quality; Access Quality

Section 1: Digitization. Digitization options are relatively simple (1-bit vs. 8-bit, film vs. original). Recommended: ppi. Aim for the best-quality digital image; the master file is typically a TIFF file.

Section 1: Digitization – 1-bit (= bitonal) vs. 8-bit (= grayscale)
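To make the 1-bit vs. 8-bit choice concrete, the sketch below computes the raw, uncompressed size of one scanned page at each bit depth. The page dimensions (17 × 23 inches) and the 400 ppi resolution are hypothetical values chosen for illustration, not figures from the presentation; real TIFF masters are usually compressed, so stored sizes will differ.

```python
# Hypothetical scan parameters (assumptions for illustration only).
PAGE_WIDTH_IN = 17.0   # assumed page width in inches
PAGE_HEIGHT_IN = 23.0  # assumed page height in inches
PPI = 400              # assumed scanning resolution (pixels per inch)


def uncompressed_bytes(width_in: float, height_in: float, ppi: int, bits_per_pixel: int) -> int:
    """Raw image size in bytes, padding each pixel row up to a whole byte."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


bitonal = uncompressed_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, PPI, 1)    # 1-bit
grayscale = uncompressed_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, PPI, 8)  # 8-bit
print(f"1-bit: {bitonal / 1e6:.1f} MB, 8-bit: {grayscale / 1e6:.1f} MB")
```

Under these assumptions a grayscale page is eight times larger than its bitonal counterpart before compression, which is one reason the bit-depth decision carries through to storage costs.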

Section 2: Content Conversion. Content conversion is the major intersection, and it is tied to your vision for access (the presentation system). Determine which digital building blocks the planned presentation system needs: metadata creation/collection (including text recognition, i.e. OCR); JPEG/JPEG 2000; XML (METS/ALTO or other); PDF.
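As an illustration of the XML building block, the sketch below turns OCR word boxes into a simplified ALTO-style tree (Layout → Page → PrintSpace → TextBlock → TextLine → String, with word coordinates in HPOS/VPOS/WIDTH/HEIGHT). It is deliberately not schema-valid ALTO: it omits the namespace, the Description section, and spacing elements, and the sample words and coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET


def words_to_alto_like(words, page_width, page_height):
    """Build a simplified, ALTO-style XML tree for one line of OCR words.

    words: list of (text, hpos, vpos, width, height) tuples in pixel units.
    """
    alto = ET.Element("alto")
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(layout, "Page", ID="P1",
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    space = ET.SubElement(page, "PrintSpace")
    block = ET.SubElement(space, "TextBlock", ID="TB1")
    line = ET.SubElement(block, "TextLine")
    for i, (text, hpos, vpos, width, height) in enumerate(words, start=1):
        # One String element per recognized word, carrying its bounding box.
        ET.SubElement(line, "String", ID=f"S{i}", CONTENT=text,
                      HPOS=str(hpos), VPOS=str(vpos),
                      WIDTH=str(width), HEIGHT=str(height))
    return alto


# Hypothetical OCR output for a headline.
sample = [("MAYOR", 120, 80, 310, 64), ("RESIGNS", 450, 80, 400, 64)]
xml_text = ET.tostring(words_to_alto_like(sample, 6800, 9200), encoding="unicode")
print(xml_text)
```

Keeping word coordinates alongside the text is what lets a presentation system highlight search hits on the page image.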

OCR – Optical Character Recognition: simple OCR (uncorrected) vs. enhancements (headline/byline correction, article classification, text correction)
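One very simple form of automated text correction is dictionary lookup: replace an unrecognized word with the closest word from a trusted word list. The sketch below does this with Python's `difflib.get_close_matches`; the tiny lexicon and the 0.75 similarity cutoff are hypothetical, and this is only an illustration of the idea, not how any particular vendor's correction service works. In practice such rules can also "correct" legitimate proper names, which is one reason headline and byline correction is often done by people.

```python
import difflib

# Hypothetical word list; a real project would use a much larger lexicon.
LEXICON = ["the", "mayor", "of", "philadelphia", "resigns", "newspaper", "council"]


def correct_word(word: str, lexicon=LEXICON, cutoff: float = 0.75) -> str:
    """Return the closest lexicon entry for an unknown word, else the word itself."""
    lowered = word.lower()
    if lowered in lexicon:
        return lowered
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else lowered


ocr_tokens = ["Mayer", "of", "Philadelphla", "councll", "xyzzy"]
print([correct_word(t) for t in ocr_tokens])
```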

OCR – the rocky road to “99%” (?) Input: a “photo” of the page. Zoning: columns and reading order. Analysis of characters/words: recognition. All-caps fonts (major headlines) yield low accuracy. OCR is a cost-effective tool for gaining “full-text” searchability.
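Accuracy claims such as “99%” are commonly expressed as character accuracy: one minus the number of character edits (insertions, deletions, substitutions) needed to turn the OCR output into a hand-keyed transcription, divided by the length of that transcription. The sketch below implements that measure with a standard Levenshtein edit distance; the sample strings are hypothetical, and vendors may define and sample accuracy differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """1 - (edit distance / ground-truth length), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_text)
    return max(0.0, 1 - errors / len(ground_truth))


# Hypothetical example: "h" read as "b", and "m" read as "rn".
truth = "The Ambler Gazette"
ocr = "Tbe Arnbler Gazette"
print(f"{character_accuracy(truth, ocr):.1%}")
```

Three character errors in an 18-character line already drop accuracy well below 99%, which shows how quickly small recognition errors add up.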

Main Choices for Content Conversion: an image-only approach (aka “digital microfilm”), a PDF-based approach, or an integrated model in which page images and metadata are brought together in a presentation system.

PDF-based presentation. Pros: common format; OCR; multi-page; free reader; printing. Cons: slow; not suitable for 8-bit; secondary searches; not scalable; hidden searchable text.

Integrated Presentation: Page Level. Example of page-level access: CONTENTdm. Features: bitonal or grayscale; search across collections; primary hits in JPEG 2000; clipping tool; rich metadata, not only from OCR but also Dublin Core.
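The idea behind “search across collections” with “rich metadata, not only from OCR” can be sketched as a single inverted index that covers both the OCR text and descriptive fields of every page, whatever collection it belongs to. The record layout, the page IDs, the dates, and the OCR snippets below are hypothetical (the two collection names are taken from the examples later in this talk), and this is not a description of how CONTENTdm indexes content.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageRecord:
    page_id: str
    collection: str
    dc_title: str  # Dublin Core "title" element
    dc_date: str   # Dublin Core "date" element
    ocr_text: str  # uncorrected full text from OCR


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages) -> dict:
    """Map each token to the set of page IDs whose metadata or OCR contains it."""
    index = defaultdict(set)
    for page in pages:
        for token in tokenize(" ".join([page.dc_title, page.dc_date, page.ocr_text])):
            index[token].add(page.page_id)
    return index


def search(index, query: str) -> set:
    """Return IDs of pages that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


pages = [
    PageRecord("gaz-1921-03-04-p1", "Ambler Gazette", "Ambler Gazette",
               "1921-03-04", "Borough council meets on new trolley line"),
    PageRecord("spec-1950-10-12-p1", "Seattle Spectator", "The Spectator",
               "1950-10-12", "Student council elects new officers"),
]
index = build_index(pages)
print(sorted(search(index, "council")))
```

A single index spanning all collections is what lets one query return hits from both titles at once.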

Integrated Presentation: Article Level (with article segmentation)

Section 3: Presentation. Digital newspaper collections go live! Page-level access in CONTENTdm: AccessPA group license (www.accesspadigital.org): Lycoming College, PA; Wissahickon Valley PL, PA – Ambler Gazette. Article-level access in CONTENTdm: Seattle Spectator.

Outlook – The Challenges: Analog preservation (film) vs. electronic preservation: file sizes and storage costs; scanning with digital preservation in mind creates loads of data. “If you give the mouse a cookie….” (aka setting expectations). Regaining full-text logic from a photograph of a page. Newspapers are oversize and in portrait format, while screens are landscape: zooming will improve legibility but will not show the full page at the same time. Access without a DAM (digital asset management system) is not practical, but it has associated costs.
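The portrait-page-on-a-landscape-screen problem can be quantified with a little arithmetic. The sketch assumes a hypothetical 6800 × 9200 pixel page image (a 17 × 23 inch page scanned at an assumed 400 ppi) and a 1920 × 1080 display; it computes the zoom needed to fit the whole page, the effective resolution at that zoom, and how much of the page is visible at 100%.

```python
# Hypothetical sizes (assumptions for illustration only).
PAGE_W_PX, PAGE_H_PX = 6800, 9200  # page image in pixels
SCAN_PPI = 400                     # assumed scanning resolution
SCREEN_W_PX, SCREEN_H_PX = 1920, 1080


def fit_scale(page_w: int, page_h: int, screen_w: int, screen_h: int) -> float:
    """Largest zoom factor at which the whole page fits on the screen."""
    return min(screen_w / page_w, screen_h / page_h)


def visible_fraction_at_full_size(page_w: int, page_h: int, screen_w: int, screen_h: int) -> float:
    """Share of the page area visible when displayed at 100% zoom."""
    return min(1.0, screen_w / page_w) * min(1.0, screen_h / page_h)


scale = fit_scale(PAGE_W_PX, PAGE_H_PX, SCREEN_W_PX, SCREEN_H_PX)
effective_ppi = SCAN_PPI * scale
visible = visible_fraction_at_full_size(PAGE_W_PX, PAGE_H_PX, SCREEN_W_PX, SCREEN_H_PX)
print(f"fit-to-screen zoom: {scale:.1%} (about {effective_ppi:.0f} ppi on screen)")
print(f"visible at 100% zoom: {visible:.1%} of the page")
```

With these numbers the page's height is the limiting dimension, so the full page only fits at roughly 12% zoom, while at full size only a few percent of the page fits on screen: the legibility/overview trade-off described above.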

Resources National Digital Newspaper Program (NDNP) (partnership of the Library of Congress and the National Endowment for the Humanities)

Questions? Today: break-out sessions. “Tomorrow”: contact Christine Guenther, OCLC Preservation Service Center, Bethlehem, PA. Thank you!