Erin Kinney, Wyoming State Library
Motivation
  #1 priority that came out of the 2004 statewide digitization meeting
  WSL received many reference questions, obituary requests, and ILL requests
Project
  Digitize all newspapers published in Wyoming* and make them easily accessible over the internet.
  *Preserved on microfilm at the Wyoming State Archives
1,436 microfilm reels; 850,000 full pages; 8,000,000 clippings
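As a rough sanity check on scale, the three totals above imply about 590 pages per reel and a little over 9 clippings per page on average. The short sketch below only divides the reported totals; the variable names are illustrative and not part of the project.

```python
# Totals as reported for the project.
reels = 1_436
pages = 850_000
clippings = 8_000_000

# Simple averages; actual reels and pages will vary widely.
pages_per_reel = pages / reels
clippings_per_page = clippings / pages

print(f"{pages_per_reel:.0f} pages per reel on average")
print(f"{clippings_per_page:.1f} clippings per page on average")
```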
Funding
  Applied for a 2006 NDNP grant, which was not funded.
  The Wyoming State Legislature appropriated $940,000 to the State Library in FY
  Requested additional funding in FY09-10, which was later denied.
The Partners
  Wyoming State Library
  Wyoming State Archives
  University of Wyoming American Heritage Center
  Wyoming Press Association
  Wyoming State Historical Society
Partners
  Wyoming State Archives provided copies of master microfilm reels
  Wyoming State Historical Society provided metadata workers
  The company picked to do the work was PTFS from Bethesda, MD
Why PTFS?
  Expertise: people, process, software, hardware
  More than ten years of imaging experience
  All media types, qualities, formats
  Many hardware and software configurations
  R&D: development of an archiving system has helped PTFS perfect imaging capabilities
Technical Requirements
  All text searchable
  Content management system (CMS) with a web interface and a customizable thesaurus
  Very powerful search engine
Technical Standards
  Project followed 2007 NDNP best practices
  High Accuracy OCR & Auto-Zoning
  400 dpi grayscale
  Enhanced metadata
  Articles, legal/land notices, and advertisements clipped
Digitization Processes
  Receive newspaper microfilm reels; inventory control
  Categorize, sort, prepare
  Scan microfilm at 400 dpi
  Export to USB external drive
  Enhance metadata
  Post image processing
  Data formatting for system
  QC/QA
  OCR images
  ArchivalWare Approval Server
  Zone, crop & de-skew full images
  Create image/text PDFs: full page & clippings
Image Sizes
  File                                  Approx. Size
  1 reel strip image (~1000 pages)      50 GB
  2 page up TIFF images                 50 MB
  Archive TIFF images                   30 MB
  Uncompressed page level PDFs          5 MB
  Compressed page level PDFs            kb
  Clipping PDFs (uncompressed)          kb
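To give a feel for what these per-file sizes mean at project scale, the sketch below multiplies them by the 1,436 reels and 850,000 pages reported for the project. It is a back-of-the-envelope estimate only. It assumes decimal units, one 50 GB strip image per reel, and one archive TIFF and one uncompressed PDF per page; the slides do not say which of these files are retained long term, and the constant and function names are illustrative.

```python
# Totals and approximate per-file sizes as given on the slides.
REELS = 1_436
PAGES = 850_000
REEL_STRIP_GB = 50        # one strip image per reel (assumed)
ARCHIVE_TIFF_MB = 30      # one archive TIFF per page (assumed)
UNCOMPRESSED_PDF_MB = 5   # one uncompressed PDF per page (assumed)


def gb_to_tb(gigabytes: float) -> float:
    """Convert GB to TB using decimal units (1 TB = 1,000 GB)."""
    return gigabytes / 1_000


def mb_to_tb(megabytes: float) -> float:
    """Convert MB to TB using decimal units (1 TB = 1,000,000 MB)."""
    return megabytes / 1_000_000


reel_strips_tb = gb_to_tb(REELS * REEL_STRIP_GB)
archive_tiffs_tb = mb_to_tb(PAGES * ARCHIVE_TIFF_MB)
page_pdfs_tb = mb_to_tb(PAGES * UNCOMPRESSED_PDF_MB)

print(f"Reel strip images:      {reel_strips_tb:.1f} TB")
print(f"Archive TIFF images:    {archive_tiffs_tb:.1f} TB")
print(f"Uncompressed page PDFs: {page_pdfs_tb:.2f} TB")
```

Under these assumptions the archive TIFFs alone come to about 25.5 TB and the raw reel strip images to roughly 72 TB, before any clipping files or backup copies are counted.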
PTFS Confidential
Challenges: Image Quality, OCR Accuracy
  Difficult to achieve high OCR accuracy
  Original text quality varies: yellowed paper, bleed-through, fading, bound-page curvature
  Microfilm quality varies: dark borders, washed-out sections, out of focus
  Misc: scratches, thumbs, tape, staples!!
  Grayscale is best, but results in large file sizes
Challenges
  Rules for zoning (for clipping) are complicated to design and execute
  Newspaper formats vary widely from title to title & year to year
  Determine zoning rules and follow them consistently
Challenges
  “NDNP Ready”: imaging & metadata standards, XML packets
  Massive storage requirements: many, many terabytes of storage
  File types: TIFF and PDF
  Browse hierarchy
    Determines organization of collection
    Supports logical presentation
  Page & clipping relationships
Solutions
  Browse hierarchy
    Organized by county/city, then newspaper title, year, month, date
    Pages and clippings will be presented together
  Page & clipping relationship
    Clippings linked to pages
  Archive quality image location
    Archive quality images transported via USB external drive and backed up to tape (twice!)
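One way to picture the browse hierarchy and the page-to-clipping link is the small data model below. It is a hypothetical sketch, not the structure used by the project's content management system: the class names, field names, and the `build_browse_tree` helper are all invented for illustration. Only the ordering (county/city, newspaper title, year, month, date) and the rule that clippings link back to pages come from the slide.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(eq=False)
class Page:
    """A full newspaper page (hypothetical model)."""
    county_or_city: str
    newspaper_title: str
    issue_date: date
    page_number: int
    clippings: list["Clipping"] = field(default_factory=list, repr=False)

    def browse_path(self) -> tuple:
        """Keys in browse order: county/city, title, year, month, date."""
        d = self.issue_date
        return (self.county_or_city, self.newspaper_title, d.year, d.month, d.day)


@dataclass(eq=False)
class Clipping:
    """An article, notice, or advertisement clipped from a page."""
    kind: str
    page: Page  # every clipping links back to its source page


def add_clipping(page: Page, kind: str) -> Clipping:
    """Create a clipping and register it on its page, keeping both links in sync."""
    clipping = Clipping(kind=kind, page=page)
    page.clippings.append(clipping)
    return clipping


def build_browse_tree(pages: list[Page]) -> dict:
    """Nest pages into dictionaries following the browse order."""
    tree: dict = {}
    for page in pages:
        node = tree
        for key in page.browse_path():
            node = node.setdefault(key, {})
        # Pages for one issue date sit together; their clippings travel with them.
        node.setdefault("pages", []).append(page)
    return tree
```

Because each clipping holds a reference to its page and each page lists its clippings, a browse view can show the two together at the issue-date level, and a search hit on a clipping can lead back to the full page it came from.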
Lessons Learned
  Lots of open communication between all partners, the contractor, and sub-contractors
  Start looking for money early, but make sure you have your ducks in a row.
  Don’t get discouraged.
Lessons Learned
  Think of the long-term implications of decisions made at the beginning of the project
  Decisions made at the beginning of the project can have unforeseen, and often huge, implications.
Opportunities
  Fill in gaps in coverage
  Orphan papers: publishers and even towns that no longer exist
  “New to us” newspaper titles that haven’t been located and microfilmed yet
Wyoming Newspaper Project
  Contact: Erin Kinney, Digital Initiatives Librarian