1 Australian Newspapers Digitisation Program Development of the Newspapers Content Management System Rose Holley – ANDP Manager ANPlan/ANDP Workshop, 28 November 2008

2 Requirements
Manage, store and organise millions of digital newspaper pages behind the scenes. Manage the entire digitisation workflow, from scanning to public delivery.

3 How?
The current NLA Digital Content Management System cannot cope with the volume of digital newspapers or their complex structure. No 'off the shelf' product is available that meets the requirements. The system is needed now (March 2007).

4 Solution
An NLA team to develop a software solution. Ensure the system uses open source software. The system to be standalone, not bolted into other systems. Possibility of sharing the system in future, or providing it as open source to other libraries.

5 Software Development
An agile method of development was used, with modules designed in stages as required:
Stage 1: Receipt and checking of scanned images
Stage 2: Quality Assurance modules
Stage 3: Sending/receiving items from OCR
Stage 4: System administration and statistics
Stage 5: Interface design and usability of the system

6 Progress
Software development: March 2007 to June 2008. First module in use May 2007. CMS in use for 18 months. CMS in final stages of completion (January to June 2009). Further development required to enable acceptance of contributors' content. Simple user interface yet to be designed.


8 Australian Newspapers CMS
Screenshots of the system and explanations of the workflows follow.

9 Workflow Summary
Preparing for digitisation
Creation of digital images
Adding metadata and Quality Assurance
Optical Character Recognition
Quality Assurance
Statistics and admin

10 Preparing for Digitisation
Identify the title to be digitised. Source the master microfilm from its owner. Send the master microfilm to the scanning contractors. Add the title to the Content Management System.

11 CMS - Add Title

12 Microfilm converted to digital images

13 Image Reception
Images are received from the scanning contractor on LTO2 tape. Tapes are added to the tape robot and the images extracted. Reels are automatically added to the Content Management System. Reel details are checked. Images are ingested into the Content Management System.

14 CMS - Check Reel Details

15 CMS - Ingest Reels

16 CMS - Tasks 1 and 2
Task 1: Add metadata (dates and page numbers). A supervisor reviews marked pages.
Task 2: Define batches, resolve duplicates and create missing page targets.

17 Identify title to be worked on

18 Identify reel

19 CMS - Adding Metadata
The date and page sequence number are added.

20 Supervisor Review
The supervisor reviews pages marked for attention.

21 CMS - Define Batches
Batches are defined by date. Each batch contains images. Batches are automatically assigned a number.
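The slide says only that batches are defined by date and numbered automatically. A minimal sketch of that step, assuming each batch is an inclusive date range chosen by the operator and that pages already carry the date and sequence number added in Task 1 (the `Page` record and `define_batches` function are hypothetical, not the CMS's own code):

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    # Hypothetical page record carrying the metadata added in Task 1.
    issue_date: date
    sequence: int      # page sequence number within the issue
    image_file: str


def define_batches(pages, date_ranges, first_number=1):
    """Assign pages to batches by inclusive (start, end) date ranges.

    Batches are numbered automatically in date order, starting at first_number.
    Pages whose date falls outside every range are left unassigned.
    Assumes the ranges do not overlap.
    """
    batches = {}
    for number, (start, end) in enumerate(sorted(date_ranges), start=first_number):
        members = [p for p in pages if start <= p.issue_date <= end]
        members.sort(key=lambda p: (p.issue_date, p.sequence))
        batches[number] = members
    return batches
```

Sorting the ranges before numbering means batch numbers follow publication order regardless of the order in which the operator entered them.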

22 CMS - Resolve Duplicates
Duplicate pages are compared and the best copy is selected.
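In the CMS an operator compares duplicates visually. The sketch below assumes two images are duplicates when they share an issue date and page sequence number, and stands in for the operator's judgement with a hypothetical numeric `quality` score; both the record and the scoring are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    issue_date: str    # ISO date, e.g. "1900-01-01"
    sequence: int      # page sequence number from Task 1
    image_file: str    # assumed unique per scanned image
    quality: float     # hypothetical score standing in for the operator's visual comparison


def resolve_duplicates(images):
    """Keep one image per (issue_date, sequence): the copy with the highest quality score.

    On a tie, the copy seen first is kept. Returns (kept, discarded).
    """
    best = {}
    for img in images:
        key = (img.issue_date, img.sequence)
        if key not in best or img.quality > best[key].quality:
            best[key] = img
    kept = sorted(best.values(), key=lambda i: (i.issue_date, i.sequence))
    kept_files = {i.image_file for i in kept}
    discarded = [i for i in images if i.image_file not in kept_files]
    return kept, discarded
```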

23 Missing Pages
Missing page targets are generated.
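How missing pages are detected is not described. One plausible approach, sketched here as an assumption, is to look for gaps in each issue's page sequence numbers and emit a placeholder target for each gap. The expected page count is a hypothetical input; without it, a missing final page cannot be detected.

```python
def missing_page_targets(issue_date, present_sequences, expected_pages=None):
    """Return a placeholder target for each page sequence number absent from an issue.

    If expected_pages is not given, only gaps below the highest page present can be
    detected; a missing final page needs the expected page count (hypothetical input).
    """
    present = set(present_sequences)
    if expected_pages is None:
        expected_pages = max(present, default=0)
    return [
        {"issue_date": issue_date, "sequence": n, "label": "MISSING PAGE"}
        for n in range(1, expected_pages + 1)
        if n not in present
    ]
```

For example, an issue with pages 1, 2 and 4 present and six pages expected would yield targets for pages 3, 5 and 6.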

24 Optical Character Recognition (OCR)
Completed batches are added to a tape. Tapes are generated and written, then sent to the OCR contractor. The contractor completes the OCR processes, and the OCR data (not the images) is returned via FTP.
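The slides do not say how completed batches are allocated to tapes. A simple possibility, shown here purely as an assumption, is next-fit packing: batches are appended to the current tape in completion order until the next one would exceed a given capacity, and then a new tape is started. The capacity is a parameter because the actual tape format used for the OCR shipments is not stated.

```python
def assign_batches_to_tapes(batch_sizes, tape_capacity):
    """Next-fit: append complete batches to the current tape until the next one will not fit.

    batch_sizes: list of (batch_number, size) in the order batches were completed,
    with size in the same unit as tape_capacity.
    Returns a list of tapes, each a list of batch numbers.
    """
    tapes, current, used = [], [], 0
    for number, size in batch_sizes:
        if size > tape_capacity:
            raise ValueError(f"batch {number} is larger than one tape")
        if used + size > tape_capacity:
            tapes.append(current)       # current tape is full: close it
            current, used = [], 0
        current.append(number)
        used += size
    if current:
        tapes.append(current)
    return tapes
```

Next-fit keeps batches in completion order on tape, which suits shipping tapes as soon as they fill; a first-fit or best-fit strategy would use fewer tapes at the cost of reordering.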

25 CMS - Tapes Created
Completed batches added to a tape.

26 Optical Character Recognition (OCR) of pages and article zoning

27 OCR Data Reception (automated process)
The OCR contractor advises the NLA server that a batch has been completed. The NLA server downloads the batch, which is then ingested into the Content Management System. Checks are performed on data validity and QA derivatives are generated. Articles may now be searched, but are not yet publicly accessible.
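The automated reception steps form a fixed sequence that halts if the validity checks fail. The sketch below models that sequence as a small state progression. The step functions (`download`, `ingest`, `validate`, `make_qa_derivatives`) are hypothetical placeholders passed in by the caller, so nothing here depends on a particular FTP client or OCR file format.

```python
from enum import Enum, auto


class BatchState(Enum):
    NOTIFIED = auto()           # contractor has advised that the batch is complete
    DOWNLOADED = auto()
    INGESTED = auto()
    VALIDATION_FAILED = auto()
    READY_FOR_QA = auto()       # derivatives built; searchable, not yet public


def receive_ocr_batch(batch_id, download, ingest, validate, make_qa_derivatives, log):
    """Run the automated reception steps, in order, for one completed OCR batch.

    Each state reached is appended to `log` as (batch_id, state).
    Processing stops before derivatives are built if validation reports problems.
    """
    def reach(state):
        log.append((batch_id, state))
        return state

    reach(BatchState.NOTIFIED)
    files = download(batch_id)
    reach(BatchState.DOWNLOADED)
    record = ingest(batch_id, files)
    reach(BatchState.INGESTED)
    problems = validate(record)
    if problems:
        return reach(BatchState.VALIDATION_FAILED), list(problems)
    make_qa_derivatives(record)
    return reach(BatchState.READY_FOR_QA), []
```

Passing the steps in as functions keeps the ordering and the stop-on-failure rule in one place and makes each step easy to replace or test on its own.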

28 CMS - Batch information

29 Quality Assurance (QA)
A random sample of issues and articles is checked. Volume and issue numbers are checked for accuracy. Sample articles are checked against the agreed Quality Acceptance Criteria (QAC), and error rates are calculated against the QAC on the fly. A supervisor checks the final results.
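The two mechanisms on this slide are a random sample and an error rate that updates as each item is checked. A minimal sketch, assuming simple random sampling without replacement and a QAC expressed as "items wrong out of items checked". The actual QAC measures and sample sizes are not given in the slides, so the unit counted here is an assumption.

```python
import random


def draw_qa_sample(article_ids, sample_size, seed=None):
    """Draw a simple random sample (without replacement) of articles to check."""
    population = list(article_ids)
    rng = random.Random(seed)
    return rng.sample(population, min(sample_size, len(population)))


class ErrorRateTracker:
    """Running error rate, updated as each sampled article is checked ("on the fly").

    The unit counted (fields, characters, words) depends on the QAC; here it is
    simply "items checked" versus "items wrong".
    """

    def __init__(self):
        self.checked = 0
        self.wrong = 0

    def record(self, items_checked, items_wrong):
        if not 0 <= items_wrong <= items_checked:
            raise ValueError("items_wrong must be between 0 and items_checked")
        self.checked += items_checked
        self.wrong += items_wrong
        return self.rate

    @property
    def rate(self):
        return self.wrong / self.checked if self.checked else 0.0
```

Passing a `seed` makes a sample reproducible, which would let a supervisor re-examine exactly the articles a checker saw.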

30 CMS - Selecting the batch

31 Volume & Issue Number Check

32 Article checked against QAC

33 Re-keyed fields checked for accuracy

34 Supervisor checks results (auto or manual accept/reject)

35 QA Results
An automated email is sent to the supplier advising the result. Emails for rejected batches include a summary of errors. A summary of errors is saved for all batches. Accepted batches are immediately accessible in the public search system.
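The automatic accept/reject decision and the supplier notification can be sketched together. This assumes the decision is a comparison of the batch error rate against a single limit (`max_error_rate`, a hypothetical stand-in for whatever the QAC specifies). It also assumes the error summary is a count of error labels recorded during QA, included in the message only for rejected batches but returned for every batch so it can be saved. A supervisor's manual accept or reject would simply override the computed result.

```python
from collections import Counter


def qa_result_message(batch_number, error_rate, max_error_rate, error_labels):
    """Decide accept/reject against a threshold and compose the supplier notification.

    Returns (accepted, message_text, error_summary). The summary is returned for
    every batch; it is written into the message only when the batch is rejected.
    """
    accepted = error_rate <= max_error_rate
    verdict = "ACCEPTED" if accepted else "REJECTED"
    lines = [f"Batch {batch_number}: {verdict} "
             f"(error rate {error_rate:.1%}, limit {max_error_rate:.1%})"]
    summary = Counter(error_labels)
    if not accepted:
        for label, n in summary.most_common():
            lines.append(f"  {label}: {n}")
    return accepted, "\n".join(lines), dict(summary)
```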

36 Batch History and details retained


38 Search or Browse articles within CMS

39 Statistics
Statistics for content received, QA'd and delivered to the public are generated by the Content Management System. (Statistics on usage of the public search system are collected using Google Analytics.)
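Content statistics of this kind can be derived from each batch's current workflow status. The sketch below assumes three hypothetical status labels in workflow order and treats counts as cumulative, so a batch delivered to the public also counts as received and QA'd. Usage statistics for the public site are outside its scope.

```python
STAGES = ("received", "qa_complete", "public")  # hypothetical status labels, in workflow order


def content_statistics(batches):
    """Count batches and pages that have reached each stage.

    batches: iterable of (status, page_count). Counts are cumulative: a public batch
    has necessarily also been received and QA'd, so it is counted at all three stages.
    """
    totals = {stage: {"batches": 0, "pages": 0} for stage in STAGES}
    for status, pages in batches:
        reached = STAGES.index(status)  # ValueError for an unknown status
        for stage in STAGES[: reached + 1]:
            totals[stage]["batches"] += 1
            totals[stage]["pages"] += pages
    return totals
```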

40 CMS - Content Statistics

41 CMS - Work Statistics

42 Access
Public access to the digital newspapers is provided through the Australian Newspapers Search and Delivery System. Users can search or browse newspapers. Search results can be refined using filters. Users can browse by newspaper title or date.
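The slides do not describe how the public search system is built. To illustrate the behaviour listed above (keyword search refined by title and date filters, and browsing by title or date), here is a deliberately naive in-memory sketch. The `Article` record and both functions are hypothetical. Matching is by substring, so a production system would instead use a proper full-text index.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    newspaper_title: str
    issue_date: date
    heading: str
    text: str


def search(articles, query, newspaper_title=None, date_from=None, date_to=None):
    """Return articles containing every query word, optionally filtered by title and date range."""
    terms = query.lower().split()
    results = []
    for a in articles:
        haystack = f"{a.heading} {a.text}".lower()
        if not all(term in haystack for term in terms):
            continue
        if newspaper_title is not None and a.newspaper_title != newspaper_title:
            continue
        if date_from is not None and a.issue_date < date_from:
            continue
        if date_to is not None and a.issue_date > date_to:
            continue
        results.append(a)
    return sorted(results, key=lambda a: a.issue_date)


def browse_index(articles):
    """Map each newspaper title to its sorted list of distinct issue dates."""
    index = {}
    for a in articles:
        index.setdefault(a.newspaper_title, set()).add(a.issue_date)
    return {title: sorted(dates) for title, dates in sorted(index.items())}
```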
