Helping communities access and explore their newspaper heritage. Rose Holley – Manager, Newspaper Digitisation Program


1 Helping communities access and explore their newspaper heritage.
Rose Holley – Manager, Newspaper Digitisation Program
http://www.nla.gov.au/ndp
rholley@nla.gov.au
Australian Media Traditions Conference, 23 November 2007, Charles Sturt University, Bathurst

2 Status of the Program
November 2006
– Minister for Arts and Sports approval
– Budget approval: $8 million for 3 million pages over 4 years
– Contracts signed with digitisation suppliers
April 2007
– Program pilot phase commences

3 Content and Coverage
National content
Initially a title from each state
Focus on major titles from each state first
Anticipated that ‘regional’ titles may be contributed later
Coverage: published between 1803 and 1954 (out of copyright)
Titles: West Australian, Northern Territory Times, Courier Mail, Advertiser, Sydney Gazette, Argus, Mercury, Canberra Times

4 First Newspaper
First page of the first Australian newspaper ever published: The Sydney Gazette and New South Wales Advertiser, Saturday March 5 1803

5 Through 150 years
Coverage runs up to 1954 (after which copyright applies), and later where there is agreement with publishers.
Example shown: The Argus, 22 August 1945

6 Relationship - ANPLAN
Website: http://www.nla.gov.au/anplan/

7 Keep Up to Date with Progress
Website: http://www.nla.gov.au/ndp/

8 National Help
NLA is working with State and Territory Libraries as part of ANPLAN.
Libraries suggest titles and dates and provide microfilm for digitising.
ANPLAN members and other stakeholders will provide feedback on the search and delivery prototype.
A model for national contribution of regional newspapers is being developed.

9 Process in brief
National sourcing of selected newspaper microfilm masters.
Masters are scanned to TIFF files by the contractor in Sydney.
NLA performs quality assurance and adds metadata.
The contractor in India processes the TIFF files: OCR, zoning, XML markup.
NLA quality-assures the files, ingests them into the system, and creates derivatives for delivery.

10 Logistics
Australia (State Capitals – Sydney/Canberra)
USA (Virginia)
India (Hyderabad, Chennai)

11 6 Month Progress
IT infrastructure and storage implemented at NLA.
Content management and ingest software developed by NLA to support workflow.
Quality assurance and production software developed by US/India contractor.
Pilot data sent to contractors to test workflows, systems and software against the agreed project specification.

12 Next 6 months
Acceptance of pilot data, then commencement of the production phase (3 million pages).
Development of search and delivery prototype.
Public launch of service with a good body of content in 2008.
Progressive addition of content: national program ongoing.

13 Technology – internal NLA
Old newspapers being processed and delivered using the latest digital technology.
NLA developing in house:
– Ingest and storage system
– Workflow and content management system, including quality assurance module
– Search and delivery system
NLA providing:
– System infrastructure (storage, backup, disaster recovery)

14 Infrastructure and Storage
Online storage (70 TB):
– Working space for images in processing: 40 TB for 1 million pages
– Search and delivery derivatives: 30 TB for 3 million pages
– XML files, database systems and indexes: 1 TB
Offline storage: unlimited, for master images on tape.
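
The storage figures on this slide imply rough per-page sizes. The sketch below only redoes that arithmetic; decimal units (1 TB = 1,000,000 MB) are an assumption, and the function name is illustrative. The three online items sum to 71 TB, which the slide gives as roughly 70 TB.

```python
# Figures are taken from the slide; decimal units are an assumption.
MB_PER_TB = 1_000_000


def mb_per_page(terabytes: float, pages: int) -> float:
    """Average storage per page, in megabytes."""
    return terabytes * MB_PER_TB / pages


working_mb = mb_per_page(40, 1_000_000)    # working space for images in processing
delivery_mb = mb_per_page(30, 3_000_000)   # search and delivery derivatives
online_total_tb = 40 + 30 + 1              # working + derivatives + XML/databases/indexes

print(f"working space: {working_mb:.0f} MB per page")
print(f"delivery derivatives: {delivery_mb:.0f} MB per page")
print(f"online total: {online_total_tb} TB")
```

On these numbers the working images take about four times the space of the delivery derivatives for the same page.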

15 Establishing Workflows

16 Technology - external
Scanning microfilm using a Flexscan/Eclipse scanner and the latest software (nextstar) from NextScan (www.nextscan.com).
20,000 pages a week.
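
As a rough check on the scanning rate, the sketch below estimates how long the 3 million pages of the production phase would take at 20,000 pages a week. It assumes a constant rate and that 20,000 is the total weekly throughput; the slides do not say whether more than one scanner is involved.

```python
import math


def weeks_to_scan(total_pages: int, pages_per_week: int) -> int:
    """Whole weeks needed to scan total_pages at a constant weekly rate."""
    return math.ceil(total_pages / pages_per_week)


weeks = weeks_to_scan(3_000_000, 20_000)
years = weeks / 52  # approximate; ignores holidays and downtime

print(f"{weeks} weeks, about {years:.1f} years")
```

Under those assumptions scanning alone would fit inside the four-year budget period with some margin.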

17 Scanning Contractor

18 Digital Images returned to NLA

19 Quality Assurance at NLA
Use 2 widescreen monitors placed vertically.
Can view complete page within context of issue.
Add metadata; sort out missing and duplicate pages within an issue.
Prepare batches to send for OCR.
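
The slide does not describe how missing and duplicate pages are detected. As a purely hypothetical illustration, a check over the page numbers recorded for one issue could look like this; the function name and the idea of a known expected page count are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def check_issue_pages(page_numbers: List[int], expected_count: int) -> Tuple[List[int], List[int]]:
    """Return (missing, duplicated) page numbers for a single issue."""
    counts = Counter(page_numbers)
    missing = [n for n in range(1, expected_count + 1) if n not in counts]
    duplicated = sorted(n for n, seen in counts.items() if seen > 1)
    return missing, duplicated


# Hypothetical issue of 6 pages where page 2 was filmed twice and page 3 is absent.
missing, duplicated = check_issue_pages([1, 2, 2, 4, 5, 6], expected_count=6)
print("missing:", missing)
print("duplicated:", duplicated)
```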

20 Metadata

21 Page verification

22

23 Technology - external
Software developed to:
– Zone areas and articles on a page
– Flag continuing articles across multiple pages
– Categorise articles on a page
– OCR text on a page
– Re-key headings and first 4 lines of text
– Deliver XML files (ALTO) and METS/MODS files
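
To give a feel for the ALTO files mentioned above, here is a minimal sketch that pulls the OCR text lines out of an ALTO document with Python's standard library. The sample XML is invented for illustration and is not program data; the code matches element names without their namespace because the namespace differs between ALTO versions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample; a real ALTO file also carries coordinates, styles and more.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="SHIPPING"/>
            <SP/>
            <String CONTENT="NEWS"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Arrived"/>
            <SP/>
            <String CONTENT="yesterday"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def text_lines(alto_xml: str) -> List[str]:
    """Return the text of each TextLine, joining its String elements with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(text_lines(SAMPLE_ALTO))
```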

24 India Facility - Hyderabad

25

26 Quality Assurance

27 OCR Accuracy

28 Batch reporting

29 Acceptance Criteria

30 Prototype Development
Under discussion:
– Derivative sizes and zoom technology testing
– Search and browse features
– Results and refinement of results
– User interaction with source (web 2.0)
– Interface design

31 Digital Newspaper Searching
Newspapers full-text searchable
Image captions searchable
Search across multiple papers, e.g. by a person's name
Refine searching by:
– Date
– Newspaper title
– State published

32 Refine search by categories
News
Advertising
Birth, death and marriage notices
Obituaries
Editorial commentary and letters
Shipping news
Arts and leisure
Detailed lists, results, guides
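
The refinements listed on slides 31 and 32 (date, newspaper title, state, category) can be pictured as successive filters over full-text hits. The sketch below is an illustration only and not the prototype's design: the `Article` record, the `refine` function and the sample rows are all made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Article:
    newspaper: str
    state: str
    published: date
    category: str
    text: str


def refine(
    articles: Iterable[Article],
    term: str,
    *,
    newspaper: Optional[str] = None,
    state: Optional[str] = None,
    category: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Article]:
    """Case-insensitive text match, narrowed by whichever refinements are given."""
    needle = term.lower()
    results = []
    for article in articles:
        if needle not in article.text.lower():
            continue
        if newspaper is not None and article.newspaper != newspaper:
            continue
        if state is not None and article.state != state:
            continue
        if category is not None and article.category != category:
            continue
        if start is not None and article.published < start:
            continue
        if end is not None and article.published > end:
            continue
        results.append(article)
    return results


# Invented sample rows; the article text is not real newspaper content.
SAMPLE = [
    Article("Canberra Times", "ACT", date(1928, 7, 26), "News", "Parliament met today"),
    Article("Argus", "VIC", date(1945, 8, 22), "Shipping News", "Steamer arrived in port today"),
    Article("Argus", "VIC", date(1945, 8, 22), "Advertising", "Boots for sale"),
]

for hit in refine(SAMPLE, "today", state="VIC"):
    print(hit.newspaper, hit.published, hit.category)
```

A real service would run these filters against a search index rather than a Python list; the list keeps the example self-contained.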

33 Search Illustrations
Categorised as: photo, cartoon, map, graph, illustration
Captions searchable
Example shown: Canberra Times, 26 July 1928, page 6

34 Browsing and Viewing
Browse papers page by page
Zoom in and out of image:
– to read small text
– to view context of article within page layout
Print an article, an entire page or an issue

35 Zoom technology

36 Testing derivative sizes and zooming

37 Prototype wireframe

38 Other features
Under discussion:
– OCR correction by users
– Personal annotation of articles by users
– Tagging results
– Creating public sets (for historical events)
– Clustering results
– Searching across other relevant resources (paid subscription services, international resources, other digital resources)

39 Prototype release
To be released to stakeholders who have given microfilm content.
Stakeholders will be able to view their data.
Feedback sought on data quality and search functionality.
Amendments will be made, and then ‘search and delivery version 1’ released to a wider group for testing and feedback before public launch in 2008.

40 Pilot Data
Canberra Times
Sydney Gazette
Northern Territory Times
South Australia Advertiser
Hobart Town Gazette, Courier, Colonial, Mercury
Melbourne Argus
Perth Gazette
West Australian
Brisbane Courier Mail
(12 titles, 8000 issues = 50,000 pages = 500,000 articles)
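
The pilot totals on this slide imply some useful averages. The sketch below works them out and, as a loose extrapolation only, applies the articles-per-page ratio to the 3 million pages of the production phase; that the pilot ratio holds at scale is an assumption.

```python
# Pilot totals as stated on the slide.
titles, issues, pages, articles = 12, 8_000, 50_000, 500_000

issues_per_title = issues / titles      # average issues per title
pages_per_issue = pages / issues        # average pages per issue
articles_per_page = articles / pages    # average articles per page

# Rough extrapolation, assuming the pilot ratio holds for the full program.
projected_articles = 3_000_000 * articles_per_page

print(f"{issues_per_title:.0f} issues per title")
print(f"{pages_per_issue:.2f} pages per issue")
print(f"{articles_per_page:.0f} articles per page")
print(f"about {projected_articles:,.0f} articles at 3 million pages")
```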

41 http://www.nla.gov.au/ndp

