On the Two Sides of the Pond By Hans-Jörg Lieder, Head of the Department of Bibliographic Services – Union Catalogue of Serials Staatsbibliothek zu Berlin.


On the Two Sides of the Pond
By Hans-Jörg Lieder, Head of the Department of Bibliographic Services – Union Catalogue of Serials, Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
and Dr. Katalin Radics, Distinguished Librarian; Librarian of the West European Collections and Classics, Young Research Library, University of California, Los Angeles

UNIQUE EUROPEAN MATERIALS – HELD IN A US LIBRARY

Partnership between the UCLA Library and Staatsbibliothek zu Berlin

Newspapers on the way to discoloring and disintegration
Storage facility of the University of California Libraries on the UCLA campus

- Leaflets, 13” x 18.5” (33 cm x 47 cm)
- Imprint indicating the title, date and issue number; warning
- Published four or five times a day

- UCLA stamps including receiving dates
- Packed in wrapping paper, probably after 1940; packages of sheets
- No documentation (ordering or receiving records) in the library archives; no correspondence
- Normal serial subscription scheme (?)
- Very minimal cataloging record – very low use

Towards a Weeding Decision
- Brittle condition
- Check for other holdings in California, US and world libraries
- OCLC – no other holdings at the time of checking
- Nine 1938 issues at BNF
- No holding at the German National Library (Deutsche Nationalbibliothek)
- Contact with head of Zeitungsabteilung, Staatsbibliothek – no holding in Germany
- UNIQUE!!!
- Decision: keep and preserve the UCLA holdings

Keep and Preserve
- 9600 pages with gaps
- Acid-free boxes
- The most fragile pages in mylar

Digitization Project
- Funding for digitization
- Highest quality resolution: 600 dpi RGB
- Add minimal metadata

Title: Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1, Erste Morgen-Ausgabe
Alt ID: _ _1581 [Local]
AltTitle: Erste Morgen-Ausgabe [Descriptive]; Deutsches Nachrichtenbüro [Descriptive]
Date: October 1, 1938 [Publication] [Normalized]
Format: 1 p. [Extent]
Language: ger
Name: University of California, Los Angeles. Library. Dept. of Special Collections [Repository]
Type: newspapers [Genre]; text [Type Of Resource]
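As a minimal sketch of how such a record could be handled in code, the snippet below holds a few of the fields shown above in a plain dictionary and converts the display date to ISO 8601, which sorts and compares cleanly. The dictionary layout and the function name `normalized_date` are assumptions made for illustration; they are not taken from the UCLA Digital Library system.

```python
from datetime import datetime

# Field names and values copied from the record above; the dict layout is hypothetical.
record = {
    "Title": "Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1, Erste Morgen-Ausgabe",
    "Date": "October 1, 1938",
    "Format": "1 p.",
    "Language": "ger",
}


def normalized_date(display_date: str) -> str:
    """Convert a date such as 'October 1, 1938' to ISO 8601 (YYYY-MM-DD).

    Assumes English month names, which is what %B matches under
    Python's default "C" locale.
    """
    return datetime.strptime(display_date, "%B %d, %Y").date().isoformat()


print(normalized_date(record["Date"]))
```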

- Digitized copies: part of UCLA Digital Library at -- freely accessible
- Searchable only by date
- More sophisticated searching capability needed – day by day chronicle of the Third Reich for a short period of time
  - events
  - names
  - institutions etc.
- Deutsches Nachrichten Büro – December 5, network of 36 local services (Landesdienste)

Indexing needed
- Fraktur – major problem
- Transliteration into Latin characters
- OCR (Optical Character Recognition) – has to be made in Germany
- Looking for a German Partner

Not a problem … here we are!

… but who are “we”?
Project: Europeana Newspapers: 18 partners from 12 countries
Tasks:
- Provide OCR for 18 million pages
- Provide OLR for 2 million pages
- Provide NER experimentally in assorted languages
- Provide best practice recommendations for newspaper metadata
- Provide quality prediction tools
- Aggregate content and make it available to TEL and Europeana

OCR = Optical Character Recognition
OLR = Optical Layout Recognition
NER = Named Entities Recognition

A Dance of Acronyms: UCLA, SBB and CCS
UCLA
- Sent data on hard drive
SBB
- Checked data for correctness and moved images into directory structure
- Sent data to CCS in Hamburg for OCR and OLR
CCS (Content Conversion Specialists)
- Created full texts per article
- Stuck data in NZ web service for preliminary presentation purposes
SBB
- Will perform QA of OCR and OLR results
- Will provide all data to UCLA for further use
- Will present data in ZEFYS, its own newspaper portal; to the Deutsche Digitale Bibliothek; to TEL (The European Library) and to Europeana

Layout and structure analysis
- Recognition of words, text lines, text blocks, columns and classification of text blocks, illustrations, advertisements, tables and the following page types:
  - title page (the title page of an issue)
  - content page (a page that consists of content/text only)
  - illustration page (a page that has at least one illustration)
  - advertisement page (a page that contains adverts only)
- Structure analysis through classification of headlines and grouping of zones into articles (incl. article continuation)
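The four page types above can be read as simple rules over the zone classes found on a page. The sketch below is one hypothetical way to express them; it is not how docWorks or the Europeana Newspapers workflow implements the classification. The zone labels, the `is_title_page` flag, the order in which the rules are tried, and the fallback to "content page" for mixed pages are all assumptions.

```python
from typing import Iterable


def classify_page(zone_types: Iterable[str], is_title_page: bool = False) -> str:
    """Assign one of the four page types from the zone classes on a page.

    zone_types: labels such as "text", "illustration", "advertisement"
                (hypothetical label names).
    is_title_page: assumed to be known from issue metadata, since the slide
                   does not say how a title page is detected.
    """
    zones = list(zone_types)
    if is_title_page:
        return "title page"
    # Advertisement page: the page contains adverts only.
    if zones and all(z == "advertisement" for z in zones):
        return "advertisement page"
    # Illustration page: at least one illustration on the page.
    if any(z == "illustration" for z in zones):
        return "illustration page"
    # Fallback (assumption): everything else is treated as a content page.
    return "content page"


print(classify_page(["text", "illustration", "text"]))
```

A page that mixes text and adverts without any illustration matches none of the slide's definitions exactly; the fallback here is a design choice for the sketch, not something the source specifies.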

ENP OLR workflow | Conversion without scanning
[Workflow diagram; labels: Material location, Digital Image Metadata Delivery, Conversion facility, Conversion, MD Recording, Inspection / Automatic QA, Reject, Digital Object Return, Doc Delivery]

Quality assurance
CCS | Automated markup and basic manual correction:
- headlines, illustrations, tables, captions, advertisements, etc.
- article segmentation and grouping of zones into articles (incl. continuation)
Content Provider (Library)
Recommended:
- Zoning: correct classification of blocks as „text“ or „illustration“
- Article segmentation: correct identification of headlines/text blocks/captions
- Grouping: correct grouping of blocks (text, illustration) to articles
- Metadata: correct title, issue date and issue number
Optional:
- Page types: correct page types
- Page numbers: correct page sequence
- OCR: perform text correction of specific zones (e.g. headlines, captions)

Output | METS/ALTO package
- METS/ALTO metadata schemas to describe the structured digital output object
- A newspaper issue processed in docWorks is converted into one METS XML file. It reflects the whole physical and logical structure and manages all links to the image files and the related ALTO XML files. ALTO is based on a standardized page description schema and contains all information of a page (print space, margins, coordinates, OCR results).
- Benefits of structural markup:
  - better browsing and more precise text search
  - better access and display on tablet and mobile devices
  - automated article classification and clustering through data/text mining and linguistic technologies
  - user engagement for manual online text correction, article classification, annotation, building personal collections, etc.
  - sharing articles via social media platforms like Facebook, Twitter, etc.

METS = Metadata Encoding and Transmission Standard
ALTO = Analyzed Layout and Text Object
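To make the ALTO part concrete, here is a minimal sketch of reading OCR text and word coordinates from an ALTO page with Python's standard library. The sample XML is invented for illustration and is far smaller than a real docWorks output file. Element and attribute names (`TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) follow the ALTO schema, while the helper names are assumptions. Namespaces are stripped so the code does not depend on a particular ALTO version.

```python
import xml.etree.ElementTree as ET

# Invented sample page; real files carry many more blocks and attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Deutsches" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Nachrichtenbüro" HPOS="420" VPOS="200" WIDTH="480" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(alto_xml: str) -> list:
    """Return the OCR text of each TextLine as one space-joined string."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def word_boxes(alto_xml: str) -> list:
    """Return (word, hpos, vpos, width, height) for every String element.

    Assumes all four coordinate attributes are present, as in the sample.
    """
    root = ET.fromstring(alto_xml)
    return [(el.get("CONTENT"), float(el.get("HPOS")), float(el.get("VPOS")),
             float(el.get("WIDTH")), float(el.get("HEIGHT")))
            for el in root.iter() if local_name(el.tag) == "String"]


print(extract_lines(SAMPLE_ALTO))
```

The word coordinates are what make features such as highlighting a search hit on the page image possible, which is one reason the text is delivered as ALTO and not as plain text.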

Preliminary Presentation