Digitizing Historical Newspapers
South Carolina Digital Newspaper Program's participation with the Library of Congress' Chronicling America: Historic American Newspapers Project

BACKGROUND
The South Carolina Digital Newspaper Program (SCDNP) is a participant in the National Digital Newspaper Program (NDNP). NDNP is a partnership of the National Endowment for the Humanities and the Library of Congress to help states digitize their historical 19th and early 20th century newspapers. This digitized content is made available in the Library of Congress’ free, online database, Chronicling America: Historic American Newspapers. To date, 32 states have participated, with the goal of reaching all 50 states and digitizing 20 million pages by 2020. SCDNP has participated since 2009 and is digitizing 300,000 newspaper pages. For more information, visit SCDNP’s website at http://library.sc.edu/digital/newspaper/.

STANDARDS & GUIDELINES
SCDNP follows strict NDNP technical guidelines and specifications. Metadata standards are derived from several standards, vocabularies, and ontologies, and some tie to external sites: DCMI Metadata Terms, the Bibliographic Ontology, DBpedia, Dublin Core, DCMI Terms, FRBR concepts in RDF, GeoNames, LCCN Permalink, lingvoj.org, MARC, OAI-ORE, OWL, RDA, WorldCat.

NDNP Tech Specs
Produce:
• 8-bit grayscale images scanned from microfilm (scanned for maximum resolution, between 300 and 400 dpi relative to the originals)
• OCR with bounding boxes; no article segmentation
• Structural metadata for pages, issues, editions, and titles to support a chronologically based browsing interface
• Four deliverables per page: a TIFF, a JPEG 2000, a PDF, and an OCR file in XML format
• Updated MARC records from the CONSER OCLC database

LC Metadata Standards
MARC is used for extracting descriptive data about newspaper titles; it is transformed into MODS XML metadata.

MODS (Metadata Object Description Schema) is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. For more info, see http://www.loc.gov/standards/mods/.

METS (Metadata Encoding and Transmission Standard) is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress. For NDNP, title, issue, and reel metadata are wrapped in METS. For more info, see http://www.loc.gov/standards/mets/mets-home.html.

ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the Metadata Encoding and Transmission Standard (METS) administrative metadata section. For NDNP, OCR text must be encoded using the ALTO XML schema, version 2.0. For more info, see http://www.loc.gov/standards/alto/.

METS ALTO XML Object Model
ALTO (Analyzed Layout and Text Object) stores layout information and OCR-recognized text for pages of any kind of printed document, such as books, journals, and newspapers. ALTO is an open XML standard format for storing layout and content information.
It is designed to be used as an extension schema to METS, where METS provides metadata and structural information while ALTO contains content and physical information.

Newspaper Metadata Is Converted to XML
[Figure panels: batch-level metadata and the XML files it is converted to; reel-level metadata and the XML files it is converted to; issue- and page-level metadata and the XML files they are converted to; OCR XML files.]

Chronicling America is a free, keyword-searchable database for digitized historic newspapers. Visit the site at http://chroniclingamerica.loc.gov/.
[Figure: view of the final product, a historical South Carolina newspaper page loaded into Chronicling America.]

Laura Blair and Virginia Pierce, South Carolina Digital Newspaper Project, USC Libraries
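
As a rough illustration of what "OCR with bounding boxes" in ALTO looks like in practice, the sketch below reads words and their positions from a small ALTO-style fragment using Python's standard library. The sample XML is hand-written for this example and is not taken from an NDNP batch; the function names (`local_name`, `extract_words`, `extract_lines`) are hypothetical helpers, not part of any NDNP tooling. The element and attribute names (`TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) follow the ALTO schema. Namespaces are stripped on purpose, so the sketch does not assume which ALTO version a given file declares.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only (not real NDNP output).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="6000" HEIGHT="8000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="The" HPOS="100" VPOS="200" WIDTH="90" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Daily" HPOS="200" VPOS="200" WIDTH="150" HEIGHT="40"/>
          </TextLine>
          <TextLine>
            <String CONTENT="News" HPOS="100" VPOS="250" WIDTH="140" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on element names."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return (word, (hpos, vpos, width, height)) for every String element."""
    root = ET.fromstring(alto_xml)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) == "String":
            box = tuple(
                float(elem.get(key, 0))
                for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
            words.append((elem.get("CONTENT", ""), box))
    return words


def extract_lines(alto_xml):
    """Return the recognized text of each TextLine as a plain string."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            tokens = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(tokens))
    return lines


if __name__ == "__main__":
    for word, box in extract_words(SAMPLE_ALTO):
        print(word, box)
    print(extract_lines(SAMPLE_ALTO))
```

Because each word carries its own box, a viewer can highlight keyword hits on the page image, which is the kind of keyword search the poster describes for Chronicling America. Since the specs above exclude article segmentation, the sketch groups text only by line, not by article.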

