The Cornell Veterinarian A Metadata Perspective.


1 The Cornell Veterinarian A Metadata Perspective

2

3 The Challenge (Reprise)

4 Hathi Volume Interface

5 Hathi Data API

6 Hathi METS File

7 Hathi METS File (Continued)

8 Hathifile Record Elements
Hathi Volume ID: mdp.39015076694507
Access: allow [Notes on mapping for rights attributes where contextual user data would affect access]
Rights: pd [public domain]
HathiTrust record number: 000529434
Enumeration/Chronology: v.33 no.11 1900
Source: MIU
Source institution record number: 000529434
OCLC number: 1554176
Title: The Chicago medical times.
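The record elements above could be held in a small structure like the one sketched here. This is only an illustration: the field names, the nine-column layout, and the `parse_line` and `is_public_domain` helpers are assumptions made for this example. Real Hathifiles are tab-delimited but carry more columns than the slide lists, so the column order here should not be taken as the actual file layout.

```python
from dataclasses import dataclass, fields


@dataclass
class HathifileRecord:
    # Field names are hypothetical; they mirror the labels on the slide.
    volume_id: str
    access: str
    rights: str
    record_number: str
    enum_chron: str
    source: str
    source_record_number: str
    oclc_number: str
    title: str


def parse_line(line: str) -> HathifileRecord:
    """Split one tab-delimited line into a record (assumed 9-column layout)."""
    values = line.rstrip("\n").split("\t")
    expected = len(fields(HathifileRecord))
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    return HathifileRecord(*values)


def is_public_domain(record: HathifileRecord) -> bool:
    """True when the record is marked access=allow and rights=pd."""
    return record.access == "allow" and record.rights == "pd"


# Sample line assembled from the values shown on the slide.
SAMPLE_LINE = "\t".join([
    "mdp.39015076694507", "allow", "pd", "000529434",
    "v.33 no.11 1900", "MIU", "000529434", "1554176",
    "The Chicago medical times.",
])

record = parse_line(SAMPLE_LINE)
```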

9 What I [naively] thought was the solution…
1. Use the Hathi Data API to find the Table of Contents for each volume
2. Gather the related OCR
3. Parse out the article citation values from the OCR (hopefully in a mostly automated way)
4. Use the pagination data from the TOC to build links
5. What could not be automated could be done manually
Goal: a citation index with Hathi URLs that could be used to build an interface or given to an index like PubMed
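Steps 3 and 4 might look roughly like the sketch below, which pulls title and page pairs out of table-of-contents OCR. The sample text is invented for illustration (it is not real HathiTrust OCR), and the dot-leader pattern and the `parse_toc` helper are assumptions; actual OCR is far noisier, which is part of why this approach proved naive.

```python
import re

# Assumed TOC line shape: a title, a run of dot leaders, then a page number.
TOC_LINE = re.compile(r"^(?P<title>.+?)\s*(?:\.\s*){2,}(?P<page>\d+)$")


def parse_toc(ocr_text: str) -> list[tuple[str, int]]:
    """Return (title, start_page) pairs for lines that look like TOC entries."""
    entries = []
    for line in ocr_text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            entries.append((match.group("title").strip(), int(match.group("page"))))
    return entries


# Made-up sample, not taken from any real volume.
SAMPLE_OCR = """CONTENTS
First example article .......... 1
Second example article . . . . 15
Index"""

toc_entries = parse_toc(SAMPLE_OCR)
```

Lines without dot leaders and a trailing page number (such as the "CONTENTS" heading) are skipped, so they would need manual review.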

10 HathiTrust OCR for TOC

11 PubMed Indexing and API

12 Path for automation
(For citations in PubMed for which HathiTrust has a single volume)
Query: PubMed Volume AND Hathi Catalog ID against the Hathifile to get all corresponding object IDs from the METS.
Query: METS object IDs AND the PubMed start page for each citation to find the Orderlabel, and from it the Order number, in the METS files.
Create each URL: the Hathi METS object ID and Order number are used to create the URL, e.g. http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11
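The last two steps can be sketched as follows: look up the page div whose ORDERLABEL matches the PubMed start page, read its ORDER, and build the URL in the format shown on the slide. The METS fragment is a simplified, made-up sample that assumes page divs carry ORDER and ORDERLABEL attributes; the `find_order` and `build_url` helpers are hypothetical names, not part of any HathiTrust tooling.

```python
import xml.etree.ElementTree as ET

METS_NS = {"METS": "http://www.loc.gov/METS/"}

# Simplified illustrative fragment, not a real HathiTrust METS file.
SAMPLE_METS = """<METS:mets xmlns:METS="http://www.loc.gov/METS/">
  <METS:structMap TYPE="physical">
    <METS:div TYPE="volume">
      <METS:div ORDER="10" TYPE="page"/>
      <METS:div ORDER="11" ORDERLABEL="1" TYPE="page"/>
      <METS:div ORDER="12" ORDERLABEL="2" TYPE="page"/>
    </METS:div>
  </METS:structMap>
</METS:mets>"""


def find_order(mets_xml: str, start_page: int):
    """Return the ORDER of the first page div whose ORDERLABEL is start_page."""
    root = ET.fromstring(mets_xml)
    for div in root.iterfind(".//METS:div[@TYPE='page']", METS_NS):
        if div.get("ORDERLABEL") == str(start_page):
            return int(div.get("ORDER"))
    return None  # no printed page with that label in this volume


def build_url(object_id: str, order: int) -> str:
    """Build a page URL in the pattern shown on the slide."""
    return f"http://babel.hathitrust.org/cgi/pt?id={object_id};view=1up;seq={order}"


order = find_order(SAMPLE_METS, 1)
url = build_url("coo.31924051143075", order)
```

Returning `None` for an unmatched start page leaves it to the caller to route that citation to manual handling.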

13 The Metadata that Got Away…
• Articles not indexed by PubMed (1991-1914)
• Supplemental volumes
What we hope to do about it:
• Still working to see if we can programmatically create URLs for Supplemental Volumes
• Manually capture citation data and URLs for pre-1945 articles using OCR.

14 PubMed Data Requirements
• Linking Format (when we're only contributing URLs)
  • PubMed IDs and corresponding URLs
  • Administrative metadata, e.g. access restrictions, contributing source.
• Required data elements for contributing citations
  • Journal ISSN
  • Journal ID or Journal title abbreviation
  • Journal Publisher
  • Copyright statement, where applicable
  • Volume/Issue/Article sequence or pagination
  • Issue publication date
  • Article electronic publication date?
  • AND URLs
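A checklist like the one above lends itself to a simple completeness check before anything is sent on. The sketch below is one way to do that; the dictionary keys and the `missing_elements` helper are invented for this example and do not reflect PubMed's actual submission schema. The copyright statement and electronic publication date are left out of the required set because the slide marks them as conditional or uncertain.

```python
# Hypothetical key names standing in for the slide's required elements.
REQUIRED = ("issn", "publisher", "pagination", "issue_date", "url")
# The slide allows either a journal ID or a title abbreviation.
EITHER = ("journal_id", "journal_title_abbrev")


def missing_elements(citation: dict) -> list[str]:
    """List the required elements that are absent or empty in a citation."""
    missing = [key for key in REQUIRED if not citation.get(key)]
    if not any(citation.get(key) for key in EITHER):
        missing.append(" or ".join(EITHER))
    return missing


# A citation for which only the URL has been captured so far.
partial = {"url": "http://babel.hathitrust.org/cgi/pt?id=coo.31924051143075;view=1up;seq=11"}
gaps = missing_elements(partial)
```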

15 What does it all mean?
For the project:
• The Cornell Veterinarian should soon be available via PubMed for the years already indexed.
• We're still scoping out what it would take to capture the remaining citations manually. If funded, these citations will be sent to PubMed to complete the backfile.
Larger picture:
• Potential for improved access to other titles currently lacking full-text linking in PubMed [if in HathiTrust]
• Consider suggesting improvements to the Hathi workflows.

