NATIONAL LIBRARY OF MEDICINE PubMed Central Martha Fishel National Library of Medicine CENDI Meeting September 15, 2004.


1 NATIONAL LIBRARY OF MEDICINE PubMed Central Martha Fishel National Library of Medicine CENDI Meeting September 15, 2004

2 What is PubMed Central?
- Digital archive of life sciences journals (includes health policy, bioinformatics and other fields)
- Participation is open to journals:
  - covered by a major abstracting/indexing service, or
  - that have 3 editorial board members with current grants from major non-profit funding agencies
- Free access to full-text articles and supporting data
- Integrated with PubMed and other bibliographic and factual databases in NCBI’s Entrez network

3 PMC Basic Policy
- Journal deposits an authoritative electronic copy that meets PMC data quality standards
  - full-text XML
  - original high-resolution graphics
  - PDF
  - supplementary data
- Journal may delay free access (up to 2 years)
  - research articles usually free in one year or less
- Copyright is retained by publisher or author
- Deposits – and free access permissions – are permanent
  - journal may stop depositing new material but may not withdraw material already deposited

4 Back Issue Digitization
- Objective: Create a complete digital archive of PMC journals back to volume 1
- Cover-to-cover digital copy of everything up to where journal began producing electronic copy (includes articles, covers, TOCs, advertisements and administrative matter)
- Publisher gets free, unencumbered copy

5 Back Issue Digitization
- 1st set of scanned journals covered 62 titles (and title variations) for approximately 2.5 million pages
- As of September 2004, 193,436 scanned articles are included in PMC
- Starting September 2004, a new cooperative agreement with the Wellcome Trust and JISC in the UK will cover an additional 1.7 million pages or more

6 Titles Scanned Back to Volume 1
- Antimicrobial Agents and Chemotherapy v.1, 1972
- BMLA v.1, 1911
- Clinical Microbiology Reviews v.1, 1988
- J of Bacteriology v.1, 1916
- J Clinical Investigation v.1, 1924
- J Clinical Microbiology v.1, 1975
- J Virology v.1, 1967
- Molecular and Cellular Biology v.1, 1981
- Nucleic Acids Research v.1, 1974
- Texas Heart Institute Journal v.1, 1974

7 Scanning Specifications
- 1-bit B&W 600 dpi G4 TIFFs
- 8-bit 300 dpi grayscale TIFF
- 24-bit 300 dpi color TIFF for illustrations
- Unedited prime OCR (5-pass engine)
- PDF with hidden text (searchable OCR “hidden behind” pg. images)
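
To give a feel for what these specifications mean in storage terms, here is a rough sketch that estimates the uncompressed raster size of one page at each bit depth and resolution listed above. The 8.5 x 11 inch page size and the function name `uncompressed_bytes` are assumptions for illustration only; the slides do not state page dimensions, and the estimate ignores TIFF headers, row padding, and the G4 compression applied to the 1-bit files.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate raw raster size in bytes (no compression, headers, or row padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Bit depth and resolution come from the slide; the page size is an assumption.
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11.0

specs = [
    ("1-bit B&W, 600 dpi", 600, 1),
    ("8-bit grayscale, 300 dpi", 300, 8),
    ("24-bit color, 300 dpi", 300, 24),
]

for label, dpi, depth in specs:
    size = uncompressed_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, dpi, depth)
    print(f"{label}: about {size / 1_000_000:.1f} MB uncompressed per page")
```

Under these assumptions a 600 dpi bitonal page is about half the raw size of a 300 dpi grayscale page, before G4 compression shrinks the bitonal file further.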

8 Digitized Samples

9 Back Issue Scanning Tasks
- Secure permission to digitize
- Acquire disposable content
  - Donor sources include publishers, associations, individuals
- Create issue-level inventory
- Prepare content for digitization – create journal style sheets
- Pack and ship materials

10 QA Tasks
- Receive deliverables (DVDs) at NLM (NCBI)
- Run automated QA programs
- Mark random issues from each title for manual QA
- Perform QA (NLM contractor compares digitized image to original volumes pulled from NLM shelves)
- Accept or Reject a batch based on rigid criteria
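
The step of marking random issues from each title could be sketched as below. This is only an illustration: the function name `mark_issues_for_manual_qa`, the issue labels, and the sample size of two issues per title are all hypothetical, since the slides do not say how many issues are sampled or how they are chosen.

```python
import random


def mark_issues_for_manual_qa(issues_by_title, per_title=2, seed=None):
    """Pick a random subset of issues from each title for manual QA.

    per_title is an assumed sample size, not a figure from the presentation.
    """
    rng = random.Random(seed)
    marked = {}
    for title, issues in issues_by_title.items():
        k = min(per_title, len(issues))  # never ask for more issues than exist
        marked[title] = sorted(rng.sample(issues, k))
    return marked


# Hypothetical batch: titles mapped to the issues delivered in it.
batch = {
    "J Virology": ["v1n1", "v1n2", "v1n3", "v1n4"],
    "Nucleic Acids Research": ["v1n1", "v1n2", "v1n3"],
}

for title, issues in mark_issues_for_manual_qa(batch, per_title=2, seed=2004).items():
    print(title, "->", ", ".join(issues))
```

Passing a fixed `seed` makes the selection reproducible, which may help if a rejected batch has to be re-checked against the same sample.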

11 QA Criteria
- XML Character and Tag accuracy – 99.95%
- Inventory Error – 100%
- Image quality – 100%
  - Distortion
  - Color
  - Visible pixelation
- OCR quality – 100% for completeness and zoning (unedited)
- PDF sequence and source accuracy – 100%

12 Sample Batch Status Report

Section / check                   # of samples   # of samples failed   % failed   Status
XML
  Tag content accuracy (99.95%)   5135 tags      12 tags               0.23%      Failed
PDF
  Visible skew (99%)              675 pages      1 page                0.15%      OK
OCR file
Full Page Image
Illustration Image
Other

Recommended action: Reject
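
The arithmetic behind the two filled-in rows of this report can be reproduced with a short sketch. The class and function names (`QACheck`, `recommended_action`) are hypothetical, and the rule that any single failed check rejects the whole batch is an assumption inferred from the sample report, not a stated NLM procedure.

```python
from dataclasses import dataclass


@dataclass
class QACheck:
    name: str
    required_accuracy: float  # percent, e.g. 99.95
    samples: int
    failed: int

    @property
    def percent_failed(self):
        return 100.0 * self.failed / self.samples

    @property
    def passed(self):
        # A check passes when the failure rate stays within the allowed margin.
        return self.percent_failed <= 100.0 - self.required_accuracy


def recommended_action(checks):
    """Assumed rule: reject the batch if any single check fails."""
    return "Accept" if all(c.passed for c in checks) else "Reject"


# Figures taken from the sample report above.
checks = [
    QACheck("Tag content accuracy", 99.95, samples=5135, failed=12),
    QACheck("Visible skew", 99.0, samples=675, failed=1),
]

for c in checks:
    status = "OK" if c.passed else "Failed"
    print(f"{c.name}: {c.percent_failed:.2f}% failed -> {status}")
print("Recommended action:", recommended_action(checks))
```

With the report's figures, 12 of 5135 tags is about 0.23% failed, which exceeds the 0.05% allowed by a 99.95% threshold, while 1 of 675 pages is about 0.15%, well within the 1% allowed for visible skew.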

13 Final Data Preparation
- Update Inventory database indicating issues returned
- Format TIFFs, PDFs
- Organize journal parts
- Load to Preview Site for publisher review
- Load to Live PMC site
- Retain indefinitely!

14 Progress to Date
- 25,000: Issues received
- 1.8 million: pages scanned
- 156,000: XML Citations created

15 Challenges To Date
- Locating old, rare copies in good condition
- Scanning and delivering fill-in pages at NLM
- Feeding the pipeline
- Maintaining even workflow at NLM
- Quality Assurance (understanding requirements)

16 Find Out More
- PubMed Central home: http://www.pubmedcentral.gov/
- NLM Journal XML DTDs: http://dtd.nlm.nih.gov/

