PubMed Central Archive at the US National Institutes of Health. Prepared by Martha R. Fishel, Deputy Chief, Public Services Division.


1 PubMed Central Archive at the US National Institutes of Health
Prepared by Martha R. Fishel, Deputy Chief, Public Services Division
Presented by Becky J. Lyon, Deputy Associate Director, Library Operations
10th International Conference of Medical and Health Librarians
Cluj-Napoca, Romania, September 19, 2006

2 Focus of Today's Presentation
1. What is PubMed Central?
2. What material is included? (current content, scanned back files, manuscripts, books)
3. How does the content get added?
4. How is it used?
5. Value-added features

3 What is PubMed Central?
Digital archive of life sciences journals
– includes health policy, bioinformatics and other fields
Participation is open to journals that are:
– covered by a major abstracting/indexing service
– or that have 3 editorial board members with current grants from major non-profit funding agencies
Free access to full-text articles and supporting data
Integrated with PubMed and other bibliographic and factual databases in NCBI's Entrez network

4 PMC Basic Policy
Journal deposits an authoritative electronic copy that meets PMC data quality standards:
– full-text XML
– original high-resolution graphics
– PDF
– supplementary data
Journal may delay free access (up to 2 years)
– research articles are usually free in one year or less
Copyright is retained by the publisher or author
Deposits, and free access permissions, are permanent
– a journal may stop depositing new material but may not withdraw material already deposited
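
The embargo rule on slide 4 can be expressed as a small date calculation. This is a minimal sketch, not PMC's actual release logic: the helper names `add_months` and `free_access_date` are hypothetical, and the code assumes the delay is counted in whole months from the publication date and capped at 24 months ("up to 2 years").

```python
import calendar
from datetime import date

# Slide 4: "Journal may delay free access (up to 2 years)".
MAX_DELAY_MONTHS = 24


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def free_access_date(published: date, delay_months: int) -> date:
    """Hypothetical helper: the date an article becomes free, given the journal's delay."""
    if not 0 <= delay_months <= MAX_DELAY_MONTHS:
        raise ValueError(f"delay must be 0-{MAX_DELAY_MONTHS} months, got {delay_months}")
    return add_months(published, delay_months)


# A research article with a one-year delay (the common case on slide 4).
print(free_access_date(date(2006, 9, 19), 12))  # 2007-09-19
```

Clamping the day means an article published on January 31 with a one-month delay becomes free on the last day of February instead of raising an error.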

5 PMC Numbers – August 2006
Articles available: 696,000
– 60 percent from back issue digitization
– Oldest content (Trans Am Ophthalmol Soc) is from 1865
– Total items, incl. issue covers, corrections, etc.: 758,000
Unique IP addresses: 2.07 million
– Estimated unique users (1.5x): 2.35 million
Articles retrieved (HTML full text, PDF, or scanned article summary): 7.7 million
Total page views, incl. searches: 11.5 million

6 How Content Gets In
1. SGML/XML from Publishers
2. Back Issue Scanning
3. NIH Manuscripts from Public Access
4. Books and other non-article content

7 Current Content – XML Format
1. Each journal's workflow is different, and deposit frequency depends on the journal's publishing cycle
2. PMC accepts SGML or XML, which must meet certain criteria
3. Problems are often present (e.g., special characters that must be mapped to the NLM DTD)
4. Adjusting the data in every issue is time-consuming and error-prone
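
The special-character problem in item 3 often shows up as named entities (such as `&alpha;`) that an XML parser rejects unless a DTD declares them. The sketch below assumes the publisher's entity names follow the HTML 4 set available in Python's `html.entities.name2codepoint`; it rewrites them as numeric character references, leaves XML's five predefined entities alone, and reports any names it cannot map. The function name is hypothetical, and NLM's actual mapping tables are not reproduced here.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# Entities defined by XML itself; any parser accepts these without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Named references only; numeric ones like &#x3B1; start with '#' and are skipped.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_named_entities(text: str) -> tuple[str, list[str]]:
    """Replace named entities with numeric references; return (text, unmapped names)."""
    unmapped: list[str] = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        codepoint = html.entities.name2codepoint.get(name)
        if codepoint is None:
            unmapped.append(name)  # needs a manual mapping before the file will parse
            return match.group(0)
        return f"&#x{codepoint:X};"

    return ENTITY_RE.sub(replace, text), unmapped


fixed, unmapped = resolve_named_entities("<p>T&alpha; cells &amp; B&eacute;</p>")
print(fixed)                      # <p>T&#x3B1; cells &amp; B&#xE9;</p>
print(ET.fromstring(fixed).text)  # Tα cells & Bé
print(unmapped)                   # []
```

Reporting unmapped names instead of guessing keeps the step safe to rerun and flags exactly the characters that need attention.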

8 Text Processing
Source SGML (converted to XML with OpenSX) or source XML
→ Resolve named character entities
→ Parse
→ Source-specific XSL transform to PMC style
→ Validate with PMC StyleChecker
→ Load to PMC QA
These steps can take a lot of time and can cause NLM to reject content or send it back for rework. All articles must pass validation and the style checker.
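
The flow on slide 8 is a chain of stages in which any failure sends the article back. The sketch below models only the parse, transform, and check stages, using Python's standard `xml.etree.ElementTree`. The tag-renaming table stands in for a source-specific XSL transform, and the list of required elements stands in for the PMC StyleChecker; both are hypothetical and far simpler than the real NLM tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical: one publisher's tag names mapped to NLM-style names.
# In the real workflow a source-specific XSL transform does this job.
SOURCE_TO_NLM = {"atitle": "article-title", "abs": "abstract", "txt": "body"}

# Hypothetical stand-in for style rules: elements every article must contain.
REQUIRED = ("article-title", "abstract", "body")


class Rejected(Exception):
    """An article failed a stage and goes back for rework."""


def parse(source: str) -> ET.Element:
    try:
        return ET.fromstring(source)
    except ET.ParseError as err:
        raise Rejected(f"parse: {err}") from err


def transform(root: ET.Element) -> ET.Element:
    # Materialize the element list first, then rename tags in place.
    for elem in list(root.iter()):
        elem.tag = SOURCE_TO_NLM.get(elem.tag, elem.tag)
    return root


def style_check(root: ET.Element) -> ET.Element:
    missing = [tag for tag in REQUIRED if root.find(f".//{tag}") is None]
    if missing:
        raise Rejected("style check: missing " + ", ".join(missing))
    return root


def process(source: str) -> ET.Element:
    """Run the stages in order; the first failure rejects the whole article."""
    return style_check(transform(parse(source)))


good = "<article><front><atitle>T</atitle><abs>A</abs></front><txt>B</txt></article>"
print(process(good).find(".//article-title").text)  # T

try:
    process("<article><front><atitle>T</atitle></front></article>")
except Rejected as err:
    print(err)  # style check: missing abstract, body
```

Raising one exception type from every stage lets the caller treat "malformed file" and "fails style rules" the same way: the article is returned for rework.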

9 How Content Gets In
1. SGML/XML from Publishers
2. Back Issue Scanning
3. NIH Manuscripts from Public Access
4. Books and other non-article content

10 Back Issue Digitization
Objective: Create a complete digital archive of PMC journals back to volume 1
Cover-to-cover digital copy of everything up to the point where the journal began producing an electronic copy
– includes articles, covers, TOCs, advertisements and administrative matter
Publisher gets a free, unencumbered copy

11 Scanning Highlights
Currently in production:
– American Journal of Public Health (1873)
– Biophysical Journal (1960)
– Canadian Veterinary Medical Association titles
– Environmental Health Perspectives (1972)
– Health Services Research (1966)
– Public Health Reports (1896)
– Transactions of the American Ophthalmological Society (1864)

12 Production Planning
Contractor's capacity at 5 facilities is approximately 250,000 to 300,000 pages per month
Production is currently planned through December 2006
Schedules are adjusted monthly based on deliveries
New shipments are sent from NLM every 6-8 weeks

13 Progress to Date
Journal titles included: 83
Pages collected for processing: 5,606,059
Page images delivered: 3,500,000
XML citations created: 297,000
Scanned articles in PMC: 470,000
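
Slides 12 and 13 together allow a rough estimate of how long the collected but not yet delivered pages would take to process. This back-of-the-envelope sketch assumes the contractor's full monthly capacity goes to pages already collected and that no new pages arrive; neither assumption comes from the slides.

```python
def months_to_clear(collected: int, delivered: int, pages_per_month: int) -> float:
    """Months needed to process the remaining pages at a constant rate."""
    return (collected - delivered) / pages_per_month


# Figures from slides 12 and 13.
for rate in (250_000, 300_000):
    months = months_to_clear(5_606_059, 3_500_000, rate)
    print(f"{rate:,} pages/month: {months:.1f} months")
# 250,000 pages/month: 8.4 months
# 300,000 pages/month: 7.0 months
```

Under those assumptions, the 2,106,059 remaining pages represent roughly 7 to 8.4 months of work at the stated capacity.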

14 Digitized Samples

15 Wellcome Trust Collaboration
September 2004: cooperative agreement signed
Expect to complete an additional 2.7 million pages
– Biochemical Journal: largest archive to date, 350,000 pages scanned
– Annals of Surgery
– British Journal of Pharmacology
– Journal of Physiology
– Journal of Anatomy

16 Challenges to Date
Locating old, rare copies in good condition
Assessing donations for completeness (covers, TOCs)
Scanning and delivering fill-in pages at NLM
Feeding the pipeline
Quality assurance (understanding requirements)
Information tracking
Every title is different

17 Related Activities
Citations for scanned articles are being phased into PubMed
Some completed archives have been delivered to their publishers:
– ASM titles (HighWire)
– Plant Physiology (HighWire)
– Biochemical Journal (Portland Press)

18 How Content Gets In
1. SGML/XML from Publishers
2. Back Issue Scanning
3. NIH Manuscripts from Public Access
4. Books and other non-article content

19 NIH Public Access Program
The National Institutes of Health (NIH) Policy on Enhancing Public Access to Archived Publications Resulting from NIH-Funded Research (Public Access Policy), which took effect on May 2, 2005, requests and strongly encourages all investigators to make their NIH-funded, peer-reviewed author's final manuscripts available to other researchers and the public through the NIH National Library of Medicine's (NLM) PubMed Central (PMC) immediately after the final date of journal publication.
The NIH has developed a password-protected, Web-based NIH Manuscript Submission (NIHMS) system to implement the NIH Public Access Policy.

20 NIH Manuscript Submission System
Author deposits began May 2, 2005
Voluntary submissions by NIH-funded authors
Third-party deposits began in July 2005
August 2006: just under 5% of qualifying authors are submitting manuscripts

21 NIH Public Access Policy – Status
As of February 10, 2006, two bills in Congress would mandate participation in the NIHMS
Neither bill is expected to pass in this legislative session

22 NIH Manuscript Submission
http://www.nihms.nih.gov/

23 How Content Gets In
1. SGML/XML from Publishers
2. Back Issue Scanning
3. NIH Manuscripts from Public Access
4. Books and other non-article content

24 Bookshelf
The books may be accessed in two ways:
(1) searched directly using any search term or phrase (in the same way as the bibliographic database PubMed);
(2) found through links to PubMed abstracts. Each PubMed abstract has a "Books" button that displays a facsimile of the abstract in which some phrases are hypertext links.

25 NCBI Bookshelf
NLM prefers the book content to be supplied in SGML
The book text files are converted into XML according to the NCBI Book Document Type Definition (DTD)

26 PMC – Added Value
Separate, high-resolution images for all illustrations
Links from the PMC article to its bibliographic citation in PubMed
Links from the references in the bibliography to their citations in PubMed
Links from the original article to corrections and retractions, and vice versa
Links to PubMed related articles
Links to PubMed articles by each author
Links to related resources, such as chemical compounds and protein sequences
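
The PMC-to-PubMed links on slide 26 can also be requested programmatically. The presentation does not cover this, but NCBI's present-day E-utilities service offers an `elink.fcgi` endpoint that takes `dbfrom`, `db`, and `id` parameters. The sketch below only builds the request URL and makes no network call; the function name and the example UID are hypothetical, and it assumes a numeric PMC UID.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(uid: str, dbfrom: str = "pmc", db: str = "pubmed") -> str:
    """Build an ELink URL asking for records in `db` linked to record `uid` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


print(elink_url("12345"))  # placeholder UID, not a real article
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pmc&db=pubmed&id=12345
```

Swapping `dbfrom` and `db` asks for the reverse direction, from a PubMed record to its PMC full text.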

27 PMC International
Collaboration between NLM, publishers in PMC, and international partners
Portable PMC (pPMC)
Literature Archiving Software Suite
– pPMC
– NLM XML DTD Suite
– NLM XML Authoring Tool
– Portable NIHMS (pNIHMS)

28 PMC News – August 2006
Springer ‘Open Choice’ and Blackwell ‘Online Open’ articles are now coming into PMC
Also working with OUP ‘Oxford Open’
Detailed tagging guidelines released for the NLM Journal Publishing DTD
Library of Congress and British Library are adopting the NLM Journal DTD as a standard

29 Links
PubMed Central: http://www.pubmedcentral.gov
NLM DTDs and documentation: http://dtd.nlm.nih.gov

30 Thank you!
Contact: Martha Fishel, mfishel@nlm.nih.gov

31 Costs
Costs vary from title to title; factors include:
– Quantity of color and/or grayscale images vs. straight black-and-white text
– Quantity of new XML citations prepared
– Errors in the deliverables (e.g., image quality, accuracy of XML, OCR)
– Media (number of DVDs)

32 Costs – Real Example
Biochemical Journal (scanned 1906-1995): 287,000 pages
Total cost = $152,000 USD*
Approximately $0.53 per page
*Excludes project management costs at NLM and Wellcome/JISC
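
The per-page figure on slide 32 is the total cost divided by the page count. A minimal check using the slide's own numbers (the helper name is illustrative):

```python
def cost_per_page(total_cost_usd: float, pages: int) -> float:
    """Average cost per scanned page, excluding project management (per slide 32)."""
    return total_cost_usd / pages


print(f"${cost_per_page(152_000, 287_000):.2f} per page")  # $0.53 per page
```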

33 Sample Pages

