NATIONAL LIBRARY OF MEDICINE PubMed Central and the NLM DTDs
PubMed Central: PubMed Central (PMC) is NLM's digital archive of life sciences journal literature. It serves a dual purpose: archiving journals and displaying “free” full-text journal articles. PMC contains over 100,000 articles from more than 100 titles. (the BMC disclaimer)
Back Issue Scanning: PMC has started a pilot project to digitize back issues of journals. The project starts with journals participating in PMC (JMLA, PNAS, ASM titles). Each journal is scanned cover to cover (including front matter and ads). Article headers and abstracts that are not available through PubMed are being keyed in XML. Articles will be displayed as HTML headers with PDF or TIFF representations of the pages. 4C (four-color) and halftone images will be scanned and displayed with the article. All current journals should be scanned by Spring 2004.
Intermission: The NLM DTDs
PubMed Central DTD History: pmc-1.dtd is the DTD currently in production. It was derived from keton.dtd and the BMC article.dtd, designed to be a simple DTD for online display and archive, and written with samples from PNAS, MBC, and BMC. Why a new DTD? Elements and attributes had to be added to accommodate new journals. The DTD would become cumbersome quickly if we had to keep making changes for each new title. The original “simplicity” of design would lead to confusing data structures as the DTD expanded. The DTD had moved away from standard XML practices to accommodate source SGML. It needed an independent review.
The Reviewers: Mulberry Technologies, Inc., an electronic publishing consultancy specializing in SGML- and XML-based systems. Mulberry has been active in SGML since 1984 and in XML since 1996, and has extensive experience in the development and maintenance of SGML and XML applications for STM publishers. The Task: Review the pmc-1.dtd for XML best practices, applicability to archive and online retrieval use, and completeness in application to STM journals. Create an updated version of the DTD. Document the new DTD.
The Results: pmc-2.dtd. Mulberry’s suggestions: Create two DTDs: one for archiving, to allow us to convert data from multiple sources to our DTD, and a subset for authoring, to allow us to retain some control when publishers create articles to the DTD. Use proven solutions like XLink and the XHTML table standard. Use data models to simplify the DTD.
Harvard E-Journal Archiving Project: The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles. Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study. Conclusion: yes, it is feasible, but the right DTD does not exist. A meeting was held in April 2002 to discuss the changes needed to the PMC2 DTD to expand its range to include almost any journal. Attendees included PMC, Mulberry Technologies, Inc. (consultant to PMC), The Mellon Foundation, The Harvard Library, and Inera (consultant to Harvard-Mellon).
Conclusions: 1. PMC and Harvard-Mellon had different ideas about what the DTD should do. Harvard was interested in an Interchange DTD, which would allow publishers to submit in multiple formats, all of which would be valid. PMC was interested in an Archive DTD, which would be open enough to allow conversion of multiple sources into one single format. 2. If the PMC2 DTD were modularized and some pieces were added (like the OASIS table model), many DTDs could be built using the same elements, giving both flexibility and consistency.
Status: The “NLM Archiving and Interchange DTD Suite” has been created and released. Mulberry and Inera analyzed hundreds of journals across subjects to ensure that the DTD Suite was powerful enough to tag them. The “NLM Journal Archiving DTD” and the “Journal Publishing DTD” have been created from the DTD Suite. The Archiving DTD and the Suite were circulated through Mulberry’s and Inera’s contacts in the electronic publishing world for comments and suggestions. Suggestions that made the DTD more usable were incorporated.
Archiving / Publishing DTDs: PLoS is using the DTD for its journals. TechBooks is using the Journal Publishing DTD to send PMC content for J. Athletic Training. HighWire Press is analyzing the DTDs for its use. JSTOR will use the DTD for its E-Journal Archive. CSIRO (Australia's Commonwealth Scientific & Industrial Research Organisation) will tag its journals with the new DTD. Several other small journals are trying to use the DTD to submit content to PMC.
The Metadata: The DTDs are article-based. Metadata in each article is broken down into two parts: Journal Metadata and Article Metadata.
Journal Metadata: Journal Metadata carries all information about the journal that the article is (was) published in: Journal Identifier(s) (by archive name, DOI, NLM title abbreviation, publisher IDs); Journal Title; Abbreviated Journal Title; ISSN(s) – print and/or electronic; Publisher information.
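To make the journal-level fields concrete, here is a minimal sketch that reads them from a small XML fragment with Python's standard library. The element and attribute names (`journal-meta`, `journal-id`, `journal-id-type`, `issn`, `pub-type`, and so on) are assumptions based on the NLM tag set and should be checked against the documentation at http://dtd.nlm.nih.gov; the journal, identifiers, and ISSNs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record. Element/attribute names are assumed from the
# NLM tag set; every value below is invented for illustration only.
SAMPLE_JOURNAL_META = """\
<journal-meta>
  <journal-id journal-id-type="nlm-ta">Example J</journal-id>
  <journal-id journal-id-type="publisher-id">EXJ</journal-id>
  <journal-title>Example Journal of Examples</journal-title>
  <abbrev-journal-title>Example J</abbrev-journal-title>
  <issn pub-type="ppub">0000-0000</issn>
  <issn pub-type="epub">0000-0019</issn>
  <publisher>
    <publisher-name>Example Press</publisher-name>
  </publisher>
</journal-meta>
"""


def read_journal_meta(xml_text):
    """Collect the journal-level fields listed on the slide into a dict."""
    root = ET.fromstring(xml_text)
    return {
        # One identifier per naming authority (archive, NLM, publisher, ...).
        "ids": {e.get("journal-id-type"): e.text for e in root.findall("journal-id")},
        "title": root.findtext("journal-title"),
        "abbrev_title": root.findtext("abbrev-journal-title"),
        # Print and electronic ISSNs are told apart by an attribute.
        "issns": {e.get("pub-type"): e.text for e in root.findall("issn")},
        "publisher": root.findtext("publisher/publisher-name"),
    }


journal = read_journal_meta(SAMPLE_JOURNAL_META)
print(journal)
```

Keying identifiers and ISSNs by their type attribute mirrors the idea that one journal may carry several identifiers at once, one per authority or medium.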
Article Metadata: Article Metadata carries information about the article (and its ‘address’ related to the journal): Article ID(s); Article Categories – subject categories or TOC sections; Article Titles – includes title, subtitle, translated title, and alternate title; Contributors – authors and editors and their affiliations; Author notes – ‘footnotes’ specific to authors; Publication Dates – print, electronic, preprint, collection; Volume and Issue.
More Article Metadata: Pagination – first/last page or ‘elocation-id’; Article-level links; Product information – for book, software, or hardware reviews; Article History – dates received, accepted, etc.; Copyright information; Related article information; Abstracts; Keywords; Contract/Grant information; Counts – figures, tables, equations, pages.
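The article-level fields can be sketched the same way. The example below parses a small, invented `article-meta` fragment and shows the pagination rule from the slide: use first/last page when present, otherwise fall back to `elocation-id`. All element and attribute names (`article-id`, `pub-id-type`, `contrib`, `fpage`, `lpage`, `counts`, and so on) are assumptions based on the NLM tag set, and the DOI and other values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; names are assumed from the NLM tag set and
# all values (DOI, author, dates) are invented for illustration.
SAMPLE_ARTICLE_META = """\
<article-meta>
  <article-id pub-id-type="doi">10.0000/example.0001</article-id>
  <article-categories>
    <subj-group><subject>Research Article</subject></subj-group>
  </article-categories>
  <title-group>
    <article-title>An Example Article</article-title>
  </title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <name><surname>Doe</surname><given-names>Jane</given-names></name>
    </contrib>
  </contrib-group>
  <pub-date pub-type="ppub"><month>11</month><year>2003</year></pub-date>
  <volume>1</volume>
  <issue>2</issue>
  <fpage>10</fpage>
  <lpage>15</lpage>
  <counts><fig-count count="3"/><table-count count="1"/></counts>
</article-meta>
"""


def pagination(meta):
    """Return 'first-last' pages if paginated, else the elocation-id (or None)."""
    fpage = meta.findtext("fpage")
    if fpage is not None:
        lpage = meta.findtext("lpage")
        return f"{fpage}-{lpage}" if lpage else fpage
    return meta.findtext("elocation-id")


def read_article_meta(xml_text):
    """Collect a few of the article-level fields into a dict."""
    meta = ET.fromstring(xml_text)
    authors = [
        f'{c.findtext("name/given-names")} {c.findtext("name/surname")}'
        for c in meta.findall("contrib-group/contrib[@contrib-type='author']")
    ]
    return {
        "ids": {e.get("pub-id-type"): e.text for e in meta.findall("article-id")},
        "title": meta.findtext("title-group/article-title"),
        "authors": authors,
        "volume": meta.findtext("volume"),
        "issue": meta.findtext("issue"),
        "pages": pagination(meta),
        # Each child of <counts> carries its number in a 'count' attribute.
        "counts": {e.tag: int(e.get("count")) for e in meta.findall("counts/*")},
    }


article = read_article_meta(SAMPLE_ARTICLE_META)
print(article)
```

An online-only article without page numbers would carry only an `elocation-id`, and `pagination` would return that value instead.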
What’s Next? Working Group: To keep the DTD relevant to the publishing and archiving communities, we have created the XML Interchange Structure Working Group. This group advises NLM on recommended changes and additions to the tag set. The Working Group met for the first time on August 18, 2003. The recommendations from this meeting led to version 1.1 of the DTDs, released on November 1, 2003.
What’s Next? Other DTDs: Because the DTD is built as a set of DTD modules, other document types can be created (relatively) easily using the same content models. We are building a Books DTD and planning an Online Documentation DTD.
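The modular idea can be illustrated with a toy sketch: shared declarations live in named modules, and each document type is assembled from a different selection of them. The module names and element declarations below are entirely hypothetical and far simpler than the real Suite, which, as far as I understand, wires its modules together with DTD parameter entities rather than Python; this only shows why shared modules give consistent content models across document types.

```python
# Toy illustration only: hypothetical modules, not the actual NLM DTD Suite.
MODULES = {
    # Shared building blocks reused by every document type.
    "para": "<!ELEMENT p (#PCDATA)>",
    "section": "<!ELEMENT sec (title, p+)>\n<!ELEMENT title (#PCDATA)>",
    # Top-level modules that differ per document type.
    "article-top": "<!ELEMENT article (title, sec+)>",
    "book-top": (
        "<!ELEMENT book (title, chapter+)>\n"
        "<!ELEMENT chapter (title, sec+)>"
    ),
}


def build_dtd(module_names):
    """Concatenate the selected modules into one DTD text."""
    return "\n".join(MODULES[name] for name in module_names) + "\n"


# Two document types built from the same section and paragraph models.
article_dtd = build_dtd(["article-top", "section", "para"])
book_dtd = build_dtd(["book-top", "section", "para"])

print(article_dtd)
print(book_dtd)
```

Because both results include the same `section` and `para` modules, a section is tagged identically in an article and in a book; only the top-level structure changes.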
Links: PubMed Central – http://www.pubmedcentral.gov; NLM DTDs and documentation – http://dtd.nlm.nih.gov; firstname.lastname@example.org
The PMC Team: Andrei Kolotev, Anh Nguyen, Brooke Dine, Ed Sequeira, Jane Davenport, Jeff Beck, Laura Kelly, Marla Fogelman, Morais Burge, Sergey Koshelkov, Sergey Krasnov, Vladimir Sarkisov, Vladislav Merker