Digital Archives at the National Library of Medicine A presentation at the MLA Session Lighting the Path: Digital Repositories in the Real World May 24,

1 Digital Archives at the National Library of Medicine
A presentation at the MLA session "Lighting the Path: Digital Repositories in the Real World," May 24, 2004
by Diane Boehr, Cataloging Unit Head, National Library of Medicine, National Institutes of Health, Health & Human Services
boehrd@mail.nlm.nih.gov

2 Scope
- Historical medical works
- The NLM Archive
- PubMed Central

3 Considerations as you begin a project
- It will take much longer than you anticipate
- You will learn a great deal about topics outside your normal work duties
- Be willing to take baby steps and make a start
- It is very rewarding to see the fruits of your labor

4 HMD Projects
- Historical Anatomies
- Medicine in the Americas

5 Historical Anatomies
- http://www.nlm.nih.gov/exhibition/historicalanatomies/home.html
- Provides high-resolution downloadable scans of selected important images from illustrated anatomical atlases dating from the 15th to the 20th century
- Titles and images selected by Michael North, Head of Rare Books and Early Manuscripts

6 Historical Anatomies
- Consists of large JPEGs and zoomable digitized images from the books and a brief bibliographical and historical introduction to each title

7 Technical details
- The imaging for this project is contracted out
- The contractor makes archival-quality TIFF files (800 ppi resolution); thumbnail and JPEG images for the site are made from these using Adobe Photoshop
- Zoomifyer Pro is used to create the pan-and-zoom images
- The TIFF files are backed up on CD-ROMs

8 Search and retrieval
- Individual images do not have any metadata associated with them at this time
- Bibliographic citations on the site match the LocatorPlus records
- As the focus of the site is selected individual images from the books, rather than the entire text, there are currently no links from the LocatorPlus records for the individual titles to images on the Web site

9 Sample screen

10 Medicine in the Americas
- Monographic original source materials on the development of medicine in the New World published prior to 1914 are being digitized in their entirety
- http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books

11 Technical details
- Digitizing is being done in-house
- Books are scanned, and from the initial scan a photocopy and a TIFF file are created
- Photocopies are scanned to create OCR Word text files, which are then manually reviewed and cleaned up to create a searchable, downloadable PDF text in a modern font
- The TIFF file is used to reproduce the typeface and layout of the original published work

12 Technical details
- Mounting of these texts on the Web and the XML coding of the Word files are done using the NLM Bookshelf platform
- Bookshelf was developed by NCBI for medical texts supplied by publishers in SGML or other desktop publishing formats
- The platform has an existing template that allows record creators to input metadata easily without needing to know XML

13 Search and Retrieval
- The Bookshelf site supports only keyword searching
- Standard bibliographic data from LocatorPlus and brief historical data are included with the text
- Catalog records have hot links to the Bookshelf site

18 Timeframes
- Both projects went from planning to implementation in about one year, although both will be adding more material to their sites
- Use of standard, off-the-shelf products or existing technologies made implementation easier

19 NLM Archives
- A site to store material of permanent value that has been published on the NLM Web site, but is now outdated or superseded
- Searchable, yet clearly distinguished from current material

20 What do we mean by permanent?
Three aspects to permanence were identified:
1) Identifier validity: The extent to which the given name or identifier will always provide access to the same resource
2) Resource availability: The extent to which a given resource is guaranteed to remain available in electronic form
3) Content invariability: The extent to which the content of the resource could change

21 NLM Permanence Ratings
Four categories of permanence have been defined:
1) Permanent, unchanging content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content will not change.

22 NLM Permanence Ratings
2) Permanent, stable content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content is subject only to minor corrections or additions.

23 NLM Permanence Ratings
3) Permanent, dynamic content: NLM has made a commitment to keep this resource permanently available. Its identifier will always provide access to the resource. Its content could be revised or replaced.

24 NLM Permanence Ratings
4) Permanence not guaranteed: NLM has made no commitment to retain this resource. It could become unavailable at any time. Its identifier could be changed.
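
The four ratings can be read as combinations of the three aspects of permanence listed earlier. The following is a minimal sketch of that reading in Python, not NLM's implementation. The class and field names are invented for illustration, and the content-change value for the fourth rating is marked "unspecified" because the slides do not state it.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Permanence:
    """One rating, described by the three aspects of permanence."""
    label: str
    identifier_valid: bool    # identifier will always reach the resource
    resource_available: bool  # NLM has committed to keep it available
    content_change: str       # "none", "minor", "any", or "unspecified"


class PermanenceRating(Enum):
    # Names are hypothetical; labels and commitments follow the slides.
    UNCHANGING = Permanence("Permanent, unchanging content", True, True, "none")
    STABLE = Permanence("Permanent, stable content", True, True, "minor")
    DYNAMIC = Permanence("Permanent, dynamic content", True, True, "any")
    # The slides do not say how content may change for this rating.
    NOT_GUARANTEED = Permanence("Permanence not guaranteed", False, False, "unspecified")


def is_permanent(rating: PermanenceRating) -> bool:
    """True when both the identifier and the availability are guaranteed."""
    return rating.value.identifier_valid and rating.value.resource_available
```

Under this reading, the first three ratings differ only in how much the content may change, while the fourth drops both the identifier and the availability guarantees.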

25 Workflows
- Permanence ratings are assigned when a resource is promoted to the NLM Web site
- Default permanence ratings are generated based on the category to which the resource belongs
- Resource creators use a template which adds basic metadata, in addition to the category and permanence rating
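
One way to picture the default-rating step is a simple lookup from document category to rating. This is a sketch under stated assumptions: the category names and their mappings are hypothetical (the slides do not list NLM's real categories), and the option for a creator to override the default is also an assumption.

```python
# The four rating labels come from the slides.
VALID_RATINGS = (
    "Permanent, unchanging content",
    "Permanent, stable content",
    "Permanent, dynamic content",
    "Permanence not guaranteed",
)

# Hypothetical categories and defaults, for illustration only.
DEFAULT_RATING_BY_CATEGORY = {
    "press release": "Permanent, unchanging content",
    "fact sheet": "Permanent, dynamic content",
    "meeting notice": "Permanence not guaranteed",
}

# Assumed fallback for categories without a configured default.
FALLBACK_RATING = "Permanence not guaranteed"


def default_rating(category):
    """Return the default rating for a category (case-insensitive)."""
    return DEFAULT_RATING_BY_CATEGORY.get(category.strip().lower(), FALLBACK_RATING)


def assign_rating(category, override=None):
    """Use the creator's override if given and valid, else the default."""
    if override is None:
        return default_rating(category)
    if override not in VALID_RATINGS:
        raise ValueError(f"unknown permanence rating: {override!r}")
    return override
```

For example, `assign_rating("Fact Sheet")` returns the default for that category, while passing a second argument records a deliberate choice by the resource creator.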

26 Templates
- Metadata input template is a feature of TeamSite, our Web content management software
- No knowledge of HTML is needed to use these templates
- Minimal set of required fields, with default values or drop-down menus supplied wherever possible

27 Required metadata
1) Title
2) Heading
3) Date first published
4) Date last modified
5) Next scheduled review date
6) Publisher
7) Rights
8) Contact e-mail
9) Language
10) Document category
11) Permanence level
12) URL
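
A template that enforces these twelve fields amounts to a completeness check. The sketch below shows one such check in Python. The field keys are hypothetical spellings of the slide's labels, not TeamSite's actual field names.

```python
# Keys are illustrative renderings of the twelve required fields.
REQUIRED_FIELDS = (
    "title",
    "heading",
    "date_first_published",
    "date_last_modified",
    "next_scheduled_review_date",
    "publisher",
    "rights",
    "contact_email",
    "language",
    "document_category",
    "permanence_level",
    "url",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing
```

A record would be accepted only when `missing_fields(record)` returns an empty list.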

29
- The NLM metadata set is based on Dublin Core, with some local adaptations
- The full scheme may be seen at http://www.nlm.nih.gov/tsd/cataloging/metafilenew.html

30 Workflows
- Every resource has the minimal metadata assigned by the resource creator
- Permanent resources are routed to the Cataloging Section
- Complete MARC bibliographic records are created
  - Includes standardized access points, including MeSH and an NLM classification number
  - Accessible in LocatorPlus
  - Distributed to the utilities and other NLM licensees

31 Workflows
- The enhanced metadata created in Cataloging is then added back to the header information of the online resource
- Preliminary metadata and the enhanced versions can be seen by clicking on "View source"
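
Metadata placed in a page header is typically carried in HTML `<meta>` elements, which is why it shows up under "View source". The sketch below merges basic and enhanced metadata and renders it that way. The element names (`DC.Title`, `DC.Subject.MeSH`) and the sample values are assumptions in the general Dublin Core style, not NLM's documented scheme.

```python
from html import escape


def merge_enhanced(basic, enhanced):
    """Combine creator-supplied metadata with cataloger enhancements.

    Enhanced values win when both sources define the same element.
    """
    merged = dict(basic)
    merged.update(enhanced)
    return merged


def render_meta_tags(metadata):
    """Render a name/content mapping as HTML <meta> elements, one per line."""
    lines = []
    for name, content in metadata.items():
        lines.append(
            '<meta name="{}" content="{}">'.format(
                escape(name, quote=True), escape(content, quote=True)
            )
        )
    return "\n".join(lines)


# Sample values for illustration only.
basic = {"DC.Title": "Sample Fact Sheet", "DC.Language": "eng"}
enhanced = {"DC.Subject.MeSH": "Libraries, Medical"}
header = render_meta_tags(merge_enhanced(basic, enhanced))
```

Escaping the values keeps quotation marks or ampersands in a title from breaking the header markup.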

34 Basic metadata

36 Enhanced metadata

37 Archive Design
- Separate, distinct, but integral part of the NLM Web site
- Searchable with standard NLM search software: Mindserver from Recommind

38 Archive contents
- Out-of-date resources: older material that was once on the site but is no longer of current interest
- Earlier versions of current documents that have undergone major revisions

42 Still to come
- Archiving non-HTML files, such as PDF, video and audio clips, etc.
- Archiving resources from areas in the library which do not get promoted through TeamSite

43 Impact on Cataloging
PubMed Central (PMC)
- A bibliographic record must exist in the NLM catalog before a journal is added to PMC
- Records must be created if the title is not already in the catalog
  - Downloaded from OCLC
  - Skeletal record created from local template
  - High-priority, 24 hr. turnaround time
- Records are then fully cataloged

44 Impact on Cataloging
PMC
- If the title is already in the catalog, holdings must be updated
  - Indicate the title is available in PMC
  - Range of issues
  - Any embargo periods
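
The PMC rule on the last two slides reduces to a branch on whether the title is already cataloged. The following sketch models that branch with an in-memory dictionary standing in for the catalog. All names, the record shape, and the wording of the holdings note are hypothetical, and the sketch does not model downloading a record from OCLC.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    title: str
    level: str = "skeletal"  # later upgraded to "full" by catalogers
    holdings_notes: list = field(default_factory=list)


def prepare_for_pmc(catalog, title, issues, embargo=None):
    """Make sure a journal has a catalog record before it is added to PMC.

    Returns (record, action), where action is "created" for a new
    skeletal record or "holdings updated" for an existing title.
    """
    record = catalog.get(title)
    if record is None:
        # Title not yet cataloged: create a skeletal record right away
        # (the slides cite a 24-hour turnaround); full cataloging follows.
        record = CatalogRecord(title=title)
        catalog[title] = record
        return record, "created"

    # Title already cataloged: note PMC availability, range of issues,
    # and any embargo period in the holdings.
    note = f"Available in PMC: {issues}"
    if embargo:
        note += f"; embargo: {embargo}"
    record.holdings_notes.append(note)
    return record, "holdings updated"
```

Calling the function twice for the same title first creates the skeletal record and then, on the second call, records the PMC holdings.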

45 Impact on Cataloging
NLM Archive
- Cataloger creates core-level MARC records for any new resource on the NLM Web site rated Permanent
  - Views the site and uses the metadata supplied by the record creator for descriptive data
  - Supplies MeSH and NLM classification
  - Establishes authorized name headings in the national authority file
  - Transfers this enhanced metadata back to the resource

46 Impact on Cataloging
HMD projects
- Minimal impact on Cataloging
- Books being digitized already have records in the catalog
- HMD has its own cataloging staff who can make links between existing catalog records and digitized material

47 Impact on Cataloging
- Despite the increased workload, we think archiving projects are enhanced when catalogers are involved in the projects
- Catalogers increase their knowledge by becoming involved in these projects

