Planning a digital library How to Build a Digital Library Ian H. Witten and David Bainbridge.

Planning a Digital Library
- Responsibilities
- Technology to be used
  - Greenstone, DSpace, Fedora, EPrints
- Metadata standard to be used
  - Dublin Core, METS, etc.
- Types of access
- Retrospective or born digital?
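To make the metadata choice concrete, here is a minimal sketch of a simple Dublin Core record built with Python's standard library. The `record` wrapper element and the field values are illustrative assumptions, not something the slides prescribe; only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core element names to an XML string.

    The <record> wrapper is a hypothetical container chosen for this
    sketch; real systems wrap the elements in their own schema.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_record({
    "title": "How to Build a Digital Library",
    "creator": "Witten, Ian H.",
    "type": "Text",
})
print(xml_text)
```

Systems such as Greenstone and DSpace have their own import formats, so a record like this would normally be mapped into whatever the chosen platform expects.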

Responsibilities
- Legal issues
  - Distributing information carries responsibilities
  - Copyright
- Social issues
  - Respect customs of the community
  - Both source and use communities
- Ethical issues

Ideology
- Ideology: a clear conception of what you plan to achieve with the collection of information
- Ideology of a collection:
  - Purpose
  - Objectives
  - Principles
    - Guide what is to be included in the collection
- Placed in the introduction to the digital library

Document versus Work
- Work
  - The disembodied content of a message
  - Pure information
- Document
  - Traditional library: a physical object that embodies the work
  - Digital library: a particular electronic encoding of a work
- How are distinctions made between different manifestations of a single work?
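One way to picture the work/document distinction is as a one-to-many data model: a single work with several documents that encode it. The sketch below is only an illustration; the class names, fields, and sample values are assumptions rather than anything defined in the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """One concrete encoding (manifestation) of a work."""
    identifier: str
    media_type: str          # e.g. "application/pdf"
    edition: str = ""


@dataclass
class Work:
    """The abstract content, independent of any particular encoding."""
    title: str
    author: str
    documents: List[Document] = field(default_factory=list)

    def add_document(self, doc: Document) -> None:
        self.documents.append(doc)

    def formats(self) -> List[str]:
        """Distinct media types in which this work is available."""
        return sorted({d.media_type for d in self.documents})


work = Work(title="Moby-Dick", author="Melville, Herman")
work.add_document(Document("md-001", "application/pdf", edition="1851"))
work.add_document(Document("md-002", "text/html", edition="1851"))
print(work.formats())
```

Keeping the work separate from its documents lets a collection show one entry per work while still offering every available manifestation.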

Converting an Existing Library
- Digitizing an existing paper-based collection is the most expensive kind of project
- Consider whether it is worth the effort and expense
- 16th-century Mexican library
- Incunabula
- Broadsides

Advantages of Digital Libraries
- Easier to access remotely than conventional libraries
- Powerful search and browsing
- Easier to add additional services
- Easier to organize and reorganize
- Easier to maintain?
- Easier to preserve?
- Does your collection have these advantages?

Questions to Address
- Will the digital library coexist with an existing physical one?
- What is the collection's growth rate?
- How dynamic is the collection?
- Should you consider outsourcing the whole digital library operation?
- Could user needs be satisfied in alternative ways?

Prioritizing Materials
- Special collections and unique materials
  - Rare books and manuscripts
- High-use items
  - Research and teaching materials
- Low-use items

Criteria for Digital Conversion
- Intellectual content
  - Scholarly value
  - Desire to enhance access to information
- Funding available
- Educational value
  - Classroom support
  - Background reading
  - Distance education
- Institutional
  - Resource sharing
  - Promote strengths of an institution
- Reduce handling of fragile originals
- Cost and space savings

Building a New Collection
- New material
  - The copyright holder may be the best one to create a digital collection
- Metadata
  - Where will it come from?

Bibliographic Entities
- Documents
- Works
  - Distinction between document and work
- Editions
  - Electronic documents use terms such as version, release, and revision
- Authors
  - Authority control: standardized names for authors
- Titles
  - Attributes of works
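Authority control can be sketched as a lookup from name variants to a single authorized form. The table below is a tiny hand-made example, and the helper names (`_key`, `authorized_name`) are hypothetical; real authority files are far larger and maintained by cataloguing agencies.

```python
import re

# Illustrative authority data: one authorized heading and a few variants.
VARIANTS = {
    "Twain, Mark, 1835-1910": [
        "Mark Twain",
        "Twain, Mark",
        "Samuel L. Clemens",
        "Clemens, Samuel Langhorne",
    ],
}


def _key(name):
    """Normalize a name: case-fold, drop punctuation, collapse spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.casefold()).split())


# Map every normalized variant to its authorized form.
LOOKUP = {_key(v): auth for auth, vs in VARIANTS.items() for v in vs}


def authorized_name(name):
    """Return the authorized form, or the input unchanged if unknown."""
    return LOOKUP.get(_key(name), name)


print(authorized_name("samuel l. clemens"))
print(authorized_name("Jane Austen"))
```

Names without an entry pass through unchanged here; a production workflow would more likely flag them for a cataloguer to review.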

Bibliographic Entities
- Subjects
  - Two approaches to automatically assigning subjects:
    - Key-phrase extraction
    - Key-phrase assignment
  - Literary and artistic works
    - Style, form, content, genre
- Library of Congress Subject Headings (LCSH)
  - Controlled vocabulary: 30,000 pages, 2,000,000 entries
  - Hierarchical relationship of broader and narrower topics
- Subject classifications
  - Traditional libraries have a linear arrangement
  - A digital collection can be rearranged at the click of a mouse
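Key-phrase assignment picks subjects from a controlled vocabulary, whereas key-phrase extraction picks phrases from the document's own text. The following is a deliberately naive sketch of assignment with a broader-term hierarchy. The four-term vocabulary is a toy stand-in, not real LCSH data, and plain substring matching is a simplifying assumption.

```python
# Toy controlled vocabulary: each term maps to its broader term (or None).
BROADER = {
    "digital libraries": "libraries",
    "libraries": None,
    "optical character recognition": "pattern recognition",
    "pattern recognition": None,
}


def assign_subjects(text):
    """Assign every vocabulary term that literally occurs in the text."""
    lowered = text.lower()
    return sorted(term for term in BROADER if term in lowered)


def broader_chain(term):
    """Walk up the hierarchy from a term to its broadest ancestor."""
    chain = []
    parent = BROADER.get(term)
    while parent is not None:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


subjects = assign_subjects(
    "Optical character recognition makes scanned pages searchable."
)
print(subjects)
print(broader_chain("digital libraries"))
```

Real systems typically use statistical or machine-learned matching rather than literal string search, since a document rarely uses the vocabulary's exact wording.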

Digitizing Documents
- Digitization
  - The process of taking traditional library materials and converting them to electronic form
  - Allows storage and manipulation by a computer
  - The process is time-consuming and expensive

Stages of Digitization
- Scanning
  - Creates a digitized image of each page
  - Usually presented to the user
- Optical Character Recognition (OCR)
  - Creates an encoded representation of the textual content of the pages
  - Necessary for full-text indexing
  - Allows searching
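To see why OCR text is needed for full-text indexing, consider a minimal inverted index that maps each word to the pages containing it. This is an illustrative sketch: the page identifiers and texts are made up, and real digital library software uses far more elaborate indexing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index from {page_id: ocr_text}."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "p1": "The digital library",
    "p2": "A library of paper books",
}
index = build_index(pages)
print(search(index, "digital library"))
```

A scanned image alone offers nothing to tokenize, which is why the page images are shown to the user while the OCR text drives the search.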

Decisions in Scanning
- Black-and-white, grayscale, or color
- Resolution
  - Number of pixels per linear unit
- Bits per pixel
  - Monochrome display: 16 or 256 levels of gray
  - Color display: up to 24 or 32 bpp
- Quality
  - Higher quality increases storage space and access time
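The storage cost of these choices follows from simple arithmetic: pixels across times pixels down times bits per pixel. The sketch below computes uncompressed sizes. The letter-size page and 300 dpi resolution are assumed example values, and real files are usually compressed, so actual sizes will be smaller.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed image size in bytes for a scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (pixels * bits_per_pixel + 7) // 8


# Assumed example: an 8.5 x 11 inch page scanned at 300 dpi.
sizes = {
    bpp: uncompressed_bytes(8.5, 11, 300, bpp)
    for bpp in (1, 8, 24)   # black-and-white, 256 grays, 24-bit color
}
for bpp, size in sizes.items():
    print(f"{bpp:>2} bpp: {size / 1_000_000:.1f} MB")
```

Moving from black-and-white to 24-bit color multiplies the raw size by 24, and doubling the resolution multiplies it by four.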

Optical Character Recognition
- Manual cleanup is necessary
- Less efficient than manual keying when the accuracy rate drops below 95 percent
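Reading the 95 percent figure as character accuracy, a quick calculation shows how much manual cleanup it implies. The 2,000-character page length is an assumed round number for illustration.

```python
def expected_errors(chars_per_page, accuracy):
    """Expected number of wrongly recognized characters on one page."""
    return round(chars_per_page * (1 - accuracy))


# Assumption: roughly 2,000 characters on a typical page.
for accuracy in (0.99, 0.95, 0.90):
    errors = expected_errors(2000, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors} errors per page")
```

At 95 percent accuracy that is about 100 corrections per page, which suggests why retyping the page can become the cheaper option below that level.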

Interactive OCR
- Optical character recognition should be done as an interactive process
- Acquisition
  - Input from scanner or read a file
- Cleanup
  - Filtering, deskewing, and manual cleanup of unwanted areas
- Page analysis
  - Examine layout
- Recognition
  - The "OCR" part
- Checking
- Saving
  - Plain text, HTML, RTF, PDF, MS Word
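The six stages can be pictured as an ordered pipeline in which each step transforms a page and hands it on. The sketch below shows only the control flow: every stage is a placeholder, `fake_recognize` is a hypothetical stand-in for a real OCR engine, and the file name is made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Page:
    source: str                              # scanned image file name
    text: str = ""                           # filled in by recognition
    log: List[str] = field(default_factory=list)


def no_op(page):
    """Placeholder for a stage this sketch does not implement."""
    return page


def fake_recognize(page):
    """Stand-in for the OCR engine; a real one would read the image."""
    page.text = "recognized text for " + page.source
    return page


Stage = Tuple[str, Callable[[Page], Page]]

STAGES: List[Stage] = [
    ("acquisition", no_op),
    ("cleanup", no_op),
    ("page analysis", no_op),
    ("recognition", fake_recognize),
    ("checking", no_op),
    ("saving", no_op),
]


def run_pipeline(page, stages):
    """Apply each stage in order, recording which stages have run."""
    for name, step in stages:
        page = step(page)
        page.log.append(name)
    return page


result = run_pipeline(Page("scan-001.tif"), STAGES)
print(result.log)
```

In an interactive tool, the cleanup and checking stages would pause for operator input instead of running straight through.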

Page Handling
- Unbinding
- Microfiche or microfilm
- Two most expensive parts
  - Handling the paper
  - OCR

Planning a Digitization Project
- Outsourcing
- Cost
  - $1 to $2 for scanning and OCR
- Quality control
- Verification