Digital Libraries with Greenstone: an open source solution Tod Olson - University of Chicago Fred Miller - Illinois Wesleyan University Curtis Kelch - Illinois Wesleyan University Copyright Tod Olson, Fred Miller, and Curtis Kelch 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.
Digital Libraries with Greenstone Introduction About digital libraries Greenstone overview Examples Future Live demos Q & A
The World of Digital Libraries Access to Digital Collections –Text, images, audio, video –Searching and metadata Digital libraries versus repositories –Access and preservation Digital Preservation Tutorial: http://www.library.cornell.edu/iris/tutorial/dpm/
Sorting Out the Ingredients Raw materials User interface Elements of organization Building the collection
Greenstone New Zealand Digital Library Project at the University of Waikato with UNESCO, Human Info NGO International, every continent Examples: Academic –Digitization projects –Classes on digital libraries Non-academic –UNESCO humanitarian documentation
Greenstone features Works with existing documents –Imports several formats Searching: full text and metadata –Dublin Core, custom metadata Browse Structured documents –Indexing, access Extensible & customizable Open source software (GPL)
User Interface overview Finding documents –Search full text and metadata indexes –Classifiers: browse lists for navigating collections Navigating documents –Navigate hierarchical documents by logical structure –Simple page turning (not shown) –Single page for simple documents (not shown)
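The idea of a classifier as a browse list can be sketched in a few lines. This is a minimal illustration, not Greenstone's implementation: the function name `build_az_classifier`, the dictionary layout of a document, and the sample titles are all assumptions made for the example.

```python
from collections import defaultdict


def build_az_classifier(docs, field="Title"):
    """Group documents into A-Z buckets by the first letter of a metadata field.

    Hypothetical helper: Greenstone's own classifiers are configured per
    collection; this only illustrates what a browse list contains.
    """
    buckets = defaultdict(list)
    for doc in docs:
        value = doc.get(field, "").strip()
        # Values that do not start with a letter go into a catch-all bucket.
        key = value[0].upper() if value and value[0].isalpha() else "#"
        buckets[key].append(value)
    return {letter: sorted(titles) for letter, titles in sorted(buckets.items())}


# Sample metadata records (invented for the example).
docs = [
    {"Title": "Mazurkas", "Creator": "Chopin"},
    {"Title": "Ballade No. 1", "Creator": "Chopin"},
    {"Title": "Barcarolle", "Creator": "Chopin"},
]
browse = build_az_classifier(docs)
```

Here `browse` maps each letter to a sorted list of titles, which is the kind of structure a user pages through when browsing instead of searching.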
Greenstone Architecture [Diagram labels: Receptionist, Protocol, Collection Server, Collection, Import, DB & Indexes] Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356
Greenstone Architecture Receptionist –Provides user interface –Accept user input –Send to appropriate collection server –Accept results –Dynamic page generation Collection Server –Handle collection content –Search and filter information –Return results –Multiple collections
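The division of labor between receptionist and collection server can be sketched roughly as follows. The class names mirror the slide, but every method, the substring search, and the HTML output are assumptions for illustration only; Greenstone's actual protocol between the two components is not shown here.

```python
from html import escape


class CollectionServer:
    """Holds one collection's content and answers search requests."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # {doc_id: text}, a stand-in for DB & indexes

    def search(self, term):
        # Naive substring match; a real server would consult built indexes.
        term = term.lower()
        return sorted(
            doc_id for doc_id, text in self.documents.items() if term in text.lower()
        )


class Receptionist:
    """Accepts user input, routes it to a collection server, renders a page."""

    def __init__(self):
        self.servers = {}

    def register(self, server):
        self.servers[server.name] = server

    def handle(self, collection, term):
        server = self.servers.get(collection)
        if server is None:
            return f"<p>Unknown collection: {escape(collection)}</p>"
        hits = server.search(term)
        items = "".join(f"<li>{escape(doc_id)}</li>" for doc_id in hits)
        return f"<h1>{escape(collection)}: {len(hits)} result(s)</h1><ul>{items}</ul>"


# One receptionist serving two collections (sample data invented for the example).
receptionist = Receptionist()
receptionist.register(
    CollectionServer(
        "chopin",
        {"score-001": "Ballade in G minor", "score-002": "Mazurka in A minor"},
    )
)
receptionist.register(CollectionServer("argus", {"1894-01": "First issue of the Argus"}))
page = receptionist.handle("chopin", "mazurka")
```

The point of the split is that the receptionist never touches collection data directly, so one receptionist can front several collection servers.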
Building Collections [Diagram: HTML, PDF, ??? → Import → GSAF → Build → DB & Indexes]
Building collections Create a collection framework –or work with an old collection Select documents Import documents –Converts to internal XML format (GSAF) Build collection –creates search indexes and browse listings
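The two phases above, import and build, can be sketched as separate functions. This is a simplified illustration under stated assumptions: the record layout, the function names `import_documents` and `build_index`, and the tokenizer are invented here and do not reflect Greenstone's actual import or build programs.

```python
import re
from collections import defaultdict


def import_documents(raw_docs):
    """Import phase: normalize each source document into a common internal record."""
    archive = []
    for doc_id, (title, text) in sorted(raw_docs.items()):
        archive.append({"id": doc_id, "metadata": {"Title": title}, "text": text})
    return archive


def build_index(archive):
    """Build phase: create a full-text inverted index (word -> set of document ids)."""
    index = defaultdict(set)
    for record in archive:
        for word in re.findall(r"[a-z0-9]+", record["text"].lower()):
            index[word].add(record["id"])
    return dict(index)


# Sample source documents: id -> (title, text). Invented for the example.
raw = {
    "d1": ("Campus news", "The Argus reports campus news"),
    "d2": ("Sports", "Campus sports results"),
}
archive = import_documents(raw)
index = build_index(archive)
```

Keeping the phases separate means the build step only ever sees one internal format, whatever the source file types were.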
GSAF: internal XML format Section: Description –Metadata fields Content –Text, internal markup, images Section –No limit in number or depth Hierarchical documents Sections nest, tree structure
GSAF: internal XML format [Text, images, links, etc.] …
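Since the markup of the original example did not survive, here is a hedged sketch of the nesting the previous slide describes: a section with a description (metadata fields), content, and child sections. The element names follow the slide's vocabulary; the `name` attribute on `Metadata` and the helper `make_section` are assumptions, and the exact GSAF schema should be checked against the Greenstone documentation.

```python
import xml.etree.ElementTree as ET


def make_section(title, text, children=()):
    """Build one Section with a Description (metadata) and Content, plus subsections."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    # Assumed convention: one Metadata element per field, keyed by a name attribute.
    metadata = ET.SubElement(description, "Metadata", name="Title")
    metadata.text = title
    content = ET.SubElement(section, "Content")
    content.text = text
    for child in children:
        section.append(child)  # sections nest to any depth
    return section


chapter = make_section("Chapter 1", "Chapter text")
document = make_section("Sample document", "Front matter", children=[chapter])
xml_text = ET.tostring(document, encoding="unicode")
```

Printing `xml_text` shows the outer section wrapping the chapter, which is the tree structure that lets Greenstone index and display at document or section level.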
Config file: collect.cfg Collection-specific configuration file, collect.cfg, specifies: file types to import Indexes and browse lists –Document or section level –paragraph (text index only) display of results and browse listings document displays
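To make the role of the configuration file concrete, the sketch below reads a small directive-per-line text into a dictionary. The sample directives (`indexes`, `plugin`, `classify`) and their arguments are assumptions modeled on Greenstone 2 conventions and should be verified against the Greenstone documentation; the parser `parse_collect_cfg` is hypothetical and is not how Greenstone itself reads the file.

```python
# Illustrative sample only: directive names and arguments are assumptions.
SAMPLE_CFG = """\
# which indexes to build, at document or section level
indexes   document:text section:Title
# which file types to import
plugin    HTMLPlug
plugin    PDFPlug
# which browse lists to offer
classify  AZList -metadata Title
"""


def parse_collect_cfg(text):
    """Parse 'directive arg arg ...' lines into {directive: [[args], ...]}."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        directive, _, rest = line.partition(" ")
        # A directive may repeat (e.g. several plugins), so collect a list per name.
        config.setdefault(directive, []).append(rest.split())
    return config


config = parse_collect_cfg(SAMPLE_CFG)
```

The resulting dictionary groups the three concerns the slide lists: what to import, what to index and browse, and (not shown here) how to display results.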
Chopin Early Editions Over 400 early edition Chopin scores 1830s to 1880s Target audience: music scholars & musicians. On web, page-turnable JPEG images. Online in March 2003 Currently 374 scores in online collection Usage: Nearly 100 hits per day, > 30% of use is international.
Build overview [Diagram: Catalog records, Scanned Images, Structural metadata → METS → XSLT → Greenstone Archive Format → Greenstone Dig. Library Software. Legend: Human processing; XML-based automated processing]
Greenstone benefits for Chopin Robust, mature system Recovered time in project –Fast to bring up –UI out of the box –Dynamic page generation –Incremental customization XML compliant –Natural mapping from METS to GSAF
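The "natural mapping" from METS to GSAF comes from both being hierarchical: nested METS structMap divisions correspond to nested sections. The project did this with XSLT; the sketch below swaps in Python purely for illustration. The sample METS fragment, the use of the `LABEL` attribute as a title, and the output element names are assumptions, not the project's actual stylesheet.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Tiny invented METS fragment: one score with two pages.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap>
    <mets:div LABEL="Score">
      <mets:div LABEL="Page 1"/>
      <mets:div LABEL="Page 2"/>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def div_to_section(div):
    """Recursively map a METS div to a Section carrying its label as Title metadata."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    title = ET.SubElement(description, "Metadata", name="Title")
    title.text = div.get("LABEL", "")
    for child in div.findall("mets:div", METS_NS):
        section.append(div_to_section(child))
    return section


root = ET.fromstring(SAMPLE_METS)
top_div = root.find("mets:structMap/mets:div", METS_NS)
section = div_to_section(top_div)
labels = [m.text for m in section.iter("Metadata")]
```

Each page division becomes a child section of the score, which is what makes page turning in the user interface fall out of the document structure.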
The Argus Digital Collection Illinois Wesleyan Student Newspaper –1894 to 2000 Preservation and Access Image PDF versus full text Web interface for building metadata Customized searches
Greenstone Librarian Interface (GLI) Collection management –Informed by work at GS sites –Assist collection designer –Support all phases of collection build process –Do not specify workflow Java-based GUI tool –Formerly called the “Gatherer” 2 yrs in development –Beta sites: Bangalore and elsewhere Training sessions –UNESCO sessions in Asia, Africa –JCDL 2004 tutorial
GLI functions Establish new collection (or work on old) Select files to include in collection Enrich files with metadata Select indexes, classifiers Build collection Customize appearance Preview collection
Greenstone 3 GS2 mature, 5+ yrs., wide deployment –Constraints: support legacy systems –Other technologies have matured: Java, XML GS3: rewrite in Java, XML, XSLT Distributed architecture, SOAP METS as internal format –Group assembled for Greenstone METS profile(s) OAI support planned 1 year in dev; alpha testing in lab
Links & Further Information Greenstone: http://www.greenstone.org/ Chopin Early Editions: http://chopin.lib.uchicago.edu/ Argus Digital Collection: http://www.iwu.edu/library/services/argus1.htm Argus Greenstone Documentation: http://www.iwu.edu/~ckelch/ArgusProjectDoc12.pdf Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.