Open Source Software for Digital Libraries. Jon Dunn, Associate Director for Technology; John A. Walsh, Manager of Electronic Text Technologies.

Similar presentations
Richard Jones, Systems Developer Technical Issues for Repository Software Theses Alive! Edinburgh University Library SHERPA Nottingham.

IRRA DSpace April 2006 Claire Knowles University of Edinburgh.
Theo Andrew, Edinburgh University Library Choosing Suitable Open-Source Repository Software Choosing Suitable Open Source Repository Software Theo Andrew.
Copyright, UCL LEADERS: Linking EAD to Electronically Retrievable Sources Developing a Generic Toolkit: Architecture and technology issues ALLC/ACH Conference.
Comparison of EPrints 3.0 and DSpace digital library systems Kuzma Kudim, Galina Proskudina.
Digital Collections: Storage and Access Jon Dunn Assistant Director for Technology IU Digital Library Program
October 28, 2003 Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Open Source software for libraries By Katharina Penner.
A. Grigorov, A. Georgiev, M. Petrov, S. Varbanov, K. Stefanov Building a Knowledge Repository for Life-long Competence Development.
Digital Libraries: Study into the features of the DSpace Suite Devika P. Madalli Documentation Research and Training Centre Indian Statistical Institute.
Wangga: Songs of North Australia The University of Sydney Library Ross Coleman Sten Christensen Gary Browne Department of Music, University of Sydney Professor.
DSpace Devika P. Madalli DRTC, ISI Bangalore.
ARCHIMÈDE Presented by Guy Teasdale Directeur, Services soutien et développement Bibliothèque de l’Université Laval CARL Workshop on Institutional Repositories.
MIT’s DSpace A good fit for ETDs Margret Branschofsky Keith Glavash MIT LIBRARIES.
The Fedora Project April 28-29, 2003 CNI, Washington DC Thornton Staples University of Virginia Sandy Payette Cornell Information Science.
Dspace – Digital Repository Dawn Petherick, University Web Services Team Manager Information Services, University of Birmingham MIDESS Dissemination.
Portal Technologies An overview of portal products and other software.
Developing the NSDL User Portal Dean Krafft, Cornell University
Knowledge Management Oswaldo Salcedo Brian Wight Travis Gibbs.
Greenstone Digital Library Usage and Implementation By: Paul Raymond A. Afroilan Network Applications Team Preginet, ASTI-DOST.
Dr. M.G. Sreekumar UNESCO Coordinator, Greenstone Support, South Asia Librarian & Head, CDDL, IIM Kozhikode OPEN SOURCE TECHNOLOGIES FOR LIBRARIES.
Platforms, installation, configuration; accessing example collections Course material prepared by Greenstone Digital Library Project University of Waikato,
Choosing an IR Platform Charl Roberts – University of the Witwatersrand, Johannesburg.
Geneve, February 12, 2004 CERN OAI 3 Workshop - Tutorial 2 F. Lützenkirchen Implementing institutional Content Repositories with MyCoRe and MILESS 3rd.
Cocoon and Digital Libraries in the Humanities Hugh A. Cayless UNC Chapel Hill.
ETD Repositories Using DSpace Software Andrew Penman The Robert Gordon University 27th September 2004.
Digital Library Architecture and Technology
Developing Interfaces and Interactivity for DSpace with Manakin Part 2: Technical and Conceptual Overview of Dspace and Manakin Eric Luhrs Digital Initiatives.
Dr. Kurt Fendt, Comparative Media Studies, MIT MetaMedia An Open Platform for Media Annotation and Sharing Workshop "Online Archives:
Dspace 1 Introduction to DSpace Mukesh Pund Scientist NISCAIR, New Delhi.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
Copyright 2006, The Ohio State University Mary Manning Eric Schnell Using Greenstone Open-Source Digital Library Software at a Cultural Heritage Institution.
SDPL 2002 Notes 7: Apache Cocoon 1 7 XML Web Site Architecture Example: Apache Cocoon, a Web publishing architecture based on XML technology
Offline aAQUA. Developmental Informatics Lab Availability: Offline Access Works in resource constrained environment –intermittent and low bandwidth connectivity.
ISpheres Project. Project Overview iSpheresCore iSpheresImage Demonstration References.
From Creation to Dissemination A Case Study in the Library of Congress’s use Open Source Software DLF Spring Forum Corey Keith
Building XML Portals with Cocoon Matthew Langham S&N AG
IUScholarWorks is a set of services to make the work of IU scholars freely available. Allows IU departments, institutes, centers and research units to.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
Archivists' Toolkit - CRADLE Presentation, 10 Feb The Archivists’ Toolkit CRADLE Presentation 10 Feb
Archivists' Toolkit - CDL Presentation, October 17, 2005 The Archivists’ Toolkit Lee Mandell Brad Westbrook.
METS Dissemination METS Opening Day Corey Keith
The Fedora Project April 28-29, 2003 CNI, Washington DC Thornton Staples University of Virginia Sandy Payette Cornell Information Science NOTE: CSG
IUScholarWorks Technical Overview Randall Floyd Digital Library Program Programmer/Database Administrator.
AxKit A member of the Apache XML project Ryan Maslyn Kyle Bechtel.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
DSpace - Digital Library Software
DAEDALUS: ePrints Overview Web Meeting, 4th December 2004 William J Nixon Project Manager (DAEDALUS)
DSpace System Architecture 11 July 2002 DSpace System Architecture.
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
A Basic Introduction By Scott Phillips 2005/8/7. Agenda What is DSpace and what does it do? The DSpace Information Model Components & Features of DSpace.
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
DSpace An Open Source Dynamic Digital Repository Xizi (Cecilia) Cai IS565 Spring 2013 DL Topic Presentation.
Rendering Syndicated Library Content in an Institutional Portal: Integrating MyLibrary into uPortal John Fereira: Cornell University Eric Lease Morgan:
A Project of the University Libraries Ball State University Libraries A destination for research, learning, and friends.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
CERN Document Server 19th January 2006 CERN Document Server Jean-Yves Le Meur 19th January 2006.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Apache Cocoon – XML Publishing Framework, Database Laboratory, first-semester Ph.D. student 이 세영.
Breeda Herlihy, IR Manager, UCC Library. UCC selected DSpace in 2008 Software selection group Staff from Library IT, Computer Centre, Special Collections,
GNU EPrints 2 Overview Christopher Gutteridge 19th October 2002 CERN. Geneva, Switzerland.
Introduction, Features & Technology
VI-SEEM Data Repository
Introduction to DSpace
Open Source software for libraries
Implementing an Institutional Repository: Part II
The Fedora Project April 28-29, 2003 CNI, Washington DC
How to Implement an Institutional Repository: Part II
Presentation transcript:

Open Source Software for Digital Libraries
Jon Dunn, Associate Director for Technology
John A. Walsh, Manager of Electronic Text Technologies
Indiana University Digital Library Program
IU Digital Library Brown Bag Series
Bloomington, IN, 09 April 2004

Outline
- Open Source Introduction
- Categories of Open Source Software for Libraries
- Open Source Digital Library Systems
- Open Source XML Tools and Systems

What is open source software?
- In the phrase "open source", "source" refers to source code: the human-readable computer code that is the origin, or source, of a computer application. "Open" refers to the terms of access to that source code. So open source software is software whose source code is freely available. This, however, is a very general and incomplete definition.
- A detailed definition of open source software is maintained by the Open Source Initiative.

Advantages and Disadvantages
Advantages
- Access to the source code, and the ability and right to modify it
- Right to redistribute modifications to benefit the wider community
- Free
- Excellent support networks
- Large and enthusiastic user base
Disadvantages
- Limited or no accountability
- Informal and unaccountable support channels

Categories of Open Source Software
- Operating systems: Linux
- Programming languages: Perl, PHP, Python
- Applications: Apache, Tomcat, emacs, grep, MySQL, sendmail, ssh

Different Open Source Licenses
- GNU GPL ("General Public License")
- GNU Lesser GPL
- BSD License
- Mozilla Public License
- IU Open Source License
- And more...

Open Source Software in the DLP
- Linux, Apache, Tomcat, PHP, Perl, DLXS, ImageMagick, EPrints, MySQL, Darwin Streaming Server, emacs, CVS, Webalizer, LibXML, LibXSLT, Saxon, and more!

Open Source Resources
- Open Source Initiative
- GNU
- SourceForge

Some categories of open source library software
- Library-oriented search engines: Cheshire, Pears
- Z39.50 toolkits: ZetaPerl (Perl), JAFER (Java), YAZ (C/C++)
- MARC parsers: MARC.pm (Perl), MARC4J (Java)
- Image processing: ImageMagick, tiffinfo/tiffdump
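MARC.pm and MARC4J both read records in the MARC 21 exchange format (ISO 2709 framing): a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the variable fields. The Python sketch below walks that structure by hand to show what such a parser does. It is not the API of either library; the helper names `build_marc` and `parse_marc` are hypothetical, and it assumes a single well-formed UTF-8 record with no error handling.

```python
# Minimal MARC 21 (ISO 2709) writer/reader sketch; assumes one well-formed UTF-8 record.
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"


def build_marc(fields: list[tuple[str, str]]) -> bytes:
    """Hypothetical helper: assemble a record from (tag, field content) pairs."""
    directory = b""
    data = b""
    for tag, content in fields:
        encoded = content.encode("utf-8") + FIELD_TERMINATOR
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += tag.encode("ascii") + (b"%04d%05d" % (len(encoded), len(data)))
        data += encoded
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)            # field data starts after leader + directory
    record_length = base_address + len(data) + 1  # +1 for the record terminator
    leader = b"%05dnam a22%05d   4500" % (record_length, base_address)
    return leader + directory + data + RECORD_TERMINATOR


def parse_marc(record: bytes) -> dict[str, list]:
    """Hypothetical helper: map each tag to the list of its field values."""
    leader = record[:24]
    base_address = int(leader[12:17].decode("ascii"))
    directory = record[24:record.index(FIELD_TERMINATOR)]
    fields: dict[str, list] = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7].decode("ascii"))
        start = base_address + int(entry[7:12].decode("ascii"))
        raw = record[start:start + length].rstrip(FIELD_TERMINATOR)
        if tag < "010":
            # Control fields (001-009) have no indicators or subfields.
            value = raw.decode("utf-8")
        else:
            indicators = raw[:2].decode("utf-8")
            subfields = [
                (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
                for chunk in raw[2:].split(SUBFIELD_DELIMITER)
                if chunk  # skip the empty piece before the first delimiter
            ]
            value = (indicators, subfields)
        fields.setdefault(tag, []).append(value)
    return fields


if __name__ == "__main__":
    record = build_marc([
        ("001", "12345"),
        ("245", "10\x1faOpen source software /\x1fcJon Dunn."),
    ])
    print(parse_marc(record))
```

A production reader would also honor the leader's character-coding flag (position 09) and handle MARC-8 records; the sketch leaves those details out.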

Some categories of open source library software
- Portals
  - MyLibrary
- OAI service providers and data providers
  - PHP OAI Data Provider
  - Lots! See
- METS tools
  - Page turners, toolkits, more: see
- Digital object repositories
  - Fedora
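An OAI-PMH harvester (service provider) talks to a data provider over plain HTTP: it sends `verb=ListRecords` with `metadataPrefix=oai_dc`, reads the Dublin Core records out of the XML response, and, if the response carries a non-empty `resumptionToken`, sends that token back to fetch the next batch. The Python sketch below shows only the request-building and parsing steps with the standard library. The base URL and the sample response are made up, and network access, error handling, and deleted records are deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespaces used in OAI-PMH 2.0 responses that carry unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken is sent without metadataPrefix."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return (records, next resumption token or None) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [el.text or "" for el in record.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty (or absent) resumptionToken means the list is complete.
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample item</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvest loop would call `list_records_url` again with each returned token until `parse_list_records` reports `None`.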

A Good Starting Point
- oss4lib: Open Source Systems for Libraries

Complete DL Systems
- DSpace
- EPrints
- Greenstone

DSpace  “DSpace is a groundbreaking digital institutional repository that captures, stores, indexes, preserves, and redistributes the intellectual output of a university’s research faculty in digital formats.”  Developed jointly by MIT Libraries and Hewlett- Packard  Licensed under BSD distribution license 

DSpace  Supports submission of, management of, and access to digital content Formats: text, images, audio, video Formats: text, images, audio, video  Organized based on organizational needs of a large university Communities and collections Communities and collections

DSpace Features
- Digital preservation
  - Persistent IDs, support levels for different file formats
- Access control
- Versioning
- Search and retrieval
  - Based on qualified Dublin Core metadata
- OAI-PMH data provider
  - To support metadata harvesters
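Qualified Dublin Core names each field as an element plus an optional qualifier, conventionally written with a schema prefix, for example `dc.contributor.author` or `dc.date.issued`. The sketch below (hypothetical helpers `split_field` and `search_element`; this is not DSpace's search code) shows why that structure helps retrieval: a search on the element `contributor` can match authors and advisors alike, while the qualifier still records the role. Items are modeled as a plain dict from a made-up handle to metadata.

```python
from typing import Optional


def split_field(name: str) -> tuple[str, str, Optional[str]]:
    """Split 'dc.contributor.author' into ('dc', 'contributor', 'author')."""
    parts = name.split(".")
    if len(parts) == 2:
        return parts[0], parts[1], None  # unqualified, e.g. 'dc.title'
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"not a qualified Dublin Core field name: {name!r}")


def search_element(items: dict[str, dict[str, list[str]]], element: str, term: str) -> list[str]:
    """Return handles of items with `term` in any field of `element`, whatever its qualifier."""
    term = term.lower()
    hits = []
    for handle, metadata in items.items():
        for field_name, values in metadata.items():
            _, field_element, _ = split_field(field_name)
            if field_element == element and any(term in value.lower() for value in values):
                hits.append(handle)
                break  # one matching field is enough for this item
    return hits


if __name__ == "__main__":
    items = {
        "123456789/1": {"dc.title": ["Open Source Software"], "dc.contributor.author": ["Doe, Jane"]},
        "123456789/2": {"dc.title": ["A sample thesis"], "dc.contributor.advisor": ["Roe, Richard"]},
    }
    print(search_element(items, "contributor", "roe"))  # ['123456789/2']
```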

DSpace Technology
- OS: Unix or Linux
- Written in Java
- PostgreSQL relational database
- Provides a complete Web user interface; Java APIs are also available

DSpace Data Model

DSpace Architecture

DSpace Demonstration
- MIT DSpace: dspace.mit.edu

EPrints  “free software which creates online archives”  Developed by University of Southampton, UK  Supports self-archiving of e-prints  Can be configured as institutional repository or otherwise, e.g. repository focused on particular research area or discipline  Licensed under GNU General Public License  software.eprints.org software.eprints.org

EPrints  Supports submission, management of, and access to digital content  Can support multiple archives on one server  Moderated or unmoderated archives  Search and retrieval Based on metadata Based on metadata Metadata can be customized for different archives and document types Metadata can be customized for different archives and document types  No access control  OAI-PMH data provider

EPrints Technology
- OS: Unix or Linux
- Written in Perl
- Requirements:
  - Apache web server
  - MySQL relational database

EPrints Demonstration
- Digital Library of the Commons: dlc.dlib.indiana.edu

Greenstone  “Suite of software for building and distributing digital library collections”  Developed by University of Waikato, New Zealand Developed in cooperation with UNESCO and the Human Info NGO Developed in cooperation with UNESCO and the Human Info NGO  Licensed under GNU General Public License 

Greenstone Features  Supports creation and management of collections by administrator(s)  Web interface for search and retrieval Customizable metadata Customizable metadata Supports full text search of content Supports full text search of content  Extensive document filters Word, Excel, PowerPoint, PDF,... Word, Excel, PowerPoint, PDF,... Can extract metadata from documents Can extract metadata from documents  Many ways to build a collection, including: Local files Local files Retrieve web sites Retrieve web sites Retrieve objects via OAI-PMH Retrieve objects via OAI-PMH

Greenstone Features  Focus on: Ease of installation Ease of installation Ease of use Ease of use Internationalization Internationalization Full support for English, French, Spanish, Russian, and KazakhFull support for English, French, Spanish, Russian, and Kazakh Support for many other languagesSupport for many other languages Low barriers to use Low barriers to use Minimal system requirementsMinimal system requirements Creation of CD-ROMsCreation of CD-ROMs

Greenstone Technology
- Runs on Windows (back to Windows 3.1), Linux, Mac OS X, Unix
- Written in C++, Perl, and Java
- Uses the MG/MG++ search engine
- Several different Web and Java/Swing user interfaces for various functions
- Web interface for user access
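Full-text search in Greenstone is handled by the MG (Managing Gigabytes) family of engines, which are built around inverted indexes: for each term, the set of documents that contain it. The sketch below shows only that core idea in Python, with an uncompressed in-memory index and Boolean AND queries; MG's compressed index and ranked queries are not modeled, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; no stemming or stop-word removal."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: map each term to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], query: str) -> list[str]:
    """Boolean AND query: sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # intersect posting sets term by term
    return sorted(result)


if __name__ == "__main__":
    docs = {
        "d1": "Greenstone digital library software",
        "d2": "DSpace institutional repository software",
        "d3": "Digital repository",
    }
    index = build_index(docs)
    print(search_all(index, "software"))            # ['d1', 'd2']
    print(search_all(index, "digital repository"))  # ['d3']
```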

Greenstone Demonstration
- Examples at

Open Source XML Tools and Systems
- Utilities: Xalan, Xerces, libxml, libxslt, Saxon
- Editors: emacs / nxml-mode
- Database / search engines: Apache Xindice, Berkeley DB XML, eXist
- Publishing / Web application frameworks: AxKit, Cocoon

XML Databases & Search Engines
- Apache Xindice
- Berkeley DB XML
- eXist

Apache Xindice
- Technology: Java
- Optimized for large numbers of small XML documents; does not work well with large documents

Berkeley DB XML
- Technology: C
- C++ and Java APIs

eXist
- Technology: Java

XML Publishing / Web Application Frameworks
- XML publishing (or Web application) frameworks provide systems for publishing XML data in a variety of formats, such as HTML, WAP/WML, and PDF. Both AxKit and Cocoon use a "pipeline" paradigm to route incoming requests through different processing routines.
- Apache AxKit
- Apache Cocoon

Apache AxKit
- Technology: Perl
- AxKit is an XML application server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP, or text, using either W3C standard techniques or flexible custom code. Because AxKit runs inside Apache's embedded Perl interpreter (mod_perl), XML transformations can also be written directly in Perl.

Apache Cocoon
- Technology: Java
- "Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development."

Cocoon: Key Concepts
- Publishing framework
- XML and XSLT
- "Pipelined SAX processing"
- Separation of:
  - content
  - logic
  - style
- Centralized configuration
- Sophisticated caching

Cocoon: Problems to Be Solved
- Separation of content, style, logic, and management functions in an XML-content-based web site:

Cocoon: Problems to Be Solved (cont.)
- Data mapping:

Cocoon: Basic mechanisms for processing XML documents
- Dispatching based on Matchers
- Generation of XML documents (from content, logic, relational databases, objects, or any combination) through Generators
- Transformation of XML documents (to other XML, objects, or any combination) through Transformers
- Aggregation of XML documents through Aggregators
- Rendering XML through Serializers
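Cocoon's pipeline passes SAX events from a generator through transformers to a serializer, so no stage has to build a full document tree. Cocoon itself is Java; the Python sketch below only mirrors the idea with the standard `xml.sax` package, where the parser plays the generator, an `XMLFilterBase` subclass plays a transformer, and `XMLGenerator` plays the serializer. The class name `RenameTransformer` and the element mapping are hypothetical, and real Cocoon transformers (such as its XSLT transformer) are far more capable.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameTransformer(XMLFilterBase):
    """Transformer stage: renames elements while SAX events stream through."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self.mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self.mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.mapping.get(name, name))


def run_pipeline(source_xml: str, mapping: dict[str, str]) -> str:
    out = io.StringIO()
    generator = xml.sax.make_parser()                    # generator: XML text -> SAX events
    transformer = RenameTransformer(generator, mapping)  # transformer: events -> events
    transformer.setContentHandler(XMLGenerator(out, encoding="utf-8"))  # serializer: events -> text
    transformer.parse(io.BytesIO(source_xml.encode("utf-8")))
    return out.getvalue()


if __name__ == "__main__":
    print(run_pipeline("<doc><title>Hi</title></doc>", {"doc": "html", "title": "h1"}))
```

Adding another stage means wrapping the transformer in a further `XMLFilterBase` subclass before attaching the serializer, which is the same chaining the sitemap expresses declaratively.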

Cocoon: Basic mechanisms for processing XML documents

Cocoon: The Pipeline
- Sequence of interactions:

Cocoon: The Pipeline

Generators, Transformers, & Serializers
- Generators
- Transformers
- Serializers

Cocoon: Configuration: The Sitemap
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>...</map:components>
  <map:views>...</map:views>
  <map:pipelines>
    <map:pipeline>
      <map:match>...</map:match>
      ...
    </map:pipeline>
    ...
  </map:pipelines>
  ...
</map:sitemap>

Cocoon: Configuration: A Pipeline
<map:pipelines>
  <map:pipeline>
    ...
      <map:serialize/>
    </map:match>
    ...
      <map:serialize/>
    </map:match>
    ...
      <map:read mime-type="text/css" src="technochat/resources/styles/{1}.css"/>
    </map:match>
    ...
  </map:pipeline>
</map:pipelines>
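The `{1}` in the `map:read` element refers to the part of the request URI captured by the enclosing matcher's wildcard. The Python sketch below imitates a wildcard matcher under the assumption, which we believe reflects Cocoon's wildcard URI matcher, that `*` matches any run of characters except `/` while `**` may also cross `/`. The matcher pattern `styles/*.css` is hypothetical, since the original pattern is not shown, and the function names are our own.

```python
import re
from typing import Optional


def compile_wildcard(pattern: str):
    """Translate a wildcard pattern to a regex: '**' may span '/', '*' may not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append("(.*)")
            i += 2
        elif pattern[i] == "*":
            parts.append("([^/]*)")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def match_and_expand(pattern: str, uri: str, template: str) -> Optional[str]:
    """If `uri` matches `pattern`, replace {1}, {2}, ... in `template` with the captured parts."""
    match = compile_wildcard(pattern).fullmatch(uri)
    if match is None:
        return None
    return re.sub(r"\{(\d+)\}", lambda ref: match.group(int(ref.group(1))), template)


if __name__ == "__main__":
    print(match_and_expand("styles/*.css", "styles/main.css",
                           "technochat/resources/styles/{1}.css"))
    # technochat/resources/styles/main.css
```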