Dr. M.G. Sreekumar UNESCO Coordinator, Greenstone Support, South Asia Librarian & Head, CDDL, IIM Kozhikode OPEN SOURCE TECHNOLOGIES FOR LIBRARIES.

Similar presentations
Creating Institutional Repositories Stephen Pinfield.

Theo Andrew, Edinburgh University Library Choosing Suitable Open-Source Repository Software Choosing Suitable Open Source Repository Software Theo Andrew.
Daedalus Service Development Stephen Gallacher Lesley Drysdale.
October 28, 2003. Copyright MIT, 2003. METS repositories: DSpace. MacKenzie Smith, Associate Director for Technology, MIT Libraries.
DSpace: the MIT Libraries Institutional Repository MacKenzie Smith, MIT EDUCAUSE 2003, November 5 th Copyright MacKenzie Smith, This work is the.
1. The Digital Library Challenge The Hybrid Library Today’s information resources collections are “hybrid” Combinations of - paper and digital format.
The Library behind the scene How does it work ? The Library behind the scenes 1 JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
Digital Libraries: Study into the features of the DSpace Suite Devika P. Madalli Documentation Research and Training Centre Indian Statistical Institute.
DSpace Devika P. Madalli DRTC, ISI Bangalore.
M.G. Sreekumar Center for Development of Digital Libraries (CDDL) Indian Institute of Management Kozhikode (IIMK) Information Management.
MIT’s DSpace A good fit for ETDs Margret Branschofsky Keith Glavash MIT LIBRARIES.
The KnowledgeBank: Powered by DSpace Laura Tull Systems Librarian Ohio State University Libraries WiLSWorld July 27, 2004.
Dspace – Digital Repository Dawn Petherick, University Web Services Team Manager Information Services, University of Birmingham MIDESS Dissemination.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
Greenstone Digital Library Usage and Implementation By: Paul Raymond A. Afroilan Network Applications Team Preginet, ASTI-DOST.
AgriDrupal - a “suite of solutions” for agricultural information management and dissemination, built on the Drupal CMS; - the community of practice around.
ROLE OF LIBRARY & INFORMATION RESOURCE CENTER
Open Source Software for Digital Libraries Jon Dunn Associate Director for Technology Associate Director for Technology John A. Walsh Manager of Electronic.
Overview of Search Engines
Platforms, installation, configuration; accessing example collections Course material prepared by Greenstone Digital Library Project University of Waikato,
Maintain and Modify By: Sahar Aftab (1253 ) and Mehboob Nazim (1085) Central Library.
Geneve, February 12, 2004 CERN OAI 3 Workshop - Tutorial 2 F. Lützenkirchen Implementing institutional Content Repositories with MyCoRe and MILESS 3rd.
Web Programming Language Dr. Ken Cosh Week 1 (Introduction)
NAL-Institutional Repository: A Case Study CSIR Metadata Harvester I.R.N. Goudar Head, ICAST, NAL National Symposium on Open Access and.
Building Library Web Site Using Drupal
ROLE OF LIBRARY & INFORMATION RESOURCE CENTER
ETD Repositories Using DSpace Software Andrew Penman The Robert Gordon University 27 th September 2004.
Digital Library Architecture and Technology
Chapter 4 Computer Software.
Introduction to digital libraries How to Build a Digital Library Ian H. Witten and David Bainbridge.
Dr. Kurt Fendt, Comparative Media Studies, MIT MetaMedia An Open Platform for Media Annotation and Sharing Workshop "Online Archives:
Dspace 1 Introduction to DSpace Mukesh Pund Scientist NISCAIR, New Delhi.
Serenate1 Non-standard users: The Library Raf Dekeyser K.U.Leuven.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
1. 2 introductions Nicholas Fischio Development Manager Kelvin Smith Library of Case Western Reserve University Benjamin Bykowski Tech Lead and Senior.
From Creation to Dissemination A Case Study in the Library of Congress’s use Open Source Software DLF Spring Forum Corey Keith
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
IUScholarWorks is a set of services to make the work of IU scholars freely available. Allows IU departments, institutes, centers and research units to.
Dr. M.G. Sreekumar Centre for Development of Digital Libraries (CDDL) Indian Institute of Management Kozhikode (IIMK) IIMK ’ s Experience with Greenstone.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
ReSearcher Software Update Kevin Stranack Consortial Support Librarian SFU Library ACCOLEDS/DLI Training Session - November 28 th, 2005.
ReSearcher Software Update Kevin Stranack SFU Library
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
Digital Commons & Open Access Repositories Johanna Bristow, Strategic Marketing Manager APBSLG Libraries: September 2006.
1 By: Suman Negi, Technical Officer ‘B’ DESIDOC, DRDO, Delhi Presentation at NACLIN 14 (During 9-11 December 2014, Pondicherry) Design and Development.
IUScholarWorks Technical Overview Randall Floyd Digital Library Program Programmer/Database Administrator.
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
ScholarSpace & Open UH Mānoa March 2013 Beth Tillinghast Web Support Librarian ScholarSpace & eVols Project Manager UHM Library.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
DSpace - Digital Library Software
DAEDALUS: ePrints Overview Web Meeting, 4th December 2004 William J Nixon Project Manager (DAEDALUS)
DSpace System Architecture 11 July 2002 DSpace System Architecture.
Serenate1 The librarian’s view Raf Dekeyser K.U.Leuven.
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
A Project of the University Libraries Ball State University Libraries A destination for research, learning, and friends.
Research, IT & SFU Library Lynn Copeland IT & Advanced Networks Symposium May 8–9, 2006.
CERN Document Server, Jean-Yves Le Meur, 19th January 2006.
William J Nixon Setting up a Repository. Introduction Key Features to consider (and review) Wide Range of Technology Available –Best fit for purpose –Clear.
Content Management for Digital Library, Y. D. Wu & M. Liu, May 29, 2001.
CONTENTdm A proven solution September A complete digital collection management software solution Stores, manages and provides access for all digital.
Breeda Herlihy, IR Manager, UCC Library. UCC selected DSpace in 2008 Software selection group Staff from Library IT, Computer Centre, Special Collections,
GNU EPrints 2 Overview Christopher Gutteridge 19 th October 2002 CERN. Geneva, Switzerland.
EnhanceEdu IIIT-Hyderabad. Agenda: What's a wiki? Comparison with a website, Wiki Formatting, 'My' Page, Fun with wiki.
VI-SEEM Data Repository
Introduction to DSpace
Implementing an Institutional Repository: Part II
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:


Agenda
- The Current Information Landscape
- Open Source Overview
- The OS Treasure Trove
- Categories of Open Source Software for Libraries
- Open Source Digital Library Systems
- Greenstone
- DSpace
- Open Source Suite from PKP, SFU
- Open Source XML Tools and Systems

Foreword
- Universities, enterprises and institutions demand improved information and knowledge management solutions
- Digital libraries are gaining increasing social attention and academic and research interest
- There is a need for integrated access to disparate information resources
- Key challenge: how to create online information environments that facilitate internal content publishing and single-point access to internal and external information sources
- Latest DL technologies vs. traditional libraries and knowledge management
- Options before us: proprietary vs. open standards / open source software
- Fortunately, a plethora of open source solutions is available for library applications

The Current Environment
- Fascinating times in the history of libraries, information systems and electronic publishing
- Possibilities of building large-scale services:
  - Collections are in digital formats and are retrieved over networks
  - Materials are stored on computers
  - A network connects those computers to the personal computers on users' desks
- In a complete digital library, nothing need ever reach paper

Feel of the Hour

Need of the Hour

Future Libraries?
What is a library, and what should it be in 2012, 2020 and beyond?
- What does the academic library of the future look like?
- Where do its walls begin and end? On campus? On our desktop? At home?
- Does it still have a function as a separate and distinct space?
- Or has it become the first step to an all-virtual future?
- Libraries have never been more interesting, difficult and challenging…

Challenges of the Day
- Relevance of libraries in the Google era
- Retention of users, especially the new generation
- Proliferation of content
- Diverse data streams: content categories, publication types
- Multimedia, polymedia, multiple formats
- Collection building: acquisition, subscriptions, licensing…
- Copyright, intellectual property, fair use…
- Technology complexities, infrastructure issues
- Publishers' stringent policies / monopolies
- Integration of legacy systems with the new genre

Information Strategy Tips
- Context = Scenarios, Paradigms
- Constant = Change
- Technology = Facilitate, not intimidate
- Information = The Big Picture (Landscape)
- Content = Aggregate, Integrate
- Service = Markup, Market
- Capital = Human, Tacit, Values, and Users

Factors of Change
- Enterprise IKM
- Electronic Publishing
- Internet & Web
- Intranets
- Digital Libraries
- Knowledge Management
- E-Information: Usage and User Behavior
Experiences? Lessons? Impact? Implications?

IM: Key Goals
- Develop and manage a dynamic, unified information resource base (content repository) that gathers and organizes relevant internal and global information resources, based on a taxonomy of the enterprise's information needs, and makes these available for learning and informed decision making.

IM: Key Goals…
- Support different manifestations of information sources: implicit/explicit, print/digital, local/remote, free/commercial, etc.
- Support the delivery of personalized information services to staff, both on demand and in anticipation.

User-Generated Internet Content
- Blogs
- YouTube
- MySpace
And the same is true of scholarly communication too!

Top Tech Trends in IT / LIS
- Web 2.0 / Library 2.0
- Blogs / RSS Feeds / Wikis / Podcasts / Webcasts
- Open Source Software, Open Standards, OpenURL
- User Tagging, Automated Tagging
- Web OPACs, and Interface Design
- Seamless Integration / Aggregation
- OA -> OAP + OAA
- Open Resource Discovery Tools: Google Scholar
- E-Books, E-Journals, E-Resources
- Harvesting, Federation, Metasearching
- Digital Rights Management

Multimedia Library Info System
- Internet / Intranet gateway-out
- Data capture anywhere (access to information from anywhere)

Penetration of E-Content in Libraries
Publication types:
- E-Books, E-Journals…
- Aggregated scholarly e-journal databases
- Databases, CBT/WBT
- Portals, Vortals…
- Value-added services
- Preprints, Eprints, E-Documents…
Document formats:
- ASCII, RTF, HTML, SGML, PostScript, PDF, proprietary and native application formats
- Images, graphics
- Audio
- Video
- XHTML, ASP, PHP, XML…

Information Landscape [print/digital]
- Internally generated: processes, procedures, data/info., manuals, reports…
- Externally sourced: books, eBooks, journals, eJournals, databases, patents, reports, online resources…
- Lib 2.0 & Patron 2.0: social computing & social software
- Open access: OA journals, OA archives, scholarly articles, ePrint archives, ETDs, eCoursewares

Shift in Approaches: Traditional, Automated, Digital Library
AACR2, ISO 2709, CCF, MARC, Thesauri, AACR2, CCC, CC / LCCS, DDC / UDC, Thesauri/LCSH, Metadata, DCMI, W3C, EAD, TEI, DTD, METS, MODS, Z39.50, MARC21, OAI-PMH
Limited / Rigid; Efficient / Flexible; Improved

What Distinguishes a DL?
- Site neutrality (3-in-1 access: anytime, anywhere, by anyone)
- Open access
- Greater variety and granularity of information
- Sharing of information ('Sharium')
- Up-to-dateness
- Always available (24x7, 365 days a year)
- New forms of rendering (new genre)

Digital Libraries: An Overview
Digital Libraries: Computing, Networking, Content, Collections, Services, Community

What is open source software?
In the phrase open source, source refers to source code: the human-readable computer code that is the origin, or source, of the computer application. Open refers to the terms of access to that source code. So open source software is software for which the source code is freely available.

Advantages and Disadvantages
Advantages:
- Mostly issued under an internationally accepted license
- Access to the source code, and the ability and right to modify it
- Right to redistribute modifications to benefit the wider community
- Free
- Excellent support networks
- Large and enthusiastic user base
Disadvantages:
- Limited or no accountability
- Informal and unaccountable support channels

Different Open Source Licenses
- GNU GPL ("General Public License")
- GNU Lesser GPL
- BSD License
- Mozilla Public License
- IU Open Source License
- And more...

A Good Starting Point
- oss4lib: Open Source Systems for Libraries

Open Source Resources
- Open Source Initiative
- GNU
- SourceForge

Categories of Open Source Software
- Operating systems: Linux, FreeBSD / OpenBSD, OpenSolaris…
- Programming languages: Perl, PHP, Python
- Applications: Apache, Tomcat, emacs, grep, MySQL, sendmail, ssh

Open Source Software for DLs
Linux, Apache, Tomcat, PHP, Perl, DLXS, ImageMagick, Unreal Media Server, Greenstone, DSpace, ePrints, FEDORA, CDSWare, MySQL, Darwin Streaming Server, emacs, CVS, Webalizer, LibXML, LibXSLT, Saxon, and more!

Some categories of open source library software
- Library-oriented search engines: Cheshire, Pears, dbWiz…
- Z39.50 toolkits: ZetaPerl (Perl), JAFER (Java), YAZ (C/C++), Mercury Z39.50 Client…
- MARC parsers: MARC.pm (Perl), MARC4J (Java), MarcEdit
- Image processing: ImageMagick, tiffinfo/tiffdump
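MARC parsers such as MARC.pm and MARC4J read records in the ISO 2709 exchange structure: a 24-character leader, a directory of 12-character entries (3-character tag, 4-digit field length, 5-digit starting position), then the field data, with byte 0x1E ending each field, 0x1F introducing each subfield and 0x1D ending the record. The Python sketch below illustrates that layout only. The `build_record` and `parse_record` helpers are hypothetical, assume UTF-8 data, and ignore MARC-8 character sets and the many edge cases the real libraries handle.

```python
# ISO 2709 structural characters used by MARC records.
FT, SF, RT = b"\x1e", b"\x1f", b"\x1d"  # field terminator, subfield delimiter, record terminator


def build_record(fields):
    """Build a minimal ISO 2709 record from (tag, data) pairs.

    For tags 010 and above, `data` must already hold the two indicators
    followed by subfields, each introduced by "\\x1f" and a one-character code.
    """
    directory, body = b"", b""
    for tag, data in fields:
        chunk = data.encode("utf-8") + FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(body) + 1    # ... + field data + record terminator
    # Leader: record length, "nam a22", base address of data, " a 4500".
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FT + body + RT


def parse_record(raw):
    """Return (leader, fields) from the bytes of one ISO 2709 record."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])      # base address of data
    directory = raw[24:base - 1]   # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop field terminator
        if tag < "010":
            # Control fields (001-009) have no indicators or subfields.
            fields.append((tag, data.decode("utf-8")))
        else:
            indicators, *subfields = data.split(SF)
            pairs = [(s[:1].decode("utf-8"), s[1:].decode("utf-8")) for s in subfields]
            fields.append((tag, indicators.decode("utf-8"), pairs))
    return leader, fields
```

Round-tripping a record through both helpers is a quick way to check that directory offsets and the base address agree.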

Some categories of Open Source Library software / Open Standards
- Portals: MyLibrary
- OAI service providers and data providers: PHP OAI Data Provider. Lots! See
- METS tools: page turners, toolkits, more: see
- Dublin Core

Web Server
- Apache
- Lots in Java! see at…

Database Management Systems (DBMS)
- MySQL
- PostgreSQL
- mSQL
- CDS/ISIS, Win/ISIS, GenISIS etc.

Web Server-Side Scripting
- PHP
Architecture
- Linux, Apache, MySQL, PHP (LAMP)
- Windows, Apache, MySQL, PHP (WAMP)

Web Services
- Apache Tomcat: web container/service
- Apache Cocoon: content framework/service
- Apache Ant: build tool

Integrated Library Management System (ILMS)
- Managing legacy systems
- Koha
- Evergreen
- Emilda
- OpenBiblio
- phpMyLibrary
- NewGenLib

Server Log Analysis
- Webalizer
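Webalizer builds its reports from web server access logs. The kind of tallying involved can be sketched in Python for Apache's "combined" log format. The regular expression and the `summarize` function are illustrative assumptions, not Webalizer's own code, and any line that does not match the pattern is simply skipped.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count hits per path and per status code, and total bytes sent."""
    hits, statuses, bytes_sent = Counter(), Counter(), 0
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed or unsupported lines
        hits[match.group("path")] += 1
        statuses[match.group("status")] += 1
        if match.group("size") != "-":  # "-" means no response body was sent
            bytes_sent += int(match.group("size"))
    return hits, statuses, bytes_sent
```

Reading a log is then `summarize(open("access.log"))`, where the file name is a placeholder for your server's log path.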

Z39.50
Protocol for online/remote search & retrieval
- An interoperability standard (ANSI/NISO standard), and software that facilitates cross-database/archive searching
- A client-server protocol for searching and retrieving information from remote computer databases
- YAZ Z39.50 client
- 'Mercury' Z39.50 client
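Z39.50 clients built on YAZ commonly express searches in Prefix Query Format (PQF), where Bib-1 "use" attributes name the access point, for example 1=4 for title and 1=1003 for author. A small Python helper for composing such queries might look like the sketch below; the function names and the selection of attributes are assumptions for illustration, and quoting is deliberately kept simple.

```python
# Bib-1 "use" attribute values (attribute type 1) for a few common access points.
BIB1_USE = {"title": 4, "author": 1003, "isbn": 7, "issn": 8, "subject": 21, "any": 1016}


def pqf_term(field: str, term: str) -> str:
    """Return a single-term PQF query, e.g. @attr 1=4 "digital libraries"."""
    if '"' in term:
        raise ValueError("double quotes inside terms are not handled in this sketch")
    return f'@attr 1={BIB1_USE[field]} "{term}"'


def pqf_and(*clauses: str) -> str:
    """Join clauses with the binary prefix operator @and, nesting when there are more than two."""
    if not clauses:
        raise ValueError("at least one clause is required")
    if len(clauses) == 1:
        return clauses[0]
    return f"@and {clauses[0]} {pqf_and(*clauses[1:])}"
```

For example, `pqf_and(pqf_term("title", "digital libraries"), pqf_term("author", "witten"))` yields `@and @attr 1=4 "digital libraries" @attr 1=1003 "witten"`, the kind of string a client such as yaz-client accepts in its `find` command.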

Serials Manager
- CUFTS

Citation Manager
- Citation Manager (from PKP, Simon Fraser University, Canada)
- Bibliographic Management

Link Resolving
- GODOT: electronic (online) resources management

Open Journal Publishing
- OJS

Open Conference Systems
- OCS: conference workflow automation

Open URL Systems
- OpenURL 1.0
- OpenURL software/openurl/default.htm
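An OpenURL 1.0 (ANSI/NISO Z39.88-2004) link carries a citation to a library's link resolver as key/encoded-value pairs. The Python sketch below assembles such a link for a journal article. The resolver base URL, the function name and the sample citation values are hypothetical; only the `url_ver`, `ctx_ver`, `rft_val_fmt` and `rft.*` key conventions come from the standard.

```python
from urllib.parse import urlencode


def openurl_for_article(resolver_base: str, **citation: str) -> str:
    """Build an OpenURL 1.0 key/encoded-value (KEV) link for a journal article.

    `citation` keys are journal-format field names without the "rft." prefix,
    e.g. atitle, jtitle, issn, volume, spage, date, aulast.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return f"{resolver_base}?{urlencode(params)}"  # values are percent-encoded
```

A call such as `openurl_for_article("https://resolver.example.edu/openurl", atitle="Open source technologies", issn="1234-5678")` produces a link the resolver can turn into full-text, catalogue or document-delivery options.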

Open Digital Libraries
- Greenstone
- DSpace
- Eprints
- FEDORA etc.

Open Access Archives / IRs
- DSpace
- Eprints
- FEDORA
- CDSWare
- Greenstone etc.

Learning Management Systems (LMS)
- E-learning systems
- Moodle
- Manhattan etc.

Content Management Systems (CMS)
- Joomla
- Drupal
- MediaWiki

Open Archives Harvester
- Harvester
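A harvester talks to OAI-PMH data providers such as DSpace, EPrints or Greenstone with plain HTTP requests: a first `ListRecords` request names a metadata format (Dublin Core is `oai_dc`), and if the response ends with a non-empty `resumptionToken`, the next request sends only that token. The Python sketch below builds those request URLs and parses one response with the standard library. The helper names are hypothetical, and a real repository's base URL would be supplied by you.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, token=None):
    """First request names the metadata format; follow-ups send only the resumptionToken."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], next_token_or_None) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append((ident, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)  # an empty token marks the last batch
```

A harvesting loop fetches `list_records_url(base)`, parses it, and keeps requesting `list_records_url(base, token)` until the returned token is `None`.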

Federated Searching
- dbWiz: PKP Project
- Google Custom Search
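Federated search tools send one query to several sources and merge the answers. One common merging approach, round-robin interleaving with de-duplication by normalized title, is sketched in Python below. It is a generic illustration, not a description of how dbWiz or Google Custom Search merge results, and the dictionary-with-`title` result shape is an assumption.

```python
from itertools import zip_longest


def normalize(title):
    """Lower-case, drop punctuation and collapse whitespace so near-identical titles match."""
    kept = "".join(c for c in title.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def merge_results(*result_lists):
    """Interleave ranked lists round-robin, keeping only the first copy of each title."""
    seen, merged = set(), []
    for tier in zip_longest(*result_lists):  # rank 1 from each source, then rank 2, ...
        for item in tier:
            if item is None:  # a shorter list has run out
                continue
            key = normalize(item["title"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Round-robin keeps each source's top hits near the top of the merged list without needing comparable relevance scores across sources.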

Social Computing/Software
- Blogs
- Tags
- Wikis
- RSS
- Feed Aggregation etc.

Open Courses
- Open Courseware

What are digital libraries for?
- Knowledge/content management
  - Manage and access internal information assets
- Scholarly communication, education, research
  - E-journals, e-prints, e-books, data sets, e-learning
- Access to cultural collections
  - Cultural, heritage, historical & special collections, museums, biodiversity
- E-governance
  - Improved access to government policies, plans, procedures, rules and regulations
- Archiving and preservation
- Many more…

DL Software: Alternatives
- What are your expectations?
- Develop a local web-based application?
- Commercial DL solution?
- Adopt open source software?
  - Greenstone
  - Eprints
  - DSpace
  - Fedora…

Digital Library Technologies
- Interoperability
- Unified interface for heterogeneous libraries
- Metadata mapping across different libraries
- OAI-compliant data and service providers
- Multilingual digital libraries
- Scalable digital library architectures
- Publication tools
- Searching tools

DLs: Workflows and Processes
- Content selection
- Content acquisition
- Content publishing
- Metadata preparation
- Content loading
- Content indexing & storage
- Content access & delivery
- Preservation
- Access management
- Usage monitoring and evaluation
- Networking and interoperation
- Maintenance

DL Software: Key requirements
- Document types (book, journal article, lecture…)
- Document formats (text, PDF, Word, PS…)
- Content acquisition (online and offline)
  - Metadata description, content tagging
  - Content uploading
- Indexing and retrieval
  - Structured / full-text indexing
  - Automatic metadata extraction
- Storage
  - Data compression
  - Efficient storage for metadata
  - Efficient location of metadata and documents
- Access and delivery
  - Structured search, browse, hierarchical browsing
  - CD-ROM distribution

DL Software: More requirements
- Scaling up for large collections
- Multilingual support
- Access management and security
- Usage monitoring and reporting
- Standards compliance
  - XML, Dublin Core, Unicode
- Interoperation
  - OAI, Z39.50 compliance, MARC21…
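Interoperation between Dublin Core and MARC 21 usually relies on a crosswalk table. The Python sketch below uses a small subset of mappings modelled on the Library of Congress Dublin Core-to-MARC crosswalk (title to 245 $a, creator to 720 $a, subject to 653 $a, description to 520 $a, language to 546 $a, rights to 540 $a). Treat the table as illustrative and verify it against the official crosswalk; the `crosswalk` function is hypothetical.

```python
# Illustrative subset of a Dublin Core -> MARC 21 crosswalk: element -> (tag, indicators, subfield code).
# Modelled on the Library of Congress DC-to-MARC mapping; verify before production use.
DC_TO_MARC = {
    "title": ("245", "00", "a"),
    "creator": ("720", "  ", "a"),
    "subject": ("653", "  ", "a"),
    "description": ("520", "  ", "a"),
    "language": ("546", "  ", "a"),
    "rights": ("540", "  ", "a"),
}


def crosswalk(dc_record):
    """Map {dc_element: [values]} to a tag-sorted list of (tag, indicators, [(code, value)])."""
    fields, unmapped = [], []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            unmapped.append(element)  # report rather than silently drop
            continue
        tag, indicators, code = DC_TO_MARC[element]
        fields.extend((tag, indicators, [(code, value)]) for value in values)
    fields.sort(key=lambda f: f[0])  # stable sort keeps value order within a tag
    return fields, unmapped
```

Returning the unmapped elements makes gaps in the crosswalk visible instead of losing metadata quietly.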

Complete DL Systems
- Greenstone
- DSpace
- Eprints

Greenstone: Open source Software for Building Digital Library Collections

What is the Greenstone software?
- Software suite for building, maintaining, and distributing digital library collections
- Comprehensive, open-source
- Developed by the New Zealand Digital Library Project at the University of Waikato
- Distribution and promotion partners:
  - UNESCO
  - Human Info NGO, Belgium
  - NCSI, Bangalore; UCT, Cape Town; Dakar, Senegal; Almaty, Kazakhstan; …
  - You!

Greenstone Features
- Supports creation and management of collections by administrator(s)
- Web interface for search and retrieval
- Customizable metadata
- Supports full-text search of content
- Extensive document filters
  - Word, Excel, PowerPoint, PDF, ...
- Can extract metadata from documents
- Many ways to build a collection, including:
  - Local files
  - Retrieving web sites
  - Retrieving objects via OAI-PMH

Greenstone Features…
- Open source philosophy
- Interfacing & content delivery via the Web
- Multiple software platforms
- Multilingual support
- Multiple formats
- Structured metadata in XML using DC
- Metadata extraction
- Searching & browsing
- Plug-ins for documents
- Full-text mirroring
- Text-level penetration
- Data compression
- Password protection
- Administrative functions
- Concurrent & dynamic content development
- Uniform presentation
- Publishing on CD-ROMs
- International presence

Greenstone Features contd...
- Easy installation
- Easy maintenance
- Content development (3 alternative ways); predominantly GLI now, since version 2.41
- Hierarchy structure
- Interface customization
  - Front page design, header for the digital library, collection icon, cover images
- Collection configuration file (collect.cfg)
- Scalability, flexibility
- Interoperability (crosswalk), OAI compliance
- Lifeline: listserv / e-group / archives
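GLI saves a collection's settings in its collect.cfg file, one directive per line. For readers who edit that file directly, the Python sketch below generates a minimal configuration. The directive names (`creator`, `public`, `buildtype`, `indexes`, `defaultindex`, `plugin`, `classify`) and the plugin names in the usage line are assumptions based on Greenstone 2.8x conventions; compare them with a collect.cfg produced by your own installation before relying on them.

```python
def make_collect_cfg(creator, indexes, plugins, classifiers, buildtype="mg", public=True):
    """Return the text of a minimal Greenstone 2 style collect.cfg (directive<TAB>value lines)."""
    if not indexes:
        raise ValueError("at least one index is required")
    lines = [
        f"creator\t{creator}",
        f"public\t{'true' if public else 'false'}",
        f"buildtype\t{buildtype}",
        "indexes\t" + " ".join(indexes),
        f"defaultindex\t{indexes[0]}",
    ]
    lines += [f"plugin\t{name}" for name in plugins]        # tried in this order at import time
    lines += [f"classify\t{spec}" for spec in classifiers]  # browsing structures
    return "\n".join(lines) + "\n"


cfg = make_collect_cfg(
    "librarian@example.org",
    ["section:text", "document:dc.Title"],
    ["GreenstoneXMLPlugin", "HTMLPlugin", "PDFPlugin"],
    ["List -metadata dc.Title"],
)
```

Generating the file from a script is mainly useful when many similar collections must be configured consistently; for a single collection, GLI's Design panel does the same job.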

The power of open source: Greenstone uses…
- Ghostscript: interpreter for Adobe PostScript documents (PostScript plugin)
- Kea: keyphrase extraction program (to generate metadata)
- pdftohtml: converter for PDF documents (PDF plugin)
- rtftohtml: converter for RTF documents (RTF plugin)
- TextCat: detects languages and document encodings
- wvWare: converter for Word documents (Word plugin)
- Xlhtml: converter for Excel/PowerPoint documents (plugins)
- XML::Parser: parses XML documents; used to read and write Greenstone's internal XML document format
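TextCat identifies languages with the character n-gram method of Cavnar and Trenkle: build a ranked profile of the most frequent n-grams for each language, then choose the language whose profile is closest to the document's under the "out-of-place" rank distance. A simplified Python sketch follows. It uses n-grams of length 1 to 3 and a 300-entry profile, which are illustrative choices, and the function names are hypothetical.

```python
from collections import Counter


def profile(text, n_max=3, top=300):
    """Map each of the `top` most frequent character n-grams (1..n_max) to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_prof, lang_prof):
    """Sum of rank differences; n-grams missing from the language profile get the maximum penalty."""
    penalty = len(lang_prof)
    return sum(abs(rank - lang_prof[g]) if g in lang_prof else penalty
               for g, rank in doc_prof.items())


def guess_language(text, lang_profiles):
    """Return the key of the closest language profile."""
    doc = profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))
```

In practice each language profile is trained on a substantial sample of text; tiny training samples give unreliable guesses.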

Performance and…
- MG: creates compressed full-text indexes and performs searches
- GDBM: database used for metadata etc.
- wget: downloads pages from the Web when creating collections
- YAZ: client and server implementation of Z39.50
- Stemmer: English-language stemmer
- GCC: C/C++ compiler
- CVS: version control system
- Perl: used for plugins etc.
- Apache: web server used by many Greenstone installations
- OAI-PMH: OAI
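MG builds compressed inverted indexes: for each term it stores the documents containing it, recorded as gaps between successive document numbers so that the stored numbers stay small and compress well. The Python sketch below illustrates the idea with variable-byte coding of those gaps and a simple Boolean AND search. MG itself uses different, bit-level codes and much more machinery, and all names here are hypothetical.

```python
import re
from collections import defaultdict


def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit set on each number's last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80                # flag the least significant byte ...
        out.extend(reversed(chunk))     # ... which is written last
    return bytes(out)


def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:                    # end of this number
            numbers.append(n)
            n = 0
    return numbers


def build_index(docs):
    """Map each term to its compressed list of document-number gaps."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(re.findall(r"\w+", text.lower())):
            postings[term].append(doc_id)  # ids arrive in increasing order
    return {term: vbyte_encode([ids[0]] + [b - a for a, b in zip(ids, ids[1:])])
            for term, ids in postings.items()}


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    result = None
    for term in re.findall(r"\w+", query.lower()):
        ids, total = set(), 0
        for gap in vbyte_decode(index.get(term, b"")):
            total += gap                # running sum turns gaps back into ids
            ids.add(total)
        result = ids if result is None else result & ids
    return sorted(result or [])
```

Gaps pay off because frequent terms have small gaps, which fit in a single byte, while rare terms contribute few entries anyway.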

Example Greenstone collections
- Rapid growth in use
- International: many countries (China, Germany, India, UK, USA, Russia, Malaysia, Singapore...), almost all countries/continents
- Increasing activity on the Greenstone mailing list
- Promotion by UNESCO: "deployment of DLs for sharing public domain information"
- A wide variety of DL collections has been developed in several languages: historical, educational, cultural, and research

Greenstone Technology
- Runs on Windows (back to 3.1), Linux, Mac OS X, Unix
- Written in C++, Perl, and Java
- Uses MG/MG++ search engine
- Several different Web and Java/Swing user interfaces for various functions
- Web interface for user access

Greenstone Demonstration
Examples at

DSpace

DSpace
- "DSpace is a groundbreaking digital institutional repository that captures, stores, indexes, preserves, and redistributes the intellectual output of a university's research faculty in digital formats."
- Developed jointly by MIT Libraries and Hewlett-Packard
- Licensed under the BSD distribution license

DSpace
- Supports submission of, management of, and access to digital content
- Formats: text, images, audio, video
- Organized based on the organizational needs of a large university
  - Communities and collections

DSpace Data Model
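DSpace organizes content as communities (which may contain sub-communities) holding collections, collections holding items, items grouping their files into bundles (such as ORIGINAL for deposited files and LICENSE for the deposit licence), and bundles containing bitstreams. A Python sketch of that hierarchy follows; the classes are hypothetical stand-ins for illustration, not DSpace's Java API, and "123456789/1" is only a placeholder handle.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str
    mime_type: str
    size_bytes: int


@dataclass
class Bundle:
    name: str  # e.g. "ORIGINAL" (deposited files), "LICENSE", "TEXT", "THUMBNAIL"
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Item:
    handle: str  # persistent identifier, e.g. "123456789/1"
    metadata: Dict[str, List[str]] = field(default_factory=dict)  # qualified DC keys
    bundles: List[Bundle] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: List[Collection] = field(default_factory=list)
    subcommunities: List["Community"] = field(default_factory=list)


def count_bitstreams(community: Community) -> int:
    """Walk community -> collection -> item -> bundle and count files, including sub-communities."""
    own = sum(len(bundle.bitstreams)
              for collection in community.collections
              for item in collection.items
              for bundle in item.bundles)
    return own + sum(count_bitstreams(sub) for sub in community.subcommunities)
```

Keeping files in named bundles lets derived files (extracted text, thumbnails) sit beside the deposited originals without being confused with them.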

DSpace Features
- Digital preservation
  - Persistent IDs, support levels for different file formats
- Access control
- Versioning
- Search and retrieval
  - Based on qualified Dublin Core metadata
- OAI-PMH data provider
  - To support metadata harvesters
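DSpace can also ingest items in bulk from its Simple Archive Format, in which each item directory carries a `dublin_core.xml` file of `dcvalue` elements, plus a `contents` file naming the bitstreams. A minimal Python sketch for writing that metadata file is below. The helper name is hypothetical, and the element layout should be checked against the batch-import documentation for your DSpace version.

```python
import xml.etree.ElementTree as ET


def dublin_core_xml(values):
    """Render (element, qualifier, text) triples as a Simple Archive Format dublin_core.xml.

    A qualifier of None is written as qualifier="none", i.e. an unqualified DC element.
    """
    root = ET.Element("dublin_core")
    for element, qualifier, text in values:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier or "none")
        dcvalue.text = text
    return ET.tostring(root, encoding="unicode")


print(dublin_core_xml([
    ("title", None, "Open Source Technologies for Libraries"),
    ("contributor", "author", "Sreekumar, M.G."),
]))
```

Generating this file from a spreadsheet or legacy catalogue export is a common way to load an existing collection into a new repository.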

DSpace Technology
- OS: Unix or Linux
- Written in Java
- PostgreSQL relational database
- Provides a complete Web user interface; Java APIs are also available

DSpace Architecture

DSpace Software / Utilities
- Java SDK
- Apache Maven
- Tomcat
- Apache Ant
- PostgreSQL
- DSpace 1.5x / 2.x

DSpace Demonstration
- MIT DSpace: dspace.mit.edu

EPrints
- "Open Source software which creates online archives"
- Developed by the University of Southampton, UK
- Supports self-archiving of e-prints
- Can be configured as an institutional repository or otherwise, e.g. a repository focused on a particular research area or discipline
- Licensed under the GNU General Public License
- software.eprints.org

EPrints
- Supports submission, management of, and access to digital content
- Can support multiple archives on one server
- Moderated or unmoderated archives
- Search and retrieval
  - Based on metadata
  - Metadata can be customized for different archives and document types
- No access control
- OAI-PMH data provider

EPrints Technology
- OS: Unix or Linux
- Written in Perl
- Requirements:
  - Apache web server
  - MySQL relational database

EPrints Demonstration
- Digital Library of the Commons: dlc.dlib.indiana.edu

Open Source XML Tools and Systems
- Utilities: Xalan, Xerces, libxml, libxslt, saxon
- Editors: emacs / nxml-mode
- Database / search engines: Apache Xindice, Berkeley DB XML, eXist
- Publishing / web application frameworks: AxKit, Cocoon

XML Databases & Search Engines
- Apache Xindice
- Berkeley DB XML
- eXist

Greenstone Windows Installation Version 2.81rc

Opening Greenstone in a Browser
[Screenshots: Digital Library Server; Greenstone Digital Library]

Opening Greenstone in a Browser
[Screenshot: Greenstone Digital Library Collections]

GLI

GLI Functions
Establish a new collection (or work on an existing one)
Select files to include in the collection (Gather)
Enrich files with metadata (Enrich)
Select plugins, indexes and classifiers (Design)
Build the collection (Create)
Format and control the display (Format)
Customize the appearance
Preview the collection

Collection Building…
Greenstone used to have three modes of collection building: the command line, the web interface and the GLI (Greenstone Librarian Interface)
From version 2.4x onwards, the GLI was strengthened and became the most widely used mode
The web interface mode has been withdrawn temporarily
GLI-based collection building is an easy and simple method
Collection developers can launch the GLI and use its ‘Gather’, ‘Enrich’, ‘Design’, ‘Format’ and ‘Create’ panels to build a collection
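The command-line mode can be sketched in Python as below. This assumes a Greenstone 2 installation whose setup script has been run (so GSDLHOME is set and the Perl scripts are on the PATH), and a collection already created with mkcol.pl (e.g. perl -S mkcol.pl -creator librarian@example.org demo) with its source files placed in collect/demo/import. After import.pl and buildcol.pl, the freshly built building/ folder is renamed to index/, as the Greenstone 2 Developer's Guide describes for manual builds; later releases may differ. The collection name, e-mail address and helper function names are hypothetical.

```python
import os
import shutil
import subprocess

def build_commands(collection):
    """Return the Greenstone 2 command-line steps for (re)building a collection."""
    return [
        # Convert source documents in import/ into Greenstone's archive format,
        # discarding any previously imported material.
        ["perl", "-S", "import.pl", "-removeold", collection],
        # Build indexes and browsing structures into the collection's building/ folder.
        ["perl", "-S", "buildcol.pl", collection],
    ]

def install_index(collect_dir):
    """Replace the live index/ folder with the freshly built building/ folder."""
    building = os.path.join(collect_dir, "building")
    index = os.path.join(collect_dir, "index")
    if os.path.isdir(index):
        shutil.rmtree(index)
    os.rename(building, index)

def run_build(gsdlhome, collection):
    """Run every step, stopping at the first failure (check=True raises)."""
    for cmd in build_commands(collection):
        subprocess.run(cmd, check=True)
    install_index(os.path.join(gsdlhome, "collect", collection))

if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_commands("demo"):
        print(" ".join(cmd))
```

Keeping the command list separate from its execution makes it easy to inspect or log the steps before anything touches the collection folders.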

Customization  Greenstone is specifically designed to be highly extensible and customizable.  New document and metadata formats are accommodated by writing "plugins" (in Perl).  Analogously, new metadata browsing structures can be implemented by writing "classifiers."  The user interface look-and-feel can be altered using "macros" written in a simple macro language.  A Corba protocol allows agents (e.g. in Java) to use all the facilities associated with document collections.  Finally, the source code, in C++ and Perl, is available and accessible for modification

Customizing with macros
Let you customize the presentation
  Present pages in different languages
  Print variables into the page text (e.g. number of search hits)
Macro files
  Stored in the greenstone2/macros folder
  Each file defines one or more “packages” (a “package” is a group of macros)
  Loaded on startup (note the difference between the Local Library and the Web Library)
  Listed in etc/main.cfg
Collection-specific macros
  Stored in greenstone2/collect/mycol/macros/extra.dm
  Or include the argument [c=collectionname] in each macro
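Roughly speaking, a Greenstone macro is defined in a macro file as _name_ {body}, and any page text that mentions _name_ is replaced by that body when the page is generated; arguments in square brackets, such as [c=collectionname] above or a language setting, select among alternative definitions. The simplified Python sketch below imitates that substitution with language variants and per-request variables. It is an illustration under stated assumptions, not Greenstone's macro processor: the macro names, the table layout and the fallback-to-default rule are hypothetical.

```python
import re

# A macro reference looks like _name_ (a simplification of Greenstone's syntax).
MACRO_REF = re.compile(r"_([A-Za-z][A-Za-z0-9]*)_")

# Hypothetical macro table: name -> {language code: body}; "" is the default variant.
MACROS = {
    "pagetitle": {"": "Greenstone Digital Library",
                  "es": "Biblioteca Digital Greenstone"},
    "header": {"": "<h1>_pagetitle_</h1><p>_numhits_ documents matched</p>"},
}

def expand(text, macros, variables=None, lang="", depth=0, max_depth=20):
    """Recursively replace _name_ references with macro bodies or run-time variables."""
    if depth > max_depth:
        raise RecursionError("macro expansion too deep (cyclic definition?)")
    variables = variables or {}

    def substitute(match):
        name = match.group(1)
        if name in variables:  # values computed per request, e.g. number of search hits
            return str(variables[name])
        variants = macros.get(name)
        if variants is None:
            return match.group(0)  # unknown reference: leave it untouched
        body = variants.get(lang, variants.get("", ""))
        return expand(body, macros, variables, lang, depth + 1, max_depth)

    return MACRO_REF.sub(substitute, text)

if __name__ == "__main__":
    print(expand("_header_", MACROS, {"numhits": 12}))
    print(expand("_header_", MACROS, {"numhits": 3}, lang="es"))
```

Falling back to the default variant when a language-specific one is missing mirrors the slide's point that the same page can be presented in different languages without duplicating every macro.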

Hierarchy Structure

Collection configuration
The collection configuration file (collect.cfg) determines content conversion, extraction, and the building of indexes and browsing structures: indexes, classifiers, plugins
The presentation of search/browse results and of the collection interface is determined by “format” strings and “macros”
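The collection configuration file is plain text with one directive per line followed by its arguments, and format statements carry quoted strings. The Python sketch below parses a small, hypothetical collect.cfg fragment into a dictionary so the roles of the plugin, indexes, classify and format lines are easy to see. The directive names follow Greenstone 2 conventions, but the sample values are illustrative only, and this simple parser does not handle features such as format strings that span several lines.

```python
import shlex

# Hypothetical collect.cfg fragment, for illustration only.
SAMPLE_CFG = '''
# Lines starting with '#' are comments.
creator      librarian@example.org
public       true
indexes      document:text section:dc.Title
defaultindex document:text
plugin       HTMLPlugin
plugin       PDFPlugin
classify     AZList -metadata dc.Title -buttonname Title
format       SearchVList "<td>[link][icon][/link]</td><td>[dc.Title]</td>"
'''

def parse_collect_cfg(text):
    """Parse single-line directives into a dict; repeatable ones are kept as lists."""
    config = {"plugin": [], "classify": [], "format": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split keeps a quoted format string together as one argument.
        directive, *args = shlex.split(line)
        if directive in ("plugin", "classify"):
            config[directive].append(args)  # order matters, so keep every occurrence
        elif directive == "format":
            config["format"][args[0]] = " ".join(args[1:])
        else:
            config[directive] = args
    return config

if __name__ == "__main__":
    for key, value in parse_collect_cfg(SAMPLE_CFG).items():
        print(key, value)
```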

DL - Hardships
Copyright Issues
Technology Complexities
Infrastructure Issues
Publications/Formats: Diverse Datastreams
Digital Objects/Formats: Multiple
Publishers’ Policies: Stringent, Inconsistent

Major Tasks
Content identification (internal / external)
Content Creation
Content Collation/Signposts
Organisation
Updation
Retrieval / Dissemination
User Training
Archiving

DIGITAL LIBRARY ARCHITECTURE
[Figure: layers labelled Data/Objects; METS/MODS, EAD, TEI, DCMI; OS; Z39.50/OAI-PMH; Network; DL Software]

Acknowledgement
Team Greenstone, New Zealand
Greenstone Support South Asia
IIM Kozhikode, India
UNESCO
Indiana University Digital Library Program