Introduction to Digital Libraries. Hussein Suleman, UCT CS Honours, 2004.



Open Digital Libraries: a Component Model

Introduction
[Figure: users and digital objects (programs, documents, images, videos), linked by a question mark.]

Introduction …
[Figure: the same digital objects, now reached through a digital library: a monolithic and/or custom-built web-based application.]

Problems
- Digital Libraries are difficult to build: lots of standards and evolving architectures, e.g., Dienst, EPrints
- Interoperability is (was) hard, e.g., NCSTRL, Z39.50
- Software development is time-consuming, e.g., CSTC, WCR, EPrints

More Problems
- Poor software engineering
    - Tight coupling
    - Too much complexity
    - Inadequate testing methods
- Lessons from Internet development ignored
    - Simplicity
    - Independence
    - Layering
    - etc.

Solution ?
[Figure: the digital objects reached through a componentised digital library whose components are all still marked with question marks.]

Some Component Architectures
- Dienst
- RAP (KW)
- ODL (Open Digital Library)
- OpenDLib
- OAI-PMH
- Convergence?
- Web Services?

Open Digital Library (ODL)
- Digital Libraries can be modeled as networks of extended Open Archives, where each extended Open Archive is a source of data and/or a provider of services.
- Each component is independent and has well-defined external interfaces that are Web-based, e.g., OAI-PMH.

Open DL Design
- Each component is encapsulated in an extended Open Archive.
- Communication with other components and user interfaces uses specialised versions of the extended OAI-PMH (XOAI-PMH).
- Digital Libraries are constructed as networks of extended Open Archives.
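To make the "network of extended Open Archives" idea concrete, here is a minimal in-memory sketch. It is not the ODL implementation: the class names, the `list_records` method (a stand-in for an OAI-PMH ListRecords exchange), and the sample records are all assumptions chosen for illustration. The point it tries to show is that a union component and a search component can both consume, and in the union's case also offer, the same harvesting interface.

```python
from dataclasses import dataclass, field


@dataclass
class OpenArchive:
    """A plain archive: a source of records keyed by identifier (hypothetical)."""
    name: str
    records: dict = field(default_factory=dict)

    def list_records(self):
        # Stand-in for an OAI-PMH ListRecords response.
        return dict(self.records)


@dataclass
class UnionComponent:
    """Hypothetical component that merges several archives into one."""
    sources: list
    records: dict = field(default_factory=dict)

    def harvest(self):
        # Pull records from every upstream archive through the same interface.
        for source in self.sources:
            self.records.update(source.list_records())

    def list_records(self):
        # The union is itself an archive, so other components can harvest it.
        return dict(self.records)


@dataclass
class SearchComponent:
    """Hypothetical component that indexes the titles of one upstream archive."""
    source: object
    index: dict = field(default_factory=dict)

    def harvest(self):
        for identifier, title in self.source.list_records().items():
            for word in title.lower().split():
                self.index.setdefault(word, set()).add(identifier)

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))


def build_example():
    """Wire two archives into a union, then a search engine onto the union."""
    a = OpenArchive("archive-a", {"a:1": "Digital library components"})
    b = OpenArchive("archive-b", {"b:1": "Metadata harvesting",
                                  "b:2": "Library protocols"})
    union = UnionComponent([a, b])
    union.harvest()
    search = SearchComponent(union)
    search.harvest()
    return search
```

Because every component only depends on `list_records`, the search engine could be pointed at a single archive instead of the union without changing its code.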

Problem Revisited
[Figure: the digital objects reached through an open digital library, with parts labelled OA, PMH and XPMH.]

Protocol Layers
[Figure: layer stack listing Open Digital Library Protocol, Extended OAI-PMH, and Protocol for Metadata Harvesting.]

Component Layers
[Figure: layer stack listing Open Digital Library Component, Extended Open Archive, and Open Archive.]

Example Open Digital Library
[Figure: an ETD digital library. Labels: ETD collections; components Search, Filter, Union, Recent, Browse; protocols PMH, ODL-Union, ODL-Search, ODL-Browse, ODL-Recent; user interface; students and researchers.]

Protocols and Components

Protocol        Component  Description
ODL-Union       DBUnion    Merge archives together
ODL-Search      IRDB       Search engine
ODL-Browse      DBBrowse   Category-based browser
ODL-Recent      WhatsNew   Tracker for recent entries
ODL-Submit      Box        Archive supporting submit and retrieve operations
ODL-Annotate    Thread     Threaded annotation engine
ODL-Recommend   Suggest    Recommendation system
ODL-Rate        DBRate     Ratings manager
ODL-Review      DBReview   Peer review workflow manager

Example: IRDB Search Engine
- Encapsulate search capability in an OA
- OAI-PMH to gather data for indexing
- ODL-Search to submit queries and get results
[Figure: IRDB search engine component with OAI-PMH and ODL-Search interfaces.]

Example: ODL-Search Protocol
- Parameters
    - query: list of searchable keywords
    - query language: “odlsearch1”
    - start/stop: subset of ranked list
- Encoding
    - verb=ListIdentifiers&set=odlsearch1/query/start/stop…
    - verb=ListRecords&set=odlsearch1/query/start/stop…
- Results
    - Standard OAI response: list of identifiers or records
- Example
    - verb=ListRecords&set=odlsearch1/computer science/1/10…
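The encoding above packs the query into the `set` parameter of an ordinary OAI-style request. A small sketch of how a client might build that query string follows. The function name and its defaults are hypothetical, and the slide's trailing "…" (further parameters) is deliberately left out. Note that, unlike the slide's example, this sketch percent-encodes the value, which is an assumption about what a real HTTP client would need to do.

```python
from urllib.parse import urlencode, parse_qs


def odl_search_request(query, start, stop, verb="ListRecords",
                       language="odlsearch1"):
    """Build the query string for an ODL-Search request (illustrative only)."""
    if verb not in ("ListIdentifiers", "ListRecords"):
        raise ValueError("ODL-Search uses ListIdentifiers or ListRecords")
    # The set parameter carries: query language / keywords / start / stop.
    set_spec = f"{language}/{query}/{start}/{stop}"
    # urlencode escapes spaces and slashes so the string is safe in a URL.
    return urlencode({"verb": verb, "set": set_spec})


if __name__ == "__main__":
    encoded = odl_search_request("computer science", 1, 10)
    print(encoded)
    # Decoding recovers the form shown on the slide.
    print(parse_qs(encoded)["set"][0])
```

Reusing `set` this way means the response can stay a standard OAI response, so any existing harvester code can read search results.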

Case Study: ETD Union Catalog

ETD Union Catalog - Front

ETD Union Catalog - Search

ETD Union Catalog - Browse

The Ultimate Goal
- Package different configurations of components into instant DL systems
- DL building = component configuration
- All DLs speak the same language(s)
- Basic services are trivial to provide, so more effort is spent on advanced capabilities of DLs
- Information is more accessible to users

Repository+Component Models

Repository Access Protocol (RAP)
- A repository can be defined as a network-accessible server.
- RAP specifies a simple interface to access and manage digital objects in a repository.
- RAP is an abstract model, with concrete implementations in the Dienst, OpenDLib, OAI and ODL projects.
- This is usually referred to as the “Kahn/Wilensky architecture”. Does Kahn ring any bells?

RAP Operations
- ACCESS_DO: Return a manifestation (dissemination) of a digital object based on its identifier and a specification of what service is being requested.
- DEPOSIT_DO: Submit a digital object to the repository, assigning or specifying an identifier for it.
- ACCESS_REF: List services and their access mechanisms for the repository.
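Since RAP is an abstract model, the three operations can be sketched as methods on a toy in-memory repository. Everything below other than the operation names is an assumption made for illustration: the class, the "raw" and "size" services, the identifier scheme, and the shape of the ACCESS_REF result are not taken from the RAP specification.

```python
class InMemoryRepository:
    """Toy repository exposing the three RAP operations as methods."""

    def __init__(self, name):
        self.name = name
        self._objects = {}
        self._counter = 0
        # Hypothetical services: each maps stored bytes to a dissemination.
        self._services = {
            "raw": lambda data: data,
            "size": lambda data: len(data),
        }

    def deposit_do(self, data, identifier=None):
        """DEPOSIT_DO: store an object, assigning an identifier if none given."""
        if identifier is None:
            self._counter += 1
            identifier = f"{self.name}/{self._counter}"
        self._objects[identifier] = data
        return identifier

    def access_do(self, identifier, service):
        """ACCESS_DO: return a dissemination of the object for a service."""
        data = self._objects[identifier]       # KeyError if object is unknown
        disseminate = self._services[service]  # KeyError if service is unknown
        return disseminate(data)

    def access_ref(self):
        """ACCESS_REF: list services and how each one is accessed."""
        return {name: f"access_do(identifier, {name!r})"
                for name in self._services}
```

A caller would typically use `access_ref` first to discover which services exist, then request a dissemination with `access_do`.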

RAP: Naming of Digital Objects
- Each digital object must have a location-independent name (handle), made up of a repository identifier and a local name.
- Example: berkeley.cs/csd, where berkeley.cs is the repository and csd refers to a technical report.
- Handles are resolved by a handle server to redirect a service provider to a repository containing an object identified only by its location-independent handle.
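The two-part structure of a handle can be illustrated with a short parser. This is a sketch under one stated assumption: the repository identifier is everything before the first slash, and the rest is the local name. The `Handle` type and `parse_handle` function are hypothetical names, not part of any handle library.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    repository: str   # e.g. "berkeley.cs"
    local_name: str   # e.g. "csd"


def parse_handle(handle: str) -> Handle:
    """Split a handle into repository identifier and local name."""
    # Assumption: only the first slash separates the two parts.
    repository, separator, local_name = handle.partition("/")
    if not separator or not repository or not local_name:
        raise ValueError(f"not a repository/local-name handle: {handle!r}")
    return Handle(repository, local_name)
```

For the slide's example, `parse_handle("berkeley.cs/csd")` yields the repository `berkeley.cs` and the local name `csd`.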

Handle Servers
- A handle server stores the association between handles and physical locations of objects.
- Handle servers follow a DNS model:
    - they are distributed and replicated
    - there are global and local servers
    - handles may be cached locally after being resolved to minimise resolution traffic
    - management of servers/handles requires an authority system for management, accountability, delegation, etc.
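The local/global split and the caching behaviour can be sketched with a small resolver. This is a toy model, not the Handle System's actual protocol: the `HandleServer` class, its `lookups` counter, and the example location URL are assumptions used only to show how a local cache reduces traffic to the global server.

```python
class HandleServer:
    """Toy handle server: own table first, then cache, then parent server."""

    def __init__(self, table, parent=None):
        self._table = dict(table)   # handles this server is authoritative for
        self._parent = parent       # e.g. a global server, or None
        self._cache = {}            # handles resolved earlier via the parent
        self.lookups = 0            # how many resolve requests arrived here

    def resolve(self, handle):
        self.lookups += 1
        if handle in self._table:
            return self._table[handle]
        if handle in self._cache:
            return self._cache[handle]
        if self._parent is None:
            raise KeyError(handle)
        # Ask the parent once, then remember the answer locally.
        location = self._parent.resolve(handle)
        self._cache[handle] = location
        return location


if __name__ == "__main__":
    # Hypothetical location; the real one is not given in the slides.
    global_server = HandleServer({"berkeley.cs/csd": "http://repo.example/csd"})
    local_server = HandleServer({}, parent=global_server)
    print(local_server.resolve("berkeley.cs/csd"))
    print(local_server.resolve("berkeley.cs/csd"))
    print(global_server.lookups)
```

The second resolution is answered from the local cache, so the global server sees only one request.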

Handle Example

Digital Object Identifiers (DOIs)
- DOIs are a standardised implementation of the handle concept.
- Handles/DOIs are URIs that refer to digital objects, while URLs are URIs that refer to network services.
- Handle/DOI resolution can be performed transparently using a browser plug-in.

Dienst
- Dienst (German for “service”) is a suite of protocols and components to build distributed digital libraries.
- Dienst is the software suite that supported document management at each of the older NCSTRL (Networked Computer Science Technical Reference Library) sites, and transparently linked them into an international federation of sites.
- Dienst uses federation for interoperability, with a “backup server” for robustness.

Dienst Service Architecture Example from Dienst website at

Dienst Example
- Example Request: List the handles in the high energy (hep) partition within the physics partition.
    /Dienst/Repository/4.0/List-Contents?partitionspec=physics;hep
- Example Response:
    handlecorp/ handlecorp/
from Dienst website at
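The example request suggests a path of the form /Dienst/service/version/verb followed by arguments. The sketch below rebuilds that one request from its parts. The general pattern is inferred from this single example only, so treat the helper names and the assumption that nested partitions are joined with semicolons as illustrative rather than as the Dienst specification. No URL escaping is applied.

```python
def partition_spec(*partitions):
    """Join nested partition names, outermost first, as in 'physics;hep'."""
    return ";".join(partitions)


def dienst_request(service, version, verb, **arguments):
    """Assemble a Dienst-style request path (pattern inferred from one example)."""
    path = f"/Dienst/{service}/{version}/{verb}"
    if not arguments:
        return path
    query = "&".join(f"{name}={value}" for name, value in arguments.items())
    return f"{path}?{query}"


if __name__ == "__main__":
    print(dienst_request("Repository", "4.0", "List-Contents",
                         partitionspec=partition_spec("physics", "hep")))
```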

Dienst → OAI-PMH
- Dienst formed the foundation for the current OAI-PMH, hence the terminology is sometimes similar.
- NCSTRL has moved to a model based on harvesting, and OAI-PMH is being used to connect sites together. In 2001, data from the existing NCSTRL sites was harvested and archived (for preservation) using an early version of an ODL component! see

Dienst → OpenDLib
- OpenDLib is a component model similar to ODL, but based on Dienst rather than OAI-PMH.
- OpenDLib attempts to define services (mediators) and repositories based on Dienst and updated best practices in DLs.
- OpenDLib uses a well-defined document model for structured content: the Document Model for Digital Libraries (DoMDL).

Other repository/component models
- FEDORA (Flexible Extensible Digital Object and Repository Architecture) defines a generic interface to manage digital objects at a lower layer in an information system. see
- SODA (Smart Objects Dumb Archive) packages digital objects into buckets containing the data along with the code to mediate access, display the objects, enforce rights, etc.

References
- Suleman, H. and E. A. Fox (2001) “A Framework for Building Open Digital Libraries”, in D-Lib Magazine, Vol. 7, No. 12, December Available
- Kahn, Robert and Robert Wilensky (1995) “A Framework for Distributed Digital Object Services”, CNRI. Available
- Lagoze, Carl and James Davis (1995) “Dienst: an architecture for distributed document libraries”, Communications of the ACM, ACM, Vol. 38, No. 4, p. 47.
- Castelli, Donatella and Pasquale Pagano (2002) “OpenDLib: A Digital Library Service System”, in Proceedings of Research and Advanced Technology for Digital Libraries: 6th European Conference (ECDL 2002), Rome, Italy, September 2002, Lecture Notes in Computer Science 2458, p Maristella Agosti, Costantino Thanos (eds.). Springer,
- Maly, Kurt, Michael L. Nelson and Mohammed Zubair (1999) “Smart Objects, Dumb Archives: A User-Centric, Layered Digital Library Framework”, in D-Lib Magazine, Vol. 5, No. 3, March Available