1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,

Presentation transcript:

1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February, 2005

2 Outline: Introduction; OAI and Federated Digital Library; Architecture; Prototype; Conclusion

3 Introduction - Motivation Digital libraries play a key role in managing the online information explosion for end users by structuring content so that it can be discovered easily and effectively. A federated digital library harvests metadata from multiple digital libraries and provides a unified interface to these libraries.

4 Introduction - Challenge Building a scalable federated digital library that can handle millions of records and work with thousands of digital libraries is a big challenge. One existing federated digital library, ARC, running on a single processor, takes over four days for each cycle of harvesting over 160 collections and about two days for indexing. For searches resulting in a large sorted result set, we are seeing query execution times on the order of 15 minutes.

5 Introduction - Solution The Grid is an emerging technology for infrastructure that enables the integrated, collaborative use of high-end computers, networks, and databases owned by multiple organizations. Since grid nodes by definition have unused capacity, we can use the Grid resources to realize a federated digital library. How? Distribute the cost of harvesting to existing grid nodes, and only leave the cost of maintaining the federated search service to one institution (service provider), thus making it more sustainable.
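The idea of distributing the harvesting cost can be sketched as follows. This is a minimal illustration, assuming a plain round-robin assignment of collections to harvester nodes; the function name `assign_collections`, the collection names, and the node names are hypothetical and are not taken from the prototype.

```python
from collections import defaultdict


def assign_collections(collections, harvesters):
    """Spread collections over harvester nodes in round-robin order."""
    if not harvesters:
        raise ValueError("at least one harvester node is required")
    plan = defaultdict(list)
    for index, collection in enumerate(collections):
        # Collection i goes to node (i mod number of nodes).
        node = harvesters[index % len(harvesters)]
        plan[node].append(collection)
    return dict(plan)


# Hypothetical inputs: 160 collection names and four harvester nodes.
collections = [f"collection-{n:03d}" for n in range(160)]
harvesters = ["harvester-1", "harvester-2", "harvester-3", "harvester-4"]

plan = assign_collections(collections, harvesters)
for node, assigned in plan.items():
    print(node, len(assigned))
```

Round-robin ignores how large each collection is; a real scheduler would probably need to weigh collections by record count or by past harvest time.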

6 Open Archives Initiative OAI-PMH 2.0

7 Connecting Islands of Digital Libraries Islands of digital libraries need to be interconnected so that users can access different information resources from anywhere. Information from different repositories needs to be manipulated, organized, and correlated for better discovery. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an international effort to build bridges across islands of digital libraries. OAI does for digital libraries what the Internet did for islands of isolated networks.

8 Background - Open Archives Initiative (OAI) The goal of the Open Archives Initiative Protocol for Metadata Harvesting is to supply and promote an application-independent interoperability framework. The OAI protocol permits metadata harvesting of a data provider by a service provider. Data Providers support the OAI protocol as a means of exposing metadata about the content in their systems. Service Providers issue OAI protocol requests to the systems of data providers and use the returned metadata as a basis for building value-added services. The word “open” in OAI is from the architectural perspective – defining and promoting machine interfaces. Openness does not mean “free” or “unlimited” access to the information repositories that conform to the OAI technical framework. The OAI is an international effort. Major sponsors are: Council on Library and Information Resources (CLIR), the Digital Library Federation (DLF), the Scholarly Publishing & Academic

9 What does it mean to make an existing digital library OAI-enabled? [Diagram: Digital Library, Storage, OAI Layer] Exposing metadata to OAI service providers – DC and parallel metadata sets. ONLY METADATA

10 Metadata Harvesting - Move away from distributed searching, which cannot scale well to a large number of participants. - Extract metadata from various sources. - Build services on local copies of metadata. - Data remains at remote repositories. RCDL 2003, St. Petersburg

11 OAI Request and OAI Response - The OAI request for metadata is embedded in HTTP. - The OAI response to an OAI request is encoded in XML. - The XML Schema specification for the OAI response is provided in the OAI-PMH document.
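A minimal sketch of this request/response pairing, assuming an OAI-PMH 2.0 repository: the request is an HTTP GET URL with a `verb` argument, and the response is XML in the OAI-PMH namespace. The base URL `http://example.org/oai`, the sample response text, and the helper names `build_request` and `parse_identify` are placeholders for illustration; no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; a real repository returns more fields.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-02-01T12:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def build_request(base_url, verb, **arguments):
    """Encode an OAI-PMH request as an HTTP GET URL."""
    return base_url + "?" + urlencode({"verb": verb, **arguments})


def parse_identify(xml_text):
    """Extract a few fields from an Identify response."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    return {
        name: identify.findtext(f"oai:{name}", namespaces=OAI_NS)
        for name in ("repositoryName", "baseURL", "protocolVersion")
    }


url = build_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc")
info = parse_identify(SAMPLE_RESPONSE)
print(url)
print(info)
```

Because `from` is a reserved word in Python, a date-range argument would have to be passed as `**{"from": "2005-01-01"}` rather than as a keyword.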

12 ARC - A Cross Archive Search Service (OAI Based Discovery Service)

13 Free software - Arc Arc currently harvests metadata from about 165 OAI-compliant archives, normalizes it, and stores it in a search service based on a relational database (MySQL or Oracle). It holds over 6 million metadata records from various subject domains. Arc also provides an OAI layer, thus making hierarchical harvesting possible.
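The harvest, normalize, and store pipeline can be sketched in a few lines. This is an illustration only: it uses an in-memory SQLite database as a stand-in for MySQL or Oracle, the table layout is invented, and the normalization rule (collapsing whitespace) is an assumption, since the slides do not say how Arc normalizes records.

```python
import sqlite3


def normalize(record):
    """Illustrative normalization: trim and collapse whitespace in every field."""
    return {key: " ".join(str(value).split()) for key, value in record.items()}


def store(connection, records):
    """Insert normalized records into a hypothetical metadata table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "identifier TEXT PRIMARY KEY, archive TEXT, title TEXT)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO metadata (identifier, archive, title) "
        "VALUES (:identifier, :archive, :title)",
        [normalize(record) for record in records],
    )
    connection.commit()


def search(connection, term):
    """Return (identifier, title) pairs whose title contains the term."""
    cursor = connection.execute(
        "SELECT identifier, title FROM metadata "
        "WHERE title LIKE ? ORDER BY identifier",
        (f"%{term}%",),
    )
    return cursor.fetchall()


# Hypothetical harvested records from two archives.
harvested = [
    {"identifier": "oai:example:1", "archive": "archive-a",
     "title": "  GRID   Based Federated  Digital Library "},
    {"identifier": "oai:example:2", "archive": "archive-b",
     "title": "Rapid Visual OAI Tool"},
]

connection = sqlite3.connect(":memory:")
store(connection, harvested)
results = search(connection, "grid")
print(results)
```

A `LIKE` scan is adequate for a toy table; at the scale of millions of records a dedicated full-text index would likely be needed.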

14 History of Arc development

15 Architecture of Arc

16

17

18 Open Source Arc System In April 2002 we were approached by MetaScholar.org about the software and licensing issues. The Arc system was released as an open source project hosted at SourceForge in September 2002. Development continues with contributions from the Archon/Kepler project and interested parties and individuals.

19

20 Usage of Arc Open Source System

21 Architecture

22 Initial Architecture

23 Final Architecture

24 DL Grid Test Bed A 16-node cluster, with each machine having a 64-bit processor. Fedora Core 2 (64-bit version) was installed on each machine. Six machines were made part of the grid: one Scheduler, four Harvesters, and one Metadata Collection Service node. Globus Toolkit (GT 3.2) was installed on each of these six machines. The remaining ten machines were used for indexing, storage, and the search node.

25

26

27 Prototype

28

29

30

31

32 Conclusion It is feasible to use Grid resources to build scalable federated digital libraries. Current limitation: managing and administering grid resources for the purpose of harvesting is tedious and requires a technical expert. Future work: build tools for managing the Grid for the federated digital library; develop and study high-performance search methods on clusters connected to the grid.

33 OAI-PMH protocol requests, sent by the Harvester (Service Provider) to the Repository (Data Provider). Supporting protocol requests: Identify, ListMetadataFormats, ListSets. Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord.

34 Identify: request from the Harvester (Service Provider) to the Repository (Data Provider). Response: Repository name, Base-URL, Admin, OAI protocol version, Description Container.

35 ListMetadataFormats: request from the Harvester (Service Provider) to the Repository (Data Provider). Response, repeated for each format: Format prefix, Format XML schema.

36 ListSets: request from the Harvester (Service Provider) to the Repository (Data Provider). Response, repeated for each set: Set Specification, Set Name.

37 ListRecords: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: from=a, until=b, set=klm, metadataPrefix=oai_dc. Response, repeated for each record: Identifier, Datestamp, Metadata, About Container.
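A minimal sketch of how a harvester might read a ListRecords response, assuming OAI-PMH 2.0 with the `oai_dc` metadata format. The sample XML is hand-written for illustration (a real response also carries `responseDate` and `request` elements and may span several pages), and `parse_list_records` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 and unqualified Dublin Core (oai_dc).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated, hand-written sample response with a single record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:mlib:123a</identifier>
        <datestamp>2003-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return identifier, datestamp, and title for each record in the response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "title": record.findtext(
                "oai:metadata/oai_dc:dc/dc:title", namespaces=NS
            ),
        })
    return records


records = parse_list_records(SAMPLE)
print(records)
```

Only the first `dc:title` is read here; Dublin Core elements are repeatable, so a fuller harvester would collect all values with `findall`.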

38 ListIdentifiers: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: from=a, until=b, set=klm, metadataPrefix=oai_dc. Response, repeated for each record: Identifier, Datestamp.

39 GetRecord: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: identifier=oai:mlib:123a, metadataPrefix=oai_dc. Response: Identifier, Datestamp, Metadata, About.

40 OAI Mechanics The request is encoded in HTTP. The response is encoded in XML. XML Schemas for the responses are defined in the OAI-PMH document. Courtesy: Michael Nelson