1 1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February, 2005

2 2 Outline: Introduction; OAI and Federated Digital Library; Architecture; Prototype; Conclusion

3 3 Introduction - Motivation Digital libraries play a key role in managing the online information explosion for end users by structuring content so that it can be discovered easily and effectively. A federated digital library harvests metadata from multiple digital libraries and provides a unified interface to these libraries.

4 4 Introduction - Challenge Building a scalable federated digital library that can handle millions of records and work with thousands of digital libraries is a big challenge. One existing federated digital library, ARC, running on a single processor, takes over four days for each cycle of harvesting over 160 collections and about two days for indexing. For searches that produce a large sorted result set, we are seeing query execution times on the order of 15 minutes.

5 5 Introduction - Solution The Grid is an emerging infrastructure technology that enables the integrated, collaborative use of high-end computers, networks, and databases owned by multiple organizations. Since grid nodes by definition have unused capacity, we can use Grid resources to realize a federated digital library. How? Distribute the cost of harvesting across existing grid nodes, and leave only the cost of maintaining the federated search service to one institution (the service provider), thus making it more sustainable.

6 6 Open Archives Initiative OAI-PMH 2.0 http://www.openarchives.org

7 7 Connecting Islands of Digital Libraries Islands of digital libraries need to be interconnected so that users can access different information resources from anywhere. Information from different repositories needs to be manipulated, organized, and correlated for better discovery. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an international effort to build bridges across islands of digital libraries. OAI does for digital libraries what the Internet did for islands of isolated networks.

8 8 Background - Open Archives Initiative (OAI) http://www.openarchives.org The goal of the Open Archives Initiative Protocol for Metadata Harvesting is to supply and promote an application-independent interoperability framework. The OAI protocol permits a service provider to harvest metadata from a data provider. Data providers support the OAI protocol as a means of exposing metadata about the content in their systems. Service providers issue OAI protocol requests to the systems of data providers and use the returned metadata as a basis for building value-added services. The word “open” in OAI is meant from the architectural perspective: defining and promoting machine interfaces. Openness does not mean “free” or “unlimited” access to the information repositories that conform to the OAI technical framework. The OAI is an international effort. Major sponsors are: Council on Library and Information Resources (CLIR), the Digital Library Federation (DLF), the Scholarly Publishing & Academic

9 9 What does it mean to make an existing digital library OAI-enabled? An OAI layer is added on top of the digital library's storage, exposing metadata to OAI service providers as DC and parallel metadata sets. Only metadata is exposed.

10 10 Metadata Harvesting - Move away from distributed searching, which does not scale well to a large number of participants. - Extract metadata from various sources. - Build services on local copies of the metadata. - The data itself remains at the remote repositories. RCDL 2003, St. Petersburg

11 11 OAI Request and OAI Response - An OAI request for metadata is embedded in HTTP. - The OAI response to an OAI request is encoded in XML. - The XML Schema specification for OAI responses is provided in the OAI-PMH document.
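The request/response encoding above can be illustrated with a minimal Python sketch. The base URL and the sample response are hypothetical and hand-written for illustration; only the verb and argument names and the OAI-PMH 2.0 namespace come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 XML namespace


def build_request(base_url: str, verb: str, **arguments: str) -> str:
    """Encode an OAI-PMH request as an HTTP GET URL."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository base URL, used only for illustration.
url = build_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    set="klm",
)

# Hand-written, abbreviated sample of an XML-encoded response (an assumption,
# not output captured from a real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-02-01T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc" set="klm">http://repository.example.org/oai</request>
  <ListRecords/>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE_RESPONSE)
request_elem = root.find(f"{{{OAI_NS}}}request")

print(url)
print(root.tag, request_elem.get("verb"))
```

The response echoes the request in its `request` element, so a harvester can check which verb and arguments a given XML document answers.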

12 12 ARC - A Cross Archive Search Service (OAI Based Discovery Service) http://arc.cs.odu.edu/

13 13 Free software - Arc Arc currently harvests metadata from about 165 OAI-compliant archives, normalizes the records, and stores them in a search service based on a relational database (MySQL or Oracle). It holds over 6 million metadata records from various subject domains. Arc also provides an OAI layer, thus making hierarchical harvesting possible.
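The harvest, normalize, store, and search flow can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL or Oracle, the table layout and the whitespace-only normalization are hypothetical, and the sample records are invented. It does not reproduce Arc's actual schema or normalization rules.

```python
import sqlite3


def normalize(record: dict) -> dict:
    """Trim each field and collapse internal runs of whitespace (assumed rule)."""
    return {key: " ".join(str(value).split()) for key, value in record.items()}


# In-memory SQLite database standing in for the relational search store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "identifier TEXT PRIMARY KEY, archive TEXT, title TEXT, creator TEXT)"
)

# Invented records, as if harvested from two different archives.
harvested = [
    {"identifier": "oai:archiveA:1", "archive": "archiveA",
     "title": "  Grid  computing for libraries ", "creator": "Doe, J."},
    {"identifier": "oai:archiveA:2", "archive": "archiveA",
     "title": "Metadata harvesting at scale", "creator": "Roe, R."},
    {"identifier": "oai:archiveB:7", "archive": "archiveB",
     "title": "Federated search on the grid", "creator": "Poe, E."},
]

conn.executemany(
    "INSERT OR REPLACE INTO records "
    "VALUES (:identifier, :archive, :title, :creator)",
    [normalize(record) for record in harvested],
)


def search(term: str) -> list:
    """Search the local copy of the metadata by title, sorted by title."""
    rows = conn.execute(
        "SELECT identifier, title FROM records WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return rows.fetchall()


print(search("grid"))
```

Because the search runs against the local copy, records from both archives come back from a single query without contacting either repository.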

14 14 History of Arc development

15 15 Architecture of Arc

18 18 Open Source Arc System In April 2002 we were approached by MetaScholar.org about the software and its licensing. The Arc system was released as an open source project hosted at SourceForge in September 2002. Development has continued with contributions from the Archon/Kepler project and other interested parties and individuals.

20 20 Usage of Arc Open Source System http://www.rdn.ac.uk/resourcefinder/ http://www.metaarchive.org http://arc.cs.odu.edu http://archon.cs.odu.edu http://www.ncstrl.org http://www.snelonline.net/snel/index.jsp

21 21 Architecture

22 22 Initial Architecture

23 23 Final Architecture

24 24 DL Grid Test Bed The test bed is a 16-node cluster in which each machine has a 64-bit processor. Fedora Core 2 (64-bit version) was installed on each machine. Six machines were made part of the grid: one Scheduler, four Harvesters, and one Metadata Collection Service node. Globus Toolkit (GT 3.2) was installed on each of these six machines. The remaining ten machines were used for indexing, storage, and the search node.
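One way a scheduler could spread harvesting work over the four harvester nodes is a simple round-robin assignment. The slides do not state the actual scheduling policy, so the policy, the node names, and the collection names below are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_round_robin(collections: list, harvesters: list) -> dict:
    """Assign each collection to a harvester in turn (assumed policy)."""
    assignment = defaultdict(list)
    for index, collection in enumerate(collections):
        harvester = harvesters[index % len(harvesters)]
        assignment[harvester].append(collection)
    return dict(assignment)


# Hypothetical names: four harvester nodes, ten collections to harvest.
harvesters = [f"harvester-{n}" for n in range(1, 5)]
collections = [f"collection-{n:03d}" for n in range(1, 11)]

plan = assign_round_robin(collections, harvesters)
for node, assigned in plan.items():
    print(node, assigned)
```

Round-robin ignores collection size; a real scheduler would more likely balance on record counts or past harvest times.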

27 27 Prototype

32 32 Conclusion It is feasible to use Grid resources to build scalable federated digital libraries. Current Limitation: Managing and administering grid resources for the purpose of harvesting is tedious and requires a technical expert. Future: Build tools for managing the Grid for the federated digital library. Develop and study high-performance search methods on clusters connected to the grid.

33 33 OAI-PMH requests between the Harvester (Service Provider) and the Repository (Data Provider). Supporting protocol requests: Identify, ListMetadataFormats, ListSets. Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord.

34 34 Identify: request from the Harvester (Service Provider) to the Repository (Data Provider). Response: Repository name, Base-URL, Admin e-mail, OAI protocol version, Description Container.
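The Identify response fields listed above can be read with a short Python sketch. The sample XML is hand-written and abbreviated (a real Identify response carries further elements), and the repository name, URL, and e-mail address are hypothetical; the element names follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, abbreviated sample; all values are hypothetical.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-02-01T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@repository.example.org</adminEmail>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text: str) -> dict:
    """Extract the basic repository description from an Identify response."""
    identify = ET.fromstring(xml_text).find("oai:Identify", NS)
    fields = ("repositoryName", "baseURL", "protocolVersion", "adminEmail")
    return {name: identify.findtext(f"oai:{name}", namespaces=NS) for name in fields}


info = parse_identify(SAMPLE_IDENTIFY)
print(info)
```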

35 35 ListMetadataFormats: request from the Harvester (Service Provider) to the Repository (Data Provider). Response, repeated for each format: Format prefix, Format XML schema.

36 36 ListSets: request from the Harvester (Service Provider) to the Repository (Data Provider). Response, repeated for each set: Set Specification, Set Name.

37 37 ListRecords: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: from=a, until=b, set=klm, metadataPrefix=oai_dc. Response, repeated for each record: Identifier, Datestamp, Metadata, About Container.
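A selective ListRecords harvest can be sketched in Python as follows. The base URL, the dates, the second identifier, and the titles are illustrative assumptions; the sample response is hand-written and abbreviated, and the namespaces are those of OAI-PMH 2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# "from" is a Python keyword, so the arguments are kept in a dict.
arguments = {
    "verb": "ListRecords",
    "from": "2005-01-01",       # illustrative dates
    "until": "2005-01-31",
    "set": "klm",
    "metadataPrefix": "oai_dc",
}
# Hypothetical repository base URL.
request_url = "http://repository.example.org/oai?" + urlencode(arguments)

# Hand-written, abbreviated sample response (an assumption, not real output).
SAMPLE_LIST_RECORDS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:mlib:123a</identifier>
        <datestamp>2005-01-15</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:mlib:124b</identifier>
        <datestamp>2005-01-20</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second sample title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text: str) -> list:
    """Return identifier, datestamp, and DC title for each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
            "title": record.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
        })
    return records


records = parse_records(SAMPLE_LIST_RECORDS)
print(request_url)
print(records)
```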

38 38 ListIdentifiers: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: from=a, until=b, metadataPrefix=oai_dc, set=klm. Response, repeated for each record: Identifier, Datestamp.

39 39 GetRecord: request from the Harvester (Service Provider) to the Repository (Data Provider). Arguments: identifier=oai:mlib:123a, metadataPrefix=oai_dc. Response: Identifier, Datestamp, Metadata, About.

40 40 OAI Mechanics The request is encoded in HTTP. The response is encoded in XML. XML Schemas for the responses are defined in the OAI-PMH document. Courtesy: Michael Nelson

