Federated Digital Library Architecture and Distributed Resource Discovery Carl Lagoze CS 502 2000-03-09.

Presentation transcript:

Component-Based Digital Library Architecture Building digital libraries out of a toolkit: –modular services –protocol interface to each service –mechanisms to combine services into digital library collections Advantages: –ability to build digital libraries that conform to specific service and collection needs –ability to add new services that enhance functionality

Dienst is a protocol and reference implementation of a distributed digital library service in which a network of services provides World Wide Web browser access, uniform search over distributed indexes, and access to structured documents.

Defining the services. Repository – deposit, storage, and access to structured documents. Index – process queries on documents and return handles. Query Mediator – route queries to appropriate indexes. Collection – define services and content in logical collections. User Interface – human-oriented front-end for services.

Why a service-based protocol? Expose the operational semantics of the services through an API, to permit flexible integration of the services and use of the services by other clients, consumers, and services.

Defining the protocol Structured messages –Service –Version –Verb –Arguments Template /Dienst/ / / [?/] Example /Dienst/Repository/4.0/Formats/ncstrl.cornell/TR
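The message structure can be illustrated by splitting the example request into its fields. This is a minimal sketch, not the Dienst reference implementation: the names `DienstRequest` and `parse_dienst_request` are hypothetical, and the mapping of path segments to service, version, verb, and arguments is an assumption read off the example on the slide.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, parse_qs


@dataclass
class DienstRequest:
    """Hypothetical container for the four message fields named on the slide."""
    service: str
    version: str
    verb: str
    arguments: list   # path segments that follow the verb
    options: dict     # keyword arguments taken from the query string


def parse_dienst_request(url: str) -> DienstRequest:
    """Split a Dienst-style request path into service, version, verb, arguments."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 4 or segments[0] != "Dienst":
        raise ValueError(f"not a Dienst request: {url!r}")
    _, service, version, verb, *arguments = segments
    # keep_blank_values keeps keys such as "file-after=" that carry no value
    query = parse_qs(parts.query, keep_blank_values=True)
    options = {key: values[0] for key, values in query.items()}
    return DienstRequest(service, version, verb, arguments, options)


req = parse_dienst_request("/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR")
print(req.service, req.version, req.verb, req.arguments)
```

Keeping the service and version in fixed positions lets a client address any service with the same parsing logic, which fits the slide's point about structured messages.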

Why a Document Model? “Documents” on the current web are both: –Unstructured (GET) –Chaotic (CGI) Different views and pieces of content are needed for: –Bandwidth reduction –Rights management –Usability

Dienst Document Model Views – alternative expression or structural representation of the content encapsulated in the digital object Divs – hierarchically nested structure contained in a view Metadata – support for multiple descriptive formats
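The document model can be pictured as a small data structure. The sketch below is an assumption about how views, divs, and metadata might be held in memory; the class names, the handle, the view name, and the metadata format name are all hypothetical and are not taken from the Dienst implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Div:
    """One node in the hierarchically nested structure of a view."""
    name: str
    children: list = field(default_factory=list)

    def paths(self, prefix=""):
        """Yield a slash-separated path for this div and every nested div."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        yield here
        for child in self.children:
            yield from child.paths(here)


@dataclass
class View:
    """An alternative expression or structural representation of the content."""
    name: str
    divs: list = field(default_factory=list)


@dataclass
class DigitalObject:
    handle: str
    views: dict = field(default_factory=dict)      # view name -> View
    metadata: dict = field(default_factory=dict)   # format name -> record


# Hypothetical object with one view, two nested divs, and one metadata format
body = View("body", [Div("chapter1", [Div("section1")])])
doc = DigitalObject("example/doc1", {"body": body}, {"format-a": {"title": "Example"}})
print(list(doc.views["body"].divs[0].paths()))
```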

Expressing the document model in the protocol Structure – expose the views and structure for the digital object Disseminate – select the structural component (and packaging of it) to disseminate List-Meta-Formats – list available descriptive formats

Protocol Demonstration
http://cs-tr.cs.cornell.edu/Dienst/Repository/4.0/List-Contents?file-after=
http://cs-tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/TR /%23oams/xml
http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Structure/ncstrl.cornell/TR
http://cs-tr.cs.cornell.edu/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR ?part=body
http://cs-tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/TR /body/inline?pageimage=3

Collection Service Periodically polled by each user interface server for –elements of the collection –index servers for the collection (Diagram labels: User Interface Servers, Index Servers)

Deploying Collection Globally. Internet connectivity varies considerably. Good connectivity between nodes often does not correspond to geographic proximity. Connectivity Region – a group of nodes on the network that have good connectivity among themselves, relative to nodes outside of the region.

Connectivity Regions. When possible, route queries within the region. In case of failure, use an alternate either within the region or in a “nearby” region.
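The region rule can be sketched as a simple preference order: try servers in the home region first, then servers in nearby regions. This is an illustrative sketch only; the function name, the server and region names, and the `is_up` health check are hypothetical, and the slide does not say how "nearby" regions are ranked.

```python
def pick_index_server(candidates, home_region, nearby_regions, is_up):
    """Return the first reachable server, preferring the home region.

    candidates: list of (server, region) pairs
    nearby_regions: regions to try, in order, after the home region fails
    is_up: callable that reports whether a server is currently reachable
    """
    for region in [home_region, *nearby_regions]:
        for server, server_region in candidates:
            if server_region == region and is_up(server):
                return server
    return None  # no reachable server in any listed region


# Hypothetical deployment: two servers in the home region, one nearby
servers = [("idx-a", "us-east"), ("idx-b", "us-east"), ("idx-c", "europe")]
down = {"idx-a"}
print(pick_index_server(servers, "us-east", ["europe"], lambda s: s not in down))
```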

Dienst Services (Diagram labels: WWW browser, User Interface, Repository, Index, Repository, QM, NS, Collection; messages: user query, generic search request, specific search request, user document request, URI, document request, Collection metadata)

NCSTRL Digital Library Demonstration

Distributed Searching - Motivation Information infrastructure with distributed overlapping indexes –scalability –connectivity –domain-specificity –intellectual property issues Research Issue: Search Request Routing

Broadcast Distributed Search

Backup Index server replicates all index servers; used by the user interface when the primary is down. (Diagram label: backup index)

Regional Structure (Diagram labels: central collection server, regional collection server, regional merged index server)

Routing Problem: Disjoint Indexes
Query: author=Hopcroft?
Content Summary: Hopcroft – I1, I3; Hartmanis – I3; Tarjan – I1, I2; Wilensky – I2
Index I1: Hopcroft doc8; Tarjan doc9
Index I2: Tarjan doc6; Wilensky doc7
Index I3: Hopcroft doc1, doc2; Hartmanis doc3, doc4
Routed to I1, I3; results: doc8 (I1); doc1, doc2 (I3)
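The disjoint-index example can be worked through in a few lines. This is a sketch under assumptions: the slide lists three groups of entries without labels, and their assignment to I1, I2, and I3 below is inferred from the content summary (it is the only assignment consistent with it). The function names are hypothetical.

```python
# Content summary from the slide: author -> indexes that hold that author
content_summary = {
    "Hopcroft": ["I1", "I3"],
    "Hartmanis": ["I3"],
    "Tarjan": ["I1", "I2"],
    "Wilensky": ["I2"],
}

# Index contents; the assignment to I1/I2/I3 is inferred, not stated on the slide
indexes = {
    "I1": {"Hopcroft": ["doc8"], "Tarjan": ["doc9"]},
    "I2": {"Tarjan": ["doc6"], "Wilensky": ["doc7"]},
    "I3": {"Hopcroft": ["doc1", "doc2"], "Hartmanis": ["doc3", "doc4"]},
}


def route(author):
    """Use the content summary to pick only the indexes worth querying."""
    return content_summary.get(author, [])


def search(author):
    """Query each routed index and collect the results per index."""
    return {name: indexes[name].get(author, []) for name in route(author)}


print(route("Hopcroft"))
print(search("Hopcroft"))
```

The query author=Hopcroft is sent to I1 and I3 only, so I2 is never contacted.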

Routing Problem: Replicated Distributed Indexes
Query: author=Hopcroft?
The same entries appear in each replica: Hopcroft doc8; Tarjan doc9; Tarjan doc6; Wilensky doc7

Routing Issues Choice of primary?, secondary?, etc. Fault-tolerance Routing Factors –Performance-based (Cornell and Virginia) –Freshness-based (Stanford) –Cost-based (FORTH) –weighted mix based on user preference
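A weighted mix of routing factors could be sketched as a weighted sum over normalized measurements. Everything here is an assumption for illustration: the slide names the factors but gives no formula, and the field names, the normalization by the worst candidate, and the example numbers are hypothetical.

```python
FACTORS = ("response_time", "staleness", "cost")  # lower is better for all three


def rank_indexers(indexers, weights):
    """Order replicated indexers from best to worst by weighted score.

    indexers: name -> {"response_time": ..., "staleness": ..., "cost": ...}
    weights: factor name -> user preference weight (missing factors count as 0)
    """
    scores = {name: 0.0 for name in indexers}
    for factor in FACTORS:
        # Normalize by the worst value so factors with different units can mix
        worst = max(stats[factor] for stats in indexers.values()) or 1.0
        for name, stats in indexers.items():
            scores[name] += weights.get(factor, 0.0) * stats[factor] / worst
    return sorted(scores, key=scores.get)


# Hypothetical measurements for two replicas of the same index
replicas = {
    "fast-but-stale": {"response_time": 0.2, "staleness": 10.0, "cost": 1.0},
    "slow-but-fresh": {"response_time": 1.0, "staleness": 1.0, "cost": 1.0},
}
print(rank_indexers(replicas, {"response_time": 1.0}))
print(rank_indexers(replicas, {"staleness": 1.0}))
```

The first entry of the returned list plays the role of the primary and the next the secondary, so a change in user preference reorders the choice without changing the indexers.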

Components of Replicated Routing Problem Metadata Issue: metadata made available by indexer to aid in routing Metadata Distribution Issue: topology of metadata repositories Decision Issue: routing decision algorithms Fault-tolerance: use of backup indexers

Distributed Metadata for Query Routing (Diagram label: central metadata store)

Performance-based Routing (Diagram labels: present-T, average response time, timed low pass filter, predicted response time) New = low pass filter(T, actual response time, old)
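The slide does not define the filter, so the following is only one plausible reading: an exponential low pass filter whose smoothing weight depends on the time elapsed since the last update, with T as the time constant. The `elapsed` parameter and the exponential form are assumptions, not the method used in the original system.

```python
import math


def low_pass_filter(T, actual, old, elapsed):
    """Blend a new response-time measurement into the running prediction.

    T: time constant of the filter (same units as elapsed)
    actual: response time just measured
    old: previous predicted response time
    elapsed: time since the previous update (assumed extra input)
    """
    # The longer since the last update, the less the old prediction is trusted
    alpha = 1.0 - math.exp(-elapsed / T)
    return old + alpha * (actual - old)


# Hypothetical measurements: (seconds since last update, observed response time)
predicted = 1.0
for elapsed, actual in [(5.0, 2.0), (5.0, 2.0), (60.0, 0.5)]:
    predicted = low_pass_filter(30.0, actual, predicted, elapsed)
print(predicted)
```

With this form a single slow response moves the prediction only slightly when updates are frequent, while a measurement taken long after the previous one largely replaces it.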