Presentation is loading. Please wait.

Presentation is loading. Please wait.

Federated Digital Library Architecture and Distributed Resource Discovery Carl Lagoze CS 502 2000-03-09.

Similar presentations


Presentation on theme: "Federated Digital Library Architecture and Distributed Resource Discovery Carl Lagoze CS 502 2000-03-09."— Presentation transcript:

1 Federated Digital Library Architecture and Distributed Resource Discovery Carl Lagoze CS 502 2000-03-09

2 Component-Based Digital Library Architecture Building digital libraries out of a toolkit: –modular services –protocol interface to each service –mechanisms to combine services into digital library collections Advantages: –ability to build digital libraries that conform to specific service and collection needs –ability to add new services that enhance functionality

3 7 Dienst is a protocol and reference implementation of a distributed digital library service where a network of services provide World Wide Web browser access, uniform search over distributed indexes, and access to structured documents.

4 Defining the services Repository – deposit, storage, and access to structured documents. Index – process queries on documents and returned handles Query Mediator – route queries to appropriate indexes Collection – define services and content in logical collections User Interface – human-oriented front- end for services.

5 Why a service based protocol? Expose the operational semantics of the services through an API, to permit flexible integration of the services, and use of the services by other clients/consumers/services.

6 Defining the protocol Structured messages –Service –Version –Verb –Arguments Template /Dienst/ / / [?/] Example /Dienst/Repository/4.0/Formats/ncstrl.cornell/TR94-1418

7 Why a Document Model? “Documents” in current web are both: –Unstructured (GET) –Chaotic (CGI) Different views and pieces of contents are needed for: –Bandwidth reduction –Rights management –Usability

8 Dienst Document Model Views – alternative expression or structural representation of the content encapsulated in the digital object Divs – hierarchically nested structure contained in a view Metadata – support for multiple descriptive formats

9 Expressing the document model in the protocol Structure – expose the views and structure for the digital object Disseminate – select the structural component (and packaging of it) to disseminate List-Meta-Formats – list available descriptive formats

10 Protocol Demonstration http://cs-tr.cs.cornell.edu/Dienst/Repository/4.0/List- Contents?file-after=2000-01-01http://cs-tr.cs.cornell.edu/Dienst/Repository/4.0/List- Contents?file-after=2000-01-01 http://cs- tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/ TR2000-1780/%23oams/xmlhttp://cs- tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/ TR2000-1780/%23oams/xml http://cs- tr.cs.cornell.edu/Dienst/Repository/2.0/Structure/ncstrl.cornell/T R2000-1780http://cs- tr.cs.cornell.edu/Dienst/Repository/2.0/Structure/ncstrl.cornell/T R2000-1780 http://cs- tr.cs.cornell.edu/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR 2000-1780?part=bodyhttp://cs- tr.cs.cornell.edu/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR 2000-1780?part=body http://cs- tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/ TR2000-1780/body/inline?pageimage=3http://cs- tr.cs.cornell.edu/Dienst/Repository/1.0/Disseminate/ncstrl.cornell/ TR2000-1780/body/inline?pageimage=3

11 11 Collection Service Periodically polled by each user interface server for –elements of the collection –index servers for the collection User Interface Servers Index Servers

12 Deploying Collection Globally Internet connectivity varies considerably Good connectivity between nodes often does not correspond to geographic proximity Connectivity Region - a group of nodes on the network that among them have good connectivity, relative to nodes outside of the region.

13 Connectivity Regions When possible route queries within region In case of failure, use an alternate either within the region or in a “nearby” region

14 Dienst Services WWW browser User Interface Repository Index Repository QM user query generic search request specific search request NS user document request URI document request Collection Collection metadata

15 NCSTRL Digital Library Demonstration http://cs-tr.cs.cornell.edu

16 Distributed Searching - Motivation Information infrastructure with distributed overlapping indexes –scalability –connectivity –domain-specificity –intellectual property issues Research Issue: Search Request Routing

17 17 Broadcast Distributed Search

18 18 Backup Index server replicates all index servers used by user interface when primary is down backup index

19 19 Regional Structure central collection server regional collection server regional merged index server

20 Routing Problem Disjoint Indexes Hopcroft I1, I3 Hartmanis I3 Tarjan I1, I2 Wilensky I2 I1I2I3 I1,I3 doc8 doc1, doc2 Content Summary author=Hopcroft? Hopcroft doc8 Tarjan doc9 Tarjan doc6 Wilensky doc7 Hopcroft doc1, doc2 Hartmanis doc3, doc4

21 Routing Problem Replicated Distributed Indexes author=Hopcroft? Hopcroft doc8 Tarjan doc9 Tarjan doc6 Wilensky doc7 Hopcroft doc8 Tarjan doc9 Tarjan doc6 Wilensky doc7

22 Routing Issues Choice of primary?, secondary?, etc. Fault-tolerance Routing Factors –Performance-based (Cornell and Virginia) –Freshness-based (Stanford) –Cost-based (FORTH) –weighted mix based on user preference

23 Components of Replicated Routing Problem Metadata Issue: metadata made available by indexer to aid in routing Metadata Distribution Issue: topology of metadata repositories Decision Issue: routing decision algorithms Fault-tolerance: use of backup indexers

24 Distributed Metadata for Query Routing central metadata store

25 Performance-based Routing 8 present- T Average response time Timed low pass filter Predicted response time New = low pass filter(T, actual response time, old )


Download ppt "Federated Digital Library Architecture and Distributed Resource Discovery Carl Lagoze CS 502 2000-03-09."

Similar presentations


Ads by Google