MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries. Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs.

1 MARIAN: Searching and Querying Across Heterogeneous Federated Digital Libraries. Marcos André Gonçalves, Robert K. France, Edward A. Fox, Tamas E. Doszkocs. Work performed at Virginia Tech, Blacksburg, VA, USA. Support provided in part by NSF and the National Library of Medicine.

2 JCDL 2001: First Joint ACM/IEEE Conference on Digital Libraries (plus NSF DLI-2 PI meeting), http://www.jcdl.org, June 24-28, 2001, in Roanoke, VA. Conference Committee: General Chair: Edward A. Fox, Virginia Tech; Program Chair: Christine Borgman, UCLA; Treasurer: Neil Rowe, Naval Postgraduate School; Posters Chair: Craig Nevill-Manning, Rutgers U. …

3 Outline: NDLTD; Harvesting Strategies and the OAI; MARIAN Middleware; Generating Digital Libraries with 5SL; Future Directions

4 NDLTD (1 of 3) Context: Networked Digital Library of Theses and Dissertations, www.ndltd.org, www.theses.org. Please join! Submit your works (and your students' works)! An international federation of universities, libraries, and supporting institutions (e.g., VTLS union catalog). Extremely heterogeneous: autonomy of management and decentralization; disparate protocols, metadata, repositories (e.g., UMI, OCLC's WorldCat), languages, encodings, user characteristics, and preferences.

5 NDLTD (2 of 3) Worldwide organization: educational/social context. National/regional projects in Australia, Catalunya, Germany, India, Latin America (UNESCO/OAS/ISTEC), South Africa (Mellon), USA (including OhioLINK), … An international conference (225 attendees in March 2000, with more expected at the next one, at Caltech). A steering committee representing supporting groups as well as the hundreds of universities.

6 NDLTD (3 of 3) Unique collection: discipline/document context. Multilingual and multimedia content. Large, book-length documents. Full content in several formats (XML, PDF, etc.). Large numbers of bibliographic references. Several sets of metadata of varying quality that can fit the Open Archives Initiative (www.openarchives.org).

7 Harvesting Strategies: Harvesting vs. Federated Search; Harvesting plus Federated Search plus Local Collections; the NDLTD Union Collection; Multiple Harvesting Protocols: Harvest™ System, Z39.50, Dienst, OAI

8 Union Collection Architecture

9 Open Archives Initiative (OAI) Interoperability standards released Jan/Feb. Data providers and service providers. Metadata Harvesting Protocol: a unique identifier (URN) for each record; a datestamp for each record recording when it was last modified, created, or deleted; an HTTP server with scripting capabilities; 6 service requests (verbs): Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord, ListRecords.
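Because every OAI service request is a plain HTTP GET naming a verb plus arguments, the request side is easy to sketch. The Java class below is an illustration only, not MARIAN code: the class name `OaiRequest` and the base URL `http://example.org/oai` are hypothetical, while `metadataPrefix`, `from`, `identifier`, and the `oai_dc` format are standard OAI argument and format names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

/** Builds OAI harvesting requests as HTTP GET URLs (sketch; the base URL is hypothetical). */
final class OaiRequest {
    // The six service requests (verbs) listed on the slide.
    static final Set<String> VERBS = Set.of(
            "Identify", "ListMetadataFormats", "ListSets",
            "ListIdentifiers", "GetRecord", "ListRecords");

    static String build(String baseUrl, String verb, Map<String, String> args) {
        if (!VERBS.contains(verb)) {
            throw new IllegalArgumentException("unknown verb: " + verb);
        }
        // Every request has the shape baseUrl?verb=...&key=value&...
        StringJoiner query = new StringJoiner("&", baseUrl + "?", "");
        query.add("verb=" + verb);
        for (Map.Entry<String, String> e : args.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] unused) {
        // Incremental harvest: Dublin Core records changed on or after a datestamp.
        Map<String, String> args = new LinkedHashMap<>();
        args.put("metadataPrefix", "oai_dc");
        args.put("from", "2001-01-01");
        // prints http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2001-01-01
        System.out.println(build("http://example.org/oai", "ListRecords", args));
    }
}
```

Rejecting unknown verbs up front catches misspellings such as "ListMetaFormats" before any request reaches a repository.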

10 Low-barrier interoperability umbrella [figure courtesy of Herbert Van de Sompel; labels: metadata, OPAC, image, FTXT, A&I, e-print]

11 OAI harvesting tools [figure courtesy of Herbert Van de Sompel; labels: service provider (harvester), data provider (repositories), records with Identifier, Datestamp, Set]

12 OAI harvesting tools [figure courtesy of Herbert Van de Sompel; labels: service provider (harvester), data provider (repositories)]. Supporting protocol requests: Identify, ListMetadataFormats, ListSets. Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord.
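Because each record carries an identifier and a datestamp, a service provider can harvest incrementally: after a first full harvest, it asks only for records stamped on or after its previous harvest, replaces changed records by identifier, and drops deleted ones. The sketch below simulates both sides in memory; `OaiRecord`, `DataProvider`, and `Harvester` are hypothetical names for illustration, not OAI or MARIAN code, and a real harvester would send ListRecords over HTTP and parse the XML response.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** One record as a data provider exposes it: identifier, datestamp, deletion flag, metadata. */
final class OaiRecord {
    final String identifier;
    final LocalDate datestamp;
    final boolean deleted;
    final String metadata;

    OaiRecord(String identifier, LocalDate datestamp, boolean deleted, String metadata) {
        this.identifier = identifier;
        this.datestamp = datestamp;
        this.deleted = deleted;
        this.metadata = metadata;
    }
}

/** Hypothetical in-memory data provider: one current version of each record per identifier. */
final class DataProvider {
    private final Map<String, OaiRecord> repository = new LinkedHashMap<>();

    void put(OaiRecord r) { repository.put(r.identifier, r); }

    /** Mimics ListRecords with an inclusive 'from' argument; null means "everything". */
    List<OaiRecord> listRecords(LocalDate from) {
        List<OaiRecord> out = new ArrayList<>();
        for (OaiRecord r : repository.values()) {
            if (from == null || !r.datestamp.isBefore(from)) {
                out.add(r);
            }
        }
        return out;
    }
}

/** Service-provider side: a union collection plus the date of the last harvest. */
final class Harvester {
    final Map<String, String> union = new TreeMap<>();
    private LocalDate lastHarvest; // null until the first, full harvest

    void harvest(DataProvider provider, LocalDate harvestDate) {
        for (OaiRecord r : provider.listRecords(lastHarvest)) {
            if (r.deleted) {
                union.remove(r.identifier);          // propagate deletions
            } else {
                union.put(r.identifier, r.metadata); // new or updated record
            }
        }
        lastHarvest = harvestDate; // next harvest asks only for newer datestamps
    }
}
```

The sketch uses the harvester's own date as the next `from` value; a production harvester would more safely reuse the date reported by the repository, since the two clocks may disagree.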

13 Design Features: Combined Harvesting, Federated Search, and Local Collections; Object-Oriented Information Graph Representation; 5S Model and 5SL Specification Language

14 MARIAN Middleware. Flexible Representation Model: Information Graph; Class Hierarchies; Weights and Weighted Sets (with lazy evaluation). Class-Based Search: Unified Searcher API. Combining Heterogeneous Information: Structural Matching; Synthetic Superclasses.
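The slide does not say how lazy evaluation of weighted sets works in MARIAN. One common way to get it, shown here as an assumption, is to describe a set by the computation that produces it and run that computation only when a caller first asks for the weights; combining sets then builds a new deferred computation instead of doing any work. `LazyWtdSet` and `orMax` are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

/** A weighted set whose members are computed only when first needed, then cached (sketch). */
final class LazyWtdSet {
    private final Supplier<Map<String, Double>> compute;
    private Map<String, Double> cached; // null until the set is forced

    LazyWtdSet(Supplier<Map<String, Double>> compute) {
        this.compute = compute;
    }

    /** Forces evaluation on the first call; later calls reuse the cached weights. */
    Map<String, Double> weights() {
        if (cached == null) {
            cached = compute.get();
        }
        return cached;
    }

    /** Weighted-maximum union; neither operand is evaluated until the result is forced. */
    LazyWtdSet orMax(LazyWtdSet other) {
        return new LazyWtdSet(() -> {
            Map<String, Double> out = new TreeMap<>(this.weights());
            other.weights().forEach((id, w) -> out.merge(id, w, Math::max));
            return out;
        });
    }
}
```

A query plan built from many such combinations costs nothing until a result is actually displayed, and a subexpression that is never consulted is never computed.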

15 Information Graph Model (1/2) Each information object is a node. Structure is exposed through links. Features of interest can become nodes or can remain hidden within node-class search methods.

16 Information Graph Model (2/2)
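A minimal data-structure reading of the information graph model: nodes carry a class, typed links expose structure, and features not promoted to nodes stay inside the node. The class and link names (`ThesisDissertation`, `Individual`, `HasAuthor`) come from the NDLTD collection view later in the deck; `InfoNode`, `Link`, and `follow` are hypothetical names, and this is a sketch, not MARIAN's representation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A typed, directed link between two information objects. */
final class Link {
    final String type;
    final InfoNode target;

    Link(String type, InfoNode target) {
        this.type = type;
        this.target = target;
    }
}

/** An information object: a node of some class, with outgoing links and hidden features. */
final class InfoNode {
    final String nodeClass;
    final String id;
    // Features not promoted to nodes stay inside the node,
    // reachable only through its class's search methods.
    final Map<String, String> hidden = new HashMap<>();
    final List<Link> links = new ArrayList<>();

    InfoNode(String nodeClass, String id) {
        this.nodeClass = nodeClass;
        this.id = id;
    }

    InfoNode link(String type, InfoNode target) {
        links.add(new Link(type, target));
        return this;
    }

    /** Follows all outgoing links of one type (the exposed structure). */
    List<InfoNode> follow(String type) {
        List<InfoNode> out = new ArrayList<>();
        for (Link l : links) {
            if (l.type.equals(type)) {
                out.add(l.target);
            }
        }
        return out;
    }
}
```

Here an author becomes a node because searches traverse to it, while a title can stay hidden and be matched by the thesis class's own text-search method.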

17 Class-Based Search. Common search methods: Text; Link / Weighted Link; Node in Context. Common searcher operations: Match Best (weighted maximum); Match Most (summative union).
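The two searcher operations differ only in how weights for the same object are combined across result sets: Match Best keeps the maximum, Match Most adds. The sketch below represents a weighted set as a map from object id to weight; the `Combine` class and this representation are assumptions for illustration. If document d1 scores 0.5 in one set and 0.25 in another, Match Best gives it 0.5 and Match Most gives it 0.75.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Two ways to combine weighted result sets (object id -> weight). */
final class Combine {
    /** Match Best: each object keeps its highest weight across the sets (weighted maximum). */
    static Map<String, Double> matchBest(List<Map<String, Double>> sets) {
        Map<String, Double> out = new TreeMap<>();
        for (Map<String, Double> s : sets) {
            s.forEach((id, w) -> out.merge(id, w, Math::max));
        }
        return out;
    }

    /** Match Most: weights are added, so objects found by many sets rise (summative union). */
    static Map<String, Double> matchMost(List<Map<String, Double>> sets) {
        Map<String, Double> out = new TreeMap<>();
        for (Map<String, Double> s : sets) {
            s.forEach((id, w) -> out.merge(id, w, Double::sum));
        }
        return out;
    }
}
```

Match Best suits alternative evidence for the same thing (e.g., two title fields); Match Most rewards objects that satisfy many query parts at once.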

18 Class-Based Search

    public interface ClassManager {
        public WtdObjSet match(InfoDesc description);
        public boolean isInClass(FullID id);
        public Object idToObject(FullID id);
        public Vector idsToObjects(Vector ids);
    }
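The slide gives only the interface. To show how a class manager might fit together, the sketch below supplies minimal hypothetical definitions of `FullID`, `InfoDesc`, and `WtdObjSet` (the slide names these types but does not define them) and a toy `TitleClassManager` that weights each object by the fraction of query terms found in its title. The weighting rule is an assumption chosen for brevity, not MARIAN's.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.Vector;

/** Hypothetical stand-in: an object id qualified by its class. */
final class FullID {
    final String className;
    final String localId;

    FullID(String className, String localId) {
        this.className = className;
        this.localId = localId;
    }

    @Override public boolean equals(Object o) {
        return o instanceof FullID
                && ((FullID) o).className.equals(className)
                && ((FullID) o).localId.equals(localId);
    }

    @Override public int hashCode() { return Objects.hash(className, localId); }
}

/** Hypothetical stand-in: a query description as a set of lower-case terms. */
final class InfoDesc {
    final Set<String> terms;
    InfoDesc(Set<String> terms) { this.terms = terms; }
}

/** Hypothetical stand-in: a weighted set of object ids. */
final class WtdObjSet {
    final Map<FullID, Double> weights = new HashMap<>();
}

/** The interface as shown on the slide. */
interface ClassManager {
    public WtdObjSet match(InfoDesc description);
    public boolean isInClass(FullID id);
    public Object idToObject(FullID id);
    public Vector idsToObjects(Vector ids);
}

/** Toy manager for one class of objects, searched by title words. */
final class TitleClassManager implements ClassManager {
    private final String className;
    private final Map<FullID, String> titles = new HashMap<>();

    TitleClassManager(String className) { this.className = className; }

    void add(String localId, String title) { titles.put(new FullID(className, localId), title); }

    /** Weight = fraction of query terms that occur among the title's words. */
    public WtdObjSet match(InfoDesc description) {
        WtdObjSet result = new WtdObjSet();
        for (Map.Entry<FullID, String> e : titles.entrySet()) {
            Set<String> words = new HashSet<>(Arrays.asList(e.getValue().toLowerCase().split("\\W+")));
            long hits = description.terms.stream().filter(words::contains).count();
            if (hits > 0) {
                result.weights.put(e.getKey(), (double) hits / description.terms.size());
            }
        }
        return result;
    }

    public boolean isInClass(FullID id) { return titles.containsKey(id); }

    public Object idToObject(FullID id) { return titles.get(id); }

    public Vector idsToObjects(Vector ids) {
        Vector<Object> out = new Vector<>();
        for (Object id : ids) {
            out.add(idToObject((FullID) id));
        }
        return out;
    }
}
```

Because callers see only `ClassManager`, a searcher can send the same `match` request to managers for theses, individuals, or subjects without knowing how each class stores or scores its objects.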

19 Class-Based Search

20 Combining Sources of Information. Structural matching extends weighted retrieval to include "best match to document structure." Recursive, extensible collection views: a simple interface to complex collections; a common interface to diverse collections; a weighted interface to collections of varying quality.
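One way to read "a weighted interface to collections of varying quality": a view attribute such as title maps onto several source-specific fields, each weighted by how much that source is trusted, and a match takes the best weighted field. The field names `dc.title` and `crawlerTitle` appear in the NDLTD collection view; the weights 1.0 and 0.8 are hypothetical (the view uses weights between 0.8 and 1.0, but which link carries which is not recoverable here), and the substring test stands in for real text matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A view attribute (e.g. "title") mapped onto source-specific fields with quality weights. */
final class ViewAttribute {
    final String name;
    final Map<String, Double> fieldWeights = new LinkedHashMap<>();

    ViewAttribute(String name) { this.name = name; }

    ViewAttribute map(String field, double weight) {
        fieldWeights.put(field, weight);
        return this;
    }

    /**
     * Best weighted match of a term against whichever mapped fields a record has:
     * the highest field weight among fields containing the term, or 0.0 if none do.
     */
    double score(Map<String, String> record, String term) {
        double best = 0.0;
        String needle = term.toLowerCase();
        for (Map.Entry<String, Double> e : fieldWeights.entrySet()) {
            String value = record.get(e.getKey());
            if (value != null && value.toLowerCase().contains(needle)) {
                best = Math.max(best, e.getValue());
            }
        }
        return best;
    }
}
```

A record harvested with Dublin Core metadata and one gathered by a crawler can then be searched through the same "title" attribute, with the crawler's less reliable field counting somewhat less.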

21 NDLTD Collection View (part) [figure: weighted view graph linking ThesisDissertation attributes (title, description, HasAuthor, HasSubject, with Individual and Subject classes) to source fields such as dc.title, crawlerTitle, dc.description, crawlerDescription, body, Dc.creator/HasDcCreator, HasCrawlerAuthor, Dc.Subject/HasDcSubject, Headings/HasHeadings, Keywords/HasKeywords, and subclasses such as PhysDis-ETD (SOIF); link weights range from 0.8 to 1.0]

22 5S Model for Digital Libraries (1/2) Formal model: Streams, Structures, Spaces, Services, Societies

23 5S Model for Digital Libraries (2/2)
Formal model: NDLTD / MARIAN example
Streams: Document (presentable, indexable information object)
Structures: Weighted Set (e.g., of results to a match operation)
Spaces: Collection Graph; Inheritance Lattice; Measure Space
Services: Adaptive Search; Query History Maintenance
Societies: Library End-Users; DL Builders

24 5SL Generates Digital Library (Components)

25 Generating Digital Libraries: XML

26 Interoperability with 5S and 5SL: reductionist / constructivist approach; compositional mappings between DLs; composition of S-based constructs; mapping language

27 Student Projects to Integrate: schedule-driven harvester; SDI / filtering for NDLTD; MARIAN-Phronesis (Spanish, Monterrey), and work with German (Oldenburg / DFG), Portuguese, Chinese, Japanese, Korean; TREC data formatted for loading

28 Future Work: fusion on a hybrid architecture; incorporation of belief networks; using 5SL to generate wrappers; new services/functionalities: personalization (e.g., history, folders), visualization (e.g., Envision applet); integration with PetaPlex (100 nodes, 2.5 TB disk capacity, > 300 Mbps to the campus backbone, Sornil inversion)

29 Conclusions: NDLTD provides a real, fertile DL testbed. Harvesting strategies and the OAI. MARIAN middleware: graphs, classes, views. Generating digital libraries with 5SL. Future: high-performance services, experimental comparisons.
