June 3-6, 2003, E-Society, Lisbon

Automatic Metadata Discovery from Non-cooperative Digital Libraries
R. Shi, K. Maly, M. Zubair
Department of Computer Science, Old Dominion University

Overview
- Introduction
- Background
- Architecture & Design
- Experimentation & Implementation
- Conclusion & Future Work

Introduction
- Many approaches to DL interoperation: harvesting and distributed search
- Earlier work on LFDL (Lightweight Federated Digital Library)
  - Universal search interface
  - DL specification in DLDL
  - DL registration
  - Query mapping
- Limitations: organizing the result set, and performance
- Enhanced LFDL: interactive, user-centered search

Background
- Levels of interoperability
  - Technical: protocol, format
  - Content: data, metadata, messages
  - Organizational: rules for access, payment, authentication
- General models
  - Federation: complete, but requires more from data providers
  - Harvesting: some effort from both data and service providers
  - Gathering: little required from data providers

LFDL Introduction
- General principles
  - Aimed at non-cooperating digital libraries
  - Distributed search
  - Lightweight for both data and service providers
- Basic solution
  - DL specification definition language
  - Dynamic DL metadata registration
  - Universal interface
  - Dynamic query mapping
  - Local repository
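
Dynamic query mapping can be illustrated with a minimal sketch. It assumes that a DL specification supplies a base URL and a table from universal search fields to the remote library's own parameter names; the field names, parameter names, `EXAMPLE_DL_SPEC`, `map_query`, and the placeholder URL are all hypothetical and not taken from LFDL or DLDL.

```python
from urllib.parse import urlencode

# Hypothetical specification of one remote DL: where its search form posts to
# and how universal search fields map onto its own query parameter names.
EXAMPLE_DL_SPEC = {
    "base_url": "http://dl.example.org/search",  # placeholder, not a real endpoint
    "fields": {"title": "ti", "creator": "au", "keyword": "kw"},
}


def map_query(universal_query: dict, spec: dict) -> str:
    """Translate a universal query into a search URL for one remote DL.

    Fields the remote DL does not support, or that are empty, are dropped.
    """
    params = {
        spec["fields"][field]: value
        for field, value in universal_query.items()
        if field in spec["fields"] and value
    }
    return spec["base_url"] + "?" + urlencode(params)
```

With this spec, `map_query({"title": "neural", "creator": "Smith", "date": "2001"}, EXAMPLE_DL_SPEC)` yields `http://dl.example.org/search?ti=neural&au=Smith`; the `date` field is dropped because the hypothetical remote DL does not declare it.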

Limitations and Issues
- Limited service usability
  - Search results are presented in a flat structure
  - Metadata are needed to present rich search results
- Performance
  - Caching is neither flexible nor efficient
  - A local metadata repository is needed to generate an intelligent cache
- Solution: retrieve metadata from remote digital libraries

Metadata Retrieval: Approach
- Available metadata sources
  - List page of search results
  - Detail page of a selected document/record
- Approach
  - Define a specification of how metadata are presented in those pages
  - Use Dublin Core as the common metadata mapping set
  - Develop a metadata parser to extract metadata
  - Store parsed metadata in a local repository
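
To make the Dublin Core mapping step concrete, here is a minimal sketch. The 15 element names are those of the Dublin Core Metadata Element Set 1.1; the site labels in `LABEL_TO_DC` and the `to_dublin_core` function are hypothetical stand-ins for whatever mapping a real DL specification would declare.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical field labels as one remote DL might print them on its pages.
LABEL_TO_DC = {
    "Title": "title",
    "Author(s)": "creator",
    "Year": "date",
    "Abstract": "description",
    "Keywords": "subject",
}


def to_dublin_core(raw_fields: dict, label_map: dict) -> dict:
    """Map site-specific (label, value) pairs onto Dublin Core elements.

    Labels with no mapping, or mapped to a non-DC name, are ignored.
    Each element holds a list because DC elements are repeatable.
    """
    record = {}
    for label, value in raw_fields.items():
        element = label_map.get(label.strip())
        if element is None or element not in DC_ELEMENTS:
            continue
        record.setdefault(element, []).append(value.strip())
    return record
```

Keeping every element as a list reflects that Dublin Core elements are repeatable, which matters for fields such as `creator`.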

Architecture

Metadata Retrieval Workflow
- Define metadata parsing rules in the DL specification in DLDL
- Start parsing when search results arrive from the remote DL
- Parse the list page
- If metadata are available at the record level, parse the record page of each document in the result list
- Merge the metadata and present them to users
- Save the metadata to a local repository
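
The workflow can be sketched as a single driver function. This is a minimal sketch under stated assumptions: the page fetcher and the two parsers are passed in as callables (hypothetical stand-ins for the rule-driven parsers built from a DL specification), each record is keyed by its record-page URL, and the local repository is a plain dictionary.

```python
from typing import Callable, Optional

Record = dict[str, list[str]]  # Dublin Core element -> values


def retrieve_metadata(
    list_page: str,
    parse_list: Callable[[str], list[tuple[str, Record]]],
    fetch: Callable[[str], str],
    parse_record: Optional[Callable[[str], Record]],
    repository: dict[str, Record],
) -> list[Record]:
    """Parse a result list page, enrich each hit from its record page when
    record-level rules exist, then merge and store the metadata locally."""
    results = []
    for url, list_meta in parse_list(list_page):
        record = {element: list(values) for element, values in list_meta.items()}
        if parse_record is not None:  # record-level metadata is available
            for element, values in parse_record(fetch(url)).items():
                merged = record.setdefault(element, [])
                for value in values:
                    if value not in merged:  # skip values already seen on the list page
                        merged.append(value)
        repository[url] = record  # save to the local repository
        results.append(record)    # returned for presentation to the user
    return results
```

Passing `None` for `parse_record` models a DL whose specification defines only list-page rules, so no record pages are fetched.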

Metadata Parsing Rules Definition
- Extended DLDL
- Two levels: list page and record page
- String parsing: separate a raw string into segments corresponding to metadata fields
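
String parsing can be illustrated with a delimiter-driven segmenter. This is a sketch, not the extended DLDL itself: it assumes each rule names a Dublin Core element, the delimiter that ends its segment, and an optional separator for multi-valued fields. `SegmentRule`, `segment`, and the sample citation layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SegmentRule:
    element: str                 # Dublin Core element that receives the segment
    end: Optional[str]           # delimiter ending the segment; None = rest of string
    split: Optional[str] = None  # separator for multi-valued fields, if any


def segment(raw: str, rules: list) -> dict:
    """Cut a raw list-page entry into metadata fields, rule by rule."""
    fields = {}
    rest = raw
    for rule in rules:
        if rule.end is None:
            piece, rest = rest, ""
        else:
            idx = rest.find(rule.end)
            if idx < 0:
                break  # the entry does not match the expected layout; stop here
            piece, rest = rest[:idx], rest[idx + len(rule.end):]
        values = piece.split(rule.split) if rule.split else [piece]
        cleaned = [v.strip() for v in values if v.strip()]
        if cleaned:
            fields[rule.element] = cleaned
    return fields


# Hypothetical list-page layout: "Authors (Year) Title. Source"
CITATION_RULES = [
    SegmentRule("creator", " (", split=";"),
    SegmentRule("date", ") "),
    SegmentRule("title", ". "),
    SegmentRule("source", None),
]
```

For example, `segment("Smith, J.; Doe, A. (2001) Learning in networks. Journal of Cognition.", CITATION_RULES)` returns `{"creator": ["Smith, J.", "Doe, A."], "date": ["2001"], "title": ["Learning in networks"], "source": ["Journal of Cognition."]}`. A title that itself contains ". " would be cut short, which is one reason such rules have to be tuned per site.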

Part of the DTD for the DL parsing rules specification

Sample Specification for CogPrints
[Specification listing not recoverable; surviving fragments refer to HTML meta tags with name="DC.title" and name="DC.creator", and to the tokens ";" and "CREATOR".]
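
The surviving fragments suggest that CogPrints record pages expose Dublin Core metadata in HTML meta tags such as `name="DC.title"` and `name="DC.creator"`. Assuming that layout, a record-page parser could be as small as the following sketch built on Python's standard `html.parser`; the class and function names are hypothetical, and the sample HTML in the usage note is invented for illustration.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect Dublin Core values from <meta name="DC.x" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also reach this method through the
        # default handle_startendtag implementation.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.startswith("DC.") and content:
            element = name[3:].lower()  # "DC.title" -> "title"
            self.fields.setdefault(element, []).append(content.strip())


def parse_dc_meta(html: str) -> dict:
    """Return {element: [values]} for every DC.* meta tag in the page."""
    parser = DCMetaParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Given `<meta name="DC.creator" content="Smith, J." />` followed by `<meta name="DC.creator" content="Doe, A." />`, both authors are kept in order under `creator`, while unrelated tags such as `name="robots"` are ignored.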

Local Metadata Repository
- All searches are served locally first
- A secondary in-memory metadata cache for better performance and system reliability
- Cache grouped by metadata instead of by query string
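
One way to read "cache grouped by metadata instead of query string" is an in-memory index from metadata terms to cached records, so that different query strings touching the same records can reuse them. The sketch below assumes that reading. `MetadataCache`, `serve_query`, word-level matching on whitespace-split values, LRU eviction, and the `remote_search` callback are illustrative choices, not the system's documented design.

```python
from collections import OrderedDict
from typing import Callable, Iterator

Record = dict[str, list[str]]  # Dublin Core element -> values


def _keys(record: Record) -> Iterator[tuple[str, str]]:
    """Yield (element, lower-cased word) pairs used as index keys."""
    for element, values in record.items():
        for value in values:
            for word in value.lower().split():
                yield element, word


class MetadataCache:
    """In-memory LRU cache of records with an (element, word) -> ids index."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self.records: OrderedDict[str, Record] = OrderedDict()
        self.index: dict[tuple[str, str], set[str]] = {}

    def put(self, rid: str, record: Record) -> None:
        if rid in self.records:
            self._evict(rid)
        self.records[rid] = record
        for key in _keys(record):
            self.index.setdefault(key, set()).add(rid)
        while len(self.records) > self.capacity:
            self._evict(next(iter(self.records)))  # least recently used first

    def _evict(self, rid: str) -> None:
        record = self.records.pop(rid)
        for key in _keys(record):
            ids = self.index.get(key)
            if ids is not None:
                ids.discard(rid)
                if not ids:
                    del self.index[key]

    def search(self, element: str, word: str) -> list[Record]:
        ids = sorted(self.index.get((element, word.lower()), set()))
        for rid in ids:
            self.records.move_to_end(rid)  # mark as recently used
        return [self.records[rid] for rid in ids]


def serve_query(
    element: str,
    word: str,
    cache: MetadataCache,
    repository: dict[str, Record],
    remote_search: Callable[[str, str], list[tuple[str, Record]]],
) -> list[Record]:
    """Serve a query from the cache, then the local repository, then the remote DL."""
    cached = cache.search(element, word)
    if cached:
        return cached
    word = word.lower()
    hits = [
        (rid, rec) for rid, rec in repository.items()
        if any(word in value.lower().split() for value in rec.get(element, []))
    ]
    if not hits:  # nothing stored locally: fall back to the remote DL
        hits = remote_search(element, word)
        repository.update(hits)
    for rid, rec in hits:
        cache.put(rid, rec)
    return [rec for _, rec in hits]
```

Because the index is keyed by metadata terms rather than by the query string, a later query for `networks` can be answered from a record cached by an earlier query for `Learning`, without contacting the remote DL.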

Results

Conclusion and Future Work
- Populate the metadata repository more efficiently
- Richer functions and more user-friendly presentation of results
- Cache maintenance: size, consistency…