Automatic Metadata Discovery from Non-cooperative Digital Libraries
By Ron Shi, Kurt Maly, Mohammad Zubair
IADIS International Conference, May 2003

Automatic Metadata Discovery and Retrieval (July 6, 2010)

Table of Contents
- Introduction
- Motivation
- Problem
- Solution
- Approach
- Challenges
- Automated Metadata Discovery and Retrieval
- Future Work
- Conclusion
- Questions
- References

Introduction
- What is a digital library?

Motivation
- A growing number of digital libraries are available on the Internet
- Each one is implemented independently of the others
- Provide an interoperable service across these heterogeneous systems

Problems
- Data providers are independent and follow no common protocol
- A digital library does not provide its metadata or a way to obtain it
- Each digital library defines metadata in its own way
- Each digital library can display any subset of its metadata at its own discretion
- Each digital library has its own rules for which metadata to display and in what form

[Figure: sample search results from the ACM DL]

[Figure: sample result list page and record page from the Cogprint DL]

Proposed Solutions
- Lightweight Federated Digital Library
- Provide a metadata retrieval mechanism for non-cooperating digital libraries
- Post-processing techniques based on general web search engines

Approaches
- Metadata Harvesting
  - Collect data at a central location from different digital libraries
  - Unified search interface
- Distributed Search
  - Metadata resides at its original location
  - Only retrieve relevant metadata when needed

Challenges
- Flexible integration
- Transparent relocation and/or deletion of digital libraries
- Performance requires post-processing of data

Automatic Metadata Discovery and Retrieval

Approach
- Generic universal search interface based on Dublin Core
  - Dublin Core is a set of metadata descriptions about resources on the Internet
  - The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements
- Develop a search engine that retrieves pages with metadata
- Define rules to extract metadata from these pages
- Develop a metadata parser
- Use the Dublin Core metadata set as a common set
  - Each individual DL's metadata fields are mapped to the closest Dublin Core field
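The field-mapping step can be illustrated with a minimal Python sketch. The 15 element names are those of the Simple Dublin Core element set; the library names, native field names, and the `to_dublin_core` helper are hypothetical and are not taken from the slides.

```python
# The 15 elements of the Simple Dublin Core Metadata Element Set (DCMES).
DCMES = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical per-library maps: native field name -> closest DC element.
FIELD_MAPS = {
    "acm": {"Authors": "creator", "Title": "title", "Year": "date"},
    "cogprint": {"Author": "creator", "Name": "title", "Abstract": "description"},
}


def to_dublin_core(library, record):
    """Map one native record onto Dublin Core; unmapped fields are dropped."""
    field_map = FIELD_MAPS[library]
    mapped = {}
    for native_field, value in record.items():
        element = field_map.get(native_field)
        if element is None:
            continue  # no close Dublin Core equivalent in this sketch
        if element not in DCMES:
            raise ValueError(f"{element!r} is not a DCMES element")
        # Dublin Core elements are repeatable, so values are kept in lists.
        mapped.setdefault(element, []).append(value)
    return mapped
```

Calling `to_dublin_core("acm", {"Title": "A paper", "Authors": "R. Shi", "Pages": "12"})` keeps the title and creator and drops `Pages`, because this sketch has no mapping for it.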

[Figure: architecture]

[Figure: architecture (cont.)]

Retrieval and Parsing
- The Results Process Engine checks the DL specifications for parsing rules
- The Process Engine applies the parsing rules and generates metadata, which is stored in a cache
- If the DL specification also defines lower-level metadata parsing rules, all record HTML pages are retrieved from the remote DL and parsed
- The cached metadata is processed further so that it is ready to be displayed
- Results are merged and then displayed to end users
- Periodically, cached metadata is saved to persistent storage such as a database
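The retrieval-and-parsing flow above could be sketched roughly as follows. This is an illustration only: the slides describe XML specifications, while the sketch assumes plain dictionaries with regular-expression rules, and the names `SPEC`, `PAGES`, `process_library`, and `federated_search` are all hypothetical. Page fetching is injected as a function so the example runs without network access.

```python
import re


def parse_with_rule(html, rule):
    """Apply one regex rule with named groups; return one dict per match."""
    return [m.groupdict() for m in re.finditer(rule, html)]


def process_library(spec, fetch, query, cache):
    """Run the retrieval-and-parsing steps for a single digital library."""
    key = (spec["name"], query)
    if key in cache:
        return cache[key]
    # Step 1: parse the result list page with the list-level rule.
    list_page = fetch(spec["search_url"].format(query=query))
    records = parse_with_rule(list_page, spec["list_rule"])
    # Step 2: if a record-level rule exists, fetch and parse each record page.
    record_rule = spec.get("record_rule")
    if record_rule:
        for record in records:
            details = parse_with_rule(fetch(record["url"]), record_rule)
            if details:
                record.update(details[0])
    # Step 3: tag each record with its originating library and cache it.
    for record in records:
        record["library"] = spec["name"]
    cache[key] = records
    return records


def federated_search(specs, fetch, query, cache):
    """Merge the parsed results of every library into one list."""
    merged = []
    for spec in specs:
        merged.extend(process_library(spec, fetch, query, cache))
    return merged


# Hypothetical library specification and canned pages for a dry run.
SPEC = {
    "name": "example-dl",
    "search_url": "http://dl.example/search?q={query}",
    "list_rule": r'<a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>',
    "record_rule": r"<b>Author:</b> (?P<creator>[^<]+)",
}
PAGES = {
    "http://dl.example/search?q=metadata": '<a href="http://dl.example/rec/1">Paper One</a>',
    "http://dl.example/rec/1": "<b>Author:</b> R. Shi",
}
```

A dry run is `federated_search([SPEC], PAGES.__getitem__, "metadata", {})`; in a real deployment `fetch` would be an HTTP client, and the cache would be flushed to a database periodically.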

Metadata Parsing Rules Definition
- The same DL XML specification is used for metadata parsing rules as for query mapping and metadata retrieval
- The Digital Library Definition Language is extended to:
  - Result list page level
  - Single record document level
- The raw string is separated into several segments; each segment holds one or more metadata fields
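A small Python sketch of the segment idea follows. The rule format is an assumption made for illustration (a separator plus one regular expression per segment); the actual Digital Library Definition Language syntax is not shown in the slides.

```python
import re

# Hypothetical rule: the raw string is split on " | ", and each segment has
# a regex whose named groups are the metadata fields that segment contains.
SEGMENT_RULE = {
    "separator": " | ",
    "segments": [
        r"(?P<creator>.+)",                       # one field
        r"(?P<title>.+)",                         # one field
        r"(?P<publisher>.+?), (?P<date>\d{4})",   # two fields in one segment
    ],
}


def parse_raw_string(raw, rule):
    """Split a raw result string into segments and extract their fields."""
    fields = {}
    parts = raw.split(rule["separator"])
    for part, pattern in zip(parts, rule["segments"]):
        match = re.fullmatch(pattern, part.strip())
        if match:
            fields.update(match.groupdict())
    return fields
```

For the raw string `"R. Shi, K. Maly | Automatic Metadata Discovery | IADIS, 2003"`, the third segment yields both `publisher` and `date`, which is the "one or more fields per segment" case.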

Local Repository: Intelligent Cache
- Parsed metadata is stored in a local database
  - Improved search performance
  - Improved service reliability
- A cache grouped by metadata group provides service quality as good as the search service provided by an individual DL
- A consistency engine maintains consistency between local storage and the remote digital libraries
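The slides do not say how consistency with the remote libraries is maintained. One simple possibility, shown here purely as an assumption, is to expire cached entries after a maximum age so that the next search falls through to the remote library. The `MetadataCache` class and `search` helper are hypothetical names.

```python
import time


class MetadataCache:
    """Tiny in-memory stand-in for the local repository."""

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age = max_age_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # key -> (stored_at, records)

    def put(self, key, records):
        self._entries[key] = (self.clock(), records)

    def get(self, key):
        """Return cached records, or None if missing or too old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, records = entry
        if self.clock() - stored_at > self.max_age:
            del self._entries[key]  # stale: force a fresh remote query
            return None
        return records


def search(cache, key, query_remote):
    """Serve from the cache when possible, otherwise query the remote DL."""
    records = cache.get(key)
    if records is None:
        records = query_remote(key)
        cache.put(key, records)
    return records
```

Age-based expiry trades freshness for speed; a design that compares cached records against the remote library would be more accurate but costs extra remote requests.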

[Figure: post-processed results in LFDL after metadata parsing]

Future Work
- Improve performance through intelligent caching
- Improve service quality through better navigation tool sets

Conclusions
- Pros
  - Easy to follow
  - Comprehensive background information on the problem
  - Detailed explanation of the design architecture
- Cons
  - Incomplete on caching and service
  - Does not explain how to deduplicate similar information
  - Repetitive information throughout the paper

Conclusions (cont.)
- Improvements
  - Combine crawling with LFDL
  - Clearly define the scope
  - Utilize open-source architecture such as Hadoop and/or Solr
  - Use the Internet cloud for better availability
  - Demonstrate the financial incentives of this subject

Questions

References
- R. Shi, K. Maly, M. Zubair, “Automatic Metadata Discovery from Non-cooperative Digital Libraries”, IADIS International Conference e-Society, Lisbon, Portugal, Nov 2003
- Fotosearch, Wikipedia, Answers
- R. Shi, “Lightweight Federation of Non-Cooperative Digital Libraries”, Ph.D. Dissertation, Old Dominion University, 2005
- W. Arms, Digital Libraries. Cambridge, MA: MIT Press, 1999
- S. M. Griffin, “Taking the initiative for Digital Libraries,” The Electronic Library, vol. 16, no. 1, pp , Feb
- A. Paepcke, C. K. Chang, T. Winograd, and H. Garcia-Molina, “Interoperability for digital libraries worldwide,” Communications of the ACM, vol. 41, no. 4, pp , April 1998