Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004 Su-Shing Chen, University of Florida

Presentation transcript:

1 Indexing Mathematical Abstracts by Metadata and Ontology IMA Workshop, April 26-27, 2004 Su-Shing Chen, University of Florida suchen@cise.ufl.edu

2 Abstract OAI extensions to federated search and other services for MathML-based metadata indexing and subject classification of mathematical abstracts. Construction of ontology or conceptual maps of mathematics. Mathematical formulas are considered as elements of the ontology. Ontology indexing by clustering mathematical abstracts or full papers into an information visualization interface so that users may select using ontology as well as metadata.

3 A DL Server with OAI Extensions: Managing the Metadata Complexity
[Architecture diagram. Labels: DL Server; Harvester (Harvest API); Data Provider (OAI_DC); Data Provider (OAI_XXX); Service Provider (Data Mining); Service Provider (Federated Search)]

5 A DL Server with OAI Extensions: Managing the Metadata Complexity
Built-in capabilities:
- Harvester: harvests various OAI-compliant data providers
- Data provider: exposes harvested and existing metadata sets
- Service provider: offers federated search and data mining capabilities on metadata sets

6 Harvester
[Diagram: the DL Server Harvester harvests Data Providers … and stores the harvested metadata]
Harvester Interface:
- URL to harvest
- Selective harvesting parameters
- Harvest API parameters
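
The harvester inputs listed above (a URL plus selective harvesting parameters) correspond closely to the arguments of an OAI-PMH `ListRecords` request. The following is a minimal sketch of how such a request URL might be assembled. The function name and the example base URL are hypothetical; only the `verb`, `metadataPrefix`, `from`, `until`, and `set` arguments come from the OAI-PMH protocol. It does not represent the Harvest API of the DL Server described in these slides.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper, not the DL Server's API)."""
    # verb and metadataPrefix are required for an initial ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Selective harvesting: restrict by datestamp range and/or set membership.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Example with a placeholder repository URL and a hypothetical set name.
example_url = build_list_records_url(
    "http://example.org/oai", from_date="2004-01-01", set_spec="math"
)
```

A real harvester would also need to follow `resumptionToken` values to page through large result lists, which this sketch leaves out.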

7 Harvester Interface

9 Data Provider
- Exposes single or combined harvested metadata sets to other harvesters
- Reformats metadata from different data providers so that other service providers can harvest it (e.g., metadata originally in Dublin Core is reformatted to MARC before being exposed)
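
To illustrate the kind of reformatting mentioned above, here is a minimal sketch of a Dublin Core to MARC crosswalk. The tag assignments are a simplified, illustrative subset of commonly used unqualified Dublin Core to MARC mappings, and the record layout (a dict of element name to list of values) is an assumption made for this example; the slides do not specify how the DL Server performs the conversion.

```python
# Simplified, illustrative crosswalk: Dublin Core element -> (MARC tag, subfield code).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def dc_to_marc(dc_record):
    """Map a Dublin Core record (dict of element -> list of values) to MARC-like fields.

    Returns a sorted list of (tag, subfield, value) tuples. Elements without an
    entry in DC_TO_MARC are skipped rather than guessed at.
    """
    fields = []
    for element, values in dc_record.items():
        if element not in DC_TO_MARC:
            continue
        tag, subfield = DC_TO_MARC[element]
        for value in values:
            fields.append((tag, subfield, value))
    return sorted(fields)


# Hypothetical record used only for illustration.
example_fields = dc_to_marc({
    "title": ["On Lie groups"],
    "creator": ["A. Author", "B. Author"],
    "rights": ["unmapped in this sketch"],
})
```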

10 Service Provider: Federated Search
- Emulates a federated search service on existing and combined harvested metadata sets
- Can potentially federate search across other search protocols
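
One way to picture federated search over harvested metadata sets is a single query that is run against each set, with the hits merged and labeled by source. The sketch below assumes each set is a list of dicts with `identifier`, `title`, and `description` keys; those field names, the matching rule (every query term must appear), and the collection names are assumptions for illustration, not details taken from the slides.

```python
def search_collection(records, query):
    """Return records whose title or description contains every query term."""
    terms = query.lower().split()
    hits = []
    for record in records:
        text = (record.get("title", "") + " " + record.get("description", "")).lower()
        if all(term in text for term in terms):
            hits.append(record)
    return hits


def federated_search(collections, query):
    """Run one query over several metadata sets and merge the hits.

    `collections` maps a source name to its list of records. Records that share
    an identifier are reported once, under the first source that returned them.
    """
    merged = []
    seen = set()
    for source, records in collections.items():
        for record in search_collection(records, query):
            if record["identifier"] in seen:
                continue
            seen.add(record["identifier"])
            merged.append({"source": source, **record})
    return merged


# Hypothetical metadata sets used only for illustration.
example_collections = {
    "set_a": [
        {"identifier": "a1", "title": "Banach spaces", "description": "Survey"},
        {"identifier": "a2", "title": "Graph theory", "description": "Notes"},
    ],
    "set_b": [
        {"identifier": "a1", "title": "Banach spaces", "description": "Survey"},
        {"identifier": "b7", "title": "Operators", "description": "On Banach algebras"},
    ],
}
example_hits = federated_search(example_collections, "banach")
```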

11 Federated Search

14 Service Provider: Data Mining
- Knowledge discovery on harvested metadata sets
- Metadata classification using the Self-Organizing Map (SOM) algorithm
- Improving retrieval effectiveness by providing concept browsing and search services

15 Self-Organizing Map Algorithm
- Competitive and unsupervised learning algorithm
- Artificial neural network algorithm for visualizing and interpreting complex data sets
- Providing a mapping from a high-dimensional input space to a two-dimensional output space
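
The properties listed above can be made concrete with a small, generic SOM written in plain Python. This is a textbook-style sketch, not the SOM Categorizer from these slides: the grid size, the Gaussian neighborhood, the linear decay schedule, and all function names are assumptions chosen for brevity. Each input is matched to its best-matching unit (the competitive step), and that unit and its grid neighbors are pulled toward the input without any labels (the unsupervised step). The resulting grid coordinates give the two-dimensional output space.

```python
import math
import random


def find_bmu(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def update(weights, x, bmu, learning_rate, radius):
    """Move every node toward x, scaled by a Gaussian of its grid distance to the BMU."""
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_dist_sq = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
            influence = math.exp(-grid_dist_sq / (2.0 * radius ** 2))
            for i in range(len(w)):
                w[i] += learning_rate * influence * (x[i] - w[i])


def train_som(data, rows, cols, epochs=50, learning_rate=0.5, radius=1.5, seed=0):
    """Train a rows x cols map on `data` (a list of equal-length numeric vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1); inputs are assumed to be scaled similarly.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # linear decay, stays above zero
        lr = learning_rate * decay
        rad = max(radius * decay, 0.1)        # keep the radius strictly positive
        for x in data:
            update(weights, x, find_bmu(weights, x), lr, rad)
    return weights
```

After training, `find_bmu(weights, x)` gives the map cell for a new input vector `x`, which is what a concept browser could use to group similar records.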

16 Data Mining Service Provider System Architecture
[Architecture diagram. Components: Metadata Database, SOM Categorizer, Concept Harvester, Input Vector Generator, Noun Phraser, Browser. Labeled flows: Concept browsing request, Concept search request, Response, Request, Response, Fetch metadata, Save SOM]
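
The diagram places a Noun Phraser and an Input Vector Generator ahead of the SOM Categorizer. As a rough sketch of what the vector-generation step might look like, the code below turns already-extracted noun phrases into binary vectors over a shared vocabulary. The binary encoding, the lowercasing, and the function names are assumptions; the slides do not say how the actual component encodes its input, and noun-phrase extraction itself is not shown.

```python
def build_vocabulary(phrase_lists):
    """Collect the distinct (lowercased) noun phrases across all records, sorted."""
    return sorted({phrase.lower() for phrases in phrase_lists for phrase in phrases})


def to_input_vectors(phrase_lists):
    """Encode each record as a binary vector: 1 if the phrase occurs in it, else 0."""
    vocabulary = build_vocabulary(phrase_lists)
    index = {phrase: i for i, phrase in enumerate(vocabulary)}
    vectors = []
    for phrases in phrase_lists:
        vector = [0] * len(vocabulary)
        for phrase in phrases:
            vector[index[phrase.lower()]] = 1
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical noun phrases, as if produced by a noun phraser for two abstracts.
example_vocab, example_vectors = to_input_vectors([
    ["Lie group", "Hilbert space"],
    ["Hilbert space"],
])
```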

17 Concept Harvester Screenshot of the SOM Categorizer

18 Construction of Two-level Concept Hierarchy
- Constructing the SOM for each harvested metadata set
- SOMs of the lower layer are added to the upper-layer SOM
[Diagram label: VTETD]

19 Top-level Concept Browsing

20 Bottom-level Concept Browsing

21 MEDLINE Database
- Developed by the National Library of Medicine (NLM)
- Bibliographic citations and abstracts from more than 4,600 biomedical journals published in the United States and 70 other countries
- Covering the fields of medicine, nursing, dentistry, veterinary medicine, the health care system, and the preclinical sciences
- Over 12 million citations
- Searchable via PubMed or the NLM Gateway

22 MeSH (Medical Subject Headings)
- MEDLINE uses MeSH as its controlled vocabulary for indexing database articles
- Indexers scan an entire article and assign MeSH headings (or MeSH descriptors) to each article
- MeSH descriptors are arranged in both an alphabetic list and a hierarchical structure
- Updated annually to reflect the changes in medicine and medical terminology

23 Our Experimentation
Problems
- It is well known that searching by descriptors greatly improves search precision.
- However, it is very difficult for naïve users to know and use the exact MeSH descriptors to search with.
- In addition, as the MEDLINE database grows, information overload prevents users from finding the information relevant to their interests.
Proposed Approach
- Categorization according to MeSH terms, MeSH major topics, and the co-occurrence of MeSH descriptors
- Clustering using the results of MeSH term categorization through the Knowledge Grid
- Visualization of categories and hierarchical clusters
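
Categorization by co-occurrence of MeSH descriptors starts from a count of how often two descriptors are assigned to the same citation. The sketch below shows one simple way to compute such counts. The input layout (one list of descriptor strings per citation), the function name, and the sample data are assumptions for illustration; the slides do not describe how the co-occurrence statistics were actually computed.

```python
from collections import Counter
from itertools import combinations


def descriptor_cooccurrence(citations):
    """Count, for each unordered pair of descriptors, the citations that carry both.

    `citations` is a list of descriptor lists, one per citation. Pairs are keyed
    as alphabetically ordered tuples so (A, B) and (B, A) share one counter.
    """
    counts = Counter()
    for descriptors in citations:
        # set() removes repeats within one citation; sorted() fixes the pair order.
        for pair in combinations(sorted(set(descriptors)), 2):
            counts[pair] += 1
    return counts


# Hypothetical descriptor assignments for three citations.
example_counts = descriptor_cooccurrence([
    ["Humans", "Neoplasms", "Mice"],
    ["Humans", "Neoplasms"],
    ["Mice"],
])
```

Pairs with high counts are candidates for being shown together in one category, while descriptors that never co-occur stay apart.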

24 Data Access Services
[Screenshots: MeSH Major Topic Tree View; SOM Tree View]

25 Knowledge Grid
[Figure: Knowledge Grid Architecture. Courtesy of Cannataro and Talia (Knowledge Grid: An Architecture for Distributed Knowledge Discovery)]

26 Future Directions
- Develop a federated search service for OAI-compliant mathematical abstracts.
- Develop an ontology or conceptual maps for mathematics.
- Develop an ontology search service for mathematical abstracts and full papers.
- Develop an interoperable architecture with other services, such as OCR of mathematical formulas.

27 Acknowledgement
Many thanks to the NSF NSDL Program.
Collaborators: Joe Futrelle (NCSA), Ed Fox (Virginia Tech)
Student Team: Hyunki Kim, Chee Yoong Choo, Xiaoou Fu, Yu Chen

