Clustering Semantically Enhanced Web Search Results

1 Clustering Semantically Enhanced Web Search Results
Anantha Bangalore, MSD, Vienna, VA
Arun Sood, Professor and Chair, CS Dept.
Noorullah Moghul, CS PhD student
George Mason University, Fairfax, VA
9 September 2004

2 Overview
DAIRS: Distributed Agents for Information Retrieval Systems
- Software agents applied to image, geospatial and text processing
- Tested within a medical context
- Results of initial testing are provided
- Applicable in many domains
- Scope for discussion
9 September 2004 © 2004 by Arun Sood

3 DAIRS - Problem Statement
- Data volume is exploding
- Data-rich, information-poor environment
- Many search systems (e.g. Google) provide high recall but low precision
- Increased precision (relevance):
  - Saves user time
  - Enables a broader search of candidate URLs

4 Our Approach
- Assumption: Google (and other) search engines provide acceptable recall
- DAIRS extracts a robust and relevant result set:
  - Use an ontology to describe the user context
  - Ontological filtering
  - Clustering of the concepts
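
To make "ontological filtering" concrete, here is a minimal sketch, assuming a tiny hand-written set of medical words stands in for the ontology (DAIRS used UMLS, which maps terms to concepts and is far richer). The names `MEDICAL_TERMS`, `ontology_score`, `ontological_filter` and the 0.1 threshold are hypothetical: each page is scored by the fraction of its tokens found in the vocabulary, and low-scoring pages are de-emphasized.

```python
import re

# Hypothetical stand-in for an ontology vocabulary; DAIRS used the UMLS
# Metathesaurus, which is far larger and maps terms to concepts.
MEDICAL_TERMS = {"virus", "symptom", "symptoms", "fever", "cough",
                 "infection", "treatment", "sneezing", "throat"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def ontology_score(text: str, vocabulary: set[str] = MEDICAL_TERMS) -> float:
    """Fraction of tokens found in the vocabulary (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)

def ontological_filter(pages: dict[str, str], threshold: float = 0.1) -> list[str]:
    """Keep URLs whose score reaches the threshold, highest score first."""
    scores = {url: ontology_score(text) for url, text in pages.items()}
    kept = [url for url, s in scores.items() if s >= threshold]
    return sorted(kept, key=scores.get, reverse=True)

pages = {
    "a.example/cold": "The common cold is a virus infection; symptoms include cough and sneezing.",
    "b.example/weather": "A cold front will bring freezing temperatures this weekend.",
}
print(ontological_filter(pages))  # ['a.example/cold']
```

A real filter would match multi-word concepts and weight them by semantic type; single-word matching here only conveys the idea.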

5 Subset of UMLS Semantic Net
[Figure: subset of the UMLS Semantic Net]

6 Advantages of Our Agent Approach
- Easily compose solution methodologies using lightweight agents (200 agents in our system)
- Works in a distributed environment
- Agents are mobile
- A load-balancing agent assigns agents in the background
- Exploits parallelism
- Imports functionality from third-party software without importing the whole application
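
The parallelism point can be illustrated on a single machine. A minimal sketch, assuming each "agent" is simply a Python callable and a thread pool stands in for the load-balancing agent; the agent names and `run_agents` are hypothetical, and this does not reflect how DAIRS actually distributes or moves its agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical lightweight agents: each performs one small task on a document.
def word_count_agent(doc: str) -> int:
    return len(doc.split())

def uppercase_agent(doc: str) -> str:
    return doc.upper()

def run_agents(agents: dict[str, Callable[[str], object]],
               docs: list[str], max_workers: int = 4) -> dict[str, list[object]]:
    """Run every agent on every document concurrently.

    The thread pool plays the role of the load-balancing agent: it hands each
    queued (agent, document) job to whichever worker is free.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: [pool.submit(agent, doc) for doc in docs]
                   for name, agent in agents.items()}
        # Gather results in document order, whatever order the jobs finished in.
        return {name: [f.result() for f in fs] for name, fs in futures.items()}

docs = ["common cold symptoms", "cold war history"]
results = run_agents({"words": word_count_agent, "upper": uppercase_agent}, docs)
print(results["words"])  # [3, 3]
```

Composing a solution then amounts to chaining such callables; agent mobility and cross-machine distribution would need a real distributed runtime and are not modeled here.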

7 Interface to Compose Solutions
[Figure: interface to compose solutions]

8 EXPERIMENT
- Google searches for {cold, strain, fluid, adjustment, fat, condition, etc.}
- Selected the top 100 URLs from each search
- Classified the URLs using DAIRS (UMLS as the ontology filter and CLUTO as the clustering software)
- Compared the DAIRS results with a manual classification of the same URLs
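
The comparison with the manual classification amounts to set arithmetic over the URLs' Google ranks. A minimal sketch with toy data (not the experiment's), assuming "correct" means URLs that both DAIRS and the reviewer assigned to the target sense, "undetected" means relevant URLs DAIRS missed, and a Google false alarm is any analyzed URL the reviewer judged irrelevant. That last reading agrees with the complete rows on slide 14 (e.g. 5 + 4 + 91 = 100 for the common cold). The function `compare_with_manual` is hypothetical.

```python
def compare_with_manual(analyzed: int, manual: set[int], dairs: set[int]) -> dict[str, int]:
    """Score DAIRS's selection against a manual labelling of the same results.

    analyzed -- number of Google results examined (100 in the experiment)
    manual   -- ranks the human reviewer judged relevant to the target sense
    dairs    -- ranks DAIRS assigned to the target-sense cluster
    """
    return {
        "correct": len(manual & dairs),                 # relevant and selected
        "undetected": len(manual - dairs),              # relevant but missed
        "false_alarms_google": analyzed - len(manual),  # irrelevant results Google returned
        "false_alarms_dairs": len(dairs - manual),      # irrelevant results DAIRS kept
    }

# Toy example: the top 10 results, 4 judged relevant by hand, 3 selected by DAIRS.
print(compare_with_manual(10, manual={2, 5, 7, 9}, dairs={2, 5, 8}))
# {'correct': 2, 'undetected': 2, 'false_alarms_google': 6, 'false_alarms_dairs': 1}
```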

9 Words with Multiple Senses (Cold)
- NLM has identified 50 frequently occurring words with multiple senses
- Cold: disease, cold temperature, cold war, cold fusion, cold springs, cold calls, etc.
- Goal: find URLs dealing with cold in a medical context (e.g. the common cold)
- The ontology filter (UMLS Metathesaurus) helps de-emphasize non-medical URLs
- Clustering separates medically related URLs from other URLs
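
CLUTO's own algorithms are not reproduced here. As a plainly swapped-in stand-in, a minimal spherical k-means (cosine similarity on unit-length term-frequency vectors) shows how clustering can split pages about the medical sense of "cold" from other senses. The function names, the toy documents and the deterministic farthest-first seeding are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Unit-length term-frequency vector over lower-case alphabetic tokens."""
    tf = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return Counter({t: v / norm for t, v in tf.items()}) if norm else Counter()

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def spherical_kmeans(docs: list[str], k: int, iterations: int = 10) -> list[int]:
    """Return a cluster label (0..k-1) for each document."""
    vecs = [vectorize(d) for d in docs]
    # Farthest-first seeding: start from document 0, then add the document
    # least similar to every centroid chosen so far (deterministic).
    centroids = [vecs[0]]
    while len(centroids) < k:
        far = min(range(len(vecs)),
                  key=lambda i: max(cosine(vecs[i], c) for c in centroids))
        centroids.append(vecs[far])
    labels = [0] * len(vecs)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vecs]
        # Update step: a centroid becomes the sum of its members' unit vectors
        # (only its direction matters for cosine similarity).
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = sum(members, Counter())
    return labels

docs = [
    "common cold virus symptoms cough sneezing",
    "cold virus infection symptoms fever",
    "cold war history soviet union",
    "cold war soviet politics history",
]
print(spherical_kmeans(docs, k=2))  # [0, 0, 1, 1]
```

A page with almost no text yields an empty vector whose similarity to every centroid is zero, which hints at why the image-heavy pages on slide 10 are hard to classify.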

10 EXAMPLE URL CLASSIFICATION: Common Cold
- Common cold URLs classified correctly
- Undetected URLs:
  - Contains images and links to other websites, little text
  - Contains very little textual content
  - Contains images and very little text
  - Mostly image content

11

12

13 URL CLASSIFICATION EXAMPLE – 2
Cold URLs: false alarms
- News article describing a story set in a cold place
- Winter wear
- Cold calls (sales calls)
- Music website

14 SUMMARY OF THE RESULTS: IR Measures
Concept               | Google hits analyzed | Correct classification | Undetected | False alarms (Google) | False alarms (DAIRS)
Common cold (Disease) | 100 | 5  | 4  | 91 | -
Cold Temperature      | 100 | 5  | 9  | 86 | -
Strain (Muscle)       | 100 | 15 | 6  | 79 | -
Strain (Bacterial)    | 100 | 42 | 14 | 44 | 8
Fluid (Substance)     | 100 | 2  | -  | 96 | -
Fluid (Behavior)      | 100 | 36 | -  | 55 | 12

As in the “cold” example, most misses occur because these sites contain little text: mostly images and pointers to other web pages.
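
The counts translate into standard IR measures. A minimal sketch, assuming the column reading for the Strain (Bacterial) row (42 correct, 14 undetected, 44 Google false alarms, 8 DAIRS false alarms) and measuring recall only against the relevant URLs inside Google's top 100. `ir_measures` is a hypothetical helper.

```python
def ir_measures(correct: int, undetected: int, fa_google: int, fa_dairs: int) -> dict[str, float]:
    """Precision and recall implied by one row of the summary table."""
    relevant = correct + undetected  # relevant URLs among the analyzed hits
    return {
        "google_precision": relevant / (relevant + fa_google),
        "dairs_precision": correct / (correct + fa_dairs),
        "dairs_recall": correct / relevant,  # relative to Google's top hits only
    }

m = ir_measures(correct=42, undetected=14, fa_google=44, fa_dairs=8)
print({name: round(value, 2) for name, value in m.items()})
# {'google_precision': 0.56, 'dairs_precision': 0.84, 'dairs_recall': 0.75}
```

Under these assumptions precision rises from 0.56 to 0.84 while a quarter of the relevant URLs are missed, which is the trade of fewer false alarms for some loss in detections.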

15 Location of hits: Usability Measures
Concept               | Google hits analyzed | Correct classification | Undetected
Common cold (Disease) | 100 | 44, 45, 49, 62, 83 | 19, 50, 51, 99
Cold Temperature      | 100 | 43, 53, 58, 74, 96 | 1, 9, 17, 31, 37, 54, 64, 84, 85
Strain (Muscle)       | 100 | 1, 17, 18, 22, 27, 33, 42, 43, 54, 67, 79, 80, 84, 92, 95 | 7, 11, 20, 38, 53, 85
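
One reading of these usability measures, assuming the numbers are positions in Google's result list: a user scanning that list in order must read down to rank 95 before seeing all 15 correctly classified "Strain (Muscle)" URLs, whereas a cluster would group them together. A minimal sketch with a hypothetical `rank_summary` helper:

```python
from statistics import mean

def rank_summary(ranks: list[int]) -> dict[str, float]:
    """Summarize where relevant URLs sit in Google's ranked result list."""
    return {"count": len(ranks), "mean_rank": mean(ranks), "deepest_rank": max(ranks)}

# Google positions of the correctly classified "Strain (Muscle)" URLs (slide 15).
strain_muscle = [1, 17, 18, 22, 27, 33, 42, 43, 54, 67, 79, 80, 84, 92, 95]
summary = rank_summary(strain_muscle)
print(summary["count"], round(summary["mean_rank"], 1), summary["deepest_rank"])  # 15 50.3 95
```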

16 Building a Robust DAIRS
- The previous study shows that some sites were not properly classified because they contained little text
- Next steps:
  - Build an agent to extract links to the next level of URLs
  - Build an agent to parse the text of those next-level URLs and include it in the search results
  - Build an agent to OCR the images and extract their text
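
The first next step, extracting links to the next level of URLs, could look like the following minimal sketch built on Python's standard-library `html.parser.HTMLParser` and `urllib.parse.urljoin`. The class and function names are hypothetical; fetching the linked pages and the OCR agent are not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links

page = ('<p>Mostly images.</p><a href="/cold/symptoms.html">Symptoms</a>'
        '<a href="https://other.example/">More</a>')
print(extract_links(page, "https://site.example/index.html"))
# ['https://site.example/cold/symptoms.html', 'https://other.example/']
```

The text of the pages behind these links could then be appended to the parent page's text before filtering and clustering.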

17 DAIRS vs. Search Engines
- DAIRS complements search engines by fine-tuning target-specific searches
- DAIRS permits creating user-based filters using ontologies
- DAIRS facilitates the creation of user-guided, technology-specific dictionaries
- Our project on DAIRS for nanotechnology will build a mega-dictionary, which will be parsed into components of interest to clients

18 Commercial Applicability of DAIRS
- Example: monitoring developments in nanotechnology
- The dynamic issues of a growing field make it an ideal place to use the DAIRS approach to manage information

Date | Google URLs | Google News (30 days)
9/6  | 1.59 M | 1150
6/6  | -      | 712
4/29 | 1.42 M | 1390
3/22 | 1.3 M  | 970
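
To make the growth concrete, a short calculation of the relative change between the earliest (3/22) and latest (9/6) dates, assuming the two numeric columns are the Google URL count and the 30-day Google News count as headed. `percent_change` is a hypothetical helper.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Counts read from the table on this slide (3/22 versus 9/6).
urls = percent_change(1.30e6, 1.59e6)  # Google URLs
news = percent_change(970, 1150)       # Google News items, 30-day window
print(f"{urls:.1f}% {news:.1f}%")      # 22.3% 18.6%
```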

19 Review – Key Issues
- Ontologies can be used to focus the search results
- Significant reduction in false alarms, with some loss in detections
- Discussed strategies for improving DAIRS
- DAIRS complements search engines
- Broad applicability

20 Questions?
- Can DAIRS be used to compose web services?
- How do we build an ontology?
- Is it possible to build a good-enough representation?
- A single ontology or linked ontologies?
- Should an organization build a single ontology?
- How difficult is it to build an agent?
- What is under the hood?
- Why is agent mobility important?

