Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum '99, May 5-6, 1999.


Agenda
• Information on the Internet.
• Boolean Retrieval Model and the Internet.
• Concept-Based Retrieval (RUBRIC / CS3).
• CS3 and Boolean Search Engines.
• Future Work.

Information on the Internet
• Large volume.
• Rapid growth rate.
• Wide variations in quality and type.

Boolean Retrieval Model and the Internet
• Most Internet search engines are based on the Boolean Retrieval Model.
• The Boolean Retrieval Model is relatively easy to implement.
• Limitations:
  – Inability to assign weights to query or document terms.
  – Inability to rank retrieved documents.
  – Naïve users have difficulty using it.

Concept-Based Retrieval
• Addresses shortcomings of the Boolean Retrieval Model.
• Search requests are specified in terms of concepts structured as rule-base trees.

Development of Rule-Base Trees (General)
• Top-down refinement strategy.
• Support for AND / OR relationships.
• Support for user-defined weights.
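
A rule-base tree with AND / OR relationships and user-defined weights could be represented roughly as follows. This is a minimal sketch, not the CS3 data model: the `Node` class, its field names, the helper functions, and every weight value are assumptions made for illustration. The example terms are borrowed from the Hydrogeology query shown later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a rule-base tree (hypothetical layout)."""
    label: str
    op: str = "TERM"        # "TERM" for a leaf, "AND" or "OR" for a concept
    weight: float = 1.0     # user-defined weight on the edge to the parent
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of levels below and including this node."""
    return 1 + max((depth(c) for c in node.children), default=0)


def leaf_terms(node: Node) -> List[str]:
    """Collect the search terms at the leaves, left to right."""
    if node.op == "TERM":
        return [node.label]
    terms: List[str] = []
    for child in node.children:
        terms.extend(leaf_terms(child))
    return terms


# Top-down refinement: the concept is split into sub-concepts and terms.
# All weights below are made-up illustration values.
hydro = Node("Hydrogeology", "OR", children=[
    Node("hydrogeology", weight=1.0),
    Node("dnapl", weight=0.8),
    Node("Colloid transport", "AND", weight=0.6, children=[
        Node("colloid*"),
        Node("environmental transport"),
    ]),
])
```

Calling `leaf_terms(hydro)` lists the four leaf terms in order, and `depth(hydro)` reports the three levels of refinement.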

Development of Rule-Base Trees (CS3)
• Concept-Set Structuring System (CS3).
• CS3 supports the creation, storage and modification of user-defined concepts.
• Post-processing of results of sub-queries.
• CS3 user-interface.

CS3 User Interface

Evaluation of Rule-Base Trees (RUBRIC)
• Run-time, bottom-up analysis.
• Propagation of weight values (MIN / MAX).
• Disadvantage of run-time analysis.
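
A bottom-up MIN / MAX propagation can be sketched as below. The slide does not say which operator uses which function; this sketch assumes the usual fuzzy-logic reading (AND takes the minimum, OR takes the maximum), assumes a leaf scores 1.0 when its term occurs in the document, and assumes each child's score is scaled by its edge weight. The dictionary layout and all weights are hypothetical.

```python
def evaluate(node, doc_terms):
    """Score one document against a rule-base tree, bottom-up."""
    if node["op"] == "TERM":
        # Assumed leaf rule: 1.0 if the term occurs in the document, else 0.0.
        return 1.0 if node["label"] in doc_terms else 0.0
    # Scale each child's score by the weight on its edge (default 1.0).
    scores = [c.get("weight", 1.0) * evaluate(c, doc_terms)
              for c in node["children"]]
    # Assumed mapping: AND propagates the minimum, OR the maximum.
    return min(scores) if node["op"] == "AND" else max(scores)


# Illustration tree; weights are made up.
tree = {"op": "OR", "label": "Hydrogeology", "children": [
    {"op": "TERM", "label": "hydrogeology", "weight": 1.0},
    {"op": "TERM", "label": "dnapl", "weight": 0.8},
    {"op": "AND", "label": "colloid transport", "weight": 0.6, "children": [
        {"op": "TERM", "label": "colloid"},
        {"op": "TERM", "label": "environmental transport"},
    ]},
]}

score_both = evaluate(tree, {"colloid", "environmental transport"})  # 0.6
score_dnapl = evaluate(tree, {"dnapl"})                              # 0.8
score_half = evaluate(tree, {"colloid"})                             # 0.0
```

In this sketch every document triggers a full walk of the tree at query time, which is one plausible reading of the run-time disadvantage the slide mentions.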

Evaluation of Rule-Base Trees (CS3)
• Static, bottom-up analysis.
• Construct Minimal Term Set (MTS).
• Propagation of terms.
• CS3 user-interface.

MTS: Minimal Term Set
• An MTS for a topic is a set of terms such that if every term in the set appears in a document, the document gets an RSV (retrieval status value) larger than 0. If not, the RSV is 0.
• A topic can have more than one MTS.
• A user can choose among those MTSs to perform a search suited to their needs.
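
One way to derive minimal term sets statically, before any document is seen, is to propagate term sets up the tree: an OR node collects its children's sets, an AND node combines one set from each child, and any set that strictly contains another is dropped. This is a sketch under those assumptions, not the CS3 algorithm; the function name and tree layout are hypothetical.

```python
from itertools import product


def minimal_term_sets(node):
    """Return the minimal term sets of a rule-base tree as frozensets."""
    if node["op"] == "TERM":
        return [frozenset([node["label"]])]
    child_sets = [minimal_term_sets(c) for c in node["children"]]
    if node["op"] == "OR":
        # Any child's term set is enough on its own.
        candidates = [s for sets in child_sets for s in sets]
    else:
        # AND: pick one term set per child and merge them.
        candidates = [frozenset().union(*combo)
                      for combo in product(*child_sets)]
    unique = set(candidates)
    # Keep only sets that have no strict subset among the candidates.
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=sorted)


tree = {"op": "OR", "label": "Hydrogeology", "children": [
    {"op": "TERM", "label": "hydrogeology"},
    {"op": "TERM", "label": "dnapl"},
    {"op": "AND", "label": "colloid transport", "children": [
        {"op": "TERM", "label": "colloid"},
        {"op": "TERM", "label": "environmental transport"},
    ]},
]}

mts_list = minimal_term_sets(tree)
```

For this tree `mts_list` holds three sets: the AND pair, `{"dnapl"}`, and `{"hydrogeology"}`. Because the analysis depends only on the tree, it can be run once per concept rather than once per document.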

Concept-Based Retrieval and Boolean Search Engines
• CS3 is designed to interface with existing Boolean search engines.
• U.S. Department of Energy’s “Information-Bridge” search engine.
• U.S. Department of Transportation’s “National Transportation Library” search engine.

System Architecture
[Diagram; components shown: Client (Java/Applet), CORBA, CGI, Server (JAVA), Server (JAVA/C++), JDBC, ORACLE, DOE InfoBridge … etc.]

Information-Bridge and CS3
• Search request: Boolean vs. Concept.
• Output: Non-Ranked vs. Ranked.
• Calculation of RSV:
  – Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of all the weights of S plus the maximum weight in S.
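
Read literally, the RSV rule above can be written as a few lines of code. The sketch assumes each MTS carries a single numeric weight and that a document satisfies an MTS when it contains every term in it; how CS3 derives an MTS weight from the tree is not stated on the slide, so the weights below are made up and the function name is hypothetical.

```python
def rsv(doc_terms, weighted_mts):
    """RSV = sum of the weights of satisfied MTSs plus the largest of them."""
    satisfied = [weight for terms, weight in weighted_mts
                 if terms <= doc_terms]      # subset test: all terms present
    if not satisfied:
        return 0.0                           # no MTS satisfied: RSV is 0
    return sum(satisfied) + max(satisfied)


# Illustration values only.
weighted_mts = [
    ({"hydrogeology"}, 1.0),
    ({"dnapl"}, 0.5),
    ({"colloid", "environmental transport"}, 0.25),
]

doc_a = {"dnapl", "colloid", "environmental transport", "soil"}
doc_b = {"soil", "groundwater"}

rsv_a = rsv(doc_a, weighted_mts)   # (0.5 + 0.25) + 0.5 = 1.25
rsv_b = rsv(doc_b, weighted_mts)   # 0.0
```

Sorting documents by this value in descending order would give the ranked output that the slide contrasts with the non-ranked Boolean result list.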

Information-Bridge and CS3 (Example)
• Boolean search request (“Environmental Science Network” form):
  – (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)).
• Concept (CS3):
  – “Hydrogeology”.
  – Rule-Base Tree.
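
The two request forms can be connected by turning a list of minimal term sets into a Boolean expression: terms within a set are joined with AND, and the sets are joined with OR. The sketch below assumes the Hydrogeology concept yields the three term sets implied by the Boolean request above, and it mirrors that request's syntax. Whether CS3 submits one combined query or one sub-query per MTS is not stated here, and `to_boolean_query` is a hypothetical helper.

```python
def to_boolean_query(term_sets):
    """Render term sets as an OR of AND-clauses with quoted terms."""
    clauses = []
    for terms in term_sets:
        quoted = [f'"{term}"' for term in terms]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            # Multi-term sets need every term, so they become an AND group.
            clauses.append("(" + " AND ".join(quoted) + ")")
    return "(" + " OR ".join(clauses) + ")"


query = to_boolean_query([
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
])
print(query)
```

Lists are used instead of sets so that the order of terms in the output is predictable.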

CS3 Hydrogeology Rule Base

CS3 search results

Current and Future Work
• Conduct experiments to evaluate effectiveness (future).
• Investigate alternative methods to compute RSVs [KADR00, KDR01*].
• Learning edge weights through relevance feedback [KR00].
• Thesaurus-based rule-base generation [KLR00].

Relevant URLs
• [LJRT99*]
• RaghavanHome → Publications since 1991