Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor.

Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum '99, May 5-6, 1999.
Presentation transcript:

Enhancing Internet Search Engines to Achieve Concept-based Retrieval
F. Lu, T. Johnsten, V. Raghavan, and D. Traylor

Agenda
- Information on the Internet.
- Boolean Retrieval Model and the Internet.
- Personalized Search.
- Concept-Based Retrieval (RUBRIC / CS3).
- CS3 and Boolean Search Engines.
- Deep Web Sources.
- Current & Future Work.

Information on the Internet
- Large volume.
- Rapid growth rate.
- Wide variations in quality and type.

Boolean Retrieval Model and the Internet
- Most Internet search engines are based on the Boolean Retrieval Model.
- The Boolean Retrieval Model is relatively easy to implement.
- Limitations:
  - Inability to assign weights to query or document terms.
  - Inability to rank retrieved documents.
  - Naïve users have difficulty using it.

Personalized Search
[Figure: personalized engine placed between the user and the search engine. Labels: User Query, Query Processor, Query Augmentation, Search Engine, Search Results, Result Processor, Personalized Results, User Profile, General Profile.]

Concept-Based Retrieval
- Addresses shortcomings of the Boolean Retrieval Model.
- Search requests are specified in terms of concepts structured as rule-base trees.

Development of Rule-Base Trees (General)
- Top-down refinement strategy.
- Support for AND / OR relationships.
- Support for user-defined weights.
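The three properties above can be illustrated with a small data structure. This is a minimal sketch, not the representation used by RUBRIC or CS3: the class names `Node` and `Edge`, the helper functions, and all weights are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A concept (internal node) or a search term (leaf) in a rule-base tree."""
    name: str
    op: str = "TERM"  # "AND", "OR", or "TERM" for a leaf
    children: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    """A weighted link from a concept to one of its sub-concepts or terms."""
    weight: float  # user-defined weight, assumed to lie in [0, 1]
    child: Node


def term(name):
    return Node(name)


def concept(name, op, *weighted_children):
    return Node(name, op, [Edge(w, c) for w, c in weighted_children])


def leaf_terms(node):
    """Collect the search terms under a node, depth-first."""
    if node.op == "TERM":
        return [node.name]
    return [t for e in node.children for t in leaf_terms(e.child)]


# Top-down refinement: start from the concept, then refine it into
# sub-concepts and finally into terms. Weights are hypothetical.
hydrogeology = concept(
    "Hydrogeology", "OR",
    (1.0, term("hydrogeology")),
    (0.75, term("dnapl")),
    (0.5, concept("Colloid transport", "AND",
                  (1.0, term("colloid*")),
                  (1.0, term("environmental transport")))),
)

print(leaf_terms(hydrogeology))
```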

Development of Rule-Base Trees (CS3)
- Concept-Set Structuring System (CS3).
- CS3 supports the creation, storage and modification of user-defined concepts.
- Post-processing of results of sub-queries.
- CS3 user interface.

CS3 User Interface

Evaluation of Rule-Base Trees (RUBRIC)
- Run-time, bottom-up analysis.
- Propagation of weight values (MIN / MAX).
- Disadvantage of run-time analysis.
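A run-time, bottom-up evaluation with MIN / MAX propagation might look like the sketch below. It assumes that AND nodes take the minimum of their children's scores, OR nodes take the maximum, and each child's score is scaled by its edge weight; the slide does not spell out these details, and the tuple encoding and weights are hypothetical.

```python
def evaluate(node, doc_terms):
    """Return a retrieval status value for one document.

    A node is either ("TERM", name) or (op, [(weight, child), ...])
    with op equal to "AND" or "OR".
    """
    kind = node[0]
    if kind == "TERM":
        return 1.0 if node[1] in doc_terms else 0.0
    # Score the children first (bottom-up), scaling by the edge weight.
    scores = [w * evaluate(child, doc_terms) for w, child in node[1]]
    # Assumption: AND propagates the minimum, OR the maximum.
    return min(scores) if kind == "AND" else max(scores)


tree = ("OR", [
    (1.0, ("TERM", "hydrogeology")),
    (0.8, ("TERM", "dnapl")),
    (0.6, ("AND", [
        (1.0, ("TERM", "colloid")),
        (1.0, ("TERM", "environmental transport")),
    ])),
])

print(evaluate(tree, {"dnapl", "colloid"}))  # 0.8
```

In this sketch the whole tree is walked once per document at query time, which is one way to read the run-time cost the slide lists as a disadvantage.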

Evaluation of Rule-Base Trees (CS3)
- Static, bottom-up analysis.
- Construct Minimal Term Set (MTS).
- Propagation of terms.
- CS3 user interface.

MTS: Minimal Term Set
- An MTS for a topic is a set of terms such that, if every term in the set appears in a document, the document gets an RSV larger than 0. Otherwise, the RSV is 0.
- A topic can have more than one MTS.
- A user can choose among those MTSs to perform a search suited to their needs.
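The definition above suggests a static, bottom-up construction: terms are propagated from the leaves to the root once, before any document is seen. The sketch below is one way to do that, assuming an OR node contributes the MTSs of each child unchanged and an AND node combines one MTS from every child. The function name, the tuple encoding, and the example tree are assumptions, and edge weights are left out for brevity.

```python
from itertools import product


def minimal_term_sets(node):
    """Return the minimal term sets of a rule-base tree.

    A node is either ("TERM", name) or (op, [child, ...]) with op
    equal to "AND" or "OR".
    """
    kind = node[0]
    if kind == "TERM":
        return [frozenset([node[1]])]
    child_sets = [minimal_term_sets(child) for child in node[1]]
    if kind == "OR":
        # Any single child's term set is enough to satisfy an OR.
        candidates = [s for sets in child_sets for s in sets]
    else:
        # An AND needs one term set from every child at the same time.
        candidates = [frozenset().union(*combo) for combo in product(*child_sets)]
    unique = set(candidates)
    # Keep only sets that do not strictly contain another candidate.
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=sorted)


tree = ("OR", [
    ("TERM", "hydrogeology"),
    ("TERM", "dnapl"),
    ("AND", [("TERM", "colloid"), ("TERM", "environmental transport")]),
])

for mts in minimal_term_sets(tree):
    print(sorted(mts))
```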

CS3 and Boolean Search Engines
- CS3 is designed to interface with existing Boolean search engines.
- U.S. Department of Energy's "Information-Bridge" search engine.
- U.S. Department of Transportation's "National Transportation Library" search engine.

System Architecture
[Figure: system architecture. Labels: Client (Java / Applet), CORBA, CGI, Server (Java), Server (Java/C++), JDBC, Oracle, DOE InfoBridge … etc.]

Information-Bridge and CS3
- Search request: Boolean vs. concept.
- Output: non-ranked vs. ranked.
- Calculation of RSV:
  - Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of all the weights of S plus the maximum weight in S.
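The RSV rule above can be written out directly. The sketch below follows the slide's wording literally (sum of the satisfied weights plus the largest of them); the function name, the pairing of each MTS with a single weight, and the weights themselves are assumptions.

```python
def rsv(doc_terms, weighted_mts):
    """Retrieval status value of a document.

    weighted_mts is a list of (term_set, weight) pairs. An MTS is
    satisfied when all of its terms occur in the document.
    """
    satisfied = [w for terms, w in weighted_mts if terms <= doc_terms]
    if not satisfied:
        return 0.0  # no MTS satisfied, so the document is not retrieved
    return sum(satisfied) + max(satisfied)


# Hypothetical weights for three MTSs of one concept.
weighted_mts = [
    ({"hydrogeology"}, 1.0),
    ({"dnapl"}, 0.75),
    ({"colloid", "environmental transport"}, 0.5),
]

doc = {"dnapl", "colloid", "environmental transport", "aquifer"}
print(rsv(doc, weighted_mts))  # 0.75 + 0.5 + 0.75 = 2.0
```

Documents can then be sorted by this value to produce the ranked output that the plain Boolean engine does not provide.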

Information-Bridge and CS3 (Example)
- Boolean search request ("Environmental Science Network" form):
  - ("Hydrogeology" OR "Dnapl" OR ("Colloid*" AND "Environmental Transport")).
- Concept (CS3):
  - "Hydrogeology".
  - Rule-Base Tree.
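To relate the two forms of request, the sketch below shows how a list of term sets could be rendered as a Boolean query string of the kind shown in the example. It is only an illustration: the slides do not say whether CS3 sends one combined query or one sub-query per MTS, and the function name and the three term sets are assumptions.

```python
def to_boolean_query(term_sets):
    """Render term sets as an OR of AND-clauses with quoted terms."""
    clauses = []
    for terms in term_sets:
        parts = ['"' + t + '"' for t in terms]
        if len(parts) == 1:
            clauses.append(parts[0])
        else:
            clauses.append("(" + " AND ".join(parts) + ")")
    return "(" + " OR ".join(clauses) + ")"


# Hypothetical term sets for the "Hydrogeology" concept.
term_sets = [
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
]

query = to_boolean_query(term_sets)
print(query)
# ("Hydrogeology" OR "Dnapl" OR ("Colloid*" AND "Environmental Transport"))
```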

CS3 Hydrogeology Rule Base

CS3 search results

Deep Web Sources
- Also referred to as the hidden Web or invisible Web.
- Resides behind search forms in databases, e.g. monster.com, louisiana1st.com, PubMed.
- Web pages in the deep Web are generated dynamically based on the submitted queries.
- Not indexed by current search engines.
- Search engines index content on the surface Web.

Deep Web Sources and Concept-based Retrieval
- Deep Web in terms of size and quality:
  - Size (Deep Web) = 500 * Size (Surface Web)
  - Quality (Deep Web) = 1000 * Quality (Surface Web)
- Queries submitted at deep Web sources are more stable compared to queries submitted to search engines.
- So, naturally, concept-based retrieval is more suitable for deep Web sources.

Current and Future Work
- Conduct experiments to evaluate effectiveness (future).
- Investigate alternative methods to compute RSVs [KADR00, KDR01*].
- Learning edge weights through relevance feedback [KR00].
- Thesauri-based rule-base generation [KLR00].

Relevant URLs
- [LJRT99*] RaghavanHome > Publications since