Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum '99, May 5-6, 1999.
Presentation transcript:

Enhancing Internet Search Engines to Achieve Concept-based Retrieval F. Lu, T. Johnsten, V. Raghavan, and D. Traylor

Agenda Information on the Internet. Boolean Retrieval Model and the Internet. Personalized Search. Concept-Based Retrieval (RUBRIC / CS3). CS3 and Boolean Search Engines. Deep Web Sources. Current & Future Work.

Information on the Internet Large volume. Rapid growth rate. Wide variations in quality and type.

Boolean Retrieval Model and the Internet Most Internet search engines are based on the Boolean Retrieval Model. The Boolean Retrieval Model is relatively easy to implement. Limitations: Inability to assign weights to query or document terms. Inability to rank retrieved documents. Naïve users have difficulty using it.

Personalized Search [Diagram. Components shown: User Query, Personalized Results, Personalized Engine, Query Processor, User Profile, General Profile, Result Processor, Query Augmentation, Search Results, Search Engine.]

Concept-Based Retrieval Addresses shortcomings of the Boolean Retrieval Model. Search requests are specified in terms of concepts structured as rule-base trees.

Development of Rule-Base Trees (General) Top-down refinement strategy. Support for AND / OR relationships. Support for user-defined weights.

Development of Rule-Base Trees (CS3) Concept-Set Structuring System (CS3). CS3 supports the creation, storage and modification of user-defined concepts. Post-processing of results of sub-queries. CS3 user-interface.

CS3 User Interface

Evaluation of Rule-Base Trees (RUBRIC) Run-time, bottom-up analysis. Propagation of weight values (MIN / MAX). Disadvantage of run-time analysis.
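
The MIN / MAX propagation can be sketched as follows. This is a minimal illustration, not the RUBRIC implementation: it assumes that an AND node takes the minimum of its children's scores, an OR node takes the maximum, each child's score is scaled by a user-defined edge weight, and a leaf scores 1 when its term occurs in the document. The `Node` class, the example tree and all weights are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A rule-base tree node: a leaf holds a term, an inner node an operator."""
    op: str = "TERM"   # "AND", "OR" or "TERM"
    term: str = ""     # used only when op == "TERM"
    children: list = field(default_factory=list)  # (edge_weight, Node) pairs


def evaluate(node, doc_terms):
    """Score one document bottom-up (assumed rule: AND = min, OR = max)."""
    if node.op == "TERM":
        return 1.0 if node.term in doc_terms else 0.0
    # Scale each child's score by the weight on the edge leading to it.
    scores = [weight * evaluate(child, doc_terms)
              for weight, child in node.children]
    return min(scores) if node.op == "AND" else max(scores)


# Hypothetical tree and weights, loosely modelled on the Hydrogeology example.
tree = Node("OR", children=[
    (1.0, Node(term="hydrogeology")),
    (0.8, Node(term="dnapl")),
    (0.6, Node("AND", children=[
        (1.0, Node(term="colloid")),
        (1.0, Node(term="environmental transport")),
    ])),
])

print(evaluate(tree, {"dnapl", "colloid"}))  # 0.8
```

In this sketch the whole tree is walked again for every document at query time.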

Evaluation of Rule-Base Trees (CS3) Static, bottom-up analysis. Construct Minimal Term Set (MTS). Propagation of terms. CS3 user-interface.

MTS: Minimal Term Set An MTS for a topic is a set of terms such that, if every term in the set appears in a document, the document receives an RSV (retrieval status value) greater than 0; otherwise the RSV is 0. A topic can have more than one MTS. A user can choose among those MTSs to run the search that fits their needs.
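
One way to derive the minimal term sets of an AND / OR tree is sketched below. It is an illustration under stated assumptions, not the CS3 algorithm: all edge weights are taken to be positive (so weights do not affect which sets qualify), an OR node contributes the term sets of each child, and an AND node contributes every combination of one term set per child. The tuple-based tree encoding and the function name are hypothetical.

```python
from itertools import product


def minimal_term_sets(node):
    """Return the minimal term sets of a tree.

    A node is either a term (str) or a tuple (op, children) with op in
    {"AND", "OR"}.
    """
    if isinstance(node, str):
        return [frozenset([node])]
    op, children = node
    child_sets = [minimal_term_sets(child) for child in children]
    if op == "OR":
        # Satisfying any one child is enough.
        candidates = [s for sets in child_sets for s in sets]
    else:
        # AND: pick one term set per child and merge them.
        candidates = [frozenset().union(*combo)
                      for combo in product(*child_sets)]
    unique = set(candidates)
    # Drop any set that strictly contains another candidate (not minimal).
    minimal = [s for s in unique if not any(other < s for other in unique)]
    return sorted(minimal, key=lambda s: (len(s), sorted(s)))


tree = ("OR", ["hydrogeology", "dnapl",
               ("AND", ["colloid", "environmental transport"])])
for term_set in minimal_term_sets(tree):
    print(sorted(term_set))
```

For this tree the sketch yields three sets: `{dnapl}`, `{hydrogeology}` and `{colloid, environmental transport}`.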

CS3 and Boolean Search Engines CS3 is designed to interface with existing Boolean search engines. U.S. Department of Energy’s “Information-Bridge” search engine. U.S. Department of Transportation’s “National Transportation Library” search engine.

System Architecture [Diagram. Components shown: Client (Java Applet), CORBA, CGI, Server (Java), Server (Java/C++), JDBC, DOE InfoBridge etc. …, Oracle.]

Information-Bridge and CS3 Search request: Boolean vs. Concept. Output: Non-Ranked vs. Ranked. Calculation of RSV: Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of all the weights of S plus the maximum weight in S.
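
The RSV rule stated on this slide can be written out as a short sketch. The assumptions are that each MTS expression carries a single numeric weight and that a document satisfies an expression when it contains every term in it; the function name, the example term sets and the weights are hypothetical.

```python
def rsv(doc_terms, weighted_mts):
    """RSV = sum of the weights of satisfied MTSs + the largest such weight.

    weighted_mts is a list of (frozenset_of_terms, weight) pairs.
    """
    satisfied = [weight for terms, weight in weighted_mts
                 if terms <= doc_terms]
    if not satisfied:
        return 0.0  # no MTS is satisfied, so the document is not retrieved
    return sum(satisfied) + max(satisfied)


# Hypothetical weights, chosen only for illustration.
weighted_mts = [
    (frozenset({"hydrogeology"}), 1.0),
    (frozenset({"dnapl"}), 0.5),
    (frozenset({"colloid", "environmental transport"}), 0.25),
]

docs = {
    "d1": {"hydrogeology", "dnapl"},
    "d2": {"colloid", "environmental transport"},
    "d3": {"colloid"},
}
ranked = sorted(docs, key=lambda d: rsv(docs[d], weighted_mts), reverse=True)
print(ranked)  # ['d1', 'd2', 'd3']
```

Sorting by this value gives the ranked output that a purely Boolean search does not provide.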

Information-Bridge and CS3 (Example) Boolean search request (“Environmental Science Network” Form): (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)). Concept (CS3): “Hydrogeology”. Rule-Base Tree.
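
The Boolean request in this example has the shape of an OR over term sets, with the terms inside each set joined by AND. The sketch below builds such a string from a list of term sets. It only illustrates that shape: the slides do not say how CS3 generates its sub-queries or which query syntax Information-Bridge accepts, so the function name and the quoting convention are assumptions.

```python
def to_boolean_query(term_sets):
    """Join term sets with OR; join the terms inside each set with AND."""
    clauses = []
    for terms in term_sets:
        quoted = ['"{}"'.format(term) for term in terms]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            clauses.append("(" + " AND ".join(quoted) + ")")
    return "(" + " OR ".join(clauses) + ")"


query = to_boolean_query([
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
])
print(query)
# ("Hydrogeology" OR "Dnapl" OR ("Colloid*" AND "Environmental Transport"))
```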

CS3 Hydrogeology Rule Base

CS3 search results

Deep Web Sources Also referred to as the hidden Web or invisible Web. Resides behind search forms in databases, e.g. monster.com, louisiana1st.com, PubMed. Web pages in the deep Web are generated dynamically based on the submitted queries. Not indexed by current search engines. Search engines index content on the surface Web.

Deep Web Sources and Concept-based Retrieval Deep Web in terms of size and quality: Size (Deep Web) = 500 * Size (Surface Web). Quality (Deep Web) = 1000 * Quality (Surface Web). Queries submitted at deep Web sources are more stable than queries submitted to search engines. So, naturally, concept-based retrieval is more suitable for deep Web sources.

Current and Future Work Conduct experiments to evaluate effectiveness (future). Investigate alternative methods to compute RSVs [KADR00, KDR01*]. Learning edge weights through relevance feedback [KR00]. Thesaurus-based rule-base generation [KLR00].

Relevant URLs [LJRT99*]: RaghavanHome, Publications since 1991. www.allinonenews.com