Presentation transcript:

1 MARG-DARSHAK: A Scrapbook on Web Search engines allow the users to enter keywords relating to a topic and retrieve information about internet sites (URLs) containing those keywords. Although search engines (e.g. Excite, Altavista) attempt to organize and rank information pertaining to a user’s search, they fail to group these results and form relationships between them.

2 Our goal is to develop an advanced search engine that makes the search results easy to browse by grouping, organizing and relating them. MARG-DARSHAK returns a graph containing a set of nodes and edges, where each node contains one or more URLs and the types of relationships between nodes are associated with each edge.
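The result graph described on this slide can be sketched as a small data structure. This is a minimal illustration only: the names `Node`, `Edge` and `ResultGraph`, the use of directed edges, and the example URLs are assumptions, since the slides do not say how the graph is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A group of search results; holds one or more URLs."""
    node_id: int
    urls: List[str]


@dataclass
class Edge:
    """A typed relationship from one node to another (direction is assumed)."""
    source: int
    target: int
    relation: str  # e.g. "Similar-To", "Example-Of", "Part-Of"


@dataclass
class ResultGraph:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: int, urls: List[str]) -> None:
        if not urls:
            raise ValueError("each node must contain at least one URL")
        self.nodes[node_id] = Node(node_id, list(urls))

    def add_edge(self, source: int, target: int, relation: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must already be in the graph")
        self.edges.append(Edge(source, target, relation))

    def related(self, node_id: int, relation: str) -> List[int]:
        """Return ids of nodes linked from node_id by the given relation."""
        return [e.target for e in self.edges
                if e.source == node_id and e.relation == relation]


# Hypothetical example data.
graph = ResultGraph()
graph.add_node(1, ["http://example.org/a", "http://example.org/b"])
graph.add_node(2, ["http://example.org/c"])
graph.add_edge(1, 2, "Similar-To")
```

Calling `graph.related(1, "Similar-To")` lists the groups that the first group is similar to.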

3 Issues: how to group the web pages, how to rank them, how to determine the relationships among web pages, and how to ensure the reliability of data. Currently, we use a meta-search engine to retrieve the initial results.

4 [Architecture diagram. Components: User Interface, Meta-search Engine, Grouping of Results, Ranking, Relationships, Results Display, Web Database, Web QL.]

5 Reliability of Data in WWW Information about a particular topic is available at many web sites. How do we know that the information provided on the web is reliable, up-to-date and accurate?

6 Evaluation Criteria (for a search engine): Accuracy, Coverage, Currency, Ownership, Objectivity, Authority.

7 Methods to Ensure Reliability: ‘Last Update’ method, ‘Majority Basis’ method, ‘Polling’ method, ‘Query Driven’ method, ‘Home (Official) Site’ method.

8 User Interface: to accept user input, to display results, and to refine user input.

9 Web Query Language: to select the data pertaining to the user’s query, to eliminate redundant data, and to manipulate web data.

10 Grouping Documents: A search engine returns several web sites that contain the same information. These results need to be grouped based on keywords and a similarity measure. (Sub)keywords are provided by the user, or an ontology can be used.

11 Grouping is based on phrases common to many documents; we use the suffix tree clustering algorithm (O. Zamir and O. Etzioni, “Web Document Clustering: A Feasibility Demonstration”, SIGIR’98). This algorithm has three steps: (1) clean the documents, (2) identify document base(s), (3) merge these document base(s).

12 A suffix tree is created from the plain documents, where each document is treated as a string. Strings are decomposed into words. Each leaf node contains a list of all the documents that contain the concatenation of all the strings on the path from the root node to that leaf node.
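The idea of mapping each shared phrase to the documents that contain it can be illustrated with a deliberately simplified sketch. Note the substitution: instead of building a suffix tree, the code below enumerates word n-grams directly, which gives the same phrase-to-documents mapping for short phrases but without the efficiency of a suffix tree. The function name `base_clusters`, the parameters `max_len` and `min_docs`, and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared phrase to the set of document ids containing it.

    Simplification: word n-grams of up to max_len words are enumerated
    directly rather than being read off a suffix tree.
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            for length in range(1, max_len + 1):
                if start + length > len(words):
                    break
                phrase = tuple(words[start:start + length])
                phrase_docs[phrase].add(doc_id)
    # Keep only phrases shared by at least min_docs documents.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


# Hypothetical example documents.
docs = ["cat ate cheese", "mouse ate cheese too", "cat ate mouse too"]
clusters = base_clusters(docs)
```

Here the phrase `("ate", "cheese")` maps to the first two documents, so it would form one document base.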

13 Merge these document base(s) (leaf nodes) into larger groups based on the ratio of their intersecting documents to the total number of documents that they contain. Two document bases B1 and B2 are merged if they satisfy the following two conditions: (1) |B1 ∩ B2| / |B1| > m and (2) |B1 ∩ B2| / |B2| > m, where 0 ≤ m ≤ 1 is the merging threshold value.
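The two merge conditions can be checked directly on sets of document ids. The sketch below is a minimal illustration under stated assumptions: the function names, the default threshold `m=0.5`, and the choice to merge transitively (grouping bases into connected components with a small union-find) are not specified in the slides.

```python
def should_merge(b1, b2, m=0.5):
    """True if the overlap exceeds m relative to both (non-empty) bases."""
    overlap = len(b1 & b2)
    return overlap / len(b1) > m and overlap / len(b2) > m


def merge_bases(bases, m=0.5):
    """Union all bases that are linked, directly or transitively, by should_merge."""
    parent = list(range(len(bases)))

    def find(i):
        # Follow parent pointers to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bases)):
        for j in range(i + 1, len(bases)):
            if should_merge(bases[i], bases[j], m):
                parent[find(i)] = find(j)

    groups = {}
    for i, base in enumerate(bases):
        groups.setdefault(find(i), set()).update(base)
    return list(groups.values())


# Hypothetical document bases, given as sets of document ids.
bases = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
merged = merge_bases(bases)
```

With these inputs the first two bases share two of their three documents (2/3 > 0.5 in both directions) and are merged, while the third stays on its own.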

14 The shared phrases of a group provide an informative way of summarizing its contents to the user. To identify redundant phrases, we adopt the ‘coverage’ method; that is, the domain of the user’s topic of interest.

15 Ordering Document Groups: Groups are ordered by their relevance to the query; prior knowledge of the domain of the search problem is a must. Ordering can be done within a group and among groups. Documents can be indexed by the occurrence of keywords.
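Indexing documents by keyword occurrence and ordering them by relevance to a query might look like the following. This is only a sketch under assumptions: it scores a document by the raw count of query keywords it contains, which is one simple stand-in for relevance, and the names `build_index` and `rank` and the sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: keyword -> {doc_id: occurrence count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][doc_id] = count
    return index


def rank(index, query):
    """Order documents by total occurrences of the query keywords."""
    scores = Counter()
    for word in query.lower().split():
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical example documents.
docs = {
    "a": "web search engines rank web pages",
    "b": "clustering web documents",
    "c": "cooking recipes",
}
index = build_index(docs)
ranking = rank(index, "web search")
```

The same scoring could be applied within a group (to order its documents) or to group totals (to order the groups themselves).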

16 Defining Relationships between Documents: Information about a topic is scattered across the web, and relationships exist among web pages based on their contents. Relationships can be defined on static web pages. Examples: Similar-To, Example-Of, Next-To, Previous-To, Derived-From, Same-As, Part-Of.

17 Similar-To: Two documents are similar to each other if they have the same semantic meaning. Document analysis work can be adopted to find this relationship, based on common or similar words, occurrences of these words in the same order, and a similarity measure.
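One possible way to combine common words and word order into a similarity measure is sketched below. The slides do not name a specific measure; the Jaccard coefficient, the use of adjacent word pairs to capture order, the equal weighting, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def word_set(text):
    return set(text.lower().split())


def bigram_set(text):
    """Adjacent word pairs, used here to capture shared word order."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def similarity(t1, t2):
    """Average of word overlap and word-order (bigram) overlap."""
    word_sim = jaccard(word_set(t1), word_set(t2))
    order_sim = jaccard(bigram_set(t1), bigram_set(t2))
    return (word_sim + order_sim) / 2


def similar_to(t1, t2, threshold=0.5):
    return similarity(t1, t2) >= threshold


# Hypothetical example texts.
doc_a = "suffix tree clustering of web documents"
doc_b = "suffix tree clustering of web pages"
doc_c = "cooking recipes"
```

With these texts, `similar_to(doc_a, doc_b)` holds while `similar_to(doc_a, doc_c)` does not.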

18 Example-Of: Find the presence of ‘e.g.’, ‘example of’, ‘explains’, etc. before or after certain keywords. A web page u is an example of web page v if there are at least m references from v to u.
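The link-count criterion (at least m references from v to u) can be sketched with Python's standard `html.parser` module. This is a minimal illustration: it counts only `<a href>` links whose target matches u's URL exactly, and the names `LinkCollector`, `count_references` and `is_example_of`, the default `m=2`, and the sample page are assumptions.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def count_references(html_of_v, url_of_u):
    """Number of links in page v that point exactly at url_of_u."""
    collector = LinkCollector()
    collector.feed(html_of_v)
    return sum(1 for href in collector.hrefs if href == url_of_u)


def is_example_of(html_of_v, url_of_u, m=2):
    """Page u is treated as an example of page v if v links to u at least m times."""
    return count_references(html_of_v, url_of_u) >= m


# Hypothetical page v with two links to u and one link elsewhere.
page_v = (
    '<p>See <a href="http://example.org/u">this</a> and '
    '<a href="http://example.org/u">again</a>, or '
    '<a href="http://example.org/w">other</a>.</p>'
)
```

A fuller version would also normalize relative and equivalent URLs before comparing them, which this sketch does not attempt.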

19 Next-To, Previous-To. Next-To: search for words like ‘refer to’, ‘further reading’, ‘more information’, etc. Previous-To: this relationship is hard to determine because it is difficult to know which documents link to a given web page.