Methods and Apparatus for Ranking Web Page Search Results

Methods and Apparatus for Ranking Web Page Search Results
Alireza Abbasi (abbasi@snu.ac.kr)
406.534 Industrial Information Technology, Patent Presentation
Information Technology Policy Program, Seoul National University

Bibliographic Details
- Title: Methods and Apparatus for Ranking Web Page Search Results
- Inventor: Andrei Z. Broder, Menlo Park, CA
- Assignee: Alta Vista Company, Palo Alto, CA (US)
- Filed Date: Oct. 25, 2000

Contents
- Abstract
- Background of Invention
  - Field of Invention
  - Related Art
- Summary of Invention
- Advantages & Applications
- Detailed Description

Abstract
- Ranks the quality of pages
- Forms a linear combination of two or more matrices
- Uses the coefficients of the eigenvector of the resulting matrix

Field of Invention
- The invention relates generally to computerized information retrieval
- Identifying related pages in a hyperlinked database environment (the Web)
- The challenge is to retrieve only the resources most relevant to the query

Related Art (1)
Kleinberg algorithm:
- Identifies ‘hub’ and ‘authority’ pages in a neighborhood graph for a query
- Determines related pages starting from a single page
Problems:
- Does not deal with popular URLs
- Does not analyze the contents of pages when computing the most related pages
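
The hub/authority idea on this slide can be sketched as a mutual-reinforcement iteration. This is only an illustrative sketch: the function name `hits`, the edge-list input format, the iteration count, and the sample graph are assumptions made for the example, not details taken from the slides or the patent.

```python
def hits(edges, n, iterations=50):
    """Return (hub, authority) score lists for pages 0..n-1.

    edges is a list of (source, target) pairs; this input format is an
    assumption chosen for the sketch.
    """
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = [0.0] * n
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the authority scores of pages that are linked to.
        new_hub = [0.0] * n
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize (Euclidean norm) so the scores stay bounded.
        a_norm = sum(x * x for x in new_auth) ** 0.5 or 1.0
        h_norm = sum(x * x for x in new_hub) ** 0.5 or 1.0
        auth = [x / a_norm for x in new_auth]
        hub = [x / h_norm for x in new_hub]
    return hub, auth


# Hypothetical 4-page neighborhood graph.
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (3, 2)]
hub, auth = hits(edges, 4)
```

In this toy graph page 2 receives the most links from good hubs, so it ends up with the largest authority score, while pages 0 and 1 are the strongest hubs.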

Related Art (2)
Google search engine: uses ‘PageRank’ to prioritize the results.
PageRank:
- Page A has pages T1…Tn which point to it
- d: damping factor in [0, 1]
- C(A): number of links going out of page A
- PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
- Sum of all pages’ PageRank = 1
Problems:
- Ranking is independent of the search query
- No provision for externally evaluating sites
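
A small sketch of the PageRank iteration may help make the formula concrete. Note the assumptions: this version uses the normalized term (1-d)/N instead of (1-d), so that the scores sum to 1 as the slide states, and it spreads the rank of pages without outgoing links evenly over all pages. The function name `pagerank`, the default d = 0.85, and the sample link lists are illustrative choices, not taken from the slides.

```python
def pagerank(out_links, d=0.85, iterations=100):
    """out_links[p] lists the pages that page p links to (assumed format)."""
    n = len(out_links)
    pr = [1.0 / n] * n
    for _ in range(iterations):
        # Assumption: rank held by pages with no out-links is shared by all.
        dangling = sum(pr[p] for p in range(n) if not out_links[p])
        new_pr = [(1.0 - d) / n + d * dangling / n] * n
        for page, targets in enumerate(out_links):
            for t in targets:
                # Each page passes PR(page)/C(page) along every out-link.
                new_pr[t] += d * pr[page] / len(targets)
        pr = new_pr
    return pr


# Hypothetical 4-page web: page 3 has no incoming links.
links = [[1, 2], [2], [0], [2]]
ranks = pagerank(links)
```

The result depends only on the link structure, which illustrates the first problem listed on the slide: the ranking is the same for every query.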

Related Art (3)
R. Lempel, S. Moran: ‘SALSA’ (Stochastic Approach for Link-Structure Analysis)
- Replaces Kleinberg’s mutual reinforcement with a stochastic method
- The coupling between hubs and authorities is less tight
- Based on Markov chains
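
One way to picture the Markov-chain view is a random walk that alternates between stepping backward along an in-link and forward along an out-link. The sketch below follows that idea for authority scores only; the function name `salsa_authority`, the edge-list format, the uniform starting distribution, and the sample graph are assumptions made for illustration.

```python
def salsa_authority(edges, n, iterations=100):
    """Approximate the stationary distribution of the authority walk."""
    in_deg = [0] * n
    out_deg = [0] * n
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    authorities = [i for i in range(n) if in_deg[i] > 0]
    # Start uniformly on pages that have at least one in-link.
    score = [1.0 / len(authorities) if in_deg[i] > 0 else 0.0
             for i in range(n)]
    for _ in range(iterations):
        # Step backward: each authority splits its mass over its in-links.
        hub_mass = [0.0] * n
        for src, dst in edges:
            hub_mass[src] += score[dst] / in_deg[dst]
        # Step forward: each hub splits that mass over its out-links.
        new_score = [0.0] * n
        for src, dst in edges:
            new_score[dst] += hub_mass[src] / out_deg[src]
        score = new_score
    return score


# Hypothetical graph: page 2 has three in-links, page 3 has two.
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (3, 2)]
scores = salsa_authority(edges, 4)
```

For this connected example the walk settles on scores proportional to in-degree (3/5 for page 2 and 2/5 for page 3).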

Summary of Invention (1)
- The invention provides “a method whereby a linear combination of matrices (pages’ information) can be used to rank the pages”
- Returns results that are highly relevant to the user’s search
- The coefficients of the eigenvector provide a measure of the quality of each page in relation to the other pages
- Ranking categories are determined based on the number of pages to be ranked
- Each page is classified into one of the categories
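
The core idea, ranking by the eigenvector of a linear combination of matrices, can be sketched as follows. Everything specific here is an assumption for illustration: the function names, the weights 0.7 and 0.3, the two small symmetric matrices, and the use of power iteration to approximate the principal eigenvector. The slides do not say how the patent computes the eigenvector or chooses the weights.

```python
def combine(matrices, weights):
    """Weighted sum of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]


def principal_eigenvector(matrix, iterations=200):
    """Power iteration; assumes a non-negative matrix with a dominant eigenvalue."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(abs(x) for x in w)
        if total == 0:
            break
        v = [x / total for x in w]  # keep the coefficients summing to 1
    return v


def rank_pages(matrices, weights):
    """Return (page order, eigenvector coefficients), best page first."""
    scores = principal_eigenvector(combine(matrices, weights))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Two hypothetical 3-page matrices combined with illustrative weights.
a = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
b = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
order, scores = rank_pages([a, b], [0.7, 0.3])
```

Changing the weights shifts how much each source of information contributes to the final coefficients, which is what lets several kinds of page information feed one ranking.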

Summary of Invention (2)
- A fixed amount of storage represents the rank of each page
- Each bit represents one of the categories
- The bit assigned to a page gives the rank of that page
- The eigenvector coefficients of neighboring pages can be used to generate a hub score
=> Only a small amount of storage and computational resources is needed

Detailed Description
Improved Ranking Method
Fig. 1: Block diagram of a hyperlinked environment

A flow diagram of a method for ranking pages

Determine Matrices to Include in Linear Combination (202)
An example of matrices that may be used in the method

Building Neighborhood Graph
Assume that:
- Related pages will tend to be ‘near’ the selected page
- The same keywords appear as part of the content of related pages
An initial page is selected, and the pages linked to it are represented as a graph in memory.
Patent: ‘Method for identification related page in a hyperlinked database’

Building Adjacency Matrix
- a collection ‘C’ of web sites
- a given topic ‘t’
- a root set ‘S’ of sites
- a search engine query ‘q’
From S, a base collection C is built, which consists of:
- Sites in the root set S
- Sites which point to a site in S (found by using a search engine that stores linkage information; patent ‘Web Page Connectivity Server’)
- Sites which are pointed to by a site in S
C and its link structure form a directed graph G:
- A directed edge ‘ij’ appears in G if site ‘i’ contains a link to site ‘j’
- The |C| x |C| matrix is the adjacency matrix of G
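
The construction on this slide can be sketched in a few lines. The function names `base_set` and `adjacency_matrix`, the (source, target) link format, and the single-letter site names are hypothetical and chosen only for the example; in practice the links pointing into S would come from a search engine that stores linkage information.

```python
def base_set(root, links):
    """Expand root set S into collection C using known (source, target) links."""
    root = set(root)
    base = set(root)
    for src, dst in links:
        if dst in root:
            base.add(src)  # src points to a site in S
        if src in root:
            base.add(dst)  # dst is pointed to by a site in S
    return sorted(base)


def adjacency_matrix(sites, links):
    """|C| x |C| matrix with g[i][j] = 1 if site i links to site j."""
    index = {site: i for i, site in enumerate(sites)}
    n = len(sites)
    g = [[0] * n for _ in range(n)]
    for src, dst in links:
        # Links that leave the collection C are ignored.
        if src in index and dst in index:
            g[index[src]][index[dst]] = 1
    return g


# Hypothetical link data with root set S = {"a"}.
links = [("a", "b"), ("c", "a"), ("b", "d"), ("d", "e")]
sites = base_set(["a"], links)
g = adjacency_matrix(sites, links)
```

Here the collection becomes a, b and c; the link from b to d is dropped because d is neither in S nor directly connected to it.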

Determining Attractor Matrices
- Based on an indication provided by a viewer
- Or on a computerized utility program that analyzes content and recognizes keywords, key phrases, page links, …
- Can be computed at runtime, or offline and updated periodically
- Co-citation (G^T G): number of sites that jointly cite the pages indexed by i and j
- Bibliographic coupling (G G^T): number of sites jointly referred to by the pages indexed by i and j
- These matrices can be included in the linear combination
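
Co-citation and bibliographic coupling follow directly from the adjacency matrix G, as the sketch below shows. The helper names and the sample matrix are assumptions for illustration; plain nested lists are used so the example has no dependencies.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def cocitation(g):
    """(G^T G)[i][j]: number of sites that link to both i and j."""
    return matmul(transpose(g), g)


def bibliographic_coupling(g):
    """(G G^T)[i][j]: number of sites that both i and j link to."""
    return matmul(g, transpose(g))


# Hypothetical graph: sites 0 and 1 both link to 2 and 3; site 2 links to 3.
g = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
cc = cocitation(g)
bc = bibliographic_coupling(g)
```

In this example `cc[2][3]` is 2 because sites 0 and 1 cite both pages, and `bc[0][1]` is 2 because sites 0 and 1 share two outgoing targets.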

A flow diagram of a method for ranking pages according to eigenvector coefficients

Ranking According to Eigenvector Coefficient
Minimal storage space:
- The neighborhood graph may be very large
- Power-law distribution: the number of sites whose eigenvector coefficient has a value less than a chosen number
Generating a hub score for one or more pages:
- Based on the sum (or a function of the sum) of the eigenvector coefficients of neighboring pages
- Gives an indication of the quality of a page as a hub or directory of other pages
- Provides information that is valuable to the user
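
The hub score described here can be sketched as a plain sum over neighbors. This sketch assumes that "neighboring pages" means the pages a given page links to, and it uses the simple sum rather than some other function of it; the function name `hub_scores` and the sample values are illustrative only.

```python
def hub_scores(g, coefficients):
    """Sum the eigenvector coefficients of the pages each page links to.

    g is an adjacency matrix (g[i][j] = 1 if page i links to page j).
    Treating out-linked pages as the neighbors is an assumption.
    """
    return [sum(coefficients[j] for j, linked in enumerate(row) if linked)
            for row in g]


# Hypothetical 3-page graph and eigenvector coefficients.
g = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
coefficients = [0.5, 0.25, 0.25]
hubs = hub_scores(g, coefficients)
```

Page 0 links to both other pages and so collects the highest hub score, even though its own coefficient plays no part in the sum.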

Ranking According to Eigenvector Coefficient
Example: result of a query with 0.5 billion pages (distributed geometrically)
- 1st category: highest-ranked pages (50 pages)
- 2nd category: next highest-ranked pages (a geometric multiple of 50 pages)
- …
Each page is assigned to a category by designating a corresponding bit from a multi-bit word.
If 10 bits per page are allotted, 1024 categories are available.
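
The category scheme in this example can be sketched as follows. The slide gives the first category size (50 pages) and the 10-bit word, but not the geometric ratio, so the ratio of 2 used here is purely an assumption, as are the function names and the 0-based rank positions.

```python
def categories_available(bits):
    """Number of distinct categories a word of the given width can encode."""
    return 1 << bits


def category_of(position, first_size=50, ratio=2.0, max_categories=1024):
    """Map a 0-based rank position to a category index.

    Category 0 holds the top first_size pages; each following category is
    ratio times larger. The ratio is an assumed value for illustration.
    """
    size = float(first_size)
    upper = size  # number of pages covered by categories 0..category
    category = 0
    while position >= upper and category < max_categories - 1:
        size *= ratio
        upper += size
        category += 1
    return category


top = category_of(0)              # best page
second = category_of(50)          # first page outside the top 50
last = category_of(499_999_999)   # last of 0.5 billion pages
```

With these assumed parameters the top 50 pages fall in category 0, the next 100 in category 1, and even the last of 0.5 billion pages lands in category 23, well within the 1024 categories that 10 bits allow.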

Advantages & Applications
- The present invention is capable of being distributed as a program product in a variety of forms
- Each block diagram component, flowchart step, etc. can be implemented by a wide range of hardware, software, firmware or …

Thank you & Best wishes