Authoritative Sources in a Hyperlinked Environment Jon M. Kleinberg ACM-SIAM Symposium, 1998 Krishna Venkateswaran

Presentation transcript:


Basic Idea
• The root set R is grown into a base set S that contains many authoritative pages.
• Any page that is pointed to by a page in R is added to S.
• R: the root set, containing t results.
• S: the base set generated by the algorithm. S is used to determine the hubs and authorities.

• Get a set of results for the query string from a text-based search engine.
• Take the top t results and put them in a set R. Initialize S to R.
• For every page in R:
  ◦ Add all the pages that the page points to into S.
  ◦ Add at most d of the pages that point to the page into S.
• Result returned: the base set S, from which the top authorities and hubs are computed.
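The construction above can be sketched in Python as follows. This is a minimal sketch, not the paper's code: the function name `build_base_set` and the dictionaries `out_links` and `in_links` are hypothetical stand-ins for whatever link index is available, S is assumed to start as a copy of R, and taking the first d in-linking pages in sorted order is only one possible way to pick d of them.

```python
def build_base_set(root, out_links, in_links, d):
    """Grow the root set R into the base set S.

    root      -- page ids of the top-t text-search results (the set R)
    out_links -- hypothetical dict: page -> pages it points to
    in_links  -- hypothetical dict: page -> pages that point to it
    d         -- maximum number of in-linking pages added per root page
    """
    base = set(root)  # assumption: S starts as a copy of R
    for page in root:
        # Add every page that this root page points to.
        base.update(out_links.get(page, ()))
        # Add at most d pages that point to this root page.
        # sorted() only makes the choice deterministic for the example.
        pointing = sorted(in_links.get(page, ()))
        base.update(pointing[:d])
    return base


# Toy data, invented for illustration.
out_links = {"a": ["x", "y"], "b": ["y"]}
in_links = {"a": ["h1", "h2", "h3"], "b": ["h1"]}
S = build_base_set(["a", "b"], out_links, in_links, d=2)
```

With d=2, page "a" contributes only two of its three in-linking pages, which keeps a very popular root page from flooding S.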

Heuristics
To determine what pages to add to the set S.
• Heuristic 1: Avoiding navigational links.
  ◦ Transverse links: links between pages with different domain names.
  ◦ Intrinsic links (navigational links): links between pages within the same domain.
  ◦ Delete all intrinsic links.
• Heuristic 2: Avoiding mass endorsements.
  ◦ Mass endorsement: a large number of pages in one domain pointing to a single page.
  ◦ Example: "This site is designed by …" and a link.
  ◦ Eliminate this by setting a parameter m and allowing only m pages from a single domain to point to a page.
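Both heuristics can be applied as a single pass over the link list. The sketch below is illustrative only: `filter_links` and `domain` are hypothetical names, and treating the URL host name as the "domain" is an assumption, since the slide does not say how domain names are compared.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain(url):
    # Assumption: a page's "domain" is the host name of its URL.
    return urlparse(url).hostname


def filter_links(links, m):
    """Drop intrinsic links and cap endorsements per domain.

    links -- list of (source_url, target_url) pairs
    m     -- max number of pages from one domain allowed to point to a page
    """
    kept = []
    per_domain = defaultdict(int)  # (source domain, target) -> links kept
    for src, dst in links:
        src_dom = domain(src)
        if src_dom == domain(dst):
            continue  # Heuristic 1: intrinsic (navigational) link
        if per_domain[(src_dom, dst)] >= m:
            continue  # Heuristic 2: mass endorsement from one domain
        per_domain[(src_dom, dst)] += 1
        kept.append((src, dst))
    return kept


# Toy URLs, invented for illustration.
links = [
    ("http://a.com/1", "http://a.com/2"),  # intrinsic, dropped
    ("http://a.com/1", "http://b.com/"),
    ("http://a.com/2", "http://b.com/"),
    ("http://a.com/3", "http://b.com/"),   # third link from a.com, dropped
    ("http://c.com/", "http://b.com/"),
]
kept = filter_links(links, m=2)
```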

• Authorities are extracted from the overall collection of pages through an analysis of the link structure of G.
• A good hub points to many good authorities, and a good authority is pointed to by many good hubs.
[Figure: hubs, authorities, and an unrelated page of large in-degree]

Basic Idea
• Each page p has a non-negative authority weight and a non-negative hub weight.
• If p points to pages with large authority weights, then p has a large hub weight.
• If p is pointed to by pages with large hub weights, then p has a large authority weight.
• Pages with higher weights are better authorities and hubs.

• I operation:
  ◦ Authority weight of a page = sum of the hub weights of all pages pointing to it.
• O operation:
  ◦ Hub weight of a page = sum of the authority weights of all pages it points to.
• I and O reinforce each other.
• Normalization: the hub weights and the authority weights are each divided by a constant so that the sum of their squares equals 1.
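Written as formulas, the two operations and the normalization look roughly as follows. Here x[p] is the authority weight and y[p] the hub weight of page p, as on the later slides; E, the set of links (q, p) meaning "q points to p", is notation introduced only for this sketch.

```latex
\begin{align*}
\text{I operation:}\quad & x[p] \leftarrow \sum_{q \,:\, (q,p) \in E} y[q] \\
\text{O operation:}\quad & y[p] \leftarrow \sum_{q \,:\, (p,q) \in E} x[q] \\
\text{Normalization:}\quad & \sum_{p} x[p]^2 = 1, \qquad \sum_{p} y[p]^2 = 1
\end{align*}
```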

Contd...
[Figure: Operation I: pages q1, q2, q3 point to page p, and x[p] = sum of all y[q]. Operation O: page p points to pages q1, q2, q3, and y[p] = sum of all x[q].]
Deciding when to stop the reinforcing process:
1) Apply the I and O operations alternately until a fixed point is reached.
2) Choose a parameter k and repeat the process only k times.

• Given the set of pages in the form of a graph, choose an integer value for the parameter k.
• k is the number of iterations.
• Repeat the following process k times:
  ◦ Apply the I operation to every page to update its authority weight.
  ◦ Apply the O operation to every page to update its hub weight.
  ◦ Normalize both the authority weights and the hub weights.
• Return the graph with the new authority weight and hub weight for each page.
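The Iterate procedure can be sketched in a few lines of Python. This is a hedged illustration rather than the author's code: the names `iterate` and `normalize` are hypothetical, every weight is assumed to start at 1, and the O operation is assumed to use the authority weights just produced by the I operation in the same round.

```python
import math


def normalize(weights):
    # Scale the weights so that the sum of their squares is 1.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: w / norm for page, w in weights.items()}


def iterate(pages, links, k):
    """Run k rounds of the I and O operations.

    pages -- iterable of page ids
    links -- list of (source, target) pairs, meaning source points to target
    Returns (x, y): authority weights and hub weights.
    """
    x = {p: 1.0 for p in pages}  # authority weights (assumed start: all 1)
    y = {p: 1.0 for p in pages}  # hub weights (assumed start: all 1)
    for _ in range(k):
        # I operation: authority = sum of hub weights of in-linking pages.
        new_x = {p: 0.0 for p in pages}
        for src, dst in links:
            new_x[dst] += y[src]
        # O operation: hub = sum of (new) authority weights of linked pages.
        new_y = {p: 0.0 for p in pages}
        for src, dst in links:
            new_y[src] += new_x[dst]
        x, y = normalize(new_x), normalize(new_y)
    return x, y


# Toy graph, invented for illustration: two hubs and two authorities.
pages = ["h1", "h2", "a1", "a2"]
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
x, y = iterate(pages, links, k=20)
```

In this toy graph a1 is pointed to by both hubs, so it ends up with a larger authority weight than a2, and h1, which points to both authorities, ends up with a larger hub weight than h2.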

Observations
• The top authorities and hubs are the pages with the top c values of x and y in the graph returned by the Iterate algorithm.
• The Iterate procedure converges to fixed points x* and y* as k increases arbitrarily.
  ◦ Proved using principal eigenvectors.
• The Iterate algorithm yields a densely linked collection of pages that is rich in relevant pages.
  ◦ The most relevant collection of pages is the densest graph.
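The eigenvector argument can be summarized in matrix form. The symbols A and z below are not on the slides and are introduced only for this sketch: A is the adjacency matrix of the graph (A_ij = 1 if page i points to page j, else 0) and z is the all-ones starting vector. The limits exist under the usual power-iteration assumptions (a dominant eigenvalue, and a starting vector not orthogonal to its eigenvector).

```latex
\begin{align*}
\text{I operation:}\quad & x \leftarrow A^{T} y, \qquad
\text{O operation:}\quad y \leftarrow A x \\
x_k &\propto (A^{T} A)^{k-1} A^{T} z, \qquad
y_k \propto (A A^{T})^{k} z \\
x^{*} &= \text{principal eigenvector of } A^{T} A, \qquad
y^{*} = \text{principal eigenvector of } A A^{T}
\end{align*}
```

Iterate is therefore power iteration on A^T A and A A^T, which is why a fixed number of rounds k is usually enough in practice.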

Results
(java) Authorities: Gamelan; JavaSoft Home Page http://java.sun.com/; The Java Developer: HowDoI; The Java Book http://lightyear.ncsa.uiuc.edu/srp/java/javabooks.html
("search engines") Authorities: Yahoo!; Excite; Lycos Home Page; AltaVista: Main Page
(Gates) Authorities: Bill Gates: The Road Ahead; Welcome to Microsoft
• It was observed that only one site was present in R initially.
• This supports the algorithm, because many of the pages do not contain the search query themselves.