Authoritative Sources in a Hyperlinked Environment Jon M. Kleinberg ACM-SIAM Symposium, 1998 Krishna Venkateswaran 1.


1 Authoritative Sources in a Hyperlinked Environment. Jon M. Kleinberg, ACM-SIAM Symposium, 1998. Krishna Venkateswaran

2 Basic Idea
• The root set R is grown into a base set S so that S contains a rich collection of authoritative pages.
• Add to S any page that is pointed to by a page in R.
• R: root set, containing the top t results.
• S: base set generated by the algorithm.
• S is used to determine the hubs and authorities.

3
• Get a set of results for the query string from a text-based search engine.
• Take the top t results and put them in a set R; initialize S to R.
• For every page p in R:
  ◦ Add to S all the pages that p points to.
  ◦ Add to S at most d of the pages that point to p.
• The resulting set is S.
Result returned: the base set S, from which the top authorities and hubs are computed.
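The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_base_set` and the dictionaries `out_links` and `in_links` are hypothetical stand-ins for whatever link data a crawler or search engine would supply, and taking the first d in-linking pages in sorted order is a choice made here for determinism, not something the slides prescribe.

```python
def build_base_set(ranked_results, t, out_links, in_links, d):
    """Grow the root set R (top t results) into the base set S.

    ranked_results: list of pages, best first (hypothetical input).
    out_links[p]:   pages that p points to (hypothetical input).
    in_links[p]:    pages that point to p (hypothetical input).
    d:              maximum number of in-linking pages added per root page.
    """
    root = ranked_results[:t]          # root set R
    base = set(root)                   # S starts out equal to R
    for page in root:
        # Add every page that this root page points to.
        base.update(out_links.get(page, ()))
        # Add at most d pages that point to this root page.
        # Sorting is an assumption made only to keep the sketch deterministic.
        pointing_in = sorted(in_links.get(page, ()))
        base.update(pointing_in[:d])
    return base


# Small made-up example: two root pages, d = 2.
example = build_base_set(
    ["r1", "r2", "r3"], 2,
    out_links={"r1": ["a", "b"], "r2": ["b"]},
    in_links={"r1": ["h1", "h2", "h3"], "r2": ["h4"]},
    d=2,
)
print(sorted(example))
```

Capping the in-linking pages at d keeps a single very popular root page from flooding S with thousands of referrers.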

4 Heuristics
To determine which pages and links to keep in the set S.
• Heuristic 1: Avoiding navigational links.
  ◦ Transverse links: links between pages with different domain names.
  ◦ Intrinsic links (navigational links): links between pages within the same domain.
  ◦ Delete all intrinsic links.
• Heuristic 2: Avoiding mass endorsements.
  ◦ Mass endorsement: a large number of pages in one domain pointing to a single page.
  ◦ Example: "This site is designed by …" followed by a link.
  ◦ Eliminate this by setting a parameter m and allowing only m pages from a single domain to point to any given page.
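Both heuristics amount to a filter over the list of links. The sketch below is one possible reading, not the method used in the paper or the slides: `filter_links` is a hypothetical helper, the "domain" of a page is taken to be the hostname returned by `urllib.parse.urlparse` (an assumption), and the input is assumed to contain each link only once.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain(url):
    """Hostname of a URL, used here as a stand-in for its domain name."""
    return urlparse(url).hostname or ""


def filter_links(edges, m):
    """Apply both heuristics to a list of (source_url, target_url) links.

    Heuristic 1: drop intrinsic links (source and target in the same domain).
    Heuristic 2: keep at most m links from any one domain to a given page.
    """
    kept = []
    per_domain = defaultdict(int)      # (source domain, target page) -> links kept
    for src, dst in edges:
        if domain(src) == domain(dst):
            continue                   # intrinsic (navigational) link
        key = (domain(src), dst)
        if per_domain[key] >= m:
            continue                   # mass endorsement beyond the limit m
        per_domain[key] += 1
        kept.append((src, dst))
    return kept


# Made-up example with m = 2.
links = [
    ("http://a.com/1", "http://a.com/2"),   # intrinsic, dropped
    ("http://a.com/1", "http://b.com/"),
    ("http://a.com/2", "http://b.com/"),
    ("http://a.com/3", "http://b.com/"),    # third link from a.com, dropped
    ("http://c.com/", "http://b.com/"),
]
print(filter_links(links, m=2))
```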

5
• Authorities are extracted from the overall collection of pages through an analysis of the link structure of G.
• A good hub points to many good authorities, and a good authority is pointed to by many good hubs.
[Figure: hubs, authorities, and an unrelated page of large in-degree]

6 Basic Idea
• Each page p has a non-negative authority weight and a non-negative hub weight.
• If p points to pages with large authority weights, then p receives a large hub weight.
• If p is pointed to by pages with large hub weights, then p receives a large authority weight.
• Pages with larger authority weights are regarded as better authorities, and pages with larger hub weights as better hubs.

7
• I operation:
  ◦ Authority weight of a page = sum of the hub weights of all pages pointing to it.
• O operation:
  ◦ Hub weight of a page = sum of the authority weights of all pages it points to.
• I and O reinforce each other.
• Normalization: after each update, the authority weights and the hub weights are each divided by a constant so that the sum of their squares equals 1.
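Written out as formulas, the two operations and the normalization look as follows. Here x[p] denotes the authority weight of page p and y[p] its hub weight; E, the set of links of the graph G, and the pair notation (q, p) for "q points to p" are notational assumptions introduced for this sketch.

```latex
% I operation: authority weight of p from the hub weights of pages pointing to p
x[p] \;\leftarrow\; \sum_{q \,:\, (q,p) \in E} y[q]

% O operation: hub weight of p from the authority weights of pages p points to
y[p] \;\leftarrow\; \sum_{q \,:\, (p,q) \in E} x[q]

% Normalization over all pages in the base set S
\sum_{p \in S} x[p]^2 = 1, \qquad \sum_{p \in S} y[p]^2 = 1
```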

8 Contd...
Operation I: pages q1, q2, q3 point to page p; x[p] = sum of y[q] over all q that point to p.
Operation O: page p points to pages q1, q2, q3; y[p] = sum of x[q] over all q that p points to.
Here x[p] is the authority weight of p and y[p] is its hub weight.
Deciding when to stop the reinforcing process:
1) Apply the I and O operations alternately until a fixed point is reached.
2) Choose a parameter k and iterate the process only k times.

9
• Given the set of pages in the form of a graph, choose an integer value for the parameter k.
• k is the number of iterations.
• Repeat the following k times:
  ◦ Apply the I operation to every page to update its authority weight.
  ◦ Apply the O operation to every page to update its hub weight.
  ◦ Normalize both the authority weights and the hub weights.
• Return the graph with the new authority weight and hub weight for each page.
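A minimal Python sketch of this loop is given below, assuming the graph is supplied as a dictionary `links` that maps each page to the set of pages it points to. The names `iterate` and `normalize` and the starting weights of 1.0 are illustrative choices. As in the steps above, the O operation uses the authority weights just produced by the I operation.

```python
import math


def normalize(weights):
    """Scale the weights so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0:
        return dict(weights)           # nothing to scale (graph without links)
    return {p: v / norm for p, v in weights.items()}


def iterate(links, k):
    """Run k rounds of the I and O operations.

    links: dict mapping a page to the set of pages it points to.
    Returns (x, y): authority weights and hub weights per page.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    x = {p: 1.0 for p in pages}        # authority weights
    y = {p: 1.0 for p in pages}        # hub weights
    for _ in range(k):
        # I operation: x[q] = sum of y[p] over all p that point to q.
        new_x = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_x[q] += y[p]
        # O operation: y[p] = sum of the updated x[q] over all q that p points to.
        new_y = {p: sum(new_x[q] for q in links.get(p, ())) for p in pages}
        x, y = normalize(new_x), normalize(new_y)
    return x, y


# Made-up graph: three hub-like pages pointing at three candidate authorities.
graph = {"h1": {"a1", "a2"}, "h2": {"a1"}, "h3": {"a1", "a3"}}
x, y = iterate(graph, k=20)
print(max(x, key=x.get))
```

In this toy graph every hub-like page points to a1, so a1 ends up with the largest authority weight.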

10 Observations
• The top authorities and hubs are the pages with the c largest values of x and y, respectively, in the graph produced by the Iterate algorithm.
• The Iterate procedure converges to fixed points x* and y* as k increases arbitrarily.
  ◦ Proved using principal eigenvectors.
• The Iterate algorithm yields a densely linked collection of pages that is rich in relevant pages.
  ◦ The most relevant collection of pages is the densest part of the graph.
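The eigenvector argument can be summarized in matrix form. The symbols A (the adjacency matrix of the graph, with A_{ij} = 1 when page i points to page j) and z (the all-ones starting vector) are introduced here for illustration, and the convergence statement is the usual power-iteration argument, which assumes the largest eigenvalue is strictly larger in magnitude than the next one.

```latex
% The I and O operations as matrix-vector products
x \leftarrow A^{\mathsf{T}} y, \qquad y \leftarrow A x

% Unrolling k iterations from the all-ones vector z (up to normalization)
x_k \;\propto\; (A^{\mathsf{T}} A)^{k-1} A^{\mathsf{T}} z, \qquad
y_k \;\propto\; (A A^{\mathsf{T}})^{k} z

% Limits: principal eigenvectors of the two symmetric matrices
x^{*} = \text{principal eigenvector of } A^{\mathsf{T}} A, \qquad
y^{*} = \text{principal eigenvector of } A A^{\mathsf{T}}
```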

11 Results

(java) Authorities
.328  http://www.gamelan.com/                                   Gamelan
.251  http://java.sun.com/                                      JavaSoft Home Page
.190  http://www.digitalfocus.com/digitalfocus/faq/howdoi.html  The Java Developer: HowDoI
.190  http://lightyear.ncsa.uiuc.edu/srp/java/javabooks.html    The Java Book

("search engines") Authorities
.346  http://www.yahoo.com/              Yahoo!
.291  http://www.excite.com/             Excite
.231  http://www.lycos.com/              Lycos Home Page
.231  http://www.altavista.digital.com/  AltaVista: Main Page

(Gates) Authorities
.643  http://www.roadahead.com/                    Bill Gates: The Road Ahead
.458  http://www.microsoft.com/                    Welcome to Microsoft
.440  http://www.microsoft.com/corpinfo/bill-g.htm

• It was observed that www.roadahead.com was the only one of these sites that was present in R initially.
• This supports the algorithm, because many of the authoritative pages do not themselves contain the query string.

