Authoritative Sources in a Hyperlinked Environment
Jon M. Kleinberg
Presented by Yongqiang Li


Slide 1: Title slide. Adapted from http://4.bp.blogspot.com/_ZoVPDQT8m6o/SbbYNkPJCnI/AAAAAAAAAEM/c2ueV-36llY/s320/relevance2.JPG

Slide 2: This presentation explains how to find authoritative sources for a broad search query on the World Wide Web (WWW).

Slide 3: Why authorities matter for broad search queries. Is a given web page an authority?

Slide 4: Why we need to analyze the link structure to decide whether a web page is an authority. A content-based query failed to find TOYOTA, an authority among automobile manufacturers. Figure: link-based model for the conferral of authority (http://www.kolberg.co.uk/img/hubs_and_authorities.gif).

Slide 5: Authorities and hubs are found in a focused subgraph of the WWW, which should be 1) relatively small, 2) rich in relevant pages, and 3) likely to contain most of the strongest authorities. Figures: the process for obtaining a focused subgraph of the WWW; link types.
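The construction of the focused subgraph can be sketched in Python. This is a minimal sketch under stated assumptions: the link graph is an in-memory dict mapping each URL to the set of URLs it links to (a stand-in for a real crawler or link index), the root set is assumed to come from a text-based search engine, and the names `build_base_set`, `host`, and `max_in_links` are hypothetical. The root set is grown with the pages it points to and a capped number of pages pointing to it, and intrinsic links (here approximated as links between pages on the same host) are dropped so that only transverse links remain.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Hypothetical helper: use the URL's network location as its 'domain'."""
    return urlsplit(url).netloc.lower()


def build_base_set(root, out_links, max_in_links=50):
    """Grow a root set into a base set and keep only transverse links.

    root: pages returned by a text-based search (assumed given).
    out_links: dict mapping each known page to the set of pages it links to.
    """
    # Invert the link graph once so the pages pointing to a page can be looked up.
    in_links = {}
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)

    base = set(root)
    for page in root:
        # Add every page that the root page points to.
        base |= out_links.get(page, set())
        # Add at most max_in_links pages pointing to it (sorted only to make the choice deterministic).
        base |= set(sorted(in_links.get(page, set()))[:max_in_links])

    # Induced subgraph on the base set, dropping intrinsic (same-host) links.
    graph = {
        src: {dst for dst in out_links.get(src, set())
              if dst in base and host(dst) != host(src)}
        for src in base
    }
    return base, graph


links = {
    "http://a.com/": {"http://b.org/", "http://a.com/about"},
    "http://a.com/about": {"http://a.com/"},
    "http://c.net/": {"http://a.com/"},
    "http://b.org/": set(),
}
base, graph = build_base_set(["http://a.com/"], links)
print(sorted(base))
print(graph["http://a.com/"])  # {'http://b.org/'}: the same-host link to /about is dropped
```

Treating the host name as the domain is the simplest possible rule; as the Cons slide notes, real ownership boundaries do not always follow domains.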

Slide 6: Ranking by in-degree alone cannot identify authorities in a focused subgraph. Query: Java. Figures: the definition of in-degree; the top 3 pages by in-degree. How can pages that are merely popular be filtered out?
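As a baseline, ranking by in-degree simply counts how many pages in the subgraph point to each page. A minimal Python sketch, assuming the focused subgraph is a dict from each page to the set of pages it links to; `rank_by_in_degree` and the toy page names are hypothetical:

```python
from collections import Counter


def rank_by_in_degree(graph, k=3):
    """Return the k pages with the most in-links, ties broken by page name."""
    counts = Counter(dst for dsts in graph.values() for dst in dsts)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


# Toy subgraph: a popular but off-topic page collects more in-links than the on-topic one.
graph = {
    "hub1": {"java-tutorial", "popular-portal"},
    "hub2": {"java-tutorial", "popular-portal"},
    "blog1": {"popular-portal"},
    "blog2": {"popular-portal"},
    "java-tutorial": set(),
    "popular-portal": set(),
}
print(rank_by_in_degree(graph))  # [('popular-portal', 4), ('java-tutorial', 2)]
```

The portal wins purely on link count, which is the problem this slide raises: in-degree cannot separate popularity from relevance.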

Slide 7: The mutually reinforcing hub and authority approach can find authorities in a focused subgraph. Heuristics: 1) a good authority is pointed to by good hubs; 2) a good hub points to good authorities. Figures: the iteration flow for computing the weights (scores); Broad-topic.
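The iteration flow can be sketched as follows. This is a minimal pure-Python sketch, assuming the subgraph is a dict from each page to the set of pages it links to; `hits` and the toy graph are hypothetical. Each round applies the two heuristics in turn (authority weight = sum of the hub weights pointing in; hub weight = sum of the authority weights pointed to) and then rescales both vectors to unit length.

```python
import math


def hits(graph, iterations=20):
    """Mutually reinforcing hub/authority iteration on a focused subgraph.

    Returns (authority, hub) weights, each normalized to unit Euclidean length.
    """
    pages = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of the hub weights of the pages pointing to p.
        new_auth = {p: 0.0 for p in pages}
        for src, dsts in graph.items():
            for dst in dsts:
                new_auth[dst] += hub[src]
        # Hub update: sum of the (new) authority weights of the pages p points to.
        new_hub = {p: sum(new_auth[dst] for dst in graph.get(p, ())) for p in pages}
        # Normalize so the squares of each weight vector sum to 1.
        for weights in (new_auth, new_hub):
            norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
            for p in weights:
                weights[p] /= norm
        auth, hub = new_auth, new_hub
    return auth, hub


graph = {"h1": {"a1", "a2"}, "h2": {"a1", "a2"}, "h3": {"a1"}}
auth, hub = hits(graph)
print(sorted(auth, key=auth.get, reverse=True)[:2])  # ['a1', 'a2']
```

Page a1 is pointed to by all three hubs, so it ends up as the top authority, and h1 and h2, which point to both authorities, outrank h3 as hubs.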

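Slide 8: Analysis of the authority and hub model shows that the method converges and is based on eigenvectors.
Let $A$ be the adjacency matrix of the focused subgraph ($A_{ij} = 1$ if page $i$ links to page $j$, and 0 otherwise). Each iteration computes the authority score vector $x_k = A^{\mathsf T} y_{k-1}$ and the hub score vector $y_k = A x_k$, normalizing both to unit length. Starting from the all-ones vector $z = y_0$, this gives $x_k \propto (A^{\mathsf T}A)^{k-1}A^{\mathsf T}z$ and $y_k \propto (AA^{\mathsf T})^{k}z$, which is why eigenvectors appear.
Theorem 3.1: The sequences $x_1, x_2, x_3, \ldots$ and $y_1, y_2, y_3, \ldots$ converge (to limits $x^*$ and $y^*$, respectively).
Eigenvalue assumption (#): for a symmetric matrix $M$ with eigenvalues ordered so that $|\lambda_1(M)| \ge |\lambda_2(M)| \ge \cdots$, assume $|\lambda_1(M)| > |\lambda_2(M)|$. Then the unit eigenvector $\omega_1(M)$ is called the principal eigenvector, and all others are non-principal eigenvectors.
Theorem 3.2 (subject to Assumption (#)): $x^*$ is the principal eigenvector of $A^{\mathsf T}A$, and $y^*$ is the principal eigenvector of $AA^{\mathsf T}$.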
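Theorem 3.2 can be checked numerically on a toy graph. The sketch below, in plain Python with a hypothetical 5-page adjacency matrix, repeatedly applies $x \leftarrow A^{\mathsf T}(Ax)$ with normalization (the same as alternating the hub and authority updates) and then checks that the limit $x^*$ satisfies $A^{\mathsf T}A\,x^* \approx \lambda x^*$. The helper names are illustrative only.

```python
import math


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def authority_limit(a, iterations=200):
    """Power iteration x <- normalize(A^T (A x)), i.e. repeated y = A x, x = A^T y."""
    at = transpose(a)
    x = normalize([1.0] * len(a))
    for _ in range(iterations):
        x = normalize(mat_vec(at, mat_vec(a, x)))
    return x


# Hypothetical graph: pages 0-2 are hubs, pages 3-4 are authorities; A[i][j] = 1 if i links to j.
A = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
x_star = authority_limit(A)
m_x = mat_vec(transpose(A), mat_vec(A, x_star))        # (A^T A) x*
lam = sum(xi * mi for xi, mi in zip(x_star, m_x))      # Rayleigh quotient (x* has unit length)
residual = max(abs(mi - lam * xi) for xi, mi in zip(x_star, m_x))
print(round(lam, 4), residual < 1e-9)  # 4.5616 True
```

Here the authority block of $A^{\mathsf T}A$ is $\begin{pmatrix}3&2\\2&2\end{pmatrix}$, whose largest eigenvalue is $(5+\sqrt{17})/2 \approx 4.5616$, strictly larger than the next one, so Assumption (#) holds for this graph.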

Slide 9: The experimental results of the hub and authority scoring approach are compelling: the top-ranked pages are highly relevant to the query.

Slide 10: The authority and hub scoring approach is also effective for finding pages similar to a given page. 1) Expand a highly referenced page into a focused subgraph. 2) Apply the authority and hub approach to that subgraph. 3) The top-ranked pages are highly relevant to the query page.
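The three steps for similar-page queries can be sketched in Python. This is a minimal sketch under assumptions: the pages pointing to the query page p are taken from an in-memory dict rather than a real link index, intrinsic-link filtering is skipped for brevity, and `similar_pages`, `t`, and `k` are hypothetical names. The root set is the pages that point to p; it is expanded into a base set, and the hub/authority iteration ranks the remaining pages.

```python
import math


def similar_pages(graph, p, t=200, k=3, iterations=20):
    """Rank pages similar to p by authority weight in a subgraph grown around p's in-links."""
    in_links = {}
    for src, dsts in graph.items():
        for dst in dsts:
            in_links.setdefault(dst, set()).add(src)

    # Step 1: root set = (at most t) pages that point to p, expanded into a base set.
    root = sorted(in_links.get(p, set()))[:t]
    base = set(root)
    for page in root:
        base |= graph.get(page, set())
        base |= in_links.get(page, set())
    sub = {src: graph.get(src, set()) & base for src in base}

    # Step 2: hub/authority iteration restricted to the base set.
    auth = dict.fromkeys(base, 1.0)
    hub = dict.fromkeys(base, 1.0)
    for _ in range(iterations):
        auth = {q: sum(hub[s] for s in base if q in sub[s]) for q in base}
        hub = {q: sum(auth[d] for d in sub[q]) for q in base}
        for w in (auth, hub):
            norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
            for q in w:
                w[q] /= norm

    # Step 3: report the top authorities other than p itself.
    ranked = sorted((q for q in base if q != p), key=lambda q: (-auth[q], q))
    return ranked[:k]


web = {
    "review-site": {"maker-a", "maker-b", "maker-c"},
    "dealer-list": {"maker-a", "maker-b"},
    "fan-page": {"maker-a", "recipe-blog"},
    "maker-a": set(), "maker-b": set(), "maker-c": set(), "recipe-blog": set(),
}
print(similar_pages(web, "maker-a", k=2))  # ['maker-b', 'maker-c']
```

Excluding p from the output is a choice made in this sketch; p itself would normally rank near the top, since every root page points to it.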

Slide 11: Link-based ranking approaches also exist in other academic fields, such as citation analysis and sociometry, under related concepts: ranking, scoring, standing, impact, and influence. How are these measured?

Slide 12: How to obtain clustered sets of hubs and authorities. Why do multiple sets exist for a broad-topic query? 1) Multiple meanings, e.g. "jaguar". 2) Multiple academic communities, e.g. "randomized algorithms".
Proposition 6.1: $A^{\mathsf T}A$ and $AA^{\mathsf T}$ have the same multiset of eigenvalues, and their eigenvectors can be chosen so that $\omega_i(AA^{\mathsf T}) = A\,\omega_i(A^{\mathsf T}A)$ (up to normalization).
This proposition shows that the authority ranking and the hub ranking reinforce each other within each pair of eigenvectors, just as $x^*$ and $y^*$ do for the principal pair. Each eigenvector gives a set of authorities.
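Reading clusters off several eigenvectors can be sketched in pure Python. This is a minimal sketch, assuming the subgraph is a dict from page to linked pages; power iteration with deflation is swapped in for a proper symmetric eigensolver (such as `numpy.linalg.eigh`) only to keep the example dependency-free, and all function and page names are hypothetical. For each of the first few eigenvectors of $A^{\mathsf T}A$ it lists the pages with the largest-magnitude coordinates, a simplification of examining an eigenvector's positive and negative ends separately.

```python
import math


def co_citation_matrix(graph, pages):
    """M = A^T A: M[i][j] counts the pages that point to both pages[i] and pages[j]."""
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    m = [[0.0] * n for _ in range(n)]
    for dsts in graph.values():
        targets = [idx[d] for d in dsts]
        for i in targets:
            for j in targets:
                m[i][j] += 1.0
    return m


def top_eigenvectors(m, count, iterations=500):
    """Power iteration with deflation on a symmetric matrix; returns (eigenvalue, vector) pairs."""
    n = len(m)
    m = [row[:] for row in m]
    result = []
    for _ in range(count):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        result.append((lam, v))
        # Deflate: subtract the found component so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return result


def clusters(graph, count=2, top=2):
    """For each of the first `count` eigenvectors, the `top` pages with the largest |coordinate|."""
    pages = sorted(set(graph) | {d for dsts in graph.values() for d in dsts})
    out = []
    for _, v in top_eigenvectors(co_citation_matrix(graph, pages), count):
        order = sorted(range(len(pages)), key=lambda i: (-abs(v[i]), pages[i]))
        out.append([pages[i] for i in order[:top]])
    return out


# Toy "jaguar" subgraph with two communities: cars and animals.
jaguar = {
    "car-hub1": {"jaguar-cars", "auto-news"},
    "car-hub2": {"jaguar-cars", "auto-news"},
    "car-hub3": {"jaguar-cars"},
    "zoo-hub1": {"big-cats"},
    "zoo-hub2": {"big-cats"},
}
print(clusters(jaguar, top=1))  # [['jaguar-cars'], ['big-cats']]
```

By Proposition 6.1, the matching hub vector for each cluster can be obtained by multiplying the authority eigenvector by $A$.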

Slide 13: Results of clustering hubs and authorities for the broad-topic query "jaguar".

Slide 14: Diffusion in the hub and authority ranking approach. When the focused subgraph for a query does not contain a dense enough set of relevant pages, pages on a broader topic dominate the principal eigenvector and are returned instead.
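The diffusion effect can be reproduced on a toy subgraph. In this hypothetical example (constructed for illustration, not data from the paper), the on-topic part is a single hub linking to a niche page, while a denser broader-topic cluster of three hubs points at a general portal. A minimal pure-Python hub/authority iteration then puts essentially all authority weight on the portal.

```python
import math


def authority_weights(graph, iterations=50):
    """Hub/authority iteration; returns unit-length authority weights."""
    pages = sorted(set(graph) | {d for ds in graph.values() for d in ds})
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 0.0)
    for _ in range(iterations):
        auth = {p: sum(hub[s] for s, ds in graph.items() if p in ds) for p in pages}
        hub = {p: sum(auth[d] for d in graph.get(p, ())) for p in pages}
        a_norm = math.sqrt(sum(v * v for v in auth.values()))
        h_norm = math.sqrt(sum(v * v for v in hub.values()))
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return auth


subgraph = {
    "topic-hub": {"niche-page"},          # sparse on-topic structure
    "portal-hub1": {"general-portal"},    # denser broader-topic structure
    "portal-hub2": {"general-portal"},
    "portal-hub3": {"general-portal"},
}
weights = authority_weights(subgraph)
print(max(weights, key=weights.get), round(weights["niche-page"], 6))  # general-portal 0.0
```

Each round shrinks the niche page's weight relative to the portal's by a factor of 3, the ratio of the two clusters' eigenvalues, so the principal eigenvector ends up describing the broader topic.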

Slide 15: Evaluating the query results is challenging: their quality depends on human judgment, and there is no quantitative measure. The authority and hub ranking approach was implemented in the CLEVER project. For further information, see the separate presentations by Hira Bahir and Ray Yamada.

Slide 16: Pros
1) The paper uses the AUTHORITY concept to retrieve pages that are highly relevant to a broad topic, effectively and efficiently, by analyzing the link structure of a focused subgraph of the WWW.
2) The AUTHORITY and HUB ranking approach applies not only to ranking on the WWW but also to other academic fields, such as social network analysis and scientific citation analysis.
3) Only the link structure of a focused subgraph is maintained, rather than that of the entire WWW, so storage requirements are lower and computation is faster.
4) HUB is a compelling concept paired with AUTHORITY and plays a critical role in finding AUTHORITIES on the WWW. The HUB concept is innovative because it has no counterpart in earlier bibliometrics.
5) Although eigenvector-based methods were not first presented in this paper, the method is very effective compared with in-degree ranking.
6) The eigenvector-based method naturally groups relevant pages into clusters.
7) The approach can find pages similar to a page of interest as well as answer broad-topic searches.

Slide 17: Cons
1) The AUTHORITY and HUB ranking approach is based on heuristics, namely that a good HUB points to many good AUTHORITIES and a good AUTHORITY is pointed to by many good HUBS, but the real WWW is far more complex than these idealized assumptions.
2) Distinguishing INTRINSIC links (within the same domain) from TRANSVERSE links (across domains) is not straightforward: some pages belong to the same organization even though their domains differ, and some pages belong to different organizations even though they share a domain.
3) The base set of the focused subgraph for a broad-topic query may not contain a dense enough set of relevant pages, which leads to diffusion and generalization.
4) The assumption that the largest eigenvalue of $A^{\mathsf T}A$ strictly exceeds the second in absolute value, so that the principal eigenvector is unique, may not hold in some cases.

Slide 18: Q&A


