The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University, Technical Report, 1998. Presented by: Ratiya Komalarachun.

Presentation transcript:

1 The PageRank Citation Ranking: Bringing Order to the Web. Larry Page et al., Stanford University, Technical Report, 1998. Presented by: Ratiya Komalarachun

2 Contents: Motivation; Related Work; Background Knowledge; PageRank & Random Surfer Model; Implementation; Application; Conclusion

3 Motivation: The web is heterogeneous and unstructured; there is no quality control on the web; there is commercial interest in manipulating rankings.

4 Related Work: Academic citation analysis; link-based analysis; clustering methods based on link structure; the Hubs & Authorities model, based on an eigenvector calculation.

5 Hubs & Authorities Model [figure: hubs pointing to authorities]

6 Hubs & Authorities Model: A mutually reinforcing relationship. "A good hub is a page that points to many good authorities." "A good authority is a page that is pointed to by many good hubs."
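
The mutually reinforcing relationship can be illustrated with a minimal sketch of a hub/authority iteration. The function name `hits`, the toy graph, and the fixed iteration count are assumptions for illustration and do not come from the slides.

```python
import math

def hits(links, iterations=50):
    """Iterate hub and authority scores on a small directed graph.

    links maps each page to the list of pages it points to
    (hypothetical toy data).
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # A good hub points to many good authorities.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Two hub-like pages and two authority-like pages.
toy_links = {"h1": ["a1", "a2"], "h2": ["a1"], "a1": [], "a2": []}
hub_scores, auth_scores = hits(toy_links)
```

In this toy graph, `a1` is pointed to by both hubs and ends up with a higher authority score than `a2`, while `h1` points to both authorities and ends up with a higher hub score than `h2`.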

7 Link Structure of the Web: Forward links (outedges); backlinks (inedges); an approximation of importance / quality.

8 PageRank: A page has high rank if the sum of the ranks of its backlinks is high. Backlinks coming from important pages convey more importance to a page. Problems: dangling links, rank sinks.

9 Dangling Links

10 PageRank Calculation: Given R(u) = rank of u, R(v) = rank of v, c < 1 (used for normalization), N_v = number of links from v, B_u = the set of pages that point to u.
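
With the symbols defined on this slide, the simplified ranking from the original technical report can be written as a sum over backlinks, where each backlinking page v contributes its rank divided evenly among its forward links:

```latex
R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}
```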

11 PageRank Calculation [figure: rank propagation example with the values 100, 50, 9, 3, 3, 3, 53, 50]
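
A minimal sketch of one propagation step, assuming c = 1 and a hypothetical graph in which a page of rank 100 has two forward links and a page of rank 9 has three. The page names and the function name `propagate` are illustrative only.

```python
def propagate(ranks, out_links):
    """One step of the simplified rule: each page splits its rank evenly
    over its forward links, and a page's new rank is the sum it receives."""
    received = {page: 0.0 for page in ranks}
    for page, targets in out_links.items():
        if not targets:
            continue  # a page without forward links passes nothing on
        share = ranks[page] / len(targets)
        for target in targets:
            received[target] += share
    return received

# Hypothetical graph: "a" (rank 100) links to "c" and "d";
# "b" (rank 9) links to "c" and to two further pages "x" and "y".
ranks = {"a": 100.0, "b": 9.0, "c": 0.0, "d": 0.0, "x": 0.0, "y": 0.0}
out_links = {"a": ["c", "d"], "b": ["c", "x", "y"]}
new_ranks = propagate(ranks, out_links)
```

Here `c` receives 50 from `a` and 3 from `b`, for a rank of 53, while `d` receives 50.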

12 Rank Sink: Page cycles pointed to by some incoming link. Problem: the cycle's rank keeps increasing, but it does not affect any rank outside the cycle.

13 Escape Term: Solution: a rank source. E(u) is some vector over the web pages (uniform, a favorite page, etc.).
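
With a rank source E, the ranking in the original technical report gains an extra term, with c chosen as large as possible subject to the ranks summing to one in the L1 norm:

```latex
R'(u) = c \sum_{v \in B_u} \frac{R'(v)}{N_v} + c\,E(u),
\qquad \lVert R' \rVert_1 = 1
```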

14 Matrix Notation: R is the dominant eigenvector and c is the dominant eigenvalue of [matrix expression missing], because c is maximized.
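
As a sketch of the matrix form, assume A is the square matrix with A_{u,v} = 1/N_v when page v links to page u and 0 otherwise (an indexing convention chosen here to match the summation form), E is the rank-source vector, and R' is nonnegative. The relation given in the original technical report then reads:

```latex
R' = c\,(A R' + E), \qquad \lVert R' \rVert_1 = 1
\;\Longrightarrow\;
R' = c\,\bigl(A + E\,\mathbf{1}^{\mathsf{T}}\bigr) R'
```

Under these assumptions R' is an eigenvector of the matrix A + E 1^T, where 1 is the all-ones vector.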

15 Computing PageRank: Initialize a vector over the web pages. Loop: compute new ranks as the sum of normalized backlink ranks; compute the normalizing factor; add the escape term; compute the control parameter. Repeat the loop and stop when converged.
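
The loop can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function name `pagerank`, the `damping` factor of 0.85, the `epsilon` threshold, and the toy graph are all assumptions. The damping factor in particular is a commonly used addition that guarantees some rank is redistributed through the escape term on every pass.

```python
def pagerank(out_links, source, damping=0.85, epsilon=1e-8, max_iter=1000):
    """Iterate ranks until the L1 change falls below epsilon.

    out_links: dict mapping each page to the list of pages it links to.
    source: dict mapping each page to E(page); assumed to sum to 1 and
            to contain every page that appears in out_links.
    damping: assumed parameter, not taken from the slides.
    """
    pages = list(source)
    rank = dict(source)  # initialize the vector over web pages
    for _ in range(max_iter):
        # New ranks: sum of normalized backlink ranks.
        new_rank = {p: 0.0 for p in pages}
        for v, targets in out_links.items():
            for u in targets:
                new_rank[u] += damping * rank[v] / len(targets)
        # Normalizing factor: the rank mass lost in this step.
        d = sum(rank.values()) - sum(new_rank.values())
        # Add the escape term.
        for p in pages:
            new_rank[p] += d * source[p]
        # Control parameter: L1 distance between successive vectors.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta <= epsilon:
            break  # stop when converged
    return rank

# Toy rank sink: "a" and "b" point to each other, "c" points into the cycle.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
uniform = {p: 1.0 / len(graph) for p in graph}
ranks = pagerank(graph, uniform)
```

With a uniform rank source, page `c` has no backlinks and keeps only its share of the escape term, while the cycle `a`, `b` holds most of the rank without absorbing all of it.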

16 Random Surfer Model: PageRank vs. the random surfer model. E(u) = "the random surfer gets bored periodically and jumps to a different page, and is not kept in a loop forever."
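
The random surfer can be simulated directly. The sketch below is an illustration under assumptions: the jump probability of 0.15, the uniform choice of the page jumped to, the step count, and the toy graph are not taken from the slides.

```python
import random

def random_surfer(out_links, steps=200_000, jump_prob=0.15, seed=0):
    """Estimate page importance as the fraction of time a surfer spends
    on each page.

    out_links: dict mapping every page to the list of pages it links to.
    jump_prob: assumed probability of getting bored on any given step.
    """
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = out_links[current]
        if not targets or rng.random() < jump_prob:
            current = rng.choice(pages)    # bored: jump to a random page
        else:
            current = rng.choice(targets)  # follow a random forward link
    return {p: n / steps for p, n in visits.items()}

# Toy graph with a loop between "a" and "b" that "c" links into.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
freq = random_surfer(graph)
```

Because of the periodic jumps, the surfer leaves the `a`, `b` loop from time to time, so `c` still receives a small but nonzero share of the visits.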

17 Implementation: Computing resources: 24 million pages, 75 million URLs, processing 550 pages/sec. Memory and disk storage: weight vector (4-byte floats); matrix A (linear access).
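
As a rough check on the memory requirement, assuming the weight vector holds one 4-byte float per URL:

```latex
75 \times 10^{6}\ \text{URLs} \times 4\ \text{bytes} = 3 \times 10^{8}\ \text{bytes} \approx 300\ \text{MB}
```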

18 Implementation: Assign a unique integer ID to each URL; sort the links and remove dangling links; assign initial ranks; iterate until convergence; add back the dangling links and recompute.
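
The first two preprocessing steps can be sketched as follows. The function names `assign_ids` and `remove_dangling`, the in-memory edge list, and the example URLs are assumptions for illustration; a crawl of the size described would need disk-based sorting instead.

```python
def assign_ids(edges):
    """Map each URL to a unique integer ID, in order of first appearance."""
    ids = {}
    for src, dst in edges:
        for url in (src, dst):
            if url not in ids:
                ids[url] = len(ids)
    return ids

def remove_dangling(id_edges):
    """Repeatedly drop links whose target has no forward links.

    Removing one dangling link can create another, so iterate until
    nothing changes. Returns (kept_edges, removed_edges); the removed
    links can be added back after the ranks have converged.
    """
    kept = sorted(set(id_edges))  # sorted, duplicate-free link list
    removed = []
    while True:
        sources = {src for src, _ in kept}
        dangling = [(s, d) for s, d in kept if d not in sources]
        if not dangling:
            return kept, removed
        removed.extend(dangling)
        kept = [(s, d) for s, d in kept if d in sources]

# Hypothetical links: "d" has no forward links, so c -> d is dangling;
# once it is removed, b -> c becomes dangling as well.
url_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
ids = assign_ids(url_edges)
id_edges = [(ids[s], ids[d]) for s, d in url_edges]
kept, removed = remove_dangling(id_edges)
```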

19 Convergence Properties: Using the theory of random walks on graphs, convergence takes O(log(|V|)) because the graph G of the web is rapidly mixing.

20 Convergence Properties

21 Searching with PageRank: Using title search; comparing with AltaVista.

22 Sample Results

23 Some Applications: Estimating web traffic; backlink predictor; user navigation.

24 Conclusion: PageRank is a global ranking based on the web's graph structure. PageRank uses backlink information to bring order to the web. PageRank can separate out representative pages as cluster centers. PageRank has a great variety of applications.

