Published by Hilary McDaniel. Modified over 8 years ago.
1
The PageRank Citation Ranking: Bringing Order to the Web
Larry Page et al., Stanford University, Technical Report, 1998
Presented by: Ratiya Komalarachun
2
Contents
- Motivation
- Related Work
- Background Knowledge
- PageRank & Random Surfer Model
- Implementation
- Application
- Conclusion
3
Motivation
- The web is heterogeneous and unstructured
- There is no quality control on the web
- Commercial interests have an incentive to manipulate rankings
4
Related Work
- Academic citation analysis
- Link-based analysis
- Clustering methods based on link structure
- Hubs & Authorities Model, based on an eigenvector calculation
5
Hubs & Authorities Model
[Figure: link diagram with pages labeled as hubs and authorities]
6
Hubs & Authorities Model
- Mutually reinforcing relationship
- “A good hub is a page that points to many good authorities”
- “A good authority is a page that is pointed to by many good hubs”
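The mutually reinforcing relationship can be sketched as an alternating update of two score tables. This is a minimal illustration, not the presenter's code: the function name `hits`, the toy graph, the fixed iteration count, and the L2 normalization are all assumptions made for the example.

```python
import math


def hits(links, iterations=50):
    """Hubs & Authorities sketch. links maps a page to the pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages it points to.
        new_hub = {p: sum(new_auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded (assumed L2 norm).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth


# Hypothetical toy graph: three hub-like pages pointing at two authority-like pages.
example_links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub_scores, auth_scores = hits(example_links)
```

In this toy graph `a1` ends up with a higher authority score than `a2` because one more hub points to it, and `h3` ends up a weaker hub than `h1` because it points to fewer authorities.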
7
Link Structure of the Web
- Forward links (outedges)
- Backlinks (inedges)
- Approximation of importance / quality
8
PageRank
- A page has high rank if the sum of the ranks of its backlinks is high
- Backlinks coming from important pages convey more importance to a page
- Problems: dangling links, rank sinks
9
Dangling Links
10
PageRank Calculation
Given:
- R(u) = rank of page u, R(v) = rank of page v
- c < 1 (used for normalization)
- N_v = number of links from v
- B_u = the set of pages that point to u
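The slide's formula did not survive in this transcript. The sketch below assumes the simplified update that the symbol definitions suggest, R(u) = c · Σ_{v ∈ B_u} R(v) / N_v, and applies it once to a small graph. The function name, the three-page graph, and the starting ranks are illustrative assumptions.

```python
def simplified_rank_step(links, rank, c=1.0):
    """Apply the assumed rule R(u) = c * sum over v in B_u of R(v) / N_v once.

    links maps each page v to the list of pages it links to (its forward links).
    """
    new_rank = {u: 0.0 for u in rank}
    for v, targets in links.items():
        if not targets:
            continue  # a page with no forward links passes on no rank
        share = rank[v] / len(targets)  # R(v) / N_v
        for u in targets:               # v belongs to B_u for each target u
            new_rank[u] += c * share
    return new_rank


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
rank = {"A": 0.4, "B": 0.2, "C": 0.4}
next_rank = simplified_rank_step(links, rank)
```

With c = 1 these starting values are a steady state for this particular graph: one update returns the same ranks, which is the fixed point the iteration is looking for.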
11
PageRank Calculation
[Figure: example graph illustrating the PageRank calculation]
12
Rank Sink
- A cycle of pages that is pointed to by some incoming link
- Problem: rank accumulates inside the cycle and does not affect any rank outside it
13
Escape Term
- Solution: rank source
- E(u) is some vector over the web pages, for example uniform or concentrated on a favorite page
14
Matrix Notation
R is the dominant eigenvector and c is the dominant eigenvalue of [matrix not preserved in the transcript], because c is maximized.
15
Computing PageRank
- Initialize a vector over the web pages
Loop:
- New ranks: sum of normalized backlink ranks
- Compute the normalizing factor
- Add the escape term
- Compute the control parameter
While:
- Stop when converged
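The loop above can be sketched in plain Python. The slide's formulas were not preserved, so the details here are assumptions: ranks start uniform, the normalizing factor d is taken to be the rank mass lost in the propagation step, the escape vector E is uniform by default, and the `damping` value of 0.85 is a commonly used choice rather than something stated on the slide.

```python
def pagerank(links, escape=None, damping=0.85, epsilon=1e-8, max_iter=1000):
    """Iterative PageRank sketch. links maps a page to the pages it points to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    if escape is None:
        escape = {p: 1.0 / n for p in pages}  # uniform escape vector E (assumed)
    rank = {p: 1.0 / n for p in pages}        # initialize vector over web pages
    for _ in range(max_iter):
        # New ranks: sum of normalized backlink ranks, scaled by the damping.
        new_rank = {p: 0.0 for p in pages}
        for v in pages:
            targets = links.get(v, [])
            for u in targets:
                new_rank[u] += damping * rank[v] / len(targets)
        # Normalizing factor: rank lost to damping and to pages without links.
        d = sum(rank.values()) - sum(new_rank.values())
        # Add the escape term.
        for p in pages:
            new_rank[p] += d * escape[p]
        # Control parameter: L1 distance between successive rank vectors.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta <= epsilon:  # stop when converged
            break
    return rank


# Hypothetical four-page graph; D has no backlinks.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(example)
```

Because the lost mass d is redistributed through E on every pass, the ranks keep summing to 1, and a page with no backlinks such as `D` receives only its share of the escape term.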
16
PageRank vs. Random Surfer Model
- Random Surfer Model
- E(u) = “the random surfer gets bored periodically and jumps to a different page, and so is not kept in a loop forever”
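The random surfer can be simulated directly: follow a random forward link most of the time, and occasionally jump to a random page. The sketch below is illustrative only; the jump probability of 0.15, the uniform jump target, the step count, and the toy graph are assumptions, and the visit frequencies only approximate the ranks.

```python
import random


def random_surfer(links, steps=100_000, jump_probability=0.15, seed=0):
    """Estimate page importance as the fraction of steps spent on each page."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = links.get(current, [])
        if not targets or rng.random() < jump_probability:
            # The surfer gets bored (or is stuck) and jumps to a random page.
            current = rng.choice(pages)
        else:
            # Otherwise follow one of the current page's forward links.
            current = rng.choice(targets)
    return {p: count / steps for p, count in visits.items()}


# Hypothetical four-page graph; D has no backlinks and is reached only by jumps.
example = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
frequencies = random_surfer(example)
```

Without the jump, a surfer that entered a closed cycle of pages would never leave it; the jump is what lets the walk escape.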
17
Implementation
- Computing resources
  - 24 million pages
  - 75 million URLs
  - Processes 550 pages/sec
- Memory and disk storage
  - Weight vector (4-byte float)
  - Matrix A (linear access)
18
Implementation
- Assign a unique integer ID to each URL
- Sort and remove dangling links
- Assign initial ranks
- Iterate until convergence
- Add back dangling links and recompute
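The first two steps can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the helper names `assign_ids` and `remove_dangling` are hypothetical, and removal is repeated because dropping one dangling page can leave another page without forward links.

```python
def assign_ids(links):
    """Map every page (URL) to a unique integer ID, in sorted order."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    return {page: index for index, page in enumerate(pages)}


def remove_dangling(links):
    """Repeatedly drop pages with no forward links and the links pointing to them.

    Returns the remaining graph and the removed pages in removal order,
    so they can be added back after the ranks have converged.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    graph = {p: set(links.get(p, [])) for p in pages}
    removed = []
    while True:
        dangling = [p for p, targets in graph.items() if not targets]
        if not dangling:
            break
        for p in dangling:
            del graph[p]
            removed.append(p)
        # Removing pages may leave other pages without any forward links.
        for targets in graph.values():
            targets.difference_update(dangling)
    return graph, removed


# Hypothetical graph: E is dangling, and D becomes dangling once E is removed.
example = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["E"], "E": []}
page_ids = assign_ids(example)
core, removed = remove_dangling(example)
```

Here `core` keeps only A, B and C, and `removed` lists E before D, which is the reverse of the order in which they would be added back.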
19
Convergence Properties
- Based on the theory of random walks on graphs
- Convergence in O(log(|V|)) steps, because the web graph G is rapidly mixing
20
Convergence Properties
21
Searching with PageRank
- Using title search
- Comparing with AltaVista
22
Sample Results
23
Some Applications
- Estimating web traffic
- Backlink predictor
- User navigation
24
Conclusion
- PageRank is a global ranking based on the web's graph structure
- PageRank uses backlink information to bring order to the web
- PageRank can separate out representative pages as cluster centers
- A great variety of applications