1 Methods of Computing the PageRank Vector Tom Mangan

2 Brief History of Web Search Boolean term matching

3 Brief History of Web Search Boolean term matching; Sergey Brin and Larry Page; reputation-based ranking: PageRank

4 Reputation Count links to a page; weight links by how many come from a page; further weight links by the reputation of the linker

5 [Figure: a mini-web of six pages, numbered 1 to 6, with links between them]

6 [Figure: the six-page mini-web with its link matrix]
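
The link structure itself appears only as an image on this slide, so as a minimal sketch here is how such a link matrix could be built in Python; the six-page link list below is a made-up stand-in, not the slide's actual graph.

    import numpy as np

    # Made-up link structure for a six-page mini-web (pages 1..6).
    # Page 6 has no outlinks, i.e. it is a dangling node.
    links = {1: [2, 3], 2: [4], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: []}

    n = len(links)
    H = np.zeros((n, n))
    for page, outlinks in links.items():
        for target in outlinks:
            H[page - 1, target - 1] = 1.0 / len(outlinks)   # each link from page Q gets weight 1/|Q|

    print(H)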

7 Calculating Rank r(P) = Σ_{Q ∈ B_P} r(Q)/|Q|, where B_P = the set of all pages linking to P and |Q| = # of links from page Q

8 Calculating Rank, iteratively: r_{k+1}(P) = Σ_{Q ∈ B_P} r_k(Q)/|Q|, where B_P = the set of all pages linking to P and |Q| = # of links from page Q
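
A small numeric illustration of this formula, reusing the made-up six-page link list from above and assuming a uniform starting ranking r(Q) = 1/6:

    # One update of r(P) for page 2 of the made-up mini-web.
    links = {1: [2, 3], 2: [4], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: []}
    r = {page: 1.0 / 6 for page in links}
    B_P = [q for q, outs in links.items() if 2 in outs]     # pages linking to page 2
    r_new = sum(r[q] / len(links[q]) for q in B_P)          # sum of r(Q)/|Q| over Q in B_P
    print(B_P, r_new)                                       # [1, 3] and 1/12 + 1/18 ≈ 0.139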

9 The PageRank Vector Define: π_{k+1}^T = π_k^T H, with π_k^T = [ r_k(P_1) r_k(P_2) … r_k(P_n) ]

10 where H_ij = 1/|P_i| if page P_i links to page P_j, and H_ij = 0 otherwise (|P_i| = the number of outlinks from page P_i)

11 From our earlier mini-web: [the 6×6 hyperlink matrix H built from the figure's links]

12 Taken one row at a time, the matrix equation reproduces the earlier ranking formula: π_{k+1}(P_i) = Σ_{P_j ∈ B_{P_i}} π_k(P_j)/|P_j|, where B_{P_i} = the set of pages linking to P_i

13 Iterating this equation is called the Power Method: π_{k+1}^T = π_k^T H, where H is the hyperlink matrix defined above

14 Iterating this equation, π_{k+1}^T = π_k^T H, is called the Power Method, and we define the PageRank vector as its limit: π^T = lim_{k→∞} π_k^T
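
A minimal sketch of this iteration in Python; H can be any hyperlink matrix built as above, and the tolerance and iteration cap are arbitrary choices. As the next slides explain, this raw iteration is not yet guaranteed to converge without further fixes.

    import numpy as np

    def power_method(H, tol=1e-10, max_iter=1000):
        """Iterate pi_{k+1}^T = pi_k^T H from a uniform starting vector."""
        n = H.shape[0]
        pi = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            pi_next = pi @ H
            if np.abs(pi_next - pi).sum() < tol:
                return pi_next
            pi = pi_next
        return pi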

15 Power Method Convergence requires: irreducibility (Perron–Frobenius Thm)

16 Definitions Markov chain: the conditional probability of each future state depends only on the present state. Markov matrix: the transition matrix of a Markov chain.

17 Transition Matrix From our earlier mini-web: [H viewed as the transition matrix of the six-page example]

18 Markov Matrix Properties Row-stochastic; the stationary vector gives the long-term probability of each state; all eigenvalues satisfy |λ| ≤ 1
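
These properties can be checked numerically; the 2-state matrix below is a made-up illustration, not from the slides.

    import numpy as np

    M = np.array([[0.9, 0.1],
                  [0.5, 0.5]])                    # a made-up 2-state transition matrix
    print(M.sum(axis=1))                          # rows sum to 1: row-stochastic
    print(np.abs(np.linalg.eigvals(M)))           # all |lambda| <= 1, the largest is 1

    # The stationary vector satisfies pi^T M = pi^T: the left eigenvector for lambda = 1.
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    print(pi)                                     # long-run probability of each state: [5/6, 1/6]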

19 H is not row-stochastic: dangling nodes (pages with no outlinks) give zero rows

20 Define a vector a such that a_i = 1 if page i is a dangling node (row i of H is zero) and a_i = 0 otherwise. Then we obtain a row-stochastic matrix: S = H + a e^T/n

21 or, written row by row, every zero row of H is replaced by the uniform row e^T/n

22 S may or may not be reducible, so we make one more fix. The Google Matrix: G = αS + (1−α) e e^T/n, with 0 < α < 1. Now G is a positive, irreducible, row-stochastic matrix, and the power method will converge, but we've lost sparsity.
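
A sketch of both fixes in Python; α = 0.85 is the commonly quoted damping value and is an assumption here, as is the uniform teleportation row e^T/n.

    import numpy as np

    def google_matrix(H, alpha=0.85):
        """Patch the dangling rows (S), then mix in teleportation (G)."""
        n = H.shape[0]
        e = np.ones(n)
        a = (H.sum(axis=1) == 0).astype(float)    # a_i = 1 exactly when row i of H is zero
        S = H + np.outer(a, e) / n                # S = H + a e^T / n, now row-stochastic
        G = alpha * S + (1 - alpha) * np.outer(e, e) / n   # positive and irreducible, but dense
        return S, G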

23 Note that: G = αH + (α a + (1−α) e) e^T/n, and π_k^T e = 1

24 so now the power method looks like: π_{k+1}^T = α π_k^T H + (α π_k^T a + 1 − α) e^T/n
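
A sketch of this rewritten iteration, which touches only the sparse H and the dangling-node indicator a instead of the dense G (α and the stopping tolerance are assumptions).

    import numpy as np

    def pagerank_power(H, alpha=0.85, tol=1e-10, max_iter=1000):
        """pi_{k+1}^T = alpha pi_k^T H + (alpha pi_k^T a + 1 - alpha) e^T / n."""
        n = H.shape[0]
        a = (H.sum(axis=1) == 0).astype(float)    # dangling-node indicator
        pi = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            pi_next = alpha * (pi @ H) + (alpha * (pi @ a) + 1 - alpha) / n
            done = np.abs(pi_next - pi).sum() < tol
            pi = pi_next
            if done:
                break
        return pi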

25 The power method converges at the rate at which |λ_2/λ_1|^k → 0; for G, λ_1 = 1 and |λ_2| ≤ α, thus the method converges roughly like α^k

26 [Figure: the six-page mini-web and its link matrix, repeated]

27 A Linear System Formulation (Amy Langville and Carl Meyer): exploit dangling nodes; solve a system instead of iterating

28 By Langville and Meyer, solving the system x^T(I − αH) = v^T and letting π^T = x^T/(x^T e) produces the PageRank vector (proof omitted)
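
A minimal dense sketch of this linear-system route, assuming a uniform v^T and α = 0.85; a production implementation would use sparse storage and a sparse solver.

    import numpy as np

    def pagerank_linear(H, alpha=0.85):
        """Solve x^T (I - alpha H) = v^T, then normalize to obtain pi^T."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)                           # uniform personalization vector
        x = np.linalg.solve((np.eye(n) - alpha * H).T, v)
        return x / x.sum()                                # pi^T = x^T / (x^T e)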

29 Exploiting Dangling Nodes: re-order the rows and columns of H so that the zero (dangling-node) rows sit at the bottom: H = [ H11 H12 ; 0 0 ]

30 Exploiting Dangling Nodes: with that ordering, I − αH = [ I − αH11, −αH12 ; 0, I ], so the system splits into x1^T(I − αH11) = v1^T and x2^T = α x1^T H12 + v2^T

31 I − αH11 has some nice properties that simplify solving the linear system: it is non-singular

32 [Figure]

33 Source: L&M, A Reordering for the PageRank Problem

34 Langville and Meyer Algorithm 1 Re-order rows and columns so that dangling nodes are lumped at the bottom. Solve π1^T(I − αH11) = v1^T. Compute π2^T = α π1^T H12 + v2^T. Normalize π^T = [ π1^T π2^T ] / ‖[ π1^T π2^T ]‖_1.
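
A rough sketch of Algorithm 1 under the same assumptions (dense NumPy for clarity; the permutation bookkeeping is the only extra ingredient).

    import numpy as np

    def pagerank_algorithm1(H, alpha=0.85):
        """Reorder the zero (dangling) rows to the bottom, solve the small system, back out the rest."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)
        dangling = H.sum(axis=1) == 0
        order = np.concatenate([np.where(~dangling)[0], np.where(dangling)[0]])
        Hr, vr = H[np.ix_(order, order)], v[order]
        k = int((~dangling).sum())                                    # number of non-dangling pages
        H11, H12 = Hr[:k, :k], Hr[:k, k:]
        pi1 = np.linalg.solve((np.eye(k) - alpha * H11).T, vr[:k])    # pi1^T (I - alpha H11) = v1^T
        pi2 = alpha * pi1 @ H12 + vr[k:]                              # pi2^T = alpha pi1^T H12 + v2^T
        pi = np.concatenate([pi1, pi2])
        pi = pi / pi.sum()                                            # normalize
        out = np.empty(n)
        out[order] = pi                                               # undo the permutation
        return out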

35 Improvement In testing, Algorithm 1 reduces the time needed to find the PageRank vector by a factor of 1 to 6; the speedup is data-dependent.

36 Further Improvement? The first improvement came from finding zero rows in H; now find zero rows in H11, and repeat on each successive submatrix.

37 Source: L&M, A Reordering for the PageRank Problem

38 [Figure]

39 Langville and Meyer Algorithm 2 Reorder rows and columns so that all submatrices have their zero rows at the bottom. Solve π1^T(I − αH11) = v1^T. For i = 2 to b, compute π_i^T = v_i^T + α Σ_{j<i} π_j^T H_ji. Normalize.
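
A rough sketch of the idea behind Algorithm 2, again dense and under the same assumptions: peel off rows that are zero within the still-active columns, solve only the first block, then obtain the later blocks by substitution.

    import numpy as np

    def pagerank_algorithm2(H, alpha=0.85):
        """Peel off zero rows block by block, solve only the first block, substitute for the rest."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)

        # Reordering: repeatedly move rows that are zero within the active columns to a later block.
        active = np.arange(n)
        blocks = []
        while True:
            sub = H[np.ix_(active, active)]
            zero = sub.sum(axis=1) == 0
            if not zero.any() or zero.all():
                break
            blocks.insert(0, active[zero])        # discovered blocks end up ordered b, b-1, ..., 2
            active = active[~zero]
        blocks.insert(0, active)                  # block 1: no zero rows left
        order = np.concatenate(blocks)
        Hr, vr = H[np.ix_(order, order)], v[order]

        # Solve block 1, then forward-substitute blocks 2..b.
        sizes = [len(b) for b in blocks]
        ends = np.cumsum(sizes)
        k = sizes[0]
        x = np.empty(n)
        x[:k] = np.linalg.solve((np.eye(k) - alpha * Hr[:k, :k]).T, vr[:k])
        for i in range(1, len(blocks)):
            s, e = ends[i - 1], ends[i]
            x[s:e] = vr[s:e] + alpha * x[:s] @ Hr[:s, s:e]   # x_i^T = v_i^T + alpha sum_{j<i} x_j^T H_ji
        pi = x / x.sum()
        out = np.empty(n)
        out[order] = pi
        return out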

40 Problem with Algorithm 2 Finding the submatrices of zero rows can take longer than the time saved in the solve step; L&M wait until all submatrices are reordered before solving the primary system.

41 Proposal: as each submatrix is isolated, send it out for parallel solving.

42 Source: L&M, A Reordering for the PageRank Problem

43 Sources DeGroot, M. and Schervish, M., Probability and Statistics, 3rd ed., Addison Wesley, 2002. Langville, A. and Meyer, C., A Reordering for the PageRank Problem, SIAM Journal on Scientific Computing, Vol. 27, No. 6, 2006. Langville, A. and Meyer, C., Deeper Inside PageRank, 2004. Lee, C., Golub, G. and Zenios, S., A Fast Two-Stage Algorithm for Computing PageRank, undated. Rebaza, J., Lecture Notes.

