1 Methods of Computing the PageRank Vector Tom Mangan

2 Brief History of Web Search Boolean term matching

3 Brief History of Web Search Boolean term matching; Sergey Brin and Larry Page; reputation-based ranking: PageRank

4 Reputation Count links to a page; weight links by how many come from a page; further weight links by the reputation of the linker

5 [Figure: a mini-web of six pages, numbered 1 to 6, with links between them]

6 [Figure: the six-page mini-web with its link matrix]
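
The link structure itself appears only as an image on this slide, so as a minimal sketch here is how such a link matrix could be built in Python; the six-page link list below is a made-up stand-in, not the slide's actual graph.

    import numpy as np

    # Made-up link structure for a six-page mini-web (pages 1..6).
    # Page 6 has no outlinks, i.e. it is a dangling node.
    links = {1: [2, 3], 2: [4], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: []}

    n = len(links)
    H = np.zeros((n, n))
    for page, outlinks in links.items():
        for target in outlinks:
            H[page - 1, target - 1] = 1.0 / len(outlinks)   # each link from page Q gets weight 1/|Q|

    print(H)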

7 Calculating Rank r(P) = Σ_{Q ∈ B_P} r(Q)/|Q|, where B_P = the set of all pages linking to P and |Q| = # of links from page Q

8 Calculating Rank, iteratively: r_{k+1}(P) = Σ_{Q ∈ B_P} r_k(Q)/|Q|, where B_P = the set of all pages linking to P and |Q| = # of links from page Q
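
A small numeric illustration of this formula, reusing the made-up six-page link list from above and assuming a uniform starting ranking r(Q) = 1/6:

    # One update of r(P) for page 2 of the made-up mini-web.
    links = {1: [2, 3], 2: [4], 3: [1, 2, 5], 4: [5, 6], 5: [4, 6], 6: []}
    r = {page: 1.0 / 6 for page in links}
    B_P = [q for q, outs in links.items() if 2 in outs]     # pages linking to page 2
    r_new = sum(r[q] / len(links[q]) for q in B_P)          # sum of r(Q)/|Q| over Q in B_P
    print(B_P, r_new)                                       # [1, 3] and 1/12 + 1/18 ≈ 0.139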

9 The PageRank Vector Define: π_{k+1}^T = π_k^T H, with π_k^T = [ r_k(P_1) r_k(P_2) … r_k(P_n) ]

10 where H_ij = 1/|P_i| if page P_i links to page P_j, and H_ij = 0 otherwise (|P_i| = the number of outlinks from page P_i)

11 From our earlier mini-web: [the 6×6 hyperlink matrix H built from the figure's links]

12 Taken one row at a time, the matrix equation reproduces the earlier ranking formula: π_{k+1}(P_i) = Σ_{P_j ∈ B_{P_i}} π_k(P_j)/|P_j|, where B_{P_i} = the set of pages linking to P_i

13 Iterating this equation is called the Power Method: π_{k+1}^T = π_k^T H, where H is the hyperlink matrix defined above

14 Iterating this equation, π_{k+1}^T = π_k^T H, is called the Power Method, and we define the PageRank vector as its limit: π^T = lim_{k→∞} π_k^T
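
A minimal sketch of this iteration in Python; H can be any hyperlink matrix built as above, and the tolerance and iteration cap are arbitrary choices. As the next slides explain, this raw iteration is not yet guaranteed to converge without further fixes.

    import numpy as np

    def power_method(H, tol=1e-10, max_iter=1000):
        """Iterate pi_{k+1}^T = pi_k^T H from a uniform starting vector."""
        n = H.shape[0]
        pi = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            pi_next = pi @ H
            if np.abs(pi_next - pi).sum() < tol:
                return pi_next
            pi = pi_next
        return pi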

15 Power Method Convergence requires: irreducibility (Perron–Frobenius Thm)

16 Definitions Markov chain: the conditional probability of each future state depends only on the present state. Markov matrix: the transition matrix of a Markov chain.

17 Transition Matrix From our earlier mini-web: [H viewed as the transition matrix of the six-page example]

18 Markov Matrix Properties Row-stochastic; the stationary vector gives the long-term probability of each state; all eigenvalues satisfy |λ| ≤ 1
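
These properties can be checked numerically; the 2-state matrix below is a made-up illustration, not from the slides.

    import numpy as np

    M = np.array([[0.9, 0.1],
                  [0.5, 0.5]])                    # a made-up 2-state transition matrix
    print(M.sum(axis=1))                          # rows sum to 1: row-stochastic
    print(np.abs(np.linalg.eigvals(M)))           # all |lambda| <= 1, the largest is 1

    # The stationary vector satisfies pi^T M = pi^T: the left eigenvector for lambda = 1.
    vals, vecs = np.linalg.eig(M.T)
    pi = np.real(vecs[:, np.argmax(np.real(vals))])
    pi = pi / pi.sum()
    print(pi)                                     # long-run probability of each state: [5/6, 1/6]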

19 H is not row-stochastic: dangling nodes (pages with no outlinks) give zero rows

20 Define a vector a such that a_i = 1 if page i is a dangling node (row i of H is zero) and a_i = 0 otherwise. Then we obtain a row-stochastic matrix: S = H + a e^T/n

21 or, written row by row, every zero row of H is replaced by the uniform row e^T/n

22 S may or may not be reducible, so we make one more fix. The Google Matrix: G = αS + (1−α) e e^T/n, with 0 < α < 1. Now G is a positive, irreducible, row-stochastic matrix, and the power method will converge, but we've lost sparsity.
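
A sketch of both fixes in Python; α = 0.85 is the commonly quoted damping value and is an assumption here, as is the uniform teleportation row e^T/n.

    import numpy as np

    def google_matrix(H, alpha=0.85):
        """Patch the dangling rows (S), then mix in teleportation (G)."""
        n = H.shape[0]
        e = np.ones(n)
        a = (H.sum(axis=1) == 0).astype(float)    # a_i = 1 exactly when row i of H is zero
        S = H + np.outer(a, e) / n                # S = H + a e^T / n, now row-stochastic
        G = alpha * S + (1 - alpha) * np.outer(e, e) / n   # positive and irreducible, but dense
        return S, G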

23 Note that: G = αH + (α a + (1−α) e) e^T/n, and π_k^T e = 1

24 so now the power method looks like: π_{k+1}^T = α π_k^T H + (α π_k^T a + 1 − α) e^T/n
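
A sketch of this rewritten iteration, which touches only the sparse H and the dangling-node indicator a instead of the dense G (α and the stopping tolerance are assumptions).

    import numpy as np

    def pagerank_power(H, alpha=0.85, tol=1e-10, max_iter=1000):
        """pi_{k+1}^T = alpha pi_k^T H + (alpha pi_k^T a + 1 - alpha) e^T / n."""
        n = H.shape[0]
        a = (H.sum(axis=1) == 0).astype(float)    # dangling-node indicator
        pi = np.full(n, 1.0 / n)
        for _ in range(max_iter):
            pi_next = alpha * (pi @ H) + (alpha * (pi @ a) + 1 - alpha) / n
            done = np.abs(pi_next - pi).sum() < tol
            pi = pi_next
            if done:
                break
        return pi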

25 The power method converges at the rate at which |λ_2/λ_1|^k → 0; for G, λ_1 = 1 and |λ_2| ≤ α, thus the method converges roughly like α^k

26 [Figure: the six-page mini-web and its link matrix, repeated]

27 A Linear System Formulation (Amy Langville and Carl Meyer): exploit dangling nodes; solve a system instead of iterating

28 By Langville and Meyer, solving the system x^T(I − αH) = v^T and letting π^T = x^T/(x^T e) produces the PageRank vector (proof omitted)
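
A minimal dense sketch of this linear-system route, assuming a uniform v^T and α = 0.85; a production implementation would use sparse storage and a sparse solver.

    import numpy as np

    def pagerank_linear(H, alpha=0.85):
        """Solve x^T (I - alpha H) = v^T, then normalize to obtain pi^T."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)                           # uniform personalization vector
        x = np.linalg.solve((np.eye(n) - alpha * H).T, v)
        return x / x.sum()                                # pi^T = x^T / (x^T e)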

29 Exploiting Dangling Nodes: re-order the rows and columns of H so that the zero (dangling-node) rows sit at the bottom: H = [ H11 H12 ; 0 0 ]

30 Exploiting Dangling Nodes: with that ordering, I − αH = [ I − αH11, −αH12 ; 0, I ], so the system splits into x1^T(I − αH11) = v1^T and x2^T = α x1^T H12 + v2^T

31 I − αH11 has some nice properties that simplify solving the linear system: it is non-singular

32 [Figure]

33 Source: L&M, A Reordering for the PageRank Problem

34 Langville and Meyer Algorithm 1 Re-order rows and columns so that dangling nodes are lumped at the bottom. Solve π1^T(I − αH11) = v1^T. Compute π2^T = α π1^T H12 + v2^T. Normalize π^T = [ π1^T π2^T ] / ‖[ π1^T π2^T ]‖_1.
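
A rough sketch of Algorithm 1 under the same assumptions (dense NumPy for clarity; the permutation bookkeeping is the only extra ingredient).

    import numpy as np

    def pagerank_algorithm1(H, alpha=0.85):
        """Reorder the zero (dangling) rows to the bottom, solve the small system, back out the rest."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)
        dangling = H.sum(axis=1) == 0
        order = np.concatenate([np.where(~dangling)[0], np.where(dangling)[0]])
        Hr, vr = H[np.ix_(order, order)], v[order]
        k = int((~dangling).sum())                                    # number of non-dangling pages
        H11, H12 = Hr[:k, :k], Hr[:k, k:]
        pi1 = np.linalg.solve((np.eye(k) - alpha * H11).T, vr[:k])    # pi1^T (I - alpha H11) = v1^T
        pi2 = alpha * pi1 @ H12 + vr[k:]                              # pi2^T = alpha pi1^T H12 + v2^T
        pi = np.concatenate([pi1, pi2])
        pi = pi / pi.sum()                                            # normalize
        out = np.empty(n)
        out[order] = pi                                               # undo the permutation
        return out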

35 Improvement In testing, Algorithm 1 reduces the time needed to find the PageRank vector by a factor of 1 to 6; the speedup is data-dependent.

36 Further Improvement? The first improvement came from finding zero rows in H; now find zero rows in H11, and repeat on each successive submatrix.

37 Source: L&M, A Reordering for the PageRank Problem

38 [Figure]

39 Langville and Meyer Algorithm 2 Reorder rows and columns so that all submatrices have their zero rows at the bottom. Solve π1^T(I − αH11) = v1^T. For i = 2 to b, compute π_i^T = v_i^T + α Σ_{j<i} π_j^T H_ji. Normalize.
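
A rough sketch of the idea behind Algorithm 2, again dense and under the same assumptions: peel off rows that are zero within the still-active columns, solve only the first block, then obtain the later blocks by substitution.

    import numpy as np

    def pagerank_algorithm2(H, alpha=0.85):
        """Peel off zero rows block by block, solve only the first block, substitute for the rest."""
        n = H.shape[0]
        v = np.full(n, 1.0 / n)

        # Reordering: repeatedly move rows that are zero within the active columns to a later block.
        active = np.arange(n)
        blocks = []
        while True:
            sub = H[np.ix_(active, active)]
            zero = sub.sum(axis=1) == 0
            if not zero.any() or zero.all():
                break
            blocks.insert(0, active[zero])        # discovered blocks end up ordered b, b-1, ..., 2
            active = active[~zero]
        blocks.insert(0, active)                  # block 1: no zero rows left
        order = np.concatenate(blocks)
        Hr, vr = H[np.ix_(order, order)], v[order]

        # Solve block 1, then forward-substitute blocks 2..b.
        sizes = [len(b) for b in blocks]
        ends = np.cumsum(sizes)
        k = sizes[0]
        x = np.empty(n)
        x[:k] = np.linalg.solve((np.eye(k) - alpha * Hr[:k, :k]).T, vr[:k])
        for i in range(1, len(blocks)):
            s, e = ends[i - 1], ends[i]
            x[s:e] = vr[s:e] + alpha * x[:s] @ Hr[:s, s:e]   # x_i^T = v_i^T + alpha sum_{j<i} x_j^T H_ji
        pi = x / x.sum()
        out = np.empty(n)
        out[order] = pi
        return out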

40 Problem with Algorithm 2 Finding the submatrices of zero rows can take longer than the time saved in the solve step; L&M wait until all submatrices are reordered before solving the primary system.

41 Proposal: as each submatrix is isolated, send it out for parallel solving.

42 Source: L&M, A Reordering for the PageRank Problem

43 Sources DeGroot, M. and Schervish, M., Probability and Statistics, 3rd ed., Addison Wesley, 2002. Langville, A. and Meyer, C., A Reordering for the PageRank Problem, SIAM Journal on Scientific Computing, Vol. 27, No. 6, 2006. Langville, A. and Meyer, C., Deeper Inside PageRank, 2004. Lee, C., Golub, G. and Zenios, S., A Fast Two-Stage Algorithm for Computing PageRank, undated. Rebaza, J., Lecture Notes.

