The effect of New Links on Google Pagerank By Hui Xie Apr, 07.


1 The effect of New Links on Google Pagerank By Hui Xie Apr, 07

2 Computing PageRank: matrix representation
Let $P$ be an $n \times n$ matrix and $p_{ij}$ the entry in the $i$-th row and $j$-th column.
If page $i$ has $k > 0$ outgoing links, then $p_{ij} = 1/k$ if page $i$ has a link to page $j$, and $p_{ij} = 0$ if there is no link from $i$ to $j$.
If page $i$ has no outgoing links, then $p_{ij} = 1/n$ for $j = 1, \dots, n$.
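The construction of $P$ can be sketched in a few lines of Python. This is an illustrative sketch only: the input format (`links[i]` lists the distinct pages that page `i` links to), the function name `hyperlink_matrix` and the 4-page example web are hypothetical, and pages are numbered from 0 instead of 1.

```python
def hyperlink_matrix(links):
    """Build the n x n matrix P described above.

    links[i] lists the distinct pages (0-based) that page i links to;
    this input format is an assumption made for the example.
    """
    n = len(links)
    P = []
    for out in links:
        if out:
            # Page i has k = len(out) > 0 out-links: p_ij = 1/k if i links to j, else 0.
            k = len(out)
            P.append([1.0 / k if j in out else 0.0 for j in range(n)])
        else:
            # Page i has no out-links: p_ij = 1/n for every j.
            P.append([1.0 / n] * n)
    return P


# Hypothetical 4-page web: page 3 is dangling.
links = [[1, 2], [2], [0], []]
P = hyperlink_matrix(links)
for row in P:
    print(row)
```

Every row of the result sums to 1, so $P$ is row-stochastic even when some pages have no outgoing links.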

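3 Google matrix
$G = cP + (1-c)\frac{1}{n} e e^T$, where $e = (1, \dots, 1)^T$ and $0 < c < 1$ is the damping factor.
$G$ is a stochastic matrix: $Ge = e$.
There exists a unique column vector $\pi$ such that $\pi^T G = \pi^T$ and $\pi^T e = 1$; it is given by $\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$.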
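As a sanity check of the two characterisations of $\pi$, the sketch below builds $G$ for a small hypothetical web with $c = 0.85$, runs power iteration on $\pi^T \leftarrow \pi^T G$, and compares the result with $\frac{1-c}{n} e^T (I - cP)^{-1}$. All helper names are made up for the example, and the plain Gauss-Jordan inverse is only meant for tiny matrices.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def google_matrix(P, c=0.85):
    """G = cP + (1 - c)(1/n) e e^T."""
    n = len(P)
    return [[c * p + (1 - c) / n for p in row] for row in P]


def pagerank_power(G, iters=500):
    """Iterate pi^T <- pi^T G starting from the uniform distribution."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


def pagerank_closed_form(P, c=0.85):
    """pi^T = (1 - c)/n e^T (I - cP)^{-1}, i.e. scaled column sums of the inverse."""
    n = len(P)
    Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


P = hyperlink_matrix([[1, 2], [2], [0], []])   # hypothetical 4-page web
G = google_matrix(P)
pi_power = pagerank_power(G)
pi_closed = pagerank_closed_form(P)
print([round(x, 6) for x in pi_power])
print([round(x, 6) for x in pi_closed])
```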

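4 Discrete-Time Markov Chains
A sequence of random variables $\{X_n\}$ is called a Markov chain if it has the Markov property:
$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$.
States are usually labeled $\{1, 2, \dots\}$ (or $\{0, 1, 2, \dots\}$). The state space can be finite or countably infinite.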

5 Transition Probability
$p_{ij} = P(X_{n+1} = j \mid X_n = i)$ is the probability of jumping from state $i$ to state $j$.
Assume stationarity: $p_{ij}$ is independent of the time $n$.
Transition probability matrix: $P = (p_{ij})$.
Two-state MC:

6 Side Topic: Markov Chains
A discrete-time stochastic process is a sequence of random variables $\{X_0, X_1, \dots, X_n, \dots\}$, where $0, 1, \dots, n, \dots$ are discrete points in time.
A Markov chain is a discrete-time stochastic process defined over a finite (or countably infinite) set of states $S$ in terms of a matrix $P$ of transition probabilities.
Memorylessness property: for a Markov chain,
$\Pr[X_{t+1} = j \mid X_0 = i_0, X_1 = i_1, \dots, X_t = i] = \Pr[X_{t+1} = j \mid X_t = i]$.

7 Side Topic: Markov Chains
Let $\pi_i(t)$ be the probability of being in state $i$ at time step $t$, and let $\pi(t) = [\pi_0(t), \pi_1(t), \dots]$ be the vector of probabilities at time $t$.
For an initial probability distribution $\pi(0)$, the probabilities at time $n$ are $\pi(n) = \pi(0) P^n$.
A probability distribution $\pi$ is stationary if $\pi = \pi P$.
Moreover, $P(X_{m+n} = j \mid X_m = i) = P(X_n = j \mid X_0 = i) = P^n(i, j)$.
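A small numerical illustration of $\pi(n) = \pi(0) P^n$, of stationarity and of $P^{m+n} = P^m P^n$, for a hypothetical two-state chain that leaves state 0 with probability $a$ and state 1 with probability $b$ (the values $a = 0.3$, $b = 0.1$ are arbitrary). For such a chain, $\pi = (b/(a+b),\ a/(a+b))$ solves $\pi = \pi P$.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(P, n):
    """P^n by repeated multiplication (fine for tiny chains)."""
    R = [[1.0 if i == j else 0.0 for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R


def distribution_at(pi0, P, n):
    """pi(n) = pi(0) P^n, with pi(0) given as a row vector."""
    return mat_mul([pi0], mat_pow(P, n))[0]


# Hypothetical two-state chain: leave state 0 w.p. a, leave state 1 w.p. b.
a, b = 0.3, 0.1
P = [[1 - a, a], [b, 1 - b]]
stationary = [b / (a + b), a / (a + b)]   # solves pi = pi P

print(distribution_at([1.0, 0.0], P, 50))   # close to the stationary distribution
```

Starting from state 0, the distribution approaches `stationary` geometrically, at rate $|1 - a - b|$.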

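8 Absorbing Markov chain
Define a discrete-time absorbing Markov chain $\{X_t, t = 0, 1, \dots\}$ with state space $\{0, 1, \dots, n\}$, where transitions between the states $1, \dots, n$ are governed by the matrix $cP$ and state 0 is absorbing. The transition matrix is
$\begin{pmatrix} 1 & \mathbf{0}^T \\ (1-c)e & cP \end{pmatrix}$,
so from every state $i \ge 1$ the chain is absorbed in state 0 with probability $1 - c$.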
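The block structure can be checked numerically. In this sketch (hypothetical names and example web; index 0 is the absorbing state and indices 1 to n are the pages), each transient row sends mass $1 - c$ to state 0, so after $t$ steps from any page the absorbed mass should be $1 - c^t$.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def absorbing_matrix(P, c=0.85):
    """(n+1) x (n+1) matrix [[1, 0^T], [(1 - c)e, cP]]; index 0 is the absorbing state."""
    n = len(P)
    top = [1.0] + [0.0] * n
    return [top] + [[1.0 - c] + [c * p for p in row] for row in P]


def step(dist, A):
    """One transition of the chain: dist^T <- dist^T A."""
    m = len(A)
    return [sum(dist[i] * A[i][j] for i in range(m)) for j in range(m)]


c = 0.85
A = absorbing_matrix(hyperlink_matrix([[1, 2], [2], [0], []]), c)   # hypothetical web
dist = [0.0, 1.0, 0.0, 0.0, 0.0]     # start at the first page (state 1)
for _ in range(10):
    dist = step(dist, A)
print(round(dist[0], 6), round(1 - c ** 10, 6))   # absorbed mass after 10 steps
```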

9 Random walk interpretation
The walk starts at a uniformly chosen Web page. At each step, if it is currently at page $p$:
with probability $c$, go to a uniformly chosen out-neighbor of $p$;
with probability $1 - c$, stop.
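The random walk interpretation suggests a Monte Carlo estimate. Because $\pi^T = \frac{1-c}{n} e^T (I - cP)^{-1}$ and a uniform start averages the rows of $(I - cP)^{-1}$, one gets $\pi_j = (1-c)\,E(N_j)$, where $N_j$ is the number of visits to page $j$ (counting the start) of a walk started at a uniformly chosen page. The sketch assumes a hypothetical 4-page web, 50 000 simulated walks and a fixed seed; a dangling page is treated as linking to every page, matching $p_{ij} = 1/n$.

```python
import random


def random_walk_pagerank(links, c=0.85, walks=50000, seed=1):
    """Monte Carlo estimate pi_j ~ (1 - c) * (average number of visits to j per walk)."""
    n = len(links)
    rng = random.Random(seed)
    visits = [0] * n
    for _ in range(walks):
        page = rng.randrange(n)            # start at a uniformly chosen page
        while True:
            visits[page] += 1              # count every visit, including t = 0
            if rng.random() >= c:          # with probability 1 - c, stop
                break
            out = links[page]
            # uniformly chosen out-neighbour; a dangling page jumps anywhere (p_ij = 1/n)
            page = rng.choice(out) if out else rng.randrange(n)
    return [(1 - c) * v / walks for v in visits]


def pagerank_power(links, c=0.85, iters=500):
    """Reference values by power iteration with the Google matrix G."""
    n = len(links)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - c) / n] * n
        for i, out in enumerate(links):
            targets = out if out else range(n)
            share = c * pi[i] / len(targets)
            for j in targets:
                new[j] += share
        pi = new
    return pi


links = [[1, 2], [2], [0], []]   # hypothetical 4-page web; page 3 is dangling
estimate = random_walk_pagerank(links)
exact = pagerank_power(links)
print([round(x, 3) for x in estimate])
print([round(x, 3) for x in exact])
```

The estimate is only accurate to a few decimal places; its error shrinks like one over the square root of the number of walks.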

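10 Let $N_j$ be the total number of visits to state $j$ before absorption, including the visit at time $t = 0$ if $X_0 = j$. Formally, $N_j = \sum_{t=0}^{\infty} \mathbf{1}\{X_t = j\}$.
Since $(I - cP)^{-1} = \sum_{t \ge 0} (cP)^t$ and $[(cP)^t]_{ij} = P(X_t = j \mid X_0 = i)$ for $i, j \ge 1$, we get $z_{ij} = [(I - cP)^{-1}]_{ij} = E(N_j \mid X_0 = i)$.
Let $q_{ij}$ be the probability of reaching state $j$ before absorption if the initial state is $i$ (for $i = j$, the probability of returning to $j$ at some time $t \ge 1$). Then we have the following result.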
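A sketch of $z_{ij}$ as expected visit counts, for a hypothetical 4-page web with $c = 0.85$ (helper names are made up). Since $Pe = e$, we have $(I - cP)e = (1-c)e$, so every row of $Z = (I - cP)^{-1}$ should sum to $1/(1-c)$, the expected number of time steps the walk survives (counting $t = 0$); the column sums scaled by $(1-c)/n$ should form a probability vector.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


c = 0.85
P = hyperlink_matrix([[1, 2], [2], [0], []])   # hypothetical 4-page web
n = len(P)
# Z = (I - cP)^{-1}; Z[i][j] is the expected number of visits to j before absorption from i.
Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
row_sums = [sum(row) for row in Z]        # expected lifetime of the walk, including t = 0
pi = [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]
print([round(s, 6) for s in row_sums])    # each entry should equal 1/(1 - c)
```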

11 Theorem. Let $X$ denote a Markov chain with state space $E$. The total number $N_j$ of visits to a state $j \in E$, given that the chain starts in state $i$, has the distribution
$P(N_j = m \mid X_0 = j) = q_{jj}^{m-1}(1 - q_{jj})$ for $m \ge 1$,
and, for $i \ne j$,
$P(N_j = m \mid X_0 = i) = \begin{cases} 1 - q_{ij}, & m = 0, \\ q_{ij}\, q_{jj}^{m-1}(1 - q_{jj}), & m \ge 1. \end{cases}$
Corollary. For all $i, j \in E$, $z_{ii} = (1 - q_{ii})^{-1}$, and $z_{ij} = q_{ij} z_{jj}$ for $i \ne j$.
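The corollary can be checked numerically by computing $q_{ij}$ independently of $Z$. In this sketch (hypothetical helper names, 0-based pages, $c = 0.85$), $q_{ij}$ for $i \ne j$ is obtained by value iteration on the hitting probabilities $h_i = c\sum_k p_{ik} h_k$ with $h_j = 1$, and $q_{jj}$ as the probability of taking one surviving step from $j$ and then reaching $j$.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hit_probabilities(P, j, c=0.85, iters=1000):
    """h[i] = P(reach j before absorption | X_0 = i), with h[j] = 1 (value iteration)."""
    n = len(P)
    h = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(iters):
        h = [1.0 if i == j else c * sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
    return h


def q_matrix(P, c=0.85):
    """q_ij for i != j is a hitting probability; q_jj is the return probability."""
    n = len(P)
    Q = [[0.0] * n for _ in range(n)]
    for j in range(n):
        h = hit_probabilities(P, j, c)
        for i in range(n):
            if i == j:
                # one surviving step from j, then reach j from wherever the walk lands
                Q[j][j] = c * sum(P[j][k] * h[k] for k in range(n))
            else:
                Q[i][j] = h[i]
    return Q


c = 0.85
P = hyperlink_matrix([[1, 2], [2], [0], []])   # hypothetical 4-page web
n = len(P)
Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
Q = q_matrix(P, c)
print([round(Z[j][j] * (1 - Q[j][j]), 9) for j in range(n)])   # each should be 1
```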

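12 Outgoing links from $i$ do not affect $q_{ji}$ for any $j \ne i$, because a path from $j$ up to its first visit to $i$ never uses the outgoing links of $i$. Since $\pi_i = \frac{1-c}{n} z_{ii}\bigl(1 + \sum_{j \ne i} q_{ji}\bigr)$, by changing its outgoing links a page can control its PageRank only up to multiplication by the factor $z_{ii} = 1/(1 - q_{ii})$.
If $p_{ii} = 0$, a return to $i$ takes at least two steps, each surviving with probability $c$, so $0 \le q_{ii} \le c^2$ and hence $1 \le z_{ii} \le (1 - c^2)^{-1} \approx 3.6$ for $c = 0.85$.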
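A brute-force illustration of the bound on a hypothetical 5-page web: page 0 (playing the role of page $i$) tries every non-empty set of out-links to the other pages, the rest of the web is kept fixed, and the ratio between its best and worst PageRank is compared with $(1 - c^2)^{-1}$. The helper names and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pagerank(links, c=0.85):
    """pi^T = (1 - c)/n e^T (I - cP)^{-1}."""
    P = hyperlink_matrix(links)
    n = len(P)
    Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


c = 0.85
others = [[0, 2], [3], [0, 4], [1]]   # hypothetical out-links of pages 1..4
n = 1 + len(others)
scores = {}
for k in range(1, n):
    for targets in combinations(range(1, n), k):   # page 0 never links to itself
        scores[targets] = pagerank([list(targets)] + others, c)[0]
ratio = max(scores.values()) / min(scores.values())
bound = 1 / (1 - c ** 2)
print(round(ratio, 4), round(bound, 4))
```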

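13 Rank-one update of Google PageRank
Page 1, which has $k_0$ old links, creates $k_1$ new links to pages $2, \dots, k_1 + 1$. Let $k = k_0 + k_1$ and let $p_1^T$ be the first row of the matrix $P$. The updated hyperlink matrix is
$\tilde P = P + e_1 \Bigl(\frac{1}{k}\sum_{j=2}^{k_1+1} e_j^T - \frac{k_1}{k}\, p_1^T\Bigr)$,
where $e_j$ denotes the $j$-th unit vector; the first row becomes $\frac{k_0}{k} p_1^T + \frac{1}{k}\sum_{j=2}^{k_1+1} e_j^T$.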
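The update can be sketched as follows, with 0-based indices (page 1 of the slides is index 0). The function name and the example web are hypothetical; the sketch assumes $k_0 > 0$ and that the new targets are distinct from page 1 and from its old targets, and it checks the updated matrix against $P$ rebuilt from scratch.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def rank_one_update(P, page, new_targets, k0):
    """Return P + e_page v^T with v = (1/k) sum_{j in new} e_j - (k1/k) p_page.

    Assumes page has k0 > 0 old links and new_targets are distinct from them.
    """
    n = len(P)
    k1 = len(new_targets)
    k = k0 + k1
    v = [-(k1 / k) * P[page][j] for j in range(n)]
    for j in new_targets:
        v[j] += 1.0 / k
    # Only the row of `page` changes, which is what makes the update rank one.
    return [[P[i][j] + (v[j] if i == page else 0.0) for j in range(n)] for i in range(n)]


old_links = [[1], [2], [0], [0, 1]]          # hypothetical web; page 0 has k0 = 1 link
new_links = [[1, 2, 3], [2], [0], [0, 1]]    # page 0 adds k1 = 2 links, to pages 2 and 3
updated = rank_one_update(hyperlink_matrix(old_links), 0, [2, 3], k0=1)
direct = hyperlink_matrix(new_links)
print([round(x, 6) for x in updated[0]])
```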

14

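15 According to (9), the ranking of page 1 increases when
$\frac{1}{k_1}\sum_{i=2}^{k_1+1} z_{i1} > \frac{z_{11} - 1}{c}$.
Since $z_{11} = 1/(1 - q_{11})$ and $z_{i1} = q_{i1} z_{11}$ for $i > 1$, this is equivalent to
$\frac{1}{k_1}\sum_{i=2}^{k_1+1} q_{i1} > \frac{q_{11}}{c}$.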
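To see the condition $\frac{1}{k_1}\sum_{i=2}^{k_1+1} q_{i1} > q_{11}/c$ at work, the sketch below adds every possible set of new links from page 0 (the slides' page 1) in a hypothetical 5-page web, recomputes PageRank directly, and compares the sign of the change with the sign of the condition's margin. Here $q_{i1} = z_{i1}/z_{11}$ and $q_{11} = 1 - 1/z_{11}$ are read off the original $Z$, using the corollary of slide 11. Names and graph are illustrative assumptions.

```python
from itertools import combinations


def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def z_and_pagerank(links, c=0.85):
    """Return Z = (I - cP)^{-1} and pi^T = (1 - c)/n e^T Z."""
    P = hyperlink_matrix(links)
    n = len(P)
    Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
    pi = [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]
    return Z, pi


c = 0.85
base = [[1], [0, 2], [3], [4], [0]]   # hypothetical web; index 0 is "page 1", with k0 = 1
Z, pi = z_and_pagerank(base, c)
q = [Z[i][0] / Z[0][0] for i in range(len(base))]   # q_i1 = z_i1 / z_11 (used for i != 0)
q11 = 1 - 1 / Z[0][0]                                # from z_11 = 1/(1 - q_11)

checks = []   # (actual change of pi_1, margin of the condition) for each set of new targets
for k1 in range(1, 4):
    for new in combinations([2, 3, 4], k1):
        _, pi_new = z_and_pagerank([[1] + list(new)] + base[1:], c)
        margin = sum(q[i] for i in new) / k1 - q11 / c
        checks.append((pi_new[0] - pi[0], margin))
print([(round(d, 6), round(mg, 6)) for d, mg in checks])
```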

16 Hence, page 1 increases its ranking when it refers to pages that are characterized by a high value of $q_{i1}$: higher, on average, than $q_{11}/c$, which equals the average of $q_{j1}$ over page 1's existing links when page 1 does not link to itself. These must be pages that refer to page 1 or at least belong to the same Web community. Here, by a Web community we mean a set of Web pages that a surfer can reach from one to another in a relatively small number of steps.

17 Similarly, for $j \ne 1$, the PageRank of page $j$ increases if
$\frac{c}{k_1}\sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$.
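The condition $\frac{c}{k_1}\sum_{i=2}^{k_1+1} z_{ij} > z_{1j}$ for $j \ne 1$ can likewise be compared against a direct recomputation. The sketch uses a hypothetical 5-page web in which index 0 is the page adding links; note that it only needs $Z$ of the original web, which is the practical appeal of the criterion. It also lists any page that received a new link yet lost PageRank (the list may be empty for this particular graph).

```python
from itertools import combinations


def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def z_and_pagerank(links, c=0.85):
    """Return Z = (I - cP)^{-1} and pi^T = (1 - c)/n e^T Z."""
    P = hyperlink_matrix(links)
    n = len(P)
    Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
    pi = [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]
    return Z, pi


c = 0.85
base = [[1], [0, 2], [3], [4], [0]]   # hypothetical web; index 0 adds the new links
n = len(base)
Z, pi = z_and_pagerank(base, c)       # Z of the original web only

checks = []   # (j, new targets, actual change of pi_j, margin of the condition)
for k1 in range(1, 4):
    for new in combinations([2, 3, 4], k1):
        _, pi_new = z_and_pagerank([[1] + list(new)] + base[1:], c)
        for j in range(1, n):
            margin = c / k1 * sum(Z[i][j] for i in new) - Z[0][j]
            checks.append((j, new, pi_new[j] - pi[j], margin))

losers = [(j, new) for j, new, d, _ in checks if j in new and d < -1e-9]
print(losers)
```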

18 If several new links are added, the PageRank of page $j$ might actually decrease even if this page receives one of the new links. This situation occurs when most of the newly created links point to “irrelevant” pages.

19 For instance, let $j = 2$ and assume that there is no hyperlink path from pages $3, \dots, k_1 + 1$ to page 2. Then $z_{i2}$ is close to zero for $i = 3, \dots, k_1 + 1$, and the PageRank of page 2 will increase only if $(c/k_1)\, z_{22} > z_{12}$, which is not necessarily true, especially if $z_{12}$ and $k_1$ are considerably large.

20 Asymptotic analysis
Let $\tau_j = \min\{t \ge 1 : X_t = j\}$ be the stopping time of the first visit to state $j$, and let $m_{ij} = E(\tau_j \mid X_0 = i)$ be the average time needed to reach $j$ starting from $i$ (the mean first passage time).
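Mean first passage times can be computed by first-step analysis, $m_{ij} = 1 + \sum_{l \ne j} g_{il}\, m_{lj}$. The sketch below solves these linear systems for the Google matrix of a hypothetical 4-page web and checks the standard identity $m_{jj} = 1/\pi_j$ (Kac's formula), which holds because every entry of $G$ is positive. Helper names are made up.

```python
def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def google_matrix(P, c=0.85):
    """G = cP + (1 - c)(1/n) e e^T."""
    n = len(P)
    return [[c * p + (1 - c) / n for p in row] for row in P]


def mean_first_passage(G):
    """M[i][j] = E(tau_j | X_0 = i), where tau_j is the first time t >= 1 with X_t = j."""
    n = len(G)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # m_i = 1 + sum_{l != j} g_il m_l, i.e. (I - G_j) m = e with column j of G zeroed.
        A = [[(1.0 if i == l else 0.0) - (G[i][l] if l != j else 0.0) for l in range(n)]
             for i in range(n)]
        A_inv = inverse(A)
        for i in range(n):
            M[i][j] = sum(A_inv[i])   # row i of A^{-1} e
    return M


def stationary(G, iters=500):
    """Power iteration pi^T <- pi^T G from the uniform distribution."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    return pi


G = google_matrix(hyperlink_matrix([[1, 2], [2], [0], []]))   # hypothetical 4-page web
M = mean_first_passage(G)
pi = stationary(G)
print([round(M[j][j] * pi[j], 9) for j in range(len(G))])   # Kac: each should be 1
```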

21 Optimal Linking Strategy
Consider a page $i = 1, \dots, n$ and assume that $i$ has links to pages $i_1, \dots, i_k$ distinct from $i$. Further, let $m_{ij}(c)$ be the mean first passage time from page $i$ to page $j$ for the Google transition matrix $G$ with parameter $c$.

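22 Since $\pi_i = 1/m_{ii}(c)$ and
$m_{ii}(c) = 1 + \frac{c}{k}\sum_{l=1}^{k} m_{i_l i}(c) + \frac{1-c}{n}\sum_{j \ne i} m_{ji}(c)$,
and outgoing links from $i$ do not affect $m_{ji}(c)$ for any $j \ne i$, by choosing its links page $i$ can only alter $k$ and the set $\{i_1, \dots, i_k\}$ in the middle term. This means that the owner of page $i$ has very little control over its PageRank. The best the owner can do is to link to only one page $j^*$ such that
$m_{j^* i}(c) = \min_{j \ne i} m_{ji}(c)$,
since an average of the $m_{i_l i}(c)$ is never smaller than their minimum. Note that, surprisingly, the PageRank of $j^*$ plays no role here.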

23 Theorem. The optimal linking strategy for a Web page is to have only one outgoing link, pointing to the Web page with the shortest mean first passage time back to the original page.
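A brute-force check of the theorem on a hypothetical 5-page web: page 0 tries every non-empty set of out-links, and the best PageRank it can reach is compared with the PageRank it gets from a single link to the page $j^*$ that minimises $m_{j0}(c)$. The helper names and graph are assumptions; since $m_{j0}(c)$ for $j \ne 0$ does not depend on page 0's own links, it is computed once from an arbitrary choice of those links.

```python
from itertools import combinations


def hyperlink_matrix(links):
    """Row-stochastic P from 0-based out-link lists; dangling rows are uniform."""
    n = len(links)
    return [[1.0 / len(out) if j in out else 0.0 for j in range(n)] if out else [1.0 / n] * n
            for out in links]


def inverse(a):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def google_matrix(P, c=0.85):
    """G = cP + (1 - c)(1/n) e e^T."""
    n = len(P)
    return [[c * p + (1 - c) / n for p in row] for row in P]


def pagerank(links, c=0.85):
    """pi^T = (1 - c)/n e^T (I - cP)^{-1}."""
    P = hyperlink_matrix(links)
    n = len(P)
    Z = inverse([[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)] for i in range(n)])
    return [(1 - c) / n * sum(Z[i][j] for i in range(n)) for j in range(n)]


def passage_times_to(G, target):
    """m_{j,target} for every j, from m_j = 1 + sum_{l != target} g_jl m_l."""
    n = len(G)
    A = [[(1.0 if a == b else 0.0) - (G[a][b] if b != target else 0.0) for b in range(n)]
         for a in range(n)]
    return [sum(row) for row in inverse(A)]


c = 0.85
others = [[0, 2], [3], [0, 4], [1]]   # hypothetical out-links of pages 1..4
n = 1 + len(others)

m = passage_times_to(google_matrix(hyperlink_matrix([[1]] + others), c), 0)
j_star = min(range(1, n), key=lambda j: m[j])

# Best PageRank page 0 can reach over every non-empty set of out-links.
best = max(pagerank([list(t)] + others, c)[0]
           for k in range(1, n) for t in combinations(range(1, n), k))
single = pagerank([[j_star]] + others, c)[0]
print(j_star, round(best, 6), round(single, 6))
```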

24 Conclusions Our main conclusion is that a Web page cannot significantly manipulate its PageRank by changing its outgoing links. Furthermore, keeping a logical hyperlink structure and linking to a relevant Web community is the most sensible and rewarding policy.

