
1 Page Rank
- Intuition: solve the recursive equation "a page is important if important pages link to it."
- In technical terms: compute the principal eigenvector of the stochastic matrix of the Web.
  - A few fixups are needed.

2 Stochastic Matrix of the Web
- Enumerate the pages.
- Page i corresponds to row and column i.
- M[i,j] = 1/n if page j links to n pages, one of which is page i; M[i,j] = 0 if j does not link to i.
  - Seems backwards, but it allows multiplication by M on the left to represent "follow a link."
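The construction above can be sketched in a few lines of Python (an illustration, not part of the slides; the `links` dictionary and the page numbering are assumptions for the example):

```python
# Sketch: building the column-stochastic matrix M for a tiny web.
# links[j] lists the pages that page j links to (assumed example data).

def stochastic_matrix(links, n):
    """M[i][j] = 1/deg(j) if page j links to page i, else 0."""
    M = [[0.0] * n for _ in range(n)]
    for j, outlinks in links.items():
        for i in outlinks:
            M[i][j] = 1.0 / len(outlinks)
    return M

# The 1839 web of a later slide: 0 = Yahoo, 1 = Amazon, 2 = M'soft.
links = {0: [0, 1], 1: [0, 2], 2: [1]}
M = stochastic_matrix(links, 3)
# Each nonempty column sums to 1, so multiplying by M on the left
# "follows a random out-link."
```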

3 Example
- If page j links to 3 pages, one of which is page i, then M[i,j] = 1/3.

4 Random Walks on the Web
- Suppose v is a vector whose i-th component is the probability that we are at page i at a certain time.
- If we follow a link from i at random, the probability distribution of the page we are then at is given by the vector Mv.

5 The Multiplication

  | p11  p12  p13 |   | p1 |
  | p21  p22  p23 | x | p2 |
  | p31  p32  p33 |   | p3 |

If the probability that we are at page i is p_i, then in the next iteration the probability of being at page 1 is the probability that we are at page 1 and stay there, plus the probability that we are at page 2 times the probability of moving from 2 to 1, plus the probability that we are at page 3 times the probability of moving from 3 to 1:

  p11 x p1 + p12 x p2 + p13 x p3

6 Random Walks 2
- Starting from any vector v, the limit M(M(...M(Mv)...)) is the distribution of page visits during a random walk.
- Intuition: pages are important in proportion to how often a random walker would visit them.
- The math: limiting distribution = principal eigenvector of M = PageRank.

7 Example: The Web in 1839

Yahoo links to itself and to Amazon; Amazon links to Yahoo and to M'soft; M'soft links only to Amazon. Rows and columns in the order y, a, m:

         y    a    m
    y [ 1/2  1/2   0 ]
M = a [ 1/2   0    1 ]
    m [  0   1/2   0 ]

8 Simulating a Random Walk
- Start with the vector v = [1,1,...,1] representing the idea that each Web page is given one unit of "importance."
- Repeatedly apply the matrix M to v, allowing the importance to flow like a random walk.
- The limit exists, but about 50 iterations is sufficient to estimate the final distribution.

9 Example
- Equations v = Mv:
  - y = y/2 + a/2
  - a = y/2 + m
  - m = a/2

Iterating from v = [1, 1, 1]:

  y:  1    1    5/4   9/8  ...  6/5
  a:  1   3/2    1   11/8  ...  6/5
  m:  1   1/2   3/4   1/2  ...  3/5
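The iteration above can be checked with a short sketch in plain Python (not from the slides), using the 1839 matrix and the suggested 50 iterations:

```python
# Sketch: power iteration v <- Mv on the 1839 web (order: y, a, m),
# starting from one unit of importance per page.

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]

def step(M, v):
    # One multiplication by M: importance flows along out-links.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

v = [1.0, 1.0, 1.0]
for _ in range(50):
    v = step(M, v)
# v approaches the limit (6/5, 6/5, 3/5); the total of 3 is conserved.
```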

10 Solving the Equations
- These 3 equations in 3 unknowns do not have a unique solution.
- Add the constraint y + a + m = 3 to solve.
- In Web-sized examples, we cannot solve by Gaussian elimination; we need an iterative solution (relaxation).

11 Real-World Problems
- Some pages are "dead ends" (have no links out).
  - Such a page causes importance to leak out.
- Other (groups of) pages are spider traps (all out-links are within the group).
  - Eventually spider traps absorb all importance.

12 Microsoft Becomes a Dead End

M'soft now has no out-links, so its column is all zeros:

         y    a    m
    y [ 1/2  1/2   0 ]
M = a [ 1/2   0    0 ]
    m [  0   1/2   0 ]

13 Example
- Equations v = Mv:
  - y = y/2 + a/2
  - a = y/2
  - m = a/2

Iterating from v = [1, 1, 1]:

  y:  1    1    3/4   5/8  ...  0
  a:  1   1/2   1/2   3/8  ...  0
  m:  1   1/2   1/4   1/4  ...  0
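The leakage is easy to demonstrate with a small sketch (not from the slides): because the dead end's column of M is all zeros, every multiplication by M loses total importance.

```python
# Sketch: the dead-end web. M'soft's column is zero, so importance
# "leaks out" and all iterates decay toward 0.

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0]]

v = [1.0, 1.0, 1.0]
for _ in range(100):
    v = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
# sum(v) shrinks toward 0 instead of staying at 3.
```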

14 M'soft Becomes a Spider Trap

M'soft now links only to itself:

         y    a    m
    y [ 1/2  1/2   0 ]
M = a [ 1/2   0    0 ]
    m [  0   1/2   1 ]

15 Example
- Equations v = Mv:
  - y = y/2 + a/2
  - a = y/2
  - m = a/2 + m

Iterating from v = [1, 1, 1]:

  y:  1    1    3/4   5/8  ...  0
  a:  1   1/2   1/2   3/8  ...  0
  m:  1   3/2   7/4    2   ...  3
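A sketch of the same iteration (not from the slides) shows the trap absorbing everything: the total of 3 is conserved, but it all ends up at M'soft.

```python
# Sketch: the spider-trap web. M'soft keeps all importance it receives,
# so the iterates converge to (0, 0, 3).

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0]]

v = [1.0, 1.0, 1.0]
for _ in range(100):
    v = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
# v approaches (0, 0, 3): the trap has absorbed all the importance.
```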

16 Google Solution to Traps, Etc.
- "Tax" each page a fixed percentage at each iteration. This percentage is also called the "damping factor."
- Add the same constant to all pages.
- Models a random walk in which the surfer has a fixed probability of abandoning the search and going to a random page next.

17 Ex: Previous with 20% Tax
- Equations v = 0.8(Mv) + 0.2:
  - y = 0.8(y/2 + a/2) + 0.2
  - a = 0.8(y/2) + 0.2
  - m = 0.8(a/2 + m) + 0.2

Iterating from v = [1, 1, 1]:

  y:  1   1.00   0.84   0.776  ...   7/11
  a:  1   0.60   0.60   0.536  ...   5/11
  m:  1   1.40   1.56   1.688  ...  21/11
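The taxed iteration can be sketched directly from the equations above (plain Python, not from the slides); note that the spider trap no longer captures everything:

```python
# Sketch: taxed iteration v <- 0.8(Mv) + 0.2 on the spider-trap web.

M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0]]

v = [1.0, 1.0, 1.0]
for _ in range(100):
    mv = [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
    v = [0.8 * x + 0.2 for x in mv]   # 20% tax redistributed equally
# v approaches (7/11, 5/11, 21/11): M'soft still gets the most
# importance, but no longer all of it.
```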

18 Solving the Equations
- We can expect to solve small examples by Gaussian elimination.
- Web-sized examples still need to be solved by more complex (relaxation) methods.

19 Search-Engine Architecture
- All search engines, including Google, select pages that contain the words of your query.
- More weight is given to words appearing in the title, headers, etc.
- Inverted indexes speed the discovery of pages with given words.
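A minimal sketch of an inverted index (an illustration of the idea, not any engine's actual implementation; the sample pages are made up): map each word to the set of pages containing it, so a query can intersect small sets instead of scanning every page.

```python
from collections import defaultdict

# Assumed toy corpus: page id -> text.
pages = {
    1: "page rank computes importance",
    2: "importance of anchor text",
    3: "rank pages by importance",
}

# Inverted index: word -> set of page ids containing it.
index = defaultdict(set)
for page_id, text in pages.items():
    for word in text.split():
        index[word].add(page_id)

def search(*words):
    """Pages containing all of the query words."""
    result = None
    for w in words:
        result = index[w] if result is None else result & index[w]
    return result if result else set()

# search("rank", "importance") returns the pages containing both words.
```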

20 Google Anti-Spam Devices
- Early search engines relied on the words on a page to tell what it is about.
  - This led to "tricks" in which pages attracted attention by placing false words, hidden in the background color of the page.
- Google trusts the words in anchor text.
  - It relies on others telling the truth about your page, rather than relying on you.

21 Use of PageRank
- Pages are ordered by many criteria, including PageRank and the appearance of query words.
  - "Important" pages are more likely to be what you want.
- PageRank is also an anti-spam device.
  - Creating bogus links to yourself doesn't help if you are not an important page.

22 Discussion
- Dealing with incentives
- Several types of links
- Page ranking as voting

23 Hubs and Authorities: Distinguishing Two Roles for Pages

24 Hubs and Authorities
- Mutually recursive definition:
  - A hub links to many authorities.
  - An authority is linked to by many hubs.
- Authorities turn out to be places where information can be found.
  - Example: information about how to use a programming language.
- Hubs tell who the authorities are.
  - Example: a catalogue of sources about programming languages.

25 Transition Matrix A
- H&A uses a matrix A with A[i,j] = 1 if page i links to page j, and 0 if not.
- A', the transpose of A, is similar to the PageRank matrix M, but A' has 1's where M has fractions.

26 Example

Yahoo links to itself, Amazon, and M'soft; Amazon links to Yahoo and M'soft; M'soft links to Amazon:

        y  a  m
    y [ 1  1  1 ]
A = a [ 1  0  1 ]
    m [ 0  1  0 ]

27 Using Matrix A for H&A
- Let h and a be vectors measuring the "hubbiness" and authority of each page.
- Equations: h = Aa; a = A'h.
  - Hubbiness = scaled sum of authorities of linked pages.
  - Authority = scaled sum of hubbiness of linked predecessors.

28 Consequences of Basic Equations
- From h = Aa; a = A'h we can derive:
  - h = AA'h
  - a = A'Aa
- Compute h and a by iteration, assuming initially each page has one unit of hubbiness and one unit of authority.
- There are different normalization techniques: normalize after each iteration of the iterative procedure, or normalize once at the end.

29 The Multiplication

  | 1 1 1 |   | a1 |   | h1 |
  | 1 0 1 | x | a2 | = | h2 |
  | 0 1 0 |   | a3 |   | h3 |

To find the hubbiness of page 2, h2, we add up the authority of the pages it points to (pages 1 and 3).

30 The Multiplication

  | 1 1 0 |   | h1 |   | a1 |
  | 1 0 1 | x | h2 | = | a2 |
  | 1 1 0 |   | h3 |   | a3 |

To find the authority of page 3, a3, we add up the hubbiness of the pages that point to it (pages 1 and 2).

31 Example

      [ 1 1 1 ]        [ 1 1 0 ]
  A = [ 1 0 1 ]   A' = [ 1 0 1 ]
      [ 0 1 0 ]        [ 1 1 0 ]

        [ 3 2 1 ]         [ 2 1 2 ]
  AA' = [ 2 2 0 ]   A'A = [ 1 2 1 ]
        [ 1 0 1 ]         [ 2 1 2 ]

Iterating a = A'Aa from a = (1, 1, 1):

  a(yahoo):   1   5   24   114  ...  proportional to 1+sqrt(3)
  a(amazon):  1   4   18    84  ...  proportional to 2
  a(m'soft):  1   5   24   114  ...  proportional to 1+sqrt(3)

Iterating h = AA'h from h = (1, 1, 1):

  h(yahoo):   1   6   28   132  ...  proportional to 1.000
  h(amazon):  1   4   20    96  ...  proportional to 0.732
  h(m'soft):  1   2    8    36  ...  proportional to 0.268
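The H&A iteration on this example can be sketched in plain Python (an illustration, not from the slides), normalizing after each step so the largest component is 1; the limits match the ratios above (note that sqrt(3) - 1 is about 0.732 and 2 - sqrt(3) about 0.268):

```python
# Sketch: HITS iteration a <- A'h, h <- Aa on the example web,
# normalizing each vector by its maximum component.

A = [[1, 1, 1],
     [1, 0, 1],
     [0, 1, 0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

At = transpose(A)
h = [1.0, 1.0, 1.0]
a = [1.0, 1.0, 1.0]
for _ in range(50):
    a = matvec(At, h)
    a = [x / max(a) for x in a]
    h = matvec(A, a)
    h = [x / max(h) for x in h]
# h approaches (1, sqrt(3)-1, 2-sqrt(3)); a approaches (1, sqrt(3)-1, 1).
```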

32 Solving the Equations
- Solution of even small examples is tricky.
- As for PageRank, we need to solve big examples by relaxation.

