Ljiljana Rajačić. Page Rank

1 Ljiljana Rajačić

2 Page Rank
Web as a directed graph
- Nodes: web pages
- Edges: hyperlinks

3 Page Rank
Two challenges of web search
1. The web contains many sources of information. Who to trust?
2. What is the “best” answer to a query? There is no single right answer.
Not all web pages are equally “important”

4 Page Rank
Link analysis approaches
- Rank pages (nodes) by analyzing the topology of the web graph
- Idea: links as votes
  - A page is more important if it has more links incident to it
  - Incoming links? Outgoing links?
- Links from important pages carry higher weight => recursive problem!

6 Page Rank
- A link's weight is proportional to the importance of its source page
- If page j with importance $r_j$ has n out-links, each link gets $r_j / n$ votes
- Page j's own importance is the sum of the votes on its in-links

7 Page Rank
A page is important if it is pointed to by other important pages
Rank $r_j$ of page j: $r_j = \sum_{i \to j} r_i / d_i$, where $d_i$ is the out-degree of node i
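As a small check of this rule, the sketch below applies it once to a hypothetical three-page graph (the pages y, a, m and their links are assumptions made for illustration, not taken from the slides). Each page splits its rank evenly over its out-links, and a page's new rank is the sum of the votes arriving on its in-links.

```python
from fractions import Fraction

# Hypothetical graph: page -> list of pages it links to.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def flow_update(rank, out_links):
    """Apply r_j = sum over links i -> j of r_i / d_i once."""
    new_rank = {page: Fraction(0) for page in out_links}
    for i, targets in out_links.items():
        vote = rank[i] / len(targets)  # each out-link carries r_i / d_i
        for j in targets:
            new_rank[j] += vote
    return new_rank

# Candidate ranks for this graph; exact fractions avoid rounding noise.
rank = {"y": Fraction(2, 5), "a": Fraction(2, 5), "m": Fraction(1, 5)}
print(flow_update(rank, out_links) == rank)
```

For this graph the candidate ranks are unchanged by the update, so they satisfy the flow equations.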

10 Page Rank
Flow equation in matrix form: M ∙ r = r, where $M_{ji} = 1 / d_i$ if page i links to page j and 0 otherwise
Example: page i links to 3 pages, including j, so $M_{ji} = 1/3$
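A minimal sketch of the matrix form, assuming a hypothetical three-page graph (y, a, m) and the usual convention that column i of M holds 1 / d_i for every page that i links to. The helper names build_matrix and mat_vec are illustrative, not from the slides.

```python
from fractions import Fraction

pages = ["y", "a", "m"]
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def build_matrix(pages, out_links):
    """Column-stochastic matrix: M[j][i] = 1 / d_i if i links to j."""
    index = {p: k for k, p in enumerate(pages)}
    n = len(pages)
    M = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            M[index[j]][index[i]] = Fraction(1, len(targets))
    return M

def mat_vec(M, r):
    """Plain matrix-vector product M . r."""
    return [sum(m * x for m, x in zip(row, r)) for row in M]

M = build_matrix(pages, out_links)
r = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
print(mat_vec(M, r) == r)
```

Every column of M sums to 1, and for this graph the vector (2/5, 2/5, 1/5) is mapped to itself.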

11 Page Rank
x is an eigenvector of M with corresponding eigenvalue λ if Mx = λx
Since M ∙ r = r, the rank vector r is an eigenvector of the web matrix M with corresponding eigenvalue 1
We can now find r efficiently with the power iteration method
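Power iteration can be sketched as follows: start from a uniform rank vector and multiply by M until the vector stops changing. The matrix below is for a hypothetical three-page graph, and the tolerance and iteration cap are arbitrary choices for illustration.

```python
def power_iteration(M, tol=1e-10, max_iter=1000):
    """Repeat r <- M . r from a uniform start until the L1 change is below tol."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r

# Column-stochastic matrix of an assumed graph: y -> y, a; a -> y, m; m -> a.
M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
r = power_iteration(M)
print([round(x, 4) for x in r])
```

On this small graph the iteration settles near (0.4, 0.4, 0.2). It is not guaranteed to behave this well on every graph, which is what the later slides on spider traps and dead ends address.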

12 Page Rank
$d_i$: out-degree of node i

13 Page Rank
PageRank simulates a random web surfer:
- At any time t, the surfer is on some page i
- At time t + 1, the surfer follows an out-link from i uniformly at random
- The surfer ends up on some page j linked from i
The rank vector r is the stationary distribution of this random walk: $r_i$ is the probability that the surfer is on page i at an arbitrary time t
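The random-surfer view can be tried directly with a short simulation. This is only a sketch on a hypothetical three-page graph; the step count and seed are arbitrary, and the visit frequencies are estimates rather than exact ranks.

```python
import random
from collections import Counter

# Hypothetical graph: page -> list of pages it links to.
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}

def simulate_surfer(out_links, steps, seed=0):
    """Follow uniformly random out-links and return visit frequencies."""
    rng = random.Random(seed)
    page = rng.choice(sorted(out_links))      # arbitrary starting page
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(out_links[page])    # uniform choice among out-links
        visits[page] += 1
    return {p: visits[p] / steps for p in out_links}

freq = simulate_surfer(out_links, steps=200_000)
print(freq)
```

For this graph the frequencies should land close to 0.4, 0.4 and 0.2, the same values that solve the flow equations for it.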

14 Page Rank
- Does this converge?
- Does it converge to what we want?
- Are the results reasonable?

15 Page Rank
Spider trap: all out-links are within an isolated group of pages
Spider traps eventually absorb all rank

16 Page Rank
At each step, the random surfer has 2 options:
- Follow a random link with probability β
- Jump to a random page with probability 1 - β
β is usually in the range 0.8 to 0.9
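The effect of the random jump can be seen on a hypothetical graph with a spider trap (page m links only to itself). In this sketch, beta=1.0 means the surfer never jumps and beta=0.8 follows the two-option rule; the graph and the step count are assumptions for illustration.

```python
def pagerank_step(out_links, r, beta):
    """One update: follow a link with prob. beta, jump anywhere with prob. 1 - beta."""
    n = len(out_links)
    nxt = {p: (1.0 - beta) / n for p in out_links}   # share from random jumps
    for i, targets in out_links.items():
        for j in targets:
            nxt[j] += beta * r[i] / len(targets)     # share from followed links
    return nxt

def iterate(out_links, beta, steps=200):
    r = {p: 1.0 / len(out_links) for p in out_links}
    for _ in range(steps):
        r = pagerank_step(out_links, r, beta)
    return r

# Assumed graph: m is a spider trap because its only out-link points to itself.
trap = {"y": ["y", "a"], "a": ["y", "m"], "m": ["m"]}
no_teleport = iterate(trap, beta=1.0)
with_teleport = iterate(trap, beta=0.8)
print(no_teleport)
print(with_teleport)
```

Without jumps, essentially all rank ends up on m. With β = 0.8 the trap still scores highest, but y and a keep a non-zero share (7/33 and 5/33 for this graph).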

17 Page Rank
A dead end is a page with no out-links
Dead ends cause rank to “leak out”
Example: page b is a dead end, so b's column of M is all 0s

18 Page Rank
Always jump to a random page from a dead end
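A minimal sketch of the leak and of this fix, on a hypothetical graph where page m has no out-links. With the fix switched on, a dead end is treated as if it linked to every page with equal probability; the function names and the step count are illustrative assumptions.

```python
def step(out_links, r, fix_dead_ends):
    """One rank update without teleport; optionally repair dead ends."""
    pages = list(out_links)
    nxt = {p: 0.0 for p in pages}
    for i in pages:
        targets = out_links[i]
        if not targets:
            if not fix_dead_ends:
                continue        # rank sitting on a dead end is simply lost
            targets = pages     # assumed fix: jump to any page uniformly
        for j in targets:
            nxt[j] += r[i] / len(targets)
    return nxt

def total_rank_after(out_links, steps, fix_dead_ends):
    r = {p: 1.0 / len(out_links) for p in out_links}
    for _ in range(steps):
        r = step(out_links, r, fix_dead_ends)
    return sum(r.values())

# Assumed graph: m is a dead end.
dead_end = {"y": ["y", "a"], "a": ["y", "m"], "m": []}
leaky = total_rank_after(dead_end, 100, fix_dead_ends=False)
fixed = total_rank_after(dead_end, 100, fix_dead_ends=True)
print(leaky, fixed)
```

Without the fix the total rank shrinks toward 0; with it the total stays at 1.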

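19 Page Rank
PageRank equation [Brin – Page, 1998]: $r_j = \sum_{i \to j} \beta \, r_i / d_i + (1 - \beta) / n$
Google matrix A: $A = \beta M + (1 - \beta) \frac{1}{n} e e^T$, so that $r = A \cdot r$
e: vector of all 1s; n: number of pages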

20 Page Rank
- The key step is matrix-vector multiplication
- A is dense: it has no 0 elements
- M is sparse: only ~10 to 100 non-zero elements per column
- We want to work with M. It's possible!
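One way to see why working with M is enough: when the rank vector sums to 1 and there are no dead ends, multiplying by the dense matrix A gives the same result as multiplying by β·M over the edges only and then adding the constant (1 - β) / n to every entry. The sketch below compares both on a hypothetical three-page graph; all names and values are assumptions for illustration.

```python
pages = ["y", "a", "m"]
out_links = {"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}
index = {p: k for k, p in enumerate(pages)}

def dense_step(r, beta):
    """Build the full n x n matrix A explicitly, then multiply: O(n^2) work."""
    n = len(pages)
    A = [[(1.0 - beta) / n] * n for _ in range(n)]
    for i, targets in out_links.items():
        for j in targets:
            A[index[j]][index[i]] += beta / len(targets)
    return [sum(A[j][i] * r[i] for i in range(n)) for j in range(n)]

def sparse_step(r, beta):
    """Touch only the edges of M, then add the constant jump term: O(m) work."""
    n = len(pages)
    nxt = [(1.0 - beta) / n] * n
    for i, targets in out_links.items():
        share = beta * r[index[i]] / len(targets)
        for j in targets:
            nxt[index[j]] += share
    return nxt

r = [0.5, 0.3, 0.2]   # any vector summing to 1
print(dense_step(r, 0.85))
print(sparse_step(r, 0.85))
```

Both functions return the same vector up to floating-point rounding, but the sparse version never stores the n × n matrix.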

23 Page Rank
CPU
- Graph representation: adjacency list
- O(m) per iteration, where m is the number of edges and n is the number of pages
- m = O(n) => O(n) per iteration
CUDA
- Graph representation: adjacency matrix
- $O(n^2)$ per iteration

24 Page Rank
Number of pages    CPU        CUDA
300                290 ms     340 ms
400                570 ms     380 ms
500                860 ms     550 ms
>850000            ~6.5 s     Memory overflow
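A rough back-of-the-envelope estimate suggests why the adjacency-matrix version runs out of memory at this size. The sketch assumes 4 bytes per matrix entry and per stored edge, and an average out-degree of 10; none of these figures are stated in the slides.

```python
def dense_bytes(n, bytes_per_entry=4):
    """Memory for a full n x n adjacency matrix."""
    return n * n * bytes_per_entry

def adjacency_list_bytes(n, avg_out_degree, bytes_per_index=4):
    """Memory for the edge targets of an adjacency list (index overhead ignored)."""
    return n * avg_out_degree * bytes_per_index

n = 850_000
print(dense_bytes(n) / 1e12, "TB for the adjacency matrix")
print(adjacency_list_bytes(n, 10) / 1e6, "MB for the adjacency list")
```

Under these assumptions the matrix needs about 2.89 TB, far beyond the memory of a single GPU, while the adjacency list needs about 34 MB.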

25 Page Rank
Thanks for the attention!

