
1 Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods. Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China. ACM CIKM 2005, Bremen

2 Outline (Nov. 2, 2005): Quick Review of PageRank; Distributed PageRank Computation (Motivation, Basic Idea, Algorithm); Experiments; Conclusion and Future Work

3 PageRank - Background: Ranking Web pages. Content-based methods. Link-based methods: PageRank [Page & Brin, 1998], HITS [Kleinberg, 1998], SALSA [Lempel & Moran, 2000].

4 PageRank - Intuition: A link from page A to page B means that the author of A recommends B. A page is of high quality if it is referred to by many other pages, or by pages of high quality.

5 PageRank - Model: Random surfer, modeled as a Markov chain.
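The random-surfer model can be made concrete as a row-stochastic transition matrix. The sketch below is illustrative only: it assumes a damping factor alpha = 0.85 and that pages without out-links (dangling pages) jump to a uniformly random page; neither choice is stated on the slide, and `google_matrix` is a hypothetical helper name.

```python
def google_matrix(links, n, alpha=0.85):
    """Dense transition matrix of the random surfer (illustrative sketch).

    links: dict mapping a page index to the list of pages it links to.
    With probability alpha the surfer follows a random out-link; otherwise
    it jumps to a uniformly random page. Assumption: a page with no
    out-links always jumps uniformly.
    """
    G = [[(1.0 - alpha) / n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        targets = links.get(i, [])
        if targets:
            share = alpha / len(targets)
            for j in targets:
                G[i][j] += share
        else:
            # dangling page: spread the link-following mass uniformly too
            for j in range(n):
                G[i][j] += alpha / n
    return G


G = google_matrix({0: [1, 2], 1: [2], 2: [0]}, n=4)  # page 3 is dangling
for row in G:
    print([round(p, 4) for p in row])
```

Every row sums to 1, so the matrix defines a Markov chain whose stationary distribution is the PageRank vector.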

6 PageRank - Algorithm: Power method.
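A minimal power-method sketch under the same assumptions (alpha = 0.85, uniform jumps from dangling pages; `pagerank_power` is a hypothetical name). It repeatedly multiplies the current vector by the transition matrix using only the link lists, so the dense matrix is never built, and stops when the L1 distance between successive iterates falls below `tol`.

```python
def pagerank_power(links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method x(k+1) = x(k) G without forming G (sketch).

    Returns (x, iterations). Assumption: dangling pages jump uniformly.
    """
    x = [1.0 / n] * n
    for it in range(1, max_iter + 1):
        dangling = sum(x[i] for i in range(n) if not links.get(i))
        # teleportation plus redistributed dangling mass, same for every page
        base = (1.0 - alpha) / n + alpha * dangling / n
        new = [base] * n
        for i, targets in links.items():
            if targets:
                share = alpha * x[i] / len(targets)
                for j in targets:
                    new[j] += share
        diff = sum(abs(a - b) for a, b in zip(new, x))  # L1 distance
        x = new
        if diff < tol:
            return x, it
    return x, max_iter


x, iters = pagerank_power({0: [1, 2], 1: [2], 2: [0]}, n=4)
print(iters, [round(v, 4) for v in x])
```

Page 3 has no in-links, so at convergence its score is just the constant term: x3 = 0.0375 + 0.2125 * x3, i.e. x3 = 0.0375 / 0.7875.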

7 Outline: Quick Review of PageRank; Distributed PageRank Computation (Motivation, Basic Idea, Algorithm); Experiments; Conclusion and Future Work

8 Motivation: Compass search engine confederation.

9 Motivation (cont.)

10 Basic Idea: Divide and conquer. Make use of the natural block structure of web graphs.
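The natural block structure of web graphs usually refers to the observation that most links point to pages on the same site, and the experiments later group pages by site. The sketch below, with hypothetical helper names and made-up URLs and edges, partitions pages by host name using Python's standard `urllib.parse.urlsplit` and measures the share of intra-site links.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def partition_by_site(urls):
    """Group page indices into blocks keyed by host name (one block per site)."""
    blocks = defaultdict(list)
    for idx, url in enumerate(urls):
        blocks[urlsplit(url).hostname].append(idx)
    return dict(blocks)


def intra_block_fraction(urls, edges):
    """Fraction of links (i, j) whose source and target share a host."""
    host = [urlsplit(u).hostname for u in urls]
    internal = sum(1 for i, j in edges if host[i] == host[j])
    return internal / len(edges)


# toy data, purely illustrative
urls = ["http://a.example/", "http://a.example/x",
        "http://b.example/", "http://b.example/y"]
edges = [(0, 1), (1, 0), (2, 3), (3, 2), (1, 2)]
print(partition_by_site(urls))
print(intra_block_fraction(urls, edges))
```

A high intra-block fraction is what makes each block's link submatrix nearly self-contained, so most of the work can stay local.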

11 DPC Algorithm, Step 1 - Initialization: Local nodes compute local PageRank vectors.
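Step 1 can be sketched as an ordinary PageRank run on each block in isolation. How the original algorithm treats links that leave the block is not recoverable from this transcript; the sketch assumes they are simply dropped, so a page whose links all leave the block becomes locally dangling. `local_pagerank` is a hypothetical name, and alpha = 0.85 is assumed.

```python
def local_pagerank(links, block, alpha=0.85, tol=1e-10, max_iter=1000):
    """PageRank of one block, using only links inside the block (sketch).

    links: dict mapping a global page id to its list of target page ids.
    block: global page ids hosted by this local node.
    Assumption: links leaving the block are dropped; a page with no
    in-block out-link is treated as dangling and jumps uniformly.
    Returns {global page id: local score}; the scores sum to 1.
    """
    pos = {p: k for k, p in enumerate(block)}
    m = len(block)
    local = {pos[p]: [pos[q] for q in links.get(p, []) if q in pos] for p in block}
    x = [1.0 / m] * m
    for _ in range(max_iter):
        dangling = sum(x[k] for k in range(m) if not local[k])
        new = [(1.0 - alpha) / m + alpha * dangling / m] * m
        for k, targets in local.items():
            if targets:
                share = alpha * x[k] / len(targets)
                for t in targets:
                    new[t] += share
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return {p: x[pos[p]] for p in block}


links = {0: [1, 2], 1: [0], 2: [3], 3: [2]}
print(local_pagerank(links, [0, 1]))
print(local_pagerank(links, [2, 3]))
```

Each local node only needs its own pages and their out-links, so these runs are independent and can proceed in parallel.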

12 DPC Algorithm (cont.), Step 2 - Aggregation: The central node computes the NodeRank vector.
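For Step 2, a common IAD formulation builds an n×n coupling matrix A whose entry A[I][J] is the probability of moving from block I to block J when the position inside block I is distributed according to block I's local PageRank vector. The NodeRank vector is then the stationary vector of A. The slide's own formula is not in the transcript, so treat this as a standard aggregation step, not necessarily the paper's exact one. The 4×4 matrix `G` is an arbitrary positive row-stochastic example, and the function names are hypothetical.

```python
def stationary(M, tol=1e-13, max_iter=100000):
    """Left stationary vector y = y M of a row-stochastic matrix (power method)."""
    n = len(M)
    y = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(y[i] * M[i][j] for i in range(n)) for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(new, y))
        y = new
        if diff < tol:
            break
    return y


def aggregate(G, blocks, local):
    """Coupling matrix A[I][J] = sum_{i in I} local[i] * sum_{j in J} G[i][j].

    local[i] is page i's local PageRank, summing to 1 within each block,
    so every row of A sums to 1.
    """
    return [[sum(local[i] * sum(G[i][j] for j in bJ) for i in bI)
             for bJ in blocks] for bI in blocks]


# arbitrary positive row-stochastic example, two blocks of two pages
G = [[0.10, 0.60, 0.20, 0.10],
     [0.30, 0.10, 0.30, 0.30],
     [0.25, 0.25, 0.25, 0.25],
     [0.40, 0.10, 0.10, 0.40]]
blocks = [[0, 1], [2, 3]]
local = [0.5, 0.5, 0.5, 0.5]  # e.g. uniform local vectors
node_rank = stationary(aggregate(G, blocks, local))
print(node_rank)
```

If the local vectors were exactly the normalized restrictions of the true PageRank vector, NodeRank would equal the total PageRank mass of each block, which is why the small aggregated problem is a useful surrogate for the large one.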

13 DPC Algorithm (cont.), Step 3 - Disaggregation: Local nodes compute extended local PageRank vectors. X: external nodes.
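One way to realize Step 3 is sketched below, assuming (as the slide's label suggests) that X is a single extra node representing every page outside the block. Transitions from X into the block are weighted by the current global estimate, the extended chain is solved by the power method, and the result is rescaled so that X keeps the current outside mass. Under these assumptions the update coincides with a Block Jacobi step; `extended_local_update` is a hypothetical name, and the paper's exact construction may differ.

```python
def stationary(M, tol=1e-13, max_iter=100000):
    """Left stationary vector y = y M of a row-stochastic matrix (power method)."""
    n = len(M)
    y = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(y[i] * M[i][j] for i in range(n)) for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(new, y))
        y = new
        if diff < tol:
            break
    return y


def extended_local_update(G, block, x):
    """Block Jacobi update of the pages in `block` via an extended chain.

    Assumption: the chain has the block's pages plus one external node X
    standing for every page outside the block. x is the current global
    estimate (sums to 1) and only weights the links entering the block.
    """
    inside = set(block)
    outside = [k for k in range(len(G)) if k not in inside]
    m = len(block)
    E = [[0.0] * (m + 1) for _ in range(m + 1)]
    for a, i in enumerate(block):
        for b, j in enumerate(block):
            E[a][b] = G[i][j]
        E[a][m] = sum(G[i][k] for k in outside)  # leave the block, go to X
    out_mass = sum(x[k] for k in outside)
    for b, j in enumerate(block):
        # X enters page j in proportion to the current in-flow from outside
        E[m][b] = sum(x[k] * G[k][j] for k in outside) / out_mass
    E[m][m] = 1.0 - sum(E[m][:m])  # stay outside
    z = stationary(E)
    # rescale so that X carries exactly the current outside mass
    return [z[a] * out_mass / z[m] for a in range(m)]


G = [[0.10, 0.60, 0.20, 0.10],
     [0.30, 0.10, 0.30, 0.30],
     [0.25, 0.25, 0.25, 0.25],
     [0.40, 0.10, 0.10, 0.40]]
x = [0.25, 0.25, 0.25, 0.25]
new = extended_local_update(G, [0, 1], x) + extended_local_update(G, [2, 3], x)
total = sum(new)
new = [v / total for v in new]  # central node renormalizes
print([round(v, 4) for v in new])
```

When `x` is already the exact stationary vector, the update returns it unchanged for every block, which is the fixed-point property a smoothing step needs.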

14 DPC Algorithm (cont.), Step 4: The central node computes the L1 distance between the current global PageRank vector and the previous one.
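Putting Steps 1 to 4 together gives an iterative aggregation-disaggregation loop with Block Jacobi smoothing. The sketch below runs everything in one process on a small dense matrix, much like a single-machine simulation, and assumes a strictly positive row-stochastic `G` (for example, a Google matrix) with at least two blocks. The function names, the Step 1 initialization from renormalized diagonal blocks, and the stopping tolerance are illustrative assumptions, not details taken from the paper.

```python
def stationary(M, tol=1e-13, max_iter=100000):
    """Left stationary vector y = y M of a row-stochastic matrix (power method)."""
    n = len(M)
    y = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(y[i] * M[i][j] for i in range(n)) for j in range(n)]
        diff = sum(abs(a - b) for a, b in zip(new, y))
        y = new
        if diff < tol:
            break
    return y


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def iad_pagerank(G, blocks, tol=1e-10, max_iter=200):
    """Aggregation/disaggregation with Block Jacobi smoothing (sketch).

    Assumes G is strictly positive and row-stochastic and len(blocks) >= 2.
    Returns (global vector, iterations).
    """
    N = len(G)
    block_of = {i: I for I, b in enumerate(blocks) for i in b}
    # Step 1 (assumed form): local vectors from row-renormalized diagonal blocks
    x = [0.0] * N
    for b in blocks:
        sub = [[G[i][j] for j in b] for i in b]
        sub = [[v / sum(row) for v in row] for row in sub]
        for i, v in zip(b, stationary(sub)):
            x[i] = v / len(blocks)
    prev = x
    for it in range(1, max_iter + 1):
        # Step 2: aggregate with the current within-block weights
        mass = [sum(x[i] for i in b) for b in blocks]
        local = [x[i] / mass[block_of[i]] for i in range(N)]
        A = [[sum(local[i] * sum(G[i][j] for j in bJ) for i in bI)
              for bJ in blocks] for bI in blocks]
        node_rank = stationary(A)
        x = [node_rank[block_of[i]] * local[i] for i in range(N)]  # disaggregate
        # Step 3: Block Jacobi smoothing, one extended chain (pages + X) per block
        new = [0.0] * N
        for b in blocks:
            inside = set(b)
            out = [k for k in range(N) if k not in inside]
            m = len(b)
            out_mass = sum(x[k] for k in out)
            E = [[G[i][j] for j in b] + [sum(G[i][k] for k in out)] for i in b]
            last = [sum(x[k] * G[k][j] for k in out) / out_mass for j in b]
            E.append(last + [1.0 - sum(last)])
            z = stationary(E)
            for a, i in enumerate(b):
                new[i] = z[a] * out_mass / z[m]
        total = sum(new)
        x = [v / total for v in new]
        # Step 4: L1 distance between current and previous global vectors
        if l1(x, prev) < tol:
            return x, it
        prev = x
    return x, max_iter


G = [[0.10, 0.60, 0.20, 0.10],
     [0.30, 0.10, 0.30, 0.30],
     [0.25, 0.25, 0.25, 0.25],
     [0.40, 0.10, 0.10, 0.40]]
x, iters = iad_pagerank(G, [[0, 1], [2, 3]])
print(iters, [round(v, 6) for v in x])
```

In a real deployment, Steps 1 and 3 would run on the local nodes and only block-level quantities would travel to the central node; here all of it is inlined for readability.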

15 Advantages: DPC consists mainly of standard PageRank computations. Small matrices fit into main memory. Low communication overhead.

16 Outline: Quick Review of PageRank; Distributed PageRank Computation (Motivation, Basic Idea, Algorithm); Experiments; Conclusion and Future Work

17 Experimental Setup: Simulation on a single Linux box. Web pages are grouped by site. For comparison: the classic power method and the LPR-Ref-2 algorithm in [Wang, VLDB 2004].

18 Data Sets: ST01/03, crawled in 2001/2003 by the Stanford WebBase Project; CN04, crawled in 2004 from web sites in China.

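19 Evaluation Metrics: L1 distance; Kendall's τ-distance, the number of page pairs (i, j) ranked in a different order in the two ranking lists (the per-pair indicator is 1 if pages i and j are in different order in the two lists, and 0 otherwise).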
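Both metrics are straightforward to compute. The sketch below (hypothetical function names) counts discordant pairs in O(N²) time, which is fine for small examples; dividing the count by N(N-1)/2 maps it into [0, 1] if a normalized value is wanted. How ties are handled is an assumption, as noted in the code.

```python
from itertools import combinations


def l1_distance(u, v):
    """Sum of absolute differences between two score vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def kendall_tau_distance(score_a, score_b):
    """Count pairs (i, j) that the two score vectors order differently.

    Assumption: a pair tied in either vector is not counted; the slides
    do not say how ties are handled.
    """
    return sum(
        1
        for i, j in combinations(range(len(score_a)), 2)
        if (score_a[i] - score_a[j]) * (score_b[i] - score_b[j]) < 0
    )


a = [0.4, 0.3, 0.2, 0.1]
b = [0.1, 0.3, 0.2, 0.4]
print(l1_distance(a, b), kendall_tau_distance(a, b))
```

Here only the pair (1, 2) keeps its order, so 5 of the 6 pairs are discordant.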

20 Accuracy of the First Iteration: plots of the L1 distance and Kendall's τ-distance.

21 Convergence Rate: Number of iterations until convergence.

22 Outline: Quick Review of PageRank; Distributed PageRank Computation; Experiments; Conclusion and Future Work

23 Conclusion: A distributed PageRank computation algorithm (DPC) based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing. Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB'04] and converges 5 to 7 times faster than the power method.

24 Future Work: Implement DPC in a distributed system. Integrate it with the Compass search engine confederation. How can PageRank vectors be updated efficiently within the DPC framework?

25 Thank you!

26 General PageRank Algorithm

27 IAD Method - Notations: Aggregation matrix (n×N); disaggregation matrix (N×n).
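A small sketch of the two operators, assuming the column-vector convention in which aggregation sums page scores within each block and disaggregation spreads a block score over its pages in proportion to given weights (for example, local PageRank values). The helper names are hypothetical, and the slide's own definitions are not in the transcript. Under this convention, with a column-stochastic transition matrix T (the transpose of a row-stochastic Google matrix), the aggregated n×n matrix is R·T·S.

```python
def matmul(X, Y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def aggregation_matrix(blocks, N):
    """n x N: R[I][i] = 1 if page i belongs to block I, else 0."""
    R = [[0.0] * N for _ in blocks]
    for I, b in enumerate(blocks):
        for i in b:
            R[I][i] = 1.0
    return R


def disaggregation_matrix(blocks, weights, N):
    """N x n: S[i][I] = weight of page i, normalized within its block I."""
    S = [[0.0] * len(blocks) for _ in range(N)]
    for I, b in enumerate(blocks):
        total = sum(weights[i] for i in b)
        for i in b:
            S[i][I] = weights[i] / total
    return S


blocks = [[0, 1], [2]]
R = aggregation_matrix(blocks, 3)
S = disaggregation_matrix(blocks, [1.0, 3.0, 5.0], 3)
print(matmul(R, S))  # [[1.0, 0.0], [0.0, 1.0]]
```

R·S is the n×n identity because the weights are normalized within each block: disaggregating block scores and then re-aggregating returns the original block scores.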

28 IAD Method

29 DPC Algorithm

30 DPC Algorithm (cont.)

31 DPC Algorithm (cont.)

32 DPC - Convergence Analysis: The global convergence of IAD methods is still an open problem. The difficulty stems partly from the fact that the disaggregation step is nonlinear. The paper proves the global convergence of the Block Jacobi method in the PageRank scenario when n > 2.

33 Experiments - Basic Facts: plots of the distribution of site sizes and of the number of pages hosted by sites of different sizes.

34 Experiments - Communication Overhead: Power method vs. LPR-Ref-2 / DPC. Pos(·): number of positive elements. L/U: block strictly lower/upper triangular part of P.
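The per-method overhead formulas themselves did not survive in this transcript, but the quantities they use are easy to compute. Assuming P is the link matrix with P[i][j] > 0 exactly when page i links to page j, and blocks are numbered in a fixed order, the sketch below counts the positive entries of its block strictly lower and upper triangular parts; their sum is simply the number of links that cross block boundaries. The function name and the toy data are hypothetical.

```python
def block_triangular_counts(edges, block_of):
    """Return (Pos(L), Pos(U)) for a link matrix P (sketch).

    Assumption: P[i][j] > 0 iff page i links to page j. A link counts
    toward L when its source block index exceeds its target's, toward U
    when it is smaller; intra-block links count toward neither.
    """
    lower = upper = 0
    for i, j in set(edges):  # duplicate links give a single positive entry
        if block_of[i] > block_of[j]:
            lower += 1
        elif block_of[i] < block_of[j]:
            upper += 1
    return lower, upper


block_of = [0, 0, 1, 1, 2]  # page -> block (site) index, toy data
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3), (4, 0)]
print(block_triangular_counts(edges, block_of))  # (3, 2)
```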

