Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods (Yangbo Zhu, Shaozhi Ye and Xing Li, Tsinghua University, Beijing, China)

Distributed PageRank Computation Based on Iterative Aggregation-Disaggregation Methods
Yangbo Zhu, Shaozhi Ye and Xing Li
Tsinghua University, Beijing, China
ACM CIKM 2005, Bremen

Nov.2, Outline: Quick Review of PageRank; Distributed PageRank Computation (Motivation, Basic Idea, Algorithm); Experiments; Conclusion and Future Work

PageRank - Background: Methods for ranking web pages are either content-based or link-based. Link-based methods include PageRank [Page & Brin, 1998], HITS [Kleinberg, 1998] and SALSA [Lempel & Moran, 2000].

PageRank - Intuition: A link from page A to page B means that the author of A recommends B. A page is of high quality if it is referred to by many other pages, or by pages of high quality.

PageRank - Model: the random surfer, whose walk over the web graph is a Markov chain; PageRank is the stationary distribution of that chain.
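The random-surfer model is commonly written as below. This is a sketch in standard notation, not the slide's own formulas; the symbols P, d, e, v and α are assumed.

```latex
\bar{P} = P + d\,v^{T}, \qquad
G = \alpha\,\bar{P} + (1-\alpha)\,e\,v^{T}, \qquad
\pi^{T} G = \pi^{T}, \quad \pi^{T} e = 1
```

Here P is the N × N hyperlink matrix with P_ij = 1/outdeg(i) when page i links to page j, d is the dangling-page indicator (d_i = 1 when page i has no out-links), e is the all-ones vector, v is the teleport vector (v = e/N in the uniform case), and α is the damping factor, often taken as 0.85. PageRank π is the stationary distribution of the Markov chain with transition matrix G.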

PageRank - Algorithm: the power method.
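A minimal power-method sketch in Python, assuming a uniform teleport vector, a damping factor of 0.85 and uniform redistribution of the scores of dangling pages; none of these parameters is given on the slide. The graph format `{page: [linked pages]}` and the function name `pagerank` are hypothetical.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method for PageRank on a graph given as {page: [linked pages]}.

    Every link target must also be a key. Dangling pages (no out-links)
    spread their score uniformly; every page gets a teleport share (1 - alpha) / N.
    """
    pages = sorted(out_links)
    n = len(pages)
    x = {p: 1.0 / n for p in pages}  # start from the uniform vector
    for _ in range(max_iter):
        dangling = sum(x[p] for p in pages if not out_links[p])
        # teleport mass plus the uniformly redistributed dangling mass
        new = {p: (1 - alpha) / n + alpha * dangling / n for p in pages}
        for p in pages:
            targets = out_links[p]
            for q in targets:
                new[q] += alpha * x[p] / len(targets)  # follow a random out-link
        change = sum(abs(new[p] - x[p]) for p in pages)  # L1 change per iteration
        x = new
        if change < tol:
            break
    return x


scores = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []})
print(sorted(scores, key=scores.get, reverse=True))  # -> ['c', 'a', 'b', 'd']
```

Each iteration touches every link once, which is why the cost of the power method grows with the size of the whole graph.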

Motivation: the Compass search engine confederation.

Motivation (cont.)

Basic Idea: divide and conquer, making use of the natural block structure of web graphs.
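One simple way to expose this block structure, assuming (in line with the experiments, which group pages by site) that a page's site is the host part of its URL: partition pages by host and measure how many links stay inside a site. The helpers `site_of`, `group_by_site` and `intra_site_fraction` are hypothetical names.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Block id of a page: its lower-cased host (and port), an assumption here."""
    return urlsplit(url).netloc.lower()


def group_by_site(urls):
    """Partition pages into blocks, one block per site."""
    blocks = defaultdict(list)
    for url in urls:
        blocks[site_of(url)].append(url)
    return dict(blocks)


def intra_site_fraction(links):
    """Fraction of (source, target) hyperlinks whose endpoints share a site."""
    if not links:
        return 0.0
    same = sum(1 for src, dst in links if site_of(src) == site_of(dst))
    return same / len(links)


links = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://a.example/2", "http://b.example/x"),
    ("http://b.example/x", "http://b.example/y"),
    ("http://b.example/y", "http://b.example/x"),
]
pages = sorted({u for link in links for u in link})
print(group_by_site(pages))
print(intra_site_fraction(links))  # -> 0.75
```

A high intra-site fraction is what makes a site-based partition attractive: most of the work stays inside a block.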

DPC Algorithm, Step 1 - Initialization: each local node computes a local PageRank vector for the block of pages it holds.
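A sketch of Step 1 for one local node, under an assumption the slide does not spell out: the local PageRank is computed on the block's internal links only, dropping links to other blocks, with the usual damping and dangling-page conventions. `local_pagerank` is a hypothetical helper.

```python
def local_pagerank(out_links, block, alpha=0.85, tol=1e-10, max_iter=1000):
    """PageRank of one block's pages using only the links inside the block.

    Links leaving the block are dropped (an assumption for this sketch), so a
    page whose links all point elsewhere is treated as dangling.
    """
    members = sorted(block)
    inside = set(members)
    local = {p: [q for q in out_links[p] if q in inside] for p in members}
    n = len(members)
    x = {p: 1.0 / n for p in members}
    for _ in range(max_iter):
        dangling = sum(x[p] for p in members if not local[p])
        new = {p: (1 - alpha) / n + alpha * dangling / n for p in members}
        for p in members:
            for q in local[p]:
                new[q] += alpha * x[p] / len(local[p])
        change = sum(abs(new[p] - x[p]) for p in members)
        x = new
        if change < tol:
            break
    return x


out_links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["b2"], "b2": ["a1"]}
blocks = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
# Each local node would run this independently on the block it holds.
local = {b: local_pagerank(out_links, members) for b, members in blocks.items()}
print(local)
```

Because each call sees only one block, the matrices involved are small, which matches the slide's point that the per-block work fits in main memory.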

DPC Algorithm (cont.), Step 2 - Aggregation: the central node computes the NodeRank vector, which holds one score per local node.
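A sketch of the aggregation step in the usual IAD style, assumed rather than taken from the slide: the PageRank chain (damping 0.85, uniform teleport) is collapsed into a small block-to-block chain by weighting each page with its local score, and NodeRank is taken as that chain's stationary vector. `node_rank` and its arguments are hypothetical.

```python
def node_rank(out_links, blocks, local, alpha=0.85, tol=1e-12, max_iter=1000):
    """Aggregate the PageRank chain into a block-to-block chain and solve it.

    blocks: {block_id: [pages]}; local: {block_id: {page: weight}}, with the
    weights of each block summing to 1 (for example, the Step 1 vectors).
    """
    n_pages = sum(len(members) for members in blocks.values())
    owner = {p: b for b, members in blocks.items() for p in members}
    ids = sorted(blocks)
    # A[I][J] = sum over pages i in I of w_i * P(one surfer step from i lands in J)
    A = {I: {J: 0.0 for J in ids} for I in ids}
    for I in ids:
        for i, w in local[I].items():
            targets = out_links[i]
            for J in ids:
                share = len(blocks[J]) / n_pages  # chance a uniform jump lands in J
                if targets:
                    hits = sum(1 for q in targets if owner[q] == J)
                    follow = alpha * hits / len(targets)
                else:
                    follow = alpha * share  # dangling page: jump uniformly
                A[I][J] += w * ((1 - alpha) * share + follow)
    # Power iteration on the small n x n aggregated chain
    xi = {I: 1.0 / len(ids) for I in ids}
    for _ in range(max_iter):
        new = {J: sum(xi[I] * A[I][J] for I in ids) for J in ids}
        change = sum(abs(new[J] - xi[J]) for J in ids)
        xi = new
        if change < tol:
            break
    return xi


out_links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["b2", "a1"], "b2": ["b1"]}
blocks = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
local = {"A": {"a1": 0.5, "a2": 0.5}, "B": {"b1": 0.5, "b2": 0.5}}
print(node_rank(out_links, blocks, local))  # symmetric graph: both blocks near 0.5
```

Only the n × n matrix A has to be solved centrally, so the central node's work scales with the number of blocks rather than the number of pages.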

DPC Algorithm (cont.), Step 3 - Disaggregation: local nodes compute extended local PageRank vectors, where X denotes the external nodes.
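A sketch of Step 3 for one local node. The slide only says that the extended local graph includes external nodes X; here, as an assumption, all outside pages are lumped into a single node X weighted by their current global scores, and the extended chain is solved by power iteration. If `x` were the exact PageRank vector, the block's part of this chain's stationary vector would equal `x` restricted to the block (up to scaling), which is what makes it usable as a disaggregation step. `extended_local_vector` is a hypothetical helper.

```python
def extended_local_vector(out_links, block, x, alpha=0.85, tol=1e-12, max_iter=10000):
    """Solve the chain 'pages of one block + one external node X'.

    x is the current global estimate {page: score}, assumed positive outside
    the block. Returns the block's scores renormalized to sum to 1.
    """
    pages = sorted(out_links)
    n = len(pages)
    inside = set(block)
    outside = [k for k in pages if k not in inside]
    if not outside:
        raise ValueError("block must not contain every page")
    X = object()  # the external node; cannot collide with a page id

    def row(i):
        """Row i of the PageRank (Google) matrix as {page: probability}."""
        targets = out_links[i]
        if not targets:
            return {j: 1.0 / n for j in pages}
        r = {j: (1 - alpha) / n for j in pages}
        for j in targets:
            r[j] += alpha / len(targets)
        return r

    states = sorted(inside) + [X]
    E = {s: {t: 0.0 for t in states} for s in states}
    for i in inside:  # local rows: mass leaving the block goes to X
        for j, p in row(i).items():
            E[i][j if j in inside else X] += p
    mass = sum(x[k] for k in outside)
    for k in outside:  # row of X: outside rows averaged with weights x[k]
        for j, p in row(k).items():
            E[X][j if j in inside else X] += x[k] * p / mass
    y = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new = {t: sum(y[s] * E[s][t] for s in states) for t in states}
        change = sum(abs(new[s] - y[s]) for s in states)
        y = new
        if change < tol:
            break
    total = sum(y[i] for i in inside)
    return {i: y[i] / total for i in sorted(inside)}


out_links = {"a1": ["a2", "b1"], "a2": ["a1", "b2"], "b1": ["a1"], "b2": ["a2"]}
x = {p: 0.25 for p in out_links}  # current global estimate (uniform here)
print(extended_local_vector(out_links, ["a1", "a2"], x))
```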

DPC Algorithm (cont.), Step 4: the central node computes the L1 distance between the current global PageRank vector and the previous one, to check for convergence.
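A sketch of Step 4, assuming (as in standard IAD) that the global vector is assembled by scaling each block's local vector by that block's NodeRank score. The threshold `eps` and the helper names are hypothetical.

```python
def assemble_global(node_rank, local):
    """Global estimate: each page's score is its block's NodeRank times its local score."""
    return {p: node_rank[b] * w for b, vec in local.items() for p, w in vec.items()}


def l1_distance(x, y):
    """Sum of absolute differences; a page missing from one vector counts as 0."""
    return sum(abs(x.get(p, 0.0) - y.get(p, 0.0)) for p in set(x) | set(y))


def converged(x_new, x_old, eps=1e-8):
    """Stopping test run by the central node (eps is a hypothetical threshold)."""
    return l1_distance(x_new, x_old) < eps


previous = {"a1": 0.25, "a2": 0.25, "b1": 0.5}
current = assemble_global({"A": 0.5, "B": 0.5},
                          {"A": {"a1": 0.5, "a2": 0.5}, "B": {"b1": 1.0}})
print(l1_distance(current, previous), converged(current, previous))  # -> 0.0 True
```

If the test fails, another round of Steps 2 and 3 would follow.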

Advantages: DPC consists mainly of standard PageRank computations; the small per-block matrices fit into main memory; communication overhead is low.

Experimental Setup: simulation on a single Linux machine, with web pages grouped by site. Baselines for comparison: the classic power method and the LPR-Ref-2 algorithm [Wang, VLDB 2004].

Data Sets: ST01 and ST03, crawled in 2001 and 2003 respectively by the Stanford WebBase Project; CN04, crawled in 2004 from web sites in China.

Evaluation Metrics: the L1 distance, and Kendall's τ-distance, which is built from the pairwise indicator K_ij = 1 if pages i and j are in different order in the two ranking lists (and K_ij = 0 otherwise).
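Both metrics can be sketched as follows. Normalizing Kendall's τ-distance by the n(n − 1)/2 page pairs and treating ties as agreement are assumptions; the slide only gives the pairwise indicator K_ij. The quadratic pair loop is for illustration only.

```python
from itertools import combinations


def l1_distance(x, y):
    """L1 distance between two score vectors over the same pages."""
    return sum(abs(x[p] - y[p]) for p in x)


def kendall_tau_distance(x, y):
    """Share of page pairs (i, j) with K_ij = 1, i.e. ordered differently by x and y.

    Normalization by the number of pairs and tie handling are assumptions.
    O(n^2) in the number of pages: suitable for small examples only.
    """
    pairs = list(combinations(sorted(x), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


exact = {"a": 0.4, "b": 0.35, "c": 0.25}
approx = {"a": 0.35, "b": 0.4, "c": 0.25}
print(kendall_tau_distance(exact, approx))  # -> 0.3333333333333333
```

The two metrics capture different things: L1 measures how far the scores are off, while the τ-distance only cares whether the resulting ranking order changes.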

Accuracy of the First Iteration [figure panels: L1 distance; Kendall's τ-distance]

Convergence Rate: number of iterations needed for convergence [figure]

Conclusion: DPC is a distributed PageRank computation algorithm based on iterative aggregation-disaggregation (IAD) methods with Block Jacobi smoothing. Experiments on real web graphs show that DPC outperforms LPR-Ref-2 [Wang, VLDB'04] and converges 5 to 7 times faster than the power method.

Future Work: implement DPC in a real distributed system; integrate it with the Compass search engine confederation; find out how to update PageRank vectors efficiently within the DPC framework.

Thank you!

General PageRank Algorithm

IAD Method - Notations: aggregation matrix (n × N) and disaggregation matrix (N × n), where N is the number of pages and n the number of blocks.
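A small sketch of the two matrices, assuming the column-vector convention: R (n × N) maps a global vector to block totals, and S (N × n) spreads block totals back over pages using within-block weights. With M the column-stochastic PageRank matrix, the aggregated n × n matrix would then be R M S. These definitions are one common choice, not necessarily the slide's; the helper names are hypothetical.

```python
def aggregation_matrix(blocks, pages):
    """R (n x N): R[I][i] = 1 if page i belongs to block I, else 0."""
    return [[1.0 if p in members else 0.0 for p in pages] for members in blocks]


def disaggregation_matrix(blocks, pages, weights):
    """S (N x n): S[i][I] = w_i if page i belongs to block I, else 0.

    weights: {page: w}, with the weights inside each block summing to 1.
    """
    return [[weights[p] if p in members else 0.0 for members in blocks] for p in pages]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


blocks = [["a1", "a2"], ["b1"]]  # n = 2 blocks
pages = ["a1", "a2", "b1"]       # N = 3 pages
R = aggregation_matrix(blocks, pages)
S = disaggregation_matrix(blocks, pages, {"a1": 0.25, "a2": 0.75, "b1": 1.0})
x = [[0.125], [0.375], [0.5]]    # a global vector as an N x 1 column
xi = matmul(R, x)                # aggregation: block totals, n x 1
print(matmul(R, S))              # -> [[1.0, 0.0], [0.0, 1.0]]
print(matmul(S, xi))             # -> [[0.125], [0.375], [0.5]]
```

R S is the n × n identity because each block's weights sum to 1, and S R x recovers x exactly when the weights equal x's within-block proportions, as they do here.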

IAD Method

DPC Algorithm

DPC Algorithm (cont.)

DPC Algorithm (cont.)

DPC - Convergence Analysis: The global convergence of IAD methods is still an open problem, partly because the disaggregation step is nonlinear. The paper proves the global convergence of the Block Jacobi method in the PageRank setting when n > 2.

Experiments - Basic Facts [figures: distribution of site sizes; distribution of the number of pages hosted by sites of different sizes]

Experiments - Communication Overhead [table columns: Power; LPR-Ref-2 / DPC]. Pos(·) is the number of positive elements; L and U are the block strictly lower and upper triangular parts of P.
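The overhead comparison is expressed through Pos(L) and Pos(U). A sketch of how these counts could be obtained, assuming P has a positive entry (i, j) exactly when page i links to page j (the raw link matrix, before the dangling-page and teleport corrections, which would make P dense). Under a chosen block ordering, the two counts are the links that cross block boundaries in each direction. `block_triangular_counts` is a hypothetical helper.

```python
def block_triangular_counts(out_links, owner, order):
    """Return (Pos(L), Pos(U)) for the link matrix P under a block ordering.

    L holds entries whose row block comes after their column block in `order`
    (block strictly lower triangular part); U holds entries whose row block
    comes before it (block strictly upper triangular part).
    """
    rank = {b: k for k, b in enumerate(order)}
    lower = upper = 0
    for i, targets in out_links.items():
        for j in set(targets):  # repeated links give a single positive entry
            bi, bj = rank[owner[i]], rank[owner[j]]
            if bi > bj:
                lower += 1
            elif bi < bj:
                upper += 1
    return lower, upper


out_links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["a1", "b1"], "b2": []}
owner = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(block_triangular_counts(out_links, owner, ["A", "B"]))  # -> (1, 1)
```

Pos(L) + Pos(U) is the number of distinct links between different blocks; links inside a block never show up in either count.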