Iterative Aggregation Disaggregation


Iterative Aggregation Disaggregation
Nicole Typaldos, Missouri State University

Process of Webpage Ranking: Web → Graph → Matrix → PageRank Vector

Here is a brief overview of how webpages are ranked. We begin with the world wide web and the pages it contains. From the web we build a directed graph, denoted Γ(H), with one node per webpage and edges mapping the links between pages. From the directed graph we form the link matrix, denoted H, which has a row and a corresponding column for each node in the graph. Finally we compute the dominant eigenvector of a modified link matrix H; this PageRank vector, denoted v, contains the rank of each webpage. The PageRank vector is then applied to the web, and the pages are ranked for queries. The following slides show how this process is carried out and the conditions necessary for it to work.

Google's Page Ranking Algorithm

Details of Google's PageRank vector were released to the public in 2004. We first need a way of measuring the importance of a webpage; a page's importance is taken to be proportional to the number of webpages with inlinks to that page. The ranks must be scaled so that one page does not have more influence on another page without justification; scaling keeps the process fair. What this means: we have an iterative process in which the rank of page j at step k is the sum, over all pages i that inlink to page j, of the rank of page i at step k-1 divided by the number of outlinks of page i:

r_j^(k) = Σ_{i: i→j} r_i^(k-1) / |O_i|

Collecting the ranks into the PageRank vector v_k, this iteration is exactly the power method, v_k^T = v_{k-1}^T H, where H is the link matrix discussed before. Google uses an iterative process to avoid self-referencing. Next we turn to the conditions that allow this to really work.
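The iteration above can be sketched in a few lines. This is a minimal illustration on a small hypothetical stochastic matrix M standing in for the link matrix; as the next slides explain, the raw H needs further corrections before convergence is guaranteed.

```python
# Sketch of the power method v_k^T = v_{k-1}^T M for ranking.
# M is a small hypothetical stochastic matrix; the raw link matrix H
# needs the corrections from the next slides before this iteration
# is guaranteed to converge.
import numpy as np

M = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])        # each row sums to 1

v = np.full(3, 1.0 / 3)                # start from the uniform vector
for _ in range(100):
    v = v @ M                          # one step: v_k^T = v_{k-1}^T M

print(v)                               # ranks: the dominant left eigenvector
print(np.allclose(v @ M, v))           # True: converged, v^T M = v^T
```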

Conditioning the Matrix H

Definitions. Reducible: an n×n matrix A is reducible if there exist a permutation matrix P (n×n) and an integer 1 ≤ r ≤ n-1 such that

P^T A P = [ X  Y ]
          [ 0  Z ]

where X is r×r and Z is (n-r)×(n-r); otherwise the matrix is irreducible. Primitive: a nonnegative matrix A is primitive if and only if A^k > 0 for some k = 1, 2, 3, …

Primitive ⇒ |λ1| > |λ2| ⇒ the power method converges. Irreducible ⇒ unique dominant eigenvector.

It is necessary that H have a single dominant eigenvalue of maximum magnitude and that it be irreducible. With these conditions forced onto H, not only does the power method converge, so the eigenvector exists, but the eigenvector is also unique, by the Perron-Frobenius theorem. The uniqueness of the dominant eigenvector is in the sense that the one-norm of v is one. Looking ahead: the matrix we will construct is irreducible and, as a byproduct, primitive, so the power method converges and there is a single dominant eigenvalue of maximum magnitude.

Example Set-Up

Here is a simple example that will be used throughout the talk: a web containing only six pages, i.e. six nodes. The figure on the left is the directed graph. It can be read as "node 5 outlinks to node 1 and to node 6," while node 5 has an inlink from node 6; node 4 has only inlinks and no outlinks. The figure on the right is the link matrix H. Because node 5 has two outlinks, one to node 1 and one to node 6, row 5 of H has entries of 1/2 in columns 1 and 6. Because node 4 has no outlinks, row 4 is a row of zeros. Node 4 is therefore a dangling node, and it keeps H from being stochastic; this will need to be remedied.
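The construction of H from a directed graph can be sketched as follows. Only the links actually stated on this slide are certain (5 → 1, 5 → 6, 6 → 5, and node 4 dangling); the remaining entries of the `links` table are hypothetical placeholders, since the full figure is not reproduced here.

```python
# Sketch: building the link matrix H from a directed graph.
# Only the links marked "from the slide" are certain; the others
# are hypothetical placeholders for illustration.
import numpy as np

links = {            # node -> list of nodes it links to (1-indexed)
    1: [2, 3],       # hypothetical
    2: [4],          # hypothetical
    3: [1, 4, 5],    # hypothetical
    4: [],           # dangling node (from the slide)
    5: [1, 6],       # from the slide
    6: [5],          # from the slide
}

n = len(links)
H = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        H[i - 1, j - 1] = 1.0 / len(outs)   # each outlink gets equal weight

print(H[4])          # row 5: entries of 1/2 in columns 1 and 6
print(H[3].sum())    # row 4 sums to 0, so H is not stochastic
```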

Example Continued

Because H is not stochastic in our example, we add a matrix to make it stochastic. One thing to keep in mind is that we do not want to change anything more than necessary, so that we preserve the democracy of inlinks and outlinks of the web. We use the sparse vector a, with a 1 in entry i if row i corresponds to a dangling node and a 0 otherwise, and multiply it by an arbitrary probabilistic vector u^T; adding the rank-one matrix a u^T to H yields a stochastic matrix.
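The dangling-node fix described above, B = H + a u^T, can be sketched on a small hypothetical 3-page example (the 6-page matrix from the slides is not reproduced here):

```python
# Sketch of the dangling-node fix B = H + a u^T on a hypothetical
# 3-page link matrix with page 3 dangling.
import numpy as np

H = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])            # row 3 is a dangling node

a = (H.sum(axis=1) == 0).astype(float)     # 1 where a row is all zeros
u = np.full(3, 1.0 / 3)                    # uniform probabilistic vector

B = H + np.outer(a, u)                     # only dangling rows change
print(B.sum(axis=1))                       # every row now sums to 1
```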

Example Continued

Here B denotes the newly modified H, which is now stochastic: B = H + a u^T. Notice that the only entries changed are those in the rows of the dangling nodes. B is stochastic, with each row summing to one.

The Google Matrix: G = αB + (1-α)E > 0

where E = e u^T, e is a vector of ones, u is an arbitrary probabilistic vector, and a is the vector correcting dangling nodes.

A problem with the matrix B is that it is not necessarily irreducible. We define the matrix G as a convex combination of B and E, where E = e u^T and u^T is taken to be the uniform vector with entries 1/n. G, called the Google matrix, is therefore still stochastic, yet now it is also irreducible. When α = 1 we have only the matrix B and have gained nothing, since B may still be reducible; when α = 0 we have only the matrix E, which is not a representation of the original matrix H. So we need an α strictly between 0 and 1. In 2004, Google claimed a value of α between 0.8 and 0.85.
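Forming G = αB + (1-α)E and checking its properties, again on a small hypothetical stochastic B:

```python
# Sketch of the Google matrix G = alpha*B + (1 - alpha)*e u^T for a
# small hypothetical stochastic B (dangling row already fixed).
import numpy as np

B = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [1/3, 1/3, 1/3]])      # stochastic

n = B.shape[0]
alpha = 0.85                         # damping factor in Google's reported range
u = np.full(n, 1.0 / n)
E = np.outer(np.ones(n), u)          # E = e u^T: every row equals u^T

G = alpha * B + (1 - alpha) * E
print((G > 0).all())                 # True: G is positive, hence primitive (k = 1)
print(G.sum(axis=1))                 # each row sums to 1: G is stochastic
```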

Example Continued

Here is the Google matrix for our example. Notice that G > 0; therefore, by the definition above, G is primitive with k = 1.

Different Approaches

Power method; linear systems; iterative aggregation disaggregation (IAD).

Now that we have met the conditions, notice that G is no longer sparse, a quality that was useful for computation time and energy. The classical iterative method, the power method from Google's algorithm, can be factored so that the original step v_{k+1}^T = v_k^T G is written in terms of H, the original link matrix, which is still sparse. For linear systems we have several algorithms available. IAD is a newer method.
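The factorization mentioned here can be checked numerically: for any v summing to one, v^T G = α v^T H + (α v^T a + 1 - α) u^T, so each power step touches only the sparse H. A sketch on a hypothetical 3-page example:

```python
# The dense step v^T G factored so only the sparse H is touched:
#   v^T G = alpha * v^T H + (alpha * (v^T a) + 1 - alpha) * u^T,
# valid whenever v sums to 1.
import numpy as np

H = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])      # hypothetical link matrix, row 3 dangling
n = H.shape[0]
alpha, u = 0.85, np.full(n, 1.0 / n)
a = (H.sum(axis=1) == 0).astype(float)

G = alpha * (H + np.outer(a, u)) + (1 - alpha) * np.outer(np.ones(n), u)

v = np.full(n, 1.0 / n)                       # any probabilistic vector
dense = v @ G                                 # the dense step
sparse = alpha * (v @ H) + (alpha * (v @ a) + 1 - alpha) * u
print(np.allclose(dense, sparse))             # True
```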

Linear Systems and Dangling Nodes

Simplify computation by arranging the dangling nodes of H into the lower rows. Rewriting H with the dangling nodes reordered gives the block form

H = [ H11  H12 ]
    [  0    0  ]

where H11 is a square matrix representing links from nondangling nodes to nondangling nodes, and H12 represents links from nondangling nodes to dangling nodes. Linear systems, along with the other methods, are simplified by this rearrangement; we use it to reduce computation cost in both time and energy. The reordering also makes the vector a consist of all its zero entries followed by all its one entries.

Rearranging H

The original system has the full size of H; after rearranging by dangling nodes, the system that must actually be solved has only the size of H11. Writing the solution in partitioned form as (x1, x2), we solve for x1 in the first block equation, then substitute it into the second equation to solve for x2.
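A sketch of this two-stage solve, assuming the standard linear-system formulation of PageRank, y^T(I - αH) = u^T with y then normalized (the slide's exact equations are not reproduced, so this formulation and the example graph are assumptions). With dangling rows last, the first block equation involves only H11 and the second is back-substitution:

```python
# Block solve for PageRank, assuming y^T (I - alpha*H) = u^T, y normalized.
# With dangling rows last, H = [[H11, H12], [0, 0]], so
#   y1^T (I - alpha*H11) = u1^T            (small system, size of H11)
#   y2^T = u2^T + alpha * y1^T H12         (back-substitution)
import numpy as np

# Hypothetical example: 3 nondangling pages, 1 dangling page (last row).
H11 = np.array([[0.0, 0.5, 0.0],
                [0.5, 0.0, 0.5],
                [1.0, 0.0, 0.0]])
H12 = np.array([[0.5], [0.0], [0.0]])     # links into the dangling node
alpha = 0.85
n1 = H11.shape[0]
u1, u2 = np.full(n1, 0.25), np.full(1, 0.25)   # uniform u over all 4 pages

y1 = np.linalg.solve(np.eye(n1) - alpha * H11.T, u1)  # y1^T (I - a*H11) = u1^T
y2 = u2 + alpha * (y1 @ H12)
pi = np.concatenate([y1, y2])
pi /= pi.sum()                            # normalize: the PageRank vector
print(pi)
```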

Exact Aggregation Disaggregation

Theorem. Let G be the transition matrix of an irreducible Markov chain, partitioned as

G = [ G11  G12 ]
    [ G21  G22 ]

with stochastic complement S = G22 + G21 (I - G11)^{-1} G12. If s^T is the stationary distribution of S, and (σ1^T, σ2) is the stationary distribution of the aggregated matrix

A = [ G11       G12 e ]
    [ s^T G21  s^T G22 e ]

then the stationary distribution of G is given by v^T = (σ1^T, σ2 s^T).

Here we have the Google matrix partitioned, along with the predefined aggregated matrix A, and we solve for the dominant left eigenvector v, which is also partitioned. The dominant left eigenvector of a matrix is the equivalent of the stationary distribution of a Markov chain. So instead of computing the stationary distribution of G directly, we find the stationary distribution of A and the stationary distribution of S, and then combine the two.

Approximate Aggregation Disaggregation

Problem: computing S and its stationary distribution s^T is too difficult and too expensive, because of the inverse inside S. So instead of solving for s^T we make an initial guess, denoted s̃^T, and use the approximate aggregated matrix Ã, where A and Ã differ only by one row:

Ã = [ G11       G12 e ]
    [ s̃^T G21  s̃^T G22 e ]

Ã can be simplified and rewritten to include z^T.

Approximate Aggregation Disaggregation Algorithm

Select an arbitrary probabilistic vector s̃^T and a tolerance ε.
For k = 1, 2, …
  Find the stationary distribution (σ1^T, σ2) of Ã.
  Set v_k^T = (σ1^T, σ2 s̃^T).
  Let ṽ^T = v_k^T G.
  If ||ṽ^T - v_k^T|| < ε, then stop.
  Otherwise, take the new guess s̃^T from the renormalized tail of ṽ^T and repeat.

This is the algorithm for approximate aggregation disaggregation: an iterative process that also includes a power-method step.
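A sketch of the algorithm above on a small random positive stochastic matrix G. The aggregation/disaggregation steps follow the outline on this slide; the concrete choices (uniform initial guess, 1-norm stopping test) are assumptions.

```python
# Approximate IAD sketch: aggregate with the current guess s_t,
# disaggregate, smooth with one power-method step, update the guess.
import numpy as np

def stationary(P, iters=2000):
    """Stationary distribution of a positive stochastic P (power iteration)."""
    v = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        v = v @ P
    return v / v.sum()

rng = np.random.default_rng(1)
G = rng.random((6, 6)) + 0.01
G /= G.sum(axis=1, keepdims=True)          # positive stochastic matrix

n1 = 3                                     # states kept disaggregated
G11, G12 = G[:n1, :n1], G[:n1, n1:]
G21, G22 = G[n1:, :n1], G[n1:, n1:]
e2 = np.ones(G.shape[0] - n1)

s_t = np.full(G.shape[0] - n1, 1.0 / (G.shape[0] - n1))  # initial guess
tol = 1e-12
for k in range(300):
    A = np.block([[G11,                  (G12 @ e2)[:, None]],
                  [(s_t @ G21)[None, :], np.array([[s_t @ G22 @ e2]])]])
    sigma = stationary(A)
    v = np.concatenate([sigma[:n1], sigma[n1] * s_t])    # disaggregate
    w = v @ G                                            # power-method step
    if np.abs(w - v).sum() < tol:                        # converged?
        break
    s_t = w[n1:] / w[n1:].sum()                          # new guess from tail

pi = w / w.sum()
print(np.abs(pi @ G - pi).sum())   # residual near zero: pi is stationary
```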

Combined Methods

How to compute the IAD iteration efficiently: iterative aggregation disaggregation combined with the power method, and combined with linear systems.

With Power Method

The stationary distribution of Ã is needed at every IAD step, but Ã is a full matrix, so working with it directly is expensive.

With Power Method

Try to exploit the sparsity of H when solving the system involving Ã, and exploit the dangling nodes.

With Linear Systems

Start from the equation involving Ã; after multiplication, it can be written as a linear system. Since one of the quantities in the system is unknown, make it arbitrary and then adjust.

With Linear Systems: Algorithm (dangling nodes)

Give an initial guess and a tolerance ε.
Repeat until the tolerance is met:
  Solve the linear system.
  Adjust the guess.

References

Berry, Michael W., and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2005.

Langville, Amy N., and Carl D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press, 2006.

Langville, Amy N., and Carl D. Meyer. "Updating Markov Chains with an Eye on Google's PageRank." SIAM Journal on Matrix Analysis and Applications (2006): 968-987.

Rebaza, Jorge. "Ranking Web Pages." Mth 580 Notes (2008): 97-153.
