
1 Iterative Aggregation Disaggregation
Nicole Typaldos Missouri State University

2 Process of Webpage ranking
Web → Graph → Matrix → PageRank vector. Here is a brief overview of how web pages are ranked. We begin with the World Wide Web, and from there we get each web page. From the web we build a directed graph, denoted Γ(H); each web page is a node, and the graph maps out the directed links between web pages. From the directed graph we form the link matrix, denoted H, which has a row and a corresponding column for each node in the graph. Finally we reach the dominant eigenvector of a modified link matrix H; this PageRank vector, denoted v, contains the ranks of the web pages. The PageRank vector is then applied to the web, and the web pages are ranked for queries. The following slides show how this process is achieved and the conditions necessary for it to work.

3 Google’s page ranking algorithm
Google’s pagerank vector released to the public in 2004. We first need to assign a way of measuring importance of a webpage, this is achieved by the importance being proportional to the number of WebPages with inlinks to that page. The sum of the ranks must be scaled so that one page does not have more influence on another page without justification. Scaling allows fairness. What this means: here we have a iterative process where the rank of Page j is equal to the summation of the rank of page i divided by the number of outlinks page i has. Where page i inlinks to page j. this is to find the rank of page j at iterative step k and it depends on the ranks of all pages i at step k-1. We make the pagerank vector. Then this can be equated to the power method where vk^T=vk-1^T*H where H is the link matix discussed before. Now I will talk about the conditions fo allow this to really work Google uses an iterative process to avoid self-referencing

4 Conditioning the matrix H
Definitions.
Reducible: a matrix A is reducible if there exist a permutation matrix P (n×n) and an integer 1 ≤ r ≤ n−1 such that P^T A P = [X Y; 0 Z], where X is r×r and Z is (n−r)×(n−r). Otherwise, if a matrix is not reducible, it is irreducible.
Primitive: a nonnegative matrix A is primitive if and only if A^k > 0 for some k = 1, 2, 3, …
Primitive ⟹ |λ1| > |λ2| ⟹ the power method converges. Irreducible ⟹ unique dominant eigenvector.
Here we have two definitions: what it means for a matrix to be reducible and what it means to be primitive. It is necessary for H both to have a single dominant eigenvalue of maximum magnitude and to be irreducible. With these conditions forced onto H, not only does the power method converge, so that a dominant eigenvector exists, but the eigenvector is also unique. The uniqueness of the dominant eigenvector is in the sense that the one-norm of v is one. (After coming back to this slide: we now have an irreducible matrix that, almost by accident, we also made primitive; therefore the power method converges and there exists a single dominant eigenvalue of maximum magnitude, by the Perron–Frobenius theorem.)

5 Example Set Up Here is a simple example that will be used throughout the talk. The web contains only six web pages, i.e., six nodes. The figure on the left is a directed graph. It can be read as "node 5 outlinks to node 1 and to node 6," while node 5 has an inlink from node 6. Node 4 has only inlinks and no outlinks. The figure on the right is the link matrix H. Because node 5 has two outlinks, one to node 1 and one to node 6, row 5 of H has entries of 1/2 in columns 1 and 6. Because node 4 has no outlinks, row 4 is a row of zeros. Node 4 is therefore a dangling node and keeps H from being stochastic; this will need to be remedied.

6 Example Continued Because H is not stochastic in our example, we have to add a matrix to make it stochastic. One thing to keep in mind is that we do not want to change anything more than necessary, so that we preserve the democracy of inlinks/outlinks of the web. So we use the sparse vector a, with entry 1 in position i if row i is a dangling node and entry 0 otherwise, and multiply it by an arbitrary probabilistic vector u^T; adding the rank-one matrix a u^T yields a stochastic matrix.

7 Example Continued Here B = H + a u^T denotes the newly modified H, which is now stochastic. Notice that the only entries that changed are those in the rows of the dangling node. B is stochastic, with each row summing to one.

8 The Google Matrix G = αB + (1 − α)E > 0, where
e is a vector of ones, u^T is an arbitrary probabilistic vector, and a is the vector for correcting dangling nodes. A problem with the matrix B is that it is not necessarily irreducible. We define the matrix G to be a convex combination of B and E, where E = e u^T; here u^T is commonly taken to be the uniform vector with entries 1/n. G, called the Google matrix, is still stochastic yet now is irreducible. When α = 1 we have only the matrix B, so we have gained nothing, and B may still not be irreducible. When α = 0 we have only the matrix E, which is no representation of the original matrix H. So we need an α strictly between 0 and 1. In 2004, the α Google claimed was between the values of 0.8 and 0.85.

9 Example Continued Here we have the Google matrix for the example. Notice that G > 0; therefore, by the definition on the conditioning slide, G is primitive with k = 1. A sketch assembling the whole construction for the example appears below.
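To make the construction concrete, here is a hedged numpy sketch of the example. Only the links actually stated on these slides (5 → 1, 5 → 6, 6 → 5, and node 4 dangling) come from the figure; the remaining rows of H are hypothetical placeholders.

```python
import numpy as np

n = 6
H = np.zeros((n, n))
H[0, [1, 2]] = 1/2        # hypothetical: node 1 -> nodes 2 and 3
H[1, 3] = 1.0             # hypothetical: node 2 -> node 4
H[2, [0, 3, 4]] = 1/3     # hypothetical: node 3 -> nodes 1, 4, 5
# row 4 stays all zeros: node 4 is the dangling node (from the slide)
H[4, [0, 5]] = 1/2        # from the slide: node 5 -> nodes 1 and 6
H[5, 4] = 1.0             # from the slide: node 6 -> node 5

a = (H.sum(axis=1) == 0).astype(float)   # 1 exactly on dangling rows
u = np.full(n, 1 / n)                    # uniform probabilistic vector

B = H + np.outer(a, u)                   # fix dangling rows: B is stochastic
alpha = 0.85
G = alpha * B + (1 - alpha) * np.outer(np.ones(n), u)   # Google matrix

assert np.allclose(B.sum(axis=1), 1)     # each row of B sums to one
assert (G > 0).all()                     # G is primitive with k = 1
```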

10 Different Approaches Power Method; Linear Systems;
Iterative Aggregation Disaggregation (IAD). Now that we have met the conditions, we notice that G is no longer sparse, a quality that was useful for computation time and energy. The classical method, the power method from Google's algorithm, can be factored so that the original iterative step v_{k+1}^T = v_k^T G is written in terms of H, the original link matrix, which is still sparse; a sketch follows below. For linear systems we have several algorithms for solving them. IAD is a newer method.
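As a sketch of that factoring: using G = α(H + a u^T) + (1 − α) e u^T and v^T e = 1, one power step becomes v^T G = α v^T H + (α v^T a + 1 − α) u^T, which never forms the dense G. A minimal version in code (dense H here for brevity; in practice H would be stored sparse):

```python
import numpy as np

def sparse_power_step(v, H, a, u, alpha=0.85):
    """One step v^T G = alpha v^T H + (alpha v^T a + 1 - alpha) u^T."""
    gamma = alpha * (v @ a) + (1 - alpha)   # scalar dangling/teleport correction
    return alpha * (v @ H) + gamma * u      # only products with the sparse H
```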

11 Linear Systems and Dangling Nodes
Simplify computation by arranging the dangling nodes of H in the lower rows. Rewrite H, after reordering the dangling nodes, as H = [H11 H12; 0 0], where H11 is a square matrix that represents links from nondangling nodes to nondangling nodes, and H12 is a matrix representing links from nondangling nodes to dangling nodes. Linear systems, along with the other methods, can be simplified by rearranging the dangling nodes of H. We use this method to reduce computation cost in both time and energy. The reordering of the dangling nodes makes the vector a have all its zero entries first, followed by all its one entries. A small sketch of the reordering follows.
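A short sketch of the reordering step, assuming dangling nodes are detected as zero rows of H; the helper name reorder_dangling is illustrative.

```python
import numpy as np

def reorder_dangling(H):
    """Symmetrically permute H so dangling (zero) rows come last."""
    dangling = H.sum(axis=1) == 0
    order = np.concatenate([np.flatnonzero(~dangling),
                            np.flatnonzero(dangling)])
    Hp = H[np.ix_(order, order)]    # same web, rows/columns relabeled
    n1 = int((~dangling).sum())     # number of nondangling nodes
    return Hp[:n1, :n1], Hp[:n1, n1:], order   # H11, H12, permutation
```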

12 Rearranging H The size of the original system is initially that of H, n × n; with the rearranging by dangling nodes, the size of the system that must actually be solved is that of H11, n1 × n1. Partitioning x^T = (x1^T, x2^T) and u^T = (u1^T, u2^T), the linear system x^T (I − αH) = u^T splits into
x1^T (I − α H11) = u1^T and x2^T = u2^T + α x1^T H12.
We solve for x1 in the first equation, then plug into the second equation to solve for x2; a sketch follows below.
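A hedged sketch of that two-stage solve, assuming the PageRank linear system x^T (I − αH) = u^T with the reordered blocks; the final normalization recovers a probabilistic rank vector.

```python
import numpy as np

def solve_by_blocks(H11, H12, u1, u2, alpha=0.85):
    n1 = H11.shape[0]
    # first block equation: x1^T (I - alpha H11) = u1^T
    x1 = np.linalg.solve((np.eye(n1) - alpha * H11).T, u1)
    # second block equation: x2^T = u2^T + alpha x1^T H12
    x2 = u2 + alpha * (x1 @ H12)
    x = np.concatenate([x1, x2])
    return x / x.sum()              # adjust: normalize to sum to one
```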

13 Exact aggregation disaggregation
Theorem. Let G be the transition matrix of an irreducible Markov chain, partitioned as G = [G11 G12; G21 G22], with stochastic complement S = G22 + G21 (I − G11)^{−1} G12. If u^T is the stationary distribution of S, and ξ^T = (ξ1, …, ξ_{n1}, ξ_{n1+1}) is the stationary distribution of the aggregated matrix A = [G11, G12 e; u^T G21, u^T G22 e], then the stationary distribution of G is given by v^T = (ξ1, …, ξ_{n1}, ξ_{n1+1} u^T). Here we have the Google matrix partitioned and the predefined matrix A, which is the aggregated matrix, and we are solving for the dominant left eigenvector v, which is also partitioned. The dominant left eigenvector of a matrix is the equivalent of the stationary distribution of a Markov chain. So instead of computing the stationary distribution of G, we find the stationary distribution of A and the stationary distribution of S, and then we combine the two together. A computational sketch follows.
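Here is the theorem as computation, in a hedged dense sketch: the stationary() eigen-solve helper and the partition index n1 are illustrative, and nothing here is sized for web-scale matrices.

```python
import numpy as np

def stationary(M):
    """Dominant left eigenvector of a stochastic matrix, scaled to sum to 1."""
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    return v / v.sum()

def exact_aggregation(G, n1):
    G11, G12 = G[:n1, :n1], G[:n1, n1:]
    G21, G22 = G[n1:, :n1], G[n1:, n1:]
    # stochastic complement S = G22 + G21 (I - G11)^{-1} G12
    S = G22 + G21 @ np.linalg.solve(np.eye(n1) - G11, G12)
    u = stationary(S)
    e = np.ones(G.shape[0] - n1)
    # aggregated matrix A, size (n1 + 1) x (n1 + 1)
    A = np.block([[G11,                (G12 @ e)[:, None]],
                  [(u @ G21)[None, :], np.array([[u @ G22 @ e]])]])
    xi = stationary(A)
    # disaggregate: v^T = (xi_1, ..., xi_n1, xi_{n1+1} u^T)
    return np.concatenate([xi[:n1], xi[n1] * u])
```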

14 Approximate aggregation disaggregation
Problem: computing S, and hence its stationary distribution u^T, is too difficult and too expensive. So we use Ã = [G11, G12 e; ũ^T G21, ũ^T G22 e], where A and Ã differ only by one row: the exact u^T is replaced by an initial guess ũ^T. Because S is too difficult to compute, and u is too difficult to solve for because of the inverse inside S, instead of solving for u we use the approximate aggregation matrix Ã and make an initial guess for u, denoted ũ. Since Ã is stochastic, its last row can be simplified to include z^T: with z^T = ũ^T G21, the last row is (z^T, 1 − z^T e).

15 Approximate aggregation disaggregation
Algorithm:
Select an arbitrary probabilistic vector ũ^T and a tolerance ε.
For k = 1, 2, …
Find the stationary distribution ξ^T = (ξ1, …, ξ_{n1}, ξ_{n1+1}) of Ã.
Set x_k^T = (ξ1, …, ξ_{n1}, ξ_{n1+1} ũ^T).
Let v_k^T = x_k^T G (one power step).
If ||v_k^T − x_k^T||_1 < ε, then stop.
Otherwise, set ũ^T to the normalized trailing part of v_k^T and continue.
Here is the algorithm for approximate aggregation disaggregation. It is an iterative process that also includes the power method; a sketch appears below.
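A minimal sketch of this loop, under the same assumptions as the exact-aggregation sketch (dense blocks, naive eigen-solve for the small matrix Ã):

```python
import numpy as np

def stationary(M):
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmax(np.real(w))])
    return v / v.sum()

def iad(G, n1, eps=1e-10, max_iter=100):
    n2 = G.shape[0] - n1
    u_t = np.full(n2, 1.0 / n2)              # arbitrary probabilistic guess
    G11, G12 = G[:n1, :n1], G[:n1, n1:]
    G21, G22 = G[n1:, :n1], G[n1:, n1:]
    e2 = np.ones(n2)
    for _ in range(max_iter):
        # approximate aggregated matrix A-tilde built from the guess
        A_t = np.block([[G11,                  (G12 @ e2)[:, None]],
                        [(u_t @ G21)[None, :], np.array([[u_t @ G22 @ e2]])]])
        xi = stationary(A_t)
        x = np.concatenate([xi[:n1], xi[n1] * u_t])   # disaggregate
        v = x @ G                                     # one power step
        if np.abs(v - x).sum() < eps:
            return v
        u_t = v[n1:] / v[n1:].sum()                   # update the guess
    return v
```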

16 Combined Methods How to carry out the computation inside Iterative Aggregation Disaggregation
combined with: the Power Method; Linear Systems.

17 With Power Method The stationary distribution of Ã can itself be found with the power method, ξ_{k+1}^T = ξ_k^T Ã. The drawback: Ã is a full matrix, so each step ξ_k^T Ã is expensive.

18 With Power Method Try to exploit the sparsity of H when solving ξ^T = ξ^T Ã.
Exploiting dangling nodes: using G = α(H + a u^T) + (1 − α) e u^T, every product with G in the iteration can be computed from the sparse H alone, since v^T G = α v^T H + (α v^T a + 1 − α) u^T for a probabilistic vector v^T, so the full matrix never has to be formed.

19 With Power Method Continuing to exploit the sparsity of H when solving ξ^T = ξ^T Ã.
Exploiting dangling nodes: with H = [H11 H12; 0 0] and v^T = (v1^T, v2^T), we have v^T H = (v1^T H11, v1^T H12) and v^T a = v2^T e, so each step reduces to multiplications by the sparse blocks H11 and H12 plus a scalar correction γ = α v2^T e + 1 − α times u^T.

20 With Linear Systems Start from ξ^T = ξ^T Ã. After multiplication, write it as the linear system ξ1^T (I − G11) = ξ_{n1+1} z^T, using the last row (z^T, 1 − z^T e) of Ã. Since ξ_{n1+1} is unknown, make it arbitrary, say ξ_{n1+1} = 1, solve, and then adjust by normalizing so the entries of ξ^T sum to one.

21 With Linear Systems Algorithm (dangling nodes):
Give an initial guess ũ^T and a tolerance ε.
Repeat until the change in the iterates is below ε:
Solve x^T (I − G11) = z^T, where z^T = ũ^T G21, with the last component of ξ^T set arbitrarily to 1.
Adjust: normalize, disaggregate, take one power step, and update the guess ũ^T.
A sketch appears below.
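A hedged sketch combining the last two slides: the stationary vector of Ã is obtained from the linear system ξ1^T (I − G11) = z^T with ξ_{n1+1} set to 1 and fixed afterwards by normalization. Dense blocks for brevity; all names are illustrative.

```python
import numpy as np

def iad_linear(G, n1, eps=1e-10, max_iter=100):
    n = G.shape[0]
    G11, G21 = G[:n1, :n1], G[n1:, :n1]
    u_t = np.full(n - n1, 1.0 / (n - n1))   # initial probabilistic guess
    v = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        z = u_t @ G21
        # Solve xi_1^T (I - G11) = z^T with xi_{n1+1} set to 1
        x1 = np.linalg.solve((np.eye(n1) - G11).T, z)
        # Adjust: append the arbitrary 1, then normalize to sum to one
        xi = np.concatenate([x1, [1.0]]) / (1.0 + x1.sum())
        # disaggregate and take one power step, as in the IAD algorithm
        w = np.concatenate([xi[:n1], xi[n1] * u_t]) @ G
        if np.abs(w - v).sum() < eps:
            return w
        v = w
        u_t = w[n1:] / w[n1:].sum()         # update the guess
    return v
```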

22 References
Berry, Michael W., and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2005.
Langville, Amy N., and Carl D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings. Princeton, NJ: Princeton University Press, 2006.
Langville, Amy N., and Carl D. Meyer. "Updating Markov Chains with an Eye on Google's PageRank." SIAM Journal on Matrix Analysis and Applications (2006).
Rebaza, Jorge. "Ranking Web Pages." Mth 580 Notes (2008).

23 Iterative Aggregation Disaggregation
Nicole Typaldos Missouri State University

