Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins.

SALSA
Created by Lempel and Moran in 2000.
Combines ideas from HITS and PageRank.

SALSA’s similarities to HITS and PageRank
SALSA uses authority and hub scores.
SALSA builds a neighborhood graph from authority pages, hub pages, and the links between them.

SALSA’s differences from HITS and PageRank
The SALSA method creates a bipartite graph from the authority and hub pages in the neighborhood graph.
One set contains the hub pages.
One set contains the authority pages.
A page may appear in both sets.

Neighborhood Graph N

Bipartite Graph G of Neighborhood Graph N

Markov Chains
Two matrices are formed from the bipartite graph G:
A hub Markov chain with matrix H.
An authority Markov chain with matrix A.

Where does SALSA fit in?
Matrices H and A can be derived from the adjacency matrix L used in the HITS and PageRank methods.
HITS uses the unweighted matrix L.
PageRank uses a row-weighted version of L.
SALSA uses both row and column weighting.

How are H and A computed?
Let $L_r$ be L with each nonzero row divided by its row sum.
Let $L_c$ be L with each nonzero column divided by its column sum.

H, SALSA’s hub matrix, consists of the nonzero rows and columns of $L_r L_c^T$.
A, SALSA’s authority matrix, consists of the nonzero rows and columns of $L_c^T L_r$.
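The row and column weighting can be sketched in a few lines of plain Python. This is only an illustration: the 4-page adjacency matrix below is hypothetical (it is not the graph from the slides), and the helper names `row_normalize`, `col_normalize`, `matmul`, and `drop_zero_rows_cols` are made up for this sketch.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def row_normalize(M):
    """Divide each nonzero row by its row sum; leave zero rows alone."""
    out = []
    for row in M:
        s = sum(row)
        out.append([x / s for x in row] if s else list(row))
    return out

def col_normalize(M):
    """Divide each nonzero column by its column sum."""
    return transpose(row_normalize(transpose(M)))

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def drop_zero_rows_cols(M):
    """Keep only the indices whose row is nonzero (for these products the
    nonzero rows and nonzero columns share the same index set)."""
    keep = [i for i, row in enumerate(M) if any(row)]
    return keep, [[M[i][j] for j in keep] for i in keep]

# Hypothetical graph: L[i][j] = 1 if page i links to page j.
L = [
    [0, 1, 1, 0],  # page 0 links to pages 1 and 2
    [0, 0, 1, 0],  # page 1 links to page 2
    [0, 0, 0, 0],  # page 2 has no outlinks
    [1, 0, 0, 0],  # page 3 links to page 0
]

L_r = row_normalize(L)
L_c = col_normalize(L)

hub_pages, H = drop_zero_rows_cols(matmul(L_r, transpose(L_c)))   # L_r L_c^T
auth_pages, A = drop_zero_rows_cols(matmul(transpose(L_c), L_r))  # L_c^T L_r

print(hub_pages, H)
print(auth_pages, A)
```

In this toy graph the hub set is pages 0, 1, 3 and the authority set is pages 0, 1, 2. Each row of H and A sums to 1, which is what lets them be read as Markov chain transition matrices.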

Eigenvectors
Right eigenvector: $Av = \lambda v$
Left eigenvector: $v^T A = \lambda v^T$
Computed numerically with the power method.

The Power Method
$X_{k+1} = A X_k$
$X_{k+1}^T = X_k^T A$
Converges to the dominant eigenvector ($\lambda = 1$).
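A minimal sketch of the left-eigenvector form of the iteration, assuming a row-stochastic matrix that is irreducible and aperiodic. The 2-by-2 matrix `P` is a hypothetical example, and the function name `power_method`, the tolerance, and the iteration cap are choices made for this sketch only.

```python
def power_method(P, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T P from a uniform start, rescaling to sum to 1."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of X_{k+1}^T = X_k^T P
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical irreducible, row-stochastic matrix.
P = [
    [0.75, 0.25],
    [0.50, 0.50],
]

pi = power_method(P)
print(pi)  # close to [2/3, 1/3]
```

Solving $\pi^T P = \pi^T$ by hand for this matrix gives $\pi = (2/3, 1/3)$, so the iteration can be checked against a known answer.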

The Power Method
H and A must be irreducible for the power method to converge to a unique eigenvector from any starting vector.
If the bipartite graph G is connected, then both H and A are irreducible.
If G is not connected, the power method applied to H and A does not converge to a unique dominant eigenvector.

Our graph is not connected!
In our example the graph is clearly not connected: page 2 in the hub set is connected only to page 1 in the authority set, and vice versa.
H and A are therefore reducible, and each contains multiple irreducible connected components.

Connected Components
H contains two connected components, C = {2} and D = {1, 3, 6, 10}.
A contains two connected components, E = {1} and F = {3, 5, 6}.
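One way to find such components is a depth-first search over the nonzero pattern of the matrix. The sketch below assumes the pattern is symmetric (as it is for the SALSA products) and uses a hypothetical 3-by-3 hub matrix, not the H from the slides; `connected_components` is a name chosen for this illustration.

```python
def connected_components(M):
    """Group indices that are linked through nonzero entries of M."""
    n = len(M)
    seen = set()
    components = []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and (M[i][j] != 0 or M[j][i] != 0):
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return components

# Hypothetical reducible hub matrix: index 2 never interacts with 0 or 1.
H = [
    [0.75, 0.25, 0.0],
    [0.50, 0.50, 0.0],
    [0.00, 0.00, 1.0],
]

components = connected_components(H)
print(components)  # [[0, 1], [2]]
```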

Cutting and Pasting, Part I
We can now run the power method separately on each connected component of H and of A.

Cutting and Pasting, Part II
We can now paste the component vectors back together for each matrix.
Each entry in a component's vector must first be multiplied by that component's weight.
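The cut-and-paste step can be sketched as follows. The slides do not spell out the weight, so this sketch assumes each component is weighted by its share of the pages in the set (component size divided by total size), which is how the weighting is usually described for SALSA. The matrix, the component list, and the names `stationary` and `cut_and_paste` are all hypothetical.

```python
def stationary(P, tol=1e-12, max_iter=10_000):
    """Power method for x^T = x^T P, scaled so the entries sum to 1."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

def cut_and_paste(M, components):
    """Solve each component on its own, then combine the weighted results."""
    n = len(M)
    scores = [0.0] * n
    for comp in components:
        # Cut: extract the submatrix for this component.
        sub = [[M[i][j] for j in comp] for i in comp]
        local = stationary(sub)
        # Assumed weight: the component's share of all pages in the set.
        weight = len(comp) / n
        # Paste: write the weighted scores back into the full vector.
        for idx, value in zip(comp, local):
            scores[idx] = weight * value
    return scores

# Hypothetical reducible hub matrix with components {0, 1} and {2}.
H = [
    [0.75, 0.25, 0.0],
    [0.50, 0.50, 0.0],
    [0.00, 0.00, 1.0],
]

hub_scores = cut_and_paste(H, [[0, 1], [2]])
print(hub_scores)  # close to [4/9, 2/9, 1/3]
```

Under that weighting assumption the pasted vector still sums to 1, so it can be read as a single score distribution over all hub pages.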

Strengths and Weaknesses
Less affected by topic drift than HITS.
Gives both authority and hub scores.
Handles spamming better than HITS, but not nearly as well as PageRank.
Query dependent.

Thank You For Your Time!