Link-Based Ranking Seminar Social Media Mining University UC3M

Slides:



Advertisements
Similar presentations
Markov Models.
Advertisements

CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.
1 The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.
Information Networks Link Analysis Ranking Lecture 8.
Graphs, Node importance, Link Analysis Ranking, Random walks
Link Analysis: PageRank
More on Rankings. Query-independent LAR Have an a-priori ordering of the web pages Q: Set of pages that contain the keywords in the query q Present the.
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 March 23, 2005
Link Analysis Ranking. How do search engines decide how to rank your query results? Guess why Google ranks the query results the way it does How would.
Introduction to PageRank Algorithm and Programming Assignment 1 CSC4170 Web Intelligence and Social Computing Tutorial 4 Tutor: Tom Chao Zhou
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
1 CS 430 / INFO 430: Information Retrieval Lecture 16 Web Search 2.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 21: Link Analysis.
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1:
ICS 278: Data Mining Lecture 15: Mining Web Link Structure
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 3 April 2, 2006
15-853Page :Algorithms in the Real World Indexing and Searching III (well actually II) – Link Analysis – Near duplicate removal.
Multimedia Databases SVD II. SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case.
Link Analysis, PageRank and Search Engines on the Web
Presented By: Wang Hao March 8 th, 2011 The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
(hyperlink-induced topic search)
Link Analysis. 2 HITS - Kleinberg’s Algorithm HITS – Hypertext Induced Topic Selection For each vertex v Є V in a subgraph of interest: A site is very.
1 COMP4332 Web Data Thanks for Raymond Wong’s slides.
CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine CS 277: Data Mining Mining Web Link Structure.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
PRESENTED BY ASHISH CHAWLA AND VINIT ASHER The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page and Sergey Brin, Stanford University.
The PageRank Citation Ranking: Bringing Order to the Web Larry Page etc. Stanford University, Technical Report 1998 Presented by: Ratiya Komalarachun.
Presented By: - Chandrika B N
The PageRank Citation Ranking: Bringing Order to the Web Presented by Aishwarya Rengamannan Instructor: Dr. Gautam Das.
Link Analysis Hongning Wang
Random Walks and Semi-Supervised Learning Longin Jan Latecki Based on : Xiaojin Zhu. Semi-Supervised Learning with Graphs. PhD thesis. CMU-LTI ,
1 Random Walks on Graphs: An Overview Purnamrita Sarkar, CMU Shortened and modified by Longin Jan Latecki.
Lectures 6 & 7 Centrality Measures Lectures 6 & 7 Centrality Measures February 2, 2009 Monojit Choudhury
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
Overview of Web Ranking Algorithms: HITS and PageRank
Keyword Search in Databases using PageRank By Michael Sirivianos April 11, 2003.
Lecture #10 PageRank CS492 Special Topics in Computer Science: Distributed Algorithms and Systems.
Link Analysis Rong Jin. Web Structure  Web is a graph Each web site correspond to a node A link from one site to another site forms a directed edge 
Ranking Link-based Ranking (2° generation) Reading 21.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
PageRank Algorithm -- Bringing Order to the Web (Hu Bin)
1 CS 430: Information Discovery Lecture 5 Ranking.
Link Analysis Algorithms Page Rank Slides from Stanford CS345, slightly modified.
Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.
Random Sampling Algorithms with Applications Kyomin Jung KAIST Aug ERC Workshop.
Roberto Battiti, Mauro Brunato
Information Retrieval Search Engine Technology (11)
The PageRank Citation Ranking: Bringing Order to the Web
The PageRank Citation Ranking: Bringing Order to the Web
Quality of a search engine
15-499:Algorithms and Applications
HITS Hypertext-Induced Topic Selection
Information Retrieval (11)
Search Engines and Link Analysis on the Web
Lecture #11 PageRank (II)
PageRank and Markov Chains
DTMC Applications Ranking Web Pages & Slotted ALOHA
Centrality in Social Networks
Lecture 22 SVD, Eigenvector, and Web Search
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Graph Algorithms Ch. 5 Lin and Dyer.
Information retrieval and PageRank
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Junghoo “John” Cho UCLA
Lecture 22 SVD, Eigenvector, and Web Search
Lecture 22 SVD, Eigenvector, and Web Search
Graph Algorithms Ch. 5 Lin and Dyer.
Presented by Nick Janus
Presentation transcript:

Link-Based Ranking Seminar Social Media Mining University UC3M Date May 2017 Lecturer Carlos Castillo http://chato.cl/ Sources: Fei Li's lecture on PageRank Evimaria Terzi's lecture on link analysis. Paolo Boldi, Francesco Bonchi, Carlos Castillo, and Sebastiano Vigna. 2011. Viscous democracy for social networks. Commun. ACM 54, 6 (June 2011), 129-137. [link]

Purpose of Link-Based Ranking Static (query-independent) ranking Dynamic (query-dependent) ranking Applications: Search in social networks Search on the web

Given a set of connected objects

Assign some weights

Alternatives Various centrality metrics Degree, betweenness, ... Classical algorithms HITS / Hubs and Authorities PageRank

HITS (Hubs and Authorities)

HITS Jon M. Kleinberg. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5 (September 1999), 604-632. [DOI] Query-dependent algorithm Get pages matching the query Expand to 1-hop neighborhood Find pages with good out-links (“hubs”) Find pages with good in-links (“authorities”)

Root set = matches the query

Base set S = root set plus 1-hop neighbors Base set S is expected to be small and topically focused.

Base graph S of n nodes

Bipartite graph of 2n nodes

Bipartite graph of 2n nodes 0) Initialization: 1) Iteration: 2) Normalization:

Try it! Complete the table. Which one is the biggest hub? Which the biggest authority? Does it differ from ranking by degree? H(1) A(1) Â(1) H(2) Ĥ(2) A(2) Â(2) 1 3

What are we computing? Vector a is an eigenvector of ATA Conversely, vector h is an eigenvector of AAT

Tightly-knit communities Imagine a graph made of a 3,3 and a 2,3 clique 1 1 1 1 1

Tightly-knit communities Imagine a graph made of a 3,3 and a 2,3 clique 3 3 3 2 2 2

Tightly-knit communities Imagine a graph made of a 3,3 and a 2,3 clique 3x3 3x3 3x3 3x2 3x2

Tightly-knit communities Imagine a graph made of a 3,3 and a 2,3 clique 3x3x3 3x3x3 3x3x3 3x2x2 3x2x2 3x2x2

Tightly-knit communities HITS favors the largest dense sub-graph After n iterations: … ... ...

PageRank

PageRank The pagerank citation algorithm: bringing order to the web by L Page, S Brin, R Motwani, T Winograd - 7th World Wide Web Conference, 1998 [link]. Designed by Page & Brin as part of a research project that started in 1995 and ended in 1998 … with the creation of Google

A Simple Version of PageRank Nj: the number of forward links of page j c: normalization factor to ensure ||P||L1= |P1 + … + Pn| = 1

An example of Simplified PageRank First iteration of calculation 23

An example of Simplified PageRank Second iteration of calculation 24

An example of Simplified PageRank Convergence after some iterations 25

A Problem with Simplified PageRank A loop: During each iteration, the loop accumulates rank but never distributes rank to other pages!

An example of the Problem First iteration 27

An example of the Problem Second iteration … see what's happening? 28

An example of the Problem Convergence 29

What are we computing? p is an eigenvector of A with eigenvalue 1 This (power method) can be used if A is: Stochastic (each row adds up to one) Irreducible (represents a strongly connected graph) Aperiodic (does not represent a bipartite graph)

Markov Chains Discrete process over a set of states Next state determined by current state and current state only (no memory of older states) Higher-order Markov chains can be defined Stationary distribution of Markov chain is a probability distribution such that p = Ap Intuitively, p represents “the average time spent” at each node if the process continues forever

Random Walks in Graphs Random Surfer Model The simplified model: the standing probability distribution of a random walk on the graph of the web. simply keeps clicking successive links at random Modified Random Surfer The modified model: the “random surfer” simply keeps clicking successive links at random, but periodically “gets bored” and jumps to a random page based on the distribution of E This guarantees irreducibility Pages without out-links (dangling nodes) are a row of zeros, can be replaced by E, or by a row of 1/n

Modified Version of PageRank E(i): web pages that “users” jump to when they “get bored”; Uniform random jump => E(i) = 1/n

An example of Modified PageRank

Try it! Draw a three-node graph with only two directed edges (you pick the direction) Assume alpha = 0.2 (prob. of random jump) Compute PageRank through power method

Variant: personalized PageRank Modify vector E(i) according to users' tastes (e.g. user interested in sports vs politics) http://nlp.stanford.edu/IR-book/html/htmledition/topic-specific-pagerank-1.html

PageRank and internal linking A website has a maximum amount of Page Rank that is distributed between its pages by internal links [depends on internal links] The maximum amount of Page Rank in a site increases as the number of pages in the site increases. By linking poorly, it is possible to fail to reach the site's maximum Page Rank, but it is not possible to exceed it. http://www.cs.sjsu.edu/faculty/pollett/masters/Semesters/Fall11/tanmayee/Deliverable3.pdf

PageRank as a form of actual voting (liquid democracy) If alpha = 1, we can implement liquid democracy In liquid democracy, people chose to either vote or to delegate their vote to somebody else If alpha < 1, we have a sort of “viscous” democracy where delegation is not total

PageRank as a form of liquid democracy

These two graphs have different alpha (0.2 and 0.9) Which one is which?

PageRank Implementation Suppose there are n pages and m links Trivial implementation of PageRank requires O(m+n) memory Streaming implementation requires O(n) memory … how?

Other centrality metrics

Types of centrality measure Spectral PageRank HITS Non-spectral Degree Betweenness Closeness and harmonic closeness

Is u a well-connected person? Degree: u has many connections Node u has high degree Betweenness: u connects different communities Large number of shortest paths pass through u Closeness: u is near-by to many people Average distance from u is small Eigenvector: u is connected to the well-connected Coordinate u in eigenvector of eigenvalue 1 is large

Betweenness A node has high betweenness if it participates in many shortest-paths One can enumerate all shortest paths in O(|V|3) http://rnav.labri.fr/rNAV2.0_public_version/online_doc/_images/ComputeBetweennessCentrality.png

Closeness Farness (distance) between two nodes is d(u,v) Closeness if the reciprocal of distances Some graphs are not connected, in that case d(u,v) can be ∞; setting 1/∞ = 0 one can define the harmonic closeness:

Compute the closeness of every node Try it! Compute the closeness of every node a b c f g d e

A: Degree B: Closeness C: Betweenness D: Eigenvector HIGH LOW