DATA MINING LECTURE 13: PageRank, Absorbing Random Walks, Coverage Problems

PAGERANK

PageRank algorithm

The PageRank random walk:
Start from a page chosen uniformly at random.
With probability α, follow a random outgoing link.
With probability 1 - α, jump to a random page chosen uniformly at random.
Repeat until convergence.
The PageRank update equations describe one step of this walk.
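
In the standard formulation, the update says that the score of page i at step t+1 is q_{t+1}(i) = α · Σ_{j: j links to i} q_t(j)/out(j) + (1 - α)/n (sink nodes are handled on the next slide). The random-surfer description can also be simulated directly; the following is a small Python sketch, where the function name, the out-link dictionary format, and the node numbering 0..n-1 are illustrative assumptions.

```python
import random

def simulate_pagerank(out_links, n, alpha=0.85, steps=100_000):
    """Estimate PageRank by simulating the random surfer described above.

    out_links: dict mapping each node to the list of nodes it links to.
    The fraction of steps spent at each node approximates its PageRank score.
    """
    visits = [0] * n
    current = random.randrange(n)             # start at a page chosen uniformly at random
    for _ in range(steps):
        targets = out_links.get(current, [])
        if targets and random.random() < alpha:
            current = random.choice(targets)  # with probability alpha, follow a random out-link
        else:
            current = random.randrange(n)     # otherwise (or at a sink), jump to a random page
        visits[current] += 1
    return [v / steps for v in visits]

# Example: a tiny 3-page web with links 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1
scores = simulate_pagerank({0: [1], 1: [2], 2: [0, 1]}, n=3)
```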

The PageRank random walk

What about sink nodes? When at a node with no outgoing links, jump to a page chosen uniformly at random.

The PageRank random walk

The PageRank transition probability matrix P was sparse; P'' is dense:
P'' = αP' + (1 - α)uv^T,
where u is the vector of all 1s, u = (1, 1, …, 1), and v is the uniform vector, v = (1/n, 1/n, …, 1/n).

A PageRank implementation

Performing the vanilla power method is now too expensive: the matrix P'' is not sparse.

Power method:
q^0 = v
t = 1
repeat
    q^t = (P'')^T q^(t-1)
    δ = ||q^t - q^(t-1)||
    t = t + 1
until δ < ε

The key is the efficient computation of q^t = (P'')^T q^(t-1), where
P = the normalized adjacency matrix,
P' = P + dv^T, where d_i is 1 if i is a sink and 0 otherwise,
P'' = αP' + (1 - α)uv^T, where u is the vector of all 1s.
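
A minimal Python sketch of one way to do this efficiently, never materializing the dense matrix P''; the graph is assumed to be given as a dictionary of out-links over nodes 0..n-1, and the function name and default α = 0.85 are illustrative.

```python
import numpy as np

def pagerank(out_links, n, alpha=0.85, eps=1e-8, max_iter=100):
    """Power iteration q^t = (P'')^T q^(t-1) using only the sparse structure of P.

    out_links: dict mapping each node i to the list of nodes it links to.
    The dense rank-one parts (sink redistribution d v^T and teleportation u v^T)
    are handled with scalars instead of full matrices.
    """
    v = np.full(n, 1.0 / n)                   # uniform jump vector
    q = v.copy()                              # q^0 = v
    sinks = [i for i in range(n) if not out_links.get(i)]

    for _ in range(max_iter):
        q_new = np.zeros(n)
        # alpha * P^T q: each node spreads its mass over its out-links
        for i, targets in out_links.items():
            if targets:
                share = alpha * q[i] / len(targets)
                for j in targets:
                    q_new[j] += share
        # alpha * v * (d^T q): mass sitting at sink nodes is spread uniformly
        sink_mass = sum(q[i] for i in sinks)
        # (1 - alpha) * v * (u^T q): random jump (u^T q = 1 since q sums to 1)
        q_new += (alpha * sink_mass + (1.0 - alpha)) * v
        delta = np.abs(q_new - q).sum()
        q = q_new
        if delta < eps:
            break
    return q
```

Each iteration touches only the edges of the graph, so the cost per iteration is O(|E| + n).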

A PageRank implementation Why does this work?
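
A short derivation, assuming q^(t-1) is a probability vector so that u^T q^(t-1) = 1:

```latex
(P'')^{T} q^{t-1}
  = \alpha (P')^{T} q^{t-1} + (1-\alpha)\, v\,(u^{T} q^{t-1})
  = \alpha P^{T} q^{t-1} + \alpha\, v\,(d^{T} q^{t-1}) + (1-\alpha)\, v .
```

Only the sparse product P^T q^(t-1) and the scalar d^T q^(t-1) (the total probability mass sitting on sink nodes) are needed, which is exactly what the sketch above exploits.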

Implementation details

ABSORBING RANDOM WALKS

Random walk with absorbing nodes

What happens if we do a random walk on this graph? What is the stationary distribution?
All the probability mass ends up on the red sink node: the red node is an absorbing node.

Random walk with absorbing nodes

What happens if we do a random walk on this graph? What is the stationary distribution?
There are two absorbing nodes: the red and the blue. The probability mass will be divided between the two.

Absorption probability

If there is more than one absorbing node in the graph, a random walk that starts from a non-absorbing node will be absorbed in one of them with some probability. The probability of absorption gives an estimate of how close the node is to red or to blue.

Absorption probability

Computing the probability of being absorbed is very easy:
Take the (weighted) average of the absorption probabilities of your neighbors; if one of the neighbors is an absorbing node, it contributes probability 1.
Repeat until convergence (very small change in the probabilities).
The absorbing nodes have probability 1 of being absorbed in themselves and zero of being absorbed in any other node.
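
A small Python sketch of this iteration; the graph format and function name are illustrative, and every non-absorbing node is assumed to have at least one neighbor.

```python
def absorption_probabilities(neighbors, absorbing, target, eps=1e-6):
    """For every node, iteratively compute the probability of being absorbed
    at the given target absorbing node.

    neighbors: dict mapping each node to a list of (neighbor, weight) pairs
               (absorbing nodes can map to an empty list).
    absorbing: set of absorbing nodes; target must be one of them.
    """
    prob = {u: 0.0 for u in neighbors}
    for a in absorbing:
        prob[a] = 1.0 if a == target else 0.0   # absorbing nodes keep their own label

    while True:
        delta = 0.0
        for u in neighbors:
            if u in absorbing:
                continue
            # weighted average of the neighbors' absorption probabilities
            total_w = sum(w for _, w in neighbors[u])
            new_p = sum(w * prob[v] for v, w in neighbors[u]) / total_w
            delta = max(delta, abs(new_p - prob[u]))
            prob[u] = new_p
        if delta < eps:
            return prob
```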

Absorption probability

The same idea can be applied to the case of undirected graphs. The absorbing nodes are still absorbing, so the edges to them are (implicitly) directed.

Propagating values

Assume that Red has a positive value and Blue a negative value (positive/negative class, positive/negative opinion). We can compute a value for all the other nodes in the same way. This is the expected value for the node.
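
With exactly two absorbing nodes, the expected value follows directly from the absorption probabilities; here is a short snippet reusing the absorption_probabilities sketch above (the graph variable and the node names "red" and "blue" are hypothetical).

```python
# P(absorbed at red) for every node; P(absorbed at blue) = 1 - P(absorbed at red)
p_red = absorption_probabilities(graph, absorbing={"red", "blue"}, target="red")
# Expected value when red carries +1 and blue carries -1
value = {u: p_red[u] * (+1.0) + (1.0 - p_red[u]) * (-1.0) for u in p_red}
```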

Electrical networks and random walks

Our graph corresponds to an electrical network. There is a positive voltage of +1 at the Red node and a negative voltage of -1 at the Blue node. There are resistances on the edges inversely proportional to the weights (equivalently, conductances proportional to the weights). The computed values are the voltages at the nodes.

Transductive learning

If we have a graph of relationships and labels on some of the nodes, we can propagate them to the remaining nodes.
E.g., a social network where some people are tagged as spammers.
E.g., the movie-actor graph where some movies are tagged as action or comedy.
This is a form of semi-supervised learning: we make use of the unlabeled data and the relationships. It is also called transductive learning because it does not produce a model, but just labels the unlabeled data that is at hand. Contrast this with inductive learning, which learns a model and can label any new example.

Implementation details

COVERAGE

Example: promotion campaign on a social network

We have a social network as a graph.
People are more likely to buy a product if they have a friend who has bought it.
We want to offer the product for free to some people such that every person in the graph is covered (they have a friend who has the product).
We want the number of free products to be as small as possible.

Example (continued): one possible selection of people to receive the free product.

Example (continued): a better selection, using fewer free products.

Dominating set: a set of nodes D such that every node of the graph is either in D or adjacent to a node in D (the graph formulation of the promotion example).

Set Cover: given a universe U of N elements and a collection S = {S_1, …, S_m} of subsets of U, find the smallest sub-collection C ⊆ S whose union covers all of U.

Applications

Best selection variant (Maximum Coverage): given a budget of k sets, select the k sets that cover as many elements as possible.

Complexity

Both the Set Cover and the Maximum Coverage problems are NP-hard (their decision versions are NP-complete).
What does this mean? Why do we care?
No known algorithm is guaranteed to find the best solution in polynomial time (and none exists unless P = NP).
Can we find an algorithm that is guaranteed to find a solution close to the optimal? Approximation algorithms.

Approximation Algorithms

Suppose you have a (combinatorial) optimization problem.
E.g., find the minimum set cover.
E.g., find the k sets that maximize coverage.
If X is an instance of the problem, let OPT(X) be the value of the optimal solution and ALG(X) the value of the solution produced by algorithm ALG. ALG is a good approximation algorithm if the ratio of OPT(X) and ALG(X) is bounded.

Approximation Algorithms

Approximation ratio
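
A common way to make this precise, and the convention the following slides appear to use:

```latex
\text{Minimization: ALG has approximation ratio } \alpha \ge 1 \text{ if } \mathrm{ALG}(X) \le \alpha \cdot \mathrm{OPT}(X) \text{ for every instance } X.\\
\text{Maximization: ALG has approximation ratio } \alpha \le 1 \text{ if } \mathrm{ALG}(X) \ge \alpha \cdot \mathrm{OPT}(X) \text{ for every instance } X.
```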

A simple approximation ratio for set cover

Any algorithm for set cover has approximation ratio α = |S_max|, where S_max is the set in S with the largest cardinality.
Proof: OPT(X) ≥ N/|S_max|, so N ≤ |S_max| OPT(X). Since ALG(X) ≤ N, we get ALG(X) ≤ N ≤ |S_max| OPT(X).
This is true for any algorithm, but it is not a good bound, since it can be that |S_max| = O(N).

An algorithm for Set Cover

What is the most natural algorithm for Set Cover?
Greedy: each time, add to the collection C the set S_i from S that covers the most of the remaining elements.

The GREEDY algorithm
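
A minimal Python sketch of the greedy rule described on the previous slide; the function name and the input format (a universe set and a dict of candidate sets) are illustrative.

```python
def greedy_set_cover(universe, sets):
    """Greedy Set Cover: repeatedly pick the set covering the most uncovered elements.

    universe: the set of elements to cover.
    sets: dict mapping a set id to the set of elements it contains.
    Returns the list of chosen set ids (the collection C).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # the set that covers the largest number of still-uncovered elements
        best = max(sets, key=lambda s: len(uncovered & sets[s]))
        if not uncovered & sets[best]:
            break                     # remaining elements cannot be covered
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Example usage:
# greedy_set_cover({1, 2, 3, 4, 5}, {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}})
```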

Approximation ratio of GREEDY

There are instances where OPT(X) = 2 while GREEDY(X) = log N, so the approximation ratio of GREEDY is at least α = ½ log N.

Maximum Coverage What is a reasonable algorithm?
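
The natural candidate is the same greedy rule with a budget of k picks; a short sketch following the conventions of the Set Cover sketch above.

```python
def greedy_max_coverage(universe, sets, k):
    """Greedy Maximum Coverage: pick k sets, each time the one covering the most
    still-uncovered elements (greedy Set Cover stopped after k picks)."""
    uncovered = set(universe)
    chosen = []
    for _ in range(k):
        best = max(sets, key=lambda s: len(uncovered & sets[s]))
        if not uncovered & sets[best]:
            break
        chosen.append(best)
        uncovered -= sets[best]
    return chosen, len(universe) - len(uncovered)   # chosen set ids, number of covered elements
```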

Approximation Ratio for Max-K Coverage

Proof of approximation ratio
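
A sketch of the standard greedy argument, writing b_i for the number of elements covered after i greedy steps and OPT for the optimal coverage with k sets:

```latex
\text{The } k \text{ optimal sets cover at least } \mathrm{OPT}-b_i \text{ still-uncovered elements,}\\
\text{so some single set covers at least } (\mathrm{OPT}-b_i)/k \text{ of them:}\\
b_{i+1}-b_i \ \ge\ \frac{\mathrm{OPT}-b_i}{k}
\ \Longrightarrow\ \mathrm{OPT}-b_{i+1} \ \le\ \Big(1-\tfrac{1}{k}\Big)\big(\mathrm{OPT}-b_i\big)\\
\ \Longrightarrow\ \mathrm{OPT}-b_k \ \le\ \Big(1-\tfrac{1}{k}\Big)^{k}\mathrm{OPT}
\ \Longrightarrow\ b_k \ \ge\ \Big(1-\big(1-\tfrac{1}{k}\big)^{k}\Big)\mathrm{OPT} \ \ge\ \Big(1-\tfrac{1}{e}\Big)\mathrm{OPT}.
```

So greedy for Max-k Coverage achieves at least a (1 - 1/e) fraction of the optimal coverage.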

Optimizing submodular functions
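
Coverage is the canonical example of a monotone submodular function, and the greedy guarantee above extends to this setting; the usual definition and the classical result, stated for reference:

```latex
\text{Submodularity (diminishing returns): for all } A \subseteq B \subseteq V \text{ and } x \notin B,\\
f(A \cup \{x\}) - f(A) \ \ge\ f(B \cup \{x\}) - f(B).\\
\text{For monotone submodular } f \text{ with } f(\emptyset)=0, \text{ greedily picking } k \text{ elements achieves at least } \big(1-\tfrac{1}{e}\big) \text{ of the optimum}\\
\text{(Nemhauser, Wolsey, Fisher, 1978).}
```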

Other variants of Set Cover

Hitting Set: select a set of elements so that you hit all the sets (the same as Set Cover, with the roles reversed).
Vertex Cover: select a subset of vertices such that you cover all edges (an endpoint of each edge is in the set). There is a 2-approximation algorithm (a sketch follows below).
Edge Cover: select a set of edges that cover all vertices (every vertex is the endpoint of some selected edge). There is a polynomial-time algorithm.
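
A minimal Python sketch of the standard 2-approximation for Vertex Cover via a maximal matching; the edge-list input format is illustrative.

```python
def vertex_cover_2approx(edges):
    """2-approximation for Vertex Cover: greedily pick an uncovered edge and add
    both endpoints. The picked edges form a matching, and any vertex cover must
    contain at least one endpoint of each matched edge, so |cover| <= 2 * OPT."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# Example usage:
# vertex_cover_2approx([(1, 2), (2, 3), (3, 4)])   # -> {1, 2, 3, 4}
```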

Parting thoughts

In this class you saw a set of tools for analyzing data:
Association Rules
Sketching
Clustering
Classification
Singular Value Decomposition
Random Walks
Coverage
All of these are useful when trying to make sense of the data. A lot more variants exist.