The Importance of Communities for Learning to Influence

Presentation transcript:

The Importance of Communities for Learning to Influence Eric Balkanski, et al in NIPS 2017 Presented by Mohit Jangid and Xiaohu Zhao

Introduction In social networks, influence maximization has been discussed for over a decade… First posed by Domingos and Richardson (2001) Formulated and further developed by Kempe, Kleinberg, and Tardos (2003) Identifying a set of influencers that can trigger a large cascade in the network.

Two Lines of Research The first line studies the problem associated with learning the generative model that produces cascades from observations The second line focuses on, assuming the generative model is known, selecting a set of influencers that has maximal influence

Focus of This Paper A simple algorithm for maximizing influence from training data. It leverages the strong community structure of social networks to identify a set of individuals who are influential but whose communities have little overlap. In general, such an algorithm does not have a bounded approximation guarantee.

Influence Process Standard independent cascade model A node a influences each of its neighbors b with some probability independently Each node only has one chance to influence each of its neighbors The number of nodes influenced after the influence process is considered as the cascade size
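The influence process on this slide can be sketched as a short simulation. This is a minimal sketch, not the authors' code: the adjacency-dict graph format, the single edge probability `p` shared by all edges, and the function names are assumptions made for illustration.

```python
import random
from collections import deque


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent cascade and return the set of influenced nodes.

    graph: dict mapping node -> iterable of neighbors (hypothetical format).
    p: one activation probability used for every edge (an assumption).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        a = frontier.popleft()
        for b in graph.get(a, ()):
            # a gets exactly one chance to influence each inactive neighbor b,
            # because every node enters the frontier at most once.
            if b not in active and rng.random() < p:
                active.add(b)
                frontier.append(b)
    return active


def expected_cascade_size(graph, seeds, p, trials=1000, seed=0):
    """Monte Carlo estimate of the expected cascade size f(seeds)."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(trials))
    return total / trials
```

The cascade size of a single run is `len(simulate_cascade(...))`; averaging many runs approximates the expected number of influenced nodes.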

Influence Response Independent Cascade Model Diminishing returns *Figure from lecture 15, Diffusion of innovation and influence maximization, http://www.leonidzhukov.net/hse/2015/networks/lectures/lecture15.pdf

Cascade Propagation *Figure from lecture 15, Diffusion of innovation and influence maximization, http://www.leonidzhukov.net/hse/2015/networks/lectures/lecture15.pdf

Influence Maximization Problem Full propagation is not guaranteed! NP-hard *Figure from D. Kempe, J. Kleinberg, E. Tardos, 2003, 2005

The Learning to Influence Model Given a collection of samples with seed sets of nodes and the number of nodes influenced accordingly Each sample is generated from the influence process independently Goal: “Find a set S of size at most k which maximizes the expected number of nodes f(S) influenced by seed set S”
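To make the sample format concrete, here is a hedged sketch of how a first order marginal contribution could be estimated from such samples. The estimator shown (average value of samples containing a node minus average value of samples without it) is an assumption about how the quantity is computed; the slide does not spell it out. The name `estimate_first_order` is hypothetical.

```python
def estimate_first_order(samples, node):
    """Estimate the first order marginal contribution of `node`.

    samples: list of (seed_set, influenced_count) pairs, one per observed cascade.
    Returns None when the samples cannot separate the two groups.
    """
    with_node = [count for seed_set, count in samples if node in seed_set]
    without_node = [count for seed_set, count in samples if node not in seed_set]
    if not with_node or not without_node:
        return None  # node appears in all samples or in none
    return sum(with_node) / len(with_node) - sum(without_node) / len(without_node)
```

For example, with samples `[({1}, 5), ({1, 2}, 6), ({2}, 2), (set(), 0)]`, node 1 averages 5.5 when present and 1.0 when absent, so its estimated contribution is 4.5.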

The Algorithm Ordering In decreasing order First order marginal contribution Pruning Acceptable overlaps Solution The first k nodes after pruning

The Algorithm Second order marginal contribution Cases Two nodes are in a community where any node influences all of community, then Two nodes are in two communities which are not connected in the network, then
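The ordering, pruning, and solution steps can be sketched as follows. This is only an illustration under stated assumptions: the marginal contributions are passed in as precomputed dictionaries, and the overlap test "second order contribution of a given b is below (1 - alpha) times the first order contribution of a" is an assumed form that the slides do not state. The function name `cops_sketch` is hypothetical.

```python
def cops_sketch(v, v_second, k, alpha):
    """Order nodes by first order contribution, prune overlaps, keep k nodes.

    v: dict node -> first order marginal contribution.
    v_second: dict (b, a) -> marginal contribution of a when b is already present.
              Missing pairs are treated as having no overlap.
    alpha: allowed overlap in (0, 1).
    """
    ordering = sorted(v, key=v.get, reverse=True)  # decreasing order
    kept = []
    for a in ordering:
        # Assumed overlap test: a adds much less once some kept b is present.
        overlaps = any(
            v_second.get((b, a), v[a]) < (1 - alpha) * v[a] for b in kept
        )
        if not overlaps:
            kept.append(a)
        if len(kept) == k:
            break
    return kept
```

In the two cases on the slide, a node sharing a fully connected community with an earlier node would have a near-zero second order contribution and be pruned, while a node in a disconnected community would keep its full contribution and survive.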

Analyzing Community Structure The network is a random graph The nodes are partitioned in communities The edges are added independently P(intra-community edge) >> P(inter-community edge) The probability is identical in a same community In different communities, the edge probabilities are different *Figure from http://tuvalu.santafe.edu/~aaronc/courses/5352/fall2013/csci5352_2013_L16.pdf
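A small generator makes the random graph model on this slide concrete. It is a sketch under assumptions: each community has its own intra-community probability, and a single inter-community probability `p_out` is used for all cross-community pairs, which is a simplification. The name `sample_sbm` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected graph with planted communities.

    sizes: list of community sizes.
    p_in: list with one intra-community edge probability per community.
    p_out: inter-community edge probability (assumed identical for all pairs).
    Returns (graph, community), where graph maps node -> set of neighbors
    and community[u] is the community index of node u.
    """
    rng = random.Random(seed)
    community = []
    for index, size in enumerate(sizes):
        community.extend([index] * size)
    n = len(community)
    graph = {u: set() for u in range(n)}
    for u, w in combinations(range(n), 2):
        same = community[u] == community[w]
        p = p_in[community[u]] if same else p_out
        # Each edge is added independently of all others.
        if rng.random() < p:
            graph[u].add(w)
            graph[w].add(u)
    return graph, community
```

Choosing `p_in` much larger than `p_out` reproduces the setting where intra-community edges are far more likely than inter-community edges.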

Analyzing Community Structure The stochastic block model (SBM) with 4 communities Green nodes represent the optimal solution for influence maximization Red nodes represent the k first nodes in the ordering by first order marginal contributions without pruning “By removing the overlap, COPS obtains a diverse solution”

The Paper’s Claims “The authors provided performance guarantees under two settings: In the dense communities and small seed set setting, the approximation ratio is 1-o(1). It is a constant in the setting with tight and loose communities and non-ubiquitous seed sets.” The assumed graph model is the stochastic block model.

Approximation For an NP-hard problem, an approximation ratio describes how closely the optimal solution can be approximated in polynomial time. For example, the Christofides algorithm finds approximate solutions to the travelling salesman problem. It is an approximation algorithm that guarantees that its solutions will be within a factor of 3/2 of the optimal solution length.

Claim 1 Piece 1: “[ER60] Assume C is a “dense” community, then the subgraph G[C] of G is connected with probability 1 − O(|C|^(-2)).” Piece 2: “First order marginal contributions corresponds to the ordering by decreasing order of community sizes that nodes belong to.” v(a) ≥ (1 − o(1)) · |C|

Claim 1 cont. Piece 3: “The algorithm does not pick two nodes in a same community.” All pieces combined = Theorem 6. “In the dense communities and small seed set setting, COPS with α-overlap allowed, for any constant α ∈ (0, 1) is a 1 − o(1)-approximation algorithm for learning to influence from samples from a bounded product distribution D.”

Giant component in Erdős–Rényi model “Giant components are a prominent feature of the Erdős–Rényi model of random graphs. In this model, if p < (1-c)/n for any constant c>0, then with high probability all connected components of the graph have size O(log n), and there is no giant component.” (the loose regime) “However, for p > (1+c)/n there is with high probability a single giant component, with all other components having size O(log n).” (the tight regime)
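The threshold behaviour around p = 1/n can be explored empirically. The following is a minimal sketch, assuming a plain G(n, p) graph and a union-find structure to track components; the function name `largest_component_fraction` is hypothetical.

```python
import random
from itertools import combinations


def largest_component_fraction(n, p, seed=0):
    """Sample G(n, p) and return the largest component size divided by n."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):
        # Path-halving find for the union-find forest.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, w in combinations(range(n), 2):
        if rng.random() < p:  # each edge is present independently
            parent[find(u)] = find(w)

    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n
```

Comparing, say, `largest_component_fraction(400, 0.5 / 400)` with `largest_component_fraction(400, 2.0 / 400)` shows the contrast between an edge probability below 1/n and one above it. Because both calls use the same seed, the sparser graph is a subgraph of the denser one.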

Claim 2 Piece 1: “A tight community C contains w.h.p. (1-o(1)) a giant connected component with a constant fraction of the nodes from C.” ⇒ “Thus, a single node from a tight community influences a constant fraction of its community in expectation.” ⇒ “The ordering by first order marginal contributions thus ensures a constant factor approximation of the value from nodes in tight communities” Piece 2: “A node from a loose community influences only at most a constant number of nodes in expectation by using Galton-Watson branching processes”

Claim 2 cont. “We can now upper bound the value of the optimal solution S*. Let C_1, ..., C_t be the t ≤ k tight communities that have at least one node in C_i that is in the optimal solution S* and that are of super-constant size, i.e., |C_i| = ω(1). Without loss, we order these communities in decreasing order of their size |C_i|.” Piece 3: “Let S* be the optimal set of nodes and C_i and t be defined as above. There exists a constant c such that f(S*) ≤ Σ_{i=1..t} |C_i| + c · k.”

Claim 2 cont. Piece 4: “Since the algorithm checks for overlap using second order marginal contributions, the algorithm picks at most one node from any tight community.” All pieces combined: “With overlap allowed α = 1/poly(n), COPS is a constant factor approximation algorithm for learning to influence from samples drawn from a bounded product distribution D in the setting with tight and loose communities and non-ubiquitous seed sets.”

Experiment Four synthetic networks: SBM1 (one community is significantly larger than the others; as the number of nodes increases, the expected community size is fixed), SBM2 (all communities have the same expected size; as the number of nodes increases, the number of communities is fixed), Erdős–Rényi (ER) random graph, and preferential attachment model (PA). Two real networks: Facebook network (4k nodes, 88k edges) and a subgraph of the DBLP co-authorship network (54k nodes, 361k edges).

Benchmarks The standard GREEDY algorithm Optimal MARGI Picks the k nodes with highest first order marginal contribution RANDOM Returns a random set
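The three benchmarks can be sketched in a few lines. This is an illustrative sketch, not the experimental code: `f` is assumed to be a callable that returns the influence value of a list of seeds, `v` is a dict of first order marginal contributions, and all function names are hypothetical. Each function assumes k is no larger than the number of nodes.

```python
import random


def greedy(nodes, f, k):
    """Standard greedy: repeatedly add the node with the best resulting value."""
    chosen = []
    for _ in range(k):
        candidates = [a for a in nodes if a not in chosen]
        best = max(candidates, key=lambda a: f(chosen + [a]))
        chosen.append(best)
    return chosen


def margi(v, k):
    """Pick the k nodes with the highest first order marginal contribution."""
    return sorted(v, key=v.get, reverse=True)[:k]


def random_set(nodes, k, seed=0):
    """Return a uniformly random set of k nodes."""
    return random.Random(seed).sample(list(nodes), k)
```

With a toy coverage function, for example `reach = {"a": {1, 2, 3}, "b": {1, 2}, "c": {4}}` and `f` counting the union of reached items, greedy selects "a" and then "c", while MARGI selects "a" and "b" because it ignores overlap.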

Empirical Evaluation


Conclusions This paper proposed a new algorithm for the influence maximization problem under the discrete-time independent cascade model. Pros: The algorithm studies the influence maximization problem under learned influence functions, which explores this line of work from a practical angle. The algorithm is shown to perform well experimentally. Cons: The COPS algorithm should be compared with other heuristics based on community structure. The paper would benefit from a concluding section.

Questions?

References Balkanski, E., Immorlica, N., & Singer, Y. (2017). The Importance of Communities for Learning to Influence. In Advances in Neural Information Processing Systems (pp. 5864-5873). The Importance of Communities for Learning to Influence: Reviews, “https://media.nips.cc/nipsbooks/nipspapers/paper_files/nips30/reviews/2999.html” Giant component, “https://en.wikipedia.org/wiki/Giant_component#Giant_component_in_Erd%C5%91s%E2%80%93R%C3%A9nyi_model” Approximation algorithm, “https://en.wikipedia.org/wiki/Approximation_algorithm” Christofides algorithm, “https://en.wikipedia.org/wiki/Christofides_algorithm” Lecture 15, Diffusion of innovation and influence maximization, http://www.leonidzhukov.net/hse/2015/networks/lectures/lecture15.pdf Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003. Lecture 16, Network Analysis and Modeling, CSCI 5352 http://tuvalu.santafe.edu/~aaronc/courses/5352/fall2013/csci5352_2013_L16.pdf