Greedy & Heuristic algorithms in Influence Maximization

1 Greedy & Heuristic algorithms in Influence Maximization
Jingtao Zhu, May 13th, 2016

2 “Efficient Influence Maximization in Social Networks”
Wei Chen, Yajun Wang, and Siyu Yang. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.

3 Scenario
A company wants to market a cool online application through a social network.
The budget is limited, so only a small number of initial users can be selected. The company hopes that, through the word-of-mouth effect, a large population in the social network will adopt the application. The problem is whom to select as the initial users so that they eventually influence the largest number of people in the network, i.e., the problem of finding influential individuals in a social network.

4 Problem Description
Find a small subset of nodes in a social network that maximizes the spread of influence. Kempe et al. prove that this optimization problem is NP-hard and present a greedy approximation algorithm applicable to all three models. They also show through experiments that their greedy algorithm significantly outperforms the classic degree and centrality-based heuristics in influence spread. However, their algorithm has a serious drawback: it is slow.

5 Two proposed solutions
1. Improve the original greedy algorithm to further reduce its running time.
2. Propose new degree discount heuristics.

6 Solution I 1. Original greedy algorithm
General idea: in each round i, the algorithm adds one vertex to the selected set S such that this vertex, together with the current set S, maximizes the influence spread (Line 10 of the pseudocode). Equivalently, the vertex selected in round i is the one that maximizes the incremental influence spread in that round. To do so, for each vertex v ∈ V \ S, the influence spread of S ∪ {v} is estimated with R repeated simulations of RanCas(S ∪ {v}) (Lines 3–9). Each call to RanCas(S) takes O(m) time, so the whole algorithm takes O(knRm) time, where k is the number of seeds, n the number of vertices, and m the number of edges.
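The round structure described above can be sketched in Python. This is only an illustrative sketch, assuming an undirected graph stored as an adjacency dict and a uniform propagation probability p; the function names `ran_cas`, `estimate_spread`, and `general_greedy` are hypothetical and are not taken from the slides.

```python
import random
from collections import deque

def ran_cas(graph, seeds, p, rng):
    """Run one random cascade from `seeds`; return the number of active nodes."""
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        u = frontier.popleft()
        for w in graph[u]:
            # Each newly active node gets a single chance to activate w.
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return len(active)

def estimate_spread(graph, seeds, p, R, rng):
    """Average the cascade size over R simulations."""
    return sum(ran_cas(graph, seeds, p, rng) for _ in range(R)) / R

def general_greedy(graph, k, p, R, seed=0):
    """Pick k seeds, one per round, each maximizing the estimated spread."""
    rng = random.Random(seed)
    S = []
    for _ in range(k):
        best_v, best_spread = None, -1.0
        for v in graph:
            if v in S:
                continue
            spread = estimate_spread(graph, S + [v], p, R, rng)
            if spread > best_spread:
                best_v, best_spread = v, spread
        S.append(best_v)
    return S
```

The three nested loops (k rounds, up to n candidates, R simulations of O(m) each) are where the O(knRm) bound comes from.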

7 How to reduce the running time?
CELF optimization
Based on the submodularity of the influence maximization objective.
Submodularity property: when adding a vertex v to a seed set S, the incremental influence spread from adding v is larger if S is smaller.
In each round, the incremental influence spread of a large number of nodes does not need to be re-evaluated, because their values from the previous round are already smaller than that of some other node evaluated in the current round.
Result: 700 times faster.
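The lazy re-evaluation idea can be sketched with a priority queue. This is a minimal sketch rather than the published CELF code: `spread` stands for any submodular set function supplied by the caller (in practice a Monte Carlo influence estimate), and the names `celf` and `evaluations` are hypothetical.

```python
import heapq

def celf(nodes, k, spread):
    """Lazy greedy selection.

    `spread(S)` must be a submodular function of a list of nodes.
    Returns the chosen seeds and the number of spread evaluations used.
    Nodes must be mutually comparable so that heap ties can be broken.
    """
    # Round 1: every node is evaluated once.
    heap = [(-spread([v]), v, 0) for v in nodes]  # (negated gain, node, round of evaluation)
    heapq.heapify(heap)
    evaluations = len(heap)
    S, current = [], 0.0
    while len(S) < k and heap:
        neg_gain, v, stamp = heapq.heappop(heap)
        if stamp == len(S):
            # The gain is fresh for this round and it is the largest: select v.
            S.append(v)
            current += -neg_gain
        else:
            # Stale gain: by submodularity it is only an upper bound, so
            # re-evaluate this node and push it back.
            gain = spread(S + [v]) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, v, len(S)))
    return S, evaluations
```

Nodes whose stale upper bound never reaches the top of the heap are never re-evaluated, which is the source of the saving.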

8 Improved Greedy: Independent Cascade Model

9 2. Improved Greedy: Independent Cascade Model
Independent Cascade (IC) model: each node is either active or inactive, and time proceeds in discrete steps. At time t, every node v that became active at time t-1 tries to activate each inactive neighbor w with probability p_vw (here a uniform probability p). If the attempt fails, v does not try again. This is the same as the simple SIR model.
Improved algorithm:
1. Construct a graph G'.
2. Obtain G' by removing each edge of G with probability 1 - p; the removed edges are those not used for propagation.
3. Use DFS/BFS to find the set of vertices reachable from S in G'.
Result: the same influence spread with a 15-34% shorter running time.
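A rough sketch of this sampling idea follows, assuming an undirected edge list and a uniform p. Because the sampled graph is undirected, one pass over its connected components gives the gain of every candidate at once. The function names are hypothetical and the code is an illustration, not the paper's NewGreedyIC listing.

```python
import random
from collections import deque

def sample_graph(edges, p, rng):
    """Keep each undirected edge with probability p (remove it with 1 - p)."""
    return [e for e in edges if rng.random() < p]

def component_sizes(nodes, kept_edges):
    """Label every node with its component and record each component's size."""
    adj = {v: [] for v in nodes}
    for u, w in kept_edges:
        adj[u].append(w)
        adj[w].append(u)
    comp, sizes = {}, {}
    for s in nodes:
        if s in comp:
            continue
        comp[s] = s
        queue, count = deque([s]), 0
        while queue:
            u = queue.popleft()
            count += 1
            for w in adj[u]:
                if w not in comp:
                    comp[w] = s
                    queue.append(w)
        sizes[s] = count
    return comp, sizes

def new_greedy_ic(nodes, edges, k, p, R, seed=0):
    """Greedy seed selection using R sampled graphs per round."""
    rng = random.Random(seed)
    S = []
    for _ in range(k):
        score = {v: 0 for v in nodes if v not in S}
        for _ in range(R):
            comp, sizes = component_sizes(nodes, sample_graph(edges, p, rng))
            covered = {comp[s] for s in S}
            for v in score:
                # v adds its whole component unless S already reaches it.
                if comp[v] not in covered:
                    score[v] += sizes[comp[v]]
        S.append(max(score, key=score.get))
    return S
```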

10 Weighted cascade model
The probability of u activating v is usually not the same as the probability of v activating u. Because of this, we build a directed graph Ĝ = (V, Ê), in which each edge uv is replaced by two directed edges uv and vu. We still use dv to denote the degree of v in the original graph. Following the same idea as in the IC model, in each round of the greedy algorithm, when selecting a new vertex to add to the existing seed set S, we generate R random directed graphs G' = RanWC(Ĝ). For each vertex v and each graph G', we compute |R_G'(S ∪ {v})|, the number of vertices reachable from S ∪ {v} in G', then average over all G' to obtain the influence spread of S ∪ {v}, and select the v that maximizes this value.
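The sampling step can be illustrated as follows. The slide does not state the edge probabilities; this sketch assumes the usual weighted cascade definition, in which u activates v with probability 1/dv. The names `ran_wc`, `reachable`, and `wc_spread` are hypothetical.

```python
import random
from collections import deque

def ran_wc(adj, rng):
    """Sample a directed graph: keep edge u->v with probability 1/d_v.

    `adj` is the undirected adjacency dict, so len(adj[v]) is d_v.
    The 1/d_v probability is an assumption (standard WC definition).
    """
    out = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            if rng.random() < 1.0 / len(adj[v]):
                out[u].append(v)
    return out

def reachable(out, seeds):
    """Set of vertices reachable from `seeds` in the sampled directed graph."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def wc_spread(adj, seeds, R, seed=0):
    """Average reachable-set size over R sampled graphs."""
    rng = random.Random(seed)
    return sum(len(reachable(ran_wc(adj, rng), seeds)) for _ in range(R)) / R
```

In a star graph the center activates every leaf with probability 1, while each leaf activates the center only with probability 1/dv, which shows why the two directions must be sampled separately.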

11 How to reduce the running time for WC
In the IC model, this computation takes O(m) time in total because G′ is an undirected graph. In the WC model, G′ is a directed graph, which makes the computation non-trivial. A straightforward implementation using BFS from every vertex takes O(mn) time, which is too slow even for sparse graphs such as social network graphs. Solution: adapt Cohen's randomized algorithm for estimating the number of vertices reachable from every vertex.

12 Cohen’s algorithm
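The slide gives only the title, so the following is a hedged sketch of the general size-estimation technique rather than the version used in the paper. It assumes exponential random ranks: the minimum of k independent Exp(1) ranks is Exp(k) distributed, so after T rounds the value (T - 1) / (sum of minima) estimates k. The function name and the parameter T are hypothetical.

```python
import random

def cohen_reach_estimates(out, T, seed=0):
    """Estimate, for every vertex, how many vertices it can reach (itself included).

    `out` maps every vertex to its list of out-neighbors in a directed graph.
    """
    rng = random.Random(seed)
    # Reverse adjacency: rev[v] lists the vertices with an edge into v.
    rev = {v: [] for v in out}
    for u in out:
        for v in out[u]:
            rev[v].append(u)
    total_min = {v: 0.0 for v in out}
    for _ in range(T):
        rank = {v: rng.expovariate(1.0) for v in out}
        assigned = set()
        # Visit vertices by increasing rank; every still-unlabelled vertex
        # that can reach x has x as its minimum-rank reachable vertex.
        for x in sorted(out, key=rank.get):
            if x in assigned:
                continue
            assigned.add(x)
            stack = [x]
            while stack:
                y = stack.pop()
                total_min[y] += rank[x]
                for z in rev[y]:
                    if z not in assigned:
                        assigned.add(z)
                        stack.append(z)
    return {v: (T - 1) / total_min[v] for v in out}
```

Each round touches every vertex and edge once after the sort, so T rounds cost roughly O(T(m + n log n)) instead of the O(mn) needed for a BFS from every vertex.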

13 3. Mixed-Greedy WC algorithm
The first round uses the NewGreedyWC algorithm; the remaining rounds use the CELF optimization.

14

15 Solution II: Degree Discount Heuristics
Let v be a neighbor of vertex u. If u has been selected as a seed, then when considering v as a new seed based on its degree, we should not count the edge vu towards its degree. Thus we discount v's degree by one due to the presence of u in the seed set, and we apply the same discount for every neighbor of v that is already in the seed set. This is a basic degree discount heuristic applicable to all cascade models.
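The basic discount rule is short enough to sketch directly. The code below is an illustration under the assumption that the graph is an undirected adjacency dict; the function name `single_discount` is hypothetical.

```python
def single_discount(adj, k):
    """Pick k seeds by degree, discounting one per neighbor already chosen."""
    dd = {v: len(adj[v]) for v in adj}  # discounted degree, starts at the true degree
    S = []
    for _ in range(k):
        u = max((v for v in adj if v not in S), key=dd.get)
        S.append(u)
        for v in adj[u]:
            if v not in S:
                dd[v] -= 1  # the edge vu no longer counts towards v's degree
    return S
```

In the usage below, node 1 and node 5 both have degree 3, but node 1 is adjacent to the first seed, so its discounted degree drops to 2 and node 5 is chosen second.

```python
adj = {0: [1, 2, 3, 4], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1], 4: [0],
       5: [6, 7, 8], 6: [5], 7: [5], 8: [5]}
seeds = single_discount(adj, 2)
```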

16 For a vertex v with tv neighbors already selected as seeds, we should discount v's degree by 2tv + (dv - tv) tv p.
For example, for a node v with dv = 200, tv = 1, and p = 0.01 (parameters similar to our experimental graphs), the discount is 2 + 199 × 0.01 = 3.99, so we should discount v's degree to about 196.
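The arithmetic in this example can be checked with a short sketch. It assumes the discount 2tv + (dv - tv) tv p given in the cited Chen et al. paper, which reproduces the value of about 196 quoted above. The function names are hypothetical and the selection loop is a simplified illustration, not the paper's listing.

```python
def discounted_degree(d_v, t_v, p):
    """Degree of v after discounting for t_v seed neighbors (assumed formula)."""
    return d_v - 2 * t_v - (d_v - t_v) * t_v * p

def degree_discount_ic(adj, k, p):
    """Pick k seeds by discounted degree under the IC model with uniform p."""
    d = {v: len(adj[v]) for v in adj}
    dd = dict(d)              # current discounted degrees
    t = {v: 0 for v in adj}   # number of seed neighbors of each vertex
    S = []
    for _ in range(k):
        u = max((v for v in adj if v not in S), key=dd.get)
        S.append(u)
        for v in adj[u]:
            if v not in S:
                t[v] += 1
                dd[v] = discounted_degree(d[v], t[v], p)
    return S
```

For dv = 200, tv = 1, and p = 0.01, `discounted_degree` gives 200 - 2 - 1.99 = 196.01, matching the slide's "about 196".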

17 Assumptions and Limits
1. Other factors are not considered, such as indirect influence effects and selected seeds affecting the neighbors of v.
2. The difference in those effects between the case tv = 0 and the case tv > 0 is assumed to be negligible for small p.

18 Algorithms evaluation
Large academic collaboration graphs from the online archival database arXiv.org.
Co-author network: each node is an author, and an edge means that two authors collaborated.
The first network is from the "High Energy Physics - Theory" section, with papers up to 2003; it contains n = 15,233 nodes and m = 58,891 edges.
The second network is from the full paper list of the "Physics" section, denoted NetPHY; it contains n = 37,154 nodes and m = 231,584 edges.

19

20

21

22 Results
The SingleDiscount heuristic, although just a simple adjustment to the Degree heuristic, closes approximately half of the gap between Greedy and Degree. NewGreedyIC and MixedGreedyIC essentially match CELF-Greedy on both graphs. The DegreeDiscountIC heuristic performs extremely well.

23 Summary
The greedy running time is shortened by the mixed greedy algorithm.
Degree discount heuristic >> classic heuristics
Heuristic > greedy

24 The current influence maximization problem is simplified,
without considering other features in the social networks, such as indirect influence effects and selected seeds affecting the neighbors of vertices.

25 Thanks!

