Published by Felix Barnett. Modified about 1 year ago.

1
StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization
Suqi Cheng
Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences
Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

2
Outline
– Background
– Preliminaries
– Motivation
– StaticGreedy algorithm
– Experiments

3
Information Cascade
An action or idea is adopted by one person after another due to social influence
– it cascades through social relationships
Main applications
– Word-of-mouth marketing
– Outbreak detection
– Popularity prediction
[Figure: a social network]

4
Word-of-Mouth Marketing
Promote a product by seeding a few users; users who adopt the product will recommend it to others
Advantages: efficient; cost-effective
[Figure: a company offers a free product or discount to seed users, whose influence activates follow-up users]
How should the optimal seed users be selected?

5
Influence Maximization for Viral Marketing
Objective function
– Influence spread I(S): expected number of activated (influenced/adopted) nodes
– Maximize I(S)
Input:
– A social influence graph G = (V, E)
– An information cascade model
– An integer k bounding the seed set size, |S| ≤ k
Output: a seed set S
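Read together, the input and output above amount to a constrained optimization. One compact way to write it, where S^{*} is a label introduced here for the returned seed set and does not appear on the slide:

```latex
S^{*} = \arg\max_{S \subseteq V,\ |S| \le k} I(S)
```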

6
Information Cascade Model
Independent cascade (IC) model
– each edge (u, v) has a propagation probability p(u, v)
– each newly activated node u independently activates its out-neighbor v with probability p(u, v)
– a discrete-time model
Influence spread estimation on the IC model
– Monte Carlo simulation
– Heuristic methods
[Figure: social influence graph [Leskovec, 2008]]
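The IC rules above translate into a short simulation. The following is a minimal sketch, not the authors' code: the adjacency format (a dict of node to a list of (out-neighbor, probability) pairs), the function names `simulate_ic` and `estimate_spread`, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import deque

def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade realization; return the set of activated nodes.

    graph: dict mapping node -> list of (out_neighbor, propagation_probability).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly activated u gets exactly one chance to activate each
            # still-inactive out-neighbor v, succeeding with probability p(u, v).
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active

def estimate_spread(graph, seeds, runs, seed=0):
    """Monte Carlo estimate of I(S): average cascade size over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs

# Toy graph (hypothetical): probabilities 1.0 and 0.0 make the outcome deterministic.
toy = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
print(estimate_spread(toy, ["a"], runs=10))  # 3.0: a activates b, b activates d
```

With realistic probabilities such as 0.01 the estimate varies from run to run, which is why the number of simulations matters for accuracy.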

7
Difficulties in Influence Maximization
Difficulty 1: The influence maximization problem is NP-hard [Kempe, KDD’03].
Existing solutions
Greedy approximation algorithm [Kempe, KDD’03]: accurate but inefficient
– (1 - 1/e - ε)-approximation
– iteratively selects the node with the largest marginal influence spread
– guaranteed by the submodularity and monotonicity of the influence spread function
Heuristics: efficient but inaccurate
– Degree
– PageRank
– Betweenness
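The greedy selection loop described above can be sketched generically. In this sketch `spread` is any callable that returns an influence estimate for a list of seeds; the toy `coverage` function and the `reach` table are hypothetical stand-ins for a real estimator, chosen only because they are submodular and deterministic.

```python
def greedy_select(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(seeds + [v]) - current  # marginal influence spread M(v|S)
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidate nodes
            break
        seeds.append(best)
        current += best_gain
    return seeds

# Hypothetical stand-in for I(S): size of the union of fixed reachable sets.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"}, "d": {"d", "e"}, "e": {"e"}}

def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)

print(greedy_select(list(reach), 2, coverage))  # ['a', 'd']
```

Each iteration evaluates `spread` once per remaining node, which is where the inefficiency noted on the slide comes from when every evaluation is itself a batch of Monte Carlo simulations.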

8
Difficulties in Influence Maximization
Difficulty 2: Computing influence spread exactly is #P-hard [Chen, KDD’10].
Existing solutions
Heuristic methods: efficient but inaccurate
– DegreeDiscount [Chen, KDD’09]
– CGA [Wang, KDD’10]
– PMIA [Chen, KDD’10]
– IRIE [Jung, ICDM’12]
Monte Carlo simulation: accurate but time-consuming
– CELF optimization [Leskovec, KDD’07]
– NewGreedy [Chen, KDD’09]
– CELF++ optimization [Goyal, WWW’11]
A scalability-accuracy dilemma!

9
Our work
Objective: propose an influence maximization algorithm that solves the scalability-accuracy dilemma

Algorithm                                  Accuracy       Scalability
Approximation algorithms
  Greedy [Kempe, KDD’03]                   guaranteed     low
  GreedyCELF [Leskovec, KDD’07]            guaranteed     low
  GreedyCELF++ [Goyal, WWW’11]             guaranteed     low
  NewGreedy/MixedGreedy [Chen, KDD’09]     guaranteed     low
  StaticGreedy [Cheng, CIKM’13]            guaranteed     high
Heuristics
  Degree                                   unguaranteed   high
  PageRank [Page, 1999]                    unguaranteed   high
  DegreeDiscount [Chen, KDD’09]            unguaranteed   high
  PMIA [Chen, KDD’10]                      unguaranteed   high
  IRIE [Jung, ICDM’12]                     unguaranteed   high
  SP1M [Kimura, PKDD’06]                   unguaranteed   relatively low

10
Preliminaries-1
Social influence graph: G = (V, E), n = |V|, m = |E|
Influence spread: I(S)
Marginal influence spread: M(v|S) = I(S ∪ {v}) - I(S)
Properties of I(S) under the independent cascade model
– submodularity: I(S ∪ {v}) - I(S) ≥ I(T ∪ {v}) - I(T) for all v ∈ V and S ⊆ T ⊆ V
– monotonicity: I(S ∪ {v}) ≥ I(S)
These properties guarantee the greedy approximation algorithm, which
– iteratively selects the node with the largest marginal influence spread
– provides a (1 - 1/e - ε)-approximation, where ε reflects the error of influence spread estimation

11
Preliminaries-2
Monte Carlo simulation for influence spread estimation
– approximates the true value of influence spread by averaging over random realizations
Two equivalent methods:
Simulation
– an instance: modeling the information cascade process
– advantage: relatively low time complexity
– disadvantage: estimates one seed set at a time
Snapshot [Chen, KDD’09]
– an instance: removing each edge (u, v) from G with probability 1 - p(u, v)
– advantage: can estimate any seed set simultaneously
– disadvantage: relatively high time complexity
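The snapshot method above can be sketched as follows. This is an illustrative sketch under assumed conventions: edges are given as (u, v, p) triples, and the helper names `sample_snapshot`, `reachable` and `estimate_spread` are invented here.

```python
import random

def sample_snapshot(edges, rng):
    """Keep each edge (u, v, p) with probability p, i.e. remove it with probability 1 - p."""
    snapshot = {}
    for u, v, p in edges:
        if rng.random() < p:
            snapshot.setdefault(u, []).append(v)
    return snapshot

def reachable(snapshot, seeds):
    """Nodes reachable from any seed in one snapshot (the activated set in that realization)."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def estimate_spread(snapshots, seeds):
    """Average number of reachable nodes over all snapshots."""
    return sum(len(reachable(g, seeds)) for g in snapshots) / len(snapshots)

# Hypothetical edge list; probabilities 1.0 and 0.0 keep the example deterministic.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "d", 0.0)]
rng = random.Random(0)
snapshots = [sample_snapshot(edges, rng) for _ in range(5)]

# The same snapshots serve any seed set.
print(estimate_spread(snapshots, ["a"]))  # 3.0
print(estimate_spread(snapshots, ["b"]))  # 2.0
```

Once the snapshots exist, any seed set can be scored against them without new random draws; the up-front cost is generating and storing every snapshot.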

12
Motivation
In existing greedy algorithms
– there is a risk that the estimated influence spread function is not submodular and monotone
– caused by using different Monte Carlo simulation results across different influence spread estimations
– a very large value of R is required, e.g. R = 20000
R: number of Monte Carlo simulations per estimation
[Figure: an influence graph, with snapshot 1 used in iteration 1 and snapshot 2 used in iteration 2. Submodularity is broken!]
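A tiny hand-built case makes the risk concrete. The two snapshots below are hypothetical and are not the ones from the slide's figure: both come from a graph in which node a points to b and c, but the edges survive only in the first snapshot. Scoring S = {a} on one snapshot and S ∪ {d} on the other yields a negative marginal gain, which violates monotonicity; scoring both sets on the same snapshot cannot.

```python
def spread(snapshot, seeds):
    """Number of nodes reachable from the seeds in a single snapshot."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen)

snapshot_1 = {"a": ["b", "c"]}  # both out-edges of a survived
snapshot_2 = {}                 # no edge survived

# Different snapshots for the two estimates: adding d appears to reduce the spread.
mixed_gain = spread(snapshot_2, ["a", "d"]) - spread(snapshot_1, ["a"])
print(mixed_gain)  # -1

# Same snapshot for both estimates: the marginal gain of d is never negative.
static_gains = [spread(g, ["a", "d"]) - spread(g, ["a"]) for g in (snapshot_1, snapshot_2)]
print(static_gains)  # [1, 1]
```

With independent simulations such inconsistencies only average out as R grows, which is consistent with the large R the slide reports for existing greedy algorithms.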

13
StaticGreedy algorithm
Core idea: always use the same snapshots for influence spread estimation
– the estimated influence spread function is submodular and monotone
– a small value of R is required, e.g. R = 100
Part 1: Generate R static snapshots
Part 2: Greedy selection
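The two parts above can be put together in a short sketch. This is one reading of the slide rather than the authors' implementation: the edge format (u, v, p), the function name `static_greedy`, the precomputed per-snapshot reachable sets, and the assumption R ≥ 1 are all choices made here for illustration.

```python
import random

def static_greedy(nodes, edges, k, R, seed=0):
    """Select up to k seeds, scoring every candidate on the same R snapshots (R >= 1)."""
    rng = random.Random(seed)

    def reach(adj, start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # Part 1: generate R static snapshots and record what each node reaches in each.
    reach_sets = []
    for _ in range(R):
        adj = {}
        for u, v, p in edges:
            if rng.random() < p:  # keep edge (u, v) with probability p(u, v)
                adj.setdefault(u, []).append(v)
        reach_sets.append({v: reach(adj, v) for v in nodes})

    # Part 2: greedy selection, always evaluated on those same snapshots.
    seeds = []
    covered = [set() for _ in range(R)]  # nodes already activated, per snapshot
    for _ in range(min(k, len(nodes))):
        best, best_gain = None, -1
        for v in nodes:
            if v in seeds:
                continue
            gain = sum(len(reach_sets[i][v] - covered[i]) for i in range(R))
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for i in range(R):
            covered[i] |= reach_sets[i][best]
    estimated_spread = sum(len(c) for c in covered) / R
    return seeds, estimated_spread

# Hypothetical toy input; probabilities 1.0 and 0.0 make the result deterministic.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("d", "e", 1.0), ("a", "d", 0.0)]
print(static_greedy(nodes, edges, k=2, R=3))  # (['a', 'd'], 5.0)
```

Because every marginal gain is a coverage count over fixed snapshots, the estimated spread is monotone and submodular by construction, whatever the value of R.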

14
Performance analysis: Convergence rate
– provides a (1 - 1/e - ε)-approximation with a small value of R
[Figure: d_{R,k} versus log R; seed set size = 50]
– NetHEPT: a benchmark network
– uniform independent cascade (UIC) model: p(u, v) = p = 0.01
– weighted independent cascade (WIC) model: p(u, v) = 1/(number of in-neighbors of v)
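The two probability assignments above are simple to compute from an edge list. A minimal sketch, assuming directed edges given as (u, v) pairs with no duplicates; the function names are invented for illustration.

```python
from collections import Counter

def uic_probabilities(edges, p=0.01):
    """Uniform IC: every edge gets the same propagation probability p."""
    return {(u, v): p for u, v in edges}

def wic_probabilities(edges):
    """Weighted IC: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}

# Hypothetical edge list: c has two in-neighbors, b has one.
edges = [("a", "c"), ("b", "c"), ("a", "b")]
uic = uic_probabilities(edges)
wic = wic_probabilities(edges)
print(wic[("a", "c")], wic[("a", "b")])  # 0.5 1.0
```

Under WIC the incoming probabilities of each node sum to 1, so a node with many in-neighbors is harder to activate through any single one of them.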

15
Performance analysis: Scalability
[Figure, two panels: "Minimal R required" (log R_min versus seed set size) and "Running time" (log running time in seconds); annotations: ≈10^3 times, ≈10^2 times]
– R is significantly reduced
– Running time is significantly reduced

16
Performance analysis: Complexity
– n: number of nodes in the social influence graph
– m: number of edges in the social influence graph
– m': expected number of edges in a snapshot

17
Speed up StaticGreedy
A dynamic update strategy
– calculates the marginal gain in an efficient incremental manner
– at each step t, for each snapshot: M(v) ← M(v) - |R(v) ∩ R(v_t*)|, R(v) ← R(v) - (R(v) ∩ R(v_t*))
– trades space for time
R(v): the set of nodes reachable from v in the snapshot
[Figure: a snapshot with nodes v1 to v8; initial values M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1]

18
Speed up StaticGreedy
A dynamic update strategy
– calculates the marginal gain in an efficient incremental manner
– at each step t, for each snapshot: M(v) ← M(v) - |R(v) ∩ R(v_t*)|, R(v) ← R(v) - (R(v) ∩ R(v_t*))
– trades space for time
R(v): the set of nodes reachable from v in the snapshot
[Figure: the same snapshot, updated directly after selecting v* = v1]

Node     v1  v2  v3  v4  v5  v6  v7  v8
Before    4   3   2   1   1   1   2   1
After     0   2   0   0   1   0   2   1

The largest decreases are M(v1) (-4) and M(v3) (-2).
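The update rule can be checked against the numbers on the slide. The edges of the pictured snapshot are not recoverable from the transcript, so the adjacency below is an assumption: it is one edge set that reproduces the initial M values. The function names are also invented for this sketch.

```python
def reachable_sets(nodes, adj):
    """R(v) for every node v: the nodes reachable from v in one snapshot (including v)."""
    result = {}
    for s in nodes:
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj.get(u, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result[s] = seen
    return result

def select_and_update(R_sets, M, chosen):
    """In place: M(v) <- M(v) - |R(v) & R(v*)| and R(v) <- R(v) - (R(v) & R(v*))."""
    covered = set(R_sets[chosen])  # copy, since R_sets[chosen] is itself updated below
    for v in R_sets:
        overlap = R_sets[v] & covered
        M[v] -= len(overlap)
        R_sets[v] -= overlap

# Assumed snapshot edges, chosen only to match the slide's initial M values.
nodes = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
adj = {"v1": ["v3", "v6"], "v3": ["v4"], "v2": ["v5", "v6"], "v7": ["v8"]}

R_sets = reachable_sets(nodes, adj)
M = {v: len(R_sets[v]) for v in nodes}
initial = dict(M)              # v1=4, v2=3, v3=2, v4=1, v5=1, v6=1, v7=2, v8=1

select_and_update(R_sets, M, "v1")
print(M)                       # v1=0, v2=2, v3=0, v4=0, v5=1, v6=0, v7=2, v8=1
```

Storing R(v) for every node in every snapshot is the space cost; in return, each step only subtracts overlaps instead of recomputing reachability from scratch.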

19
Experiments: setup
Algorithms:
– Our algorithms: StaticGreedyCELF, StaticGreedyDU
– Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount
Tested datasets
Independent cascade models
– uniform independent cascade (UIC) model: p(u, v) = p = 0.01
– weighted independent cascade (WIC) model: p(u, v) = 1/(number of in-neighbors of v)
Metrics: influence spread, running time

20
Experiments: influence spread
– StaticGreedy achieves better accuracy than the heuristics
[Figure: influence spread on NetPHY and DBLP under the UIC and WIC models]

21
Experiments: running time
– StaticGreedy runs >10^3 times faster than CELFGreedy
– StaticGreedy has comparable scalability to state-of-the-art heuristics
– StaticGreedyDU always runs faster than StaticGreedyCELF
[Figure: log running time (sec) under the UIC and WIC models]

22
Conclusion
Essential reason for the inefficiency of existing greedy algorithms
– a risk of unguaranteed submodularity and monotonicity
– caused by using different Monte Carlo simulations across different estimations
– a very large value of R is required
– result: guaranteed accuracy + inefficiency
StaticGreedy algorithm
– guaranteed submodularity and monotonicity
– uses the same Monte Carlo simulations across different estimations
– a small value of R is required
– result: guaranteed accuracy + high scalability
– runs >10^3 times faster than conventional greedy algorithms
A dynamic update strategy to speed up StaticGreedy
– about 10 times faster

23
Thank you! Q & A
