Published by Felix Barnett. Modified about 1 year ago.

1
**StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization**

Suqi Cheng

Research Center of Web Data Sciences & Engineering, Institute of Computing Technology, Chinese Academy of Sciences

Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

2
**Outline**

- Background
- Preliminaries
- Motivation
- StaticGreedy algorithm
- Experiments

3
**Information Cascade**

An action or idea is adopted by one user after another as social influence cascades through social relationships.

Main applications
- Word-of-mouth marketing
- Outbreak detection
- Popularity prediction

[Figure: social network]

4
**Word-of-Mouth Marketing**

To promote a product by seeding a few users; users who adopt the product will recommend it to others.

Advantages: efficient; cost-effective

[Figure: a company gives seed users a free product or discount; the seed users then influence follow-up activated users]

How to select the optimal seed users?

5
**Influence Maximization for Viral Marketing**

Objective function
- Influence spread I(S): expected number of activated (influenced/adopted) nodes
- Maximize I(S)

Input
- A social influence graph G=(V, E)
- An information cascade model
- An integer k, |S| ≤ k

Output: A seed set S

6
**Information Cascade Model**

Independent cascade (IC) model
- each edge (u, v) has a propagation probability p(u, v)
- each newly activated node u independently activates its out-neighbor v with probability p(u, v)
- a discrete-time model

Influence spread estimation on the IC model
- Monte Carlo simulation
- Heuristic methods

[Figure: social influence graph with a propagation probability on each edge [Leskovec, 2008]]
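The cascade rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is a plain dict mapping each node to a list of `(out_neighbor, probability)` pairs, and the names `simulate_ic` and `estimate_spread` are hypothetical, not taken from the presentation.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade simulation and return the activated nodes.

    graph: dict mapping node u -> list of (v, p_uv) out-edges (assumed format).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly activated u gets a single chance to activate each
            # inactive out-neighbor v, succeeding with probability p(u, v).
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of I(S): average cascade size over `runs` simulations."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs))
    return total / runs
```

Each call to `estimate_spread` evaluates one seed set; comparing many candidate seed sets means repeating the simulations for each of them.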

7
**Difficulties in Influence Maximization**

Difficulty 1: The influence maximization problem is NP-hard [Kempe, KDD’03].

Existing solutions
- Heuristics: Degree, PageRank, Betweenness (efficient but inaccurate)
- Greedy approximation algorithm [Kempe, KDD’03] (accurate but inefficient)
  - (1-1/e-ε)-approximation
  - iteratively selects the node with the largest marginal influence spread
  - guaranteed by the submodularity and monotonicity of the influence spread function

8
**Difficulties in Influence Maximization**

Difficulty 2: Computing the influence spread exactly is #P-hard [Chen, KDD’10].

Existing solutions
- Monte Carlo simulation (accurate but time-consuming)
  - CELF optimization [Leskovec, KDD’07]
  - NewGreedy [Chen, KDD’09]
  - CELF++ optimization [Goyal, WWW’11]
- Heuristic methods (efficient but inaccurate)
  - DegreeDiscount [Chen, KDD’09]
  - CGA [Wang, KDD’10]
  - PMIA [Chen, KDD’10]
  - IRIE [Jung, ICDM’12]

A scalability-accuracy dilemma!

9
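**Our work**

Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma

| Category | Algorithm | Accuracy | Scalability |
| --- | --- | --- | --- |
| Approximate algorithms | Greedy [Kempe, KDD’03] | guaranteed | low |
| Approximate algorithms | GreedyCELF [Leskovec, KDD’07] | guaranteed | low |
| Approximate algorithms | GreedyCELF++ [Goyal, WWW’11] | guaranteed | low |
| Approximate algorithms | NewGreedy/MixedGreedy [Chen, KDD’09] | guaranteed | low |
| Approximate algorithms | StaticGreedy [Cheng, CIKM’13] | guaranteed | high |
| Heuristics | Degree | unguaranteed | high |
| Heuristics | PageRank [Page, 1999] | unguaranteed | high |
| Heuristics | DegreeDiscount | unguaranteed | high |
| Heuristics | PMIA [Chen, KDD’10] | unguaranteed | high |
| Heuristics | IRIE [Jung, ICDM’12] | unguaranteed | high |
| Heuristics | SP1M [Kimura, PKDD’06] | unguaranteed | relatively low |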

10
**Preliminaries-1**

Social influence graph: G=(V, E), n=|V|, m=|E|

Influence spread: I(S)

Marginal influence spread: $M(v \mid S) = I(S \cup \{v\}) - I(S)$

Properties of I(S) under the independent cascade model
- submodularity: $I(S \cup \{v\}) - I(S) \ge I(T \cup \{v\}) - I(T)$ for all $v \in V$ and $S \subseteq T \subseteq V$
- monotonicity: $I(S \cup \{v\}) \ge I(S)$

These two properties guarantee that the greedy approximation algorithm, which iteratively selects the node with the largest marginal influence spread, provides a (1-1/e-ε)-approximation. The algorithm relies on influence spread estimation.
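The greedy selection rule can be sketched as follows. This is a minimal illustration, assuming the influence spread is available as a callable `spread(seed_list)`; the function name `greedy_select` and the toy coverage function in the usage lines are hypothetical stand-ins, not part of the presentation.

```python
def greedy_select(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    spread: callable taking a list of seeds and returning an estimate of I(S).
    """
    seeds = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            # Marginal influence spread M(v | S) = I(S + {v}) - I(S).
            gain = spread(seeds + [v]) - current
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k candidate nodes
            break
        seeds.append(best)
        current += best_gain
    return seeds


# Toy usage: a coverage function is monotone and submodular, like I(S).
cover = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
coverage = lambda S: len(set().union(*(cover[v] for v in S)))
chosen = greedy_select(["a", "b", "c"], 2, coverage)
```

With a noisy `spread` estimate, the computed gains are themselves noisy, which is where the ε in the approximation ratio enters.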

11
**Preliminaries-2**

Monte Carlo simulation for influence spread estimation: to approximate the true value of the influence spread by averaging over realizations

| Method | An instance | Advantage | Disadvantage |
| --- | --- | --- | --- |
| simulation | modeling the information cascade process | relatively low time complexity | estimates one seed set at a time |
| snapshot [Chen, KDD’09] | removing each edge (u, v) from G with probability 1-p(u, v) | can estimate any seed set simultaneously | relatively high time complexity |

The two methods are equivalent.
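The snapshot method can be sketched in Python as below. It assumes the same hypothetical graph format as a dict of `(out_neighbor, probability)` lists; `sample_snapshot`, `reachable`, and `snapshot_spread` are illustrative names only.

```python
import random


def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v), i.e. remove it with 1 - p(u, v)."""
    return {
        u: [v for v, p in edges if rng.random() < p]
        for u, edges in graph.items()
    }


def reachable(snapshot, seeds):
    """Return all nodes reachable from the seed set in one snapshot (seeds included)."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def snapshot_spread(snapshots, seeds):
    """Estimate I(S) as the average number of reachable nodes over the snapshots."""
    return sum(len(reachable(s, seeds)) for s in snapshots) / len(snapshots)
```

Once the snapshots are sampled, `snapshot_spread` can be called for any number of seed sets without drawing new random numbers.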

12
**Motivation**

In existing greedy algorithms
- there is a risk that the submodularity and monotonicity of the influence spread function are not guaranteed, because different Monte Carlo simulation results are used across different influence spread estimations
- a very large value of R is required, e.g. R=20000

R: number of Monte Carlo simulations for estimation

[Figure: an influence graph, with snapshot 1 used in iteration 1 and snapshot 2 used in iteration 2. Submodularity is broken!]

13
**StaticGreedy algorithm**

Core idea: always use the same snapshots for influence spread estimation
- the influence spread function is then submodular and monotone
- a small value of R is required, e.g. R=100

Part 1: Generate R static snapshots

Part 2: Greedy selection
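The two parts can be put together in a short Python sketch. It is an illustration under assumptions, not the authors' implementation: the graph is a dict of `(out_neighbor, probability)` lists, and the names `generate_snapshots`, `reachable`, and `static_greedy` are hypothetical.

```python
import random


def generate_snapshots(graph, num_snapshots, seed=0):
    """Part 1: sample the snapshots once; they stay fixed afterwards."""
    rng = random.Random(seed)
    return [
        {u: [v for v, p in edges if rng.random() < p] for u, edges in graph.items()}
        for _ in range(num_snapshots)
    ]


def reachable(snapshot, source):
    """Nodes reachable from `source` in one snapshot, including `source`."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def static_greedy(graph, k, num_snapshots=100, seed=0):
    """Part 2: greedy selection that reuses the same snapshots in every iteration."""
    nodes = list(dict.fromkeys([*graph, *(v for es in graph.values() for v, _ in es)]))
    snapshots = generate_snapshots(graph, num_snapshots, seed)
    covered = [set() for _ in snapshots]  # nodes already reached, per snapshot
    seeds = []
    for _ in range(min(k, len(nodes))):
        best, best_gain = None, -1
        for v in nodes:
            if v in seeds:
                continue
            # Marginal gain summed over snapshots: newly reached nodes only.
            gain = sum(len(reachable(s, v) - c) for s, c in zip(snapshots, covered))
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for s, c in zip(snapshots, covered):
            c |= reachable(s, best)
    return seeds
```

Because every estimate is a coverage count on the same fixed snapshots, the estimated spread is monotone and submodular by construction. This sketch recomputes reachability for every candidate and is meant for small graphs only.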

14
**Performance analysis: Convergence rate**

StaticGreedy provides a (1-1/e-ε)-approximation with a small value of R.

[Figure: $d_{R,k}$ versus log R, seed set size = 50]

- NetHEPT: a benchmark network
- uniform independent cascade (UIC) model: p(u, v) = p = 0.01
- weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
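The two probability assignments can be written as small Python helpers. This is a sketch that assumes the graph is given as a list of distinct directed `(u, v)` edges; the function names are hypothetical.

```python
from collections import Counter


def uic_probabilities(edges, p=0.01):
    """UIC model: every edge gets the same propagation probability p."""
    return {(u, v): p for u, v in edges}


def wic_probabilities(edges):
    """WIC model: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = Counter(v for _, v in edges)
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}
```

For example, a node with four in-neighbors receives probability 0.25 on each incoming edge under the WIC model.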

15
**Performance analysis: Scalability**

[Figure, panel "Minimal R required": log $R_{min}$ versus seed set size; reduction of ≈$10^2$ times]

[Figure, panel "Running time": log running time (sec) versus seed set size; reduction of ≈$10^3$ times]

- R is significantly reduced
- Running time is significantly reduced

16
**Performance analysis: Complexity**

- n: number of nodes in the social influence graph
- m: number of edges in the social influence graph
- m’: expected number of edges in a snapshot

17
**Speed up StaticGreedy**

A dynamic update strategy
- calculates the marginal gain in an efficient incremental manner
- at each step t, for each snapshot: $M(v) \leftarrow M(v) - |R(v) \cap R(v_t^*)|$, $R(v) \leftarrow R(v) - R(v) \cap R(v_t^*)$
- trades space for time

R(v): reachable nodes from v in the snapshot

[Figure: a snapshot over nodes v1 to v8 in its initial state, with M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1]

18
**Speed up StaticGreedy**

A dynamic update strategy (continued): after selecting v* = v1, the marginal gains in the snapshot are updated directly.

| Node | M(v) before | Change | M(v) after |
| --- | --- | --- | --- |
| v1 | 4 | -4 | 0 |
| v2 | 3 | -1 | 2 |
| v3 | 2 | -2 | 0 |
| v4 | 1 | -1 | 0 |
| v5 | 1 | 0 | 1 |
| v6 | 1 | -1 | 0 |
| v7 | 2 | 0 | 2 |
| v8 | 1 | 0 | 1 |
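The update rule for a single snapshot can be sketched in Python as follows. The edge list in the example is an assumption: the slide's figure is not recoverable, so the edges were chosen only to be consistent with the M(v) values shown on the slide. The function names are hypothetical.

```python
def reachable(snapshot, source):
    """Nodes reachable from `source` in the snapshot, including `source`."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def init_gains(snapshot, nodes):
    """Store R(v) for every node (space for time) and set M(v) = |R(v)|."""
    reach = {v: reachable(snapshot, v) for v in nodes}
    gains = {v: len(r) for v, r in reach.items()}
    return reach, gains


def select_and_update(reach, gains, chosen):
    """Apply M(v) <- M(v) - |R(v) & R(v*)| and R(v) <- R(v) - R(v*) for all v."""
    covered = set(reach[chosen])  # copy, since reach[chosen] is modified below
    for v in reach:
        overlap = reach[v] & covered
        gains[v] -= len(overlap)
        reach[v] -= overlap


# Assumed snapshot edges, consistent with the initial M(v) values on the slide.
snapshot = {"v1": ["v3", "v4"], "v3": ["v6"], "v2": ["v4", "v5"], "v7": ["v8"]}
nodes = [f"v{i}" for i in range(1, 9)]
reach, gains = init_gains(snapshot, nodes)
initial = dict(gains)
select_and_update(reach, gains, "v1")
```

For brevity this sketch scans every node on each update; an implementation aiming at the speed-up described on the slide would presumably touch only nodes whose reachable set overlaps R(v*).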

19
**Experiments: setup**

Algorithms
- Our algorithms: StaticGreedyCELF, StaticGreedyDU
- Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount

[Table: tested datasets]

Independent cascade models
- uniform independent cascade (UIC) model: p(u, v) = p = 0.01
- weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)

Metrics: influence spread, running time

20
**Experiments: influence spread**

StaticGreedy achieves better accuracy than the heuristics.

[Figure: influence spread on NetPHY (UIC model, WIC model) and DBLP (UIC model, WIC model)]

21
**Experiments: running time**

- StaticGreedy runs >$10^3$ times faster than CELFGreedy
- StaticGreedy has comparable scalability to state-of-the-art heuristics
- StaticGreedyDU always runs faster than StaticGreedyCELF

[Figure: log running time (sec) under the UIC model and the WIC model]

22
**Conclusion**

Essential reason for the inefficiency of existing greedy algorithms
- a risk of unguaranteed submodularity and monotonicity, caused by using different Monte Carlo simulations across different estimations
- a very large value of R is required
- guaranteed accuracy + inefficiency

StaticGreedy algorithm
- guaranteed submodularity and monotonicity, by using the same Monte Carlo simulations across different estimations
- a small value of R is required
- guaranteed accuracy + high scalability
- runs >$10^3$ times faster than conventional greedy algorithms

A dynamic update strategy to speed up StaticGreedy
- about 10 times faster

23
Thank you! Q & A
