Suqi Cheng Research Center of Web Data Sciences & Engineering

StaticGreedy: Solving the Scalability-Accuracy Dilemma in Influence Maximization
Suqi Cheng
Research Center of Web Data Sciences & Engineering
Institute of Computing Technology, Chinese Academy of Sciences
Authors: Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng

Outline
  Background
  Preliminaries
  Motivation
  StaticGreedy algorithm
  Experiments

Information Cascade
An action or idea is adopted by one person after another as social influence cascades through social relationships.
Main applications: word-of-mouth marketing, outbreak detection, popularity prediction
[Figure: social network]

Word-of-Mouth Marketing
To promote a product by seeding a few users; users who adopt the product will recommend it to others.
Advantages: efficient; cost-effective
[Figure: a company gives a free product or discount to seed users, whose influence activates follow-up users]
How to select the optimal seed users?

Influence Maximization for Viral Marketing
Objective function
  Influence spread I(S): expected number of activated (influenced/adopted) nodes
  Maximize I(S)
Input
  A social influence graph G=(V, E)
  An information cascade model
  An integer k, with |S| ≤ k
Output
  A seed set S

Information Cascade Model
Independent cascade (IC) model
  each edge (u, v) has a propagation probability p(u, v)
  each newly activated node u independently activates its out-neighbor v with probability p(u, v)
  a discrete-time model
Influence spread estimation on the IC model
  Monte Carlo simulation
  Heuristic methods
[Figure: social influence graph with a propagation probability on each edge [Leskovec, 2008]]
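The IC model and its Monte Carlo estimate can be sketched in a few lines. This is a minimal illustration, not the authors' code: the adjacency format (node mapped to a list of `(neighbor, probability)` pairs), the function names, and the toy graph are all assumptions made for the example.

```python
import random

def simulate_ic(graph, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    graph maps each node u to a list of (v, p_uv) out-edges (assumed format).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:  # one loop pass corresponds to one discrete time step
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated u tries to activate each inactive out-neighbor v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

def estimate_spread(graph, seeds, num_runs, rng):
    """Monte Carlo estimate of I(S): average cascade size over num_runs runs."""
    total = sum(len(simulate_ic(graph, seeds, rng)) for _ in range(num_runs))
    return total / num_runs

# Hypothetical toy graph with probabilities 0 and 1 so the result is deterministic.
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
print(estimate_spread(toy_graph, ["a"], 100, random.Random(0)))  # 3.0
```

With realistic probabilities the estimate varies from run to run, which is why the number of simulations matters.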

Difficulties in Influence Maximization
Difficulty 1: The influence maximization problem is NP-hard. [Kempe, KDD’03]
Existing solutions
  Heuristics (Degree, PageRank, Betweenness): efficient but inaccurate
  Greedy approximation algorithm [Kempe, KDD’03]: accurate but inefficient
    (1-1/e-ε)-approximation
    iteratively selects the node with the largest marginal influence spread
    guaranteed by the submodularity and monotonicity of the influence spread function

Difficulties in Influence Maximization
Difficulty 2: Computing influence spread exactly is #P-hard. [Chen, KDD’10]
Existing solutions
  Monte Carlo simulation: accurate but time-consuming
    CELF optimization [Leskovec, KDD’07]
    NewGreedy [Chen, KDD’09]
    CELF++ optimization [Goyal, WWW’11]
  Heuristic methods: efficient but inaccurate
    DegreeDiscount [Chen, KDD’09]
    CGA [Wang, KDD’10]
    PMIA [Chen, KDD’10]
    IRIE [Jung, ICDM’12]
A scalability-accuracy dilemma!

Our work
Objective: to propose an influence maximization algorithm that solves the scalability-accuracy dilemma
Algorithm | Accuracy | Scalability
Approximation algorithms
  Greedy [Kempe, KDD’03] | guaranteed | low
  GreedyCELF [Leskovec, KDD’07] | guaranteed | low
  GreedyCELF++ [Goyal, WWW’11] | guaranteed | low
  NewGreedy/MixedGreedy [Chen, KDD’09] | guaranteed | low
  StaticGreedy [Cheng, CIKM’13] | guaranteed | high
Heuristics
  Degree | unguaranteed | high
  PageRank [Page, 1999] | unguaranteed | high
  DegreeDiscount | unguaranteed | high
  PMIA [Chen, KDD’10] | unguaranteed | high
  IRIE [Jung, ICDM’12] | unguaranteed | high
  SP1M [Kimura, PKDD’06] | unguaranteed | relatively low

Preliminaries-1
Social influence graph: G=(V, E), n=|V|, m=|E|
Influence spread: I(S)
Marginal influence spread: M(v|S) = I(S ∪ {v}) - I(S)
Properties of I(S) under the independent cascade model
  submodularity: I(S ∪ {v}) - I(S) ≥ I(T ∪ {v}) - I(T) for all v ∈ V and S ⊆ T ⊆ V
  monotonicity: I(S ∪ {v}) ≥ I(S)
These properties guarantee that the greedy approximation algorithm, which iteratively selects the node with the largest marginal influence spread, provides a (1-1/e-ε)-approximation.
The greedy algorithm relies on influence spread estimation.
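The greedy selection rule can be illustrated independently of any cascade model. The sketch below assumes a `spread` callable standing in for I(S); here it is a hypothetical set-coverage function (which is monotone and submodular), not an influence estimate, and all names are illustrative.

```python
def greedy_select(nodes, spread, k):
    """Pick k seeds, each time adding the node with the largest marginal gain.

    Assumes k does not exceed the number of nodes.
    """
    seeds = []
    for _ in range(k):
        base = spread(seeds)
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: spread(seeds + [v]) - base)  # M(v|S)
        seeds.append(best)
    return seeds

# Hypothetical stand-in for I(S): number of nodes covered by the seeds' reach sets.
reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "c": {"c"},
         "d": {"d", "e"}, "e": {"e"}}

def coverage(seeds):
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)

chosen = greedy_select(sorted(reach), coverage, 2)
print(chosen, coverage(chosen))  # ['a', 'd'] 5
```

After "a" is chosen, "b" has zero marginal gain even though it covers two nodes on its own, which is the diminishing-returns behavior that submodularity describes.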

Preliminaries-2
Monte Carlo simulation for influence spread estimation: approximating the true value of influence spread using realizations
Method | An instance | Advantage | Disadvantage
simulation | models the information cascade process | relatively low time complexity | estimates one seed set at a time
snapshot [Chen, KDD’09] | removes each edge (u, v) from G with probability 1-p(u, v) | can estimate any seed set simultaneously | relatively high time complexity
The two methods are equivalent.
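The snapshot method can be sketched as follows: sample a subgraph by keeping each edge with probability p(u, v), then count the nodes reachable from the seed set. The graph format, function names, and toy graph are assumptions for illustration only.

```python
import random

def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v), i.e. remove it with 1 - p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p]
            for u, edges in graph.items()}

def reachable(snapshot, seeds):
    """Nodes reachable from the seeds in one snapshot (iterative DFS)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def estimate_spread(snapshots, seeds):
    """Average number of reachable nodes over the given snapshots."""
    return sum(len(reachable(s, seeds)) for s in snapshots) / len(snapshots)

# Hypothetical toy graph; probabilities 0 and 1 make the outcome deterministic.
toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)], "c": [], "d": []}
rng = random.Random(0)
snapshots = [sample_snapshot(toy_graph, rng) for _ in range(5)]
print(estimate_spread(snapshots, ["a"]), estimate_spread(snapshots, ["c"]))  # 3.0 1.0
```

Once the snapshots exist, any seed set can be evaluated against them without new random draws.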

Motivation
In existing greedy algorithms
  there is a risk that the submodularity and monotonicity of the estimated influence spread function are not guaranteed, because different Monte Carlo simulation results are used across different influence spread estimations
  a very large value of R is therefore required, e.g. R=20000
R: number of Monte Carlo simulations for estimation
[Figure: an influence graph with snapshot 1 used in iteration 1 and snapshot 2 used in iteration 2. Submodularity is broken!]

StaticGreedy algorithm
Core idea: always use the same snapshots for influence spread estimation
  the estimated influence spread function is submodular and monotone
  a small value of R suffices, e.g. R=100
Part 1: Generate R static snapshots
Part 2: Greedy selection
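The two parts can be sketched as below. This is a naive illustration under assumed names and an assumed graph format (every node appears as a key mapped to `(neighbor, probability)` pairs); it recomputes reachability at every step and omits the CELF and dynamic-update optimizations, so it is not the authors' implementation.

```python
import random

def sample_snapshot(graph, rng):
    """Keep each edge (u, v) with probability p(u, v)."""
    return {u: [v for v, p in edges if rng.random() < p]
            for u, edges in graph.items()}

def reachable(snapshot, source):
    """Nodes reachable from source in one snapshot (iterative DFS)."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in snapshot[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def static_greedy(graph, k, num_snapshots, rng):
    """Assumes k does not exceed the number of nodes."""
    # Part 1: generate the R snapshots once; they are reused in every iteration.
    snapshots = [sample_snapshot(graph, rng) for _ in range(num_snapshots)]
    covered = [set() for _ in snapshots]  # nodes already reached by the seeds
    seeds = []
    # Part 2: greedy selection on the fixed snapshots.
    for _ in range(k):
        best, best_gain = None, -1
        for v in graph:
            if v in seeds:
                continue
            # Marginal gain summed over snapshots: newly reached nodes only.
            gain = sum(len(reachable(s, v) - c)
                       for s, c in zip(snapshots, covered))
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        for s, c in zip(snapshots, covered):
            c |= reachable(s, best)
    return seeds

# Hypothetical toy graph with probability-1 edges, so the result is deterministic.
toy_graph = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": [],
             "d": [("e", 1.0)], "e": []}
print(static_greedy(toy_graph, 2, 10, random.Random(0)))  # ['a', 'd']
```

Because every marginal gain is measured on the same fixed snapshots, the estimated spread is an average of coverage functions and is therefore monotone and submodular by construction.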

Performance analysis: Convergence rate
Provides a (1-1/e-ε)-approximation with a small value of R
NetHEPT: a benchmark network
uniform independent cascade (UIC) model: p(u, v) = p = 0.01
weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
[Figure: dR,k plotted against log R; seed set size = 50]
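The two probability assignments are easy to state in code. The sketch below assumes the graph is given as a list of directed `(u, v)` edges without duplicates, so that the in-degree of v equals its number of in-neighbors; the function names and the edge list are illustrative.

```python
def uic_probabilities(edges, p=0.01):
    """Uniform IC model: every edge gets the same probability p."""
    return {(u, v): p for u, v in edges}

def wic_probabilities(edges):
    """Weighted IC model: p(u, v) = 1 / (number of in-neighbors of v)."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}

# Hypothetical edge list: c has two in-neighbors, b has one.
edges = [("a", "c"), ("b", "c"), ("a", "b")]
print(uic_probabilities(edges)[("a", "c")])  # 0.01
print(wic_probabilities(edges))  # {('a', 'c'): 0.5, ('b', 'c'): 0.5, ('a', 'b'): 1.0}
```

Under WIC the probabilities on the edges entering any node sum to 1.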

Performance analysis: Scalability
R is significantly reduced (≈10^2 times)
Running time is significantly reduced (≈10^3 times)
[Figure: two panels plotted against seed set size. Minimal R required (axis: log Rmin) and running time (axis: log running time (sec))]

Performance analysis: Complexity
n: number of nodes in the social influence graph
m: number of edges in the social influence graph
m’: expected number of edges in a snapshot

Speed up StaticGreedy
A dynamic update strategy
  calculates the marginal gain in an efficient incremental manner
  at each step t, for each snapshot: M(v) ← M(v) - |R(v) ∩ R(v_t*)|, R(v) ← R(v) - R(v) ∩ R(v_t*)
  trades space for time
R(v): set of nodes reachable from v in the snapshot
v_t*: the node selected at step t
[Figure: initial snapshot over nodes v1 to v8, with M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1]

Speed up StaticGreedy (continued)
After selecting v* = v1, the marginal gains in the snapshot are updated directly:
  before: M(v1)=4, M(v2)=3, M(v3)=2, M(v4)=1, M(v5)=1, M(v6)=1, M(v7)=2, M(v8)=1
  after: M(v1)=0, M(v2)=2, M(v3)=0, M(v4)=0, M(v5)=1, M(v6)=0, M(v7)=2, M(v8)=1
[Figure: the same snapshot with the nodes reached by v1 crossed out; updates of -4 at v1, -1 at v2, -2 at v3, -1 at v4, -1 at v6]
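The update rule can be checked on a single snapshot. The edge set below is an assumption: the slide's figure is not preserved, so this is one hypothetical graph chosen to be consistent with the marginal gains listed above. The function names are illustrative, and the update loops over all nodes for clarity rather than speed.

```python
def reach_sets(snapshot):
    """R(v) for every node v of one snapshot (plain DFS per node)."""
    result = {}
    for source in snapshot:
        seen, stack = {source}, [source]
        while stack:
            u = stack.pop()
            for w in snapshot[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        result[source] = seen
    return result

def select_and_update(reach, gain, chosen):
    """Apply M(v) <- M(v) - |R(v) & R(v*)| and R(v) <- R(v) - R(v*) in place."""
    covered = set(reach[chosen])  # copy: reach[chosen] is itself shrunk below
    for v in reach:
        overlap = reach[v] & covered
        gain[v] -= len(overlap)
        reach[v] -= overlap

# Hypothetical snapshot consistent with the gains on the slide.
snapshot = {"v1": ["v3", "v4"], "v2": ["v4", "v5"], "v3": ["v6"], "v4": [],
            "v5": [], "v6": [], "v7": ["v8"], "v8": []}
reach = reach_sets(snapshot)
gain = {v: len(r) for v, r in reach.items()}
print(gain)  # v1=4, v2=3, v3=2, v4=1, v5=1, v6=1, v7=2, v8=1
select_and_update(reach, gain, "v1")
print(gain)  # v1=0, v2=2, v3=0, v4=0, v5=1, v6=0, v7=2, v8=1
```

Storing R(v) for every node in every snapshot is the space cost that the slide's "trades space for time" refers to.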

Experiments: setup
Algorithms
  Our algorithms: StaticGreedyCELF, StaticGreedyDU
  Baselines: CELFGreedy, SP1M, PMIA, Degree, DegreeDiscount
[Table: tested datasets]
Independent cascade models
  uniform independent cascade (UIC) model: p(u, v) = p = 0.01
  weighted independent cascade (WIC) model: p(u, v) = 1/(# of in-neighbors of v)
Metrics: influence spread, running time

Experiments: influence spread
StaticGreedy achieves better accuracy than the heuristic baselines
[Figure: influence spread on NetPHY and DBLP, each under the UIC model and the WIC model]

Experiments: running time
StaticGreedy runs >10^3 times faster than CELFGreedy
StaticGreedy has scalability comparable to state-of-the-art heuristics
StaticGreedyDU always runs faster than StaticGreedyCELF
[Figure: log running time (sec) under the UIC model and the WIC model]

Conclusion
Essential reason for the inefficiency of existing greedy algorithms
  a risk of unguaranteed submodularity and monotonicity, caused by using different Monte Carlo simulations across different estimations
  a very large value of R is required → guaranteed accuracy + inefficiency
StaticGreedy algorithm
  guaranteed submodularity and monotonicity, by using the same Monte Carlo simulations across different estimations
  a small value of R is required → guaranteed accuracy + high scalability
  runs >10^3 times faster than conventional greedy algorithms
A dynamic update strategy to speed up StaticGreedy
  about 10 times faster

Thank you! Q & A
