Efficient Influence Maximization in Large-scale Social Networks

1 Efficient Influence Maximization in Large-scale Social Networks
Frontier Seminar on Computational Social Science and Social Computing. Efficient Influence Maximization in Large-scale Social Networks. Chuan Zhou (周川), Institute of Information Engineering, CAS. August 27, 2016

2 Efficient Influence Maximization in Large-scale Social Networks
Contents: 1 Introduction; 2 Upper Bound Method; 3 Subgraph Stream Method; 4 Network Coarsening Method

4 1.1 Background Social networks are popularly used
Viral marketing; Information dissemination; Technology/Idea transfers; Influence propagation; Influence maximization; Community detection; Influence inference; Early warning of public opinion; Link Prediction / Friend Recommendation; Partner Recommendation / Social Cooperation / Team Formation

6 1.2 Problem Formulation
Given a directed social graph G=(V,E), a budget k, and a stochastic propagation model M, find k nodes such that the expected spread of influence is maximized [Kempe KDD'03]. Challenges: How to measure the objective function M(S)? How to find the optimal solution, i.e., the subset of the k most influential nodes?
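Written out (a standard way to state the Kempe et al. formulation, using the slides' notation M(S) for the expected spread; the arg-max form below is our phrasing, not copied from the slides):

```latex
\[
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq V,\ |S| = k} \; M(S),
\qquad
M(S) \;=\; \mathbb{E}\big[\,\lvert \text{nodes activated when } S \text{ is seeded under the model} \rvert\,\big].
\]
```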

7 1.2 Problem Formulation: How to measure the influence M(S)?
Stochastic propagation models: the IC model, the LT model, and other propagation models, e.g. continuous-time IC or LT models. M(S) is estimated by Monte-Carlo (MC) simulation; exact calculation under both IC and LT is #P-hard (Chen, KDD'10). (Illustration: an example IC propagation graph over nodes a-i with edge probabilities between 0.1 and 0.4.)
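To make the Monte-Carlo approach concrete, here is a minimal sketch of IC-model spread estimation; the adjacency-list format, function name and parameters are our own illustrative choices, not code from the talk:

```python
import random

def ic_spread_mc(graph, seeds, num_sims=10000):
    """Estimate M(S) under the Independent Cascade model by Monte-Carlo simulation.

    graph: dict mapping each node to a list of (neighbor, propagation_probability) pairs.
    seeds: iterable of seed nodes S.
    """
    total = 0
    for _ in range(num_sims):
        active = set(seeds)          # nodes activated so far
        frontier = list(seeds)       # nodes activated in the previous step
        while frontier:
            next_frontier = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    # each newly active node gets one chance to activate each inactive neighbor
                    if v not in active and random.random() < p:
                        active.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        total += len(active)
    return total / num_sims
```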

8 1.3 Greedy Algorithm: How to find the subset of k most influential nodes? Influence maximization under both the IC and LT models is NP-hard (Kempe, KDD'03; via a reduction from the set cover problem). Property 1: M(S) is monotone. Property 2: M(S) is submodular.
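The two properties left blank on the slide are usually stated as follows, for all S ⊆ T ⊆ V and any node v ∉ T:

```latex
\[
\text{monotonicity:}\quad M(S) \;\le\; M(T),
\qquad
\text{submodularity:}\quad M(S \cup \{v\}) - M(S) \;\ge\; M(T \cup \{v\}) - M(T).
\]
```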

9 1.3 Greedy Algorithm
Advantage: performance guarantee of 1 − 1/e ≈ 63%. Disadvantage: heavy computation cost. Inner loop: estimating M(S) needs many Monte-Carlo simulations. Outer loop: time complexity of O(Nk), where N is the network size.
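A minimal sketch of that plain greedy loop, showing where the two costs arise; it reuses the ic_spread_mc estimator sketched earlier and is illustrative rather than the talk's actual code:

```python
def greedy_im(graph, k, num_sims=10000):
    """Plain greedy seed selection (Kempe et al. style): k outer iterations,
    each scanning every remaining node and estimating its marginal gain by
    Monte-Carlo simulation -- the cost that CELF and UBLF later reduce.
    Assumes every node appears as a key of `graph`."""
    seeds = set()
    for _ in range(k):
        best_node, best_gain = None, -1.0
        base = ic_spread_mc(graph, seeds, num_sims) if seeds else 0.0
        for v in graph:                       # outer scan: O(N) candidates
            if v in seeds:
                continue
            gain = ic_spread_mc(graph, seeds | {v}, num_sims) - base
            if gain > best_gain:
                best_node, best_gain = v, gain
        seeds.add(best_node)
    return seeds
```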

10 1.4 Heuristic Algorithms
ShortestPath: Kimura and Saito (PKDD'06), "Tractable models for information diffusion in social networks".
DegreeDiscount: Chen et al. (KDD'09), "Efficient influence maximization in social networks".
MIA: Chen et al. (KDD'10), "Scalable influence maximization for prevalent viral marketing in large-scale social networks".
LDAG: Chen et al. (ICDM'10), "Scalable influence maximization in social networks under the linear threshold model".
SimPath: Goyal et al. (ICDM'11), "SimPath: an efficient algorithm for influence maximization under the linear threshold model".
Advantage: faster than the Greedy Algorithm. Disadvantage: no performance guarantee. (Illustrations: the shortest path from node a to c; a node's degree shrinking under DegreeDiscount.)
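A rough sketch of the DegreeDiscount idea for the IC model with a uniform probability p; the discount formula below is reproduced from memory of Chen et al. (KDD'09) and should be treated as an assumption rather than the authors' exact algorithm or code:

```python
def degree_discount_ic(graph, k, p=0.01):
    """DegreeDiscountIC-style heuristic (sketch).
    graph: dict {node: set_of_neighbors}, treated as undirected, with every node as a key;
    p: uniform IC propagation probability."""
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    dd = dict(degree)                  # discounted degree of each node
    t = {v: 0 for v in graph}          # number of already-selected neighbors
    seeds, selected = [], set()
    for _ in range(k):
        u = max((v for v in graph if v not in selected), key=lambda v: dd[v])
        seeds.append(u)
        selected.add(u)
        for v in graph[u]:
            if v in selected:
                continue
            t[v] += 1
            # discount formula as we recall it from Chen et al. (KDD'09)
            dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds
```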

11 1.5 Advanced Greedy Algorithms
CELF: Leskovec et al. (KDD'07), "Cost-effective outbreak detection in networks".
CELF++: Goyal et al. (WWW'11), "CELF++: optimizing the greedy algorithm for influence maximization in social networks".
StaticGreedy: Cheng et al. (CIKM'13), "StaticGreedy: solving the scalability-accuracy dilemma in influence maximization".

15 1.5 Advanced Greedy: Greedy algorithm vs. CELF algorithm
Advantage: by submodularity, CELF reduces the number of Monte-Carlo calls and speeds up the greedy algorithm by up to 700 times. Disadvantage: CELF still needs N Monte-Carlo simulations to initialize the upper bounds, where N is the network size, so it does not scale to large networks. (Illustration: marginal-gain "reward" bars for nodes a-e under the greedy and CELF algorithms.)
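A compact sketch of CELF's lazy-forward evaluation, again reusing the ic_spread_mc estimator from above; the heap layout and names are our own illustrative choices:

```python
import heapq

def celf_im(graph, k, num_sims=10000):
    """CELF-style lazy greedy (sketch). Submodularity guarantees that a node's
    marginal gain can only shrink as the seed set grows, so stale gains stored
    in a max-heap are valid upper bounds and most re-evaluations can be skipped."""
    # Initialization: one full pass of MC estimates over all N nodes
    # (the step that UBLF later replaces with an analytic upper bound).
    heap = [(-ic_spread_mc(graph, {v}, num_sims), v, 0) for v in graph]
    heapq.heapify(heap)

    seeds, spread = set(), 0.0
    while len(seeds) < k and heap:
        neg_gain, v, last_round = heapq.heappop(heap)
        if last_round == len(seeds):
            # gain is up to date w.r.t. the current seed set: accept it
            seeds.add(v)
            spread -= neg_gain
        else:
            # stale: re-evaluate the marginal gain and push it back
            gain = ic_spread_mc(graph, seeds | {v}, num_sims) - spread
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, spread
```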

16 1.6 Our Ideas: demand for efficient and accurate solutions to influence maximization
Algorithm level, Upper Bound: we explore new upper bounds to significantly reduce the number of MC simulations in Greedy, especially at the initial step.
Data level, Subgraph Stream: we break a big network down into a series of small subgraphs and continuously estimate the influence spread on these subgraphs as data streams.
Model level, Network Coarsening: we present a new network coarsening model to learn coarsened networks, which are small and tractable for influence maximization.

17 Efficient Influence Maximization in Large-scale Social Networks
Contents: 1 Introduction; 2 Upper Bound Method; 3 Subgraph Stream Method; 4 Network Coarsening Method

18 2.1 Motivation: Can we initialize the upper bounds without actually computing the MC simulations? (Illustration: CELF vs. the proposed UBLF algorithm; a table of per-node upper bounds and MC estimates for nodes a-e.)

19 2.2 Upper Bound of M(S)
Proposition 2 establishes a relationship between the activation probabilities at time t and time t+1. (Illustration: local view vs. global view of the propagation process.)

20 2.2 Upper Bound of M(S)
M(S) is bounded by the sum of a series. Under what condition does the series converge, and what is its limit? Computing the exact sum is too hard, but we know its upper bound.

21 2.2 Upper Bound of M(S)
Convergence condition: the total influence to or from any node is less than 1. Under condition (14), we obtain a tractable upper bound.
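In matrix form, this is our reconstruction of the bound from the TKDE-15 paper: with PP the matrix of pairwise propagation probabilities and 1_S the indicator vector of the seed set (notation assumed, not copied from the slides), the convergence condition makes the geometric matrix series summable:

```latex
\[
M(S) \;\le\; \mathbf{1}_S^{\top} \Big( \sum_{t=0}^{\infty} PP^{\,t} \Big) \mathbf{1}
\;=\; \mathbf{1}_S^{\top} \left( I - PP \right)^{-1} \mathbf{1}.
\]
```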

22 2.3 UBLF Algorithm
CELF: the first round is time-consuming and needs a full pass of MC simulations. UBLF: the first round is calculated analytically.
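A numpy sketch of the analytic first round under the assumptions above; representing PP as a dense matrix is for illustration only, and a real implementation would use sparse solves:

```python
import numpy as np

def ublf_upper_bounds(PP):
    """Analytic per-node upper bounds on influence spread (sketch of the UBLF idea).

    PP: (N, N) matrix with PP[u, v] = propagation probability on edge u -> v
        (0 where there is no edge). Requires the convergence condition, e.g.
        every row/column sum strictly below 1, so that the Neumann series
        sum_t PP^t converges and (I - PP) is invertible.
    Returns a length-N vector whose u-th entry bounds the spread of seed set {u}.
    """
    n = PP.shape[0]
    # bound[u] = e_u^T (I - PP)^{-1} 1 -- solve the linear system instead of inverting
    return np.linalg.solve(np.eye(n) - PP, np.ones(n))
```

These per-node bounds initialize the lazy-greedy priority queue in place of CELF's N initial Monte-Carlo estimates; simulations are then only run when a node reaches the top of the queue.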

23 2.4 Example for UBLF
Monte-Carlo simulation: node 1 is selected (only one MC simulation is needed).

24 2.5 Experiments
Datasets: Digger, Twitter, Epinions, Small-world. Benchmarks: CELF, Degree, DegreeDiscount, MIA, SP1M. Evaluations of the upper bounds.

25 2.5 Experiments: comparison results (number of MC simulations)
Observation: the total number of MC calls of UBLF is significantly reduced compared to CELF.

26 2.5 Experiments: comparison results (influence spread)
Observation: the spreads of UBLF and CELF are identical, which confirms that UBLF and CELF follow the same node-selection logic.

27 2.5 Experiments: comparison results (time cost)
Observation: UBLF is 2-10 times faster than CELF.

28 Efficient Influence Maximization in Large-scale Social Networks
Contents: 1 Introduction; 2 Upper Bound Method; 3 Subgraph Stream Method; 4 Network Coarsening Method

29 3.1 Motivations
Can we process the whole network like data streams? Can we recover results for the original network from the results on the subgraph streams?

30 3.2 Expression of Influence Spread
How to obtain the strongly connected components of the original network from those of the subgraphs?
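To make the SCC connection concrete: under the live-edge view of the IC model, the spread of S in one sampled graph is simply the number of nodes reachable from S, and condensing the sample into its DAG of strongly connected components makes that count cheap. The sketch below uses networkx and assumes each edge carries a 'p' attribute for the propagation probability; it illustrates the SCC-based evaluation on a single sample, not the paper's incremental subgraph-stream algorithm.

```python
import random
import networkx as nx

def live_edge_sample(G):
    """Sample one 'live-edge' realization of a weighted digraph G
    (each edge is assumed to carry attribute 'p' = propagation probability)."""
    H = nx.DiGraph()
    H.add_nodes_from(G.nodes())
    H.add_edges_from((u, v) for u, v, p in G.edges(data="p") if random.random() < p)
    return H

def spread_via_scc(H, seeds):
    """Number of nodes reachable from `seeds` in the sampled graph H,
    computed on the SCC condensation (each super-node counts its members)."""
    C = nx.condensation(H)                      # DAG of strongly connected components
    size = {c: len(C.nodes[c]["members"]) for c in C}
    seed_cs = {C.graph["mapping"][s] for s in seeds}
    reached = set()
    for c in seed_cs:
        reached |= {c} | nx.descendants(C, c)
    return sum(size[c] for c in reached)
```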

31 3.3 Joins of SCCs (illustration: four panels, A-D, showing how the SCCs of successive subgraphs are joined)

32 3.4 Influence Spread Estimation
More combinations! Is this reasonable?

33 3.4 Influence Spread Estimation
Only keep the random variables with orthogonal paths. How many are there?

34 3.4 Influence Spread Estimation

35 3.5 Subgraph Incremental Algorithm

36 3.6 Experiments
Datasets: ego-Facebook, ca-HepPh, ca-CondMat, email-Enron. Benchmarks: CELF, StaticGreedy, DegreeDiscount, PageRank, PMIA. Statistics of the datasets.

37 3.6 Experiments: comparison results (influence spread)
Observation: the spreads of our method, StaticGreedy and CELF are nearly the same, which shows that our influence spread estimation is accurate.

38 3.6 Experiments: comparison results (time cost)
Observation: our method is more than 100 times faster than CELF and roughly 5-10 times faster than StaticGreedy.

39 Efficient Influence Maximization in Large-scale Social Networks
Contents: 1 Introduction; 2 Upper Bound Method; 3 Subgraph Stream Method; 4 Network Coarsening Method

40 4.1 Motivations Can we coarsen the networks in order to obtain a smaller equivalent representation? Can we solve the influence maximization problem using the smaller representation?

41 4.1 Motivations (illustration: a 13-node weighted network is (1) coarsened by merging node groups such as {1,3}, {2,4,9,10,12} and {5,8} into super-nodes 1', 2' and 5', (2) the problem is solved on the coarsened network, and (3) the solution is projected back to the original network)

42 4.2 Problem Statement
Given a network G=(V,E) and a network coarsening rate 0 < α < 1, the proposed Network Coarsening aims to infer a smaller coarsened network Gcoarsen=(V',E') by merging nodes that are tightly connected to each other, i.e., to find a smaller equivalent network that describes the original large yet sparse network.

43 4.2 Problem Statement
Inputs: the original large-scale network (static network structure) together with dynamic information spreading data; output: a coarsened small network. Idea: if two nodes are tightly connected to each other and co-occur frequently in the same information cascades, they are likely to be merged. We call this the "semi-data-driven coarsening method".

44 4.3 Challenges
C1: Network structure data and information spreading data are heterogeneous and need to be modeled jointly. How do we maintain the diffusion characteristics after merging nodes?
C2: Network coarsening leads to a complicated optimization problem that requires an efficient algorithm.

45 4.4 Proposed Method

46 4.4 Proposed Method: Heterogeneous data
1st step: Model dynamic information spreading cascades

47 4.4 Proposed Method: Heterogeneous data
2nd step: Model the static network structure (graph regularization). 3rd step: Formulate the node label distribution learning function, which simultaneously minimizes the graph regularization on the static network structure and maximizes the network coarsening likelihood on the information spreading data.

48 4.4 Proposed Method: Efficient algorithm
Here we adapt the Accelerated Proximal Gradient (APG) method [Beck and Teboulle, 2009] to learn the optimal node label distribution Y. Based on the optimal Y, we merge nodes with the same label and assign new edge weights.
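For reference, a generic Accelerated Proximal Gradient (FISTA) template of the kind being adapted here; the coarsening model's actual objective, gradient and proximal operator are not reproduced, so grad_f, prox_g and step are placeholders the reader would supply:

```python
import numpy as np

def apg(grad_f, prox_g, Y0, step, num_iters=200):
    """Generic Accelerated Proximal Gradient / FISTA template
    (Beck & Teboulle, 2009) for  min_Y  f(Y) + g(Y):
        f smooth with gradient `grad_f`, step <= 1 / L(f);
        g handled through its proximal operator `prox_g(V, step)`.
    The coarsening objective itself is not reproduced here."""
    Y, Z, t = Y0.copy(), Y0.copy(), 1.0
    for _ in range(num_iters):
        Y_next = prox_g(Z - step * grad_f(Z), step)        # proximal gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = Y_next + ((t - 1.0) / t_next) * (Y_next - Y)   # Nesterov extrapolation
        Y, t = Y_next, t_next
    return Y
```

After convergence, nodes that share the same label under the learned distribution Y are merged, as described on the slide.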

49 4.5 Algorithm Flowchart
Step 1: Coarsen the network to get a quick picture. Step 2: Solve the influence maximization problem on the coarsened network (e.g., with a greedy algorithm). (Illustration: the coarsen / solve / project pipeline from Section 4.1.)

52 4.5 Algorithm Flowchart
Step 3: Project the solution back to the original network: (i) randomly select ...; (ii) select a node as ...; (iii) select a node as ..., where ... (Illustration: the coarsen / solve / project pipeline.)

53 4.6 Experiments: Influence estimation
Metrics: (1) the average influence spread error rate (2) the running time

54 4.6 Experiments: Influence estimation
Observations: the average influence spread error rate of our method is much smaller than that of the baselines; our method uses more time than its peers. To sum up, our method outperforms the baseline methods in terms of accuracy without significantly raising the time cost.

55 4.6 Experiments: Influence maximization
Methods compared: CFSInflu (the method proposed in the paper); Random-based; CoarseNet-based [Purohit et al., 2014]; Data-driven-based; InfluLearner-based [Du et al., 2014]; PMIA [Chen et al., 2010]. The InfluLearner-based and PMIA methods are run directly on the original network; the Random-based, CoarseNet-based and Data-driven-based methods work on the coarsened network. How do we perform the influence maximization analysis, and what do we gain compared with the methods run on the original network?

56 4.6 Experiments: Influence maximization
Observations: the influence spread derived from our method approximates the result obtained on the original network, and ours outperforms all the baseline methods conducted on the coarsened networks. Our method runs orders of magnitude faster than the InfluLearner-based and PMIA methods while maintaining the influence spread results.

57 Conclusions
Background; problem formulation; Greedy algorithm; heuristic algorithms (MIA, SimPath, PageRank, etc.); advanced greedy algorithms (CELF, CELF++, etc.). Our solutions: Upper Bound, Subgraph Stream, Network Coarsening.

58 References
(TKDE-15) Chuan Zhou, Peng Zhang, Wenyu Zang, and Li Guo. On the Upper Bounds of Spread for Greedy Algorithms in Social Network Influence Maximization. IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 10, 2015. (CCF-A)
(IJCAI-15) Wei-Xue Lu, Peng Zhang, Chuan Zhou, Chunyi Li, and Li Gao. Big Network Influence Maximization: An Incremental Algorithm for Streaming Subgraph Influence Spread Estimation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence. (CCF-A)
(IJCAI-16) Li Gao, Jia Wu, Hong Yang, Zhi Qiao, Chuan Zhou*, and Yue Hu. Semi-Data-Driven Network Coarsening. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. (CCF-A)

59 Thank You! Chuan Zhou (周川). Tel: . Address: No. 89A Minzhuang Road, Haidian District, Beijing.

