Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1

Community-based Greedy Algorithm for Mining Top-K Influential Nodes in Mobile Social Networks
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
1 Peking University, China
2 Nanyang Technological University, Singapore

Problem and Background
Problem: Given a mobile social network, we aim to mine a set S of top-K influential nodes such that R(S) is maximized under the extended Independent Cascade information diffusion model.
Mobile social networks play an essential role in the spread of information and influence in the form of "word-of-mouth".
The problem is NP-hard.
It is computationally expensive to run the greedy algorithm on a large network: in our experiments, previous greedy algorithms take days to finish on a network with 723k nodes.

Basic Idea of the Algorithm
Dynamic programming algorithm and greedy algorithm on selected communities
Construct the network from CDR (call detail record) data
Community detection based on the diffusion model on the MSN (mobile social network)

Step1: Extracting Mobile Social Network
Extract a mobile social network from CDR data and model it as a directed weighted graph:
A phone user corresponds to a node.
A directed edge u → v is established if there exists communication from u to v.
The corresponding communication time is the weight of the edge.
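The extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(caller, callee, seconds)`, the function name `build_call_graph`, and the choice to sum durations over repeated calls are all assumptions made for the example.

```python
from collections import defaultdict


def build_call_graph(records):
    """Build a directed weighted graph {u: {v: total_seconds}}.

    `records` is an iterable of (caller, callee, seconds) tuples, a
    hypothetical CDR layout. Durations of repeated calls on the same
    directed edge are summed (an assumption, not stated in the slides).
    """
    graph = defaultdict(dict)
    for caller, callee, seconds in records:
        graph[caller][callee] = graph[caller].get(callee, 0) + seconds
        graph[callee]  # touch the callee so it appears as a node even with no outgoing calls
    return dict(graph)


# Hypothetical sample records, for illustration only.
records = [("a", "b", 60), ("a", "b", 30), ("b", "a", 10), ("b", "c", 120)]
graph = build_call_graph(records)
```

Keeping the two directions of a pair as separate edges preserves the asymmetry of who calls whom, which a directed diffusion model needs.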

Extended Independent Cascade Model
Nodes have two states: active and inactive.
Diffusion speed: λ.
When an active node vi contacts an inactive node vj, the inactive node becomes active with probability (rate) λij.
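The activation rule above can be illustrated with a small Monte Carlo simulation. This is a sketch under assumptions: each newly active node gets a single activation attempt per inactive neighbor, as in the standard Independent Cascade model, and the probabilities λij are supplied directly in a dictionary. How the extended model derives λij from call data is not given in the slides, and the names `simulate_cascade` and `estimate_spread` are hypothetical.

```python
import random


def simulate_cascade(prob, seeds, rng):
    """Run one cascade and return the set of active nodes.

    prob[u][v] is the activation probability lambda_uv (assumed given).
    Each newly active node u tries once to activate each inactive neighbor v.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in prob.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(prob, seeds, runs=1000, seed=0):
    """Average number of active nodes over `runs` simulated cascades."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(prob, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical three-node chain a -> b -> c with certain activation.
chain = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {}}
spread = estimate_spread(chain, ["a"], runs=10)
```

Averaging over many runs is the usual way to approximate an expected spread, since the cascade outcome is random.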

Step2: Influential Model Based Community Detection Algorithm
Community Partition
Each node is assigned a unique community label from 1 to N.
For each node, compute the set of its influenced neighbors using the Independent Cascade diffusion model.
Iteratively propagate the labels through the network for a finite number of iterations: for each node v, the label of the community that the majority of its influenced neighbors belong to becomes the label of v, i.e. v.C_t = maxCMT(u_1.C_{t-1}, ..., u_{s_v}.C_{t-1}), where
t denotes the t-th iteration,
s_v denotes the number of neighbors that are influenced by v,
u_i (i ∈ [1, s_v]) represents an influenced neighbor of node v,
u_i.C_{t-1} represents the community label of u_i at iteration t-1,
maxCMT computes the majority label among the u_i.C_{t-1}.
Community Combination
The goal is that the difference between a node's influence degree in its community and its influence degree in the network is smaller than a threshold.
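The label-propagation part of the partition step can be sketched as below. It is an illustration only: the influenced-neighbor sets are taken as given input rather than computed from the diffusion model, labels are updated synchronously from the previous iteration, ties in the majority vote are broken by the smallest label, and a node with no influenced neighbors keeps its label. These choices, and the names `majority_label` and `propagate_labels`, are assumptions.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label; ties go to the smallest label (assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def propagate_labels(influenced, max_iters=10):
    """Synchronous label propagation over influenced-neighbor sets.

    influenced[v] lists the neighbors influenced by v (assumed precomputed).
    Stops early when no label changes, otherwise after max_iters iterations.
    """
    labels = {v: i for i, v in enumerate(sorted(influenced), start=1)}
    for _ in range(max_iters):
        new_labels = {
            v: majority_label([labels[u] for u in nbrs]) if nbrs else labels[v]
            for v, nbrs in influenced.items()
        }
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical input: a triangle a, b, c plus a node d attached to c.
influenced = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": ["c"]}
labels = propagate_labels(influenced)
```

The iteration cap matters because synchronous updates can oscillate on some graphs, which is one reason the slides speak of a finite number of iterations.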

Step3: Community-Based Greedy Algorithm
Choose communities to find the top-1 influential node.
Three communities C1, C2, C3 with ΔR1=0.2, ΔR2=0.3, ΔR3=0.1.
R[1,1]=max{R[0,1], R[3,0]+ΔR1}=0.2, s[1,1]=C1
R[2,1]=max{R[1,1], R[3,0]+ΔR2}=0.3, s[2,1]=C2
R[3,1]=max{R[2,1], R[3,0]+ΔR3}=0.3, s[3,1]=C2
So we mine the top-1 node in C2.

Community-Based Greedy Algorithm
Choose communities to find the top-2 influential node.
Three communities C1, C2, C3 with ΔR1=0.2, ΔR2=0.06, ΔR3=0.1. Note that ΔR2 is now 0.06, not 0.3.
R[1,2]=max{R[0,2], R[3,1]+ΔR1}=0.5, s[1,2]=C1
R[2,2]=max{R[1,2], R[3,1]+ΔR2}=0.5, s[2,2]=C1
R[3,2]=max{R[2,2], R[3,1]+ΔR3}=0.5, s[3,2]=C1
So we mine the second node in C1.
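The two worked examples follow the recurrence R[m,k] = max{R[m-1,k], R[M,k-1] + ΔRm}, where M is the number of communities. A minimal sketch of that table fill is shown below. It assumes boundary values R[0,k] = R[m,0] = 0, which is consistent with the numbers in the examples, and it takes the marginal gains ΔRm for each round as a precomputed table. That table is a simplification for illustration: in the examples the gains change after each node is selected, so a real run would recompute them. The function name `choose_communities` is hypothetical.

```python
def choose_communities(gains_per_round):
    """Fill the R and s tables of the community-selection recurrence.

    gains_per_round[k-1][m-1] is the marginal gain (Delta R_m) of mining the
    k-th node in community m. Returns (R, s), where s[m][k] is the index of
    the chosen community among the first m communities (0 means none).
    """
    num_rounds = len(gains_per_round)
    num_comms = len(gains_per_round[0])
    R = [[0.0] * (num_rounds + 1) for _ in range(num_comms + 1)]
    s = [[0] * (num_rounds + 1) for _ in range(num_comms + 1)]
    for k in range(1, num_rounds + 1):
        base = R[num_comms][k - 1]  # best influence degree with k-1 nodes
        for m in range(1, num_comms + 1):
            candidate = base + gains_per_round[k - 1][m - 1]
            if candidate > R[m - 1][k]:
                R[m][k], s[m][k] = candidate, m
            else:
                R[m][k], s[m][k] = R[m - 1][k], s[m - 1][k]
    return R, s


# Gains taken from the two worked examples (top-1 round, then top-2 round).
R, s = choose_communities([[0.2, 0.3, 0.1], [0.2, 0.06, 0.1]])
```

With these inputs, s[3][1] points at C2 and s[3][2] at C1, matching the communities chosen in the examples.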

CGA: Community-Based Greedy Algorithm
Θ and Δd are two constants

Experiments: Data Sets
Extract a mobile social network from three months of CDR (call detail record) data of a city, from China Mobile.
Node number: 723,201
Average degree: 13.4

Community distribution
Largest community size: 95,690

Experiments: Top-K Nodes Mining
Methods: MixedGreedy Algorithm, NewGreedy Algorithm, DegreeDiscount, Random Method, CGA, SPCGA
Parameter study: k, diffusion speed λ, data size

Results: Influence degree and time vs K

Results: Influence degree and time vs diffusion speed λ
The efficiency of MixedGreedy drops quickly (almost exponentially), while CGA is much better.

Results: Influence degree and time vs network size
The influence degree is relatively stable.

Summary
Handles large-scale networks (power-law degree distribution).
Improves the efficiency of existing algorithms by an order of magnitude, while the loss in approximation precision is small.
Can be combined with any existing algorithm to find influential nodes w.r.t. communities.

Related work on Top-K Algorithm
Typical Greedy Algorithm (Kempe et al. KDD2003)
CELF Greedy Algorithm (Leskovec et al. KDD2007)
An improved greedy algorithm (Kimura et al. AAAI2007)
NewGreedy Algorithm, MixedGreedy, DegreeDiscount Algorithm (Chen et al. KDD2009)
MIA algorithm (Chen et al. KDD2010)
None of them considers the community property.

Thank You !

Experiments: Influence degree and time with different θ

Influential Model Based Community Detection Algorithm
Community Combination
Rm({u}) denotes the influence degree of node u in its community Cm; a second quantity denotes the influence degree of node u outside the community Cm.
We expect the difference between a node's influence degree in its community and its influence degree in the whole network to be small.
To obtain a set of top-K influential nodes with a good influence degree, we define the combination entropy to measure the connections between two communities, and we combine two communities if the combination entropy between them is larger than a threshold.
L[Cm] includes the influenced neighbors of a node v that make the diffusion degree of v with regard to Cm differ from the diffusion degree of v with regard to the whole network.
We set a threshold θ. If the combination entropy CoEntropy (CElm) of community Cm to community Cl is larger than θ, then Cm and Cl are combined.

Problem Statement Influence Degree
Given a mobile social network G = (V, E, W), we aim to mine a set of top-K influential nodes S on the network such that R(S) is maximized using the extended Independent Cascade information diffusion model.

Related work on Top-K Algorithm
DegreeDiscount Algorithm (Chen et al. KDD2009): no precision guarantee, O(K log N + M)
NewGreedy Algorithm (Chen et al. KDD2009): (1-1/e)-approximation, O(KRM)
CELF Greedy Algorithm (Leskovec et al. KDD2007): (1-1/e)-approximation, 700 times faster
Typical Greedy Algorithm (Kempe et al. KDD2003): (1-1/e)-approximation, O(KNRM)
All of them use the IC information diffusion model.
None of them considers the community property.

Outline
Research Background
Related Work
Preliminaries and Problem Statement
Top-K Nodes Mining Algorithm
Experiments
