Modeling Information Diffusion in Networks with Unobserved Links Quang Duong Michael P. Wellman Satinder Singh Computer Science and Engineering University.

Modeling Information Diffusion in Networks with Unobserved Links
Quang Duong, Michael P. Wellman, Satinder Singh
Computer Science and Engineering, University of Michigan

Networks with unobserved links
– Links help to model how information diffuses from one node to another.
– Real-world agents/nodes have connections that third parties cannot observe.

Problem Overview
Given: a network (with missing links) and snapshots of the network states over time.
Objective: model information diffusion on networks.
We examine two different approaches:
1. Learning the underlying network, upon which a diffusion model is built (similar to the approach of some previous work)
2. Building a flexible model without learning the missing links

Problem Overview (cont.)
Formalism
– A node/agent is in state s^t = 1 if infected at time t, and -1 otherwise (infection persists).
– A diffusion instance/trace s records snapshots of the network’s states over time.
– Underlying network: G*
– Input network: G (G* with missing edges)
– N_i is the neighborhood of i in G (including i itself).
Underlying diffusion process: cascade
– The probability of infection is proportional to the number of infected neighbors.
– The model’s parameters determine (a) the diffusion rate and (b) the spontaneous infection rate.
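The cascade process described on this slide can be sketched as a small simulator. This is only an illustration: the slides do not give the exact infection formula, so the linear form `beta + alpha * k` (capped at 1), the names `alpha` (diffusion rate) and `beta` (spontaneous infection rate), and the synchronous update are all assumptions.

```python
import random


def infection_probability(num_infected_neighbors, alpha, beta):
    """Assumed form: spontaneous rate beta plus alpha per infected neighbor, capped at 1."""
    return min(1.0, beta + alpha * num_infected_neighbors)


def cascade_step(neighbors, state, alpha, beta, rng):
    """Advance one time step. `state` maps node -> 1 (infected) or -1 (not infected)."""
    new_state = dict(state)
    for node, nbrs in neighbors.items():
        if state[node] == 1:
            continue  # infection persists, so infected nodes never change
        # N_i includes i itself, so skip the node when counting infected neighbors.
        k = sum(1 for j in nbrs if j != node and state[j] == 1)
        if rng.random() < infection_probability(k, alpha, beta):
            new_state[node] = 1
    return new_state


def simulate_trace(neighbors, alpha, beta, steps, seed=0):
    """Return a diffusion trace: a list of network snapshots, starting all uninfected."""
    rng = random.Random(seed)
    state = {node: -1 for node in neighbors}
    trace = [state]
    for _ in range(steps):
        state = cascade_step(neighbors, state, alpha, beta, rng)
        trace.append(state)
    return trace
```

For example, `simulate_trace({0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}, alpha=0.3, beta=0.1, steps=5)` returns six snapshots of a three-node path graph.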

Problem Summary
Input:
1. Network G
2. A set of diffusion traces s (training)
Approaches:
1. Structure learning approach
– Learn network G’
– Learn parameters for a cascade model built on G’
2. Graphical model approach
– Learn parameters for a graphical multiagent model built on G
Evaluation on testing sets of diffusion traces
– Capturing diffusion dynamics: log likelihood of diffusion traces, L(s), which serves as the objective function

Approach 1: Learning Missing Links
MaxInf algorithm (maxC)
Assumption: nodes can be infected by multiple neighbors, as in the cascade model
Objective function: likelihood of traces L(s)
Outline:
– Greedily add the edge that increases the objective function the most
– Learn model parameters after each addition
– Repeat until the objective function starts to decrease
Related work: NetInf [Gomez-Rodriguez et al. ’10]. Adapted version: NetInf’ (netC)
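The greedy outline above can be sketched generically as follows. This is not the authors' MaxInf implementation: `fit_and_score` is a hypothetical callback that would fit cascade parameters for a candidate edge set and return the resulting log likelihood L(s), and the loop stops as soon as no single addition improves the score.

```python
def greedy_add_edges(edges, candidates, fit_and_score):
    """Greedily add candidate edges while the objective keeps improving.

    fit_and_score(edge_set) -> (params, score) is a hypothetical callback:
    it fits model parameters for the given edges and returns the objective.
    """
    current = set(edges)
    remaining = set(candidates) - current
    params, current_score = fit_and_score(frozenset(current))
    while remaining:
        best_edge, best_params, best_score = None, None, current_score
        for edge in sorted(remaining):
            # Re-learn parameters for the graph with this one extra edge.
            cand_params, cand_score = fit_and_score(frozenset(current | {edge}))
            if cand_score > best_score:
                best_edge, best_params, best_score = edge, cand_params, cand_score
        if best_edge is None:
            break  # no addition improves the objective, so stop
        current.add(best_edge)
        remaining.remove(best_edge)
        params, current_score = best_params, best_score
    return current, params, current_score
```

A toy scoring function (for example, the negative number of edges that differ from a known target graph) is enough to exercise the loop; a real scorer would maximize the trace likelihood instead.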

Approach 2: History-Dependent Graphical Multiagent Model
hGMM [Duong, Wellman, Singh, and Vorobeychik, AAMAS ’10]
– Directed edges from the nodes in N_i^d to i: how neighbors’ past states affect i’s present state.
– Undirected edges define N_i^u: correlations/interdependencies among nodes at the same time t.
(*) The cascade model and many others assume conditional independence given history (N_i^u contains only i itself).
(**) For simplicity, we assume N_i = N_i^d = N_i^u.

Approach 2: hGMM (cont.)
[Equation: the formula itself is not preserved; its labeled terms are listed below]
– Joint probability distribution of the system’s states at time t
– Potential of a neighborhood’s joint states at t
– Neighborhood-relevant abstracted history
– Abstracted history
Each neighborhood is associated with a potential function π_i that represents the unnormalized likelihood of the joint states s_{N_i}.
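One form consistent with the labels on this slide is a normalized product of neighborhood potentials. The symbols H^t (abstracted history), H^t_{N_i} (its neighborhood-relevant part), and Z (normalizing constant) are notation assumed here for illustration; the slide text itself does not preserve them.

```latex
\Pr\left(s^t \mid H^t\right)
  = \frac{1}{Z} \prod_{i} \pi_i\!\left(s^t_{N_i} \mid H^t_{N_i}\right),
\qquad
Z = \sum_{\tilde{s}^t} \prod_{i} \pi_i\!\left(\tilde{s}^t_{N_i} \mid H^t_{N_i}\right)
```

Here the left-hand side is the joint distribution of the system's states at time t, and each factor is the potential of one neighborhood's joint states.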

Approach 2: hGMM (cont.)
hGMMs allow reasoning about state correlations between neighbors who appear disconnected in the input graphical structure.
Example: hGMMs could use the potential function of node 2 to express correlations between nodes 1 and 3, compensating for the missing edge (1, 3).
[Figure: three example graphs over nodes 1, 2, 3, and 4]

Approach 2: hGMM (cont.)
A. Tabular hGMM (taG): the potential π_i of each neighborhood is a function of five features:
– number of agents infected at t-1
– number of agents becoming infected at t
– neighborhood size
– i’s state at t (present)
– i’s state at t-1 (past)
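The five features listed above could be computed from two consecutive snapshots roughly as follows. This is a sketch under assumptions: the counts are taken over all of N_i (including i itself), states are 1 or -1 as in the formalism, and the function names and the default potential value are hypothetical.

```python
def tabular_features(i, neighborhood, prev_state, curr_state):
    """Return the five-feature key for node i's neighborhood potential.

    `neighborhood` is N_i (assumed to include i); states map node -> 1 or -1.
    """
    infected_before = sum(1 for j in neighborhood if prev_state[j] == 1)
    newly_infected = sum(
        1 for j in neighborhood if prev_state[j] == -1 and curr_state[j] == 1
    )
    return (
        infected_before,    # number of agents infected at t-1
        newly_infected,     # number of agents becoming infected at t
        len(neighborhood),  # neighborhood size
        curr_state[i],      # i's state at t (present)
        prev_state[i],      # i's state at t-1 (past)
    )


def potential(table, features, default=1.0):
    """Look up the unnormalized potential for a feature tuple in a dict-based table."""
    return table.get(features, default)
```

A tabular model then reduces to learning one table entry per distinct feature tuple observed in the training traces.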

Approach 2: hGMM (cont.)
B. Parametric hGMM (paG): based on the cascade model and our empirical study of taG, π_i is the product of three components (recall that π_i represents the unnormalized likelihood of the joint states s_{N_i}):
– [1] probability of node i’s infection, as in the cascade model
– [2] joint probability of c nodes in N’_i = N_i \ {i} becoming infected
– [3] joint probability of (|N’_i| - c) nodes staying uninfected

Approach 2: hGMM (cont.)
Component [2]: joint probability of c nodes in N’_i = N_i \ {i} becoming infected
– Assuming independence of the c agent states in N’_i, component [2] is simply the product of the infection probabilities of the c nodes.
– To capture the correlation among infections, component [2] is instead a product of infection probabilities over |c - γ|N’_i|| “nodes,” where γ captures state correlations/interdependence.
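A small numeric sketch of how γ changes component [2] is given below. It assumes, purely for illustration, that every node in N’_i shares a single infection probability `p`, and it reads the effective node count as |c - γ|N’_i||; the function and parameter names are hypothetical.

```python
def component_two(p, c, neighborhood_size, gamma=0.0):
    """Unnormalized joint probability that c of the other nodes become infected.

    Assumes one shared infection probability p for all nodes in N'_i.
    gamma = 0 recovers the independent product p ** c.
    """
    effective_nodes = abs(c - gamma * neighborhood_size)
    return p ** effective_nodes


independent = component_two(0.5, c=3, neighborhood_size=4)             # 0.5 ** 3
correlated = component_two(0.5, c=3, neighborhood_size=4, gamma=0.25)  # 0.5 ** 2
```

With these numbers, a positive γ lowers the effective count from 3 to 2, so the joint infection of three neighbors is treated as more likely than it would be under independence.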

Empirical Study
Generate graphs G* (random ER and preferential attachment PA) of 30 and 100 nodes
Randomly delete 1/2 of the edges to create G
Generate cascades with parameters learned from empirical data by Stonedahl et al. (’10):
– 2 domains: fast and normal
– Generative model (on fully observed graphs): C on G*
Vary the amount of training data (25 and 100 cascades):
– paG (parametric hGMM on the given graph G): learn parameters
– maxC (cascade model with G’ learned by MaxInf): learn parameters + connections
– netC (cascade model with G’ learned by NetInf’): learn connections (given the generative model’s parameters)
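The graph setup for the ER case can be sketched with the standard library alone. The edge probability `p` below is a hypothetical value (the slides do not state it), and the helper names are illustrative only.

```python
import random
from itertools import combinations


def erdos_renyi_edges(n, p, rng):
    """Sample an undirected ER graph G*: include each node pair with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def hide_edges(edges, fraction, rng):
    """Randomly delete a fraction of edges; return (observed edges of G, hidden edges)."""
    edges = list(edges)
    num_hidden = int(fraction * len(edges))
    hidden = set(rng.sample(edges, num_hidden))
    observed = [e for e in edges if e not in hidden]
    return observed, sorted(hidden)


rng = random.Random(1)
full_graph = erdos_renyi_edges(30, 0.2, rng)          # G* with 30 nodes
observed, hidden = hide_edges(full_graph, 0.5, rng)   # G keeps about half the edges
```

The hidden edges are returned as well so that a learned graph G’ can later be compared against them.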

Evaluation Metrics
1. Capturing diffusion dynamics: log likelihood of diffusion traces (the objective function)
2. Predicting the fraction of infected nodes: KL (skewed) divergence between the predicted and actual distributions of the fraction of infected nodes
3. Structural difference between the learned and actual graphs (applicable only to the structure learning approach)
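Metric 2 can be illustrated with one common definition of skew divergence, KL(p || αq + (1 - α)p), which stays finite when the predicted distribution assigns zero mass to an observed bin. The slides do not specify the exact variant or the value of α, so both are assumptions in this sketch.

```python
import math


def skew_divergence(actual, predicted, alpha=0.99):
    """KL divergence from `actual` to a mixture of `predicted` and `actual`.

    Both arguments are discrete distributions over the same bins, for example
    histograms of the fraction of infected nodes. alpha is an assumed value.
    """
    total = 0.0
    for p, q in zip(actual, predicted):
        if p > 0:
            mixture = alpha * q + (1 - alpha) * p  # never zero when p > 0
            total += p * math.log(p / mixture)
    return total
```

Lower values indicate that the predicted distribution of infected fractions is closer to the actual one.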

Detailed Prediction Results
Pairwise comparisons (rows): maxC vs C, paG vs C, paG vs maxC
Settings (columns): ER-30 (25), ER-30 (100), ER-30-fast (25), ER-30-fast (100), ER-100 (25), PA-30 (25), PA-30 (100)
[Table: color-coded cells not preserved]
Color key for Model 1 vs. Model 2:
– Black: 1 outperforms 2 (p < 0.05)
– White: 2 outperforms 1 (p < 0.05)
– Grey: otherwise
Summary:
– With sufficient data, paG is the best model.
– In some fast diffusion cases, maxC outperforms paG.
– C is the best model when the graph is fully observed.
Legend:
– paG: parametric hGMM on G
– maxC: cascade model with G’ learned by MaxInf
– C: generative cascade model on G*

Aggregate Prediction Results
Pairwise comparisons (rows): maxC vs C, paG vs C, paG vs maxC
Settings (columns): ER-30 (25), ER-30 (100), ER-30-fast (25), ER-30-fast (100), ER-100 (25), PA-30 (25), PA-30 (100)
[Table: cell values not preserved]
KL divergence: better-performing models have lower divergence.

Graph Results
– NetInf’ discovers more missing edges than MaxLInf but also adds more spurious edges.
– paG’s learned parameters help to detect whether the given network has missing edges.

Conclusions
Contributions
– We introduce two solutions: learning an hGMM on the given network structure, and directly discovering the missing connections.
– Our approaches can improve prediction over existing methods in various settings with a considerable number of missing edges.
Future work
– Improve scalability (treating undirected and directed edges differently)
– Develop a more systematic analysis to detect whether there are missing edges
– Interleave graph learning and model parameter learning more effectively

THANK YOU!
qduong@umich.edu
http://eecs.umich.edu/~qduong
