# Nonparametric Link Prediction in Dynamic Graphs

Purnamrita Sarkar (UC Berkeley), Deepayan Chakrabarti (Facebook), Michael Jordan (UC Berkeley)



Link Prediction
- Which node is most likely to interact with a given node?
- Example: friend suggestion on Facebook. Should Facebook suggest Alice as a friend for Bob?

Link Prediction
- Example: movie recommendation on Netflix. Should Netflix suggest this movie to Alice?

Link Prediction
Prediction using simple features:
- degree of a node
- number of common neighbors
- last time a link appeared

What if the graph is dynamic?
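The three simple features above can be computed from a sequence of graph snapshots. This is a minimal Python sketch, assuming each snapshot is an undirected graph stored as an adjacency dictionary; the helper names `degree`, `common_neighbors`, and `last_link_time` are illustrative and not taken from the slides.

```python
from typing import Dict, List, Optional, Set

# One snapshot G_t: an undirected graph as an adjacency dict, node -> set of neighbors.
Graph = Dict[int, Set[int]]

def degree(g: Graph, node: int) -> int:
    """Number of neighbors of `node` in snapshot g."""
    return len(g.get(node, set()))

def common_neighbors(g: Graph, i: int, j: int) -> int:
    """Number of nodes adjacent to both i and j in snapshot g."""
    return len(g.get(i, set()) & g.get(j, set()))

def last_link_time(snapshots: List[Graph], i: int, j: int) -> Optional[int]:
    """1-based index of the most recent snapshot containing edge (i, j), or None if never linked."""
    for t in range(len(snapshots), 0, -1):
        if j in snapshots[t - 1].get(i, set()):
            return t
    return None

G1 = {1: {2}, 2: {1, 3}, 3: {2}}
G2 = {1: {3}, 2: {3}, 3: {1, 2}}
snapshots = [G1, G2]
print(degree(G2, 3), common_neighbors(G2, 1, 2), last_link_time(snapshots, 1, 2))  # 2 1 1
```

These static scores look only at one snapshot (or one past event) at a time, which is why the slides go on to ask what changes when the graph is dynamic.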

Related Work
Generative models:
- Exponential family random graph models [Hanneke+/’06]
- Dynamics in latent space [Sarkar+/’05]
- Extension of mixed membership block models [Fu+/10]

Other approaches:
- Autoregressive models for links [Huang+/09]
- Extensions of static features [Tylenda+/09]

Goal
Link prediction
- incorporating graph dynamics,
- requiring weak modeling assumptions,
- allowing fast predictions,
- and offering consistency guarantees.

Outline
- Model
- Estimator
- Consistency
- Scalability
- Experiments

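The Link Prediction Problem in Dynamic Graphs
Given a sequence of graphs $G_1, G_2, \dots, G_T$, let $Y_t(i,j) = 1$ if the edge $(i,j)$ is present in $G_t$ (for example $Y_1(i,j) = 1$, $Y_2(i,j) = 0$). The task is to predict $Y_{T+1}(i,j)$:

$$Y_{T+1}(i,j) \mid G_1, G_2, \dots, G_T \sim \text{Bernoulli}\big(g_{G_1, G_2, \dots, G_T}(i,j)\big),$$

where $Y_{T+1}(i,j)$ indicates an edge at time $T+1$ and $g$ depends on features of the previous graphs and of this pair of nodes.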

Including graph-based features
Example set of features for the pair (i,j):
- cn(i,j) (common neighbors)
- ℓℓ(i,j) (last time a link was formed)
- deg(j)

Represent dynamics using "datacubes" of these features, i.e. roughly a multi-dimensional histogram on binned feature values. For one cell, e.g. 1 ≤ cn ≤ 3, 3 ≤ deg ≤ 6, 1 ≤ ℓℓ ≤ 2:
- $\eta_t$ = number of pairs in $G_t$ with these features
- $\eta_t^+$ = number of pairs in $G_t$ with these features which had an edge in $G_{t+1}$

A high ratio $\eta_t^+/\eta_t$ means this feature combination is more likely to create a new edge at time $t+1$.
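A datacube for one transition $G_t \to G_{t+1}$ can be sketched as a dictionary from binned feature cells to the pair of counts $(\eta_t, \eta_t^+)$. This is a minimal sketch under assumptions: the cut points in `BIN_EDGES` are hypothetical (the slides show only the single example cell above), and the input is assumed to be precomputed raw feature values per node pair plus a flag saying whether the pair is linked in $G_{t+1}$.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical bin edges; each feature value maps to the index of its bin.
BIN_EDGES = {
    "cn": [1, 4, 8],  # bins: 0 | 1-3 | 4-7 | 8+
    "deg": [3, 7],    # bins: 0-2 | 3-6 | 7+
    "ll": [1, 3],     # bins: 0 | 1-2 | 3+
}
FEATURES = ("cn", "deg", "ll")

def cell_of(features: Dict[str, int]) -> Tuple[int, ...]:
    """Map raw feature values to a datacube cell (one bin index per feature)."""
    return tuple(bisect_right(BIN_EDGES[f], features[f]) for f in FEATURES)

def build_datacube(pairs: Iterable[Tuple[Dict[str, int], bool]]) -> Dict[Tuple[int, ...], List[int]]:
    """Return {cell: [eta, eta_plus]}.

    eta      = number of pairs in G_t whose features fall in the cell
    eta_plus = how many of those pairs had an edge in G_{t+1}
    """
    cube = defaultdict(lambda: [0, 0])
    for features, linked_next in pairs:
        counts = cube[cell_of(features)]
        counts[0] += 1
        if linked_next:
            counts[1] += 1
    return dict(cube)

pairs = [
    ({"cn": 2, "deg": 4, "ll": 1}, True),   # falls in the example cell
    ({"cn": 3, "deg": 5, "ll": 2}, False),  # same cell, no edge at t+1
    ({"cn": 1, "deg": 6, "ll": 1}, True),   # same cell
    ({"cn": 0, "deg": 1, "ll": 0}, False),  # a different cell
]
cube = build_datacube(pairs)
for cell, (eta, eta_plus) in sorted(cube.items()):
    print(cell, eta, eta_plus, eta_plus / eta)
```

Only occupied cells are stored, so the datacube stays small even when the number of possible cells is large.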

Including graph-based features
How do we form these datacubes? Vanilla idea: one datacube for each transition $G_t \to G_{t+1}$, aggregated over all pairs (i,j).
- Drawback: this does not allow for differently evolving communities.

Our Model
How do we form these datacubes? Our model: one datacube for each neighborhood.
- This captures local evolution.

Our Model
Datacube for node i at time t, for each feature cell s (e.g. 1 ≤ cn(i,j) ≤ 3, 3 ≤ deg(i,j) ≤ 6, 1 ≤ ℓℓ(i,j) ≤ 2):
- $\eta_t(i,s)$ = number of node pairs with feature s in the neighborhood of i at time t
- $\eta_t^+(i,s)$ = number of node pairs with feature s in the neighborhood of i at time t which got connected at time t+1

Neighborhood: $N_t(i)$ = nodes within 2 hops of i. Features are extracted from $(N_{t-p}, \dots, N_t)$.
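The 2-hop neighborhood and a per-node datacube can be sketched as follows. This is a simplified illustration under explicit assumptions: the counted pairs are taken to be all unordered pairs of nodes inside $N_t(i)$, the feature set is reduced to the common-neighbor count alone, and the window $(N_{t-p}, \dots, N_t)$ is collapsed to a single snapshot. `neighborhood` and `local_datacube` are hypothetical names.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # adjacency dict of one undirected snapshot

def neighborhood(g: Graph, i: int, hops: int = 2) -> Set[int]:
    """N_t(i): nodes within `hops` hops of i (including i), found by breadth-first search."""
    seen = {i}
    frontier = deque([(i, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == hops:
            continue  # do not expand beyond the hop limit
        for nb in g.get(node, set()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen

def local_datacube(g_t: Graph, g_next: Graph, i: int) -> Dict[int, List[int]]:
    """Hypothetical local datacube for node i, keyed by common-neighbor count only.

    Assumption: the counted pairs are all unordered pairs of nodes inside N_t(i).
    Returns {cn: [eta, eta_plus]}, where eta_plus counts pairs linked in g_next.
    """
    cube: Dict[int, List[int]] = {}
    for u, v in combinations(sorted(neighborhood(g_t, i)), 2):
        cn = len(g_t.get(u, set()) & g_t.get(v, set()))
        counts = cube.setdefault(cn, [0, 0])
        counts[0] += 1
        if v in g_next.get(u, set()):
            counts[1] += 1
    return cube

# Path graph 1-2-3-4-5 at time t; at t+1 edge 1-3 appears and edge 2-3 disappears.
g_t = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
g_next = {1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3, 5}, 5: {4}}
print(neighborhood(g_t, 1), local_datacube(g_t, g_next, 1))
```

Because each node gets its own datacube, two regions of the graph that evolve differently produce different counts, which the single aggregated datacube of the vanilla idea would average away.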

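Our Model
The datacube $d_t(i)$ captures graph evolution
- in the local neighborhood of a node,
- in the recent past.

Model: $Y_{T+1}(i,j) \mid G_1, G_2, \dots, G_T \sim \text{Bernoulli}\big(g_{G_1, G_2, \dots, G_T}(i,j)\big)$ with $g_{G_1, G_2, \dots, G_T}(i,j) = g\big(d_t(i), s_t(i,j)\big)$, where $s_t(i,j)$ are the features of the pair and $d_t(i)$ encodes the local evolution patterns. What is $g(\cdot)$?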


Kernel Estimator for g
Take the query, consisting of the datacube at time $T-1$ and the feature vector at time $T$, and compute its similarities to the (datacube, feature) pairs from the earlier timesteps t = 1, 2, 3, …

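Kernel Estimator for g
- Factorize the similarity function as $K(d, d')\,\mathbb{I}\{s = s'\}$: a kernel on datacubes times an indicator that the pair features match.
- This allows computation of $g(\cdot)$ via simple lookups.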

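Kernel Estimator for g
Compute similarities only between datacubes: each historical datacube (t = 1, 2, 3, …) receives a weight, and contributes its counts for the query's feature cell. In the slide's example, the weights $w_1, \dots, w_4$ are paired with the counts $(\eta_1, \eta_1^+), \dots, (\eta_4, \eta_4^+)$.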
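Reading the figure as a weighted pooling of counts, the estimate for a query with datacube $d$ and feature cell $s$ would be $\hat g = \sum_k w_k \eta_k^+(s) / \sum_k w_k \eta_k(s)$ with $w_k = K(d, d_k)$. The Python sketch below implements that reading. The formula is a reconstruction from the slide labels, and `kernel_estimate`, the dictionary layout, and the 0.0 fallback for an empty cell are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, ...]
Datacube = Dict[Cell, Tuple[int, int]]  # cell -> (eta, eta_plus)

def kernel_estimate(
    query_cube: Datacube,
    query_cell: Cell,
    history: List[Datacube],
    kernel: Callable[[Datacube, Datacube], float],
) -> float:
    """Weighted ratio sum_k w_k * eta_plus_k(s) / sum_k w_k * eta_k(s).

    The factorized similarity K(d, d') * I{s == s'} means only the query cell s
    of each historical datacube contributes, so each term is a dictionary lookup.
    """
    num = 0.0
    den = 0.0
    for cube in history:
        w = kernel(query_cube, cube)                   # similarity between datacubes only
        eta, eta_plus = cube.get(query_cell, (0, 0))   # indicator picks the matching cell
        num += w * eta_plus
        den += w * eta
    return num / den if den > 0 else 0.0  # assumed fallback when no history has this cell

def uniform_kernel(a: Datacube, b: Datacube) -> float:
    """Hypothetical kernel that weights every historical datacube equally."""
    return 1.0

history = [{(1,): (10, 5)}, {(1,): (100, 10)}]
print(kernel_estimate({}, (1,), history, uniform_kernel))  # pooled ratio 15/110
```

With a uniform kernel the estimate reduces to a pooled ratio over all history; a sharper kernel concentrates the weight on datacubes whose local evolution resembles the query's.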

Kernel Estimator for g
- Factorizing the similarity function as $K(d, d')\,\mathbb{I}\{s = s'\}$ allows computation of $g(\cdot)$ via simple lookups.
- What is $K(\cdot,\cdot)$?

Similarity between two datacubes
Idea 1: for each cell s, take $(\eta_1^+/\eta_1 - \eta_2^+/\eta_2)^2$ and sum.
Problem:
- The magnitude of η is ignored.
- 5/10 and 50/100 are treated equally.

Consider the distribution instead, given $(\eta_1, \eta_1^+)$ and $(\eta_2, \eta_2^+)$.
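A small numerical check of the problem with Idea 1: two cells with counts 5/10 and 50/100 have distance zero even though the second is far better supported. To illustrate why the distribution matters, the sketch also treats $\eta^+/\eta$ as a Beta posterior under a uniform prior; that Beta model is an assumption of this example, not something the slide specifies. `idea1_distance` and `beta_posterior_sd` are hypothetical names.

```python
from typing import Dict, Tuple

Datacube = Dict[Tuple[int, ...], Tuple[int, int]]  # cell -> (eta, eta_plus), eta >= 1

def idea1_distance(d1: Datacube, d2: Datacube) -> float:
    """Idea 1: sum over shared cells of (eta1_plus/eta1 - eta2_plus/eta2)^2."""
    total = 0.0
    for cell in d1.keys() & d2.keys():
        (n1, p1), (n2, p2) = d1[cell], d2[cell]
        total += (p1 / n1 - p2 / n2) ** 2
    return total

def beta_posterior_sd(eta: int, eta_plus: int) -> float:
    """Std. dev. of Beta(eta_plus + 1, eta - eta_plus + 1): uniform prior on the link rate (assumed)."""
    a, b = eta_plus + 1, eta - eta_plus + 1
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

small = {(0,): (10, 5)}
large = {(0,): (100, 50)}
print(idea1_distance(small, large))                            # 0.0: Idea 1 cannot tell them apart
print(beta_posterior_sd(10, 5), beta_posterior_sd(100, 50))    # the 50/100 cell is far more certain
```

Comparing distributions rather than point ratios lets the similarity reflect how much evidence each cell carries.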

[Figure: similarity between two datacubes]

Kernel Estimator for g
Want to show:


Consistency of Estimator
Lemma 1: As $T \to \infty$, for some $R > 0$,
Proof using: as $T \to \infty$,

Consistency of Estimator
Lemma 2: As $T \to \infty$,

Consistency of Estimator
Assumption: finite graph.
Proof sketch:
- The dynamics are Markovian with a finite state space.
- The chain must eventually enter a closed, irreducible communication class.
- Geometric ergodicity follows if the class is aperiodic (if not, more complicated…).
- This gives strong mixing with exponential decay.
- Variances decay as o(1/T).
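A toy numerical illustration of the mixing step, not of the estimator itself: a two-state Markov chain (a hypothetical stand-in for the finite graph dynamics) is finite, irreducible, and aperiodic, so its time averages concentrate as the horizon T grows. The flip probability, horizons, replica count, and seed are arbitrary choices for this sketch.

```python
import random
from statistics import pvariance

def time_average(rng: random.Random, T: int, flip: float = 0.1) -> float:
    """Fraction of T steps a two-state Markov chain spends in state 1.

    Each step flips the state with probability `flip`; the chain is finite,
    irreducible and aperiodic, and it starts from its stationary law (1/2, 1/2).
    """
    state = rng.random() < 0.5
    ones = 0
    for _ in range(T):
        if rng.random() < flip:
            state = not state
        ones += int(state)
    return ones / T

def variance_of_average(T: int, replicas: int = 400, seed: int = 0) -> float:
    """Empirical variance of the time average across independent replicas."""
    rng = random.Random(seed)
    return pvariance([time_average(rng, T) for _ in range(replicas)])

v_short = variance_of_average(100)
v_long = variance_of_average(1000)
print(v_short, v_long)  # the longer horizon gives a much smaller variance
```

The same mechanism, applied to the finite-state dynamics of the graph, is what lets the count-based averages in the estimator settle down as T grows.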

Consistency of Estimator
Theorem:
Proof sketch:
- for some $R > 0$
- So


Scalability
Full solution:
- Summing over all n datacubes for all T timesteps is infeasible.

Approximate solution:
- Sum only over the nearest neighbors of the query datacube.

How do we find nearest neighbors?
- Locality Sensitive Hashing (LSH) [Indyk+/98, Broder+/98]
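The approximate solution keeps the kernel-weighted ratio but sums only over the k historical datacubes most similar to the query. In this minimal sketch the neighbors are found by brute force, which still scores every datacube; it only shows the truncated sum that an LSH index would feed with candidate neighbors. `knn_estimate`, the toy kernel, and the 0.0 fallback are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, ...]
Datacube = Dict[Cell, Tuple[int, int]]  # cell -> (eta, eta_plus)

def knn_estimate(
    query_cube: Datacube,
    query_cell: Cell,
    history: List[Datacube],
    kernel: Callable[[Datacube, Datacube], float],
    k: int,
) -> float:
    """Kernel-weighted ratio restricted to the k historical datacubes most similar to the query."""
    scored = [(kernel(query_cube, cube), idx) for idx, cube in enumerate(history)]
    num = den = 0.0
    for w, idx in heapq.nlargest(k, scored):  # keep only the k largest similarities
        eta, eta_plus = history[idx].get(query_cell, (0, 0))
        num += w * eta_plus
        den += w * eta
    return num / den if den > 0 else 0.0  # assumed fallback when no neighbor has data

history = [{(0,): (10, 9)}, {(0,): (10, 1)}, {(0,): (10, 5)}]
WEIGHTS = {9: 0.9, 1: 0.1, 5: 0.5}  # hypothetical similarities, keyed by eta_plus for brevity

def toy_kernel(q: Datacube, c: Datacube) -> float:
    return WEIGHTS[c[(0,)][1]]

print(knn_estimate({}, (0,), history, toy_kernel, k=1), knn_estimate({}, (0,), history, toy_kernel, k=2))
```

Dropping low-similarity datacubes changes the estimate only slightly when their weights are small, which is what makes the nearest-neighbor approximation reasonable.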

Using LSH
Devise a hashing function for datacubes such that
- "similar" datacubes tend to be hashed to the same bucket,
- where "similar" means a small total variation distance between cells of datacubes.

Using LSH
Step 1: Map datacubes to bit vectors.
- Use B1 buckets to discretize [0,1].
- Use B2 bits for each bucket.
- For probability mass p, the first bits (a number determined by p) are set to 1.
- Total M·B1·B2 bits, where M = maximum number of occupied cells ≪ total number of cells.
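One way to realize Step 1, sketched in Python under loudly stated assumptions: the exact number of leading 1-bits for mass p is not preserved in the slide, so this sketch assumes a unary code that sets the first round(p·B1) buckets (B2 bits each) to 1. With that choice, the Hamming distance between two codes divided by B1·B2 approximates the L1 distance between the cell masses, which is twice the total variation distance when the masses form a distribution over cells. `encode`, `hamming`, and the shared cell ordering `cells` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]

def encode(masses: Dict[Cell, float], cells: Sequence[Cell], b1: int, b2: int) -> List[int]:
    """Unary bit encoding: for each of the M cells, B1 buckets of B2 bits each.

    Assumption: a cell with mass p sets its first round(p * B1) buckets,
    i.e. round(p * B1) * B2 bits, to 1 (Python's round uses round-half-to-even).
    """
    bits: List[int] = []
    for cell in cells:  # same cell order for every datacube, so positions line up
        p = masses.get(cell, 0.0)
        ones = round(p * b1) * b2
        bits.extend([1] * ones + [0] * (b1 * b2 - ones))
    return bits

def hamming(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of positions where two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(x, y))

cells = [(0,), (1,), (2,)]                     # M = 3 occupied cells
d1 = {(0,): 0.5, (1,): 0.3, (2,): 0.2}
d2 = {(0,): 0.2, (1,): 0.3, (2,): 0.5}
b1, b2 = 10, 4
x, y = encode(d1, cells, b1, b2), encode(d2, cells, b1, b2)
l1 = sum(abs(d1[c] - d2[c]) for c in cells)
print(len(x), hamming(x, y) / (b1 * b2), l1)   # length is M * B1 * B2
```

A unary code is a natural fit because flipping one bucket changes exactly B2 bits, so Hamming distance grows linearly with the difference in mass.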


Fast Search Using LSH
[Figure: datacube bit vectors (e.g. 1111111111000000000111111111000) hashed into buckets keyed by short bit strings such as 0000, 0001, 0011, 1011, 1111]
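The bucket keys in the figure are short bit strings, which is consistent with the standard bit-sampling LSH family for Hamming distance: each hash function reads a fixed random subset of bit positions, and vectors that differ in few bits usually share a key. Whether the slides use exactly this family is an assumption; the sketch below, with hypothetical helpers `make_bit_sampler`, `build_table`, and `query_candidates`, uses several tables to improve recall.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Key = Tuple[int, ...]

def make_bit_sampler(n_bits: int, key_len: int, seed: int) -> List[int]:
    """One hash function of the bit-sampling family: key_len random bit positions."""
    return random.Random(seed).sample(range(n_bits), key_len)

def hash_key(bits: Sequence[int], positions: Sequence[int]) -> Key:
    """Bucket key: the vector's bits at the sampled positions."""
    return tuple(bits[p] for p in positions)

def build_table(vectors: Dict[str, Sequence[int]], positions: Sequence[int]) -> Dict[Key, List[str]]:
    """Hash every named bit vector into buckets keyed by its sampled bits."""
    table: Dict[Key, List[str]] = defaultdict(list)
    for name, bits in vectors.items():
        table[hash_key(bits, positions)].append(name)
    return dict(table)

def query_candidates(tables: List[Dict[Key, List[str]]], samplers: List[List[int]], bits: Sequence[int]) -> Set[str]:
    """Union of the query's buckets across all tables: the candidate nearest neighbors."""
    found: Set[str] = set()
    for table, positions in zip(tables, samplers):
        found.update(table.get(hash_key(bits, positions), []))
    return found

n_bits, key_len, n_tables = 40, 8, 4
a = [1] * 20 + [0] * 20
b = a[:]
b[0] = 0                      # one bit away from a
c = [0] * 20 + [1] * 20       # differs from a in every bit
vectors = {"a": a, "b": b, "c": c}
samplers = [make_bit_sampler(n_bits, key_len, seed=s) for s in range(n_tables)]
tables = [build_table(vectors, pos) for pos in samplers]
print(sorted(query_candidates(tables, samplers, a)))
```

Only the datacubes in the returned candidate set need their kernel weights computed, which is where the speed-up over summing over all n datacubes comes from.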


Experiments

Setup
- Training data: $G_1, G_2, \dots, G_T$
- Test data: $G_{T+1}$

Simulations
Social network model of Hoff et al.:
- Each node has an independently drawn feature vector.
- Edge(i,j) depends on the features of i and j.
- Seasonality effect: feature importance varies with the season, giving different communities in each season.
- Feature vectors evolve smoothly over time, giving evolving community structures.

Simulations
- NonParam is much better than the other methods in the presence of seasonality.
- CN (common neighbors), AA (Adamic-Adar), and Katz implicitly assume smooth evolution.
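For reference, the three baselines are standard static scores. This sketch assumes they are computed on a single snapshot (for example the most recent one, $G_T$); the helper names and the Katz parameters (β = 0.05, walks up to length 5) are illustrative assumptions, not the settings used in the experiments.

```python
import math
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency dict of one undirected snapshot

def cn_score(g: Graph, i: int, j: int) -> int:
    """Common neighbors: number of nodes adjacent to both i and j."""
    return len(g.get(i, set()) & g.get(j, set()))

def aa_score(g: Graph, i: int, j: int) -> float:
    """Adamic-Adar: sum over common neighbors z of 1 / log(deg(z)).

    A common neighbor of two distinct nodes has degree >= 2, so log(deg) > 0.
    """
    common = g.get(i, set()) & g.get(j, set())
    return sum(1.0 / math.log(len(g[z])) for z in common)

def katz_score(g: Graph, i: int, j: int, beta: float = 0.05, max_len: int = 5) -> float:
    """Truncated Katz index: sum_{l=1..max_len} beta**l * (number of walks of length l from i to j)."""
    walks: Dict[int, int] = {i: 1}  # walks[v] = number of current-length walks from i ending at v
    score = 0.0
    for length in range(1, max_len + 1):
        nxt: Dict[int, int] = {}
        for v, count in walks.items():
            for nb in g.get(v, set()):
                nxt[nb] = nxt.get(nb, 0) + count
        walks = nxt
        score += beta ** length * walks.get(j, 0)
    return score

g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(cn_score(g, 1, 4), aa_score(g, 1, 4), katz_score(g, 1, 4, max_len=2))
```

All three scores are functions of one graph's structure, so they carry no notion of a season in which different communities form, which is consistent with their weaker performance under seasonality.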

Sensor Network
Data: www.select.cs.cmu.edu/data

Summary
Link formation is assumed to depend on
- the neighborhood's evolution
- over a time window.

This admits a kernel-based estimator with
- consistency
- scalability via LSH.

It works particularly well for
- seasonal effects
- differently evolving communities.
