Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

 Which pair of nodes {i,j} should be connected?  Variant: node i is given Friend suggestion in Facebook Should Facebook suggest Alice to Bob as a future friend? Bob Alice

 Which pair of nodes {i,j} should be connected?  Variant: node i is given Alice Bob Charlie Movie recommendation in Netflix Should Netflix suggest this movie to Alice?

Paper #2 Paper #1 SVM margin maximum classification paper-has-word paper-cites-paper paper-has-word large scale Is paper #1 relevant to the query “SVM”? Relevance search in databases

Classifying Hand Written Digits Are these two digits the same? Zhu et al, 2003

 Link prediction problems rely on Homophily  similar nodes are more likely to be connected.  Use a graph-based proximity measure between the query node q and other nodes And now predict a link between q and the highest ranking node which is not already connected.

 Predict link between nodes With the minimum number of hops With max common neighbors (length 2 paths) 8 followers 1000 followers Prolific common friends  Less evidence Less prolific  Much more evidence Alice Bob Charlie The Adamic/Adar score gives more weight to low degree common neighbors.

 Predict link between nodes With the minimum number of hops With max common neighbors (length 2 paths) With larger Adamic/Adar With more short paths (e.g. length 3 paths ) …

RandomShortest Path Common Neighbors Adamic/AdarEnsemble of short paths Link prediction accuracy* *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007 How do we justify these observations? Especially if the graph is sparse

 Link prediction problems rely on Homophily  similar nodes are more likely to be connected.  Different heuristics are trying to predict this underlying or “latent” nearness of nodes.  Easier to encode this by using a latent-space model for generating links.

11 Nodes are uniformly distributed in a latent space The problem of link prediction is to find the nearest neighbor who is not currently linked to the node.  Equivalent to inferring distances in the latent space Raftery et al.’s Model: Unit volume universe Points close in this space are more likely to be connected.

12 1 ½ Higher probability of linking Two sources of randomness Point positions: uniform in D dimensional space Linkage probability: logistic with parameters α, r α, r and D are known radius r α determines the steepness

13 Generative model Link Prediction Heuristics node a Most likely neighbor of node i ? node b Compare A few properties  Can justify the empirical observations  We also offer some new prediction algorithms

RandomShortest Path Common Neighbors Adamic/AdarEnsemble of short paths Link prediction accuracy *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007 Especially if the graph is sparse

 Pr 2 (i,j) = Pr(common neighbor|d ij ) Product of two logistic probabilities, integrated over a volume determined by d ij As α  ∞ Logistic  Step function Much easier to analyze! i j

16 Everyone has same radius r i j Empirical Bernstein Bounds on distance V(r)=volume of radius r in D dims η =Number of common neighbors Unit volume universe

 OPT = node closest to i  MAX = node with max common neighbors with i  Theorem: d OPT ≤ d MAX ≤ d OPT + 2[ ε/V(1)] 1/D ε = c 1 (var N /N) ½ + c 2 /(N -1 ) D=dimensionality w.h.p Common neighbors is an asymptotically optimal heuristic as N  ∞

 Node k has radius r k.  i  k if d ik ≤ r k (Directed graph)  r k captures popularity of node k 18 i k j Type 1: i  k  j riri rjrj A(r i, r j,d ij ) Type 2: i  k  j i k j rkrk rkrk A(r k, r k,d ij )

i j k η 1 ~ Bin[N 1, A(r 1, r 1, d ij )] η 2 ~ Bin[N 2, A(r 2, r 2, d ij )] Example graph:  N 1 nodes of radius r 1 and N 2 nodes of radius r 2  r 1 << r 2 Maximize Pr[ η 1, η 2 | d ij ] = product of two binomials w(r 1 ) E[ η 1 |d*] + w(r 2 ) E[ η 2 |d*] = w(r 1 ) η 1 + w(r 2 ) η 2 RHS ↑  LHS ↑  d* ↓

{ Variance Jacobian Small variance  Presence is more surprising r is close to max radius Small variance  Absence is more surprising Adamic/Adar 1/r Real world graphs generally fall in this range

Q r = Fraction of nodes with radius ≤ r which are common neighbors T R = Fraction of nodes with radius ≥ R which are common neighbors Number of common neighbors of a given radius Large Q r  small d ij Small T R  large d ij r

RandomShortest Path Common Neighbors Adamic/AdarEnsemble of short paths Link prediction accuracy *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007 Especially if the graph is sparse

 Common neighbors = 2 hop paths  Analysis of longer paths: two components 1. Bounding E( η l | d ij ). [η l = # l hop paths]  Bounds Pr l (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. η l ≈ E( η l | d ij ) Triangulation

 Common neighbors = 2 hop paths  Analysis of longer paths: two components 1. Bounding E( η l | d ij ) [η l = # l hop paths]  Bounds Pr l (i,j) by using triangle inequality on a series of common neighbor probabilities. 2. η l ≈ E( η l | d ij ) Bounded dependence of η l on position of each node  Can use McDiarmid’s inequality to bound | η l - E( η l | d ij )|

 Bound d ij as a function of η l using McDiarmid’s inequality.  For l’ ≥ l we need η l’ >> η l to obtain similar bounds  Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist.

1 ½ Factor ¼ weak bound for Logistic Can be made tighter, as logistic approaches the step function.

 Three key ingredients 1. Closer points are likelier to be linked. Small World Model- Watts, Strogatz, 1998, Kleinberg 2001 2. Triangle inequality holds  necessary to extend to l hop paths 3. Points are spread uniformly at random  Otherwise properties will depend on location as well as distance

RandomShortest Path Common Neighbors Adamic/AdarEnsemble of short paths Link prediction accuracy* *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007 The number of paths matters, not the length For large dense graphs, common neighbors are enough Differentiating between different degrees is important In sparse graphs, length 3 or more paths help in prediction.

 Combine bounds from different radii  But there might not be enough data to obtain individual bounds from each radius  New sweep estimator  Q r = Fraction of nodes w. radius ≤ r, which are common neighbors.  Higher Q r  smaller d ij w.h.p

 Q r = Fraction of nodes w. radius ≤ r, which are common neighbors larger Q r  smaller d ij w.h.p  T R : = Fraction of nodes w. radius ≥ R, which are common neighbors.  Smaller T R  large d ij w.h.p

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Similar presentations

Presentation on theme: "Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Similar presentations

Presentation on theme: "Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)"— Presentation transcript:

Similar presentations

About project

Feedback