Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)

Presentation transcript:

• Which pair of nodes {i, j} should be connected?
• Variant: node i is given
Examples: friend suggestion in Facebook (figure labels: Alice, Bob, Charlie); movie recommendation in Netflix

• Predict a link between nodes
  - with the minimum number of hops
  - with the most common neighbors (length-2 paths)
[Figure labels: Alice, Bob, Charlie; 8 followers; 1000 followers]
Prolific common friends → less evidence
Less prolific common friends → much more evidence
The Adamic/Adar score gives more weight to low-degree common neighbors.
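The two scores on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy graph and the names `common_neighbors` and `adamic_adar` are hypothetical, and the Adamic/Adar score is taken in its usual form, the sum of 1/log(degree) over common neighbors.

```python
import math

def common_neighbors(adj, i, j):
    """Return the set of nodes adjacent to both i and j."""
    return adj[i] & adj[j]

def adamic_adar(adj, i, j):
    """Sum 1/log(degree) over common neighbors, so low-degree neighbors count more.

    A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[k])) for k in common_neighbors(adj, i, j))

# Hypothetical toy graph: bob has 2 friends, charlie has 4.
edges = [("alice", "bob"), ("alice", "charlie"), ("bob", "dana"),
         ("charlie", "dana"), ("charlie", "erin"), ("charlie", "frank")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cn = common_neighbors(adj, "alice", "dana")
aa = adamic_adar(adj, "alice", "dana")
print(sorted(cn), round(aa, 4))
```

In this toy graph the low-degree common friend (bob) contributes 1/log 2 to the score, twice as much as the higher-degree one (charlie, 1/log 4).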

• Predict a link between nodes
  - with the minimum number of hops
  - with more common neighbors (length-2 paths)
  - with a larger Adamic/Adar score
  - with more short paths (e.g. length-3 paths)
  - …
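Counting short paths is a small extension of counting common neighbors. The sketch below counts simple paths of length 3 between two nodes; the function name `count_paths_len3` and the toy graph are hypothetical, chosen so that the pair has no common neighbor at all but several length-3 paths.

```python
def count_paths_len3(adj, i, j):
    """Count simple paths i - a - b - j in an undirected graph without self-loops."""
    count = 0
    for a in adj[i]:
        if a == j:
            continue  # a path may not pass through its own endpoint
        for b in adj[a]:
            if b != i and b != j and j in adj[b]:
                count += 1
    return count

# Hypothetical sparse graph: nodes 0 and 3 share no neighbor.
edges = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 2), (0, 5), (5, 6), (6, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(len(adj[0] & adj[3]), count_paths_len3(adj, 0, 3))
```

Here the common-neighbor count cannot separate the pair (0, 3) from any unrelated pair, while the length-3 count can.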

[Figure: link prediction accuracy* by heuristic: Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
How do we justify these observations?
Especially if the graph is sparse

Raftery et al.'s model:
• Nodes are uniformly distributed in a latent space (unit volume universe)
• Points close in this space are more likely to be connected.
The problem of link prediction is to find the nearest neighbor that is not currently linked to the node.
→ Equivalent to inferring distances in the latent space

Two sources of randomness
• Point positions: uniform in $D$-dimensional space
• Linkage probability: logistic with parameters $\alpha$, $r$
$\alpha$, $r$ and $D$ are known
[Figure: linkage probability curve with ticks at 1 and ½; labels: higher probability of linking; radius $r$; $\alpha$ determines the steepness]
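A minimal sketch of sampling from such a model. The slide only says the linkage probability is logistic with parameters α and r; the specific form 1 / (1 + exp(α(d − r))) used here is an assumption, chosen so the probability is ½ at distance r and α controls the steepness. The names `link_probability` and `sample_graph`, and the use of the unit cube with Euclidean distance, are also illustrative choices.

```python
import math
import random

def link_probability(d, r, alpha):
    """Assumed logistic form: 1/2 at d == r, approaching a step at r as alpha grows."""
    z = alpha * (d - r)
    if z > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def sample_graph(n, dim, r, alpha, seed=0):
    """Draw n points uniformly in the unit cube, then sample each edge independently."""
    rng = random.Random(seed)
    pts = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pts[u], pts[v])
            if rng.random() < link_probability(d, r, alpha):
                adj[u].add(v)
                adj[v].add(u)
    return pts, adj

pts, adj = sample_graph(n=200, dim=2, r=0.1, alpha=50.0)
print(sum(len(nb) for nb in adj.values()) / len(adj))  # average degree
```

Both sources of randomness appear explicitly: the point positions and the per-pair coin flips.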

[Figure: link prediction accuracy by heuristic: Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Especially if the graph is sparse

• $\Pr_2(i, j) = \Pr(\text{common neighbor} \mid d_{ij})$
Product of two logistic probabilities, integrated over a volume determined by $d_{ij}$
As $\alpha \to \infty$, logistic → step function
Much easier to analyze!
[Figure: nodes i and j]

Everyone has the same radius $r$ (unit volume universe)
Empirical Bernstein bounds on distance
$V(r)$ = volume of a ball of radius $r$ in $D$ dimensions
$\eta$ = number of common neighbors
[Figure: nodes i and j]

• OPT = node closest to $i$
• MAX = node with the most common neighbors with $i$
• Theorem: w.h.p., $d_{OPT} \le d_{MAX} \le d_{OPT} + 2\,[\epsilon / V(1)]^{1/D}$
  where $\epsilon = c_1 (\mathrm{var}_N / N)^{1/2} + c_2 / (N - 1)$ and $D$ = dimensionality
Common neighbors is an asymptotically optimal heuristic as $N \to \infty$
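A small simulation can make the two quantities in the theorem concrete. This is a sketch under simplifying assumptions that are not from the slides: points lie in the unit square (so D = 2, with boundary effects ignored), linkage is the step function (two nodes are linked iff they are within distance r), and OPT and MAX are taken over all nodes other than i. The function name `simulate` and its defaults are hypothetical.

```python
import math
import random

def simulate(n=400, r=0.15, seed=0):
    """Return (d_opt, d_max) for node 0 under a step-function link model."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]

    def dist(a, b):
        return math.dist(pts[a], pts[b])

    # Step-function linkage: connect two nodes iff they are within distance r.
    nbrs = [{v for v in range(n) if v != u and dist(u, v) <= r} for u in range(n)]

    i = 0
    others = range(1, n)
    opt = min(others, key=lambda j: dist(i, j))               # closest node to i
    best = max(others, key=lambda j: len(nbrs[i] & nbrs[j]))  # most common neighbors
    return dist(i, opt), dist(i, best)

d_opt, d_max = simulate()
print(d_opt, d_max, d_max - d_opt)
```

The lower inequality holds by construction, since OPT is the closest node. The printed gap is the quantity the theorem bounds; rerunning with larger `n` is one way to watch how it behaves.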

• Node $k$ has radius $r_k$.
• $i \to k$ if $d_{ik} \le r_k$ (directed graph)
• $r_k$ captures the popularity of node $k$
Type 1: $i \leftarrow k \to j$: $A(r_i, r_j, d_{ij})$
Type 2: $i \to k \leftarrow j$: $A(r_k, r_k, d_{ij})$
[Figures: nodes i, k, j with radii $r_i$, $r_j$ (Type 1) and $r_k$ (Type 2)]

Example graph:
• $N_1$ nodes of radius $r_1$ and $N_2$ nodes of radius $r_2$
• $r_1 \ll r_2$
$\eta_1 \sim \mathrm{Bin}[N_1, A(r_1, r_1, d_{ij})]$
$\eta_2 \sim \mathrm{Bin}[N_2, A(r_2, r_2, d_{ij})]$
Maximize $\Pr[\eta_1, \eta_2 \mid d_{ij}]$ = product of two binomials
$w(r_1)\,E[\eta_1 \mid d^*] + w(r_2)\,E[\eta_2 \mid d^*] = w(r_1)\,\eta_1 + w(r_2)\,\eta_2$
RHS ↑ → LHS ↑ → $d^*$ ↓
[Figure: nodes i, j, k]
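One way to see where an identity of this shape comes from is to differentiate the log-likelihood of the two binomials. The derivation below is a sketch, assuming the two counts are independent and writing $A_t(d) = A(r_t, r_t, d)$ for $t = 1, 2$. The explicit form of the weight $w_t$ is an assumption of this sketch rather than something stated on the slide, and it depends on $d^*$ as well as on $r_t$.

```latex
\begin{align*}
\log \Pr[\eta_1, \eta_2 \mid d]
  &= \sum_{t=1}^{2} \Big[ \eta_t \log A_t(d) + (N_t - \eta_t) \log\big(1 - A_t(d)\big) \Big] + \text{const} \\
\frac{\partial}{\partial d} \log \Pr[\eta_1, \eta_2 \mid d]
  &= \sum_{t=1}^{2} A_t'(d) \, \frac{\eta_t - N_t A_t(d)}{A_t(d)\big(1 - A_t(d)\big)} \\
\text{at } d = d^*: \quad \sum_{t=1}^{2} w_t \, N_t A_t(d^*)
  &= \sum_{t=1}^{2} w_t \, \eta_t,
  \qquad w_t = \frac{-A_t'(d^*)}{A_t(d^*)\big(1 - A_t(d^*)\big)}
\end{align*}
```

Since $E[\eta_t \mid d^*] = N_t A_t(d^*)$, the last line has the same form as the slide's identity. The overlap $A_t$ shrinks as the distance grows, so $A_t' \le 0$ and the weights are nonnegative.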

[Figure annotations: Variance; Jacobian]
Small variance → presence is more surprising
r is close to max radius
Small variance → absence is more surprising
Adamic/Adar: 1/r
Real-world graphs generally fall in this range

[Figure: link prediction accuracy by heuristic: Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Especially if the graph is sparse

• Common neighbors = 2-hop paths
• Analysis of longer paths: two components
  1. Bounding $E(\eta_\ell \mid d_{ij})$, where $\eta_\ell$ = number of $\ell$-hop paths
     → Bounds $\Pr_\ell(i, j)$ by using the triangle inequality on a series of common-neighbor probabilities (triangulation).
  2. $\eta_\ell \approx E(\eta_\ell \mid d_{ij})$
     Bounded dependence of $\eta_\ell$ on the position of each node
     → Can use McDiarmid's inequality to bound $|\eta_\ell - E(\eta_\ell \mid d_{ij})|$

• Bound $d_{ij}$ as a function of $\eta_\ell$ using McDiarmid's inequality.
• For $\ell' \ge \ell$ we need $\eta_{\ell'} \gg \eta_\ell$ to obtain similar bounds.
• Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist.

[Figure: logistic linkage probability curve with ticks at 1 and ½]
Factor ¼: weak bound for the logistic
Can be made tighter as the logistic approaches the step function.

• Three key ingredients
  1. Closer points are likelier to be linked. (Small-world model: Watts & Strogatz, 1998; Kleinberg)
  2. Triangle inequality holds → necessary to extend to $\ell$-hop paths
  3. Points are spread uniformly at random → otherwise properties will depend on location as well as distance

[Figure: link prediction accuracy* by heuristic: Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
• The number of paths matters, not the length
• For large dense graphs, common neighbors are enough
• Differentiating between different degrees is important
• In sparse graphs, paths of length 3 or more help in prediction.

[Figure labels: Generative model; Link prediction heuristics; A few properties; Compare; "Most likely neighbor of node i?" with candidates node a and node b]
• Can justify the empirical observations
• We also offer some new prediction algorithms

• Combine bounds from different radii
• But there might not be enough data to obtain individual bounds from each radius
• New sweep estimator
• $Q_r$ = fraction of nodes with radius ≤ $r$ which are common neighbors
• Higher $Q_r$ → smaller $d_{ij}$ w.h.p.

• $Q_r$ = fraction of nodes with radius ≤ $r$ which are common neighbors
  Larger $Q_r$ → smaller $d_{ij}$ w.h.p.
• $T_R$ = fraction of nodes with radius ≥ $R$ which are common neighbors
  Smaller $T_R$ → larger $d_{ij}$ w.h.p.
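The two sweep quantities are simple to compute once each node's radius and common-neighbor status are known. A minimal sketch follows; the function names `q_fraction` and `t_fraction`, the input layout (parallel lists), and the toy values are all hypothetical, and the step from these fractions to an actual bound on the distance is not shown.

```python
def q_fraction(radii, is_common, r):
    """Q_r: fraction of nodes with radius <= r that are common neighbors of the pair."""
    flags = [c for rad, c in zip(radii, is_common) if rad <= r]
    return sum(flags) / len(flags) if flags else None

def t_fraction(radii, is_common, big_r):
    """T_R: fraction of nodes with radius >= R that are common neighbors of the pair."""
    flags = [c for rad, c in zip(radii, is_common) if rad >= big_r]
    return sum(flags) / len(flags) if flags else None

# Hypothetical data: one radius and one common-neighbor flag per node.
radii = [0.1, 0.1, 0.2, 0.5, 0.5]
is_common = [True, False, True, True, False]

print(q_fraction(radii, is_common, 0.2), t_fraction(radii, is_common, 0.5))
```

Pooling all nodes below (or above) a threshold is what lets the estimator use radii for which there would be too little data individually. Both functions return `None` when no node falls in the requested range.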

[Figure: number of common neighbors of a given radius, plotted against $r$]
$Q_r$ = fraction of nodes with radius ≤ $r$ which are common neighbors
$T_R$ = fraction of nodes with radius ≥ $R$ which are common neighbors
Large $Q_r$ → small $d_{ij}$
Small $T_R$ → large $d_{ij}$