Link Prediction Topics in Data Mining Fall 2015 Bruno Ribeiro

1 Link Prediction Topics in Data Mining Fall 2015 Bruno Ribeiro

2 Overview
Deterministic Heuristic Methods
Matrix / Probabilistic Methods
Supervised Learning Approaches
© 2015 Bruno Ribeiro

3 Goal
Input: Snapshot of social network at time t
Output: Predict edges that will be added to the network at time t’ > t

4 Product Recommendation
Example of link prediction application

5 Twitter’s Who to Follow
Another example of link prediction application

6 Other Applications
A few more examples:
Fraud detection (Beutel, Akoglu, Faloutsos, KDD’15): focuses on discovering surprising link patterns
Anomaly detection (Rattigan et al., 2005): focuses on finding unlikely existing links

7 Deterministic Heuristics

8 A Very Naïve Approach
Every entity is assigned a score
Score(v) = number of friends that person v already has

9 General Approach
Assign connection weight score(u, v) to each non-existing edge (u, v)
Rank edges and recommend the ones with the highest score
Can be viewed as computing a measure of proximity or “similarity” between nodes u and v
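The recipe above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dict representation, the toy graph, and the names `rank_candidates` and `shared_neighbors` are assumptions that do not appear in the slides, and any score(u, v) could be plugged in instead.

```python
from itertools import combinations

def rank_candidates(graph, score):
    """Score every non-existing edge (u, v) and sort from highest to lowest."""
    candidates = [
        (u, v) for u, v in combinations(sorted(graph), 2)
        if v not in graph[u]          # keep only pairs that are not linked yet
    ]
    return sorted(candidates, key=lambda pair: score(graph, *pair), reverse=True)

def shared_neighbors(graph, u, v):
    """Placeholder proximity measure: number of neighbors that u and v share."""
    return len(graph[u] & graph[v])

# Hypothetical undirected toy graph stored as node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranking = rank_candidates(toy_graph, shared_neighbors)
top_recommendation = ranking[0]   # the pair one would recommend first
```

Keeping the score function as a parameter means each heuristic only has to define how a single pair (u, v) is scored.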

10 Shortest Path
score(u, v) = negated length of the shortest path between u and v
All pairs of nodes that share a neighbor are at distance 2, so all of them would be linked
Problem: network diameter is often very small
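A minimal sketch of this score, assuming an unweighted undirected graph stored as a dict of neighbor sets (the toy graph and the function name `shortest_path_score` are illustrative, not from the slides). Breadth-first search gives the hop distance, which is then negated.

```python
from collections import deque

def shortest_path_score(graph, u, v):
    """Return the negated hop distance from u to v (-inf if v is unreachable)."""
    distance = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return -distance[node]
        for nbr in graph[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return float("-inf")

# Hypothetical toy graph: node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

non_edges = [("a", "d"), ("a", "e"), ("b", "e"), ("c", "e")]
scores = {pair: shortest_path_score(toy_graph, *pair) for pair in non_edges}
```

Even in this tiny graph three of the four candidate pairs tie at distance 2, which illustrates why the score discriminates poorly when the diameter is small.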

11 Common Neighbors
score(u, v) = number of common neighbors of vertices u and v
Problem: the score can be high simply because the vertices have too many neighbors

12 Jaccard Similarity
Fixes the “too many neighbors” problem of common neighbors by dividing the size of the intersection of the two neighbor sets by the size of their union
Score is between 0 and 1
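The contrast between the two scores can be seen on a small, made-up example. The graph below (two hubs `h1`, `h2` and a low-degree pair `p`, `q`) and all function names are assumptions chosen for illustration.

```python
def build_graph(edges):
    """Build an undirected graph (node -> set of neighbors) from an edge list."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

def common_neighbors(graph, u, v):
    """Size of the intersection of the two neighbor sets."""
    return len(graph[u] & graph[v])

def jaccard(graph, u, v):
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0

# Two hubs share 3 neighbors but each also has 5 private neighbors.
edges = [("h1", f"s{i}") for i in range(3)] + [("h2", f"s{i}") for i in range(3)]
edges += [("h1", f"x{i}") for i in range(5)] + [("h2", f"y{i}") for i in range(5)]
# A low-degree pair shares both of its 2 neighbors.
edges += [("p", "t1"), ("p", "t2"), ("q", "t1"), ("q", "t2")]
graph = build_graph(edges)

cn_hubs, cn_pair = common_neighbors(graph, "h1", "h2"), common_neighbors(graph, "p", "q")
jac_hubs, jac_pair = jaccard(graph, "h1", "h2"), jaccard(graph, "p", "q")
```

Common neighbors prefers the hub pair (3 versus 2), while Jaccard prefers the low-degree pair (3/13 versus 1), because the hubs' many unshared neighbors enlarge the union.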

13 Adamic / Adar
This score gives more weight to neighbors that are not shared with many others.
Example: Alice and Bob share a neighbor with 50 neighbors (too active): less evidence of a missing link between Alice and Bob
Bob and Carol share a neighbor with 5 neighbors (less active): more evidence of a missing link between Bob and Carol
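The transcript does not include the formula, so the sketch below assumes the form commonly used in the link prediction literature: each common neighbor z contributes 1 / log(degree of z). The graph mirrors the Alice, Bob, Carol example; the node names `busy` and `quiet` and the filler neighbors are invented for illustration.

```python
import math

def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over the common neighbors of u and v.

    A common neighbor is adjacent to both u and v, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(graph[z])) for z in graph[u] & graph[v])

graph = {}

def add_edge(u, v):
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# "busy" has 50 neighbors, two of which are Alice and Bob.
for name in ["alice", "bob"] + [f"b{i}" for i in range(48)]:
    add_edge("busy", name)
# "quiet" has 5 neighbors, two of which are Bob and Carol.
for name in ["bob", "carol"] + [f"q{i}" for i in range(3)]:
    add_edge("quiet", name)

alice_bob = adamic_adar(graph, "alice", "bob")    # 1 / log(50)
bob_carol = adamic_adar(graph, "bob", "carol")    # 1 / log(5)
```

Both pairs have exactly one common neighbor, so common neighbors cannot separate them; under this weighting the less active shared neighbor gives the higher score.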

14 Preferential Attachment
The probability of co-authorship of u and v is proportional to the product of the number of collaborators of u and v
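As a sketch, the score is just the product of the two degrees. The co-authorship graph below is a made-up toy example and the function name is an assumption.

```python
def preferential_attachment(graph, u, v):
    """Number of collaborators of u times number of collaborators of v."""
    return len(graph[u]) * len(graph[v])

# Hypothetical co-authorship graph: author -> set of collaborators.
coauthors = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

non_edges = [("a", "d"), ("a", "e"), ("b", "e"), ("c", "e")]
scores = {pair: preferential_attachment(coauthors, *pair) for pair in non_edges}
```

Unlike the neighborhood-overlap scores, this one ignores whether u and v share any collaborators; it depends only on how well connected each of them already is.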

15 Katz (1953)
Exponentially weighted sum of the number of paths of length l
For small α ≪ 1: predictions ≈ common neighbors
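The formula itself is missing from the transcript; the sketch below assumes the usual form, score(u, v) = Σ_l α^l · (number of length-l paths from u to v), truncated at a maximum length and counting walks through powers of the adjacency matrix. The toy graph, the truncation, and the function name are illustrative assumptions.

```python
def katz_scores(graph, alpha, max_length):
    """Truncated Katz score for every ordered pair of distinct nodes."""
    nodes = sorted(graph)
    n = len(nodes)
    A = [[1 if nodes[j] in graph[nodes[i]] else 0 for j in range(n)] for i in range(n)]
    walks = [row[:] for row in A]          # walks[i][j] = number of walks of the current length
    scores = [[0.0] * n for _ in range(n)]
    for length in range(1, max_length + 1):
        for i in range(n):
            for j in range(n):
                scores[i][j] += (alpha ** length) * walks[i][j]
        # Multiply by A once more to count walks that are one step longer.
        walks = [[sum(walks[i][k] * A[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return {(nodes[i], nodes[j]): scores[i][j]
            for i in range(n) for j in range(n) if i != j}

# Hypothetical toy graph: node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

katz = katz_scores(toy_graph, alpha=0.01, max_length=4)
```

For a pair that is not yet linked there are no walks of length 1, so with a small α the score is dominated by α² times the number of common neighbors, which is why the ranking of candidates approaches the common-neighbors ranking.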

16 Hitting Time
Score is defined in terms of H_{u,v}, the random walk hitting time between u and v
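The hitting time from u to v is the expected number of steps a random walk started at u needs to first reach v. It satisfies h(v) = 0 and h(x) = 1 + average of h over the neighbors of x, which the sketch below solves by repeated sweeps. The function name, the path graph, and the choice of the negated hitting time as the score (closer nodes score higher) are assumptions; the slide's own formula is not in the transcript. The graph is assumed to be connected.

```python
def hitting_time(graph, source, target, sweeps=10_000, tol=1e-12):
    """Expected number of steps for a random walk from source to first reach target."""
    h = {node: 0.0 for node in graph}
    for _ in range(sweeps):
        delta = 0.0
        for node in graph:
            if node == target:
                continue                     # h(target) stays 0
            new = 1.0 + sum(h[nbr] for nbr in graph[node]) / len(graph[node])
            delta = max(delta, abs(new - h[node]))
            h[node] = new
        if delta < tol:                      # stop once a full sweep changes nothing
            break
    return h[source]

# Hypothetical path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

h_a_to_c = hitting_time(path, "a", "c")
h_a_to_b = hitting_time(path, "a", "b")
h_b_to_a = hitting_time(path, "b", "a")
score_a_c = -h_a_to_c                        # assumed scoring convention
```

Hitting time is not symmetric: from `a` the walk must step to `b` immediately, while from `b` it may wander to `c` first, so a symmetric score would need to combine both directions.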

17 Personalized PageRank
Stationary probability that a walker is at v under the following random walk:
With probability α, jump back to u
With probability 1 – α, go to a random neighbor of the current node
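A minimal power-iteration sketch of this walk. The value α = 0.15, the iteration count, the toy graph, and the function name are assumptions, and every node is assumed to have at least one neighbor.

```python
def personalized_pagerank(graph, u, alpha=0.15, iterations=100):
    """Approximate the stationary distribution of the walk that restarts at u."""
    prob = {node: 0.0 for node in graph}
    prob[u] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[u] += alpha                      # total mass is 1, so alpha of it jumps back to u
        for node, mass in prob.items():
            share = (1.0 - alpha) * mass / len(graph[node])
            for nbr in graph[node]:          # remaining mass moves to a uniformly random neighbor
                nxt[nbr] += share
        prob = nxt
    return prob

# Hypothetical toy graph: node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ppr_from_a = personalized_pagerank(toy_graph, "a")
# score(a, v) is ppr_from_a[v]; rank the nodes not yet linked to "a" by it.
candidates = sorted((v for v in toy_graph if v != "a" and v not in toy_graph["a"]),
                    key=ppr_from_a.get, reverse=True)
```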

18 SimRank
N(u) and N(v) are the in-degrees of nodes u and v
Only for directed graphs
score(u, v) ∈ [0, 1]; if u = v, then score(u, v) = 1
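The recursion itself is not in the transcript. The sketch below assumes the standard SimRank definition: two distinct nodes are similar to the extent that their in-neighbors are similar, i.e. score(u, v) = C / (N(u) · N(v)) times the sum of score(a, b) over all pairs of in-neighbors a of u and b of v. The decay constant C = 0.8, the iteration count, and the tiny graph are assumptions.

```python
def simrank(in_nbrs, decay=0.8, iterations=10):
    """Iterative SimRank on a directed graph given as node -> set of in-neighbors."""
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0        # a node is maximally similar to itself
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[(a, b)] = 0.0        # no in-neighbors, no evidence of similarity
                else:
                    total = sum(sim[(x, y)] for x in in_nbrs[a] for y in in_nbrs[b])
                    new[(a, b)] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim

# Hypothetical directed graph: p -> x, p -> y, x -> z, y -> z.
in_nbrs = {"p": set(), "x": {"p"}, "y": {"p"}, "z": {"x", "y"}}
sim = simrank(in_nbrs)
```

Here `x` and `y` are both pointed to only by `p`, so their similarity equals the decay constant, while `z` and `x` have no similar in-neighbors and score 0.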

19 Latent Space Models

20 Low Rank Reconstruction
Represent the adjacency matrix A with a lower rank matrix A_k:
$A \approx A_k = V_{n \times k} \, U_{k \times n}$
If A_k(u, v) has a large value for a missing edge, A(u, v) = 0, then recommend link (u, v)
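As a deliberately simplified sketch, the code below uses the k = 1 case only: power iteration finds the leading eigenvector of a symmetric adjacency matrix, and A_1(u, v) = λ · vec[u] · vec[v] is used as the score for missing edges. In practice a truncated SVD with larger k would typically be used; the toy graph, the function name, and the restriction to rank 1 are assumptions.

```python
def rank_one_scores(graph, iterations=200):
    """Score missing edges with the rank-1 reconstruction of the adjacency matrix."""
    nodes = sorted(graph)
    n = len(nodes)
    A = [[1.0 if nodes[j] in graph[nodes[i]] else 0.0 for j in range(n)] for i in range(n)]
    vec = [1.0] * n
    for _ in range(iterations):              # power iteration toward the leading eigenvector
        nxt = [sum(A[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    Av = [sum(A[i][j] * vec[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(vec[i] * Av[i] for i in range(n))   # Rayleigh quotient
    return {
        (nodes[i], nodes[j]): eigenvalue * vec[i] * vec[j]
        for i in range(n) for j in range(i + 1, n)
        if A[i][j] == 0.0                    # only pairs with A(u, v) = 0
    }

# Hypothetical toy graph: node -> set of neighbors.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

scores = rank_one_scores(toy_graph)
best_missing_link = max(scores, key=scores.get)
```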

21 Product Recommendations
Users have a propensity rate to buy certain categories of products (W)
Within a category, some products have a propensity rate to be bought (H)
(Non-negative matrix factorization)
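A small sketch of non-negative matrix factorization on a made-up purchase matrix R (users × products), approximated as W · H with W holding user-to-category propensities and H category-to-product propensities. The multiplicative update rule used here is one common way to fit NMF and is an assumption, as are the data, k = 2, the seed, and all function names.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def squared_error(R, W, H):
    approx = matmul(W, H)
    return sum((r - a) ** 2 for r_row, a_row in zip(R, approx) for r, a in zip(r_row, a_row))

def nmf(R, k, iterations=200, seed=0, eps=1e-9):
    """Fit R ≈ W H with non-negative W (users x k) and H (k x products)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in R]
    H = [[rng.random() + 0.1 for _ in R[0]] for _ in range(k)]
    errors = [squared_error(R, W, H)]
    for _ in range(iterations):
        Wt = transpose(W)                    # H <- H * (W^T R) / (W^T W H), elementwise
        num, den = matmul(Wt, R), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(*rows)] for rows in zip(H, num, den)]
        Ht = transpose(H)                    # W <- W * (R H^T) / (W H H^T), elementwise
        num, den = matmul(R, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(*rows)] for rows in zip(W, num, den)]
        errors.append(squared_error(R, W, H))
    return W, H, errors

# Hypothetical purchases: rows are users, columns are products (1 = bought).
R = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],   # user 1 has not bought product 2
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
W, H, errors = nmf(R, k=2)
approx = matmul(W, H)
score_user1_product2 = approx[1][2]          # candidate recommendation score
```

A large reconstructed value at a position where R is 0 is then read as a recommendation, in the same way as for the low-rank reconstruction of an adjacency matrix.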

22 Example Euclidean Generative Model
Vertices uniformly distributed in latent unit hyperplane
Vertices close in latent space more likely to be connected
Probability of edge = f(distance in latent space)
The problem of link prediction is to infer distances in the latent space, i.e., find the nearest neighbor in latent space not currently linked
Hoff, P. D., Raftery, A. E., & Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Stat. Assoc., 97(460), 1090–1098.
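If the latent positions were known, the prediction step would reduce to a nearest-neighbor query. The sketch below assumes the positions are given (in practice they must be inferred from the observed edges); the coordinates, the graph, and the function name are made up for illustration.

```python
import math

def nearest_unlinked(positions, graph, u):
    """Closest node to u in latent space that is not already linked to u."""
    candidates = [v for v in positions if v != u and v not in graph[u]]
    return min(candidates,
               key=lambda v: math.dist(positions[u], positions[v]),
               default=None)                 # None if u is linked to everyone

# Hypothetical latent positions in the unit square.
positions = {
    "a": (0.10, 0.10),
    "b": (0.20, 0.10),
    "c": (0.15, 0.30),
    "d": (0.90, 0.90),
}
# Observed edges: only a - b so far.
graph = {"a": {"b"}, "b": {"a"}, "c": set(), "d": set()}

prediction_for_a = nearest_unlinked(positions, graph, "a")
```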

23 Bayesian Networks
Directed Graphical Models
Bayesian networks and Probabilistic Relational Models (Getoor et al., 2001; Neville & Jensen, 2003)
Capture the dependence of link existence on attributes of entities
Constrain probabilistic dependency to a directed acyclic graph
Undirected Graphical Models
Markov Networks

24 Example of Today’s Approaches
Whom To Follow and Why
Credit: N. Barbieri, F. Bonchi, G. Manco, KDD’14

25 Relational Markov Networks
A Relational Markov Network (RMN) specifies cliques and the potentials between attributes of related entities at the template level
A single model provides a distribution for the entire graph
Train the model to maximize the probability of the observed edges and labels
Use the trained model to predict edges and unknown attributes

26 Link Prediction Heuristic
Latent Space Rational
[Figure: a generative model with parameters θ (latent space) produces the observed edges between nodes such as u and v; the link prediction heuristic asks for the most likely neighbor of node v]

27 Relationship between Deterministic & Probabilistic methods?
Yes (Sarkar, Chakrabarti, Moore, COLT’10)

28 How do we justify these observations?
Empirically
Especially if the graph is sparse
[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, Ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
Credit: Sarkar, Chakrabarti, Moore

29 Raftery’s Model (cont)
Two sources of randomness
Point positions: uniform in D-dimensional space
Linkage probability: logistic with parameters α, r
α, r and D are known
[Figure: linkage probability versus distance; higher probability of linking within radius r; α determines the steepness]
Credit: Sarkar, Chakrabarti, Moore
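The two sources of randomness can be simulated directly. The sketch assumes the logistic form P(link | d) = 1 / (1 + exp(α (d − r))), which equals 1/2 at distance r and gets steeper as α grows; the exact parameterization is not shown in the transcript, and the unit cube, parameter values, seed, and function names are illustrative assumptions.

```python
import math
import random

def link_probability(distance, alpha, r):
    """Logistic linkage probability: near 1 well inside radius r, near 0 well outside."""
    return 1.0 / (1.0 + math.exp(alpha * (distance - r)))

def sample_graph(n, dim, alpha, r, seed=0):
    """Draw n points uniformly in the unit cube, then link pairs independently."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(points[i], points[j])
            if rng.random() < link_probability(d_ij, alpha, r):
                edges.add((i, j))
    return points, edges

points, edges = sample_graph(n=30, dim=2, alpha=50.0, r=0.3)
p_near = link_probability(0.1, alpha=50.0, r=0.3)
p_at_radius = link_probability(0.3, alpha=50.0, r=0.3)
p_far = link_probability(0.5, alpha=50.0, r=0.3)
```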

30 Raftery’s Model: Probability of New Edge
Logistic function integrated over the volume determined by d_ij
[Figure: nodes i and j at distance d_ij]
Credit: Sarkar, Chakrabarti, Moore

31 Connection to Common Neighbors
Empirical Bernstein bounds (Maurer & Pontil, 2009)
Distinct radius per node gives Adamic/Adar
[Figure: nodes i, j, k with radius r; a bound involving the net size N and the number of common neighbors, with a term that decreases with N]

32 Supervised Learning Approaches
Create a binary classifier that predicts whether an edge exists between two nodes

33 Types of Features
Label Features: Keywords, Frequency of an item
Topological Features: Shortest Distance, Degree Centrality, PageRank index
Classifier: SVM, Decision Trees, Deep Belief Network (DBN), K-Nearest Neighbors (KNN), Naive Bayes, Radial Basis Function (RBF), Logistic Regression
Link Prediction Heuristic
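To feed any of these classifiers, each candidate pair needs a feature vector and a binary label. The sketch below builds such a table from topological features only (hop distance, the two degrees, and the number of common neighbors), labelling a pair 1 if the edge appears in a later snapshot. The feature choice, the snapshots, and the function names are assumptions for illustration; training the classifier itself is left out.

```python
from collections import deque
from itertools import combinations

def hop_distance(graph, u, v):
    """Breadth-first search distance from u to v (inf if unreachable)."""
    distance = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return distance[node]
        for nbr in graph[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return float("inf")

def pair_features(graph, u, v):
    """[shortest distance, degree of u, degree of v, number of common neighbors]."""
    return [hop_distance(graph, u, v), len(graph[u]), len(graph[v]), len(graph[u] & graph[v])]

def training_examples(snapshot_t, later_edges):
    """One (pair, features, label) row per pair that is not linked at time t."""
    rows = []
    for u, v in combinations(sorted(snapshot_t), 2):
        if v in snapshot_t[u]:
            continue
        label = 1 if (u, v) in later_edges or (v, u) in later_edges else 0
        rows.append(((u, v), pair_features(snapshot_t, u, v), label))
    return rows

# Hypothetical snapshot at time t and the edges added by time t' > t.
snapshot_t = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
later_edges = {("a", "d")}

rows = training_examples(snapshot_t, later_edges)
```

The feature lists and labels in `rows` can then be passed to whichever classifier is chosen; label features such as keywords would be appended to the same vectors.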
