Link Prediction Seminar Social Media Mining University UC3M

Link Prediction
Seminar Social Media Mining, University UC3M
Date: May 2017
Lecturer: Carlos Castillo, http://chato.cl/
Sources: Chapter 10 of Zafarani, Abbasi, and Liu's book on Social Media Mining [slides]; Sarkar, Chakrabarti, Moore: [slides] [slides]

Problem definition
Given a graph G=(V,E) at time t, or a series of snapshots of the graph at times t_i <= t, describe the state of the graph at time t' > t.
Sometimes, assume V stays the same and only E changes.

Applications Accelerating formation of connections in professional social networks

Applications Helping find your offline friends online

Applications
Increase server efficiency through pre-fetching
Determine which links are missing in Wikipedia pages (or other educational resources)
Monitor/control the propagation of computer viruses
Fix corrupted data: you bought five books and one of the titles is lost; can we infer it?
...

Basic method
Assign a score to every possible link (u,v)
Sort links by descending score
Predict the top-k links, or the links with scores above a threshold
Profit!
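
The steps of the basic method can be sketched in a few lines of Python. This is only an illustration, not code from the slides: the adjacency-dict representation, the function names `predict_links` and `common_neighbors`, and the toy graph are all assumptions made for the example.

```python
from itertools import combinations

def common_neighbors(adj, u, v):
    """One possible scoring function: number of shared neighbors."""
    return len(adj[u] & adj[v])

def predict_links(adj, score, k=None, threshold=None):
    """Score every unconnected pair, sort by descending score, keep the best."""
    candidates = [
        (u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]
    ]
    ranked = sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)
    if threshold is not None:
        # Keep only the pairs whose score is strictly above the threshold.
        ranked = [pair for pair in ranked if score(adj, *pair) > threshold]
    return ranked if k is None else ranked[:k]

# Hypothetical undirected graph, stored as node -> set of neighbors.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

top2 = predict_links(adj, common_neighbors, k=2)
above_zero = predict_links(adj, common_neighbors, threshold=0)
```

Any of the scores discussed in the following slides could be passed in place of `common_neighbors`, as long as it takes the same three arguments.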

Common neighbors
score(u,v) = |N(u) ∩ N(v)|, where N(x) is the set of neighbors of x
Newman 2001: the probability of scientists collaborating increases with the number of other collaborators they have in common.
Tendency to close triangles, more on this later ...

Jaccard similarity
score(u,v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
Corrects common neighbors by reducing the influence of nodes with many neighbors

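Adamic/Adar
score(u,v) = sum over z in N(u) ∩ N(v) of 1 / log |N(z)|
Count common neighbors, but give less weight to those that have many neighbors themselves
The idea is to avoid giving much credit to a common neighbor that is connected to very many nodes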

Preferential attachment
score(u,v) = |N(u)| x |N(v)|
“Rich-get-richer”
Newman 2001: the probability of two authors collaborating is proportional to the product of their numbers of collaborators

Example: score(v5,v7)
Exercise, compute: number of common neighbors, Jaccard coefficient, Adamic and Adar's score, preferential attachment
Common neighbors: 1
Jaccard: 1 / 2 = 0.5
Adamic and Adar: 1 / log(3)
Preferential attachment: 1 x 2 = 2
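
The four neighborhood-based scores can be checked with a short Python sketch. The graph used in the slide is not part of this transcript, so the graph below is a hypothetical one, chosen only because it reproduces the four values listed above; the function names are also assumptions.

```python
import math

# Hypothetical undirected graph (node -> set of neighbors). It is NOT the
# slide's figure; it is merely consistent with the numbers in the example.
adj = {
    "v4": {"v6"},
    "v5": {"v6"},
    "v6": {"v4", "v5", "v7"},
    "v7": {"v6", "v8"},
    "v8": {"v7"},
}

def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

def preferential_attachment(adj, u, v):
    return len(adj[u]) * len(adj[v])

cn = common_neighbors(adj, "v5", "v7")
jc = jaccard(adj, "v5", "v7")
aa = adamic_adar(adj, "v5", "v7")
pa = preferential_attachment(adj, "v5", "v7")
```

Here `math.log` is the natural logarithm, so `aa` equals 1/ln(3); a different base would only rescale the Adamic/Adar scores without changing the ranking.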

Geodesic/shortest path distance
Assumption: social connections are formed by following edges, finding a new person, and then connecting directly
score(u,v) := -(length of shortest path from u to v)
Limit case: triadic closure (shortest path of length 2)
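
As a minimal sketch of this score on an unweighted graph, breadth-first search gives the shortest path length directly. The function name and the convention of returning negative infinity for unreachable pairs are assumptions of the example, not something stated in the slides.

```python
from collections import deque

def shortest_path_score(adj, u, v):
    """Return -(BFS distance from u to v); -inf if v cannot be reached."""
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj[node]:
            if nbr == v:
                return -(dist + 1)
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return float("-inf")

# Hypothetical path graph a - b - c - d, plus an isolated node e.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}, "e": set()}

score_ac = shortest_path_score(adj, "a", "c")   # triadic-closure candidate
score_ad = shortest_path_score(adj, "a", "d")
score_ae = shortest_path_score(adj, "a", "e")
```

Pairs at distance 2 get the highest score among unconnected pairs, which is the triadic-closure limit case.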

Katz 1953 or “rooted PageRank”
Score based on weighted counts of paths, with exponential decay on path length:
score(u,v) = sum over l = 1, 2, ... of α^l x (number of paths of length l from u to v), for α < 1
A small α yields predictions which are similar to common neighbors
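
A truncated version of the Katz score can be sketched as follows. This is an illustration under stated assumptions: the sum is cut off at `max_len` instead of running to infinity, paths are counted as walks (nodes may repeat, as in the usual adjacency-matrix formulation), and the function name and default parameters are hypothetical.

```python
def katz_scores(adj, source, alpha=0.05, max_len=5):
    """Truncated Katz scores from `source` to every reachable node.

    Computes sum_{l=1..max_len} alpha**l * (number of walks of length l).
    """
    walks = {source: 1}   # walks of the current length, keyed by end node
    scores = {}
    for length in range(1, max_len + 1):
        extended = {}
        for node, count in walks.items():
            for nbr in adj[node]:
                extended[nbr] = extended.get(nbr, 0) + count
        for node, count in extended.items():
            scores[node] = scores.get(node, 0.0) + alpha ** length * count
        walks = extended
    return scores

# Hypothetical graph: a and d are not adjacent and share neighbors b and c.
adj = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}

scores = katz_scores(adj, "a", alpha=0.1, max_len=2)
```

With `max_len=2`, the score of a non-adjacent pair is α² times its number of common neighbors, which illustrates why a small α behaves like the common-neighbors score.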

More on random walks
Hitting time H_{u,v} = expected number of steps for a random walk starting at u to reach v
To reduce the influence of well-connected nodes, we can multiply by the stationary probability of the target node v

Symmetric hitting time (commute time)
Hitting time is not symmetric, but it is easy to symmetrize: C_{u,v} = H_{u,v} + H_{v,u}
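
Hitting and commute times can be approximated numerically with a short sketch. The code below assumes a connected, undirected, unweighted graph and a simple random walk that moves to a uniformly chosen neighbor; the function names and the iterative solver are choices made for this example only.

```python
def hitting_time(adj, u, v, tol=1e-12, max_iter=100_000):
    """Expected number of steps for a random walk from u to first reach v.

    Iterates h(x) = 1 + average of h(y) over neighbors y of x, with h(v) = 0.
    """
    h = {x: 0.0 for x in adj}
    for _ in range(max_iter):
        delta = 0.0
        for x in adj:
            if x == v:
                continue
            new = 1.0 + sum(h[y] for y in adj[x]) / len(adj[x])
            delta = max(delta, abs(new - h[x]))
            h[x] = new
        if delta < tol:
            break
    return h[u]

def commute_time(adj, u, v):
    return hitting_time(adj, u, v) + hitting_time(adj, v, u)

def stationary_probability(adj, v):
    # For a simple random walk on an undirected graph: degree / (2 * #edges).
    return len(adj[v]) / sum(len(nbrs) for nbrs in adj.values())

# Hypothetical star graph: b is connected to a, c and d.
adj = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}}

h_ab = hitting_time(adj, "a", "b")
h_ba = hitting_time(adj, "b", "a")
c_ab = commute_time(adj, "a", "b")
normalized_ab = -h_ab * stationary_probability(adj, "b")
```

On this graph the walk from the leaf a reaches the hub b in one step, while the walk from b needs longer on average to find a, which shows the asymmetry. As with shortest paths, the negated values would be used as scores so that closer pairs rank higher.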

SimRank [Jeh and Widom 2002]
For directed graphs; follows inlinks
Exercise, compute: simrank(u,v), simrank(v,w), simrank(u,w)
[Figure: directed graph with nodes u, p, v, q, s, w, r; edges not recoverable from the transcript]
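
A minimal sketch of the SimRank iteration follows. Since the edges of the exercise graph are not available in this transcript, the graph below is a hypothetical one that merely reuses some of the node names; the decay constant `c=0.8` and the fixed number of iterations are also assumptions.

```python
def simrank(in_nbrs, c=0.8, iterations=10):
    """Iterative SimRank on a directed graph given as node -> set of in-neighbors."""
    nodes = list(in_nbrs)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    # A node without in-links is similar only to itself.
                    new[a][b] = 0.0
                else:
                    total = sum(sim[i][j] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[a][b] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim

# Hypothetical edges: p -> u, p -> v, q -> v, q -> w.
in_nbrs = {"p": set(), "q": set(), "u": {"p"}, "v": {"p", "q"}, "w": {"q"}}

sim = simrank(in_nbrs)
```

In this toy graph u and v are similar because both are pointed to by p, while u and w share no in-neighbor and their in-neighbors p and q have no in-links of their own, so their similarity stays at zero.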

Meta-method / pruning
Compute score(u,v) for all existing edges, assuming they do not exist
Remove the k% of edges with the lowest scores
Compute score(u,v) in the reduced graph
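
The pruning step could look roughly like the sketch below. It is an assumption-laden illustration: the score used is common neighbors, ties between equally scored edges are broken by edge name, and the number of removed edges is rounded down.

```python
def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])

def prune_weak_edges(adj, score, k_percent):
    """Return a copy of the graph without the k% lowest-scoring existing edges."""
    work = {u: set(nbrs) for u, nbrs in adj.items()}
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    scored = []
    for u, v in edges:
        # Score the edge as if it did not exist, then restore it.
        work[u].discard(v)
        work[v].discard(u)
        scored.append((score(work, u, v), (u, v)))
        work[u].add(v)
        work[v].add(u)
    scored.sort()
    n_remove = int(len(edges) * k_percent / 100)
    for _, (u, v) in scored[:n_remove]:
        work[u].discard(v)
        work[v].discard(u)
    return work

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

pruned = prune_weak_edges(adj, common_neighbors, k_percent=25)
```

The final scores for candidate links would then be computed on `pruned` rather than on the original graph.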

Evaluating link prediction methods
After one of the aforementioned measures is selected, a list of the most similar pairs of nodes is selected. These pairs denote the edges predicted to be most likely to appear soon in the network. Performance (precision, recall, or accuracy) can be evaluated on the testing graph by counting how many of its edges the link prediction algorithm successfully reveals. Performance is usually very low, since many edges are created for reasons that cannot be observed in the social network graph alone. A common baseline is therefore a random edge predictor, and results are reported as the factor of improvement over random prediction.
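
The evaluation described above can be sketched as precision at k plus a comparison with a random predictor. The function names, the toy data, and the assumption that a random predictor picks uniformly among all candidate pairs (so its expected precision is the fraction of candidates that become edges) are choices made for this illustration.

```python
def precision_at_k(predicted, new_edges, k):
    """Fraction of the top-k predicted pairs that appear in the testing graph."""
    hits = sum(1 for u, v in predicted[:k] if frozenset((u, v)) in new_edges)
    return hits / k

def improvement_over_random(predicted, new_edges, n_candidates, k):
    """Factor by which precision@k exceeds the expected precision of random guessing."""
    random_precision = len(new_edges) / n_candidates
    return precision_at_k(predicted, new_edges, k) / random_precision

# Hypothetical ranked predictions and the edges that actually appeared later.
predicted = [("a", "d"), ("b", "e"), ("c", "e"), ("a", "e")]
new_edges = {frozenset(("a", "d")), frozenset(("c", "e"))}

p_at_2 = precision_at_k(predicted, new_edges, k=2)
factor = improvement_over_random(predicted, new_edges, n_candidates=10, k=2)
```

Storing edges as `frozenset` pairs makes the comparison independent of the order in which the two endpoints are written.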

Performance comparison [Liben-Nowell and Kleinberg 2003]
Notes: effectiveness in general is very low (challenging problem)

Adamic/Adar + content

Supervised learning [Al Hasan et al. 2006]
Input features are all attributes, possibly including node links as attributes
Predict connected/not-connected, learning on a subset of the data

Example experimental results with supervised learning
Data: co-authorship networks in DBLP and BIOBASE
Split into two disjoint ranges of publication years (Ra, Rb)
Example: DBLP, Ra = [1999,2000], Rb = [2001,2004]
A training item is a pair of authors (u,v), both with a paper in Ra, with all their attributes computed in Ra
Ground truth is whether (u,v) co-author during Rb: positive = yes, negative = no
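
The construction of labeled pairs from the two year ranges can be sketched as follows. The record format (a year plus a list of author names), the function name, and the toy papers are hypothetical; the sketch follows the description above literally and does not filter out pairs that had already co-authored in Ra.

```python
from itertools import combinations

def build_training_pairs(papers, ra, rb):
    """Label each pair of authors active in Ra by whether they co-author in Rb.

    papers: list of (year, [author, ...]); ra, rb: inclusive (first, last) years.
    """
    def in_range(year, years):
        return years[0] <= year <= years[1]

    authors_ra = set()
    coauthored_rb = set()
    for year, authors in papers:
        if in_range(year, ra):
            authors_ra.update(authors)
        elif in_range(year, rb):
            coauthored_rb.update(frozenset(p) for p in combinations(authors, 2))
    return {
        (u, v): frozenset((u, v)) in coauthored_rb
        for u, v in combinations(sorted(authors_ra), 2)
    }

# Hypothetical publication records.
papers = [
    (1999, ["ann", "bob"]),
    (2000, ["cat"]),
    (2002, ["ann", "cat"]),
    (2003, ["bob", "dan"]),
]

labels = build_training_pairs(papers, ra=(1999, 2000), rb=(2001, 2004))
```

Feature vectors for each pair would then be computed from the Ra papers only, so that no information from Rb leaks into training.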

Example features
Content similarity: keywords in common, conferences in common, ...
Aggregation features: sum of papers, sum of neighbors, ...
Topological distance: shortest-path length, ...

Performance results under various learning schemes (same feature set)

Community prediction

Community membership prediction
Why do users join communities? What factors affect the community-joining behavior of individuals?
We can observe users who join communities and determine the factors that are common among them.
We require a population of users, a community C, and community membership information (i.e., which users are members of C).
To distinguish between users who have already joined the community and those who are now joining it, we need community memberships at two different times t1 and t2, with t2 > t1.
At t2, we identify the users u who are currently members of the community but were not members at t1.
These new users form the subpopulation that is analyzed for community-joining behavior.

Peer influence
Hypothesis: individuals are inclined toward an activity when their friends are engaged in the same activity.
A factor that plays a role in users joining a community is the number of their friends who are already members of the community.
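
One simple way to look at this hypothesis in data is to group non-members at t1 by how many of their friends were already members, and measure what fraction of each group has joined by t2. The sketch below assumes friendship sets and membership sets as inputs; all names and the toy data are hypothetical.

```python
from collections import defaultdict

def join_rate_by_friend_count(friends, members_t1, members_t2):
    """Map k -> fraction of t1 non-members with k member friends who joined by t2."""
    joined = defaultdict(int)
    total = defaultdict(int)
    for user, nbrs in friends.items():
        if user in members_t1:
            continue  # already a member at t1, so not part of the subpopulation
        k = len(nbrs & members_t1)
        total[k] += 1
        if user in members_t2:
            joined[k] += 1
    return {k: joined[k] / total[k] for k in sorted(total)}

# Hypothetical friendship graph and community memberships at two times.
friends = {
    "a": {"b", "c", "g"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "e"},
    "d": {"b"},
    "e": {"c"},
    "f": set(),
    "g": {"a"},
}
members_t1 = {"a", "b"}
members_t2 = {"a", "b", "c", "d"}

rates = join_rate_by_friend_count(friends, members_t1, members_t2)
```

A join rate that increases with k would be consistent with the peer-influence hypothesis, although on its own it does not separate influence from other explanations such as similarity between friends.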

Supervised learning

Example regression tree

Beyond community membership
Communities can be implicit: one can think of individuals buying a product as a community, and of people buying the product for the first time as individuals joining the community.
Collective behavior: a group of individuals behaving in a similar way (first defined by sociologist Robert Park). It can be planned and coordinated, but is often spontaneous and unplanned.
Examples: individuals standing in line for a new product release; posting messages online to support a cause or to show support for an individual.
The approach can be similar to community membership prediction.