Collective Network Linkage across Heterogeneous Social Platforms

Similar presentations
Mining User Similarity Based on Location History Yu Zheng, Quannan Li, Xing Xie Microsoft Research Asia.

Suleyman Cetintas 1, Monica Rogati 2, Luo Si 1, Yi Fang 1 Identifying Similar People in Professional Social Networks with Discriminative Probabilistic.
Random Forest Predrag Radenković 3237/10
Large-Scale Entity-Based Online Social Network Profile Linkage.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Ming Yan, Jitao Sang, Tao Mei, ChangSheng Xu
Multiple People Detection and Tracking with Occlusion Presenter: Feifei Huo Supervisor: Dr. Emile A. Hendriks Dr. A. H. J. Stijn Oomes Information and.
Overview Full Bayesian Learning MAP learning
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
On Community Outliers and their Efficient Detection in Information Networks Jing Gao 1, Feng Liang 1, Wei Fan 2, Chi Wang 1, Yizhou Sun 1, Jiawei Han 1.
1 Unsupervised Learning With Non-ignorable Missing Data Machine Learning Group Talk University of Toronto Monday Oct 4, 2004 Ben Marlin Sam Roweis Rich.
Parametric Inference.
Visual Recognition Tutorial
Exploration of Ground Truth from Raw GPS Data National University of Defense Technology & Hong Kong University of Science and Technology Exploration of.
Large-Scale Cost-sensitive Online Social Network Profile Linkage.
POTENTIAL RELATIONSHIP DISCOVERY IN TAG-AWARE MUSIC STYLE CLUSTERING AND ARTIST SOCIAL NETWORKS Music style analysis such as music classification and clustering.
Fenglong Ma1, Yaliang Li1, Qi Li1, Minghui Qiu2,
Determining the Significance of Item Order In Randomized Problem Sets Zachary A. Pardos, Neil T. Heffernan Worcester Polytechnic Institute Department of.
Webpage Understanding: an Integrated Approach
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
From Devices to People: Attribution of Search Activity in Multi-User Settings Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz Microsoft Research,
Anomaly detection with Bayesian networks Website: John Sandiford.
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Data Mining and Machine Learning Lab Network Denoising in Social Media Huiji Gao, Xufei Wang, Jiliang Tang, and Huan Liu Data Mining and Machine Learning.
Probabilistic Question Recommendation for Question Answering Communities Mingcheng Qu, Guang Qiu, Xiaofei He, Cheng Zhang, Hao Wu, Jiajun Bu, Chun Chen.
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
Learning Geographical Preferences for Point-of-Interest Recommendation Author(s): Bin Liu Yanjie Fu, Zijun Yao, Hui Xiong [KDD-2013]
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Cache-Conscious Performance Optimization for Similarity Search Maha Alabduljalil, Xun Tang, Tao Yang Department of Computer Science University of California.
Features-based Object Recognition P. Moreels, P. Perona California Institute of Technology.
The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign User Profiling in Ego-network: Co-profiling Attributes and Relationships.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
CONCEPTS AND TECHNIQUES FOR RECORD LINKAGE, ENTITY RESOLUTION, AND DUPLICATE DETECTION BY PETER CHRISTEN PRESENTED BY JOSEPH PARK Data Matching.
Stable Multi-Target Tracking in Real-Time Surveillance Video
Multiple Location Profiling for Users and Relationships from Social Network and Content Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang University of Illinois.
Data Mining: Knowledge Discovery in Databases Peter van der Putten ALP Group, LIACS Pre-University College LAPP-Top Computer Science February 2005.
Xutao Li1, Gao Cong1, Xiao-Li Li2
Intelligent DataBase System Lab, NCKU, Taiwan Josh Jia-Ching Ying, Eric Hsueh-Chan Lu, Wen-Ning Kuo and Vincent S. Tseng Institute of Computer Science.
Head Tracking Using Video Analytics Xuan Wang 1, Yuhen Hu 1, Robert G. Radwin 2, John D. Lee 2 University of Wisconsin – Madison 1 Dept. Electrical and.
Relation Strength-Aware Clustering of Heterogeneous Information Networks with Incomplete Attributes ∗ Source: VLDB.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Supervised Random Walks: Predicting and Recommending Links in Social Networks Lars Backstrom (Facebook) & Jure Leskovec (Stanford) Proc. of WSDM 2011 Present.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Predicting User Interests from Contextual Information R. W. White, P. Bailey, L. Chen Microsoft (SIGIR 2009) Presenter : Jae-won Lee.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
1 Cluster Analysis – 2 Approaches K-Means (traditional) Latent Class Analysis (new) by Jay Magidson, Statistical Innovations based in part on a presentation.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Warren Shen, Xin Li, AnHai Doan Database & AI Groups University of Illinois, Urbana Constraint-Based Entity Matching.
Synchronization for Multi-Perspective Videos in the Wild
Qian Zhu, Liang Chen and Gagan Agrawal
Personalizing Search on Shared Devices
Integrating Meta-Path Selection With User-Guided Object Clustering in Heterogeneous Information Networks Yizhou Sun†, Brandon Norick†, Jiawei Han†, Xifeng.
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
Presentation 王睿.
Probabilistic Models with Latent Variables
Overview of Machine Learning
iSRD Spam Review Detection with Imbalanced Data Distributions
MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks Jing Zhang+, Bo Chen+, Xianming Wang+, Fengmei Jin+, Hong Chen+, Cuiping.
Learning Probabilistic Graphical Models Overview Learning Problems.
Ryen White, Ahmed Hassan, Adish Singla, Eric Horvitz
Topological Signatures For Fast Mobility Analysis
Presented By: Sparsh Gupta Anmol Popli Hammad Abdullah Ayyubi
A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes International.
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
GhostLink: Latent Network Inference for Influence-aware Recommendation
Modeling Topic Diffusion in Scientific Collaboration Networks
Presentation transcript:

Collective Network Linkage across Heterogeneous Social Platforms
International Conference on Data Mining, Atlantic City, NJ, USA
Ming Gao, Institute for Data Science and Engineering, East China Normal University, Shanghai, China

Our Co-authors
Ee-Peng Lim, David Lo, Feida Zhu, Philips Kokoh Prasetyo (Singapore Management University); Aoying Zhou (East China Normal University)

Roadmap
Background, Related Work, Solution, Empirical Study, Conclusions

Background
Social media sites have become extremely popular in recent years, and people maintain accounts and social connections on several of them simultaneously. Major applications of network linkage:
Across different social networks: profiling users, understanding user behavior, recommending users or products across networks, …
On a single social network: detecting duplicate accounts within that network.

Related Work
Network linkage across different social networks: no existing unsupervised approach is both domain-independent and able to handle missing and incomplete data. These gaps are the focus of this work.

Network Linkage vs. Record Linkage
                 Network linkage          Record linkage
Object           Social users             Relational data
Attributes       Heterogeneous            Simple
                 Implicit and explicit    Explicit
                 Unfixed                  Fixed
Missing data     Many                     Few
Relationships    Yes                      No
Challenges: heterogeneous attributes; noise or missing data in user attributes; social connections across heterogeneous networks; a large number of candidate pairs to consider.

Formulation
Consider two networks, A and B, and let R be the set of candidate pairs r = (a, b) with a ∈ A and b ∈ B. R splits into M, the matched pairs, and U, the unmatched pairs. The comparison vector of r, denoted γ_r, is a set of similarity functions evaluated between the observed attributes of a and those of b. Our task is to determine M.
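
A minimal sketch of this formulation, assuming two hypothetical attributes (a username and a bag of words) with illustrative Jaccard-based similarity functions; the slides do not say which similarity functions CNL uses. For a candidate pair r = (a, b), the comparison vector γ_r holds one similarity per attribute, and a missing attribute on either side gives a missing (None) component. R is built here as the full Cartesian product of two toy networks only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A user profile maps attribute names to observed values (absent key = missing).
Profile = Dict[str, object]

def jaccard(x: set, y: set) -> float:
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    return len(x & y) / len(x | y) if (x or y) else 0.0

def char_bigrams(s: str) -> set:
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

# Hypothetical attributes and similarity functions, one per component of gamma_r.
SIMILARITIES: List[Tuple[str, Callable[[object, object], float]]] = [
    ("username", lambda x, y: jaccard(char_bigrams(x), char_bigrams(y))),
    ("words", lambda x, y: jaccard(set(x), set(y))),
]

def comparison_vector(a: Profile, b: Profile) -> List[Optional[float]]:
    """gamma_r for r = (a, b); None marks a component that cannot be observed."""
    gamma: List[Optional[float]] = []
    for attr, sim in SIMILARITIES:
        x, y = a.get(attr), b.get(attr)
        gamma.append(None if x is None or y is None else sim(x, y))
    return gamma

A = {"a1": {"username": "alice_w", "words": ["coffee", "nyc"]}}
B = {"b1": {"username": "alicew"},                    # no words: missing
     "b2": {"username": "bob", "words": ["coffee"]}}
R = [(i, j) for i in A for j in B]                    # candidate pairs
gammas = {r: comparison_vector(A[r[0]], B[r[1]]) for r in R}
```

Here gammas[("a1", "b1")] is [4/7, None] (four shared bigrams out of seven, words missing) and gammas[("a1", "b2")] is [0.0, 0.5].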

Overview of Solution
The collective network linkage approach (CNL) is an unsupervised, probabilistic approach that integrates heterogeneous attributes, handles missing data, evaluates social similarity in a collective manner, and scales up to large networks using LSH.
Solution: given a candidate pair of users r with similarity vector γ_r, CNL assigns r a label (matched or unmatched) according to the matching score of the pair.
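
The formula for the score is not preserved on this slide. One common choice, consistent with the model outputting Pr(r ∈ M), is the posterior probability that r belongs to the matched group. The sketch below computes it from a prior and per-component log-densities for the matched and unmatched groups. It assumes the components of γ_r are independent given the group, skips missing components, and labels a pair matched at a hypothetical threshold of 0.5; the Gaussian parameters are made-up values.

```python
import math
from typing import Callable, List, Optional, Sequence

LogDensity = Callable[[float], float]

def gaussian_logpdf(mu: float, sigma: float) -> LogDensity:
    """Log-density of N(mu, sigma^2); the Gaussian is one exponential-family member."""
    return lambda x: (-0.5 * math.log(2 * math.pi * sigma ** 2)
                      - (x - mu) ** 2 / (2 * sigma ** 2))

def match_score(gamma: Sequence[Optional[float]], log_f_m: List[LogDensity],
                log_f_u: List[LogDensity], prior_m: float) -> float:
    """Posterior Pr(r in M | observed components of gamma_r)."""
    lm, lu = math.log(prior_m), math.log(1.0 - prior_m)
    for x, fm, fu in zip(gamma, log_f_m, log_f_u):
        if x is None:  # missing component: marginalised out, contributes nothing
            continue
        lm += fm(x)
        lu += fu(x)
    top = max(lm, lu)  # normalise in log space to avoid underflow
    pm, pu = math.exp(lm - top), math.exp(lu - top)
    return pm / (pm + pu)

def label(score: float, threshold: float = 0.5) -> str:
    return "matched" if score >= threshold else "unmatched"

# Hypothetical fitted parameters for a two-component gamma_r.
LOG_F_M = [gaussian_logpdf(0.8, 0.1)] * 2
LOG_F_U = [gaussian_logpdf(0.2, 0.1)] * 2
```

For example, label(match_score([0.75, None], LOG_F_M, LOG_F_U, prior_m=0.5)) gives "matched", while a pair with no observed components simply falls back to the prior. Because matched pairs are usually rare among candidates, a realistic prior_m would be small.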

Empirical Study
Tasks: self-linking of users within Twitter; linking users across Foursquare and Twitter.
Datasets: Twitter, as TWN(x%) of size N with x% noise; Foursquare (ground truth: 3,534 matched pairs).
Our approach vs. baselines: CNL variants (CNLF-E, CNLnonN-E, CNLnonN-G, CNLLF-E); NL variants (NLnonN-E, NLnonN-G, NLF-E); Mobius.

Matching Score for Self-linking on TW1109(0%)
(Figure panels: NLnonN-E, NLnonN-G, CNLnonN-E, CNLnonN-G.)
Distribution assignment is very important: with the correct distribution assignment, the scores from CNL are more distinguishable than those of NL.

Comparison with Mobius on TWN(10%)
CNLF-E significantly outperforms Mobius in precision.

Scalability Test for Self-linking on TWN(10%)
(Figure panels: candidate pairs, annotated "less than 1%"; elapsed time in seconds; precision; recall.)
CNLLF-E can scale up to large networks.

Linking Heterogeneous Large Networks
(Figure panels: precision, recall.)
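
The slides report precision and recall without spelling out how they are computed. Assuming the standard set-based definitions against the ground-truth matched pairs (such as the 3,534 pairs mentioned for Foursquare), a minimal sketch with hypothetical user IDs:

```python
from typing import Set, Tuple

Pair = Tuple[str, str]

def precision_recall(predicted: Set[Pair], truth: Set[Pair]) -> Tuple[float, float]:
    """Precision = |predicted ∩ truth| / |predicted|; recall = |predicted ∩ truth| / |truth|."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
predicted = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
p, r = precision_recall(predicted, truth)  # p = 2/3, r = 0.5
```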

DEMO: Linky (http://research.larc.smu.edu.sg/linky/)
Linky: Linking networks for unity. It links two networks, Foursquare and Twitter, using four features: username, social structure, temporal features, and content features.

Conclusions
Network linkage across heterogeneous social networks: a unified and unsupervised approach that integrates heterogeneous user attributes and social connections, handles missing data, and scales up to large social networks.
Future work: a distributed solution to improve scalability; linking multiple networks rather than only two.

Thank You for Your Attention

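Integrates Heterogeneous Attributes
Attribute similarities can take discrete or continuous values. The exponential family is a set of probability density functions (PDFs) and probability mass functions (PMFs), so it covers both cases; each attribute similarity is therefore assumed to be drawn from a distribution in the exponential family. The log-likelihood has separate parameters for the matched and unmatched groups; the 2-dimensional latent vector of each pair and any missing attribute values are unobserved. The model outputs Pr(r ∈ M) for each candidate pair r.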
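
One way to write the model this slide describes, as a sketch under stated assumptions rather than the paper's exact likelihood. Here γ_r^k is the k-th similarity of pair r, O_r is the set of its observed (non-missing) components, z_r = (z_r^M, z_r^U) is the 2-dimensional latent group indicator, π_g are the group priors, and η_{g,k} are the natural parameters of attribute k in group g; h_k, T_k and A_k are the base measure, sufficient statistic and log-partition function of attribute k's distribution. The components are assumed independent given the group, which the slide does not state.

```latex
f_k(x;\eta) = h_k(x)\,\exp\!\big(\eta^{\top} T_k(x) - A_k(\eta)\big)

\ell(\Theta) = \sum_{r \in R} \sum_{g \in \{M, U\}} z_r^{g}
  \Big[\log \pi_g + \sum_{k \in O_r} \log f_k\big(\gamma_r^{k}; \eta_{g,k}\big)\Big]

\Pr(r \in M \mid \gamma_r) =
  \frac{\pi_M \prod_{k \in O_r} f_k(\gamma_r^{k}; \eta_{M,k})}
       {\sum_{g \in \{M, U\}} \pi_g \prod_{k \in O_r} f_k(\gamma_r^{k}; \eta_{g,k})}
```

Under the independence assumption, dropping the factors of missing components is the same as integrating them out.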

Handles Missing Values
CNL works in an unsupervised manner and handles missing data. It uses the EM algorithm to estimate the parameters; in the E-step, it replaces the latent variables and the missing values with their expectations.
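
A minimal EM sketch, assuming every attribute similarity is Gaussian within each group (the Gaussian is one exponential-family member; CNL may assign different distributions to different attributes) and that attributes are independent given the group. Group 0 plays the role of M and group 1 of U, and missing similarities are None. In the E-step, responsibilities use only the observed entries, and each missing entry is replaced by its expectations under each group, E[x] = μ and E[x²] = μ² + σ², in the spirit of the slide. The function name, initial values, and toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[Optional[float]]

def _log_normal(x: float, mu: float, var: float) -> float:
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def em_two_groups(data: List[Vec], mu_init: Tuple[List[float], List[float]],
                  iters: int = 50, var_init: float = 0.05, var_floor: float = 1e-4):
    """Fit a matched (0) / unmatched (1) diagonal-Gaussian mixture with missing entries."""
    n, k = len(data), len(data[0])
    mu = [list(mu_init[0]), list(mu_init[1])]
    var = [[var_init] * k, [var_init] * k]
    pi = [0.5, 0.5]
    resp: List[List[float]] = []
    for _ in range(iters):
        # E-step: posterior group membership of every pair (observed entries only).
        resp = []
        for x in data:
            logp = [math.log(max(pi[g], 1e-300))
                    + sum(_log_normal(v, mu[g][j], var[g][j])
                          for j, v in enumerate(x) if v is not None)
                    for g in (0, 1)]
            top = max(logp)
            w = [math.exp(lp - top) for lp in logp]
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: responsibility-weighted expected sufficient statistics.
        new_mu = [[0.0] * k for _ in (0, 1)]
        new_var = [[0.0] * k for _ in (0, 1)]
        for g in (0, 1):
            ng = sum(r[g] for r in resp)
            pi[g] = ng / n
            ng = max(ng, 1e-12)
            for j in range(k):
                s1 = s2 = 0.0
                for x, r in zip(data, resp):
                    if x[j] is None:  # missing: expectations under group g
                        e1, e2 = mu[g][j], mu[g][j] ** 2 + var[g][j]
                    else:
                        e1, e2 = x[j], x[j] ** 2
                    s1 += r[g] * e1
                    s2 += r[g] * e2
                new_mu[g][j] = s1 / ng
                new_var[g][j] = max(s2 / ng - new_mu[g][j] ** 2, var_floor)
        mu, var = new_mu, new_var
    return pi, mu, var, resp

DATA = [[0.90, 0.85], [0.95, None], [0.88, 0.80], [None, 0.90],
        [0.10, 0.20], [0.05, 0.15], [0.12, None], [0.20, 0.10],
        [0.15, 0.25], [None, 0.05]]
pi, mu, var, resp = em_two_groups(DATA, ([0.7, 0.7], [0.3, 0.3]))
```

Starting the matched group's means high and the unmatched group's means low keeps the two groups from swapping roles, a label-switching issue any unsupervised two-group fit has to settle somehow.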

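Scales up to Large Networks
CNL speeds up computation using locality-sensitive hashing (LSH). Applying LSH to usernames blocks users, so that only users whose usernames fall into the same bucket are compared as candidate pairs. This also reduces the cost of computing social similarity.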
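
The slide does not say which LSH family is used. A common choice for string similarity is MinHash over character q-grams with banding, sketched below under that assumption: each username gets a signature of bands × rows MinHash values, and two users become a candidate pair if their signatures agree on every row of at least one band. The parameter values and user IDs are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def shingles(name: str, q: int = 3) -> Set[str]:
    """Character q-grams of a lower-cased, '#'-padded username."""
    s = f"#{name.lower()}#"
    return {s[i:i + q] for i in range(max(1, len(s) - q + 1))}

def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def minhash(tokens: Set[str], num_perm: int) -> List[int]:
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]

def lsh_candidate_pairs(users_a: Dict[str, str], users_b: Dict[str, str],
                        bands: int = 8, rows: int = 2) -> Set[Tuple[str, str]]:
    """Pairs (a, b) whose username signatures agree on all rows of some band."""
    buckets: Dict[tuple, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for side, users in enumerate((users_a, users_b)):
        for uid, name in users.items():
            sig = minhash(shingles(name), bands * rows)
            for band in range(bands):
                key = (band, tuple(sig[band * rows:(band + 1) * rows]))
                buckets[key][side].append(uid)
    pairs: Set[Tuple[str, str]] = set()
    for left, right in buckets.values():
        pairs.update((a, b) for a in left for b in right)
    return pairs

users_a = {"a1": "alice_w", "a2": "zzqx"}
users_b = {"b1": "Alice_W", "b2": "mmmmooo"}
candidates = lsh_candidate_pairs(users_a, users_b)
```

With b bands of r rows, a pair whose q-gram sets have Jaccard similarity s shares at least one band with probability 1 - (1 - s^r)^b, so b and r trade recall of true matches against the number of candidate pairs. Only the surviving pairs then need their full comparison vectors and social similarity computed.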

Links Networks in a Collective Manner
CNL works in a collective manner and consists of two stages. In the first iteration, only non-social attributes are used to link users. Based on the result of the first iteration, CNL then links users by also integrating social similarity. CNL terminates when the convergence condition holds.
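
A sketch of this two-stage loop, with hypothetical pieces the slide does not specify. Social similarity is taken as the share of neighbour pairs currently labelled matched (capped at 1) and is combined with the non-social score by a fixed weighted sum. A threshold of 0.5 decides the label, and convergence means the matched set stops changing. The weighted sum is only a stand-in for however CNL integrates social similarity into its probabilistic model.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]

def social_similarity(pair: Pair, matched: Set[Pair],
                      nbrs_a: Dict[str, Set[str]], nbrs_b: Dict[str, Set[str]]) -> float:
    """Share of neighbour pairs (a', b') currently labelled matched, capped at 1."""
    na = nbrs_a.get(pair[0], set())
    nb = nbrs_b.get(pair[1], set())
    if not na or not nb:
        return 0.0
    hits = sum(1 for x in na for y in nb if (x, y) in matched)
    return min(1.0, hits / min(len(na), len(nb)))

def collective_link(pairs: Iterable[Pair], base_score: Dict[Pair, float],
                    nbrs_a: Dict[str, Set[str]], nbrs_b: Dict[str, Set[str]],
                    combine: Callable[[float, float], float] = lambda s, t: 0.7 * s + 0.3 * t,
                    threshold: float = 0.5, max_iter: int = 20) -> Set[Pair]:
    pairs = list(pairs)
    # Stage 1: link with non-social attributes only.
    matched = {p for p in pairs if base_score[p] >= threshold}
    for _ in range(max_iter):
        # Stage 2: re-score with social similarity from the previous round.
        new = {p for p in pairs
               if combine(base_score[p],
                          social_similarity(p, matched, nbrs_a, nbrs_b)) >= threshold}
        if new == matched:  # hypothetical convergence test: labels unchanged
            break
        matched = new
    return matched

NBRS_A = {"a1": {"a2", "a3"}, "a2": {"a1"}, "a3": {"a1"}}
NBRS_B = {"b1": {"b2", "b3"}, "b2": {"b1"}, "b3": {"b1"}}
BASE = {("a1", "b1"): 0.9, ("a2", "b2"): 0.8, ("a3", "b3"): 0.45, ("a3", "b2"): 0.1}
result = collective_link(BASE.keys(), BASE, NBRS_A, NBRS_B)
```

In the toy data, (a3, b3) scores 0.45 on non-social attributes alone and is missed in the first stage, but it is linked once its neighbour pair (a1, b1) is matched, while (a3, b2) stays unmatched.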