Multiple Location Profiling for Users and Relationships from Social Network and Content Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang University of Illinois.

Presentation transcript:

Multiple Location Profiling for Users and Relationships from Social Network and Content Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang University of Illinois at Urbana-Champaign

Users’ locations are important for many information services. (Figure: Carol, who lives in Los Angeles, is a user of a social network content provider, which can use her location for local content recommendation and local friend recommendation.)

The research community has explored the social network and content to profile users’ locations. (Figure: profiling a user’s home location. Carol’s tweets: “Terrible LA traffic!”, “Want to go to Honolulu for Spring vacation!”, “See Gaga in Hollywood.”, “Good Morning!” Her social network: Mike (LA), Lucy (Austin), Gaga (NY), Bob (San Diego), and Jean (unknown). Profiled home location: Los Angeles.)

Problem 1: Existing methods only profile a single home location. (Figure: locational word frequencies aggregated from the locations of a user’s friends and the locational words the user tweeted: Paramount 1, Los Angeles 1, Hollywood 2, Austin 2.) In reality, Carol lives in Los Angeles and studied at the University of Texas at Austin. A single home location is therefore incomplete (it cannot cover both cities) and can be inaccurate.

Problem 2: Existing methods completely miss profiling relationships, although relationship profiles are useful. For example: “Carol follows Bob”: both Carol and Bob work in Los Angeles; “Carol follows Lucy”: both Carol and Lucy studied in Austin; “Carol tweets Hollywood”: Carol lives in Los Angeles.

We focus on multiple location profiling for users and relationships. (Figure: in the real world, Carol lives in Los Angeles and was educated at the University of Texas at Austin. Given her tweets and social network, the goal is Carol’s location profile {Los Angeles, Austin} and, for the relationship “Carol follows Lucy”, the location pair (Austin, Austin).)

Our approach is to build a model, MLP (Multiple Location Profiling), that connects known relationships with unknown locations. The known relationships are following relationships (Carol follows Lucy, Carol follows Mike, …) and tweeting relationships (Carol tweets Hollywood, Carol tweets Honolulu, …). The unknowns are users’ locations. MLP consists of a generative model and an inference algorithm.

There are three challenges for building MLP.
Challenge 1: How to connect users’ locations with relationships?
A. From users’ locations to following relationships.
B. From users’ locations to tweeting relationships.
Challenge 2: How to model that the relationships are mixed?
A. Some relationships are not based on locations.
B. Each relationship is based on a different location.
Challenge 3: How to utilize home locations from labeled users?

Challenge 1.A: We need to connect following relationships with two users’ locations. Even a user with only one location follows others from different locations. We define the following probability as the probability of generating a following relationship from one user to another based on their locations. Example following probabilities: Carol at Los Angeles follows Bob in San Diego: 20%; Carol at Los Angeles follows Mike in Los Angeles: 30%; …

Observation: We explore the following probability by investigating a corpus. It captures our intuition well, and it fits a power-law distribution.
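
The slide does not say which variable the power law is in. A common reading, assumed here, is that the following probability decays as a power of the distance d between two users’ locations, P(follow | d) ≈ α·d^β. A minimal sketch of checking this on a corpus is an ordinary least-squares fit in log-log space. `fit_power_law` and the toy numbers are hypothetical, not the authors’ fitting procedure.

```python
import math


def fit_power_law(distances, probabilities):
    """Fit p = alpha * d**beta by least squares on log p = log alpha + beta * log d.

    Assumes every distance and probability is strictly positive.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                          # slope in log-log space
    alpha = math.exp(mean_y - beta * mean_x)  # intercept mapped back from log space
    return alpha, beta


# Toy data: following rate per distance bin (km), generated from alpha=0.01, beta=-0.7.
bins = [1.0, 10.0, 100.0, 1000.0]
rates = [0.01 * d ** -0.7 for d in bins]
alpha, beta = fit_power_law(bins, rates)
```

On real binned data the points scatter around the line, so a log-log fit like this is a quick diagnostic rather than a full estimator.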

Solution: We derive a location-based following model for the following probability.
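
The slide names the location-based following model without giving its form here. The sketch below is a minimal version under two assumptions: the model is a power law in distance, P(follow | l_i, l_j) = α·d(l_i, l_j)^β, and d is the great-circle distance. The values of α and β, the 1 km floor, and the city coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def following_probability(loc_i, loc_j, alpha=0.01, beta=-0.7, min_km=1.0):
    """Hypothetical location-based following model: alpha * d**beta, capped at 1.

    The distance is floored at min_km so that users at the same location
    get a finite probability instead of d**beta blowing up at d = 0.
    """
    d = max(haversine_km(*loc_i, *loc_j), min_km)
    return min(1.0, alpha * d ** beta)


LOS_ANGELES = (34.05, -118.24)
SAN_DIEGO = (32.72, -117.16)
AUSTIN = (30.27, -97.74)
```

With a negative β, Carol in Los Angeles is more likely to follow Mike in Los Angeles than Bob in San Diego. She is also more likely to follow Bob than a user in Austin.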

Challenge 1.B: We need to connect tweeting relationships with a user’s location. A user at one location tweets about different locations. We define the tweeting probability as the probability of generating a tweeting relationship from a user to a venue based on a location. Example tweeting probabilities: Carol at Los Angeles tweets about watching a show in Hollywood: 30%; Carol at Los Angeles tweets about traffic in Los Angeles: 40%; …

Observation: We explore the tweeting probability by investigating a corpus. The results capture our intuition well, and they can be modeled as a set of multinomial distributions.

Solution: We derive a location-based tweeting model for the tweeting probability.
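
Below is a minimal sketch of the tweeting model as a set of multinomial distributions, one per location, each over venues. It assumes a training set of (location, venue) pairs, for example tweets from users whose home location is known. It also uses additive smoothing so that unseen venues keep a non-zero probability. Both choices, the function name, and the toy data are assumptions rather than the authors’ estimator.

```python
from collections import Counter, defaultdict


def estimate_tweeting_model(observations, venues, smoothing=0.1):
    """Estimate P(venue | location) from (location, venue) pairs.

    `venues` must contain every venue that appears in `observations`.
    Returns {location: {venue: probability}}; each inner distribution sums to 1.
    """
    counts = defaultdict(Counter)
    for location, venue in observations:
        counts[location][venue] += 1
    model = {}
    for location, venue_counts in counts.items():
        total = sum(venue_counts.values()) + smoothing * len(venues)
        model[location] = {v: (venue_counts[v] + smoothing) / total for v in venues}
    return model


VENUES = ["Hollywood", "Los Angeles", "Austin", "Honolulu"]
toy_tweets = [
    ("Los Angeles", "Hollywood"),
    ("Los Angeles", "Los Angeles"),
    ("Los Angeles", "Los Angeles"),
    ("Austin", "Austin"),
]
tweeting_model = estimate_tweeting_model(toy_tweets, VENUES)
```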

Challenge 2.A: There are both noisy and location-based relationships. Noisy relationships: Carol follows Lady Gaga; Carol tweets Honolulu. Location-based relationships: Carol follows Lucy; Carol tweets Los Angeles. Noisy relationships are not useful for profiling locations!

Solution: We propose a mixture component for the two types of relationships.
1. A relationship is generated from either a location-based model or a random model.
2. A binary model selector μ indicates which model is used.
3. The selector is generated from a binomial (Bernoulli) distribution.
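
Below is a minimal sketch of the mixture. It assumes the selector μ is a single Bernoulli draw with a hypothetical parameter θ, the probability that a relationship is location-based. The second function is the standard posterior for a two-component mixture, P(μ = 1 | r) = θ·p_loc(r) / (θ·p_loc(r) + (1 - θ)·p_rand(r)). An inference procedure could use it to down-weight noisy relationships. The function names are illustrative.

```python
import random


def generate_relationship(rng, theta, location_based, random_model):
    """Generate one relationship from the two-component mixture.

    mu ~ Bernoulli(theta): mu = 1 selects the location-based model,
    mu = 0 selects the random (noise) model.
    """
    mu = 1 if rng.random() < theta else 0
    target = location_based(rng) if mu == 1 else random_model(rng)
    return mu, target


def selector_posterior(theta, p_location, p_random):
    """P(mu = 1 | relationship) given each component's likelihood of the relationship."""
    on = theta * p_location
    off = (1.0 - theta) * p_random
    return on / (on + off)


VENUES = ["Hollywood", "Los Angeles", "Austin", "Honolulu"]
rng = random.Random(0)
mu, venue = generate_relationship(
    rng, 0.8,
    location_based=lambda r: r.choice(["Hollywood", "Los Angeles"]),  # venues near Carol
    random_model=lambda r: r.choice(VENUES),                            # any venue
)
```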

Challenge 2.B: Location-based relationships are related to multiple locations. “Carol follows Lucy” is based on Austin, where both Carol and Lucy studied; “Carol tweets Hollywood” is based on Los Angeles, where Carol lives. Profiling each relationship with its own location is accurate and complete!

Solution: We fundamentally model users’ multiple locations in generating relationships. A location profile is a multinomial distribution over locations, e.g., Carol {Los Angeles 0.1, Austin 0.1, …}. Each relationship is based on one particular location from the user’s profile.
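
Below is a minimal sketch of how a relationship could pick its location from a profile. The profile values are hypothetical, since the slide’s own example is elided. `assign_following` assumes that a following relationship draws one location for each of the two users, independently, from that user’s own profile.

```python
import random


def sample_location(rng, profile):
    """Draw one location from a profile {location: probability} (a multinomial draw)."""
    locations = list(profile)
    weights = [profile[loc] for loc in locations]
    return rng.choices(locations, weights=weights, k=1)[0]


def assign_following(rng, follower_profile, followee_profile):
    """Pick the pair of locations a following relationship is based on."""
    return sample_location(rng, follower_profile), sample_location(rng, followee_profile)


# Hypothetical profiles for illustration only.
carol = {"Los Angeles": 0.6, "Austin": 0.4}
lucy = {"Austin": 0.9, "Los Angeles": 0.1}
rng = random.Random(1)
pair = assign_following(rng, carol, lucy)  # (Carol's location, Lucy's location) for "Carol follows Lucy"
```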

Challenge 3: We should utilize the observed locations in some users’ profiles; 20% of users provide their home locations in their profiles. (Figure: Mike (LA), Lucy (Austin), Gaga (NY), and Bob (San Diego) have observed locations; Carol and Jean do not.) These observed locations are useful for profiling locations, but we cannot use them directly to generate relationships!

Solution: We use the observed locations as priors for generating users’ profiles. We assume that users’ profiles are generated from prior distributions under which a user’s observed home location is likely to be generated, e.g., Bob {San Diego 0.9, Los Angeles 0.05, …}.
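
One way to realize such priors, assumed here rather than taken from the slides, is a Dirichlet prior per user. It puts a small symmetric pseudo-count on every location plus a large extra pseudo-count on the observed home location, so sampled profiles put most of their mass there, as in the Bob example. Users without an observed location get a flat prior. `base` and `home_boost` are hypothetical hyperparameters.

```python
import random


def dirichlet_prior(locations, home=None, base=0.1, home_boost=10.0):
    """Dirichlet pseudo-counts: `base` everywhere, plus `home_boost` on the observed home."""
    return {loc: base + (home_boost if loc == home else 0.0) for loc in locations}


def sample_profile(rng, prior):
    """Sample a location profile from Dirichlet(prior) by normalizing Gamma(a_k, 1) draws."""
    draws = {loc: rng.gammavariate(a, 1.0) for loc, a in prior.items()}
    total = sum(draws.values())
    return {loc: g / total for loc, g in draws.items()}


LOCATIONS = ["San Diego", "Los Angeles", "Austin", "Honolulu"]
bob_prior = dirichlet_prior(LOCATIONS, home="San Diego")  # Bob reports San Diego
jean_prior = dirichlet_prior(LOCATIONS)                   # Jean reports nothing: flat prior
bob_profile = sample_profile(random.Random(5), bob_prior)
```

Under this prior, Bob’s expected share on San Diego is 10.1 / 10.4 ≈ 0.97.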

Putting these pieces together, we arrive at the complete model.

We evaluate our model on a large Twitter corpus. We crawled a subset of Twitter containing 139K users, 50 million tweets, and 2 million following relationships.

Task 1: In profiling users’ home locations, MLP is accurate and improves over the baselines.

Task 2: In profiling users’ multiple locations, MLP performs accurately and completely. (Figure: precision and recall at rank 2; case studies of users whose locations lie in a similar region and users whose locations lie in different areas.)
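
The slide reports precision and recall at rank 2 without the formula. The sketch below uses the usual definitions and assumes that each user gets a ranked list of locations, compared against the manually labeled set, with scores macro-averaged over users. The averaging choice and the toy rankings are assumptions. The labeled sets follow the Carol and Bob examples.

```python
def precision_recall_at_k(ranked, truth, k=2):
    """Precision@k and recall@k of one user's ranked locations against the labeled set."""
    hits = sum(1 for loc in ranked[:k] if loc in truth)
    return hits / k, hits / len(truth)


def macro_average_at_k(predictions, labels, k=2):
    """Average precision@k and recall@k over all labeled users."""
    scores = [precision_recall_at_k(predictions[u], labels[u], k) for u in labels]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Toy example with hypothetical rankings.
labels = {"Carol": {"Los Angeles", "Austin"}, "Bob": {"San Diego", "Los Angeles"}}
predictions = {
    "Carol": ["Los Angeles", "Honolulu", "Austin"],
    "Bob": ["San Diego", "Los Angeles"],
}
precision, recall = macro_average_at_k(predictions, labels)
```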

Task 3: In profiling following relationships, MLP achieves 57% accuracy.

Thanks and Questions!

Backup for Questions

Experiments 1: We use the home locations provided in users’ profiles as ground truth. We compare against two baseline methods proposed in the literature.

Experiments 2: We manually labeled the multiple locations of 1,000 users and obtained 585 users who clearly have multiple locations. We compare against the same baseline methods as in the previous task and measure performance in terms of “precision” and “recall”.

Experiments 3: For the 585 users whose multiple locations are known to us, we manually labeled the location assignments of their relationships, obtaining 4,426 labeled relationships. As a meaningful baseline, we design a method that profiles a relationship based on the users’ home locations.
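
Below is a minimal sketch of this baseline and of an accuracy measure for relationship profiling. It assumes that a following relationship (i, j) is labeled with a pair of locations, one per user, and that a prediction counts as correct only if both locations match. The exact-match criterion, the names, and the toy data are assumptions. The toy data is drawn from the Carol, Lucy, and Bob examples, plus a hypothetical Mike-follows-Carol relationship.

```python
def home_location_baseline(relationships, home):
    """Profile each following relationship (i, j) as the pair of the users' home locations."""
    return {(i, j): (home[i], home[j]) for i, j in relationships}


def assignment_accuracy(predicted, labeled):
    """Fraction of labeled relationships whose predicted location pair matches exactly."""
    correct = sum(1 for rel, pair in labeled.items() if predicted.get(rel) == pair)
    return correct / len(labeled)


home = {"Carol": "Los Angeles", "Lucy": "Austin", "Bob": "San Diego", "Mike": "Los Angeles"}
labeled = {
    ("Carol", "Lucy"): ("Austin", "Austin"),            # both studied in Austin
    ("Carol", "Bob"): ("Los Angeles", "Los Angeles"),   # both work in Los Angeles
    ("Mike", "Carol"): ("Los Angeles", "Los Angeles"),  # hypothetical
}
baseline = home_location_baseline(labeled, home)
accuracy = assignment_accuracy(baseline, labeled)
```

Home locations alone cannot express that “Carol follows Lucy” is based on Austin, so this baseline misses such relationships.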

We infer users’ locations and the location assignments of relationships as latent variables in the joint probability.
MLP defines the joint probability of observations, parameters, and latent variables.
We infer users’ locations and location assignments from the observed relationships and the given parameters.
We develop our algorithm based on the Gibbs sampling method.
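
The slides state that inference is based on Gibbs sampling but do not give the conditional distributions. Below is a minimal collapsed Gibbs sketch under these simplifying assumptions:
- only tweeting relationships are used;
- there is no noise component μ;
- the tweeting model is fixed as given parameters;
- each user has a Dirichlet prior that is integrated out.

Each relationship’s location z is resampled with probability proportional to (n_{u,l} + γ_{u,l})·P(venue | l). Profiles are read off the final sample, a simplification, since averaging over samples after burn-in is more robust. All names and numbers are hypothetical, and this is not the authors’ full sampler.

```python
import random
from collections import defaultdict


def gibbs_tweeting(tweets, tweet_model, priors, iterations=200, seed=0, floor=1e-9):
    """Collapsed Gibbs sampling of location assignments for tweeting relationships.

    tweets:      list of (user, venue) tweeting relationships
    tweet_model: given parameters {location: {venue: P(venue | location)}}
    priors:      Dirichlet pseudo-counts {user: {location: gamma}}
    Returns (profiles, assignments); profiles are estimated from the final sample.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))  # counts[user][location]
    assignments = []
    for user, _venue in tweets:  # random initialization
        loc = rng.choice(list(priors[user]))
        assignments.append(loc)
        counts[user][loc] += 1

    for _ in range(iterations):
        for idx, (user, venue) in enumerate(tweets):
            counts[user][assignments[idx]] -= 1  # exclude the current relationship
            candidates = list(priors[user])
            # P(z = c | rest) is proportional to (n_{u,c} + gamma_{u,c}) * P(venue | c)
            weights = [
                (counts[user][c] + priors[user][c]) * max(tweet_model[c].get(venue, 0.0), floor)
                for c in candidates
            ]
            loc = rng.choices(candidates, weights=weights, k=1)[0]
            assignments[idx] = loc
            counts[user][loc] += 1

    profiles = {}
    for user, prior in priors.items():
        total = sum(counts[user][c] + prior[c] for c in prior)
        profiles[user] = {c: (counts[user][c] + prior[c]) / total for c in prior}
    return profiles, assignments


# Hypothetical given parameters and toy tweets for Carol.
tweet_model = {
    "Los Angeles": {"Hollywood": 0.5, "Los Angeles": 0.49, "Austin": 0.01},
    "Austin": {"Hollywood": 0.01, "Los Angeles": 0.01, "Austin": 0.98},
}
tweets = [("Carol", "Hollywood")] * 3 + [("Carol", "Los Angeles")] * 3 + [("Carol", "Austin")] * 6
priors = {"Carol": {"Los Angeles": 1.0, "Austin": 1.0}}
profiles, assignments = gibbs_tweeting(tweets, tweet_model, priors)
```

Carol’s estimated profile should then split its mass between Los Angeles and Austin rather than collapsing onto a single home location.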