Published by James Christ. Modified over 10 years ago.
1
Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*. Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore. *Collaboration with Himabindu Lakkaraju & Chiranjib Bhattacharyya. Workshop on Social Computing, IIT Kharagpur, Oct 5-6, 2012.
2
Social Media Analysis: Motivation. Microblogs: Twitter, Facebook, MySpace. Understanding and analyzing topics & trends and the influences on users. Variety of stakeholders: business, government, social scientists.
3
Social Media Analysis: Challenges. Network and influences on users: user personality reflects personal preferences, global and geographic trends, and the social circle in the network [Yang WSDM 11]. Dynamic nature: topics and user personalities evolve over time. Volume of data. Existing approaches fall short.
4
Social Media Analysis: State of the Art. Content analysis [Ramage ICWSM 2010, Hong SOMA 2010]: variants of LDA. Inferring user interests [Ahmed KDD 2011, Wen KDD 2010]: individual features such as user activity or network. Patterns in temporal evolution [Yang et al. WSDM 2011].
5
Bayesian Non-parametric Models. Choosing the number of components in a mixture model is a particularly severe problem for large data volumes such as social media data. Bayesian solution: an infinite-dimensional prior, which allows the number of mixture components to grow with data size. Limitations: cannot capture the richness of social media data, and algorithms are often not scalable.
6
Our Contributions. Analyzing influences in social media data. Relational CRP: captures multiple relationships in a domain; extended to handle the dynamic nature of the data. Multi-threaded online inference algorithm. Analysis on 360 million tweets: interesting insights.
7
Insights: Sneak Preview. Evolving character of topics. Tiger Woods: sudden change from personal to geographic and then to world-wide influence.
8
Talk Outline. Background: Chinese Restaurant Processes. CRP with multiple relationships (RelCRP, MRelCRP). Dynamic MRelCRP. Multi-threaded Online Inference Algorithm. Experimental Results.
10
Dirichlet Process (Informal)
11
Dirichlet Process: Properties. Shown to be discrete and infinite dimensional. Used as a prior for infinite mixture models.
12
Dirichlet Process: Properties
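The discreteness mentioned on the properties slide can be illustrated with the standard stick-breaking construction of Dirichlet process weights. The sketch below is not taken from the slides: the function name, the truncation level and the concentration value are assumptions chosen for illustration, and the infinite sequence is cut off after a fixed number of sticks.

```python
import random

def stick_breaking_weights(alpha, truncation, seed=0):
    """Truncated stick-breaking draw of Dirichlet process mixture weights.

    alpha      : concentration parameter (> 0), an illustrative choice
    truncation : number of sticks to break (the true process is infinite)
    Returns (weights, remaining), where remaining is the unbroken stick mass.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        # Break off a Beta(1, alpha) fraction of whatever stick is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining

weights, remaining = stick_breaking_weights(alpha=2.0, truncation=50)
```

The weights sit on a countable set of atoms, which is why a draw from the process is discrete; a smaller `alpha` concentrates the mass on fewer components.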
13
Chinese Restaurant Process (CRP)
14
Chinese Restaurant Process (CRP)
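As a reminder of the standard CRP seating rule (a customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to a concentration parameter), here is a minimal sampling sketch. The function name `sample_crp` and the parameter values are illustrative assumptions, not notation from the slides.

```python
import random

def sample_crp(num_customers, alpha, seed=0):
    """Seat customers one at a time under the standard CRP.

    Returns (assignments, table_sizes): the table index chosen by each
    customer, and the final number of customers at each table.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = customers currently at table k
    assignments = []
    for _ in range(num_customers):
        # Existing table k has weight table_sizes[k]; a new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes

assignments, table_sizes = sample_crp(1000, alpha=2.0)
```

Because the new-table weight stays fixed while the table sizes grow, the number of occupied tables increases slowly with the number of customers, which is the property that lets the number of mixture components grow with the data.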
15
Talk Outline. Background: Chinese Restaurant Processes. CRP with multiple relationships (RelCRP, MRelCRP). Dynamic MRelCRP. Parallelized Online Inference Algorithm. Experimental Results.
16
Relational Chinese Restaurant Process (RelCRP), relation R
17
Relational Chinese Restaurant Process (RelCRP)
18
Influence of World-wide Factors
19
Influence of World-wide Factors
20
Influence of Personal Preferences
21
Influence of Personal Preferences
22
Influence of Friend Network
23
Influence of Friend Network
24
Influence of Geography: India, China, UK
25
Influence of Geography
26
Aggregating Influences. RelCRP is exchangeable like the CRP; useful as a prior for an infinite mixture model. RelCRP captures the influence of one relation on posts. Influences act simultaneously on any user. The aggregated influence pattern is user specific: different users are affected differently by the same combination of world-wide and geographic factors.
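One way to picture user-specific aggregation of influences is a two-step draw: the user first picks which relation drives the post, then picks a topic in proportion to how many related posts already carry it. The sketch below is only an illustration under that assumption; it is not the generative process from the slides, whose equations did not survive in this transcript. The names `draw_topic`, `personality`, `related_topic_counts` and the relation labels are hypothetical.

```python
import random
from collections import Counter

def draw_topic(relation_weights, related_topic_counts, alpha, next_topic_id, rng):
    """Pick a relation using the user's weights, then a topic CRP-style.

    relation_weights     : dict relation -> user-specific weight (assumed form)
    related_topic_counts : dict relation -> Counter(topic -> related posts)
    alpha                : weight for opening a new topic
    next_topic_id        : id to use if a new topic is opened
    """
    relations = list(relation_weights)
    relation = rng.choices(
        relations, weights=[relation_weights[r] for r in relations]
    )[0]
    counts = related_topic_counts.get(relation, Counter())
    topics = list(counts)
    # Existing topics are weighted by related-post counts; a new topic by alpha.
    weights = [counts[t] for t in topics] + [alpha]
    topic = rng.choices(topics + [next_topic_id], weights=weights)[0]
    return relation, topic

rng = random.Random(0)
personality = {"world": 0.5, "geography": 0.2, "friends": 0.2, "personal": 0.1}
counts = {
    "world": Counter({0: 40, 1: 10}),
    "geography": Counter({1: 5}),
    "friends": Counter({2: 3}),
    "personal": Counter(),
}
relation, topic = draw_topic(personality, counts, alpha=1.0, next_topic_id=3, rng=rng)
```

Two users who see the same world-wide and geographic topic counts but hold different `personality` weights would draw topics from different mixtures, which is the sense in which the aggregated pattern is user specific.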
27
Multi Relational CRP
28
Multi Relational CRP
29
Multi RelCRP: Generative Process
30
Talk Outline. Background: Chinese Restaurant Processes. CRP with multiple relationships (RelCRP, MRelCRP). Dynamic MRelCRP. Multi-threaded Online Inference Algorithm. Experimental Results.
31
Evolving Patterns in Social Media. Number of topics: topics die and new ones are born. User personalities: susceptibility to influence by world-wide, geographic and friends' preferences. Existing topic distributions: words go out of fashion and new ones enter the vocabulary. Topic characters: the popularity of a topic changes world-wide, in users' preferences, in sub-networks and in geographies.
32
Dynamic MultiRelCRP
33
User Personality Trends
34
Evolving Topic Distributions
35
Topic Character Trends
36
Talk Outline. Background: Chinese Restaurant Processes. CRP with multiple relationships (RelCRP, MRelCRP). Dynamic MRelCRP. Multi-threaded Online Inference Algorithm. Experimental Results.
37
Inference and Estimation Tasks
38
Online Algorithm. The traditional iterative framework does not scale for social media data. Sequential Monte Carlo methods [Canini AIStats 09], which rejuvenate some old labels, are also infeasible. Online sampling [Banerjee SDM 07] does not revisit old labels at all and uses an initial batch phase. Adapted here for the non-parametric setting.
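The idea of labeling each post once and never revisiting it can be sketched as a single pass over the stream, using a plain CRP prior and a smoothed bag-of-words likelihood. This is a simplified illustration, not the algorithm from the talk: the relational structure and the initial batch phase are omitted, and the function name, `alpha`, `beta` and `vocab_size` are assumptions.

```python
import math
import random
from collections import Counter

def online_assign(posts, alpha=1.0, beta=0.1, vocab_size=1000, seed=0):
    """Assign each post (a list of words) to a topic in one pass."""
    rng = random.Random(seed)
    topic_sizes = []    # posts per topic
    topic_words = []    # word counts per topic
    topic_totals = []   # total words per topic
    labels = []
    for words in posts:
        log_w = []
        for k in range(len(topic_sizes)):
            # Smoothed word likelihood; counts are held fixed within a post,
            # which is an approximation made to keep the sketch short.
            ll = sum(
                math.log((topic_words[k][w] + beta)
                         / (topic_totals[k] + beta * vocab_size))
                for w in words
            )
            log_w.append(math.log(topic_sizes[k]) + ll)
        # New topic: CRP weight alpha, uniform likelihood over the vocabulary.
        log_w.append(math.log(alpha) - len(words) * math.log(vocab_size))
        top = max(log_w)
        weights = [math.exp(x - top) for x in log_w]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):
            topic_sizes.append(0)
            topic_words.append(Counter())
            topic_totals.append(0)
        # The label is fixed from here on; earlier labels are never resampled.
        topic_sizes[k] += 1
        topic_words[k].update(words)
        topic_totals[k] += len(words)
        labels.append(k)
    return labels

labels = online_assign(
    [["fifa", "goal"], ["fifa", "worldcup"], ["google", "wave"], ["wave", "beta"]],
    vocab_size=6,
)
```

Each post costs time proportional to the current number of topics, and no pass over earlier posts is needed, which is what makes this style of sampler attractive for large streams.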
39
Multi-threaded Implementation. A sequential online implementation does not scale. Iterative Gibbs sampling algorithms have been parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]. Our algorithm is parallel, online and non-parametric. Explicit consolidation by the master thread at the end of each iteration; only new topics are consolidated.
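The worker and master structure can be sketched as follows: workers read a shared snapshot of the topics, open new topics locally, and the master merges only those new topics once all workers finish. Everything specific here is an assumption for illustration: the word-overlap assignment is a deterministic stand-in for the sampling step, the Jaccard merge rule and its threshold are hypothetical, and Python threads show the control flow rather than real speed-up (the talk's implementation is Java-based).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def overlap(topic, words):
    return sum(topic[w] for w in words)

def jaccard(a, b):
    va, vb = set(a), set(b)
    return len(va & vb) / len(va | vb) if (va | vb) else 0.0

def process_shard(shard, global_topics):
    """Worker: treats global_topics as read-only; returns count deltas
    for existing topics and the topics it opened locally."""
    deltas, local_new = {}, []
    for words in shard:
        candidates = [("global", i, t) for i, t in enumerate(global_topics)]
        candidates += [("local", i, t) for i, t in enumerate(local_new)]
        kind, idx, topic = max(
            candidates, key=lambda c: overlap(c[2], words),
            default=(None, None, None),
        )
        if topic is None or overlap(topic, words) == 0:
            local_new.append(Counter(words))       # open a local new topic
        elif kind == "global":
            deltas.setdefault(idx, Counter()).update(words)
        else:
            topic.update(words)
    return deltas, local_new

def consolidate(global_topics, results, threshold=0.5):
    """Master: apply deltas, then merge only the newly created topics."""
    for deltas, _ in results:
        for idx, counts in deltas.items():
            global_topics[idx].update(counts)
    first_new = len(global_topics)
    for _, local_new in results:
        for topic in local_new:
            for existing in global_topics[first_new:]:
                if jaccard(existing, topic) >= threshold:
                    existing.update(topic)         # same new topic, two workers
                    break
            else:
                global_topics.append(topic)
    return global_topics

def run_iteration(shards, global_topics, num_threads=7):
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(lambda s: process_shard(s, global_topics), shards))
    return consolidate(global_topics, results)

shards = [
    [["goal", "fifa"], ["fifa", "worldcup"]],
    [["fifa", "goal"], ["wave", "google"]],
]
topics = run_iteration(shards, [Counter({"google": 3})])
```

In this example both workers open a topic about the same event, and the master merges them because their vocabularies overlap; topics that existed before the iteration are only updated, never merged.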
40
Talk Outline. Background: Chinese Restaurant Processes. CRP with multiple relationships (RelCRP, MRelCRP). Dynamic MRelCRP. Multi-threaded Online Inference Algorithm. Experimental Results.
41
Datasets and Baselines. Datasets: Twitter, 360 million tweets (Jun-Dec 2009); Facebook, 300,000 posts (public profiles, 3 months). Baselines: Latent Dirichlet Allocation (LDA) [Hong SOMA 2010]; Labeled LDA (L-LDA), with hashtags as topics [Ramage ICWSM 2010]; Timeline, a dynamic non-parametric topic model [Ahmed UAI 2010].
42
1 Model Goodness. Perplexity: ability to generalize to unseen data. Both network and dynamics are important for modeling social media data.
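Perplexity on held-out text is conventionally the exponential of the negative average log-likelihood per token, with lower values meaning better generalization. The helper below is a minimal sketch of that standard definition; how the talk computed per-post likelihoods is not stated, so the inputs are assumed to be given.

```python
import math

def perplexity(log_probs_per_post, num_tokens):
    """Held-out perplexity: exp(-(sum of log-likelihoods) / number of tokens).

    log_probs_per_post : natural-log likelihood of each held-out post
    num_tokens         : total number of tokens across those posts
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(-sum(log_probs_per_post) / num_tokens)

# Sanity check: a model that is uniform over 1000 words has perplexity 1000.
uniform_ll = [20 * math.log(1 / 1000), 30 * math.log(1 / 1000)]
value = perplexity(uniform_ll, num_tokens=50)
```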
43
2 Quality of Discovered Topics. Label assigned to each post, indicating its category. Distribution over words, indicating semantics. A. Clustering posts using topic labels. B. Prediction using topic labels: predicting post authorship and user commenting activity. C. Major event detection.
44
2A Post Clustering using Topics. Use hashtags as gold standard (for Twitter): 16K posts, #NIPS2009, #ICML2009, #bollywood etc. DMRelCRP is close to L-LDA without using hashtags. DMRelCRP produces finer-grained clusters.
45
2B Prediction Using Topics. Authorship: given a post and a user, predict whether the user is the author. Commenting activity: given a post and a (non-author) user, predict whether the user comments on that post. DMRelCRP topics lead to more accurate prediction.
46
2C Major Event Detection
47
2C Major Event Detection
48
2C Major Event Detection
49
3 Analysis of Influences
50
3A Global Personality Trends
51
3A Global Personality Trends. Figure labels: Michael Jackson's death, FIFA WC, Google Wave.
52
3A Global Personality Trends
53
3B Geo-specific Personality Trends. Personality trends are very similar in the UK and US. Geographic influences are high at different epochs.
54
3B Geo-specific Personality Trends. India: world-wide and geographic influences are weaker. China: world-wide influence weak, geographic influence strong; stable pattern.
55
3C Topic Character Trends
56
3C Topic Character Trends
57
3C Topic Character Trends
58
Scaling with Data Size. Java-based multi-threaded framework; 7 threads on an 8-core machine with 32 GB RAM. Scales largely because of multi-threading.
59
Summary. First attempt at studying user influences in social media data. New non-parametric model that captures multiple relationships and temporal evolution. Multi-threaded online Gibbs sampling algorithm. Extensive evaluation on a large real dataset. Topics lead to better clustering and prediction. Insights on user influence patterns.