Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media* Indrajit Bhattacharya Research Scientist IBM Research,


1 Dynamic Multi-Relational Chinese Restaurant Process for Analyzing Influences on Users in Social Media*
Indrajit Bhattacharya, Research Scientist, IBM Research, Bangalore
*Collaboration with Himabindu Lakkaraju & Chiranjib Bhattacharyya
Workshop on Social Computing, IIT Kharagpur, Oct 5-6, 2012

2 Social Media Analysis: Motivation
- Microblogs: Twitter, Facebook, MySpace
- Understanding and analyzing topics and trends
- Influences on users
- Variety of stakeholders: business, government, social scientists

3 Social Media Analysis: Challenges
- Network and influences on users
  - User personality: personal preferences, global and geographic trends, social circle in the network [Yang WSDM 11]
- Dynamic nature
  - Topics and user personalities evolve over time
- Volume of data
- Existing approaches fall short

4 Social Media Analysis: State of the Art
- Content analysis [Ramage ICWSM 2010, Hong SOMA 2010]
  - Variants of LDA
- Inferring user interests [Ahmed KDD 2011, Wen KDD 2010]
  - Individual features such as user activity or network
- Patterns in temporal evolution [Yang et al. WSDM 2011]

5 Bayesian Non-parametric Models
- Choosing the number of components in a mixture model
  - Particularly severe problem for large data volumes, such as social media data
- Bayesian solution
  - Infinite-dimensional prior
  - Allows the number of mixture components to grow with data size
- Existing non-parametric models cannot capture the richness of social media data
- Algorithms are often not scalable

6 Our Contributions
- Analyzing influences in social media data
- Relational CRP
  - Captures multiple relationships in a domain
  - Extended to handle the dynamic nature of the data
- Multi-threaded online inference algorithm
- Analysis on 360 million tweets
  - Interesting insights

7 Insights: Sneak Preview
- Evolving character of topics
- Tiger Woods: sudden change from personal to geographic and then to world-wide influence

8 Talk Outline
- Background: Chinese Restaurant Processes
- CRP with multiple relationships (RelCRP, MRelCRP)
- Dynamic MRelCRP
- Multi-threaded Online Inference Algorithm
- Experimental Results


10 Dirichlet Process (Informal)

11 Dirichlet Process: Properties
- Draws from a Dirichlet process are shown to be discrete and infinite-dimensional
- Used as a prior for infinite mixture models
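
The discreteness property can be made concrete with the stick-breaking construction, a standard result for Dirichlet processes that does not appear in the surviving slide text. The sketch below is illustrative only: the function name, the truncation at a finite number of atoms, and the parameter values are assumptions.

```python
import random

def stick_breaking(alpha, n_atoms, seed=0):
    """Truncated stick-breaking weights for a DP with concentration alpha.

    Returns the first n_atoms mixture weights and the leftover stick mass
    that the (infinitely many) remaining atoms would share.
    """
    rng = random.Random(seed)
    remaining = 1.0
    weights = []
    for _ in range(n_atoms):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights, remaining

weights, leftover = stick_breaking(alpha=2.0, n_atoms=20)
```

Each weight attaches to one atom (one mixture component), so a draw is a countable sum of point masses. A larger alpha spreads mass over more atoms.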

12 Dirichlet Process: Properties

13 Chinese Restaurant Process (CRP)

14 Chinese Restaurant Process (CRP)
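
The slide bodies for the CRP did not survive extraction. As a reminder of the standard process (not a reconstruction of the slides): customer n+1 joins existing table k with probability n_k / (n + alpha) and opens a new table with probability alpha / (n + alpha). A minimal sketch, with an illustrative function name and parameter values:

```python
import random

def crp_sample(n_customers, alpha, seed=0):
    """Seat customers one at a time; return each customer's table and the table sizes."""
    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k
    seating = []
    for _ in range(n_customers):
        # Existing table k has weight n_k; the new table has weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(0)  # open a new table
        table_sizes[k] += 1
        seating.append(k)
    return seating, table_sizes

seating, table_sizes = crp_sample(1000, alpha=2.0)
```

The number of occupied tables grows slowly (roughly logarithmically) with the number of customers, which is what lets the number of mixture components grow with data size.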

15 Talk Outline
- Background: Chinese Restaurant Processes
- CRP with multiple relationships (RelCRP, MRelCRP)
- Dynamic MRelCRP
- Parallelized Online Inference Algorithm
- Experimental Results

16 Relational Chinese Restaurant Process (RelCRP)

17 Relational Chinese Restaurant Process (RelCRP)

18 Influence of World-wide Factors

19 Influence of World-wide Factors

20 Influence of Personal Preferences

21 Influence of Personal Preferences

22 Influence of Friend Network

23 Influence of Friend Network

24 Influence of Geography (figure labels: India, China, UK)

25 Influence of Geography

26 Aggregating Influences
- RelCRP is exchangeable, like the CRP
  - Useful as a prior for infinite mixture models
- RelCRP captures the influence of one relation on posts
- Influences act simultaneously on any user
- Aggregated influence pattern is user-specific
  - Different users are affected differently by the same combination of world-wide and geographic factors

27 Multi Relational CRP

28 Multi Relational CRP

29 Multi RelCRP: Generative Process
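
The formulas on the Multi Relational CRP slides were lost in extraction, so the following is not the authors' definition. It is one plausible reading of the surrounding text, under stated assumptions: each relation (world-wide, geographic, friend network, personal) contributes a CRP-style predictive distribution over the posts related through it, and a user-specific weight vector mixes them. The function name, the weight parametrization, and the shared concentration parameter alpha are all hypothetical.

```python
def mrelcrp_topic_probs(relation_counts, user_weights, n_topics, alpha):
    """Mix per-relation CRP predictives with user-specific weights (assumed form).

    relation_counts[r][k]: number of posts related via relation r with topic k.
    user_weights[r]: this user's susceptibility to relation r (weights sum to 1).
    Returns probabilities for the n_topics existing topics plus one new topic.
    """
    probs = [0.0] * (n_topics + 1)
    for relation, weight in user_weights.items():
        counts = relation_counts.get(relation, {})
        total = sum(counts.values()) + alpha
        for k in range(n_topics):
            probs[k] += weight * counts.get(k, 0) / total
        probs[n_topics] += weight * alpha / total  # mass for a brand-new topic
    return probs

relation_counts = {
    "world": {0: 6, 1: 2},
    "geo": {1: 3},
    "friends": {0: 1},
    "personal": {},
}
user_weights = {"world": 0.4, "geo": 0.3, "friends": 0.2, "personal": 0.1}
probs = mrelcrp_topic_probs(relation_counts, user_weights, n_topics=2, alpha=1.0)
```

Changing only `user_weights` changes the resulting distribution, which mirrors the point that the same combination of influences affects different users differently.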

30 Talk Outline
- Background: Chinese Restaurant Processes
- CRP with multiple relationships (RelCRP, MRelCRP)
- Dynamic MRelCRP
- Multi-threaded Online Inference Algorithm
- Experimental Results

31 Evolving Patterns in Social Media
- Number of topics: topics die and new ones are born
- User personalities: susceptibility to influence by world-wide, geographic and friends' preferences
- Existing topic distributions: words go out of fashion, new ones enter the vocabulary
- Topic characters: popularity of a topic changes world-wide, in users' preferences, sub-networks and geographies
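
How the dynamic model couples epochs is not recoverable from the transcript. One common device in dynamic non-parametric models, used here purely as an illustration and not claimed to be the authors' choice, is to carry topic counts forward with exponential decay over a finite window, so that unused topics fade out ("die") while new ones can still be created. The function name, decay rate, and window length are assumptions.

```python
import math

def decayed_prior_counts(epoch_counts, current_epoch, decay=0.5, window=3):
    """Exponentially decayed topic counts from the last `window` epochs.

    epoch_counts[t][k] = number of posts in epoch t assigned to topic k.
    Topics with no posts inside the window get no prior mass at all.
    """
    prior = {}
    start = max(0, current_epoch - window)
    for t in range(start, current_epoch):
        w = math.exp(-decay * (current_epoch - t))  # older epochs count less
        for topic, count in epoch_counts.get(t, {}).items():
            prior[topic] = prior.get(topic, 0.0) + w * count
    return prior

epoch_counts = {0: {"a": 10}, 1: {"a": 4, "b": 6}, 2: {"b": 8}}
prior_at_3 = decayed_prior_counts(epoch_counts, current_epoch=3)
prior_at_5 = decayed_prior_counts(epoch_counts, current_epoch=5)
```

In this toy run topic "a" still has prior mass at epoch 3 but has dropped out of the window by epoch 5.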

32 Dynamic MultiRelCRP

33 User Personality Trends

34 Evolving Topic Distributions

35 Topic Character Trends

36 Talk Outline
- Background: Chinese Restaurant Processes
- CRP with multiple relationships (RelCRP, MRelCRP)
- Dynamic MRelCRP
- Multi-threaded Online Inference Algorithm
- Experimental Results

37 Inference and Estimation Tasks

38 Online Algorithm
- Traditional iterative framework does not scale for social media data
- Sequential Monte Carlo methods [Canini AIStats 09], which rejuvenate some old labels, are also infeasible
- Online sampling [Banerjee SDM 07] does not revisit old labels at all; it uses an initial batch phase
- We adapt online sampling to the non-parametric setting
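
The "never revisit old labels" idea can be sketched as a single pass over the posts. This is a simplified stand-in rather than the algorithm from the talk: it uses a plain CRP prior with a smoothed bag-of-words likelihood, skips the initial batch phase, and ignores relations and dynamics. All names and hyperparameters (`alpha`, `beta`, `vocab_size`) are illustrative assumptions.

```python
import math
import random
from collections import Counter

def online_assign(posts, vocab_size, alpha=1.0, beta=0.1, seed=0):
    """Assign each post to a topic exactly once, in arrival order."""
    rng = random.Random(seed)
    topic_sizes, topic_words, topic_totals = [], [], []
    labels = []
    for words in posts:
        log_w = []
        for k in range(len(topic_sizes)):
            # CRP prior weight (topic size) times smoothed word likelihood.
            ll = math.log(topic_sizes[k])
            denom = topic_totals[k] + beta * vocab_size
            for w in words:
                ll += math.log((topic_words[k][w] + beta) / denom)
            log_w.append(ll)
        # New topic: weight alpha, uniform word likelihood under the prior.
        log_w.append(math.log(alpha) - len(words) * math.log(vocab_size))
        m = max(log_w)
        weights = [math.exp(x - m) for x in log_w]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(topic_sizes):
            topic_sizes.append(0)
            topic_words.append(Counter())
            topic_totals.append(0)
        topic_sizes[k] += 1
        topic_words[k].update(words)
        topic_totals[k] += len(words)
        labels.append(k)  # this label is final; it is never resampled
    return labels, topic_sizes

posts = [["a", "b"], ["a", "a"], ["c"], ["c", "d"]]
labels, topic_sizes = online_assign(posts, vocab_size=4)
```

Because each label is drawn once, cost is linear in the number of posts, at the price of early mistakes never being corrected. That is presumably what the initial batch phase mentioned on the slide is meant to mitigate.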

39 Multi-threaded Implementation
- Sequential online implementation does not scale
- Iterative Gibbs sampling algorithms have been parallelized for hierarchical Bayesian models [Asuncion NIPS 08, Smola VLDB 10]
- Our algorithm is parallel, online and non-parametric
- Explicit consolidation by the master thread at the end of each iteration
  - Only new topics are consolidated
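
The worker/master pattern described on this slide can be sketched as follows. This is a coordination sketch under assumptions, not the authors' Java implementation: workers run a plain CRP over their shard against a read-only snapshot, and the master gives every worker-local new topic a fresh global id. How counts for existing topics are shared, and whether similar new topics from different workers are merged, is not stated in the transcript; the delta merge below is an assumption and no merging is attempted.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def worker(shard, snapshot_sizes, alpha, seed):
    """Label one shard against a private copy of the global topic sizes."""
    rng = random.Random(seed)
    sizes = list(snapshot_sizes)
    n_global = len(snapshot_sizes)
    labels = []
    for _ in shard:
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(0)  # worker-local new topic
        sizes[k] += 1
        labels.append(k)
    deltas = [sizes[k] - snapshot_sizes[k] for k in range(n_global)]
    return labels, deltas, sizes[n_global:]

def run_iteration(shards, global_sizes, alpha=1.0):
    """Run workers in parallel, then let the master consolidate (in place)."""
    snapshot = tuple(global_sizes)
    n_old = len(snapshot)
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(worker, shard, snapshot, alpha, seed)
                   for seed, shard in enumerate(shards)]
        results = [f.result() for f in futures]
    all_labels = []
    for labels, deltas, new_sizes in results:
        for k, d in enumerate(deltas):
            global_sizes[k] += d          # assumed: merge counts of old topics
        base = len(global_sizes)
        global_sizes.extend(new_sizes)    # register new topics with global ids
        all_labels.append([k if k < n_old else base + (k - n_old) for k in labels])
    return all_labels

global_sizes = []
shard_labels = run_iteration([list(range(40))] * 3, global_sizes)
```

In CPython, threads do not speed up CPU-bound sampling because of the global interpreter lock, so the sketch shows only the consolidation logic. The talk reports a Java-based framework, where threads do run in parallel.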

40 Talk Outline
- Background: Chinese Restaurant Processes
- CRP with multiple relationships (RelCRP, MRelCRP)
- Dynamic MRelCRP
- Multi-threaded Online Inference Algorithm
- Experimental Results

41 Datasets and Baselines
- Datasets
  - Twitter: 360 million tweets (Jun-Dec 2009)
  - Facebook: 300,000 posts (public profiles, 3 months)
- Baselines
  - Latent Dirichlet Allocation (LDA) [Hong SOMA 2010]
  - Labeled LDA (L-LDA): hashtags as topics [Ramage ICWSM 2010]
  - Timeline: dynamic non-parametric topic model [Ahmed UAI 2010]

42 1. Model Goodness
- Perplexity: ability to generalize to unseen data
- Both network and dynamics are important for modeling social media data
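
For reference, perplexity on held-out text is conventionally the exponentiated negative mean log-likelihood per token; lower is better. The slide does not spell out the exact variant used, so the helper below shows only the standard definition, with an illustrative function name.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of held-out tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that spreads probability uniformly over 8 words.
uniform_8 = perplexity([math.log(1.0 / 8.0)] * 20)
```

A uniform model over V words has perplexity V, which gives a useful sanity check when comparing models.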

43 2. Quality of Discovered Topics
- Label assigned to each post, indicating its category
- Distribution over words, indicating semantics
- A. Clustering posts using topic labels
- B. Prediction using topic labels
  - Predicting post authorship and user commenting activity
- C. Major event detection

44 2A. Post Clustering using Topics
- Use hashtags as gold standard (for Twitter)
  - 16K posts: #NIPS2009, #ICML2009, #bollywood, etc.
- DMRelCRP comes close to L-LDA without using hashtags
- DMRelCRP produces finer-grained clusters
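
The transcript does not say which clustering metric was reported. As one common choice when a gold labelling such as hashtags is available, cluster purity can be computed as below; the metric choice and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def purity(topic_labels, hashtags):
    """Fraction of posts whose hashtag matches the majority hashtag of their topic."""
    by_topic = defaultdict(Counter)
    for topic, tag in zip(topic_labels, hashtags):
        by_topic[topic][tag] += 1
    majority = sum(c.most_common(1)[0][1] for c in by_topic.values())
    return majority / len(topic_labels)

topics = [0, 0, 1, 1, 2]
tags = ["#NIPS2009", "#NIPS2009", "#ICML2009", "#bollywood", "#bollywood"]
score = purity(topics, tags)
```

Purity rewards finer-grained clusterings (one cluster per post scores 1.0), so it is usually read alongside a measure that penalizes over-splitting, such as normalized mutual information.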

45 2B. Prediction Using Topics
- Authorship: given a post and a user, predict whether the user is the author
- Commenting activity: given a post and a (non-author) user, predict whether the user comments on that post
- DMRelCRP topics lead to more accurate prediction

46 2C. Major Event Detection

47 2C. Major Event Detection

48 2C. Major Event Detection

49 3. Analysis of Influences

50 3A. Global Personality Trends

51 3A. Global Personality Trends (figure annotations: Michael Jackson's death, FIFA WC, Google Wave)

52 3A. Global Personality Trends

53 3B. Geo-specific Personality Trends
- Personality trends are very similar in the UK and the US
- Geographic influences are high at different epochs

54 3B. Geo-specific Personality Trends
- India: world-wide and geographic influences are weaker
- China: world-wide influence weak, geographic influence strong; stable pattern

55 3C. Topic Character Trends

56 3C. Topic Character Trends

57 3C. Topic Character Trends

58 Scaling with Data Size
- Java-based multi-threaded framework; 7 threads
- 8-core machine, 32 GB RAM
- Scales largely because of multi-threading

59 Summary
- First attempt at studying user influences in social media data
- New non-parametric model that captures multiple relationships and temporal evolution
- Multi-threaded online Gibbs sampling algorithm
- Extensive evaluation on a large real dataset
  - Topics lead to better clustering and prediction
  - Insights on user influence patterns

