
1 Predictively Modeling Social Text. William W. Cohen, Machine Learning Dept. and Language Technologies Institute, School of Computer Science, Carnegie Mellon University. Joint work with: Amr Ahmed, Andrew Arnold, Ramnath Balasubramanyan, Frank Lin, Matt Hurst (MSFT), Ramesh Nallapati, Noah Smith, Eric Xing, Tae Yano

2 Newswire Text vs. Social Media Text. Newswire: formal. Primary purpose: –Inform the “typical reader” about recent events. Broad audience: –Explicitly establish shared context with the reader –Ambiguity is often avoided. Social media: informal. Many purposes: –Entertain, connect, persuade… Narrow audience: –Friends and colleagues –Shared context already established –Many statements are ambiguous out of social context

3 Newswire Text vs. Social Media Text. Newswire goals of analysis: –Extract information about events from text –“Understanding” text requires understanding the “typical reader”: conventions for communicating with him/her, prior knowledge, background, … Social media goals of analysis: –Very diverse –Evaluation is difficult, and requires revisiting often as goals evolve –Often “understanding” social text requires understanding a community

4 Outline Tools for analysis of text –Probabilistic models for text, communities, and time Mixture models and LDA models for text LDA extensions to model hyperlink structure LDA extensions to model time –Alternative framework based on graph analysis to model time & community Preliminary results & tradeoffs Discussion of results & challenges

5 Introduction to Topic Models. Mixture model: unsupervised naïve Bayes model. [Plate diagram: class C generates words W; plates over N word positions and M documents.] Joint probability of words and classes: P(C = c, w_1, …, w_N) = P(c) ∏_n P(w_n | c). But classes are not visible: the class is a latent variable Z that must be summed out.
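To make the mixture-model math concrete, here is a minimal sketch (not from the slides; the arrays pi and beta are hypothetical parameters) of the joint probability and of summing out the latent class:

```python
import numpy as np

def joint_log_prob(word_ids, c, pi, beta):
    """log P(c, w_1..w_N) = log pi[c] + sum_n log beta[c, w_n]."""
    return np.log(pi[c]) + np.log(beta[c, word_ids]).sum()

def marginal_log_prob(word_ids, pi, beta):
    """Classes are latent, so sum them out: log P(w) = log sum_c P(c, w)."""
    per_class = [joint_log_prob(word_ids, c, pi, beta) for c in range(len(pi))]
    return np.logaddexp.reduce(per_class)
```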

6 Introduction to Topic Models

7 Probabilistic Latent Semantic Analysis Model. [Plate diagram: d → z → w; plates over N positions and M documents; π selects documents, θ_d is the per-document topic distribution, β gives per-topic word distributions.] Select document d ~ Mult(π); for each position n = 1, …, N_d: generate z_n ~ Mult(· | θ_d), generate w_n ~ Mult(· | β_{z_n}). Mixture model: each document is generated by a single (unknown) multinomial distribution over words, and the corpus is “mixed” by π. PLSA model: each word is generated by a single unknown multinomial distribution over words, and each document is mixed by θ_d.
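A minimal sketch of PLSA’s generative story as just described (parameter names are assumptions for illustration; theta and phi are NumPy arrays of per-document topic mixtures and per-topic word distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_plsa_doc(d, theta, phi, n_words):
    """theta[d]: topic mixture of document d; phi[k]: word distribution of topic k."""
    words = []
    for _ in range(n_words):
        z = rng.choice(theta.shape[1], p=theta[d])  # z_n ~ Mult(. | theta_d)
        w = rng.choice(phi.shape[1], p=phi[z])      # w_n ~ Mult(. | phi_z)
        words.append(w)
    return words
```

Note the limitation the later slides address: theta is a table indexed by training documents, so PLSA has no generative story for unseen documents.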

8 Introduction to Topic Models: PLSA topics (TDT-1 corpus)

9 Introduction to Topic Models: Latent Dirichlet Allocation [Blei, Ng, Jordan, JMLR 2003]

10 Introduction to Topic Models: Latent Dirichlet Allocation. [Plate diagram: α → θ → z → w, with β as the per-topic word distributions; plates over N positions and M documents.] For each document d = 1, …, M: generate θ_d ~ Dir(· | α); for each position n = 1, …, N_d: generate z_n ~ Mult(· | θ_d), generate w_n ~ Mult(· | β_{z_n}).
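For contrast with PLSA, a sketch of LDA’s generative process (hypothetical names; phi plays the role of β on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_lda_corpus(M, doc_lengths, alpha, phi):
    """Unlike PLSA, each document's topic mixture theta_d is itself drawn from Dir(alpha)."""
    docs = []
    for d in range(M):
        theta_d = rng.dirichlet(alpha)                        # theta_d ~ Dir(alpha)
        zs = rng.choice(len(alpha), size=doc_lengths[d], p=theta_d)
        docs.append([rng.choice(phi.shape[1], p=phi[z]) for z in zs])
    return docs
```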

11 Introduction to Topic Models. Latent Dirichlet Allocation –Overcomes some technical issues with PLSA: PLSA only estimates mixing parameters for training docs –Parameter learning is more complicated: Gibbs sampling (easy to program, often slow) or variational EM
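As a sketch of why Gibbs sampling is “easy to program”, here is the standard collapsed Gibbs update for LDA (a generic textbook version, not code from the slides; the count arrays ndk, nkw, nk are assumed to be kept in sync with the topic assignments z):

```python
import numpy as np

def gibbs_sweep(docs, z, ndk, nkw, nk, alpha, beta, rng):
    """One sweep of collapsed Gibbs sampling for LDA. Resamples each token's topic from
    P(z_i = k | rest) proportional to (ndk[d,k] + alpha) * (nkw[k,w] + beta) / (nk[k] + V*beta)."""
    K, V = nkw.shape
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                                # remove token's current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())           # resample from the full conditional
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
```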

12 Introduction to Topic Models. [Figure: perplexity comparison of various models: unigram, mixture model, PLSA, LDA; lower is better.]
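Perplexity here is the usual held-out measure, computable as below (a generic definition, not specific to any one model):

```python
import numpy as np

def perplexity(heldout_log_lik, n_tokens):
    """perplexity = exp(-(log-likelihood of held-out text) / (# held-out tokens));
    lower means the model assigns higher probability to unseen text."""
    return np.exp(-heldout_log_lik / n_tokens)
```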

13 Introduction to Topic Models. [Figure: prediction accuracy for classification using topic models as features; higher is better.]

14 Outline Tools for analysis of text –Probabilistic models for text, communities, and time Mixture models and LDA models for text LDA extensions to model hyperlink structure LDA extensions to model time –Alternative framework based on graph analysis to model time & community Preliminary results & tradeoffs Discussion of results & challenges

15 Hyperlink modeling using PLSA

16 Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]. [Plate diagram: document d generates topics z, words w, and citations c; plates over N words and L citations per document, M documents.] Select document d ~ Mult(π); for each position n = 1, …, N_d: generate z_n ~ Mult(· | θ_d), generate w_n ~ Mult(· | β_{z_n}); for each citation j = 1, …, L_d: generate z_j ~ Mult(· | θ_d), generate c_j ~ Mult(· | γ_{z_j}).

17 Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]. PLSA likelihood vs. the new joint likelihood over both words and citations; learning using EM. [Equations shown on slide.]

18 Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]. Heuristic: a weight 0 ≤ α ≤ 1 determines the relative importance of content and hyperlinks; the combined log-likelihood weights the content term by α and the citation term by (1 - α).
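The heuristic amounts to a convex combination of the two log-likelihood terms; a one-line sketch (function name assumed):

```python
def blended_log_lik(content_ll, link_ll, alpha):
    """Cohn & Hofmann-style heuristic: alpha in [0, 1] trades off words vs. citations."""
    assert 0.0 <= alpha <= 1.0
    return alpha * content_ll + (1.0 - alpha) * link_ll
```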

19 Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]. Experiments: text classification. Datasets: –WebKB: 6000 CS department web pages with hyperlinks; 6 classes: faculty, course, student, staff, etc. –Cora: 2000 machine learning abstracts with citations; 7 classes: sub-areas of machine learning. Methodology: –Learn the model on the complete data and obtain θ_d for each document –Classify each test document with the label of its nearest neighbor in the training set –Distance measured as cosine similarity in the θ space –Measure performance as a function of α
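A sketch of that nearest-neighbor step in the θ space (hypothetical helper; topic-mixture vectors are assumed stacked row-wise in NumPy arrays):

```python
import numpy as np

def classify_by_theta(theta_test, theta_train, train_labels):
    """Label each test document with the label of its nearest training document,
    where similarity is cosine similarity between topic-mixture vectors."""
    a = theta_test / np.linalg.norm(theta_test, axis=1, keepdims=True)
    b = theta_train / np.linalg.norm(theta_train, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)        # index of most-similar training doc
    return np.asarray(train_labels)[nearest]
```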

20 Hyperlink modeling using PLSA [Cohn and Hofmann, NIPS 2001]. Classification performance. [Figure: accuracy as a function of α, ranging from hyperlink-only to content-only, on both datasets.]

21 Hyperlink modeling using LDA

22 Hyperlink modeling using LinkLDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]. [Plate diagram: α → θ → z → w and θ → z → c; plates over N words and L citations per document, M documents.] For each document d = 1, …, M: generate θ_d ~ Dir(· | α); for each position n = 1, …, N_d: generate z_n ~ Mult(· | θ_d), generate w_n ~ Mult(· | β_{z_n}); for each citation j = 1, …, L_d: generate z_j ~ Mult(· | θ_d), generate c_j ~ Mult(· | γ_{z_j}). Learning using variational EM.
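The citation side of LinkLDA mirrors the word side; a sketch (omega is an assumed name for the per-topic distribution over citable documents, written γ on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_citations(theta_d, omega, n_links):
    """Each citation gets its own topic draw from the document's mixture theta_d,
    then a cited document is drawn from that topic's distribution over documents."""
    cites = []
    for _ in range(n_links):
        z = rng.choice(len(theta_d), p=theta_d)               # z_j ~ Mult(. | theta_d)
        cites.append(rng.choice(omega.shape[1], p=omega[z]))  # c_j ~ Mult(. | omega_z)
    return cites
```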

23 Hyperlink modeling using LDA [Erosheva, Fienberg, Lafferty, PNAS, 2004]

24 Newswire Text vs. Social Media Text (revisited). Newswire goals of analysis: –Extract information about events from text –“Understanding” text requires understanding the “typical reader”: conventions for communicating with him/her, prior knowledge, background, … Social media goals of analysis: –Very diverse –Evaluation is difficult, and requires revisiting often as goals evolve –Often “understanding” social text requires understanding a community. Science as a testbed for social text: an open community which we understand.

25 Author-Topic Model for Scientific Literature

26 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]. [Plate diagram: authors a, author choice x, topics z, words w; plates over A authors, K topics, N positions, M documents.] For each author a = 1, …, A: generate θ_a ~ Dir(· | α). For each topic k = 1, …, K: generate β_k ~ Dir(· | η). For each document d = 1, …, M, for each position n = 1, …, N_d: generate author x ~ Unif(a_d), generate z_n ~ Mult(· | θ_x), generate w_n ~ Mult(· | β_{z_n}).
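A sketch of the Author-Topic generative step for one document (names assumed; authors_d lists the indices of the document’s authors):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_author_topic_doc(authors_d, theta_a, phi, n_words):
    """Each token first picks one of the document's authors uniformly at random,
    then a topic from that author's mixture, then a word from that topic."""
    words = []
    for _ in range(n_words):
        x = rng.choice(authors_d)                       # x ~ Unif(authors of d)
        z = rng.choice(theta_a.shape[1], p=theta_a[x])  # z_n ~ Mult(. | theta_x)
        words.append(rng.choice(phi.shape[1], p=phi[z]))
    return words
```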

27 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]. Learning: Gibbs sampling. [Plate diagram as on the previous slide.]

28 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]: Perplexity results

29 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]: Topic-Author visualization

30 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]. Application 1: Author similarity

31 Author-Topic Model for Scientific Literature [Rosen-Zvi, Griffiths, Steyvers, Smyth, UAI 2004]. Application 2: Author entropy

32 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]

33 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]: Gibbs sampling

34 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]. Datasets –Enron email data: 23,488 messages between 147 users –McCallum’s personal email: 23,488(?) messages with 128 authors

35 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]. Topic visualization: Enron set

36 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]. Topic visualization: McCallum’s data

37 Author-Topic-Recipient model for email data [McCallum, Corrada-Emmanuel, Wang, IJCAI 2005]

38 Modeling Citation Influences

39 Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]. Copycat model of citation influence: an LDA model for cited papers, and an extended LDA model for citing papers. For each word, depending on a coin flip c, you might choose to copy a word from a cited paper instead of generating the word.
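A sketch of that per-word coin flip (a simplified reading of the copycat idea with hypothetical names; the full model also chooses which cited paper to copy from):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_citing_word(theta_d, phi, cited_words, p_copy):
    """With probability p_copy, copy a word from the cited papers' text;
    otherwise generate from the citing paper's own topic mixture."""
    if rng.random() < p_copy:                           # coin flip c
        return cited_words[rng.integers(len(cited_words))]
    z = rng.choice(len(theta_d), p=theta_d)
    return rng.choice(phi.shape[1], p=phi[z])
```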

40 Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007]: Citation influence model

41 Modeling Citation Influences [Dietz, Bickel, Scheffer, ICML 2007] Citation influence graph for LDA paper

42 Models of hypertext for blogs [ICWSM 2008]. With Ramesh Nallapati, Amr Ahmed, Eric Xing.

43 Link-PLSA-LDA. LinkLDA model for citing documents; variant of the PLSA model for cited documents. Topics are shared between citing and cited documents, and links depend on the topics in the two documents.

44 Experiments. 8.4M blog postings in the Nielsen/Buzzmetrics corpus –Collected over three weeks in summer 2005. Selected all postings with ≥2 inlinks or ≥2 outlinks –2248 citing documents (2+ outlinks), 1777 cited documents (2+ inlinks) –Only 68 documents in both sets; these are duplicated. Fit the model using variational EM.

45 Topics in blogs. The model can answer questions like: which blogs are most likely to be cited when discussing topic z?

46 Topics in blogs. The model can be evaluated by predicting which links an author will include in an article. [Figure: link-prediction performance for Link-PLSA-LDA vs. Link-LDA; lower is better.]

47 Another model: Pairwise Link-LDA. [Plate diagram: two LDA plates, one for each document in a pair, plus a link indicator c for every pair of documents.] LDA for both cited and citing documents; generate an indicator for every pair of docs (vs. generating pairs of docs). The link depends on the mixing components (the θ’s), as in a stochastic block model.

48 Pairwise Link-LDA supports new inferences… …but doesn’t perform better on link prediction

49 Outline Tools for analysis of text –Probabilistic models for text, communities, and time Mixture models and LDA models for text LDA extensions to model hyperlink structure –Observation: these models can be used for many purposes… LDA extensions to model time –Alternative framework based on graph analysis to model time & community Discussion of results & challenges

50

51 Authors are using a number of clever tricks for inference…

52

53

54

55 Predicting Response to Political Blog Posts with Topic Models [NAACL ’09]. With Tae Yano and Noah Smith.

56 Political blogs and comments. Posts are often coupled with comment sections; comment style is casual, creative, and less carefully edited.

57 Political blogs and comments. Most of the text associated with large “A-list” community blogs is comments –5-20x as many words in comments as in post text for the 5 sites considered in Yano et al. A large part of socially-created commentary in the blogosphere is comments –Not blog → blog hyperlinks. Comments do not just echo the post.

58 Modeling political blogs. Our political blog model: CommentLDA. Notation: D = # of documents; N = # of words in the post; M = # of words in the comments; z, z′ = topics; w = word (in post); w′ = word (in comments); u = user (commenter).

59 Modeling political blogs. Our proposed political blog model: CommentLDA. The left-hand side of the model is vanilla LDA. (D = # of documents; N = # of words in the post; M = # of words in the comments.)

60 Modeling political blogs. Our proposed political blog model: CommentLDA. The right-hand side captures the generation of the reaction separately from the post body: two separate sets of word distributions, but the two chambers share the same topic mixture. (D = # of documents; N = # of words in the post; M = # of words in the comments.)

61 Modeling political blogs. Our proposed political blog model: CommentLDA. The commenters’ user IDs are treated as part of the comment text and are generated along with the words in the comment section. (D = # of documents; N = # of words in the post; M = # of words in the comments.)

62 Modeling political blogs. Another model we tried: this one is agnostic to the words in the comment section! We took the words out of the comment section and generate only the commenter user IDs; the model is structurally equivalent to LinkLDA from (Erosheva et al., 2004). (D = # of documents; N = # of words in the post; M = # of words in the comments.)

63 Topic discovery: Matthew Yglesias (MY) site

64 Topic discovery: Matthew Yglesias (MY) site

65 Topic discovery: Matthew Yglesias (MY) site

66 Comment prediction. LinkLDA and CommentLDA consistently outperform baseline models; neither consistently outperforms the other. [Figure: user prediction, precision at top 10; values of 20.54%, 16.92%, and 32.06% shown for CommentLDA (R), LinkLDA (R), and LinkLDA (C); bars from left to right are LinkLDA (-v, -r, -c), CommentLDA (-v, -r, -c), and baselines (Freq, NB), on the (CB), (MY), and (RS) sites.]

67 From Episodes to Sagas: Temporally Clustering News via Social-Media Commentary. With Noah Smith, Matthew Hurst, Frank Lin, Ramnath Balasubramanyan.

68 Motivation. The news-related blogosphere is driven by recency. Some recent news is better understood in the context of a sequence of related stories, and some readers have this context while others don’t. To reconstruct the context, reconstruct the sequence of related stories (a “saga”) –Similar to retrospective event detection. First efforts: –Find related stories –Cluster by time. Evaluation: agreement with human annotators.

69 Clustering results on Democratic-primary-related documents. SpeCluster + time: mixture of multinomials, plus a model for “general” text, plus a timestamp drawn from a Gaussian. k-walks (more later).

70 Clustering results on Democratic-primary-related documents. We also had three human annotators build gold-standard timelines: hierarchical, annotated with names of events, times, … A machine-produced timeline can then be evaluated by its tree-distance to a gold-standard one.

71 Clustering results on Democratic-primary-related documents. Issue: divergence of opinion with the human annotators. Is modeling community interests the problem? How much of what we want is actually in the data? Should this task be supervised or unsupervised?

72 More sophisticated time models. Hierarchical LDA Over Time (HOTS) model: –LDA to generate text –Also generate a timestamp for each document from topic-specific Gaussians –“Non-parametric” model: the number of clusters is also generated, not specified by the user –Allows use of user-provided “prototypes” –Evaluated on liberal/conservative blogs and ML papers from NIPS conferences. With Ramnath Balasubramanyan.
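One plausible reading of the timestamp component, as a sketch (hypothetical names; the slide’s model is also non-parametric in the number of clusters, which this sketch ignores):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_doc_with_time(alpha, phi, mu, sigma, n_words):
    """LDA generates the text; the document's timestamp is then drawn from the
    Gaussian of its dominant topic, N(mu[k], sigma[k]^2)."""
    theta = rng.dirichlet(alpha)
    zs = rng.choice(len(alpha), size=n_words, p=theta)
    words = [rng.choice(phi.shape[1], p=phi[z]) for z in zs]
    k = np.bincount(zs, minlength=len(alpha)).argmax()  # dominant topic of the document
    t = rng.normal(mu[k], sigma[k])
    return words, t
```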

73 Results with HOTS model - unsupervised

74 Results with HOTS model – human guidance Adding human seeds for some key events improves performance on all events. Allows a user to partially specify a timeline of events and have the system complete it.

75 Social Media Text: goals of analysis are very diverse; evaluation is difficult, and requires revisiting often as goals evolve; often “understanding” social text requires understanding a community. Probabilistic models –Can model many aspects of social text: community (links, comments), time. Evaluation: –Introspective, qualitative on communities we understand (e.g., scientific communities) –Quantitative on predictive tasks: link prediction, user prediction, … –Against “gold-standard” visualizations (sagas)

