Presentation is loading. Please wait.

Presentation is loading. Please wait.

Unsupervised and Weakly-Supervised Probabilistic Modeling of Text Ivan Titov April 23 2010 TexPoint fonts used in EMF. Read the TexPoint manual before.

Similar presentations


Presentation on theme: "Unsupervised and Weakly-Supervised Probabilistic Modeling of Text Ivan Titov April 23 2010 TexPoint fonts used in EMF. Read the TexPoint manual before."— Presentation transcript:

1 Unsupervised and Weakly-Supervised Probabilistic Modeling of Text Ivan Titov April 23 2010 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A A A AA A A A A A

2 Outline 2  Topic Models: An Example of Latent Dirichlet Allocation  Learning : Expectation Maximization Algorithms  Learning: Gibbs sampling

3 Problem set-up  Given: a collection of documents  Want:  Detect key ‘topics’ discussed in the collection  For each document detect which ‘topics’ are discussed there  Requirements:  No supervision (documents are not labeled)  Probabilistic methods

4 Motivation  Visualization of collections:  what are the topics discussed?  which documents discuss a topic?  Opinion mining:  what is sentiment towards a product aspects?  what are important aspects of a product?  Dimensionality reduction:  for information retrieval  for document classification  Summarization:  ensuring topic coverage in a summary ...

5 Latent Semantic Analysis [Deerwester et al., 1990]  Decomposition of the coccurence matrix  Approximate the co-occurence matrix Hope: terms having common meaning are mapped to the same direction - Hope: documents discussing similar topics have similar representation - Non-zero inner products between documents with non-overlapping terms - Hope: documents discussing similar topics have similar representation - Non-zero inner products between documents with non-overlapping terms

6 Latent Semantic Analysis [Deerwester et al., 1990]  Optimal rank k approximation (Frobenius norm):

7 Latent Semantic Analysis [Deerwester et al., 1990]  Not motivated probabilistically (no clean underlying probability model)  No obvious interpretation of directions

8 Probabilistic LSA [Hofmann, 99]  P arameters:  Distributions of topics in document P(z | d), for every d  Distribution of words for every topic, P(w | z), z 2 {1, …K}  Generative story:  For each document d  For each word occurrence i in document d  Select topic z i for the word from P(z i | d)  Generate word w i from P(w i | z i )  Note:  Words in the same documents can be generated from different topics

9 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party … Generative story: Given parameters generate text Generative story: Given parameters generate text

10 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

11 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

12 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

13 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

14 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

15 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

16 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

17 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

18 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

19 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

20 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

21 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

22 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

23 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

24 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

25 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

26 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

27 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

28 Probabilistic LSA : Example Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party …

29  Note:  Words in the same documents can be generated from different topics  Does not take into account order, the following to texts are guaranteed to have the same probability under the model delays due to the volcanic ash cloud will affect Formula1 teams’ preparations to ash due Formula1 affect volcanic the will teams’ cloud preparations delays PLSA: Example

30 We considered: Document 1 Document 2 delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party … ? ?

31 In fact we are solving reverse problem: Document 1 … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … Document 2 … Obama will not attend the ceremony due to delays caused by eruption … delays volcanic volcano ash cloud … 0.003 0.002 0.001 … Eruption P(w | z = 1) football teams ball preparations Formula1 … 0.004 0.002 0.001 … Sport P(w | z = 2) 0.005 0.001 0.0001 … Politics P(w | z = 3) Obama Merkel ceremony attend party … ? ? ? ? ? We are going to talk about learning in a moment

32 Directed Graphical Models … … … Doc 1Doc M N1N1 NMNM - Roughly, arrows denote conditional dependencies - A generative story corresponds to a topological order on the graph - Roughly, arrows denote conditional dependencies - A generative story corresponds to a topological order on the graph

33 Plate Notation … … … Doc 1Doc M N1N1 NMNM N M

34 Plate Notation N M Generated from P(z | d) Generate from P(w | z)

35 Distributions are also variables… N M K

36 If they are variables, maybe we can generate them from some variables? N M K Prior distribution for topic distributions in documents Prior distribution for word distributions This is essentially the Latent Dirichlet Allocation (LDA) model (Blei et al, 2001): Hierarchical Bayesian generalization of PLSA

37 Latent Dirichlet Allocation  Parameters: two scalars: and  Generative story:  For draw word distrib-s  For each document d  Draw topic distribs  For each word occurrence i in document d  Select topic z i for the word from  Generate word w i from Dirichlet distribution: think of it as a “distribution of over distributions” We will talk about semantics of these parameters later

38 Dirichlet distribution  Defines a distribution over p 1,.., p K-1 such as:   - can be regarded as a distribution  “Distribution over distributions” (K-1 dimensional simplex)  Form of the Dirichlet distribution (with symmetric prior) Normalization (Beta function)

39 Meaning of hyperparameter  Consider case K= 2 (i.e., Beta distribution), pdf: ® = 0.5 : prefers “biased” distributions ® = 2 : prefers smooth distributions Often, we want sparse distributions: though the collection may represent 1,000 topics, each document discusses 2-3 topics. In this case we use small ® Can be regarded as “pseudo counts”: ® =2 means that you observed 1 example of each class

40 Summary so far  We defined a generative model of document collections:  Now, first we will consider how to do MAP estimation, i.e.  Then we will consider Bayesian methods Expectation maximization algorithm

41 Expectation Maximization 41  EM is a class of algorithms that is used to estimate parameters in the presence of missing attributes.  Non-convex optimization, therefore:  converges to a local maximum of the maximum likelihood function (or posterior distribution if we incorporate the prior distribution).  can be very sensitive to the starting point In LDA we do not observe from each topic z a word is generated

42 Three coin example 42  We observe a series of coin tosses generated in the following way:  A person has three coins.  Coin 0: probability of Head is ®  Coin 1: probability of Head p  Coin 2: probability of Head q  Consider the following coin-tossing scenario

43 Three coin example 43  Scenario:  Toss coin 0 (do not show it to anyone!).  If Head – toss coin 1 M times;  Else -- toss coin 2 M times.  Only the series of tosses are observed  HHHT, HTHT, HHHT, HTTH  What are the parameters of the coins ? ( ®, p, q)

44 Three coin example 44  Scenario:  Toss coin 0 (do not show it to anyone!).  If Head – toss coin 1 M times;  Else -- toss coin 2 M times.  Only the series of tosses are observed  HHHT, HTHT, HHHT, HTTH  What are the parameters of the coins ? ( ®, p, q)  There is no closed form solution to the problem

45 Key Intuition 45  If we knew which of the data points (HHHT), (HTHT), (HTTH) came from Coin1 and which from Coin2, that would be trivial.

46 Key Intuition 46  If we knew which of the data points (HHHT), (HTHT), (HTTH) came from Coin1 and which from Coin2, there was no problem.  Instead, use an iterative approach for estimating the parameters:  Guess the probability that a data point came from Coin 1/2  Generate fictional labels, weighted according to this probability.  Re-estimate the initial parameter setting: set them to maximize the likelihood of these augmented data.  This process can be iterated and can be shown to converge to a local maximum of the likelihood function

47 EM-algorithm: coins (E-step) 47  We will assume (for a minute) that we know the parameters and use it to estimate which Coin it is  Then, we will use the estimation for the tossed Coin, to estimate the most likely parameters and so on...  What is the probability that the ith data point came from Coin1 ?

48 EM-algorithm: coins (M-step) 48  At this point we would like to compute the likelihood of the data, and find the parameters which maximize it  We will maximize the likelihood of the data (n data points)  But one of the variables: the coin name is hidden:  Instead on each step we maximize expectation of the likelihood over the coin name:

49 EM-algorithm: coins (M-step) 49 Continue:

50 EM-algorithm: coins 50  Now fine the most likely parameters by setting derivatives to zero: Note that are fixed (computed from the previous estimate of parameters)

51 EM-algorithm: coins 51  Now fine the most likely parameters by setting derivatives to zero: Note that are fixed (computed from the previous estimate of parameters)

52 Summary: EM 52

53 Summary: EM 53

54 EM for PLSA / LDA 54 N M K

55 EM for PLSA / LDA 55 N M K Can be negative… Ignore it for now. Can be regarded as smoothing

56 Summary so far  We defined a generative model of document collections:  We considered how to do MAP estimation, i.e.  Now we could visualize collections  Estimate topic distributions in each document

57 Example (Science collection)  Top 10 words for 10 topics out of 128 (ordered by

58 Example: topics of a document … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations …

59 Outline 59  Topic Models: An Example of Latent Dirichlet Allocation  Learning : Expectation Maximization Algorithms  Learning: Gibbs sampling

60 Gibbs Sampling for PLSA / LDA 60 N M K Choosing word for topic j for position i Choosing topic j for document d i

61 Gibbs Sampling for PLSA / LDA  We do not directly obtain and  Instead we observe which words are assigned to which topics:  We can estimate the parameters from the sample (and take into account pseudo counts priors) … Delays due to the volcanic ash cloud will affect Formula1 teams’ preparations … For details on Gibbs sampling refer to Finding Scientific Topics, Griffiths and Steyvers, PNAS 2004 For details on Gibbs sampling refer to Finding Scientific Topics, Griffiths and Steyvers, PNAS 2004

62 Summary 62  We considered the most standard “topic model”: PLSA / LDA  Reviewed two basic inference techniques:  EM  Collapsed Gibbs sampling  These methods are applicable to the majority of models we will consider in class

63 Formalities 63  Doodle poll: we will stay with Friday, 2pm  Paper selection:  Deadline extended  Not sure what we will do with the next class April 30 (watch for announcements)  If we do not get all the paper selected, review selections may be affected  These methods are applicable to the majority of models we will consider in class

64 References 64  PLSA: Hofmann, Probabilistic Latent Semantic Indexing, SIGIR 99  LDA (they use variational inference though): Blei, Ng, Jordan, and Lafferty, Latent Dirichlet allocation, Journal of Machine Learning Research, 2003  Collapsed Gibbs sampling for LDA: Griffiths and Steyvers, Finding Scientific Topics, PNAS 2004  EM (original paper): Dempster, Laird and Rubin. Maximum Likelihood from Incomplete Data via the EM algorithm. Journal of the Royal Statistical Society B, 1977  EM, a note by Michael Collins: people.csail.mit.edu/mcollins/papers/wpeII.4.ps  Video Tutorial on EM by Chris Bishop: http://videolectures.net/mlss04_bishop_gmvm/ (see part 4)


Download ppt "Unsupervised and Weakly-Supervised Probabilistic Modeling of Text Ivan Titov April 23 2010 TexPoint fonts used in EMF. Read the TexPoint manual before."

Similar presentations


Ads by Google