Bayesian Inference for Mixture Language Models


1 Bayesian Inference for Mixture Language Models
Chase Geigle, ChengXiang Zhai
Department of Computer Science, University of Illinois at Urbana-Champaign

2 Deficiency of PLSA
Not a generative model:
- Can't compute the probability of a new document (why do we care about this?)
- A heuristic workaround is possible, though
Many parameters → high model complexity:
- Many local maxima
- Prone to overfitting
Overfitting is not necessarily a problem for text mining (we are only interested in fitting the "training" documents)

3 Latent Dirichlet Allocation (LDA)
Make PLSA a generative model by imposing a Dirichlet prior on the model parameters → LDA = Bayesian version of PLSA
- Parameters are regularized
- Can achieve the same goal as PLSA for text mining purposes
- Topic coverage and topic word distributions can be inferred using Bayesian inference

4 PLSA → LDA
[Figure: k topic word distributions generating words w, e.g. Topic 1 p(w|θ1): government 0.3, response …; Topic 2 p(w|θ2): city 0.2, new orleans …; Topic k p(w|θk): donate 0.1, relief 0.05, help …; with topic choices p(θ1) = πd,1, …, p(θk) = πd,k.]
Both the word distributions and the topic choices are free in PLSA; LDA imposes a prior on both.

5 Likelihood Functions for PLSA vs. LDA
[Equations: the core assumption shared by all topic models, the PLSA component, and the terms added by LDA; a reconstruction is sketched below.]
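The equations themselves did not survive the transcript; the following gives the standard forms, writing θj for the topic word distributions, πd,j for document d's coverage of topic j, and c(w,d) for the count of word w in document d (this notation is assumed to match the slides):

\[ p_d(w) \;=\; \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j) \qquad \text{(core assumption in all topic models)} \]

\[ \log p(d) \;=\; \sum_{w \in V} c(w,d)\, \log \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j) \qquad \text{(PLSA)} \]

\[ p(d \mid \alpha, \{\theta_j\}) \;=\; \int \prod_{w \in V} \Big[ \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j) \Big]^{c(w,d)} p(\vec{\pi}_d \mid \alpha)\, d\vec{\pi}_d \qquad \text{(added by LDA: Dirichlet prior on topic coverage)} \]

\[ p(C \mid \alpha, \beta) \;=\; \int \prod_{d \in C} p(d \mid \alpha, \{\theta_j\}) \prod_{j=1}^{k} p(\theta_j \mid \beta)\, d\{\theta_j\} \qquad \text{(added by LDA: Dirichlet prior on word distributions)} \]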

6 Parameter Estimation and Inferences in LDA
The hyperparameters α and β can be estimated using an ML estimator.
However, {θj} and {πd,j} must now be computed using posterior inference:
- Computationally intractable
- Must resort to approximate inference
- Many different inference methods are available
How many parameters are there in LDA vs. PLSA?

7 Posterior Inferences in LDA
Computationally intractable, must resort to approximate inference!
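The equation on this slide is missing from the transcript; in the notation above, the intractable quantity is the posterior over the latent variables, which has the standard form (assumed to match the slide):

\[ p(\{\theta_j\}, \{\vec{\pi}_d\}, \mathbf{z} \mid C, \alpha, \beta) \;=\; \frac{p(C, \mathbf{z} \mid \{\theta_j\}, \{\vec{\pi}_d\})\; p(\{\theta_j\} \mid \beta)\; p(\{\vec{\pi}_d\} \mid \alpha)}{p(C \mid \alpha, \beta)} \]

The normalizer p(C | α, β) requires summing over all possible topic assignments z and integrating over the continuous parameters, which is why exact inference is intractable.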

8 Illustration of Bayesian Estimation
Bayesian inference: f(θ) = ?
Posterior: p(θ|X) ∝ p(X|θ) p(θ)
Likelihood: p(X|θ), with data X = (x1, …, xN)
Prior: p(θ)
[Figure: prior, likelihood, and posterior densities plotted over θ, marking θ0 (prior mode), θ1 (posterior mode), θml (ML estimate), and the posterior mean.]
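As a concrete worked example (illustrative only, not from the slide), consider coin flips with a Beta prior, the one-dimensional analogue of the Dirichlet-multinomial setting used in LDA:

\[ \theta \sim \mathrm{Beta}(a, b), \qquad x_1, \dots, x_N \sim \mathrm{Bernoulli}(\theta), \qquad n_1 = \sum_i x_i \]

\[ p(\theta \mid X) \;\propto\; \theta^{n_1}(1-\theta)^{N-n_1}\,\theta^{a-1}(1-\theta)^{b-1} \;\;\Rightarrow\;\; \theta \mid X \sim \mathrm{Beta}(a + n_1,\, b + N - n_1) \]

\[ \hat{\theta}_{\mathrm{ml}} = \frac{n_1}{N}, \qquad \theta_0 = \frac{a-1}{a+b-2}\ \text{(prior mode)}, \qquad \theta_1 = \frac{n_1 + a - 1}{N + a + b - 2}\ \text{(posterior mode)}, \qquad \mathbb{E}[\theta \mid X] = \frac{n_1 + a}{N + a + b} \]

The posterior mode and posterior mean sit between the prior mode and the ML estimate, which is exactly what the figure illustrates.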

9 LDA as a graph model [Blei et al. 03a]
Dirichlet priors α and β
θ(d) ~ Dirichlet(α): distribution over topics for each document (same as πd on the previous slides)
φ(j) ~ Dirichlet(β): distribution over words for each topic (same as θj on the previous slides), j = 1, …, T
zi ~ Discrete(θ(d)): topic assignment for each word
wi ~ Discrete(φ(zi)): word generated from its assigned topic
[Plate notation: zi and wi are repeated Nd times within each of the D documents; φ(j) is repeated T times.]
Most approximate inference algorithms aim to infer z, from which the other interesting variables can be easily computed.
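To make the generative process concrete, here is a minimal Python/NumPy sketch that samples a toy corpus from this model; the corpus size, vocabulary size, number of topics, and hyperparameter values are arbitrary illustrative choices, not from the slides:

import numpy as np

rng = np.random.default_rng(0)
T, V, D, Nd = 3, 10, 5, 20        # topics, vocabulary size, documents, words per document (illustrative)
alpha, beta = 0.1, 0.01           # symmetric Dirichlet hyperparameters (illustrative)

# phi(j) ~ Dirichlet(beta): one word distribution per topic
phi = rng.dirichlet(np.full(V, beta), size=T)

corpus = []
for d in range(D):
    # theta(d) ~ Dirichlet(alpha): topic proportions for this document
    theta_d = rng.dirichlet(np.full(T, alpha))
    doc = []
    for _ in range(Nd):
        z = rng.choice(T, p=theta_d)   # z_i ~ Discrete(theta(d)): topic assignment for this word
        w = rng.choice(V, p=phi[z])    # w_i ~ Discrete(phi(z_i)): word drawn from the assigned topic
        doc.append(int(w))
    corpus.append(doc)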

10 Approximate Inferences for LDA
Many different ways; each has its pros & cons.
Deterministic approximation:
- variational EM [Blei et al. 03a]
- expectation propagation [Minka & Lafferty 02]
Markov chain Monte Carlo:
- full Gibbs sampler [Pritchard et al. 00]
- collapsed Gibbs sampler [Griffiths & Steyvers 04]: the most efficient and quite popular, but it only works with conjugate priors

11 The collapsed Gibbs sampler [Griffiths & Steyvers 04]
Using the conjugacy of the Dirichlet and multinomial distributions, integrate out the continuous parameters.
This defines a distribution directly on the discrete ensembles of topic assignments z; see the collapsed forms below.
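The slide's equations were lost in the transcript; the standard collapsed forms from Griffiths & Steyvers (2004), with W the vocabulary size and T the number of topics, are:

\[ P(\mathbf{w} \mid \mathbf{z}) \;=\; \left( \frac{\Gamma(W\beta)}{\Gamma(\beta)^W} \right)^{T} \prod_{j=1}^{T} \frac{\prod_{w} \Gamma\!\left(n_j^{(w)} + \beta\right)}{\Gamma\!\left(n_j^{(\cdot)} + W\beta\right)} \]

\[ P(\mathbf{z}) \;=\; \left( \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^T} \right)^{D} \prod_{d=1}^{D} \frac{\prod_{j} \Gamma\!\left(n_j^{(d)} + \alpha\right)}{\Gamma\!\left(n_{\cdot}^{(d)} + T\alpha\right)} \]

where n_j^{(w)} is the number of times word w is assigned to topic j, and n_j^{(d)} is the number of words in document d assigned to topic j.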

12 The collapsed Gibbs sampler [Griffiths & Steyvers 04]
Sample each zi conditioned on z−i.
This is nicer than your average Gibbs sampler:
- memory: counts can be cached in two sparse matrices
- optimization: no special functions, simple arithmetic
- the distributions on Θ and Φ are analytic given z and w, and can later be recovered for each sample
A minimal implementation of one sampling sweep is sketched below.
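This is not the authors' code; it is a minimal Python sketch of the count-based update, assuming the list-of-word-id corpus format from the generative sketch above and symmetric scalar hyperparameters alpha and beta (dense rather than sparse count matrices, for brevity):

import numpy as np

def init_counts(corpus, T, W, rng):
    """Random initial topic assignments plus the count matrices."""
    D = len(corpus)
    z = [[int(rng.integers(T)) for _ in doc] for doc in corpus]
    n_dj = np.zeros((D, T))   # words in document d assigned to topic j
    n_jw = np.zeros((T, W))   # instances of word w assigned to topic j
    n_j = np.zeros(T)         # all words assigned to topic j
    for d, doc in enumerate(corpus):
        for i, w in enumerate(doc):
            j = z[d][i]
            n_dj[d, j] += 1; n_jw[j, w] += 1; n_j[j] += 1
    return z, n_dj, n_jw, n_j

def gibbs_sweep(corpus, z, n_dj, n_jw, n_j, alpha, beta, rng):
    """One pass of collapsed Gibbs sampling over every word token."""
    T, W = n_jw.shape
    for d, doc in enumerate(corpus):
        for i, w in enumerate(doc):
            j = z[d][i]
            # Remove this token's current assignment from the counts (the "-i" counts).
            n_dj[d, j] -= 1; n_jw[j, w] -= 1; n_j[j] -= 1
            # Full conditional up to a constant: the document-length denominator
            # (n_d + T*alpha) does not depend on j, so it is dropped.
            p = (n_dj[d, :] + alpha) * (n_jw[:, w] + beta) / (n_j + W * beta)
            j = int(rng.choice(T, p=p / p.sum()))
            # Record the new assignment and restore the counts.
            z[d][i] = j
            n_dj[d, j] += 1; n_jw[j, w] += 1; n_j[j] += 1

After enough sweeps, the word distributions can be read off the counts, e.g. phi_hat[j, w] = (n_jw[j, w] + beta) / (n_j[j] + W * beta), and likewise for the document-topic proportions from n_dj.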

13 Gibbs sampling in LDA (iteration 1)
[Figure: a worked example begins; every word token in every document carries a topic assignment that the sampler will update.]

14 Gibbs sampling in LDA (iteration 1, continued)

15 Gibbs sampling in LDA
[The sampling formula, annotated: the first factor is the number of words in di assigned to topic j over the number of words in di assigned to any topic; the second factor is the count of instances where wi is assigned to topic j over the count of all words assigned to topic j. The formula itself is reconstructed below.]
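The formula did not survive the transcript; the standard full conditional from Griffiths & Steyvers (2004), which these annotations describe, is:

\[ p(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha} \cdot \frac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \]

where the counts exclude the current assignment of token i: n^{(d_i)}_{-i,j} is the number of words in document d_i assigned to topic j, n^{(d_i)}_{-i,·} is the number of words in d_i assigned to any topic, n^{(w_i)}_{-i,j} is the number of instances of word w_i assigned to topic j, and n^{(·)}_{-i,j} is the total number of words assigned to topic j.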

16 Gibbs sampling in LDA
What's the most likely topic for wi in di?
- How likely would di choose topic j?
- How likely would topic j generate word wi?

17-21 Gibbs sampling in LDA (further iterations)
[Figures: the same sampling step is applied to the remaining word tokens, and the sweep is repeated over subsequent iterations while the count matrices are updated.]
