Topic Models in Text Processing


1 Topic Models in Text Processing
IR Group Meeting Presented by Qiaozhu Mei

2 Topic Models in Text Processing
Introduction
Basic Topic Models
Probabilistic Latent Semantic Indexing
Latent Dirichlet Allocation
Correlated Topic Models
Variations and Applications

3 Overview
Motivation: model the topics/subtopics in text collections.
Basic assumptions: there are k topics in the whole collection; each topic is represented by a multinomial distribution over the vocabulary (a language model); each document can cover multiple topics.
Applications: summarizing topics; predicting topic coverage for documents; modeling topic correlations.
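To make these assumptions concrete (a sketch in standard topic-model notation, not taken verbatim from the slides): each topic z_i is a multinomial over the vocabulary V, and each document d has a set of topic coverage weights, so that

\sum_{w \in V} p(w \mid z_i) = 1, \qquad \sum_{i=1}^{k} p(z_i \mid d) = 1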

4 Basic Topic Models
Unigram model
Mixture of unigrams
Probabilistic LSI
LDA
Correlated Topic Models

5 Unigram Model
There is only one topic in the collection.
Estimation: maximum likelihood estimation.
Application: language modeling for IR (LMIR).
Notation: w is a document in the collection, w_n is a word in w, N is the number of words in w, and M is the number of documents.
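In this notation, the unigram model assigns each document the likelihood (the standard form, reconstructed here since the original slide's equation did not survive the transcript):

p(\mathbf{w}) = \prod_{n=1}^{N} p(w_n)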

6 Mixture of Unigrams
There are k topics in the collection, but each document covers only one topic.
Estimation: maximum likelihood via the EM algorithm.
Application: simple document clustering.
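Under this model a document is generated by first picking a single topic z and then drawing every word from it, giving (standard form, reconstructed):

p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)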

7 Probabilistic Latent Semantic Indexing
(Thomas Hofmann '99): each document is generated from more than one topic, with a set of document-specific mixture weights {p(z|d)} over the k topics. These mixture weights are treated as fixed parameters to be estimated. Also known as the aspect model. No prior knowledge about the topics is required; context and term co-occurrences are exploited.
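In the aspect model, the probability of observing word w in document d factors through the latent topic z (the standard PLSI decomposition, reconstructed here):

p(d, w) = p(d) \sum_{z} p(w \mid z)\, p(z \mid d)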

8 PLSI (cont.)
Notation: d denotes a document. Assume a uniform p(d).
Parameter estimation: the mixture weights π = {p(z|d)} and the topic word distributions {p(w|z)} are estimated by maximizing the log-likelihood with the EM algorithm.
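A minimal NumPy sketch of this EM procedure, not the presenter's implementation: it assumes the input is a dense document-term count matrix of shape M x V, and the function and variable names (plsi_em, counts, p_z_d, p_w_z) are illustrative only.

import numpy as np

def plsi_em(counts, k, n_iter=50, seed=0):
    # counts: (M, V) matrix of word counts n(d, w); k: number of topics.
    # Returns p_z_d of shape (M, k) = {p(z|d)} and p_w_z of shape (k, V) = {p(w|z)}.
    rng = np.random.default_rng(seed)
    M, V = counts.shape
    p_z_d = rng.random((M, k))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((k, V))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities p(z|d,w) proportional to p(z|d) p(w|z), shape (M, V, k)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: reweight responsibilities by the observed counts n(d, w)
        weighted = counts[:, :, None] * resp
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

The returned p_z_d recovers the document-specific mixture weights π and p_w_z the topic language models.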

9 Problem of PLSI
Mixture weights are document-specific, so there is no natural way to assign a probability to a previously unseen document.
The number of parameters to be estimated (kV word probabilities plus kM mixture weights) grows with the size of the training set, so the model overfits the data and suffers from multiple local maxima.
Not a fully generative model of documents.

10 Latent Dirichlet Allocation
(Blei et al. '03) Treats the topic mixture weights as a k-dimensional hidden random variable (the parameters of a multinomial) and places a Dirichlet prior on these mixing weights; they are sampled once per document. The word multinomial distributions are still treated as fixed parameters to be estimated. For a fuller Bayesian approach, one can also place a Dirichlet prior on these word multinomials to smooth the probabilities (analogous to Dirichlet smoothing).
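Restated in standard LDA notation (a reconstruction, not verbatim from the slide): the per-document topic weights receive a Dirichlet prior, and in the smoothed variant each topic's word distribution does as well:

\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad \beta_z \sim \mathrm{Dirichlet}(\eta) \ \text{(smoothed LDA only)}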

11 Dirichlet Distribution
Bayesian view: parameters are not fixed quantities; they are themselves random variables with priors. We therefore choose a prior for the multinomial distribution of the topic mixture weights. The Dirichlet distribution is conjugate to the multinomial distribution, which makes it the natural choice. A k-dimensional Dirichlet random variable θ takes values in the (k-1)-simplex and has the following probability density on this simplex:
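(The density referenced above did not survive the transcript; this is the standard Dirichlet density, as in Blei et al. '03.)

p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}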

12 The Graphical Model of LDA

13 Generative Process of LDA
Choose θ ∼ Dir(α). For each of the N words w_n: choose a topic z_n ∼ Multinomial(θ), then choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n. β is a k × V matrix (topics by vocabulary) parameterized with the word probabilities, β_ij = p(w^j | z^i).
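Putting these steps together, the joint distribution of the topic mixture θ, the topic assignments z, and the words w of one document takes the standard form from the LDA paper:

p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)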

14 Inference and Parameter Estimation
Parameters: corpus-level parameters α and β are sampled once in the process of generating the corpus; document-level variables θ and z are sampled anew for each document. Inference means estimating the document-level variables given a document, i.e., their posterior. This posterior is intractable to compute exactly, so approximate inference is needed.
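Concretely, the quantity that cannot be computed exactly is the marginal likelihood of a document, obtained by integrating out θ and summing out z (standard form; the coupling of θ and β under the sum over topics is what makes the integral intractable, and Blei et al. address it with variational approximation):

p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta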

