Hierarchical Topic Models and the Nested Chinese Restaurant Process Blei, Griffiths, Jordan, Tenenbaum presented by Rodrigo de Salvo Braz
Document classification One-class approach: one topic per document, with words generated according to the topic. For example, a Naive Bayes model.
Document classification It is more realistic to assume more than one topic per document. Generative model: pick a mixture distribution over K topics and generate words from it.
Document classification Even more realistic: topics may be organized in a hierarchy (not independent); Generative model: pick a path from root to leaf in a tree, where each node is a topic; generate words from a mixture over the topics on that path.
Dirichlet distribution (DD) Distribution over distribution vectors p of dimension K: $P(p; u) = \frac{1}{Z(u)} \prod_i p_i^{u_i - 1}$, where Z(u) is the normalizing constant; The parameters u act as a prior distribution (“previous observations”); The symmetric Dirichlet distribution assumes a uniform prior distribution ($u_i = u_j$ for any i, j).
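To make the Dirichlet distribution concrete, here is a minimal sketch of drawing distribution vectors from it, using the standard construction of normalizing independent Gamma draws. The function name `sample_dirichlet` and the parameter values are illustrative assumptions, not taken from the slides.

```python
import random

def sample_dirichlet(u, rng):
    """Draw p ~ Dirichlet(u) by normalizing independent Gamma(u_i, 1) draws."""
    draws = [rng.gammavariate(u_i, 1.0) for u_i in u]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Symmetric case: all parameters equal, no component is favored a priori.
symmetric = sample_dirichlet([1.0, 1.0, 1.0], rng)

# Asymmetric case (illustrative values): the first component is strongly favored.
skewed = sample_dirichlet([50.0, 1.0, 1.0], rng)

print(symmetric)
print(skewed)
```

Each draw is itself a probability vector of dimension K (here K = 3), which is what makes the Dirichlet usable as a prior over topic mixtures.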
Latent Dirichlet Allocation (LDA) Generative model of multiple-topic documents; Generate a mixture distribution on topics using a Dirichlet distribution; For each word, pick a topic according to this mixture and generate the word according to the word distribution for that topic.
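The generative steps above can be sketched as follows. This is a toy illustration under stated assumptions: the two topics, their word probabilities, and the hyperparameter values are hypothetical, and the helper names (`sample_dirichlet`, `generate_document`) are not from the paper.

```python
import random

def sample_dirichlet(params, rng):
    """Draw a probability vector from Dirichlet(params) via normalized Gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document: a per-document topic mixture, then one topic per word."""
    theta = sample_dirichlet(alpha, rng)          # mixture over the K topics
    topic_ids = list(range(len(topic_word)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]       # topic for this word
        vocab = list(topic_word[z])
        probs = [topic_word[z][v] for v in vocab]
        words.append(rng.choices(vocab, weights=probs)[0])  # word from that topic
    return theta, words

# Hypothetical toy topics (word -> probability); not from the slides.
topic_word = [
    {"gene": 0.5, "cell": 0.4, "model": 0.1},
    {"model": 0.5, "prior": 0.4, "cell": 0.1},
]

rng = random.Random(0)
theta, words = generate_document(topic_word, alpha=[1.0, 1.0], n_words=20, rng=rng)
print(theta)
print(words)
```

Because a new topic is drawn for every word, a single document can mix vocabulary from several topics, unlike the one-topic-per-document approach.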
Latent Dirichlet Allocation (LDA) [Figure: graphical model. Labels: K, W, w, words, topics, topic distribution, DD hyperparameter]
Chinese Restaurant Process (CRP) [Animation: 1 out of 9 customers through 9 out of 9 customers] Final frame: data point (a distribution itself) sampled
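The seating animation can be simulated with a short sketch. It uses the standard CRP rule: a new customer joins an occupied table with probability proportional to the number of customers already there, and opens a new table with probability proportional to a concentration parameter (written `gamma` here). The function name and the value `gamma=1.0` are illustrative assumptions.

```python
import random

def crp_seating(n_customers, gamma, rng):
    """Seat customers one at a time; return table assignments and table sizes."""
    counts = []        # counts[i] = number of customers at table i
    assignments = []   # assignments[n] = table chosen by customer n
    for _ in range(n_customers):
        # Occupied table i has weight counts[i]; a new table has weight gamma.
        weights = counts + [gamma]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(counts):
            counts.append(1)      # open a new table
        else:
            counts[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, counts

rng = random.Random(0)
assignments, counts = crp_seating(n_customers=9, gamma=1.0, rng=rng)
print(assignments)
print(counts)
```

The number of tables is not fixed in advance; it grows with the number of customers, which is why the CRP serves as a prior when the number of mixture components is unknown.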
Species Sampling Mixture Generative model of multiple-topic documents; Generate a mixture distribution on topics using a CRP prior; Pick a topic according to this distribution and generate words according to the word distribution for that topic.
Species Sampling Mixture [Figure: graphical model. Labels: K, W, w, words, topics, topic distribution, CRP hyperparameter]
Nested CRP
Hierarchical LDA (hLDA) Generative model of multiple-topic documents; Generate a mixture distribution on topics using a Nested CRP prior; Pick a topic according to this distribution and generate words according to the word distribution for that topic.
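A minimal sketch of the hLDA generative process, assuming a fixed tree depth: each document picks a root-to-leaf path by running a CRP at every node along the way (the nested CRP), draws a mixture over the levels of that path, and generates each word from the topic at a sampled level. The toy vocabulary, the class and function names, and all hyperparameter values are hypothetical choices for illustration only.

```python
import random

VOCAB = ["a", "b", "c", "d"]  # hypothetical toy vocabulary

def sample_dirichlet(params, rng):
    """Draw a probability vector from Dirichlet(params) via normalized Gammas."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

class Node:
    """A tree node: one topic, plus how many documents have passed through it."""
    def __init__(self, eta, rng):
        self.count = 0
        self.children = []
        self.topic = sample_dirichlet([eta] * len(VOCAB), rng)  # word distribution

def sample_path(root, depth, gamma, eta, rng):
    """Nested CRP: run a CRP over the children of each node on the way down."""
    path = [root]
    root.count += 1
    for _ in range(depth - 1):
        node = path[-1]
        weights = [child.count for child in node.children] + [gamma]
        idx = rng.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):
            node.children.append(Node(eta, rng))  # open a new branch
        child = node.children[idx]
        child.count += 1
        path.append(child)
    return path

def generate_document(root, depth, n_words, alpha, gamma, eta, rng):
    """Pick a path, draw a mixture over its levels, then generate the words."""
    path = sample_path(root, depth, gamma, eta, rng)
    theta = sample_dirichlet([alpha] * depth, rng)
    words = []
    for _ in range(n_words):
        level = rng.choices(range(depth), weights=theta)[0]
        words.append(rng.choices(VOCAB, weights=path[level].topic)[0])
    return path, words

rng = random.Random(0)
root = Node(eta=0.5, rng=rng)
docs = [generate_document(root, depth=3, n_words=10, alpha=1.0,
                          gamma=1.0, eta=0.5, rng=rng) for _ in range(5)]
print(len(root.children), [words for _, words in docs])
```

All documents share the root topic, while deeper topics are shared only by documents whose paths overlap, which is what gives the model its hierarchy.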
hLDA graphical model
Artificial data experiment word documents on a 25-term vocabulary; Each vertical bar is a topic.
CRP prior vs. Bayes Factors
Predicting the structure
NIPS abstracts
Comments Accommodates growing collections of data; Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for that; No mention of running time; inference may take a very long time.