Topic Model: Latent Dirichlet Allocation. Ouyang Ruofei, May 10, 2013.

Presentation transcript:


Introduction. Inference: data = latent pattern + noise. The parameters of a model capture the latent pattern.

Introduction.
Parametric model: the number of parameters is fixed w.r.t. the sample size.
Nonparametric model: the number of parameters grows with the sample size; the parameter space is infinite-dimensional.
Problem and its parameter:
- Density estimation: distributions
- Regression: functions
- Clustering: partitions

Clustering. Clusters: 1. Ironman, 2. Thor, 3. Hulk. Each data point carries an indicator variable giving its cluster.

Dirichlet process. Counts so far: Ironman 3 times, Thor 2 times, Hulk 2 times. Even without the likelihood, we already know that: 1. there are three clusters; 2. the empirical distribution over those three clusters. The question is where new data should go.

Dirichlet process. Dirichlet distribution over a probability vector (p_1, ..., p_K):
pdf: p(p_1, ..., p_K | a_1, ..., a_K) = Gamma(sum_k a_k) / prod_k Gamma(a_k) * prod_k p_k^(a_k - 1)
mean: E[p_k] = a_k / sum_j a_j
Example: Dir(a_Ironman, a_Thor, a_Hulk).
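As a sketch, the Dirichlet mean and a standard way to sample from it (normalizing independent Gamma draws) fit in a few lines of plain Python; the pseudo-counts (3, 2, 2) follow the Avengers example:

```python
import random

def dirichlet_mean(alpha):
    # E[p_k] = alpha_k / sum_j alpha_j
    total = sum(alpha)
    return [a / total for a in alpha]

def sample_dirichlet(alpha, rng):
    # Standard construction: normalize independent Gamma(alpha_k, 1) draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

alpha = [3, 2, 2]              # pseudo-counts for Ironman, Thor, Hulk
print(dirichlet_mean(alpha))   # [3/7, 2/7, 2/7]
sample = sample_dirichlet(alpha, random.Random(0))
print(sum(sample))             # a point on the simplex, sums to 1
```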

Dirichlet process. The Dirichlet distribution is the conjugate prior of the multinomial distribution, so the posterior is again Dirichlet: the observed counts are simply added to the prior pseudo-counts.
Example (Ironman, Thor, Hulk): prior pseudo-counts (3, 2, 2); after observing data, the posterior pseudo-counts are the prior pseudo-counts plus the observed counts.
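Conjugacy makes the posterior update a one-liner: add observed counts to the prior pseudo-counts. A minimal sketch; the observed counts below are made up for illustration, since the slide's numbers did not survive:

```python
def dirichlet_posterior(prior, counts):
    # Dirichlet prior + multinomial counts -> Dirichlet posterior (conjugacy)
    return [a + n for a, n in zip(prior, counts)]

prior = [3, 2, 2]    # pseudo-counts: Ironman, Thor, Hulk
counts = [1, 0, 2]   # hypothetical new observations
print(dirichlet_posterior(prior, counts))  # [4, 2, 4]
```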

Dirichlet process. In our Avengers model, K = 3 (Ironman, Thor, Hulk). However, when a new character turns up, a Dirichlet distribution with fixed K cannot model him. Letting K go to infinity gives the Dirichlet process; "nonparametric" here means an unbounded number of clusters.

Dirichlet process DP(α, G_0): a distribution over distributions.
α: concentration parameter, acting like pseudo-counts in each cluster.
G_0: base distribution of each cluster, the distribution "template".
A draw G ~ DP(α, G_0) satisfies, for any finite partition (A_1, ..., A_K):
(G(A_1), ..., G(A_K)) ~ Dir(α G_0(A_1), ..., α G_0(A_K)).

Dirichlet process. Construct the Dirichlet process by the Chinese restaurant process (CRP). In a restaurant there are infinitely many tables.
Customer 1 sits at an unoccupied table with probability 1.
Customer N sits at occupied table k with probability n_k / (N - 1 + α), where n_k is the number of customers already at table k, and at a new table with probability α / (N - 1 + α).
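The seating rule can be simulated directly; a minimal sketch in plain Python:

```python
import random

def crp(n_customers, alpha, rng):
    """Simulate a Chinese restaurant process; returns customers per table."""
    tables = []                        # tables[k] = customers at table k
    for n in range(n_customers):       # customer n+1 arrives; n already seated
        r = rng.random() * (n + alpha)
        acc = 0.0
        for k, n_k in enumerate(tables):
            acc += n_k                 # occupied table k: prob n_k / (n + alpha)
            if r < acc:
                tables[k] += 1
                break
        else:                          # new table: prob alpha / (n + alpha)
            tables.append(1)
    return tables

print(crp(10, 1.0, random.Random(0)))  # a random partition of 10 customers
```

Larger α makes new tables (new clusters) more likely, which is exactly the pseudo-count reading of α above.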

Dirichlet process. In the CRP metaphor: customers are data points, tables are clusters.

Dirichlet process. Train the model by Gibbs sampling.

Gibbs sampling. Gibbs sampling is an MCMC method for obtaining a sequence of samples from a multivariate distribution. The intuition is to turn one multivariate problem into a sequence of univariate problems: instead of sampling all variables jointly, repeatedly sample each variable from its conditional distribution given the current values of all the others. In the Dirichlet process mixture, we resample each data point's cluster indicator conditioned on all the other indicators.

Gibbs sampling pseudo code:
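The pseudo code can be sketched as a generic loop; each `conditionals[i]` is assumed to be a function that draws x_i from p(x_i | x_-i) given the current state:

```python
def gibbs(init, conditionals, n_iters, rng):
    # Resample each variable in turn from its full conditional,
    # given the current values of all the other variables.
    state = list(init)
    samples = []
    for _ in range(n_iters):
        for i, cond in enumerate(conditionals):
            state[i] = cond(state, rng)   # x_i ~ p(x_i | x_{-i})
        samples.append(list(state))
    return samples

# Degenerate check: conditionals that ignore the state and the rng.
conds = [lambda s, r: 1, lambda s, r: 2]
print(gibbs([0, 0], conds, 3, None))  # [[1, 2], [1, 2], [1, 2]]
```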

Topic model. A document is a mixture of topics. We can read the words; the topics, however, are latent variables linking documents to words.

Topic model. Collapsed Gibbs sampling keeps two count tables: a word/topic count and a topic/doc count. To resample the topic z_ij of the observed word x_ij, remove its current assignment and combine the counts from the other topics and the other words:
p(z_ij = k | rest) ∝ (n_dk + α) · (n_kw + β) / (n_k + Vβ)
where n_dk counts topic k in document d, n_kw counts word w under topic k, n_k is the total count of topic k, and V is the vocabulary size.
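The resampling step for one word can be sketched as follows; the count tables and hyperparameters below are invented for illustration, and the counts passed in are assumed to already exclude the word being resampled:

```python
def topic_probabilities(doc_topic, topic_word, topic_total,
                        d, w, alpha, beta, vocab_size):
    # p(z = k) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
    weights = [(doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
               / (topic_total[k] + vocab_size * beta)
               for k in range(len(topic_total))]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical counts: 2 documents, 2 topics, vocabulary of 3 words.
doc_topic = [[2, 1], [0, 3]]
topic_word = [[1, 1, 0], [1, 0, 2]]
topic_total = [2, 3]
probs = topic_probabilities(doc_topic, topic_word, topic_total,
                            d=0, w=2, alpha=0.1, beta=0.1, vocab_size=3)
print(probs)  # a distribution over the 2 topics
```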

Topic model. Apply the Dirichlet prior twice in the topic model: for each document, a distribution (P_1, P_2, P_3) over topics 1-3; for each topic, a distribution (Q_1, Q_2, Q_3) over words. We learn both the distribution of topics in each document and the distribution of words in each topic.

Topic model. Two count tables are maintained: a topic/doc table with rows d1-d3 and columns t1-t3, and a word/topic table with rows t1-t3 and columns w1-w4.

Topic model. Latent Dirichlet allocation is a Dirichlet mixture model. Generative process: for each topic, draw a word distribution from Dir(β); for each document, draw a topic distribution from Dir(α); then, for each word position, draw a topic from the document's topic distribution and draw the word from that topic's word distribution.
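The generative process can be sketched in a few lines; here `theta` and `phi` are passed in directly, though in the full model they would themselves be drawn from Dir(α) and Dir(β):

```python
import random

def generate_doc(theta, phi, length, rng):
    # theta: this document's distribution over topics
    # phi[k]: topic k's distribution over the vocabulary
    words = []
    for _ in range(length):
        z = rng.choices(range(len(theta)), weights=theta)[0]    # pick a topic
        w = rng.choices(range(len(phi[z])), weights=phi[z])[0]  # pick a word
        words.append(w)
    return words

# Degenerate check: a single topic that always emits word 1.
print(generate_doc([1.0], [[0.0, 1.0]], 4, random.Random(0)))  # [1, 1, 1, 1]
```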

LDA example. Vocabulary w: ipad, apple, itunes, mirror, queen, joker, ladygaga. Topics: t1 = product, t2 = story, t3 = poker.
d1: ipad apple itunes
d2: apple mirror queen
d3: queen joker ladygaga
d4: queen ladygaga mirror
In fact, the topics are latent.

LDA example: one Gibbs sampling step on the toy corpus (the slides show the word/topic and topic/doc count tables after each step).
1. Initialize: assign each word a random topic and fill in the word/topic and topic/doc count tables.
2. Pick one word, e.g. "queen" in d3, and remove its current topic assignment from both tables.
3. Compute the probability of each topic for "queen" from the decremented counts.
4. Sample a new topic for "queen" and add the new assignment back to both tables.
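The walkthrough can be run end to end with a minimal collapsed Gibbs sampler; word ids 0-6 stand for ipad, apple, itunes, mirror, queen, joker, ladygaga, and since the initialization is random the recovered topics will not match the slides exactly:

```python
import random

def collapsed_gibbs(docs, n_topics, vocab_size, alpha, beta, n_iters, rng):
    # Random initial topic assignments, then build both count tables.
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]            # 1. remove the current assignment
                doc_topic[d][k] -= 1; topic_word[k][w] -= 1; topic_total[k] -= 1
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in range(n_topics)]  # 2. conditional probs
                k = rng.choices(range(n_topics), weights=weights)[0]  # 3. sample
                z[d][i] = k            # 4. add the new assignment back
                doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    return z, doc_topic, topic_word

docs = [[0, 1, 2], [1, 3, 4], [4, 5, 6], [4, 6, 3]]  # d1..d4 as word ids
z, doc_topic, topic_word = collapsed_gibbs(docs, 3, 7, 0.1, 0.1, 50,
                                           random.Random(0))
print(doc_topic)  # topic/doc counts; each row sums to the document length
```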

Further. Dirichlet distribution prior: K topics, fixed in advance (the number of topics is supplied in a supervised way). Dirichlet process prior: an unbounded number of topics, inferred from the data (unsupervised). α mainly controls the probability of a topic with little training data in a document; β mainly controls the probability of a topic with little training data among the words.

Further. The bag-of-words assumption is unrealistic: topic n-gram models (TNG) and biLDA model word order. Bag-of-words models also lose power-law behavior: the Pitman-Yor language model recovers it. David Blei has written an extensive survey on topic models.

Q&A