Modeling Annotated Data (SIGIR 2003). David M. Blei and Michael I. Jordan, University of California, Berkeley. Presented by ChengXiang Zhai, July 10, 2003.

A One-slide Summary
- Problem: probabilistic modeling of annotated data (images + captions)
- Method: correspondence latent Dirichlet allocation (Corr-LDA), plus two baseline models (GM-Mixture, GM-LDA)
- Evaluation:
  - Held-out likelihood: Corr-LDA = GM-LDA > GM-Mixture
  - Automatic annotation: Corr-LDA > GM-Mixture > GM-LDA
  - Image retrieval: Corr-LDA > {GM-Mixture, GM-LDA}
- Conclusion: Corr-LDA is a good model

The Problem
- Image/caption data: (r, w)
  - R = {r_1, ..., r_N}: image regions (the primary data)
  - W = {w_1, ..., w_M}: words (the annotations)
  - R and W are different data types
- Models are needed for three tasks:
  - The joint distribution p(r, w) (for clustering)
  - The conditional distribution p(w | r) (for labeling an image, and for retrieval)
  - The per-region word distribution p(w | r_i) (for labeling an individual region)

Three Generative Models
- Common setup:
  - Each region r_i is modeled by a multivariate Gaussian with diagonal covariance
  - Each word w_i is modeled by a multinomial distribution
  - Assume k clusters
- GM-Mixture (Gaussian-multinomial mixture):
  - An image/caption pair belongs to exactly one cluster
- Gaussian-multinomial LDA (GM-LDA):
  - An image/caption pair may belong to several clusters; each region and each word belongs to exactly one cluster
  - Regions and words of the same image may belong to completely disjoint clusters
- Correspondence LDA (Corr-LDA):
  - An image/caption pair may belong to several clusters; each region and each word belongs to exactly one cluster
  - Each word must belong to one of the clusters used by the image's regions

Detour: probabilistic models for document/term clustering…

A Mixture Model of Documents
- Generative process: first select a group C_i with probability P(C_i), then generate the document (a word sequence) from that group's word distribution P(w|C_i)
- Components: k groups C_1, ..., C_k with selection probabilities P(C_1), ..., P(C_k) and word distributions P(w|C_1), ..., P(w|C_k)
- Parameters are fit with the maximum likelihood estimator
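
The likelihood being maximized was shown as a figure on the original slide and did not survive extraction; a standard form for this mixture of multinomials, written here as an assumption consistent with the slide's notation (c(w,d) is the count of word w in document d), is:

    p(d \mid \Theta) = \sum_{i=1}^{k} P(C_i) \prod_{w \in V} P(w \mid C_i)^{c(w,d)},
    \qquad
    \hat{\Theta} = \arg\max_{\Theta} \sum_{d \in D} \log p(d \mid \Theta)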

Applying the EM Algorithm
- Data: D = {d_1, ..., d_n} (documents); clusters/groups c_1, ..., c_k
- Hidden variables: z_{i1}, ..., z_{ik} for each document d_i, with z_{ij} ∈ {0,1} and z_{ij} = 1 iff d_i is in cluster j
- Incomplete likelihood: likelihood of the observed documents with the hidden variables summed out
- Complete likelihood L_c(Θ|D): likelihood of the documents together with an assignment of the hidden variables
- E-step: compute E_{z|Θ_old}[L_c(Θ|D)], i.e., compute p(z_{ij} | d_i, Θ_old) and the expected counts needed for estimating Θ
- M-step: Θ = argmax_Θ E_{z|Θ_old}[L_c(Θ|D)]
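
The two likelihood expressions on this slide were images and are missing here; for this multinomial mixture the standard forms, stated as an assumption, are:

    Incomplete (observed-data) log-likelihood:
    L(\Theta \mid D) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} p(C_j) \prod_{w} p(w \mid C_j)^{c(w,d_i)}

    Complete-data log-likelihood:
    L_c(\Theta \mid D, z) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij} \Big[ \log p(C_j) + \sum_{w} c(w,d_i) \log p(w \mid C_j) \Big]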

EM Updating Formulas
- Parameters: Θ = ({p(C_i)}, {p(w_j | C_i)})
- Initialization: randomly set Θ^0
- Repeat until convergence:
  - E-step
  - M-step
- Practical issues: numerical "underflow" and smoothing
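
The update formulas themselves were rendered as images and are lost; the sketch below is a minimal NumPy implementation of EM for this multinomial mixture, assuming documents arrive as a term-count matrix X of shape (n_docs, |V|). Names such as em_multinomial_mixture are illustrative, not from the slides. Computing the E-step in log space (log-sum-exp) handles the underflow issue, and the +1/+|V| add-one smoothing in the M-step handles the smoothing issue mentioned on the slide.

    import numpy as np

    def em_multinomial_mixture(X, k, n_iter=100, seed=0):
        """EM for a mixture of multinomials.
        X: (n_docs, vocab_size) term-count matrix; k: number of clusters."""
        rng = np.random.default_rng(seed)
        n, V = X.shape
        # Random initialization of Theta^0 = ({p(C_j)}, {p(w|C_j)})
        prior = np.full(k, 1.0 / k)
        word_probs = rng.dirichlet(np.ones(V), size=k)            # shape (k, V)

        for _ in range(n_iter):
            # E-step: p(z_ij = 1 | d_i, Theta_old), computed in log space to avoid underflow
            log_joint = np.log(prior)[None, :] + X @ np.log(word_probs).T   # (n, k)
            log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
            resp = np.exp(log_joint - log_norm)                   # posteriors p(z_ij | d_i)

            # M-step: re-estimate Theta from expected counts, with +1/+|V| smoothing
            prior = resp.sum(axis=0) / n
            expected_counts = resp.T @ X                          # (k, V) expected word counts
            word_probs = (expected_counts + 1.0) / (expected_counts.sum(axis=1, keepdims=True) + V)
        return prior, word_probs, resp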

Semi-supervised Classification
- Data: D = E(C_1) ∪ ... ∪ E(C_k) ∪ U, where E(C_j) contains the labeled examples of class C_j and U contains the unlabeled documents
- Parameters: Θ = ({p(C_i)}, {p(w_j | C_i)})
- Initialization: randomly set Θ^0
- Repeat until convergence:
  - E-step: applied only to documents d_i in U
  - M-step: pool the real counts from E with the expected counts from U, smoothing with +1 and +|V| (see below)
- Essentially, this sets p(z_ij) = 1 for all d_i in E(C_j)!
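
The pooled M-step formula was an image on the original slide; written out in the same notation, and assuming the +1/+|V| marks indicate Laplace smoothing of the word distributions, it would be:

    p(w \mid C_j) = \frac{1 + \sum_{d_i \in E(C_j)} c(w, d_i) + \sum_{d_i \in U} p(z_{ij} \mid d_i, \Theta^{old})\, c(w, d_i)}
                         {|V| + \sum_{w'} \Big[ \sum_{d_i \in E(C_j)} c(w', d_i) + \sum_{d_i \in U} p(z_{ij} \mid d_i, \Theta^{old})\, c(w', d_i) \Big]}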

End of Detour

GM-Mixture
- Model: a single cluster variable generates both the image regions and the caption words (see the sketch below)
- Estimation: EM
- Annotation: condition on the regions and sum over clusters (see below)
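
The model and annotation formulas were images on the original slide; based on the Blei and Jordan paper, the generative process and annotation rule can be sketched as follows (treat the exact parameterization as an assumption):

    z \sim \mathrm{Mult}(\lambda); \quad
    r_n \mid z \sim \mathcal{N}(\mu_z, \mathrm{diag}(\sigma_z^2)) \;\; (n = 1,\dots,N); \quad
    w_m \mid z \sim \mathrm{Mult}(\beta_z) \;\; (m = 1,\dots,M)

    \text{Annotation:} \quad p(w \mid r) = \sum_{z} p(z \mid r)\, p(w \mid z)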

Gaussian-Multinomial LDA (GM-LDA)
- Model: each region and each word draws its own cluster from a shared, Dirichlet-distributed mixing proportion (see the sketch below)
- Estimation: variational Bayes
- Annotation: marginalization
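
The graphical model on this slide was an image; a sketch of the generative process, stated as an assumption based on the paper, is:

    \theta \sim \mathrm{Dir}(\alpha); \quad
    z_n \sim \mathrm{Mult}(\theta),\; r_n \mid z_n \sim \mathcal{N}(\mu_{z_n}, \mathrm{diag}(\sigma_{z_n}^2)); \quad
    v_m \sim \mathrm{Mult}(\theta),\; w_m \mid v_m \sim \mathrm{Mult}(\beta_{v_m})

Because z_n and v_m are drawn independently given θ, the regions and words of one image can end up in disjoint clusters, which is the weakness noted earlier.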

Correspondence LDA (Corr-LDA)
- Model: regions are generated as in GM-LDA, but each caption word is attached to one of the image's regions and generated from that region's cluster (see the sketch below)
- Estimation: variational Bayes
- Annotation: marginalization
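
As above, the model formula was an image; a sketch of the Corr-LDA generative process, stated as an assumption based on the paper, is:

    \theta \sim \mathrm{Dir}(\alpha); \quad
    z_n \sim \mathrm{Mult}(\theta),\; r_n \mid z_n \sim \mathcal{N}(\mu_{z_n}, \mathrm{diag}(\sigma_{z_n}^2)) \;\; (n = 1,\dots,N)

    y_m \sim \mathrm{Unif}(1,\dots,N),\; w_m \mid y_m, z \sim \mathrm{Mult}(\beta_{z_{y_m}}) \;\; (m = 1,\dots,M)

The indicator y_m ties each word to one of the image's regions, which is what forces words to share clusters with regions.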

Variational EM
- General idea: use a variational approximation to compute a lower bound on the likelihood in the E-step
- Procedure:
  - Initialization
  - E-step: maximize the variational lower bound (usually an inner iterative loop)
  - M-step: given the variational distribution, estimate the model parameters by maximum likelihood
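
The bound itself is not written out on the slide; the standard form (a generic identity, not specific to these models) is:

    \log p(x \mid \Theta) \;\ge\; \mathbb{E}_{q(z)}\big[\log p(x, z \mid \Theta)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big]

The E-step tightens this bound by optimizing q over a tractable (e.g., fully factorized) family; the M-step maximizes the bound over Θ with q held fixed.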

Experiment Results
- Held-out likelihood: Corr-LDA = GM-LDA > GM-Mixture
- Automatic annotation: Corr-LDA > GM-Mixture > GM-LDA
- Image retrieval: Corr-LDA > {GM-Mixture, GM-LDA}

Summary
- A powerful generative model for annotated data (Corr-LDA)
- Interesting empirical results