Hierarchical Dirichlet Process and Infinite Hidden Markov Model Duke University Machine Learning Group Presented by Kai Ni February 17, 2006 Paper by Y. W. Teh, M. I. Jordan, M. J. Beal & D. M. Blei, NIPS 2004

Outline
–Motivation
–Dirichlet Processes (DP)
–Hierarchical Dirichlet Processes (HDP)
–Infinite Hidden Markov Model (iHMM)
–Results & Conclusions

Motivation
Problem – "multi-task learning" in which the "tasks" are clustering problems.
Goal – Share clusters among multiple, related clustering problems. The number of clusters is open-ended and inferred automatically by the model.
Applications
–Genome pattern analysis
–Information retrieval over text corpora

Hierarchical Model
A single clustering problem can be analyzed with a Dirichlet process (DP).
–Draws G from a DP are discrete, and values sampled from G are generally not distinct.
For J groups, let G_j, j = 1, ..., J, be a group-specific DP.
To share information, we must link the group-specific DPs:
–If the shared base measure is continuous, the draws G_j have no atoms in common with probability one.
–HDP solution: G_0 is itself a draw from a DP(γ, H).
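A minimal sketch of this linking structure in the notation of Teh et al. (the concentration-parameter symbols γ and α_0 follow the paper and are assumptions here, not taken from the slide):

```latex
G_0 \mid \gamma, H \;\sim\; \mathrm{DP}(\gamma, H), \qquad
G_j \mid \alpha_0, G_0 \;\sim\; \mathrm{DP}(\alpha_0, G_0), \quad j = 1, \dots, J .
```

Because G_0 is itself a DP draw, it is discrete, so the group-specific G_j necessarily place mass on a common set of atoms.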

Dirichlet Process & Hierarchical Dirichlet Process
Three different perspectives:
–Stick-breaking
–Chinese restaurant
–Infinite mixture models
Setup and properties of the DP.

Stick-breaking View
An explicit mathematical construction of the DP, which shows that draws from a DP are discrete.
The construction takes one form for the DP and a coupled, two-level form for the HDP.
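A minimal sketch of a truncated stick-breaking draw from a DP (the truncation level K, the NumPy implementation, and the Gaussian base measure in the example are illustrative assumptions, not part of the slides):

```python
import numpy as np

def stick_breaking_dp(alpha, K, base_sampler, seed=None):
    """Truncated stick-breaking draw from DP(alpha, H).

    Returns K atoms (drawn i.i.d. from the base measure H) and weights
    pi_k = beta_k * prod_{l<k} (1 - beta_l), where beta_k ~ Beta(1, alpha).
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=K)                      # stick-breaking proportions
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * leftover                                # lengths of the broken sticks
    atoms = np.array([base_sampler(rng) for _ in range(K)])   # atom locations drawn from H
    return atoms, weights

# Example: DP(alpha=1, H=N(0,1)); most of the mass falls on a few atoms.
atoms, weights = stick_breaking_dp(1.0, 100, lambda rng: rng.normal(), seed=0)
print(np.sort(weights)[::-1][:5], weights.sum())
```

In the HDP the construction is applied twice: a top-level stick-breaking draw gives global weights shared by every group, and each group's own weights are a DP draw centred on those global weights.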

DP – Chinese Restaurant Process
Exhibits the clustering property of the DP.
Let φ_1, ..., φ_{i-1} be i.i.d. random variables distributed according to G; let θ_1, ..., θ_K be the distinct values taken on by φ_1, ..., φ_{i-1}; and let n_k be the number of φ_{i'} equal to θ_k for 0 < i' < i.
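A sketch of the predictive (Pólya urn) distribution this setup leads to once G is integrated out; the symbols α_0 for the concentration parameter and G_0 for the base measure are assumptions here:

```latex
\phi_i \mid \phi_1, \dots, \phi_{i-1}
  \;\sim\; \sum_{k=1}^{K} \frac{n_k}{i - 1 + \alpha_0}\,\delta_{\theta_k}
  \;+\; \frac{\alpha_0}{i - 1 + \alpha_0}\, G_0 .
```

A new customer reuses an existing value θ_k with probability proportional to n_k, and draws a fresh value from G_0 with probability proportional to α_0.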

HDP – Chinese Restaurant Franchise
First level: within each group, a DP mixture.
–Let φ_j1, ..., φ_{j(i-1)} be i.i.d. random variables distributed according to G_j; let ψ_j1, ..., ψ_{jT_j} be the values (tables) taken on by φ_j1, ..., φ_{j(i-1)}; and let n_jt be the number of φ_{ji'} equal to ψ_jt for 0 < i' < i.
Second level: across groups, sharing clusters.
–The base measure of each group is itself a draw from a DP.
–Let θ_1, ..., θ_K be the values (dishes) taken on by ψ_j1, ..., ψ_{jT_j}; and let m_k be the number of ψ_jt equal to θ_k over all j, t.
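A sketch of the corresponding two-level predictive equations following Teh et al. (γ denotes the top-level concentration parameter; the exact symbols are assumptions here):

```latex
\phi_{ji} \mid \phi_{j1}, \dots, \phi_{j,i-1}
  \;\sim\; \sum_{t=1}^{T_j} \frac{n_{jt}}{i - 1 + \alpha_0}\,\delta_{\psi_{jt}}
  \;+\; \frac{\alpha_0}{i - 1 + \alpha_0}\, G_0 ,
\qquad
\psi_{jt} \mid \text{previous } \psi
  \;\sim\; \sum_{k=1}^{K} \frac{m_k}{\sum_{k'} m_{k'} + \gamma}\,\delta_{\theta_k}
  \;+\; \frac{\gamma}{\sum_{k'} m_{k'} + \gamma}\, H .
```

Customers choose tables within their own restaurant, while the dishes served at new tables are chosen franchise-wide, which is what lets clusters (dishes) be shared across groups.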

HDP – CRF graph
The values of the factors θ are shared between groups, as well as within groups; this is a key property of the HDP.
The CRF view is obtained by integrating out G_0.

DP Mixture Model
One of the most important applications of the DP: a nonparametric prior distribution on the components of a mixture model.
G can be viewed as an infinite mixture model.
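A sketch of the standard DP mixture generative model (the notation is chosen to match the likelihood F(x | φ) that appears later in the slides):

```latex
G \mid \alpha_0, G_0 \;\sim\; \mathrm{DP}(\alpha_0, G_0), \qquad
\phi_i \mid G \;\sim\; G, \qquad
x_i \mid \phi_i \;\sim\; F(x \mid \phi_i).
```

Because G is discrete, many of the φ_i coincide; data points sharing a φ value form a mixture component, and the number of occupied components grows with the data rather than being fixed in advance.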

HDP Mixture Model
The HDP can be used as the prior distribution over the factors for nested, grouped data.
We consider two-level DPs: G_0 links the child DPs G_j and forces them to share components.
The G_j are conditionally independent given G_0.

Infinite Hidden Markov Model
The number of hidden states is allowed to be countably infinite.
The transition probabilities in the i-th row of the transition matrix A can be interpreted as mixing proportions π = (a_i1, a_i2, ..., a_ik, ...).
Thus each row of A in the HMM is a DP. These DPs must also be linked, because they should share the same set of "next states".
The HDP provides the natural framework for the infinite HMM.
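A minimal generative sketch of this idea using a weak-limit (finite truncation) approximation of the HDP prior; the truncation level K, the Dirichlet approximations, and the Gaussian emissions are illustrative assumptions, not the paper's inference algorithm:

```python
import numpy as np

def sample_truncated_ihmm(T=200, K=20, gamma=4.0, alpha0=4.0, seed=None):
    """Generate a state/observation sequence from a truncated HDP-HMM.

    beta ~ Dirichlet(gamma/K, ..., gamma/K) approximates the global DP weights
    (the shared set of "next states"); each row of the transition matrix A is
    approximated by a Dirichlet draw centred on beta, i.e. a DP linked to beta.
    """
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(K, gamma / K))                      # shared next-state weights
    A = np.vstack([rng.dirichlet(alpha0 * beta + 1e-8) for _ in range(K)])
    means = rng.normal(0.0, 5.0, size=K)                             # Gaussian emission means
    states = np.empty(T, dtype=int)
    obs = np.empty(T)
    s = rng.choice(K, p=beta)                                        # initial state
    for t in range(T):
        states[t] = s
        obs[t] = rng.normal(means[s], 1.0)                           # y_t ~ B(s_t, :)
        s = rng.choice(K, p=A[s])                                    # s_{t+1} from row s_t of A
    return states, obs

states, obs = sample_truncated_ihmm(seed=0)
print("distinct states used:", len(np.unique(states)), "of", 20)
```

Typically only a handful of the K available states are visited, illustrating how the prior lets the effective number of states be inferred rather than fixed.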

iHMM via HDP
Assign observations to groups, where the groups are indexed by the value of the previous state variable in the sequence.
The current state and emission distribution then define a group-specific mixture model.
Multiple iHMMs can be linked by adding a further level of Bayesian hierarchy, letting a master DP couple the iHMMs, each of which is itself a set of DPs.

HDP & iHMM
Correspondence between the HDP (CRF aspect) and the iHMM:
–Group: restaurant j (fixed) ↔ group indexed by the previous state S_{i-1} (random)
–Data: customer x_ji ↔ observation y_i
–Hidden factor: table/dish assignment ψ_jt = θ_k with θ_k ~ H, k = 1, 2, ... ↔ state S_i = k, k = 1, 2, ..., with emission row B(S_i, :)
–DP weights: table popularity π_jk, k = 1, 2, ... ↔ transition row A(S_{i-1}, :)
–Likelihood: F(x_ji | φ_ji) ↔ B(S_i, y_i)

Non-trivialities in the iHMM
The HDP assumes a fixed partition of the data into groups, whereas the HMM deals with time-series data in which the definition of the groups is itself random.
In the CRF view of the HDP, the number of restaurants is infinite. Moreover, in the sampling scheme, changing s_t may affect the group assignments of all subsequent data.
The CRF is natural for describing the iHMM, but it is awkward for sampling; sampling algorithms based on the other representations of the HDP are needed for the iHMM.

HDP Results

iHMM Results

Conclusion
The HDP is a hierarchical, nonparametric model for clustering problems involving multiple groups of data.
The mixture components are shared across groups, and the appropriate number of components is determined automatically by the HDP.
The HDP can be extended to the infinite HMM, providing an effective inference algorithm for it.

References
Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, "Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes", NIPS 2004.
M. J. Beal, Z. Ghahramani and C. E. Rasmussen, "The Infinite Hidden Markov Model", NIPS 2002.
Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, "Hierarchical Dirichlet Processes", revised version to appear in JASA, 2006.