Variational Inference for the Indian Buffet Process

Presentation transcript:

Variational Inference for the Indian Buffet Process
Finale Doshi-Velez, Kurt T. Miller, Jurgen Van Gael and Yee Whye Teh, AISTATS 2009
Presented by: John Paisley, Duke University, Dept. of ECE

Introduction
This paper provides variational inference equations for the stick-breaking construction of the Indian buffet process (IBP). In addition, it gives bounds on how closely a truncated stick-breaking approximation matches the infinite IBP.

Outline of Presentation
- Review of the IBP and its stick-breaking construction
- Variational inference for the IBP
- Truncation error bounds for variational inference
- Results on a linear-Gaussian model for toy and real data

Indian Buffet Process
The first customer selects a Poisson(\alpha) number of dishes (features). The ith customer selects each previously sampled dish k with probability m_k / i, the fraction of all previous customers who selected that dish, and then selects a Poisson(\alpha / i) number of new dishes. The probability of the resulting binary matrix Z is shown on the slide (a standard form is reproduced below); the top term is the probability of the K dishes, and the bottom term accounts for the permutations of customers within each dish.
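(The equation itself does not survive in the transcript. For reference, the standard IBP prior over the left-ordered binary matrix from Griffiths & Ghahramani has the form

P(Z) = \frac{\alpha^{K}}{\prod_{h > 0} K_h!} \exp(-\alpha H_N) \prod_{k=1}^{K} \frac{(N - m_k)! \, (m_k - 1)!}{N!},

where N is the number of customers, m_k the number of customers who chose dish k, H_N = \sum_{i=1}^{N} 1/i the Nth harmonic number, and K_h the number of dishes with binary history h. This is a reconstruction from the IBP literature rather than a copy of the slide.)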

The Stick-Breaking Construction of the IBP
Rather than marginalizing out the feature probabilities \pi_k (the probability of selecting dish k), a stick-breaking construction can be used.* (Note: the generative process shown on the slide was written by the presenter; the paper presents the feature probabilities in decreasing order, as in the sketch below.) This stick-breaking representation holds for this specific parameterization of the beta distribution.
* Y.W. Teh, D. Gorur & Z. Ghahramani (2007). Stick-breaking construction for the Indian buffet process. AISTATS.
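(A sketch of that construction, following Teh, Gorur & Ghahramani (2007): draw independent stick fractions and multiply them into feature probabilities,

v_k \sim \mathrm{Beta}(\alpha, 1), \qquad \pi_k = \prod_{i=1}^{k} v_i, \qquad z_{nk} \sim \mathrm{Bernoulli}(\pi_k),

so that \pi_1 > \pi_2 > \cdots almost surely, i.e., the feature probabilities appear in strictly decreasing order.)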

VB Inference for the Stick-Breaking Construction
Inference focuses on the stick variables v, the binary matrix Z, and the feature parameters. A lower-bound approximation is needed for one of the terms in the variational objective: the authors introduce an auxiliary multinomial distribution q and optimize over it (a sketch is given below). This bound applies to the likelihood of z; handling the posterior of v is more complicated. Using this multinomial lower bound, "terms decompose independently for each v_m and we get a closed form exponential family update."
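(Neither equation survives in the transcript, so the following is a sketch rather than the slide's exact notation. With the stick-breaking representation truncated at K features and a linear-Gaussian likelihood, a mean-field family of the form

q(v, A, Z) = \prod_{k=1}^{K} q(v_k) \prod_{k=1}^{K} q(A_k) \prod_{n=1}^{N} \prod_{k=1}^{K} q(z_{nk})

is typically used, with Beta, Gaussian, and Bernoulli factors respectively; the paper's exact parameterization may differ. The awkward term is E_q[\log(1 - \prod_{i=1}^{k} v_i)], which arises whenever z_{nk} = 0. Writing

1 - \prod_{i=1}^{k} v_i = \sum_{m=1}^{k} (1 - v_m) \prod_{i=1}^{m-1} v_i

and applying Jensen's inequality with an auxiliary multinomial q(m) on \{1, \dots, k\} gives

E\Big[\log\Big(1 - \prod_{i=1}^{k} v_i\Big)\Big] \ge \sum_{m=1}^{k} q(m) \Big( E[\log(1 - v_m)] + \sum_{i=1}^{m-1} E[\log v_i] \Big) + H(q),

with the bound tightest at q(m) \propto \exp\big( E[\log(1 - v_m)] + \sum_{i=1}^{m-1} E[\log v_i] \big). Because the right-hand side is linear in the individual E[\log v_i] and E[\log(1 - v_m)] terms, each v_m contributes separately and the Beta updates stay in closed exponential-family form.)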

Truncation Error for VB Inference
Given a truncation of the stick-breaking construction at level K, how close are we to the infinite model? A bound is derived following the same motivation as Ishwaran & James* in their calculation for the Dirichlet process. After some approximations, an upper bound is obtained (the equation does not survive in the transcript; a sketch of the style of argument is given below). The slide compares this bound against a Monte Carlo estimate of the true value over 1000 simulations with N = 30 and \alpha = 5.
* H. Ishwaran & L.F. James (2001). Gibbs sampling methods for stick-breaking priors. JASA.
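(A rough union-bound argument in the same spirit: for v_i \sim \mathrm{Beta}(\alpha, 1) we have E[v_i] = \alpha / (1 + \alpha), hence E[\pi_k] = (\alpha / (1 + \alpha))^k, so the probability that any of N customers uses a feature beyond the truncation level K satisfies

\Pr(\exists\, n \le N,\ k > K : z_{nk} = 1) \le N \sum_{k > K} E[\pi_k] = N \alpha \Big(\frac{\alpha}{1 + \alpha}\Big)^{K},

and the discrepancy between the truncated and infinite marginal likelihoods is controlled by this probability. This is a sketch of the kind of bound involved; the paper's exact bound may differ.)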

Results: Synthetic Data
Data were randomly generated, and test-data log-likelihoods under the inferred models were tracked as a function of runtime (lower-left plot on the slide). The results indicate that variational inference is both more accurate and faster here. The right panel gives further timing details for the toy problem.

Results: Two Real Datasets
Yale Faces: 721 images of size 32 x 32 of 14 people with different expressions and lighting. Speech Data: 245 observations from 10 microphones and 5 speakers.
The plots on the slide show that variational inference outperforms Gibbs sampling and runs faster on the Yale Faces data. Performance and speed are worse on the speech dataset. One reason is that the speech data are only 10-dimensional, while the Yale data are 1024-dimensional (32 x 32); in such a low-dimensional setting MCMC inference is already fast, and the error introduced by the VB approximation becomes apparent.