Stick-Breaking Constructions

Stick-Breaking Constructions Patrick Dallaire June 10th, 2011

Outline
Introduction of the Stick-Breaking process
Presentation of fundamental representations: the Dirichlet process, the Pitman-Yor process, the Indian buffet process
Definition of the Beta process
A Stick-Breaking construction of the Beta process
Conclusion and current work

The Stick-Breaking process
Assume a stick of unit length. At each iteration, a part of the remaining stick is broken off by sampling the proportion to cut. How should we sample these proportions?

Beta random proportions
Let V_k be the proportion to cut at iteration k. The remaining length can be expressed as ∏_{i=1}^{k} (1 − V_i). Thus, the broken part is defined by π_k = V_k ∏_{i=1}^{k−1} (1 − V_i). We first consider the case where V_k ~ Beta(1, α).

Beta distribution
The Beta distribution is a density function on (0, 1). Its parameters α and β control its shape.
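The iterative breaking described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk; the values α = 2.0, 50 breaks, and the fixed seed are arbitrary choices for reproducibility.

```python
import random

def stick_breaking(alpha, n_breaks, seed=0):
    """Break a unit stick n_breaks times with Beta(1, alpha) cut proportions."""
    rng = random.Random(seed)
    remaining = 1.0
    pieces = []
    for _ in range(n_breaks):
        v = rng.betavariate(1.0, alpha)  # proportion of the remaining stick to cut
        pieces.append(v * remaining)     # length of the piece broken off
        remaining *= 1.0 - v             # length of what is left
    return pieces, remaining

pieces, remaining = stick_breaking(alpha=2.0, n_breaks=50)
# the broken pieces plus the leftover always sum to the original unit length
assert abs(sum(pieces) + remaining - 1.0) < 1e-9
```

Because every piece is a fraction of what remains, the piece lengths sum to at most one, and the leftover shrinks geometrically in expectation.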

The Dirichlet process
Dirichlet processes are often used to produce infinite mixture models. Each observation belongs to one of the infinitely many components, and the model ensures that only a finite number of components have appreciable weight.

The Dirichlet process
A Dirichlet process, G ~ DP(α, G_0), can be constructed according to a Stick-Breaking process: G = Σ_{k=1}^{∞} π_k δ_{θ_k}, where π_k = V_k ∏_{i=1}^{k−1} (1 − V_i), V_k ~ Beta(1, α), and θ_k ~ G_0. Here G_0 is the base distribution and δ_θ is a unit mass at θ.
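In practice the infinite sum is truncated. The sketch below, assuming a standard normal base distribution G_0 and an arbitrary truncation level of 100 atoms (neither comes from the talk), draws a truncated stick-breaking approximation of G ~ DP(α, G_0).

```python
import random

def truncated_dp(alpha, base_sampler, truncation, seed=1):
    """Truncated stick-breaking approximation of G ~ DP(alpha, G0).

    Returns atom locations theta_k ~ G0 and weights pi_k = V_k * prod(1 - V_i);
    the last weight absorbs the leftover mass so the weights sum to one.
    """
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: leftover mass goes to the last atom
    atoms = [base_sampler(rng) for _ in weights]
    return atoms, weights

# standard normal base distribution G0, chosen purely for illustration
atoms, weights = truncated_dp(2.0, lambda r: r.gauss(0.0, 1.0), truncation=100)
assert abs(sum(weights) - 1.0) < 1e-9
```

A draw from the resulting discrete measure picks atom k with probability π_k, which is exactly how component assignments are sampled in a DP mixture.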

Construction demo

The Pitman-Yor process
A Pitman-Yor process, G ~ PY(d, α, G_0), can be constructed according to a Stick-Breaking process: G = Σ_{k=1}^{∞} π_k δ_{θ_k}, where π_k = V_k ∏_{i=1}^{k−1} (1 − V_i) and V_k ~ Beta(1 − d, α + kd), with discount 0 ≤ d < 1 and concentration α > −d.
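The only change from the Dirichlet process is the Beta parameters of each cut, which now depend on the iteration index k. A minimal sketch (the parameter values are illustrative, not from the talk):

```python
import random

def pitman_yor_weights(d, alpha, n, seed=2):
    """First n stick-breaking weights of a Pitman-Yor process PY(d, alpha, G0).

    V_k ~ Beta(1 - d, alpha + k*d); setting d = 0 recovers the Dirichlet process.
    """
    assert 0.0 <= d < 1.0 and alpha > -d
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for k in range(1, n + 1):
        v = rng.betavariate(1.0 - d, alpha + k * d)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

dp_like = pitman_yor_weights(0.0, 2.0, 1000)  # d = 0: Dirichlet process case
py = pitman_yor_weights(0.5, 2.0, 1000)       # d > 0: heavier-tailed weights
```

With d > 0 the weights decay polynomially rather than geometrically, which is why Pitman-Yor priors are popular for power-law phenomena such as word frequencies.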

Evolution of the Beta cuts
The parameter d controls the speed at which the Beta distribution of the cuts changes with k. The parameter α determines the initial shape of the Beta distribution. When d = 0, there is no change over iterations and the process is a Dirichlet process.
MATLAB DEMO
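The MATLAB demo itself is not in the transcript. As a small stand-in, the expected cut proportion E[V_k] = (1 − d) / (1 − d + α + kd) makes the evolution visible numerically: with d > 0 the expected cut shrinks as k grows, while d = 0 keeps every cut identically distributed.

```python
def mean_cut(d, alpha, k):
    """Mean of the k-th cut proportion V_k ~ Beta(1 - d, alpha + k*d)."""
    return (1.0 - d) / (1.0 - d + alpha + k * d)

# with d = 0.5 the expected cut shrinks as k grows ...
print([round(mean_cut(0.5, 1.0, k), 3) for k in (1, 10, 100)])   # [0.25, 0.077, 0.01]
# ... while d = 0 keeps every cut Beta(1, alpha) (the Dirichlet process case)
print([round(mean_cut(0.0, 1.0, k), 3) for k in (1, 10, 100)])   # [0.5, 0.5, 0.5]
```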

The Indian Buffet process
The Indian Buffet process was initially used to represent latent features. Observations are generated according to a set of unknown hidden features, and the model ensures that only a finite number of features have appreciable probability.

The Indian Buffet process
Recall the basic Stick-Breaking process, π_k = V_k ∏_{i=1}^{k−1} (1 − V_i). Here, we only consider the remaining parts, μ_k = ∏_{i=1}^{k} (1 − V_i), the lengths of the remaining stick. Each value μ_k corresponds to a feature's probability of appearance.
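A minimal sketch of this construction, assuming the usual parameterization in which the kept fraction ν_i = 1 − V_i is Beta(α, 1)-distributed (equivalent to V_i ~ Beta(1, α) above); the values α = 5.0, 30 features, and 10 objects are arbitrary illustrative choices.

```python
import random

def ibp_feature_probs(alpha, n_features, seed=3):
    """Feature probabilities mu_k = prod_{i<=k} nu_i with nu_i ~ Beta(alpha, 1).

    These are the lengths of the remaining stick, so they strictly decrease.
    """
    rng = random.Random(seed)
    probs, mu = [], 1.0
    for _ in range(n_features):
        mu *= rng.betavariate(alpha, 1.0)  # multiply by the kept fraction
        probs.append(mu)
    return probs

def sample_feature_matrix(probs, n_objects, seed=4):
    """Binary matrix Z: object i possesses feature k with probability mu_k."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in probs]
            for _ in range(n_objects)]

probs = ibp_feature_probs(alpha=5.0, n_features=30)
Z = sample_feature_matrix(probs, n_objects=10)
assert all(a >= b for a, b in zip(probs, probs[1:]))  # probabilities decrease
```

Unlike the mixture-model case, the μ_k need not sum to one: each object can possess several features at once, one Bernoulli draw per feature.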

Summary
The Dirichlet process induces a probability over infinitely many classes; it is the underlying de Finetti mixing distribution of the Chinese restaurant process. The Indian Buffet process induces a probability over infinitely many features; its underlying de Finetti mixing distribution is the Beta process.

De Finetti's theorem
It states that the distribution of any infinitely exchangeable sequence X_1, X_2, … can be written as p(X_1, …, X_n) = ∫ ∏_{i=1}^{n} p(X_i | G) dP(G), where P is the de Finetti mixing distribution.

The Beta process
This process is the underlying de Finetti mixing distribution of the Indian Buffet process; its realizations are discrete random measures whose atom weights lie in (0, 1).

Beta with Stick-Breaking
The Beta distribution has a Stick-Breaking representation which allows sampling from it (the construction itself appeared as an equation on the slides).

The Beta process A Beta process B ~ BP(c, B_0) is a discrete random measure B = Σ_k p_k δ_{ω_k} with atom weights p_k in (0, 1), arising as the infinite limit of finite Beta-Bernoulli models.
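The defining equation is missing from the transcript. One standard route to the Beta process, sketched below under the assumption of the usual finite Beta-Bernoulli model with weights p_k ~ Beta(α/N, 1) (whose N → ∞ limit yields the Indian Buffet process), is purely illustrative; the parameter values are arbitrary.

```python
import random

def finite_beta_bernoulli(alpha, n_atoms, n_objects, seed=5):
    """Finite Beta-Bernoulli model whose n_atoms -> infinity limit gives the IBP,
    with the atom weights converging to a draw from a Beta process."""
    rng = random.Random(seed)
    # weight of each atom: p_k ~ Beta(alpha / N, 1)
    weights = [rng.betavariate(alpha / n_atoms, 1.0) for _ in range(n_atoms)]
    # each object includes atom k independently with probability p_k
    Z = [[1 if rng.random() < p else 0 for p in weights]
         for _ in range(n_objects)]
    return weights, Z

weights, Z = finite_beta_bernoulli(alpha=3.0, n_atoms=500, n_objects=20)
assert all(0.0 <= p <= 1.0 for p in weights)
```

As N grows, most weights become negligible while a few stay appreciable, mirroring the "finite number of features with appreciable probability" behavior noted earlier.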

Stick-Breaking the Beta process
The Stick-Breaking construction of the Beta process is such that (equation shown on the slide).

Stick-Breaking the Beta process
Expanding the first terms (expansion shown on the slide).

Conclusion We briefly described various Stick-Breaking constructions for Bayesian nonparametric priors. These constructions help us understand the properties of each process and unveil connections among existing priors. The Stick-Breaking process might also help in constructing new priors.

Current work Applying a Stick-Breaking process to select the number of support points in a Gaussian process. Defining a stochastic process for unbounded random directed acyclic graphs and finding its underlying Stick-Breaking representation.