The Infinite Hierarchical Factor Regression Model
Piyush Rai and Hal Daume III, NIPS 2008
Presented by Bo Chen, March 26, 2009

Outline
- Introduction
- The Infinite Hierarchical Factor Regression Model
- Indian Buffet Process and Beta Process
- Experiment
- Summary

Introduction
Benefits of a latent factor representation:
1. Discovering the latent process underlying the data.
2. Simpler predictive modeling through a compact data representation.
Large P, small N (N >= 10 · d · C).
Fundamental advantages over the standard FA model:
1. It does not assume a known number of factors.
2. It does not assume the factors are independent.
3. It does not assume that all features are relevant to the factor analysis.

Algorithm: Model
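The model equations on this slide were an image and are not preserved in the transcript. As a rough, hedged reconstruction, a sparse Bayesian factor regression model of the kind the talk is about can be written as follows (the notation P, N, K, Lambda, F, Z, V, theta is assumed here, not copied from the slide):

\[
\mathbf{x}_n = \boldsymbol{\Lambda}\,\mathbf{f}_n + \mathbf{e}_n, \qquad \mathbf{e}_n \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}), \qquad n = 1, \dots, N,
\]
\[
\boldsymbol{\Lambda} = \mathbf{Z} \odot \mathbf{V}, \qquad \mathbf{Z} \in \{0,1\}^{P \times K} \sim \mathrm{IBP}(\alpha), \qquad V_{pk} \sim \mathcal{N}(0, \sigma_v^2),
\]
\[
y_n = \boldsymbol{\theta}^{\top}\mathbf{f}_n + \varepsilon_n \qquad \text{(factor regression: responses regressed on factor scores)}.
\]

Here x_n is a P-dimensional observation (one sample's gene expression), f_n its K-dimensional factor score vector, and Lambda the P x K factor loading matrix whose binary sparsity pattern Z receives the IBP prior, so the number of factors K need not be fixed in advance; the hierarchical prior of the later slides is placed over the factors.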

Graphical Model
T is used to eliminate spurious genes (noise features): T_p determines whether the p-th customer enters the restaurant to eat any dish at all.
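A hedged sketch of how such a gene-selection switch combines with the binary loading mask (the Bernoulli form and the symbol rho are assumptions, not taken from the slide):
\[
T_p \sim \mathrm{Bernoulli}(\rho), \qquad \text{row } p \text{ of } \mathbf{Z} \text{ is forced to } \mathbf{0} \text{ whenever } T_p = 0,
\]
so a gene with T_p = 0 loads on no factor and is effectively treated as noise.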

Indian Buffet Process: from latent classes to latent features
Start from a finite feature model (Tom Griffiths, 2006) and take the limit of infinitely many features: an Indian restaurant with a countably infinite number of dishes.
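The finite feature model referred to here is the standard one from the IBP derivation (reproduced from Griffiths and Ghahramani's construction rather than from the slide itself):
\[
\pi_k \sim \mathrm{Beta}\!\left(\tfrac{\alpha}{K},\, 1\right), \qquad z_{ik} \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k), \qquad k = 1, \dots, K,
\]
and letting K go to infinity (after integrating out the pi's and identifying matrices up to column permutations) yields the Indian Buffet Process as a prior over binary matrices with unboundedly many columns.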

Differences between DP and IBP
The DP yields a class-assignment matrix in which each row (data point) has exactly one active entry; the IBP yields a 'class' (feature) matrix in which each row can have any number of active entries. Different styles match different problems: latent feature modeling, clustering, and others.
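A toy illustration (the numbers are invented purely for illustration): a DP-style assignment matrix has exactly one active entry per row, while an IBP-style feature matrix can activate several:
\[
Z_{\mathrm{DP}} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\qquad
Z_{\mathrm{IBP}} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}.
\]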

Two-Parameter Finite Model (Z. Ghahramani et al., 2006)
- The first customer samples Poisson(alpha) dishes.
- The i-th customer samples a previously sampled dish k with probability m_k / (beta + i - 1), then samples Poisson(alpha * beta / (beta + i - 1)) new dishes.

Beta Process vs. IBP
Under the Beta process construction:
- The first customer samples Poisson(gamma) dishes, where gamma is the total mass of the base measure.
- The i-th customer samples a previously sampled dish k with probability m_k / (c + i - 1), then samples Poisson(c * gamma / (c + i - 1)) new dishes; with concentration c = 1 this recovers the one-parameter IBP.
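A minimal Python sketch of the two-parameter restaurant process described above (the function name, parameter names, and defaults are my own; it assumes NumPy and is only meant to show the mechanics):

import numpy as np

def sample_ibp(num_customers, alpha=2.0, beta=1.0, rng=None):
    """Draw a binary feature matrix Z from the two-parameter IBP restaurant
    metaphor: rows are customers (features/genes), columns are dishes (factors)."""
    rng = np.random.default_rng(rng)
    dish_counts = []   # m_k: number of earlier customers who took dish k
    rows = []          # dish indicators per customer (ragged until padded)
    for i in range(1, num_customers + 1):
        row = []
        # Previously sampled dishes: taken with probability m_k / (beta + i - 1).
        for m_k in dish_counts:
            row.append(int(rng.random() < m_k / (beta + i - 1)))
        # Brand-new dishes: Poisson(alpha * beta / (beta + i - 1)) of them.
        k_new = rng.poisson(alpha * beta / (beta + i - 1))
        row.extend([1] * k_new)
        # Update dish popularity counts (existing dishes first, then the new ones).
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * k_new
        rows.append(row)
    # Pad the ragged rows into a dense binary matrix.
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=3.0, beta=1.0, rng=0)
print(Z.shape, Z.sum(axis=0))   # matrix size and per-dish popularity

With beta = 1 this reduces to the one-parameter IBP; roughly speaking, larger beta yields more total dishes that are each shared by fewer customers, while the expected number of dishes per customer stays around alpha.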

Hierarchical Factor Prior
Kingman's coalescent: a distribution over the genealogy of a countably infinite set of individuals, used to construct the tree structure over factors.
Brownian diffusion: a Markov process that propagates messages (means and covariances) through the nodes of that tree.
Y. W. Teh, H. Daume III, and D. M. Roy. Bayesian Agglomerative Clustering with Coalescents. In NIPS, 2008.
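A hedged sketch of the Brownian-diffusion part (the notation is assumed, not taken from the slide): along a tree branch from parent node u to child node v, with branch length Delta_{uv}, the latent factor value diffuses as
\[
\mathbf{y}_v \mid \mathbf{y}_u \sim \mathcal{N}\!\left(\mathbf{y}_u,\; \sigma^2\, \Delta_{uv}\, \mathbf{I}\right),
\]
which is why each node's message can be summarized by a mean and a covariance and propagated in closed form over the coalescent tree.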

Feature Selection Prior
Some genes are spurious: before selecting dishes, these 'spurious' customers should leave the restaurant.

(Figure provided by Piyush Rai.)

Experimental Results
E. coli data: 100 samples, 50 genes, 8 underlying factors.
Breast cancer data: 251 samples, 226 genes, 5 underlying factors.
1. The hierarchy can be used to find factors in order of their prominence.
2. Hierarchical modeling gives better predictive performance on the factor regression task.
3. The factor hierarchy leads to faster convergence, since most unlikely configurations are never visited: they are ruled out by the hierarchy.

Comparison of factor loading matrices learned by different methods (panels: Ground Truth; NIPS method; Sparse BPFA on the factor loadings, VB; Sparse BPFA on the factor scores, VB).

Factor Regression
Training and test data are combined, and the test responses are treated as missing values to be imputed.
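A hedged sketch of what that imputation amounts to under the regression model sketched earlier (notation assumed, not from the slide): in each sampling iteration a missing test response is drawn from its conditional given the current factor scores,
\[
y_* \mid \boldsymbol{\theta}, \mathbf{f}_*, \sigma^2 \;\sim\; \mathcal{N}\!\left(\boldsymbol{\theta}^{\top}\mathbf{f}_*,\; \sigma^2\right),
\]
so prediction falls out of the same posterior inference that fits the factors, rather than requiring a separate regression step.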

Existing Similar FA Models
Putting the binary matrix on the factor score matrix:
- David Knowles and Zoubin Ghahramani. Infinite Sparse Factor Analysis and Infinite Independent Components Analysis. ICA 2007.
- John Paisley et al. Nonparametric Factor Analysis with Beta Process Priors. In submission.
Putting the binary matrix on the factor loading matrix:
- Piyush Rai and Hal Daume III. The Infinite Hierarchical Factor Regression Model. NIPS 2008.
Summary:
1. For 'large P, small N' problems, the first approach is faster, since it learns the smaller K x N factor score matrix; with an MCMC solution it is difficult for the second approach to handle problems with tens of thousands of genes.
2. The second approach can explain the relationship between genes and factors (pathways).
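A hedged sketch of the two placements of the binary mask (notation assumed: X is the P x N data matrix, Lambda the P x K loadings, F the K x N scores, Z a binary mask, S and V the real-valued matrices being masked, and odot the elementwise product):
\[
\underbrace{\mathbf{X} \approx \boldsymbol{\Lambda}\,\big(\mathbf{Z}_{K \times N} \odot \mathbf{S}\big)}_{\text{mask on the factor scores}}
\qquad \text{vs.} \qquad
\underbrace{\mathbf{X} \approx \big(\mathbf{Z}_{P \times K} \odot \mathbf{V}\big)\,\mathbf{F}}_{\text{mask on the factor loadings}}.
\]
The first mask has K x N entries (small when N is small), while the second has P x K entries (large when P is tens of thousands of genes), which is the speed-versus-interpretability trade-off summarized above.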

New Developments of the IBP
- F. Doshi, K. T. Miller, J. Van Gael and Y. W. Teh. Variational Inference for the Indian Buffet Process. AISTATS.
- Jurgen Van Gael, Yee Whye Teh and Zoubin Ghahramani. The Infinite Factorial Hidden Markov Model. NIPS.
- K. A. Heller and Zoubin Ghahramani. A Nonparametric Bayesian Approach to Modeling Overlapping Clusters. AISTATS 2007.