J. Zhu, A. Ahmed and E.P. Xing Carnegie Mellon University ICML 2009


MedLDA: Maximum Margin Supervised Topic Models for Regression and Classification
Presented by Haojun Chen
Source: http://www.cs.cmu.edu/~junzhu/medlda.htm

Outline
Motivation
Supervised topic model (sLDA) and support vector regression (SVR)
Maximum entropy discrimination LDA (MedLDA)
MedLDA for Regression
MedLDA for Classification
Experimental Results
Conclusion

Motivation
Learning latent topic models with side information, such as sLDA, has attracted increasing attention.
In sLDA, maximum likelihood estimation is used for posterior inference and parameter estimation.
Max-margin methods for classification, such as SVM, have demonstrated success in many applications.
This paper proposes a general principle for learning max-margin discriminative supervised latent topic models for both regression and classification.

Supervised Topic Model (sLDA) Joint distribution for sLDA Variational MLE for sLDA
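The slide's equations are not preserved in the transcript. For reference, the sLDA joint distribution can be reconstructed from Blei and McAuliffe's sLDA formulation (notation assumed: $\theta$ topic proportions, $z_n$ topic assignments, $\bar z$ their average over the $N$ words of a document):

```latex
p(\theta, z_{1:N}, w_{1:N}, y \mid \alpha, \beta, \eta, \sigma^2)
  = p(\theta \mid \alpha) \left[ \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right]
    p(y \mid z_{1:N}, \eta, \sigma^2),
\qquad
y \mid z_{1:N} \sim \mathcal{N}\!\left(\eta^\top \bar z,\; \sigma^2\right),
\quad
\bar z = \frac{1}{N}\sum_{n=1}^{N} z_n .
```

Variational MLE then maximizes a lower bound on $\log p(w_{1:N}, y)$ with a fully factorized variational distribution, as in standard LDA.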

Support Vector Regression (SVR)
Given a training set of input–response pairs, the linear SVR finds an optimal linear function by solving the following constrained convex optimization problem.
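The optimization problem itself is missing from the transcript; the standard $\epsilon$-insensitive linear SVR primal, which matches the description above, is:

```latex
\min_{w,\, b,\, \xi,\, \xi^*} \;\; \frac{1}{2}\lVert w \rVert^2
  + C \sum_{d=1}^{D} \left( \xi_d + \xi_d^* \right)
\quad \text{s.t.} \quad
\begin{cases}
  y_d - w^\top x_d - b \;\le\; \epsilon + \xi_d, \\[2pt]
  w^\top x_d + b - y_d \;\le\; \epsilon + \xi_d^*, \\[2pt]
  \xi_d \ge 0, \;\; \xi_d^* \ge 0, \qquad d = 1, \dots, D.
\end{cases}
```

Predictions within the $\epsilon$-tube incur no loss; the slack variables $\xi_d, \xi_d^*$ absorb violations above and below the tube.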

Max-Entropy Discrimination LDA (MedLDA)
Maximum entropy discrimination LDA (MedLDA): an integration of max-margin prediction models (e.g., SVR and SVM) and hierarchical Bayesian topic models (e.g., LDA and sLDA).
Specifically, in MedLDA a distribution over latent topics is learned in a max-margin manner.
Both regression and classification variants of MedLDA are developed in this paper.

MedLDA for Regression
For regression, MedLDA is defined as an integration of Bayesian sLDA and SVR, optimized over a variational approximation to the posterior.
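A reconstruction of the MedLDA regression problem from the paper (notation assumed: $q$ is the variational posterior, $\mathcal{L}(q)$ the variational objective inherited from Bayesian sLDA, and $\bar Z_d$ the average topic assignment of document $d$):

```latex
\min_{q,\, \alpha,\, \beta,\, \delta^2,\, \xi,\, \xi^*} \;\;
  \mathcal{L}(q) + C \sum_{d=1}^{D} \left( \xi_d + \xi_d^* \right)
\quad \text{s.t.} \quad
\begin{cases}
  y_d - \mathbb{E}\!\left[ \eta^\top \bar Z_d \right] \;\le\; \epsilon + \xi_d, \\[2pt]
  \mathbb{E}\!\left[ \eta^\top \bar Z_d \right] - y_d \;\le\; \epsilon + \xi_d^*, \\[2pt]
  \xi_d \ge 0, \;\; \xi_d^* \ge 0, \qquad \forall d.
\end{cases}
```

The SVR-style margin constraints thus act on the expected response $\mathbb{E}[\eta^\top \bar Z_d]$ under the variational distribution, coupling the max-margin criterion to topic inference.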

EM Algorithm for MedLDA Regression
Variational EM algorithm: the key difference between sLDA and MedLDA lies in the update of the variational distribution over topic assignments, which additionally involves the dual variables of the margin constraints.
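A minimal NumPy sketch of that difference, assuming the simplified log-space form of the per-word update in which the margin dual variables $(\mu_d, \mu_d^*)$ contribute an extra linear term; the quadratic terms from the sLDA response likelihood are omitted for brevity, and all names are illustrative:

```python
import numpy as np

def medlda_phi_update(elog_theta, log_beta_w, e_eta, mu=0.0, mu_star=0.0, n_words=1):
    """Sketch of the per-word topic-posterior update phi_{dn} over K topics.

    elog_theta : E[log theta_d]     (K,) from the current Dirichlet q(theta)
    log_beta_w : log beta[:, w_dn]  (K,) topic-word log-probabilities
    e_eta      : E[eta]             (K,) expected regression coefficients
    mu, mu_star: dual variables of document d's two margin constraints

    When mu == mu_star, the margin term vanishes and the update reduces to
    the sLDA/LDA-style update; otherwise the duals tilt phi toward topics
    that move the prediction in the direction the violated constraint needs.
    """
    log_phi = elog_theta + log_beta_w + ((mu - mu_star) / n_words) * e_eta
    log_phi -= log_phi.max()      # subtract max for numerical stability
    phi = np.exp(log_phi)
    return phi / phi.sum()        # normalize to a distribution over topics
```

In the full algorithm the duals would come from solving the SVR dual problem at each iteration; here they are simply passed in to show where they enter the E-step.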

MedLDA for Classification
Similar to the regression model, the integrated LDA and multi-class classification model is defined as a constrained optimization problem, with a margin constraint for each document and each incorrect label.
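A reconstruction of the classification problem from the paper, hedged with unit margin cost (notation assumed: $\Delta f_d(y)$ is the difference between the expected feature vectors for the true label $y_d$ and a competing label $y$):

```latex
\min_{q,\, \alpha,\, \beta,\, \xi} \;\;
  \mathcal{L}(q) + C \sum_{d=1}^{D} \xi_d
\quad \text{s.t.} \quad
\mathbb{E}\!\left[ \eta^\top \Delta f_d(y) \right] \;\ge\; 1 - \xi_d,
\;\; \xi_d \ge 0,
\qquad \forall d,\; \forall y \ne y_d .
```

Each document contributes one slack variable shared across all incorrect labels, mirroring the multi-class SVM formulation.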

EM Algorithm for MedLDA Classification
Similar to the EM algorithm for MedLDA regression; the update equation for the topic-assignment distribution again incorporates the dual variables of the margin constraints.

Embedding Results
2-D embeddings of the learned topic representations on the 20 Newsgroups dataset, comparing MedLDA against unsupervised LDA.

Example Topics Discovered

Classification Results
Classification performance on the 20 Newsgroups data, reported as a relative ratio against a baseline model.

Regression Results
Movie Review Data

Time Efficiency

Conclusion
MedLDA integrates max-margin prediction models and hierarchical Bayesian topic models by optimizing a single objective function subject to a set of expected margin constraints.