Generative and Discriminative Models. Jie Tang, Department of Computer Science & Technology, Tsinghua University, 2012.


1 Generative and Discriminative Models Jie Tang Department of Computer Science & Technology Tsinghua University 2012

2 ML as Searching a Hypothesis Space
ML methodologies are increasingly statistical:
– Rule-based expert systems are being replaced by probabilistic generative models
– Example: autonomous agents in AI
– Greater availability of data and computational power makes it possible to migrate away from rule-based, manually specified models toward probabilistic, data-driven models

Method            | Hypothesis space
Concept learning  | Boolean expressions
Decision trees    | All possible trees
Neural networks   | Weight space
Transfer learning | Different spaces

3 Generative and Discriminative Models
An example task: determining the language that someone is speaking.
Generative approach:
– learn a model of each language, then determine which language the speech most likely came from.
Discriminative approach:
– learn the linguistic differences between the languages, without building a full model of any single language.

4 Generative and Discriminative Models
Generative methods
– Model class-conditional pdfs and prior probabilities
– "Generative" since sampling from the model can generate synthetic data points
– Popular models: Gaussians, Naïve Bayes, mixtures of multinomials; mixtures of Gaussians, mixtures of experts, Hidden Markov Models (HMMs); sigmoid belief networks, Bayesian networks, Markov random fields
Discriminative methods
– Directly estimate the posterior probabilities
– No attempt to model the underlying probability distributions
– Focus computational resources on the given task, which often yields better performance
– Popular models: logistic regression, SVMs; traditional neural networks, nearest neighbor; Conditional Random Fields (CRFs)

5 Generative and Discriminative Pairs
Data-point-based:
– Naïve Bayes and logistic regression form a generative-discriminative pair for classification
Sequence-based:
– HMMs and linear-chain CRFs form a generative-discriminative pair for sequential data

6 Graphical Model Relationship (figure slide; the diagram relating the model pairs is not preserved in this transcript)

7 Generative Classifier: Naïve Bayes
Given variables x = (x_1, ..., x_M) and class variable y, the joint pdf is p(x, y)
– Called a generative model since we can sample from it to generate artificial data points
Given the full joint pdf we can:
– Marginalize: p(y) = Σ_x p(x, y)
– Condition: p(y | x) = p(x, y) / p(x)
– By conditioning the joint pdf we form a classifier
Computational problem:
– If x is binary then we need 2^M values per class
– If roughly 100 samples are needed to estimate each probability, M = 10, and there are two classes, the joint table has 2 × 2^10 = 2048 entries, requiring on the order of 200,000 samples

8 Naïve Bayes Classifier
Naïve Bayes avoids the exponential blow-up by assuming the features are conditionally independent given the class:
p(x, y) = p(y) Π_{j=1..M} p(x_j | y)
and classifies by ŷ = argmax_y p(y) Π_j p(x_j | y), reducing the 2^M-entry joint table to M parameters per class.
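The factorized model above can be sketched in a few lines. This is a minimal Bernoulli Naïve Bayes for binary features, written with NumPy; the function names, the toy data, and the use of Laplace smoothing (alpha) are illustrative choices, not part of the original slides.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli parameters
    theta[c, j] = p(x_j = 1 | y = c), with Laplace smoothing alpha."""
    classes = np.unique(y)
    priors = np.array([(y == c).mean() for c in classes])
    theta = np.array([(X[y == c].sum(axis=0) + alpha) /
                      ((y == c).sum() + 2 * alpha) for c in classes])
    return classes, priors, theta

def predict_bernoulli_nb(X, classes, priors, theta):
    # log p(y) + sum_j log p(x_j | y), evaluated for every class
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
    return classes[np.argmax(np.log(priors) + log_lik, axis=1)]

# Tiny synthetic example: M = 3 binary features, two classes.
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
classes, priors, theta = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(X, classes, priors, theta))  # [0 0 1 1]
```

Note that the model stores only 2M + 1 numbers here (theta plus a prior) rather than a 2^M-entry joint table, which is exactly the saving the slide describes.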

9 Discriminative Classifier: Logistic Regression
Binary logistic regression models the posterior directly:
p(y = 1 | x) = σ(w^T x), where σ(a) = 1 / (1 + e^{-a}) is the logistic (sigmoid) function.
How do we fit w? Given training pairs (x_i, y_i) with y_i ∈ {0, 1}, we maximize the log likelihood
l(w) = Σ_i [ y_i log σ(w^T x_i) + (1 - y_i) log(1 - σ(w^T x_i)) ],
typically by gradient-based optimization, since no closed-form solution exists.
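The maximum-likelihood fit described above can be sketched with plain batch gradient ascent; the gradient of l(w) is X^T (y - σ(Xw)). The learning rate, step count, and toy data below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(X, y, lr=0.1, n_steps=1000):
    """Maximize the logistic log likelihood by batch gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        # gradient of the log likelihood: X^T (y - sigma(Xw))
        w += lr * X.T @ (y - sigmoid(X @ w))
    return w

# Toy 1-D data with a bias column prepended: negative x -> class 0.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(int)
print(preds)  # [0 0 1 1]
```

Unlike the Naïve Bayes sketch, nothing here models p(x); all parameters go into the decision boundary itself, which is the discriminative trade-off the slides describe.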

10 Logistic Regression vs. Bayes Classifier
The posterior probability of the class variable y can always be written as
p(y = 1 | x) = σ(a), where a = ln [ p(x | y = 1) p(y = 1) / ( p(x | y = 0) p(y = 0) ) ].
In a generative model we estimate the class-conditionals and priors, which are then used to determine a.
In the discriminative approach we directly estimate a as a linear function of x, i.e. a = w^T x.
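Why is a linear function of x a reasonable form for a? A standard derivation (found, e.g., in Bishop's Pattern Recognition and Machine Learning, which this deck draws on) shows that for Gaussian class-conditionals with a shared covariance Σ, the quadratic terms in a cancel and a is exactly linear in x:

```latex
a = \ln\frac{p(x \mid y=1)\,p(y=1)}{p(x \mid y=0)\,p(y=0)}
  = \ln\frac{\mathcal{N}(x \mid \mu_1, \Sigma)\,p(y=1)}
            {\mathcal{N}(x \mid \mu_0, \Sigma)\,p(y=0)}
  = w^{\top} x + w_0,
\qquad
w = \Sigma^{-1}(\mu_1 - \mu_0),
\qquad
w_0 = -\tfrac{1}{2}\mu_1^{\top}\Sigma^{-1}\mu_1
      + \tfrac{1}{2}\mu_0^{\top}\Sigma^{-1}\mu_0
      + \ln\frac{p(y=1)}{p(y=0)}.
```

So the logistic-regression form p(y = 1 | x) = σ(w^T x) contains the shared-covariance Gaussian generative classifier as a special case, while estimating w directly.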

11 Logistic Regression Parameters
For an M-dimensional feature space, logistic regression has M parameters, w = (w_1, ..., w_M).
By contrast, the generative approach
– fitting Gaussian class-conditional densities requires 2M parameters for the two means, M(M+1)/2 parameters for the shared covariance matrix, and one for the class prior p(y = 1)
– this quadratic growth can be reduced to O(M) parameters by assuming feature independence, as in Naïve Bayes
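The counts above are easy to sanity-check with a couple of lines (the helper names are ours, not the slides'):

```python
def logistic_params(M):
    # one weight per feature, as on the slide
    return M

def gaussian_generative_params(M):
    # two class means + shared covariance + one class prior
    return 2 * M + M * (M + 1) // 2 + 1

for M in (10, 100):
    print(M, logistic_params(M), gaussian_generative_params(M))
# M = 10 gives 10 vs. 76 parameters; M = 100 gives 100 vs. 5251
```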

12 Summary
Generative and discriminative methods are two basic approaches in machine learning:
– the former model the joint distribution; the latter solve the classification problem directly
Generative-discriminative pairs:
– Naïve Bayes and logistic regression are a corresponding pair for classification
– HMMs and CRFs are a corresponding pair for sequential data
Generative models are more elegant and have explanatory power.
Discriminative models often perform better in language-related tasks.

13 Thanks! Jie Tang, DCST