
Review Rong Jin

Comparison of Different Classification Models
- The goal of all classifiers: predicting the class label y for an input x, i.e., estimating p(y|x)

K Nearest Neighbor (kNN) Approach
(Figure: example neighborhoods for k = 1 and k = 4.)
- Probability interpretation: estimate p(y|x) as the fraction of the k nearest neighbors of x that carry label y:
  p(y|x) ≈ Σ_{x_i ∈ N(x)} I(y_i = y) / k
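
A minimal NumPy sketch of this estimate (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def knn_posterior(X_train, y_train, x, k=4):
    """Estimate p(y|x) as the fraction of the k nearest neighbors with each label."""
    dists = np.linalg.norm(X_train - x, axis=1)           # distances from x to all training points
    neighbors = y_train[np.argsort(dists)[:k]]            # labels of the k closest points
    return {c: float(np.mean(neighbors == c)) for c in np.unique(y_train)}
```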

K Nearest Neighbor Approach (kNN)
- What is the appropriate size for the neighborhood N(x)? A leave-one-out approach
- Weighted k nearest neighbor: the neighborhood is defined through a weight function, and p(y|x) is estimated from the weights
- How to estimate the appropriate value of σ² (the width of the weight function)?

Weighted K Nearest Neighbor
- Leave-one-out + maximum likelihood
- Estimate the leave-one-out probability of each training example
- Form the leave-one-out likelihood of the training data
- Search for the optimal σ² by maximizing the leave-one-out likelihood
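
A sketch of this search, assuming the Gaussian weight w(x, x′) = exp(−‖x − x′‖² / (2σ²)) commonly used for weighted kNN:

```python
import numpy as np

def loo_log_likelihood(X, y, sigma2):
    """Leave-one-out log-likelihood under Gaussian-weighted kNN."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma2))
    np.fill_diagonal(W, 0.0)                              # leave each point out of its own vote
    ll = 0.0
    for i in range(len(X)):
        p = W[i, y == y[i]].sum() / W[i].sum()            # leave-one-out p(y_i | x_i)
        ll += np.log(max(p, 1e-12))                       # guard against log(0)
    return ll

def select_sigma2(X, y, grid=(0.01, 0.1, 1.0, 10.0)):
    # Search the grid for the sigma^2 maximizing the leave-one-out likelihood.
    return max(grid, key=lambda s2: loo_log_likelihood(X, y, s2))
```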

Gaussian Generative Model
- p(y|x) ∝ p(x|y) p(y): posterior = likelihood × prior
- Estimate p(x|y) and p(y)
- Allocate a separate set of parameters for each class: θ = {θ_1, θ_2, …, θ_c}, so p(x|y; θ) = p(x; θ_y)
- Maximum likelihood estimation

Gaussian Generative Model
- Difficult to estimate p(x|y) if x is of high dimensionality
- Naïve Bayes: treat the features as conditionally independent given the class; essentially a linear model
- How to make a Gaussian generative model discriminative? The (μ_m, Σ_m) of each class are estimated only from the data belonging to that class → lack of discriminative power
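
A concrete sketch of the naive Bayes special case, with one diagonal Gaussian per class (the variance smoothing constant is an arbitrary choice):

```python
import numpy as np

class GaussianNaiveBayes:
    """Gaussian generative model with a diagonal covariance per class."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = np.array([(y == c).mean() for c in self.classes])        # p(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])    # class means
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def posterior(self, x):
        # log p(x|y) under independent Gaussians, plus the log prior
        log_lik = -0.5 * (np.log(2 * np.pi * self.var) + (x - self.mu) ** 2 / self.var).sum(axis=1)
        log_post = log_lik + np.log(self.prior)
        p = np.exp(log_post - log_post.max())             # stabilized exponentiation
        return dict(zip(self.classes, p / p.sum()))       # normalized p(y|x)
```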

Gaussian Generative Model
- Maximum likelihood estimation
- How to optimize this objective function?

Gaussian Generative Model
- Bound optimization algorithm

Gaussian Generative Model
- We have decomposed the interaction of parameters between different classes
- Question: how do we handle an x with multiple features?

Logistic Regression Model
- A linear decision boundary: w·x + b
- A probabilistic model: p(y|x) = 1 / (1 + exp(−y (w·x + b))) for y ∈ {−1, +1}
- Maximum likelihood approach for estimating the weights w and the threshold b
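
A sketch of the maximum likelihood fit by gradient ascent (learning rate and epoch count are arbitrary choices):

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Maximum likelihood logistic regression; labels y are in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        g = y * (1.0 - 1.0 / (1.0 + np.exp(-margins)))    # y * sigma(-margin)
        w += lr * (X.T @ g) / n                           # gradient of the mean log-likelihood
        b += lr * g.mean()
    return w, b
```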

Logistic Regression Model
- Overfitting issue
- Example: text classification. A word that appears in only one document will be assigned an infinitely large weight
- Solution: regularization; add a regularization term to the objective
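
The regularized variant only changes the update: shrink w toward zero at each step (the L2 penalty is a standard choice here; lambda_ is a tuning parameter):

```python
import numpy as np

def fit_logistic_l2(X, y, lr=0.1, epochs=200, lambda_=0.01):
    """Logistic regression with an L2 regularization term on w."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        g = y * (1.0 - 1.0 / (1.0 + np.exp(-y * (X @ w + b))))
        w += lr * ((X.T @ g) / n - lambda_ * w)           # likelihood gradient minus penalty
        b += lr * g.mean()
    return w, b
```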

Non-linear Logistic Regression Model
- Kernelize the logistic regression model
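
One way to kernelize, as a sketch: represent w as a combination of the training points, so the model only ever touches the kernel matrix (the RBF kernel and the penalty on α are assumptions, not from the slides):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_logistic(X, y, gamma=1.0, lr=0.1, epochs=200, lambda_=0.01):
    """Kernel logistic regression: w = sum_i alpha_i phi(x_i), so the
    decision function is f(x) = sum_i alpha_i K(x_i, x) + b."""
    K = rbf_kernel(X, X, gamma)
    alpha, b = np.zeros(len(X)), 0.0
    for _ in range(epochs):
        g = y * (1.0 - 1.0 / (1.0 + np.exp(-y * (K @ alpha + b))))
        alpha += lr * (K @ g / len(X) - lambda_ * alpha)  # gradient in the dual coefficients
        b += lr * g.mean()
    return alpha, b
```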

Hierarchical Mixture Expert Model
- Group linear classifiers into a tree structure
(Diagram: a root gate r(x) routes to a group layer g_1(x), g_2(x); each group routes to an expert layer m_1,1(x), m_1,2(x), m_2,1(x), m_2,2(x).)
- The products of gating and expert functions generate nonlinearity in the prediction function

Non-linear Logistic Regression Model
- Assuming that all data points can be fitted by a single linear model may be too rough an assumption
- But it is usually appropriate to assume a locally linear model
- kNN can be viewed as a localized model without any parameters
- Can we extend the kNN approach by introducing a localized linear model?

Localized Logistic Regression Model
- Similar to weighted kNN: weigh each training example by a weight function centered at the query point (e.g., the Gaussian weight used above)
- Build a logistic regression model using the weighted examples
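
A sketch of the weighted fit, again assuming the Gaussian weight around a query point x0:

```python
import numpy as np

def fit_local_logistic(X, y, x0, sigma2=1.0, lr=0.1, epochs=200):
    """Localized logistic regression: weighted MLE around the query point x0."""
    wts = np.exp(-((X - x0) ** 2).sum(1) / (2 * sigma2))  # per-example weights
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        g = wts * y * (1.0 - 1.0 / (1.0 + np.exp(-y * (X @ w + b))))
        w += lr * (X.T @ g) / wts.sum()                   # weighted log-likelihood gradient
        b += lr * g.sum() / wts.sum()
    return w, b
```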

Conditional Exponential Model
- An extension of the logistic regression model to the multi-class case
- A different set of weights w_y and threshold b_y for each class y
- Translation invariance: shifting all scores w_y·x + b_y by the same constant leaves p(y|x) unchanged
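
A sketch of the resulting softmax posterior; subtracting the maximum score is a direct use of the translation invariance:

```python
import numpy as np

def softmax_posterior(W, b, x):
    """p(y|x) = exp(w_y . x + b_y) / sum_y' exp(w_y' . x + b_y'),
    with one row of W and one entry of b per class."""
    scores = W @ x + b
    scores -= scores.max()            # translation invariance makes this shift harmless
    p = np.exp(scores)
    return p / p.sum()
```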

Maximum Entropy Model
- Find the simplest model that matches the data:
  - Maximize entropy → prefer the uniform distribution
  - Constraints → enforce the model to be consistent with the observed data
- Iterative scaling methods for optimization

Support Vector Machine
- Classification margin: the distance from the decision boundary to the closest training examples
- Maximum margin principle: separate the data as far as possible from the decision boundary
- Two objectives:
  - Minimize the classification error over the training data
  - Maximize the classification margin
- Support vectors: only the support vectors have an impact on the location of the decision boundary
(Figure: '+1' and '-1' points with the maximum-margin boundary; the points on the margin are the support vectors.)

Support Vector Machine
- Separable case: minimize ‖w‖²/2 subject to y_i (w·x_i + b) ≥ 1 for all i
- Noisy case: add slack variables ξ_i ≥ 0 and a penalty C Σ_i ξ_i, with the constraints relaxed to y_i (w·x_i + b) ≥ 1 − ξ_i
- Either way, a quadratic programming problem!
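
Rather than calling a QP solver, the sketch below runs subgradient descent on the equivalent unconstrained hinge-loss form (step size and epoch count are arbitrary choices):

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize ||w||^2 / 2 + C * sum_i max(0, 1 - y_i (w . x_i + b));
    labels y are in {-1, +1}. A sketch, not a production QP solver."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1                        # margin violations
        w -= lr * (w - C * (y[viol, None] * X[viol]).sum(axis=0))
        b -= lr * (-C * y[viol].sum())
    return w, b
```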

Logistic Regression Model vs. Support Vector Machine
- Logistic regression model: logistic loss
- Support vector machine: hinge loss
- A different loss function for punishing mistakes; the remaining (regularization) terms are identical

Logistic Regression Model vs. Support Vector Machine
- Logistic regression differs from the support vector machine only in the loss function
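
The difference is easy to see numerically, writing the margin as m = y (w·x + b):

```python
import numpy as np

logistic_loss = lambda m: np.log(1 + np.exp(-m))   # logistic regression
hinge_loss = lambda m: np.maximum(0.0, 1 - m)      # support vector machine

for m in (-1.0, 0.0, 1.0, 2.0):
    print(f"margin {m:+.1f}: logistic {logistic_loss(m):.3f}, hinge {hinge_loss(m):.3f}")
```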

Kernel Tricks
- Introduce nonlinearity into the discriminative models
- Diffusion kernel:
  - A graph Laplacian L encodes local similarity
  - The diffusion kernel propagates the local similarity information into a global one
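
A sketch, assuming the usual construction K = exp(−βL) from the Laplacian L = D − A of a similarity graph:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta=1.0):
    """Diffusion kernel from an adjacency matrix A: K = exp(-beta * L)."""
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian (local similarity)
    return expm(-beta * L)               # matrix exponential spreads it globally
```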

Fisher Kernel
- Derive a kernel function from a generative model
- Key idea:
  - Map a point x from the original input space into the model space
  - Measure the similarity of two data points in the model space
(Diagram: original input space → model space; similarity is measured in the model space.)
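
A deliberately tiny sketch for a one-dimensional Gaussian model, with the Fisher information matrix approximated by the identity (a common simplification, not from the slides):

```python
import numpy as np

def fisher_score(x, mu, sigma2):
    """U_x = grad_theta log N(x; mu, sigma^2): the map into model space."""
    d_mu = (x - mu) / sigma2
    d_s2 = ((x - mu) ** 2 - sigma2) / (2 * sigma2 ** 2)
    return np.array([d_mu, d_s2])

def fisher_kernel(x1, x2, mu, sigma2):
    # Similarity of two points measured in the model (gradient) space.
    return fisher_score(x1, mu, sigma2) @ fisher_score(x2, mu, sigma2)
```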

Kernel Methods in Generative Models
- Usually, kernels can be introduced into a generative model through a Gaussian process
- Define a "kernelized" covariance matrix
- It must be positive semi-definite, similar to Mercer's condition

Multi-class SVM
- SVMs can only handle two-class outputs
- One-against-all: learn N SVMs
  - SVM 1 learns "Output == 1" vs. "Output != 1"
  - SVM 2 learns "Output == 2" vs. "Output != 2"
  - ...
  - SVM N learns "Output == N" vs. "Output != N"
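
A sketch of the one-against-all wrapper around any two-class trainer (assumed to return (w, b), as the SVM sketch above does):

```python
import numpy as np

def fit_one_vs_all(X, y, fit_binary):
    """Train one binary classifier per class: class c vs. the rest."""
    models = {}
    for c in np.unique(y):
        y_c = np.where(y == c, 1, -1)     # "Output == c" vs "Output != c"
        models[c] = fit_binary(X, y_c)
    return models

def predict_one_vs_all(models, x):
    # Pick the class whose classifier gives the largest score.
    return max(models, key=lambda c: models[c][0] @ x + models[c][1])
```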

Error-Correcting Output Codes (ECOC)
- Encode each class into a bit vector
(Table: one codeword of bits S1, S2, S3, S4 per class; an instance x is classified by comparing its predicted bits with the codewords.)
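
A sketch of the decoding step: run the per-bit classifiers, then pick the class whose codeword is closest in Hamming distance (the codewords below are made up for illustration):

```python
import numpy as np

def ecoc_predict(codebook, bit_predictions):
    """Decode by Hamming distance between predicted bits and each codeword."""
    dists = {c: int(np.sum(code != bit_predictions)) for c, code in codebook.items()}
    return min(dists, key=dists.get)

codebook = {"class1": np.array([1, 0, 1, 0]),
            "class2": np.array([1, 1, 0, 0]),
            "class3": np.array([0, 1, 1, 1])}
print(ecoc_predict(codebook, np.array([1, 0, 1, 1])))   # -> 'class1' (distance 1)
```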

Ordinal Regression
- A special class of multi-class classification problems
- There is a natural ordinal relationship between the multiple classes (e.g., 'good' > 'OK' > 'bad')
- Maximum margin principle: the computation of the margin involves multiple classes
(Figure: the classes 'good', 'OK', 'bad' ordered along the direction of the weight vector w.)

Ordinal Regression

Decision Tree
(From the slides of Andrew Moore)

Decision Tree
- A greedy approach for generating a decision tree:
  1. Choose the most informative feature, using the mutual information measure
  2. Split the data set according to the values of the selected feature
  3. Recurse until each data item is classified correctly
- Attributes with real values: quantize the real value into a discrete one
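
A sketch of step 1, the mutual information computation for discrete features:

```python
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(x, y):
    """I(feature; label) = H(y) - H(y | feature), for a discrete feature column x."""
    h_cond = sum((x == v).mean() * entropy(y[x == v]) for v in np.unique(x))
    return entropy(y) - h_cond

def best_feature(X, y):
    # Greedy step 1: pick the most informative feature.
    return max(range(X.shape[1]), key=lambda j: information_gain(X[:, j], y))
```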

Decision Tree
- The overfitting problem
- Tree pruning:
  - Reduced-error pruning
  - Rule post-pruning

Generalized Decision Tree
- A standard decision tree partitions the data with simple, axis-parallel splits
- A generalized decision tree uses classifiers for the data partition: each node is a linear classifier
(Figure: two partitions of '+'/'−' points over Attribute 1 and Attribute 2, one with simple splits and one with linear-classifier splits.)