1 Linear Models for Classification
Berkay Topçu

2 Linear Models for Classification
Goal: take an input vector x and assign it to one of K discrete classes Ck, where k = 1, ..., K. The decision boundaries are linear, giving a linear separation of classes.

3 Generalized Linear Models
We wish to predict discrete class labels or, more generally, class posterior probabilities that lie in the range (0, 1). The classification model is a linear function of the parameters, y(x) = f(w^T x + w0), where f(.) is a nonlinear activation function. Classification can be carried out directly in the original input space x, or on a fixed nonlinear transformation of the input variables using a vector of basis functions φ(x).

4 Discriminant Functions
Linear discriminant: y(x) = w^T x + w0. If y(x) >= 0, assign x to class C1, and to class C2 otherwise. The decision boundary is given by y(x) = 0: w determines the orientation of the decision surface and the bias w0 determines its location. Compact notation: absorb the bias by defining x̃ = (1, x) and w̃ = (w0, w), so that y(x) = w̃^T x̃.
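A minimal NumPy sketch of this decision rule; the weight vector and bias below are arbitrary illustrative values, not taken from the presentation:

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Two-class linear discriminant y(x) = w^T x + w0."""
    return w @ x + w0

def classify(x, w, w0):
    """Assign x to C1 if y(x) >= 0, otherwise to C2."""
    return "C1" if linear_discriminant(x, w, w0) >= 0 else "C2"

# Example: with these weights the decision boundary is the line x1 + x2 = 1.
w = np.array([1.0, 1.0])
w0 = -1.0
print(classify(np.array([2.0, 2.0]), w, w0))  # C1
print(classify(np.array([0.0, 0.0]), w, w0))  # C2
```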

5 Multiple Classes
K-class discriminant built by combining a number of two-class discriminant functions (K > 2). One-versus-the-rest: K-1 classifiers, each separating points in one particular class Ck from points not in that class. One-versus-one: K(K-1)/2 binary discriminant functions, one for every possible pair of classes.

6 Multiple Classes A single K-class discriminant comprising K linear functions y_k(x) = w_k^T x + w_k0. Assign x to class Ck if y_k(x) > y_j(x) for all j ≠ k. How do we learn the parameters of the linear discriminant functions?
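A sketch of this argmax rule over K linear discriminants (the array shapes and function names are assumptions for illustration):

```python
import numpy as np

def k_class_discriminants(x, W, w0):
    """Evaluate y_k(x) = w_k^T x + w_k0 for all K classes at once.

    W  : (K, D) matrix whose rows are the weight vectors w_k
    w0 : (K,) vector of biases w_k0
    """
    return W @ x + w0

def assign_class(x, W, w0):
    """Assign x to the class Ck with the largest discriminant value y_k(x)."""
    return int(np.argmax(k_class_discriminants(x, W, w0)))
```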

7 Least Squares for Classification
Each class Ck is described by its own linear model y_k(x) = w_k^T x + w_k0, written jointly as y(x) = W̃^T x̃. Training data set {x_n, t_n} for n = 1, ..., N, where the targets t_n use a 1-of-K coding scheme. Collect a matrix T whose nth row is t_n^T and a matrix X̃ whose nth row is x̃_n^T.

8 Least Squares for Classification
Minimizing the sum-of-squares error function E_D(W̃) = (1/2) Tr{ (X̃ W̃ - T)^T (X̃ W̃ - T) }. Solution: W̃ = (X̃^T X̃)^{-1} X̃^T T = X̃† T, where X̃† is the pseudo-inverse of X̃. Discriminant function: y(x) = W̃^T x̃ = T^T (X̃†)^T x̃.
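A small sketch of this pseudo-inverse solution, assuming integer class labels and 1-of-K target coding (function names are my own):

```python
import numpy as np

def least_squares_classifier(X, labels, K):
    """Fit W_tilde = pinv(X_tilde) @ T by least squares.

    X      : (N, D) input vectors
    labels : (N,) integer class labels in {0, ..., K-1}
    Returns the (D+1, K) augmented weight matrix W_tilde.
    """
    N = X.shape[0]
    X_tilde = np.hstack([np.ones((N, 1)), X])   # prepend the bias input x0 = 1
    T = np.eye(K)[labels]                       # 1-of-K target coding
    return np.linalg.pinv(X_tilde) @ T          # pseudo-inverse solution

def predict(W_tilde, X):
    """Discriminant y(x) = W_tilde^T x_tilde; pick the largest component."""
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_tilde @ W_tilde, axis=1)
```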

9 Fisher’s Linear Discriminant
Dimensionality reduction: take the D-dimensional input vector x and project it to one dimension using y = w^T x. We seek the projection that maximizes class separation. Two-class problem: N1 points of C1 with mean m1 and N2 points of C2 with mean m2. Fisher's idea: choose w to give a large separation between the projected class means while keeping a small variance within each class, thereby minimizing class overlap.

10 Fisher’s Linear Discriminant
The Fisher criterion: J(w) = (m2' - m1')^2 / (s1^2 + s2^2) = (w^T S_B w) / (w^T S_W w), where m_k' = w^T m_k are the projected class means, s_k^2 are the within-class variances of the projected data, S_B = (m2 - m1)(m2 - m1)^T is the between-class scatter matrix, and S_W is the within-class scatter matrix. Maximizing J(w) gives w ∝ S_W^{-1} (m2 - m1).
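A sketch of the resulting two-class Fisher direction w ∝ S_W^{-1}(m2 - m1), assuming the two classes are given as separate sample arrays:

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher discriminant direction, proportional to S_W^{-1} (m2 - m1).

    X1, X2 : (N1, D) and (N2, D) arrays of samples from C1 and C2.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter matrix: sum of the per-class scatter matrices
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)
```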

11 Fisher’s Linear Discriminant
For the two-class problem, the Fisher criterion is a special case of least squares (reference: Penalized Discriminant Analysis – Hastie, Buja and Tibshirani). For multiple classes, the weight values are determined by the eigenvectors corresponding to the largest eigenvalues of S_W^{-1} S_B.
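A sketch of the multi-class case, forming S_W and S_B and taking the leading eigenvectors of S_W^{-1} S_B (the helper name and interface are assumptions):

```python
import numpy as np

def fisher_multiclass(X, labels, n_components):
    """Projection directions from the leading eigenvectors of S_W^{-1} S_B."""
    m = X.mean(axis=0)
    D = X.shape[1]
    S_W = np.zeros((D, D))
    S_B = np.zeros((D, D))
    for k in np.unique(labels):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)                 # within-class scatter
        S_B += len(Xk) * np.outer(mk - m, mk - m)      # between-class scatter
    # Eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]
```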

12 The Perceptron Algorithm
The input vector x is first transformed using a fixed nonlinear transformation to give a feature vector φ(x), and the output is y(x) = f(w^T φ(x)), where f is a step function taking values +1 and -1. With target coding t ∈ {+1, -1}, we want w^T φ(x_n) t_n > 0 for all training samples. Perceptron criterion: E_P(w) = - Σ_{n ∈ M} w^T φ(x_n) t_n, where M is the set of misclassified patterns. We need to minimize E_P(w).

13 The Perceptron Algorithm – Stochastic Gradient Descent
Cycle through the training patterns in turn. If a pattern is correctly classified, the weight vector remains unchanged; otherwise update w^(τ+1) = w^(τ) + η φ(x_n) t_n, where η is the learning rate.
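A sketch of this update rule, assuming a feature matrix Phi and targets in {-1, +1}:

```python
import numpy as np

def perceptron_train(Phi, t, eta=1.0, max_epochs=100):
    """Perceptron learning by stochastic gradient descent.

    Phi : (N, M) feature vectors phi(x_n); t : (N,) targets in {-1, +1}.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for phi_n, t_n in zip(Phi, t):
            if (w @ phi_n) * t_n <= 0:        # pattern misclassified
                w = w + eta * phi_n * t_n     # w <- w + eta * phi_n * t_n
                errors += 1
        if errors == 0:                       # converged (separable data)
            break
    return w
```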

14 Probabilistic Generative Models
These models depend on simple assumptions about the distribution of the data: we model the class-conditional densities p(x | Ck) and the class priors p(Ck), and obtain the posterior p(C1 | x) = σ(a) with a = ln [ p(x | C1) p(C1) / ( p(x | C2) p(C2) ) ]. Logistic sigmoid function: σ(a) = 1 / (1 + exp(-a)). It maps the whole real axis to the finite interval (0, 1).
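For reference, the sigmoid in NumPy:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid sigma(a) = 1 / (1 + exp(-a)), mapping R to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))
```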

15 Continuous Inputs - Gaussian
Assuming the class-conditional densities are Gaussian with a shared covariance matrix Σ: p(x | Ck) = N(x | μk, Σ). Case of two classes: the posterior becomes p(C1 | x) = σ(w^T x + w0) with w = Σ^{-1}(μ1 - μ2) and w0 = -(1/2) μ1^T Σ^{-1} μ1 + (1/2) μ2^T Σ^{-1} μ2 + ln[ p(C1) / p(C2) ], i.e. a linear function of x inside the sigmoid.

16 Maximum Likelihood Solution
Likelihood function: for a data set {x_n, t_n} with t_n ∈ {0, 1} and prior p(C1) = π, p(t, X | π, μ1, μ2, Σ) = Π_n [ π N(x_n | μ1, Σ) ]^{t_n} [ (1 - π) N(x_n | μ2, Σ) ]^{1 - t_n}. Maximizing the log-likelihood gives π = N1/N, μ1 = (1/N1) Σ_n t_n x_n, μ2 = (1/N2) Σ_n (1 - t_n) x_n, and Σ equal to the weighted average of the two per-class covariance matrices.
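A sketch of these maximum likelihood estimates, assuming binary targets with t_n = 1 for class C1 and t_n = 0 for class C2:

```python
import numpy as np

def fit_gaussian_generative(X, t):
    """Maximum likelihood fit of the two-class shared-covariance Gaussian model.

    X : (N, D) inputs; t : (N,) targets with t_n = 1 for C1 and 0 for C2.
    """
    N = len(t)
    N1, N2 = t.sum(), N - t.sum()
    pi = N1 / N                                   # prior p(C1)
    mu1 = X[t == 1].mean(axis=0)
    mu2 = X[t == 0].mean(axis=0)
    # Shared covariance: weighted average of the per-class covariances
    S1 = (X[t == 1] - mu1).T @ (X[t == 1] - mu1) / N1
    S2 = (X[t == 0] - mu2).T @ (X[t == 0] - mu2) / N2
    Sigma = (N1 / N) * S1 + (N2 / N) * S2
    return pi, mu1, mu2, Sigma
```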

17 Probabilistic Discriminative Models
In the probabilistic generative model the number of parameters grows quadratically with M (the number of dimensions). The discriminative model p(C1 | φ) = σ(w^T φ), logistic regression, has only M adjustable parameters. Maximum likelihood solution for logistic regression: minimize the energy function given by the negative log likelihood (cross-entropy), E(w) = - Σ_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ] with y_n = σ(w^T φ_n), whose gradient is ∇E(w) = Σ_n (y_n - t_n) φ_n.
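A sketch of the cross-entropy error and its gradient (the function names are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], with y_n = sigma(w^T phi_n)."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, Phi, t):
    """grad E(w) = sum_n (y_n - t_n) phi_n = Phi^T (y - t)."""
    return Phi.T @ (sigmoid(Phi @ w) - t)
```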

18 Iterative Reweighted Least Squares
Newton-Raphson iterative optimization: w^(new) = w^(old) - H^{-1} ∇E(w), where H is the Hessian. Applied to the sum-of-squares error of linear regression it converges in a single step to w = (Φ^T Φ)^{-1} Φ^T t, the same as the standard least-squares solution.

19 Iterative Reweighted Least Squares
Newton-Raphson update for the negative log likelihood of logistic regression: ∇E(w) = Φ^T (y - t) and H = Φ^T R Φ, where R is diagonal with R_nn = y_n (1 - y_n). The update w^(new) = (Φ^T R Φ)^{-1} Φ^T R z, with z = Φ w^(old) - R^{-1}(y - t), is a weighted least-squares problem; because R depends on w, it must be applied iteratively, giving iteratively reweighted least squares.
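A sketch of the IRLS loop under these equations; the small floor on R_nn is an added numerical safeguard, not part of the derivation:

```python
import numpy as np

def irls_logistic_regression(Phi, t, n_iter=20):
    """Iterative reweighted least squares for logistic regression.

    Each Newton-Raphson step solves a weighted least-squares problem:
        w_new = (Phi^T R Phi)^{-1} Phi^T R z,   z = Phi w - R^{-1} (y - t)
    """
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))
        R = np.maximum(y * (1 - y), 1e-10)        # diagonal of the weighting matrix
        z = Phi @ w - (y - t) / R
        A = Phi.T @ (R[:, None] * Phi)            # Phi^T R Phi
        b = Phi.T @ (R * z)                       # Phi^T R z
        w = np.linalg.solve(A, b)
    return w
```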

20 Maximum Margin Classifiers
Support vector machines for the two-class problem: y(x) = w^T φ(x) + b with targets t_n ∈ {-1, +1}. Assuming a linearly separable data set, there exists at least one choice of the variables w and b satisfying t_n y(x_n) > 0 for all n; we seek the one that gives the smallest generalization error. Margin: the smallest distance between the decision boundary and any of the samples.

21 Support Vector Machines
Optimization of the parameters by maximizing the margin. Maximizing the margin is equivalent to minimizing (1/2) ||w||^2 subject to the constraints t_n (w^T φ(x_n) + b) >= 1 for n = 1, ..., N. Introduction of Lagrange multipliers a_n >= 0: L(w, b, a) = (1/2) ||w||^2 - Σ_n a_n [ t_n (w^T φ(x_n) + b) - 1 ].

22 Support Vector Machines - Lagrange Multipliers
L(w, b, a) is minimized with respect to w and b and maximized with respect to a. Setting the derivatives to zero gives w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0. The dual form: maximize L̃(a) = Σ_n a_n - (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m) subject to a_n >= 0 and Σ_n a_n t_n = 0, where k(x, x') = φ(x)^T φ(x'). This is a quadratic programming problem.
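A sketch that hands this dual to scipy.optimize.minimize (SLSQP) as a stand-in for a dedicated QP solver, restricted here to a linear kernel:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual_fit(X, t):
    """Hard-margin linear SVM via the dual quadratic program.

    Maximize  sum_n a_n - 0.5 * sum_{n,m} a_n a_m t_n t_m (x_n . x_m)
    subject to a_n >= 0 and sum_n a_n t_n = 0, with t_n in {-1, +1}.
    """
    N = len(t)
    Q = (X @ X.T) * np.outer(t, t)                # Gram matrix weighted by targets

    def neg_dual(a):                              # minimize the negated dual objective
        return 0.5 * a @ Q @ a - a.sum()

    res = minimize(neg_dual, np.zeros(N), method="SLSQP",
                   bounds=[(0, None)] * N,
                   constraints=[{"type": "eq", "fun": lambda a: a @ t}])
    a = res.x
    w = (a * t) @ X                               # w = sum_n a_n t_n x_n
    sv = a > 1e-6                                 # support vectors have a_n > 0
    b = np.mean(t[sv] - X[sv] @ w)
    return w, b, a
```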

23 Support Vector Machines
Overlapping class distributions (linearly non-separable data). Slack variables ξ_n >= 0 measure the distance of a point from the margin boundary: ξ_n = 0 for points on or inside the correct margin boundary, and ξ_n = |t_n - y(x_n)| otherwise. We maximize the margin while penalizing points that lie on the wrong side of the margin boundary by minimizing C Σ_n ξ_n + (1/2) ||w||^2.
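As an alternative view of the same objective, a sketch of subgradient descent on the equivalent unconstrained form (1/2)||w||^2 + C Σ_n max(0, 1 - t_n y(x_n)); this is not the quadratic programming route taken on the slides:

```python
import numpy as np

def soft_margin_svm(X, t, C=1.0, eta=0.01, n_iter=500):
    """Subgradient descent on 0.5*||w||^2 + C * sum_n max(0, 1 - t_n (w^T x_n + b))."""
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(n_iter):
        margins = t * (X @ w + b)
        viol = margins < 1                        # points inside the margin or misclassified
        grad_w = w - C * (t[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * t[viol].sum()
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b
```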

24 SVM-Overlapping Class Distributions
The dual Lagrangian is identical to the separable case, except that the constraints become the box constraints 0 <= a_n <= C together with Σ_n a_n t_n = 0. Again this represents a quadratic programming problem.

25 Support Vector Machines
Relation to logistic regression: the hinge loss used in the SVM and the error function of logistic regression both approximate the ideal misclassification error (MCE). (Figure: black = MCE, blue = hinge loss, red = rescaled logistic regression error, green = squared error.)
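A small sketch of the three losses as functions of z = t * y(x), with the logistic loss rescaled by 1/ln 2 as is conventional in this comparison:

```python
import numpy as np

def hinge_loss(z):
    """Hinge loss max(0, 1 - z), where z = t * y(x)."""
    return np.maximum(0.0, 1.0 - z)

def logistic_loss(z):
    """Logistic regression error ln(1 + exp(-z)), rescaled by 1/ln 2 to pass through (0, 1)."""
    return np.log(1.0 + np.exp(-z)) / np.log(2.0)

def misclassification_error(z):
    """Ideal 0-1 loss: 1 if the sign of y(x) disagrees with t, else 0."""
    return (z <= 0).astype(float)

z = np.linspace(-2, 2, 9)
print(hinge_loss(z), logistic_loss(z), misclassification_error(z))
```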

