Linear Discriminant Functions Wen-Hung Liao, 11/25/2008

Introduction: LDF Assume we know the proper form of the discriminant functions, instead of the underlying probability densities, and use samples to estimate the parameters of the classifier (an approach that may be statistical or non-statistical). We will be concerned with discriminant functions that are either linear in the components of x, or linear in some given set of functions of x.

Why LDF? Linear discriminant functions trade some accuracy for simplicity, are attractive candidates for initial, trial classifiers, and are closely related to neural networks.

Approach Find the LDF by minimizing a criterion function, using a gradient descent procedure for the minimization; the key issues are the convergence properties and the computational complexity of the procedure. An example of a criterion function is the sample risk, or training error. It is not an appropriate criterion, because a small training error does not guarantee a small test error.

LDF and Decision Surfaces A linear discriminant function has the form g(x) = w^t x + w_0, where w is the weight vector and w_0 is the bias or threshold weight.

Two-Category Case Decision rule: decide ω_1 if g(x) > 0, and decide ω_2 if g(x) < 0. In other words, x is assigned to ω_1 if the inner product w^t x exceeds the threshold -w_0.
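
As a minimal illustration (not from the slides), the following sketch evaluates g(x) = w^t x + w_0 and applies this decision rule; the weight values are assumed toy numbers.

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant function g(x) = w^t x + w0."""
    return np.dot(w, x) + w0

def decide(x, w, w0):
    """Decide omega_1 if g(x) > 0, omega_2 otherwise."""
    return 1 if g(x, w, w0) > 0 else 2

w, w0 = np.array([1.0, -2.0]), 0.5          # assumed toy weights
print(decide(np.array([3.0, 1.0]), w, w0))  # g(x) = 1.5 > 0, so omega_1
```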

Decision Boundary The decision surface is a hyperplane H defined by g(x) = 0. If x_1 and x_2 are both on the decision surface, then w^t x_1 + w_0 = w^t x_2 + w_0, so w^t (x_1 - x_2) = 0; hence w is normal to any vector lying in the hyperplane.

Distance Measure Any x can be written as x = x_p + r (w / ||w||), where x_p is the normal projection of x onto H and r is the algebraic distance from x to H. Since g(x_p) = 0, it follows that r = g(x) / ||w||.
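
A small sketch of this computation, with assumed toy weights: the signed distance r = g(x)/||w|| and the projection x_p = x - r w/||w||.

```python
import numpy as np

def signed_distance(x, w, w0):
    """Algebraic distance r = g(x) / ||w|| from x to the hyperplane H."""
    return (np.dot(w, x) + w0) / np.linalg.norm(w)

def project_onto_hyperplane(x, w, w0):
    """Normal projection x_p = x - r * w / ||w|| of x onto H."""
    r = signed_distance(x, w, w0)
    return x - r * w / np.linalg.norm(w)

w, w0 = np.array([3.0, 4.0]), -5.0        # assumed weights, ||w|| = 5
x = np.array([2.0, 2.0])                  # g(x) = 9
print(signed_distance(x, w, w0))          # 1.8
print(project_onto_hyperplane(x, w, w0))  # lies on H, i.e. g(x_p) = 0
```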

Multi-category Case In the general case with c categories, we can reduce the problem to c - 1 two-class problems (each separating one class from the rest), or use c(c-1)/2 linear discriminants, one for every pair of classes. Both approaches can leave regions where the classification is ambiguous.

A better alternative is to use c linear discriminants g_i(x) = w_i^t x + w_{i0}, assigning x to ω_i if g_i(x) > g_j(x) for all j ≠ i; the resulting classifier is called a linear machine.
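
A linear machine can be sketched as an argmax over the c discriminant values; the weight matrix below is an assumed toy example.

```python
import numpy as np

def linear_machine(x, W, w0):
    """W is a (c, d) matrix whose rows are w_i; w0 holds the c biases.
    Returns the index i that maximizes g_i(x) = w_i^t x + w_i0."""
    return int(np.argmax(W @ x + w0))

W = np.array([[1.0, 0.0],      # assumed weights for 3 classes in 2-D
              [0.0, 1.0],
              [-1.0, -1.0]])
w0 = np.zeros(3)
print(linear_machine(np.array([2.0, 1.0]), W, w0))  # scores (2, 1, -3) -> class 0
```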

Distance Measure The boundary H_ij between adjacent regions is a portion of the hyperplane defined by g_i(x) = g_j(x); w_i - w_j is normal to H_ij, and the distance from x to H_ij is given by (g_i(x) - g_j(x)) / ||w_i - w_j||.

Quadratic DF Add terms involving products of pairs of components of x to obtain the quadratic discriminant function: g(x) = w_0 + Σ_i w_i x_i + Σ_i Σ_j w_ij x_i x_j. The separating surface defined by g(x) = 0 is a hyperquadric surface.
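
As a sketch, the quadratic discriminant can be evaluated directly from its coefficients; the values of W, w, and w_0 below are assumed for illustration.

```python
import numpy as np

def quadratic_g(x, w0, w, W):
    """g(x) = w0 + w^t x + x^t W x, the quadratic discriminant."""
    return w0 + np.dot(w, x) + x @ W @ x

W = np.eye(2)              # w_ij coefficients: here g includes x1^2 + x2^2
w, w0 = np.zeros(2), -1.0
# g(x) = x1^2 + x2^2 - 1, so g(x) = 0 is a circle (a hyperquadric in 2-D)
print(quadratic_g(np.array([2.0, 0.0]), w0, w, W))  # 3.0 (outside the circle)
```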

Hyperquadric Surfaces If W = [w_ij] is not singular, then the linear terms in g(x) can be eliminated by translating the axes, and a scale matrix can be defined from W and the remaining coefficients. Depending on the form of this matrix, the separating surface is a hypersphere, a hyperellipsoid, or a hyperhyperboloid.

Generalized LDF Polynomial discriminant functions are a special case of the generalized linear discriminant function g(x) = Σ_i a_i y_i(x) = a^t y, where a is the weight vector and the functions y_i(x) can be arbitrary functions of x.
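
For instance, a quadratic discriminant in two variables arises from one assumed mapping y(x); the sketch below computes g(x) = a^t y(x) for such a mapping.

```python
import numpy as np

def y(x):
    """One assumed quadratic mapping y(x) = (1, x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1 * x2, x2**2])

def g(x, a):
    """Generalized LDF: g(x) = a^t y(x)."""
    return np.dot(a, y(x))

a = np.array([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0])  # g(x) = x1^2 + x2^2 - 1
print(g(np.array([2.0, 0.0]), a))              # 3.0: nonlinear in x, linear in y
```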

Augmented Vectors Augmented feature vector: y = (1, x_1, ..., x_d)^t. Augmented weight vector: a = (w_0, w_1, ..., w_d)^t. This maps the d-dimensional x-space to a (d+1)-dimensional y-space, with g(x) = a^t y.
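
A short sketch of the augmentation, verifying that a^t y equals w^t x + w_0 (the vectors are assumed toy values).

```python
import numpy as np

def augment_x(x):
    return np.concatenate(([1.0], x))   # y = (1, x1, ..., xd)^t

def augment_w(w, w0):
    return np.concatenate(([w0], w))    # a = (w0, w1, ..., wd)^t

x, w, w0 = np.array([3.0, 1.0]), np.array([1.0, -2.0]), 0.5
y, a = augment_x(x), augment_w(w, w0)
assert np.dot(a, y) == np.dot(w, x) + w0   # same discriminant value
```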

2-Category Separable Case Look for a weight vector that classifies all of the samples correctly. If such a weight vector exists, the samples are said to be linearly separable.

Gradient Descent Procedure Define a criterion function J(a) that is minimized when a is a solution vector. Step 1: randomly pick a(1) and compute the gradient vector ∇J(a(1)). Step 2: obtain a(2) by moving some distance from a(1) in the direction of steepest descent, i.e., along the negative of the gradient. In general, a(k+1) = a(k) - η(k) ∇J(a(k)), where η(k) is the learning rate that sets the step size.
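
A minimal sketch of this procedure, assuming a generic gradient function grad_J and assumed values for the learning rate eta and stopping threshold theta.

```python
import numpy as np

def gradient_descent(grad_J, a, eta=0.1, theta=1e-6, max_iter=1000):
    """Iterate a(k+1) = a(k) - eta * grad J(a(k)) until the correction is tiny."""
    for _ in range(max_iter):
        grad = grad_J(a)
        if np.linalg.norm(eta * grad) < theta:  # step too small: converged
            break
        a = a - eta * grad                      # move along steepest descent
    return a

# Example: J(a) = ||a - b||^2 has gradient 2(a - b) and minimum at b.
b = np.array([1.0, 2.0])
print(gradient_descent(lambda a: 2 * (a - b), np.zeros(2)))  # approx [1. 2.]
```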

Setting the Learning Rate From the second-order expansion J(a) ≈ J(a(k)) + ∇J^t (a - a(k)) + (1/2)(a - a(k))^t H (a - a(k)), where H is the Hessian matrix, substituting a = a(k+1) = a(k) - η(k) ∇J gives J(a(k+1)) ≈ J(a(k)) - η(k) ||∇J||^2 + (1/2) η(k)^2 ∇J^t H ∇J. This is minimized when η(k) = ||∇J||^2 / (∇J^t H ∇J).
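
The optimal rate can be computed directly when the gradient and Hessian are available; the sketch below uses assumed toy values.

```python
import numpy as np

def optimal_eta(grad, H):
    """eta(k) = ||grad||^2 / (grad^t H grad), from the second-order expansion."""
    return np.dot(grad, grad) / (grad @ H @ grad)

H = np.array([[2.0, 0.0],
              [0.0, 4.0]])      # assumed Hessian
grad = np.array([1.0, 1.0])     # assumed gradient
print(optimal_eta(grad, H))     # 2 / 6 = 0.333...
```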

Newton Descent For nonsingular H, the update rule becomes a(k+1) = a(k) - H^{-1} ∇J. Newton's algorithm converges in fewer steps, but each step is more difficult to compute, since it requires inverting the Hessian.
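
One Newton step can be sketched as below; solving a linear system avoids forming H^{-1} explicitly, and the quadratic criterion used here is an assumed example.

```python
import numpy as np

def newton_step(a, grad, H):
    """a(k+1) = a(k) - H^{-1} grad, computed via a linear solve."""
    return a - np.linalg.solve(H, grad)

# For the quadratic J(a) = ||a - b||^2 (H = 2I, grad = 2(a - b)),
# a single Newton step lands exactly on the minimum b.
b = np.array([1.0, 2.0])
a = np.zeros(2)
print(newton_step(a, 2 * (a - b), 2 * np.eye(2)))  # [1. 2.]
```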

Perceptron Criterion Function J_p(a) = Σ_{y ∈ Y(a)} (-a^t y), where Y(a) is the set of samples misclassified by a. Since ∇J_p = Σ_{y ∈ Y(a)} (-y), the update rule is a(k+1) = a(k) + η(k) Σ_{y ∈ Y(a(k))} y.
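
A sketch of the batch perceptron algorithm using this update rule; Y holds augmented samples with the ω_2 samples negated (so a solution a satisfies a^t y > 0 for every sample), and the toy data are assumed.

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, max_iter=1000):
    """Batch perceptron: a(k+1) = a(k) + eta * sum of misclassified samples."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]        # the set Y(a)
        if len(misclassified) == 0:          # every sample correct: done
            break
        a = a + eta * misclassified.sum(axis=0)
    return a

Y = np.array([[1.0, 2.0, 1.0],    # augmented omega_1 sample (2, 1)
              [-1.0, 1.0, 2.0]])  # augmented, negated omega_2 sample (-1, -2)
a = batch_perceptron(Y)
print(a, Y @ a)                   # all entries of Y @ a are positive
```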

Convergence Proof Refer to pages 229 to 232 of the textbook.