Discriminant Functions Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.


Discriminant Functions Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005

Discriminant Functions
• Main idea: describe the decision boundary parametrically (instead of the properties of the classes), e.g., the two classes are separated by the straight line a x1 + b x2 + c = 0 with parameters (a, b, c), instead of modeling the feature PDFs as 2-D Gaussians.

Example: Two classes, two features
[Figure: two panels. Left, "Model Class Boundary": classes ω1 and ω2 in the (x1, x2) plane, separated by the line a x1 + b x2 + c = 0. Right, "Model Class Characteristics": the same classes described by the Gaussian densities N(μ1, Σ1) and N(μ2, Σ2).]

Duality
• Dualism: parametric class description with a Bayes classifier ↔ decision boundary with parametric discriminant functions.
• For example, modeling the class features by Gaussians with the same (across-class) variance results in hyper-plane discriminant functions.

Discriminant Functions
• Discriminant functions g_i(x) are functions of the features x of a class i.
• A sample x is classified to the class c for which g_i(x) is maximized, i.e., c = argmax_i {g_i(x)}.
• The equation g_i(x) = g_j(x) defines the class boundary for each pair of (different) classes i and j.
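A minimal Python sketch of the argmax rule above (illustrative only, not from the original slides; the linear discriminants g_i(x) = w_i^T x + w_i0 and the numbers are hypothetical):

```python
import numpy as np

# Classify a sample by the maximum discriminant value: c = argmax_i g_i(x).
# Each g_i here is an illustrative linear discriminant g_i(x) = w_i^T x + w_i0.
def classify(x, weights, biases):
    scores = [w @ x + w0 for w, w0 in zip(weights, biases)]
    return int(np.argmax(scores))          # index of the winning class

# Hypothetical two-class, 2-D example
weights = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
biases = [0.0, 1.0]
print(classify(np.array([0.3, -0.2]), weights, biases))  # prints 1 here
```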

Linear Discriminant Functions
• Two-class problem: a single discriminant function is defined as g(x) = g1(x) - g2(x).
• If g(x) is a linear function, g(x) = w^T x + w0, then the boundary g(x) = 0 is a hyper-plane (a point, line, or plane for 1-D, 2-D, 3-D features respectively).
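A short sketch of the two-class rule (hypothetical weights, not from the slides): classify x to class 1 when g(x) > 0 and to class 2 otherwise; the boundary is the hyper-plane g(x) = 0.

```python
import numpy as np

# Two-class rule for a linear discriminant g(x) = w^T x + w0.
# For 2-D features this corresponds to the line a*x1 + b*x2 + c = 0
# with w = (a, b) and w0 = c (values below are hypothetical).
def classify_two_class(x, w, w0):
    g = w @ x + w0
    return 1 if g > 0 else 2

w, w0 = np.array([1.0, -2.0]), 0.5
print(classify_two_class(np.array([2.0, 0.0]), w, w0))  # g = 2.5 > 0 -> class 1
print(classify_two_class(np.array([0.0, 1.0]), w, w0))  # g = -1.5 < 0 -> class 2
```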

Linear Discriminant Functions
[Figure: the boundary line a x1 + b x2 + c = 0 in the (x1, x2) plane, with weight (normal) vector w = (a, b) and axis intercepts -c/a on the x1 axis and -c/b on the x2 axis.]

Non-Linear Discriminant Functions
• Quadratic discriminant functions: g(x) = w0 + Σ_i w_i x_i + Σ_{i,j} w_ij x_i x_j; for example, for a two-class 2-D problem, g(x) = a + b x1 + c x2 + d x1^2.
• Any non-linear discriminant function can be made linear by increasing the dimensionality, e.g., y1 = x1, y2 = x2, y3 = x1^2 (2-D non-linear → 3-D linear), giving g(y) = a + b y1 + c y2 + d y3.
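A small sketch of the linearization step described above (illustrative; the coefficients are hypothetical): the quadratic discriminant in 2-D becomes a linear discriminant in the lifted 3-D feature space y = (x1, x2, x1^2).

```python
import numpy as np

# Lift a 2-D sample to the 3-D feature vector y = (x1, x2, x1^2).
def lift(x):
    x1, x2 = x
    return np.array([x1, x2, x1 ** 2])

a, b, c, d = 1.0, -2.0, 0.5, 3.0        # hypothetical coefficients
w, w0 = np.array([b, c, d]), a          # linear form in the lifted space

x = np.array([0.4, -1.2])
g_quadratic = a + b * x[0] + c * x[1] + d * x[0] ** 2   # g(x) in the original 2-D space
g_linear = w @ lift(x) + w0                             # g(y) = a + b*y1 + c*y2 + d*y3
assert np.isclose(g_quadratic, g_linear)                # identical values
```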

Parameter Estimation
• The parameters w are estimated by functional minimization.
• The function J to be minimized models the average distance of training samples from the decision boundary, computed over either:
  - misclassified training samples only, or
  - all training samples.
• The function J is minimized using gradient descent.

Gradient Descent
• Iterative procedure towards a local minimum: a(k+1) = a(k) - n(k) ∇J(a(k)), where k is the iteration number, n(k) is the learning rate, and ∇J(a(k)) is the gradient of the function to be minimized, evaluated at a(k).
• Newton descent is gradient descent with the scalar learning rate replaced by the inverse Hessian matrix.
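A minimal sketch of the update rule above (illustrative; the constant learning rate, stopping test, and quadratic test function are placeholder choices of mine, not from the slides):

```python
import numpy as np

# Gradient descent: a(k+1) = a(k) - n(k) * grad_J(a(k)).
# For simplicity the learning rate n(k) is kept constant (n(k) = n0);
# a decaying schedule could be substituted without changing the structure.
def gradient_descent(a0, grad_J, n0=0.1, max_iter=1000, tol=1e-8):
    a = np.asarray(a0, dtype=float)
    for k in range(max_iter):
        step = n0 * grad_J(a)
        a = a - step
        if np.linalg.norm(step) < tol:   # stop when the update becomes negligible
            break
    return a

# Example: J(a) = ||a||^2 has gradient 2a and its minimum at the origin.
print(gradient_descent(np.array([3.0, -2.0]), lambda a: 2 * a))  # ~ [0, 0]
```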

Distance Functions
• Perceptron criterion function: J_p(a) = Σ_{misclassified} (-a^T y)
• Relaxation with margin b: J_r(a) = Σ_{misclassified} (a^T y - b)^2 / ||y||^2
• Least mean squares (LMS): J_s(a) = Σ_{all samples} (a^T y_i - b_i)^2
• Ho-Kashyap rule: J_s(a, b) = Σ_{all samples} (a^T y_i - b_i)^2, minimized jointly over a and the margin vector b.
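As one concrete instance, here is a minimal sketch of gradient descent on the perceptron criterion J_p (illustrative; the data and learning rate are hypothetical). The gradient of J_p(a) is -Σ_{misclassified} y, so each batch update adds the misclassified samples to a:

```python
import numpy as np

# Batch perceptron training, i.e. gradient descent on
# J_p(a) = sum over misclassified samples of (-a^T y).
# Samples are assumed "normalized": class-2 vectors are negated,
# so every sample should satisfy a^T y > 0 after training.
def perceptron_train(Y, n=1.0, max_iter=1000):
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]          # samples on the wrong side
        if len(misclassified) == 0:            # converged: J_p(a) = 0
            break
        a = a + n * misclassified.sum(axis=0)  # a <- a - n * grad J_p(a)
    return a

# Hypothetical separable data in augmented form y = (1, x1, x2); class-2 rows negated.
Y = np.array([[1, 2.0, 1.0], [1, 1.5, 2.0],        # class 1
              [-1, 1.0, -1.5], [-1, 0.5, -2.0]])   # class 2 (negated)
print(perceptron_train(Y))
```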

Discriminant Functions
• Working on misclassified samples only (Perceptron, Relaxation with Margin) provides better results, but these procedures converge only for linearly separable training sets.

High Dimensionality
• Using non-linear discriminant functions and linearizing them in a high-dimensional space can make ANY training set separable, but at the cost of a large number of parameters (curse of dimensionality).
• Support vector machines: a smart way of selecting the appropriate terms (dimensions) is needed.