Discriminant Functions


LECTURE 20: LINEAR DISCRIMINANT FUNCTIONS

Objectives: Linear Discriminant Functions, Gradient Descent, Nonseparable Data

Resources: SM: Gradient Descent; JD: Optimization; Wiki: Stochastic Gradient Descent; MJ: Linear Programming

Discriminant Functions

Recall our discriminant function for minimum error rate classification and its form for a multivariate normal distribution. Consider the case Σ_i = σ²I (statistical independence, equal variance, class-independent variance).
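For reference, the standard minimum-error-rate discriminant and its form for a multivariate normal density (as in Duda, Hart & Stork) are:

```latex
g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)

g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^t \boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)
                  - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln|\boldsymbol{\Sigma}_i| + \ln P(\omega_i)
```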

Gaussian Classifiers

For Σ_i = σ²I, the discriminant function can be reduced: terms that are constant w.r.t. the maximization can be dropped. Expanding the remaining quadratic term, x^t x is a constant w.r.t. i and can also be dropped, while μ_i^t μ_i is a constant that can be precomputed.
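Written out for the Σ_i = σ²I case (standard result), the reduction is:

```latex
g_i(\mathbf{x}) = -\frac{\|\mathbf{x}-\boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i)
                = -\frac{1}{2\sigma^2}\bigl[\mathbf{x}^t\mathbf{x} - 2\boldsymbol{\mu}_i^t\mathbf{x}
                  + \boldsymbol{\mu}_i^t\boldsymbol{\mu}_i\bigr] + \ln P(\omega_i)
```

Dropping the x^t x term leaves a function that is linear in x, which motivates the linear machines on the next slide.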

Linear Machines

We can use an equivalent linear discriminant function g_i(x) = w_i^t x + w_i0, where w_i0 is called the threshold or bias for the ith category. A classifier that uses linear discriminant functions is called a linear machine. The decision surface separating adjacent regions R_i and R_j is defined by the equation g_i(x) = g_j(x), a portion of a hyperplane.
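For the Σ_i = σ²I Gaussian case above, the weights of the equivalent linear machine take the standard form:

```latex
g_i(\mathbf{x}) = \mathbf{w}_i^t\mathbf{x} + w_{i0}, \qquad
\mathbf{w}_i = \frac{\boldsymbol{\mu}_i}{\sigma^2}, \qquad
w_{i0} = -\frac{\boldsymbol{\mu}_i^t\boldsymbol{\mu}_i}{2\sigma^2} + \ln P(\omega_i)
```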

Linear Discriminant Functions

A discriminant function that is a linear combination of the components of x can be written as g(x) = w^t x + w_0. In the general case there are c discriminant functions, one for each of the c classes, and x is assigned to the class with the largest g_i(x). For the two-class case a single function suffices: decide ω_1 if g(x) > 0 and ω_2 if g(x) < 0.
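A standard geometric interpretation of this form (not spelled out on the slide): g(x) measures a scaled signed distance from x to the separating hyperplane.

```latex
g(\mathbf{x}) = \mathbf{w}^t\mathbf{x} + w_0, \qquad
r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|}
```

Here r is the signed distance from x to the hyperplane g(x) = 0, and w is normal to that hyperplane.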

Generalized Linear Discriminant Functions

Rewrite g(x) as a sum over the components of x, then add quadratic terms, and generalize to a functional form g(x) = a^t y, where the components y_i(x) can be arbitrary functions of x. The discriminant remains linear in the parameters a even when it is highly nonlinear in x (for example, polynomial).
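In the usual notation (following Duda, Hart & Stork), the progression looks like:

```latex
g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i
\;\longrightarrow\;
g(\mathbf{x}) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=1}^{d} w_{ij}\, x_i x_j
\;\longrightarrow\;
g(\mathbf{x}) = \sum_{i=1}^{\hat d} a_i\, y_i(\mathbf{x}) = \mathbf{a}^t \mathbf{y}
```

For example, with a scalar x and y = (1, x, x²)^t, g(x) = a_1 + a_2 x + a_3 x² defines a quadratic decision rule that is still linear in a.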

A Gradient Descent Solution

Define a cost function J(a) that is minimized when a is a solution vector, and minimize it with gradient descent. Approximating J(a) with a second-order Taylor series yields an expression for the optimal learning rate.
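The standard update rule, Taylor expansion, and optimal learning rate (as in Duda, Hart & Stork) are:

```latex
\mathbf{a}(k+1) = \mathbf{a}(k) - \eta(k)\, \nabla J(\mathbf{a}(k))

J(\mathbf{a}) \simeq J(\mathbf{a}(k)) + \nabla J^t \bigl(\mathbf{a}-\mathbf{a}(k)\bigr)
              + \tfrac{1}{2}\bigl(\mathbf{a}-\mathbf{a}(k)\bigr)^t \mathbf{H}\, \bigl(\mathbf{a}-\mathbf{a}(k)\bigr)

\eta_{\text{opt}}(k) = \frac{\|\nabla J\|^2}{\nabla J^t\, \mathbf{H}\, \nabla J}
```

where H is the Hessian matrix of J evaluated at a(k).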

Additional Gradient Descent Approaches

Newton descent: replace the scalar learning rate with the inverse Hessian, a(k+1) = a(k) − H⁻¹∇J.
Perceptron criterion: J_p(a) = Σ(−a^t y), summed over the misclassified samples.
Relaxation procedure: a squared-error variant of the perceptron criterion that enforces a margin b.
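A minimal sketch of batch gradient descent on the perceptron criterion; the function name, toy data, and fixed learning rate are illustrative assumptions, not from the slides.

```python
import numpy as np

def perceptron_train(Y, eta=0.1, max_iter=1000):
    """Batch perceptron on 'normalized' augmented samples: each row y of Y
    has a leading 1 (bias term), and class-2 rows are negated, so a
    solution vector a satisfies a @ y > 0 for every row."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]           # rows violating a @ y > 0
        if len(misclassified) == 0:
            break                               # all samples correct: converged
        # gradient of J_p(a) = sum(-a @ y) over misclassified y is -sum(y),
        # so the descent step adds eta * sum(y)
        a += eta * misclassified.sum(axis=0)
    return a

# Toy usage: two linearly separable classes in 2-D, augmented with a bias term.
X1 = np.array([[1.0, 2.0], [2.0, 3.0]])         # class omega_1
X2 = np.array([[-1.0, -1.0], [-2.0, 0.5]])      # class omega_2
Y = np.vstack([np.hstack([np.ones((2, 1)), X1]),
               -np.hstack([np.ones((2, 1)), X2])])
a = perceptron_train(Y)
print("weights:", a, "separates data:", bool(np.all(Y @ a > 0)))
```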

The Ho-Kashyap Procedure

The previous algorithms do not converge if the data is nonseparable. Define the cost function J_s(a, b) = ||Ya − b||². If both a and b are allowed to vary (subject to b > 0), the minimum value of J_s is zero for linearly separable data. Computing gradients with respect to a and b and solving yields the Ho-Kashyap update rule.
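The standard form of the update rule (following Duda, Hart & Stork), with Y† the pseudoinverse of Y and e⁺ the positive part of the error vector:

```latex
\mathbf{e}(k) = \mathbf{Y}\mathbf{a}(k) - \mathbf{b}(k), \qquad
\mathbf{e}^{+}(k) = \tfrac{1}{2}\bigl(\mathbf{e}(k) + |\mathbf{e}(k)|\bigr)

\mathbf{b}(k+1) = \mathbf{b}(k) + 2\eta(k)\, \mathbf{e}^{+}(k), \qquad
\mathbf{a}(k+1) = \mathbf{Y}^{\dagger}\, \mathbf{b}(k+1)
```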

Linear Programming

A classical linear programming problem can be stated as: find a vector u = (u_1, u_2, …, u_m) that minimizes a linear objective function, with u constrained such that u_i ≥ 0. The solution to such an optimization problem is not unique: a range of solutions lies in a convex polytope (an n-dimensional polyhedron). Solutions can be found in polynomial time, O(n^k). Linear programming is useful for problems involving scheduling, asset allocation, or routing. Example: an airline has to assign crews to its flights; it must make sure each flight is covered, meet regulations such as the number of hours flown each day, and minimize costs such as fuel, lodging, etc.

XOR example (a generalized discriminant function): starting from x_1 and x_2, we transform to include x_1², x_2², and x_1 x_2. This can also be viewed as feature extraction from the feature vector x, but now we extract more features than the number of features in x; see the sketch below.
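A minimal Python sketch of the XOR feature-extraction idea; the mapping and the hand-picked weights are illustrative assumptions, showing that XOR becomes linearly separable in the expanded feature space.

```python
import numpy as np

# XOR in the original (x1, x2) space is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
labels = np.array([-1, 1, 1, -1])               # XOR truth table

def quadratic_features(X):
    """Map (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

Phi = quadratic_features(X)

# In the expanded space a linear discriminant separates XOR, e.g.
# g(x) = x1 + x2 - 2*x1*x2 - 0.5  (hand-picked weights for illustration).
a = np.array([1.0, 1.0, 0.0, 0.0, -2.0])
w0 = -0.5
g = Phi @ a + w0
print(np.sign(g))        # [-1.  1.  1. -1.] matches the XOR labels
```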

Summary

Machine learning in its most elementary form is a constrained optimization problem in which we find a weight vector. The solution is only as good as the cost function. There are many gradient-descent-type algorithms that operate using first or second derivatives of a cost function. Convergence of these algorithms can be slow, and hence selecting a suitable convergence factor (learning rate) is critical. Nonseparable data poses additional challenges and makes the use of margin-based classifiers critical.