ECE 8443 – Pattern Recognition
ECE 8527 – Introduction to Machine Learning and Pattern Recognition
LECTURE 04: GAUSSIAN CLASSIFIERS
Objectives: Whitening Transformations, Linear Discriminants, ROC Curves
Resources: D.H.S.: Chapter 2 (Part 3); K.F.: Intro to PR; X.Z.: PR Course; A.N.: Gaussian Discriminants; E.M.: Linear Discriminants; M.R.: Chernoff Bounds; S.S.: Bhattacharyya; T.T.: ROC Curves; NIST: DET Curves

ECE 8527: Lecture 02, Slide 1: Bayes Decision Rule
Decision rule: for an observation $x$, decide $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$; otherwise, decide $\omega_2$.
Probability of error: $P(error|x) = \min[P(\omega_1|x),\, P(\omega_2|x)]$.
The average probability of error is given by: $P(error) = \int_{-\infty}^{\infty} P(error|x)\, p(x)\, dx$.
If for every $x$ we ensure that $P(error|x)$ is as small as possible, then the integral is as small as possible. Thus, the Bayes decision rule minimizes $P(error)$.
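A minimal sketch of this rule in code (assuming two one-dimensional Gaussian class-conditional densities; the means, variances, and priors below are invented for illustration):

    import numpy as np
    from scipy.stats import norm

    # Hypothetical two-class problem with Gaussian likelihoods p(x|w_i).
    priors = np.array([0.6, 0.4])              # P(w1), P(w2) -- assumed values
    likelihoods = [norm(loc=0.0, scale=1.0),   # p(x|w1)
                   norm(loc=2.0, scale=1.0)]   # p(x|w2)

    def decide(x):
        """Bayes decision rule: pick the class with the larger posterior.
        The evidence p(x) is common to both classes, so comparing
        prior * likelihood is equivalent to comparing posteriors."""
        scores = [p * L.pdf(x) for p, L in zip(priors, likelihoods)]
        return int(np.argmax(scores))          # 0 -> decide w1, 1 -> decide w2

    print(decide(0.5), decide(1.8))            # 0 near the w1 mean, 1 near w2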

ECE 8527: Lecture 02, Slide 2: Evidence
The evidence, $p(x)$, is a scale factor that assures the conditional probabilities sum to 1: $p(x) = \sum_{j=1}^{2} p(x|\omega_j)\,P(\omega_j)$.
We can eliminate the scale factor (which appears on both sides of the decision rule): decide $\omega_1$ if $p(x|\omega_1)P(\omega_1) > p(x|\omega_2)P(\omega_2)$.
Special cases:
$p(x|\omega_1) = p(x|\omega_2)$: $x$ gives us no useful information; the decision is based entirely on the priors.
$P(\omega_1) = P(\omega_2)$: the decision is based entirely on the likelihood.
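Continuing the sketch above, a quick check that dividing by the evidence yields proper posteriors without changing the decision:

    def posteriors(x):
        """Full posteriors P(w_i|x): scale the joint densities by the evidence p(x)."""
        joint = np.array([p * L.pdf(x) for p, L in zip(priors, likelihoods)])
        evidence = joint.sum()             # p(x) = sum_j p(x|w_j) P(w_j)
        return joint / evidence            # guaranteed to sum to 1

    post = posteriors(0.5)
    print(post.sum())                      # 1.0
    print(np.argmax(post) == decide(0.5))  # True: same decision either way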

ECE 8527: Lecture 04, Slide 3: Discriminant Functions
Recall our discriminant function for minimum error rate classification: $g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i)$.
For a multivariate normal distribution: $g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$.
Consider the case $\Sigma_i = \sigma^2 I$ (statistical independence, equal variance, class-independent variance).
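A sketch of this discriminant in code (a generic multivariate-normal log-discriminant; all parameters below are invented):

    import numpy as np

    def gaussian_discriminant(x, mu, sigma, prior):
        """g_i(x) = -1/2 (x-mu)^t Sigma^-1 (x-mu) - d/2 ln(2 pi)
                    - 1/2 ln|Sigma| + ln P(w_i)"""
        d = len(mu)
        diff = x - mu
        return (-0.5 * diff @ np.linalg.solve(sigma, diff)
                - 0.5 * d * np.log(2 * np.pi)
                - 0.5 * np.log(np.linalg.det(sigma))
                + np.log(prior))

    x = np.array([1.0, 0.5])
    mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])
    sigma = np.eye(2)   # the special case Sigma_i = sigma^2 I with sigma = 1
    g1 = gaussian_discriminant(x, mu1, sigma, 0.5)
    g2 = gaussian_discriminant(x, mu2, sigma, 0.5)
    print("decide w1" if g1 > g2 else "decide w2")   # w1: x is closer to mu1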

ECE 8527: Lecture 04, Slide 4: Gaussian Classifiers
For $\Sigma_i = \sigma^2 I$, the discriminant function can be reduced to: $g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)$,
since the terms $-\frac{d}{2}\ln 2\pi$ and $-\frac{1}{2}\ln|\Sigma_i|$ are constant w.r.t. the maximization.
We can expand this: $g_i(x) = -\frac{1}{2\sigma^2}\left[x^t x - 2\mu_i^t x + \mu_i^t \mu_i\right] + \ln P(\omega_i)$.
The term $x^t x$ is constant w.r.t. $i$ (and can be dropped), and $\mu_i^t \mu_i$ is a constant that can be precomputed.

ECE 8527: Lecture 04, Slide 5: Linear Machines
We can use an equivalent linear discriminant function: $g_i(x) = w_i^t x + w_{i0}$, where $w_i = \frac{1}{\sigma^2}\mu_i$ and $w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)$.
$w_{i0}$ is called the threshold or bias for the i-th category.
A classifier that uses linear discriminant functions is called a linear machine.
The decision surfaces are defined by the equation $g_i(x) = g_j(x)$ for the two categories with the highest discriminant values.
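A sketch of the resulting linear machine under these assumptions (invented means, equal priors, $\sigma^2 = 1$); the midpoint check at the end previews the geometric interpretation on the next slide:

    import numpy as np

    sigma2 = 1.0
    means = np.array([[0.0, 0.0], [2.0, 2.0]])
    priors = np.array([0.5, 0.5])

    # Precompute weights and biases once; classification is then a dot product.
    W = means / sigma2                                             # w_i = mu_i / sigma^2
    b = -np.sum(means**2, axis=1) / (2 * sigma2) + np.log(priors)  # w_i0

    def linear_machine(x):
        return int(np.argmax(W @ x + b))

    print(linear_machine(np.array([0.2, 0.1])))  # 0: nearer the first mean

    # With equal priors, the two discriminants tie at the midpoint of the means:
    g = W @ means.mean(axis=0) + b
    print(np.isclose(g[0], g[1]))                # True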

ECE 8527: Lecture 04, Slide 6: Threshold Decoding
This has a simple geometric interpretation: when the priors are equal and the support regions are spherical, the decision boundary lies halfway between the means, and the rule reduces to assigning $x$ to the class with the nearest mean (Euclidean distance).

ECE 8527: Lecture 04, Slide 7: General Case for Gaussian Classifiers
In the general case: $g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$.
Three special cases are considered on the following slides: $\Sigma_i = \sigma^2 I$, $\Sigma_i = \Sigma$, and $\Sigma_i$ arbitrary.

ECE 8527: Lecture 04, Slide 8: Identity Covariance
Case: $\Sigma_i = \sigma^2 I$, so $g_i(x) = -\frac{\|x-\mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)$.
This can be rewritten as a linear discriminant: $g_i(x) = w_i^t x + w_{i0}$, with $w_i = \frac{1}{\sigma^2}\mu_i$ and $w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)$.

ECE 8527: Lecture 04, Slide 9: Equal Covariances
Case: $\Sigma_i = \Sigma$. Dropping the terms that do not depend on $i$: $g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) + \ln P(\omega_i)$.
This is again linear in $x$: $g_i(x) = w_i^t x + w_{i0}$, with $w_i = \Sigma^{-1}\mu_i$ and $w_{i0} = -\frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(\omega_i)$.

ECE 8527: Lecture 04, Slide 10: Arbitrary Covariances
Case: $\Sigma_i$ arbitrary. The discriminant is inherently quadratic: $g_i(x) = x^t W_i x + w_i^t x + w_{i0}$, where $W_i = -\frac{1}{2}\Sigma_i^{-1}$, $w_i = \Sigma_i^{-1}\mu_i$, and $w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$.
The resulting decision surfaces are hyperquadrics (hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids).
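A sketch of the quadratic (arbitrary-covariance) discriminant, with invented parameters:

    import numpy as np

    def quadratic_discriminant(x, mu, sigma, prior):
        """g_i(x) = x^t W_i x + w_i^t x + w_i0 for an arbitrary covariance Sigma_i."""
        sigma_inv = np.linalg.inv(sigma)
        W = -0.5 * sigma_inv
        w = sigma_inv @ mu
        w0 = (-0.5 * mu @ sigma_inv @ mu
              - 0.5 * np.log(np.linalg.det(sigma))
              + np.log(prior))
        return x @ W @ x + w @ x + w0

    x = np.array([1.0, 1.0])
    g1 = quadratic_discriminant(x, np.zeros(2), np.array([[1.0, 0.3], [0.3, 1.0]]), 0.5)
    g2 = quadratic_discriminant(x, np.ones(2) * 2, np.array([[2.0, 0.0], [0.0, 0.5]]), 0.5)
    print("decide w1" if g1 > g2 else "decide w2")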

ECE 8527: Lecture 04, Slide 11: Typical Examples of 2D Classifiers

ECE 8527: Lecture 04, Slide 12: Receiver Operating Characteristic (ROC)
How do we compare two decision rules if they require different thresholds for optimum performance?
Consider four probabilities, defined in terms of a decision threshold $x^*$:
a hit: $P(x > x^* \mid x \in \omega_2)$;
a false alarm: $P(x > x^* \mid x \in \omega_1)$;
a miss: $P(x < x^* \mid x \in \omega_2)$;
a correct rejection: $P(x < x^* \mid x \in \omega_1)$.
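A sketch of how an ROC curve can be traced by sweeping the decision threshold (one-dimensional Gaussian classes; all parameters invented):

    import numpy as np
    from scipy.stats import norm

    noise  = norm(loc=0.0, scale=1.0)      # p(x|w1): event absent
    signal = norm(loc=2.0, scale=1.0)      # p(x|w2): event present

    thresholds = np.linspace(-4, 6, 101)
    p_hit         = signal.sf(thresholds)  # P(x > x* | w2)
    p_false_alarm = noise.sf(thresholds)   # P(x > x* | w1)

    # Each threshold x* gives one (P_FA, P_hit) operating point on the ROC curve.
    for x_star, pfa, ph in zip(thresholds[::25], p_false_alarm[::25], p_hit[::25]):
        print(f"x* = {x_star:5.2f}  P(FA) = {pfa:.3f}  P(hit) = {ph:.3f}")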

ECE 8527: Lecture 04, Slide 13: General ROC Curves
An ROC curve is typically monotonic but not necessarily symmetric.
One system can be considered superior to another only if its ROC curve lies above that of the competing system throughout the operating region of interest.

ECE 8527: Lecture 04, Slide 14: Summary
Decision surfaces: the geometric interpretation of a Bayesian classifier.
Gaussian distributions: how is the shape of the distribution influenced by the mean and covariance?
Bayesian classifiers for Gaussian distributions: how do the decision surface and decision regions change as a function of the mean and covariance?
Bounds on performance (i.e., Chernoff, Bhattacharyya) are useful abstractions for obtaining closed-form solutions to problems.
A Receiver Operating Characteristic (ROC) curve is a very useful way to analyze performance and select operating points for systems.
Discrete features can be handled in a way completely analogous to continuous features.

ECE 8527: Lecture 04, Slide 15: Error Bounds
The Bayes decision rule guarantees the lowest average error rate, and a closed-form solution exists for two-class Gaussian distributions; however, the full calculation is difficult in high-dimensional spaces. Bounds provide a way to get insight into a problem and engineer better solutions.
We need the following inequality: $\min[a,b] \le a^\beta b^{1-\beta}$ for $a, b \ge 0$ and $0 \le \beta \le 1$.
Proof: assume $a > b$ without loss of generality, so $\min[a,b] = b$. Also, $a^\beta b^{1-\beta} = (a/b)^\beta\, b$ and $(a/b)^\beta \ge 1$. Therefore, $b \le (a/b)^\beta\, b$, which implies $\min[a,b] \le a^\beta b^{1-\beta}$.
Next, we apply this to our standard expression for $P(error)$.
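A quick numeric check of this inequality (arbitrary positive a, b and a grid of β values):

    import numpy as np

    a, b = 3.0, 0.7                       # arbitrary positive values with a > b
    betas = np.linspace(0.0, 1.0, 11)
    bound = a**betas * b**(1 - betas)     # a^beta * b^(1-beta)
    print(np.all(min(a, b) <= bound))     # True: min[a,b] <= a^beta b^(1-beta)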

ECE 8527: Lecture 04, Slide 16: Chernoff Bound
Recall: $P(error) = \int \min[P(\omega_1)\,p(x|\omega_1),\; P(\omega_2)\,p(x|\omega_2)]\, dx$.
Applying the inequality above: $P(error) \le P^\beta(\omega_1)\, P^{1-\beta}(\omega_2) \int p^\beta(x|\omega_1)\, p^{1-\beta}(x|\omega_2)\, dx$ for $0 \le \beta \le 1$.
Note that this integral is over the entire feature space, not the decision regions (which makes it simpler). If the conditional probabilities are normal, this expression can be simplified.

ECE 8527: Lecture 04, Slide 17: Chernoff Bound for Normal Densities
If the conditional probabilities are normal, our bound can be evaluated analytically: $\int p^\beta(x|\omega_1)\, p^{1-\beta}(x|\omega_2)\, dx = e^{-k(\beta)}$,
where: $k(\beta) = \frac{\beta(1-\beta)}{2}(\mu_2-\mu_1)^t \left[\beta\Sigma_1 + (1-\beta)\Sigma_2\right]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{|\beta\Sigma_1 + (1-\beta)\Sigma_2|}{|\Sigma_1|^\beta\, |\Sigma_2|^{1-\beta}}$.
Procedure: find the value of $\beta$ that minimizes $e^{-k(\beta)}$, and then compute $P(error)$ using the bound.
Benefit: a one-dimensional optimization over $\beta$.
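A sketch of that one-dimensional optimization (a simple grid search over β; the Gaussian parameters are invented):

    import numpy as np

    mu1, mu2 = np.zeros(2), np.array([2.0, 1.0])
    S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
    S2 = np.array([[1.5, 0.0], [0.0, 0.5]])
    P1, P2 = 0.5, 0.5

    def k(beta):
        """Chernoff exponent k(beta) for two Gaussian densities."""
        S = beta * S1 + (1 - beta) * S2
        dmu = mu2 - mu1
        quad = 0.5 * beta * (1 - beta) * dmu @ np.linalg.solve(S, dmu)
        logdet = 0.5 * np.log(np.linalg.det(S)
                              / (np.linalg.det(S1)**beta * np.linalg.det(S2)**(1 - beta)))
        return quad + logdet

    betas = np.linspace(0.01, 0.99, 99)
    bounds = P1**betas * P2**(1 - betas) * np.exp(-np.array([k(b) for b in betas]))
    best = np.argmin(bounds)
    print(f"beta* = {betas[best]:.2f}, Chernoff bound on P(error) = {bounds[best]:.4f}")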

ECE 8527: Lecture 04, Slide 18: Bhattacharyya Bound
The Chernoff bound is loose for extreme values of $\beta$. The Bhattacharyya bound is obtained by setting $\beta = 0.5$: $P(error) \le \sqrt{P(\omega_1)P(\omega_2)}\, e^{-k(1/2)}$,
where: $k(1/2) = \frac{1}{8}(\mu_2-\mu_1)^t \left[\frac{\Sigma_1+\Sigma_2}{2}\right]^{-1} (\mu_2-\mu_1) + \frac{1}{2}\ln\frac{\left|\frac{\Sigma_1+\Sigma_2}{2}\right|}{\sqrt{|\Sigma_1|\,|\Sigma_2|}}$.
These bounds can still be used if the distributions are not Gaussian (why? hint: Occam's Razor). However, they might not be adequately tight.
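Continuing the sketch above, the Bhattacharyya bound is just the Chernoff expression evaluated at β = 0.5:

    # Reuses k(), P1, P2, bounds, and best from the previous sketch.
    bhatta = np.sqrt(P1 * P2) * np.exp(-k(0.5))
    print(f"Bhattacharyya bound on P(error) = {bhatta:.4f}")
    # By construction it can never be tighter than the optimized Chernoff bound:
    print(bhatta >= bounds[best])   # True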