Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Slides:



Advertisements
Similar presentations
Component Analysis (Review)
Advertisements

0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.
Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.
Pattern Classification. Chapter 2 (Part 1): Bayesian Decision Theory (Sections ) Introduction Bayesian Decision Theory–Continuous Features.
Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.
Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.
Pattern Classification Chapter 2 (Part 2)0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Bayesian Decision Theory Chapter 2 (Duda et al.) – Sections
Prénom Nom Document Analysis: Parameter Estimation for Pattern Recognition Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Chapter 5: Linear Discriminant Functions
0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification, Chapter 3 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P.
Clustering.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Chapter 2 (part 3) Bayesian Decision Theory Discriminant Functions for the Normal Density Bayes Decision Theory – Discrete Features All materials used.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Visual Recognition Tutorial1 Random variables, distributions, and probability density functions Discrete Random Variables Continuous Random Variables.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
METU Informatics Institute Min 720 Pattern Classification with Bio-Medical Applications PART 2: Statistical Pattern Classification: Optimal Classification.
Probability of Error Feature vectors typically have dimensions greater than 50. Classification accuracy depends upon the dimensionality and the amount.
0 Pattern Classification, Chapter 3 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda,
Principles of Pattern Recognition
Speech Recognition Pattern Classification. 22 September 2015Veton Këpuska2 Pattern Classification  Introduction  Parametric classifiers  Semi-parametric.
ECE 8443 – Pattern Recognition LECTURE 03: GAUSSIAN CLASSIFIERS Objectives: Normal Distributions Whitening Transformations Linear Discriminants Resources.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 02: BAYESIAN DECISION THEORY Objectives: Bayes.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 03: GAUSSIAN CLASSIFIERS Objectives: Whitening.
1 E. Fatemizadeh Statistical Pattern Recognition.
Bayesian Decision Theory (Classification) 主講人:虞台文.
ECE 8443 – Pattern Recognition LECTURE 08: DIMENSIONALITY, PRINCIPAL COMPONENTS ANALYSIS Objectives: Data Considerations Computational Complexity Overfitting.
Chapter 3: Maximum-Likelihood Parameter Estimation l Introduction l Maximum-Likelihood Estimation l Multivariate Case: unknown , known  l Univariate.
1 Bayesian Decision Theory Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Discriminant Analysis
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 10: PRINCIPAL COMPONENTS ANALYSIS Objectives:
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Objectives: Normal Random Variables Support Regions Whitening Transformations Resources: DHS – Chap. 2 (Part 2) K.F. – Intro to PR X. Z. – PR Course S.B.
Pattern Classification All materials in these slides* were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification Chapter 2(Part 3) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Lecture 2. Bayesian Decision Theory
Chapter 3: Maximum-Likelihood Parameter Estimation
LECTURE 04: DECISION SURFACES
LECTURE 09: BAYESIAN ESTIMATION (Cont.)
LECTURE 10: DISCRIMINANT ANALYSIS
Probability theory retro
LECTURE 03: DECISION SURFACES
CH 5: Multivariate Methods
Pattern Classification, Chapter 3
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
ECE 417 Lecture 4: Multivariate Gaussians
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Generally Discriminant Analysis
Bayesian Classification
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
LECTURE 09: DISCRIMINANT ANALYSIS
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Presentation transcript:

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 Pattern Classification, Chapter 2 (Part 2)

Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density

Minimum-Error-Rate Classification Actions are decisions on classes If action i is taken and the true state of nature is j then: the decision is correct if i = j and in error if i  j Seek a decision rule that minimizes the probability of error which is the error rate Pattern Classification, Chapter 2 (Part 2)

Introduction of the zero-one loss function: Therefore, the conditional risk is: “The risk corresponding to this loss function is the average probability error”  Pattern Classification, Chapter 2 (Part 2)

Minimize the risk requires maximize P(i | x) (since R(i | x) = 1 – P(i | x)) For Minimum error rate Decide i if P (i | x) > P(j | x) j  i Pattern Classification, Chapter 2 (Part 2)

Regions of decision and zero-one loss function, therefore: If  is the zero-one loss function wich means: Pattern Classification, Chapter 2 (Part 2)

Pattern Classification, Chapter 2 (Part 2)

Classifiers, Discriminant Functions and Decision Surfaces The multi-category case Set of discriminant functions gi(x), i = 1,…, c The classifier assigns a feature vector x to class i if: gi(x) > gj(x) j  i Pattern Classification, Chapter 2 (Part 2)

Pattern Classification, Chapter 2 (Part 2)

(max. discriminant corresponds to min. risk!) Let gi(x) = - R(i | x) (max. discriminant corresponds to min. risk!) For the minimum error rate, we take gi(x) = P(i | x) (max. discrimination corresponds to max. posterior!) gi(x)  P(x | i) P(i) gi(x) = ln P(x | i) + ln P(i) (ln: natural logarithm!) Pattern Classification, Chapter 2 (Part 2)

Feature space divided into c decision regions if gi(x) > gj(x) j  i then x is in Ri (Ri means assign x to i) The two-category case A classifier is a “dichotomizer” that has two discriminant functions g1 and g2 Let g(x)  g1(x) – g2(x) Decide 1 if g(x) > 0 ; Otherwise decide 2 Pattern Classification, Chapter 2 (Part 2)

The computation of g(x) Pattern Classification, Chapter 2 (Part 2)

Pattern Classification, Chapter 2 (Part 2)

The Normal Density Univariate density Where: Density which is analytically tractable Continuous density A lot of processes are asymptotically Gaussian Handwritten characters, speech sounds are ideal or prototype corrupted by random process (central limit theorem) Where:  = mean (or expected value) of x 2 = expected squared deviation or variance Pattern Classification, Chapter 2 (Part 2)

Pattern Classification, Chapter 2 (Part 2)

Multivariate density where: Multivariate normal density in d dimensions is: where: x = (x1, x2, …, xd)t (t stands for the transpose vector form)  = (1, 2, …, d)t mean vector  = d*d covariance matrix || and -1 are determinant and inverse respectively Pattern Classification, Chapter 2 (Part 2)

For simplicity, we often abbreviate Eq. 37 as p(x) ∼ N(μ,Σ) For simplicity, we often abbreviate Eq. 37 as p(x) ∼ N(μ,Σ). Formally, we have μ ≡ E[x] = xp(x) dx And In other words, if xi is the ith component of x, μi the ith component of μ, and σij the ijth component of Σ, then Pattern Classification, Chapter 2 (Part 2)

If and A is a d-by-k matrix and is a k-component vector, then Σ is positive definite, so that the determinant of Σ is strictly positive. The diagonal elements σii are the variances of the respective xi and the off-diagonal elements σij are the covariances of xi and covariance xj . We would expect a positive covariance for the length and weight features of a population of fish, for instance. If xi and xj are statistically independent, σij = 0. Linear combinations of jointly normally distributed random variables, independent or not, are normally distributed. If and A is a d-by-k matrix and is a k-component vector, then Pattern Classification, Chapter 2 (Part 2)

It is sometimes convenient to perform a coordinate transformation that converts an arbitrary multivariate normal distribution into a spherical one, i.e., one having a covariance matrix proportional to the identity matrix I. If we define Φ to be the matrix whose columns are the orthonormal eigenvectors of Σ, and Λ the diagonal matrix of the corresponding eigenvalues, then the transformation applied to the coordinates insures that the transformed distribution has covariance matrix equal to the identity matrix. In signal processing, the transform Aw is called a whitening transformation. Pattern Classification, Chapter 2 (Part 2)

Multivariate Gaussian function The loci of points of constant density are hyperellipsoids for which the quadratic form is constant. The principal axes of these hyperellipsoids are given by the eigenvectors of Σ . The quantity is called the squared Mahalanobis distance from x to μ. The volume of these hyperellipsoids measures the scatter of the samples about the mean.