Bayesian Classification



Presentation on theme: "Bayesian Classification"— Presentation transcript:

1 Bayesian Classification
A reference

2 Example 1:

3 Example 1 (continued)
Overall objective: count the number of people on the beach.
Intermediate objectives:
Reduce the search space.
Segment the image into three zones (classes): surf, beach, and building.

4 Example 1 (continued)
Consider a randomly selected pixel x from the image.
Suppose the a priori probabilities with respect to the three classes are:
P(x is in the building area) ≈ 0.17
P(x is in the beach area) ≈ 0.58
P(x is in the surf area) ≈ 0.25
What decision rule minimizes error?
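With only these priors available, the minimum-error rule is to assign every pixel to the most probable class; a quick sketch of the consequence, using the numbers above:

\[
\text{decide beach, since } P(\text{beach}) = 0.58 > P(\text{surf}) = 0.25 > P(\text{building}) = 0.17,
\qquad P(\text{error}) = 1 - 0.58 = 0.42 .
\]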

5 Example 1:
Suppose additional information regarding a property (such as color, brightness, or variability) of the pixel (or its neighborhood) is available. Can such knowledge aid classification?
What is p(the pixel x came from the beach area, given the pixel is red), i.e., P(beach | red)?

6 Example 1:
Consider the hypothetical, regional color distributions of hue h.

7 Example 1:
The joint probability that a randomly selected pixel is from the beach area and has hue h is written
p(beach, h) = p(h | beach) P(beach) = P(beach | h) p(h).
Solving for P(beach | h) we get
P(beach | h) = p(h | beach) P(beach) / p(h), where
p(h) = p(h | building) P(building) + p(h | beach) P(beach) + p(h | surf) P(surf).
Notes:
1. For a card-deck analogy with a priori probabilities, consider p(card is red and a king): p(red and king) = 2/52 = p(red | king) p(king) = (2/4)(4/52) = p(king | red) p(red) = (2/26)(26/52) = 1/26.
2. p(h) merely accumulates a weighted average of the occurrences of hue h, since the areas are assumed to be mutually exclusive (non-overlapping) in this example.
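As a concrete, purely hypothetical illustration of this computation, the sketch below uses the priors from the earlier slide and invented values for the hue likelihoods p(h | class):

```python
# Hypothetical sketch of the slide's Bayes computation.
# Priors are the ones given earlier; the likelihood values are invented for illustration.
priors = {"building": 0.17, "beach": 0.58, "surf": 0.25}
likelihood_h = {"building": 0.05, "beach": 0.40, "surf": 0.10}   # p(h | class), hypothetical

# Evidence: p(h) = sum over classes of p(h | class) * P(class)
p_h = sum(likelihood_h[c] * priors[c] for c in priors)

# Posterior for each class: P(class | h) = p(h | class) * P(class) / p(h)
posteriors = {c: likelihood_h[c] * priors[c] / p_h for c in priors}
print(posteriors)   # the posteriors sum to 1; the largest one drives the decision
```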

8 A General Formulation
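The general formulation presumably intended here is the usual statement of Bayes' rule for a class ωi and a measurement x:

\[
P(\omega_i \mid x) \;=\; \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)},
\qquad
p(x) \;=\; \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j).
\]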

9 A Casual Formulation
The prior probability reflects knowledge of the relative frequency of instances of a class.
The likelihood is a measure of the probability that a measurement value occurs in a class.
The evidence is a scaling term.

10 Forming a Classifier
Create discriminant functions gi(x) for each class i = 1,…,c.
The discriminants are not unique.
They partition the measurement space with crisp boundaries.
Assign x to class k if gk(x) > gj(x) for all j ≠ k.
For a minimum-error classifier, gi(x) = P(ωi | x).
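A minimal sketch of this decision rule; the discriminant functions used here are hypothetical stand-ins, not the ones derived later:

```python
# Assign x to the class whose discriminant value is largest.
def classify(x, discriminants):
    """discriminants: dict mapping class label -> callable g_i(x)."""
    return max(discriminants, key=lambda label: discriminants[label](x))

# Hypothetical one-dimensional discriminants for two classes.
g = {"w1": lambda x: -abs(x - 1.0), "w2": lambda x: -abs(x - 5.0)}
print(classify(2.0, g))   # -> "w1", since 2.0 is closer to 1.0 than to 5.0
```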

11 Equivalent Discriminants
If f is monotone increasing, the collection hi(x) = f(gi(x)), i = 1,…,c forms an equivalent family of discriminant functions; for example:
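One standard instance (assuming the minimum-error discriminant from the previous slide) takes f to be the natural logarithm:

\[
h_i(x) \;=\; \ln g_i(x) \;=\; \ln P(\omega_i \mid x)
\;=\; \ln p(x \mid \omega_i) + \ln P(\omega_i) - \ln p(x),
\]

and since ln p(x) is the same for every class it can be dropped, leaving the equivalent discriminant ln p(x | ωi) + ln P(ωi).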

12 Gaussian Distributions
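For reference, the density these slides rely on is presumably the standard multivariate normal with mean vector μ and covariance matrix Σ in d dimensions:

\[
p(x) \;=\; \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
\exp\!\left[-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right].
\]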

13 Gaussian Distributions Details

14 Discriminants for Normal Density
Recall the classifier functions. Assuming the measurements are normally distributed, we have:
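Under that assumption, the (equivalent, scaled) minimum-error discriminant presumably takes the form

\[
g_i(x) \;=\; p(x \mid \omega_i)\, P(\omega_i)
\;=\; \frac{P(\omega_i)}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}}
\exp\!\left[-\tfrac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)\right],
\]

where μi and Σi are the mean vector and covariance matrix of class ωi.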

15 Some Algebra to Simplify the Discriminants
Since the natural logarithm is monotone increasing, we may take the natural logarithm to re-write the first term:

16 Some Algebra to Simplify the Discriminants (continued)

17 The Discriminants (Finally!!)
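The standard end point of this algebra (assuming the Gaussian form above) is

\[
g_i(x) \;=\; -\tfrac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i)
\;-\; \tfrac{d}{2}\ln 2\pi \;-\; \tfrac{1}{2}\ln|\Sigma_i| \;+\; \ln P(\omega_i),
\]

where the constant term (d/2) ln 2π is common to all classes and is usually dropped.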

18 Special Case 1: Σi = σ²I

19 Special Case 1: Σi = σ²I
If the classes are equally likely, the discriminants depend only upon the distances to the means.
A diagonal covariance matrix implies the measurement components are statistically independent.
A constant diagonal implies the class measurements have identical variability in each dimension; hence the distributions are spherical in d-dimensional space.
The discriminant functions define hyperplanes orthogonal to the line segments joining the distribution means.
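In this case the general discriminant above reduces (dropping class-independent terms) to

\[
g_i(x) \;=\; -\frac{\lVert x-\mu_i\rVert^{2}}{2\sigma^{2}} \;+\; \ln P(\omega_i)
\;=\; \frac{1}{\sigma^{2}}\mu_i^{\top}x \;-\; \frac{1}{2\sigma^{2}}\mu_i^{\top}\mu_i \;+\; \ln P(\omega_i),
\]

where the second form follows because the xᵀx term is the same for every class; the result is linear in x.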

20 Special Case 1: Σi = σ²I

21 Special Case 2: Σi = Σ

22 Special Case 2: Σi = Σ
Since Σ may possess nonzero off-diagonal elements and varying diagonal elements, the measurement distributions lie in hyper-ellipsoids.
The discriminant hyperplanes are often not orthogonal to the segments joining the class means.

23 Special Case 2: Σi = Σ
The quadratic term is independent of i and may be eliminated.
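Concretely (again assuming the general Gaussian discriminant above, with class-independent terms dropped), when every class shares Σ the discriminant

\[
g_i(x) \;=\; -\tfrac{1}{2}(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i) \;+\; \ln P(\omega_i)
\]

expands so that the quadratic piece xᵀΣ⁻¹x is the same for every class; dropping it leaves the linear discriminant

\[
g_i(x) \;=\; \mu_i^{\top}\Sigma^{-1}x \;-\; \tfrac{1}{2}\mu_i^{\top}\Sigma^{-1}\mu_i \;+\; \ln P(\omega_i).
\]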

24 Case 3: Σi arbitrary
This discriminant is quadratic in x.
The decision surfaces can arise from hyperplanes, hyperparaboloids, hyperellipsoids, hyperspheres, or combinations of these!!!
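Writing out the general discriminant makes the quadratic dependence explicit (standard form, assuming the Gaussian discriminant above):

\[
g_i(x) \;=\; x^{\top}W_i\,x \;+\; w_i^{\top}x \;+\; w_{i0},
\qquad
W_i = -\tfrac{1}{2}\Sigma_i^{-1},\quad
w_i = \Sigma_i^{-1}\mu_i,\quad
w_{i0} = -\tfrac{1}{2}\mu_i^{\top}\Sigma_i^{-1}\mu_i - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i).
\]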

25 Example 2: A Problem
Exemplars (transposed):
For w1 = {(2, 6), (3, 4), (3, 8), (4, 6)}
For w2 = {(1, -2), (3, 0), (3, -4), (5, -2)}
Calculated means (transposed):
m1 = (3, 6)
m2 = (3, -2)

26 Example 2: Covariance Matrices

27 Example 2: Covariance Matrices

28 Example 2: Inverse and Determinant for Each of the Covariance Matrices
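One way to compute the covariance matrices, their inverses, and their determinants from the exemplars above; this sketch assumes the 1/n (maximum-likelihood) covariance estimate, and the slides may instead have used 1/(n-1):

```python
import numpy as np

# Exemplars from the Example 2 problem statement (rows are measurement vectors).
w1 = np.array([[2, 6], [3, 4], [3, 8], [4, 6]], dtype=float)
w2 = np.array([[1, -2], [3, 0], [3, -4], [5, -2]], dtype=float)

for name, w in (("class 1", w1), ("class 2", w2)):
    m = w.mean(axis=0)                 # sample mean
    dev = w - m                        # deviations from the mean
    cov = dev.T @ dev / len(w)         # covariance, 1/n normalization (an assumption here)
    print(name, "mean:", m)
    print(name, "covariance:\n", cov)
    print(name, "inverse:\n", np.linalg.inv(cov))
    print(name, "determinant:", np.linalg.det(cov))
```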

29 Example 2: A Discriminant Function for Class 1
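With the 1/n estimates from the sketch above (Σ1 = diag(1/2, 2), so |Σ1| = 1) and equal priors assumed, the class 1 discriminant, up to class-independent constants, would be

\[
g_1(x) \;=\; -\tfrac{1}{2}(x-m_1)^{\top}\Sigma_1^{-1}(x-m_1) - \tfrac{1}{2}\ln|\Sigma_1|
\;=\; -(x_1-3)^{2} \;-\; \tfrac{1}{4}(x_2-6)^{2}.
\]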

30 Example 2

31 Example 2: A Discriminant Function for Class 2
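Under the same assumptions (Σ2 = diag(2, 2), so |Σ2| = 4), the class 2 discriminant would be

\[
g_2(x) \;=\; -\tfrac{1}{2}(x-m_2)^{\top}\Sigma_2^{-1}(x-m_2) - \tfrac{1}{2}\ln|\Sigma_2|
\;=\; -\tfrac{1}{4}\left[(x_1-3)^{2} + (x_2+2)^{2}\right] \;-\; \ln 2.
\]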

32 Example 2

33 Example 2: The Class Boundary

34 Example 2: A Quadratic Separator
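Continuing with the same assumptions (equal priors, 1/n covariance estimates), setting g1(x) = g2(x) gives one candidate form for the quadratic separator:

\[
g_1(x) - g_2(x) \;=\; -\tfrac{3}{4}(x_1-3)^{2} + 4x_2 - 8 + \ln 2 \;=\; 0
\quad\Longrightarrow\quad
x_2 \;=\; \tfrac{3}{16}(x_1-3)^{2} + 2 - \tfrac{\ln 2}{4},
\]

a parabola separating the cluster around m1 = (3, 6) from the cluster around m2 = (3, -2).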

35 Example 2: Plot of the Discriminant

36 Summary Steps for Building a Bayesian Classifier
Collect class exemplars.
Estimate class a priori probabilities.
Estimate class means.
Form covariance matrices; find the inverse and determinant for each.
Form the discriminant function for each class.
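A compact sketch of these steps; the function names and the 1/n covariance estimate are choices made here, not taken from the slides:

```python
import numpy as np

def fit_bayes_classifier(exemplars):
    """exemplars: dict mapping class label -> list of measurement vectors."""
    n_total = sum(len(v) for v in exemplars.values())
    params = {}
    for label, rows in exemplars.items():
        X = np.asarray(rows, dtype=float)
        mean = X.mean(axis=0)                      # class mean
        dev = X - mean
        cov = dev.T @ dev / len(X)                 # covariance (1/n estimate)
        params[label] = {
            "prior": len(X) / n_total,             # a priori probability from class frequency
            "mean": mean,
            "inv_cov": np.linalg.inv(cov),
            "log_det": np.log(np.linalg.det(cov)),
        }
    return params

def discriminant(x, p):
    # g_i(x) = -1/2 (x - m)^T S^{-1} (x - m) - 1/2 ln|S| + ln P(class)
    d = np.asarray(x, dtype=float) - p["mean"]
    return -0.5 * d @ p["inv_cov"] @ d - 0.5 * p["log_det"] + np.log(p["prior"])
```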

37 Using the Classifier
Obtain a measurement vector x.
Evaluate the discriminant function gi(x) for each class i = 1,…,c.
Decide x is in class j if gj(x) > gi(x) for all i ≠ j.

