
1 Pattern Recognition PhD Course

2 Automatic Letter Recognition
Steps for letter recognition:
1. Create a training set:
   - Separate the characters from the text
   - Compute the feature vector for each separated character
2. Classify the unidentified characters using the training set (a feature-extraction sketch follows below)
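As an illustration of step 1, here is a minimal Python sketch of one possible feature vector for a segmented binary character image. The "zoning" feature (mean ink density on a 4x4 grid) and the grid size are illustrative assumptions, not the course's prescribed features.

```python
import numpy as np

def zoning_features(char_img, grid=(4, 4)):
    """Mean ink density in each cell of a grid laid over a binary
    character image: a 16-dimensional feature vector for grid=(4, 4).
    ('Zoning' is an illustrative choice of feature, not the course's.)"""
    h, w = char_img.shape
    gh, gw = grid
    feats = [char_img[i * h // gh:(i + 1) * h // gh,
                      j * w // gw:(j + 1) * w // gw].mean()
             for i in range(gh) for j in range(gw)]
    return np.array(feats)

# Step 1 then yields (feature vector, label) pairs, e.g.:
# training_set = [(zoning_features(img), label) for img, label in segmented_chars]
```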

3 The training set?

4 Pattern Recognition
$X$: a $p$-dimensional random vector, the feature vector
$Y$: a discrete random variable, the class label, with range $R_Y = \{1, 2, \dots, M\}$
Decision function: $g: \mathbb{R}^p \to R_Y$
If $g(X) \neq Y$, then the decision makes an error.

5 In the formulation of the Bayes decision problem, introduce a cost function
𝐶(𝑦,𝑦′)≥0 which is the cost if the label Y = y and the decision g( 𝑋 ) = y’ . For a decision function g, the risk is the expectation of the cost: 𝑅 𝑔 =𝐄(C(y),g( 𝑋 )) In Bayes decision problem, the aim is to minimize the risk, i.e., the goal is to find a function 𝑔 ∗ : ℝ 𝑝 → 1,2,…,𝑀 such that 𝑅 𝑔 ∗ = min 𝑔: ℝ 𝑝 ⟶ 1,2,…𝑀 𝑅(𝑔) where 𝑔 ∗ is called the Bayes decision function, and 𝑅 ∗ =𝑅( 𝑔 ∗ ) is the Bayes risk

6 For the posterior probabilities, introduce the notation
$$P_y(x) = \mathbf{P}(Y = y \mid X = x).$$
Let the decision function $g^*$ be defined by
$$g^*(x) = \arg\min_{y'} \sum_{y=1}^{M} C(y, y')\, P_y(x).$$
If the arg min is not unique, then choose the smallest $y'$ that minimizes the sum. This definition implies that for any decision function $g$,
$$\sum_{y=1}^{M} C\big(y, g^*(x)\big)\, P_y(x) \le \sum_{y=1}^{M} C\big(y, g(x)\big)\, P_y(x).$$
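A minimal sketch of this rule in Python, assuming the posteriors $P_y(x)$ and the cost matrix $C$ are already available as arrays (both are hypothetical inputs here; labels are 0-indexed):

```python
import numpy as np

def bayes_decision(posteriors, cost):
    """General Bayes decision for one observation.

    posteriors: shape (M,),   posteriors[y]  = P_y(x)
    cost:       shape (M, M), cost[y, y2]    = C(y, y2)

    Returns the label y' minimizing sum_y C(y, y') * P_y(x).
    np.argmin breaks ties by the smallest index, matching the
    tie-breaking convention on the slide.
    """
    expected_cost = cost.T @ posteriors   # entry y' = sum_y C(y, y') P_y(x)
    return int(np.argmin(expected_cost))
```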

7 Theorem. For any decision function $g$, we have that $R(g^*) \le R(g)$.
Proof. For a decision function $g$, let's calculate the risk:
$$R(g) = \mathbf{E}\big[C(Y, g(X))\big] = \mathbf{E}\Big[\mathbf{E}\big[C(Y, g(X)) \mid X\big]\Big] = \mathbf{E}\Big[\sum_{y=1}^{M}\sum_{y'=1}^{M} C(y, y')\,\mathbf{P}\big(Y = y,\, g(X) = y' \mid X\big)\Big]$$
$$= \mathbf{E}\Big[\sum_{y=1}^{M}\sum_{y'=1}^{M} C(y, y')\, I_{\{g(X) = y'\}}\,\mathbf{P}(Y = y \mid X)\Big] = \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g(X)\big)\, P_y(X)\Big].$$
This implies that
$$R(g) = \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g(X)\big)\, P_y(X)\Big] \ge \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g^*(X)\big)\, P_y(X)\Big] = R(g^*).$$

8 Concerning the cost function, the most frequently studied example is the so-called 0-1 loss:
$$C(y, y') = \begin{cases} 1 & \text{if } y \neq y' \\ 0 & \text{if } y = y'. \end{cases}$$
For the 0-1 loss, the corresponding risk is the error probability:
$$R(g) = \mathbf{E}\big[C(Y, g(X))\big] = \mathbf{E}\big[I_{\{Y \neq g(X)\}}\big] = \mathbf{P}\big(Y \neq g(X)\big),$$
and the Bayes decision is of the form
$$g^*(x) = \arg\min_{y'} \sum_{y \neq y'} P_y(x) = \arg\max_{y'} P_{y'}(x),$$
which is also called the maximum a posteriori (MAP) decision.
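Under the 0-1 loss the general rule above reduces to picking the largest posterior, since minimizing $\sum_{y \neq y'} P_y(x) = 1 - P_{y'}(x)$ is the same as maximizing $P_{y'}(x)$. A one-line sketch, using the same hypothetical `posteriors` array as before:

```python
import numpy as np

def map_decision(posteriors):
    """MAP decision for the 0-1 loss: arg max_y' P_y'(x).
    Equivalent to bayes_decision(posteriors, 1 - np.eye(M))."""
    return int(np.argmax(posteriors))

# Sanity check against the general rule from the previous sketch:
# M = len(posteriors)
# assert map_decision(posteriors) == bayes_decision(posteriors, 1 - np.eye(M))
```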

9 If the distribution of the observation vector $X$ has a density, then the Bayes decision has an equivalent formulation. Denote the density of $X$ by $f$:
$$\mathbf{P}(X \in B) = \int_B f(x)\, dx,$$
the conditional densities by $f_y$:
$$\mathbf{P}(X \in B \mid Y = y) = \int_B f_y(x)\, dx,$$
and the a priori probabilities by $q_y = \mathbf{P}(Y = y)$. Then it is easy to check that
$$P_y(x) = \mathbf{P}(Y = y \mid X = x) = \frac{q_y f_y(x)}{f(x)},$$
where $f(x) = \sum_{y=1}^{M} q_y f_y(x)$.
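A sketch of this Bayes-rule computation in Python, assuming (purely for illustration) two Gaussian class densities via scipy; the priors, means, and covariances below are made-up numbers:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-class example in R^2.
priors = np.array([0.6, 0.4])                       # q_y
densities = [
    multivariate_normal(mean=[0, 0], cov=np.eye(2)),  # f_1
    multivariate_normal(mean=[2, 1], cov=np.eye(2)),  # f_2
]

def posteriors(x):
    """P_y(x) = q_y f_y(x) / f(x), with f(x) = sum_y q_y f_y(x)."""
    joint = np.array([q * d.pdf(x) for q, d in zip(priors, densities)])
    return joint / joint.sum()

print(posteriors([1.0, 0.5]))   # entries sum to 1
```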

10 and therefore
$$g^*(x) = \arg\min_{y'} \sum_{y=1}^{M} C(y, y')\, q_y f_y(x),$$
since dividing by $f(x)$ does not change the arg min. From the proof of the Theorem we may derive a formula for the optimal risk:
$$R^* = R(g^*) = \mathbf{E}\Big[\min_{y'} \sum_{y=1}^{M} C(y, y')\, P_y(X)\Big].$$

11 If $X$ has a density, then
$$R^* = \int \min_{y'} \sum_{y=1}^{M} C(y, y')\, q_y f_y(x)\, dx.$$
For the 0-1 loss, we get that
$$R^* = \mathbf{E}\Big[1 - \max_{y} P_y(X)\Big],$$
which has the form, for densities,
$$R^* = 1 - \int \max_{y}\, q_y f_y(x)\, dx.$$
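A quick numerical sketch of the last formula for two one-dimensional Gaussian classes; the priors, means, and variances are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

q = np.array([0.5, 0.5])                                   # priors (made up)
f = [norm(loc=0.0, scale=1.0), norm(loc=2.0, scale=1.0)]   # class densities

# R* = 1 - integral of max_y q_y f_y(x) dx, via a simple Riemann sum.
x, dx = np.linspace(-10.0, 12.0, 20001, retstep=True)
integrand = np.maximum(q[0] * f[0].pdf(x), q[1] * f[1].pdf(x))
bayes_risk = 1.0 - integrand.sum() * dx
print(bayes_risk)   # about 0.1587 = Phi(-1) for this symmetric setup
```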

12 Multivariate Normal Distribution
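For reference, the density of the $p$-variate normal distribution $N(\mu, \Sigma)$ with nonsingular covariance matrix $\Sigma$ (a standard formula, restated here since the slides that follow are graphical):
$$f(x) = \frac{1}{(2\pi)^{p/2}\, |\Sigma|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu)\Big), \qquad x \in \mathbb{R}^p.$$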

38 Linear Combinations
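For reference, the standard linear-combination property this title refers to: if $X \sim N_p(\mu, \Sigma)$, $A$ is a $q \times p$ matrix, and $b \in \mathbb{R}^q$, then
$$AX + b \sim N_q\big(A\mu + b,\; A \Sigma A^{\mathsf{T}}\big).$$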

39 MVN Properties

48 Discriminant Analysis (DA)

52 That is, in the multivariate normal case, we can reach the minimal risk!
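A minimal sketch of the resulting plug-in rule: estimate each class's prior, mean, and covariance from the training set, then take the MAP decision with the fitted Gaussian densities. This is quadratic discriminant analysis in its textbook form, assumed here as an illustration (the exact variant on the preceding slides is not in the transcript):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate (prior, mean, covariance) per class from training data.
    X: shape (n, p) feature matrix, y: shape (n,) labels."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = (len(Xc) / len(X), Xc.mean(axis=0),
                         np.cov(Xc, rowvar=False))
    return params

def classify(x, params):
    """arg max_y [ log q_y + log N(x; mu_y, Sigma_y) ]: the Bayes (MAP)
    decision with plugged-in Gaussian densities. The constant term
    -(p/2) log(2*pi) is dropped; it does not affect the arg max."""
    best, best_score = None, -np.inf
    for label, (q, mu, cov) in params.items():
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        score = np.log(q) - 0.5 * (logdet + d @ np.linalg.solve(cov, d))
        if score > best_score:
            best, best_score = label, score
    return best
```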

68 A goodness-of-fit parameter, Wilks' lambda, is defined as follows:
$$\Lambda = \prod_{j=1}^{m} \frac{1}{1 + \lambda_j},$$
where $\lambda_j$ is the $j$th eigenvalue corresponding to the eigenvector described above, and $m$ is the minimum of $C - 1$ and $p$.

Wilks' Lambda Test
Wilks' lambda tests which variables contribute significantly to the discriminant function. The closer Wilks' lambda is to 0, the more the variable contributes to the discriminant function. The table also provides a chi-square statistic to test the significance of Wilks' lambda. If the p-value is less than 0.05, we can conclude that the corresponding function explains the group membership well.
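A sketch of the computation, assuming the eigenvalues of the within/between scatter problem are already available. Bartlett's chi-square approximation is used below as the test statistic, a standard choice assumed here since the slide does not name the exact statistic in its table:

```python
import numpy as np
from scipy.stats import chi2

def wilks_lambda_test(eigvals, n, p, C):
    """Wilks' lambda from the m = min(C-1, p) eigenvalues, with
    Bartlett's chi-square approximation (assumed; see lead-in).

    eigvals: the eigenvalues lambda_j described above
    n: number of observations, p: number of variables, C: number of groups
    Returns (lambda, chi-square statistic, p-value).
    """
    lam = np.prod(1.0 / (1.0 + np.asarray(eigvals)))
    stat = -(n - 1 - (p + C) / 2.0) * np.log(lam)   # Bartlett's statistic
    df = p * (C - 1)
    return lam, stat, chi2.sf(stat, df)
```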

