
1 Pattern Recognition PhD Course

2 Automatic Letter Recognition
Steps for letter recognition:
1. Create a training set:
   - Separate the characters from the text
   - Compute the feature vector for each separated character
2. Classify the unidentified characters using the training set (a feature-extraction sketch follows below)
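As an illustration of step 1, here is a minimal Python sketch of one possible feature vector for a segmented binary character image. The "zoning" feature (mean ink density on a 4x4 grid) and the grid size are illustrative assumptions, not the course's prescribed features.

```python
import numpy as np

def zoning_features(char_img, grid=(4, 4)):
    """Mean ink density in each cell of a grid laid over a binary
    character image: a 16-dimensional feature vector for grid=(4, 4).
    ('Zoning' is an illustrative choice of feature, not the course's.)"""
    h, w = char_img.shape
    gh, gw = grid
    feats = [char_img[i * h // gh:(i + 1) * h // gh,
                      j * w // gw:(j + 1) * w // gw].mean()
             for i in range(gh) for j in range(gw)]
    return np.array(feats)

# Step 1 then yields (feature vector, label) pairs, e.g.:
# training_set = [(zoning_features(img), label) for img, label in segmented_chars]
```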

3 The training set?

4 Pattern Recognition
$X$: a $p$-dimensional random vector, the feature vector
$Y$: a discrete random variable, the class label, with range $R_Y = \{1, 2, \dots, M\}$
Decision function: $g: \mathbb{R}^p \to R_Y$
If $g(X) \neq Y$, then the decision makes an error.

5 In the formulation of the Bayes decision problem, introduce a cost function
𝐶(𝑦,𝑦′)≥0 which is the cost if the label Y = y and the decision g( 𝑋 ) = y’ . For a decision function g, the risk is the expectation of the cost: 𝑅 𝑔 =𝐄(C(y),g( 𝑋 )) In Bayes decision problem, the aim is to minimize the risk, i.e., the goal is to find a function 𝑔 ∗ : ℝ 𝑝 → 1,2,…,𝑀 such that 𝑅 𝑔 ∗ = min 𝑔: ℝ 𝑝 ⟶ 1,2,…𝑀 𝑅(𝑔) where 𝑔 ∗ is called the Bayes decision function, and 𝑅 ∗ =𝑅( 𝑔 ∗ ) is the Bayes risk

6 For the posterior probabilities, introduce the notation
$$P_y(x) = \mathbf{P}(Y = y \mid X = x).$$
Let the decision function $g^*$ be defined by
$$g^*(x) = \arg\min_{y'} \sum_{y=1}^{M} C(y, y')\, P_y(x).$$
If the arg min is not unique, then choose the smallest $y'$ that minimizes the sum. This definition implies that for any decision function $g$,
$$\sum_{y=1}^{M} C\big(y, g^*(x)\big)\, P_y(x) \le \sum_{y=1}^{M} C\big(y, g(x)\big)\, P_y(x).$$
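A minimal sketch of this rule in Python, assuming the posteriors $P_y(x)$ and the cost matrix $C$ are already available as arrays (both are hypothetical inputs here; labels are 0-indexed):

```python
import numpy as np

def bayes_decision(posteriors, cost):
    """General Bayes decision for one observation.

    posteriors: shape (M,),   posteriors[y]  = P_y(x)
    cost:       shape (M, M), cost[y, y2]    = C(y, y2)

    Returns the label y' minimizing sum_y C(y, y') * P_y(x).
    np.argmin breaks ties by the smallest index, matching the
    tie-breaking convention on the slide.
    """
    expected_cost = cost.T @ posteriors   # entry y' = sum_y C(y, y') P_y(x)
    return int(np.argmin(expected_cost))
```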

7 Theorem. For any decision function $g$, we have that $R(g^*) \le R(g)$.
Proof. For a decision function $g$, let's calculate the risk:
$$R(g) = \mathbf{E}\big[C(Y, g(X))\big] = \mathbf{E}\Big[\mathbf{E}\big[C(Y, g(X)) \mid X\big]\Big] = \mathbf{E}\Big[\sum_{y=1}^{M}\sum_{y'=1}^{M} C(y, y')\,\mathbf{P}\big(Y = y,\, g(X) = y' \mid X\big)\Big]$$
$$= \mathbf{E}\Big[\sum_{y=1}^{M}\sum_{y'=1}^{M} C(y, y')\, I_{\{g(X) = y'\}}\,\mathbf{P}(Y = y \mid X)\Big] = \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g(X)\big)\, P_y(X)\Big].$$
This implies that
$$R(g) = \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g(X)\big)\, P_y(X)\Big] \ge \mathbf{E}\Big[\sum_{y=1}^{M} C\big(y, g^*(X)\big)\, P_y(X)\Big] = R(g^*).$$

8 Concerning the cost function, the most frequently studied example is the so-called 0-1 loss:
$$C(y, y') = \begin{cases} 1 & \text{if } y \neq y' \\ 0 & \text{if } y = y'. \end{cases}$$
For the 0-1 loss, the corresponding risk is the error probability:
$$R(g) = \mathbf{E}\big[C(Y, g(X))\big] = \mathbf{E}\big[I_{\{Y \neq g(X)\}}\big] = \mathbf{P}\big(Y \neq g(X)\big),$$
and the Bayes decision is of the form
$$g^*(x) = \arg\min_{y'} \sum_{y \neq y'} P_y(x) = \arg\max_{y'} P_{y'}(x),$$
which is also called the maximum a posteriori (MAP) decision.
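Under the 0-1 loss the general rule above reduces to picking the largest posterior, since minimizing $\sum_{y \neq y'} P_y(x) = 1 - P_{y'}(x)$ is the same as maximizing $P_{y'}(x)$. A one-line sketch, using the same hypothetical `posteriors` array as before:

```python
import numpy as np

def map_decision(posteriors):
    """MAP decision for the 0-1 loss: arg max_y' P_y'(x).
    Equivalent to bayes_decision(posteriors, 1 - np.eye(M))."""
    return int(np.argmax(posteriors))

# Sanity check against the general rule from the previous sketch:
# M = len(posteriors)
# assert map_decision(posteriors) == bayes_decision(posteriors, 1 - np.eye(M))
```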

9 If the distribution of the observation vector $X$ has a density, then the Bayes decision has an equivalent formulation. Denote the density of $X$ by $f$:
$$\mathbf{P}(X \in B) = \int_B f(x)\, dx,$$
the conditional densities by $f_y$:
$$\mathbf{P}(X \in B \mid Y = y) = \int_B f_y(x)\, dx,$$
and the a priori probabilities by $q_y = \mathbf{P}(Y = y)$. Then it is easy to check that
$$P_y(x) = \mathbf{P}(Y = y \mid X = x) = \frac{q_y f_y(x)}{f(x)},$$
where $f(x) = \sum_{y=1}^{M} q_y f_y(x)$.
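A sketch of this Bayes-rule computation in Python, assuming (purely for illustration) two Gaussian class densities via scipy; the priors, means, and covariances below are made-up numbers:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical two-class example in R^2.
priors = np.array([0.6, 0.4])                       # q_y
densities = [
    multivariate_normal(mean=[0, 0], cov=np.eye(2)),  # f_1
    multivariate_normal(mean=[2, 1], cov=np.eye(2)),  # f_2
]

def posteriors(x):
    """P_y(x) = q_y f_y(x) / f(x), with f(x) = sum_y q_y f_y(x)."""
    joint = np.array([q * d.pdf(x) for q, d in zip(priors, densities)])
    return joint / joint.sum()

print(posteriors([1.0, 0.5]))   # entries sum to 1
```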

10 and therefore
$$g^*(x) = \arg\min_{y'} \sum_{y=1}^{M} C(y, y')\, q_y f_y(x),$$
since dividing by $f(x)$ does not change the arg min. From the proof of the Theorem we may derive a formula for the optimal risk:
$$R^* = R(g^*) = \mathbf{E}\Big[\min_{y'} \sum_{y=1}^{M} C(y, y')\, P_y(X)\Big].$$

11 If $X$ has a density, then
$$R^* = \int \min_{y'} \sum_{y=1}^{M} C(y, y')\, q_y f_y(x)\, dx.$$
For the 0-1 loss, we get that
$$R^* = \mathbf{E}\Big[1 - \max_{y} P_y(X)\Big],$$
which has the form, for densities,
$$R^* = 1 - \int \max_{y}\, q_y f_y(x)\, dx.$$
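A quick numerical sketch of the last formula for two one-dimensional Gaussian classes; the priors, means, and variances are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

q = np.array([0.5, 0.5])                                   # priors (made up)
f = [norm(loc=0.0, scale=1.0), norm(loc=2.0, scale=1.0)]   # class densities

# R* = 1 - integral of max_y q_y f_y(x) dx, via a simple Riemann sum.
x, dx = np.linspace(-10.0, 12.0, 20001, retstep=True)
integrand = np.maximum(q[0] * f[0].pdf(x), q[1] * f[1].pdf(x))
bayes_risk = 1.0 - integrand.sum() * dx
print(bayes_risk)   # about 0.1587 = Phi(-1) for this symmetric setup
```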

12 Multivariate Normal Distribution
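For reference, the density of the $p$-variate normal distribution $N(\mu, \Sigma)$ with nonsingular covariance matrix $\Sigma$ (a standard formula, restated here since the slides that follow are graphical):
$$f(x) = \frac{1}{(2\pi)^{p/2}\, |\Sigma|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu)\Big), \qquad x \in \mathbb{R}^p.$$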

38 Linear Combinations
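For reference, the standard linear-combination property this title refers to: if $X \sim N_p(\mu, \Sigma)$, $A$ is a $q \times p$ matrix, and $b \in \mathbb{R}^q$, then
$$AX + b \sim N_q\big(A\mu + b,\; A \Sigma A^{\mathsf{T}}\big).$$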

39 MVN Properties

48 Discriminant Analysis (DA)

52 That is, in the multivariate normal case, we can reach the minimal risk!
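A minimal sketch of the resulting plug-in rule: estimate each class's prior, mean, and covariance from the training set, then take the MAP decision with the fitted Gaussian densities. This is quadratic discriminant analysis in its textbook form, assumed here as an illustration (the exact variant on the preceding slides is not in the transcript):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate (prior, mean, covariance) per class from training data.
    X: shape (n, p) feature matrix, y: shape (n,) labels."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = (len(Xc) / len(X), Xc.mean(axis=0),
                         np.cov(Xc, rowvar=False))
    return params

def classify(x, params):
    """arg max_y [ log q_y + log N(x; mu_y, Sigma_y) ]: the Bayes (MAP)
    decision with plugged-in Gaussian densities. The constant term
    -(p/2) log(2*pi) is dropped; it does not affect the arg max."""
    best, best_score = None, -np.inf
    for label, (q, mu, cov) in params.items():
        d = x - mu
        _, logdet = np.linalg.slogdet(cov)
        score = np.log(q) - 0.5 * (logdet + d @ np.linalg.solve(cov, d))
        if score > best_score:
            best, best_score = label, score
    return best
```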

68 A goodness-of-fit parameter, Wilks' lambda, is defined as follows:
$$\Lambda = \prod_{j=1}^{m} \frac{1}{1 + \lambda_j},$$
where $\lambda_j$ is the $j$th eigenvalue corresponding to the eigenvector described above, and $m$ is the minimum of $C - 1$ and $p$.

Wilks' Lambda Test
Wilks' lambda tests which variables contribute significantly to the discriminant function. The closer Wilks' lambda is to 0, the more the variable contributes to the discriminant function. The table also provides a chi-square statistic to test the significance of Wilks' lambda. If the p-value is less than 0.05, we can conclude that the corresponding function explains the group membership well.
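A sketch of the computation, assuming the eigenvalues of the within/between scatter problem are already available. Bartlett's chi-square approximation is used below as the test statistic, a standard choice assumed here since the slide does not name the exact statistic in its table:

```python
import numpy as np
from scipy.stats import chi2

def wilks_lambda_test(eigvals, n, p, C):
    """Wilks' lambda from the m = min(C-1, p) eigenvalues, with
    Bartlett's chi-square approximation (assumed; see lead-in).

    eigvals: the eigenvalues lambda_j described above
    n: number of observations, p: number of variables, C: number of groups
    Returns (lambda, chi-square statistic, p-value).
    """
    lam = np.prod(1.0 / (1.0 + np.asarray(eigvals)))
    stat = -(n - 1 - (p + C) / 2.0) * np.log(lam)   # Bartlett's statistic
    df = p * (C - 1)
    return lam, stat, chi2.sf(stat, df)
```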

