Published by Antonia Presson. Modified about 1 year ago.

1
CPC group Seminar, Thursday, June 1, 2006
Classification Techniques for Hand-Written Digit Recognition
Venkat Raghavan N. S., Saneej B. C., and Karteek Popuri
Department of Chemical and Materials Engineering, University of Alberta, Canada

2
Introduction
Objective: to recognise images of handwritten digits using classification methods for multivariate data.
Optical Character Recognition (OCR): predict the label of each image using the classification function learned from training.
OCR is essentially a classification task on multivariate data:
Pixel values → variables
Each type of character → class

3
Handwritten Digit Data
16 x 16 (= 256 pixel) grey-scale images of digits in the range 0-9.
$X_i = [x_{i1}, x_{i2}, \ldots, x_{i256}]$, $y_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$
9298 labelled samples.
Training set (~1000 images) and test set, randomly selected from the full database.
Basic idea: correctly identify the digit given an image.

4
Dimension Reduction: PCA
PCA is performed on the mean-centred images.
The eigenvectors of the 256 x 256 covariance matrix $\Sigma$ are called the eigendigits (each is 256-dimensional).
The larger an eigenvalue, the more important the corresponding eigendigit.
The $i$-th PC of a mean-centred image $X$ is $y_i = e_i' X$.
[Figure: average digit]
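The PCA step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data are random stand-ins rather than the digit images, and the names `fit_pca`, `images` and `eigendigits` are hypothetical, not taken from the presentation.

```python
import numpy as np

def fit_pca(images):
    """Return the mean image, eigenvalues and eigendigits of an (n, d) data matrix."""
    mean_image = images.mean(axis=0)
    centred = images - mean_image                 # mean-centre the images
    cov = np.cov(centred, rowvar=False)           # (d, d) sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    return mean_image, eigvals[order], eigvecs[:, order]

# Synthetic stand-in for the digit data (assumption: not the real images).
rng = np.random.default_rng(0)
images = rng.normal(size=(200, 256))
mean_image, eigvals, eigendigits = fit_pca(images)

# First PC of every image: y_1 = e_1' (X - X_mean)
first_pc_scores = (images - mean_image) @ eigendigits[:, 0]
```

The sample variance of the first PC scores equals the largest eigenvalue, which is why the eigenvalues can be read as the importance of each eigendigit.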

5
PCA (continued)
Based on the eigenvalues, the first 64 PCs were found to be significant.
Variance captured: ~92.74%.
Any image can be represented by its PCs: $Y = [y_1, y_2, \ldots, y_{64}]$.
Reduced data matrix with 64 variables: $Y$ is a 1000 x 64 matrix.
[Figure: eigenvalues]
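One common way to decide how many PCs to keep is to look at the cumulative fraction of variance captured by the leading eigenvalues. The sketch below assumes that criterion; the slides only say the choice was "based on the eigenvalues", so the exact rule used by the authors is not known. The eigenvalues and the helper name `n_components_for_variance` are hypothetical.

```python
import numpy as np

def n_components_for_variance(eigvals, target_fraction):
    """Smallest number of leading PCs capturing at least target_fraction of the variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, target_fraction) + 1)
    return k, cumulative

# Hypothetical eigenvalues, for illustration only.
eigvals = [4.0, 3.0, 2.0, 1.0]
k, cumulative = n_components_for_variance(eigvals, 0.85)
```

With the real data, the same function applied to the 256 eigenvalues would report how many PCs are needed for a given target such as the ~92.74% quoted on the slide.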

7
Interpreting the PCs as Image Features
The eigenvectors are a rotation of the original axes to more meaningful directions.
The PCs are the projections of the data onto each of these new axes.
Image reconstruction: the original image can be reconstructed by projecting the PCs back onto the old axes.
Using the most significant PCs gives a reconstructed image that is close to the original image.
These features can be used for further investigations, e.g. classification.

8
Image Reconstruction
Mean-centred image: $I = X - X_{mean}$
PCs as features: $y_i = e_i' I$, so $Y = [y_1, y_2, \ldots, y_{64}]' = E' I$, where $E = [e_1 \; e_2 \; \ldots \; e_{64}]$
Reconstruction: $X_{recon} = E Y + X_{mean}$
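The projection and reconstruction formulas above translate directly into code. The following is a minimal sketch on random stand-in data (an assumption; the real digit images are not available here), with hypothetical helper names `project` and `reconstruct`.

```python
import numpy as np

def project(image, mean_image, E):
    """PC features: Y = E' (X - X_mean)."""
    return E.T @ (image - mean_image)

def reconstruct(Y, mean_image, E):
    """Approximate image: X_recon = E Y + X_mean."""
    return E @ Y + mean_image

# Synthetic stand-in data (assumption), 300 images of 256 pixels.
rng = np.random.default_rng(0)
images = rng.normal(size=(300, 256))
mean_image = images.mean(axis=0)
cov = np.cov(images - mean_image, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
E_full = eigvecs[:, ::-1]          # columns sorted by decreasing eigenvalue
E64 = E_full[:, :64]               # keep the 64 leading eigenvectors

x = images[0]
y64 = project(x, mean_image, E64)
err_64 = np.linalg.norm(x - reconstruct(y64, mean_image, E64))
err_all = np.linalg.norm(x - reconstruct(project(x, mean_image, E_full), mean_image, E_full))
```

Using all 256 eigenvectors reproduces the image up to rounding error, while truncating to 64 leaves a residual that depends on how much variance the discarded PCs carry.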

9
Normality test on PCs

10
Classification
Principal components are used as the features of the images.
LDA: assumes multivariate normality of the feature groups and a common covariance matrix.
Fisher discriminant procedure: assumes only a common covariance matrix.

11
Classification (continued)
Equal costs of misclassification are assumed.
Misclassification error rates: APER (apparent error rate), based on the training data, and AER (actual error rate), estimated on the validation data.
Error rates using different numbers of PCs were compared.
Results were averaged over several random samplings of training and validation data from the full data set.
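The evaluation protocol above (APER on the training data, AER on held-out data, averaged over random splits) can be sketched as follows. The classifier used here is a simple nearest-class-mean rule chosen only as a stand-in, and the data are synthetic; both are assumptions, as are all function names.

```python
import numpy as np

def error_rate(y_true, y_pred):
    """Fraction of misclassified samples (equal misclassification costs)."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def nearest_mean_fit_predict(X_train, y_train, X_eval):
    """Stand-in classifier: assign each sample to the class with the closest mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_eval[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

def average_error_rates(X, y, fit_predict, n_train, n_repeats, seed=0):
    """Mean APER (training data) and AER (held-out data) over random splits."""
    rng = np.random.default_rng(seed)
    apers, aers = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        train, valid = perm[:n_train], perm[n_train:]
        apers.append(error_rate(y[train], fit_predict(X[train], y[train], X[train])))
        aers.append(error_rate(y[valid], fit_predict(X[train], y[train], X[valid])))
    return float(np.mean(apers)), float(np.mean(aers))

# Synthetic two-class data (assumption), well separated on purpose.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)), rng.normal(2.0, 1.0, size=(100, 5))])
y = np.repeat([0, 1], 100)
aper, aer = average_error_rates(X, y, nearest_mean_fit_predict, n_train=100, n_repeats=5)
```

Any classifier with the same `fit_predict(X_train, y_train, X_eval)` shape can be passed in, so the same loop could compare error rates for different numbers of PCs.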

12
Performing LDA
Prior probabilities of each class were taken as the frequency of that class in the data.
Equality of the class covariance matrices is a strong assumption; the error rates were used to check its validity.
$S_{pooled}$ was used as the common covariance matrix.
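A minimal sketch of LDA with class-frequency priors and a pooled covariance matrix is given below. It uses the standard linear discriminant score $d_k(x) = \mu_k' S_{pooled}^{-1} x - \tfrac{1}{2} \mu_k' S_{pooled}^{-1} \mu_k + \ln p_k$; the synthetic data and the names `fit_lda` and `predict_lda` are assumptions, not the authors' implementation.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class priors, class means and the pooled covariance matrix."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()                    # class frequencies as priors
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Pooled covariance: summed within-class scatter / (n - number of classes)
    scatter = sum((X[y == c] - m).T @ (X[y == c] - m) for c, m in zip(classes, means))
    s_pooled = scatter / (len(y) - len(classes))
    return classes, priors, means, s_pooled

def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear discriminant score."""
    classes, priors, means, s_pooled = model
    coef = np.linalg.solve(s_pooled, means.T)         # column k is S_pooled^{-1} mu_k
    intercept = -0.5 * np.sum(means.T * coef, axis=0) + np.log(priors)
    scores = X @ coef + intercept
    return classes[np.argmax(scores, axis=1)]

# Synthetic three-class data (assumption), sharing one covariance matrix.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(c, 1.0, size=(50, 2)) for c in centres])
y = np.repeat([0, 1, 2], 50)
model = fit_lda(X, y)
accuracy = float(np.mean(predict_lda(model, X) == y))
```

Here `1 - accuracy` on the training data corresponds to the APER; applying `predict_lda` to held-out data would give the AER estimate.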

13
LDA Results
APER underestimates the AER.
Using 64 PCs is better than using 150 or 256 PCs.
The PCs with lower eigenvalues tend to capture the noise in the data.
[Tables: No. of PCs vs. APER %; No. of PCs vs. AER %]

14
Fisher Discriminants
Uses equal prior probabilities and equal covariances.
The number of discriminants can be $r \le 9$.
When all discriminants are used (i.e. $r = 9$), Fisher's procedure is equivalent to LDA (verified by the error rates).
Error rates with different $r$ were compared.
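The following sketch computes the first $r$ Fisher discriminants and classifies by the nearest class mean in discriminant space. Several choices here are assumptions rather than details from the slides: the between-class scatter is weighted by class size (one common convention), the generalized eigenproblem is solved by Cholesky whitening, and the data are synthetic. The names `fit_fisher` and `predict_fisher` are hypothetical.

```python
import numpy as np

def fit_fisher(X, y, r):
    """Return classes, class means and the first r Fisher discriminant directions."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    d = X.shape[1]
    W = np.zeros((d, d))                    # within-class scatter
    B = np.zeros((d, d))                    # between-class scatter
    for c, m in zip(classes, means):
        Xc = X[y == c] - m
        W += Xc.T @ Xc
        diff = (m - grand_mean)[:, None]
        B += (y == c).sum() * (diff @ diff.T)
    s_pooled = W / (len(y) - len(classes))
    # Solve B a = lambda * S_pooled a by whitening with the Cholesky factor of S_pooled.
    L_inv = np.linalg.inv(np.linalg.cholesky(s_pooled))
    eigvals, V = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:r]   # r largest eigenvalues
    A = L_inv.T @ V[:, order]               # columns a_j scaled so a_j' S_pooled a_j = 1
    return classes, means, A

def predict_fisher(model, X):
    """Assign each row of X to the class whose mean is nearest in discriminant space."""
    classes, means, A = model
    Z, Zm = X @ A, means @ A
    d2 = ((Z[:, None, :] - Zm[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]

# Synthetic three-class data in four dimensions (assumption).
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0, 0.0], [6.0, 0.0, 0.0, 0.0], [0.0, 6.0, 0.0, 0.0]])
X = np.vstack([rng.normal(c, 1.0, size=(50, 4)) for c in centres])
y = np.repeat([0, 1, 2], 50)
model = fit_fisher(X, y, r=2)               # with 3 classes, at most 2 discriminants
accuracy = float(np.mean(predict_fisher(model, X) == y))
```

With ten digit classes the same code allows up to `r=9`, matching the bound on the slide; smaller `r` discards discriminant directions and can be used to compare error rates.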

15
Fisher Discriminant Results: $r = 2$ discriminants
Both AER and APER are very high.
[Tables: No. of PCs vs. APER %; No. of PCs vs. AER %]

16
Fisher Discriminant Results: $r = 7$ discriminants
Considerable improvement in AER and APER.
Performance is close to LDA.
Using 64 PCs is better.
[Tables: No. of PCs vs. APER %; No. of PCs vs. AER %]

17
Fisher Discriminant Results: $r = 9$ (all) discriminants
No significant performance gain over $r = 7$.
Error rates are approximately those of LDA (as expected).
[Tables: No. of PCs vs. APER %; No. of PCs vs. AER %]

18
Nearest Neighbour Classifier
No assumption is made about the distribution of the data.
Euclidean distance is used to find the nearest neighbour.
The classifier finds the nearest neighbour of the test image in the training set and assigns its label to the test image.
[Figure: points of Class 1 and Class 2; test point assigned to Class 2]

19
K-Nearest Neighbour Classifier (KNN)
Compute the $k$ nearest neighbours and assign the class by majority vote.
[Figure: $k = 3$; Class 1 (2 votes), Class 2 (1 vote); test point assigned to Class 1]
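A brute-force k-nearest-neighbour classifier with Euclidean distance and majority voting can be sketched as below. The toy points are invented to mirror the situation in the figures (the single nearest neighbour belongs to Class 2, but two of the three nearest belong to Class 1); the function name `knn_predict` and the tie-breaking rule are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distance between every test point and every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    predictions = []
    for neighbour_labels in y_train[nearest]:
        labels, votes = np.unique(neighbour_labels, return_counts=True)
        predictions.append(labels[np.argmax(votes)])   # ties go to the smallest label
    return np.array(predictions)

# Toy data (assumption): labels 1 and 2 stand for Class 1 and Class 2.
X_train = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.6], [3.0, 3.0]]
y_train = [2, 1, 1, 2]
X_test = [[0.1, 0.1]]

pred_1nn = knn_predict(X_train, y_train, X_test, k=1)
pred_3nn = knn_predict(X_train, y_train, X_test, k=3)
```

This computes the distance to every training example for every test point, so both the cost and the storage grow with the size of the training set.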

20
NN Classification Results
Test error rates have improved compared to LDA and Fisher.
Using 64 PCs gives better results.
Using higher values of $k$ does not improve the recognition rate.
[Table: No. of PCs vs. AER %]

21
Misclassification in NN
Euclidean distances between transformed images of the same class can be very high.

22
Misclassification in NN:

23
Issues in NN
Expensive: to determine the nearest neighbour of a test image, the distance to all N training examples must be computed.
Storage requirements: all training data must be stored.

24
Euclidean-NN Method Is Inefficient
To classify reliably, it would have to store all possible instances (positions, sizes, angles, thicknesses, writing styles…), which is impractical.

25
Euclidean Distance Metric Fails
[Figure: pattern to be classified, Prototype A, Prototype B]
Prototype B seems more similar than Prototype A according to Euclidean distance.
Digit “9” is misclassified as “4”.
A possible solution is to use a distance metric that is invariant to irrelevant transformations.

26
Effect of a Transformation
$S_X = \{\, y \mid \text{there exists } \alpha \text{ for which } y = s(X, \alpha) \,\}$
[Figure: pixel space, with labels $X$, $s(X, \alpha)$ and "X + α."]

27
Tangent Distance
[Figure: patterns $P$ and $E$ with their manifolds $S_P$ and $S_E$, showing the Euclidean distance between $P$ and $E$, the tangent distance, and the distance between $S_P$ and $S_E$]

28
Images in Tangent Plane
[Figure: tangent images for rotation, scaling, thickness, X translation, Y translation, diagonal deformation and axis deformation]

29
Implementation
The vectors tangent to the manifold $S_X$ form the hyperplane $T_X$ tangent to $S_X$.
The tangent distance $D(E, P)$ is found by minimizing the distance between $T_E$ and $T_P$.
The images are smoothed with a Gaussian of width $\sigma = 1$.
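As an illustration of how tangent vectors can be obtained, the sketch below smooths an image with a Gaussian of width 1 and takes its spatial derivatives, which span the tangent directions for small X and Y translations. This covers only two of the transformations and is an assumption about the procedure, not the authors' code; the finite-difference scheme, the kernel radius and all function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=3):
    """Normalised 1-D Gaussian kernel sampled at integer offsets."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(image, sigma=1.0):
    """Separable Gaussian smoothing (zero padding at the borders)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def translation_tangents(image, sigma=1.0):
    """Tangent vectors (flattened) for small X and Y translations of the image.

    The sign is a convention: a tangent vector and its negative span the same line.
    """
    s = smooth(np.asarray(image, dtype=float), sigma)
    d_rows, d_cols = np.gradient(s)          # derivatives along rows (Y) and columns (X)
    return d_cols.ravel(), d_rows.ravel()

# Toy 16 x 16 image (assumption): intensity increases linearly from left to right.
image = np.tile(np.arange(16.0), (16, 1))
t_x, t_y = translation_tangents(image)
```

Stacking such vectors as columns gives a matrix of tangent directions for one image, which is the form assumed when a tangent plane is written as a base image plus a linear combination of tangent vectors.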

30
Implementation (continued)
The equations of $T_P$ and $T_E$ are given by
[Equation images: the expressions for $T_P$ and $T_E$ and the definitions following "where"]

31
Implementation (continued)
Solving for $\alpha_P$ and $\alpha_E$, we can calculate $D(E, P)$, the tangent distance between two patterns $E$ and $P$.
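The minimization over $\alpha_P$ and $\alpha_E$ can be sketched as a linear least-squares problem. The code assumes the tangent planes are parametrized linearly as $E + L_E \alpha_E$ and $P + L_P \alpha_P$, where the columns of $L_E$ and $L_P$ are tangent vectors; this is the form used in the Simard and Le Cun reference, but the symbols `L_E`, `L_P` and the function name are assumptions, since the slide equations did not survive.

```python
import numpy as np

def tangent_distance(E, P, L_E, L_P):
    """Two-sided tangent distance between flattened images E and P.

    Minimises || (E + L_E @ a_E) - (P + L_P @ a_P) || over a_E and a_P.
    """
    A = np.hstack([L_E, -L_P])                       # unknowns stacked as [a_E, a_P]
    coeffs, *_ = np.linalg.lstsq(A, P - E, rcond=None)
    a_E, a_P = coeffs[:L_E.shape[1]], coeffs[L_E.shape[1]:]
    residual = (E + L_E @ a_E) - (P + L_P @ a_P)
    return float(np.linalg.norm(residual)), a_E, a_P

# Toy 4-pixel "images" with one tangent vector each (assumption).
E = np.zeros(4)
P = np.array([2.0, 0.0, 0.0, 3.0])
L_E = np.array([[1.0], [0.0], [0.0], [0.0]])
L_P = np.array([[0.0], [1.0], [0.0], [0.0]])

dist, a_E, a_P = tangent_distance(E, P, L_E, L_P)
euclidean = float(np.linalg.norm(E - P))
```

Because setting both coefficient vectors to zero recovers the plain Euclidean distance, the tangent distance can never exceed it; in the toy example the component of the difference along the tangent vector of `E` is removed and only the remainder counts.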

32
Tangent Distance Method: Results
USPS data set: 1000 training examples and 7000 test examples.
The misclassification error rate using 3-NN is 3.26%.
The time taken is sec.

33
References
Trevor Hastie, Robert Tibshirani, Jerome Friedman. “The Elements of Statistical Learning: Data Mining, Inference and Prediction”.
Richard A. Johnson, Dean W. Wichern. “Applied Multivariate Statistical Analysis”.
Patrice Y. Simard, Yann A. Le Cun. “Transformation Invariance in Pattern Recognition – Tangent Distance and Tangent Propagation”.
