Slide 1: Classification Techniques for Hand-Written Digit Recognition
CPC Group Seminar, Thursday, June 1, 2006
Venkat Raghavan N. S., Saneej B. C., and Karteek Popuri
Department of Chemical and Materials Engineering, University of Alberta, Canada

Slide 2: Introduction
• Objective: to recognise images of handwritten digits using classification methods for multivariate data.
• Optical Character Recognition (OCR): predict the label of each image using a classification function learned from training data.
• OCR is basically a classification task on multivariate data: pixel values → variables; each type of character → a class.

Slide 3: Handwritten Digit Data
• 16 × 16 (= 256 pixel) grey-scale images of digits in the range 0-9: X_i = [x_i1, x_i2, ..., x_i256], with label y_i ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.
• 9298 labelled samples; training set of about 1000 images; test set randomly selected from the full database.
• Basic idea: correctly identify the digit given an image.
(Figure: a 16 × 16 grid of pixel values x_ij.)
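As a rough sketch of this setup (not code from the talk), assuming the 9298 images have been loaded into a NumPy array; `images` and `labels` are placeholder names:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical placeholders for the real data: 9298 grey-scale 16x16 digit
# images flattened to 256-vectors, with digit labels 0-9.
images = rng.random((9298, 256))
labels = rng.integers(0, 10, size=9298)

# Randomly split off roughly 1000 training images; the rest form the test set,
# mirroring the split described on this slide.
perm = rng.permutation(len(images))
train_idx, test_idx = perm[:1000], perm[1000:]
X_train, y_train = images[train_idx], labels[train_idx]
X_test, y_test = images[test_idx], labels[test_idx]
```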

Slide 4: Dimension Reduction - PCA
• PCA is performed on the mean-centered images.
• The eigenvectors of the 256 × 256 covariance matrix Σ are called the eigendigits (each 256-dimensional).
• The larger an eigenvalue, the more important the corresponding eigendigit.
• The i-th PC of an image X is y_i = e_i' X.
(Figure: the average digit.)
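A minimal sketch of this PCA step (my reconstruction, not the authors' code), reusing `X_train` from the previous snippet:

```python
import numpy as np

def eigendigits(X):
    """PCA on mean-centered images: returns the mean image, the eigenvalues
    (in decreasing order) and the eigenvectors (as columns) of the 256x256
    covariance matrix. The eigenvectors are the eigendigits."""
    X_mean = X.mean(axis=0)
    Xc = X - X_mean                          # mean-centre the images
    cov = np.cov(Xc, rowvar=False)           # 256 x 256 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    return X_mean, eigvals[order], eigvecs[:, order]

# The i-th PC of a mean-centered image x is then y_i = eigvecs[:, i] @ x.
```

Each column of `eigvecs` can be reshaped to 16 × 16 and displayed as an eigendigit, and `X_mean` is the "average digit" shown on the slide.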

Slide 5: PCA (continued)
• Based on the eigenvalues, the first 64 PCs were found to be significant; variance captured ≈ 92.74%.
• Any image is represented by its PCs: Y = [y_1 y_2 ... y_64].
• Reduced data matrix with 64 variables: Y is a 1000 × 64 matrix.
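Continuing the sketch above, the number of retained PCs and the reduced 1000 × 64 data matrix could be obtained along these lines (the 92.74% figure is the slide's; the snippet simply reports whatever the data gives):

```python
X_mean, eigvals, eigvecs = eigendigits(X_train)

# Cumulative fraction of variance captured by the leading PCs.
explained = np.cumsum(eigvals) / eigvals.sum()
k = 64                                   # chosen from the eigenvalue spectrum
print(f"variance captured by first {k} PCs: {100 * explained[k - 1]:.2f}%")

E = eigvecs[:, :k]                       # 256 x 64 matrix of eigendigits
Y_train = (X_train - X_mean) @ E         # reduced data matrix, 1000 x 64
```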

Slide 6: (figure only)

Slide 7: Interpreting the PCs as Image Features
• The eigenvectors are a rotation of the original axes to more meaningful directions; the PCs are the projections of the data onto these new axes.
• Image reconstruction: the original image can be reconstructed by projecting the PCs back onto the old axes. Using only the most significant PCs gives a reconstructed image that is close to the original.
• These features can be used for further investigation, e.g. classification.

Slide 8: Image Reconstruction
• Mean-centered image: I = X − X_mean.
• PCs as features: y_i = e_i' I, so Y = [y_1, y_2, ..., y_64]' = E' I, where E = [e_1 e_2 ... e_64].
• Reconstruction: X_recon = E Y + X_mean.
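A sketch of this reconstruction, reusing `X_mean` and `E` from the earlier snippets (rows of `X` are images, so the projection is written as a right-multiplication):

```python
def reconstruct(X, X_mean, E):
    """Project images onto the leading eigendigits and back again."""
    Y = (X - X_mean) @ E        # PCs as features: Y = E' (X - X_mean)
    return Y @ E.T + X_mean     # reconstruction: X_recon = E Y + X_mean

X_recon = reconstruct(X_test, X_mean, E)
```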

Slide 9: Normality Test on PCs (figure only)

Slide 10: Classification
• Principal components are used as the features of the images.
• LDA assumes multivariate normality of the feature groups and a common covariance matrix.
• The Fisher discriminant procedure assumes only a common covariance matrix.

Slide 11: Classification (continued)
• Equal costs of misclassification are assumed.
• Misclassification error rates: APER, based on the training data, and AER, based on the validation data.
• Error rates using different numbers of PCs were compared, averaged over several random samplings of training and validation data from the full data set.
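A sketch of how this APER/AER comparison could be organised, assuming a classifier object with hypothetical `fit`/`predict` methods (such as the LDA sketch further below):

```python
import numpy as np

def error_rates(classifier, images, labels, n_train=1000, n_repeats=5, seed=0):
    """Average APER (training error) and AER (validation error) over
    several random training/validation splits of the full data set."""
    rng = np.random.default_rng(seed)
    apers, aers = [], []
    for _ in range(n_repeats):
        perm = rng.permutation(len(images))
        tr, va = perm[:n_train], perm[n_train:]
        classifier.fit(images[tr], labels[tr])
        apers.append(np.mean(classifier.predict(images[tr]) != labels[tr]))
        aers.append(np.mean(classifier.predict(images[va]) != labels[va]))
    return np.mean(apers), np.mean(aers)
```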

Slide 12: Performing LDA
• The prior probability of each class was taken as the frequency of that class in the data.
• Equality of the covariance matrices is a strong assumption; error rates were used to check its validity.
• S_pooled was used as the common covariance matrix.
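A minimal sketch of LDA as described here, with class frequencies as priors and S_pooled as the common covariance. The linear discriminant score d_k(x) = x' S_pooled⁻¹ m_k − 0.5 m_k' S_pooled⁻¹ m_k + ln p_k is the standard formulation, not taken verbatim from the slides:

```python
import numpy as np

class LDA:
    def fit(self, Y, y):
        self.classes = np.unique(y)
        n, d = Y.shape
        self.means = np.array([Y[y == c].mean(axis=0) for c in self.classes])
        self.priors = np.array([(y == c).mean() for c in self.classes])  # class frequencies
        # Pooled within-class covariance S_pooled.
        S = np.zeros((d, d))
        for c, m in zip(self.classes, self.means):
            Yc = Y[y == c] - m
            S += Yc.T @ Yc
        self.S_pooled = S / (n - len(self.classes))
        self.S_inv = np.linalg.pinv(self.S_pooled)
        return self

    def predict(self, Y):
        # Linear discriminant score for every class; assign the largest.
        scores = (Y @ self.S_inv @ self.means.T
                  - 0.5 * np.sum((self.means @ self.S_inv) * self.means, axis=1)
                  + np.log(self.priors))
        return self.classes[np.argmax(scores, axis=1)]
```

Used as `LDA().fit(Y_train, y_train).predict(Y_test)` on the reduced PC features.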

Slide 13: LDA Results
(Tables: APER % and AER % by number of PCs.)
• APER underestimates the AER.
• Using 64 PCs is better than using 150 or 256 PCs: the PCs with lower eigenvalues tend to capture the noise in the data.

Slide 14: Fisher Discriminants
• Uses equal prior probabilities and equal covariances.
• The number of discriminants can be r ≤ 9. When all discriminants are used (r = 9), Fisher's procedure is equivalent to LDA (verified by the error rates).
• Error rates with different r were compared.
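A sketch of how r ≤ 9 Fisher discriminant directions could be computed from the between-class and within-class scatter matrices (a standard construction, offered here as an illustration rather than the talk's code; it assumes the within-class scatter is non-singular):

```python
import numpy as np
from scipy.linalg import eigh   # generalized symmetric eigenproblem

def fisher_directions(Y, y, r):
    """Return the r leading Fisher discriminant directions (as columns)."""
    classes = np.unique(y)
    overall_mean = Y.mean(axis=0)
    d = Y.shape[1]
    S_w = np.zeros((d, d))      # within-class scatter
    S_b = np.zeros((d, d))      # between-class scatter
    for c in classes:
        Yc = Y[y == c]
        m = Yc.mean(axis=0)
        S_w += (Yc - m).T @ (Yc - m)
        diff = (m - overall_mean)[:, None]
        S_b += len(Yc) * diff @ diff.T
    # Solve S_b v = lambda S_w v; at most (number of classes - 1) = 9
    # directions have non-zero eigenvalues.
    eigvals, eigvecs = eigh(S_b, S_w)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:r]]
```

The data can then be projected onto these r directions and classified, e.g. by distance to the projected class means.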

Slide 15: Fisher Discriminant Results, r = 2 discriminants
(Table: APER % and AER % by number of PCs; AER values: 45, 42, 40.)
• Both the AER and the APER are very high.

Slide 16: Fisher Discriminant Results, r = 7 discriminants
(Table: APER % and AER % by number of PCs.)
• Considerable improvement in AER and APER.
• Performance is close to LDA.
• Using 64 PCs is better.

Slide 17: Fisher Discriminant Results, r = 9 (all) discriminants
(Table: APER % and AER % by number of PCs.)
• No significant performance gain over r = 7.
• Error rates are approximately equal to those of LDA, as expected.

Slide 18: Nearest Neighbour Classifier
• Makes no assumption about the distribution of the data.
• Uses Euclidean distance to find the nearest neighbour.
• Finds the nearest neighbour in the training set to the test image and assigns that neighbour's label to the test image.
(Figure: a test point whose nearest neighbour belongs to Class 2 is assigned to Class 2.)

Slide 19: K-Nearest Neighbour Classifier (KNN)
• Compute the k nearest neighbours and assign the class by majority vote.
(Figure: with k = 3, a test point receiving 2 votes for Class 1 and 1 vote for Class 2 is assigned to Class 1.)
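A sketch of the k-NN rule on the reduced PC features, using plain Euclidean distance and majority vote (k = 1 reproduces the 1-NN classifier of the previous slide):

```python
import numpy as np

def knn_predict(Y_train, y_train, Y_test, k=3):
    """Classify each test vector by majority vote among its k nearest
    training vectors in Euclidean distance."""
    predictions = np.empty(len(Y_test), dtype=y_train.dtype)
    for i, q in enumerate(Y_test):
        dists = np.linalg.norm(Y_train - q, axis=1)   # distance to all N training points
        nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbours
        votes = np.bincount(y_train[nearest])         # majority vote over their labels
        predictions[i] = np.argmax(votes)
    return predictions
```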

Slide 20: 1-NN Classification Results
(Table: AER % by number of PCs.)
• Test error rates have improved compared to LDA and Fisher.
• Using 64 PCs gives better results.
• Using higher values of k does not improve the recognition rate.

Slide 21: Misclassification in NN
• Euclidean distances between transformed images of the same class can be very high.

Slide 22: Misclassification in NN (figure only)

Slide 23: Issues in NN
• Expensive: to determine the nearest neighbour of a test image, the distance to all N training examples must be computed.
• Storage requirements: all the training data must be stored.

Slide 24: The Euclidean-NN Method Is Inefficient
• Storing all possible instances of each digit (positions, sizes, angles, thicknesses, writing styles, ...) would be impractical.

Slide 25: The Euclidean Distance Metric Fails
(Figure: pattern to be classified, Prototype A, Prototype B.)
• Prototype B seems more similar than Prototype A according to Euclidean distance, so the digit "9" is misclassified as "4".
• A possible solution is to use a distance metric that is invariant to irrelevant transformations.

Slide 26: Effect of a Transformation
• A transformation s(X, α) with parameter α (for example, a rotation by angle α) moves the image X through pixel space; for small α the transformed image is approximately X plus α times a tangent vector.
• The set of all transformed versions of X forms a manifold in pixel space: S_X = { y | there exists α for which y = s(X, α) }.

Slide 27: Tangent Distance
(Figure: patterns P and E with their transformation manifolds S_P and S_E.)
• Euclidean distance: the distance between P and E.
• Tangent distance: the distance between S_P and S_E.

Slide 28: Images in the Tangent Plane
(Figure: images displaced along each tangent direction: rotation, scaling, thickness, X translation, diagonal deformation, axis deformation, and Y translation.)

Slide 29: Implementation
• The vectors tangent to the manifold S_X span the hyperplane T_X tangent to S_X.
• The tangent distance D(E, P) is found by minimizing the distance between T_E and T_P.
• The images are smoothed with a Gaussian of σ = 1.
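One way such a tangent vector could be approximated numerically is by a finite difference of a small transformation applied to the Gaussian-smoothed image; the sketch below uses rotation as an example and `scipy.ndimage` as a convenience, neither of which is prescribed by the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def rotation_tangent(image_256, sigma=1.0, d_alpha=1.0):
    """Finite-difference approximation of the tangent vector for rotation.

    image_256 : flat 256-vector of a 16x16 digit.
    d_alpha   : small rotation angle in degrees.
    """
    img = gaussian_filter(image_256.reshape(16, 16), sigma=sigma)   # smooth with sigma = 1
    rotated = rotate(img, angle=d_alpha, reshape=False, order=1)    # small rotation
    return (rotated - img).ravel() / d_alpha                        # direction of S_X at alpha = 0
```

Repeating this for each of the seven transformations gives the columns of the tangent-vector matrix used below.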

Slide 30: Implementation (continued)
• The equations of the tangent hyperplanes are T_E = { E + L_E α_E } and T_P = { P + L_P α_P }, where L_E and L_P are the matrices whose columns are the tangent vectors at E and P, and α_E, α_P are the transformation parameters.

Slide 31: Implementation (continued)
• Solving for α_P and α_E, we can calculate D(E, P), the tangent distance between the two patterns E and P: D(E, P) = min over α_E, α_P of || (E + L_E α_E) − (P + L_P α_P) ||.
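A sketch of this minimization as a small least-squares problem, assuming `L_E` and `L_P` hold the tangent vectors as columns (my reconstruction of the standard two-sided tangent distance, not the authors' implementation):

```python
import numpy as np

def tangent_distance(E, P, L_E, L_P):
    """Two-sided tangent distance between patterns E and P.

    E, P     : flat image vectors (length 256).
    L_E, L_P : matrices whose columns are the tangent vectors at E and P.
    """
    # Minimize || (E + L_E a_E) - (P + L_P a_P) || over a_E and a_P:
    # a least-squares problem in the stacked parameter vector [a_E; a_P].
    A = np.hstack([L_E, -L_P])
    alpha, *_ = np.linalg.lstsq(A, P - E, rcond=None)
    residual = (E - P) + A @ alpha
    return np.linalg.norm(residual)
```

Substituting this distance for the Euclidean norm in the k-NN rule gives the tangent-distance 3-NN classifier whose results appear on the next slide.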

Slide 32: Tangent Distance Method Results
• USPS data set, with 1000 training examples and 7000 test examples.
• The misclassification error rate using 3-NN is 3.26%.
• The time taken is … sec.

Slide 33: References
• Trevor Hastie, Robert Tibshirani, and Jerome Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction".
• Richard A. Johnson and Dean W. Wichern, "Applied Multivariate Statistical Analysis".
• Patrice Y. Simard and Yann A. Le Cun, "Transformation Invariance in Pattern Recognition: Tangent Distance and Tangent Propagation".