Handwritten Digit Recognition Using Stacked Autoencoders Yahia Saeed, Jiwoong Kim, Lewis Westfall, and Ning Yang Seidenberg School of CSIS Pace University, New York
Optical Character Recognition
- Converts text into machine-processable data
- 1910s: telegraph codes; reading device for the blind
- Circa 1930: searching microfilm archives
- Fixed-font characters: accuracy affected by noise
- Handwritten or printed characters: characters not consistent, likely to contain noise
We Explored
- Digit recognition
- Neural networks: autoencoders and softmax regression
- MNIST handwritten digit database
- MATLAB
MNIST Database
- Large database of handwritten digits
- Re-mix of NIST digit databases
- Training images from American Census Bureau employees
- Testing images from American high school students
- NIST images: normalized to 20x20; antialiasing introduces greyscale
- MNIST images: 28x28, centered on center of mass
MNIST Database (cont.)
- 60k training images, 10k testing images
- Our training set: half from MNIST training images, half from MNIST testing images
- Our testing set was built the same way
Neural Network
- Learning algorithm
- Each feature has an input
- Layers: one or more hidden layers, plus an output layer
- The inputs to each layer are weighted combinations (products of weights and outputs) of the previous layer's outputs
Neural Network
- Layer 1: input layer
- Layer 2: undercomplete hidden layer
- Layer 3: output layer
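The weighted-combination computation above can be sketched in NumPy (the paper's experiments use MATLAB's toolbox, so this is only an illustration; the layer sizes and random weights are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Propagate input x through a list of (weights, bias) layers.

    Each layer's output is sigmoid(W @ a + b): a weighted combination
    of the previous layer's outputs, squashed by the activation.
    """
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)
    return a

# Hypothetical 3-layer net: 784 inputs, 100 hidden units, 10 outputs
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((100, 784)) * 0.01, np.zeros(100)),
          (rng.standard_normal((10, 100)) * 0.01, np.zeros(10))]
y = forward(rng.standard_normal(784), layers)
print(y.shape)  # (10,)
```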
Auto-Encoders
- A type of unsupervised learning that tries to discover generic features of the data
- Learns the identity function by learning important sub-features (not by just passing the data through)
- Useful for compression, etc.
- Can use just the new features as the new training set, or concatenate both
- Zipser's autoencoder could be reverse-engineered; Cottrell's compression network could not
- The next layer does not have to have fewer hidden nodes, but be careful: if trained too long, the network will just learn to pass the data through (more on that in a bit)
Autoencoder Neural Network
- Unsupervised; trained with backpropagation
- Output approximates, but does not exactly reproduce, the input
- Features captured may not be intuitive
- Undercomplete constraint used
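A minimal NumPy sketch of this idea, assuming a toy 8-3-8 undercomplete autoencoder trained by backpropagation to reproduce its own input (not the paper's MATLAB code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.random((20, 8))            # toy data: 20 samples, 8 features

# Undercomplete: 8 inputs -> 3 hidden units -> 8 outputs
W1 = rng.standard_normal((8, 3)) * 0.1
W2 = rng.standard_normal((3, 8)) * 0.1
lr = 0.5

mse0 = np.mean((sigmoid(sigmoid(X @ W1) @ W2) - X) ** 2)

for _ in range(200):               # backpropagation on reconstruction error
    H = sigmoid(X @ W1)            # encode
    Xhat = sigmoid(H @ W2)         # decode: the target is the input itself
    dZ2 = (Xhat - X) * Xhat * (1 - Xhat)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    W2 -= lr * (H.T @ dZ2) / len(X)
    W1 -= lr * (X.T @ dZ1) / len(X)

mse = np.mean((sigmoid(sigmoid(X @ W1) @ W2) - X) ** 2)
```

After training, the reconstruction error `mse` is lower than the initial `mse0`, even though the 3-unit bottleneck prevents an exact copy.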
Stacked Auto-Encoders Stack many (sparse) auto-encoders in succession and train them using greedy layer-wise training
Stacked Auto-Encoders supervised training on the last layer using final features Then supervised training on the entire network to fine- tune all weights Shows softmax, but could use BP or any other variation
Undercomplete: 784 features -> 100 features; 100 features -> 50 features
Sparse network: 'SparsityProportion', 0.15, ...
1st Autoencoder
Sparse Autoencoder
- 'L2WeightRegularization' controls the weights of the network (should be small)
- 'SparsityProportion' controls the sparsity of the output from the hidden layer (must be between 0 and 1)
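MATLAB's trainAutoencoder enforces sparsity with a KL-divergence penalty that pulls each hidden unit's average activation toward SparsityProportion; a NumPy sketch of that penalty (the beta-distributed toy activations are hypothetical stand-ins for real hidden-layer outputs):

```python
import numpy as np

def sparsity_penalty(hidden_activations, rho=0.15):
    """KL divergence between the target sparsity rho and each hidden
    unit's average activation rho_hat (activations assumed in (0, 1))."""
    rho_hat = hidden_activations.mean(axis=0)
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return kl.sum()

rng = np.random.default_rng(0)
# Units that fire ~15% on average pay almost no penalty...
sparse_H = np.clip(rng.beta(0.75, 4.25, size=(1000, 20)), 1e-6, 1 - 1e-6)
# ...while units that fire ~50% on average are penalized heavily.
dense_H = np.clip(rng.beta(5.0, 5.0, size=(1000, 20)), 1e-6, 1 - 1e-6)
print(sparsity_penalty(sparse_H) < sparsity_penalty(dense_H))  # True
```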
2nd Autoencoder
SoftMax Classifier
- Supervised learning
- Classifies the encoded features produced by the autoencoders
- Goal: match the output to the correct digit label
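Softmax itself just turns a vector of class scores into probabilities; a small NumPy sketch with hypothetical scores for one image:

```python
import numpy as np

def softmax(z):
    """Turn raw class scores into probabilities that sum to 1."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical scores for one image over the 10 digit classes
scores = np.array([1.2, 0.1, -0.5, 3.0, 0.0, 0.2, -1.0, 0.5, 0.3, 0.1])
probs = softmax(scores)
print(int(probs.argmax()))  # 3 -> predicted digit
```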
Stacked Autoencoder
- The output of the hidden layer of one autoencoder is the input to the next autoencoder
Constraints
- Undercomplete: 784 features -> 100 features; 100 features -> 50 features
- Sparse network
MATLAB
- We used MATLAB's autoencoder and softmax functions
- The toolbox hides the math processing
- A GPU enhances speed
Previous Work
LeCun, Cortes, and Burges, "The MNIST Database of Handwritten Digits", http://yann.lecun.com/exdb/mnist

Method                  Accuracy
Linear Classifier       92.4%
K Nearest Neighbor      99.3%
Boosted Stumps          99.1%
Non-Linear Classifier   96.4%
SVM                     99.4%
Neural Net              99.6%
Convolutional Net       99.7%
Training
- 10k MNIST images
- 1st autoencoder: 784 features/image
- Encode (undercomplete) to 100 features/image; decode back to 784 features/image
- 400 epochs; sparsity parameter of 0.15
Training (cont.)
- 2nd autoencoder: 100 features/image
- Encode (undercomplete) to 50 features/image; decode back to 100 features/image
- 100 epochs; sparsity parameter of 0.10
- SoftMax classifier: 50 features/image to 1 of 10 classes/image; 400 epochs
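The dimension flow of this pipeline (784 -> 100 -> 50 -> 10 classes) can be checked with a NumPy sketch; the weights here are random stand-ins, since the trained weights live inside MATLAB:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
images = rng.random((10000, 784))            # 10k images, 28x28 = 784 features

W1 = rng.standard_normal((784, 100)) * 0.01  # 1st autoencoder's encoder
W2 = rng.standard_normal((100, 50)) * 0.01   # 2nd autoencoder's encoder
Ws = rng.standard_normal((50, 10)) * 0.01    # softmax classifier weights

h1 = sigmoid(images @ W1)                    # 784 -> 100
h2 = sigmoid(h1 @ W2)                        # 100 -> 50
z = h2 @ Ws                                  # 50 -> 10 class scores
e = np.exp(z - z.max(axis=1, keepdims=True))
probs = e / e.sum(axis=1, keepdims=True)     # one distribution per image
print(probs.shape)                           # (10000, 10)
```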
Testing
- First results: 79.7% accuracy
- Conducted retraining
- Final results: 99.7% accuracy
Output of 1st Autoencoder
Output of 2nd Autoencoder
1st Confusion Matrix
Final Confusion Matrix
Conclusion
- 2 stacked autoencoder layers and a softmax classifier layer
- Ultimately achieved 99.7% accuracy
- MATLAB really needs a GPU