Analysis of Classification Algorithms in Handwriting Pattern Recognition Logan Helms Jon Daniele
Problem The construction and implementation of computerized systems capable of classifying visual input with speed and accuracy comparable to that of the human brain has remained an open problem in computer science for over 40 years.
Formal Statement Given an unknown function (the ground truth) that maps input instances to output labels, along with training data assumed to represent accurate examples of the mapping, produce a function that approximates as closely as possible the correct mapping.
Basic Pattern Recognition Model
Pattern Recognition Preprocessor: a system that processes its input data to produce output data that is used as input to another system. Classifier: a system that attempts to assign each input value to one of a given* set of classes. *A predetermined set of classes may not always exist.
MNIST Dataset A subset constructed from NIST’s Special Database 3 (SD-3) and Special Database 1 (SD-1), which contain binary images of handwritten digits. SD-3 was collected from Census Bureau employees; SD-1 was collected from high school students. Samples are 28x28 pixels, size-normalized and centered. Samples contain gray levels as a result of the anti-aliasing technique used by the normalization algorithm. The dataset has been used without further preprocessing.
MNIST: Training Set 30,000 from Census Bureau employees 30,000 from high school students
MNIST: Testing Set 5,000 from Census Bureau employees 5,000 from high school students
Algorithms We Will Analyze Template Matching Naïve Bayes Classifier Feed Forward Neural Network with Backpropagation
Template Matching Low hanging fruit
Template Matching A digital image processing technique for finding small parts of an image that match a template image. Multiple approaches: template-based vs. feature-based matching. Computationally complex.
Approaches: Template-based Used when there are no 'strong features' available in a template. Makes use of the entire template image rather than just portions. Becomes difficult with high-resolution images. Potentially requires a massive search area to find the best match. Facial recognition example: uses the entire face as a template; may become difficult if multiple features on a face are obscured or unavailable.
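The template-based approach can be sketched as a brute-force search over every window of the image. This is a minimal illustration, assuming NumPy and a sum-of-squared-differences score; the function name `match_template_ssd` and the toy image are hypothetical and are not taken from the presentation.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the whole template over the image and return the best window.

    The score is the sum of squared differences (lower is better).
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    ih, iw = image.shape
    th, tw = template.shape
    best_pos, best_score = None, np.inf
    # Every possible top-left corner is tried, which is why the search
    # area grows quickly with image resolution.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = np.sum((window - template) ** 2)
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# Toy example: a 2x2 pattern hidden in an otherwise empty 6x6 image.
image = np.zeros((6, 6))
image[2:4, 3:5] = [[1, 2], [3, 4]]
template = np.array([[1, 2], [3, 4]])
pos, score = match_template_ssd(image, template)
```

The nested loops visit every window position, which illustrates the computational cost mentioned above.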
Approaches: Feature-based Identify specific, 'strong' features in a given template and match those features rather than the entire template. Less computationally complex, as it doesn't require the resolution of an entire template. May fail when templates are not clearly differentiated by their strong features. Facial recognition example: match the relative position of strong facial features, e.g. nose, mouth, ears; match the strong features themselves as well. Works at a far lower resolution, which may obscure the features necessary for a full template match.
Bayesian classifier A naïve approach (that works)
But that’s not naïve!!
Some points to note
Let’s do this: Leeeeeeroy Jennnnkins!
Leroy’s example We have 3 pieces of data each for 1000 players on WoW: how loud they are, whether people like them, and how many times they screw up a raid. Training set (Let’s predict Leroy?): columns are Loud, Not loud, Disliked, Liked, Screw-up, Not a screw-up, Total; rows are Leroy fanboy, Kinda Leroy, Not Leroy, Totals.
Easy math: The base rates Class occurrences: p(Leroy fanboy) = 0.5 (500/1000), p(Kinda Leroy) = 0.3, p(Not Leroy) = 0.2. Likelihoods (probability of a feature given a class): p(Loud | Leroy fanboy), p(Loud | Kinda Leroy), …, p(Not a screw-up | Not Leroy) = 0.25 (50/200), p(Screw-up | Not Leroy) = 0.75. Given features from the unknown player: p(Loud) = 0.5, p(Disliked) = 0.65, p(Screw-up) = 0.8.
Easy math: the base rates Feature per class (feature count / class total): columns are Loud, Not loud, Not liked, Liked, Screw-up, Not a screw-up; rows are Leroy Fanboy, Kinda Leroy, Not Leroy. Probability of class (class count / total players): Leroy Fanboy 0.5, Kinda Leroy 0.3, Not Leroy 0.2. Features from evidence (feature count / total players): Loud 0.5, Disliked 0.65, Screw-up 0.8.
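The base rates above are plain ratios of counts. The sketch below recomputes the ones whose counts can be recovered from the slides: the class totals are inferred from the stated probabilities (0.5, 0.3 and 0.2 of 1000 players), and the only feature count stated explicitly is 50 of 200 for "Not a screw-up" among Not Leroy players. Variable names are illustrative.

```python
# Class totals implied by the slides: 0.5, 0.3 and 0.2 of 1000 players.
class_counts = {"Leroy fanboy": 500, "Kinda Leroy": 300, "Not Leroy": 200}
total_players = sum(class_counts.values())

# Probability of class: class count / total players.
priors = {name: count / total_players for name, count in class_counts.items()}

# Feature per class: feature count / class total.
# 50 of the 200 "Not Leroy" players are not screw-ups (stated on the slide).
not_screwup_given_not_leroy = 50 / class_counts["Not Leroy"]
screwup_given_not_leroy = 1 - not_screwup_given_not_leroy
```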
Bad math OK, here’s a new player on WoW, and we want to know which category of player they belong to. Are they a Leroy fanboy, Kinda Leroy, or Not Leroy? We observe the following characteristics for the unknown player: Loud, Disliked, Screw-up. We run the numbers for each of the 3 outcomes using the base rates established by the training set, then classify the unknown player into the class with the highest probability.
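The procedure just described can be sketched as follows, assuming the usual naive Bayes form: class prior times the product of the per-feature likelihoods, divided by the product of the feature probabilities. The priors and feature probabilities are the ones given on the slides. The likelihood table is HYPOTHETICAL, except for p(Screw-up | Not Leroy) = 0.75, because the training counts are not available here, so the scores will not reproduce the presentation's numbers.

```python
from math import prod

# From the slides.
priors = {"Leroy fanboy": 0.5, "Kinda Leroy": 0.3, "Not Leroy": 0.2}
evidence = {"Loud": 0.5, "Disliked": 0.65, "Screw-up": 0.8}

# HYPOTHETICAL p(feature | class) values, for illustration only.
# Only p(Screw-up | Not Leroy) = 0.75 is taken from the slides.
likelihoods = {
    "Leroy fanboy": {"Loud": 0.6, "Disliked": 0.8, "Screw-up": 0.85},
    "Kinda Leroy": {"Loud": 0.5, "Disliked": 0.5, "Screw-up": 0.75},
    "Not Leroy": {"Loud": 0.25, "Disliked": 0.5, "Screw-up": 0.75},
}

def naive_bayes_scores(observed):
    """Score every class for the observed features."""
    denominator = prod(evidence[f] for f in observed)
    return {
        label: priors[label]
        * prod(likelihoods[label][f] for f in observed)
        / denominator
        for label in priors
    }

scores = naive_bayes_scores(["Loud", "Disliked", "Screw-up"])
best = max(scores, key=scores.get)  # class with the highest score
```

Because the denominator is the same for every class, it does not change which class wins; only the numerators need to be compared to pick the label.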
Bad math, cont’d
Bad math, fin We now have the following probabilities: Leroy fanboy: 0.252 Kinda Leroy: 0 Not Leroy: And now we know that the new player falls into the Leroy fanboy category of players!
Feed Forward Neural Network with Backpropagation The birth of SKYNET
Neural Network A computational model inspired by central nervous systems, in particular the brain. Generally presented as a system of interconnected neurons. The neuron is the basic unit.
Transfer Function Backpropagation requires that the transfer function be differentiable. We chose the sigmoid function as our transfer function because it is easily differentiable and easy to work with.
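To illustrate why the sigmoid is convenient, the sketch below (assuming NumPy; the function names are illustrative) shows that its derivative can be written in terms of the sigmoid itself, so no extra exponentials are needed during backpropagation.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the sigmoid, expressed via the sigmoid itself."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

At x = 0 the sigmoid equals 0.5 and its slope is at its maximum of 0.25.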
Feed Forward Neural Network
Solutions are known Weights are learned Evolves in the weight space Used for: Prediction Classification Function approximation
Backpropagation Common method of training neural networks Backpropagation requires the transfer function to be differentiable. Two Step Process 1. Propagation 2. Weight update
1. Propagation Forward propagation of a training pattern’s input through the neural network in order to generate the propagation’s output activations. Backward propagation of the propagation’s output activations through the neural network using the training pattern target in order to generate the deltas of all output and hidden neurons.
2. Weight Update Each weight-synapse will follow these steps: Multiply its output delta and input activation to get the gradient of the weight. Subtract a ratio (percentage) of the gradient from the weight.
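For a single weight, the two steps above reduce to a couple of lines. This is a sketch under one assumption about sign convention: the delta is computed as (output minus target) times the transfer function's derivative, so subtracting moves the weight downhill. The name `learning_rate` stands for the "ratio" mentioned above and its default value is arbitrary.

```python
def update_weight(weight, output_delta, input_activation, learning_rate=0.1):
    """Apply one gradient-descent step to a single weight."""
    # Gradient of the error with respect to this weight.
    gradient = output_delta * input_activation
    # Subtract a fraction of the gradient from the weight.
    return weight - learning_rate * gradient

new_weight = update_weight(0.5, output_delta=0.2, input_activation=1.0,
                           learning_rate=0.5)
```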
A Pseudo-Code Algorithm
Randomly choose the initial weights
While error is too large
    For each training pattern (in random order)
        Apply the inputs to the network
        Propagation (as described earlier)
        Weight Update (as described earlier)
        Apply weight adjustments
    Periodically evaluate the network performance
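The pseudo-code can be turned into a small runnable sketch. Everything specific here is an assumption made for illustration: a single hidden layer of 3 sigmoid units, a squared-error measure, the learning rate, the stopping threshold, a cap on the number of epochs, and the toy logical-OR data used in place of MNIST. It is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(inputs, targets, hidden_units=3, learning_rate=0.5,
          error_threshold=0.01, max_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    n_in, n_out = inputs.shape[1], targets.shape[1]
    # Randomly choose the initial weights (last row holds the bias weights).
    w_hidden = rng.normal(0.0, 0.5, size=(n_in + 1, hidden_units))
    w_output = rng.normal(0.0, 0.5, size=(hidden_units + 1, n_out))

    def forward(x):
        hidden = sigmoid(np.append(x, 1.0) @ w_hidden)
        output = sigmoid(np.append(hidden, 1.0) @ w_output)
        return hidden, output

    def mean_squared_error():
        errors = [(forward(x)[1] - t) ** 2 for x, t in zip(inputs, targets)]
        return float(np.mean(errors))

    history = [mean_squared_error()]
    for _ in range(max_epochs):
        # "While error is too large" (with a cap so the loop always ends).
        if history[-1] <= error_threshold:
            break
        # For each training pattern, in random order.
        for i in rng.permutation(len(inputs)):
            x, t = inputs[i], targets[i]
            hidden, output = forward(x)            # forward propagation
            # Backward propagation: deltas for output and hidden neurons.
            delta_out = (output - t) * output * (1.0 - output)
            delta_hidden = (w_output[:-1] @ delta_out) * hidden * (1.0 - hidden)
            # Weight update: delta times input activation, scaled and subtracted.
            w_output -= learning_rate * np.outer(np.append(hidden, 1.0), delta_out)
            w_hidden -= learning_rate * np.outer(np.append(x, 1.0), delta_hidden)
        # Evaluate the network performance once per epoch.
        history.append(mean_squared_error())
    return forward, history

# Toy data: logical OR of two inputs.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0], [1], [1], [1]], dtype=float)
forward, history = train(inputs, targets)
```

The `history` list records the error after each epoch, so it can be inspected or plotted to check that training is making progress.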
Constraints As a control, each classification algorithm will be presented with the MNIST test set, in the exact same order.
Hypothesis The neural network will match samples to targets with a higher accuracy than the template matching and Naïve Bayes classifiers.
Hypothesis Testing Accuracy is defined as such: In addition, we will test the following: Implementation: based on ease of implementation. Average Runtime: based on the average of (end time – start time) when running an algorithm. Overall Feasibility: all factors taken into account for the desired use-case.
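The average runtime measure could be collected along these lines. This is a sketch only: the helper name `average_runtime`, the number of runs, and the idea of timing one full pass over the samples per run are assumptions, and `classify` stands for whichever of the three algorithms is being measured.

```python
import time

def average_runtime(classify, samples, runs=5):
    """Average of (end time - start time) over several full passes."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        for sample in samples:
            classify(sample)
        end = time.perf_counter()
        durations.append(end - start)
    return sum(durations) / len(durations)

# Placeholder classifier that does no real work, for demonstration.
elapsed = average_runtime(lambda sample: sample, [1, 2, 3], runs=3)
```

Presenting the samples in the same order on every run keeps the measurement consistent with the control described under Constraints.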