Hand-written character recognition

Hand-written character recognition

MNIST: a data set of hand-written digits
- 60,000 training samples
- 10,000 test samples
- Each sample consists of 28 x 28 = 784 pixels

Various techniques have been tried (failure rate on the test samples):
- Linear classifier: 12.0%
- 2-layer BP net (300 hidden nodes): 4.7%
- 3-layer BP net (300+200 hidden nodes): 3.05%
- Support vector machine (SVM): 1.4%
- Convolutional net: 0.4%
- 6-layer BP net (7500 hidden nodes): 0.35%

Hand-written character recognition

Our own experiment: BP learning with a 784-300-10 architecture
- Total # of weights: 784*300 + 300*10 = 238,200
- Total # of Δw computed for each epoch: 1.4*10^10
- Ran 1 month before it stopped
- Test error rate: 5.0%
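The per-epoch figure follows from multiplying the weight count by the 60,000 training samples; a quick sketch of that arithmetic (bias terms omitted, matching the slide's count):

```python
# Parameter-count arithmetic for the 784-300-10 experiment above.
# Bias terms are omitted because the slide's total of 238,200 omits them.
n_in, n_hidden, n_out = 784, 300, 10
n_train = 60_000  # MNIST training samples

n_weights = n_in * n_hidden + n_hidden * n_out
print(n_weights)  # 238200

# Plain backpropagation computes one delta-w per weight per training sample,
# so a single epoch computes roughly:
updates_per_epoch = n_weights * n_train
print(f"{updates_per_epoch:.2e}")  # ~1.43e+10
```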

Risk-Averting Error Function

Mean Squared Error (MSE)

Risk-Averting Error (RAE)

James Ting-Ho Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.
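The formulas for the two criteria were images on the original slide and did not survive extraction. A standard way to write them, following the cited Lo (2010) paper, is given below; the notation (K training pairs (x_k, y_k), network output f(x_k, w), risk-sensitivity index λ) is assumed here rather than taken from the slide:

```latex
% Mean Squared Error (MSE)
Q(w) = \frac{1}{K} \sum_{k=1}^{K} \left\| y_k - f(x_k, w) \right\|^2

% Risk-Averting Error (RAE), with risk-sensitivity index \lambda > 0
J_\lambda(w) = \sum_{k=1}^{K} \exp\!\left( \lambda \left\| y_k - f(x_k, w) \right\|^2 \right)
```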

Normalized Risk-Averting Error

Normalized Risk-Averting Error (NRAE)

It can be simplified as
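Both equations on this slide (the NRAE definition and the simplified form it refers to) were images and are missing from the transcript. A hedged reconstruction, assuming Lo's definition of the NRAE and taking the simplification to be the usual log-sum-exp rewriting, with ε_k(w) = ||y_k − f(x_k, w)||:

```latex
% Normalized Risk-Averting Error (NRAE)
C_\lambda(w) = \frac{1}{\lambda} \ln\!\left( \frac{1}{K} \sum_{k=1}^{K}
               e^{\lambda \epsilon_k^2(w)} \right)

% Equivalent, numerically stable form: pull out the largest squared error
C_\lambda(w) = \epsilon_{\max}^2(w) + \frac{1}{\lambda} \ln\!\left( \frac{1}{K}
               \sum_{k=1}^{K} e^{\lambda \left( \epsilon_k^2(w) - \epsilon_{\max}^2(w) \right)} \right),
\qquad \epsilon_{\max}^2(w) = \max_k \epsilon_k^2(w)
```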

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

- A quasi-Newton method for solving nonlinear optimization problems
- Uses first-order gradient information to build an approximation to the Hessian (second-order derivative) matrix
- Avoiding calculation of the exact Hessian matrix significantly reduces the computational cost of the optimization

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

The BFGS Algorithm:
1. Generate an initial guess x_0 and an initial approximate inverse Hessian matrix H_0 (typically the identity).
2. Obtain a search direction p_k at step k by computing p_k = -H_k ∇f(x_k), where ∇f(x_k) is the gradient of the objective function evaluated at x_k.
3. Perform a line search to find an acceptable stepsize α_k in the direction p_k, then update x_{k+1} = x_k + α_k p_k.

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

4. Set s_k = α_k p_k and y_k = ∇f(x_{k+1}) - ∇f(x_k).
5. Update the approximate inverse Hessian matrix H_k by the BFGS update (the standard form is given below).
6. Repeat steps 2-5 until x_k converges to the solution. Convergence can be checked by observing the norm of the gradient, ||∇f(x_k)||.
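The update formula itself was an image on the slide; the standard inverse-Hessian BFGS update (an assumption about what the slide showed) is:

```latex
H_{k+1} = \left( I - \rho_k\, s_k y_k^{\top} \right) H_k
          \left( I - \rho_k\, y_k s_k^{\top} \right)
          + \rho_k\, s_k s_k^{\top},
\qquad \rho_k = \frac{1}{y_k^{\top} s_k}
```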

The Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Limited-memory BFGS (L-BFGS) Method:
- A variation of the BFGS method
- Uses only a few vectors to represent the approximation of the Hessian matrix implicitly
- Lower memory requirements
- Well suited for optimization problems with a large number of variables
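As an illustration of how such a limited-memory solver is typically invoked, here is a minimal sketch using SciPy's L-BFGS-B implementation on a toy quadratic; the objective is a placeholder, not the presentation's MNIST network or NRAE criterion:

```python
# Minimal sketch: minimizing a toy quadratic with SciPy's limited-memory BFGS.
# In the presentation's setting the objective would instead be the error
# criterion of the 784-300-10 network as a function of its weight vector.
import numpy as np
from scipy.optimize import minimize

def objective(w):
    """Toy objective 0.5 * ||w - 1||^2, returned together with its gradient."""
    diff = w - 1.0
    return 0.5 * diff @ diff, diff  # (value, gradient)

w0 = np.zeros(1000)  # initial guess in a moderately high-dimensional space
result = minimize(objective, w0, jac=True, method="L-BFGS-B",
                  options={"maxiter": 500, "gtol": 1e-8})
print(result.x[:5], result.fun)  # should be close to all ones, value near 0
```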

References

J. T. Lo and D. Bassu. An adaptive method of training multilayer perceptrons. In Proceedings of the 2001 International Joint Conference on Neural Networks, volume 3, pages 2013–2018, July 2001.

James Ting-Ho Lo. Convexification for data fitting. Journal of Global Optimization, 46(2):307–315, February 2010.

BFGS: http://en.wikipedia.org/wiki/BFGS

A Notch Function

MSE vs. RAE

MSE vs. RAE