Computer Vision Lecture 19: Object Recognition III (November 20, 2014)
Presentation transcript:

Slide 1: Linear Separability
So by varying the weights and the threshold, we can realize any linear separation of the input space into a region that yields output 1, and another region that yields output 0. As we have seen, a two-dimensional input space can be divided by any straight line. A three-dimensional input space can be divided by any two-dimensional plane. In general, an n-dimensional input space can be divided by an (n−1)-dimensional plane or hyperplane. Of course, for n > 3 this is hard to visualize.
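To make the linear separation concrete, here is a minimal sketch (not part of the original slides) of a single threshold neuron in Python; the weights and threshold are arbitrary illustrative values that realize the dividing line x1 + x2 = 1.

```python
# A single threshold neuron: output 1 if the weighted input sum reaches the
# threshold, 0 otherwise. Weights and threshold are illustrative choices.
def threshold_neuron(x, w, theta):
    net = sum(w_i * x_i for w_i, x_i in zip(w, x))
    return 1 if net >= theta else 0

# The line x1 + x2 = 1 separates the 2D input space into two regions.
w, theta = [1.0, 1.0], 1.0
print(threshold_neuron([0.8, 0.5], w, theta))  # 1: on or above the line
print(threshold_neuron([0.2, 0.3], w, theta))  # 0: below the line
```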

Slide 2: Capabilities of Threshold Neurons
What do we do if we need a more complex function? We can combine multiple artificial neurons to form networks with increased capabilities. For example, we can build a two-layer network with any number of neurons in the first layer giving input to a single neuron in the second layer. The neuron in the second layer could, for example, implement an AND function.

Slide 3: Capabilities of Threshold Neurons
What kind of function can such a network realize?
[Figure: a two-layer network; the inputs x1 and x2 feed several first-layer threshold neurons, whose outputs feed a single second-layer neuron.]

Slide 4: Capabilities of Threshold Neurons
Assume that the dotted lines in the diagram represent the input-dividing lines implemented by the neurons in the first layer:
[Figure: several dividing lines in the plane of the first and second input components, enclosing a polygon.]
Then, for example, the second-layer neuron could output 1 if the input is within a polygon, and 0 otherwise.
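As a hedged illustration of this idea (not taken from the slides), the following sketch builds such a two-layer network by hand: three first-layer threshold neurons each implement one dividing line, and the second-layer neuron ANDs them, so the network outputs 1 exactly inside the resulting triangle. The particular lines are arbitrary.

```python
def step(net, theta=0.0):
    """Threshold neuron: 1 if net >= theta, else 0."""
    return 1 if net >= theta else 0

def first_layer(x):
    """Three neurons, each implementing one dividing line of a triangle."""
    x1, x2 = x
    return [
        step(x1),                # 1 if x1 >= 0
        step(x2),                # 1 if x2 >= 0
        step(-(x1 + x2), -1.0),  # 1 if x1 + x2 <= 1
    ]

def network(x):
    h = first_layer(x)
    return step(sum(h), theta=len(h))  # second-layer AND: all lines satisfied

print(network([0.2, 0.2]))  # 1: inside the triangle
print(network([0.9, 0.9]))  # 0: outside the triangle
```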

Slide 5: Capabilities of Threshold Neurons
However, we still may want to implement functions that are more complex than that. An obvious idea is to extend our network even further. Let us build a network that has three layers, with arbitrary numbers of neurons in the first and second layers and one neuron in the third layer. The first and second layers are completely connected, that is, each neuron in the first layer sends its output to every neuron in the second layer.

Slide 6: Capabilities of Threshold Neurons
What type of function can a three-layer network realize?
[Figure: a three-layer network; the inputs x1 and x2 feed the first-layer neurons, whose outputs feed the second-layer neurons, whose outputs o_i feed a single third-layer neuron.]

Slide 7: Capabilities of Threshold Neurons
Assume that the polygons in the diagram indicate the input regions for which each of the second-layer neurons yields output 1:
[Figure: several polygons in the plane of the first and second input components.]
Then, for example, the third-layer neuron could output 1 if the input is within any of the polygons, and 0 otherwise.

Slide 8: Capabilities of Threshold Neurons
The more neurons there are in the first layer, the more vertices the polygons can have. With a sufficient number of first-layer neurons, the polygons can approximate any given shape. The more neurons there are in the second layer, the more of these polygons can be combined to form the output function of the network. With a sufficient number of neurons and appropriate weight vectors w_i, a three-layer network of threshold neurons can realize any (!) function R^n → {0, 1}.
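A hedged sketch of this construction (with arbitrary, illustrative regions): each second-layer AND neuron detects one convex polygon, here an axis-aligned square defined by four first-layer half-plane detectors, and the third-layer neuron ORs the polygons together, so the network outputs 1 on the union of the regions.

```python
def step(net, theta=0.0):
    return 1 if net >= theta else 0

def in_square(x, lo, hi):
    """Second-layer AND over four first-layer half-plane detectors:
    outputs 1 iff lo <= x1 <= hi and lo <= x2 <= hi."""
    x1, x2 = x
    h = [step(x1 - lo), step(hi - x1), step(x2 - lo), step(hi - x2)]
    return step(sum(h), theta=4)

def three_layer_network(x):
    polygons = [in_square(x, 0.0, 1.0),   # first convex region
                in_square(x, 2.0, 3.0)]   # second convex region
    return step(sum(polygons), theta=1)   # third-layer OR over the regions

print(three_layer_network([0.5, 0.5]))  # 1: inside the first square
print(three_layer_network([2.5, 2.5]))  # 1: inside the second square
print(three_layer_network([1.5, 1.5]))  # 0: inside neither square
```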

Slide 9: Terminology
Usually, we draw neural networks in such a way that the input enters at the bottom and the output is generated at the top. Arrows indicate the direction of data flow. The first layer, termed input layer, just contains the input vector and does not perform any computations. The second layer, termed hidden layer, receives input from the input layer and sends its output to the output layer. After applying their activation function, the neurons in the output layer contain the output vector.

Slide 10: Terminology
Example: network function f: R^3 → {0, 1}^2
[Figure: the input vector enters the input layer, which feeds the hidden layer, which feeds the output layer producing the output vector.]

Slide 11: Sigmoidal Neurons
Sigmoidal neurons accept any vectors of real numbers as input, and they output a real number between 0 and 1. Sigmoidal neurons are the most common type of artificial neuron, especially in learning networks. A network of sigmoidal units with m input neurons and n output neurons realizes a network function f: R^m → (0,1)^n.

Slide 12: Sigmoidal Neurons
The sigmoidal activation function has a steepness parameter α and a threshold θ:
f_i(net_i(t)) = 1 / (1 + e^(−α (net_i(t) − θ)))
In backpropagation networks, we typically choose α = 1 and θ = 0.
[Figure: f_i(net_i(t)) plotted against net_i(t) for the steepness values 1 and 0.1.]

Slide 13: Sigmoidal Neurons
This leads to a simplified form of the sigmoid function:
S(net) = 1 / (1 + e^(−net))
We do not need a modifiable threshold θ, because we will use “dummy” (offset) inputs. The choice α = 1 works well in most situations and results in a very simple derivative of S(net).

Slide 14: Sigmoidal Neurons
S'(net) = S(net) · (1 − S(net))
This result will be very useful when we develop the backpropagation algorithm.
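For reference, here is a small sketch of the sigmoid and its derivative, assuming a steepness parameter alpha and threshold theta as above (defaults alpha = 1, theta = 0):

```python
import math

def sigmoid(net, alpha=1.0, theta=0.0):
    """Sigmoidal activation with steepness alpha and threshold theta."""
    return 1.0 / (1.0 + math.exp(-alpha * (net - theta)))

def sigmoid_prime(net):
    """Derivative of the simplified sigmoid: S'(net) = S(net) * (1 - S(net))."""
    s = sigmoid(net)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_prime(0.0))  # 0.25
```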

Slide 15: Feedback-Based Weight Adaptation
- Feedback from environment (possibly teacher) is used to improve the system's performance
- Synaptic weights are modified to reduce the system's error in computing a desired function
- For example, if increasing a specific weight increases error, then the weight is decreased
- Small adaptation steps are needed to find optimal set of weights
- Learning rate can vary during learning process
- Typical for supervised learning

Slide 16: Evaluation of Networks
- Basic idea: define an error function and measure the error for untrained data (testing set)
- Typical: the sum of squared differences between desired output d and actual output o, e.g. E = ½ Σ (d − o)²
- For classification: E = number of misclassified samples / total number of samples
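A minimal sketch of the two evaluation measures named on this slide, assuming the common squared-error convention with a factor of ½:

```python
def squared_error(desired, actual):
    """Sum of squared differences between desired and actual outputs."""
    return 0.5 * sum((d - o) ** 2 for d, o in zip(desired, actual))

def classification_error(desired_labels, predicted_labels):
    """Fraction of misclassified samples."""
    wrong = sum(1 for d, p in zip(desired_labels, predicted_labels) if d != p)
    return wrong / len(desired_labels)

print(squared_error([1.0, 0.0], [0.8, 0.3]))             # 0.065
print(classification_error([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.25
```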

Slide 17: Gradient Descent
Gradient descent is a very common technique to find the absolute minimum of a function. It is especially useful for high-dimensional functions. We will use it to iteratively minimize the network's (or neuron's) error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.

Slide 18: Gradient Descent
Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x).
[Figure: the curve f(x) with the slope f'(x_0) at a starting point x_0.]
Starting at x_0, take a step against the slope: x_1 = x_0 − η f'(x_0). Repeat this iteratively until, for some x_i, f'(x_i) is sufficiently close to 0.
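The iteration on this slide translates directly into code. A sketch, using an illustrative error function f(x) = (x − 3)² and an arbitrary learning rate η = 0.1:

```python
def f_prime(x):
    return 2.0 * (x - 3.0)  # derivative of the example function f(x) = (x - 3)^2

def gradient_descent_1d(x0, eta=0.1, tolerance=1e-6, max_steps=10000):
    """Repeat x <- x - eta * f'(x) until the slope is sufficiently close to 0."""
    x = x0
    for _ in range(max_steps):
        slope = f_prime(x)
        if abs(slope) < tolerance:
            break
        x = x - eta * slope
    return x

print(gradient_descent_1d(x0=0.0))  # converges close to 3.0, the minimum of f
```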

Slide 19: Gradient Descent
Gradients of two-dimensional functions: The two-dimensional function in the left diagram is represented by contour lines in the right diagram, where arrows indicate the gradient of the function at different locations. Obviously, the gradient is always pointing in the direction of the steepest increase of the function. In order to find the function's minimum, we should always move against the gradient.

Slide 20: Multilayer Networks
The backpropagation algorithm was popularized by Rumelhart, Hinton, and Williams (1986). This algorithm solved the “credit assignment” problem, i.e., crediting or blaming individual neurons across layers for particular outputs. The error at the output layer is propagated backwards to units at lower layers, so that the weights of all neurons can be adapted appropriately.

Slide 21: Terminology
Example: network function f: R^3 → R^2
[Figure: the input vector (x1, x2, x3) enters the input layer, which feeds the hidden layer, which feeds the output layer producing the output vector (o1, o2).]

Slide 22: Backpropagation Learning
The goal of the backpropagation learning algorithm is to modify the network's weights so that its output vector o_p = (o_{p,1}, o_{p,2}, …, o_{p,K}) is as close as possible to the desired output vector d_p = (d_{p,1}, d_{p,2}, …, d_{p,K}) for K output neurons and input patterns p = 1, …, P. The set of input-output pairs (exemplars) {(x_p, d_p) | p = 1, …, P} constitutes the training set.

Slide 23: Backpropagation Learning
We need a cumulative error function that is to be minimized. We can choose the mean square error (MSE), where the 1/P factor does not matter for minimizing error:
E = (1/P) Σ_{p=1..P} E_p,   where   E_p = ½ Σ_{k=1..K} (d_{p,k} − o_{p,k})²

Slide 24: Backpropagation Learning
For input pattern p, the i-th input layer node holds x_{p,i}.
Net input to the j-th node in the hidden layer:  net_j^(1) = Σ_i w_{j,i}^(1,0) x_{p,i}
Output of the j-th node in the hidden layer:  o_j^(1) = S(net_j^(1))
Net input to the k-th node in the output layer:  net_k^(2) = Σ_j w_{k,j}^(2,1) o_j^(1)
Output of the k-th node in the output layer:  o_k = S(net_k^(2))
Network error for p:  E_p = ½ Σ_k (d_{p,k} − o_k)²
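The feedforward computations above translate directly into code. A sketch for a single pattern, with illustrative weight matrices and without the dummy (offset) inputs, just to show the flow of the quantities net_j^(1), o_j^(1), net_k^(2), o_k, and E_p:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def forward(x, W1, W2):
    """Forward pass: W1 is hidden-by-input, W2 is output-by-hidden."""
    net_hidden = [sum(w * xi for w, xi in zip(row, x)) for row in W1]      # net_j^(1)
    o_hidden = [sigmoid(net_j) for net_j in net_hidden]                    # o_j^(1)
    net_out = [sum(w * oj for w, oj in zip(row, o_hidden)) for row in W2]  # net_k^(2)
    o_out = [sigmoid(net_k) for net_k in net_out]                          # o_k
    return o_hidden, o_out

def pattern_error(d, o):
    """E_p = 1/2 * sum_k (d_k - o_k)^2"""
    return 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o))

# Tiny example: 2 inputs, 2 hidden nodes, 1 output node (weights are arbitrary).
W1 = [[0.5, -0.3], [0.8, 0.2]]
W2 = [[1.0, -1.0]]
o_hidden, o_out = forward([1.0, 0.5], W1, W2)
print(o_out, pattern_error([1.0], o_out))
```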

Slide 25: Backpropagation Learning
As E is a function of the network weights, we can use gradient descent to find those weights that result in minimal error. For individual weights in the hidden and output layers, we should move against the error gradient (omitting the index p):
Output layer:  Δw_{k,j}^(2,1) = −η ∂E/∂w_{k,j}^(2,1)   (derivative easy to calculate)
Hidden layer:  Δw_{j,i}^(1,0) = −η ∂E/∂w_{j,i}^(1,0)   (derivative difficult to calculate)

Slide 26: Backpropagation Learning
When computing the derivative with regard to w_{k,j}^(2,1), we can disregard any output units except o_k:
∂E/∂w_{k,j}^(2,1) = ∂/∂w_{k,j}^(2,1) [½ (d_k − o_k)²]
Remember that o_k is obtained by applying the sigmoid function S to net_k^(2), which is computed by:
net_k^(2) = Σ_j w_{k,j}^(2,1) o_j^(1)
Therefore, we need to apply the chain rule twice.

Slide 27: Backpropagation Learning
Since o_k = S(net_k^(2)) and net_k^(2) = Σ_j w_{k,j}^(2,1) o_j^(1), we have:
∂E/∂w_{k,j}^(2,1) = (∂E/∂o_k) · (∂o_k/∂net_k^(2)) · (∂net_k^(2)/∂w_{k,j}^(2,1))
We know that:
∂E/∂o_k = −(d_k − o_k),   ∂o_k/∂net_k^(2) = S'(net_k^(2)),   ∂net_k^(2)/∂w_{k,j}^(2,1) = o_j^(1)
Which gives us:
∂E/∂w_{k,j}^(2,1) = −(d_k − o_k) · S'(net_k^(2)) · o_j^(1)

Slide 28: Backpropagation Learning
For the derivative with regard to w_{j,i}^(1,0), notice that E depends on it through net_j^(1), which influences each o_k with k = 1, …, K. Using the chain rule of derivatives again:
∂E/∂w_{j,i}^(1,0) = Σ_k (∂E/∂o_k) · (∂o_k/∂net_k^(2)) · (∂net_k^(2)/∂o_j^(1)) · (∂o_j^(1)/∂net_j^(1)) · (∂net_j^(1)/∂w_{j,i}^(1,0))
                  = −Σ_k (d_k − o_k) · S'(net_k^(2)) · w_{k,j}^(2,1) · S'(net_j^(1)) · x_i

Slide 29: Backpropagation Learning
This gives us the following weight changes at the output layer:
Δw_{k,j}^(2,1) = η · (d_k − o_k) · S'(net_k^(2)) · o_j^(1)
… and at the inner layer:
Δw_{j,i}^(1,0) = η · S'(net_j^(1)) · x_i · Σ_k (d_k − o_k) · S'(net_k^(2)) · w_{k,j}^(2,1)

Slide 30: Backpropagation Learning
As you surely remember from a few minutes ago:
S'(net) = S(net) · (1 − S(net)),  so  S'(net_k^(2)) = o_k (1 − o_k)  and  S'(net_j^(1)) = o_j^(1) (1 − o_j^(1))
Then we can simplify the generalized error terms:
δ_k = (d_k − o_k) · o_k · (1 − o_k)
And:
δ_j = o_j^(1) · (1 − o_j^(1)) · Σ_k δ_k · w_{k,j}^(2,1)

Slide 31: Backpropagation Learning
The simplified error terms δ_k and δ_j use variables that are calculated in the feedforward phase of the network and can thus be calculated very efficiently. Now let us state the final equations again and reintroduce the subscript p for the p-th pattern:
Δw_{k,j}^(2,1) = η · δ_{p,k} · o_{p,j}^(1)
Δw_{j,i}^(1,0) = η · δ_{p,j} · x_{p,i}

Slide 32: Backpropagation Learning
Algorithm Backpropagation;
  Start with randomly chosen weights;
  while MSE is above desired threshold and computational bounds are not exceeded, do
    for each input pattern x_p, 1 ≤ p ≤ P,
      Compute hidden node inputs;
      Compute hidden node outputs;
      Compute inputs to the output nodes;
      Compute the network outputs;
      Compute the error between output and desired output;
      Modify the weights between hidden and output nodes;
      Modify the weights between input and hidden nodes;
    end-for
  end-while.
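Putting the pieces together, here is a compact sketch of the algorithm above for one hidden layer of sigmoidal units, using the delta terms and weight-update equations from the previous slides. A dummy input fixed at 1 replaces a modifiable threshold, as suggested earlier; the network sizes, learning rate, stopping rule, and the XOR training set are illustrative choices, not taken from the slides.

```python
import math, random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train(patterns, n_in, n_hidden, n_out, eta=0.5, epochs=10000, mse_goal=0.01):
    # Start with randomly chosen weights; the extra column handles the dummy input.
    W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [[random.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    for _ in range(epochs):
        mse = 0.0
        for x, d in patterns:
            xd = list(x) + [1.0]                      # append dummy input
            # Feedforward phase.
            o_h = [sigmoid(sum(w * xi for w, xi in zip(row, xd))) for row in W1]
            o_hd = o_h + [1.0]                        # dummy input to output layer
            o_k = [sigmoid(sum(w * oj for w, oj in zip(row, o_hd))) for row in W2]
            mse += 0.5 * sum((dk - ok) ** 2 for dk, ok in zip(d, o_k))

            # Generalized error terms delta_k and delta_j.
            delta_k = [(d[k] - o_k[k]) * o_k[k] * (1 - o_k[k]) for k in range(n_out)]
            delta_j = [o_h[j] * (1 - o_h[j]) *
                       sum(delta_k[k] * W2[k][j] for k in range(n_out))
                       for j in range(n_hidden)]

            # Modify weights between hidden and output nodes, then input and hidden nodes.
            for k in range(n_out):
                for j in range(n_hidden + 1):
                    W2[k][j] += eta * delta_k[k] * o_hd[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    W1[j][i] += eta * delta_j[j] * xd[i]

        if mse / len(patterns) < mse_goal:            # stop when MSE is low enough
            break
    return W1, W2

# Illustrative usage on the XOR patterns (a classic non-linearly-separable task):
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
W1, W2 = train(xor, n_in=2, n_hidden=3, n_out=1)
```

This sketch uses online (per-pattern) updating; accumulating the weight changes over all patterns before applying them (batch updating) is an equally valid reading of the equations.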

Slide 33: Supervised Function Approximation
There is a tradeoff between a network's ability to precisely learn the given exemplars and its ability to generalize (i.e., inter- and extrapolate). This problem is similar to fitting a function to a given set of data points. Let us assume that you want to find a fitting function f: R → R for a set of three data points. You try to do this with polynomials of degree one (a straight line), two, and nine.

Slide 34: Supervised Function Approximation
[Figure: the three data points with the fitted polynomials of degree 1, 2, and 9 plotted as f(x) against x.]
Obviously, the polynomial of degree 2 provides the most plausible fit.

Slide 35: Supervised Function Approximation
The same principle applies to ANNs:
- If an ANN has too few neurons, it may not have enough degrees of freedom to precisely approximate the desired function.
- If an ANN has too many neurons, it will learn the exemplars perfectly, but its additional degrees of freedom may cause it to show implausible behavior for untrained inputs; it then shows poor generalization ability.
Unfortunately, there are no known equations that could tell you the optimal size of your network for a given application; there are only heuristics.

Slide 36: Creating Data Representations
The problem with some data representations is that the meaning of the output of one neuron depends on the output of other neurons. This means that each neuron does not represent (detect) a certain feature, but groups of neurons do. In general, such functions are much more difficult to learn. Such networks usually need more hidden neurons and longer training, and their ability to generalize is weaker than for the one-neuron-per-feature-value networks.

Slide 37: Creating Data Representations
On the other hand, sets of orthogonal vectors (such as 100, 010, 001) representing individual features can be processed by the network more easily. This becomes clear when we consider that a neuron's net input signal is computed as the inner product of the input and weight vectors. The geometric interpretation of these vectors shows that orthogonal vectors are especially easy to discriminate for a single neuron.

Slide 38: Creating Data Representations
Another way of representing n-ary data in a neural network is using one neuron per feature, but scaling the (analog) value to indicate the degree to which a feature is present.
Good examples:
- the brightness of a pixel in an input image
- the distance between a robot and an obstacle
Poor examples:
- the letter (1 – 26) of a word
- the type (1 – 6) of a chess piece
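To round off, a small sketch contrasting the two representations discussed in these last slides: an orthogonal one-neuron-per-feature-value encoding versus a single analog value scaled to [0, 1]. The chess-piece and robot-distance examples come from the slide; the specific encodings are illustrative.

```python
PIECES = ["pawn", "knight", "bishop", "rook", "queen", "king"]

def one_hot(piece):
    """Orthogonal representation: one input neuron per piece type."""
    return [1.0 if p == piece else 0.0 for p in PIECES]

def scaled_distance(distance, max_distance):
    """Analog representation: one neuron whose activation scales with the feature."""
    return min(distance / max_distance, 1.0)

print(one_hot("rook"))                         # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(scaled_distance(1.5, max_distance=5.0))  # 0.3
```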