NEURAL NETWORKS – Perceptron
Prepared by PROF. DR. YUSUF OYSAL
The Perceptron
The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter (a step or sign function).
[Figure: inputs x1 and x2, weighted by w1 and w2, feed the linear combiner; together with the threshold, the combined sum passes through the hard limiter to produce the output Y.]
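The linear combiner plus hard limiter can be sketched in a few lines. This is a minimal illustration with assumed helper names, not code from the slides; the threshold is folded in as an additive bias term theta, matching the sign convention used on the logic-gate slide later.

```python
# Minimal sketch of Rosenblatt's perceptron: a linear combiner followed
# by a hard limiter. The threshold is treated as an additive bias theta,
# i.e. y = step(w1*x1 + ... + wn*xn + theta).

def step(v):
    """Hard limiter: output 1 when the combined input is non-negative."""
    return 1 if v >= 0 else 0

def perceptron(x, w, theta):
    """Linear combiner plus hard limiter."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + theta)

# With w1 = w2 = 1 and theta = -1.5 this unit computes logical AND:
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (1, 1), -1.5))  # fires only for (1, 1)
```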
Exercise
A neuron uses a step function as its activation function, has threshold -1.5, and has w1 = 0.1, w2 = 0.1. What is the output with the following values of x1 and x2?

  x1 | x2 | Y
   0 |  0 | ?
   0 |  1 | ?
   1 |  0 | ?
   1 |  1 | ?

Can you find weights that will give an OR function? An XOR function?
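The exercise can be checked directly. The sketch below assumes the same sign convention as the logic-gate slide, y = step(w1*x1 + w2*x2 + theta) with the threshold added as a bias; the function names are illustrative, not from the slides.

```python
def step(v):
    return 1 if v >= 0 else 0

def neuron(x1, x2, w1, w2, theta):
    # Threshold treated as an additive bias, as on the logic-gate slide.
    return step(w1 * x1 + w2 * x2 + theta)

# Exercise neuron: w1 = w2 = 0.1, threshold -1.5.
# The weighted sum is at most 0.2, so 0.2 - 1.5 < 0 for every input.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, neuron(x1, x2, 0.1, 0.1, -1.5))

# One choice of weights implementing OR: w1 = w2 = 1, theta = -0.5.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, neuron(x1, x2, 1, 1, -0.5))
```

Note the answer depends on the sign convention: under the opposite convention y = step(w·x − θ) with θ = −1.5, the neuron would instead fire for every input.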
Can a single neuron learn a task?
In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
How does the perceptron learn its classification tasks? This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain an output consistent with the training examples.
Perceptron for Classification
We try to find suitable values for the weights in such a way that the training examples are correctly classified. A single perceptron classifies input patterns, x, into two classes. The input patterns that can be classified by a single perceptron into two distinct classes are called linearly separable patterns. Geometrically, we try to find a hyperplane that separates the examples of the two classes.
In order for the perceptron to perform a desired task, its weights must be properly selected. In general, two basic methods can be employed to select a suitable weight vector:
- By off-line calculation of weights. If the problem is relatively simple, it is often possible to calculate the weight vector from the specification of the problem.
- By a learning procedure. The weight vector is determined from a given (training) set of input-output vectors (exemplars) in such a way as to achieve the best classification of the training vectors.
Once the weights are selected (the encoding process), the perceptron performs its desired task, e.g. pattern classification (the decoding process).
Learning Rules
Linear Threshold Units (LTUs): perceptron learning rule. Linearly Graded Units (LGUs): Widrow-Hoff learning rule.
[Figure: a single layer of n units with weights w11, w12, ..., wnm connecting inputs x1, ..., x(m-1) and a constant bias input xm = 1 to outputs y1, y2, ..., yn; each output yi is compared with a desired output di.]
Training set: pairs of input vectors x and desired outputs d. Goal: adjust the weights so that each actual output yi matches its desired output di on every training example.
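The perceptron learning rule for a single LTU can be written as one update step. This is an illustrative sketch (names are not from the slides): on each training example, every weight moves by eta * (d - y) * x_i, so the weights change only when the actual output y disagrees with the desired output d.

```python
# One update step of the perceptron learning rule for a single LTU.
# delta_w_i = eta * (d - y) * x_i

def step(v):
    return 1 if v >= 0 else 0

def output(w, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)))

def update(w, x, d, eta=0.1):
    y = output(w, x)
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]

# The input vector includes a constant 1, playing the role of the
# bias input x_m = 1 in the figure above.
w = [0.0, 0.0, 0.0]
print(output(w, [0, 1, 1]))    # step(0) = 1
w = update(w, [0, 1, 1], d=0)  # desired 0, actual 1: weights decrease
print(w)                       # [0.0, -0.1, -0.1]
```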
Perceptron Geometric View
The equation w1x1 + w2x2 + w0 = 0 describes a (hyper)plane in the input space consisting of real-valued m-dimensional vectors. The plane splits the input space into two regions, each of them describing one class.
[Figure: in the (x1, x2) plane, the decision boundary w1x1 + w2x2 + w0 = 0 separates the decision region for class C1 from the decision region for class C2.]
Implementing Logic Gates with Single Layer Perceptron
In each case we have inputs x and outputs y, and need to determine the weights and thresholds. It is easy to find solutions by inspection (here θ is added to the weighted sum as a bias, y = step(w1x1 + w2x2 + θ)):

NOT: w = -1, θ = 0.5
  x | y
  0 | 1
  1 | 0

AND: w1 = 1, w2 = 1, θ = -1.5
  x1 | x2 | y
   0 |  0 | 0
   0 |  1 | 0
   1 |  0 | 0
   1 |  1 | 1

OR: w1 = 1, w2 = 1, θ = -0.5
  x1 | x2 | y
   0 |  0 | 0
   0 |  1 | 1
   1 |  0 | 1
   1 |  1 | 1
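The weights and thresholds found by inspection can be verified mechanically. A small sketch (illustrative helper names), with θ added to the weighted sum as on this slide:

```python
def step(v):
    return 1 if v >= 0 else 0

def gate(w, theta):
    """Build a hard-limiter unit y = step(w . x + theta)."""
    return lambda *x: step(sum(wi * xi for wi, xi in zip(w, x)) + theta)

NOT = gate([-1], 0.5)
AND = gate([1, 1], -1.5)
OR = gate([1, 1], -0.5)

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([NOT(x) for x in (0, 1)])       # [1, 0]
print([AND(a, b) for a, b in pairs])  # [0, 0, 0, 1]
print([OR(a, b) for a, b in pairs])   # [0, 1, 1, 1]
```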
Limitations of Simple Perceptrons
XOR is linearly non-separable:

  x1 | x2 | y
   0 |  0 | 0
   0 |  1 | 1
   1 |  0 | 1
   1 |  1 | 0

We can follow the same procedure for the XOR network:

  w1·0 + w2·0 + θ < 0   ⟹  θ < 0
  w1·0 + w2·1 + θ ≥ 0   ⟹  w2 ≥ -θ
  w1·1 + w2·0 + θ ≥ 0   ⟹  w1 ≥ -θ
  w1·1 + w2·1 + θ < 0   ⟹  w1 + w2 < -θ

Clearly the second and third inequalities are incompatible with the fourth (adding them gives w1 + w2 ≥ -2θ > -θ, since θ < 0), so there is in fact no solution. We need more complex networks, e.g. ones that combine together many simple networks, or use different activation/thresholding/transfer functions. It then becomes much more difficult to determine all the weights and thresholds by hand. In the next slides we will see how a neural network can learn these parameters. First, we need to consider what these more complex networks might involve.
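Combining simple units does solve XOR. One standard construction (a sketch with assumed names, not from the slides) uses two hidden hard-limiter units computing OR and AND of the inputs, and an output unit that fires when OR is on and AND is off:

```python
def step(v):
    return 1 if v >= 0 else 0

def unit(x, w, theta):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + theta)

def xor(x1, x2):
    """Two-layer network of the same hard-limiter units.
    Hidden layer: OR and AND of the inputs.
    Output: fires when h_or = 1 and h_and = 0, i.e. exactly one input is 1."""
    h_or = unit((x1, x2), (1, 1), -0.5)
    h_and = unit((x1, x2), (1, 1), -1.5)
    return unit((h_or, h_and), (1, -1), -0.5)

print([xor(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```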
Decision Hyperplanes and Linear Separability
If we have two inputs, then the weights define a decision boundary that is a one-dimensional straight line in the two-dimensional input space of possible input values. If we have n inputs, the weights define a decision boundary that is an (n-1)-dimensional hyperplane in the n-dimensional input space:

  w1x1 + w2x2 + ... + wnxn + θ = 0

This hyperplane is clearly still linear (i.e. straight/flat) and can still only divide the space into two regions. We still need more complex transfer functions, or more complex networks, to deal with XOR type problems. Problems with input patterns which can be classified using a single hyperplane are said to be linearly separable. Problems (such as XOR) which cannot be classified in this way are said to be non-linearly separable.
Learning and Generalization
A network will also produce outputs for input patterns that it was not originally set up to classify (shown with question marks in the figure). There are two important aspects of the network's operation to consider:
Learning: the network must learn decision surfaces from a set of training patterns so that these training patterns are classified correctly.
Generalization: after training, the network must also be able to generalize, i.e. correctly classify test patterns it has never seen before.
Often we wish the network to learn perfectly and then generalize well. Sometimes the training data may contain errors (e.g. noise in the experimental determination of the input values, or incorrect classifications). In this case, learning the training data perfectly may make the generalization worse. There is an important tradeoff between learning and generalization that arises quite generally.
Generalization in Classification Suppose the task of our network is to learn a classification decision boundary: Our aim is for the network to generalize to classify new inputs appropriately. If we know that the training data contains noise, we don’t necessarily want the training data to be classified totally accurately as that is likely to reduce the generalization ability.
Training a Neural Network Whether our neural network is a simple Perceptron, or a much more complicated multilayer network with special activation functions, we need to develop a systematic procedure for determining appropriate connection weights. The general procedure is to have the network learn the appropriate weights from a representative set of training data. In all but the simplest cases, however, direct computation of the weights is intractable. Instead, we usually start off with random initial weights and adjust them in small steps until the required outputs are produced. We will now look at a brute force derivation of such an iterative learning algorithm for simple Perceptrons. Then, in later lectures, we shall see how more powerful and general techniques can easily lead to learning algorithms which will work for neural networks of any specification we could possibly dream up.
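The general procedure described above can be sketched end to end for a simple perceptron: start from small random weights and repeatedly apply small corrections until every training example is classified correctly. All names are illustrative; the training task (OR, which is linearly separable) is chosen so the loop is guaranteed to terminate.

```python
import random

def step(v):
    return 1 if v >= 0 else 0

def train(samples, eta=0.1, max_epochs=100, seed=1):
    """Iterative perceptron training: random init, small-step updates."""
    rng = random.Random(seed)
    # Random initial weights in [-0.5, 0.5]; one extra weight for the bias input.
    w = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0][0]) + 1)]
    for epoch in range(max_epochs):
        errors = 0
        for x, d in samples:
            xb = list(x) + [1]  # constant bias input
            y = step(sum(wi * xi for wi, xi in zip(w, xb)))
            if y != d:
                errors += 1
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, xb)]
        if errors == 0:
            return w, epoch     # all training examples correct: stop
    return w, max_epochs

OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, epochs = train(OR)
print("converged after", epochs, "epochs")
```

Because OR is linearly separable, the perceptron convergence theorem guarantees the loop reaches zero errors in finitely many updates; for XOR the same loop would run until max_epochs without converging.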