Unit 1 Basics of Artificial Neural Networks
From Biological to Artificial Neural Networks
What do you see in this picture?
From Biological to Artificial Neural Networks
Is there any conventional computer at present capable of perceiving both the trees and Baker's transparent head in this picture at the same time? Most probably the answer is no. Although such visual perception is an easy task for human beings, we face difficulties when sequential computers must be programmed to perform visual operations.
From Biological to Artificial Neural Networks
In a conventional computer there usually exists a single processor implementing a sequence of arithmetic and logical operations, nowadays at speeds approaching a billion operations per second. However, such devices can neither adapt their structure nor learn in the way that human beings do. There is a large number of tasks for which it has proved virtually impossible to devise an algorithm or sequence of arithmetic and/or logical operations. For example, in spite of many attempts, no machine has yet been produced that will automatically read handwritten characters, or recognize words spoken by any speaker (let alone translate from one language to another), or drive a car, or walk and run as an animal or human being does.
From Biological to Artificial Neural Networks
New models of computing to perform pattern recognition (PR) tasks are inspired by the structure and performance of our biological neural network (BNN). But they do not come anywhere near the performance of the BNN, because of:
1. Incomplete knowledge about the operation of a biological neuron and the neural interconnections.
2. The impossibility of simulating the number of neurons and their interconnections as they exist in a BNN.
3. The impossibility of simulating their operations in the natural asynchronous mode.
Characteristics of Neural Networks
Points to consider:
- Problem solving: pattern recognition tasks by human and machine
- Pattern vs. data
- Pattern processing vs. data processing
- Architectural mismatch
- Need for new models of computing
Computers Vs Brain
The brain's advantage over a conventional computer lies neither in the processing speed nor in the processing ability of its individual elements. Today's processors have a speed about 10^6 times faster than the brain's basic processing element, the neuron, and when abilities are compared the neurons are much simpler. The difference is due mainly to structural and operational differences: in a conventional computer the instructions are executed sequentially in a complicated and fast processor, whereas the brain is a massively parallel interconnection of relatively simple and slow processing elements.
Computers Vs Brain
Some features of the BNN that make it superior to the most sophisticated AI computer systems for PR tasks are as follows:
1. Robustness and fault tolerance: the decay of nerve cells does not affect performance significantly.
2. Flexibility: the network automatically adjusts to a new environment without using any preprogrammed instructions.
3. Ability to deal with a variety of data situations: the network can deal with information that is fuzzy, probabilistic, noisy or inconsistent.
4. Collective computation: the network routinely performs many operations in parallel, and also performs a given task in a distributed manner.
Biological Neural Network
The features of the BNN (human nervous system) are attributed to its structure and function. It is claimed that the human central nervous system comprises about 1.3×10^10 neurons, of which about 1×10^10 are in the brain. At any time some of these neurons are firing, and the power dissipation due to this electrical activity is estimated to be of the order of 10 watts. Monitoring the activity in the brain has shown that, even when asleep, 5×10^7 nerve impulses per second are being relayed back and forth between the brain and other parts of the body. This rate increases significantly when awake.
Figure : Schematic diagram of a neuron or nerve cell
Biological Neuron
A typical biological neuron has the following components: The fundamental unit of the network is called a neuron or nerve cell. A neuron has a roughly spherical cell body called the soma, where the nucleus is located. The signals generated in the soma are transmitted to other neurons through an extension of the cell body called the axon or nerve fibre. Tree-like nerve fibres called dendrites are associated with the cell body; dendrites are responsible for receiving the incoming signals from other neurons.
Biological Neuron
The axon is a single long fibre extending from the cell body, which eventually branches into strands and sub-strands connecting to many other neurons at the synaptic junctions, or synapses. The receiving end of these junctions on other cells can be found both on the dendrites and on the cell bodies themselves. The axon of a typical neuron makes a few thousand synapses with other neurons.
Biological Neuron
At its far end the axon separates into several branches, at the very end of which the axon enlarges and forms terminal buttons. Terminal buttons are placed in special structures called synapses, the junctions transmitting signals from one neuron to another. A neuron typically drives 10^3 to 10^4 synaptic junctions. In general a synapse occurs between an axon branch of one neuron and a dendrite of another. Although less common, synapses may also occur between two axons, between two dendrites of different cells, or between an axon and a cell body.
Biological Neuron process
A neuron receives inputs from a large number of neurons via its synaptic connections. Nerve signals arriving at the presynaptic cell membrane cause chemical transmitters to be released into the synaptic cleft. These chemical transmitters diffuse across the gap and bind to the postsynaptic membrane at the receptor site. The membrane of the postsynaptic cell gathers the chemical transmitters; this causes either a decrease or an increase in the efficiency of the local sodium and potassium pumps, depending on the type of chemicals released into the synaptic cleft. In turn the soma potential, called the graded potential, changes. Synapses whose activation decreases the efficiency of the pumps cause depolarization of the graded potential, while synapses that increase the efficiency of the pumps cause hyperpolarization. The first kind, encouraging depolarization, are called excitatory synapses, and those discouraging it are called inhibitory synapses.
Biological Neuron process
Neurons transmit signals as follows : The transmission of a signal from one cell to another at a synapse is a complex chemical process, in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If this potential reaches a threshold, electrical activity in the form of short pulses takes place. When this happens, the cell is said to have fired. This electrical activity of fixed strength and duration is sent down the axon. The dendrites serve as receptors for signals from adjacent neurons, whereas the axon's purpose is the transmission of the generated neural activity to other nerve cells or to muscle fibres. In the first case the term interneuron may be used, whereas the neuron in the latter case is called motor neuron. A third type of neuron, which receives information from muscles or sensory organs, such as the eye or ear, is called a receptor neuron. The complexity of the human central nervous system is due to the vast number of neurons and their mutual connections.
Performance comparison of Computer (ANN) & Biological Neural Network
A set of processing units, when assembled in a closely interconnected network, offers a surprisingly rich structure exhibiting some features of the BNN. This structure is an ANN. Since artificial neural networks are implemented on computers, it is worth comparing the processing capabilities of computers with those of biological neural networks.
1. Speed: Neural networks are slow in processing information. The cycle time corresponding to execution of one step of a program in a computer is in the range of a few nanoseconds, whereas the cycle time corresponding to a neural event prompted by an external stimulus is in the milliseconds range. Thus computers process information about a million times faster.
2. Processing: Neural networks perform massively parallel operations. Most programs in a conventional computer operate in a serial mode, one instruction after another, whereas the brain operates with massively parallel programs that have comparatively fewer steps.
Performance comparison of Computer (ANN) & Biological Neural Network
3. Size and complexity: Neural networks have large numbers of computing elements, and the computing is not restricted to within neurons. It is this size and complexity of connections that may give the brain the power of performing complex PR tasks which we are unable to realize on a computer. The conventional computer typically has one central processing unit where all the computing takes place.
4. Storage: Neural networks store information in the strengths of the interconnections, whereas in a computer information is stored in memory addressed by its location. New information is added by adjusting the interconnection strengths without completely destroying the old information, whereas in a computer information is strictly replaceable.
Performance comparison of Computer (ANN) & Biological Neural Network
5. Fault tolerance: Neural networks distribute the encoded information throughout the network, and hence they exhibit fault tolerance. In contrast, computers are inherently not fault tolerant, in the sense that information corrupted in memory cannot be retrieved.
6. Control mechanism: There is no central control in processing information in the brain; thus there is no specific control mechanism external to the computing task. In a computer, on the other hand, a control unit monitors all the activities of computing.
Performance comparison of Computer (ANN) & Biological Neural Network
While the superiority of the human information processing system over the conventional computer for pattern recognition tasks stems from the basic structure and operation of the biological neural network, it is possible to realize some of the features of the human system using an artificial neural network consisting of basic computing elements. In particular, it is possible to show that such a network exhibits parallel and distributed processing capability. In addition, information can be stored in a distributed manner in the connection strengths so as to achieve fault tolerance.
Artificial Neural Networks: Terminology
Processing unit: An artificial neural network (ANN) is a highly simplified model of the structure of the biological neural network. An ANN consists of interconnected processing units. The general model of a processing unit consists of a summing part followed by an output part. The summing part receives n input values, weighs each value, and performs a weighted sum. The weighted sum is called the activation value. The sign of the weight for each input determines whether the input is excitatory (positive weight) or inhibitory (negative weight). The inputs could be discrete or continuous data values, and likewise the outputs could be discrete or continuous. The input and output may also be viewed as deterministic, stochastic or fuzzy, depending on the nature of the problem and its solution.
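As a small sketch of the summing part described above (the function name and values are illustrative, not from the slides), the activation value is just a weighted sum of the inputs:

```python
def activation(inputs, weights):
    # Weighted sum of n input values; the sign of each weight marks the
    # input as excitatory (positive) or inhibitory (negative).
    return sum(w * a for w, a in zip(weights, inputs))

# Example: two excitatory inputs and one inhibitory input.
x = activation([1.0, 0.5, 1.0], [0.4, 0.3, -0.2])  # 0.4 + 0.15 - 0.2
```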
Artificial Neural Networks: Terminology
Interconnections: In an artificial neural network several processing units are interconnected according to some topology to accomplish a pattern recognition task. Therefore the inputs to a processing unit may come from outputs of other processing units, and/or from an external source. The output of each unit may be given to several units including itself. The amount of the output of one unit received by another unit depends on the strength of the connection between the units, and it is reflected in the weight value associated with the connecting link. If there are N units in a given ANN then at any instant of time each unit will have a unique activation value and a unique output value.
Artificial Neural Networks: Terminology
Interconnections: The set of the N activation values of the network defines the activation state of the network at that instant. Likewise, the set of the N output values of the network defines the output state of the network at that instant. Depending on the discrete or continuous nature of the activation and output values, the state of the network can be described by a point in a discrete or continuous N-dimensional space.
Artificial Neural Networks: Terminology
Operations: Each unit of an ANN receives inputs from other connected units and/or from an external source. A weighted sum of the inputs is computed at a given instant of time. The resulting activation value determines the actual output from the output function unit, i.e., the output state of the unit. The output values and other external inputs in turn determine the activation and output states of the other units. The activation values of the units (activation state) of the network as a function of time are referred to as activation dynamics. The activation dynamics also determine the dynamics of the output state of the network.
Artificial Neural Networks: Terminology
The set of all activation states defines the state space of the network. The set of all output states defines the output or signal state space of the network. Activation dynamics determines the trajectory of the path of the states in the state space of the network. For a given network, defined by the units and their interconnections with appropriate weights, the activation states refer to the short term memory function of the network. Generally the activation dynamics is followed to recall a pattern stored in the network. In order to store a pattern in a network, it is necessary to adjust the weights of the network. The set of all weight values (corresponding to the strengths of all connecting links of an ANN) defines the weight space.
Artificial Neural Networks: Terminology
If the weights are changing, then the set of weight values as a function of time defines the synaptic dynamics of the network. Synaptic dynamics is followed to adjust the weights in order to store given patterns in the network. The process of adjusting the weights is referred to as learning. Once the learning process is completed, the final set of weight values corresponds to the long term memory function of the network. The procedure to incrementally update each of the weights is called a learning law or learning algorithm.
Artificial Neural Networks: Terminology
Update : In implementation, there are several options available for both activation and synaptic dynamics. In particular, the updating of the output states of all units could be performed synchronously. In this case, the activation values of all units are computed at the same time assuming a given output state throughout. From these activation values the new output state of the network is derived. In an asynchronous update, on the other hand, each unit is updated sequentially, taking the current output state of the network into account each time. For each unit, the output state can be determined from the activation value either deterministically or stochastically. In practice, the activation dynamics, including the update, is much more complex in a biological neural network. The ANN models along with the equations governing the activation and synaptic dynamics are developed according to the complexity of the PR task to be handled.
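The synchronous/asynchronous distinction above can be sketched for a small network of binary threshold units. The network, weight matrix and step function here are illustrative assumptions, not from the slides:

```python
def step(x):
    # Deterministic binary output from an activation value.
    return 1 if x > 0 else 0

def synchronous_update(state, W):
    # All units are computed from the SAME previous output state.
    n = len(state)
    return [step(sum(W[i][j] * state[j] for j in range(n))) for i in range(n)]

def asynchronous_update(state, W):
    # Units are updated one at a time; later units see earlier new values.
    state = list(state)
    n = len(state)
    for i in range(n):
        state[i] = step(sum(W[i][j] * state[j] for j in range(n)))
    return state

# Two mutually connected units: the two schemes can give different states.
W = [[0, 1], [1, 0]]
```

With `state = [1, 0]`, the synchronous update yields `[0, 1]` while the asynchronous one yields `[0, 0]`, illustrating why the choice of update scheme matters for the dynamics.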
Models of a Neuron
1. McCulloch-Pitts (MP) model: In the McCulloch-Pitts (MP) model the activation (x) is given by a weighted sum of the M input signal values (a_i) and a bias term (θ).
Models of a Neuron
McCulloch-Pitts (MP) model: The output signal (s) is typically a linear or nonlinear function of the activation value. Three common nonlinear functions (binary, ramp and sigmoid) are shown in the figure, although the binary function was used in the original MP model. Figure: (a) binary, (b) ramp, (c) sigmoid.
McCulloch-Pitts (MP) model: The following equations describe the operation of an MP model:
Activation: x = ∑ w_i a_i − θ, i = 1 to M
Output signal: s = f(x)
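A minimal sketch of the MP unit in code, assuming the binary output convention s = 1 when x >= 0 (the exact firing convention at x = 0 is not fixed by the equations above):

```python
def mp_neuron(a, w, theta):
    # Activation: x = sum_i w_i * a_i - theta
    x = sum(wi * ai for wi, ai in zip(w, a)) - theta
    # Binary output function: fire (1) when activation reaches the threshold.
    return 1 if x >= 0 else 0
```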
Models of a Neuron
When the threshold function is used as the neuron output function, and binary input values 0 and 1 are assumed, the basic Boolean functions AND, OR and NOT of two variables can be implemented by choosing appropriate weights and threshold values.
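One hedged choice of weights and thresholds realizing AND, OR and NOT with an MP unit (other choices work equally well; the firing convention x >= 0 is an assumption):

```python
def mp(a, w, theta):
    # MP unit: fire when the weighted sum reaches the threshold.
    return 1 if sum(wi * ai for wi, ai in zip(w, a)) - theta >= 0 else 0

# AND fires only when both inputs contribute; OR when at least one does;
# NOT inverts a single input via a negative weight and zero threshold.
def AND(a1, a2): return mp([a1, a2], [1, 1], theta=2)
def OR(a1, a2):  return mp([a1, a2], [1, 1], theta=1)
def NOT(a1):     return mp([a1], [-1], theta=0)
```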
Problems on MP Model
Learning Laws
A neural network learns about its environment through an interactive process of adjustments applied to its synaptic weights. Learning is the process by which the free parameters of a neural network get adapted through a process of stimulation. The type of learning is determined by the manner in which the parameter changes take place. A set of well defined rules for the solution of a learning problem is called a learning algorithm. Learning algorithms differ from one another in the way in which the adjustments to the synaptic weights of a neuron are formulated.
Learning Laws The learning or weight changes could be supervised or unsupervised. In supervised learning the weight changes are determined by the difference between the desired output and the actual output. Some of the supervised learning laws are: error correction learning or delta rule and stochastic learning. Supervised learning may be used for structural learning or for temporal learning. Structural learning is concerned with capturing in the weights the relationship between a given input-output pattern pair. Temporal learning is concerned with capturing in the weights the relationship between neighboring patterns in a sequence of patterns. Unsupervised learning discovers features in a given set of patterns and organizes the patterns accordingly. There is no externally specified desired output as in the case of supervised learning. Examples of unsupervised learning laws are: Hebbian learning and competitive learning.
Learning Laws
Unsupervised learning uses mostly local information to update the weights. The local information consists of the signal or activation values of the units at either end of the connection for which the weight update is being made. Learning methods can be grouped into off-line and on-line. In off-line learning all the given patterns are used, maybe several times if needed, to adjust the weights; most error correction learning laws belong to the off-line category. In on-line learning each new pattern or set of patterns can be incorporated into the network without loss of the prior stored information, so on-line learning allows the neural network to add new information continuously. Off-line learning provides superior solutions, because information is extracted when all the training patterns are available, whereas on-line learning has access to the past patterns only through the information already stored in the weights.
Learning Laws Stability and convergence:
The activation state of each unit at each stage is computed in terms of the state of the network in the previous stage. The state update at each stage can be made asynchronously, i.e., each unit is updated using the newest available state, or synchronously, i.e., all the units are updated using the same previous state. These implementation choices bear on the stability of the equilibrium activation states of a feedback neural network, and on the convergence of the synaptic weights while minimizing the error between the desired output and the actual output during learning. In general, there are no standard methods to determine whether the network activation dynamics or synaptic dynamics leads to stability or convergence.
Basic Learning Laws 1. Hebbian Learning Rule:
Hebb's learning rule is the oldest and most famous. Rule: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." This learning can also be called correlational learning. It is also stated as:
1. If two neurons on either side of a synapse are activated simultaneously, then the strength of that synapse is selectively increased.
2. If two neurons on either side of a synapse are activated asynchronously, then the strength of that synapse is selectively weakened.
Basic Learning Laws
The simplest form of the rule is: Δw_i = x_i y
It represents feed-forward, unsupervised learning. If the product of output and input is positive, the weight increases; otherwise the weight decreases.
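A sketch of the Hebbian update in code. A learning-rate factor η is commonly included; the form on the slide corresponds to η = 1:

```python
def hebb_update(w, x, y, eta=1.0):
    # Hebbian rule: delta_w_i = eta * x_i * y.
    # Input and output with the same sign strengthen the weight;
    # opposite signs weaken it.
    return [wi + eta * xi * y for wi, xi in zip(w, x)]

w = hebb_update([0.0, 0.0], [1.0, -1.0], y=1.0)  # first weight grows, second shrinks
```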
Basic Learning Laws
2. Perceptron Learning Rule: The learning signal is the difference between the desired and actual response of the neuron. This type of learning is supervised. Consider a finite number N of input training vectors x(n), n = 1 to N, each with an associated target (desired) value t(n), where the target is either +1 or -1. The output y is obtained by applying the activation function to the net input:
y = f(y_in) = 1 if y_in > θ; 0 if -θ <= y_in <= θ; -1 if y_in < -θ
where y_in = ∑ x_i w_i. The weight update is given by:
If y ≠ t, then w(new) = w(old) + η t x, with 0 < η <= 1; else, w(new) = w(old).
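A sketch of this rule in code, with the three-valued threshold output and an update only when the output y differs from the target t (function names and the example values are illustrative):

```python
def perceptron_output(x, w, theta):
    # Three-valued threshold activation over the net input y_in.
    yin = sum(wi * xi for wi, xi in zip(w, x))
    if yin > theta:
        return 1
    if yin < -theta:
        return -1
    return 0

def perceptron_update(w, x, t, theta, eta=0.1):
    # w(new) = w(old) + eta * t * x, applied only on a wrong output.
    y = perceptron_output(x, w, theta)
    if y != t:
        w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w
```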
Basic Learning Laws
3. Delta Learning Rule (Widrow-Hoff or Least Mean Square): The adjustment made to a synaptic weight of a neuron is proportional to the product of the error signal and the input signal of the synapse. The aim of the delta rule is to minimize the error over all training patterns. The delta rule is given by:
Δw_i = α (t − y) x_i
where x is the vector of activations of the input units, y is the output (a function of the activation ∑ x_i w_i), t is the target output, and α is the learning rate parameter. Widrow-Hoff Least Mean Square is a special case of the delta rule in which the output function is linear, i.e., y = f(x) = x.
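The delta update sketched for the linear (Widrow-Hoff) special case, where the output is simply the weighted sum:

```python
def delta_update(w, x, t, alpha=0.1):
    # Linear output y = sum_i x_i * w_i (the LMS special case).
    y = sum(wi * xi for wi, xi in zip(w, x))
    # Delta rule: delta_w_i = alpha * (t - y) * x_i.
    return [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
```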
Models of a Neuron
2. Perceptron Model: Frank Rosenblatt, and later Minsky and Papert, developed the learning rule for a large class of ANNs called perceptrons. The perceptron learning rule uses iterative weight adjustment, which is more powerful than the Hebb rule. Perceptrons use a threshold output function and the MP model of a neuron. Their iterative learning converges to correct weights, that is, weights that produce the exact output value for each training input pattern. Training of a perceptron continues until no error occurs; such a net can thus be used to learn a classification.
Models of a Neuron
2. Perceptron Model: The main deviation from the MP model is that here learning (i.e., adjustment of weights) is incorporated in the operation of the unit. The target output (b) is compared with the actual binary output (s), and the error is used to adjust the weights. The following equations describe the operation of the perceptron model of a neuron:
Activation: x = ∑ a_i w_i, i = 1 to n
Output signal: s = f(x)
Error: δ = b − s
Weight update: Δw_i = η δ a_i
where η is called the learning rate parameter.
Models of a Neuron
3. ADALINE Model (ADAptive LINear Element): The main distinction between Rosenblatt's perceptron model and Widrow's adaline model is that in the adaline model the analog activation value (x) is compared with the target output (b); in other words, the output is a linear function of the activation value. The equations that describe the operation of an adaline are as follows:
Activation: x = ∑ a_i w_i, i = 1 to n
Output signal: s = f(x) = x
Error: δ = b − s = b − x
Weight update: Δw_i = η δ a_i
where η is called the learning rate parameter.
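A sketch of adaline training under these equations: the error is computed against the analog activation, not a thresholded output, so repeated passes perform least-mean-square fitting. The training loop, sample data and function name are illustrative assumptions:

```python
def adaline_train(samples, w, eta=0.1, epochs=100):
    # samples: list of (input vector a, target b) pairs.
    for _ in range(epochs):
        for a, b in samples:
            x = sum(wi * ai for wi, ai in zip(w, a))   # analog activation
            delta = b - x                              # error against target
            w = [wi + eta * delta * ai for wi, ai in zip(w, a)]
    return w

# Learning the linear relation b = 2 * a from two samples.
w = adaline_train([([1.0], 2.0), ([2.0], 4.0)], [0.0])
```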
Topology of ANN Models
Artificial neural networks are useful only when the processing units are organized in a suitable manner to accomplish a given pattern recognition task. The arrangement of the processing units, connections, and pattern input/output is referred to as the topology. Artificial neural networks are normally organized into layers of processing units. Connections can be made either between units of different layers (interlayer connections), or between units within a layer (intralayer connections), or both. The connections among the layers and among the units within a layer can be organized either in a feedforward manner or in a feedback manner; in a feedback network the same processing unit may be visited more than once. Consider two layers F1 and F2 with N and M processing units, respectively. Providing connections to the jth unit in F2 from all the units in F1 gives the instar structure (fan-in geometry), and providing connections from the jth unit in F2 to all the units in F1 gives the outstar structure (fan-out geometry).
Figure: instar (fan-in) and outstar (fan-out) connection structures.
Topology of ANN Models
When all the connections between units in F1 and F2 are made, as in figure c, we obtain a heteroassociation network. This network can be viewed as a group of instars if the flow is from F1 to F2; if the flow is from F2 to F1, it can be viewed as a group of outstars (figure d). When the flow is bidirectional and the weights are symmetric (w_ij = w_ji), we get a bidirectional associative memory (figure e), where either of the layers can be used as input/output. If the two layers F1 and F2 coincide, we obtain an autoassociative memory in which each unit is connected to every other unit and to itself (figure f).
Basic Functional Units for Pattern Recognition Tasks
There are three types of artificial neural networks: feedforward, feedback, and a combination of both. The simplest networks of each of these types form the basic functional units. They are functional because they can by themselves perform some simple pattern recognition tasks; they are basic because they form the building blocks for developing neural network architectures for complex pattern recognition tasks. The simplest FFNN is a two-layer network with M input units and N output units.
Two Layer Feedforward Neural Network (FFNN)
Functional Units and Pattern Recognition Tasks
Feedforward ANN: pattern association; pattern classification; pattern mapping/classification.
Feedback ANN: autoassociation; pattern storage (LTM); pattern environment storage (LTM).
Feedforward and feedback (competitive learning) ANN: pattern storage (STM); pattern clustering; feature map.
Pattern Recognition Tasks by FFNN
Pattern association
- Architecture: two layers, linear processing, single set of weights
- Learning: Hebb's rule (orthogonal patterns), delta rule (linearly independent patterns)
- Recall: direct
- Limitation: linear independence; number of patterns restricted to input dimensionality
- To overcome: nonlinear processing units, leading to a pattern classification problem

Pattern classification
- Architecture: two layers, nonlinear processing units, geometrical interpretation
- Learning: perceptron learning
- Limitation: linearly separable functions only; cannot handle hard problems
- To overcome: more layers, leading to a hard learning problem

Pattern mapping/classification
- Architecture: multilayer (hidden layers), nonlinear processing units, geometrical interpretation
- Learning: generalized delta rule (backpropagation)
- Limitation: slow learning; does not guarantee convergence
- To overcome: more complex architectures
Unit 2 Perceptron Learning
Perceptron Learning: Supervised
Training and test data sets: in the training set, both the input and the target are specified.
Perceptron Networks
One type of NN is based on the perceptron. A perceptron computes a weighted sum of its inputs; if the sum is greater than a certain threshold, its output is 1, otherwise -1. A linear threshold unit with inputs x_1, ..., x_n, weights w_1, ..., w_n and bias weight w_0 computes:
o = f(x) = 1 if ∑ w_i x_i > 0 (sum over i = 0 to n), -1 otherwise
Figure: a linear threshold unit.
Perceptron Learning
w_i(new) = w_i(old) + Δw_i, where Δw_i = η (t − o) x_i
and t = c(x) is the target (desired) value, o is the perceptron output (actual), and η is a small constant (e.g., 0.1) called the learning rate.
If the output is correct (t = o), the weights w_i are not changed. If the output is incorrect (t ≠ o), the weights w_i are changed such that the output of the perceptron for the new weights is closer to t. The algorithm converges to the correct classification if the training data are linearly separable and η is sufficiently small.
Perceptron Learning
Terminology used in training:
Epoch: one presentation of the entire training set to the neural network. In the case of the AND function, an epoch consists of four sets of inputs being presented to the network (i.e., [0,0], [0,1], [1,0], [1,1]).
Error: the amount by which the value output by the network differs from the target value. For example, if we required the network to output 0 and it outputs 1, then Error = -1.
Target value, T: when training a network we present it not only with the input but also with the value we require the network to produce. For example, if we present the network with [1,1] for the AND function, the training value will be 1.
Output, O: the output value from the neuron.
I_j: the inputs being presented to the neuron.
W_j: the weight from input neuron I_j to the output neuron.
LR: the learning rate, which dictates how quickly the network converges. It is set by experimentation, typically 0.1.
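As a hedged worked example of the iteration described above: training a single perceptron (step output with a bias weight) on the AND function. LR = 1 is used here so the arithmetic stays exact; with zero initial weights the slide's LR = 0.1 produces the same decisions, just scaled weights:

```python
def train_and(epochs=25, lr=1):
    w, bias = [0, 0], 0
    data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    for _ in range(epochs):  # each pass over data is one epoch
        for x, t in data:
            o = 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0
            err = t - o                      # Error = T - O as defined above
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            bias += lr * err
    return w, bias
```

After training, the learned weights classify all four AND patterns correctly.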
Multi-Layer Perceptron
Figure: a multilayer perceptron, with input signals (external stimuli) entering the input layer, adjustable weights between layers, and output values leaving the output layer.
Multi-Layer Perceptron
The input layer: introduces the input values into the network; no activation function or other processing is performed here.
The hidden layer(s): perform classification of features; two hidden layers are sufficient to solve any problem, though the features present may imply that more layers are better.
The output layer: functionally just like the hidden layers; its outputs are passed on to the world outside the neural network.
Pattern classification network: Perceptron
A two-layer feedforward network with nonlinear (hard-limiting) output functions for the units in the output layer can be used to perform the task of pattern classification. The number of units in the input layer corresponds to the dimensionality of the input pattern vectors. The units in the input layer are all linear, as the input layer merely serves to fan out the input to each of the output units. The number of output units depends on the number of distinct classes in the pattern classification task; we assume here that the output units are binary. Each output unit is connected to all the input units, and a weight is associated with each connection.
Pattern classification network: Perceptron
Since the output function of a unit is a hard-limiting threshold function, for a given set of input-output patterns the weighted sum of the input values is compared with the threshold of the unit to determine whether the sum is greater or less than the threshold. Thus there is no unique solution for the weights in this case, unlike in the case of the linear associative network; it is necessary to determine a set of weights satisfying all the inequalities. Determination of such weights is usually accomplished by incremental adjustment of the weights using a learning law. Typically, if the weighted sum of the input values to the output unit exceeds the threshold, the output signal is labelled 1, otherwise 0. Multiple binary output units are needed if the number of pattern classes exceeds 2.
65
Pattern classification network: Perceptron
Pattern classification problem: If a subset of the input patterns belongs to one class (say class A1) and the remaining input patterns belong to another class (say class A2), then the objective in a pattern classification problem is to determine a set of weights w1, w2, ..., wM such that, for a = (a1, a2, ..., aM)^T:
if ∑ ai wi > θ (i = 1 to M), then a belongs to class A1,
if ∑ ai wi <= θ (i = 1 to M), then a belongs to class A2.
Note that the dividing surface between the two classes is given by ∑ ai wi = θ.
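The decision rule above is a single dot product compared against the threshold. A minimal sketch, with hypothetical weights and threshold for a 2-dimensional input:

```python
import numpy as np

def classify(a, w, theta):
    """Perceptron decision rule: class A1 if sum(a_i * w_i) > theta, else A2."""
    return "A1" if np.dot(a, w) > theta else "A2"

# Hypothetical weights and threshold (not from the slides).
w = np.array([1.0, 1.0])
theta = 0.5

r1 = classify(np.array([1.0, 0.0]), w, theta)  # weighted sum 1.0 > 0.5
r2 = classify(np.array([0.1, 0.1]), w, theta)  # weighted sum 0.2 <= 0.5
```

Points with weighted sum exactly equal to θ lie on the dividing surface ∑ ai wi = θ and are assigned to A2 by this rule.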
66
Pattern classification network: Perceptron
Perceptron learning law: In the above perceptron classification problem, the input space is M-dimensional and the number of output patterns is two, corresponding to the two classes. Suppose the subsets A1 and A2 of points in the M-dimensional space contain the sample patterns belonging to the classes A1 and A2, respectively. The objective in perceptron learning is to systematically adjust the weights for each presentation of an input vector belonging to A1 or A2, along with its class identification. The perceptron learning law for the two-class problem may be stated as follows:
w(m + 1) = w(m) + ηa, if a ϵ A1 and wT(m)a <= 0
w(m + 1) = w(m) - ηa, if a ϵ A2 and wT(m)a > 0
Here a and w(m) are the input and weight vectors, respectively, at the mth step, and η is a positive learning rate parameter. It can vary at each learning step, although it is assumed constant in perceptron learning.
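The learning law can be sketched as a training loop. This is a minimal illustration on hypothetical sample data; the threshold is absorbed into the weight vector by appending a constant 1 to every input, and the weights start at zero for reproducibility (the slides note they could be random):

```python
import numpy as np

def perceptron_train(A1, A2, eta=1.0, epochs=100):
    """Perceptron learning law for two classes.

    The threshold is absorbed into the weights via an appended bias input of 1,
    so the decision rule becomes w.a > 0 for A1 and w.a <= 0 for A2.
    """
    X = np.vstack([np.column_stack([A1, np.ones(len(A1))]),
                   np.column_stack([A2, np.ones(len(A2))])])
    labels = np.array([1] * len(A1) + [-1] * len(A2))
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        changed = False
        for a, cls in zip(X, labels):
            s = w @ a
            if cls == 1 and s <= 0:      # a in A1 misclassified: add eta * a
                w = w + eta * a
                changed = True
            elif cls == -1 and s > 0:    # a in A2 misclassified: subtract eta * a
                w = w - eta * a
                changed = True
        if not changed:                  # every sample correctly classified
            break
    return w

# Hypothetical linearly separable sample patterns.
A1 = np.array([[2.0, 2.0], [3.0, 1.0]])
A2 = np.array([[-1.0, -1.0], [-2.0, 0.0]])
w = perceptron_train(A1, A2)
```

For linearly separable classes the loop stops once a full pass makes no correction, which is exactly the "no adjustment when correctly classified" condition stated next.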
67
Pattern classification network: Perceptron
Perceptron learning law: linearly separable classes. No adjustment of the weights is made when the input vector is correctly classified. That is,
w(m + 1) = w(m), if a ϵ A1 and wT(m)a > 0
w(m + 1) = w(m), if a ϵ A2 and wT(m)a <= 0
The initial value of the weight vector w(0) could be random. The figure shows the decision boundaries formed during perceptron learning for linearly separable classes.
68
Geometric Interpretation of Perceptron Learning
69
Pattern Recognition Tasks by FFNN
Pattern association problem: The input patterns are denoted a1, a2, ..., and the corresponding output patterns b1, b2, .... The objective of designing a neural network is to capture the association between the input-output pattern pairs in the given set of training data, so that when any of the inputs al is given, the corresponding output bl is retrieved. [Figure: association from the input pattern space to the output pattern space.]
70
Pattern Recognition Tasks by FFNN
Pattern association problem: An example of a pattern association problem is associating a unique binary code with each printed alphabet character, say [0 0 0 0 0]^T for A and [0 0 0 0 1]^T for B.
71
Pattern Recognition Tasks by FFNN
Pattern association problem: The input patterns A, B, etc. could be represented as black and white pixels in a grid of size, say, 16 x 16 points. Then the input pattern space is a binary 256-dimensional space, and the output pattern space is a binary 5-dimensional space. Noisy versions of the input patterns are obtained when some of the pixels in the grid containing a character are transformed from black to white or vice versa.
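The pixel-flipping noise described above can be sketched directly. The grid size (16 x 16, flattened to 256 dimensions) follows the slide; the random character pattern and the flip count are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_pixel_noise(pattern, n_flips):
    """Flip n_flips distinct randomly chosen pixels (0 <-> 1) in a binary
    pattern, modelling black-to-white (and vice versa) transformations."""
    noisy = pattern.copy()
    idx = rng.choice(pattern.size, size=n_flips, replace=False)
    noisy[idx] ^= 1  # XOR with 1 toggles each selected binary pixel
    return noisy

# A hypothetical 16 x 16 binary character grid, flattened to 256 dimensions.
clean = rng.integers(0, 2, size=256)
noisy = add_pixel_noise(clean, n_flips=10)
```

Sampling indices without replacement guarantees that exactly `n_flips` pixels differ between the clean and noisy patterns.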
72
Pattern Recognition Tasks by FFNN
Pattern classification: In the pattern association problem if a group of input patterns correspond to the same output pattern, then typically there will be far fewer output patterns compared to the number of input patterns. In other words, if some of the output patterns in the pattern association problem are identical, then the number of distinct output patterns can be viewed as class labels, and the input patterns corresponding to each class can be viewed as samples of that class. The problem then becomes a pattern classification problem. In this case whenever a pattern belonging to a class is given as input, the network identifies the class label. During training, only a few samples of patterns for each class are given. In testing, the input pattern is usually different from the patterns used in the training set for the class.
73
Pattern Recognition Tasks by FFNN
Pattern classification can be done using the perceptron learning rule. [Figure: classification from the input pattern space to the output pattern space.]
74
Pattern Recognition Tasks by FFNN
Pattern classification problem, an example: labelling hand-printed characters within a specified grid with the corresponding printed character. Note that the printed-character patterns are unique and fixed in number, and serve as class labels. These labels could be a unique 5-bit code.
75
Pattern Recognition Tasks by FFNN
Pattern mapping: Given a set of input-output pattern pairs as in the pattern association problem, if the objective is to capture the implied mapping, instead of the association, then the problem becomes a pattern mapping problem. In a pattern mapping problem both the input and the output patterns are only samples from the mapping system. Once the system behaviour is captured by the network, the network produces a possible output pattern for a new input pattern not used in the training set. The possible output pattern is approximately an interpolated version of the output patterns corresponding to the input training patterns close to the given test input pattern. Thus the network displays an interpolative behaviour. The pattern mapping problem is the most general case, from which the pattern classification and pattern association problems can be derived as special cases.
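The interpolative behaviour described above can be illustrated without a network at all: fit a model to sampled input-output pairs of an unknown system, then query it at an input not in the training set. The quadratic "system" and the least-squares stand-in for the network are assumptions for illustration:

```python
import numpy as np

# Samples from an unknown system y = f(x); here f(x) = x^2 is hypothetical.
x_train = np.linspace(0.0, 1.0, 6)
y_train = x_train ** 2

# Fit a linear-in-parameters model (polynomial features + least squares)
# as a simple stand-in for a network capturing the mapping.
X = np.column_stack([np.ones_like(x_train), x_train, x_train ** 2])
coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)

# Interpolative behaviour: predict at a test input absent from the training set.
x_test = 0.35
y_pred = coef @ np.array([1.0, x_test, x_test ** 2])
```

Because the test input lies between training samples, the prediction is effectively an interpolation of the nearby training outputs, which is the behaviour the slide attributes to the network.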
76
Pattern Recognition Tasks by FFNN
Pattern mapping: An example of the data for a pattern mapping problem could be the input data given to a complex physical system and the corresponding output data from the system over a number of trials. The objective is to capture the unknown system behaviour from the samples of input-output pair data. [Figure: mapping from the input pattern space to the output pattern space.]
77
References
1. B. Yegnanarayana, Artificial Neural Networks.
2. F. Rosenblatt, "The perceptron: A probabilistic model for information storage and organization in the brain", Psychological Review, vol. 65, 1958.
3. M. L. Minsky and S. A. Papert, Perceptrons, expanded ed., Cambridge, MA: MIT Press, 1990.