2 General Concepts
Neuron: the cell that performs information processing in the brain. It is the fundamental functional unit of all nervous system tissue, including the brain.
Soma: the neuron's cell body.
Dendrites: a collection of fibers branching out from the soma.
Axon: a single long fiber extending from the soma. Eventually, the axon also branches into strands and sub-strands that connect to the dendrites and cell bodies of other neurons.
Synapse: the junction where strands from two neurons connect.
3 Neural Networks
A neural network is composed of a number of nodes, or units, connected by links. Each link has a numeric weight associated with it.
Weights are the primary means of long-term storage in neural networks, and learning usually takes place by updating the weights.
Each unit has a set of input links from other units, a set of output links to other units, a current activation level, and a means of computing the activation level at the next step in time, given its inputs and weights.
4 Neural Networks
To build a neural network to perform some task, one must first decide how many units are to be used, what kind of units, and how the units are connected to form a network.
One then initializes the weights of the network, and "trains" the weights using a learning algorithm applied to a set of training examples for the task.
The use of examples also implies that one must decide how to encode the examples in terms of inputs and outputs of the network.
6 Simple Computing Elements
Each unit performs a simple computation: it receives signals from its input links and computes a new activation level that it sends along each of its output links.
The computation of the activation level is based on the value of each input signal received from a neighboring node and on the weight of each input link.
The computation is split into two components. The first is a linear function, $in_i$, that computes the weighted sum of the unit's input values. The second is a nonlinear component, the activation function $g$, that transforms the weighted sum into the final value that serves as the unit's activation value $a_i$.
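The two-component computation described above can be sketched as follows. This is only an illustration: the function names (`weighted_sum`, `unit_activation`), the example weights, and the choice of a sigmoid for $g$ are assumptions, not part of the slides.

```python
import math

def weighted_sum(weights, inputs):
    """Linear component in_i: the weighted sum of the unit's input values."""
    return sum(w * a for w, a in zip(weights, inputs))

def unit_activation(weights, inputs, g):
    """Nonlinear component: a_i = g(in_i)."""
    return g(weighted_sum(weights, inputs))

def sigmoid(x):
    """One possible choice for the activation function g (assumed here)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical unit with three input links.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 0.5, 0.25]

in_i = weighted_sum(weights, inputs)             # 0.5 - 0.5 + 0.5 = 0.5
a_i = unit_activation(weights, inputs, sigmoid)  # g(0.5)
```

Keeping `g` as a parameter mirrors the split in the text: the linear part is the same for every unit, and only the activation function changes between models.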
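7 Models for Activation Functions
Different models are obtained by using different mathematical functions for $g$. Three common choices are the step, sign, and sigmoid functions.
[Figure: three panels, Step (threshold t), Sign (range -1 to +1), and Sigmoid, each plotting the activation (up to +1) against the input $in_i$.]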
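The three activation functions can be written down directly. The sketch below uses the usual textbook definitions, which the slide shows only as plots; the treatment of the boundary cases (`>=` at the threshold and at zero) is an assumption.

```python
import math

def step(x, t=0.0):
    """Step function with threshold t: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign function: +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```

Unlike the step and sign functions, the sigmoid is differentiable everywhere, which matters for learning rules that use the derivative $g'$.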
8 Network Structures
There are a variety of kinds of network structure, each of which results in very different computational properties.
The main distinction is between feed-forward and recurrent networks.
In a feed-forward network, links are unidirectional and there are no cycles; in essence these networks are directed acyclic graphs (DAGs). In a recurrent network, the links can form arbitrary topologies.
Usually we deal with networks that are arranged in layers. In a layered feed-forward network, each unit is linked only to the units in the next layer; there are no links between units in the same layer, no links backward to a previous layer, and no links that skip a layer.
9 Fundamental Network Types
Hopfield networks: use bi-directional connections with symmetric weights. All of the units are both input and output units, the activation function $g$ is the sign function, and the activation levels can only be +1 or -1.
Boltzmann machines: also use symmetric weights, but include units that are neither input nor output units. They also use a stochastic activation function, such that the probability of the output being 1 is some function of the total weighted input.
Networks with no hidden units are called perceptrons.
Input units are directly connected to the external input sources. Output units are connected to the observed output. Hidden units are connected to neither the input sources nor the observed output.
Networks with one or more layers of hidden units are called multi-layer networks.
10 Perceptron Neural Network Learning
function NEURAL-NETWORK-LEARNING(examples) returns network
    network = a network with randomly assigned weights
    repeat
        for each e in examples do
            O = NEURAL-NETWORK-OUTPUT(network, e)
            T = the observed output values from e
            update the weights in network based on e, O, T
        end
    until all examples correctly predicted or stopping criterion is reached
    return network
Essentially, for a perceptron with learning rate $\alpha$ and inputs $I_j$:
$Err = T - O$
$W_j = W_j + \alpha \times I_j \times Err$
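A minimal sketch of this loop for a single perceptron with a step activation is shown below. Several details are assumptions made for illustration: the function names, the fixed bias input of 1 prepended to each example, the initial weight range, the `max_epochs` stopping criterion, and the logical AND training set.

```python
import random

def step(x, t=0.0):
    """Step activation: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def perceptron_output(weights, inputs):
    """O = g(sum_j W_j * I_j), with g the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)))

def perceptron_learning(examples, alpha=0.1, max_epochs=1000, seed=0):
    """Repeat the update W_j = W_j + alpha * I_j * Err over all examples."""
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(max_epochs):           # stopping criterion (assumed)
        all_correct = True
        for inputs, target in examples:
            err = target - perceptron_output(weights, inputs)
            if err != 0:
                all_correct = False
                weights = [w + alpha * x * err
                           for w, x in zip(weights, inputs)]
        if all_correct:                   # all examples correctly predicted
            break
    return weights

# Logical AND; the first input is a constant bias input of 1.
AND_EXAMPLES = [
    ((1, 0, 0), 0),
    ((1, 0, 1), 0),
    ((1, 1, 0), 0),
    ((1, 1, 1), 1),
]

and_weights = perceptron_learning(AND_EXAMPLES)
```

AND is linearly separable, so the loop stops once every example is classified correctly. For a function that is not linearly separable, such as XOR, a single perceptron would run until `max_epochs` without converging.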
11 Multi-Layer Feed-Forward Networks
Initial work dates to the 1950s.
Learning algorithms for multi-layer networks are neither efficient nor guaranteed to converge to a global optimum.
On the other hand, learning general functions from examples is an intractable problem in the worst case.
The most popular method for learning in multi-layer networks is called back-propagation.
12 Back Propagation Learning
Learning in multi-layer feed-forward networks using back-propagation proceeds the same way as for perceptrons: example inputs are presented to the network, and if the network computes an output vector that matches the target output, nothing is done. If there is an error, then the weights are adjusted to reduce the error.
The trick is to assess the blame for an error and divide it among the contributing weights. In perceptrons, this is easy because there is only one weight between each input and the output. But in multi-layer networks, there are many weights connecting each input to an output, and each of these weights contributes to more than one output.
The back-propagation algorithm is a sensible approach to dividing the contribution of each weight.
13 Back Propagation Learning
As in the perceptron learning algorithm, we try to minimize the error between each target output and the output value computed by the network.
At the output layer, the weight update rule is very similar to the rule for perceptrons. However, there are two differences: the activation of the hidden unit $a_j$ is used instead of the input value, and the rule contains a term for the gradient of the activation function.
If $Err_i = T_i - O_i$ is the error at output node $i$, then the weight update rule for the link from unit $j$ to unit $i$ is
$W_{j,i} = W_{j,i} + \alpha \times a_j \times Err_i \times g'(in_i)$
where $g'$ is the derivative of the activation function $g$. Defining $\Delta_i = Err_i \times g'(in_i)$, the rule can be rewritten as
$W_{j,i} = W_{j,i} + \alpha \times a_j \times \Delta_i$
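One way to see where the $g'(in_i)$ term comes from is to treat the update as a gradient-descent step. The derivation below is a sketch that assumes a sum-of-squares error $E$, which the slides do not state explicitly, with $O_i = g(in_i)$ and $in_i = \sum_j W_{j,i} \, a_j$:

$$
E = \frac{1}{2} \sum_i (T_i - O_i)^2
$$

$$
\frac{\partial E}{\partial W_{j,i}}
= -(T_i - O_i) \, g'(in_i) \, a_j
= -a_j \, \Delta_i
$$

Moving $W_{j,i}$ a distance $\alpha$ against this gradient gives $W_{j,i} + \alpha \times a_j \times \Delta_i$, which matches the update rule for the output layer.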
14 Back Propagation Learning
To update the connections between the input units and the hidden units, we need to define a quantity analogous to the error term for output nodes.
The idea is that hidden node $j$ is "responsible" for some fraction of the error $\Delta_i$ in each of the output nodes to which it connects. Thus, the $\Delta_i$ values are divided according to the strength of the connection between the hidden node and the output node, and propagated back to provide the $\Delta_j$ values for the hidden layer. The propagation rule for the $\Delta$ values is the following:
$\Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i$
Now the update rule for the weights between the inputs and the hidden layer is almost identical to the update rule for the output layer:
$W_{k,j} = W_{k,j} + \alpha \times I_k \times \Delta_j$
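The propagation rule can be checked with the chain rule. As a sketch, assume a sum-of-squares error $E = \frac{1}{2} \sum_i (T_i - O_i)^2$ (an assumption, not stated on the slides), with $O_i = g(in_i)$, $in_i = \sum_j W_{j,i} \, a_j$, $a_j = g(in_j)$, and $in_j = \sum_k W_{k,j} \, I_k$. Writing $\Delta_i = (T_i - O_i) \, g'(in_i)$ for the output nodes:

$$
\frac{\partial E}{\partial W_{k,j}}
= -\sum_i (T_i - O_i) \, g'(in_i) \, W_{j,i} \, g'(in_j) \, I_k
= -I_k \, g'(in_j) \sum_i W_{j,i} \, \Delta_i
= -I_k \, \Delta_j
$$

A step of size $\alpha$ against this gradient therefore adds $\alpha \times I_k \times \Delta_j$ to $W_{k,j}$.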
15 Back Propagation Learning
The learning algorithm can be summarized as follows:
Compute the $\Delta$ values for the output units using the observed error.
Starting with the output layer, repeat the following for each layer in the network, until the earliest (closest to input) hidden layer is reached:
    Propagate the $\Delta$ values back to the previous layer.
    Update the weights between the two layers.
16 Back Propagation Learning Algorithm
Algorithm Back-Prop-Update(network, examples, alpha) : new network weights
    repeat
        for each e in examples do
            O = Run-Network(network, I_e)
            Err_e = T_e - O
            W_j,i = W_j,i + alpha * a_j * Err_e,i * g'(in_i)
            for each subsequent layer in network do
                Delta_j = g'(in_j) * Sum_i (W_j,i * Delta_i)
                W_k,j = W_k,j + alpha * I_k * Delta_j
            end
        end
    until network has converged
Here Delta_i = Err_e,i * g'(in_i) at the output layer.
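The algorithm can be sketched in Python for a network with a single hidden layer. This is an illustration under stated assumptions, not a definitive implementation: the sigmoid activation, the weight layout (`W_in[k][j]` from input `k` to hidden unit `j`, `W_out[j][i]` from hidden unit `j` to output `i`), the helper names, and the fixed number of epochs in place of a convergence test are all choices made here. Both sets of $\Delta$ values are computed before any weight changes, so the hidden deltas use the weights that produced the output.

```python
import math

def g(x):
    """Sigmoid activation (assumed choice of g)."""
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    """Derivative of the sigmoid."""
    s = g(x)
    return s * (1.0 - s)

def run_network(W_in, W_out, inputs):
    """Forward pass; returns in_j, a_j, in_i and the outputs O_i."""
    n_hidden, n_out = len(W_out), len(W_out[0])
    in_hidden = [sum(W_in[k][j] * inputs[k] for k in range(len(inputs)))
                 for j in range(n_hidden)]
    a = [g(v) for v in in_hidden]
    in_out = [sum(W_out[j][i] * a[j] for j in range(n_hidden))
              for i in range(n_out)]
    return in_hidden, a, in_out, [g(v) for v in in_out]

def back_prop_update(W_in, W_out, inputs, targets, alpha):
    """One back-propagation step for one example; updates weights in place."""
    in_hidden, a, in_out, outputs = run_network(W_in, W_out, inputs)
    # Delta_i = Err_i * g'(in_i) at the output layer.
    delta_out = [(t - o) * g_prime(v)
                 for t, o, v in zip(targets, outputs, in_out)]
    # Delta_j = g'(in_j) * sum_i W_j,i * Delta_i at the hidden layer.
    delta_hidden = [g_prime(in_hidden[j]) *
                    sum(W_out[j][i] * delta_out[i]
                        for i in range(len(delta_out)))
                    for j in range(len(a))]
    for j in range(len(a)):               # W_j,i += alpha * a_j * Delta_i
        for i in range(len(delta_out)):
            W_out[j][i] += alpha * a[j] * delta_out[i]
    for k in range(len(inputs)):          # W_k,j += alpha * I_k * Delta_j
        for j in range(len(a)):
            W_in[k][j] += alpha * inputs[k] * delta_hidden[j]

def squared_error(W_in, W_out, inputs, targets):
    """Half the sum of squared errors for one example."""
    outputs = run_network(W_in, W_out, inputs)[3]
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

def train(W_in, W_out, examples, alpha=0.1, epochs=1000):
    """Fixed epoch count stands in for 'until network has converged'."""
    for _ in range(epochs):
        for inputs, targets in examples:
            back_prop_update(W_in, W_out, inputs, targets, alpha)
```

A caller would initialize `W_in` and `W_out` with small random values and pass a list of `(inputs, targets)` pairs to `train`. As the slides note, there is no guarantee that training reaches a global optimum.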
17 Discussion
Expressiveness: well suited for continuous inputs and outputs, but neural networks do not have the expressive power of general logical representations.
Computational efficiency: for m examples and |W| weights, each epoch takes O(m|W|) time. The worst-case number of epochs is exponential in the number of inputs.
Generalization: good at generalizing over continuous functions that vary smoothly with the input.
Sensitivity to noise: very tolerant of noise in the input data, since they essentially perform nonlinear regression.
Transparency: neural networks are essentially black boxes.
Prior knowledge: it is difficult to choose good training examples and the best network topology.