 # A BRIEF Introduction to Neural Networks

Luke Flemmer

History
Neural networks were first proposed by McCulloch and Pitts in 1943. Early models were binary, and were shown to be Turing complete for large enough networks. In 1947, McCulloch and Pitts also demonstrated how such networks could perform pattern recognition. Rosenblatt coined the term ‘Perceptron’. Criticism of two-layer networks by Minsky and Papert led to reduced interest in the field. The field re-emerged in the early 1980s, driven by more powerful hardware and by multi-layer networks.

A Biological Neuron

An Artificial Neuron
[Figure: an artificial neuron, labeled Input, Summation, Transfer Function and Activation Value.]
One or more inputs connect into the neuron. Each connection has a weight, which is analogous to the synaptic strength of the connection. The effect an input has on the neuron is determined by the original strength of the signal and by its weight. The neuron sums all of its weighted inputs and applies a transfer function to produce a single scalar output value. The transfer function can be as simple as y = x, i.e. just pass on the summed value. The output is then fed to other neurons.

A Simple Linear Network
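[Figure: a two-input linear network with both connection weights W = 0.5; inputs 4 and 12 give an output of 8, and inputs 6 and 4 give an output of 5.]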
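The neuron described above can be sketched in Python as a weighted sum followed by a transfer function. This is a minimal illustration rather than the presentation's own code: the names `identity` and `neuron` are hypothetical, and the default transfer function is the simple y = x. With both weights set to 0.5, inputs 4 and 12 sum to 8.

```python
def identity(x):
    """Simplest transfer function: pass the summed value on unchanged (y = x)."""
    return x


def neuron(inputs, weights, transfer=identity):
    """Weight each input by its connection strength, sum, then apply the transfer function."""
    summed = sum(x * w for x, w in zip(inputs, weights))
    return transfer(summed)


print(neuron([4, 12], [0.5, 0.5]))  # 8.0
print(neuron([6, 4], [0.5, 0.5]))  # 5.0
```

Passing a different function as `transfer` swaps in a non-linear neuron without changing the summation step.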

And Another…
[Figure: a second simple linear network, with connection weights W = 1 and W = 0 and node values of 1.]

The Basic Node Evaluation
Each input value is the product of an incoming node's activation value and the weight of its connection, and the node's net input is their sum: NetInput = ∑(incomingNodeValue * connectionStrength). Inputs can be inhibitory (negative weight). The simple model is called Linear Activation: OutputValue = InputValue. More complex (non-linear) functions can be used. Often the activation is gated with a threshold to generate a binary output, and a bias can be used to adjust the firing threshold.
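The node evaluation rules can be sketched as three small functions. The names `net_input`, `linear_activation` and `threshold_activation` are hypothetical, and treating the bias as an extra term added to the net input (so a positive bias makes the node fire more easily) is one common convention, assumed here.

```python
def net_input(values, weights, bias=0.0):
    """NetInput = sum(incomingNodeValue * connectionStrength), plus an optional bias term."""
    return bias + sum(v * w for v, w in zip(values, weights))


def linear_activation(net):
    """Linear Activation: OutputValue = InputValue."""
    return net


def threshold_activation(net, threshold=0.0):
    """Gate the activation: output 1 only if the net input exceeds the threshold."""
    return 1 if net > threshold else 0


values = [1, 1]
weights = [0.5, -1.0]  # the second connection is inhibitory
print(threshold_activation(net_input(values, weights)))            # 0 (net input -0.5)
print(threshold_activation(net_input(values, weights, bias=1.0)))  # 1 (net input 0.5)
```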

More Sophisticated Networks
Intermediate layers, called ‘hidden layers’, are introduced between the input and output layers. They must use a non-linear activation function; otherwise the intermediate nodes can be collapsed into a single set of weights in a two-layer network, rendering the hidden layer(s) useless. A popular choice is the ‘Logistic Activation Function’: ActivationValue = 1 / (1 + e ^ (-1 * netInput)), where netInput is the sum of weighted inputs, as used previously.
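The sketch below illustrates both points: the logistic function, and why a hidden layer with a purely linear activation adds nothing. Two linear layers applied in sequence give the same result as one layer whose weight matrix is the product of the two. The weight values are arbitrary numbers chosen for illustration, and plain Python lists are used to keep the example dependency-free.

```python
import math


def logistic(net_input):
    """ActivationValue = 1 / (1 + e ^ (-1 * netInput))"""
    return 1.0 / (1.0 + math.exp(-net_input))


def matvec(m, v):
    """Weighted sums: one row of weights per receiving node."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Combine two weight matrices into one: (a @ b)[i][j] = sum_k a[i][k] * b[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


w_in_hidden = [[1, 2], [3, -1], [0, 1]]  # 3 hidden nodes, 2 inputs (arbitrary values)
w_hidden_out = [[2, -1, 4]]              # 1 output node, 3 hidden nodes
x = [5, 7]

# With a linear hidden layer, the two layers of weights reduce to a single set of weights.
two_layer = matvec(w_hidden_out, matvec(w_in_hidden, x))
collapsed = matvec(matmul(w_hidden_out, w_in_hidden), x)
print(two_layer, collapsed)  # [58] [58]

# With the logistic function on the hidden layer, that shortcut no longer holds.
nonlinear = matvec(w_hidden_out, [logistic(h) for h in matvec(w_in_hidden, x)])
print(nonlinear)
```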

Learning Systems
Learning systems are also called Evolutionary or Adaptive Systems. They are able to adjust their characteristics based on the data to which they are exposed: the systems ‘learn’ the characteristics of the data, which in the case of networks means the connection weights between layers. Other examples are Genetic Algorithms and other optimization methods such as Gradient Descent, Simulated Annealing and the Simplex method. Neural Networks and Genetic Algorithms have in common that they can evolve not just the constants, but the model itself. There are two kinds of learning: Supervised and Unsupervised (e.g. clustering in Kohonen Maps). Supervised systems use training datasets and try to minimize error, but there is a risk of over-fitting.

Learning in Networks: Hebbian Learning
Hebbian learning is the simplest kind of learning, and the basis of more complex models. It is based on the neurobiological theory that synaptic connections are strengthened between neurons that fire concurrently. It is only applicable to two-layer networks, and output values are fixed at their expected values. Over multiple iterations (called ‘epochs’), the weight of each connection is adjusted with the formula: ∆weight = OutputValue * InputValue * LearningRate, where LearningRate is a constant, e.g. 0.05. This strengthens the weight between an input node and an output node when the value of the input agrees with the value of the output node (based on the sum of all its inputs). The formula is not stable, i.e. it will not converge on the optimal values for the weights.
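A minimal sketch of Hebbian training for a two-layer network, assuming each training pattern is a pair of (input values, expected output values) and that weights start at zero; `hebbian_train` is a hypothetical name. Training the same pattern for twice as many epochs doubles the weights, which illustrates why the rule does not converge.

```python
def hebbian_train(patterns, learning_rate=0.05, epochs=10):
    """patterns: list of (input values, expected output values); outputs are held fixed."""
    n_inputs, n_outputs = len(patterns[0][0]), len(patterns[0][1])
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]  # weights[output][input]
    for _ in range(epochs):
        for inputs, outputs in patterns:
            for j, out in enumerate(outputs):
                for i, inp in enumerate(inputs):
                    # delta weight = OutputValue * InputValue * LearningRate
                    weights[j][i] += out * inp * learning_rate
    return weights


pattern = [([1, 0, 1], [1])]
print(hebbian_train(pattern, epochs=10))  # active inputs near 0.5, inactive input stays 0
print(hebbian_train(pattern, epochs=20))  # roughly double: the weights keep growing
```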

Learning in Networks – The Delta Rule
The Delta Rule is also only applicable to two-layer networks. It is similar to Hebbian learning, but does not fix the output values. Rather, at each epoch it compares the actual output with the desired output and generates an error correction (effectively minimizing the sum of squared errors across the network's outputs). Assuming a linear activation function, the adjustment of the weight of a connection to an individual output unit is given by: ∆weight = (DesiredActivation - ActualActivation) * InputValue * LearningRate. This adjusts the connection weight based on the discrepancy between expected and actual output, in proportion to how large the input value was. So if the input value of a particular node was 0, the weight of the connection between it and the output unit is not adjusted, since that input cannot be held responsible for the error on the output. In effect, we are taking the partial derivative of the total squared output error with respect to each connection weight, and moving the weight in the direction that reduces the error. With a small enough learning rate, this converges to the optimal (least-squares) weights for the network.
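A minimal sketch of the Delta Rule for a single linear output unit, updated one sample at a time. The training data are hypothetical, generated from the target weights 2 and -1 so that an exact solution exists; `delta_rule_train` is a hypothetical name.

```python
def delta_rule_train(samples, learning_rate=0.1, epochs=500):
    """Train one linear output unit; samples are (input values, desired activation) pairs."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for inputs, desired in samples:
            actual = sum(w * x for w, x in zip(weights, inputs))  # linear activation
            error = desired - actual
            for i, x in enumerate(inputs):
                # delta weight = (Desired - Actual) * InputValue * LearningRate;
                # a zero input leaves its weight unchanged.
                weights[i] += error * x * learning_rate
    return weights


# Hypothetical data generated from desired = 2 * x1 - 1 * x2
samples = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
print(delta_rule_train(samples))  # approximately [2.0, -1.0]
```

Unlike the Hebbian sketch, running more epochs here does not make the weights grow: the corrections shrink as the error approaches zero.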

Learning in Networks: Back Propagation
Back Propagation is a more sophisticated approach, applicable to networks that incorporate one or more ‘hidden layers’. The concept is the same as for the Delta Rule: we seek to minimize the sum of squared errors of the network's output, measured against the training set. However, dividing the responsibility for the error across multiple layers becomes more complex. We end up with a recursively applied equation in which the error is propagated backwards from the output layer through the one or more hidden layers, using the connection weights to determine how much of the error to expose to each hidden node.
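A minimal sketch of back propagation for a network with one hidden layer of logistic units, trained one sample at a time on XOR (a task a two-layer network cannot learn). The layer sizes, learning rate, epoch count, random initialization and bias weights are assumptions for illustration, and all function names are hypothetical. Each hidden node's error term is the output error passed back through its connection weight, scaled by the logistic derivative a(1 - a).

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """w_hidden[j] = [bias, w_1, ..., w_n] for hidden unit j; w_out = [bias, v_1, ..., v_H]."""
    hidden = [logistic(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    out = logistic(w_out[0] + sum(v * h for v, h in zip(w_out[1:], hidden)))
    return hidden, out


def sse(data, w_hidden, w_out):
    """Sum of squared errors over the training set."""
    return sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data)


def train(data, n_hidden=2, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Output error term: (desired - actual) times the logistic derivative.
            delta_out = (target - out) * out * (1 - out)
            # Pass the error back through each output weight (before it is updated).
            delta_hidden = [w_out[j + 1] * delta_out * h * (1 - h) for j, h in enumerate(hidden)]
            # Gradient-descent updates; bias weights see a constant input of 1.
            w_out[0] += lr * delta_out
            for j, h in enumerate(hidden):
                w_out[j + 1] += lr * delta_out * h
            for j, d in enumerate(delta_hidden):
                w_hidden[j][0] += lr * d
                for i, xi in enumerate(x):
                    w_hidden[j][i + 1] += lr * d * xi
    return w_hidden, w_out


xor = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
initial = train(xor, epochs=0)  # same seed, so these are the starting weights
trained = train(xor)
print("error before:", round(sse(xor, *initial), 3), "after:", round(sse(xor, *trained), 3))
```

Depending on the random starting weights, a run like this may settle in a local minimum rather than fully solve XOR, which is the local-minima problem listed among the weaknesses of networks.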

Back Propagation Learning: A Visual Representation
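[Figure: back propagation of error. Output nodes with Error = 6 and Error = 8 connect to a hidden node through weights 0.5 and 0.25, so the hidden node receives Error = 6 * 0.5 + 8 * 0.25 = 5. Arrow: Direction of Error Propagation.]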
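The arithmetic in the figure can be written as a one-line function: each downstream error is scaled by the weight of the connection that carried it, and the results are summed. `backpropagated_error` is a hypothetical name; in full back propagation with logistic units, this sum would also be multiplied by the hidden node's activation derivative a(1 - a).

```python
def backpropagated_error(downstream_errors, weights):
    """Error exposed to a hidden node: each downstream error times its connection weight, summed."""
    return sum(e * w for e, w in zip(downstream_errors, weights))


print(backpropagated_error([6, 8], [0.5, 0.25]))  # 5.0
```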

A Non-trivial Network
[Figure: a network recognizing the character ‘A’ from a 20 × 20 pixel image. 400-node input layer, one node per pixel (e.g. Pixel 1,1 = 0, Pixel 4,9 = 1, Pixel 18,16 = 1, Pixel 20,20 = 0); 400-node hidden layer; 8-node output layer producing ASCII character 65.]
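One way to read this figure is sketched below: the 20 × 20 image is flattened row by row into the 400 input values, and the 8 output nodes are treated as the bits of the ASCII code (65 for ‘A’). This encoding, the 1-based (row, column) reading of the pixel labels, and all function names are assumptions for illustration; the slide does not specify them.

```python
GRID = 20   # 20 x 20 pixel image -> 400 input nodes
N_BITS = 8  # 8 output nodes, read here as the bits of an ASCII code (assumption)


def image_to_inputs(pixels):
    """Flatten a 20x20 grid (list of rows) of 0/1 pixels, row by row, into 400 input values."""
    assert len(pixels) == GRID and all(len(row) == GRID for row in pixels)
    return [p for row in pixels for p in row]


def char_to_targets(ch):
    """Desired output pattern: the character's code as 8 bits, most significant first."""
    code = ord(ch)
    return [(code >> (N_BITS - 1 - i)) & 1 for i in range(N_BITS)]


def outputs_to_char(outputs, threshold=0.5):
    """Read 8 output activations back as a character by thresholding each one."""
    code = 0
    for activation in outputs:
        code = (code << 1) | (1 if activation > threshold else 0)
    return chr(code)


image = [[0] * GRID for _ in range(GRID)]
image[3][8] = 1  # "Pixel 4,9 = 1", read as row 4, column 9 (1-based)
inputs = image_to_inputs(image)
print(len(inputs), char_to_targets("A"))  # 400 [0, 1, 0, 0, 0, 0, 0, 1]
```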

Auto-Associative Networks
Auto-associative networks use excitatory and inhibitory connections to link items and their properties. When an item is associated with some set of properties, excitatory connections are created between them, and inhibitory connections are created to the non-associated properties. This makes related items and properties ‘pop out’ when one of them is chosen. For this reason, such systems are also known as ‘content addressable memories’: items are retrieved by presenting related content to the network as a stimulus. Generally, the network is evaluated over several cycles to allow the activation to propagate, until it reaches a steady state, which is the network's ‘answer’ to the inputs.

An Example Auto-Associative Network
[Figure: an auto-associative network linking the people Steve, Ian, Marcy and John to the professions Lawyer, Doctor and Architect and the cities NYC, London, Paris and Oslo.]
Presenting ‘Steve’ to the network will yield responses for ‘Lawyer’ and ‘NYC’, as well as weaker responses for ‘Ian’ and ‘London’. Presenting ‘Doctor’ will yield a response of ‘Marcy’ and a weaker response of ‘Paris’. Presenting ‘Lawyer’ and ‘London’ together will yield a response of ‘Ian’; this is the essence of a ‘content addressable memory’. The excitatory and inhibitory connection strengths do not have to be binary, and can be learned from the data using the Hebbian learning rule. More complex and interdependent networks yield richer semantic discovery.
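A minimal sketch of this example, assuming symmetric connections of +0.5 between each person and their own profession and city, -0.05 to every other property, and no other connections. These strengths are hypothetical hand-picked values rather than Hebbian-learned ones, and the pairing of people with professions and cities is inferred from the responses described above. Presented units are clamped at 1; every other unit's activation is recomputed each cycle from its weighted inputs, kept in the range [0, 1], until the network stops changing.

```python
# Hypothetical association table: each person with a profession and a city.
PEOPLE = {
    "Steve": ("Lawyer", "NYC"),
    "Ian": ("Lawyer", "London"),
    "Marcy": ("Doctor", "Paris"),
    "John": ("Architect", "Oslo"),
}
PROPERTIES = ["Lawyer", "Doctor", "Architect", "NYC", "London", "Paris", "Oslo"]
UNITS = list(PEOPLE) + PROPERTIES
EXCITE, INHIBIT = 0.5, -0.05  # assumed connection strengths


def build_weights():
    """Symmetric weights: excitatory to associated properties, inhibitory to the rest."""
    w = {u: {v: 0.0 for v in UNITS} for u in UNITS}
    for person, props in PEOPLE.items():
        for prop in PROPERTIES:
            strength = EXCITE if prop in props else INHIBIT
            w[person][prop] = strength
            w[prop][person] = strength
    return w


def settle(presented, max_cycles=200, tol=1e-9):
    """Clamp presented units at 1 and update all units together until activations stop changing."""
    w = build_weights()
    act = {u: 1.0 if u in presented else 0.0 for u in UNITS}
    for _ in range(max_cycles):
        new = {}
        for u in UNITS:
            if u in presented:
                new[u] = 1.0
            else:
                net = sum(w[u][v] * act[v] for v in UNITS)
                new[u] = min(1.0, max(0.0, net))  # keep activation in [0, 1]
        change = max(abs(new[u] - act[u]) for u in UNITS)
        act = new
        if change < tol:
            break
    return act


for query in ({"Steve"}, {"Doctor"}, {"Lawyer", "London"}):
    act = settle(query)
    ranked = sorted((u for u in UNITS if u not in query and act[u] > 0), key=act.get, reverse=True)
    print(sorted(query), "->", [(u, round(act[u], 2)) for u in ranked])
```

With these values, presenting Steve settles with Lawyer and NYC most active, Ian and London weaker, and Marcy and John at 0.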

Attractive Aspects of Networks
Neural plausibility (more interesting for neuroscientists than for computer scientists); support for soft constraints; graceful degradation; content-addressable memory; auto-associative networks; the ability to learn and to generalize.

Weaknesses of Networks
Networks have been called ‘the second best way to do anything’. The biological plausibility of the classic model is poor. There are problems of modeling scale, particularly in psychology, where researchers look for equivalence between human learning and the behavior of networks with a tiny number of nodes. A large number of training epochs is required. Networks are still prone to the same problems as other optimization techniques: over- and under-fitting, local minima, etc.

Interesting Areas of Research
Temporal correlation; spiking models; feedback and recursion; content addressable memory; hierarchical networks; hippocampal modelling.

A Short Reading List
Connectionism and the Mind – William Bechtel and Adele Abrahamsen
The Quest for Consciousness – Christof Koch
Gateway to Memory – Mark Gluck and Catherine Myers
On Intelligence – Jeff Hawkins and Sandra Blakeslee
