74.419 Artificial Intelligence 2004 - Neural Networks - Neural Networks (NN) basic processing units general network architectures learning qualities and problems of NNs
Neural Networks – Central Concepts biologically inspired –McCulloch-Pitts Neuron (automata theory), Perceptron basic architecture –units with activation state, –directed weighted connections between units –"activation spreading", output used as input to connected units basic processing in unit –integrated input: sum of weighted outputs of connected “pre-units” –activation of unit = function of integrated input –output depends on input/activation state –activation function or output function often threshold dependent, also sigmoid (differentiable for backprop!) or linear
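The basic processing in a unit can be sketched in a few lines. This is only an illustration of the description above; the function names, the example weights, and the threshold value are assumptions, not part of the slides.

```python
def integrated_input(weights, pre_outputs):
    """Integrated input: sum of the weighted outputs of the connected pre-units."""
    return sum(w * o for w, o in zip(weights, pre_outputs))

def threshold_activation(x, theta=0.0):
    """Threshold-dependent activation: the unit fires only if x reaches theta."""
    return 1.0 if x >= theta else 0.0

def linear_activation(x):
    """Linear activation: the activation equals the integrated input."""
    return x

def unit_output(weights, pre_outputs, activation):
    """Output of a unit = activation function applied to the integrated input."""
    return activation(integrated_input(weights, pre_outputs))

# Hypothetical example: three pre-units feeding one unit.
weights = [0.5, -1.0, 2.0]
pre_outputs = [1.0, 1.0, 0.5]
net = integrated_input(weights, pre_outputs)  # 0.5 - 1.0 + 1.0
fired = unit_output(weights, pre_outputs, threshold_activation)
```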
Anatomy of a Neuron
Diagram of an Action Potential From: Ana Adelstein, Introduction to the Nervous System, Part I http://www.ualberta.ca/~anaa/PSYCHO377/PSYCH377Lectures/L02Psych377/
General Neural Network Model Network of simple processing units (neurons). Units connected by weighted links (labelled digraph; connection matrix)
Neuron Model as FSA
NN - Activation Functions Sigmoid Activation Function; Threshold Activation Function (Step Function) adapted from Thomas Riga, University of Genoa, Italy http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
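A short sketch of the two activation functions named on this slide. It assumes the standard logistic form for the sigmoid (the slide shows only a plot) and checks its derivative numerically, since that derivative is what backpropagation needs. The helper names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid (assumed form): smooth and differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """For the logistic sigmoid, g'(x) = g(x) * (1 - g(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def step(x, theta=0.0):
    """Threshold (step) function: jumps from 0 to 1 at theta."""
    return 1.0 if x >= theta else 0.0

def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The step function is flat away from the threshold, so it gives no gradient
# signal there; the sigmoid has a non-zero slope everywhere.
slope_of_step = numerical_derivative(step, 1.0)
slope_of_sigmoid = sigmoid_derivative(1.0)
```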
Parallelism – Competing Rules
NN Architectures + Function Feedforward, layered networks –simple pattern classification, function estimation Recurrent networks –for space/time-variant input (e.g. natural language) Completely connected networks –Boltzmann Machine, Hopfield Network –optimization; constraint satisfaction Self-Organizing Networks –SOMs, Kohonen networks, winner-take-all (WTA) networks –unsupervised development of classification –best-fitting weight vector slowly adapted to input vector
NN Architectures + Function Feedforward networks –layers of uni-directionally connected units –strict forward processing from input to output units –simple pattern classification, function estimation, decoder, control systems Recurrent networks –feedforward network with internal feedback (context memory) –processing of space/time-variant input, e.g. natural language –e.g. Elman networks
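A minimal sketch of the "internal feedback (context memory)" idea, in the spirit of an Elman network: the hidden activations of the previous time step are fed back as additional input. The weights, layer sizes, and function names are made up for illustration, and sigmoid units are assumed.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step: hidden units see the input and the previous hidden state."""
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(w_in[h], x))
                + sum(wc * ci for wc, ci in zip(w_ctx[h], context)))
        for h in range(len(w_in))
    ]
    output = [sigmoid(sum(wo * hj for wo, hj in zip(w_out[o], hidden)))
              for o in range(len(w_out))]
    return output, hidden  # the hidden state becomes the next context

# Hypothetical weights: 2 inputs, 2 hidden units, 1 output unit.
w_in = [[1.0, -1.0], [0.5, 0.5]]
w_ctx = [[0.0, 2.0], [-2.0, 0.0]]
w_out = [[1.0, -1.0]]

def run(sequence):
    context = [0.0, 0.0]  # empty context memory at the start
    outputs = []
    for x in sequence:
        y, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(y[0])
    return outputs

# The same input presented twice gives different outputs, because the
# second presentation is processed together with the stored context.
outputs = run([[1.0, 0.0], [1.0, 0.0]])
```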
Haykin, Simon: Neural Networks - A Comprehensive Foundation, Prentice-Hall, 1999, p. 22. Feed-forward Network
NN Architectures + Function Completely connected networks –all units bi-directionally connected –positive weight = positive association between units; units support each other, are compatible –optimization; constraint satisfaction –Boltzmann Machine, Hopfield Network Self-Organizing Networks –SOMs, Kohonen networks, also winner-take-all (WTA) networks –best-fitting weight vector slowly adapts to input vector –unsupervised learning of classification
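As an illustration of a completely connected network, here is a small Hopfield-style sketch. It assumes bipolar states (+1 / -1), Hebbian storage of a single pattern, and unit-by-unit updates; none of these details are given on the slide.

```python
def train_hopfield(patterns):
    """Hebbian storage: units that agree in a pattern get a positive weight."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, sweeps=5):
    """Update units one at a time until the state settles."""
    state = list(state)
    for _ in range(sweeps):
        for i in range(len(state)):
            net = sum(w[i][j] * state[j] for j in range(len(state)))
            state[i] = 1 if net >= 0 else -1
    return state

stored = [1, 1, 1, -1, -1, -1]
w = train_hopfield([stored])
noisy = [1, -1, 1, -1, -1, -1]  # one unit flipped
restored = recall(w, noisy)
```

Units with a positive connection pull each other into the same state, so the flipped unit is corrected by the units it is compatible with.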
Neural Networks - Learning Learning = change connection weights: adjusting the connection weights in the network changes its input-output behaviour and makes it react “properly” to input patterns –supervised = network is told about “correct” answer = teaching input; e.g. backpropagation, reinforcement learning –unsupervised = network has to find correct output (usually classification of input patterns) on its own; e.g. competitive learning, winner-take-all networks, self-organizing or Kohonen maps
Backpropagation - Schematic Representation The input is processed in a forward pass. Then the error is determined at the output units and propagated back through the network towards the input units. adapted from Thomas Riga, University of Genoa, Italy http://www.hku.nl/~pieter/EDU/neuro/doc/ThomasRiga/ann.htm#pdp
Backpropagation Learning Backpropagation learning is supervised. The correct input-output relation is known for some pattern samples; take some of these patterns for training: calculate the error between produced output and correct output, propagate the error back from output to input units, and adjust the weights. After training, perform tests with known I/O patterns. Then use the network with unknown input patterns. Idea behind the Backpropagation Rule (next slides): Determine the error for output units (compare produced output with 'teaching input' = correct or wanted output). Adjust weights based on error, activation state, and current weights. Determine the error for internal units based on the derivative of the activation function. Adjust weights for internal units using the error function, using an adapted delta-rule.
NN-Learning as Optimization Learning: adjust network in order to adapt its input-output behaviour so that it reacts “properly” to input patterns. Learning as optimization process: find the parameter setting for the network (in particular the weights) that produces the best-fitting behaviour (input-output relation) –minimize error in I/O behaviour –optimize weight setting w.r.t. error function –find minimum in error surface for different weight settings. Backpropagation implements a gradient descent search for the correct weight setting (method not optimal: it can end in a local minimum). Statistical models (include a stochastic parameter) allow for “jumps” out of local minima (cf. Hopfield Neuron with probabilistic activation function, Thermodynamic Models with temperature parameter, Simulated Annealing). Genetic Algorithms can be used to determine the parameter setting of a Neural Network.
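Why gradient descent is "not optimal" can be seen on a one-dimensional error surface. The error function below is a made-up example with one local and one global minimum (it is not taken from the slides), and the learning rate and step count are arbitrary choices.

```python
def error(w):
    """Hypothetical error surface with two minima (near w = -1.30 and w = 1.13)."""
    return w**4 - 3.0 * w**2 + w

def gradient(w):
    """Derivative of the error surface with respect to the single weight w."""
    return 4.0 * w**3 - 6.0 * w + 1.0

def gradient_descent(w, learning_rate=0.01, steps=2000):
    """Repeatedly step against the gradient, starting from weight w."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

w_from_right = gradient_descent(2.0)   # ends in the shallower, local minimum
w_from_left = gradient_descent(-2.0)   # ends in the deeper, global minimum
```

Which minimum is reached depends only on the starting weight. This is the situation in which the stochastic "jumps" mentioned above can help.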
Backpropagation - Delta Rule The error is calculated as $err_i = t_i - y_i$, where $t_i$ is the teaching input (the correct or wanted output) and $y_i$ is the produced output. Note: In the textbook it is called $(T_i - O_i)$. Backpropagation- or delta-rule: $w_{j,i} \leftarrow w_{j,i} + \alpha \, a_j \, \Delta_i$, where $\alpha$ is a constant, the learning rate, $a_j$ is the activation of $u_j$, and $\Delta_i$ is the backpropagated error. $\Delta_i = err_i \cdot g'$ for units in the output layer; $\Delta_j = g'(x_j) \sum_i w_{j,i} \, \Delta_i$ for internal (hidden) units, where $g'$ is the derivative of the activation function $g$. Then $w_{k,j} \leftarrow w_{k,j} + \alpha \, x_k \, \Delta_j$.
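The delta rule can be sketched for a network with one hidden layer. Several things here are assumptions: sigmoid units, no bias weights, $g'$ evaluated at each unit's integrated input, weights stored as `w[from][to]`, and the particular starting weights and training pattern.

```python
import math

def g(x):
    """Sigmoid activation function (assumed)."""
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    """Derivative of the sigmoid at integrated input x."""
    s = g(x)
    return s * (1.0 - s)

def backprop_step(x, t, w_hidden, w_out, alpha=0.5):
    """One forward pass and one weight update; returns the error before the update."""
    n_hidden, n_out = len(w_out), len(w_out[0])
    # Forward pass: integrated inputs and activations, layer by layer.
    in_hidden = [sum(w_hidden[k][j] * x[k] for k in range(len(x)))
                 for j in range(n_hidden)]
    a = [g(v) for v in in_hidden]
    in_out = [sum(w_out[j][i] * a[j] for j in range(n_hidden))
              for i in range(n_out)]
    y = [g(v) for v in in_out]
    # Output units: Delta_i = err_i * g'(integrated input of unit i).
    delta_out = [(t[i] - y[i]) * g_prime(in_out[i]) for i in range(n_out)]
    # Hidden units: Delta_j = g'(.) * sum_i w_{j,i} * Delta_i (uses the old w_out).
    delta_hidden = [g_prime(in_hidden[j])
                    * sum(w_out[j][i] * delta_out[i] for i in range(n_out))
                    for j in range(n_hidden)]
    # Weight updates: w_{j,i} += alpha * a_j * Delta_i, w_{k,j} += alpha * x_k * Delta_j.
    for j in range(n_hidden):
        for i in range(n_out):
            w_out[j][i] += alpha * a[j] * delta_out[i]
    for k in range(len(x)):
        for j in range(n_hidden):
            w_hidden[k][j] += alpha * x[k] * delta_hidden[j]
    return 0.5 * sum((t[i] - y[i]) ** 2 for i in range(n_out))

# Hypothetical 2-2-1 network trained repeatedly on one pattern.
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2], [-0.3]]
errors = [backprop_step([1.0, 0.0], [1.0], w_hidden, w_out) for _ in range(200)]
```

With these settings the recorded error shrinks over the 200 updates, which is all this sketch is meant to show; training on a full pattern set follows the same step per pattern.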
Backpropagation as Error Minimization Find the minimum of the error function $E = \frac{1}{2} \sum_i (t_i - y_i)^2$. Rewrite the formula in terms of the weights (substitute the output term $y_i$ with $g(\sum_j w_{j,i} a_j)$, the activation function applied to the sum of weighted outputs of the pre-neurons): $E(W) = \frac{1}{2} \sum_i (t_i - g(\sum_j w_{j,i} a_j))^2$, where $W$ is the complete weight matrix for the net. Determine the derivative of the error function (the gradient) w.r.t. a single weight $w_{k,j}$: $\partial E / \partial w_{k,j} = -x_k \Delta_j$, where $\Delta_j$ is the backpropagated error of unit $u_j$. To minimize the error, step in the direction of the negative gradient ($+x_k \Delta_j$), scaled by the learning rate $\alpha$. This yields the Backpropagation- or delta-rule: $w_{k,j} \leftarrow w_{k,j} + \alpha \, x_k \, \Delta_j$.
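The gradient formula can be checked numerically: perturb a single weight $w_{k,j}$, measure the change in $E(W)$, and compare with $-x_k \Delta_j$. The sketch assumes sigmoid units without bias weights and uses made-up weights and patterns; `w[from][to]` indexing is also an assumption.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    s = g(x)
    return s * (1.0 - s)

def hidden_inputs(x, w_hidden):
    return [sum(w_hidden[k][j] * x[k] for k in range(len(x)))
            for j in range(len(w_hidden[0]))]

def output_inputs(a, w_out):
    return [sum(w_out[j][i] * a[j] for j in range(len(a)))
            for i in range(len(w_out[0]))]

def network_error(x, t, w_hidden, w_out):
    """E(W) = 1/2 * sum_i (t_i - y_i)^2 for one input pattern."""
    a = [g(v) for v in hidden_inputs(x, w_hidden)]
    y = [g(v) for v in output_inputs(a, w_out)]
    return 0.5 * sum((ti - yi) ** 2 for ti, yi in zip(t, y))

def analytic_gradient(x, t, w_hidden, w_out, k, j):
    """dE/dw_{k,j} = -x_k * Delta_j, with Delta_j as in the delta rule."""
    in_hidden = hidden_inputs(x, w_hidden)
    a = [g(v) for v in in_hidden]
    in_out = output_inputs(a, w_out)
    delta_out = [(t[i] - g(in_out[i])) * g_prime(in_out[i])
                 for i in range(len(t))]
    delta_j = g_prime(in_hidden[j]) * sum(w_out[j][i] * delta_out[i]
                                          for i in range(len(t)))
    return -x[k] * delta_j

def numeric_gradient(x, t, w_hidden, w_out, k, j, h=1e-6):
    """Central difference on the single weight w_{k,j}; the weight is restored."""
    original = w_hidden[k][j]
    w_hidden[k][j] = original + h
    e_plus = network_error(x, t, w_hidden, w_out)
    w_hidden[k][j] = original - h
    e_minus = network_error(x, t, w_hidden, w_out)
    w_hidden[k][j] = original
    return (e_plus - e_minus) / (2.0 * h)

# Hypothetical 2-2-2 network and pattern.
x, t = [1.0, 0.5], [1.0, 0.0]
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [[0.2, -0.5], [-0.3, 0.8]]
differences = [abs(analytic_gradient(x, t, w_hidden, w_out, k, j)
                   - numeric_gradient(x, t, w_hidden, w_out, k, j))
               for k in range(2) for j in range(2)]
```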
Implementation of Backprop-Learning Choose a description of input and output patterns which is suitable for the task. Determine test set and training set (disjoint sets). Do training runs (in general thousands of them, with various patterns) until the parameters of the NN converge. The training goes several times through the different pattern classes (outputs), either one class at a time or one pattern from each class at a time. Measure the performance of the network on the test data (determine error: wrong vs. right reaction of the NN). Re-train if necessary.
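The procedure above can be sketched end to end on a toy task. To keep it short, the "network" is a single sigmoid unit trained with the delta rule. The task (is x1 + x2 > 1?), the constant bias input, the set sizes, and the stopping tolerance are all assumptions made for this example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_patterns(n, rng):
    """Input: two numbers in [0, 1) plus a constant bias input; target: x1 + x2 > 1."""
    patterns = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        target = 1.0 if x1 + x2 > 1.0 else 0.0
        patterns.append(([x1, x2, 1.0], target))
    return patterns

def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def train(training_set, alpha=0.5, max_epochs=200, tolerance=1e-4):
    """Go through the training set repeatedly until the weights barely change."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        before = list(weights)
        for x, t in training_set:
            y = predict(weights, x)
            delta = (t - y) * y * (1.0 - y)  # err * g'(.) for a sigmoid unit
            for k in range(len(weights)):
                weights[k] += alpha * x[k] * delta
        if max(abs(w - b) for w, b in zip(weights, before)) < tolerance:
            break  # parameters have converged
    return weights

def accuracy(weights, patterns):
    """Fraction of patterns with the right reaction (output >= 0.5 means class 1)."""
    correct = sum(1 for x, t in patterns
                  if (predict(weights, x) >= 0.5) == (t == 1.0))
    return correct / len(patterns)

rng = random.Random(0)  # fixed seed so the run is repeatable
patterns = make_patterns(200, rng)
training_set, test_set = patterns[:150], patterns[150:]  # disjoint sets
weights = train(training_set)
test_accuracy = accuracy(weights, test_set)
```

The test set is never used during training, so `test_accuracy` estimates how the unit reacts to unknown input patterns.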
Competitive Learning 1 Competitive Learning is unsupervised. Discovers classes in the set of input patterns. Classes are determined by similarity of inputs. Determines (output) unit which responds to all sample inputs of the same class. Unit reacts to patterns which are similar and thus represents this class. Different classes are represented by different units. The system can thus - after learning - be used for classification.
Competitive Learning 2 Units specialize to recognize pattern classes. The unit which responds strongest (among all units) to the current input moves its weight vector towards the input vector (use e.g. Euclidean distance): –reduce weight on inactive lines, raise weight on active lines –all other units keep or reduce their weights (often a Gaussian curve is used to determine which units change their weights and how). Winning units (their weight vectors) represent a prototype of the class they recognize.
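A minimal winner-take-all sketch of this procedure. It simplifies the slide in one respect: only the winning unit changes its weights, and the Gaussian neighbourhood is left out. The input patterns, starting weights, and learning rate are made up.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def winner(units, x):
    """Index of the unit whose weight vector is closest to the input vector."""
    return min(range(len(units)), key=lambda u: squared_distance(units[u], x))

def competitive_step(units, x, rate=0.1):
    """Move only the winning unit's weight vector a little towards the input."""
    u = winner(units, x)
    units[u] = [w + rate * (xi - w) for w, xi in zip(units[u], x)]
    return u

def train(units, inputs, epochs=20, rate=0.1):
    for _ in range(epochs):
        for x in inputs:
            competitive_step(units, x, rate)
    return units

# Two hypothetical classes of similar inputs, and two competing units.
inputs = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
          [0.9, 1.0], [1.0, 0.9], [0.9, 0.9]]
units = train([[0.4, 0.4], [0.6, 0.6]], inputs)

class_of_low = winner(units, [0.05, 0.05])
class_of_high = winner(units, [0.95, 0.95])
```

After training, each weight vector sits inside one cluster and acts as the prototype of that class, so `winner` can be used to classify new inputs.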
from Haykin, Simon: Neural Networks, Prentice-Hall, 1999, p. 60 Competitive Learning - Figure
Example: NetTalk (from 319) Terry Sejnowski of Johns Hopkins developed a system that can pronounce words of text The system consists of a backpropagation network with 203 input units (29 text characters, 7 characters at a time), 80 hidden units, and 26 output units –The system was developed over a year The DEC-talk system consists of hand-coded linguistic rules for speech pronunciation –developed over approximately 10 years DEC-talk outperforms NETtalk but DEC-talk required significantly more development time
NetTalk (from 319) "This exemplifies the utility of neural networks; they are easy to construct and can be used even when a problem is not fully understood. However, rule-based algorithms usually out-perform neural networks when enough understanding is available." (Hertz, Introduction to the Theory of Neural Networks, p. 133)
NETtalk - General Feedforward network architecture. NETtalk used text as input. Text was moved over the input units ("window"): the text is split into fixed-length inputs with some overlap between adjacent text windows. Output represents controls for the Speech Generator. Training through backpropagation. Training patterns from human-made phonetic transcripts.
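The windowed input can be sketched as follows. It uses the figures given earlier in the deck (29 text characters, 7 characters at a time, 203 input units). The concrete alphabet (26 letters plus space, comma, and period), the one-unit-per-character coding, the padding with spaces, and the step of one character between windows are assumptions for illustration.

```python
import string

# 29 symbols: the three non-letter symbols are an assumption, not from the slides.
ALPHABET = string.ascii_lowercase + " ,."
WINDOW = 7

def encode_window(window):
    """One group of 29 input units per character, exactly one unit active per group."""
    vector = []
    for ch in window:
        one_hot = [0] * len(ALPHABET)
        one_hot[ALPHABET.index(ch)] = 1
        vector.extend(one_hot)
    return vector

def windows(text, size=WINDOW):
    """Slide a fixed-length window over the text, one character at a time."""
    pad = " " * (size // 2)
    padded = pad + text + pad
    return [padded[i:i + size] for i in range(len(text))]

text_windows = windows("hello")
input_vectors = [encode_window(w) for w in text_windows]
```

Each window is centred on one character of the text, and adjacent windows share six of their seven characters.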
NETtalk - Processing Unit
NETtalk - Network Architecture
NETtalk - Some Articulatory Features (Output)
NN - Caveats 1 often 3 layers necessary –Perceptron, Minsky & Papert’s analysis –linearly separable pattern classes position dependence –visual pattern recognition can depend on position of pattern in input layer / matrix –introduce feature vectors (pre-analysis yields features of patterns; features input to NN) time and space invariance –patterns may be stretched / squeezed in space / time dimension (visual objects, speech)
NN - Caveats 2 Recursive structures and functions not directly representable due to fixed architecture (fixed size) –move window of input units over input (which is larger than input window) –store information in hidden units ("context memory") and feed it back into the input layer –use hybrid model Variable binding and value assignment –simulation possible through simultaneously active, synchronized units (cf. Lokendra Shastri)
Additional References Haykin, Simon: Neural Networks – A Comprehensive Foundation, Prentice-Hall, 1995. Rumelhart, McClelland & The PDP Research Group: Parallel Distributed Processing. Explorations in the Microstructure of Cognition, The MIT Press, 1986.
Neural Networks Web Pages The neuroinformatics Site (incl. Software etc.) http://www.neuroinf.org/ Neural Networks incl. Software Repository at CBIIS (Connectionist-Based Intelligent Information Systems), University of Otago, New Zealand http://divcom.otago.ac.nz/infosci/kel/CBIIS.html Kohonen Feature Map - Demo http://rfhs8012.fh-regensburg.de/~saj39122/begrolu/kohonen.html
Neurophysiology / Neurobiology Web Pages Animated diagram of an Action Potential (Neuroscience for Kids - featuring the giant axon of the squid) http://faculty.washington.edu/chudler/ap.html Adult explanation of processes involved in information transmission on the cell level (with diagrams but no animation) http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/E/ExcitableCells.html Similar to above but with animation and partially in Spanish http://www.epub.org.br/cm/n10/fundamentos/pot2_i.htm
Neurophysiology / Neurobiology Web Pages Kandel's Nobel Lecture "Molecular Biology of Memory Storage: A Dialogue Between Genes and Synapses," December 8, 2000 http://www.nobel.se/medicine/laureates/2000/kandel-lecture.html The Molecular Sciences Institute, Berkeley http://www.molsci.org/Dispatch The Salk Institute for Biological Studies, San Diego http://www.salk.edu/