Previously in ‘Statistical Methods’...
- Agents can handle uncertainty by using the methods of probability and decision theory.
- But first they must learn their probabilistic theories of the world from experience...

Previously in ‘Statistical Methods’... Key Concepts
- Data: evidence, i.e., the instantiation of one or more random variables describing the domain.
- Hypotheses: probabilistic theories of how the domain works.

Previously in ‘Statistical Methods’... Outline
- Bayesian learning
- Maximum a posteriori and maximum likelihood learning
- Instance-based learning
- Neural networks...
Outline
- Some slides from last week...
- Network structure
- Perceptrons
- Multilayer feed-forward neural networks
- Learning networks?
So First... Neural Networks
- According to Robert Hecht-Nielsen, a neural network is simply “a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs”.
- Simply... we skip the biology for now and provide the bare basics.
Network Structure
- Input units
- Hidden units
- Output units
Network Structure
- Feed-forward networks
- Recurrent networks: feedback from the output units back to the inputs
Feed-Forward Network
- A feed-forward network is a parameterized family of nonlinear functions: each unit i computes a_i = g(Σ_j W_{j,i} · a_j) from the activations a_j of the units feeding into it.
- g is the activation function.
- The weights W_{j,i} are the parameters to be adapted; adapting them is the learning.
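To make this “parameterized family of nonlinear functions” concrete, here is a minimal sketch, assuming a sigmoid g and a single hidden layer; the names feed_forward, W1, and W2 are illustrative, not from the slides.

```python
import numpy as np

def g(x):
    """Activation function: here a sigmoid (see the next slide)."""
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, W1, W2):
    """Each unit computes g(weighted sum of its inputs);
    the weights W1, W2 are the parameters that learning adapts."""
    a_hidden = g(W1 @ x)     # hidden-unit activations
    return g(W2 @ a_hidden)  # output-unit activations

# A 2-input, 3-hidden-unit, 1-output network with arbitrary weights
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))
W2 = rng.normal(size=(1, 3))
print(feed_forward(np.array([0.5, -1.0]), W1, W2))
```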
Activation Functions
- Often take the form of a step function [a threshold] or a sigmoid.
- N.B. thresholding = a ‘degenerate’ sigmoid: the step function is the limit of a sigmoid as its slope goes to infinity.
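A quick numerical illustration of that remark, with the sigmoid’s steepness k as an assumed extra parameter:

```python
import numpy as np

def step(x):
    """Threshold activation: fires (1) iff the input reaches 0."""
    return np.where(x >= 0.0, 1.0, 0.0)

def sigmoid(x, k=1.0):
    """Sigmoid with steepness k; as k grows it approaches the step function."""
    return 1.0 / (1.0 + np.exp(-k * x))

x = np.array([-1.0, -0.1, 0.1, 1.0])
print(step(x))             # [0. 0. 1. 1.]
print(sigmoid(x))          # smooth values straddling 0.5
print(sigmoid(x, k=50.0))  # nearly 0/1: the 'degenerate' sigmoid
```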
Perceptrons
- Single-layer neural network.
- Expressiveness: a perceptron with g = step function can learn AND, OR, NOT, and majority, but not XOR (which is not linearly separable); see the sketch below.
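The weights below are one hand-picked choice (among many) realizing these Boolean functions with a step-function perceptron; no such weights exist for XOR, because its positive examples (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by a single line.

```python
import numpy as np

def perceptron(x, w, b):
    """Step-function perceptron: outputs 1 iff w . x + b >= 0."""
    return int(np.dot(w, x) + b >= 0.0)

AND = lambda x: perceptron(x, w=np.array([1.0, 1.0]), b=-1.5)
OR  = lambda x: perceptron(x, w=np.array([1.0, 1.0]), b=-0.5)
NOT = lambda x: perceptron(x, w=np.array([-1.0]), b=0.5)

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        x = np.array([x1, x2])
        print(x1, x2, "AND:", AND(x), "OR:", OR(x))
print("NOT 0:", NOT(np.array([0.0])), "NOT 1:", NOT(np.array([1.0])))
```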
Learning in Sigmoid Perceptrons
- The idea is to adjust the weights so as to minimize some measure of error on the training set: learning is optimization of the weights.
- This can be done using general optimization routines for continuous spaces.
- The error measure most often used for NNs is the sum of squared errors: E = 1/2 · Σ (y − h_W(x))².
- Perform the optimization search by gradient descent.
- Weight update rule [α is the learning rate]: W_j ← W_j + α · Err · g′(in) · x_j, where Err = y − h_W(x) and in = Σ_j W_j · x_j.
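A minimal sketch of this gradient-descent rule, learning OR with a sigmoid perceptron; the constant-1 input column standing in for a bias and the chosen α and epoch count are assumptions, not from the slides.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_sigmoid_perceptron(X, y, alpha=0.5, epochs=2000):
    """Gradient descent on E = 1/2 * sum (y - g(W . x))^2 using
    W_j <- W_j + alpha * Err * g'(in) * x_j, with g'(in) = g(in)(1 - g(in))."""
    W = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, y):
            out = g(W @ x)
            err = target - out
            W += alpha * err * out * (1.0 - out) * x
    return W

# Learn OR; the last input column is a constant 1 acting as a bias
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])
W = train_sigmoid_perceptron(X, y)
print(np.round(g(X @ W)))  # [0. 1. 1. 1.]
```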
Some Remarks
- The [thresholded] perceptron learning rule converges to a consistent function for any linearly separable data set.
- A [sigmoid] perceptron’s output can be interpreted as a conditional probability.
- An interpretation in terms of maximum likelihood [ML] estimation is also possible.
Multilayer Feed-Forward NN
- Network with hidden units.
- Adding hidden layers enlarges the hypothesis space.
- Most common: a single hidden layer.
Expressiveness
- With a single, sufficiently large, hidden layer it is possible to approximate any continuous function to arbitrary accuracy.
- With two hidden layers, discontinuous functions can be approximated as well.
- For a particular network it is hard to say what exactly can be represented.
Learning in Multilayer NN
- Back-propagation is used to perform the weight updates in the network.
- Similar to perceptron learning.
- Major difference: the error at the output layer is clear, but how do we measure the error at the nodes in the hidden layers?
- Additionally, we must deal with multiple outputs.
Learning in Multilayer NN
- At the output layer the weight-update rule is similar to the perceptron’s [but now for multiple outputs i]: W_{j,i} ← W_{j,i} + α · a_j · Δ_i, where Δ_i = Err_i · g′(in_i).
- Idea of back-propagation: every hidden unit contributes some fraction to the error of each output node it connects to.
- Thus errors are divided according to connection strength [the weights]: Δ_j = g′(in_j) · Σ_i W_{j,i} · Δ_i.
- Update rule for the hidden-layer weights: W_{k,j} ← W_{k,j} + α · a_k · Δ_j.
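Putting both update rules together, here is a sketch of one back-propagation step for a single-hidden-layer network, used to learn XOR (which the single-layer perceptron cannot represent). The explicit bias vectors and the training schedule are assumptions, and a run can occasionally stall in a local minimum, one of the problems noted in the closing remarks.

```python
import numpy as np

def g(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.5):
    """One weight update; hidden errors are the output errors
    divided back along the connection weights (in place)."""
    a_h = g(W1 @ x + b1)                    # forward pass
    a_o = g(W2 @ a_h + b2)
    d_o = (y - a_o) * a_o * (1.0 - a_o)     # Delta_i = Err_i * g'(in_i)
    d_h = a_h * (1.0 - a_h) * (W2.T @ d_o)  # Delta_j = g'(in_j) * sum_i W_ji Delta_i
    W2 += alpha * np.outer(d_o, a_h); b2 += alpha * d_o
    W1 += alpha * np.outer(d_h, x);   b1 += alpha * d_h

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(1, 2)), np.zeros(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0.0], [1.0], [1.0], [0.0]])
for _ in range(5000):  # may stall in a local minimum; retry with another seed
    for x, y in zip(X, Y):
        backprop_step(x, y, W1, b1, W2, b2)
print(np.round(g(W2 @ g(W1 @ X.T + b1[:, None]) + b2[:, None])))  # [[0. 1. 1. 0.]]
```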
Example
[Figure: training curve for 100 restaurant examples, reaching an exact fit.]
Learning NN Structures?
- How to find the best network structure?
- Too big results in ‘lookup table’ behavior / overtraining; too small results in ‘undertraining’ / not exploiting the full expressiveness.
- Possibility: try different structures and validate using, for example, cross-validation (see the sketch below).
- But which different structures to consider?
- Start with a fully connected network and remove nodes: optimal brain damage.
- Grow larger networks from smaller ones, e.g., tiling and NEAT.
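A sketch of the cross-validation idea for choosing among structures; `train` and `error` are hypothetical placeholders for whatever fitting and evaluation routines are used, not functions from the slides.

```python
import numpy as np

def pick_structure(hidden_sizes, X, y, train, error, k=5, seed=0):
    """Return the hidden-layer size with the lowest mean validation error
    over k folds; train(size, X, y) fits a net, error(net, X, y) scores it."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(X)), k)
    best_size, best_err = None, np.inf
    for size in hidden_sizes:
        errs = []
        for i in range(k):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k) if j != i])
            net = train(size, X[trn], y[trn])
            errs.append(error(net, X[val], y[val]))
        if np.mean(errs) < best_err:
            best_size, best_err = size, np.mean(errs)
    return best_size
```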
Learning NN Structures: topic for a later lecture?
Finally, Some Remarks
- A NN is a possibly complex nonlinear function with many parameters that have to be tuned.
- Problems: slow convergence, local minima.
- Back-propagation was explained here, but other optimization schemes are possible.
- A perceptron can handle linearly separable functions.
- A multilayer NN can represent any kind of function.
- It is hard to come up with the optimal network.
- The learning rate, initial weights, etc. have to be set.
- NN: not much magic there... “Keine Hexerei, nur Behändigkeit!” [no sorcery, just nimbleness!]
And with that Disappointing Message... We take a break...