# Classification III (Lecturer: Dr. Bo Yuan)

Classification III. Lecturer: Dr. Bo Yuan. E-mail: yuanb@sz.tsinghua.edu.cn

## Overview

- Artificial Neural Networks

## Biological Motivation

- 10^11: the number of neurons in the human brain
- 10^4: the average number of connections of each neuron
- 10^-3 seconds: the fastest switching time of neurons
- 10^-10 seconds: the switching speed of computers
- 10^-1 seconds: the time required to visually recognize your mother

Biological Motivation  The power of parallelism  The information processing abilities of biological neural systems follow from highly parallel processes operating on representations that are distributed over many neurons.  The motivation of ANN is to capture this kind of highly parallel computation based on distributed representations.  Sequential machines vs. Parallel machines  Group A  Using ANN to study and model biological learning processes.  Group B  Obtaining highly effective machine learning algorithms, regardless of how closely these algorithms mimic biological processes. 4

## Neural Network Representations

## Robot vs. Human

## Perceptrons

[Diagram: inputs x_1, …, x_n with weights w_1, …, w_n, plus a bias weight w_0 on the constant input x_0 = 1, feeding a summation unit ∑ followed by a threshold.]

## Power of Perceptrons

AND (w_0 = -0.8, w_1 = w_2 = 0.5):

| Input | ∑    | Output |
|-------|------|--------|
| 0 0   | -0.8 | 0      |
| 0 1   | -0.3 | 0      |
| 1 0   | -0.3 | 0      |
| 1 1   | 0.2  | 1      |

OR (w_0 = -0.3, w_1 = w_2 = 0.5):

| Input | ∑    | Output |
|-------|------|--------|
| 0 0   | -0.3 | 0      |
| 0 1   | 0.2  | 1      |
| 1 0   | 0.2  | 1      |
| 1 1   | 0.7  | 1      |
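The two units on this slide can be written out in a few lines of Python. This is a minimal sketch (the function name and code structure are illustrative, not from the lecture); the weights are the ones shown above, and the unit fires when the weighted sum exceeds zero.

```python
def perceptron(weights, inputs):
    """Threshold unit: 1 if w0 + w1*x1 + ... + wn*xn > 0, else 0."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else 0

AND_WEIGHTS = [-0.8, 0.5, 0.5]  # fires only when both inputs are 1
OR_WEIGHTS  = [-0.3, 0.5, 0.5]  # fires when at least one input is 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              perceptron(AND_WEIGHTS, (x1, x2)),
              perceptron(OR_WEIGHTS, (x1, x2)))
```

Note that only the bias weight w_0 differs between the two units; moving the bias shifts the decision line without rotating it.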

## Error Surface

[Figure: the error E plotted as a surface over the weight space (w_1, w_2).]

## Gradient Descent

- Learning rate
- Batch learning

## Delta Rule

Δw_i = η(t − o)x_i

## Batch Learning

GRADIENT_DESCENT(training_examples, η)

- Initialize each w_i to some small random value.
- Until the termination condition is met, do:
  - Initialize each Δw_i to zero.
  - For each ⟨x, t⟩ in training_examples, do:
    - Input the instance x to the unit and compute the output o.
    - For each linear unit weight w_i: Δw_i ← Δw_i + η(t − o)x_i
  - For each linear unit weight w_i: w_i ← w_i + Δw_i
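The procedure above translates almost line for line into Python. A minimal sketch (function name, default parameters, and the example data are my own; the lecture leaves the termination condition open, so a fixed epoch count is used here):

```python
import random

def gradient_descent(training_examples, eta=0.05, epochs=500):
    """Batch gradient descent for a single linear unit (the delta rule).

    training_examples is a list of (x, t) pairs; each x already
    includes the constant bias input x0 = 1.
    """
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        delta = [0.0] * n                                 # each Δwi starts at zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))      # linear output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]          # Δwi ← Δwi + η(t−o)xi
        w = [wi + di for wi, di in zip(w, delta)]         # wi ← wi + Δwi
    return w

random.seed(0)  # repeatable initial weights
examples = [((1, 0), 0), ((1, 1), 2), ((1, 2), 4)]  # target: t = 2·x1
w = gradient_descent(examples)
```

On this linearly realizable target the weights converge to (0, 2): zero bias and slope 2.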

## Stochastic Learning

For example, if x_i = 0.8, η = 0.1, t = 1 and o = 0:

Δw_i = η(t − o)x_i = 0.1 × (1 − 0) × 0.8 = 0.08

[Figure: positive (+) and negative (−) examples in a 2-D input space.]

## Stochastic Learning: NAND

Threshold = 0.5, learning rate = 0.1. Each row shows one example presentation: the input (x_0 is the constant bias input), the target t, the current weights, the weighted sum S = x_0·w_0 + x_1·w_1 + x_2·w_2, the output o (1 if S > 0.5), the error E = t − o, the correction η × E, and the updated weights.

| x_0 x_1 x_2 | t | w_0, w_1, w_2 | S   | o | E  | η × E | Final w_0, w_1, w_2 |
|-------------|---|----------------|-----|---|----|-------|----------------------|
| 1 0 0 | 1 | 0, 0, 0          | 0   | 0 | 1  | +0.1 | 0.1, 0, 0       |
| 1 0 1 | 1 | 0.1, 0, 0        | 0.1 | 0 | 1  | +0.1 | 0.2, 0, 0.1     |
| 1 1 0 | 1 | 0.2, 0, 0.1      | 0.2 | 0 | 1  | +0.1 | 0.3, 0.1, 0.1   |
| 1 1 1 | 0 | 0.3, 0.1, 0.1    | 0.5 | 0 | 0  | 0    | 0.3, 0.1, 0.1   |
| 1 0 0 | 1 | 0.3, 0.1, 0.1    | 0.3 | 0 | 1  | +0.1 | 0.4, 0.1, 0.1   |
| 1 0 1 | 1 | 0.4, 0.1, 0.1    | 0.5 | 0 | 1  | +0.1 | 0.5, 0.1, 0.2   |
| 1 1 0 | 1 | 0.5, 0.1, 0.2    | 0.6 | 1 | 0  | 0    | 0.5, 0.1, 0.2   |
| 1 1 1 | 0 | 0.5, 0.1, 0.2    | 0.8 | 1 | −1 | −0.1 | 0.4, 0, 0.1     |
| 1 0 0 | 1 | 0.4, 0, 0.1      | 0.4 | 0 | 1  | +0.1 | 0.5, 0, 0.1     |
| …     | … | …                | …   | … | …  | …    | …               |
| 1 1 0 | 1 | 0.8, −0.2, −0.1  | 0.6 | 1 | 0  | 0    | 0.8, −0.2, −0.1 |
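The table can be reproduced with a short training loop. A minimal sketch (function name and the epoch-based stopping test are my own; the data order, zero initial weights, threshold and learning rate follow the slide):

```python
def train_nand(threshold=0.5, lr=0.1, max_epochs=100):
    """Stochastic perceptron training on NAND (x0 is the bias input).

    Weights start at zero and are updated by wi <- wi + lr*(t - o)*xi
    after every example, exactly as in the table.
    """
    data = [((1, 0, 0), 1), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            # round to suppress float noise right at the 0.5 boundary
            o = 1 if round(s, 6) > threshold else 0
            if o != t:
                mistakes += 1
                w = [wi + lr * (t - o) * xi for wi, xi in zip(w, x)]
        if mistakes == 0:  # a full error-free pass: converged
            break
    return w

weights = train_nand()
print(weights)  # converges to the final weights in the table: 0.8, -0.2, -0.1
```

NAND is linearly separable, so the perceptron convergence theorem guarantees the loop terminates.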

## Multilayer Perceptron

## XOR

| p | q | Output |
|---|---|--------|
| 0 | 0 | 0      |
| 0 | 1 | 1      |
| 1 | 0 | 1      |
| 1 | 1 | 0      |

Cannot be separated by a single line.

[Figure: the four XOR points in the (p, q) plane; the + and − classes are not linearly separable.]

## XOR

p XOR q = (p OR q) AND (p NAND q)

[Diagram: a two-layer network with OR and NAND units feeding an AND output unit; the OR and NAND decision lines together separate the + and − points in the (p, q) plane.]

## Hidden Layer Representations

| Input (p q) | Hidden (OR, NAND) | Output (AND) |
|-------------|-------------------|--------------|
| 0 0         | 0 1               | 0            |
| 0 1         | 1 1               | 1            |
| 1 0         | 1 1               | 1            |
| 1 1         | 1 0               | 0            |
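The table above can be checked by composing three threshold units. A minimal sketch (the exact weight values are illustrative assumptions; any weights realizing OR, NAND and AND would do):

```python
def step(weights, inputs):
    """A threshold unit: fires when the weighted sum exceeds zero."""
    s = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if s > 0 else 0

# Hand-set weights realizing each logical unit.
OR_W   = [-0.3,  0.5,  0.5]
NAND_W = [ 0.8, -0.5, -0.5]
AND_W  = [-0.8,  0.5,  0.5]

def xor(p, q):
    """XOR as in the table: AND applied to the hidden (OR, NAND) pair."""
    hidden = (step(OR_W, (p, q)), step(NAND_W, (p, q)))
    return step(AND_W, hidden)
```

The hidden layer remaps the four input points so that the classes become linearly separable for the output unit.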

## The Sigmoid Threshold Unit

[Diagram: the same structure as the perceptron (inputs x_1, …, x_n, weights w_0, …, w_n, constant input x_0 = 1), with the hard threshold replaced by the sigmoid function σ.]
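A minimal sketch of the sigmoid and its derivative (function names are my own; the formulas are the standard ones):

```python
import math

def sigmoid(y):
    """The sigmoid (logistic) function: sigma(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    """sigma'(y) = sigma(y) * (1 - sigma(y)).

    This simple closed form of the derivative is what makes the
    sigmoid so convenient for gradient-based training.
    """
    s = sigmoid(y)
    return s * (1.0 - s)
```

Unlike the hard threshold, the sigmoid is smooth and differentiable everywhere, which is what backpropagation requires.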

## Backpropagation Rule

- x_ji = the i-th input to unit j
- w_ji = the weight associated with the i-th input to unit j
- net_j = ∑_i w_ji x_ji (the weighted sum of inputs for unit j)
- o_j = the output of unit j
- t_j = the target output of unit j
- σ = the sigmoid function
- outputs = the set of units in the final layer
- Downstream(j) = the set of units directly taking the output of unit j as inputs

## Training Rule for Output Units

δ_k = o_k(1 − o_k)(t_k − o_k)

Δw_ji = η δ_j x_ji

## Training Rule for Hidden Units

δ_h = o_h(1 − o_h) ∑_{k ∈ Downstream(h)} w_kh δ_k

## BP Framework

BACKPROPAGATION(training_examples, η, n_in, n_out, n_hidden)

- Create a network with n_in inputs, n_hidden hidden units and n_out output units.
- Initialize all network weights to small random numbers.
- Until the termination condition is met, do:
  - For each ⟨x, t⟩ in training_examples, do:
    - Input the instance x to the network and compute the output o of every unit.
    - For each output unit k, calculate its error term δ_k.
    - For each hidden unit h, calculate its error term δ_h.
    - Update each network weight w_ji.
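The framework above can be sketched in plain Python for a single hidden layer of sigmoid units. All names, the fixed random seed, and the epoch-count termination test are my own assumptions (the lecture leaves the termination condition open); the error terms and the weight update follow the standard rules δ_k = o_k(1 − o_k)(t_k − o_k), δ_h = o_h(1 − o_h)∑_k w_kh δ_k, and Δw_ji = η δ_j x_ji.

```python
import math
import random

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def backpropagation(training_examples, eta, n_in, n_out, n_hidden, epochs=2000):
    """Stochastic backpropagation for one hidden layer of sigmoid units.

    training_examples is a list of (x, t) pairs, with x and t tuples of
    length n_in and n_out; index 0 of each weight vector is the bias.
    Returns a predict(x) function using the trained weights.
    """
    random.seed(0)  # repeatable initial weights
    w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
           for _ in range(n_out)]

    def forward(x):
        h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
             for w in w_h]
        o = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], h)))
             for w in w_o]
        return h, o

    for _ in range(epochs):
        for x, t in training_examples:
            h, o = forward(x)
            # error term of each output unit k: dk = ok(1 - ok)(tk - ok)
            d_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
            # error term of each hidden unit h: dh = oh(1 - oh) * sum_k w_kh dk
            d_h = [h[j] * (1 - h[j]) *
                   sum(w_o[k][j + 1] * d_o[k] for k in range(n_out))
                   for j in range(n_hidden)]
            # weight updates: w_ji <- w_ji + eta * dj * x_ji
            for k in range(n_out):
                w_o[k][0] += eta * d_o[k]
                for j in range(n_hidden):
                    w_o[k][j + 1] += eta * d_o[k] * h[j]
            for j in range(n_hidden):
                w_h[j][0] += eta * d_h[j]
                for i in range(n_in):
                    w_h[j][i + 1] += eta * d_h[j] * x[i]

    return lambda x: forward(x)[1]

def squared_error(predict, examples):
    return sum((tk - ok) ** 2 for x, t in examples
               for tk, ok in zip(t, predict(x)))

# Training on XOR: 2 inputs, 1 output, 3 hidden units.
xor_data = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
before = backpropagation(xor_data, 0.5, 2, 1, 3, epochs=0)
after = backpropagation(xor_data, 0.5, 2, 1, 3, epochs=2000)
```

Because the error surface is multimodal, a single run is not guaranteed to solve XOR perfectly; comparing the error before and after training is the robust check here.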

More about BP Networks …  Convergence and Local Minima  The search space is likely to be highly multimodal.  May easily get stuck at a local solution.  Need multiple trials with different initial weights.  Evolving Neural Networks  Black-box optimization techniques (e.g., Genetic Algorithms)  Usually better accuracy  Can do some advanced training (e.g., structure + parameter).  Xin Yao (1999) “Evolving Artificial Neural Networks”, Proceedings of the IEEE, pp. 1423-1447.  Representational Power  Deep Learning 24

More about BP Networks …  Overfitting  Tend to occur during later iterations.  Use validation dataset to terminate the training when necessary.  Practical Considerations  Momentum  Adaptive learning rate Small: slow convergence, easy to get stuck Large: fast convergence, unstable 25 Time Error Training Validation Weight Error

## Beyond BP Networks

Elman Network

Example task: predict the next symbol of a temporal XOR sequence (only every third symbol is predictable):

    In:  0 1 1 0 0 0 1 1 0 1 0 1 …
    Out: ? 1 ? ? 0 ? ? 0 ? ? 1 ? …

## Beyond BP Networks

Hopfield Network

[Figures: a Hopfield network and its energy landscape.]

## When does ANN work?

- Instances are represented by attribute-value pairs.
  - Input values can be any real values.
- The target output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- The training samples may contain errors.
- Long training times are acceptable.
  - Can range from a few seconds to several hours.
- Fast evaluation of the learned target function may be required.
- The ability to understand the learned function is not important.
  - Weights are difficult for humans to interpret.

## Reading Materials

- Text books
  - Richard O. Duda et al., Pattern Classification, Chapter 6, John Wiley & Sons Inc.
  - Tom Mitchell, Machine Learning, Chapter 4, McGraw-Hill.
  - http://page.mi.fu-berlin.de/rojas/neural/index.html.html
- Online demos
  - http://neuron.eng.wayne.edu/software.html
  - http://www.cbu.edu/~pong/ai/hopfield/hopfield.html
- Online tutorials
  - http://www.autonlab.org/tutorials/neural13.pdf
  - http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
- Wikipedia & Google

## Review

- What is the biological motivation of ANN?
- When does ANN work?
- What is a perceptron?
- How to train a perceptron?
- What is the limitation of perceptrons?
- How does ANN solve non-linearly separable problems?
- What is the key idea of the Backpropagation algorithm?
- What are the main issues of BP networks?
- What are examples of other types of ANN?

## Next Week's Class Talk

- Volunteers are required for next week's class talk.
- Topic 1: Applications of ANN
- Topic 2: Recurrent Neural Networks
- Hints:
  - Robot driving
  - Character recognition
  - Face recognition
  - Hopfield Network
- Length: 20 minutes plus question time

## Assignment

- Topic: Training Feedforward Neural Networks
- Technique: BP algorithm
- Task 1: XOR problem
  - 4 input samples: 0 0 → 0, 0 1 → 1, 1 0 → 1, 1 1 → 0
- Task 2: Identity function
  - 8 input samples: 10000000 → 10000000, 00010000 → 00010000, …
  - Use 3 hidden units.
- Deliverables:
  - Report
  - Code (any programming language, with detailed comments)
- Due: Sunday, 14 December
- Credit: 15%