
1 MACHINE LEARNING 12. Multilayer Perceptrons
Lecture notes based on E. Alpaydın, Introduction to Machine Learning, © The MIT Press, 2004 (V1.1)

2 Neural Networks
 Networks of processing units (neurons) with connections (synapses) between them
 Large number of neurons: ~10^10
 Large connectivity: ~10^5 connections per neuron
 Parallel processing
 Distributed computation/memory
 Robust to noise and failures

3 Perceptron
 Learn the weights for a given task

4 Perceptron
 The weights define a hyperplane (a line in 2D)
 The hyperplane splits the space into two parts (above/below it)
 It can therefore split data into two classes (a linear discriminant)
 The output is a threshold function of the weighted input
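
A minimal sketch of the thresholded perceptron just described; the function name and the use of NumPy are illustrative, not from the slides:

```python
import numpy as np

def perceptron_predict(w, w0, x):
    """Threshold perceptron: 1 if x lies on the positive side of the
    hyperplane w.x + w0 = 0, else 0."""
    return 1 if np.dot(w, x) + w0 > 0 else 0

# 2D example: the hyperplane is the line x1 + x2 - 1 = 0.
w, w0 = np.array([1.0, 1.0]), -1.0
print(perceptron_predict(w, w0, np.array([1.0, 1.0])))  # 1: above the line
print(perceptron_predict(w, w0, np.array([0.0, 0.0])))  # 0: below the line
```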

5 K Outputs: Classification
 One output unit per class; choose the class whose output is largest

6 Online Learning
 Batch training: all data are given at once
 Online training: sample by sample
 Update the parameters on each new instance
 Saves memory
 Handles data that changes over time

7 Online Learning: Regression
 Error on one instance: E^t = (1/2)(r^t − y^t)^2, with y^t = w · x^t
 Update: Δw_j = η (r^t − y^t) x_j^t
 Use a gradually decreasing learning factor η
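
A sketch of this online update in Python; the synthetic data stream and the 1/t learning-rate schedule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def online_regression_step(w, x, r, eta):
    """One stochastic step on the squared error (1/2)(r - y)^2,
    where y = w.x and the bias is folded in via a constant input of 1."""
    y = np.dot(w, x)
    w += eta * (r - y) * x      # Delta w_j = eta * (r - y) * x_j
    return w

# Illustrative stream: noisy samples of r = 2*x1 - x2 + 0.5.
w = np.zeros(3)
for t in range(1, 1001):
    x = np.append(rng.normal(size=2), 1.0)            # [x1, x2, bias]
    r = 2 * x[0] - x[1] + 0.5 + rng.normal(0, 0.1)
    w = online_regression_step(w, x, r, eta=1.0 / t)  # decreasing eta
print(w)  # drifts toward [2, -1, 0.5]
```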

8 Online Learning: Classification
 Input: (y, r), where y is a data instance and r = 1 if it comes from class C_1
 Option I: use the threshold function
 Difficult to treat analytically: the threshold is not differentiable, so there is no gradient

9 Online Learning: Classification
 Replace the threshold with a sigmoid
 The sigmoid output is interpreted as the probability of a class
 The true (training) probability r is either 0 or 1
 Error measure: cross-entropy, E = −r log y − (1 − r) log(1 − y)

10 Online Learning: Classification
 Reduce the cross-entropy error by gradient descent
 Gradient descent rule: Δw_j = η (r − y) x_j
 That is, Update = LearningFactor × (DesiredOutput − ActualOutput) × Input
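
A sketch of the sigmoid/cross-entropy step (names are illustrative); note that it has exactly the same form as the regression rule above:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def online_logistic_step(w, x, r, eta=0.1):
    """One cross-entropy gradient step. y = sigmoid(w.x) is the
    predicted P(C1|x); r is 0 or 1. The gradient works out to
    Delta w_j = eta * (r - y) * x_j."""
    y = sigmoid(np.dot(w, x))
    w += eta * (r - y) * x
    return w
```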

11 Example: Learning a Boolean Function
 Two Boolean inputs (0 or 1)
 Single Boolean output (0 or 1)
 Can be seen as a classification problem
 Not all Boolean functions are linearly separable, so not all can be learned by a perceptron

12 Learning Boolean AND
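
AND is linearly separable. As a concrete check (these weights are picked by hand for illustration, not produced by training), w_1 = w_2 = 1 with w_0 = −1.5 works, since x_1 + x_2 − 1.5 > 0 only when both inputs are 1:

```python
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if x1 + x2 - 1.5 > 0 else 0   # perceptron with w=(1,1), w0=-1.5
        print(x1, x2, '->', y)              # matches x1 AND x2
```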

13 XOR (Minsky and Papert, 1969)
 No w_0, w_1, w_2 satisfy all four constraints:
 x = (0,0), r = 0: w_0 ≤ 0
 x = (0,1), r = 1: w_2 + w_0 > 0
 x = (1,0), r = 1: w_1 + w_0 > 0
 x = (1,1), r = 0: w_1 + w_2 + w_0 ≤ 0
 Adding the two strict inequalities gives w_1 + w_2 + 2w_0 > 0, which contradicts the first and last constraints, so XOR is not linearly separable

14 Multilayer Perceptron (MLP)
 A single perceptron can learn only a linear discriminant or a linear regression
 Introduce intermediate (hidden) layers
 Use a non-linear “activation” function in the hidden layers

15 Multilayer Perceptrons

16 Multilayer Perceptron
 Hidden layer: a non-linear transformation from the d-dimensional input space to an H-dimensional space
 The second (output) layer implements a linear combination of these non-linear basis functions
 More hidden layers can be added
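
A minimal sketch of this forward pass for regression, assuming sigmoid hidden units; the weight shapes, W of size H × (d+1) and v of size H+1, and the names are my own:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(W, v, x):
    """One-hidden-layer MLP: the hidden layer maps x (d-dim) to z (H-dim)
    non-linearly; the output is a linear combination of z."""
    z = sigmoid(W @ np.append(x, 1.0))    # z_h = sigmoid(w_h . [x, 1])
    return np.dot(v, np.append(z, 1.0))   # y = sum_h v_h z_h + v_0
```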

17 x_1 XOR x_2 = (x_1 AND ~x_2) OR (~x_1 AND x_2)
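
One hand-crafted network matching this decomposition (weights chosen by inspection for illustration, not learned): hidden unit z_1 fires for x_1 AND ~x_2, z_2 fires for ~x_1 AND x_2, and the output ORs them:

```python
step = lambda a: 1 if a > 0 else 0

def xor_mlp(x1, x2):
    z1 = step(x1 - x2 - 0.5)     # x1 AND NOT x2
    z2 = step(x2 - x1 - 0.5)     # NOT x1 AND x2
    return step(z1 + z2 - 0.5)   # z1 OR z2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, '->', xor_mlp(x1, x2))  # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```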

18 Universal Approximator
 An MLP with one hidden layer and a sufficiently large number of hidden units can approximate any continuous nonlinear function of the input

19 Backpropagation: Intro

20 Backpropagation: Regression
 Model: y^t = Σ_h v_h z_h^t + v_0, where z_h^t = sigmoid(w_h · x^t)
 Error: E = (1/2) Σ_t (r^t − y^t)^2
 Update the second layer: Δv_h = η Σ_t (r^t − y^t) z_h^t

21 Backpropagation: Regression
 Use the chain rule to obtain the first-layer update:
 Δw_hj = η Σ_t (r^t − y^t) v_h z_h^t (1 − z_h^t) x_j^t

22 Backpropagation: Regression
 Update the first layer using the old values (v) of the second layer
 Update the second layer using the new values of the hidden units
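
Putting slides 20-22 together, a hedged sketch of one online backprop step for a one-hidden-layer regression MLP (shapes and names are my own; both gradients are computed before either layer's weights change, with the first-layer gradient using the old v):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(W, v, x, r, eta=0.1):
    """One online update. W: (H, d+1) first-layer weights,
    v: (H+1,) second-layer weights (last entries are biases)."""
    xb = np.append(x, 1.0)      # input with bias term
    z = sigmoid(W @ xb)         # hidden activations z_h
    zb = np.append(z, 1.0)      # hidden layer with bias term
    y = np.dot(v, zb)           # linear output
    err = r - y
    # Second layer: Delta v_h = eta * (r - y) * z_h
    dv = eta * err * zb
    # First layer (chain rule, using the old v):
    # Delta w_hj = eta * (r - y) * v_h * z_h * (1 - z_h) * x_j
    dW = eta * err * (v[:-1] * z * (1 - z))[:, None] * xb[None, :]
    return W + dW, v + dv
```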


24 (Figure: MLP computation. Hidden units compute z_h = sigmoid(w_h · x + w_0); the output combines them as Σ_h v_h z_h)

25 Two-Class Discrimination
 One sigmoid output: y^t estimates P(C_1 | x^t), and P(C_2 | x^t) ≡ 1 − y^t

26 Multiple Hidden Layers
 An MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks

27 Gradient Descent
 Simple
 Local: changing a weight requires only the input and output of a single perceptron
 Can be used with online training (no need to store the data)
 Can be implemented in hardware
 May converge slowly
 Several techniques are frequently used to improve convergence (next slides)

28 Momentum
 At each parameter update, successive Δw values may differ so much that large oscillations occur and slow convergence
 Take a running average by incorporating the previous update: Δw_i^t = −η ∂E^t/∂w_i + α Δw_i^{t−1}
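
A sketch of this update (the constants and names are illustrative):

```python
def momentum_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Delta w_t = -eta * gradient + alpha * Delta w_{t-1}.
    The running average damps oscillations between successive steps."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta   # carry delta into the next call
```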

29 Adaptive Learning Rate
 Increase the learning rate while the error keeps decreasing: we are moving in the right direction
 Decrease the learning rate when the error increases: we have overshot the minimum
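
One common realization of this rule, increasing η additively and decreasing it multiplicatively (the constants a and b are illustrative assumptions):

```python
def adapt_eta(eta, err, prev_err, a=0.01, b=0.5):
    """Grow eta by a while the error falls; shrink it by a factor
    (1 - b) when the error rises, i.e. when the step overshot."""
    return eta + a if err < prev_err else eta * (1 - b)
```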

30 Overfitting/Overtraining
 Overfitting: too many parameters/hidden nodes
 The network fits the noise present in the training sample
 Poor generalization error
 Use cross-validation to choose the network size
 Overtraining:
 The error function has several minima
 The most important weights become non-zero first
 As training continues, more weights move away from zero, adding more effective parameters
 Stop earlier, using cross-validation

31 Overfitting
Number of weights: H(d + 1) + (H + 1)K
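
For example (illustrative numbers, not from the slides): with d = 10 inputs, H = 5 hidden units, and K = 2 outputs, the network has 5·(10 + 1) + (5 + 1)·2 = 55 + 12 = 67 weights.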


33 Structuring the Network
 In some applications the input has a “local structure”
 Nearby pixels are correlated
 Speech samples close in time are correlated
 Hidden units are not connected to all inputs
 Define windows over the input
 The number of parameters is reduced
 Connect nodes only to a subset of nodes in other layers
 Hierarchical cone: features become more complex at higher layers
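
A sketch of such local connectivity for a 1D input (the window width, weights, and the tanh non-linearity are illustrative assumptions, not the slides' exact architecture):

```python
import numpy as np

def local_hidden_layer(x, W, width=3):
    """Each hidden unit sees only a window of the input instead of all
    of it, so W needs one row of 'width' weights per hidden unit."""
    H = len(x) - width + 1
    z = np.empty(H)
    for h in range(H):
        z[h] = np.tanh(np.dot(W[h], x[h:h + width]))  # window [h, h+width)
    return z

x = np.arange(8.0)
W = np.full((6, 3), 0.1)   # 6 hidden units, 3 weights each
print(local_hidden_layer(x, W))
```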

34 Structured MLP

35 Hints
 We may have some prior expert knowledge
 Hint: a property of the target function that is known independently of the training examples
 For instance: invariance to rotation

36 Hints
 Create virtual examples
 Generate multiple rotated copies of each example and feed them to the MLP
 Preprocessing stage
 Scale and align all inputs
 Incorporate the hint into the network structure
 e.g. local structure

37 Tuning Network Size
 Networks with an unnecessarily large number of parameters generalize poorly
 There are too many possible configurations (nodes/connections) to try them all with cross-validation
 Use structural adaptation instead
 Destructive approach: gradually remove unnecessary nodes and connections
 Constructive approach: gradually add new nodes/connections

38 Destructive Method: Weight Decay
 Penalize large connection weights: E′ = E + (λ/2) Σ_i w_i², giving the update Δw_i = −η ∂E/∂w_i − ηλ w_i
 Connections whose weights stay close to zero effectively “disappear”
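
A sketch of the penalized update (names are illustrative; λ is folded into the gradient step):

```python
def weight_decay_update(w, grad, eta=0.1, lam=1e-3):
    """Gradient step on E' = E + (lam/2) * sum(w**2):
    Delta w_i = -eta * (dE/dw_i + lam * w_i), so weights the task
    does not need are pulled toward zero and 'disappear'."""
    return w - eta * (grad + lam * w)
```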

