MACHINE LEARNING 12. Multilayer Perceptrons

Lecture notes based on E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press (V1.1).
Slide 2: Neural Networks
- Networks of processing units (neurons) with connections (synapses) between them
- Large number of neurons: ~10^10
- Large connectivity: ~10^5 connections per neuron
- Parallel processing
- Distributed computation/memory
- Robust to noise and failures
Slide 3: Perceptron
- Learn the weights for a given task
- The output is a weighted sum of the inputs: y = Σ_j w_j x_j + w_0 = w^T x
Slide 4: Perceptron
- The weights define a hyperplane (a line in 2D)
- The hyperplane splits the space into two parts (above/below)
- Can be used to split data into two classes (a linear discriminant)
- Threshold function: choose C1 if w^T x + w_0 > 0, choose C2 otherwise
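A minimal Python sketch of this threshold discriminant; the weights and the test point are illustrative, not from the slides:

```python
import numpy as np

def perceptron(x, w, w0):
    """Threshold discriminant: class 1 if the point lies above
    the hyperplane w^T x + w0 = 0, class 0 otherwise."""
    return 1 if np.dot(w, x) + w0 > 0 else 0

# Illustrative weights: the decision boundary is the 2D line x1 - x2 + 0.5 = 0.
print(perceptron(np.array([2.0, 1.0]), np.array([1.0, -1.0]), 0.5))  # -> 1
```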
Slide 5: K Outputs: Classification
- With K > 2 classes, use K perceptrons, one per class: y_i = w_i^T x
- Choose class C_i if y_i = max_k y_k
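A sketch of the K-output rule, with an illustrative weight matrix W (one row per class) and bias vector w0:

```python
import numpy as np

def classify(x, W, w0):
    """K linear outputs y_i = W[i] @ x + w0[i]; choose the class
    whose output is largest."""
    y = W @ x + w0
    return int(np.argmax(y))
```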
Slide 6: Online Learning
- Batch training: all data is given at once
- Online training: sample by sample
  - Update the parameters on each new instance
  - Saves memory
  - Handles data that changes in time
Slide 7: Online Learning: Regression
- Error on one instance: E^t(w | x^t, r^t) = (1/2)(r^t - y^t)^2
- Update: Δw_j^t = η (r^t - y^t) x_j^t
- Use a gradually decreasing learning factor η
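A minimal sketch of this online procedure in NumPy, assuming a noiseless linear target; all names, constants, and the decay schedule are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w, w0 = np.zeros(2), 0.0
eta0 = 0.1

for t in range(1000):
    x = rng.uniform(-1, 1, size=2)
    r = 3.0 * x[0] - 2.0 * x[1] + 1.0   # noiseless target, for the sketch only
    y = np.dot(w, x) + w0               # current model output
    eta = eta0 / (1 + t / 100)          # gradually decreasing learning factor
    w  += eta * (r - y) * x             # delta rule: eta * error * input
    w0 += eta * (r - y)

print(w, w0)  # should approach [3, -2] and 1
```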
Slide 8: Online Learning: Classification
- Input: pairs (y, r), where y is a data instance and r = 1 if it comes from class C1 (0 otherwise)
- Option I: use the threshold function directly
  - Difficult to treat analytically: the gradient is zero almost everywhere and undefined at the step
Slide 9: Online Learning: Classification
- Replace the threshold with a sigmoid
- The sigmoid output is interpreted as the probability of a class
- The true (training) probability r is either 0 or 1
- Error measure: cross-entropy, E = -r log y - (1 - r) log(1 - y)
Slide 10: Online Learning: Classification
- Reduce the cross-entropy error with the gradient descent rule
- Update = LearningFactor * (DesiredOutput - ActualOutput) * Input, i.e. Δw_j = η (r - y) x_j
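A sketch of the resulting rule (sigmoid output, cross-entropy error) on one instance; note that it has exactly the form LearningFactor * (DesiredOutput - ActualOutput) * Input:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def online_step(w, w0, x, r, eta):
    """One online update minimizing cross-entropy; the gradient of
    -r log y - (1-r) log(1-y) with respect to w_j is -(r - y) x_j."""
    y = sigmoid(np.dot(w, x) + w0)
    w  = w  + eta * (r - y) * x
    w0 = w0 + eta * (r - y)
    return w, w0
```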
Slide 11: Example: Learning a Boolean Function
- Two Boolean inputs (0 or 1), a single Boolean output (0 or 1)
- Can be seen as a classification problem
- Not all Boolean functions are linearly separable, so not all can be learned by a single perceptron
Slide 12: Learning Boolean AND
- AND is linearly separable, so a single perceptron can learn it
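One hand-picked solution that works (illustrative, not unique) is w1 = w2 = 1, w0 = -1.5; a quick check of the truth table:

```python
import numpy as np

w, w0 = np.array([1.0, 1.0]), -1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if w[0] * x1 + w[1] * x2 + w0 > 0 else 0
        print(x1, x2, "->", y)  # fires only for (1, 1)
```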
Slide 13: XOR
- No w_0, w_1, w_2 satisfy all four constraints (Minsky and Papert, 1969):
  - input (0,0), output 0: w_0 ≤ 0
  - input (1,0), output 1: w_1 + w_0 > 0
  - input (0,1), output 1: w_2 + w_0 > 0
  - input (1,1), output 0: w_1 + w_2 + w_0 ≤ 0
- Adding the middle two gives w_1 + w_2 + 2w_0 > 0, which together with w_0 ≤ 0 contradicts the last constraint
Slide 14: Multilayer Perceptron (MLP)
- A single perceptron can learn only a linear discriminant or a linear regression
- Introduce intermediate (hidden) layers
- Use a nonlinear "activation" function in the hidden layers
Slide 15: Multilayer Perceptrons
- Hidden units: z_h = sigmoid(w_h^T x) = 1 / (1 + exp(-(Σ_j w_hj x_j + w_h0)))
- Outputs: y_i = v_i^T z = Σ_h v_ih z_h + v_i0
Slide 16: Multilayer Perceptron
- The hidden layer is a nonlinear transformation from a d-dimensional to an H-dimensional space
- The second (output) layer implements a linear combination of these nonlinear basis functions
- More hidden layers can be added
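A sketch of the forward pass with one hidden layer, using illustrative shapes (d inputs, H hidden units, K outputs):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W, w0, V, v0):
    """x: (d,), W: (H, d), w0: (H,), V: (K, H), v0: (K,).
    The hidden layer is a nonlinear map R^d -> R^H; the output
    layer is a linear combination of the basis functions z_h."""
    z = sigmoid(W @ x + w0)  # nonlinear hidden representation
    y = V @ z + v0           # linear second layer
    return y, z
```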
Slide 17: XOR with an MLP
- x1 XOR x2 = (x1 AND NOT x2) OR (NOT x1 AND x2)
- Each parenthesized term is linearly separable, so one hidden unit can compute each; the output unit computes their OR
Slide 18: Universal Approximator
- An MLP with one hidden layer and an arbitrary number of hidden units can approximate any continuous nonlinear function of the input
Slide 19: Backpropagation: Intro
- To train the hidden-layer weights, the output error is propagated backward through the network using the chain rule
Slide 20: Backpropagation: Regression
- Model: y^t = Σ_h v_h z_h^t + v_0, with z_h^t = sigmoid(w_h^T x^t)
- Error: E(W, v | X) = (1/2) Σ_t (r^t - y^t)^2
- Update of the second layer: Δv_h = η Σ_t (r^t - y^t) z_h^t
Slide 21: Backpropagation: Regression
- Use the chain rule to obtain the first-layer update:
  Δw_hj = η Σ_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
Slide 22: Backpropagation: Regression
- Compute the first-layer update using the old values (v) of the second-layer weights, then apply the second-layer update
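A sketch of online backpropagation for regression with one output, following the two update rules above (both deltas are computed from the same forward pass, so the first layer uses the old v). The data, sizes, learning rate, and initialization are illustrative; this usually learns XOR, though gradient descent can get stuck in a local minimum:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
H, eta = 3, 0.3
W, w0 = rng.uniform(-0.3, 0.3, (H, 2)), np.zeros(H)  # first layer
v, v0 = rng.uniform(-0.3, 0.3, H), 0.0               # second layer

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
R = np.array([0.0, 1.0, 1.0, 0.0])                   # XOR targets

for epoch in range(10000):
    for x, r in zip(X, R):
        z = sigmoid(W @ x + w0)
        y = v @ z + v0
        # first layer: chain rule, using the OLD second-layer weights v
        dW  = eta * (r - y) * (v * z * (1 - z))[:, None] * x[None, :]
        dw0 = eta * (r - y) * v * z * (1 - z)
        # second layer
        dv, dv0 = eta * (r - y) * z, eta * (r - y)
        W += dW; w0 += dw0; v += dv; v0 += dv0

for x, r in zip(X, R):
    z = sigmoid(W @ x + w0)
    print(x, r, v @ z + v0)  # outputs usually approach 0, 1, 1, 0
```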
Slide 24: [Figure: evolution of the fit during training, showing the hidden-unit activations w_h x + w_h0, the sigmoid outputs z_h, and the weighted contributions v_h z_h]
Slide 25: Two-Class Discrimination
- Use one sigmoid output: y^t estimates P(C1 | x^t), and P(C2 | x^t) ≡ 1 - y^t
Slide 26: Multiple Hidden Layers
- An MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks
Slide 27: Gradient Descent
- Simple
- Local: changing a weight requires only the input and output of a single perceptron
- Can be used with online training (no need to store the data)
- Can be implemented in hardware
- May converge slowly; several techniques are frequently used to improve convergence
Slide 28: Momentum
- Successive Δw values may differ so much that large oscillations occur and convergence slows
- Take a running average by incorporating the previous value:
  Δw_i^t = -η ∂E^t/∂w_i + α Δw_i^(t-1)
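A sketch of the momentum update; grad stands for ∂E^t/∂w_i, and the value of alpha is illustrative:

```python
def momentum_step(w, grad, delta_prev, eta=0.1, alpha=0.9):
    """Running average of updates: the new step blends the fresh
    gradient step with the previous step, damping oscillations."""
    delta = -eta * grad + alpha * delta_prev
    return w + delta, delta
```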
Slide 29: Adaptive Learning Rate
- Keep increasing the learning rate while the error decreases: we are going in the right direction
- Decrease the learning rate when the error increases: we missed the minimum
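A sketch of one simple rule implementing this idea, with illustrative constants a and b (increase additively, cut multiplicatively):

```python
def adapt_eta(eta, err, err_prev, a=0.01, b=0.5):
    """Grow eta while the error falls; shrink it when the error
    rises (we likely stepped past the minimum)."""
    if err < err_prev:
        return eta + a           # right direction: speed up
    return eta * (1.0 - b)       # overshot: slow down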
Slide 30: Overfitting/Overtraining
- Overfitting: too many parameters/hidden nodes
  - Fits the noise present in the training sample
  - Poor generalization error
  - Use cross-validation to choose the network size
- Overtraining
  - The error function has several minima
  - The most important weights become non-zero first
  - As training continues, more weights move away from zero, effectively adding more parameters
  - Stop training early, using cross-validation
Slide 31: Overfitting
- Number of weights: H(d + 1) + (H + 1)K, i.e. H hidden units with d + 1 weights each, plus K outputs with H + 1 weights each
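As a worked example with illustrative sizes: d = 10 inputs, H = 5 hidden units, and K = 2 outputs give 5 * 11 + 6 * 2 = 55 + 12 = 67 weights.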
Slide 33: Structuring the Network
- In some applications the input has a "local structure"
  - Nearby pixels are correlated
  - Speech samples close in time are correlated
- Hidden units are not connected to all inputs: define a window over the input
- The number of parameters is reduced
- Connect nodes only to a subset of the nodes in other layers
- Hierarchical cone: features get more complex toward the output (see the sketch after this list)
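A sketch of a locally connected hidden layer (a 1D window of width 3 sliding over the input); shapes and names are illustrative. Each hidden unit sees only its window, so the first layer needs H * 3 weights instead of H * d:

```python
import numpy as np

def local_hidden(x, W, w0, width=3):
    """W: (H, width); hidden unit h is connected only to the window
    x[h : h+width] rather than to the whole input.
    Assumes len(x) >= H + width - 1."""
    H = W.shape[0]
    windows = np.stack([x[h:h + width] for h in range(H)])
    return 1.0 / (1.0 + np.exp(-(np.sum(W * windows, axis=1) + w0)))
```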
Slide 34: Structured MLP (figure)
Slide 35: Hints
- We may have some prior expert knowledge
- Hint: a property of the target function that is known independently of the training examples
- For instance: invariance to rotation
Slide 36: Hints
- Create virtual examples: generate multiple rotated copies of each example and feed them to the MLP (see the sketch after this list)
- Preprocessing stage: scale and align all inputs
- Incorporate the hint into the structure: local structure
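A sketch of the virtual-example idea for image inputs, using scipy.ndimage.rotate; the angle set is illustrative:

```python
from scipy import ndimage

def virtual_examples(image, angles=(-15, -10, -5, 5, 10, 15)):
    """Generate rotated copies of one training image; each copy keeps
    the original label (encoding the rotation-invariance hint)."""
    return [ndimage.rotate(image, angle, reshape=False) for angle in angles]
```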
Slide 37: Tuning the Network Size
- Networks with an unnecessarily large number of parameters generalize poorly
- There are too many possible configurations (nodes/connections) to try them all with cross-validation
- Use structural adaptation instead:
  - Destructive approach: gradually remove unnecessary nodes and connections
  - Constructive approach: gradually add new nodes/connections
Slide 38: Destructive Method: Weight Decay
- Penalize large connection weights: E' = E + (λ/2) Σ_i w_i^2
- Connections whose weights stay close to zero effectively "disappear"
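A sketch of the corresponding update: the penalty's gradient adds -η λ w_i to each step, pulling unused weights toward zero (constants illustrative):

```python
def weight_decay_step(w, grad, eta=0.1, lam=0.01):
    """Gradient step on the penalized error E + (lam/2) * sum(w**2);
    the extra term -eta * lam * w shrinks every weight each step."""
    return w - eta * grad - eta * lam * w
```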