Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, © The MIT Press, 2004

1 Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, © The MIT Press, 2004. alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml

2 CHAPTER 11: Multilayer Perceptrons

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

3 Neural Networks: networks of processing units (neurons) with connections (synapses) between them. Large number of neurons: $10^{10}$. Large connectivity: $10^5$. Parallel processing. Distributed computation/memory. Robust to noise, failures.

4 Understanding the Brain. Levels of analysis (Marr, 1982): 1. Computational theory, 2. Representation and algorithm, 3. Hardware implementation. Reverse engineering: from hardware to theory. Parallel processing: SIMD vs MIMD. Neural net: SIMD with modifiable local memory. Learning: update by training/experience.

5 Perceptron (Rosenblatt, 1962)

6 What a Perceptron Does. Regression: $y = wx + w_0$. Classification: $y = 1(wx + w_0 > 0)$. [Figure: three perceptron diagrams showing input $x$, bias input $x_0 = +1$, weights $w$ and $w_0$, output $y$, and an intermediate node $s$.]
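The two uses of a perceptron on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and do not come from the slides, and the input is written as a list so the same code covers more than one input dimension.

```python
def perceptron_output(w, w0, x):
    """Weighted sum w.x + w0, i.e. the linear (regression) output."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w0


def perceptron_classify(w, w0, x):
    """Threshold unit: returns 1 if w.x + w0 > 0, otherwise 0."""
    return 1 if perceptron_output(w, w0, x) > 0 else 0


# Regression: y = 2*x + 1 evaluated at x = 3
y = perceptron_output([2.0], 1.0, [3.0])

# Classification: the same kind of unit followed by a threshold at 0
label = perceptron_classify([1.0], -0.5, [1.0])
```

The bias $w_0$ can equivalently be treated as the weight of an extra input fixed at $x_0 = +1$, which is how the slide's diagram draws it.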

7 K Outputs. Classification: Regression:

8 Training. Online (instances seen one by one) vs batch (whole sample) learning: no need to store the whole sample; problem may change in time; wear and degradation in system components. Stochastic gradient descent: update after a single pattern. Generic update rule (LMS rule):

9 Training a Perceptron: Regression. Regression (linear output):
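A minimal sketch of online training for a linear-output perceptron follows. It assumes the usual squared-error criterion and the LMS-style update, update = learning factor × (desired output − actual output) × input; the symbols `eta` (learning factor) and `r` (desired output), the function name, and the toy data are assumptions made for this illustration rather than content recovered from the slide.

```python
import random


def train_linear_perceptron(data, eta=0.1, epochs=200, seed=0):
    """Online (stochastic) gradient descent for y = w*x + w0."""
    rng = random.Random(seed)
    w, w0 = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)          # visit instances one by one, in random order
        for x, r in samples:
            y = w * x + w0            # actual output
            err = r - y               # desired minus actual
            w += eta * err * x        # update after a single pattern
            w0 += eta * err           # bias sees a constant input of +1
    return w, w0


# Toy, noise-free data generated from r = 2x + 1 (hypothetical example)
data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, w0 = train_linear_perceptron(data)
```

Because each update uses only the current instance, the whole sample never has to be stored, which is the first advantage of online learning listed on the Training slide.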

10 Classification. Single sigmoid output. K>2 softmax outputs.
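The two output nonlinearities named on this slide can be written as below. This is a sketch using their standard definitions; subtracting the maximum score inside `softmax` is a numerical-stability choice added here, not something stated in the slides.

```python
import math


def sigmoid(a):
    """Logistic function, used for a single two-class output."""
    return 1.0 / (1.0 + math.exp(-a))


def softmax(scores):
    """Softmax over K scores; the outputs are positive and sum to 1."""
    m = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


p_single = sigmoid(0.0)
p_multi = softmax([1.0, 2.0, 3.0])
```

With K = 2, softmax over the scores $(a, 0)$ gives the same first probability as `sigmoid(a)`, so the sigmoid can be read as the two-class special case.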

11 Learning Boolean AND
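Boolean AND is linearly separable, so a single threshold unit can represent it. The sketch below checks one possible weight setting; the values $w_0 = -1.5$, $w_1 = w_2 = 1$ are an assumed example (many other settings work), and the names are hypothetical.

```python
def threshold_unit(w0, w1, w2, x1, x2):
    """Return 1 if w1*x1 + w2*x2 + w0 > 0, otherwise 0."""
    return 1 if w1 * x1 + w2 * x2 + w0 > 0 else 0


# Assumed weights: the line x1 + x2 = 1.5 separates (1, 1) from the other inputs.
AND_WEIGHTS = (-1.5, 1.0, 1.0)

and_table = {
    (x1, x2): threshold_unit(*AND_WEIGHTS, x1, x2)
    for x1 in (0, 1)
    for x2 in (0, 1)
}
```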

12 XOR. No $w_0$, $w_1$, $w_2$ satisfy: (Minsky and Papert, 1969)
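The inequalities themselves did not survive in this transcript. A reconstruction under the usual convention (output 1 when $w_1 x_1 + w_2 x_2 + w_0 > 0$, output 0 otherwise) would read as follows; the symbol $r$ for the desired output is an assumption of this sketch.

```latex
\begin{align*}
(x_1, x_2) = (0, 0),\ r = 0 &: \quad w_0 \le 0 \\
(x_1, x_2) = (0, 1),\ r = 1 &: \quad w_2 + w_0 > 0 \\
(x_1, x_2) = (1, 0),\ r = 1 &: \quad w_1 + w_0 > 0 \\
(x_1, x_2) = (1, 1),\ r = 0 &: \quad w_1 + w_2 + w_0 \le 0
\end{align*}
% Adding the second and third lines gives  w_1 + w_2 + 2 w_0 > 0.
% Adding the first and fourth lines gives  w_1 + w_2 + 2 w_0 \le 0.
% The two cannot both hold, so no single threshold unit computes XOR.
```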

13 Multilayer Perceptrons (Rumelhart et al., 1986)

14 $x_1$ XOR $x_2$ = ($x_1$ AND ~$x_2$) OR (~$x_1$ AND $x_2$)
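The decomposition above maps directly onto a network with two hidden threshold units and one output unit. The following sketch hard-codes one set of weights that realizes it; the specific weight values and function names are assumptions chosen for illustration.

```python
def step(a):
    """Threshold activation: 1 if a > 0, otherwise 0."""
    return 1 if a > 0 else 0


def xor_mlp(x1, x2):
    z1 = step(x1 - x2 - 0.5)      # hidden unit 1: x1 AND (NOT x2)
    z2 = step(-x1 + x2 - 0.5)     # hidden unit 2: (NOT x1) AND x2
    return step(z1 + z2 - 0.5)    # output unit: z1 OR z2


xor_table = {(x1, x2): xor_mlp(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
```

The hidden layer maps the four inputs into a space where the two classes become linearly separable, which is what a single perceptron could not do.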

15 Backpropagation

16 Regression: forward and backward passes (input $x$)
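A compact sketch of the forward and backward passes for regression is given below. It assumes one hidden layer of sigmoid units, a single linear output, squared error, and online updates; the variable names (`W` for first-layer weights, `v` for second-layer weights, `r` for the desired output, `eta` for the learning factor) are choices made for this illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(W, v, x):
    """W: H rows [w_h0, w_h1, ...]; v: [v_0, v_1, ..., v_H]."""
    xb = [1.0] + list(x)                                    # bias input x_0 = +1
    z = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W]
    y = v[0] + sum(vh * zh for vh, zh in zip(v[1:], z))     # linear output
    return xb, z, y


def backprop_step(W, v, x, r, eta=0.1):
    """One online update in place; returns the squared error before the update."""
    xb, z, y = forward(W, v, x)
    err = r - y
    # First layer: the error is propagated back through v_h and the sigmoid
    # derivative z_h * (1 - z_h). The old v is used because v is updated afterwards.
    for h, row in enumerate(W):
        delta_h = err * v[h + 1] * z[h] * (1.0 - z[h])
        for j in range(len(row)):
            row[j] += eta * delta_h * xb[j]
    # Second layer: same form as the single-perceptron rule, with z as the input.
    v[0] += eta * err
    for h in range(len(z)):
        v[h + 1] += eta * err * z[h]
    return 0.5 * err * err


# Hypothetical starting weights for a network with H = 2 hidden units and d = 1 input
W = [[0.1, -0.2], [0.3, 0.4]]
v = [0.0, 0.5, -0.5]
first_error = backprop_step(W, v, [1.0], 1.0)
for _ in range(50):
    last_error = backprop_step(W, v, [1.0], 1.0)
```

Repeating the step on the same pattern drives its error toward zero; in practice the step would be applied to each training instance in turn over many epochs.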

17 Regression with Multiple Outputs [Figure: network diagram with inputs $x_j$, first-layer weights $w_{hj}$, hidden units $z_h$, second-layer weights $v_{ih}$, and outputs $y_i$.]

20 [Figure: panels showing the hidden unit weighted sums $w_h x + w_0$, the hidden unit outputs $z_h$, and the weighted hidden unit outputs $v_h z_h$.]

21 Two-Class Discrimination. One sigmoid output $y^t$ for $P(C_1 | x^t)$, and $P(C_2 | x^t) \equiv 1 - y^t$
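With a single sigmoid output read as $P(C_1 | x^t)$, the error usually minimized is the cross-entropy between the outputs and the 0/1 labels. The sketch below assumes that criterion; the label symbol `r` (1 for $C_1$, 0 for $C_2$) and the function name are assumptions of this example.

```python
import math


def cross_entropy(targets, outputs):
    """E = -sum_t [ r^t log y^t + (1 - r^t) log(1 - y^t) ]."""
    return -sum(
        r * math.log(y) + (1 - r) * math.log(1 - y)
        for r, y in zip(targets, outputs)
    )


# Two instances, one from each class, both predicted with probability 0.5
e_uncertain = cross_entropy([1, 0], [0.5, 0.5])

# The same instances predicted more confidently and correctly
e_confident = cross_entropy([1, 0], [0.9, 0.1])
```

The error shrinks as $y^t$ moves toward the correct label, and it grows without bound for confident wrong predictions.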

22 K>2 Classes

23 Multiple Hidden Layers. An MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks.

24 Improving Convergence. Momentum. Adaptive learning rate.
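Both techniques can be sketched as small update functions. The momentum form below adds a fraction `alpha` of the previous update to the current gradient step. The adaptive learning rate form (increase `eta` by a constant when the error falls, shrink it proportionally when it rises) is one common scheme and is an assumption here; the constants and function names are illustrative only.

```python
def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """delta_t = -eta * dE/dw + alpha * delta_{t-1}; returns (new_w, delta_t)."""
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta


def adapt_learning_rate(eta, prev_error, error, a=0.01, b=0.1):
    """Grow eta additively if the error decreased, shrink it multiplicatively otherwise."""
    if error < prev_error:
        return eta + a
    return eta - b * eta


# Minimizing E(w) = w^2 (gradient 2w) from w = 1 with momentum
w, delta = 1.0, 0.0
for _ in range(200):
    w, delta = momentum_step(w, 2.0 * w, delta)
```

Momentum averages successive updates, which damps oscillations across a narrow valley of the error surface while keeping progress along it.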

25 Overfitting/Overtraining. Number of weights: $H(d+1) + (H+1)K$
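The weight count follows from the architecture: each of the $H$ hidden units has $d$ input weights plus a bias, and each of the $K$ outputs has $H$ hidden-unit weights plus a bias. A small helper makes the arithmetic concrete; the function name and the example sizes are hypothetical.

```python
def num_weights(d, H, K):
    """Free parameters of an MLP with d inputs, H hidden units, and K outputs."""
    first_layer = H * (d + 1)     # d weights + 1 bias per hidden unit
    second_layer = (H + 1) * K    # H weights + 1 bias per output
    return first_layer + second_layer


small = num_weights(d=2, H=2, K=1)       # e.g. the XOR network: 6 + 3
larger = num_weights(d=256, H=30, K=10)  # 7710 + 310
```

Because the count grows with the product of layer sizes, adding hidden units quickly increases model complexity and with it the risk of overfitting.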

27 Structured MLP (Le Cun et al., 1989)

28 Weight Sharing

29 Hints. Invariance to translation, rotation, size. Virtual examples. Augmented error: $E' = E + \lambda_h E_h$. If $x'$ and $x$ are the "same": $E_h = [g(x|\theta) - g(x'|\theta)]^2$. Approximation hint: (Abu-Mostafa, 1995)
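The augmented error can be sketched as follows. The model `g` used in the example is a hypothetical linear function, and the pair `x`, `x_prime` is simply assumed to be "the same" under some invariance; only the two formulas themselves come from the slide.

```python
def hint_error(g, theta, x, x_prime):
    """E_h = [g(x | theta) - g(x' | theta)]^2 for a pair that should give the same output."""
    return (g(x, theta) - g(x_prime, theta)) ** 2


def augmented_error(error, hint_penalty, lambda_h):
    """E' = E + lambda_h * E_h."""
    return error + lambda_h * hint_penalty


def linear_model(x, theta):
    """Hypothetical model used only for this illustration."""
    return sum(t * xi for t, xi in zip(theta, x))


theta = (1.0, 3.0)
e_h = hint_error(linear_model, theta, (1.0, 2.0), (2.0, 1.0))   # (7 - 5)^2
e_aug = augmented_error(0.5, e_h, lambda_h=0.1)
```

Minimizing $E'$ pushes the parameters toward settings where the model answers identically on inputs the hint declares equivalent, without requiring extra labeled data.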

30 Tuning the Network Size. Destructive: weight decay. Constructive: growing networks (Ash, 1989; Fahlman and Lebiere, 1989).
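The weight decay equation is missing from this transcript. One common form, assumed here, subtracts a small multiple of each weight at every update, which corresponds to adding a penalty $\frac{\lambda}{2} \sum_i w_i^2$ to the error; the function name and constants are illustrative.

```python
def weight_decay_step(weights, grads, eta=0.1, lam=0.01):
    """Assumed form: delta_w_i = -eta * dE/dw_i - lam * w_i."""
    return [w - eta * g - lam * w for w, g in zip(weights, grads)]


# With zero gradient, every weight simply shrinks toward zero
decayed = weight_decay_step([1.0, -2.0], [0.0, 0.0], eta=0.1, lam=0.1)
```

Weights that the data does not keep pushing away from zero decay over time, so unneeded connections are effectively removed, which is why the method counts as destructive.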

31 Bayesian Learning. Consider weights $w_i$ as random variables with prior $p(w_i)$. Weight decay, ridge regression, regularization: cost = data-misfit + $\lambda$ complexity
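The link between a prior on the weights and weight decay can be made explicit with a short derivation. The sketch below assumes independent zero-mean Gaussian priors with variance $1/(2\lambda)$ and writes $E$ for the negative log likelihood; these choices are assumptions made for the illustration.

```latex
\begin{align*}
\hat{w}_{\mathrm{MAP}}
  &= \arg\max_{w} \log p(w \mid \mathcal{X})
   = \arg\max_{w} \left[ \log p(\mathcal{X} \mid w) + \log p(w) \right] \\
p(w) &= \prod_i p(w_i), \qquad
p(w_i) = c \exp\!\left[ -\frac{w_i^2}{2\,(1/2\lambda)} \right]
       = c \exp\!\left( -\lambda w_i^2 \right) \\
-\log p(w) &= \lambda \sum_i w_i^2 + \text{const} \\
E' &= E + \lambda \lVert w \rVert^2,
\qquad E \equiv -\log p(\mathcal{X} \mid w)
\end{align*}
```

Maximizing the posterior is therefore the same as minimizing data misfit plus $\lambda$ times a complexity term, with a tighter prior (larger $\lambda$) giving stronger decay.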

32 Dimensionality Reduction

34 Learning Time. Applications: sequence recognition (speech recognition); sequence reproduction (time-series prediction); sequence association. Network architectures: time-delay networks (Waibel et al., 1989); recurrent networks (Rumelhart et al., 1986).

35 Time-Delay Neural Networks
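The slide's diagram is not preserved. The basic idea of a time-delay network is that the current input and a fixed number of delayed copies are presented together as one input vector to an ordinary feedforward network. The helper below sketches only that windowing step; the function name and the ordering of the delayed values are assumptions.

```python
def delay_windows(sequence, n_delays):
    """For each t >= n_delays, return [x[t], x[t-1], ..., x[t-n_delays]]."""
    return [
        [sequence[t - k] for k in range(n_delays + 1)]
        for t in range(n_delays, len(sequence))
    ]


# Each row could be fed to an MLP as a single input vector
windows = delay_windows([1, 2, 3, 4], n_delays=2)
```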

36 Recurrent Networks

37 Unfolding in Time
