
1
Artificial Intelligence Statistical learning methods Chapter 20, AIMA (only ANNs & SVMs)

2
Artificial neural networks The brain is a pretty intelligent system. Can we "copy" it? There are approximately 10^11 neurons in the brain, and approximately 23 × 10^9 neurons in the male cortex (females have about 15% fewer).

3
The simple model The McCulloch-Pitts model (1943): y = g(w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3). Image from Neuroscience: Exploring the Brain by Bear, Connors, and Paradiso.

4
Neuron firing rate (figure: firing rate as a function of depolarization)

5
Transfer functions g(z): the Heaviside function and the logistic function.
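The two transfer functions named on the slide can be sketched directly; both take the weighted sum z and map it to an activation:

```python
import math

def heaviside(z):
    """Heaviside step function: 1 if z >= 0, else 0 (hard threshold)."""
    return 1.0 if z >= 0 else 0.0

def logistic(z):
    """Logistic (sigmoid) function: a smooth, differentiable threshold."""
    return 1.0 / (1.0 + math.exp(-z))
```

The logistic function matters later: unlike the Heaviside step, it is differentiable, which is what gradient-descent training requires.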

6
The simple perceptron With {-1,+1} output representation. Traditionally (early 1960s) trained with perceptron learning.

7
Perceptron learning Repeat until no errors are made anymore:
1. Pick a random example [x(n), f(n)], where f(n) is the desired output.
2. If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.
3. If the classification is wrong, then update the parameters: w ← w + η f(n) x(n), where η, the learning rate, is a small positive number.
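The update rule above can be sketched as follows. This is a minimal implementation with {-1,+1} outputs; for simplicity it sweeps the examples in order rather than picking them at random, and the function name and defaults are illustrative, not from the slides:

```python
def perceptron_train(data, eta=0.3, max_epochs=100):
    """Perceptron learning.
    data: list of (x, f) with x a feature tuple and f in {-1, +1}.
    Returns weights [w0, w1, ..., wd], with w0 the bias term."""
    d = len(data[0][0])
    w = [0.0] * (d + 1)
    for _ in range(max_epochs):
        errors = 0
        for x, f in data:
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            y = 1 if z >= 0 else -1
            if y != f:                      # wrong classification: update
                errors += 1
                w[0] += eta * f             # bias update
                for i, xi in enumerate(x):
                    w[i + 1] += eta * f * xi
        if errors == 0:                     # no errors anymore: stop
            break
    return w
```

On a linearly separable problem such as the AND function of the following slides, this loop terminates with zero training errors.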

8
Example: Perceptron learning The AND function. Truth table (x1, x2 → f): (0,0) → −1, (0,1) → −1, (1,0) → −1, (1,1) → +1. Initial values: η = 0.3.

9
Example: Perceptron learning The AND function. This one is correctly classified, no action.

10
Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

11
Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

12
Example: Perceptron learning The AND function. This one is correctly classified, no action.

13
Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

14
Example: Perceptron learning The AND function. This one is incorrectly classified, learning action.

15
Example: Perceptron learning The AND function. Final solution.

16
Perceptron learning Perceptron learning is guaranteed to find a solution in finite time, if a solution exists. However, it cannot be generalized to more complex networks. Better to use gradient descent, based on formulating an error function over differentiable transfer functions.

17
Gradient search "Go downhill" on the error surface E(W): W(k+1) = W(k) + ΔW(k), with ΔW(k) = −η ∇E(W(k)). The learning rate η is set heuristically.
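The "go downhill" update can be sketched in a few lines. This is a generic one-parameter sketch (the function names and the example error function are illustrative, not from the slides):

```python
def gradient_descent(grad_E, w0, eta=0.1, steps=100):
    """Iterate W(k+1) = W(k) - eta * dE/dW, starting from w0.
    grad_E: function returning the gradient of the error at w."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad_E(w)
    return w

# Example error: E(w) = (w - 3)^2, so dE/dw = 2*(w - 3); minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Too large a learning rate η makes the iteration overshoot or diverge; too small a rate makes convergence slow — hence "set heuristically".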

18
The Multilayer Perceptron (MLP) Combine several single-layer perceptrons, each using a sigmoid transfer function, to map input to output. Can be trained using gradient descent.

19
Example: One hidden layer Can approximate any continuous function. Hidden-layer transfer function: sigmoid; output transfer function: sigmoid or linear.
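A forward pass through the one-hidden-layer network described above can be sketched as follows (sigmoid hidden units, linear output; the function names and weight layout are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W_hidden, W_out):
    """One-hidden-layer MLP forward pass.
    W_hidden: one weight list [w0, w1, ..., wd] per hidden unit (w0 = bias).
    W_out: output weights [v0, v1, ..., vH] over the hidden activations.
    Hidden units are sigmoid; the output here is linear."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
         for w in W_hidden]
    return W_out[0] + sum(vi * hi for vi, hi in zip(W_out[1:], h))
```

With enough hidden units, networks of exactly this form are universal approximators of continuous functions on compact sets.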

20
Training: Backpropagation (Gradient descent)
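The backpropagation equations themselves are not reproduced on the slide; the sketch below shows one standard gradient-descent step for the one-hidden-layer network of the previous slide (sigmoid hidden units, linear output, squared error E = ½(y − t)²). Names and shapes are assumptions for illustration:

```python
import numpy as np

def backprop_step(x, t, W1, b1, W2, b2, eta=0.1):
    """One gradient-descent step via backpropagation.
    x: input vector, t: target vector; returns updated parameters."""
    # Forward pass
    a = W1 @ x + b1                # hidden pre-activations
    h = 1.0 / (1.0 + np.exp(-a))   # sigmoid hidden activations
    y = W2 @ h + b2                # linear output
    # Backward pass (chain rule)
    delta_out = y - t                               # dE/dy for squared error
    delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # through the sigmoid
    # Gradient-descent updates
    W2 = W2 - eta * np.outer(delta_out, h)
    b2 = b2 - eta * delta_out
    W1 = W1 - eta * np.outer(delta_hid, x)
    b1 = b1 - eta * delta_hid
    return W1, b1, W2, b2
```

Iterating this step over the training set is gradient descent on the network's error, exactly the "go downhill" procedure of the earlier slide.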

21
Support vector machines

22
Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose?

23
Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose? Choose the line with the largest margin: the "large margin classifier".

24
Linear classifier on a linearly separable problem There are infinitely many lines that have zero training error. Which line should we choose? Choose the line with the largest margin: the "large margin classifier". The data points lying on the margin are the "support vectors".

25
Computing the margin The plane separating the two classes is defined by w^T x = a. The dashed (margin) planes are given by w^T x = a + b and w^T x = a − b.

26
Computing the margin Divide by b: define new w = w/b and α = a/b, so the margin planes become w^T x − α = ±1. We have thereby defined a scale for w and α.

27
Computing the margin For a point x⁺ on the plus plane and a point x⁻ on the minus plane we have w^T x⁺ − α = +1 and w^T x⁻ − α = −1, which gives w^T (x⁺ − x⁻) = 2, and hence margin = 2/||w||.
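With the scaling fixed as above, the margin depends only on the weight vector. A small sketch (the function name is illustrative):

```python
import math

def margin(w):
    """Margin of the scaled large-margin classifier: the distance between
    the planes w^T x - alpha = +1 and w^T x - alpha = -1 is 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

This is why maximizing the margin on the next slide becomes minimizing ||w||.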

28
Linear classifier on a linearly separable problem Maximizing the margin is equivalent to minimizing ||w|| subject to the constraints: w^T x(n) − α ≥ +1 for all n with f(n) = +1, and w^T x(n) − α ≤ −1 for all n with f(n) = −1. This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.

29
Quadratic programming problem Minimize the cost (Lagrangian) L_p = ½||w||² − Σ_n λ_n [f(n)(w^T x(n) − α) − 1]. The minimum of L_p occurs at the maximum of the Wolfe dual L_D = Σ_n λ_n − ½ Σ_n Σ_m λ_n λ_m f(n) f(m) x(n)^T x(m). Only scalar products x(n)^T x(m) appear in the cost. IMPORTANT!

30
Linear Support Vector Machine Test phase, the predicted output: y(x) = sign(Σ_n λ_n f(n) x(n)^T x − α), where α is determined e.g. by looking at one of the support vectors. Still only scalar products in the expression.
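The test-phase expression can be sketched directly, assuming the multipliers λ_n and α have already been found by the QP solver (all names below are illustrative):

```python
def svm_predict(x, svs, lambdas, fs, alpha):
    """Test-phase output of a linear SVM:
    y(x) = sign( sum_n lambda_n * f(n) * x(n)^T x - alpha ).
    svs: support vectors; lambdas: their multipliers; fs: labels in {-1,+1}."""
    s = sum(l * f * sum(xi * zi for xi, zi in zip(sv, x))
            for sv, l, f in zip(svs, lambdas, fs)) - alpha
    return 1 if s >= 0 else -1
```

Only scalar products sv · x appear, which is what makes the kernel substitution on the following slides possible.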

31
How do we deal with the nonlinear case? Project the data into a high-dimensional space Z, where we know it will be linearly separable (due to the VC dimension of the linear classifier). We don't even have to know the projection...!

32
The scalar product kernel trick If we can find a kernel K such that K[x(n), x(m)] = Φ(x(n))^T Φ(x(m)), then we don't even have to know the mapping Φ to solve the problem...
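The trick can be made concrete with a small example: for 2-D inputs, the degree-2 polynomial kernel K(x, y) = (x^T y)² equals the scalar product under an explicit feature map Φ, yet evaluating K never constructs Φ. (The map below is the standard one for this kernel; the function names are illustrative.)

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel (no constant term): K(x, y) = (x^T y)^2."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** 2

def phi(x):
    """Explicit feature map with poly_kernel(x, y) == phi(x)^T phi(y):
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)
```

Evaluating the kernel costs one 2-D scalar product; evaluating Φ explicitly would blow up quickly as the input dimension and polynomial degree grow.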

33
Valid kernels (Mercer's theorem) Define the matrix K with entries K_nm = K[x(n), x(m)]. If K is symmetric, K = K^T, and positive semi-definite, then K[x(i), x(j)] is a valid kernel.
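The Mercer conditions can be checked numerically on a given kernel matrix: symmetry plus non-negative eigenvalues (up to tolerance). A sketch, with illustrative names:

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-9):
    """Check the Mercer conditions on a kernel matrix:
    symmetric (K == K^T) and positive semi-definite (eigenvalues >= 0)."""
    K = np.asarray(K, dtype=float)
    if not np.allclose(K, K.T):
        return False
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))
```

For example, a Gaussian kernel matrix built on any set of points passes this check, while a symmetric matrix with a negative eigenvalue fails it.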

34
Examples of kernels First, the Gaussian kernel: K[x, y] = exp(−||x − y||² / (2σ²)). Second, the polynomial kernel: K[x, y] = (1 + x^T y)^d. With d = 1 we have the linear SVM. The linear SVM is often used with good success on high-dimensional data (e.g. text classification).

35
Example: Robot color vision (Competition 1999) Classify the Lego pieces into red, blue, and yellow. Classify white balls, black sideboard, and green carpet.

36
What the camera sees (RGB space) (figure: yellow, green, and red pixel clusters in RGB space)

37
Mapping RGB (3D) to rgb (2D)
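The slide does not spell out the mapping; the standard chromaticity normalization, which divides out intensity so that only two of the three normalized components are independent, is sketched below (names are illustrative):

```python
def rgb_normalize(R, G, B):
    """Map RGB (3-D) to normalized rgb (2-D) chromaticity:
    r = R/(R+G+B), g = G/(R+G+B); b = 1 - r - g is redundant, so drop it."""
    s = R + G + B
    if s == 0:
        return 0.0, 0.0  # black pixel: chromaticity undefined, pick the origin
    return R / s, G / s
```

Dividing by the total intensity makes the 2-D representation largely invariant to illumination brightness, which is why the Lego colors form compact clusters in rgb space.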

38
Lego in normalized rgb space Input is 2-D: (x1, x2). Output is 6-D: {red, blue, yellow, green, black, white}.

39
MLP classifier 2-3-1 MLP trained with Levenberg-Marquardt. E_train = 0.21%, E_test = 0.24%. Training time (150 epochs): 51 seconds.

40
SVM classifier SVM with parameter = 1000. E_train = 0.19%, E_test = 0.20%. Training time: 22 seconds.
