Published by Tate Tooke. Modified about 1 year ago.

1
Ch. 2: Linear Discriminants
Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009
Based on slides from Stephen Marsland, Romain Thibaux (regression slides), and Moshe Sipper
Longin Jan Latecki, Temple University

2
McCulloch and Pitts Neurons
[Figure: inputs x_1, x_2, ..., x_m with weights w_1, w_2, ..., w_m feed a sum h that produces the output o]
Greatly simplified biological neurons
Sum the inputs
If the total is greater than some threshold, the neuron fires
Otherwise it does not

3
McCulloch and Pitts Neurons
$h = \sum_{j=1}^{m} w_j x_j$, with output $o = 1$ if $h > \theta$ and $o = 0$ otherwise, for some threshold $\theta$
The weight $w_j$ can be positive or negative
Excitatory or inhibitory
Use only a linear sum of inputs
Use a simple output instead of a pulse (spike train)
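
A minimal sketch of a single McCulloch and Pitts neuron may make the thresholding concrete. The function name `mcp_neuron` and the example weights and threshold are illustrative assumptions, not values from the slides.

```python
def mcp_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return 1 if h > threshold else 0


# Hypothetical weights: two excitatory inputs (positive) and one inhibitory input (negative).
weights = [1.0, 1.0, -1.0]

fires = mcp_neuron([1, 1, 0], weights, threshold=1.5)    # h = 2.0, above the threshold
silent = mcp_neuron([1, 1, 1], weights, threshold=1.5)   # h = 1.0, not above the threshold
print(fires, silent)
```

With these assumed weights, switching on the inhibitory third input is enough to stop the neuron from firing.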

4
Neural Networks
Can put lots of McCulloch & Pitts neurons together
Connect them up in any way we like
In fact, assemblies of the neurons are capable of universal computation
Can perform any computation that a normal computer can
Just have to solve for all the weights w_ij

5
Training Neurons
Adapting the weights is learning
How does the network know it is right?
How do we adapt the weights to make the network right more often?
Training set with target outputs
Learning rule

6
2.2 The Perceptron
Definition from Wikipedia: The perceptron is a binary classifier which maps its input x (a real-valued vector) to an output value f(x) (a single binary value) across the matrix:
$f(x) = 1$ if $w \cdot x + b > 0$, and $f(x) = 0$ otherwise
In order not to write b explicitly, we extend the input vector x by one more dimension that is always set to -1, e.g., x=(-1, x_1, …, x_7) with x_0=-1, and extend the weight vector to w=(w_0, w_1, …, w_7). Then adjusting w_0 corresponds to adjusting b.
The perceptron is considered the simplest kind of feed-forward neural network.

7
Bias Replaces Threshold
[Figure: network with inputs and outputs]

8

9
Perceptron Decision = Recall
Outputs are: $y_j = \mathrm{sign}\left(\sum_{i=0}^{m} w_{ij} x_i\right)$ (here sign returns 1 for a positive sum and 0 otherwise)
For example, y=(y_1, …, y_5)=(1, 0, 0, 1, 1) is a possible output.
We may have a different function g in the place of sign, as in (2.4) in the book.

10
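Perceptron Learning = Updating the Weights
We want to change the values of the weights
Aim: minimise the error at the output
If E = t-y, want E to be 0
Use: $w_{ij} \leftarrow w_{ij} + \eta \, (t_j - y_j) \, x_i$, where $\eta$ is the learning rate, $(t_j - y_j)$ is the error, and $x_i$ is the input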

11
Example 1: The Logical OR
X_1  X_2  t
0    0    0
0    1    1
1    0    1
1    1    1
Initial values: w_0(0)=-0.05, w_1(0)=-0.02, w_2(0)=0.02, and η=0.25
Take the first row of our training table:
y_1 = sign( -0.05×(-1) + (-0.02)×0 + 0.02×0 ) = 1
w_0(1) = -0.05 + 0.25×(0-1)×(-1) = 0.2
w_1(1) = -0.02 + 0.25×(0-1)×0 = -0.02
w_2(1) = 0.02 + 0.25×(0-1)×0 = 0.02
We continue with the new weights and the second row, and so on.
We make several passes over the training data.
[Figure: perceptron with weights W_0, W_1, W_2]
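
The OR example can be replayed in a few lines. This is a sketch under stated assumptions: the activation returns 1 for a positive sum and 0 otherwise, the bias input is x_0 = -1 as in the perceptron definition, and the number of passes (10) is an arbitrary choice. The names `step` and `train_perceptron` are illustrative.

```python
def step(h):
    # Threshold activation assumed here: 1 for a positive sum, 0 otherwise.
    return 1 if h > 0 else 0


def train_perceptron(data, weights, eta, epochs):
    """Apply w_i <- w_i + eta * (t - y) * x_i row by row, for several passes."""
    w = list(weights)
    history = []                          # weights after every single update
    for _ in range(epochs):
        for (x1, x2), t in data:
            x = (-1, x1, x2)              # x_0 = -1 is the bias input
            y = step(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
            history.append(list(w))
    return w, history


or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, history = train_perceptron(or_data, [-0.05, -0.02, 0.02], eta=0.25, epochs=10)

print(history[0])    # weights after the first row of the first pass
predictions = [step(-w[0] + w[1] * a + w[2] * b) for (a, b), _ in or_data]
print(predictions)
```

The first entry of `history` corresponds to the hand computation for the first training row, and `predictions` shows what the trained weights output for the four rows of the table.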

12
Decision boundary for OR perceptron

13
Perceptron Learning Applet
html/index.html

14
Example 2: Obstacle Avoidance with the Perceptron
[Figure: sensors LS and RS connected to motors LM and RM through weights w1, w2, w3, w4]
η = 0.3

15
Obstacle Avoidance with the Perceptron
LS  RS  LM  RM
0   0    1   1
0   1   -1   1
1   0    1  -1
1   1    X   X

16
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * (1-1) * 0 = 0

17
Obstacle Avoidance with the Perceptron
w2 = 0 + 0.3 * (1-1) * 0 = 0
And the same for w3, w4

18
Obstacle Avoidance with the Perceptron
[Table: training data with columns LS, RS, LM, RM; last row X, X]

19
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * (-1-1) * 0 = 0

20
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * (-1-1) * 0 = 0
w2 = 0 + 0.3 * ( 1-1) * 0 = 0

21
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * (-1-1) * 0 = 0
w2 = 0 + 0.3 * ( 1-1) * 0 = 0
w3 = 0 + 0.3 * (-1-1) * 1 = -0.6

22
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * (-1-1) * 0 = 0
w2 = 0 + 0.3 * ( 1-1) * 0 = 0
w3 = 0 + 0.3 * (-1-1) * 1 = -0.6
w4 = 0 + 0.3 * ( 1-1) * 1 = 0

23
Obstacle Avoidance with the Perceptron
[Table: training data with columns LS, RS, LM, RM; last row X, X]

24
Obstacle Avoidance with the Perceptron
w1 = 0 + 0.3 * ( 1-1) * 1 = 0
w2 = 0 + 0.3 * (-1-1) * 1 = -0.6
w3 = -0.6 + 0.3 * ( 1-1) * 0 = -0.6
w4 = 0 + 0.3 * (-1-1) * 0 = 0
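
The weight updates worked through on these slides can be replayed as one pass over three training rows. Several things here are assumptions rather than facts from the slides: the training rows are inferred from the arithmetic shown, the output is taken to be +1 when the weighted sum is zero or positive and -1 otherwise, and the weight layout (w1: LS to LM, w2: LS to RM, w3: RS to LM, w4: RS to RM) is inferred from which inputs and targets appear in each update.

```python
ETA = 0.3


def sign_pm1(h):
    # Assumed convention: +1 when h >= 0, else -1 (gives y = 1 while all weights are 0).
    return 1 if h >= 0 else -1


# Assumed training rows (LS, RS) -> (LM, RM), inferred from the update arithmetic.
rows = [((0, 0), (1, 1)), ((0, 1), (-1, 1)), ((1, 0), (1, -1))]

# w[i][j]: weight from sensor i (0 = LS, 1 = RS) to motor j (0 = LM, 1 = RM).
w = [[0.0, 0.0], [0.0, 0.0]]

for x, t in rows:
    # Compute both motor outputs with the current weights, then update every weight.
    y = [sign_pm1(sum(w[i][j] * x[i] for i in range(2))) for j in range(2)]
    for i in range(2):
        for j in range(2):
            w[i][j] += ETA * (t[j] - y[j]) * x[i]

w1, w2, w3, w4 = w[0][0], w[0][1], w[1][0], w[1][1]
print(w1, w2, w3, w4)
```

Under these assumptions only w2 and w3 move away from zero, which matches the two non-zero updates in the hand computation.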

25
Obstacle Avoidance with the Perceptron
[Figure: LS, RS, LM, RM]

26
2.3 Linear Separability
Outputs are: $y_j = \mathrm{sign}\left(\sum_{i=0}^{m} w_{ij} x_i\right)$
where $w \cdot x = \|w\| \, \|x\| \cos\theta$ and $\theta$ is the angle between vectors x and w.

27
Geometry of Linear Separability
The equation of a line is w_0 + w_1*x + w_2*y = 0
It also means that the point (x,y) is on the line
This equation is equivalent to w · x = (w_0, w_1, w_2) · (1, x, y) = 0
If w · x > 0, then the angle between w and x is less than 90 degrees, which means that w and x lie on the same side of the line.
Each output node of the perceptron tries to separate the training data into two classes (fire or no-fire) with a linear decision boundary, i.e., a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions.

28
Linear Separability
The Binary AND Function

29
Gradient Descent Learning Rule
Consider a linear unit without threshold and with continuous output y (not just -1, 1)
$y = w_0 + w_1 x_1 + \dots + w_n x_n$
Train the $w_i$'s such that they minimize the squared error
$E[w_1, \dots, w_n] = \frac{1}{2} \sum_{d \in D} (t_d - y_d)^2$
where D is the set of training examples

30
Supervised Learning
Training and test data sets
Training set; input & target

31
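Gradient Descent
D={,,, }
Gradient: $\nabla E[w] = \left[ \frac{\partial E}{\partial w_0}, \dots, \frac{\partial E}{\partial w_n} \right]$
[Figure: step on the error surface from $(w_1, w_2)$ to $(w_1 + \Delta w_1, w_2 + \Delta w_2)$]
$\Delta w = -\eta \, \nabla E[w]$
$\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}$
$\frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - y_d)^2 = \sum_d \frac{\partial}{\partial w_i} \frac{1}{2} \left( t_d - \sum_k w_k x_{k,d} \right)^2 = \sum_d (t_d - y_d)(-x_{i,d})$
where $x_{i,d}$ is the $i$-th input of training example $d$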
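
One way to sanity-check the gradient of the squared error is to compare the analytic expression, the sum over examples of (t_d - y_d) times (-x_i), with a finite-difference estimate. The sketch below does that; the data, the starting weights, and the function names are made up for illustration.

```python
def predict(w, x):
    # Linear unit: y = w_0 + w_1 x_1 + ... + w_n x_n
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, data):
    # E[w] = 1/2 * sum over examples of (t - y)^2
    return 0.5 * sum((t - predict(w, x)) ** 2 for x, t in data)


def gradient(w, data):
    # dE/dw_i = sum over examples of (t - y) * (-x_i), with x_0 = 1 for the constant term w_0
    grad = [0.0] * len(w)
    for x, t in data:
        err = t - predict(w, x)
        for i, xi in enumerate((1.0,) + tuple(x)):
            grad[i] += err * (-xi)
    return grad


def numeric_gradient(w, data, eps=1e-6):
    # Central finite differences, one weight at a time.
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grad.append((squared_error(up, data) - squared_error(down, data)) / (2 * eps))
    return grad


data = [((1.0, 2.0), 3.0), ((-1.0, 0.5), -1.0), ((2.0, -1.0), 0.5)]   # made-up examples
w = [0.1, -0.2, 0.3]                                                    # made-up weights
analytic = gradient(w, data)
numeric = numeric_gradient(w, data)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places, which is a quick check that the sign and the input factor in the derivative are right.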

32
Gradient Descent
[Figure: error surface]
$\Delta w_i = -\eta \, \frac{\partial E}{\partial w_i}$

33
Incremental Stochastic Gradient Descent
Batch mode: gradient descent over the entire data D
$w = w - \eta \, \nabla E_D[w]$
$E_D[w] = \frac{1}{2} \sum_d (t_d - y_d)^2$
Incremental mode: gradient descent over individual training examples d
$w = w - \eta \, \nabla E_d[w]$
$E_d[w] = \frac{1}{2} (t_d - y_d)^2$
Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if $\eta$ is small enough

34
Gradient Descent Perceptron Learning
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x_1,…,x_n), t>, where (x_1,…,x_n) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1)
Initialize each w_i to some small random value
Until the termination condition is met, Do
    For each <(x_1,…,x_n), t> in training_examples, Do
        Input the instance (x_1,…,x_n) to the linear unit and compute the output y
        For each linear unit weight w_i, Do
            Δw_i = η (t-y) x_i
        For each linear unit weight w_i, Do
            w_i = w_i + Δw_i
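
The incremental procedure can be sketched in Python as follows. Assumptions: the termination condition is a fixed number of passes, a constant input of 1 is prepended so that w_0 acts as the offset, and the training data are generated from a made-up noise-free linear target so that the result is easy to inspect. The name `train_linear_unit` is illustrative.

```python
import random


def train_linear_unit(training_examples, eta=0.1, epochs=500, seed=0):
    """Incremental gradient descent for a linear unit y = w_0 + w_1 x_1 + ... + w_n x_n."""
    rng = random.Random(seed)
    n = len(training_examples[0][0])
    # Initialize each weight to a small random value.
    w = [rng.uniform(-0.05, 0.05) for _ in range(n + 1)]
    for _ in range(epochs):                   # assumed termination: fixed number of passes
        for x, t in training_examples:
            xa = (1.0,) + tuple(x)            # constant input for the offset w_0
            y = sum(wi * xi for wi, xi in zip(w, xa))
            delta = [eta * (t - y) * xi for xi in xa]
            w = [wi + dwi for wi, dwi in zip(w, delta)]
    return w


# Made-up, noise-free targets from t = 1 + 2*x1 - x2.
inputs = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0.5), (0.5, -1)]
examples = [(x, 1.0 + 2.0 * x[0] - 1.0 * x[1]) for x in inputs]
w = train_linear_unit(examples)
print(w)
```

Because the targets here lie exactly on a linear function, the learned weights should end up very close to the generating coefficients; with noisy data they would only hover near the least-squares solution.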

35
Linear Separability
The Exclusive Or (XOR) function.
A  B  Out
0  0  0
0  1  1
1  0  1
1  1  0
Limitations of the Perceptron

36
Limitations of the Perceptron?
W_1 > 0
W_2 > 0
W_1 + W_2 < 0
The first two conditions imply W_1 + W_2 > 0, so the three cannot all hold at once.
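
The same limitation shows up empirically: a single perceptron trained on XOR never classifies all four rows correctly. The sketch below assumes a step activation that returns 1 for a positive sum, a bias input x_0 = -1, and arbitrary starting weights, learning rate, and number of passes.

```python
def step(h):
    return 1 if h > 0 else 0


def predict(w, a, b):
    # x_0 = -1 plays the role of the bias input.
    return step(-w[0] + w[1] * a + w[2] * b)


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w = [-0.05, -0.02, 0.02]      # arbitrary small starting weights
eta = 0.25                    # arbitrary learning rate

errors_per_epoch = []
for _ in range(100):
    for (a, b), t in xor_data:
        y = predict(w, a, b)
        x = (-1, a, b)
        w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
    # Count misclassified rows using the weights at the end of this pass.
    errors_per_epoch.append(sum(predict(w, a, b) != t for (a, b), t in xor_data))

print(min(errors_per_epoch))
```

The smallest error count over all passes stays above zero, because no straight line separates the two XOR classes.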

37
Limitations of the Perceptron?
[Figure and table: inputs In 1, In 2, In 3; columns A, B, C, Out]

38
2.4 Linear regression Temperature Given examples Predict given a new point

39
Linear regression
[Figure: fitted line and prediction plotted against Temperature]

40
Ordinary Least Squares (OLS)
[Figure labels: error or “residual”, prediction, observation]
Sum squared error

41
Minimize the sum squared error Sum squared error Linear equation Linear system

42
Alternative derivation n d Solve the system (it’s better not to invert the matrix)
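
The advice to solve the system rather than invert the matrix can be illustrated with a small line fit. The sketch assumes the system in question is the usual normal equations (X^T X) w = X^T y, which the slide text does not spell out, and uses made-up data lying exactly on y = 1 + 2x. The helper names are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting (no explicit inverse)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):                       # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ols(xs, ys):
    """Fit y = w_0 + w_1 x by solving the (assumed) normal equations (X^T X) w = X^T y."""
    rows = [[1.0, x] for x in xs]                      # each row of X is (1, x)
    d = 2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve_linear_system(xtx, xty)


xs = [0.0, 1.0, 2.0, 3.0]      # made-up inputs
ys = [1.0, 3.0, 5.0, 7.0]      # made-up observations on the line y = 1 + 2x
w0, w1 = fit_ols(xs, ys)
print(w0, w1)
```

Elimination works on the system directly and avoids forming an inverse, which is both cheaper and numerically better behaved when X^T X is close to singular.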

43
Beyond lines and planes everything is the same with still linear in

44
Geometric interpretation [Matlab demo]

45
Ordinary Least Squares [summary] n d Let For example Let Minimize by solving Given examples Predict

46
Probabilistic interpretation Likelihood

47
Summary
Perceptron and regression optimize the same target function.
In both cases we compute the gradient (vector of partial derivatives).
In the case of regression, we set the gradient to zero and solve for the vector w. The solution is a closed-form formula for w at which the target function attains its global minimum.
In the case of the perceptron, we move iteratively toward the minimum by stepping in the direction of minus the gradient. We do this incrementally, making a small step for each data point.

48
Homework 1
(Ch ) Implement the perceptron in Matlab and test it on the Pima Indian dataset from the UCI Machine Learning Repository:
(Ch ) Implement linear regression in Matlab and apply it to the auto-mpg dataset.

49
From Ch. 3: Testing
How do we evaluate our trained network?
Can’t just compute the error on the training data: unfair, can’t see overfitting
Keep a separate testing set
After training, evaluate on this test set
How do we check for overfitting?
Can’t use training or testing sets

50
Validation
Keep a third set of data for this
Train the network on training data
Periodically, stop and evaluate on validation set
After training has finished, test on test set
This is getting expensive on data!

51
Hold Out Cross Validation
[Figure: inputs and targets split into training and validation parts]

52
Hold Out Cross Validation
Partition training data into K subsets
Train on K-1 of subsets, validate on Kth
Repeat for new network, leaving out a different subset
Choose network that has best validation error
Traded off data for computation
Extreme version: leave-one-out
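
The partitioning step can be sketched as an index generator. The function name `k_fold_splits` and the choice of contiguous, unshuffled folds are assumptions made for illustration; in practice the data would usually be shuffled first.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) pairs, one per held-out subset."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so that sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(10, 3))
for train, validation in splits:
    print(train, validation)

# Leave-one-out is the special case k = n_samples.
leave_one_out = list(k_fold_splits(4, 4))
```

Each sample lands in exactly one validation subset, so every network is validated on data it was not trained on.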

53
Early Stopping
When should we stop training?
Could set a minimum training error
Danger of overfitting
Could set a number of epochs
Danger of underfitting or overfitting
Can use the validation set
Measure the error on the validation set during training

54
Early Stopping
[Figure: training error and validation error against number of epochs, with the time to stop training marked]
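
A simple way to turn the validation curve into a stopping rule is to remember the epoch with the lowest validation error and stop once it has not improved for a few epochs. The `patience` parameter and the error values below are assumptions for illustration; the slides only say to measure the validation error during training.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch with the lowest validation error seen before stopping.

    Stops once `patience` epochs have passed without a new best (an assumed rule).
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Made-up validation errors: falling at first, then rising as the network overfits.
errors = [0.9, 0.7, 0.5, 0.4, 0.45, 0.5, 0.6, 0.7]
stop_at = early_stopping_epoch(errors)
print(stop_at)
```

In a real training loop the weights from the best epoch would be saved and restored, since the loop runs a few epochs past the minimum before it stops.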

© 2017 SlidePlayer.com Inc.

All rights reserved.