
1 Ch. 2: Linear Discriminants
Longin Jan Latecki, Temple University, latecki@temple.edu. Based on Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC 2009, and on slides by Stephen Marsland, Romain Thibaux (regression slides), and Moshe Sipper.

2 McCulloch and Pitts Neurons
[Diagram: inputs x_1, ..., x_m with weights w_1, ..., w_m feed a summing unit h, which produces the output o.] Greatly simplified biological neurons: sum the weighted inputs; if the total exceeds some threshold, the neuron fires; otherwise it does not. Stephen Marsland

3 McCulloch and Pitts Neurons
h = Σ_j w_j x_j, and o = 1 if h > θ, o = 0 otherwise, for some threshold θ. The weight w_j can be positive or negative, i.e., excitatory or inhibitory. Uses only a linear sum of inputs, and a simple output instead of a pulse (spike train). A small sketch follows below. Stephen Marsland
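A minimal Matlab sketch of a McCulloch and Pitts neuron; the inputs, weights, and threshold below are illustrative values, not taken from the slides.

    % McCulloch and Pitts neuron: weighted sum of the inputs, then a hard threshold.
    x = [1; 0; 1];            % inputs x_1, ..., x_m (illustrative)
    w = [0.5; -0.3; 0.8];     % weights w_1, ..., w_m (illustrative)
    theta = 1.0;              % firing threshold
    h = w' * x;               % h = sum_j w_j * x_j
    o = double(h > theta);    % o = 1 if the neuron fires, 0 otherwise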

4 Neural Networks Can put lots of McCulloch & Pitts neurons together
Connect them up in any way we like. In fact, assemblies of such neurons are capable of universal computation: they can perform any computation that a normal computer can. We just have to solve for all the weights w_ij. Stephen Marsland

5 Training Neurons Adapting the weights is learning
How does the network know it is right? How do we adapt the weights to make the network right more often? We need a training set with target outputs and a learning rule. Stephen Marsland

6 2.2 The Perceptron The perceptron is considered the simplest kind of feed-forward neural network. Definition from Wikipedia: the perceptron is a binary classifier which maps its input x (a real-valued vector) to a single binary output value f(x), with f(x) = 1 if w·x + b > 0 and f(x) = 0 otherwise. In order to not write b explicitly, we extend the input vector x by one more dimension that is always set to -1, e.g., x = (-1, x_1, …, x_7) with x_0 = -1, and extend the weight vector to w = (w_0, w_1, …, w_7). Adjusting w_0 then corresponds to adjusting b. A small sketch of the decision rule follows below.
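A small Matlab sketch of this decision rule with the bias absorbed into the weights through a constant -1 input; the numbers are illustrative.

    % Perceptron decision with the bias trick: x_0 = -1, so w_0 plays the role of b.
    x = [-1; 0.4; 0.7];       % extended input (x_0, x_1, x_2), illustrative
    w = [-0.05; 0.3; 0.2];    % extended weights (w_0, w_1, w_2), illustrative
    f = double(w' * x > 0);   % binary output: 1 if w'*x > 0, otherwise 0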

7 Bias Replaces Threshold
[Diagram: perceptron with an extra input fixed at -1 whose weight acts as the bias, replacing the explicit threshold, between the inputs and outputs.] Stephen Marsland

8

9 Perceptron Decision = Recall
Outputs are: y_j = sign( Σ_i w_ij x_i ). For example, y = (y_1, …, y_5) = (1, 0, 0, 1, 1) is a possible output. We may have a different function g in the place of sign, as in (2.4) in the book. Stephen Marsland

10 Perceptron Learning = Updating the Weights
We want to change the values of the weights. Aim: minimise the error at the output. If E = t - y, we want E to be 0. Use the update rule w_ij ← w_ij + η (t_j - y_j) x_i, where x_i is the input, η is the learning rate, and (t_j - y_j) is the error. Stephen Marsland

11 Example 1: The Logical OR
Training table (bias input -1, then x_1, x_2, target t): (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→1, with weights W_0, W_1, W_2. Initial values: w_0(0) = -0.05, w_1(0) = -0.02, w_2(0) = 0.02, and η = 0.25. Take the first row of our training table (x_1 = 0, x_2 = 0, t = 0): y_1 = sign( -0.05×(-1) + (-0.02)×0 + 0.02×0 ) = sign(0.05) = 1, so w_0(1) = -0.05 + 0.25×(0-1)×(-1) = 0.2, w_1(1) = -0.02 + 0.25×(0-1)×0 = -0.02, w_2(1) = 0.02 + 0.25×(0-1)×0 = 0.02. We continue with the new weights and the second row, and so on. We make several passes over the training data, as in the sketch below.
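A Matlab sketch of this training loop on the OR data, starting from the initial weights above; the first update reproduces w(1) = (0.2, -0.02, 0.02).

    % Perceptron training on logical OR, with the bias handled by a constant -1 input.
    X   = [-ones(4,1) [0 0; 0 1; 1 0; 1 1]];   % rows: (x_0 = -1, x_1, x_2)
    t   = [0; 1; 1; 1];                        % OR targets
    w   = [-0.05 -0.02 0.02];                  % initial weights (w_0, w_1, w_2)
    eta = 0.25;                                % learning rate
    for pass = 1:10                            % several passes over the training data
        for i = 1:size(X,1)
            y = double(X(i,:) * w' > 0);       % recall: thresholded output
            w = w + eta * (t(i) - y) * X(i,:); % update: w <- w + eta*(t - y)*x
        end
    end
    disp(w)                                    % learned weights implementing OR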

12 Decision boundary for OR perceptron
Stephen Marsland

13 Perceptron Learning Applet

14 Example 2: Obstacle Avoidance with the Perceptron
[Diagram: a perceptron with sensor inputs LS, RS (left and right sensors) connected to motor outputs LM, RM (left and right motors) through weights w1, w2, w3, w4.] Learning rate η = 0.3; bias weight = -0.01. Stephen Marsland

15 Obstacle Avoidance with the Perceptron
[Diagram: sensor inputs (LS, RS) and target motor outputs (LM, RM) for this training case, with values 1 and -1; X marks the obstacle.] Stephen Marsland

16 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (1-1) × 0 = 0 Stephen Marsland

17 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w2 = 0 + 0.3 × (1-1) × 0 = 0, and the same for w3, w4. Stephen Marsland

18 Obstacle Avoidance with the Perceptron
[Diagram: sensor inputs (LS, RS) and target motor outputs (LM, RM) for the next training case, with values 1 and -1; X marks the obstacle.] Stephen Marsland

19 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (-1-1) × 0 = 0 Stephen Marsland

20 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (-1-1) × 0 = 0; w2 = 0 + 0.3 × (1-1) × 0 = 0 Stephen Marsland

21 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (-1-1) × 0 = 0; w2 = 0 + 0.3 × (1-1) × 0 = 0; w3 = 0 + 0.3 × (-1-1) × 1 = -0.6 Stephen Marsland

22 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (-1-1) × 0 = 0; w2 = 0 + 0.3 × (1-1) × 0 = 0; w3 = 0 + 0.3 × (-1-1) × 1 = -0.6; w4 = 0 + 0.3 × (1-1) × 1 = 0 Stephen Marsland

23 Obstacle Avoidance with the Perceptron
[Diagram: sensor inputs (LS, RS) and target motor outputs (LM, RM) for the next training case, with values 1 and -1; X marks the obstacle.] Stephen Marsland

24 Obstacle Avoidance with the Perceptron
[Diagram: weights w1, w2, w3, w4 between sensors LS, RS and motors LM, RM.] w1 = 0 + 0.3 × (1-1) × 1 = 0; w2 = 0 + 0.3 × (-1-1) × 1 = -0.6; w3 = -0.6 + 0.3 × (1-1) × 0 = -0.6; w4 = 0 + 0.3 × (-1-1) × 0 = 0 Stephen Marsland

25 Obstacle Avoidance with the Perceptron
[Diagram: the trained network between sensors LS, RS and motors LM, RM, with weights -0.6, -0.6, -0.01, -0.01.] Stephen Marsland

26 2.3 Linear Separability Outputs are: y = 1 if w·x > 0, and y = 0 otherwise, where
w·x = ||w|| ||x|| cos θ, and θ is the angle between vectors x and w. Stephen Marsland

27 Geometry of Linear Separability
The equation of a line is w_0 + w_1*x + w_2*y = 0; a point (x,y) satisfying it lies on the line. This equation is equivalent to w·x = (w_0, w_1, w_2)·(1, x, y) = 0. If w·x > 0, then the angle between w and the extended point (1, x, y) is less than 90 degrees, which means the point lies on the side of the line that w points towards. Each output node of the perceptron tries to separate the training data into two classes (fire or no-fire) with a linear decision boundary, i.e., a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions. A small sketch follows below. Stephen Marsland
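A small Matlab sketch of this test: write a point in the extended form (1, x, y) and check the sign of w·x. The line and the point are illustrative.

    % Which side of the line w_0 + w_1*x + w_2*y = 0 is the point (x,y) on?
    w = [1 -2 1];              % illustrative line: 1 - 2*x + y = 0
    p = [1 0.5 1.5];           % the point (0.5, 1.5) in extended form (1, x, y)
    s = w * p';                % s > 0: the side w points towards; s = 0: on the line
    onWSide = s > 0;           % true here, since s = 1.5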

28 Linear Separability The Binary AND Function Stephen Marsland

29 Gradient Descent Learning Rule
Consider a linear unit without a threshold and with continuous output y (not just -1, 1): y = w_0 + w_1 x_1 + … + w_n x_n. Train the w_i's such that they minimize the squared error E[w_1, …, w_n] = ½ Σ_{d∈D} (t_d - y_d)², where D is the set of training examples. A small numeric sketch follows below.
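A brief Matlab sketch computing this squared error and its gradient for a linear unit on a toy data set; the data and weights are illustrative.

    % Squared error E = 1/2 * sum_d (t_d - y_d)^2 for a linear unit y = X*w,
    % and its gradient dE/dw_i = sum_d (t_d - y_d) * (-x_i).
    X = [1 1 1; 1 -1 -1; 1 1 -1; 1 -1 1];   % rows are examples; first column is the constant input
    t = [1; 1; -1; -1];                     % targets (illustrative)
    w = [0.1; 0.2; -0.1];                   % current weights (illustrative)
    y = X * w;                              % continuous outputs of the linear unit
    E = 0.5 * sum((t - y).^2);              % squared error over the training set D
    g = -X' * (t - y);                      % gradient, one entry per weight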

30 Supervised Learning Training and test data sets
Training set; input & target

31 Gradient Descent D = {<(1,1),1>, <(-1,-1),1>, <(1,-1),-1>, <(-1,1),-1>}
Gradient: ∇E[w] = [∂E/∂w_0, …, ∂E/∂w_n]. A step moves the weights from (w_1, w_2) to (w_1+Δw_1, w_2+Δw_2) with Δw = -η ∇E[w], i.e. Δw_i = -η ∂E/∂w_i, where ∂E/∂w_i = ∂/∂w_i ½ Σ_d (t_d - y_d)² = ∂/∂w_i ½ Σ_d (t_d - Σ_i w_i x_{i,d})² = Σ_d (t_d - y_d)(-x_{i,d}).

32 Gradient Descent [Figure: the error as a function of the weights, with gradient descent steps Δw_i = -η ∂E/∂w_i moving downhill toward the minimum.] Stephen Marsland

33 Incremental Stochastic Gradient Descent
Batch mode: gradient descent w = w - η ∇E_D[w] over the entire data D, with E_D[w] = ½ Σ_{d∈D} (t_d - y_d)². Incremental mode: gradient descent w = w - η ∇E_d[w] over individual training examples d, with E_d[w] = ½ (t_d - y_d)². Incremental gradient descent can approximate batch gradient descent arbitrarily closely if η is small enough.

34 Gradient Descent Perceptron Learning
Gradient-Descent(training_examples, η). Each training example is a pair of the form <(x_1,…,x_n), t>, where (x_1,…,x_n) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1). Initialize each w_i to some small random value. Until the termination condition is met, do: for each <(x_1,…,x_n), t> in training_examples, input the instance (x_1,…,x_n) to the linear unit and compute the output y; then for each linear unit weight w_i, set Δw_i = η (t - y) x_i and w_i = w_i + Δw_i. A Matlab sketch of this procedure follows below.
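A Matlab sketch of this procedure for a linear unit, using the incremental update Δw_i = η (t - y) x_i; the data set and settings are illustrative.

    % Gradient-descent training of a linear unit (delta rule), incremental version.
    X   = [1 1 1; 1 -1 -1; 1 1 -1; 1 -1 1];     % training inputs, leading constant column
    t   = [1; 1; -1; -1];                       % target output values
    eta = 0.1;                                  % learning rate
    w   = 0.01 * randn(size(X,2), 1);           % initialize each w_i to a small random value
    for epoch = 1:100                           % termination condition: fixed number of epochs
        for d = 1:size(X,1)
            y = X(d,:) * w;                     % output of the linear unit for example d
            w = w + eta * (t(d) - y) * X(d,:)'; % w_i <- w_i + eta*(t - y)*x_i
        end
    end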

35 Limitations of the Perceptron
The Exclusive Or (XOR) function is not linearly separable. Truth table: A=0, B=0 → Out=0; A=0, B=1 → Out=1; A=1, B=0 → Out=1; A=1, B=1 → Out=0. Stephen Marsland

36 Limitations of the Perceptron
To output 1 for inputs (1,0) and (0,1), a single perceptron needs W1 > 0 and W2 > 0; but to output 0 for input (1,1) it needs W1 + W2 < 0. These conditions cannot all hold, so no single perceptron can compute XOR. Stephen Marsland

37 Limitations of the Perceptron?
[Table of inputs A, B, C and output, and a 3D plot with axes In1, In2, In3: adding a third input makes the XOR data linearly separable.] Stephen Marsland

38 2.4 Linear regression
[Figure: scatter plot of temperature data, generated in Matlab with scatter(1:20,10+(1:20)+2*randn(1,20),'k','filled'); a=axis; a(3)=0; axis(a);] Given examples (x_i, y_i), predict the value y for a new point x.

39 Linear regression
[Figure: the same temperature scatter plot with a fitted line; the value of the line at a new input is the prediction.]

40 Ordinary Least Squares (OLS)
[Figure: each observation, its prediction on the fitted line, and the error or "residual" between them.] We minimise the sum squared error, i.e. the sum of the squared residuals over all examples.

41 Minimize the sum squared error
Setting the derivative of the sum squared error with respect to the weights to zero gives a linear equation in the weights, i.e. a linear system (the normal equations) to solve.

42 Alternative derivation
Solve the system (it’s better not to invert the matrix)
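A Matlab sketch of solving this system without explicitly inverting the matrix, on data generated like the scatter figure above.

    % Ordinary least squares without forming an explicit matrix inverse.
    x = (1:20)';
    y = 10 + x + 2*randn(20,1);        % noisy linear data, as in the scatter figure
    X = [ones(20,1) x];                % design matrix with a constant column
    w = (X' * X) \ (X' * y);           % solve the normal equations X'*X*w = X'*y
    % equivalently: w = X \ y;         % backslash performs the least-squares solve directly
    yhat = X * w;                      % fitted values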

43 Beyond lines and planes
[Figure: a curve fitted to the data.] The model is still linear in the weights w; everything is the same with the input x replaced by a vector of basis functions, e.g. (1, x, x²), as in the sketch below.
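A Matlab sketch of the same fit with a quadratic basis: the model is nonlinear in x but still linear in the weights. The data and the choice of degree are illustrative.

    % Least squares with a polynomial basis (1, x, x^2): still linear in w.
    x    = linspace(0, 10, 30)';
    y    = 3 + 2*x - 0.5*x.^2 + randn(30,1);   % illustrative nonlinear data
    Phi  = [ones(size(x)) x x.^2];             % basis expansion of the inputs
    w    = Phi \ y;                            % exactly the same least-squares solve
    yhat = Phi * w;                            % fitted curve values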

44 Geometric interpretation
[Figure: 3D plot illustrating the geometric interpretation of least squares.] [Matlab demo]

45 Ordinary Least Squares [summary]
Given examples (x_i, y_i), i = 1, …, n, let X be the n×d matrix whose rows are the inputs x_i and let y be the vector of targets y_i. Minimize the sum squared error ||Xw - y||² by solving the linear system XᵀX w = Xᵀy; predict the value for a new point x as w·x.

46 Probabilistic interpretation
[Figure: data with a Gaussian noise model around the fitted line.] Likelihood: assuming the targets are the linear prediction plus Gaussian noise, maximizing the likelihood of the data is equivalent to minimizing the sum squared error.

47 Summary Perceptron and regression optimize the same target function
In both cases we compute the gradient (the vector of partial derivatives). In the case of regression, we set the gradient to zero and solve for the vector w; the solution is a closed formula for w at which the target function attains its global minimum. In the case of the perceptron, we iteratively move toward the minimum by stepping in the direction of minus the gradient, doing this incrementally with small steps for each data point.

48 Homework 1 (Ch. ) Implement the perceptron in Matlab and test it on the Pima Indians Diabetes dataset from the UCI Machine Learning Repository: (Ch. ) Implement linear regression in Matlab and apply it to the auto-mpg dataset.

49 From Ch. 3: Testing How do we evaluate our trained network?
We can't just compute the error on the training data: that is unfair and does not reveal overfitting. Keep a separate testing set and, after training, evaluate on it. How do we check for overfitting during training? We can't use the training or testing sets for that. Stephen Marsland

50 Validation Keep a third set of data for this
Train the network on the training data. Periodically, stop and evaluate on the validation set. After training has finished, test on the test set. This is getting expensive in terms of data! Stephen Marsland

51 Hold Out Cross Validation
[Figure: the inputs and targets partitioned into training and validation subsets.] Stephen Marsland

52 Hold Out Cross Validation
Partition the training data into K subsets. Train on K-1 of the subsets and validate on the Kth. Repeat for a new network, leaving out a different subset each time. Choose the network that has the best validation error. We have traded data for computation. Extreme version: leave-one-out. A sketch of the bookkeeping follows below. Stephen Marsland
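A Matlab sketch of the bookkeeping for this procedure; train_and_validate is a hypothetical placeholder for whichever training-plus-validation routine is being evaluated, not a function from the slides.

    % K-fold cross-validation: train on K-1 subsets, validate on the one left out.
    N    = 100;                        % number of training examples (illustrative)
    K    = 5;                          % number of folds
    perm = randperm(N);                % shuffle the example indices
    fold = mod(0:N-1, K) + 1;          % assign shuffled examples to folds 1..K
    valErr = zeros(K, 1);
    for k = 1:K
        valIdx   = perm(fold == k);    % validation subset for this run
        trainIdx = perm(fold ~= k);    % the remaining K-1 subsets for training
        % valErr(k) = train_and_validate(trainIdx, valIdx);   % hypothetical routine
    end
    % [~, best] = min(valErr);         % keep the network with the best validation error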

53 Early Stopping When should we stop training?
We could set a minimum training error (danger of overfitting), or set a fixed number of epochs (danger of underfitting or overfitting). Instead, we can use the validation set: measure the error on the validation set during training and stop when it stops improving (see the sketch below). Stephen Marsland
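A Matlab sketch of early stopping; train_one_epoch and validation_error are hypothetical placeholders standing in for the network's own training and evaluation code.

    % Early stopping: watch the validation error while training and stop
    % once it has stopped improving for a few epochs.
    bestErr  = inf;
    patience = 5;                      % epochs to wait for an improvement
    bad      = 0;
    for epoch = 1:1000
        % w   = train_one_epoch(w, trainX, trainT);   % hypothetical
        % err = validation_error(w, valX, valT);      % hypothetical
        err = rand();                  % stand-in value so the sketch runs on its own
        if err < bestErr
            bestErr = err; bad = 0;    % validation error improved
        else
            bad = bad + 1;             % no improvement this epoch
        end
        if bad >= patience
            break;                     % time to stop training
        end
    end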

54 Early Stopping
[Figure: error versus number of epochs for the training and validation sets; the time to stop training is where the validation error starts to increase.] Stephen Marsland

