The Perceptron Algorithm (Dual Form)


1 The Perceptron Algorithm (Dual Form)
Given a linearly separable training set $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_\ell, y_\ell)\}$ and $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$.
Initialize: $\boldsymbol{\alpha} \leftarrow \mathbf{0}$, $b \leftarrow 0$
Repeat:
  for $i = 1$ to $\ell$:
    if $y_i \big( \sum_{j=1}^{\ell} \alpha_j y_j \langle \mathbf{x}_j, \mathbf{x}_i \rangle + b \big) \le 0$ then
      $\alpha_i \leftarrow \alpha_i + 1$, $\quad b \leftarrow b + y_i R^2$
until no mistakes are made within the for loop
return $(\boldsymbol{\alpha}, b)$
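A minimal NumPy sketch of the dual-form update, assuming the reconstruction above (per-example mistake counters alpha, bias updated by $y_i R^2$); the toy data set and the function name `dual_perceptron` are illustrative, not from the slides.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual-form perceptron: X is (l, n), y is (l,) with labels in {-1, +1}."""
    l = X.shape[0]
    G = X @ X.T                          # Gram matrix of inner products <x_j, x_i>
    R2 = np.max(np.sum(X**2, axis=1))    # R^2 = max_i ||x_i||^2
    alpha, b = np.zeros(l), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            # the decision value uses only alpha and Gram-matrix entries
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i] * R2
                mistakes += 1
        if mistakes == 0:                # no mistakes within the for loop: stop
            break
    return alpha, b

# toy linearly separable example
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
alpha, b = dual_perceptron(X, y)
w = (alpha * y) @ X                      # recover the primal weight vector
print(alpha, b, w)
```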

2 What Do We Get from the Dual Form Perceptron Algorithm?
- The number of updates equals $\sum_{i=1}^{\ell} \alpha_i = \|\boldsymbol{\alpha}\|_1$.
- $\alpha_i > 0$ implies that the training point $(\mathbf{x}_i, y_i)$ has been misclassified at least once during the training process.
- $\alpha_i = 0$ implies that removing the training point $(\mathbf{x}_i, y_i)$ will not affect the final result.
- The training data only appear in the algorithm through the entries of the Gram matrix, defined as $G \in \mathbb{R}^{\ell \times \ell}$ with $G_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$.
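For illustration, the Gram matrix that the dual algorithm relies on can be formed in one line; this small sketch (data and names are hypothetical) checks the matrix form against the entrywise definition.

```python
import numpy as np

X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0]])   # rows are training points
G = X @ X.T                                             # G[i, j] = <x_i, x_j>
# equivalent entrywise definition
G_check = np.array([[X[i] @ X[j] for j in range(len(X))] for i in range(len(X))])
assert np.allclose(G, G_check)
print(G)
```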

3 The Margin Slack Variable of $(\mathbf{x}_i, y_i)$ with Respect to $(\mathbf{w}, b)$ and $\gamma$
For a fixed value $\gamma > 0$, called the target margin, we define the margin slack variable of the training point $(\mathbf{x}_i, y_i)$ with respect to the hyperplane $(\mathbf{w}, b)$ and target margin $\gamma$ as
$\xi_i = \max\big(0,\; \gamma - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big).$
If $\xi_i > \gamma$, then $\mathbf{x}_i$ is misclassified by the hyperplane $(\mathbf{w}, b)$.
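A short sketch of the slack computation under the definition above; the hyperplane $(\mathbf{w}, b)$, the target margin $\gamma$, and the sample points are made up for illustration.

```python
import numpy as np

def margin_slack(X, y, w, b, gamma):
    """xi_i = max(0, gamma - y_i (<w, x_i> + b)) for each training point."""
    functional_margins = y * (X @ w + b)
    return np.maximum(0.0, gamma - functional_margins)

# hypothetical hyperplane and target margin, for illustration only
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [0.5, -0.5]])
y = np.array([1, 1, -1, -1])
w, b, gamma = np.array([1.0, 1.0]) / np.sqrt(2), 0.0, 1.0
xi = margin_slack(X, y, w, b, gamma)
print(xi)      # any xi_i > gamma flags a misclassified point
```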

4 (Figure slide: geometric illustration of the margin slack variables; image not preserved in the transcript.)

5 Bound on the Number of Mistakes in One for Loop of the Perceptron Algorithm
Theorem 2.7 (Freund & Schapire). Let $S$ be a non-trivial training set with no duplicate examples, with $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$. Let $(\mathbf{w}, b)$ be any hyperplane with $\|\mathbf{w}\| = 1$, let $\gamma > 0$, and define
$D = \sqrt{\sum_{i=1}^{\ell} \xi_i^2}, \quad \text{where } \xi_i = \max\big(0, \gamma - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big).$
Then the number of mistakes in the first execution of the for loop of the Perceptron Algorithm on $S$ is bounded by
$\left(\frac{2(R + D)}{\gamma}\right)^2.$

6 Fisher's Linear Discriminant
Find the direction onto which the projection of the data is maximally separated. We maximize
$F(\mathbf{w}) = \frac{\big(\mu_{\mathbf{w}}^{+} - \mu_{\mathbf{w}}^{-}\big)^2}{\big(\sigma_{\mathbf{w}}^{+}\big)^2 + \big(\sigma_{\mathbf{w}}^{-}\big)^2},$
where $\mu_{\mathbf{w}}^{\pm}$ and $\sigma_{\mathbf{w}}^{\pm}$ are respectively the mean and standard deviation of the function output values $f(\mathbf{x}_i) = \langle \mathbf{w}, \mathbf{x}_i \rangle + b$ over the positive and negative examples.
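As a rough illustration, the sketch below evaluates this ratio for two candidate directions on made-up two-class data; the helper name `fisher_criterion` and the data are assumptions, and `np.var` (population variance) is used for the spread of the projected outputs.

```python
import numpy as np

def fisher_criterion(X, y, w, b=0.0):
    """Ratio of squared mean separation to summed variances of the projections."""
    f = X @ w + b                         # function output values
    fp, fn = f[y == 1], f[y == -1]
    return (fp.mean() - fn.mean())**2 / (fp.var() + fn.var())

X = np.array([[2.0, 2.0], [1.0, 3.0], [3.0, 1.5],
              [-2.0, -1.0], [-1.0, -3.0], [-2.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
print(fisher_criterion(X, y, np.array([1.0, 1.0])))
print(fisher_criterion(X, y, np.array([1.0, -1.0])))  # a worse direction scores lower
```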

7 Multi-class Classification
- Extension of binary classification.
- Equivalent to our previous rule: the binary decision $\mathrm{sgn}(\langle \mathbf{w}, \mathbf{x} \rangle + b)$ amounts to picking whichever of $(\mathbf{w}, b)$ and $(-\mathbf{w}, -b)$ gives the larger value.
- For multi-class (one against the rest): train one linear function $(\mathbf{w}_c, b_c)$ per class $c$ and predict $c(\mathbf{x}) = \arg\max_{c}\big(\langle \mathbf{w}_c, \mathbf{x} \rangle + b_c\big)$ (a small sketch follows this list).
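A hedged sketch of the one-against-the-rest scheme: each class is trained against the rest with an ordinary primal perceptron (chosen here only to keep the example self-contained), and prediction takes the arg max of the per-class linear functions. The data and helper names are illustrative.

```python
import numpy as np

def train_binary_perceptron(X, y, epochs=100):
    """Plain primal perceptron; y has labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    R2 = np.max(np.sum(X**2, axis=1))
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:
                w, b, mistakes = w + yi * xi, b + yi * R2, mistakes + 1
        if mistakes == 0:
            break
    return w, b

def one_vs_rest(X, labels):
    """Train one 'class c against the rest' classifier per class."""
    classes = np.unique(labels)
    models = {c: train_binary_perceptron(X, np.where(labels == c, 1, -1)) for c in classes}
    return classes, models

def predict(x, classes, models):
    # decision rule: pick the class whose linear function scores highest
    scores = [models[c][0] @ x + models[c][1] for c in classes]
    return classes[int(np.argmax(scores))]

X = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, 2.0], [-3.0, 2.5], [0.0, -3.0], [0.5, -2.5]])
labels = np.array([0, 0, 1, 1, 2, 2])
classes, models = one_vs_rest(X, labels)
print([predict(x, classes, models) for x in X])
```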

8 Linear Regression
- Use a linear function to interpolate the training set.
- Given the training set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$ with $\mathbf{x}_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, find a linear function $f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b$.
- The most popular criterion is the least squares approach, where $(\mathbf{w}, b)$ is determined by solving the minimization problem
$\min_{\mathbf{w}, b} \; L(\mathbf{w}, b) = \sum_{i=1}^{\ell} \big(y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle - b\big)^2.$
- The function $\big(y_i - f(\mathbf{x}_i)\big)^2$ is called the square loss function.
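A small sketch of the least squares fit using `numpy.linalg.lstsq` on synthetic data (the data-generating coefficients are made up); the bias $b$ is handled by appending a column of ones.

```python
import numpy as np

# Fit f(x) = <w, x> + b by minimizing the sum of squared residuals.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] - X[:, 1] + 1 + 0.1 * rng.normal(size=50)   # illustrative targets

A = np.hstack([X, np.ones((50, 1))])        # append a column of ones for the bias b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]
square_loss = np.sum((y - (X @ w + b))**2)  # the criterion being minimized
print(w, b, square_loss)
```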

9 (Figure slide: illustration of fitting a linear function to the training data; image not preserved in the transcript.)

10 Linear Regression (Cont.)
Different measures of loss are possible (a short sketch computing them follows this list):
- 1-norm loss function: $\sum_{i=1}^{\ell} \big|y_i - f(\mathbf{x}_i)\big|$
- $\varepsilon$-insensitive loss function: $\sum_{i=1}^{\ell} \max\big(0, |y_i - f(\mathbf{x}_i)| - \varepsilon\big)$
- Huber's regression: quadratic loss for small residuals, linear loss for large residuals
- Ridge regression: $\min_{\mathbf{w}, b} \; \lambda\|\mathbf{w}\|_2^2 + \sum_{i=1}^{\ell} \big(y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle - b\big)^2$, where $\lambda > 0$
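For concreteness, the sketch below computes the alternative losses on a vector of residuals; the parameter values ($\varepsilon = 0.1$, Huber $\delta = 1.0$) are illustrative choices, and the Huber formula used is the standard one, not necessarily the exact variant in the textbook.

```python
import numpy as np

def one_norm_loss(residuals):
    return np.sum(np.abs(residuals))

def eps_insensitive_loss(residuals, eps=0.1):
    # residuals inside the epsilon tube cost nothing
    return np.sum(np.maximum(0.0, np.abs(residuals) - eps))

def huber_loss(residuals, delta=1.0):
    # quadratic for small residuals, linear for large ones
    a = np.abs(residuals)
    return np.sum(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)))

r = np.array([0.05, -0.2, 1.5, -3.0])
print(one_norm_loss(r), eps_insensitive_loss(r), huber_loss(r))
```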

11 The Widrow-Hoff Algorithm (Primal Form)
Given a training set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$ and a learning rate $\eta > 0$.
Initialize: $\mathbf{w} \leftarrow \mathbf{0}$, $b \leftarrow 0$
Repeat:
  for $i = 1$ to $\ell$:
    $(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \eta\,\big(\langle \mathbf{w}, \mathbf{x}_i \rangle + b - y_i\big)\,(\mathbf{x}_i, 1)$
Until the convergence criterion is satisfied
return $(\mathbf{w}, b)$
- Minimizes the square loss function using gradient descent.
- A dual form exists (i.e. $\mathbf{w} = \sum_{i=1}^{\ell} \alpha_i \mathbf{x}_i$). (Note: there is a typo in the textbook here.)
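A minimal sketch of the Widrow-Hoff (LMS) update as reconstructed above, with an illustrative learning rate and a simple change-in-loss stopping rule standing in for the unspecified convergence criterion.

```python
import numpy as np

def widrow_hoff(X, y, eta=0.01, epochs=200, tol=1e-8):
    """Minimize the square loss with per-example gradient steps (LMS rule)."""
    w, b = np.zeros(X.shape[1]), 0.0
    prev_loss = np.inf
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = (w @ xi + b) - yi           # residual of the current example
            w -= eta * err * xi
            b -= eta * err
        loss = np.sum((X @ w + b - y)**2)
        if abs(prev_loss - loss) < tol:       # simple convergence criterion
            break
        prev_loss = loss
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] + 0.5 * X[:, 1] - 2           # illustrative linear targets
print(widrow_hoff(X, y))
```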

12 Gradient and Hessian
- Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a differentiable function. The gradient of $f$ at a point $\mathbf{x}$ is defined as
$\nabla f(\mathbf{x}) = \left[\frac{\partial f(\mathbf{x})}{\partial x_1}, \dots, \frac{\partial f(\mathbf{x})}{\partial x_n}\right]^{\top}.$
- If $f$ is twice differentiable, the Hessian matrix of $f$ at a point $\mathbf{x}$ is the $n \times n$ matrix defined as
$\big(\nabla^2 f(\mathbf{x})\big)_{ij} = \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}.$
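A worked example tying these definitions to the least squares objective $f(\mathbf{w}) = \|A\mathbf{w} - \mathbf{y}\|^2$, whose gradient is $2A^{\top}(A\mathbf{w} - \mathbf{y})$ and whose Hessian is the constant matrix $2A^{\top}A$; the finite-difference check and the random data are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))
y = rng.normal(size=20)

def f(w):
    return np.sum((A @ w - y)**2)

w0 = rng.normal(size=3)
grad = 2 * A.T @ (A @ w0 - y)          # gradient of f at w0
hess = 2 * A.T @ A                     # Hessian of f (constant in w)

# finite-difference check of the gradient
eps = 1e-6
num_grad = np.array([(f(w0 + eps * e) - f(w0 - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.max(np.abs(grad - num_grad)))  # should be close to zero
```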

13 Solution of the Least Squares Problem: The Normal Equations
Notation: stack the training inputs (each with an appended 1 for the bias) as the rows of $A \in \mathbb{R}^{\ell \times (n+1)}$, stack the targets into $\mathbf{y} \in \mathbb{R}^{\ell}$, and write $\bar{\mathbf{w}} = (\mathbf{w}, b)$.
Find $\bar{\mathbf{w}}$ such that $\|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2$ has the smallest value, i.e.
$\min_{\bar{\mathbf{w}}} \; f(\bar{\mathbf{w}}) = \|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2.$
This is an unconstrained quadratic minimization problem; $\bar{\mathbf{w}}^{*}$ is an optimal solution if and only if $\nabla f(\bar{\mathbf{w}}^{*}) = 2A^{\top}(A\bar{\mathbf{w}}^{*} - \mathbf{y}) = \mathbf{0}$.

14 The Normal Equations of LSQ
Setting the gradient to zero, we have the normal equations of LSQ:
$A^{\top}A\,\bar{\mathbf{w}} = A^{\top}\mathbf{y}.$
If $A^{\top}A$ is invertible, then $\bar{\mathbf{w}}^{*} = (A^{\top}A)^{-1}A^{\top}\mathbf{y}$.
Note: the above result is based on the first-order optimality conditions (necessary and sufficient for differentiable convex minimization problems).
What if $A^{\top}A$ is singular?
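A sketch of solving the normal equations with `numpy.linalg.solve`, plus one way to handle the singular case via the pseudo-inverse (`numpy.linalg.pinv`), which returns the minimum-norm least squares solution; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 2 + 0.05 * rng.normal(size=30)
A = np.hstack([X, np.ones((30, 1))])       # design matrix with a bias column

# Normal equations: (A^T A) w_bar = A^T y
w_bar = np.linalg.solve(A.T @ A, A.T @ y)
print(w_bar)

# If A^T A is singular (e.g. linearly dependent columns), the inverse does not
# exist; the pseudo-inverse picks the minimum-norm least squares solution.
A_sing = np.hstack([A, A[:, :1]])          # repeat a column to force singularity
w_min_norm = np.linalg.pinv(A_sing) @ y
print(w_min_norm)
```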

15 Ridge Regression (Existence Guaranteed)
$\min_{\bar{\mathbf{w}}} \; \lambda\|\bar{\mathbf{w}}\|_2^2 + \|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2, \quad \text{where } \lambda > 0.$
The solution satisfies $(A^{\top}A + \lambda I)\,\bar{\mathbf{w}} = A^{\top}\mathbf{y}$; since $A^{\top}A + \lambda I$ is positive definite for any $\lambda > 0$, the solution always exists and is unique.
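A sketch of the ridge solution; for simplicity it regularizes the whole augmented vector (including the bias), which is a slight simplification of the formulation above, and the $\lambda$ value and data are illustrative.

```python
import numpy as np

def ridge(A, y, lam):
    """Solve (A^T A + lambda I) w_bar = A^T y; solvable for any lambda > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 2
A = np.hstack([X, X[:, :1], np.ones((30, 1))])   # deliberately redundant column
print(ridge(A, y, lam=1e-2))   # well defined even though A^T A is singular
```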

