The Perceptron Algorithm (Dual Form)


1 The Perceptron Algorithm (Dual Form)
Given a linearly separable training set $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_\ell, y_\ell)\}$ and $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$.
Initialize: $\boldsymbol{\alpha} \leftarrow \mathbf{0}$, $b \leftarrow 0$
Repeat:
  for $i = 1$ to $\ell$:
    if $y_i \big( \sum_{j=1}^{\ell} \alpha_j y_j \langle \mathbf{x}_j, \mathbf{x}_i \rangle + b \big) \le 0$ then
      $\alpha_i \leftarrow \alpha_i + 1$, $\quad b \leftarrow b + y_i R^2$
until no mistakes are made within the for loop
return $(\boldsymbol{\alpha}, b)$
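A minimal NumPy sketch of the dual-form update, assuming the reconstruction above (per-example mistake counters alpha, bias updated by $y_i R^2$); the toy data set and the function name `dual_perceptron` are illustrative, not from the slides.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual-form perceptron: X is (l, n), y is (l,) with labels in {-1, +1}."""
    l = X.shape[0]
    G = X @ X.T                          # Gram matrix of inner products <x_j, x_i>
    R2 = np.max(np.sum(X**2, axis=1))    # R^2 = max_i ||x_i||^2
    alpha, b = np.zeros(l), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            # the decision value uses only alpha and Gram-matrix entries
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1
                b += y[i] * R2
                mistakes += 1
        if mistakes == 0:                # no mistakes within the for loop: stop
            break
    return alpha, b

# toy linearly separable example
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
alpha, b = dual_perceptron(X, y)
w = (alpha * y) @ X                      # recover the primal weight vector
print(alpha, b, w)
```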

2 What Do We Get from the Dual Form Perceptron Algorithm?
- The number of updates equals $\sum_{i=1}^{\ell} \alpha_i = \|\boldsymbol{\alpha}\|_1$.
- $\alpha_i > 0$ implies that the training point $(\mathbf{x}_i, y_i)$ has been misclassified at least once during the training process.
- $\alpha_i = 0$ implies that removing the training point $(\mathbf{x}_i, y_i)$ will not affect the final result.
- The training data only appear in the algorithm through the entries of the Gram matrix, defined as $G \in \mathbb{R}^{\ell \times \ell}$ with $G_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$.
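For illustration, the Gram matrix that the dual algorithm relies on can be formed in one line; this small sketch (data and names are hypothetical) checks the matrix form against the entrywise definition.

```python
import numpy as np

X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0]])   # rows are training points
G = X @ X.T                                             # G[i, j] = <x_i, x_j>
# equivalent entrywise definition
G_check = np.array([[X[i] @ X[j] for j in range(len(X))] for i in range(len(X))])
assert np.allclose(G, G_check)
print(G)
```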

3 The Margin Slack Variable of $(\mathbf{x}_i, y_i)$ with Respect to $(\mathbf{w}, b)$ and $\gamma$
For a fixed value $\gamma > 0$, called the target margin, we define the margin slack variable of the training point $(\mathbf{x}_i, y_i)$ with respect to the hyperplane $(\mathbf{w}, b)$ and target margin $\gamma$ as
$\xi_i = \max\big(0,\; \gamma - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big).$
If $\xi_i > \gamma$, then $\mathbf{x}_i$ is misclassified by the hyperplane $(\mathbf{w}, b)$.
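A short sketch of the slack computation under the definition above; the hyperplane $(\mathbf{w}, b)$, the target margin $\gamma$, and the sample points are made up for illustration.

```python
import numpy as np

def margin_slack(X, y, w, b, gamma):
    """xi_i = max(0, gamma - y_i (<w, x_i> + b)) for each training point."""
    functional_margins = y * (X @ w + b)
    return np.maximum(0.0, gamma - functional_margins)

# hypothetical hyperplane and target margin, for illustration only
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [0.5, -0.5]])
y = np.array([1, 1, -1, -1])
w, b, gamma = np.array([1.0, 1.0]) / np.sqrt(2), 0.0, 1.0
xi = margin_slack(X, y, w, b, gamma)
print(xi)      # any xi_i > gamma flags a misclassified point
```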

4 (Figure slide: geometric illustration of the margin slack variables; image not preserved in the transcript.)

5 Bound on the Number of Mistakes in One for Loop of the Perceptron Algorithm
Theorem 2.7 (Freund & Schapire). Let $S$ be a non-trivial training set with no duplicate examples, with $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$. Let $(\mathbf{w}, b)$ be any hyperplane with $\|\mathbf{w}\| = 1$, let $\gamma > 0$, and define
$D = \sqrt{\sum_{i=1}^{\ell} \xi_i^2}, \quad \text{where } \xi_i = \max\big(0, \gamma - y_i(\langle \mathbf{w}, \mathbf{x}_i \rangle + b)\big).$
Then the number of mistakes in the first execution of the for loop of the Perceptron Algorithm on $S$ is bounded by
$\left(\frac{2(R + D)}{\gamma}\right)^2.$

6 Fisher's Linear Discriminant
Find the direction onto which the projection of the data is maximally separated. We maximize
$F(\mathbf{w}) = \frac{\big(\mu_{\mathbf{w}}^{+} - \mu_{\mathbf{w}}^{-}\big)^2}{\big(\sigma_{\mathbf{w}}^{+}\big)^2 + \big(\sigma_{\mathbf{w}}^{-}\big)^2},$
where $\mu_{\mathbf{w}}^{\pm}$ and $\sigma_{\mathbf{w}}^{\pm}$ are respectively the mean and standard deviation of the function output values $f(\mathbf{x}_i) = \langle \mathbf{w}, \mathbf{x}_i \rangle + b$ over the positive and negative examples.
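As a rough illustration, the sketch below evaluates this ratio for two candidate directions on made-up two-class data; the helper name `fisher_criterion` and the data are assumptions, and `np.var` (population variance) is used for the spread of the projected outputs.

```python
import numpy as np

def fisher_criterion(X, y, w, b=0.0):
    """Ratio of squared mean separation to summed variances of the projections."""
    f = X @ w + b                         # function output values
    fp, fn = f[y == 1], f[y == -1]
    return (fp.mean() - fn.mean())**2 / (fp.var() + fn.var())

X = np.array([[2.0, 2.0], [1.0, 3.0], [3.0, 1.5],
              [-2.0, -1.0], [-1.0, -3.0], [-2.5, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])
print(fisher_criterion(X, y, np.array([1.0, 1.0])))
print(fisher_criterion(X, y, np.array([1.0, -1.0])))  # a worse direction scores lower
```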

7 Multi-class Classification
- Extension of binary classification.
- Equivalent to our previous rule: the binary decision $\mathrm{sgn}(\langle \mathbf{w}, \mathbf{x} \rangle + b)$ amounts to picking whichever of $(\mathbf{w}, b)$ and $(-\mathbf{w}, -b)$ gives the larger value.
- For multi-class (one against the rest): train one linear function $(\mathbf{w}_c, b_c)$ per class $c$ and predict $c(\mathbf{x}) = \arg\max_{c}\big(\langle \mathbf{w}_c, \mathbf{x} \rangle + b_c\big)$ (a small sketch follows this list).
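A hedged sketch of the one-against-the-rest scheme: each class is trained against the rest with an ordinary primal perceptron (chosen here only to keep the example self-contained), and prediction takes the arg max of the per-class linear functions. The data and helper names are illustrative.

```python
import numpy as np

def train_binary_perceptron(X, y, epochs=100):
    """Plain primal perceptron; y has labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    R2 = np.max(np.sum(X**2, axis=1))
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:
                w, b, mistakes = w + yi * xi, b + yi * R2, mistakes + 1
        if mistakes == 0:
            break
    return w, b

def one_vs_rest(X, labels):
    """Train one 'class c against the rest' classifier per class."""
    classes = np.unique(labels)
    models = {c: train_binary_perceptron(X, np.where(labels == c, 1, -1)) for c in classes}
    return classes, models

def predict(x, classes, models):
    # decision rule: pick the class whose linear function scores highest
    scores = [models[c][0] @ x + models[c][1] for c in classes]
    return classes[int(np.argmax(scores))]

X = np.array([[2.0, 2.0], [3.0, 2.5], [-2.0, 2.0], [-3.0, 2.5], [0.0, -3.0], [0.5, -2.5]])
labels = np.array([0, 0, 1, 1, 2, 2])
classes, models = one_vs_rest(X, labels)
print([predict(x, classes, models) for x in X])
```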

8 Linear Regression
- Use a linear function to interpolate the training set.
- Given the training set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$ with $\mathbf{x}_i \in \mathbb{R}^n$ and $y_i \in \mathbb{R}$, find a linear function $f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b$.
- The most popular criterion is the least squares approach, where $(\mathbf{w}, b)$ is determined by solving the minimization problem
$\min_{\mathbf{w}, b} \; L(\mathbf{w}, b) = \sum_{i=1}^{\ell} \big(y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle - b\big)^2.$
- The function $\big(y_i - f(\mathbf{x}_i)\big)^2$ is called the square loss function.
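A small sketch of the least squares fit using `numpy.linalg.lstsq` on synthetic data (the data-generating coefficients are made up); the bias $b$ is handled by appending a column of ones.

```python
import numpy as np

# Fit f(x) = <w, x> + b by minimizing the sum of squared residuals.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 2 * X[:, 0] - X[:, 1] + 1 + 0.1 * rng.normal(size=50)   # illustrative targets

A = np.hstack([X, np.ones((50, 1))])        # append a column of ones for the bias b
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]
square_loss = np.sum((y - (X @ w + b))**2)  # the criterion being minimized
print(w, b, square_loss)
```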

9 (Figure slide: illustration of fitting a linear function to the training data; image not preserved in the transcript.)

10 Linear Regression (Cont.)
Different measures of loss are possible (a short sketch computing them follows this list):
- 1-norm loss function: $\sum_{i=1}^{\ell} \big|y_i - f(\mathbf{x}_i)\big|$
- $\varepsilon$-insensitive loss function: $\sum_{i=1}^{\ell} \max\big(0, |y_i - f(\mathbf{x}_i)| - \varepsilon\big)$
- Huber's regression: quadratic loss for small residuals, linear loss for large residuals
- Ridge regression: $\min_{\mathbf{w}, b} \; \lambda\|\mathbf{w}\|_2^2 + \sum_{i=1}^{\ell} \big(y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle - b\big)^2$, where $\lambda > 0$
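For concreteness, the sketch below computes the alternative losses on a vector of residuals; the parameter values ($\varepsilon = 0.1$, Huber $\delta = 1.0$) are illustrative choices, and the Huber formula used is the standard one, not necessarily the exact variant in the textbook.

```python
import numpy as np

def one_norm_loss(residuals):
    return np.sum(np.abs(residuals))

def eps_insensitive_loss(residuals, eps=0.1):
    # residuals inside the epsilon tube cost nothing
    return np.sum(np.maximum(0.0, np.abs(residuals) - eps))

def huber_loss(residuals, delta=1.0):
    # quadratic for small residuals, linear for large ones
    a = np.abs(residuals)
    return np.sum(np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta)))

r = np.array([0.05, -0.2, 1.5, -3.0])
print(one_norm_loss(r), eps_insensitive_loss(r), huber_loss(r))
```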

11 The Widrow-Hoff Algorithm (Primal Form)
Given a training set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$ and a learning rate $\eta > 0$.
Initialize: $\mathbf{w} \leftarrow \mathbf{0}$, $b \leftarrow 0$
Repeat:
  for $i = 1$ to $\ell$:
    $(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \eta\,\big(\langle \mathbf{w}, \mathbf{x}_i \rangle + b - y_i\big)\,(\mathbf{x}_i, 1)$
Until the convergence criterion is satisfied
return $(\mathbf{w}, b)$
- Minimizes the square loss function using gradient descent.
- A dual form exists (i.e. $\mathbf{w} = \sum_{i=1}^{\ell} \alpha_i \mathbf{x}_i$). (Note: there is a typo in the textbook here.)
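A minimal sketch of the Widrow-Hoff (LMS) update as reconstructed above, with an illustrative learning rate and a simple change-in-loss stopping rule standing in for the unspecified convergence criterion.

```python
import numpy as np

def widrow_hoff(X, y, eta=0.01, epochs=200, tol=1e-8):
    """Minimize the square loss with per-example gradient steps (LMS rule)."""
    w, b = np.zeros(X.shape[1]), 0.0
    prev_loss = np.inf
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = (w @ xi + b) - yi           # residual of the current example
            w -= eta * err * xi
            b -= eta * err
        loss = np.sum((X @ w + b - y)**2)
        if abs(prev_loss - loss) < tol:       # simple convergence criterion
            break
        prev_loss = loss
    return w, b

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 3 * X[:, 0] + 0.5 * X[:, 1] - 2           # illustrative linear targets
print(widrow_hoff(X, y))
```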

12 Gradient and Hessian
- Let $f : \mathbb{R}^n \rightarrow \mathbb{R}$ be a differentiable function. The gradient of $f$ at a point $\mathbf{x}$ is defined as
$\nabla f(\mathbf{x}) = \left[\frac{\partial f(\mathbf{x})}{\partial x_1}, \dots, \frac{\partial f(\mathbf{x})}{\partial x_n}\right]^{\top}.$
- If $f$ is twice differentiable, the Hessian matrix of $f$ at a point $\mathbf{x}$ is the $n \times n$ matrix defined as
$\big(\nabla^2 f(\mathbf{x})\big)_{ij} = \frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}.$
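A worked example tying these definitions to the least squares objective $f(\mathbf{w}) = \|A\mathbf{w} - \mathbf{y}\|^2$, whose gradient is $2A^{\top}(A\mathbf{w} - \mathbf{y})$ and whose Hessian is the constant matrix $2A^{\top}A$; the finite-difference check and the random data are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))
y = rng.normal(size=20)

def f(w):
    return np.sum((A @ w - y)**2)

w0 = rng.normal(size=3)
grad = 2 * A.T @ (A @ w0 - y)          # gradient of f at w0
hess = 2 * A.T @ A                     # Hessian of f (constant in w)

# finite-difference check of the gradient
eps = 1e-6
num_grad = np.array([(f(w0 + eps * e) - f(w0 - eps * e)) / (2 * eps)
                     for e in np.eye(3)])
print(np.max(np.abs(grad - num_grad)))  # should be close to zero
```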

13 Solution of the Least Squares Problem: The Normal Equations
Notation: stack the training inputs (each with an appended 1 for the bias) as the rows of $A \in \mathbb{R}^{\ell \times (n+1)}$, stack the targets into $\mathbf{y} \in \mathbb{R}^{\ell}$, and write $\bar{\mathbf{w}} = (\mathbf{w}, b)$.
Find $\bar{\mathbf{w}}$ such that $\|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2$ has the smallest value, i.e.
$\min_{\bar{\mathbf{w}}} \; f(\bar{\mathbf{w}}) = \|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2.$
This is an unconstrained quadratic minimization problem; $\bar{\mathbf{w}}^{*}$ is an optimal solution if and only if $\nabla f(\bar{\mathbf{w}}^{*}) = 2A^{\top}(A\bar{\mathbf{w}}^{*} - \mathbf{y}) = \mathbf{0}$.

14 The Normal Equations of LSQ
Setting the gradient to zero, we have the normal equations of LSQ:
$A^{\top}A\,\bar{\mathbf{w}} = A^{\top}\mathbf{y}.$
If $A^{\top}A$ is invertible, then $\bar{\mathbf{w}}^{*} = (A^{\top}A)^{-1}A^{\top}\mathbf{y}$.
Note: the above result is based on the first-order optimality conditions (necessary and sufficient for differentiable convex minimization problems).
What if $A^{\top}A$ is singular?
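A sketch of solving the normal equations with `numpy.linalg.solve`, plus one way to handle the singular case via the pseudo-inverse (`numpy.linalg.pinv`), which returns the minimum-norm least squares solution; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 2 + 0.05 * rng.normal(size=30)
A = np.hstack([X, np.ones((30, 1))])       # design matrix with a bias column

# Normal equations: (A^T A) w_bar = A^T y
w_bar = np.linalg.solve(A.T @ A, A.T @ y)
print(w_bar)

# If A^T A is singular (e.g. linearly dependent columns), the inverse does not
# exist; the pseudo-inverse picks the minimum-norm least squares solution.
A_sing = np.hstack([A, A[:, :1]])          # repeat a column to force singularity
w_min_norm = np.linalg.pinv(A_sing) @ y
print(w_min_norm)
```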

15 Ridge Regression (Existence Guaranteed)
$\min_{\bar{\mathbf{w}}} \; \lambda\|\bar{\mathbf{w}}\|_2^2 + \|A\bar{\mathbf{w}} - \mathbf{y}\|_2^2, \quad \text{where } \lambda > 0.$
The solution satisfies $(A^{\top}A + \lambda I)\,\bar{\mathbf{w}} = A^{\top}\mathbf{y}$; since $A^{\top}A + \lambda I$ is positive definite for any $\lambda > 0$, the solution always exists and is unique.
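A sketch of the ridge solution; for simplicity it regularizes the whole augmented vector (including the bias), which is a slight simplification of the formulation above, and the $\lambda$ value and data are illustrative.

```python
import numpy as np

def ridge(A, y, lam):
    """Solve (A^T A + lambda I) w_bar = A^T y; solvable for any lambda > 0."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + 2
A = np.hstack([X, X[:, :1], np.ones((30, 1))])   # deliberately redundant column
print(ridge(A, y, lam=1e-2))   # well defined even though A^T A is singular
```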

