Presentation on theme: "Generalization Error of pac Model  Let be a set of training examples chosen i.i.d. according to  Treat the generalization error as a r.v. depending on."— Presentation transcript:

1 Generalization Error of PAC Model  Let $S = \{(x^1,y_1),\dots,(x^\ell,y_\ell)\}$ be a set of training examples chosen i.i.d. according to a fixed but unknown distribution $\mathcal{D}$  Treat the generalization error $err(h_S)$ as a r.v. depending on the random selection of $S$  Find a bound on the tail of the distribution of the r.v. $err(h_S)$ in the form $\varepsilon = \varepsilon(\ell, \mathcal{H}, \delta)$  $\varepsilon$ is a function of $\ell$ and $\delta$, where $\delta$ is the confidence level of the error bound, which is given by the learner

2 Probably Approximately Correct  We assert: $\Pr(\{err(h_S) > \varepsilon\}) < \delta$, or equivalently $\Pr(\{err(h_S) \le \varepsilon\}) \ge 1 - \delta$  With probability at least $1-\delta$, the error made by the hypothesis $h_S$ will be less than the error bound $\varepsilon$, which does not depend on the unknown distribution $\mathcal{D}$

3 Find the Hypothesis with Minimum Expected Risk?  Let the training examples $S$ be chosen i.i.d. according to $\mathcal{D}$ with the probability density $p(x,y)$  The expected misclassification error made by $h$ is $R[h] = \int \frac{1}{2}\,|y - h(x)|\, dp(x,y)$  The ideal hypothesis $h^*$ should have the smallest expected risk: $R[h^*] \le R[h]\ \forall h \in \mathcal{H}$. Unrealistic, since $p(x,y)$ is unknown!!!

4 Empirical Risk Minimization (ERM)  Replace the expected risk over $p(x,y)$ by an average over the training examples  The empirical risk: $R_{emp}[h] = \frac{1}{\ell}\sum_{i=1}^{\ell}\frac{1}{2}\,|y_i - h(x^i)|$  Find the hypothesis $h_S$ with the smallest empirical risk: $R_{emp}[h_S] \le R_{emp}[h]\ \forall h \in \mathcal{H}$ ($\mathcal{D}$ and $p(x,y)$ are not needed)  Only focusing on empirical risk will cause overfitting
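As a quick illustration of the empirical-risk formula above, here is a minimal sketch in Python. The threshold hypothesis `h` and the toy sample `S` are illustrative choices, not part of the slides; labels are taken in $\{-1,+1\}$ so that $\frac{1}{2}|y - h(x)|$ is the 0/1 loss.

```python
# Sketch: empirical risk R_emp[h] = (1/l) * sum_i (1/2)|y_i - h(x_i)|
# for labels y_i in {-1, +1}. The hypothesis h below is a hypothetical
# 1-D threshold rule used only to make the formula concrete.

def h(x):
    """Hypothetical hypothesis: predict +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def empirical_risk(samples):
    """Average 0/1 loss, written as (1/2)|y - h(x)| per the slide."""
    return sum(abs(y - h(x)) / 2 for x, y in samples) / len(samples)

S = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.0, 1), (-0.1, 1)]
print(empirical_risk(S))  # one of five points misclassified -> 0.2
```

Driving the empirical risk to zero on `S` alone is exactly the overfitting risk the slide warns about.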

5 VC Confidence (The Bound between $R[h]$ and $R_{emp}[h]$)  The following inequality holds with probability $1-\delta$: $R[h] \le R_{emp}[h] + \sqrt{\dfrac{v\left(\log(2\ell/v)+1\right) - \log(\delta/4)}{\ell}}$, where $v$ is the VC-dimension of $\mathcal{H}$ C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery 2 (2) (1998), pp. 121–167
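The VC confidence term can be evaluated directly. The sketch below plugs illustrative values of $\ell$, $v$, and $\delta$ into the square-root term of the bound; these numbers are examples, not values from the slides.

```python
import math

# Sketch: the VC confidence term
#   sqrt((v*(log(2l/v) + 1) - log(delta/4)) / l)
# from the generalization bound above. The values of l, v, delta below
# are illustrative.

def vc_confidence(l, v, delta):
    """VC confidence for l examples, VC-dimension v, confidence delta."""
    return math.sqrt((v * (math.log(2 * l / v) + 1) - math.log(delta / 4)) / l)

for l in (100, 1000, 10000):
    print(l, round(vc_confidence(l, v=3, delta=0.05), 3))
# the confidence term shrinks as the number of training examples l grows
```

This makes the trade-off visible: the gap between expected and empirical risk tightens with more data and loosens with a richer (higher-VC-dimension) hypothesis space.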

6 Capacity (Complexity) of Hypothesis Space $\mathcal{H}$: VC-dimension  A given training set $S$ is shattered by $\mathcal{H}$ if and only if for every labeling of $S$ there exists some $h \in \mathcal{H}$ consistent with this labeling  Three (linearly independent) points can be shattered by hyperplanes in $\mathbb{R}^2$

7 Shattering Points with Hyperplanes in $\mathbb{R}^n$ Theorem: Consider some set of m points in $\mathbb{R}^n$. Choose any one of the points as origin. Then the m points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining m−1 points are linearly independent. Can you always shatter three points with a line in $\mathbb{R}^2$?
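By the theorem above, three points in $\mathbb{R}^2$ can be shattered by oriented lines iff, after choosing one point as origin, the other two position vectors are linearly independent, i.e. the three points are not collinear. A minimal sketch of that test (the example points are illustrative):

```python
# Sketch: three points in R^2 are shatterable by oriented lines iff they
# are not collinear, i.e. the 2x2 determinant of the two position vectors
# (relative to the first point, taken as origin) is nonzero.

def shatterable_by_line(p, q, r):
    """True iff the three 2-D points are NOT collinear."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    det = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    return det != 0

print(shatterable_by_line((0, 0), (1, 0), (0, 1)))  # True: general position
print(shatterable_by_line((0, 0), (1, 1), (2, 2)))  # False: collinear
```

So the answer to the slide's question is no: three collinear points with the middle one labeled oppositely to the outer two cannot be separated by any line.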

8 Definition of VC-dimension (A Capacity Measure of Hypothesis Space $\mathcal{H}$)  The Vapnik-Chervonenkis dimension, $VC(\mathcal{H})$, of hypothesis space $\mathcal{H}$ defined over the input space $X$ is the size of the largest (existent) finite subset of $X$ shattered by $\mathcal{H}$  If arbitrarily large finite sets of $X$ can be shattered by $\mathcal{H}$, then $VC(\mathcal{H}) \equiv \infty$  Let $\mathcal{H} = \{\text{all hyperplanes in } \mathbb{R}^n\}$, then $VC(\mathcal{H}) = n+1$

9 Optimization Problem Formulation Problem setting: Given functions $f$, $g_i,\ i=1,\dots,k$, and $h_j,\ j=1,\dots,m$, defined on a domain $\Omega \subseteq \mathbb{R}^n$: $\min_{x \in \Omega} f(x)$ subject to $g_i(x) \le 0,\ i=1,\dots,k$ and $h_j(x) = 0,\ j=1,\dots,m$, where $f(x)$ is called the objective function and $g_i(x) \le 0$, $h_j(x) = 0$ are called constraints.

10 Definitions and Notation  Feasible region: $\mathcal{F} = \{x \in \Omega \mid g(x) \le 0,\ h(x) = 0\}$, where $g(x) = (g_1(x),\dots,g_k(x))$ and $h(x) = (h_1(x),\dots,h_m(x))$  A solution of the optimization problem is a point $x^* \in \mathcal{F}$ such that $f(x^*) \le f(x)$ for all $x \in \mathcal{F}$; $x^*$ is called a global minimum.

11 Definitions and Notation  A point $\bar{x} \in \mathcal{F}$ is called a local minimum of the optimization problem if $\exists\, \varepsilon > 0$ such that $f(\bar{x}) \le f(x)$ for all $x \in \mathcal{F}$ with $\|x - \bar{x}\| < \varepsilon$  At the solution $x^*$, an inequality constraint $g_i(x) \le 0$ is said to be active if $g_i(x^*) = 0$; otherwise it is called an inactive constraint  $g_i(x) \le 0 \Leftrightarrow g_i(x) + s_i = 0,\ s_i \ge 0$, where $s_i$ is called the slack variable
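The active/inactive distinction and the slack variable can be checked mechanically at a candidate point. A minimal sketch, with two illustrative constraints (not taken from the slides):

```python
# Sketch: classify inequality constraints g_i(x) <= 0 as active or
# inactive at a point x, reporting the slack s_i = -g_i(x) from
# g_i(x) + s_i = 0, s_i >= 0. The constraints g1, g2 are illustrative.

def constraint_status(constraints, x, tol=1e-9):
    """Return (name, 'active' or 'inactive', slack) for each g(x) <= 0."""
    rows = []
    for name, g in constraints:
        val = g(x)
        rows.append((name, "active" if abs(val) <= tol else "inactive", -val))
    return rows

constraints = [("g1", lambda x: x[0] + x[1] - 2),  # active at (1, 1)
               ("g2", lambda x: x[0] - 5)]         # inactive at (1, 1)
for row in constraint_status(constraints, (1.0, 1.0)):
    print(row)
```

At $(1,1)$, `g1` holds with equality (slack 0, active) while `g2` has slack 4 (inactive); the next slide's point is that dropping `g2` would not change the optimum.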

12 Definitions and Notation  Removing an inactive constraint from an optimization problem will NOT affect the optimal solution  Very useful feature in SVM  If $\mathcal{F} = \Omega = \mathbb{R}^n$, then the problem is called an unconstrained minimization problem  SSVM formulation is in this category  Difficult to find the global minimum without a convexity assumption  Least squares problem is in this category

13 Gradient and Hessian  Let $f: \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. The gradient of function $f$ at a point $x$ is defined as $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x)\right] \in \mathbb{R}^n$  If $f: \mathbb{R}^n \to \mathbb{R}$ is a twice differentiable function, the Hessian matrix of $f$ at a point $x$ is defined as $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ with entries $\left[\nabla^2 f(x)\right]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$
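These definitions can be approximated numerically by central finite differences, which is a handy sanity check against hand-derived gradients. The function $f(x) = x_1^2 + 3x_1x_2$ below is an illustrative choice (analytic gradient $[2x_1 + 3x_2,\ 3x_1]$, Hessian $\begin{bmatrix}2 & 3\\ 3 & 0\end{bmatrix}$):

```python
# Sketch: central finite-difference approximations to the gradient and
# Hessian; f is an illustrative quadratic, not from the slides.

def grad(f, x, h=1e-5):
    """Central-difference gradient: (f(x+h*e_i) - f(x-h*e_i)) / (2h)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def hessian(f, x, h=1e-4):
    """Hessian via central differences of the gradient components."""
    n = len(x)
    return [[(grad(f, [xk + (h if k == i else 0) for k, xk in enumerate(x)])[j]
              - grad(f, [xk - (h if k == i else 0) for k, xk in enumerate(x)])[j])
             / (2 * h)
             for j in range(n)] for i in range(n)]

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
print(grad(f, [1.0, 2.0]))     # approx [8.0, 3.0]
print(hessian(f, [1.0, 2.0]))  # approx [[2, 3], [3, 0]]
```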

14 Algebra of the Classification Problem 2-Category Linearly Separable Case  Given m points in the n-dimensional real space $\mathbb{R}^n$  Represented by an $m \times n$ matrix $A$; each point belongs to class $A^+$ or $A^-$  Membership of each point in the classes is specified by an $m \times m$ diagonal matrix $D$: $D_{ii} = 1$ if $A_i \in A^+$ and $D_{ii} = -1$ if $A_i \in A^-$  Separate $A^+$ and $A^-$ by two bounding planes $x'w = \gamma \pm 1$ such that: $A_i w \ge \gamma + 1$ for $D_{ii} = 1$ and $A_i w \le \gamma - 1$ for $D_{ii} = -1$  More succinctly: $D(Aw - e\gamma) \ge e$, where $e = [1, 1, \dots, 1]' \in \mathbb{R}^m$
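The succinct condition $D(Aw - e\gamma) \ge e$ is easy to verify row by row. A minimal sketch on a toy 2-D data set; the points, labels, and the candidate plane $(w, \gamma)$ are all illustrative:

```python
# Sketch: verify the bounding-plane condition D(Aw - e*gamma) >= e,
# i.e. D_ii * (A_i . w - gamma) >= 1 for every row. Data and plane
# are illustrative.

A = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]]  # rows A_i
d = [1, 1, -1, -1]                                    # diagonal of D
w, gamma = [1.0, 1.0], 2.0                            # candidate plane x'w = gamma

def bounding_planes_hold(A, d, w, gamma):
    """True iff D_ii * (A_i . w - gamma) >= 1 for every point."""
    return all(di * (sum(a * wj for a, wj in zip(Ai, w)) - gamma) >= 1
               for Ai, di in zip(A, d))

print(bounding_planes_hold(A, d, w, gamma))  # True for this toy data
```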

15 Robust Linear Programming (RLP) Preliminary Approach to Support Vector Machines (LP): $\min_{w,\gamma,y} e'y$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $y \in \mathbb{R}^m$: nonnegative slack (error) vector  The term $e'y$, the 1-norm measure of the error vector, is called the training error  For the linearly separable case, at a solution of (LP): $y = 0$

16 Support Vector Machines Formulation  Solve the quadratic program for some $C > 0$: $\min_{w,\gamma,y} \frac{1}{2}\|w\|_2^2 + C\,e'y$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $D_{ii} = \pm 1$ denotes $A^+$ or $A^-$ membership  Margin is maximized by minimizing the reciprocal of the margin, $\frac{1}{2}\|w\|_2^2$  Different error functions and measures of margin will lead to different SVM formulations
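In practice this quadratic program goes to a QP solver; as an illustration only, the sketch below minimizes the equivalent unconstrained hinge-loss form $\frac{1}{2}\|w\|^2 + C\sum_i \max(0,\, 1 - D_{ii}(A_i w - \gamma))$ by subgradient descent. The toy data, $C$, step size, and iteration count are all illustrative choices, not the slides' method.

```python
# Sketch: subgradient descent on the unconstrained hinge-loss form of
# the SVM primal, on a small separable toy data set. A real QP solver
# would be used in practice; this is only to make the objective concrete.

A = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]]
d = [1, 1, -1, -1]
C, lr = 1.0, 0.01
w, gamma = [0.0, 0.0], 0.0

for _ in range(5000):
    gw, gg = [w[0], w[1]], 0.0  # gradient of (1/2)||w||^2 is w
    for Ai, di in zip(A, d):
        if di * (sum(a * b for a, b in zip(Ai, w)) - gamma) < 1:  # hinge active
            gw = [gwj - C * di * a for gwj, a in zip(gw, Ai)]
            gg += C * di
    w = [wj - lr * gj for wj, gj in zip(w, gw)]
    gamma -= lr * gg

print(w, gamma)  # learned plane; toy data is separable, so hinge losses shrink
```

On this separable toy set the learned plane ends up classifying every point correctly, i.e. $D_{ii}(A_i w - \gamma) > 0$ for all $i$, mirroring the $y = 0$ situation of the (LP) slide.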

17 Linear Program and Quadratic Program  An optimization problem in which the objective function and all constraints are linear functions is called a linear programming problem  If the objective function is convex quadratic while the constraints are all linear, then the problem is called a convex quadratic programming problem  Standard SVM formulation is in this category

18 The Most Important Concept in Optimization (Minimization)  A point is said to be an optimal solution of an unconstrained minimization problem if there exists no descent direction at that point  A point is said to be an optimal solution of a constrained minimization problem if there exists no feasible descent direction at that point  There might exist a descent direction, but moving along this direction would leave the feasible region
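For a differentiable $f$, a direction $d$ is a descent direction at $x$ exactly when the directional derivative $\nabla f(x) \cdot d$ is negative. A minimal sketch of that first-order test (the function and points below are illustrative):

```python
# Sketch: d is a descent direction for f at x iff grad_f(x) . d < 0
# (the first-order directional-derivative test). f is illustrative.

def is_descent_direction(grad_fx, direction):
    """True iff moving along `direction` decreases f to first order."""
    return sum(g * di for g, di in zip(grad_fx, direction)) < 0

# f(x) = x1^2 + x2^2, so grad f(x) = [2*x1, 2*x2]
print(is_descent_direction([2.0, 2.0], [-1.0, -1.0]))  # True: points downhill
print(is_descent_direction([0.0, 0.0], [-1.0, -1.0]))  # False: at the minimum
```

At an unconstrained optimum the gradient vanishes, so no direction passes this test; at a constrained optimum some directions may pass it, but following them would exit the feasible region, which is the slide's point.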

