Presentation on theme: "Generalization Error of pac Model  Let be a set of training examples chosen i.i.d. according to  Treat the generalization error as a r.v. depending on."— Presentation transcript:

1 Generalization Error of PAC Model  Let $S = \{(x^1,y_1),\dots,(x^\ell,y_\ell)\}$ be a set of training examples chosen i.i.d. according to a fixed but unknown distribution $\mathcal{D}$  Treat the generalization error $err(h_S)$ as a r.v. depending on the random selection of $S$  Find a bound on the tail of the distribution of the r.v. $err(h_S)$ in the form $\varepsilon = \varepsilon(\ell, \mathcal{H}, \delta)$  $\varepsilon$ is a function of $\ell$ and $\delta$, where $\delta$ is the confidence level of the error bound, which is given by the learner

2 Probably Approximately Correct  We assert: $\Pr(\{err(h_S) > \varepsilon\}) < \delta$, or equivalently $\Pr(\{err(h_S) \le \varepsilon\}) \ge 1 - \delta$  With probability at least $1-\delta$, the error made by the hypothesis $h_S$ will be less than the error bound $\varepsilon$, which does not depend on the unknown distribution $\mathcal{D}$

3 Find the Hypothesis with Minimum Expected Risk?  Let the training examples $S$ be chosen i.i.d. according to $\mathcal{D}$ with the probability density $p(x,y)$  The expected misclassification error made by $h$ is $R[h] = \int \frac{1}{2}\,|y - h(x)|\, dp(x,y)$  The ideal hypothesis $h^*$ should have the smallest expected risk: $R[h^*] \le R[h]\ \forall h \in \mathcal{H}$. Unrealistic, since $p(x,y)$ is unknown!!!

4 Empirical Risk Minimization (ERM)  Replace the expected risk over $p(x,y)$ by an average over the training examples  The empirical risk: $R_{emp}[h] = \frac{1}{\ell}\sum_{i=1}^{\ell}\frac{1}{2}\,|y_i - h(x^i)|$  Find the hypothesis $h_S$ with the smallest empirical risk: $R_{emp}[h_S] \le R_{emp}[h]\ \forall h \in \mathcal{H}$ ($\mathcal{D}$ and $p(x,y)$ are not needed)  Only focusing on empirical risk will cause overfitting
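As a quick illustration of the empirical-risk formula above, here is a minimal sketch in Python. The threshold hypothesis `h` and the toy sample `S` are illustrative choices, not part of the slides; labels are taken in $\{-1,+1\}$ so that $\frac{1}{2}|y - h(x)|$ is the 0/1 loss.

```python
# Sketch: empirical risk R_emp[h] = (1/l) * sum_i (1/2)|y_i - h(x_i)|
# for labels y_i in {-1, +1}. The hypothesis h below is a hypothetical
# 1-D threshold rule used only to make the formula concrete.

def h(x):
    """Hypothetical hypothesis: predict +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def empirical_risk(samples):
    """Average 0/1 loss, written as (1/2)|y - h(x)| per the slide."""
    return sum(abs(y - h(x)) / 2 for x, y in samples) / len(samples)

S = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.0, 1), (-0.1, 1)]
print(empirical_risk(S))  # one of five points misclassified -> 0.2
```

Driving the empirical risk to zero on `S` alone is exactly the overfitting risk the slide warns about.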

5 VC Confidence (The Bound between $R[h]$ and $R_{emp}[h]$)  The following inequality holds with probability $1-\delta$: $R[h] \le R_{emp}[h] + \sqrt{\dfrac{v\left(\log(2\ell/v)+1\right) - \log(\delta/4)}{\ell}}$, where $v$ is the VC-dimension of $\mathcal{H}$ C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery 2 (2) (1998), pp. 121–167
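The VC confidence term can be evaluated directly. The sketch below plugs illustrative values of $\ell$, $v$, and $\delta$ into the square-root term of the bound; these numbers are examples, not values from the slides.

```python
import math

# Sketch: the VC confidence term
#   sqrt((v*(log(2l/v) + 1) - log(delta/4)) / l)
# from the generalization bound above. The values of l, v, delta below
# are illustrative.

def vc_confidence(l, v, delta):
    """VC confidence for l examples, VC-dimension v, confidence delta."""
    return math.sqrt((v * (math.log(2 * l / v) + 1) - math.log(delta / 4)) / l)

for l in (100, 1000, 10000):
    print(l, round(vc_confidence(l, v=3, delta=0.05), 3))
# the confidence term shrinks as the number of training examples l grows
```

This makes the trade-off visible: the gap between expected and empirical risk tightens with more data and loosens with a richer (higher-VC-dimension) hypothesis space.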

6 Capacity (Complexity) of Hypothesis Space $\mathcal{H}$: VC-dimension  A given training set $S$ is shattered by $\mathcal{H}$ if and only if for every labeling of $S$ there exists some $h \in \mathcal{H}$ consistent with this labeling  Three (linearly independent) points can be shattered by hyperplanes in $\mathbb{R}^2$

7 Shattering Points with Hyperplanes in $\mathbb{R}^n$ Theorem: Consider some set of m points in $\mathbb{R}^n$. Choose any one of the points as origin. Then the m points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining m−1 points are linearly independent. Can you always shatter three points with a line in $\mathbb{R}^2$?
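By the theorem above, three points in $\mathbb{R}^2$ can be shattered by oriented lines iff, after choosing one point as origin, the other two position vectors are linearly independent, i.e. the three points are not collinear. A minimal sketch of that test (the example points are illustrative):

```python
# Sketch: three points in R^2 are shatterable by oriented lines iff they
# are not collinear, i.e. the 2x2 determinant of the two position vectors
# (relative to the first point, taken as origin) is nonzero.

def shatterable_by_line(p, q, r):
    """True iff the three 2-D points are NOT collinear."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    det = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    return det != 0

print(shatterable_by_line((0, 0), (1, 0), (0, 1)))  # True: general position
print(shatterable_by_line((0, 0), (1, 1), (2, 2)))  # False: collinear
```

So the answer to the slide's question is no: three collinear points with the middle one labeled oppositely to the outer two cannot be separated by any line.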

8 Definition of VC-dimension (A Capacity Measure of Hypothesis Space $\mathcal{H}$)  The Vapnik-Chervonenkis dimension, $VC(\mathcal{H})$, of hypothesis space $\mathcal{H}$ defined over the input space $X$ is the size of the largest (existent) finite subset of $X$ shattered by $\mathcal{H}$  If arbitrarily large finite sets of $X$ can be shattered by $\mathcal{H}$, then $VC(\mathcal{H}) \equiv \infty$  Let $\mathcal{H} = \{\text{all hyperplanes in } \mathbb{R}^n\}$, then $VC(\mathcal{H}) = n+1$

9 Optimization Problem Formulation Problem setting: Given functions $f$, $g_i,\ i=1,\dots,k$, and $h_j,\ j=1,\dots,m$, defined on a domain $\Omega \subseteq \mathbb{R}^n$: $\min_{x \in \Omega} f(x)$ subject to $g_i(x) \le 0,\ i=1,\dots,k$ and $h_j(x) = 0,\ j=1,\dots,m$, where $f(x)$ is called the objective function and $g_i(x) \le 0$, $h_j(x) = 0$ are called constraints.

10 Definitions and Notation  Feasible region: $\mathcal{F} = \{x \in \Omega \mid g(x) \le 0,\ h(x) = 0\}$, where $g(x) = (g_1(x),\dots,g_k(x))$ and $h(x) = (h_1(x),\dots,h_m(x))$  A solution of the optimization problem is a point $x^* \in \mathcal{F}$ such that $f(x^*) \le f(x)$ for all $x \in \mathcal{F}$; $x^*$ is called a global minimum.

11 Definitions and Notation  A point $\bar{x} \in \mathcal{F}$ is called a local minimum of the optimization problem if $\exists\, \varepsilon > 0$ such that $f(\bar{x}) \le f(x)$ for all $x \in \mathcal{F}$ with $\|x - \bar{x}\| < \varepsilon$  At the solution $x^*$, an inequality constraint $g_i(x) \le 0$ is said to be active if $g_i(x^*) = 0$; otherwise it is called an inactive constraint  $g_i(x) \le 0 \Leftrightarrow g_i(x) + s_i = 0,\ s_i \ge 0$, where $s_i$ is called the slack variable
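The active/inactive distinction and the slack variable can be checked mechanically at a candidate point. A minimal sketch, with two illustrative constraints (not taken from the slides):

```python
# Sketch: classify inequality constraints g_i(x) <= 0 as active or
# inactive at a point x, reporting the slack s_i = -g_i(x) from
# g_i(x) + s_i = 0, s_i >= 0. The constraints g1, g2 are illustrative.

def constraint_status(constraints, x, tol=1e-9):
    """Return (name, 'active' or 'inactive', slack) for each g(x) <= 0."""
    rows = []
    for name, g in constraints:
        val = g(x)
        rows.append((name, "active" if abs(val) <= tol else "inactive", -val))
    return rows

constraints = [("g1", lambda x: x[0] + x[1] - 2),  # active at (1, 1)
               ("g2", lambda x: x[0] - 5)]         # inactive at (1, 1)
for row in constraint_status(constraints, (1.0, 1.0)):
    print(row)
```

At $(1,1)$, `g1` holds with equality (slack 0, active) while `g2` has slack 4 (inactive); the next slide's point is that dropping `g2` would not change the optimum.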

12 Definitions and Notation  Removing an inactive constraint from an optimization problem will NOT affect the optimal solution  Very useful feature in SVM  If $\mathcal{F} = \Omega = \mathbb{R}^n$, then the problem is called an unconstrained minimization problem  SSVM formulation is in this category  Difficult to find the global minimum without a convexity assumption  Least squares problem is in this category

13 Gradient and Hessian  Let $f: \mathbb{R}^n \to \mathbb{R}$ be a differentiable function. The gradient of function $f$ at a point $x$ is defined as $\nabla f(x) = \left[\frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x)\right] \in \mathbb{R}^n$  If $f: \mathbb{R}^n \to \mathbb{R}$ is a twice differentiable function, the Hessian matrix of $f$ at a point $x$ is defined as $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ with entries $\left[\nabla^2 f(x)\right]_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x)$
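These definitions can be approximated numerically by central finite differences, which is a handy sanity check against hand-derived gradients. The function $f(x) = x_1^2 + 3x_1x_2$ below is an illustrative choice (analytic gradient $[2x_1 + 3x_2,\ 3x_1]$, Hessian $\begin{bmatrix}2 & 3\\ 3 & 0\end{bmatrix}$):

```python
# Sketch: central finite-difference approximations to the gradient and
# Hessian; f is an illustrative quadratic, not from the slides.

def grad(f, x, h=1e-5):
    """Central-difference gradient: (f(x+h*e_i) - f(x-h*e_i)) / (2h)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def hessian(f, x, h=1e-4):
    """Hessian via central differences of the gradient components."""
    n = len(x)
    return [[(grad(f, [xk + (h if k == i else 0) for k, xk in enumerate(x)])[j]
              - grad(f, [xk - (h if k == i else 0) for k, xk in enumerate(x)])[j])
             / (2 * h)
             for j in range(n)] for i in range(n)]

f = lambda x: x[0] ** 2 + 3 * x[0] * x[1]
print(grad(f, [1.0, 2.0]))     # approx [8.0, 3.0]
print(hessian(f, [1.0, 2.0]))  # approx [[2, 3], [3, 0]]
```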

14 Algebra of the Classification Problem 2-Category Linearly Separable Case  Given m points in the n-dimensional real space $\mathbb{R}^n$  Represented by an $m \times n$ matrix $A$; each point belongs to class $A^+$ or $A^-$  Membership of each point in the classes is specified by an $m \times m$ diagonal matrix $D$: $D_{ii} = 1$ if $A_i \in A^+$ and $D_{ii} = -1$ if $A_i \in A^-$  Separate $A^+$ and $A^-$ by two bounding planes $x'w = \gamma \pm 1$ such that: $A_i w \ge \gamma + 1$ for $D_{ii} = 1$ and $A_i w \le \gamma - 1$ for $D_{ii} = -1$  More succinctly: $D(Aw - e\gamma) \ge e$, where $e = [1, 1, \dots, 1]' \in \mathbb{R}^m$
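The succinct condition $D(Aw - e\gamma) \ge e$ is easy to verify row by row. A minimal sketch on a toy 2-D data set; the points, labels, and the candidate plane $(w, \gamma)$ are all illustrative:

```python
# Sketch: verify the bounding-plane condition D(Aw - e*gamma) >= e,
# i.e. D_ii * (A_i . w - gamma) >= 1 for every row. Data and plane
# are illustrative.

A = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]]  # rows A_i
d = [1, 1, -1, -1]                                    # diagonal of D
w, gamma = [1.0, 1.0], 2.0                            # candidate plane x'w = gamma

def bounding_planes_hold(A, d, w, gamma):
    """True iff D_ii * (A_i . w - gamma) >= 1 for every point."""
    return all(di * (sum(a * wj for a, wj in zip(Ai, w)) - gamma) >= 1
               for Ai, di in zip(A, d))

print(bounding_planes_hold(A, d, w, gamma))  # True for this toy data
```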

15 Robust Linear Programming (RLP) Preliminary Approach to Support Vector Machines (LP): $\min_{w,\gamma,y} e'y$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $y \in \mathbb{R}^m$: nonnegative slack (error) vector  The term $e'y$, the 1-norm measure of the error vector, is called the training error  For the linearly separable case, at a solution of (LP): $y = 0$

16 Support Vector Machines Formulation  Solve the quadratic program for some $C > 0$: $\min_{w,\gamma,y} \frac{1}{2}\|w\|_2^2 + C\,e'y$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$, where $D_{ii} = \pm 1$ denotes $A^+$ or $A^-$ membership  Margin is maximized by minimizing the reciprocal of the margin, $\frac{1}{2}\|w\|_2^2$  Different error functions and measures of margin will lead to different SVM formulations
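In practice this quadratic program goes to a QP solver; as an illustration only, the sketch below minimizes the equivalent unconstrained hinge-loss form $\frac{1}{2}\|w\|^2 + C\sum_i \max(0,\, 1 - D_{ii}(A_i w - \gamma))$ by subgradient descent. The toy data, $C$, step size, and iteration count are all illustrative choices, not the slides' method.

```python
# Sketch: subgradient descent on the unconstrained hinge-loss form of
# the SVM primal, on a small separable toy data set. A real QP solver
# would be used in practice; this is only to make the objective concrete.

A = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]]
d = [1, 1, -1, -1]
C, lr = 1.0, 0.01
w, gamma = [0.0, 0.0], 0.0

for _ in range(5000):
    gw, gg = [w[0], w[1]], 0.0  # gradient of (1/2)||w||^2 is w
    for Ai, di in zip(A, d):
        if di * (sum(a * b for a, b in zip(Ai, w)) - gamma) < 1:  # hinge active
            gw = [gwj - C * di * a for gwj, a in zip(gw, Ai)]
            gg += C * di
    w = [wj - lr * gj for wj, gj in zip(w, gw)]
    gamma -= lr * gg

print(w, gamma)  # learned plane; toy data is separable, so hinge losses shrink
```

On this separable toy set the learned plane ends up classifying every point correctly, i.e. $D_{ii}(A_i w - \gamma) > 0$ for all $i$, mirroring the $y = 0$ situation of the (LP) slide.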

17 Linear Program and Quadratic Program  An optimization problem in which the objective function and all constraints are linear functions is called a linear programming problem  If the objective function is convex quadratic while the constraints are all linear, then the problem is called a convex quadratic programming problem  Standard SVM formulation is in this category

18 The Most Important Concept in Optimization (Minimization)  A point is said to be an optimal solution of an unconstrained minimization problem if there exists no descent direction at that point  A point is said to be an optimal solution of a constrained minimization problem if there exists no feasible descent direction at that point  There might exist a descent direction, but moving along this direction would leave the feasible region
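For a differentiable $f$, a direction $d$ is a descent direction at $x$ exactly when the directional derivative $\nabla f(x) \cdot d$ is negative. A minimal sketch of that first-order test (the function and points below are illustrative):

```python
# Sketch: d is a descent direction for f at x iff grad_f(x) . d < 0
# (the first-order directional-derivative test). f is illustrative.

def is_descent_direction(grad_fx, direction):
    """True iff moving along `direction` decreases f to first order."""
    return sum(g * di for g, di in zip(grad_fx, direction)) < 0

# f(x) = x1^2 + x2^2, so grad f(x) = [2*x1, 2*x2]
print(is_descent_direction([2.0, 2.0], [-1.0, -1.0]))  # True: points downhill
print(is_descent_direction([0.0, 0.0], [-1.0, -1.0]))  # False: at the minimum
```

At an unconstrained optimum the gradient vanishes, so no direction passes this test; at a constrained optimum some directions may pass it, but following them would exit the feasible region, which is the slide's point.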

