Support Vector Machines


1 Support Vector Machines
Presenter: 虞台文

2 Content Introduction The VC Dimension & Structural Risk Minimization
Linear SVM: The Separable Case Linear SVM: The Non-Separable Case Lagrange Multipliers

3 Support Vector Machines
Introduction

4 Learning Machines A machine to learn the mapping x_i → y_i, defined as a set of functions f(x, α), where α denotes the adjustable parameters.
Learning by adjusting these parameters?

5 Generalization vs. Learning
How does a machine learn? By adjusting its parameters so as to partition the pattern (feature) space for classification. How are they adjusted? By minimizing the empirical risk (the traditional approach). What has the machine learned? Does it memorize the patterns it sees, or the rules it finds for the different classes? What does the machine actually learn if it minimizes empirical risk only?

6 Risks Expected Risk (test error): R(α) = ∫ ½ |y − f(x, α)| dP(x, y). Empirical Risk (training error): R_emp(α) = (1/(2l)) Σ_i |y_i − f(x_i, α)|.
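
To make the two risks concrete, here is a minimal Python sketch (an addition, not from the slides) of the empirical risk as an average 0-1 loss; empirical_risk is a hypothetical helper name:

```python
import numpy as np

def empirical_risk(f, X, y):
    """Average 0-1 loss of classifier f over the training pairs (X, y).

    For labels in {-1, +1}, (1/2)|y_i - f(x_i)| is exactly the 0-1 loss,
    so this average is R_emp. The expected risk is the same average taken
    over the unknown distribution P(x, y), approximated by test error.
    """
    predictions = np.array([f(x) for x in X])
    return float(np.mean(predictions != y))
```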

7 More on Empirical Risk How can we make the empirical risk arbitrarily small? Let the machine have a very large memorization capacity. Does a machine with small empirical risk also attain small expected risk? How do we keep the machine from merely memorizing the training patterns instead of generalizing? How do we deal with the memorization capacity of a machine? What should the new criterion be?

8 Structural Risk Minimization
Goal: Learn both the right 'structure' and the right 'rules' for classification. Right structure: e.g., the right amount and the right forms of components or parameters participating in the learning machine. Right rules: the empirical risk will also be reduced if the right rules are learned.

9 New Criterion
Total Risk = Empirical Risk + Risk due to the structure of the learning machine

10 Support Vector Machines
The VC Dimension & Structural Risk Minimization

11 The VC Dimension VC: Vapnik-Chervonenkis
Consider a set of functions f(x, α) ∈ {−1, +1}. A given set of l points can be labeled in 2^l ways. If a member of the set {f(α)} can be found which correctly assigns the labels for every such labeling, then the set of points is shattered by that set of functions. The VC dimension of {f(α)} is the maximum number of training points that can be shattered by {f(α)}.

12 The VC Dimension for Oriented Lines in R²
[Figure: three points in general position in R² are shattered by oriented lines; no four points can be, so the VC dimension of oriented lines in R² is 3.]
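
To make "shattering" concrete, here is a brute-force sketch (an addition, assuming SciPy is available; separable and shattered are hypothetical names): it tests every one of the 2^l labelings for realizability by an oriented hyperplane via a linear-programming feasibility check.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """True if some oriented hyperplane (w, b) satisfies y_i (w.x_i + b) >= 1."""
    l, n = X.shape
    # Feasibility LP in variables [w, b]: -y_i (w.x_i + b) <= -1.
    A = -(y[:, None] * np.hstack([X, np.ones((l, 1))]))
    res = linprog(c=np.zeros(n + 1), A_ub=A, b_ub=-np.ones(l),
                  bounds=[(None, None)] * (n + 1))
    return res.success

def shattered(X):
    """True if every one of the 2^l labelings of the points X is separable."""
    return all(separable(X, np.array(lab))
               for lab in itertools.product([-1, 1], repeat=len(X)))

# Three points in general position in R^2 are shattered; four points never are.
print(shattered(np.array([[0., 0.], [1., 0.], [0., 1.]])))            # True
print(shattered(np.array([[0., 0.], [1., 0.], [2., 0.], [0., 1.]])))  # False
```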

13 More on VC Dimension In general, the VC dimension of the set of oriented hyperplanes in Rⁿ is n + 1. The VC dimension is a measure of memorization capability, but it is not directly related to the number of parameters: Vapnik (1995) gives an example with one parameter and infinite VC dimension.

14 Bound on Expected Risk With probability 1 − η: Expected Risk ≤ Empirical Risk + VC Confidence, i.e., R(α) ≤ R_emp(α) + √((h(ln(2l/h) + 1) − ln(η/4)) / l), where h is the VC dimension and l the number of training points.

15 Bound on Expected Risk Consider small η (e.g., η ≤ 0.5). The VC Confidence term √((h(ln(2l/h) + 1) − ln(η/4)) / l) grows with the VC dimension h.

16 Bound on Expected Risk Consider small η (e.g., η ≤ 0.5).
Structural risk minimization wants to minimize this bound (empirical risk plus VC confidence); traditional approaches minimize the empirical risk only.

17 VC Confidence Amongst machines with zero empirical risk, choose the one with the smallest VC dimension. But how do we evaluate the VC dimension? [Figure: VC confidence as a function of h/l, for η = 0.05 and l = 10,000.]
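
Evaluating the bound itself is straightforward once h is known; a small sketch of the VC confidence term (vc_confidence is a hypothetical name), reproducing the slide's setting η = 0.05 and l = 10,000:

```python
import numpy as np

def vc_confidence(h, l, eta=0.05):
    """VC confidence term of the bound
    R(a) <= R_emp(a) + sqrt((h (ln(2l/h) + 1) - ln(eta/4)) / l),
    which holds with probability 1 - eta."""
    return np.sqrt((h * (np.log(2 * l / h) + 1) - np.log(eta / 4)) / l)

for h in (10, 100, 1000, 5000):   # the term grows with the VC dimension h
    print(h, round(vc_confidence(h, 10_000), 3))
```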

18 Structural Risk Minimization
Nested subsets of functions with increasing VC dimensions h1 ≤ h2 ≤ h3 ≤ h4.

19 Support Vector Machines
The Linear SVM: The Separable Case

20 The Linear Separability
[Figure: a linearly separable data set and a data set that is not linearly separable.]

21 The Linear Separability
Linearly separable case: [Figure: separating hyperplane w·x + b = 0 with normal vector w, between the margin hyperplanes w·x + b = +1 and w·x + b = −1.]

22 Margin Width
[Figure: margin of width d between the hyperplanes w·x + b = +1 and w·x + b = −1, on either side of w·x + b = 0.] The margin width is d = 2/||w||. How about maximizing the margin? What is the relation between the margin width and the VC dimension?

23 Maximum Margin Classifier
How about maximizing the margin? What is the relation between the margin width and the VC dimension? The patterns lying on the margin hyperplanes are the supporters (support vectors).

24 Building SVM Minimize ½||w||²
Subject to y_i(w·x_i + b) ≥ 1, i = 1, …, l. This requires knowledge of Lagrange multipliers.
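
As a quick sanity check of this formulation (an addition; the slides do not name any library), a very large C in scikit-learn's linear SVC approximates the hard-margin problem:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data on either side of the line x1 + x2 = 3.
X = np.array([[0., 0.], [1., 1.], [1., 0.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e10).fit(X, y)   # huge C ~ hard margin
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("supporters:", clf.support_vectors_)
```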

25 The Method of Lagrange Minimize ½||w||² subject to y_i(w·x_i + b) ≥ 1. The Lagrangian:
L(w, b, α) = ½||w||² − Σ_i α_i [y_i(w·x_i + b) − 1], with α_i ≥ 0. Minimize it w.r.t. w and b, while maximizing it w.r.t. α.

26 The Method of Lagrange Minimize ½||w||² subject to y_i(w·x_i + b) ≥ 1. The Lagrangian:
L(w, b, α) = ½||w||² − Σ_i α_i [y_i(w·x_i + b) − 1]. Minimize it w.r.t. w and b, while maximizing it w.r.t. α. What value should α_i take if its constraint is feasible with nonzero slack y_i(w·x_i + b) − 1 > 0? (The maximization then forces α_i = 0.) How about if the slack is zero? (Then α_i may be positive.)

27 The Method of Lagrange Minimize ½||w||² subject to y_i(w·x_i + b) ≥ 1. The Lagrangian: L(w, b, α) = ½||w||² − Σ_i α_i [y_i(w·x_i + b) − 1]. Setting ∂L/∂w = 0 and ∂L/∂b = 0 gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0.

28 Duality Minimize the Lagrangian w.r.t. w and b; then maximize the result w.r.t. α, subject to α_i ≥ 0.

29 Duality Maximize the Lagrangian over α after minimizing it over w and b.

30 Duality Maximize: setting ∂L/∂w = 0 and ∂L/∂b = 0 gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0; substitute these back into L.

31 Duality Maximize the resulting dual objective: L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j).

32 Duality The Primal: Minimize ½||w||² subject to y_i(w·x_i + b) ≥ 1. The Dual: Maximize L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j) subject to Σ_i α_i y_i = 0 and α_i ≥ 0.

33 The Solution Find α* by … Quadratic Programming. Maximize
the Dual L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), subject to Σ_i α_i y_i = 0 and α_i ≥ 0.
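
A sketch of this quadratic program using a general-purpose solver in place of a dedicated QP package (an assumption; solve_dual is a hypothetical name, and the same routine is reused below for the non-separable case):

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, y, C=None):
    """Maximize L_D(a) = sum_i a_i - (1/2) sum_ij a_i a_j y_i y_j (x_i . x_j)
    subject to sum_i a_i y_i = 0 and 0 <= a_i (<= C when C is finite)."""
    l = len(y)
    Q = (y[:, None] * X) @ (y[:, None] * X).T   # Q_ij = y_i y_j x_i . x_j
    res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),   # minimize -L_D
                   np.zeros(l), method='SLSQP',
                   jac=lambda a: Q @ a - np.ones(l),
                   bounds=[(0, C)] * l,
                   constraints={'type': 'eq', 'fun': lambda a: a @ y})
    return res.x

X = np.array([[0., 0.], [1., 1.], [3., 3.], [4., 3.]])
y = np.array([-1., -1., 1., 1.])
alpha = solve_dual(X, y)
w = (alpha * y) @ X                 # w* = sum_i a_i y_i x_i
s = np.argmax(alpha)                # any supporter, a_s > 0
b = y[s] - w @ X[s]                 # from y_s (w . x_s + b) = 1
print("alpha =", np.round(alpha, 3), "w =", w, "b =", b)
```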

34 The Solution Call x_i a support vector if α_i > 0.
Find α* by … Quadratic Programming; then w* = Σ_i α_i* y_i x_i. From the Karush-Kuhn-Tucker conditions on the Lagrangian, b* = y_s − w*·x_s for any support vector x_s.

35 The Karush-Kuhn-Tucker Conditions ∂L/∂w = 0 ⇒ w = Σ_i α_i y_i x_i; ∂L/∂b = 0 ⇒ Σ_i α_i y_i = 0; y_i(w·x_i + b) − 1 ≥ 0; α_i ≥ 0; α_i [y_i(w·x_i + b) − 1] = 0 (complementary slackness).

36 Classification f(x) = sgn(w*·x + b*) = sgn(Σ_i α_i* y_i (x_i·x) + b*).

37 Classification Using Supporters f(x) = sgn(Σ_i α_i* y_i (x_i·x) + b*):
α_i* y_i is the weight for the ith support vector, b* is the bias, and x_i·x is the similarity measure between the input and the ith support vector.
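
Restated as code (a sketch with hypothetical names), the decision function is a weighted vote of the supporters:

```python
import numpy as np

def classify(x, sv_X, sv_y, sv_alpha, b):
    """f(x) = sgn(sum_i a_i y_i <x_i, x> + b): each support vector votes with
    weight a_i y_i, scaled by the similarity <x_i, x> between input and supporter."""
    similarities = sv_X @ x                   # <x_i, x> for every supporter
    return np.sign((sv_alpha * sv_y) @ similarities + b)
```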

38 Demonstration

39 Support Vector Machines
The Linear SVM: The Non-Separable Case

40 The Non-Separable Case
[Figure: overlapping classes around the hyperplanes w·x + b = 0, +1, −1, with slack variables ξ_i for margin violators.] We require that w·x_i + b ≥ +1 − ξ_i for y_i = +1 and w·x_i + b ≤ −1 + ξ_i for y_i = −1, with ξ_i ≥ 0 for all i.

41 Mathematical Formulation
Minimize ½||w||² + C (Σ_i ξ_i)^k subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0. For simplicity, we consider k = 1.

42 The Lagrangian Minimize ½||w||² + C Σ_i ξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0: L(w, b, ξ, α, μ) = ½||w||² + C Σ_i ξ_i − Σ_i α_i [y_i(w·x_i + b) − 1 + ξ_i] − Σ_i μ_i ξ_i, with α_i ≥ 0 and μ_i ≥ 0.

43 Duality Minimize the Lagrangian w.r.t. w, b, and ξ; then maximize the result w.r.t. α and μ, subject to α_i ≥ 0 and μ_i ≥ 0.

44 Duality Maximize: setting ∂L/∂w = 0, ∂L/∂b = 0, and ∂L/∂ξ_i = 0 gives w = Σ_i α_i y_i x_i, Σ_i α_i y_i = 0, and C − α_i − μ_i = 0.

45 Duality Maximize this: substituting w = Σ_i α_i y_i x_i, Σ_i α_i y_i = 0, and C − α_i − μ_i = 0 eliminates w, b, ξ, and μ.

46 Duality Maximize this resulting dual objective: L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), the same objective as in the separable case.

47 Duality The Primal: Minimize ½||w||² + C Σ_i ξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0. The Dual: Maximize L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j) subject to Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.

48 The Karush-Kuhn-Tucker Conditions w = Σ_i α_i y_i x_i; Σ_i α_i y_i = 0; C − α_i − μ_i = 0; y_i(w·x_i + b) − 1 + ξ_i ≥ 0; ξ_i ≥ 0; α_i ≥ 0; μ_i ≥ 0; α_i [y_i(w·x_i + b) − 1 + ξ_i] = 0; μ_i ξ_i = 0.

49 The Solution Find α* by … Quadratic Programming. Maximize the Dual
L_D(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), subject to Σ_i α_i y_i = 0 and 0 ≤ α_i ≤ C.
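
Since only the box constraint changes, the hypothetical solve_dual sketch from the separable case can be reused with a finite C (this snippet assumes that earlier definition is in scope):

```python
import numpy as np

# Overlapping toy data: the third point sits among the opposite class.
X = np.array([[0., 0.], [1., 1.], [3., 2.9], [3., 3.], [4., 3.]])
y = np.array([-1., -1., -1., 1., 1.])

alpha = solve_dual(X, y, C=1.0)   # box constraint 0 <= a_i <= C
print(np.round(alpha, 3))         # a_i at the bound C marks a margin violator
```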

50 The Solution Call x_i a support vector if 0 < α_i < C.
Find α* by … Quadratic Programming; then w* = Σ_i α_i* y_i x_i. Such a support vector lies exactly on a margin hyperplane (μ_i > 0 forces ξ_i = 0), so b* = y_s − w*·x_s.

51 The Solution Find α* by … Quadratic Programming.
Call x_i a support vector if 0 < α_i < C. A pattern violates the margin if ξ_i > 0 and is falsely classified if ξ_i > 1.

52 Classification f(x) = sgn(w*·x + b*) = sgn(Σ_i α_i* y_i (x_i·x) + b*).

53 Classification Using Supporters f(x) = sgn(Σ_i α_i* y_i (x_i·x) + b*):
α_i* y_i is the weight for the ith support vector, b* is the bias, and x_i·x is the similarity measure between the input and the ith support vector.

54 Support Vector Machines
Lagrange Multipliers

55 Lagrange Multiplier The Lagrange multiplier method is a powerful tool for constrained optimization, contributed by Joseph-Louis Lagrange.

56 Milkmaid Problem A milkmaid at (x1, y1) walks to the river bank, described by the curve g(x, y) = 0, fills her pail at some point (x, y), and carries it to the cow at (x2, y2).

57 Milkmaid Problem [Figure: the path from (x1, y1) to a point (x, y) on the bank g(x, y) = 0 and on to (x2, y2).]

58 Milkmaid Problem Goal: choose the point (x, y) on the bank so as to Minimize
the total distance f(x, y) = d((x1, y1), (x, y)) + d((x, y), (x2, y2)), subject to g(x, y) = 0.

59 Observation Goal: Minimize f(x, y) subject to g(x, y) = 0.
At the extreme point, say, (x*, y*), the gradients of f and g are parallel: ∇f(x*, y*) = λ∇g(x*, y*).

60 Optimization with Equality Constraints
Goal: Min/Max f(x) subject to g(x) = 0. Lemma: At an extreme point, say, x*, we have ∇f(x*) = λ∇g(x*) for some scalar λ.

61 Proof Let x* be an extreme point, and
let r(t) be any differentiable path on the surface g(x) = 0 such that r(t0) = x*. Then r′(t0) is a vector tangent to the surface g(x) = 0 at x*. Since x* is an extreme point, (d/dt) f(r(t)) = 0 at t = t0, i.e., ∇f(x*)·r′(t0) = 0.

62 Proof This is true for any path r passing through x* on the surface g(x) = 0. It implies that ∇f(x*) ⊥ T, where T is the tangent plane of the surface g(x) = 0 at x*. Since ∇g(x*) is also perpendicular to T, ∇f(x*) and ∇g(x*) are parallel.

63 Optimization with Equality Constraints
Goal: Min/Max f(x) subject to g(x) = 0. Lemma: At an extreme point, say, x*, we have ∇f(x*) = λ∇g(x*); the scalar λ is the Lagrange multiplier.

64 The Method of Lagrange Goal: Min/Max f(x) subject to g(x) = 0, where x has dimension n.
Find the extreme points by solving ∇f(x) = λ∇g(x) together with g(x) = 0: n + 1 equations in n + 1 variables (x and λ).
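
A sketch of solving these n + 1 equations symbolically on a toy problem, minimize x² + y² on the line x + y = 1 (an addition, assuming SymPy):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2          # objective
g = x + y - 1            # constraint g(x, y) = 0

# The n + 1 equations: grad f = lam * grad g, together with g = 0.
equations = [sp.diff(f, x) - lam * sp.diff(g, x),
             sp.diff(f, y) - lam * sp.diff(g, y),
             g]
print(sp.solve(equations, [x, y, lam]))   # {x: 1/2, y: 1/2, lam: 1}
```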

65 Lagrangian Goal: Min/Max f(x) subject to g(x) = 0 (constrained optimization). Define the Lagrangian
L(x, λ) = f(x) + λ g(x), turning it into unconstrained optimization. Solve ∇_x L = 0 and ∂L/∂λ = 0.

66 Optimization with Multiple Equality Constraints
Min/Max f(x) subject to g_k(x) = 0, k = 1, …, m. Define the Lagrangian L(x, λ) = f(x) + Σ_k λ_k g_k(x). Solve ∇_x L = 0 together with g_k(x) = 0 for all k.

67 Optimization with Inequality Constraints
Minimize f(x) subject to g_i(x) ≤ 0, i = 1, …, m, and h_j(x) = 0, j = 1, …, p. You can always reformulate your problem into the above form.

68 Lagrange Multipliers Minimize f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0. Lagrangian: L(x, λ, μ) = f(x) + Σ_i λ_i g_i(x) + Σ_j μ_j h_j(x), with λ_i ≥ 0.

69 Lagrange Multipliers Minimize f(x)
Subject to g_i(x) ≤ 0 (negative for feasible solutions) and h_j(x) = 0 (zero for feasible solutions). Lagrangian: L(x, λ, μ) = f(x) + Σ_i λ_i g_i(x) + Σ_j μ_j h_j(x).

70 Duality Let x* be a local extreme. Define L_D(λ, μ) = min_x L(x, λ, μ). Maximize it w.r.t. λ and μ.

71 Duality What are we doing?
Minimizing the Lagrangian w.r.t. x while maximizing it w.r.t. λ and μ. Let x* be a local extreme; define L_D(λ, μ) = min_x L(x, λ, μ) and maximize it w.r.t. λ and μ.

72 Saddle Point Determination
[Figure: the optimum is a saddle point of the Lagrangian, a minimum along x and a maximum along λ and μ.]

73 Duality The primal: Minimize f(x) subject to g_i(x) ≤ 0 and h_j(x) = 0. The dual: Maximize L_D(λ, μ) = min_x L(x, λ, μ) subject to λ_i ≥ 0.

74 The Karush-Kuhn-Tucker Conditions ∇_x L(x, λ, μ) = 0; g_i(x) ≤ 0; h_j(x) = 0; λ_i ≥ 0; λ_i g_i(x) = 0 (complementary slackness).
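
As a numeric check (a sketch; check_kkt is a hypothetical name), these conditions can be verified for the separable linear SVM solved earlier:

```python
import numpy as np

def check_kkt(alpha, X, y, tol=1e-6):
    """Verify the KKT conditions of the separable linear SVM: dual
    feasibility a_i >= 0, primal feasibility y_i (w.x_i + b) >= 1, the
    equality constraint sum_i a_i y_i = 0, and complementary slackness
    a_i [y_i (w.x_i + b) - 1] = 0."""
    w = (alpha * y) @ X                # stationarity: w = sum_i a_i y_i x_i
    s = np.argmax(alpha)
    b = y[s] - w @ X[s]
    slack = y * (X @ w + b) - 1.0
    return (np.all(alpha >= -tol) and np.all(slack >= -tol)
            and np.all(np.abs(alpha * slack) <= tol)
            and abs(alpha @ y) <= tol)
```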

