
Slide 1: Support Vector Machines (M.W. Mak)

Outline:
1. Introduction to SVMs
2. Linear SVMs
3. Non-linear SVMs

References:
1. S.Y. Kung, M.W. Mak, and S.H. Lin. Biometric Authentication: A Machine Learning Approach. Prentice Hall, to appear.
2. S.R. Gunn. Support Vector Machines for Classification and Regression. Technical report, 1998. (http://www.isis.ecs.soton.ac.uk/resources/svminfo/)
3. Bernhard Schölkopf. Statistical Learning and Kernel Methods. MSR-TR-2000-23, Microsoft Research, 2000. (ftp://ftp.research.microsoft.com/pub/tr/tr-2000-23.pdf)
4. For more resources on support vector machines, see http://www.kernel-machines.org/

Slide 2: Introduction

- SVMs were developed by Vapnik in 1995 and are becoming popular due to their attractive features and promising performance.
- Conventional neural networks are based on empirical risk minimization, where the network weights are determined by minimizing the mean squared error between the actual outputs and the desired outputs.
- SVMs are based on the structural risk minimization principle, where the parameters are optimized by minimizing an upper bound on the generalization error rather than the training error alone.
- SVMs have been shown to possess better generalization capability than conventional neural networks.

Slide 3: Introduction (cont.)

- Given N labeled empirical data points

  $(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N) \in \mathcal{X} \times \{-1, +1\},$   (1)

  where $\mathcal{X}$ is the domain of the input data and the $y_i$ are the class labels.

[Figure: the labeled data points in the domain X.]

Slide 4: Introduction (cont.)

- We construct a simple classifier by computing the means of the two classes:

  $\mathbf{c}_1 = \frac{1}{N_1} \sum_{i:\, y_i = +1} \mathbf{x}_i, \qquad \mathbf{c}_2 = \frac{1}{N_2} \sum_{i:\, y_i = -1} \mathbf{x}_i,$   (2)

  where $N_1$ and $N_2$ are the numbers of data points in the classes with positive and negative labels, respectively.
- We assign a new point $\mathbf{x}$ to the class whose mean is closer to it.
- To achieve this, we compute the decision function given on the next slide.

Slide 5: Introduction (cont.)

- We determine the class of $\mathbf{x}$ by checking whether the vector connecting $\mathbf{x}$ and $\mathbf{c}$ encloses an angle smaller than $\pi/2$ with the vector $\mathbf{w} = \mathbf{c}_1 - \mathbf{c}_2$:

  $y = \mathrm{sgn}\big((\mathbf{x} - \mathbf{c}) \cdot \mathbf{w}\big) = \mathrm{sgn}\big(\mathbf{x} \cdot \mathbf{c}_1 - \mathbf{x} \cdot \mathbf{c}_2 + b\big),$

  where $\mathbf{c} = \frac{1}{2}(\mathbf{c}_1 + \mathbf{c}_2)$ and $b = \frac{1}{2}\big(\|\mathbf{c}_2\|^2 - \|\mathbf{c}_1\|^2\big)$.

[Figure: the point x, the class means c_1 and c_2, and their midpoint c in the domain X.]

Slide 6: Introduction (cont.)

- In the special case where b = 0, we have

  $y = \mathrm{sgn}\Big( \frac{1}{N_1} \sum_{i:\, y_i = +1} \mathbf{x} \cdot \mathbf{x}_i \;-\; \frac{1}{N_2} \sum_{i:\, y_i = -1} \mathbf{x} \cdot \mathbf{x}_i \Big).$   (3)

- This means that we use ALL data points $\mathbf{x}_i$, each being weighted equally by $1/N_1$ or $1/N_2$, to define the decision plane.
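
As an illustration, here is a minimal NumPy sketch of this mean-based classifier (Eqs. 2-3 and the decision rule of slide 5). The toy data points and variable names are assumptions for illustration only; they are not from the original slides.

    import numpy as np

    # Hypothetical toy data: two 2-D classes with labels +1 and -1.
    X = np.array([[2.0, 2.5], [3.0, 3.0], [2.5, 3.5],   # positive class
                  [0.0, 0.5], [1.0, 0.0], [0.5, 1.0]])  # negative class
    y = np.array([+1, +1, +1, -1, -1, -1])

    # Class means (Eq. 2).
    c1 = X[y == +1].mean(axis=0)
    c2 = X[y == -1].mean(axis=0)

    # Decision rule of slide 5: sign of (x - c) . (c1 - c2), with c the midpoint.
    def classify(x, c1=c1, c2=c2):
        w = c1 - c2
        c = 0.5 * (c1 + c2)
        return int(np.sign(np.dot(x - c, w)))

    print(classify(np.array([2.8, 3.2])))  # expected +1 (closer to the positive mean)
    print(classify(np.array([0.2, 0.3])))  # expected -1 (closer to the negative mean)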

Slide 7: Introduction (cont.)

[Figure: the decision plane separating the two classes in the domain X.]

Slide 8: Introduction (cont.)

- However, we might want to remove the influence of patterns that are far away from the decision boundary, because their influence is usually small.
- We may also select only a few important data points (called support vectors) and weight them differently.
- Then, we have a support vector machine.

Slide 9: Introduction (cont.)

- We aim to find a decision plane that maximizes the margin.

[Figure: the decision plane in the domain X, with the support vectors lying on the margin boundaries.]

Slide 10: Linear SVMs

- Assume that all training data satisfy the constraints

  $\mathbf{w} \cdot \mathbf{x}_i + b \ge +1 \;\text{ for } y_i = +1, \qquad \mathbf{w} \cdot \mathbf{x}_i + b \le -1 \;\text{ for } y_i = -1,$   (4)

  which means

  $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \forall i.$   (5)

- Training data points for which the above equality holds lie on hyperplanes parallel to the decision plane.

Slide 11: Linear SVMs (cont.)

- The margin, i.e. the distance between the hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$, is $d = 2/\|\mathbf{w}\|$.
- Therefore, maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|^2$.
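
For completeness, a short derivation of the margin width (not spelled out on the original slide): the distance from a point $\mathbf{x}_0$ to the plane $\mathbf{w} \cdot \mathbf{x} + b = 0$ is $|\mathbf{w} \cdot \mathbf{x}_0 + b| / \|\mathbf{w}\|$, and support vectors on the two margin hyperplanes satisfy $\mathbf{w} \cdot \mathbf{x}_{+} + b = +1$ and $\mathbf{w} \cdot \mathbf{x}_{-} + b = -1$, so

    d = \frac{|\mathbf{w}\cdot\mathbf{x}_{+} + b|}{\|\mathbf{w}\|}
        + \frac{|\mathbf{w}\cdot\mathbf{x}_{-} + b|}{\|\mathbf{w}\|}
      = \frac{1}{\|\mathbf{w}\|} + \frac{1}{\|\mathbf{w}\|}
      = \frac{2}{\|\mathbf{w}\|}.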

Slide 12: Linear SVMs (Lagrangian)

- We minimize $\|\mathbf{w}\|^2$ subject to the constraint

  $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \forall i.$   (6)

- This can be achieved by introducing Lagrange multipliers $\alpha_i \ge 0$ and a Lagrangian

  $L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \alpha_i \big[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \big].$   (7)

- The Lagrangian has to be minimized with respect to $\mathbf{w}$ and $b$ and maximized with respect to the $\alpha_i$.

Slide 13: Linear SVMs (Lagrangian)

- Setting $\partial L / \partial \mathbf{w} = 0$ and $\partial L / \partial b = 0$, we obtain

  $\mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.$   (8)

- Patterns for which $\alpha_i > 0$ are called support vectors. These vectors lie on the margin and satisfy

  $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1, \quad i \in S,$

  where $S$ contains the indexes of the support vectors.
- Patterns for which $\alpha_i = 0$ are considered to be irrelevant to the classification.

Slide 14: Linear SVMs (Wolfe Dual)

- Substituting (8) into (7), we obtain the Wolfe dual: maximize

  $W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$   (9)

  subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.
- The hyper-decision plane is thus

  $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b = \sum_{i \in S} \alpha_i y_i \, \mathbf{x}_i \cdot \mathbf{x} + b = 0.$
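
As a numerical illustration, the sketch below solves the Wolfe dual (Eq. 9) for a small hypothetical linearly separable data set using SciPy's SLSQP solver. The data, variable names, and choice of solver are assumptions for illustration, not part of the original slides.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical linearly separable toy data.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
    y = np.array([+1.0, +1.0, -1.0, -1.0])
    N = len(y)

    # Q_ij = y_i y_j (x_i . x_j); the dual maximizes sum(alpha) - 0.5 alpha' Q alpha.
    Q = (y[:, None] * y[None, :]) * (X @ X.T)

    def neg_dual(a):                       # minimize the negative of Eq. 9
        return 0.5 * a @ Q @ a - a.sum()

    cons = {'type': 'eq', 'fun': lambda a: a @ y}   # sum_i alpha_i y_i = 0
    res = minimize(neg_dual, np.zeros(N), method='SLSQP',
                   bounds=[(0, None)] * N, constraints=[cons])
    alpha = res.x

    # Recover w from Eq. 8 and b from a support vector (alpha_i > 0).
    w = (alpha * y) @ X
    sv = np.argmax(alpha)
    b = y[sv] - w @ X[sv]
    print("alpha =", np.round(alpha, 3), " w =", np.round(w, 3), " b =", round(b, 3))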

Slide 15: Linear SVMs (Example)

- Analytical example (3-point problem): three labeled training points and the corresponding dual objective function (both given as equations on the original slide).

Slide 16: Linear SVMs (Example)

- We introduce another Lagrange multiplier $\lambda$ (for the equality constraint $\sum_i \alpha_i y_i = 0$) to obtain the Lagrangian $F(\boldsymbol{\alpha}, \lambda)$.
- Differentiating $F(\boldsymbol{\alpha}, \lambda)$ with respect to $\lambda$ and each $\alpha_i$ and setting the results to zero, we obtain the values of the Lagrange multipliers.
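
The exact expressions on the original slide are not preserved, but for a small problem in which every training point turns out to be a support vector the standard construction is (a sketch):

    F(\boldsymbol{\alpha}, \lambda)
      = \sum_{i} \alpha_i
        - \frac{1}{2} \sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j \,\mathbf{x}_i \cdot \mathbf{x}_j
        - \lambda \sum_{i} \alpha_i y_i,

    \frac{\partial F}{\partial \alpha_i}
      = 1 - \sum_{j} \alpha_j y_i y_j \,\mathbf{x}_i \cdot \mathbf{x}_j - \lambda y_i = 0,
    \qquad
    \frac{\partial F}{\partial \lambda} = -\sum_{i} \alpha_i y_i = 0,

which is a linear system in $(\alpha_1, \ldots, \alpha_N, \lambda)$ that can be solved directly.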

Slide 17: Linear SVMs (Example)

- Substituting the Lagrange multipliers into Eq. 8 gives the weight vector $\mathbf{w}$; the bias $b$ then follows from the margin condition $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) = 1$ for any support vector.

Slide 18: Linear SVMs (Example)

- 4-point linearly separable problem.

[Figure: two configurations of the 4-point problem, one with 4 support vectors and one with 3 support vectors.]

Slide 19: Linear SVMs (Non-linearly separable)

- Non-linearly separable: patterns that cannot be separated by a linear decision boundary without incurring classification error.

[Figure: a data set containing points that cause classification errors in a linear SVM.]

Slide 20: Linear SVMs (Non-linearly separable)

- We introduce a set of slack variables $\xi_i \ge 0$, $i = 1, \ldots, N$.
- The slack variables allow some data to violate the constraints defined for the linearly separable case (Eq. 6):

  $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i \quad \forall i.$

- Therefore, for some $\mathbf{x}_i$ with $\xi_i > 0$ we have $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) < 1$.

Slide 21: Linear SVMs (Non-linearly separable)

- For example, $\xi_{10} > 0$ and $\xi_{19} > 0$ because $\mathbf{x}_{10}$ and $\mathbf{x}_{19}$ are inside the margins, i.e. they violate the constraint of Eq. 6.

[Figure: a data set in which the points x_10 and x_19 fall inside the margin.]

Slide 22: Linear SVMs (Non-linearly separable)

- For non-separable cases, we minimize

  $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0,$

  where $C$ is a user-defined penalty parameter that penalizes any violation of the margins.
- The Lagrangian becomes

  $L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i} \xi_i - \sum_{i} \alpha_i \big[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \big] - \sum_{i} \beta_i \xi_i.$
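
Differentiating this Lagrangian with respect to the slack variables (a step not shown on the slides) explains the box constraint that appears in the dual on the next slide:

    \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \beta_i = 0
    \;\Longrightarrow\; \alpha_i = C - \beta_i,
    \qquad \beta_i \ge 0 \;\Longrightarrow\; 0 \le \alpha_i \le C.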

Slide 23: Linear SVMs (Non-linearly separable)

- Wolfe dual optimization: maximize

  $W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j$

  subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.
- The output weight vector and bias term are

  $\mathbf{w} = \sum_{i \in S} \alpha_i y_i \mathbf{x}_i, \qquad b = y_k - \mathbf{w} \cdot \mathbf{x}_k \;\text{ for any } k \text{ with } 0 < \alpha_k < C.$
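
For illustration, a minimal scikit-learn sketch (the library, data, and parameter values are assumptions, not part of the slides) that fits a soft-margin linear SVM and checks that the reported weight vector equals the sum over the support vectors:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical 2-D data with some class overlap.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([2, 2], 1.0, (30, 2)),
                   rng.normal([0, 0], 1.0, (30, 2))])
    y = np.array([+1] * 30 + [-1] * 30)

    clf = SVC(kernel='linear', C=1.0).fit(X, y)

    # dual_coef_[0] holds alpha_i * y_i for the support vectors (sklearn convention),
    # so w = sum over support vectors of (alpha_i y_i) x_i.
    w_from_dual = clf.dual_coef_[0] @ clf.support_vectors_
    print(np.allclose(w_from_dual, clf.coef_[0]))   # expected: True
    print("w =", clf.coef_[0], " b =", clf.intercept_[0])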

Slide 24: 2. Linear SVMs (Types of SVs)

- There are three types of support vectors:
  1. On the margin: $0 < \alpha_i < C$ and $\xi_i = 0$.
  2. Inside the margin (but correctly classified): $\alpha_i = C$ and $0 < \xi_i \le 1$.
  3. Outside the margin (misclassified): $\alpha_i = C$ and $\xi_i > 1$.
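
A self-contained sketch (again with hypothetical data and scikit-learn as an assumed tool) that sorts the support vectors of a fitted soft-margin SVM into these three types using the dual coefficients and the decision values:

    import numpy as np
    from sklearn.svm import SVC

    # Hypothetical overlapping 2-D data.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([1.0, 1.0], 1.0, (40, 2)),
                   rng.normal([0.0, 0.0], 1.0, (40, 2))])
    y = np.array([+1] * 40 + [-1] * 40)

    clf = SVC(kernel='linear', C=1.0).fit(X, y)

    alpha = np.abs(clf.dual_coef_[0])                              # alpha_i of each SV
    yf = y[clf.support_] * clf.decision_function(X[clf.support_])  # y_i f(x_i)

    on_margin     = alpha < clf.C - 1e-8               # 0 < alpha_i < C, xi_i = 0
    inside        = (alpha >= clf.C - 1e-8) & (yf >= 0)  # alpha_i = C, 0 < xi_i <= 1
    misclassified = (alpha >= clf.C - 1e-8) & (yf < 0)   # alpha_i = C, xi_i > 1

    print("on margin:", on_margin.sum(), " inside margin:", inside.sum(),
          " misclassified:", misclassified.sum())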

Slide 25: 2. Linear SVMs (Types of SVs)

[Figure: a soft-margin linear SVM showing the three types of support vectors.]

Slide 26: 2. Linear SVMs (Types of SVs)

[Figure: the same data set with Class 1 and Class 2 swapped.]

Slide 27: 2. Linear SVMs (Types of SVs)

- Effect of varying C:

[Figure: decision boundaries and support vectors for C = 0.1 and C = 100.]
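
A quick way to see this effect numerically (an illustrative sketch with hypothetical data; a smaller C tolerates more margin violations and typically yields more support vectors and a wider margin):

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([1.5, 1.5], 1.0, (50, 2)),
                   rng.normal([0.0, 0.0], 1.0, (50, 2))])
    y = np.array([+1] * 50 + [-1] * 50)

    for C in (0.1, 1.0, 100.0):
        clf = SVC(kernel='linear', C=C).fit(X, y)
        print(f"C = {C:6.1f}  ->  {clf.n_support_.sum()} support vectors, "
              f"||w|| = {np.linalg.norm(clf.coef_[0]):.2f}")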

Slide 28: 3. Non-linear SVMs

- If the training data X are not linearly separable, we may use a non-linear mapping (defined implicitly by a kernel function) to map the data from the input space to a feature space in which they become linearly separable.

[Figure: a curved decision boundary in the input space (domain X) becomes a linear boundary in the feature space.]
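
As an illustration (the data set, kernel parameters, and use of scikit-learn are assumptions), an RBF-kernel SVM separates concentric-ring data that no linear SVM can separate in the input space:

    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Two concentric rings: not separable by any line in the input space.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear = SVC(kernel='linear', C=1.0).fit(X, y)
    rbf    = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)

    print("linear SVM training accuracy:", linear.score(X, y))   # roughly 0.5
    print("RBF SVM training accuracy:   ", rbf.score(X, y))      # close to 1.0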

Slide 29: 3. Non-linear SVMs (cont.)

- The decision function becomes

  $f(\mathbf{x}) = \mathrm{sgn}\Big( \sum_{i \in S} \alpha_i y_i \, \boldsymbol{\phi}(\mathbf{x}_i) \cdot \boldsymbol{\phi}(\mathbf{x}) + b \Big),$   (a)

  where $\boldsymbol{\phi}(\cdot)$ is the mapping to the feature space.

Slide 30: 3. Non-linear SVMs (cont.)

Slide 31: 3. Non-linear SVMs (cont.)

- The decision function becomes

  $f(\mathbf{x}) = \mathrm{sgn}\Big( \sum_{i \in S} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big).$

- For RBF kernels: $K(\mathbf{x}, \mathbf{x}_i) = \exp\big( -\|\mathbf{x} - \mathbf{x}_i\|^2 / (2\sigma^2) \big).$
- For polynomial kernels: $K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x} \cdot \mathbf{x}_i + 1)^p.$
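
To make the kernel trick concrete, the sketch below (an illustration not on the slides) checks numerically that the degree-2 polynomial kernel equals an ordinary dot product after an explicit feature mapping φ:

    import numpy as np

    def poly_kernel(x, z, p=2):
        return (np.dot(x, z) + 1.0) ** p

    def phi(x):
        # Explicit feature map for the degree-2 polynomial kernel on 2-D inputs:
        # (x.z + 1)^2 = phi(x) . phi(z)
        x1, x2 = x
        return np.array([1.0,
                         np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1 ** 2, x2 ** 2,
                         np.sqrt(2) * x1 * x2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])
    print(poly_kernel(x, z), phi(x) @ phi(z))   # both should print the same value (4.0)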

Slide 32: 3. Non-linear SVMs (cont.)

- The decision function and the optimization problem are the same as in the linear case, with the dot products $\mathbf{x}_i \cdot \mathbf{x}_j$ replaced by the kernel $K(\mathbf{x}_i, \mathbf{x}_j)$: maximize

  $W(\boldsymbol{\alpha}) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, K(\mathbf{x}_i, \mathbf{x}_j)$   (9)

  subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.

Slide 33: 3. Non-linear SVMs (cont.)

- The effect of varying C on RBF-SVMs:

[Figure: decision boundaries of RBF-SVMs for C = 10 and C = 1000.]

Slide 34: 3. Non-linear SVMs (cont.)

- The effect of varying C on polynomial-SVMs:

[Figure: decision boundaries of polynomial-SVMs for C = 10 and C = 1000.]

