
1
**Reduced Support Vector Machine**

Nonlinear classifier:
(i) Choose a random subset matrix Ā ∈ R^(m̄×n) of the entire data matrix A ∈ R^(m×n)
(ii) Solve the following problem by Newton's method (applied to a smooth approximation of the plus function): min over (ū, γ) of (ν/2)‖(e − D(K(A, Ā′)ū − eγ))₊‖² + (1/2)(ū′ū + γ²)
(iii) The nonlinear classifier is defined by the optimal solution (ū, γ) of step (ii): K(x′, Ā′)ū − γ = 0
Using the small square kernel K(Ā, Ā′) instead of the rectangular kernel K(A, Ā′) gives lousy results!
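A minimal NumPy sketch of step (i) and of the rectangular reduced kernel K(A, Ā′) it produces, assuming a Gaussian kernel (the names `gaussian_kernel`, `K_reduced`, and the data sizes are illustrative, not from the slides):

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K(A, B')[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))              # full data matrix, m x n
idx = rng.choice(len(A), size=10, replace=False)
A_bar = A[idx]                             # random subset matrix (step i)

# RSVM keeps all m rows and reduces only the columns:
K_reduced = gaussian_kernel(A, A_bar)      # m x m_bar rectangular kernel
```

The key point the slide makes is visible in the shapes: the reduced kernel is thin (100 × 10) but still involves every training row.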

2
**Reduced Set: Plays the Most Important Role in RSVM**

It is natural to raise two questions: Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance? Is there a mechanism to determine the size of the reduced set automatically or dynamically?

3
**Reduced Set Selection According to the Data Scatter in Input Space**

Choose the reduced set randomly, but keep only the points that are more than a certain minimal distance apart. These points are expected to be a representative sample.
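The selection rule above can be sketched as follows (a minimal version; the function name `scatter_select` and the concrete threshold are ours):

```python
import numpy as np

def scatter_select(X, min_dist, rng):
    """Scan points in random order; keep a point only if it is at
    least min_dist away from every point kept so far."""
    kept = []
    for i in rng.permutation(len(X)):
        if all(np.linalg.norm(X[i] - X[j]) >= min_dist for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
reduced = scatter_select(X, min_dist=0.2, rng=rng)
```

Because the unit square can only pack so many points 0.2 apart, the kept set is automatically much smaller than the data.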

4
**Data Scatter in Input Space is NOT Good Enough**

An example is given below: twelve training points, labeled 1–12, arranged analogously to the XOR problem.

[Figure: scatter plot of the twelve labeled training points in the input plane]

5
**Mapping to Feature Space**

Map the input data via the nonlinear mapping Φ: (x1, x2) ↦ (x1², √2·x1x2, x2²). This is equivalent to using the polynomial kernel of degree 2: K(x, z) = (x′z)².
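The equivalence can be checked numerically. Below, `phi` is the explicit degree-2 feature map for 2-D inputs and `poly2` is the homogeneous degree-2 polynomial kernel (the homogeneous form, without the "+1" term, is assumed here):

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2(x, z):
    """Homogeneous polynomial kernel of degree 2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
inner_feature = np.dot(phi(x), phi(z))   # inner product in feature space
inner_kernel = poly2(x, z)               # kernel value in input space
```

Both quantities agree, which is exactly the "kernel trick" the slide appeals to.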

6
**Data Points in the Feature Space**

[Figure: the twelve points plotted in feature space, where the XOR-like classes become linearly separable]

7
**The Polynomial Kernel Matrix**

8
**Experiment Result**

[Figure: the separating surface obtained for the twelve XOR-like training points]

9
**Express the Classifier as Linear Combination of Kernel Functions**

In SSVM, the nonlinear separating surface is K(x′, A′)u − γ = 0, a linear combination of the full set of kernel functions K(x′, A_i′), i = 1, …, m. In RSVM, the nonlinear separating surface is K(x′, Ā′)ū − γ = 0, a linear combination of a reduced set of kernel functions.
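A small sketch of evaluating such a surface: the classifier is just a weighted sum of kernel evaluations against the (reduced) set, minus the offset γ (the weights `u`, the reduced points, and the RBF width below are made-up toy values):

```python
import numpy as np

def classify(x, A_bar, u, gamma, kernel):
    """Evaluate sign( sum_j u_j * K(x, a_bar_j) - gamma )."""
    k_vec = np.array([kernel(x, a) for a in A_bar])
    return np.sign(k_vec @ u - gamma)

rbf = lambda x, z, g=1.0: np.exp(-g * np.sum((x - z) ** 2))
A_bar = np.array([[0.0, 0.0], [1.0, 1.0]])   # toy reduced set
u = np.array([1.0, -1.0])                    # toy expansion weights
label = classify(np.array([0.1, 0.0]), A_bar, u, gamma=0.0, kernel=rbf)
```

The test point is near the positively weighted kernel center, so it lands on the positive side.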

10
**Motivation of IRSVM: The Strength of Weak Ties**

If the kernel functions are very similar, the space spanned by these kernel functions will be very limited. (The title alludes to "The Strength of Weak Ties," Mark S. Granovetter, The American Journal of Sociology, Vol. 78, No. 6, May 1973.)

11
**Incremental Reduced SVMs**

Start with a very small reduced set, then add a new data point only when its kernel vector is dissimilar to the current set of kernel functions. Such a point contributes the most extra information for generating the separating surface. Repeat until several successive points cannot be added.

12
**How to measure the dissimilarity?**

Add a point into the reduced set if the distance from its kernel vector to the column space of the current reduced kernel matrix is greater than a threshold.

13
**Solving Least Squares Problems**

This distance can be determined by solving a least squares problem. The LSP has a unique solution if the columns of the current reduced kernel matrix are linearly independent (i.e., the matrix has full column rank).
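A minimal sketch of that least squares computation, assuming NumPy (`distance_to_column_space` is our name): the distance from a kernel vector k to the column space of K is the residual norm of min_β ‖Kβ − k‖₂.

```python
import numpy as np

def distance_to_column_space(K, k_new):
    """Residual norm of the least squares problem min_beta ||K beta - k_new||."""
    beta, *_ = np.linalg.lstsq(K, k_new, rcond=None)
    return np.linalg.norm(K @ beta - k_new)

K = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])            # column space = the xy-plane in R^3
in_span  = np.array([2.0, -3.0, 0.0]) # lies in the column space
off_span = np.array([0.0, 0.0, 5.0])  # orthogonal to the column space
d_in = distance_to_column_space(K, in_span)
d_off = distance_to_column_space(K, off_span)
```

A vector already in the span has distance ~0 and would be rejected; the orthogonal vector has distance 5 and would be added.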

14
**IRSVM Algorithm pseudo-code (sequential version)**

1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For each data point not in the reduced set:
4      Compute its kernel vector
5      Compute the distance from the kernel vector to the
6          column space of the current reduced kernel matrix
7      If its distance exceeds a certain threshold:
8          Add this point into the reduced set and form the new reduced kernel matrix
9  Until several successive failures have happened in line 7
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
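The selection loop (lines 1–9) can be sketched in NumPy as below. This covers only the greedy reduced-set construction; the final QP/SVM solve of line 10 is outside the sketch, and the kernel, threshold, and patience values are illustrative assumptions:

```python
import numpy as np

def irsvm_select(X, kernel_col, threshold, patience, rng):
    """Greedy reduced-set selection (IRSVM selection loop only)."""
    order = list(rng.permutation(len(X)))
    reduced = order[:2]                                   # line 1: two seeds
    K = np.column_stack([kernel_col(X, X[j]) for j in reduced])  # line 2
    fails = 0
    for i in order[2:]:                                   # line 3
        k = kernel_col(X, X[i])                           # line 4
        beta, *_ = np.linalg.lstsq(K, k, rcond=None)      # lines 5-6
        if np.linalg.norm(K @ beta - k) > threshold:      # line 7
            reduced.append(i)                             # line 8
            K = np.column_stack([K, k])
            fails = 0
        else:
            fails += 1
            if fails >= patience:                         # line 9
                break
    return reduced

rbf = lambda X, z, g=10.0: np.exp(-g * ((X - z) ** 2).sum(axis=1))
rng = np.random.default_rng(1)
X = rng.uniform(size=(80, 2))
reduced = irsvm_select(X, rbf, threshold=0.5, patience=10, rng=rng)
```

Each accepted index enlarges the kernel matrix by one column, so the reduced set grows only when a point genuinely adds to the spanned space.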

15
**Speed up IRSVM**

We have to solve the LSP many times. The main cost of each solve depends on the current reduced kernel matrix, not on the particular candidate kernel vector being tested, so the expensive part can be shared across candidates. Taking advantage of this, we examine a batch of data points at the same time.

16
**IRSVM Algorithm pseudo-code (Batch version)**

1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For a batch of data points not in the reduced set:
4      Compute their kernel vectors
5      Compute the corresponding distances from these kernel
6          vectors to the column space of the current reduced kernel matrix
7      For those points whose distance exceeds a certain threshold:
8          Add those points into the reduced set and form the new reduced kernel matrix
9  Until no data points in a batch were added in lines 7–8
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface

17
**IRSVM on Four Public Datasets**

18
**IRSVM on UCI Adult datasets**

19
**Time comparison on Adult datasets**

20
**IRSVM 10 Runs Average on 6414 Points Adult Training Set**

21
**Empirical Risk Minimization (ERM)**

(The underlying distribution P(x, y) and the true expected risk are not needed.) Replace the expected risk over P(x, y) by an average over the training examples. The empirical risk: R_emp[h] = (1/m) Σᵢ ½|yᵢ − h(xᵢ)|. Find the hypothesis with the smallest empirical risk. Focusing only on the empirical risk will cause overfitting.
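A tiny sketch of the empirical risk for ±1 labels (the toy threshold hypothesis and data are ours):

```python
import numpy as np

def empirical_risk(h, X, y):
    """R_emp = (1/m) * sum_i (1/2)|y_i - h(x_i)|, labels in {-1, +1}."""
    preds = np.array([h(x) for x in X])
    return 0.5 * np.abs(y - preds).mean()

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
h = lambda x: 1 if x[0] > 1.5 else -1   # toy threshold hypothesis
risk = empirical_risk(h, X, y)          # 0.0: h classifies all four correctly
```

With ±1 labels, ½|yᵢ − h(xᵢ)| is exactly the 0/1 loss, so R_emp is just the training error rate.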

22
**VC Confidence (The Bound between Expected Risk and Empirical Risk)**

The following inequality holds with probability 1 − η: R[h] ≤ R_emp[h] + VC confidence term. C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery 2 (2) (1998), pp. 121–167.
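The inequality on the original slide was an image; in the standard form given in Burges's tutorial (with h the VC dimension, m the number of training examples, and 1 − η the confidence) it reads:

```latex
R[f] \;\le\; R_{\mathrm{emp}}[f]
  \;+\; \sqrt{\frac{h\left(\ln\frac{2m}{h} + 1\right) - \ln\frac{\eta}{4}}{m}}
```

The square-root term is the VC confidence: it grows with the capacity h and shrinks with the sample size m.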

23
**Why We Maximize the Margin? (Based on Statistical Learning Theory)**

Structural Risk Minimization (SRM): the expected risk is less than or equal to the empirical risk (training error) plus the VC confidence (error bound).

24
**Bioinformatics Challenge**

Learning in very high dimensions with very few samples. Colon cancer dataset: 2000 genes vs. 62 samples. Acute leukemia dataset: 7129 genes vs. 72 samples. Feature selection will be needed.

25
**Feature Selection Approaches**

Filter model: the attribute set is filtered to produce the most promising subset before learning commences (e.g., the weight score approach). Wrapper model: the learning algorithm is wrapped into the selection procedure (e.g., 1-norm SVM, IRSVM).

26
**Feature Selection –Filter Model Using Weight Score Approach**

27
**Filter Model – Weight Score Approach**

w_j = |μ_j⁺ − μ_j⁻| / (σ_j⁺ + σ_j⁻), where μ_j± and σ_j± are the mean and standard deviation of feature j over the training examples of the positive or negative class.

28
**Filter Model – Weight Score Approach**

The weight score w_j is defined as the ratio between the difference of the means of expression levels and the sum of the standard deviations in the two classes. Select the genes with the largest w_j as the top features. The weight score is calculated from information about a single feature only, so highly linearly correlated features might all be selected by this approach.
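A minimal NumPy sketch of the weight score on synthetic data (the function name and the toy setup, where only feature 0 separates the classes, are ours):

```python
import numpy as np

def weight_scores(X, y):
    """w_j = |mu+_j - mu-_j| / (sigma+_j + sigma-_j) per feature j."""
    pos, neg = X[y == 1], X[y == -1]
    num = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
    den = pos.std(axis=0) + neg.std(axis=0)
    return num / den

rng = np.random.default_rng(0)
y = np.repeat([1, -1], 50)
X = rng.normal(size=(100, 3))
X[y == 1, 0] += 3.0        # feature 0 separates the classes; 1 and 2 are noise
scores = weight_scores(X, y)
top = int(np.argmax(scores))   # feature 0 gets the largest weight score
```

As the slide notes, the score looks at one feature at a time: if feature 1 were a copy of feature 0, both would score highly and both would be selected.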

29
**Wrapper Model – IRSVM: Find a Linear Classifier**

I. Randomly choose a very small feature subset from the input features as the initial feature reduced set.
II. Select a feature vector not in the current feature reduced set and compute the distance between this vector and the space spanned by the current feature reduced set.
III. If the distance is larger than a given gap, add this feature vector to the feature reduced set.
IV. Repeat steps II and III until no feature can be added to the current feature reduced set.
V. The features in the resulting feature reduced set are the final result of feature selection.
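Steps I–V can be sketched as greedy column selection over the feature (column) vectors of the data matrix. This is a minimal version under our own assumptions: a single random seed feature as the initial set, NumPy least squares for the span distance, and a synthetic matrix whose fourth feature duplicates the first:

```python
import numpy as np

def irsvm_feature_select(X, gap, rng):
    """Keep a feature only if its column is farther than `gap`
    from the span of the already-kept columns."""
    order = list(rng.permutation(X.shape[1]))
    kept = [order[0]]                         # step I: small initial set
    for j in order[1:]:                       # steps II-IV
        F = X[:, kept]
        beta, *_ = np.linalg.lstsq(F, X[:, j], rcond=None)
        if np.linalg.norm(F @ beta - X[:, j]) > gap:
            kept.append(j)                    # step III
    return sorted(kept)                       # step V

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3))
# feature 3 is an exact copy of feature 0, so at most one of them survives
X = np.column_stack([base, base[:, 0]])
kept = irsvm_feature_select(X, gap=1e-6, rng=rng)
```

Unlike the weight score filter, this wrapper-style criterion rejects the duplicated column, illustrating how IRSVM avoids selecting highly linearly correlated features.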
