
1 A Parallel Mixture of SVMs for Very Large Scale Problems
Ronan Collobert, Samy Bengio, Yoshua Bengio
Prepared by S.Y.C.
Neural Information Processing Systems, Vol. 17, MIT Press, 2004

2 Abstract
Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems. However, it is hopeless to try to solve real-life problems having more than a few hundred thousand examples with classical SVMs.

3 Abstract (Cont.)
The present paper proposes a new mixture of SVMs that can be easily implemented in parallel, and in which each SVM is trained on a small subset of the whole dataset.

4 Outline
Introduction
Introduction to SVM
A New Conditional Mixture of SVMs
Experiments
Conclusion

5 Introduction
Training an SVM requires solving a quadratic optimization problem whose resource requirements are at least quadratic in the number of training examples. It is thus hopeless to try to solve problems with millions of examples using classical SVMs.

6 Introduction (Cont.)
We propose a mixture of several SVMs, each of them trained on only a part of the dataset.

7 Introduction to SVM
Training data: $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$:
$y_i = +1$ if $x_i \in$ class 1; $y_i = -1$ if $x_i \in$ class 2.

8 Introduction to SVM (Cont.)
Making a decision boundary.

9 Introduction to SVM (Cont.)
Large-margin decision boundary.
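To make the large-margin idea concrete, here is a minimal sketch (assuming scikit-learn, which the slides do not mention) that fits a linear SVM on toy 2-D data and reports the resulting margin width 2/||w||:

```python
# Minimal large-margin SVM sketch (assumes scikit-learn; toy data only).
import numpy as np
from sklearn.svm import SVC

# Toy training data (x_i, y_i): x_i in R^2, y_i in {+1, -1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),   # class -1
               rng.normal(+2.0, 0.5, (20, 2))])  # class +1
y = np.array([-1] * 20 + [+1] * 20)

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# For a linear SVM, the margin width is 2 / ||w||.
w = svm.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
print("number of support vectors:", len(svm.support_))
```

Maximizing this margin is exactly the quadratic optimization problem whose cost motivates the mixture approach on the following slides.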

10 A New Conditional Mixture of SVMs
The output of the mixture for an input vector $x$ is computed as follows:
$f(x) = h\left(\sum_{m=1}^{M} w_m(x)\, s_m(x)\right)$
where $s_m(x)$ is the output of the $m$-th expert SVM, $w_m(x)$ is the weight the gater assigns to expert $m$ for input $x$, and $h$ is a transfer function (e.g., the hyperbolic tangent for classification).
In the proposed model, the gater is trained to minimize the cost function
$C = \sum_{i=1}^{N} \left[f(x_i) - y_i\right]^2.$
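As a concrete illustration of this output computation, here is a minimal NumPy sketch; the `experts` and `gater_weights` callables are hypothetical stand-ins for the trained models:

```python
# Sketch of the conditional mixture output f(x) = tanh(sum_m w_m(x) * s_m(x)).
import numpy as np

def mixture_output(x, experts, gater_weights):
    """experts: list of M callables, each mapping x to a real-valued SVM output s_m(x).
    gater_weights: callable mapping x to an array of M mixing weights w_m(x)
    (e.g., a softmax output, so the weights sum to 1)."""
    s = np.array([s_m(x) for s_m in experts])  # expert outputs s_m(x)
    w = gater_weights(x)                       # gater weights w_m(x)
    return np.tanh(np.dot(w, s))               # transfer function h = tanh
```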

11 A New Conditional Mixture of SVMs (Cont.)
The model is trained with the following algorithm (a code sketch follows the full list on the next slide):
1. Divide the training set into M random subsets of size near N/M.
2. Train each expert separately over one of these subsets.
3. Keeping the experts fixed, train the gater to minimize the cost C on the whole training set.

12 A New Conditional Mixture of SVMs (Cont.)
4. Reconstruct the M subsets, assigning each example to the experts the gater weights most highly while keeping the subsets near size N/M.
5. If a termination criterion is not fulfilled (such as a given number of iterations or a validation error going up), go to step 2.
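Below is a minimal sketch of this loop. It assumes scikit-learn SVC experts and, for brevity, swaps the paper's MLP gater for a linear-softmax gater trained by stochastic gradient descent on the squared-error cost C; all names are illustrative, not the authors' code.

```python
# Sketch of the parallel SVM-mixture training loop (illustrative, not the
# authors' implementation): SVC experts + a linear-softmax gater stand-in.
import numpy as np
from sklearn.svm import SVC

def train_mixture(X, y, M=4, outer_iters=3, gater_epochs=50, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    cap = int(np.ceil(N / M))                 # subset size near N/M

    # Step 1: divide the training set into M random subsets.
    perm = rng.permutation(N)
    subsets = [perm[m::M] for m in range(M)]

    V = np.zeros((M, d)); b = np.zeros(M)     # gater parameters

    for _ in range(outer_iters):
        # Step 2: train each expert separately on its subset.
        # (A real run must ensure every subset contains both classes.)
        experts = [SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])
                   for idx in subsets]
        S = np.column_stack([e.decision_function(X) for e in experts])  # s_m(x_i)

        # Step 3: experts fixed, train the gater to minimize
        # C = sum_i (tanh(w(x_i) . s(x_i)) - y_i)^2 by SGD.
        for _ in range(gater_epochs):
            for i in rng.permutation(N):
                a = V @ X[i] + b
                w = np.exp(a - a.max()); w /= w.sum()  # softmax weights w_m(x_i)
                z = w @ S[i]
                f = np.tanh(z)
                g = 2.0 * (f - y[i]) * (1.0 - f**2)    # dC/dz
                grad_a = g * w * (S[i] - z)            # softmax chain rule
                V -= lr * np.outer(grad_a, X[i])
                b -= lr * grad_a

        # Step 4: reconstruct the M subsets, sending each example to its
        # highest-weighted expert that still has room (sizes stay near N/M).
        A = X @ V.T + b
        W = np.exp(A - A.max(axis=1, keepdims=True))
        W /= W.sum(axis=1, keepdims=True)
        subsets = [[] for _ in range(M)]
        for i in range(N):
            for m in np.argsort(-W[i]):
                if len(subsets[m]) < cap:
                    subsets[m].append(i); break
        subsets = [np.array(s) for s in subsets]
        # Step 5: here the criterion is simply a fixed number of iterations.

    return experts, (V, b)
```

For binary labels y in {-1, +1}, prediction takes the sign of tanh(w(x) . s(x)) using the returned experts and gater parameters. The key property is that each SVC only ever sees about N/M examples, which is what makes the overall scheme sub-quadratic and parallelizable.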

13 Experiments
A Toy Problem.

14 Experiments (Cont.)
A Large-Scale Realistic Problem:
– Kept a separate test set of 50,000 examples.
– Used a validation set of 10,000 examples to select the best mixture of SVMs.
– Trained our models on different training sets.
– The mixtures had from 10 to 50 expert SVMs with a Gaussian kernel, and the gater was an MLP with between 25 and 500 hidden units.

15 Experiments (Cont.)
Comparison of performance.

16 Experiments (Cont.)
Comparison of the validation error of different mixtures of SVMs with various numbers of hidden units and experts.

17 Experiments (Cont.)
Verification on Another Large-Scale Problem.

18 Experiments (Cont.)

19 Conclusion
In this paper we have presented a new algorithm to train a mixture of SVMs. Provided the gater itself trains efficiently (we conjecture this to be the case for very large datasets), the whole method is clearly sub-quadratic in training time with respect to the number of training examples.

