
1 Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors
by Dennis DeCoste and Dominic Mazzoni
International Conference on Machine Learning (ICML-03), August 2003
Presented by Despina Kontos
CIS 525 Neural Computation, Spring 2004, Instructor: S. Vucetic

2 Overview
- Introduction
  - Motivation and the main idea.
- Background and related work
  - A little bit about Kernel Machines (KMs) and previous work.
- Methodology
  - The Nearest Support Vectors (NSVs).
  - Some enhancements.
- Experiments and results
- Discussion

3 Introduction
Why Kernel Machines?
- They overcome the "curse of dimensionality" by using kernel functions, while exploring large nonlinear feature spaces.
What is the problem?
- The tradeoff for this power is that a KM's query-time complexity scales linearly with the number of Support Vectors (SVs), often making KMs orders of magnitude more expensive at query time than other popular machine learning alternatives.
- KM costs are identical for every query, even for "easy" ones that alternatives (e.g. decision trees) can classify much faster than harder ones.

4 Introduction
So, what would be an ideal approach?
- Use a simple linear classifier for the (majority of) queries it is likely to classify correctly.
- Fall back to the full query-time cost of the exact KM only for those queries where such precision is likely to matter.
- For the rest of the cases, use something in between, with complexity proportional to the difficulty of the query.
A new idea!
- One can often achieve the same classification as the exact KM by using only a small fraction of the nearest support vectors (SVs) of a query.
- Approximate the exact KM with a k nearest-neighbor (k-NN) KM, whose output sums only over the (weighted) kernel values involving the k nearest (according to some distance) selected SVs; see the sketch below.
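
The k-NN KM approximation described above is just the exact KM sum truncated to the k nearest SVs. A minimal sketch (names are illustrative, not from the paper; kernel values for the selected SVs are assumed to be precomputed):

```python
import numpy as np

def knn_km_output(kernel_vals_k, betas_k, b):
    """Approximate KM output over only the k nearest SVs of the query:
    sum_i beta_i * K(X_i, x) for the k selected SVs, plus the bias b.
    kernel_vals_k and betas_k are length-k arrays for those SVs."""
    return float(np.dot(betas_k, kernel_vals_k)) + b
```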

5 Background: Kernel Machines Summary
- A binary SVM classifier is trained by optimizing an n-by-1 weighting vector α to satisfy the Quadratic Programming (QP) dual form:
  maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(X_i, X_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
- The kernel avoids the curse of dimensionality by projecting any two d-dimensional example vectors into feature-space vectors and returning their dot product: K(a, b) = Φ(a) · Φ(b).
- Popular kernels include the polynomial kernel K(a, b) = (a · b + 1)^p and the RBF kernel K(a, b) = exp(−||a − b||² / (2σ²)).
- The exact KM output f(x) is computed via f(x) = Σ_i β_i K(X_i, x) + b, where β_i = α_i y_i and the sum runs over all n SVs; a minimal sketch follows.
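
A minimal sketch of the quantities above, assuming an RBF kernel and pre-trained weights β_i (function names are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    a, b = np.asarray(a), np.asarray(b)
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def km_output(x, SVs, betas, b, kernel=rbf_kernel):
    """Exact KM output f(x) = sum_i beta_i * K(X_i, x) + b over all n SVs;
    the query-time cost is O(n * d) kernel work for n SVs of dimension d."""
    return sum(beta_i * kernel(X_i, x) for X_i, beta_i in zip(SVs, betas)) + b
```

Classification is sign(f(x)); this linear dependence on the number of SVs is exactly the query-time cost the paper attacks.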

6 Some related work
- Early methods compressed a KM's SVs into a reduced set of n_z vectors in order to reduce query-time costs.
- When a small residual ρ ≈ 0 can be achieved with n_z « n, speedups with little loss of classification accuracy have been reported.
- Problem: a key weakness of all such reduced-set approaches is that they provide no guarantees or control over how much classification error the approximation might introduce.

7 Methodology
The intuition behind the new idea:
- Order the SVs for each query using a distance metric and use the k nearest-neighboring (w.r.t. the query sample) SVs. The largest terms tend to get added first.
- During incremental computation of the KM output, once the partial sum leans "strongly enough" either positively or negatively, the remaining β_i K(X_i, x) terms cannot completely change its sign.
- Small-k nearest-neighbor classifiers can often classify well, but the best k will vary from query to query.

8 Methodology: Nearest Support Vectors (NSV)
- Define a distance-like scoring NNscore_i(x) for each SV X_i with respect to the query x, and order the SVs by it (a sketch follows).
- The β_i K(X_i, x) terms corresponding to the NNscore-ordered SVs tend to follow a steady progression, such that the remaining terms soon become too small to overcome any strong leaning.
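
The slide's NNscore formula did not survive extraction, so the sketch below substitutes one plausible distance-like score: the squared feature-space distance to each SV, computed from kernel values (the per-query constant K(x, x) is dropped). Treat this as an assumption rather than the paper's exact definition; names are illustrative.

```python
import numpy as np

def order_svs_by_nnscore(x, SVs, kernel):
    """Order SVs nearest-first by a distance-like score to the query x.
    Assumed score: ||phi(X_i) - phi(x)||^2 = K(X_i, X_i) - 2 K(X_i, x) + K(x, x);
    K(x, x) is the same for every SV, so it is omitted here.  The paper's
    NNscore may differ (e.g. it could also weight by |beta_i|)."""
    scores = [kernel(X_i, X_i) - 2.0 * kernel(X_i, x) for X_i in SVs]
    return np.argsort(scores)  # SV indices, nearest (smallest score) first
```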

9 Methodology: The main algorithm (incremental classification with early stopping; a sketch follows).
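
A hedged sketch of what such an incremental loop might look like, assuming the SVs have already been reordered for the query and that early-stopping thresholds L_k and H_k (next slide) are available; all names are illustrative, not the paper's code.

```python
def query_optimized_classify(x, SVs, betas, b, kernel, L, H):
    """Incremental NSV classification sketch.
    SVs/betas: pre-ordered (nearest-first) for this query.
    L[k], H[k]: early-stopping thresholds for step k (index 0 unused).
    Returns (label, number of kernel terms actually evaluated)."""
    g = b                                    # partial KM output g_k(x)
    for k, (X_i, beta_i) in enumerate(zip(SVs, betas), start=1):
        g += beta_i * kernel(X_i, x)
        if g < L[k]:                         # more negative than any true positive ever leaned at step k
            return -1, k
        if g > H[k]:                         # more positive than any true negative ever leaned at step k
            return +1, k
    return (1 if g > 0 else -1), len(SVs)    # no early exit: this equals the exact KM decision
```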

10 Methodology: Statistical thresholds for NSV
- Derive thresholds L_k and H_k by running the algorithm over a large representative sample of pre-query data.
- Compute L_k as the minimum value of g_k(x) over all x such that g_k(x) < 0 and f(x) > 0. This identifies L_k as the worst-case wrong-way leaning of any sample that the exact KM classifies as positive. Similarly, H_k is assigned the maximum g_k(x) such that g_k(x) > 0 and f(x) < 0.
- In practice, the test and training data distributions will not be identical, so each H_k (L_k) can be replaced with the maximum (minimum) of all threshold values over adjacent steps k−w through k+w (a variation using a window w); see the sketch below.
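
A sketch of how such thresholds could be computed from a representative sample, given the matrix of partial sums g_k(x) and the exact KM outputs f(x). The conservative ±∞ defaults and the helper names are assumptions, not the paper's code.

```python
import numpy as np

def compute_thresholds(G, f_exact, w=0):
    """G: (m, n) array with G[j, k-1] = g_k(x_j) for m pre-query samples and n SVs.
    f_exact: length-m exact KM outputs f(x_j).
    Returns L, H of length n + 1 (index 0 unused) for use as L[k], H[k]."""
    m, n = G.shape
    L = np.full(n + 1, -np.inf)   # -inf / +inf mean "never stop early at this step"
    H = np.full(n + 1, np.inf)
    for k in range(1, n + 1):
        g_k = G[:, k - 1]
        wrong_way_pos = g_k[(g_k < 0) & (f_exact > 0)]   # true positives leaning negative
        wrong_way_neg = g_k[(g_k > 0) & (f_exact < 0)]   # true negatives leaning positive
        if wrong_way_pos.size:
            L[k] = wrong_way_pos.min()
        if wrong_way_neg.size:
            H[k] = wrong_way_neg.max()
    if w > 0:                     # widen each threshold over adjacent steps k-w .. k+w
        Lw, Hw = L.copy(), H.copy()
        for k in range(1, n + 1):
            lo, hi = max(1, k - w), min(n, k + w)
            Lw[k] = L[lo:hi + 1].min()
            Hw[k] = H[lo:hi + 1].max()
        L, H = Lw, Hw
    return L, H
```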

11 Methodology
- Sorting NSVs by NNscore_i(x) leads to relatively wide and skewed thresholds whenever there is an imbalance in the number of positive SVs versus negative SVs.
  - Solution: adjust the NNscore-based ordering so that the cumulative sums of the positive β and the negative β at each step k are as equal as possible.
- A full linear scan to find the k nearest neighbors can be very computationally expensive, even when using indexing techniques.
  - Solution: perform pre-query principal component analysis (PCA) on the matrix of SVs, and use the resulting low-dimensional projected vectors to approximate kernels and to order the NSVs for each query as needed; see the sketch below.
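
A minimal sketch of the pre-query PCA step, assuming a plain NumPy SVD; the projection dimension is called p here to avoid clashing with the k of k-NN, and all names are illustrative.

```python
import numpy as np

def fit_pca(SV_matrix, p):
    """Pre-query PCA on the n-by-d matrix of SVs: returns (mean, top-p directions)."""
    mu = SV_matrix.mean(axis=0)
    _, _, Vt = np.linalg.svd(SV_matrix - mu, full_matrices=False)
    return mu, Vt[:p]

def project(X, mu, components):
    """Map d-dimensional vectors (rows of X, or a single query) to p-dim PCA coordinates."""
    return (np.asarray(X) - mu) @ components.T

# At query time, distances between project(x, mu, components) and the
# pre-projected SVs are used to order the NSVs cheaply, as an approximation
# of the full d-dimensional ordering.
```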

12 Methodology: Some enhancements
- Use a linear SVM as an initial filter. Compute the threshold bounds as before, except using the linear SVM's output for the first step of the computation; a sketch follows.
- Generate additional "difficult" data in order to obtain better threshold levels from the representative sample.
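
One way the linear-SVM filter could be wired in as "step 0" of the incremental computation, reusing the same kind of threshold bounds; a sketch under those assumptions, with illustrative names.

```python
import numpy as np

def classify_with_linear_filter(x, w, b_lin, L0, H0, fallback):
    """Step 0: a cheap linear score w·x + b_lin.  If it already clears the
    pre-computed bounds (derived like L_k / H_k, but for the linear output),
    answer immediately; otherwise fall back to the incremental NSV loop."""
    g0 = float(np.dot(w, x)) + b_lin
    if g0 < L0:
        return -1
    if g0 > H0:
        return +1
    return fallback(x)   # e.g. the incremental sketch from slide 9
```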

13 Experiments and results
Data: MNIST dataset (digit recognition)
- large input dimensionality
- large number of SVs

14 Experiments and results Speedup advantage compared to accuracy loss

15 Conclusions
- A new Kernel Machine that implements a k nearest-neighbor approach at query time to improve performance.
- The approach is applicable to any form of Kernel Machine classifier, regardless of how it is trained.
- Exciting speedup results are reported without significant loss of accuracy.
- Future work: combining the machine learning methods of kernels, nearest neighbors, and decision trees.

16 Any questions? Thank you!

