Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi

A Fast Iterative Algorithm for Fisher Discriminant using Heterogeneous Kernels
Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi
Computer Aided Diagnosis, Siemens Medical Solutions

Outline
Linear Fisher's Discriminant (LFD)
    Traditional Formulation
    Mathematical Programming Formulation
Kernel Fisher's Discriminant
Automatic heterogeneous kernel selection
    Formulation
    Automatic kernel selection KFD Algorithm
Numerical experience
    Publicly available datasets
    Real life CAD Colon Cancer detection dataset
Conclusions and Outlook

Linear Fisher’s Discriminant (LFD)
[Figure not preserved; its source attribution is also missing]

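Notation
Given m points in n-dimensional space, represented by an m-by-n matrix A.
Each data point (row of A) belongs to class +1 or -1.
We want to find a separating hyperplane, defined by a normal vector $w$ and an offset $\gamma$, such that the points of class +1 lie on one side and the points of class -1 lie on the other.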

LFD Classical Formulation
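We want to find the projection $w$ that maximizes the quotient:
$$J(w) = \frac{w^T S_B w}{w^T S_W w}$$
where
$$S_B = (M_+ - M_-)(M_+ - M_-)^T, \qquad S_W = \sum_{i \in \{+,-\}} \frac{1}{l_i} (A_i - e_{l_i} M_i^T)^T (A_i - e_{l_i} M_i^T)$$
are the between-class and within-class scatter matrices, $A_i$ holds the $l_i$ rows of $A$ that belong to class $i$, $M_i = \frac{1}{l_i} A_i^T e_{l_i}$ is the mean of class $i$, and $e_{l_i}$ is an $l_i$-dimensional vector of ones.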
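A minimal NumPy sketch of the classical formulation, assuming the rows of `A` are data points and the labels are +1/-1. The function names, the small `ridge` term and the toy data are illustrative assumptions, not part of the slides. It relies on the standard two-class result that the quotient is maximized by a direction proportional to the inverse within-class scatter matrix applied to the difference of the class means.

```python
import numpy as np

def scatter_matrices(A, labels):
    """Between-class and within-class scatter for labels in {+1, -1}."""
    A_pos, A_neg = A[labels == 1], A[labels == -1]
    M_pos, M_neg = A_pos.mean(axis=0), A_neg.mean(axis=0)
    diff = M_pos - M_neg
    S_B = np.outer(diff, diff)
    # Each class contributes the scatter of its centred rows, divided by its size.
    S_W = sum((A_c - M_c).T @ (A_c - M_c) / len(A_c)
              for A_c, M_c in ((A_pos, M_pos), (A_neg, M_neg)))
    return S_B, S_W, diff

def lfd_direction(A, labels, ridge=1e-8):
    """Unit-norm Fisher direction; `ridge` (an assumption) keeps S_W invertible."""
    _, S_W, diff = scatter_matrices(A, labels)
    w = np.linalg.solve(S_W + ridge * np.eye(A.shape[1]), diff)
    return w / np.linalg.norm(w)

def fisher_quotient(w, A, labels):
    """J(w) = (w' S_B w) / (w' S_W w)."""
    S_B, S_W, _ = scatter_matrices(A, labels)
    return (w @ S_B @ w) / (w @ S_W @ w)

# Toy data (hypothetical): two Gaussian blobs in the plane.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal([2.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([-2.0, 0.0], 1.0, size=(50, 2))])
labels = np.array([1] * 50 + [-1] * 50)
w = lfd_direction(A, labels)
J_w = fisher_quotient(w, A, labels)
```

Any other direction should give a quotient no larger than `J_w`, which is an easy sanity check on the implementation.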

LFD Mathematical programming formulation
The LFD problem can also be formulated as a quadratic program (QP). The formulation includes a positive constant, introduced in (Mika et al., 2000), that addresses the problem of ill-conditioning of the estimated covariance matrices.

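Kernel Fisher Discriminant (KFD)
From the KKT conditions for the LFD mathematical programming formulation we obtain a relation that expresses $w$ as a linear combination of the training points (the rows of $A$). Substituting this relation into the formulation, the data enter only through inner products between training points, so we can apply the "kernel trick" and replace those inner products with a kernel function.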
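The equations of this slide are not preserved. As a hedged sketch of the argument, assume the QP has the least-squares form below. The symbols $d$ (target labels), $y$ (residuals), $\gamma$ (offset), $e$ (vector of ones), $\nu$ (the positive constant) and $v$ (multipliers) are chosen here for illustration and are not confirmed by the slides.

```latex
\begin{aligned}
&\min_{w,\gamma,y}\ \tfrac{\nu}{2}\,\|y\|^2 + \tfrac12\, w^T w
  \quad \text{s.t.} \quad y = d - (Aw - e\gamma) \\[4pt]
&L(w,\gamma,y,v) = \tfrac{\nu}{2}\,\|y\|^2 + \tfrac12\, w^T w
  - v^T\big(y - d + Aw - e\gamma\big) \\[4pt]
&\nabla_w L = w - A^T v = 0 \ \Rightarrow\ w = A^T v,
  \qquad \nabla_y L = \nu y - v = 0 \ \Rightarrow\ v = \nu y \\[4pt]
&Aw = AA^T v \ \longrightarrow\ K(A, A^T)\, v
\end{aligned}
```

Under that assumption, the last line is where the kernel trick enters: the Gram matrix $AA^T$ of inner products is replaced by a kernel matrix.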

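Kernel Functions
The nonlinear separating surface is given by a weighted sum of kernel evaluations between a point $x$ and the training points $A_i$, with coefficients $v_i$ and offset $\gamma$:
$$\sum_{i=1}^{m} v_i K(x, A_i) = \gamma$$
Commonly used kernels:
Gaussian: $K(x, y) = \exp\left(-\|x - y\|^2 / (2\sigma^2)\right)$
Polynomial: $K(x, y) = (x^T y + 1)^d$
It is well known that kernels are very powerful but difficult to choose and tune for a specific classification task.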
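The two kernels named above can be sketched in NumPy as follows. The parameterizations (a width `sigma` for the Gaussian kernel, a `degree` and a constant `offset` for the polynomial kernel) are common conventions assumed here, since the slide's own formulas are not preserved.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2)) for rows A_i, B_j."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def polynomial_kernel(A, B, degree=2, offset=1.0):
    """K[i, j] = (A_i . B_j + offset) ** degree."""
    return (A @ B.T + offset) ** degree

# Two hypothetical points in the plane.
pts = np.array([[1.0, 2.0], [3.0, 4.0]])
K_gauss = gaussian_kernel(pts, pts, sigma=1.0)
K_poly = polynomial_kernel(pts, pts, degree=2)
```

Each kernel has its own parameter to tune (`sigma`, `degree`), which is the tuning burden the slide refers to.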

Heterogeneous Kernels
Instead of using a predefined kernel, the problem of choosing a suitable kernel can be incorporated into the optimization problem. In this work we consider nonnegative linear combinations of kernels belonging to a given family of positive semidefinite kernels:
$$K = \sum_{j=1}^{k} a_j K_j, \qquad a_j \ge 0, \quad K_j \in S$$
The set $S = \{K_1, \dots, K_k\}$ can be seen as a predefined set of initial "guesses" of the kernel matrix. Note that S could contain very different kernel matrix models, e.g., linear, Gaussian, polynomial, all with different parameter values.
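The role of the nonnegativity constraint can be checked in one line. Writing the combination as $K = \sum_j a_j K_j$ (the symbols $a_j$, $K_j$ and the test vector $z$ are notation assumed here), positive semidefiniteness carries over from the members of S to the combination:

```latex
z^T K z \;=\; z^T \Big( \sum_{j=1}^{k} a_j K_j \Big) z
        \;=\; \sum_{j=1}^{k} a_j \, z^T K_j z \;\ge\; 0
\qquad \text{for all } z ,
```

because every $a_j \ge 0$ and every $z^T K_j z \ge 0$. With a negative coefficient the sum could become negative, so the combined matrix would no longer be guaranteed to be a valid kernel.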

Heterogeneous kernels KFD Formulation
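Instead of pre-selecting and tuning the kernel, we optimize the coefficients of the combination in order to obtain a PSD linear combination of elements of S. The KFD problem is then solved over both the classifier variables and the kernel coefficients, with the coefficients constrained to be nonnegative. For capacity control we add an extra regularization term for the coefficients. The resulting formulation is not jointly convex, because the kernel coefficients multiply the classifier variables; it is convex in either group of variables when the other is held fixed.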

Heterogeneous kernels KFD as a biconvex program (I).
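The optimization can be seen as a biconvex program: the variables split into two blocks, the classifier variables and the kernel coefficients, and the objective is convex in each block when the other block is held fixed.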

Heterogeneous kernels KFD as a biconvex program (II).
When the kernel coefficients are fixed, the problem becomes an unconstrained quadratic optimization problem in the classifier variables, which can be solved by solving a simple system of linear equations.
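As a hedged worked form of this step, suppose the fixed-coefficient problem is a regularized least-squares fit with combined kernel matrix $K$, expansion coefficients $v$, offset $\gamma$, targets $d$, a vector of ones $e$ and a positive constant $\nu$. All of these symbols, and the exact objective, are assumptions made for illustration. Setting the gradient to zero gives one linear system of size $m + 1$:

```latex
\begin{aligned}
G &= \begin{bmatrix} K & -e \end{bmatrix}, \qquad
z = \begin{bmatrix} v \\ \gamma \end{bmatrix}, \qquad
E = \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix} \\[4pt]
&\min_{z}\ \tfrac{\nu}{2}\,\|d - Gz\|^2 + \tfrac12\, z^T E z \\[4pt]
&\Rightarrow\ \Big( G^T G + \tfrac{1}{\nu} E \Big)\, z = G^T d
\end{aligned}
```

Under these assumptions the system matrix is nonsingular: if $(G^T G + E/\nu) z = 0$, then $Gz = 0$ and $v = 0$, which forces $e\gamma = 0$ and hence $\gamma = 0$.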

Heterogeneous kernels KFD as a biconvex program (III).
When the classifier variables are fixed, the problem becomes a constrained quadratic optimization problem (QP) in only k variables, the kernel coefficients.
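A hedged worked form of this second step: assume the classifier consists of expansion coefficients $v$ and an offset $\gamma$, the targets are $d$, the kernel coefficients are $a$, and the coefficient regularizer is $\tfrac{\mu}{2} a^T a$ with weight $\mu > 0$. These symbols and the exact objective are assumptions for illustration. With the classifier fixed, each kernel contributes one column and the problem has only $k$ unknowns:

```latex
\begin{aligned}
H &= \begin{bmatrix} K_1 v & \cdots & K_k v \end{bmatrix} \in \mathbb{R}^{m \times k},
\qquad r = d + e\gamma \\[4pt]
\min_{a \ge 0}\ & \tfrac{\nu}{2}\,\|r - Ha\|^2 + \tfrac{\mu}{2}\, a^T a
\;=\; \tfrac12\, a^T \big( \nu H^T H + \mu I_k \big)\, a \;-\; \nu\, r^T H a \;+\; \text{const}
\end{aligned}
```

The Hessian $\nu H^T H + \mu I_k$ is a $k \times k$ positive definite matrix, so the QP is strictly convex and its size does not grow with the number of training points.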

Heterogeneous KFD algorithm
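Input: the training data and labels, the kernel family S, and initial kernel coefficients.
Repeat until convergence:
    1. For the current kernel coefficients, calculate the combined kernel matrix. Solve the unconstrained convex QP for the classifier variables.
    2. For the current classifier variables, calculate the data of the second QP. Solve the second convex QP for the kernel coefficients.
Output: the final kernel coefficients and classifier variables.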
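The alternating scheme can be sketched in NumPy as below. This is not the authors' implementation: the objective (a regularized least-squares fit with weights `nu` and `mu`), the variable names, the uniform initial coefficients, the fixed number of sweeps and the projected-gradient solver for the small QP are all assumptions made for illustration.

```python
import numpy as np

def combine(Ks, a):
    """Nonnegative combination K(a) = sum_j a_j K_j."""
    return sum(a_j * K_j for a_j, K_j in zip(a, Ks))

def objective(Ks, a, v, gamma, d, nu, mu):
    y = d - (combine(Ks, a) @ v - gamma)            # residuals
    return 0.5 * nu * (y @ y) + 0.5 * (v @ v) + 0.5 * mu * (a @ a)

def solve_classifier(K, d, nu):
    """Fixed kernel: unconstrained quadratic, solved by one linear system."""
    m = len(d)
    G = np.hstack([K, -np.ones((m, 1))])            # G @ [v; gamma] = K v - gamma
    E = np.eye(m + 1)
    E[m, m] = 0.0                                   # the offset is not regularized
    z = np.linalg.solve(G.T @ G + E / nu, G.T @ d)
    return z[:m], z[m]

def solve_coefficients(Ks, v, gamma, d, nu, mu, a_start, iters=500):
    """Fixed classifier: QP in k variables with a >= 0 (projected gradient)."""
    H = np.column_stack([K_j @ v for K_j in Ks])    # m x k
    Q = nu * (H.T @ H) + mu * np.eye(len(Ks))
    c = nu * (H.T @ (d + gamma))
    step = 1.0 / np.linalg.eigvalsh(Q)[-1]          # 1 / largest eigenvalue
    a = a_start.copy()
    for _ in range(iters):
        a = np.maximum(a - step * (Q @ a - c), 0.0) # gradient step, then project
    return a

def heterogeneous_kfd(Ks, d, nu=1.0, mu=1.0, sweeps=10):
    a = np.ones(len(Ks)) / len(Ks)                  # assumed starting point
    history = []
    for _ in range(sweeps):
        v, gamma = solve_classifier(combine(Ks, a), d, nu)
        a = solve_coefficients(Ks, v, gamma, d, nu, mu, a)
        history.append(objective(Ks, a, v, gamma, d, nu, mu))
    return a, v, gamma, history

# Toy problem (hypothetical data): two blobs and three candidate kernels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 1.0, size=(30, 2)),
               rng.normal(-1.5, 1.0, size=(30, 2))])
d = np.array([1.0] * 30 + [-1.0] * 30)
sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
Ks = [X @ X.T, np.exp(-np.maximum(sq, 0.0) / 2.0), (X @ X.T + 1.0) ** 2]
a, v, gamma, history = heterogeneous_kfd(Ks, d)
```

Because each half-step can only lower the shared objective, `history` should be non-increasing, which is a convenient check when experimenting with other solvers for either step.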

Heterogeneous KFD algorithm: Convergence
The heterogeneous KFD algorithm can be seen as an alternating optimization (AO) problem (fuzzy c-means clustering is another example of an AO problem).
Our algorithm inherits the convergence properties and characteristics of AO problems:
    Local q-linear convergence; in practice it is very fast and converges in a few iterations.
    It can converge to a saddle point (a local minimizer in a subset of the variables), but this is very unlikely to happen.
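For reference, local q-linear convergence of iterates $x_t$ to a limit $x^*$ is usually stated as follows. The symbols $x_t$, $x^*$, $c$ and $\delta$ are generic notation assumed here, not taken from the slides.

```latex
\exists\, c \in [0, 1),\ \delta > 0 : \qquad
\|x_{t+1} - x^*\| \;\le\; c \, \|x_t - x^*\|
\quad \text{whenever } \|x_t - x^*\| < \delta .
```

In words, once an iterate is close enough to the limit, each further iteration shrinks the error by at least the constant factor $c$.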

Classification accuracy on five publicly available datasets
Dataset           m x n       Heterogeneous KFD   KFD + Kernel tuning   P-value
Ionosphere        351 x 34    94.7%               92.7%                 0.03
Housing           506 x 13    89.9%               89.4%                 0.40
Cleveland Heart   297 x 13    79.7%               82.2%                 0.04
Pima Indians      768 x 8     74.1%               74.4%                 0.7
BUPA Liver        345 x 6     70.9%               70.5%                 0.75

Performance time on five publicly available datasets
Dataset           m x n       Heterogeneous KFD (secs.)   KFD + Kernel tuning (secs.)
Ionosphere        351 x 34    55.3                        350.0
Housing           506 x 13    134.4                       336.9
Cleveland Heart   297 x 13    39.7                        109.2
Pima Indians      768 x 8     341.5                       598.4
BUPA Liver        345 x 6     48.2                        81.7

Siemens Colon CAD System
Colorectal cancer is the third most common cancer in both men and women. Recent studies (Yee et al., 2003) have estimated that in 2003 nearly 150,000 cases of colon and rectal cancer would be diagnosed in the US, and more than 57,000 people would die from the disease, accounting for about 10% of all cancer deaths.
A polyp is a small tumor that projects from the inner walls of the intestine or rectum. Early detection of polyps in the colon is critical because polyps can turn into cancerous tumors if they are not detected in the polyp stage.
The dataset consists of 300 candidates: 145 are labeled as polyps and 155 as non-polyps. Each candidate is represented by a vector of 14 features that have the most discriminating power according to a feature selection pre-processing stage.

Colon CAD
[Figure: CAD marker only]

Numerical results: Colon CAD dataset
The standard KFD performed in an average time of … seconds over ten runs, with an average test set correctness of 73.4%. The A-KFD performed in an average time of … seconds, with an average test set correctness of 72.4%. A paired t-test at the 95% confidence level indicates that there is no significant difference between the two methods on this dataset.

Conclusions
We proposed a simple procedure for generating a heterogeneous KFD classifier in which the kernel model is a linear combination of members of a potentially larger pre-defined family of heterogeneous kernels.
Our proposed algorithm only requires solving:
    a simple nonsingular system of linear equations whose size is the number of training points m;
    a quadratic programming problem that is usually very small, since its size is the predefined number of kernels in the kernel family (5 in our experiments).
Empirical results show that the proposed method, compared to the standard KFD where the kernel is selected by a cross-validation tuning procedure, is several times faster with no significant impact on generalization performance.

Future Directions
Extension to regularized networks: SVM, LS-SVM, KFD
Generalized convergence analysis
Extension to transduction
