SISTA seminar Feb 28, 2002 Preoperative Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines C. Lu 1, T. Van Gestel 1,


1 Preoperative Prediction of Malignancy of Ovarian Tumors Using Least Squares Support Vector Machines
C. Lu 1, T. Van Gestel 1, J. A. K. Suykens 1, S. Van Huffel 1, D. Timmerman 2, I. Vergote 2
1 Department of Electrical Engineering, Katholieke Universiteit Leuven, Leuven, Belgium
2 Department of Obstetrics and Gynecology, University Hospitals Leuven, Leuven, Belgium

2 Overview
- Introduction
- Data exploration
- LS-SVM and Bayesian evidence framework
  - LS-SVM classifier
  - Bayesian evidence framework
  - Input selection
  - Sparse approximation
- Model building and model evaluation
- Conclusions

3 Introduction: Problem
- Ovarian masses are a common problem in gynecology (1/70 women).
- Ovarian cancer has a high mortality rate, and early detection of ovarian cancer is difficult.
- The treatment and management of the different types of ovarian tumors differ greatly.
- Goal: develop a reliable diagnostic tool to discriminate preoperatively between benign and malignant tumors, and so assist clinicians in choosing the appropriate treatment.
Techniques for preoperative evaluation:
- Serum tumor marker: CA125 blood test
- Transvaginal ultrasonography
- Color Doppler imaging and blood flow indexing

4 Introduction: Attempts to automate the diagnosis
- Risk of Malignancy Index (RMI) (Jacobs et al.): $\mathrm{RMI} = \mathrm{score}_{\mathrm{morph}} \times \mathrm{score}_{\mathrm{meno}} \times \mathrm{CA125}$
- Mathematical models: logistic regression, artificial neural networks, support vector machines, least squares SVMs within a Bayesian framework
- Bayesian belief networks
- Hybrid methods
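
To show how the RMI combines its three factors, here is a minimal Python sketch. The slide gives only the product; the point values in `morph_score` and `meno_score` follow the commonly cited Jacobs et al. scheme (0, 1 or 3 for the ultrasound findings; 1 or 3 for menopausal status) and should be treated as assumptions. All function names are hypothetical.

```python
def morph_score(n_ultrasound_features: int) -> int:
    """Assumed morphology score: 0 for no suspicious features, 1 for one, 3 for two or more."""
    if n_ultrasound_features <= 0:
        return 0
    return 1 if n_ultrasound_features == 1 else 3


def meno_score(postmenopausal: bool) -> int:
    """Assumed menopausal score: 3 if postmenopausal, 1 otherwise."""
    return 3 if postmenopausal else 1


def rmi(n_ultrasound_features: int, postmenopausal: bool, ca125: float) -> float:
    """Risk of Malignancy Index = score_morph * score_meno * CA125."""
    return morph_score(n_ultrasound_features) * meno_score(postmenopausal) * ca125


# Postmenopausal patient, two suspicious ultrasound features, CA125 = 40.
print(rmi(2, True, 40.0))  # 3 * 3 * 40 = 360.0
```

Because the index is a plain product, a zero morphology score gives RMI = 0 regardless of the CA125 level.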

5 Introduction: Data
- Patient data collected at University Hospitals Leuven, Belgium, 1994~: 425 records, 25 features.
- 291 benign tumors and 134 (32%) malignant tumors.

6 Introduction: Development process
- Exploratory data analysis: data preprocessing, univariate analysis, PCA, factor analysis…
- Input selection
- Model training
- Model evaluation
Performance measures: receiver operating characteristic (ROC) analysis
- Goal: high sensitivity for malignancy with a low false positive rate, and a probability of malignancy for each individual patient.
- ROC curves are constructed by plotting the sensitivity against 1 - specificity (the false positive rate) as the probability cutoff level varies. They visualize the trade-off between the sensitivity and the specificity of a test.
- The area under the ROC curve (AUC) is the probability that the classifier assigns a higher probability of malignancy to a randomly chosen malignant tumor than to a randomly chosen benign tumor.
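
The ROC construction and the probabilistic reading of the AUC can be checked against each other with a short sketch. It assumes a case is called malignant when its predicted probability is at least the cutoff; the function names are illustrative only. With every distinct score used as a cutoff, the trapezoidal area under the curve equals the pairwise ranking probability (ties counted as 1/2).

```python
import numpy as np


def roc_curve(scores, labels):
    """Return (fpr, tpr), sweeping the cutoff over all distinct scores from high to low."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)        # True = malignant
    fpr, tpr = [0.0], [0.0]
    for cutoff in np.unique(scores)[::-1]:
        pred = scores >= cutoff                    # predicted malignant
        tpr.append(np.mean(pred[labels]))          # sensitivity
        fpr.append(np.mean(pred[~labels]))         # 1 - specificity
    return np.array(fpr), np.array(tpr)


def auc_trapezoid(fpr, tpr):
    """Area under the piecewise-linear ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def auc_rank(scores, labels):
    """P(score of a random malignant case > score of a random benign case), ties = 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
truth = [1, 1, 0, 1, 0, 0, 1, 0]
fpr, tpr = roc_curve(probs, truth)
print(auc_trapezoid(fpr, tpr), auc_rank(probs, truth))  # 0.75 0.75
```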

7 Data exploration: Univariate analysis
- Preprocessing: e.g. CA_125 -> log(CA_125); color_score {1, 2, 3, 4} -> 3 design variables with values {0, 1}.
- Descriptive statistics, histograms…
- Variable groups: demographic, serum marker, color Doppler imaging and morphologic variables.
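
The two preprocessing steps can be sketched as follows. Two assumptions are made here: the natural logarithm is used for CA_125 (the slide does not give the base), and color score 1 is the reference level, so the three design variables are `colsc2`, `colsc3` and `colsc4`. Only `l_ca125`, `colsc3` and `colsc4` appear later in the deck; `colsc2` and the function name `preprocess` are hypothetical.

```python
import numpy as np


def preprocess(ca125, color_score):
    """Log-transform CA_125 and dummy-code the 4-level color score into 3 binary variables."""
    ca125 = np.asarray(ca125, dtype=float)
    color_score = np.asarray(color_score, dtype=int)
    l_ca125 = np.log(ca125)                       # natural log (assumed base)
    # Level 1 is the reference level: all three design variables are 0 for it.
    design = {f"colsc{k}": (color_score == k).astype(int) for k in (2, 3, 4)}
    return {"l_ca125": l_ca125, **design}


feats = preprocess([35.0, 400.0, 12.0], [1, 3, 4])
print(feats["colsc3"], feats["colsc4"])  # [0 1 0] [0 0 1]
```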

8 Data exploration: Multivariate analysis
- Factor analysis
- Biplots
Fig. Biplot of the ovarian tumor data. The observations are plotted as points (o: benign, x: malignant); the variables are plotted as vectors from the origin. The biplot gives
- a visualization of the correlation between the variables;
- a visualization of the relations between the variables and the clusters.
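
A biplot needs two sets of coordinates: one point per observation and one vector per variable. The sketch below computes them with a PCA biplot via the SVD of the standardized data; this swaps in PCA for the factor-analysis biplot on the slide, omits the plotting itself, and uses random placeholder data rather than the ovarian tumor data.

```python
import numpy as np


def biplot_coordinates(X):
    """Return 2-D observation scores (points) and variable loadings (vectors from the origin)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :2] * s[:2]                     # observations projected on the first 2 PCs
    loadings = Vt[:2].T                           # one row per variable
    return scores, loadings


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                      # placeholder data: 50 observations, 4 variables
scores, loadings = biplot_coordinates(X)
```

Variables whose loading vectors point in similar directions are positively correlated, which is what the biplot on this slide is used to read off.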

9 LS-SVM & Bayesian framework: LS-SVM
- Kernel-based method: maps the n-dimensional input vector into a higher-dimensional feature space in which a linear algorithm can be applied.
- The learning problem is formulated in the feature space and solved in the dual space. By Mercer's theorem, a positive definite kernel $K(\cdot,\cdot)$ can be written as $K(x, z) = \varphi(x)^T \varphi(z)$.
- RBF kernel: $K(x, z) = \exp(-\|x - z\|_2^2 / \sigma^2)$
- Linear kernel: $K(x, z) = x^T z$
- Attractive features: good generalization performance, existence of a unique solution, foundation in statistical learning theory.
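
The two kernels can be written directly in NumPy. To make Mercer's identity $K(x, z) = \varphi(x)^T \varphi(z)$ concrete, the sketch also includes a homogeneous quadratic kernel $(x^T z)^2$, which is not used in the slides but has a small explicit feature map $\varphi(x) = \mathrm{vec}(x x^T)$. The RBF form follows the $\exp(-\|x - z\|^2/\sigma^2)$ convention above.

```python
import numpy as np


def linear_kernel(X, Z):
    """K(x, z) = x^T z for all pairs of rows."""
    return X @ Z.T


def rbf_kernel(X, Z, sigma):
    """K(x, z) = exp(-||x - z||^2 / sigma^2) for all pairs of rows."""
    sq_dist = (np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :]
               - 2.0 * X @ Z.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma**2)   # clip round-off below 0


def poly2_feature_map(x):
    """Explicit feature map of the quadratic kernel (x^T z)^2: phi(x) = vec(x x^T)."""
    return np.outer(x, x).ravel()


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
K = rbf_kernel(X, X, sigma=1.0)
print(np.linalg.eigvalsh(K).min() > -1e-8)  # positive semidefinite up to round-off
```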

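10 LS-SVM: LS-SVM classifier (Suykens & Vandewalle, 1999)
Given $\{(x_i, y_i)\}_{i=1,\dots,N}$ with input data $x_i \in \mathbb{R}^p$ and corresponding output data $y_i \in \{-1, 1\}$, the following model is taken:
$y(x) = \operatorname{sign}[w^T \varphi(x) + b]$,
where the input data are projected by $x \mapsto \varphi(x)$ into a higher-dimensional feature space. One considers the following optimization problem:
$\min_{w, b, e} J(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{i=1}^{N} e_i^2$
subject to $y_i [w^T \varphi(x_i) + b] = 1 - e_i$, $i = 1, \dots, N$.
The Lagrangian is defined as
$\mathcal{L}(w, b, e; \alpha) = J(w, e) - \sum_{i=1}^{N} \alpha_i \{ y_i [w^T \varphi(x_i) + b] - 1 + e_i \}$,
where $\alpha_i$ are Lagrange multipliers.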

11 LS-SVM: LS-SVM classifier (cont.)
Taking the Karush-Kuhn-Tucker conditions for optimality gives a set of linear equations. Eliminating $w$ and $e$, the solution is obtained from
$\begin{bmatrix} 0 & Y^T \\ Y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}$
with $Y = [y_1; \dots; y_N]$, $1_v = [1; \dots; 1]$, $\alpha = [\alpha_1; \dots; \alpha_N]$, and $\Omega_{ij} = y_i y_j \varphi(x_i)^T \varphi(x_j) = y_i y_j K(x_i, x_j)$ for $i, j = 1, \dots, N$.
The resulting LS-SVM model for classification is
$y(x) = \operatorname{sign}\left[ \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right]$.
Some parameters need to be tuned:
- the regularization parameter $\gamma$, which determines the trade-off between minimizing the training errors and minimizing the model complexity;
- the kernel parameters, e.g. $\sigma$ for an RBF kernel.
Popular ways of choosing the hyperparameters: cross-validation, or an upper bound on the generalization error. Our approach: the Bayesian method.
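
The dual linear system above can be solved directly with NumPy. This is a minimal sketch with an RBF kernel and fixed, hand-picked values of $\gamma$ and $\sigma$ (no Bayesian tuning); the function names and the two-blob toy data are hypothetical. It also checks the KKT relation $\alpha_i = \gamma e_i$ with $e_i = 1 - y_i[\sum_j \alpha_j y_j K(x_i, x_j) + b]$, which follows from the same optimality conditions.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvm_train(X, y, gamma, sigma):
    """Solve [[0, Y^T], [Y, Omega + I/gamma]] [b; alpha] = [0; 1_v]."""
    n = len(y)
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)     # Omega_ij = y_i y_j K(x_i, x_j)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                                # alpha, b


def lssvm_latent(X_train, y, alpha, b, X_new, sigma):
    """sum_i alpha_i y_i K(x, x_i) + b; the predicted class is its sign."""
    return rbf_kernel(X_new, X_train, sigma) @ (alpha * y) + b


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])
alpha, b = lssvm_train(X, y, gamma=10.0, sigma=1.0)
f = lssvm_latent(X, y, alpha, b, X, sigma=1.0)
pred = np.sign(f)
```

A dense solve is fine for a few hundred patients; it scales as $O(N^3)$, so larger data sets would call for iterative solvers.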

12 Bayesian evidence framework (MacKay, 1993)
Probability theory and Occam's razor
- Bayesian probability theory provides a unifying framework for data modeling.
- Occam's razor is needed for model comparison.
Each model $H_i$ is assumed to have:
- a vector of parameters $w$;
- a prior distribution $P(w \mid H_i)$;
- a set of probability distributions, one for each value of $w$, defining the predictions $P(D \mid w, H_i)$ that the model makes about the data $D$.

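13 Bayesian evidence framework: Probability theory and Occam's razor
(1) Model fitting: assuming model $H_i$, infer its parameters with Bayes' rule,
$P(w \mid D, H_i) = \frac{P(D \mid w, H_i)\, P(w \mid H_i)}{P(D \mid H_i)}$ (posterior = likelihood × prior / evidence).
Evaluate the most probable values $w_{MP}$ and summarize the posterior distribution by $w_{MP}$ and error bars, obtained from the Hessian $A = -\nabla\nabla \log P(w \mid D, H_i)$ evaluated at $w_{MP}$. The posterior can then be locally approximated as a Gaussian with covariance matrix $A^{-1}$.
(2) Model comparison: $P(H_i \mid D) \propto P(D \mid H_i)\, P(H_i)$. Assuming equal priors $P(H_i)$ for the alternative models, the models $H_i$ are ranked by evaluating the evidence $P(D \mid H_i)$.
Evaluating the evidence: if the posterior is well approximated by a Gaussian, then
$P(D \mid H_i) \simeq P(D \mid w_{MP}, H_i) \times P(w_{MP} \mid H_i) \det^{-1/2}(A / 2\pi)$ (best-fit likelihood × Occam factor).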
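
The Gaussian (Laplace) approximation of the evidence can be sanity-checked on a model where it is exact: Bayesian linear regression $y = Xw + \text{noise}$ with prior $w \sim N(0, I/\mu)$ and noise precision $\zeta$. This toy model and the function names are assumptions chosen for illustration, not the LS-SVM case itself. Because the log posterior is quadratic here, the best-fit likelihood times the Occam factor must equal the exact marginal likelihood $y \sim N(0, XX^T/\mu + I/\zeta)$.

```python
import numpy as np


def log_gauss_density(y, cov):
    """log N(y; 0, cov)."""
    n = y.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (y @ np.linalg.solve(cov, y) + logdet + n * np.log(2.0 * np.pi))


def laplace_log_evidence(X, y, mu, zeta):
    """log P(D|H) ~= log P(D|w_MP) + log P(w_MP) - 0.5 * log det(A / 2 pi)."""
    n, k = X.shape
    A = mu * np.eye(k) + zeta * X.T @ X                    # Hessian of -log posterior
    w_mp = zeta * np.linalg.solve(A, X.T @ y)               # most probable weights
    r = y - X @ w_mp
    log_lik = 0.5 * n * np.log(zeta / (2.0 * np.pi)) - 0.5 * zeta * r @ r
    log_prior = 0.5 * k * np.log(mu / (2.0 * np.pi)) - 0.5 * mu * w_mp @ w_mp
    _, logdet = np.linalg.slogdet(A / (2.0 * np.pi))
    return log_lik + log_prior - 0.5 * logdet                # best fit + log Occam factor


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=20)
mu, zeta = 1.0, 10.0
approx = laplace_log_evidence(X, y, mu, zeta)
exact = log_gauss_density(y, X @ X.T / mu + np.eye(20) / zeta)
print(approx, exact)  # equal up to round-off for this Gaussian model
```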

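14 Bayesian evidence framework for LS-SVM
A Bayesian framework for LS-SVM classifiers (Van Gestel and Suykens, 2001):
- Starting from the feature space formulation, analytic expressions are obtained in the dual space on the three levels of Bayesian inference.
- Posterior class probabilities are obtained by marginalizing over the model parameters.
For a classification problem with binary targets $y_i = \pm 1$, the LS-SVM cost function can also be formulated as
$\min_{w, b} J(w, b) = \mu E_W + \zeta E_D$
with the regularization term $E_W = \frac{1}{2} w^T w$ and the sum-of-squares error $E_D = \frac{1}{2} \sum_{i=1}^{N} e_i^2$, subject to $e_i = y_i - (w^T \varphi(x_i) + b)$ (because $y_i^2 = 1$, this gives the same $e_i^2$ as the classifier constraint $y_i [w^T \varphi(x_i) + b] = 1 - e_i$), while the amount of regularization is determined by $\gamma = \zeta / \mu$.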

15 Bayesian evidence framework for LS-SVM: Probabilistic interpretation of the LS-SVM classifier (Level 1)
Applying Bayes' rule, the first level of inference is obtained:
$P(w, b \mid D, \mu, \zeta, H) = \frac{P(D \mid w, b, \mu, \zeta, H)\, P(w, b \mid \mu, \zeta, H)}{P(D \mid \mu, \zeta, H)}$
- Assume the data points are independent and the target has Gaussian noise $e_i$ with noise variance $1/\zeta$.
- Assume separate Gaussian priors for $w$ and $b$: $\sigma_w^2 = 1/\mu$ and $\sigma_b \to \infty$ (uniform distribution for $b$).
The posterior probability of the model parameters $w$ and $b$ is then given by
$P(w, b \mid D, \mu, \zeta, H) \propto \exp(-\mu E_W - \zeta E_D)$.
Its maximum, $w_{MP}$ and $b_{MP}$, is obtained by solving a standard LS-SVM in the dual space.

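16 Bayesian evidence framework for LS-SVM: Posterior class probability for the LS-SVM classifier (Level 1)
- The class probability is computed from the conditional probabilities $p(x \mid y = \pm 1, D, H)$, which are calculated in the dual space.
- Marginalizing over $w$ yields a Gaussian-distributed $e_\pm$ with mean $m_{e_\pm}$ and variance $\sigma_{e_\pm}^2$, from which the conditional probability $p(x \mid y = \pm 1, D, H)$ is obtained.
- Bayes' rule then incorporates the prior class probabilities (or misclassification costs):
$P(y = +1 \mid x, D, H) = \frac{P(y = +1)\, p(x \mid y = +1, D, H)}{P(y = +1)\, p(x \mid y = +1, D, H) + P(y = -1)\, p(x \mid y = -1, D, H)}$
- In our experiments, the priors were $P(y = +1) = 2/3$ and $P(y = -1) = 1/3$.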
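
The last step, combining class-conditional densities with the prior class probabilities, is ordinary Bayes' rule. In this sketch, each class-conditional density is simplified to a 1-D Gaussian in a latent score $z$ with hypothetical means and variances. That is an illustrative assumption, not the exact dual-space expression for $p(x \mid y = \pm 1, D, H)$ in the Van Gestel and Suykens framework. The default prior of 2/3 matches the slide.

```python
import math


def gaussian_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_prob_pos(z, m_pos, v_pos, m_neg, v_neg, prior_pos=2.0 / 3.0):
    """P(y=+1 | z) from Gaussian class-conditional densities and a class prior."""
    lik_pos = gaussian_pdf(z, m_pos, v_pos)       # p(z | y = +1), simplified model
    lik_neg = gaussian_pdf(z, m_neg, v_neg)       # p(z | y = -1), simplified model
    num = prior_pos * lik_pos
    return num / (num + (1.0 - prior_pos) * lik_neg)


# Halfway between symmetric class densities the likelihoods cancel,
# so the posterior falls back to the prior P(y = +1) = 2/3.
print(posterior_prob_pos(0.0, 1.0, 0.5, -1.0, 0.5))
```

Changing `prior_pos` shifts every posterior in the same direction, which is how prior class probabilities or misclassification costs move the operating point.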

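17 Bayesian evidence framework for LS-SVM: Inference of hyperparameters (Level 2)
Applying Bayes' rule, the second level of inference is obtained:
$P(\log\mu, \log\zeta \mid D, H) = \frac{P(D \mid \log\mu, \log\zeta, H)\, P(\log\mu, \log\zeta \mid H)}{P(D \mid H)}$
- Assume a uniform distribution in $\log\mu$ and $\log\zeta$, so the hyperparameters are inferred from the evidence of level 1, $P(D \mid \log\mu, \log\zeta, H)$.
- This evidence is evaluated in the dual space through an eigenvalue problem on the kernel matrix.
- A practical way to find $\mu_{MP}$ and $\zeta_{MP}$ is to first solve a scalar minimization problem in $\gamma = \zeta / \mu$.
- The number of effective parameters $\gamma_{\text{eff}}$ follows from the same eigenvalues.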
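
Here is a sketch of the eigenvalue step and the effective number of parameters. The exact expressions are assumptions based on my reading of the LS-SVM Bayesian framework, not taken from these slides: the eigenvalues $\lambda_i$ belong to the centred kernel matrix $M K M$ with $K_{ij} = K(x_i, x_j)$ and $M = I - 1_v 1_v^T / N$, and $\gamma_{\text{eff}} = 1 + \sum_i \zeta\lambda_i / (\mu + \zeta\lambda_i)$, where the leading 1 counts the bias $b$. Check both against the original paper before relying on them.

```python
import numpy as np


def rbf_kernel_matrix(X, sigma):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / sigma**2)


def centred_eigenvalues(K):
    """Eigenvalues of M K M, M = I - 1 1^T / N (assumed form of the eigenvalue problem)."""
    n = K.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    lam = np.linalg.eigvalsh(M @ K @ M)
    return np.clip(lam, 0.0, None)                 # remove tiny negative round-off


def effective_parameters(lam, mu, zeta):
    """Assumed gamma_eff = 1 + sum_i zeta*lam_i / (mu + zeta*lam_i)."""
    return 1.0 + float(np.sum(zeta * lam / (mu + zeta * lam)))


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
lam = centred_eigenvalues(rbf_kernel_matrix(X, sigma=1.0))
for gamma in (0.1, 1.0, 10.0):                     # gamma = zeta / mu with mu = 1
    print(gamma, effective_parameters(lam, mu=1.0, zeta=gamma))
```

Each eigen-direction contributes between 0 and 1, so $\gamma_{\text{eff}}$ grows from 1 (heavy regularization) towards $N$ as $\gamma = \zeta/\mu$ increases.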

18 Bayesian evidence framework for LS-SVM: Bayesian model comparison (Level 3)
Applying Bayes' rule, the third level of inference is obtained:
$P(H_i \mid D) \propto P(D \mid H_i)\, P(H_i)$
- Assume a uniform prior distribution over the models $H_i$.
- Models are then ranked by their evidence $P(D \mid H_i)$, the normalizing constant of level 2.

19 Bayesian evidence framework for LS-SVM: design
- Preprocess the data: normalize the training data to zero mean and unit variance. The test set is normalized with the same means and variances as the training set.
- Hyperparameter tuning:
  - Select the model $H_i$ by choosing a kernel type $K_i$ and kernel parameter, e.g. $\sigma$ in RBF kernels.
  - Estimate the optimal regularization parameter $\gamma$ for model $H_i$ on the second level of inference. The corresponding $\mu_{MP}$, $\zeta_{MP}$ and the number of effective parameters $\gamma_{\text{eff}}$ are obtained as well.
  - Compute the model evidence $P(D \mid H_i)$ at the third level of inference.
  - For a kernel $K_i$ with tuning parameters, refine the tuning parameters (e.g. $\sigma$) so that a higher model evidence $P(D \mid H_i)$ is obtained.
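
The preprocessing rule ("the test set follows the same normalization as the training set") means the means and standard deviations are estimated on the training data only and then reused. A minimal sketch with hypothetical function names and random placeholder data:

```python
import numpy as np


def fit_standardizer(X_train):
    """Per-variable mean and standard deviation, estimated on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                            # guard against constant variables
    return mean, std


def apply_standardizer(X, mean, std):
    return (X - mean) / std


rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(100, 3))
X_test = rng.normal(5.0, 2.0, size=(20, 3))
mean, std = fit_standardizer(X_train)
Z_train = apply_standardizer(X_train, mean, std)  # zero mean, unit variance
Z_test = apply_standardizer(X_test, mean, std)    # training statistics reused
```

Fitting the statistics on the test set as well would leak information from the evaluation data into the model.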

20 Bayesian evidence framework for LS-SVM: design
Input selection under the Bayesian evidence framework
- Given a certain type of kernel, perform a forward selection (greedy search).
- Starting from zero variables, at each iteration step choose the variable that gives the greatest increase in the current model evidence.
- Stop the selection when adding any remaining variable can no longer increase the model evidence.
- Using an RBF kernel, 10 variables were selected based on the training set (data from the first 265 treated patients): l_ca125, pap, sol, colsc3, bilat, meno, asc, shadows, colsc4, irreg
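
The greedy search itself is independent of how the evidence is computed. In this sketch the evidence is passed in as a function of the current variable subset. In the real procedure that function would train an LS-SVM and return its model evidence; here `toy_evidence` and its per-variable gains are purely hypothetical stand-ins.

```python
from typing import Callable, Sequence


def forward_selection(candidates: Sequence[str],
                      evidence: Callable[[list], float]) -> list:
    """Greedily add the variable that most increases the evidence; stop when none does."""
    selected: list = []
    remaining = list(candidates)
    best = evidence(selected)                     # evidence of the empty model
    while remaining:
        scores = {v: evidence(selected + [v]) for v in remaining}
        best_var = max(scores, key=scores.get)
        if scores[best_var] <= best:              # no remaining variable helps
            break
        selected.append(best_var)
        remaining.remove(best_var)
        best = scores[best_var]
    return selected


GAINS = {"l_ca125": 3.0, "pap": 2.0, "noise": 0.1}   # hypothetical per-variable gains


def toy_evidence(subset):
    """Stand-in for the LS-SVM model evidence: gains minus a complexity penalty."""
    return sum(GAINS[v] for v in subset) - 0.5 * len(subset)


print(forward_selection(list(GAINS), toy_evidence))  # ['l_ca125', 'pap']
```

Each step evaluates one model per remaining variable, so selecting $k$ variables out of $p$ costs on the order of $k \cdot p$ evidence evaluations rather than the $2^p$ an exhaustive search would need.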

21 Bayesian evidence framework for LS-SVM: design
Sparse approximation
- Because of the 2-norm in the cost function, LS-SVMs lose the sparseness of standard SVMs.
- Sparseness can be imposed on an LS-SVM by a pruning procedure based on the support values $\alpha_i = \gamma e_i$.
- We propose to prune the data points that have negative support values. Since $e_i = 1 - y_i [w^T \varphi(x_i) + b]$, a negative $\alpha_i$ marks a point classified correctly with $y_i [w^T \varphi(x_i) + b] > 1$. Intuitively, pruning these easy examples focuses the model on the harder cases that lie around the decision boundary.
- Prune the data with negative $\alpha_i$ iteratively; the hyperparameters are retuned several times on the reduced data set using the Bayesian evidence framework.
- Stop when no support values are negative.
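
The pruning loop can be sketched on top of a plain LS-SVM solve. As a simplifying assumption, $\gamma$ and $\sigma$ stay fixed in this sketch, whereas the procedure above retunes them with the evidence framework on the reduced data. Function names and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)


def lssvm_train(X, y, gamma, sigma):
    """Solve the LS-SVM dual system; returns (alpha, b)."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
    return sol[1:], sol[0]


def prune_negative_support(X, y, gamma, sigma):
    """Repeatedly drop points with alpha_i < 0 and retrain (hyperparameters kept fixed)."""
    kept = np.arange(len(y))
    while True:
        alpha, b = lssvm_train(X[kept], y[kept], gamma, sigma)
        negative = alpha < 0
        if not negative.any():
            return kept, alpha, b
        kept = kept[~negative]


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 0.7, (30, 2)), rng.normal(1.0, 0.7, (30, 2))])
y = np.concatenate([-np.ones(30), np.ones(30)])
kept, alpha, b = prune_negative_support(X, y, gamma=10.0, sigma=1.0)
print(len(kept), "of", len(y), "points kept")
```

The loop cannot empty a class. From the dual system, $\sum_i \alpha_i = \alpha^T(\Omega + \gamma^{-1} I)\alpha > 0$, and $\sum_i \alpha_i y_i = 0$ splits that positive total equally between the two classes, so each class always keeps at least one positive support value.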

22 Model evaluation: Temporal validation
- Training set: data from the first 265 treated patients
- Test set: data from the 160 most recently treated patients
Figures: ROC curves on the training set and on the test set for LSSVMrbf, LSSVMlin, LR and RMI.
Table: Performance on the test set (* probability cutoff values: 0.4 and 0.3).
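
Reporting performance at a probability cutoff such as 0.4 or 0.3 means turning predicted probabilities into decisions and counting hits. A minimal sketch, assuming a tumor is called malignant when its probability is at least the cutoff; the function name and the six example cases are hypothetical.

```python
import numpy as np


def sens_spec(prob_malignant, is_malignant, cutoff):
    """Sensitivity and specificity when 'malignant' means prob >= cutoff."""
    pred = np.asarray(prob_malignant, dtype=float) >= cutoff
    truth = np.asarray(is_malignant, dtype=bool)
    sensitivity = np.sum(pred & truth) / truth.sum()
    specificity = np.sum(~pred & ~truth) / (~truth).sum()
    return float(sensitivity), float(specificity)


probs = [0.9, 0.35, 0.2, 0.45, 0.1, 0.6]
truth = [1, 1, 0, 0, 0, 1]
print(sens_spec(probs, truth, 0.4))  # (0.666..., 0.666...)
print(sens_spec(probs, truth, 0.3))  # (1.0, 0.666...)
```

Lowering the cutoff can only raise sensitivity and lower specificity, which is why cutoffs below 0.5 suit the goal of high sensitivity for malignancy.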

23 Model evaluation: Randomized cross-validation
- Randomly split the data into a training set (n = 265) and a test set (n = 160).
- Stratified: #benign : #malignant ≈ 2:1 in both the training set and the test set.
- Repeat 30 times.
Table: Averaged performance over the 30 validation runs (* probability cutoff values: 0.5 and 0.4).
Figure: Expected ROC curve on validation.
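
A stratified random split keeps the benign/malignant ratio equal in the training and test sets. The sketch below uses the deck's class counts (291 benign coded 0, 134 malignant coded 1) and a test size of 160. The label coding, the seed, and the rule that the last class absorbs any rounding remainder are assumptions.

```python
import numpy as np


def stratified_split(labels, n_test, rng):
    """Draw a test set whose class proportions match the full data set."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    test_idx, remaining = [], n_test
    for k, cls in enumerate(classes):
        cls_idx = np.flatnonzero(labels == cls)
        if k == len(classes) - 1:
            n_cls = remaining                     # last class takes the rounding remainder
        else:
            n_cls = int(round(n_test * len(cls_idx) / len(labels)))
        remaining -= n_cls
        test_idx.extend(rng.choice(cls_idx, size=n_cls, replace=False))
    test_idx = np.array(sorted(test_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx


labels = np.array([0] * 291 + [1] * 134)          # 0 = benign, 1 = malignant
rng = np.random.default_rng(0)
splits = [stratified_split(labels, 160, rng) for _ in range(30)]
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx), labels[test_idx].sum())  # 265 160 50
```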

24 Conclusions
Summary
- Exploratory data analysis helps to analyze the data set.
- Under the Bayesian evidence framework, the regularization and kernel parameters of the LS-SVM classifier can be chosen in a unified way, without the need for a separate validation set.
- A forward input selection procedure that tries to maximize the model evidence was able to identify the subset of important variables for model building.
- A sparse approximation can further improve the generalization performance of the LS-SVM classifiers.
- LS-SVMs have the potential to give reliable preoperative predictions of the malignancy of ovarian tumors.
Future work
- A larger-scale validation is still needed.
- A hybrid methodology, e.g. combining the Bayesian network with the learning of the LS-SVM, might be more promising.

