
Cancer classification by Regularized Least Square Classifiers

Annarita D’Addabbo a, Rosalia Maglietta a, Sabino Liuni b, Graziano Pesole b,c and Nicola Ancona a

a) Istituto di Studi sui Sistemi Intelligenti per l’Automazione, CNR, Via Amendola 122/D-I, 70126 Bari, Italy
b) Istituto di Tecnologie Biomediche-Sezione di Bari, CNR, Via Amendola 122/D, 70126 Bari, Italy
c) Dipartimento Scienze Biomolecolari e Biotecnologie, Università di Milano, Via Celoria 26, 20133 Milano, Italy

Abstract

Support Vector Machines (SVM) [1] are the state-of-the-art supervised learning technique for cancer classification. Other machine learning approaches, such as Regularized Least Squares (RLS) [2] classifiers, may represent a highly suitable alternative because of their simplicity and reliability. We compared the performance of RLS classifiers with that of SVM on three benchmark data sets, also varying the number of selected genes and the gene selection strategy. We show that RLS classifiers perform comparably to SVM classifiers in terms of the leave-one-out (LOO) error. The main advantage of RLS machines is that solving a classification problem only requires solving a linear system whose order equals the number of training examples. Moreover, RLS machines yield an exact measure of the LOO error with just one training.

Benchmark data set description

Leukemia data set [3]. 25 examples of Acute Myeloid Leukemia (AML) vs 47 examples of Acute Lymphoblastic Leukemia (ALL), divided into a training set and a test set. Each sample consists of 7129 human gene expression levels (see www.genome.wi.mit.edu/MPR).

Colon data set [4]. 40 examples of tumor colon tissue vs 22 examples of normal colon tissue. Each sample consists of 2000 human gene expression levels (see www.molbio.princeton.edu/colondata).

Multi-cancer data set [5]. 190 examples of cancer tissue, spanning 14 common tumor types, vs 90 examples of normal tissue. Each sample consists of the expression levels of 16063 genes (see www.genome.wi.mit.edu/MPR/GCM.html).
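The claim that the exact LOO error follows from a single training can be illustrated with a short sketch. It assumes the standard RLS formulation without a bias term, in which the coefficients c solve (K + lambda I) c = y for a kernel matrix K, and it uses the well-known identity that the i-th leave-one-out residual equals c_i divided by the i-th diagonal entry of (K + lambda I)^-1. The poster does not spell out its formulas, so the function names, the linear kernel, and the value of `lam` below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rls_fit(K, y, lam):
    """Solve (K + lam * I) c = y; return coefficients and the inverse matrix."""
    n = K.shape[0]
    G_inv = np.linalg.inv(K + lam * np.eye(n))
    return G_inv @ y, G_inv


def rls_loo_predictions(K, y, lam):
    """Leave-one-out predictions obtained from one training on all examples.

    Uses the identity: y_i - f_without_i(x_i) = c_i / (G^-1)_ii,
    where G = K + lam * I (assumed formulation, no bias term).
    """
    c, G_inv = rls_fit(K, y, lam)
    return y - c / np.diag(G_inv)


def rls_loo_errors(K, y, lam):
    """Number of examples misclassified by their leave-one-out prediction."""
    return int(np.sum(np.sign(rls_loo_predictions(K, y, lam)) != y))


def brute_force_loo_predictions(K, y, lam):
    """Reference check: retrain n times, each time leaving one example out."""
    n = K.shape[0]
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        K_sub = K[mask][:, mask]
        c = np.linalg.solve(K_sub + lam * np.eye(n - 1), y[mask])
        preds[i] = K[i, mask] @ c
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 30))           # 12 samples, 30 synthetic "genes"
    y = np.where(rng.random(12) < 0.5, -1.0, 1.0)
    K = X @ X.T                             # linear kernel (an assumption)
    fast = rls_loo_predictions(K, y, lam=1.0)
    slow = brute_force_loo_predictions(K, y, lam=1.0)
    print("closed form matches retraining:", np.allclose(fast, slow))
```

The brute-force function is included only to check the identity on small synthetic data; with thousands of genes and a few dozen samples, the closed form needs just one n-by-n matrix inversion.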
                                      SVM   RLS
LOO error on Leukemia training set      2     2
Leukemia test error                     3     3
LOO error on Leukemia data set          1     2
LOO error on Colon data set             8     9
LOO error on Multi-Cancer data set     88    90

RLS computes the LOO error in just one training by using all the training examples.

GENE SELECTION strategies

Two techniques are used to rank the genes, and a nonparametric permutation test is used to determine how many genes are really important for classifying a given specimen: 999 genes in the Leukemia data set, 500 in the Colon data set and 1400 in the Multi-Cancer data set.

S2N Statistic and NRFE Statistic [formulas not preserved in this transcript], with j = 1, 2, ..., number of genes.

[Figure: Visualization of the S2N statistic (47 ALL examples, 25 AML examples; HP, HN). Observed T_S2N(j) distribution computed on the Leukemia data set compared to randomly permuted class distinctions.]

S2N Statistic

      Leukemia              Colon            Multi-Cancer
genes   SVM   RLS     genes   SVM   RLS     genes   SVM   RLS
  999     1     2       500     4     6      1400    53    46
   99     1     2       400     5     6      1000    50    47
   49     1     1       300     5     6       500    52    42
   39     2     2       200     7     6       300    51    43
   29     2     2       100     8     7       200    50    45
   19     3     3        50     8     7       100    66    40
    9     1     1        10     8     8        50    56    37
    5     2     4         5     7     9        10    65    59

NRFE Statistic

      Leukemia              Colon            Multi-Cancer
genes   SVM   RLS     genes   SVM   RLS     genes   SVM   RLS
  999     0     0       500     4     3      1400    46    37
   99     0     0       400     4     3      1000    41    39
   49     0     0       300     4     3       500    32    30
   39     1     1       200     3     3       300     ?     ?
   29     0     0       100     3     3       200     ?     ?
   19     3     3        50     3     3       100    51    35
    9     6     9        10    11    12        50    52    43
    5     6    11         5    15    14        10     ?     ?

Only partial digits survive in the transcript for the Multi-Cancer entries at 300 genes (possibly "29"), 200 genes ("27") and 10 genes ("70"); their columns cannot be determined.

Conclusions

RLS classifiers perform comparably to SVM classifiers on the problem of cancer classification from gene expression data, and they are a valuable alternative to SVM because they enjoy several interesting properties. RLS machines are fast and easy to implement and, more importantly, they make it possible to measure the exact LOO error with a single training.

References

[1] Vapnik, V. (1998) Statistical Learning Theory, John Wiley & Sons, Inc.
[2] Tikhonov, A.N., Arsenin, V.Y. (1977) Solutions of Ill-posed Problems, W.H. Winston, Washington D.C.
[3] Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science, 286, 531-537.
[4] Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J. (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, PNAS, 96, 6745-6750.
[5] Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C.H., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J.P., Poggio, T., Gerald, W., Loda, M., Lander, E.S., Golub, T.R. (2001) Multi-class cancer diagnosis using tumor gene expression signatures, PNAS, 98, 15149-15154.
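As a rough illustration of the gene selection step, the sketch below ranks genes with a signal-to-noise (S2N) statistic and compares the observed values with those obtained under randomly permuted class labels. The poster's own formulas did not survive in this transcript, so the sketch assumes the S2N definition of Golub et al. [3], (mean of class +1 minus mean of class -1) divided by the sum of the two class standard deviations. The per-gene empirical p-value, the use of sample standard deviations, and all function names are assumptions made for illustration and may differ from the authors' procedure.

```python
import numpy as np


def s2n(X, y):
    """Signal-to-noise statistic per gene (assumed Golub et al. definition).

    X: array of shape (samples, genes); y: labels in {+1, -1}.
    """
    pos, neg = X[y == 1], X[y == -1]
    mean_diff = pos.mean(axis=0) - neg.mean(axis=0)
    spread = pos.std(axis=0, ddof=1) + neg.std(axis=0, ddof=1)
    return mean_diff / spread


def permutation_pvalues(X, y, n_perm=200, seed=0):
    """Empirical p-value per gene: how often a random relabeling gives
    an |S2N| at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = np.abs(s2n(X, y))
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        exceed += np.abs(s2n(X, rng.permutation(y))) >= observed
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 50))            # synthetic data, not the benchmarks
    y = np.array([1] * 10 + [-1] * 10)
    X[y == 1, 0] += 5.0                      # make gene 0 clearly informative
    ranking = np.argsort(-np.abs(s2n(X, y)))
    pvals = permutation_pvalues(X, y)
    print("top-ranked gene:", ranking[0])
    print("genes with p < 0.05:", int(np.sum(pvals < 0.05)))
```

Counting the genes whose p-value falls below a chosen threshold gives one possible way to decide how many genes to keep; the threshold itself is a free parameter of this sketch.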

