Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Support Vector Machine
- A classification method that has been applied successfully to cancer diagnosis problems
- Two types:
  - Binary SVM: there is no single optimal extension to more than two classes, which limits its direct application to multiple tumor types
  - Multicategory SVM (MSVM, recently proposed): demonstrated here on leukemia data and on small round blue cell tumors of childhood

DNA microarray technology
- Measures the relative abundance of mRNA in isolated cells or biopsied tissues
- Previous work uses binary SVMs and solves a series of binary problems: the DAG SVM algorithm (a sketch of its decision rule follows)
- Here, the MSVM is applied to two gene expression data sets
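The slides do not show code; as a rough illustration of how a DAG SVM reaches a decision once all pairwise classifiers are trained, a sketch follows. The `pairwise_clf` interface (returning +1 for the first class of a pair and -1 for the second) is hypothetical and only for illustration.

```python
def dagsvm_predict(x, classes, pairwise_clf):
    """Decide a class for sample x by traversing the DAGSVM decision list.

    classes      -- list of candidate class labels
    pairwise_clf -- dict mapping (i, j) to a trained binary classifier whose
                    predict() returns +1 for class i and -1 for class j
                    (hypothetical interface, for illustration only)
    """
    remaining = list(classes)
    # Each comparison eliminates one class, so only K-1 evaluations are needed.
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        if pairwise_clf[(i, j)].predict([x])[0] == 1:
            remaining.pop()        # classifier favours i, eliminate j
        else:
            remaining.pop(0)       # classifier favours j, eliminate i
    return remaining[0]
```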

Features
- Effectiveness
- Prediction strength
- Effect of data preprocessing
- Gene selection
- Dimension reduction

Binary SVM
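Only the slide title survives in this transcript. For orientation, and not specific to this paper, the standard soft-margin binary SVM with labels \(y_i \in \{-1, +1\}\) solves

\[
\min_{w,\,b,\,\xi}\; \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{subject to} \quad
y_i\bigl(w^\top x_i + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,
\]

and classifies a new sample as \(\operatorname{sign}(w^\top x + b)\); a kernel replaces the inner product when a non-linear boundary is needed.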

MSVM
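Again only the title survives. Roughly, the multicategory SVM (MSVM) of Lee, Lin and Wahba learns one coordinate function per class, \(f = (f_1, \dots, f_k)\), under the sum-to-zero constraint \(\sum_{j=1}^{k} f_j = 0\), by minimizing

\[
\frac{1}{n}\sum_{i=1}^{n}\;\sum_{j \ne y_i}\Bigl(f_j(x_i) + \tfrac{1}{k-1}\Bigr)_{+}
\;+\; \lambda \sum_{j=1}^{k} \lVert h_j \rVert_{\mathcal{H}_K}^{2},
\]

where \(f_j = h_j + b_j\) with \(h_j\) in a reproducing kernel Hilbert space \(\mathcal{H}_K\) and \((u)_+ = \max(u, 0)\); a new sample is assigned to \(\arg\max_j f_j(x)\) (equal-misclassification-cost form, stated roughly).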

Procedure: 3-class problem (leukemia data)
- Gene expression was monitored to classify two leukemias: ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia)
- ALL is further split into B-cell and T-cell subtypes, giving three classes in total

Procedure (cont.)
- Number of genes: 7129
- Training set: 38 samples; test set: 34 samples
- Preprocessing steps performed (a sketch follows this list):
  - Thresholding (floor 100, ceiling 16000)
  - Filtering of genes with max/min <= 5 and max - min <= 500
  - Base-10 logarithmic transformation
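A minimal NumPy sketch of these three steps, assuming an expression matrix `X` of shape genes x samples; the filtering convention follows the thresholds as quoted on the slide.

```python
import numpy as np

def preprocess(X, floor=100, ceiling=16000, min_ratio=5, min_range=500):
    """Threshold, filter and log-transform a genes-x-samples expression matrix."""
    # 1. Thresholding: clip expression values to [floor, ceiling].
    X = np.clip(X, floor, ceiling)

    # 2. Filtering: drop genes with little variation across samples,
    #    i.e. max/min <= min_ratio and max - min <= min_range.
    gene_max = X.max(axis=1)
    gene_min = X.min(axis=1)
    uninformative = (gene_max / gene_min <= min_ratio) & (gene_max - gene_min <= min_range)
    X = X[~uninformative]

    # 3. Base-10 logarithmic transformation.
    return np.log10(X)
```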

Procedure (cont.)
- Standardization of each variable
- Variable selection
- Prescreening measure: the ratio of the between-class sum of squares to the within-class sum of squares for each gene; the genes with the largest ratios are kept (a sketch is given below)
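A sketch of this prescreening step, assuming a genes x samples matrix `X` and a label vector `y`; the statistic is the usual BSS/WSS ratio used for gene ranking, and the names are illustrative.

```python
import numpy as np

def bss_wss_ratio(X, y):
    """BSS(j)/WSS(j) for every gene j, given a genes-x-samples matrix X and labels y."""
    overall_mean = X.mean(axis=1, keepdims=True)
    bss = np.zeros(X.shape[0])
    wss = np.zeros(X.shape[0])
    for k in np.unique(y):
        Xk = X[:, y == k]
        class_mean = Xk.mean(axis=1, keepdims=True)
        bss += Xk.shape[1] * (class_mean - overall_mean)[:, 0] ** 2
        wss += ((Xk - class_mean) ** 2).sum(axis=1)
    return bss / wss

# Keep, say, the 40 genes with the largest ratios (as in the heat map slide):
# top_genes = np.argsort(bss_wss_ratio(X, y))[::-1][:40]
```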

Heat Map of 40 most important genes in training set

Small round blue cell tumors (SRBCT) data
- Four types:
  - Neuroblastoma (NB)
  - Rhabdomyosarcoma (RMS)
  - Non-Hodgkin lymphoma (NHL)
  - Ewing family of tumors (EWS)

- The original study used Artificial Neural Networks (ANN)
- Training set: 63 samples; test set: 20 samples
- Nearest neighbor, weighted voting, and linear SVM were also applied to the data
- The MSVM was applied for comparison
- Expression levels were base-10 log-transformed

Predicted decision vectors

SANN
- For multiclass classification
- Classification results superior to ANN
- The ANN uses the back-propagation algorithm
- Why?
  - Non-linear connections
  - Inclusion of interactions among the independent (input) variables
  - Independence from conventional processes

Limitations
- The learned knowledge is contained in hundreds to thousands of weights (synapses)
- It cannot be summarized in a single regression formula

Combining several ANNs
- Through ensembles of networks; an ensemble is a collection of a finite number of different classifiers (see the sketch below)
- Through cascading ANNs
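A minimal sketch of the ensemble idea using scikit-learn; the slides do not specify an implementation, so the network sizes and soft voting here are illustrative assumptions.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.neural_network import MLPClassifier

# Three differently sized/seeded networks combined by soft (probability) voting.
ensemble = VotingClassifier(
    estimators=[
        ("ann1", MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)),
        ("ann2", MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=1)),
        ("ann3", MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=2)),
    ],
    voting="soft",
)
# ensemble.fit(X_train, y_train); ensemble.predict(X_test)
```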

Two-level ANN
- Task: chest radiograms
  - With lung nodules (class A)
  - Without lung nodules (class B)

Two-level architecture carrying lower-level and higher-level concepts
- Higher-level task: differentiate normal cells (class A) from malignant cells (class B)
- Lower-level task: subdivide the malignant cells into classes B_1, B_2, B_3 and B_4

One vs. all
- Commonly used with SVMs
- K binary classifiers, each distinguishing one class from all the others lumped together
- A sample is assigned to the class whose classifier achieves the greatest output activity (see the sketch below)
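A minimal one-vs-rest sketch in scikit-learn, for illustration only; `X_train`, `y_train` and `X_test` are assumed to exist.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# One binary SVM per class; prediction picks the class whose classifier
# produces the largest decision value (output activity).
ova = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000))
# ova.fit(X_train, y_train)
# predictions = ova.predict(X_test)
```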

All-pairs approach
- Builds K(K-1)/2 binary classifiers, one for each pair of classes
- Each class is compared with each of the other K-1 classes by its own binary classifier
- The output activities are summed up; the class with the greatest total activity is the winning class (see the sketch below)
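The corresponding one-vs-one sketch in scikit-learn, again purely illustrative.

```python
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

# K(K-1)/2 pairwise SVMs; the class that wins the most pairwise comparisons
# (aggregated votes/decision values) is predicted.
ovo = OneVsOneClassifier(LinearSVC(C=1.0, max_iter=10000))
# ovo.fit(X_train, y_train)
# predictions = ovo.predict(X_test)
```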

SANN
- Oriented to human decision making
- Exclusion is performed and the preferences are narrowed down
- The classification made by the first ANN acts as a preselection for the second, successive ANN (a sketch follows)
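A rough sketch of such a two-stage cascade; the class names A, B and B_1 to B_4 follow the earlier two-level-architecture slide, and everything else is an illustrative assumption.

```python
from sklearn.neural_network import MLPClassifier

# Stage 1: coarse decision, e.g. normal (class A) vs. malignant (class B).
stage1 = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
# Stage 2: trained only on malignant samples, to separate B_1 .. B_4.
stage2 = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)

def cascade_predict(x):
    """Stage 1 preselects; stage 2 refines only when the coarse label is 'B'."""
    coarse = stage1.predict([x])[0]
    if coarse == "A":
        return "A"
    return stage2.predict([x])[0]   # one of B_1 .. B_4
```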
