Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring

Overview
Motivation
Microarray Background
Our Test Case
Class Prediction
Class Discovery

Motivation
Importance of cancer classification
Cancer classification has historically relied on specific biological insights
We will discuss a systematic and unbiased approach for recognizing tumor subtypes

Microarray Background
Microarrays enable simultaneous measurement of the expression levels of thousands of genes in a sample
Microarray:
  – Glass slide with a matrix of thousands of spots printed onto it
  – Each spot contains probes that bind to a specific gene

Microarray Background (cont.)
The process:
  – RNA samples are taken from the test subjects
  – Samples are labeled with fluorescent dyes and placed on the microarray
  – The labeled sample hybridizes to the DNA probes on the array
The result:
  – Spots in the array fluoresce in shades from red to green

Microarray Background (cont.)
Microarray data is translated into an n x p table (n = number of samples, p = number of genes)
[Table: one row per sample (Sample 1, Sample 2, ...), one column per gene (Gene 1, Gene 2, ...)]

Demonstration

Our Test Case
38 bone marrow samples from acute leukemia patients (27 ALL, 11 AML)
RNA from the samples was hybridized to microarrays containing probes for 6817 human genes
For each gene, an expression level was obtained

Class Prediction
Initial collection of samples belonging to known classes
Goal: create a “class predictor” to classify new samples
  – Look for “informative genes”
  – Make a prediction based on these genes
  – Test the validity of the predictor

Informative genes
Genes whose expression pattern is strongly correlated with the class distinction
[Figure: example of a strongly correlated gene and a poorly correlated gene]
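
The slide does not say how correlation with the class distinction is measured. A minimal Python sketch, assuming the signal-to-noise statistic P(g,c) = (µ1 − µ2) / (σ1 + σ2) described in the Golub et al. paper. The function name, the choice of the sample standard deviation, and the toy expression levels are illustrative assumptions, not values from the study:

```python
from statistics import mean, stdev

def signal_to_noise(aml_levels, all_levels):
    """Score one gene: positive when it is higher in AML, negative when higher in ALL.

    Assumes each class has at least two samples and some within-class spread,
    otherwise the denominator is zero.
    """
    mu_aml, mu_all = mean(aml_levels), mean(all_levels)
    sd_aml, sd_all = stdev(aml_levels), stdev(all_levels)
    return (mu_aml - mu_all) / (sd_aml + sd_all)

# Toy expression levels (made up for illustration).
strong = signal_to_noise([9.0, 10.0, 11.0], [1.0, 2.0, 3.0])  # class means far apart
weak = signal_to_noise([4.0, 6.0, 5.0], [5.0, 4.0, 6.0])      # class means identical
```

A large absolute score marks a gene as a candidate informative gene; a score near zero marks it as poorly correlated.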

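Neighborhood Analysis
Are the observed correlations stronger than would be expected by chance?
C represents the AML/ALL class distinction
C* is a random permutation of C and represents a random class distinction
Compare the number of genes highly correlated with C against the number highly correlated with random distinctions C*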
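
The permutation idea can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the score is the signal-to-noise statistic from the paper, while the threshold, the number of permutations, the function names and the toy data are all hypothetical.

```python
import random
from statistics import mean, stdev

def score(levels, labels):
    """Signal-to-noise score of one gene for a given labeling of the samples."""
    aml = [x for x, c in zip(levels, labels) if c == "AML"]
    all_ = [x for x, c in zip(levels, labels) if c == "ALL"]
    return (mean(aml) - mean(all_)) / (stdev(aml) + stdev(all_))

def neighborhood_size(genes, labels, threshold):
    """Number of genes whose absolute score reaches the threshold."""
    return sum(1 for levels in genes if abs(score(levels, labels)) >= threshold)

def permutation_counts(genes, labels, threshold, n_perm=200, seed=0):
    """Neighborhood sizes for random class distinctions C* (shuffled copies of C)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        counts.append(neighborhood_size(genes, shuffled, threshold))
    return counts

# Toy data: 8 samples, 4 genes; values within a gene are distinct so stdev > 0.
labels = ["AML"] * 4 + ["ALL"] * 4
genes = [
    [10.0, 11.0, 12.0, 13.0, 1.0, 2.0, 3.0, 4.0],  # high in AML
    [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0],  # high in ALL
    [5.0, 1.0, 7.0, 3.0, 6.0, 2.0, 8.0, 4.0],      # unrelated to the classes
    [2.0, 8.0, 4.0, 6.0, 1.0, 7.0, 3.0, 5.0],      # unrelated to the classes
]
observed = neighborhood_size(genes, labels, threshold=2.0)
counts = permutation_counts(genes, labels, threshold=2.0)
```

If `observed` is clearly larger than most entries of `counts`, the real class distinction has more highly correlated genes than chance would explain.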

Application to the Test Case
Roughly 1100 genes were more highly correlated with the AML-ALL class distinction than would be expected by chance

Make a Prediction
Use a fixed subset of “informative genes” (those most correlated with the class distinction)
Make a prediction on the basis of the expression level of these genes in a new sample

Prediction Algorithm
Each gene $G_i$ votes for AML or ALL, depending on whether its expression level $X_i$ in the sample is closer to $\mu_{AML}$ or $\mu_{ALL}$
The magnitude of the vote is $W_i V_i$
  – $W_i$ reflects how well the gene is correlated with the class distinction
  – $V_i$ reflects the deviation of $X_i$ from the average of $\mu_{AML}$ and $\mu_{ALL}$

Prediction Algorithm (cont.)
The votes for each class are summed to obtain total votes $V_{AML}$ and $V_{ALL}$

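Prediction Algorithm (cont.)
The prediction strength is calculated: $PS = (V_{win} - V_{lose}) / (V_{win} + V_{lose})$, where $V_{win}$ and $V_{lose}$ are the vote totals of the winning and losing classes
The sample is assigned to the winning class provided that the PS exceeds a predetermined threshold (0.3 in the test case)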
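
The voting scheme can be sketched in Python as below. It is a minimal illustration that assumes non-negative weights and takes the deviation as the absolute distance from the midpoint of the two class means. The function name, the tuple layout and the toy numbers are hypothetical.

```python
def predict(sample, genes, threshold=0.3):
    """Weighted-voting prediction for one sample.

    sample: expression level X_i for each informative gene.
    genes:  one (W_i, mu_AML, mu_ALL) tuple per informative gene.
    Returns (label, PS), where label is "AML", "ALL" or "uncertain".
    """
    votes = {"AML": 0.0, "ALL": 0.0}
    for x, (w, mu_aml, mu_all) in zip(sample, genes):
        deviation = abs(x - (mu_aml + mu_all) / 2)  # V_i
        winner = "AML" if abs(x - mu_aml) < abs(x - mu_all) else "ALL"
        votes[winner] += w * deviation              # vote of magnitude W_i * V_i
    v_win, v_lose = max(votes.values()), min(votes.values())
    total = v_win + v_lose
    ps = (v_win - v_lose) / total if total else 0.0
    label = max(votes, key=votes.get)
    return (label if ps > threshold else "uncertain"), ps

# Toy predictor with three genes (made-up weights and class means).
genes = [(2.0, 10.0, 2.0), (1.0, 3.0, 7.0), (1.5, 8.0, 4.0)]
confident = predict([9.0, 4.0, 5.5], genes)   # votes: AML 7.0, ALL 0.75
borderline = predict([6.5, 5.8, 6.0], genes)  # votes nearly balanced
```

In the first call two genes vote for AML and one weakly for ALL, so the PS is high; in the second the totals are close and the PS falls below the 0.3 threshold.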

Testing the Validity of Class Predictors
Cross Validation
  – Withhold a sample
  – Build a predictor based on the remaining samples
  – Predict the class of the withheld sample
  – Repeat for each sample
Assess accuracy on an independent set of samples
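
The cross-validation loop above can be sketched in Python. The predictor here is a deliberately simple stand-in (nearest class mean on a single expression value), not the weighted-voting predictor; all names and data are illustrative assumptions.

```python
from statistics import mean

def build_predictor(samples, labels):
    """Stand-in predictor: assign a value to the class with the nearest mean."""
    means = {c: mean(x for x, y in zip(samples, labels) if y == c)
             for c in set(labels)}
    return lambda x: min(means, key=lambda c: abs(x - means[c]))

def leave_one_out(samples, labels):
    """Withhold each sample in turn, rebuild the predictor, test on the withheld one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictor = build_predictor(train_x, train_y)
        if predictor(samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data: one expression value per sample (made up for illustration).
samples = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
accuracy = leave_one_out(samples, labels)
```

The key point is that the predictor is rebuilt from scratch inside the loop, so the withheld sample never influences the model that classifies it.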

Application to the Test Case
The 50 genes most highly correlated with the AML-ALL distinction were chosen
A class predictor based on these genes was built

Application to the Test Case (cont.)
Performance in cross validation:
  – Out of 38 samples there were 36 predictions and 2 uncertainties (PS < 0.3)
  – 100% accuracy
  – PS median 0.77

Application to the Test Case (cont.)
Performance on an independent set of samples:
  – Out of 34 samples there were 29 predictions and 5 uncertainties (PS < 0.3)
  – 100% accuracy
  – PS median 0.73

Comments
Why 50 genes?
  – Large enough to be robust against noise
  – Small enough to be readily applied in a clinical setting
  – Predictors based on between 10 and 200 genes all performed well
Genes useful for cancer class prediction may also provide insight into cancer pathogenesis and pharmacology

Comments (cont.)
Creating a new predictor involves expression analysis of thousands of genes
Applying the predictor then requires monitoring the expression level of only a few informative genes

Class Discovery
Cluster tumors by gene expression
  – Apply a clustering technique to produce presumed classes
Evaluation of the classes:
  – Are the classes meaningful?
  – Do they reflect true structure?

Clustering Technique - SOMs
SOMs – Self Organizing Maps
Well suited for identifying a small number of prominent classes
  – Find an optimal set of “centroids”
  – Partition the data set according to the centroids
  – Each centroid defines a cluster consisting of the data points nearest to it
We won't go into details about the calculation of SOMs
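
For readers who want a feel for the calculation, here is a minimal one-dimensional SOM in Python. It is a sketch under assumptions (linear decay of the learning rate and neighborhood radius, Gaussian neighborhood over node index, random initialization from the data) and not the implementation used in the study; names, parameters and data are hypothetical.

```python
import math
import random

def nearest(nodes, x):
    """Index of the centroid closest to point x (the best-matching unit)."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))

def train_som(data, n_nodes=2, n_iter=200, seed=0):
    """Train a 1 x n_nodes SOM and return the centroids."""
    rng = random.Random(seed)
    nodes = [list(p) for p in rng.sample(data, n_nodes)]  # start at data points
    for t in range(n_iter):
        frac = t / n_iter
        rate = 0.5 * (1 - frac)      # learning rate decays toward 0
        radius = (1 - frac) + 0.01   # neighborhood width shrinks over time
        x = rng.choice(data)
        bmu = nearest(nodes, x)
        for j, node in enumerate(nodes):
            # Nodes close to the BMU on the map are pulled toward x as well.
            influence = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
            for d in range(len(node)):
                node[d] += rate * influence * (x[d] - node[d])
    return nodes

# Toy data: two groups of samples measured on two genes (made up).
data = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.1), (8.0, 8.3), (8.4, 7.9), (7.8, 8.1)]
nodes = train_som(data)
clusters = [nearest(nodes, p) for p in data]  # partition by nearest centroid
```

Each update moves a centroid part of the way toward a sample, so the centroids always stay within the range of the data; the final partition assigns each sample to its nearest centroid.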

Application of a two-cluster SOM to the test case
Class A1: 24 ALL, 1 AML
Class A2: 10 AML, 3 ALL
Quite effective at automatically discovering the two types of leukemia
Not perfect

Evaluation of the Classes
How can we evaluate such classes if the “right” answer is not already known?
Hypothesis: class discovery can be tested by class prediction
  – If the classes reflect true structure, then a class predictor based on them should perform well
Let’s test this hypothesis...

Validity of Predictors Based on A1 and A2
Predictors based on different numbers of informative genes performed well
For example: a 20-gene predictor

Validity of Predictors Based on A1 and A2 (cont.)
Performance on independent samples:
  – PS median 0.61
  – Prediction made for 74% of samples

Validity of Predictors Based on A1 and A2 (cont.)
Performance in cross validation:
  – 34 accurate predictions with high prediction strength
  – One error
  – Three uncertain predictions

[Figure: samples highlighted as the one cross validation error and 2 of the 3 uncertain cross validation predictions]

Iterative Procedure
Use a SOM to initially cluster the data
Construct a predictor
Remove samples that are not correctly predicted in cross validation
Use the remaining samples to generate an improved predictor
Test on an independent data set

Validity of Predictors Based on Random Clusters
Performance:
  – Poor accuracy in cross validation
  – Low PS on independent samples

Conclusion
The AML-ALL distinction could have been automatically discovered and confirmed without previous biological knowledge

Application of a 4-cluster SOM to the Test Case

Evaluation of the Classes
Complement approach:
  – Construct class predictors to distinguish each class from its complement
Pair-wise approach:
  – Construct class predictors to distinguish between each pair of classes $C_i$, $C_j$
  – Perform cross validation only on samples in $C_i$ and $C_j$
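
The two evaluation schemes differ only in how the binary prediction tasks are set up. A short Python sketch, with hypothetical function names and made-up cluster labels, shows the bookkeeping:

```python
from itertools import combinations

def complement_tasks(labels):
    """One binary task per class: the class versus all other samples."""
    classes = sorted(set(labels))
    return {c: [(i, y == c) for i, y in enumerate(labels)] for c in classes}

def pairwise_tasks(labels):
    """One binary task per pair (C_i, C_j), restricted to samples in those two classes."""
    classes = sorted(set(labels))
    return {
        (ci, cj): [(i, y == ci) for i, y in enumerate(labels) if y in (ci, cj)]
        for ci, cj in combinations(classes, 2)
    }

# Toy cluster assignments for six samples (illustrative only).
labels = ["B1", "B2", "B3", "B4", "B3", "B1"]
complement = complement_tasks(labels)  # 4 tasks, each over all 6 samples
pairwise = pairwise_tasks(labels)      # 6 tasks, each over two classes only
```

Each task is a list of (sample index, is-positive) pairs that can be passed to a two-class predictor and cross validated as before.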

Evaluation of the Classes (cont.)
Class predictors distinguished the classes from one another, with the exception of B3 versus B4

Conclusion
The results suggest merging classes B3 and B4
The distinction corresponding to AML, B-ALL and T-ALL was confirmed

Uses of Class Discovery
Identify fundamental subtypes of any cancer
Search for fundamental mechanisms that cut across distinct types of cancers

Questions?
Thank you for listening