Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring T.R. Golub et al., Science 286, 531 (1999)

Introduction
Why is identification of cancer class (tumor subtype) important?
- Cancers of identical grade can have widely variable clinical courses (e.g., acute lymphoblastic leukemia or acute myeloid leukemia).
Traditional methods:
- Morphological appearance.
- Enzyme-based histochemical analyses.
- Immunophenotyping.
- Cytogenetic analysis.

Topics of Discussion
- Class prediction (supervised learning).
- Class discovery (unsupervised learning).

Class Prediction
How could one use an initial collection of samples belonging to known classes to create a class predictor?
- Identification of informative genes via neighborhood analysis.
- Weighted vote.

Neighborhood Analysis
Why do we want to start with informative genes?
- A predictor built on a small set of informative genes can be readily applied in a clinical setting.
- The informative genes are themselves highly instructive.

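Neighborhood Analysis
1. Each gene $g$ is represented by its expression vector across the $n$ samples: $v(g) = (e_1, e_2, \ldots, e_n)$.
2. The class distinction is represented by the vector $c = (c_1, c_2, \ldots, c_n)$.
3. Compute the correlation between $v(g)$ and $c$. Possible measures:
   - Euclidean distance.
   - Pearson correlation coefficient.
   - $P(g,c) = \dfrac{\mu_1(g) - \mu_2(g)}{\sigma_1(g) + \sigma_2(g)}$, where $\mu_k(g)$ and $\sigma_k(g)$ are the mean and standard deviation of the expression levels of gene $g$ in class $k$.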
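The score $P(g,c)$ above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `signal_to_noise` and `rank_genes`, the 1/0 coding of the class vector, the use of the sample standard deviation, and the toy expression values are all invented here and are not taken from the slides.

```python
from statistics import mean, stdev

def signal_to_noise(expression, labels):
    """P(g, c) = (mu_1 - mu_2) / (sigma_1 + sigma_2) for a single gene.

    expression: expression levels e_1..e_n of the gene across the n samples.
    labels: class vector c_1..c_n, coded 1 for class 1 and 0 for class 2
            (the coding is an assumption of this sketch).
    """
    class1 = [e for e, c in zip(expression, labels) if c == 1]
    class2 = [e for e, c in zip(expression, labels) if c == 0]
    # Assumption: sample standard deviation (n - 1 denominator).
    return (mean(class1) - mean(class2)) / (stdev(class1) + stdev(class2))

def rank_genes(matrix, labels):
    """Score every gene and return (names sorted by descending P, scores)."""
    scores = {gene: signal_to_noise(values, labels)
              for gene, values in matrix.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Toy data, invented for illustration: 3 genes measured in 6 samples.
labels = [1, 1, 1, 0, 0, 0]
toy = {
    "gene_a": [9.0, 10.0, 11.0, 1.0, 2.0, 3.0],   # high in class 1
    "gene_b": [5.0, 6.0, 4.0, 5.0, 4.0, 6.0],     # same mean in both classes
    "gene_c": [1.0, 2.0, 3.0, 9.0, 10.0, 11.0],   # high in class 2
}
ranking, scores = rank_genes(toy, labels)
for gene in ranking:
    print(gene, round(scores[gene], 3))
```

Genes with a large positive score are candidates for class 1 markers and genes with a large negative score are candidates for class 2 markers; a gene whose class means coincide scores zero.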

Neighborhood Analysis

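Class Predictor via Gene Voting
1. Parameters $(a_g, b_g)$ are defined for each informative gene $g$.
2. $a_g = P(g,c)$
3. $b_g = [\mu_1(g) + \mu_2(g)]/2$
4. $v_g = a_g (x_g - b_g)$, where $x_g$ is the expression level of gene $g$ in the sample being classified.
5. $V_1 = \sum |v_g|$, summed over genes with $v_g > 0$.
6. $V_2 = \sum |v_g|$, summed over genes with $v_g < 0$.
7. $\mathrm{PS} = (V_{\mathrm{win}} - V_{\mathrm{lose}})/(V_{\mathrm{win}} + V_{\mathrm{lose}})$
8. The sample is assigned to the winning class if $\mathrm{PS}$ exceeds a threshold; otherwise the call is uncertain.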
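The voting steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `weighted_vote` and `classify`, the gene parameters, the sample values, and the threshold of 0.5 are hypothetical choices made only to show the arithmetic.

```python
def weighted_vote(sample, params):
    """Return (winning class, prediction strength PS) for one sample.

    sample: {gene: x_g}, the expression level of each informative gene.
    params: {gene: (a_g, b_g)}, the weight and decision boundary per gene.
    """
    v1 = v2 = 0.0
    for gene, (a, b) in params.items():
        vote = a * (sample[gene] - b)     # v_g = a_g * (x_g - b_g)
        if vote > 0:
            v1 += vote                    # positive votes go to class 1
        elif vote < 0:
            v2 += -vote                   # negative votes go to class 2
    v_win, v_lose = max(v1, v2), min(v1, v2)
    total = v_win + v_lose
    ps = (v_win - v_lose) / total if total > 0 else 0.0
    # A tie has PS = 0, so the arbitrary winner below never passes a
    # non-negative threshold.
    winner = 1 if v1 >= v2 else 2
    return winner, ps

def classify(sample, params, threshold):
    """Assign the winning class only when PS exceeds the threshold."""
    winner, ps = weighted_vote(sample, params)
    return (winner if ps > threshold else "uncertain"), ps

# Hypothetical parameters for two informative genes.
params = {"gene_a": (4.0, 6.0), "gene_c": (-4.0, 6.0)}

clear_sample = {"gene_a": 9.0, "gene_c": 2.0}       # votes: +12 and +16
borderline_sample = {"gene_a": 7.0, "gene_c": 6.5}  # votes: +4 and -2

print(classify(clear_sample, params, threshold=0.5))
print(classify(borderline_sample, params, threshold=0.5))
```

In the first sample both genes vote for class 1, so PS is 1. In the second the votes are 4 against 2, giving PS = (4 - 2)/(4 + 2) = 1/3, which falls below the illustrative threshold and is reported as uncertain.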

Class Predictor via Gene Voting

Data
- Initial samples: 38 bone marrow samples (27 ALL, 11 AML) obtained at the time of diagnosis.
- Independent samples: 34 leukemia samples (20 ALL, 14 AML), consisting of 24 bone marrow and 10 peripheral blood samples.

Neighborhood Analysis

Validation of Gene Voting
- Initial samples: 36 of the 38 samples were assigned to either AML or ALL and two were called uncertain. All 36 assignments agree with the clinical diagnosis.
- Independent samples: 29 of the 34 samples received strong predictions, and all 29 were correct (100% accuracy).

Validation of Gene Voting

Class Discovery
Can cancer classes be discovered automatically based on gene expression?
- Cluster tumors by gene expression.
- Determine whether the putative classes produced are meaningful.

Cluster tumors
- Self-organizing map (SOM).
- Mathematical cluster analysis for recognizing and classifying features in complex, multidimensional data (similar to the k-means approach).
- Chooses a geometry of “nodes”.
- Nodes are mapped into k-dimensional space, initially at random.
- Iteratively adjusts the nodes.

Adjusting the nodes
- Randomly select a data point $P$.
- Move the nodes in the direction of $P$.
- The closest node $N_P$ is moved the most.
- Other nodes are moved depending on their distance from $N_P$ in the initial geometry.
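The node adjustment can be sketched as a single update step plus a short training loop. The slides do not give the update formula, so several things here are assumptions of this sketch: the Gaussian neighborhood function, the linearly shrinking learning rate and radius, the names `som_step` and `train_som`, and the toy data.

```python
import math
import random

def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def som_step(nodes, grid, point, rate, radius):
    """Move every node toward `point` and return the index of the closest node.

    nodes: node positions in data space (updated in place).
    grid:  fixed coordinates of each node in the initial geometry.
    The closest node moves by the fraction `rate` of its distance to the
    point; other nodes move by a smaller fraction that decays with their
    grid distance from the closest node (Gaussian decay is an assumption).
    """
    winner = min(range(len(nodes)), key=lambda i: _dist(nodes[i], point))
    for i, node in enumerate(nodes):
        d = _dist(grid[i], grid[winner])
        influence = rate * math.exp(-(d ** 2) / (2 * radius ** 2))
        nodes[i] = [n + influence * (p - n) for n, p in zip(node, point)]
    return winner

def train_som(data, grid, iterations, seed=0):
    """Start nodes at random positions, then repeatedly apply som_step."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in grid]
    for t in range(iterations):
        frac = 1.0 - t / iterations          # shrinks from 1 toward 0
        point = rng.choice(data)             # randomly selected data point P
        som_step(nodes, grid, point,
                 rate=0.5 * frac + 0.01, radius=frac + 0.1)
    return nodes

# Toy data, invented for illustration: two loose groups in 2-D, 2x1 geometry.
data = [[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]]
grid = [(0, 0), (1, 0)]
nodes = train_som(data, grid, iterations=200)
print(nodes)
```

After training, each sample is assigned to the cluster of its nearest node, which is how a 2-node geometry yields two putative classes.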

SOM

Validation of SOM
Comparison of clusters A1 and A2 with the known classes of the initial dataset:
- Cluster A1 contains 25 samples, 24 of which are ALL.
- Cluster A2 contains 13 samples, 10 of which are AML.

Validation of SOM
How could one evaluate the putative clusters if the “right” answer were not known?
- Assumption: class discovery could be tested by class prediction.
- Testing of the assumption:
  - Construct predictors based on clusters A1 and A2.
  - Construct predictors based on random clusters.
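One way to picture this test is leave-one-out cross-validation: build a predictor from the putative cluster labels, predict each held-out sample, and compare the accuracy with the same procedure run on randomly assigned labels. The sketch below assumes a simplified weighted-vote predictor without an uncertainty threshold; the function names, the choice of two informative genes, and the toy data are invented for illustration and are not the procedure or data from the paper.

```python
import random
from statistics import mean, stdev

def train(samples, labels, n_genes):
    """Return {gene_index: (a_g, b_g)} for the n_genes with largest |P(g, c)|."""
    params = {}
    for g in range(len(samples[0])):
        c1 = [s[g] for s, c in zip(samples, labels) if c == 1]
        c2 = [s[g] for s, c in zip(samples, labels) if c == 2]
        spread = stdev(c1) + stdev(c2)
        if spread == 0:
            continue  # a constant gene carries no usable signal in this sketch
        params[g] = ((mean(c1) - mean(c2)) / spread, (mean(c1) + mean(c2)) / 2)
    top = sorted(params, key=lambda g: abs(params[g][0]), reverse=True)[:n_genes]
    return {g: params[g] for g in top}

def predict(sample, params):
    """Weighted vote without a threshold: return class 1 or class 2."""
    votes = [a * (sample[g] - b) for g, (a, b) in params.items()]
    v1 = sum(v for v in votes if v > 0)
    v2 = -sum(v for v in votes if v < 0)
    return 1 if v1 >= v2 else 2

def loo_accuracy(samples, labels, n_genes=2):
    """Hold out each sample in turn, train on the rest, and predict it."""
    hits = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        hits += (predict(samples[i], train(rest, rest_labels, n_genes)) == labels[i])
    return hits / len(samples)

# Toy data, invented for illustration: 8 samples x 3 genes.
samples = [
    [9.0, 1.0, 5.0], [10.0, 2.0, 4.0], [11.0, 3.0, 6.0], [10.0, 1.5, 5.5],
    [1.0, 9.0, 5.0], [2.0, 10.0, 6.0], [3.0, 11.0, 4.0], [2.0, 9.5, 4.5],
]
cluster_labels = [1, 1, 1, 1, 2, 2, 2, 2]   # putative clusters
random_labels = cluster_labels[:]
random.Random(0).shuffle(random_labels)     # a random partition of equal sizes

print("putative clusters:", loo_accuracy(samples, cluster_labels))
print("random clusters:  ", loo_accuracy(samples, random_labels))
```

If the putative clusters reflect real structure, the held-out samples should be predicted far more reliably than under random partitions; on this toy data the putative clusters are predicted without error.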

Validation of SOM
Predictors based on clusters A1 and A2 yield 34 accurate predictions, one error, and three uncertain calls.

Validation of SOM

Searching for Finer Classes
Use SOM to divide the initial samples into four clusters (denoted B1 to B4):
- B1 corresponds to AML.
- B2 corresponds to T-lineage ALL.
- B3 and B4 correspond to B-lineage ALL.