Heping Zhang, Chang-Yung Yu, Burton Singer, Momian Xiong


1 Recursive Partitioning for Tumor Classification with Gene Expression Microarray Data
Heping Zhang, Chang-Yung Yu, Burton Singer, Momian Xiong Presented by Weihua Huang

2 Data Used in the Article
Expression profiles of 2,000 genes, measured with an Affymetrix oligonucleotide array, in 22 normal and 40 colon cancer tissues. The response is binary, indicating normal or cancer tissue, and the predictor variables are the 2,000 genes.

3 Classification Tree Using Recursive Partitioning
Goal: To partition the feature space into disjoint regions by growing a tree, so that the tissues within the same region are homogeneous in terms of the response.
Algorithm: Start with a root node containing the study sample and split it into smaller and smaller nodes according to whether a particular selected predictor is above a chosen cutoff value. At each splitting step, the predictor and its cutoff value are chosen to maximize the reduction in node impurity, $\Delta I = P(A)\,I(A) - P(A_L)\,I(A_L) - P(A_R)\,I(A_R)$, where $A$ is the node being split, $A_L$ and $A_R$ are its left and right daughter nodes, $P(\cdot)$ is the probability that a tissue falls in a node, and $I(\cdot)$ is the node impurity.
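
A single splitting step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: tissues are rows of floats with one column per gene, labels are coded 1 = normal and 0 = cancer, I(·) is the entropy impurity described on the next slide, P(·) is estimated as the share of the n_total tissues in the whole sample, and candidate cutoffs are midpoints between consecutive observed values. The names `best_split` and `node_term` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def entropy(p: float) -> float:
    """Entropy impurity -p log p - (1-p) log(1-p), with 0 log 0 taken as 0."""
    return sum(-q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def node_term(labels: Sequence[int], n_total: int) -> float:
    """P(node) * I(node); P(node) is the node's share of all n_total tissues."""
    if not labels:
        return 0.0
    p_normal = sum(labels) / len(labels)  # assumed coding: 1 = normal, 0 = cancer
    return (len(labels) / n_total) * entropy(p_normal)


def best_split(
    X: List[List[float]], y: List[int], n_total: int
) -> Optional[Tuple[int, float, float]]:
    """Return (gene index, cutoff, delta I) maximizing the impurity reduction, or None."""
    parent = node_term(y, n_total)
    best: Optional[Tuple[int, float, float]] = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate cutoffs: midpoints between consecutive distinct expression values.
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= cut]
            right = [yi for row, yi in zip(X, y) if row[j] > cut]
            delta = parent - node_term(left, n_total) - node_term(right, n_total)
            if best is None or delta > best[2]:
                best = (j, cut, delta)
    return best
```

At the root, passing `n_total = len(y)` makes P(A) = 1. Growing the tree means calling `best_split` again on each daughter node's tissues with the same `n_total`, so that P(A_L) and P(A_R) remain shares of the whole sample, as in the formula for ΔI.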

4 Classification Tree Using Recursive Partitioning
Node impurity: One common measure of node impurity is the entropy function $I = -P\log P - (1-P)\log(1-P)$, where $P$ is the probability that a tissue within the node is normal.
Minimum impurity ($I = 0$): all tissues within the node are of the same type ($P = 0$ or $1$, taking $0 \log 0 = 0$).
Maximum impurity ($I = \log 2$): half of the tissues within the node are normal and half are cancerous ($P = 0.5$).
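
The two extreme cases of the entropy impurity can be checked numerically. The sketch below is illustrative; `entropy_impurity` is a hypothetical name, and the root-node value simply applies the formula to the 22 normal and 40 cancer tissues described earlier.

```python
import math


def entropy_impurity(p: float) -> float:
    """Entropy node impurity -P log(P) - (1-P) log(1-P), with 0 log 0 taken as 0."""
    return sum(-q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


# Pure nodes (all cancer, P = 0, or all normal, P = 1) have zero impurity.
assert entropy_impurity(0.0) == 0.0 and entropy_impurity(1.0) == 0.0
# A node with half normal and half cancer tissues attains the maximum, log 2.
assert math.isclose(entropy_impurity(0.5), math.log(2))
# Root node of the colon data: 22 normal tissues out of 62.
root = entropy_impurity(22 / 62)  # about 0.650, below the maximum log 2 (about 0.693)
```

The impurity is symmetric in P and 1 - P, so it measures how mixed a node is without favoring either tissue type.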

5 Results from the Classification Tree on the Data
Fig. 1. Classification tree for tissue types using expression data from three genes (M26383, R15447, M28214).

6 Another Way to Visualize the Recursive Partitioning
Fig. 3. A scatterplot of expression data from genes R15447 and M28214 for a subset of tissues (node 3 in Fig. 1).

7 Results from Recursive Partitioning
Quality of the tree-based classification: The classification was assessed with a localized 5-fold cross-validation error rate, in which the same genes are kept at the same nodes and only the cutoff values are re-estimated. The 40 cancer tissues were randomly divided into 5 subsamples of 8, and the 22 normal tissues into 5 subsamples of 4, 4, 4, 5, and 5. In each fold, four subsamples each from the cancer and normal tissues were used to choose the cutoff values for the three splits, and the remaining samples were used to count the tissues misclassified under the new cutoff values. The error rate was between 6% and 8% in two runs of cross-validation, much better than that obtained by existing analyses.
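
The localized cross-validation can be sketched in Python as below. This is a hypothetical reconstruction, not the authors' procedure: it assumes labels coded 1 = normal and 0 = cancer, a tree whose genes are fixed in advance (the hypothetical `Node` class), cutoffs re-chosen on the training folds by the same impurity-reduction criterion used to grow the tree, and leaves labeled by majority vote with ties inherited from the parent node. The slides do not say how the new cutoffs were chosen, so that step in particular is an assumption.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Tree node: `gene` is fixed in advance; `cutoff` and `label` are refit in each fold."""
    gene: Optional[int] = None          # None marks a terminal node (leaf)
    left: Optional["Node"] = None       # tissues with expression <= cutoff
    right: Optional["Node"] = None      # tissues with expression > cutoff
    cutoff: float = 0.0
    label: int = 0                      # predicted class at a leaf (1 = normal, 0 = cancer)


def entropy(p: float) -> float:
    return sum(-q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def weighted_impurity(labels: List[int], n: int) -> float:
    """P(node) * I(node), with P(node) estimated as the share of the n training tissues."""
    return len(labels) / n * entropy(sum(labels) / len(labels)) if labels else 0.0


def majority(labels: List[int], fallback: int) -> int:
    """Majority class; ties and empty nodes inherit the parent's class."""
    if 2 * sum(labels) == len(labels):
        return fallback
    return int(2 * sum(labels) > len(labels))


def refit(node: Node, X: List[List[float]], y: List[int], n: int, fallback: int = 0) -> None:
    """Re-choose the cutoff of each fixed gene, top-down, to maximize the impurity reduction."""
    label = majority(y, fallback)
    if node.gene is None:
        node.label = label
        return
    j = node.gene
    vals = sorted({row[j] for row in X})
    cands = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or [vals[0] if vals else 0.0]

    def children_impurity(c: float) -> float:
        # P(A)I(A) does not depend on c, so minimizing this maximizes delta I.
        left = [yi for row, yi in zip(X, y) if row[j] <= c]
        right = [yi for row, yi in zip(X, y) if row[j] > c]
        return weighted_impurity(left, n) + weighted_impurity(right, n)

    node.cutoff = min(cands, key=children_impurity)
    li = [i for i, row in enumerate(X) if row[j] <= node.cutoff]
    ri = [i for i, row in enumerate(X) if row[j] > node.cutoff]
    refit(node.left, [X[i] for i in li], [y[i] for i in li], n, label)
    refit(node.right, [X[i] for i in ri], [y[i] for i in ri], n, label)


def predict(node: Node, row: List[float]) -> int:
    while node.gene is not None:
        node = node.left if row[node.gene] <= node.cutoff else node.right
    return node.label


def stratified_folds(y: List[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split each class separately into k near-equal folds (40 -> 8 x 5; 22 -> 4,4,4,5,5)."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == cls]
        rng.shuffle(idx)
        sizes = [len(idx) // k + int(f >= k - len(idx) % k) for f in range(k)]
        start = 0
        for f, size in enumerate(sizes):
            folds[f].extend(idx[start:start + size])
            start += size
    return folds


def localized_cv_error(tree: Node, X: List[List[float]], y: List[int], k: int = 5, seed: int = 0) -> float:
    """Keep the genes fixed; refit cutoffs on k-1 folds and count errors on the held-out fold."""
    rng = random.Random(seed)
    errors = 0
    for held_out in stratified_folds(y, k, rng):
        test = set(held_out)
        train = [i for i in range(len(y)) if i not in test]
        refit(tree, [X[i] for i in train], [y[i] for i in train], len(train))
        errors += sum(predict(tree, X[i]) != y[i] for i in held_out)
    return errors / len(y)
```

For the colon data, `tree` would hold the three genes of Fig. 1 at their nodes, and different `seed` values would correspond to separate runs of the random division.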

8 Correlation Analysis on Genes
The expression levels of different genes are correlated. Here the correlation patterns of the three genes selected in Fig. 1 are examined.

9 Correlation Between the Three Selected Genes and the Remaining Expression Data
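
One way to carry out this kind of correlation analysis is sketched below. It assumes Pearson correlation computed across tissues and an expression matrix with one row per tissue and one column per gene; the slides do not state which correlation measure was used, and `pearson` and `correlations_with_rest` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles measured on the same tissues."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0.0 or vb == 0.0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return cov / math.sqrt(va * vb)


def correlations_with_rest(
    X: List[List[float]], selected: Sequence[int], top: int = 5
) -> Dict[int, List[Tuple[int, float]]]:
    """For each selected gene (a column of X), list the `top` other genes with the largest |r|."""
    n_genes = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_genes)]
    result: Dict[int, List[Tuple[int, float]]] = {}
    for s in selected:
        rs = [(j, pearson(cols[s], cols[j])) for j in range(n_genes) if j != s]
        rs.sort(key=lambda t: abs(t[1]), reverse=True)  # stable: ties keep column order
        result[s] = rs[:top]
    return result
```

Applied to the colon data, `X` would be the 62 × 2,000 expression matrix and `selected` the column indices of M26383, R15447, and M28214; the mapping from gene identifiers to column indices depends on the data file and is not given here.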

10 Another Tree Based on a Different Set of Three Genes
Fig. 6. Classification tree for tissue types using expression data from three genes (R87126, T62947, X15183).

11 Correlation Matrix Among the Genes in Fig. 1 and Fig. 6

12 Advantages of the Classification Tree
1. Efficient with a large number of genes
2. Automatically selects valuable, easy-to-interpret genes as predictors
3. More precise than some other classification methods, such as support vector machines and linear discriminant analysis

13 Conclusions
1. The information contained in a large number of genes can likely be captured by a small optimal set of genes without significant loss of information.
2. The classification precision of recursive partitioning is important for clinical application.

