Recent Research and Development on Microarray Data Mining Shin-Mu Tseng 曾新穆 Dept. Computer Science and Information Engineering.


1 Recent Research and Development on Microarray Data Mining
Shin-Mu Tseng 曾新穆 (tsengsm@mail.ncku.edu.tw)
Dept. of Computer Science and Information Engineering, National Cheng Kung University, Taiwan, R.O.C.
August 13, 2001

2 Outline
Microarray Techniques
Goal of Microarray Data Mining
Clustering Methods
Efficient Microarray Data Mining
Conclusions

3 Current Status
The Human Genome Project is in its finishing stage, revealing that there are about 30,000 functional genes in a human cell.
For more than 90% of these genes, we know little about their real functions.

4 Microarray Techniques
Main advantage of microarray techniques: they allow the expression of thousands of genes to be studied simultaneously in a single experiment.
Microarray process:
  Arrayer
  Experiments: Hybridization
  Image Capturing of Results
  Analysis

5 Goal of Microarray Mining
Multi-Conditions Expression Analysis

gene \ test   1     2     3     4     ..   1000
A             0.6   0.2   0     0.7   ..   0.3
B             0.4   0.9   0     0.5   ..   0.8
C             0.2   0.8   0.3   0.2   ..   0.7
…             …     …     …     …          …


7 Sample Clustering Results

8 Clustering Methods
Types of Clustering Methods
  Partitioning: K-Means, K-Medoids, PAM, CLARA, …
  Hierarchical: HAC, BIRCH, CURE, ROCK
  Density-based: CAST, DBSCAN, OPTICS, CLIQUE, …
  Grid-based: STING, CLIQUE, WaveCluster, …
  Model-based: COBWEB, SOM, CLASSIT, AutoClass, …

9 Clustering Methods (cont.)
Partitioning; Hierarchical

10 Clustering Methods (cont.)
Density-based; Grid-based

11 CAST Clustering
Input
  S: a symmetric n × n similarity matrix, S(i, j) ∈ [0, 1]
  t: affinity threshold (0 < t < 1)
Method
  1. Choose a seed for generating a new cluster
  2. ADD: add qualified items to the cluster
  3. REMOVE: remove unqualified items from the cluster
  4. Repeat Steps 2-3 until the cluster is stable, then repeat from Step 1 until no more clusters can be generated
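
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: an item's affinity is taken to be the sum of its similarities to the current cluster members, an item "qualifies" when its affinity is at least t times the cluster size, and the seed is simply the lowest-numbered unassigned item. The function name `cast`, the `max_rounds` guard, and the toy matrix are hypothetical.

```python
def cast(S, t, max_rounds=1000):
    """Cluster items 0..n-1 from a symmetric similarity matrix S (lists of floats)."""
    if not 0.0 < t < 1.0:
        raise ValueError("affinity threshold t must satisfy 0 < t < 1")
    n = len(S)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Step 1: choose a seed (assumption: lowest-numbered unassigned item).
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster = {seed}
        # affinity[i] = sum of similarities between item i and the cluster members.
        affinity = [S[i][seed] for i in range(n)]
        for _ in range(max_rounds):  # guard against ADD/REMOVE oscillation
            changed = False
            # Step 2 (ADD): move in the unassigned item with the highest affinity
            # as long as it reaches the threshold t * |cluster|.
            while unassigned:
                best = max(sorted(unassigned), key=lambda i: affinity[i])
                if affinity[best] < t * len(cluster):
                    break
                unassigned.remove(best)
                cluster.add(best)
                for i in range(n):
                    affinity[i] += S[i][best]
                changed = True
            # Step 3 (REMOVE): move out the member with the lowest affinity
            # as long as it falls below the threshold.
            while len(cluster) > 1:
                worst = min(sorted(cluster), key=lambda i: affinity[i])
                if affinity[worst] >= t * len(cluster):
                    break
                cluster.remove(worst)
                unassigned.add(worst)
                for i in range(n):
                    affinity[i] -= S[i][worst]
                changed = True
            if not changed:
                break  # the cluster is stable
        clusters.append(sorted(cluster))
    return clusters


# Toy data: items 0-2 are mutually similar, items 3-4 are mutually similar.
groups = [0, 0, 0, 1, 1]
S = [[1.0 if i == j else (0.9 if groups[i] == groups[j] else 0.1)
      for j in range(5)] for i in range(5)]
clusters = cast(S, t=0.5)
```

Note that the number of clusters is not given as input; it follows from the threshold t, which is why choosing t matters so much in the later slides.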

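12 Similarity Measurements: Correlation Coefficients
The most popular correlation coefficient is the Pearson correlation coefficient (1892).
The correlation between X = {X_1, X_2, …, X_n} and Y = {Y_1, Y_2, …, Y_n} is given by the standard definition

$$ r(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma_X} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_Y} \right) $$

where

$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \sigma_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2} $$

and $\bar{Y}$, $\sigma_Y$ are defined in the same way for Y.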

13 Similarity Measurements: Correlation Coefficients (cont.)
It captures the similarity of the "shapes" of two expression profiles and ignores differences between their magnitudes.
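
A small pure-Python sketch can make the "shape, not magnitude" point concrete. It uses the standard Pearson definition; the function name `pearson` and the three example profiles are made up for illustration and are not data from the talk.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return numerator / denominator


# Hypothetical expression profiles over four conditions.
profile_a = [0.1, 0.4, 0.2, 0.8]
profile_b = [1.0 + 10.0 * v for v in profile_a]  # same shape, much larger magnitude
profile_c = [-v for v in profile_a]              # mirrored shape

same_shape = pearson(profile_a, profile_b)
mirrored = pearson(profile_a, profile_c)
```

Here `same_shape` comes out at (numerically) 1 even though `profile_b` is shifted and scaled, while `mirrored` comes out at -1. Because CAST expects similarities in [0, 1], a correlation in [-1, 1] would need to be rescaled first; how the talk did this is not stated in the transcript.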

14 Problems in Microarray Mining
How can microarray data be clustered so that the following requirements are met simultaneously?
  Efficiency
  Accuracy
  Automation

15 Problems in Microarray Mining (cont.)
How can microarray data be clustered so that the following requirements are met simultaneously?
  Efficiency
  Accuracy
  Automation
Answer: Good Clustering Methods + Validation Techniques

16 Efficient Microarray Mining
Improved CAST algorithm for clustering
Hubert's Γ statistic for validation
Iterative sampled computation for automatic clustering
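
The transcript does not define the Γ statistic, so the following is only a sketch of one common formulation, the normalized Hubert Γ: the Pearson correlation, taken over all item pairs i < j, between the similarity S(i, j) and an indicator that is 1 when i and j share a cluster and 0 otherwise. Whether the talk used exactly this variant is an assumption, and the name `hubert_gamma` and the toy data are hypothetical.

```python
import math
from itertools import combinations


def hubert_gamma(S, labels):
    """Normalized Hubert Gamma of a clustering (labels[i] = cluster id of item i)."""
    pairs = list(combinations(range(len(labels)), 2))
    x = [S[i][j] for i, j in pairs]                                   # observed similarity
    y = [1.0 if labels[i] == labels[j] else 0.0 for i, j in pairs]    # same-cluster indicator
    m = len(pairs)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("Gamma is undefined when similarities or labels are constant")
    return cov / math.sqrt(var_x * var_y)


# Toy similarity matrix with two obvious groups: {0, 1, 2} and {3, 4}.
groups = [0, 0, 0, 1, 1]
S = [[1.0 if i == j else (0.9 if groups[i] == groups[j] else 0.1)
      for j in range(5)] for i in range(5)]

gamma_good = hubert_gamma(S, [0, 0, 0, 1, 1])  # clustering that matches the groups
gamma_bad = hubert_gamma(S, [0, 1, 0, 1, 0])   # clustering that cuts across them
```

A higher Γ means the clustering agrees better with the similarity matrix, so it can be used to compare clusterings produced with different affinity thresholds without knowing the right number of clusters in advance.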

17 Reduce the Computation
1. Narrow down the threshold range
2. Split and Conquer: find the "nearly-best" result
Figure: threshold axis from 0 to 100%, with LM (Left Margin) and RM (Right Margin) marked; m = 4
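
The slide gives only an outline of the search, so the sketch below is one plausible reading rather than the presenter's procedure. It assumes that m is the number of thresholds sampled per round, that the samples are evenly spaced between the left and right margins, and that the next round searches only the neighborhood of the best sample so far. The names `search_threshold`, `score`, and `tol`, and the toy scoring function, are hypothetical; in practice `score(t)` would run the clustering at threshold t and return its Γ statistic.

```python
def search_threshold(score, left=0.0, right=1.0, m=4, tol=0.01):
    """Look for a near-best threshold in [left, right] by repeated m-point sampling.

    Returns (best_threshold, best_score, number_of_score_evaluations).
    """
    best_t, best_score = None, float("-inf")
    evaluations = 0
    while right - left > tol:
        # Split the current range into m + 1 equal parts and test the m interior points.
        step = (right - left) / (m + 1)
        for k in range(m):
            t = left + step * (k + 1)
            s = score(t)
            evaluations += 1
            if s > best_score:
                best_t, best_score = t, s
        # Conquer: keep only the neighborhood of the best threshold seen so far.
        left = max(left, best_t - step)
        right = min(right, best_t + step)
    return best_t, best_score, evaluations


# Toy stand-in for "cluster at threshold t, then compute Gamma": peaks at t = 0.62.
def toy_score(t):
    return -(t - 0.62) ** 2


best_t, best_score, evaluations = search_threshold(toy_score, m=4)
```

Each round shrinks the range to 2/(m + 1) of its previous width, so the number of clustering runs grows only logarithmically with the required precision. The result is "nearly best" rather than guaranteed optimal, because a peak that falls between sampled points in an early round can be missed.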

18 Experimental Results
Dataset source: Michael Eisen's Lab, Lawrence Berkeley National Lab (LBNL) (http://rana.lbl.gov/EisenData.htm)
Microarray expression data of the yeast Saccharomyces cerevisiae, containing 6221 genes under 80 conditions
The similarity matrix was computed in advance

19 Experimental Results (cont.)
Without range narrowing: Executions: 19; Execution Time: 246 sec; Γ statistic: 0.5138
With range narrowing: Executions: 13; Execution Time: 27 sec; Γ statistic: 0.5137

20 Experimental Results (cont.)
Comparison

Method                 Execution Time (sec)   Cluster Number   Best Γ Statistic
Our Method             27                     57               0.51
K-means (k = 3 ~ 21)   404                    5                0.45
K-means (k = 3 ~ 39)   1092                   5                0.45

21 Conclusions
Microarray data analysis is an emerging field that needs the support of data mining techniques.
Requirements: Accuracy, Efficiency, Automation

