Gene Discovery from Microarray Images 陳朝欽、 高成炎、張春梵 ARCNTU, NTU-Hospital Project#: 93-EC-17-A-19-S1-0016.


1 Gene Discovery from Microarray Images 陳朝欽、 高成炎、張春梵 ARCNTU, NTU-Hospital cchen@cs.nthu.edu.tw cykao@csie.ntu.edu.tw Project#: 93-EC-17-A-19-S1-0016

2 Motivation and Data Acquisition Part of our current work aims to discover a subset of genes related to specific diseases, such as hepatoma and gastric cancer, through microarray experiments. To this end, we collect spot signal intensities from cDNA microarray images produced by a sequence of biological experiments.

3 A Paradigm for Microarray Image Data Analysis

4 Outline: Microarray Image Data Acquisition; Gridding for Image Segmentation; Normalization from MA-Plot; Finding Differentially Expressed Genes; Finding Discriminative Genes; Performance Evaluation by Dendrogram and K-means Algorithms

5 A Look at a Microarray Slide

6

7 Examples of Microarray Images

8

9 Gridding for Spot Segmentation

10 Gridding for a Block of 30*9 Spots

11 Spot Feature Computation
Cy3 (for Column 1): 639 54879 5980 1984 324 910 2153 236
Cy5 (for Column 6): 104 52858 567 189 36 1489 5083 407

12 M-A plot and Piecewise Normalization

13 Normalized Ratio from MA-Plot

14 Pre-Processing / Normalization Because of the measurement process and other unavoidable factors, raw data collected directly from experiments may contain noise, be on different scales, or have missing items. A pre-processing step that filters out inappropriate data or normalizes the values may therefore be needed.
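As a rough illustration of the filtering and rescaling described on this slide, the sketch below drops spots with missing or non-positive intensities and then centres the remaining log2 ratios on their median. This is a deliberately simple stand-in and not the piecewise normalization used in the presentation; the function name `preprocess` and the use of `None` for a missing item are assumptions made for the example.

```python
import math
from statistics import median

def preprocess(spots):
    """Filter unusable spots, then median-centre the log2(Cy3/Cy5) ratios.

    spots: list of (cy3, cy5) pairs; None marks a missing measurement.
    This is a simple global centring, assumed here for illustration only.
    """
    usable = [
        (cy3, cy5)
        for cy3, cy5 in spots
        if cy3 is not None and cy5 is not None and cy3 > 0 and cy5 > 0
    ]
    log_ratios = [math.log2(cy3 / cy5) for cy3, cy5 in usable]
    if not log_ratios:
        return []
    centre = median(log_ratios)
    return [ratio - centre for ratio in log_ratios]

# Illustrative values (not taken from the slides): two spots are rejected.
spots = [(800, 100), (None, 50), (200, 100), (0, 10), (100, 100)]
print(preprocess(spots))
```

Median centring is only one possible choice; an intensity-dependent method would adjust the ratio differently at different brightness levels.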

15 Spot Features for Gene Discovery Cy3 Cy5 201 67 520 153 28276 21747 4072 6324 14807 690 1058 1451 572 524 $M = \log_2 \mathrm{Cy3} - \log_2 \mathrm{Cy5}$, $A = (\log_2 \mathrm{Cy3} + \log_2 \mathrm{Cy5})/2$. The program compustt.c computes the spot features, pieceline.c performs the normalization, and maplot.c draws the M-A plot.
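The M and A definitions on this slide can be sketched for a single spot as follows. The function name `ma_values` is hypothetical and is not taken from the C programs named on the slide; the intensities in the example are illustrative and chosen so the logarithms are exact.

```python
import math

def ma_values(cy3, cy5):
    """Return (M, A) for one spot from its Cy3 and Cy5 intensities."""
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive to take log2")
    log_cy3 = math.log2(cy3)
    log_cy5 = math.log2(cy5)
    m = log_cy3 - log_cy5        # M: log-ratio of the two channels
    a = (log_cy3 + log_cy5) / 2  # A: average log-intensity
    return m, a

# Illustrative intensities (not taken from the slides).
print(ma_values(1024, 256))  # (2.0, 9.0)
```

A spot with equal intensity in both channels has M = 0, so points far from the M = 0 line in an M-A plot are the candidates for differential expression.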

16 Microarray Pattern Analysis Each microarray consists of 13574 effected genes out of 18564 on a chip, with tumor tissue dyed in Cy3 and normal tissue dyed in Cy5. Patients: 12 HCV, 27 HBV, 1 HCV+HBV, 4 neither HCV nor HBV. A gene is called differentially expressed if $\log_2$ of the Lowess-normalized ratio Cy3/Cy5 is greater than T (↑, up-regulated) or less than −T (↓, down-regulated).
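The thresholding criterion can be sketched as a small function. It assumes the log2 ratio has already been Lowess-normalized upstream. The slide does not state a value for T, so the threshold of 1.0 used below (a two-fold change) is purely an illustrative assumption, as is the function name `regulation_call`.

```python
def regulation_call(log2_ratio, threshold):
    """Classify one gene from its normalized log2(Cy3/Cy5) ratio.

    Returns "up" if the ratio exceeds +threshold, "down" if it is below
    -threshold, and "unchanged" otherwise.
    """
    if threshold <= 0:
        raise ValueError("threshold T must be positive")
    if log2_ratio > threshold:
        return "up"
    if log2_ratio < -threshold:
        return "down"
    return "unchanged"

# T = 1.0 is an assumed value; the slides do not specify T.
for ratio in (1.8, -2.3, 0.4):
    print(ratio, regulation_call(ratio, threshold=1.0))
```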

17

18

19

20 Feature Selection/Extraction (1) Given a set of N patterns from K categories (K=2 is a dichotomy problem), with $N_i$ patterns belonging to category i, 1 ≤ i ≤ K, each pattern consists of M features, many of them redundant. For example, a microarray can be represented as a pattern of 13574 features corresponding to 13574 effected genes. The goal of selection is to choose a small subset of features for recognition.
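One common way to select features in the two-category case is to score each feature by a Fisher-style ratio and keep the highest-scoring ones. Later slides mention Fisher's ratios but do not define them, so the formula used here, squared difference of class means divided by the sum of class variances, is an assumption, as are the function names.

```python
from statistics import mean, pvariance

def fisher_ratio(class_a, class_b):
    """Score one feature: (mean_a - mean_b)^2 / (var_a + var_b)."""
    spread = pvariance(class_a) + pvariance(class_b)
    gap = (mean(class_a) - mean(class_b)) ** 2
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if gap > 0 else 0.0
    return gap / spread

def rank_features(patterns_a, patterns_b, top_k):
    """Return the indices of the top_k features with the largest ratios."""
    n_features = len(patterns_a[0])
    scores = []
    for j in range(n_features):
        column_a = [pattern[j] for pattern in patterns_a]
        column_b = [pattern[j] for pattern in patterns_b]
        scores.append((fisher_ratio(column_a, column_b), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:top_k]]

# Toy data (not from the slides): only feature 1 separates the classes.
patterns_a = [[1.0, 5.0, 0.0], [3.0, 5.5, 0.2]]
patterns_b = [[2.0, 1.0, 0.1], [2.0, 1.5, 0.1]]
print(rank_features(patterns_a, patterns_b, top_k=1))  # [1]
```

Scoring each feature independently is cheap even for thousands of genes, but it ignores correlations between genes, which is one reason different selection methods can pick different subsets.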

21 Feature Selection/Extraction (2) Given a set of N patterns from K categories (K=2 is a dichotomy problem), with $N_i$ patterns belonging to category i, 1 ≤ i ≤ K, the goal of extraction is to transform an M-dimensional pattern into an m-dimensional pattern with m << M for classification. A selected feature preserves its original meaning, whereas an extracted feature usually does not.
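The difference between selection and extraction can be made concrete with a toy pattern. In this sketch, extraction is represented by a linear projection, which is only one possible kind of transform and is assumed here for illustration; the function names are hypothetical.

```python
def select_features(pattern, indices):
    """Selection: keep some original features unchanged."""
    return [pattern[i] for i in indices]

def extract_features(pattern, projection):
    """Extraction: map an M-dimensional pattern to m new features.

    projection has m rows, each a list of M weights. Every output value
    mixes several inputs, so it no longer corresponds to a single gene.
    """
    return [sum(w * x for w, x in zip(row, pattern)) for row in projection]

pattern = [2.0, 4.0, 6.0, 8.0]                     # M = 4
print(select_features(pattern, [0, 3]))            # [2.0, 8.0]
projection = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]                # m = 2
print(extract_features(pattern, projection))       # [3.0, 7.0]
```

The selected values are still readable as the measurements of features 0 and 3, while each extracted value is an average of two features.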

22

23 16 Most Discriminative Genes to distinguish HCV from HBV [YCT39]
Index Accession#
13796 U35376
7197 BG259957
2918 BI520001
8495 AJ012159
11189 AB008549
11087 BC006496
9443 CAC51145
9546 X52125
16144 AK024601
16496 Y00083
17213 BC007437
14579 BC011568
587 AF386492
113 Y16961
17215 AF195766
16760 AI022747

24 Next 16 Most Discriminative Genes to distinguish HCV from HBV
Index Accession#
5947 BG207354
4885 AK021818
11291 AF155110
1262 BI861005
8055 AJ224741
10965 AAF36120
4164 NM_000423
8088 BC000187
7353 AF070641
5434 AB050785
12727 AB062987
14993 AA974308
4182 AI970531
5341 X65882
10052 AB011542
8140 AK026068

25 32 Discriminative Genes by Fisher’s Ratios for a Dendrogram

26 32 Discriminative Genes by Chuang+Kao’s for a Dendrogram

27 Dendrogram from Chen’s 32 Most Discriminative Genes [CC39]

28 Dendrogram from Genasia’s 32 Most Discriminative Genes

29 K-means Clustering Results by using 32 Best Discriminative Genes
G45 from Genasia: distortion 341.26, labels 1222221222 2211111111 111111111111111111
X47 from C. Chen: distortion 302.33, labels 1222221222 2211111111 112111111111111111
Y48 by Fisher’s Ratio on YCT39: distortion 307.49, labels 1222221222 2211111111 112111111111111111
PY50 by Chuang+Kao’s on YCT39: distortion 290.06, labels 2222222222 2211211111 112111111111111111
Leave-one-out errors by 1-nn: 4, 3, 2, 1 (/39)
Leave-one-out errors by Fisher: 15, 7, 8, 9 (/39)
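The two kinds of figures reported on this slide, a clustering distortion and a leave-one-out error count for a 1-nearest-neighbour classifier, can be sketched as below. The slide does not define distortion; the sum of squared Euclidean distances to the cluster centroids is assumed here because it is the usual K-means objective. The function names and the one-dimensional toy data are illustrative only.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two patterns."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def distortion(patterns, labels):
    """Sum of squared distances from each pattern to its cluster centroid."""
    clusters = {}
    for pattern, label in zip(patterns, labels):
        clusters.setdefault(label, []).append(pattern)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(column) / len(members) for column in zip(*members)]
        total += sum(squared_distance(p, centroid) for p in members)
    return total

def loo_1nn_errors(patterns, labels):
    """Count patterns misclassified by their nearest other pattern."""
    errors = 0
    for i, held_out in enumerate(patterns):
        others = (j for j in range(len(patterns)) if j != i)
        nearest = min(others,
                      key=lambda j: squared_distance(held_out, patterns[j]))
        if labels[nearest] != labels[i]:
            errors += 1
    return errors

# Toy one-dimensional data (not from the slides).
patterns = [[0.0], [2.0], [10.0], [12.0]]
labels = [1, 1, 2, 2]
print(distortion(patterns, labels))      # 4.0
print(loo_1nn_errors(patterns, labels))  # 0
```

A lower distortion means tighter clusters, and a lower leave-one-out count means the selected genes separate the known categories better.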

30 Up (Down) Regulated Genes for Gastric Cancers The data come from 5 advanced-stage and 5 early-stage patients with gastric cancer. We find the following genes, which completely discriminate “Advanced Stage” patients from “Early Stage” patients as determined by clinical diagnosis.

31 Dendrogram for Gastric Patients

32 Top 16 Discriminative Genes for Advanced and Early Stages
Index Accession#
15843 AF316855
12994 BF868865
18370 BC002996
2070 AK021788
1118 BC000249
9661 AP000350
2017 U53530
1128 AF035281
8728 AL591713
494 AB014526
10990 L77570
342 BC007848
10425 BG745129
6052 AF073362
170 AK000278
1016 BF526386

33 Thank You http://www.bioinfo.ntu.edu.tw http://www.cs.nthu.edu.tw/~cchen Tel: (02) 2312 3456 ~ 5917 Tel: (02) 2362 5336 ~ 418 Tel: (03) 573 1078

