Yeast Dataset Analysis Hongli Li 91.580 Final Project Computer Science Department UMASS Lowell.


1 Yeast Dataset Analysis Hongli Li 91.580 Final Project Computer Science Department UMASS Lowell

2 Outline
- Gene Ontology Annotation
- Data Preprocessing
- Cluster
- Results
- Conclusion

3 GO Annotations
- Total number of genes: 799
- Genes with GO annotation at level 3 of Biological Process: 327
- Genes with GO annotation but not at level 3: 272
- Genes without GO annotation: 200
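The three groups above partition the dataset, which can be checked with a few lines of arithmetic. This is only a sketch: the counts come from the slide, while the function name `annotation_shares` and the dictionary labels are made up for illustration.

```python
def annotation_shares(groups, total):
    """Return each group's share of the dataset after checking the counts add up."""
    if sum(groups.values()) != total:
        raise ValueError("group counts do not add up to the total")
    return {name: count / total for name, count in groups.items()}


# Counts taken from the slide; the labels are illustrative.
groups = {
    "GO at level 3 of Biological Process": 327,
    "GO but not at level 3": 272,
    "no GO": 200,
}
shares = annotation_shares(groups, total=799)

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Roughly four in ten genes carry a level 3 Biological Process annotation, so the per-cluster GO counts later in the deck are based on a minority of the genes.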

4 GO Annotation

5 GO Annotation
Of the 327 genes with GO annotation at level 3:
- 170 genes belong to GO:0008152, metabolism
- 90 genes belong to GO:0007049, cell cycle
- 81 genes belong to GO:0016043, cell organization and biogenesis
- 51 genes belong to GO:0006810, transport

6 Data Preprocessing
- Dataset: 799 cell cycle-regulated genes
- Filter: keep genes with at least 85% existing (non-missing) values
- Impute missing values using KNN
- Standardize patterns (mean = 0, standard deviation = 1)
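The three preprocessing steps can be sketched in plain Python as follows. This is an illustration under assumptions, not the tool used for the slides: missing values are represented as `None`, the function names, the value of `k`, the distance used to pick neighbours, and the use of the population standard deviation are all choices made here, not details given in the deck.

```python
import math


def filter_genes(rows, min_present=0.85):
    """Keep rows whose fraction of non-missing values is at least min_present."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) >= min_present]


def _distance(a, b):
    """Root-mean-square difference over the columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k=3):
    """Fill each missing value with the column mean over the k nearest rows.

    Assumes every column with a gap is observed in at least one other row.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Only rows that have a value in column j can act as neighbours.
                neighbours = sorted(
                    (_distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and other[j] is not None
                )[:k]
                new_row[j] = sum(v for _, v in neighbours) / len(neighbours)
        filled.append(new_row)
    return filled


def standardize(row):
    """Scale one expression pattern to mean 0 and (population) standard deviation 1."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
    if sd == 0:
        return [0.0 for _ in row]  # flat pattern: nothing to scale
    return [(v - mean) / sd for v in row]


# Toy data with 4 time points, so a 75% threshold is used here instead of 85%.
raw = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, None, 3.0, 4.0],
    [None, None, None, 4.0],
    [2.0, 3.0, 4.0, 5.0],
]
kept = filter_genes(raw, min_present=0.75)
imputed = knn_impute(kept, k=1)
patterns = [standardize(r) for r in imputed]
```

Standardizing each gene separately means the later Euclidean distances compare the shape of the expression patterns rather than their absolute levels.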

7 Cluster
- SOTA: Self-Organizing Tree Algorithm
- Euclidean distance
- Variability threshold: 80%
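The two quantities named on this slide can be illustrated with a short sketch. It assumes that "variability" means the largest pairwise distance between the profiles assigned to one cluster, which is one common definition; the slides do not define it, and they do not say how the 80% threshold is normalized, so the code only computes the raw value. It is not an implementation of SOTA itself, and the function names are made up for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def variability(profiles):
    """Largest pairwise Euclidean distance within one cluster (assumed definition)."""
    return max((euclidean(a, b) for a, b in combinations(profiles, 2)),
               default=0.0)


# Toy cluster of three two-point profiles.
cluster = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
spread = variability(cluster)
```

A growing tree algorithm can use such a measure as a stopping rule: a cluster whose spread is still above the threshold is a candidate for another split.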

8 Result Cluster 61

9 Cluster 61
- 67 of the 799 genes fall in Cluster 61
- 24 of the 67 genes have GO annotation
- 10 of the 24 genes belong to metabolism
- 14 belong to cell cycle
- 8 belong to S phase of mitotic cell cycle
- 8 belong to DNA replication
- 4 belong to G1/S transition of mitotic cell cycle
- Only one gene that belongs to metabolism is not in cell cycle

10 Cluster 60
- 33 genes in this cluster
- 11 of the 33 have GO annotation
- 4 of the 11 genes are in M-phase specific microtubule process, which belongs to cell cycle
- 7 are in organelle organization and biogenesis, which belongs to cell growth and/or maintenance
- 8 in total are in cell cycle

11 Cluster 59
- 38 genes in this cluster
- 15 genes have annotation
- 7 in metabolism
- 5 in cell cycle
- M phase of mitotic cell cycle has 3
- Nuclear division has 3
- No gene is shared between these two classes
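The per-cluster counts from the three result slides can be put side by side with a short script. This is a sketch: the numbers are copied from the slides, while the dictionary layout and variable names are illustrative. It ranks the clusters by the number of annotated cell cycle genes and also reports that number as a fraction of the annotated genes, a different normalization shown only for context.

```python
# Counts copied from the slides for clusters 61, 60 and 59.
clusters = {
    61: {"genes": 67, "annotated": 24, "cell_cycle": 14},
    60: {"genes": 33, "annotated": 11, "cell_cycle": 8},
    59: {"genes": 38, "annotated": 15, "cell_cycle": 5},
}

# Rank clusters by how many annotated genes fall under cell cycle.
by_count = sorted(clusters,
                  key=lambda c: clusters[c]["cell_cycle"],
                  reverse=True)

# Share of annotated genes that fall under cell cycle, per cluster.
fractions = {c: d["cell_cycle"] / d["annotated"] for c, d in clusters.items()}

for c in by_count:
    d = clusters[c]
    print(f"cluster {c}: {d['cell_cycle']} of {d['annotated']} annotated genes "
          f"in cell cycle ({fractions[c]:.1%})")
```

Because only a minority of genes in each cluster are annotated, both the counts and the fractions rest on small samples.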

12 Conclusion & Future Work
- Cluster #61 has the strongest relation to the cell cycle, followed by clusters #60 and #59
- Sub-cluster clusters #59, #60, and #61
- Analyze the gene expression data of genes known to carry GO cell cycle annotations
- Analyze the other clusters
- Repeat the same analysis on the 6000-gene dataset

13 References
1. http://gepas.bioinfo.cnio.es/index.html
2. P. T. Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell, vol. 9, pp. 3273-3297, 1998.
3. Raymond J. Cho et al. A Genome-Wide Transcriptional Analysis of the Mitotic Cell Cycle. Mol. Cell, vol. 2, pp. 65-73, 1998.
4. J. Herrero, A. Valencia, and J. Dopazo. A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, vol. 17, no. 2, pp. 126-136, 2001.
5. Orly Alter et al. Singular value decomposition for genome-wide expression data processing and modeling. PNAS, vol. 97, pp. 10101-10106, 2000.
6. http://www.cellsalive.com/cell_cycle.htm
7. http://www.geneontology.org/
8. http://fatigo.bioinfo.cnio.es/htdocs/helpFatiGO.html

