Final Project Week 3 - 5/7/09 GSEA and Cluster Computing in Protein Research Leon Kay, Yan Tran, Chris Thomas Yan Gary Chris Leon.


1 Final Project Week 3 - 5/7/09 GSEA and Cluster Computing in Protein Research Leon Kay, Yan Tran, Chris Thomas Yan Gary Chris Leon

2 Gene Set Enrichment Analysis
GSEA is a computational method that determines whether a defined set of genes shows statistically significant, concordant differences between two phenotypes.
Three key steps:
– Calculation of the Enrichment Score (ES)
– Estimation of the significance level of the ES
– Adjustment for multiple hypothesis testing
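The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not the Broad tool's implementation: it assumes the genes are already ranked by some per-gene score, uses the weighted running-sum statistic described in the Subramanian paper, and estimates significance by permuting gene sets (random sets of the same size) rather than phenotype labels. The function names and the `(count + 1) / (n + 1)` p-value estimator are our own choices for the sketch, and the multiple-testing adjustment is left out.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Walk down the ranked list, stepping up on gene-set members (weighted
    by |score|**p) and down on non-members; return the running-sum value
    that deviates furthest from zero."""
    members = set(gene_set)
    hits = [g in members for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_total == 0:
        raise ValueError("all gene-set members have a score of zero")

    running = 0.0
    best = 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_total
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def gene_set_permutation_pvalue(ranked_genes, scores, gene_set,
                                n_perm=1000, seed=0):
    """Estimate a nominal p-value by comparing the observed ES with the ES
    of random gene sets of the same size (an assumption of this sketch)."""
    rng = random.Random(seed)
    members = set(gene_set)
    size = sum(1 for g in ranked_genes if g in members)
    observed = enrichment_score(ranked_genes, scores, gene_set)

    null = [
        enrichment_score(ranked_genes, scores, rng.sample(ranked_genes, size))
        for _ in range(n_perm)
    ]
    # Use only the part of the null distribution with the same sign.
    same_sign = [es for es in null if (es >= 0) == (observed >= 0)]
    extreme = sum(1 for es in same_sign if abs(es) >= abs(observed))
    return observed, (extreme + 1) / (len(same_sign) + 1)


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]   # hypothetical gene IDs
    scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]      # hypothetical ranking metric
    es, pval = gene_set_permutation_pvalue(genes, scores, {"g1", "g2"}, n_perm=200)
    print(es, pval)
```

A gene set concentrated at the top of the ranking gives a positive ES, and one concentrated at the bottom gives a negative ES.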

3 Broad Institute GSEA Tool
We tried using the GSEA tool from the Broad Institute, where most of the original work on GSEA was done: http://www.broad.mit.edu/gsea/
It is a Java Web Start app that launches quickly and easily, with lots of online documentation and tutorials. Unfortunately, we ran into some major issues getting our data to work with it.

4 Input to the GSEA Tool

5 Input to the GSEA Tool – Parameters
– Expression dataset: the expression data; in our case, sub-data extracted from clusters using T-MeV
– Gene sets database: databases of gene sets, created by Broad and others, downloadable through the tool from Broad’s website
– Phenotype labels: a separate file holding the label data and related information, in a GSEA-specific format, created from the original data
– Chip platform: the chip data file that matches the platform on which the data set was recorded
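To make the gene sets database input more concrete, here is a minimal sketch of reading a gene set file in the tab-delimited GMT layout (one set per line: set name, description, then the member genes). The function name and the example set names are hypothetical, and real files may need more validation than shown here.

```python
def parse_gmt(text):
    """Parse GMT-style text into {set_name: [gene, ...]}.

    Each non-empty line is expected to hold, separated by tabs:
    the set name, a description, and then one or more gene symbols.
    """
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected name, description, genes")
        name = fields[0]
        genes = [g.strip() for g in fields[2:] if g.strip()]
        gene_sets[name] = genes
    return gene_sets


if __name__ == "__main__":
    # Hypothetical two-set example; the set names are made up.
    example = "SET_A\tfirst example set\tTP53\tBRCA1\nSET_B\tna\tESR1\n"
    for name, genes in parse_gmt(example).items():
        print(name, genes)
```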

6 What is a Phenotype?
Simply put, a phenotype is an observable characteristic of an organism that results from its gene expression, plus possible environmental factors. In our data, the breast cancer classifications can be considered phenotypes, so the phenotype file is created from the breast cancer data using the class labels as phenotypes.
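As a rough illustration of turning class labels into a phenotype file, the sketch below builds the three-line categorical CLS layout that GSEA reads: a header with the sample count, class count and a constant 1; a line listing the class names after a `#`; and one label per sample in the same order as the expression data columns. The function name and the class labels in the example are hypothetical, not the labels from our data.

```python
def make_cls(labels):
    """Build categorical CLS file content from one class label per sample."""
    if not labels:
        raise ValueError("at least one sample label is required")
    for label in labels:
        if not label or any(ch.isspace() for ch in label):
            raise ValueError(f"label {label!r} must be non-empty without whitespace")

    # Unique class names in order of first appearance.
    classes = list(dict.fromkeys(labels))
    header = f"{len(labels)} {len(classes)} 1"
    class_line = "# " + " ".join(classes)
    sample_line = " ".join(labels)
    return "\n".join([header, class_line, sample_line]) + "\n"


if __name__ == "__main__":
    # Hypothetical labels for three samples.
    print(make_cls(["basal", "basal", "luminal"]), end="")
```

The returned string can be written to a `.cls` file and loaded alongside the expression dataset.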

7 Folding@Home
– The most powerful computing cluster in the world
– One of the largest computing clusters as well
– Launched in 2000; managed by the Pande Group within Stanford's Chemistry Department
– Goal is “to understand protein folding, misfolding and related diseases”
– As of May 2009, 63 papers have been published utilizing Folding@Home

8 Folding@Home: Model
– Does not rely on a “supercomputer” for data processing
– Small client application installed on client hardware
– Leverages unused computing power on that hardware
– As of April '09, a peak speed of 4.5 native PFLOPS from an estimated 400,000 machines
– Modern CPUs are now multi-core, so the Pande Group has explored symmetric multiprocessing (SMP) to leverage unused power
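For a sense of scale, the two figures above imply an average contribution per machine. The snippet below is only back-of-the-envelope arithmetic: both inputs are the slide's rounded numbers (a peak speed and an estimated machine count), so the result is a rough mean that hides large differences between individual machines.

```python
def average_flops_per_machine(total_flops, machines):
    """Mean throughput per machine, in FLOPS."""
    if machines <= 0:
        raise ValueError("machines must be positive")
    return total_flops / machines


if __name__ == "__main__":
    peak_flops = 4.5e15   # 4.5 native PFLOPS (slide figure, April 2009)
    machines = 400_000    # estimated machine count (slide figure)
    avg = average_flops_per_machine(peak_flops, machines)
    print(f"{avg / 1e9:.2f} GFLOPS per machine on average")
```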

9 Folding@Home At a Glance

10 References
1) “Folding at Home”, http://folding.stanford.edu/
2) Spanish Inquisition image, http://roflrazzi.com/upcoming/?pid=12265
3) Subramanian, Aravind; Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles; http://mootha.med.harvard.edu/PubPDFs/Subramanian2005.pdf
4) GSEA, http://www.broad.mit.edu/gsea/

