Getting the story – biological model based on microarray data

1 Getting the story – biological model based on microarray data
Once the differentially expressed genes are identified (sometimes hundreds of them), we need to figure out what it all means.
Since we don't know much about the function of most genes, this is not easy.
It is further complicated by the fact that gene function is context-specific: it depends on the tissue, the developmental stage of the organism, and multiple other factors.
"Functional clustering": grouping genes with respect to their known function (ontology).
Establishing the statistical significance of the association between groups of genes identified in the analysis and "Functional clusters".

2 Analyzing Microarray Data
Experimental design: [diagram: a universal control; groups C1, C2, C3 and C4, each with a Not Treated and a Treated sample]
Data normalization: reducing technical variability
Statistical analysis (ANOVA): identifying differentially expressed genes and factoring out sources of variability
Data mining
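
The normalization and ANOVA steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's actual pipeline: it assumes two-color arrays in which each sample is measured against the universal control, uses median-centering of log2 ratios as the normalization, and computes a one-way ANOVA F statistic for a single gene. The multi-factor ANOVA on the slide, which also factors out other sources of variability, is not reproduced here, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def log_ratios(sample, reference):
    """log2(sample / universal control) for every spot on one array.

    Assumption: a two-color design in which each sample is hybridized
    against the universal control; both lists hold background-corrected
    intensities for the same spots, in the same order.
    """
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def median_center(values):
    """Simple per-array normalization: shift the log ratios so their median is 0."""
    m = median(values)
    return [v - m for v in values]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups: list of lists holding the gene's normalized values in each
    condition (e.g. [not_treated, treated]). A larger F means more variation
    between conditions relative to the variation among replicates.
    """
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

For example, with hypothetical normalized values `[0.1, -0.2, 0.0, 0.1]` (not treated) and `[1.0, 1.2, 0.9, 1.1]` (treated), `one_way_anova_f` returns an F of about 120 on (1, 6) degrees of freedom; an F that large relative to the F distribution suggests the gene responds to the treatment.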

3 Data Integration and Interpretation

4 Modeling Microarray Data
Mathematical/statistical models
Computer algorithms/software

5 Regulating Transcription
A transcription factor does not itself need to be transcriptionally regulated: its activity can change without any change in the level of its own mRNA.

6 First step of making a story: statistical significance of a particular "Functional cluster"
Suppose we have analyzed a total of N genes, n of which turned out to be differentially expressed/co-expressed (experimentally identified; call them significant).
Suppose that x of the n significant genes and y of the N total genes were classified into a specific "Functional group".
Q1: Is this "Functional group" significantly associated with our group of significant genes?
Q2: Are significant genes overrepresented in this functional group compared to their overall frequency among all analyzed genes?
Q3: What is the chance of getting x or more significant genes if we randomly draw y of the N genes "out of a hat", assuming each gene remaining in the hat has an equal chance of being drawn?
H0: p(a significant gene belongs to this category) = y/N, i.e. category membership is independent of significance.
Q3A: What is the p-value for rejecting this null hypothesis?
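
Given the gene lists, the four counts N, n, x and y can be computed directly. A minimal Python sketch; the function names are hypothetical, and restricting the significant genes and the category to the analyzed genes is an assumption (annotation databases usually list genes that were never measured on the array). The expected count y·n/N is the mean number of significant genes among y genes drawn at random under H0.

```python
def enrichment_counts(all_genes, significant, category):
    """Return the slide's N, n, x, y for one functional group.

    all_genes:   every gene analyzed on the arrays
    significant: genes found differentially expressed / co-expressed
    category:    genes annotated to the functional group
    Assumption: genes outside all_genes are ignored, because annotation
    databases usually include genes that were never measured.
    """
    universe = set(all_genes)
    sig = set(significant) & universe
    cat = set(category) & universe
    return {"N": len(universe), "n": len(sig), "x": len(sig & cat), "y": len(cat)}


def expected_in_category(N, n, y):
    """Expected number of significant genes among y genes drawn at random (under H0)."""
    return y * n / N
```

For a hypothetical 10-gene example, `enrichment_counts([f"g{i}" for i in range(1, 11)], ["g1", "g2", "g3", "g4"], ["g1", "g2", "g3", "g7", "g99"])` returns `{"N": 10, "n": 4, "x": 3, "y": 4}` (g99 is annotated but was not measured, so it is dropped). Three significant genes fall in the group where `expected_in_category(10, 4, 4)`, i.e. 1.6, would be expected by chance; whether that excess is significant is exactly what Q3A asks.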

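7 Statistical significance of a particular "Functional cluster" (cont.)
[Figure, two panels of gene boxes g1 ... gN, with significant genes drawn as red boxes. "Observed": g1 ... gx are red boxes inside the functional group (blue border), g(x+1) ... gy are in the group but not red, g(y+1) ... g(n+y-x) are the remaining red boxes, and g(n+y-x+1) ... gN are neither. "Removing Functional Classification": red boxes g1 ... gn and non-red boxes g(n+1) ... gN, with no blue borders.]
Q: If we randomly draw y boxes and color their borders blue, what is the chance of drawing x or more red ones?
Outcome (o1, ..., oT): a set of y genes selected from the list of N genes.
Event of interest (E): the set of all outcomes for which the number of red boxes among the y boxes drawn equals x.
Since drawing is random, all outcomes are equally probable.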
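
The box-drawing thought experiment can be simulated directly, which is a useful sanity check on the exact calculation. A minimal Python sketch: red boxes stand for the n significant genes, and each trial gives y randomly chosen boxes a blue border. The function name, trial count and seed are arbitrary choices for illustration; this estimates the probability by simulation and is not how the exact test is computed.

```python
import random


def simulate_tail(N, n, y, x, trials=100_000, seed=0):
    """Monte Carlo estimate of P(at least x red boxes among y randomly drawn boxes).

    Red boxes stand for the n significant genes out of N; each trial draws
    y boxes without replacement (the ones whose border gets colored blue).
    """
    rng = random.Random(seed)
    boxes = [True] * n + [False] * (N - n)   # True = red (significant gene)
    hits = 0
    for _ in range(trials):
        drawn = rng.sample(boxes, y)          # y distinct boxes, every set equally likely
        if sum(drawn) >= x:                   # count the red boxes among them
            hits += 1
    return hits / trials
```

With N = 10, n = 4, y = 4 and x = 3, `simulate_tail(10, 4, 4, 3)` should land close to the exact value 25/210 ≈ 0.119; the more trials, the closer the estimate.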

8 Statistical significance of a particular "Functional cluster" (cont.)
Outcome (o1, ..., oT): a set of y genes selected from the list of N genes.
Event of interest (E): the set of all outcomes for which the number of red boxes among the y boxes drawn equals x.
Since all T outcomes are equally probable, P(E) = M/T, so all we have to do is calculate M and T, where:
T = the number of different sets of y genes we can draw out of a total of N genes
M = the number of different ways to obtain x red boxes (significant genes) when drawing y boxes (genes) out of a total of N boxes (genes), n of which are red (significant)
$$T = \binom{N}{y} = \frac{N!}{y!\,(N-y)!}$$
(a binomial coefficient, because the order in which we pick genes does not matter)
$$M = \binom{n}{x}\binom{N-n}{y-x}$$
(first pick x of the n red boxes; for each such set of x red boxes, pick a set of y - x of the N - n non-red boxes)
$$P(E) = \frac{M}{T} = \frac{\binom{n}{x}\binom{N-n}{y-x}}{\binom{N}{y}}$$
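
The counting argument for M and T can be checked in Python by comparing the closed-form formula with a brute-force listing of every equally likely outcome. A minimal sketch with hypothetical function names; the enumeration is only practical for small N and assumes 0 ≤ x ≤ y.

```python
from itertools import combinations
from math import comb


def prob_exactly(N, n, y, x):
    """P(E) = M / T: probability that exactly x of the y drawn genes are significant."""
    T = comb(N, y)                        # ways to choose y genes out of N
    M = comb(n, x) * comb(N - n, y - x)   # x of the n red boxes, y - x of the N - n others
    return M / T


def prob_exactly_by_enumeration(N, n, y, x):
    """Same probability, found by listing every equally likely outcome (small N only)."""
    red = set(range(n))                   # genes 0..n-1 play the role of significant genes
    outcomes = list(combinations(range(N), y))
    favorable = sum(1 for o in outcomes if len(red.intersection(o)) == x)
    return favorable / len(outcomes)
```

For N = 10, n = 4, y = 4, `prob_exactly(10, 4, 4, 3)` gives 24/210 ≈ 0.114 (M = 4 · 6, T = 210), and the two functions agree for every x from 0 to 4.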

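9 Statistical significance of a particular "Functional cluster": p-value
Fisher's exact test, or the "hypergeometric" test.
P-value: the probability of observing x or more significant genes in the functional group under the null hypothesis. With X the number of significant genes among the y drawn,
$$p = P(X \ge x) = \sum_{i=x}^{\min(n,y)} \frac{\binom{n}{i}\binom{N-n}{y-i}}{\binom{N}{y}}$$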
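
The p-value sum can be computed exactly with integer binomial coefficients. A minimal Python sketch using only the standard library; the function name is hypothetical.

```python
from math import comb


def enrichment_p_value(N, n, y, x):
    """One-sided p-value P(X >= x): y genes drawn from N, n of which are significant."""
    T = comb(N, y)
    # Sum the counts for x, x + 1, ..., min(n, y) significant genes in the group.
    # comb() returns 0 when asked to choose more items than exist, so impossible
    # terms drop out on their own.
    tail = sum(comb(n, i) * comb(N - n, y - i) for i in range(max(x, 0), min(n, y) + 1))
    return tail / T
```

For N = 10, n = 4, y = 4 and x = 3 this gives (24 + 1)/210 ≈ 0.119. If SciPy is available, `scipy.stats.hypergeom.sf(x - 1, N, n, y)` should return the same tail probability (SciPy's argument order is population size, number of successes in the population, number of draws), and `scipy.stats.fisher_exact([[x, y - x], [n - x, N - n - y + x]], alternative="greater")` gives the equivalent one-sided Fisher's exact test p-value.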

10 381 genes that were differentially expressed after treating a cell line with three different carcinogens: Dex, E2 and irradiation
[Figure: conditions Dex_Day1, Dex_Day2, Dex_Day3, E2_Day4, E2_Day7, E2_Day10, Irr_Day1, Irr_Day2, Irr_Day3]

11 Up

12 Finding important functional groups for up-regulated genes
Using the "Ease" annotation tool (http://david.niaid.nih.gov/david/), we obtained the following significant gene ontologies: Up_DexANDNE2ANDirr_381_GO.htm
Homework:
1) Download and install Ease.
2) Select the top 20 most significantly up-regulated genes in our W-C dataset (using the three-way ANOVA analysis) and identify significantly over-represented categories.
3) Repeat the analysis with the top 30, 40, 50 and 100 up-regulated and down-regulated genes.
4) Prepare questions for the next class regarding problems you run into.
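
The homework workflow (rank the genes, take the top k, test every category) can be sketched in Python. This is not necessarily how Ease computes its scores; it is a minimal sketch assuming the genes are already ranked (for example by the three-way ANOVA) and that category annotations are available as a dictionary. The function names, the alpha = 0.05 threshold, and skipping categories with x = 0 are illustrative assumptions.

```python
from math import comb


def enrichment_p_value(N, n, y, x):
    """P(X >= x) when y genes are drawn at random from N genes, n of them significant."""
    tail = sum(comb(n, i) * comb(N - n, y - i) for i in range(max(x, 0), min(n, y) + 1))
    return tail / comb(N, y)


def enriched_categories(ranked_genes, categories, top_k, alpha=0.05):
    """Test every category against the top_k genes of a ranked list.

    ranked_genes: all analyzed genes, most significant first (e.g. by ANOVA p-value).
    categories:   dict mapping a category name to the genes annotated with it.
    Returns (name, x, y, p) for categories with p < alpha, smallest p first.
    """
    universe = set(ranked_genes)
    significant = set(ranked_genes[:top_k])
    N, n = len(universe), len(significant)
    hits = []
    for name, members in categories.items():
        members = set(members) & universe   # ignore annotated genes that were not analyzed
        y = len(members)
        x = len(members & significant)
        if x == 0:
            continue                         # no significant genes, nothing to be overrepresented
        p = enrichment_p_value(N, n, y, x)
        if p < alpha:
            hits.append((name, x, y, p))
    return sorted(hits, key=lambda hit: hit[3])
```

For the homework one would call it once per cutoff, e.g. `for k in (20, 30, 40, 50, 100): enriched_categories(ranked_up, go_categories, k)`, where `ranked_up` and `go_categories` are hypothetical stand-ins for your own ranked list and annotations, and then repeat with the down-regulated ranking. Because many categories are tested at once, some small p-values are expected by chance alone, so a multiple-testing correction such as Bonferroni is usually worth applying before interpreting the list.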

