Gene Ontology as a tool for the systematic analysis of large-scale gene-expression data Stefan Bentink Joint groupmeeting Klipp/Spang 11-20-2002.

1 Gene Ontology as a tool for the systematic analysis of large-scale gene-expression data Stefan Bentink Joint groupmeeting Klipp/Spang 11-20-2002

2 Overview: Microarrays and the Gene Ontology (GO) database; scoring differential gene expression in GO groups; checking scores against different null hypotheses; sample data (two types of breast cancer) and results

3 Overview: Microarrays and the Gene Ontology (GO) database; scoring differential gene expression in GO groups; checking scores against different null hypotheses; sample data (two types of breast cancer) and results

4 Microarrays: sample scheme. [Figure: genes A, B, C and D; only B and C are transcribed into mRNA (differential gene expression). RNA isolation and synthesis of cDNA with labeled nucleotides (reverse transcription) yields labeled cDNA for B and C, which is hybridized to an array carrying A, B, C and D.] Fluorescence indicates that gene B and gene C are transcribed.

5 Microarrays: comparative analysis. For each gene (gene 1, gene 2, gene 3, ...), the mean over the samples of tissue I (1, 2, ...) is compared with the mean over the samples of tissue II (1, 2, ...) => one t-value per gene. Ranking?
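The per-gene comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: it assumes the pooled-variance (Student) form of the two-sample t-test, and the `expression` values and the helper name `t_value` are made up for the example.

```python
import math
from statistics import mean, variance

def t_value(group1, group2):
    """Pooled-variance two-sample t-statistic (assumed variant of the t-test)."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical expression values: gene -> (tissue I samples, tissue II samples)
expression = {
    "gene 1": ([5.1, 4.9, 5.3], [5.0, 5.2, 4.8]),
    "gene 2": ([7.9, 8.1, 8.0], [6.0, 6.2, 5.8]),
    "gene 3": ([3.0, 3.4, 3.2], [3.9, 4.1, 4.0]),
}

# One t-value per gene, then rank genes by the size of the difference.
t_values = {gene: t_value(a, b) for gene, (a, b) in expression.items()}
ranking = sorted(t_values, key=lambda g: abs(t_values[g]), reverse=True)
```

Ranking by absolute t-value puts the genes with the strongest difference between the two tissues first, regardless of the direction of the change.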

6 How to interpret the data? Long list of significant genes: which genes are of interest? Solution: pooling genes into functional classes provides a general overview. The Gene Ontology database provides such a functional classification.

7 The Gene Ontology database

8 GO is a database of terms for genes. Known genes are annotated to the terms. Terms are connected as a directed acyclic graph. Levels represent the specificity of the terms.

9 The Gene Ontology database. [Figure: excerpt of the GO graph with the terms Gene Ontology, Molecular function, Apoptosis regulator, Enzyme activator, Apoptosis activator, Protease activator and Apoptotic protease activator.]

10 The Gene Ontology database. Every child-term is a member of its parent-term. GO contains three different sub-ontologies: molecular function, biological process and cellular component. Unique identifier for every term, e.g. GO:0003673 (root = Gene Ontology).
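Because every child-term is a member of its parent-term, a gene annotated to a specific term also belongs to every more general term above it. The sketch below shows one way to build GO-groups from direct annotations. The edges in `parents` are an assumed reading of the example terms on the previous slide, and the gene names and function names are hypothetical.

```python
# ASSUMPTION: child term -> parent terms, as read from the example graph.
parents = {
    "apoptotic protease activator": ["apoptosis activator", "protease activator"],
    "apoptosis activator": ["apoptosis regulator"],
    "protease activator": ["enzyme activator"],
    "apoptosis regulator": ["molecular function"],
    "enzyme activator": ["molecular function"],
    "molecular function": ["Gene Ontology"],
    "Gene Ontology": [],
}

def ancestors(term, parents):
    """All terms reachable by following parent links (the term itself excluded)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(direct_annotations, parents):
    """Map every term to the genes annotated to it or to any of its descendants."""
    groups = {}
    for gene, terms in direct_annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                groups.setdefault(t, set()).add(gene)
    return groups

# Hypothetical direct annotations
direct_annotations = {
    "geneX": ["apoptotic protease activator"],
    "geneY": ["enzyme activator"],
}
groups = propagate(direct_annotations, parents)
```

A term reached over two different paths in the directed acyclic graph is only counted once, because `ancestors` collects terms in a set.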

11 Gene Ontology and microarrays. Hypothesis: functionally related, differentially expressed genes should accumulate in the corresponding GO-group. Problem: find a method that scores the accumulation of differential gene expression in a node of the Gene Ontology.

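12 Gene Ontology and microarrays. [Figure: expression matrix of genes (rows) by samples (columns, tissue type 1 and 2), with the genes mapped to the GO-groups GO:1, GO:2, GO:3 and GO:4.] A p-value is computed for every gene by a two-sample t-test.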

13 Overview: Microarrays and the Gene Ontology (GO) database; scoring differential gene expression in GO groups; checking scores against different null hypotheses; sample data (two types of breast cancer) and results

14 GO: Scoring methods. (1) Number of significant genes in a GO-group. (2) Sum of the negative logarithms of all p-values (Σ -log P). (3) sup|P(n) - F(n)| according to Kolmogorov-Smirnov.

15 The p-value. cdf: cumulative distribution function of the t-value. p = cdf(t); if t > 0, p = 1 - cdf(t) => p ∈ (0, 0.5]. m = 2*p => m ∈ (0, 1].
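The mapping from a t-value to p and then to m = 2*p can be written out as follows. This is only a sketch: the Python standard library has no cdf for the t distribution, so the standard normal cdf is used here as a stand-in, which is an assumption and not what the slides specify. The function names are hypothetical.

```python
from statistics import NormalDist

def one_sided_p(t, cdf):
    """p = cdf(t) for t <= 0 and p = 1 - cdf(t) for t > 0, so p is at most 0.5."""
    return cdf(t) if t <= 0 else 1.0 - cdf(t)

def two_sided_m(t, cdf):
    """m = 2 * p, so m is at most 1."""
    return 2.0 * one_sided_p(t, cdf)

# ASSUMPTION: the standard normal cdf stands in for the cdf of the t-value.
cdf = NormalDist().cdf

m_negative = two_sided_m(-2.0, cdf)
m_positive = two_sided_m(2.0, cdf)
m_zero = two_sided_m(0.0, cdf)
```

Passing the cdf as an argument keeps the two formulas independent of the distribution, so a t-distribution cdf from a statistics library can be swapped in without changing them.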

16 Sum of log-score (Pavlidis, Lewis, Noble 2001; Zien, Küffner, Zimmer, Lengauer 2000). As 2*p -> 1, -log(2*p) -> 0; small p-values give a high score.
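A minimal sketch of the sum of log-score for one GO-group, taking the doubled p-values (m = 2*p) of its genes as input. The natural logarithm is an assumption, since the slide does not state the base, and the example values are made up.

```python
import math

def sum_log_score(m_values):
    """Sum of -log(m) over all genes in a GO-group (natural log assumed)."""
    return sum(-math.log(m) for m in m_values)

# Hypothetical doubled p-values for two GO-groups
group_with_signal = [0.001, 0.01, 0.2]
group_without_signal = [0.9, 0.8, 1.0]

score_signal = sum_log_score(group_with_signal)
score_no_signal = sum_log_score(group_without_signal)
```

A gene with m = 1 contributes nothing to the score, while each small p-value adds a large positive term, so many moderately small p-values can add up to a high score.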

17 Kolmogorov-Smirnov score. Hypothesis: the calculated p-values (multiplied by 2) are uniformly distributed between 0 and 1. [Figure: empirical versus theoretical distribution of the values on the interval from 0 to 1.] S = sup|P(n) - F(n)|. P(n): p-values for genes that fall into a GO-group. F(n): uniformly distributed values between 0 and 1.
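The score S can be sketched as the standard one-sample Kolmogorov-Smirnov statistic against the uniform distribution on [0, 1]. Whether the slides use exactly this form of the statistic is an assumption, and the input values below are made up.

```python
def ks_score(m_values):
    """Largest distance between the empirical cdf of the values and the
    cdf of the uniform distribution on [0, 1], which is F(x) = x."""
    xs = sorted(m_values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical cdf jumps from (i - 1) / n to i / n at x,
        # so both sides of the step are compared with F(x) = x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

evenly_spread = ks_score([0.125, 0.375, 0.625, 0.875])
piled_up_near_zero = ks_score([0.01, 0.02, 0.03])
```

Values spread evenly over the interval give a small score, while values piled up near 0 give a score close to 1.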

18 Overview: Microarrays and the Gene Ontology (GO) database; scoring differential gene expression in GO groups; checking scores against different null hypotheses; sample data (two types of breast cancer) and results

19 Null hypotheses. (1) The significant genes (according to Bonferroni: α = 0.05/n) are distributed over the GO-groups by chance. (2) The existing differential gene expression is distributed over the GO-groups by chance. (3) There is no differential gene expression in a GO-group.
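Counting the significant genes of a GO-group under the Bonferroni threshold α = 0.05/n could look like this. The sketch assumes that n is the number of genes tested and that a p-value strictly below the threshold counts as significant; the slides state neither detail, and the numbers are made up.

```python
def count_significant(p_values, n_tests, alpha=0.05):
    """Number of p-values below the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return sum(1 for p in p_values if p < threshold)

# Hypothetical p-values of the genes in one GO-group, out of 1000 tested genes
group_p_values = [1e-7, 5e-6, 1e-3, 0.4]
n_significant = count_significant(group_p_values, n_tests=1000)
```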

20 Checking H0 by permutation (matrix of genes by samples). Permutation of rows: the mapping of p-values into GO-groups is randomized; H0: distribution of differential gene expression. Permutation of columns: the level of the p-values is randomized; H0: no differential gene expression in a GO-group.
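The column permutation can be sketched as shuffling the tissue labels of the samples and recomputing the score of a GO-group from scratch. Everything below is illustrative: the data are made up, the sum of -log(2*p) is used as the score, and the p-values come from a normal approximation to the t distribution, which is an assumption made to stay within the Python standard library.

```python
import math
import random
from statistics import mean, variance

def t_value(a, b):
    """Pooled-variance two-sample t-statistic."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))

def m_value(t):
    """Doubled p-value. ASSUMPTION: normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

def group_score(matrix, labels, group_genes):
    """Sum of -log(m) over the genes of one GO-group for a given labelling."""
    score = 0.0
    for gene in group_genes:
        row = matrix[gene]
        a = [x for x, lab in zip(row, labels) if lab == 1]
        b = [x for x, lab in zip(row, labels) if lab == 2]
        score += -math.log(m_value(t_value(a, b)))
    return score

def column_permutation_background(matrix, labels, group_genes, n_perm=1000, seed=0):
    """Scores of the group after shuffling the sample labels (columns)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    background = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        background.append(group_score(matrix, shuffled, group_genes))
    return background

# Hypothetical data: gene -> expression values, one per sample
matrix = {
    "g1": [5.0, 5.4, 4.8, 5.2, 7.1, 6.9, 7.3, 6.7],
    "g2": [2.0, 2.3, 1.9, 2.2, 3.0, 3.3, 2.8, 3.1],
    "g3": [4.0, 4.5, 4.2, 3.9, 4.1, 4.4, 4.0, 4.3],
}
labels = [1, 1, 1, 1, 2, 2, 2, 2]   # tissue type of each sample
group = ["g1", "g2"]                # genes of one hypothetical GO-group

observed = group_score(matrix, labels, group)
background = column_permutation_background(matrix, labels, group)
```

Shuffling the labels destroys any real difference between the tissues, so the background shows which scores arise when there is no differential expression at all.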

21 Checking H0 by permutation. 1000 random permutations => background distributions. H0: distribution of significant genes => randomizing GO-groups (rows). H0: distribution of all p-values => randomizing GO-groups (rows). H0: level of p-values => permutation of columns.
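For the row permutation, the per-gene values stay the same and only their assignment to genes, and therefore to GO-groups, is shuffled. A minimal sketch with made-up values, using the sum of -log as the score; the function and variable names are hypothetical.

```python
import math
import random

def sum_log_score(m_values):
    """Sum of -log(m) over the genes of a GO-group (natural log assumed)."""
    return sum(-math.log(m) for m in m_values)

def row_permutation_background(m_by_gene, group_genes, score, n_perm=1000, seed=0):
    """Scores of the group after randomly reassigning the values to genes."""
    rng = random.Random(seed)
    genes = list(m_by_gene)
    values = [m_by_gene[g] for g in genes]
    background = []
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(genes, values))
        background.append(score([shuffled[g] for g in group_genes]))
    return background

# Hypothetical doubled p-values for 20 genes; g0, g1 and g2 form one GO-group
m_by_gene = {f"g{i}": (i + 1) / 20 for i in range(20)}
m_by_gene.update({"g0": 0.001, "g1": 0.002, "g2": 0.005})
group = ["g0", "g1", "g2"]

observed = sum_log_score([m_by_gene[g] for g in group])
background = row_permutation_background(m_by_gene, group, sum_log_score)
```

The same function works for any score that takes a list of values, so the Kolmogorov-Smirnov score or the count of significant genes can be passed in place of `sum_log_score`.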

22 Methods (summary). Data => p-values => scores: number of significant genes, sum of -log P, sup|P(n) - F(n)|. Check against 1000 permutations of rows (GO-groups). Check against 1000 permutations of columns (samples => level of p-values).

23 Overview: Microarrays and the Gene Ontology (GO) database; scoring differential gene expression in GO groups; checking scores against different null hypotheses; sample data (two types of breast cancer) and results

24 Results: Data (Breast Cancer). Two major subclasses: estrogen receptor positive (ER+) and estrogen receptor negative (ER-). Estrogen receptor positive: susceptible to Tamoxifen; slightly better survival rate. Great molecular differences between the two types.

25 Results: Data (Breast Cancer). Data: 25 ER+, 24 ER-. Array: Affymetrix HuGeneFL; ~7000 genes, ~4000 annotated to GO-terms. Data were normalized by variance stabilization (Heydebreck et al. 2001).

26 Results: Pre-conditions. A GO-group is considered significant if fewer than 5% of the random permutations exceed its score. Only GO-groups with more than 5 and fewer than 1000 genes were taken into account.
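The two pre-conditions translate directly into code. The sketch below assumes the background is a plain list of permutation scores for one GO-group; the function names are hypothetical.

```python
def empirical_p(observed, background):
    """Fraction of permutation scores that exceed the observed score."""
    return sum(1 for b in background if b > observed) / len(background)

def is_significant(observed, background, level=0.05):
    """True if fewer than 5% of the permutations exceed the observed score."""
    return empirical_p(observed, background) < level

def is_eligible(group_genes, min_size=5, max_size=1000):
    """True for GO-groups with more than 5 and fewer than 1000 genes."""
    return min_size < len(group_genes) < max_size

# Hypothetical background of 100 permutation scores: 0, 1, ..., 99
background = list(range(100))
strong_group = is_significant(97, background)   # 2 of 100 scores are higher
weak_group = is_significant(90, background)     # 9 of 100 scores are higher
```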

27 Results: Number of significant genes. According to the pre-conditions, 16 GO-groups were found.

28 Results: Permutation of rows (distribution hypothesis). [Figure panels: Sum of -log P; Kolmogorov-Smirnov.]

29 Results: Permutation of columns (differential gene-expression hypothesis). [Figure panels: Sum of -log P; Kolmogorov-Smirnov.]

30 Results. The column-permutation leads to a very low background distribution => many "significant" GO-groups; this may help to find functional groups without differential gene expression. Different scoring methods seem to be complementary, as indicated by the results of the row-permutation.

31 Results: Permutation of the rows. Sum of log: 44 GO-groups were found (5% cond.,...). KS-score: 77 GO-groups were found (5% cond.,...). Example: GO:0000087, M-Phase of mitotic cell-cycle (37 genes).

32 Results: Comparing the scoring methods (from the row-permutation). A: counting of significant genes in GO-groups; B: Kolmogorov-Smirnov; C: sum of logarithms. [Venn diagram] A: 16; B: 77; C: 43; A and B: 3; A and C: 13; C and B: 13; A, B and C: 3; C without A: 30; B without A: 74.

33 Browsing the results

34 Results: Interesting GO-term (M-Phase). Contains a couple of interesting proliferative genes (p-value ~5*10^-4 => "not significant"). E.g. polo-like kinase: t-value -3.45; p-value 5.59*10^-4; would not have been found by a single-gene approach; a correlation with the estrogen receptor (ER) could be found in the literature (Wolf et al., 2000).

35 Summary/outlook. GO provides a general view on large-scale gene-expression data. Less deregulated but very interesting genes could be found. Third null hypothesis => differential gene expression over a wide range of genes (outlook: which GO-groups contain no differential gene expression). No bias of scores by top-level genes (outlook: leaving out top-level genes for scoring). Possible modification of the scoring methods: up- and downregulation.

