Microarray statistical validation and functional annotation.

Presentation transcript:

1 Microarray statistical validation and functional annotation

2 Microarrays
DNA microarray technology is a high-throughput method for gaining information on gene function.
It is based on gene sequences arrayed on a solid surface, which allows the parallel expression analysis of thousands of genes.

3 Microarrays
Microarrays can be a valuable tool:
– to define transcriptional signatures associated with a pathological condition
– to work out molecular mechanisms tightly linked to transcription
Since our current knowledge of gene function in higher eukaryotes is quite limited:
– microarray analysis frequently does not give a final answer to a biological problem, but it reveals new research paths that allow the problem to be explored from a different perspective

4 Microarrays
A gold-standard methodology for identifying biologically meaningful differentially expressed genes with high sensitivity and precision is not yet available.
– Therefore, various approaches are under development to optimize the extraction of data linked to the biology of the problem under study.

5 Microarrays
The principal steps of a microarray analysis are:
– Gene intensity measurement and data normalization.
– Statistical validation of differential expression.
– Functional data mining.

6 Microarrays
Statistical validation usually requires the user to select statistical significance parameters. For example:
– SAM (Significance Analysis of Microarrays) requires a delta value as input, which sets the threshold for false positives in the validated dataset.
If the statistical validation is too stringent, biologically meaningful genes can be lost, making it more difficult to work out functional correlations between the differentially expressed genes.
If the statistical validation is too loose, the increase in false positives creates background noise from which it is difficult to extract trustworthy functional correlations between the differentially expressed genes.
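The stringency trade-off on slide 6 can be illustrated with a minimal sketch. This is not the SAM program itself: it is a simplified, SAM-style selection in which each gene gets a relative-difference score, the expected scores come from permuting the sample labels, and a gene is called when its observed score departs from the expected one by more than delta. The function names, the `s0` constant, the per-gene cutoff rule and the simulated data are all assumptions made for illustration.

```python
import random
from statistics import mean


def d_stat(a, b, s0):
    """SAM-style relative difference for one gene (a, b: replicate values)."""
    n1, n2 = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s = ((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)) ** 0.5  # pooled standard error
    return (ma - mb) / (s + s0)  # s0 > 0 damps genes with tiny variance


def sam_like_calls(data, n_a, delta, s0=0.1, n_perm=200, seed=0):
    """data: {gene: [values]}; the first n_a values are condition A, the rest B.

    Simplified rule (assumption): call a gene when |observed - expected| > delta,
    comparing order statistics of the observed and permuted scores.
    """
    rng = random.Random(seed)
    genes = list(data)
    n_cols = len(data[genes[0]])
    observed = sorted((d_stat(data[g][:n_a], data[g][n_a:], s0), g) for g in genes)
    expected = [0.0] * len(genes)
    for _ in range(n_perm):
        cols = list(range(n_cols))
        rng.shuffle(cols)  # permute the sample labels
        perm_d = sorted(
            d_stat([data[g][c] for c in cols[:n_a]],
                   [data[g][c] for c in cols[n_a:]], s0)
            for g in genes
        )
        expected = [e + d / n_perm for e, d in zip(expected, perm_d)]
    return [g for (d, g), e in zip(observed, expected) if abs(d - e) > delta]


# Simulated data (illustrative only): 20 genes, 3 replicates per condition.
rng = random.Random(1)
data = {f"gene{i}": [rng.gauss(0, 1) for _ in range(6)] for i in range(20)}
for i in range(3):  # make three genes clearly higher in condition A
    data[f"gene{i}"][:3] = [x + 5 for x in data[f"gene{i}"][:3]]

loose = sam_like_calls(data, n_a=3, delta=0.5)
strict = sam_like_calls(data, n_a=3, delta=3.0)
print(len(loose), "genes called at delta=0.5;", len(strict), "at delta=3.0")
```

With this rule, every gene called at the larger delta is also called at the smaller one, so raising delta can only shrink the validated list.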

7 Microarrays

8 Microarrays

9 Microarrays
Statistical validation requires the user to select statistical significance parameters. For example:
– SAM (Significance Analysis of Microarrays) requires the definition of a delta value, which sets the threshold for false positives in the validated dataset.
– When Fisher's test is used, defining a threshold value is even harder.

10 Microarrays

11 Microarrays
It is important to note that:
– Statistical validation does not always select the most biologically meaningful dataset.
Therefore, we are trying to integrate biologically important parameters, such as Gene Ontology, into the statistical validation.

12 Microarrays
Gene Ontology (GO) is a dynamic controlled vocabulary that can be applied to all organisms, even as knowledge of gene and protein roles in cells accumulates and changes.
GO might help to link differentially expressed genes to specific functional classes.

13 Microarrays
Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase.

14 Microarrays
Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions.

15 Microarrays
Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex.

16 Microarrays
It has recently been shown that:
The size and overlap of the gene lists that result from varying the gene selection method are highly unstable.
(Hosack et al., Genome Biology 2003, 4:P4)

17 Microarrays
The percentage of genes overlapping in any two lists was highly variable, ranging from 7% to 60%.
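A pairwise overlap of this kind can be computed as in the sketch below. The transcript does not say how the percentage was defined, so the sketch assumes "shared genes as a percentage of the shorter list"; the method names and gene identifiers are hypothetical.

```python
from itertools import combinations


def percent_overlap(list_a, list_b):
    """Shared genes as a percentage of the shorter list (an assumed definition)."""
    a, b = set(list_a), set(list_b)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def pairwise_overlaps(gene_lists):
    """Map each pair of method names to the percent overlap of their gene lists."""
    return {
        (m1, m2): percent_overlap(gene_lists[m1], gene_lists[m2])
        for m1, m2 in combinations(sorted(gene_lists), 2)
    }


# Hypothetical gene lists from three selection methods (illustrative only).
gene_lists = {
    "method_A": ["g1", "g2", "g3", "g4"],
    "method_B": ["g3", "g4", "g5"],
    "method_C": ["g1", "g6", "g7", "g8", "g9"],
}
for pair, pct in pairwise_overlaps(gene_lists).items():
    print(pair, f"{pct:.0f}%")
```

Other definitions, such as the intersection divided by the union, would give different percentages for the same lists.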

18 Microarrays
In spite of this striking variation:
The top five biological themes linked to the data sets are the same.
This evidence suggests that converting genes to themes allows the "biological result" of the experiment to be determined despite substantial differences in gene list content resulting from the use of various normalization, gene intensity and statistical selection methods.
(Hosack et al., Genome Biology 2003, 4:P4)

19 Microarrays

20 Microarrays
Integrating GO in statistical validation:
– The GO classes are counted in the data set under statistical validation.
– SAM analyses are performed using various delta parameters.
– The GO classes present in the statistically validated subsets are counted.
– The enrichment of GO classes in the SAM-validated sets is evaluated using a binomial test corrected for Type I errors. A score for each GO class is generated by computing log2(p-value * % hits).
– The SAM subset showing the best compromise between the number of enriched GO classes and the number of hits for each class is selected for further studies.
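The enrichment and scoring steps on slide 20 can be sketched as follows. Several details are assumptions, because the slides do not specify them: the binomial test is taken as one-sided (upper tail), the Type I error correction is taken to be Bonferroni, and "% hits" is read as the percentage of a class's genes in the starting data set that survive validation. The function names and the counts are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def go_enrichment_scores(background_counts, subset_counts, n_background, n_subset):
    """Return {GO class: (corrected p-value, % hits, score)} for one validated subset.

    background_counts: genes per GO class in the starting data set
    subset_counts:     genes per GO class in the SAM-validated subset
    """
    n_tests = sum(1 for hits in subset_counts.values() if hits > 0)
    results = {}
    for go_class, hits in subset_counts.items():
        if hits == 0:
            continue  # a class with no hits cannot be enriched
        expected_freq = background_counts[go_class] / n_background
        p_value = binomial_upper_tail(hits, n_subset, expected_freq)
        p_adj = min(1.0, p_value * n_tests)  # Bonferroni correction (assumption)
        pct_hits = 100.0 * hits / background_counts[go_class]
        product = p_adj * pct_hits
        score = math.log2(product) if product > 0 else float("-inf")
        results[go_class] = (p_adj, pct_hits, score)
    return results


# Hypothetical counts: 1000 genes in the starting data set, 40 validated genes.
background = {"immune response": 10, "metabolism": 50}
validated = {"immune response": 5, "metabolism": 2}
for go_class, (p_adj, pct, score) in go_enrichment_scores(
    background, validated, n_background=1000, n_subset=40
).items():
    print(f"{go_class}: p={p_adj:.2g}, hits={pct:.0f}%, score={score:.2f}")
```

Repeating this for the subset obtained at each delta value gives the per-class scores from which a subset could then be chosen, as described in the last bullet of the slide.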

21 CONCORDANT MORPHOLOGIC AND GENE EXPRESSION DATA SHOW THAT A VACCINE FREEZES HER-2/neu PRENEOPLASTIC LESIONS
Figure panels: atypical hyperplasia and in situ carcinomas (10 wks); lobular carcinoma (22 wks); cured mammary gland (22 wks).
(Quaglino et al., submitted)

22 Microarrays
Figure axis: log2(p-value * % hits)

23 Microarrays
We observed that:
– Simple statistical validation and GO-mediated statistical validation overlap strongly.
However, some interesting differentially expressed genes can be detected only by GO-mediated statistical validation.

24 Figure (panels a, b, c, d, e; color scale: -3.0, 1:1, 3.0)
Ig-linked immune response: common to simple statistical analysis and GO-mediated statistical validation.
Cell-linked immune response: specific to GO-mediated statistical validation.

25 We also observed that the previously described approach can be used to improve data mining related to the transcriptional signature present in co-regulated genes.
Flowchart elements, steps (a) to (n): Starting dataset (SD); SAM program; Subsets of SAM-validated genes (SSVG); Consensus program; Alignment matrices (AMs); Patser program; decision "Run SAM with at least 3 different thresholds?"; min(AM-specific p-value); filtering by AM-specific p-value; decision "Is any AM over-represented in SSVG?"; outcomes: Selected SSVG, Discard.

