
Daniel Rico, PhD. ::: Introduction to Functional Analysis. Course on Functional Analysis, Bioinformatics Unit.


1 Daniel Rico, PhD. drico@cnio.es ::: Introduction to Functional Analysis. Course on Functional Analysis. Bioinformatics Unit, CNIO

2 ::: Schedule. 1. Biological (Functional) Databases 2. Threshold-based and threshold-free methods 3. Threshold-based example: FatiGO. 4. Threshold-free example 1: FatiScan.

3 ACKNOWLEDGEMENTS Many of these slides have been taken and adapted from original slides by Fatima Al-Shahrour from Joaquin Dopazo’s group (Babelomics team). We are grateful for the material and for the great tools they have developed!

4 Biological databases
Organisms: Arabidopsis thaliana, Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Gallus gallus, Danio rerio
Gene IDs: HGNC symbol, EMBL acc, RefSeq, PDB, Protein Id, IPI…., UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, EntrezGene, Affymetrix, Agilent
Gene Ontology: Biological Process, Molecular Function, Cellular Component
KEGG pathways
Biocarta pathways
Regulatory elements: miRNA, CisRed, Transcription Factor Binding Sites
InterPro Motifs
Bioentities from literature: Diseases terms, Chemical terms
Gene Expression in tissues
Keywords Swissprot

5 Gene Ontology CONSORTIUM http://www.geneontology.org The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products. These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them. The controlled vocabularies of terms are structured

6 GO structure
The three categories of GO:
Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase.
Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions.
Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex.
GO tree structure: IS_A relation, PART_OF relation.
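A minimal Python sketch of how IS_A and PART_OF relations link terms, and how one can walk from a term up to everything above it. The edges in `PARENTS` are toy examples invented for illustration, not copied from the GO files, and the `ancestors` helper is a hypothetical name.

```python
# Toy graph: child term -> list of (relation, parent term).
# These edges are illustrative assumptions, not actual GO content.
PARENTS = {
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following IS_A / PART_OF edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("nucleolus")))
```

Walking upward like this is what lets a gene annotated to a specific term also be counted under the more general terms above it.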

7 http://www.genome.ad.jp/kegg/pathway.html

8 http://www.biocarta.com/genes/index.asp

9 http://www.reactome.org/

10 http://www.pathwaycommons.org

11 http://www.whichgenes.org/

12 http://www.cisred.org/

13 ::: Schedule. 1. Biological (Functional) Databases 2. Threshold-based and threshold-free methods 3. Threshold-based example: FatiGO. 4. Threshold-free example 1: FatiScan.

14 Threshold-based functional analysis
The two-step approach:
1. Genes of interest are selected using the experimental value.
2. Selected genes are compared to the background.
Study the enrichment in functional terms in groups of genes defined by the experimental value.
Tools: FatiGO, GOminer, DAVID, Marmite.
Threshold-free functional analysis
Under a systems biology perspective.
Select genes taking into account their functional properties.
Detect blocks of functionally related genes.
Tools: FatiScan, GSEA, MarmiteScan.

15 Threshold-based functional analysis [Figure: Class1 vs Class2, t-test, cut-off at FDR<0.05] Biological meaning?

16 Threshold-free functional analysis [Figure: Class1 vs Class2, t-test with cut-off; ES/NES statistic (- to +) for Gene Set 1, Gene Set 2 and Gene Set 3] Gene set 2 enriched in Class 1. Gene set 3 enriched in Class 2.
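To make the ES idea concrete, here is a minimal Python sketch of an unweighted running-sum enrichment score over a ranked gene list. It is a simplified Kolmogorov-Smirnov-style statistic, not the exact GSEA or FatiScan implementation; the gene names, the function name `enrichment_score`, and the convention that the top of the ranking corresponds to Class1 are all assumptions made for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up on a hit, down on a miss.

    Returns the running-sum value with the largest absolute deviation
    from zero. Positive means the set clusters near the top of the
    ranking, negative means it clusters near the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking: most over-expressed in Class1 first,
# most over-expressed in Class2 last.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
top_set = {"g1", "g2", "g3"}      # clustered at the Class1 end
bottom_set = {"g6", "g7", "g8"}   # clustered at the Class2 end

print(enrichment_score(ranked, top_set))
print(enrichment_score(ranked, bottom_set))
```

No cut-off is applied to the ranking: every gene contributes to the score, which is what distinguishes this family of methods from the threshold-based ones.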

17 ::: Schedule. 1. Biological (Functional) Databases 2. Threshold-based and threshold-free methods 3. Threshold-based example: FatiGO. 4. Threshold-free example 1: FatiScan.

18 http://babelomics.bioinfo.cipf.es/

19 ::: How functional profiling should never be done. It is not uncommon to find the following assertion in papers and talks: “then we examined our set of genes selected in this way (whatever) and we discovered that 65% of them were related to metabolism, so we can conclude that our experiment activates metabolism genes”. Annotation is not a functional result!!!

20 ::: Exercise 1: FatiGO SEARCH 1. Select “FatiGO Search” and “H. sapiens”. 2. Upload FatiGO_example.txt file. 3. Select “KEGG pathways” and click “Run”.

21 ::: Exercise 1: FatiGO SEARCH 1. Select “FatiGO Search” and “H. sapiens”. 2. Upload FatiGO_example.txt file. 3. Select “KEGG pathways” and click “Run”. Result: FatiGO-Search annotations.

22 Testing the distribution of GO terms among two groups of genes (remember, we have to test hundreds of GOs)
Group A: Biosynthesis 60%. Group B: Biosynthesis 20%. Sporulation 20%.
Genes in group A are significantly related to biosynthesis, but not to sporulation. Are these two groups of genes carrying out different biological roles?
                  A   B
Biosynthesis      6   2
No biosynthesis   4   8
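The comparison on this slide is a 2x2 contingency test. Below is a minimal Python sketch of a two-sided Fisher's exact test using only the standard library. The counts (6 and 2 genes with biosynthesis, 4 and 8 without, for groups A and B) are an interpretation of the slide's flattened table, and the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: term present / term absent. Columns: group A / group B.
    """
    with_term = a + b          # row total: genes annotated with the term
    in_group_a = a + c         # column total: genes in group A
    n = a + b + c + d

    def weight(k):
        # Number of tables with k annotated genes in group A, margins fixed.
        return comb(with_term, k) * comb(n - with_term, in_group_a - k)

    low = max(0, in_group_a - (n - with_term))
    high = min(with_term, in_group_a)
    observed = weight(a)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(low, high + 1)) if w <= observed)
    return extreme / comb(n, in_group_a)


# Counts assumed from the slide: Biosynthesis 6 vs 2, No biosynthesis 4 vs 8.
p_value = fisher_exact_two_sided(6, 2, 4, 8)
print(p_value)
```

For these counts, with ten genes per group, the two-sided p-value is 31372/184756, about 0.17. Since the same test is repeated for hundreds of terms, the resulting p-values still need a multiple-testing correction.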

23 Using FatiGO: comparing groups of genes
List1: genes of interest (they are significantly over- or under-expressed when two classes of experiments are compared, co-located in the chromosomes, etc.)
List2: the background (typically the rest of genes).
Select a suitable database, Run...
Workflow:
1. Remove genes repeated in List1, genes repeated in List2, and genes repeated between both lists, to obtain “clean” List1 and “clean” List2.
2. Extract functional terms from BABELOMICS (GO, KEGG, Interpro, KW, Bioentities, Gene Expression, TF, Cisred).
3. Build the matrix of functional terms (binary: 011000101010101...).
4. Fisher's test.
5. Adjust p-value by FDR.
6. Significant functional terms.
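The workflow on this slide can be sketched end to end in a few lines of Python. This is an illustrative sketch under stated assumptions, not FatiGO's actual code: genes shared by both lists are dropped from both (the slide does not say how they are handled), the FDR adjustment is Benjamini-Hochberg, and all gene names, term names, and function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    with_term, in_list1, n = a + b, a + c, a + b + c + d

    def weight(k):
        return comb(with_term, k) * comb(n - with_term, in_list1 - k)

    low, high = max(0, in_list1 - (n - with_term)), min(with_term, in_list1)
    observed = weight(a)
    extreme = sum(w for w in map(weight, range(low, high + 1)) if w <= observed)
    return extreme / comb(n, in_list1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):           # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def compare_lists(list1, list2, annotations, alpha=0.05):
    """Return (term, p, adjusted p) for terms with adjusted p below alpha."""
    set1, set2 = set(list1), set(list2)     # removes repeats within each list
    shared = set1 & set2
    set1, set2 = set1 - shared, set2 - shared   # assumption: drop from both
    terms = sorted({t for g in set1 | set2 for t in annotations.get(g, ())})
    pvalues = []
    for term in terms:
        a = sum(term in annotations.get(g, ()) for g in set1)
        b = sum(term in annotations.get(g, ()) for g in set2)
        pvalues.append(fisher_two_sided(a, b, len(set1) - a, len(set2) - b))
    adjusted = benjamini_hochberg(pvalues)
    return [(t, p, q) for t, p, q in zip(terms, pvalues, adjusted) if q < alpha]


# Toy input: g1 is repeated in list1, g7 in list2, g99 appears in both lists.
list1 = ["g1", "g2", "g3", "g4", "g5", "g6", "g1", "g99"]
list2 = [f"g{i}" for i in range(7, 21)] + ["g7", "g99"]
annotations = {f"g{i}": {"pathway_X"} for i in range(1, 7)}
for i in (1, 2, 3):
    annotations[f"g{i}"].add("pathway_Y")
for i in range(7, 14):
    annotations[f"g{i}"] = {"pathway_Y"}
annotations["g99"] = {"pathway_X"}

significant = compare_lists(list1, list2, annotations)
print(significant)
```

In this toy input every clean List1 gene carries `pathway_X` and no background gene does, so it is the only term that survives the adjustment; `pathway_Y` is equally frequent in both lists and is not reported.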

24 [Figure: Class1 vs Class2, t-test, cut-off at FDR<0.05; List 1 vs List 2 (background); List 1b / List 2b]

25 ::: Exercise 2: FatiGO COMPARE 1. Select “FatiGO Compare” and “H. sapiens”. 2. Upload FatiGO_example.txt file 3. Select “Rest of Genome” as background. 4. Select “KEGG pathways” and click “Run”

26 ::: Exercise 2: FatiGO COMPARE 1. Select “FatiGO Compare” and “H. sapiens”. 2. Upload FatiGO_example.txt file 3. Select “Rest of Genome” as background. 4. Select “KEGG pathways” and click “Run” Only “Apoptosis” is significant

