Daniel Rico, PhD. ::: Introduction to Functional Analysis. Course on Functional Analysis, Bioinformatics Unit.


Daniel Rico, PhD. ::: Introduction to Functional Analysis. Course on Functional Analysis. Bioinformatics Unit, CNIO.

::: Schedule.
1. Biological (Functional) Databases
2. Threshold-based and threshold-free methods
3. Threshold-based example: FatiGO
4. Threshold-free example 1: FatiScan

::: Acknowledgements. Many of these slides have been taken and adapted from original slides by Fatima Al-Shahrour, from Joaquin Dopazo’s group (Babelomics team). We are grateful for the material and for the great tools they have developed!

::: Biological databases.
Organisms: Arabidopsis thaliana, Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Gallus gallus, Danio rerio.
Gene IDs: HGNC symbol, EMBL acc, RefSeq, PDB, Protein Id, IPI…; UniProt/Swiss-Prot, UniProtKB/TrEMBL, Ensembl IDs, EntrezGene, Affymetrix, Agilent.
Functional databases: Gene Ontology (Biological Process, Molecular Function, Cellular Component), KEGG pathways, Biocarta pathways, InterPro motifs, Swiss-Prot keywords, regulatory elements (miRNA, CisRed, Transcription Factor Binding Sites), bioentities from literature (disease terms, chemical terms), gene expression in tissues.

Gene Ontology CONSORTIUM The objective of GO is to provide controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products. These terms are to be used as attributes of gene products by collaborating databases, facilitating uniform queries across them. The controlled vocabularies of terms are structured

::: GO structure. The three categories of GO:
Molecular Function: the tasks performed by individual gene products; examples are transcription factor and DNA helicase.
Biological Process: broad biological goals, such as mitosis or purine metabolism, that are accomplished by ordered assemblies of molecular functions.
Cellular Component: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and origin recognition complex.
GO tree structure: terms are linked by IS_A and PART_OF relations.
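The IS_A and PART_OF structure can be sketched as a small graph in which an annotation to a term also implies annotation to all of its ancestors. This is a minimal illustration only: the term names come from the examples above, but the edges in `EDGES` and the gene name in the usage note are made-up assumptions, not the real ontology.

```python
from collections import deque

# Hypothetical toy fragment: child term -> list of (relation, parent term).
# The edges are illustrative assumptions, not copied from the real GO.
EDGES = {
    "telomere": [("part_of", "chromosome")],
    "chromosome": [("is_a", "organelle")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("part_of", "cell")],
    "cell": [],
}


def ancestors(term, edges=EDGES):
    """Return every term reachable from `term` through IS_A or PART_OF links."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations, edges=EDGES):
    """Expand each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        expanded = set(terms)
        for term in terms:
            expanded |= ancestors(term, edges)
        full[gene] = expanded
    return full
```

For example, `propagate({"geneX": {"telomere"}})` would also attach "chromosome", "organelle" and "cell" to the hypothetical `geneX`, which is why enrichment tests usually count genes at every level of the tree.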

::: Threshold-based and threshold-free methods.
Threshold-based functional analysis (the two-step approach): genes of interest are selected using the experimental value, then the selected genes are compared to the background. Study the enrichment in functional terms in groups of genes defined by the experimental value. Tools: FatiGO, GoMiner, DAVID, Marmite.
Threshold-free functional analysis (under a systems biology perspective): select genes taking into account their functional properties. Detect blocks of functionally related genes. Tools: FatiScan, GSEA, MarmiteScan.

::: Threshold-based functional analysis. [Figure: genes compared between Class1 and Class2 with a t-test; a cut-off (FDR < 0.05) selects the genes of interest. Biological meaning?]
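The cut-off step can be sketched as follows. The slide only says "FDR < 0.05", so the Benjamini-Hochberg procedure used here is an assumption, and the per-gene p-values are taken as given (for instance from a t-test between the two classes). The function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_genes(gene_pvalues, fdr=0.05):
    """Keep the genes whose adjusted p-value falls below the FDR cut-off."""
    genes = list(gene_pvalues)
    adjusted = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < fdr]
```

The genes returned by `select_genes` play the role of the list of interest; everything else is the background against which functional terms are compared.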

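::: Threshold-free functional analysis. [Figure: genes ranked by t-test between Class1 and Class2, with the positions of Gene Set 1, Gene Set 2 and Gene Set 3 along the ranking and the ES/NES statistic running from - to +.] Gene set 2 is enriched in Class 1; gene set 3 is enriched in Class 2.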
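A rough feel for the ES statistic comes from an unweighted running sum over the ranked gene list. This is a simplified sketch, not the exact FatiScan or GSEA algorithm: it omits the weighting, the normalisation to NES and the permutation test, and the function name is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    `ranked_genes` is ordered from the Class1 end to the Class2 end.
    A positive score means the set clusters near the start of the
    ranking, a negative score means it clusters near the end.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a set member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

No gene is discarded by a cut-off here: every gene contributes to the running sum, which is what makes the approach threshold-free.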

::: How functional profiling should never be done. It is not uncommon to find the following assertion in papers and talks: “then we examined our set of genes selected in this way (whatever) and we discovered that 65% of them were related to metabolism, so we can conclude that our experiment activates metabolism genes”. A percentage on its own says nothing until it is compared with the proportion found in the background. Annotation is not a functional result!

::: Exercise 1: FatiGO SEARCH
1. Select “FatiGO Search” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “KEGG pathways” and click “Run”.
Result: FatiGO-Search annotations.

::: Testing the distribution of GO terms among two groups of genes (remember, we have to test hundreds of GOs).
Group A: Biosynthesis 60%. Group B: Biosynthesis 20%. Sporulation 20%.
Genes in group A are significantly related to biosynthesis, but not to sporulation. Are these two groups of genes carrying out different biological roles?

                   A    B
Biosynthesis       6    2
No biosynthesis    4    8
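A 2x2 table like this one is what a Fisher's exact test takes as input. Below is a minimal sketch using only the standard library; the function name is hypothetical, and the two-sided p-value is defined as the sum of all table probabilities no larger than the observed one, which is one common convention and not necessarily what FatiGO implements.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: term / no term. Columns: group A / group B.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Integer weights avoid floating-point ties when comparing tables.
    extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return extreme / comb(total, col1)


# Counts as read from the table: biosynthesis 6 vs 2, no biosynthesis 4 vs 8.
p_biosynthesis = fisher_exact_two_sided(6, 2, 4, 8)
```

Because the same test is repeated for hundreds of terms, each p-value then has to be corrected for multiple testing before any term is called significant.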

::: Using FatiGO
List1: genes of interest (they are significantly over- or under-expressed when two classes of experiments are compared, co-located in the chromosomes, etc.).
List2: the background (typically the rest of genes).
Select a suitable database and run.
Comparing groups of genes:
1. Remove genes repeated in List1, repeated in List2, and repeated between both lists, giving a “clean” List1 and a “clean” List2.
2. Extract functional terms from BABELOMICS (GO, KEGG, Interpro, KW, Bioentities, Gene Expression, TF, Cisred).
3. Build the matrix of functional terms.
4. Apply Fisher's test.
5. Adjust p-values by FDR.
6. Report the significant functional terms.
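The list-cleaning and term-counting steps can be sketched as below. Two things are assumptions here: that genes shared by both lists are dropped from both (the slide does not say which list keeps them), and that annotations arrive as a plain gene-to-terms mapping. The function names are hypothetical and do not correspond to a Babelomics API.

```python
def clean_lists(list1, list2):
    """Deduplicate each list, then drop genes that appear in both."""
    unique1 = list(dict.fromkeys(list1))  # keeps first occurrence, in order
    unique2 = list(dict.fromkeys(list2))
    shared = set(unique1) & set(unique2)
    # Assumption: shared genes are removed from BOTH lists.
    clean1 = [gene for gene in unique1 if gene not in shared]
    clean2 = [gene for gene in unique2 if gene not in shared]
    return clean1, clean2


def term_matrix(clean1, clean2, annotations):
    """Count, per term, annotated and non-annotated genes in each list.

    Returns {term: (in_list1, not_in_list1, in_list2, not_in_list2)}.
    """
    terms = sorted(
        {term for gene in clean1 + clean2 for term in annotations.get(gene, ())}
    )
    matrix = {}
    for term in terms:
        in1 = sum(term in annotations.get(gene, ()) for gene in clean1)
        in2 = sum(term in annotations.get(gene, ()) for gene in clean2)
        matrix[term] = (in1, len(clean1) - in1, in2, len(clean2) - in2)
    return matrix
```

Each row of the resulting matrix is a 2x2 table for one functional term, ready for a Fisher's test followed by FDR adjustment across all terms.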

[Figure: genes compared between Class1 and Class2 with a t-test; the cut-off (FDR < 0.05) separates List 1 from List 2 (background). Also labelled: List 1b / List 2b.]

::: Exercise 2: FatiGO COMPARE
1. Select “FatiGO Compare” and “H. sapiens”.
2. Upload the FatiGO_example.txt file.
3. Select “Rest of Genome” as background.
4. Select “KEGG pathways” and click “Run”.
Result: only “Apoptosis” is significant.