

What is an Ontology
An ontology is a set of terms, relationships and definitions that capture the knowledge of a certain domain (common ontology ≠ common knowledge).
Terms represent a controlled vocabulary and define the concepts of a domain.
Terms are linked by relationships, which constitute a semantic network.
Ontologies augment natural language annotations and can be more easily processed computationally. An ontology becomes the language of the domain it describes, for communication, coordination and collaboration.

Why We Need Ontology in Bioinformatics
Biologists need knowledge in order to perform their work (e.g., sequence comparison to infer function).
Biologists need knowledge for communication, but such knowledge may be represented in different ways.
Different uses of the word "gene":
  The coding region of DNA
  A DNA fragment that can be transcribed and translated into a protein
  A DNA region of biological interest that has a name and carries a genetic trait or phenotype

The Gene Ontology (GO)
Provides structured vocabularies for describing gene products in the domain of molecular biology.
Enables a common understanding across model organisms and between databases.
Consists of three structurally unlinked hierarchies (molecular function, biological process and cellular component).
Two types of relationships between terms:
  is-a: subclass.
  part-of: physical part of, or subprocess of.

Why Gene Ontology?
Without structured vocabularies, different sources can refer to the same concept using different terms (e.g., cdc54 in yeast is MCM4 in mouse).
What is a well-known shorthand in one research community is gibberish in another.
Contributions by one research community may not be recognized by others.
Without coordination, research work may be duplicated.
The goal of the Gene Ontology Consortium is to produce a controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.

Three GO Hierarchies
Molecular function: elemental activity or task (what), i.e., what a gene does at the biochemical level (e.g., DNA-binding, polymerase, transcription factor).
Biological process: goal or objective (why); a broad biological perspective, not currently a pathway (e.g., mitosis, DNA replication, cell cycle control).
Cellular component: location within cellular structures and macromolecular complexes (where) (e.g., nucleus, ribosome, pre-replication complex).
Each GO hierarchy has a DAG (directed acyclic graph) structure: a child term may have many parent terms.
Gene Ontology information can be accessed at http://www.geneontology.org/
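The DAG property (a child term may have more than one parent) can be illustrated with a minimal Python sketch. The term names are taken from the biological process example in this transcript, but the specific edges in `PARENTS` and the helper name `ancestors` are assumptions made for illustration, not a copy of the real GO graph.

```python
from collections import deque

# Toy fragment of a GO-like DAG. Each term maps to a list of
# (relationship, parent) pairs. The edges are illustrative assumptions.
PARENTS = {
    "apoptosis": [("is_a", "programmed cell death")],
    "programmed cell death": [("is_a", "cell death")],
    "cell death": [("is_a", "cellular process")],
    # A term with two parents, reached through different relationship types.
    "cell aging": [("is_a", "cellular process"), ("part_of", "cell death")],
    "cellular process": [("is_a", "biological process")],
    "biological process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("cell aging")))
```

Because a term can be reached along several paths, the traversal keeps a `seen` set so that shared ancestors are collected only once.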

Example: Gene Ontology Hierarchy
[Figure: part of the biological process hierarchy, drawn as a graph. Terms shown: Biological process (GO:0008150); Development (GO:0007275); Cellular process (GO:0009987); Physiological (GO:0007582); Behavior (GO:0007610); Communication (GO:0007154); Cell death (GO:0008219); Cell growth (GO:0008151); Cell aging (GO:0007569); Programmed (GO:0012501); Induction (GO:0012502); Apoptosis (GO:0006915); HS response (GO:0009626); Autophagic cell death (GO:0048102). Edge labels: i = is-a, P = part-of.]

Gene Annotation Using GO Terms
Gene annotation is the association of GO terms with gene products, based on evidence from literature references or computational analysis.
The creation of GO and the association of GO terms with gene products (gene annotation) are two independent operations.
A gene can be associated with one or more GO terms (gene categories), and one category normally has many genes (a many-to-many relationship between genes and GO terms).

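Gene Product Associations to an Ontology
[Figure: schema linking gene products to the ontology. Ontology fields: ID, Term, Definition, Ontology, Synonyms. Relationship fields: Is-a | Part-of, Node1 ID, Node2 ID. Association fields (yeast, fly, mouse): GO ID, DB ID, Evidence code, Reference Citation, NOT.]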

Example: Part of Molecular Function

Example: Part of Biological Process

Example: Part of Cellular Component

Genes of a Biological Process Tend to Be Co-Regulated
[Figure: axis label "Gene Names".]

Use Gene Ontology (GO) to Annotate Genes
GO URL: http://www.geneontology.org/
Two concepts:
  Gene Ontology: provides structured vocabularies for describing gene products in the domain of molecular biology (all species share the same gene ontology).
  Annotations: associations of GO terms with gene products, based on evidence from literature references or computational analysis (each species has a separate annotation file).

The Gene Ontology (GO)
GO file: http://www.geneontology.org/ontology/gene_ontology.obo
An example of a GO term:
  [Term]
  id: GO:0000001 (A unique id for the GO term)
  name: mitochondrion inheritance (The name of the GO term)
  namespace: biological_process (see next slide)
  def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824, PMID:11389764, SGD:mcc] (A detailed description of the GO term)
  is_a: GO:0048308 ! organelle inheritance
  is_a: GO:0048311 ! mitochondrion distribution
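A term stanza like the one above can be read with a few lines of Python. This is a minimal sketch that assumes simple `tag: value` lines and handles only the tags shown in the example. The function name `parse_term` is hypothetical, and the `def:` line is left out of the sample text for brevity.

```python
# Sample stanza copied from the example term (the def: line is omitted here).
STANZA = """\
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
"""


def parse_term(stanza):
    """Parse one [Term] stanza into a dict; is_a parents are kept as a list."""
    term = {"is_a": []}
    for line in stanza.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        # Text after " ! " is a human-readable comment (the parent's name).
        value = value.split(" ! ")[0].strip()
        if tag == "is_a":
            term["is_a"].append(value)
        else:
            term[tag] = value
    return term


term = parse_term(STANZA)
print(term["id"], term["name"], term["is_a"])
```

Keeping `is_a` as a list reflects the DAG structure: this term has two parents.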

Gene Annotation Using GO Terms
http://www.geneontology.org/GO.current.annotations.shtml
Select the annotation file for a particular species.
An example of an annotation entry for yeast:
  SGD S000004660 AAC1 GO:0005743 SGD_REF:S000050955|PMID:2167309 TAS C ADP/ATP translocator YMR056C gene taxon:4932
“AAC1” is the gene name.
“GO:0005743” is the GO id; we can link it to the corresponding item in the ontology file.
“SGD_REF:S000050955|PMID:2167309” is where this annotation comes from.
“TAS” is the evidence code (traceable author statement).
“C” means this annotation belongs to the “cellular component” namespace.
“ADP/ATP translocator” is a brief description of the gene product.
“YMR056C” is another name for this gene.
“taxon:4932” means this is a yeast gene.
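An annotation entry can be split into named fields as sketched below. The transcript shows the entry with spaces between fields, so this sketch assumes the layout of GO annotation (GAF) files: tab-separated columns, with empty qualifier and with/from columns in this entry. The column names in `GAF_COLUMNS` and the function name `parse_annotation` are labels chosen for this example.

```python
# Assumed column order, following the GAF layout (first 13 columns only).
GAF_COLUMNS = [
    "db", "db_object_id", "symbol", "qualifier", "go_id", "reference",
    "evidence", "with_from", "aspect", "name", "synonym", "object_type",
    "taxon",
]

# The yeast entry from the slide, rebuilt with tabs and two empty columns.
LINE = "\t".join([
    "SGD", "S000004660", "AAC1", "", "GO:0005743",
    "SGD_REF:S000050955|PMID:2167309", "TAS", "", "C",
    "ADP/ATP translocator", "YMR056C", "gene", "taxon:4932",
])


def parse_annotation(line):
    """Split one tab-separated annotation line into a dict of named fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(GAF_COLUMNS, fields))
    # Several references may be joined with "|".
    record["reference"] = record["reference"].split("|")
    return record


record = parse_annotation(LINE)
print(record["symbol"], record["go_id"], record["evidence"], record["aspect"])
```

Splitting on tabs rather than spaces matters because fields such as "ADP/ATP translocator" contain spaces.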

Gene Annotation Using GO Terms
Given a list of genes L from a specific species Sj:
  1) Go to http://www.geneontology.org/GO.current.annotations.shtml
  2) Select and download the annotation file Fj for Sj
For each gene Gi in list L:
  3) Find the annotation entry Ek for Gi in Fj
  4) Find the GO term id from entry Ek
  5) Go to http://www.geneontology.org/ontology/gene_ontology.obo
  6) Find the GO term in the ontology file; the GO term provides more detailed annotation for this gene
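Steps 3 to 6 amount to two dictionary lookups once both files have been loaded. The sketch below assumes the annotation file has already been reduced to a gene-to-GO-ids mapping and the ontology file to a GO-id-to-term mapping. The function name `annotate`, the gene "XYZ1" and the in-memory sample data are hypothetical. The term name "mitochondrial inner membrane" for GO:0005743 does not appear in the slides and is added here from the GO vocabulary.

```python
def annotate(genes, annotations, ontology):
    """Map each gene to a list of (GO id, term name) pairs.

    annotations: dict of gene name -> list of GO ids (from the annotation file)
    ontology:    dict of GO id -> dict with at least a "name" key (from the OBO file)
    """
    result = {}
    for gene in genes:
        go_ids = annotations.get(gene, [])  # steps 3 and 4
        result[gene] = [
            (go_id, ontology.get(go_id, {}).get("name", "unknown term"))  # steps 5 and 6
            for go_id in go_ids
        ]
    return result


# Tiny in-memory stand-ins for the downloaded files.
annotations = {"AAC1": ["GO:0005743"]}
ontology = {
    "GO:0005743": {
        "name": "mitochondrial inner membrane",
        "namespace": "cellular_component",
    }
}

# "XYZ1" is a made-up gene with no annotation entry.
annotated = annotate(["AAC1", "XYZ1"], annotations, ontology)
print(annotated)
```

Genes without an annotation entry map to an empty list, which keeps them visible in the result instead of silently dropping them.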

Use of GO to Annotate Genes
Problem: Given a list of n genes, are they significantly associated with a specific GO term?
Solution: Calculate the p-value.
Notation:
  Total number of genes in the data set: N
  Total number of genes assigned to term T: M
  Number of genes in the list: n
  Number of genes in the list and assigned to term T: m

How to Assess Overrepresentation of a GO Term?
Genes on an array:
  Total number of genes (N): 2,285
  Number of cell cycle genes (M): 161
Genes in a cluster:
  Number of genes in the cluster (n): 147
  Number of cell cycle genes (m): 25
Is the GO term (i.e., cell cycle) significantly overrepresented in the cluster?

Hyper-geometric Distribution
Given that M of the N genes in the data set are associated with term T, if n genes are drawn at random from the data set, what is the probability that exactly m of the n selected genes are associated with T?
$$P(X = m) = \frac{\binom{M}{m}\binom{N-M}{n-m}}{\binom{N}{n}}$$

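P-Value
Based on the hyper-geometric distribution, the probability of having fewer than m genes associated with T in a random list of n genes can be calculated by summing the probabilities of the list having 0, 1, …, m − 1 genes associated with T. The p-value of over-representation, i.e., the probability of observing m or more such genes by chance, is therefore:
$$p = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$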
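The over-representation p-value is the hyper-geometric probability of drawing at least m genes associated with T when n genes are drawn from N, of which M are associated with T. The following Python sketch computes it with exact integer binomial coefficients and applies it to the cell cycle example (N = 2,285, M = 161, n = 147, m = 25). The function name `overrepresentation_p_value` is a choice made for this example. The upper tail is summed directly, which is mathematically the same as one minus the lower tail but avoids subtracting two nearly equal numbers.

```python
from math import comb


def overrepresentation_p_value(N, M, n, m):
    """P(X >= m) for X ~ Hypergeometric(N, M, n).

    N: total genes in the data set
    M: genes in the data set assigned to term T
    n: genes in the list (cluster)
    m: genes in the list assigned to term T
    """
    total = comb(N, n)
    upper = min(n, M)  # X cannot exceed either the list size or M
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / total  # exact integer ratio converted to a float


# Cell cycle example from the slides.
p_value = overrepresentation_p_value(N=2285, M=161, n=147, m=25)
expected = 147 * 161 / 2285  # expected number of cell cycle genes by chance
print(f"expected by chance: {expected:.1f}, observed: 25, p-value: {p_value:.3g}")
```

About 10 cell cycle genes would be expected in a random cluster of 147, so observing 25 points to over-representation; the printed p-value quantifies how unlikely that count is by chance.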

MAPPFinder
A tool for mapping gene expression data to the GO hierarchies.
Part of the free software package GenMAPP.
Available at http://www.genmapp.org/. (Doniger et al., 2003)

MAPPFinder Sample Output (Doniger et al., 2003)

GoMiner
A client-server application using Java (data on the server side).
Available at http://discover.nci.nih.gov/gominer/. (Zeeberg et al., 2003)

Onto-Express
A web application for GO-based microarray data analysis (http://vortex.cs.wayne.edu/Projects.html).
The input to Onto-Express is a list of Affymetrix probe IDs, GenBank sequence accessions or UniGene cluster IDs.
Part of the integrated Onto-Tools, including:
  Onto-Compare: compare commercial arrays.
  Onto-Design: help array design (probe selection).
  Onto-Translate: provide mapping of different IDs.
[Figure: Onto-Express output for genes linked to poor breast cancer outcome; columns: GO, # genes, p.]