Using Ontologies to Annotate Phenotypic Data Janan T. Eppig December 2008 www.informatics.jax.org Mouse Genome Informatics.

1 Using Ontologies to Annotate Phenotypic Data Janan T. Eppig December 2008 www.informatics.jax.org Mouse Genome Informatics

2 Human FOXN1 (forkhead box N1): T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY. Frank J, et al. Nature 398, 473-474 (1999). Mouse Foxn1: homozygous “nude” mouse. One of 8 known phenotypic mutations in mouse for the forkhead box N1 gene. www.informatics.jax.org

3 Data Integration
- Primary literature
- Centers: mutagenesis, gene trap, etc.
- Data Loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc.
- Electronic Submissions (individual labs)
Processing, QC, and curation:
- Gather data from multiple sources
- Factor out common objects
- Assemble integrated objects

4 Integration is hard… not just a matter of combining data sources…
- Data from multiple sources can be of differing quality
- The same data can enter the system via various paths
- Naming conventions may or may not follow standards
- Some data sources don’t maintain unique accession numbers (or allow them to change)
- Periodic updates from data sources can cause problems if objects have disappeared (or reappeared), or if objects have split in two

5 Data integration is hard: “Bucketizing” establishes types of correspondence between objects in the input sets. Allows immediate incorporation of 1:1 corresponding data. Sorts conflicting data into bins that allow prioritization for curator resolution.
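The bucketizing idea can be pictured with a minimal Python sketch. It assumes the input is a list of (incoming ID, existing ID) associations plus the two full ID sets. The function name, the IDs, and every bucket label other than 1:1 are illustrative assumptions, not MGI's actual implementation.

```python
from collections import defaultdict

def bucketize(pairs, incoming_ids, existing_ids):
    """Sort correspondences between two input sets into buckets (hypothetical helper)."""
    pairs = list(pairs)
    fwd = defaultdict(set)  # incoming ID -> existing IDs it is associated with
    rev = defaultdict(set)  # existing ID -> incoming IDs it is associated with
    for a, b in pairs:
        fwd[a].add(b)
        rev[b].add(a)

    buckets = defaultdict(list)
    # Objects with no counterpart at all.
    for a in sorted(incoming_ids):
        if a not in fwd:
            buckets["1:0"].append(a)
    for b in sorted(existing_ids):
        if b not in rev:
            buckets["0:1"].append(b)
    # Classify each association by how many partners each side has.
    for a, b in sorted(set(pairs)):
        many_right = len(fwd[a]) > 1
        many_left = len(rev[b]) > 1
        if not many_right and not many_left:
            key = "1:1"   # safe to incorporate immediately
        elif many_right and not many_left:
            key = "1:n"   # conflict: needs curator review
        elif many_left and not many_right:
            key = "n:1"
        else:
            key = "n:m"
        buckets[key].append((a, b))
    return dict(buckets)

# Made-up IDs, for illustration only.
example = bucketize(
    [("in1", "ex1"), ("in2", "ex2"), ("in2", "ex3"), ("in3", "ex4"), ("in4", "ex4")],
    incoming_ids={"in1", "in2", "in3", "in4", "in5"},
    existing_ids={"ex1", "ex2", "ex3", "ex4", "ex5"},
)
```

In this sketch only the "1:1" bucket would be loaded automatically; the remaining buckets are the bins a curator would prioritize and resolve.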

6 Annotation Pipeline
- Data Acquisition: Literature & Loads
- Object Identity: New Gene, Strain or Sequence?
- Standardizations: Controlled Vocabularies
- Data Associations: Evidence & Citation
- Integration with other bioinformatics resources: Co-curation of shared objects and concepts

7 Making semantic sense
Controlled vocabularies/nomenclatures:
- Strains
- Genes
- Alleles (phenotypic or variant)
- Classes of genetic markers
- Types of mutations
- Types of assays
- Developmental stages
- Tissues
- Clone libraries
- ES cell lines
- and more…
… organized as lists or simple hierarchies

8 Semantics plus relationship data
Ontologies/structured vocabularies:
- Gene Ontology (GO): Molecular function, Biological process, Cellular component
- Mouse Anatomy (MA): Embryonic, Adult
- Mammalian Phenotype (MP)
- Sequence Ontology (SO)
- Trait Ontology
… organized as directed acyclic graphs (DAGs)
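What separates a DAG from a simple hierarchy is that a term may have more than one parent, while cycles remain forbidden. A minimal Python sketch of that structure follows. The term names are taken from this deck, but the specific is_a edges are assumptions made for illustration and should not be read as the actual MP structure.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# child term -> set of parent terms (edges are illustrative assumptions)
IS_A = {
    "myoclonus": {"abnormal involuntary movement", "abnormal muscle physiology"},
    "abnormal involuntary movement": {"behavior/neurological phenotype"},
    "abnormal muscle physiology": {"muscle phenotype"},
}

def is_dag(parents):
    """Return True if the child -> parents mapping contains no cycle."""
    try:
        TopologicalSorter(parents).prepare()
        return True
    except CycleError:
        return False

def ancestors(term, parents):
    """Collect every term reachable by following is_a edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

With these assumed edges, `ancestors("myoclonus", IS_A)` reaches both the behavior/neurological and the muscle branches, which a strict tree could not express.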

9 Vocabularies in MGI (diagram labels)
- Vocabulary: DAGs; Definition; Synonyms; MP:1956
- Genotype: Strain: AEJ, Alleles: bd/bd; Strain: C57BL/6, Alleles: Ppp1r3a tm1Adpt / Ppp1r3a tm1Adpt
- Terms: … Respiratory failure; Postnatal lethality; Dilated renal tubules; Growth retardation
- Annotations: Note …; J:65378 TAS; J:62648 IDA; J:65322 EE

10 Common software for users to access vocabularies in MGI

11 Mammalian Phenotype Ontology
- Structured as DAG
- >6,250 terms covering physiological systems, behavior, survival, and development
- Available in web browser and in OBO and text formats from MGI ftp and OBO sites
- Each term linked to all annotations to the term or its children
- >133,000 genotype-MP annotations
- Synonyms
- Term in context
- Links to all mouse genotypes with this phenotype
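Linking each term to the annotations on the term or its children amounts to a roll-up over the graph. The Python sketch below shows one way to compute it. The parent-to-child edges and the genotype labels are hypothetical placeholders, not MGI data.

```python
# parent term -> child terms (assumed edges, for illustration only)
CHILDREN = {
    "abnormal involuntary movement": ["myoclonus", "tremors", "opisthotonus"],
}

# term -> genotypes annotated directly to that term (placeholder labels)
DIRECT = {
    "abnormal involuntary movement": ["genotype-3"],
    "tremors": ["genotype-1", "genotype-2"],
    "myoclonus": ["genotype-2"],
}

def descendants(term, children):
    """Collect every term below `term`, visiting shared nodes only once."""
    seen = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def annotations_for(term, children, direct):
    """Genotypes annotated to `term` or to any of its descendants."""
    found = set()
    for t in {term} | descendants(term, children):
        found.update(direct.get(t, ()))
    return sorted(found)
```

A query on the parent term returns the union of its own and its children's genotypes, with a genotype annotated to several children counted once.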

12 MP terms shown: abnormal reflex; opisthotonus; tremors; myoclonus; abnormal muscle physiology; muscle phenotype; behavior/neurological phenotype; abnormal involuntary movement

13 Mammalian Phenotype (MP) Ontology
…make phenotype & disease model data robust & accessible to researchers & computational biologists
- semantically consistent search methods
- integrated access to all phenotypic variation sources (single-gene, genomic mutations, engineered mutations, QTL, strains)
- data on human disease correlation
- access to mouse models from various approaches: Genetic, Phenotypic, Computational

14 Developing the Mammalian Phenotype Ontology
- New terms from ongoing curation process
- Collaborative community efforts (identify new terms, suggest improved organization of terms): Rat Genome Database; Mutagenesis Centers; Human (NCBI); OMIA (Online Mendelian Inheritance in Animals); Proprietary Databases; Future (International Mouse Knockout Projects)
- Comparisons among Ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.)
- Systematic review by domain experts

15 Making Mammalian Phenotype Ontology Work: DAGs; accommodate bio-specific terms; computationally useful; human accessible; practical for curation; cross-reference to other ontologies

16 Terms in MP
MP term: microphthalmia
- Entity: eye
- PATO Quality: small size
- MP def: reduced average size of the eyes
MP term: hydrocephaly
- Entity: cerebrospinal fluid
- PATO Quality: increased, excessive, accumulated
- MP def: excessive accumulation of cerebrospinal fluid in the brain, especially the cerebral ventricles, often leading to increased brain size and other brain trauma
- Entity: brain
- PATO Quality: large size (dilated); trauma of brain observed

17 Complex Examples:
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
MP definition: absence of melanin (pigment) production in the eye with identifiable melanocytes present
id: MP:0006110 ! ventricular fibrillation
!intersection_of: PATO:0000688 ! asynchronous
!intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
!intersection_of: towards GO:0060048 ! cardiac muscle contraction
!intersection_of: located_in MA:0000079 ! ventricle endocardium
!intersection_of: located_in MA:0000082 ! ventricle myocardium
MP definition: asynchronous contraction or quivering of individual cardiac muscle fibers in the ventricles
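The ocular albinism stanza can be read mechanically as a PATO quality plus relation/target pairs. The following Python sketch is a deliberately small reader for the tags shown on this slide. It is not a full OBO parser, and the class and function names are hypothetical. It relies on the OBO convention that `!` begins a comment, so lines starting with `!intersection_of` are skipped entirely.

```python
from dataclasses import dataclass, field

# First stanza from the slide, copied as text.
STANZA = """\
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
"""

@dataclass
class LogicalDefinition:
    term_id: str = ""
    quality: str = ""                                 # bare intersection_of value (PATO ID)
    differentia: list = field(default_factory=list)   # (relation, target ID) pairs

def parse_stanza(text):
    """Extract the id and intersection_of tags from one stanza."""
    result = LogicalDefinition()
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue  # blank or fully commented-out line
        tag, _, value = line.partition(":")
        value = value.strip()
        if tag == "id":
            result.term_id = value
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                result.quality = parts[0]
            else:
                result.differentia.append((parts[0], parts[1]))
    return result

parsed = parse_stanza(STANZA)
```

Here `parsed.quality` holds the PATO ID and `parsed.differentia` holds the `inheres_in` and `towards` targets, which is the entity/quality decomposition the slide describes.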

18 Status of Phenotype & Disease Data, Nov 2008
- Phenotype terms in MP ontology: 6,355
- Phenotypic alleles cataloged: 21,996
- number of genes represented: 8,225
- targeted alleles: 13,549
- number of genes targeted: 5,547
- Alleles with MP annotation: 19,458
- Genotypes with MP annotation: 27,261
- Total MP annotations: 137,577
- Genotypes with OMIM associations: 2,520
- OMIM with associated genotypes: 882
- QTLs: 4,015
- Strains: >10,500

19 Current QTL Display

20 Current QTL display

21

22 Changes planned for QTL Display. Genome coordinates: 132851306-135646474 (MGI Mouse GBrowse)

23 Need for a trait ontology
What is measured
- Blood pressure
- % body fat
- Coat color
Annotation of
- QTL
- Strain characteristics / baseline
- Measurements
Some issues
- specificity vs broad
- synchronizing with MP
- “how much”
- cross-species?

24 OBO-Edit, curation tool for building ontologies

25 Working on Trait Ontology
- MGI, IMPC, MPD, RGD, Domestic Species (Animal QTL)
- Currently: approx. 3600 terms, built initially by stripping MP
- working systematically on branches

26 MGI Phenotype Data Staff: Anna Anagnostopoulos, Randal P. Babiuk, Susan M. Bello, Donna L. Burkart, Howard Dene, Michelle Knowlton, Ira Lu, Hiroaki Onda, Cynthia L. Smith, Monika Tomczuk, Linda L. Washburn, Jonathan S. Beal, Kim L. Forthofer, Peter Frost

27 The End NHGRI grant HG000330

