Alignment of Ontologies for Biological Research Judith A. Blake, Ph.D. Bioinformatics and Computational Biology The Jackson Laboratory.

1 Alignment of Ontologies for Biological Research Judith A. Blake, Ph.D. Bioinformatics and Computational Biology The Jackson Laboratory

2 Dagstuhl - 2007 What is my perspective?
- Biological data is voluminous and complex
- Data integration is hard work
- Bio-ontologies provide semantic structure and standards that aid in data analysis and hypothesis generation
- There are many challenges to the effective use of bio-ontologies (in addition to challenges to the development of ontologies)

3 Dagstuhl - 2007 What is my approach?
- Goal is to facilitate ‘translational research’ through effective integration of experimental data from mouse models of human conditions with human clinical data from disease studies
- Bio-ontologies provide a mechanism to support comprehensive data integration and analysis

4 Dagstuhl - 2007 Interesting…
- Refine Relations Ontology (RO)
- Identify critical datasets
- Focus on bottlenecks
- Create views

5 Dagstuhl - 2007 Overview of Mouse Genome Informatics
- Phenotype: mutant allele definitions; QTL; strain characteristics; phenotype vocabularies; disease models (human); comparative phenotypes
- Genes & Gene Products: nomenclature; gene characterization; transcripts, proteins, gene products; functional annotation; orthologs & paralogs
- Sequences & Maps: sequence representation; C57BL/6J genomic sequence; SNPs and strain variants; adding biological context to computational gene models
- Gene Expression: mouse anatomy; time, tissue, level of expression; range of assays & results; emphasis on embryonic stages
- Tumor Biology: tumor classifications & descriptions; strain incidence; histopathology images; tumor genetics

6 Dagstuhl - 2007 Data acquisition is constant
Load Program: Summary of Data Loaded
- Mouse EntrezGene: EntrezGene IDs for mouse markers. Plus marker-to-sequence associations from EntrezGene not already in MGD
- Human/Rat EntrezGene: Nomenclature, map position and other data regarding human and rat genes. OMIM associations for human.
- GenBank Seq: Mouse sequence records from GenBank
- RefSeq Seq: Mouse sequence records from RefSeq
- UniProt/TrEMBL Seq: Mouse sequence records from UniProt and TrEMBL
- TIGR/DoTS/NIA Seq: Mouse consensus sequence records from TIGR/DoTS/NIA clusters
- TIGR/DoTS/NIA Association: Associations between TIGR/DoTS/NIA cluster sequences and markers.
- Ensembl Gene Model: Ensembl gene model sequences, coordinates, & associations between these & markers
- NCBI Gene Model: NCBI gene model sequences, coordinates, & associations between these & markers
- UniProt Association: UniProt/TrEMBL IDs and additional GenBank IDs for mouse markers. Plus GO and InterPro annotations
- UniGene Association: UniGene cluster IDs for mouse markers.
- EST cDNA Clone: Mouse IMAGE, NIA, MGC, Riken, cDNAs and EST sequence associations
- MGC Association: MGC IDs and associations between MGC full length sequences and MGC cDNAs
- RPCI Clone: RPCI 23/24 BAC clones and sequence associations
- GO Vocabulary: Updated Gene Ontology (GO) vocabularies from the central GO site.
- OMIM Vocabulary: Updated OMIM disease terms
- MP Vocabulary: Updated MP vocabulary (from OBO-Edit)
- Anatomy: Updated adult mouse anatomy ontology (from OBO-Edit)
- Mapping panel: JAX, EUCIB, Copeland-Jenkins and many others
- PIRSF: Mouse PIR superfamily terms and associations to markers
- SNPs: Mouse SNPs from dbSNP and associations between SNPs & markers.

7 Dagstuhl - 2007 Snapshot of MGI data content
MGI data statistics: March, 2007
- Number of genes with sequence data: 28,292
- Number of genes (incl. unmapped mutants): 35,733
- Number of markers (including genes): 69,639
- Number of markers mapped: 65,345
- Number of genes with protein sequence information: 24,293
- Number of genes with GO annotations: 17,664
- Number of mouse/human orthologies: 16,127
- Number of mouse/rat orthologies: 15,802
- Number of genes with one or more phenotypic alleles: 6,979
- Number of cataloged phenotypic alleles: 17,494
- Number of references: 113,508
- Number of integrated mouse nucleotide sequences (+ ESTs): 8,3574,701

8 Dagstuhl - 2007 Build 36: Ensembl and NCBI
[Figure: unification of Ensembl (28807) and NCBI (24237) gene models by exon overlap detection]
- Equivalent: 22182 (1:1: 20663; 1:n: 365; n:1: 874; n:m: 280)
- Unique to Ensembl: 6910
- Unique to NCBI: 2646
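
The slide names the technique (exon overlap detection) and the resulting categories, but not the procedure itself. The following is a minimal sketch of one way such a unification could work, assuming that two gene models are linked whenever any of their exons overlap on the same chromosome and strand, and that linked models are grouped transitively. The function names, the record layout, the toy coordinates, and the reading of "1:n" as one Ensembl model to several NCBI models are all assumptions made for illustration, not the MGI pipeline.

```python
from collections import defaultdict


def exons_overlap(a, b):
    """True if two gene models share at least one exon base (inclusive coordinates)."""
    if a["chrom"] != b["chrom"] or a["strand"] != b["strand"]:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a["exons"]
               for s2, e2 in b["exons"])


def unify(ensembl, ncbi):
    """Cluster gene models from two providers by exon overlap and count each category."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # Register every model so that unmatched ones form singleton clusters.
    for e in ensembl:
        find(("E", e["id"]))
    for n in ncbi:
        find(("N", n["id"]))

    # Link every Ensembl/NCBI pair with overlapping exons (quadratic; fine for a sketch).
    for e in ensembl:
        for n in ncbi:
            if exons_overlap(e, n):
                union(("E", e["id"]), ("N", n["id"]))

    clusters = defaultdict(lambda: {"E": [], "N": []})
    for node in list(parent):
        clusters[find(node)][node[0]].append(node[1])

    counts = defaultdict(int)
    for members in clusters.values():
        n_e, n_n = len(members["E"]), len(members["N"])
        if n_n == 0:
            counts["unique_ensembl"] += 1
        elif n_e == 0:
            counts["unique_ncbi"] += 1
        elif n_e == 1:
            counts["1:1" if n_n == 1 else "1:n"] += 1
        else:
            counts["n:1" if n_n == 1 else "n:m"] += 1
    return dict(counts)


# Toy gene models with made-up coordinates (hypothetical data).
ensembl = [
    {"id": "E1", "chrom": "1", "strand": "+", "exons": [(100, 200), (300, 400)]},
    {"id": "E2", "chrom": "1", "strand": "+", "exons": [(1000, 1100)]},
    {"id": "E3", "chrom": "2", "strand": "-", "exons": [(50, 80)]},
]
ncbi = [
    {"id": "N1", "chrom": "1", "strand": "+", "exons": [(150, 250)]},
    {"id": "N2", "chrom": "1", "strand": "+", "exons": [(1050, 1060)]},
    {"id": "N3", "chrom": "1", "strand": "+", "exons": [(1090, 1200)]},
    {"id": "N4", "chrom": "3", "strand": "+", "exons": [(1, 10)]},
]
result = unify(ensembl, ncbi)
print(sorted(result.items()))
```

In the toy data, E1 and N1 form a 1:1 cluster, E2 overlaps both N2 and N3 (1:n), and E3 and N4 remain unique to their providers. A production version would index exons by position instead of comparing all pairs.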

9 Dagstuhl - 2007 Who is the authority?
Mouse data for which MGI serves as the authoritative source.
Data type: Working relationship
- Gene Symbol/Name: MGI makes primary assignment; coordination with HGNC, RGNC
- Allele Symbol/Name: MGI makes primary assignment
- Strain Designations: MGI makes primary assignment
- Gene-to-nucleotide sequence association: Co-curation with NCBI
- Gene-to-protein sequence association: Co-curation with UniProt
- Gene Ontology (GO) annotations: MGI provides primary curation
- Gene homology data between mouse and other species: MGI curates orthology relationships
- Mammalian Phenotype Ontology: MGI develops vocabulary
- Genotype-to-phenotype data: MGI provides primary curation
- Mouse model-to-human disease (OMIM): MGI provides primary curation

10 Dagstuhl - 2007 Having the data, we want to ask complex questions

11 Dagstuhl - 2007 Multiple Controlled Vocabularies in MGI
- Gene Nomenclature
- Gene/Marker Type
- Allele Type
- Developmental and Adult Anatomies
- Assay Type
  - Expression
  - Mapping
- Molecular Mutation
- Inheritance Mode
- Gene Ontology
- Mammalian Phenotype Ontology
- Tissue Types
- Cell Types
- Cell Lines
- Units
  - Cytogenetic
  - Molecular
- ES Cell Line
- Strain Nomenclature

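12 Dagstuhl - 2007 Vocabularies in MGI: GO Example
[Diagram: vocabulary, annotations, and genes]
- Vocabulary: DAGs of terms; each term (e.g. GO:54321) has a definition and synonyms. Terms shown: Transcription factor, DNA binding, Protein binding, Ligand binding or carrier, …
- Annotations: references with evidence codes (J:65378 TAS, J:62648 IDA, J:60000 IEA, …)
- Genes: Ahr, Edr2; each gene has a name, synonyms, and an MGI ID (e.g. MGI:105043)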
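
The structure on this slide (terms arranged in a DAG, genes, and annotations that link a gene to a term through a reference and an evidence code) can be sketched as a small data model. This is an illustrative sketch only: the class names, the `TERM:`/`GENE:`/`REF:` identifiers, and the parent/child arrangement of the four term names are assumptions, not the MGI schema or the actual GO hierarchy. TAS, IDA, and IEA are GO evidence codes that appear on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    term_id: str
    name: str
    definition: str = ""
    synonyms: tuple = ()


@dataclass
class Vocabulary:
    """Terms plus is-a edges; a term may have several parents, so this is a DAG."""
    terms: dict = field(default_factory=dict)
    parents: dict = field(default_factory=dict)  # term_id -> set of parent term_ids

    def add(self, term, parents=()):
        self.terms[term.term_id] = term
        self.parents[term.term_id] = set(parents)

    def ancestors(self, term_id):
        """All terms reachable by following parent edges upward."""
        seen = set()
        stack = list(self.parents.get(term_id, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen


@dataclass(frozen=True)
class Annotation:
    gene_id: str
    term_id: str
    reference: str  # literature reference identifier
    evidence: str   # GO evidence code such as TAS, IDA, IEA


def genes_annotated_to(vocab, annotations, term_id):
    """Genes annotated to the term itself or to any of its descendants."""
    return {a.gene_id for a in annotations
            if a.term_id == term_id or term_id in vocab.ancestors(a.term_id)}


# Hypothetical vocabulary; the hierarchy below is assumed for illustration.
vocab = Vocabulary()
vocab.add(Term("TERM:binding", "Ligand binding or carrier"))
vocab.add(Term("TERM:dna", "DNA binding"), parents=["TERM:binding"])
vocab.add(Term("TERM:protein", "Protein binding"), parents=["TERM:binding"])
vocab.add(Term("TERM:tf", "Transcription factor"), parents=["TERM:dna"])

# Hypothetical genes and references.
annotations = [
    Annotation("GENE:A", "TERM:tf", "REF:1", "TAS"),
    Annotation("GENE:A", "TERM:protein", "REF:2", "IDA"),
    Annotation("GENE:B", "TERM:dna", "REF:3", "IEA"),
]

print(sorted(genes_annotated_to(vocab, annotations, "TERM:dna")))
```

Because annotations are made to specific terms, a query on a broader term has to walk the DAG, which is why `genes_annotated_to` checks the ancestors of each annotated term.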

13 Dagstuhl - 2007 Mammalian Phenotype Ontology
- Compositional terms
- ‘working’ ontology
- Projected xref to ‘core’ ontologies
  - Anatomy
  - GO
- Built with attention to ontological principles but with primary goal of supporting annotation of diverse experimental results from many research groups and perspectives

14 Dagstuhl - 2007

15 We are exploring ontological representations that relate human clinical data with mouse phenotypes
- Create compositional view for annotation of mouse models and human clinical data
- Provide xref / RO back to core ontologies
- Support both annotation and ontology alignment efforts
- Develop tools to support complex queries

16 Dagstuhl - 2007 We modeled gangliosidoses as a test case. Two types of gangliosidoses are Sandhoff and Tay-Sachs diseases.

17 Dagstuhl - 2007 Curators use controlled terms from structured vocabularies (ontologies) to curate complex biological systems described in the literature. The knowledge is in the details.

18 Dagstuhl - 2007 The knowledge is in the details

19 Dagstuhl - 2007 Including the relationship to human disease

20 Dagstuhl - 2007 More mouse models – Tay-Sachs

21 Dagstuhl - 2007 Different core ontologies need to be combined to describe complex biological systems
- Chemical Ontology: Dopamine (CHEBI:18243)
- Cell Type Ontology: Dopaminergic Neuron (CL:0000700)
- Biological Process: Synaptic transmission (GO:0007268)
- Anatomical Dictionary: Brain (MA:0000168)

22 Dagstuhl - 2007 Dilemma: No formal links currently exist between the separate ontologies
Solution?
1. Generate cross-products (compositional terms) as necessary for annotations of characteristics of disease cases and disease models
2. Annotate specific instances of human cases and mouse models
3. Visualize and mine co-annotated data
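
The three steps can be illustrated with a small sketch. It assumes that a compositional term is recorded as a label plus the set of core-ontology term IDs it is built from, and that "mining" means finding human cases and mouse models whose annotations share core terms. The four core IDs are the ones shown on slide 21; the subject names, the composite labels, and the helper names `compose` and `co_annotated` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

# Core ontology terms taken from slide 21.
CORE = {
    "CHEBI:18243": "dopamine",
    "CL:0000700": "dopaminergic neuron",
    "GO:0007268": "synaptic transmission",
    "MA:0000168": "brain",
}


@dataclass(frozen=True)
class Composite:
    """A cross-product term: a label plus the core terms it is composed from."""
    label: str
    parts: frozenset


def compose(label, *core_ids):
    """Step 1: build a compositional term, rejecting IDs that are not core terms."""
    unknown = [c for c in core_ids if c not in CORE]
    if unknown:
        raise KeyError(f"not a known core term: {unknown}")
    return Composite(label, frozenset(core_ids))


# Step 2: annotate specific instances (hypothetical subjects and labels).
annotations = {
    ("mouse", "model-1"): [compose("hypothetical phenotype A", "CL:0000700", "MA:0000168")],
    ("mouse", "model-2"): [compose("hypothetical phenotype B", "CHEBI:18243")],
    ("human", "case-1"): [compose("hypothetical finding C", "CL:0000700", "GO:0007268")],
}


def co_annotated(annotations):
    """Step 3: for each human/mouse pair, the core terms their annotations share."""
    core_by_subject = {
        subject: set().union(*(c.parts for c in composites))
        for subject, composites in annotations.items()
    }
    shared = {}
    for a, b in combinations(sorted(core_by_subject), 2):
        if a[0] == b[0]:
            continue  # only compare across species
        common = core_by_subject[a] & core_by_subject[b]
        if common:
            shared[(a[1], b[1])] = common
    return shared


links = co_annotated(annotations)
print(links)
```

Keeping the core term IDs inside each compositional term is what lets annotations made with different vocabularies be compared: the match between `case-1` and `model-1` is found through the shared cell type, not through the labels.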

23 Dagstuhl - 2007

24 Abnormal neuron morphology

25 Dagstuhl - 2007

26

27

28 Next Steps
Perspective (views)
- Lung Cancer
  - Provide Disease Ontology
  - Build compositional view
- Mouse Data
  - Curate comprehensive annotations for genes implicated in lung phenotypes
- Human Data
  - Curate clinical data for ontology annotation
- Data Analysis
  - Use ontological structures to facilitate data exploration and hypothesis generation

29 Dagstuhl - 2007 Next conference? “enabling technologies for ontological access to clinical and animal model data” A hands-on problem solving workshop – a problem use case

30 Dagstuhl - 2007
- Mouse Genome Informatics, www.informatics.jax.org, Bar Harbor, Maine, USA. MGI projects are supported by NIH [NHGRI, NICHD, and NCI].
- Gene Ontology, www.geneontology.org. GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme.

