PATO An Ontology of Phenotypic Qualities

PATO An Ontology of Phenotypic Qualities George Gkoutos University of Cambridge

Phenotype Information Literature Qualitative descriptions Experimental data Quantitative descriptions Various representation methodologies Complex phenotype data Need for : “A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines” …..

Representation of Phenotypic data
Organism attributes:
  T – Species
  G – Genotype
  I – Strain
  S – Genotypic Sex
  A – Alleles at named loci
  E – Environmental/handling condition
  D – Age/stage of development
Assay: means of making observations
Phenotypic Character: any feature of the organism that is observed or 'assayed'.

Assay
Controlled Vocabulary
Abnormality
Relative_to
Ranges of values
Allows the schema to be dynamic
Definition of qualities and their relations
Explicit differences (between laboratories)
Allows labs around the world to “plug-in” their assays to the schema
[Diagram: an Assay linked to several Phenotypic Characters]

Phenotypic character representation methodologies: pre-composition
Examples: MGI mouse genotype-phenotype annotation (Mammalian Phenotype), Gramene trait annotation (Plant trait ontology), etc.
Pre-composition often follows the compositional structure occasionally adopted by GO terms:
  Positive/negative regulation of mitosis → positive/negative + regulation of mitosis (GO:0045839)
  Increased/decreased angiogenesis → increased/decreased + angiogenesis (GO:0001525)
Advantages: easy for annotation; control; complex phenotypic information
Disadvantages: lack of rigidity; ontology management; expansion; quantitative data

Methodologies (cont.): post-composition
The post-composition methodology describes a phenotype by naming the particular affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function, etc., and the qualities that this entity possesses, which can be described in either qualitative or quantitative terms.
Advantages: ontology management; rigidity; expansion; quantitative data; advanced queries
Disadvantages: complex phenotypic information; more difficult for annotation; need for constraints to ensure meaningful annotations

Phenotype And Trait Ontology (PATO)
An ontology of phenotypic qualities, which can be shared across different species and domains of knowledge.
Qualities are the basic entities that we can perceive and/or measure: colors, sizes, masses, lengths, etc.
Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists.
Qualities belong to a finite set of quality types (e.g. color, size) and inhere in specific individuals.
No two individuals can have the same quality, and each quality is specifically constantly dependent on the entity it inheres in.

[Diagram: PATO and the EQ model. A phenotypic character is given an EQ phenotype description that pairs an Entity (E) from the core ontologies (e.g. anatomy, behaviour, pathology) with a Quality (Q) from PATO, which is species independent.]

Simple phenotype descriptions
Phenotypic Character = entity + quality
  mouse body weight = mouse anatomy: body + PATO: weight
  eye colour = Drosophila anatomy: eye + PATO: colour
  glucose concentration = ChEBI: glucose + PATO: concentration
Example: "increased size hepatocellular carcinoma"
  hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)

Phenotype annotation model
The aim is to create tools that enable annotators to compose structured, computationally interpretable phenotype statements.
Model components: Genetic, Environment, Evidence, Qualifier, Assertion, Source, Entity, relationship, Quality.
An assertion is an attributed statement positing some relation(s) between entities, for instance this particular fruitfly participating in this particular flight from here to there (IDs are proxies for these). It is typically accompanied by associations to evidence-oriented entities and metadata. Examples:
  Shh participates_in heart development
  p53 implicated_in cancer
  p53 has_function DNA repair
  PMID:1234 mentions melanoma
  Abc[-] influences blood pressure
  Trial3456 has_inclusion_criteria (age that < 65)
Also recorded: properties, units, and attribution (who makes the assertion, when, and for what organization).

Annotation: Phenotypes in literature
Assertion: eya1 influences [E = eye disc (FBbt:00001768)] appears [Q = condensed (PATO:0001485)]
Evidence: light microscopy
Source: PMID:8431945
Date: 10/26/2007
Organization: FlyBase
Version: 1
M. Ashburner
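
The annotation above can be held as a single attributed record. The following is a minimal sketch only: the class name `PhenotypeAnnotation` and its field names are assumptions made for illustration and are not part of PATO, Phenote, or any FlyBase schema. The values are the ones shown on the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PhenotypeAnnotation:
    """One EQ assertion plus its attribution (hypothetical field names)."""
    subject: str          # gene or genotype the assertion is about
    relation: str         # relation between the subject and the EQ description
    entity_id: str
    entity_label: str
    quality_id: str
    quality_label: str
    evidence: str
    source: str
    asserted_on: date
    organization: str

    def statement(self) -> str:
        """Render the assertion as a compact EQ statement."""
        return (f"{self.subject} {self.relation} "
                f"E={self.entity_label} ({self.entity_id}) "
                f"Q={self.quality_label} ({self.quality_id})")

    def is_attributed(self) -> bool:
        """True when evidence, source, and organization are all recorded."""
        return all([self.evidence, self.source, self.organization])


annotation = PhenotypeAnnotation(
    subject="eya1", relation="influences",
    entity_id="FBbt:00001768", entity_label="eye disc",
    quality_id="PATO:0001485", quality_label="condensed",
    evidence="light microscopy", source="PMID:8431945",
    asserted_on=date(2007, 10, 26), organization="FlyBase",
)
print(annotation.statement())
```

Keeping the entity and quality as separate identifier fields is what lets the same record be compared later against annotations that use a different anatomy ontology.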

Quantitative Data PATO – part of a representation of qualitative phenotypic information More often than not it is important to record quantitative information that results from a specific measurement of a quality Measurements involve units (Phenotypic Character + Unit) The tail of my mouse is 2.1 cm

PATO & measurements
UO: an ontology of units
UO's top-level division is between primary base units of a particular measure and units that are derived from base units.
A mapping is provided between the various scalar qualities (such as weight, height, concentration, etc.) and the corresponding units used to measure those qualities.
UO includes 264 terms, all of which are defined.
Email list: http://sourceforge.net/mailarchive/forum.php?forum_id=50613

Mapping PATO to the UO

Linking quantitative data to qualitative descriptions
Measurement → qualitative description
Assay: range, normality, necessary & sufficient conditions
EQ descriptor → high level annotation marking phenodeviance (e.g. MP)
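
One way to read the step from a measurement to a qualitative description is as a comparison against the normal range that an assay declares. The sketch below assumes exactly that. The `AssayRange` class, the function name, and the normal range of 7.0 to 9.0 cm are all hypothetical values chosen for illustration, not data from the talk. Only the 2.1 cm tail measurement comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssayRange:
    """Normal range declared by an assay for one scalar quality (hypothetical)."""
    quality: str   # scalar quality being measured, e.g. "length"
    unit: str      # unit the range is expressed in
    low: float
    high: float


def qualitative_descriptor(value: float, unit: str, assay: AssayRange) -> str:
    """Map a measurement onto a qualitative description relative to the assay range."""
    if unit != assay.unit:
        raise ValueError(f"expected {assay.unit}, got {unit}")
    if value < assay.low:
        return f"decreased {assay.quality}"
    if value > assay.high:
        return f"increased {assay.quality}"
    return f"normal {assay.quality}"


# Assumed normal range for mouse tail length; not taken from the source.
tail_assay = AssayRange(quality="length", unit="cm", low=7.0, high=9.0)

print(qualitative_descriptor(2.1, "cm", tail_assay))  # decreased length
```

Because the range lives with the assay rather than with the quality, two laboratories can record the same measurement and still mark phenodeviance against their own baselines.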

Multiple phenotypic characters to describe complex phenotypes SHH-/+ SHH-/- shh-/+ shh-/-

Phenotype (character) = entity + quality
  P1 = eye + hypoteloric
  P2 = midface + hypoplastic
  P3 = kidney + hypertrophied
Entities come from ZFIN (eye, midface, kidney) and qualities from PATO (hypoteloric, hypoplastic, hypertrophied).
Phenotype (phenotypic profile) = P1 + P2 + P3 = holoprosencephaly
Each EAV set defines a phenotypic character (PC). By combining PCs we can build up a description that defines a complex syndrome or disease. As more is learned about a disease, PCs can be easily added or updated.
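
A phenotypic profile built this way behaves like a set of (entity, quality) pairs. The sketch below is illustrative only: the helper names `add_character` and `shared_characters` are assumptions, and the labels stand in for the ontology identifiers a real annotation would carry.

```python
# A phenotypic character is an (entity, quality) pair; a profile is a set of them.
holoprosencephaly = frozenset({
    ("eye", "hypoteloric"),
    ("midface", "hypoplastic"),
    ("kidney", "hypertrophied"),
})


def add_character(profile, entity, quality):
    """Return a new profile with one more phenotypic character."""
    return profile | {(entity, quality)}


def shared_characters(profile, observed):
    """Characters that two profiles have in common."""
    return profile & observed


# Build the profile up one character at a time, as the slide does.
observed = frozenset()
observed = add_character(observed, "eye", "hypoteloric")
observed = add_character(observed, "midface", "hypoplastic")

print(sorted(shared_characters(holoprosencephaly, observed)))
```

Adding the third character, `("kidney", "hypertrophied")`, makes the observed set equal to the full profile.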

Assays for complex phenotype data & quantitative data
[Diagram: an Assay linked to several Phenotypic Characters, with links labelled "necessary", "necessary & sufficient", and "phenodeviance"]

Linking qualitative descriptions across species Decomposition of precomposed phenotype ontologies by providing logical definitions based on PATO Link annotations across different knowledge domains and species Link phenotypic descriptions of human diseases to animal models

Reconciling pre- and post-composed annotations
Retrospective PATO definitions of pre-coordinated terms in phenotype ontologies
Precomposed ontologies: Mammalian Phenotype, Plant trait, Worm phenotype, etc.
OMIM

EQ definitions
Aristotelian definitions (genus-differentia): a <Q> *which* inheres_in an <E>

[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: low body weight
synonym: reduced body weight
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse

Phenotypic information captured differently within the same domain (OMIM)
OMIM is an extraordinarily valuable resource that consists of 18,344 records (all numbers are as of 01/01/08 and were obtained from the NCBI's Entrez server). However, the number of records with both a known sequence and phenotypic data is small: 386 (12,017 records have a known sequence). OMIM is, fundamentally, a text-based resource, and retrieval of information suffers from this fact, as the following Entrez searches show:

Query | # of records
"large bone" | 713
"enlarged bone" | 136
"big bones" | 16
"huge bones" | 4
"massive bones" | 28
"hyperplastic bones" | 8
"hyperplastic bone" | 34
"bone hyperplasia" | 122
"increased bone growth" | 543

Phenotypic information captured differently across different domains
MP:0001265 – decreased body size
MP:0001255 – decreased body height
WBPhenotype0000229 – small
OMIM %210710 – short stature

Logical definitions allow for cross-species and cross-domain links

[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height

[Term]
id: WBPhenotype0000229 ! small
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body
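
Once each precomposed term carries a logical definition, terms from different resources can be linked by the PATO quality they share. The sketch below shows that idea under stated assumptions: the dictionary layout and the function names are hypothetical, and a quality or bearer that the slide does not give for a term is recorded as `None` rather than guessed.

```python
from collections import defaultdict

# term id -> (PATO quality, bearer entity), as far as the slide states them.
LOGICAL_DEFS = {
    "MP:0001265": ("PATO:0000587", "MA:0002405"),    # decreased body size
    "MP:0001255": ("PATO:0000569", None),            # decreased body height
    "WBPhenotype0000229": (None, "WBls:0000041"),    # small
    "OMIM:xxxxxxx": ("PATO:0000569", "FMA:20394"),   # short stature
}


def group_by_quality(defs):
    """Collect the term ids that are defined with the same PATO quality."""
    groups = defaultdict(list)
    for term_id, (quality, _bearer) in defs.items():
        if quality is not None:
            groups[quality].append(term_id)
    return dict(groups)


def cross_resource_links(defs):
    """Keep only the qualities that connect two or more terms."""
    return {quality: sorted(terms)
            for quality, terms in group_by_quality(defs).items()
            if len(terms) > 1}


print(cross_resource_links(LOGICAL_DEFS))
```

With the four definitions above, the only shared quality is decreased height (PATO:0000569), which links the mouse term for decreased body height to the OMIM entry for short stature. Linking the bearers as well would need an anatomy bridge between MA and FMA.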

Suzie Lewis....

Experimental Design
Annotate 11 human disease genes, and their homologs
Develop search algorithm that utilizes the ontologies for comparison
Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
  alleles of the same gene
  homologs in different organisms
  members of a pathway (same organism)
  members of a pathway (other organisms)

Strategy for Annotation
Leverage OMIM gene and related disease records
Use FMA, CL, GO, EHDAA, CHEBI, PATO ontologies
Annotate 5 (in parallel) to check for curator consistency
Annotate fly & fish orthologs (FB, ZFA)
Import mouse ortholog data (MA, MP)

Testing the methodology
Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:

Gene | Disease
ATP2A1 | Brody Myopathy
EPB41 | Elliptocytosis
EXT2 | Multiple Exostoses
EYA1 | BOR syndrome
FECH | Protoporphyria
PAX2 | Renal-Coloboma Syndrome
SHH | Holoprosencephaly
SOX9 | Campomelic Dysplasia
SOX10 | Peripheral Demyelinating Neuropathy
TNNT2 | Familial Hypertrophic Cardiomyopathy
TTN | Muscular Dystrophy

Incomplete list of “syndromes”!!!

An OMIM Record

Annotation Results

Gene | # genotypes | phenotype statements (total) | average/allele
ATP2A1 | 5 | 16 | 3
EPB41 | 4 | 18 |
EXT2 | | 35 | 7
EYA1* | | 335 | 19
FECH | 14 | 37 |
PAX2* | 24 | 183 | 8
SHH | | 207 | 9
SOX9* | 13 | 321 | 23
SOX10* | 15 | 192 | 12
TNNT2 | 10 | 36 |
TTN | 21 | 63 |
Total (11) | 146 | 1443 |

This shows the results of the annotation effort. For the 11 genes we annotated 146 genotypes with a total of 1443 annotation statements. We performed 4 of these in triplicate (marked with an asterisk) to check for consistency. Without going into detail, the genes annotated in triplicate showed that the annotators produced more than 75% similar annotations. (We do not have time in the 15 minutes to go through this.)

Ontology-based similarity scoring
Measure the IC of any node, then compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.

First, the scoring metrics. There is information content (IC), and there are the IC ratios between things. Nodes are deemed similar on the basis of what they have in common: we look for similarity on the basis of shared annotations to classes in an ontology, or to compositional description classes. In these cases we used inferred annotations. For example, if geneA is annotated to Leg and geneB to Wing, they have Appendage in common.

Scoring is typically a measure of what the nodes have in common versus what one node has that the other does not. The basicSimilarityScore (also known as class overlap) is the ratio of nodesInCommon to nodesInUnion. Recall that this includes inferred annotations, which is desirable for two reasons: it allows approximate matching for non-exact classes, and it penalises general matches in favour of specific matches. The information content of a class is a measure of how "surprised" we are to see it in an annotation.

The pre-reasoned results are essential for finding nodesInCommon: annotations do not necessarily match exactly, and may match further up the graph, so we do not report or double-count nodes that subsume existing nodes.
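
The two scores can be sketched on the Leg/Wing example. Several things here are assumptions rather than the talk's definitions: the toy hierarchy, the third gene annotated to Eye, the function names, the use of -log2 of annotation frequency as the information content, and the reading of the "IC ratio" as summed IC of shared classes over summed IC of all classes. The sketch also skips the step of dropping nodes that subsume existing nodes.

```python
import math

# Toy is_a hierarchy (hypothetical): child -> set of parents.
PARENTS = {
    "Leg": {"Appendage"},
    "Wing": {"Appendage"},
    "Eye": {"Anatomical structure"},
    "Appendage": {"Anatomical structure"},
    "Anatomical structure": set(),
}


def with_ancestors(cls):
    """Return the class together with all of its ancestors."""
    seen, stack = set(), [cls]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


def profile(direct_annotations):
    """Inferred annotation profile: direct classes plus their ancestors."""
    result = set()
    for cls in direct_annotations:
        result |= with_ancestors(cls)
    return result


def basic_similarity(a, b):
    """Class overlap: nodes in common divided by nodes in the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def information_content(cls, all_profiles):
    """Assumed IC: -log2 of the fraction of profiles that contain the class."""
    hits = sum(1 for p in all_profiles if cls in p)
    if hits == 0:
        raise ValueError(f"{cls} does not occur in any profile")
    return -math.log2(hits / len(all_profiles))


def ic_ratio(a, b, all_profiles):
    """Assumed IC ratio: IC of shared classes over IC of all classes."""
    common = sum(information_content(c, all_profiles) for c in a & b)
    union = sum(information_content(c, all_profiles) for c in a | b)
    return common / union if union else 0.0


gene_a = profile({"Leg"})
gene_b = profile({"Wing"})
gene_c = profile({"Eye"})
profiles = [gene_a, gene_b, gene_c]

print(basic_similarity(gene_a, gene_b))  # 0.5
print(ic_ratio(gene_a, gene_b, profiles))
print(ic_ratio(gene_a, gene_c, profiles))
```

The root class appears in every profile, so its information content is zero and it contributes nothing to the IC ratio. That is the sense in which general matches are penalised in favour of specific ones.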

Ontology-based Search Algorithm
Given a query node q, we try to find hits h1, h2, ... that are of the same type as q and are similar to q in terms of their annotation profile, A(q).
First step: create an annotation profile for the thing to be searched (i.e., a gene).
The annotation profile is the set of classes used to annotate that entity, and their ancestors, via some relevant relation(s):
  c ∈ A(q) iff link(r,q,c)
link(r,q,c) may be computed via reasoning. For example:
  link(influences,sox9,curvature-of-tibia) → link(influences,sox9,morphology-of-bone)
Annotation profiles are compared using the same IC similarity metric. Given that we can compute the IC ratios between any two things, we can certainly do this for the phenotypic profiles of any two genes.
Candidate hits are prioritized according to how close they are to the query profile. They are ordered in descending order by | A(h) ∩ A(q) |, and the first N are chosen as the final set.
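
The search steps can be sketched as follows. This is an illustration under assumptions, not the implementation used in the study: the toy hierarchy, the genes `gene_x` and `gene_y`, and the function names are hypothetical. Only the sox9 link from curvature-of-tibia to morphology-of-bone comes from the slide, and candidates are ranked by plain profile overlap rather than by an IC-weighted score.

```python
# Toy class hierarchy (hypothetical): class -> set of parent classes.
PARENTS = {
    "curvature-of-tibia": {"morphology-of-bone"},
    "morphology-of-bone": {"morphology"},
    "colour-of-eye": {"colour"},
    "morphology": set(),
    "colour": set(),
}

# Direct "influences" annotations; gene_x and gene_y are invented for the example.
DIRECT = {
    "sox9": {"curvature-of-tibia"},
    "gene_x": {"morphology-of-bone"},
    "gene_y": {"colour-of-eye"},
}


def annotation_profile(node):
    """A(node): the directly annotated classes plus all of their ancestors."""
    seen, stack = set(), list(DIRECT[node])
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(PARENTS[cls])
    return seen


def search(query, n):
    """Return up to n hits ordered by descending |A(h) ∩ A(q)|."""
    a_q = annotation_profile(query)
    scored = []
    for hit in DIRECT:
        if hit == query:
            continue
        overlap = len(annotation_profile(hit) & a_q)
        if overlap > 0:
            scored.append((overlap, hit))
    # Sort by overlap (largest first), then by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [hit for _, hit in scored[:n]]


print(sorted(annotation_profile("sox9")))
print(search("sox9", 5))
```

Here `gene_x` is returned because its profile shares morphology-of-bone and morphology with sox9, even though the two genes have no direct annotation in common.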

Yes, we can find alleles of same gene
Column headings, left to right: gene; # genotypes; allelic phenotype profiles (# alleles with >0 sim ratio, average sim ratio, average IC ratio); phenotype statements (total, average/allele)
ATP2A1 5 0.8 0.799 16 3
EPB41 4 0.315 0.422 18
EXT2 1 35 7
EYA1* 0.226 0.229 335 19
FECH 14 0.365 0.364 37
PAX2* 24 0.068 0.063 183 8
SHH 0.457 0.414 207 9
SOX9* 13 0.207 0.197 321 23
SOX10* 15 0.038 0.031 192 12
TNNT2 10 0.517 0.505 36
TTN 21 0.106 0.1 63
Total (11) 146 142 1443
Those with asterisks (*) were done in triplicate.
The take-home message is that for all 11 genes tested, nearly all alleles (with the exception of two) could be used in a pairwise search to find the other alleles of the same gene (shown in bold). YES WE CAN!!!

UBERON: an anatomical linking ontology
Each organism has its own anatomical ontology.
To connect annotations across species, we need a way to link the anatomies.
We wanted an ontology that incorporated both functional homology and anatomical similarity.
We created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt.

To enable queries across annotations that use different anatomical ontologies, we needed a way to connect them together. We created an “uber” anatomy ontology that brings together the anatomical parts from the different anatomy ontologies. When used in our searches, the annotations to individual anatomy terms, like fish eye and human eye, can be linked together through a common “uber” eye.

UBERON connects phenotype entities from separate anatomy ontologies
The entities to which annotations were made in mouse, human, and zebrafish are shown in orange. The links between the ontology terms were then made with the aid of the UBERON ontology: each of the annotated entities can be linked through the UBERON:forebrain term.

Homologs are found by similarity search

Gene | simIC human/mouse | simIC human/zebrafish
ATP2A1 | 0.047 | 0.177
EPB41 | 0.328 | 0.141
EXT2 | 0.067 | 0.050
EYA1 | 0.264 | 0.495
FECH | 0.430 | 0.101
PAX2 | 0.157 | 0.375
SHH | 0.091 | 0.253
SOX9 | 0.226 | 0.383
SOX10 | 0.380 | 0.443
TNNT2 | 0.000 | 0.118
TTN | 0.248 | 0.567

Using the UBERON connections, we are able to find homologs of each of the human disease genes in mouse and zebrafish. Here, we show the similarity ratio based on information content between the human-mouse and human-zebrafish homologous gene pairs. The phenotypic profiles for each gene represent a consolidation (promotion) of the phenotypic description EQs. Interesting things are suggested here. It is possible that some of the zebrafish homologs (EYA1, PAX2, SHH, SOX9, SOX10, TTN) might make better models than the mouse homologs for the diseases caused by the human genes.

shha is phenotypically similar to homologous pathway members

zebrafish shh pathway | mouse homologs | human homologs
shha | Shh | SHH
smo | Smo |
disp1 | Disp1 |
prdm1a | Prdm1 |
hdac1 | | HDAC4
scube2 | |
wnt11 | Wnt1, 7b, 3a, 9b, 10b | WNT6
gli1,2a | Gli2, Gli3 | GLI2
bmp2b | Bmp4 |
ndr1,2 | | NDRG1
hhip | Hhip |
ptc1,ptc2 | Ptch1,2 |
 | Rab23 |
 | Gas1 |
 | Nck1 |
 | Zic2 |
notch1a | Notch1,2 |
 | Gsk3b |

This table shows the list of genes known to be involved in the shh pathway that were retrieved with a similarity search using the zebrafish shh as bait. The list of zf genes is like that in the earlier slide. The mouse and human homologs are also indicated. For some, the mouse/human homologs were retrieved when the zf genes were not. This could be for several reasons. The biggest is that much of the knowledge of the zebrafish pathway members comes from morpholino experiments, and the morpholino data was not included in our initial analyses. One of the next steps is for us to include the morpholino data and redo this search. Many of the human homologs are also not annotated. The missing annotations for the human disease genes are therefore a significant deficiency, and filling them in would provide a much-needed resource for biological research. The next slide shows how these genes fall in the shh pathway.

Zebrafish SHH signaling pathway
The picture is from KEGG. Their model includes the known members of the human HH signaling pathway. Additional genes known to be involved in the zebrafish signaling pathway have been added (gli1, gli2a, hdac1, prdm1a, bmp2b, disp1, ndr2, scube2). Ptc and Smo are transmembrane proteins thought to form a receptor complex for the Hh ligand (7, 8), and the Gli zinc-finger transcription factors have been demonstrated to have both activating and inhibitory roles in the Hh pathway (9–13). A second Ptc gene has been isolated, Ptch-2, which encodes a putative receptor for Shh (14, 15).

Potential candidates also found

Gene | Similarity | Characterization
dharma | 0.483 | Paired type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
tbx16 | 0.401 | T-box transcription factor regulates mesenchyme to epithelial transition and LR patterning.
plod3 | 0.387 | Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
ntl | 0.382 | T-box transcription factor important for notochord and mesoderm development.
kny | 0.374 | Glypican component of the wnt/PCP pathway.
tll1 | 0.372 | Metalloprotease that can cleave Chordin and increase Bmp activity.
copa | | Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
sfpq | 0.369 | RNA splicing factor required for cell survival and neuronal development.
lama1 | | Basement membrane protein important for eye and body axis development.
lamc1 | 0.367 | Basement membrane protein important for eye development.
atp7a | 0.365 | Copper transporting ATPase.
atp2a1 | 0.363 | Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
flh | 0.358 | Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
wnt5b | 0.327 | Extracellular cysteine rich glycoprotein required for convergent extension movements during posterior segmentation.

In addition to the known pathway members, there were many more as-yet-unlinked genes found with similar phenotypes to shha. These represent potential pathway candidates. Here we have summarized some likely candidates based on their characterization. This is where the real power of this method comes in: discovery!

Results thus far Annotate 11 human disease genes, and their homologs Develop search algorithm that utilizes the ontologies for comparison Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…” alleles of the same gene homologs in different organisms members of a pathway (same organism) members of a pathway (other organisms)

Conclusions Ontologies help Promising new directions for ontology-based phenotype annotation Promising ways for identifying novel pathway members, generating hypotheses to test at the bench

Acknowledgements NCBO-Berkeley Christopher Mungall Nicole Washington Mark Gibson Rob Bruggner U of Oregon Monte Westerfield Melissa Haendel Cambridge Michael Ashburner George Gkoutos (PATO) David Osumi-Sutherland National Institutes of Health