1 Using Ontologies for Annotation of Genomic Data Barry Smith University at Buffalo

Presentation transcript:

1 Using Ontologies for Annotation of Genomic Data Barry Smith University at Buffalo

Outline
1. Who am I?
2. How to find your data
3. How to do biology across the genome
4. How to extend the GO methodology to clinical and translational medicine
5. Anatomy Ontologies: An OBO Foundry success story
6. The Infectious Disease Ontology
7. Towards a controlled vocabulary for community-based medicine
8. The Community Ontology and its branches
9. The Environment Ontology: A new type of patient data


4 Who am I?
NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
− Stanford Medical Informatics
− University of California, San Francisco Medical Center
− Berkeley Drosophila Genome Project
− Cambridge University Department of Genetics
− The Mayo Clinic
− University at Buffalo (PI of Dissemination and Ontology Best Practices)

5 Who am I?
− Duke/Dallas CTSA Ontology Consortium
− Cleveland Clinic Semantic Database in Cardiothoracic Surgery
− Gene Ontology Scientific Advisory Board
− Biomedical Informatics Research Network (BIRN) Ontology Task Force
− Advancing Clinico-Genomic Trials on Cancer (ACGT)

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

7 Multiple kinds of data in multiple kinds of silos
− Lab / pathology data
− Electronic Health Record data
− Clinical trial data
− Patient histories
− Medical imaging
− Microarray data
− Protein chip data
− Flow cytometry
− Mass spec
− Genotype / SNP data

8 How to find your data?
How to find and integrate other people’s data?
How to reason with data when you find it?
How to understand the significance of the data you collected 3 years earlier?
Part of the solution must involve consensus-based, standardized terminologies and coding schemes.

9 Making data (re-)usable through standards
Standards provide
– common structure and terminology
– single data source for review (less redundant data)
Standards allow
– use of common tools and techniques
– common training
– single validation of data

10 Problems with standards
Standards involve considerable costs of retooling, maintenance, training,...
Not all standards are of equal quality
Bad standards create lasting problems

11 NIH Mandates for Sharing of Research Data
Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing.

12 Program Announcement Number: PAR
Title: Data Ontologies for Biomedical Research (R01)
NIH Blueprint for Neuroscience Research
National Cancer Institute (NCI)
National Center for Research Resources (NCRR)
National Eye Institute (NEI)
National Heart Lung and Blood Institute (NHLBI)
National Human Genome Research Institute (NHGRI)
National Institute on Alcohol Abuse and Alcoholism (NIAAA)
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
National Institute of Child Health and Human Development (NICHD)
National Institute on Drug Abuse (NIDA)
National Institute of Environmental Health Sciences (NIEHS)
National Institute of General Medical Sciences (NIGMS)
National Institute of Mental Health (NIMH)
National Institute of Neurological Disorders and Stroke (NINDS)
National Institute of Nursing Research (NINR)

13 Purpose. Optimal use of informatics tools and data resources depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology.

14 Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure... in some areas there are emerging standards. Examples include: the Unified Medical Language System (UMLS), the Gene Ontology, the caBIG project, Open Biomedical Ontologies (OBO)

15 NIH anticipates that, once important data sets in a topical area have been unified, others in that area will adopt the emerging standard. The nucleation points should be able to interact with each other, e.g. through the use of the tools made freely available by the National Center for Biomedical Ontology (NCBO) or by caBIG.

16 Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction.... the applicant should specify the criteria with which the ontology will conform. Criteria have been developed by the Vocabulary and Common Data Element Work Group of caBIG and by the OBO Foundry.

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV How to do biology across the genome?

MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 19

20 what cellular component? what molecular function? what biological process? through annotation of data

21 what cellular component? what molecular function? what biological process? and through curation of literature

22 what cellular component? what molecular function? what biological process? three types of data
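The three questions on these slides can be pictured as three kinds of annotation attached to a gene product. The following is a minimal sketch only: the record layout, the names `geneA` and `geneB`, and the terms assigned to them are illustrative assumptions, not data from the Gene Ontology annotation files.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_product: str  # hypothetical identifier, e.g. "geneA"
    aspect: str        # "cellular_component", "molecular_function" or "biological_process"
    term: str          # ontology term label


# Illustrative records only (assumed for this sketch).
ANNOTATIONS = [
    Annotation("geneA", "molecular_function", "protein binding"),
    Annotation("geneA", "biological_process", "Notch signaling pathway"),
    Annotation("geneA", "cellular_component", "plasma membrane"),
    Annotation("geneB", "molecular_function", "protein binding"),
]


def terms_by_aspect(gene_product, annotations):
    """Group the terms annotated to one gene product under the three aspects."""
    grouped = defaultdict(list)
    for a in annotations:
        if a.gene_product == gene_product:
            grouped[a.aspect].append(a.term)
    return dict(grouped)


def gene_products_with(term, annotations):
    """Find every gene product annotated with a given term."""
    return sorted({a.gene_product for a in annotations if a.term == term})


print(terms_by_aspect("geneA", ANNOTATIONS))
print(gene_products_with("protein binding", ANNOTATIONS))
```

Because every record uses the same controlled terms, the second query works across gene products regardless of which database or species the records came from.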

23 [Figure, Clark et al., 2005. Legend: part_of, is_a]


The Gene Ontology 25

WormBase Gramene FlyBase Rat Genome Database DictyBase Mouse Genome Database The Arabidopsis Information Resource The Zebrafish Information Network Berkeley Drosophila Genome Project Saccharomyces Genome Database Gene Ontology Consortium

27 Benefits of GO
1. rooted in basic experimental biology
2. links people to data and to literature
3. links data to data
   across species (human, mouse, yeast, fly...)
   across granularities (molecule, cell, organ, organism, population)
4. links medicine to biological science
5. cumulation of scientific knowledge in algorithmically tractable form

A strategy for translational medicine
Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers. Using functional information captured by GO, they identified 189 genes as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention.
Science Oct 13;314(5797):


1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine: Open Biomedical Ontologies 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

31
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | - | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

32 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | -
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

33 [Figure, Clark et al., 2005. Legend: part_of, is_a]

Goal of the OBO Foundry: all biomedical research data should cumulate to form a single, algorithmically processable whole.
Smith, et al. Nature Biotechnology, Nov

35 OBO FOUNDRY CRITERIA
– The ontology is open and available to be used by all.
– The ontology is instantiated in a common formal language and shares a common formal architecture.
– The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

36 CRITERIA
– The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
– They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.

37 Mature OBO Foundry ontologies Cell Ontology (CL) Chemical Entities of Biological Interest (ChEBI) Foundational Model of Anatomy (FMA) Gene Ontology (GO) Phenotypic Quality Ontology (PaTO) Relation Ontology (RO) Sequence Ontology (SO)

38 Foundry ontologies being built ab initio Common Anatomy Reference Ontology (CARO) Ontology for Biomedical Investigations (OBI) Protein Ontology (PRO) RNA Ontology (RnaO) Subcellular Anatomy Ontology (SAO)

39 Ontologies in planning phase Environment Ontology (EnvO) Infectious Disease Ontology (IDO) Biobank/Biorepository Ontology Food Ontology Allergy Ontology Vaccine Ontology

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

Anatomy Ontologies Fish Multi-Species Anatomy Ontology (NSF funding received) Ixodidae and Argasidae (Tick) Anatomy Ontology Mosquito Anatomy Ontology (MAO) Spider Anatomy Ontology (SPD) Xenopus Anatomy Ontology (XAO) undergoing reform: Drosophila and Zebrafish Anatomy Ontologies 41

42 Ontologies facilitate grouping of annotations
brain: 20 annotations
hindbrain: 15 annotations
rhombomere: 10 annotations
Query brain without ontology: 20
Query brain with ontology: 45
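The arithmetic on this slide can be reproduced with a short sketch. It assumes, as the slide implies but does not state, that rhombomere is part_of hindbrain and hindbrain is part_of brain; the function names are hypothetical and chosen for illustration.

```python
# part_of links assumed from the slide (child -> parent).
PART_OF = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Number of annotations attached directly to each term (from the slide).
DIRECT_ANNOTATIONS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def is_part_of(term, ancestor):
    """Return True if term equals ancestor or is a transitive part of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PART_OF.get(term)
    return False


def count_without_ontology(query):
    """Count only the annotations made directly to the query term."""
    return DIRECT_ANNOTATIONS.get(query, 0)


def count_with_ontology(query):
    """Count annotations to the query term and to all of its parts."""
    return sum(
        n for term, n in DIRECT_ANNOTATIONS.items() if is_part_of(term, query)
    )


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```

The ontology-aware query follows part_of links transitively, so annotations made to hindbrain and rhombomere are returned for a query on brain without any curator having to re-annotate them.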

Multiple axes of classification Functional: cardiovascular system, nervous system Spatial: head, trunk, limb Developmental: endoderm, germ ring, lens placode Structural: tissue, organ, cell Stage: developmental staging series 43

CARO – Common Anatomy Reference Ontology for the first time provides guidelines for model organism researchers who wish to achieve comparability of annotations Haendel et al., “CARO: The Common Anatomy Reference Ontology”, in: Burger (ed.), Anatomy Ontologies for Bioinformatics: Springer, in press. 44


1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.IDO: The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

We have data TBDB: Tuberculosis Database, including Microarray data VFDB: Virulence Factor DB TropNetEurop Dengue Case Data ISD: Influenza Sequence Database at LANL PathPort: Pathogen Portal Project... 47

48 We need to annotate this data to allow retrieval and integration of
– sequence and protein data for pathogens
– case report data for patients
– clinical trial data for drugs, vaccines
– epidemiological data for surveillance, prevention
– ...
Goal: to make data deriving from different sources comparable and computable

IDO needs to work with Disease Ontology (DO) + SNOMED CT Gene Ontology Immunology Branch Phenotypic Quality Ontology (PATO) Protein Ontology (PRO) Sequence Ontology (SO)... 49

50 We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation
What content is needed to adequately cover the infectious domain?
– Host-related terms (e.g. carrier, susceptibility)
– Pathogen-related terms (e.g. virulence)
– Vector-related terms (e.g. reservoir, ...)
– Terms for the biology of disease pathogenesis (e.g. evasion of host defense)
– Population-level terms (e.g. epidemic, endemic, pandemic, ...)

IDO Processes 51

IDO Qualities 52

IDO Roles 53

54 IDO provides a common template
IDO works like CARO. It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission.
Disease- and organism-specific ontologies are built as refinements of the IDO core.

55 Malaria
Vectors: of 422 species of Anopheles worldwide, about 40 are significant vectors for malaria in humans.
The IDO Malaria ontology will contain those terms which apply to all types of malarial plasmodium infection.

56 Disease-specific IDO test projects
MITRE, Mount Sinai, UT Southwestern – Influenza
– Stuart Sealfon, Joanne Luciano
IMBB/VectorBase – Vector-borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
– Kristos Louis
Colorado State University – Dengue Fever
– Saul Lozano-Fuentes
Duke – Tuberculosis
– Carol Dukes-Hamilton
Cleveland Clinic – Infective Endocarditis
– Sivaram Arabandi
University of Michigan – Brucellosis
– Yongqun He

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

58 All OBO Foundry ontologies work in the same way
– we have data (biosample, haplotype, clinical data, survey data,...)
– we need to make this data available for semantic search and algorithmic processing
– we create a consensus-based ontology for annotating the data


to enhance alignment of data about instances (communities, places,...) 63

to enhance alignment of data about relevant types of entities (origin, community, cell type, race, family...) 64


to enhance coordination of research 66

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

Community / Population Ontology 68 − family, clan − ethnicity − religion − diet − social networking − education (literacy...) − healthcare (economics...) − household forms − demography − public health −...

69 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | -
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

70 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Family, Community, Deme, Population; Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | -
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

71 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) | -
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

1.Who am I? 2.How to find your data 3.How to do biology across the genome 4.How to extend the GO methodology to clinical and translational medicine 5.Anatomy Ontologies: An OBO Foundry success story 6.The Infectious Disease Ontology 7.Towards a controlled vocabulary for community-based medicine 8.The Community Ontology and its branches 9.The Environment Ontology: A new type of patient data

73 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Population Phenotype | Population Process
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Cellular Function (GO) | -
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
ENVIRONMENT (label alongside the table)

74 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | ENVIRONMENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment

75 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | CONTINUANT: INDEPENDENT | ENVIRONMENT
COMPLEX OF ORGANISMS | Family, Community, Deme, Population | Environment of population
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); (FMA, CARO) | Environment of single organism*
CELL AND CELLULAR COMPONENT | Cell (CL); Cell Component (FMA, GO) | Environment of cell
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular environment
* The sum total of the conditions and elements that make up the surroundings and influence the development and actions of an individual.

76 RELATION TO TIME (columns), GRANULARITY (rows)
GRANULARITY | ENVIRONMENT (CONTINUANT: INDEPENDENT)
COMPLEX OF ORGANISMS | biome / biotope, territory, habitat, neighborhood,...
ORGAN AND ORGANISM | work environment, home environment; host/symbiont environment;...
CELL AND CELLULAR COMPONENT | extracellular matrix; chemokine gradient;...
MOLECULE | hydrophobic surface; virus localized to cellular substructure; active site on protein; pharmacophore...

77 clinical data includes
– clinical records
– clinical trial data
– demographic data
– National Hospital Discharge Survey
– National Ambulatory Medical Care Surveys
– MEDPAR: Medicare’s national claims data base

78 The Environment Ontology
OBO Foundry
Genomic Standards Consortium
Natural Environment Research Council (UK)
USDA, Gramene, J. Craig Venter Institute...

79 Applications of EnvO in biology

80 How EnvO currently works for information retrieval
Retrieve all experiments on organisms obtained from:
– deep-sea thermal vents
– arctic ice cores
– rainforest canopy
– alpine melt zone
Retrieve all data on organisms sampled from:
– hot and dry environments
– cold and wet environments
– a height above 5,000 meters
Retrieve all the omic data from soil organisms subject to:
– moderate heavy metal contamination
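Queries of this kind can be sketched as filters over samples annotated with environment terms. The is_a links, sample records, field names and function names below are all hypothetical, invented for illustration; they are not taken from EnvO or from any real data set.

```python
# Hypothetical is_a links (child -> parent); not EnvO's actual hierarchy.
IS_A = {
    "deep-sea thermal vent": "marine habitat",
    "arctic ice core": "cold environment",
    "alpine melt zone": "cold environment",
}

# Hypothetical sample records annotated with an environment term.
SAMPLES = [
    {"id": "s1", "environment": "deep-sea thermal vent", "altitude_m": -2500},
    {"id": "s2", "environment": "arctic ice core", "altitude_m": 10},
    {"id": "s3", "environment": "alpine melt zone", "altitude_m": 5200},
]


def ancestors_and_self(term):
    """Return the term followed by all of its is_a ancestors."""
    chain = []
    while term is not None:
        chain.append(term)
        term = IS_A.get(term)
    return chain


def samples_from(environment):
    """Ids of samples annotated with the environment or any subtype of it."""
    return [
        s["id"] for s in SAMPLES
        if environment in ancestors_and_self(s["environment"])
    ]


def samples_above(height_m):
    """Ids of samples collected above the given height in meters."""
    return [s["id"] for s in SAMPLES if s["altitude_m"] > height_m]


print(samples_from("cold environment"))
print(samples_above(5000))
```

A query on the general term returns samples annotated with more specific terms, which is the same grouping effect that anatomy ontologies provide for part_of.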

81 extending EnvO to clinical and translational research
– we have public health, community and population data
– we need to make this data available for search and algorithmic processing
– we create a consensus-based ontology which can interoperate with ontologies for neighboring domains of medicine and basic biology

Environment = totality of circumstances external to a living organism or group of organisms –pH –evapotranspiration –turbidity –available light –predominant vegetation –predatory pressure –nutrient limitation … 82

83 extend EnvO to the clinical domain
– dietary patterns (Food Ontology: FAO, USDA)... allergies
– neighborhood patterns: built environment, living conditions; climate; social networking; crime, transport; education, religion, work; health, hygiene
– disease patterns: bio-environment (bacteriological,...); patterns of disease transmission (links to IDO)

a new type of patient data a patient’s environmental history use EnvO and the community ontology to mine relations between disease phenotypes and environmental patterns and patterns of community behavior 84

with thanks to
CARO: Fabian Neuhaus (NIST), Melissa Haendel (ZFIN), David Sutherland (FlyBase)
EnvO: Dawn Field, Norman Morrison (NERC)
IDO: Lindsay Cowell (Duke)
OBO Foundry: Michael Ashburner, Suzanna Lewis, Chris Mungall (FlyBase, GO), Alan Ruttenberg (MIT, Neurocommons)
NCBO: NIH RFA-RM
PRO: NIH R01 GM
ACGT: European Commission IST