1 The Future of Biomedical Informatics Barry Smith University at Buffalo


1 The Future of Biomedical Informatics. Barry Smith, University at Buffalo. http://ontology.buffalo.edu/smith

2
1. Biomedical Informatics Needs Data
2. The Problem of Local Coding Schemes
3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia
4. Is SNOMED the Solution?
5. The Gene Ontology
6. The OBO Foundry
7. The National Center for Biomedical Ontology
8. Ontology in Buffalo

3 1. Biomedical Informatics Needs Data

4 Biomedical Informatics Needs Data. Four sides of the equation of translational medicine: biological data + clinical data; access + usability.

5 Problems of gaining access to clinical data
1. privacy, security, liability
2. incentives (value of data...)
3. costs (training...)

6 Making data (re-)usable through standards
Standards provide
– common structure and terminology
– single data source for review (less redundant data)
Standards allow
– use of common tools and techniques
– common training
– single validation of data

7 Problems with standards
– Not all standards are of equal quality
– Once a bad standard is set in stone you are creating problems for your children and for your children’s children
– Standards, especially bad standards, have costs

8 2. The Problem of Local Coding Schemes

9 Multiple kinds of data in multiple kinds of silos
– Lab / pathology data
– Clinical trial data, including regulatory data
– Electronic Health Record data
– Patient histories (free text)
– Medical imaging
– Microarray data
– Protein chip data
– Flow cytometry
– Mass spectrometry data
– Genotype / SNP data
– Mouse data, fly data, chicken data...

10 How to find your data? How to find other people’s data? How to reason with data when you find it? How to work out what data you do not have? How to understand the significance of your own data from 3 years ago?

11 3. NIH Policies for Data Reusability and the Growth of Clinical Research Consortia

12 Sharing Research Data: Investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why this is not possible (http://grants.nih.gov/grants/policy/data_sharing).

13 Program Announcement (PA) Number: PAR-07-425
Title: Data Ontologies for Biomedical Research (R01)
– NIH Blueprint for Neuroscience Research (http://neuroscienceblueprint.nih.gov/)
– National Cancer Institute (NCI) (http://www.cancer.gov)
– National Center for Research Resources (NCRR) (http://www.ncrr.nih.gov/)
– National Eye Institute (NEI) (http://www.nei.nih.gov/)
– National Heart Lung and Blood Institute (NHLBI) (http://http.nhlbi.nih.gov)
– National Human Genome Research Institute (NHGRI) (http://www.genome.gov)
– National Institute on Alcohol Abuse and Alcoholism (NIAAA) (http://www.niaaa.nih.gov/)
– National Institute of Biomedical Imaging and Bioengineering (NIBIB) (http://www.nibib.nih.gov/)
– National Institute of Child Health and Human Development (NICHD) (http://www.nichd.nih.gov/)
– National Institute on Drug Abuse (NIDA) (http://www.nida.nih.gov/)
– National Institute of Environmental Health Sciences (NIEHS) (http://www.niehs.nih.gov/)
– National Institute of General Medical Sciences (NIGMS) (http://www.nigms.nih.gov/)
– National Institute of Mental Health (NIMH) (http://www.nimh.nih.gov/)
– National Institute of Neurological Disorders and Stroke (NINDS) (http://www.ninds.nih.gov/)
– National Institute of Nursing Research (NINR) (http://www.ninr.nih.gov)
Release/Posted Date: August 3, 2007
Letters of Intent Receipt Date(s): December 18, 2007, August 18, 2008, December 22, 2009, and August 21, 2009 for the four separate receipt dates.

14 Purpose. Optimal use of informatics tools and resources [data sets] depends upon explicit understandings of concepts related to the data upon which they compute. This is typically accomplished by a tool or resource adopting a formal controlled vocabulary and ontology... that describes objects and the relationships between those objects in a formal way.... this FOA solicits Research Project Grant (R01) applications from institutions/organizations that propose to develop an ontology that will make it possible for software to understand how two or more existing data sets relate to each other.

15 Currently, there is no convenient way to map the knowledge that is contained in one data set to that in another data set, primarily because of differences in language and structure.... in some areas there are emerging standards. Examples include: the Unified Medical Language System (UMLS), the Gene Ontology (http://www.geneontology.org/), the work supported by the caBIG project (https://cabig.nci.nih.gov/workspaces/VCDE/), ontologies listed at the Open Biomedical Ontology web site (http://obo.sourceforge.net/).
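The mapping problem described in this announcement can be illustrated with a minimal Python sketch. It assumes two sites that record the same conditions under different local codes, and it translates both into one shared vocabulary so the records can be pooled. The site codes, the `EX:` term identifiers, and the function names are all hypothetical and are not taken from UMLS, GO, caBIG, or OBO.

```python
from collections import defaultdict

# Hypothetical local coding schemes, each mapped to hypothetical shared term IDs.
SITE_A_TO_SHARED = {"MI": "EX:0001", "HTN": "EX:0002"}
SITE_B_TO_SHARED = {"heart attack": "EX:0001", "high blood pressure": "EX:0002"}


def normalize(records, mapping):
    """Translate (patient_id, local_code) pairs into (patient_id, shared_id) pairs.

    Records whose local code has no mapping are returned separately, so gaps
    in the mapping stay visible instead of being silently dropped.
    """
    mapped, unmapped = [], []
    for patient_id, local_code in records:
        if local_code in mapping:
            mapped.append((patient_id, mapping[local_code]))
        else:
            unmapped.append((patient_id, local_code))
    return mapped, unmapped


def pool_by_term(*mapped_sets):
    """Group patient IDs from several normalized data sets by shared term ID."""
    by_term = defaultdict(set)
    for records in mapped_sets:
        for patient_id, term_id in records:
            by_term[term_id].add(patient_id)
    return dict(by_term)


# Illustrative records only; "XYZ" is a local code with no shared equivalent.
site_a = [("a1", "MI"), ("a2", "HTN"), ("a3", "XYZ")]
site_b = [("b1", "heart attack"), ("b2", "high blood pressure")]

mapped_a, unmapped_a = normalize(site_a, SITE_A_TO_SHARED)
mapped_b, unmapped_b = normalize(site_b, SITE_B_TO_SHARED)
pooled = pool_by_term(mapped_a, mapped_b)
```

Once both sites are expressed in the shared identifiers, a query on one term returns patients from both data sets, which is the kind of cross-data-set reasoning the announcement asks ontologies to enable.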

16 This FOA will support limited awards, each of which focuses on integrating information between two (or a few very closely related) data sets in a single subject domain. The hope is that the developed vocabularies and ontologies will serve as nucleation points for other researchers in the area to build upon by adopting and extending the vocabularies and ontologies developed under this FOA. Applicants are expected to identify and adopt emerging standards (such as those listed above) whenever possible. Applicants are also strongly encouraged to federate their data under appropriate infrastructures when possible. One potential infrastructure is provided by the Biomedical Informatics Research Network (http://www.nbirn.net). The caBIG infrastructure (http://cabig.cancer.gov) is another well established infrastructure that researchers should consider.

17 NIH anticipates that once important data sets in a topical area have been unified that others in that area will adopt the emerging standard. The nucleation points should be able to interact with each other, e.g. through the use of tools that are made freely available to the research community, such as those created by the National Center for Biomedical Ontology (NCBO) (http://bioontology.org/) or by caBIG.

18 Another determinant of ontology acceptance is the degree to which the ontology conforms to best practices governing ontology design and construction. Criteria have been developed, and are undergoing empirical validation, by the Vocabulary and Common Data Element Work Group of caBIG. Other criteria have been specified by the OBO Foundry (http://obofoundry.org/). In this FOA, the applicant should specify the criteria with which the ontology will conform and the reasons that those criteria are relevant to the data sets being integrated by the proposed ontology.

19 Growth of Clinical and Translational Research Consortia
Examples:
– PharmGKB
– caBIG
– BIRN (Biomedical Informatics Research Network), including the BIRN Ontology Task Force

20 4. Is SNOMED the Solution?

21 [Figure: medical records, SNOMED codes] http://ontology.buffalo.edu/smith

22 The Systematized Nomenclature of Medicine
– built by College of American Pathologists
– now maintained by International Health Terminology Standards Development Organisation
– access via Virginia Tech SNOMED CT® Browser (http://snomed.vetmed.vt.edu/)
– (semi-) Open Source

23 SNOMED often includes non-perspicuous terms
– FullySpecifiedName: Coordination observable (observable entity)
– FullySpecifiedName: Coordination (observable entity)

24 and more:
– Self-control behavior: aggression (observable entity)
– Physical activity target light exercise (finding) is a type of physical activity finding (finding)

25 odd bunchings
– European is a ethnic group
– 6 Other European in New Zealand (ethnic group) is a ethnic group
– Mixed ethnic census group is a ethnic group
– Flathead is a ethnic group

26
– Poor modular development
– No clear strategy for improvement
– Difficult to use for coding
– A tax on world health information technology?

27 SNOMED embraces only some of the multiple kinds of siloed data
– Lab / pathology data
– Electronic Health Record data
– Patient histories
– Clinical trial data, including regulatory data
– Medical imaging
– Microarray data
– Protein chip data
– Flow cytometry
– Mass spectrometry data
– Genotype / SNP data
– Mouse data, fly data, chicken data...

28 5. The Gene Ontology


30 The Gene Ontology
– Open Source
– Cross-Species
– Impressive annotation resource
– Impressive policies for maintenance

31 How to do Biology across the Genome? Sequence of X chromosome in baker’s yeast:
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFES IPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVIS VMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVY TLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLER CHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKY GYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERL KRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRAC ALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVC KLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDD NNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGI SLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLK TLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPW MDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEY ATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSG TTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

32 MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG RFDILLCRDSSREVGE 32

33 what cellular component? what molecular function? what biological process?
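The three questions on this slide correspond to the three aspects under which the Gene Ontology annotates a gene product. A minimal Python sketch of that structure follows. The gene product names `geneX` and `geneY` are hypothetical, and the record layout is an assumption made for illustration, not the GO Consortium's annotation file format. The three term IDs are real GO terms.

```python
from dataclasses import dataclass

# The three GO aspects, one per question on the slide.
ASPECTS = ("cellular_component", "molecular_function", "biological_process")


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical identifier, for illustration only
    term_id: str       # GO term identifier
    term_label: str    # human-readable term name
    aspect: str        # one of ASPECTS


def by_aspect(annotations, gene_product):
    """Answer 'what component / function / process?' for one gene product."""
    answers = {aspect: [] for aspect in ASPECTS}
    for ann in annotations:
        if ann.gene_product == gene_product:
            answers[ann.aspect].append(ann.term_label)
    return answers


annotations = [
    Annotation("geneX", "GO:0005634", "nucleus", "cellular_component"),
    Annotation("geneX", "GO:0003677", "DNA binding", "molecular_function"),
    Annotation("geneX", "GO:0006281", "DNA repair", "biological_process"),
    Annotation("geneY", "GO:0005634", "nucleus", "cellular_component"),
]

profile = by_aspect(annotations, "geneX")
```

Here `profile` holds one list of term labels per aspect, so a raw sequence like the one on the preceding slides gains three structured, queryable descriptions.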

34 A strategy for translational medicine: Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers; using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequency and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74.

35 GO widely used: Sjöblom T, et al. analyzed 13,023 genes in 11 breast and 11 colorectal cancers; using functional information captured by GO for given gene product types, they identified 189 as being mutated at significant frequencies and thus as providing targets for diagnostic and therapeutic intervention. Science. 2006 Oct 13;314(5797):268-74. http://ontologist.com

36 Benefits of GO
1. links people to data
2. links data together
– across species (human, mouse, yeast, fly...)
– across granularities (molecule, cell, organ, organism, population)
3. links medicine to biological science
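The second benefit, linking data across species, follows from every species' gene products being annotated with the same term identifiers. The Python sketch below shows one way such a lookup might work. The gene names and the annotation tuples are hypothetical, and the index layout is an assumption for illustration rather than how any GO tool stores annotations.

```python
from collections import defaultdict

# (species, gene product, GO term ID); gene names are hypothetical.
ANNOTATIONS = [
    ("human", "hGeneA", "GO:0006281"),
    ("mouse", "mGeneA", "GO:0006281"),
    ("yeast", "yGeneA", "GO:0006281"),
    ("fly", "fGeneB", "GO:0003677"),
    ("human", "hGeneB", "GO:0003677"),
]


def index_by_term(annotations):
    """Build term ID -> species -> set of gene products."""
    index = defaultdict(lambda: defaultdict(set))
    for species, gene, term_id in annotations:
        index[term_id][species].add(gene)
    return index


def cross_species(index, term_id):
    """Return, per species, the gene products annotated with term_id."""
    return {
        species: sorted(genes)
        for species, genes in index.get(term_id, {}).items()
    }


index = index_by_term(ANNOTATIONS)
dna_repair = cross_species(index, "GO:0006281")
```

Because the shared term is the join key, a finding about a yeast gene product can be related to its human and mouse counterparts without any species-specific vocabulary.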

37 6. The OBO Foundry

38 a shared portal for (so far) 58 ontologies (low regimentation): http://obo.sourceforge.net → NCBO BioPortal 2003


40
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck

41 Building out from the original GO
GRANULARITY / RELATION TO TIME | CONTINUANT, INDEPENDENT | CONTINUANT, DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)

42 OBO Foundry Coordinators: Lewis (Berkeley), Ashburner (Cambridge), Smith (Buffalo), Mungall (Berkeley)

43 The goal: all biological (biomedical) research data should cumulate to form a single, algorithmically processible, whole. http://obofoundry.org

44 FOUNDRY CRITERIA
– The ontology is open and available to be used by all.
– The ontology is in, or can be instantiated in, a common formal language.
– The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

45 CRITERIA
– UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
– ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
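The orthogonality criterion can be read as a checkable property: no domain should be claimed by more than one controlled vocabulary. The Python sketch below flags domains that violate it. The registry contents are a simplified, hypothetical illustration (`LocalCellList` is an invented competing vocabulary, and the domain labels are assumptions), not the OBO Foundry's actual review process or data.

```python
from collections import defaultdict

# Hypothetical registry: ontology name -> set of domains it claims to cover.
REGISTRY = {
    "GO": {"cellular component", "molecular function", "biological process"},
    "CL": {"cell type"},
    "FMA": {"human anatomy"},
    "LocalCellList": {"cell type"},  # invented competitor, for illustration
}


def find_overlaps(registry):
    """Return each domain claimed by more than one ontology, with the claimants."""
    claimed = defaultdict(list)
    for ontology, domains in registry.items():
        for domain in domains:
            claimed[domain].append(ontology)
    return {
        domain: sorted(ontologies)
        for domain, ontologies in claimed.items()
        if len(ontologies) > 1
    }


overlaps = find_overlaps(REGISTRY)
```

An empty result would mean the registry is orthogonal in this simplified sense; each reported overlap marks a domain where the developers would need to converge on a single vocabulary.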

46 Consequences
– OBO Foundry is serving as a benchmark for improvements in discipline-focused terminology resources
– yielding calibration of existing terminologies and data resources and alignment of different views

47 Mature OBO Foundry ontologies (now undergoing reform)
– Cell Ontology (CL)
– Chemical Entities of Biological Interest (ChEBI)
– Foundational Model of Anatomy (FMA)
– Gene Ontology (GO)
– Phenotypic Quality Ontology (PaTO)
– Relation Ontology (RO)
– Sequence Ontology (SO)

48 Ontologies being built to satisfy Foundry principles ab initio
– Common Anatomy Reference Ontology (CARO)
– Ontology for Biomedical Investigations (OBI)
– Protein Ontology (PRO)
– RNA Ontology (RnaO)
– Subcellular Anatomy Ontology (SAO)

49 Ontologies in planning phase
– Biobank/Biorepository Ontology (BrO, part of OBI)
– Environment Ontology (EnvO)
– Immunology Ontology (ImmunO)
– Infectious Disease Ontology (IDO)

50 7. The National Center for Biomedical Ontology

51 NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
– Stanford Medical Informatics
– University of California, San Francisco Medical Center
– Berkeley Drosophila Genome Project
– Cambridge University Department of Genetics
– The Mayo Clinic
– University at Buffalo Department of Philosophy

52 8. Ontology in Buffalo

53 Ontology Research Group in CoE: Werner Ceusters, Louis Goldberg, Barry Smith, Robert Arp, Thomas Bittner, Maureen Donnelly, David Koepsell, Ron Rudnicki, Shahid Manzoor

54 Ontologies in Buffalo
– Common Anatomy Reference Ontology (CARO)
– Environment Ontology (EnvO)
– Foundational Model of Anatomy (FMA)
– Infectious Disease Ontology (IDO)
– MS Ontology
– Protein Ontology (PRO)
– Relation Ontology (RO)

55 Ontologies planned
– ICF Ontology
– Food Ontology
– Allergy Ontology
– Vaccine Ontology
– Ontology for Community-Based Medicine
– Psychiatry Ontology

