@Interontology08, February 27, 2008 The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology Alan Ruttenberg Principal.


1 @Interontology08, February 27, 2008 The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology Alan Ruttenberg Principal Scientist

2 Weather conditions Open source ethic is mainstream Beginnings of a viable Semantic Web Funders: products of public science not optimally used Burgeoning quality-focused developer community

3 Initial standardizations OWL 1.0 (OWL 1.1 WG in progress) SPARQL Viable tools Scalable triple stores e.g. Virtuoso, Oracle… Reasoners: Pellet, FaCT++, CEL, QuOnto… Beginnings of a viable Semantic Web

4 Funders: Products of public science not optimally used Both government and philanthropies Data sharing mandates Open access publication mandates Recognition that Ontology can play key role (and funding) Wonderweb, NCBO, JCOR, (more in Europe, beginnings in Australia, China) E.g. NIH Ontology grants

5 Burgeoning quality-focused developer community W3C Semantic Web for Life Sciences Interest Group Brings together scientists, medical researchers, science writers and informaticians from academia, government, non-profit organizations - health care, pharmaceuticals and industry vendors Chartering of second phase in progress OBO Foundry Principle-based development of science-based ontologies with the goal of creating a suite of interoperable reference ontologies for biomedicine. Process and governance are being refined Groups are lining up to join

6 Some projects I’m involved in The challenge of data integration at Web scales The Neurocommons Collaborative Ontology Development OBI – The Ontology for Biomedical Investigations Identifying and working through aspects of Ontology Working with, and on, the Basic Formal Ontology What is a Gene Ontology Annotation?

7 The Neurocommons AddGene Plasmids NeuronDB BAMS Neurocommons text mining Homologene SWAN Entrez Gene Gene ontology annotations Mammalian Phenotype PDSPki BrainPharm AlzGene Antibodies PubChem MESH Reactome Allen Brain Atlas Publications CCDB Neuronbank OBO Ontologies NeuroMorpho SAO Coriell cells

8 What’s a (Science) Commons? Built on open resources: public domain, open databases, open literature Encoded in open architectures and technical standards

9 Science Commons Science Commons is a project of Creative Commons Creative Commons provides free tools that let authors, scientists, artists, and educators easily mark their creative work with the freedoms they want it to carry 140,000,000 objects on the Web under CC licenses in 40+ countries 700+ peer-reviewed journals carry CC licensing, including Public Library of Science Science Commons specializes CC to science For consumers of knowledge: make it easy to use and re-use information and increase chances for discovery For providers of knowledge: provide legal certainty and automated attribution and tracking For funders: provide new metrics for tracking return on investment based on re-use

10 Neurocommons approach From OBO Foundry: Carefully model biology to enable integration of data sources. “Audit trail to reality” From Web: Assign all biological entities URIs (lots already provided by OBO) and translate to OWL/RDF From OWL: Add triples inferred by a reasoner to increase the expressiveness of queries, even with a simple query engine From software engineering: Provide data via SPARQL first (API). Build tools on top of that. From open source movement: Make it freely available, reproducible
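The "From OWL" step above can be illustrated with a minimal sketch. It assumes the only inference being materialized is the transitivity of rdfs:subClassOf; a real reasoner infers far more. The `ex:` class names are hypothetical placeholders, while go:GO_0008150 is the Biological Process class used in the deck's own queries.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"


def materialize_subclass_closure(triples):
    """Return the input triples plus every subClassOf triple implied by transitivity."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)

    def ancestors(cls, seen):
        # Depth-first walk up the hierarchy; `seen` also guards against cycles.
        for parent in parents.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    inferred = set(triples)
    for cls in list(parents):
        for anc in ancestors(cls, set()):
            inferred.add((cls, SUBCLASS_OF, anc))
    return inferred


# Hypothetical asserted hierarchy (ex: names are illustrative only).
asserted = {
    ("ex:DendriteDevelopment", SUBCLASS_OF, "ex:NeuronProjectionDevelopment"),
    ("ex:NeuronProjectionDevelopment", SUBCLASS_OF, "go:GO_0008150"),
}
closed = materialize_subclass_closure(asserted)
```

Once the inferred triple is stored, a query engine with no reasoning support can find ex:DendriteDevelopment with a single-pattern query for subclasses of go:GO_0008150.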

11 The Gene Ontology The Gene Ontology names many biological processes and tells us which genes are known to be involved in those processes.

12 The Gene Ontology (a small portion) Activation of innate immune response Cell surface pattern recognition receptor signaling pathway Biological Process is_a part_of

13 A simple query: Biological processes in dendrites? Alzheimer’s disease is characterized by neural degeneration. Among other things, there is damage to dendrites and axons, parts of nerve cells. What resources do we have available to learn more about biological processes in dendrites?

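14 Biological processes naming dendrites
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?class ?definition
from <http://purl.org/commons/hcls/20070416>
where {
  graph <http://purl.org/commons/hcls/20070416/classrelations>
    {?class rdfs:subClassOf go:GO_0008150}
  ?class rdfs:label ?name.
  ?class obo:hasDefinition ?def.
  ?def rdfs:label ?definition
  filter(regex(?name,"[Dd]endrite"))
}
go:GO_0008150 is the URI for Biological Process (OBO Foundry principles guarantee unique names for each Universal)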

15 From the “console”

16 But answers are also available by a “GET”
/sparql/?query=PREFIX%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0APREFIX%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0APREFIX%20obo%3A%20%3Chttp%3A%2F%2Fwww.geneontology.org%2Fformats%2FoboInOwl%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0Aselect%20%20%3Fname%20%20%3Fclass%20%3Fdefinition%0Afrom%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%3E%0Awhere%0A%7B%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%3Fclass%20rdfs%3AsubClassOf%20go%3AGO_0008150%7D%0A%20%20%20%20%3Fclass%20rdfs%3Alabel%20%3Fname.%0A%20%20%20%20%3Fclass%20obo%3AhasDefinition%20%3Fdef.%0A%20%20%20%20%3Fdef%20rdfs%3Alabel%20%3Fdefinition%20%0A%20%20%20%20filter(regex(%3Fname%2C%22%5BDd%5Dendrite%22))%0A%7D%0A&format=&maxrows=50
So someone, somewhere else, can build something better
*Note: Different query than previous slide
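A minimal sketch of how such a GET request path might be built from a query string, using only the Python standard library. The `/sparql/` path and the `format` and `maxrows` parameters are copied from the slide; the host is not given there, so none is assumed here. The shortened query and the function name `sparql_get_path` are illustrative.

```python
from urllib.parse import quote, unquote

# A shortened, illustrative query (not the full query from the slide).
QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?class
where { ?class rdfs:label ?name. filter(regex(?name,"[Dd]endrite")) }"""


def sparql_get_path(query, endpoint_path="/sparql/", max_rows=50):
    """Percent-encode a SPARQL query into a GET path like the one on the slide."""
    # Parentheses are left unencoded, matching the slide's URL; everything else
    # outside letters, digits and "_.-~" is percent-encoded (space becomes %20).
    encoded = quote(query, safe="()")
    return f"{endpoint_path}?query={encoded}&format=&maxrows={max_rows}"


path = sparql_get_path(QUERY)

# Decoding the query parameter recovers the original text exactly.
encoded_part = path.split("?query=", 1)[1].split("&format=", 1)[0]
round_trip = unquote(encoded_part)
```

Because the whole query travels in the URL, anyone can bookmark, share, or script the request without special client software.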

17 Three levels of representing scientific knowledge Record level: Represent database records. Inconsistent if two sources disagree about contents of a field. Statement level: Represent what researchers say. Inconsistent if two people disagree about what a paper said Domain level: OBO Foundry approach. Represent your best understanding of consensus. Inconsistent if facts contradict. We need all three (but make clear which is which) Next slide query is hybrid of Record/Domain

18 A SPARQL query for processes involved in pyramidal neurons
prefix go: prefix rdfs: prefix owl: prefix mesh: prefix sc: prefix ro:
select ?genename ?processname
where {
  graph {
    ?paper ?p mesh:D017966.
    ?article sc:identified_by_pmid ?paper.
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article.
  }
  graph {
    ?protein rdfs:subClassOf ?res.
    ?res owl:onProperty ro:has_function.
    ?res owl:someValuesFrom ?res2.
    ?res2 owl:onProperty ro:realized_as.
    ?res2 owl:someValuesFrom ?process.
    graph {{?process go:GO_0007166} union {?process rdfs:subClassOf go:GO_0007166 }}
    ?protein rdfs:subClassOf ?parent.
    ?parent owl:equivalentClass ?res3.
    ?res3 owl:hasValue ?gene.
  }
  graph { ?gene rdfs:label ?genename }
  graph { ?process rdfs:label ?processname}
}
Mesh: Pyramidal Neurons
Pubmed: Journal Articles
Entrez Gene: Genes
GO: Signal Transduction
Inference required

19 Google: 223,000 results

20 Results
DRD1, 1812 | adenylate cyclase activation
ADRB2, 154 | adenylate cyclase activation
ADRB2, 154 | arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632 | dopamine receptor signaling pathway
DRD1, 1812 | dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 | dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 | G-protein coupled receptor protein signaling pathway
GNG3, 2785 | G-protein coupled receptor protein signaling pathway
GNG12, 55970 | G-protein coupled receptor protein signaling pathway
DRD2, 1813 | G-protein coupled receptor protein signaling pathway
ADRB2, 154 | G-protein coupled receptor protein signaling pathway
CALM3, 808 | G-protein coupled receptor protein signaling pathway
HTR2A, 3356 | G-protein coupled receptor protein signaling pathway
DRD1, 1812 | G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755 | G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543 | G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269 | G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362 | G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898 | glutamate signaling pathway
GRIN1, 2902 | glutamate signaling pathway
GRIN2A, 2903 | glutamate signaling pathway
GRIN2B, 2904 | glutamate signaling pathway
ADAM10, 102 | integrin-mediated signaling pathway
GRM7, 2917 | negative regulation of adenylate cyclase activity
LRP1, 4035 | negative regulation of Wnt receptor signaling pathway
ADAM10, 102 | Notch receptor processing
ASCL1, 429 | Notch signaling pathway
HTR2A, 3356 | serotonin receptor signaling pathway
ADRB2, 154 | transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793 | transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043 | transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902 | transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500 | Wnt receptor signaling pathway
Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity

21 What happens when data is discoverable, queryable, and accessible on the open web? Allen Brain Institute Servers Javascript SPARQL AJAX Query URL http://www.brainmap.org://….0205032816_B.aff/TileGroup3/1-0-1.jpg Google Maps API http://hcls1.csail.mit.edu/map/#Kcnip3@2850,Kcnd1@2800 Neurocommons Servers

22 Others can “view source”, use our code in their own applications

23 Background Technology So far about 350M triples in OpenLink Virtuoso (~20 GB) Commodity hardware: 2x2core duo/2 disks/8 GB RAM Biggest so far is MeSH associations to articles (200M triples) Smaller, from 10K to 10M triples/source A small fraction of biological knowledge (another element of the perfect storm is that computer hardware is so cheap and powerful)

24 Results are a success, but the process more so Sample of three interesting cases on the way to the Neurocommons Integration of SenseLab Finding and addressing inconsistency Modeling Gene Ontology Annotations

25 Process(1): NeuronDB Started with a homegrown ontology. Problem: How to link with anything else? E.g. no links to evidence, “receptors” versus proteins with receptor activity (like GOA). Process: iterate many times, fixing OWL, GO understanding/conformance, augmenting what is in the ontology. Ends with something that links with GO Function. Accepted process for how to move both NeuronDB and GO forward. Next slides – in detail how the discussion/teaching goes

26 Words mix up functions and objects Ligand Neurotransmitter Hormone Peptide Looking for peptides?

27 Foundry approach connects words to their corresponding entities in reality PeptideReceptorLigand - A peptide that has a function which makes it able to bind to a receptor PeptideNeurotransmitter - A peptide expressed in a neuron that has a function which makes it able to regulate another neuron PeptideHormone - A peptide that is produced in one organ and has a regulatory effect in another. Peptide - A “short” polymer of amino acids Looking for peptides?

28 Peptides from CHEBI Chemical Entities of Biological Interest

29 Hormone Activity from GO Molecular Function

30

31 Towards RDF/OWL (1) ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

32 Towards RDF/OWL (2) ALL instances of PeptideHormone are an instance of Peptide that has_role SOME instance of HormoneActivity

33 Towards RDF/OWL (3) - Instances

34 Towards RDF/OWL (4) URIs chebi:25905 =

35 Towards OWL (5) : triples chebi:25905 rdfs:subClassOf chebi:16670. chebi:25905 rdfs:subClassOf _:1. _:1 owl:onProperty ro:hasRole. _:1 owl:someValuesFrom go:GO_00179. …
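The translation from an ALL-SOME statement to triples can be sketched as a small helper. This is an illustration under stated assumptions, not the tooling used for the Neurocommons: the function name `all_some_triples` and the blank-node counter are hypothetical, and the identifiers are copied from the slide as written. A complete OWL serialization would also type the blank node as an owl:Restriction, which the slide elides with "…".

```python
import itertools

# Hypothetical source of fresh blank-node labels (_:1, _:2, ...).
_blank_ids = itertools.count(1)


def all_some_triples(subclass, superclass, prop, filler):
    """Triples for: ALL instances of `subclass` are instances of `superclass`
    that stand in `prop` to SOME instance of `filler`."""
    node = f"_:{next(_blank_ids)}"  # anonymous restriction class
    return [
        (subclass, "rdfs:subClassOf", superclass),
        (subclass, "rdfs:subClassOf", node),
        (node, "owl:onProperty", prop),
        (node, "owl:someValuesFrom", filler),
    ]


# PeptideHormone: a Peptide that has_role some HormoneActivity.
triples = all_some_triples("chebi:25905", "chebi:16670", "ro:hasRole", "go:GO_00179")
for s, p, o in triples:
    print(f"{s} {p} {o}.")
```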

36 SPARQLing: Put ?variables where you are looking for matches
chebi:25905 rdfs:subClassOf chebi:16670.
chebi:25905 rdfs:subClassOf _:1.
_:1 owl:onProperty ro:hasRole.
_:1 owl:someValuesFrom go:GO_00179.
select ?moleculeClass where {
  ?moleculeClass rdfs:subClassOf chebi:16670.
  ?moleculeClass rdfs:subClassOf ?res.
  ?res owl:onProperty ro:hasRole.
  ?res owl:someValuesFrom go:GO_00179.
}
?moleculeClass = chebi:25905
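To make the "put ?variables where you are looking for matches" idea concrete, here is a toy matcher for basic graph patterns. It is a sketch for illustration, not how a SPARQL engine such as Virtuoso is implemented: it scans every triple for every pattern and supports no filters, graphs, or unions. The function names are hypothetical; the triples and the query are the ones on the slide.

```python
def is_var(term):
    """A term is a variable if it starts with '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None if impossible."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_pattern(first, triple, binding)
        if extended is not None:
            yield from solve(rest, triples, extended)


TRIPLES = [
    ("chebi:25905", "rdfs:subClassOf", "chebi:16670"),
    ("chebi:25905", "rdfs:subClassOf", "_:1"),
    ("_:1", "owl:onProperty", "ro:hasRole"),
    ("_:1", "owl:someValuesFrom", "go:GO_00179"),
]
QUERY = [
    ("?moleculeClass", "rdfs:subClassOf", "chebi:16670"),
    ("?moleculeClass", "rdfs:subClassOf", "?res"),
    ("?res", "owl:onProperty", "ro:hasRole"),
    ("?res", "owl:someValuesFrom", "go:GO_00179"),
]
results = [b["?moleculeClass"] for b in solve(QUERY, TRIPLES)]
print(results)  # ['chebi:25905']
```

The second pattern first binds ?res to chebi:16670, which fails the third pattern, and then to the blank node, which satisfies the rest; this backtracking is what the shared variable names express.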

37 Process(2): Inconsistency! Once NeuronDB is coded properly and an OWL reasoner is run, the reasoner declares the ontology inconsistent. Problem: There are contradictory assertions about whether a particular ionic current occurs in a particular cell type. What to do? “Three levels of representing scientific knowledge” tells us how inconsistency arises at each level. Inconsistency is NOT acceptable, but might this be an issue of confusion over the desired level?

38 The dispute: Ionic current? Yes or No Another investigation One investigation Illustration – not the particular cell/current

39 Resolving the inconsistency If at the statement level, there need be no inconsistency if the assertions are qualified as being statements of someone. Choice 1: Rework the representation to make this so. If at the domain level, then only one can be right. Choice 2: As curator, make a judgement about which is right, or see whether information is missing from the representation that would keep this from being a contradiction. Resolution: Domain level is desired. Closer examination of the papers finds that the results come from different species. Example of “ontological commitment” and dealing with its consequences.
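The resolution can be sketched in a few lines: two assertions contradict each other only while the representation omits the field that distinguishes them. This is a minimal illustration with made-up data, in the spirit of the "not the particular cell/current" illustration earlier; all `ex:` identifiers and the function name `find_contradictions` are hypothetical.

```python
from collections import defaultdict


def find_contradictions(assertions, key_fields):
    """Group assertions by `key_fields` and report keys whose values disagree."""
    seen = defaultdict(set)
    for a in assertions:
        key = tuple(a[f] for f in key_fields)
        seen[key].add(a["present"])
    return sorted(key for key, values in seen.items() if len(values) > 1)


# Hypothetical findings from two investigations (placeholder identifiers).
assertions = [
    {"cell": "ex:CellTypeA", "current": "ex:CurrentX",
     "species": "ex:Species1", "present": True},
    {"cell": "ex:CellTypeA", "current": "ex:CurrentX",
     "species": "ex:Species2", "present": False},
]

# Without species in the representation, the two findings clash.
without_species = find_contradictions(assertions, ("cell", "current"))
# Once species is represented, each finding is about a different thing.
with_species = find_contradictions(assertions, ("cell", "current", "species"))
```

Here `without_species` contains the disputed cell/current pair and `with_species` is empty, which mirrors the curator's finding that the missing information was the species.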

40 Process(3): What is a GO Annotation

41 Problems with integrating annotations with other knowledge What are the entities? What are the relationships between the process and the entities? How can we make All-Some statements involving annotations?

42 A closer look Ask me about evidence?

43 Semantic Web technology and ontology in the service of science Let our tools help us find mistakes (and other insights) by having a representation that is good enough to be wrong. When the representation is expressed formally and used in conjunction with a reasoner, we might find that a class cannot possibly have any instances (it is unsatisfiable).
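As a toy illustration of what "unsatisfiable" means, the sketch below flags any class that inherits from two classes declared disjoint. It covers only this one pattern; reasoners such as Pellet or FaCT++ handle the full OWL semantics. The class names and the deliberate modelling mistake (treating a molecule as a kind of function) are hypothetical, chosen to echo the earlier point that words mix up functions and objects.

```python
def ancestors(cls, parents):
    """All superclasses reachable from `cls`, including `cls` itself."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(parents, disjoint_pairs):
    """Classes that fall under both members of some disjoint pair."""
    bad = []
    for cls in parents:
        up = ancestors(cls, parents)
        if any(a in up and b in up for a, b in disjoint_pairs):
            bad.append(cls)
    return sorted(bad)


# Hypothetical hierarchy with a deliberate mistake: PeptideHormone is made a
# subclass of HormoneActivity (a function) instead of having that role.
parents = {
    "ex:PeptideHormone": ["ex:Peptide", "ex:HormoneActivity"],
    "ex:Peptide": ["ex:MaterialEntity"],
    "ex:HormoneActivity": ["ex:Function"],
}
disjoint_pairs = [("ex:MaterialEntity", "ex:Function")]

print(unsatisfiable_classes(parents, disjoint_pairs))  # ['ex:PeptideHormone']
```

Nothing can be both a material entity and a function, so the mis-modelled class can have no instances; the representation is precise enough for a tool to show that it is wrong.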

44 Public science: What we’d like to do better Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”

45 How do we get there? Interoperation is paramount, but modeling is hard: Work with the OBO Foundry Build a skilled community Use (open!) Semantic Web Technologies to enable web effects Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning. Recruit more ontologists! (That’s you)

