Presentation is loading. Please wait.

Presentation is loading. Please wait.

Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets Paea LePendu University of Oregon Talk: National Center for Biomedical.

Similar presentations


Presentation on theme: "Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets Paea LePendu University of Oregon Talk: National Center for Biomedical."— Presentation transcript:

1 Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets Paea LePendu University of Oregon Talk: National Center for Biomedical Ontology ◦ Stanford University ◦ September, 2009

2 General Interests Programming Languages Automated Reasoning Databases Logic

3 Outline Ontology-based Data Management – Background, Motivation – Theory – Benchmarking – Application Domain, Query Answering Inconsistency Detection – Theory – The serotonin example – GO plus ZFIN, MGI annotations

4 Ontology-based Database Integration: reducing database integration to ontology translation

5

6 Ontology-based Data Management

7 Ontology User Data Annotation Data Management Data Access Layer

8 Example: sisters-siblings All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? { } Obviously, the answer should be : Hilary and Lynn are siblings. { | siblingOf(x,y) }

9 Example: sisters-siblings

10

11 Example: The Gene Ontology GO_0003674 z01, z02, z03 GO_0003674 z01, z02, z03 GO_0005488 e01,e02, e03 GO_0005488 e01,e02, e03 GO_0030528 y01, y02, y03 GO_0030528 y01, y02, y03 GO_0003677 x01, x02, x03 GO_0003677 x01, x02, x03 GO_0003700 w01, w02, w03 GO_0003700 w01, w02, w03 GO_0003676 c01, c02, c03 GO_0003676 c01, c02, c03 GO_0003723 d01, d02, d03 GO_0003723 d01, d02, d03 GO_0008135 a01, a02, a03 GO_0008135 a01, a02, a03 GO_0045182 b01, b02, b03 GO_0045182 b01, b02, b03

12 Example: The Gene Ontology GO_0003674 z01, z02, z03 GO_0003674 z01, z02, z03 GO_0005488 e01,e02, e03 GO_0005488 e01,e02, e03 GO_0030528 y01, y02, y03 GO_0030528 y01, y02, y03 GO_0003677 x01, x02, x03 GO_0003677 x01, x02, x03 GO_0003700 w01, w02, w03 GO_0003700 w01, w02, w03 GO_0003676 c01, c02, c03 GO_0003676 c01, c02, c03 GO_0003723 d01, d02, d03 GO_0003723 d01, d02, d03 GO_0008135 a01, a02, a03 GO_0008135 a01, a02, a03 GO_0045182 b01, b02, b03 GO_0045182 b01, b02, b03

13 Ontology Databases: General Models for Database Designs Generality is important – Avoid rewriting Scalability of KB is important – Persistence, caching and indexing Major generic models – Horizontal Models – Vertical Models – Decomposition Storage Models

14 Ontology Databases: View-based Approach CREATE VIEW v_Person(id) AS SELECT id FROM Person UNION SELECT id FROM v_Male UNION SELECT id FROM v_Female Person Female Male [Pan & Heflin. DLDB: Extending Relational Databases to Support Semantic Web Queries. ISWC, 2003.]

15 Ontology Databases: Active Database Approach Person Female Male [LePendu, et al. Ontology Database: a New Method for Semantic Modeling and an Application to Brainwave Data. SSDBM, 2008.] ON INSERT into Male INSERT into Person On INSERT into Female INSERT into Person

16 Ontology Databases: Active Database Approach Person Female Male ON INSERT into Male INSERT into Person On INSERT into Female INSERT into Person

17 Ontology Databases: Active Database Approach Person Female Male

18 Ontology Databases: Active Database Approach Person Female Male

19 Ontology Databases: Active Database Approach Person Female Male

20 Ontology Databases: Active Database Approach Person Female Male

21 Ontology Databases: Active Database Approach Person Female Male

22 Ontology Databases: Active Database Approach Person Female Male

23 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited)

24 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited)

25 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited)

26 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited) { | siblingOf(x,y) }

27 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited) { | siblingOf(x,y) } Just look it up!

28 All sisters are siblings. Hilary and Lynn are sisters. This is what we know : This is what we want to know : Who are siblings? Obviously, the answer should be : Hilary and Lynn are siblings. Example: sisters-siblings (revisited) { | siblingOf(x,y) } Just look it up! { }

29 Lehigh University Benchmark (LUBM) Load Time and Query Time (1.5 million facts) (10 Universities, 20 Departments) [Guo, et al. LUBM: A Benchmark for OWL Knowledge Base Systems. J Web Semantics, 2005.]

30 Ontology-based Data Management [Frishkoff, et al. Development of Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials. ICBO, 2009]

31 Ontology-based Query Answering Return all data instances that belong to ERP pattern classes which have a surface positivity over frontal regions of interest and are earlier than the N400. Which patterns have a region of interest that is left-occipital and manifests between 220 and 300ms? What is the range of intensity mean for the region of interest for N100? Show the region of interest for all ERP patterns that occur between 0 and 300ms. Which PCA factor do P100 patterns most often appear in? What is the range of intensity mean for the region of interest for N100 patterns? Show the patterns whose region of interest is left occipital and occurs between 220 and 300ms.

32 Inconsistency Detection Background and Motivation – Expressiveness – From disjunctions to negations Theory – Not-gadgets Motivation – Serotonin example – ATP-gated cation channel activity Results from ZFIN and MGI Annotations

33 Not-gadgets ¬ → ¬ ¬

34 Example: inconsistency detection "Annotations in this way sometimes point to errors in the type- type relationships described in the ontology. An example is the recent removal of the type serotonin secretion as an is_a child of neurotransmitter secretion from the GO Biological Process ontology. This modification was made as a result of an annotation from a paper showing that serotonin can be secreted by cells of the immune system where it does not act as a neurotransmitter.“ [Hill, et al. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics, 2008]

35 gene-x not-gadget fail! Example: serotonin secretion

36 Example: GO:0004931 ATP-gated cation channel activity (as of 3/09): [Term] id: GO:0004931 name: ATP-gated cation channel activity namespace: molecular_function def: "Catalysis of the transmembrane transfer of an ion by a channel that opens when extracellular ATP has been bound by the channel complex or one of its constituent parts." [GOC:mah, PMID:9755289] comment: Note that this term refers to an activity and not a gene product. Consider also annotating to the molecular function term 'purinergic nucleotide receptor activity ; GO:0001614'. synonym: "P2X activity" RELATED [] synonym: "purinoceptor" BROAD [] synonym: "purinoreceptor" BROAD [] is_a: GO:0005231 ! excitatory extracellular ligand-gated ion channel activity is_a: GO:0005261 ! cation channel activity

37 Example: GO:0004931 GO:0004391 sub-graph (using Jambalaya):

38 Example: GO:0004931 What is so interesting about GO:0004391? ZFINZDB-GENE-030319-2p2rx2NOTGO:0004931ZFIN:ZDB- PUB-031031-8|PMID:14580944IDAFpurinergic receptor P2X, ligand-gated ion channel, 2gene taxon:795520071005ZFIN ZFINZDB-GENE-030319-2p2rx2GO:0004931ZFIN:ZDB- PUB-031031-8|PMID:14580944IGIZFIN:ZDB-GENE-000427-3 Fpurinergic receptor P2X, ligand-gated ion channel, 2 genetaxon:795520071005ZFIN Source: [1/13/2009] http://www.geneontology.org/gene-associations/http://www.geneontology.org/gene-associations/

39 Example: GO:0004931 The not-gadget will raise a logical inconsistency.  p2rx2NOTGO:0004931  p2rx2GO:0004931 GO_0004931 * Tables starting with an '_' are negations. not-gadget fail! _GO_0004931 p2rx2 _GO_0004931 p2rx2

40 Example: GO:0004931 GO:0004391 sub-graph (using Jambalaya):

41 Example: GO:0004931 GO:0004391 sub-graph (using Jambalaya):

42 Example: GO:0004931 GO:0004391 sub-graph (using Jambalaya):

43 ZFIN

44

45 MGI

46

47

48 ZFIN - MGI

49 ZFIN

50 Outcome: suspect IEA annotations

51 GO Online SQL Environment (GOOSE) Source: [1/13/2009] http://www.geneontology.org/GO.database.shtml#diagram pos,IEA (graph_path x association) x neg (grapth_path x association)

52 What do logical inconsistencies mean? Several possibilities: – Incorrect annotation (e.g., suspect IEA annotations) – Incorrect relationship (e.g., serotonin secretion) – Incomplete model: Recall: ZFINZDB-GENE-030319-2p2rx2GO:0004931 ZFIN:ZDB-PUB-031031-8|PMID:14580944IGI ZFIN:ZDB-GENE-000427-3Fpurinergic receptor P2X, ligand-gated ion channel, 2gene taxon:795520071005ZFIN – Perfectly admissible!

53 Next Directions Explanation and proof-reconstruction ✓ Deep (data) annotation tools Distributed network of Ontology Databases

54 Data Annotation: Neural ElectroMagnetic Ontologies LFRON RFRON frontocentral [Frishkoff, et al. ERP measures of partial semantic knowledge: Left temporal indices of skill differences and lexical quality. Biological Psychology, 2009.]

55 Network of Ontology Databases [Thorisson, Muilu and Brookes. Genotype–phenotype databases: challenges and solutions for the post-genomic era. Nature Reviews, 2009.]

56 Thank you Questions?

57

58 Andrea’s Example Is John supervised by a TopManager who is a friend of an AreaManager? [Franconi. Ontologies and databases: myths and challenges. VLDB, 2008.]

59 Raymond Reiter [Reiter. Deductive Question-Answering on Relational Data Bases. Logic and Data Bases, 1977]

60 Raymond Reiter

61

62

63

64 Benchmarking Suite

65 Origins

66 CIS @ UO

67 Research Areas in Computer Science: – software engineering – programming languages – human-computer interaction – parallel and distributed computing – networking and graph theory – scientific computation/visualization – information integration and mining Affiliates: – Neurosciences Institute – Computational Science Institute – Zebrafish Information Network

68 Ontology-based Data Access [Rodriguez-Muro, et al. Realizing Ontology Based Data Access: A plug-in for protégé. ICDEW, 2008.]


Download ppt "Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets Paea LePendu University of Oregon Talk: National Center for Biomedical."

Similar presentations


Ads by Google