Storing and Retrieving Biological Instances with the Instance Store Daniele Turi, Phillip Lord, Michael Bada, Robert Stevens.


http://instancestore.man.ac.uk
dturi@cs.man.ac.uk, Department of Computer Science, University of Manchester, Oxford Road, Manchester, UK

Abstract: Many in the bioinformatics community have come to understand the benefits of ontologies: models of domains consisting of hierarchies of concepts, their interrelationships and definitions. The most prominent ontology currently used by biologists is the Gene Ontology (GO), which currently consists of over 17,000 terms. In addition, GO has been used to annotate several million different gene products. The GO database (GO DB) can be used to store and search over these annotated gene products, using the ontological information in GO to drive these searches. GO is structurally simple, being represented as a directed acyclic graph. Description logics (DLs) are another, highly expressive, form of ontology formalism, and underlie OWL-DL, the recently released W3C (World Wide Web Consortium) standard for representing ontologies. Like the GO DB, the instance store (iS) uses a commodity relational database, but it combines the database with a DL reasoner, which enables it to search and retrieve using the rich expressiveness of a DL. While the GO DB performs well with a structurally simple ontology such as GO, we believe iS may offer a possible implementation path for searching over richer and more expressive ontological information as it becomes available.

Acknowledgements: The authors acknowledge the contribution of Ian Horrocks, Lei Li and Sean Bechhofer to the design and implementation of iS. DT was supported by the MONET EU project IST-2001-34145. PL was supported by the myGrid UK e-Science programme EPSRC GR/R67743. MB was supported by the e-Science North-West Centre.
[Diagram labels: Database, Reasoner, GO, GO-Term Associations]

GOAT: As GO continues to increase in size, users find it increasingly difficult to find the terms they wish to use for annotation. Furthermore, beyond the taxonomic and partonomic hierarchical relationships, there are no constraints within GO that indicate which terms should or should not be used together in the annotation of a given gene product. This can result in inconsistent or even nonsensical descriptions of gene products. GOAT aims to guide the user in annotating gene products with GO terms by displaying only those field values that are appropriate given the terms already entered. It relies on a description-logic-based version of GO, on mined GO-term-to-GO-term and GO-term-to-gene-product-type associations, and on the FaCT reasoner.

GO-term-to-GO-term associations were mined from the complete version of the Gene Ontology Annotation (GOA) database. Two GO terms form an associative pair if they have been used together as annotating terms in at least one database entry of GOA. The iS system is used to hold these associations: while each concept of our description-logic-based GO represents a GO term, each instance in iS is an association record for a corresponding GO-term concept in the ontology. Each association record refers to its corresponding GO term and to the set of other GO terms with which that term is associated.

In the screenshot above, the user has entered “endoribonuclease activity” in the molecular-function field. When she indicates that she wishes to enter a value for the biological-process field (by pressing the “Add term” button next to the biological-process text area), GOAT dynamically attempts to retrieve the association record for “endoribonuclease activity” from iS.
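The mining and lookup steps described for GOAT can be illustrated with a minimal Python sketch. It is not the GOAT or iS implementation: the annotation records, the term-to-subontology table, and the function names `mine_associations` and `suggest` are all hypothetical, and a plain dictionary stands in for the association records that iS would hold.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy annotation entries: gene product -> GO terms used together.
# Real GOA entries carry many more fields; these values are illustrative only.
ANNOTATIONS = {
    "product_A": {"endoribonuclease activity", "RNA processing"},
    "product_B": {"endoribonuclease activity", "mRNA catabolism", "cytoplasm"},
    "product_C": {"transporter activity", "membrane"},
}

# Hypothetical table assigning each toy term to a GO subontology.
SUBONTOLOGY = {
    "endoribonuclease activity": "molecular_function",
    "transporter activity": "molecular_function",
    "RNA processing": "biological_process",
    "mRNA catabolism": "biological_process",
    "cytoplasm": "cellular_component",
    "membrane": "cellular_component",
}


def mine_associations(records):
    """Pair up every two terms that co-occur in at least one entry."""
    associations = defaultdict(set)
    for terms in records.values():
        for first, second in combinations(sorted(terms), 2):
            associations[first].add(second)
            associations[second].add(first)
    return dict(associations)


def suggest(associations, entered_term, field):
    """Return the terms associated with entered_term that belong to field."""
    candidates = associations.get(entered_term, set())
    return {term for term in candidates if SUBONTOLOGY.get(term) == field}


associations = mine_associations(ANNOTATIONS)
suggestions = suggest(associations, "endoribonuclease activity", "biological_process")
print(sorted(suggestions))
```

In this toy data the lookup for “endoribonuclease activity” yields only the two biological-process terms that co-occur with it, which mirrors how a retrieved association record narrows the choices offered to the annotator.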
As a result, rather than displaying all terms of the GO biological-process subontology, GOAT presents only the much smaller subset of biological-process terms that, according to the stored associations, are most likely to be appropriate given the annotation the user has already entered for this gene product.

The screenshots above show a technical user interface to iS that we have created. In the lower left pane of each screenshot is a description expressed in DIG, the XML-based de facto standard for communicating with DL reasoners. OWL-DL descriptions can be mapped to DIG and vice versa. On the right are all the gene products of the GO DB that are instances of the corresponding description. Descriptions of arbitrary complexity can be used, fully utilizing the expressivity of OWL-DL.

A standard for expressing ontologies, OWL, has recently emerged from the W3C as part of its vision of a Semantic Web, in which more of the knowledge in Web pages will be represented in computationally amenable formalisms. OWL is one of the standard syntaxes for GO/OBO ontologies. OWL-DL has a direct correspondence with a description logic, which means a DL reasoner can be used to reason about OWL-DL ontologies and about the annotations formed using terms from these ontologies. OWL-DL is very expressive, allowing for boolean operators, arbitrary numbers of link types (properties), and transitivity (useful for anatomical or pathway information), among other constructs. For example, one can express the class of all gene products that either take part in germination or have a transporter-activity function and are not localized to a chloroplast. We can describe gene products and search over them using these rich descriptions.
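To make the example description concrete, here is a small Python sketch that evaluates it over toy data. Several assumptions apply. The description is read as (germination OR transporter activity) AND NOT chloroplast. The `PARENTS` edges, the `PRODUCTS` records and the helper names are hypothetical. Taxonomic and partonomic links are collapsed into one parent relation. Negation is evaluated closed-world over the listed annotations, which is simpler than the open-world semantics of OWL-DL, so this is a set-based illustration and not a DL reasoner.

```python
# Hypothetical child -> parents edges for a tiny slice of an ontology.
# Is-a and part-of links are deliberately collapsed into one relation.
PARENTS = {
    "seed germination": {"germination"},
    "ion transporter activity": {"transporter activity"},
    "chloroplast thylakoid": {"chloroplast"},
}

# Hypothetical gene products with terms attached under three properties.
PRODUCTS = {
    "gp1": {"process": {"seed germination"}, "function": set(),
            "location": {"cytoplasm"}},
    "gp2": {"process": set(), "function": {"ion transporter activity"},
            "location": {"chloroplast thylakoid"}},
    "gp3": {"process": set(), "function": {"transporter activity"},
            "location": {"membrane"}},
    "gp4": {"process": {"RNA processing"}, "function": set(),
            "location": {"nucleus"}},
}


def ancestors_or_self(term):
    """Collect the term together with everything reachable via PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has(product, prop, term):
    """True if any value of prop for product is term or a descendant of it."""
    return any(term in ancestors_or_self(value)
               for value in PRODUCTS[product][prop])


def matches(product):
    """(germination OR transporter activity) AND NOT chloroplast."""
    positive = (has(product, "process", "germination")
                or has(product, "function", "transporter activity"))
    return positive and not has(product, "location", "chloroplast")


result = sorted(p for p in PRODUCTS if matches(p))
print(result)
```

Here `gp2` satisfies the transporter-activity disjunct but is excluded because its location falls under chloroplast, while `gp4` satisfies neither disjunct.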

