PENN STATE
Compatible text, visual and mathematical representations for biological process ontologies
Nigam Shah, Penn State University
Ontologies in Molecular Biology
An ontology is a formal way of representing knowledge.
– In an ontology, concepts are described both by their meaning and by their relationships to each other.*
– Gene Ontology; 43 open ontologies under OBO.
– First name things … then name relations.
If we specify the logic for combining things and relations, we can write hypotheses about biological processes in a formal manner and evaluate them for consistency with existing information.
* Bard and Rhee, Nature Reviews Genetics, Vol 5, March 2004, pg 213
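"First name things, then name relations" can be sketched in Python as a set of terms, each carrying a definition, plus a list of (subject, relation, object) triples. This is a minimal illustration under stated assumptions: the terms, their definitions and the `is_a`/`part_of` triples are hypothetical examples, not entries copied from GO or any OBO ontology, and `ancestors` is a helper introduced here only to show how relations can be followed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A named concept together with a statement of its meaning."""
    name: str
    definition: str


# Step 1: name things (hypothetical example terms).
TERMS = {
    t.name: t
    for t in [
        Term("protein", "A macromolecule built from amino acids."),
        Term("transcription_factor", "A protein that regulates transcription."),
        Term("transcriptional_activator", "A transcription factor that increases transcription."),
        Term("nucleus", "The membrane-bounded organelle that contains the chromosomes."),
        Term("nucleolus", "A sub-nuclear structure where ribosomal RNA is made."),
    ]
}

# Step 2: name relations as (subject, relation, object) triples.
RELATIONS = [
    ("transcription_factor", "is_a", "protein"),
    ("transcriptional_activator", "is_a", "transcription_factor"),
    ("nucleolus", "part_of", "nucleus"),
]


def ancestors(term: str, relation: str = "is_a") -> set[str]:
    """Return every term reachable from `term` by repeatedly following `relation`."""
    if term not in TERMS:
        raise KeyError(f"unknown term: {term}")
    found: set[str] = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in RELATIONS:
            if subj == current and rel == relation and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


print(sorted(ancestors("transcriptional_activator")))  # ['protein', 'transcription_factor']
```

Because relations are stored separately from terms, adding a new relation type needs no change to the term list.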
Hypotheses and Events
A hypothesis about a biological process is a statement about relationships within a biological system, for example: "Protein P induces transcription of gene X."
We define an event as a relationship between two biological entities, which we call agents.
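In code, an event can be modeled as a small immutable record holding the two agents and the relationship between them. The sketch below assumes Python dataclasses; the field names follow the agent_a/agent_b/operator terms used in the hypothesis grammar, and the operator label `induces_transcription_of` is a hypothetical choice, not a term from the hypothesis ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A relationship (the operator) between two biological entities (the agents)."""
    agent_a: str
    operator: str
    agent_b: str

    def __str__(self) -> str:
        return f"{self.agent_a} {self.operator} {self.agent_b}"


# "Protein P induces transcription of gene X" encoded as an event.
ev = Event(agent_a="P", operator="induces_transcription_of", agent_b="X")
print(ev)  # P induces_transcription_of X
```

Making the record frozen means an event, once stated, can be shared between hypotheses without being altered.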
Testing events
Event: Protein P induces transcription of gene X.
[Figure: P inside the nucleus, at the promoter of gene X]
Implicit claims (that can be made explicit):
1. P is a transcription factor.
2. P is a transcriptional activator.
3. P is localized to the nucleus.
4. P can bind to the promoter of gene X.
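Making implicit claims explicit can be sketched as a lookup from an event's operator to templates of the claims it entails, filled in with the event's agents. The `IMPLICIT_CLAIMS` table and the `explicit_claims` function are hypothetical; the templates simply restate the four claims listed above so that each one becomes a separately testable statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    agent_a: str
    operator: str
    agent_b: str


# Hypothetical table: operator -> templates of the claims an event of that
# kind makes implicitly; {a} and {b} are filled with agent_a and agent_b.
IMPLICIT_CLAIMS = {
    "induces_transcription_of": [
        "{a} is a transcription factor.",
        "{a} is a transcriptional activator.",
        "{a} is localized to the nucleus.",
        "{a} can bind to the promoter of gene {b}.",
    ],
}


def explicit_claims(event: Event) -> list[str]:
    """Return the implicit claims of `event` as explicit, separately testable statements."""
    templates = IMPLICIT_CLAIMS.get(event.operator, [])
    return [t.format(a=event.agent_a, b=event.agent_b) for t in templates]


for claim in explicit_claims(Event("P", "induces_transcription_of", "X")):
    print(claim)
```

An operator with no entry in the table yields no claims, so unknown operators are not silently given meanings they do not have.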
Hypothesis Ontology
The ontology is expressive enough to describe the galactose system at a coarse level of detail.
It is compatible with other ontology efforts:
– e.g. GO, so that GO annotations can be used directly in HyBrow.
We have also developed a grammar for writing hypotheses using events from this ontology.
Grammar for a hypothesis
– A hypothesis consists of at least one event stream.
– An event stream is a sequence of one or more events or event streams, with logical joints (or operators) between them.
– An event has exactly one agent_a, exactly one agent_b and exactly one operator (i.e. a relationship between the two agents). It also records the physical location where the event happened, the genetic context of the organism, and any experimental perturbations associated with the event.
– A logical joint is the conjunction between two event streams.
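The structure of this grammar maps naturally onto nested record types. The Python sketch below is one possible encoding, not HyBrow's own data model: the "exactly one" requirements are enforced by single-valued fields, and the "at least one" requirements by checks in `__post_init__`. Treating `presence_of galactose` as a perturbation and using `+` as the joint between ev2 and ev3 are assumptions based on the galactose example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Event:
    """One relationship between two agents, with its biological context."""
    agent_a: str                          # exactly one first agent
    operator: str                         # exactly one relationship between the agents
    agent_b: str                          # exactly one second agent
    location: str                         # physical location, e.g. "nuc"
    genetic_context: str                  # genotype of the organism, e.g. "wt"
    perturbations: tuple[str, ...] = ()   # experimental perturbations, if any


@dataclass
class EventStream:
    """Events or nested streams; joints[i] links items[i] and items[i + 1]."""
    items: list[Union[Event, EventStream]]
    joints: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.items:
            raise ValueError("an event stream needs at least one event or event stream")
        if len(self.joints) != len(self.items) - 1:
            raise ValueError("each pair of consecutive items needs exactly one logical joint")


@dataclass
class Hypothesis:
    """A hypothesis consists of at least one event stream."""
    streams: list[EventStream]

    def __post_init__(self) -> None:
        if not self.streams:
            raise ValueError("a hypothesis needs at least one event stream")


# Hypothetical encoding of ev2 and ev3 from the galactose example.
ev2 = Event("Gal3p", "Binds_to_promoter", "gal1", location="nuc", genetic_context="wt")
ev3 = Event("Gal3p", "induce", "gal1", location="nuc", genetic_context="wt",
            perturbations=("presence_of galactose",))
hy = Hypothesis([EventStream([ev2, ev3], joints=["+"])])
```

Rejecting malformed structures at construction time keeps invalid hypotheses from reaching the evaluation stage.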
Making Hypotheses with increasing formality
1. Controlled Vocabulary
2. Formal Language
3. Context-Free Grammar
We have developed a formal language and grammar for representing a hypothesis as a sequence of events. We use constraints and rules to decide whether a hypothesis is a valid production of the language.
The mathematical representation
A biological event is any occurrence for which we gather experimental data. Hypotheses make testable statements about combinations of biological events.
http://conferences.computer.org/bioinformatics/CSB2003/SectA.html#Poster9
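A context-free grammar lets a program decide mechanically whether a string is a valid production. The recursive-descent recognizer below is a sketch for hypothesis expressions such as `hy1 = (ev0+ev1) and (ev2+ev3)` from the galactose example. The grammar itself is an assumption inferred from that single example, with `+` binding more tightly than `and`; the full HyBrow grammar is not given in this presentation. The sketch checks syntax only, leaving consistency with prior knowledge to constraints and rules.

```python
import re
from typing import Optional

# Hypothetical grammar inferred from the example hy1:
#   hypothesis := stream ("and" stream)*
#   stream     := term ("+" term)*
#   term       := EVENT_ID | "(" hypothesis ")"
#   EVENT_ID   := "ev" followed by digits
TOKEN = re.compile(r"ev\d+|\band\b|[+()]|\S+")
EVENT_ID = re.compile(r"ev\d+")


class Parser:
    """Recursive-descent recognizer: one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> None:
        self.pos += 1

    def hypothesis(self) -> None:
        self.stream()
        while self.peek() == "and":
            self.advance()
            self.stream()

    def stream(self) -> None:
        self.term()
        while self.peek() == "+":
            self.advance()
            self.term()

    def term(self) -> None:
        tok = self.peek()
        if tok == "(":
            self.advance()
            self.hypothesis()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at token {self.pos}")
            self.advance()
        elif tok is not None and EVENT_ID.fullmatch(tok):
            self.advance()
        else:
            raise SyntaxError(f"expected an event id or '(' at token {self.pos}, got {tok!r}")


def is_valid_hypothesis(text: str) -> bool:
    """True if `text` is a complete production of the grammar above."""
    parser = Parser(text)
    try:
        parser.hypothesis()
    except SyntaxError:
        return False
    return parser.peek() is None  # every token must be consumed


print(is_valid_hypothesis("(ev0+ev1) and (ev2+ev3)"))  # True
print(is_valid_hypothesis("(ev0+ev1) and"))            # False
```

Each grammar rule becomes one method, so extending the grammar (for example with a new joint) means adding or changing a single method.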
Constraints and Rules
The consistency of a hypothesis with prior knowledge is evaluated by applying constraints and rules.
– A constraint is a statement specifying the evidence that contradicts or supports an event. Example: a protein must be in the nucleus to bind to a promoter.
– A rule comprises the steps for deciding whether a constraint is satisfied or violated. Example:
Binds_to_promoter [P, g]: annotation constraints
  if the cellular location of P is not nucleus, give a penalty.
  if the biological process is not transcription, give a penalty.
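The `Binds_to_promoter [P, g]` rule can be sketched as a function that adds a penalty for each violated annotation constraint. The `Annotation` record, the penalty weight of 1, and the sample annotations for proteins `P` and `Q` are hypothetical; in HyBrow the annotations would come from a source such as GO. Treating a protein with no annotation as violating both constraints is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Prior knowledge about one protein (hypothetical shape, e.g. filled from GO)."""
    cellular_locations: set[str] = field(default_factory=set)
    biological_processes: set[str] = field(default_factory=set)


PENALTY = 1  # hypothetical weight; the slide only says "give a penalty"


def binds_to_promoter_rule(protein: str, gene: str, annotations: dict[str, Annotation]) -> int:
    """Total penalty for the event Binds_to_promoter[protein, gene]; 0 means no violation.

    Both annotation constraints inspect only the protein; `gene` is kept so the
    signature mirrors the event. A protein with no annotation is treated as
    violating both constraints (an assumption).
    """
    ann = annotations.get(protein, Annotation())
    penalty = 0
    # Constraint: a protein must be in the nucleus to bind to a promoter.
    if "nucleus" not in ann.cellular_locations:
        penalty += PENALTY
    # Constraint: the biological process should be transcription.
    if "transcription" not in ann.biological_processes:
        penalty += PENALTY
    return penalty


# Hypothetical annotations for illustration only.
annotations = {
    "P": Annotation({"nucleus"}, {"transcription"}),
    "Q": Annotation({"cytoplasm"}, {"transport"}),
}
print(binds_to_promoter_rule("P", "X", annotations))  # 0
print(binds_to_promoter_rule("Q", "X", annotations))  # 2
```

Summing penalties rather than returning a yes/no verdict lets several partially supported events be compared by how much evidence contradicts them.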
Visual language representation
Uses a formal visual language that allows:
1. Direct composition of hypotheses in a format akin to reaction pathway diagrams.
2. Translation to other representation forms.
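As one illustration of translating between forms, the sketch below turns text-level events into Graphviz DOT, a widely used graph description language, so that each event becomes a labeled arrow from agent_a to agent_b, much like a pathway diagram. DOT is swapped in here purely for illustration; it is not the formal visual language described above, and `to_dot` is a hypothetical helper. Names containing double quotes would need escaping, which the sketch omits.

```python
def to_dot(events: dict[str, tuple[str, str, str]]) -> str:
    """Render named (agent_a, operator, agent_b) events as a Graphviz DOT digraph."""
    lines = ["digraph hypothesis {"]
    for name, (agent_a, operator, agent_b) in events.items():
        # One edge per event, labeled with the operator and the event name.
        lines.append(f'  "{agent_a}" -> "{agent_b}" [label="{operator} ({name})"];')
    lines.append("}")
    return "\n".join(lines)


events = {
    "ev0": ("Gal2p", "transports", "galactose"),
    "ev1": ("galactose", "activate", "Gal3p"),
}
print(to_dot(events))
```

The resulting text can be rendered by any DOT-compatible tool; location and genetic context are dropped here to keep the diagram readable.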
Other notations
– Cook notation (BioD)
– Kohn notation
Multiple views of the ontology
Once we have an ontology for hypotheses, it can be represented:
– as text files that users type;
– as formal constructs that can be evaluated for validity in a formal manner;
– as files that are browsed using special programs.
Having such equivalent formats allows us to perform computer-aided hypothesis evaluation.
Multiple equivalent representations
Biological process described in a formal language:
ev0 = Gal2p transports galactose in mem in wt
ev1 = galactose activate Gal3p in wt in cyt
ev2 = Gal3p Binds_to_promoter gal1 in wt in nuc
ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc
hy1 = (ev0+ev1) and (ev2+ev3)
XML format?
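One possible XML form for the events above can be sketched with Python's standard `xml.etree.ElementTree`: parse each event line, then emit one `<event>` element per event and one `<hypothesis>` element per hypothesis. No XML schema is given in the presentation, so the element names, the location codes `mem`/`cyt`/`nuc`, the treatment of any other `in` value as the genetic context, and the reading of `presence_of` as a perturbation are all assumptions read off the example lines.

```python
import xml.etree.ElementTree as ET

# Assumed location codes (membrane, cytoplasm, nucleus); any other "in X" is
# treated as the genetic context, e.g. "wt" for wild type.
LOCATIONS = {"mem", "cyt", "nuc"}


def parse_event(line: str) -> tuple[str, dict[str, str]]:
    """Parse 'evN = agent_a operator agent_b in ... in ...' into (name, fields)."""
    name, body = (part.strip() for part in line.split("=", 1))
    tokens = body.split()
    if len(tokens) < 3:
        raise ValueError(f"event needs agent_a, operator and agent_b: {line!r}")
    fields = {"agent_a": tokens[0], "operator": tokens[1], "agent_b": tokens[2]}
    rest, i = tokens[3:], 0
    while i < len(rest):
        if rest[i] != "in" or i + 1 >= len(rest):
            raise ValueError(f"expected 'in <qualifier>' in {line!r}")
        value = rest[i + 1]
        if value == "presence_of":
            if i + 2 >= len(rest):
                raise ValueError(f"'presence_of' needs an argument in {line!r}")
            fields["perturbation"] = "presence_of " + rest[i + 2]
            i += 3
        else:
            fields["location" if value in LOCATIONS else "genetic_context"] = value
            i += 2
    return name, fields


def to_xml(event_lines: list[str], hypotheses: dict[str, str]) -> str:
    """Serialize parsed events and hypothesis expressions as one XML document."""
    root = ET.Element("hypothesis_set")
    for line in event_lines:
        name, fields = parse_event(line)
        event = ET.SubElement(root, "event", id=name)
        for key, value in fields.items():
            ET.SubElement(event, key).text = value
    for name, expression in hypotheses.items():
        ET.SubElement(root, "hypothesis", id=name).text = expression
    return ET.tostring(root, encoding="unicode")


EVENTS = [
    "ev0 = Gal2p transports galactose in mem in wt",
    "ev1 = galactose activate Gal3p in wt in cyt",
    "ev2 = Gal3p Binds_to_promoter gal1 in wt in nuc",
    "ev3 = Gal3p induce gal1 in presence_of galactose in wt in nuc",
]
print(to_xml(EVENTS, {"hy1": "(ev0+ev1) and (ev2+ev3)"}))
```

Since every qualifier is classified by its value rather than its position, `in wt in cyt` and `in mem in wt` both parse correctly, matching the varying order in the example lines.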