Syntax and semantics >AMYLASEE1 TGCATNGY A very simple FASTA file.


1 Syntax and semantics >AMYLASEE1 TGCATNGY A very simple FASTA file

2 FASTA syntax A very simple FASTA file: >AMYLASEE1 (identifier) TGCATNGY (sequence). FASTA syntax in Backus-Naur notation: ::= | ::= “>” ::= { } ::= | ::= | “”
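The syntax on this slide can be exercised with a small reader. This is a minimal sketch, not a reference parser: the function name `parse_fasta` is made up for illustration, and skipping blank lines and joining multi-line sequences are assumptions rather than rules stated on the slide.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    identifier = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier = line[1:].strip()
            chunks = []
        elif identifier is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            chunks.append(line)  # sequence may span several lines
    if identifier is not None:
        yield identifier, "".join(chunks)


records = list(parse_fasta(">AMYLASEE1\nTGCATNGY\n"))
print(records)  # [('AMYLASEE1', 'TGCATNGY')]
```

The parser recovers two strings and nothing else; it has no idea what either string denotes.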

3 The syntax rules do not fix meaning >AMYLASEE1 TGCATNGY Do I denote a protein, “Amylase E1”, or a person, “Amy Lasee, I”? What kind of sequence am I? Read as protein: TGCATNGY = Threonine-Glycine-Cysteine-Alanine-Threonine... Read as DNA: TGCATNGY = Thymine-Guanine-Cytosine-Adenine-Thymine... What is the relationship between the identifier and the sequence? Is Amy Lasee the sample donor? The experimenter? The owner? Is Amylase E1 a gene or a protein name? Is it arbitrary? Is it unique?
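The ambiguity can be checked mechanically. The sketch below tests a sequence against the IUPAC nucleotide codes and against the one-letter codes of the 20 standard amino acids; the function name `possible_interpretations` is hypothetical.

```python
# IUPAC nucleotide codes: the bases plus ambiguity codes such as N and Y.
NUCLEOTIDE_CODES = set("ACGTURYSWKMBDHVN")
# One-letter codes for the 20 standard amino acids.
AMINO_ACID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def possible_interpretations(sequence: str) -> list:
    """Return every alphabet under which the sequence is well-formed."""
    letters = set(sequence.upper())
    readings = []
    if letters <= NUCLEOTIDE_CODES:
        readings.append("nucleotide")
    if letters <= AMINO_ACID_CODES:
        readings.append("protein")
    return readings


print(possible_interpretations("TGCATNGY"))  # ['nucleotide', 'protein']
```

Both readings are syntactically valid, so the file alone cannot say which one was intended.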

4 ... but a fixed meaning requires clear syntax >AMYLASEE1 TGCATNGY AMYLASEE1 TGCATNGY AMYLASEE1 TGCATNGY So the first step to fixing the semantics is to make the syntax more explicit. XML can help. An XML-ified version of FASTA
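One way such an XML-ified record might be produced is sketched below with Python's standard `xml.etree.ElementTree`. The element names `fasta_record`, `identifier`, and `sequence` are assumptions chosen for illustration; the markup on the original slide did not survive in this transcript.

```python
import xml.etree.ElementTree as ET


def fasta_record_to_xml(identifier: str, sequence: str) -> ET.Element:
    """Wrap one FASTA record in explicitly named elements (names are hypothetical)."""
    record = ET.Element("fasta_record")
    ET.SubElement(record, "identifier").text = identifier
    ET.SubElement(record, "sequence").text = sequence
    return record


record = fasta_record_to_xml("AMYLASEE1", "TGCATNGY")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
# <fasta_record><identifier>AMYLASEE1</identifier><sequence>TGCATNGY</sequence></fasta_record>
```

Each value now has a named slot, so a reader no longer has to rely on the ">" and the line break to tell the parts apart.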

5 Extensible syntax >AMYLASEE1|NP_523768|GO_0004556 TGCATNGY AMYLASEE1 NP_523768 TGCATNGY 0004556 It's easy to expand the XML unambiguously to include other elements that may be useful (GO term, GB identifier)... but this is all just associating strings with other strings.
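The pipe-delimited header shows the same point from the plain-FASTA side. In the sketch below, the field names and their order (`name`, `gb_identifier`, `go_term`) are assumptions: nothing in the header itself says which position holds which kind of value.

```python
def split_header(header: str) -> dict:
    """Split a pipe-delimited FASTA header into named fields."""
    fields = header.lstrip(">").split("|")
    # Assumed positional convention; it lives outside the file.
    names = ["name", "gb_identifier", "go_term"]
    return dict(zip(names, fields))


fields = split_header(">AMYLASEE1|NP_523768|GO_0004556")
print(fields)
# {'name': 'AMYLASEE1', 'gb_identifier': 'NP_523768', 'go_term': 'GO_0004556'}
```

The result is still a mapping from strings to strings, which is exactly the limitation the slide describes.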

6 Implicit vs. explicit semantics >AMYLASEE1 TGCATNGY AMYLASEE1 NP_523768 TGCATNGY 0004556 AMYLASEE1 NP_523768 TGCATNGY 0004556 Notice that the semantics are implicit in the tags. To see this, let's replace “go_function” with “string3”. The two versions are syntactically identical. What differentiates them? The answer is that a human expert supplies the semantics by recognizing “go_function” as a reference to Gene Ontology molecular function annotations, whereas “string3” means nothing to us, though it has the same value.
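A short sketch of the tag-renaming experiment: the two documents below differ only in one tag name, and a program that reads the values sees no difference. Only `go_function` and `string3` come from the slide; the `record` and `name` elements are assumptions.

```python
import xml.etree.ElementTree as ET

with_meaningful_tag = (
    "<record><name>AMYLASEE1</name><go_function>0004556</go_function></record>"
)
with_opaque_tag = (
    "<record><name>AMYLASEE1</name><string3>0004556</string3></record>"
)


def child_values(xml_text: str) -> list:
    """Return the text of each child element, ignoring the tag names."""
    return [child.text for child in ET.fromstring(xml_text)]


same_values = child_values(with_meaningful_tag) == child_values(with_opaque_tag)
print(same_values)  # True
```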

7 Referring to the source AMYLASEE1 TGCATNGY http://purl.org/obo/owl/GO#GO_0004556 So, let's try something new: we will make a direct informational link to the GO concept. This gives us a human-readable definition that seems to resolve the “amylase vs. Amy Lasee” question, and it gives us machine-accessible relations, e.g., a machine can navigate the GO hierarchy to learn that amylase activity is_a glycogenase activity. But there is still something missing...

8 Making implicit semantics explicit AMYLASEE1 TGCATNGY http://purl.org/obo/owl/GO#GO_0004556 How did we know how to interpret this? To a computer, the tagged values are just three different strings, with no semantics. However, an expert human can supply semantics by combining background knowledge with cues hidden in the tags. In this case, we infer that string3 is a URL with the GO function for this sequence. To specify the same meaning to a computer, we need to make many things explicit: 1. That string3 is operationally a URL (subject to URL protocols) 2. That this URL is the source of a thing that is an ontology concept 3. That the concept (“foo456”) is associated with the entity “foo123” 4. That foo456 is_function_of foo123 (or, foo123 has_function foo456)

9 Describing the world with triples In other words, we want to specify a subject-predicate-object triple: subject = Sequence{ name=“amylaseE1” sequence=“TGCATNGY” }; predicate = has_molecular_function; object = alpha amylase activity. From the RDF spec: “2.2.6 Anyone Can Make Statements About Any Resource. To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make statements about any resource. In general, it is not assumed that complete information about any resource is available. RDF does not prevent anyone from making assertions that are nonsensical or inconsistent with other statements, or the world as people see it. Designers of applications that use RDF should be aware of this and may design their applications to tolerate incomplete or inconsistent sources of information.”
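As a rough illustration, a triple can be held as a three-field record and written out in N-Triples style. The subject and predicate URIs under `example.org` are hypothetical placeholders; only the GO URI is taken from the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple of URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


statement = Triple(
    subject="http://example.org/sequences/AMYLASEE1",  # hypothetical URI
    predicate="http://example.org/relations/has_molecular_function",  # hypothetical URI
    object="http://purl.org/obo/owl/GO#GO_0004556",  # GO URI used in the slides
)
print(to_ntriples(statement))
```

Unlike the tagged strings earlier, every part of the statement is now a resolvable name, including the relation itself.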

10 Specifying an RDF triple <fasta_archive xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bfo="http://www.purl.org/obo/owl/BFO#" > AMYLASEE1 TGCATNGY http://purl.org/obo/owl/GO#GO_0004556 The ultra-succinct form of the RDF triple syntax is or in more familiar language we might say

11 Using NeXML syntax Not finished. NeXML provides two ways to express semantics: 1. Built-in links to CDAO (SAWSDL links in schema) 2. Ad hoc references to external namespaces in

12 Built-in links to CDAO Not finished. Example (“Edge”) of SAWSDL links in schema

13 References to external namespaces using elements Not finished. Will explain some examples from wiki.

14 Some things to express (see wiki): Attaching a concept to an element; Attaching annotation or an external resource to an element; Attaching a concept to an element through a relation; Attaching a taxon identifier to an OTU through a relation; Identifying specimens within collections; Literature references; Example 1: associate a reference with a tree (or other) element; Example 2: associate a reference with a record; Associating an OBO phenotype with a character state

15 What can I do with semantics? Not finished. One thing to do is to make the semantics clear to human users. Another is to make them accessible to computers. What can the computers do? If you have software to read your files and reconstruct the RDF triples as statements in the ontology language, then you can carry out reasoning in the ontology language. Examples (taxonomy; types of chars; anatomical relations of chars)
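A very small example of such reasoning is following is_a statements transitively. The sketch below uses a toy hierarchy whose term labels are illustrative only and are not copied from the Gene Ontology; the function name `ancestors` is hypothetical.

```python
def ancestors(term: str, triples: list) -> set:
    """Return every term reachable from `term` by following is_a links."""
    parents = {}
    for child, predicate, parent in triples:
        if predicate == "is_a":
            parents.setdefault(child, set()).add(parent)
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for parent in parents.get(current, ()):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                frontier.append(parent)
    return found


# Toy hierarchy for illustration; not taken from an actual ontology file.
toy_triples = [
    ("alpha amylase activity", "is_a", "amylase activity"),
    ("amylase activity", "is_a", "hydrolase activity"),
    ("hydrolase activity", "is_a", "molecular function"),
]
inferred = sorted(ancestors("alpha amylase activity", toy_triples))
print(inferred)
# ['amylase activity', 'hydrolase activity', 'molecular function']
```

None of the inferred statements is written in the data; each follows from the stated triples plus the rule that is_a is transitive.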

