SO meets RNAO Karen Eilbeck University of Utah RNAO Consortium Meeting May 28-29 2007.

1 SO meets RNAO Karen Eilbeck University of Utah RNAO Consortium Meeting May 28-29 2007

2 What SO is. How SO is used. How SO is managed. Where SO and RNAO meet. How SO and RNAO can work together. If we have time: a demo of OBO-Edit.

3 The Sequence Ontology describes the features of biological sequence: genome sequence, annotation of regions, coordinates. We need to agree on the meaning of terms, e.g. does the CDS include the stop codon?

4 An annotation captures what we know about a gene. [Figure: annotations and evidence for 3 alternate transcripts of the Glut1 gene; labels: 5’ UTR, start codon, coding exon, transposon within intron.]

5 Structure of the ontology. SO is structured into a directed acyclic graph. [Figure: graph of SO terms (transcript, exon, processed transcript, primary transcript, intron, clip, splice site, polyA site, protein coding primary transcript, nc primary transcript, mRNA, ncRNA, CDS, UTR, five_prime_UTR, three_prime_UTR, tRNA, rRNA) joined by edges labeled P, i and d.]
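A minimal Python sketch of what the directed acyclic graph means in practice: a term can sit under several parents, and its full set of ancestors is found by following every parent edge. The edges listed here are an illustrative subset assumed from the term names on this slide, not a copy of the released ontology, and the function names are hypothetical.

```python
# Illustrative subset only: these (child, relationship, parent) triples are
# assumptions based on the term names on the slide, not the released SO file.
EDGES = [
    ("primary_transcript", "is_a", "transcript"),
    ("processed_transcript", "is_a", "transcript"),
    ("mRNA", "is_a", "processed_transcript"),
    ("ncRNA", "is_a", "processed_transcript"),
    ("tRNA", "is_a", "ncRNA"),
    ("rRNA", "is_a", "ncRNA"),
    ("exon", "part_of", "transcript"),
    ("intron", "part_of", "primary_transcript"),
    ("CDS", "part_of", "mRNA"),
    ("UTR", "part_of", "mRNA"),
    ("five_prime_UTR", "is_a", "UTR"),
    ("three_prime_UTR", "is_a", "UTR"),
]


def parents(term):
    """Return the (relationship, parent) pairs directly above a term."""
    return [(rel, parent) for child, rel, parent in EDGES if child == term]


def ancestors(term):
    """Collect every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _rel, parent in parents(current):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("five_prime_UTR")))
```

Because the graph is acyclic, the walk always terminates; the `seen` set only prevents visiting a shared ancestor twice.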

6 GFF3. SO is used to ‘type’ the features and relationships.
Id  type  start  end  strand  attributes
ctg123  .  gene             1000  9000  .  +  .  ID=gene00001;Name=EDEN
ctg123  .  TF_binding_site  1000  1012  .  +  .  ID=tfbs00001;Parent=gene00001
ctg123  .  mRNA             1050  9000  .  +  .  ID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123  .  mRNA             1050  9000  .  +  .  ID=mRNA00002;Parent=gene00001;Name=EDEN.2
ctg123  .  mRNA             1300  9000  .  +  .  ID=mRNA00003;Parent=gene00001;Name=EDEN.3
ctg123  .  exon             1300  1500  .  +  .  ID=exon00001;Parent=mRNA00003
ctg123  .  exon             1050  1500  .  +  .  ID=exon00002;Parent=mRNA00001,mRNA00002
ctg123  .  exon             3000  3902  .  +  .  ID=exon00003;Parent=mRNA00001,mRNA00003
ctg123  .  exon             5000  5500  .  +  .  ID=exon00004;Parent=mRNA00001,mRNA00002,mRNA00003
ctg123  .  exon             7000  9000  .  +  .  ID=exon00005;Parent=mRNA00001,mRNA00002,mRNA00003
[Figure labels: relationships, terms.]
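As a rough illustration of how the type column and the Parent attribute carry the SO terms and relationships, the Python sketch below parses three of the rows from this slide and builds a parent-to-children index. It assumes tab-separated, nine-column GFF3 lines and ignores comments, directives and URL-escaping; the function name `parse_gff3_line` is hypothetical.

```python
from collections import defaultdict

# Three rows taken from the slide, written with explicit tab separators.
GFF3_LINES = """\
ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN
ctg123\t.\tmRNA\t1050\t9000\t.\t+\t.\tID=mRNA00001;Parent=gene00001;Name=EDEN.1
ctg123\t.\texon\t1050\t1500\t.\t+\t.\tID=exon00002;Parent=mRNA00001,mRNA00002
""".splitlines()


def parse_gff3_line(line):
    """Split one nine-column GFF3 row into a small dictionary."""
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = line.split("\t")
    attributes = {}
    for pair in attrs.split(";"):
        key, value = pair.split("=", 1)
        # Attribute values may be comma-separated lists (e.g. several Parents).
        attributes[key] = value.split(",")
    return {
        "seqid": seqid,
        "type": ftype,  # the SO term that types the feature
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


features = [parse_gff3_line(line) for line in GFF3_LINES]

# Index the Parent relationships: parent ID -> list of child IDs.
children = defaultdict(list)
for feature in features:
    for parent_id in feature["attributes"].get("Parent", []):
        children[parent_id].append(feature["attributes"]["ID"][0])

print(dict(children))
```

Note that exon00002 appears under two parents, which is how one exon row can be shared by several transcripts.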

7 Why we made SO. Standardize vocabulary used in genomics. Clarify the relationships between the terms. Make genomics data more computable by adding semantics to the sequence. It's not just about sequence similarity.

8 What is the scope of SO? Features that can be located on a sequence with coordinates: exon, promoter, binding_site. Properties of these features: –Sequence attributes: Maternally_imprinted –Consequences of mutation: mutation_affecting_editing –Chromosome variation: aneuploid

9 The SO community Model Organism DB –SGD –(MGI) –FlyBase –WormBase –DictyBase –Pombe GMOD Comparative genomics MGED Ontology NLP

10 Genome annotation unification The model organism databases use SO to type their features. The GFF3 file format for annotation, the Chado db schema and DAS2 annotation protocol rely on SO to type features.

11 Genomic analysis The Comparative Genomics Library written in Perl uses SO based annotations to perform complex analysis over multiple genomes. –Yandell M, Mungall CJ, Smith C, Prochnik S, Kaminker J, Hartzell G, Lewis S, Rubin GM. 2006. Large-Scale Trends in the Evolution of Gene Structures within 11 Animal Genomes. PLoS Comput Biol. 2:e15

12 Genome data integration Multiple genomes are organized using SO: –Flymine, –Gramene, –the BRCs

13 NLP/text mining. Recently SO has been used for some new projects: –Semantic enrichment by the Royal Society of Chemistry. –Anaphora resolution by the NLIP group in Cambridge.

14 How SO is managed. SO uses CVS to manage and version the ontology. There is a mailing list for developers to get things off their chest. There is a tracker for term suggestions. There are workshops when we get a critical mass for a given problem. We want to do more workshops. SO is expressed in OBO format.

15 Example of OBO format
http://www.geneontology.org/GO.format.obo-1_2.shtml
[Term]
id: SO:0000587
name: group_I_intron
def: "Group I catalytic introns are large self-splicing ribozymes. They catalyse their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions." [http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00028]
subset: SOFA
is_a: SO:0000188 ! intron
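To make the tag-value layout of a stanza concrete, here is a small Python sketch that reads the group_I_intron stanza into a dictionary. It is a simplification under stated assumptions: the def line is left out, every tag is treated as repeatable, and anything after " ! " is dropped as a comment. The name `parse_stanza` is hypothetical and this is not how OBO-Edit itself parses files.

```python
# The stanza from the slide, with the long def line left out for brevity.
STANZA = """\
[Term]
id: SO:0000587
name: group_I_intron
subset: SOFA
is_a: SO:0000188 ! intron
"""


def parse_stanza(text):
    """Turn one OBO stanza into a dict mapping each tag to a list of values."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    stanza = {"stanza_type": [lines[0].strip("[]")]}
    for line in lines[1:]:
        # Split on the first colon only, because IDs such as SO:0000587
        # contain colons of their own.
        tag, _, value = line.partition(":")
        # Drop the trailing human-readable comment that follows " ! ".
        value = value.split(" ! ")[0].strip()
        stanza.setdefault(tag.strip(), []).append(value)
    return stanza


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], "is_a", term["is_a"][0])
```

Keeping each tag as a list reflects that tags such as is_a and subset can occur more than once in a stanza.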

16 OBO and OWL: http://purl.org/obo/owl/SO Mapping OBO and OWL: http://www.bioontology.org/wiki/index.php/OboInOwl:Main_Page

17 Navigate SO using OBO-Edit Structure of the ontology Search the ontology Details for selected term All parents of the term

18 Annotating with SO and RNAO. The nanos translational control element represses translation in somatic cells by a Bearded box-like motif. Duchow HK, Brechbiel JL, Chatterjee S, Gavis ER. Developmental Biology, Volume 282, Issue 1, 1 June 2005, Pages 207-217.
Sequence shown (label on slide: Translational control element):
AGAGGGCGAATCCAGCTCTGGAGCAGAGGCTCTGGCAGCTTTTGCAGCGT
TTATATAACATGAAATATATATACGCATTCCGATCAAAGCTGGGTTAACCAG
ATAGATAGATAGTAACGTTTAAATAGCGCCTGGCGCGTTCGATTTTAAAGA
GATTTAGAGCGTTATCCCGTGCCTATAGATCTTATAGTATAGACAACGAAC
GATCACTCAAATCCAAGTCAATAATTCAAGAATTTATGTCTGTTTCTGTGAA
AGGGAAACTAATTTTGTTAAAGAAGACTTACAATATCGTAATACTTGTTCAA
TCGTCGTGGCCGATAGAAATATCTTACAATCCGAAAGTTGATGAATGGAAT
TGGTCTGCAACTGGTCGCCTTCATTTCGTAAAATGTTCGCTTGCGGCCGAA
AAATTTCGATATATCTACAATTGATCTACAATCTTTACTAAATTTTGAAAAAG
GAACACTTTGAATTTCGAACTGTCAATCGTATCATTAGAATTTAATCTAAATT
TAAATCTTGCTAAAGGAAATAGCAAGGAACACTTTCGTCGTCGGCTACGCA
TTCATTGTAAAATTTTAAATTTTGACATTCCGCACTTTTTGATAGATAAGCGA
AGAGTATTTTTATTACATGTATCGCAAGTATTCATTTCAACACACATATCTAT
ATATATATATATATATATATATATATATATATATATATATGTTATATATTTATTC
AATTTTGTTTACCATTGATCAATTTTTCACACATGAAACAACCGCCAGCATT
ATATAATTTTTTTATTTTTTTAAAAAATGTGTACACATATTCTGAAAATGAAAA
ATTCAATGGCTCGAGTGCCAAATAAAGAAATGGTTACAATTTAAGG

19 Overlap with RNAO SO provides regions of sequence - start and stop coordinates with regards to the whole sequence - i.e. assembly / chromosome –Transcripts and parts of transcripts –Some secondary structure –Some motifs –Results of algorithms such as blast

20 SO names features

21 Secondary structure This part of SO needs work. Any volunteers?

22 Divergent from RNAO Where do SO and RNAO differ dramatically? –Multiple sequence alignments. SO does not provide a solution to this. It does however provide the terms to describe the results of sequence similarity searches. –Numerical results. SO has not needed to use values so far.

23 RNAO working groups Motif identification/annotation RNA interaction Biochemical-structure mapping Multiple sequence alignment Backbone conformation Base stacking

24 Working together. Remain 2 separate ontologies. Give SO annotators the option of ‘importing’ RNAO terms using the OBO programs. SO and RNAO work together to align key terms in their ontologies.

25 SO is still evolving. RNAO could use the SO features to describe regions of sequence. SO could reference RNAO for detailed annotation of structure and biochemical features.

26 Multiple ontologies in OBO: 2 options. 1. The ontologies reference each other: you will always need to load both ontologies. 2. There is a mapping file that you can load to import external terms: maintain separate ontologies and keep the mapping up to date. http://obofoundry.org/wiki/index.php/Mappings

27 Example: Importing terms from SCOR. 1. Made an OBO file from a subset of SCOR terms 2. Work out where there is overlap 3. Make OBO mapping file between the two ontologies 4. Load all 3 files at once.

28 scor.obo:
format-version: 1.2
date: 16:05:2007 15:26
saved-by: kareneilbeck
auto-generated-by: OBO-Edit 1.100
[Term]
id: SC:0000000
name: hairpin_loop
[Term]
id: SC:0000001
name: diloop
is_a: SC:0000000 ! hairpin_loop
[Term]
id: SC:0000002
name: triloop
is_a: SC:0000000 ! hairpin_loop
…
Mapping file:
format-version: 1.2
date: 24:05:2007 10:37
saved-by: kareneilbeck
import: so-xp.obo
import: scor2.obo
id: SC:0000015 hairpin loop
is_a: SO:0000715 is_a RNA motif
id: SC:0000016 internal loop
is_a: SO:0000715 is_a RNA motif
id: SC:0000035 tertiary interaction
is_a: SO:0000122 is_a RNA sequence secondary structure
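The idea of loading the two ontologies plus the mapping at once can be sketched in a few lines of Python. The IDs and names are the ones shown in the mapping file on this slide; the dictionaries stand in for parsed OBO files, and the helper names `load_all` and `describe` are hypothetical. This is only a toy model of the merge, not what OBO-Edit does internally.

```python
# Stand-ins for the parsed files (IDs and names as shown on the slide).
so_terms = {
    "SO:0000715": "RNA motif",
    "SO:0000122": "RNA sequence secondary structure",
}
scor_terms = {
    "SC:0000015": "hairpin loop",
    "SC:0000016": "internal loop",
    "SC:0000035": "tertiary interaction",
}
# The mapping file contributes only cross-ontology is_a links.
mapping_is_a = {
    "SC:0000015": ["SO:0000715"],
    "SC:0000016": ["SO:0000715"],
    "SC:0000035": ["SO:0000122"],
}


def load_all(*term_sets):
    """Merge several ID -> name dictionaries into one lookup table."""
    merged = {}
    for terms in term_sets:
        merged.update(terms)
    return merged


names = load_all(so_terms, scor_terms)


def describe(term_id):
    """Render a term with the names of its mapped parents."""
    parent_names = [names[parent] for parent in mapping_is_a.get(term_id, [])]
    return f"{names[term_id]} is_a {', '.join(parent_names)}"


print(describe("SC:0000015"))
```

Keeping the links in a separate structure mirrors the second option on the previous slide: each ontology stays untouched and only the mapping needs to be kept up to date.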

29 OBO-Edit DEMO Fingers crossed…

30 Possible action items A SO-RNAO mailing list for discussion of collaboration Phone/skype/webinars at intervals to keep track of progress.

31 Resources
GFF3: http://www.sequenceontology.org/gff3.shtml
Apollo: http://www.fruitfly.org/annot/apollo/
SO: http://www.sequenceontology.org
OBO-Edit: http://sourceforge.net/projects/geneontology
OBO foundry: http://www.obofoundry.org
GO-perl: http://www.godatabase.org/dev/go-perl/doc/go-perl-doc.html

32 Acknowledgements. SO is funded as part of the Gene Ontology Consortium, via NIH grant P41-HG002274. People: –Suzi Lewis and Michael Ashburner - the vision –Chris Mungall - programming infrastructure –John Richter - made OBO-Edit

33 keilbeck@genetics.utah.edu

