
1 Why Manual Genome Annotation? Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one misannotated exon. (Yandell and Ence, 2012, Nature Reviews Genetics) Automated annotation is often not good enough for the genes you really care about!
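A rough way to see why 80% exon-level accuracy leaves most genes with an error: assume (a simplification, not a claim from the review) that each exon is correct independently with probability 0.8. A gene with n exons is then entirely correct with probability 0.8^n:

```latex
P(\text{all } n \text{ exons correct}) = 0.8^{n},
\qquad n = 4:\; 0.8^{4} = 0.4096,
\qquad 1 - 0.8^{4} \approx 0.59
```

Under this assumption a four-exon gene has roughly a 59% chance of containing at least one misannotated exon, and the chance grows with exon count.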


3 Yandell and Ence, 2012, Nature Reviews Genetics: http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf

4 Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (open reading frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which may still be wrong!
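The ORF-based computational prediction in point 1 can be sketched as a scan for ATG...stop stretches in each forward reading frame. This is a minimal illustration only, assuming the forward strand, the standard stop codons, and that the first ATG in a frame opens the ORF; real gene predictors also model splice sites, codon usage and the reverse strand. `find_orfs` and `min_codons` are names made up for this sketch.

```python
# Standard stop codons (assumption: standard nuclear genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    frame is 1, 2 or 3; start is the 0-based index of the A in ATG;
    end is exclusive and includes the stop codon. min_codons counts
    codons including the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in (1, 2, 3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame - 1, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

For example, `find_orfs("CCATGGAATAACC")` reports one ORF in frame 3 (ATG GAA TAA).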

5 Bees (Order Hymenoptera, Family Apidae):
Western Honey Bee (Apis mellifera)
Common Eastern Bumble Bee (Bombus impatiens)
Buff-Tailed Bumble Bee (Bombus terrestris)
Dwarf Asian Honey Bee (Apis florea)

6 Cytochrome P450 monooxygenase enzymes:
$\mathrm{NADPH + H^{+} + O_{2} + R{-}H \rightarrow NADP^{+} + H_{2}O + R{-}OH}$
Classification, e.g. CYP3A4*15A:
Family: CYP3 (>40% amino acid sequence identity)
Subfamily: CYP3A (>55% amino acid sequence identity)
Isoenzyme: CYP3A4
Allele: *15A, *15B
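The naming scheme for P450s can be illustrated with a small parser for names such as CYP3A4*15A or CYP6AS3, plus the >40% / >55% identity thresholds turned into a rule of thumb. The regular expression and the function names are assumptions for this sketch; actual P450 names are assigned by curators and are not computed from percent identity alone.

```python
import re

# CYP names: family number, subfamily letter(s), isoform number, optional
# "*" allele suffix, e.g. CYP3A4*15A or CYP6AS3. (Pattern is an assumption.)
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(?:\*(\w+))?$")


def parse_cyp_name(name: str) -> dict:
    """Split a CYP name into family, subfamily, isoform and allele parts."""
    m = CYP_NAME.match(name)
    if m is None:
        raise ValueError(f"not a recognised CYP name: {name!r}")
    family, subfamily, isoform, allele = m.groups()
    return {
        "family": int(family),
        "subfamily": subfamily,
        "isoform": int(isoform),
        "allele": allele,  # None when no "*" suffix is present
    }


def identity_rule_of_thumb(percent_identity: float) -> str:
    """Map protein % identity to the thresholds above (>55%, >40%)."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different family"
```

The same parser handles the bee gene names used later in this presentation, such as CYP4G11 (family 4, subfamily G, isoform 11).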

7 Chemical signalling? (pheromone synthesis and breakdown)
Detoxification (toxin and pesticide metabolism)
Hormone synthesis (highly conserved orthologs) + detoxification


10 Repeats

11 Intron splice sites are highly conserved

12 P450s: ~500 amino acids (~1,500 nucleotides)
Highly conserved heme-binding site (cysteine)
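To locate the conserved heme-binding cysteine in a candidate protein, one option is a regular-expression search. The pattern below, F..G.[HR].C.[GA], is a simplified consensus assumed for this sketch, not the official PROSITE signature. The second helper shows the ~500 amino acid to ~1,500 nucleotide arithmetic (three nucleotides per codon, plus three for the stop codon). Both function names are hypothetical.

```python
import re

# Simplified heme-binding consensus around the conserved cysteine
# (assumption for this sketch, not the official PROSITE pattern).
HEME_MOTIF = re.compile(r"F..G.[HR].C.[GA]")


def heme_motif_hits(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the motif's Cys, matched motif) pairs."""
    hits = []
    for m in HEME_MOTIF.finditer(protein.upper()):
        cys_position = m.start() + 8  # Cys is the 8th residue of the motif
        hits.append((cys_position, m.group()))
    return hits


def cds_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Coding-sequence length: 3 nt per codon, optionally plus the stop codon."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))
```

A protein with no hit is not necessarily a non-P450, since real heme-binding regions vary more than this pattern allows; treat a miss as a prompt to inspect the alignment.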

13 Basic Annotation Rules
CDS start: amino acid M; nucleotides ATG
CDS stop: amino acid * (stop); nucleotides TAA/TAG/TGA
Translation frames: Frame 1, Frame 2, Frame 3
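The start/stop rules and the three forward translation frames can be sketched with the standard genetic code. The codon table is the standard nuclear code; `translate` and its `frame` argument (1, 2 or 3, starting at nucleotide 1, 2 or 3 as in the slide) are illustrative names, and codons containing N or other symbols become X.

```python
import itertools

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 1) -> str:
    """Translate the forward strand in frame 1, 2 or 3; stops are shown as '*'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # X for codons with N or other symbols
        for i in range(frame - 1, len(seq) - 2, 3)
    )
```

For example, `translate("ATGGAATAA")` gives `ME*` (start M, then a stop), while frames 2 and 3 of the same sequence give `WN` and `GI`.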

14 Intron splice sites: GT-AG
Images: http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png, http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png
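A minimal check of the GT-AG rule when removing an intron from a genomic stretch. Coordinates here are 0-based and end-exclusive, a convention chosen for this sketch, and `splice_out` is a hypothetical helper. A small minority of introns use other boundaries (for example GC-AG), so a failed check flags a site for a closer look rather than proving an error.

```python
def splice_out(genomic: str, intron_start: int, intron_end: int) -> str:
    """Remove genomic[intron_start:intron_end] after checking the GT-AG rule.

    Coordinates are 0-based and end-exclusive (Python slice convention).
    """
    intron = genomic[intron_start:intron_end].upper()
    if len(intron) < 4 or not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError(
            f"intron does not follow GT-AG: {intron[:2]}...{intron[-2:]}"
        )
    return genomic[:intron_start] + genomic[intron_end:]
```

For a toy gene `"ATGG" + "GTAAGTTTTCAG" + "AATAA"`, removing positions 4 to 16 leaves the spliced coding sequence `ATGGAATAA`.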


33 Regular-expression find/replace: find “(\w)”, replace with “\1 ” (each character followed by a space)
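The same find/replace can be reproduced with Python's `re.sub`, which supports the `\1` backreference to the captured character. This only mirrors the slide's pattern; the helper name `space_out` is hypothetical.

```python
import re


def space_out(text: str) -> str:
    """Apply find '(\\w)' / replace '\\1 ': put a space after every word character."""
    return re.sub(r"(\w)", r"\1 ", text)
```

For example, `space_out("ATGGAA")` returns `"A T G G A A "`, with one trailing space after the last base.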


35 ‘GT’ intron donor site


38 ‘AG’ intron acceptor site

39 ‘GT’ intron donor site: 1 nucleotide (“G”) of the next codon comes before the intron = phase 1 intron

40 ‘AG’ intron acceptor site: 2 nucleotides (“AA”) come before the first full codon of exon 2. Combined with the “G” left at the end of exon 1, they make the codon “GAA” for glutamic acid (E)
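The phase arithmetic from slides 39-40 can be written out directly: the intron phase is the number of coding nucleotides before the intron, modulo 3, and that many bases of the split codon sit at the end of the upstream exon. This is a sketch under that definition; `split_codon` is a hypothetical name. Note that the GFF3 "phase" column for CDS features uses a different convention (the number of bases to skip at the start of the feature to reach the next codon).

```python
def split_codon(upstream_cds: str, downstream_exon: str) -> tuple[int, str, str]:
    """Return (intron phase, bases before the intron, bases after it).

    upstream_cds is all coding sequence from the ATG up to the intron;
    downstream_exon is the exon that follows the intron.
    """
    phase = len(upstream_cds) % 3  # coding bases before intron, modulo 3
    if phase == 0:
        return 0, "", ""  # intron falls between two codons
    return phase, upstream_cds[-phase:], downstream_exon[:3 - phase]
```

With exon 1 = `ATGG` and exon 2 = `AATAA`, `split_codon` returns `(1, "G", "AA")`: a phase 1 intron whose split codon is G + AA = GAA (glutamic acid, E).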


42 This start looks good!


47 Jamboree! Search for homologs of one of these Apis mellifera genes in the GenBank protein database (e.g. CYP9R1 AND Apis mellifera):
CYP9R1, CYP6AS3, CYP6BD1, CYP6AQ1, CYP4G11
Use BLASTP to find predicted homologs in the NCBI “nr” database. For the Organism, select one of the following bees:
Apis florea
Bombus impatiens
Bombus terrestris
Megachile rotundata
Copy and paste the verified amino acid sequences (FASTA formatted) into a text file.

48 Add comments to the header and include a gene identifier. Send to me at: johnson.5005@osu.edu Thanks!!
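For the requested text file, each FASTA record needs a header line starting with ">" that carries a gene identifier and free-text comments, followed by the sequence. A minimal formatter, assuming a space separates the identifier from the comment (the usual FASTA convention) and 60 residues per line; `fasta_record` and the identifier `BimpCYP9R1` in the usage note are hypothetical.

```python
def fasta_record(gene_id: str, sequence: str, comment: str = "", width: int = 60) -> str:
    """Format one FASTA record: '>' header (identifier, then comments) and wrapped sequence."""
    header = f">{gene_id} {comment}".rstrip()
    residues = "".join(sequence.split())  # drop spaces/newlines from pasted text
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header, *lines]) + "\n"
```

For example, `fasta_record("BimpCYP9R1", "MAAF", "homolog of Apis mellifera CYP9R1; manually checked")` produces a two-line record whose header starts with `>BimpCYP9R1`.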

