1 Arabidopsis Genome Annotation TAIR7 Release

2 Arabidopsis Genome Annotation
- Overview of releases
- Current release (TAIR7)
- Where to find TAIR7 release data
- Preview of next release (TAIR8)

3 Overview of releases to date
- 26,819 protein-coding genes
- 3,866 alternatively spliced

4 Average gene in TAIR7 release
- Avg 5' UTR: 146 bp
- Avg exon: 268 bp
- Avg intron: 165 bp
- Avg 3' UTR: 233 bp
- 2221 bp long
- 1.16 splice variants per locus

5 What was done for TAIR7
- 681 new loci, 1774 new gene models
- 211 cysteine-rich peptides (CRPs) (K. Silverstein, Univ. of Minnesota)
- 71 microRNAs (Matt Jones-Rhoades, MIT/miRBase)
- 34 merges, 41 splits, 47 obsolete loci
- 797 models with CDS updates
- 10,792 models with UTR updates
- One third of all TAIR6 loci (10,098 loci) were updated for TAIR7

6 TAIR6 vs TAIR7 Release
- All nuclear: 31,762
- All genes: 32,041

7 Annotation pipeline and strategy
Gene updates
- New Arabidopsis cDNAs/ESTs incorporated via automated pipeline (PASA)
  - Result: 1717 non-UTR updates
- Community updates (affecting 330 genes)
- Manual curation to identify potential errors (targeted approach)
  - ~10% of loci examined manually

8 Specific problems targeted
- Small introns (65), long introns (89)
- AT-AC splicing (55)
- UTR errors (1098)
- ncRNAs and small proteins (251)

9 AT-AC splicing genes
- 55 gene models updated
Figure labels: TAIR6 model, AT-AC splice junction
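Most introns begin with GT and end with AG; AT-AC introns are a rare class that begin with AT and end with AC. The slides do not say how the 55 models were found, so the following is only a minimal sketch of how an intron could be classified by its terminal dinucleotides. The function name `classify_intron` and the toy sequences are hypothetical, not part of the TAIR pipeline.

```python
# Hypothetical helper: classify an intron by its first and last two bases.
def classify_intron(intron_seq: str) -> str:
    """Return the splice-site class implied by the intron's terminal dinucleotides."""
    seq = intron_seq.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) < 4:
        raise ValueError("intron too short to have donor and acceptor sites")
    donor, acceptor = seq[:2], seq[-2:]
    known = {
        ("GT", "AG"): "GT-AG",  # the common class
        ("GC", "AG"): "GC-AG",  # a less common variant
        ("AT", "AC"): "AT-AC",  # the rare class targeted on this slide
    }
    return known.get((donor, acceptor), "non-canonical")


# Toy sequences for illustration only; these are not real Arabidopsis introns.
examples = ["GTAAGTTTTCAG", "ATATCCTTTTAC", "GCAAGTTTTTAG", "CTAAGTTTTTAC"]
classes = [classify_intron(s) for s in examples]
print(classes)
```

Anything that falls outside the three listed classes is reported as non-canonical, which is the kind of case a curator would want to inspect by hand.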

10 Manual updates – UTRs
- UTRs overextended
  - Identified 1051 gene pairs
  - 909 loci updated
  - Incorrectly extended by ESTs

11 ncRNAs & small proteins
- cDNAs not represented in TAIR6 gene set
- 1260 cDNAs do not map to TAIR6 annotation (385 spliced)
- 947 separate cDNA clusters ("loci") (291 spliced)
- 251 new loci added in TAIR7
  - 1619 overlapping loci
  - 1459 exon-exon overlaps
  - 127 possible natural antisense genes
Figure label: ncRNA

12 ncRNAs & small proteins
- cDNAs not represented in TAIR6 gene set
- 1260 cDNAs do not map to TAIR6 annotation (385 spliced)
- 947 separate cDNA clusters ("loci") (291 spliced)
- 251 new loci added in TAIR7
Figure label: Small protein

13 Computational descriptions
- Updated all computational descriptions
  Example: ANAC001 (Arabidopsis NAC domain containing protein 1); transcription factor; similar to ANAC069 (Arabidopsis NAC domain containing protein 69), transcription factor [Arabidopsis thaliana] (TAIR:AT4G01550.1); similar to putative NAC2 protein [Oryza sativa (japonica cultivar-group)] (GB:BAD09612.1); contains InterPro domain No apical meristem (NAM) protein; (InterPro:IPR003441).
- ~4000 loci have similarity only to uncharacterised proteins (i.e. hypothetical, predicted, unknown, etc.)
- 758 have no significant protein similarity to GenBank proteins
- 286 also have no supporting EST/cDNA evidence
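The example description above embeds its cross-references in a regular "(SOURCE:ACCESSION)" form. As a rough sketch, assuming that this pattern holds for other descriptions too (only one example is shown on the slide), the references could be pulled out with a regular expression. The names `XREF` and `extract_xrefs` are hypothetical.

```python
import re
from typing import List, Tuple

# Pattern inferred from the single example on the slide: "(SOURCE:ACCESSION)".
# Treating TAIR, GB and InterPro as the only sources is an assumption.
XREF = re.compile(r"\((TAIR|GB|InterPro):([A-Za-z0-9_.]+)\)")


def extract_xrefs(description: str) -> List[Tuple[str, str]]:
    """Return (source, accession) pairs found in a computational description."""
    return XREF.findall(description)


description = (
    "ANAC001 (Arabidopsis NAC domain containing protein 1); transcription factor; "
    "similar to ANAC069 (Arabidopsis NAC domain containing protein 69), "
    "transcription factor [Arabidopsis thaliana] (TAIR:AT4G01550.1); "
    "similar to putative NAC2 protein [Oryza sativa (japonica cultivar-group)] "
    "(GB:BAD09612.1); contains InterPro domain No apical meristem (NAM) protein; "
    "(InterPro:IPR003441)."
)

xrefs = extract_xrefs(description)
for source, accession in xrefs:
    print(source, accession)
```

Parenthesised text that is not a cross-reference, such as "(NAM)", is ignored because it lacks a recognised source prefix.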

14 TAIR7 Summary
- Chromosome sequence not changed
- 681 new loci
- 10,098 loci updated
- ~10% of loci manually examined

15 Where to find TAIR7 data
- TAIR:
  - Genome Annotation Portal
  - Bulk Download Tool (Sequences)
  - SeqViewer (genome browser)
  - FTP site
- NCBI genomes section

16 Genome Annotation Portal

19 SeqViewer (Genome Browser)

20 FTP download whole datasets

22 Preview of TAIR8 release
- Genome assembly updates
- Annotation maintenance
  - Correct structural errors
  - New transcript data
  - Community submissions
- Missing genes and splice variants
- Improved transposon annotation

23 Missing genes and splice variants
- Continued identification of missing genes
- Alternative splicing
  - 8,264 alternative splicing events affecting 4,707 genes (Brendel V et al., Proc Natl Acad Sci 2006)
  - 16,252 events in 11,665 models affecting 5,313 genes (Buell 2006, Genomics)
  - TAIR7 alternative splicing giving 8,844 models affecting 3,866 genes
  - Retained introns in ~48% of alternatively spliced genes/loci

24 Missing genes and splice variants
- Continued identification of missing genes
- Alternative splicing
  - 8,264 alternative splicing events affecting 4,707 genes (Brendel V et al., Proc Natl Acad Sci 2006)
  - 16,252 events in 11,665 models affecting 5,313 genes (Buell 2006, Genomics)
  - TAIR7 alternative splicing giving 8,844 models affecting 3,866 genes
  - Retained introns in ~48% of alternatively spliced genes/loci
  - 30% of the time the shorter splice variant is prevalent
Figure panel labels: A, B, C
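The TAIR7 splicing figures can be put in proportion with a little arithmetic. This is a back-of-the-envelope sketch using only numbers quoted in the slides; it assumes the 3,866 alternatively spliced genes are counted against the 26,819 protein-coding genes given in the release overview. The variable names are illustrative.

```python
# Figures quoted in the slides for TAIR7.
protein_coding_genes = 26_819      # slide 3
alt_spliced_genes = 3_866          # slides 3 and 23
alt_spliced_models = 8_844         # slide 23

# Assumption: the alternatively spliced genes are a subset of the
# protein-coding genes, so the ratio is a fraction of that set.
fraction_alt_spliced = alt_spliced_genes / protein_coding_genes
models_per_alt_gene = alt_spliced_models / alt_spliced_genes

print(f"Alternatively spliced: {fraction_alt_spliced:.1%} of protein-coding genes")
print(f"Models per alternatively spliced gene: {models_per_alt_gene:.2f}")
```

Under that assumption, roughly 14% of protein-coding genes are alternatively spliced in TAIR7, with about 2.3 gene models each.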

25 Transposons and pseudogenes
- 3889 "pseudogenes"
  - 2490 transposons
  - 1399 pseudogenes
- ~100 TEs not currently tagged as pseudogenes
- Defined by a single pair of coordinates
Figure label: At3g26295

26 TIGR transposon classification
- Searched against a curated database of protein-coding transposon sequences (TIGR's Transposon ORF Collection)
- Classified into one of the major classes of transposable elements

27 Who cares about TEs?
- Efficient markers in gene tagging and phylogenetic studies
- Similarity with virus replication machinery and transcription factors
- Role in heterochromatin formation
- Involved in epigenetic gene regulation
- Genome annotators

28 Transposon feature annotation
- Transposons can contain multiple genes
- Four levels of data: Genes > Transcripts > Exons > CDS_features
- Repeat features
Diagram thanks to LBNL
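The four-level hierarchy can be pictured as nested records. The sketch below is one possible reading of "Genes > Transcripts > Exons > CDS_features", with each level containing the next and a transposon holding several genes. All class names, field names, identifiers and coordinates are hypothetical and do not reflect the actual TAIR8 schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CDSFeature:
    start: int
    end: int


@dataclass
class Exon:
    start: int
    end: int
    cds_features: List[CDSFeature] = field(default_factory=list)


@dataclass
class Transcript:
    name: str
    exons: List[Exon] = field(default_factory=list)


@dataclass
class Gene:
    name: str
    transcripts: List[Transcript] = field(default_factory=list)


@dataclass
class Transposon:
    """A transposon defined by one coordinate pair that may contain several genes."""
    name: str
    start: int
    end: int
    genes: List[Gene] = field(default_factory=list)


def count_cds_features(te: Transposon) -> int:
    """Walk all four levels and count the CDS features inside a transposon."""
    return sum(
        len(exon.cds_features)
        for gene in te.genes
        for transcript in gene.transcripts
        for exon in transcript.exons
    )


# Made-up example: one transposon containing two single-exon genes.
te = Transposon(
    name="TE_example", start=1000, end=6000,
    genes=[
        Gene("gene_A", [Transcript("gene_A.1", [Exon(1100, 2000, [CDSFeature(1200, 1900)])])]),
        Gene("gene_B", [Transcript("gene_B.1", [Exon(3000, 4500, [CDSFeature(3100, 4400)])])]),
    ],
)
cds_count = count_cds_features(te)
print(len(te.genes), cds_count)
```

Keeping the transposon as the outer container reflects the slide's point that a single element can hold more than one gene.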

29 Beyond TAIR8
- Mitochondrial and chloroplast gene reannotation
- Comparative analysis using new genome sequences
- Improved pseudogene annotation
- Guide to supporting evidence for gene structure

