BASys: A Web Server for Automated Bacterial Genome Annotation

Gary Van Domselaar †, Paul Stothard, Savita Shrivastava, Joseph A. Cruz, AnChi Guo, Xiaoli Dong, Paul Lu, Duane Szafron, Russ Greiner, and David S. Wishart ‡
Departments of Computing Science and Biological Sciences, University of Alberta, Edmonton AB T6E 2E9
† gary.vandomselaar@ualberta.ca
‡ david.wishart@ualberta.ca

Abstract
BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal, plasmid, and contig) sequences. It accepts raw DNA sequence data and an optional list of gene identification information and provides extensive textual and hyperlinked image output. BASys uses more than 30 programs to determine nearly 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3-D structure, reactions, and pathways. The depth and detail of a BASys annotation matches or exceeds that found in a standard SwissProt entry. BASys also generates colourful, clickable, and fully zoomable maps of each query chromosome to permit rapid navigation and detailed visual analysis of all resulting gene annotations. The textual annotations and images provided by BASys can be generated in approximately 24 hours for an average bacterial chromosome (5 megabases). BASys annotations may be viewed and downloaded anonymously or through a password-protected access system. The BASys server and databases can also be downloaded and run locally. BASys is accessible at: http://wishart.biology.ualberta.ca/basys

[Figure: pipeline diagram. Labels: Genomic Sequence Data; (Optional) Gene Identification Data; Head Node; Compute Node; Reference DB Similarity Search (SWISSPROT, CCDB); Model Organism Similarity Search (E. coli, D. melanogaster, H. sapiens, C. elegans, S. cerevisiae); Metabolite Analysis (KEGG); Sequence Analysis (Pfam, PROSITE, PredictSPTM, etc.); Structure Analysis (Homodeller, VADAR, PDB); Report Generation (CGView); Annotation Reports]

Data Submission
BASys supplies a web form for uploading chromosome, plasmid, or contig sequence data. Optional gene identification data can be provided, or BASys can predict protein-coding regions from the genomic data using Glimmer [1].

Data Scheduling
BASys is implemented as a distributed system. The head node monitors and manages the job scheduling. Annotation and report generation are carried out by the compute nodes.

Annotation Reports
BASys uses CGView [3] to generate clickable genome maps for navigating the genome data. An HTML-formatted tabular summary is also provided. The genome maps are prerendered as a series of hyperlinked PNG image files. Each gene label is hyperlinked to its corresponding HTML-formatted "gene card". The card is hyperlinked, where applicable, to external references. Text-only gene cards are also provided. BASys also supplies an "evidence card" describing how each annotation was generated. The gene cards, evidence cards, and graphical genome maps are downloadable for offline viewing.

Search Capability
BASys supports online keyword searches and sequence similarity searches. Search results contain hyperlinks to their gene cards and graphical genome maps.

BASys Annotation Pipeline
The BASys annotation engine combines database comparison and computational sequence analysis in its annotation pipeline. Translated coding sequences are first compared, using BLAST, against two expertly annotated reference databases: UniProt and the CyberCell comprehensive molecular database on Escherichia coli (CCDB). The similarity score between the query and database sequence is compared to the threshold value for each annotation type, and qualifying annotations are transitively applied to the query sequence. BASys attempts to fill the remaining annotations with additional similarity searches and sequence analyses. BLAST searches are conducted against the protein sequences of C. elegans, human, yeast, and Drosophila; a non-redundant database of bacterial protein sequences; and the PDB, KEGG, and COG databases. Various sequence analyses are also performed, including Pfam, PROSITE, signal peptide and transmembrane domain predictions, and secondary structure prediction with PSIPRED. If the sequence has sufficient similarity to a sequence represented in the PDB database, then BASys may use HOMODELLER to generate a homology model and subsequently perform a structural analysis using VADAR. Several additional annotations, such as protein molecular weight, isoelectric point, and operon structure, are calculated directly from the chromosomal, protein-coding nucleotide, and translated protein sequence data. In all, a collection of nearly 60 distinct annotations is generated for each gene.

Validation
BASys annotations were compared to a set of expertly annotated proteins from C. trachomatis [2]. BASys annotations agreed with the expert annotations 762 times out of 894 (about 85%). The sensitivity is 94%; the specificity is 73%.

References
1. Delcher AL et al. (1999) Nucleic Acids Res. 27:4636-41.
2. Iliopoulos I et al. (2003) Bioinformatics 19:717-26.
3. Stothard P and Wishart DS (2005) Bioinformatics 21:537-39.
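The threshold step described under "BASys Annotation Pipeline" can be sketched roughly as follows. This is a minimal illustration, not the BASys implementation: the `Hit` class, the `transfer_annotations` function, the field names, and every cutoff value are hypothetical, since the poster does not state which score or thresholds BASys uses.

```python
from dataclasses import dataclass, field

# Hypothetical per-annotation-type cutoffs on a similarity score.
# The actual BASys thresholds are not given in the poster.
THRESHOLDS = {
    "protein_name": 200.0,
    "go_function": 150.0,
    "subcellular_location": 100.0,
}


@dataclass
class Hit:
    """One similarity-search hit against a reference database (hypothetical)."""
    subject_id: str
    score: float
    annotations: dict = field(default_factory=dict)  # annotation type -> value


def transfer_annotations(hits, thresholds=THRESHOLDS):
    """For each annotation type, take the value from the best-scoring hit
    whose score meets that type's threshold. Returns a dict mapping
    annotation type -> (value, source subject id)."""
    transferred = {}
    # Visit hits from best to worst so the strongest evidence wins.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        for ann_type, value in hit.annotations.items():
            cutoff = thresholds.get(ann_type)
            if cutoff is None or hit.score < cutoff:
                continue  # unknown type, or hit too weak for this type
            transferred.setdefault(ann_type, (value, hit.subject_id))
    return transferred


if __name__ == "__main__":
    hits = [
        Hit("P1", 180.0, {"protein_name": "DnaA", "go_function": "DNA binding"}),
        Hit("P2", 120.0, {"subcellular_location": "cytoplasm"}),
    ]
    print(transfer_annotations(hits))
```

Annotation types that no hit qualifies for are simply absent from the result, which corresponds to the "remaining annotations" that later searches and sequence analyses try to fill. Keeping the source identifier next to each value mirrors the idea of an evidence card.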

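As an example of an annotation calculated directly from the translated protein sequence, the sketch below estimates molecular weight by summing average residue masses and adding one water. The function name `molecular_weight` is hypothetical and the mass table holds commonly tabulated average residue masses; the poster does not say how BASys performs this calculation.

```python
# Average residue masses in daltons (amino acid minus one water).
# Commonly tabulated values; an assumption, not taken from BASys.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water per chain for the free N- and C-termini


def molecular_weight(protein_sequence: str) -> float:
    """Estimate the average molecular weight (Da) of an unmodified protein."""
    sequence = protein_sequence.strip().upper()
    if not sequence:
        raise ValueError("empty sequence")
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


if __name__ == "__main__":
    print(f"{molecular_weight('MKTAYIAKQR'):.2f} Da")
```

The estimate ignores post-translational modifications and signal peptide cleavage, so it describes the unprocessed translation product only.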

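The agreement figure in the Validation section can be checked with a few lines of arithmetic. The helper names below are hypothetical. The sensitivity and specificity functions use the standard textbook definitions, which the poster may or may not follow; the underlying true/false positive and negative counts are not given in the poster, so those two values are not recomputed here.

```python
def agreement_rate(agreed: int, total: int) -> float:
    """Fraction of comparisons where two annotation sets agree."""
    if total <= 0:
        raise ValueError("total must be positive")
    return agreed / total


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Standard definition: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Standard definition: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Counts reported in the poster: 762 agreements out of 894 comparisons.
    print(f"{agreement_rate(762, 894):.1%}")  # prints 85.2%
```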