Basics of Genome Annotation Daniel Standage Biology Department Indiana University.

Slides:



Advertisements
Similar presentations
EAnnot: A genome annotation tool using experimental evidence Aniko Sabo & Li Ding Genome Sequencing Center Washington University, St. Louis.
Advertisements

Genomics: READING genome sequences ASSEMBLY of the sequence ANNOTATION of the sequence carry out dideoxy sequencing connect seqs. to make whole chromosomes.
Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) iPlant: Josh Stein (CSHL) Matt Vaughn.
Ab initio gene prediction Genome 559, Winter 2011.
Basics of Comparative Genomics Dr G. P. S. Raghava.
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
1 Computational Molecular Biology MPI for Molecular Genetics DNA sequence analysis Gene prediction Gene prediction methods Gene indices Mapping cDNA on.
Genome analysis and annotation. Genome Annotation Which sequences code for proteins and structural RNAs ? What is the function of the predicted gene products.
Finding approximate palindromes in genomic sequences.
Gene Finding Charles Yan.
Comparative ab initio prediction of gene structures using pair HMMs
The Poor Beginners’ Guide to Bioinformatics. What we have – and don’t have... a computer connected to the Internet (incl. Web browser) a text editor (Notepad.
Bioinformatics Alternative splicing Multiple isoforms Exonic Splicing Enhancers (ESE) and Silencers (ESS) SpliceNest Lecture 13.
Eukaryotic Gene Finding
Lecture 12 Splicing and gene prediction in eukaryotes
Genome Annotation BCB 660 October 20, From Carson Holt.
Gene Finding Genome Annotation. Gene finding is a cornerstone of genomic analysis Genome content and organization Differential expression analysis Epigenomics.
Why Manual Genome Annotation? Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning.
Gene Structure and Identification
Using DNA Subway in the Classroom Red Line Lesson Sketch.
Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) iPlant: Josh Stein (CSHL) Matt Vaughn.
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
Gene Prediction Chengwei Luo, Amanda McCook, Nadeem Bulsara, Phillip Lee, Neha Gupta, and Divya Anjan Kumar.
Coding Domain Sequence Prediction and Alternative Splicing Detection in Human Malaria Gambiae Jun Li 1, Bing-Bing Wang 2, Jose M. Ribeiro 3, Kenneth D.
Welcome to DNA Subway Classroom-friendly Bioinformatics.
The iPlant Collaborative
1 Transcript modeling Brent lab. 2 Overview Of Entertainment  Gene prediction Jeltje van Baren  Improving gene prediction with tiling arrays Aaron Tenney.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
VectorBase BRC The evolving VectorBase gene build: mixing automated and manual approaches when annotating vector genomes Daniel Lawson VectorBase-EBI,
Genome Annotation Rosana O. Babu.
Annotating genomes using MAKER-P and iPlant. What Are Annotations? Annotations are descriptions of features of the genome –Structural: exons, introns,
Gene Prediction: Similarity-Based Methods (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 15, 2005 ChengXiang Zhai Department of Computer Science.
From Genomes to Genes Rui Alves.
Introduction to ab initio and evidence-based gene finding Wilson Leung08/2015.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
Eukaryotic Gene Prediction Rui Alves. How are eukaryotic genes different? DNA RNA Pol mRNA Ryb Protein.
Gene Prediction Chengwei Luo, Amanda McCook, Nadeem Bulsara, Phillip Lee, Neha Gupta, and Divya Anjan Kumar.
Introduction to RNAseq
Bioinformatics and Computational Biology
Basic Overview of Bioinformatics Tools and Biocomputing Applications II Dr Tan Tin Wee Director Bioinformatics Centre.
Genes and Genomes. Genome On Line Database (GOLD) 243 Published complete genomes 536 Prokaryotic ongoing genomes 434 Eukaryotic ongoing genomes December.
How can we find genes? Search for them Look them up.
JIGSAW: a better way to combine predictions J.E. Allen, W.H. Majoros, M. Pertea, and S.L. Salzberg. JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
Motif Search and RNA Structure Prediction Lesson 9.
Applied Bioinformatics
BIOINFORMATICS Ayesha M. Khan Spring 2013 Lec-8.
A knowledge-based approach to integrated genome annotation Michael Brent Washington University.
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
Using DNA Subway in the Classroom Genome Annotation: Red Line.
Daphnia Genome Annotation & Analysis Notes July 2007 Don Gilbert Genome Informatics Lab, Biology Dept., Indiana University
Canadian Bioinformatics Workshops
Bioinformatics Computing 1 CMP 807 – Day 4 Kevin Galens.
RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop.
bacteria and eukaryotes
Annotating The data.
Genome Annotation (protein coding genes)
EGASP 2005 Evaluation Protocol
What is a Hidden Markov Model?
VectorBase genome annotation
Daphnia Genome Preview at wFleaBase.org
EGASP 2005 Evaluation Protocol
Basics of Comparative Genomics
PlantGDB: Annotation Principles & Procedures
Ab initio gene prediction
Genome Annotation w/ MAKER
Introduction to Bioinformatics II
Follow-up from last night: XSEDE credits
Basics of Comparative Genomics
Presentation transcript:

Basics of Genome Annotation Daniel Standage Biology Department Indiana University

An-no-ta-tion \ ˌ a-nə- ˈ tā-shən\ 1.A critical or explanatory note or body of notes added to a text 2.The act of annotating

Genome annotation

 Information itself (e.g., this gene encodes a cytochrome P450 protein, with exons at…)  Annotation process (operational definition)  Data management  formatting  storage  distribution  representation

Methods for gene finding  Ab initio gene prediction  Gene prediction by spliced alignment

Ab initio gene prediction  Ab initio: “from first principles”  Requires only a genomic sequence  Uses statistical model of genome composition to identify most probable location of start/stop codons, splice sites  Popular implementations  Augustus  GeneMark  SNAP

Ab initio gene prediction

Prediction by spliced alignment  Utilizes experimental (transcript) and/or homology (reference proteins) data  Spliced alignment of sequences reveals gene structure  matches = exons  gaps = introns  Popular implementations  GeneSeqer  Exonerate  GenomeThreader

Comparison of prediction methods Ab initio Spliced alignment Do not require extrinsic evidence Requires transcript and/or protein sequences Does not benefit from additional transcript data Accuracy improves with additional transcript data More likely to recover complete gene structures More likely to recover accurate internal exon/intron structure

Issues with gene prediction  Accuracy (best methods achieve ≈80% at exon level)  Parameters matter (species-specific codon usage)  Comparison and assessment

Recurring theme in genomics  Once I have a result, how to I assess its reliability?  How do I compare it to alternative results?

Recurring theme in genomics "Why, when you only had one result, did you think that was the correct one?"

Manual annotation  Visually inspect gene predictions, spliced alignments  Determine reliable consensus gene structure  Available software  Apollo:  yrGATE:

“Combiner” tools  Maker:  EVidenceModeler:

Evaluating annotations  Comparison  ParsEval 1 :  Quality assessment  Annotation Edit Distance 2 (Maker)  GAEVAL (PlantGDB) 1 Standage and Brendel (2012) BMC Bioinformatics, 13 : Eilbeck et al (2009) BMC Bioinformatics, 10 :67.

Recommendations / Considerations  Automated annotation  Manual refinement  Assessment and filtering for particular analyses  Be very skeptical  Remember: no “one true” assembly / annotation

xGDBvm  Pre-installed on iPlant cloud (free for academics!)  Search for xGDBvm image  Includes an EVM pipeline for automated annotation  Includes yrGATE for manual annotation  Visualization, search, access control  More info:

xGDBvm demo Polistes dominula example