Presentation is loading. Please wait.

Presentation is loading. Please wait.

Genome Annotation BCB 660 October 20, 2011. From Carson Holt.

Similar presentations


Presentation on theme: "Genome Annotation BCB 660 October 20, 2011. From Carson Holt."— Presentation transcript:

1 Genome Annotation BCB 660 October 20, 2011

2 From Carson Holt

3 Annotations Automated – Ab initio (based on genomic sequence alone) Involves comparisons to known proteins (BLAST similarity) Sequence motifs such as start/stop codons, intron/exon boundaries – Evidence-based (ESTs) Involves alignment of experimental EST (cDNA) data to a gene prediction Manual – Manual curation of genes predicted automatically – Check gene structure, presence of conserved domains, match of ESTs to gene prediction – Align to related genes/proteins and look for oddities (missing exons, early stop codons, etc). – Annotation can then be manually edited – May also involve assigning function (based on sequence similarity, conserved domains) via Gene Ontology Structural: exons, introns, UTRs, splice forms etc. Functional: process a gene is involved in (metabolism), molecular function (hydrolase), location of expression (expressed in the mitochondria), etc.

4 Classic strategy Combine ab initio and evidence-based gene predictors together to come up with a concensus predicted gene set Ask community to pitch in and manually annotate as many genes as possible Leads to great variability in quality of different genome annotations, often many versions of official gene sets

5 NGS and the future of genome annotation In 2010, 1300 eukaryotic genome projects were underway -- assuming 10,000 genes per genome, that’s 13,000,000 new annotations will be needed -- quality control and maintenance become an issue Some organizations dedicated to genome annotation (i.e ENSEMBL and VectorBase) but 1300 genomes will not be feasible Need for high quality, automated annotation pipelines, that are easy to use by small research groups without extensive bioinformatics expertise

6 MAKER Pipeline: Especially effective for Emerging Eukaryote Model Organisms Incorporates ab initio and evidence-based gene predictors Gene predictions are run a first time Then a small subset of the genome assembly is used to train gene predictors (building genome- specific HMMs) Then trained gene predictors are run again on whole genome ** Really nice if you don’t have a basis to start from (e.g. de novo gene prediction)

7 What does MAKER do? * Identifies and masks out repeat elements * Aligns ESTs to the genome * Aligns proteins to the genome * Produces ab initio gene predictions * Synthesizes these data into final annotations * Produces evidence-based quality values for downstream annotation management

8 MAKER Steps involved 1. Compute phase – RepeatMasker – BLAST – Exonerate – SNAP (and other gene predictors) 2. Filter/cluster phase – Identify/remove marginal predictions and alignments based on quality scores/cutoffs, etc – Cluster to identify overlapping alignments/predictions– to remove redundancy and assess weight of evidence 3. Polish – Realigns BLAST hits to obtain greater precision at exon boundaries (Exonerate) 4. Synthesis – Collect evidence for each annotation, using EST evidence – Evidences scores plus sequences (genomic, EST, coding, intron) passed to SNAP – SNAP then uses this evidence to retrain and alter its internal HMM 5. Annotate – Post-processing of SNAP prediction, recombine with evidence to generate complete annotations – Output is a gff3 annotation that can be imported into genome browsers

9 Inputs to MAKER Genomic sequence Config files – External executables – Sequence database locations – Compute parameters Sequence database files (choice of these turns out to be extremely important) – Transposons file (default plus known organism-specific) – Repeatmasker database file (organism-specific, optionsal) – Proteins file (known proteins from related organisms you want to align to the genome) – ESTs/mRNAs file (the evidence)

10 MAKER Output (Apollo browser)


Download ppt "Genome Annotation BCB 660 October 20, 2011. From Carson Holt."

Similar presentations


Ads by Google