Genome Annotation using MAKER-P at iPlant Collaboration with Mark Yandell Lab (University of Utah) www.yandell-lab.org iPlant: Josh Stein (CSHL) Matt Vaughn.


1 Genome Annotation using MAKER-P at iPlant. Collaboration with Mark Yandell Lab (University of Utah), www.yandell-lab.org. iPlant: Josh Stein (CSHL), Matt Vaughn (TACC), Dian Jiao (TACC), Zhenyuan Lu (CSHL), Nirav Merchant (U. Arizona), Carson Holt (Ontario Institute for Cancer Research). Cantarel et al. 2008. Genome Research 18:188; Holt & Yandell. 2011. BMC Bioinformatics 12:491.

2 What Are Annotations? Annotations are descriptions of features of the genome. Structural: exons, introns, UTRs, splice forms, etc.; coding and non-coding genes. Functional: enzymatic activity, expression. Annotations should include an evidence trail, which assists in quality control of genome annotations. Examples of evidence supporting a structural annotation: ab initio gene predictions, ESTs, and protein homology.

3 Secondary Annotation: protein domains (InterProScan, which combines many HMM databases); GO and other ontologies; pathway mapping (e.g., BioCyc, Pathway Tools).

4 Challenges in Plant Genome Annotation: genomes are BIG, highly repetitive, and contain many pseudogenes. Yet it is important to get it right!

5 Contamination Issue

6 Annotation Error Example: split gene models

7 Typical Annotation Pipeline: (1) contamination screening, (2) repeat/TE masking, (3) ab initio prediction, (4) evidence alignment (cDNA, EST, RNA-seq, protein), (5) evidence-based prediction, (6) combiner, (7) evaluation/filtering, (8) manual curation.
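The ordering of these stages can be sketched as a chain in which each stage consumes the previous stage's output. This is a minimal illustration only: the stage functions are hypothetical placeholders that record that they ran, standing in for the real tools (screening, RepeatMasker, SNAP, BLAST and so on).

```python
from typing import Callable, Dict, List

# Stage names mirror the slide; the stage bodies are hypothetical placeholders.
STAGES: List[str] = [
    "contamination screening",
    "repeat/TE masking",
    "ab initio prediction",
    "evidence alignment",
    "evidence-based prediction",
    "combiner",
    "evaluation/filtering",
    "manual curation",
]


def make_stage(name: str) -> Callable[[Dict], Dict]:
    """Return a placeholder stage that appends its name to the run log."""
    def stage(state: Dict) -> Dict:
        new_state = dict(state)
        new_state["log"] = state.get("log", []) + [name]
        return new_state
    return stage


def run_pipeline(assembly: str) -> Dict:
    """Run every stage in order; each stage receives the previous stage's output."""
    state: Dict = {"assembly": assembly}
    for name in STAGES:
        state = make_stage(name)(state)
    return state


result = run_pipeline("genome.fasta")
print(result["log"][0], "->", result["log"][-1])
# contamination screening -> manual curation
```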

8 Options for Protein-coding Gene Annotation

9 MAKER is an easy-to-use annotation pipeline designed to help smaller research groups convert the mountain of genomic data provided by next-generation sequencing technologies into a usable resource.

10 MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, automatically synthesizes these data into gene annotations, and produces evidence-based quality values for downstream annotation management.

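11 Quality Control: evaluation of the MAKER-P and TAIR10 datasets using Annotation Edit Distance (AED). [Figure: AED comparison of the two datasets; the axis runs from better quality (lower AED) to worse quality (higher AED)]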
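A minimal sketch of AED at the nucleotide level, assuming the definition from Eilbeck et al. (2009), the paper cited on slide 27: sensitivity (SN) and specificity (SP) are computed from the bases shared by a gene model and its evidence, and AED = 1 - (SN + SP) / 2, so 0 means perfect agreement and 1 means none. MAKER's own computation may differ in detail; the function names here are illustrative.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GFF3


def bases(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of genomic positions they cover."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Annotation Edit Distance at nucleotide level: 1 - (SN + SP) / 2."""
    m, e = bases(model), bases(evidence)
    if not m or not e:
        return 1.0  # nothing to compare: treat as completely unsupported
    overlap = len(m & e)
    sensitivity = overlap / len(e)  # fraction of evidence bases in the model
    specificity = overlap / len(m)  # fraction of model bases in the evidence
    return 1.0 - (sensitivity + specificity) / 2.0


print(aed([(1, 100)], [(1, 100)]))   # 0.0 (identical)
print(aed([(1, 100)], [(51, 150)]))  # 0.5 (half of each overlaps)
```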

12 MAKER-P MPI Support: the Message Passing Interface (MPI) is a standard for communication between processes on computer clusters, which essentially allows multiple computers to act like a single powerful machine.
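To make the parallelism concrete, here is a simplified sketch of splitting an assembly's contigs among MPI ranks so each rank gets a similar number of bases. It is a hypothetical static, longest-first scheme for illustration, not MAKER's actual scheduler, and it needs no MPI library; in a real MPI program each process would handle `buckets[rank]`, with `rank` obtained from the MPI runtime (for example `MPI.COMM_WORLD.Get_rank()` in mpi4py).

```python
import heapq
from typing import Dict, List, Tuple


def assign_contigs(contigs: Dict[str, int], n_ranks: int) -> List[List[str]]:
    """Greedy longest-first assignment of contigs (name -> length in bp) to ranks.

    Each contig goes to the rank with the smallest total assigned length so far,
    which keeps per-rank work roughly balanced.
    """
    heap: List[Tuple[int, int]] = [(0, rank) for rank in range(n_ranks)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(n_ranks)]
    for name, length in sorted(contigs.items(), key=lambda kv: -kv[1]):
        load, rank = heapq.heappop(heap)  # least-loaded rank
        buckets[rank].append(name)
        heapq.heappush(heap, (load + length, rank))
    return buckets


print(assign_contigs({"chr1": 100, "chr2": 80, "chr3": 60, "chr4": 40}, 2))
# [['chr1', 'chr4'], ['chr2', 'chr3']]
```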

13 Annotating the Genome – Apollo View [Figure: Apollo view with tracks for the current assembly and current evidence]

14 Identify and Mask Repetitive Elements [Figure: tracks for the current assembly and current evidence]

15 Identify and Mask Repetitive Elements: RepeatMasker (RepBase; species-specific library) and RepeatRunner (MAKER internal protein library). [Figure: tracks for the current assembly and current evidence]
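Masking itself is simple once repeat coordinates are known. The sketch below, assuming 1-based inclusive repeat intervals, shows the two common conventions: hard masking replaces repeat bases with `N`, while soft masking lowercases them so they remain visible but are flagged as repetitive (RepeatMasker can emit the latter with its `-xsmall` option). The `mask` function is a hypothetical helper, not part of MAKER.

```python
from typing import Iterable, Tuple


def mask(seq: str, repeats: Iterable[Tuple[int, int]], soft: bool = False) -> str:
    """Mask repeat intervals (1-based, inclusive) in a genomic sequence.

    Hard masking replaces bases with 'N'; soft masking lowercases them.
    """
    chars = list(seq)
    for start, end in repeats:
        for i in range(start - 1, end):  # convert to 0-based indices
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


genome = "ACGTACGT" + "TTTTTTTT" + "ACGT"
print(mask(genome, [(9, 16)]))             # ACGTACGTNNNNNNNNACGT
print(mask(genome, [(9, 16)], soft=True))  # ACGTACGTttttttttACGT
```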

16 Identify and Mask Repetitive Elements [Figure: tracks for the current assembly and current evidence]

17 Generate Ab Initio Gene Predictions [Figure: tracks for the current assembly, current evidence, and ab initio predictions]

18 Generate Ab Initio Gene Predictions. MAKER currently supports SNAP, Augustus, GeneMark, and FGENESH; these predictors can be run internally by MAKER or externally. [Figure: tracks for the current assembly, current evidence, and ab initio predictions]
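When a predictor is run externally, its gene models have to be brought back in as annotation files. The sketch below assumes the predictions arrive as GFF3 (nine tab-separated columns, with `Parent=` linking exons to transcripts) and groups exon coordinates by transcript; `models_from_gff3` is a hypothetical helper, and it ignores multi-parent features for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def models_from_gff3(lines: List[str], feature: str = "exon") -> Dict[str, List[Tuple[int, int]]]:
    """Group intervals of one feature type by their Parent transcript ID."""
    models: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        parent = attrs.get("Parent", "unassigned")
        models[parent].append((int(cols[3]), int(cols[4])))
    return {name: sorted(spans) for name, spans in models.items()}


gff_lines = [
    "##gff-version 3",
    "chr1\tsnap\texon\t300\t400\t.\t+\t.\tParent=snap-1",
    "chr1\tsnap\texon\t100\t200\t.\t+\t.\tParent=snap-1",
    "chr1\taugustus\texon\t120\t390\t.\t+\t.\tParent=aug-1",
]
predictions = models_from_gff3(gff_lines)
print(predictions)
```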

19 Generate Ab Initio Gene Predictions [Figure: tracks for the current assembly, current evidence, and ab initio predictions]

20 Align EST and Protein Evidence [Figure: tracks for the current assembly, current evidence, ab initio predictions, and EST TBLASTX, EST BLASTN, and protein BLASTX alignments]

21 Align EST and Protein Evidence: EST alignments (BLASTN, TBLASTX) identify regions being actively transcribed; protein alignments (BLASTX) identify regions with homology to a known protein. [Figure: tracks for the current assembly, current evidence, ab initio predictions, and EST TBLASTX, EST BLASTN, and protein BLASTX alignments]
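One way to turn many overlapping EST hits into "regions being actively transcribed" is to merge their genomic intervals. This is a minimal sketch, assuming hits are given as 1-based inclusive genome coordinates on one strand; the merge step is a generic interval-union technique used here for illustration, not a description of MAKER internals.

```python
from typing import List, Tuple


def merge_intervals(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping or adjacent alignment intervals (1-based, inclusive)."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:  # overlaps or touches previous
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


est_hits = [(500, 650), (100, 250), (240, 300), (301, 320)]
print(merge_intervals(est_hits))  # [(100, 320), (500, 650)]
```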

22 Align EST and Protein Evidence [Figure: tracks for the current assembly, current evidence, ab initio predictions, and EST TBLASTX, EST BLASTN, and protein BLASTX alignments]

23 Polish BLAST Alignments with Exonerate [Figure: tracks for the current assembly, current evidence, ab initio predictions, polished protein, and polished EST alignments]

24 Polish BLAST Alignments with Exonerate: all base pairs must align in order; no HSP overlap is permitted; HSPs are aligned correctly with respect to splice sites. [Figure: tracks for the current assembly, current evidence, ab initio predictions, polished protein, and polished EST alignments]
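The three constraints above can be checked after the fact on a set of HSPs. Exonerate enforces them during spliced alignment with its own splice-site model; the sketch below only verifies them, under simplifying assumptions: plus strand only, 1-based inclusive coordinates, and canonical GT...AG introns only (non-canonical GC-AG and AT-AC introns would be rejected). Both functions are hypothetical helpers.

```python
from typing import List, Tuple

# One HSP: (query_start, query_end, genome_start, genome_end), 1-based inclusive.
HSP = Tuple[int, int, int, int]


def is_collinear(hsps: List[HSP]) -> bool:
    """True if HSPs are in the same order on query and genome and never overlap."""
    ordered = sorted(hsps, key=lambda h: h[0])
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[0] <= prev[1] or cur[2] <= prev[3]:
            return False
    return True


def canonical_introns(genome: str, hsps: List[HSP]) -> bool:
    """True if every gap between consecutive HSPs starts with GT and ends with AG."""
    ordered = sorted(hsps, key=lambda h: h[2])
    for prev, cur in zip(ordered, ordered[1:]):
        intron = genome[prev[3]:cur[2] - 1]  # genome bases prev_end+1 .. cur_start-1
        if not (intron.startswith("GT") and intron.endswith("AG")):
            return False
    return True


genome = "AAAAA" + "GTCCCCAG" + "TTTTT"  # exon 1-5, intron 6-13, exon 14-18
hsps = [(1, 5, 1, 5), (6, 10, 14, 18)]
print(is_collinear(hsps), canonical_introns(genome, hsps))  # True True
```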

25 Polish BLAST Alignments with Exonerate [Figure: tracks for the current assembly, current evidence, ab initio predictions, polished protein, and polished EST alignments]

26 Pass Gene Finders Evidence-based ‘Hints’ [Figure: tracks for the current assembly, current evidence, ab initio predictions, hint-based SNAP, and hint-based FGENESH]
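A simple example of an evidence-based hint is an intron implied by a spliced alignment: the gap between two consecutive aligned exon blocks. The sketch below derives such intron coordinates from polished alignment blocks (1-based, inclusive). The output is a hypothetical generic representation; each gene finder (SNAP, FGENESH, etc.) accepts hints in its own format, which is not shown here.

```python
from typing import List, Tuple


def intron_hints(exon_blocks: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Derive intron coordinates (1-based, inclusive) from aligned exon blocks."""
    blocks = sorted(exon_blocks)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
        if next_start - prev_end > 1  # touching blocks leave no gap, so no intron
    ]


polished_est = [(1000, 1200), (1500, 1650), (1900, 2100)]
print(intron_hints(polished_est))  # [(1201, 1499), (1651, 1899)]
```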

27 Identify Gene Model Most Consistent with Evidence* [Figure: tracks for the current assembly, current evidence, ab initio predictions, hint-based SNAP, and hint-based FGENESH] *Karen Eilbeck, Barry Moore, Carson Holt and Mark Yandell. Quantitative Measures for the Management and Comparison of Annotated Genomes. BMC Bioinformatics 2009, 10:67. doi:10.1186/1471-2105-10-67
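Given the AED measure from the cited paper, a minimal way to pick the model "most consistent with evidence" is to choose the candidate with the lowest AED against the evidence. This is an illustrative sketch under that assumption only; MAKER's actual selection logic may weigh additional factors, and the candidate names below are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def bases(intervals: Iterable[Interval]) -> Set[int]:
    """Set of genomic positions covered by the intervals."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def aed(model: Iterable[Interval], evidence: Iterable[Interval]) -> float:
    """Nucleotide-level AED: 1 - (sensitivity + specificity) / 2."""
    m, e = bases(model), bases(evidence)
    if not m or not e:
        return 1.0
    overlap = len(m & e)
    return 1.0 - (overlap / len(e) + overlap / len(m)) / 2.0


def best_model(candidates: Dict[str, List[Interval]], evidence: List[Interval]) -> str:
    """Pick the candidate whose exons have the lowest AED against the evidence."""
    return min(candidates, key=lambda name: aed(candidates[name], evidence))


candidates = {
    "snap-1": [(100, 200), (300, 400)],  # matches the evidence exon structure
    "fgenesh-1": [(100, 400)],           # reads through the intron
}
evidence = [(100, 200), (300, 400)]
print(best_model(candidates, evidence))  # snap-1
```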

28 Create New Annotation; revise it further if necessary. [Figure: tracks for the current assembly, current evidence, and ab initio predictions]

29 Compute Support for Each Portion of Gene Model
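One simple per-portion support measure is the fraction of each exon's bases covered by at least one evidence alignment. The sketch below computes that, assuming 1-based inclusive intervals; it is a hypothetical illustration, and other measures (such as splice-site support) are not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def exon_support(exons: List[Interval], evidence: List[Interval]) -> List[float]:
    """Fraction of each exon's bases covered by at least one evidence interval."""
    evidence_bases = {p for s, e in evidence for p in range(s, e + 1)}
    support = []
    for start, end in exons:
        exon = set(range(start, end + 1))
        support.append(len(exon & evidence_bases) / len(exon))
    return support


print(exon_support([(1, 100), (201, 300)], [(1, 100), (251, 400)]))  # [1.0, 0.5]
```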

30 MAKER-P v2.28 at iPlant. TACC Lonestar supercomputer with 22,656 CPUs: MPI enabled for parallel computation; can complete the entire rice genome in ~2 hrs (1,152 cores, 96 CPUs per chromosome); can complete the Aegilops tauschii ALLPATHS-LG assembly in ~8 hrs (1,152 cores); currently being integrated into the iPlant Discovery Environment. Atmosphere: MPI enabled for parallel computation; maximum instance size 16 CPUs.
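The rice figures are internally consistent: the rice nuclear genome has 12 chromosomes, and 12 × 96 CPUs gives the 1,152 cores quoted. The approximate wall times also give rough compute budgets in core-hours; these are back-of-the-envelope numbers derived from the "~" values on the slide, not measured totals.

```python
RICE_CHROMOSOMES = 12      # Oryza sativa nuclear chromosomes
CPUS_PER_CHROMOSOME = 96   # from the slide
CORES = 1_152              # from the slide

total_cores = RICE_CHROMOSOMES * CPUS_PER_CHROMOSOME
rice_core_hours = 2 * CORES       # ~2 hrs wall time
aegilops_core_hours = 8 * CORES   # ~8 hrs wall time

print(total_cores, rice_core_hours, aegilops_core_hours)  # 1152 2304 9216
```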

31 Assembly & Annotation at iPlant

