1 Annotating genomes using MAKER-P and iPlant

2 What Are Annotations?
Annotations are descriptions of features of the genome:
– Structural: exons, introns, UTRs, splice forms, etc.
– Coding and non-coding genes
– Expression, repeats, transposons
Annotations should include an evidence trail:
– Assists in quality control of genome annotations
Examples of evidence supporting a structural annotation:
– Ab initio gene predictions
– ESTs
– Protein homology
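To make the idea of an evidence trail concrete, here is a minimal Python sketch in which each gene model carries the evidence that supports it, and a quality-control check flags models backed only by ab initio predictions. The class names, the `kind` labels, and the rule that ESTs or protein alignments count as independent support are illustrative assumptions, not MAKER-P's data model.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical evidence categories treated as independent support
# (ab initio predictions alone do not count).
EXTERNAL_KINDS = {"est", "protein"}

@dataclass
class Evidence:
    kind: str    # e.g. "ab_initio", "est", "protein" (illustrative labels)
    source: str  # identifier of the supporting prediction or alignment

@dataclass
class GeneModel:
    gene_id: str
    evidence: List[Evidence] = field(default_factory=list)

    def evidence_kinds(self) -> Set[str]:
        return {e.kind for e in self.evidence}

    def has_external_support(self) -> bool:
        # True if at least one EST or protein alignment backs this model.
        return bool(self.evidence_kinds() & EXTERNAL_KINDS)

def flag_unsupported(models: List[GeneModel]) -> List[str]:
    """Return IDs of gene models with no EST or protein evidence."""
    return [m.gene_id for m in models if not m.has_external_support()]

models = [
    GeneModel("g1", [Evidence("ab_initio", "pred_1"), Evidence("est", "EST_123")]),
    GeneModel("g2", [Evidence("ab_initio", "pred_7")]),
]
print(flag_unsupported(models))  # ['g2']
```

Keeping the evidence list on the model, rather than only a final score, is what lets a curator later see why a model was kept or flagged.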

3 Secondary Annotation
Protein domains
– InterProScan: combines many protein signature databases, many of them HMM-based
GO and other ontologies
Pathway mapping
– E.g. BioCyc Pathway Tools
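As an example of secondary annotation, InterProScan's tab-separated output can be reduced to a per-protein set of GO terms. This Python sketch assumes the documented TSV layout in which GO annotations, when requested with `--goterms`, appear in the 14th column as `|`-separated entries; lines without that column are skipped.

```python
import csv
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")

def go_terms_by_protein(lines):
    """Collect GO IDs per protein from InterProScan TSV lines.

    Assumes GO annotations sit in the 14th column, which is only present
    when InterProScan was run with GO lookup enabled.
    """
    terms = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 14:
            continue  # no GO column on this line
        # Entries are "|"-separated and may carry a source suffix,
        # so extract bare GO:nnnnnnn identifiers with a regex.
        terms[row[0]].update(GO_ID.findall(row[13]))
    return dict(terms)
```

Because `csv.reader` accepts any iterable of strings, an open file handle can be passed directly, e.g. `go_terms_by_protein(open("proteins.tsv"))` (the filename is hypothetical).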

4 Challenges in Plant Genome Annotation
– Genomes are BIG
– Highly repetitive
– Many pseudogenes
– Assembly contamination
– Incomplete evidence
– No method is 100% accurate

5 Options for Protein-coding Gene Annotation
[Figure] Yandell & Ence. Nature Reviews Genetics 13, 329-342 (May 2012), doi:10.1038/nrg3174

6 Typical Annotation Pipeline
– Contamination screening
– Repeat/TE masking
– Ab initio prediction
– Evidence alignment (cDNA, EST, RNA-seq, protein)
– Evidence-driven prediction
– Chooser/combiner
– Evaluation/filtering
– Manual curation
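The repeat/TE masking step can be illustrated with a small Python sketch that applies repeat intervals to a sequence either by soft-masking (lowercasing the bases) or hard-masking (replacing them with N), then reports the soft-masked fraction. The interval list is assumed to come from an upstream repeat-identification tool and is hard-coded here; coordinates are taken as 0-based and half-open.

```python
def mask_repeats(seq, intervals, soft=True):
    """Mask repeat intervals in seq.

    intervals: (start, end) pairs, 0-based and half-open, assumed to come
    from an upstream repeat-identification step.
    soft=True lowercases the bases; soft=False replaces them with "N".
    """
    chars = list(seq)
    for start, end in intervals:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)

def soft_masked_fraction(seq):
    """Fraction of bases written in lowercase (soft-masked)."""
    return sum(c.islower() for c in seq) / len(seq) if seq else 0.0

genome = "ACGTACGTAC"
repeats = [(2, 5)]
print(mask_repeats(genome, repeats))                        # ACgtaCGTAC
print(mask_repeats(genome, repeats, soft=False))            # ACNNNCGTAC
print(soft_masked_fraction(mask_repeats(genome, repeats)))  # 0.3
```

Soft-masking keeps the original bases available to later steps that choose to ignore case, whereas hard-masking removes them entirely.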

7 MAKER-P Automated Pipeline
[Diagram components: repeat library, evidence, ab initio prediction]
– MPI-enabled to allow parallel operation on large compute clusters
– Collaboration with Yandell Lab
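MPI lets MAKER-P spread work across processes within a run, but very large assemblies can also be split into separate jobs, as in the 40-job loblolly pine run reported on the next slide. How that split was made is not stated, so the following is a hypothetical Python sketch of one common approach, the greedy longest-processing-time heuristic: take contigs longest first and give each to the job currently holding the fewest base pairs. Contig names and lengths are illustrative.

```python
import heapq

def assign_contigs(contig_lengths, n_jobs):
    """Split contigs into n_jobs groups with roughly equal total length.

    Greedy longest-processing-time heuristic: take contigs longest first and
    give each one to the job that currently holds the fewest base pairs.
    """
    heap = [(0, j) for j in range(n_jobs)]  # (assigned bp, job index)
    heapq.heapify(heap)
    jobs = [[] for _ in range(n_jobs)]
    for name, length in sorted(contig_lengths.items(), key=lambda kv: -kv[1]):
        load, j = heapq.heappop(heap)        # least-loaded job
        jobs[j].append(name)
        heapq.heappush(heap, (load + length, j))
    return jobs

# Hypothetical contig lengths in bp
contigs = {"c1": 50, "c2": 40, "c3": 30, "c4": 20, "c5": 10}
print(assign_contigs(contigs, 2))  # [['c1', 'c4', 'c5'], ['c2', 'c3']]
```

Balancing by total length is only a proxy for run time, since repeat content and gene density also affect how long each job takes.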

8 What Is a GFF File?
GFF: Generic Feature Format
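GFF3 is a tab-separated format with nine columns per feature line (seqid, source, type, start, end, score, strand, phase, attributes), where the attributes column holds `key=value` pairs separated by semicolons. A minimal Python parser for one line, written from the GFF3 conventions rather than from any MAKER-P code, might look like this; the `maker` source value in the example line is illustrative.

```python
from urllib.parse import unquote

GFF3_COLUMNS = ["seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes"]

def parse_gff3_line(line):
    """Parse one GFF3 feature line; return None for blank, comment, or directive lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    rec = dict(zip(GFF3_COLUMNS, fields))
    rec["start"] = int(rec["start"])  # 1-based, inclusive coordinates
    rec["end"] = int(rec["end"])
    attrs = {}
    if rec["attributes"] != ".":      # "." marks an empty column
        for pair in rec["attributes"].split(";"):
            if not pair:
                continue              # tolerate a trailing ";"
            key, _, value = pair.partition("=")
            # Attribute values may be comma-separated lists and percent-encoded.
            attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    rec["attributes"] = attrs
    return rec

line = "chr1\tmaker\tmRNA\t1000\t2000\t.\t+\t.\tID=mRNA1;Parent=gene1"
print(parse_gff3_line(line)["attributes"])  # {'ID': ['mRNA1'], 'Parent': ['gene1']}
```

This sketch handles feature lines only; it does not handle the `##FASTA` section that may follow the features in a GFF3 file.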

9 PAG 2014: MAKER-P at iPlant
W559 - Annotation of the Loblolly Pine Megagenome (Jill Wegrzyn)
– 20.15 Gb assembly, split into 40 jobs
– 216 CPU/job (8640 CPU total)
– 17 hours
P157 - Disease Resistance Gene Analysis on Chromosome 11 Across Ten Oryza Species
– 10 rice species (each with 12 chromosome pseudomolecules)
– 96 CPU per chromosome (1152 CPU total), ~2 hr per genome
TACC Lonestar Supercomputer: 22,656 CPU cores on 1,888 nodes

Genome | Assembly | Size (Mb) | CPU | Run Time (h:mm)
Arabidopsis thaliana | TAIR10 | 120 | 600 | 2:44
Arabidopsis thaliana | TAIR10 | 120 | 1500 | 1:27
Zea mays | RefGen_v2 | 2067 | 2172 | 2:53

Campbell et al. Plant Physiology. December 4, 2013, DOI:10.1104/pp.113.230144
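The CPU totals and the two Arabidopsis rows can be checked with a few lines of arithmetic. This Python sketch assumes the Run Time column is hours:minutes and that the 40 loblolly pine jobs ran concurrently for the full 17 hours, so the CPU-hour figure is an upper bound on the allocation, not a measured value.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' run time to minutes (assumed format of the table)."""
    hours, mins = hmm.split(":")
    return int(hours) * 60 + int(mins)

# Arabidopsis thaliana TAIR10 rows from the table
t600 = to_minutes("2:44")    # 164 min on 600 CPUs
t1500 = to_minutes("1:27")   # 87 min on 1500 CPUs
speedup = t600 / t1500
efficiency = speedup / (1500 / 600)   # speedup relative to the 2.5x CPU increase

pine_cpus = 40 * 216                  # 8640 CPUs
pine_cpu_hours = pine_cpus * 17       # upper bound: all jobs busy for 17 h
rice_cpus_per_genome = 12 * 96        # 1152 CPUs

print(f"speedup {speedup:.2f}, efficiency {efficiency:.0%}")  # speedup 1.89, efficiency 75%
print(pine_cpus, pine_cpu_hours, rice_cpus_per_genome)        # 8640 146880 1152
```

Under these assumptions, going from 600 to 1500 CPUs (2.5x) gave about a 1.9x speedup on Arabidopsis, roughly 75% parallel efficiency.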

10 Atmosphere: MAKER_2.28 (emi-F13821D0)
– Virtual image, MPI-enabled for parallel computing
– Check out with up to 16 CPUs
– Tested with a 4-CPU instance: completed rice chromosome 1 in 8 hr 45 min

11 MAKER-P Tutorial
https://pods.iplantcollaborative.org/wiki/display/sciplant/MAKER-P+Atmosphere+Tutorial

15 Documentation and Help

16 Additional MAKER-P Resources
– MAKER-P: http://www.yandell-lab.org/software/maker-p.html
– Repeat library construction (advanced): http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction--Advanced
– Pseudogene identification: http://shiulab.plantbiology.msu.edu/wiki/index.php/Protocol:Pseudogene

