

Annotating genomes using MAKER-P and iPlant

What Are Annotations?
- Annotations are descriptions of features of the genome
  - Structural: exons, introns, UTRs, splice forms, etc.
  - Coding and non-coding genes
  - Expression, repeats, transposons
- Annotations should include an evidence trail
  - Assists in quality control of genome annotations
- Examples of evidence supporting a structural annotation:
  - Ab initio gene predictions
  - ESTs
  - Protein homology

Secondary Annotation
- Protein domains
  - InterProScan: combines many HMM databases
- GO and other ontologies
- Pathway mapping
  - E.g. BioCyc Pathway Tools

Challenges in Plant Genome Annotation
- Genomes are BIG
- Highly repetitive
- Many pseudogenes
- Assembly contamination
- Incomplete evidence
- No method is 100% accurate

Options for Protein-coding Gene Annotation
Yandell & Ence. Nature Reviews Genetics 13, (May 2012) | doi: /nrg3174

Typical Annotation Pipeline
1. Contamination screening
2. Repeat/TE masking
3. Ab initio prediction
4. Evidence alignment (cDNA, EST, RNA-seq, protein)
5. Evidence-driven prediction
6. Chooser/combiner
7. Evaluation/filtering
8. Manual curation

MAKER-P Automated Pipeline
- MPI-enabled to allow parallel operation on large compute clusters
- Collaboration with Yandell Lab
- Pipeline components shown: repeat library, evidence, ab initio prediction

What is a GFF File?
- Generic Feature Format
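To make the format concrete, here is a minimal sketch of reading GFF3 records in Python. It assumes the standard nine tab-separated columns of GFF3 (seqid, source, type, start, end, score, strand, phase, attributes). The example records, the `GffFeature` type and the `parse_gff3_line` helper are hypothetical illustrations, not output from MAKER-P or code from the presentation.

```python
from typing import Dict, NamedTuple, Optional


class GffFeature(NamedTuple):
    """One GFF3 record (hypothetical helper type for this sketch)."""
    seqid: str
    source: str
    type: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str   # "." when undefined
    strand: str  # "+", "-", "." or "?"
    phase: str   # "0", "1", "2" for CDS features, otherwise "."
    attributes: Dict[str, str]


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse one GFF3 line; return None for blank lines and comments/directives."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    # Column 9 holds semicolon-separated tag=value pairs.
    attributes = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return GffFeature(
        fields[0], fields[1], fields[2],
        int(fields[3]), int(fields[4]),
        fields[5], fields[6], fields[7], attributes,
    )


# Made-up example records: a gene, its mRNA, and one exon.
EXAMPLE = (
    "##gff-version 3\n"
    "chr1\tmaker\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=example_gene\n"
    "chr1\tmaker\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "chr1\tmaker\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mrna1\n"
)

features = [
    feature
    for feature in (parse_gff3_line(raw) for raw in EXAMPLE.splitlines())
    if feature is not None
]

for feature in features:
    print(feature.type, feature.start, feature.end, feature.attributes.get("ID"))
```

The `Parent` attribute is what links exons to transcripts and transcripts to genes, which is how structural annotations are represented as a hierarchy in a flat file.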

PAG 2014: MAKER-P at iPlant
- W559: Annotation of the Loblolly Pine Megagenome (Jill Wegrzyn)
  - Gb assembly, split into 40 jobs, 216 CPU/job (8640 CPU total), 17 hours
- P157: Disease Resistance Gene Analysis on Chromosome 11 Across Ten Oryza Species
  - 10 rice species (each w/ 12 chromosome pseudomolecules)
  - 96 CPU per chromosome (1152 CPU total), ~2 hr per genome

TACC Lonestar Supercomputer: 22,656 CPU cores on 1,888 nodes

Genome                 Assembly    Size (Mb)   CPU   Run Time
Arabidopsis thaliana   TAIR                          :44
Arabidopsis thaliana   TAIR                          :27
Zea mays               RefGen_v                      :53

Campbell et al. Plant Physiology. December 4, 2013, DOI: /pp
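The CPU totals quoted above can be checked with a few lines of arithmetic. This is only a sketch: the variable names are made up, and the CPU-hour figure assumes all 40 loblolly pine jobs ran concurrently for the full 17 hours, which the slide does not state.

```python
# Figures taken from the slide.
pine_jobs = 40
pine_cpu_per_job = 216
pine_hours = 17

rice_chromosomes = 12
rice_cpu_per_chromosome = 96

lonestar_cores = 22_656
lonestar_nodes = 1_888

# Totals quoted on the slide.
pine_total_cpu = pine_jobs * pine_cpu_per_job
rice_total_cpu = rice_chromosomes * rice_cpu_per_chromosome

# Derived values (not on the slide).
cores_per_node = lonestar_cores // lonestar_nodes
pine_share_of_lonestar = pine_total_cpu / lonestar_cores
# ASSUMPTION: every job held its CPUs for the whole 17 hours.
pine_cpu_hours_upper_bound = pine_total_cpu * pine_hours

print(f"Loblolly pine: {pine_total_cpu} CPU in total")
print(f"Rice, per genome: {rice_total_cpu} CPU in total")
print(f"Lonestar: {cores_per_node} cores per node")
print(f"Pine run used about {pine_share_of_lonestar:.0%} of Lonestar's cores")
print(f"Pine run, upper bound: {pine_cpu_hours_upper_bound} CPU-hours")
```

Both products match the totals on the slide (8640 and 1152 CPU).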

Atmosphere: MAKER_2.28 (emi-F13821D0)
- Virtual image, MPI-enabled for parallel computing
- Check out with up to 16 CPU
- Tested with a 4 CPU instance
  - Completed rice chr 1 in 8 hr 45 min

MAKER-P Tutorial AKER-P+Atmosphere+Tutorial

Documentation and Help

Additional MAKER-P Resources
- MAKER-P: lab.org/software/maker-p.html
- Repeat Library construction: wiki/index.php/Repeat_Library_Construction-- Advanced
- Pseudogene identification: x.php/Protocol:Pseudogene