VectorBase
Frank Collins, Scott Emrich, Dan Lawson, Greg Madey
BRC PI/PM Meeting, Bethesda, MD, April 27, 2012



Genome Sizes
Pediculus humanus: ~110 Mb, N50 = 488 kb
Anopheles gambiae S: ~260 Mb, N50 = 1,505 kb
Culex quinquefasciatus: ~580 Mb, N50 = 487 kb
Aedes aegypti: ~1.3 Gb, N50 = 1,500 kb
Ixodes scapularis: ~1.8 Gb, N50 = 72 kb
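The N50 values quoted on this slide follow the usual assembly definition: the length of the scaffold at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly. A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative assumptions, not VectorBase code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when the input is empty.
    raise ValueError("need at least one scaffold length")


# Hypothetical toy assembly (lengths in kb); not taken from the slides.
toy_scaffolds = [6, 5, 4, 3, 2]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, a large genome can still have a small N50 when the assembly is fragmented, as the Ixodes scapularis figures on the slide show.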

Future genomes
4 White papers
Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni
Stomoxys calcitrans, Musca domestica
Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
Anopheles: Anopheles darlingi*, Anopheles stephensi
Others
Aedes: Aedes albopictus
i5K initiative

First New Release in New Contract

Challenges of vector genomes
Relatively large genomes from species that are hard to inbreed
Heterozygosity in sequencing samples causes dubious scaffolds (up to 80 different males were sequenced for the new gambiae genomes)
Inversions and heterochromatic regions induce gaps
Newer-generation sequencing has reduced cost but has not yet maintained overall quality
Non-trivial annotations

An. gambiae forms
M-form
  More permanent
  Available year-round
  Allows slower development
  Predator-rich
S-form
  Ephemeral
  Rainy-season dependent
  Requires rapid development
  Largely predator-free

Divergence across chromosome arms (figure; panels 2L, 2R, X, 3R, 3L)
C. Cheng et al., unpublished

Optical mapping DBP : Wisconsin

Size matters
Columns: Genome; Mb optically mapped; genes found
S Sanger 145, S Illumina 58, PEST 60, Sanger + Ill 204,

Annotation strategies
Speeding up computational annotation
  Use of the MAKER system
  Prediction by projection from a 'high quality' reference
  Expanded use of RNA-Seq: Scripture, Trinity & Cufflinks/Bowtie
Community engagement
  Primarily deployed for new genomes (Glossina, Rhodnius)
  Works for all other VectorBase genomes

de novo annotation: MAKER with RNA-Seq & reference proteomes
Aim: gene prediction pipeline for the masses
  Used for a number of arthropod genome projects
  Touted as the default pipeline for many more (part of the GMOD toolkit)
Overview
  Ab initio gene predictions from SNAP, Augustus & FGENESH
  Final gene models from MAKER
  EST alignments from both EXONERATE and BLASTN
  Protein alignments from EXONERATE and BLASTX
  Repeats from RepeatFinder & RepeatMasker
  Additional data sets integrated via GFF3 files (RNA-Seq)
  Uses MPI for parallelization over a compute farm
  Optimization for long scaffolds
Summary
  Iterative runs give acceptable reference gene sets
  Used for Glossina and An. stephensi
  Used by others for Strigamia, Manduca, published ant genomes

Community annotation
Simple tool to capture community annotation
  Makes gene predictions and evidence available as GFF3
  Compatible with the Artemis and Apollo tools
Submissions in GFF3 format
  Gene structure corrections
  Gene metadata (symbol, description, citations)
Glossina annotation effort (Nov 2011 to Apr 2012)
  790 GFF submissions
  2670 items of metadata (gene symbols, descriptions)
  Structure confirmation

Community annotation workflow (figure)
Tools: ARTEMIS, APOLLO
GFF3 evidence, e.g. scf ptn2genome ptn_match ID=xxxx;Name=tr|Q3UIQ2|
FASTA sequence, e.g. >MY SUPERCONTIG ATATATGCGTTGAGCTGCGTTACGTTCGG…
Steps: Identify gene, Modify model, Submit CAP
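The evidence lines in the workflow figure are abbreviated GFF3 records, so a short sketch of reading one full GFF3 line may help anyone preparing a submission. GFF3 has nine tab-separated columns, with the ninth holding `key=value` pairs separated by semicolons. The function name `parse_gff3_line`, the sequence name `scf_hypothetical`, and the coordinates 100 to 250 are assumptions made for illustration; only the source, type, `ID` and `Name` values are modelled on the slide.

```python
from urllib.parse import unquote

# The nine columns defined by the GFF3 format, in order.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; attributes become a dict too."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for pair in record["attributes"].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters in column 9.
        attributes[unquote(key)] = unquote(value)
    record["attributes"] = attributes
    return record


# Hypothetical record: seqid and coordinates are invented for this example.
example = (
    "scf_hypothetical\tptn2genome\tptn_match\t100\t250\t.\t+\t.\t"
    "ID=xxxx;Name=tr|Q3UIQ2|"
)
rec = parse_gff3_line(example)
print(rec["type"], rec["start"], rec["end"], rec["attributes"]["Name"])
```

Keeping the attributes as a dictionary makes it straightforward to check that each submitted feature carries an `ID` before it is sent on.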

Population biology
Chado Natural Diversity schema
  183 projects, samples
  Incorporates IRbase samples
Ensembl variation schema
  1,511,335 SNP calls
Visualization through browser
Data downloads through browser
Queries via BioMart interface