VectorBase BRC Overview Scott Emrich BRC 2011 – Annual Meeting UT Southwestern Medical Center Dallas, TX September 2011
VectorBase Scott Emrich (on behalf of VectorBase consortium) University of Notre Dame
VectorBase BRC Meeting September 2011
Upcoming vector genomes
NHGRI White papers
Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni, Stomoxys calcitrans, Musca domestica
Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium sanctipauli, Simulium woodi, Simulium exiguum, Simulium yahense
Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
Anopheles: Anopheles darlingi*, Anopheles stephensi
Others
Aedes: Aedes albopictus, Culex cluster?, Aedes cluster?...
Summary of current contents

Species                 Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti           ✓       ✓         ✓                ✓                ✕
Anopheles gambiae       ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus  ✓       ✓         ✕                ✓                ✕
Glossina morsitans      ✓       ✓         ✓                ✕                ✕
Ixodes scapularis       ✓       ✓         ✕                ✕                ✕
Pediculus humanus       ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus       ✓       ✓         ✓                ✕                ✕
Upcoming challenges
We expect to receive over 30 vector genomes in the next 1-2 years.
Further, our community is generating “-omics” transcriptome data for emerging genomes that need to be integrated.
To address these issues, we introduced “prerelease” sites.
Pre-sites for upcoming genomes
Genome browser, BLAST search
Supporting species without genomic resources
RNA-Seq data (Leslie Vosshall, Rockefeller University)
Integrating experimental data: RNA-Seq
Integrating legacy (BRC#1) annotation data
EBI: Projection from reference (projection build)
Aim: Gene prediction using a ‘high’ quality reference set from a related species.
Overview: When annotating a species for which we have a closely related reference species, we can align the genomes and project from the ‘high’ quality set onto the new assembly. This is more effective than a similarity build because it allows genes to be built across contigs regardless of the assembly.
Whole-genome alignment (WGA) between reference and target using BLASTZ.
Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
Project predictions through transformation of coordinates between reference and target assemblies.
Summary: Effective for low-coverage and poor-quality assemblies. Limited to orthologous loci between reference and target, i.e. no novel gene prediction.
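The filter and projection steps above can be illustrated with a minimal sketch. It is not the EBI pipeline: the `Block` record, the greedy score-based filter, and the function names are all hypothetical, and it assumes ungapped, forward-strand alignment blocks with 0-based half-open coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped alignment block (0-based, half-open, forward strand)."""
    ref_start: int
    ref_end: int
    tgt_name: str
    tgt_start: int
    tgt_end: int
    score: int


def filter_single_coverage(blocks: List[Block]) -> List[Block]:
    """Keep blocks so that no target base is covered by more than one block.

    Assumed simplification: visit blocks from best to worst score and drop any
    block that overlaps an already kept block on the same target contig.
    """
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda x: x.score, reverse=True):
        clash = any(
            k.tgt_name == b.tgt_name
            and b.tgt_start < k.tgt_end
            and k.tgt_start < b.tgt_end
            for k in kept
        )
        if not clash:
            kept.append(b)
    return sorted(kept, key=lambda x: x.ref_start)


def project_exon(
    start: int, end: int, blocks: List[Block]
) -> Optional[Tuple[str, int, int]]:
    """Transform a reference interval into target coordinates.

    Returns None when no single block contains the whole interval.
    """
    for b in blocks:
        if b.ref_start <= start and end <= b.ref_end:
            offset = b.tgt_start - b.ref_start
            return (b.tgt_name, start + offset, end + offset)
    return None


def project_gene(exons, blocks):
    """Project each exon independently, so exons may land on different contigs."""
    return [project_exon(s, e, blocks) for s, e in exons]


# Toy data: the third block overlaps the first on contig1 and has a lower score.
example_blocks = [
    Block(100, 200, "contig1", 0, 100, 900),
    Block(300, 400, "contig2", 50, 150, 800),
    Block(500, 560, "contig1", 40, 100, 300),
]
example_kept = filter_single_coverage(example_blocks)
example_gene = project_gene([(120, 180), (310, 390), (510, 550)], example_kept)
print(example_gene)
```

In this toy case the first two exons project onto two different target contigs, which is the "building genes across contigs" property the slide highlights, while the third exon has no surviving alignment and is left unprojected.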
Examples of integrating data
Still under active development
Currently > 15k samples from 1600 field collections
UC-Davis data
IR-base data
Neafsey et al. SNP-chip data
GMOD natdiv consortium:
GMOD Natural Diversity module
Lightweight schema – All objects defined by ontologies
General – SO / GO / PATO
Spp. specific – IDOMAL / MIRO
Flexible – can handle all data from consortium
Vector spp. & butterflies
Rice & peaches
Ontologies hosted by VB
TGMA – Mosquito Anatomy Ontology; CARO/BFO
TADS – Tick Anatomy Ontology; CARO/BFO
MIRO – Ontology of Insecticide Resistance
IDOMAL – Malaria Ontology; extension: transmission
“VBCV” – Ontology/CV for “completion” of PopGen
OPL (Parasite Lifecycle) with Priti Parikh, Chris Stoeckert et al.
New IDO extensions: “IDODEN” (with S. Lozano & R. Scheuermann) and “IDOCHA”
Goal: Anopheles gambiae reference
Many issues with the PEST assembly as a reference
S molecular form is proposed as the next reference
Hybrid assembly strategy: Sanger*, Illumina†, 454
Metrics of success
Project existing gene predictions
De novo prediction in novel regions
Re-map important datasets
VectorBase Kolymbari Meeting July 2011
Anopheles gambiae reference sequence
Validation of the assembly by normal metrics
Emphasis on the concordance with large-scale restriction map (optical map)
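The slide does not say which "normal metrics" were used. As an illustration only, the sketch below computes N50, one widely used contiguity statistic for assemblies; the function name and the choice of N50 are assumptions, not something stated in the talk.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L together
    cover at least half of the total assembled bases. Returns 0 for no input.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy example with four contigs totalling 100 bases.
print(n50([10, 20, 30, 40]))
```

A contiguity statistic like this says nothing about correctness, which is presumably why the slide puts the emphasis on agreement with the optical map.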
Acknowledgements
Institutions: EMBL-EBI, Imperial College, Notre Dame, Harvard, IMBB, New Mexico, Ensembl
Sequencers: TIGR/JCVI, WashU, Broad Institute, Baylor
People: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey, Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond, Maggie Werner-Washburne, Phil Baker, Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell, Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas, Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo