VectorBase BRC Overview. Scott Emrich. BRC 2011 – Annual Meeting, UT Southwestern Medical Center, Dallas, TX, 26-27 September 2011.

Presentation transcript:

VectorBase Scott Emrich (on behalf of VectorBase consortium) University of Notre Dame

VectorBase BRC Meeting, September 2011
Upcoming vector genomes
NHGRI White papers
Sandflies: Lutzomyia longipalpis, Phlebotomus papatasi
Anopheles (AGCC): Anopheles arabiensis, Anopheles quadriannulatus, Anopheles merus, Anopheles melas, Anopheles christyi, Anopheles epiroticus, Anopheles stephensi, Anopheles maculatus, Anopheles funestus, Anopheles minimus, Anopheles culicifacies, Anopheles farauti, Anopheles dirus, Anopheles atroparvus, Anopheles albimanus
Glossina: Glossina palpalis, Glossina fuscipes, Glossina pallidipes, Glossina brevipalpis, Glossina austeni, Stomoxys calcitrans, Musca domestica
Simulium: Simulium vittatum, Simulium sirbanum, Simulium damnosum, Simulium ochraceum, Simulium squamosum, Simulium thyolense, Simulium santipauli, Simulium woodi, Simulium exiguum, Simulium yahense
Ticks & Mites: Leptotrombidium deliense, Ixodes scapularis*, Dermacentor variabilis, Ornithodoros turicata
Anopheles: Anopheles darlingi*, Anopheles stephensi
Others
Aedes: Aedes albopictus
Culex cluster?
Aedes cluster?...

Summary of current contents

Species                  Genome  Gene set  Transcriptomics  Gene expression  PopGen
Aedes aegypti            ✓       ✓         ✓                ✓                ✕
Anopheles gambiae        ✓       ✓         ✓                ✓                ✓
Culex quinquefasciatus   ✓       ✓         ✕                ✓                ✕
Glossina morsitans       ✓       ✓         ✓                ✕                ✕
Ixodes scapularis        ✓       ✓         ✕                ✕                ✕
Pediculus humanus        ✓       ✓         ✕                ✕                ✕
Rhodnius prolixus        ✓       ✓         ✓                ✕                ✕

Upcoming challenges
We expect to receive over 30 vector genomes in the next 1-2 years.
Further, our community is generating “-omics” transcriptome data for emerging genomes, and these data need to be integrated.
To address these issues, we introduced “prerelease” sites.

Pre-sites for upcoming genomes
Genome browser; BLAST search

Supporting species without genomic resources

RNAseq data (Leslie Vosshall, Rockefeller University)

Integrating experimental data: RNA-Seq

Integrating legacy (BRC#1) annotation data
EBI: Projection from reference
Projection build
Aim: Gene prediction using a ‘high’ quality reference set from a related species.
Overview: When annotating a species for which we have a closely related reference species, we can align the genomes and project from the ‘high’ quality set onto the new assembly. This is more effective than a similarity build, as it allows for building genes across contigs regardless of the assembly.
Steps:
– Whole-genome alignment (WGA) between reference and target using BLASTz.
– Custom filter to ensure that each bp in the target genome is aligned to no more than one position in the reference genome.
– Project predictions through transformation of coordinates between reference and target assemblies.
Summary: Effective for low coverage and poor quality assemblies. Limited to orthologous loci between reference and target, i.e. no novel gene prediction.
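The filter and projection steps of the projection build can be sketched roughly as follows. This is a minimal illustration, not the VectorBase or Ensembl pipeline: the `Block` type, the score-based greedy filter, and the function names are all assumptions made for the example, and it only handles ungapped, forward-strand alignment blocks with half-open coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical ungapped alignment block (half-open, forward strand only)."""
    ref_start: int
    ref_end: int
    tgt_name: str   # target contig name
    tgt_start: int
    score: int

    @property
    def tgt_end(self) -> int:
        return self.tgt_start + (self.ref_end - self.ref_start)


def filter_single_coverage(blocks: List[Block]) -> List[Block]:
    """Greedily keep the best-scoring blocks so no target base is aligned twice."""
    kept: List[Block] = []
    for b in sorted(blocks, key=lambda x: -x.score):
        clash = any(
            k.tgt_name == b.tgt_name
            and b.tgt_start < k.tgt_end
            and k.tgt_start < b.tgt_end
            for k in kept
        )
        if not clash:
            kept.append(b)
    return sorted(kept, key=lambda x: x.ref_start)


def project_exon(
    start: int, end: int, blocks: List[Block]
) -> Optional[Tuple[str, int, int]]:
    """Map a reference interval onto the target if one block fully contains it."""
    for b in blocks:
        if b.ref_start <= start and end <= b.ref_end:
            offset = b.tgt_start - b.ref_start
            return (b.tgt_name, start + offset, end + offset)
    return None  # no orthologous alignment, so nothing is predicted here


# Toy data: the third block re-uses target bases already covered by the first.
raw_blocks = [
    Block(100, 200, "contig1", 0, 90),
    Block(300, 400, "contig2", 50, 80),
    Block(500, 560, "contig1", 40, 10),
]
blocks = filter_single_coverage(raw_blocks)
exons = [(120, 180), (310, 350), (510, 540)]
projected = [project_exon(s, e, blocks) for s, e in exons]
```

In this toy case the first two exons of one reference gene land on two different target contigs, which is the "building genes across contigs" behaviour described above, and the third exon is dropped because its only alignment was filtered out.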

Examples of integrating data
Still under active development
Currently > 15k samples from 1600 field collections
– UC-Davis data
– IR-base data
– Neafsey et al. SNP-chip data

GMOD natdiv consortium:

GMOD Natural Diversity module
Lightweight schema
– All objects defined by ontologies
– General: SO / GO / PATO
– Spp. specific: IDOMAL / MIRO
Flexible
– Can handle all data from consortium
– Vector spp. & butterflies
– Rice & peaches

Ontologies hosted by VB
TGMA – Mosquito Anatomy Ontology; CARO/BFO
TADS – Tick Anatomy Ontology; CARO/BFO
MIRO – Ontology of Insecticide Resistance
IDOMAL – Malaria Ontology; extension: transmission
“VBCV” – Ontology/CV for “completion” of PopGen
OPL (Parasite Lifecycle) with Priti Parikh, Chris Stoeckert et al.
New IDO extensions: “IDODEN” (with S. Lozano & R. Scheuermann) and “IDOCHA”

Goal: Anopheles gambiae reference
Many issues with the PEST assembly as a reference
S molecular form is proposed as the next reference
Sanger* Illumina † 454
Hybrid assembly strategy
Metrics of success
– Project existing gene predictions
– de novo prediction in novel regions
– Re-map important datasets

VectorBase Kolymbari Meeting, July 2011
Anopheles gambiae reference sequence
Validation of the assembly by normal metrics
Emphasis on the concordance with large scale restriction map (optical map)
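The slide does not say which "normal metrics" were used. As one common example, and purely as an assumption, contig N50 can be computed as in the sketch below; the function name and the toy lengths are illustrative only.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


# Toy contig lengths (not real assembly data).
example_n50 = n50([100, 80, 50, 30, 20, 10, 10])
```

Summary statistics of this kind say nothing about whether contigs are ordered correctly, which is why concordance with an independent optical map is a useful complementary check.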

Acknowledgements
Institutions: EMBL-EBI, Imperial College, Notre Dame, Harvard, IMBB, New Mexico, Sequencers (TIGR/JCVI, WashU, Broad Institute, Baylor), Ensembl
People: Daniel Lawson, Derek Wilson, Gautier Koscielny, Karyn Megy, Martin Hammond, Daniel Hughes, Ewan Birney, Paul Kersey, Fotis Kafatos, Bob MacCallum, George Christophides, Seth Redmond, Maggie Werner-Washburne, Phil Baker, Bill Gelbart, Susan Russo, Dave Emmert, Pinlei Zhou, Lynn Crosby, Kathy Campbell, Kitsos Louis, Pantelis Topalis, Emmanuel Dialynas, Frank Collins, Nora Besansky, Greg Madey, Rob Bruggner, Nate Konopinski, EO Stinson, Scott Emrich, Andrew Sheehan, Rory Carmichael, Dave Cieslak, Dave Campbell, Ryan Butler, Katie Cybulski, Neil Lobo