Rod Eyles1, John Juma1, Morag Ferguson1, Trushar Shah1 1 IITA, Nairobi

DeBaser: An online tool for NGS data assembly and fast polymorphism detection.

OVERVIEW

The advent of Next Generation Sequencing (NGS) marked a leap in the capacity to study the genomic basis of variation within and between species. NGS has allowed large-scale comparison of genetic variation, both in expression level and in sequence composition. Regions of difference between genomes, or polymorphisms, can explain variation in development, morphology and responses to external biotic or abiotic influences. Knowledge of this variation is important for understanding possible causes of phenotypic diversity. It is also crucial for the successful design of RNAi-based laboratory tools such as artificial microRNAs or virus-induced gene silencing (VIGS) constructs. Such techniques require knowledge of a variety's exact sequence to ensure efficient knockdown and to predict off-target effects.

To facilitate the rapid discovery of polymorphisms between plant varieties, we have used the increasing amount of available NGS data to construct an online database, "DeBaser". The database stores assembled transcriptomic and genomic NGS data for a range of varieties within selected species. This also enables DeBaser to function as a polymorphism finder, through the option to provide output via an integrated multi-alignment tool. Polymorphisms between assembled transcriptomes are determined by selecting multiple varieties in the web interface. Users retrieve sequence information for each variety by entering selected gene identifiers or FASTA files. Multi-sequence alignment files showing polymorphisms between varieties are generated via MultAlin or Muscle.

The backend of DeBaser incorporates an NGS assembly pipeline. This pipeline processes NGS data and is also available to users as an assembly tool.
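To illustrate what a polymorphism comparison across an alignment involves, the following minimal Python sketch scans already-aligned sequences and reports the columns where the varieties differ. It is not DeBaser's code: the function name `polymorphic_columns`, the variety names and the toy sequences are hypothetical, and the input is assumed to be gapped, equal-length sequences of the kind a multiple-alignment tool produces.

```python
from typing import Dict, List, Tuple


def polymorphic_columns(aligned: Dict[str, str]) -> List[Tuple[int, Dict[str, str]]]:
    """Return (1-based column, {variety: base}) for every column where
    the aligned sequences do not all carry the same character.

    Gaps ('-') are treated as a distinct state, so indels are reported too.
    """
    if not aligned:
        return []
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")

    sites = []
    for col in range(lengths.pop()):
        # Compare case-insensitively so 'a' and 'A' count as the same base.
        bases = {name: seq[col].upper() for name, seq in aligned.items()}
        if len(set(bases.values())) > 1:
            sites.append((col + 1, bases))
    return sites


# Hypothetical toy alignment of one gene from three varieties.
alignment = {
    "variety_A": "ATGCCGTA-GC",
    "variety_B": "ATGCCATA-GC",
    "variety_C": "ATGCCGTATGC",
}

for column, bases in polymorphic_columns(alignment):
    print(column, bases)
```

In this toy input, column 6 is a substitution and column 9 is an insertion present only in the third variety.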
To utilise the assembly pipeline, users upload NGS data and, if required, a reference genome or transcriptome. After processing, the assembly is stored permanently on the website. It can be retrieved in full, or the user can request individual transcripts by entering identifiers for genes of interest. Pre-assembled plant transcriptomes are also provided and can be used alongside user-provided data in polymorphism detection.

The DeBaser pipeline combines existing bioinformatic software to produce a mapped assembly and consensus sequences for each gene/transcript in the full data set. First, Bowtie [1] aligns raw NGS reads to the reference set. Samtools [1] then converts the .sam files to .bam files, sorts them and indexes them. These are piped to ANGSD [2], which measures the read depth and base variants at each position and produces consensus files for every gene/CDS identifier. These files can then be added to the existing species database for retrieval or for polymorphism detection.

We believe DeBaser offers a number of advantages over existing polymorphism detection tools. Many of these run on Linux systems and require the user to possess the relevant programming skills. Others, such as InSNP, NovoSNP or VarScan, can be installed on Windows or Mac operating systems, but they require stand-alone installation and users do not benefit from the computing power of high-performance servers. HaploSNPer is a platform-independent SNP detection tool available online; however, it requires that the sequences being compared are already assembled. We therefore anticipate that DeBaser will be used by those seeking to rapidly detect polymorphisms within specific genes of interest. The tool is accessible entirely online and provides a complete pipeline, from raw NGS data through to multiple sequence alignment.
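The Bowtie, Samtools and ANGSD steps described above can be sketched as a list of command lines. This is a hedged illustration only: the poster does not give the exact commands or options used by DeBaser, so the flags below are assumptions, as are the function name `build_pipeline_commands`, the file names and the working directory. The sketch also writes intermediate files rather than piping between tools, and it only builds the commands without running them.

```python
import shlex
from pathlib import Path
from typing import List


def build_pipeline_commands(reads_fastq: str, reference_fasta: str,
                            sample: str, workdir: str = "work") -> List[List[str]]:
    """Build (but do not run) one command per pipeline step.

    All options are assumptions for illustration, not DeBaser's settings.
    """
    out = Path(workdir)
    index_prefix = str(out / "reference_index")
    sam = str(out / f"{sample}.sam")
    bam = str(out / f"{sample}.sorted.bam")
    consensus_prefix = str(out / f"{sample}.consensus")

    return [
        # 1. Index the reference, then align raw reads with Bowtie (-S: SAM output).
        ["bowtie-build", reference_fasta, index_prefix],
        ["bowtie", "-S", index_prefix, reads_fastq, sam],
        # 2. Convert SAM to a coordinate-sorted BAM and index it with Samtools.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 3. ANGSD: count bases per position, report depth, and write a
        #    consensus FASTA using the most common base (-doFasta 2).
        ["angsd", "-i", bam, "-doCounts", "1", "-doDepth", "1",
         "-doFasta", "2", "-out", consensus_prefix],
    ]


# Hypothetical inputs; print the commands for inspection.
commands = build_pipeline_commands("reads.fastq", "reference.fa", "sample1")
for cmd in commands:
    print(shlex.join(cmd))
```

To actually execute the steps, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the three tools are installed.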
Initially DeBaser will store assemblies for IITA and other CGIAR center mandated crops, as well as several plant model organism species. DeBaser is in the final stages of development and will be released in the second half of 2017.

Figure 1. Overview of workflow within the DeBaser pipeline. The pipeline functions both as a polymorphism finder using preassembled datasets (A) and as an NGS assembly tool (B). (A) Input: selected varieties to align, with gene identifiers or FASTA files; sequence collection from the archived assembly; multi-alignment tool (Muscle, MultAlin); output: graphic alignment files and FASTA text files. (B) Input: NGS raw data and a reference transcriptome or genome; Bowtie alignment; Samtools SAM to BAM conversion; ANGSD generation of consensus sequences; full assembly, archived.

Figure 2. Example of a Polymorph output file. The transcription factor WRKY36 sequences from four cassava varieties are assembled from raw NGS data and aligned via MultAlin. The result reveals significant polymorphism between varieties.

Figure 3. Image of the Polymorph home page. A simple interface allows users to select multiple varieties within one of a range of species, as well as one or more preferred output styles.

Acknowledgements

DeBaser is a tool initially developed to facilitate candidate gene selection within the IITA Cassava VIGS project. We thank the German Organisation for Technical Cooperation (GTZ) for funding provided for this project.

References

1. Langmead, B., et al., Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009. 10(3): p. R25.
2. Korneliussen, T.S., A. Albrechtsen, and R. Nielsen, ANGSD: analysis of next generation sequencing data. BMC Bioinformatics, 2014. 15(1): p. 356.