Considerations for Analyzing Targeted NGS Data: Exome
Tim Hague, CTO



Exome Analysis
- 3 sets of full exome sequences for the same individual, targeted by 3 different kits
- One set had data problems because reads were from 2 different sequencers
- Remaining 2 sets were analyzed both by the customer and by Omixon

Exome Targets
- Illumina TruSeq: ~62 Mbp
- Nimblegen SeqCap EZ Exome: ~64 Mbp
- ~35 Mbp overlap between targets
- Exons, ORFs and putative translated regions captured
- 40M and 37M read pairs respectively, 101 bp read length
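The overlap between two target designs can be estimated from their interval lists. The following is a minimal sketch, assuming each kit's targets are available as BED-style half-open intervals; the function names and the toy coordinates are invented for illustration and are not taken from either kit.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates as in BED files
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on one chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by a target design."""
    return sum(end - start for _, start, end in merge(intervals))


def overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both target designs."""
    a, b = merge(a), merge(b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            # Advance whichever list is behind in chromosome sort order.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        shared += max(0, min(ea, eb) - max(sa, sb))
        # Advance the interval that ends first.
        if ea <= eb:
            i += 1
        else:
            j += 1
    return shared


# Toy intervals, invented for illustration only (not real kit targets).
kit_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
kit_b = [("chr1", 250, 400), ("chr2", 25, 100)]
print(total_bp(kit_a), total_bp(kit_b), overlap_bp(kit_a, kit_b))
```

Merging each design first avoids counting a base twice when a kit lists overlapping probes.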

Full Analysis Pipelines
- In this case we are comparing two full NGS analysis pipelines
- Including the mapping/alignment and a multi-step variant call pipeline
- The Omixon pipeline for this analysis uses two variant callers
- The Omixon pipeline also uses recalibration and indel realignment
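The slides do not say how the results of two variant callers or two pipelines are combined or compared. One common way to compare call sets is to key each variant by position and alleles and look at the shared and unique calls. The sketch below assumes that approach; the key layout and the toy calls are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def compare_calls(calls_a: Iterable[Variant],
                  calls_b: Iterable[Variant]) -> Dict[str, Set[Variant]]:
    """Split two call sets into shared calls and calls unique to each side."""
    a, b = set(calls_a), set(calls_b)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Toy calls, invented for illustration only.
caller_1 = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")]
caller_2 = [("chr1", 1000, "A", "G"), ("chr2", 500, "G", "GA")]

result = compare_calls(caller_1, caller_2)
for label, variants in result.items():
    print(label, sorted(variants))
```

Exact matching on position and alleles is strict: the same indel can be written at different positions by different callers, so real comparisons usually normalize variant representation first.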

Finding long indels 1.

Better indel resolution 1.

Better indel resolution 2.

Indel Handling
- If indels are important to an analysis then this needs to be taken into account, from the planning stage onwards
- BWA does better when indel realignment is used, in combination with paired data

Fewer low-quality false positives

Quality and Coverage
- Some of these low-quality variants can be removed by filtering after variant calling
- Quality and coverage cut-offs have to be parameterized properly in the alignment and variant calling steps
- Quality recalibration can also help to reduce low-quality false positives
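Post-call filtering on quality and coverage can be sketched as follows. This assumes tab-separated VCF records with the call quality in the QUAL column and the read depth in a `DP` entry of the INFO column. The cut-off values (30 and 10) are placeholders chosen for illustration, not values recommended in the slides.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=25;AF=0.5' into a dictionary."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def passes_filter(vcf_line: str, min_qual: float = 30.0, min_depth: int = 10) -> bool:
    """Keep a record only if both QUAL and DP reach the given cut-offs."""
    cols = vcf_line.rstrip("\n").split("\t")
    qual = cols[5]                       # QUAL is the 6th VCF column
    depth = parse_info(cols[7]).get("DP")  # INFO is the 8th VCF column
    if qual == "." or depth is None:
        return False                     # missing values are treated as failing
    return float(qual) >= min_qual and int(depth) >= min_depth


# Toy records, invented for illustration only.
records = [
    "chr1\t1000\t.\tA\tG\t50.0\tPASS\tDP=25",
    "chr1\t2000\t.\tC\tT\t12.5\tPASS\tDP=40",
    "chr2\t3000\t.\tG\tA\t99\tPASS\tDP=4;AF=0.5",
    "chr2\t4000\t.\tT\tC\t.\t.\tDP=30",
]
kept = [line for line in records if passes_filter(line)]
print(len(kept), "of", len(records), "records kept")
```

Suitable cut-offs depend on the capture kit, the sequencing depth and the caller, so they should be tuned per data set.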

Variations next to coding areas

Splicing and Promoters
- Most of the exon kits also provide variant calls close to the coding regions
- These should be included in the analysis if possible
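One way to keep calls near the coding regions is to treat a window around each target as part of the analysis. The sketch below classifies a position as on target, flanking or off target. The 50 bp flank is an arbitrary illustrative value and the function name is hypothetical.

```python
from typing import List, Tuple


def classify_position(pos: int, targets: List[Tuple[int, int]], flank: int = 50) -> str:
    """Classify a 0-based position against half-open target intervals
    on a single chromosome."""
    # A position inside any target is on target, even if it also
    # falls in the flank of a neighbouring target.
    for start, end in targets:
        if start <= pos < end:
            return "on_target"
    for start, end in targets:
        if start - flank <= pos < end + flank:
            return "flanking"
    return "off_target"


# Toy target, invented for illustration only.
exon_targets = [(1000, 1200)]
for position in (1100, 980, 1249, 1250, 900):
    print(position, classify_position(position, exon_targets))
```

Calls labelled `flanking` are the ones that would be lost if the analysis were restricted strictly to the target intervals.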

Fewer false positives in complex regions 1.

Fewer false positives in complex regions 2.

Fewer false positives in complex regions 3.

Fewer false positives in complex regions 4. Higher coverage.

Fewer false positives in complex regions 5. Lower coverage.

Complex regions
- Mismappings due to pseudogenes or repeats, or just complex regions?
- Sometimes more coverage can actually be bad
- Need to watch out for non-specific read mappings (reads mapping to multiple places)
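Non-specific mappings can often be recognized in the alignment output itself. In SAM records the mapping quality (MAPQ) is the fifth column, and aligners such as BWA typically give a low MAPQ (often 0) to reads that fit several places equally well. The sketch below drops unmapped, secondary and low-MAPQ records. The threshold of 20 is an illustrative assumption, not a value from the slides.

```python
UNMAPPED = 0x4     # SAM flag bit: read is unmapped
SECONDARY = 0x100  # SAM flag bit: secondary alignment


def is_specific(sam_line: str, min_mapq: int = 20) -> bool:
    """Return True for mapped, primary alignments with MAPQ >= min_mapq."""
    cols = sam_line.rstrip("\n").split("\t")
    flag = int(cols[1])  # FLAG is the 2nd SAM column
    mapq = int(cols[4])  # MAPQ is the 5th SAM column
    if flag & UNMAPPED or flag & SECONDARY:
        return False
    return mapq >= min_mapq


# Toy alignment records, invented for illustration only.
alignments = [
    "r1\t99\tchr1\t100\t60\t101M\t=\t300\t301\t*\t*",   # confident mapping
    "r2\t99\tchr1\t500\t0\t101M\t=\t700\t301\t*\t*",    # MAPQ 0, ambiguous
    "r3\t355\tchr1\t900\t60\t101M\t=\t1100\t301\t*\t*", # secondary alignment
    "r4\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",                # unmapped read
]
specific = [line.split("\t")[0] for line in alignments if is_specific(line)]
print(specific)
```

Filtering this way lowers coverage in repeats and pseudogene-like regions, which fits the point above that more coverage is not always better.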

Regions where both aligners are confused 1.

Regions where both aligners are confused 2.

Very Complex Regions
- Some regions are extremely difficult to map with any technique
- A different approach to mapping/alignment may be required
- A different approach to variant calling may be required (local de novo, phasing, etc.)

Problems with sex chromosomes
- There are many heterozygous calls in the X and Y chromosomes that are certainly false positives or incorrect calls
- This is true for both pipelines; read specificity and the variant calling procedure have to be improved for these chromosomes
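Suspicious heterozygous calls on the sex chromosomes can be flagged with a simple rule. The sketch below assumes diploid-style genotype strings such as `0/1`. It treats any heterozygous call on Y as suspect, and heterozygous calls on X as suspect only when the sample is known to be male. The slides do not state the sex of the individual, so `sample_is_male` is a hypothetical input, and pseudoautosomal regions are deliberately not handled here.

```python
def is_heterozygous(genotype: str) -> bool:
    """True for genotypes such as '0/1' or '1|2' with two different known alleles."""
    alleles = genotype.replace("|", "/").split("/")
    return len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]


def is_suspicious(chrom: str, genotype: str, sample_is_male: bool) -> bool:
    """Flag heterozygous calls that are unexpected on a sex chromosome."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if not is_heterozygous(genotype):
        return False
    if name == "Y":
        return True  # Y is present in a single copy
    return name == "X" and sample_is_male


# Toy calls, invented for illustration only.
calls = [("chrY", "0/1"), ("chrX", "0/1"), ("chrX", "1/1"), ("chr7", "0/1")]
flagged_male = [c for c in calls if is_suspicious(c[0], c[1], sample_is_male=True)]
flagged_female = [c for c in calls if is_suspicious(c[0], c[1], sample_is_male=False)]
print(flagged_male)
print(flagged_female)
```

A real check would exclude the pseudoautosomal regions, where heterozygous calls on X and Y can be genuine.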

Summary
- These kinds of comparative studies can be useful in analyzing the effectiveness of exome sequencing
- Different exome kits can give different results
- The data analysis and variant call tools chosen for the analysis can also have a big impact
- There is some potential to improve the quality of the customer's exome analysis pipeline