Whole Exome Sequencing for Variant Discovery and Prioritisation

First, a recap. What have we learned? NGS platforms (short and long reads); what the data look like; how to QC data; general procedures for processing data; how to find biological signal in data (RNA-Seq lectures and practical, in progress). There is a lot more, but it is not necessarily more complex or very different!

Exomes: Publication Trends. Total: 925 (Oct 2012); 2013: ~800 papers; 2014: ~1200 papers. Forero DA, 2012

NGS Variation Discovery Workflow (resequencing based)

Variant Discovery Application: Disease. Printed out, the genome would fill almost 2000 books of 1.5 million letters each (average books of 200 pages)! This information is contained in nearly every cell of the body.

Monogenic Diseases: a single mutation. How do we find it in all those ‘books’? This is a bioinformatics challenge: NGS sequencers can only read small portions, so the library consists of fragments of pages of the books!

Mendelian Disease Gene Discovery Gilissen, Genome Biol 2011

Opportunities and Challenges. Enabling technologies: NGS machines, open-source algorithms, capture reagents, falling costs, large sample collections. Exomes are more cost-effective: sequence patient DNA and filter out common SNPs; compare parent-child trios; compare paired normal/cancer samples. Challenges: many Mendelian disorders still cannot be interpreted; rare variants need large sample sizes; the exome might miss regions (e.g. novel non-coding genes). Shendure, Genome Biol 2011

Why exome sequencing? WGS is still too costly, and the added value of intergenic mutations is low. WES is targeted sequencing of coding regions (~1% of the human genome). Mendelian disorders mostly disrupt protein-coding sequences. A large fraction of rare non-synonymous variants in the human genome are predicted to be deleterious. Splice sites are also enriched for highly functional variation. The exome therefore represents a highly enriched subset of the genome in which to search for variants with large effect sizes.

A representation of the relationship between the size of the mutational target and the frequency of disease for disorders caused by de novo mutations. Gilissen, Genome Biol 2011

Majewski, J Med Genet 2011

Maximizing chances of finding disease-causing rare variants using exome sequencing Bamshad, Nat Rev Genet 2011

Example: Comparative Sequencing. Somatic mutation detection between normal/cancer pairs. Higher mutation yield and better causal gene identification than for Mendelian disorders. Meyerson et al, Nat Rev Genet 2010

But exome analysis of a single patient can be informative: Perrault syndrome (HSD17B4). Pierce, Am J Hum Genet 2010

Exome sequencing procedure

Read Mapping. Mapping hundreds of millions of reads to the reference genome is CPU- and RAM-intensive, and ‘slow’. Read quality decreases with read length (are small single-nucleotide mismatches or indels real, or artifacts?). Very few mappers deal appropriately with indels. Mapping output: SAM (BAM) or BED.
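The question of whether a mismatch is real or an artifact usually starts with the per-base quality values stored in the FASTQ file. The following is a minimal sketch, assuming the common Phred+33 encoding; the quality string and the Q20 cut-off are made up for illustration and do not come from the lecture.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q into the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Hypothetical 10-base read whose quality drops towards the end of the read.
quality = "IIIIIHGF5#"
scores = phred_scores(quality)

# Positions below Q20 (more than a 1% chance of a wrong call) are flagged.
# The Q20 threshold is an illustrative assumption.
low_quality_positions = [i for i, q in enumerate(scores) if q < 20]
```

A mismatch at one of the flagged positions deserves more suspicion than one in the high-quality part of the read.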

Mapped Data: SAM specification. A generic sequence alignment format that describes the alignment of reads to a reference. Flexible: stores all the alignment information. Simple enough to be easily generated, or converted from other existing alignment formats. Keeps track of chromosome position, alignment quality and alignment features (extended CIGAR). Includes mate-pair / paired-end information. Original FASTQ data can be reproduced from SAM (and BAM).

SAM FIELDS
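A SAM alignment line has 11 mandatory tab-separated fields followed by optional tags. The sketch below splits one line into those fields, decodes the extended CIGAR string and tests a few flag bits. The example read is invented, and the helper names are illustrative rather than part of any library.

```python
import re
from collections import namedtuple

# The 11 mandatory SAM columns, plus any optional TAG:TYPE:VALUE fields.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# A few of the bitwise FLAG values defined by the SAM specification.
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_DUPLICATE = 0x400


def parse_sam_line(line):
    """Split one alignment line (not a header line) into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=fields[11:],
    )


def parse_cigar(cigar):
    """Turn '5M1I4M' into [(5, 'M'), (1, 'I'), (4, 'M')]."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def reference_span(cigar):
    """Number of reference bases covered: only M, D, N, = and X consume reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


# Invented example: a properly paired read with a 1-base insertion.
example_line = "read1\t99\tchr1\t1000\t60\t5M1I4M\t=\t1200\t310\tACGTACGTAC\tIIIIIIIIII\tNM:i:1"
record = parse_sam_line(example_line)
is_proper_pair = bool(record.flag & FLAG_PROPER_PAIR)
is_reverse = bool(record.flag & FLAG_REVERSE)
```

The insertion consumes a read base but no reference base, so this 10-base read covers only 9 reference positions.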

BAM format. Binary version of SAM: more compact. Makes downstream analysis independent of the mapping program. Allows most operations on the alignment to work on a stream without loading the whole alignment into memory. Allows the file to be indexed by genomic position, so that all reads aligning to a locus can be retrieved efficiently.
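Why sorting and indexing by genomic position makes locus queries cheap can be shown with a toy in-memory index. This is only a sketch of the idea using a binary search over sorted start positions. It is not the actual BAM index format, and the reads are invented.

```python
import bisect


def build_index(reads):
    """Index (chrom, start, end) reads, half-open coordinates, by sorted start position."""
    index = {}
    for chrom, start, end in sorted(reads):
        entry = index.setdefault(chrom, {"starts": [], "reads": [], "max_len": 0})
        entry["starts"].append(start)
        entry["reads"].append((start, end))
        entry["max_len"] = max(entry["max_len"], end - start)
    return index


def fetch(index, chrom, query_start, query_end):
    """Return reads overlapping [query_start, query_end) without scanning every read."""
    entry = index.get(chrom)
    if entry is None:
        return []
    # A read that overlaps the query cannot start earlier than
    # query_start minus the longest read length seen on this chromosome.
    lo = bisect.bisect_left(entry["starts"], query_start - entry["max_len"])
    hi = bisect.bisect_left(entry["starts"], query_end)
    return [(s, e) for s, e in entry["reads"][lo:hi] if e > query_start]


# Invented reads on two chromosomes.
reads = [("chr1", 500, 550), ("chr1", 100, 150), ("chr2", 100, 150), ("chr1", 120, 170)]
index = build_index(reads)
hits = fetch(index, "chr1", 140, 160)
```

Only the slice of reads that could overlap the query is inspected, which is the property an index by genomic position provides.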

VCF format. An emerging standard for storing variant data. Originally designed for SNPs and short indels, it also works for structural variations. Consists of a header section and a data section. The data section is tab-delimited, with each line containing at least the 8 mandatory fields.

VCF FIELDS
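The 8 mandatory VCF columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. As a minimal sketch, one data line can be parsed as follows. The variant shown is invented, and real files are better read with a dedicated VCF library.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info):
    """Parse 'DP=35;AF=0.5;DB' into a dict; keys without '=' are treated as flags."""
    if info == ".":
        return {}
    result = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


def parse_vcf_line(line):
    """Parse the 8 mandatory columns of one VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 fields")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    record["ALT"] = record["ALT"].split(",")  # several ALT alleles are comma-separated
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["INFO"] = parse_info(record["INFO"])
    return record


# Invented example variant with two alternative alleles.
example_line = "chr1\t12345\t.\tA\tG,T\t50.0\tPASS\tDP=35;AF=0.5,0.1;DB"
variant = parse_vcf_line(example_line)
```

Any columns after the eighth (FORMAT and per-sample genotypes) are ignored by this sketch.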

Variant filtering

Variant Prioritization Heuristic filtering to identify novel genes for Mendelian disorders Stitziel et al, Genome Biol 2011
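Heuristic filtering can be sketched as a chain of simple rules: keep variants that are rare, keep those predicted to alter the protein, then keep genes that fit the assumed inheritance model. The code below is an illustration under those assumptions, not the procedure from Stitziel et al. The field names, gene names, consequence labels and the 1% frequency cut-off are all hypothetical.

```python
from collections import defaultdict

# Hypothetical consequence labels treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def filter_variants(variants, max_af=0.01):
    """Keep rare variants with a potentially damaging consequence."""
    return [
        v for v in variants
        if v["consequence"] in DAMAGING and v["population_af"] <= max_af
    ]


def recessive_candidate_genes(variants):
    """Genes with one homozygous variant or at least two heterozygous variants."""
    genotypes_by_gene = defaultdict(list)
    for v in variants:
        genotypes_by_gene[v["gene"]].append(v["genotype"])
    return sorted(
        gene for gene, genotypes in genotypes_by_gene.items()
        if "hom" in genotypes or genotypes.count("het") >= 2
    )


# Invented variants from a single hypothetical patient.
variants = [
    {"gene": "GENE_A", "consequence": "missense", "population_af": 0.0, "genotype": "het"},
    {"gene": "GENE_A", "consequence": "frameshift", "population_af": 0.001, "genotype": "het"},
    {"gene": "GENE_B", "consequence": "synonymous", "population_af": 0.0, "genotype": "hom"},
    {"gene": "GENE_C", "consequence": "missense", "population_af": 0.2, "genotype": "hom"},
    {"gene": "GENE_D", "consequence": "nonsense", "population_af": 0.0, "genotype": "hom"},
    {"gene": "GENE_E", "consequence": "missense", "population_af": 0.0, "genotype": "het"},
]
kept = filter_variants(variants)
candidates = recessive_candidate_genes(kept)
```

Each filter shrinks the candidate list, and the order of the filters does not change the final result here because they are independent.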

More than just SNVs and ‘short’ indels

Structural Variation. BreakDancer (Chen et al, Nat Meth 2009) only looks at anomalous read pairs.
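What makes a read pair anomalous can be illustrated with a small classifier based on chromosome, strand and insert size. This is a simplified sketch of the general read-pair idea, not BreakDancer's actual algorithm. The three-standard-deviation cut-off, the library statistics and the labels are assumptions.

```python
def classify_pair(chrom1, chrom2, strand1, strand2, insert_size,
                  mean_insert, sd_insert, n_sd=3):
    """Label a read pair by the kind of structural variant it might suggest."""
    if chrom1 != chrom2:
        return "interchromosomal"  # mates map to different chromosomes
    if strand1 == strand2:
        return "inversion"         # mates should map to opposite strands
    if insert_size > mean_insert + n_sd * sd_insert:
        return "deletion"          # mates further apart than the library allows
    if insert_size < mean_insert - n_sd * sd_insert:
        return "insertion"         # mates closer together than the library allows
    return "normal"


# Hypothetical library with a 300 bp mean insert size and 30 bp standard deviation.
labels = [
    classify_pair("chr1", "chr1", "+", "-", 310, 300, 30),
    classify_pair("chr1", "chr1", "+", "-", 1200, 300, 30),
    classify_pair("chr1", "chr1", "+", "-", 150, 300, 30),
    classify_pair("chr1", "chr1", "+", "+", 300, 300, 30),
    classify_pair("chr1", "chr7", "+", "-", 0, 300, 30),
]
```

A single anomalous pair is weak evidence. A caller would normally require several pairs supporting the same event.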

Copy Number Variation Detection: change in read coverage.
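A change in read coverage is commonly summarised as a log2 ratio of normalised depth between a sample and a control. The sketch below assumes per-exon depths are already available. The depths, exon names and the ±0.4 thresholds are invented for illustration.

```python
import math


def depth_fractions(depths):
    """Normalise per-target depths so that samples of different size are comparable."""
    total = sum(depths.values())
    return {target: depth / total for target, depth in depths.items()}


def log2_ratios(sample_depths, control_depths):
    """log2(sample / control) of normalised depth for each target."""
    sample = depth_fractions(sample_depths)
    control = depth_fractions(control_depths)
    ratios = {}
    for target, control_fraction in control.items():
        if control_fraction == 0:
            continue  # no coverage in the control: ratio is undefined
        sample_fraction = sample.get(target, 0.0)
        if sample_fraction > 0:
            ratios[target] = math.log2(sample_fraction / control_fraction)
        else:
            ratios[target] = float("-inf")
    return ratios


def call_cnv(ratios, loss_below=-0.4, gain_above=0.4):
    """Label targets whose log2 ratio passes the (assumed) thresholds."""
    calls = {}
    for target, ratio in ratios.items():
        if ratio < loss_below:
            calls[target] = "loss"
        elif ratio > gain_above:
            calls[target] = "gain"
    return calls


# Invented depths: exon3 has half the expected coverage, exon5 one and a half times.
control = {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100, "exon5": 100}
sample = {"exon1": 100, "exon2": 100, "exon3": 50, "exon4": 100, "exon5": 150}
ratios = log2_ratios(sample, control)
calls = call_cnv(ratios)
```

A heterozygous deletion halves the coverage (log2 ratio near -1), while a single-copy gain raises it to about 1.5 times (log2 ratio near 0.58).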

Example WES-based variant discovery workflow: (1) map the reads to a reference genome: index the reference genome, then map (BWA, Bowtie, Novoalign, etc.); (2) sort the BAM file; (3) remove PCR duplicates; (4) realign around indels (‘optional’); (5) call variants; (6) recalibrate quality scores (‘optional’); (7) filter variants; (8) basic variant annotation. Biological interpretation only starts here.
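Part of this workflow can be sketched as a list of shell commands assembled in Python. BWA is named in the slides, but the use of samtools and bcftools for sorting, indexing and calling is an assumption made here for illustration, and the file names are hypothetical. Duplicate removal, indel realignment, recalibration, filtering and annotation are left out of this sketch.

```python
import shlex


def build_commands(reference, reads_1, reads_2, sample):
    """Return shell command strings for indexing, mapping, sorting and calling.

    The commands are only built, not executed; tool choice beyond BWA is an assumption.
    """
    ref = shlex.quote(reference)
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf.gz")
    return [
        # Index the reference genome for the mapper.
        f"bwa index {ref}",
        # Map paired-end reads and sort the alignments by coordinate.
        f"bwa mem {ref} {shlex.quote(reads_1)} {shlex.quote(reads_2)}"
        f" | samtools sort -o {bam} -",
        # Index the sorted BAM so that loci can be retrieved efficiently.
        f"samtools index {bam}",
        # Call variants and write a compressed VCF.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


# Hypothetical input files.
commands = build_commands("ref.fa", "s_1.fq.gz", "s_2.fq.gz", "sample1")
```

Keeping the commands as data makes it easy to print them for review before running anything.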