Considerations for Analyzing Targeted NGS Data Introduction Tim Hague, CTO.


1 Considerations for Analyzing Targeted NGS Data Introduction Tim Hague, CTO

2

3 Introduction There are many mapping, alignment and variant calling algorithms. Most of these have been developed for whole genome sequencing and, to some extent, population genetic studies.

4 Premise In contrast, NGS-based diagnostics deals with particular genes or mutations of an individual. Different diagnostic targets present specific challenges.

5 Goal Present analysis issues related to differences in: • Sequencing technologies • Targeting technologies • Target specifics • Pseudogenes and segmental duplication

6 NGS Sequencers • Illumina • Ion Torrent • Roche 454 • (SOLiD)

7 Moore B, Hu H, Singleton M, De La Vega FM, Reese MG, Yandell M. Genet Med. 2011 Mar;13(3):210-7.

8 Sequencing Technology Differences: • Homopolymer error rates • G/C content errors • Read length • Sequencing protocols (single vs paired reads)
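Two of the differences listed above, homopolymer error rates and G/C content errors, depend on properties that can be measured directly from a sequence. The following is a minimal sketch, not part of the original slides: the function names and the minimum run length of 4 are illustrative assumptions.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Fraction of G/C bases among the unambiguous A/C/G/T bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def homopolymer_runs(seq: str, min_len: int = 4):
    """Return (start, base, length) for each run of one base of at least min_len.

    start is 0-based. min_len=4 is an arbitrary illustrative threshold.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len and base in "ACGT":
            runs.append((pos, base, length))
        pos += length
    return runs


if __name__ == "__main__":
    read = "ACGTTTTTGGCCCCCCAT"
    print(gc_content(read))
    print(homopolymer_runs(read))
```

Regions flagged this way are candidates for closer inspection when the data come from a platform with known homopolymer or G/C-related biases.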

9 Targeting Methods • PCR primers (e.g. amplicons) • Hybridization probes (e.g. exome kits)

10 Targeting Technology Differences: • Exact matching regions vs regions with SNPs. Results in: • Need for mapping against whole chromosomes to avoid false positives.

11 Analysis Targets Differences: • Rate of polymorphism • Repetitive structures • Mutation profiles • G/C content • Single genes vs multi gene complexes

12 BRCA1/2: 1/2000 | HLA: 1/29 | CFTR: 1/2000 [Figures: Distributions of insertions and deletions; Distribution of repeat elements]

13

14 Segmental Duplications • Sometimes called Low Copy Repeats (LCRs) • Highly homologous, >95% sequence identity • Rare in most mammals • Comprise a large portion of the human genome (and other primate genomes) • Important for understanding HLA
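The ">95% sequence identity" criterion above can be illustrated with a small calculation. This is a sketch only, assuming two sequences that have already been aligned to equal length with "-" for gaps. Identity definitions vary between tools (for example in how gap columns are counted), so the convention used here is an assumption, as is the function name.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap against a
    base counts as a mismatch. This convention is one of several in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    copy_a = "ACGT" * 25              # 100 bp toy sequence
    copy_b = list(copy_a)
    copy_b[0], copy_b[10], copy_b[50] = "C", "T", "A"  # three substitutions
    identity = percent_identity(copy_a, "".join(copy_b))
    print(identity, identity > 95.0)
```

Two copies that pass such a threshold are similar enough that short reads from one copy can plausibly map to the other, which is why these regions complicate targeted analysis.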

15 Segmental Duplications • Many LCRs are concentrated in "hotspots". Recombinations in these regions are responsible for a wide range of disorders, including: • Charcot-Marie-Tooth syndrome type 1A • Hereditary neuropathy with liability to pressure palsies • Smith-Magenis syndrome • Potocki-Lupski syndrome

16 Data Analysis Tools Differences: • Detection rates of complex variants (sensitivity) • False positive rates (accuracy) • Speed • Ease of use. Data analysis shouldn’t be like this!

17 “Depending upon which tool you use, you can see pretty big differences between even the same genome called with different tools, nearly as big as the two Life Tech/Illumina genomes.” Mark Yandell in BioIT-World.com, June 8, 2011

18 Examples • Missing variants • SNPs, a DNP and deletions

19

20 Identify more valid variants

21 Find homopolymer indels

22 Examples • Coverage differences

23 Four times exon coverage (coverage track ranges: [0-432] vs [0-96])

24 Higher exome coverage (coverage track ranges: [0-24] vs [0-10])
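Coverage comparisons like the ones in these examples can be quantified as mean read depth over a target region. The sketch below is illustrative and not taken from the slides: it assumes reads are given as half-open (start, end) intervals on one chromosome, and the function name is hypothetical.

```python
def mean_depth(reads, target_start: int, target_end: int) -> float:
    """Mean per-base depth over the half-open target [target_start, target_end).

    reads is an iterable of half-open (start, end) alignment intervals on
    the same chromosome as the target.
    """
    length = target_end - target_start
    if length <= 0:
        raise ValueError("target must have positive length")
    covered_bases = 0
    for start, end in reads:
        # Number of target bases this read overlaps (0 if none).
        overlap = min(end, target_end) - max(start, target_start)
        if overlap > 0:
            covered_bases += overlap
    return covered_bases / length


if __name__ == "__main__":
    exon = (50, 100)
    mapping_a = [(0, 100), (50, 150), (90, 110)]
    mapping_b = [(0, 100)]
    depth_a = mean_depth(mapping_a, *exon)
    depth_b = mean_depth(mapping_b, *exon)
    print(depth_a, depth_b, depth_a / depth_b)  # fold difference in coverage
```

Running the same calculation on the alignments produced by two different mappers for the same reads gives a fold difference comparable to the one shown in the coverage tracks.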

25 First conclusion Read accuracy is not the limiting factor in accurate variant analysis.

26 Example • Dense region of SNPs

27 www.omixon.com

28 Second conclusion As variant density increases the performance of most tools goes down.
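Variant density can be made concrete by counting variants in fixed-size windows, which helps locate the regions where tool performance is most likely to drop. This is a minimal sketch under stated assumptions: the 100 bp window and the threshold of 5 variants are arbitrary illustrative values, and the function names are hypothetical.

```python
from collections import Counter


def variant_density(positions, window: int = 100):
    """Map each window start coordinate to the number of variants in it.

    positions are 0-based variant coordinates on one chromosome.
    Windows without variants are omitted.
    """
    counts = Counter(pos // window for pos in positions)
    return {index * window: n for index, n in sorted(counts.items())}


def dense_windows(positions, window: int = 100, min_variants: int = 5):
    """Start coordinates of windows holding at least min_variants variants."""
    density = variant_density(positions, window)
    return [start for start, n in density.items() if n >= min_variants]


if __name__ == "__main__":
    snps = [5, 12, 40, 41, 77, 99, 250, 1010]
    print(variant_density(snps))
    print(dense_windows(snps))
```

Windows reported by `dense_windows` are natural candidates for comparing the calls of several tools side by side.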

29 Variant Calling • There are a few popular variant callers: GATK, SAMtools mpileup, VarScan • The most comprehensive (GATK) has a whole pipeline, including a quality recalibration step and an indel realignment step • Running these recalibration and realignment steps before any variant call is highly recommended • Deduplication and removing non-primary alignments may also be required
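The last point, deduplication and removal of non-primary alignments, can be sketched as a filter on the FLAG field of SAM records. The bit values below come from the SAM specification; everything else (function names, the choice to also drop supplementary alignments by default) is an illustrative assumption. The duplicate bit is only set if a duplicate-marking tool has already been run on the file, and in practice a dedicated tool would normally do this filtering.

```python
FLAG_SECONDARY = 0x100      # not the primary alignment of the read
FLAG_DUPLICATE = 0x400      # marked as PCR or optical duplicate
FLAG_SUPPLEMENTARY = 0x800  # supplementary (e.g. chimeric) alignment


def keep_alignment(flag: int, drop_supplementary: bool = True) -> bool:
    """True if the record is neither a duplicate nor a non-primary alignment."""
    mask = FLAG_SECONDARY | FLAG_DUPLICATE
    if drop_supplementary:
        mask |= FLAG_SUPPLEMENTARY
    return (flag & mask) == 0


def filter_sam_lines(lines, drop_supplementary: bool = True):
    """Yield header lines unchanged and only the alignment lines to keep."""
    for line in lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record; skipped in this sketch
        if keep_alignment(int(fields[1]), drop_supplementary):
            yield line


if __name__ == "__main__":
    def record(name, flag):
        # Toy SAM record with the 11 mandatory fields.
        return "\t".join([name, str(flag), "chr1", "100", "60", "4M",
                          "*", "0", "0", "ACGT", "IIII"])

    sam = ["@HD\tVN:1.6", record("r1", 0), record("r2", 256),
           record("r3", 1024), record("r4", 2048)]
    for kept in filter_sam_lines(sam):
        print(kept.split("\t")[0])
```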

30 Indel realigner problem

31 Variants that can be hard to find • DNPs • TNPs • Small indels next to SNPs • 30+ bp indels • Homopolymer indels • Homopolymer indel and SNP together • Indels in palindromes • Dense regions of variants
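When benchmarking tools against the categories listed above, it helps to label each called variant by type. The sketch below is a simplified classifier based on REF and ALT allele lengths only. It assumes alleles are already in a minimal, normalized representation, it does not look at sequence context (so it cannot recognize homopolymer or palindrome indels), and the function name and the reuse of the 30 bp cutoff as a parameter are illustrative assumptions.

```python
def classify_variant(ref: str, alt: str, large_indel: int = 30) -> str:
    """Label a variant from its REF and ALT alleles by length alone.

    Assumes normalized alleles; equal-length alleles of 1, 2 or 3 bases are
    labeled SNP, DNP or TNP, and longer ones MNP.
    """
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        raise ValueError("REF and ALT must be non-empty")
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        return {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "MNP")
    size = abs(len(alt) - len(ref))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    scale = "large" if size >= large_indel else "small"
    return f"{scale} {kind}"


if __name__ == "__main__":
    calls = [("A", "G"), ("AC", "GT"), ("ACG", "TGA"),
             ("A", "ATT"), ("A" + "C" * 30, "A")]
    for ref, alt in calls:
        print(ref[:8], alt[:8], classify_variant(ref, alt))
```

Counting calls per label for each tool gives a quick view of which variant classes a given pipeline tends to miss.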

