Sequencing Errors and Biases Biological Sequence Analysis BNFO 691/602 Spring 2013 Mark Reimers.


1 Sequencing Errors and Biases Biological Sequence Analysis BNFO 691/602 Spring 2013 Mark Reimers

2 Sequencing Errors and Biases Genomic Data Analysis Course Moscow July 2013 Mark Reimers, Ph.D

3 Outline: Sequencing errors; Initiation biases; Quantification biases; Are biases consistent across samples?; Compensating biases

4 Types of mismatches in Illumina data are profoundly asymmetric and biased. Data from uniquely mapped tags with a single mismatch. Courtesy Thierry-Mieg

5 Position of single mismatch in uniquely mapped tags Courtesy Thierry-Mieg
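The two tallies behind slides 4 and 5 (which substitution occurred, and where in the read) can be sketched as follows. This is a minimal illustration, not the method used to produce the slides: it assumes each read has already been paired with the reference sequence it maps to, that both are the same length, and that substitutions are written as `ref>read`. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

def single_mismatch(ref, read):
    """Return (position, ref_base, read_base) if the pair differs at exactly one site, else None."""
    if len(ref) != len(read):
        return None
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(ref, read)) if a != b]
    return diffs[0] if len(diffs) == 1 else None

def tally_mismatches(pairs):
    """Count substitution types and 0-based mismatch positions over single-mismatch pairs."""
    types = Counter()
    positions = Counter()
    for ref, read in pairs:
        hit = single_mismatch(ref, read)
        if hit is not None:
            pos, ref_base, read_base = hit
            types[f"{ref_base}>{read_base}"] += 1
            positions[pos] += 1
    return types, positions

# Toy (reference, read) pairs, made up for illustration only.
toy_pairs = [
    ("ACGTAC", "ACGTAA"),  # one mismatch at position 5
    ("ACGTAC", "ACGTAC"),  # perfect match, ignored
    ("GGCTAA", "GGCTAC"),  # one mismatch at position 5
    ("TTGACA", "TAGACC"),  # two mismatches, ignored
    ("CCGTAT", "CCGTAG"),  # one mismatch at position 5
]
mismatch_types, mismatch_positions = tally_mismatches(toy_pairs)
```

An asymmetric error profile would show up as very unequal counts between a substitution and its reverse (for example `A>C` versus `C>A`), and a positional bias as counts concentrated at particular read positions.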

6 Initiation Biases

7 Nucleotide frequencies versus position for stringently mapped reads. Hansen K D et al. Nucl. Acids Res. 2010;38:e131. © The Author(s) 2010. Published by Oxford University Press.
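A per-position nucleotide frequency profile of the kind plotted on this slide can be computed roughly as below. This is a sketch under simple assumptions, not the analysis of Hansen et al: reads are plain strings oriented from their start position, only A, C, G and T are counted, and the function name and toy reads are hypothetical.

```python
from collections import Counter

def nucleotide_frequencies_by_position(reads, length=None):
    """Return one dict per read position giving the fraction of A, C, G and T at that position."""
    if length is None:
        length = min(len(r) for r in reads)
    profile = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts[b] for b in "ACGT")  # ignore N and other symbols
        profile.append({b: (counts[b] / total if total else 0.0) for b in "ACGT"})
    return profile

# Toy reads, made up for illustration only.
toy_reads = ["ACGTAC", "ACCTAG", "GCGTTC", "ACGAAC"]
profile = nucleotide_frequencies_by_position(toy_reads)
```

Without initiation bias the four frequencies would be roughly flat along the read; a bias appears as frequencies in the first few positions that differ from those further in.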

8 Start Position Bias is Visible in MT-RNA

9 Start Position Bias is Consistent Across Samples. Counts per start site in lane 1 vs lane 2 (Marioni et al, Genome Res, 2008)

10 Quantification Biases

11 Consistent Technology-Specific Biases. (a) 25-kb region of chromosome 11 amplified by three long-range PCR products (red rectangles). (b) A heat-map colored matrix displays the correlation of coverage depth across 260 kb of sequence between four samples by three technologies. From Harismendy et al, Genome Biology 2009

12 Quantitative Biases: Not all regions are represented equally. GC-rich regions are represented more. Independent of GC content, some chromosome regions are represented more (euchromatin bias). Sequence initiation site biases. 'Mappability' biases: some regions won't have any uniquely mapped tags.

13 GC Bias: Density of reads depends strongly on GC content of regions. Most bias seems to come from the PCR reaction. Newer techniques show less bias, but it is still strong. [Figure: number of reads in a 1 kb region versus GC content (%) of that region. From Dohm et al 2008]
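The relationship on this slide (reads per 1 kb region against GC content of that region) can be tabulated with a short sketch like the one below. It assumes a single reference sequence held as a string and a list of 0-based read start coordinates, and it assigns each read to the window containing its start; these choices, the function names, and the toy data are illustrative assumptions rather than the procedure of Dohm et al.

```python
def gc_fraction(seq):
    """Fraction of G and C among called bases (A, C, G, T); None if no bases are called."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called

def window_gc_and_counts(reference, read_starts, window=1000):
    """Return (GC fraction, read count) for each complete non-overlapping window."""
    n_windows = len(reference) // window
    counts = [0] * n_windows
    for start in read_starts:
        w = start // window
        if 0 <= w < n_windows:  # skip reads starting outside complete windows
            counts[w] += 1
    return [
        (gc_fraction(reference[i * window:(i + 1) * window]), counts[i])
        for i in range(n_windows)
    ]

# Toy example with 10 bp windows, made up for illustration only.
toy_reference = "AT" * 5 + "GC" * 5
toy_starts = [0, 3, 12, 15, 19, 25]
gc_vs_reads = window_gc_and_counts(toy_reference, toy_starts, window=10)
```

Plotting the second element of each pair against the first gives a scatter of the kind described on the slide; with real data the default `window=1000` matches the 1 kb regions mentioned there.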

14 GC Bias depends on temperature. Aird et al (Genome Biology 2011) systematically tested the effects of various conditions on GC bias. They provided protocols that reduce GC bias but don't eliminate it. NB: log scale

15 Even Best Protocols have Bias. GC bias in Illumina reads from a 400-bp fragment library amplified using the standard PCR protocol (Phusion HF, short denaturation) on a fast-ramping thermocycler (red squares), Phusion HF with long denaturation and 2M betaine (black triangles), AccuPrime Taq HiFi with long denaturation and primer extension at 65°C (blue diamonds) or 60°C (purple diamonds). From Aird et al, Genome Biology 2011

16 Biases Are NOT Consistent. The plot on the left shows log-fold changes between RPKM values from two biological replicates (NA11918, NA12761), from the data of Montgomery et al, Nature 2010. From Hansen et al 2012
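For reference, the quantity plotted on this slide can be sketched as follows. RPKM is taken in its usual sense of reads per kilobase of transcript per million mapped reads; the base-2 logarithm and the optional pseudocount are assumptions made here, since the slide does not state them, and the function names and numbers are hypothetical.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_fold_change(rpkm_a, rpkm_b, pseudocount=0.0):
    """log2 ratio of two RPKM values; a positive pseudocount avoids division by zero."""
    return math.log2((rpkm_a + pseudocount) / (rpkm_b + pseudocount))

# Toy numbers, made up for illustration only: one 2 kb gene in two replicates,
# each with one million mapped reads.
rpkm_rep1 = rpkm(100, 2000, 1_000_000)
rpkm_rep2 = rpkm(200, 2000, 1_000_000)
lfc = log2_fold_change(rpkm_rep2, rpkm_rep1)
```

If biases were identical between replicates they would cancel in this ratio, so a systematic trend in the log-fold change between two biological replicates points to sample-specific bias.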

