Bisulfite-Sequencing theory and Quality Control Felix Krueger January 2015.


1 Bisulfite-Sequencing theory and Quality Control Felix Krueger January 2015

2 9:15-10:15 Bisulfite-Seq theory and QC
10:15-10:30 coffee
10:30-11:30 Mapping and QC practical
11:30-12:30 Visualising and Exploring talk
12:30-13:30 Lunch
13:30-14:00 Methylation tools in SeqMonk
14:00-15:30 Visualising and Exploring practical
15:30-15:45 coffee
15:45-16:35 Differential methylation talk & practical
16:35-16:45 Introduction to other cytosine modifications and oxBS

3 DNA Methylation
- Dynamic patterning
- Correlation with gene expression
- Reset during development
- Suppression of repetitive elements
- X-chromosome inactivation
- Imprinting
- Carcinogenesis
Diagram: Cytosine and 5-methyl Cytosine. Methylation: DNA methyl-transferases. Demethylation: DNA-demethylase(s)? Hydroxymethyl cytosine? TETs? Passive demethylation?

4 DNA methylation is sequence context dependent
sequence: CCAGTCGCTATA
context: CpG, CHG*, CHH*
mammals: CpG YES; CHG and CHH no**
plants: YES in all contexts
* H being anything but G
** somewhat more relevant in certain cell types

5 Promoter methylation causes transcriptional silencing
Figure: developmentally important gene or tumour suppressor gene. Adapted from Stephen B Baylin, Nature Clinical Practice Oncology (2005)

6 Imprinted Genes: mono-allelic expression
Imprinted genes show mono-allelic expression with parent-of-origin specificity and differential allelic DNA methylation. They have key roles in energy metabolism and placenta functions.
Figure legend: CGI (CpG island), methylated CpG, unmethylated CpG

7 Imprinted Genes: example (from Ferguson-Smith A. Nat. Rev. Genet)
Rather unusual case where a non-methylated DMR causes silencing of a locus
Figure legend: ICR / DMR, methylated CGI, unmethylated CGI
ICR: Imprinting Control Region
DMR: Differentially Methylated Region

8 DNA methylation is maintained (from W. Reik & J. Walter, Nat. Rev. Genet. 2001)

9 DNA methylation is reset during reprogramming

10 Measuring DNA methylation by Bisulfite-sequencing (image by Illumina)

11 Bisulfite Informatics
Convert: CCAGTCGCTATAGCGCGATATCGTA (methylated cytosines marked "me") reads as TTAGTTGCTATAGTGCGATATTGTA after bisulfite treatment
Map:
   TTAGTTGCTATAGTGCGATATTGTA
   |||||||||||||||||||||||||
...CCAGTCGCTATAGCGCGATATCGTA...
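The convert-then-map idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how Bismark is implemented: the function names are hypothetical, and the methylated positions (0-based indices 7 and 15) were inferred by comparing the two sequences on the slide.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C is read as T; methylated C (by 0-based index) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def fully_convert_c_to_t(seq):
    """In-silico conversion used before mapping: every C becomes T."""
    return seq.replace("C", "T")


genomic = "CCAGTCGCTATAGCGCGATATCGTA"
read = bisulfite_convert(genomic, methylated_positions={7, 15})
print(read)

# After full C->T conversion the read and the reference become identical,
# which is why a standard aligner can place the read despite the conversions.
print(fully_convert_c_to_t(read) == fully_convert_c_to_t(genomic))
```

The original, unconverted read and genome are kept aside so that methylation can be called after the position has been found.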

12 Bisulfite conversion of a genomic locus
- 2 different PCR products and 4 possible different sequence strands from one genomic locus
>>CCGGCATGTTTAAACGCT>> <>TCGGTATGTTTAAACGTT>> <>CCAGCATGTTTAAACGCT>> <>UCGGUATGTTTAAACGUT>><

13 Mapping a Bisulfite-Seq read (Bismark)
sequence of interest: TTGGCATGTTTAAACGTT
bisulfite convert read (treat sequence as both forward and reverse strand):
  C -> T: 5’…TTGGTATGTTTAAATGTT…3’
  G -> A: 5’…TTAACATATTTAAACATT…3’
align to bisulfite converted genomes, giving alignments (1) to (4):
  forward strand C -> T converted genome:
    …TTGGTATGTTTAAATGTT…
    …AACCATACAAATTTACAA…
  forward strand G -> A converted genome (equals reverse strand C -> T conversion):
    …CCAACATATTTAAACACT…
    …GGTTGTATAAATTTGTGA…
read all 4 alignment outputs and extract the unmodified genomic sequence if the sequence could be mapped uniquely
methylation call:
  genomic sequence  5’…CCGGCATGTTTAAACGCT…3’
  read sequence        TTGGCATGTTTAAACGTT
  methylation call     cc..C.........Z.c.
legend:
  c unmethylated C
  C methylated C
  z unmethylated C in CpG context
  Z methylated C in CpG context
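The final methylation-call step can be sketched as a base-by-base comparison of the read with the unconverted genomic sequence. This is a simplified illustration under several assumptions: the function name is hypothetical, only reads from a C -> T converted (forward) strand are handled, and the sequence context is taken from the supplied genomic string only. It uses the three-context letters that appear in Bismark's XM strings (z/Z for CpG, x/X for CHG, h/H for CHH, u/U where the context cannot be determined) rather than the simplified c/C legend of the slide.

```python
def methylation_call(read, genome):
    """Return one call character per base (upper case = methylated).

    z/Z CpG, x/X CHG, h/H CHH, u/U unknown context, '.' not a cytosine call.
    """
    call = []
    for i, (r, g) in enumerate(zip(read, genome)):
        if g != "C" or r not in "CT":
            call.append(".")  # not a genomic C, or a mismatch
            continue
        downstream = genome[i + 1:i + 3]  # up to two bases after the C
        if downstream[:1] == "G":
            letter = "z"
        elif len(downstream) < 2 or "N" in downstream:
            letter = "u"  # too close to the end to classify
        elif downstream[1] == "G":
            letter = "x"
        else:
            letter = "h"
        # C retained in the read: protected from conversion, so methylated
        call.append(letter.upper() if r == "C" else letter)
    return "".join(call)


print(methylation_call("TTGGCATGTTTAAACGTT", "CCGGCATGTTTAAACGCT"))
```

In a real pipeline the context would be read from the reference genome beyond the end of the read, so the "unknown" case would be much rarer than in this sketch.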

14 Common sequencing protocols
>>CCGGCATGTTTAAACGCT>> <>TCGGTATGTTTAAACGTT>> <>UCGGUATGTTTAAACGUT>> <>CCAGCATGTTTAAACGCT>> CTOT CTOB
2) PBAT libraries
>>TCGGTATGTTTAAACGTT>> <>CCAGCATGTTTAAACGCT>> <

15 Validation

16 BS-Seq Analysis Workflow
QC -> Trimming -> Mapping -> Methylation extraction -> Mapped QC -> Analysis

17 Raw Sequence Data up to 1,000,000,000 lines per lane

18 Part I: Initial QC - What does QC tell you about your library?
- # of sequences
- Basecall qualities
- Base composition
- Potential contaminants
- Expected duplication rate

19 QC Raw data: Sequence Quality
Figure: per-base sequence quality plot, annotated with error rates of 10%, 1% and 0.1%
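The three error rates marked on the quality plot correspond to Phred scores of 10, 20 and 30, since a Phred score Q encodes an error probability of 10^(-Q/10). A short sketch of that relationship follows; the function names are hypothetical, and the decoding assumes the Phred+33 encoding used by current Illumina FastQ files.

```python
def phred_to_error_probability(q):
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FastQ quality string to integer scores (assumes Phred+33)."""
    return [ord(ch) - 33 for ch in quality_string]


for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_error_probability(q):.1%} error rate")

# '#' is the lowest score commonly seen in Illumina data, 'I' is Q40
print(decode_phred33("#I"))
```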

20 QC: Base Composition
Figure panels: RRBS, WGSBS

21 QC: Duplication rate

22 QC: Overrepresented sequences

23 Common problems in BS-Seq
Not observed in ‘normal’ libraries, e.g. ChIP or RNA-Seq

24 Removing poor quality basecalls

25 Removing adapter contamination

26 Summary Adapter/Quality Trimming
Important to trim because failure to do so might result in:
- Low mapping efficiency
- Mis-alignments
- Errors in methylation calls since adapters are methylated
- Basecall errors tend toward 50% (C:mC)
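The two trimming steps can be sketched as follows. This is a simplified illustration, not the Trim Galore or Cutadapt code: the function names and the `min_overlap` parameter are hypothetical, the quality trimming follows the running-sum approach Cutadapt describes for 3' ends, and the adapter search here is exact-match only, whereas real tools tolerate sequencing errors. The adapter shown is the start of the standard Illumina adapter sequence.

```python
def quality_trim_3prime(qualities, cutoff=20):
    """Return how many bases to keep from the 5' end.

    Walks in from the 3' end summing (cutoff - quality) and cuts where
    that running sum is largest, stopping once the sum turns negative.
    """
    running, best, keep = 0, 0, len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


def trim_adapter(read, adapter, min_overlap=1):
    """Remove a full adapter, or a partial one overhanging the 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


print(quality_trim_3prime([30, 30, 30, 10, 25, 5]))
print(trim_adapter("TTGGTATGAGATCGG", "AGATCGGAAGAGC"))
```

With a minimum overlap of a single base, a read that merely ends in "A" loses that base. That is deliberately aggressive: any leftover adapter base is a potential false methylation call.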

27 Part II: Sequence alignment – Bismark primary alignment output (BAM file)
Annotated fields: chromosome, position, sequence, quality, methylation call
Read 1:
HISEQ :366:C3G4NACXX:3:1101:1316:2067_1:N:0: M =
NTTATTTAGTTTTTTAGGGTTTGTGTGTAGGAGTGTGGGAATTATGTTTTTTATGGTTGATATTTATTTAAAAGTGAGTATAAATTATATATATTTTTTT
#1=DDDDDAAFFHIIIA:
NM:i:14 XX:Z:G8C2C7C21C13C6CC1C17CC3C4CC4 XM:Z: h..h x h x......hh.h hh...h....hh.... XR:Z:CT XG:Z:CT XA:Z:1
Read 2:
HISEQ :366:C3G4NACXX:3:1101:1316:2067_1:N:0: M =
GGTTATTTTATTTAGGGTTATTGTTTTAGAGTTTTATTGTTGTGAACAGATATATGATTAAGGTAATTTTTATAAGGATAATATTTAATTGGAGTTGGTT
CCCEEECADCFFFFHHGHGHIIGIHFIJJIJIHFGHGGGEHIJIIJGIGFJJJJJJJJJJIGJJJJGJJJJIIIJJIJIJJJJJJIJHHHHHFFFFFCCC
NM:i:21 XX:Z:2G2CC1C1C1C11C11C2C10C1C4CC4C2C1C3C5C2C12C3C1 XM:Z:.....hh.h.h.x h x..x......X...h.h....hh....h..h.h...h.....h..h x...h. XR:Z:GA XG:Z:CT XB:Z:1
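The optional tags at the end of each record carry the Bismark-specific information: XM holds the methylation call string, XR the read conversion state and XG the genome conversion state. A small sketch of pulling them apart and summarising one call string is given below. The function names and the example tag values are made up for illustration; real records would normally be read with a BAM library.

```python
from collections import Counter

CONTEXTS = {"z": "CpG", "x": "CHG", "h": "CHH"}


def parse_optional_tags(fields):
    """Turn SAM optional fields such as 'XM:Z:..h.' into {tag: value}."""
    tags = {}
    for field in fields:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def context_methylation(xm):
    """Percent methylation per context for one XM call string."""
    counts = Counter(xm)
    result = {}
    for letter, name in CONTEXTS.items():
        meth, unmeth = counts[letter.upper()], counts[letter]
        if meth + unmeth:
            result[name] = 100 * meth / (meth + unmeth)
    return result


# Hypothetical, shortened tag values for illustration only
tags = parse_optional_tags(["NM:i:3", "XM:Z:..h.Z..x.z.H", "XR:Z:CT", "XG:Z:CT"])
print(tags["XR"], tags["XG"])
print(context_methylation(tags["XM"]))
```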

28 Sequence duplication
Figure panels: Complex/diverse library; Duplicated library; deduplication. Axis: percent methylation

29 Deduplication - considerations
Advisable for large genomes and moderate coverage
- unlikely to sequence several genuine copies of the same fragment amongst >5bn possible fragments with different start sites
- maximum coverage with duplication may still be (read length)-fold (even more with paired-end reads)
NOT advisable for RRBS or other target enrichment methods where higher coverage is either desired or expected
Figure: RRBS deduplication (fragments starting at CCGG sites)
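Position-based deduplication can be sketched as keeping only the first alignment seen for each combination of chromosome, start position and strand. This is a minimal single-end illustration with hypothetical field names and made-up records, not the Bismark deduplication script; paired-end data would also need the end coordinate in the key.

```python
def deduplicate(alignments):
    """Keep the first alignment per (chromosome, start, strand)."""
    seen = set()
    kept = []
    for aln in alignments:
        key = (aln["chrom"], aln["start"], aln["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(aln)
    return kept


# Made-up alignments: the first two share position and strand
alignments = [
    {"id": "r1", "chrom": "chr1", "start": 1000, "strand": "+"},
    {"id": "r2", "chrom": "chr1", "start": 1000, "strand": "+"},
    {"id": "r3", "chrom": "chr1", "start": 1000, "strand": "-"},
    {"id": "r4", "chrom": "chr2", "start": 1000, "strand": "+"},
]
print([aln["id"] for aln in deduplicate(alignments)])
```

In RRBS every fragment starts at a restriction site, so this key would collapse genuinely independent fragments, which is why the slide advises against deduplicating such libraries.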

30 Methylation extraction
Figure: methylation call strings of Read 1 and Read 2 of a read pair (e.g. ....Z.....h..h x.....Z), with redundant methylation calls highlighted
CpG methylation output columns: read ID, meth state, chr, pos, context

31 Methylation extraction
CpG methylation output -> bedGraph/coverage output
Columns: chr, pos, methylation percentage, meth, unmeth
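Going from per-read calls to a per-position coverage summary is a simple aggregation. The sketch below assumes each call has been reduced to a (chromosome, position, methylated?) tuple; the function name, the input data and the exact column order are illustrative and may differ from the real output files.

```python
from collections import defaultdict


def aggregate_calls(calls):
    """Collapse per-read calls into (chr, pos, percent, meth, unmeth) rows."""
    counts = defaultdict(lambda: [0, 0])  # [methylated, unmethylated]
    for chrom, pos, methylated in calls:
        counts[(chrom, pos)][0 if methylated else 1] += 1
    rows = []
    for (chrom, pos), (meth, unmeth) in sorted(counts.items()):
        percent = 100 * meth / (meth + unmeth)
        rows.append((chrom, pos, percent, meth, unmeth))
    return rows


# Made-up calls: three reads cover chr1:100, one read covers chr1:105
calls = [
    ("chr1", 100, True),
    ("chr1", 100, False),
    ("chr1", 100, True),
    ("chr1", 105, False),
]
rows = aggregate_calls(calls)
for chrom, pos, percent, meth, unmeth in rows:
    print(f"{chrom}\t{pos}\t{percent:.2f}\t{meth}\t{unmeth}")
```

Keeping the raw counts next to the percentage matters downstream: 1 of 1 and 50 of 50 both give 100% but carry very different confidence.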

32 Part III: Mapped QC - Methylation bias
Good opportunity to look at conversion efficiency
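A methylation bias (M-bias) plot shows average methylation as a function of position within the read; a flat line is expected, and deviations near the read ends point to technical artefacts. A rough sketch of the underlying calculation follows. It assumes the call strings are already in read orientation and looks only at CpG calls by default; the function name and the input strings are made up.

```python
def m_bias(xm_strings, meth="Z", unmeth="z"):
    """Percent methylation per read position (None where there are no calls)."""
    length = max((len(xm) for xm in xm_strings), default=0)
    meth_counts = [0] * length
    unmeth_counts = [0] * length
    for xm in xm_strings:
        for i, ch in enumerate(xm):
            if ch == meth:
                meth_counts[i] += 1
            elif ch == unmeth:
                unmeth_counts[i] += 1
    return [
        100 * m / (m + u) if m + u else None
        for m, u in zip(meth_counts, unmeth_counts)
    ]


# Three made-up 4 bp call strings
profile = m_bias(["Z..z", "Z..Z", "z..Z"])
print(profile)
```

Running the same function with `meth="H", unmeth="h"` gives the CHH profile, which in mammalian samples is a convenient proxy for conversion efficiency since those positions are expected to be almost entirely unmethylated.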

33 Artificial methylation calls in paired-end libraries
5’- GGGNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCA -3’
3’- ACCCNNNNNNNNNNNNNNNNNNNNNNNNNNNGGG -5’
end repair + A-tailing

34 Specialist applications (I): Reduced representation BS-Seq (RRBS)
Sequence composition bias; High duplication rate

35 Fragment size distribution in RRBS
identical (redundant) methylation calls

36 Artificial methylation calls in RRBS libraries
Legend: C genomic cytosine, C unmethylated cytosine

37 Specialist application (II): Post-bisulfite adapter tagging (PBAT)
suitable for low input material
Figure panels: WGBS, PBAT

38 PBAT-Seq
trim off / ignore first couple of basepairs

39 Bismark workflow
Pre Alignment
- FastQC: Initial quality control
- Trim Galore: Adapter/quality trimming using Cutadapt; handles RRBS and paired-end reads; Trim Galore and RRBS User guide
Alignment
- Bismark: Output BAM
Post Alignment
- Deduplication: optional
- Methylation extractor: Output individual cytosine methylation calls; optionally bedGraph or genome-wide cytosine report; M-bias analysis
- bismark2report: Graphical HTML report generation
Example: protocol: Quality Control, trimming and alignment of Bisulfite-Seq data

40 Useful links
FastQC: www.bioinformatics.babraham.ac.uk/projects/fastqc/
Trim Galore
Cutadapt: https://code.google.com/p/cutadapt/
Bismark: www.bioinformatics.babraham.ac.uk/projects/bismark/
Bowtie: http://bowtie-bio.sourceforge.net/
Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/
SeqMonk
Cluster Flow: www.bioinformatics.babraham.ac.uk/projects/clusterflow/
protocol: Quality control, trimming and alignment of Bisulfite-Seq data

41 FastQC: quality control for high throughput sequencing
FastQ Screen: organism and contamination detection
Hi-C mapping
SeqMonk: Genome browser, quantitation and data analysis
Bismark: Bisulfite-sequencing alignments and methylation calls
Sierra: A web-based LIMS system for small sequencing facilities
ASAP: Allele-specific alignments
Trim Galore! Quality and adapter trimming for (RRBS) libraries

