Visualising and Exploring BS-Seq Data

Slides:

Advertisements

Similar presentations

Functional Genomics with Next-Generation Sequencing

Advertisements

The Normal Distribution

Differential Methylation Analysis

Section D: Chromosome StructureYang Xu, College of Life Sciences Section D Prokaryotic and Eukaryotic Chromosome Structure D1 Prokaryotic Chromosome Structure.

Randa Stringer Supervisor: Dr. Guillaume Par é A review of quality control and pre- processing measures for the Illumina 450K BeadChip.

Predictive Analysis of Gene Expression Data from Human SAGE Libraries Alexessander Alves* Nikolay Zagoruiko + Oleg Okun § Olga Kutnenko + Irina Borisova.

We processed six samples in triplicate using 11 different array platforms at one or two laboratories. we obtained measures of array signal variability.

Simon v2.3 RNA-Seq Analysis Simon v2.3.

Peter Tsai Bioinformatics Institute, University of Auckland

Finding approximate palindromes in genomic sequences.

Microarray Type Analyses using Second Generation Sequencing

SeqMonk tools for methylation analysis

Quantitative Genetics

Density Curves and Normal Distributions

Committee Meeting April 24 th 2014 Characterizing epigenetic variation in the Pacific oyster (Crassostrea gigas) Claire Olson School of Aquatic and Fishery.

DNA Methylation Assays High Throughput Data Analysis BIOS , VCU Winter 2010 Mark Reimers, PhD.

Expression Analysis of RNA-seq Data

Cryptic Variation in the Human mutation rate Alan Hodgkinson Adam Eyre-Walker, Manolis Ladoukakis.

RNA-Seq Analysis Simon V4.1.

Epigenetic Analysis BIOS Statistics for Systems Biology Spring 2008.

ChIP-chip Data. DNA-binding proteins Constitutive proteins (mostly histones) –Organize DNA –Regulate access to DNA –Have many modifications Acetylation,

RNA surveillance and degradation: the Yin Yang of RNA RNA Pol II AAAAAAAAAAA AAA production destruction RNA Ribosome.

1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:

Log 2 (expression) H3K4me2 score A SLAMF6 log 2 (expression) Supplementary Fig. 1. H3K4me2 profiles vary significantly between loci of genes expressed.

Chapter 14: Inference for Regression. A brief review of chapter 4... (Regression Analysis: Exploring Association BetweenVariables )  Bi-variate data.

Tutorial 6 High Throughput Sequencing. HTS tools and analysis Review of resequencing pipeline Visualization - IGV Analysis platform – Galaxy Tuning up.

Comp. Genomics Recitation 10 4/7/09 Differential expression detection.

Analyzing Expression Data: Clustering and Stats Chapter 16.

Investigate Variation of Chromatin Interactions in Human Tissues Hiren Karathia, PhD., Sridhar Hannenhalli, PhD., Michelle Girvan, PhD.

No reference available

Supplemental Figure 1. False trans association due to probe cross-hybridization and genetic polymorphism at single base extension site. (A) The Infinium.

A Framework and Methods for Characterizing Uncertainty in Geologic Maps Donald A. Keefer Illinois State Geological Survey.

Open access toolkit for nonparametric explorative pattern mining to detect events relating to disease in large scale genome sequences Thahir P. Mohamed,

HOMER – a one stop shop for ChIP-Seq analysis

Plot type specific considerations

Misleading bioinformatics: Mistakes, Biases, Mis-interpretations and how to avoid them Festival of Genomics 2017 Course Exercise Material:

Differential Methylation Analysis

Simon v RNA-Seq Analysis Simon v

Volume 23, Issue 2, Pages (February 2016)

SeqMonk tools for methylation analysis

SeqMonk tools for methylation analysis

RNA Quantitation from RNAseq Data

Simon v1.0 Motif Searching Simon v1.0.

Amos Tanay Nir Yosef 1st HCA Jamboree, 8/2017

Biases and their Effect on Biological Interpretation

Gene expression from RNA-Seq

QC analysis Uppsala University Work done by Jonas Almlöf

Understanding and Validating Experimental Expectations

The FASTQ format and quality control

Analysing ChIP-Seq Data

Pig iPSCs are reprogrammed differently and show genome-wide hypomethylation compared with mouse and human. Pig iPSCs are reprogrammed differently and show.

ChIP-Seq Data Processing and QC

Exploring and Understanding ChIP-Seq data

Visualising and Exploring BS-Seq Data

Volume 10, Issue 5, Pages (May 2018)

Volume 9, Issue 1, Pages (July 2017)

Volume 146, Issue 6, Pages (September 2011)

Volume 23, Issue 2, Pages (February 2016)

Volume 1, Issue 6, Pages (December 2013)

Epigenomic Analysis of Sézary Syndrome Defines Patterns of Aberrant DNA Methylation and Identifies Diagnostic Markers Remco van Doorn, Roderick C. Slieker,

Volume 10, Issue 11, Pages (March 2015)

Volume 133, Issue 3, Pages (May 2008)

Simon V Motif Searching Simon V

Genome-wide sperm deoxyribonucleic acid methylation is altered in some men with abnormal chromatin packaging or poor in vitro fertilization embryogenesis

Artefacts and Biases in Gene Set Analysis

Volume 126, Issue 6, Pages (September 2006)

Normalization for cDNA Microarray Data

Sequence Analysis - RNA-Seq 2

Volume 11, Issue 7, Pages (May 2015)

Global analysis of the chemical–genetic interaction map.

Presentation transcript:

Visualising and Exploring BS-Seq Data Simon Andrews simon.andrews@babraham.ac.uk @simon_andrews

Starting Data L001_bismark_bt2_pe.deduplicated.bam Read 1 Read 2 Read 3 Genome L001_bismark_bt2_pe.deduplicated.bam CHG_OB_L001_bismark_bt2_pe.deduplicated.txt.gz CHG_OT_L001_bismark_bt2_pe.deduplicated.txt.gz CHH_OB_L001_bismark_bt2_pe.deduplicated.txt.gz CHH_OT_L001_bismark_bt2_pe.deduplicated.txt.gz CpG_OB_L001_bismark_bt2_pe.deduplicated.txt.gz CpG_OT_L001_bismark_bt2_pe.deduplicated.txt.gz

Starting Data Mapped Reads Methylation Calls BAM file format Contains positions of mapped reads Useful for looking at coverage biases Methylation Calls Delimited text file Individual calls for single cytosines Useful for looking at methylation patterns

Checking Coverage

Coverage Outliers Around 600x average genome density

Coverage Outliers

Coverage Outliers Normally the result of mis-mapping repetitive sequences not in the genome assembly Centromeric / temomeric sequences are common Can be a significant proportion of all data Can throw off calculations of overall methylation Should be removed

Which data to use? Methylation contexts Methylation strands CpG: Only generally relevant context for mammals CHG: Only known to be relevant in plants CHH: Generally unmethylated Methylation strands CpG methylation is generally symmetric Normally makes sense to merge OT / OB strands

Quantitating Methylation Total methylated calls = 15 Total unmethylated calls = 10 Methylation level = (15/(15+10))*100 = 60%

Quantitating Methylation Total methylated calls = 20 Total unmethylated calls = 5 Methylation level = (20/(20+5))*100 = 80%

Quantitating Methylation Total methylated calls = 12 Total unmethylated calls = 14 Methylation level = (12/(12+14))*100 = 46%?

Quantitating Methylation 100% 0% Methylation level = (100+100+100+100+0)/5 = 80%

Quantitating Methylation 100% 0% 100% 0% Methylation level = (100*5)+(0*5)/10 = 50% ? Methylation level = (100*5)/5 = 100%

Quantitating Methylation 57% 33% Common = 300/6 = 50% in both

More Complex Methods Factors to consider during quantitation Surrounding methylation levels Assume that methylation doesn’t change over very short distances Coverage of individual bases Down-weight very low or high values Density of CpGs Apply the level to all bases within a region

Where to make measures Per base Unbiased windows Targeted regions Very large number of measures Poor accuracy for individual bases Unbiased windows Tiled over whole genome Need to decide how they will be defined Targeted regions Which regions What context

Unbiased analysis What basis do you use for creating the windows? Fixed size? Uneven CpG distribution can be problematic Fixed CpG count? Different resolution Need to tailor the window sizes to the data density Can relate windows to features later

Fixed size windows Must balance technical considerations and biology 300 CpG content /5kb 150 Differences between replicates 1kb 2kb 5kb 10kb 25kb 50kb Must balance technical considerations and biology

Fixed CpG windows More even noise profile Fairer statistics More comparable methylation values Length differences are not very dramatic 50 CpG window lengths

Viewing quantitated methylation

Large sample numbers

Large sample numbers

Targeted Quantitation Measure over features CpG islands Be careful where you get your locations Promoters Should probably split into CpG island and non-CpG island Gene bodies Filter by biotype to remove small RNA genes?

Viewing comparisons

Viewing comparisons

Viewing differences

Viewing differences

Trends Effects at individual loci can be subtle Want to find more generalised effect Collate information across whole genome Look for general trend

Considerations for trend plots What features to use Fixed vs relative scale How much context Variable scales How to calculate base measures What window size Aligned vs unaligned windows Missing values Scale

Simple Example Methylation profile centred on CpG islands +/- 10kb

More complex example Methylation profile over genes +/- 5kb

Clustering

Clustering Correlation Clustering Euclidean Clustering Focusses on the differences between conditions Absolute values not important Look for similar trends Show median normalised values Euclidean Clustering Focusses on absolute differences between conditions Look for similar levels Show raw values

Clustering