ChIP-seq analysis with MACS2 Tips and tricks

1 ChIP-seq analysis with MACS2 Tips and tricks
Sami Heikkinen, PhD Docent in Molecular Bioinformatics Institute of Biomedicine, UEF

2 ChIP-Seq simplified
Where?
Figures: Schmidt et al, Methods, 2009; Park, Nat Rev Genetics, 2009

3 From binding to binding sites
Control sample: “Input” or “IgG”
Input: sonicated chromatin without immunoprecipitation
IgG: “unspecific” IP
ChIP-seq fragments are ~200 bp; the sequenced reads are 36-50 bp
Typically millions of reads per sample
Figure: Park, Nat Rev Genetics, 2009

4 MACS2 Model-based Analysis of ChIP-Seq
Original version published by Yong Zhang and Tao Liu from the lab of X. Shirley Liu at the Dana-Farber Cancer Institute, Boston (Genome Biology 2008, 9:R137)
Now at version [number missing in source], developed and maintained by Tao Liu at [link missing in source]
Package of command line programs to call peaks in ChIP-seq data
Much improved since v1.x!

5 MACS2 – program(s)
INPUT DATA: aligned sequence reads from the ChIPed sample (“treat”) and from Input/IgG (“control”)
Subcommands: filterdup, randsample, callpeak, predictd, pileup, refinepeak, bdgpeakcall, bdgbroadcall, bdgcmp, bdgdiff, diffpeak
callpeak OUTPUT FILES: peaks.xls, peaks.narrowPeak, summits.bed, model.r (used to produce model.pdf), treat_pileup.bdg, control_lambda.bdg
Other OUTPUT: pileup.bdg (from pileup), refinepeak.bed (from refinepeak)

6 callpeak - Options
Various options to indicate/control input, output, peak modelling and peak calling
macs2 callpeak
usage: macs2 callpeak [-h] -t TFILE [TFILE ...] [-c [CFILE [CFILE ...]]]
                      [-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}]
                      [-g GSIZE] [--keep-dup KEEPDUPLICATES]
                      [--buffer-size BUFFER_SIZE] [--outdir OUTDIR] [-n NAME]
                      [-B] [--verbose VERBOSE] [--trackline] [--SPMR]
                      [-s TSIZE] [--bw BW] [-m MFOLD MFOLD] [--fix-bimodal]
                      [--nomodel] [--shift SHIFT] [--extsize EXTSIZE]
                      [-q QVALUE] [-p PVALUE] [--to-large] [--ratio RATIO]
                      [--down-sample] [--seed SEED] [--nolambda]
                      [--slocal SMALLLOCAL] [--llocal LARGELOCAL] [--broad]
                      [--broad-cutoff BROADCUTOFF] [--call-summits]

7 Using MACS – connect to server
Open the SSH client at Win –> All programs –> SSH Secure shell –> Secure shell client
“Quick connect”
connection: intron.uef.fi
username: <your user ID>
password: <your password>

8 Unix 101
pwd: show Present Working Directory
cd: Change Directory
    e.g. 'cd /home/work/public' to get to the folder we use today (from wherever you are)
    or, to get back to your home directory: 'cd $HOME'
    or, back one step 'cd ..', or two steps 'cd ../../'
    Usage tip: use up/down arrow keys to move in command history
ls: LiSt files in directory
    e.g. 'ls -l' to show file and folder names AND other info (Long format)
head / tail: show first/last lines of a (text) file
    e.g. 'head -20 ref_hg19.txt'
    Usage tip: use the TAB key to fill in available file/folder names

9 Using MACS - setup
cd /home/work/public
mkdir macsout_<user ID>
    <user ID>: e.g. 'spheikki' for me
    Each student MUST have their own folder, to avoid overlapping MACS outputs!
Checks on seq files:
ls -l seq
head seq/*
Check that macs2 works:
macs2 callpeak

10 callpeak - Options
Various options to indicate/control input, output, peak modelling and peak calling (same full usage listing as on slide 6)

11 callpeak – Options - Input
Input files arguments:
-t TFILE [TFILE ...], --treatment TFILE [TFILE ...]
    ChIP-seq treatment file. If multiple files are given as '-t A B C', they will all be read and combined. REQUIRED.
-c [CFILE [CFILE ...]], --control [CFILE [CFILE ...]]
    Control file. If multiple files are given as '-c A B C', they will all be read and combined.
-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}
    Format of tag file: "AUTO", "BED", "ELAND", "ELANDMULTI", "ELANDEXPORT", "SAM", "BAM", "BOWTIE" or "BAMPE". The default AUTO option lets MACS decide which format the file is. Please check the definition in the README file if you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE. DEFAULT: "AUTO"
-g GSIZE, --gsize GSIZE
    Effective genome size. It can be given as a number such as 1.0e+9, or as a shortcut: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8). DEFAULT: hs
--keep-dup KEEPDUPLICATES
    Controls the MACS behavior towards duplicate tags at the exact same location, that is, the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the pvalue cutoff; the 'all' option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. DEFAULT: 1
--buffer-size BUFFER_SIZE
    Buffer size for incrementally increasing the internal array size used to store read alignment information. In most cases, you don't have to change this parameter. However, if there is a large number of chromosomes/contigs/scaffolds in your alignment, it's recommended to specify a smaller buffer size in order to decrease memory usage (but reading the alignment files will take longer). Minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: [value missing in source]
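To make the integer form of --keep-dup and the --buffer-size memory estimate concrete, here is a minimal Python sketch. It is not MACS code: the function names and the tuple representation of a tag are assumptions made for illustration, and the 'auto' mode (the binomial-based limit) is not modelled.

```python
from collections import defaultdict


def filter_duplicates(tags, keep_dup=1):
    """Keep at most `keep_dup` tags per (chromosome, position, strand).

    `tags` is an iterable of (chrom, position, strand) tuples (an assumed,
    simplified representation of aligned reads). Passing keep_dup="all"
    keeps every tag, mirroring '--keep-dup all'.
    """
    if keep_dup == "all":
        return list(tags)
    seen = defaultdict(int)  # how many tags were already kept per location
    kept = []
    for tag in tags:
        if seen[tag] < keep_dup:
            seen[tag] += 1
            kept.append(tag)
    return kept


def min_read_buffer_bytes(n_chromosomes, buffer_size):
    """Memory estimate quoted in the help text above:
    # of CHROMOSOME * BUFFER_SIZE * 2 Bytes."""
    return n_chromosomes * buffer_size * 2


# Three identical '+' tags, one '-' tag at the same position, one other tag.
tags = [("chr3", 100, "+")] * 3 + [("chr3", 100, "-"), ("chr3", 200, "+")]
print(len(filter_duplicates(tags)))         # default: one tag per location
print(len(filter_duplicates(tags, "all")))  # keep everything
```

Note that the '+' and '-' tags at position 100 count as different locations, because the strand is part of the key.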

12 callpeak – Options - Output
Output arguments:
--outdir OUTDIR
    If specified, all output files will be written to that directory. DEFAULT: the present working directory
-n NAME, --name NAME
    Experiment name, which will be used to generate output file names. DEFAULT: "NA"
-B, --bdg
    Whether or not to save the extended fragment pileup and local lambda tracks (two files) at every bp into bedGraph files. DEFAULT: False
--verbose VERBOSE
    Set the verbosity level of runtime messages. 0: only show critical messages, 1: show additional warning messages, 2: show process information, 3: show debug messages. DEFAULT: 2
--trackline
    Tells MACS to include a trackline in the bedGraph files. With the trackline included, the UCSC genome browser can show the name and description of the file when displaying the bedGraph. However, my suggestion is to convert bedGraph to bigWig, then use the smaller and faster binary bigWig file at the UCSC genome browser as well as in downstream analysis. Requires -B to be set. DEFAULT: no trackline included.
--SPMR
    If True, MACS will save signal per million reads for fragment pileup profiles. Requires -B to be set. DEFAULT: False
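As a rough illustration of how --outdir, -n and -B combine into output file names, the sketch below lists the files one would expect from a default (narrow peak, model-building) callpeak run. The helper name is hypothetical; the suffixes are the ones shown on the program overview slide, prefixed with the experiment name.

```python
from pathlib import PurePosixPath


def expected_outputs(name="NA", outdir=".", bdg=False):
    """Return the output paths expected from a default 'macs2 callpeak' run.

    Hypothetical helper for illustration; assumes narrow peak calling with
    model building switched on (so that NAME_model.r is written).
    """
    suffixes = ["_peaks.xls", "_peaks.narrowPeak", "_summits.bed", "_model.r"]
    if bdg:
        # The two bedGraph tracks written only when -B/--bdg is set.
        suffixes += ["_treat_pileup.bdg", "_control_lambda.bdg"]
    return [str(PurePosixPath(outdir) / (name + suffix)) for suffix in suffixes]


for path in expected_outputs("defaults", "macsout_demo", bdg=True):
    print(path)
```

Here "macsout_demo" stands in for the per-student folder; giving every run its own -n value keeps the runs from overwriting each other inside that folder.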

13 Using MACS – test different settings
Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits
macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_<user ID> -n defaults

14

15 Using MACS – test different settings
Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits
macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_<user ID> -n defaults
ls -l macsout_<user ID>
head -40 macsout_<user ID>/*
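Once the output files exist, the peaks.narrowPeak file is usually the one read by downstream scripts. The sketch below parses one line of it, assuming the standard 10-column tab-separated narrowPeak layout (chrom, start, end, name, score, strand, signal value, -log10 p-value, -log10 q-value, summit offset from start). The example line is invented for illustration and is not real MACS output.

```python
from typing import NamedTuple


class NarrowPeak(NamedTuple):
    chrom: str
    start: int          # 0-based start of the peak region
    end: int
    name: str
    score: int
    strand: str
    signal: float       # enrichment signal at the summit
    neg_log10_p: float
    neg_log10_q: float
    summit_offset: int  # summit position relative to `start`


def parse_narrowpeak_line(line):
    """Parse one tab-separated narrowPeak line into a NarrowPeak record."""
    f = line.rstrip("\n").split("\t")
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


# Invented example line (not real output).
example = "chr3\t1000\t1250\tdefaults_peak_1\t85\t.\t6.5\t11.2\t8.5\t120"
peak = parse_narrowpeak_line(example)
print(peak.chrom, peak.start + peak.summit_offset)  # summit coordinate
```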

16 callpeak – Options – Peak calling 1
Peak calling arguments 2:
--nolambda
    If True (=set), MACS will use the fixed background lambda as the local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure.
--slocal SMALLLOCAL
    The small nearby region in basepairs used to calculate the dynamic lambda. This is used to capture the bias near the peak summit region. Invalid if there is no control data. If you set this to 0, MACS will skip the slocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda values from the d, slocal, and llocal size windows. DEFAULT: 1000
--llocal LARGELOCAL
    The large nearby region in basepairs used to calculate the dynamic lambda. This is used to capture the surrounding bias. If you set this to 0, MACS will skip the llocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda values from the d, slocal, and llocal size windows. DEFAULT: [value missing in source]
--broad
    If set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled by another cutoff through --broad-cutoff. The maximum linking region length is 4 times d from MACS. DEFAULT: False
--broad-cutoff BROADCUTOFF
    Cutoff for broad regions. This option is not available unless --broad is set. If -p is set, this is a pvalue cutoff; otherwise, it's a qvalue cutoff. DEFAULT: 0.1
--call-summits
    If set, MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region. DEFAULT: False
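The "maximum of the lambda values" rule for the dynamic local lambda can be sketched as follows. This is a simplified illustration, not the MACS implementation: the function names, the dictionary of control tag counts per window, and the inclusion of the genome-wide background lambda in the maximum are assumptions. Each window count is rescaled to the expected count in a d-sized window before the maximum is taken.

```python
import math


def local_lambda(control_counts, d, background_lambda, nolambda=False):
    """Return the lambda used for one candidate peak region.

    control_counts: {window_size_bp: control tag count in that window},
                    e.g. windows of size d, slocal and llocal.
    A window size of 0 is skipped, mirroring '--slocal 0' / '--llocal 0'.
    With nolambda=True the fixed background lambda is used instead.
    """
    if nolambda:
        return background_lambda
    candidates = [background_lambda]
    for window, count in control_counts.items():
        if window > 0:
            candidates.append(count * d / window)  # rescale to a d-sized window
    return max(candidates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): chance of seeing k or more tags."""
    cdf_below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                      for i in range(k))
    return max(0.0, 1.0 - cdf_below_k)


# Assumed example: d = 200, windows of 200, 1000 and 10000 bp.
lam = local_lambda({200: 4, 1000: 10, 10000: 50}, d=200, background_lambda=1.5)
print(lam)                  # 4.0, the d-sized window dominates here
print(poisson_sf(12, lam))  # p-value for 12 treatment tags in the region
```

Taking the maximum makes the test conservative: a region is only called if it is enriched against the highest of the local background estimates.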

17 --call-summits

18 Using MACS – test different settings
Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits
From command history, find the previous macs2 command and edit the parts highlighted in red on the slide (here: the added --call-summits and the new name after -n):
macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_<user ID> --call-summits -n cs.defaults

19 callpeak – Options – Peak calling 2
Peak calling arguments 1:
-q QVALUE, --qvalue QVALUE
    Minimum FDR (q-value) cutoff for peak detection. DEFAULT: [value missing in source]. -q and -p are mutually exclusive.
-p PVALUE, --pvalue PVALUE
    Pvalue cutoff for peak detection. DEFAULT: not set. -q and -p are mutually exclusive. If a pvalue cutoff is set, the qvalue will not be calculated and is reported as -1 in the final .xls file.
--to-large
    When set, scale the small sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/qvalues and more specific results. Keep in mind that scaling down will bring down background noise more. DEFAULT: False
--ratio RATIO
    When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling. DEFAULT: ignore
--down-sample
    When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: this option will make your results unstable and irreproducible, since random reads are selected each time. Consider using 'randsample' instead. If used together with --SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be as you expected! DEFAULT: False
--seed SEED
    Set the random seed used while down-sampling data. Must be a non-negative integer in order to be effective. DEFAULT: not set
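The direction of the default linear scaling versus --to-large can be sketched with a small helper. This only illustrates which sample gets which factor; the function name is hypothetical, and how MACS applies the factors internally is not shown. The --ratio and --down-sample options are left out.

```python
def scaling_factors(n_treat, n_control, to_large=False):
    """Return (treat_factor, control_factor) for linear scaling.

    Default: the bigger sample is scaled down to the size of the smaller one.
    to_large=True: the smaller sample is scaled up to the bigger one.
    """
    if n_treat <= 0 or n_control <= 0:
        raise ValueError("read counts must be positive")
    if n_treat == n_control:
        return 1.0, 1.0
    treat_is_bigger = n_treat > n_control
    if to_large:
        # Scale the smaller sample up.
        if treat_is_bigger:
            return 1.0, n_treat / n_control
        return n_control / n_treat, 1.0
    # Default: scale the bigger sample down.
    if treat_is_bigger:
        return n_control / n_treat, 1.0
    return 1.0, n_treat / n_control


# Assumed example: 20 million treatment reads, 10 million control reads.
print(scaling_factors(20_000_000, 10_000_000))                 # (0.5, 1.0)
print(scaling_factors(20_000_000, 10_000_000, to_large=True))  # (1.0, 2.0)
```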

20 callpeak – Options – The Model
Shifting model arguments:
-s TSIZE, --tsize TSIZE
    Tag size (=read length). This will override the auto-detected tag size. DEFAULT: not set
--bw BW
    Band width for picking regions to compute fragment size. This value is only used while building the shifting model. DEFAULT: 300
-m MFOLD MFOLD, --mfold MFOLD MFOLD
    Select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. Fold-enrichment in regions must be lower than the upper limit and higher than the lower limit. Use as "-m 10 30". DEFAULT: 5 50
--fix-bimodal
    Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the --extsize parameter to extend each tag towards the 3' direction. Not using this automatic fix is the default behavior now. DEFAULT: False
--nomodel
    Whether or not to build the shifting model. If True, MACS will not build the model. By default this means shifting size = 100; set --extsize to change it. DEFAULT: False
--shift SHIFT
    (NOT the legacy --shiftsize option!) The arbitrary shift in bp. Use discretion when setting it to anything other than the default value. When NOMODEL is set, MACS will use this value to move cutting ends (5') in the 5'->3' direction, then apply EXTSIZE to extend them to fragments. When this value is negative, ends will be moved in the 3'->5' direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the EXTSIZE option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note: you can't set values other than 0 if the format is BAMPE for paired-end data. DEFAULT: 0
--extsize EXTSIZE
    The arbitrary extension size in bp. When nomodel is true, MACS will use this value as the fragment size to extend each read towards the 3' end, then pile them up. It is exactly twice the obsolete SHIFTSIZE value. In previous language, each read is moved in the 5'->3' direction to the middle of the fragment by 1/2 d, then extended in both directions by 1/2 d. This is equivalent to saying that each read is extended 5'->3' into a d-size fragment. DEFAULT: 200. EXTSIZE and SHIFT can be combined when necessary. Check the SHIFT option.
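The interplay of --shift and --extsize under --nomodel is easier to see with coordinates. The sketch below follows the wording of the help text: the 5' cutting end is first moved by SHIFT in the read's own 5'->3' direction, then extended by EXTSIZE in that same direction. The function name and the (start, end) return convention are assumptions made for illustration.

```python
def fragment_interval(five_prime_end, strand, shift=0, extsize=200):
    """Return (start, end) of the fragment built from one read.

    five_prime_end: genomic coordinate of the read's 5' (cutting) end.
    strand: '+' or '-'. For '-' reads, 5'->3' runs towards lower coordinates.
    """
    if strand == "+":
        start = five_prime_end + shift
        return start, start + extsize
    if strand == "-":
        end = five_prime_end - shift
        return end - extsize, end
    raise ValueError("strand must be '+' or '-'")


# ChIP-seq style: no shift, extend each read into a 200 bp fragment.
print(fragment_interval(1000, "+"))  # (1000, 1200)
print(fragment_interval(1000, "-"))  # (800, 1000)

# Cutting-site style: shift = -1 * half of extsize centers the window
# on the cutting end, regardless of strand.
print(fragment_interval(1000, "+", shift=-100, extsize=200))  # (900, 1100)
print(fragment_interval(1000, "-", shift=-100, extsize=200))  # (900, 1100)
```

The last two calls show why the help text suggests -1 * half of EXTSIZE for cutting-site data: both strands then pile up on a window centered at the cut.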

21 macs model: the ‘d’ (page 1)

22 macs model: the ‘d’ (page 2)
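The idea behind 'd' can be illustrated with a toy calculation: reads on the + strand pile up upstream of a binding site and reads on the - strand downstream, and d is the distance between the two piles. The sketch below is a deliberately simplified stand-in, not the MACS model-building procedure (which works on paired peaks in regions selected through --bw and --mfold). It simply searches for the offset at which the most + strand 5' ends line up with - strand 5' ends; the function name and the search range are assumptions.

```python
from collections import Counter


def estimate_d(plus_ends, minus_ends, max_shift=500):
    """Return the offset (in bp) that best aligns + strand 5' ends
    with - strand 5' ends.

    plus_ends / minus_ends: lists of 5' end coordinates on one chromosome.
    """
    minus_counts = Counter(minus_ends)
    best_shift, best_score = 0, -1
    for shift in range(max_shift + 1):
        # Number of + strand ends that land on a - strand end after shifting.
        score = sum(minus_counts[p + shift] for p in plus_ends)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


# Toy data: two binding sites, each with - strand ends 200 bp downstream
# of the + strand ends.
plus = [100, 101, 102, 500]
minus = [300, 301, 302, 700]
print(estimate_d(plus, minus))  # 200
```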

23 Using MACS – test different settings
Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits
From command history, find the previous macs2 command and edit the parts highlighted in red on the slide (here: the added --bw 200 and the new name after -n):
macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_<user ID> --call-summits --bw 200 -n bw200.cs

24 Using MACS – test different settings
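Run 1: Using default settings
Run 2: Call summits
Run 3: Adjust model band width
Run 4: Adjust mfold limits
From command history, find the previous macs2 command and edit the parts highlighted in red on the slide (here: the added -m limits and the new name after -n):
macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam --outdir macsout_<user ID> --call-summits --bw 200 -m <lower> <upper> -n m40.80.bw200.cs
The two -m limits are missing from the transcript; the run name m40.80 suggests 40 and 80.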
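Since the four runs differ only in a few flags and in the name given to -n, they can also be generated programmatically instead of edited by hand. The sketch below only builds and prints the command lines; it does not run macs2. The helper name and the folder name "macsout_demo" are hypothetical, and the mfold limits 40 and 80 for run 4 are an assumption inferred from the run name m40.80.bw200.cs.

```python
import shlex


def callpeak_command(name, outdir, call_summits=False, bw=None, mfold=None,
                     treat="seq/treat_chr3.sam", control="seq/input_chr3.sam"):
    """Build the argument list for one 'macs2 callpeak' run."""
    argv = ["macs2", "callpeak", "-t", treat, "-c", control, "--outdir", outdir]
    if call_summits:
        argv.append("--call-summits")
    if bw is not None:
        argv += ["--bw", str(bw)]
    if mfold is not None:
        lower, upper = mfold
        argv += ["-m", str(lower), str(upper)]
    argv += ["-n", name]
    return argv


outdir = "macsout_demo"  # stands in for macsout_<user ID>
runs = [
    callpeak_command("defaults", outdir),
    callpeak_command("cs.defaults", outdir, call_summits=True),
    callpeak_command("bw200.cs", outdir, call_summits=True, bw=200),
    # Assumed limits, inferred from the run name only.
    callpeak_command("m40.80.bw200.cs", outdir, call_summits=True, bw=200,
                     mfold=(40, 80)),
]
for argv in runs:
    print(shlex.join(argv))
```

Each list could be passed to subprocess.run(argv, check=True) on a machine where macs2 is installed; building a list rather than a single string avoids shell quoting problems with unusual file names.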

25 ChIP-seq results compared

26 ChIP-seq results: run 3 vs 1-2

27 ChIP-seq results: run 4 vs 1-3

28 Summary
MACS v2 is easy to use, even on the command line
It has many settings, but in the end only a few really need to be used, and fewer still (typically) need optimizing for your project
BUT, if you need to optimize, there are many options for doing it
It can also detect broad peak regions, and it allows for custom analysis protocols via the bedGraph outputs

