ChIP-seq QC Xiaole Shirley Liu STAT115, STAT215. Initial QC FASTQC Mappability Uniquely mapped reads Uniquely mapped locations Uniquely mapped locations.

1 ChIP-seq QC Xiaole Shirley Liu STAT115, STAT215

2 Initial QC
FASTQC
Mappability
–Uniquely mapped reads
–Uniquely mapped locations
–Uniquely mapped locations / uniquely mapped reads
Good to keep one read per location in peak calling
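The ratio of uniquely mapped locations to uniquely mapped reads, and the "one read per location" filter, can be sketched as follows. This is a minimal illustration, assuming each uniquely mapped read is represented as a hypothetical `(chrom, start, strand)` tuple; the function names are not from any particular tool.

```python
def location_to_read_ratio(reads):
    """Return (unique locations) / (uniquely mapped reads).

    `reads` is a list of (chrom, start, strand) tuples (assumed layout).
    A ratio near 1 means few duplicate reads; a low ratio means many
    reads pile up at the same locations.
    """
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def keep_one_read_per_location(reads):
    """Keep only the first read seen at each (chrom, start, strand)."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


reads = [("chr1", 100, "+"), ("chr1", 100, "+"),
         ("chr1", 100, "-"), ("chr2", 50, "+")]
ratio = location_to_read_ratio(reads)
deduplicated = keep_one_read_per_location(reads)
```

Reads on opposite strands at the same coordinate are treated as different locations here; that is a design choice of this sketch, not something the slide specifies.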

3 Peak Calls
Tag distribution along the genome ~ Poisson distribution (λ_BG = total tags / genome size)
ChIP-seq shows local biases in the genome
–Chromatin and sequencing bias
–200-300 bp control windows have too few tags
–But can look further
Dynamic λ_local = max(λ_BG, [λ_ctrl, λ_1k,] λ_5k, λ_10k)
Figure: ChIP and control tracks with 300 bp, 1 kb, 5 kb, and 10 kb windows
http://liulab.dfci.harvard.edu/MACS/
Zhang et al, Genome Biol, 2008
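The dynamic λ_local idea can be sketched in a few lines. This is an illustration under stated assumptions, not the MACS implementation: control tag counts in each window are assumed to be already scaled to the ChIP library size, every rate is rescaled to the candidate window length, and the names `local_lambda` and `poisson_sf` are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly (fine for small k)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):      # add P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def local_lambda(lam_bg, control_counts, window_bp):
    """Largest expected tag count for a candidate window of `window_bp`.

    lam_bg: genome-wide expectation for the candidate window.
    control_counts: {control window size in bp: control tag count}.
    """
    candidates = [lam_bg]
    for size_bp, count in control_counts.items():
        candidates.append(count * window_bp / size_bp)
    return max(candidates)


# Hypothetical numbers: a 300 bp candidate window with 15 ChIP tags.
lam = local_lambda(2.0, {1000: 5, 5000: 100, 10000: 120}, 300)
p_value = poisson_sf(15, lam)
```

Taking the maximum makes the test conservative: a region with locally elevated control signal needs more ChIP tags to reach the same p-value.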

4 Peak Call Statistics
P-value and FDR
Simulation: random sampling of reads?
FDR = A / B (A and B are labeled in the slide figure), BH correction or q-value
P-value and FDR change with sequencing depth
Fold change does not
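The BH (Benjamini-Hochberg) correction mentioned above can be written compactly. The sketch below is a plain-Python version for illustration; the function name is an assumption and it is not taken from any peak caller.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone in the rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Peaks whose adjusted value falls below the chosen FDR threshold are kept.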

5 ChIP-seq QC
Number of peaks with good FDR and fold change
FRiP score:
–Fraction of reads in peaks
–Often higher for histone modifications than for transcription factors
–Often increases slightly with increasing read depth
Overlap with union of peaks in public DNase-seq data
–In a working ChIP-seq experiment, > 70% of peaks overlap the union DHS
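A FRiP score can be computed with a sorted interval lookup. This is a minimal sketch, assuming reads are reduced to hypothetical `(chrom, position)` pairs and peaks are non-overlapping half-open `(chrom, start, end)` intervals.

```python
from bisect import bisect_right
from collections import defaultdict


def frip(reads, peaks):
    """Fraction of reads whose position falls inside any peak."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()
    starts = {c: [s for s, _ in iv] for c, iv in by_chrom.items()}

    in_peaks = 0
    total = 0
    for chrom, pos in reads:
        total += 1
        if chrom not in starts:
            continue
        # Last peak that starts at or before this position.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < by_chrom[chrom][idx][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
reads = [("chr1", 150), ("chr1", 200), ("chr1", 550), ("chr2", 150)]
score = frip(reads, peaks)
```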

6 DNase-seq
Captures all regulatory sequences in the prostate genome
Sabo et al, Nat Methods 2006; Thurman et al, Nature 2012

7 ChIP-seq QC
Evolutionary conservation
–Can be used for ChIP QC
Are conserved sites more functional?
–Majority of functional sites are not conserved
Odom et al, Nat Genet 2007

8 Enrichment Distribution
CEAS (Shin et al, Bioinformatics, 2009)
–Meta-gene profiles: TF and histone marks
–% of peaks at promoters, exons, introns, and distal intergenic sequences
–SitePro of signal at specific sites
Replicate agreement: > 60% or > 0.6

9 ChIP-seq Downstream Analysis

10 Target Gene Assignment
Figure: yeast TF regulatory network (labels: Protein, Gene, Regulate, Transcribe)

11 Human TF Binding Distribution
Most TF binding sites are outside promoters
How to assign targets?
–Nearest distance?
–Binding within 10 kb?
–Number of binding sites?
–Other knowledge?

12 Higher Order Chromatin Interactions
Chromosome conformation capture

13 Hi-C
Interactions follow exponential decay with distance
Lieberman-Aiden et al, Science 2009

14 How to Assign Targets for Enhancer Binding Transcription Factors?
Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay
Decay modeled from Hi-C experiments
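The regulatory potential described above can be sketched as a sum of exponentially decaying weights. The decay constant and distance cutoff below (`decay_bp`, `max_distance_bp`) are hypothetical illustration values, not the parameters fitted from Hi-C data.

```python
import math


def regulatory_potential(tss, binding_sites, decay_bp=10_000,
                         max_distance_bp=100_000):
    """Sum exp(-distance / decay_bp) over binding sites near one TSS.

    tss: TSS coordinate; binding_sites: peak summit coordinates on the
    same chromosome. Sites beyond max_distance_bp contribute nothing.
    """
    score = 0.0
    for site in binding_sites:
        distance = abs(site - tss)
        if distance <= max_distance_bp:
            score += math.exp(-distance / decay_bp)
    return score


# One site at the TSS, one 10 kb away, one too far to count.
potential = regulatory_potential(1_000_000, [1_000_000, 1_010_000, 1_500_000])
```

A gene with several nearby sites scores higher than a gene with one distant site, which avoids a hard "nearest gene" or "within 10 kb" rule.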

15 Direct Target Identification
Binary decision?
Rank product of regulatory potential and differential expression
BETA
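A rank product of the two evidence sources can be sketched as follows. This is an illustration only, assuming both inputs are hypothetical dictionaries mapping gene name to a score where higher means stronger evidence; ties are broken arbitrarily, and this is not the BETA code itself.

```python
def ranks(scores):
    """Rank genes from 1 (highest score) to n (lowest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_product(potential, expression):
    """Return [(gene, product)] sorted so the best candidates come first.

    The product of the two normalized ranks is small only when a gene
    ranks well on both regulatory potential and differential expression.
    """
    genes = sorted(set(potential) & set(expression))
    n = len(genes)
    potential_rank = ranks({g: potential[g] for g in genes})
    expression_rank = ranks({g: expression[g] for g in genes})
    products = {g: (potential_rank[g] / n) * (expression_rank[g] / n)
                for g in genes}
    return sorted(products.items(), key=lambda item: item[1])


result = rank_product({"A": 3.0, "B": 1.0, "C": 2.0},
                      {"A": 5.0, "B": 4.0, "C": 1.0})
```

Ranking replaces the binary "target or not" decision with an ordered list that can be cut at any threshold.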

16 Is My Factor an Activator, Repressor, or Both?
Most labs have differential expression profiling of the transcription factor together with TF ChIP-seq
Do genes with higher regulatory potential show more up- or down-regulation than all the genes in the genome?
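One way to ask this question is to walk down the genes ranked by regulatory potential and track how quickly the up-regulated (or down-regulated) genes accumulate compared with all genes. The sketch below is one possible reading, with hypothetical function names; it computes a one-sided, Kolmogorov-Smirnov-like gap and leaves significance testing aside.

```python
def cumulative_fraction(ranked_genes, gene_set):
    """Fraction of gene_set recovered within the top k genes, for each k."""
    total = len(gene_set)
    hits = 0
    fractions = []
    for gene in ranked_genes:
        if gene in gene_set:
            hits += 1
        fractions.append(hits / total if total else 0.0)
    return fractions


def max_gap_over_background(ranked_genes, gene_set):
    """Largest amount by which gene_set runs ahead of the all-genes curve."""
    n = len(ranked_genes)
    fractions = cumulative_fraction(ranked_genes, gene_set)
    return max((f - (k + 1) / n for k, f in enumerate(fractions)),
               default=0.0)


# Genes sorted by decreasing regulatory potential (toy data).
ranked = ["A", "B", "C", "D"]
up_gap = max_gap_over_background(ranked, {"A", "B"})
down_gap = max_gap_over_background(ranked, {"D"})
```

A large gap for up-regulated genes only would suggest an activator, for down-regulated genes only a repressor, and for both a dual role.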

17 ChIP-chip/seq Motif Finding
ChIP-chip gives 10-5000 binding regions, ~200-1000 bp long. Precise binding motif?
–Raw data is like perfect clustering, plus enrichment values
MDscan
–High ChIP ranking => true targets, contain more sites
–Search TF motif from highest ranking targets first (high signal / background ratio)
–Refine candidate motifs with all targets

18 Similarity Defined by m-match
For a given w-mer and any other random w-mer
m-matches for TGTAACGT (8-mer):
TGTAACGT    matched 8
AGTAACGT    matched 7
TGCAACAT    matched 6
TGACACGG    matched 5
AATAACAG    matched 4
Pick a reasonable m to call two w-mers similar
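Counting matches between two w-mers is a position-by-position comparison. The short sketch below reproduces the counts in the table above; the function names are illustrative and not taken from MDscan.

```python
def count_matches(wmer_a, wmer_b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(wmer_a) != len(wmer_b):
        raise ValueError("w-mers must have the same length")
    return sum(a == b for a, b in zip(wmer_a, wmer_b))


def is_m_match(wmer_a, wmer_b, m):
    """True if the two w-mers agree in at least m positions."""
    return count_matches(wmer_a, wmer_b) >= m


seed = "TGTAACGT"
counts = {other: count_matches(seed, other)
          for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT",
                        "TGACACGG", "AATAACAG"]}
```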

19 MDscan Seeds
Figure: ChIP-chip selected upstream sequences, ordered by enrichment (higher enrichment at top). A 9-mer seed (ATTGCAAAT) is grouped with its m-matches (TTTGCAAAT, TTTGCGAAT) into a seed motif pattern. Other w-mers shown include TTGCAAATC, GCAAATCCA, TGCAAATCC, and GCCACCGT.

20 Update Motifs With Remaining Seqs
Figure: Seed1 m-matches, extended from the extreme high rank sequences to all ChIP-selected targets

21 Refine the Motifs
Figure: Seed1 m-matches, extended from the extreme high rank sequences to all ChIP-selected targets

22 Further Refine Motifs
Could also be used to examine known motif enrichment
Is motif enrichment correlated with ChIP-seq enrichment?
Is the motif more enriched at peak summits than in peak flanks?
Motif analysis could identify transcription factor partners of ChIP-seq factors

23 Estrogen Receptor
Carroll et al, Cell 2005
Overactive in > 70% of breast cancers
Where does it go in the genome?
ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
Figure: ER with an unknown partner TF (TF??)

24 Estrogen Receptor (ER) Cistrome in Breast Cancer
Carroll et al, Nat Genet 2006
ER may function far away (100-200 kb) from genes
Only 20% of ER sites have PhastCons > 0.2
ER has different effects depending on its collaborators
Figure labels: AP1, ER, NRIP

26 Cell Type-Specific Binding
The same TF binds to very different locations in different tissues and conditions. Why?
–TF concentration?
–Collaborating factors, especially pioneering factors
Interesting observations about pioneering factors

27 Summary
ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
ChIP-seq peak calling: shift reads, and calculate correct enrichment and FDR
Functional analysis of ChIP-seq data:
–Strong vs weak binding, conserved vs non-conserved
–Target identification
–Motif analysis
Cell type-specific binding → Epigenetics

