ChIP-seq QC. Xiaole Shirley Liu, STAT115, STAT215.

Presentation transcript:

ChIP-seq QC Xiaole Shirley Liu STAT115, STAT215

Initial QC
- FastQC
- Mappability
  - Uniquely mapped reads
  - Uniquely mapped locations
  - Uniquely mapped locations / uniquely mapped reads
- Good to keep one read per location in peak calling
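The ratio of uniquely mapped locations to uniquely mapped reads can be sketched as below. This is a minimal illustration, assuming each uniquely mapped read has already been reduced to a (chromosome, start, strand) tuple; the function names are hypothetical and not taken from any particular tool.

```python
def unique_location_ratio(reads):
    """Return uniquely mapped locations / uniquely mapped reads.

    `reads` is an iterable of (chrom, start, strand) tuples, one per
    uniquely mapped read (an assumed input format).
    """
    reads = list(reads)
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def keep_one_read_per_location(reads):
    """Keep only the first read seen at each (chrom, start, strand)."""
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


example = [("chr1", 100, "+"), ("chr1", 100, "+"),
           ("chr1", 100, "-"), ("chr2", 5, "+")]
ratio = unique_location_ratio(example)
deduplicated = keep_one_read_per_location(example)
```

A ratio close to 1 means few reads pile up at identical positions; a low ratio suggests heavy duplication.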

Peak Calls
- Tag distribution along the genome ~ Poisson distribution (λ_BG = total tags / genome size)
- ChIP-seq shows local biases in the genome
  - Chromatin and sequencing bias
  - Short (bp-scale) control windows have too few tags
  - But can look further
- Dynamic λ_local = max(λ_BG, [λ_ctrl, λ_1k,] λ_5k, λ_10k)
- Figure: ChIP and control tracks with 300 bp, 1 kb, 5 kb, and 10 kb windows
- Zhang et al, Genome Bio, 2008
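The dynamic local λ and the resulting Poisson tail probability might be computed roughly as follows. This is a sketch under stated assumptions, not the MACS implementation: control counts are assumed to be already scaled to the ChIP sequencing depth, each control window is rescaled linearly to the candidate window width, and all function names are hypothetical.

```python
import math


def background_lambda(total_tags, genome_size, window_bp):
    """Expected tags in a window of `window_bp` under a uniform background."""
    return total_tags * window_bp / genome_size


def dynamic_lambda(lambda_bg, ctrl_counts, window_bp):
    """Return max(lambda_BG, rescaled control rates).

    `ctrl_counts` maps a control window width in bp (e.g. 5000, 10000)
    to the control tag count in a window of that width centred on the
    candidate region (assumed already depth-normalised).
    """
    rates = [lambda_bg]
    for width, count in ctrl_counts.items():
        rates.append(count * window_bp / width)
    return max(rates)


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed term by term."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


lam_bg = background_lambda(1_000_000, 100_000_000, 300)
lam_local = dynamic_lambda(lam_bg, {5000: 100, 10000: 150}, 300)
p_value = poisson_sf(20, lam_local)
```

Taking the maximum keeps the test conservative: a region is only called a peak if it beats the highest of the background estimates.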

Peak Call Statistics
- P-value and FDR
- Simulation: random sampling of reads?
- FDR = A / B (A and B as labeled in the slide figure), BH correction or Q-value
- P-value / FDR changes with sequencing depth
- Fold change does not
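The BH (Benjamini-Hochberg) correction mentioned on this slide can be sketched as below. This is a plain-Python illustration of the standard step-up adjustment; the function name is hypothetical, and peak callers may compute their FDR differently.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```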

ChIP-seq QC
- Number of peaks with good FDR and fold change
- FRiP score:
  - Fraction of reads in peaks
  - Often higher for histone modifications than for transcription factors
  - Often increases slightly with increasing read depth
- Overlap with union of peaks in public DNase-seq data
  - In a working ChIP-seq experiment, > 70% of peaks overlap the union DHS
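A FRiP score could be computed along these lines. This is a minimal sketch, assuming reads are represented by a single position per read and peaks are non-overlapping half-open (start, end) intervals per chromosome; the data layout and function name are illustrative only.

```python
import bisect


def frip(read_positions, peaks):
    """Fraction of reads whose position falls inside any peak.

    read_positions: dict chrom -> list of read positions.
    peaks: dict chrom -> list of non-overlapping (start, end) intervals,
           half-open, i.e. start <= pos < end counts as inside.
    """
    total = 0
    in_peaks = 0
    for chrom, positions in read_positions.items():
        intervals = sorted(peaks.get(chrom, []))
        starts = [start for start, _ in intervals]
        for pos in positions:
            total += 1
            # Index of the last interval starting at or before pos.
            i = bisect.bisect_right(starts, pos) - 1
            if i >= 0 and pos < intervals[i][1]:
                in_peaks += 1
    return in_peaks / total if total else 0.0


score = frip(
    {"chr1": [5, 15, 25, 105], "chr2": [10]},
    {"chr1": [(10, 30), (100, 110)]},
)
```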

DNase-seq
- Captures all regulatory sequences in the prostate genome
- Sabo et al, Nat Methods 2006; Thurman et al, Nature 2012

ChIP-seq QC
- Evolutionary conservation
  - Can be used for ChIP QC
- Conserved sites more functional?
  - Majority of functional sites are not conserved
- Odom et al, Nat Genet 2007

Enrichment Distribution
- CEAS (Shin et al, Bioinformatics, 2009)
  - Meta-gene profiles: TF and histone marks
  - % of peaks at promoters, exons, introns, and distal intergenic sequences
  - SitePro of signal at specific sites
- Replicate agreement: > 60% or > 0.6

ChIP-seq Downstream Analysis

Target Gene Assignment
- Figure: yeast TF regulatory network (labels: Protein, Gene, Regulate, Transcribe)

Human TF Binding Distribution
- Most TF binding sites are outside promoters
- How to assign targets?
  - Nearest distance?
  - Binding within 10 KB?
  - Number of binding sites?
  - Other knowledge?

Higher Order Chromatin Interactions
- Chromatin conformation capture

Hi-C
- Interactions follow exponential decay with distance
- Lieberman-Aiden et al, Science 2009

How to Assign Targets for Enhancer Binding Transcription Factors?
- Regulatory potential: sum of binding sites weighted by distance to TSS with exponential decay
- Decay modeled from Hi-C experiments
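The regulatory potential described here can be sketched as a distance-weighted sum. The decay length and the distance cutoff below are hypothetical illustration values, not parameters from the slides, and the function name is made up for this example.

```python
import math


def regulatory_potential(tss, peak_positions,
                         decay_bp=10_000.0, max_dist_bp=100_000):
    """Sum of exp(-distance / decay_bp) over binding sites near a TSS.

    decay_bp and max_dist_bp are assumed values chosen for illustration;
    in practice the decay would be fitted, e.g. from Hi-C data.
    """
    score = 0.0
    for pos in peak_positions:
        distance = abs(pos - tss)
        if distance <= max_dist_bp:
            score += math.exp(-distance / decay_bp)
    return score


# One site at the TSS, one 10 kb away, one too far to count.
potential = regulatory_potential(1_000, [1_000, 11_000, 500_000])
```

Under this weighting, several weak distal sites can contribute as much as one promoter-proximal site, which avoids the hard cutoffs of nearest-gene or fixed-window assignment.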

Direct Target Identification
- Binary decision?
- Rank product of regulatory potential and differential expression
- BETA
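A rank product of regulatory potential and differential expression could look roughly like this. It is a simplified sketch, assuming each gene has a regulatory potential score and a differential-expression p-value; ties are broken by input order, and the function name is hypothetical rather than BETA's actual interface.

```python
def rank_product(reg_potential, de_pvalue):
    """Rank genes by (binding rank / n) * (expression rank / n).

    reg_potential: dict gene -> regulatory potential (higher is better).
    de_pvalue: dict gene -> differential expression p-value (lower is better).
    Returns (gene, score) pairs sorted from most to least likely target.
    """
    genes = [g for g in reg_potential if g in de_pvalue]
    n = len(genes)
    by_binding = sorted(genes, key=lambda g: reg_potential[g], reverse=True)
    by_expr = sorted(genes, key=lambda g: de_pvalue[g])
    binding_rank = {g: i + 1 for i, g in enumerate(by_binding)}
    expr_rank = {g: i + 1 for i, g in enumerate(by_expr)}
    scores = {g: (binding_rank[g] / n) * (expr_rank[g] / n) for g in genes}
    return sorted(scores.items(), key=lambda item: item[1])


ranked = rank_product(
    {"A": 3.0, "B": 1.0, "C": 2.0},
    {"A": 0.001, "B": 0.5, "C": 0.01},
)
```

A small product means the gene ranks highly on both binding and expression change, which gives a graded score instead of a binary target call.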

Is My Factor an Activator, Repressor, or Both?
- Most labs have differential expression profiling of the transcription factor together with TF ChIP-seq
- Do genes with higher regulatory potential show more up- or down-expression than all the genes in the genome?
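One simple way to approach this question is to compare the fraction of up- and down-regulated genes among the top-ranked genes with the fraction genome-wide. The sketch below is a descriptive comparison only, with no statistical test; the log fold change cutoff and the function name are assumptions for illustration.

```python
def direction_enrichment(reg_potential, log_fold_change, top_n,
                         lfc_cutoff=1.0):
    """Compare up/down fractions in the top_n genes versus all genes.

    reg_potential: dict gene -> regulatory potential.
    log_fold_change: dict gene -> log fold change after perturbing the TF.
    lfc_cutoff is an assumed threshold for calling a gene up or down.
    """
    genes = [g for g in reg_potential if g in log_fold_change]
    ranked = sorted(genes, key=lambda g: reg_potential[g], reverse=True)

    def fractions(group):
        if not group:
            return (0.0, 0.0)
        up = sum(log_fold_change[g] >= lfc_cutoff for g in group)
        down = sum(log_fold_change[g] <= -lfc_cutoff for g in group)
        return (up / len(group), down / len(group))

    return {"top": fractions(ranked[:top_n]), "all": fractions(genes)}


result = direction_enrichment(
    {"A": 5.0, "B": 4.0, "C": 1.0, "D": 0.5},
    {"A": 2.0, "B": 1.5, "C": -2.0, "D": 0.0},
    top_n=2,
)
```

If the top genes are enriched for up-regulation only, the factor looks like an activator; enrichment in both directions suggests it acts as both.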

ChIP-chip/seq Motif Finding
- ChIP-chip gives binding regions ~ bp long. Precise binding motif?
  - Raw data is like perfect clustering, plus enrichment values
- MDscan
  - High ChIP ranking => true targets, contain more sites
  - Search TF motif from highest ranking targets first (high signal / background ratio)
  - Refine candidate motifs with all targets

Similarity Defined by m-match
- For a given w-mer and any other random w-mer
- m-matches for TGTAACGT:

  TGTAACGT   8-mer
  TGTAACGT   matched 8
  AGTAACGT   matched 7
  TGCAACAT   matched 6
  TGACACGG   matched 5
  AATAACAG   matched 4

- Pick a reasonable m to call two w-mers similar
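The m-match count is simply the number of positions at which two w-mers agree. A minimal sketch, with hypothetical function names, checked against the examples on this slide:

```python
def match_count(a, b):
    """Number of positions at which two equal-length w-mers agree."""
    if len(a) != len(b):
        raise ValueError("w-mers must have the same length")
    return sum(x == y for x, y in zip(a, b))


def is_m_match(a, b, m):
    """True if the two w-mers agree at m or more positions."""
    return match_count(a, b) >= m


seed = "TGTAACGT"
counts = {other: match_count(seed, other)
          for other in ["TGTAACGT", "AGTAACGT", "TGCAACAT",
                        "TGACACGG", "AATAACAG"]}
```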

MDscan Seeds
- Figure: ChIP-chip selected upstream sequences, ordered by enrichment
- Among the higher-enrichment sequences, a 9-mer (ATTGCAAAT) and its m-matches (TTTGCGAAT, TTTGCAAAT) form a seed motif pattern
- Other w-mers shown in the figure include TTGCAAATC, TGCAAATCC, GCAAATCCA, GGAAATCCA, GCCACCGT, ACCACGGT, …

Update Motifs With Remaining Seqs
- Figure: Seed 1 and its m-matches, from the extreme high rank sequences down through all ChIP-selected targets

Refine the Motifs
- Figure: Seed 1 and its m-matches, from the extreme high rank sequences down through all ChIP-selected targets

Further Refine Motifs
- Could also be used to examine known motif enrichment
- Is motif enrichment correlated with ChIP-seq enrichment?
- Is the motif more enriched in peak summits than in peak flanks?
- Motif analysis could identify transcription factor partners of ChIP-seq factors

Estrogen Receptor
- Carroll et al, Cell 2005
- Overactive in > 70% of breast cancers
- Where does it go in the genome?
- ChIP-chip on chr21/22, motif and expression analysis found its “pioneering factor” FoxA1
- Figure labels: TF??, ER

Estrogen Receptor (ER) Cistrome in Breast Cancer
- Carroll et al, Nat Genet 2006
- ER may function far away (KB scale) from genes
- Only 20% of ER sites have PhastCons > 0.2
- ER has different effects depending on its collaborators
- Figure labels: AP1, ER, NRIP

Cell Type-Specific Binding
- The same TF binds to very different locations in different tissues and conditions. Why?
- TF concentration?
- Collaborating factors, especially pioneering factors
- Interesting observations about pioneering factors

Summary
- ChIP-seq identifies genome-wide in vivo protein-DNA interaction sites
- ChIP-seq peak calling shifts reads and calculates enrichment and FDR correctly
- Functional analysis of ChIP-seq data:
  - Strong vs weak binding, conserved vs non-conserved
  - Target identification
  - Motif analysis
- Cell type-specific binding → Epigenetics