Chromatin Immunoprecipitation (ChIP)-chip Analysis 11/07/07

Experimental Protocol
Step 1: crosslink (fix protein to DNA)
Step 2: sonication (break the DNA into fragments)
Step 3: immunoprecipitation (pull down the target protein with a specific antibody)
Step 4: hybridization (hybridize input and pulled-down DNA on a microarray)
(Kim and Ren 2007)

Intergenic microarray
Array probes are PCR products of intergenic regions.
Each binding signal is represented by a single probe.

ChIP-array
Regions consistently enriched across repeated ChIP-arrays are selected as the TF binding targets.
There are usually hundreds of targets, each ~1000 bases long.
We want to know the precise binding site (e.g. 10 bases).

Tiling arrays
Microarray probes are oligonucleotide sequences with regular spacing, covering a whole genomic region.
(Figure: chromosome with tiled probes)

Tiling Array Data
Each TF binding signal is represented by multiple probes.
More sophisticated statistical tools are needed.
(Kim and Ren 2007)

Methods
Moving average t-test (Keles et al. 2004)
HMM (Li et al. 2005; Yuan et al. 2005)
TileMap (Ji and Wong 2005)
MAT (Johnson et al. 2006)

Keles' method
Calculate a two-sample t-statistic at each probe $i$, comparing the ChIP signal $Y_2$ with the input signal $Y_1$:
$T_i = \dfrac{\bar{Y}_{2,i} - \bar{Y}_{1,i}}{\sqrt{s_{2,i}^2/n_2 + s_{1,i}^2/n_1}}$
Moving average scan statistic: average the $T_i$ over a window of $w$ neighboring probes.
(Keles et al. 2004)
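The two steps of the moving average t-test can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an unequal-variance (Welch) form of the t-statistic, a window that starts at each probe and runs downstream, and simulated data. The function names `welch_t` and `moving_average` are hypothetical.

```python
import numpy as np

def welch_t(chip, inp):
    """Per-probe two-sample t-statistic.

    chip, inp: arrays of shape (replicates, probes) holding log-intensities.
    """
    n2, n1 = chip.shape[0], inp.shape[0]
    diff = chip.mean(axis=0) - inp.mean(axis=0)
    se = np.sqrt(chip.var(axis=0, ddof=1) / n2 + inp.var(axis=0, ddof=1) / n1)
    return diff / se

def moving_average(t, w):
    """Average t over every run of w consecutive probes (length len(t) - w + 1)."""
    return np.convolve(t, np.ones(w) / w, mode="valid")

# Simulated data (an assumption for illustration): 3 replicates, 200 probes,
# with a bound region at probes 90..109 in the ChIP channel.
rng = np.random.default_rng(0)
inp = rng.normal(0.0, 1.0, size=(3, 200))
chip = rng.normal(0.0, 1.0, size=(3, 200))
chip[:, 90:110] += 3.0

t = welch_t(chip, inp)
scan = moving_average(t, w=10)
print("window with the largest scan statistic starts at probe", int(np.argmax(scan)))
```

Averaging over the window trades resolution for noise reduction, so the choice of `w` should roughly match the number of probes a bound fragment is expected to cover.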

Multiple hypothesis testing
Multiple hypothesis testing must be accounted for to control the false positive rate.
What is the null distribution of this statistic?

Multiple hypothesis testing
Assume the per-probe statistic has a t-distribution, and approximate the moving average by a normal distribution.
Alternatively, a resampling method can be used to estimate the null distribution.
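One way to read the normal approximation is sketched below. It assumes, purely for illustration, that the $w$ statistics in a window are independent and share the same degrees of freedom `df`, so that their average has mean 0 and variance `df / ((df - 2) * w)` under the null. The function name `scan_pvalue` is hypothetical, and the resulting p-values would still need a multiple-testing correction.

```python
import math

def scan_pvalue(scan_value, w, df):
    """One-sided p-value for a moving average of w independent t_df statistics,
    using a normal approximation with mean 0 and variance df / ((df - 2) * w)."""
    if df <= 2:
        raise ValueError("the t-distribution has finite variance only for df > 2")
    sd = math.sqrt(df / ((df - 2) * w))
    z = scan_value / sd
    # Upper-tail probability of a standard normal, P(Z > z)
    return 0.5 * math.erfc(z / math.sqrt(2))

print(scan_pvalue(1.0, w=10, df=4))
```

Neighboring probes are rarely independent in practice, which is one reason a resampling estimate of the null distribution can be preferable.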

TileMap
Improves on Keles' method in the following ways:
Uses a more robust test statistic.
Estimates the null distribution without prior assumptions.
(Ji and Wong 2005)

Step 1: calculating a t-like test statistic
Model: the log-intensity $X_{ijk}$, where $i$ is the probe index, $j$ the condition index, and $k$ the replicate index.
Pooling data.

Step 1: calculating a t-like test statistic
The statistic is defined for two samples and for multiple samples.
In both cases we want a robust estimate of the variance.

Step 1: calculating a t-like test statistic
Notation
The variance is estimated by variance shrinkage, controlled by a shrinkage factor.
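The general idea of variance shrinkage can be sketched as below. This is a generic illustration rather than TileMap's exact estimator: each per-probe sample variance is pulled toward the mean variance across probes, and the shrinkage factor `B` is taken as a given input (how it is estimated is not shown here). The function name `shrink_variances` is hypothetical.

```python
import numpy as np

def shrink_variances(s2, B):
    """Pull each per-probe sample variance toward the mean variance across probes.

    B = 0 keeps the raw variances; B = 1 replaces all of them with the mean.
    """
    s2 = np.asarray(s2, dtype=float)
    return (1.0 - B) * s2 + B * s2.mean()

# With few replicates, raw variances can be tiny by chance and inflate the
# t-statistic; shrinkage moves them toward a common value.
print(shrink_variances([0.01, 1.0, 2.0], B=0.5))
```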

Step 2: Merging data
Moving average.
Alternatively, use a Hidden Markov Model.

Step 3: control FDR
Goal: find the null and signal distributions.
Idea: assume a mixture model. This is unidentifiable!
A clever trick: look for two auxiliary distributions, $g_0$ and $g_1$.

How to find $g_0$ and $g_1$
To get $g_1$, can we select the probes with the highest t-scores? Why or why not?

How to find $g_0$ and $g_1$
Idea: signals at neighboring probes are correlated, whereas noise is not (hopefully!).
First select the probes that have the highest t-scores $t_i$.
Use their downstream values $t_{i+1}$ to estimate $g_1$.
Use the same trick to estimate $g_0$.
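The neighbor trick can be sketched as follows. This is a minimal illustration under stated assumptions: probes are ordered by genomic position, the selection fraction `frac` is an arbitrary choice, and "the same trick" for $g_0$ is taken to mean selecting the probes with the lowest t-scores. The function name `neighbor_samples` is hypothetical.

```python
import numpy as np

def neighbor_samples(t, frac=0.05):
    """Return (g1_sample, g0_sample).

    g1_sample: downstream values t[i+1] for the probes with the highest t[i].
    g0_sample: downstream values t[i+1] for the probes with the lowest t[i]
               (an assumed reading of "the same trick").
    """
    t = np.asarray(t, dtype=float)
    candidates = t[:-1]                  # the last probe has no downstream neighbor
    k = max(1, int(frac * candidates.size))
    order = np.argsort(candidates)
    top, bottom = order[-k:], order[:k]
    return t[top + 1], t[bottom + 1]

g1_sample, g0_sample = neighbor_samples([0.0, 5.0, 4.0, -3.0, 0.1, 0.0], frac=0.3)
print(g1_sample, g0_sample)
```

Using $t_{i+1}$ rather than $t_i$ avoids the selection bias in the question above: the selected $t_i$ are extreme by construction, while their neighbors are extreme only if there is real, spatially correlated signal.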

Step 3: control FDR
Goal: find the null and signal distributions.
Idea: assume a mixture model. This is unidentifiable!
A clever trick: find two auxiliary distributions, $g_0$ and $g_1$.
Additional assumption:

Step 3: Unbalanced mixture score with is estimated by fitting

False discovery rate (FDR)
Determine the TF binding sites at a chosen FDR cutoff.

Example: Analysis of cMyc binding data

Comparison of models

Simulation results

MAT
Basic idea: baseline level correction.
Standardize probe intensity with respect to the expected baseline value (Johnson et al. 2006).

MAT How to estimate the baseline values?

Estimated nucleotide effect A C

MAT Standardization
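Standardizing against a baseline can be sketched as follows. This is a simplified illustration, not the MAT implementation: the baseline values are taken as given (for example, predicted from probe sequence by a model fitted elsewhere), probes are grouped into equal-sized bins of similar baseline, and each residual is divided by the standard deviation of its bin. The function name `standardize` and the parameter `n_bins` are hypothetical.

```python
import numpy as np

def standardize(log_intensity, baseline, n_bins=10):
    """Return (observed - baseline) / sd, with sd estimated within bins of
    probes that have similar baseline values."""
    y = np.asarray(log_intensity, dtype=float)
    m = np.asarray(baseline, dtype=float)
    resid = y - m
    out = np.empty_like(resid)
    order = np.argsort(m)                      # probes sorted by baseline value
    for idx in np.array_split(order, n_bins):  # consecutive chunks = bins
        if idx.size < 2:
            raise ValueError("each bin needs at least two probes")
        out[idx] = resid[idx] / resid[idx].std(ddof=1)
    return out

scores = standardize([1.0, 3.0, 10.0, 14.0], [0.0, 0.0, 10.0, 10.0], n_bins=2)
print(scores)
```

Binning by baseline lets probes with different expected intensities have different noise levels, so a standardized score is comparable across probes.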

(X.S. Liu)

Reading List
Keles et al. 2004: developed a multiple hypothesis testing method for tiling array analysis.
Ji and Wong 2005: TileMap; improved over the method of Keles et al.
Johnson et al. 2006: MAT; showed that baseline adjustment improves signal detection.

