ChIP-chip Data

DNA-binding proteins
Constitutive proteins (mostly histones)
  – Organize DNA
  – Regulate access to DNA
  – Have many modifications: acetylation, methylation, …
Sporadic proteins (transcription factors)
  – Mediate docking of transcription apparatus
  – Modify histones
  – Methylate DNA

Histones
Histones are an ancient family of proteins which serve as the scaffold for DNA
Four types of core histone (H2A, H2B, H3, H4) assemble in pairs, two copies of each, to form a nucleosome
DNA is wrapped nearly twice (about 147 bp) around each nucleosome

Histones and Modifications
DNA contacts histones on their tails
Histone tails can be modified
Histones can stay loose or assemble tightly, which compacts the DNA

Transcription Factors
General: help to set up transcription of many genes
Specific: draw in general factors or RNA Pol II to specific genes
TATA Binding Protein

DNA Methylation
Adding a methyl group to cytosine
Cytosine methylation is passed on to daughter cells

Chromatin Immuno-precipitation

Tiling Array
One probe every n base pairs over some length of chromosome
  – Interrupted by repeat regions
Promoter array: each (known) promoter tiled
An Affymetrix tiling design
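
The tiling idea above can be sketched in a few lines. This is only an illustration, not the Affymetrix design procedure: the function name `tile_probes`, the half-open repeat intervals, and the coordinates are all assumptions made for the example.

```python
def tile_probes(start, end, spacing, repeats=()):
    """Return probe start positions every `spacing` bp in [start, end),
    skipping positions that fall inside a masked repeat interval.

    `repeats` is a hypothetical list of half-open (lo, hi) intervals.
    """
    positions = []
    for pos in range(start, end, spacing):
        # Drop any probe that would start inside a repeat region.
        if any(lo <= pos < hi for lo, hi in repeats):
            continue
        positions.append(pos)
    return positions


# A 1 kb stretch tiled every 100 bp, with one repeat region masked out.
probes = tile_probes(0, 1000, 100, repeats=[(250, 520)])
print(probes)
```

The gap in the printed positions is the "interruption" mentioned on the slide: any later moving average has to cope with uneven probe spacing.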

What the data look like
Histone acetylation on 15 samples over one promoter (raw)

Multiple Promoters
Normalized by Medians
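
One plausible reading of "normalized by medians" is that each sample's median is subtracted from its values so that all samples share a common center. The sketch below assumes that reading; the function name `median_center`, the sample names, and the values are made up for illustration.

```python
from statistics import median


def median_center(samples):
    """Subtract each sample's own median from its probe values.

    `samples` maps a sample name to a list of log-scale probe values
    (a hypothetical layout chosen for this example).
    """
    centred = {}
    for name, values in samples.items():
        m = median(values)
        centred[name] = [v - m for v in values]
    return centred


raw = {"s1": [0.5, 1.0, 3.0], "s2": [-1.0, 0.0, 2.0]}
norm = median_center(raw)
print(norm)
```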

Methods and Issues
Normalization
  – Different enrichment ratios
  – Different probe thermodynamics
  – Dye and probe bias
Estimation
  – Categorical or continuous?
  – Individual values are noisy
      For TF binding: where is the peak?

Normalization
Basic idea: compensate for technical variables
Technique differences should affect different probes differently
Try to estimate what part of the signal can be attributed to technical factors
Easiest variable to access: sequence
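
As a rough sketch of "estimate what part of the signal can be attributed to technical factors", the example below regresses log intensity on one simple sequence feature, the GC count of each probe, and keeps the residual. The choice of GC count, the straight-line fit, and all function names are assumptions for illustration; real methods use richer sequence models.

```python
def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (xs must not all be equal)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sequence_adjust(seqs, log_intensity):
    """Remove the part of the log intensity predicted by GC count."""
    gc = [gc_count(s) for s in seqs]
    slope, intercept = fit_line(gc, log_intensity)
    return [y - (intercept + slope * g) for g, y in zip(gc, log_intensity)]


# Toy probes whose intensity is fully explained by GC content.
seqs = ["ATAT", "ATGC", "GCGC"]
adjusted = sequence_adjust(seqs, [1.0, 2.0, 3.0])
print(adjusted)
```

What remains after the adjustment is the part of the signal that sequence alone does not explain, which is the part one hopes reflects enrichment.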

MAT
One-color Affy array
  – Needs separate array for comparison
Normalizes probe thermodynamics & enrichment ratio
Estimation by (robust) moving average

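
A robust moving average can be sketched with a trimmed mean over a sliding window of probes. This is a generic illustration, not MAT's implementation: the window half-width, the trimming fraction, and the function names are assumptions.

```python
def trimmed_mean(values, trim=0.25):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    vals = sorted(values)
    k = int(len(vals) * trim)  # number of values cut from each end
    kept = vals[k:len(vals) - k] if k else vals
    return sum(kept) / len(kept)


def robust_moving_average(positions, scores, half_window=200, trim=0.25):
    """Trimmed mean of the scores of all probes within `half_window` bp."""
    smoothed = []
    for centre in positions:
        window = [s for p, s in zip(positions, scores)
                  if abs(p - centre) <= half_window]
        smoothed.append(trimmed_mean(window, trim))
    return smoothed


# Seven probes on a 100 bp grid with a single outlying probe.
positions = list(range(0, 700, 100))
scores = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
smoothed = robust_moving_average(positions, scores)
print(smoothed)
```

An ordinary mean would smear the lone outlier across its neighbors; trimming discards it, so only signal shared by several adjacent probes survives.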
Normalized Data – Rare Event
Normalized Data – Common Event

Estimation
Try to build an intelligent moving average
Not all neighbors will be similar
Typical TF binds to 8 bp
  – Pol II may spread wider
Typical fragment is bp
Cannot resolve < 200 bp
Pol II binding on a 100 bp grid

TileMap
Ignores normalization
‘Shrinkage’ estimator of variance
  – Improves individual scores
Smooths noise by moving average
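
The idea behind a shrinkage estimator of variance can be illustrated as follows: each probe's variance is pulled toward the average variance across probes before a t-like score is computed. This is a simplified sketch, not TileMap's actual model; in particular the fixed `weight` is an assumption chosen by hand here, whereas a real method would estimate the amount of shrinkage from the data.

```python
from statistics import mean, variance


def shrunken_t(replicates, weight=0.5):
    """t-like score per probe using a variance shrunk toward the pooled mean.

    `replicates` is a list of per-probe lists of replicate values;
    `weight` (hypothetical, fixed) is the fraction of pooled variance used.
    """
    variances = [variance(r) for r in replicates]
    pooled = mean(variances)
    stats = []
    for r, v in zip(replicates, variances):
        v_shrunk = (1 - weight) * v + weight * pooled
        stats.append(mean(r) / (v_shrunk / len(r)) ** 0.5)
    return stats


# The first probe has zero sample variance, so an ordinary t-statistic
# would divide by zero; the shrunken version stays finite.
stats = shrunken_t([[1.0, 1.0, 1.0], [0.0, 2.0, 1.0]])
print(stats)
```

With few replicates per probe, individual variance estimates are unstable, which is why borrowing strength across probes improves the individual scores before any moving average is applied.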