I519 Introduction to Bioinformatics, Fall, 2012

1 I519 Introduction to Bioinformatics, Fall, 2012
From ChIP-chip to ChIP-Seq: the study of mammalian transcription factor binding sites and epigenetics

2 From ChIP-chip to ChIP-seq
ChIP-chip: ChIP on tiled microarrays.
ChIP-sequencing (ChIP-seq): combines chromatin immunoprecipitation (ChIP) and massively parallel sequencing to identify mammalian DNA sequences bound by transcription factors in vivo.

3 Chromatin immunoprecipitation (ChIP)
Formaldehyde (CH2O) is a very reactive dipolar compound in which the carbon atom is the electrophilic center. Amino and imino groups of proteins (e.g., the side chains of lysine and arginine) and of nucleic acids (e.g., cytosine) react with formaldehyde, leading to the formation of a Schiff base (reaction I).
Cross-links shown in the figure: between the side chains of two lysines; between lysine and cytosine.

4 ChIP-seq workflow
Solexa sequencing technology provided short reads of approximately 30 base pairs, which were ideal for characterizing ChIP-derived fragments.
Nature Methods 4 (2007)
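One common step in turning such short reads into a binding signal can be sketched as follows: each read marks only one end of a longer ChIP fragment, so reads are extended in their strand direction and the overlaps are counted. This is a minimal illustration, not the pipeline from the cited paper; the function name `extended_coverage` and the 200 bp fragment length are assumptions made for the example.

```python
from collections import Counter

def extended_coverage(reads, fragment_len=200, chrom_len=None):
    """Per-base coverage after extending each read to an assumed fragment length.

    reads: iterable of (pos, strand), where pos is the 0-based 5' end of the
    read on a single chromosome and strand is "+" or "-".
    fragment_len: assumed average ChIP fragment size (hypothetical default).
    """
    coverage = Counter()
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_len          # extend to the right
        else:
            start, end = pos - fragment_len + 1, pos + 1  # extend to the left
        start = max(start, 0)
        if chrom_len is not None:
            end = min(end, chrom_len)
        for i in range(start, end):
            coverage[i] += 1
    return coverage

# A forward and a reverse read flanking the same site pile up between them.
cov = extended_coverage([(100, "+"), (250, "-")])
print(cov[175])  # covered by both extended fragments
```

Positions covered by many extended fragments are candidate binding regions; deciding which of them are significant is the job of the peak callers listed later.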

5 Advantages of ChIP-Seq
Single base-pair resolution of direct sequencing
ChIP-seq data are likely to have less noise or artifacts
Potential binding regions need not be specified prior to the experiment
Lower cost, minimal hands-on processing, and a requirement for fewer replicate experiments as well as less input material
Epigenetics meets next-generation sequencing. Epigenetics Nov;3(6):318-21

6 Next generation sequencing (NGS) techniques
Platform | [name missing in transcript] | Illumina/Solexa | ABI SOLiD
Sequencing chemistry | Pyrosequencing | Polymerase-based sequence-by-synthesis | Ligation-based sequencing
Amplification approach | Emulsion PCR | Bridge amplification | [missing]
Paired end (PED) separation | 3 kb | [value missing] bp | [missing]
Mb per run | 100 Mb | 1300 Mb | 3000 Mb
Time per PED run | <0.5 day | 4 days | 5 days
Read length (update) | [value missing] bp | 35, 75 and 100 bp | 35 and 50 bp
Cost per run | $8,438 USD | $8,950 USD | $17,447 USD
Cost per Mb | [value missing] | $5.97 USD | $5.81 USD

7 Tools for extracting transcription factor targets from ChIP-Seq data
CisGenome uses a conditional binomial model to identify enriched regions when a control data set is provided (Nat. Biotechnol., 26:1293–1300, 2008).
MACS (Model-based Analysis of ChIP-Seq) uses the control dataset to model the tag distribution across the genome with a Poisson distribution whose background rate is λBG (Genome Biol., 9:R137, 2008).
PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Nat. Biotechnol., 27:66–75, 2009).
QuEST (Quantitative Enrichment of Sequence Tags) (Nat. Methods, 5:829–834, 2008).
GLITR (GLobal Identifier of Target Regions) identifies enriched regions in target data by calculating a fold-change based on random samples of control (input chromatin) data.
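To make the Poisson background idea mentioned for MACS concrete, here is a minimal sketch of how a window's tag count could be scored against a genome-wide background rate. It is not MACS's actual implementation; the function names, the uniform-background formula, and the example numbers are assumptions chosen for illustration.

```python
import math

def background_rate(total_tags, window_len, genome_len):
    """Expected tags per window if tags fell uniformly across the genome.

    A simplified stand-in for a genome-wide background rate (hypothetical helper).
    """
    return total_tags * window_len / genome_len

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly from k upward."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if i > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)

# Assumed example: 1e6 control tags, 200 bp windows, 1e9 bp mappable genome.
lam_bg = background_rate(1_000_000, 200, 1_000_000_000)
p_value = poisson_upper_tail(8, lam_bg)  # chance of seeing >= 8 tags by background alone
print(lam_bg, p_value)
```

A small p-value means the observed count is unlikely under the background model alone. Real tools refine this considerably, for example by estimating the rate locally rather than from a single genome-wide value.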

8 Why peak detection is difficult
The signal for a given transcription factor is the 'convolution' of various effects: the density of mappable bases in a region, the underlying chromatin structure, and the actual signal from transcription factor binding.
Some fraction of the peaks in the ChIP-seq signal map for a transcription factor might be due to the open chromatin structure rather than to transcription factor binding, so the signal must be compared against one from a control.
PeakSeq: Nat. Biotechnol., 27:66–75, 2009

9 PeakSeq scoring procedure
Nat. Biotechnol., 27:66–75, 2009
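The slide's figure is not preserved in this transcript, but the general idea of scoring a candidate region against a control can be sketched. The example below is a simplified illustration loosely in the spirit of control-based scoring, not PeakSeq's published procedure: it scales the control to the ChIP sample with a least-squares slope through the origin, then applies a one-sided binomial test. All function names and the test setup are assumptions.

```python
import math

def scaling_factor(chip_counts, control_counts):
    """Least-squares slope through the origin of ChIP counts on control counts.

    Both arguments are tag counts over the same list of genomic windows.
    """
    sxy = sum(c * x for c, x in zip(chip_counts, control_counts))
    sxx = sum(x * x for x in control_counts)
    if sxx == 0:
        raise ValueError("control has no tags in the given windows")
    return sxy / sxx

def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def score_region(chip_tags, control_tags, scale):
    """One-sided p-value that ChIP tags exceed the scaled control in a region.

    Under the null, a tag is equally likely to come from either sample.
    """
    scaled_control = round(control_tags * scale)
    n = chip_tags + scaled_control
    if n == 0:
        return 1.0
    return binomial_upper_tail(chip_tags, n)

scale = scaling_factor([2, 4, 6], [1, 2, 3])
print(scale, score_region(30, 5, scale))
```

Regions where ChIP and scaled control are similar get large p-values, which is how signal caused by open chromatin or mappability, present in both samples, is discounted.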

10 High-Resolution Profiling of Histone methylations in the human genome
Ref: Cell, 129(4): , 2007
Generated high-resolution maps for the genome-wide distribution of 20 histone lysine and arginine methylations, among others, across the human genome using the Solexa 1G sequencing technology. (The cells were digested with MNase to generate mainly mononucleosomes, with a minor fraction of dinucleosomes, for histone modification mapping.)
Typical patterns of histone methylations exhibited at promoters, insulators, enhancers, and transcribed regions are identified.
The monomethylations of H3K27, H3K9, H4K20, H3K79, and H2BK5 are all linked to gene activation, whereas trimethylations of H3K27, H3K9, and H3K79 are linked to repression.
H2A.Z (a histone variant) associates with functional regulatory elements, and CTCF marks boundaries of histone methylation domains.

11 BS-seq for epigenetic profiling
BS-seq (bisulphite sequencing) combines bisulphite treatment of genomic DNA with ultra-high-throughput sequencing.
Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences.
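The principle behind reading methylation from bisulphite-treated DNA can be sketched in a few lines: bisulphite converts unmethylated cytosines to uracil, which is read as T after amplification, while methylated cytosines stay C. The sketch below simulates this on a single forward strand and then calls methylation by comparing a read with the reference. The function names and the toy sequence are assumptions, and real BS-seq analysis must also handle both strands, incomplete conversion, and read alignment.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment of one strand.

    Unmethylated C is read as T; C at a methylated position is protected.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )

def call_methylation(reference, converted_read):
    """Map each reference C position to True (methylated) or False (unmethylated).

    Assumes the read is already aligned to the reference at offset 0.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True    # C survived treatment, so it was methylated
        elif read_base == "T":
            calls[i] = False   # C converted, so it was unmethylated
    return calls

reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1, 5])
print(read)
print(call_methylation(reference, read))
```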

12 Bisulphite sequencing

13 References
Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4 (2007)

