ChIP-seq Robert J. Trumbly


1 ChIP-seq Robert J. Trumbly
Department of Biochemistry and Cancer Biology Block Health Science 448, UTHSC

2 ChIP-seq
ChIP-seq (chromatin immunoprecipitation followed by DNA sequencing) has become the preferred method for analyzing protein-DNA interactions and chromatin structure on a genomic scale.
ChIP-seq has become practical because of rapid developments in NGS (next-generation sequencing).

3 NGS
The transition from microarrays to NGS creates not just more data but a different type of data.
Microarray data are analog: how much expression (signal) for a gene?
NGS data are digital: e.g., which splicing variant is expressed?

4 NGS
RNA-seq: can detect splicing variants, allelic expression, novel mRNAs
ChIP-seq: can detect differential binding to allelic variants, leading to information about binding specificity

5 Park, Oct 2009

6 TFs: sharp binding sites
RNA Pol II: sharp and extended
Histone modifications: extended domains
Park, Oct 2009

7 Park, Oct 2009

8 ChIP-seq and RNA-seq analysis
Pepke et al., Nature Methods 6:S22-S

9 This example shows a workflow for the analysis of data from chromatin immunoprecipitation followed by sequencing (ChIP–seq). This analysis can be done by a bench scientist using current resources, and a similar strategy could be used for other types of next-generation sequencing data. Blue boxes show steps that can be performed using Galaxy. Integration or cross-sectioning of data can often be done in the University of California-Santa Cruz (UCSC) Genome Browser or by joining lists in Galaxy (purple box). Downstream steps, such as known motif analysis and Gene Ontology analysis, can be achieved with online or stand-alone tools (orange boxes). Galaxy can also be used to establish analytical pipelines for calling SNPs that could then be integrated into sequencing-based data, such as reads from ChIP–seq. CEAS, Cis-regulatory Element Annotation System; MACS, Model-based Analysis of ChIP–Seq; TSS, transcription start site.
Hawkins 2010

10 FASTQ files
Output of NGS is usually in FASTQ files. Each record has four lines:
    @SEQ_ID
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
Line 1: @ followed by sequence id
Line 2: sequence
Line 3: +, sometimes followed by text
Line 4: quality score for each base, encoded as ASCII symbol
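As a minimal sketch of the four-line layout described on this slide, the following Python reads FASTQ records from a text stream. The names FastqRecord and read_fastq are illustrative only (they are not from the presentation or from any library), and the code assumes each record occupies exactly four lines, with no wrapped sequences.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical container for illustration)."""
    seq_id: str    # line 1 without the leading '@'
    sequence: str  # line 2
    comment: str   # line 3 without the leading '+', often empty
    quality: str   # line 4, one ASCII symbol per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream that uses four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # empty string means end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' at start of record, got {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"expected '+' on line 3, got {plus!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# The example record from the slide, as it would appear in a file.
EXAMPLE = (
    "@SEQ_ID\n"
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n"
    "+\n"
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65\n"
)

for record in read_fastq(io.StringIO(EXAMPLE)):
    print(record.seq_id, len(record.sequence), "bases")
```

For real data, an established parser (for example the one in Biopython) is usually a better choice; the sketch is only meant to make the record structure concrete.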

11 Quality scores
Phred quality score: Q = -10 log10(p), where p = the probability that the corresponding base call is incorrect.
Example: p = 0.001, log10(0.001) = -3, so Q = -10 × (-3) = 30.
For the FASTQ file, an offset of 33 (for the most common encoding) is added to the raw quality score, and the ASCII symbol corresponding to that number is stored and displayed.
There are several variations on the quality score encoding, so programs that interpret the scores must know the proper version.
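The arithmetic on this slide can be sketched in a few lines of Python. The function names below are hypothetical helpers written for illustration, and the default offset of 33 assumes the most common encoding mentioned above; files using another variant would need a different offset.

```python
import math
from typing import List


def phred_from_error_prob(p: float) -> float:
    """Q = -10 * log10(p), where p is the probability of a wrong base call."""
    return -10 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Inverse of the formula above: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(quality: str, offset: int = 33) -> List[int]:
    """Turn a FASTQ quality string into raw Phred scores.

    The offset is an assumption about the file's encoding (33 by default).
    """
    return [ord(symbol) - offset for symbol in quality]


def encode_quality(scores: List[int], offset: int = 33) -> str:
    """Turn raw Phred scores back into ASCII symbols."""
    return "".join(chr(score + offset) for score in scores)


# Worked example from the slide: p = 0.001 corresponds to Q = 30.
print(round(phred_from_error_prob(0.001)))

# First few symbols of the quality line shown on the FASTQ slide.
print(decode_quality("!''*"))
```

With an offset of 33, the symbol "!" (ASCII 33) encodes a score of 0, which is why it marks the lowest-quality base in the example record.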

12 Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells
Chen et al., Cell 133, 13 June 2008, Pages 1106–1117
Chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) was used to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12).

13 Figure 1 Genome-Wide Mapping of 13 Factors in ES Cells by Using ChIP-seq Technology
TFBS profiles for the sequence-specific transcription factors and mock ChIP control at the Pou5f1 and Nanog gene loci are shown.

14 Figure 2 Identification of Enriched Motifs by Using a De Novo Approach
Matrices predicted by the de novo motif-discovery algorithm Weeder.

15 ChIP-seq tutorial
ChIP-seq Analysis with Galaxy: from reads to peaks (and motifs)
2 - Obtaining the raw data: Accessing ChIP-seq reads from ArrayExpress database
3 - Upload the reads in the Galaxy server
4 - Some statistics on the raw data
5 - Mapping the reads with Bowtie
6 - Peak calling with MACS
7 - Retrieving the peak sequences
8 - Visualize the peak regions in UCSC genome browser
9 - Try to identify over-represented motifs

16 ChIP-seq tutorial
Revision to tutorial:
Part 2, step 4: click on name of entry
Part 2, step 5: click on ENA link at bottom of page
Part 4, step 2: there is no "FASTX-Toolkit for FASTQ data" section; the tools are under the general heading "NGS: QC and manipulation". There is also a new "FastQC:Read QC" tool here that is useful.

17 References
For tutorial: Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Chen et al., Cell Volume 133, 13 June 2008, Pages 1106–1117
The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Cock et al., Nucleic Acids Research, 2010, Vol. 38, No –1771.
Computation for ChIP-seq and RNA-seq studies. Pepke et al., Nature Methods SUPPLEMENT | VOL.6 NO.11s | NOVEMBER 2009 | S23.
ChIP–seq: advantages and challenges of a maturing technology. Park et al., Nature Reviews | Genetics 10 | October 2009 |
Next-generation genomics: an integrative approach. Hawkins et al., NATURE REVIEWS | Genetics 11 | July 2010 |

