ChIP-seq analysis Ecole de bioinformatique AVIESAN – Roscoff, Jan 2013.


1 ChIP-seq analysis Ecole de bioinformatique AVIESAN – Roscoff, Jan 2013

2 Workflow for ChIP-seq analysis

ChIP-seq data can be retrieved from specialized databases such as Gene Expression Omnibus (GEO). GEO provides the data at various processing stages:
- Read sequences: typically several million short sequences (36 bp).
- Read locations: chromosomal coordinates of each aligned read, typically several million coordinates of short fragments (36 bp).
- Peak locations: several thousand regions of variable size (typically between 100 bp and 10 kb).

A technological bottleneck lies in the next step: exploiting full peak collections to discover motifs and predict binding sites.

Workflow: Data retrieval (GEO) -> Raw reads + quality (fastq) -> Read clean-up -> Cleaned reads -> Read mapping -> Alignments -> Peak calling -> Peaks -> Motif discovery -> Over-represented motifs -> Pattern matching -> Binding sites

3 Read pre-processing and mapping

Legend: Result, Program, User input

Input: Raw reads (fastq)
- Quality checking: fastqc -> Quality report (html)
- Adaptor trimming: cutadapt -> Trimmed reads (fastq)
- Quality filtering: prinseq -> Quality-filtered reads (fastq)
- Duplicate filtering: rmdup (samtools) -> Duplicate-filtered reads (fastq)
- Read mapping: bowtie (Tuxedo) -> Alignments (sam)
- Compression: view (samtools) -> Compressed alignments (bam)
- Sorting by genomic coordinates: sort (samtools) -> Sorted alignments (bam)
- Indexing: index (samtools) -> Alignment index (bai)
- Visualization: IGV, IGB, tracker (Galaxy), UCSC genome browser -> Image
- Conversion: bamToBed (bedtools) -> Read coordinates (bed)
- Conversion: ??? (Kent tools) -> Genomic density profile (bedgraph, bg)
- Conversion: bedGraphToBigWig (Kent tools) -> Genomic density profile (bigwig, bw)
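To make the "read coordinates (bed) to genomic density profile (bedgraph)" conversion concrete, here is a minimal pure-Python sketch. It is not the tool used in the course (the slide leaves that program unnamed); the function name `bed_to_bedgraph` and the in-memory tuple representation are assumptions made for illustration only.

```python
from collections import defaultdict


def bed_to_bedgraph(reads):
    """Turn read intervals into runs of constant coverage depth.

    `reads` is an iterable of (chrom, start, end) tuples using BED
    conventions (0-based start, exclusive end). Returns a sorted list of
    (chrom, start, end, depth) tuples covering positions with depth > 0.
    """
    # Per chromosome, record +1 where a read starts and -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in reads:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    runs = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0:
                last = runs[-1] if runs else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # Extend the previous run when depth did not change.
                    runs[-1] = (chrom, last[1], pos, depth)
                else:
                    runs.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return runs


# Hypothetical example: three 36 bp reads, two of them overlapping.
example_reads = [("chr1", 0, 36), ("chr1", 10, 46), ("chr1", 100, 136)]
for chrom, start, end, depth in bed_to_bedgraph(example_reads):
    print(chrom, start, end, depth, sep="\t")
```

Each output row follows the four-column bedGraph layout (chromosome, start, end, value). For real data sets with millions of reads, a dedicated tool would be preferable to this in-memory sketch.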

4 From reads to peaks

Legend: Result, Program, User input

Inputs: Test alignments (bam), Input alignments (bam)
- Quality checking: fastqc -> Quality report (html)
- Adaptor trimming: cutadapt -> Trimmed reads (fastq)
- Quality filtering: prinseq
- Peak calling: MACS, SICER, PeakFinder, SPP, SWEMBL, ... -> Enriched regions or peaks (bed), Genomic density profile (wig)
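The idea behind peak calling, comparing the test (ChIP) alignments to the input (control) alignments, can be illustrated with a toy window-based fold-enrichment sketch. This is deliberately simplistic and is not the algorithm of MACS, SICER, PeakFinder, SPP or SWEMBL; the function name, the fixed window size, the pseudocount and the fold threshold are all assumptions chosen for illustration, and only a single chromosome is handled.

```python
from collections import Counter


def call_enriched_windows(test_starts, input_starts, window=200,
                          min_fold=4.0, pseudocount=1.0):
    """Report windows where test reads are enriched relative to input.

    `test_starts` and `input_starts` are lists of read start positions on
    one chromosome. Returns a list of (start, end, fold) tuples, with
    adjacent enriched windows merged into a single region.
    """
    test_counts = Counter(pos // window for pos in test_starts)
    input_counts = Counter(pos // window for pos in input_starts)

    # Scale input counts to the test library size (toy normalisation).
    scale = len(test_starts) / len(input_starts) if input_starts else 1.0

    peaks = []
    for idx in sorted(test_counts):
        expected = input_counts[idx] * scale + pseudocount
        fold = (test_counts[idx] + pseudocount) / expected
        if fold < min_fold:
            continue
        start, end = idx * window, (idx + 1) * window
        if peaks and peaks[-1][1] == start:
            # Merge with the previous enriched window, keeping the best fold.
            prev_start, _, prev_fold = peaks[-1]
            peaks[-1] = (prev_start, end, max(prev_fold, fold))
        else:
            peaks.append((start, end, fold))
    return peaks


# Hypothetical read start positions for the test and input samples.
test = [1005, 1010, 1020, 1050, 1100, 1150, 1180, 5000]
control = [100, 1100, 5000, 9000, 3000, 7000, 2000, 6000]
print(call_enriched_windows(test, control))
```

Real peak callers additionally model fragment size, strand shift and statistical significance, which this sketch ignores.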

5 Evaluating the quality of peak collections

6 Slicing the peak collection

Recipe:
- Sort peaks by decreasing score.
- Select slices of n peaks: the n top peaks ("top slice"), the n bottom peaks ("bottom slice"), and a few intermediate slices of n peaks.
- Analyse enrichment for a reference motif (annotated or discovered from the data) in the successive slices.

Slices: Slice 1 (top), Slice 2, Slice 3, Slice 4, Slice 5 (bottom)
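The slicing recipe can be sketched in a few lines of Python. The function name `slice_peaks`, the (name, score) tuple representation and the choice of evenly spaced intermediate slices are assumptions for illustration; the slide does not specify how the intermediate slices are positioned.

```python
def slice_peaks(peaks, n, n_slices=5):
    """Split a peak collection into `n_slices` slices of `n` peaks each.

    `peaks` is a list of (name, score) tuples. The first slice holds the
    n top-scoring peaks, the last slice the n lowest-scoring peaks, and
    the intermediate slices are spread evenly across the ranking.
    """
    if n_slices < 2:
        raise ValueError("need at least a top and a bottom slice")
    if n * n_slices > len(peaks):
        raise ValueError("not enough peaks for non-overlapping slices")

    # Step 1 of the recipe: sort peaks by decreasing score.
    ranked = sorted(peaks, key=lambda peak: peak[1], reverse=True)

    # Step 2: evenly spaced start offsets, from 0 (top) to len - n (bottom).
    last_start = len(ranked) - n
    starts = [i * last_start // (n_slices - 1) for i in range(n_slices)]
    return [ranked[start:start + n] for start in starts]


# Hypothetical collection of 20 peaks with scores 0..19.
collection = [(f"peak{i}", i) for i in range(20)]
for rank, peak_slice in enumerate(slice_peaks(collection, n=2), start=1):
    print(f"Slice {rank}:", [name for name, _ in peak_slice])
```

Each slice would then be scanned separately with the reference motif (step 3 of the recipe), so that enrichment can be compared from the top slice to the bottom slice.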

7 GATA3 – reasonably good peak collection

Sample: GSM774297

8 GATA3 – poor quality peak collection

- The top slice shows some enrichment.
- The other slices are no more enriched than the theoretical (random) expectation.
- Negative control: scanning sequences with permuted matrices fits the theoretical expectation.

Sample: GSM523222
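As an illustration of the negative control, a permuted matrix can be obtained by shuffling the column order of a position-specific matrix. This is a sketch under the assumption that "permuted" means column permutation; the slides do not say which tool or procedure was used, and the function name and list-of-columns representation are hypothetical.

```python
import random


def permute_matrix_columns(matrix, seed=None):
    """Return a copy of `matrix` with its columns in random order.

    `matrix` is a list of columns, one per motif position, each column
    being a list of counts for A, C, G and T. Shuffling the columns keeps
    the composition of every position but scrambles the motif itself.
    """
    rng = random.Random(seed)
    permuted = [list(column) for column in matrix]  # leave the input untouched
    rng.shuffle(permuted)
    return permuted


# Hypothetical 5-column count matrix (rows of counts in A, C, G, T order).
motif = [
    [10, 0, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
    [5, 5, 0, 0],
]
print(permute_matrix_columns(motif, seed=1))
```

Scanning the peak sequences with such a permuted matrix should give hit counts close to the theoretical expectation, which provides a baseline against which the enrichment of the real motif can be judged.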

