ChIP-seq analysis Ecole de bioinformatique AVIESAN – Roscoff, Jan 2013.

Workflow for ChIP-seq analysis

ChIP-seq data can be retrieved from specialized databases such as Gene Expression Omnibus (GEO). GEO provides the data at various processing stages:
- Read sequences: typically several million short sequences (36 bp).
- Read locations: chromosomal coordinates of each aligned read; typically several million coordinates of short fragments (36 bp).
- Peak locations: several thousand regions of variable size (typically between 100 bp and 10 kb).

A technological bottleneck lies in the next step: exploiting full peak collections to discover motifs and predict binding sites.

Workflow diagram: Data retrieval (GEO) → Raw reads + quality (fastq) → Read clean-up → Cleaned reads → Read mapping → Alignments → Peak calling → Peaks → Motif discovery → Over-represented motifs → Pattern matching → Binding sites

Read pre-processing and mapping

Legend: Result, Program, User input

Each step is listed as step: program → result.
- Input: raw reads (fastq)
- Quality checking: fastqc → quality report (html)
- Adaptor trimming: cutadapt → trimmed reads (fastq)
- Quality filtering: prinseq → quality-filtered reads (fastq)
- Duplicate filtering: rmdup (samtools) → duplicate-filtered reads (fastq)
- Read mapping: bowtie (Tuxedo) → alignments (sam)
- Compression: view (samtools) → compressed alignments (bam)
- Sorting by genomic coordinates: sort (samtools) → sorted alignments (bam)
- Indexing: index (samtools) → alignment index (bai)
- Visualization: IGV, IGB, tracker (Galaxy), UCSC genome browser → image
- Conversion: bamToBed (bedtools) → read coordinates (bed)
- Conversion: ??? (Kent tools) → genomic density profile (bedgraph, bg)
- Conversion: bedGraphToBigWig (Kent tools) → genomic density profile (bigwig, bw)
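To make the notion of a "genomic density profile" concrete, here is a minimal Python sketch that turns read coordinates (as in a bed file) into bedgraph-style rows of read depth. It is an illustration only: the slide leaves the conversion program unnamed ("???"), and this sketch does not claim to reproduce any Kent tool. The function name `coverage_bedgraph` and the in-memory tuple representation are assumptions made for the example; coordinates are taken to be 0-based and half-open, as in the bed format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, int, int]          # (chrom, start, end), 0-based half-open
Row = Tuple[str, int, int, int]      # (chrom, start, end, depth)


def coverage_bedgraph(reads: Iterable[Read]) -> List[Row]:
    """Hypothetical helper: per-base read depth as bedgraph-like rows."""
    # Record +1 where a read starts and -1 where it ends.
    deltas: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in reads:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    rows: List[Row] = []
    for chrom in sorted(deltas):
        depth = 0
        prev = None
        for pos in sorted(deltas[chrom]):
            # Emit the segment [prev, pos) if it is covered by at least one read.
            if prev is not None and depth > 0:
                last = rows[-1] if rows else None
                if last and last[0] == chrom and last[2] == prev and last[3] == depth:
                    # Same depth as the adjacent previous segment: extend it.
                    rows[-1] = (chrom, last[1], pos, depth)
                else:
                    rows.append((chrom, prev, pos, depth))
            depth += deltas[chrom][pos]
            prev = pos
    return rows


if __name__ == "__main__":
    example = [("chr1", 0, 10), ("chr1", 5, 15)]
    for row in coverage_bedgraph(example):
        print("\t".join(str(field) for field in row))
```

Regions with zero depth are simply omitted, which keeps the output sparse. For real data sets with millions of reads, a dedicated tool would be the practical choice.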

From reads to peaks

Legend: Result, Program, User input

- Alignments: test alignments (bam) and input alignments (bam)
- Upstream steps shown on the slide: quality checking (fastqc → quality report, html), adaptor trimming (cutadapt → trimmed reads, fastq), quality filtering (prinseq)
- Peak calling: MACS, SICER, PeakFinder, SPP, SWEMBL, ...
- Results: enriched regions or peaks (bed); genomic density profile (wig)

Evaluating the quality of peak collections

Slicing the peak collection

Recipe
- Sort peaks by decreasing score.
- Select slices of n peaks: the n top peaks ("top slice"), the n bottom peaks ("bottom slice"), and a few intermediate slices.
- Analyse enrichment for a reference motif (annotated or discovered from the data) in the successive slices.

Slices shown: Slice 1 (top), Slice 2, Slice 3, Slice 4, Slice 5 (bottom)
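The first two steps of the recipe can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the slides: the function name `slice_peaks`, the tuple layout of a peak, and the choice to space the intermediate slices evenly along the ranking are all hypothetical. The enrichment analysis itself (third step) is not covered here.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int, float]   # (chrom, start, end, score)


def slice_peaks(peaks: List[Peak], n: int, n_slices: int = 5) -> List[List[Peak]]:
    """Hypothetical helper: top, intermediate and bottom slices of n peaks each."""
    if n <= 0 or n_slices < 2:
        raise ValueError("n must be positive and n_slices at least 2")
    if len(peaks) < n * n_slices:
        raise ValueError("not enough peaks for non-overlapping slices")

    # Step 1: sort peaks by decreasing score.
    ranked = sorted(peaks, key=lambda peak: peak[3], reverse=True)

    # Step 2: the first slice starts at rank 0 (top), the last one ends at the
    # final rank (bottom), and the others start at evenly spaced ranks between.
    last_start = len(ranked) - n
    starts = [(i * last_start) // (n_slices - 1) for i in range(n_slices)]
    return [ranked[start:start + n] for start in starts]


if __name__ == "__main__":
    toy_peaks = [("chr1", 100 * i, 100 * i + 50, float(i)) for i in range(20)]
    for rank, peak_slice in enumerate(slice_peaks(toy_peaks, n=2), start=1):
        print(f"Slice {rank}:", [peak[3] for peak in peak_slice])
```

Each slice can then be scanned with the reference motif and its enrichment compared across slices, as in the GATA3 examples that follow.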

GATA3 – reasonably good peak collection

Sample: GSM774297

GATA3 – poor quality peak collection

- The top slice shows some enrichment.
- The other slices are no more enriched than the theoretical (random) expectation.
- Negative control: scanning sequences with permuted matrices fits the theoretical expectation.

Sample: GSM523222
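The negative control relies on permuted matrices. One common way to build such a control, assumed here because the slide does not say how the permutation was done, is to shuffle the columns (positions) of the position-specific matrix: each column keeps its nucleotide composition, but the motif as a whole is scrambled. The sketch below uses a hypothetical `permute_columns` helper and a toy count matrix that is not taken from the GATA3 data.

```python
import random
from typing import List, Optional

# One inner list per motif position, holding values for A, C, G, T.
Matrix = List[List[float]]


def permute_columns(matrix: Matrix, seed: Optional[int] = None) -> Matrix:
    """Hypothetical helper: return a copy of the matrix with positions shuffled."""
    rng = random.Random(seed)
    shuffled = [list(column) for column in matrix]   # leave the input untouched
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Toy count matrix with four positions (illustrative values only).
    toy_matrix = [
        [10.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 10.0, 0.0],
        [9.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 10.0],
    ]
    for column in permute_columns(toy_matrix, seed=1):
        print(column)
```

Scanning the same peak sequences with several such permuted matrices gives an empirical background that can be compared with the theoretical expectation.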