ChIP-Seq Analysis – Using CLCGenomics Workbench

Presentation transcript:

ChIP-Seq Analysis – Using CLC Genomics Workbench. Nov 16, 2017. Ansuman Chattopadhyay, PhD, Health Sciences Library System, University of Pittsburgh, ansuman@pitt.edu

Topics: Transcription Factor ChIP-Seq, Histone ChIP-Seq, ATAC-Seq

www.hsls.libguides.com/chipseq

Transcription Factor and Histone ChIP-Seq

ATAC-Seq Study

Graphical User Interface based software: Galaxy (http://galaxy.crc.pitt.edu:8080/) and CLC Genomics Workbench

Software @ HSLS MolBio http://hsls.libguides.com/molbio/licensedtools/resources

NGS Software @ HSLS MolBio NGS Analysis Sanger Seq Analysis Human , Mouse and Rat NGS Analysis

CLCbio Genomics Workbench System Requirements
Windows: Windows Vista, Windows 7, Windows 8, Windows 10, Windows Server 2008, or Windows Server 2012
Mac: Mac OS X 10.7 or later
Linux: Red Hat 5.0 or later, SUSE 10.2 or later, Fedora 6 or later
Memory: 8 GB RAM required, 16 GB RAM recommended
Display: 1024 x 768 required, 1600 x 1200 recommended
CPU: Intel or AMD CPU required
Disk: minimum 10 GB free disk space in the tmp directory

CLC Plugins to Install: CLC Workbench Client Plugin; Histone ChIP-Seq; Advanced Peak Shape Tools Plugin (Beta). Download is available at the top right corner.

Integrating with the CLCbio Genomics Server @ CRC http://core.sam.pitt.edu/CLCBioServer

You need Secure Remote Access via Pulse to run CLCGx from off-campus locations or over Pitt Wireless

CLC files at the CRC HTC Cluster Reference Sequences Look for Folders organized by PI’s name

Create Folders at CRC-HTC

Create Folder in SaM-HTC Cluster

Create Workshop Folder @ FRANK

ChIP-Seq Workflow

Dataset https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63716

GEO Dataset https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63716

Download FASTQ Reads MyoD_Undiff_ChIP-Seq

ENA: Download FASTQ Reads MyoD_Undiff_ChIP-Seq

Import: FASTQ Reads MyoD_Undiff_ChIP-Seq

Import: FASTQ Reads MyoD_Undiff_ChIP-Seq (single)

GEO Dataset – ATAC-Seq

Step 1: Import Reads to CLC (Paired End)

FASTQ format http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
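A FASTQ record spans four lines: an identifier starting with `@`, the sequence, a separator starting with `+`, and a quality string with one character per base. The following is a minimal sketch of reading such records, assuming the Sanger / Illumina 1.8+ encoding (Phred score = ASCII code minus 33); the function name and the example record are made up for illustration and are not taken from the workshop dataset.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from an iterable of FASTQ lines.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if seq is None or plus is None or qual is None:
            raise ValueError("truncated FASTQ record: " + header)
        seq, plus, qual = seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        # Phred+33: quality score is the ASCII code of the character minus 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Hypothetical single record: six high-quality bases and one low-quality base
example = ["@read1", "GATTACA", "+", "IIIIII#"]
records = list(parse_fastq(example))
print(records)
```

Older Illumina pipelines used an offset of 64 instead of 33, so the offset should be checked against the QC report before interpreting the scores.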

FASTQ Reads

FASTQC Project http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Step 2: Create a Seq QC Report

Trim Reads – Adapter Seq etc.

Create Adapter List

Create FAST QC Report

FASTQC Report

Read Mapping to Ref Genome http://www.ensembl.org/info/data/ftp/index.html

Read Mapping around GM20652: Result from MyoD1 ChIP-Seq

Peak Calling. Strino et al., BMC Bioinformatics, June 2016; Landt et al., Genome Research, 2012

Discovering Obvious Peaks: The CLC shape-based peak caller finds peaks by building a Gaussian filter based on the mean and variance of the fragment length distribution, which are inferred from the cross-correlation profile. Strino et al., BMC Bioinformatics, June 2016
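The cross-correlation profile mentioned here can be illustrated with a short sketch. The idea, common to many ChIP-Seq tools, is that forward-strand and reverse-strand read starts pile up on opposite sides of a binding site, so the shift that best aligns the two strands estimates the fragment length. This is not CLC's implementation: the function name, the synthetic binding sites, and the use of Pearson correlation are assumptions made for illustration, and only the mean fragment length (not the variance) is estimated.

```python
import numpy as np


def strand_cross_correlation(forward, reverse, max_shift):
    """Correlation between forward-strand coverage and reverse-strand
    coverage shifted towards it by 0..max_shift bases (illustrative only)."""
    forward = np.asarray(forward, dtype=float)
    reverse = np.asarray(reverse, dtype=float)
    profile = []
    for shift in range(max_shift + 1):
        f = forward[: len(forward) - shift]
        r = reverse[shift:]
        profile.append(np.corrcoef(f, r)[0, 1])
    return np.array(profile)


# Synthetic data (hypothetical): three binding sites, 150 bp fragments.
genome_length = 2000
fragment_length = 150
forward = np.zeros(genome_length)
reverse = np.zeros(genome_length)
for site in (300, 900, 1500):
    forward[site - fragment_length // 2] += 1  # 5' end on the forward strand
    reverse[site + fragment_length // 2] += 1  # 5' end on the reverse strand

profile = strand_cross_correlation(forward, reverse, max_shift=300)
estimated_fragment_length = int(np.argmax(profile))
print(estimated_fragment_length)
```

On real data the profile is noisy and usually shows a second, smaller peak at the read length, so the maximum should be inspected rather than trusted blindly.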

Peak Shape Score: Score = genomic coverage * filter, where * is the cross-correlation operator. The score indicates how likely a genomic position is to be the center of a peak. The Peak Shape Score is standardised and follows a standard normal distribution, so a p-value for each genomic position can be calculated as p-value = Φ(−Peak Shape Score of the peak centre), where Φ is the standard normal cumulative distribution function. Strino et al., BMC Bioinformatics, June 2016
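Both formulas on this slide can be tried out numerically. The sketch below cross-correlates a toy coverage track with a toy filter and converts an already-standardised score into a p-value using Φ(−score). The coverage values, the three-point filter, and the function names are hypothetical; how CLC standardises the raw score is not described on the slide, so that step is left out.

```python
import math

import numpy as np


def raw_score(coverage, peak_filter):
    """Cross-correlate a coverage track with a filter (same length as coverage)."""
    return np.correlate(
        np.asarray(coverage, dtype=float),
        np.asarray(peak_filter, dtype=float),
        mode="same",
    )


def p_value(peak_shape_score):
    """p = Phi(-score) for a standardised score, via the complementary error function."""
    return 0.5 * math.erfc(peak_shape_score / math.sqrt(2.0))


# Toy example (hypothetical values): the score is highest where the
# coverage matches the shape of the filter.
coverage = [0, 1, 3, 1, 0, 0]
peak_filter = [1, 2, 1]
scores = raw_score(coverage, peak_filter)
print(int(np.argmax(scores)))   # position of the best match
print(p_value(0.0))             # a score of 0 gives p = 0.5
print(round(p_value(1.96), 3))  # a score of 1.96 gives p of about 0.025
```

A larger standardised score therefore always maps to a smaller p-value, which is why thresholding on the score and thresholding on the p-value select the same positions.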

Peak Shape Filter: Once the positive and negative regions have been identified, the CLC shape-based peak caller learns a filter that matches the average peak shape, called the Peak Shape Filter. Strino et al., BMC Bioinformatics, June 2016

Peak Detection: Peaks are called by first identifying the genomic positions whose Peak Shape Score is higher than the specified threshold (equivalently, whose p-value is below the corresponding cutoff) and which do not have any higher value in a window around them. The size of this window is determined by the filter as the longest distance between two positive values in the filter. These maxima define the center of each peak, while the peak boundaries are identified by expanding from the center both left and right until either the score becomes 0 or the peak touches a window boundary. Strino et al., BMC Bioinformatics, June 2016
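The detection rule can be sketched in a few lines. This is an illustration under stated assumptions, not CLC's code: the threshold is applied to the Peak Shape Score (a high score corresponds to a small p-value), the window half-width is passed in as a parameter instead of being derived from the filter, and the score track is invented.

```python
import numpy as np


def call_peaks(scores, threshold, window):
    """Return (start, center, end) index triples for each called peak.

    A center must exceed `threshold` and be the maximum within `window`
    positions on either side; boundaries expand from the center until the
    score is no longer positive or the window edge is reached.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    peaks = []
    for center in range(n):
        if scores[center] <= threshold:
            continue
        lo = max(0, center - window)
        hi = min(n, center + window + 1)
        if scores[center] < scores[lo:hi].max():
            continue  # a higher value exists nearby, so this is not a maximum
        left = center
        while left > lo and scores[left - 1] > 0:
            left -= 1
        right = center
        while right < hi - 1 and scores[right + 1] > 0:
            right += 1
        peaks.append((left, center, right))
    return peaks


# Hypothetical score track with two peaks
track = [0, 0.5, 2.0, 4.5, 2.5, 0.4, 0, -0.3, 1.0, 3.2, 1.1, 0]
called = call_peaks(track, threshold=3.0, window=3)
print(called)
```

In this simple version a flat plateau of equal maximal scores would yield one call per plateau position; a production caller would need a tie-breaking rule.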

Call Peaks using Peak Shape information

Peak Calls Result

Annotate Peaks with Nearby Genes

5Prime and 3Prime Gene Distance

ChIP-Seq Result

Compare Datasets

Commonly Used Open-Source Tool https://pypi.python.org/pypi/MACS2

Comparison of CLC Results with MACS2.0

Histone ChIP-Seq. Li et al., Cell 2007.01.015

Histone Modifications. Li et al., Cell 2007.01.015

Running Histone ChIP-Seq: Classify Regions of Variable Length by Peak Shape

Histone ChIP-Seq Result

Histone ChIP-Seq Result Classified Gene Regions in the genome

H3K4me3 – Diff: Result from the Transcription Factor ChIP-Seq tool

ATAC-Seq

ATAC-Seq Data Analysis

Comparison of DNase-Seq Results

HSLS-MBIS and Genomics Analysis Core (GAC)
Ansuman Chattopadhyay, PhD, 412-648-1297, ansuman@pitt.edu
Uma Chandran, PhD, MSIS, 412-648-9326, Chandran@pitt.edu
Sri Chaparala, srichaparala@pitt.edu
Carrie Iwema, PhD, MLS, 412-383-6887, iwema@pitt.edu
http://hscrf.pitt.edu/

Thanks to:
HSLS: Sri Chaparala, Carrie Iwema, David Leung, Michael Sweezer
CLCBio: Shawn Prince
Center for Research Computing: Mu Fangping