Trinity College Dublin, The University of Dublin GE3M25: Data Analysis, Class 4 Karsten Hokamp, PhD Genetics TCD, 07/12/2015


GE3M25 Data Handling, module content: Python Programming; Bioinformatics; ChIP-Seq analysis

Project report hand-in: 23/12/2015 or 10/01/2016?

Class 4: Project overview; Read mapping; Peak detection; Visualisation

ChIP-Seq: Different sets of genes are expressed under different conditions. Expression is regulated through transcription factors that bind to promoters. Binding can be captured by ChIP (chromatin immunoprecipitation). Bound sequences are revealed through NGS (next-generation sequencing).

ChIP-Seq Analysis Goal

GE3M25 Project. Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast)
2. Read mapping (Bowtie2)
3. Generate indexed and sorted BAM file
4. Peak calling
5. Visualisation (IGV)
6. Store BAM and index files

Data download: start here: bioinf.gen.tcd.ie/GE3M25

Data download: bioinf.gen.tcd.ie/GE3M25/project (optional control file for improved results)

Data download: bioinf.gen.tcd.ie/GE3M25/project/data
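
Before mapping, it can be worth checking that the downloaded file is complete. The following is a minimal Python sketch, assuming a standard FASTQ layout of four lines per read; the function name count_fastq_reads is illustrative and not part of the course material.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzip-compressed FASTQ file (4 lines per read)."""
    lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            lines += 1
    if lines % 4 != 0:
        # A partial record usually means an interrupted download.
        raise ValueError(
            f"{path}: {lines} lines is not a multiple of 4; file may be truncated"
        )
    return lines // 4
```

Calling count_fastq_reads("ChIP.fastq.gz") would return the number of reads in the file, which can later be compared with the read total that the mapper reports.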

Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast) ✔
2. Read mapping (Bowtie2)
3. Generate indexed and sorted BAM file
4. Peak calling
5. Visualisation (IGV)
6. Store BAM and index files

Installing Bowtie: start here: bioinf.gen.tcd.ie/GE3M25/project

Installing Bowtie: switch to Terminal, unpack

Generate Index: download reference sequence from bioinf.gen.tcd.ie/GE3M25/project

Generate Index: run bowtie2-build
    ./bowtie2-build S288C_reference_sequence_R64-2-1_ fsa yeast
    (program; reference sequence; name for index)

Generate Index: run bowtie2-build (.txt suffix: added by browser?)
    ./bowtie2-build S288C_reference_sequence_R64-2-1_ fsa.txt yeast
    (program; reference sequence; name for index)

Generate Index: check output files
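
The check of the output files can also be scripted. This is a sketch only, assuming the six-file layout that bowtie2-build writes for a small (non-"large") index; the helper name missing_index_files is hypothetical.

```python
from pathlib import Path

# Suffixes written by bowtie2-build for a small (non-"large") index.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(prefix, directory="."):
    """Return the expected index files for `prefix` that are absent or empty."""
    missing = []
    for suffix in BT2_SUFFIXES:
        path = Path(directory) / (prefix + suffix)
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path.name)
    return missing
```

With the index name used above, missing_index_files("yeast") should return an empty list once the index has been built in the current directory.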

Read mapping: run bowtie2
    ./bowtie2 -x yeast -U ChIP.fastq.gz -p 4 -S sam
    ./bowtie2: program
    -x yeast: index name
    -U ChIP.fastq.gz: FastQ file
    -p 4: use of four CPU cores
    -S sam: output

Read mapping: run bowtie2
    ./bowtie2 -x yeast -U ChIP.fastq.gz -p 4 > sam
    ./bowtie2: program
    -x yeast: index name
    -U ChIP.fastq.gz: FastQ file
    -p 4: use of four CPU cores
    > sam: output to screen redirected to file

Read mapping: output summary
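
The output summary can be turned into numbers for the project report. The sketch below assumes the summary layout that Bowtie2 prints for unpaired reads ("N reads; of these:", "aligned 0 times", "aligned exactly 1 time", "aligned >1 times", "overall alignment rate"); the function name and dictionary keys are illustrative.

```python
import re


def parse_alignment_summary(text):
    """Extract read counts and the overall alignment rate from a Bowtie2 summary."""
    patterns = {
        "total_reads": r"^\s*(\d+) reads; of these:",
        "unaligned": r"^\s*(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            result[key] = int(match.group(1))
    rate = re.search(r"([\d.]+)% overall alignment rate", text)
    if rate:
        result["overall_rate_percent"] = float(rate.group(1))
    return result
```

Bowtie2 writes this summary to standard error, so it has to be saved to a file (or copied from the Terminal) before it can be passed to the function as a string.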

Read mapping: other options (see Bowtie2 manual)

Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast) ✔
2. Read mapping (Bowtie2) ✔
3. Generate indexed and sorted BAM file
4. Peak calling
5. Visualisation (IGV)
6. Store BAM and index files

Installing samtools: start here: bioinf.gen.tcd.ie/GE3M25/project. Download, then run chmod +x samtools in the Terminal.

Sorting and indexing: change from SAM to BAM format
    ./samtools view -b -S sam > bam
    ./samtools: program
    view: command
    sam: input file
    > bam: redirection of output to file
    Parameters:
    -b: output in binary format
    -S: input in SAM format

Sorting and indexing: change from SAM to BAM format (use 4 cores for compression)
    ./samtools view -b -S sam > bam
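
SAM is plain text, so it can be inspected directly before it is converted to binary BAM. A minimal sketch that counts mapped and unmapped records using bit 0x4 of the FLAG field (second column); the function name is hypothetical, and it counts alignment records rather than distinct reads.

```python
def count_sam_alignments(path):
    """Count mapped and unmapped records in a SAM text file."""
    counts = {"mapped": 0, "unmapped": 0}
    with open(path) as handle:
        for line in handle:
            if line.startswith("@"):  # header lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 11:  # not a complete alignment record
                continue
            flag = int(fields[1])
            # Bit 0x4 of the FLAG field marks an unmapped read.
            if flag & 0x4:
                counts["unmapped"] += 1
            else:
                counts["mapped"] += 1
    return counts
```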

Sorting and indexing: sort BAM file (use all four cores)
    ./samtools sort bam sorted
    ./samtools: program
    sort: command
    bam: input file
    sorted: output prefix (.bam suffix will be added)

Sorting and indexing: create index
    ./samtools index sorted.bam
    ./samtools: program
    index: command
    sorted.bam: input file (file with .bai index will be created)

Sorting and indexing: check output files

Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast) ✔
2. Read mapping (Bowtie2) ✔
3. Generate indexed and sorted BAM file ✔
4. Peak calling
5. Visualisation (IGV)
6. Store BAM and index files

Installing macs
    In the Terminal:
        pip install macs --user
    Find the location of the tool:
        find ~/ -iname '*macs*'
    Location found: /Users/kahokamp//Library/Python/2.7/bin/macs (same as ~/Library/Python/2.7/bin/macs)

Peak calling
    path/to/macs -t sorted.bam -n yeast_macs -g
    path/to/macs: program (replace with path to macs)
    -t sorted.bam: treatment file
    -n yeast_macs: output prefix
    -g: genome size
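
The genome size parameter refers to the effective (mappable) genome size. As a rough stand-in, the total length of the reference sequence can be computed from the FASTA file used for the index. This is a sketch under that assumption; the function name genome_size is illustrative, and the result is an approximation rather than a value prescribed by the course.

```python
def genome_size(fasta_path):
    """Sum the lengths of all sequences in a FASTA file."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):  # sequence header
                continue
            total += len(line.strip())
    return total
```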

Peak calling: pairing model

Peak calling: check output

Peak calling output: hide first 19 rows; sort by column G or H
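
The same hide-and-sort step can be done without a spreadsheet. The sketch below assumes that the peak table is tab-separated text, that the rows to hide are '#' comment lines and blank lines, and that the first remaining line is the column header; the function name is hypothetical. Columns are addressed by position, so no column names are assumed.

```python
import csv


def load_sorted_peaks(path, sort_column, descending=True):
    """Read a tab-separated peak table and sort rows by a numeric column.

    sort_column is a 0-based index, so spreadsheet column G is 6 and H is 7.
    Lines starting with '#' and blank lines are skipped; the first remaining
    line is treated as the column header.
    """
    with open(path, newline="") as handle:
        rows = [
            row
            for row in csv.reader(handle, delimiter="\t")
            if row and not row[0].startswith("#")
        ]
    if not rows:
        return [], []
    header, peaks = rows[0], rows[1:]
    peaks.sort(key=lambda row: float(row[sort_column]), reverse=descending)
    return header, peaks
```

For example, load_sorted_peaks(path, 6) orders the peaks by column G, highest value first.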

Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast) ✔
2. Read mapping (Bowtie2) ✔
3. Generate indexed and sorted BAM file ✔
4. Peak calling ✔
5. Visualisation (IGV)
6. Store BAM and index files

Visualisation with IGV (Integrative Genomics Viewer)
1. Download IGV (local copy on bioinf)
2. Unpack (on the command line): unzip IGV_ app.zip
3. Start by double-click in Finder
4. Load S. cerevisiae (sacCer3) genome
5. Load BAM file

Pick a region with a peak; navigate there in IGV
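
IGV accepts a locus of the form chromosome:start-end in its navigation box. A small sketch for building such a string from peak coordinates, with optional padding on both sides; the function name igv_locus is hypothetical, and the chromosome name must match the naming of the genome loaded in IGV.

```python
def igv_locus(chrom, start, end, padding=0):
    """Format a region as 'chrom:start-end' for the IGV locus box."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    left = max(1, start - padding)  # coordinates start at 1
    right = end + padding
    return f"{chrom}:{left}-{right}"
```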

Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast) ✔
2. Read mapping (Bowtie2) ✔
3. Generate indexed and sorted BAM file ✔
4. Peak calling ✔
5. Visualisation (IGV) ✔
6. Store BAM and index files

Storage of BAM file: upload .bam, .bam.bai and MACS files through bioinf.gen.tcd.ie/GE3M25/project

Optional steps in this class:
1. Download and map Input file
2. Run MACS with Input file as control
3. Change parameters in Bowtie2, MACS
4. Trim FastQ data
5. Compare results
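
For the optional trimming step, one simple approach is to cut low-quality bases from the 3' end of each read. The following is a sketch of that idea only, assuming Phred+33 quality encoding; the function name and the default threshold of 20 are illustrative choices, and dedicated trimming tools use more elaborate rules.

```python
def trim_3prime(sequence, quality, min_phred=20, offset=33):
    """Remove trailing bases whose Phred score is below min_phred."""
    keep = len(sequence)
    # Walk back from the 3' end while the quality stays below the threshold.
    while keep > 0 and ord(quality[keep - 1]) - offset < min_phred:
        keep -= 1
    return sequence[:keep], quality[:keep]
```

The function takes the sequence and quality lines of a single FASTQ record and returns both trimmed to the same length.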

Don't forget to log out!