Trinity College Dublin, The University of Dublin
Data download: bioinf.gen.tcd.ie/GE3M25/project
Get the .fastq.gz file associated with your student ID



Trinity College Dublin, The University of Dublin GE3M25: Data Analysis, Class 3 Karsten Hokamp, PhD Genetics TCD, 30/11/2015

GE3M25 Data Handling: Module Content
- Python Programming
- Bioinformatics
- ChIP-Seq analysis

Class 3: Project Data
- Download project data
- Quality control
- Trimming
- Read mapping
- Visualisation

Next Generation Sequencing - Applications
Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):

ChIP-Seq Basics
ChIP = Chromatin ImmunoPrecipitation
Chromatin = highly ordered packaging of DNA and histones together
Source: Bio-Rad

Chromatin = highly ordered packaging of DNA and histones together
Rosa, S.; Shaw, P. Insights into Chromatin Structure and Dynamics in Plants. Biology 2013, 2,

ChIP-Seq Basics
Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.

GE3M25 Project
Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast)
2. Quality control (FastQC)
3. Storage of FastQC report file
4. Read mapping (Bowtie2)
5. Generate indexed and sorted BAM file
6. Visualisation (IGV)
7. Store BAM and index files

GE3M25 Project
Optional steps in this class:
1. Trimming by quality (UrQt)
2. Trimming for Illumina Universal Adapter (trim_galore)
3. Trimming for other adapters (trim_galore)
4. Other read mapper (BWA)
5. Comparison of results
6. Upload of most suitable BAM and index files
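To give a feel for what trimming by quality does, here is a minimal Python sketch that cuts low-quality bases from the 3' end of a single read. It is not the algorithm used by UrQt or trim_galore; the function name, the threshold of 20 and the Phred+33 encoding are assumptions made for this example only.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Cut trailing bases whose Phred score is below min_q.

    seq and qual are the sequence and quality lines of one FastQ record.
    Assumes Phred+33 encoding (an assumption of this sketch).
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is too low.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # In Phred+33, 'I' encodes Q40 and '#' encodes Q2.
    print(trim_3prime("ACGTACGT", "IIIII###"))
```

Real trimmers use more robust criteria (for example sliding windows or adapter matching), so treat this only as an illustration of the idea.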

Working on the Command Line
Start: Open 'Terminal' from Spotlight or Dock

GE3M25 Project Step 1: Download data
1. Browse to bioinf.gen.tcd.ie/GE3M25/project
2. Locate the file with your student ID
3. Click to download
4. Check the Downloads folder for the file

GE3M25 Project Step 2: Quality Control with FastQC
1. Download FastQC
2. Load the (compressed) FastQ file
3. Save report
4. Rename to start with full Student ID

GE3M25 Project Step 2: Info for project report
1. Data details (# sequences, read length, etc.)
2. Comments on quality aspects
3. Highlight of potential issues
4. Discuss ways to clean up data
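FastQC reports the data details directly, but they can also be cross-checked with a few lines of Python. The sketch below counts reads and tallies read lengths; it assumes the common four-lines-per-record FastQ layout, and the function names are made up for this example.

```python
import gzip
from collections import Counter


def fastq_stats(lines):
    """Return (number of reads, Counter of read lengths) for FastQ lines.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    n_reads = 0
    lengths = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            n_reads += 1
            lengths[len(line.strip())] += 1
    return n_reads, lengths


def fastq_stats_from_file(path):
    """Same as fastq_stats, reading a gzip-compressed FastQ file."""
    with gzip.open(path, "rt") as handle:
        return fastq_stats(handle)


if __name__ == "__main__":
    demo = ["@read1", "ACGT", "+", "IIII",
            "@read2", "ACGTAC", "+", "IIIIII"]
    print(fastq_stats(demo))
```

For the project file, `fastq_stats_from_file` would be called with the name of your own .fastq.gz file.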

Quality Information
Conversion of quality score:
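The conversion shown on the slide did not survive in this transcript. As a stand-in, the standard Phred relationship is Q = -10 * log10(P), where P is the estimated probability that a base call is wrong. The Python sketch below decodes a FastQ quality line under the assumption of Phred+33 encoding; check the "Encoding" entry in the FastQC report to confirm this for your data. The function names are illustrative.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FastQ quality string into a list of Phred scores.

    offset=33 assumes Phred+33 encoding (an assumption of this sketch).
    """
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    for q in phred_scores("I5#"):
        print(q, error_probability(q))
```

Under this encoding a score of 20 corresponds to a 1 in 100 chance of a wrong base call, and a score of 30 to 1 in 1000.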

GE3M25 Project Step 3: Storage of FastQC report
1. Open the HTML report in a browser
2. Copy and paste information into a Word document, or Ctrl-click to copy images (or use Grab for screenshots)
3. Mail the document to yourself, store it on USB/network, or upload the HTML file through bioinf.gen.tcd.ie/GE3M25/project

GE3M25 Project Step 4: Read mapping
1. Download bowtie2 programs and reference sequence: bioinf.gen.tcd.ie/GE3M25/data_handling
2. Switch to Terminal for command line work
3. Extract bowtie2 programs:
    tar zxvf bowtie2.tgz
   Or:
    tar xvf bowtie2.tar
4. Build index:
    ./bowtie2-build S288C_reference_sequence_R64-2-1_ fsa yeast
5. Map reads with default parameters:
    ./bowtie2 -x yeast -U XXX.fastq.gz -p 4 > bowtie2_def.sam


GE3M25 Project Step 4: Read mapping
Replace XXX.fastq.gz in the mapping command with the name of your own FastQ file.
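If the indexing and mapping commands need to be re-run with different files or parameters, they can be scripted. The following Python sketch builds the two commands as argument lists and can run them with `subprocess`. The function names and the demo file names are hypothetical; bowtie2's `-S` option writes the SAM output to a file and stands in here for the shell redirect (`>`).

```python
import subprocess


def bowtie2_commands(reference, fastq, index="yeast", threads=4,
                     sam_out="bowtie2_def.sam"):
    """Build the index and mapping commands as argument lists."""
    build = ["./bowtie2-build", reference, index]
    align = ["./bowtie2", "-x", index, "-U", fastq,
             "-p", str(threads), "-S", sam_out]
    return [build, align]


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names: substitute your own reference and reads.
    for cmd in bowtie2_commands("reference.fsa", "my_reads.fastq.gz"):
        print(" ".join(cmd))
```

Printing the commands first, as in the demo, is a cheap way to check them before calling `run_all`.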

Working on the Command Line – the Prompt
Parts of the prompt: user, host, directory, symbol
Spaces are important!


GE3M25 Project Step 5: Generate indexed and sorted BAM file
Sequence Alignment/Map (SAM) format:
- Standard format for read mapping results
- Can be compressed to save space: binary SAM = BAM format
- Can be indexed for random access
- samtools allows viewing and processing of SAM data

GE3M25 Project Step 5: samtools
Download from bioinf, chmod and run:
    ls -l samtools
    chmod +x samtools
    ./samtools

GE3M25 Project Step 5: samtools view options

SAM Format
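Each alignment line in a SAM file has eleven mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), optionally followed by tags. As a small illustration, the Python sketch below splits one line into those fields and decodes two FLAG bits. The record type, function names and demo line are made up for this example.

```python
from collections import namedtuple

SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_sam_line(line):
    """Parse the eleven mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    (qname, flag, rname, pos, mapq, cigar,
     rnext, pnext, tlen, seq, qual) = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(record):
    """FLAG bit 0x4 is set when the read did not map."""
    return bool(record.flag & 0x4)


def is_reverse(record):
    """FLAG bit 0x10 is set when the read maps to the reverse strand."""
    return bool(record.flag & 0x10)


if __name__ == "__main__":
    demo = "read1\t16\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII"
    rec = parse_sam_line(demo)
    print(rec.rname, rec.pos, is_unmapped(rec), is_reverse(rec))
```

Header lines, which start with `@`, are not alignment records and should be skipped before calling `parse_sam_line`.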

GE3M25 Project Step 5
View SAM file:
    ./samtools view -S bowtie2_def.sam | less
Change into BAM format:
    ./samtools view -bS bowtie2_def.sam > bowtie2_def.bam
Sort BAM file:
    ./samtools sort bowtie2_def.bam bowtie2_def_sorted
Index BAM file:
    ./samtools index bowtie2_def_sorted.bam
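The two-argument sort form used here (input BAM, then output prefix) is the usage of older samtools releases; from version 1.3 on, `samtools sort` expects the output file via `-o`. The Python sketch below builds the convert, sort and index commands for either style, so the same steps can be reused outside the class setup. The function name and the `legacy` switch are assumptions of this example.

```python
def samtools_commands(sam, prefix, legacy=True, samtools="./samtools"):
    """Build convert, sort and index commands as argument lists.

    legacy=True  -> 'sort in.bam out_prefix' (older samtools releases)
    legacy=False -> 'sort -o out.bam in.bam' (samtools 1.3 and later)
    """
    bam = prefix + ".bam"
    sorted_bam = prefix + "_sorted.bam"
    # -o names the output file instead of using a shell redirect.
    to_bam = [samtools, "view", "-bS", "-o", bam, sam]
    if legacy:
        sort = [samtools, "sort", bam, prefix + "_sorted"]
    else:
        sort = [samtools, "sort", "-o", sorted_bam, bam]
    index = [samtools, "index", sorted_bam]
    return [to_bam, sort, index]


if __name__ == "__main__":
    for cmd in samtools_commands("bowtie2_def.sam", "bowtie2_def"):
        print(" ".join(cmd))
```

Running `./samtools` without arguments prints the version, which indicates which sort style applies.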


GE3M25 Project Step 6: Visualisation with IGV (Integrative Genomics Viewer)
1. Download IGV (local copy on bioinf)
2. Unpack (on the command line):
    unzip IGV_ app.zip
3. Start by double-clicking in Finder
4. Load the S. cerevisiae (sacCer3) genome
5. Load the BAM file


Exercises
- Clean your data via trimming
- Run bowtie2 with different parameters
- How do these steps affect the number of mapped reads?
- How do they affect the peaks that you see in IGV?
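To compare the number of mapped reads between runs, one option is to count them directly from each SAM file. bowtie2 also prints an alignment summary to the terminal, so the Python sketch below is only a cross-check. It counts a read as mapped when FLAG bit 0x4 is not set and skips secondary and supplementary alignments; the function name is illustrative.

```python
def mapping_summary(sam_lines):
    """Return (mapped, unmapped, mapped fraction) for SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    total = mapped + unmapped
    rate = mapped / total if total else 0.0
    return mapped, unmapped, rate


if __name__ == "__main__":
    demo = [
        "@SQ\tSN:chrI\tLN:230218",
        "r1\t0\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(mapping_summary(demo))
```

Applied to the SAM files from the untrimmed and trimmed runs (for example by passing an open file handle), the two fractions can be compared side by side.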

GE3M25 Project Step 7: Storage of BAM file
Upload the BAM and .bam.bai files through bioinf.gen.tcd.ie/GE3M25/project

Don't forget to log out!