Trinity College Dublin, The University of Dublin Data download: bioinf.gen.tcd.ie/GE3M25/project Get the .fastq.gz file associated with your student ID


1 Trinity College Dublin, The University of Dublin Data download: bioinf.gen.tcd.ie/GE3M25/project Get the .fastq.gz file associated with your student ID http://bioinf.gen.tcd.ie/GE3M25/project

2 GE3M25: Data Analysis, Class 3. Karsten Hokamp, PhD, Genetics TCD, 30/11/2015

3 GE3M25 Data Handling. Module content: Python programming, bioinformatics, ChIP-Seq analysis

4 Class 3: Project Data. Download project data; quality control; trimming; read mapping; visualisation

5 Next Generation Sequencing - Applications. Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):138-146

6 ChIP-Seq Basics. ChIP = Chromatin ImmunoPrecipitation. Chromatin = highly ordered packaging of DNA and histones together. Source: Bio-Rad

7 Chromatin = highly ordered packaging of DNA and histones together. Rosa, S.; Shaw, P. Insights into Chromatin Structure and Dynamics in Plants. Biology 2013, 2, 1378-1410.

8 ChIP-Seq Basics. Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.


10 GE3M25 Project. Steps in this class:
1. Download FastQ data set (ChIP-Seq of TF in yeast)
2. Quality control (FastQC)
3. Storage of FastQC report file
4. Read mapping (Bowtie2)
5. Generate indexed and sorted BAM file
6. Visualisation (IGV)
7. Store BAM and index files

11 GE3M25 Project. Optional steps in this class:
1. Trimming by quality (UrQt)
2. Trimming for Illumina Universal Adapter (trim_galore)
3. Trimming for other adapters (trim_galore)
4. Other read mapper (BWA)
5. Comparison of results
6. Upload of most suitable BAM and index files
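
To make the two kinds of trimming concrete, here is a minimal Python sketch. It is not the algorithm used by UrQt or trim_galore; it only illustrates the idea under simple assumptions: qualities are Phred+33 encoded, low-quality bases are removed from the 3' end only, and the adapter is found by exact string match. The adapter string is the 13-base Illumina universal adapter prefix that trim_galore searches for by default; the function names and the threshold of 20 are illustrative choices.

```python
ILLUMINA_UNIVERSAL = "AGATCGGAAGAGC"  # adapter prefix used by trim_galore by default


def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop bases from the 3' end while their Phred quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clip_adapter(seq, qual, adapter=ILLUMINA_UNIVERSAL):
    """Cut the read at the first exact occurrence of the adapter (simplified)."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual  # no adapter found, read is returned unchanged
    return seq[:idx], qual[:idx]


# Made-up read: four good bases (quality 'I' = 40) followed by four poor ones.
print(trim_3prime("ACGTACGT", "IIII##!!"))
# Made-up read that runs into the adapter after four bases.
print(clip_adapter("ACGT" + ILLUMINA_UNIVERSAL + "TT", "I" * 19))
```

Real trimmers also tolerate mismatches and partial adapter matches at the read end, which this sketch does not attempt.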

12 Working on the Command Line. Start: open 'Terminal' from Spotlight or Dock

13 GE3M25 Project Step 1: Download data
1. Browse to bioinf.gen.tcd.ie/GE3M25/project
2. Locate the file with your student ID
3. Click to download
4. Check the Downloads folder for the file

14 GE3M25 Project Step 2: Quality control with FastQC
1. Download FastQC
2. Load the (compressed) FastQ file
3. Save the report
4. Rename the report so that its name starts with your full student ID

15 GE3M25 Project Step 2: Info for project report
1. Data details (# sequences, read length, etc.)
2. Comments on quality aspects
3. Highlight of potential issues
4. Discuss ways to clean up data
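
FastQC reports the data details directly, but they can also be cross-checked with a few lines of Python. The sketch below assumes a gzip-compressed FastQ file with strict four-line records (header, sequence, `+`, qualities); the function name `fastq_summary` is an illustrative choice.

```python
import gzip
from collections import Counter


def fastq_summary(path):
    """Return (number of reads, Counter of read lengths) for a .fastq.gz file."""
    n_reads = 0
    lengths = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the second line of each record holds the sequence
                n_reads += 1
                lengths[len(line.rstrip("\n"))] += 1
    return n_reads, lengths


# Usage (replace XXX with your own file name):
# n_reads, lengths = fastq_summary("XXX.fastq.gz")
# print(n_reads, sorted(lengths.items()))
```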

16 Quality Information: conversion of quality score (figure not reproduced in this transcript)
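
Since the slide's figure is missing here, the standard conversion is sketched below instead. A Phred quality score Q relates to the base-calling error probability P by Q = -10 * log10(P), and FastQ files store each Q as a single ASCII character. The offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption about this data set; FastQC names the detected encoding in its Basic Statistics module.

```python
def char_to_phred(ch, offset=33):
    """Convert one FastQ quality character to its Phred score."""
    return ord(ch) - offset


def phred_to_error_prob(q):
    """Convert a Phred score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


# A few characters and what they stand for under Phred+33:
# '!' is Q0, '5' is Q20 (1 error in 100), '?' is Q30 (1 in 1,000), 'I' is Q40.
for ch in "!5?I":
    q = char_to_phred(ch)
    print(ch, q, phred_to_error_prob(q))
```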

17 GE3M25 Project Step 3: Storage of FastQC report
1. Open the HTML report in a browser
2. Copy and paste information into a Word document, or Ctrl-click to copy images (or use Grab for screenshots)
3. Mail the document to yourself, store it on USB/network, or upload the HTML file through bioinf.gen.tcd.ie/GE3M25/project

18 GE3M25 Project Step 4: Read mapping
1. Download bowtie2 programs and reference sequence from bioinf.gen.tcd.ie/GE3M25/data_handling
2. Switch to Terminal for command line work
3. Extract bowtie2 programs: tar zxvf bowtie2.tgz (or: tar xvf bowtie2.tar)
4. Build index: ./bowtie2-build S288C_reference_sequence_R64-2-1_20150113.fsa yeast
5. Map reads with default parameters: ./bowtie2 -x yeast -U XXX.fastq.gz -p 4 > bowtie2_def.sam

19 GE3M25 Project Step 4

20 GE3M25 Project Step 4: Read mapping (same steps as on slide 18). Replace XXX in the bowtie2 command with the name of your own FastQ file!
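
One way to avoid typing mistakes when substituting the file name is to assemble the two bowtie2 commands in Python and print them for copying into Terminal. This is only a sketch: the function name and the example file name `12345678.fastq.gz` are hypothetical, and it writes the SAM file with bowtie2's `-S` option, which is equivalent to the `>` redirect used on the slide.

```python
import shlex


def bowtie2_commands(fastq,
                     reference="S288C_reference_sequence_R64-2-1_20150113.fsa",
                     index="yeast", threads=4, out_sam="bowtie2_def.sam"):
    """Return the index-building and mapping commands as argument lists."""
    build = ["./bowtie2-build", reference, index]
    align = ["./bowtie2", "-x", index, "-U", fastq,
             "-p", str(threads), "-S", out_sam]
    return build, align


# Hypothetical file name; use the .fastq.gz file that carries your student ID.
for cmd in bowtie2_commands("12345678.fastq.gz"):
    # shlex.quote protects file names that contain spaces
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The argument lists can also be passed to `subprocess.run` directly, which sidesteps shell quoting altogether.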

21 Working on the Command Line: the Prompt. Parts of the prompt: user, host, directory, symbol. Spaces are important!


23 GE3M25 Project Step 5: Generate indexed and sorted BAM file
Sequence Alignment/Map (SAM) format:
- Standard format for read mapping results
- Can be compressed to save space: binary SAM → BAM format
- Can be indexed for random access
- samtools allows viewing and processing SAM data

24 GE3M25 Project Step 5: samtools
Download from bioinf, chmod and run:
ls -l samtools
chmod +x samtools
./samtools

25 GE3M25 Project Step 5: samtools view options

26 SAM Format
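
As a reminder of what a SAM alignment line contains, the sketch below splits one line into the eleven mandatory, tab-separated fields defined by the SAM specification and decodes three common FLAG bits. The example line is made up for illustration, and the helper names are not part of samtools.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# A small subset of the FLAG bits defined in the SAM specification.
FLAG_BITS = {0x4: "unmapped", 0x10: "reverse strand", 0x100: "secondary alignment"}


def parse_sam_line(line):
    """Split one alignment line into a dict of the mandatory fields plus tags."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    record["TAGS"] = parts[11:]  # optional fields such as AS:i:0
    return record


def decode_flag(flag):
    """List the names of the known bits that are set in a FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Made-up example: a read mapped to the reverse strand of chrI at position 1000.
example = "read1\t16\tchrI\t1000\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tAS:i:0"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], decode_flag(rec["FLAG"]))
```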

27 GE3M25 Project Step 5
View SAM file: ./samtools view -S bowtie2_def.sam | less
Change into BAM format: ./samtools view -bS bowtie2_def.sam > bowtie2_def.bam
Sort BAM file: ./samtools sort bowtie2_def.bam bowtie2_def_sorted
Index BAM file: ./samtools index bowtie2_def_sorted.bam
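
The file names in these commands follow a fixed pattern (`.sam`, `.bam`, `_sorted.bam`), so they can be derived from the SAM file name. The sketch below does that and also covers a version difference: the `sort` form with an output prefix is the older samtools syntax, which samtools 1.3 and later no longer accept; they expect `-o` followed by the output file instead. The function name and the `legacy` switch are illustrative; the copy of samtools provided for the class is assumed to use the older form shown on the slide.

```python
def samtools_steps(sam, legacy=True):
    """Return the convert, sort and index commands for a SAM file as strings."""
    stem = sam[:-4] if sam.endswith(".sam") else sam
    bam = stem + ".bam"
    sorted_prefix = stem + "_sorted"
    sorted_bam = sorted_prefix + ".bam"

    to_bam = "./samtools view -bS {} > {}".format(sam, bam)
    if legacy:
        # older samtools: the output is given as a prefix, '.bam' is appended
        sort = "./samtools sort {} {}".format(bam, sorted_prefix)
    else:
        # samtools 1.3 and later: the output file is named with -o
        sort = "./samtools sort -o {} {}".format(sorted_bam, bam)
    index = "./samtools index {}".format(sorted_bam)
    return [to_bam, sort, index]


for command in samtools_steps("bowtie2_def.sam"):
    print(command)
```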


29 GE3M25 Project Step 6: Visualisation with IGV (Integrative Genomics Viewer)
1. Download IGV (local copy on bioinf)
2. Unpack (on the command line): unzip IGV_2.3.66.app.zip
3. Start by double-clicking in Finder
4. Load the S. cerevisiae (sacCer3) genome
5. Load the BAM file

30 GE3M25 Project Step 6: Visualisation with IGV (Integrative Genomics Viewer)

31 Exercises
- Clean your data via trimming
- Run bowtie2 with different parameters
How do these steps affect the number of mapped reads? How do they affect the peaks that you see in IGV?
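
To compare runs, the number of mapped reads can be counted straight from each SAM file. Bowtie2 also prints an alignment summary when it finishes, so the sketch below is just one possible cross-check. It assumes single-end reads, skips header lines (which start with `@`), ignores secondary and supplementary alignments, and counts a read as mapped when FLAG bit 0x4 is not set. The example lines are made up.

```python
def mapping_stats(sam_lines):
    """Return (total reads, mapped reads) from an iterable of SAM lines."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return total, mapped


# Made-up miniature SAM content: two mapped reads and one unmapped read.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chrI\tLN:230218",
    "r1\t0\tchrI\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchrI\t200\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(mapping_stats(example))

# Usage with a real file:
# with open("bowtie2_def.sam") as handle:
#     print(mapping_stats(handle))
```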

32 GE3M25 Project Step 7: Storage of BAM file. Upload the BAM and .bam.bai files through bioinf.gen.tcd.ie/GE3M25/project

33 Don't forget to log out!

