GE3M25: Data Analysis, Class 3

TCD, 23/11/2017
Karsten Hokamp, PhD Genetics

Timetable, Week 10 to Week 14:
- NGS 1: Introduction
- NGS 2: QC, Trimming
- NGS 3: Project files
- NGS 4: Peak Calling
- Python 1: Introduction
- Python 2: Strings and Files
- Python 3: File I/O, Branching
- Python 4: Modules, Lists, Sets
- Python 5: Dictionaries
- Python 6: Regex, System
- GE3M11 Exam

Marks for GE3M25
- Python exam: 50% (2/3 data handling, 1/3 statistics)
- ChIP-Seq report: 50%

Class 3: Project Data
- Download project data
- Quality assessment
- Read mapping
- (Trimming)
- Visualisation

Class 3: Project Data
http://bioinf.gen.tcd.ie/GE3M25/project
Antimicrob. Agents Chemother. (2014)

Class 3: Project Data
Background: Candida
- genus of yeasts
- most common cause of fungal infections worldwide
- commensal
- opportunistic
- azole-based antifungal drugs
(CC) Image: Y. Tambe. Microscopic image (200-fold magnification) of Candida albicans ATCC 10231, grown on cornmeal agar medium with 1% Tween80

Class 3: Project Data
Background: Azoles
- class of five-membered heterocyclic compounds
- discovery of ketoconazole in 1980s
- inhibits 14-alpha-demethylase
(CC) Image: David E. Volk

Class 3: Project Data
Background: Candida glabrata
- most common non-albicans Candida
- highest incidence of azole resistance
- resistance mostly due to mutations in PDR1 or to cells without mitochondrial function
Candida glabrata. Image: Billy J. Carver

Class 3: Project Data
Question: What are the direct gene targets of PDR1?
→ ChIP-Seq analysis of a pdr1Δ strain carrying a tagged PDR1 construct and treated with ethidium bromide

Next Generation Sequencing - Applications
Xu F, Wang Q, Zhang F, Zhu Y, Gu Q, Wu L, Yang L, Yang X. Impact of Next-Generation Sequencing (NGS) technology on cardiovascular disease research. Cardiovasc Diagn Ther 2012;2(2):

ChIP-Seq Basics
ChIP = Chromatin ImmunoPrecipitation
Chromatin = highly ordered packaging of DNA and histones together
Source: Bio-Rad

Rosa, S.; Shaw, P. Insights into Chromatin Structure and Dynamics in Plants. Biology 2013, 2,

ChIP-Seq Basics
Immunoprecipitation (IP) is the technique of precipitating a protein antigen out of solution using an antibody that specifically binds to that particular protein.

ChIP-seq controls:
- Input chromatin
- Immunoglobulin G antibodies
- Knock-outs
- ChIP without antibody
Remove peaks that are due to cross-reactivity of the antibody
Remove unspecific binding (background noise)

GE3M25 Project Step 1: Download data
1. Browse to bioinf.gen.tcd.ie/GE3M25/ngs
2. Click on data link (class 3)

GE3M25 Project Steps in this class:
1. Download FastQ data set
2. Quality control (FastQC)
3. Storage of FastQC report file
4. Read mapping (Bowtie2)
5. Generate indexed and sorted BAM file
6. Visualisation (IGV)
7. Removal of PCR duplicates
8. Store BAM and index files

GE3M25 Project Data Paired-end Sequencing vs Single-end Sequencing

GE3M25 Project Data files:
1. Paired-end compressed FastQ files
   ChIP'ed data: xxxxxxx_exp_1.fastq.bz2, xxxxxxx_exp_2.fastq.bz2
   Control data: xxxxxxx_ctrl_1.fastq.bz2, xxxxxxx_ctrl_2.fastq.bz2
2. Genome sequence

GE3M25 Project Steps in this class:
1. Download FastQ data set → 4 files
2. Quality control (FastQC) → on each file individually
3. Storage of FastQC report file
4. Read mapping (Bowtie2) → in pairs
5. Generate indexed and sorted BAM file → 2 sets
6. Visualisation (IGV)
7. Removal of PCR duplicates
8. Store BAM and index files (exp and ctrl)

GE3M25 Project Step 2: Info for project report
- Data details (# sequences, read length, etc.)
- Which type of quality encoding is used (Phred+33 or Phred+64)
- Comments on quality aspects
- Highlight of potential issues
- Discuss ways to clean up data
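
The data details asked for above (number of sequences, read length) can also be checked with a few lines of Python. This is a minimal sketch, assuming standard four-line FastQ records; the function names `fastq_stats` and `fastq_stats_from_bz2` and the example reads are made up for illustration, and FastQC reports the same figures.

```python
import bz2
from collections import Counter


def fastq_stats(lines):
    """Return (number of reads, Counter of read lengths) for four-line FastQ records."""
    n_reads = 0
    lengths = Counter()
    for i, line in enumerate(lines):
        # Each record has four lines; the sequence is the second one.
        if i % 4 == 1:
            n_reads += 1
            lengths[len(line.strip())] += 1
    return n_reads, lengths


def fastq_stats_from_bz2(path):
    """Same as fastq_stats, but reads a bz2-compressed FastQ file in text mode."""
    with bz2.open(path, "rt") as handle:
        return fastq_stats(handle)


# Small in-memory example with two made-up reads
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "ACGTAC\n", "+\n", "IIIIII\n",
]
n_reads, lengths = fastq_stats(example)
print(n_reads, dict(lengths))
```

For the project data, `fastq_stats_from_bz2("xxxxxxx_exp_1.fastq.bz2")` would be called once per file, with the placeholder replaced by the real file name.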

Quality Information: conversion of quality score
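
As a minimal sketch of the conversion: a Phred score Q relates to the error probability P via Q = -10 * log10(P), and FastQ stores each score as one ASCII character with an offset of 33 (Phred+33) or 64 (Phred+64). The function names below are hypothetical, and `guess_offset` is only a simple heuristic: it assumes that characters below ASCII 59 never occur in +64 data, so it is reliable only when enough reads with low-quality bases are inspected.

```python
def phred_scores(quality_string, offset=33):
    """Convert a FastQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def guess_offset(quality_strings):
    """Heuristic guess of the encoding offset (33 or 64) from observed characters."""
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    # Assumption: +64 encodings do not use characters below ASCII 59.
    return 33 if lowest < 59 else 64


print(phred_scores("I5!"))    # [40, 20, 0]
print(error_probability(20))  # about 0.01, i.e. 1 error in 100 base calls
print(guess_offset(["I5!"]))  # 33
```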

GE3M25 Project Step 3: Storage of FastQC report
1. Open HTML report in browser
2. Copy and paste information into a Word document, or Ctrl-click to copy images (or use Grab for screenshots)
3. Mail document to yourself, store on USB/Network, or upload HTML file through

GE3M25 Project – Read Mapping
Build an index of the genome:
Syntax: bowtie2-build fasta_file index_name
e.g. bowtie2-build ASM254v2.fa C_glabrata
The index name (here C_glabrata) is to be used in the mapping step!

Bowtie2 mapping:
1. Single-end data:
   bowtie2 -U xxxxxxx_exp_1.fastq.bz2 -x C_glabrata -p 4 > exp1.sam
2. Paired-end data:
   bowtie2 -1 file1 -2 file2 -x C_glabrata -p 4 > exp.sam
   e.g.: bowtie2 -1 xxxxxxx_exp_1.fastq.bz2 -2 xxxxxxx_exp_2.fastq.bz2 -x C_glabrata -p 4 > exp.sam
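
The two call patterns differ only in how the read files are passed. As a hedged sketch, the Python helper below (hypothetical name `bowtie2_command`) assembles the argument list for either case from the Bowtie2 options shown above, plus `-S`, which names the SAM output file instead of redirecting with `>`. It only builds the command; running it assumes `bowtie2` is installed and the index has been built.

```python
def bowtie2_command(index, sam_out, reads1, reads2=None, threads=4):
    """Build a bowtie2 argument list for single-end or paired-end mapping."""
    cmd = ["bowtie2", "-x", index, "-p", str(threads)]
    if reads2 is None:
        cmd += ["-U", reads1]                # single-end: unpaired reads
    else:
        cmd += ["-1", reads1, "-2", reads2]  # paired-end: mate 1 and mate 2
    cmd += ["-S", sam_out]                   # write alignments to this SAM file
    return cmd


paired = bowtie2_command(
    "C_glabrata", "exp.sam",
    "xxxxxxx_exp_1.fastq.bz2", "xxxxxxx_exp_2.fastq.bz2",
)
print(" ".join(paired))
# To run the mapping (requires Bowtie2 and the index files):
#   import subprocess; subprocess.run(paired, check=True)
```

The same helper would be called a second time with the two ctrl files to produce the control alignment.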

GE3M25 Project – Sorting and Indexing
1. Change SAM to BAM format:
   tools/samtools view -b exp.sam > exp.bam
2. Sort:
   tools/samtools sort exp.bam > exp_sorted.bam
3. Sorting with 4 threads for speed-up:
   tools/samtools sort -@ 4 exp.bam > exp_sorted.bam
4. Generate an index:
   tools/samtools index exp_sorted.bam
   → produces a file with .bai ending
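
The conversion, sorting and indexing steps can be chained from Python. This is a minimal sketch, assuming a samtools version (1.3 or later) in which `view` and `sort` accept `-o` for the output file, and that `samtools` is on the PATH rather than under `tools/`. The function name `bam_pipeline_commands` is hypothetical, and the commands are only built here, not executed.

```python
def bam_pipeline_commands(sam_file, threads=4, samtools="samtools"):
    """Return the samtools commands that turn a SAM file into a sorted, indexed BAM."""
    stem = sam_file[:-4] if sam_file.endswith(".sam") else sam_file
    bam = stem + ".bam"
    sorted_bam = stem + "_sorted.bam"
    return [
        [samtools, "view", "-b", "-o", bam, sam_file],                  # SAM -> BAM
        [samtools, "sort", "-@", str(threads), "-o", sorted_bam, bam],  # sort by position
        [samtools, "index", sorted_bam],                                # writes sorted_bam + ".bai"
    ]


commands = bam_pipeline_commands("exp.sam")
for cmd in commands:
    print(" ".join(cmd))
    # To execute each step: import subprocess; subprocess.run(cmd, check=True)
```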

GE3M25 Project Optional steps in this class:
- Soft-trimming (Bowtie2)
- Trimming for quality and adapters (trim_galore)
  → Try to maximise number of uniquely mapped reads!
5. Comparison of results
6. Upload of most suitable BAM and index files

GE3M25 Project Step 5: Generate indexed and sorted BAM file
Sequence Alignment/Map (SAM) Format
- Standard format for read mapping results
- Can be compressed to save space: binary SAM → BAM format
- Can be indexed for random access
- samtools allows viewing and processing SAM data
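
Because SAM is a tab-separated text format, the number of mapped reads can be checked with a short script. The sketch below assumes plain (uncompressed) SAM lines and relies on the FLAG field in the second column, where bit 0x4 marks an unmapped read and bits 0x100 and 0x800 mark secondary and supplementary alignments. The function name and the example records are made up for illustration.

```python
def count_mapped(sam_lines):
    """Count mapped and unmapped reads from SAM lines, using the FLAG field."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # secondary/supplementary alignment: count each read only once
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Made-up example: one header line, two mapped reads, one unmapped read
example_sam = [
    "@SQ\tSN:chrA\tLN:1000\n",
    "read1\t0\tchrA\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read3\t16\tchrA\t50\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
]
print(count_mapped(example_sam))  # (2, 1)
```

Counting this way before and after trimming gives a quick comparison of how many reads map.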

GE3M25 Project Step 6: Visualisation with IGV (Integrative Genomics Viewer)
Requires several data files:
- Genome
- BAM file
- BAM index file

start IGV (Integrative Genomics Viewer)

load 'ASM254v2.fa' from Download directory as genome

load from file: mapped reads (exp.bam)

→ click and drag region to zoom in

→ coverage
→ individual reads

Exercises
- Clean your data via trimming
- Run Bowtie2 with different parameters
How do these steps affect the number of mapped reads?
How do they affect the peaks that you see in IGV?

GE3M25 Project Storage of BAM file
Upload BAM and index files (XXX.bam.bai) through upload page!
bioinf.gen.tcd.ie/cgi-bin/GE3M25/upload.pl

Don't forget to log out!
