GE3M25: Data Analysis, Class 4


1 GE3M25: Data Analysis, Class 4
TCD, 30/11/2017 Karsten Hokamp, PhD Genetics

2 Python 6 Functions, Regex
Week 10: NGS 1 Introduction; GE3M11 Exam
Week 11: (no sessions)
Week 12: Python 1 Introduction; Python 2 Strings and Files; NGS 2 QC, Trimming
Week 13: Python 3 File I/O, Branching; Python 4 Modules, Lists, Sets; NGS 3 Mapping
Week 14: Python 5 Dictionaries; Python 6 Functions, Regex; NGS 4 Peak Calling

3 ChIP-Seq project report
Week 15: NGS 5 Gene Lists, Tuning; NGS 6 / Python 7 Pipelines; NGS 7 / Python 8 Revision
Week 16: Python Exam
January 2018: ChIP-Seq project report

4 Marks for GE3M25
Python exam: 50% (2/3 data handling, 1/3 statistics)
ChIP-Seq report: 50%
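Put differently, each part's share of the final module mark follows from multiplying the weights above. A small Python sketch of that arithmetic (the names `weights` and `overall_mark` are illustrative, not part of the course material):

```python
from fractions import Fraction

# Weights as stated on the slide: exam and report count 50% each,
# and the exam itself is split 2/3 data handling, 1/3 statistics.
EXAM = Fraction(1, 2)
REPORT = Fraction(1, 2)

weights = {
    "data handling (exam)": EXAM * Fraction(2, 3),
    "statistics (exam)": EXAM * Fraction(1, 3),
    "ChIP-Seq report": REPORT,
}


def overall_mark(data_handling: float, statistics: float, report: float) -> float:
    """Combine three component marks (each out of 100) into one module mark."""
    return float(
        data_handling * weights["data handling (exam)"]
        + statistics * weights["statistics (exam)"]
        + report * weights["ChIP-Seq report"]
    )


for part, w in weights.items():
    print(f"{part}: {w} of the module mark ({float(w):.1%})")
```

This prints 1/3 (33.3%) for data handling, 1/6 (16.7%) for statistics and 1/2 (50.0%) for the report.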

5 Python exam
Date: Mon, 11th Dec, 11am – 12.45
Venue: Mac Lab
Structure: 10 multiple-choice questions (20 points); 4 programming tasks: 2 short ones (30 points each) and 2 more involved ones (50 points each)
Submission: multiple-choice test (1 sheet print-out); 1–2 Python scripts with execution output (file upload)

6 Python exam
Material: anything from the course website, the official Python documentation, Python books
Content: material covered during classes
Notes:
Add comments!
Include a copy of the output (Terminal/IDLE)
Include your student ID in the script and in the file name
Submit frequently: only the last version counts
Even scripts that don't work can receive points

7 Class 4: Project overview; Visualisation; Peak detection; Motif detection

8 ChIP-Seq
Different sets of genes are expressed under different conditions.
This is regulated through transcription factors that bind to promoters.
Binding can be captured by ChIP.
Enriched regions are revealed through NGS.

9

10 Class 1: ChIP-seq data analysis in a nutshell

11 ChIP-Seq Analysis Goal

12 Recap – From Reads to Peaks (Visualisation)
Reference (Fasta format) → bowtie2-build → Index files (*.bt2)
NGS data (FastQ format) + index files → bowtie2 → Mapped reads (SAM format)
Mapped reads → samtools → Sorting/indexing (*.bam, *.bai) → IGV

13 Recap – From Reads to Peaks (Visualisation)
Reference (Fasta format) → bowtie2-build → Index files (*.bt2)
NGS data (FastQ format) + index files → bowtie2 → Mapped reads (SAM format)
Mapped reads → samtools → Sorting/indexing (*.bam, *.bai) → BigWig file
BAM and BigWig files → IGV

14 Recap – From Reads to Peaks (Calling)
Reference (Fasta format) → bowtie2-build → Index files (*.bt2)
NGS data (FastQ format) + index files → bowtie2 → Mapped reads (SAM format)
Mapped reads → samtools → Sorting/indexing (*.bam, *.bai) → Gem → Peak list, motifs

15 Project Data http://bioinf.gen.tcd.ie/GE3M25/project
Antimicrob. Agents Chemother. (2014)

16 Project Data
Three strains: wild type, TAP-Pdr1, Pdr1-k.o.

17 Project Data
Three strains (wild type, TAP-Pdr1, Pdr1-k.o.), two antibodies (Pdr1 antibody, TAP antibody)

18 Project Data Paul et al. Figure 2A

19 Project Data Potential consensus for the C. glabrata PDR1 binding site
Paul et al. Figure 2B
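The consensus itself is only shown in the figure. To illustrate how a consensus written in IUPAC codes can be searched with the regular expressions from Python 6, here is a minimal sketch that converts a consensus string into a regex and scans both strands. The motif GATWAC used below is purely hypothetical and is not the PDR1 consensus from Paul et al.; the function names are illustrative.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Turn an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, consensus: str) -> list[tuple[int, str, str]]:
    """Return (0-based start, strand, matched text) for sites on both strands.

    Reverse-strand hits are reported in forward-strand coordinates.
    A zero-width lookahead lets overlapping matches be found too.
    """
    pattern = consensus_to_regex(consensus)
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    seq = seq.upper()
    width = len(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in lookahead.finditer(seq)]
    for m in lookahead.finditer(reverse_complement(seq)):
        hits.append((len(seq) - m.start() - width, "-", m.group(1)))
    return sorted(hits)
```

For example, `find_sites("TTGATAACGGGTTATCAA", "GATWAC")` reports one hit on each strand. Note that a palindromic consensus would be reported once per strand.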

20 GE3M25 Project Previous steps:
Download FastQ data set (ChIP-Seq of TF in yeast) ✔
Quality assessment with FastQC ✔
Read mapping (Bowtie2) ✔
Generate indexed and sorted BAM file ✔
Visualisation in IGV ✔
Store BAM and index files ✔

21 GE3M25 Project Data Download: Start here: bioinf.gen.tcd.ie/GE3M25

22 GE3M25 Project Data Download: NGS page: bioinf.gen.tcd.ie/GE3M25/ngs

23 GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data
Main data files (Fastq format)

24 GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data/fastq
Control data files and ChIP data files: download the files that have your student ID

25 Preparations – new tools folder
1. Rename the previous directory (in Terminal): mv tools tools.prev
If you see mv: rename tools to tools.prev: No such file or directory then there was no tools directory, which is fine!

26 GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data
additional files in tools.zip

27 Preparations Tools Rename previous directory (in Terminal)
Download 'tools.zip' from the webpage
Unpack the archive (if not done by the browser): unzip tools.zip
If you see unzip: cannot find or open tools.zip, tools.zip.zip then it was already unpacked during download

28 Preparations Tools Rename previous directory (in Terminal)
Download 'tools.zip' from the webpage
Unpack the archive (if not done by the browser)
Check the content of the folder: ls -lh tools

29 Preparations

30 Preparations Download tools.zip (class 4) again if this is missing!

31 Data Processing Indexing Mapping Compressing Sorting Visualisation

32 Data Processing Indexing Mapping Compressing Sorting BigWig generation
Visualisation Peak/Motif detection

33 Data Processing Indexing Mapping Compressing Sorting BigWig generation
Visualisation Peak/Motif detection can be combined

34 Data Processing Indexing Mapping | Compressing | Sorting
BigWig generation Visualisation Peak/Motif detection

35 GE3M25 Project – Read Mapping
Build an index of the genome:
Syntax: bowtie2-build fasta_file index_name
e.g. tools/bowtie2-build ASM254v2.fa C_glabrata
The index name (here C_glabrata) is used again in the mapping step!
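Before building the index it can be worth checking what the reference FASTA actually contains. A minimal Python sketch, assuming an uncompressed FASTA file such as ASM254v2.fa; `fasta_lengths` is an illustrative name, not a course-provided function:

```python
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Yield (sequence name, length) for each record in FASTA-formatted lines.

    The name is the first word of the header line after '>'.
    """
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length  # finish the previous record
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)  # sequence may be wrapped over many lines
    if name is not None:
        yield name, length


# Usage:
# with open("ASM254v2.fa") as fh:
#     for name, length in fasta_lengths(fh):
#         print(name, length, sep="\t")
```

The tab-separated output (name, length) has the same two-column layout as a chromosome sizes file.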

36 GE3M25 Project – Read Mapping
Bowtie2 mapping:
1. Single-end data: tools/bowtie2 -U _exp_1_fastq.bz2 -x C_glabrata -p 4 > exp1.sam
2. Paired-end data: tools/bowtie2 -1 file1 -2 file2 -x C_glabrata -p 4 > exp.sam
e.g.: tools/bowtie2 -1 _exp_1_fastq.bz2 -2 _exp_2_fastq.bz2 -x C_glabrata -p 4 > exp.sam
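The choice between -U and -1/-2 depends only on how many read files there are, which a small Python helper can make explicit. This is a sketch: `bowtie2_command` is a hypothetical helper, and the default path tools/bowtie2 assumes the tools folder from tools.zip.

```python
def bowtie2_command(index: str, reads: list[str], threads: int = 4,
                    exe: str = "tools/bowtie2") -> list[str]:
    """Build a bowtie2 argument list.

    One read file  -> single-end mapping with -U.
    Two read files -> paired-end mapping with -1 / -2 (mate order matters).
    """
    cmd = [exe, "-x", index, "-p", str(threads)]
    if len(reads) == 1:
        cmd += ["-U", reads[0]]
    elif len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]
    else:
        raise ValueError("expected one (single-end) or two (paired-end) read files")
    return cmd


# Example use (this actually runs bowtie2, so the index and read files must exist):
# import subprocess
# with open("exp.sam", "wb") as sam:
#     subprocess.run(bowtie2_command("C_glabrata", ["_exp_1_fastq.bz2", "_exp_2_fastq.bz2"]),
#                    stdout=sam, check=True)
```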

37 GE3M25 Project – Sorting and Indexing
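1. Change SAM to BAM format: tools/samtools view -b exp.sam > exp.bam
2. Sorting with 4 threads for speed-up: tools/samtools sort -@ 4 exp.bam > exp_sorted.bam
3. Indexing: tools/samtools index exp_sorted.bam (creates exp_sorted.bam.bai)
exp.sam and exp.bam are intermediates; exp_sorted.bam is the results file.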
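When these steps are run from a Python script instead of the Terminal, the '>' redirection becomes an open file passed as stdout. A minimal sketch using the standard subprocess module; `run_step` is an illustrative name and the tools/ paths are assumptions:

```python
import subprocess


def run_step(cmd: list[str], out_path: str) -> None:
    """Run one command with its standard output redirected into out_path (like '> out_path').

    check=True raises subprocess.CalledProcessError when the tool exits with an
    error, so a failed step stops the script instead of leaving a broken file unnoticed.
    """
    with open(out_path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


# Converting, sorting and indexing (paths assume the tools folder from tools.zip):
# run_step(["tools/samtools", "view", "-b", "exp.sam"], "exp.bam")
# run_step(["tools/samtools", "sort", "-@", "4", "exp.bam"], "exp_sorted.bam")
# subprocess.run(["tools/samtools", "index", "exp_sorted.bam"], check=True)
```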

38 Data Processing output from left is used as input on right of pipe
Mapping | Compressing | Sorting, all on one line:
tools/bowtie2 -1 file1 -2 file2 -x index | tools/samtools view -b - | tools/samtools sort - > out.bam
Input file names in the middle of the pipe are replaced with '-' (read from the pipe); the final output is redirected into a file.
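The same pipe can be built in Python with subprocess.Popen, connecting each process's stdout to the next process's stdin, following the pattern described in the subprocess documentation. A sketch only: `run_pipeline` is a hypothetical helper, and the shell one-liner above does the same job.

```python
import subprocess


def run_pipeline(commands: list[list[str]], out_path: str) -> None:
    """Run 'cmd1 | cmd2 | ... > out_path' without a shell.

    Each process reads the previous process's stdout on its stdin;
    the last process writes into out_path.
    """
    procs = []
    prev_stdout = None
    with open(out_path, "wb") as out:
        for i, cmd in enumerate(commands):
            is_last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Drop the parent's copy of the pipe so the upstream process
                # sees a broken pipe if a downstream process exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        for proc in procs:
            proc.wait()
    failed = [proc.args for proc in procs if proc.returncode != 0]
    if failed:
        raise RuntimeError(f"pipeline step(s) failed: {failed}")


# The mapping pipeline from the slide, expressed as argument lists:
# run_pipeline([
#     ["tools/bowtie2", "-1", "file1", "-2", "file2", "-x", "index"],
#     ["tools/samtools", "view", "-b", "-"],
#     ["tools/samtools", "sort", "-"],
# ], "out.bam")
```

Unlike the shell, this reports which step failed, since every process's exit status is checked.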

39 make output name descriptive
Data Processing: Indexing, then Mapping | Compressing | Sorting, e.g.:
tools/bowtie2 -x C_glabrata -p 4 -1 _exp_1_fastq.bz2 -2 _exp_2_fastq.bz2 | tools/samtools view -b - | tools/samtools sort - > exp.sorted.bam
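A descriptive output name can also be derived from the input file name with a regular expression. This sketch assumes a hypothetical naming pattern in which the two mate files end in _1_fastq.bz2 and _2_fastq.bz2, as in the example above; `sorted_bam_name` is an illustrative name, and the pattern should be adjusted to the real file names.

```python
import re

# Hypothetical naming pattern: <sample>_1_fastq.bz2 / <sample>_2_fastq.bz2
MATE_SUFFIX = re.compile(r"_[12]_fastq\.bz2$")


def sorted_bam_name(fastq_name: str) -> str:
    """Derive an output name, e.g. 'x_exp_1_fastq.bz2' -> 'x_exp.sorted.bam'."""
    sample, n = MATE_SUFFIX.subn("", fastq_name)
    if n == 0:
        raise ValueError(f"unexpected FASTQ file name: {fastq_name}")
    return sample + ".sorted.bam"
```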

40 Data Processing Indexing Mapping | Compressing | Sorting
BigWig generation Visualisation Peak/Motif detection

41 file that lists BAM files
Data Processing
The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph.
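The values in such a track are essentially read depth summarised over genomic intervals. The sketch below only illustrates that idea in plain Python: it turns read alignment coordinates on one chromosome into mean depth per fixed-width bin. It does not write a bigWig file; the bin size, the read coordinates and the name `binned_coverage` are assumptions for illustration, and the actual conversion is done by the tool that reads the file listing BAM files.

```python
def binned_coverage(reads: list[tuple[int, int]], chrom_length: int,
                    bin_size: int = 50) -> list[float]:
    """Mean read depth per fixed-width bin along one chromosome.

    reads: (start, end) of aligned reads, 0-based, end exclusive.
    Returns one value per bin: covered bases in the bin / bin width.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    covered = [0] * n_bins
    for start, end in reads:
        start, end = max(start, 0), min(end, chrom_length)
        pos = start
        while pos < end:
            b = pos // bin_size
            bin_end = min((b + 1) * bin_size, end)
            covered[b] += bin_end - pos  # bases of this read inside bin b
            pos = bin_end
    # The last bin may be shorter than bin_size.
    return [
        covered[b] / (min((b + 1) * bin_size, chrom_length) - b * bin_size)
        for b in range(n_bins)
    ]
```

For example, two overlapping reads at (0, 10) and (5, 15) on a 20 bp sequence with 10 bp bins give depths of 1.5 and 0.5.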

42 GE3M25 Project

43 Kill stuck IGV via Activity Monitor

44 GE3M25 Project New file with .bw ending:
Load .bam and .bw files into IGV

45 BigWig track visible across whole genome!

46 GE3M25 Project Data formats: Fastq SAM BAM BAM index BigWig

47 GE3M25 Project Peak calling with GEM
Required input parameters: BAM file, Fasta file with reference sequence, file with chromosome size(s), genome size, read distribution, output directory

48 GE3M25 Project Peak calling with GEM
java -jar tools/gem/gem.jar --expt exp.sorted.bam --f BAM --genome . --g chrom.sizes.txt --s --d tools/gem/Read_Distribution_default.txt --out peaks
--expt: BAM file
--f: input file format (BAM)
--genome: directory with fasta file(s)
--g: file with chromosome size(s)
--s: genome size (a number must follow --s)
--d: read distribution
--out: output directory
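Because the GEM call is long, it can help to assemble it in one place. A Python sketch, with `gem_command` as a hypothetical helper: the slide does not show the genome size value, so it is a required parameter here, and the optional control file and motif range correspond to the --ctrl and --k_min/--k_max options from the later slides.

```python
from typing import Optional


def gem_command(expt_bam: str, genome_size: int, out_dir: str = "peaks",
                ctrl_bam: Optional[str] = None,
                k_range: Optional[tuple[int, int]] = None) -> list[str]:
    """Assemble the GEM call from the slides as an argument list for subprocess."""
    cmd = [
        "java", "-jar", "tools/gem/gem.jar",
        "--expt", expt_bam,                  # BAM file with the ChIP reads
        "--f", "BAM",                        # input format
        "--genome", ".",                     # directory with the FASTA file(s)
        "--g", "chrom.sizes.txt",            # chromosome size(s)
        "--s", str(genome_size),             # genome size (not given on the slide)
        "--d", "tools/gem/Read_Distribution_default.txt",  # read distribution
        "--out", out_dir,                    # output directory
    ]
    if ctrl_bam is not None:
        cmd += ["--ctrl", ctrl_bam]          # control sample to remove noise
    if k_range is not None:
        k_min, k_max = k_range
        cmd += ["--k_min", str(k_min), "--k_max", str(k_max)]  # motif finding
    return cmd


# Example (runs GEM; GENOME_SIZE is a placeholder you must set yourself):
# import subprocess
# subprocess.run(gem_command("exp.sorted.bam", GENOME_SIZE,
#                            ctrl_bam="ctrl.sorted.bam", k_range=(6, 13)), check=True)
```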

49 GE3M25 Project Download these two files

50 GE3M25 Project Running Gem:

51 GE3M25 Project Output produced by GEM:

52 GE3M25 Project Check out top peaks: head peaks/peaks_GPS_events.txt
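The same first look at the file is possible from Python, which also splits each line into its fields. A minimal sketch; it assumes the events file is tab-separated and makes no assumption about which columns it contains (`head` here is an illustrative Python function, not the shell command):

```python
from itertools import islice


def head(path: str, n: int = 10) -> list[list[str]]:
    """Return the first n lines of a tab-separated file, split into fields (like 'head')."""
    with open(path) as fh:
        return [line.rstrip("\n").split("\t") for line in islice(fh, n)]


# Usage:
# for fields in head("peaks/peaks_GPS_events.txt"):
#     print(fields)
```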

53 GE3M25 Project Peak calling with GEM

54 GE3M25 Project Peak calling with GEM
Add parameters to initiate motif finding: --k_min 6 --k_max 13
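--k_min and --k_max set the range of k-mer lengths considered in motif finding. GEM's own motif discovery is considerably more involved than counting, but the basic ingredient, every k-mer of each length in that range, can be sketched like this (`kmer_counts` is an illustrative name; the input sequences would be, for example, regions around the top peaks):

```python
from collections import Counter


def kmer_counts(seqs: list[str], k_min: int = 6, k_max: int = 13) -> dict[int, Counter]:
    """Count every k-mer of length k_min..k_max (inclusive) in the given sequences."""
    counts = {}
    for k in range(k_min, k_max + 1):
        c = Counter()
        for seq in seqs:
            seq = seq.upper()
            for i in range(len(seq) - k + 1):
                c[seq[i:i + k]] += 1
        counts[k] = c
    return counts


# Usage: the most frequent 8-mers across the peak sequences
# kmer_counts(peak_sequences)[8].most_common(10)
```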

55 GE3M25 Project Output produced by GEM: open peaks/peaks_result.htm

56 GE3M25 Project Peak calling with GEM Add control file to remove noise:
--ctrl ctrl.sorted.bam
Check how the detected peaks and motifs differ!
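The intuition behind the control: a region that also has many reads in the control sample (e.g. from open chromatin or repetitive sequence) is less likely to be a real binding site. GEM uses its own statistical model for this; the sketch below only shows a simple library-size-normalised fold enrichment with a pseudocount, and the function name and parameters are illustrative assumptions.

```python
def fold_enrichment(chip_count: int, ctrl_count: int,
                    chip_total: int, ctrl_total: int,
                    pseudocount: float = 1.0) -> float:
    """ChIP vs control read counts in one region, scaled for library size.

    The control count is rescaled to the ChIP library size; the pseudocount
    avoids division by zero where the control has no reads.
    """
    scale = chip_total / ctrl_total
    return (chip_count + pseudocount) / (ctrl_count * scale + pseudocount)
```

Rescaling matters when the two libraries differ in depth: with a control library twice as large, its raw counts are halved before the ratio is taken.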

57 GE3M25 Project Calculate chromosome sizes
tools/samtools idxstats exp_sorted.bam | cut -f 1,2 > chrom.sizes
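samtools idxstats prints one tab-separated line per reference sequence (name, length, mapped reads, unmapped reads) and a final line starting with '*' for unmapped reads without a position, so the cut command above also keeps a '*' line with length 0. A Python sketch that parses the same output and skips that line (`chrom_sizes_from_idxstats` is an illustrative name):

```python
def chrom_sizes_from_idxstats(text: str) -> list[tuple[str, int]]:
    """Parse `samtools idxstats` output into (sequence name, length) pairs.

    The '*' line for unplaced unmapped reads is skipped.
    """
    sizes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length = line.split("\t")[:2]
        if name == "*":
            continue
        sizes.append((name, int(length)))
    return sizes


# Getting the text from samtools (path assumes the tools folder):
# import subprocess
# out = subprocess.run(["tools/samtools", "idxstats", "exp_sorted.bam"],
#                      capture_output=True, text=True, check=True).stdout
# for name, length in chrom_sizes_from_idxstats(out):
#     print(name, length, sep="\t")
```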

58 GE3M25 Project Storage of results files
Upload .bam, .bam.bai, .bw, etc. files through bioinf.gen.tcd.ie/GE3M25/project

59 Don't forget to log out!

