Dowell Short Read Class Phillip Richmond


1 ReSequencing

2 Outline
The plan:
- Organize and copy data to your own working directory
- Map reads back to a reference genome
- Convert SAM to BAM
- Remove duplicates
- Run a variant caller
- Visualize variants

3 Plan
In the first round of variant calling, we will cut the genome of the yeast strain Sigma1278b into reads, map them back to the S288c reference genome, and then find all SNP differences between the two genomes.
The data are synthetic: the reads have already been produced for you in FASTQ format as single-end 50 bp reads (1x50 bp).
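To make "cutting a genome into reads" concrete, here is a minimal sketch of how synthetic single-end reads could be produced. It is not the course's simulator, whose method and parameters are not stated: it assumes a sliding window with a fixed step, forward-strand reads only, no sequencing errors, and a constant placeholder quality character. The names parse_fasta and tile_reads are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: uppercase sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts).upper() for n, parts in chunks.items()}


def tile_reads(seq_id: str, seq: str, read_len: int = 50, step: int = 10,
               qual_char: str = "I") -> Iterator[str]:
    """Yield 4-line FASTQ records for a read_len window taken every step bases.

    Windows that would run past the end of the sequence are skipped, and every
    base gets the same placeholder quality character (an assumption).
    """
    for start in range(0, len(seq) - read_len + 1, step):
        read = seq[start:start + read_len]
        # Encode the 1-based origin in the read name so mappings can be checked later.
        yield f"@{seq_id}_{start + 1}\n{read}\n+\n{qual_char * read_len}\n"
```

Usage: open a genome FASTA, pass the file handle to parse_fasta, then write tile_reads(name, seq) for every sequence to a .fastq file. Encoding the true origin in each read name makes it easy to check where the mapper placed each read.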

4 Getting started
Organization is KEY!!
For the resequencing tutorial, this is the organization you will need:
- Make a new directory in your home directory called ReSequencing
- Inside ReSequencing, make the subdirectories GENOME, FASTQ, SAM, VCF, and PBS
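The directory layout above can be created in one step. This is a minimal sketch using Python's pathlib, equivalent to running mkdir -p for each subdirectory; the function name make_layout is hypothetical.

```python
from pathlib import Path

SUBDIRS = ["GENOME", "FASTQ", "SAM", "VCF", "PBS"]


def make_layout(home: Path) -> Path:
    """Create <home>/ReSequencing and its subdirectories; return the ReSequencing path."""
    root = Path(home) / "ReSequencing"
    for sub in SUBDIRS:
        # parents=True also creates ReSequencing; exist_ok=True makes reruns harmless.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root
```

Calling make_layout(Path.home()) builds the tree in your home directory. Because of exist_ok=True, running it a second time changes nothing.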

5 Copying the data
Now copy the data from /projects/sreadgrp/homeworkfiles/ReSequencing/:
- Copy the FASTQ file Sigmav7_50mers.fastq from FASTQ/ to your own FASTQ/ directory
- Copy SGDv4.fasta from GENOME/ to your own GENOME/ directory
- Copy the PBS files IndexGenome.pbs, MapReads.pbs, Sam2Bam.pbs, IndelRealign.pbs, and CallSNPs.pbs to your own PBS/ directory
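The copy step can also be scripted. This sketch assumes the PBS files sit in a PBS/ subdirectory of the course directory (the path /projects/sreadgrp/homeworkfiles/ReSequencing/PBS/ is used later in the class) and that the other files sit in FASTQ/ and GENOME/ as listed. The names FILES and copy_inputs are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List, Tuple

# Course directory from the slide; files are listed as (subdirectory, file name).
SOURCE = Path("/projects/sreadgrp/homeworkfiles/ReSequencing")
FILES: List[Tuple[str, str]] = [
    ("FASTQ", "Sigmav7_50mers.fastq"),
    ("GENOME", "SGDv4.fasta"),
    ("PBS", "IndexGenome.pbs"),
    ("PBS", "MapReads.pbs"),
    ("PBS", "Sam2Bam.pbs"),
    ("PBS", "IndelRealign.pbs"),
    ("PBS", "CallSNPs.pbs"),
]


def copy_inputs(source: Path, dest: Path) -> List[Path]:
    """Copy each listed file from source/<sub>/ into dest/<sub>/ and return the new paths."""
    copied = []
    for sub, name in FILES:
        target_dir = dest / sub
        target_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps; it raises FileNotFoundError if a source file is missing.
        copied.append(Path(shutil.copy2(source / sub / name, target_dir / name)))
    return copied
```

For example, copy_inputs(SOURCE, Path.home() / "ReSequencing") copies all seven files. A missing source file stops the script with an error, so a later step cannot silently run on incomplete inputs.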

6 Index the genome (IndexGenome.pbs)
Command:
/opt/bowtie/bowtie /bowtie2-build <in.fasta> <out_index>
<out_index> is a file name prefix: bowtie2-build writes several index files ending in .bt2, all starting with that prefix.
My command:
/opt/bowtie/bowtie /bowtie2-build /Users/richmonp/ReSequencing/GENOME/SGDv4.fasta /Users/richmonp/ReSequencing/GENOME/SGDv4_bowtie2_Index
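Before mapping, it can help to confirm that the index was actually written. This sketch builds the bowtie2-build command as an argument list, runs it with subprocess, and checks for the six files bowtie2-build writes for a small (non-"large") index. The executable path is a parameter so you can point it at your cluster's install. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List

# File suffixes bowtie2-build writes for a small (non-"large") index.
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def build_index_cmd(bowtie2_build: str, fasta: str, index_prefix: str) -> List[str]:
    """argv for: bowtie2-build <in.fasta> <out_index>"""
    return [bowtie2_build, fasta, index_prefix]


def index_files(index_prefix: str) -> List[Path]:
    """Paths bowtie2-build is expected to create for this prefix."""
    return [Path(index_prefix + s) for s in BT2_SUFFIXES]


def build_index(bowtie2_build: str, fasta: str, index_prefix: str) -> None:
    # check=True raises CalledProcessError if bowtie2-build exits with an error.
    subprocess.run(build_index_cmd(bowtie2_build, fasta, index_prefix), check=True)
    missing = [p for p in index_files(index_prefix) if not p.exists()]
    if missing:
        raise RuntimeError(f"index incomplete, missing: {missing}")
```

Passing the command as a list, rather than as one string through a shell, avoids problems with spaces or special characters in paths.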

7 Map the reads back to the genome (MapReads.pbs)
GATK requires every read to belong to a read group. The easiest way is to add read groups at mapping time with the bowtie2 options --rg-id (the read-group ID) and --rg (an extra TAG:VALUE field on the @RG line, here the sample name SM).
Example:
--rg-id Sigmav7vsS288c_bowtie2 --rg SM:Sigmav7vsS288c_bowtie2
Full command (-x gives the index prefix, -U the single-end reads, -S the SAM output; 2> saves bowtie2's alignment summary from stderr):
/opt/bowtie/bowtie /bowtie2 --rg-id Sigmav7vsS288c_bowtie2 --rg SM:Sigmav7vsS288c_bowtie2 \
  -x /Users/richmonp/ReSequencing/GENOME/SGDv4_bowtie2_Index \
  -U /Users/richmonp/ReSequencing/FASTQ/Sigmav7_50mers.fastq \
  -S /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.sam \
  2> /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.stderr
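According to the bowtie2 manual, --rg-id makes bowtie2 print an @RG header line with that ID and attach an RG:Z:<ID> field to every SAM record, and --rg adds fields such as SM: to that @RG line. The sketch below is a hypothetical sanity check to run on the SAM file before it reaches GATK. It assumes plain-text SAM lines with the header first, as the SAM format requires.

```python
from typing import Iterable, Set, Tuple


def check_read_groups(sam_lines: Iterable[str]) -> Tuple[Set[str], int]:
    """Return (read-group IDs declared in @RG lines, alignments lacking a declared RG:Z: tag)."""
    declared: Set[str] = set()
    missing = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("@RG"):
            for field in fields[1:]:
                if field.startswith("ID:"):
                    declared.add(field[3:])
        elif not line.startswith("@"):
            # Optional tags start at the 12th column (index 11).
            tags = [f[5:] for f in fields[11:] if f.startswith("RG:Z:")]
            if not tags or tags[0] not in declared:
                missing += 1
    return declared, missing
```

With the command above, you would expect the declared set to be {"Sigmav7vsS288c_bowtie2"} and the missing count to be 0.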

8 Convert your file format using Samtools (Sam2Bam.pbs)
General form (-b writes BAM output, -S says the input is SAM):
samtools view -bS <in.sam> -o <out.bam>
samtools sort <in.bam> <out.sorted>
samtools index <in.sorted.bam>
In samtools 0.1.18, the second argument to sort is an output prefix, and .bam is appended to it, so <out.sorted> becomes <out.sorted>.bam.
My commands:
/opt/samtools/0.1.18/samtools view -bS /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.sam -o /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.bam
/opt/samtools/0.1.18/samtools sort /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.bam /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.sorted
/opt/samtools/0.1.18/samtools index /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.sorted.bam
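"Sorted" here means coordinate-sorted: reads are ordered by reference sequence (in the order of the @SQ header lines) and then by leftmost position, which is what samtools index and GATK expect. The sketch below only illustrates that ordering on plain SAM lines without trailing newlines. It is not a replacement for samtools sort, which works on compressed BAM. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def coordinate_sort(sam_lines: List[str]) -> List[str]:
    """Return header lines followed by alignments ordered by (@SQ order, POS)."""
    header = [l for l in sam_lines if l.startswith("@")]
    records = [l for l in sam_lines if l and not l.startswith("@")]
    # Reference order is the order of the @SQ lines in the header.
    ref_rank: Dict[str, int] = {}
    for h in header:
        if h.startswith("@SQ"):
            for field in h.split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_rank[field[3:]] = len(ref_rank)

    def key(rec: str) -> Tuple[int, int]:
        fields = rec.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Reads without a known reference (RNAME '*') go after every mapped read.
        return ref_rank.get(rname, len(ref_rank)), pos

    return header + sorted(records, key=key)
```

Once reads are in this order, an index (.bai) can record where each genomic region starts in the file. That is what lets IGV and GATK jump to a region without reading the whole BAM.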

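9 Samtools remove duplicates (Sam2Bam.pbs)
rmdup removes PCR duplicates: reads that map to the same position because they were amplified from the same original fragment. By default, rmdup works on paired-end reads only, so for single-end reads like these, add -s.
General form:
samtools rmdup -s <in.sorted.bam> <out.rmdup.sorted.bam>
My command:
/opt/samtools/0.1.18/samtools rmdup -s /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2.sorted.bam /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup.sorted.bam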
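The idea behind single-end duplicate removal can be sketched in a few lines. Reads are grouped by reference, strand, and 5' position (for a reverse-strand read, the 5' end is its rightmost aligned base), and one read per group is kept. The tie-break used here, keeping the highest mapping quality, is an assumption; samtools' exact choice among duplicates may differ. The record tuple and function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# (read name, FLAG, RNAME, POS, MAPQ, CIGAR), i.e. SAM columns 1-6.
Record = Tuple[str, int, str, int, int, str]

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def ref_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def dedup_single_end(records: List[Record]) -> Set[str]:
    """Return the names of reads kept after single-end duplicate removal."""
    best: Dict[Tuple[str, int, bool], Record] = {}
    kept: Set[str] = set()
    for rec in records:
        name, flag, rname, pos, mapq, cigar = rec
        if flag & 0x4:  # unmapped reads have no position, so they are never duplicates
            kept.add(name)
            continue
        reverse = bool(flag & 0x10)
        # The 5' end of a reverse-strand read is the rightmost aligned base.
        five_prime = pos + ref_span(cigar) - 1 if reverse else pos
        key = (rname, five_prime, reverse)
        # Assumed tie-break: keep the copy with the highest mapping quality.
        if key not in best or mapq > best[key][4]:
            best[key] = rec
    return kept | {rec[0] for rec in best.values()}
```

Strand is part of the key because two reads starting at the same coordinate on opposite strands come from different fragments.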

10 Realign around indels (IndelRealign.pbs)
GATK has a two-step process for realigning reads around indels:
- Step 1: Find candidate locations that may be best represented by an insertion or deletion (GATK's RealignerTargetCreator)
- Step 2: Apply local realignment around the candidate locations to produce a new BAM file (GATK's IndelRealigner)

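11 Realign around Indels: RealignerTargetCreator
General form:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -R <reference genome> -T RealignerTargetCreator (options) -I <in.sorted.rmdup.bam> -o <out.intervals>
GATK also expects a FASTA index (SGDv4.fasta.fai, made with samtools faidx) and a sequence dictionary (SGDv4.dict, e.g. from Picard's CreateSequenceDictionary) next to the reference. Create them if they are not already in GENOME/.
My command:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -R /Users/richmonp/ReSequencing/GENOME/SGDv4.fasta \
  -T RealignerTargetCreator -minReads 5 \
  -I /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup.sorted.bam \
  -o /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup.intervals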

12 Realign around indels: IndelRealigner
General form:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T IndelRealigner -model USE_READS -targetIntervals <in.intervals> -R <reference.fasta> -I <in.rmdup.sorted.bam> -o <out.rmdup.realigned.sorted.bam>
Since no file of known indels is supplied, -model USE_READS builds the alternative consensus sequences from indels already present in the reads' original alignments.
My command:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T IndelRealigner -model USE_READS \
  -targetIntervals /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup.intervals \
  -R /Users/richmonp/ReSequencing/GENOME/SGDv4.fasta \
  -I /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup.sorted.bam \
  -o /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup_realigned.sorted.bam
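The .intervals file from step 1 is worth a quick look before realigning. This sketch assumes each line is either contig:start-stop or contig:pos (a single base), with 1-based inclusive coordinates. Check your own file, since this format is an assumption here. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def parse_interval(text: str) -> Tuple[str, int, int]:
    """Parse 'contig:start-stop' or 'contig:pos' into (contig, start, stop)."""
    contig, _, span = text.strip().rpartition(":")
    start_s, _, stop_s = span.partition("-")
    start = int(start_s)
    stop = int(stop_s) if stop_s else start  # a single position is a 1-base interval
    return contig, start, stop


def summarize_targets(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of target intervals, total bases covered, 1-based inclusive)."""
    n = total = 0
    for line in lines:
        if not line.strip():
            continue
        _, start, stop = parse_interval(line)
        n += 1
        total += stop - start + 1
    return n, total
```

rpartition splits on the last colon, so contig names that themselves contain a colon are still parsed correctly.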

13 Call variants using GATK UnifiedGenotyper (CallSNPs.pbs)
GATK is distributed as a Java archive (a .jar file). To run it, type:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar
Then use -T to select a tool within the package, which in our case is UnifiedGenotyper:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper

14 Call variants using GATK UnifiedGenotyper
General form (-glm BOTH calls both SNPs and indels):
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -I <in.sorted.bam> -R <in.fasta> -o <out.vcf>
My command:
java -jar /opt/gatk/2.4-9/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH \
  -R /Users/richmonp/ReSequencing/GENOME/SGDv4.fasta \
  -I /Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup_realigned.sorted.bam \
  -o /Users/richmonp/ReSequencing/VCF/Sigmav7_vs_S288c_bowtie2_gatk.vcf
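With -glm BOTH, SNPs and indels land in the same VCF, so a count by type is a quick sanity check before opening IGV. This hypothetical sketch reads plain-text VCF lines and classifies each ALT allele by comparing REF and ALT lengths. Same-length multi-base changes and symbolic alleles are counted as "OTHER".

```python
from typing import Dict, Iterable


def classify_variants(vcf_lines: Iterable[str]) -> Dict[str, int]:
    """Count ALT alleles in a VCF as SNPs, indels, or other (MNPs, symbolic alleles)."""
    counts = {"SNP": 0, "INDEL": 0, "OTHER": 0}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta lines, and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        for alt in fields[4].split(","):
            if alt == ".":
                continue  # no alternate allele at this site
            if alt.startswith("<") or alt == "*":
                counts["OTHER"] += 1
            elif len(ref) == 1 and len(alt) == 1:
                counts["SNP"] += 1
            elif len(ref) != len(alt):
                counts["INDEL"] += 1
            else:
                counts["OTHER"] += 1  # same-length multi-base change (MNP)
    return counts
```

Counting per ALT allele means a multi-allelic site such as C to T,CA contributes one SNP and one indel.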

15 View your VCF in IGV
GATK automatically indexes your VCF files, so now we can visualize both the reads and the SNPs in IGV:
- Transfer the final BAM file (/Users/richmonp/ReSequencing/SAM/Sigmav7_vs_S288c_bowtie2_rmdup_realigned.sorted.bam) and the VCF file (/Users/richmonp/ReSequencing/VCF/Sigmav7_vs_S288c_bowtie2_gatk.vcf), together with their index files, to your student directory: /projects/sreadgrp/student/<username>/
- Open the visualization VNC window
- Open IGV
- Load the files

16 Organize into groups of 5
Coffee break. Then… organize into groups of 5.

17 Paired-end data
The main difference between paired-end and single-end data comes at the mapping step. Each read of a pair is stored in its own file, marked "R1" or "R2" in the file name:
1028_S1_L001_R1_001.fastq
1028_S1_L001_R2_001.fastq
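When a directory holds many samples, matching each R1 file with its R2 partner by name avoids mixing up pairs. This sketch assumes names follow the pattern shown on the slide, <sample>_S<n>_L<lane>_R<1 or 2>_<chunk>.fastq (optionally .fq or .gz), and pairs files that differ only in the R1/R2 field. The regular expression and function name are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Everything except the R1/R2 field must match for two files to be a pair.
NAME_RE = re.compile(r"^(?P<stem>.+)_R(?P<read>[12])(?P<rest>_\d+\.f(?:ast)?q(?:\.gz)?)$")


def pair_fastqs(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return sorted (R1, R2) file-name pairs; unmatched or incomplete files are skipped."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in names:
        m = NAME_RE.match(name)
        if m:
            groups[m.group("stem") + m.group("rest")][m.group("read")] = name
    return [(g["1"], g["2"]) for _, g in sorted(groups.items()) if "1" in g and "2" in g]
```

A file whose partner is missing is dropped from the result. Comparing the result against the directory listing is a quick way to spot an incomplete copy.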

18 How it changes your bowtie2 command
Open MapPairedReads.pbs in an editor and notice:
-1 /data/Avery/FASTQ/1056_S1_L001_R1_001.fastq \
-2 /data/Avery/FASTQ/1056_S1_L001_R2_001.fastq \
The -1 option gives the file for read 1, and -2 gives the file for read 2.
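For reference, a paired-end bowtie2 run with read groups can be assembled as an argument list. This is a sketch, not the contents of MapPairedReads.pbs: it uses bowtie2's documented -x, -1, -2, -S, --rg-id, and --rg options, and the function names and example values are hypothetical.

```python
import subprocess
from typing import List


def bowtie2_paired_cmd(bowtie2: str, index_prefix: str, r1: str, r2: str,
                       sam_out: str, rg_id: str, sample: str) -> List[str]:
    """argv for a paired-end bowtie2 run that also tags reads with a read group."""
    return [
        bowtie2,
        "--rg-id", rg_id,          # read-group ID (@RG line and each read's RG:Z: tag)
        "--rg", f"SM:{sample}",    # sample name field on the @RG line
        "-x", index_prefix,
        "-1", r1,                  # file with read 1 of each pair
        "-2", r2,                  # file with read 2, in the same order as r1
        "-S", sam_out,
    ]


def run_logged(cmd: List[str], log_path: str) -> None:
    """Run cmd, sending stderr (bowtie2's alignment summary) to log_path, like `2> log`."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stderr=log, check=True)
```

Downstream, note that samtools rmdup's default mode is paired-end, so the single-end -s option is not needed for these reads.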

19 Now…
- Copy MapPairedReads.pbs from /projects/sreadgrp/homeworkfiles/ReSequencing/PBS/ to your own PBS directory
- Copy a pair of FASTQ files to your FASTQ directory (only the ones given on your group's problem sheet)

20 First group to map, call variants, and visualize variants wins!
(Prizes are not amazing.)

