RNA-seq workshop ALIGNMENT

Erin Osborne Nishimura

2 Alignment with Tophat2

3 There are many alignment tools available
Source: Nuno A. Fonseca et al., Bioinformatics 2012;28. © The Author. Published by Oxford University Press.

4 Which aligner is best?
- What type of data do you have?
- What is your research question?

5 Methods of alignment
Splice awareness:
- Splice unaware (Bowtie, BWA): faster
- Splice aware (Tophat, MapSplice, SpliceMap): slower, but yields more information on splice junctions
What will be matched first?
- The whole genome?
- The known transcriptome?
- A short segment of each read?
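What "more information on splice junctions" looks like in practice: in SAM/BAM output, a splice-aware aligner such as Tophat records a read that spans an intron with an N ("skipped region") operation in the CIGAR string, which splice-unaware aligners do not produce. The sketch below pulls intron coordinates out of a CIGAR string. It assumes a well-formed CIGAR and the 1-based leftmost position from SAM column 4. The function name junctions_from_cigar is hypothetical.

```python
import re

# CIGAR operations that consume reference bases (SAM spec): M, D, N, =, X
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_cigar(pos, cigar):
    """Return (start, end) intron intervals implied by a CIGAR string.

    pos is the 1-based leftmost mapping position (SAM column 4).
    Returned intervals are 1-based and inclusive, in reference coordinates.
    Assumes the CIGAR string is well formed (no validation is done).
    """
    ref = pos
    introns = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            # N = reference bases skipped by a spliced alignment (an intron)
            introns.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return introns


# A 50 bp read split 30 + 20 across a 500 bp intron
print(junctions_from_cigar(1000, "30M500N20M"))  # [(1030, 1529)]
```

Soft clips (S) and insertions (I) do not advance the reference position, which is why they are excluded from REF_CONSUMING.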

6 Why Tophat?
- Popular
- Splice aware
- de novo or sequenced genome modes
- Transcriptome or whole genome assembly
- Lots of options for customization
Drawbacks:
- Lots of parameters to set and optimize
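Because Tophat has so many parameters, it can help to build the command in one place so the chosen settings are recorded. This is a minimal sketch. It assumes a prebuilt Bowtie2 index and uses hypothetical file names (genome_index, sample_trim.fastq, genes.gtf) and a hypothetical helper name (build_tophat2_cmd). Only three TopHat2 options are used: -o (output directory), -p (number of threads) and -G (a GTF of known transcripts). Check `tophat2 --help` on your installed version for the rest.

```python
import shlex


def build_tophat2_cmd(index_base, reads_1, reads_2=None,
                      out_dir="tophat_out", threads=4, gtf=None):
    """Assemble a tophat2 argument list (this only builds it; nothing is run)."""
    cmd = ["tophat2", "-o", out_dir, "-p", str(threads)]
    if gtf is not None:
        # Known transcripts: reads can be aligned to the transcriptome first
        cmd += ["-G", gtf]
    # Positional arguments: Bowtie2 index basename, then read file(s)
    cmd += [index_base, reads_1]
    if reads_2 is not None:
        cmd.append(reads_2)  # mate file for paired-end data
    return cmd


cmd = build_tophat2_cmd("genome_index", "sample_trim.fastq", gtf="genes.gtf")
print(shlex.join(cmd))
# tophat2 -o tophat_out -p 4 -G genes.gtf genome_index sample_trim.fastq
```

The list can be passed to subprocess.run(cmd, check=True) once the paths are real.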

7 Tophat2 – how does it work?
Kim et al., 2013 Genome Biology
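Kim et al. (2013) describe TopHat2 as roughly a staged process. Reads are first aligned whole, to the transcriptome if annotation is supplied and then to the genome. Reads that still fail to align are cut into shorter segments that are mapped independently, so that segments on either side of a splice junction can each find a match. The segment size is set with --segment-length (25 bp by default). The sketch below shows only the cutting step. split_into_segments is a hypothetical helper, not TopHat code. Leaving reads too short for two segments unsplit is an assumption of this sketch.

```python
def split_into_segments(read, segment_length=25):
    """Cut a read into consecutive, non-overlapping segments.

    Any leftover bases are merged into the last segment, so every segment
    is at least segment_length long. Reads too short to give two segments
    are returned whole (an assumption of this sketch).
    """
    n = len(read) // segment_length
    if n < 2:
        return [read]
    segments = [read[i * segment_length:(i + 1) * segment_length]
                for i in range(n - 1)]
    segments.append(read[(n - 1) * segment_length:])
    return segments


read = "ACGT" * 15  # a hypothetical 60 bp read
print([len(s) for s in split_into_segments(read)])  # [25, 35]
```

Each segment would then be aligned on its own. Segments that map to separate, nearby locations hint at a splice junction between them.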

8 Tophat2 versus Tophat1

9 The good news…
…the choice of aligner does not have a major impact on which genes are identified as differentially expressed, compared with other choices in the analysis. (Fonseca, 2014)

10 Switch to Tophat2 Tutorial

11 Generating genome browser tracks

12 Ah, those beautiful browser tracks…
Brooks and Yang et al., 2011, Genome Research

13 Today’s simple analysis pipeline
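.fastq file -> trimmomatic -> _trim.fastq file -> TOPHAT2 -> .bam/.sam file
From the .bam/.sam file, the pipeline splits into two branches:
- Counts: HTseq -> counts.txt file -> DESeq2/R -> differentially abundant genes
- Browser tracks: bedtools genomecov -> .bg file -> bedGraphToBigWig -> .bw file -> IGV/UCSC -> pretty browser shots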
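In the browser-track branch, bedtools genomecov with -bg turns alignments into a bedGraph: runs of constant read depth, reported as chrom, start, end, depth. The sketch below shows the idea with a sweep over read start and end positions. It is a simplification. Each read is treated as one contiguous 0-based, half-open block, so introns are ignored; genomecov's -split option handles spliced reads properly. Use bedtools itself on real data. coverage_bedgraph is a hypothetical name.

```python
from collections import defaultdict


def coverage_bedgraph(chrom, intervals):
    """Turn aligned intervals into bedGraph rows (chrom, start, end, depth).

    intervals: 0-based, half-open (start, end) pairs, as in BED.
    Only regions with depth > 0 are reported, and adjacent regions with
    equal depth come out as a single row.
    """
    delta = defaultdict(int)
    for start, end in intervals:
        delta[start] += 1  # a read begins: depth goes up
        delta[end] -= 1    # a read ends: depth goes down
    rows = []
    depth = 0
    prev = None
    # Only positions where the depth actually changes are breakpoints
    for pos in sorted(p for p, d in delta.items() if d != 0):
        if prev is not None and depth > 0:
            rows.append((chrom, prev, pos, depth))
        depth += delta[pos]
        prev = pos
    return rows


for row in coverage_bedgraph("chrI", [(0, 10), (5, 15)]):
    print(*row, sep="\t")
# chrI  0   5   1
# chrI  5   10  2
# chrI  10  15  1
```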

14 I have included an example script for generating browser tracks. It:
- Requires bedGraphToBigWig
- Requires bedtools
- Performs normalization to read depth. One option is to scale coverage by:

$$\text{Scale} = \frac{\#\text{ bp in genome}}{(\#\text{ bp per read}) \times (\#\text{ mapped reads})}$$
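The read-depth scale factor (genome size divided by read length times mapped reads) is the inverse of the average coverage. Multiplying raw depth by it brings the genome-wide average to about 1, which makes replicates with different sequencing depths comparable. The sketch below computes the factor and applies it to bedGraph rows. It assumes all reads have the same length; after trimming this is only an approximation. bedtools genomecov also accepts the factor directly through its -scale option. The function names and the numbers in the example are hypothetical.

```python
def depth_scale(genome_bp, read_length_bp, mapped_reads):
    """Scale = genome size / (read length x number of mapped reads).

    read_length_bp * mapped_reads is the total number of aligned bases,
    so this is 1 / (average coverage).
    """
    return genome_bp / (read_length_bp * mapped_reads)


def scale_bedgraph(rows, scale):
    """Multiply the depth column of (chrom, start, end, depth) rows by scale."""
    return [(chrom, start, end, depth * scale)
            for chrom, start, end, depth in rows]


# Hypothetical: 100 Mb genome, 50 bp reads, 20 million mapped reads
s = depth_scale(100_000_000, 50, 20_000_000)
print(s)  # 0.1
print(scale_bedgraph([("chrI", 0, 5, 10)], s))
```

A region covered by 10 reads in this library (average coverage 10x) ends up with a normalized value of 1.0.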

15 Two most common platforms
- IGV: locally installed
- UCSC Genome Browser: upload required
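IGV can open a local .bw file directly. The UCSC Genome Browser cannot read a bigWig from your disk: the file has to be placed on a web-accessible server, and a track line pointing to it with bigDataUrl is pasted into the Custom Tracks page. The sketch below builds such a line. The helper name and the example URL are hypothetical.

```python
def ucsc_bigwig_track_line(name, big_data_url, description=None):
    """Build a UCSC custom-track line for a bigWig hosted at a public URL."""
    parts = ["track", "type=bigWig", f'name="{name}"']
    if description:
        parts.append(f'description="{description}"')
    parts.append(f"bigDataUrl={big_data_url}")
    return " ".join(parts)


print(ucsc_bigwig_track_line("rep1", "https://example.org/rep1.bw"))
# track type=bigWig name="rep1" bigDataUrl=https://example.org/rep1.bw
```

Making one track per normalized replicate helps when comparing replicates side by side.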

16 Visual inspection of each normalized replicate is critical…

