4 RNA sequencing
- Samples of interest
  - Condition 1 (normal colon)
  - Condition 2 (colon tumor)
- Isolate RNAs
- Generate cDNA, fragment, size select, add linkers
- Sequence ends
  - 100s of millions of paired reads
  - 10s of billions of bases of sequence
- Map to genome, transcriptome, and predicted exon junctions
- Downstream analysis
5 Methodologies for RNA-Seq studies
- Mapping transcription start sites
- Strand-specific RNA-Seq
- Characterization of alternative splicing patterns
- Gene fusion detection
- Targeted approaches using RNA-Seq
- Small RNA profiling
- Direct RNA sequencing
- Profiling low-quantity RNA samples
6 Pre-NGS Transcriptomics
- Hybridization-based approaches
  - Genomic tiling microarrays
  - Fluorescently labelled cDNA with microarrays
- Sequence-based approaches
  - Sanger sequencing of cDNA or EST libraries
  - Serial analysis of gene expression (SAGE)
  - Cap analysis of gene expression (CAGE)
  - Massively parallel signature sequencing (MPSS)
8 Challenges
- RNAs consist of small exons that may be separated by large introns
  - Mapping reads to the genome is challenging
- The relative abundance of RNAs varies widely
  - Abundances span a 10^5 to 10^7-fold range (5 to 7 orders of magnitude)
  - Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may consume the majority of reads
  - Ribosomal and mitochondrial genes
- RNAs come in a wide range of sizes
  - Small RNAs must be captured separately
  - PolyA selection of large RNAs may result in 3' end bias
- RNA is fragile compared to DNA (easily degraded)
- Bacterial samples may need to be depleted of rRNA
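The random-sampling point above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gene names, the abundance values, and the function `simulate_read_share` are all hypothetical and are not taken from any real data set.

```python
import random


def simulate_read_share(abundances, n_reads, seed=0):
    """Draw n_reads reads, each from a gene chosen in proportion to its abundance.

    Returns a dict mapping gene name to the number of reads sampled from it.
    """
    rng = random.Random(seed)
    genes = list(abundances)
    weights = [abundances[g] for g in genes]
    counts = dict.fromkeys(genes, 0)
    for gene in rng.choices(genes, weights=weights, k=n_reads):
        counts[gene] += 1
    return counts


# Hypothetical abundances spanning five orders of magnitude (assumed values).
abundances = {"rRNA_like": 1_000_000, "mito_like": 100_000}
abundances.update({f"gene_{i}": 10 for i in range(1000)})

n_reads = 100_000
counts = simulate_read_share(abundances, n_reads)

# Share of all reads taken by the two highly expressed genes.
fraction_top = (counts["rRNA_like"] + counts["mito_like"]) / n_reads
# Low-abundance genes that received no reads at this depth.
undetected = sum(1 for c in counts.values() if c == 0)

print(f"fraction of reads from the two abundant genes: {fraction_top:.3f}")
print(f"genes with zero reads: {undetected} of {len(counts)}")
```

In this toy setting two genes absorb almost all reads, which is the motivation for rRNA depletion or polyA selection before sequencing.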
10 RNA-seq library prep methodologies
- Two main routes for mRNA-seq preparation
  - Illumina TruSeq prep
  - Script-seq
- Generally Script-seq is our favourite
11 RNA Illumina TruSeq library prep
- 2 days for 8 samples
- Size selection step
- Adaptor ligation and standard library preparation
- 5 µg of total RNA
- ~$100 per sample
- Not strand-specific
12 Script-seq method
- 2 hours for 12 samples
- < 1 µg of RNA
- ~$150 per sample
- Strand-specific
13 DNA library preparation: RNA fragmentation and DNA fragmentation compared
a | Fragmentation of oligo-dT primed cDNA (blue line) is more biased towards the 3' end of the transcript. RNA fragmentation (red line) provides more even coverage along the gene body, but is relatively depleted for both the 5' and 3' ends. The tag count is the average sequencing coverage for 5,000 yeast ORFs.
b | A specific yeast gene, SES1 (seryl-tRNA synthetase), is shown.
Note that the ratio between the maximum and minimum expression level (the dynamic range) is 44 for microarrays and 9,560 for RNA-Seq.
14 Common questions: How much library depth is needed for RNA-seq?
- My advice: don't ask this question if you want a simple answer…
- Depends on a number of factors:
  - Question being asked of the data. Gene expression? Alternative expression? Mutation calling?
  - Tissue type, RNA preparation, quality of input RNA, library construction method, etc.
  - Sequencing type: read length, paired vs. unpaired, etc.
  - Computational approach and resources
- Identify publications with similar goals
- Pilot experiment
- Good news: 1/8 to 1 lane of recent Illumina HiSeq data should be enough for most purposes
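One way to use a pilot experiment for the depth question is to subsample the pilot reads and see how many genes remain detectable at each fraction. The sketch below assumes per-gene read counts are already available; the counts, the `min_reads` threshold, and the function `genes_detected` are hypothetical choices for illustration, not a recommended standard.

```python
import random


def genes_detected(pilot_counts, fraction, min_reads=5, seed=0):
    """Thin each gene's pilot read count to `fraction` of the original depth
    and count the genes that still have at least `min_reads` reads."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    detected = 0
    for count in pilot_counts.values():
        # Keep each read independently with probability `fraction`.
        kept = sum(1 for _ in range(count) if rng.random() < fraction)
        if kept >= min_reads:
            detected += 1
    return detected


# Hypothetical pilot counts (assumed values, e.g. from one full lane).
pilot_counts = {"geneA": 5000, "geneB": 400, "geneC": 40, "geneD": 6, "geneE": 1}

for fraction in (0.125, 0.25, 0.5, 1.0):
    print(fraction, genes_detected(pilot_counts, fraction))
```

If the number of detected genes stops rising well before the full depth, a smaller share of a lane is probably enough for simple gene expression questions; alternative expression or mutation calling would need their own criteria.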
16 Common questions: What mapping strategy should I use for RNA-seq?
- Depends on read length
- < 50 bp reads
  - Use an aligner like BWA and a genome + junction database
  - Junction database needs to be tailored to read length
  - Or you can use a standard junction database for all read lengths and an aligner that allows substring alignments for the junctions only (e.g. BLAST … slow)
  - Assembly strategy may also work (e.g. Trans-ABySS)
- > 50 bp reads
  - Spliced aligner such as TopHat, or de novo assembly with Trinity
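The idea of tailoring a junction database to read length can be sketched as follows: each junction sequence keeps only enough flanking exon sequence that a read aligning end to end must cross the junction with a minimum overhang on both sides. The function name `junction_sequence` and the default `min_overhang` of 4 are assumptions made for this example.

```python
def junction_sequence(upstream_exon, downstream_exon, read_length, min_overhang=4):
    """Build one exon-exon junction sequence for a given read length.

    Each side contributes at most (read_length - min_overhang) bases, so any
    read that aligns over its full length to the result must cover at least
    min_overhang bases on both sides of the junction.
    """
    flank = read_length - min_overhang
    if min_overhang <= 0 or flank <= 0:
        raise ValueError("need 0 < min_overhang < read_length")
    # Slicing returns the whole exon when it is shorter than the flank.
    return upstream_exon[-flank:] + downstream_exon[:flank]


# Toy exon sequences (hypothetical) and a 36 bp read length.
exon1 = "A" * 50
exon2 = "C" * 50
jseq = junction_sequence(exon1, exon2, read_length=36)
print(len(jseq))
print(jseq)
```

With longer reads the flanks grow, which is why a database built for one read length does not carry over directly to another.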
17 Common questions: How reliable are expression predictions from RNA-seq?
- Are novel exon-exon junctions real?
  - What proportion validate by RT-PCR and Sanger sequencing?
- Are differential/alternative expression changes observed between tissues accurate?
  - How well do differential expression values correlate with qPCR?
- 384 validations
  - qPCR, RT-PCR, Sanger sequencing
- See ALEXA-Seq publication for details:
  - Also includes comparison to microarrays
  - Griffith et al. Alternative expression analysis by RNA sequencing. Nature Methods Oct;7(10):
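Agreement between RNA-seq and qPCR is commonly summarised as a correlation between log2 fold changes. The following is a minimal sketch using only the standard library; the expression values are invented for illustration, and the pseudocount of 1 in `log2_fold_change` is an assumption, not a value from the ALEXA-Seq publication.

```python
import math


def log2_fold_change(condition_a, condition_b, pseudocount=1.0):
    """log2 ratio of two expression values, with a pseudocount to avoid log(0)."""
    return math.log2((condition_a + pseudocount) / (condition_b + pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (tumor, normal) expression values for four genes.
rnaseq = [(120.0, 30.0), (15.0, 60.0), (200.0, 210.0), (5.0, 80.0)]
# Hypothetical qPCR log2 fold changes for the same genes.
qpcr_lfc = [1.8, -2.1, 0.1, -3.5]

rnaseq_lfc = [log2_fold_change(tumor, normal) for tumor, normal in rnaseq]
print(f"Pearson r = {pearson(rnaseq_lfc, qpcr_lfc):.3f}")
```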
18 Common questions: How many replicates?
- As many as you can afford
- TopHat/Cufflinks statistics work best with three or more biological replicates
21 Spike-in controls
- How can you identify limits of detection and ensure your data can be compared to future platforms or new library prep methods? (e.g. How does Oxford Nanopore compare to Illumina sequencing?)
- Spike RNA of known concentration into your total RNA
- Cost: $20 per sample
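Because spike-in concentrations are known, they can be used to estimate a limit of detection and to check how linearly read counts track input amount. The sketch below assumes a table of spike-in concentrations and observed read counts; the numbers, the units, and the helper names `detection_limit` and `fit_log_log` are hypothetical.

```python
import math


def detection_limit(concentrations, counts, min_reads=1):
    """Lowest spike-in concentration observed with at least min_reads reads."""
    detected = [c for c, n in zip(concentrations, counts) if n >= min_reads]
    return min(detected) if detected else None


def fit_log_log(concentrations, counts):
    """Least-squares line through log10(count) vs log10(concentration).

    Spike-ins with zero reads are skipped. Returns (slope, intercept).
    """
    points = [(math.log10(c), math.log10(n))
              for c, n in zip(concentrations, counts) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two detected spike-ins")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("detected spike-ins must differ in concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical spike-in series: concentration (arbitrary units) and read counts.
concentrations = [0.01, 0.1, 1.0, 10.0, 100.0]
counts = [0, 0, 10, 100, 1000]

slope, intercept = fit_log_log(concentrations, counts)
print("limit of detection:", detection_limit(concentrations, counts))
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A slope close to 1 on the log-log scale suggests counts scale proportionally with input; running the same spike-in mix on another platform gives a like-for-like comparison of both numbers.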
25 Take home
- Start with 1-10 µg of good-quality total RNA
- Have 3 or more biological replicates
- Unless you have good reason, use a Script-seq type protocol
- Use a standard spike-in as an internal control and to ensure samples can be compared across platforms
- Don't forget to validate key findings with qPCR!