Presentation on theme: "Peter Tsai Bioinformatics Institute, University of Auckland"— Presentation transcript:
1 RNA Sequencing. Peter Tsai, Bioinformatics Institute, University of Auckland
2 What is RNA-seq? Study of transcriptomes. Identify known genes, exons, splicing events, ncRNA, miRNA. Discover novel genes or transcripts. Estimate abundances of transcripts (quantitative expression). Find differentially expressed transcripts between different conditions. Reconstruct the transcriptome.
4 General workflow: Raw data -> QC -> either map to reference genome, or de novo transcriptome assembly (requires downstream annotation) -> estimate abundance -> normalisation -> differential expression analysis.
5 Quality checks and mapping: Use FastQC or SolexaQA. Trim off low-quality regions and keep only properly paired reads. Most QC software assumes normality, but in RNA-seq data you will probably see non-normality. You might see some duplicated reads; this is probably due to highly expressed genes. For reference mapping, use a tool that can map across splice junctions between exons, e.g. TopHat. For de novo work, use assembly software designed to reconstruct transcriptomes from RNA-seq data, e.g. Trinity.
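The trimming and pair-filtering step can be illustrated with a minimal Python sketch. It is not how FastQC or SolexaQA work internally; it only trims the low-quality 3' tail of each read. It also reads "properly paired" as "both mates survive trimming", which is an assumption. The function names, the Phred cutoff of 20 and the minimum length of 25 are all hypothetical choices.

```python
def trim_3prime(seq, quals, min_q=20):
    """Drop trailing bases whose Phred quality is below min_q (assumed cutoff)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def keep_pair(read1, read2, min_q=20, min_len=25):
    """Trim both mates; keep the pair only if both are still at least min_len long.

    Each read is a (sequence, list_of_phred_scores) tuple.
    Returns the trimmed pair, or None if either mate became too short.
    """
    t1 = trim_3prime(read1[0], read1[1], min_q)
    t2 = trim_3prime(read2[0], read2[1], min_q)
    if len(t1[0]) >= min_len and len(t2[0]) >= min_len:
        return t1, t2
    return None


# Hypothetical toy reads (far shorter than real data) to show the behaviour.
mate1 = ("ACGTAC", [30, 30, 30, 30, 10, 5])
mate2 = ("TTGCAA", [35, 35, 35, 35, 35, 35])
print(keep_pair(mate1, mate2, min_q=20, min_len=4))
```

Dropping the whole pair when one mate fails keeps the two read files in sync, which paired-end mappers generally expect.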
6 Expression value in RNA-seq: The total number of reads mapped to a gene/transcript (count data, raw counts, or digital gene expression). Simple counts are complicated by three factors. Sequencing depth: the higher the sequencing depth, the higher the counts. Gene length: counts are proportional to the length of the gene times the mRNA expression level. Counts distribution: differences in how counts are distributed among samples.
7 Normalisation: RPKM (Mortazavi et al., 2008): Reads Per Kilobase of exon model per Million mapped reads. FPKM (Trapnell et al., 2010): Fragments Per Kilobase of exon model per Million mapped reads. Paired-end RNA-Seq experiments produce two reads per fragment, but that doesn't necessarily mean that both reads will be mappable, so FPKM counts fragments rather than reads.
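As a worked illustration of these definitions, here is a minimal Python sketch of the RPKM/FPKM arithmetic. The function names and the example numbers are hypothetical; real pipelines compute these values inside tools such as Cufflinks rather than by hand.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads.

    RPKM = reads / (gene_length / 1e3) / (total_mapped_reads / 1e6)
         = reads * 1e9 / (gene_length * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Same arithmetic as RPKM, but counting fragments (read pairs) instead of reads."""
    return rpkm(fragment_count, gene_length_bp, total_mapped_fragments)


# Hypothetical example: 500 reads on a 2,000 bp gene, 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by gene length corrects for longer genes collecting more reads, and dividing by the number of mapped reads corrects for sequencing depth.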
11 ERCC spike-in control: A set of external RNA transcripts with known concentrations. Used to assess dynamic range and lower limit of detection, and fold-change response. Serves as an internal control for measuring against defined performance criteria.
12 Dynamic range and lower limit of detection (LLD): The dynamic range can be measured as the difference between the highest and lowest concentration of ERCC transcript detected in each sample. The LLD is a measure of sensitivity, and it is defined as the lowest molar amount of ERCC transcript detected in each sample, with user-defined threshold values for determining detection. This translates to ~323,000 control molecules detected per 100 ng poly(A) RNA.
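The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name, the detection threshold of one read and the toy concentrations are hypothetical. It also assumes the "difference" between highest and lowest concentration is taken on a log2 scale, which the slide does not state.

```python
import math


def ercc_summary(spikes, min_count=1):
    """Summarise ERCC detection for one sample.

    spikes: list of (known_concentration, observed_read_count) pairs.
    min_count: user-defined threshold for calling a transcript detected.
    """
    detected = [conc for conc, count in spikes if count >= min_count]
    if not detected:
        return None  # nothing passed the detection threshold
    lld = min(detected)       # lowest concentration still detected
    highest = max(detected)   # highest concentration detected
    return {
        "lld": lld,
        "highest": highest,
        # Assumption: express the range as a log2 ratio of the two concentrations.
        "dynamic_range_log2": math.log2(highest / lld),
    }


# Hypothetical spike-in table: (concentration, read count).
toy = [(0.25, 0), (1.0, 3), (16.0, 40), (1024.0, 2500)]
print(ercc_summary(toy))
```

Raising `min_count` makes the detection call stricter, which raises the reported LLD and narrows the dynamic range.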
14 How much library depth is needed for RNA-seq? Depends on a number of factors: biological questions; complexity of the organism; types of analysis; types of RNA (e.g. miRNA, lncRNA). Do a literature search for similar work. Run a pilot experiment.
15 Summary: Have 3 or more biological replicates. Analyse your data with different normalisation methods. Perform data exploration. Use a standard spike-in as internal control. Validate with qPCR.