Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520

1 RNA-Seq
Xiaole Shirley Liu
STAT115, STAT215, BIO298, BIST520

2 RNA-seq Protocol
Martin and Wang, Nat. Rev. Genet. (2011)

3 RNA-seq Applications
Expression levels, differential expression
Alternative splicing, novel isoforms
Novel genes or transcripts, lncRNA
Detect gene fusions
Many different protocols
Can use on any sequenced genome
Better dynamic range, cleaner data

4 Experimental Design
Assessing biological variation requires biological replicates (no need for technical replicates)
3 preferred, 2 OK, 1 only for exploratory assays (not good for publications)
For differential expression, don't pool RNA from multiple biological replicates
Batch effects still exist; try to be consistent or process all samples at the same time

5 Experimental Design
Ribo-minus (removes overly abundant rRNA)
PolyA (mRNA, enriches for exons)
Strand-specific (anti-sense lncRNA)
Sequencing: paired-end (PE, resolves redundancy) or single-end (SE)
  SE for expression; PE for splicing and novel transcripts
Depth: 30-50M reads for differential expression, deeper for transcript assembly
Read length: longer for transcript assembly

6 RNA-seq Analysis

7 Alignment
Prefer splice-aware aligners
TopHat, BWA, STAR (not DNASTAR)
Sometimes need to trim the beginning bases

8 Transcript Assembly
Reference-based assembly: Cufflinks
De novo assembly: Trinity

9 Quality Control: RSeQC

10 Expression Index
RPKM (reads per kilobase of transcript per million reads of library)
  Corrects for coverage and gene length
  1 RPKM ~ transcript / cell
  Comparable between different genes within the same dataset
  TopHat / Cufflinks
FPKM (fragments instead of reads), for PE libraries, RPKM/2
TPM (transcripts per million)
  Normalizes to transcript copies instead of reads
  Longer transcripts have more reads
  RSEM, HTSeq
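The two expression indices can be sketched in a few lines of Python. This is a minimal illustration, not the Cufflinks or RSEM implementation: the function names, the toy counts, and the use of raw transcript length (rather than an effective length) are assumptions made for the example.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million reads in the library."""
    total_reads = sum(counts)
    return [c / (l / 1e3) / (total_reads / 1e6)
            for c, l in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]  # reads per base
    total_rate = sum(rates)
    return [r / total_rate * 1e6 for r in rates]


# Toy library (hypothetical numbers): three genes with 1 kb, 8 kb and 2 kb transcripts.
counts = [100, 800, 100]
lengths = [1000, 8000, 2000]

rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(rpkm_values)
print(tpm_values)
```

In this toy library the 8 kb gene has eight times the reads of the 1 kb gene but the same RPKM and TPM, which is the length correction at work. TPM values always sum to one million within a sample, whereas the sum of RPKM values depends on the sample.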

11 Differential Expression

12 Sequencing Read Distribution
Poisson distribution: # of events within an interval
Sequencing data is overdispersed relative to the Poisson
Negative binomial
  Def: # of successes before r failures occur, if Pr(each success) is p
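To make the overdispersion point concrete, the sketch below compares the mean and variance of the two distributions, using the negative binomial definition given on the slide (successes before r failures, success probability p). The function names and the example values r = 5, p = 0.8 are illustrative assumptions.

```python
def poisson_mean_var(lam):
    """A Poisson count has variance equal to its mean."""
    return lam, lam


def negbin_mean_var(r, p):
    """Negative binomial: # of successes before r failures, Pr(success) = p."""
    mean = r * p / (1 - p)
    var = r * p / (1 - p) ** 2  # equivalently mean + mean**2 / r
    return mean, var


nb_mean, nb_var = negbin_mean_var(r=5, p=0.8)
pois_mean, pois_var = poisson_mean_var(nb_mean)

print("negative binomial:", nb_mean, nb_var)
print("Poisson, same mean:", pois_mean, pois_var)
```

For any 0 < p < 1 the negative binomial variance exceeds its mean by the factor 1/(1 - p), so it can absorb the extra variability between biological replicates that a Poisson model cannot.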

13 Differential Expression
Negative binomial for RNA-seq
Variance estimated by borrowing information from all the genes (hierarchical models)
Test whether μi is the same for gene i between samples j
FDR?
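Because one test is run per gene, the resulting p-values need a multiple-testing correction. The slide only raises the FDR question; the Benjamini-Hochberg procedure shown here is one widely used way to control it and is an assumption of this sketch, not necessarily the method any particular package applies. The p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from a differential expression test.
gene_pvalues = [0.01, 0.04, 0.03, 0.5]
gene_qvalues = benjamini_hochberg(gene_pvalues)
print(gene_qvalues)
```

Genes whose adjusted value falls below the chosen FDR threshold (for example 0.05) would be called differentially expressed.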

14 Differential Expression
Should we do differential expression on RPKM/FPKM or TPM?
Cufflinks: RPKM/FPKM
LIMMA-VOOM and DESeq: TPM
Power to detect DE increases with transcript length
  Gene A (1kb) vs. Gene B (8kb)
Continued development and updates
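The length effect can be illustrated with the Gene A (1kb) and Gene B (8kb) example. The sketch assumes both genes sit at the same expression level (1 RPKM), a 10M-read library, and a crude Poisson approximation for the test statistic; all three are assumptions chosen for illustration, and real tools use more careful models.

```python
import math


def expected_reads(rpkm, length_bp, library_size):
    """Expected read count implied by an RPKM value (inverts the RPKM formula)."""
    return rpkm * (length_bp / 1e3) * (library_size / 1e6)


def approx_z_for_fold_change(baseline_reads, fold_change=2.0):
    """Rough z-score for two Poisson counts: difference / sqrt(sum)."""
    treated_reads = baseline_reads * fold_change
    return (treated_reads - baseline_reads) / math.sqrt(treated_reads + baseline_reads)


library_size = 10e6  # hypothetical 10M-read library
reads_a = expected_reads(1.0, 1000, library_size)  # Gene A, 1 kb
reads_b = expected_reads(1.0, 8000, library_size)  # Gene B, 8 kb

z_a = approx_z_for_fold_change(reads_a)
z_b = approx_z_for_fold_change(reads_b)
print(reads_a, reads_b)
print(z_a, z_b)
```

At equal expression the longer gene collects eight times the reads, so the same two-fold change produces a larger test statistic and is easier to call. Under this approximation the statistic grows with the square root of the read count.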

15 Alternative Splicing
Assign reads to splice isoforms

16 Isoform Inference
If given a known set of isoforms:
Estimate x to maximize the likelihood of observing n
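One common way to carry out this kind of maximum-likelihood estimate is expectation-maximization (EM) over reads that are compatible with more than one isoform. The sketch below assumes that x denotes the isoform abundances and n the observed read counts, grouped into compatibility classes; the slide does not define these symbols here, so that reading, the function names, and the toy data are all assumptions. It uses a simplified model in which a read is weighted by abundance divided by effective length, and is not the algorithm of any specific tool.

```python
def em_isoform_fractions(class_counts, class_isoforms, eff_lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform by EM.

    class_counts[c]   -- number of reads in compatibility class c (observed n)
    class_isoforms[c] -- indices of the isoforms those reads could come from
    eff_lengths[k]    -- effective length of isoform k in bp
    """
    k = len(eff_lengths)
    theta = [1.0 / k] * k  # start from equal read fractions
    total = float(sum(class_counts))
    for _ in range(n_iter):
        assigned = [0.0] * k
        # E-step: split each class's reads among its compatible isoforms.
        for n_c, isoforms in zip(class_counts, class_isoforms):
            weights = [theta[i] / eff_lengths[i] for i in isoforms]
            norm = sum(weights)
            if norm == 0.0:
                continue  # no compatible isoform has any remaining mass
            for i, w in zip(isoforms, weights):
                assigned[i] += n_c * w / norm
        # M-step: re-estimate the read fractions from the soft assignments.
        theta = [a / total for a in assigned]
    return theta


def relative_abundance(theta, eff_lengths):
    """Convert read fractions to relative transcript abundance (x)."""
    rates = [t / l for t, l in zip(theta, eff_lengths)]
    total_rate = sum(rates)
    return [r / total_rate for r in rates]


# Toy gene with two isoforms of equal effective length (hypothetical data):
# 300 reads unique to isoform 0, 100 unique to isoform 1, 400 shared.
class_counts = [300, 100, 400]
class_isoforms = [(0,), (1,), (0, 1)]
eff_lengths = [2000.0, 2000.0]

theta = em_isoform_fractions(class_counts, class_isoforms, eff_lengths)
x = relative_abundance(theta, eff_lengths)
print(theta)  # approaches [0.75, 0.25]
print(x)
```

The shared reads end up split in the same 3:1 ratio as the uniquely assignable reads, which is the fixed point of the EM updates for this toy case.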

17 Known Isoform Abundance Inference

18 Isoform Inference
With a known isoform set, gene-level expression inference can be accurate even when the individual isoform abundances carry large uncertainty (e.g. when the known set is incomplete)
De novo isoform inference is a non-identifiable problem if RNA-seq reads are short and the gene is long with many exons
Algorithm: MATS

19 Gene Fusion
Seen more often in cancer samples
Still somewhat hard to call
TopHat-Fusion in TopHat2
Maher et al., Nature 2009

20 Other Applications
RNA editing
  Change in RNA sequence after transcription
  Most frequent: A to I (behaves like G), C to U
  Evolved from mononucleotide deaminases, might be involved in RNA degradation
Circular RNA
  Mostly arise from splicing
  Varying length, abundance, and stability
  Possible function: sponge for RBP or miRNA

21 Summary
RNA-seq design considerations
Read mapping: TopHat, BWA, STAR
De novo transcriptome assembly: Trinity
Expression index: FPKM and TPM
Differential expression
  Cufflinks: versatile
  LIMMA-VOOM and DESeq: better variance estimates
Alternative splicing: MATS
Gene fusion, RNA editing, circular RNA

22 Acknowledgement
Alisha Holloway, Simon Andrews, Radhika Khetani
