RNA-Seq as a Discovery Tool


1 RNA-Seq as a Discovery Tool
Julia Salzman

2 Deciphering the Genome

3 Power of RNA-Seq: Quantification and Discovery
Salzman, Gawad, Wang, Lacayo, Brown, 2012. Topics: RNA isoform-specific gene expression; gene fusions; overlooked RNA structural variants.

4 Paired-end RNA-Seq
Matched sequences (one read from each end) are obtained for each library molecule, e.g. CTTC…..GAAG and GGAC…..GCCT. Data: millions of A/C/G/T read sequences.

5 Part 1: Isoform Specific Expression

6 Example: Paired-end Data Aligned
Some reads are informative about isoform-specific expression

7 Paired-end RNA-Seq for RNA Isoform Specific Gene Expression
Example gene: Rnpep (figure labels: Exon 1, Exon 4). Goal: estimate the expression of each isoform. This is nontrivial: we only observe fragments of sequences. Since the size distribution of library molecules is known, inferred insert lengths can be used to increase statistical power and improve inference.
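A minimal Python sketch of why an inferred insert length depends on the isoform: the same read pair is projected onto each isoform's spliced exon coordinates. The exon coordinates, the function names (`to_transcript_coord`, `insert_length`), and the plus-strand, endpoints-only simplification are hypothetical illustrations, not the author's implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical isoform model: exons as genomic [start, end) intervals, sorted
# left to right. Strand and per-read alignment blocks are ignored here.
Exons = List[Tuple[int, int]]


def to_transcript_coord(pos: int, exons: Exons) -> Optional[int]:
    """Map a genomic position to a 0-based offset in the spliced transcript.

    Returns None when the position lies outside every exon of this isoform.
    """
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    return None


def insert_length(left_start: int, right_end: int, exons: Exons) -> Optional[int]:
    """Insert (library molecule) length implied by a read pair on one isoform.

    left_start: genomic position of the first base of the leftmost read.
    right_end: genomic position of the last base of the rightmost read.
    Only the two outer endpoints are checked for compatibility.
    """
    a = to_transcript_coord(left_start, exons)
    b = to_transcript_coord(right_end, exons)
    if a is None or b is None:
        return None  # pair cannot come from this isoform
    return b - a + 1


# Made-up example: isoform 2 skips the 60 bp middle exon of isoform 1.
iso1 = [(0, 100), (200, 260), (400, 500)]
iso2 = [(0, 100), (400, 500)]
print(insert_length(50, 439, iso1))  # 150
print(insert_length(50, 439, iso2))  # 90
```

With these made-up exons, a pair whose outer ends lie in the first and last exons implies 150 bp on the isoform that includes the middle exon and 90 bp on the isoform that skips it.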

8 Insert Length Distributions
Insert lengths of the entire library (pooled) can be calculated and used to precisely estimate the distribution of cDNA sizes in the library. [Figure: histogram of sequenced molecule length, in base pairs]
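One way to turn pooled insert lengths into a usable distribution is an empirical probability mass function. This Python sketch assumes the pooled lengths come from pairs whose insert length is unambiguous (for example, genes with a single annotated isoform); that choice, the pseudocount smoothing, and the name `empirical_length_pmf` are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def empirical_length_pmf(
    lengths: Iterable[int], max_len: int, pseudocount: float = 1.0
) -> Dict[int, float]:
    """Estimate P(insert length = L) for L = 1..max_len from pooled insert lengths.

    A pseudocount is added to every length so that lengths never seen in the
    pooled sample do not get probability exactly zero.
    """
    counts = Counter(L for L in lengths if 1 <= L <= max_len)
    total = sum(counts.values()) + pseudocount * max_len
    return {L: (counts[L] + pseudocount) / total for L in range(1, max_len + 1)}


pmf = empirical_length_pmf([100, 100, 120], max_len=200)
print(round(sum(pmf.values()), 6))  # 1.0
```

The pseudocount trades a little bias for robustness: with millions of pooled pairs its effect is negligible, but it prevents a single unusual insert length from making an isoform look impossible.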

9 Paired-end RNA-Seq Model
Compute the genome-wide insert length distribution. [Figure: histogram of sequenced molecule length, in base pairs.] The same read pair implies an insert length of 150 bp if mapped to Isoform 1 and 90 bp if mapped to Isoform 2 (Salzman, Jiang, Wong 2011).
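A minimal Python sketch of how insert lengths can drive isoform quantification. It uses a common fragment-likelihood formulation, P(pair | isoform) = f(L) / (ℓ − L + 1) with f the insert-length distribution, L the implied insert length, and ℓ the isoform length, plus expectation-maximization (EM) to estimate the fraction of fragments from each isoform. This is not claimed to be the exact model of Salzman, Jiang, Wong 2011; the normal insert-length density (mean 150 bp, SD 20 bp), the isoform lengths, and all function names are assumptions.

```python
import math
from typing import List, Optional, Sequence


def insert_density(L: int, mean: float = 150.0, sd: float = 20.0) -> float:
    """Assumed insert-length distribution (normal); an empirical pmf could be used instead."""
    return math.exp(-0.5 * ((L - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def pair_likelihood(L: Optional[int], iso_len: int) -> float:
    """P(pair | isoform): insert length L, start position uniform over iso_len - L + 1 sites."""
    if L is None or L > iso_len:
        return 0.0
    return insert_density(L) / (iso_len - L + 1)


def em_isoform_fractions(
    implied_lengths: Sequence[Sequence[Optional[int]]],
    iso_lens: Sequence[int],
    n_iter: int = 200,
) -> List[float]:
    """EM estimate of the fraction of fragments coming from each isoform.

    implied_lengths[i][k] is the insert length read pair i would have on
    isoform k, or None if the pair is incompatible with isoform k.
    """
    K = len(iso_lens)
    theta = [1.0 / K] * K
    lik = [[pair_likelihood(row[k], iso_lens[k]) for k in range(K)] for row in implied_lengths]
    for _ in range(n_iter):
        expected = [0.0] * K
        for row in lik:
            denom = sum(theta[k] * row[k] for k in range(K))
            if denom == 0.0:
                continue  # pair fits no isoform; skip it
            for k in range(K):
                expected[k] += theta[k] * row[k] / denom  # E-step: responsibilities
        total = sum(expected)
        if total == 0.0:
            break
        theta = [e / total for e in expected]  # M-step: renormalize
    return theta


# One pair implying 150 bp on isoform 1 and 90 bp on isoform 2 (made-up lengths).
ratio = pair_likelihood(150, 1000) / pair_likelihood(90, 940)
print(round(ratio, 1))  # 90.0, i.e. about e**4.5
```

With these assumed parameters the example pair is about e^4.5, roughly 90 times, more likely under Isoform 1: 150 bp sits at the center of the assumed distribution, 90 bp is three standard deviations below it, and the position terms cancel because both isoforms leave 851 possible start sites.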

10 Using PE for quantification is statistically more powerful
The PE model is a statistical improvement over naïve models and achieves optimal information reduction. [Figure: “information” gain using PE sequencing.] Overall, using “mate pair” information gives more power, but experimental artifacts can sometimes affect results.

11 Paired-end Size Distributions are the Foundation for TopHat and Other PE RNA-Seq Algorithms
Summary and problems: these algorithms rely on a reference, assume uniformity of size distributions in the library, and overlook biases. [Figure panels: Rep. 1, Rep. 2]
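The uniformity assumption can be checked directly, for example by comparing insert-length samples from two replicates or from two groups of genes. This hypothetical Python sketch computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions; it is one possible check, not a step of TopHat or of the author's pipeline, and the replicate lengths are made up. For p-values, SciPy provides `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|.

    Empirical CDFs only jump at observed values, so checking those points suffices.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d


rep1 = [180, 190, 200, 210, 220]  # made-up insert lengths (bp)
rep2 = [150, 160, 170, 200, 210]
print(ks_statistic(rep1, rep2))  # 0.6
```

A large D suggests that a single pooled insert-length distribution may not describe both samples, which is exactly the assumption the slide flags.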

12 Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors (2009)
Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors: genome-wide search. Top hit of our novel algorithm: the ESRRA-C11orf20 fusion [figure labels: C11orf20, ESRRA, fusion]. Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong).

13 Part 2: Gene Fusions

14 Recurrent Gene Fusions in Cancer
Only a handful of recurrent fusions are known in solid tumors: PAX8-PPARγ (thyroid cancer), EML4-ALK (non-small cell lung cancer), and the TMPRSS2-ERG family (prostate cancer). These searches were not genome-wide; more can be learned from an unbiased study of RNA.

15 Fusion Discovery: 2 Flavors
Totally “de novo” discovery: search for any RNA fragments out of order with respect to the reference genome, not necessarily coinciding with exon boundaries. Noisy. Discovery with a reference database: discover fusions at annotated (protein-coding) exon boundaries, with better statistical checks. Misses some fusions.

16 Reference Approach
Search for gene fusions in which exon A of gene 1 is spliced to exon B of gene 2. [Figure labels: Exon A, Exon B]
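A minimal Python sketch of a reference-based junction library: for every exon A of gene 1 and exon B of gene 2, join the end of A to the start of B so that reads crossing a fusion breakpoint have a sequence to align to. The exon sequences, the flank length `k`, and the function name are hypothetical; a real database would be built from annotated protein-coding exons.

```python
from itertools import product
from typing import Dict, List, Tuple


def fusion_junctions(
    exons_gene1: List[str], exons_gene2: List[str], k: int
) -> Dict[Tuple[int, int], str]:
    """Junction sequences for exon A (gene 1) spliced to exon B (gene 2).

    Keys are 0-based (A, B) exon indices; each value is the last k bases of
    exon A followed by the first k bases of exon B.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        (a, b): exons_gene1[a][-k:] + exons_gene2[b][:k]
        for a, b in product(range(len(exons_gene1)), range(len(exons_gene2)))
    }


junctions = fusion_junctions(["AAAACC", "GGGGTT"], ["CCCAAA", "TTTGGG"], k=3)
print(junctions[(0, 1)])  # ACCTTT
```

Restricting junctions to annotated exon boundaries keeps the search space small, which is why this flavor allows better statistical checks but misses fusions with non-canonical breakpoints.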

17 Algorithm (with respect to reference)
1. Remove all PE reads consistent with the reference.
2. Identify gene pairs: PE reads where (read1, read2) map to (gene1, gene2).
3. Find PE reads of the form (gene A, gene A-B junction).
[Figure labels: Exon A, Exon B]
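The three steps can be sketched in Python on toy alignment summaries. The `ReadPair` record and its fields, the rule that step 3 only considers gene pairs found in step 2, and the simplification that the junction hit is always on read 2 are hypothetical choices for illustration; strand and statistical filtering are ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ReadPair:
    read1_gene: Optional[str]                  # gene read 1 aligns to, if any
    read2_gene: Optional[str]                  # gene read 2 aligns to, if any
    read2_junction: Optional[Tuple[str, str]]  # (gene A, gene B) junction read 2 aligns to
    concordant: bool                           # already consistent with the reference


def fusion_support(pairs: List[ReadPair]) -> Dict[Tuple[str, str], int]:
    # Step 1: remove pairs consistent with the reference.
    discordant = [p for p in pairs if not p.concordant]
    # Step 2: gene pairs where read 1 and read 2 map to different genes.
    gene_pairs = {
        frozenset((p.read1_gene, p.read2_gene))
        for p in discordant
        if p.read1_gene and p.read2_gene and p.read1_gene != p.read2_gene
    }
    # Step 3: count pairs of the form (gene A, gene A-B junction) for those gene pairs.
    support: Dict[Tuple[str, str], int] = {}
    for p in discordant:
        j = p.read2_junction
        if j is not None and p.read1_gene == j[0] and frozenset(j) in gene_pairs:
            support[j] = support.get(j, 0) + 1
    return support


toy = [
    ReadPair("geneA", "geneA", None, True),              # removed in step 1
    ReadPair("geneA", "geneB", None, False),             # gene pair (A, B)
    ReadPair("geneA", None, ("geneA", "geneB"), False),  # junction support
    ReadPair("geneA", None, ("geneA", "geneB"), False),
    ReadPair("geneC", None, ("geneC", "geneD"), False),  # no (C, D) gene pair
]
print(fusion_support(toy))  # {('geneA', 'geneB'): 2}
```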

18 Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors
Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors: genome-wide search. Top hit of our algorithm: the ESRRA-C11orf20 fusion [figure labels: C11orf20, ESRRA, fusion]. Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong; Salzman et al., 2011).

19 Part 3: Exploratory Analysis of RNA Rearrangements

20 Exploratory analysis: biological “noise” in RNA-Seq Data
[Figure: wildtype genome DNA produces the canonical transcript; locally rearranged DNA produces a scrambled transcript.] Is exon scrambling present in rRNA-depleted RNA?

21 Bioinformatic Analysis
Thousands of exon-scrambling events in RNA from human leukocytes and cancer samples. [Figure: wildtype genome DNA and canonical transcript.] These events are inconsistent with the reference genome!

22 Potential Biological Mechanisms for RNA Rearrangements
DNA rearrangement; RNA rearrangement; trans-splicing; template switching; PCR artifact.

23 Analysis of Leukocyte Data
Scrambled reads join exons in non-increasing order with respect to the canonical exon order. Thousands of genes show evidence of exon scrambling. Naïve estimate of the fractional abundance of the scrambled isoform: scrambled read rate / all read rate (per transcript).
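The scrambled-order test and the naïve fraction can be written out directly. In this Python sketch a junction read from exon i (donor) to exon j (acceptor), numbered in canonical order, counts as scrambled when j ≤ i; the input format, the function names, and the choice to use junction-spanning reads as the denominator are assumptions.

```python
from typing import Dict, Iterable, Tuple


def is_scrambled(donor_exon: int, acceptor_exon: int) -> bool:
    """Exon i joined to exon j is scrambled if j <= i (non-increasing order)."""
    return acceptor_exon <= donor_exon


def scrambled_fraction(junction_reads: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Naive per-transcript estimate: scrambled junction reads / all junction reads.

    Each input item is (transcript, donor exon number, acceptor exon number).
    """
    totals: Dict[str, int] = {}
    scrambled: Dict[str, int] = {}
    for tx, donor, acceptor in junction_reads:
        totals[tx] = totals.get(tx, 0) + 1
        if is_scrambled(donor, acceptor):
            scrambled[tx] = scrambled.get(tx, 0) + 1
    return {tx: scrambled.get(tx, 0) / n for tx, n in totals.items()}


reads = [("tx1", 1, 2), ("tx1", 2, 3), ("tx1", 3, 2), ("tx1", 2, 2), ("tx2", 1, 2)]
print(scrambled_fraction(reads))  # {'tx1': 0.5, 'tx2': 0.0}
```

The estimate is naïve in the slide's sense: it treats every junction read as equally informative and ignores differences in how many reads each isoform's junctions can produce.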

24 100s of Transcripts with High Fractions of Scrambled Isoforms
[Figure: canonical vs. scrambled isoform fractions; hundreds of genes have < 25% canonical isoform and > 75% scrambled isoform.] Hundreds of transcripts from B cells, stem cells, and neutrophils have >50% of copies from the scrambled isoform.

25 What Models Can Explain Exon Scrambling in RNA?

26 Model 1 to Explain RNA Exon Scrambling

27 Model 1 Prediction: Can Be Made Statistically Precise
Model 1 is statistically inconsistent with the vast majority of the data. A subset of genes has evidence of tandem duplication in mRNA. [Figure: number of transcripts with evidence against Model 1 vs. for Model 1]

28 Alternative Model
Model and data are consistent.

29 Mining RNA-Seq Data for Evidence Consistent with Circular RNA?
In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA). In poly-A selected samples, expect to see little evidence of scrambled exons (circular RNA).

30 Poly-A Depleted Samples Enriched for Scrambled Exons
Align all reads to a custom database
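One way to build such a custom database is to enumerate every scrambled junction of each gene: the end of exon i joined to the start of exon j for j ≤ i, which is the junction a circular RNA covering exons j through i would produce. Reads spanning these junctions do not align to the linear reference but can align here. The Python sketch below is hypothetical: the exon sequences, flank length `k`, and function name are illustrative, and it is not the author's database.

```python
from typing import Dict, List, Tuple


def scrambled_junction_db(exons: List[str], k: int) -> Dict[Tuple[int, int], str]:
    """Scrambled-junction sequences for one gene (exons in canonical order).

    Key (i, j), 1-based with j <= i: last k bases of exon i + first k bases of exon j.
    A gene with n exons yields n * (n + 1) / 2 entries.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    db: Dict[Tuple[int, int], str] = {}
    for i in range(1, len(exons) + 1):
        for j in range(1, i + 1):
            db[(i, j)] = exons[i - 1][-k:] + exons[j - 1][:k]
    return db


db = scrambled_junction_db(["AAAC", "GGGT", "TTTA"], k=2)
print(len(db), db[(3, 1)])  # 6 TAAA
```

Including j = i covers a single exon joined to its own start, the smallest possible circle.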

31 Summary of RNA-Seq for NGS
RNA-Seq can be used for discovery. TopHat and other fusion/splicing algorithms give a broad picture but may have significant noise and miss important features of RNA expression.

32 (feel free to contact me for the algorithm to identify circular RNA!)
Currently, all published/downloadable algorithms will fail to identify circular RNA! In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA); in poly-A selected samples, expect to see little evidence of scrambled exons.

