An Introduction to Studying Expression Data Through RNA-seq

1 An Introduction to Studying Expression Data Through RNA-seq
By Jason Van Houten

2 Outline
Why do we study RNA?
What is RNA-seq?
Issues with RNA quality
Brief overview of how to make RNA-seq libraries
Choices to make (depth, paired-end, cost, strand specificity, etc.)
Examples

3 Before we start…
Please ask questions!!
If I’m not clear, or if you have a comment, don’t be afraid to stop me.
There is more than one right way to do this type of analysis.
What I present is only one variation.
I am not the expert! I am always learning new things as well, so if you have a thought or opinion, don’t be afraid to chime in.

4 Gene Expression
The Central Dogma
Each level can tell us something different

5 Why Study RNA over DNA?
Functional studies
  The genome may be constant, but an experimental condition can have a pronounced effect on gene expression
  e.g. drug-treated vs. untreated cell line
  e.g. wild type vs. knockout mice
Some molecular features can only be observed at the RNA level
  Alternative isoforms, fusion transcripts, RNA editing
Predicting transcript sequence from genome sequence is difficult
  Alternative splicing, RNA editing, etc.

6 Why Study RNA over DNA?
Interpreting mutations that do not have an obvious effect on protein sequence
  ‘Regulatory’ mutations that affect which mRNA isoform is expressed and how much
  e.g. splice sites, promoters, exonic/intronic splicing motifs, etc.
Prioritizing protein-coding somatic mutations (often heterozygous)
  If the gene is not expressed, a mutation in that gene would be less interesting
  If the gene is expressed but only from the wild type allele, this might suggest loss of function (haploinsufficiency)
  If the mutant allele itself is expressed, this might suggest a candidate drug target

7 What is RNA-seq?
Whole Transcriptome Shotgun Sequencing
High-throughput sequencing of cDNA to gain information about a sample’s RNA content.
“Transcription snapshot”: measure the expression level of every gene in the genome at that particular point in time.

8 What is Next Gen Seq? Video!
Briefly discusses library prep and how sequencing works

9 We start by asking a question
Condition 1 (normal colon) Condition 2 (colon tumor)

10 We start by asking a question
Condition 1 (normal colon) Condition 2 (colon tumor)
Which genes are turned on or off under these conditions?
What about whole gene pathways?
A change in the expression of one gene can affect the expression of many others.

11 RNA sequencing Overview
Samples of interest: Condition 1 (normal colon), Condition 2 (colon tumor)
Isolate RNAs
Fragment, generate cDNA, add adapters, size select, PCR amplify
Sequence ends
  100s of millions of paired reads
  10s of billions of bases of sequence
Map to genome, transcriptome, and predicted exon junctions
Downstream analysis

12 Challenges of Studying RNA
RNAs consist of small exons that may be separated by large introns
  Mapping reads to the genome is challenging
The relative abundance of RNAs varies wildly
  10^5 to 10^7-fold range (5 to 7 orders of magnitude)
  Since RNA sequencing works by random sampling, a small fraction of highly expressed genes may consume the majority of reads
  Ribosomal and mitochondrial genes
RNAs come in a wide range of sizes
  Small RNAs must be captured separately
  PolyA selection of large RNAs may result in 3’ end bias
RNA is fragile compared to DNA (easily degraded)


14 mRNA Selection

15 mRNA Selection

16 mRNA Selection

17 Quality
Agilent Bioanalyzer
Very good RNA, RIN of 10
Still good, RIN of 8.9
Starting to get worse, RIN of 6.3

18 Quality RIN 3 RIN 2.2

19 Best Practice
RNA is highly susceptible to degradation by RNase enzymes. RNases are present in cells and tissues and can be carried on hands, labware, or even dust. They are very stable and difficult to inactivate, so it is important to follow best laboratory practices while preparing and handling RNA samples.
When harvesting total RNA, use a method that quickly disrupts tissue and isolates and stabilizes RNA
Wear gloves and use sterile technique at all times
Reserve a set of pipettes for RNA work
Use sterile RNase-free filter pipette tips to prevent cross-contamination
Use disposable plasticware that has been certified to be RNase-free
Prepare all reagents from RNase-free components, including ultrapure water
Store RNA samples by freezing, and keep samples on ice at all times while working with them
Avoid extended pauses in the protocol until the RNA has been reverse transcribed into cDNA
Use RNase/DNase decontamination solution to decontaminate work surfaces and equipment


21 Now we are ready to sequence!

22 Length of Reads/single vs. paired
Longer reads give you better alignment confidence
Maximizes sequencing coverage on the flow cell
  Coverage: the average number of sequences representing a particular region of the transcriptome
Paired ends help deduce large insertions/deletions/rearrangements
Drawback: it costs more

23 Depth
The number of reads per sample/library
More depth makes it more likely that you will see genes expressed at very low levels
~200 million reads can be generated per lane on a flow cell
Nat Rev Genet , 57-63

24 How much depth do you need?
Depends on the application
Differential gene expression, variant detection
  10x – 30x coverage
  If you’re interested in genes expressed at lower levels, you might need even more.
Applications like transcriptome assembly
  Much more depth needed
So we choose how many samples to add to a lane for sequencing depending on how much depth we need.
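The relationship between depth, coverage, and the number of libraries per lane can be sketched with a little arithmetic. This is a rough planning aid, not a rule: the 60 Mb target size, the 100 bp read length, and the 25 million reads per sample are hypothetical values chosen for illustration, and coverage across a transcriptome is far from uniform because expression levels differ so widely. Only the ~200 million reads per lane figure comes from the slides.

```python
import math


def mean_coverage(n_reads: int, read_length: int, target_size: int) -> float:
    """Average number of reads covering each base of the target."""
    return n_reads * read_length / target_size


def reads_needed(coverage: float, read_length: int, target_size: int) -> int:
    """Number of reads required to reach a given mean coverage."""
    return math.ceil(coverage * target_size / read_length)


def samples_per_lane(lane_reads: int, reads_per_sample: int) -> int:
    """How many libraries fit in one lane at the requested depth."""
    return lane_reads // reads_per_sample


# Hypothetical planning numbers (assumptions, not from the slides).
TARGET_SIZE = 60_000_000   # bases of transcriptome to cover
READ_LENGTH = 100          # bases per read
LANE_READS = 200_000_000   # ~200 million reads per lane (from the slides)

needed = reads_needed(30, READ_LENGTH, TARGET_SIZE)
print(needed)                                     # reads for 30x mean coverage
print(samples_per_lane(LANE_READS, 25_000_000))   # libraries per lane
```

Under these assumptions, a lane can be shared by several libraries, which is what the multiplexing step makes possible.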

25 Multiplexing
Add a “barcode” to each sample/library, then mix and sequence
  A barcode is a unique string of nucleotides within the adapter
Using the barcode, sequenced reads can be traced back to their appropriate sample.
[Figure: libraries with barcodes B1, B2, B3, and B4 are mixed and sequenced together]
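Tracing reads back to their sample is essentially a dictionary lookup on the barcode. The sketch below is a simplified illustration, assuming the barcode has already been split off from each read and must match exactly; the barcode sequences and the `demultiplex` helper are hypothetical, and real demultiplexing software typically also tolerates sequencing errors in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group read sequences by sample using an exact barcode match.

    reads: iterable of (barcode, sequence) pairs.
    Reads with an unknown barcode go to the "undetermined" bin.
    """
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical barcodes for the four libraries B1 to B4.
barcodes = {"ACGT": "B1", "TGCA": "B2", "GATC": "B3", "CTAG": "B4"}

# Mixed reads as they might come off the sequencer.
mixed_reads = [
    ("GATC", "TTAGGC"),
    ("ACGT", "GGCATA"),
    ("TGCA", "CCATGA"),
    ("CTAG", "AATTCG"),
    ("ACGT", "TTGACA"),
    ("NNNN", "GGGGGG"),  # unreadable barcode
]

for sample, seqs in sorted(demultiplex(mixed_reads, barcodes).items()):
    print(sample, len(seqs))
```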

26 Cost
In our lab, it costs only about $30 per library to construct them ourselves.
In addition, there are sequencing costs
  Depends on read length (cycles) and single vs. paired end
  Depends on facility and machine
  $ per lane HiSeq2000
Again, multiplexing reduces cost per sample
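The effect of multiplexing on cost per sample can be sketched as the library cost plus an equal share of the lane cost. The $30 library cost is from the slide; the lane price of 1000 below is a made-up placeholder, not the actual HiSeq2000 price, so substitute your facility's quote.

```python
def cost_per_sample(library_cost: float, lane_cost: float,
                    samples_per_lane: int) -> float:
    """Library prep cost plus an equal share of one sequencing lane."""
    return library_cost + lane_cost / samples_per_lane


LIBRARY_COST = 30.0    # about $30 per library (from the slides)
LANE_COST = 1000.0     # HYPOTHETICAL placeholder price per lane

for n in (1, 4, 8):
    print(n, cost_per_sample(LIBRARY_COST, LANE_COST, n))
```

The trade-off is depth: every additional library sharing the lane reduces the number of reads each sample receives.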

27 Advantages of RNA-Seq compared with other transcriptomics methods

28 Typical Differential Gene Expression Workflow
Raw reads
Filter reads
Assemble transcriptome
Align to reference genome/transcriptome
Count reads that map to genes
Run statistical tests
Evaluate genes that are differentially expressed
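The last steps of the workflow, from per-gene read counts to a comparison between conditions, can be illustrated with a toy calculation. This is a deliberately simplified sketch, not a statistical test: it scales counts to counts per million and reports a log2 fold change, whereas a real analysis uses biological replicates and a proper statistical model. The gene names, counts, and pseudocount are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def log2_fold_change(cpm_a, cpm_b, pseudocount=1.0):
    """log2(condition B / condition A) per gene; the pseudocount avoids log(0)."""
    return {
        gene: math.log2((cpm_b[gene] + pseudocount) / (cpm_a[gene] + pseudocount))
        for gene in cpm_a
    }


# Hypothetical read counts per gene for the two conditions.
normal = {"geneA": 100, "geneB": 900}
tumor = {"geneA": 400, "geneB": 600}

lfc = log2_fold_change(counts_per_million(normal), counts_per_million(tumor))
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A positive value means the gene appears more highly expressed in the tumor sample, a negative value means lower.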

29 Strand Specificity BMC Genomics 2012, 13:721

30 FPKM (RPKM): Expression Normalization
Fragments (Reads) Per Kilobase of exon model per Million mapped fragments
C = the number of reads mapped onto the gene’s exons (raw counts)
N = total number of mapped reads in the experiment
L = the sum of the exon lengths in base pairs (size of the gene)
Example 1: large gene #1 and small gene #2 each have 100 reads. After normalization, the expression of gene 1 is lower than that of gene 2 (gene 1 < gene 2).
Example 2: library 1 has half the depth of library 2. Gene 1 has 50 reads in library 1 and 100 reads in library 2. After normalization, the expression of gene 1 is the same in both libraries.
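With C, N, and L defined as on this slide, the standard definition of the measure is FPKM = 10^9 × C / (N × L). The sketch below applies it to the slide's two examples; the gene lengths and library sizes are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def fpkm(c: int, n: int, l: int) -> float:
    """Fragments per kilobase of exon model per million mapped fragments.

    c: reads mapped to the gene's exons (raw count)
    n: total mapped reads in the library
    l: summed exon length of the gene in base pairs
    """
    return 1e9 * c / (n * l)


# Example 1: same read count, different gene lengths (hypothetical sizes).
n_reads = 10_000_000
gene1 = fpkm(100, n_reads, 10_000)   # large gene
gene2 = fpkm(100, n_reads, 1_000)    # small gene
print(gene1, gene2)                  # gene 1 is lower than gene 2

# Example 2: library 1 has half the depth of library 2.
lib1 = fpkm(50, 10_000_000, 10_000)
lib2 = fpkm(100, 20_000_000, 10_000)
print(lib1, lib2)                    # same expression in both libraries
```

Dividing by L corrects for longer genes collecting more reads, and dividing by N corrects for deeper libraries collecting more reads overall.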

31 Conclusions
We know what RNA-seq is!
  Library preparation
  Next Generation Sequencing
RNA quality is very important
  3’ bias
  Tips to protect RNA
Things to consider within the cost versus information balance
Introduced some analysis

32 Acknowledgements I thank HHMI, the van der Knaap lab, Dr. Dean Fraga and everyone involved in this workshop for making this possible

33 Thank You! Questions?
