1 An Introduction to Studying Expression Data Through RNA-seq, by Jason Van Houten
2 Outline: Why do we study RNA? What is RNA-seq? Issues with RNA quality. A brief overview of how to make RNA-seq libraries. Choices to make (depth, paired-end, cost, strand specificity, etc.). Examples.
3 Before we start… Please ask questions! If I'm not clear, or if you have a comment, don't be afraid to stop me. There is more than one right way to do this type of analysis; what I present is only one variation. I am not the expert! I am always learning new things as well, so if you have a thought or opinion, don't be afraid to chime in.
4 Gene Expression: The Central Dogma. Each level (DNA, RNA, protein) can tell us something different.
5 Why Study RNA over DNA? Functional studies: the genome may be constant, but an experimental condition can have a pronounced effect on gene expression (e.g., drug-treated vs. untreated cell line; wild-type vs. knockout mice). Some molecular features can only be observed at the RNA level: alternative isoforms, fusion transcripts, and RNA editing. Predicting transcript sequence from genome sequence is difficult because of alternative splicing, RNA editing, etc.
6 Why Study RNA over DNA? Interpreting mutations that do not have an obvious effect on protein sequence: "regulatory" mutations that affect which mRNA isoform is expressed and how much (e.g., splice sites, promoters, exonic/intronic splicing motifs). Prioritizing protein-coding somatic mutations (often heterozygous): if the gene is not expressed, a mutation in that gene is less interesting; if the gene is expressed but only from the wild-type allele, this might suggest loss of function (haploinsufficiency); if the mutant allele itself is expressed, this might suggest a candidate drug target.
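The prioritization logic on this slide can be written as a small decision function. This is a minimal sketch, assuming allele-specific read counts at the mutation site are already available; the function name `prioritize_mutation` and both thresholds (`min_total`, `min_alt_fraction`) are hypothetical choices for illustration, not values from the talk.

```python
def prioritize_mutation(ref_reads: int, alt_reads: int,
                        min_total: int = 10,
                        min_alt_fraction: float = 0.1) -> str:
    """Classify a heterozygous somatic mutation using RNA-seq reads at its position.

    ref_reads: reads supporting the wild-type (reference) allele
    alt_reads: reads supporting the mutant allele
    Both thresholds are hypothetical and would need tuning on real data.
    """
    total = ref_reads + alt_reads
    if total < min_total:
        # Too few reads: treat the gene as not expressed, so the mutation is less interesting.
        return "not expressed"
    if alt_reads / total < min_alt_fraction:
        # Expressed, but almost only from the wild-type allele: possible loss of function.
        return "wild-type allele only"
    # The mutant allele itself is transcribed: a possible drug-target candidate.
    return "mutant allele expressed"

print(prioritize_mutation(2, 1))    # not expressed
print(prioritize_mutation(48, 1))   # wild-type allele only
print(prioritize_mutation(25, 22))  # mutant allele expressed
```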
7 What is RNA-seq? Whole Transcriptome Shotgun Sequencing: high-throughput sequencing of cDNA to gain information about a sample's RNA content. A "transcription snapshot": in principle, the expression levels of every gene in the genome at that particular point in time.
8 What is Next Gen Seq? Video: briefly discusses library prep and how sequencing works.
10 We start by asking a question. Condition 1 (normal colon) vs. Condition 2 (colon tumor). What genes are turned on or off under these conditions? What about whole gene pathways? A change in the expression of one gene can affect the expression of many.
11 RNA Sequencing Overview: samples of interest (Condition 1, normal colon; Condition 2, colon tumor) → isolate RNAs → fragment, generate cDNA, add adapters, size select, PCR amplify → sequence ends (100s of millions of paired reads; 10s of billions of bases of sequence) → map to genome, transcriptome, and predicted exon junctions → downstream analysis.
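To make the fragment, size-select, and sequence-ends steps concrete, here is a toy simulation in Python. It is a minimal sketch under simplifying assumptions: fragmentation is modeled as uniformly random cut points; adapters, PCR, and sequencing errors are ignored; and the helper names and parameters (`mean_len`, the 200 to 500 bp window, 100 bp reads) are hypothetical.

```python
from __future__ import annotations
import random

# Base-pairing table used to read the opposite strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def fragment(seq: str, mean_len: int, rng: random.Random) -> list[str]:
    """Cut a sequence at random positions into pieces averaging about mean_len bases."""
    n_cuts = max(len(seq) // mean_len - 1, 0)
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

def size_select(frags: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep only fragments whose length falls inside the selected size window."""
    return [f for f in frags if min_len <= len(f) <= max_len]

def paired_end_reads(frag: str, read_len: int) -> tuple[str, str]:
    """Read 1 from the fragment's start; read 2 from its other end, on the opposite strand."""
    return frag[:read_len], reverse_complement(frag[-read_len:])

rng = random.Random(0)
transcript = "".join(rng.choice("ACGT") for _ in range(5000))  # stand-in cDNA sequence
frags = size_select(fragment(transcript, 300, rng), 200, 500)
pairs = [paired_end_reads(f, 100) for f in frags]
print(len(frags), "fragments kept after size selection")
```

Because read 2 is reported on the opposite strand, reverse-complementing it recovers the last 100 bases of the fragment, which is what lets an aligner place both mates.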
12 Challenges of Studying RNA: RNAs consist of small exons that may be separated by large introns, so mapping reads to the genome is challenging. The relative abundances of RNAs vary wildly, spanning 5 to 7 orders of magnitude (10^5 to 10^7-fold). Since RNA sequencing works by random sampling, a small fraction of highly expressed genes (e.g., ribosomal and mitochondrial genes) may consume the majority of reads. RNAs come in a wide range of sizes: small RNAs must be captured separately, and polyA selection of large RNAs may result in 3' end bias. RNA is fragile compared to DNA (easily degraded).
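The random-sampling point can be illustrated with a quick simulation: each read is drawn from a transcript with probability proportional to that transcript's abundance. This is a minimal sketch; the abundances below (one rRNA-like transcript, one mitochondrial-like transcript, 1,000 ordinary genes) are made-up numbers, not measurements.

```python
from __future__ import annotations
import random
from collections import Counter

def sample_reads(abundance: dict[str, float], n_reads: int, seed: int = 0) -> Counter:
    """Assign each read to a transcript with probability proportional to its abundance."""
    rng = random.Random(seed)
    names = list(abundance)
    weights = [abundance[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_reads))

# Hypothetical library: one very abundant rRNA-like transcript, one mitochondrial-like
# transcript, and 1,000 ordinary genes at low abundance.
abundance = {"rRNA": 1_000_000, "mito": 100_000}
abundance.update({f"gene{i}": 10 for i in range(1000)})

counts = sample_reads(abundance, n_reads=100_000)
top_share = counts["rRNA"] / 100_000
unseen = sum(1 for i in range(1000) if counts[f"gene{i}"] == 0)
print(f"rRNA share of reads: {top_share:.2f}")
print(f"low-abundance genes with zero reads: {unseen} of 1000")
```

With these numbers, the rRNA-like transcript is expected to take about 90% of all reads, and each ordinary gene expects fewer than one read, so a sizeable fraction of them (around 40% in expectation) are never observed. This is why rRNA depletion or polyA selection, and adequate depth, matter.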
19 Best Practice (www.invitrogen.com): RNA is highly susceptible to degradation by RNase enzymes. RNases are present in cells and tissues and can be carried on hands, labware, or even dust; they are very stable and difficult to inactivate. For these reasons, it is important to follow best laboratory practices when preparing and handling RNA samples. When harvesting total RNA, use a method that quickly disrupts tissue and isolates and stabilizes the RNA. Wear gloves and use sterile technique at all times. Reserve a set of pipettes for RNA work, and use sterile RNase-free filter pipette tips to prevent cross-contamination. Use disposable plasticware that has been certified RNase-free. Prepare all reagents from RNase-free components, including ultrapure water. Store RNA samples frozen, and keep samples on ice at all times while working with them. Avoid extended pauses in the protocol until the RNA has been reverse transcribed into cDNA. Use an RNase/DNase decontamination solution on work surfaces and equipment.
22 Length of Reads / Single vs. Paired End: Longer reads give better alignment confidence and maximize sequencing coverage on the flow cell (coverage: the average number of sequences representing a particular region of the transcriptome). Paired ends help deduce large insertions, deletions, and rearrangements. Drawback: it costs more.
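Paired ends help detect structural changes because the library's insert size is roughly known from size selection, so a pair whose mates map much farther apart (or closer together) on the reference than expected stands out. A minimal sketch, assuming a hypothetical expected insert size of 300 ± 30 bp and a hypothetical 3-standard-deviation cutoff; for RNA-seq reads aligned to the genome, an intron between the mates also produces an unexpectedly long span.

```python
def classify_pair(r1_start: int, r2_end: int,
                  expected_mean: float, expected_sd: float,
                  n_sd: float = 3.0) -> str:
    """Compare a pair's span on the reference with the library's expected insert size.

    r1_start: leftmost reference position of read 1
    r2_end: rightmost reference position of read 2
    """
    span = r2_end - r1_start  # observed insert size on the reference
    if span > expected_mean + n_sd * expected_sd:
        # Mates map too far apart: sequence present in the reference is missing from the sample.
        return "too long: possible deletion (or an intron, for RNA reads)"
    if span < expected_mean - n_sd * expected_sd:
        # Mates map too close together: extra sequence in the sample not in the reference.
        return "too short: possible insertion"
    return "concordant"

print(classify_pair(1000, 1300, 300, 30))  # concordant
print(classify_pair(1000, 3000, 300, 30))  # too long: possible deletion (or an intron, for RNA reads)
print(classify_pair(1000, 1100, 300, 30))  # too short: possible insertion
```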
23 Depth: the number of reads per sample/library. More depth makes it more likely that you will see genes with very low expression. About 200 million reads can be generated per lane on a flow cell. Nat Rev Genet, 57-63
24 How much depth do you need? It depends on the application. For differential gene expression and variant detection, 10x to 30x coverage; if you're interested in lowly expressed genes, you might still need more. Applications like transcriptome assembly need much more depth. So we choose how many libraries to pool in a lane for sequencing depending on how much depth we need.
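Average coverage is just total sequenced bases divided by the size of the target, which also tells you how many reads a library needs. A minimal sketch, assuming a hypothetical expressed-transcriptome size of 50 Mb and 100 bp reads; the function names are illustrative.

```python
import math

def mean_coverage(n_reads: int, read_len: int, target_bp: int) -> float:
    """Average number of times each base of the target is covered by reads."""
    return n_reads * read_len / target_bp

def reads_needed(target_coverage: float, read_len: int, target_bp: int) -> int:
    """Reads required to reach a target average coverage (rounded up)."""
    return math.ceil(target_coverage * target_bp / read_len)

TRANSCRIPTOME_BP = 50_000_000  # hypothetical size of the expressed transcriptome
print(reads_needed(30, 100, TRANSCRIPTOME_BP))           # 15000000
print(mean_coverage(15_000_000, 100, TRANSCRIPTOME_BP))  # 30.0
```

This is only an average: because transcript abundance is so uneven, highly expressed genes get far more than 30x while lowly expressed genes get much less. For paired-end runs, count each mate as a read.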
25 Multiplexing: Add a "barcode" (a unique string of nucleotides within the adapter) to each sample/library, then mix and sequence. Using the barcode, sequenced reads can be traced back to their sample. (Figure: barcoded libraries B1 to B4 are mixed and sequenced together.)
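Tracing reads back to their sample ("demultiplexing") amounts to matching each read's barcode against the known list. A minimal sketch under stated assumptions: the barcode sits at the start of the read (on Illumina instruments it is often read separately as an index read, and the facility's software usually does this step), the four 6-nt barcodes are hypothetical, and one mismatch is tolerated.

```python
from __future__ import annotations
from collections import defaultdict

# Hypothetical 6-nt barcodes for the four libraries (B1-B4); every pair differs at all
# 6 positions, so allowing one mismatch cannot make a read match two samples.
BARCODES = {"B1": "ACGTAC", "B2": "TGCATG", "B3": "GATCGA", "B4": "CTAGCT"}

def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> str | None:
    """Return the sample whose barcode matches the start of the read, or None."""
    hits = [name for name, bc in barcodes.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

def demultiplex(reads: list[str], barcodes: dict[str, str] = BARCODES,
                max_mismatches: int = 1) -> dict[str, list[str]]:
    """Sort reads into per-sample bins and trim the barcode off each assigned read."""
    bins: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        sample = assign_sample(read, barcodes, max_mismatches)
        if sample is None:
            bins["undetermined"].append(read)
        else:
            bins[sample].append(read[len(barcodes[sample]):])
    return dict(bins)

reads = ["ACGTACGGTTCA", "TGCATTAACCGA", "NNNNNNCCGTAA"]
print(demultiplex(reads))
# {'B1': ['GGTTCA'], 'B2': ['AACCGA'], 'undetermined': ['NNNNNNCCGTAA']}
```

Choosing barcodes that differ at many positions is what makes mismatch tolerance safe: a single sequencing error cannot turn one sample's barcode into another's.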
26 Cost: In our lab, it costs only about $30 per library to construct libraries ourselves. Additionally, there are sequencing costs, which depend on read length (cycles), single vs. paired end, and the facility and machine ($ per lane on a HiSeq 2000). Again, multiplexing reduces cost per sample.
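The effect of multiplexing on cost can be worked out directly: each sample pays for its own library plus an equal share of the lane. A minimal sketch; the ~$30 library cost and ~200 million reads per lane come from the slides, while the $1,500 lane price and 15 million reads per sample are hypothetical placeholders, since lane pricing varies by facility.

```python
def samples_per_lane(reads_per_lane: int, reads_per_sample: int) -> int:
    """How many libraries fit in one lane at the desired depth (whole libraries only)."""
    return reads_per_lane // reads_per_sample

def cost_per_sample(library_cost: float, lane_cost: float, n_samples: int) -> float:
    """Library prep cost plus this sample's equal share of the lane cost."""
    return library_cost + lane_cost / n_samples

LIBRARY_COST = 30.0  # ~$30 per library, from the slide
LANE_COST = 1500.0   # hypothetical price per lane; varies by facility and run type

n = samples_per_lane(200_000_000, 15_000_000)
print(n)                                                  # 13
print(round(cost_per_sample(LIBRARY_COST, LANE_COST, n), 2))  # 145.38
```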
27Advantages of RNA-Seq compared with other transcriptomics methods
28 Typical Differential Gene Expression Workflow: raw reads → filter reads → assemble transcriptome → align to reference genome/transcriptome → count reads that map to genes → run statistical tests → evaluate genes that are differentially expressed.
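The "count reads that map to genes" step can be sketched as an overlap test between read positions and exon coordinates. This is a minimal sketch, assuming alignment is already done and each read is reduced to its chromosome and start position; exon coordinates are half-open, and reads that fall in more than one gene are discarded as ambiguous. That is one common convention; dedicated tools such as HTSeq or featureCounts offer several counting modes.

```python
from collections import Counter

def count_reads_per_gene(exons, alignments):
    """Count reads per gene.

    exons: iterable of (gene, chrom, start, end) with half-open [start, end) coordinates
    alignments: iterable of (chrom, position) for each aligned read
    A read is counted only if its position falls in exons of exactly one gene.
    """
    exons = list(exons)
    counts = Counter()
    for chrom, pos in alignments:
        genes = {gene for gene, c, start, end in exons if c == chrom and start <= pos < end}
        if len(genes) == 1:
            counts[genes.pop()] += 1
    return counts

exons = [("geneA", "chr1", 100, 200), ("geneA", "chr1", 300, 400),
         ("geneB", "chr1", 150, 250)]
alignments = [("chr1", 120), ("chr1", 160), ("chr1", 350), ("chr1", 220), ("chr2", 120)]
print(count_reads_per_gene(exons, alignments))  # Counter({'geneA': 2, 'geneB': 1})
```

The read at chr1:160 overlaps both genes and is dropped, and the read on chr2 hits no exon. These raw counts are the input to the statistical tests in the next step.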
30 FPKM (RPKM): Expression Normalization. Fragments (Reads) Per Kilobase of exon model per Million mapped fragments: $\mathrm{FPKM} = \frac{10^9 \, C}{N \, L}$, where C = the number of reads mapped onto the gene's exons (raw counts), N = the total number of mapped reads in the experiment, and L = the sum of the exon lengths in base pairs (size of the gene). Example 1: large gene 1 and small gene 2 each have 100 reads, so FPKM(gene 1) < FPKM(gene 2). Example 2: library 1 has half the depth of library 2; gene 1 has 50 reads in library 1 and 100 reads in library 2, so its expression is the same in both libraries.
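The two examples on this slide can be checked numerically with FPKM = 10^9 × C / (N × L), where 10^9 combines "per kilobase" (10^3) and "per million mapped reads" (10^6). A minimal sketch; the gene lengths and library sizes are hypothetical values chosen to make the arithmetic easy.

```python
def fpkm(count: int, total_mapped: int, gene_length_bp: int) -> float:
    """FPKM = C * 10^9 / (N * L): reads per kilobase of exon per million mapped reads."""
    return count * 1e9 / (total_mapped * gene_length_bp)

# Example 1 (hypothetical lengths): same 100 reads, different gene sizes, N = 10 million.
gene1 = fpkm(100, 10_000_000, 10_000)  # large gene, 10 kb of exons
gene2 = fpkm(100, 10_000_000, 1_000)   # small gene, 1 kb of exons
print(gene1, gene2)                    # 1.0 10.0

# Example 2: library 1 has half the depth of library 2; gene length 2 kb (hypothetical).
lib1 = fpkm(50, 5_000_000, 2_000)
lib2 = fpkm(100, 10_000_000, 2_000)
print(lib1, lib2)                      # 5.0 5.0
```

Dividing by L removes the advantage long genes have in collecting reads, and dividing by N removes the effect of sequencing depth, which is why the two libraries in Example 2 give the same value.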
31 Conclusions: We know what RNA-seq is! Library preparation. Next Generation Sequencing. RNA quality is very important: 3' bias, and tips to protect your RNA. Things to consider within the cost-versus-information balance. Introduced some analysis.
32 Acknowledgements: I thank HHMI, the van der Knaap lab, Dr. Dean Fraga, and everyone involved in this workshop for making this possible.