Transcriptome analysis with a reference
– Challenging due to size and complexity of datasets
– Many tools available, driven by biomedical research
– GATK and R/Bioconductor offer many options
– Start by mapping reads to the reference genome with a mapping/alignment tool
    – deal with exon-intron junctions
– Reconstruct transcripts from mapped reads
    – deal with alternative splicing products
– Calculate relative abundance of different transcripts
– Estimate biological significance based on annotation
– Example tools: Bowtie/TopHat, Cufflinks, Myrna
BIT 815: Analysis of Deep Sequencing Data
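The "calculate relative abundance" step can be illustrated with a small calculation. The sketch below is only an illustration, not the method used by any of the tools named above: it converts read counts and transcript lengths into transcripts per million (TPM), one common length-normalized measure of relative abundance. The function name, transcript names, counts and lengths are all hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert read counts to TPM (transcripts per million).

    counts and lengths_bp are dicts keyed by transcript name.
    All values used below are made up for illustration.
    """
    # Reads per base: at equal abundance, a longer transcript yields more
    # reads, so counts are divided by length before comparing transcripts.
    rate = {t: counts[t] / lengths_bp[t] for t in counts}
    total = sum(rate.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    # Rescale so that the values sum to one million.
    return {t: 1e6 * r / total for t, r in rate.items()}


# Hypothetical input: two transcripts with equal counts but different lengths.
counts = {"tx_A": 100, "tx_B": 100, "tx_C": 0}
lengths_bp = {"tx_A": 1000, "tx_B": 2000, "tx_C": 1500}

tpm = counts_to_tpm(counts, lengths_bp)
for name, value in tpm.items():
    print(f"{name}\t{value:.1f}")
```

With equal read counts, the shorter transcript (tx_A) receives twice the TPM of the longer one (tx_B), which is the point of length normalization.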
Workflow summary from a review “From RNA-seq reads to differential expression results”, by Oshlack et al, Genome Biol 11:220, 2010. Note emphasis on statistical analysis methods; an equal emphasis should be placed on experimental design.
The ‘Tuxedo’ suite of programs: Bowtie, TopHat, Cufflinks and CummeRbund. See Trapnell et al, Nature Protocols 7:562–578, 2012 for details.
– TopHat maps reads
– Cufflinks assembles transcripts
– Cuffmerge merges transcript data detected in different treatments
– Cuffdiff evaluates differential expression
– CummeRbund provides visualization tools
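The order of the steps above can be sketched as a list of command lines. This is a minimal sketch along the lines of the Trapnell et al protocol, not a complete or tuned pipeline: the helper name `tuxedo_commands`, the sample labels, and all file and directory names are hypothetical, and the commands are only built and printed, never executed. Options should be checked against each tool's own manual before use. CummeRbund runs in R and is not covered here.

```python
import shlex


def tuxedo_commands(samples, bowtie_index, genome_fasta, annotation_gtf):
    """Build (but do not run) TopHat/Cufflinks/Cuffmerge/Cuffdiff commands.

    samples maps a condition label to a list of FASTQ files, one per replicate.
    All paths are hypothetical placeholders.
    """
    commands = []
    assemblies = []          # per-sample Cufflinks GTFs, to be listed for Cuffmerge
    bams_by_condition = {}   # condition label -> list of TopHat BAM files

    for condition, fastqs in samples.items():
        for rep, fastq in enumerate(fastqs, start=1):
            th_out = f"{condition}_R{rep}_thout"
            cl_out = f"{condition}_R{rep}_clout"
            bam = f"{th_out}/accepted_hits.bam"
            # 1. TopHat maps reads, splitting them across exon-intron junctions.
            commands.append(["tophat", "-G", annotation_gtf, "-o", th_out,
                             bowtie_index, fastq])
            # 2. Cufflinks assembles transcripts from each sample's alignments.
            commands.append(["cufflinks", "-o", cl_out, bam])
            assemblies.append(f"{cl_out}/transcripts.gtf")
            bams_by_condition.setdefault(condition, []).append(bam)

    # 3. Cuffmerge merges the per-sample assemblies. It reads a text file
    #    listing one GTF per line; writing that file is left to the caller.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "assemblies.txt"])
    # 4. Cuffdiff compares conditions; replicates are comma-separated.
    commands.append(["cuffdiff", "-o", "diff_out",
                     "-L", ",".join(bams_by_condition),
                     "merged_asm/merged.gtf"]
                    + [",".join(bams) for bams in bams_by_condition.values()])
    return commands, assemblies


samples = {"C1": ["C1_R1.fq", "C1_R2.fq"], "C2": ["C2_R1.fq", "C2_R2.fq"]}
cmds, assembly_list = tuxedo_commands(samples, "genome", "genome.fa", "genes.gtf")
for cmd in cmds:
    print(shlex.join(cmd))
```

Building the commands as lists keeps the dependency between steps visible: each Cufflinks call consumes a TopHat BAM, Cuffmerge consumes all Cufflinks GTFs, and Cuffdiff consumes the merged GTF plus all BAMs.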
Why merge data across treatments?
Differential transcript abundance mechanisms
Transcriptome analysis without a reference
– First step is assembly
– Transcriptome assembly pipelines
    Velvet/Oases: Oases is a post-assembly processor for Velvet
    Trans-ABySS (BCGSC): based on the ABySS parallel assembler
    Rnnotator: based on Velvet
    Trinity (Broad Institute): a set of three programs
– Common strategy: assembly at multiple k-values, then merging of the resulting contigs, followed by refinement
– Once an assembly is available, continue with analysis as before
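The "merge contigs from multiple k-values" idea can be shown with a toy example. The sketch below is an assumption-laden simplification, not the algorithm of Oases, Trans-ABySS, Rnnotator or Trinity: it pools contigs from several assemblies and drops any contig that is contained in a longer one, on either strand. Real pipelines also handle overlaps, sequencing errors and isoforms. The function names and sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def merge_contig_sets(contig_sets):
    """Pool contigs from several assemblies and remove redundant ones.

    contig_sets maps a k-value to the list of contigs assembled at that k.
    A contig is dropped if it, or its reverse complement, is a substring
    of a contig that has already been kept.
    """
    # Pool and deduplicate, then visit longest contigs first so that
    # shorter redundant contigs are compared against the longer ones.
    pooled = sorted({c for contigs in contig_sets.values() for c in contigs},
                    key=lambda s: (-len(s), s))
    kept = []
    for contig in pooled:
        rc = reverse_complement(contig)
        if any(contig in longer or rc in longer for longer in kept):
            continue
        kept.append(contig)
    return kept


# Hypothetical toy assemblies at two k-values.
contig_sets = {
    5: ["ACGTTGCAAG", "GGCTTACCGA"],
    7: ["ACGTTGCAAGGCTTACCGA", "TCGGTAAGCC", "TTTTGGGACCCA"],
}
merged = merge_contig_sets(contig_sets)
for contig in merged:
    print(contig)
```

Here the two short contigs from the smaller k, and one reverse-strand fragment, are absorbed into the longer contig assembled at the larger k, while the unrelated contig is retained.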
After Transcriptome Assembly…
– Some amount of analysis of differential splicing versus differential promoter activity is possible, but conclusions may be less robust in the absence of a reference.
– The fraction of the total number of genes that can be discovered by RNA-seq depends on the diversity of tissue types and developmental stages analyzed, as well as the depth of sequencing.
330 million SOLiD reads from a human cell line detect only about 67% of all annotated transcripts in the human genome (Labaj et al, “Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling”, Bioinformatics 27:i383-91, 2011).
Transcriptome analysis with RSEM (RNA-Seq by Expectation Maximization)
Li & Dewey, BMC Bioinformatics 12:323, 2011
(a) Allows estimation of transcript abundance without a reference genome, based on alignments to assembled transcripts, although the transcripts can be taken from a reference genome sequence if it is available.
(b) Uses the Bowtie aligner by default, but considers reads that map to multiple locations in the reference transcript collection.
(c) For each sample, files of estimated transcript and isoform abundance are produced, along with SAM files of alignments.
(d) The files of transcript and isoform abundance can be used to evaluate differential expression using tools from R and Bioconductor.
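Point (b) is the core of the expectation-maximization idea: reads that map to several transcripts are shared out in proportion to the current abundance estimates, and the estimates are then updated from those shares. The sketch below is a deliberately simplified toy, not RSEM's actual model: it assumes all transcripts have equal length and ignores alignment quality. The function name and the read data are hypothetical.

```python
def em_abundance(read_alignments, transcripts, n_iter=200):
    """Estimate relative transcript abundance with a toy EM loop.

    read_alignments is a list with one entry per read: the set of
    transcripts that read aligns to. Returns a dict of proportions.
    """
    # Start from equal abundance for every transcript.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts,
        # weighted by the current abundance estimates.
        expected = {t: 0.0 for t in transcripts}
        for hits in read_alignments:
            denom = sum(theta[t] for t in hits)
            if denom == 0:
                continue
            for t in hits:
                expected[t] += theta[t] / denom
        # M-step: new abundance is each transcript's share of all reads.
        total = sum(expected.values())
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Hypothetical data: 6 reads unique to A, 2 unique to B, 4 mapping to both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_abundance(reads, ["A", "B"])
for name, value in theta.items():
    print(f"{name}\t{value:.3f}")
```

In this toy case the estimates settle at 0.75 for A and 0.25 for B: the four multi-mapped reads end up divided 3:1, matching the ratio supported by the uniquely mapped reads, rather than being discarded or split evenly.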