RNA-Seq Assembly (Transcriptome Assembly). Haibao Tang (唐海宝), Center for Genomics and Biotechnology. November 23, 2013.

Motivation
Why transcriptome sequencing (RNA-seq)?
- Gene expression / differential expression
- Reconstruct transcripts
- Exon-exon junction detection (genome annotation)
- Alternative splicing / isoforms
- SNP detection
- ...

RNA-seq with Illumina Wang L, Brutnell T et al. (2010) Brief Funct Genomic Proteomic 9:

Constructing transcripts from RNA-Seq data Haas & Zody,

Why de novo assembly?
- No reference genome available for the species
- Genomic sequence may be:
  - Incomplete (even reference genomes!)
  - Fragmented
  - Altered

Run FastQC first!
- Quality trimming, based on quality scores

Base quality

Data Quality Assessment - quality scores across reads: good vs. bad (filtering needed)

Data Quality Assessment - FastQC GC distribution: good vs. bad

Adapter contamination

Data Quality Assessment - Recommendations
- Generate quality plots for all read libraries
- Trim and/or filter data if needed
  - Always trim and filter for de novo transcriptome assembly
- Regenerate quality plots after trimming and filtering to determine effectiveness

Trimmomatic example

$ java -cp trimmomatic-0.15.jar org.usadellab.trimmomatic.TrimmomaticPE \
    s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz \
    lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz \
    lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz \
    ILLUMINACLIP:adapters.fasta:2:40:15 LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

This will perform the following:
- Remove adapters
- Remove leading low-quality or N bases (below quality 3)
- Remove trailing low-quality or N bases (below quality 3)
- Scan the read with a 4-base-wide sliding window, cutting when the average quality per base drops below 15
- Drop reads shorter than 36 bases
- Read and write files in gzipped format
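To make the SLIDINGWINDOW:4:15 and MINLEN:36 steps concrete, here is a minimal Python sketch of the idea. It is a simplification for illustration, not Trimmomatic's actual implementation: the function names are hypothetical, and the quality string is assumed to use the Phred+33 encoding.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.
    Assumption: Phred+33 encoding (older Illumina files use offset 64)."""
    return [ord(ch) - offset for ch in quality_string]


def sliding_window_trim(seq, quals, window=4, min_avg=15):
    """Scan from the 5' end; at the first window whose mean quality
    falls below min_avg, cut the read at the start of that window."""
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_avg:
            return seq[:start], quals[:start]
    return seq, quals


def passes_minlen(seq, min_len=36):
    """Keep only reads that are still at least min_len bases long."""
    return len(seq) >= min_len


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    quals = phred_scores("??????####")  # six bases at Q30, four at Q2
    trimmed_seq, trimmed_quals = sliding_window_trim(seq, quals)
    print(trimmed_seq, passes_minlen(trimmed_seq))
```

In this toy read the cut happens one base before the low-quality run begins, because the first window that fails the average already starts there. The read is then far shorter than 36 bases, so the MINLEN step would drop it.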

Workflow

Assembly strategy - k-mer construction
- Create all substrings of length k from the reads
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12,
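The k-mer construction step can be sketched in a few lines of Python. This is only an illustration of "all substrings of length k"; the function names and the toy reads are made up for the example, and real assemblers use far more compact data structures.

```python
from collections import Counter


def kmers(read, k):
    """Return every substring of length k, in order of position.
    A read of length L yields L - k + 1 k-mers (none if L < k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


if __name__ == "__main__":
    print(kmers("ACGTAC", 4))
    print(count_kmers(["ACGTAC", "CGTACG"], 4))
```

K-mers shared between overlapping reads (here CGTA and GTAC) are what later allow the reads to be joined in the graph.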

Assembly strategy - k-mer construction
- Generate de Bruijn graph
Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12,
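A minimal sketch of building a de Bruijn graph from reads, under one common convention that is an assumption here: nodes are (k-1)-mers and each k-mer contributes an edge from its prefix to its suffix. The function names are hypothetical, and the walk below only follows nodes with a single successor; it does no error correction or branch resolution.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes
    that follow it in at least one k-mer of the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from start while the current node has exactly
    one successor; stop at a branch, a dead end, or a revisited node."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


if __name__ == "__main__":
    g = build_de_bruijn(["ACGTC", "GTCAG"], k=4)
    print(walk_unbranched(g, "ACG"))
```

The two toy reads overlap by "GTC", so the walk reconstructs a single sequence spanning both. A splice isoform or sequencing error would show up as a node with more than one successor, which is where this simple walk stops.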

Assembly strategy - de Bruijn-graph Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12,

Assembly strategy - splice isoforms Martin, J. A., and Wang, Z. (2011). Nature Reviews Genetics 12,

Software
An (incomplete) selection:
- Trinity (single k-mer): Broad Institute and Hebrew University of Jerusalem
- Trans-ABySS (multiple k-mers): Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency
- Velvet-Oases (single & multiple k-mers): EMBL-EBI / MPI for Molecular Genetics
- SOAPdenovo-Trans: Beijing Genomics Institute (BGI, 华大)
- CLC pipeline (not free): CLC bio

Example assembly Zhao et al. (2011). BMC Bioinformatics 12 Suppl 1, S2.

Runtime and memory usage Zhao et al. (2011). BMC Bioinformatics 12 Suppl 1, S M read pairs, on 20 CPUs

Assemblers
- De novo assemblers are prone to miss lowly expressed transcripts
- Multi k-mer approaches can improve assembly results
- Pool RNA-seq reads from different samples

Assembler overview (Assembler: Running time / Memory requirements)
- Trinity: +++
- Velvet-Oases: ○ / ++
- Trans-ABySS: ○ / -
- SOAPdenovo: - / ○

Trinity

- Inchworm: sequencing error correction; assemble candidate contigs
- Chrysalis: build de Bruijn transcript graphs from candidate contigs
- Butterfly: resolve alternatively spliced and paralogous transcripts
- k-mer length fixed to 25
- Can incorporate results from reference-based analysis

Assembly QC - contiguity
- Average length, min and max length, combined total length (N%)
- N50 captures how much of the assembly is covered by relatively large contigs
- "Half of all the bases I've assembled are in contigs of at least {N50} bp ..."
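The contiguity metrics above can be computed directly from the list of contig lengths. The sketch below uses the usual definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembled bases); the function names and example lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths, or 0 if empty.
    Contigs are added longest-first until half the bases are covered."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def contiguity_summary(lengths):
    """Basic length statistics for an assembly (assumes a non-empty list)."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    print(contiguity_summary([100, 200, 300, 400]))
```

For transcriptome assemblies N50 should be read with care: unlike a genome, transcripts have a natural length distribution, so a larger N50 is not automatically a better assembly.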

Assembly QC
Ask these questions:
- Accuracy: How many of the assembled contigs map to the genome?
- Accuracy: What are the contigs that do not align? (BLAST to nr)
- Completeness: How many previously annotated genes are covered by the contigs? How many are full-length?
- Contiguity: Does a single contig cover each gene?
- Compare results from multiple programs
Martin et al., 2010

Exercise
- Assemble rice RNA-seq data
- Compare two different assemblers
- Compare to standard rice gene models (MSU7.0)