TOPHAT Next-Generation Sequencing Workshop RNA-Seq Mapping

Uploaded: 2017-08-04T09:32:03+00:00
Duration: PT9M48S
Channel: Holly May

Hanqing Zhao, Center for Bioinformatics

Goals of RNA-Seq mapping

Before TopHat: previous software for aligning RNA-Seq data relied on known splice junctions and could not identify novel ones.

TopHat: TopHat is designed to align reads from RNA-Seq experiments to a reference genome without relying on known splice sites. TopHat is free and available from

Patterns of alternative splicing (Xing et al. 2006)

TopHat pipeline (Trapnell et al. 2009)

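Step I. Mapping with Bowtie
Reads are first mapped to the genome with Bowtie. Adjustable parameters: mismatches and multireads. A read is aligned when (1) it has no more than a few mismatches (two, by default) in the 5′-most s bases of the read (the seed), and (2) its Phred-quality-weighted Hamming distance, i.e. the sum of the quality values at mismatched positions, is below a specified threshold (70 by default). TopHat allows Bowtie to report more than one alignment for a read (default = 10). Reads that cannot be aligned are set aside as initially unmapped (IUM) reads.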
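
The two acceptance rules above can be sketched as a filter on a single ungapped alignment. This is an illustration, not Bowtie's search code: the function `passes_seed_policy` and its parameter names are hypothetical, the defaults (seed length 28, at most 2 seed mismatches, quality sum 70) mirror Bowtie's -l, -n and -e options, and the quality threshold is assumed to be inclusive.

```python
def passes_seed_policy(read, ref, quals, seed_len=28, max_seed_mm=2, max_qual_sum=70):
    """Check an ungapped read-vs-reference alignment against a Bowtie -n style policy.

    Hypothetical helper for illustration. read and ref are equal-length strings;
    quals holds one Phred quality value (int) per read base.
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    mismatches = [i for i, (r, g) in enumerate(zip(read, ref)) if r != g]
    # Rule 1: at most max_seed_mm mismatches in the 5'-most seed_len bases (the seed).
    seed_mismatches = sum(1 for i in mismatches if i < seed_len)
    # Rule 2: Phred-quality-weighted Hamming distance, i.e. the sum of the qualities
    # at all mismatched positions; assumed here to be allowed to equal the threshold.
    qual_sum = sum(quals[i] for i in mismatches)
    return seed_mismatches <= max_seed_mm and qual_sum <= max_qual_sum


# Usage: one mismatch (quality 30) passes; three seed mismatches fail.
print(passes_seed_policy("ACGTACGTAC", "ACGTTCGTAC", [30] * 10))  # True
print(passes_seed_policy("ACGTACGTAC", "TCGTTCGTAG", [30] * 10))  # False
```

Reads for which no alignment passes both rules are the IUM reads used in Step IV.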

Step II. Island assembly
TopHat uses the Maq assembly module to build a pseudo-consensus of the mapped reads; the covered regions are putative exons (islands). Bases are called from the reference genome. Islands separated by small gaps (6 bp) are merged, and each island is extended by 45 bp on both sides. Adjustable parameters: consensus calling, flanking extension, gap merging.
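
The gap-merging and flank-extension steps can be sketched over a per-base coverage array. This is a simplification under stated assumptions: it skips Maq's consensus calling entirely and treats any covered base as part of an island; `build_islands` is a hypothetical name, and the 6 bp and 45 bp defaults come from the slide.

```python
def build_islands(coverage, gap_merge=6, flank=45):
    """Turn per-base coverage into islands (hypothetical, simplified).

    coverage: list of read depths, one per reference position (0-based).
    Returns half-open (start, end) intervals.
    """
    # 1. Collect maximal runs of covered positions.
    runs = []
    start = None
    for pos, depth in enumerate(coverage):
        if depth > 0 and start is None:
            start = pos
        elif depth == 0 and start is not None:
            runs.append([start, pos])
            start = None
    if start is not None:
        runs.append([start, len(coverage)])
    # 2. Merge runs separated by gaps of at most gap_merge bases.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= gap_merge:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    # 3. Extend each island by `flank` bases on both sides, clipped to the sequence.
    n = len(coverage)
    return [(max(0, s - flank), min(n, e + flank)) for s, e in merged]
```

Note that after extension, neighboring islands can overlap; this sketch leaves them as separate intervals.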

Step III. Creating candidate junction database
TopHat first enumerates all canonical donor and acceptor sites within the island sequences (as well as their reverse complements). Next, it considers all pairings of these sites that could form canonical (GT–AG) introns between neighboring (but not necessarily adjacent) islands. By default, TopHat only examines potential introns longer than 70 bp and shorter than bp.
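
The pairing step can be sketched as follows, with heavy simplifications: only the forward strand is scanned (reverse complements are ignored), the function `candidate_introns` is hypothetical, and the maximum intron length is a required argument because its default value is not given above. Junctions whose donor and acceptor fall in the same island are deliberately excluded; those are handled separately as single-island junctions.

```python
def candidate_introns(seq, islands, max_intron, min_intron=70):
    """Enumerate forward-strand GT-AG introns joining two different islands.

    seq: genome string; islands: sorted, non-overlapping half-open (start, end) intervals.
    Returns half-open (intron_start, intron_end) pairs.
    """
    def island_of(pos):
        for idx, (s, e) in enumerate(islands):
            if s <= pos < e:
                return idx
        return None

    # Donor sites: an intron starts with "GT"; acceptor sites: it ends with "AG".
    donors = [i for i in range(len(seq) - 1)
              if seq[i:i + 2] == "GT" and island_of(i) is not None]
    acceptors = [j for j in range(len(seq) - 1)
                 if seq[j:j + 2] == "AG" and island_of(j) is not None]
    result = []
    for d in donors:
        for j in acceptors:
            a = j + 2  # intron end (exclusive) is just after the "AG"
            if min_intron < a - d < max_intron and island_of(d) != island_of(j):
                result.append((d, a))
    return result
```

The double loop is quadratic in the number of sites, which is fine for a sketch but not how a production tool would index them.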

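Single-island junctions
When an intron is itself covered by reads (for example, in a highly expressed gene), the exons on either side can merge into one island, so the junction lies inside a single island. To detect such junctions without sacrificing performance and specificity, TopHat looks for introns within islands that are deeply sequenced.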

Step IV. Looking for junction reads
Each possible intron is checked against the IUM reads for reads that span the splice junction, using a seed-and-extend strategy to match reads to possible splice sites. By default, TopHat only examines the first 28 bp at the 5′ end of each read. Defaults: k = 5 bp and s = 28 bp, giving s - 2k + 1 seeds per read (19 with the defaults). Because seeds must match exactly, TopHat will miss spliced alignments for reads with mismatches in the seed region of the splice junction.
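
A simplified seed-and-extend sketch, assuming exact matching throughout: each candidate intron contributes a 2k-mer seed (k bases before the donor plus k bases after the acceptor), and the first s bases of a read are scanned in s - 2k + 1 windows of length 2k. Both function names are hypothetical, and the exact-match extension is a stand-in for TopHat's real extension step.

```python
def junction_seed_index(seq, introns, k=5):
    """Map each 2k-mer spanning a candidate junction to its (intron_start, intron_end)."""
    index = {}
    for d, a in introns:
        if d - k < 0 or a + k > len(seq):
            continue  # not enough flanking sequence for a full seed
        seed = seq[d - k:d] + seq[a:a + k]
        index.setdefault(seed, []).append((d, a))
    return index


def find_spliced_hits(read, seq, index, k=5, s=28):
    """Return (intron, junction_offset_in_read) for exact seed hits that also extend."""
    hits = []
    # Only the first s bases are scanned: s - 2k + 1 windows of length 2k.
    for i in range(min(s, len(read)) - 2 * k + 1):
        for d, a in index.get(read[i:i + 2 * k], []):
            j = i + k  # read offset where the junction falls
            left, right = read[:j], read[j:]
            # Extend: the rest of the read must match both exons exactly.
            if (d - len(left) >= 0 and seq[d - len(left):d] == left
                    and seq[a:a + len(right)] == right):
                hits.append(((d, a), j))
    return hits
```

A single mismatch inside the only window that covers the junction removes the hit entirely, which is the limitation the slide points out.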

Step V. Filtering false junctions
Wang et al. (2008) observed that 86% of the minor isoforms were expressed at no less than 15% of the level of the major isoform. For each junction, the average depth of read coverage is computed separately for the left and right flanking regions. The number of alignments crossing the junction is divided by the coverage of the more deeply covered side to estimate the minor isoform frequency. Junctions whose estimate falls below the cut-off (15% by default) are discarded as likely false.
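
The filter reduces to one division and a comparison. In this sketch the function names are hypothetical, and the width of the flanking regions is left to the caller because it is not specified above.

```python
def minor_isoform_fraction(junction_reads, left_depths, right_depths):
    """Estimate the minor isoform frequency for one junction.

    junction_reads: number of alignments crossing the junction.
    left_depths / right_depths: per-base read depths of the two flanking regions.
    """
    left_avg = sum(left_depths) / len(left_depths)
    right_avg = sum(right_depths) / len(right_depths)
    deeper = max(left_avg, right_avg)  # use the more deeply covered side
    return junction_reads / deeper if deeper > 0 else float("inf")


def keep_junction(junction_reads, left_depths, right_depths, min_fraction=0.15):
    """Keep the junction only if its estimated fraction reaches the cut-off."""
    return minor_isoform_fraction(junction_reads, left_depths, right_depths) >= min_fraction
```

For example, 10 crossing alignments next to flanks averaging 100x and 40x give 10 / 100 = 0.10, below 0.15, so the junction is dropped; 20 alignments give 0.20 and it is kept.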

Original TopHat pipeline (Trapnell et al. 2009)

Reads are becoming longer, and paired-end sequencing is increasingly common …

Current TopHat (latest: 1.3.1)
Segment search, butterfly search, closure search, coverage search, gene model annotations

I. Segment search
Options: --segment-length, --segment-mismatches, --min-segment-intron, --max-segment-intron
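
The core idea behind segment search, as described in the TopHat manual: each read is cut into segments at least --segment-length long (25 by default) and the segments are mapped independently, so a read whose segments land at nearby but non-contiguous positions points to a splice junction. The splitting rule below (the last segment absorbs the remainder) is an assumption, and `split_into_segments` is a hypothetical helper.

```python
def split_into_segments(read, segment_length=25):
    """Cut a read into consecutive segments, each at least segment_length long.

    Assumption: the last segment absorbs the remainder; reads too short to give
    two segments are returned whole.
    """
    n = len(read) // segment_length
    if n <= 1:
        return [read]
    segments = [read[i * segment_length:(i + 1) * segment_length] for i in range(n - 1)]
    segments.append(read[(n - 1) * segment_length:])
    return segments
```

With the default, a 75 bp read yields three 25 bp segments and a 60 bp read yields segments of 25 and 35 bp.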


II. Closure search
Options: --closure-search / --no-closure-search, --min-closure-intron, --max-closure-intron. Closure search is only used when TopHat is run with paired-end reads, and it should only be used when the expected inner distance between mates is small (<= 50 bp).

III. Coverage search
Options: --coverage-search / --no-coverage-search (coverage search is disabled by default for reads 75 bp or longer; --coverage-search turns it back on), --min-coverage-intron, --max-coverage-intron

IV. Butterfly search
Option: --butterfly-search, a slower but potentially more sensitive search run in addition to the standard one. Consider using it if you expect your experiment to have produced many reads from pre-mRNA that fall within the introns of your transcripts.

V. Junction annotations
Options: -G/--GTF <GTF 2 or GFF3 file>; -j/--raw-juncs <.juncs file>; --no-novel-juncs, which makes TopHat only look for reads across junctions indicated in the supplied GFF or junctions file.
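
A .juncs file for -j/--raw-juncs holds one tab-delimited junction per line: chromosome, left, right, strand, where left and right are the zero-based positions of the last base of the upstream exon and the first base of the downstream exon (as described in the TopHat manual). The helper below is hypothetical and assumes introns are given as 0-based, half-open intervals.

```python
def intron_to_juncs_line(chrom, intron_start, intron_end, strand):
    """Format one intron (0-based, half-open) as a TopHat .juncs record."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    left = intron_start - 1  # last base of the upstream exon
    right = intron_end       # first base of the downstream exon
    return f"{chrom}\t{left}\t{right}\t{strand}"


print(intron_to_juncs_line("chr1", 1000, 1500, "+"))  # chr1  999  1500  + (tab-separated)
```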

Input: a reference sequence indexed with bowtie-build, and FASTQ read files.
tophat [options]* <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
Before running, check: Quality format? Phred+33 (default), --solexa-quals, or --solexa1.3-quals. Paired-end? Strand-specific? Multiple files per sample?
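
One way to answer the quality-format question is to look at the lowest quality character in a sample of reads. This is a heuristic sketch with a hypothetical function name: characters below ';' (ASCII 59) cannot occur in Phred+64 or Solexa encodings, so they imply Phred+33 (the default); a minimum of 64 or more suggests Phred+64, which corresponds to --solexa1.3-quals, though very high-quality Phred+33 data can also look like this.

```python
def guess_quality_offset(quality_strings):
    """Guess the ASCII offset of FASTQ quality strings: 33, 64, or None if unclear."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    if lowest < 59:
        return 33    # only Phred+33 uses characters below ';'
    if lowest >= 64:
        return 64    # consistent with Phred+64 (--solexa1.3-quals)
    return None      # 59-63: Solexa-scaled or high-quality Phred+33; inspect more reads
```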

The software is optimized for reads 75 bp or longer.
Mixing paired-end and single-end reads in one run is not supported.

Strand-specific data: with --library-type, TopHat will treat the reads as strand-specific. Library types include fr-unstranded, fr-firststrand and fr-secondstrand.
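
The library type determines how a read's alignment strand relates to the transcript strand. A sketch following the TopHat manual's descriptions: for fr-firststrand only the strand made during first-strand synthesis is sequenced, so read 1 is antisense to the transcript; for fr-secondstrand read 1 has the transcript's sense. The function `transcript_strand` is hypothetical.

```python
def transcript_strand(library_type, read_strand, is_read1=True):
    """Infer the transcript strand ('+'/'-') from a read's genomic alignment strand.

    Returns None for unstranded libraries, which carry no strand information.
    """
    flip = {"+": "-", "-": "+"}
    if library_type == "fr-unstranded":
        return None
    if library_type == "fr-firststrand":
        sense = not is_read1  # read 1 is antisense, read 2 is sense
    elif library_type == "fr-secondstrand":
        sense = is_read1      # read 1 is sense, read 2 is antisense
    else:
        raise ValueError(f"unknown library type: {library_type}")
    return read_strand if sense else flip[read_strand]
```

For the fr-firststrand data in the practice session, a read 1 aligned to the + strand implies a transcript on the - strand.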

Paired-end data: -r/--mate-inner-dist <int>
This is the expected (mean) inner distance between mate pairs. --mate-std-dev <int> is the standard deviation of the distribution of inner distances between mate pairs.
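
The inner distance is the part of the fragment not covered by either mate, i.e. the fragment length minus the two read lengths. A minimal worked example (the helper name is hypothetical): 300 bp fragments sequenced 50 bp from each end give -r 200.

```python
def mate_inner_distance(fragment_length, read1_length, read2_length=None):
    """Expected inner distance between mates: fragment length minus both read lengths."""
    if read2_length is None:
        read2_length = read1_length  # same length at both ends
    return fragment_length - read1_length - read2_length


print(mate_inner_distance(300, 50))  # 200
```

A negative result means the mates overlap.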

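Other parameters
--bowtie-n (after TopHat 1.3.0): use Bowtie's -n mode instead of -v for the initial read mapping
-g/--max-multihits: allow up to this many alignments per read; reads with more are suppressed
-a/--min-anchor-length: minimum anchor length on each side of a junction (>= 3, default 8)
-m/--splice-mismatches: maximum mismatches allowed in the anchor region of a spliced alignment (default 0)
-F/--min-isoform-fraction: minimum minor isoform fraction for keeping a junction (Step V)
-p/--num-threads: number of threads to use
--keep-tmp: keep the intermediate files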
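
How -a and -m interact can be sketched as a check on one spliced alignment: it can support reporting its junction only if both anchors are long enough and the anchors carry few enough mismatches. This is a simplified reading of the two options; the function name is hypothetical, and TopHat itself may still report individual alignments with shorter anchors once the junction is supported.

```python
def supports_junction(left_overhang, right_overhang, anchor_mismatches,
                      min_anchor_length=8, splice_mismatches=0):
    """Can this spliced alignment serve as supporting evidence for its junction?"""
    if min_anchor_length < 3:
        raise ValueError("min_anchor_length must be at least 3")
    return (min(left_overhang, right_overhang) >= min_anchor_length
            and anchor_mismatches <= splice_mismatches)
```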

Output
accepted_hits.bam: the list of read alignments, in BAM (binary SAM) format.
junctions.bed, insertions.bed, deletions.bed: UCSC BED tracks of the junctions, insertions and deletions reported by TopHat.
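
Each junctions.bed record is a 12-column BED line with two connected blocks, one per anchor, and its score is the number of alignments spanning the junction. The intron therefore runs from the end of the first block to the start of the second. A parsing sketch, assuming that layout (the function name is hypothetical):

```python
def junction_from_bed(line):
    """Parse one junctions.bed record into (chrom, intron_start, intron_end, strand, reads).

    Intron coordinates are returned 0-based, half-open.
    """
    f = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = f[0], int(f[1]), f[5]
    reads = int(f[4])  # score column: alignments spanning the junction
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    starts = [int(x) for x in f[11].rstrip(",").split(",")]
    intron_start = chrom_start + sizes[0]  # end of the left anchor block
    intron_end = chrom_start + starts[1]   # start of the right anchor block
    return chrom, intron_start, intron_end, strand, reads
```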

References
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009). doi: /bioinformatics/btp120
TopHat manual
Further reading: TopHat-Fusion

Practice time
All the files are at: ngs_vm1:
Reference sequence: REF.fa
RNA-seq data 1: SampleA.Run01, SampleA.Run02; paired-end, 50 nt at each end, Phred+33 qualities
RNA-seq data 2: SampleB.Run01; 75 nt at each end, strand-specific, Solexa 1.3+ qualities (--solexa1.3-quals)

Index the genome sequence
bowtie-build REF.fa REF

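Run TopHat
tophat --version    # updates are frequent, so the version matters
tophat              # with no arguments, lists all the parameters
# Sample A: paired-end, Phred+33 (the default quality format).
# -r takes the expected mean inner distance between mates; <mean_inner_dist> is a placeholder to fill in.
tophat \
  -o sampleA.output \
  -r <mean_inner_dist> \
  --mate-std-dev 30 \
  REF \
  SampleA.Run01.1.fastq,SampleA.Run02.1.fastq \
  SampleA.Run01.2.fastq,SampleA.Run02.2.fastq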

Run TopHat
tophat \
  -o sampleB.output \
  -r 50 \
  --mate-std-dev 30 \
  --library-type fr-firststrand \
  --solexa1.3-quals \
  REF \
  SampleB.Run01.1.fastq \
  SampleB.Run01.2.fastq
