7 Step I: mapping with Bowtie
Adjustable parameters: -mismatches, -multireads
Bowtie reports an alignment when both conditions hold:
  No more than a few mismatches (two, by default) in the 5'-most s bases of the read.
  The Phred-quality-weighted Hamming distance is less than a specified threshold (70 by default).
TopHat allows Bowtie to report more than one alignment for a read (default = 10).
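The two acceptance conditions on this slide can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not Bowtie's implementation: the function name and arguments are hypothetical, the alignment is assumed to be ungapped, and the seed length of 28 is taken from the s = 28 default quoted on the Step IV slide.

```python
def passes_bowtie_policy(read, ref, quals, seed_len=28,
                         max_seed_mismatches=2, max_weighted_dist=70):
    """Return True if an ungapped alignment of `read` to `ref` meets both rules.

    read, ref : equal-length strings (read bases and the reference bases under them)
    quals     : Phred quality score of every read base (list of ints)
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")

    # Positions where the read disagrees with the reference.
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]

    # Rule 1: limit the mismatches inside the 5'-most `seed_len` bases.
    seed_mismatches = sum(1 for i in mismatches if i < seed_len)

    # Rule 2: the quality-weighted Hamming distance is the sum of the
    # Phred scores at all mismatched positions.
    weighted_distance = sum(quals[i] for i in mismatches)

    return (seed_mismatches <= max_seed_mismatches
            and weighted_distance < max_weighted_dist)


if __name__ == "__main__":
    read = "ACGTACGTAC"
    ref = "TCGTACGTAA"  # differs from the read at positions 0 and 9
    print(passes_bowtie_policy(read, ref, [30] * 10))  # two mismatches, weight 60
    print(passes_bowtie_policy(read, ref, [40] * 10))  # two mismatches, weight 80
```

With qualities of 30 the two mismatches add up to 60 and the alignment is kept; with qualities of 40 they add up to 80 and it is rejected.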
8 Step II. Island assembly
Use the Maq assembly module to produce pseudo-consensus exons (islands).
Use the reference genome to call bases.
Merge islands separated by short gaps (6 bp).
Extend each island by 45 bp on both sides.
Adjustable parameters: -consensus call, -flanking extension, -gap merge
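The gap-merge and flanking-extension steps are plain interval arithmetic. A minimal sketch, assuming islands are given as half-open (start, end) coordinates on one chromosome and that a gap of exactly 6 bp is still merged (the slide does not say whether the limit is inclusive). The function name is hypothetical and this is not the Maq or TopHat code.

```python
def merge_and_extend(islands, max_gap=6, flank=45, chrom_length=None):
    """Merge islands separated by at most `max_gap` bp, then pad each by `flank` bp.

    islands      : iterable of (start, end) half-open intervals
    chrom_length : optional upper bound used to clip the right flank
    """
    merged = []
    for start, end in sorted(islands):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous island: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    extended = []
    for start, end in merged:
        new_start = max(0, start - flank)  # never run off the left end
        new_end = end + flank
        if chrom_length is not None:
            new_end = min(chrom_length, new_end)
        extended.append((new_start, new_end))
    return extended


if __name__ == "__main__":
    islands = [(100, 150), (154, 200), (400, 450)]
    print(merge_and_extend(islands))  # [(55, 245), (355, 495)]
```

The first two islands are 4 bp apart, so they merge into (100, 200) before both remaining islands are padded by 45 bp. The sketch does not re-merge islands that start to overlap after padding.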
9 Step III. Creating the candidate junction database
TopHat first enumerates all canonical donor and acceptor sites within the island sequences (as well as their reverse complements).
Next, it considers all pairings of these sites that could form canonical (GT–AG) introns between neighboring (but not necessarily adjacent) islands.
By default, TopHat only examines potential introns longer than 70 bp and shorter than an upper length limit.
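The enumeration can be pictured with a small forward-strand sketch. It assumes each island is given as (genomic start, sequence), pairs every GT in one island with every AG in a later island, and keeps pairs whose intron length lies strictly between the two bounds. The function names are hypothetical, the reverse complement is left out, and the upper bound is a required argument because its default is not given here.

```python
def find_sites(seq, motif):
    """Return the offsets at which the 2-base `motif` starts in `seq`."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == motif]


def candidate_introns(islands, min_intron, max_intron):
    """Enumerate forward-strand GT..AG candidate introns between islands.

    islands : list of (genomic_start, sequence)
    Returns sorted (intron_start, intron_end) half-open genomic intervals,
    where the intron starts at the G of GT and ends just after the G of AG.
    """
    donors, acceptors = [], []
    for idx, (start, seq) in enumerate(sorted(islands)):
        donors += [(start + i, idx) for i in find_sites(seq, "GT")]
        acceptors += [(start + i + 2, idx) for i in find_sites(seq, "AG")]

    introns = []
    for donor_pos, donor_island in donors:
        for acceptor_end, acceptor_island in acceptors:
            length = acceptor_end - donor_pos
            # Only pair sites from different islands, donor island first.
            if acceptor_island > donor_island and min_intron < length < max_intron:
                introns.append((donor_pos, acceptor_end))
    return sorted(introns)


if __name__ == "__main__":
    islands = [(0, "AAGTAA"), (100, "CCAGCC")]
    # 1000 is an arbitrary illustrative upper bound, not TopHat's default.
    print(candidate_introns(islands, min_intron=70, max_intron=1000))  # [(2, 104)]
```

The quadratic pairing is fine for a toy example; a real implementation would restrict the search to nearby islands.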
10 Single island junctions
Some junctions have both splice sites inside a single island.
To detect such junctions without sacrificing performance and specificity, TopHat looks for introns within islands that are deeply sequenced.
11 Step IV. Looking for junction reads
Each possible intron is checked against the IUM (initially unmapped) reads for reads that span the splice junction.
A seed-and-extend strategy is used to match reads to possible splice sites.
By default, TopHat only examines the first 28 bp on the 5' end of each read.
Defaults: k = 5 bp, s = 28 bp, giving s - 2k + 1 = 19 seeds per read.
TopHat will miss spliced alignments to reads with mismatches in the seed region of the splice junction.
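To make the s - 2k + 1 count concrete, here is a sketch of a seed lookup table: every 2k-mer in the first s bases of a read is indexed, and a candidate junction is looked up by the 2k-mer made of the k bases on either side of it. The function names and the dictionary layout are assumptions for illustration and are not taken from TopHat's source.

```python
from collections import defaultdict


def seed_count(s=28, k=5):
    """Number of 2k-mer seeds in the first s bases of a read."""
    return s - 2 * k + 1


def build_seed_table(reads, s=28, k=5):
    """Index every 2k-mer in the 5'-most s bases of each read.

    reads : dict mapping read id -> sequence
    Returns a dict mapping 2k-mer -> list of (read id, offset in read).
    """
    table = defaultdict(list)
    for read_id, seq in reads.items():
        prefix = seq[:s]
        for offset in range(len(prefix) - 2 * k + 1):
            table[prefix[offset:offset + 2 * k]].append((read_id, offset))
    return table


def junction_seed(upstream_exon, downstream_exon, k=5):
    """2k-mer spanning a junction: last k exon bases + first k exon bases."""
    return upstream_exon[-k:] + downstream_exon[:k]


if __name__ == "__main__":
    read = "A" * 10 + "CCCCC" + "GGGGG" + "T" * 8  # 28 bases
    table = build_seed_table({"r1": read})
    seed = junction_seed("TTTTCCCCC", "GGGGGAAAA")
    print(seed_count())  # 19
    print(table[seed])   # [('r1', 10)]
```

Because the lookup is an exact match on the 2k-mer, a mismatch inside the seed means the read is never found, which is the limitation noted on the slide.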
12 Step V. Filtering false junctions
Wang et al. (2008) observed that 86% of the minor isoforms were expressed at at least 15% of the level of the major isoform.
For each junction, the average depth of read coverage is computed separately for the left and right flanking regions of the junction.
The number of alignments crossing the junction is divided by the coverage of the more deeply covered side to obtain an estimate of the minor isoform frequency.
15% is the default cut-off.
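The filter reduces to one division and one comparison. A minimal sketch with hypothetical function names, assuming the flanking coverages have already been averaged and that a junction exactly at the cut-off is kept (the slide does not say which way the boundary falls):

```python
def minor_isoform_fraction(junction_alignments, left_coverage, right_coverage):
    """Junction-spanning alignments divided by the deeper flanking coverage."""
    deeper = max(left_coverage, right_coverage)
    if deeper <= 0:
        return 0.0  # no flanking coverage, so nothing to support the junction
    return junction_alignments / deeper


def keep_junction(junction_alignments, left_coverage, right_coverage,
                  min_fraction=0.15):
    """Apply the cut-off (15% by default) to the estimated fraction."""
    fraction = minor_isoform_fraction(junction_alignments,
                                      left_coverage, right_coverage)
    return fraction >= min_fraction


if __name__ == "__main__":
    # 3 spanning alignments against flanks covered at 10x and 40x: 3 / 40 = 7.5%
    print(keep_junction(3, 10.0, 40.0))   # False
    # 12 spanning alignments against the same flanks: 12 / 40 = 30%
    print(keep_junction(12, 10.0, 40.0))  # True
```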
18 II. Closure search
Options: --closure-search, --no-closure-search, --min-closure-intron, --max-closure-intron
Closure search is only used when TopHat is run with paired-end reads.
Closure search should only be used when the expected inner distance between mates is small (<= 50 bp).
19 III. Coverage search
--coverage-search (coverage search is disabled by default for reads 75 bp or longer)
--no-coverage-search
--min-coverage-intron
--max-coverage-intron
20 IV. Butterfly search
--butterfly-search
Consider using this if you expect that your experiment produced a lot of reads from pre-mRNA that fall within the introns of your transcripts.
21 V. Junction annotations
-G/--GTF <GTF 2 or GFF3>
-j/--raw-juncs <.juncs file>
--no-novel-juncs: only look for reads across junctions indicated in the supplied GFF or junctions file.
22 Input
Usage: tophat [options]* <index_base> <reads1_1[,...,readsN_1]> [reads1_2,...readsN_2]
Bowtie index of the reference sequence (<index_base>)
Fastq sequences
Quality format? phred33 (default), --solexa-quals, --solexa1.3-quals
Paired-end?
Strand-specific?
Multiple files?
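Choosing the wrong quality flag silently shifts every score, so it helps to see what each encoding means. The sketch below decodes a FASTQ quality string under the three encodings as they are commonly described: Phred+33 (the default), Phred+64 for Illumina pipeline 1.3 and later (--solexa1.3-quals), and Solexa-scaled +64 for older data (--solexa-quals). The function and the `fmt` labels are hypothetical, and the mapping of flags to encodings should be checked against the manual of the TopHat version in use.

```python
import math


def decode_quals(qual_string, fmt="phred33"):
    """Convert a FASTQ quality string to a list of Phred scores.

    fmt is a label used only in this sketch:
      "phred33"   - ASCII offset 33, Phred scale
      "solexa1.3" - ASCII offset 64, Phred scale
      "solexa"    - ASCII offset 64, Solexa scale (converted to Phred)
    """
    if fmt == "phred33":
        return [ord(c) - 33 for c in qual_string]
    if fmt == "solexa1.3":
        return [ord(c) - 64 for c in qual_string]
    if fmt == "solexa":
        # Solexa score Qs maps to Phred via Q = 10 * log10(10^(Qs/10) + 1).
        return [round(10 * math.log10(10 ** ((ord(c) - 64) / 10) + 1))
                for c in qual_string]
    raise ValueError(f"unknown quality format: {fmt}")


if __name__ == "__main__":
    print(decode_quals("I!"))              # [40, 0]
    print(decode_quals("h", "solexa1.3"))  # [40]
```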
23 The software is optimized for reads 75 bp or longer. Mixing paired-end and single-end reads together is not supported.
24 Strand-specific data
--library-type: TopHat will treat the reads as strand specific.
25 Paired-end data
-r/--mate-inner-dist <int>: the expected (mean) inner distance between mate pairs.
--mate-std-dev <int>: the standard deviation for the distribution on inner distances between mate pairs.
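The inner distance is the fragment length minus both read lengths, so a 300 bp fragment sequenced with 50 bp at each end gives 300 - 2 × 50 = 200. A small sketch, with hypothetical function names, that also derives both option values from a list of fragment lengths (for example, from a library size trace); it assumes both mates have the same length.

```python
import statistics


def mate_inner_distance(fragment_length, read_length):
    """Inner distance = fragment length minus the two mates."""
    return fragment_length - 2 * read_length


def inner_distance_options(fragment_lengths, read_length):
    """Return (mean, standard deviation) of the inner distance, rounded to ints.

    Needs at least two fragment lengths to estimate a standard deviation.
    """
    inner = [mate_inner_distance(f, read_length) for f in fragment_lengths]
    return round(statistics.mean(inner)), round(statistics.stdev(inner))


if __name__ == "__main__":
    print(mate_inner_distance(300, 50))                  # 200
    print(inner_distance_options([280, 300, 320], 50))   # (200, 20)
```

The result can be negative when the mates overlap, that is, when the fragment is shorter than the two reads combined.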
27 Output
accepted_hits.bam: a list of read alignments in BAM (binary SAM) format.
junctions.bed
insertions.bed
deletions.bed
28 References
Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. doi: /bioinformatics/btp120
TopHat manual
Further reading: TopHat-Fusion
29 Practice time
All the files are at: ngs_vm1:
Reference sequence: REF.fa
RNA-seq data 1: SampleA.Run01, SampleA.Run02
  paired-end
  50 nt at each end
  Phred33 quality
RNA-seq data 2: SampleB.Run01
  75 nt at each end
  strand-specific
  solexa1.3-quals
30 Index the genome sequence
  bowtie-build REF.fa REF
31 Run tophat
  tophat --version   # updates are frequent, so the version matters
  tophat             # run without arguments to go through all the parameters
  tophat \
    -o sampleA.output \
    -r <int> \
    --mate-std-dev 30 \
    REF \
    SampleA.Run01.1.fastq,SampleA.Run02.1.fastq \
    SampleA.Run01.2.fastq,SampleA.Run02.2.fastq
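When several runs belong to one sample, the two comma-separated read lists must name the runs in the same order. A hypothetical helper that assembles the argument list for the command on this slide might look as follows. It assumes the `<run>.1.fastq` / `<run>.2.fastq` naming used above, only builds the list (it does not run TopHat), and leaves the inner distance to the caller because no value is given here.

```python
import shlex


def build_tophat_command(index_base, out_dir, runs, inner_dist, std_dev):
    """Build the tophat argument list for paired-end runs of one sample."""
    # Same run order in both lists, so mates stay paired file by file.
    left = ",".join(f"{run}.1.fastq" for run in runs)
    right = ",".join(f"{run}.2.fastq" for run in runs)
    return [
        "tophat",
        "-o", out_dir,
        "-r", str(inner_dist),
        "--mate-std-dev", str(std_dev),
        index_base,
        left,
        right,
    ]


if __name__ == "__main__":
    cmd = build_tophat_command(
        index_base="REF",
        out_dir="sampleA.output",
        runs=["SampleA.Run01", "SampleA.Run02"],
        inner_dist=150,  # arbitrary placeholder, not the value for this data set
        std_dev=30,
    )
    print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the inner distance has been set for the library in question.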