Presentation is loading. Please wait.

Presentation is loading. Please wait.

TopHat 2014.10.01 Mi-kyoung Seo. Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center.

Similar presentations


Presentation on theme: "TopHat 2014.10.01 Mi-kyoung Seo. Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center."— Presentation transcript:

1 TopHat 2014.10.01 Mi-kyoung Seo

2 Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center for Computational Biology at Johns Hopkins University

3 Pipeline for RNA-Seq TopHat FastQC Data quality control Mapping Differential expression analysis Cufflinks (Cuffdiff) FPKM for genes (transcripts) Cleaned reads Mapped reads R DEG Transcripts assembly Final transcripts assembly 8 RNA-seq 2x100bp reads Sequencing Tuxedo protocol 2012, Nature Method

4 Software information Purpose –A spliced read mapper for RNA-Seq TopHat has a number of parameters and options, and their default values are tuned for processing mammalian RNA- Seq reads Software URL –http://ccb.jhu.edu/software/tophat/index.shtmlhttp://ccb.jhu.edu/software/tophat/index.shtml Category –Aligner License –Open source and freely available under the Artistic license

5 Overview of RNA-Seq 5 Garber, M., et al. (2011), Nature Methods, 8(6), 469–477. Microarray or cDNA/EST sequencing  RNA-seq

6 QPALMA De Bona et al., 2008 Optimal spliced alignments of short sequence reads Q-PALMA is based on a machine learning approach, in which data from previously known splice junctions are used to train the software. Initial mapping phase uses Vmatch (Abouelhoda et al., 2004) Vmatch is a flexible, fast aligner, but because it is not designed to map short reads on machines with small main memories, it is substantially slower than other specialized short-read mappers. –Vmatch: 180,000 reads per CPU hour –TopHat: 2.2 million reads per CPU hour

7 The Tophat pipeline 1. Find Exons - Mapping using Bowtie  Mapped & IUM reads - Assemble exons using MAQ  Putative exons (islands) - Extend exons by 45bp - Gap 2. Find Splice Junctions

8 Mapping: Bowtie  TopHat uses Bowtie in order to initially map all reads to the reference genome while collecting all the unmapped reads for further analysis  Bowtie is reporting reads with no more than a few mismatches (default: 2) within s bp from the 5' end and the 3' end may have errors based on Phred-Quality-Weighted Hamming Distance (s default: 28)  TopHat allows reads from bowtie that map up to 10 locations (multireads) read s bp 5'3'

9 Assembly: MAQ Exon 2Exon 3 Exon 1 read Exon 2 Exon 1 Transcript 1 Gene (DNA or reference) read GU AG read island consensus read Putative exons IUM reads

10 Identification of read spanning junction Exon 2Exon 3 Exon 1 read GTAG 70 < potential Intron length < 20,000bp IUM reads

11 Identification of read spanning junction ERAGNE (annotation based pipeline) (Mortazavi et al., 2008) Each island spanning coordinates i to j dm: depth of coverage at coordinate m n: the length of the reference genome B3gat1 gene

12 Seed and extend strategy  To find reads that span junction 2K-mer seed

13 Running Operations Input –For Paired-end read, Read1.fastq, Read2.fastq –genome (Genome index by bowtie-build) –Option: genes.gtf (Gene annotation) Command –Usage: tophat [options]* [reads1_2,...readsN_2] –tophat –p 4 –o output/Normal –G genes.gtf BowtieIndex/genome read1.fastq read2.fastq --library-type fr-unstranded TopHat Output –accepted_hits.bam –unmapped.bam –junctions.bed –insertion.bed –deletion.bed Usage: tophat [options]* [reads1_2,...readsN_2] $ tophat –p 4 –o output/Normal –r 100 –G genes.gtf BowtieIndex/genome read1.fastq read2.fastq

14 Input Parameters $ tophat [optioins] -o: output dir -p: use this many threads to align reads [default: 1] -r: this is the expected (mean) inner distance between mate pairs [default: 50bp] –For, example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. The default is 50bp -G: geneset $ tophat –p 4 –o output/Normal –r 100 –G genes.gtf BowtieIndex/genome read1.fastq read2.fastq

15 Tophat output by IGV

16 Discussion Tophat2 default option 확인 –r: 실험에 따라 다름 –Bowtie2 for mapping


Download ppt "TopHat 2014.10.01 Mi-kyoung Seo. Today’s paper..TopHat Cole Trapnell at the University of Washington's Department of Genome Sciences Steven Salzberg Center."

Similar presentations


Ads by Google