MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads 2009-09-10 Hua Bao Sun Yat-sen University, Guangzhou,


1 MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads 2009-09-10 Hua Bao Sun Yat-sen University, Guangzhou, China Evolution.sysu.edu.cn InCoB 2009

2 Next-generation sequencing
- High throughput (tens of millions of reads per lane)
- Short read length (25-50 bp)
- Sequencing error rate higher than that of Sanger sequencing
- Applications: genome sequencing, transcriptome sequencing, pooled population sequencing

3 The objective
1. Unspliced alignment of reads onto the genome
2. Spliced alignment of transcript reads across exon-intron boundaries
3. SNP detection from population sequences

4 Seed hash table
Reads:
Read 1  TACACCACGGTCAGACTTGCATCACAACTGTTAAGC
Read 2  AGACTTGCATCACAACTGTTAAGCTACACCACGGTC
Read n  …
Seed hash table:
TACACCACGGTC  Position 1, Read 1, +; Position 25, Read 2, +; …
AGACTTGCATCA  Position 13, Read 1, +; Position 1, Read 2, +; …
TGATGCAAGTCT  Position 25, Read 1, -; Position 13, Read 2, -; …
GACCGTGGTGTA  Position 1, Read 1, -; Position 25, Read 2, -; …
Other seeds (k-mers)  …
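The table above can be sketched in Python as a dictionary from seed to a list of (read id, position, strand) entries. This is only an illustration, not MapNext's code: the function names, the non-overlapping seed step, and the convention that a minus-strand position is counted along the reverse-complemented read are all assumptions, so the minus-strand positions it produces need not match those on the slide.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def build_seed_table(reads, k=12):
    """Map each k-mer seed to a list of (read id, 1-based position, strand)."""
    table = defaultdict(list)
    for read_id, read in enumerate(reads, start=1):
        for strand, seq in (("+", read), ("-", reverse_complement(read))):
            # Assumption: non-overlapping seeds (positions 1, 13, 25 for k=12).
            for start in range(0, len(seq) - k + 1, k):
                table[seq[start:start + k]].append((read_id, start + 1, strand))
    return table

reads = [
    "TACACCACGGTCAGACTTGCATCACAACTGTTAAGC",
    "AGACTTGCATCACAACTGTTAAGCTACACCACGGTC",
]
table = build_seed_table(reads)
print(table["TACACCACGGTC"])  # [(1, 1, '+'), (2, 25, '+')]
```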

5 Seed hash table: key computation of the seed
Coding: A = 0, T = 1, G = 2, C = 3
k-mer: CCGATT
key = 3*4^5 + 3*4^4 + 2*4^3 + 0*4^2 + 1*4^1 + 1*4^0
Seed hash table: slots [0] … [n], each holding (read id, position, strand) entries, e.g. (1,1,+) (2,13,-) …; the slot is selected by the key (Key = n)
Reads: slots [0] … [n], each holding a read sequence, e.g. [1] CCGATTGGCTAAA …
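As a small worked example of the key computation, the following Python sketch reads a k-mer as a base-4 number using the slide's coding (A = 0, T = 1, G = 2, C = 3). The function name `kmer_key` is hypothetical, and the handling of other characters (it simply raises `KeyError`) is an assumption.

```python
# Base coding taken from the slide: A=0, T=1, G=2, C=3.
BASE_CODE = {"A": 0, "T": 1, "G": 2, "C": 3}

def kmer_key(kmer):
    """Read a k-mer as a base-4 number, leftmost base most significant."""
    key = 0
    for base in kmer:
        # Horner's rule: shift the running key by one base-4 digit, add the new one.
        key = key * 4 + BASE_CODE[base]
    return key

# 3*4**5 + 3*4**4 + 2*4**3 + 0*4**2 + 1*4**1 + 1*4**0
print(kmer_key("CCGATT"))  # 3973
```

Because every k-mer of length k maps to a distinct integer in the range 0 to 4^k - 1, the key can index an array directly, which is what makes the lookup constant-time.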

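6 Unspliced alignment
Genome: TACACCACGGTCAGACTTGCATCA …
K-mer: 8-12 bp; step size: 1 bp
Seed hash table: slots [0] … [n] of (read id, position, strand); a genome k-mer with Key = 3 selects slot [3], e.g. (1,1,+) (2,13,-) …, in O(1)
Reads: slots [0] … [n] of read sequences; the reads named in the selected slot are retrieved in O(1) for extension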
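A minimal seed-and-extend sketch of this lookup in Python is given below. It is not MapNext's implementation: it uses a plain dictionary keyed by the k-mer string, indexes every k-mer of every read, and "extends" by counting mismatches over the whole read without gaps. The function names and the `max_mismatches` parameter are assumptions made for illustration.

```python
from collections import defaultdict

def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def align_unspliced(genome, reads, k=8, max_mismatches=2):
    """Return sorted (read id, 1-based genome start, strand, mismatches)."""
    # Index every k-mer of every read (both strands) by its offset in the read.
    seeds = defaultdict(list)
    for rid, read in enumerate(reads, start=1):
        for strand, seq in (("+", read), ("-", revcomp(read))):
            for off in range(len(seq) - k + 1):
                seeds[seq[off:off + k]].append((rid, off, strand, seq))
    hits = set()
    # Slide a window over the genome 1 bp at a time; each lookup is a hash probe.
    for gpos in range(len(genome) - k + 1):
        for rid, off, strand, seq in seeds.get(genome[gpos:gpos + k], ()):
            start = gpos - off
            if start < 0 or start + len(seq) > len(genome):
                continue
            # Extension: compare the whole read with the genome window.
            window = genome[start:start + len(seq)]
            mismatches = sum(a != b for a, b in zip(seq, window))
            if mismatches <= max_mismatches:
                hits.add((rid, start + 1, strand, mismatches))
    return sorted(hits)

genome = "GATTACA" + "TACACCACGGTCAGACTTGCATCA" + "CCGGTTAA"
reads = [
    "ACCACGGTCAGACTTG",  # exact forward-strand match
    "TGATGCAAGTCTGAGC",  # reverse-strand read carrying one mismatch
]
for hit in align_unspliced(genome, reads):
    print(hit)
```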

7 Spliced alignment
Genome: TACACCACGGTCAGACTTGCATCA …
K-mer: 6-10 bp; step size: 1 bp
Hash table: slots [0] … [n] of (read id, position, strand); Key = 2 selects slot [2], e.g. (1,H,+) (2,T,-) …, in O(1)
Seed hit list: slots of (genome position, read position, strand); [1] (1,H,+) (780,T,+) …; [2] (1,T,-) …
Reads: slots [0] … [n] of read sequences; [1] TACACCACG …
Example:
Read:    TACACCACGGTCAGA GTGCCATGGCTAGT
Genome:  TACACCACGGTCAGA gt ac … cc ag GTGCCATGGCTAGT
Genome positions of the two seed hits: 1 and 780
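The example above, a read whose two parts flank a gt … ag intron, can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not MapNext's algorithm: it anchors the first and last k bases of the read by plain string search, handles the forward strand only, requires exact matches, and accepts only introns that start with GT and end with AG. The names `find_all`, `align_spliced` and `max_intron` are hypothetical, and the 28 bp intron is a made-up stand-in for the longer one on the slide.

```python
def find_all(text, pattern):
    """Yield every 0-based start of pattern in text (overlaps allowed)."""
    pos = text.find(pattern)
    while pos != -1:
        yield pos
        pos = text.find(pattern, pos + 1)

def align_spliced(genome, read, k=6, max_intron=10000):
    """Return (read start, split, intron start, intron end) tuples, 0-based,
    with the intron end exclusive."""
    n = len(read)
    results = []
    for head in find_all(genome, read[:k]):       # hits of the head seed
        for tail in find_all(genome, read[-k:]):  # hits of the tail seed
            read_end = tail + k
            intron_len = read_end - head - n
            if not 4 <= intron_len <= max_intron:
                continue
            # Try every split point that keeps both seeds inside an exon.
            for split in range(k, n - k + 1):
                intron_start = head + split
                intron_end = intron_start + intron_len
                if (genome[head:intron_start] == read[:split]
                        and genome[intron_end:read_end] == read[split:]
                        and genome[intron_start:intron_start + 2] == "GT"
                        and genome[intron_end - 2:intron_end] == "AG"):
                    results.append((head, split, intron_start, intron_end))
    return results

exon1, exon2 = "TACACCACGGTCAGA", "GTGCCATGGCTAGT"
genome = exon1 + "GTAC" + "T" * 20 + "CCAG" + exon2
print(align_spliced(genome, exon1 + exon2))  # [(0, 15, 15, 43)]
```

Checking the GT/AG boundary resolves the ambiguity that arises when the first bases of the second exon also match the start of the intron, as they do here.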

8 Accuracy of alignment
The query dataset consisted of 1,893,118 simulated reads (35 bp long; 134,274 spliced and 1,758,844 unspliced) from 5,796 coding DNA sequences of chromosome I of Arabidopsis thaliana.

            Unspliced alignment             Spliced alignment
Program     TP (%)   FP (%)   Time (s)      TP (%)   FP (%)   Time (m)
SHRiMP      94.79    8.97     809           N/A      N/A      N/A
SeqMap      96.50    6.71     447           N/A      N/A      N/A
SOAP        96.41    6.72     101           N/A      N/A      N/A
MAQ         96.53    6.73     138           N/A      N/A      N/A
Qpalma      N/A      N/A      N/A           84.17    4.45     557
MapNext     96.51    6.71     209           86.89    4.31     231

TP: true positive; FP: false positive; Time: running time.

9 SNP detection from population sequences
… TACACACGGTCAGACTAGCATCAGTCCGTAATGCT …
      CACGGTCAGACGAGCATCAGTCC
    CACACGGTCAGACGAGCATCAGT
         GGTCAGACGAGCATCAGTCCGTA
            CAGACTAGCATCAGTCCGTAATG
    CACACGGTCAGACTAGCATCAGT
         GGTCAGACTAGCATCAGACCGTA
         GGTCAGACTAGCATCAGTCCGTA
        CGGTCAGACTAGCATCAGTCCG
Quality control: minimum quality score (MQS), minimum neighbour quality score (MNQS)
Significance control: minimum coverage (MC), minimum minor allele frequency (MMAF)

10 SNP detection from population sequences
Flowchart: clustered short reads pass through three Y/N checks; only a Y at every check leads to a candidate SNP.
1. Did the reads pass QC?
2. Are the polymorphic sites covered by at least MC reads?
3. Is the frequency of the minor allele higher than MMAF?
Result: candidate SNPs
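The three checks can be sketched in Python as follows. This is an illustration under assumptions rather than MapNext's code: reads are taken to be already placed on the reference at known 0-based offsets, the neighbour quality check looks at `flank` bases on each side (the window size is not given on the slides), and the function name and defaults are hypothetical. The reads are the eight from the previous slide, with offsets found by matching them to the reference, uniform made-up qualities of 30, and a low quality assigned to the mismatching base in the sixth read.

```python
from collections import Counter

def call_snps(reads, mqs=25, mnqs=20, mc=4, mmaf=0.01, flank=5):
    """reads: list of (0-based start, sequence, list of base qualities).
    Returns {position: (major allele, minor allele, minor allele frequency)}."""
    pileup = {}
    for start, seq, quals in reads:
        for i, (base, q) in enumerate(zip(seq, quals)):
            neighbours = quals[max(0, i - flank):i] + quals[i + 1:i + 1 + flank]
            # Check 1 (QC): the base meets MQS and its neighbours meet MNQS.
            if q >= mqs and all(nq >= mnqs for nq in neighbours):
                pileup.setdefault(start + i, Counter())[base] += 1
    snps = {}
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        # Check 2: the site is polymorphic and covered by at least MC bases.
        if len(counts) < 2 or depth < mc:
            continue
        (major, _), (minor, minor_count) = counts.most_common(2)
        # Check 3: the minor allele frequency is higher than MMAF.
        if minor_count / depth > mmaf:
            snps[pos] = (major, minor, minor_count / depth)
    return snps

aligned = [
    (4, "CACGGTCAGACGAGCATCAGTCC"), (2, "CACACGGTCAGACGAGCATCAGT"),
    (7, "GGTCAGACGAGCATCAGTCCGTA"), (10, "CAGACTAGCATCAGTCCGTAATG"),
    (2, "CACACGGTCAGACTAGCATCAGT"), (7, "GGTCAGACTAGCATCAGACCGTA"),
    (7, "GGTCAGACTAGCATCAGTCCGTA"), (6, "CGGTCAGACTAGCATCAGTCCG"),
]
reads = [(start, seq, [30] * len(seq)) for start, seq in aligned]
reads[5][2][17] = 10  # assumed low quality at the mismatching base of read 6
print(call_snps(reads))  # {15: ('T', 'G', 0.375)}
```

With these inputs the T/G site is reported, while the isolated A in the sixth read is removed by quality control and never reaches the coverage and frequency checks.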

11 Accuracy of SNP detection from population sequencing
Coverage   True positive    False positive
4X         1961 (90.70%)    690 (29.51%)
6X         1998 (92.41%)    23 (1.06%)
8X         2015 (93.20%)    8 (0.37%)
10X        2043 (94.50%)    0 (0.00%)
12X        2068 (95.65%)    0 (0.00%)
There were 2162 true SNPs among 50 haploid individuals in our simulation. Coverage is the sequencing depth per individual. MQS, MNQS, MMAF and MC were set to 25, 20, 0.01 and 50 (1X per individual), respectively.

12 Accuracy of MAF estimation from population sequencing
[Figure: real versus estimated minor allele frequency; both axes run from 0.00 to 0.48.]

13 Summary
1. MapNext supports both spliced and unspliced alignment of short reads. Spliced alignment does not require a training process.
2. MapNext can detect SNPs and estimate minor allele frequencies from population sequences.

14 2009-09-10 Thank you! MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads

