Improved hit criteria for DNA local alignment JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France.


1 Improved hit criteria for DNA local alignment JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France

2 Plan
Introduction
–Local alignment
–Heuristic methods
Hit criteria
–Seed models and proposed extension
–Single/multiple hit strategies and proposed extension
Experiments
Conclusion
–Extensions

3 Local alignment methods
Why be interested in local alignment methods?
–Improvement needed: #sequences, #users (budget)
Dynamic programming (Smith-Waterman)
–Gives an exact solution
–Quadratic cost (best optimization in [Crochemore et al 02])
Heuristic algorithms (used in practice)
–FASTA, BLAST, PatternHunter, BLASTZ, YASS, …
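
As a reference point for the exact method named on this slide, here is a minimal Python sketch of the Smith-Waterman recurrence. The scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not values from the talk. The double loop over both sequences is the quadratic cost the slide refers to.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Only two rows of the table are kept, so memory is linear, but the running time is still proportional to len(a) * len(b), which is why heuristic filters are used on large inputs.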

4 Seed filtering
Start with small, conserved, and easily detected fragments (seeds). Then extend the seeds to build possible alignments.
[Figure: dot plot of the two sequences below, showing the detected seeds and the detected alignment]
ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct
aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt
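
The seed detection step can be sketched with a k-mer index, as below. This is a generic illustration of seed filtering with exact words of length k, not the indexing scheme of any particular tool; the function name and the choice of k are assumptions.

```python
from collections import defaultdict

def find_seeds(s, t, k):
    """Return sorted (i, j) pairs such that s[i:i+k] == t[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(s) - k + 1):
        index[s[i:i + k]].append(i)
    hits = []
    for j in range(len(t) - k + 1):
        # .get() avoids creating empty entries for words absent from s
        for i in index.get(t[j:j + k], ()):
            hits.append((i, j))
    return sorted(hits)

s = "ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct"
t = "aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt"
seeds = find_seeds(s, t, 11)
```

Each pair (i, j) is one dot of the dot plot; the difference i - j identifies the diagonal on which the seed lies, which is what the extension step and the hit criteria work with.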

5 Two questions usually asked
1. Seed model: what can serve as a seed?
2. Hit criterion: what is the criterion that witnesses a potential alignment?
[Figure: dot plot of the two sequences below, showing the detected seeds and the detected alignment, annotated "1. Seed model" and "2. Hit criterion"]
ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct
aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt

6 1. What can serve as a seed?
Exact similarity
Seed pattern: contiguous seed ######
Example:
ATCAGT
||||||
ATCAGT
[Animation: the seed ###### slides along the alignment below]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

7 Spaced Seed Model [Ma et al. 02: PATTERNHUNTER]
Seed pattern: ###--#-##
# : obligatory match position
- : joker position (don't care position)
Weight: 6 [number of #]
Span: 9 [number of all symbols]
Example [animation: the seed ###--#-## slides along the alignment below]:
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
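
To make the sliding animation concrete, the following Python sketch lists the offsets at which a seed pattern hits the example alignment. The helper name seed_hits is hypothetical; the alignment is given by its match line, where | is a match and any other symbol is a substitution.

```python
def seed_hits(pattern, alignment):
    """Offsets where every '#' of the pattern faces a match ('|')."""
    required = [k for k, c in enumerate(pattern) if c == "#"]
    return [p for p in range(len(alignment) - len(pattern) + 1)
            if all(alignment[p + k] == "|" for k in required)]

MATCH_LINE = "|||||.||.||||:|||||"
contiguous = seed_hits("######", MATCH_LINE)     # []
spaced = seed_hits("###--#-##", MATCH_LINE)      # [2, 9, 10]
```

On this alignment the longest run of matches has length 5, so the contiguous seed of weight 6 never hits, while the spaced seed of the same weight hits at three offsets because its joker positions can sit on the substitutions.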

8 Spaced Seeds
Some probabilistic observations:
–For spaced seeds, hits at subsequent positions are more independent events.
–For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different.
[Figure: seeds ###### and ###--#-## placed on a run of matches |||||||||||||||||]

9 Some probabilistic observations:
[Animation: substitutions are added to the alignment one at a time while the seeds ###--#-## and ###### are tested against it; final frame:]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
###--#-##
######
For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different.
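
Both quantities in this observation can be computed exactly for a simple model. The sketch below assumes independent columns, each a match with probability p (a two-letter Bernoulli model, chosen here for brevity), and uses a dynamic program over the last span-1 columns. Function names and parameters are assumptions made for illustration.

```python
def expected_hits(pattern, length, p):
    """Expected number of seed hits: one term p**weight per offset."""
    weight = pattern.count("#")
    return max(0, length - len(pattern) + 1) * p ** weight

def hit_probability(pattern, length, p):
    """P(at least one hit) in a random alignment of `length` columns."""
    span = len(pattern)
    required = [k for k, c in enumerate(pattern) if c == "#"]
    mask = (1 << (span - 1)) - 1
    # state = last span-1 columns as bits (1 = match, newest in bit 0);
    # no_hit[state] = probability of being in state with no hit so far
    no_hit = {0: 1.0}
    for i in range(1, length + 1):
        nxt = {}
        for state, prob in no_hit.items():
            for bit, q in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit  # the span newest columns
                if i >= span and all((window >> (span - 1 - k)) & 1
                                     for k in required):
                    continue  # a hit ends at column i: leave the no-hit mass
                key = window & mask
                nxt[key] = nxt.get(key, 0.0) + prob * q
        no_hit = nxt
    return 1.0 - sum(no_hit.values())
```

Calling both functions on ###### and ###--#-## with the same length and p gives nearly equal expected hit counts (they differ only through the number of available offsets), so any gap between the two hit probabilities comes from how strongly hits at neighbouring offsets depend on each other.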

10 Spaced seeds
The spaced seed model is generally more sensitive than the contiguous seed model.
Extend the spaced seed model by taking into account the specificity of DNA substitutions.

11 Biological properties
Transitions are usually over-represented.
Regularity phenomenon in coding sequences.
Use those properties to extend the spaced seed model.
Mutational events [figure: substitution diagram over A, T, G, C]:
. : transition (A<->G or C<->T)
: : transversion (any other substitution)
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
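
The match line used throughout the slides can be derived from the two sequences with the standard definition of transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (everything else). The function names below are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(x, y):
    """Classify one alignment column of two nucleotides."""
    if x == y:
        return "match"
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def annotate(s, t):
    """Build the match line: | match, . transition, : transversion."""
    symbols = {"match": "|", "transition": ".", "transversion": ":"}
    return "".join(symbols[classify(x, y)] for x, y in zip(s, t))

line = annotate("ATCAGTGCAATGCTCAAGA", "ATCAGCGCGATGCGCAAGA")
```

For the example pair this yields |||||.||.||||:|||||, that is, two transitions (T/C and A/G) and one transversion (T/G).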

12 BLASTZ model [Schwartz et al. 03]
A spaced seed that tolerates one transition substitution at any one of its # positions.
Problem: running time; a seed of large weight is needed to obtain reasonable speed.
Example [animation: the seed slides along the alignment below]:
ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA
|||.|||:|||.|||||.||:||||||:||.||||
ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA
###-#--##--#-#--#--##

13 YASS model: Transition-Constrained Seeds
Seed pattern: ##@#-#@-###
# : obligatory match position
- : joker position (don't care position)
@ : transition-constrained position, i.e. a position that corresponds to either a match or a transition
Example [animation: the seed ##@#-#@-### slides along the alignment below]:
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
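
The three seed symbols translate directly into sets of allowed alignment columns. The sketch below applies the definition from this slide to the example match line; the helper name tc_seed_hits is an assumption.

```python
# Which match-line symbols each seed symbol accepts:
# '#' only a match, '@' a match or a transition, '-' anything.
ALLOWED = {"#": "|", "@": "|.", "-": "|.:"}

def tc_seed_hits(pattern, alignment):
    """Offsets where a transition-constrained seed hits the alignment."""
    return [p for p in range(len(alignment) - len(pattern) + 1)
            if all(alignment[p + k] in ALLOWED[c]
                   for k, c in enumerate(pattern))]

hits = tc_seed_hits("##@#-#@-###", "|||||.||.||||:|||||")  # [1, 6]
```

At offset 6 one of the @ positions falls on a transition, which is accepted; a # at the same place would have rejected the hit.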

14 Transition-Constrained Seeds
Seed pattern: ##@#-#@-###
# : obligatory match position
- : joker position (don't care position)
@ : transition-constrained position
Weight: 8 [number of # + half the number of @]
@ carries 1 bit of information whereas # carries 2 bits.
@ adapted to GC-rich/poor genomes
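
The weight rule on this slide is a one-liner; the sketch below simply restates it in Python, with hypothetical function names. A # fixes one nucleotide out of four (2 bits), an @ fixes only the purine/pyrimidine class (1 bit), hence the factor one half.

```python
def seed_weight(pattern):
    """Weight = number of '#' plus half the number of '@'."""
    return pattern.count("#") + 0.5 * pattern.count("@")

def seed_span(pattern):
    """Span = total number of symbols in the pattern."""
    return len(pattern)

w = seed_weight("##@#-#@-###")   # 7 '#' and 2 '@' give 8.0
```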

15 Spaced seeds and Transition-Constrained Seeds
Seed pattern (why ##@#-#@-### and not #@-#-#-#@# ?)
–Not chosen randomly
Need to:
–define an alignment model;
–search for the best (or at least a good) seed pattern according to this model.
(Sensitivity: probability of detecting an alignment generated by the model)
–The chosen model can drastically change the seed shape…
Example:
Bernoulli model: ##@-#@#--#-#-###
Markov model: ##@##-##@##

16 Probabilistic Alignment Models
–Bernoulli [Keich et al 02]
–Markov [Buhler et al 03]
–Automata (M3/M8) and HMMs [Brejova et al 03]
–Homogeneous alignments [Kucherov et al 04]
HSP: alignments found by heuristic algorithms
Alignment encoding (2 = match, 1 = transition, 0 = transversion):
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
2222212212222022222
Bernoulli: P( 2 ) = 0.7, P( 1 ) = 0.15, P( 0 ) = 0.15
[Figure: prefix 222221221222 followed by the next symbol X] Each transition has an emission probability for each symbol. Ex: P( 2 ) = 0.8, P( 1 ) = 0.10, P( 0 ) = 0.10
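
The encoding and the Bernoulli model on this slide can be written down in a few lines. The sketch below covers only the Bernoulli case (independent columns); the function names are hypothetical, and the probabilities are the ones quoted on the slide.

```python
import math

CODE = {"|": "2", ".": "1", ":": "0"}   # match, transition, transversion

def encode(match_line):
    """Turn a match line into the 0/1/2 string used by the models."""
    return "".join(CODE[c] for c in match_line)

def bernoulli_log_prob(encoded, probs):
    """Log-probability of an encoded alignment with independent columns."""
    return sum(math.log(probs[c]) for c in encoded)

encoded = encode("|||||.||.||||:|||||")
logp = bernoulli_log_prob(encoded, {"2": 0.7, "1": 0.15, "0": 0.15})
```

A Markov model or HMM would replace the fixed per-column probabilities with probabilities that depend on the preceding symbols or on a hidden state.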

17 Seed Design
Alignment model: Bernoulli
–P(match) = 0.7, P(transition) = 0.15, P(transversion) = 0.15
–alignment length = 64
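
Seed sensitivity under this Bernoulli model can be approximated by sampling. The sketch below is a plain Monte Carlo estimate with the slide's parameters as defaults; it is not the exact computation used for seed design in the talk, and the function names, the number of trials, and the random seed are assumptions.

```python
import random

# Alignment columns: 2 = match, 1 = transition, 0 = transversion
ALLOWED = {"#": {2}, "@": {2, 1}, "-": {2, 1, 0}}

def has_hit(pattern, columns):
    """True if the seed hits at some offset of the encoded alignment."""
    span = len(pattern)
    return any(all(columns[p + k] in ALLOWED[c] for k, c in enumerate(pattern))
               for p in range(len(columns) - span + 1))

def estimate_sensitivity(pattern, length=64, probs=(0.15, 0.15, 0.70),
                         trials=2000, rng_seed=0):
    """Fraction of sampled alignments hit by the seed.

    probs = (P(transversion), P(transition), P(match)).
    """
    rng = random.Random(rng_seed)
    detected = 0
    for _ in range(trials):
        columns = rng.choices((0, 1, 2), weights=probs, k=length)
        detected += has_hit(pattern, columns)
    return detected / trials
```

Comparing estimate_sensitivity("##@-#@#--#-#-###") with other patterns of the same weight gives a rough ranking; an exact method is preferable when the differences between candidate seeds are small.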

18 Seed Design
Alignment model: Markov
–5th order, obtained on N. meningitidis, S. cerevisiae, Drosophila, and human sequences.

19 Experiments
S. cerevisiae / Neisseria sequences

20 To summarize...
We have presented several seed models (contiguous, classic spaced seeds, BLASTZ).
We introduced transition-constrained seeds and showed how they improve the sensitivity.
From detected seeds to detected alignments

21 2. Hit criterion
What is the criterion that witnesses a potential alignment?
Restriction: only the information about seeds is available.
[Figure: dot plot of the two sequences below, showing the detected seeds and the detected alignment]
ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct
aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt

22 Several methods have been proposed
FASTA:
–Several small seeds on proximal diagonals
BLAST (single hit):
–One large seed
Gapped-BLAST (double hit):
–Two seeds on the same diagonal
To define a good criterion, we first have to define the class of similarities we want to detect: a mutation model.
[Figure: dot plot of the two sequences below, repeated for each method]
ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct
aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt
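
The difference between the single-hit and double-hit strategies can be sketched on a list of seed hits (i, j). Under a single-hit criterion every hit triggers an extension; the function below keeps only pairs of non-overlapping seeds on the same diagonal. The window parameter and the function name are assumptions for illustration, not BLAST's actual settings.

```python
from collections import defaultdict

def double_hits(hits, k, window):
    """Pairs of non-overlapping seeds of length k on the same diagonal,
    whose start positions are at most `window` apart."""
    by_diag = defaultdict(list)
    for i, j in hits:
        by_diag[i - j].append(i)
    pairs = []
    for diag, starts in by_diag.items():
        starts.sort()
        for x, a in enumerate(starts):
            for b in starts[x + 1:]:
                if b - a > window:
                    break            # later starts are even further away
                if b - a >= k:       # the two seeds do not overlap
                    pairs.append((diag, a, b))
    return sorted(pairs)

pairs = double_hits([(0, 0), (8, 8), (3, 10), (30, 30)], k=4, window=20)
```

Here only the seeds starting at 0 and 8 on diagonal 0 qualify; the isolated seed on diagonal -7 and the distant one at 30 would trigger extensions under a single-hit criterion but not under this one.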

23 Mutation effect on Seeds
Mutation effect
–Substitutions: suppress seeds
–Indels: shift diagonals
Remaining seeds
–Estimation of inter-seed distances via a waiting time distribution
–Estimation of diagonal shifts via a random walk model
[Figure: dot plot of the two sequences below]
ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct
aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt
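
The random walk idea can be illustrated as follows. This is a simplified model written for this transcript, not the parameterization used by YASS: each alignment column is assumed to be an insertion (+1 diagonal) or a deletion (-1 diagonal) with probability p_indel/2 each, independently, and the function names are hypothetical.

```python
def shift_distribution(n, p_indel):
    """Distribution of the net diagonal shift after n columns."""
    dist = {0: 1.0}
    steps = ((0, 1.0 - p_indel), (1, p_indel / 2), (-1, p_indel / 2))
    for _ in range(n):
        nxt = {}
        for shift, prob in dist.items():
            for step, q in steps:
                nxt[shift + step] = nxt.get(shift + step, 0.0) + prob * q
        dist = nxt
    return dist

def shift_bound(n, p_indel, coverage=0.95):
    """Smallest b such that |shift| <= b with probability >= coverage."""
    dist = shift_distribution(n, p_indel)
    bound = 0
    # bound < n guards against rounding keeping the sum just below coverage
    while bound < n and sum(q for s, q in dist.items()
                            if abs(s) <= bound) < coverage:
        bound += 1
    return bound
```

A bound of this kind tells how many neighbouring diagonals need to be inspected when looking for a second seed at a given distance from the first one.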

24 YASS hit criterion
According to these parameters, YASS proposes:
–An intermediate criterion between the BLAST single-hit and Gapped-BLAST double-hit criteria
–Overlap-controlled multi-hits
[Figure: seed ###### on |:|||||||:|||:||| and seed ###--#-## on |:||||:|||||:|.|., labelled 7 and 9]
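
One way to read "overlap-controlled multi-hits" is to count the distinct match positions covered by a group of overlapping seeds and compare that count with a threshold. The sketch below follows this reading, which is an assumption on our part; the function names and the threshold are hypothetical. With the two alignments shown in the figure it yields 7 and 9, which appears consistent with the labels on the slide.

```python
def seed_hits(pattern, alignment):
    """Offsets where every '#' of the pattern faces a match ('|')."""
    required = [k for k, c in enumerate(pattern) if c == "#"]
    return [p for p in range(len(alignment) - len(pattern) + 1)
            if all(alignment[p + k] == "|" for k in required)]

def group_size(pattern, offsets):
    """Number of distinct columns covered by '#' positions of the seeds."""
    required = [k for k, c in enumerate(pattern) if c == "#"]
    return len({p + k for p in offsets for k in required})

def is_hit(pattern, offsets, threshold):
    """Accept a group of seeds when it covers enough match positions."""
    return group_size(pattern, offsets) >= threshold

contiguous = seed_hits("######", "|:|||||||:|||:|||")
spaced = seed_hits("###--#-##", "|:||||:|||||:|.|.")
sizes = (group_size("######", contiguous), group_size("###--#-##", spaced))
```

Under this reading, a threshold between the seed weight and twice the seed weight sits between the single-hit criterion (one seed suffices) and the double-hit criterion (two non-overlapping seeds are required).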

25 Sensitivity
Comparison of BLASTn / Gapped-BLAST / YASS hit criteria, score 25

26 Sensitivity (cont)
Comparison of BLASTn / Gapped-BLAST / YASS hit criteria, score 35

27 YASS criterion mixed with spaced seeds

28 Experiments
Local alignment sensitivity
–YASS software / BLASTn (2.2.6 package)
M.t : M. tuberculosis CDC1551
S.s : Synechocystis sp. PCC 6803
V.p : Vibrio p. RIMD 2210633 I
Y.p : Yersinia pestis KIM


30 Ads
YASS web page: http://www.loria.fr/projects/YASS
YASS can be queried online: http://yass.loria.fr
YASS is Open Source

31 Conclusions
Two improvements:
–Transition-constrained spaced seeds
–A hit criterion combining statistical models with the advantages of single- and multi-hit strategies
A tool that implements both of them

32 Extensions
To be done
–Multi-seed approach [Li03, Buhler04, Noe04]
–Seed design on the fly (not necessarily static seeds)
–and others …

33 Questions?


35 [Figure: example YASS output, a long pairwise alignment printed in blocks (position ruler, sequence, match line, sequence, position ruler), with the following header]
*(96264-94728)(582917-584471) Ev: 0 s: 1537/1555 r
* S.cerevisiae.V (reverse complementary strand) / gi|12057208|(forward strand)
* score = 1073 : bitscore = 491.92
* mutations per triplet 347, 108, 152 (1.79e-36) | ts : 272 tv : 335


