Improved hit criteria for DNA local alignment JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France.

Presentation transcript:

Improved hit criteria for DNA local alignment JOBIM 2004 Montréal - June 28th Laurent Noé, Gregory Kucherov LORIA, Nancy France

2 Plan Introduction –Local alignment –Heuristic methods Hit criteria –Seed Models and extension proposed –Single/Multiple hit strategies and extension proposed Experiments Conclusion –Extensions

3 Local alignment methods
Why be interested in local alignment methods?
–Improvement is needed: #sequences, #users, (budget)
Dynamic programming (Smith-Waterman)
–Gives an exact solution
–Quadratic cost (best optimization in [Crochemore et al 02])
Heuristic algorithms (used in practice)
–Fasta, Blast, PatternHunter, Blastz, Yass, …
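The quadratic cost of the exact approach is visible in a minimal score-only sketch of Smith-Waterman. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are illustrative assumptions, not parameters taken from the talk.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (score only, two rows of memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):          # one row per character of a
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):      # one cell per character of b
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Every cell of the len(a) x len(b) table is visited once, which is what makes the method too slow for large sequence sets and motivates seed-based heuristics.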

4 Seed filtering
Start with small, conserved, easily detected fragments (seeds). Then extend the seeds to build possible alignments.
[Dot plot of ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct against aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt, showing the detected seeds and the detected alignment]

5 Two questions usually asked
1. Seed model: what can serve as a seed?
2. Hit criterion: what is the criterion that witnesses a potential alignment?
[Dot plot of ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct against aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt, annotated with: detected seeds, detected alignment, 1. Seed model, 2. Hit criterion]

6 1. What can serve as a seed
Exact similarity: contiguous seed
Seed pattern: ######
Example:
ATCAGT
||||||
ATCAGT
[Animation: the pattern ###### is slid along the alignment below]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

7 Spaced Seed Model [Ma et al. 02: PATTERNHUNTER]
Seed pattern: ###--#-##
# : obligatory match position
- : joker position (don't care position)
Weight: 6 [number of #]
Span: 9 [number of all symbols]
Example:
[Animation: the pattern ###--#-## is slid along the alignment below]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
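The two seed models can be compared on the example alignment with a short sketch. It assumes a gapless alignment given as two equal-length strings; the function name `seed_hits` is hypothetical.

```python
def seed_hits(seed, top, bottom):
    """0-based start positions where every '#' of the seed faces a match."""
    required = [k for k, c in enumerate(seed) if c == "#"]   # '-' is ignored
    span = len(seed)
    return [i for i in range(len(top) - span + 1)
            if all(top[i + k] == bottom[i + k] for k in required)]


top = "ATCAGTGCAATGCTCAAGA"
bottom = "ATCAGCGCGATGCGCAAGA"
for seed in ("######", "###--#-##"):
    print(seed, "weight", seed.count("#"), "span", len(seed),
          "hits", seed_hits(seed, top, bottom))
```

On this alignment the contiguous seed ###### finds no hit, whereas the spaced seed ###--#-## of the same weight hits at positions 2, 9 and 10 (0-based).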

8 Spaced Seeds
Some probabilistic observations:
–For spaced seeds, hits at subsequent positions are more independent events
–For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different
[Figure: ###### and ###--#-## slid along a run of matches |||||||||||||||||]

9 Some probabilistic observations:
[Animation: ###### and ###--#-## are slid along the alignment of ATCAGTGCAATGCTCAAGA with itself, then with one, two and three substitutions added; the last frame is the alignment below]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
For contiguous vs spaced seeds of the same weight, the expected number of hits is (basically) the same, but the probabilities of having at least one hit are very different
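Both quantities in this observation can be computed exactly for a simple model. The sketch below assumes a gapless alignment whose columns match independently with probability p (a Bernoulli model); the function names are hypothetical. The probability of at least one hit is obtained by a dynamic program over the last span-1 columns.

```python
from collections import defaultdict


def expected_hits(seed, length, p):
    """Expected number of hits: (number of placements) * p^weight."""
    return max(0, length - len(seed) + 1) * p ** seed.count("#")


def sensitivity(seed, length, p):
    """Probability that the seed hits at least once in `length` columns."""
    span = len(seed)
    # Bit (span-1-k) of a window holds the column facing seed offset k.
    mask = sum(1 << (span - 1 - k) for k, c in enumerate(seed) if c == "#")
    keep = (1 << (span - 1)) - 1            # remember the last span-1 columns
    no_hit = {0: 1.0}                        # state -> probability, no hit yet
    for t in range(1, length + 1):
        nxt = defaultdict(float)
        for state, prob in no_hit.items():
            for bit, pb in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | bit
                if t >= span and (window & mask) == mask:
                    continue                 # a hit: this mass leaves the DP
                nxt[window & keep] += prob * pb
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


for seed in ("######", "###--#-##"):
    print(seed, expected_hits(seed, 64, 0.7), sensitivity(seed, 64, 0.7))
```

The length 64 and p = 0.7 used in the loop are the values quoted later in the talk for the Bernoulli model; the expected counts differ only through the number of placements (59 versus 56), while the hit probabilities depend on how strongly neighbouring hits overlap.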

10 Spaced seeds
The spaced seed model is generally more sensitive than the contiguous seed model.
Extend the spaced seed model by taking the specificity of DNA substitutions into account

11 Biological properties
Transitions are usually over-represented. Regularity phenomenon in coding sequences. Use those properties to extend the spaced seed model.
Mutational events: transitions (A↔G, C↔T), marked "." in the alignment; transversions (purine↔pyrimidine), marked ":"
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
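The middle line of the example alignment can be regenerated from the two sequences, which makes the transition/transversion distinction concrete. This is a small sketch with hypothetical function names; it assumes a gapless alignment over the alphabet A, C, G, T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def column_symbol(x, y):
    """'|' for a match, '.' for a transition, ':' for a transversion."""
    if x == y:
        return "|"
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "." if same_class else ":"


def match_line(top, bottom):
    return "".join(column_symbol(x, y) for x, y in zip(top, bottom))


print(match_line("ATCAGTGCAATGCTCAAGA", "ATCAGCGCGATGCGCAAGA"))
```

For the example pair this prints |||||.||.||||:|||||, the line shown on the slide: two transitions (T/C and A/G) and one transversion (T/G).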

12 BLASTZ model [Schwartz et al. 03]
A spaced seed that allows one possible transition substitution over its # positions.
Problem: running time. A seed of large weight is needed to obtain reasonable speed.
[Animation: the pattern ###-#--##--#-#--#--## is slid along the alignment below]
ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA
|||.|||:|||.|||||.||:||||||:||.||||
ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA
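The relaxed hit test described here can be sketched as follows: every # position must face a match, except that a limited number of them may face a transition. The function names and the `max_transitions` parameter are hypothetical; this is not BLASTZ code.

```python
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def is_transition(x, y):
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def one_transition_hits(seed, top, bottom, max_transitions=1):
    """Start positions where the '#' columns are matches, up to a few transitions."""
    required = [k for k, c in enumerate(seed) if c == "#"]
    hits = []
    for i in range(len(top) - len(seed) + 1):
        transitions, ok = 0, True
        for k in required:
            x, y = top[i + k], bottom[i + k]
            if x == y:
                continue
            if is_transition(x, y):
                transitions += 1
            else:                      # a transversion under '#' rules the hit out
                ok = False
                break
        if ok and transitions <= max_transitions:
            hits.append(i)
    return hits


top = "ATCAGGCATGCTAAGATCGGATCCTCAATGGCTCA"
bottom = "ATCGGGCTTGCCAAGATTGGTTCCTCATTGCCTCA"
seed = "###-#--##--#-#--#--##"
print(one_transition_hits(seed, top, bottom))                      # one transition allowed
print(one_transition_hits(seed, top, bottom, max_transitions=0))   # plain spaced seed
```

On the slide's alignment the seed hits only at position 2 (0-based), where its second # faces the A/G transition; with no transition allowed it does not hit at all.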

13 YASS model: Transition Constrained Seeds
Seed pattern:
# : obligatory match position
- : joker position (don't care position)
@ : transition-constrained position
Transition-constrained position: a position that corresponds to either a match or a transition.
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA

14 Transition Constrained Seeds
Seed pattern:
# : obligatory match position
- : joker position (don't care position)
@ : transition-constrained position
Weight: 8 [number of # + half the number of @]
@ carries 1 bit of information whereas # carries 2
adapted to GC-rich/poor genomes
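A transition-constrained seed can be tested against an alignment in the same way as a spaced seed. The sketch below writes the transition-constrained position as @, as YASS does; the pattern #@#--#-## is a hypothetical example (the spaced seed ###--#-## with one position relaxed), not the seed shown on the slide, and the function names are assumptions.

```python
PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def column_ok(symbol, x, y):
    """Does the aligned pair (x, y) satisfy the seed symbol?"""
    if symbol == "-" or x == y:
        return True                    # joker, or a match under '#' / '@'
    if symbol == "@":
        return {x, y} <= PURINES or {x, y} <= PYRIMIDINES   # transition
    return False                       # mismatch under '#'


def tc_seed_hits(seed, top, bottom):
    return [i for i in range(len(top) - len(seed) + 1)
            if all(column_ok(s, top[i + k], bottom[i + k])
                   for k, s in enumerate(seed))]


def tc_seed_weight(seed):
    """Number of '#' plus half the number of '@'."""
    return seed.count("#") + seed.count("@") / 2


top = "ATCAGTGCAATGCTCAAGA"
bottom = "ATCAGCGCGATGCGCAAGA"
print(tc_seed_weight("#@#--#-##"), tc_seed_hits("#@#--#-##", top, bottom))
```

On the example alignment this pattern (weight 5.5) hits at positions 2, 4, 7, 9 and 10: the hits at 4 and 7 are the ones where @ faces a transition.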

15 Spaced seeds and Transition-Constrained Seeds
Seed pattern (why one pattern and not another?)
–Not chosen randomly
Need to:
define an alignment model.
search for the best (at least a good) seed pattern according to this model.
(Sensitivity: probability to detect any alignment given by the model)
–Chosen model can drastically change the seed shape…
Example: Bernoulli model, Markov model

16 Probabilistic Alignment Models:
–Bernoulli [Keich et al 02]
–Markov [Buhler et al 03]
–Automata (M3/M8) and HMMs [Brejova et al 03]
–Homogeneous alignments [Kucherov et al 04]
ATCAGTGCAATGCTCAAGA
|||||.||.||||:|||||
ATCAGCGCGATGCGCAAGA
P( 2 ) = 0.7, P( 1 ) = 0.15, P( 0 ) = X
Transition has an emission probability for each symbol. Ex: P( 2 ) = 0.8, P( 1 ) = 0.10, P( 0 ) = 0.10
HSP: alignments found by heuristic algorithms

17 Seed Design Alignment Model : Bernoulli –P(match) = 0.7, P(transition)=0.15, P(transversion)=0.15 –alignment length = 64
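Under this Bernoulli model the sensitivity of a seed containing #, @ and - symbols can be computed exactly by a dynamic program over the last span-1 alignment columns. The sketch below is an assumption-laden illustration, not the authors' seed design procedure: columns are independent, and they are coded 2 for a match, 1 for a transition and 0 for a transversion (an assumed reading of the coding used on the previous slide).

```python
from collections import defaultdict

MATCH, TRANSITION, TRANSVERSION = 2, 1, 0


def accepts(symbol, column):
    if symbol == "#":
        return column == MATCH
    if symbol == "@":
        return column != TRANSVERSION
    return True                                   # '-' accepts anything


def tc_sensitivity(seed, length, p_match=0.7, p_ts=0.15, p_tv=0.15):
    """Probability of at least one seed hit in `length` independent columns."""
    span = len(seed)
    probs = ((MATCH, p_match), (TRANSITION, p_ts), (TRANSVERSION, p_tv))
    no_hit = {(): 1.0}                            # last columns -> probability
    for _ in range(length):
        nxt = defaultdict(float)
        for state, prob in no_hit.items():
            for column, pc in probs:
                window = state + (column,)
                if len(window) == span and all(
                        accepts(s, c) for s, c in zip(seed, window)):
                    continue                      # hit: drop this mass
                tail = window[-(span - 1):] if span > 1 else ()
                nxt[tail] += prob * pc
        no_hit = nxt
    return 1.0 - sum(no_hit.values())


# Hypothetical patterns, alignment length 64 as on the slide.
for seed in ("###--#-##", "#@#--#-##"):
    print(seed, tc_sensitivity(seed, 64))
```

Searching for a good seed then amounts to evaluating such a function over candidate patterns of a fixed weight and keeping the most sensitive one.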

18 Seed Design Alignment Model : Markov –5th order, obtained on N. meningitidis, S. cerevisiae, Drosophila, and Human sequences.

19 Experiments S. cerevisiae / Neisseria sequences

20 To summarize...
We have presented several seed models (contiguous, classic spaced seeds, BLASTZ).
We introduced transition-constrained seeds and showed how they improve the sensitivity.
From detected seeds to detected alignments

21 2. Hit criterion
What is the criterion that witnesses a potential alignment?
Restriction: only the information about seeds is available
[Dot plot of ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct against aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt, showing the detected seeds and the detected alignment]

22 Several methods have been proposed
FASTA:
–Several small seeds on proximal diagonals
BLAST: (single hit)
–One large seed.
Gapped-BLAST: (double hit)
–Two seeds on the same diagonal
To define a good criterion we first have to define the class of similarities we want to detect: mutation model
[Dot plots of ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct against aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt, one per criterion]
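The double-hit idea can be sketched on a list of seed positions. The code below is an illustration under stated assumptions, not Gapped-BLAST's implementation: a seed is a pair (i, j) of start positions in the two sequences, its diagonal is j - i, and two seeds form a double hit when they lie on the same diagonal, do not overlap, and are at most `window` positions apart. The function name and both parameters are hypothetical.

```python
from collections import defaultdict


def double_hits(seeds, seed_len, window):
    """Pairs of non-overlapping seeds on one diagonal within `window`."""
    by_diagonal = defaultdict(set)
    for i, j in seeds:
        by_diagonal[j - i].add(i)
    pairs = []
    for diagonal in sorted(by_diagonal):
        starts = sorted(by_diagonal[diagonal])
        for idx, later in enumerate(starts):
            for earlier in starts[:idx]:
                # seed_len <= distance keeps the two seeds from overlapping.
                if seed_len <= later - earlier <= window:
                    pairs.append(((earlier, earlier + diagonal),
                                  (later, later + diagonal)))
    return pairs


print(double_hits([(10, 15), (30, 35), (50, 80)], seed_len=11, window=40))
```

Under a single-hit criterion each of the three seeds above would trigger an extension; under the double-hit criterion only the pair on diagonal 5 does.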

23 Mutation effect on Seeds
Mutation effect
–Substitutions: suppressing seeds
–Indels: diagonal shifts
Remaining seeds
–Estimation of inter-seed distances via a Waiting Time distribution
–Estimation of diagonal shifts via a Random Walk model
[Dot plot of ctcgactcgggctcacgctcgcaccgggttacagcggtcgattgct against aggcctcgggctcgcgctcgcgcgctagacaccgggttacagcgt]
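Both estimates can be sketched for the simplest possible models. The assumptions here are not taken from the slide: seeds are contiguous runs of k matches in a Bernoulli sequence with match probability p, and each position carries an indel with probability q, insertions and deletions being equally likely. All function names are hypothetical.

```python
def waiting_time(k, p, n_max):
    """w[n] = P(the first run of k matches ends exactly at position n)."""
    w = [0.0] * (n_max + 1)
    cumulative = [0.0] * (n_max + 1)          # cumulative[n] = w[0] + ... + w[n]
    for n in range(k, n_max + 1):
        if n == k:
            w[n] = p ** k
        else:
            # A mismatch at n-k, k matches after it, and no run completed before.
            w[n] = p ** k * (1 - p) * (1 - cumulative[n - k - 1])
        cumulative[n] = cumulative[n - 1] + w[n]
    return w


def distance_bound(k, p, epsilon, n_max=10_000):
    """Smallest n such that a seed is found within n positions w.p. >= 1 - epsilon."""
    total = 0.0
    for n, wn in enumerate(waiting_time(k, p, n_max)):
        total += wn
        if total >= 1 - epsilon:
            return n
    return None


def diagonal_shift_distribution(n, q):
    """Distribution of the diagonal shift after n positions (symmetric walk)."""
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for shift, prob in dist.items():
            for step, ps in ((0, 1 - q), (1, q / 2), (-1, q / 2)):
                nxt[shift + step] = nxt.get(shift + step, 0.0) + prob * ps
        dist = nxt
    return dist


print(distance_bound(9, 0.7, 0.05))
print(diagonal_shift_distribution(2, 0.2))
```

The first bound limits how far apart two seeds of the same alignment are expected to be; the second distribution limits how many diagonals apart they may lie.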

24 YASS hit criterion
According to these parameters, YASS proposes:
–An intermediate criterion between the BLAST single-hit and Gapped-BLAST double-hit criteria.
–Overlap-controlled multi-hits
[Figure: ###### on |:|||||||:|||:||| and ###--#-## on |:||||:|||||:|.|., labelled 7 and 9]

25 Sensitivity Comparison of BLASTn / Gapped-BLAST/YASS hit criteria score 25

26 Sensitivity (cont) Comparison of BLASTn / Gapped-BLAST/YASS hit criteria score 35

27 YASS criterion mixed with spaced seeds

28 Experiments Local alignment sensitivity –YASS software / BLASTn (2.2.6 package) M.t : M. tuberculosis CDC1551 S.s : Synechocystis sp. PCC 6803 V.p : Vibrio p. RIMD I Y.p : Yersinia pestis KIM

29 Ads

30 Ads YASS web page YASS can be queried online YASS is Open Source

31 Conclusions Two improvements: –Transition-constrained spaced seeds –Hit criterion combining statistical models and advantage of single/multi hit strategies. A tool that implements both of them

32 Extensions To be done –Multi-seed approach [Li03, Buhler04, Noe04] –Seed design on the fly (not necessarily static seeds). –and others …

33 Questions ? ? ?

34

35 [Example alignment output: S.cerevisiae.V (reverse complementary strand) / gi| |(forward strand); score = 1073; mutations per triplet 347, 108, 152 (1.79e-36); ts : 272, tv : 335; the alignment spans positions 96260 down to 94740 of the first sequence]
