OUTLINE Suffix trees Suffix arrays Suffix trees Indexing techniques are used to locate highest – scoring alignments. One method of indexing uses the.

1

2 OUTLINE Suffix trees Suffix arrays

3 Suffix trees Indexing techniques are used to locate highest-scoring alignments. One method of indexing uses the suffix tree. A suffix of a string is a substring that runs from some position to the end of the string.

4 Suffix trees Problems: – Given a pattern P (a substring), find all occurrences of P in a text S. – Given two strings, find their longest common substring.

5 Suffix trees Problems in Bioinformatics: – Multiple genome alignment – Identification of sequence repeats

6 Suffix trees Suffix tree: – For example: S: abdfrg (length:6) S has 6 suffixes: g, rg, frg, dfrg, bdfrg, abdfrg
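
As a quick check of the example above, the suffixes of a string can be listed with a few lines of Python. This is only an illustrative sketch; the function name `suffixes` is not from the slides.

```python
def suffixes(s):
    """Return all suffixes of s, shortest first (as listed on the slide)."""
    return [s[i:] for i in range(len(s) - 1, -1, -1)]

print(suffixes("abdfrg"))
# ['g', 'rg', 'frg', 'dfrg', 'bdfrg', 'abdfrg']
```

A string of length n always has exactly n non-empty suffixes, one per start position.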

7 Suffix trees The suffixes of a string can be stored in a suffix tree, and this tree can be built in O(n) time (n: length of the string). A string pattern of length m can then be searched in O(m) time.

8 Suffix trees Suffix tree: – S = S[1…n] is a string of length n, – A suffix tree is a tree with n leaves, – n leaves represent n suffixes of the string, – ababc$

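9 Suffix trees If a suffix is a prefix of another suffix, we cannot construct a tree whose leaves are exactly the suffixes. Example: in xabxa, the suffixes xa and a are prefixes of the suffixes xabxa and abxa, so they do not end at leaf nodes.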

10 Suffix trees Insert a special character that occurs nowhere else in the string (for example $) at the end of the string to solve the problem: xabxa$
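
The effect of the terminator can be checked directly. The sketch below lists every suffix that is a prefix of some other suffix, first for xabxa and then for xabxa$. The helper name `suffixes_that_are_prefixes` is hypothetical and exists only for this illustration.

```python
def suffixes_that_are_prefixes(s):
    """Return the suffixes of s that are a prefix of another, longer suffix."""
    sufs = [s[i:] for i in range(len(s))]
    return [a for a in sufs if any(b != a and b.startswith(a) for b in sufs)]

print(suffixes_that_are_prefixes("xabxa"))   # ['xa', 'a']
print(suffixes_that_are_prefixes("xabxa$"))  # []
```

Because $ appears only at the end, no suffix can be continued into another one, so every suffix ends at its own leaf.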

11 Suffix trees How to construct suffix tree: – Assume we have a string S[1…n] – Start from the full suffix S[1…n] – For example, consider vbacxad$

12 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix S[2…n] – Which is bacxad$

13 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix S[3…n] – Which is acxad$

14 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix – Which is cxad$

15 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix – Which is xad$

16 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix – Which is ad$. Its first character a matches the first character of the existing edge acxad$, so split that edge after a: the new internal node gets one child for cxad$ and one for d$.

19 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix – Which is d$

20 Suffix trees How to construct suffix tree (cont.): – Enter the next suffix – Which is $

21 Suffix trees Suffix tree of vbacxad$:
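
The step-by-step construction on the preceding slides can be sketched in Python. This is the naive method that inserts one suffix at a time and splits an edge on a mismatch, so it takes O(n^2) time in the worst case. It is not one of the linear-time algorithms behind the O(n) bound quoted earlier. The names `Node`, `build_suffix_tree` and `show` are assumptions made for this sketch.

```python
class Node:
    def __init__(self):
        # first character of an edge label -> (edge label, child node)
        self.children = {}
        # set on leaves only: 1-based start position of the suffix
        self.suffix_start = None


def build_suffix_tree(s):
    """Naive construction: insert S[1..n], S[2..n], ... one after another."""
    if not s.endswith("$") or "$" in s[:-1]:
        raise ValueError("string must end with a unique terminator '$'")
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            first = rest[0]
            if first not in node.children:
                # no edge starts with this character: add a new leaf
                leaf = Node()
                leaf.suffix_start = i + 1
                node.children[first] = (rest, leaf)
                break
            label, child = node.children[first]
            # k = length of the common prefix of the edge label and rest
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k == len(label):
                # whole edge matched: continue below the child
                node, rest = child, rest[k:]
                continue
            # mismatch inside the edge: split it after k characters
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[first] = (label[:k], mid)
            node, rest = mid, rest[k:]
    return root


def show(node, depth=0):
    """Print the tree, one edge per line, indented by depth."""
    for first in sorted(node.children):
        label, child = node.children[first]
        tag = f"  [suffix {child.suffix_start}]" if child.suffix_start else ""
        print("  " * depth + label + tag)
        show(child, depth + 1)


show(build_suffix_tree("vbacxad$"))
```

For vbacxad$ the only split happens when ad$ is inserted: the edge acxad$ becomes an edge a leading to an internal node with the two children cxad$ and d$.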

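22 Suffix trees Pattern match using suffix trees: – Try to match the pattern along a path, starting from the root. Three outcomes are possible: the pattern does not match (it does not occur in the string); the match ends at a node u of the tree (the leaves below u give the occurrences); the match ends inside an edge (the leaves below that edge give the occurrences).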

23 Suffix trees Example: (consider vbacxad$ ) – Suffixes: 1. vbacxad$ 2. bacxad$ 3. acxad$ 4. cxad$ 5. xad$ 6. ad$ 7. d$ 8. $

24 Suffix trees Example: (consider vbacxad$ ) – Suffixes: 1. vbacxad$ 2. bacxad$ 3. acxad$ 4. cxad$ 5. xad$ 6. ad$ 7. d$ 8. $ Search for: cxa, a, xdb
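
The three searches can be traced with a small Python sketch. It rebuilds the tree with the naive insert-and-split method so that the block is self-contained, then walks the pattern down from the root. The names `Node`, `build`, `leaves` and `find` are assumptions made for this illustration, and positions are 1-based to match the numbered suffix list above.

```python
class Node:
    def __init__(self):
        self.children = {}        # first char of edge -> (edge label, child)
        self.suffix_start = None  # leaves only: 1-based suffix start


def build(s):
    """Naive O(n^2) suffix tree; s must end with a unique '$'."""
    if not s.endswith("$") or "$" in s[:-1]:
        raise ValueError("string must end with a unique terminator '$'")
    root = Node()
    for i in range(len(s)):
        node, rest = root, s[i:]
        while True:
            if rest[0] not in node.children:
                leaf = Node()
                leaf.suffix_start = i + 1
                node.children[rest[0]] = (rest, leaf)
                break
            label, child = node.children[rest[0]]
            k = 0
            while k < len(label) and k < len(rest) and label[k] == rest[k]:
                k += 1
            if k < len(label):    # mismatch inside the edge: split it
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[rest[0]] = (label[:k], mid)
                child = mid
            node, rest = child, rest[k:]
    return root


def leaves(node):
    """Collect the suffix start positions stored below node."""
    if not node.children:
        return [node.suffix_start]
    found = []
    for _, child in node.children.values():
        found.extend(leaves(child))
    return found


def find(root, pattern):
    """Return the sorted 1-based start positions of pattern, or []."""
    node, rest = root, pattern
    while rest:
        if rest[0] not in node.children:
            return []                      # the pattern does not match
        label, child = node.children[rest[0]]
        k = min(len(label), len(rest))
        if label[:k] != rest[:k]:
            return []                      # mismatch inside an edge
        # Either the whole edge matched (continue at the child node) or the
        # pattern ended inside the edge; in both cases the occurrences are
        # the leaves below child.
        node, rest = child, rest[k:]
    return sorted(leaves(node))


tree = build("vbacxad$")
for p in ["cxa", "a", "xdb"]:
    print(p, find(tree, p))
```

Here cxa ends inside the edge cxad$ and occurs at position 4, a ends exactly at the internal node created by the split and occurs at positions 3 and 6, and xdb fails on the edge xad$, so it does not occur.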

25 Suffix arrays Consider the string: The suffix array:
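
A suffix array lists the start positions of all suffixes in lexicographic order. The sketch below builds one by plain sorting, which is simple but not the fastest known method. It uses mississippi$, the string from the next slide, as an assumed example, and the function name `suffix_array` is hypothetical. Positions are 0-based here.

```python
def suffix_array(s):
    """0-based start positions of the suffixes of s, in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


text = "mississippi$"
for pos in suffix_array(text):
    print(pos, text[pos:])
```

With Python's default string comparison, $ sorts before all letters, so the suffix consisting of $ alone comes first.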

26 Suffix arrays Search is in mississippi$:
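
Because the suffixes are sorted, all suffixes that start with a given pattern form one contiguous block of the array, which two binary searches can locate. The sketch below assumes the pattern on this slide is `is`, which is a guess from the slide title. The function names are hypothetical and positions are 0-based.

```python
def suffix_array(s):
    """0-based start positions of the suffixes of s, in sorted order."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, p):
    """Return the sorted 0-based positions where p occurs in s."""
    m = len(p)
    # first index whose suffix, cut to m characters, is >= p
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # first index whose suffix, cut to m characters, is > p
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "mississippi$"
sa = suffix_array(text)
print(find_occurrences(text, sa, "is"))  # [1, 4]
```

Each binary search makes O(log n) comparisons of at most m characters, so a lookup costs O(m log n) once the array is built.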

27 References M. Zvelebil, J. O. Baum, “Understanding Bioinformatics”, 2008, Garland Science. Andreas D. Baxevanis, B. F. Francis Ouellette, “Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins”, 2001, Wiley.

