
CSC 448: Bioinformatics Algorithms Alex Dekhtyar Ukkonen’s Algorithm for Generalized Suffix Trees.


1 CSC 448: Bioinformatics Algorithms Alex Dekhtyar Ukkonen’s Algorithm for Generalized Suffix Trees

2 Example for two DNA sequences: T and T’ = reverse(complement(T)).
T = AATGTT
T’ = AACATT
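The relation between the two example strings can be checked with a short sketch. The names `COMPLEMENT` and `reverse_complement` are illustrative and do not come from the slides; the sketch assumes the plain four-letter DNA alphabet.

```python
# Watson-Crick base pairing for the plain DNA alphabet (assumption: no N or IUPAC codes).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complement every base, reading the sequence from its last base to its first."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


T = "AATGTT"
T_PRIME = reverse_complement(T)
print(T, T_PRIME)
```

For T = AATGTT this yields AACATT, the T’ used on the following slides.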

3 Steps
1. Create SuffixTree(T$) using Ukkonen’s algorithm. Keep suffix links.
2. Add “T:” to all leaf labels (designate current labels).
3. Traverse SuffixTree(T$) using the prefix of T’. The stopping point is the new active point.
4. Use Ukkonen’s algorithm to insert the remainder of T’.
4.1. Label leaves “T’: [x, ∞]”.
4.2. Modification: traverse to existing leaves to leave a label.
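A naive reference structure can help check the result of the steps above. The sketch below is not Ukkonen’s algorithm: it builds an uncompressed generalized suffix trie in quadratic time and records, at the end of every suffix, which string it came from and where it starts (1-based, as in the leaf labels on the slides). The names `Node`, `build_naive` and `find` are hypothetical, and giving T’ the same terminator `$` as T is an assumption; the slides only show the terminator for T.

```python
class Node:
    """One trie state: outgoing characters plus the suffixes that end here."""

    def __init__(self):
        self.children = {}
        self.labels = []  # (string name, 1-based start position of the suffix)


def build_naive(strings):
    """Insert every suffix of every string, character by character (O(n^2))."""
    root = Node()
    for name, text in strings.items():
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node.children.setdefault(ch, Node())
            node.labels.append((name, start + 1))
    return root


def find(root, word):
    """Return the state reached by spelling `word` from the root, or None."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


# Assumption: both strings end with the same terminator "$".
root = build_naive({"T": "AATGTT$", "T'": "AACATT$"})
print(find(root, "TT$").labels)      # suffix shared by both strings
print(find(root, "AACATT$").labels)  # suffix that occurs only in T'
```

Under that assumption the shared suffix `TT$` ends in a single state carrying two labels, which is the situation step 4.2 addresses: an existing leaf has to receive an additional label instead of a new leaf being created.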

4 T = AATGTT, T’ = AACATT. Two panels, Tree and Trie, each initially containing only the root ε and the auxiliary state ┴.

5 T = AATGTT, T’ = AACATT. Step 1: insert first string. Trie panel, states after inserting T (besides ε and ┴): A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT. Tree panel: ε, ┴.
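The trie states listed on this slide are exactly the distinct non-empty substrings of T, since every state of a suffix trie spells one substring. A small check, with the helper name `trie_states` chosen here for illustration:

```python
def trie_states(text: str) -> set:
    """All distinct non-empty substrings of `text`, one per suffix-trie state."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


# State names as transcribed from the slide (root and auxiliary state left out).
SLIDE_STATES = (
    "A AA AAT AATG AATGT AATGTT "
    "T AT ATG TG G ATGT TGT GT ATGTT TGTT GTT TT"
).split()

states = trie_states("AATGTT")
print(len(states))
print(sorted(states, key=lambda s: (len(s), s)))
print(states == set(SLIDE_STATES))
```

The trie therefore has 18 states in addition to ε and ┴. The suffix tree keeps only the branching states and the leaves and compresses the rest into edge labels.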

6 T = AATGTT, T’ = AACATT. Step 1: insert first string. Trie for T as on slide 5. Legend: last boundary path, last active point.

7 T = AATGTT, T’ = AACATT. Step 1: insert first string. Trie for T as on slide 5. Tree panel: edges labeled A, T and G, with open leaf labels 2,∞; 3,∞; 4,∞; 6,∞. Legend: last boundary path, last active point.

8 T = AATGTT$, T’ = AACATT. Step 1: insert first string. Step 1.5: finish the tree. Trie for T as on slide 5. Tree panel: leaf labels 2,∞; 3,∞; 4,∞; 6,∞, plus two new $ edges with leaf label 7,∞. Legend: last boundary path, last active point.

9 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Trie for T as on slide 5. Tree panel: leaf labels 2,∞; 3,∞; 4,∞; 6,∞; 7,∞, with the new active point marked.
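The traversal on this slide finds the longest prefix of T’ that is already spelled out in the structure built for T$. The sketch below does this on a naive, uncompressed suffix trie (nested dictionaries) and not on the compressed tree with suffix links used on the slides; `build_suffix_trie` and `longest_matching_prefix` are illustrative names.

```python
def build_suffix_trie(text: str) -> dict:
    """Naive suffix trie as nested dicts: one dict per state, keyed by character."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def longest_matching_prefix(root: dict, query: str) -> int:
    """Walk `query` from the root; return how many characters could be followed."""
    node = root
    matched = 0
    for ch in query:
        if ch not in node:
            break
        node = node[ch]
        matched += 1
    return matched


trie = build_suffix_trie("AATGTT$")
T_PRIME = "AACATT"
k = longest_matching_prefix(trie, T_PRIME)
print("matched prefix:", T_PRIME[:k])
print("remainder to insert:", T_PRIME[k:])
```

For this example the walk follows AA and stops at C, so insertion resumes at position 3 of T’. That agrees with the first new leaf label T’:3,∞ on the later slides.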

10 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie for T as on slide 5, with new states AAC, AC, C. Tree panel: leaf labels 2,∞; 3,∞; 4,∞; 6,∞; 7,∞. Legend: active point.

11 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie for T as on slide 5, with new states AAC, AC, C. Make leaf nodes “generalized”: the tree’s leaf labels become T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞. Legend: active point.

12 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie for T as on slide 5, with new states AAC, AC, C. Tree panel: leaf labels T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞, plus new C edges with leaf label T’:3,∞. Legend: active point.

13 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie: states for T as on slide 5, plus AAC, AC, C and new states AACA, ACA, CA. Tree panel unchanged (leaf labels T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞; T’:3,∞). Legend: active point, end point. Nothing to do!

14 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie: states for T as on slide 5, plus AAC, AC, C, AACA, ACA, CA and new states AACAT, ACAT, CAT. Tree panel unchanged (leaf labels T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞; T’:3,∞). Legend: active point, end point. Nothing to do!

15 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie: states for T as on slide 5, plus AAC, AC, C, AACA, ACA, CA, AACAT, ACAT, CAT and new state ATT. Tree panel: leaf labels T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞; T’:3,∞, plus a new leaf T’:6,∞ next to edges labeled G and T. Legend: active point, end point.
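The new trie states shown on slides 10 through 15 can be reproduced by asking, for each character of T’ appended after the matched prefix AA, which suffixes of the current prefix are not yet spelled anywhere. This is a brute-force sketch over substring sets, not the boundary-path walk of Ukkonen’s algorithm; `substrings` and `new_states_per_phase` are hypothetical helpers.

```python
def substrings(text: str) -> set:
    """All distinct non-empty substrings of `text`."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


def new_states_per_phase(first: str, second: str, matched: int) -> dict:
    """Map each 1-based position of `second` past the matched prefix to the
    trie states that appear for the first time when that character is added."""
    known = substrings(first) | substrings(second[:matched])
    phases = {}
    for end in range(matched + 1, len(second) + 1):
        prefix = second[:end]
        # Suffixes of the current prefix, longest first, as along a boundary path.
        fresh = [prefix[s:] for s in range(end) if prefix[s:] not in known]
        known.update(fresh)
        phases[end] = fresh
    return phases


phases = new_states_per_phase("AATGTT$", "AACATT", matched=2)
for position, fresh in phases.items():
    print(position, fresh)
```

At positions 4 and 5 the only new states (AACA, ACA, CA and AACAT, ACAT, CAT) extend existing leaves, which open edge labels of the form x,∞ cover automatically; that is why those slides say “Nothing to do!”. At position 6 the state ATT is new and is not a leaf extension, so the tree needs a new branch and the leaf T’:6,∞.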

16 T = AATGTT$, T’ = AACATT. Step 2: Traverse the prefix of T’. Step 3: Start inserting the rest of T’. Trie as on slide 15. Tree panel: leaf labels T:2,∞; T:3,∞; T:4,∞; T:6,∞; T:7,∞; T’:3,∞; T’:6,∞, with a second T’:6,∞ label shown. Legend: active point, end point. Crucial bit coming!

