Published by Jennifer Quinn. Modified about 1 year ago.

1
CSC 448: Bioinformatics Algorithms
Alex Dekhtyar
Ukkonen’s Algorithm for Generalized Suffix Trees

2
Example for two DNA sequences: T and T’ = reverse(complement(T))
T = AATGTT
T’ = AACATT
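As a quick check of the example, the reverse complement can be computed directly. This is a minimal sketch; the names `COMPLEMENT` and `reverse_complement` are illustrative and do not come from the slides.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complement every base, then read the result backwards."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


T = "AATGTT"
T_prime = reverse_complement(T)
print(T_prime)  # AACATT
```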

3
Steps
1. Create SuffixTree(T$) using Ukkonen’s algorithm. Keep suffix links.
2. Add “T:” to all leaf labels (designate current labels).
3. Traverse SuffixTree(T$) using the prefix of T’. The stopping point is the new active point.
4. Use Ukkonen’s algorithm to insert the remainder of T’.
   4.1. Label leaves “T’: [x, ∞]”.
   4.2. Modification: traverse to existing leaves to leave a label.
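To illustrate what the finished structure records, here is a small sketch that builds a generalized suffix trie by naive insertion of every suffix. It is deliberately not Ukkonen's algorithm (no suffix links, no active point, quadratic time); it only shows the per-string leaf labels that steps 2 and 4.1 introduce. The function names, the dictionary-based node layout, and the `#` terminator for T’ are assumptions made for this example.

```python
def build_generalized_suffix_trie(strings):
    """Naive construction: insert every suffix of every string.

    `strings` maps a label (e.g. "T") to a terminated text (e.g. "AATGTT$").
    Each leaf stores (label, 1-based start position of the suffix).
    """
    root = {"children": {}, "labels": []}
    for name, text in strings.items():
        for start in range(len(text)):
            node = root
            for ch in text[start:]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "labels": []}
                )
            node["labels"].append((name, start + 1))
    return root


def occurrences(root, pattern):
    """Return sorted (label, start) pairs for every occurrence of pattern."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:  # collect all leaf labels below the matched node
        current = stack.pop()
        found.extend(current["labels"])
        stack.extend(current["children"].values())
    return sorted(found)


# Distinct terminators keep the suffixes of the two strings apart.
trie = build_generalized_suffix_trie({"T": "AATGTT$", "T'": "AACATT#"})
print(occurrences(trie, "TT"))   # occurs in both strings, start 5 in each
print(occurrences(trie, "ATT"))  # occurs only in T', start 4
```

Ukkonen's construction reaches the same labeled leaves in linear time by compressing unary paths into edges and reusing suffix links, which is what the steps above describe.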

4
T = AATGTT, T’ = AACATT
Tree: ε, ┴
Trie: ε, ┴

5
T = AATGTT, T’ = AACATT
Step 1: insert first string
Tree: ε, ┴
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT
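The trie states listed on this slide are exactly the distinct non-empty substrings of T. A short sketch to enumerate them (the helper name `substrings` is an assumption, not from the slides):

```python
def substrings(text: str) -> set:
    """All distinct non-empty substrings of text."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


states = substrings("AATGTT")
print(len(states))  # 18
print(sorted(states, key=lambda s: (len(s), s)))
```

The count of 18 matches the number of non-root states listed for the trie, which is why the explicit trie grows quadratically while the suffix tree stays linear in size.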

6
T = AATGTT, T’ = AACATT
Step 1: insert first string
Tree: ε, ┴
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT
Markers: last boundary path, last active point

7
T = AATGTT, T’ = AACATT
Step 1: insert first string
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT
Tree: ε, ┴; edge labels T, A, A, T, G, TG; leaf labels [2,∞], [3,∞], [4,∞], [6,∞]
Markers: last boundary path, last active point

8
T = AATGTT$, T’ = AACATT
Step 1: insert first string
Step 1.5: finish the tree
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $; leaf labels [2,∞], [3,∞], [4,∞], [6,∞], [7,∞]
Markers: last boundary path, last active point

9
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $; leaf labels [2,∞], [3,∞], [4,∞], [6,∞], [7,∞]
Markers: last boundary path, last active point, new active point
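The traversal in this step stops after the longest prefix of T’ that already occurs in T. The sketch below finds that length with plain substring search instead of walking the tree, so it only illustrates where the new active point ends up, not how Ukkonen's algorithm gets there. The function name `longest_matching_prefix` is hypothetical.

```python
def longest_matching_prefix(tree_text: str, new_text: str) -> int:
    """Length of the longest prefix of new_text that occurs in tree_text."""
    k = 0
    while k < len(new_text) and new_text[: k + 1] in tree_text:
        k += 1
    return k


T, T_prime = "AATGTT", "AACATT"
k = longest_matching_prefix(T, T_prime)
print(k, T_prime[:k])  # 2 AA
# Insertion of T' resumes at 1-based position k + 1 (the character 'C').
```

For this example the traversal consumes "AA", so the first leaf created for T’ starts at position 3.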

10
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $; leaf labels [2,∞], [3,∞], [4,∞], [6,∞], [7,∞]
Markers: active point

11
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Make leaf nodes “generalized”
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞]
Markers: active point

12
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $, C, T, C, C; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞], T’:[3,∞]
Markers: active point

13
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C, AACA, ACA, CA
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $, C, T, C, C; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞], T’:[3,∞]
Markers: active point, end point
Nothing to do!

14
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C, AACA, ACA, CA, AACAT, ACAT, CAT
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $, C, T, C, C; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞], T’:[3,∞]
Markers: active point, end point
Nothing to do!

15
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C, AACA, ACA, CA, AACAT, ACAT, CAT, ATT
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $, C, T, C, C, G, T; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞], T’:[3,∞], T’:[6,∞]
Markers: active point, end point
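The trie states that appear while T’ is inserted can be reproduced by tracking, for each character of T’, which substrings ending at that character have not been seen before. This sketch works on plain substring sets rather than on the suffix tree, and the names `substrings` and `new_states_per_step` are assumptions made for illustration.

```python
def substrings(text: str) -> set:
    """All distinct non-empty substrings of text."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
    }


def new_states_per_step(first: str, second: str) -> list:
    """For each prefix length of `second`, list the substrings ending there
    that are absent from `first` and from the earlier part of `second`."""
    seen = substrings(first)
    steps = []
    for end in range(1, len(second) + 1):
        fresh = [
            second[start:end]
            for start in range(end)
            if second[start:end] not in seen
        ]
        seen.update(fresh)
        steps.append(fresh)
    return steps


for position, fresh in enumerate(new_states_per_step("AATGTT", "AACATT"), 1):
    print(position, fresh)
# 1 []
# 2 []
# 3 ['AAC', 'AC', 'C']
# 4 ['AACA', 'ACA', 'CA']
# 5 ['AACAT', 'ACAT', 'CAT']
# 6 ['AACATT', 'ACATT', 'CATT', 'ATT']
```

Positions 1 and 2 add nothing, which corresponds to traversing the shared prefix "AA". In the suffix tree, the states at positions 4 and 5 come for free from open leaf edges; only "ATT" at position 6 needs a new branch.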

16
T = AATGTT$, T’ = AACATT
Step 2: Traverse the prefix of T’
Step 3: Start inserting the rest of T’
Trie: ε, ┴, A, AA, AAT, AATG, AATGT, AATGTT, T, AT, ATG, TG, G, ATGT, TGT, GT, ATGTT, TGTT, GTT, TT; new: AAC, AC, C, AACA, ACA, CA, AACAT, ACAT, CAT, ATT
Tree: ε, ┴; edge labels T, A, A, T, G, T, G, $, $, C, T, C, C, G, T; leaf labels T:[2,∞], T:[3,∞], T:[4,∞], T:[6,∞], T:[7,∞], T’:[3,∞], T’:[6,∞], T’:[6,∞]
Markers: active point, end point
Crucial bit coming!

