Published by Destiny Mainwaring. Modified over 2 years ago.

1
Linear Time Construction of Suffix Tree
Presented by Dr. Shazzad Hosain, Asst. Prof., EECS, NSU

2
High-Level View of Ukkonen’s Algorithm
Ukkonen’s algorithm is divided into m phases. In phase i+1, tree i+1 is constructed from tree i.
Each phase i+1 is further divided into i+1 extensions, one for each of the i+1 suffixes of S[1…i+1].
Phases and their extensions:
1 : S[1…1] {a}
2 : S[1…2] {ab, b}
3 : S[1…3] {aba, ba, a}
[Figure: the trees built in phases 1 to 3]
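The phase and extension schedule described above can be enumerated directly. The following is a minimal sketch that only lists which suffix each extension handles; it builds no tree, and the function name `phases_and_extensions` is made up for illustration.

```python
def phases_and_extensions(s):
    """For each phase i (1-based), list the suffixes of S[1..i] in extension order."""
    schedule = []
    for i in range(1, len(s) + 1):      # phase i works on the prefix S[1..i]
        prefix = s[:i]
        # extension j (1-based) handles the suffix S[j..i]
        schedule.append([prefix[j:] for j in range(i)])
    return schedule


for phase, suffixes in enumerate(phases_and_extensions("aba"), start=1):
    print(phase, suffixes)
```

For the string aba this reproduces the three phases listed on the slide: {a}, {ab, b}, {aba, ba, a}.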

3
Phases for the string MISSISSIPI (character positions 1234567890)
1 : M
2 : MI
3 : MIS
4 : MISS
5 : MISSI
6 : MISSIS
7 : MISSISS
8 : MISSISSI
9 : MISSISSIP
10 : MISSISSIPI
[Figure: suffix tree for MISSISSIPI with leaves 1 to 9]
Corollary 6.1.1: In Ukkonen’s algorithm, any newly created internal node will have a suffix link from it by the end of the next extension.
How do suffix links help?

4
What has been achieved so far? Not much. With suffix links alone, the worst-case running time is still O(m^2) for a phase.

5
Trick 1: Skip/Count Trick
There must be a path labeled γ from s(v).

6
Trick 1: Skip/Count Trick (continued)
Walking down along γ character by character takes time proportional to |γ|.
The skip/count trick reduces the traversal time to something proportional to the number of nodes on the path.
[Figure: a path labeled zabcdefghy, with nodes marked along it and edge lengths 2, 2, 3, 3]
But what does it buy in terms of worst-case bounds?
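The skip/count idea can be sketched as follows. This is an illustrative sketch, not the presenter's code: the `Node` class, the `(start, end, child)` edge tuples with 0-based half-open indices, and the function name `skip_count` are all assumptions. Because γ is known to be in the tree, only the first character of each edge is inspected and the rest of the edge is skipped by its length.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the first character of an edge label to (start, end, child),
    # where the label is text[start:end] (0-based, end exclusive).
    children: dict = field(default_factory=dict)


def skip_count(text, node, start, length):
    """Walk down from `node` along text[start:start+length], assumed to be present.

    Returns (node, remaining, hops): the last node reached, how many characters
    of the path lie on the edge below that node, and how many edges were skipped.
    """
    hops = 0
    while length > 0:
        e_start, e_end, child = node.children[text[start]]
        edge_len = e_end - e_start
        if edge_len > length:       # the path ends inside this edge
            break
        start += edge_len           # jump over the whole edge in one step
        length -= edge_len
        node = child
        hops += 1
    return node, length, hops


# A chain of edges with lengths 2, 2, 3, 3 spelling out zabcdefghy.
text = "zabcdefghy"
nodes = [Node() for _ in range(5)]
for k, (a, b) in enumerate([(0, 2), (2, 4), (4, 7), (7, 10)]):
    nodes[k].children[text[a]] = (a, b, nodes[k + 1])

end, remaining, hops = skip_count(text, nodes[0], 0, len(text))
print(hops, remaining)
```

The walk over ten characters costs four edge hops here, one per node on the path, rather than ten character comparisons.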

7
Lemma 6.1.2: Let (v, s(v)) be any suffix link traversed during Ukkonen’s algorithm. At that moment, the node-depth of v is at most one greater than the node-depth of s(v).
[Figure: example node-depths along suffix links: v = 2, s(v) = 1; v = 3, s(v) = 3; v = 4, s(v) = 5]

8
Theorem 6.1.1: Using the skip/count trick, any phase of Ukkonen’s algorithm takes O(m) time.
In a single extension, the algorithm:
– walks up at most one edge
– finds a suffix link and traverses it
– walks down some number of nodes
– applies the suffix extension rules
– and may add a suffix link
All operations except the down-walk take constant time, so only the down-walk time needs to be analyzed.

9
Theorem 6.1.1 (continued): analysis of the down-walk
– Walking up at most one edge decreases the current node-depth by at most one.
– Traversing the suffix link decreases the node-depth by at most another one (Lemma 6.1.2).
– Each step of the down-walk moves to a node of greater node-depth.
– Over the entire phase, the current node-depth is therefore decremented at most 2m times.
– Since no node can have depth greater than m, the total possible increment to the current node-depth is bounded by 3m over the entire phase.
– The total number of edge traversals during down-walks is therefore bounded by 3m.
– Since each edge traversal takes constant time with the skip/count trick, all the down-walking in a phase takes O(m) time.
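The counting argument can be written as a single inequality. The symbols here are introduced only for illustration: D is the total number of down-walk edge traversals in the phase, U is the total number of node-depth decrements (at most two per extension, with at most m extensions in a phase), and d_start and d_end are the current node-depth at the start and end of the phase.

```latex
\[
D \;=\; d_{\text{end}} - d_{\text{start}} + U
  \;\le\; (m - 0) + 2m
  \;=\; 3m
\]
```

The first term uses 0 ≤ d_start and d_end ≤ m; the second uses U ≤ 2m.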

10
Complexity
There are m phases.
Each phase takes O(m) time.
So the running time is O(m^2).
Two more tricks and we are done.

11
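Simple Implementation Detail
A suffix tree may require O(m^2) space if the edge labels are written out explicitly.
Consider a string of 26 distinct characters, for example abcdefghijklmnopqrstuvwxyz.
Every suffix begins with a distinct character, so there are 26 edges out of the root.
Writing out the edge labels requires 26 x 27 / 2 characters in all.
So O(m) space is impossible to achieve in this representation.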
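The 26 x 27 / 2 figure can be checked with a few lines. This sketch assumes the 26-letter lowercase alphabet as the example string, so that each suffix hangs off the root on its own edge; the function name `explicit_label_chars` is hypothetical.

```python
import string


def explicit_label_chars(s):
    """Total characters written if every suffix of s is spelled out on its own edge.

    Only valid when all characters of s are distinct, so that no two
    suffixes share a prefix and the tree is a root with one edge per suffix.
    """
    return sum(len(s) - j for j in range(len(s)))


alphabet = string.ascii_lowercase       # assumed example string, 26 distinct characters
print(len(set(alphabet)), explicit_label_chars(alphabet))
```

The sum 26 + 25 + … + 1 equals 26 x 27 / 2 = 351, which grows quadratically in the string length.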

12
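Alternative Representation of Suffix Tree: Edge Label Compression
Each edge label is replaced by a pair of indices giving the start and end positions of that substring in S.
[Figure: a fragment of the suffix tree (left) and the same fragment with edge labels compressed (right); one compressed label is annotated "Could be 8,9"]
The number of edges is at most 2m – 1, and two numbers are written on each edge, so the space is O(m).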
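Edge label compression can be illustrated on a string of distinct characters, where the tree is simply a root with one leaf edge per suffix. This is a minimal sketch under that assumption; the example string and the helper names `compressed_edges` and `label` are made up, and indices are 1-based and inclusive to match the slide's numbering.

```python
text = "abcdefghijklmnopqrstuvwxyz"     # assumed example: all characters distinct


def compressed_edges(s):
    """One (start, end) pair per suffix; valid only when all characters are distinct."""
    m = len(s)
    return [(j, m) for j in range(1, m + 1)]


def label(s, edge):
    """Recover the explicit edge label from its (start, end) pair."""
    start, end = edge
    return s[start - 1:end]


edges = compressed_edges(text)
print(len(edges) * 2)                   # numbers stored: two per edge
print(label(text, (24, 26)))            # the label is recovered on demand
```

Storing two numbers per edge keeps the total at 2 x 26 = 52 values for this string, while the labels themselves can still be read back from the text whenever they are needed.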

13
Phases 1 to 8 for the string MISSISSIPI (character positions 1234567890)
1 : M
2 : MI
3 : MIS
4 : MISS
5 : MISSI
6 : MISSIS
7 : MISSISS
8 : MISSISSI
[Figure: the tree with leaves 1 to 4]
Observation 1: Rule 3 is a show stopper. Once it applies, we stop further extension in that phase.
Extensions in each phase, marked in the figure as implicit or explicit:
7 : 1234567
8 : 12345678

14
Observation 2: Once a leaf, always a leaf.
[Figure: the tree for MISSISSIPI with leaves 1 to 4 and leaf edges labeled 1,7  2,7  4,7  3,7]
The leaf edges share one end index e, so setting e = 8 extends all of them in phase 8.
Explicit extensions are the major cost.
Extensions in each phase:
7 : 1234567
8 : 12345678

15
Once a leaf, always a leaf (continued)
[Figure: the same tree, leaf edges 1,7  2,7  4,7  3,7 with e = 8]
Explicit extensions are the major cost.
In any phase, the cost is only for the explicit extensions.
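The shared end index can be sketched as a single mutable object referenced by every leaf edge. This is an illustration only: the class names `GlobalEnd` and `LeafEdge` are hypothetical, and the start indices are copied from the slide's leaf labels (1,7  2,7  4,7  3,7) rather than computed.

```python
class GlobalEnd:
    """The shared end index e used by all leaf edges."""

    def __init__(self, value=0):
        self.value = value


class LeafEdge:
    def __init__(self, start, end):
        self.start = start      # 1-based start position in the text
        self.end = end          # reference to the shared GlobalEnd, not a copy

    def label(self, text):
        return text[self.start - 1:self.end.value]


text = "MISSISSIPI"
e = GlobalEnd(7)
leaves = [LeafEdge(start, e) for start in (1, 2, 4, 3)]

e.value = 8     # phase 8: one assignment extends every leaf edge
print([(leaf.start, leaf.end.value) for leaf in leaves])
print(leaves[0].label(text))
```

Because every leaf edge holds a reference to the same object, the implicit extensions of a phase cost one assignment in total, which is why only the explicit extensions need to be counted.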

16
Once a leaf, always a leaf: phase 9
9 : MISSISSIP
Extensions in each phase:
8 : 12345678
9 : 123456789
With e = 9 the leaf edges become 1,9  2,9  4,9  3,9.
[Figure: the tree after phase 9, with new leaves 5 to 9 ending in P; edge labels shown include 2,5  6,9  9,9]
In any phase, the cost is only for the explicit extensions.

17
Total cost for MISSISSIPI (character positions 1234567890)
[Figure: extensions shown for two consecutive phases, 8 : 12345 and 9 : 123456789]
Since there are only m phases, the total number of explicit extensions is bounded by 2m.
So the total number of down-walks is bounded by O(m).
In other words, the time to construct the suffix tree is bounded by O(m).

18
Reference
Dan Gusfield, Algorithms on Strings, Trees and Sequences, Chapter 6
