Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting.


1 Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge Z.S. Peng H.F. Ting

2 The Forest Edit Distance

3 Edit distance of two ordered, labeled forests. Edit operations between E and F: relabeling node i in E with the label of node j in F. [Figure: forests E and F with numbered, labeled nodes]

4 Edit distance of two ordered, labeled forests. Edit operations between E and F: relabeling node i in E with the label of node j in F. Example: relabel (3,5). [Figure: forests E and F with numbered, labeled nodes]

5 Edit distance of two ordered, labeled forests. Edit operations between E and F: relabeling node i in E with the label of node j in F. Cost of the operation: the cost assigned to the pair (3,5). [Figure: forests E and F with numbered, labeled nodes]

6 Edit distance of two ordered, labeled forests. Edit operations between E and F: delete node i from E. [Figure: forests E and F with numbered, labeled nodes]

7 Edit distance of two ordered, labeled forests. Edit operations between E and F: delete node i from E. Example: delete (2,-). [Figure: forests E and F with numbered, labeled nodes]

8 Edit distance of two ordered, labeled forests. Edit operations between E and F: delete node i from E. Example: delete (2,-). [Figure: E after node 2 (label f) has been deleted, shown next to F]

9 Edit distance of two ordered, labeled forests. Edit operations between E and F: delete node i from E. Cost of the operation: the cost assigned to the pair (2,-). [Figure: E after node 2 (label f) has been deleted, shown next to F]

10 Edit distance of two ordered, labeled forests. Edit operations between E and F: delete node j from F. Cost of the operation: the cost assigned to the pair (-,j). [Figure: forests E and F with numbered, labeled nodes]

11 Edit distance of two ordered, labeled forests. The edit distance between E and F is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F'. [Figure: forests E and F, with copies to be edited into E' and F']

12 Edit distance of two ordered, labeled forests. The edit distance between E and F is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F'. [Figure: forests E and F, with the edited copies E' and F']
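
The definition on slides 11 and 12 can be illustrated with a small program. The sketch below is not the algorithm from this talk: it is the textbook recursion on the rightmost roots of the two forests, memoized, and it assumes unit costs (1 for each relabel or delete, 0 for matching equal labels). The tuple representation and the names `size` and `forest_edit_distance` are hypothetical choices made for this example.

```python
from functools import lru_cache

# Assumed representation: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees ordered left to right.

def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_edit_distance(E, F):
    """Unit-cost edit distance between ordered, labeled forests E and F."""
    if not E:
        return size(F)   # delete every remaining node of F
    if not F:
        return size(E)   # delete every remaining node of E
    (e_label, e_kids), (f_label, f_kids) = E[-1], F[-1]
    # Delete the rightmost root of E; its children take its place.
    delete_e = forest_edit_distance(E[:-1] + e_kids, F) + 1
    # Delete the rightmost root of F; its children take its place.
    delete_f = forest_edit_distance(E, F[:-1] + f_kids) + 1
    # Pair the two rightmost roots (relabel if the labels differ).
    pair = (forest_edit_distance(E[:-1], F[:-1])
            + forest_edit_distance(e_kids, f_kids)
            + (0 if e_label == f_label else 1))
    return min(delete_e, delete_f, pair)

E = (("a", (("b", ()), ("c", ()))),)   # tree a with children b, c
F = (("a", (("c", ()),)),)             # tree a with child c
print(forest_edit_distance(E, F))
```

Deleting a node from F plays the role that insertion plays in other formulations, which is why only relabel and the two deletions appear.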

13 Edit distance of two ordered, labeled forests. The guided edit distance between E and F with respect to a third forest G is the minimum cost of edit operations that transform E to E' and F to F' such that E' = F' and E' includes G as a subforest. [Figure: forests E and F, the edited forests E' and F', and the guide forest G]

14 Application 1: RNA comparisons. Cherry small circular viroid-like RNA (GI:2347024) between base 287 and base 337. The hammerhead motif of the RNA is printed in bold.

15 Application 2: Comparing XML documents. XML documents that share the same Document Type Definition (DTD) should be aligned with respect to this DTD to get more accurate results.

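16 The algorithms. For the edit distance of E and F: Tai 1979; Zhang and Shasha 1989; Klein 1998. For the guided edit distance of E and F with respect to G: this paper. [Running-time formulas not captured in the transcript]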

17 Special Cases [Figure: three label sequences over the letters a, b, c, f]

18 Special Cases: Constrained Longest Common Subsequence; Constrained Sequence Alignment. [Figure: three label sequences over the letters a, b, c, f]

19 The algorithms. Constrained Longest Common Subsequence: Tsai 2003. Constrained Sequence Alignment: Chin et al. This paper: since G has one leaf, the running time simplifies. [Running-time formulas not captured in the transcript]
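
To make the sequence special case concrete, here is a minimal sketch of the constrained longest common subsequence problem: find the length of a longest common subsequence of `a` and `b` that also contains `p` as a subsequence. It uses a standard three-index dynamic program and is not claimed to be the method of Tsai, of Chin et al., or of this talk. The function name and the choice to return `None` when no valid subsequence exists are assumptions of this example.

```python
def constrained_lcs_length(a, b, p):
    """Length of the longest common subsequence of a and b that contains p
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")
    n, m, r = len(a), len(b), len(p)
    # f[i][j][k]: best length using a[:i] and b[:j] while covering p[:k].
    f = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # An empty prefix can only cover the empty constraint.
                    f[i][j][k] = 0 if k == 0 else NEG
                    continue
                # Skip the last symbol of a or of b.
                best = max(f[i - 1][j][k], f[i][j - 1][k])
                if a[i - 1] == b[j - 1]:
                    # Match the two symbols without consuming p.
                    best = max(best, f[i - 1][j - 1][k] + 1)
                    if k > 0 and a[i - 1] == p[k - 1]:
                        # Match the two symbols and consume p[k-1].
                        best = max(best, f[i - 1][j - 1][k - 1] + 1)
                f[i][j][k] = best
    result = f[n][m][r]
    return None if result == NEG else result

print(constrained_lcs_length("bca", "abc", ""))   # unconstrained LCS
print(constrained_lcs_length("bca", "abc", "a"))  # must contain "a"
```

With the empty constraint the program computes the ordinary longest common subsequence; requiring the symbol `a` forces a shorter answer in this example because `a` is last in one sequence and first in the other.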

20 Our algorithm for computing  (E,F,G) Dynamic Programming

21 The sub-problems. Post-order numbering (naming) of the nodes. [Figure: a forest whose 23 nodes are numbered in post-order]

22 The sub-problems: a "consecutive" sub-forest. [Figure: a forest whose 23 nodes are numbered in post-order, with a consecutive sub-forest highlighted]

23 The sub-problems: a "consecutive" sub-forest. [Figure: a forest whose 23 nodes are numbered in post-order, with a consecutive sub-forest highlighted]
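
The post-order numbering used to name the nodes can be sketched as follows. The forest representation (a tree is `(label, children)`, a forest is a tuple of trees) and the function name are assumptions made for this example; the node at position `i - 1` of the returned list is the node numbered `i`.

```python
def postorder_labels(forest):
    """Return the node labels of a forest in post-order.

    Children are visited left to right before their parent, and the trees
    of the forest are visited left to right.
    """
    order = []

    def visit(tree):
        label, children = tree
        for child in children:
            visit(child)
        order.append(label)

    for tree in forest:
        visit(tree)
    return order

# Forest with two trees: a(b, c) and the single node d.
forest = (("a", (("b", ()), ("c", ()))), ("d", ()))
for number, label in enumerate(postorder_labels(forest), start=1):
    print(number, label)
```

Under this numbering, every node receives a larger number than all of its descendants, which is what lets a range of consecutive numbers name a sub-forest.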

24 The sub-problems [Figure: forests E, F, and G with post-order numbered nodes]

25 The sub-problems [Figure: forests E, F, and G with post-order numbered nodes]

26 The value of a sub-problem is equal to the minimum of the following five cases, numbered 1 to 5. [Formulas not captured in the transcript]

27 Case 1. [Figure: forests E, F, and G with post-order numbered nodes]

28 [Figure: forests E, F, and G with post-order numbered nodes]

29 Case 2. [Figure: forests E, F, and G with post-order numbered nodes]

30 Case 3. [Figure: forests E, F, and G with post-order numbered nodes]

31 [Figure: forests E, F, and G with post-order numbered nodes]

32 Case 4. [Figure: forests E, F, and G with post-order numbered nodes]

33 [Figure: forests E, F, and G with post-order numbered nodes]

34 Case 5. [Figure: forests E, F, and G with post-order numbered nodes]

35 [Figure: forests E, F, and G with post-order numbered nodes]

36 [Figure: forests E, F, and G with post-order numbered nodes]

37 The order for solving the sub-problems
   for i = 1 to |E|
     for j = 1 to |F|
       for h = 1 to |G|
         for k = 1 to (|G| - h + 1)
           if k is a leaf then find [expression not captured in the transcript]

38 The time complexity

39 Sparsify the dynamic program using a clever trick of Zhang and Shasha

40 Key-root: a node is a key-root if it is the root or has a left sibling. [Figure: forests E, F, and G with post-order numbered nodes and the key-roots marked]

41 No. of key-roots ≤ no. of leaves. [Figure: forests E, F, and G with post-order numbered nodes and the key-roots marked]
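
The key-root definition and the bound on slide 41 can be checked on a small example. The sketch below assumes the same hypothetical forest representation as before (a tree is `(label, children)`), and treats every tree root of a forest as a key-root: the first one as the root, the others because they have a left sibling.

```python
def key_roots_and_leaves(forest):
    """Return (post-order numbers of the key-roots, number of leaves)."""
    key_roots = []
    leaves = 0
    counter = 0  # post-order number of the most recently finished node

    def visit(tree, is_key_root):
        nonlocal leaves, counter
        _, children = tree
        for index, child in enumerate(children):
            # Every child except the leftmost one has a left sibling.
            visit(child, index > 0)
        counter += 1
        if not children:
            leaves += 1
        if is_key_root:
            key_roots.append(counter)

    for tree in forest:
        visit(tree, True)
    return key_roots, leaves

# Tree a(b, c(d, e)); post-order numbers: b=1, d=2, e=3, c=4, a=5.
tree = (("a", (("b", ()), ("c", (("d", ()), ("e", ()))))),)
roots, leaf_count = key_roots_and_leaves(tree)
print(roots, leaf_count)
```

Restricting the outer loops to key-roots is what shrinks the number of sub-problems, since a forest with few leaves has few key-roots.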

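42 To compute the guided edit distance of (E, F, G), that is, of (E||1..|E|, F||1..|F|, G||1..|G|):
   for i = 1 to |E|
     for j = 1 to |F|
       for h = 1 to |G|
         for k = 1 to (|G| - h + 1)
           if k is a leaf then find [expression not captured in the transcript]

43 To compute the guided edit distance of (E, F, G), that is, of (E||1..|E|, F||1..|F|, G||1..|G|):
   for i = 1 to |E|
     for j = 1 to |F|
       for h = 1 to |G|
         for k = 1 to (|G| - h + 1)
           if k is a leaf and i and j are key-roots then find [expression not captured in the transcript]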

44 The new running time

45 Thank you

