Geometric Matching on Sequential Data Veli Mäkinen AG Genominformatik Technical Fakultät Bielefeld Universität.


1 Geometric Matching on Sequential Data Veli Mäkinen AG Genominformatik Technical Fakultät Bielefeld Universität

Stringology Haifa 2005, Geometric matching on sequential data

2 Introduction
• Motivation: To study problems in the intersection of geometry and stringology.
• Applications to time-series data.

3 Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, 2003 + Cieliebak & Mäkinen, 2005).

4 1D point set matching under translations
• Two point sets A and B of sizes m and n.
• Problem 1a: Find the largest common point set of f(A) and B over translations f.
• Problem 1b: Find the largest common point set of f(A) and a contiguous subset of B.
• Let k be the number of unmatched points.

5 Example
[Figure: point sets A and B and the translated set f(A). Problem 1a: k = 3. Problem 1b: k = 1.]

6 Solutions
• Trivial in O(m^2 n log n) time.
• Easy in O(mn log m) time.
• Akutsu gives an O(k^3 + n log n) time solution.

7 Akutsu’s solution
• Use differential encoding for A and B.
• A' = a_2 - a_1, a_3 - a_2, ..., a_m - a_{m-1}; B' = b_2 - b_1, b_3 - b_2, ..., b_n - b_{n-1}.
• Construct the suffix tree T of A'#B'$.
• Preprocess T for LCA queries.

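8 Akutsu’s solution...
• Let Jump(a_i, b_j) = h, where h is the largest integer such that a_{i+t} - a_i = b_{j+t} - b_j for all 0 <= t < h, that is, the runs a_i, ..., a_{i+h-1} and b_j, ..., b_{j+h-1} coincide once a_i is translated onto b_j.
• Jump(a_i, b_j) can be computed in O(1) time.
[Figure: the aligned runs a_i ... a_{i+h-1} and b_j ... b_{j+h-1}.]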
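As a small illustration of the differential encoding and of what Jump measures, here is a sketch in Python. It is not Akutsu's construction: the function names `diff_encode` and `jump_naive` are made up for this example, indices are 0-based, and the scan costs O(h) per query, whereas the slides obtain O(1) from a suffix tree with LCA preprocessing. Point sets are assumed to be sorted lists of exactly comparable numbers (for instance integers).

```python
def diff_encode(points):
    """Return the consecutive differences of a sorted list of 1D points."""
    return [q - p for p, q in zip(points, points[1:])]


def jump_naive(a_diff, b_diff, i, j):
    """Largest h such that points a[i..i+h-1] and b[j..j+h-1] agree up to translation.

    a_diff and b_diff are the difference sequences A' and B'; i and j are
    0-based indices into the original point lists. This is a plain O(h) scan,
    used here only as a stand-in for the O(1) suffix tree + LCA query.
    """
    h = 1  # a single point always matches a single point
    while (i + h - 1 < len(a_diff) and j + h - 1 < len(b_diff)
           and a_diff[i + h - 1] == b_diff[j + h - 1]):
        h += 1
    return h
```

For A = [1, 3, 4, 8] and B = [0, 5, 7, 8, 10], the encodings are A' = [2, 1, 4] and B' = [5, 2, 1, 2], and the query starting at a = 1 and b = 5 extends over three points (1, 3, 4 against 5, 7, 8).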

9 Akutsu’s solution...
• Observation: One of the first k+1 points in both A and B must match.
• Each match defines a translation.
• For each translation, one needs at most k+1 queries to Jump() to find out whether the overlap is large enough.

10 Akutsu’s solution...
• Theorem 1: Problem 1a can be solved in O(k^3 + n log n) time and Problem 1b in O(k^2 n + n log n) time.
• Akutsu also gives reductions from 2D/3D problems to 1D achieving good bounds.

11 Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, 2003 + Cieliebak & Mäkinen, 2005).

12 Linear 1D point set matching
• Let us consider a generalization where we also allow scaling and noise.
• We search for the best linear mapping from point set A to point set B:
  - the maximum number of points of A should move ε-close to points of B.

13 Example
[Figure: point sets A and B.]

14 Example...
[Figure: point sets A and B and the mapped set f(A).]

15 Linear 1D point set matching...
• There is an optimum mapping such that two points of A are mapped exactly at ε-distance from some points of B.
• One mapping fixes the translation, the second the scale around the new origin defined by the translation.
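The observation suggests a finite set of candidate mappings: pick two points of A, two points of B, and a sign for each ε offset, then solve for the linear map. The sketch below enumerates them under some assumptions that are not on the slides: the map is written as f(x) = scale * x + translation, only positive scales are kept, the points of A are distinct, and the name `candidate_mappings` is invented for this example.

```python
from itertools import permutations, product


def candidate_mappings(A, B, eps):
    """Yield (scale, translation) pairs for maps f(x) = scale * x + translation.

    Each candidate sends two distinct points of A to positions exactly eps
    away from some points of B. Restricting to scale > 0 is an assumption
    made for this sketch.
    """
    for a1, a2 in permutations(A, 2):            # ordered pairs of distinct points
        for b1, b2 in product(B, repeat=2):
            for d1, d2 in product((-eps, eps), repeat=2):
                t1, t2 = b1 + d1, b2 + d2        # targets at distance eps from b1, b2
                scale = (t2 - t1) / (a2 - a1)
                if scale > 0:
                    yield scale, t1 - scale * a1
```

The enumeration produces O((mn)^2) candidates, which is consistent with the (mn)^2 factor in the running times quoted later in the talk. Passing `fractions.Fraction` values keeps the arithmetic exact.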

16 Example
[Figure: A, B and f(A); interval label 2ε.]

17 Degenerate solution!
[Figure: B, A and f(A); interval label 2ε.]

18 One-to-one mapping
• To avoid the degenerate solution, one needs a better definition of the mapping searched for.
• Hence, we search for a mapping producing a maximum-size one-to-one matching between the points (Problem 2).
[Figure: f(A) and B; intervals labelled 2ε.]

19 Solving one-to-one case
• Consider a fixed translation and scale.
• Construct a bipartite graph having edges between points of f(A) and B that are within ε-distance.
• Solve the maximum matching problem on this graph.
[Figure: f(A) and B; intervals labelled 2ε.]
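For one fixed mapping, the three steps can be sketched as follows. This is only an illustration: it uses the textbook augmenting-path method rather than any particular matching algorithm named in the talk, it treats an edge as "distance at most eps" (an assumption about the intended meaning), and `max_matching_size` is a hypothetical name.

```python
def max_matching_size(fA, B, eps):
    """Size of a maximum one-to-one matching between points within eps of each other.

    fA holds the already mapped points f(A); B holds the target points.
    """
    # Adjacency lists of the bipartite graph: f(A)-point i is joined to B-point j
    # when the two are at distance at most eps.
    adj = [[j for j, b in enumerate(B) if abs(a - b) <= eps] for a in fA]
    match_of_b = [None] * len(B)   # index of the f(A)-point matched to each B-point

    def augment(i, seen):
        """Try to find an augmenting path starting from f(A)-point i."""
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_b[j] is None or augment(match_of_b[j], seen):
                match_of_b[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(fA)))
```

With fA = [10, 12, 50], B = [11, 53, 90] and eps = 5, both 10 and 12 are close to 11, but the one-to-one requirement lets only one of them use it, so the matching has size 2.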

20 Solving one-to-one case...
• Repeating the algorithm on each relevant translation and scale gives the optimum solution.
• The overall time complexity is O((mn)^2 g(mn)), where g(x) is the complexity of the maximum matching algorithm on a graph with x edges.

21 Solving one-to-one case faster
• Consider a fixed translation, and sort the relevant scales from smallest to largest.
• Observation [Alt et al. 88]: The graph G_i corresponding to the ith scale differs from the graph G_{i-1} of the (i-1)th scale by one edge.
• The maximum matching on G_i can be found by searching for an augmenting path in G_{i-1} with one edge added or deleted.

22 Solving one-to-one case faster...
• Incremental computation gives an O((mn)^3) time solution.
• Theorem 2: Problem 2 can be solved in O((mn)^2 (m+n)) time.
• To obtain the result, we exploit the monotonicity of the match graph.

23 Staircase property
[Figure: f_i(A) against B.]

24 Greedy algorithm is enough
[Figure: f_i(A) against B.]
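The slides show the greedy step only as a figure, so the following is a guess at its spirit rather than the authors' procedure. On a line, with the same tolerance eps for every point, a standard left-to-right sweep over both sorted sets finds a maximum one-to-one matching in O(m + n) time after sorting. The name `greedy_matching_size` is hypothetical.

```python
def greedy_matching_size(fA, B, eps):
    """Maximum one-to-one matching size between 1D points within eps, by a sweep."""
    a_sorted, b_sorted = sorted(fA), sorted(B)
    matched, j = 0, 0
    for a in a_sorted:
        # B-points left of a - eps are too far from a and from every later point.
        while j < len(b_sorted) and b_sorted[j] < a - eps:
            j += 1
        if j < len(b_sorted) and b_sorted[j] <= a + eps:
            matched += 1   # pair a with the leftmost B-point still available
            j += 1
    return matched
```

The sweep is safe because all tolerance intervals have the same width: giving the leftmost usable B-point to the leftmost f(A)-point never blocks a better pairing further right.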

25 scale i => scale i+1
[Figure: f_{i+1}(A) against B.]

26 scale i+1
[Figure: f_{i+1}(A) against B.]

27 scale i+1 => scale i+2
[Figure: f_{i+2}(A) against B.]

28 scale i+2
[Figure: f_{i+2}(A) against B.]

29 Observation - open question
• Observation: With only translations and noise, we obtain O(mn(m+n)) time.
• The staircase matrix changes only by one cell when moving from one scale to another.
• Question: Can one update the greedy path incrementally?
• An O(1) solution for the above would imply that adding noise does not make the problem any harder.

30 Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, 2003 + Cieliebak & Mäkinen, 2005).

31 2D point set matching
[Figure: point sets B and A and the translated set f(A).]

32 Solutions
• Easy in O(mn log m) time by constructing the set of mn translation vectors, sorting it, and finding the most frequently repeated element.
• Also possible in O(mn) time by using a naive string matching type algorithm.
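The first approach amounts to letting every pair (a, b) vote for the translation b - a. A minimal sketch, assuming both point sets are non-empty and given as pairs of integers; it counts votes with a hash table (`collections.Counter`) instead of sorting, which changes the log factor but not the idea. The name `best_translation` is invented here.

```python
from collections import Counter


def best_translation(A, B):
    """Return ((dx, dy), count) for the translation matching the most points of A into B."""
    # One vote per pair (a, b): the translation that would carry a onto b.
    votes = Counter((bx - ax, by - ay) for ax, ay in A for bx, by in B)
    return votes.most_common(1)[0]
```

For A = [(0, 0), (1, 2), (3, 1)] and B = [(5, 5), (6, 7), (8, 6), (0, 9)], the translation (5, 5) collects three votes, one per point of A.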

33 Naive point set matching
[Figure: point sets A and B.]
Remark: This is the fastest known algorithm for this problem!
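The slide gives only a figure, so the following is one plausible reading of a "naive string matching type" scan for exact occurrences (every point of A must land on a point of B), not a transcription of the authors' algorithm. Both sets are sorted lexicographically, and each pattern point keeps its own scan position in B that only moves forward, which bounds the total work by O(mn) after sorting. The name `exact_occurrences` is hypothetical, A is assumed non-empty, and coordinates are assumed to be integers.

```python
def exact_occurrences(A, B):
    """Return all translations f such that f(A) is a subset of B (2D integer points)."""
    A, B = sorted(map(tuple, A)), sorted(map(tuple, B))
    m, n = len(A), len(B)
    q = [0] * m            # q[i]: scan position in B for pattern point A[i]
    result = []
    for j in range(n):
        # Candidate translation: carry the first pattern point onto B[j].
        fx, fy = B[j][0] - A[0][0], B[j][1] - A[0][1]
        ok = True
        for i in range(1, m):
            target = (A[i][0] + fx, A[i][1] + fy)
            # Candidates grow lexicographically with j, so q[i] never moves back.
            while q[i] < n and B[q[i]] < target:
                q[i] += 1
            if q[i] == n or B[q[i]] != target:
                ok = False
                break
        if ok:
            result.append((fx, fy))
    return result
```

The forward-only pointers play the role that the text cursor plays in naive string matching: a failed candidate never forces a rescan of the part of B already passed.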

34 Restricted case?
• Would the problem become easier if there were no other points inside the area of matches?
[Figure: f(A).]

35 Restricted case?
• The restricted 1D case is extremely easy:
  - Exact string matching on the differentially encoded sequences.
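To make the reduction concrete: if no extra points of B may fall inside a match, then a match is a run of consecutive points of B whose gaps equal the gaps of A, so any exact string matcher over the difference sequences will do. The sketch below uses Knuth-Morris-Pratt as one such matcher; the slide does not name a specific one. The function names are invented, inputs are assumed to be sorted, and coordinates are assumed to compare exactly (for instance integers).

```python
def diff_encode(points):
    """Return the consecutive differences of a sorted list of 1D points."""
    return [q - p for p, q in zip(points, points[1:])]


def restricted_matches(A, B):
    """Start indices s such that B[s:s+len(A)] is a translate of A."""
    pat, text = diff_encode(A), diff_encode(B)
    if not pat:                      # zero or one point: matches at every position
        return list(range(len(B)))
    # Knuth-Morris-Pratt failure function over the pattern's difference sequence.
    fail = [0] * len(pat)
    k = 0
    for i in range(1, len(pat)):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k
    # Scan the text's difference sequence.
    hits, k = [], 0
    for pos, d in enumerate(text):
        while k and d != pat[k]:
            k = fail[k - 1]
        if d == pat[k]:
            k += 1
        if k == len(pat):
            hits.append(pos - len(pat) + 1)   # index of the first matched point in B
            k = fail[k - 1]
    return hits
```

For A = [1, 3, 4] and B = [0, 5, 7, 8, 10, 11], the gaps [2, 1] occur twice in [5, 2, 1, 2, 1], giving matches that start at B[1] = 5 and B[3] = 8.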

36 Easier on grid points

37 Easier on grid points...
• The problem becomes a special case of two-dimensional exact string matching.
• Can be solved in O(N^2) time on a text grid of size N × N and pattern grid of size M × M.
• Notice that the run-length encoded representation of the rows of the matrix is of size O(n).

38 Easier on grid points...
• The algorithm of Amir & Landau & Sokol, 2002, for run-length compressed 2D search can be applied:
  - Time complexity O(M^2 + n). (can be reduced to O(m^2 + n)?)

39 What about Bird-Baker?
• Our idea to solve the problem is to modify the Bird-Baker algorithm to work directly on point sets.
• As a preliminary tool, we need an Aho-Corasick automaton that recognizes run-length encoded binary strings.

40 Run-length encoding
[Figure: run-length encoding of a row of points. Values shown: 5.7, 12.2, 3.1, 9.3, ...; axis ticks: 0, 5.7, 10, 12.2, ...]

41 Modified Aho-Corasick automaton
• Proposition: There is an automaton accepting a set of run-length encoded binary strings with the following properties:
  - O(m log m) construction time, where m is the number of 1-bits in the set.
  - Reading a fail-link in O(log m) time.
  - Scanning a string with n 1-bits in O(n log m) time.

42 Bird-Baker on point sets
• Now we can build our automaton on the rows of set A and scan the rows of set B with it.
• Let R be the set of positions where a row of A was accepted inside the rows of B.
• After sorting R by columns, we can test in O(|R|) time if any column of R contains the correct sequence of accepting states.

43 Bird-Baker on point sets
• The overall running time is O(n log m + |R| log |R|).
• Unfortunately, there are examples where |R| = Ω(mn) :-(
• Hence, it is still open whether (even) the restricted case has an o(mn) solution.

