Approximate Point Set Pattern Matching on Sequences and Planes. Tomoaki Suga, Shinichi Shimozono*, Kyushu Inst. of Tech., Fukuoka, Japan

1 Approximate Point Set Pattern Matching on Sequences and Planes. Tomoaki Suga, Shinichi Shimozono*, Kyushu Inst. of Tech., Fukuoka, Japan

2 Point Set Pattern Matching. Text: a set of points in, e.g., a plane. Pattern: a small set of points. Task: find an occurrence of the pattern as a subset of the text. [Figure: TEXT and PATTERN point sets]
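To make the exact task concrete, here is a minimal brute-force sketch. It assumes that an occurrence means the pattern shifted by a translation (dx, dy) lies entirely inside the text, with no scaling, rotation, or distortion; the function name `exact_occurrences` is hypothetical. It anchors the first pattern point on each text point and checks the remaining points against a hash set, so it takes O(nm) expected time for pattern size n and text size m.

```python
def exact_occurrences(pattern, text):
    """Return every translation (dx, dy) that maps all pattern points onto text points.

    Assumption (for illustration only): an occurrence is a pure translation.
    """
    text_set = set(text)
    ax, ay = pattern[0]  # anchor: the first pattern point
    found = []
    for tx, ty in text:
        dx, dy = tx - ax, ty - ay  # candidate translation putting the anchor on (tx, ty)
        if all((x + dx, y + dy) in text_set for x, y in pattern):
            found.append((dx, dy))
    return found


print(exact_occurrences([(0, 0), (1, 2)], [(5, 5), (6, 7), (0, 0), (3, 1)]))
```

Only the translation (5, 5) works in this example, because (6, 7) is the only text point positioned like the second pattern point relative to an anchor.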

3 Approximate Point Set Matching in Practice: Examples. Analysis of 2D electrophoresis images: a set of spots on a gel media plane. Searching digital music scores by melody: ringer melodies, Internet content, online “Kara-Oke”.

4 Literature. Exact matching in d dimensions: a geometric algorithm by P. J. de Rezende & D. T. Lee ('95) handles translation, scaling, and rotation in O(nm^d) time. Allowing local distortions: heuristics and hardness results by Akutsu et al. ('99); the problem is NP-hard even for 1D matching. Approximate matching of point sequences (V. Makinen, '01): O(nm) time without skips, and O(nm^3) time allowing substitutions; the extension to 2-dimensional matching is NP-hard.

5 Our Results. Approximate point set pattern matching in 1D, where the pattern matches as a subset (extending Makinen et al.): a simple, fast algorithm for what is naively an O(nm^2)-time task, relying on a reasonable assumption about sequences in practice; an algorithm that guarantees O(nm) time, linear in the text size, using minimum queries in average constant time; a Four-Russians speed-up; and an observation connecting the problem to string matching. 2D approximate point set pattern matching, with a polynomial-time algorithm.

6 1D Matching as a Target. 1D matching is a basis for practical problems: the axes of 2D electrophoresis images are independent, and points in higher dimensions may still have a primary axis (sort order), e.g., 3D structures of proteins. Musical score search: pitch errors (tone deafness) are usually fatal, and exact matching of rhythm/timing is impractical, yet rhythm is indispensable for distinguishing melodies.

7 Point Set Matching in 1D. Text and pattern: strictly increasing sequences of integers. An occurrence of the pattern: a subsequence of the text.

8 Edit Distance for Approximate Point Set Matching. Distance between two sequences of the same size:
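The distance formula itself did not survive in this transcript. Assuming, consistently with the recurrence on the following slide, that the distance sums the absolute differences between corresponding gaps of consecutive points (which makes it invariant under translation), a minimal sketch is below. The name `gap_distance` is hypothetical and the definition is an assumption, not necessarily the authors' exact one.

```python
def gap_distance(p, q):
    """Assumed distance between two equal-size, strictly increasing integer sequences:

        d(p, q) = sum over i >= 1 of |(p[i] - p[i-1]) - (q[i] - q[i-1])|

    A uniform shift of q leaves the value unchanged.
    """
    if len(p) != len(q):
        raise ValueError("sequences must have the same size")
    return sum(abs((p[i] - p[i - 1]) - (q[i] - q[i - 1])) for i in range(1, len(p)))


print(gap_distance([0, 2, 5], [1, 3, 6]))  # same gaps (2, 3): distance 0
print(gap_distance([0, 2, 5], [0, 3, 5]))  # gaps (2, 3) vs (3, 2): distance 2
```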

9 Approximate Matching and Recurrence. D(i,j) = the distance between the first i points of the pattern and their best occurrence in the text ending at point j. It is computed from the distance between the prefix sequences that are one point shorter, plus the difference between the last two inter-point distances. D(n,m) can be obtained by tabular computation in O(nm^2) time.
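Under the assumed gap distance, the recurrence reads D(i,j) = min over j' < j of D(i-1,j') + |(p_i - p_{i-1}) - (t_j - t_{j'})|, where the first pattern point may sit on any text point at cost 0. The sketch below (0-indexed, hypothetical names, non-empty inputs assumed) fills the table directly with three nested loops, hence O(nm^2) time; it returns the best cost over all end positions j.

```python
import math


def approx_match_table(p, t):
    """Naive DP table: D[i][j] = best cost of matching p[0..i] with p[i] placed on t[j]."""
    n, m = len(p), len(t)
    D = [[math.inf] * m for _ in range(n)]
    for j in range(m):
        D[0][j] = 0  # a single point matches anywhere at no cost
    for i in range(1, n):
        gp = p[i] - p[i - 1]  # last pattern gap
        for j in range(m):
            best = math.inf
            for k in range(j):  # previous pattern point placed on t[k]
                cand = D[i - 1][k] + abs(gp - (t[j] - t[k]))
                if cand < best:
                    best = cand
            D[i][j] = best
    return D


def best_occurrence_cost(p, t):
    """Cost of the best approximate occurrence of p in t, over all end positions."""
    return min(approx_match_table(p, t)[-1])


print(best_occurrence_cost([0, 2, 5], [1, 3, 6, 10]))  # 1, 3, 6 is an exact occurrence
```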

10 “Finite Resolution” Assumption on a Class of Sequences. The ratio of distances between contiguous points is bounded: e.g., spots are observed as stains on a small gel media plane, and typical MIDI sequences use 450 ticks per second. The modified algorithm runs in O(nm) time if the sequences have finite resolution, because the third (innermost) loop can be finished in constant time. [Figure: pattern and text]
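The slide states the assumption informally. One reading, assumed here, is that the ratio between the largest and smallest gap between contiguous points is bounded by a constant. When pattern and text share that resolution, a window as wide as any pattern gap contains only a bounded number of text points, which is what keeps the innermost loop short. The hypothetical helpers below measure both quantities.

```python
from fractions import Fraction


def gap_ratio(seq):
    """Largest gap divided by smallest gap between contiguous points (exact fraction)."""
    gaps = [b - a for a, b in zip(seq, seq[1:])]
    return Fraction(max(gaps), min(gaps))


def max_points_in_window(seq, width):
    """Maximum number of points of a sorted sequence inside any closed window of the given width.

    Two-pointer scan: lo is the leftmost point still within `width` of seq[hi].
    """
    best, lo = 0, 0
    for hi in range(len(seq)):
        while seq[hi] - seq[lo] > width:
            lo += 1
        best = max(best, hi - lo + 1)
    return best


text = [0, 2, 5, 6]
print(gap_ratio(text))                # gaps 2, 3, 1: ratio 3
print(max_points_in_window(text, 4))  # 2, 5, 6 fit in [2, 6]: 3 points
```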

11 A Row Can Be Divided into a “Positive” Part and a “Negative” Part. Values in the “Negative” part always decrease, so only the “ex-minimum” can be a candidate there. Only a constant number of “Positive” cells exist if the sequences have finite resolution, giving O(nm) time.
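With the assumed gap distance, the split can be made concrete. For fixed j and pattern gap gp, a text index k is “negative” when t[j] - t[k] > gp. Its cost is then (D[i-1][k] - t[k]) + t[j] - gp, so a running minimum of D[i-1][k] - t[k] (a plausible reading of the “ex-minimum”) covers the entire negative part. The remaining “positive” cells are scanned directly. The sketch below, with hypothetical names, is always correct; its inner scan is short only when sequences have finite resolution.

```python
import math


def approx_match_row_split(p, t):
    """Best approximate-occurrence cost, splitting each row into negative/positive parts."""
    m = len(t)
    prev = [0] * m  # row for the first pattern point
    for i in range(1, len(p)):
        gp = p[i] - p[i - 1]
        cur = [math.inf] * m
        b = 0                # indices k < b form the negative part: t[j] - t[k] > gp
        neg_min = math.inf   # running minimum of prev[k] - t[k] over k < b
        for j in range(m):
            # The boundary b only moves right as j grows, since t is increasing.
            while b < j and t[j] - t[b] > gp:
                neg_min = min(neg_min, prev[b] - t[b])
                b += 1
            best = neg_min + t[j] - gp
            # Positive part: t[j] - t[k] <= gp, cost prev[k] + gp - (t[j] - t[k]).
            for k in range(b, j):
                best = min(best, prev[k] + t[k] + gp - t[j])
            cur[j] = best
        prev = cur
    return min(prev)


print(approx_match_row_split([0, 2, 5], [1, 3, 6, 10]))
```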

12 Guaranteed O(nm)-time Algorithm. A deque simulating the right-most path of the Cartesian tree [Gabow et al., 1985] maintains the indices that may become the minimum in the “Positive” part. The minimum is therefore available in amortized constant time, i.e., constant time on average per iteration, giving O(nm) time. Deque operations: remove an index once it turns negative; read the minimum; pop all larger ones; push the latest index.
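The four operations listed on the slide match a standard monotone deque (sliding-window minimum). The sketch below uses that technique as a stand-in for the Cartesian-tree path, again under the assumed gap distance and with hypothetical names. The deque holds positive-part indices whose keys prev[k] + t[k] increase from front to back. Each index is pushed and popped at most once per row, so the total time is O(nm) with no resolution assumption.

```python
import math
from collections import deque


def approx_match_deque(p, t):
    """Best approximate-occurrence cost in O(nm) time using a monotone deque per row."""
    m = len(t)
    prev = [0] * m
    for i in range(1, len(p)):
        gp = p[i] - p[i - 1]
        cur = [math.inf] * m
        b, neg_min = 0, math.inf
        dq = deque()  # indices k in [b, j); keys prev[k] + t[k] increase front to back
        for j in range(m):
            if j > 0:
                # Push the latest index j - 1 after popping all larger (dominated) keys.
                key = prev[j - 1] + t[j - 1]
                while dq and prev[dq[-1]] + t[dq[-1]] >= key:
                    dq.pop()
                dq.append(j - 1)
            # Indices that turned negative leave the positive window.
            while b < j and t[j] - t[b] > gp:
                neg_min = min(neg_min, prev[b] - t[b])
                b += 1
            while dq and dq[0] < b:
                dq.popleft()
            best = neg_min + t[j] - gp
            if dq:  # the front holds the positive-part minimum
                k = dq[0]
                best = min(best, prev[k] + t[k] + gp - t[j])
            cur[j] = best
        prev = cur
    return min(prev)


print(approx_match_deque([0, 3, 4], [0, 2, 4, 5]))
```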

13 Computational Results on Real/Synthesized* MIDI Sequences. The simple algorithm relying on “finite resolution” is faster than the guaranteed O(nm)-time algorithm. Pattern size = 11; time (sec.) for filling up the table; * = synthesized sequence.
Text Size | Naïve DP | Fin. Res. | Cartesian
3086 | 1.12 | 0.01 |
*18328 | 197 | 0.03 | 0.05
*37741 | 883 | 0.05 | 0.09
*386801 | --- | 0.58 | 0.94
Environment: Solaris 9 x86 / Intel Pentium 4, 800 MHz

14 Four-Russians Speed-up for Point Sequences with Finite Resolution. Idea from Arlazarov et al.: fill the table cells with precomputed values, giving O(nm/log n + n log n) time in the unit-cost RAM model. As one might expect, the finite resolution assumption makes point sequences behave like strings.

15 Approximate Point Set Pattern Matching on the Plane: Hardness Results. Akutsu et al. ('95), allowing local distortions: NP-hard, even for 1D matching. V. Makinen & E. Ukkonen ('01), an extension of 1D: NP-hard, because deciding the order of points in a matching is hard. Q. Is there any non-trivial 2D approximate point set matching computable in polynomial time?

16 Extending the 1D Definition to Approximate Matching on the Plane. Regard a set as sequences with two orders, and divide it recursively by axis-parallel lines. [Figure: point sets P and Q]
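The two orders can be read as sorting by x and by y. Under that reading, every axis-parallel cut of a set corresponds to a prefix/suffix split of one of the two sorted sequences. The hypothetical generator below enumerates those cuts; it only illustrates the division step and does not attempt the 2D distance, whose exact recurrence is not preserved in this transcript.

```python
def axis_parallel_splits(points):
    """Yield (axis, first_part, second_part) for every cut by a vertical ("x") or horizontal ("y") line.

    A line can only pass strictly between two points whose coordinates on that axis differ.
    """
    for axis, name in ((0, "x"), (1, "y")):
        order = sorted(points, key=lambda pt: pt[axis])
        for s in range(1, len(order)):
            if order[s - 1][axis] < order[s][axis]:
                yield name, order[:s], order[s:]


for cut in axis_parallel_splits([(0, 0), (1, 2), (2, 1)]):
    print(cut)
```

A set of k points in general position has 2(k - 1) such cuts, two for each gap in each order.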

17 Recurrence for Edit Distance. Divide P and Q each into two arbitrary parts, by either a horizontal or a vertical line.

18 How Pattern Matching Proceeds. The points of a pattern (x) should be aligned on the points of a text (o) by cutting and moving the bounding box.

19 Polynomial-time Algorithm for 2D Approximate Point Set Matching. The algorithm finds the best partition and direction by a DP-like recursion, storing results in a cache indexed by quadruples [i, j; k, l]: O(n^2 m^2) space and O(n^2 m^4) time, with pattern size n and text size m.

20 Remarks & Future Work. Consider scaling in 1D: tempo must be taken into account in musical sequence search. Look for more applications, e.g., 1D approximate matching for protein secondary structure search.

25 A Row Can Be Divided into a “Positive” Part and a “Negative” Part. Absolute values in the “Negative” part always increase, so only the “ex-minimum” can be a candidate there. Only a constant number of “Positive” cells exist if the sequences have finite resolution, giving O(nm) time.
