
1 Amihood Amir Bar-Ilan University and Georgia Tech UWSL 2006.


1 1 Amihood Amir Bar-Ilan University and Georgia Tech UWSL 2006


3 3 Issues of Concern: Local errors (occlusion; transmission and resolution; details). Scaling. Rotation. Integration of all the above issues.

4 4 It seems daunting, but …

5 5 CPM 2003: Morelia, Mexico

6 6 Some History … String Matching: motivated by text editing. Text and pattern are strings over alphabet Σ.

7 7 Historic Two Dimensional Model:

8 8 Bird-Baker Algorithm (1976) Time: O(n^2) for bounded fixed alphabets, O(n^2 log m) for infinite alphabets. Technique: linearization.

9 9 Linearization: concatenate the rows of the text (or pattern) and use string matching tools. In this case, the Aho and Corasick algorithm.

10 10 Find all pattern rows … then align them.
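The two steps on this slide (find the pattern rows, then align them) can be sketched as follows. This is a simplified illustration, not the Bird-Baker algorithm itself: plain substring slicing stands in for the Aho and Corasick automaton, and a direct list comparison stands in for the string matching over row identifiers, so the running time is not linear. The function name `find_2d` and the representation of text and pattern as lists of equal-length strings are assumptions made for the example.

```python
def find_2d(text, pattern):
    """Return the top-left (row, col) of every occurrence of pattern in text.

    text and pattern are lists of equal-length strings (an assumed layout).
    """
    m_rows, m_cols = len(pattern), len(pattern[0])
    n_rows, n_cols = len(text), len(text[0])
    if m_rows > n_rows or m_cols > n_cols:
        return []

    # Give each distinct pattern row an id; equal rows share one id.
    row_id = {}
    pattern_ids = [row_id.setdefault(row, len(row_id)) for row in pattern]

    # Step 1: for every text position, record which pattern row (if any)
    # starts there. Naive slicing stands in for Aho-Corasick.
    width = n_cols - m_cols + 1
    found = [[row_id.get(text[r][c:c + m_cols], -1) for c in range(width)]
             for r in range(n_rows)]

    # Step 2: align the rows. Read each column of ids top to bottom and
    # look for the id sequence of the pattern in it.
    hits = []
    for c in range(width):
        column = [found[r][c] for r in range(n_rows)]
        for r in range(n_rows - m_rows + 1):
            if column[r:r + m_rows] == pattern_ids:
                hits.append((r, c))
    return sorted(hits)
```

For example, `find_2d(["XXXX", "XOXX", "XXXO", "XXXX"], ["XO", "XX"])` reports the two positions where the row "XO" sits directly above the row "XX".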

11 11 Another linearization: pad each pattern row (length m) with n-m “don’t cares”. Time: Fischer-Paterson (1972)

12 12 Advantages and Disadvantages of Model Pros: Can use known techniques. Cons: - Complexity degradation (e.g. extra log factor in exact matching). - Inherent difficulties in definitions (will be addressed later).

13 13 First Truly 2d Algorithm – The Dueling Method (A-Benson-Farach 1991) Idea: Assume the situation is: all potential pattern “starts” agree on overlap, i.e. all want to see the same symbol in every text location.

14 14 Dueling Method … Time for checking every text element’s correctness: linear. Every candidate with an incorrect element in its range is eliminated. Method: the “wave”. Total Time:

15 15 Dueling Method … How do we arrange for candidates to agree on overlap? Duel! When there is a conflict between two candidates, a single text check eliminates at least one candidate. The text location can be pre-computed because of transitivity. The dueling phase is thus linear time.

16 16 Discrete Scaling (A-Landau-Vishkin 1990) In our limited model, the meaning of scaling is “blowing up” a symbol. Example: scaling a symbol A by 3 means a 3x3 matrix of A’s. Scaling a matrix by 2 replaces every symbol by a 2x2 block of that symbol.
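A minimal sketch of discrete scaling as described on this slide, assuming the matrix is stored as a list of equal-length strings. The function name `scale_discrete` is made up for the illustration.

```python
def scale_discrete(matrix, k):
    """Replace every symbol of matrix by a k x k block of that symbol."""
    scaled = []
    for row in matrix:
        # Stretch the row horizontally, then repeat it k times vertically.
        wide = "".join(symbol * k for symbol in row)
        scaled.extend([wide] * k)
    return scaled
```

Scaling the single symbol `["A"]` by 3 gives three rows of "AAA", and scaling the 3x3 matrix `["XXX", "XOX", "XXX"]` by 2 gives a 6x6 matrix with a 2x2 block of O in the middle.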

17 17 Scaled Occurrences of Pattern in Text: [Figure: a text containing the 3x3 pattern X X X / X O X / X X X at Scale 1, Scale 2 and Scale 3]

18 18 Discrete Scaling Algorithms A-Landau-Vishkin 90: Can find all discrete scales of pattern in linear time (alphabet dependent). A-Calinescu 94: Alphabet independent and dictionary linear-time discrete scaling algorithm.

19 19 Tools used: For comparing substrings in constant time: suffix trees and LCA (Weiner 1973, Harel-Tarjan 1984), or suffix arrays and LCP (Kärkkäinen-Sanders 2003). For computing the number of sub-row repetitions in constant time: range-minimum queries (Gabow-Bentley-Tarjan 1984).

20 20 How is it used? Do an LCA query to find out that the orange line occurs here. How many times does this line repeat? How is this done?

21 21 Construct an array of numbers where every location is the length of the LCP of this row and the next, e.g. 0kkkkkkkkk000kkkkkkkkk00. To make sure that the orange line appears in this range, the minimum number in this range has to be at least k.
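The array on this slide and the range check can be sketched as below. The sketch makes two simplifying assumptions: the LCP is computed by direct character comparison of whole rows rather than by suffix trees or suffix arrays, and the range minimum is a plain scan rather than a constant-time range-minimum query. The names `lcp`, `row_lcp_array` and `prefix_repeats` are hypothetical. A length-k line repeats over a block of rows exactly when every adjacent pair of rows in the block agrees on at least k symbols.

```python
def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def row_lcp_array(rows):
    """Entry i is the LCP length of rows[i] and rows[i + 1]."""
    return [lcp(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]


def prefix_repeats(lcp_array, first, last, k):
    """True if rows first..last (inclusive) all share their first k symbols."""
    if first == last:
        return True
    # A plain scan; a range-minimum structure would answer this in O(1).
    return min(lcp_array[first:last]) >= k
```

With rows `["XXOX", "XXOO", "XXOX", "OXXX"]` the array is `[3, 3, 0]`, so the prefix "XXO" repeats over rows 0 to 2 but not over rows 0 to 3.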

22 22 How do we know what scale the orange line has? Run-length compression. Find the symbol part, then the repetition factor. Example: AAABBCCCCDAAAABBBBBBCA has symbol part ABCDABCA and repetition factors 3 2 4 1 4 6 1 1. This idea led to the compressed matching paradigm …
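A short sketch of the run-length split into a symbol part and repetition factors, using the string from this slide. The function name `run_length` is an assumption for the example.

```python
from itertools import groupby


def run_length(row):
    """Split row into its symbol part and the list of run lengths."""
    symbols, counts = [], []
    for symbol, run in groupby(row):
        symbols.append(symbol)
        counts.append(len(list(run)))
    return "".join(symbols), counts
```

Calling `run_length("AAABBCCCCDAAAABBBBBBCA")` returns the symbol part "ABCDABCA" together with the factors 3, 2, 4, 1, 4, 6, 1, 1.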

23 23 Compressed Matching Suppose the text (and pattern?) are compressed. Examples: run-length of rows (fax). LZ78 of rows (gif). Find pattern in text without decompressing. A-Benson 92, A-Benson-Farach 94, A-Landau-Sokol 03(x2) This led to a decade of work in the stringology and data compression community.

24 24 Compressed Matching (very partial list from citeseer …)
Pattern Matching in Compressed Raster Images - Pajarola, Widmayer (1996)
Direct Pattern Matching on Compressed Text - de Moura, Navarro, Ziviani (1998)
A General Practical Approach to Pattern Matching over.. - Navarro, Raffinot (1998)
Randomized Efficient Algorithms for Compressed Strings: the.. - Gasieniec et al. (1996)
Approximate String Matching over Ziv-Lempel Compressed Text - Kärkkäinen, Navarro, Ukkonen (2000)
Pattern Matching Machine for Text Compressed Using Finite State.. - Takeda (1997)
…

25 25 Model Deficiencies. How do we scale to non-discrete sizes? (e.g. 1.35) How do we model rotations?

26 26 A Model of Digitization (Landau-Vishkin 1994) “Real-life” resolution is fine enough to be assumed continuous. This is dealt with by a discrete sampling of space done by, e.g., the camera. [Figure: “real life” image and its digitized sample]

27 27 Rotation (Fredriksson-Ukkonen 1998) Consider the text as a grid of pixels, each having a color. Consider the pattern as an m x m grid of pixels with colors. Assume the center of every pattern pixel has a “hole”. Lay the pattern grid on the text, with the center declared the “rotation pivot”.
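One way to read this model in code: rotate each pattern pixel center (the "hole") about the pivot and see which text cell lies under it. The sketch below rests on several assumptions that are not in the slides: text cell (r, c) covers the square [r, r+1) x [c, c+1), the pivot is given in those text coordinates, the sense of rotation is the one the formula produces, and the names `cells_under_holes` and `occurs_rotated` are hypothetical.

```python
import math


def cells_under_holes(m, angle_deg, pivot_row, pivot_col):
    """Map each pattern pixel (i, j) to the text cell under its center.

    The m x m pattern is rotated by angle_deg about its own center,
    which is placed at the text point (pivot_row, pivot_col).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half = m / 2.0
    cells = {}
    for i in range(m):
        for j in range(m):
            # Offset of the pixel center (the "hole") from the pattern center.
            dy, dx = i + 0.5 - half, j + 0.5 - half
            # Rotate the offset, then translate it to the pivot.
            x = pivot_col + dx * cos_t - dy * sin_t
            y = pivot_row + dx * sin_t + dy * cos_t
            cells[(i, j)] = (math.floor(y), math.floor(x))
    return cells


def occurs_rotated(text, pattern, angle_deg, pivot_row, pivot_col):
    """True if every pattern pixel matches the text cell seen through its hole."""
    m = len(pattern)
    mapping = cells_under_holes(m, angle_deg, pivot_row, pivot_col)
    for (i, j), (r, c) in mapping.items():
        inside = 0 <= r < len(text) and 0 <= c < len(text[0])
        if not inside or text[r][c] != pattern[i][j]:
            return False
    return True
```

With the 2x2 text `["AB", "CD"]` and the pivot at (1, 1), the pattern `["AB", "CD"]` occurs at angle 0 and the pattern `["BD", "AC"]` occurs at angle 90.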

28 28 [Figure: 7x7 text grid with cells T[1,1] … T[7,7]]

29 29 [Figure: 4x4 pattern with the rotation pivot]

30 30 [Figure: 4x4 pattern over 8x8 text, rotated 45°]


36 36 Rotated Matching Algorithms Fredriksson-Ukkonen 1998: Filter. Good expected time. worst case. Fredriksson-Navarro-Ukkonen 2000: A-Butman-Crochemore-Landau-Schaps 2004: Proved that output size is A-Kapah-Tsur 2004:

37 37 A Taste of Handling Rotations. Naïve idea: try all possible rotated patterns. Examples: original, 19° rotation, 21° rotation, 26° rotation.

38 38 Proposed Solution Every rotated pattern can be found in the text using FFT. If there are N rotated patterns, the total time is N times the time of one FFT search. What is N?

39 39 Upper Bound There are m^2 pixels. Each pixel center crosses at most O(m) grid lines. Therefore there are O(m^3) different rotated patterns.


44 44 Lower Bound Could many points cross a gridline together? We will show: Lower Bound: Ω(m^3). Restriction: We consider only points in set P defined as follows.

45 45 Our Subset of Consideration: P is a subset of pattern coordinates such that: 1) the coordinates are in quadrant I; 2) the coordinates are only the points (x,y) where x and y are co-prime.
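To get a feel for how large such a set can be, the sketch below enumerates the co-prime points of quadrant I. The coordinate range 1..m is an assumption made for the illustration (the slides restrict P further to a particular area), and the function name `coprime_points` is hypothetical.

```python
from math import gcd


def coprime_points(m):
    """Points (x, y) with 1 <= x, y <= m and gcd(x, y) == 1."""
    return [(x, y)
            for x in range(1, m + 1)
            for y in range(1, m + 1)
            if gcd(x, y) == 1]
```

For m = 4 this keeps 11 of the 16 points. It is a standard number-theoretic fact that the fraction of co-prime pairs tends to 6/π² as m grows, so a set of this kind has on the order of m^2 points.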

46 46 Key Lemma (A-Butman-Crochemore-Landau-Schaps 2003) For two distinct points (X1, Y1) and (X2, Y2) of P, it is impossible that both cross a grid line at the same rotation angle.

47 47 [Figure: grid with axes from -8 to 8 showing the points (X1, Y1) and (X2, Y2), the origin O and a point Z]

48 48 How does it help? Theorem (Geometry): i.e.

49 49 Consider Schematically: shaded area. In shaded area there are points. So in there are at least points, i.e. points.

50 50 Each of the points in P (the yellow area) crosses the grid Ω(m) times and no two of them cross together. Conclude: There are Ω(m^3) different rotated patterns.

51 51 Real Scaled Matching (A-Butman-Lewenstein-Porat 2003, A-Chencinski 2006) Assume the text and pattern grids are the unit scale. A scale up of the pattern increases the grid. The center of each cell of the underlying unit grid takes the color of the scaled pattern pixel under it.
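The sampling rule on this slide can be sketched as follows, assuming a square pattern stored as a list of strings and a real scale factor s of at least 1. The function name `scale_real` and the treatment of cell centers that fall exactly on the border of the scaled pattern (they are left out) are choices made for this illustration, not taken from the cited papers.

```python
import math


def scale_real(pattern, s):
    """Scale a square pattern by a real factor s, sampling at unit-cell centers."""
    m = len(pattern)
    # Unit cell i has its center at i + 0.5; it lies under the scaled
    # pattern when i + 0.5 < m * s.
    size = math.ceil(m * s - 0.5)
    rows = []
    for i in range(size):
        # The scaled pixel covering a center at position p is pixel floor(p / s).
        src_i = min(m - 1, int((i + 0.5) / s))
        row = "".join(pattern[src_i][min(m - 1, int((j + 0.5) / s))]
                      for j in range(size))
        rows.append(row)
    return rows
```

Scaling `["AB", "CD"]` by 1.6 gives a 3x3 result in which the first row and column of the pattern are each sampled twice, and scaling by 2 agrees with discrete scaling.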

52 52 [Figure panels: Pattern; Pattern scaled continuously to 1.6; Pattern scaled continuously to 1.6 with superimposed unit grid; Pattern discretely scaled to 1.6]

53 53 Does this work? We tried it on “Lenna” …

54 54 [Figure panels: Original Lenna; Scale 1.3 Lenna; Scale 2 Lenna]

55 55 Lenna Today

56 56 Algorithm’s running time For text size n x n and pattern size m x m:

57 57 The Future? Faster rotation: did not utilize pattern, did not utilize neighboring information. Faster scaling. The holy grail – INTEGRATION. Compressed Matching: lossy compressions.

58 58 THANK YOU

