2-Dimensional Pattern Matching Amihood Amir, Dina Sokol, Shoshana Neuburger UWSL 2006
2-Dimensional Pattern Matching: perform pattern matching on images, e.g., MRI, FAX.
Searching Aerial Photographs
Historic Two Dimensional Model:
2D Pattern Matching - Example. Input: alphabet Σ = {A,B}, a pattern, and a text (figure: pattern and text grids over {A,B}). Output: the locations of pattern occurrences in the text, {(1,4), (2,2), (4,3)}.
Bird-Baker Algorithm (1976). For an n × n text and an m × m pattern. Time: O(n²) for bounded fixed alphabets; O(n² log m) for infinite alphabets. Technique: linearization.
Bird / Baker: the first linear-time 2D pattern matching algorithm. View each pattern row as a metacharacter to linearize the problem, converting 2D pattern matching to 1D.
Linearization: Concatenate the rows of the Text and use string matching tools; in this case, the Aho-Corasick algorithm for a dictionary of patterns. The dictionary consists of all pattern rows.
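The dictionary step can be sketched in Python as below, assuming the pattern rows are given as strings. Transitions live in per-state dicts, so each step is a hash lookup rather than an array index over a bounded alphabet. `build_ac` and `ac_search` are hypothetical helper names, not a library API.

```python
from collections import deque


def build_ac(words):
    """Build an Aho-Corasick automaton for a dictionary of strings.

    Returns (goto, fail, out): goto[s] maps a character to a child state,
    fail[s] is the failure link of state s, and out[s] lists the indices of
    all dictionary words that end when the automaton is in state s.
    """
    goto, fail, out = [{}], [0], [[]]
    for idx, word in enumerate(words):
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)

    # Breadth-first pass: a state's failure link is the longest proper suffix
    # of its string that is also a trie path. Depth-1 states keep fail = 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit words ending at the suffix
    return goto, fail, out


def ac_search(text, goto, fail, out):
    """Yield (end_position, word_index) for every dictionary word in text."""
    s = 0
    for pos, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            yield pos, idx
```

For example, with the rows `AB` and `BA`, scanning the text row `ABAB` reports row 0 ending at position 1, row 1 at position 2, and row 0 at position 3 (0-based).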
Find all occurrences of pattern rows in the text… then align them.
Bird / Baker. Preprocess pattern: Name the rows of the pattern using the AC automaton; identical rows receive identical names. Using the names, the pattern has a 1D representation. Construct the KMP automaton of this 1D pattern.
Bird / Baker - Example. Preprocess pattern (figure: the rows of a pattern over {A,B} receive the names 1 and 2, giving the 1D pattern of names).
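A sketch of the pattern preprocessing, assuming the pattern is a list of equal-length strings. For brevity a dict keyed by the row string replaces the AC automaton for naming (with equal-length rows both give one name per distinct row), names start at 0 rather than 1, and the KMP automaton is represented by its failure function. `name_rows` and `kmp_failure` are hypothetical names.

```python
def name_rows(pattern):
    """Name each row of the pattern; identical rows receive identical names.

    Returns (names_in_order, name_of_row). Assumption: a dict keyed by the
    row string replaces the AC automaton used on the slide.
    """
    name_of_row = {}
    names = [name_of_row.setdefault(row, len(name_of_row)) for row in pattern]
    return names, name_of_row


def kmp_failure(seq):
    """KMP failure function: fail[i] is the length of the longest proper
    prefix of seq[:i + 1] that is also a suffix of it."""
    fail = [0] * len(seq)
    k = 0
    for i in range(1, len(seq)):
        while k and seq[i] != seq[k]:
            k = fail[k - 1]
        if seq[i] == seq[k]:
            k += 1
        fail[i] = k
    return fail
```

For instance, `name_rows(["AB", "BA", "AB"])` gives the 1D pattern `[0, 1, 0]`, whose failure function is `[0, 0, 1]`.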
Bird / Baker. Scan text: Name the positions of the text that match a row of the pattern, using the AC automaton within each row. Run KMP on the named columns of the text. Since all pattern rows have the same length and identical rows share a name, only one name can be given to a text location.
Bird / Baker - Example. Scan text (figure: text positions over {A,B} where a pattern row ends receive that row's name, 1 or 2; KMP then runs down the named columns).
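The whole algorithm can be sketched as follows, assuming an n × n text and an m × m pattern given as lists of strings, with 0-based top-left coordinates. To keep the block short, a text position is named by a dict lookup of the length-m substring ending there instead of by the AC automaton, so naming costs O(m) per position and this version is not linear; the KMP pass down each column follows the slide. `bird_baker` is a hypothetical name.

```python
def bird_baker(pattern, text):
    """Report the top-left corner (row, col), 0-based, of every occurrence of
    an m x m pattern in an n x n text, both given as lists of strings.

    Simplification: a text position is named by a dict lookup of the length-m
    substring ending there, not by an AC automaton, so naming costs O(m)
    per position instead of O(1).
    """
    m, n = len(pattern), len(text)

    # Preprocess the pattern: name its rows, then build the KMP failure
    # function of the resulting 1D string of names.
    row_name = {}
    pat = [row_name.setdefault(row, len(row_name)) for row in pattern]
    fail = [0] * m
    k = 0
    for i in range(1, m):
        while k and pat[i] != pat[k]:
            k = fail[k - 1]
        if pat[i] == pat[k]:
            k += 1
        fail[i] = k

    # Scan the text: name[r][c] is the name of the pattern row ending at
    # text[r][c], or -1. Equal-length rows allow at most one name per cell.
    name = [[-1] * n for _ in range(n)]
    for r in range(n):
        for c in range(m - 1, n):
            name[r][c] = row_name.get(text[r][c - m + 1:c + 1], -1)

    # Run KMP down every named column.
    matches = []
    for c in range(m - 1, n):
        k = 0
        for r in range(n):
            while k and name[r][c] != pat[k]:
                k = fail[k - 1]
            if name[r][c] == pat[k]:
                k += 1
            if k == m:
                matches.append((r - m + 1, c - m + 1))
                k = fail[k - 1]
    return sorted(matches)
```

On the 4 × 4 checkerboard text `ABAB`, `BABA`, `ABAB`, `BABA` with the pattern `AB`, `BA`, it returns `[(0, 0), (0, 2), (1, 1), (2, 0), (2, 2)]`.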
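Another linearization: pad the pattern with "don't cares", placing n-m of them between consecutive pattern rows of length m, so that pattern rows are n apart, like consecutive text rows. Time: that of 1D matching with don't cares (Fischer-Paterson, 1972).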
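The padding idea can be checked with a small sketch, assuming square inputs given as lists of strings and a don't-care symbol `*` that does not occur in the text. Fischer-Paterson find don't-care matches with fast convolutions; this sketch swaps in a naive comparison, so it illustrates only the linearization, not the running time. `linearized_match` is a hypothetical name.

```python
def linearized_match(pattern, text, dont_care="*"):
    """Find an m x m pattern in an n x n text by 1D matching with don't cares.

    The text rows are concatenated into one string of length n*n, and n - m
    don't cares are placed between consecutive pattern rows, so pattern row
    i+1 lands exactly n positions after pattern row i, directly below it.
    Assumes dont_care does not occur in the text.
    """
    m, n = len(pattern), len(text)
    t = "".join(text)
    p = (dont_care * (n - m)).join(pattern)
    matches = []
    for start in range(len(t) - len(p) + 1):
        r, c = divmod(start, n)
        if c > n - m:
            continue  # a 1D match here would wrap across a text row boundary
        window = t[start:start + len(p)]
        if all(pc == dont_care or pc == tc for pc, tc in zip(p, window)):
            matches.append((r, c))
    return matches
```

A 1D match that starts in a column beyond n-m corresponds to no 2D occurrence, which is why the sketch filters it out.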
Witnesses. A popular paradigm in pattern matching: find consistent candidates, then verify the candidates. Consistent candidates → verification is linear.
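A 1D sketch of why consistency makes verification linear, assuming `candidates` is sorted and pairwise consistent (any two overlapping candidates agree on their overlap). Each text position then has a single expected symbol, which is compared with the text once; a prefix-sum count of mismatches kills every candidate whose range contains one. `verify_consistent` is a hypothetical name.

```python
from itertools import accumulate


def verify_consistent(pattern, text, candidates):
    """Keep the true occurrences among sorted, pairwise-consistent candidate
    start positions, in O(n + k) time for k candidates."""
    m, n = len(pattern), len(text)
    bad = [0] * n  # bad[t] = 1 if text[t] differs from the expected symbol
    filled = 0     # positions below `filled` already have an expected symbol
    for c in candidates:
        # Only positions not labeled by an earlier candidate are checked;
        # consistency guarantees the earlier label was the same symbol.
        for t in range(max(c, filled), c + m):
            bad[t] = int(text[t] != pattern[t - c])
        filled = max(filled, c + m)
    prefix = [0] + list(accumulate(bad))  # prefix[t] = mismatches before t
    return [c for c in candidates if prefix[c + m] == prefix[c]]
```

For the pattern `aba`, the candidates `[0, 2, 4]` are consistent; in the text `ababbba` the mismatch at position 4 kills candidates 2 and 4, leaving `[0]`.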
Dueling Algorithm
Data Structure: a list of potential candidates. R = rightmost element of that list; N = new element.
Case 1: N dies (R wins the duel).
Case 2: R dies; remove R from the list, and N duels the new rightmost element.
Case 3: no one dies; add N to the list of consistent candidates. Since N is consistent with R, and R is consistent with the rest of the list, by transitivity N is consistent with the list.
Witnesses Vishkin introduced the duel to choose between two candidates by checking the value of a witness. Alphabet-independent method.
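Dueling Paradigm [Vishkin 1985]. A duel chooses between two possible candidates by checking the value of a ‘witness.’ (figure: candidates i and j in text T, with a witness position where the two alignments expect different symbols, a and b)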
Witness Table. A witness table is a table of size |P| that stores, for each location of P, a location of conflict (with respect to the left candidate). (figure: P, its witness table, and candidates i and j in T)
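The duel and the candidate list (Cases 1-3 above) can be sketched in 1D as follows, assuming all start positions are candidates and are processed left to right. The witness table is built naively in O(m²) for brevity; `witness_table` and `duel_candidates` are hypothetical names. Here `wit[d]` is the conflict location for a right candidate d positions after the left one, in the left candidate's coordinates.

```python
def witness_table(pattern):
    """wit[d], for 0 < d < m: an index w (d <= w < m) with
    pattern[w] != pattern[w - d], i.e. a conflict between a left candidate and
    a right candidate d positions later, in the left candidate's coordinates.
    None means no conflict (d is a period). Naive O(m^2) construction."""
    m = len(pattern)
    wit = [None] * m
    for d in range(1, m):
        for w in range(d, m):
            if pattern[w] != pattern[w - d]:
                wit[d] = w
                break
    return wit


def duel_candidates(pattern, text):
    """Process start positions left to right, keeping a list of pairwise
    consistent candidates that still contains every true occurrence."""
    m, n = len(pattern), len(text)
    wit = witness_table(pattern)
    cands = []                    # cands[-1] plays the role of R
    for new in range(n - m + 1):  # new plays the role of N
        alive = True
        while alive and cands and new - cands[-1] < m:
            r = cands[-1]
            w = wit[new - r]
            if w is None:
                break             # Case 3: consistent, no one dies
            if text[r + w] == pattern[w]:
                alive = False     # Case 1: text agrees with R, so N dies
            else:
                cands.pop()       # Case 2: R dies; N duels the next R
        if alive:
            cands.append(new)
    return cands
```

Each iteration of the inner loop either pops an R or ends N's turn, so the number of duels is at most about twice the number of candidates. For the pattern `aba` and the text `abababa`, the surviving candidates are `[0, 2, 4]`.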
Dueling Method in 2D. How do we arrange for candidates to agree on overlap? Duel! When there is a conflict between two candidates, a single text check eliminates at least one candidate. The text location to check can be precomputed from the pattern, and by transitivity only a limited number of duels is needed. The dueling phase is thus linear time.
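A duel in 2 dimensions: Witness[3,3] = (4,3). (figure: two overlapping candidates on a grid indexed 1-4; at the witness location the two candidates expect different symbols, a and b)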
2-D Witness Table. A 2-D witness table is a table of size m², storing a witness for each location of P. (figure: a pattern P over {a, b} and its witness table, with entries such as (4,3), (4,2), (4,1))
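A naive sketch of the 2-D witness table, assuming an m × m pattern given as a list of strings and 0-based coordinates (the slides count from 1). Entry `wit[dr][dc]` covers a second candidate placed at location (dr, dc) of the first candidate's copy of P, i.e. below and/or to the right of it; candidates below and to the left would need analogous entries, which this sketch omits. The construction is O(m⁴) and only illustrates what the table stores. `witness_table_2d` is a hypothetical name.

```python
def witness_table_2d(pattern):
    """wit[dr][dc]: the first cell (i, j), in row-major order and in the
    upper-left candidate's coordinates, where P and P shifted by (dr, dc)
    disagree; None if they agree on their whole overlap (or dr = dc = 0)."""
    m = len(pattern)
    wit = [[None] * m for _ in range(m)]
    for dr in range(m):
        for dc in range(m):
            if dr == dc == 0:
                continue  # a candidate does not duel itself
            # The overlap is rows dr..m-1 and columns dc..m-1 of the first copy.
            wit[dr][dc] = next(
                ((i, j) for i in range(dr, m) for j in range(dc, m)
                 if pattern[i][j] != pattern[i - dr][j - dc]),
                None)
    return wit
```

For the 2 × 2 pattern `ab`, `ba`, the table is `[[None, (0, 1)], [(1, 0), None]]`: a shift by one row or one column conflicts, while the diagonal shift (1, 1) does not.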
2D Witnesses. Amir et al.: a 2D witness table can be used for linear-time and linear-space alphabet-independent 2D matching. The order of duels is significant; dueling is done in 2 waves: 1: duel within each column, bottom to top. 2: duel between columns, right to left.
First Truly 2D Algorithm: The Dueling Method (Amir-Benson-Farach 1991). Once the duels are over, all potential pattern “starts” agree on their overlap, i.e., all want to see the same symbol in every text location.
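A simplified sketch of reaching this state, assuming square inputs given as lists of strings and 0-based top-left corners. It keeps the duel itself (one text check per conflicting pair, decided by a witness) but swaps the two-wave order for plain pairwise duels in row-major order, and computes witnesses on demand (cached) for every offset, including candidates down and to the left. It therefore takes O(k²) duels for k candidates rather than linear time. `duel_2d` is a hypothetical name.

```python
from functools import lru_cache


def duel_2d(pattern, text):
    """Duel m x m candidates in an n x n text until all survivors are pairwise
    consistent. Returns the surviving top-left corners in row-major order;
    every true occurrence is among them.

    Simplification (not the slides' method): candidates duel pairwise in
    row-major order, O(k^2) duels for k candidates, instead of in two waves.
    """
    m, n = len(pattern), len(text)

    @lru_cache(maxsize=None)
    def witness(dr, dc):
        # First cell (i, j) of the upper candidate's copy of P where the
        # candidate shifted by (dr, dc), dr >= 0, expects a different symbol.
        for i in range(dr, m):
            for j in range(max(0, dc), min(m, m + dc)):
                if pattern[i][j] != pattern[i - dr][j - dc]:
                    return (i, j)
        return None  # the two copies agree on their whole overlap

    survivors = []
    for r in range(n - m + 1):
        for c in range(n - m + 1):
            alive = True
            for s in list(survivors):
                sr, sc = s
                dr, dc = r - sr, c - sc  # dr >= 0 in row-major order
                if dr >= m or abs(dc) >= m:
                    continue  # no overlap, nothing to duel over
                w = witness(dr, dc)
                if w is None:
                    continue  # consistent with s
                i, j = w
                if text[sr + i][sc + j] == pattern[i][j]:
                    alive = False  # the text sides with s: the new candidate dies
                    break
                survivors.remove(s)  # s expected the wrong symbol: s dies
            if alive:
                survivors.append((r, c))
    return survivors
```

A true occurrence never loses a duel: the text at the witness location matches what it expects, so its opponent is the one removed.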
Verification: Do a forward wave down the columns to label the starts of pattern rows. Do a forward wave along each row, beginning anew at each new pattern-row start, and label the positions of mismatch. Kill all candidates that contain a mismatch (using 2 similar backward waves).
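The verification steps can be sketched as below, assuming the candidates are top-left corners that are already pairwise consistent (for example, the survivors of the duels). The two forward waves follow the slide; for the killing step, 2-D prefix sums over the mismatch marks stand in for the two backward waves, which keeps the whole pass O(n²). `verify_2d` is a hypothetical name.

```python
def verify_2d(pattern, text, candidates):
    """Keep the candidates (top-left corners) that are true occurrences.

    Assumes the candidates are pairwise consistent, so every covered text
    cell has one expected symbol. Runs in O(n^2) for an n x n text.
    """
    m, n = len(pattern), len(text)
    is_cand = [[False] * n for _ in range(n)]
    for r, c in candidates:
        is_cand[r][c] = True

    # Forward wave down each column: row_start[R][c] is the pattern row that
    # starts at text cell (R, c), taken from the nearest candidate above (-1: none).
    row_start = [[-1] * n for _ in range(n)]
    for c in range(n):
        last = None
        for R in range(n):
            if is_cand[R][c]:
                last = R
            if last is not None and R - last < m:
                row_start[R][c] = R - last

    # Forward wave along each row, beginning anew at every row start:
    # mark cells whose text symbol differs from the expected one.
    bad = [[0] * n for _ in range(n)]
    for R in range(n):
        start = prow = None
        for x in range(n):
            if row_start[R][x] >= 0:
                start, prow = x, row_start[R][x]
            if start is not None and x - start < m:
                bad[R][x] = int(text[R][x] != pattern[prow][x - start])

    # Kill candidates whose window holds a mismatch; 2-D prefix sums stand in
    # for the two backward waves of the slides.
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for R in range(n):
        for x in range(n):
            S[R + 1][x + 1] = bad[R][x] + S[R][x + 1] + S[R + 1][x] - S[R][x]
    return [(r, c) for r, c in candidates
            if S[r + m][c + m] - S[r][c + m] - S[r + m][c] + S[r][c] == 0]
```

The wave down a column only needs the nearest candidate above each cell, and the wave along a row only the latest row start, because consistent candidates expect the same symbol wherever they overlap.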
Dueling Method … Time for checking every text element’s correctness: linear. Every candidate with an incorrect element in its range is eliminated. Method: the “wave”. Total time for the text phase: linear, O(n²), given the precomputed witness table.
2D Dictionary Matching. Suppose we are given a set of 2D patterns, called a dictionary. Goal: search for all patterns in the text simultaneously, in linear time. Bird/Baker can be extended if all patterns have uniform width. (How?)