1 Amihood Amir Bar-Ilan University and Georgia Tech UWSL 2006.

Presentation transcript:

1 Amihood Amir Bar-Ilan University and Georgia Tech UWSL 2006

2

3 Issues of Concern:
Local Errors:
- Occlusion
- Transmission and resolution
- Details
Scaling
Rotation
Integration of all the above issues

4 It seems daunting, but …

5 CPM 2003: Morelia, Mexico

6 Some History … String Matching – motivated by text editing. over alphabet

7 Historic Two Dimensional Model:

8 Bird-Baker Algorithm (1976) Time: for bounded fixed alphabets. for infinite alphabets. Technique: linearization.

9 Linearization Concatenate rows of Text (or pattern) and use string matching tools. In this case – The Aho and Corasick algorithm.

10 Find all pattern rows … then align them.
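The two steps above (find all pattern rows, then align them) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `find_2d` is hypothetical, and a plain dictionary lookup of every text substring stands in for the Aho and Corasick automaton, so the sketch does not achieve the running time of the original algorithm.

```python
def find_2d(text, pattern):
    """Return top-left (row, col) positions where pattern occurs in text.

    Both arguments are lists of equal-length strings (hypothetical layout).
    """
    m_rows, m_cols = len(pattern), len(pattern[0])

    # Step 1: give every distinct pattern row a small integer name.
    row_id = {}
    for row in pattern:
        row_id.setdefault(row, len(row_id))
    wanted = [row_id[row] for row in pattern]  # the pattern as a column of names

    # Step 2: for every text position, record which pattern row (if any)
    # starts there. The dictionary lookup is a stand-in for Aho-Corasick.
    n_rows, n_cols = len(text), len(text[0])
    width = n_cols - m_cols + 1
    names = [
        [row_id.get(text[i][j:j + m_cols], -1) for j in range(width)]
        for i in range(n_rows)
    ]

    # Step 3: align the rows. A match at (i, j) needs the names in column j,
    # rows i .. i + m_rows - 1, to spell out `wanted`.
    hits = []
    for j in range(width):
        for i in range(n_rows - m_rows + 1):
            if all(names[i + k][j] == wanted[k] for k in range(m_rows)):
                hits.append((i, j))
    return sorted(hits)
```

In the real algorithm step 3 is itself a one-dimensional string matching problem on each column of names, which is what keeps the whole procedure efficient.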

11 Another linearization: pad each pattern row of length m with n-m “don't cares”. Time: Fischer-Paterson (1972)

12 Advantages and Disadvantages of Model Pros: Can use known techniques. Cons: - Complexity degradation (e.g. extra log factor in exact matching). - Inherent difficulties in definitions (will be addressed later).

13 First Truly 2d Algorithm – The Dueling Method Idea: Assume the situation is: All potential pattern “starts” agree on overlap, i.e. all want to see the same symbol in every text location. (A-Benson-Farach 1991)

14 Dueling Method … Time for checking every text element's correctness: linear. Every candidate with an incorrect element in its range is eliminated. Method: The “wave”. Total Time:

15 Dueling Method … How do we arrange for candidates to agree on overlap? – duel! When there is a conflict between two candidates, a single text check eliminates at least one candidate. The text location can be pre-computed because of transitivity. The dueling phase is thus linear time.

16 Discrete Scaling (A-Landau-Vishkin 1990) In our limited model, the meaning of scaling is “blowing up” a symbol. Example: scaling a symbol A by 3 means a 3x3 matrix:
A A A
A A A
A A A
Scaling the matrix
X X X
X O X
X X X
by 2 gives:
X X X X X X
X X X X X X
X X O O X X
X X O O X X
X X X X X X
X X X X X X
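As a small illustration of discrete scaling, the sketch below blows every symbol of a matrix up into a k x k block. The name `scale_discrete` and the list-of-lists representation are assumptions made for this example only.

```python
def scale_discrete(matrix, k):
    """Scale a matrix by the integer factor k.

    Every symbol becomes a k x k block: each row is repeated k times and,
    inside a row, each symbol is repeated k times.
    """
    return [
        [symbol for symbol in row for _ in range(k)]
        for row in matrix
        for _ in range(k)
    ]


pattern = [list("XXX"), list("XOX"), list("XXX")]
for line in scale_discrete(pattern, 2):
    print(" ".join(line))
```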

17 Scaled Occurrences of Pattern in Text: figure showing the 3x3 pattern (X X X / X O X / X X X) and a text containing occurrences of it at scale 1, scale 2 and scale 3.

18 Discrete Scaling Algorithms A-Landau-Vishkin 90: Can find all discrete scales of pattern in linear time (alphabet dependent). A-Calinescu 94: Alphabet independent and dictionary linear-time discrete scaling algorithm.

19 Tools used: For comparing substrings in constant time: suffix trees and LCA (Weiner 1973, Harel-Tarjan 1984), or suffix arrays and LCP (Kärkkäinen-Sanders 2003). For computing the number of sub-row repetitions in constant time: range-minimum queries (Gabow-Bentley-Tarjan 1984).

20 How is it used? Do an LCA query to find out that the orange line occurs here. How many times does this line repeat? How is this done?

21 Construct an array of numbers where every location holds the length of the LCP of this row and the next: 0 k k k k k k k k k 0 0 0 k k k k k k k k k 0 0. To make sure that the orange line (of length k) appears throughout a range, the minimum number in that range has to be at least k.
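A minimal sketch of this idea, assuming rows are given as Python strings: the array of LCP lengths between consecutive rows is built by direct comparison (instead of suffix trees), and a sparse table answers range-minimum queries in constant time after preprocessing. All function names here are hypothetical.

```python
def lcp_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def lcp_array(rows):
    """Entry i is the LCP length of row i and row i + 1."""
    return [lcp_len(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]


def build_sparse_table(values):
    """table[j][i] is the minimum of values[i .. i + 2**j - 1]."""
    table = [list(values)]
    span = 1
    while 2 * span <= len(values):
        prev = table[-1]
        table.append([min(prev[i], prev[i + span])
                      for i in range(len(values) - 2 * span + 1)])
        span *= 2
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo .. hi] (inclusive) using two table lookups."""
    level = (hi - lo + 1).bit_length() - 1
    return min(table[level][lo], table[level][hi - (1 << level) + 1])


def line_repeats(table, start, end, k):
    """True if the first k symbols of row `start` repeat in rows start .. end."""
    if start == end:
        return True
    return range_min(table, start, end - 1) >= k


rows = ["xxoox", "aabbc", "aabbd", "aabbe", "zzzzz"]
table = build_sparse_table(lcp_array(rows))
print(line_repeats(table, 1, 3, 4))  # rows 1..3 all start with "aabb"
```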

22 How do we know what scale the orange line has? Run-length compression. Find the symbol part, then the repetition factor. Example: AAABBCCCCDAAAABBBBBBCA has symbol part ABCDABCA with repetition factors 3, 2, 4, 1, 4, 6, 1, 1. This idea led to the compressed matching paradigm …
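The split into a symbol part and repetition factors can be sketched with the standard library's `itertools.groupby`. The function names `run_length` and `expand` are chosen for this example only.

```python
from itertools import groupby


def run_length(row):
    """Split a row into its symbol part and its repetition factors."""
    symbols, counts = [], []
    for symbol, run in groupby(row):
        symbols.append(symbol)
        counts.append(len(list(run)))
    return "".join(symbols), counts


def expand(symbols, counts):
    """Inverse of run_length: rebuild the original row."""
    return "".join(s * c for s, c in zip(symbols, counts))


symbols, counts = run_length("AAABBCCCCDAAAABBBBBBCA")
print(symbols, counts)
```

If a row is a scaled pattern row, the symbol part matches the pattern row's symbol part and every repetition factor is the pattern's factor multiplied by the scale.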

23 Compressed Matching Suppose the text (and pattern?) are compressed. Examples: run-length of rows (fax). LZ78 of rows (gif). Find pattern in text without decompressing. A-Benson 92, A-Benson-Farach 94, A-Landau-Sokol 03(x2) This led to a decade of work in the stringology and data compression community.

24 Compressed Matching (very partial list from citeseer …)
Pattern Matching in Compressed Raster Images - Pajarola, Widmayer (1996)
Direct Pattern Matching on Compressed Text - de Moura, Navarro, Ziviani (1998)
A General Practical Approach to Pattern Matching over.. - Navarro, Raffinot (1998)
Randomized Efficient Algorithms for Compressed Strings: the.. - Gasieniec et al. (1996)
Approximate String Matching over Ziv-Lempel Compressed Text - Kärkkäinen, Navarro, Ukkonen (2000)
Pattern Matching Machine for Text Compressed Using Finite State.. - Takeda (1997)
…

25 Model Deficiencies. How do we scale to non-discrete sizes? (e.g. 1.35) How do we model rotations?

26 A Model of Digitization (Landau-Vishkin 1994) “Real-life” resolution is fine enough to be assumed continuous. This is dealt with by a discrete sampling of space done by, e.g., the camera. Figure labels: “Real life”, digitized sample.

27 Rotation (Fredriksson-Ukkonen 1998) Consider the text as a grid of pixels, each having a color. Consider the pattern as an m x m grid of pixels with colors. Assume the center of every pattern pixel has a “hole”. Lay the pattern grid on the text, with the center declared the “rotation pivot”.
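One way to picture this model is to compute, for a given angle, which text pixel is visible through the hole of each pattern pixel. The sketch below makes several assumptions that are not in the slides: text pixels are unit squares with integer corners, the pivot is given in text coordinates as (x, y), and the name `cells_under_holes` is hypothetical.

```python
import math


def cells_under_holes(m, theta, pivot):
    """Map each pattern pixel (row, col) to the text cell under its center.

    The m x m pattern is rotated by theta radians around its own center,
    which is laid on the text at the point `pivot` = (x, y).
    """
    px, py = pivot
    half = m / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    result = {}
    for r in range(m):
        for c in range(m):
            # Offset of this pattern pixel's center from the pattern center.
            dx = (c + 0.5) - half
            dy = (r + 0.5) - half
            # Rotate the offset and translate to text coordinates.
            x = px + dx * cos_t - dy * sin_t
            y = py + dx * sin_t + dy * cos_t
            # The text cell containing that point.
            result[(r, c)] = (math.floor(y), math.floor(x))
    return result


print(cells_under_holes(2, 0.0, (1.0, 1.0)))
```

Two angles give the same rotated pattern exactly when this mapping is the same for both, which is why only the angles at which some center crosses a grid line matter.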

Figures: a 7x7 text grid with cells labeled T[1,1] through T[7,7]; the rotation pivot of a 4x4 pattern; a 4x4 pattern laid over an 8x8 text.

36 Rotated Matching Algorithms Fredriksson-Ukkonen 1998: Filter. Good expected time. worst case. Fredriksson-Navarro-Ukkonen 2000: A-Butman-Crochemore-Landau-Schaps 2004: Proved that output size is A-Kapah-Tsur 2004:

37 A Taste of Handling Rotations Naïve idea: Try all possible rotated patterns. Examples: original, 19° rotation, 21° rotation, 26° rotation.

38 Proposed Solution Every rotated pattern can be found in the text using FFT in time If there are N rotated patterns the total time is N What is N?

39 Upper Bound There are m^2 pixels. Each pixel center crosses at most O(m) grid lines. Therefore there are O(m^3) different rotated patterns.


44 Lower Bound Could many points cross a gridline together? We will show: Lower Bound: Restriction: We consider only points in set P defined as follows.

45 Our Subset of Consideration: P is a subset of pattern coordinates Such that: 1) The coordinates are in quadrant I I 2) The coordinates are only the points (x,y) where x and y are co-prime

46 Key Lemma (A-Butman-Crochemore-Landau-Schaps 2003) it is impossible that and cross a grid line at the same rotation angle.

Figure: points (X1, Y1) and (X2, Y2), with O and Z marked.

48 How does it help? Theorem (Geometry): i.e.

49 Consider Schematically: shaded area. In shaded area there are points. So in there are at least points, i.e. points.

50 Each of the points in (the yellow area) crosses the grid times and no two of them cross together. Conclude: There are different rotated patterns.

51 Real Scaled Matching (A-Butman-Lewenstein-Porat 2003, A-Chencinski 2006) Assume the text and pattern grids are the unit scale. A scale up of the pattern increases the grid. The center of the underlying unit grid takes the color of the scaled pattern pixel under it.
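The rule that each unit-grid center takes the color of the scaled pattern pixel under it can be sketched as below. This is an illustration under assumptions: the name `scale_real` is hypothetical, the scaled pattern is anchored at the origin, and a unit cell belongs to the result only if its center lies strictly inside the scaled pattern, which may differ in boundary cases from the definition used in the cited papers.

```python
import math


def scale_real(pattern, s):
    """Scale a pattern (list of lists) by a real factor s >= 1.

    Unit cell (i, j) has its center at (i + 0.5, j + 0.5). It takes the
    color of the pattern pixel whose scaled copy covers that center.
    """
    rows, cols = len(pattern), len(pattern[0])
    # Number of unit cells whose center lies strictly inside the scaled pattern.
    out_rows = math.ceil(rows * s - 0.5)
    out_cols = math.ceil(cols * s - 0.5)

    def source(index, limit):
        # Pattern index under the center of unit cell `index`;
        # min() guards against floating-point rounding at the edge.
        return min(limit - 1, math.floor((index + 0.5) / s))

    return [
        [pattern[source(i, rows)][source(j, cols)] for j in range(out_cols)]
        for i in range(out_rows)
    ]


for line in scale_real([list("ab"), list("cd")], 1.5):
    print(" ".join(line))
```

For integer s this reduces to discrete scaling, where every symbol becomes an s x s block.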

52 Figure panels: the pattern; the pattern scaled continuously to 1.6; the pattern scaled continuously to 1.6 with superimposed unit grid; the pattern discretely scaled to 1.6.

53 Does this work? We tried it on “Lenna” …

54 Figure panels: Lenna at scale 1.3; original Lenna; Lenna at scale 2.

55 Lenna Today

56 Algorithm's running time For text size n x n and pattern size m x m:

57 The Future? Faster rotation: did not utilize pattern, did not utilize neighboring information. Faster scaling. The holy grail – INTEGRATION. Compressed Matching: lossy compressions.

58 THANK YOU