Geometric Matching on Sequential Data Veli Mäkinen AG Genominformatik Technical Fakultät Bielefeld Universität.


Stringology Haifa 2005 Geometric matching on sequential data

Introduction
• Motivation: To study problems in the intersection of geometry and stringology.
• Applications to time-series data.

Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, Cieliebak & Mäkinen, 2005).

1D point set matching under translations
• Two point sets A and B of sizes m and n.
• Problem 1a: Find the largest common point set of f(A) and B over translations f.
• Problem 1b: Find the largest common point set of f(A) and a contiguous subset of B.
• Let k be the number of unmatched points.

Example
[Figure: point sets B, A and f(A). Problem 1a: k=3. Problem 1b: k=1.]

Solutions
• Trivial in O(m^2 n log n) time.
• Easy in O(mn log m) time.
• Akutsu gives an O(k^3 + n log n) time solution.

Akutsu’s solution
• Use differential encoding for A and B.
• A’ = a_2 - a_1, a_3 - a_2, ..., a_m - a_{m-1} and B’ = b_2 - b_1, b_3 - b_2, ..., b_n - b_{n-1}.
• Construct the suffix tree T of A’#B’$.
• Preprocess T for LCA queries.
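
The first step above can be illustrated with a minimal sketch. It only shows why differential encoding is useful (the encoding does not change when a point set is translated); the suffix tree and LCA preprocessing are not implemented here. The function name `diff_encode` and the sample points are illustrative assumptions, not part of the talk.

```python
def diff_encode(points):
    """Return the gaps between consecutive points of a 1D point set.

    The points are sorted first, so the result corresponds to
    a_2 - a_1, a_3 - a_2, ..., a_m - a_{m-1}.
    """
    pts = sorted(points)
    return [b - a for a, b in zip(pts, pts[1:])]


# Hypothetical sample data.
A = [2, 5, 6, 10]
shifted = [a + 7 for a in A]  # the same point set translated by 7

print(diff_encode(A))        # [3, 1, 4]
print(diff_encode(shifted))  # [3, 1, 4], unchanged by the translation
```

Because the encoding is translation invariant, matching runs of points in A and B show up as equal substrings of A’ and B’, which is what the suffix tree is then used to find.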

Akutsu’s solution...
• Let Jump(a_i, b_j) = h, where h is the largest integer such that the points a_i, ..., a_{i+h-1} coincide with b_j, ..., b_{j+h-1} when a_i is translated onto b_j.
• Jump(a_i, b_j) can be computed in O(1) time.
[Figure: aligned runs a_i ... a_{i+h-1} and b_j ... b_{j+h-1}]
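
To make the definition concrete, here is a naive sketch of Jump that scans forward point by point. It takes O(h) time per query, not the O(1) obtained from the suffix tree with LCA queries, and it assumes that Jump counts the matching consecutive points starting at a_i and b_j. The name `jump`, the 0-based indexing and the sample data are assumptions made for illustration.

```python
def jump(A, B, i, j):
    """Naive Jump(a_i, b_j): length of the longest run of consecutive points
    A[i], A[i+1], ... that land exactly on B[j], B[j+1], ... once A[i] is
    translated onto B[j]. A and B are sorted lists; indices are 0-based.
    """
    t = B[j] - A[i]  # translation defined by matching A[i] with B[j]
    h = 0
    while i + h < len(A) and j + h < len(B) and A[i + h] + t == B[j + h]:
        h += 1
    return h


# Hypothetical sample data.
A = [1, 4, 6, 9]
B = [0, 10, 13, 15, 20]

print(jump(A, B, 0, 1))  # 3: points 1, 4, 6 map onto 10, 13, 15
print(jump(A, B, 0, 0))  # 1: only the anchoring pair itself matches
```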

Akutsu’s solution...
• Observation: One of the first k+1 points in both A and B must match.
• Each match defines a translation.
• For each translation, one needs at most k+1 queries to Jump() to find out whether there is a large enough overlap.

Akutsu’s solution...
• Theorem 1: Problem 1a can be solved in O(k^3 + n log n) time and Problem 1b in O(k^2 n + n log n) time.
• Akutsu also gives reductions from 2D/3D problems to 1D achieving good bounds.

Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, Cieliebak & Mäkinen, 2005).

Linear 1D point set matching
• Let us consider a generalization where we also allow scaling and noise.
• We search for the best linear mapping from point set A to point set B:
  - the maximum number of points of A should move ε-close to points of B.

Example
[Figure: point sets A and B]

Example...
[Figure: point sets A, B and f(A)]

Linear 1D point set matching...
• There is an optimum mapping such that two points of A are mapped at exactly ε-distance from some points of B.
• The first of these two point mappings fixes the translation; the second fixes the scale around the new origin defined by the translation.

Example
[Figure: point sets A, B and f(A), with intervals labeled 2ε]

Degenerate solution!
[Figure: point sets B, A and f(A), with intervals labeled 2ε]

One-to-one mapping
• To avoid the degenerate solution, one needs a better definition of the mapping searched for.
• Hence, we search for a mapping producing a maximum-size one-to-one matching between the points (Problem 2).
[Figure: f(A) and B, with intervals labeled 2ε]

Solving one-to-one case
• Consider a fixed translation and scale.
• Construct a bipartite graph having edges between points of f(A) and B that are within distance ε of each other.
• Solve the maximum matching problem on this graph.
[Figure: f(A) and B, with intervals labeled 2ε]
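
The three steps above can be sketched for one fixed mapping as follows. This is a minimal illustration, not the authors' implementation: it uses a simple augmenting-path matcher (Kuhn's algorithm) as a stand-in for whatever maximum matching algorithm is intended, and the names `apply_map`, `max_matching`, `eps` and the sample data are assumptions.

```python
def apply_map(A, scale, shift):
    """Apply the linear mapping f(a) = scale * a + shift to every point."""
    return [scale * a + shift for a in A]


def max_matching(fA, B, eps):
    """Size of a maximum one-to-one matching between points of fA and B,
    where a pair may be matched only if the points are within distance eps."""
    # Bipartite graph: adj[i] lists the indices of B close enough to fA[i].
    adj = [[j for j, b in enumerate(B) if abs(a - b) <= eps] for a in fA]
    match_of_b = [-1] * len(B)  # index of the fA point matched to each B point

    def augment(i, seen):
        # Try to match fA[i], re-routing earlier matches if necessary.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_of_b[j] == -1 or augment(match_of_b[j], seen):
                match_of_b[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(fA)))


# Hypothetical sample data.
A = [0, 1, 2]
B = [1.2, 2.9, 3.1, 8.0]

print(max_matching(apply_map(A, 2, 1), B, 0.25))  # 2
# A degenerate mapping (scale 0) collapses A onto the single point 3, but the
# one-to-one requirement still limits the matching to the two nearby B points.
print(max_matching(apply_map(A, 0, 3), B, 0.25))  # 2
```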

Solving one-to-one case...
• Repeating the algorithm on each relevant translation and scale gives the optimum solution.
• The overall time complexity is O((mn)^2 g(mn)), where g(x) is the complexity of the maximum matching algorithm on a graph with x edges.

Solving one-to-one case faster
• Consider a fixed translation, and sort the relevant scales from smallest to largest.
• Observation [Alt et al. 88]: The graph G_i corresponding to the ith scale differs from the graph G_{i-1} of the (i-1)th scale by one edge.
• The maximum matching on G_i can be found by searching for an augmenting path in G_{i-1} with one edge added or deleted.

Solving one-to-one case faster...
• Incremental computation gives an O((mn)^3) time solution.
• Theorem 2: Problem 2 can be solved in O((mn)^2 (m+n)) time.
• To obtain the result, we exploit the monotonicity of the match graph.

Staircase property
[Figure: f_i(A) and B]

Greedy algorithm is enough
[Figure: B and f_i(A)]
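
The greedy step survives here only as a figure, so the following is one plausible reading rather than the authors' procedure. It assumes that "greedy" means scanning both sorted point sets from left to right and matching the first compatible pair, which gives a maximum one-to-one matching for points on a line with a common tolerance, in O(m+n) time after sorting. The name `greedy_matching`, the parameter `eps` and the sample data are assumptions.

```python
def greedy_matching(fA, B, eps):
    """Size of a maximum one-to-one matching between 1D point sets fA and B,
    where matched points must be within distance eps of each other.

    Both sets are scanned in sorted order with two pointers.
    """
    fA, B = sorted(fA), sorted(B)
    i = j = size = 0
    while i < len(fA) and j < len(B):
        if abs(fA[i] - B[j]) <= eps:
            size += 1   # leftmost remaining points are compatible: match them
            i += 1
            j += 1
        elif fA[i] < B[j]:
            i += 1      # fA[i] is too far left to match B[j] or anything later
        else:
            j += 1      # B[j] is too far left to match fA[i] or anything later
    return size


# Hypothetical sample data.
B = [1.2, 2.9, 3.1, 8.0]
print(greedy_matching([1, 3, 5], B, 0.25))  # 2
print(greedy_matching([3, 3, 3], B, 0.25))  # 2
```

The two-pointer scan works because the compatible pairs form a monotone (staircase-shaped) region when both point sets are sorted, so no general augmenting-path search is needed.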

scale i => scale i+1
[Figure: B and f_{i+1}(A)]

scale i+1
[Figure: B and f_{i+1}(A)]

scale i+1 => scale i+2
[Figure: B and f_{i+2}(A)]

scale i+2
[Figure: B and f_{i+2}(A)]

Observation - open question
• Observation: With only translations and noise, we obtain O(mn(m+n)) time.
• The staircase matrix changes by only one cell when moving from one scale to another.
• Question: Can one update the greedy path incrementally?
• An O(1) solution for the above would imply that adding noise does not make the problem any harder.

Three problems
• 1D point set matching under translations (Akutsu, COCOON’04).
• 1D point set matching under translations, scaling and noise (Böcker & Mäkinen, EuroCG’05).
• 2D point set matching under translations (Ukkonen & Lemström & Mäkinen, Cieliebak & Mäkinen, 2005).

2D point set matching
[Figure: point sets B, A and f(A)]

Solutions
• Easy in O(mn log m) time by constructing the set of mn translation vectors, sorting it, and finding the most frequently repeated element.
• Also possible in O(mn) time by using a naive string matching type algorithm.
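
The first solution above amounts to letting every pair of points vote for a translation vector. The sketch below counts the votes with a hash table (`collections.Counter`) instead of sorting, which is a substitution made for brevity; the name `best_translation` and the sample points are assumptions.

```python
from collections import Counter


def best_translation(A, B):
    """Return (translation, count): the translation vector shared by the most
    point pairs (a, b) with a in A and b in B, and how many points it matches.
    A and B are non-empty collections of distinct (x, y) tuples."""
    votes = Counter((bx - ax, by - ay) for ax, ay in A for bx, by in B)
    return votes.most_common(1)[0]


# Hypothetical sample data: A translated by (5, 5) lies entirely inside B.
A = [(0, 0), (1, 2), (3, 1)]
B = [(5, 5), (6, 7), (8, 6), (9, 9)]

print(best_translation(A, B))  # ((5, 5), 3)
```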

Naive point set matching
[Figure: point sets A and B]
Remark: This is the fastest known algorithm for this problem!!

Restricted case?
• Would the problem become easier if there were no other points inside the area of matches?
[Figure: f(A)]

Restricted case?
• The restricted 1D case is extremely easy:
  - exact string matching on the differentially encoded sequences.
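
A minimal sketch of the restricted 1D case follows, assuming that "restricted" means A must match consecutive points of B with no other points of B in between. For brevity it compares gap sequences with a naive window scan; any linear-time exact string matching algorithm could take its place. The names `diff_encode` and `find_occurrences` and the sample data are assumptions.

```python
def diff_encode(points):
    """Gaps between consecutive points of a sorted 1D point set."""
    pts = sorted(points)
    return [b - a for a, b in zip(pts, pts[1:])]


def find_occurrences(A, B):
    """Indices j (into sorted B) at which the gap sequence of A occurs as a
    contiguous run of gaps in B, i.e. A translated onto B[j] matches
    consecutive points of B exactly."""
    a, b = diff_encode(A), diff_encode(B)
    return [j for j in range(len(b) - len(a) + 1) if b[j:j + len(a)] == a]


# Hypothetical sample data: A fits onto 10, 13, 14 in B.
A = [0, 3, 4]
B = [1, 2, 10, 13, 14, 20]

print(find_occurrences(A, B))  # [2]
```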

Easier on grid points

Easier on grid points...
• The problem becomes a special case of two-dimensional exact string matching.
• Can be solved in O(N^2) time on a text grid of size N × N and a pattern grid of size M × M.
• Notice that the run-length encoded representation of the rows of the matrix is of size O(n).

Easier on grid points...
• The algorithm of Amir & Landau & Sokol, 2002, for run-length compressed 2D search can be applied:
  - Time complexity O(M^2 + n). (Can this be reduced to O(m^2 + n)?)

What about Bird-Baker?
• Our idea for solving the problem is to modify the Bird-Baker algorithm to work directly on point sets.
• As a preliminary tool, we need an Aho-Corasick automaton that recognizes run-length encoded binary strings.

Run-length encoding
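
Since the original figure is not available, the following sketch shows one way the rows of a point set on a grid could be run-length encoded. It assumes each row is stored as the number of 0-cells preceding each 1-bit (trailing zeros are dropped), so the total encoding size equals the number of points. The name `rle_rows` and the sample points are assumptions.

```python
def rle_rows(points):
    """Run-length encode the rows of a 0/1 grid given by its 1-cells.

    `points` holds distinct (x, y) grid coordinates. The result maps each
    non-empty row y to a list with one entry per point in that row: the
    number of empty cells between it and the previous point (or the row start).
    """
    rows = {}
    for x, y in sorted(points, key=lambda p: (p[1], p[0])):
        rows.setdefault(y, []).append(x)
    return {
        y: [xs[0]] + [b - a - 1 for a, b in zip(xs, xs[1:])]
        for y, xs in rows.items()
    }


# Hypothetical sample data: row 0 is 1 0 0 1 1, row 1 is 0 0 1.
points = [(0, 0), (3, 0), (4, 0), (2, 1)]
encoded = rle_rows(points)

print(encoded)                                 # {0: [0, 2, 0], 1: [2]}
print(sum(len(r) for r in encoded.values()))   # 4, one entry per point
```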

Modified Aho-Corasick automaton
• Proposition: There is an automaton accepting a set of run-length encoded binary strings with the following properties:
  - O(m log m) construction time, where m is the number of 1-bits in the set.
  - Reading a fail-link in O(log m) time.
  - Scanning a string with n 1-bits in O(n log m) time.

Bird-Baker on point sets
• Now we can build our automaton on the rows of set A and scan the rows of set B with it.
• Let R be the set of positions where a row of A was accepted inside the rows of B.
• After sorting R by columns, we can test in O(|R|) time whether any column of R contains the correct sequence of accepting states.

Bird-Baker on point sets
• The overall running time is O(n log m + |R| log |R|).
• Unfortunately, there are examples where |R| = Ω(mn) :-(
• Hence, it is still open whether (even) the restricted case has an o(mn) solution.