Asynchronous Pattern Matching - Address Level Errors Amihood Amir Bar Ilan University 2010.

Motivation

Error in Content: Western Wall / Bar Ilan University. Error in Address: Bar Ilan University / Western Wall. (Figure: the two kinds of error, illustrated with images of the Western Wall and of Bar Ilan University.)

Motivation In the “old” days: Pattern and text are given in correct sequential order. It is possible that the content is erroneous. New paradigm: Content is exact, but the order of the pattern symbols may be scrambled. Why? Transmitted asynchronously? The nature of the application?

Example: Swaps. Tehse knids of typing mistakes are very common. So when searching for the pattern "These" we are seeking the symbols of the pattern, but with an order changed by swaps. Surprisingly, pattern matching with swaps is easier than pattern matching with mismatches (ACHLP:01).

Motivation: Biology. Reversals AAAGGCCCTTTGAGCCC AAAGAGTTTCCCGGCCC Given a DNA substring, a piece of it can detach and reverse. Question: What is the minimum number of reversals necessary to match a pattern and text?

Example: Transpositions AAAGGCCCTTTGAGCCC AATTTGAGGCCCAGCCC Given a DNA substring, a piece of it can be transposed to another area. Question: What is the minimum number of transpositions necessary to match a pattern?

Motivation: Architecture. Assume distributed memory. Our processor has the text and requests a pattern of length m. The pattern arrives in m asynchronous packets of the form <symbol, address>. Example: the pattern BCBAA arrives as the packets <B,0>, <C,1>, <B,2>, <A,3>, <A,4>, in arbitrary order.
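As a toy illustration of the packet model (the addresses here are simply the pattern's own indices, with no address errors yet), a sketch of reassembling the pattern from packets that arrive out of order:

```python
# (symbol, address) packets for the pattern BCBAA, in an arbitrary arrival order.
packets = [("A", 3), ("B", 0), ("A", 4), ("C", 1), ("B", 2)]

# With error-free addresses, reassembly is just writing each symbol to its slot.
pattern = [None] * len(packets)
for symbol, address in packets:
    pattern[address] = symbol

assert "".join(pattern) == "BCBAA"
```

The whole problem of this talk begins when the address field of a packet may arrive corrupted.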

What Happens if Address Bits Have Errors? In Architecture: 1. Checksums. 2. Error Correcting Codes. 3. Retransmits.

We would like… To avoid extra transmissions. For every text location compute the minimum number of address errors that can cause a mismatch in this location.

Our Model… Text: T[0],T[1],…,T[n]. Pattern: P[0]=<C[0],A[0]>, P[1]=<C[1],A[1]>, …, P[m]=<C[m],A[m]>; C[i] є ∑, A[i] є {1, …,m}. Standard pattern matching: no error in A. Asynchronous pattern matching: no error in C. Eventually: errors in both.

Address Register: log m bits, some of which may be "bad". What does "bad" mean? 1. A bit "flips" its value. 2. A bit sometimes flips its value. 3. A transient error. 4. A "stuck" bit. 5. A sometimes-"stuck" bit.

We will now concentrate on consistent bit flips. Example: Let ∑={a,b}.
T[0] T[1] T[2] T[3] = a a b b
P[0] P[1] P[2] P[3] = b b a a

Example: BAD. Writing the indices in binary, with no flipped bits (mask 00) the pattern P[00] P[01] P[10] P[11] = b b a a does not match T = a a b b.

Example: GOOD. Flipping both address bits (mask 11) rearranges P to a a b b, which matches T.

Example: BEST. Flipping only the most significant bit (mask 10) also rearranges P to a a b b — a match with fewer flipped bits.
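The three cases above can be checked mechanically: a mask j matches if T[i] = P[i XOR j] at every index. A small sketch for the slide's example (binary indices represented as Python integers; the function name is mine):

```python
def matches_with_mask(t, p, mask):
    """Does XOR-ing every pattern index with `mask` make p coincide with t?"""
    return all(t[i] == p[i ^ mask] for i in range(len(t)))

t = "aabb"  # T[00] T[01] T[10] T[11]
p = "bbaa"  # P[00] P[01] P[10] P[11]

# Masks 10 and 11 (decimal 2 and 3) both repair the match;
# the BEST one flips the fewest bits.
valid = [m for m in range(len(t)) if matches_with_mask(t, p, m)]
best = min(valid, key=lambda m: bin(m).count("1"))
```

Here `valid` comes out as the masks 10 and 11, and `best` is mask 10, matching the BEST slide.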

Naive Algorithm: For each of the 2^(log m) = m different bit combinations, try matching. Choose the match with the minimum number of flipped bits. Time: O(m^2).

In Pattern Matching: convolutions can be computed in time O(n log m) using the FFT.

What Really Happened? Sliding P over T produces the dot-products array C[-3], C[-2], C[-1], C[0], C[1], C[2], C[3]: each C[i] is the dot product of T[0] T[1] T[2] T[3] with P[0] P[1] P[2] P[3] shifted by i. (Animation: P slides across T, one offset per frame.)

Another way of defining the convolution: C[i] = ∑_j T[j]·P[j-i], where we define P[x]=0 for x outside {0,…,m}.

FFT solution to the "shift" convolution: 1. Compute the values of T and P, viewed as polynomials, at the roots of unity, in time O(m log m). 2. For polynomial multiplication, compute the values of the product polynomial at the roots of unity, in time O(m log m). 3. Compute the coefficients of the product polynomial, again in time O(m log m).
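The three steps above can be sketched end to end. A minimal self-contained radix-2 FFT computing all shifted dot products C[s] = ∑_j T[j]·P[j-s]; the function names are mine, and in practice a library FFT would be used:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def shift_dot_products(t, p):
    """All C[s] = sum_j t[j]*p[j-s], s = -(len(p)-1) .. len(t)-1, via FFT.

    Reversing p turns the correlation into a convolution, i.e. a
    polynomial product, which the FFT evaluates at the roots of unity."""
    size = 1
    while size < len(t) + len(p) - 1:
        size *= 2
    ft = fft([complex(x) for x in t] + [0j] * (size - len(t)))
    fp = fft([complex(x) for x in p[::-1]] + [0j] * (size - len(p)))
    prod = [x * y for x, y in zip(ft, fp)]
    inv = [v / size for v in fft(prod, invert=True)]
    return [round(inv[k].real) for k in range(len(t) + len(p) - 1)]
```

Evaluate, multiply pointwise, interpolate back: each stage is O(m log m), exactly as in the three steps above.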

A General Convolution C_f: given bijections f_j on the index set, j=1,…,O(m), define C_f[j] = ∑_i T[f_j(i)]·P[i].

Consistent bit flip as a Convolution: Construct a mask of length log m that has 0 in every bit except for the bad bits, where it has 1. Example: assume the bad bits are at indices i,j,k є{0,…,log m}; then the mask has 1 in positions i, j and k and 0 elsewhere. An exclusive OR (XOR) between the mask and a pattern index gives the target index.

Example: Mask 0010. XOR with the mask flips the second-lowest bit of every index (e.g. index 0101 maps to index 0111, and vice versa).

Our Case: For each of the 2^(log m) = m masks j є {0,1}^(log m), our convolution is C[j] = ∑_i T[i]·P[i XOR j].

To compute the minimum bit flip: Let T, P be over the alphabet {0,1}. For each j, P_j (defined by P_j[i] = P[i XOR j]) is a permutation of P. Thus only the j's for which the dot product T·P_j equals the number of 1's in T are valid flips, since for them all 1's match 1's and all 0's match 0's. Choose a valid j with the minimum number of 1's.
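A brute-force sketch of this selection rule (quadratic time; it assumes, as the slide implicitly does, that T and P contain the same number of 1's, since otherwise no exact match exists):

```python
def min_bit_flips(t, p):
    """Binary alphabet: mask with the fewest 1-bits whose XOR permutation
    of p matches t, or None if no mask works. O(m^2) brute force."""
    m = len(t)
    ones_in_t = sum(t)
    # A mask j is valid iff the dot product T . P_j equals the number of 1's in T:
    # then every 1 of T meets a 1, and (with equal 1-counts) every 0 meets a 0.
    valid = [j for j in range(m)
             if sum(t[i] * p[i ^ j] for i in range(m)) == ones_in_t]
    if not valid:
        return None
    return min(valid, key=lambda j: bin(j).count("1"))
```

On the running example (a=1, b=0: T = 1 1 0 0, P = 0 0 1 1) this returns mask 10, the BEST flip.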

Time: All m convolutions can be computed in time O(m^2), after preprocessing the permutation functions as tables. Can we do better? (As with the FFT, for example.)

Idea – Divide and Conquer – the Walsh Transform. 1. Split T and P into the length-m/2 arrays T+, T-, P+, P-. 2. Compute the convolutions of the smaller arrays. 3. Use their values to compute ours in time O(m). Time: recurrence t(m)=2t(m/2)+m; closed form t(m)=O(m log m).

Details – Constructing the Smaller Arrays. Note: a mask can also be viewed as a number i=0,…,m-1. For an array V = V[0],…,V[m-1]:
V+ = V[0]+V[1], V[2]+V[3], ..., V[m-2]+V[m-1]
V- = V[0]-V[1], V[2]-V[3], ..., V[m-2]-V[m-1]
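This divide-and-conquer is, in effect, the fast Walsh-Hadamard transform: by the XOR-convolution theorem, transforming T and P, multiplying pointwise, and transforming back yields every masked dot product C[j] = ∑_i T[i]·P[i XOR j] in O(m log m). A self-contained sketch (function names are mine):

```python
def wht(v):
    """Fast Walsh-Hadamard transform; len(v) must be a power of two.
    Each butterfly replaces a pair (x, y) by (x + y, x - y); the first
    level is exactly the V+ / V- split of adjacent entries above."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for k in range(i, i + h):
                x, y = v[k], v[k + h]
                v[k], v[k + h] = x + y, x - y
        h *= 2
    return v

def all_mask_dot_products(t, p):
    """C[j] = sum_i t[i] * p[i XOR j] for every mask j, in O(m log m)."""
    m = len(t)
    c = wht([a * b for a, b in zip(wht(t), wht(p))])
    return [x // m for x in c]  # the WHT is its own inverse up to division by m
```

Combined with the validity test of the previous slide, scanning the output for valid masks with minimum popcount solves the binary min-flip problem in O(m log m).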

Putting it Together: apply the split recursively, level by level. (Animation: the recursion tree is built up frame by frame.)

Why does it work?

Consider the case of i=0. We want the dot product of T = t0, t1 with P = p0, p1. We need a way to get it from the dot products of the smaller arrays: T+ = t0+t1 with P+ = p0+p1, and T- = t0-t1 with P- = p0-p1.

Lemma: Let T = a, c and P = b, d, so T+ = a+c, P+ = b+d, T- = a-c, P- = b-d. To get the dot product ab+cd from (a+c)(b+d) and (a-c)(b-d), add them:
(a+c)(b+d) = ab + cd + cb + ad
(a-c)(b-d) = ab + cd - cb - ad
The sum is 2ab+2cd; divide by 2 to get ab + cd. Because of distributivity this works for the entire dot product.

If the mask is 00001: with T = a, c and P = b, d as before (T+ = a+c, P+ = b+d, T- = a-c, P- = b-d), to get the dot product ad+cb from (a+c)(b+d) and (a-c)(b-d), subtract them:
(a+c)(b+d) = ab + cd + cb + ad
(a-c)(b-d) = ab + cd - cb - ad
The difference is 2cb+2ad; divide by 2 to get cb + ad. Because of distributivity this works for the entire dot product.
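Both directions of the lemma are plain integer identities and can be sanity-checked numerically over a range of values:

```python
# Both the aligned (mask 0) and the within-pair flipped (mask 00001) dot
# products are recoverable from the sum/difference products alone.
for a in range(-3, 4):
    for b in range(-3, 4):
        for c in range(-3, 4):
            for d in range(-3, 4):
                plus = (a + c) * (b + d)
                minus = (a - c) * (b - d)
                assert (plus + minus) // 2 == a * b + c * d  # add, then halve
                assert (plus - minus) // 2 == a * d + c * b  # subtract, then halve
```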

What happens when other bits are bad? If LSB=0, mask i0 on T x P reduces to mask i on T+ x P+ and on T- x P-; meaning, the "bad" bits now sit at half the index. What it means is that appropriate pairs are multiplied, and single products are extracted from pairs as seen in the lemma.

If the Least Significant Bit is 1: mask i1 on T x P reduces to mask i on the smaller arrays; meaning, the "bad" bit is at half the index, but there is an additional flip within pairs. What it means is that appropriate pairs are multiplied, and single products are extracted from pairs as seen in the lemma for the case of a flip within a pair.

General Alphabets
1. Sort all symbols in T and P.
2. Encode {0,…,m} in binary, i.e. log m bits per symbol.
3. Split into log m strings, one per bit position:

S = A0 A1 A2 ... Am, where each Ai = ai0 ai1 ai2 …

S0 = a00 a10 a20 ... am0
S1 = a01 a11 a21 ... am1
...
S_log m = a0,log m a1,log m a2,log m ... am,log m

General Alphabets
4. For each S_i, write the list of masks that achieves the minimum number of flips.
5. Merge the lists and look for masks that appear in all of them.
Time: O(m log m) per bit; O(m log^2 m) total.
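A brute-force sketch of the general-alphabet scheme (quadratic per bit rather than the O(m log m) convolution, for clarity; symbols are assumed already encoded as small integers, and the function name is mine):

```python
def min_flips_general(t, p):
    """Encode symbols in binary, solve each bit-plane as a binary instance,
    keep the masks that match in every plane, and return one with the
    fewest flipped bits (or None)."""
    m = len(t)
    bits = max(v.bit_length() for v in t + p) or 1
    common = set(range(m))  # masks still consistent with every plane so far
    for k in range(bits):
        tk = [(v >> k) & 1 for v in t]  # k-th bit-plane of the text
        pk = [(v >> k) & 1 for v in p]  # k-th bit-plane of the pattern
        ones = sum(tk)
        # Binary validity test from before: all 1's align and 1-counts agree.
        good = {j for j in common
                if sum(pk) == ones
                and sum(tk[i] & pk[i ^ j] for i in range(m)) == ones}
        common &= good
    if not common:
        return None
    return min(common, key=lambda j: bin(j).count("1"))
```

A mask repairs the whole pattern exactly when it repairs every bit-plane simultaneously, which is what the intersection in step 5 captures.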

Other Models 1. Minimum “bad” bits (occasionally flip). 2. Minimum transient error bits? 3. Consistent flip in string matching model? 4. Consistent “stuck” bit? 5. Transient “stuck” bit? Note: The techniques employed in asynchronous pattern matching have so far proven new and different from traditional pattern matching.

Thank You