Published by Omar Pile. Modified over 2 years ago.

1 Approximating Edit Distance in Near-Linear Time
Alexandr Andoni (MIT)
Joint work with Krzysztof Onak (MIT)

2 Edit Distance
For two strings $x, y \in \Sigma^n$:
ed(x,y) = minimum number of edit operations to transform x into y
Edit operations = insertion/deletion/substitution
Important in: computational biology, text processing, etc.
Example: ed(0101010, 1010101) = 2

3 Computing Edit Distance
Problem: compute ed(x,y) for given $x, y \in \{0,1\}^n$
Exactly:
  $O(n^2)$ [Levenshtein’65]
  $O(n^2/\log^2 n)$ for $|\Sigma| = O(1)$ [Masek-Paterson’80]
Approximately in $n^{1+o(1)}$ time:
  $n^{1/3+o(1)}$ approximation [Batu-Ergun-Sahinalp’06], improving over [Myers’86, BarYossef-Jayram-Krauthgamer-Kumar’04]
Sublinear time:
  distinguish $\mathrm{ed} \le n^{1-\varepsilon}$ vs $\mathrm{ed} \ge n/100$ in $n^{1-2\varepsilon}$ time [Batu-Ergun-Kilian-Magen-Raskhodnikova-Rubinfeld-Sami’03]
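As a point of reference for the quadratic-time exact bound, here is a minimal sketch of the textbook dynamic program for edit distance. It is the standard row-by-row formulation, not code from the talk, and the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact edit distance via the classic O(|x| * |y|) dynamic program."""
    # prev[j] = edit distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        cur = [i]  # transforming x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cx
                cur[j - 1] + 1,            # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # The example from the Edit Distance slide.
    print(edit_distance("0101010", "1010101"))
```

Only two rows of the table are kept at a time, so memory is linear even though time stays quadratic.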

4 Computing via embedding into $\ell_1$
Embedding: $f: \{0,1\}^n \to \ell_1$ such that $\mathrm{ed}(x,y) \approx \|f(x) - f(y)\|_1$ up to some distortion (= approximation)
Can compute ed(x,y) in the time it takes to compute f(x)
Best embedding by [Ostrovsky-Rabani’05]:
  distortion = $2^{\tilde{O}(\sqrt{\log n})}$
  Computation time: ~$n^2$, randomized (and similar dimension)
Helps for nearest neighbor search, sketching, but not computation…

5 Our result
Theorem: Can compute ed(x,y) in $n \cdot 2^{\tilde{O}(\sqrt{\log n})}$ time with $2^{\tilde{O}(\sqrt{\log n})}$ approximation
While it uses some ideas of the [OR’05] embedding, it is not an algorithm for computing the [OR’05] embedding

6 Sketcher’s hat
2 examples of “sketches” from embeddings…
[Johnson-Lindenstrauss]: pick a random k-subspace of $\mathbb{R}^n$; then for any $q_1, \dots, q_n \in \mathbb{R}^n$, if $\tilde{q}_i$ is the projection of $q_i$, then, w.h.p., $\|q_i - q_j\|_2 \approx \|\tilde{q}_i - \tilde{q}_j\|_2$ up to O(1) distortion, for $k = O(\log n)$.
[Bourgain]: given n vectors $q_i$, can construct n vectors $\tilde{q}_i$ of dimension $k = O(\log^2 n)$ such that $\|q_i - q_j\|_1 \approx \|\tilde{q}_i - \tilde{q}_j\|_1$ up to $O(\log n)$ distortion.
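To make the Johnson-Lindenstrauss example concrete, the following is a rough sketch of a random linear projection that approximately preserves Euclidean distances. It assumes a scaled Gaussian matrix as a stand-in for projection onto a random k-subspace (a common substitute, not something the slide specifies); the names `random_projection` and `l2` are illustrative.

```python
import math
import random
from typing import List


def random_projection(points: List[List[float]], k: int, seed: int = 0) -> List[List[float]]:
    """Map each point to k dimensions with a random Gaussian matrix.

    Assumption: entries are N(0, 1) scaled by 1/sqrt(k), so that squared
    Euclidean lengths are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)] for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def l2(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.uniform(-1, 1) for _ in range(500)] for _ in range(5)]
    proj = random_projection(pts, k=100, seed=2)
    # Ratios of projected to original distances should be close to 1.
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            print(i, j, l2(proj[i], proj[j]) / l2(pts[i], pts[j]))
```

The same matrix is applied to every point, which is what makes the sketch linear and data-independent.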

7 Our Algorithm
Let z = xy be the concatenation of x and y; z[i:i+m] denotes the length-m substring of z starting at position i.
For each length m in some fixed set $L \subseteq [n]$, compute vectors $v_i^m \in \ell_1$ such that $\|v_i^m - v_j^m\|_1 \approx \mathrm{ed}(z[i:i+m], z[j:j+m])$
Dimension of $v_i^m$ is only $O(\log^2 n)$
Vectors $\{v_i^m\}$ are computed recursively from $\{v_i^k\}$ corresponding to shorter substrings (smaller $k \in L$)
Output: $\mathrm{ed}(x,y) \approx \|v_1^{n/2} - v_{n/2+1}^{n/2}\|_1$ (i.e., for m = n/2 = |x| = |y|)

8 Idea: intuition
How to compute $\{v_i^m\}$ from $\{v_i^k\}$ for k<

9 Key step
Main Lemma: fix n vectors $v_i \in \ell_1^k$, of dimension $k = O(\log^2 n)$. Let s

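10 Proof of Main Lemma
“low” = $\log^{O(1)} n$
Graph-metric: shortest path on a weighted graph
Sparse: $\tilde{O}(n)$ edges
$\min_k M$ is a semi-metric on $M^k$ with “distance” $d_{\min,M}(x,y) = \min_{i=1..k} d_M(x_i, y_i)$
Chain of embeddings (distortion of each step in parentheses):
  EMD over n sets $A_i$
  → $\min_{\mathrm{low}} \ell_1^{\mathrm{high}}$ ($O(\log^2 n)$)
  → $\min_{\mathrm{low}} \ell_1^{\mathrm{low}}$ ($O(1)$)
  → $\min_{\mathrm{low}}$ tree-metric ($O(\log n)$)
  → sparse graph-metric ($O(\log^3 n)$)
  → $\ell_1^{\mathrm{low}}$ ($O(\log n)$, [Bourgain], efficient)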
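The min-product “distance” defined on this slide can be illustrated with a small sketch. The helper name `min_product_distance` and the choice of the real line as the base metric M are assumptions made for the example; the point it shows is why the slide calls this a semi-metric: the triangle inequality can fail.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def min_product_distance(x: Sequence[T], y: Sequence[T],
                         d: Callable[[T, T], float]) -> float:
    """d_min(x, y) = min over coordinates i of d(x_i, y_i)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    return min(d(a, b) for a, b in zip(x, y))


def line_metric(a: float, b: float) -> float:
    """Base metric M for the example: absolute difference on the real line."""
    return abs(a - b)


if __name__ == "__main__":
    x, y, z = (0, 0), (0, 5), (5, 5)
    # x and y agree in the first coordinate, y and z in the second,
    # but x and z differ by 5 in both coordinates.
    print(min_product_distance(x, y, line_metric))  # 0
    print(min_product_distance(y, z, line_metric))  # 0
    print(min_product_distance(x, z, line_metric))  # 5
```

Here d(x,z) = 5 exceeds d(x,y) + d(y,z) = 0, so the min-product is symmetric and non-negative but not a metric.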

11 Step 1
EMD over n sets $A_i$ → $\min_{\mathrm{low}} \ell_1^{\mathrm{high}}$, with $O(\log^2 n)$ distortion
q.e.d.

12 Step 2
Lemma 2: can embed an n-point set from $\ell_1^H$ into $\min_{O(\log n)} \ell_1^k$, for $k = \log^3 n$, with O(1) distortion.
Use weak dimensionality reduction in $\ell_1$
Thm [Indyk’06]: Let A be a random* matrix of size H by $k = \log^3 n$. Then for any x, y, letting $\tilde{x} = Ax$, $\tilde{y} = Ay$:
  no contraction: $\|\tilde{x} - \tilde{y}\|_1 \ge \|x - y\|_1$ (w.h.p.)
  5-expansion: $\|\tilde{x} - \tilde{y}\|_1 \le 5 \cdot \|x - y\|_1$ (with 0.01 probability)
Just use $O(\log n)$ of such embeddings
Their min is an O(1) approximation to $\|x - y\|_1$, w.h.p.
$\min_{\mathrm{low}} \ell_1^{\mathrm{high}}$ → $\min_{\mathrm{low}} \ell_1^{\mathrm{low}}$, with O(1) distortion
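The reason $O(\log n)$ repetitions suffice can be spelled out with a short calculation. It assumes the t matrices $A_1, \dots, A_t$ are drawn independently; the constant 300 is picked here only to make the arithmetic clean and is not taken from the talk.

```latex
\begin{align*}
\Pr\Big[\min_{j \le t} \|A_j x - A_j y\|_1 > 5\,\|x - y\|_1\Big]
  &\le (1 - 0.01)^t \le e^{-t/100}, \\
t = 300 \ln n \;\Longrightarrow\; e^{-t/100} &= n^{-3}, \\
\Pr\big[\text{some pair among at most } n^2 \text{ pairs fails}\big]
  &\le n^2 \cdot n^{-3} = \frac{1}{n}.
\end{align*}
```

Since each copy individually does not contract w.h.p., a union bound over the t copies shows the minimum does not contract either, so the minimum lies between $\|x - y\|_1$ and $5\,\|x - y\|_1$ for all pairs simultaneously.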

13 Efficiency of Step 1+2
From step 1+2, we get some embedding f() of sets $A_i = \{v_i, v_{i+1}, \dots, v_{i+s-1}\}$ into $\min_{\mathrm{low}} \ell_1^{\mathrm{low}}$
Naively would take $\Omega(n \cdot s) = \Omega(n^2)$ time to compute all $f(A_i)$
Save using linearity of sketches:
  f() is linear: $f(A) = \sum_{a \in A} f(a)$
  Then $f(A_i) = f(A_{i-1}) - f(v_{i-1}) + f(v_{i+s-1})$
  Compute $f(A_i)$ in order, for a total of $\tilde{O}(n)$ time
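The sliding-window update on this slide can be sketched as follows. The per-element map `f` is a placeholder supplied by the caller (the actual embedding from steps 1 and 2 is not reproduced here), and `sliding_sketches` is a name invented for the example; the only property used is that the sketch of a set is the sum of the sketches of its elements.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def _add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def _sub(u: Vector, v: Vector) -> Vector:
    return [a - b for a, b in zip(u, v)]


def sliding_sketches(points: Sequence[Vector], s: int,
                     f: Callable[[Vector], Vector]) -> List[Vector]:
    """Return f(A_i) for every window A_i = {points[i], ..., points[i+s-1]}.

    Uses f(A_i) = f(A_{i-1}) - f(v_{i-1}) + f(v_{i+s-1}), so each window
    after the first costs one subtraction and one addition of sketches.
    """
    n = len(points)
    if s <= 0 or s > n:
        return []
    images = [f(p) for p in points]  # one sketch per element, computed once
    current = images[0]
    for img in images[1:s]:
        current = _add(current, img)
    out = [current]
    for i in range(1, n - s + 1):
        current = _add(_sub(current, images[i - 1]), images[i + s - 1])
        out.append(current)
    return out


if __name__ == "__main__":
    pts = [[1, 2], [3, 4], [5, 6], [7, 8]]
    toy_f = lambda p: [2 * p[0] + p[1], p[0] - p[1]]  # hypothetical sketch map
    print(sliding_sketches(pts, 2, toy_f))
```

With n elements and window size s, this performs O(n) sketch additions instead of the O(n·s) needed to sum every window from scratch.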

14 Step 3
Lemma 3: can embed $\ell_1$ over $\{0..M\}^p$ into $\min_{\mathrm{low}}$ tree-metric, with $O(\log n)$ distortion.
For each Δ = a power of 2, take $O(\log n)$ random grids. Each grid gives a min-coordinate
$\min_{\mathrm{low}} \ell_1^{\mathrm{low}}$ → $\min_{\mathrm{low}}$ tree-metric, with $O(\log n)$ distortion

15 Step 4
Lemma 4: suppose we have n points in $\min_{\mathrm{low}}$ tree-metric, which approximates a metric up to distortion D. Can embed into a graph-metric of size $\tilde{O}(n)$ with distortion D.
$\min_{\mathrm{low}}$ tree-metric → sparse graph-metric, $O(\log^3 n)$

16 Step 5
Lemma 5: Given a graph with m edges, can embed the graph-metric into $\ell_1^{\mathrm{low}}$ with $O(\log n)$ distortion in $\tilde{O}(m)$ time.
Just implement [Bourgain]’s embedding:
  Choose $O(\log^2 n)$ sets $B_i$
  Need to compute the distance from each node to each $B_i$
  For each $B_i$ can compute its distance to each node using Dijkstra’s algorithm in $\tilde{O}(m)$ time
sparse graph-metric → $\ell_1^{\mathrm{low}}$, with $O(\log n)$ distortion
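A rough sketch of how the distance-to-set coordinates might be computed is given below. It assumes the usual sampling scheme for Bourgain-style embeddings (each node joins a set with probability $2^{-j}$ for scales j = 1..log n, with several repetitions per scale) and omits any normalization of coordinates; the function names and the adjacency-list graph format are choices made for this example, not details from the talk.

```python
import heapq
import math
import random
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(neighbor, edge weight)]


def distances_to_set(graph: Graph, sources: Iterable[int]) -> Dict[int, float]:
    """Multi-source Dijkstra: distance from every node to its nearest source."""
    dist = {v: math.inf for v in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def bourgain_style_embedding(graph: Graph, repetitions: Optional[int] = None,
                             seed: int = 0) -> Dict[int, List[float]]:
    """One coordinate per random set B: the coordinate of v is dist(v, B)."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    scales = max(1, math.ceil(math.log2(len(nodes))))
    reps = repetitions if repetitions is not None else scales
    coords: Dict[int, List[float]] = {v: [] for v in nodes}
    for j in range(1, scales + 1):
        for _ in range(reps):
            subset = [v for v in nodes if rng.random() < 2.0 ** -j]
            if not subset:
                continue  # an empty set carries no distance information
            dist = distances_to_set(graph, subset)  # one Dijkstra run per set
            for v in nodes:
                coords[v].append(dist[v])
    return coords


if __name__ == "__main__":
    path = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 2.0)],
            2: [(1, 2.0), (3, 3.0)], 3: [(2, 3.0)]}
    print(distances_to_set(path, [0, 3]))
    print(bourgain_style_embedding(path, seed=1))
```

Each set costs a single Dijkstra run because all of its members are inserted as sources at distance 0, which is what keeps the total at one $\tilde{O}(m)$ pass per set.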

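17 Summary of Main Lemma
Min-product helps to get low dimension (~small-size sketch)
  bypasses impossibility of dim-reduction in $\ell_1$
Ok that it is not a metric, as long as it is close to a metric
Chain of embeddings (distortion of each step in parentheses):
  EMD over n sets $A_i$
  → $\min_{\mathrm{low}} \ell_1^{\mathrm{high}}$ ($O(\log^2 n)$)
  → $\min_{\mathrm{low}} \ell_1^{\mathrm{low}}$ ($O(1)$)
  → $\min_{\mathrm{low}}$ tree-metric ($O(\log n)$)
  → sparse graph-metric ($O(\log^3 n)$)
  → $\ell_1^{\mathrm{low}}$ ($O(\log n)$)
Diagram labels: oblivious, non-oblivious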

18 Conclusion
Theorem: can compute ed(x,y) in $n \cdot 2^{\tilde{O}(\sqrt{\log n})}$ time with $2^{\tilde{O}(\sqrt{\log n})}$ approximation
