# Xiaoming Sun Tsinghua University David Woodruff MIT

The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences Xiaoming Sun Tsinghua University David Woodruff MIT

The Problem
Example stream: 4 3 7 3 1 1
Stream of elements $a_1, \dots, a_n \in \Sigma$. The algorithm is given one pass over the stream.
Problem: compute the longest increasing subsequence (LIS). In this case (3, 7) is an answer.

Previous Work
Let $k$ be the length of the LIS of the stream.
There exists an algorithm which computes the LIS with $O(k^2 \log |\Sigma|)$ space [LNVZ05]. There is a trivial $\Omega(k)$ lower bound.
Our first result: improve both bounds to a tight $\Theta(k^2 \log(|\Sigma|/k))$.

Our Lower Bound
Reduction from the indexing function: Alice holds $x \in \{0,1\}^n$, Bob holds $i \in [n] = \{1, 2, \dots, n\}$, and Bob must answer: what is $x_i$?
The randomized 1-way communication complexity is $\Omega(n)$.

What is $x_i$? Alice has $x \in \{0,1\}^n$ and Bob has $i \in [n] = \{1, 2, \dots, n\}$.
Alice constructs a stream $A$. Bob constructs a stream $B$.
1. From $\mathrm{LIS}(A, B)$, Bob can get $x_i$.
2. $|\mathrm{LIS}(A, B)| = k$, where $k$ is an input parameter.

Alice, holding $x \in \{0,1\}^n$, builds the stream $A$. (Figure: value against position in stream, showing the blocks $A_{k-1}, \dots, A_2, A_1$.)
Alice uses $x$ to create $k-1$ increasing sequences $A_1, \dots, A_{k-1}$. For each $j$, $A_j$ has length $j$. Each bit of $x$ is encoded in some sequence $A_j$.
Every element in $A_{k-1}$ is larger than every element in $A_{k-2}$, every element in $A_{k-2}$ is larger than every element in $A_{k-3}$, and so on.
Set $A = A_{k-1}, \dots, A_2, A_1$.

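Bob, holding $i \in [n]$, builds the stream $B$. (Figure: value against position in stream, showing $A_{j+1}$, $A_j$, $A_{j-1}$ and $B$.)
Bob uses $i$ to identify $A_j$, the sequence encoding $x_i$.
Bob creates an increasing sequence $B$ of length $k-j$. Every element in $B$ is greater than every element of $A_r$ if $r < j$, and less than every element of $A_r$ if $r > j$. $B$ also lies above $A_j$, so that it can extend $A_j$.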

What is $x_i$? (Figure: value against position in stream for $A = A_{k-1}, \dots, A_2, A_1$ followed by $B$, showing $A_{j+1}$, $B$, $A_j$, $A_{j-1}$.)
$\mathrm{LIS}(A, B) = A_j, B$, and $|\mathrm{LIS}(A, B)| = k$.
But $x_i$ is encoded in $A_j$, so Bob recovers $x_i$.
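The reduction above can be checked on a toy instance. The following is a minimal sketch, not the authors' construction in full: it assumes a hypothetical value layout in which block j occupies the values from `j * width` up to `(j + 1) * width`, with the lower half used by Alice's sequence and the upper half reserved for Bob. The names `lis`, `alice_stream`, `bob_stream` and `width` are illustrative, and how the bits of x select each sequence is left out.

```python
def lis(stream):
    """Return one longest strictly increasing subsequence (quadratic DP)."""
    n = len(stream)
    if n == 0:
        return []
    best = [1] * n      # best[i]: length of the longest run ending at i
    prev = [-1] * n     # back-pointers for reconstruction
    for i in range(n):
        for p in range(i):
            if stream[p] < stream[i] and best[p] + 1 > best[i]:
                best[i] = best[p] + 1
                prev[i] = p
    end = max(range(n), key=best.__getitem__)
    out = []
    while end != -1:
        out.append(stream[end])
        end = prev[end]
    return out[::-1]


def alice_stream(seqs):
    """seqs[j] is the increasing sequence of length j, for j = 1..k-1.

    Alice emits the sequences in the order k-1, ..., 2, 1.
    """
    k = len(seqs) + 1
    return [v for j in range(k - 1, 0, -1) for v in seqs[j]]


def bob_stream(j, k, width):
    """Bob's k-j increasing values, placed in the upper half of block j.

    Assumed layout: they sit above sequence j and below sequence j+1.
    """
    base = j * width + width // 2
    return [base + t for t in range(k - j)]


# Toy instance with k = 4 and width = 10.
seqs = {1: [12], 2: [21, 24], 3: [30, 32, 33]}
A = alice_stream(seqs)
B = bob_stream(2, 4, 10)
recovered = lis(A + B)[:2]   # the first j = 2 elements are Alice's sequence 2
```

In this layout any increasing subsequence can use elements of only one of Alice's sequences, so the unique longest one is sequence j followed by all of Bob's values, and Bob reads off sequence j as its first j elements.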

Thus, any streaming algorithm must use $\Omega(n)$ space.
But what is $n$? We need to construct $k-1$ increasing sequences that are different for different $x \in \{0,1\}^n$.
Assume $|\Sigma|$ is large. Divide $\Sigma$ into $k-1$ blocks of size $|\Sigma|/(k-1)$. Let $A_j$ be a random increasing sequence of length $j$ in block $j$.
The space to represent $A_j$ is $\Omega(k \log(|\Sigma|/k))$ for $j > k/2$.
Set $n = \Omega(k^2 \log(|\Sigma|/k))$.
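One way to see where this value of n comes from is a counting argument. This is a sketch under stated assumptions rather than the authors' exact calculation: it writes $|\Sigma|$ for the alphabet size and $M$ for the block size, treats $M$ as an integer, and reads "large" as $|\Sigma| \ge k^3$.

```latex
\begin{align*}
M &= \frac{|\Sigma|}{k-1}, \qquad
\log_2 \binom{M}{j} \;\ge\; j \log_2 \frac{M}{j}
  \;\ge\; \frac{k}{2} \log_2 \frac{|\Sigma|}{k^2}
  \quad \text{for } \tfrac{k}{2} < j \le k-1, \\
\log_2 \frac{|\Sigma|}{k^2} &\ge \tfrac{1}{2} \log_2 \frac{|\Sigma|}{k}
  \quad \text{when } |\Sigma| \ge k^3, \\
\sum_{k/2 < j \le k-1} \log_2 \binom{M}{j}
  &= \Omega\!\left(k^2 \log \frac{|\Sigma|}{k}\right).
\end{align*}
```

The first line counts the increasing sequences of length j that fit in one block. The last line sums over the roughly k/2 longest sequences, which gives the number of bits of x that the sequences can carry.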

Our Upper Bound
When processing the stream, keep lists A[1], A[2], …, A[k]. A[j] is an increasing subsequence of length j in the stream with minimal last element.
Let L[1], L[2], …, L[k] be the last elements of A[1], A[2], …, A[k].
To process item x, find i for which L[i] < x < L[i+1], and replace A[i+1] with A[i], x.
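The update rule above can be sketched as follows. This is an illustrative version with no space compression: `lists` and `lasts` are hypothetical names for the lists A[1..k] and their last elements L[1..k], stored with 0-based indices, and binary search is used to find i.

```python
from bisect import bisect_left


def streaming_lis(stream):
    """One pass over the stream; returns a longest increasing subsequence."""
    lists = []   # lists[j-1] plays the role of A[j]
    lasts = []   # lasts[j-1] plays the role of L[j]; strictly increasing
    for x in stream:
        i = bisect_left(lasts, x)   # exactly i lists end in a value < x
        if i < len(lasts) and lasts[i] == x:
            continue                # x equals L[i+1], so nothing improves
        extended = (lists[i - 1] if i > 0 else []) + [x]
        if i == len(lists):
            # x extends the longest list seen so far
            lists.append(extended)
            lasts.append(x)
        else:
            # x is a smaller last element for length i+1
            lists[i] = extended
            lasts[i] = x
    return lists[-1] if lists else []


result = streaming_lis([4, 3, 7, 3, 1, 1])   # the example stream from the talk
```

On the example stream 4 3 7 3 1 1 this returns [3, 7].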

So we have k arrays A[1], …, A[k], each of length at most k.
Naively, this takes $O(k^2 \log |\Sigma|)$ space. But each A[i] is increasing, so we can compress the lists by storing differences. Total space is $O(k^2 \log(|\Sigma|/k))$.
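Storing differences can be sketched as below. The talk does not say which integer code is used for the gaps, so the Elias gamma code here is an assumption chosen for illustration, as are the function names. The sketch assumes the values are positive integers in strictly increasing order, so every gap is at least 1.

```python
from itertools import accumulate


def delta_encode(increasing):
    """Store the first value followed by the gaps between neighbours."""
    return [b - a for a, b in zip([0] + increasing[:-1], increasing)]


def delta_decode(gaps):
    """Invert delta_encode by taking prefix sums."""
    return list(accumulate(gaps))


def gamma_bits(d):
    """Length in bits of the Elias gamma code of a positive integer d."""
    return 2 * (d.bit_length() - 1) + 1


def encoded_bits(increasing):
    """Total bits used when every gap is written in Elias gamma."""
    return sum(gamma_bits(d) for d in delta_encode(increasing))


gaps = delta_encode([3, 7, 8, 20])   # [3, 4, 1, 12]
bits = encoded_bits([3, 7, 8, 20])   # 3 + 5 + 1 + 7 = 16
```

The gaps of a list of length j sum to at most the alphabet size, so by concavity of the logarithm the total is about j times the log of (alphabet size / j) bits instead of j times the log of the alphabet size.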

This talk First result: a tight space bound for the LIS problem
Second result: tight bounds for longest common subsequence (LCS)

LCS Bounds
Problem: Alice has a permutation $\sigma$ of $[N]$, Bob has a permutation $\tau$ of $[N]$. Decide if $|\mathrm{LCS}(\sigma, \tau)| \ge k$.
Previous space bound: $\Omega(k)$ [LNVZ05].
Our space bound: $\Omega(N)$ for $3 \le k \le N/2$ (holds for randomized $O(1)$-pass algorithms).

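LCS Bounds
Why can we only prove $\Omega(N)$ for $3 \le k \le N/2$?
If $k = 2$, the problem reduces to an equality test: $|\mathrm{LCS}(\sigma, \tau)| < 2$ only when $\tau$ is the reverse of $\sigma$.
If $k$ is large, there are at most $O(N^{2(N-k)})$ permutations $\tau$ with $|\mathrm{LCS}(\sigma, \tau)| > k$, so just use an equality test with error $O(1/N^{2(N-k)})$.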

Our Lower Bound
Padding lemma: if for $k = 3$ the randomized communication complexity is $\Omega(N)$, then it is $\Omega(N)$ for all $k \le N/2$.
Proof: just pad each of the inputs by some common subsequence of length $k-3$.
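One possible reading of the padding step is sketched below. It is an assumption, not necessarily the authors' exact padding: both parties append the same k-3 fresh symbols, in increasing order, to the end of their permutations, which raises the LCS length by exactly k-3. The names `lcs_len`, `pad` and `padding_adds_exactly` are illustrative.

```python
from itertools import permutations


def lcs_len(a, b):
    """Length of the longest common subsequence (standard DP, one row kept)."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for j, v in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if u == v else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def pad(perm, k):
    """Append the k-3 fresh symbols n+1, ..., n+k-3 in increasing order."""
    n = len(perm)
    return list(perm) + list(range(n + 1, n + k - 2))


def padding_adds_exactly(n, k):
    """Check on all pairs of permutations of [n] that padding adds k-3."""
    perms = list(permutations(range(1, n + 1)))
    return all(
        lcs_len(pad(s, k), pad(t, k)) == lcs_len(s, t) + (k - 3)
        for s in perms
        for t in perms
    )
```

With this padding, deciding whether the padded pair has LCS length at least k is the same as deciding whether the original pair has LCS length at least 3.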

It remains to show high complexity for $k = 3$. We reduce from disjointness.
Alice has $x \in \{0,1\}^n$ and Bob has $y \in \{0,1\}^n$. Is there an $i$ such that $x_i = y_i = 1$?
The randomized multi-way communication complexity is $\Omega(n)$.

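Is there an $i$ such that $x_i = y_i = 1$? Alice has $x \in \{0,1\}^{N/3}$ and Bob has $y \in \{0,1\}^{N/3}$.
Alice constructs $\sigma$. Bob constructs $\tau$.
Want $|\mathrm{LCS}(\sigma, \tau)| \ge 3$ iff $x$ and $y$ are not disjoint, that is, iff there is some $i$ with $x_i = y_i = 1$.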

Divide $1, \dots, N$ into $N/3$ groups
Alice has $x \in \{0,1\}^{N/3}$ and sets $\sigma = \sigma_1, \sigma_2, \dots, \sigma_{N/3}$.
The groups are $G_1 = (1, 2, 3)$, $G_2 = (4, 5, 6)$, …, $G_{N/3} = (N-2, N-1, N)$.
Use $x$ to choose $\sigma_1, \dots, \sigma_{N/3}$, where $\sigma_i$ acts on $G_i = (m+1, m+2, m+3)$.
If $x_i = 0$, $\sigma_i(m+1, m+2, m+3) = (m+1, m+2, m+3)$.
If $x_i = 1$, $\sigma_i(m+1, m+2, m+3) = (m+1, m+3, m+2)$.

Divide $1, \dots, N$ into $N/3$ groups
Bob has $y \in \{0,1\}^{N/3}$ and sets $\tau = \tau_{N/3}, \dots, \tau_1$.
The groups are $G_1 = (1, 2, 3)$, $G_2 = (4, 5, 6)$, …, $G_{N/3} = (N-2, N-1, N)$.
Use $y$ to choose $\tau_1, \dots, \tau_{N/3}$, where $\tau_i$ acts on $G_i = (m+1, m+2, m+3)$.
If $y_i = 0$, $\tau_i(m+1, m+2, m+3) = (m+3, m+2, m+1)$.
If $y_i = 1$, $\tau_i(m+1, m+2, m+3) = (m+1, m+3, m+2)$.

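(Figure: Alice's blocks $\sigma_1(G_1), \sigma_2(G_2), \sigma_3(G_3), \dots, \sigma_{N/3}(G_{N/3})$ against Bob's blocks $\tau_{N/3}(G_{N/3}), \dots, \tau_3(G_3), \tau_2(G_2), \tau_1(G_1)$.)
Claim: $|\mathrm{LCS}(\sigma, \tau)| \le 3$. Proof: use the fact that $\mathrm{LCS}(\sigma, \tau)$ intersects at most one $G_i$.
Claim: $|\mathrm{LCS}(\sigma, \tau)| = 3$ iff there is some $i$ with $x_i = y_i = 1$. Proof: use the way we defined $\sigma_i$ and $\tau_i$.
Thus, we can decide disjointness, so $\Omega(N)$ communication is required.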
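Both claims can be checked by brute force on small inputs. The sketch below builds Alice's and Bob's permutations from the group rules given on the previous slides and tests every pair of bit vectors. The function names are illustrative, and the groups are indexed from 0 in the code, so group i starts at m = 3i.

```python
from itertools import product


def lcs_len(a, b):
    """Length of the longest common subsequence (standard DP, one row kept)."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for j, v in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if u == v else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def alice_perm(x):
    """Groups in the order 1, ..., N/3; a 1 bit swaps the last two elements."""
    out = []
    for i, bit in enumerate(x):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if bit else [m + 1, m + 2, m + 3]
    return out


def bob_perm(y):
    """Groups in the order N/3, ..., 1; a 0 bit reverses the group."""
    out = []
    for i in reversed(range(len(y))):
        m = 3 * i
        out += [m + 1, m + 3, m + 2] if y[i] else [m + 3, m + 2, m + 1]
    return out


def claims_hold(groups):
    """Check both claims for every x, y in {0,1}^groups."""
    for x in product([0, 1], repeat=groups):
        for y in product([0, 1], repeat=groups):
            length = lcs_len(alice_perm(x), bob_perm(y))
            intersect = any(a and b for a, b in zip(x, y))
            if length > 3 or (length == 3) != intersect:
                return False
    return True
```

Within one group the LCS length is 1 when both bits are 0, 2 when exactly one bit is 1, and 3 when both bits are 1. Because Bob lists the groups in reverse order, a common subsequence cannot mix groups.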

Other results
Tight space bounds for computing the LIS length.
Generalization to approximate LIS and LCS. Still many gaps here.
Example: for approximating the LIS length, we have $\Omega(1/\epsilon)$ and $O(k \log |\Sigma|)$. Recent work [GJKK07] has shown $O(\sqrt{N/\epsilon} \log |\Sigma|)$, but there is still a large gap.

Conclusion
First result: a tight bound for the LIS.
Second result: an $\Omega(N)$ space bound for the LCS $k$-decision problem for $3 \le k \le N/2$.
Other results for approximation problems.
Another open question: extend our lower bound for LIS to the randomized multi-round setting.