Xiaoming Sun Tsinghua University David Woodruff MIT


The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences
Xiaoming Sun (Tsinghua University), David Woodruff (MIT)

The Problem
Example stream: 4 3 7 3 1 1
Stream of elements a_1, …, a_n ∈ Σ
Algorithm is given one pass over the stream
Problem: compute the longest increasing subsequence (LIS); in this example an answer is (3,7)

Previous Work
Let k be the length of the LIS of the stream.
There exists an algorithm which computes the LIS with O(k^2 log |Σ|) space [LNVZ05].
Trivial Ω(k) lower bound.
Our first result: improve both bounds to a tight Θ(k^2 log(|Σ|/k)).

Our Lower Bound
Reduction from the indexing function: Alice holds x ∈ {0,1}^n, Bob holds i ∈ [n] = {1, 2, …, n}, and Bob must answer "What is x_i?"
Randomized 1-way communication is Ω(n).

What is x_i? Alice holds x ∈ {0,1}^n, Bob holds i ∈ [n] = {1, 2, …, n}.
Alice constructs a stream A. Bob constructs a stream B.
1. From LIS(A, B), Bob can get x_i.
2. |LIS(A, B)| = k, where k is an input parameter.

Alice: x ∈ {0,1}^n
[Figure: the stream A = A_{k-1}, …, A_2, A_1, plotted as value against position in stream.]
Alice uses x to create k-1 increasing sequences A_1, …, A_{k-1}.
For each j, A_j has length j. Each bit of x is encoded in some sequence A_j.
Every element in A_{k-1} is larger than every element in A_{k-2}, every element in A_{k-2} is larger than every element in A_{k-3}, etc.
Set A = A_{k-1}, …, A_2, A_1.

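Bob: i ∈ [n]
[Figure: B plotted as value against position in stream; its values lie above A_j and A_{j-1} and below A_{j+1}.]
Bob uses i to identify A_j, the sequence encoding x_i.
Bob creates an increasing sequence B of length k-j.
Every element in B is greater than every element of A_r if r ≤ j (so B can extend A_j), and every element in B is less than every element of A_r if r > j.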

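What is x_i? Alice holds x ∈ {0,1}^n, Bob holds i ∈ [n].
[Figure: the combined stream A = A_{k-1}, …, A_2, A_1 followed by B, plotted as value against position in stream; B lies above A_j and A_{j-1} and below A_{j+1}.]
LIS(A, B) = A_j, B, and |LIS(A, B)| = k.
But x_i is encoded in A_j, so Bob recovers x_i.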
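
The reduction above can be tried end to end with a toy encoding. The sketch below is only an illustration: the block width `4*k`, the rule "the t-th element of A_j is `j*4k + 2t + bit`", and the helper names `locate`, `alice_stream`, `bob_stream`, and `recover_bit` are assumptions made here, not the construction from the talk. This toy encoding stores only one bit per element, so it illustrates the shape of the argument rather than the log factor in the bound.

```python
import bisect

def lis(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tails, tails_pos = [], []          # smallest last value per length, and its position
    prev = [-1] * len(seq)             # back-pointers for reconstruction
    for pos, v in enumerate(seq):
        m = bisect.bisect_left(tails, v)
        prev[pos] = tails_pos[m - 1] if m > 0 else -1
        if m == len(tails):
            tails.append(v)
            tails_pos.append(pos)
        else:
            tails[m] = v
            tails_pos[m] = pos
    out, pos = [], (tails_pos[-1] if tails_pos else -1)
    while pos != -1:
        out.append(seq[pos])
        pos = prev[pos]
    return out[::-1]

def locate(i, k):
    """Map a 0-based bit index i to (j, t): bit i is the t-th bit held by A_j."""
    for j in range(1, k):
        if i < j:
            return j, i
        i -= j
    raise IndexError("bit index too large for this k")

def alice_stream(x, k):
    """Toy encoding (assumed): A_j lives in block [j*4k, j*4k + 2j), one bit per element."""
    assert len(x) == k * (k - 1) // 2
    w, start, blocks = 4 * k, 0, {}
    for j in range(1, k):
        blocks[j] = [j * w + 2 * t + x[start + t] for t in range(j)]
        start += j
    # Emit A_{k-1}, ..., A_2, A_1: later blocks have smaller values.
    return [v for j in range(k - 1, 0, -1) for v in blocks[j]]

def bob_stream(i, k):
    """B has length k-j and sits above A_j but below A_{j+1}."""
    j, _ = locate(i, k)
    return [j * 4 * k + 2 * k + s for s in range(k - j)]

def recover_bit(a, i, k):
    """Bob reads x_i off the LIS of A followed by B."""
    j, t = locate(i, k)
    best = lis(a + bob_stream(i, k))
    assert len(best) == k              # LIS is A_j followed by B
    return best[t] - j * 4 * k - 2 * t
```

In an actual streaming reduction Alice would send the algorithm's memory state after reading A rather than A itself; concatenating the two lists here only checks that the LIS has the claimed shape.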

Thus, any streaming algorithm must use Ω(n) space. But what is n?
We need to construct k-1 increasing sequences that are different for different x in {0,1}^n.
Assume |Σ| is large. Divide Σ into k-1 blocks of size |Σ|/(k-1).
Let A_j be a random increasing sequence of length j in block j.
The space to represent A_j is Ω(k log(|Σ|/k)) for j > k/2.
Set n = Ω(k^2 log(|Σ|/k)).
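
One way to see where the bound on n comes from is a counting argument. The derivation below is a sketch under an assumption added here, namely that "|Σ| large" can be read as |Σ| ≥ k^3; floors and ceilings are ignored, and m denotes the block size.

```latex
\[
m = \frac{|\Sigma|}{k-1}, \qquad
\log_2 \binom{m}{j} \;\ge\; j \log_2 \frac{m}{j}
\;\ge\; \frac{k}{2} \log_2 \frac{|\Sigma|}{(k-1)^2}
\qquad \text{for } \tfrac{k}{2} < j \le k-1 .
\]
\[
|\Sigma| \ge k^3 \;\Longrightarrow\;
\log_2 \frac{|\Sigma|}{k^2} \;\ge\; \frac{1}{2} \log_2 \frac{|\Sigma|}{k},
\qquad\text{so}\qquad
n \;\ge\; \sum_{k/2 < j \le k-1} \frac{k}{4} \log_2 \frac{|\Sigma|}{k}
\;=\; \Omega\!\left(k^2 \log \frac{|\Sigma|}{k}\right).
\]
```

The first line counts the increasing sequences of length j inside one block; the second sums the resulting number of bits over the roughly k/2 longest sequences.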

Our Upper Bound
When processing the stream, keep lists A[1], A[2], …, A[k].
A[j] is an increasing subsequence of length j in the stream with minimal last element.
Let L[1], L[2], …, L[k] be the last elements of A[1], A[2], …, A[k].
To process item x, find i for which L[i] < x < L[i+1], and replace A[i+1] with A[i] followed by x.
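
A minimal Python sketch of this update rule follows. The function name `streaming_lis` and the handling of the boundary cases (x smaller than every L[i], x larger than every L[i], or x equal to some L[i]) are assumptions made for the illustration; the slide does not spell them out.

```python
import bisect

def streaming_lis(stream):
    """One pass over the stream; returns one longest strictly increasing subsequence.

    lists[m] is an increasing subsequence of length m+1 with minimal last element,
    and lasts[m] is that last element (A[m+1] and L[m+1] in the slide's notation).
    """
    lists, lasts = [], []
    for x in stream:
        # First position whose last element is >= x, so lasts[m-1] < x <= lasts[m].
        m = bisect.bisect_left(lasts, x)
        candidate = (lists[m - 1] if m > 0 else []) + [x]
        if m == len(lists):
            lists.append(candidate)    # x extends the longest list so far
            lasts.append(x)
        else:
            lists[m] = candidate       # same length, last element no larger
            lasts[m] = x
    return lists[-1] if lists else []
```

On the example stream 4 3 7 3 1 1 this returns [3, 7]. The sketch stores the lists explicitly and so uses the naive space; it does not apply any compression.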

So we have k arrays A[1], …, A[k], each of length at most k.
Naively, this takes O(k^2 log |Σ|) space.
But the A[i] are increasing, so we can compress each list by storing differences.
Total space is O(k^2 log(|Σ|/k)).
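
The difference trick can be sketched as follows. The use of the Elias gamma code to price each gap is an assumption made here for illustration (the slide does not name a code), and the helper names are hypothetical. Elements are assumed to be positive integers.

```python
from itertools import accumulate

def to_gaps(increasing):
    """Store the first element, then the difference to each next element."""
    return [increasing[0]] + [b - a for a, b in zip(increasing, increasing[1:])]

def from_gaps(gaps):
    """Invert to_gaps by taking prefix sums."""
    return list(accumulate(gaps))

def gamma_bits(d):
    """Length in bits of the Elias gamma code of a positive integer d."""
    return 2 * d.bit_length() - 1

def encoded_bits(increasing):
    """Total bits to store the list as gamma-coded gaps."""
    return sum(gamma_bits(d) for d in to_gaps(increasing))
```

Because the gaps of one list sum to at most |Σ|, and the logarithm is concave, at most k gaps cost O(k log(|Σ|/k)) bits in total, which is where the per-list bound comes from.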

This talk
First result: a tight space bound for the LIS problem
Second result: tight bounds for longest common subsequence (LCS)

LCS Bounds
Problem: Alice has a permutation π of [N], Bob has a permutation σ of [N]. Decide if |LCS(π, σ)| ≥ k.
Previous space bound: Ω(k) [LNVZ05]
Our space bound: Ω(N) for 3 ≤ k ≤ N/2 (holds for randomized O(1)-pass algorithms)

LCS Bounds
Why can we only prove Ω(N) for 3 ≤ k ≤ N/2?
If k = 2, the problem reduces to an equality test.
If k is large, there are at most O(N^{2(N-k)}) permutations σ with |LCS(π, σ)| > k, so just use an equality test with error O(1/N^{2(N-k)}).

Our Lower Bound
Padding lemma: if for k = 3 the randomized communication complexity is Ω(N), then it is Ω(N) for all k ≤ N/2.
Proof: just pad each of the inputs by some common subsequence of length k-3.

It remains to show high complexity for k = 3. We reduce from disjointness.
Alice holds x ∈ {0,1}^n, Bob holds y ∈ {0,1}^n. Is there an i such that x_i = y_i = 1?
Randomized multi-way communication is Ω(n).

Is there an i such that x_i = y_i = 1?
Alice holds x ∈ {0,1}^{N/3} and constructs π. Bob holds y ∈ {0,1}^{N/3} and constructs σ.
Want |LCS(π, σ)| ≥ 3 iff x and y are not disjoint, that is, iff there is an i with x_i = y_i = 1.

Alice: x ∈ {0,1}^{N/3}, π = π_1, π_2, …, π_{N/3}
Divide 1, …, N into N/3 groups G_1 = (1, 2, 3), G_2 = (4, 5, 6), …, G_{N/3} = (N-2, N-1, N).
Use x to choose π_1, …, π_{N/3}; π_i acts on G_i = (m+1, m+2, m+3), where m = 3(i-1).
If x_i = 0, π_i(m+1, m+2, m+3) = (m+1, m+2, m+3).
If x_i = 1, π_i(m+1, m+2, m+3) = (m+1, m+3, m+2).

Bob: y ∈ {0,1}^{N/3}, σ = σ_{N/3}, …, σ_1
Divide 1, …, N into N/3 groups G_1 = (1, 2, 3), G_2 = (4, 5, 6), …, G_{N/3} = (N-2, N-1, N).
Use y to choose σ_1, …, σ_{N/3}; σ_i acts on G_i = (m+1, m+2, m+3), where m = 3(i-1).
If y_i = 0, σ_i(m+1, m+2, m+3) = (m+3, m+2, m+1).
If y_i = 1, σ_i(m+1, m+2, m+3) = (m+1, m+3, m+2).

[Figure: π = π_1(G_1), π_2(G_2), π_3(G_3), …, π_{N/3}(G_{N/3}) shown alongside σ = σ_{N/3}(G_{N/3}), …, σ_3(G_3), σ_2(G_2), σ_1(G_1).]
Claim: |LCS(π, σ)| ≤ 3.
Proof: Use the fact that LCS(π, σ) intersects at most one G_i.
Claim: |LCS(π, σ)| = 3 iff there is some i with x_i = y_i = 1.
Proof: Use the way we defined π_i and σ_i.
Thus, we can decide disjointness, so Ω(N) communication.
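
The two claims can be checked mechanically on small inputs. The sketch below builds π and σ from x and y as defined on the preceding slides and computes the LCS length with the textbook dynamic program; the function names are hypothetical, and the quadratic-time LCS routine is only a checking aid, not part of the communication argument.

```python
def lcs_length(a, b):
    """Classic dynamic program for the length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for j, v in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if u == v else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def alice_perm(x):
    """pi: groups G_1, ..., G_{N/3} in order; swap the last two entries when x_i = 1."""
    pi = []
    for i, bit in enumerate(x):
        m = 3 * i
        pi += [m + 1, m + 3, m + 2] if bit else [m + 1, m + 2, m + 3]
    return pi

def bob_perm(y):
    """sigma: groups in reverse order; reversed group when y_i = 0, Alice's swap when y_i = 1."""
    sigma = []
    for i in reversed(range(len(y))):
        m = 3 * i
        sigma += [m + 1, m + 3, m + 2] if y[i] else [m + 3, m + 2, m + 1]
    return sigma

def sets_intersect(x, y):
    """Decide whether some i has x_i = y_i = 1, using only the LCS length."""
    return lcs_length(alice_perm(x), bob_perm(y)) == 3
```

For example, with x = (1, 0) and y = (1, 1) the permutations are π = 1 3 2 4 5 6 and σ = 4 6 5 1 3 2, whose longest common subsequence 1 3 2 has length 3.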

Other results
Tight space bounds for computing the LIS length.
Generalization to approximate LIS and LCS. Still many gaps here.
Example: for approximate LIS length, we have Ω(1/ε) and O(k log |Σ|). Recent work [GJKK07] has shown O(sqrt(N/ε) log |Σ|), but there is still a large gap.

Conclusion
First result: a tight bound for the LIS
Second result: an Ω(N) space bound for the LCS k-decision problem for 3 ≤ k ≤ N/2
Other results for approximation problems
Another open question: extend our lower bound for LIS to randomized multi-round