Turnstile Streaming Algorithms Might as Well Be Linear Sketches

Yi Li (Max-Planck), Huy L. Nguyen (Princeton), David Woodruff (IBM Almaden)

Turnstile Streaming Model
- Underlying n-dimensional vector x initialized to 0^n
- Long stream of updates x ← x + e_i or x ← x - e_i for standard unit vector e_i
- At end of the stream, x ∈ {-m, -m+1, …, m-1, m}^n for some bound m ≤ poly(n)
- Output an approximation to f(x) whp
- Goal: use as little space (in bits) as possible

Example: Euclidean Norm
- Want to output Z with (1-ε) |x|_2 ≤ Z ≤ (1+ε) |x|_2
- Let r = 1/ε^2
- Choose an r x n matrix A of i.i.d. sign random variables (+1 w.pr. ½, -1 w.pr. ½)
- Maintain Ax in the stream
- Output |Ax|_2
- Proof: Johnson-Lindenstrauss Lemma
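The Euclidean norm example can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the seed, the toy stream, and the explicit 1/sqrt(r) scaling of |Ax|_2 are assumptions added here (the slide leaves the normalization implicit).

```python
import math
import random


def make_sign_matrix(r, n, seed=0):
    """Return an r x n matrix of i.i.d. +1/-1 entries, chosen before the stream."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]


def apply_update(sketch, A, i, delta):
    """Process x <- x + delta * e_i by adding delta times column i of A to Ax."""
    for row in range(len(A)):
        sketch[row] += delta * A[row][i]


def estimate_norm(sketch):
    """Estimate |x|_2 as |Ax|_2 / sqrt(r); the scaling is an assumption of this sketch."""
    r = len(sketch)
    return math.sqrt(sum(v * v for v in sketch) / r)


n, r = 20, 400
A = make_sign_matrix(r, n, seed=1)
sketch = [0] * r
x = [0] * n  # kept only to check the sketch; a streaming algorithm would not store x

# Hypothetical turnstile stream of (coordinate, +1 or -1) updates.
stream = [(3, +1), (3, +1), (7, -1), (0, +1), (7, -1), (3, -1)]
for i, delta in stream:
    apply_update(sketch, A, i, delta)
    x[i] += delta

true_norm = math.sqrt(sum(v * v for v in x))
estimate = estimate_norm(sketch)
```

Because each update only adds a column of A, the maintained vector always equals Ax for the current x, regardless of the order of updates.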

Generic Features
- Algorithm for 2-norm has the following form:
  - Choose a random matrix A independent of x
  - Maintain Ax in the stream
  - Output a function of Ax
- Question (?!): does the optimal algorithm for approximating any function in the turnstile model have this form?
- All known algorithms have this form
- Some functions f(x) may be weird: what is x_{x_1}?

Our Result
- Yes, up to a factor of log n
- Theorem: for computing a relation f for x in {-m, -m+1, …, m}^n in the turnstile model, there is a correct (whp) algorithm which:
  - samples an integer matrix A uniformly from O(n log m) hardwired matrices with poly(n) bounded integer entries, independent of x,
  - outputs a function of Ax
- Logarithm of the number of states of Ax, for x in {-m, -m+1, …, m}^n, plus amount of randomness, is optimal up to a log n factor

Consequences: Lower Bound Technique
- Alice holds a ∈ {0,1}^n and creates stream s(a); Bob holds b ∈ {0,1}^n and creates stream s(b)
1. Alice runs Alg on s(a) and transmits the state of Alg(s(a)) to Bob
2. Bob computes Alg(s(a), s(b))
3. If Bob solves g(a,b), space complexity of Alg is at least the 1-way communication complexity of g

Consequences
- Alice holds a ∈ {0,1}^n and creates stream s(a); Bob holds b ∈ {0,1}^n and creates stream s(b)
- Our main theorem implies: if players can solve g(a,b), then space of Alg is at least the simultaneous communication complexity of g
- Weaker public-coin model in which Alice and Bob simultaneously send a message to a referee
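To see why a linear sketch yields a simultaneous protocol, note that A(u + v) = Au + Av, so a referee can combine two independently computed sketches. The following is a toy sketch under stated assumptions: g is taken to be equality of a and b, s(a) inserts the coordinates of a, s(b) deletes the coordinates of b, and all function names and the seed are hypothetical.

```python
import random


def shared_matrix(r, n, seed):
    """Public coins: both players derive the same r x n sign matrix from the seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]


def sketch(A, vec):
    """Compute A * vec, the state held after a stream whose net update is vec."""
    return [sum(a * v for a, v in zip(row, vec)) for row in A]


def alice_message(A, a):
    # Assumed stream s(a): insert every coordinate i with a_i = 1.
    return sketch(A, a)


def bob_message(A, b):
    # Assumed stream s(b): delete every coordinate i with b_i = 1.
    return sketch(A, [-v for v in b])


def referee(msg_a, msg_b):
    """By linearity msg_a + msg_b = A(a - b); report equality iff it is the zero vector."""
    combined = [u + v for u, v in zip(msg_a, msg_b)]
    return all(c == 0 for c in combined)


A = shared_matrix(32, 8, seed=7)
a = [1, 0, 1, 1, 0, 0, 1, 0]
b_same = list(a)
b_diff = [1, 0, 1, 0, 0, 0, 1, 0]  # differs from a in exactly one coordinate

equal_case = referee(alice_message(A, a), bob_message(A, b_same))
diff_case = referee(alice_message(A, a), bob_message(A, b_diff))
```

Neither player sees the other's message, which is what distinguishes this setting from the one-way protocol where Bob receives Alice's state.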

Non-Uniformity Restriction
- Careful wording: “samples an integer matrix A uniformly from O(n log m) hardwired matrices, with poly(n) bounded entries, independent of x”
- Algorithm is non-uniform
- Output of each state for each A also hardwired
- Alternatively, allow algorithm to use more space to process a stream update, provided it only retains Ax and its randomness
  - Regenerate A during each stream update

Comment on the Model
- For each random seed, algorithm is a deterministic automaton with a finite number of states
- Main theorem only requires correctness for x ∈ {-m, -m+1, …, m}^n
  - It counts the number of states as x varies in this range
- While processing the stream, may have |x|_∞ > m
  - The algorithm can’t abort if this happens. It must still be correct at the end of the stream for x in {-m, -m+1, …, m}^n

Related Work
- Ganguly
  - Specific to heavy hitters problem
  - Holds only for deterministic algorithms

Talk Outline
- Proof Overview
  - Reduction to path-independent automata
  - From path-independent automata to linear sketches
- Applications and Open Questions

Stream Automaton for Fixed Randomness
- Streaming algorithm only depends on x, not how it got there
- [Figure: automaton with a Start state and transitions labeled +e_1, -e_1, +e_2, +e_5, …, +e_n, -e_n; 0^n appears in two different states]

Path-Independent Automaton
- Each x ∈ Z^n in a unique state
- Undirected connected graph
- Goal: for each fixing of the randomness, can we modify the automaton to make it path-independent?
- Rules out, e.g., an algorithm that stores the last 5 stream updates
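The distinction can be made concrete with two toy automata, sketched below under our own assumptions: one whose state is exactly the current vector x, and one that additionally remembers the last 5 updates. The helper `freq` (the net update of a stream) and all other names are hypothetical and introduced only for illustration.

```python
def freq(stream, n):
    """Net update of a stream of (i, delta) pairs with delta in {+1, -1}."""
    x = [0] * n
    for i, delta in stream:
        x[i] += delta
    return tuple(x)


def run(step, start, stream):
    """Feed the stream through a transition function and return the final state."""
    state = start
    for update in stream:
        state = step(state, update)
    return state


def step_counts(state, update):
    """Path-independent toy automaton: the state is exactly the current vector x."""
    i, delta = update
    x = list(state)
    x[i] += delta
    return tuple(x)


def step_with_history(state, update, k=5):
    """Path-dependent toy automaton: also remembers the last k updates."""
    x, history = state
    return step_counts(x, update), (history + (update,))[-k:]


n = 3
s1 = [(0, +1), (1, +1), (0, -1)]
s2 = [(1, +1)]  # same net update as s1, reached along a different path

zero = (0, 0, 0)
same_state = run(step_counts, zero, s1) == run(step_counts, zero, s2)
history_differs = run(step_with_history, (zero, ()), s1) != run(step_with_history, (zero, ()), s2)
```

Both streams have the same net update, so the first automaton ends in the same state on either, while the second ends in two different states for the same x.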

Strategy
- For stream σ, freq(σ) ∈ Z^n is “net update” to each coordinate
- Idea:
  1. if in a state s, and update by a stream σ with freq(σ) = 0, answers ought to be similar
  2. collapse all states s, s’ for which s+σ = s’ and freq(σ) = 0 for some stream σ
- Intuitively makes things path-independent
- Issue: how to formally define states, transition and output function of new automaton?

Zero-Frequency Graph
- Directed multi-graph G = (V,E)
- V = states of old automaton A_old (for fixed randomness)
- (s,t) ∈ E for each stream σ of finite length with s+σ = t and freq(σ) = 0
- Terminal equivalence class: strongly connected component with no outgoing edge
- Walk in G eventually reaches a terminal equivalence class
  - (Walk in G is a long sequence of zero-streams)
- States of new automaton A_new = terminal equivalence classes
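A terminal equivalence class is a strongly connected component with no edge leaving it. The following sketch finds such classes in a small hand-made directed graph by plain reachability; the graph, the node names, and the function names are hypothetical, and the edges merely stand in for zero-frequency streams of some automaton.

```python
def reachable(graph, start):
    """All nodes reachable from start by a directed walk (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def terminal_classes(graph):
    """Strongly connected components that have no outgoing edge."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    reach = {u: reachable(graph, u) for u in nodes}
    classes = set()
    for u in nodes:
        # The component of u: nodes that reach u and are reachable from u.
        component = frozenset(v for v in reach[u] if u in reach[v])
        # Terminal iff nothing outside the component is reachable from it.
        if reach[u] == component:
            classes.add(component)
    return classes


# Hypothetical zero-frequency graph on five states.
G = {
    "s0": ["s1"],
    "s1": ["s2", "s3"],
    "s2": ["s3"],
    "s3": ["s2"],
    "s4": ["s4"],
}
classes = terminal_classes(G)
```

In this toy graph any walk from s0 or s1 ends up inside the class {s2, s3} and can never leave it, which mirrors the claim that a long walk eventually reaches a terminal equivalence class.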

New Transition Function
- Suppose in terminal equivalence class C
- Given an update e_i
- Let v ∈ C be an arbitrary node
- Compute v+e_i using transition function of A_old
- Walk from v+e_i until reach terminal equivalence class C’
- C’ is unique
  - Does not depend on choice of v
  - Only one terminal equivalence class reachable in any walk

[Figure: terminal equivalence classes with nodes u, v, w, x, y; transitions labeled -e_i, +e_i, +e_i; streams σ, σ’ with freq(σ) = 0 and freq(σ’) = 0]
Contradiction: zero frequency path w-x-v-y

Output Function of A_new
- In each terminal equivalence class C, sample node u from the stationary distribution of a random walk in C (add self-loops)
- Output of A_new on C = output of A_old on u
- If v is starting vertex of A_old, take a random walk in G from v; let starting vertex of A_new be the terminal equivalence class C reached
- Why is it correct?
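The sampling step can be illustrated on a tiny class. This is a rough sketch under our own assumptions: the walk moves to a uniformly random out-neighbour (with one self-loop added per node), the stationary distribution is approximated by power iteration, and the three-node cycle and all names are hypothetical.

```python
import random


def lazy_walk_stationary(component, graph, iterations=200):
    """Power iteration for the walk on `component` with a self-loop added at every node."""
    nodes = sorted(component)
    dist = {u: 0.0 for u in nodes}
    dist[nodes[0]] = 1.0
    for _ in range(iterations):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            # Out-neighbours inside the class, plus the added self-loop.
            targets = [v for v in graph.get(u, ()) if v in component] + [u]
            for v in targets:
                nxt[v] += dist[u] / len(targets)
        dist = nxt
    return dist


def sample_output_node(dist, rng):
    """Draw the node u whose A_old output would be reported for this class."""
    nodes = sorted(dist)
    return rng.choices(nodes, weights=[dist[u] for u in nodes], k=1)[0]


# Hypothetical terminal class: a directed 3-cycle.
C = {"u", "v", "w"}
graph = {"u": ["v"], "v": ["w"], "w": ["u"]}

stationary = lazy_walk_stationary(C, graph)
chosen = sample_output_node(stationary, random.Random(0))
```

The self-loops make the walk aperiodic, so the iteration converges; on this symmetric cycle the limit is the uniform distribution.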

Correctness
- Let Π be an arbitrary distribution on streams σ
- Choose fixed randomness so A_old correct on Π’:
  - Long sequence of zero frequency streams,
  - Followed by σ sampled from Π,
  - Followed by long sequence of zero frequency streams
- Output of A_new on Π statistically close to output of A_old on Π’
- => for every Π there is an A_new correct on Π
- Can show A_new is path-independent (lying a little..)

Path Independence to Linear Sketches
- M = {x ∈ Z^n such that x in same state as 0^n}
- States of automaton are the cosets of M, i.e., the elements of Z^n/M
- Use lattice tools…
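A small example may help with the coset picture. The automaton below is hypothetical and chosen only for illustration: it works on Z^2 and its state is (x_1 + 2 x_2) mod 5. The checks are run over a finite box of vectors rather than all of Z^2.

```python
from itertools import product


def state(x):
    """Hypothetical path-independent automaton on Z^2: state is (x_1 + 2*x_2) mod 5."""
    return (x[0] + 2 * x[1]) % 5


def in_M(x):
    """M = vectors that land in the same state as the zero vector."""
    return state(x) == state((0, 0))


box = list(product(range(-4, 5), repeat=2))

# M is closed under negation and addition, as a subgroup (lattice) of Z^2 should be.
closed = all(
    in_M((-x[0], -x[1])) and in_M((x[0] + y[0], x[1] + y[1]))
    for x in box if in_M(x)
    for y in box if in_M(y)
)

# Two vectors share a state exactly when their difference lies in M.
cosets_match = all(
    (state(x) == state(y)) == in_M((x[0] - y[0], x[1] - y[1]))
    for x in box
    for y in box
)

num_states = len({state(x) for x in box})
```

Here the five states correspond to the five cosets of M in Z^2, and the map from x to its state is itself linear (modulo 5), which is the shape of conclusion the slide points toward.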

Applications and Open Questions
- Simpler proof of existing lower bounds
  - No communication complexity
- Many dimension lower bounds known for sketching norms over the reals
  - Matrix norms, etc.
  - Do these give turnstile streaming lower bounds with finite precision?