1 Streaming Computation of Combinatorial Objects Ziv Bar-Yossef U.C. Berkeley Omer Reingold AT&T Labs – Research Ronen Shaltiel Weizmann Institute of Science Luca Trevisan U.C. Berkeley

2 The Streaming Model [HRR98, AMS96, FKSV99]
Input stream x_1, x_2, x_3, …, x_n → Streaming Algorithm (memory) → Output stream y_1, y_2, y_3, …, y_m
One-way ("online") access to the input
Sub-linear space
As usual, one-way output
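The model above can be sketched in a few lines of code. This is an illustrative toy (a running-parity task, not from the slides): the algorithm makes one pass over the input, keeps O(1) bits of state, and emits its output bits one-way as well.

```python
# A minimal sketch of the streaming model: the algorithm sees input bits
# one at a time (one-way), keeps sub-linear (here: constant) state, and
# emits output bits one-way. The toy task: stream out the running parity
# of each input prefix.

def streaming_parity(input_stream):
    state = 0                      # O(1) bits of memory
    for x in input_stream:         # one-way pass over x_1, x_2, ..., x_n
        state ^= x
        yield state                # one-way output y_1, y_2, ..., y_m

print(list(streaming_parity([1, 0, 1, 1])))  # [1, 1, 0, 1]
```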

3 Algorithmic Motivation
Computations over massive data sets:
Databases – one-pass algorithms over large database relations [AMS96, GM98]
Networking – processing IP packets at ISP routers [FKSV99, FS00, Indyk00]
Information Retrieval – processing search engine query logs [CCF02]

4 Complexity-Theoretic Motivation
Randomized algorithms: one-way access to random bits
Space-bounded randomized algorithms are "streaming algorithms" w.r.t. their random inputs
De-randomization procedures for space-bounded computations use streaming algorithms as "adversaries", and sometimes need a "streaming" implementation themselves

5 Combinatorial Objects
De-randomization primitives: potential building blocks in de-randomizing space-bounded computations (e.g., RL/BPL); a streaming implementation may then be needed.
The objects: Extractors, Dispersers (extractors are a special case of dispersers), Universal Hash Functions (which give extractors via the leftover hash lemmas [HILL98, IZ89]), and Error-Correcting Codes [M98, TZ01, T99, TZS01, SU01]

6 Dispersers and Extractors [Sipser 88, NZ96]
E takes a weak random source x ∈ {0,1}^n and a short random seed r ∈ {0,1}^d, and produces a random-like output y ∈ {0,1}^m.
Definition (Disperser / Extractor): for all distributions X on {0,1}^n containing k bits of randomness:
Disperser: for every large enough S ⊆ {0,1}^m, Pr(E(X, U_d) ∈ S) > 0
Extractor: E(X, U_d) is close to uniform
Every extractor is a disperser.
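The definition can be sanity-checked numerically for toy parameters. The sketch below uses the inner-product function E(x, r) = ⟨x, r⟩ mod 2 as a hypothetical 1-bit extractor (an assumed example for illustration, not the paper's construction) and measures both the extractor property (closeness to uniform) and the disperser property (hitting every output value) on a small weak source.

```python
from itertools import product

# Toy numerical check of the definition, not a general construction:
# E(x, r) = <x, r> mod 2 with a tiny source X uniform on a 4-element
# support (so X has k = 2 bits of randomness).

n = 3  # here the seed r also lives in {0,1}^n

def E(x, r):
    return sum(a & b for a, b in zip(x, r)) % 2

support = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]  # k = 2 bits

counts = {0: 0, 1: 0}
for x in support:
    for r in product((0, 1), repeat=n):
        counts[E(x, r)] += 1

total = len(support) * 2 ** n
dist = max(abs(counts[b] / total - 0.5) for b in (0, 1))
print("statistical distance from uniform:", dist)
print("disperser (every output value is hit):",
      all(counts[b] > 0 for b in (0, 1)))
```

Since every vector in the support is nonzero, its inner product with a uniform seed is exactly uniform, so the measured distance is 0.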

7 Online Dispersers & Extractors
Two types of streaming algorithms for E:
All-seed: 1-way input: x; 1-way output: E(x,r) for every r ∈ {0,1}^d
Single-seed: two separate 1-way inputs: x, r; 1-way output: E(x,r)
Theorem (limitations of deterministic amplification in logspace) [BGW99]: Any all-seed streaming algorithm for a disperser requires Ω(m) space. Matching construction of online "weak extractors".

8 Online Dispersers & Extractors: Our Results
Theorem 1: Any single-seed streaming algorithm for an extractor requires Ω(m) space. Matching constructions of several online extractors.
Theorem 2: A construction of a disperser that admits a single-seed streaming algorithm with poly-log(m) space.
A surprising separation between extractors and dispersers, which are otherwise similar in behavior.

9 Universal Hash Functions [CW79]
Definition ((ε-almost) universal hash functions): A family H = { h: {0,1}^n → {0,1}^m } s.t.
universal: ∀ x ≠ x′, Pr_h(h(x) = h(x′)) ≤ 1/2^m
ε-almost: ∀ x ≠ x′, Pr_h(h(x) = h(x′)) ≤ ε
Lemma (leftover hash lemmas [HILL98, IZ89]): For appropriately chosen parameters, if H is an (ε-almost) universal family of hash functions, then E(x,h) = h(x), for h ∈ H, is a "strong" extractor.
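A standard concrete instance of the definition, shown for illustration (random linear maps over GF(2); an assumed example, not one named on the slide). Enumerating all matrices verifies the collision bound Pr_h(h(x) = h(x′)) = 1/2^m exactly.

```python
from itertools import product

# The family H = { h_A(x) = A x over GF(2) : A an m x n bit matrix }.
# For fixed x != x', a collision h_A(x) = h_A(x') means A(x + x') = 0,
# which happens with probability exactly 1/2^m over a uniform A.

n, m = 4, 2

def h(A, x):
    return tuple(sum(A[i][j] & x[j] for j in range(n)) % 2
                 for i in range(m))

x, xp = (1, 0, 1, 0), (0, 1, 1, 0)
collisions = 0
matrices = list(product((0, 1), repeat=n * m))
for flat in matrices:
    A = [flat[i * n:(i + 1) * n] for i in range(m)]
    collisions += h(A, x) == h(A, xp)

print(collisions / len(matrices))  # exactly 1/2^m = 0.25
```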

10 Online Universal Hash Functions
Streaming algorithm for H: two separate 1-way inputs: h, x; 1-way output: h(x)
Theorem 3 (corollary of our Theorem 1): Any streaming algorithm for an ε-almost universal family of hash functions requires Ω(m) space.
Theorem [MNT93, BTY94]: Any streaming algorithm for a universal family of hash functions requires Ω(m) space.

11 Online Error-Correcting Codes
Definition (ECC): C: {0,1}^k → {0,1}^n s.t. ∀ w ≠ w′ ∈ {0,1}^k, |C(w) − C(w′)| ≥ d
d – minimum distance; k/n – rate
Encoding streaming algorithm: 1-way input: w ∈ {0,1}^k; 1-way output: C(w)
Decoding streaming algorithm: 1-way input: r ∈ {0,1}^n; 1-way output: the w for which |C(w) − r| is minimum
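Both streaming algorithms can be illustrated with the simplest constant-rate block code, the c-fold repetition code (chosen purely as a sketch; the slides' own matching constructions are not specified here). Encoding and majority-vote decoding are each one-way and use only O(c) memory.

```python
# Streaming encoder/decoder sketch for the repetition block code:
# each message bit is repeated c times (distance d = c, rate k/n = 1/c).

def repetition_encode(message_stream, c=3):
    for w in message_stream:        # one-way input w_1, ..., w_k
        for _ in range(c):
            yield w                 # one-way output, c bits per input bit

def repetition_decode(codeword_stream, c=3):
    block = []
    for bit in codeword_stream:     # one-way input, O(c) memory
        block.append(bit)
        if len(block) == c:
            yield int(sum(block) > c // 2)   # majority vote per block
            block = []

encoded = list(repetition_encode([1, 0, 1]))
print(encoded)                       # [1, 1, 1, 0, 0, 0, 1, 1, 1]
noisy = encoded[:]
noisy[1] ^= 1                        # flip one bit; still decodes correctly
print(list(repetition_decode(noisy)))  # [1, 0, 1]
```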

12 Online ECC Lower Bounds
Theorem 4: Both encoding and decoding streaming algorithms for any code require Ω(d · k/n) space (minimum distance times rate).
Matching constructions using simple constant-rate "block codes".

13 The Extractor Lower Bound
Theorem: For k ≤ n/2, any single-seed streaming algorithm for an extractor requires Ω(m) space.
Notation:
E: {0,1}^n × {0,1}^d → {0,1}^m – an extractor
A – a single-seed streaming algorithm for E
S – the space used by A
Goal: show that S ≥ m − d

14 Intuition of the Proof
Two extreme input distributions:
X_1: X is uniform on the last k bits of the input and is otherwise fixed
X_2: X is uniform on the first k bits of the input and is otherwise fixed
X_1, X_2 contain k bits of randomness, implying E(X_1, U_d), E(X_2, U_d) are close to uniform.
Divide the execution of A into two phases:
Phase 1: A reads the first n − k input bits
Phase 2: A reads the last k input bits

15 Intuition of the Proof (cont.)
Case 1: In Phase 1, A outputs at least one bit that depends on the source X. If X = X_1 (its first n − k bits are fixed), then this output bit is fixed, implying E(X, U_d) is far from uniform.
Case 2: In Phase 1, A outputs no bits that depend on the source X. If X = X_2 (its first k bits are uniform), then after Phase 1 A has at most S bits of X's randomness in memory. Therefore, m ≤ S + d.

16 The Disperser Construction
Theorem: For any t, there is a disperser with seed length t · poly-log(n) and a single-seed streaming algorithm that runs in m/t + O(log n) space.
By choosing t = m/poly-log(m), we obtain an online disperser with poly-log(m) space.
The theorem exhibits a tradeoff between space and seed length.

17 Outline of the Proof
A t-partition of the input: 1 = i_0 < i_1 < … < i_t = n + 1
Assume X is a "bit-fixing" source: uniform on a set S of k coordinates and fixed otherwise.
[Figure: the input partitioned at i_0, i_1, i_2, …, i_{t−1}, i_t, with the coordinates of S spread across the blocks]
A good t-partition of X: ∀ j, |S ∩ [i_{j−1}, i_j)| = k/t
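The "good t-partition" condition can be made concrete with a tiny brute-force check (hypothetical parameters, purely illustrative):

```python
# S is the set of k free (uniform) coordinates of a bit-fixing source.
# A partition 1 = i_0 < i_1 < ... < i_t = n + 1 is good if every block
# [i_{j-1}, i_j) contains exactly k/t coordinates of S.

def is_good(boundaries, S, k, t):
    return all(
        sum(boundaries[j - 1] <= s < boundaries[j] for s in S) == k // t
        for j in range(1, t + 1)
    )

n, k, t = 8, 4, 2
S = {2, 3, 6, 8}                       # free coordinates, 1-indexed
good = [(1, mid, n + 1)
        for mid in range(2, n + 1)
        if is_good((1, mid, n + 1), S, k, t)]
print(good)  # every middle cut point splitting S into 2 + 2 coordinates
```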

18 Outline of the Proof (cont.)
[Figure: the seed split into a partition-choosing part and t extractor seeds; an extractor Ext applied to each block of the input]
Use part of the seed to choose a random partition. With probability > 0, the partition is good.
Use the optimal online extractor to extract randomness from each block.
Extraction in each block: m/t output bits ⇒ m/t space.

19 Open Problems
Lower bound for online dispersers. Is the tradeoff between seed length and space inherent?
Generalize the streaming lower bounds to arbitrary time-space tradeoffs.

20 Thank You!