New Characterizations in Turnstile Streams with Applications


1 New Characterizations in Turnstile Streams with Applications
Yuqing Ai (Tsinghua University), Wei Hu (Tsinghua University), Yi Li (Facebook), David Woodruff (IBM Almaden)

2 Turnstile Streaming Model
Underlying $n$-dimensional vector $x$, initialized to $0$.
Stream of updates $x \leftarrow x + e_i$ or $x \leftarrow x - e_i$ for a standard unit vector $e_i$.
At the end of the stream, $x \in \{-m, \ldots, -1, 0, 1, \ldots, m\}^n$.
Output an approximation to $f(x)$ w.h.p.
Goal: use as little space (in bits) as possible.
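As a minimal sketch of the model, the following Python applies turnstile updates to an explicitly stored vector. The helper name `apply_stream` and the `(i, sign)` update encoding are illustrative assumptions, not from the talk. A real streaming algorithm would keep only a small summary instead of the whole vector.

```python
def apply_stream(n, updates):
    """Apply turnstile updates to an n-dimensional vector that starts at 0.

    Each update is a pair (i, sign) with sign in {+1, -1}, meaning
    x <- x + sign * e_i for the standard unit vector e_i (0-indexed here).
    """
    x = [0] * n
    for i, sign in updates:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        x[i] += sign
    return x


stream = [(0, +1), (0, +1), (2, +1), (0, -1)]
print(apply_stream(3, stream))  # [1, 0, 1]
```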

3 Example: Estimating the $\ell_2$-norm
Output $Z$ with $(1-\epsilon)\|x\|_2 \le Z \le (1+\epsilon)\|x\|_2$.
Algorithm:
Let $r = 1/\epsilon^2$.
Choose an $r \times n$ matrix $A$ of i.i.d. sign random variables ($+1$ w.p. $1/2$, $-1$ w.p. $1/2$).
Maintain $Ax$ in the stream.
Output $\|Ax\|_2 / \sqrt{r}$.
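The algorithm on this slide can be sketched in a few lines of Python. The class name `L2Sketch`, the explicit storage of the sign matrix, and the fixed seed are assumptions made for illustration. Practical implementations typically generate the signs from a short seed rather than storing the matrix.

```python
import math
import random


class L2Sketch:
    """Maintains A x for a random sign matrix A (a sketch, not a tuned estimator)."""

    def __init__(self, r, n, seed=0):
        rng = random.Random(seed)  # fixed seed: an assumption for reproducibility
        # r x n matrix of i.i.d. signs, +1 or -1 with probability 1/2 each
        self.A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(r)]
        self.r = r
        self.sketch = [0] * r  # equals A x at all times

    def update(self, i, sign):
        # x <- x + sign * e_i adds sign * (column i of A) to A x
        for row in range(self.r):
            self.sketch[row] += sign * self.A[row][i]

    def estimate(self):
        # ||A x||_2 / sqrt(r)
        return math.sqrt(sum(v * v for v in self.sketch) / self.r)


sk = L2Sketch(r=16, n=5)
for _ in range(3):
    sk.update(2, +1)
print(sk.estimate())  # 3.0
```

The printed value is exact here only because $x = 3e_2$ has a single nonzero coordinate, so every entry of $Ax$ is $\pm 3$. For general vectors the estimate is only approximately $\|x\|_2$, with accuracy depending on $r$.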

4 Generic Form
All known algorithms have the following generic form (linear sketch):
Sample a random matrix $A$.
Maintain $Ax$ in the stream.
Output a function of $Ax$.
Question (?!): does the optimal algorithm for approximating any function in the turnstile model have this form?

5 The LNW Reduction
Yes! [Li, Nguyễn, Woodruff '14]
Theorem: for computing a function $f$ of $x \in \{-m, \ldots, m\}^n$ in the turnstile model, there is a randomized algorithm which
samples a matrix $A$ and a vector $q$ uniformly from $O(n \log m)$ instances,
maintains $(Ax \bmod q)$ in the stream,
outputs a function of $(Ax \bmod q)$.
Space complexity is optimal up to a constant factor (not including the $O(\log n + \log\log m)$ bits for randomness).
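To make the form $(Ax \bmod q)$ concrete, here is a minimal Python sketch. The class name `ModSketch` and the particular matrix and moduli are illustrative assumptions. The theorem concerns how $A$ and $q$ are chosen, which this toy does not attempt to reproduce. The point shown is that the state depends only on the net vector $x$, not on the order of updates.

```python
class ModSketch:
    """Maintains (A x mod q), coordinate-wise, under turnstile updates."""

    def __init__(self, A, q):
        assert len(A) == len(q)
        self.A = A
        self.q = q
        self.state = [0] * len(A)

    def update(self, i, sign):
        # Row j tracks <A[j], x> modulo q[j]
        for row, modulus in enumerate(self.q):
            self.state[row] = (self.state[row] + sign * self.A[row][i]) % modulus


A = [[1, 2], [3, 1]]  # hypothetical 2 x 2 matrix
q = [5, 7]            # hypothetical moduli

first = ModSketch(A, q)
for upd in [(0, +1), (1, +1)]:
    first.update(*upd)

second = ModSketch(A, q)
for upd in [(1, +1), (0, +1), (0, +1), (0, -1)]:  # same net vector (1, 1)
    second.update(*upd)

print(first.state, second.state)  # [3, 4] [3, 4]
```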

6 Consequence: Lower Bound Technique
Input $x$ (Alice): create stream $s(x)$. Input $y$ (Bob): create stream $s(y)$.
Streaming algorithm $\mathcal{A}$: run $\mathcal{A}$ on $s(x)$ and send the state of $\mathcal{A}(s(x))$ to Bob.
Bob computes $\mathcal{A}(s(x), s(y))$.
If Bob solves $g(x, y)$, the space complexity of $\mathcal{A}$ is at least the 1-way communication complexity of $g$.
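The reduction can be illustrated with a toy protocol in Python. The choice of $g$ as Equality, the stream encodings `s_insert` and `s_delete`, and the trivial algorithm that stores the whole vector are all assumptions made for illustration. The point is only that Alice's message is exactly the memory state of the streaming algorithm, so any lower bound on 1-way communication for $g$ is a lower bound on that state size.

```python
def run(state, stream):
    """Toy streaming algorithm: its state is the full vector, stored as a tuple."""
    x = list(state)
    for i, sign in stream:
        x[i] += sign
    return tuple(x)


def s_insert(bits):
    # Alice's stream s(x): insert coordinate i whenever bit i is set
    return [(i, +1) for i, b in enumerate(bits) if b]


def s_delete(bits):
    # Bob's stream s(y): delete coordinate i whenever bit i is set
    return [(i, -1) for i, b in enumerate(bits) if b]


def one_way_equality(x_bits, y_bits):
    n = len(x_bits)
    message = run((0,) * n, s_insert(x_bits))  # Alice sends her final state
    final = run(message, s_delete(y_bits))     # Bob resumes from that state
    return all(v == 0 for v in final)          # the net vector is 0 iff x == y


print(one_way_equality([1, 0, 1], [1, 0, 1]))  # True
print(one_way_equality([1, 0, 1], [1, 1, 1]))  # False
```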

7 Consequence
Input $x$: create stream $s(x)$. Input $y$: create stream $s(y)$.
The LNW reduction implies: if the players can solve $g(x, y)$, then the space of $\mathcal{A}$ is at least the simultaneous communication complexity of $g$.
This is a weaker model, in which Alice and Bob simultaneously send a message to a referee who outputs the answer.
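Why a sketch of the form $Ax \bmod q$ yields a simultaneous protocol can be seen from linearity: the referee can combine the two sketches without seeing either input. The Python below is a minimal illustration. The function names `sketch` and `referee` and the concrete numbers are assumptions, not from the talk.

```python
def sketch(A, q, x):
    """Compute (A x mod q) directly from a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) % m for row, m in zip(A, q)]


def referee(msg_a, msg_b, q):
    """Combine the two messages: (A x + A y) mod q = A (x + y) mod q."""
    return [(a + b) % m for a, b, m in zip(msg_a, msg_b, q)]


A = [[1, 2, 3], [2, 0, 1]]  # hypothetical shared matrix
q = [7, 5]                  # hypothetical shared moduli
x = [1, 0, 2]               # Alice's net vector
y = [0, 3, -1]              # Bob's net vector

combined = referee(sketch(A, q, x), sketch(A, q, y), q)
direct = sketch(A, q, [xi + yi for xi, yi in zip(x, y)])
print(combined, direct)  # [3, 3] [3, 3]
```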

8 Our Result
Strengthen the LNW reduction in several respects:
Remove the "box constraint".
Generalize to the strict turnstile model.
Extend to multi-pass algorithms.
Obtain new tight lower bounds.

9 Strengthen the LNW Reduction
Remove the "box constraint".
Generalize to the strict turnstile model.
Extend to multi-pass algorithms.

10 The "Box Constraint"
The LNW reduction requires the algorithm to be correct as long as $x \in \{-m, \ldots, m\}^n$ at the end of the stream.
While processing the stream, we may have $\|x\|_\infty \gg m$.
The algorithm is not allowed to abort if this happens. It must still be correct at the end of the stream as long as $x \in \{-m, \ldots, m\}^n$.
More natural requirement: the algorithm only needs to be correct when $x$ belongs to $\{-m, \ldots, m\}^n$ at all times in the stream.

11 Stream Automaton
[Figure: automaton diagram. From the Start state, transitions between states are labeled by stream updates such as $+e_1$, $-e_1$, $+e_2$, $+e_5$, $+e_n$, $-e_n$.]

12 Path-Independent Automaton
Every $x \in \mathbb{Z}^n$ is in a unique state.

13 Path-Independent Automaton
[Figure: stream automaton diagram with a Start state and transitions labeled $+e_1$, $-e_1$, $+e_2$, $+e_5$, $+e_n$, $-e_n$, annotated "0 in two different states".]

14 Path-Independent Automaton
Every $x \in \mathbb{Z}^n$ is in a unique state.
Equivalent to $Ax \bmod q$.

15 Zero-Frequency Graph
For a stream $\sigma$, let $\mathrm{freq}(\sigma) \in \mathbb{Z}^n$ be the "net update" to all coordinates.
Zero-freq graph: directed graph $G = (V, E)$.
$V$ = states of the automaton.
$(u, v) \in E$ if there exists a stream $\sigma$ such that $u \oplus \sigma = v$ and $\mathrm{freq}(\sigma) = 0$.
Terminal equivalence class: a strongly connected component in $G$ with no outgoing edge.
A walk in $G$ is a sequence of zero-frequency streams.
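The definitions above can be tried out on a toy one-dimensional automaton. The Python below is only a sketch under stated assumptions: the automaton (a transient start state `"S"` feeding a counter mod 3) is invented for illustration, and the search is bounded by a hypothetical parameter `max_abs_freq`, so it only finds zero-frequency streams whose prefixes stay within that bound.

```python
from collections import deque


def zero_freq_edges(states, step, max_abs_freq):
    """Edges (u, v) such that some stream with net frequency 0 leads from u to v.

    Only streams whose prefix frequencies stay within max_abs_freq are searched.
    """
    edges = set()
    for u in states:
        seen = {(u, 0)}
        queue = deque(seen)
        while queue:
            state, freq = queue.popleft()
            if freq == 0:
                edges.add((u, state))
            for sign in (+1, -1):
                nxt = (step[(state, sign)], freq + sign)
                if abs(nxt[1]) <= max_abs_freq and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return edges


def terminal_classes(states, edges):
    """Strongly connected components of the graph that have no outgoing edge."""
    reach = {u: {u} for u in states}
    changed = True
    while changed:  # naive transitive closure, fine for tiny graphs
        changed = False
        for u, v in edges:
            for w in states:
                if u in reach[w] and v not in reach[w]:
                    reach[w].add(v)
                    changed = True
    classes = set()
    for u in states:
        scc = frozenset(v for v in reach[u] if u in reach[v])
        if reach[u] == set(scc):  # nothing reachable lies outside the component
            classes.add(scc)
    return classes


states = ["S", 0, 1, 2]
step = {("S", +1): 1, ("S", -1): 2}
for c in range(3):
    step[(c, +1)] = (c + 1) % 3
    step[(c, -1)] = (c - 1) % 3

classes = terminal_classes(states, zero_freq_edges(states, step, max_abs_freq=3))
print(sorted(sorted(c) for c in classes))  # [[0], [1], [2]]
```

In this toy, the start state `"S"` is not terminal because the zero-frequency stream $+e_1, -e_1$ leads from it to state 0 and never back. The terminal classes are the three residues mod 3, which matches the form $Ax \bmod q$ with $A = (1)$ and $q = 3$.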

16 The LNW Reduction
$G$: zero-frequency graph of $\mathcal{A}_{\text{old}}$.
States of the new automaton $\mathcal{A}_{\text{new}}$ = terminal equivalence classes in $G$.
For a terminal equivalence class $C$ and an update $e_i$, define the transition as follows:
Let $v \in C$ be an arbitrary node.
Compute $v \oplus e_i$ using the transition function of $\mathcal{A}_{\text{old}}$.
Walk from $v \oplus e_i$ in $G$ until reaching a terminal equivalence class $C'$.
$C'$ is unique: it does not depend on $v$ or on the walk.

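17 [Figure: a node $v$ in terminal equivalence class $C$ takes update $e_i$, then follows a stream $\sigma$ with $\mathrm{freq}(\sigma) = 0$ into terminal equivalence class $C'$.]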

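18 The Box Constraint
For a stream $\sigma$, define $|\sigma|_{\max} = \max_{\text{prefix } \omega \text{ of } \sigma} \|\mathrm{freq}(\omega)\|_\infty$.
$\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_k)$ on $\mathcal{A}_{\text{new}}$.
$\sigma' = (\ldots, \sigma_1, \ldots, \sigma_2, \ldots, \sigma_k, \ldots)$ on $\mathcal{A}_{\text{old}}$, where the elided segments are streams $\tau_1, \tau_2, \tau_3, \ldots$
$\tau_1, \tau_2, \ldots$ are zero-frequency streams (walks in $G$).
The length of $\tau_i$ could be very large.
When $|\sigma|_{\max} \le m$, $|\sigma'|_{\max}$ could be very large.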
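A small Python example makes the issue concrete. The helper names `freq` and `max_prefix_norm` and the `(i, sign)` update encoding are assumptions for illustration. Padding a stream with a zero-frequency stream leaves the final vector unchanged but can push the prefix maximum far above that of the original stream.

```python
def freq(stream, n):
    """Net update of the stream to each of the n coordinates."""
    x = [0] * n
    for i, sign in stream:
        x[i] += sign
    return x


def max_prefix_norm(stream, n):
    """|sigma|_max: largest infinity-norm of freq(omega) over all prefixes omega."""
    x = [0] * n
    best = 0  # the empty prefix has frequency vector 0
    for i, sign in stream:
        x[i] += sign
        best = max(best, max(abs(v) for v in x))
    return best


sigma = [(0, +1)]
tau = [(1, +1)] * 5 + [(1, -1)] * 5  # a zero-frequency stream of length 10
sigma_prime = tau + sigma

print(freq(sigma, 2), freq(sigma_prime, 2))                        # [1, 0] [1, 0]
print(max_prefix_norm(sigma, 2), max_prefix_norm(sigma_prime, 2))  # 1 5
```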

19 Zero-Freq Stream Length
$L$: upper bound on the lengths of the $\tau_i$'s.
$|\sigma|_{\max} \le m \implies |\sigma'|_{\max} \le m + L/2$.
Want $L \le m$.
Let $s$ = number of states in $\mathcal{A}_{\text{old}}$.
Lemma: if there is a zero-freq stream from $u$ to $v$, then there exists such a stream with length at most $\mathrm{poly}(ns) \cdot \left(\frac{s}{n} + 1\right)^n$.
Hence $L \le \mathrm{poly}(ns) \cdot \left(\frac{s}{n} + 1\right)^n$.

20 Tightness of Our Bound
Upper bound: $L \le \mathrm{poly}(ns) \cdot \left(\frac{s}{n} + 1\right)^n$.
Lower bound: $L \ge \left(\frac{s}{n}\right)^{\Omega(n)}$.

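21 Removing the Box Constraint
Want $L \le m$.
$L \le \mathrm{poly}(ns) \cdot \left(\frac{s}{n} + 1\right)^n \le s^{cn}$.
$L \le m \impliedby s^{cn} \le m \impliedby \log s \le \frac{\log m}{cn}$.
Here $\log s$ is the space of $\mathcal{A}_{\text{old}}$.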
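Spelled out, the chain of implications on this slide is just a matter of taking logarithms. The following is a worked restatement, assuming $s \ge 2$ and treating $c$ as the absolute constant from the slide (its value is not given in the talk):

```latex
\begin{align*}
L &\le \mathrm{poly}(ns)\cdot\left(\frac{s}{n}+1\right)^{n} \le s^{cn}, \\
s^{cn} \le m
  &\iff cn\,\log s \le \log m
   \iff \log s \le \frac{\log m}{cn}.
\end{align*}
```

So whenever the space $\log s$ of $\mathcal{A}_{\text{old}}$ is at most $\frac{\log m}{cn}$, the inserted zero-frequency streams have length $L \le m$.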

22 Application: Counting
$n = 1$.
Problem: output $|x|$ up to additive error $m/4$, while $x$ varies in $\{-m, \ldots, m\}$.
There is an $O(\log m)$ space algorithm.
Is there an $\Omega(\log m)$ lower bound?
For insertion streams, no: approximate counting.
For relative error, yes: but the proof doesn't apply.
For additive error... yes!

23 Application: Counting
Condition for removing the box constraint: space $\le \frac{\log m}{cn} = \frac{\log m}{c}$.
Assume space $\le \frac{\log m}{c}$, otherwise done.
$Ax \bmod q = (a_1 x \bmod q_1, a_2 x \bmod q_2, \ldots, a_r x \bmod q_r)$.
Show $\mathrm{lcm}(q_1, \ldots, q_r) = \Omega(m)$.
Cannot distinguish $x, x + \mathrm{lcm}, x + 2 \cdot \mathrm{lcm}, \ldots$
$\Omega(m)$ different states, $\Omega(\log m)$ space.
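The indistinguishability step can be checked numerically. In the Python sketch below, the multipliers `a` and moduli `q` are arbitrary values chosen for illustration and the helper names are assumptions. It shows that the state $(a_i x \bmod q_i)_i$ is identical for $x$ and $x + \mathrm{lcm}(q_1, \ldots, q_r)$.

```python
from functools import reduce
from math import gcd


def lcm_all(moduli):
    """Least common multiple of a list of positive integers."""
    return reduce(lambda u, v: u * v // gcd(u, v), moduli, 1)


def state(a, q, x):
    """The sketch state (a_1 x mod q_1, ..., a_r x mod q_r) for n = 1."""
    return tuple((ai * x) % qi for ai, qi in zip(a, q))


a = [1, 2, 5]  # hypothetical multipliers
q = [4, 6, 9]  # hypothetical moduli
period = lcm_all(q)

print(period)                                     # 36
print(state(a, q, 7) == state(a, q, 7 + period))  # True
```

If the period were much smaller than $m$, two counts that differ by more than the allowed additive error would share a state, which is why the slide needs the lcm to be $\Omega(m)$.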

24 Application: Norm Estimation
Problem: for $x \in \{-m, \ldots, m\}^n$, output $\|x\|_p$ up to additive error $n^{1/p} m$.
$\Omega(\log m)$ space lower bound.
$O(\log m + \log\log n)$ space algorithm ($1 \le p \le 2$) [KNW '10].
The lower bound is tight when $\log\log n = O(\log m) \iff n \le \exp(\mathrm{poly}(m))$.

25 Strengthen the LNW Reduction
Remove the "box constraint".
Generalize to the strict turnstile model.
Extend to multi-pass algorithms.

26 The Strict Turnstile Model
The strict turnstile model: no negative coordinates, i.e., $x_i \ge 0$ at all times in the stream.
Dynamic graph streams: insertions and deletions of edges.
Allow multi-graphs, but no negative edges.
Generalize the LNW reduction to the strict turnstile model:
$L$: upper bound on the length of zero-freq streams.
Initialize all coordinates of $x$ to be $L$.
Now the reduction guarantees $x$ is always nonnegative.
Subtract $L$ from all coordinates at the end of the stream.
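The offset trick can be illustrated with a short Python sketch. The function name `run_with_offset` and the example streams are assumptions. Here `tau` plays the role of a zero-frequency stream inserted by the reduction, and the offset is set to its length as a stand-in for the bound $L$.

```python
def run_with_offset(n, stream, offset):
    """Start every coordinate at `offset`, apply the stream, subtract it at the end.

    Returns the final vector (offset removed) and the lowest value any
    coordinate reached while the stream was being processed.
    """
    x = [offset] * n
    lowest = offset
    for i, sign in stream:
        x[i] += sign
        lowest = min(lowest, x[i])
    return [v - offset for v in x], lowest


tau = [(0, -1)] * 3 + [(0, +1)] * 3  # zero-frequency stream that dips negative
stream = tau + [(0, +1)]
L = len(tau)

print(run_with_offset(2, stream, 0))  # ([1, 0], -3)
print(run_with_offset(2, stream, L))  # ([1, 0], 3)
```

Without the offset the intermediate vector violates the strict turnstile promise. With the offset it stays nonnegative and the final answer is unchanged.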

27 Application: Maximum Matching
[AKLY '16]: for outputting an $n^{\epsilon}$-approximate maximum matching, space is $\Theta(n^{2 - 3\epsilon})$.
The lower bound holds only in the simultaneous communication model.
Can apply our reduction.

28 Strengthen the LNW Reduction
Remove the "box constraint".
Generalize to the strict turnstile model.
Extend to multi-pass algorithms.

29 Multi-Pass Algorithms
$p$-pass automaton:
After the $i$-th pass ($i < p$), output an automaton $\mathcal{A}_{i+1}$.
Run $\mathcal{A}_{i+1}$ on the input stream in the $(i+1)$-st pass.
After the $p$-th pass, output the answer.
Theorem: there is a $p$-pass automaton for which the automaton in each pass is path-independent.
Space is optimal up to a constant factor.

30 Conclusions
New progress on characterizing turnstile streaming algorithms as linear sketches.
Applications: optimal lower bounds for counting with additive error and for maximum matching in dynamic graphs.
Open questions:
Box constraint: after removing the box constraint, we still have very long streams.
Better reduction?
Thank you!

