Estimating L2 Norm
Piotr Indyk, MIT

Basic Data Stream Model
- Single pass over the data: i_1, i_2, …, i_n, each from {0, …, m}
- Typically, we assume n and m are known
- Bounded storage (often log^O(1) n)
- Units of storage: bits, words, or "elements" (e.g., points, nodes/edges)
- Randomness and approximation are OK (in fact, almost always necessary)
- Last lecture: estimating the number of distinct elements, up to 1±ε, with probability 2/3, using space O(log n + 1/ε²)
- Example stream: 8 2 1 9 1 9 2 4 6 3 9 4 2 3 4 2 3 8 5 2 5 6 … (about 8 distinct elements)

Generalization
- Vector x, with coordinates indexed 0, 1, …, m (initially x = 0)
- A stream can be viewed as a sequence of updates (i, a), each meaning x_i = x_i + a
- The basic streaming model corresponds to updates (i, 1); in general, a can be negative
- Number of distinct elements = number of non-zero coordinates of x, denoted ||x||_0
- Algorithms similar to last lecture's work for ||x||_0 as well
- Today: two methods for estimating ||x||_2:
  - Alon-Matias-Szegedy (AMS): really cute and simple
  - Johnson-Lindenstrauss (JL): really powerful, needed in future lectures
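To make the update model concrete, here is a minimal Python baseline (structure and names are mine, not from the lecture) that stores x explicitly under turnstile updates and reports ||x||_0 and ||x||_2² exactly; the whole point of the sketches below is to approximate these without storing x.

```python
from collections import defaultdict

def exact_norms(stream):
    """Maintain x explicitly under turnstile updates (i, a)."""
    x = defaultdict(int)
    for i, a in stream:        # update x_i = x_i + a
        x[i] += a
        if x[i] == 0:          # keep only non-zero coordinates
            del x[i]
    l0 = len(x)                             # ||x||_0
    l2_sq = sum(v * v for v in x.values())  # ||x||_2^2
    return l0, l2_sq

# The basic model is the special case a = 1:
print(exact_norms([(8, 1), (2, 1), (1, 1), (9, 1), (1, 1), (9, -1)]))  # (3, 6)
```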

Why estimate the L2 norm?
- Database join (on attribute A): all triples (Rel1.A, Rel1.B, Rel2.B) such that Rel1.A = Rel2.A
- Self-join: the case Rel1 = Rel2
- Size of the self-join: ∑_{val of A} Rows(val)², where Rows(val) is the number of rows with A = val
- Updates to the relation increment/decrement Rows(val)
[Figure: example relations Rel1, Rel2 with attributes A (values Lec2, Lec3, …) and B, and the triples in their join]
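To see why the self-join size is exactly a squared L2 norm, here is a toy computation (the relation is made up for illustration):

```python
from collections import Counter

# Hypothetical relation, listing each row's join-attribute A value
a_column = ["Lec2", "Lec2", "Lec3", "Lec3", "Lec3"]

rows = Counter(a_column)                            # Rows(val) for each value of A
self_join_size = sum(c * c for c in rows.values())  # sum over val of Rows(val)^2
print(self_join_size)                               # 2^2 + 3^2 = 13 joined pairs
```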

Algorithm I: AMS

Alon-Matias-Szegedy'96
- Choose r_0 … r_m to be i.i.d. random variables with Pr[r_i = 1] = Pr[r_i = -1] = 1/2
- Maintain Z = ∑_i r_i x_i under increments/decrements of the x_i
- Return Y = Z² as an estimate of ||x||_2²
Analysis:
- Compute the expectation of Y
- Bound the variance of Y (or its second moment)
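A minimal single-estimator sketch in Python (my own structure; note it derives signs from a seeded PRG for simplicity, whereas the space bound requires only a 4-wise independent family):

```python
import random

class AMSEstimator:
    def __init__(self, seed):
        self.seed = seed
        self.z = 0  # Z = sum_i r_i * x_i

    def _sign(self, i):
        # Pseudorandom +/-1 sign for coordinate i; a real implementation
        # would use a 4-wise independent hash family instead.
        return 1 if random.Random(hash((self.seed, i))).random() < 0.5 else -1

    def update(self, i, a):   # stream update x_i = x_i + a
        self.z += self._sign(i) * a

    def estimate(self):       # Y = Z^2, unbiased for ||x||_2^2
        return self.z ** 2
```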

Expectation
The expectation of Z² = (∑_i r_i x_i)² is
  E[Z²] = E[∑_{i,j} r_i x_i r_j x_j] = ∑_{i,j} x_i x_j E[r_i r_j]
We have:
- For i ≠ j: E[r_i r_j] = E[r_i] E[r_j] = 0, so the term disappears
- For i = j: E[r_i r_j] = E[r_i²] = E[1] = 1
Therefore E[Z²] = ∑_i x_i² = ||x||_2² (an unbiased estimator).
Now we just need to bound the variance.

Bounding the second moment
The second moment of Z² = (∑_i r_i x_i)² is the expectation of
  Z⁴ = (∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)(∑_i r_i x_i)
Expanding, the terms decompose into:
- ∑_i (r_i x_i)⁴, with expectation ∑_i x_i⁴
- 6 ∑_{i<j} (r_i r_j x_i x_j)², with expectation 6 ∑_{i<j} x_i² x_j²
- Terms involving a lone multiplier r_i x_i (e.g., r_1x_1 r_2x_2 r_3x_3 r_4x_4), with expectation 0
Total: E[Z⁴] = ∑_i x_i⁴ + 6 ∑_{i<j} x_i² x_j² ≤ 3 (∑_i x_i²)²
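The final inequality is the one step the slide leaves implicit; it follows by expanding (∑_i x_i²)²:

```latex
\mathbb{E}[Z^4] = \sum_i x_i^4 + 6\sum_{i<j} x_i^2 x_j^2
\le 3\sum_i x_i^4 + 6\sum_{i<j} x_i^2 x_j^2
= 3\Big(\sum_i x_i^2\Big)^2,
\quad \text{since} \quad
\Big(\sum_i x_i^2\Big)^2 = \sum_i x_i^4 + 2\sum_{i<j} x_i^2 x_j^2 .
```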

Analysis, ctd.
We have an estimator Y = Z² with:
- E[Y] = ∑_i x_i²
- σ² = Var[Y] ≤ 3 (∑_i x_i²)²
Chebyshev's inequality: Pr[ |E[Y] - Y| ≥ cσ ] ≤ 1/c²
Algorithm AMS+: maintain Z_1 … Z_k (and thus Y_1 … Y_k), and define Y' = ∑_i Y_i / k
- E[Y'] = k ∑_i x_i² / k = ∑_i x_i²
- σ'² = Var[Y'] ≤ 3k (∑_i x_i²)² / k² = 3 (∑_i x_i²)² / k
Guarantee: Pr[ |Y' - ∑_i x_i²| ≥ c (3/k)^{1/2} ∑_i x_i² ] ≤ 1/c²
Setting c to a constant and k = O(1/ε²) gives a (1±ε)-approximation with constant probability.
Total space: O(log(nm) / ε²) bits (not counting the r_i's).
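AMS+ is easy to state in matrix form; a numpy sketch (the constant in k is illustrative, and fully independent signs are used for simplicity):

```python
import numpy as np

def ams_plus(x, eps, seed=0):
    """Average k = O(1/eps^2) independent AMS estimators Y_j = Z_j^2."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil(6 / eps**2))                   # the constant 6 is illustrative
    R = rng.choice([-1.0, 1.0], size=(k, len(x)))  # one sign vector per estimator
    Z = R @ x                                      # k linear counters
    return np.mean(Z**2)                           # Y' = (Y_1 + ... + Y_k)/k

x = np.arange(1.0, 101.0)
print(np.sum(x**2), ams_plus(x, eps=0.1))          # true ||x||_2^2 vs. estimate
```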

Comments
- Only needed 4-wise independence of r_0 … r_m; can generate such variables from O(log m) random bits (previous lecture)
- What we did: maintain a "linear sketch" vector Z = [Z_1 … Z_k] = Rx
- Estimator for ||x||_2²: (Z_1² + … + Z_k²)/k = ||Rx||_2²/k
- "Dimensionality reduction": x → Rx … but the tail is somewhat heavy
- Reason: we only used the second moment of the estimator
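One practical payoff of the linear-sketch view, worth a quick check (toy code, names mine): since Z = Rx, sketches computed with the same R can simply be added to sketch the sum of two streams.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 50, 400
R = rng.choice([-1.0, 1.0], size=(k, m))     # shared sketch matrix

x = rng.integers(-5, 6, m).astype(float)
y = rng.integers(-5, 6, m).astype(float)
# Linearity: R(x + y) = Rx + Ry, so sketches are mergeable
assert np.allclose(R @ (x + y), R @ x + R @ y)
print(np.sum((x + y)**2), np.mean((R @ x + R @ y)**2))  # true vs. estimated
```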

Algorithm II: Dim. Reduction (JL)

Interlude: Normal Distribution
Normal distribution N(0,1):
- Range: (-∞, ∞)
- Density: f(x) = e^{-x²/2} / (2π)^{1/2}
- Mean = 0, variance = 1
Basic facts:
- If X and Y are independent r.v.'s with normal distributions, then X+Y has a normal distribution
- Var(cX) = c² Var(X)
- If X and Y are independent, then Var(X+Y) = Var(X) + Var(Y)
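A throwaway simulation (not from the slides) checking these facts: for independent X, Y ~ N(0,1), the combination cX + Y should have variance c² + 1.

```python
import numpy as np

rng = np.random.default_rng(0)
c = 3.0
X = rng.standard_normal(10**6)
Y = rng.standard_normal(10**6)
S = c * X + Y                 # still normal, by the stability fact
print(S.var(), c**2 + 1)      # both close to 10.0
```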

A different linear sketch
- Instead of ±1 variables, let the r_i be i.i.d. random variables from N(0,1)
- Consider Z = ∑_i r_i x_i
- We still have E[Z²] = ∑_i x_i² = ||x||_2², since:
  - E[r_i] E[r_j] = 0 for i ≠ j
  - E[r_i²] = Var(r_i) = 1
- As before, maintain Z = [Z_1 … Z_k] and define Y = ||Z||_2² = ∑_j Z_j² (so that E[Y] = k ||x||_2²)
- We show that there exists C > 0 such that for small enough ε > 0:
    Pr[ |Y - k||x||_2²| > εk||x||_2² ] ≤ exp(-C ε² k)
- Set k = O(1/ε² · log(1/δ)) to get a 1±ε approximation with probability 1-δ
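A numpy version of the Gaussian sketch (the constant in k is illustrative); note that by the stability facts above, each Z_j is distributed as N(0, ||x||_2²):

```python
import numpy as np

def jl_estimate(x, eps, delta, seed=0):
    """Estimate ||x||_2^2 via a Gaussian linear sketch Z = Rx,
    with k = O(1/eps^2 * log(1/delta)) rows (constant 4 illustrative)."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil(4 * np.log(1 / delta) / eps**2))
    R = rng.standard_normal((k, len(x)))   # i.i.d. N(0,1) entries
    Z = R @ x                              # each Z_j ~ N(0, ||x||_2^2)
    return np.sum(Z**2) / k                # expectation = ||x||_2^2

x = np.linspace(-1.0, 1.0, 200)
print(np.sum(x**2), jl_estimate(x, eps=0.1, delta=0.01))
```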

Proof
See the attached notes by Ben Rossman and Michel Goemans.

JL - comments
- Can use k-wise independence to generate the r_i's, but this is much messier than for AMS
- Time to compute the sketch vector Z from x is O(dk): good if k is small, not so great if k is large
- Fast JL and Sparse JL to the rescue (a few lectures from now)