Uncertainty Principles, Extractors, and Explicit Embeddings of L_2 into L_1
Piotr Indyk, MIT

Uncertainty principles (UP)
Consider a vector x ∈ R^n and a Fourier matrix F.
UP: either x or Fx must have "many" non-zero entries (for x ≠ 0).
History:
–Physics: the Heisenberg principle.
–Signal processing [Donoho-Stark'89]: consider any 2n × n matrix A = [I B]^T such that
 B is orthonormal, and
 for any distinct rows A_i, A_j of A we have |A_i · A_j| ≤ M.
 Then for any nonzero x ∈ R^n we have ||x||_0 + ||Bx||_0 > 1/M.
–E.g., if A = [I H]^T, where H is a normalized n × n Hadamard matrix (orthogonal, entries in {-1/n^{1/2}, 1/n^{1/2}}), then M = 1/n^{1/2} and Ax must have > n^{1/2} non-zero entries.
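To make the [Donoho-Stark'89] bound concrete, here is a small numerical sketch (not from the talk) that builds A = [I H]^T for a power-of-two n, so that M = 1/n^{1/2}, and checks ||x||_0 + ||Hx||_0 > 1/M on a few sparse test vectors; the chosen n, sparsities, and zero-tolerance are assumptions made for illustration.

```python
# Illustrative check of the Donoho-Stark bound for A = [I H]^T,
# where H is a normalized n x n Hadamard matrix (coherence M = 1/sqrt(n)).
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix of size n (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H / np.sqrt(n)

n = 64
H = hadamard(n)
M = 1.0 / np.sqrt(n)                      # coherence of [I H]^T

rng = np.random.default_rng(0)
for sparsity in [1, 4, 8]:
    x = np.zeros(n)
    support = rng.choice(n, size=sparsity, replace=False)
    x[support] = rng.standard_normal(sparsity)
    nnz_x = np.count_nonzero(np.abs(x) > 1e-9)       # ||x||_0
    nnz_Hx = np.count_nonzero(np.abs(H @ x) > 1e-9)  # ||Hx||_0
    print(f"||x||_0 + ||Hx||_0 = {nnz_x + nnz_Hx}  >  1/M = {1/M:.1f}")
```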

Extractors
Expander-like graphs:
–G = (A, B, E), |A| = a, |B| = b
–left degree d
Property:
–Consider any distribution P = (p_1, …, p_a) on A s.t. p_i ≤ 1/k.
–G(P) is a distribution on B: pick i from P, pick t uniformly from [d], and output the t-th neighbor j of i.
–Then ||G(P) - Uniform(B)||_1 ≤ ε.
Equivalently, one can require the above only for flat distributions (p_i ∈ {0, 1/k}).
Many explicit constructions. Holy grail:
–k = b
–d = O(log a)
Observation: w.l.o.g. one can assume that the right degree is O(ad/b).
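As a hedged illustration of this definition (not part of the talk), the sketch below uses a random left-d-regular bipartite graph as a stand-in for an explicit extractor, feeds it a flat source with p_i ≤ 1/k, and measures ||G(P) - Uniform(B)||_1; all parameter values are assumptions chosen for demonstration.

```python
# Simulate G(P) for a random bipartite graph and a flat source P, then
# measure the L1 distance of G(P) from the uniform distribution on B.
import numpy as np

rng = np.random.default_rng(1)
a, b, d, k = 4096, 64, 16, 256        # |A|, |B|, left degree, source parameter k

# neighbors[i, t] = t-th neighbor of left vertex i
neighbors = rng.integers(0, b, size=(a, d))

# Flat source: uniform on a random subset S of A of size k (so p_i <= 1/k)
S = rng.choice(a, size=k, replace=False)

# G(P): pick i from P, then a uniformly random neighbor of i
GP = np.zeros(b)
for i in S:
    np.add.at(GP, neighbors[i], 1.0 / (k * d))

l1_dist = np.abs(GP - 1.0 / b).sum()
print("||G(P) - Uniform(B)||_1 =", l1_dist)   # small if the graph mixes well
```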

(Norm) embeddings
Metric spaces M = (X, D), M' = (X', D') (here X = R^n, X' = R^m, D = ||.||_X and D' = ||.||_{X'}).
A mapping F: M → M' is a c-embedding if for any p, q ∈ X we have
  D(p, q) ≤ D'(F(p), F(q)) ≤ c·D(p, q)
(or, for linear F, ||p - q||_X ≤ ||F(p - q)||_{X'} ≤ c·||p - q||_X).
History:
–Mathematics:
 [Dvoretzky'59]: there exists m(n, ε) s.t., for any m > m(n, ε) and any space M' = (R^m, ||.||_{X'}), there exists a (1+ε)-embedding of the n-dimensional Euclidean space l_2^n into M'. In general, m must be exponential in n.
 [Milman'71]: probabilistic proof.
 ……
 [Figiel-Lindenstrauss-Milman'77, Gordon]: if M' = l_1^m, then m = O(n/ε^2) suffices. That is, l_2^n O(1)-embeds into l_1^{O(n)}. A.k.a. Dvoretzky's theorem for l_1.
–Computer science:
 [Linial-London-Rabinovich'94]: [Bourgain'85] used for sparsest cut, many other tools.
 ……
 [Dvoretzky, FLM] used for approximate nearest neighbor [IM'98, KOR'98], hardness of lattice problems [Regev-Rosen'06], etc.
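The distortion of a concrete linear map can be estimated empirically. The helper below is a sketch with assumed names (not from the talk): it samples random directions p - q, computes ||F(p - q)||_1 / ||p - q||_2, and reports the max/min ratio, which (after rescaling F) is an empirical lower bound on the true distortion c.

```python
# Empirically estimate the distortion of a linear map F: (R^n, ||.||_2) -> (R^m, ||.||_1).
import numpy as np

def estimate_distortion(F, trials=2000, seed=0):
    """Sample unit vectors x and return (min, max, max/min) of ||Fx||_1 over the samples.
    The ratio max/min lower-bounds the distortion of F as an l_2 -> l_1 embedding."""
    rng = np.random.default_rng(seed)
    n = F.shape[1]
    X = rng.standard_normal((trials, n))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit vectors in l_2
    r = np.abs(X @ F.T).sum(axis=1)                 # ||Fx||_1 for each sample x
    return r.min(), r.max(), r.max() / r.min()

# Example: a random Gaussian map R^32 -> R^256 has small empirical distortion.
F = np.random.default_rng(1).standard_normal((256, 32))
print(estimate_distortion(F))
```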

Recap
–Uncertainty principles
–Extractors
–(Norm) embeddings

Dvoretzky theorem, ctd.
Since [Milman'71], almost all proofs in geometric functional analysis use the probabilistic method.
–In particular, the [FLM] embedding of l_2^n into l_1^{O(n)} uses a random linear mapping A: R^n → R^{O(n)}.
Question: can one construct such embeddings explicitly? [Milman'00, Johnson-Schechtman'01]

Embedding l_2^n into l_1

| Distortion  | Dim. of l_1          | Type          | Ref             |
|-------------|----------------------|---------------|-----------------|
| 1+σ         | O(n/σ^2)             | Probabilistic | [FLM, Gordon]   |
| O(1)        | O(n^2)               | Explicit      | [Rudin'60, LLR] |
| 1+1/n       | n^{O(log n)}         | Explicit      | [Indyk'00]      |
| 1+1/log n   | n·2^{O(log log n)^2} | Explicit      | This talk       |

Other implications
–The embedding can be computed in time O(n^{1+o(1)}), as opposed to O(n^2) for [FLM].
–A similar phenomenon was discovered for the Johnson-Lindenstrauss dimensionality reduction lemma [Ailon-Chazelle'06]:
 –applications to the approximate nearest neighbor problem, Singular Value Decomposition, etc.
 –more on this later.

Embedding of l_2^n into l_1: overview
How does [FLM] work?
–Choose a "random" matrix A: R^n → R^m, m = O(n). E.g., the entries A_ij are i.i.d. random variables with normal distribution (mean 0, variance 1/m).
–Then, for any unit vector x, (Ax)_i is distributed according to N(0, 1/m).
–With probability 1 - exp(-m), a constant fraction of these coordinates has magnitude Θ(1/m^{1/2}), i.e., Ax ≈ (±1/m^{1/2}, …, ±1/m^{1/2}).
–This implies ||Ax||_1 = Ω(m^{1/2}·||x||_2).
–Similar arguments give ||Ax||_1 = O(m^{1/2}·||x||_2).
–Can extend to the whole R^n (union bound over a net of the sphere).
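A minimal simulation of this argument (the parameters are assumptions; the constant sqrt(2m/π) is the exact expectation of ||Ax||_1 for a unit vector x under this distribution):

```python
# Check that ||Ax||_1 concentrates around sqrt(2m/pi) * ||x||_2 for a random
# Gaussian A with i.i.d. N(0, 1/m) entries, i.e., A is an O(1)-distortion
# embedding of l_2^n into l_1^m after rescaling.
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 400                          # m = O(n)
A = rng.standard_normal((m, n)) / np.sqrt(m)

norms = []
for _ in range(1000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)               # unit l_2 norm
    norms.append(np.abs(A @ x).sum())    # ||Ax||_1
print(min(norms), np.sqrt(2 * m / np.pi), max(norms))   # min ~ expected ~ max
```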

Overview, ctd.
We would like to obtain, explicitly, something like Ax = (±1/m^{1/2}, …, ±1/m^{1/2}).
Plan (illustrated on the slide by a diagram):
–UP: Ax has > n^{1/2} non-zero entries, with most of the mass on entries of magnitude ≈ 1/n^{1/2}.
–Extractor with degree d = log^{O(1)} n: spread this mass (up to constants) over the remaining coordinates.
–Repeat log log n times.
–Total blowup in dimension: (log n)^{O(log log n)}.

Part I: the lemma
Lemma:
–Let A = [H_1 H_2 … H_L]^T, such that:
 each H_i is an n × n orthonormal matrix, and
 for any two distinct rows A_i, A_j we have |A_i · A_j| ≤ M (M is called the coherence).
–Then, for any unit vector x ∈ R^n and any set S of coordinates with |S| = s:
  ||(Ax)|_S||_2^2 ≤ 1 + Ms
 (note that ||Ax||_2^2 = L).
Proof:
–Take A_S, the rows of A indexed by S.
–max_{||x||_2 = 1} ||A_S x||_2^2 = λ_max(A_S·A_S^T).
–But A_S·A_S^T = I + E with |E_ij| ≤ M.
–Since E is an s × s matrix, λ_max(E) ≤ Ms.
Suppose we have A s.t. M ≤ 1/n^{1/2}. Then:
–For any unit x ∈ R^n and |S| ≤ n^{1/2}, we have ||(Ax)|_S||_2^2 ≤ 2.
–At the same time, ||Ax||_2^2 = L.
–Therefore, a (1 - 2/L) fraction of the "mass" ||Ax||_2^2 is contained in coordinates i with (Ax)_i^2 ≤ 2/n^{1/2}: take S to be the n^{1/2} largest coordinates; they carry mass at most 2, and every coordinate outside S satisfies (Ax)_i^2 ≤ 2/n^{1/2}.
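A quick numerical check of the lemma (a sketch, not from the talk) for the case A = [I H]^T, where L = 2 and M = 1/n^{1/2}; the power-of-two n, the set sizes s, and the random unit test vectors are assumptions. For each s it compares the mass of the s largest coordinates of Ax (the worst-case S) against 1 + Ms.

```python
# Verify ||(Ax)|_S||_2^2 <= 1 + M*s for A = [I H]^T and worst-case sets S.
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix of size n (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H / np.sqrt(n)

n = 256
A = np.vstack([np.eye(n), hadamard(n)])   # 2n x n, pairwise row coherence M = 1/sqrt(n)
M = 1.0 / np.sqrt(n)

rng = np.random.default_rng(2)
for s in [4, int(np.sqrt(n)), 4 * int(np.sqrt(n))]:
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                # unit vector, so ||Ax||_2^2 = L = 2
    y2 = np.sort((A @ x) ** 2)[::-1]      # squared coordinates, largest first
    print(f"s={s}:  top-s mass = {y2[:s].sum():.4f}  <=  1 + M*s = {1 + M*s:.4f}")
```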

Part II: the extractor step
Let y = (y_1, …, y_{n'}).
Define the probability distribution P = (y_1^2/||y||_2^2, …, y_{n'}^2/||y||_2^2).
The extractor properties imply that, for "most" buckets B_i, we have
  ||G(y)|_{B_i}||_2^2 ≈ ||G(y)||_2^2 / #buckets.
After log log n steps, "most" entries will be around 1/n^{1/2} in magnitude.
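A hedged simulation of this step (not from the talk): a random bipartite graph stands in for the explicit extractor, and each coordinate y_i spreads its squared mass evenly over its d buckets; the exact form of G(y), the parameters, and the test vector are assumptions made for illustration.

```python
# Spread the squared mass of y over buckets via a random bipartite graph and
# check that bucket masses are close to ||y||_2^2 / #buckets.
import numpy as np

rng = np.random.default_rng(3)
n, buckets, d = 4096, 64, 16
neighbors = rng.integers(0, buckets, size=(n, d))   # d neighbors per coordinate

y = np.zeros(n)
y[rng.choice(n, size=256, replace=False)] = 1.0     # mass spread over 256 coordinates
y /= np.linalg.norm(y)                              # ||y||_2 = 1

bucket_mass = np.zeros(buckets)
for t in range(d):
    np.add.at(bucket_mass, neighbors[:, t], y**2 / d)

print("ideal mass per bucket:", 1.0 / buckets)
print("min / max bucket mass:", bucket_mass.min(), bucket_mass.max())
```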

Incoherent dictionaries
Can we construct A = [H_1 H_2 … H_L]^T with coherence 1/n^{1/2}?
–For L = 2, take A = [I H]^T.
–It turns out that such A exists for L up to n/2 + 1: take H_i = H·D_i, where each D_i has ±1 entries on the diagonal (suitably chosen) and 0's elsewhere.
–See [Calderbank-Cameron-Kantor-Seidel] for more (Kerdock codes).
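A small sketch (not from the talk) computing mutual coherence: [I H]^T attains exactly 1/n^{1/2}, while stacking H·D_i for random ±1 diagonals, an assumption standing in for the carefully chosen Kerdock-code signs, gives a noticeably larger coherence.

```python
# Compute the mutual coherence (max |<row_i, row_j>| over distinct rows) of
# dictionaries built from stacked orthonormal matrices.
import numpy as np

def hadamard(n):
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H / np.sqrt(n)

def coherence(A):
    G = np.abs(A @ A.T)          # pairwise inner products of the (unit-norm) rows
    np.fill_diagonal(G, 0.0)
    return G.max()

n = 64
H = hadamard(n)
rng = np.random.default_rng(4)

print("coherence of [I H]^T:", coherence(np.vstack([np.eye(n), H])),
      "(= 1/sqrt(n) =", 1 / np.sqrt(n), ")")

blocks = [H @ np.diag(rng.choice([-1.0, 1.0], size=n)) for _ in range(4)]
print("coherence of [H*D_1 ... H*D_4]^T with random signs:", coherence(np.vstack(blocks)))
```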

Digression
Johnson-Lindenstrauss'84:
–Take a "random" matrix A: R^n → R^{m/ε^2} (m << n).
–For any ε > 0 and x ∈ R^n, we have ||Ax||_2 = (1 ± ε)||x||_2 with probability 1 - exp(-m).
–Ax can be computed in O(mn/ε^2) time.
Ailon-Chazelle'06:
–Essentially: take B = A·P·(H·D_i), where
 H: Hadamard matrix,
 D_i: random ±1 diagonal matrix,
 P: projection onto m^2 coordinates,
 A: a matrix as above, applied to the coordinates kept by P.
–Bx can be computed in O(n log n + m^3/ε^2) time.
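A hedged sketch in the spirit of this construction (not the exact [AC'06] map): it applies a random ±1 diagonal D and a fast Walsh-Hadamard transform in O(n log n) time, and then, as simplifying assumptions, samples coordinates uniformly and finishes with a small dense Gaussian matrix; the real FJLT uses a different (sparse) projection P and different parameter choices.

```python
# Fast "spread, sample, then project" map, loosely following the H*D idea.
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(n log n); len(x) must be a power of 2."""
    x = x.copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[i:i + h].copy()
            b = x[i + h:i + 2 * h].copy()
            x[i:i + h] = a + b
            x[i + h:i + 2 * h] = a - b
        h *= 2
    return x / np.sqrt(n)                          # orthonormal normalization

rng = np.random.default_rng(0)
n, m = 1024, 32
D = rng.choice([-1.0, 1.0], size=n)                # random +-1 diagonal
P = rng.choice(n, size=m, replace=False)           # uniform coordinate sampling (assumed)
A = rng.standard_normal((m, m)) / np.sqrt(m)       # small dense Gaussian map

x = rng.standard_normal(n)
y = A @ (np.sqrt(n / m) * fwht(D * x)[P])          # computable in O(n log n + m^2) here
print(np.linalg.norm(x), np.linalg.norm(y))        # the two norms should be comparable
```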

Conclusions
Extractors + UP ⇒ an embedding of l_2^n into l_1:
–dimension almost as good as for the probabilistic result;
–near-linear (in n) embedding time.
Questions:
–Remove the 2^{O(log log n)^2} factor?
–Make other embeddings explicit?
–Any particular reason why both [AC'06] and this paper use H·D_i matrices?