Lecture 15: Least Squares Regression; Metric Embeddings


COMS E6998-9 F15, Lecture 15: Least Squares Regression; Metric Embeddings

Administrivia, Plan

PS2:
- Pick up after class
- 120 -> 144 auto extension

Plan:
- Least Squares Regression (finish)
- Metric Embeddings: "reductions for distances"

Least Squares Regression

Problem: $\arg\min_x \|Ax - b\|_2$, where
- $A$ is an $n \times d$ matrix,
- $b$ is a vector of dimension $n$,
- $n \gg d$.

Usual (exact) solution:
- Perform an SVD (singular value decomposition).
- Takes $O(n d^{\omega-1}) \approx O(n d^{1.373})$ time.

Faster?
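
For concreteness, here is a minimal NumPy sketch of the exact SVD-based solve (matrix sizes are illustrative):

```python
import numpy as np

def lsr_svd(A, b):
    """Exact least-squares solution argmin_x ||Ax - b||_2 via the thin SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) V^T
    return Vt.T @ ((U.T @ b) / s)  # x* = V diag(1/s) U^T b

n, d = 10_000, 50  # n >> d
A = np.random.randn(n, d)
b = np.random.randn(n)
x_star = lsr_svd(A, b)
assert np.allclose(x_star, np.linalg.lstsq(A, b, rcond=None)[0])
```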

Approximate LSR

Approximate solution $x'$: $\|Ax' - b\| \le (1+\epsilon)\,\|Ax^* - b\|$, where $x^*$ is the optimal solution.

Tool: dimension reduction for subspaces! A map $\Pi: \mathbb{R}^n \to \mathbb{R}^k$ is a $(d, \epsilon, \delta)$-subspace embedding if, for any linear subspace $P \subset \mathbb{R}^n$ of dimension $d$, we have
$$\Pr_\Pi\big[\forall p \in P:\ \|\Pi p\| / \|p\| \in (1-\epsilon, 1+\epsilon)\big] \ge 1 - \delta.$$

PS3-2: the usual dimension reduction implies a $(d, \epsilon, 0.1)$-subspace embedding for target dimension $k = O(d/\epsilon^2)$.
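
A quick empirical sanity check of the definition, using a Gaussian sketch scaled by $1/\sqrt{k}$ as $\Pi$ (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 5_000, 20, 0.25
k = int(d / eps**2)  # target dimension k = O(d / eps^2)

# Orthonormal basis of a random d-dimensional subspace P of R^n.
B = np.linalg.qr(rng.standard_normal((n, d)))[0]
# Gaussian sketch: E ||Pi p||^2 = ||p||^2 for every fixed p.
Pi = rng.standard_normal((k, n)) / np.sqrt(k)

# Distortion over random points p = Bc in the subspace.
C = rng.standard_normal((d, 1000))
P = B @ C
ratios = np.linalg.norm(Pi @ P, axis=0) / np.linalg.norm(P, axis=0)
print(ratios.min(), ratios.max())  # typically inside (1 - eps, 1 + eps)
```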

Approximate Algorithm

Let $\Pi$ be a $(d+1, \epsilon, 0.1)$-subspace embedding, and solve $x' = \arg\min_x \|\Pi A x - \Pi b\|$.

Theorem: $\|Ax' - b\| \le (1 + 3\epsilon)\,\|Ax^* - b\|$.

Proof:
- $\|\Pi A x - \Pi b\| = \|\Pi(Ax - b)\|$.
- $P = \{Ax - b \mid x \in \mathbb{R}^d\}$ is (a subset of) a $(d+1)$-dimensional linear subspace of $\mathbb{R}^n$.
- Hence $\Pi$ preserves the norm of every $Ax - b$ up to a $1+\epsilon$ factor, with 90% probability overall. In particular:
  1) $\|\Pi A x^* - \Pi b\| \le (1+\epsilon)\,\|Ax^* - b\|$;
  2) $\|\Pi A x - \Pi b\| \ge (1-\epsilon)\,\|Ax - b\|$ for any $x$.
- Hence $\|Ax' - b\| \le \frac{1+\epsilon}{1-\epsilon}\,\|Ax^* - b\| \le (1+3\epsilon)\,\|Ax^* - b\|$ (for $\epsilon \le 1/3$).
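
Putting the pieces together, a minimal sketch-and-solve routine (Gaussian $\Pi$; the constant in $k$ is illustrative):

```python
import numpy as np

def sketch_and_solve(A, b, eps=0.5, rng=np.random.default_rng()):
    """Approximate argmin_x ||Ax - b||_2 by solving the sketched problem
    min_x ||Pi A x - Pi b||_2 for a Gaussian subspace embedding Pi."""
    n, d = A.shape
    k = int(4 * (d + 1) / eps**2)  # k = O(d / eps^2)
    Pi = rng.standard_normal((k, n)) / np.sqrt(k)
    return np.linalg.lstsq(Pi @ A, Pi @ b, rcond=None)[0]

A = np.random.randn(5_000, 20)
b = np.random.randn(5_000)
x_prime = sketch_and_solve(A, b)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(A @ x_prime - b) / np.linalg.norm(A @ x_star - b))  # <= 1 + 3*eps
```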

Approx LSR algorithms

Time:
- $O(n d^{\omega-1}) \Rightarrow O(k d^{\omega-1}) = O_\epsilon(d^\omega)$, plus the time to multiply by $\Pi$: $O(ndk) = O_\epsilon(n d^2)$. This is worse than before, in fact...
- Can apply fast dimension reduction! Reduces the time to $O(d \cdot (n \log n + d^3)) = O(nd \log n + d^4)$; the first term is near-optimal.
- Can do even faster: there exist $\Pi$ with 1 non-zero per column and $k = O(d^2/\epsilon^2)$ (exactly the one from problem 1 on PS2!). The time becomes $O_\epsilon(\mathrm{nnz}(A) + d^3)$; see the sketch below.

[Sarlos'06, Clarkson-Woodruff'13, Meng-Mahoney'13, Nelson-Nguyen'13]
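
A minimal sketch of such a sparse embedding (one random ±1 entry per column of $\Pi$, i.e. a CountSketch-style matrix, applied in $O(\mathrm{nnz}(A))$ time; this follows the referenced constructions only up to details):

```python
import numpy as np

def sparse_embed(A, k, rng=np.random.default_rng()):
    """Apply a sparse subspace embedding Pi with one nonzero per column:
    row i of A goes to a random row h(i) of the sketch with a random sign s(i).
    Runs in O(nnz(A)) time (here O(n*d), since A is stored dense)."""
    n = A.shape[0]
    h = rng.integers(0, k, size=n)       # target row for each input row
    s = rng.choice([-1.0, 1.0], size=n)  # random signs
    SA = np.zeros((k, A.shape[1]))
    np.add.at(SA, h, s[:, None] * A)     # SA[h(i)] += s(i) * A[i]
    return SA

A = np.random.randn(10_000, 20)
SA = sparse_embed(A, k=4 * 20**2)        # k = O(d^2 / eps^2)
```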

Metric embeddings

Definition by example

Problem: compute the diameter of a set $S$ of size $n$ living in the $d$-dimensional space $\ell_1^d$ (say, for $d = 2$).
- Trivial solution: $O(d n^2)$ time.
- Will see: $O(2^d n)$ time.

The algorithm has two steps:
1. Map $f: \ell_1^d \to \ell_\infty^k$, where $k = 2^d$, such that for any $x, y \in \ell_1^d$: $\|x - y\|_1 = \|f(x) - f(y)\|_\infty$.
2. Solve the diameter problem in $\ell_\infty$ on the set $f(S)$.

Step 1: Map from $\ell_1$ to $\ell_\infty$

Want a map $f: \ell_1 \to \ell_\infty$ such that for all $x, y \in \ell_1$: $\|x - y\|_1 = \|f(x) - f(y)\|_\infty$.

Define $f(x)$ as follows:
- $2^d$ coordinates, indexed by $b = (b_1 b_2 \ldots b_d)$ (binary representation);
- $f(x)_b = \sum_i (-1)^{b_i} \cdot x_i$.

Claim: $\|f(x) - f(y)\|_\infty = \|x - y\|_1$. Indeed,
$$\|f(x) - f(y)\|_\infty = \max_b \sum_i (-1)^{b_i} (x_i - y_i) = \sum_i \max_{b_i} (-1)^{b_i} (x_i - y_i) = \sum_i |x_i - y_i| = \|x - y\|_1.$$
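
A direct implementation of this map (exponential in $d$, as intended), as a sketch:

```python
import itertools
import numpy as np

def embed_l1_to_linf(x):
    """Map x in l_1^d to l_inf^{2^d}: coordinate b equals sum_i (-1)^{b_i} x_i."""
    d = len(x)
    return np.array([sum((-1) ** b_i * x_i for b_i, x_i in zip(b, x))
                     for b in itertools.product((0, 1), repeat=d)])

x, y = np.array([1.0, -2.0, 3.0]), np.array([0.5, 1.0, -1.0])
fx, fy = embed_l1_to_linf(x), embed_l1_to_linf(y)
# The l_inf distance of the images equals the l_1 distance of the originals.
assert np.isclose(np.max(np.abs(fx - fy)), np.sum(np.abs(x - y)))
```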

Step 2: Diameter in $\ell_\infty$

Claim: can compute the diameter of $n$ points living in $\ell_\infty^k$ in $O(nk)$ time.

Proof:
$$\mathrm{diameter}(S) = \max_{p,q \in S} \|p - q\|_\infty = \max_{p,q \in S} \max_b |p_b - q_b| = \max_b \max_{p,q \in S} |p_b - q_b| = \max_b \Big( \max_{p \in S} p_b - \min_{q \in S} q_b \Big).$$
Hence, can compute in $O(k \cdot n)$ time.

Combining the two steps, we get $O(2^d \cdot n)$ time for computing the diameter in $\ell_1^d$; a sketch of the full pipeline follows.
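
A sketch of the whole $O(2^d n)$ algorithm, vectorized over all $2^d$ sign patterns:

```python
import numpy as np

def l1_diameter(S):
    """Diameter of an (n, d) point set under l_1, in O(2^d * n) time:
    embed into l_inf^{2^d}, then take the max over coordinates of (max - min)."""
    n, d = S.shape
    # All sign patterns (-1)^{b_i}, b in {0,1}^d, as a (2^d, d) matrix.
    signs = 1 - 2 * ((np.arange(2 ** d)[:, None] >> np.arange(d)) & 1)
    F = S @ signs.T  # f(S): each point gets 2^d coordinates
    return np.max(F.max(axis=0) - F.min(axis=0))

S = np.random.randn(200, 8)
brute = max(np.sum(np.abs(p - q)) for p in S for q in S)  # O(d n^2) check
assert np.isclose(l1_diameter(S), brute)
```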

General Theory: embeddings

General motivation: given a distance (metric) $M$, solve a computational problem $P$ under $M$.

Example metrics $M$:
- Hamming distance, $\ell_1$
- Euclidean distance ($\ell_2$)
- edit distance between two strings
- Earth-Mover (transportation) Distance

Example problems $P$:
- compute the distance between two points
- Nearest Neighbor Search
- diameter / closest pair of a set $S$
- clustering, MST, etc.

An embedding $f$ reduces <$P$ under a hard metric> to <$P$ under a simpler metric>.

Embeddings: landscape

Definition: an embedding is a map $f: M \to H$ of a metric $(M, d_M)$ into a host metric $(H, \rho_H)$ such that for any $x, y \in M$:
$$d_M(x, y) \le \rho_H(f(x), f(y)) \le D \cdot d_M(x, y),$$
where $D$ is the distortion (approximation) of the embedding $f$.

Embeddings come in all shapes and colors:
- source/host spaces $M$, $H$
- distortion $D$
- can be randomized: $\rho_H(f(x), f(y)) \approx d_M(x, y)$ with probability $1 - \delta$
- time to compute $f(x)$

Types of embeddings:
- from a norm into the same norm but of lower dimension (dimension reduction)
- from one norm ($\ell_2$) into another norm ($\ell_1$)
- from non-norms (edit distance, Earth-Mover Distance) into a norm ($\ell_1$)
- from a given finite metric (shortest path on a planar graph) into a norm ($\ell_1$)
- $H$ not a metric but a computational procedure ← sketches

$\ell_2$ into $\ell_1$

Theorem: one can embed $\ell_2^d$ into $\ell_1^k$ for $k = O(d/\epsilon^2)$ with distortion $1 + \epsilon$.

Map: $F(x) = \frac{1}{k} G x$ for a Gaussian $k \times d$ matrix $G$.

Proof idea (similar to dimension reduction in $\ell_2$):
- Claim: for any points $x, y \in \mathbb{R}^d$ with $\delta = \|x - y\|_2$,
$$\Pr\big[\|F(x) - F(y)\|_1 / \delta \in (1-\epsilon, 1+\epsilon)\big] \ge 1 - e^{-\Omega(\epsilon^2 k)}.$$
- $F(x) - F(y) = \frac{1}{k} G(x - y)$ is distributed as $\frac{1}{k}(g_1 \delta, g_2 \delta, \ldots, g_k \delta)$ for i.i.d. Gaussians $g_i$.
- Hence $\|F(x) - F(y)\|_1 = \delta \cdot \frac{1}{k} \sum_i |g_i|$ (in dimension reduction we had $\frac{1}{k} \sum_i g_i^2$).
- Also can prove that $\frac{1}{k} \sum_i |g_i| = 1 \pm \epsilon$ with probability at least $1 - e^{-\Omega(\epsilon^2 k)}$ (after rescaling $F$ by $\sqrt{\pi/2}$, so that $\mathbb{E}|g_i| = 1$).
- Now apply the same argument as in the subspace embedding to extend to the entire space $\mathbb{R}^d$, as long as $k \ge \Omega(d/\epsilon^2)$.

Moral: $\ell_1$ is at least as "large/hard" as $\ell_2$.
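
An empirical sketch of this embedding, with the $\sqrt{\pi/2}$ normalization made explicit (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps = 50, 0.2
k = int(d / eps**2)

# F(x) = sqrt(pi/2)/k * G x; the sqrt(pi/2) factor makes E|g_i| = 1.
G = rng.standard_normal((k, d))
F = lambda x: np.sqrt(np.pi / 2) / k * (G @ x)

x, y = rng.standard_normal(d), rng.standard_normal(d)
ratio = np.sum(np.abs(F(x) - F(y))) / np.linalg.norm(x - y)
print(ratio)  # ||F(x) - F(y)||_1 / ||x - y||_2, close to 1
```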

Converse?

Can we embed $\ell_1$ into $\ell_2$ with good approximation? No!

[Enflo'69]: any embedding of $(\{0,1\}^d, \ell_1)$ into $\ell_2$ (of any dimension) must incur distortion $\sqrt{d}$.

Proof: suppose $f$ is an embedding of $\{0,1\}^d$ into $\ell_2$. Consider two distributions over pairs of points $x, y \in \{0,1\}^d$:
- F (far): $x$ random and $y = x \oplus \bar{1}$ (all coordinates flipped);
- C (close): $x = y \oplus e_i$ for a random $y$ and a random index $i$.

Two steps:
1) $E_C\big[\|x - y\|_1^2\big] \le \frac{1}{d^2} \cdot E_F\big[\|x - y\|_1^2\big]$ (close pairs are at distance 1, far pairs at distance $d$);
2) $E_C\big[\|f(x) - f(y)\|_2^2\big] \ge \frac{1}{d} \cdot E_F\big[\|f(x) - f(y)\|_2^2\big]$ (short diagonals).

Together these imply an $\Omega(\sqrt{d})$ lower bound: normalizing $f$ so that $\|x - y\|_1 \le \|f(x) - f(y)\|_2 \le D \cdot \|x - y\|_1$, step 2 gives $D^2 \ge E_C\|f(x) - f(y)\|_2^2 \ge \frac{1}{d} E_F\|f(x) - f(y)\|_2^2 \ge \frac{1}{d} \cdot d^2 = d$.

Moral: $\ell_1$ is strictly larger than $\ell_2$!

Other distances? E.g., the Earth-Mover Distance.

Definition: given two (equal-size) sets $A$, $B$ of points in a metric space,
$$EMD(A, B) = \text{min-cost bipartite matching between } A \text{ and } B.$$

Which metric space? Can be the plane, $\ell_2$, $\ell_1$, ...

Applications in computer vision.

Images courtesy of Kristen Grauman.
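
For intuition, a minimal sketch computing EMD between two equal-size point sets in the plane (optimal matching via SciPy's Hungarian-method solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def emd(A, B):
    """EMD between equal-size point sets A, B (one point per row):
    minimum-cost perfect matching under the Euclidean ground distance."""
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    rows, cols = linear_sum_assignment(cost)                      # optimal assignment
    return cost[rows, cols].sum()

A = np.random.rand(10, 2)  # 10 points in the plane
B = np.random.rand(10, 2)
print(emd(A, B))
```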