# Embedding Metric Spaces in Their Intrinsic Dimension

Ittai Abraham, Yair Bartal*, Ofer Neiman (The Hebrew University; * also Caltech)

## Embedding Metric Spaces

- Metric spaces $(X,d_X)$ and $(Y,d_Y)$.
- An embedding is a function $f : X \to Y$.
- The distortion is the minimal $\alpha$ such that, for all $x,y \in X$, $d_X(x,y) \le d_Y(f(x),f(y)) \le \alpha\cdot d_X(x,y)$.
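The distortion of a concrete map can be computed directly from the definition. The sketch below is illustrative only: the function and variable names are made up for this example, the map is assumed to be injective, and the distortion is computed as (maximum expansion) times (maximum contraction), which equals the minimal $\alpha$ after rescaling the map so that it is non-contractive.

```python
import itertools
import math

def distortion(points, d_x, f, d_y):
    """Distortion of f as max expansion times max contraction over all pairs."""
    expansion = 0.0    # largest ratio d_Y(f(x), f(y)) / d_X(x, y)
    contraction = 0.0  # largest ratio d_X(x, y) / d_Y(f(x), f(y))
    for x, y in itertools.combinations(points, 2):
        dx, dy = d_x(x, y), d_y(f(x), f(y))
        expansion = max(expansion, dy / dx)
        contraction = max(contraction, dx / dy)
    return expansion * contraction

# Example: the 4-cycle with its shortest-path metric, mapped to a unit square.
cycle = [0, 1, 2, 3]

def d_cycle(a, b):
    k = abs(a - b) % 4
    return min(k, 4 - k)

square = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
alpha = distortion(cycle, d_cycle, square.get, math.dist)
```

In this example the adjacent pairs are preserved exactly, while the two opposite pairs shrink from distance 2 to $\sqrt{2}$, so the distortion is $\sqrt{2}$.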

## Intrinsic Dimension

- Doubling constant: the minimal $\lambda$ such that any ball of radius $r>0$ can be covered by $\lambda$ balls of radius $r/2$.
- Doubling dimension: $\dim(X) = \log_2 \lambda$.
- The problem: the relation between metric dimension and intrinsic dimension.
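For a small finite metric, the doubling constant can be bounded from above by greedily covering every ball with half-radius balls. This is a heuristic sketch with hypothetical function names, not an exact computation: the greedy cover may use more balls than an optimal one, so the result is only an upper bound on $\lambda$. It relies on the observation that, for closed balls in a finite metric, it suffices to try radii equal to pairwise distances.

```python
import math

def greedy_cover_size(points, dist, center, r):
    """Number of radius-r/2 balls a greedy pass uses to cover the ball B(center, r)."""
    ball = [p for p in points if dist(center, p) <= r]
    centers = []
    for p in ball:
        # p is not covered by any chosen half-radius ball, so open a new one at p
        if all(dist(p, c) > r / 2 for c in centers):
            centers.append(p)
    return len(centers)

def doubling_upper_bound(points, dist):
    """Greedy upper bound on the doubling constant (needs at least two points)."""
    radii = {dist(p, q) for p in points for q in points if p != q}
    return max(greedy_cover_size(points, dist, c, r) for c in points for r in radii)

# Example: four equally spaced points on a line.
line = [0, 1, 2, 3]
lam = doubling_upper_bound(line, lambda a, b: abs(a - b))
dim_estimate = math.log2(lam)
```

On the uniform metric (all distances equal to 1) the same routine returns the number of points, reflecting that such a space has the largest possible doubling constant.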

## Previous Results

- Given a $\lambda$-doubling finite metric space $(X,d)$ and $0<\gamma<1$, its snowflake version $(X,d^\gamma)$ can be embedded into $L_p$ with distortion and dimension depending only on $\lambda$ [Assouad 83].
- Conjecture (Assouad): this holds for $\gamma=1$.
- Disproved by Semmes.
- A lower bound of $\Omega(\sqrt{\log n})$ on the distortion for $L_2$, with a matching upper bound [GKL 03].
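The snowflake version of a metric simply raises every distance to the power $\gamma$. The following sketch (helper names are hypothetical) builds $d^\gamma$ and checks the triangle inequality by brute force, illustrating that an exponent $0<\gamma<1$ keeps the space a metric while an exponent above 1 may not.

```python
import itertools

def snowflake(dist, gamma):
    """Return the function (x, y) -> dist(x, y) ** gamma."""
    return lambda x, y: dist(x, y) ** gamma

def is_metric(points, dist, tol=1e-12):
    """Brute-force check of the triangle inequality on all ordered triples."""
    return all(dist(x, z) <= dist(x, y) + dist(y, z) + tol
               for x, y, z in itertools.permutations(points, 3))

line = list(range(6))
line_dist = lambda a, b: abs(a - b)

sqrt_ok = is_metric(line, snowflake(line_dist, 0.5))   # gamma = 1/2
square_ok = is_metric(line, snowflake(line_dist, 2))   # gamma = 2, not a snowflake
```

With $\gamma = 2$ the triple $0, 1, 2$ already fails, since $2^2 > 1^2 + 1^2$.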

## Rephrasing the Question

- Is there a low-distortion embedding of a finite metric space in its intrinsic dimension?
- Main result: yes.

## Main Results

Any finite metric space $(X,d)$ embeds into $L_p$:

- With distortion $O(\log^{1+\theta} n)$ and dimension $O(\dim(X)/\theta)$, for any $\theta>0$.
- With constant average distortion and dimension $O(\dim(X)\log(\dim(X)))$.

## Additional Result

Any finite metric space $(X,d)$ embeds into $L_p$:

- With a distortion-dimension tradeoff controlled by a parameter $D$, for all $D \le (\log n)/\dim(X)$.
- In particular, $\tilde{O}(\log^{2/3} n)$ distortion and dimension into $L_2$.
- Matches the best known distortion result [KLMN 03] for $D=(\log n)/\dim(X)$, with dimension $O(\log n \log(\dim(X)))$.

## Distance Oracles

A compact data structure that approximately answers distance queries.

- For general $n$-point metrics:
  - [TZ 01] $O(k)$ stretch with $O(k n^{1/k})$ bits per label.
- For a finite $\lambda$-doubling metric:
  - $O(1)$ average stretch with $\tilde{O}(\log \lambda)$ bits per label.
  - $O(k)$ stretch with $\tilde{O}(\lambda^{1/k})$ bits per label.
  - Follows from a variation on the snowflake embedding (Assouad).

## First Result

Theorem: For any finite $\lambda$-doubling metric space $(X,d)$ on $n$ points and any $0<\theta<1$, there exists an embedding of $(X,d)$ into $L_p$ with distortion $O(\log^{1+\theta} n)$ and dimension $O((\log \lambda)/\theta)$.

## Probabilistic Partitions

- $P=\{S_1,S_2,\dots,S_t\}$ is a partition of $X$ if the clusters $S_i$ are pairwise disjoint and their union is $X$.
- $P(x)$ is the cluster containing $x$.
- $P$ is $\Delta$-bounded if $\mathrm{diam}(S_i) \le \Delta$ for all $i$.
- A probabilistic partition $\mathcal{P}$ is a distribution over a set of partitions.
- A $\Delta$-bounded $\mathcal{P}$ is $\eta$-padded if for all $x \in X$: $\Pr[B(x,\eta\Delta) \subseteq P(x)] \ge 1/2$.
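To make the definitions concrete, here is a minimal sketch of one way to sample a $\Delta$-bounded probabilistic partition: random ball carving in the style of [CKR01], with a random radius in $[\Delta/4, \Delta/2]$ and a random order of centers. This is an assumed stand-in for illustration, not the uniformly padded local partitions used in the talk, and all names are hypothetical. The second function estimates empirically how often the ball $B(x,\eta\Delta)$ stays inside the cluster of $x$.

```python
import random

def ckr_partition(points, dist, delta, rng):
    """Sample one partition; returns a dict mapping each point to its cluster center."""
    radius = rng.uniform(delta / 4, delta / 2)
    order = list(points)
    rng.shuffle(order)
    # Each point joins the first center in the random order that is within the radius.
    # A match always exists because every point is within distance 0 of itself.
    return {x: next(c for c in order if dist(x, c) <= radius) for x in points}

def padding_frequency(points, dist, delta, eta, x, trials, rng):
    """Fraction of sampled partitions in which B(x, eta * delta) lies in x's cluster."""
    ball = [y for y in points if dist(x, y) <= eta * delta]
    hits = 0
    for _ in range(trials):
        cluster_of = ckr_partition(points, dist, delta, rng)
        hits += all(cluster_of[y] == cluster_of[x] for y in ball)
    return hits / trials

line = list(range(32))
line_dist = lambda a, b: abs(a - b)
rng = random.Random(0)
sample = ckr_partition(line, line_dist, 8, rng)
freq = padding_frequency(line, line_dist, 8, 0.125, 16, 200, rng)
```

Every cluster has diameter at most $\Delta$ because all its points lie within $\Delta/2$ of the same center. The partition is $\eta$-padded in the sense above when the estimated frequency is at least $1/2$ for every point.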

## η-padded Partitions

- The parameter $\eta$ determines the quality of the embedding.
- [Bartal 96]: $\eta=\Omega(1/\log n)$ for any metric space.
- [CKR01+FRT03]: improved partitions with $\eta(x)=1/\log(\rho(x,\Delta))$, where $\rho(x,r)$ is the local growth rate of $x$ at radius $r$.
- [GKL 03]: $\eta=\Omega(1/\log \lambda)$ for $\lambda$-doubling metrics.
- [KLMN 03]: used to embed general and doubling metrics into $L_p$: distortion $O((\log \lambda)^{1-1/p} (\log n)^{1/p})$, dimension $O(\log^2 n)$.

## Plan

- A simpler result:
  - Distortion $O(\log n)$.
  - Dimension $O(\log\log n\cdot\log \lambda)$.
- Obtaining the lower dimension of $O(\log \lambda)$.
- Brief overview of:
  - Constant average distortion.
  - Distortion-dimension tradeoff.

## Embedding into One Dimension

- For each scale $i \in \mathbb{Z}$, create a uniformly padded local probabilistic $8^i$-bounded partition $P_i$.
- For each cluster $S$, choose $\sigma_i(S) \sim \mathrm{Ber}(1/2)$ i.i.d.
- $f_i(x)=\sigma_i(P_i(x))\cdot\min\{\eta_i^{-1}(x)\cdot d(x,X\setminus P_i(x)),\ 8^i\}$, and the coordinate sums over all scales: $f=\sum_i f_i$.
- Deterministic upper bound: $|f(x)-f(y)| \le O(\log n\cdot d(x,y))$.

Figure: a point $x$ inside its cluster $P_i(x)$, with the distance $d(x,X\setminus P_i(x))$ to the outside of the cluster marked.
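The single-scale term $f_i$ can be evaluated directly once a partition and the random bits are fixed. The sketch below makes simplifying assumptions that are not in the slides: the partition is given as a plain dictionary, $\eta$ is one constant rather than a per-point value $\eta_i(x)$, and all names are hypothetical.

```python
def scale_term(points, dist, cluster_of, sigma, eta, scale):
    """Compute f_i(x) = sigma(P(x)) * min(d(x, outside of P(x)) / eta, scale)."""
    values = {}
    for x in points:
        outside = [dist(x, y) for y in points if cluster_of[y] != cluster_of[x]]
        # If the cluster is the whole space, the distance to the outside is infinite.
        boundary = min(outside) if outside else float("inf")
        values[x] = sigma[cluster_of[x]] * min(boundary / eta, scale)
    return values

# Example: eight points on a line, split into two clusters at scale 8.
line = list(range(8))
line_dist = lambda a, b: abs(a - b)
cluster_of = {x: ("A" if x < 4 else "B") for x in line}
sigma = {"A": 1, "B": 0}   # one fixed outcome of the Ber(1/2) choices
f_i = scale_term(line, line_dist, cluster_of, sigma, eta=0.5, scale=8)
```

In this example the values are 8, 6, 4, 2 on the first cluster and 0 on the second. The term changes by at most $d(x,y)/\eta$ between any two points, which is the per-scale source of the upper bound: the value decays to 0 as a point approaches the boundary of its cluster, so crossing into a neighboring cluster causes no jump.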

## Lower Bound: Overview

- Create an $r_i$-net for every integer $i$.
- Define the success event for a pair $(u,v)$ in the $r_i$-net with $d(u,v)\approx 8^i$ as having contribution $> 8^i/4$ in many coordinates.
- In every coordinate, there is a constant probability of having contribution for a net pair $(u,v)$.
- Use the Lovász Local Lemma.
- Show the lower bound for the other pairs.

## Lower Bound: Other Pairs

- Let $x,y$ be some pair with $d(x,y)\approx 8^i$, and let $u,v$ be the points of the $r_i$-net nearest to $x,y$.
- Suppose that $|f(u)-f(v)|>8^i/4$.
- We want to choose the net such that $|f(u)-f(x)| \le 8^i/16$, so choose $r_i = 8^i/(16\cdot\log n)$.
- Using the upper bound: $|f(u)-f(x)| \le \log n\cdot d(u,x) \le 8^i/16$.
- $|f(x)-f(y)| \ge |f(u)-f(v)|-|f(u)-f(x)|-|f(v)-f(y)| \ge 8^i/4-2\cdot 8^i/16 = 8^i/8$.

Figure: $x$ and $y$ next to their nearest net points $u$ and $v$, each within distance $8^i/(16\log n)$.

## Lower Bound

- Take an $r_i$-net pair $(u,v)$. We can assume that $8^i \approx d(u,v)/4$.
- It must be that $P_i(u)\neq P_i(v)$, since $P_i$ is $8^i$-bounded.
- With probability $1/2$: $d(u,X\setminus P_i(u)) \ge \eta_i 8^i$.
- With probability $1/4$: $\sigma_i(P_i(u))=1$ and $\sigma_i(P_i(v))=0$.

## Lower Bound: Net Pairs

- $d(u,v)\approx 8^i$. Consider $R$, the difference already contributed by the higher scales.
- If $R<8^i/2$: with probability $1/8$, $f_i(u)-f_i(v) \ge 8^i$.
- If $R \ge 8^i/2$: with probability $1/4$, $f_i(u)=f_i(v)=0$.
- In any case, a contribution of order $8^i$ is obtained with constant probability.
- Lower scales do not matter.
- The good event for a pair in scale $i$ depends on higher scales, but has constant probability given any outcome for them. It is oblivious to lower scales.

Figure: the net points $u$ and $v$, with the padding radius $\eta_i(u)\,8^i$ around $u$.

## Local Lemma

Lemma (Lovász): Let $A_1,\dots,A_n$ be bad events, and let $G=(V,E)$ be a directed graph with out-degree at most $d$ whose vertices correspond to the events. Let $c:V\to\mathbb{N}$ be a rating function of the events such that $(A_i,A_j)\in E$ implies $c(A_i)\ge c(A_j)$. If the lemma's conditions on the conditional failure probabilities and on $d$ hold, then with positive probability none of the bad events occurs.

Rating = radius of scale.

## Lower Bound: Net Pairs

- A success event $E(u,v)$ for a net pair $u,v$: there is contribution from at least $1/16$ of the coordinates.
- Locality of the partition: the event for a net pair depends only on nearby points, at distance $< 8^i$.
- With doubling constant $\lambda$ and $r_i\approx 8^i/\log n$, there are at most $\lambda^{\log\log n}$ such points, so $d=\lambda^{\log\log n}$.
- Taking $D=O(\log \lambda\cdot\log\log n)$ coordinates gives roughly $e^{-D} = \lambda^{-\log\log n}$ failure probability.
- By the local lemma, there exists an embedding such that $E(u,v)$ holds for all net pairs.

## Obtaining Lower Dimension

- To use the LLL, the probability of failing in more than $15/16$ of the coordinates must be $< \lambda^{-\log\log n}$.
- Instead of taking more coordinates, increase the success probability in each coordinate.
- If the probability of obtaining contribution in each coordinate is $>1-1/\log n$, it is enough to take $O(\log \lambda)$ coordinates.
- Similarly, if the failure probability in each coordinate is $< \log^{-\theta} n$, it is enough to take $O((\log \lambda)/\theta)$ coordinates.
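A rough calculation shows why $O((\log\lambda)/\theta)$ coordinates suffice. It is a sketch under simplifying assumptions that are not stated in the slides: the $D$ coordinates fail independently, each with probability at most $q=\log^{-\theta} n$, logarithms are base 2, and the constants are illustrative only.

```latex
\begin{align*}
\Pr\bigl[\text{more than } \tfrac{15}{16}D \text{ coordinates fail}\bigr]
  &\le \binom{D}{\lceil 15D/16\rceil}\, q^{15D/16}
   \le 2^{D}\,(\log n)^{-15\theta D/16}, \\
2^{D}\,(\log n)^{-15\theta D/16}
  &= 2^{\,D\left(1-\frac{15\theta}{16}\log\log n\right)}
   \le 2^{-\frac{15\theta}{32}\, D \log\log n}
   \quad \text{once } \log\log n \ge \tfrac{32}{15\theta}, \\
2^{-\frac{15\theta}{32}\, D \log\log n} \le \lambda^{-\log\log n}
  &\iff D \ge \frac{32}{15\,\theta}\,\log\lambda .
\end{align*}
```

The first step is a union bound over the sets of coordinates that could all fail. The last line uses $\lambda^{-\log\log n} = 2^{-\log\lambda\cdot\log\log n}$.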

## Using Several Scales

- Create nets only every $\theta\log\log n$ scales.
- A pair $(x,y)$ in scale $i'$ (i.e. $d(x,y)\approx 8^{i'}$) will find a close net pair in the nearest smaller scale $i$ that has a net.
- $8^{i'} < \log^{\theta} n\cdot 8^{i}$, so we lose a factor of $\log^{\theta} n$ in the distortion.
- Consider scales $i-\theta\log\log n,\dots,i$.

Figure: a scale axis marking $i-\theta\log\log n$, $i$, and $i+\theta\log\log n$, with nets spaced $\theta\log\log n$ scales apart.

## Using Several Scales

- Take $u,v$ in the net with $d(u,v)\approx 8^i$.
- A success in one of these scales gives contribution $> 8^{i-\theta\log\log n} = 8^i/\log^{\theta} n$.
- The success for $u,v$ in each scale is:
  - Unaffected by events in higher scales.
  - Independent of events far away in the same scale.
  - Oblivious to events in lower scales.
- Probability that all scales failed $< (7/8)^{\theta\log\log n}$.
- Take only $D=O((\log \lambda)/\theta)$ coordinates.
- Lose a factor of $\log^{\theta} n$ in the distortion.
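Assuming logarithms are base 2 (the slides do not fix the base), the bound on the probability that all scales fail is a negative power of $\log n$. Up to a constant in the exponent, which can be absorbed by rescaling $\theta$, this is a per-coordinate failure probability of the form $\log^{-\theta} n$:

```latex
\[
\left(\tfrac{7}{8}\right)^{\theta\log\log n}
  = 2^{-\theta \log_2(8/7)\,\log\log n}
  = (\log n)^{-\theta\log_2(8/7)} .
\]
```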

## Constant Average Distortion

- Scaling distortion: for every $0<\varepsilon<1$, at most an $\varepsilon$-fraction of the pairs have distortion $> \mathrm{polylog}(1/\varepsilon)$.
- Upper bound of $\log(1/\varepsilon)$, by standard techniques.
- Lower bound:
  - Define a net for any scale $i>0$ and $\varepsilon=\exp\{-8^j\}$.
  - Every pair $(x,y)$ needs contribution that depends on:
    - $d(x,y)$.
    - The $\varepsilon$-value of $x,y$.
  - Sieve the nets to avoid dependencies between different scales and different values of $\varepsilon$.
  - Show that if a net pair succeeded, the points near it will also succeed.
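To see why a scaling bound gives constant average distortion, here is a sketch under an assumed form of the bound: for every $0<\varepsilon<1$, all but an $\varepsilon$-fraction of the pairs have distortion at most $C\ln^{k}(1/\varepsilon)$, where $C$ and $k$ are fixed constants (hypothetical names, not from the slides). Sort the pairs by decreasing distortion and write $\mathrm{dist}(t)$ for the distortion at quantile $t$; the assumption gives $\mathrm{dist}(t)\le C\ln^{k}(1/t)$, and so:

```latex
\[
\text{average distortion}
  = \int_0^1 \mathrm{dist}(t)\,dt
  \le C\int_0^1 \ln^{k}(1/t)\,dt
  = C\int_0^\infty s^{k} e^{-s}\,ds
  = C\,\Gamma(k+1) = O(1).
\]
```

The middle equality is the substitution $s=\ln(1/t)$. The integral converges because the polylogarithmic blow-up near $t=0$ affects only a vanishing fraction of the pairs.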

## Constant Average Distortion

- Lower bound, continued:
  - The local lemma graph depends on $\varepsilon$, so use the general case of the local lemma.
  - For a net pair $(u,v)$ in scale $8^i$, consider scales $8^{i-\log\log(1/\varepsilon)},\dots,8^{i-\log\log(1/\varepsilon)/2}$.
  - Requires dimension $O(\log \lambda\cdot\log\log \lambda)$.
  - The net depends on $\lambda$.

## Distortion-Dimension Tradeoff

- The distortion and the dimension both depend on a parameter $D \le (\log n)/\log \lambda$.
- Instead of assigning all scales to a single coordinate:
  - For each point $x$: divide the scales into $D$ bunches of coordinates.
  - Create a hierarchical partition.
- The upper bound needs the scales of $x,y$ to be in the same coordinates.

## Conclusion

- Main result:
  - Embedding metrics into their intrinsic dimension.
- Open problem:
  - Best distortion in dimension $O(\log \lambda)$.
- Dimension reduction in $L_2$:
  - For a doubling subset of $L_2$, is there an embedding into $L_2$ with $O(1)$ distortion and dimension $O(\dim(X))$?
  - For $p>2$ there is a doubling metric space requiring dimension at least $\Omega(\log n)$ for embedding into $L_p$ with distortion $O(\log^{1/p} n)$.
