Log-Sobolev Inequality on the Multislice (and what those words mean)


1 Log-Sobolev Inequality on the Multislice (and what those words mean)
Ryan O’Donnell (Carnegie Mellon), Yuval Filmus (Technion), Xinyu Wu (Carnegie Mellon)

2 Random walks on (regular) graphs
Boolean Cube: V = {0,1}^n, Hamming-distance-1 edges
Hamming “Slice”: Boolean strings of Hamming weight k; edge when strings differ by a transposition
“Multislice”: e.g., ternary strings with exactly k_1 1’s, k_2 2’s, and k_3 3’s; edge when strings differ by a transposition
Symmetric group: V = S_n, transposition edges
Also: Grassmann graph, association schemes, polar spaces, …
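To make “edge when strings differ by a transposition” concrete, here is a minimal Python sketch (illustrative only, not from the talk; the helper name multislice_step is made up): one step of the walk swaps the symbols at two uniformly random positions, which preserves the symbol counts, so the walk stays on the multislice. (Allowing i = j gives a lazy self-loop; whether to allow it is a normalization choice.)

```python
import random

def multislice_step(s):
    """One step of the transposition walk: swap the symbols at two
    positions chosen uniformly at random. Symbol counts are preserved,
    so the walk stays on the same multislice."""
    s = list(s)
    i, j = random.randrange(len(s)), random.randrange(len(s))
    s[i], s[j] = s[j], s[i]
    return tuple(s)

# A point of the ternary multislice with k_1 = k_2 = k_3 = 2:
u = (1, 1, 2, 2, 3, 3)
v = multislice_step(u)
assert sorted(v) == sorted(u)  # same symbol counts, still on the multislice
```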

3 Log-Sobolev inequalities
Related to the mixing time of the random walk.
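In one standard form (due to Diaconis and Saloff-Coste; stated loosely here, since the constants depend on normalization conventions): a reversible chain with log-Sobolev constant ρ, spectral gap λ, and stationary distribution π mixes in time

```latex
t_{\mathrm{mix}} \;=\; O\!\Big(\tfrac{1}{\rho}\,\log\log\tfrac{1}{\pi_{\min}} \;+\; \tfrac{1}{\lambda}\Big),
```

versus t_mix = O((1/λ)·log(1/π_min)) from the spectral gap alone: the log-Sobolev constant turns the log into a log log.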

4 Conductance / Expansion
Let S ⊆ {0,1}^n be a starting set. Pick u ~ S at random. Take one step from u, to v. Ask: did we escape from S?
Φ(S) = Pr_{u ~ S, v ~ u} [v ∉ S]   (conductance / expansion / boundary size)

5 Examples
Φ(S) = Pr_{u ~ S, v ~ u} [v ∉ S]
S = { u ∈ {0,1}^n : u_1 = 1 }: Φ(S) = 1/n
S = { u : HamWeight(u) > n/2 }: Φ(S) = Θ(1/√n)
S = { u : HamWeight(u) is odd }: Φ(S) = 1
These S are all large: vol(S) = Pr[u ∈ S | u uniform] = 1/2
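These values lend themselves to a quick Monte Carlo check. A minimal Python sketch (mine, not the talk’s; the helper name phi_estimate is made up) that reproduces Φ(S) = 1/n for the dictator set and Φ(S) = 1 for the odd-weight set:

```python
import random

def phi_estimate(n, in_S, trials=100_000):
    """Estimate Phi(S) = Pr[v not in S], where u ~ Unif(S) and v is
    obtained from u by flipping one uniformly random coordinate."""
    escapes = 0
    done = 0
    while done < trials:
        u = [random.randrange(2) for _ in range(n)]
        if not in_S(u):            # rejection-sample u ~ Unif(S)
            continue
        u[random.randrange(n)] ^= 1   # one step of the cube walk
        escapes += not in_S(u)
        done += 1
    return escapes / trials

n = 20
print(phi_estimate(n, lambda u: u[0] == 1))        # ~ 1/n = 0.05
print(phi_estimate(n, lambda u: sum(u) % 2 == 1))  # = 1 exactly
```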

6 Examples
Φ(S) = Pr_{u ~ S, v ~ u} [v ∉ S]
Some S with exponentially small vol(S):
S = { 111∙∙∙1 }: Φ(S) = 1
S = { u : HamWeight(u) > (3/4)n }: Φ(S) ≈ 3/4
S = { u : u_1 = u_2 = ∙∙∙ = u_{n/2} = 1 }: Φ(S) = 1/2
“Small Set Expansion” phenomenon in the Boolean cube.

7 Isoperimetric Problem
Among all S of fixed vol(S), how small can Φ(S) be?
Ancient combinatorics on the Boolean cube: the exact minimizer S for every vol(S) is known.
Log-Sobolev inequalities take a more analytic approach; they’re the only known way to answer: “Among all S of fixed vol(S), how small can Φ_t(S) be?”
Φ_t(S) = Pr_{u ~ S, v ~ (t steps from u)} [v ∉ S]
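The same Monte Carlo sketch from before extends to Φ_t by walking t steps before asking whether we escaped (again illustrative, with a made-up helper name):

```python
import random

def phi_t_estimate(n, in_S, t, trials=100_000):
    """Estimate Phi_t(S) = Pr[v not in S], where u ~ Unif(S) and v is
    the endpoint of a t-step random walk on the cube started at u."""
    escapes = 0
    done = 0
    while done < trials:
        u = [random.randrange(2) for _ in range(n)]
        if not in_S(u):            # rejection-sample u ~ Unif(S)
            continue
        for _ in range(t):
            u[random.randrange(n)] ^= 1   # t steps of the cube walk
        escapes += not in_S(u)
        done += 1
    return escapes / trials
```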

8 A related question
If we pick u ~ S and do a t-step random walk to v, how close is v’s distribution to the uniform distribution?
u = u_0 → u_1 → u_2 → ∙∙∙ → u_{t−1} → v = u_t
u_0: uniform on S
u_1: a slightly less ‘spiky’ distribution
u_2: an even less ‘spiky’ distribution
∙∙∙
u_{t−1}: a pretty ‘smooth/flat’ distribution (?)
u_t: an even ‘smoother/flatter’ distribution
u_∞: the uniform distribution

9 Helpful intuition
Pretend each distribution is uniform on a subset.
(Same chain as on the previous slide: u_0 uniform on S, then successively less ‘spiky’ distributions, approaching u_∞ = the uniform distribution.)

10 Helpful intuition
Pretend each distribution is uniform on a subset:
u_0: uniform on S_0 = S
u_1: “uniform on neighbors S_1 of S_0”
u_2: “uniform on neighbors S_2 of S_1”
∙∙∙
u_∞: uniform on all of {0,1}^n
“Small Set Expansion” idea: Φ(S_i) is always “large” so long as S_i is “small” ⇒ the walk mixes quickly, at least at the beginning.

11 Helpful intuition
Pretend each distribution is uniform on a subset: S_0 = S, then neighbors S_1, S_2, ….
“Small Set Expansion” idea: Φ(S_i) is always “large” so long as S_i is “small” ⇒ the walk mixes quickly, at least at the beginning.
But if S_i reaches, say, { u : u_1 = 1 }, it will take Θ(n) steps to make more progress.

12 This intuition is well captured by “Log-Sobolev inequalities”.
We’ll need to quantify “distance of a distribution from uniform”. There are zillions of “distances” for probability distributions: total variation, Hellinger, KL divergence, χ²-distance, L^p-distances… We’ll get to these. But first, an easier cousin of log-Sobolev inequalities…

13 Expansion vs. volume via eigenvalues
Poincaré Inequality (spelled out in full below). It implies: Φ(S) ≥ (2/n)·(1 − vol(S)). Hence: vol(S) ≤ 1/2 ⇒ Φ(S) ≥ 1/n. Better than nothing, but doesn’t capture “Small Set Expansion”.

14 Expansion vs. volume via eigenvalues
Poincaré Inequality. It implies: Φ(S) ≥ (2/n)·(1 − vol(S)). More generally, it is a statement about any probability distribution p on {0,1}^n (written out below), and it implies the set statement above by taking p = Unif_S.

15 Expansion vs. volume via eigenvalues
Poincaré Inequality, for any probability distribution p on {0,1}^n. Exactly equivalent to: “the second-eigenvalue gap of the Boolean cube’s random-walk matrix is ≥ 2/n.”

16 Expansion vs. volume via eigenvalues
Poincaré Inequality, for any probability distribution p on {0,1}^n: an average “local” L²-distance², taken across edges, controls the global L²-distance² of p from uniformity.
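The displayed inequality did not survive transcription. In one standard normalization (chosen here so that the corollaries quoted on these slides come out exactly; the talk’s rendering may differ), with f(u) = 2^n·p(u) the density of p, u uniform on {0,1}^n, i uniform on [n], and u^⊕i denoting u with bit i flipped:

```latex
\frac{2}{n}\,\operatorname{Var}_{u}\big[f\big]
\;\le\;
\frac{1}{2}\;\mathop{\mathbb{E}}_{u,\,i}\Big[\big(f(u)-f(u^{\oplus i})\big)^{2}\Big].
```

Taking p = Unif_S, so f = 1_S/vol(S): the left side is (2/n)·(1 − vol(S))/vol(S) and the right side is Φ(S)/vol(S), recovering Φ(S) ≥ (2/n)·(1 − vol(S)).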

17 Log-Sobolev Inequality
[Gross ’75] It implies: Φ(S) ≥ ½·(2/n)·ln(1/vol(S)). So again: for vol(S) ≈ 1/2, we only get Φ(S) ≥ Ω(1/n). But for vol(S) = 2^−Θ(n), you get Φ(S) ≥ Ω(1)! Small Set Expansion! This inequality is sharp (up to Θ(1)) for all values of vol(S).

18 Log-Sobolev Inequality
[Gross ’75] It implies: Φ(S) ≥ ½·(2/n)·ln(1/vol(S)). More generally, it is a statement comparing, for any probability distribution p on {0,1}^n, a global distance of p from uniformity to an average local Hellinger²-distance.
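In the same normalization as the Poincaré reconstruction above (again a hedged reconstruction, not the talk’s exact rendering), with f(u) = 2^n·p(u) the density of p, the log-Sobolev inequality reads:

```latex
\frac{2}{n}\,\operatorname{Ent}\big[f\big]
\;\le\;
\mathop{\mathbb{E}}_{u,\,i}\Big[\big(\sqrt{f(u)}-\sqrt{f(u^{\oplus i})}\big)^{2}\Big],
\qquad
\operatorname{Ent}[f] \;=\; \mathop{\mathbb{E}}_{u}\big[f\log f\big]
\;=\; \mathrm{KL}\big(p \,\big\|\, \mathrm{Unif}\big).
```

The right side averages the squared Hellinger-type difference (√f(u) − √f(v))² over edges; taking p = Unif_S gives Ent[f] = ln(1/vol(S)) and right side 2·Φ(S), recovering Φ(S) ≥ ½·(2/n)·ln(1/vol(S)).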

19 Log-Sobolev Inequality
[Gross ’75] It implies: Φ(S) ≥ ½·(2/n)·ln(1/vol(S)). More generally, it is the statement above for any probability distribution p on {0,1}^n; it implies the set statement by taking p = Unif_S.

20 Log-Sobolev Inequality
What else is great: it “tensorizes”. I.e., it behaves beautifully under taking product graphs. The Hamming cube is the n-fold product of a single-edge graph.

21 Log-Sobolev Inequality
What else is great: it “tensorizes”. I.e., it behaves beautifully under taking product graphs. The Hamming cube is the n-fold product of a single-edge graph. The 2/n here is the “log-Sobolev constant” for the Hamming cube.

22 Log-Sobolev Inequality
What else is great: it “tensorizes”. I.e., it behaves beautifully under taking product graphs. The Hamming cube is the n-fold product of a single-edge graph. The 2/n here is the “log-Sobolev constant” for the Hamming cube.
Tensorization property ⇒ (log-Sobolev constant for {0,1}^n) = (1/n) × (log-Sobolev constant for a single edge).

23 Log-Sobolev Inequality
The log-Sobolev constant for a single-edge graph is 2; this is a simple 1-variable inequality.
Tensorization property ⇒ the log-Sobolev constant for {0,1}^n is (1/n) × 2 = 2/n.
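Spelled out (this is the classical sharp two-point log-Sobolev inequality, written in a normalization consistent with the “constant 2” bookkeeping above), with expectations over a uniform point of {0,1}:

```latex
\operatorname{Ent}\big[f^{2}\big] \;\le\; \tfrac{1}{2}\,\big(f(0)-f(1)\big)^{2},
\qquad
\operatorname{Ent}\big[f^{2}\big] := \mathbb{E}\big[f^{2}\log f^{2}\big] - \mathbb{E}\big[f^{2}\big]\log\mathbb{E}\big[f^{2}\big].
```

It is sharp in the limit f → constant: for f = 1 ± ε, both sides are 2ε² + O(ε³).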

24 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
[Gross ’75] Let p be a probability distribution on {0,1}^n. Let q be the final distribution of: “draw u ~ p, walk for ≈ t steps.” (Technically: walk for T ~ Poisson(t) steps.) Then
(avg_u (q(u) − 2^−n)²)^{1/2} ≤ (avg_u |p(u) − 2^−n|^{1+c})^{1/(1+c)},
where c = exp(−2·(2/n)·t) < 1. E.g., if t = .1n then c = exp(−.4) ≈ .67, so 1+c ≈ 1.67.
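This statement can be checked numerically on a small cube. A sketch (mine, not the talk’s): it builds the one-step walk matrix K on {0,1}^n, uses the fact that taking Poisson(t) steps of K is the same as applying the matrix exponential e^{t(K−I)}, and compares the two sides for a point mass, the “spikiest” starting distribution.

```python
import numpy as np
from scipy.linalg import expm

n = 4
N = 2 ** n
# One-step random-walk matrix on {0,1}^n: flip one uniform coordinate.
K = np.zeros((N, N))
for u in range(N):
    for i in range(n):
        K[u, u ^ (1 << i)] = 1 / n

t = 0.1 * n
c = np.exp(-2 * (2 / n) * t)            # c = exp(-0.4) ~ 0.67

p = np.zeros(N); p[0] = 1.0             # a single point mass
q = p @ expm(t * (K - np.eye(N)))       # Poisson(t)-step walk from p

lhs = np.mean((q - 1 / N) ** 2) ** 0.5                        # avg-L2 dist
rhs = np.mean(np.abs(p - 1 / N) ** (1 + c)) ** (1 / (1 + c))  # avg-L(1+c)
print(lhs <= rhs, lhs, rhs)             # the inequality should hold
```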

25 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
[Gross ’75] Let p be a probability distribution on {0,1}^n. Let q be the final distribution of: “draw u ~ p, walk for ≈ .1n steps.” Then
(avg_u (q(u) − 2^−n)²)^{1/2} ≤ (avg_u |p(u) − 2^−n|^{1.67})^{1/1.67}:
an (average) L²-distance to uniformity on the left, an (average) L^1.67-distance to uniformity on the right.

26 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
[Gross ’75] Note avg-L²-distance(p) ≥ avg-L^{1+c}-distance(p) always; e.g., for p = a single point mass, ≈2^{−n/2} vs. ≈2^{−n}. Let p be a probability distribution on {0,1}^n, and q the final distribution of: “draw u ~ p, walk for ≈ .1n steps.” Then
(avg_u (q(u) − 2^−n)²)^{1/2} ≤ (avg_u |p(u) − 2^−n|^{1.67})^{1/1.67}:
an (average) L²-distance to uniformity vs. an (average) L^1.67-distance to uniformity.

27 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
[Gross ’75] Let p be a probability distribution on {0,1}^n. Let q be the final distribution of: “draw u ~ p, walk for ≈ t steps.” Then
(avg_u (q(u) − 2^−n)²)^{1/2} ≤ (avg_u |p(u) − 2^−n|^{1+c})^{1/(1+c)},
where c = exp(−2·(2/n)·t) < 1. If you let p = Unif_S you get…

28 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
“Φ_{ϵn}”(S) ≥ 1 − vol(S)^{ϵ/(1−ϵ)}: very strong “Small Set Expansion” in the “noisy hypercube”!
Here “Φ_{ϵn}”(S) = Pr_{u ~ S, v = Noise_ϵ(u)} [v ∉ S], where Noise_ϵ(u) = flip each coordinate of u independently with probability ϵ.
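For completeness, a standard two-line derivation of this corollary from hypercontractivity (a sketch in the usual Fourier-analytic notation, which the talk doesn’t spell out: T_ρ is the noise operator with ρ = 1 − 2ϵ, and the norms and inner product are over the uniform measure):

```latex
\Pr[u\in S,\ v\in S]
\;=\; \langle 1_S,\, T_{\rho}\,1_S\rangle
\;=\; \big\|T_{\sqrt{\rho}}\,1_S\big\|_2^{2}
\;\le\; \big\|1_S\big\|_{1+\rho}^{2}
\;=\; \mathrm{vol}(S)^{2/(1+\rho)}.
```

Dividing by Pr[u ∈ S] = vol(S) gives Pr[v ∈ S | u ~ S] ≤ vol(S)^{(1−ρ)/(1+ρ)} = vol(S)^{ϵ/(1−ϵ)}, which is the stated bound.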

29 Log-Sobolev Inequality ⇔ Hypercontractive Inequality
“Φ_{ϵn}”(S) ≥ 1 − vol(S)^{ϵ/(1−ϵ)}: very strong “Small Set Expansion” in the “noisy hypercube”!
Zillions of applications: the KKL Theorem, Friedgut’s Junta Theorem, robust Kruskal−Katona, weak-learning monotone functions, sharp threshold phenomena, optimal Unique-Games-hardness results, …

30 Summary so far
Log-Sobolev inequalities are cool:
They imply “small set expansion” for 1-step walks.
They imply hypercontractive inequalities, which imply “small set expansion” for long walks and have many, many other applications.
They’re easy to prove for product graphs / Markov chains.

31 Random walks on (regular) graphs
Boolean Cube (the only product graph here): V = {0,1}^n, Hamming-distance-1 edges
Hamming “Slice”: Boolean strings of Hamming weight k; edge when strings differ by a transposition
“Multislice”: e.g., ternary strings with exactly k_1 1’s, k_2 2’s, and k_3 3’s; edge when strings differ by a transposition
Symmetric group: V = S_n, transposition edges
Also: Grassmann graph, association schemes, polar spaces, …

32 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: [Diaconis−Saloff-Coste ’96]: log-Sobolev constant is Θ(1/(n log n)). The log n ruins everything. ☹ No good small set expansion, no good hypercontractivity, no KKL or other cool applications…
“Multislice”:
Symmetric group:

33 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: Boolean strings of Hamming weight k; edge when strings differ by a transposition
“Multislice”:
Symmetric group:

34 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: Boolean strings with k_0 0’s and k_1 1’s; edge when strings differ by a transposition. [T.-Y. Lee and H.-T. Yau ’98]: log-Sobolev constant is Θ(1/n) provided k_0/n and k_1/n are Ω(1). Great! These slices enjoy all the same SSE and applications as the Boolean Cube!
“Multislice”:
Symmetric group:

35 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: log-Sobolev constant Θ(1/n) provided each symbol is used Ω(n) times ☺
“Multislice”:
Symmetric group: log-Sobolev constant Θ(1/(n log n)) ☹

36 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: log-Sobolev constant Θ(1/n) provided each symbol is used Ω(n) times ☺
“Multislice”: e.g., ternary strings with exactly k_1 1’s, k_2 2’s, and k_3 3’s; edge when strings differ by a transposition
Symmetric group: log-Sobolev constant Θ(1/(n log n)) ☹

37 Random walks on (regular) graphs
Boolean Cube: log-Sobolev constant 2/n
Hamming “Slice”: log-Sobolev constant Θ(1/n) provided each symbol is used Ω(n) times ☺
“Multislice”: log-Sobolev constant Θ(1/n) provided each symbol is used Ω(n) times and there are O(1) symbols [Filmus−O’Donnell−Wu ’18] ☺
Symmetric group: log-Sobolev constant Θ(1/(n log n)) ☹

38 How did Lee−Yau do it for the Slice?
And why can’t you do the same for the Multislice? The ingredients: the chain rule for KL-divergence… averaging over a random coordinate… an induction… In the Slice, there is only one kind of step: swapping a 0 and a 1. Even in the ternary Multislice, there are multiple kinds of steps: swapping 1 & 2, swapping 1 & 3, swapping 2 & 3.
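The chain rule in question is the standard identity for a pair (X, Y):

```latex
\mathrm{KL}\big(p_{XY}\,\big\|\,q_{XY}\big)
\;=\;
\mathrm{KL}\big(p_{X}\,\big\|\,q_{X}\big)
\;+\;
\mathop{\mathbb{E}}_{x\sim p_X}\Big[\mathrm{KL}\big(p_{Y\mid X=x}\,\big\|\,q_{Y\mid X=x}\big)\Big].
```

Roughly speaking, one conditions on the symbol in a random coordinate and inducts on the rest of the string, which (in the Slice) is again a slice, with only the one kind of transposition step.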

39 How did Lee−Yau do it for the Slice?
And why can’t you do the same for the Multislice? Chain rule for KL-divergence… averaging over a random coordinate… an induction… The induction becomes much more complicated. Doesn’t look like it’s going to work. But then one of the coauthors just makes it work. ☺

40 Open Directions
Log-Sobolev inequalities (and hypercontractivity, and Small Set Expansion) for more interesting Markov chains! And if they turn out badly, try to classify the small sets for which SSE fails!

41 The End - Thanks!

