Log-Sobolev Inequality on the Multislice (and what those words mean)

Presentation transcript:

Log-Sobolev Inequality on the Multislice (and what those words mean). Ryan O’Donnell (Carnegie Mellon), Yuval Filmus (Technion), Xinyu Wu (Carnegie Mellon).

Random walks on (regular) graphs. Boolean Cube: V = {0,1}^n, Hamming-distance-1 edges. Hamming “Slice”: Boolean strings of Hamming weight k; edge when two strings differ by a transposition. “Multislice”: e.g., ternary strings with exactly k_1 1’s, k_2 2’s, and k_3 3’s; edge when two strings differ by a transposition. Symmetric group: V = S_n, transposition edges. Also: Grassmann graph, association schemes, polar spaces, …
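As an illustration (not from the talk), here is a minimal Python sketch of a small “Multislice” and its transposition edges; the helper names multislice and transposition_neighbors are made up for this example.

from itertools import permutations

def multislice(counts):
    # All strings using symbol s exactly counts[s] times, e.g. {1: 2, 2: 1, 3: 1}.
    base = [s for s, k in counts.items() for _ in range(k)]
    return sorted(set(permutations(base)))

def transposition_neighbors(u):
    # Strings reachable from u by swapping two coordinates holding different symbols.
    out = set()
    for i in range(len(u)):
        for j in range(i + 1, len(u)):
            if u[i] != u[j]:
                v = list(u)
                v[i], v[j] = v[j], v[i]
                out.add(tuple(v))
    return out

verts = multislice({1: 2, 2: 1, 3: 1})   # ternary strings with two 1's, one 2, one 3
print(len(verts), "vertices;", len(transposition_neighbors(verts[0])), "neighbors of", verts[0])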

Log-Sobolev inequalities: Related to mixing time for random walk

Conductance / Expansion. Let S ⊆ {0,1}^n be a starting set. Pick u ~ S at random and take one step from u to v. Ask: did we escape from S? Φ(S) = Pr_{u ~ S, v ~ u}[v ∉ S] (conductance / expansion / boundary size).

Examples. Φ(S) = Pr_{u ~ S, v ~ u}[v ∉ S]. S = { u ∈ {0,1}^n : u_1 = 1 }: Φ(S) = 1/n. S = { u : HamWeight(u) > n/2 }: Φ(S) = Θ(1/√n). S = { u : HamWeight(u) is odd }: Φ(S) = 1. These S are all large: vol(S) = Pr[u ∈ S | u uniform] = 1/2.

Examples, continued: some S with exponentially small vol(S). S = { 111∙∙∙1 }: Φ(S) = 1. S = { u : HamWeight(u) > (3/4)n }: Φ(S) ≈ 3/4. S = { u : u_1 = u_2 = ∙∙∙ = u_{n/2} = 1 }: Φ(S) = 1/2. This is the “Small Set Expansion” phenomenon in the Boolean cube.
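As a sanity check (not from the talk), here is a small Python computation of a few of these conductance values, done exactly on {0,1}^n for a modest n (n = 12 is just an arbitrary choice):

def conductance(S, n):
    # Phi(S): average, over u in S, of the fraction of u's n neighbors that lie outside S.
    S = set(S)
    return sum(sum((u ^ (1 << i)) not in S for i in range(n)) / n for u in S) / len(S)

n = 12
cube = range(2 ** n)
dictator = [u for u in cube if u & 1]                       # u_1 = 1
parity   = [u for u in cube if bin(u).count("1") % 2 == 1]  # odd Hamming weight
subcube  = [u for u in cube if u % (1 << (n // 2)) == (1 << (n // 2)) - 1]  # u_1 = ... = u_{n/2} = 1
allones  = [2 ** n - 1]                                     # the single string 11...1
print(conductance(dictator, n), "vs 1/n =", 1 / n)
print(conductance(parity, n), "vs 1")
print(conductance(subcube, n), "vs 1/2")
print(conductance(allones, n), "vs 1")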

Isoperimetric Problem: Among all S of fixed vol(S), how small can Φ(S) be? Ancient combinatorics on the Boolean cube: the exact minimizer S for every vol(S) is known. Log-Sobolev inequalities take a more analytic approach, and they are the only known way to answer: “Among all S of fixed vol(S), how small can Φ_t(S) be?”, where Φ_t(S) = Pr_{u ~ S, v ~ t steps from u}[v ∉ S].

A related question: If we pick u ~ S and do a t-step random walk to v, how close is v’s distribution to the uniform distribution? Picture the chain u = u_0, u_1, u_2, u_3, ∙∙∙, u_{t−1}, u_t = v: u_0 is uniform on S, u_1 has a slightly less ‘spiky’ distribution, u_2 an even less ‘spiky’ one, ∙∙∙, u_{t−1} a pretty ‘smooth/flat’ distribution (?), u_t an even ‘smoother/flatter’ one, and u_∞ has the uniform distribution.

Helpful intuition: Pretend each distribution in this chain is uniform on a subset.

Under this intuition, u_0 is uniform on S_0 = S, u_1 is “uniform on the neighbors S_1 of S_0”, u_2 is “uniform on the neighbors S_2 of S_1”, …, and u_∞ is uniform on all of {0,1}^n. “Small Set Expansion” idea: Φ(S_i) is always “large” so long as S_i is “small” ⇒ the walk mixes quickly, at least at the beginning.

But if S_i reaches, say, { u : u_1 = 1 }, the walk will take Θ(n) further steps to make more progress.
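To make the picture concrete (not from the talk), here is a Python sketch on a small cube; it assumes a lazy walk (stay put with probability 1/2, else flip a random coordinate) so that the walk actually converges, and prints the total-variation distance of u_t’s distribution from uniform:

import numpy as np

n = 8
N = 2 ** n
P = np.zeros((N, N))                 # lazy walk: w.p. 1/2 stay, else flip a random coordinate
for u in range(N):
    P[u, u] = 0.5
    for i in range(n):
        P[u, u ^ (1 << i)] = 0.5 / n

S = [u for u in range(N) if (u & 3) == 3]      # u_1 = u_2 = 1, so vol(S) = 1/4
p = np.zeros(N); p[S] = 1 / len(S)             # u_0 is uniform on S
uniform = np.full(N, 1 / N)
for t in range(33):
    if t % 8 == 0:
        print(t, round(0.5 * np.abs(p - uniform).sum(), 4))   # TV distance to uniform
    p = p @ P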

This intuition is well captured by “Log-Sobolev inequalities”. We’ll need to quantify “distance of a distribution from uniform”. There are zillions of “distances” for probability distributions: total variation, Hellinger, KL divergence, χ²-distance, L^p-distances… We’ll get to these. But first, an easier cousin of Log-Sobolev inequalities…
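As a tiny illustration (not from the talk), here are a few of these distances between a distribution p and the uniform distribution on a 16-point set, in Python (the particular p is just an arbitrary choice):

import numpy as np

N = 16
p = np.zeros(N); p[:4] = 1 / 4                        # uniform on a 4-element subset
u = np.full(N, 1 / N)
tv   = 0.5 * np.abs(p - u).sum()                      # total variation distance
hel2 = 0.5 * ((np.sqrt(p) - np.sqrt(u)) ** 2).sum()   # squared Hellinger distance
chi2 = ((p - u) ** 2 / u).sum()                       # chi-squared distance
mask = p > 0
kl   = (p[mask] * np.log(p[mask] / u[mask])).sum()    # KL divergence
print(tv, hel2, chi2, kl)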

Expansion vs. volume via eigenvalues. Poincaré Inequality: it implies Φ(S) ≥ (2/n)(1−vol(S)). Hence vol(S) ≤ 1/2 ⇒ Φ(S) ≥ 1/n. Better than nothing, but it doesn’t capture “Small Set Expansion”.

More generally, the Poincaré Inequality is a statement about an arbitrary probability distribution p on {0,1}^n, and it implies the set statement above by taking p = Unif_S.

The Poincaré Inequality for distributions on {0,1}^n is exactly equivalent to: “the second-eigenvalue gap of the Boolean cube’s random walk matrix is ≥ 2/n.”

Poincaré Inequality: for any probability distribution p on {0,1}^n, (average “local” L²-distance)² ≥ (2/n) · (global L²-distance of p from uniformity)².
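Here is a quick numerical check (not from the slides) in Python that the second-eigenvalue gap of the cube’s random walk matrix is exactly 2/n, for a small n:

import numpy as np

n = 6
N = 2 ** n
P = np.zeros((N, N))               # one step: flip a uniformly random coordinate
for u in range(N):
    for i in range(n):
        P[u, u ^ (1 << i)] = 1 / n
eigs = np.sort(np.linalg.eigvalsh(P))[::-1]   # P is symmetric, so eigvalsh applies
print("second-eigenvalue gap =", 1 - eigs[1], "  2/n =", 2 / n)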

Log-Sobolev Inequality [Gross’75]. It implies: Φ(S) ≥ ½·(2/n)·ln(1/vol(S)). So again, for vol(S) ≈ 1/2 this only gives Φ(S) ≥ Ω(1/n), but for vol(S) = 2^−Θ(n) you get Φ(S) ≥ Ω(1)! Small Set Expansion! This inequality is sharp (up to Θ(1)) for all values of vol(S).
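A quick check of this bound (not from the slides), in Python, against the subcubes S = { u : u_1 = ∙∙∙ = u_k = 1 }, for which Φ(S) = k/n exactly and vol(S) = 2^−k:

import math

n = 100
for k in (1, 10, 50, 75):
    phi   = k / n                              # exact conductance of the subcube
    bound = 0.5 * (2 / n) * k * math.log(2)    # (1/2)(2/n) ln(1/vol(S)) with vol(S) = 2^-k
    print(k, phi, round(bound, 3), phi >= bound)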

More generally, the Log-Sobolev Inequality is the following statement: for any probability distribution p on {0,1}^n, the average local Hellinger²-distance is at least (2/n) times a global distance of p from uniformity.

Taking p = Unif_S in this general statement yields the set statement Φ(S) ≥ ½·(2/n)·ln(1/vol(S)) above.

What else is great about the Log-Sobolev Inequality: it “tensorizes”, i.e., it behaves beautifully under taking product graphs. The Hamming cube is the n-fold product of a single-edge graph.

The factor 2/n appearing in the inequality is the “log-Sobolev constant” of the Hamming cube.

The tensorization property ⇒ the log-Sobolev constant for {0,1}^n is (1/n) × (the log-Sobolev constant for the single-edge graph).

The log-Sobolev constant for a single-edge graph is 2; verifying this is a simple 1-variable inequality. With tensorization, this gives the log-Sobolev constant 2/n for {0,1}^n.

Log-Sobolev Inequality ⇔ Hypercontractive Inequality [Gross’75]. Let p be a probability distribution on {0,1}^n, and let q be the final distribution of: “draw u ~ p, walk for ≈ t steps” (technically: walk for T ~ Poisson(t) steps). Then avg_u (q(u) − 2^−n)² ≤ avg_u |p(u) − 2^−n|^{1+c}, where c = exp(−2·(2/n)·t) < 1. E.g., if t = .1n then c = exp(−.4) ≈ .67, so 1+c ≈ 1.67.

In the t = .1n case: avg_u (q(u) − 2^−n)² ≤ avg_u |p(u) − 2^−n|^{1.67}, i.e., an (average) L²-distance of q to uniformity is bounded by an (average) L^{1.67}-distance of p to uniformity.

Note that avg-L²-distance(p) ≥ avg-L^{1+c}-distance(p) always; e.g., for p a single point mass these are ~2^{−n/2} vs. ~2^{−n}. So bounding q’s L²-distance by p’s smaller L^{1+c}-distance is a genuine gain.
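Here is a numerical sanity check of the displayed inequality (not from the slides), in Python on a small cube; the Poisson(t)-step walk is implemented as the matrix exponential exp(t(P − I)), and p is taken to be a point mass as in the note above:

import numpy as np

n, t = 6, 0.6                                   # t = 0.1 n
N = 2 ** n
P = np.zeros((N, N))                            # one step: flip a uniformly random coordinate
for u in range(N):
    for i in range(n):
        P[u, u ^ (1 << i)] = 1 / n

c = np.exp(-2 * (2 / n) * t)                    # ≈ 0.67
lam, V = np.linalg.eigh(P)
Walk = V @ np.diag(np.exp(t * (lam - 1))) @ V.T # exp(t(P − I)): the Poisson(t)-step walk

p = np.zeros(N); p[0] = 1.0                     # a single point mass
q = p @ Walk
lhs = np.mean((q - 1 / N) ** 2)
rhs = np.mean(np.abs(p - 1 / N) ** (1 + c))
print(lhs, "<=", rhs, ":", lhs <= rhs)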

Back to the general statement, with c = exp(−2·(2/n)·t): if you let p = Unif_S you get…

Log-Sobolev Inequality ⇔ Hypercontractive Inequality ⇒ “Φ_{ϵn}”(S) ≥ 1 − vol(S)^{ϵ/(1−ϵ)}: very strong “Small Set Expansion” in the “noisy hypercube”! Here “Φ_{ϵn}”(S) = Pr_{u ~ S, v = Noise_ϵ(u)}[v ∉ S], where Noise_ϵ(u) flips each coordinate independently with probability ϵ.
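A quick check of this bound (not from the slides), in Python, on the subcubes S = { u : u_1 = ∙∙∙ = u_k = 1 }: here vol(S) = 2^−k and, exactly, Φ_{ϵn}(S) = 1 − (1−ϵ)^k, since v stays in S iff none of the first k coordinates flips:

eps = 0.2
for k in (1, 5, 20, 60):
    phi   = 1 - (1 - eps) ** k
    bound = 1 - (0.5 ** k) ** (eps / (1 - eps))   # 1 - vol(S)^{eps/(1-eps)}
    print(k, round(phi, 4), round(bound, 4), phi >= bound)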

This in turn has zillions of applications: the KKL Theorem, Friedgut’s Junta Theorem, robust Kruskal−Katona, weak learning of monotone functions, sharp threshold phenomena, optimal Unique Games-hardness results, …

Summary so far: Log-Sobolev inequalities are cool. They imply “small set expansion” for 1-step walks. They imply hypercontractive inequalities, which imply “small set expansion” for long walks and have many, many other applications. And they’re easy to prove for product graphs / Markov chains.

Random walks on (regular) graphs, revisited: of the earlier examples (Boolean Cube, Hamming “Slice”, “Multislice”, Symmetric group, Grassmann graph, …), the Boolean Cube is the only product graph.

Boolean Cube: log-Sobolev constant 2/n. Symmetric group [Diaconis−Saloff-Coste ’96]: log-Sobolev constant Θ(1/(n log n)). The log n ruins everything:  no good small set expansion, no good hypercontractivity, no KKL or other cool applications…

What about the Hamming “Slice”: Boolean strings of Hamming weight k; edge when two strings differ by a transposition?

Hamming “Slice” (Boolean strings with k_0 0’s and k_1 1’s; edge when two strings differ by a transposition): [T.-Y. Lee and H.-T. Yau ’98]: the log-Sobolev constant is Θ(1/n) provided k_0/n and k_1/n are Ω(1). Great! These slices enjoy all the same SSE and applications as the Boolean Cube!

So far: Boolean Cube: log-Sobolev constant 2/n. Hamming “Slice”: log-Sobolev constant Θ(1/n), provided each symbol is used Ω(n) times . “Multislice”: ? Symmetric group: log-Sobolev constant Θ(1/(n log n)) .

The “Multislice”: e.g., ternary strings with exactly k_1 1’s, k_2 2’s, and k_3 3’s; edge when two strings differ by a transposition.

Full picture: Boolean Cube: log-Sobolev constant 2/n. Hamming “Slice”: Θ(1/n), provided each symbol is used Ω(n) times . “Multislice”: Θ(1/n), provided each symbol is used Ω(n) times and there are O(1) symbols −[Filmus-O-Wu ’18] . Symmetric group: Θ(1/(n log n)) .

How did Lee−Yau do it for the Slice? And why can’t you do the same for the Multislice? Chain rule for KL-divergence… Averaging over a random coordinate… An induction… In the Slice, there is only one kind of step: swapping a 0 and a 1. Even in the ternary Multislice, there are multiple kinds of steps: swapping 1 & 2, swapping 1 & 3, swapping 2 & 3.
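(For reference, not from the slides: the chain rule for KL divergence invoked here is the standard identity KL(p_{XY} ‖ q_{XY}) = KL(p_X ‖ q_X) + E_{x ~ p_X}[ KL(p_{Y|X=x} ‖ q_{Y|X=x}) ].)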

For the Multislice, the induction becomes much more complicated; it doesn’t look like it’s going to work. But then one of the coauthors just makes it work. 

Open Directions: Log-Sobolev inequalities (and hypercontractivity, and Small Set Expansion) for more interesting Markov chains! And if they turn out badly, try to classify the small sets for which SSE fails!

The End - Thanks!