Slow and Fast Mixing of Tempering and Swapping for the Potts Model Nayantara Bhatnagar, UC Berkeley Dana Randall, Georgia Tech.

Introduction: Approximate Counting Problems
Matchings, independent sets, partition functions of the Ising and Potts models, volume of a convex body.
These problems are #P-hard: we don't expect efficient exact algorithms.

Introduction: Approximate Counting and Sampling
Theorem [Jerrum-Valiant-Vazirani '86]: For "self-reducible" problems, approximate counting and approximately uniform generation (approximate sampling) are polynomial-time equivalent.
Examples: matchings, colorings, independent sets, volume of a convex body.

Markov Chains
K = (Ω, P).
Theorem: If K is connected and aperiodic, the Markov chain X_0, X_1, ... converges to a unique stationary distribution π over Ω:
lim_{t→∞} Pr[X_t = Y | X_0] = π(Y).
If P(X,Y) = P(Y,X) for all X, Y, then π is uniform over Ω.
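The convergence theorem is easy to check numerically. The sketch below (the chain is my illustrative example, not from the talk) iterates the distribution of a lazy symmetric random walk on a 4-cycle; the chain is connected and aperiodic, and since P(X,Y) = P(Y,X) its stationary distribution is uniform.

```python
import numpy as np

# Lazy symmetric walk on a 4-cycle: connected, aperiodic (self-loops),
# and symmetric, so pi must be uniform by the slide's theorem.
n = 4
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5                 # laziness makes the chain aperiodic
    P[x, (x + 1) % n] += 0.25     # step clockwise
    P[x, (x - 1) % n] += 0.25     # step counterclockwise

mu = np.array([1.0, 0.0, 0.0, 0.0])   # X_0 = state 0
for _ in range(200):
    mu = mu @ P                        # distribution of X_t

print(np.round(mu, 6))   # -> [0.25 0.25 0.25 0.25]
```

The second eigenvalue of this P is 1/2, so the distance to uniform decays like 2^{-t}.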

Introduction: Markov Chain Monte Carlo
Markov chains: Matchings – Broder's Markov chain; Colorings – Glauber dynamics; Independent sets – Glauber dynamics; Ising, Potts models – Glauber dynamics; Volume – ball walk, lattice walk.
Mixing time T: the time to get within 1/4 in variation distance of π. Rapid mixing: polynomial; slow mixing: exponential.
Techniques for proving rapid mixing: coupling, spectral gap, conductance and isoperimetry, multicommodity flows, decomposition, comparison, ...
What if the natural Markov chain is slowly mixing?

The q-state Potts Model
q-state ferromagnetic Potts model: underlying graph G(V,E); configurations Ω = {x : x ∈ [q]^V}; inverse temperature β > 0.
π_β(x) ∝ e^{β H(x)}, where H(x) = Σ_{(i,j) ∈ E} δ(x_i = x_j).
Glauber dynamics Markov chain: choose (v, c_{t+1}(v)) uniformly at random from V × [q]; update c_t(v) to c_{t+1}(v) with Metropolis probabilities.
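For the mean-field case (G = K_n) the Metropolis acceptance rule depends only on the color counts, which makes a single update cheap. A minimal sketch (function and variable names are mine): pick (v, c) uniformly from V × [q] and accept with probability min(1, e^{β·ΔH}), since π_β(x) ∝ e^{β H(x)} and the ferromagnetic model rewards agreement.

```python
import math
import random

def glauber_step(x, counts, beta, q, rng):
    # One Metropolis-style Glauber update for the mean-field (K_n)
    # ferromagnetic Potts model: propose recoloring a random vertex v
    # with a random color c, accept with min(1, e^{beta * dH}).
    v = rng.randrange(len(x))
    c = rng.randrange(q)
    old = x[v]
    if c == old:
        return
    # On K_n only the two affected color classes matter: v gains
    # counts[c] monochromatic edges and loses counts[old] - 1.
    dH = counts[c] - (counts[old] - 1)
    if dH >= 0 or rng.random() < math.exp(beta * dH):
        x[v] = c
        counts[old] -= 1
        counts[c] += 1

# Tiny demo on n = 12 vertices with q = 3 colors.
rng = random.Random(42)
q, n = 3, 12
x = [rng.randrange(q) for _ in range(n)]
counts = [x.count(c) for c in range(q)]
for _ in range(2000):
    glauber_step(x, counts, 0.4, q, rng)
print(counts)   # the counts always sum to n
```

Maintaining the count vector (σ_R, σ_B, σ_G) alongside x is exactly the equivalence-class view used later in the talk.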

Why Simulated Tempering
Glauber dynamics mixes slowly for the q-state Potts model on K_n for q ≥ 2, at large enough β.
Conductance [Jerrum-Sinclair '89, Lawler-Sokal '88]: for S ⊆ Ω,
Φ_S = Pr[X_{t+1} ∉ S | X_t ~ π restricted to S], and Φ = min_{S : π(S) ≤ 1/2} Φ_S.
Theorem: c_1/Φ ≤ T ≤ c_2/Φ².
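On a tiny chain the conductance can be computed by brute force straight from this definition (the code is my illustration, feasible only for very small Ω):

```python
import itertools
import numpy as np

def conductance(P, pi):
    # Phi_S = Pr[X_{t+1} leaves S | X_t ~ pi restricted to S];
    # Phi = min over S with pi(S) <= 1/2 of Phi_S.
    n = len(pi)
    best = float("inf")
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            S = set(S)
            piS = sum(pi[x] for x in S)
            if piS > 0.5:
                continue
            flow = sum(pi[x] * P[x, y]
                       for x in S for y in range(n) if y not in S)
            best = min(best, flow / piS)
    return best

# A chain with a bottleneck has small conductance, so by the theorem
# its mixing time is at least c_1 / Phi.
P = np.array([[0.98, 0.02],
              [0.02, 0.98]])
print(conductance(P, np.array([0.5, 0.5])))   # 0.02
```

This is precisely the obstruction for low-temperature Potts: the ordered modes form sets S with exponentially small Φ_S.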

Simulated Tempering [Marinari-Parisi '92]
Define inverse temperatures 0 = β_0 ≤ β_1 ≤ ... ≤ β_M = β, with β_i = β · i/M, and distributions π_0, π_1, ..., π_M = π_β on Ω.
State space: Ω̂ = Ω × {0, ..., M}, with stationary distribution π̂(x, i) = π_i(x)/(M+1).
Tempering Markov chain: from (x, i),
w.p. 1/2, perform a step of Glauber dynamics at β_i;
w.p. 1/2, move to (x, i ± 1) with Metropolis probabilities.
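Because π̂ gives every level equal weight 1/(M+1), a proposed level move i → j is accepted with probability min(1, π_j(x)/π_i(x)). A sketch of one tempering step, with `glauber_at` and `log_pi` as caller-supplied stand-ins (illustrative names, not from the talk):

```python
import math
import random

def tempering_step(x, i, glauber_at, log_pi, M, rng):
    # One step of the simulated tempering chain on Omega x {0..M}:
    # glauber_at(x, i, rng) runs a Glauber update at level i, and
    # log_pi(x, i) returns log pi_i(x) (unnormalized is fine as long
    # as the same normalization is used at every level).
    if rng.random() < 0.5:
        x = glauber_at(x, i, rng)
    else:
        j = i + rng.choice([-1, 1])
        if 0 <= j <= M:
            # Metropolis: accept with min(1, pi_j(x) / pi_i(x)).
            if rng.random() < math.exp(min(0.0, log_pi(x, j) - log_pi(x, i))):
                i = j
    return x, i

# Demo with trivial stand-ins: the level index performs a lazy walk
# and always stays inside {0, ..., M}.
rng = random.Random(7)
x, i = 0, 0
for _ in range(1000):
    x, i = tempering_step(x, i, lambda x, i, r: x, lambda x, j: 0.0, 3, rng)
print(i)
```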

Swapping [Geyer '91]
Define inverse temperatures 0 = β_0 ≤ β_1 ≤ ... ≤ β_M = β and distributions π_0, π_1, ..., π_M = π_β on Ω.
State space: Ω̂ = Ω^{M+1}, with stationary distribution π̂(x) = Π_i π_i(x_i).
Swapping Markov chain: from x = (x_0, ..., x_M), choose a random i;
w.p. 1/2, perform a step of Glauber dynamics on x_i at β_i;
w.p. 1/2, move to x^{(i,i+1)} (x with x_i and x_{i+1} exchanged) with Metropolis probabilities.
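For the product measure π̂(x) = Π_i π_i(x_i), exchanging x_i and x_{i+1} is accepted with the Metropolis ratio π_i(x_{i+1})π_{i+1}(x_i) / (π_i(x_i)π_{i+1}(x_{i+1})). A sketch of the swap move (names illustrative):

```python
import math
import random

def swap_step(xs, log_pi, M, rng):
    # One swap move of the swapping (parallel tempering) chain:
    # xs[i] holds the configuration at inverse temperature beta_i,
    # log_pi(x, i) = log pi_i(x).  Swap levels i and i+1 with the
    # Metropolis probability for the product measure pi_hat.
    i = rng.randrange(M)          # neighboring pair (i, i+1)
    delta = (log_pi(xs[i + 1], i) + log_pi(xs[i], i + 1)
             - log_pi(xs[i], i) - log_pi(xs[i + 1], i + 1))
    if rng.random() < math.exp(min(0.0, delta)):
        xs[i], xs[i + 1] = xs[i + 1], xs[i]

# With identical distributions at both levels the ratio is 1, so the
# swap is always accepted:
rng = random.Random(0)
xs = ["hot-config", "cold-config"]
swap_step(xs, lambda x, i: 0.0, 1, rng)
print(xs)   # ['cold-config', 'hot-config']
```

Note the contrast with tempering: swapping runs all M+1 levels in parallel, so no level index needs to diffuse.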

Theoretical Results
Madras-Zheng '99: Tempering mixes rapidly at all temperatures for the ferromagnetic Ising model (the Potts model with q = 2) on K_n; rapid mixing for the symmetric bimodal exponential distribution on an interval.
Zheng '99: Rapid mixing of swapping implies rapid mixing of tempering.
B.-Randall '04: Simulated tempering mixes slowly for the 3-state ferromagnetic Potts model on K_n; a modified swapping algorithm is rapidly mixing for the mean-field Ising model with an external field.
Woodard-Schmidler-Huber '08: Sufficient conditions for rapid mixing of tempering and swapping, and sufficient conditions for torpid mixing of both.

In This Talk
B.-Randall '04: Tempering and swapping for the mean-field Potts model.
Slow mixing: tempering can be slowly mixing for any choice of temperatures.
Rapid mixing: alternative tempered distributions that yield rapid mixing.

Tempering for the Potts Model
Theorem [BR]: There exists β_crit > 0 such that tempering for the Potts model on K_n at β_crit mixes slowly.
Proof idea: Bound the conductance on Ω̂ = Ω × {0, ..., M}. The cut depends on the number of vertices of each color and induces the same cut on Ω at each β_i. The space Ω is partitioned into equivalence classes σ = (σ_R, σ_B, σ_G), the counts of each color.
[Figure: the simplex of color-count vectors, with corners (n,0,0), (0,0,n) and midpoint (n/2, 0, n/2) marked.]

Stationary Distribution of the Tempering Chain
At β_0 = 0: π_0(σ) ∝ (n choose σ_R, σ_B, σ_G).
At β_i > 0: π_i(σ) ∝ (n choose σ_R, σ_B, σ_G) e^{β_i(σ_R² + σ_B² + σ_G²)}.
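These class weights are simple to compute directly for small n (a sketch; constants of H are absorbed into β_i, and the function name is mine):

```python
import math
from itertools import product

def class_weight(sigma, beta_i):
    # Unnormalized weight of a color-count class sigma = (s_R, s_B, s_G):
    #   multinomial(n; sigma) * exp(beta_i * (s_R^2 + s_B^2 + s_G^2)).
    n = sum(sigma)
    m = math.factorial(n)
    for s in sigma:
        m //= math.factorial(s)
    return m * math.exp(beta_i * sum(s * s for s in sigma))

# Sanity check at beta_0 = 0: only the multinomial (entropy) term
# remains, so the weights over all classes must sum to q^n.
n, q = 6, 3
total = sum(class_weight(s, 0.0)
            for s in product(range(n + 1), repeat=q) if sum(s) == n)
print(int(total))   # 3**6 = 729
```

Increasing β_i shifts the mass from balanced σ (entropy term) toward the corners of the simplex (energy term), which is exactly the ordered/disordered competition on the next slide.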

Stationary Distribution of the Tempering Chain
π_i(σ) ∝ (n choose σ_R, σ_B, σ_G) e^{β_i(σ_R² + σ_B² + σ_G²)}.
At β_0 = 0 and at 0 < β_i < β_crit, the mass concentrates on the disordered mode (balanced color counts); at β_crit, the ordered modes (one dominant color) carry comparable weight.
[Figure: the distributions π_i from β_0 through β_crit, showing the disordered and ordered modes.]

Tempering Fails to Converge
At β_crit, tempering mixes slowly for any set of intermediate temperatures 0 ≤ β_i < β_crit.
[Figure: the sequence of tempered distributions from β_0 through β_crit.]

Swapping and Tempering for Asymmetric Distributions – Rapid Mixing
Asymmetric exponential: π(x) ∝ C^{|x|} for x ∈ [-n_1, n_2], n_1 > n_2.
Ising model with an external field: π_β(x) ∝ e^{β H(x)}, H(x) = Σ_{(i,j)} δ(x_i = x_j) + B Σ_i δ(x_i = +).
Potts model on K_n restricted to the line σ_B = σ_G ≤ n/3: π_β(x) ∝ e^{β H(x)}, H(x) = Σ_{(i,j)} δ(x_i = x_j).

Decomposition of the Swapping Chain
Tempered distributions: π_i(x) ∝ C^{|x| · i/M}, i ≤ M.
Madras-Randall '02 decomposition for Markov chains:
1. Mixing of the restricted chains R_{b,i} at each temperature, b ∈ {0,1}, i ≤ M.
2. Mixing of the projection chain P.
Up to polynomial factors, T_swap ≤ C · max_{b ∈ {0,1}, i ≤ M} T_{R_{b,i}} · T_P.

Decomposition of the Swapping Chain
Projection for the swapping chain: a weighted cube (WC), with one coordinate per temperature recording which mode (b = 0 or 1) each x_i occupies.
Up to polynomial factors, π_i(0) ≈ C^{n_1 · i/M}/Z_i and π_i(1) ≈ C^{n_2 · i/M}/Z_i.
Lemma: If for i > j, π_i(1) π_j(0) ≤ p(n) π_i(0) π_j(1), then T_P ≤ q(n) T_WC.

Flat-Swap: Fast Mixing for Mean-Field Models
Modify more than just the temperature: define distributions π̃_M, ..., π̃_0 so that the cut is not preserved.
π_i(σ) ∝ (n choose σ_R, σ_B, σ_G) e^{β_i(σ_R² + σ_B² + σ_G²)}
π̃_i(σ) = π_i(σ) f_i(σ) = π_i(σ) · (n choose σ_R, σ_B, σ_G)^{(i-M)/M}, i ≤ M.
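The effect of the correction factor f_i is easy to see numerically: at i = M it is 1 (the target is untouched), while at i = 0 it exactly cancels the multinomial, so the hottest level is uniform over the color-count classes rather than peaked at the disordered mode. A sketch (the schedule β_i = β·i/M is an assumption carried over from the tempering slide):

```python
import math

def multinomial(sigma):
    # (n choose s_R, s_B, s_G) for n = sum(sigma).
    m = math.factorial(sum(sigma))
    for s in sigma:
        m //= math.factorial(s)
    return m

def flat_swap_weight(sigma, i, M, beta):
    # Flat-swap distribution from the slide:
    #   pi~_i(sigma) = pi_i(sigma) * multinomial(sigma)^{(i-M)/M},
    # with beta_i = beta * i / M (assumed schedule).  At i = M this is
    # the original pi_M; at i = 0 the multinomial cancels entirely.
    beta_i = beta * i / M
    pi_i = multinomial(sigma) * math.exp(beta_i * sum(s * s for s in sigma))
    return pi_i * multinomial(sigma) ** ((i - M) / M)

# At level 0 every color-count class gets the same weight:
print(flat_swap_weight((6, 0, 0), 0, 4, 1.0),
      flat_swap_weight((2, 2, 2), 0, 4, 1.0))   # both ~1.0
```

Flattening the entropy term is what destroys the bad cut: the hot level can now move freely between the ordered and disordered regions.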

Flat Swap for Mean-Field Models
Theorem [B.-Randall]: Flat-swap for the 3-state Potts model on K_n restricted to the line σ_B = σ_G ≤ n/3, using the distributions π̃_M, ..., π̃_0, mixes rapidly at every temperature. Flat-swap is also rapidly mixing for the mean-field Ising model at every temperature and for any external field B.
Lemma: For i > j, π̃_i(0) π̃_j(1) ≤ p(n) π̃_i(1) π̃_j(0).

Summary and Open Problems
Summary:
Insight into why tempering can fail to converge.
Designing more robust tempering algorithms.
Open problems:
Simulated tempering algorithms for other problems?
Relative complexity of swapping and tempering.

Tempering vs. Fixed Temperature
Theorem [BR]: On K_n restricted to the line σ_G = σ_B ≤ n/3, tempering with β_M > β_crit mixes slower than Metropolis dynamics at the fixed temperature β_M by an exponential factor.
[Figure: the tempered distributions from β_0 = 0 through β_M > β_crit, with the bad cut S.]