Presentation transcript:

Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.

(approx) counting  sampling Valleau,Card’72 (physical chemistry), Babai’79 (for matchings and colorings), Jerrum,Valiant,V.Vazirani’86, random variables: X 1 X 2... X t E[X 1 X 2... X t ] = O(1) V[X i ] E[X i ] 2 the X i are easy to estimate = “WANTED” the outcome of the JVV reduction: such that 1) 2) squared coefficient of variation (SCV)

Theorem (Dyer-Frieze'91). Suppose
1) E[X_1 X_2 ... X_t] = "WANTED", and
2) V[X_i]/E[X_i]^2 = O(1), with the X_i easy to estimate.
Then O(t^2/ε^2) samples (O(t/ε^2) from each X_i) give a (1 ± ε)-estimator of "WANTED" with probability ≥ 3/4.
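
A minimal sketch of such a product estimator (my own illustration, not from the talk): average O(t/ε^2) draws of each X_i and multiply the sample means. The sampler `draw_x` is a hypothetical stand-in for whatever process generates the X_i.

```python
import random

def product_estimator(draw_x, t, eps):
    """Estimate E[X_1] * ... * E[X_t] from samples of the X_i.

    draw_x(i) -- returns one sample of X_i (SCV assumed O(1)).
    Uses O(t/eps^2) samples per variable, O(t^2/eps^2) in total.
    """
    samples_per_var = max(1, int(4 * t / eps ** 2))  # constant 4 is illustrative
    estimate = 1.0
    for i in range(t):
        mean_i = sum(draw_x(i) for _ in range(samples_per_var)) / samples_per_var
        estimate *= mean_i
    return estimate

# Toy usage: each X_i is Bernoulli(1/2), so the product is 2^(-t).
if __name__ == "__main__":
    print(product_estimator(lambda i: random.random() < 0.5, t=10, eps=0.2),
          "vs true value", 0.5 ** 10)
```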

JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. Observation: for a uniformly random independent set S of G, P(S = ∅) = 1 / #independent sets.

JVV for independent sets: by the chain rule P(A ∩ B) = P(A) P(B|A), write P(S = ∅) = X_1 X_2 X_3 X_4 ..., where X_i is the probability that vertex i is excluded given that vertices 1, ..., i-1 are excluded. Each X_i ∈ [0,1] and E[X_i] ≥ 1/2, hence V[X_i]/E[X_i]^2 = O(1).
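
A minimal sketch of this chain (my own illustration; a brute-force exact sampler stands in for the sampler oracle): conditioning on vertices 1, ..., i-1 being excluded amounts to sampling independent sets of the graph with those vertices removed.

```python
import itertools, random

def independent_sets(vertices, edges):
    """Brute-force list of all independent sets (tiny graphs only)."""
    sets = []
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            ss = set(s)
            if all(not (u in ss and v in ss) for u, v in edges):
                sets.append(ss)
    return sets

def jvv_count(vertices, edges, samples=2000):
    """Estimate #independent sets via the JVV telescoping product."""
    product = 1.0
    remaining = list(vertices)
    for v in list(vertices):
        live = [e for e in edges if e[0] in remaining and e[1] in remaining]
        pool = independent_sets(remaining, live)   # stand-in for the oracle
        hits = sum(v not in random.choice(pool) for _ in range(samples))
        product *= hits / samples                  # estimate of X_i (>= 1/2)
        remaining.remove(v)                        # condition on v excluded
    return 1.0 / product                           # P(S = empty) = X_1 ... X_t

if __name__ == "__main__":
    V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]  # path: 8 independent sets
    print(jvv_count(V, E))
```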

JVV: if we have a sampler oracle (input: a graph G; output: a uniformly random independent set of G), then we get an FPRAS using O(n^2) samples.
ŠVV: if we have a sampler oracle (input: a graph G and an inverse temperature β; output: a set drawn from the gas-model Gibbs distribution at β), then we get an FPRAS using O*(n) samples.

Application – independent sets. O*(|V|) samples suffice for counting. Cost per sample (Vigoda'01, Dyer-Greenhill'01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|^2).

Other applications (total running time):
matchings: O*(n^2 m) (using Jerrum, Sinclair'89)
spin systems, Ising model: O*(n^2) for β < β_C (using Marinelli, Olivieri'95)
k-colorings: O*(n^2) for k > 2Δ (using Jerrum'95)

easy = hot, hard = cold

[Figure: an example Hamiltonian, with level sets at values 0, 1, 2, 4]

H: Ω → {0,...,n}. The big set is Ω. Goal: estimate |H^{-1}(0)|, via |H^{-1}(0)| = E[X_1] ... E[X_t].

Distributions between hot and cold (Gibbs distributions): μ_β(x) ∝ exp(-H(x)β), where β is the inverse temperature. β = 0 (hot): uniform on Ω. β = ∞ (cold): uniform on H^{-1}(0).

μ_β(x) = exp(-H(x)β) / Z(β), where the normalizing factor is the partition function Z(β) = Σ_{x ∈ Ω} exp(-H(x)β).

Partition function: Z(β) = Σ_{x ∈ Ω} exp(-H(x)β). We have Z(0) = |Ω|; we want Z(∞) = |H^{-1}(0)|.
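
A brute-force sanity check (my own illustration): for the independent-set Hamiltonian H(S) = number of edges inside S, computing Z(β) by enumeration on a tiny graph shows Z(0) = |Ω| = 2^|V| and Z(β) → |H^{-1}(0)| = #independent sets as β → ∞.

```python
import itertools, math

def partition_function(vertices, edges, beta):
    """Z(beta) = sum over all subsets S of exp(-beta * H(S)),
    where H(S) counts the edges with both endpoints in S."""
    total = 0.0
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            ss = set(s)
            h = sum(1 for u, v in edges if u in ss and v in ss)
            total += math.exp(-beta * h)
    return total

if __name__ == "__main__":
    V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
    print(partition_function(V, E, 0.0))   # 16 = 2^4 = |Omega|
    print(partition_function(V, E, 50.0))  # ~8 = #independent sets
```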

Assumption: we have a sampler oracle for μ_β(x) = exp(-H(x)β) / Z(β): given the graph G and β, it outputs a subset of V drawn from μ_β.

Sample W ← μ_β and set X = exp(H(W)(β - α)). Then
E[X] = Σ_{s ∈ Ω} μ_β(s) X(s) = Z(α) / Z(β),
so with a sampler oracle for μ_β we can obtain the ratio Z(α)/Z(β).
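
A sketch of this ratio estimator (my own illustration; an exact brute-force Gibbs sampler stands in for the sampler oracle):

```python
import itertools, math, random

def gibbs_sample(vertices, edges, beta):
    """Exact sample from mu_beta by enumeration (tiny graphs only)."""
    subsets, weights = [], []
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            ss = set(s)
            h = sum(1 for u, v in edges if u in ss and v in ss)
            subsets.append(ss)
            weights.append(math.exp(-beta * h))
    return random.choices(subsets, weights=weights)[0]

def ratio_estimate(vertices, edges, beta, alpha, samples=5000):
    """Estimate Z(alpha)/Z(beta) as the mean of exp(H(W)(beta - alpha)),
    where W ~ mu_beta."""
    total = 0.0
    for _ in range(samples):
        w = gibbs_sample(vertices, edges, beta)
        h = sum(1 for u, v in edges if u in w and v in w)
        total += math.exp(h * (beta - alpha))
    return total / samples

if __name__ == "__main__":
    V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]
    print(ratio_estimate(V, E, beta=0.5, alpha=1.0))
    # exact value for comparison, reusing partition_function from the
    # previous sketch: partition_function(V, E, 1.0) / partition_function(V, E, 0.5)
```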

Our goal restated. Partition function: Z(β) = Σ_{x ∈ Ω} exp(-H(x)β). Goal: estimate Z(∞) = |H^{-1}(0)|. Telescoping:
Z(∞) = [Z(β_1)/Z(β_0)] · [Z(β_2)/Z(β_1)] ··· [Z(β_t)/Z(β_{t-1})] · Z(0),
where β_0 = 0 < β_1 < β_2 < ... < β_t = ∞.

Our goal restated. A cooling schedule is a sequence β_0 = 0 < β_1 < β_2 < ... < β_t = ∞; the i-th factor is estimated by X_i with E[X_i] = Z(β_i)/Z(β_{i-1}). How to choose the cooling schedule? Minimize its length t while keeping V[X_i]/E[X_i]^2 ≤ O(1).
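
Chaining the ratio estimator along a schedule gives the full counting algorithm; a minimal sketch (continuing the previous sketch, whose `ratio_estimate` is assumed; a large final β stands in for β_t = ∞):

```python
def anneal_count(vertices, edges, schedule, samples=5000):
    """Estimate Z(beta_t) = |H^-1(0)| as Z(0) times the telescoping ratios."""
    estimate = 2.0 ** len(vertices)        # Z(0) = |Omega| = 2^|V| here
    for b_prev, b_next in zip(schedule, schedule[1:]):
        estimate *= ratio_estimate(vertices, edges, b_prev, b_next, samples)
    return estimate

# e.g. anneal_count(V, E, [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]) should
# approach the number of independent sets of (V, E).
```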

Parameters: A and n, where Z(0) = A and H: Ω → {0,...,n}. Writing a_k = |H^{-1}(k)|, we have Z(β) = Σ_{k=0}^n a_k e^{-βk}.

Parameters (Z(0) = A, H: Ω → {0,...,n}):
independent sets: A = 2^|V|, n = |E|
matchings: A ≤ |V|!, n = |V|
perfect matchings: A = |V|!, n = |V|
k-colorings: A = k^|V|, n = |E|

Previous cooling schedules (Z(0) = A, H: Ω → {0,...,n}; β_0 = 0 < β_1 < β_2 < ... < β_t = ∞). "Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani'06):
β → β + 1/n
β → β (1 + 1/ln A)
ln A → ∞
These give cooling schedules of length O(n ln A), and cooling schedules of length O((ln n)(ln A)) (Bezáková, Štefankovič, Vigoda, V. Vazirani'06).
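
A sketch of such a schedule (my reading of the safe steps, so the composition is an assumption): one additive step to 1/n, multiplicative steps up to ln A, then the final jump to ∞.

```python
import math

def safe_schedule(n, A):
    """Safe-step cooling schedule: 0, 1/n, then multiplicative
    (1 + 1/ln A) steps up to ln A, then infinity.
    Length O((ln n)(ln A)); purely additive 1/n steps would give O(n ln A)."""
    lnA = math.log(A)
    schedule, beta = [0.0, 1.0 / n], 1.0 / n
    while beta < lnA:
        beta *= 1.0 + 1.0 / lnA
        schedule.append(min(beta, lnA))
    schedule.append(math.inf)          # safe final jump: ln A -> infinity
    return schedule

# e.g. len(safe_schedule(100, 2**100)) is a few hundred steps, on the
# order of (ln n)(ln A) rather than n ln A.
```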

No better fixed schedule is possible. With Z(0) = A and H: Ω → {0,...,n}, consider Z_a(β) = A (1 + a e^{-βn}) / (1 + a). A schedule that works for all of these (with a ∈ [0, A-1]) has length Ω((ln n)(ln A)).

Parameters: Z(0) = A, H: Ω → {0,...,n}. Previously: non-adaptive schedules of length Θ*(ln A). Our main result: one can get an adaptive schedule of length O*((ln A)^{1/2}).

Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).

Express the SCV using the partition function (going from β to α). For W ← μ_β and X = exp(H(W)(β - α)):
E[X] = Z(α)/Z(β), and E[X^2]/E[X]^2 = Z(2α - β) Z(β) / Z(α)^2 ≤ C.
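
The identity follows by a direct computation (a step the slides leave implicit); in LaTeX:

```latex
\begin{align*}
\mathbf{E}[X^2]
 &= \sum_{x\in\Omega} \frac{e^{-H(x)\beta}}{Z(\beta)}\, e^{2H(x)(\beta-\alpha)}
  = \frac{1}{Z(\beta)} \sum_{x\in\Omega} e^{-H(x)(2\alpha-\beta)}
  = \frac{Z(2\alpha-\beta)}{Z(\beta)}, \\
\frac{\mathbf{E}[X^2]}{\mathbf{E}[X]^2}
 &= \frac{Z(2\alpha-\beta)}{Z(\beta)} \cdot \frac{Z(\beta)^2}{Z(\alpha)^2}
  = \frac{Z(2\alpha-\beta)\, Z(\beta)}{Z(\alpha)^2}.
\end{align*}
```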

Let f(β) = ln Z(β). Proof: the condition Z(2α - β) Z(β) / Z(α)^2 ≤ C becomes
(f(2α - β) + f(β)) / 2 - f(α) ≤ C', where C' = (ln C)/2,
a chord condition on f at the points β, α, 2α - β.

f(β) = ln Z(β) is decreasing and convex, with f'(0) ≥ -n and f(0) = ln A. Proof idea: in each step of the schedule, either f or f' changes a lot (tracked by the quantity K := Δf · Δ(ln |f'|)).

Lemma: a convex decreasing f: [a,b] → R can be "approximated" (in the above sense) using
( (f(a) - f(b)) · ln( f'(a)/f'(b) ) )^{1/2}
segments.
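
A greedy rendering of this construction when f = ln Z is explicitly computable (my own illustration, continuing the earlier sketches: `partition_function` gives Z exactly; the constant C = 3.0 and the cutoff β_max standing in for ∞ are arbitrary):

```python
import math

def greedy_schedule(vertices, edges, C=3.0, beta_max=20.0):
    """From each beta, binary-search the largest alpha <= beta_max with
    f(2*alpha - beta) + f(beta) - 2*f(alpha) <= ln C  (f = ln Z),
    i.e. the SCV of the step estimator stays bounded by C."""
    f = lambda b: math.log(partition_function(vertices, edges, b))
    schedule, beta = [0.0], 0.0
    while beta < beta_max:
        lo, hi = beta, beta_max
        for _ in range(40):                     # binary search on alpha;
            mid = (lo + hi) / 2                 # the condition is monotone in alpha
            if f(2 * mid - beta) + f(beta) - 2 * f(mid) <= math.log(C):
                lo = mid
            else:
                hi = mid
        beta = lo if lo > beta else beta_max    # guard against stalling
        schedule.append(beta)
    return schedule

# e.g. greedy_schedule([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)]) returns a
# short schedule whose steps lengthen as f flattens out.
```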

Technicality: getting to 2α - β. The schedule visits β_i, β_{i+1}, β_{i+2}, β_{i+3}, ..., and the chord condition involves the point 2α - β as well; handling this costs ln ln A extra steps.

Existential ⇒ Algorithmic: from "there exists an adaptive schedule of length O*((ln A)^{1/2})" to "we can construct an adaptive schedule of length O*((ln A)^{1/2})".

Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(-H(x)β)/Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n). Total number of oracle calls: ≤ 10^7 (ln A)(ln ln A + ln n)^7 ln(1/δ).

Algorithmic construction. Let β be the current inverse temperature. Ideally we move to α such that:
E[X] = Z(α)/Z(β) ≤ B_1 — then we make progress (assuming B_1 ≪ 1);
E[X^2]/E[X]^2 ≤ B_2 — then X is "easy to estimate".
Since E[X^2]/E[X]^2 = [Z(β)/Z(α)] · [Z(2α - β)/Z(α)], we need to construct a "feeler" for this quantity (the naive choice is a bad "feeler").

Rough estimator for Z(α)/Z(β). Recall Z(β) = Σ_{k=0}^n a_k e^{-βk}. For W ← μ_β we have P(H(W) = k) = a_k e^{-βk} / Z(β).

Likewise, for U ← μ_α we have P(H(U) = k) = a_k e^{-αk} / Z(α). If the event H(·) = k is likely at both α and β, this yields a rough estimator for Z(α)/Z(β).

Rough estimator for Z(α)/Z(β): dividing the two probabilities,
[P(H(U) = k) / P(H(W) = k)] · e^{k(α - β)} = Z(β) / Z(α).

Rough estimator over an interval: for W ← μ_β,
P(H(W) ∈ [c,d]) = Σ_{k=c}^d a_k e^{-βk} / Z(β).

If |β - α| · |d - c| ≤ 1, then
e^{-1} ≤ [P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d])] · e^{c(α - β)} · Z(α)/Z(β) ≤ e,
i.e., the interval version estimates Z(β)/Z(α) within a factor of e. We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.
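
A sketch of this interval "feeler" (my own illustration, continuing the earlier sketches; `gibbs_sample` is the stand-in sampler oracle defined there, and the interval is assumed heavy at both temperatures so the hit rates are nonzero):

```python
import math

def interval_feeler(vertices, edges, beta, alpha, c, d, samples=5000):
    """Estimate Z(beta)/Z(alpha) up to a factor e, valid when
    |beta - alpha| * |d - c| <= 1 and [c,d] is heavy for both."""
    def hit_rate(temp):
        hits = 0
        for _ in range(samples):
            w = gibbs_sample(vertices, edges, temp)
            h = sum(1 for u, v in edges if u in w and v in w)
            hits += (c <= h <= d)
        return hits / samples
    return (hit_rate(alpha) / hit_rate(beta)) * math.exp(c * (alpha - beta))
```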

Split {0, 1, ..., n} into h ≈ 4 (ln n)(ln A) intervals [0], [1], [2], ..., [c, c(1 + 1/ln A)], ... For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.
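
A sketch of the interval structure and a sampling-based search for a heavy interval (my own illustration; `gibbs_sample` as before):

```python
import math

def make_intervals(n, A):
    """Singletons [0], [1], ..., then geometric intervals [c, c(1 + 1/ln A)]."""
    lnA = math.log(A)
    intervals, c = [], 0
    while c <= n:
        d = min(max(c, math.floor(c * (1.0 + 1.0 / lnA))), n)
        intervals.append((c, d))
        c = d + 1
    return intervals

def find_heavy(vertices, edges, beta, intervals, samples=2000):
    """Return an interval with empirical P(H(W) in I) >= 1/(8h), if any."""
    h, counts = len(intervals), [0] * len(intervals)
    for _ in range(samples):
        w = gibbs_sample(vertices, edges, beta)
        val = sum(1 for u, v in edges if u in w and v in w)
        for i, (c, d) in enumerate(intervals):
            if c <= val <= d:
                counts[i] += 1
                break
    best = max(range(h), key=lambda i: counts[i])
    return intervals[best] if counts[best] >= samples / (8 * h) else None
```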

Algorithm: repeat:
* find an interval I which is heavy for the current inverse temperature β;
* see how far I stays heavy (up to some β*);
* use the interval I as the feeler for Z(α)/Z(β) and Z(2α - β)/Z(α);
and in each iteration either make progress, or eliminate the interval I, or make a "long move".
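
Putting the pieces together, a high-level skeleton of the adaptive loop (illustration only: the heaviness search, step size, and progress/elimination rule below are simplified stand-ins for the real analysis, reusing the hypothetical helpers from the earlier sketches):

```python
def adaptive_schedule(vertices, edges, n, A, beta_max=20.0):
    intervals = make_intervals(n, A)
    schedule, beta = [0.0], 0.0
    while beta < beta_max and intervals:
        I = find_heavy(vertices, edges, beta, intervals)
        if I is None:
            break                          # sketch only: bail out gracefully
        c, d = I
        step = 1.0 / max(1, d - c)         # keep |beta - alpha| * |d - c| <= 1
        alpha = min(beta + step, beta_max)
        ratio = interval_feeler(vertices, edges, beta, alpha, c, d)
        if ratio >= 2.0:                   # Z dropped noticeably: progress
            schedule.append(alpha)
            beta = alpha
        else:
            intervals.remove(I)            # eliminate I (the real rule also
                                           # allows "long moves")
    schedule.append(float("inf"))
    return schedule
```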

Summary: if we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Resulting total running times:
independent sets: O*(n^2) (using Vigoda'01, Dyer-Greenhill'01)
matchings: O*(n^2 m) (using Jerrum, Sinclair'89)
spin systems, Ising model: O*(n^2) for β < β_C (using Marinelli, Olivieri'95)
k-colorings: O*(n^2) for k > 2Δ (using Jerrum'95)