Presentation on theme: "Adaptive annealing: a near-optimal connection between sampling and counting — Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)" — Presentation transcript:

1 Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

2 Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech). If you want to count using MCMC then statistical physics is useful.

3 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More… Outline

4 independent sets spanning trees matchings perfect matchings k-colorings Counting

5 independent sets spanning trees matchings perfect matchings k-colorings Counting

6 Compute the number of spanning trees.

7 Compute the number of spanning trees. Kirchhoff's Matrix Tree Theorem: the number of spanning trees equals any cofactor of the Laplacian D − A (delete the row and column of a vertex v, then take the determinant). Example (4-cycle): A has rows 0101, 1010, 0101, 1010 and D = 2I.
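Kirchhoff's theorem can be checked directly; the sketch below (not from the talk) computes the Laplacian cofactor with exact rational arithmetic, on the 4-cycle whose adjacency matrix (rows 0101, 1010, 0101, 1010) appears on the slide.

```python
from fractions import Fraction

def spanning_tree_count(adj):
    # Kirchhoff's Matrix Tree Theorem: # spanning trees = any cofactor of
    # the Laplacian L = D - A. Delete row/column 0 and take the determinant.
    n = len(adj)
    deg = [sum(row) for row in adj]
    L = [[Fraction(deg[i] if i == j else -adj[i][j]) for j in range(1, n)]
         for i in range(1, n)]
    det = Fraction(1)
    m = n - 1
    for c in range(m):                      # Gaussian elimination, exact
        piv = next((r for r in range(c, m) if L[r][c] != 0), None)
        if piv is None:
            return 0
        if piv != c:
            L[c], L[piv] = L[piv], L[c]
            det = -det
        det *= L[c][c]
        for r in range(c + 1, m):
            f = L[r][c] / L[c][c]
            for k in range(c, m):
                L[r][k] -= f * L[c][k]
    return int(det)

# The slide's example: the 4-cycle, adjacency rows 0101 / 1010 / 0101 / 1010.
C4 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(spanning_tree_count(C4))  # 4
```

A 4-cycle indeed has 4 spanning trees (remove any one of its 4 edges).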

8 Compute the number of spanning trees: a polynomial-time algorithm maps G → number of spanning trees of G.

9 independent sets spanning trees matchings perfect matchings k-colorings Counting ?

10 Compute the number of independent sets (hard-core gas model). An independent set of a graph = a subset S of vertices such that no two vertices in S are neighbors.

11 # independent sets = 7. Independent set = subset S of vertices, no two in S are neighbors.
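A brute-force check of such a count (a sketch; the slide's graph is a picture, so the 4-cycle below is an assumed stand-in that also has exactly 7 independent sets):

```python
from itertools import combinations

def count_independent_sets(n, edges):
    # Brute force: test every vertex subset for independence.
    count = 0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            s = set(S)
            if all(not (u in s and v in s) for u, v in edges):
                count += 1
    return count

# 4-cycle: empty set, 4 singletons, and the 2 diagonal pairs -> 7.
print(count_independent_sets(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # 7
```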

12 # independent sets of the path graphs G_1, G_2, G_3, ..., G_{n-2}, G_{n-1}, G_n ...

13 # independent sets of the path graphs G_1, G_2, G_3, ..., G_n: 2, 3, 5, ..., F_{n-1}, F_n, F_{n+1} — the Fibonacci numbers.
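The Fibonacci pattern can be verified with the standard path-graph DP (a sketch, not part of the talk):

```python
def path_independent_sets(n):
    # DP along the path P_n: track counts of independent sets that
    # include / exclude the last vertex.
    with_v, without_v = 1, 1          # base case: a single vertex
    for _ in range(n - 1):
        # appending a vertex: it may be added only if the previous is absent
        with_v, without_v = without_v, with_v + without_v
    return with_v + without_v

print([path_independent_sets(n) for n in range(1, 7)])  # [2, 3, 5, 8, 13, 21]
```

The counts 2, 3, 5, 8, ... are consecutive Fibonacci numbers, matching the slide.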

14 # independent sets = 5598861. Independent set = subset S of vertices, no two in S are neighbors.

15 Compute the number of independent sets: a polynomial-time algorithm mapping G → number of independent sets of G?

16 Compute the number of independent sets: a polynomial-time algorithm mapping G → number of independent sets of G! (unlikely)

17 graph G → # independent sets in G is #P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997). (Complexity classes: P ⊆ NP, FP ⊆ #P.)

18 graph G → # independent sets in G: approximation? randomization?

19 graph G → # independent sets in G: approximation? randomization? Which is more important?

20 graph G → # independent sets in G: approximation? randomization? Which is more important? My world-view: (true) randomness is important conceptually but NOT computationally (i.e., I believe P=BPP); approximation makes problems easier (i.e., I believe #P=BPP).

21 We would like to know Q. Goal: a random variable Y such that P( (1−ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1 − δ. "Y gives a (1±ε)-estimate."

22 We would like to know Q. Goal: a random variable Y such that P( (1−ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1 − δ. FPRAS (fully polynomial randomized approximation scheme): a polynomial-time algorithm mapping G, ε, δ to such a Y.

23 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More... Outline

24 We would like to know Q. 1. Get an unbiased estimator X, i.e., E[X] = Q. 2. "Boost the quality" of X: Y = (X_1 + X_2 + ... + X_n)/n.

25 The Bienaymé-Chebyshev inequality: P( Y gives a (1±ε)-estimate ) ≥ 1 − V[Y]/(ε² E[Y]²).

26 The Bienaymé-Chebyshev inequality: for Y = (X_1 + X_2 + ... + X_n)/n, P( Y gives a (1±ε)-estimate ) ≥ 1 − V[Y]/(ε² E[Y]²), and V[Y]/E[Y]² = (1/n) · V[X]/E[X]², the squared coefficient of variation (SCV).

27 The Bienaymé-Chebyshev inequality: Let X_1, ..., X_n, X be independent, identically distributed random variables, Q = E[X]. Let Y = (X_1 + X_2 + ... + X_n)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 − V[X]/(n ε² E[X]²).

28 Chernoff's bound: Let X_1, ..., X_n, X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X]. Let Y = (X_1 + X_2 + ... + X_n)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 − e^{−ε² · n · E[X] / 3}.

29

30 Number of samples to achieve precision ε with confidence δ. Chebyshev: n ≥ (V[X]/E[X]²) · (1/ε²) · (1/δ). Chernoff (for 0 ≤ X ≤ 1): n ≥ (1/E[X]) · (3/ε²) · ln(1/δ).

31 Number of samples to achieve precision ε with confidence δ. Chebyshev: n ≥ (V[X]/E[X]²) · (1/ε²) · (1/δ) — the 1/δ factor is BAD. Chernoff (for 0 ≤ X ≤ 1): n ≥ (1/E[X]) · (3/ε²) · ln(1/δ) — ln(1/δ) is GOOD, but the 1/E[X] factor is BAD.

32 Median "boosting trick". By Bienaymé-Chebyshev, with n ≥ (1/E[X]) · (4/ε²) samples, Y = (X_1 + X_2 + ... + X_n)/n satisfies P( Y ∈ [(1−ε)Q, (1+ε)Q] ) ≥ 3/4.

33 Median trick — repeat 2T times. By Bienaymé-Chebyshev each estimate lands in [(1−ε)Q, (1+ε)Q] with probability ≥ 3/4; by Chernoff, more than T out of the 2T land there with probability ≥ 1 − e^{−T/4}, so the median is in [(1−ε)Q, (1+ε)Q] with probability ≥ 1 − e^{−T/4}.

34 Number of samples, with the median trick: Chebyshev + median: n ≥ (V[X]/E[X]²) · (32/ε²) · ln(1/δ). Chernoff (for 0 ≤ X ≤ 1): n ≥ (1/E[X]) · (3/ε²) · ln(1/δ) — the 1/E[X] factor is BAD.
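The median-of-means boosting described on the last few slides can be sketched as follows (a toy sketch, not the talk's code; the estimator, sample sizes, and seed are illustrative):

```python
import random
import statistics

def median_of_means(sample, n, T, seed=0):
    # Average n samples for one Chebyshev-quality estimate (correct with
    # probability >= 3/4), repeat 2T times, and take the median; Chernoff
    # then boosts the confidence to >= 1 - e^{-T/4}.
    rng = random.Random(seed)
    groups = [sum(sample(rng) for _ in range(n)) / n for _ in range(2 * T)]
    return statistics.median(groups)

# Toy unbiased estimator of Q = 0.5 (a uniform [0,1] draw).
est = median_of_means(lambda rng: rng.random(), n=400, T=10)
print(abs(est - 0.5) < 0.05)  # True (with overwhelming probability)
```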

35 Creating an "approximator" from X: n ≥ (V[X]/E[X]²) · (1/ε²) · ln(1/δ) samples give precision ε with confidence δ.

36 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More... Outline

37 (approx) counting ⇔ sampling. Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X_1, X_2, ..., X_t such that 1) E[X_1 X_2 ... X_t] = "WANTED", and 2) the X_i are easy to estimate: V[X_i]/E[X_i]² = O(1) (squared coefficient of variation, SCV).

38 (approx) counting ⇔ sampling. 1) E[X_1 X_2 ... X_t] = "WANTED", 2) the X_i are easy to estimate: V[X_i]/E[X_i]² = O(1). Theorem (Dyer-Frieze '91): O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

39 JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. 1 / # independent sets = P( the empty set is drawn as a uniform random independent set ).

40 JVV for independent sets: 1 / # independent sets = P(∅) = P(?)·P(?)·P(?)·P(?), using the chain rule P(A∩B) = P(A)P(B|A): a product X_1 X_2 X_3 X_4 of conditional probabilities (each vertex absent given the previous ones absent), with X_i ∈ [0,1] and E[X_i] ≥ 1/2, hence V[X_i]/E[X_i]² = O(1).

41 JVV for independent sets: 1 / # independent sets = P(∅) = P(?)·P(?)·P(?)·P(?) = X_1 X_2 X_3 X_4 via P(A∩B) = P(A)P(B|A), with X_i ∈ [0,1], E[X_i] ≥ 1/2, so V[X_i]/E[X_i]² = O(1).

42 Self-reducibility for independent sets: P( a fixed vertex v is absent from a uniform random independent set of G ) = 5/7.

43 Self-reducibility for independent sets: 5/7 = # IS(G − v) / # IS(G), since the independent sets avoiding v are exactly the independent sets of G with v deleted.

44 Self-reducibility for independent sets: hence # IS(G) = (7/5) · # IS(G − v); the ratio 5/7 is estimated by sampling.

45 Self-reducibility for independent sets: recurse on the smaller graph: P( the next vertex is absent ) = 3/5.

46 Self-reducibility for independent sets: 3/5 = # IS of the next smaller graph / 5.

47 Self-reducibility for independent sets: telescoping, 7 = (7/5) · (5/3) · (3/2) · 2.
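The telescoping product behind self-reducibility can be checked exactly on a small graph (a sketch, with the 4-cycle as an assumed example; a real algorithm would estimate each ratio by sampling instead of exact counting):

```python
from itertools import combinations

def independent_sets_count(n, edges):
    # Brute-force count of independent sets (fine for tiny graphs).
    return sum(
        all(not (u in set(S) and v in set(S)) for u, v in edges)
        for r in range(n + 1) for S in combinations(range(n), r)
    )

# Add the edges of the 4-cycle one at a time; each ratio is the probability
# that a random independent set of the smaller problem survives the new
# constraint -- exactly what a sampler would estimate.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
total = 2.0 ** 4                      # no edges: every subset is independent
for i in range(1, len(edges) + 1):
    total *= (independent_sets_count(4, edges[:i])
              / independent_sets_count(4, edges[:i - 1]))
print(round(total))  # 7
```

The product telescopes to the count for the full graph, mirroring the slide's 7 = (7/5)·(5/3)·(3/2)·2.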

48 JVV: If we have a sampler oracle (graph G → random independent set of G), then FPRAS using O(n²) samples.

49 JVV: If we have a sampler oracle (graph G → random independent set of G), then FPRAS using O(n²) samples. ŠVV: If we have a sampler oracle (graph G, β → set from the gas-model Gibbs distribution at β), then FPRAS using O*(n) samples.

50 Application — independent sets: O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time = O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|²).

51 Other applications (total running time): matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

52 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More… Outline

53 easy = hot hard = cold

54 Hamiltonian: H assigns each configuration an energy level (example levels shown: 0, 1, 2, 4).

55 H : Ω → {0,...,n}. Big set = Ω. Goal: estimate |H⁻¹(0)|. |H⁻¹(0)| = E[X_1] ... E[X_t].

56 Distributions between hot and cold (Gibbs distributions): μ_β(x) ∝ exp(−H(x)β), β = inverse temperature. β = 0 (hot): uniform on Ω. β = ∞ (cold): uniform on H⁻¹(0).

57 μ_β(x) = exp(−H(x)β)/Z(β). Normalizing factor = partition function: Z(β) = Σ_{x∈Ω} exp(−H(x)β).

58 Partition function: Z(β) = Σ_{x∈Ω} exp(−H(x)β). Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.

59 Partition function — example: Z(β) = Σ_{x∈Ω} exp(−H(x)β); have Z(0) = |Ω|, want Z(∞) = |H⁻¹(0)|. With energy levels 0, 1, 2, 4: Z(β) = 1·e^{−4β} + 4·e^{−2β} + 4·e^{−β} + 7·e^{−0·β}; Z(0) = 16, Z(∞) = 7.
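The slide's toy partition function can be evaluated directly (a sketch; the level counts a_k = 7, 4, 4, 1 are read off the slide's formula):

```python
import math

# Level counts a_k = |H^{-1}(k)| for the slide's example.
a = {0: 7, 1: 4, 2: 4, 4: 1}

def Z(beta):
    # Partition function Z(beta) = sum_k a_k e^{-beta k}.
    return sum(cnt * math.exp(-beta * k) for k, cnt in a.items())

print(Z(0.0))          # 16.0 = |Omega|, the hot limit
print(round(Z(50.0)))  # 7 = |H^{-1}(0)|, the cold limit
```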

60 μ_β(x) = exp(−H(x)β)/Z(β). Assumption: we have a sampler oracle for μ_β: given graph G and β, it outputs a subset of V distributed according to μ_β.

61 μ_β(x) = exp(−H(x)β)/Z(β). Assumption: we have a sampler oracle for μ_β: W ← μ_β.

62 μ_β(x) = exp(−H(x)β)/Z(β). Assumption: we have a sampler oracle for μ_β: W ← μ_β, X = exp(H(W)(β − β')).

63 μ_β(x) = exp(−H(x)β)/Z(β). Assumption: we have a sampler oracle for μ_β. With W ← μ_β and X = exp(H(W)(β − β')) we can obtain the following ratio: E[X] = Σ_{s∈Ω} μ_β(s) X(s) = Z(β')/Z(β).
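The identity E[X] = Z(β')/Z(β) can be verified by exact summation on a toy Hamiltonian (a sketch; the level counts are the earlier example's):

```python
import math

a = {0: 7, 1: 4, 2: 4, 4: 1}   # toy level counts from the earlier example

def Z(b):
    return sum(c * math.exp(-b * k) for k, c in a.items())

beta, beta2 = 0.5, 0.8
# E_{mu_beta}[ exp(H(W)(beta - beta2)) ], computed exactly over the levels:
E_X = sum(c * math.exp(-beta * k) / Z(beta) * math.exp(k * (beta - beta2))
          for k, c in a.items())
print(abs(E_X - Z(beta2) / Z(beta)) < 1e-12)  # True: E[X] = Z(beta')/Z(beta)
```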

64 Our goal restated. Partition function: Z(β) = Σ_{x∈Ω} exp(−H(x)β). Goal: estimate Z(∞) = |H⁻¹(0)|. Z(∞) = Z(0) · [Z(β_1)/Z(β_0)] · [Z(β_2)/Z(β_1)] ··· [Z(β_t)/Z(β_{t−1})], where β_0 = 0 < β_1 < β_2 < ... < β_t = ∞.

65 Our goal restated. Z(∞) = Z(0) · [Z(β_1)/Z(β_0)] ··· [Z(β_t)/Z(β_{t−1})]. Cooling schedule: β_0 = 0 < β_1 < β_2 < ... < β_t = ∞, E[X_i] = Z(β_i)/Z(β_{i−1}). How to choose the cooling schedule? Minimize its length, while satisfying V[X_i]/E[X_i]² ≤ O(1).

66 Our goal restated. Z(∞) = Z(0) · [Z(β_1)/Z(β_0)] ··· [Z(β_t)/Z(β_{t−1})]. Cooling schedule: β_0 = 0 < β_1 < β_2 < ... < β_t = ∞, E[X_i] = Z(β_i)/Z(β_{i−1}). How to choose the cooling schedule? Minimize its length, while satisfying V[X_i]/E[X_i]² ≤ O(1).

67 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More... Outline

68 Parameters: A and n. H : Ω → {0,...,n}, Z(0) = A. Z(β) = Σ_{x∈Ω} exp(−H(x)β) = Σ_{k=0}^{n} a_k e^{−βk}, where a_k = |H⁻¹(k)|.

69 Parameters: Z(0) = A, H : Ω → {0,...,n}. independent sets: A = 2^|V|, n = |E|; matchings: A ≤ |V|!, n = |V|; perfect matchings: A ≤ |V|!, n = |V|; k-colorings: A = k^|V|, n = |E|.

70 Parameters: Z(0) = A, H : Ω → {0,...,n}. matchings = # ways of marrying them so that there is no unhappy couple.

71 Parameters: Z(0) = A, H : Ω → {0,...,n}. matchings = # ways of marrying them so that there is no unhappy couple.

72 Parameters: Z(0) = A, H : Ω → {0,...,n}. matchings = # ways of marrying them so that there is no unhappy couple.

73 Parameters: Z(0) = A, H : Ω → {0,...,n}. Marry ignoring "compatibility"; Hamiltonian = number of unhappy couples.

74 Parameters: Z(0) = A, H : Ω → {0,...,n}. independent sets: A = 2^|V|, n = |E|; matchings: A ≤ |V|!, n = |V|; perfect matchings: A ≤ |V|!, n = |V|; k-colorings: A = k^|V|, n = |E|.

75 Previous cooling schedules (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). Z(0) = A, H : Ω → {0,...,n}. "Safe steps": β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞. Cooling schedules β_0 = 0 < β_1 < β_2 < ... < β_t = ∞ of length O(n ln A) and O((ln n)(ln A)).

76 Previous cooling schedules (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). Z(0) = A, H : Ω → {0,...,n}. "Safe steps": β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞. Cooling schedules β_0 = 0 < β_1 < β_2 < ... < β_t = ∞ of length O(n ln A) and O((ln n)(ln A)).

77 "Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): the step β → β + 1/n. Z(β) = Σ_{k=0}^{n} a_k e^{−βk}; with W ← μ_β and X = exp(H(W)(β − β')): 1/e ≤ X ≤ 1, hence E[X] ≥ 1/e and V[X]/E[X]² ≤ e.

78 "Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): the step ln A → ∞. Z(∞) = a_0 ≥ 1 and Z(ln A) ≤ a_0 + 1, so E[X] = Z(∞)/Z(ln A) ≥ 1/2.

79 "Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): the step β → β(1 + 1/ln A). With W ← μ_β and X = exp(H(W)(β − β')): E[X] ≥ 1/(2e).

80 Previous cooling schedules (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): "Safe steps" β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞ give cooling schedules of length O(n ln A) and O((ln n)(ln A)): 1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A.
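The additive part of this schedule can be sketched as follows (an illustration, not the BŠVV implementation; the parameters n and A are arbitrary):

```python
import math

def safe_schedule(n, A):
    # "Safe steps" sketch: additive steps of 1/n from 0 up to ln A, giving
    # length O(n ln A); each step keeps 1/e <= X <= 1, so every ratio is
    # easy to estimate.  (The full schedule also interleaves multiplicative
    # steps b -> b(1 + 1/ln A) to reach length O((ln n)(ln A)).)
    lnA = math.log(A)
    betas, b = [0.0], 0.0
    while b < lnA:
        b += 1.0 / n
        betas.append(min(b, lnA))
    return betas

s = safe_schedule(n=10, A=2 ** 20)
print(len(s))  # roughly n * ln A + 1 entries
```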

81 No better fixed schedule possible. THEOREM: For Z_a(β) = (A/(1+a)) · (1 + a e^{−βn}) with a ∈ [0, A−1], a schedule that works for all such Z_a has LENGTH ≥ Ω((ln n)(ln A)).

82 Parameters: Z(0) = A, H : Ω → {0,...,n}. Previously: non-adaptive schedules of length Θ*(ln A). Our main result: adaptive schedules of length O*((ln A)^{1/2}).

83 Related work. Our result: adaptive schedules of length O*((ln A)^{1/2}). Lovász-Vempala: volume of convex bodies in O*(n⁴), with a schedule of length O(n^{1/2}) (a non-adaptive cooling schedule, using specific properties of the "volume" partition functions).

84 Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).

85 Cooling schedule (definition refresh). Z(∞) = Z(0) · [Z(β_1)/Z(β_0)] ··· [Z(β_t)/Z(β_{t−1})]. Cooling schedule: β_0 = 0 < β_1 < β_2 < ... < β_t = ∞, E[X_i] = Z(β_i)/Z(β_{i−1}). How to choose the cooling schedule? Minimize its length, while satisfying V[X_i]/E[X_i]² ≤ O(1).

86 Express the SCV using the partition function (going from β to β'): W ← μ_β, X = exp(H(W)(β − β')); E[X] = Z(β')/Z(β), and V[X]/E[X]² + 1 = E[X²]/E[X]² = Z(2β' − β) Z(β) / Z(β')² ≤ C.

87 Proof sketch with f(β) = ln Z(β): E[X²]/E[X]² = Z(2β' − β) Z(β) / Z(β')² ≤ C becomes (f(2β' − β) + f(β))/2 ≤ (ln C)/2 + f(β'); with C' = (ln C)/2, the chord of the graph of f between β and 2β' − β lies at most C' above f(β').

88 Properties of partition functions: f(β) = ln Z(β); f is decreasing, f is convex, f'(0) ≥ −n, f(0) ≤ ln A.

89 Properties of partition functions: f(β) = ln Z(β) = ln Σ_{k=0}^{n} a_k e^{−βk}; using (ln Z)' = Z'/Z, f'(β) = −(Σ_{k=0}^{n} a_k k e^{−βk}) / (Σ_{k=0}^{n} a_k e^{−βk}). f is decreasing, f is convex, f'(0) ≥ −n, f(0) ≤ ln A.

90 f(β) = ln Z(β); f is decreasing, f is convex, f'(0) ≥ −n, f(0) ≤ ln A. GOAL: proving the Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}). Proof idea: on each step either f or ln |f'| changes a lot; let K := Δf · Δ(ln |f'|) ≥ 1 per segment.

91 Proof (f is convex): let K := Δf · Δ(ln |f'|) ≥ 1. Take a < b, c := (a+b)/2, Δ := b − a, and suppose f(c) = (f(a) + f(b))/2 − 1. By convexity, (f(a) − f(c))/Δ ≤ |f'(a)| and (f(c) − f(b))/Δ ≥ |f'(b)|.

92 Let K := Δf · Δ(ln |f'|) ≥ 1, c := (a+b)/2, Δ := b − a, and f(c) = (f(a) + f(b))/2 − 1. Since f is convex, f'(b)/f'(a) ≤ 1 − 1/Δf ≤ e^{−1/Δf}.

93 A convex decreasing f : [a,b] → R can be "approximated" using O( [ (f(a) − f(b)) · ln( f'(a)/f'(b) ) ]^{1/2} ) segments.

94 Technicality: getting to 2β' − β.

95 Technicality: getting to 2β' − β: intermediate inverse temperatures β_i, β_{i+1}.

96 Technicality: getting to 2β' − β: intermediate inverse temperatures β_i, β_{i+1}, β_{i+2}.

97 Technicality: getting to 2β' − β: intermediate inverse temperatures β_i, β_{i+1}, β_{i+2}, β_{i+3}; ln ln A extra steps.

98 Existential → Algorithmic: from "there exists an adaptive schedule of length O*((ln A)^{1/2})" to "we can get an adaptive schedule of length O*((ln A)^{1/2})".

99 Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(−H(x)β)/Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n). Total number of oracle calls ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ).

100 Algorithmic construction: current inverse temperature β. Ideally move to β' such that B_1 ≤ E[X] = Z(β')/Z(β) and E[X²]/E[X]² ≤ B_2.

101 Algorithmic construction: current inverse temperature β. Ideally move to β' such that B_1 ≤ E[X] = Z(β')/Z(β) and E[X²]/E[X]² ≤ B_2: then X is "easy to estimate".

102 Algorithmic construction: current inverse temperature β. Ideally move to β' such that B_1 ≤ E[X] = Z(β')/Z(β) (where B_1 < 1, so we make progress) and E[X²]/E[X]² ≤ B_2.

103 Algorithmic construction: current inverse temperature β. Ideally move to β' such that B_1 ≤ E[X] = Z(β')/Z(β) and E[X²]/E[X]² ≤ B_2; we need to construct a "feeler" for the latter ratio.

104 Algorithmic construction: need to construct a "feeler" for E[X²]/E[X]² = [Z(β)/Z(β')] · [Z(2β' − β)/Z(β')].

105 Algorithmic construction: E[X²]/E[X]² = [Z(β)/Z(β')] · [Z(2β' − β)/Z(β')] — on its own, a bad "feeler".

106 Estimator for Z(β')/Z(β): Z(β) = Σ_{k=0}^{n} a_k e^{−βk}; for W ← μ_β we have P(H(W)=k) = a_k e^{−βk}/Z(β).

107 Estimator for Z(β')/Z(β): for W ← μ_β, P(H(W)=k) = a_k e^{−βk}/Z(β); for U ← μ_{β'}, P(H(U)=k) = a_k e^{−β'k}/Z(β'). If H(X)=k is likely at both β and β' → estimator.

108 Estimator for Z(β')/Z(β): for W ← μ_β, P(H(W)=k) = a_k e^{−βk}/Z(β); for U ← μ_{β'}, P(H(U)=k) = a_k e^{−β'k}/Z(β'). If H(X)=k is likely at both β and β' → estimator.

109 Estimator for Z(β')/Z(β): for W ← μ_β, P(H(W)=k) = a_k e^{−βk}/Z(β); for U ← μ_{β'}, P(H(U)=k) = a_k e^{−β'k}/Z(β'). Hence [P(H(U)=k)/P(H(W)=k)] · e^{k(β'−β)} = Z(β)/Z(β').

110 Estimator for Z(β')/Z(β): [P(H(U)=k)/P(H(W)=k)] · e^{k(β'−β)} = Z(β)/Z(β'). PROBLEM: P(H(W)=k) can be too small.

111 Rough estimator for Z(β')/Z(β): use an interval instead of a single value. For W ← μ_β: P(H(W) ∈ [c,d]) = Σ_{k=c}^{d} a_k e^{−βk}/Z(β); for U ← μ_{β'}: P(H(U) ∈ [c,d]) = Σ_{k=c}^{d} a_k e^{−β'k}/Z(β').

112 Rough estimator: if |β − β'| · |d − c| ≤ 1 then e^{−1} ≤ Σ_{k=c}^{d} a_k e^{−β'(k−c)} / Σ_{k=c}^{d} a_k e^{−β(k−c)} ≤ e, so [P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d])] · e^{c(β'−β)} estimates Z(β)/Z(β') within a factor of e. We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.

113 Split {0,1,...,n} into h ≤ 4 (ln n)(ln A) intervals: [0], [1], [2], ..., [c, c(1+1/ln A)], .... We will show: for any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.

114 Split {0,1,...,n} into h ≤ 4 (ln n)(ln A) intervals: [0], [1], [2], ..., [c, c(1+1/ln A)], .... We will show: for any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.

115 Algorithm: repeat { find an interval I which is heavy for the current inverse temperature β; see how far I stays heavy (until some β*); use the interval I for the feeler [Z(β)/Z(β')] · [Z(2β' − β)/Z(β')] }. ANALYSIS: in each iteration we either * make progress, or * eliminate the interval I, or * make a "long move".

116 Distribution of H(X) where X ← μ_β: I = a heavy interval at β.

117 Distribution of H(X) where X ← μ_γ: I = a heavy interval at β may be no longer heavy at γ!

118 Distribution of H(X) where X ← μ_{β'}: I can be heavy again at β' — heavy at β, NOT heavy at γ, heavy at β'.

119 I = [a,b]: I is heavy at β, stays heavy up to β*, and is NOT heavy beyond β* + 1/(2n). Use binary search to find β*; the relevant step size is min{ 1/(b−a), ln A }.

120 I = [a,b]: I is heavy at β, stays heavy up to β*, NOT heavy beyond β* + 1/(2n); step size min{ 1/(b−a), ln A }. How do you know that you can use binary search?

121 How do you know that you can use binary search? Lemma: the set of inverse temperatures for which I is h-heavy is an interval. (I is h-heavy at β iff P(H(X) ∈ I) ≥ 1/(8h) for X ← μ_β, i.e., Σ_{k∈I} a_k e^{−βk} ≥ (1/8h) Σ_{k=0}^{n} a_k e^{−βk}.)

122 How do you know that you can use binary search? The condition Σ_{k∈I} a_k e^{−βk} ≥ (1/8h) Σ_{k=0}^{n} a_k e^{−βk} is a polynomial inequality c_0 x⁰ + c_1 x¹ + c_2 x² + ... + c_n x^n ≥ 0 in x = e^{−β}. Descartes' rule of signs: the number of positive roots is at most the number of sign changes.

123 Descartes' rule of signs, examples: −1 + x + x² + x³ + ... + x^n has one sign change, hence at most one positive root; likewise 1 + x − x²⁰.

124 How do you know that you can use binary search? For Σ_{k∈I} a_k e^{−βk} − (1/8h) Σ_{k=0}^{n} a_k e^{−βk}, the coefficient sequence in x = e^{−β} has sign pattern −...−, +...+, −...−: at most two sign changes, so by Descartes' rule of signs at most two positive roots, and the set of β where I is h-heavy is an interval.
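The Descartes argument can be illustrated with a small sign-change counter (a sketch, not the talk's code):

```python
def sign_changes(coeffs):
    # Descartes' rule of signs: the number of positive roots of
    # c0 + c1 x + ... + cn x^n is at most the number of sign changes
    # in the sequence of nonzero coefficients.
    signs = [1 if c > 0 else -1 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

# -1 + x + x^2 + ... + x^10: one sign change -> at most one positive root.
print(sign_changes([-1] + [1] * 10))        # 1
# Heaviness pattern -...- +...+ -...-: two changes -> at most two roots,
# so the set of beta where I is h-heavy is an interval.
print(sign_changes([-1, -1, 1, 1, 1, -1]))  # 2
```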

125 I = [a,b]: I is heavy on [β, β*], not heavy beyond β* + 1/(2n); we can roughly compute the ratio Z(β)/Z(β') for β' ∈ [β, β*], provided |β − β'|·|b − a| ≤ 1.

126 I = [a,b]: we can roughly compute Z(β)/Z(β') for β' ∈ [β, β*] if |β − β'|·|b − a| ≤ 1. Find the largest β' such that the feeler [Z(β)/Z(β')] · [Z(2β' − β)/Z(β')] ≤ C. Outcomes: 1. success, 2. eliminate interval, 3. long move.

127

128 If we have sampler oracles for μ_β then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

129 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More... Outline

130 6. More… a) proof of Dyer-Frieze b) independent sets revisited c) warm starts Outline

131 Appendix — proof of: Theorem (Dyer-Frieze '91): if 1) E[X_1 X_2 ... X_t] = "WANTED" and 2) the X_i are easy to estimate (V[X_i]/E[X_i]² = O(1)), then O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

132 How precise do the X_i have to be? First attempt — term by term: (1±ε/t)(1±ε/t)···(1±ε/t) ≈ 1±ε. By n ≥ (V[X]/E[X]²)·(1/ε'²)·ln(1/δ) with ε' = ε/t, each term needs Θ(t²/ε²) samples → Θ(t³/ε²) total.

133 How precise do the X_i have to be? Analyzing the SCV is better (Dyer-Frieze '91): P( X gives a (1±ε)-estimate ) ≥ 1 − V[X]/(ε² E[X]²) (the squared coefficient of variation, SCV). GOAL: SCV(X) ≤ ε²/4 for X = X_1 X_2 ... X_t.

134 How precise do the X_i have to be? Analyzing the SCV is better (Dyer-Frieze '91): SCV(X) = V[X]/E[X]² = E[X²]/E[X]² − 1, and SCV(X) = (1 + SCV(X_1)) ··· (1 + SCV(X_t)) − 1. Main idea: SCV(X_i) ≤ ε²/(5t) ⇒ SCV(X) < ε²/4.

135 Proof: X_1, X_2 independent ⇒ E[X_1 X_2] = E[X_1] E[X_2]; X_1, X_2 independent ⇒ X_1², X_2² independent; hence X_1, X_2 independent ⇒ SCV(X_1 X_2) = (1 + SCV(X_1))(1 + SCV(X_2)) − 1.
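The multiplicativity of 1 + SCV for independent factors can be verified exactly on tiny discrete distributions (a sketch; the two distributions are arbitrary illustrations):

```python
from itertools import product

# Two small independent distributions as (value, probability) pairs.
X1 = [(1.0, 0.5), (3.0, 0.5)]
X2 = [(2.0, 0.25), (4.0, 0.75)]

def scv(dist):
    # Squared coefficient of variation: E[X^2]/E[X]^2 - 1.
    m1 = sum(v * p for v, p in dist)
    m2 = sum(v * v * p for v, p in dist)
    return m2 / m1 ** 2 - 1

# Distribution of the product X1 * X2 under independence.
prod = [(v1 * v2, p1 * p2) for (v1, p1), (v2, p2) in product(X1, X2)]
lhs = scv(prod)
rhs = (1 + scv(X1)) * (1 + scv(X2)) - 1
print(abs(lhs - rhs) < 1e-12)  # True: 1 + SCV multiplies over independent factors
```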

136 How precise do the X_i have to be? (Dyer-Frieze '91) Analyzing the SCV is better: for X = X_1 X_2 ... X_t, SCV(X_i) ≤ ε²/(5t) ⇒ SCV(X) < ε²/4, so each term needs Θ(t/ε²) samples → Θ(t²/ε²) total.

137 6. More… a) proof of Dyer-Frieze b) independent sets revisited c) warm starts Outline

138 Hamiltonian (example energy levels: 0, 1, 2, 4).

139 Hamiltonian — many possibilities (hardcore lattice gas model): e.g., energy levels 0, 1, 2, ...

140 What would be a natural hamiltonian for planar graphs?

141 What would be a natural Hamiltonian for planar graphs? H(G) = number of edges. Natural MC: pick u, v uniformly at random; with probability λ/(1+λ) try G + {u,v}; with probability 1/(1+λ) try G − {u,v}.

142 Natural MC: pick u, v uniformly at random; with probability λ/(1+λ) try G + {u,v}; with probability 1/(1+λ) try G − {u,v}. Between G and G' = G + {u,v}: P(G, G') = λ / ((1+λ) · n(n−1)/2), P(G', G) = 1 / ((1+λ) · n(n−1)/2).

143 π(G) ∝ λ^{number of edges} (with λ = exp(−β)) satisfies the detailed balance condition π(G) P(G,G') = π(G') P(G',G).
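Detailed balance for this chain can be checked numerically (a sketch; λ and n are illustrative values, not from the talk):

```python
# Edge-flip chain: from G, pick a pair {u,v} uniformly among m = n(n-1)/2;
# add the edge with probability lam/(1+lam), delete it with probability
# 1/(1+lam).  Stationary weight: pi(G) proportional to lam^{#edges}.
lam = 0.7        # lam = exp(-beta); any positive value works
m = 6            # vertex pairs for n = 4

def pi(num_edges):
    return lam ** num_edges   # pi(G), up to normalization

for k in range(m):            # k = number of edges in G; flip one edge
    lhs = pi(k) * (1.0 / m) * lam / (1 + lam)      # pi(G)   P(G, G+e)
    rhs = pi(k + 1) * (1.0 / m) * 1.0 / (1 + lam)  # pi(G+e) P(G+e, G)
    assert abs(lhs - rhs) < 1e-12
print("detailed balance holds")
```

Each edge flip multiplies the weight by λ exactly when the forward/backward proposal probabilities differ by the same factor, which is why the chain converges to π.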

144 6. More… a) proof of Dyer-Frieze b) independent sets revisited c) warm starts Outline

145 Mixing time: τ_mix = smallest t such that |μ_t − π|_TV ≤ 1/e. Relaxation time: τ_rel = 1/(1 − λ_2). τ_rel ≤ τ_mix ≤ τ_rel · ln(1/π_min): e.g., Θ(n ln n) vs Θ(n) (the discrepancy may be substantially bigger for, e.g., matchings).

146 Estimating π(S). METHOD 1: take independent samples X_1, X_2, X_3, ..., X_s; set Y = 1 if X ∈ S and 0 otherwise, so E[Y] = π(S).

147 Estimating π(S). METHOD 1: independent samples X_1, X_2, X_3, ..., X_s (restart the chain for each). METHOD 2: a single run X_1, X_2, X_3, ..., X_s of the chain (Gillman '98, Kahale '96, ...).

148 Further speed-up (METHOD 2: Gillman '98, Kahale '96, ...): |μ_t − π|_TV ≤ exp(−t/τ_rel) · (Σ_x π(x)(μ_0(x)/π(x) − 1)²)^{1/2}. When the variance term Var_π(μ_0/π) is small, μ_0 is called a warm start.

149 Further speed-up (METHOD 2): |μ_t − π|_TV ≤ exp(−t/τ_rel) · (Σ_x π(x)(μ_0(x)/π(x) − 1)²)^{1/2}; small ⇒ warm start. A sample at β can be used as a warm start for β' ≈ β ⇒ the cooling schedule can step from β' to β.

150 A sample at β can be used as a warm start for β' ≈ β ⇒ the cooling schedule can step from β' to β. Keep "well mixed" states at β_0, β_1, β_2, β_3, ..., β_m, with m = O((ln n)(ln A)).

151 00 11 22 33 mm.... = “well mixed” states XsXs X1X1 X2X2 X3X3... XsXs METHOD 2 run the our cooling-schedule algorithm with METHOD 2 using “well mixed” states as starting points

152 00 11 kk Output of our algorithm: k=O * ( (ln A) 1/2 ) small augmentation (so that we can use sample from current  as a warm start at next) still O * ( (ln A) 1/2 ) 00 11 22 33 mm.... Use analogue of Frieze-Dyer for independent samples from vector variables with slightly dependent coordinates.

153 If we have sampler oracles for μ_β then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

