Markov Chains Mixing Times Lecture 5


Markov Chains Mixing Times – Lecture 5. Omer Zentner, 30.11.2016

Agenda
- Notes from previous lecture
- Mixing time
- Time reversal & the reversed chain
- Winning streak time reversal (example)
- A bit about time reversal in random walks on groups
- Eigenvalues (and the bounds on mixing time we get from them)
- Eigenvalues example

Notes from previous lecture. We have seen the definition of total variation distance ("how similar two distributions are"):
||P − Q||_TV = max_{A ⊆ Ω} |P(A) − Q(A)| = (1/2) Σ_{x ∈ Ω} |P(x) − Q(x)|
We used TV distance to define two distance measures:
- Distance from the stationary distribution: d(t) := max_{x ∈ Ω} ||P^t(x, ·) − π||_TV
- Distance between rows of the transition matrix: d̄(t) := max_{x, y ∈ Ω} ||P^t(x, ·) − P^t(y, ·)||_TV
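
A minimal numerical sketch of these quantities (not part of the lecture; assumes NumPy, and the helper names tv_distance, d, d_bar are our own):

```python
import numpy as np

def tv_distance(p, q):
    # ||p - q||_TV = (1/2) * sum_x |p(x) - q(x)|
    return 0.5 * np.abs(p - q).sum()

def d(P, pi, t):
    # d(t) = max_x || P^t(x, .) - pi ||_TV
    Pt = np.linalg.matrix_power(P, t)
    return max(tv_distance(Pt[x], pi) for x in range(P.shape[0]))

def d_bar(P, t):
    # d_bar(t) = max_{x,y} || P^t(x, .) - P^t(y, .) ||_TV
    Pt = np.linalg.matrix_power(P, t)
    m = P.shape[0]
    return max(tv_distance(Pt[x], Pt[y]) for x in range(m) for y in range(m))
```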

Notes from previous lecture. We have seen some properties of these distance measures:
- d(t) ≤ d̄(t) ≤ 2d(t)
- d̄(s + t) ≤ d̄(s) · d̄(t)
From these we got d(ct) ≤ d̄(ct) ≤ d̄(t)^c ≤ (2d(t))^c. We now continue with the definition of mixing time.

Mixing Time. Definitions:
- t_mix(ε) := min { t : d(t) ≤ ε }
- t_mix := t_mix(1/4)

Mixing Time. From d(ct) ≤ d̄(ct) ≤ d̄(t)^c and d(t) ≤ d̄(t) ≤ 2d(t) we get
d(l · t_mix(ε)) ≤ d̄(l · t_mix(ε)) ≤ d̄(t_mix(ε))^l ≤ (2ε)^l,
and with ε = 1/4: d(l · t_mix) ≤ 2^{−l}.
Note: the last inequality follows from Lemma 4.11 (the second inequality above, d̄(t) ≤ 2d(t)) together with the definition of t_mix(ε).
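
For small chains, t_mix(ε) can be evaluated directly from the definition; a sketch (our own helper, building on d above):

```python
def t_mix(P, pi, eps=0.25, t_max=100_000):
    # smallest t with d(t) <= eps; assumes the chain mixes within t_max steps
    for t in range(1, t_max + 1):
        if d(P, pi, t) <= eps:
            return t
    raise RuntimeError("d(t) did not drop below eps within t_max steps")
```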

Time Reversal. The time reversal of an irreducible Markov chain with transition matrix P and stationary distribution π is the chain with transition matrix
P̂(x, y) := π(y) P(y, x) / π(x).
Let (X_t) be an irreducible Markov chain with transition matrix P and stationary distribution π, and write (X̂_t) for the time-reversed chain with transition matrix P̂. Then:
- π is stationary for P̂
- For every x_0, …, x_t ∈ Ω: P_π(X_0 = x_0, …, X_t = x_t) = P_π(X̂_0 = x_t, …, X̂_t = x_0)
Notes regarding time reversal:
- If a chain is reversible, then its transition matrix equals its time reversal.
- That is not the case we are going to look at: we will look at a chain and its time-reversed chain, with its time-reversal matrix.
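
In matrix form the time reversal is easy to compute; a sketch (assumes NumPy arrays P and pi; time_reversal is our own name):

```python
def time_reversal(P, pi):
    # P_hat(x, y) = pi(y) * P(y, x) / pi(x)
    P_hat = (P.T * pi) / pi[:, None]
    assert np.allclose(P_hat.sum(axis=1), 1.0)  # rows sum to 1 because pi is stationary
    return P_hat
```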

Time Reversal. We have seen in lecture 2: if a transition matrix and the stationary distribution have the detailed balance property π(x) P(x, y) = π(y) P(y, x), then the chain is reversible. This means that the distribution of (X_0, X_1, …, X_n) is the same as the distribution of (X_n, X_{n−1}, …, X_0); this follows from the detailed balance property. When a chain is reversible, its time reversal is the chain itself: P̂ = P.

Time Reversal & reversed chains.
- Reversible chain example: random walk on an undirected graph.
- Non-reversible chain example: biased random walk on the n-cycle.

Example – Winning streak. Repeatedly toss a fair coin while keeping track of the length of the last run of heads. Memory is limited: we can only remember the last n results. (X_t) is a Markov chain with state space {0, …, n}, and the current state of the chain is min{n, length of the last run of heads}. For example: X_t = 2, X_{t+1} = 3 (a head extends the run), X_{t+2} = 0 (a tail resets it).

Example – Winning streak. The transition matrix is given by (non-zero transitions):
- P(i, 0) = 1/2 for 0 ≤ i ≤ n
- P(i, i+1) = 1/2 for 0 ≤ i < n
- P(n, n) = 1/2
- Can show drawing of P
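
A sketch of this transition matrix as a NumPy array (winning_streak_P is our own name):

```python
def winning_streak_P(n):
    # states {0, ..., n}: tails resets the streak to 0, heads extends it (capped at n)
    P = np.zeros((n + 1, n + 1))
    P[:, 0] = 0.5
    for i in range(n):
        P[i, i + 1] = 0.5
    P[n, n] = 0.5
    return P
```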

Example – Winning streak. The stationary distribution for P is
π(i) = 2^{−(i+1)} if i = 0, 1, …, n−1, and π(n) = 2^{−n}.
Can check:
- i = 0: (πP)(0) = (1/2) · (1/2 + 1/4 + … + 2^{−n} + 2^{−n}) = 1/2 = π(0)
- 0 < i < n: (πP)(i) = 0 + 0 + … + (1/2) · 2^{−i} + 0 + … + 0 = 2^{−(i+1)} = π(i)
- i = n: (πP)(n) = 0 + 0 + … + (1/2) · 2^{−n} + (1/2) · 2^{−n} = 2^{−n} = π(n)
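
The same check done numerically (a sketch; winning_streak_pi is our own name):

```python
def winning_streak_pi(n):
    # pi(i) = 2^{-(i+1)} for i < n, and pi(n) = 2^{-n}
    return np.array([2.0 ** -(i + 1) for i in range(n)] + [2.0 ** -n])

n = 5
P = winning_streak_P(n)
pi = winning_streak_pi(n)
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(pi @ P, pi)  # pi P = pi, so pi is stationary
```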

Example – Winning streak – Time reversal. The time reversal is (non-zero transitions):
- P̂(0, i) = π(i) for 0 ≤ i ≤ n
- P̂(i, i−1) = 1 for 0 < i < n
- P̂(n, n) = P̂(n, n−1) = 1/2
Can show time reversal calculations – straight from the definition. E.g. for P̂(0, i):
P̂(0, i) = π(i) P(i, 0) / π(0) = π(i) · (1/2) / (1/2) = π(i).
Example trajectory of the reversed chain: X̂_t = 0, X̂_{t+1} = 3, X̂_{t+2} = 2.
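
A quick check that the general formula reproduces these transitions (a sketch; uses time_reversal and the variables from the snippets above):

```python
P_hat = time_reversal(P, pi)
assert np.allclose(P_hat[0], pi)             # P_hat(0, i) = pi(i)
for i in range(1, n):
    assert np.isclose(P_hat[i, i - 1], 1.0)  # deterministic step down for 0 < i < n
assert np.isclose(P_hat[n, n], 0.5)
assert np.isclose(P_hat[n, n - 1], 0.5)
```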

Example – Winning streak time reversal. For the time reversal of the winning streak, after n steps the distribution is stationary, regardless of the initial distribution. Why?
- If X̂_t = 0, the distribution is stationary for all t' > t, since P̂(0, ·) = π.
- If X̂_0 = k < n, the deterministic downward transitions force X̂_k = 0, so the distribution is stationary for all t > k, and in particular at time n.

Example – Winning streak time reversal. If X̂_0 = n: the location of X̂_n depends on how much time the chain spends at n.
- For 0 < k < n: with probability 2^{−k} the chain holds at n for (k−1) steps and then moves on the k-th step. In this case X̂_k = n−1 and X̂_n = k−1 (since (n−1) − (n−k) = k−1).
- P̂^n(n, n) = 2^{−n}.
Altogether, P̂^n(n, ·) = π. If the initial distribution is not concentrated on a single state, the distribution at time n is a mixture of the distributions for each possible initial state, and is thus stationary.
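
Numerically (a sketch, continuing the example above), the reversed chain is exactly stationary after n steps from every starting state:

```python
P_hat_n = np.linalg.matrix_power(P_hat, n)
assert np.allclose(P_hat_n, pi)   # every row of P_hat^n equals pi
```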

Example – Winning streak time reversal. Note (lower bound): if the chain is started at n and leaves immediately, then at time n−1 it must be at state 1. Hence P̂^{n−1}(n, 1) = 1/2, and from the definition of total variation distance we get
d(n−1) ≥ P̂^{n−1}(n, 1) − π(1) = 1/2 − 1/4 = 1/4.
Conclusion: for the reversed winning-streak chain, t_mix(ε) = n for any positive ε < 1/4.
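
The lower bound can be checked the same way (a sketch; uses d from the earlier snippet):

```python
P_hat_prev = np.linalg.matrix_power(P_hat, n - 1)
assert np.isclose(P_hat_prev[n, 1], 0.5)       # started at n and left immediately
assert d(P_hat, pi, n - 1) >= 0.25 - 1e-12     # so d(n-1) >= 1/2 - 1/4 = 1/4
```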

Example – Winning streak – conclusion. Reversing a Markov chain can change the mixing time significantly. The mixing time of the (forward) winning-streak chain will be discussed in following lectures.

Reminder – random walk on a group. A random walk on a group G with increment distribution μ: Ω = G, μ is a distribution over Ω, and at each step we randomly choose g ∈ G according to μ and multiply by it; in other words, P(g, hg) = μ(h). We have seen that for such a chain the uniform distribution U is stationary (it assigns probability |G|^{−1} to each element, and will sometimes be denoted that way). We have also seen that if μ is symmetric (μ(g) = μ(g^{−1})), the chain is reversible and P = P̂.

Mixing Time and Time Reversal. Inverse distribution: if μ is a distribution on a group G, then μ̂ is defined by μ̂(g) := μ(g^{−1}) for all g ∈ G. If P is the transition matrix of a random walk on the group G with increment distribution μ, then the random walk with increment distribution μ̂ is the time reversal P̂. Even when μ is not symmetric, the forward and reverse walks are at the same distance from stationarity.
Notes: Need to explain second bullet?

Mixing Time and Time Reversal – Lemma 4.13. Let P be the transition matrix of a random walk on a group G with increment distribution μ, let P̂ be the transition matrix of the walk on G with increment distribution μ̂, and let π be the uniform distribution on G. Then for any t ≥ 0:
||P^t(id, ·) − π||_TV = ||P̂^t(id, ·) − π||_TV

Mixing Time and Time Reversal – Lemma 4.13, proof. Let (X_t) = (id, X_1, …) be a Markov chain with transition matrix P and initial state id. We can write X_k = g_k g_{k−1} ⋯ g_2 g_1, where g_1, g_2, … are random elements chosen independently according to μ. Let (Y_t) = (id, Y_1, …) be a Markov chain with transition matrix P̂ and initial state id, with increments h_1, h_2, …, h_k chosen independently according to μ̂, so Y_k = h_k ⋯ h_2 h_1.

Mixing Time and Time Reversal – Lemma 4.13, proof cont. For any fixed elements a_1, …, a_t ∈ G:
P(g_1 = a_1, …, g_t = a_t) = P(h_1 = a_t^{−1}, …, h_t = a_1^{−1})
(from the definition of μ̂: μ̂(g) = μ(g^{−1}) for all g ∈ G, and we are looking at the reversed chain). Summing over all strings such that a = a_t a_{t−1} ⋯ a_1 gives
P^t(id, a) = P̂^t(id, a^{−1}).
Hence
Σ_{a ∈ G} | P^t(id, a) − |G|^{−1} | = Σ_{a ∈ G} | P̂^t(id, a^{−1}) − |G|^{−1} | = Σ_{a ∈ G} | P̂^t(id, a) − |G|^{−1} |,
where |G|^{−1} is the value of the uniform distribution over G, and the probabilities above describe walks using the two different transition matrices.

Mixing Time and Time Reversal – Corollary. If t_mix is the mixing time of a random walk on a group and t̂_mix is the mixing time of the inverse walk, then t_mix = t̂_mix.
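
A sanity check of the lemma and corollary on a small non-symmetric walk (a sketch: the cyclic group with 7 elements, increment +1 with probability 3/4 and −1 with probability 1/4; element 0 plays the role of id, and the helpers are from the earlier snippets):

```python
m = 7
P_grp = np.zeros((m, m))
for g in range(m):
    P_grp[g, (g + 1) % m] += 0.75   # increment +1 with probability 3/4
    P_grp[g, (g - 1) % m] += 0.25   # increment -1 with probability 1/4
pi_unif = np.ones(m) / m
P_grp_hat = time_reversal(P_grp, pi_unif)   # the walk with the inverse increment distribution
for t in range(1, 20):
    lhs = tv_distance(np.linalg.matrix_power(P_grp, t)[0], pi_unif)
    rhs = tv_distance(np.linalg.matrix_power(P_grp_hat, t)[0], pi_unif)
    assert np.isclose(lhs, rhs)             # same distance from uniform at every t
```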

Eigenvalues. Spectral representation of a reversible transition matrix. Note: "Because we regard elements of ℝ^Ω as functions from Ω to ℝ, we will call eigenvectors of the matrix P eigenfunctions."

Eigenvalues. Define another inner product on ℝ^Ω:
⟨f, g⟩_π := Σ_{x ∈ Ω} f(x) g(x) π(x)

Eigenvalues – Lemma 12.2 (spectral representation). Let P be reversible with respect to π. Then:
- The inner-product space (ℝ^Ω, ⟨·,·⟩_π) has an orthonormal basis of real-valued eigenfunctions {f_j}_{j=1}^{|Ω|} of P with real eigenvalues {λ_j}; the eigenfunction corresponding to λ_1 = 1 can be taken to be the constant function 1.
- P^t has the representation P^t(x, y) / π(y) = Σ_{j=1}^{|Ω|} f_j(x) f_j(y) λ_j^t.

Lemma 12.2 – proof. Define a matrix A by A(x, y) := √(π(x)/π(y)) · P(x, y). A is symmetric: we assumed P is reversible with respect to π, so detailed balance gives π(x) P(x, y) = π(y) P(y, x), i.e. π(x)/π(y) = P(y, x)/P(x, y). So
A(x, y) = √(π(x)/π(y)) · P(x, y) = √(P(y, x)/P(x, y)) · P(x, y) = √(P(x, y) P(y, x)) = A(y, x).

Lemma 12.2 – proof cont. Written out, A is the |Ω| × |Ω| matrix with entries A(i, j) = √(π(i)/π(j)) · P(i, j). The Spectral Theorem for symmetric matrices guarantees that the inner-product space (ℝ^Ω, ⟨·,·⟩) has an orthonormal basis {φ_j}_{j=1}^{|Ω|} such that each φ_j is an eigenfunction of A with real eigenvalue λ_j.

Lemma 12.2 – proof cont. We can also see that √π (the function i ↦ √(π(i))) is an eigenfunction of A with eigenvalue 1:
(A√π)(i) = √(π(1)) · √(π(i)/π(1)) · P(i, 1) + √(π(2)) · √(π(i)/π(2)) · P(i, 2) + … = Σ_{j=1}^{n} P(i, j) · √(π(i)) = √(π(i)).

Lemma 12.2 – proof cont. We can decompose A as A = D_π^{1/2} P D_π^{−1/2}, where D_π is the diagonal matrix with D_π(x, x) = π(x): the left factor is diag(√π(1), …, √π(n)) and the right factor is diag(1/√π(1), …, 1/√π(n)).
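
A sketch of this symmetrization step in code (symmetrize is our own name; assumes NumPy as above):

```python
def symmetrize(P, pi):
    # A = D_pi^{1/2} P D_pi^{-1/2}, i.e. A(x, y) = sqrt(pi(x)/pi(y)) * P(x, y)
    s = np.sqrt(pi)
    return (P * s[:, None]) / s[None, :]
```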

Lemma 12.2 – proof cont. Let us define f_j := D_π^{−1/2} φ_j. We can see that f_j is an eigenfunction of P with eigenvalue λ_j:
P f_j = P D_π^{−1/2} φ_j = D_π^{−1/2} (D_π^{1/2} P D_π^{−1/2}) φ_j = D_π^{−1/2} A φ_j = D_π^{−1/2} λ_j φ_j = λ_j f_j.

Lemma 12.2 – proof cont. The functions {f_j}_{j=1}^{|Ω|} are orthonormal with respect to the inner product ⟨·,·⟩_π:
δ_{ij} = ⟨φ_i, φ_j⟩ = ⟨D_π^{1/2} f_i, D_π^{1/2} f_j⟩ = Σ_{x ∈ Ω} f_i(x) f_j(x) π(x) = ⟨f_i, f_j⟩_π.
So {f_j}_{j=1}^{|Ω|} is an orthonormal basis for (ℝ^Ω, ⟨·,·⟩_π) such that f_j is an eigenfunction with real eigenvalue λ_j.

Lemma 12.2 – proof cont. Now consider the function
δ_y(x) = 1 if x = y, and 0 if x ≠ y.
δ_y can be written as a vector in (ℝ^Ω, ⟨·,·⟩_π) via basis decomposition with {f_j}_{j=1}^{|Ω|}:
δ_y = Σ_{j=1}^{|Ω|} ⟨δ_y, f_j⟩_π f_j = Σ_{j=1}^{|Ω|} f_j(y) π(y) f_j.

Lemma 12.2 – proof cont. Since P^t f_j = λ_j^t f_j and P^t(x, y) = (P^t δ_y)(x), we get
P^t(x, y) = (P^t Σ_{j=1}^{|Ω|} f_j(y) π(y) f_j)(x) = Σ_{j=1}^{|Ω|} f_j(y) π(y) λ_j^t f_j(x).
Dividing both sides by π(y) completes the proof.
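
The whole construction can be verified numerically on any small reversible chain; a sketch using a walk on the 6-cycle with holding probability 1/3 (symmetric, hence reversible with uniform π), with the eigendecomposition computed by numpy.linalg.eigh and symmetrize from the earlier snippet:

```python
m = 6
C = np.roll(np.eye(m), 1, axis=1)        # cyclic shift matrix
P_rev = (np.eye(m) + C + C.T) / 3.0      # symmetric, stochastic, reversible w.r.t. uniform pi
pi_rev = np.ones(m) / m

A = symmetrize(P_rev, pi_rev)
lam, phi = np.linalg.eigh(A)             # columns of phi: orthonormal eigenfunctions of A
f = phi / np.sqrt(pi_rev)[:, None]       # f_j = D_pi^{-1/2} phi_j

t = 4
lhs = np.linalg.matrix_power(P_rev, t) / pi_rev[None, :]   # P^t(x, y) / pi(y)
rhs = (f * lam ** t) @ f.T                                 # sum_j f_j(x) lambda_j^t f_j(y)
assert np.allclose(lhs, rhs)
```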

Absolute spectral gap. For a reversible transition matrix we label the eigenvalues of P in decreasing order:
1 = λ_1 > λ_2 ≥ … ≥ λ_{|Ω|} ≥ −1.
Definition (λ_*): λ_* := max { |λ| : λ is an eigenvalue of P, λ ≠ 1 }.
Definition (absolute spectral gap γ_*): γ_* := 1 − λ_*.
From Lemma 12.1 we get that if P is aperiodic and irreducible, then −1 is not an eigenvalue of P, so γ_* > 0.
Side note: the spectral gap of a reversible chain is defined by γ := 1 − λ_2. If the chain is lazy, γ = γ_*.
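
In code (a sketch, continuing the cycle example above):

```python
lam_sorted = np.sort(lam)[::-1]                                # lambda_1 = 1 >= lambda_2 >= ...
lam_star = max(abs(l) for l in lam if not np.isclose(l, 1.0))
gamma_star = 1.0 - lam_star                                    # absolute spectral gap
gamma = 1.0 - lam_sorted[1]                                    # spectral gap
assert np.isclose(gamma, gamma_star)   # here lambda_2 is also the largest |lambda| != 1
```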

Relaxation time. Definition (relaxation time t_rel): t_rel := 1/γ_*. We will now bound a reversible chain's mixing time with respect to its relaxation time.

Theorem 12.3 (upper bound). Let P be the transition matrix of a reversible, irreducible Markov chain with state space Ω and stationary distribution π, and let π_min := min_{x ∈ Ω} π(x). Then
t_mix(ε) ≤ log(1/(ε · π_min)) · t_rel.

Theorem 12.4 (lower bound). For a reversible, irreducible, and aperiodic chain,
t_mix(ε) ≥ (t_rel − 1) · log(1/(2ε)).

Theorem 12.4 – proof. Suppose f is an eigenfunction of P with eigenvalue λ ≠ 1, i.e. Pf = λf. We have seen that eigenfunctions are orthogonal with respect to ⟨·,·⟩_π, and that the constant function 1 is an eigenfunction (with eigenvalue 1). So we have
⟨1, f⟩_π = Σ_{y ∈ Ω} f(y) π(y) = 0.

Theorem 12.4 – proof cont.
|λ^t f(x)| = |P^t f(x)| = | Σ_{y ∈ Ω} [P^t(x, y) f(y) − f(y) π(y)] | ≤ max_{y ∈ Ω} |f(y)| · Σ_{y ∈ Ω} |P^t(x, y) − π(y)| ≤ ||f||_∞ · 2d(t).
So, taking x such that |f(x)| = ||f||_∞, we get |λ|^t ≤ 2d(t). Using t = t_mix(ε): |λ|^{t_mix(ε)} ≤ 2ε.

Theorem 12.4 – proof cont. From |λ|^{t_mix(ε)} ≤ 2ε:
t_mix(ε) · (1/|λ| − 1) ≥ t_mix(ε) · log(1/|λ|) ≥ log(1/(2ε)),
i.e. t_mix(ε) · (1 − |λ|)/|λ| ≥ log(1/(2ε)).
(The first inequality uses log u ≤ u − 1 with u = 1/|λ|.)

Theorem 12.4 – proof cont. Rearranging:
t_mix(ε) ≥ log(1/(2ε)) · |λ|/(1 − |λ|) = log(1/(2ε)) · (1/(1 − |λ|) − 1).
Since t_rel = 1/γ_* = 1/(1 − λ_*), maximizing over the eigenvalues different from 1 gives
t_mix(ε) ≥ log(1/(2ε)) · (t_rel − 1).
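
Both bounds can be sanity-checked numerically on the small reversible chain from the earlier snippets (a sketch; the upper-bound check assumes the standard form of Theorem 12.3, t_mix(ε) ≤ log(1/(ε·π_min))·t_rel, and reuses t_mix, P_rev, pi_rev, gamma_star defined above):

```python
eps = 0.25
t_rel = 1.0 / gamma_star
t_eps = t_mix(P_rev, pi_rev, eps)
assert t_eps >= np.log(1.0 / (2 * eps)) * (t_rel - 1)        # Theorem 12.4 (lower bound)
assert t_eps <= np.log(1.0 / (eps * pi_rev.min())) * t_rel   # Theorem 12.3 (upper bound)
```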

Mixing time bound using eigenvalues – example. We have seen random walks on the n-cycle and random walks on groups. A random walk on the n-cycle can be viewed as a random walk on an n-element cyclic group. We will now use this interpretation to find the eigenvalues and eigenfunctions of that chain.

Mixing time bound using eigenvalues – example. Random walk on the cycle of n-th roots of unity. Let ω = e^{2πi/n}. Then W_n = {ω, ω², …, ω^{n−1}, 1} is the set of n-th roots of unity. Since ω^n = 1, we have ω^j ω^k = ω^{j+k} = ω^{(j+k) mod n}. Hence (W_n, ·) is a cyclic group of order n, generated by ω. Show drawing on board, or add drawing in here.

Random walk on the cycle of n-th roots of unity. We now consider the random walk on the n-cycle as the random walk on the multiplicative group W_n whose increment distribution is the uniform distribution over {ω, ω^{−1}}. As usual, let P denote the transition matrix of the walk. Show drawing on board, or add drawing in here.

Random walk on the cycle of n-th roots of unity. Now let us examine the eigenvalues of P. If f is an eigenfunction of P, then
λ f(ω^k) = P f(ω^k) = [f(ω^{k−1}) + f(ω^{k+1})] / 2, for 0 ≤ k ≤ n−1.
Show drawing on board, or add drawing in here.

Random walk on the cycle of n-th roots of unity. Let us look at {φ_j}_{j=0}^{n−1}, where we define φ_j(ω^k) := ω^{kj}, i.e.
φ_0 = (1, 1, …), φ_1 = (ω, ω², …), φ_2 = (ω², ω⁴, …), …
Each φ_j is an eigenfunction of P:
P φ_j(ω^k) = [φ_j(ω^{k−1}) + φ_j(ω^{k+1})] / 2 = [ω^{jk−j} + ω^{jk+j}] / 2 = ω^{jk} · (ω^j + ω^{−j}) / 2.
The eigenvalue of φ_j is (ω^j + ω^{−j}) / 2 = cos(2πj/n). Show drawing on board, or add drawing in here.
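
A numerical check that these are indeed the eigenvalues of the cycle walk (a sketch; the walk's transition matrix is symmetric, so numpy.linalg.eigvalsh applies):

```python
n_cyc = 9
P_cyc = np.zeros((n_cyc, n_cyc))
for k in range(n_cyc):
    P_cyc[k, (k + 1) % n_cyc] = 0.5
    P_cyc[k, (k - 1) % n_cyc] = 0.5
eigs = np.sort(np.linalg.eigvalsh(P_cyc))
expected = np.sort(np.cos(2 * np.pi * np.arange(n_cyc) / n_cyc))
assert np.allclose(eigs, expected)   # eigenvalues are cos(2*pi*j/n), j = 0..n-1
```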

Random walk on the cycle of n-th roots of unity. What is the geometric meaning of this? For any l, j, the average of ω^{l−j} and ω^{l+j} is a scalar multiple of ω^l. Since the chord connecting ω^{l+j} and ω^{l−j} is perpendicular to ω^l, the projection of ω^{l+j} onto ω^l has length cos(2πj/n). So φ_j(ω^k) := ω^{kj} is an eigenfunction with eigenvalue cos(2πj/n).

Random walk on the cycle of n-th roots of unity – spectral gap & relaxation time. Since φ_j(ω^k) := ω^{kj} is an eigenfunction with eigenvalue cos(2πj/n), we have λ_2 = cos(2π/n). Using cos(x) = 1 − x²/2! + x⁴/4! − …, we get
cos(2π/n) = 1 − 4π²/(2n²) + O(n^{−4}).
(The cosine decreases from cos(0) = 1 down to −1 at π, and climbs back to 1 at 2π.)

Random walk on the cycle of n-th roots of unity – spectral gap & relaxation time. So the spectral gap γ := 1 − λ_2 is of order n^{−2}, and the relaxation time t_rel = 1/γ_* is of order n². Note that when n is even the chain is periodic: taking j = n/2 gives cos(2π(n/2)/n) = cos(π) = −1 as an eigenvalue, so γ_* = 0.
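
Numerically (a sketch), the gap 1 − cos(2π/n) approaches 2π²/n² as n grows:

```python
for n_cyc in (8, 16, 32, 64, 128):
    gap = 1.0 - np.cos(2 * np.pi / n_cyc)      # spectral gap gamma of the cycle walk
    print(n_cyc, gap, 2 * np.pi ** 2 / n_cyc ** 2)   # the two columns agree up to O(n^-4)
```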

Questions? THE END

Thank you!