IEG4140 Teletraffic Engineering: Presentation transcript

1 IEG4140 Teletraffic Engineering Professor: S.-Y. Robert Li, Rm. HSH734 (inside 727), x8369. Tutor: Diana Q. Wang, Rm. HSH729. Visit the course website https://course.ie.cuhk.edu.hk/~ierg4140/ frequently to catch spontaneous announcements. Textbook: Lecture notes on the course website, under constant revision. Reference book: Chapters 4~8 of S. Ross, “Probability Models,” 8th ed., Academic Press.

2 Prof. Bob Li Assessment scheme: The grading emphasizes logical reasoning rather than numerical calculation: 10% homework, 30% mid-term exam, 60% final exam. Bonus for actively asking questions in the classroom.

3 Prof. Bob Li Lecture notes: Chap 1. Motivating puzzles, Markov chain, stopping time, and martingale. Chap 2. Poisson process. Chap 3. Continuous-time Markov chain. Chap 4. Renewal process. Chap 5. Queueing theory.

4 Prof. Bob Li Schedule:
Week 1 (Sep 8): Some puzzles in stochastic processes & Markov chain I
Week 2 (Sep 15): Markov chain II
Week 3 (Sep 22): Time-reversed Markov chains
Week 4 (Sep 29): Stopping time, Wald's equation & martingale
Week 5 (Oct 6): Exponential and Poisson distribution I
Week 6 (Oct 13): Exponential and Poisson distribution II
Week 7 (Oct 20): The Poisson process I
Week 8 (Oct 27): Midterm exam (tentative date)
Week 9 (Nov 3): The Poisson process II
Week 10 (Nov 10): Continuous-time Markov chain I
Week 11 (Nov 17): Continuous-time Markov chain II
Week 12 (Nov 24): Queueing theory I
Week 13 (Dec 1): Queueing theory II

5 Prof. Bob Li Lecture 1 Sep. 8, 2011

6 Prof. Bob Li Chapter 1. Introduction: motivating puzzles, Markov chain, stopping time, and martingale. 1.1 Some puzzles in stochastic processes. 1.2 Markov chain. 1.3 Time-reversed Markov chain. 1.4 Markov Decision Process. 1.5 Stopping Time. 1.6 Martingales.

7 Prof. Bob Li 1.1 Some puzzles in stochastic processes: pattern occurrence in repeated coin toss. P{Head} = P{Tail} = ½. In a very long binary sequence, all the 64 length-6 patterns appear equally frequently. HTHTTTTHTHHTHTTTHTTTTHTHHTHTTTHTHTHTTHTTHTHTTTHTHHTHTTHTHHTHHT HHTHTTHTHHTHHTHHTHTTTHTHHTTHHTTTHTTTTTHTHHTHHTHHTHTTHTHHTHTTTT HTHHTHTTTHTHHTHHTHHTTTHHTHHTHTHTTHTHTHHTHTHTTHTHHHHTHTHTTHTTH TTTHTHTHTHTHHTTTTHHTHTTHTHTHTHTTHHTHHTHTHTHTHHTHTTTHTTTHHTHTHT HHTTHHTHTTHTTHTTHTHHTHHTHHTHTTHTTHTHTHTHTTHTHTHTTHTHTHHTTHTTHT HTHTTHTHTHHHTHHTHTHTTHTTHTHHTHTTHTHHTHTHHTHTHTTHTTHTHTHTHTTHT HTHTHTTHTHTTTTHTHHTTHHHTTHHTTTHHHHTHHTHTHHHHHHTTHTHTHTHHTHHTH TTHHTHTHTTHHTHTHTTTHTHTHHTHTHHTTHTHHTTHHTTHHHTTHTTHTTHTTHHTHH THTTHTTHTTHTHTHHTHTHTTHTTHTHTTHTTHTTHTHHTHTTHTTTHTHTTHTHHTTH … So, the average waiting time for every length-6 pattern should be 64. Right?

8 Prof. Bob Li William Feller’s biblical textbook first raised these questions: Average waiting time for HHHHHH = 64? Average waiting time for HHTTHH = 64?

9 Prof. Bob Li William Feller’s biblical textbook. Average waiting time for HHHHHH = 64? Average waiting time for HHTTHH = 64? Through a long derivation by Markov chain, William Feller found these numbers counter-intuitive (the answers turn out to be 126 and 70, respectively, as the pattern-correlation formula at the end of this chapter confirms). Feller did not live to see his 2-volume book.

10 Prof. Bob Li Paradoxes in pattern occurrence. [The same long coin-toss sequence as on slide 7.] In the long run, the pattern HHTTHH occurs equally frequently as any other length-6 pattern, and hence it occurs after every 64 coin tosses on the average. In more precise terms, the renewal time, which is measured from one appearance of the pattern till the next, averages 64.

11 Intuition vs. misconception. For a length-6 binary pattern: The renewal time, measured from one appearance to the next, averages 64. The waiting time, measured from the very beginning of the coin-toss process, can be longer. Quick intuition defies scientific truth when it mixes up the two concepts. Let’s see how far quick intuition can stray from scientific truth. Prof. Bob Li

12 Race between 2 patterns On fair coin-toss, Average waiting for HTHH = 18 // A slightly faster pattern Average waiting for THTH = 20 // A slightly slower pattern In a race between HTHH vs. THTH, the odds are nothing like 10 : 9, but rather … Prof. Bob Li

13 Race between 2 patterns. On fair coin-toss, Average waiting for HTHH = 18 // A slightly faster pattern. Average waiting for THTH = 20 // A slightly slower pattern. In a race between HTHH vs. THTH, the odds are nothing like 10 : 9, but rather the landslide 5 : 9. Prof. Bob Li

14 On fair coin-toss, Average waiting for HTHH = 18 // A slightly faster pattern. Average waiting for THTH = 20 // A slightly slower pattern. In a race between HTHH vs. THTH, the odds are the landslide 5 : 9 in favor of THTH. Scientific truth and fairy tales happily agree with each other, while ordinary intuition is left out. [Slide images: “The Sting”; the fairy-tale race between the Hare and the Tortoise]

15 Later we’ll deal with this topic again by martingales. It’s a bit like 田忌賽馬 (Tian Ji’s horse racing): the Tortoise often wins by a neck, while the Hare’s win is at least a full length. Homework. Compute P(HH occurs before TTT) with a biased coin where P(Head) = 0.4.
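The race result is easy to sanity-check numerically. Below is a minimal Monte Carlo sketch (not part of the original slides; the trial count is an arbitrary choice):

```python
import random

def race(pat_a, pat_b, trials=100_000):
    """Estimate P(pat_a occurs before pat_b) in fair-coin tossing."""
    wins = 0
    for _ in range(trials):
        seq = ""
        while True:
            seq += random.choice("HT")
            if seq.endswith(pat_a):
                wins += 1
                break
            if seq.endswith(pat_b):
                break
    return wins / trials

# The race of the slides: THTH should win with probability 9/14 ≈ 0.643.
print(race("THTH", "HTHH"))
```

For the homework's biased coin, replace random.choice("HT") with random.choices("HT", weights=[0.4, 0.6])[0] and race "HH" against "TTT".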

16 An imaginary casino. Pay $100 to enter the game: Repeatedly toss a fair coin until a Head shows up. Receive $2^n back if Head occurs on the n-th toss. E[Return] = Σ_{n≥1} 2^n (½)^n = Σ_{n≥1} 1 = $∞ // E[2^N] can be ∞, even when P(N < ∞) = 1. Q. Is this $∞ vs. $100 a huge advantage to the gambler in practice?
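A simulation makes the paradox tangible: every single game ends in finite time, yet the sample mean of the returns never settles. A rough sketch (the play counts are assumptions, not from the slides):

```python
import random

def st_petersburg():
    """One play: toss until the first Head; receive $2^n for Head on toss n."""
    n = 1
    while random.random() < 0.5:  # Tail; keep tossing
        n += 1
    return 2 ** n

for plays in (10**2, 10**4, 10**6):
    avg = sum(st_petersburg() for _ in range(plays)) / plays
    print(plays, avg)   # the running average keeps creeping upward
```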

17 Symmetric random walk. Abel & Billy gamble $1 on each toss of a fair coin. This is a zero-sum game. The net winning of Abel is a symmetric random walk. P{Abel nets +$1 sooner or later, i.e., time till +$1 is finite} = 1? The moral. Despite the 100% probability for the occurrence of this event in finite time, that finite time is unbounded and, in fact, has mean value = ∞. On the average, it takes ∞ tosses to achieve the net gain of $1. The average net gain per toss is $1/∞ = $0, as it should be in a fair gamble.

18 Symmetric random walk (cont’d) Abel & Billy gamble $1 on each toss of a fair coin. This is a zero-sum game. The net winning of Abel is a symmetric random walk. P{Time till Abel nets +$1 is finite} = 1 ? P{Time till Abel nets +$1000 is finite} = 1 ? P{Time till Billy nets +$1000 is finite} = 1 ? Answer: Yes to all, even though it is a zero-sum game.

19 Prof. Bob Li 2-dimensional random walk. The street map of Manhattan is an infinite checker board. A drunkard starts a symmetric 2-dimensional random walk at the street corner outside the pub; at every corner, each of the four directions is taken with probability 1/4. Q. P{Return to the origin within finite time} = 1? Answer: Yes.

20 Prof. Bob Li 3- or higher-dim random walk. Consider an infinite 3-dimensional checker board. At every cross point, the six directions are equally likely: east, west, north, south, up, and down. Q1. P{Return to the origin within finite time} = 1? Answer: No. Q2. How about 4-dim random walk?
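A bounded simulation illustrates the dimensional contrast, though it cannot by itself prove recurrence or transience (a sketch with arbitrary step and trial counts):

```python
import random

def return_rate(dim, steps=20_000, trials=2_000):
    """Fraction of symmetric dim-dimensional walks that revisit the
    origin within `steps` steps."""
    returned = 0
    for _ in range(trials):
        pos = [0] * dim
        for _ in range(steps):
            axis = random.randrange(dim)
            pos[axis] += random.choice((-1, 1))
            if not any(pos):        # back at the origin
                returned += 1
                break
    return returned / trials

print(return_rate(2))   # creeps toward 1 as `steps` grows (recurrent)
print(return_rate(3))   # stays well below 1 (transient; the true return
                        # probability is about 0.34)
```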

21 Prof. Bob Li Progressive gambler. On casino roulette, a gambler keeps betting on "odd", where P(Odd) = P({1, 3,..., 35}) = 18/38 and P(Not odd) = P({0, 00, 2, 4,..., 36}) = 20/38. A gambler starts with a $1 bet. Whenever he loses, he doubles the bet on the next spin. This process continues until he wins, at which time he nets +$1 regardless of how long the process takes. Q. Is this a sure way to win $1 from the casino? Answer: It is a sure way to enrich the casino.
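To see why, simulate the scheme with a finite bankroll: the rare streak of losses that busts the gambler outweighs the frequent $1 wins. (A sketch; the $1000 bankroll is an assumption for illustration.)

```python
import random

def double_until_win(bankroll=1000):
    """Bet $1 on Odd (P = 18/38), doubling after each loss, until a win
    or until the next doubled bet can no longer be covered."""
    net, bet = 0, 1
    while bet <= bankroll + net:        # bankroll + net = money still in hand
        if random.random() < 18 / 38:   # win: recover all losses plus $1
            return net + bet
        net -= bet
        bet *= 2
    return net                          # busted before recouping

trials = 100_000
print(sum(double_until_win() for _ in range(trials)) / trials)  # negative
```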

22 A board game. Each move advances the token by a number of steps determined by rolling a die. P{Square #1 will be landed on} = 1/6. P{Square #2 will be landed on} = 1/6 + 1/36 = 7/36. Q. Which square has the highest probability to be ever landed on?

23 A board game (cont’d). Let p_n = probability of ever landing on square n. By conditioning on the first roll, p_n = Σ_{1≤k≤6} P(1st roll = k) · P(ever landing on n | 1st roll = k) = Σ_{1≤k≤6} P(ever landing on n | 1st roll = k) / 6 = (p_{n−1} + … + p_{n−6}) / 6. // P(ever landing on n | 1st roll = k) = p_{n−k}. That is, every probability p_n, n ≥ 1, is the average of its six immediate predecessors. The recursive formula is a 6th-order difference equation. There are 6 boundary conditions: p_0 = 1, p_{−1} = p_{−2} = p_{−3} = p_{−4} = p_{−5} = 0. In this course, we shall solve difference equations quite often.

24 A board game (cont’d). Values of p_n are tabulated below (p_n = 7^{n−1}/6^n for 1 ≤ n ≤ 6):
n = −5 … −1: p_n = 0;  n = 0: p_0 = 1
n = 1: 1/6 ≈ .1667;  n = 2: 7/6² ≈ .1944;  n = 3: 7²/6³ ≈ .2269
n = 4: 7³/6⁴ ≈ .2647;  n = 5: 7⁴/6⁵ ≈ .3088;  n = 6: 7⁵/6⁶ ≈ .3602
Beyond n = 6 the values oscillate with shrinking swings (e.g., p_11 ≈ .2932, p_12 ≈ .2906), approaching 2/7 ≈ .2857 as n → ∞.
The largest among p_1 to p_6 is p_6. Because every probability mass p_n is the average of its six immediate predecessors, it is an easy induction to find p_n ≤ p_6 for all n.
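The recursion is trivial to iterate, which is one way to reproduce the table above (a sketch, not from the slides):

```python
# p_n = (p_{n-1} + ... + p_{n-6}) / 6, with p_0 = 1 and p_{-5..-1} = 0.
p = [0, 0, 0, 0, 0, 1]            # p_{-5}, ..., p_{-1}, p_0
for n in range(1, 31):
    p.append(sum(p[-6:]) / 6)
    print(n, round(p[-1], 4))     # rises to p_6 ≈ .3602, then settles to 2/7
```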

25 Prof. Bob Li A board game (cont’d). Homework. Show that p_13 ≤ p_n ≤ p_11 for all n > 13. Homework. If we roll two dice instead of just one, which square has the highest probability to be landed on? Hint: Find the largest among p_1 to p_12. Remark. This board game can be modeled as a Markov chain in the obvious way. Conditioning on the first roll is the same as a Markov transition, which is also the same as multiplication by the transition matrix.

26 Prof. Bob Li 1.2 Markov chains. Definition. A sequence of discrete r.v. X_0, X_1, X_2, ... is called a Markov chain if there exists a matrix Π = [P_ij] such that P(X_{t+1} = j | X_t = i, X_{t−1} = i_{t−1}, …, X_0 = i_0) = P_ij for all t, all states i, j, and all histories i_0, …, i_{t−1}. // P(X_{t+1} = j | X_t = i) is time-invariant. // Memoryless: Only the newest knowledge X_t counts.

27 Memoryless and time-invariant. Remarks. Any possible value of any X_t is called a state of the Markov chain. When there are k (≤ ∞) states in the Markov chain, the matrix Π is square (k by k) and is called the transition matrix of the Markov chain. P_ij is called the transition probability from state i to state j. Definition (repeated). A sequence of discrete r.v. X_0, X_1, X_2, ... is called a Markov chain if there exists a matrix Π = [P_ij] such that P(X_{t+1} = j | X_t = i, …, X_0 = i_0) = P_ij. // P(X_{t+1} = j | X_t = i) is time-invariant. // Memoryless: Only the newest knowledge X_t counts.

28 Prof. Bob Li Example of 3-state weather model. A simple 3-state model of weather assumes that the weather on any day depends only upon the weather of the preceding day. This model can be described as a 3-state Markov chain whose transition matrix has rows indexed by today’s weather (From: sunny, rainy, cloudy) and columns by tomorrow’s (To: sunny, rainy, cloudy); the entries and the corresponding transition graph (or transition diagram) are shown on the slide.

29 Prof. Bob Li Every row sum = 1 in transition matrix. The transition equation P_ij = P(X_{t+1} = j | X_t = i) says that the i-th row of the transition matrix is the conditional distribution of X_{t+1} given that X_t = i. Hence every row sum must be 1. In contrast, a column sum is in general not 1.

30 Free 1-dimensional random walk. At each game, the gambler wins $1 with the probability p and loses $1 with the probability q = 1 − p. Let X_t be the total net winning after t games. Thus the states are integers, and from any state i: P_{i,i+1} = p, P_{i,i−1} = q, and P_ij = 0 when j ≠ i ± 1. Modeling by a Markov chain, the transition matrix and transition graph are as shown on the slide.

31 Prof. Bob Li Random walk with an absorbing state. In the same gamble as before, the gambler starts with $2. He has to stop gambling if the net winning reaches $−2. In other words, P_{−2,−2} = 1 and P_{−2,j} = 0 for all j ≠ −2. The transition matrix and diagram for the Markov chain are as shown on the slide (each non-absorbing state i still has P_{i,i+1} = p and P_{i,i−1} = q).

32 Independence of Partial Memory. P(X_3 = j | X_2 = i, X_1 = i_1, X_0 = i_0) = P_ij. To calculate the probabilities for X_3, if we know X_2, then we can throw away the knowledge about both X_1 and X_0. Example. P(X_3 = j | X_2 = i, X_0 = i_0) = P_ij // Only the newest knowledge counts. Proof. Conditioning on X_1, P(X_3 = j | X_2 = i, X_0 = i_0) = Σ_{i_1} P(X_1 = i_1 | X_2 = i, X_0 = i_0) · P(X_3 = j | X_2 = i, X_1 = i_1, X_0 = i_0) = Σ_{i_1} P(X_1 = i_1 | X_2 = i, X_0 = i_0) · P_ij = P_ij. Problem. Prove that P(X_3 = j | X_2 = i, X_1 = i_1) = P_ij by conditioning on X_0.

33 Prof. Bob Li Independence of Partial Memory (cont’d). Example. P(X_3 = j | X_1 = i, X_0 = i_0) = ? Answer. Conditioning on X_2, P(X_3 = j | X_1 = i, X_0 = i_0) = Σ_k P(X_2 = k | X_1 = i) · P(X_3 = j | X_2 = k) = Σ_k P_ik P_kj = the ij-th entry in Π² = P(X_3 = j | X_1 = i). // Special case of the Chapman-Kolmogorov equation below. Conclusion: Only the newest knowledge counts.

34 Prof. Bob Li Transition Equation in Matrix Form. Conditioning on X_t, we have P(X_{t+1} = j) = Σ_i P(X_t = i) P_ij, where the summation is over all states i. In the matrix form, this becomes V_{t+1} = V_t Π, where the row vector V_t denotes the distribution of X_t. // Multiplication by Π = a transition.

35 Prof. Bob Li Transition Equation in Matrix Form. Weather example. The notation of the row vector in this example is simply V_t = (P(sunny on day t)  P(rainy on day t)  P(cloudy on day t)), and the transition equation in the matrix form becomes V_{t+1} = V_t Π.

36 Chapman-Kolmogorov Equation. Now we iterate the transition equation starting with V_t. V_{t+1} = V_t Π // Multiplication with Π represents a transition. V_{t+2} = V_{t+1} Π = (V_t Π) Π = V_t Π². By induction on n, we arrive at the following Chapman-Kolmogorov Equation. The matrix Π^n gives the n-step transitions. That is, V_{t+n} = V_t Π^n. Note. By conditioning on X_t, P(X_{t+n} = j | X_t = i) = the ij-th entry in the matrix Π^n. // For all t
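The equation is easy to verify numerically; here is a small check with a made-up 3-state transition matrix (the entries are arbitrary, not the weather model's):

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],      # rows sum to 1
              [0.3, 0.4, 0.3],
              [0.2, 0.5, 0.3]])
v = v0 = np.array([1.0, 0.0, 0.0])  # start deterministically in state 0

n = 5
for _ in range(n):                  # n one-step transitions ...
    v = v @ P
print(v)
print(v0 @ np.linalg.matrix_power(P, n))  # ... equal one n-step transition
```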

37 Prof. Bob Li Prisoner in Dark Cell. Hole A leads to freedom through a tunnel of a 3-hour journey. Holes B and C are at the two ends of a tunnel of a 2-hour journey. Whenever the prisoner returns to the dark cell, he would immediately enter one of the three holes by a random choice. Q. When can the prisoner get out? Q: What should be the "states" in the model of a Markov chain?

38 Prisoner in Dark Cell (cont’d). Label the “hour-points” inside the tunnels (the points reached after each hour of travel) as F, R, & T. Use these points, together with ‘cell’ and ‘freedom’, as states. The transition matrix and transition graph are shown on the slide.

39 As before let V_t denote the distribution of X_t, with the states ordered (T, cell, F, R, freedom). Then,
V_0 = (0 1 0 0 0) // Initially the prisoner is in the cell.
V_1 = V_0 Π = (2/3 0 1/3 0 0)
V_2 = V_1 Π = (0 2/3 0 1/3 0) = V_0 Π²
V_3 = V_2 Π = (4/9 0 2/9 0 1/3) = V_0 Π³ // 1/3 = P(Free after 3 hours)
V_4 = V_3 Π = (0 4/9 0 2/9 1/3) = V_0 Π⁴
V_5 = V_4 Π = (8/27 0 4/27 0 5/9) = V_0 Π⁵ // 5/9 = P(Free after 5 hours)
V_6 = V_5 Π = (0 8/27 0 4/27 5/9) = V_0 Π⁶
...
V_∞ = V_∞ Π = (0 0 0 0 1) = V_0 Π^∞ // 1 = P(Free eventually)
Note that V_0 Π^∞ means the second row in the matrix Π^∞. In fact, even with any other V_0 (i.e., any distribution of the initial state), we would still have V_∞ = V_∞ Π = (0 0 0 0 1) = V_0 Π^∞. Thus every row in Π^∞ is (0 0 0 0 1).

40 Prof. Bob Li Lecture 2 Sep. 15, 2011

41 Slotted Aloha multi-access protocol. In every timeslot, transmission is successful when there is a unique node actively transmitting. A transmission attempt can be either a new packet or a retransmission. Every backlogged node in every timeslot reattempts transmission with a probability p. The time till reattempt is Geometric₁(p): P(time = t) = (1 − p)^{t−1} p for all t ≥ 1. Note. Geometric₀(p): P(time = t) = (1 − p)^t p for all t ≥ 0. [Figure: Nodes A, B, C sharing the channel]

42 Markov model of Slotted Aloha. Markov model. Using the number of backlogged nodes as the state of the system, a transition from state k is always to some state j ≥ k−1. Assumption for the convenience of analysis. The probability a_i that i new arriving packets intend transmission in a timeslot is fixed and is independent of the state. Every backlogged node in every timeslot reattempts transmission with an independent probability p.

43 Markov model of Slotted Aloha. Markov model. Using the number of backlogged nodes as the state of the system, a transition from state k is always to some state j ≥ k−1. Assumption for the convenience of analysis. The probability a_i that i new arriving packets intend transmission in a timeslot is fixed and is independent of the state. The probability b_i that i of the k backlogged nodes attempt retransmission depends on the state k binomially: b_i = C(k, i) p^i (1 − p)^{k−i}.

44 Prof. Bob Li Markov model of Slotted Aloha. Markov model. Using the number of backlogged nodes as the state of the system, a transition from state k is always to some state j ≥ k−1. The transition probabilities from state k are (see the sketch below):
P_{k,k−1} = a_0 b_1 // k > 0; 1 reattempt; no new
P_{k,k} = a_0 (1 − b_1) + a_1 b_0 // ≠1 reattempt & no new, or no reattempt & 1 new
P_{k,k+1} = a_1 (1 − b_0) // k > 0; ≥ 1 reattempt; 1 new
P_{k,k+i} = a_i for i ≥ 2 // ≥ 2 new
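These formulas translate directly into code; a sketch (the arrival distribution below is an arbitrary example):

```python
from math import comb

def aloha_row(k, a, p):
    """Transition probabilities out of state k > 0 backlogged nodes;
    a[i] = P(i new arrivals per slot), p = per-node retry probability."""
    b = lambda i: comb(k, i) * p**i * (1 - p)**(k - i)   # i of k retry
    row = {k - 1: a[0] * b(1),                       # 1 retry, no new
           k:     a[0] * (1 - b(1)) + a[1] * b(0),   # backlog unchanged
           k + 1: a[1] * (1 - b(0))}                 # 1 new collides with retry
    for i in range(2, len(a)):
        row[k + i] = a[i]                            # >= 2 new always collide
    return row

print(aloha_row(5, [0.6, 0.3, 0.08, 0.02], 0.1))
```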

45 1.4 Limiting probabilities of Markov chain. Definition. In the Markov chain X_0, X_1, X_2, ..., let the row vector V_t represent the distribution of X_t. If there exists a row vector V_∞ such that lim_{t→∞} V_t = V_∞ regardless of the initial distribution V_0, then the distribution V_∞ is called the limiting distribution of the Markov chain. Remarks. The limiting distribution is also called the “stationary state” by its interpretation as a “random state.” V_∞ = V_0 Π^∞ for all V_0. Taking V_0 = (1 0 0 …), we find V_∞ equal to the 1st row in the matrix Π^∞. Similarly, taking V_0 = (0 1 0 …), we find V_∞ equal to the 2nd row in Π^∞. In the same manner, V_∞ is equal to every row in Π^∞.

46 Prof. Bob Li Example of limiting probability. Example. For the prisoner in the dark cell, every row of Π^∞ is (0 0 0 0 1): the limiting distribution puts all probability on the ‘freedom’ state.

47 Prof. Bob Li Ergodic Markov chain. Definition. A finite-state Markov chain is ergodic if, for some t ≥ 1, all entries in the matrix Π^t are nonzero. // For some particular t, the Markov chain can go from anywhere to anywhere in exactly t steps. Theorem. If a finite-state Markov chain is ergodic, then the limiting distribution exists. // Ergodicity is a sufficient condition but, as we shall show by an example, not a necessary condition.

48 Prof. Bob Li Example of ergodic Markov chain. Example. A salesman travels between HK and Macau (transition matrix and graph shown on the slide). All entries in Π² are positive. Hence this Markov chain is ergodic.

49 Prof. Bob Li Limiting probabilities = eigenvector. 1) Since V_∞ Π = (lim_{t→∞} V_0 Π^t) Π = lim_{t→∞} V_0 Π^{t+1} = V_∞, the limiting distribution is a (row) eigenvector of the transition matrix Π with the eigenvalue 1. 2) V_∞ is a probability distribution (sum of entries = 1). The normalization by 2) is crucial since any scalar multiple of V_∞ is also an eigenvector with the eigenvalue 1. A row vector with both properties is called the long-run distribution. In the ergodic case, V_∞ is the unique long-run distribution.

50 Example of a 2-state Markov chain. (Salesman chain: with states ordered (HK, Macau), Π = [[½, ½], [1, 0]].) All entries in Π² are positive. Hence this Markov chain is ergodic. Since the limiting distribution V_∞ is a row eigenvector of Π with eigenvalue 1, V_∞ Π = V_∞, or V_∞ (Π − I) = (0 0 … 0). The existence of an eigenvector renders (Π − I) a singular matrix. Write V_∞ = (x y). Thus, x/2 + y = x and x/2 = y are linearly dependent equations: they span only the full rank minus 1, because the eigenspace is 1-dimensional. Together with the normalization equation x + y = 1, we can solve x = 2/3 and y = 1/3.

51 Limiting distribution of a 3-state chain. Given Π = [[0.1, 0.9, 0], [0.05, 0.5, 0.45], [0, 0.1, 0.9]], all entries in Π² are positive. Hence the Markov chain is ergodic. Write the limiting distribution as V_∞ = (x y z). We want to calculate x, y, and z. Since V_∞ is a distribution, x + y + z = 1. On the other hand, V_∞ has to be a row eigenvector of Π with eigenvalue 1. Thus, we have the linearly dependent equations: 0.1x + 0.05y = x; 0.9x + 0.5y + 0.1z = y; 0.45y + 0.9z = z. They are worth only two equations. Together with the normalization equation, we get V_∞ = (0.01 0.18 0.81).
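Numerically, one can solve for the eigenvector by swapping one of the dependent equations for the normalization (a sketch using numpy):

```python
import numpy as np

P = np.array([[0.10, 0.90, 0.00],
              [0.05, 0.50, 0.45],
              [0.00, 0.10, 0.90]])

A = P - np.eye(3)      # V(P - I) = 0 ...
A[:, -1] = 1.0         # ... with the last equation swapped for sum(V) = 1
v = np.linalg.solve(A.T, np.array([0.0, 0.0, 1.0]))
print(v)               # [0.01 0.18 0.81]
```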

52 Color-Blindness Gene. Color-blindness is X-genetic. There are two kinds of X genes: X0 = color blind, X1 = normal. Thus there are three types of female: X0X0 = color blind; X0X1 = normal // X1 gene dominates X0; X1X1 = normal; and two types of male: X0Y = color blind, X1Y = normal. Question. In a “stationary” society, a survey shows that approximately 10% of the male population is color blind. However, the survey also finds too few color blind females to make a percentage estimate. What can we do?

53 Prof. Bob Li Color-Blindness Gene (cont’d). Solution. From the percentage of color blind males, we assert the 1 : 9 ratio between the X0 gene and the X1 gene. Then, model the problem by the mother-to-daughter transition matrix Π, with rows = mother’s type and columns = daughter’s type, ordered (X0X0, X0X1, X1X1); this is exactly the 3-state chain of the preceding calculation. From the nature of the problem, we know the Markov chain is ergodic. The preceding calculation also proves this and, in fact, yields the limiting distribution V_∞ = (0.01 0.18 0.81). Thus, 1% of females are color blind, while another 18% are carriers of the color blindness gene.

54 Prof. Bob Li Convolution decoding. Digital communications often stipulate inspection of the pattern of the last few, say, 3 received bits. Assume each bit is independently random with P(bit = 1) = p, and model the window of the last 3 bits as a Markov chain. All 3-step transition probabilities are positive. Hence the Markov chain is ergodic. Each component of V_∞ comes out as a product: the limiting probability of the window b1b2b3 is the product of a factor p for each bit 1 and a factor 1−p for each bit 0. These are not surprising at all because of the independence assumption.

55 Prof. Bob Li Example of a 2-state Markov chain. The Markov chain is not ergodic. Nevertheless, the limiting distribution V_∞ = (0 1) exists: lim_{n→∞} Π^n has both rows equal to (0 1). (The transition matrix is shown on the slide.)

56 Prof. Bob Li Example of Oscillation. (States 0 and 1, with P_{01} = P_{10} = 1.) There exist zero entries in Π^t for every t. Hence the Markov chain is not ergodic, but rather periodic. In fact, there is no limiting distribution: let V_0 = (1 0); then V_t alternates between (1 0) and (0 1). Nevertheless, there exists the eigenvector (½ ½) of the transition matrix Π with the eigenvalue 1.

57 Prof. Bob Li Ergodicity, limiting distributions, eigenvector. Summary of theorems. For finite-state Markov chains: Ergodicity // All entries in the matrix Π^t are nonzero for some t ≥ 1. ⟹ The limiting distribution V_∞ exists. // Regardless of the initial distribution V_0. ⟹ Eigenvalue 1 of the transition matrix with 1-dim eigenspace // Rank of the matrix Π − I is full minus 1 (dim of eigenspace = 1). // Long-run distr. = unique normalized eigenvector with eigenvalue 1. ⟹ Eigenvalue 1 of the transition matrix // The matrix Π − I is singular. ⟺ det(Π − I) = 0. Q. When there is no limiting distribution, there may or may not be the eigenvalue 1. What happens then? See the next example.

58 Prof. Bob Li A gambler wants to go home. A gambler in Macau needs $3 to return to HK but has only $2. He gambles $1 at a time, and wins with probability p. The states are $0, $1, $2, $3, with $0 and $3 absorbing; from $1 and $2 the chain moves up with probability p and down with probability 1−p. The matrix Π − I has the rank 2 (< the full rank 4). There are two linearly independent eigenvectors of eigenvalue 1: (1 0 0 0) and (0 0 0 1). However, there is no limiting distribution: the probability of ending in the state $3 instead of $0 depends on the initial distribution V_0. For the gambler, V_0 = (0 0 1 0).

59 Classification of states in a Markov chain. Definition. For the Markov chain X_0, X_1, X_2, ...: The probability of recurrence at a state i is f_i = P(X_t = i for some t > 0 | X_0 = i) ≤ 1. The state i is said to be recurrent when f_i = 1 and transient when f_i < 1. // Recurrent: for sure, the state will be revisited sooner or later. The multiplicity of visiting state i, denoted by M_i (≤ ∞), is the number of indices t > 0 such that X_t = i. Theorem. The conditional r.v. (M_i | X_0 = i) is: deterministically equal to ∞ when f_i = 1; Geometric₀(1 − f_i) distributed when f_i < 1. // That is, P(M_i = n | X_0 = i) = f_i^n (1 − f_i). Intuitive proof. Think of every revisit to state i as either Head or Tail depending on whether it is the final visit. Thus P(Head) = 1 − f_i & P(Tail) = f_i.

60 Classification of states (cont’d). Computational Proof. Assuming that f_i < 1, we shall prove by induction on n that P(M_i = n | X_0 = i) = (1 − f_i) f_i^n.
P(M_i = 0 | X_0 = i) = P(X_t ≠ i for all t > 0 | X_0 = i) = 1 − f_i.
P(M_i = 1 | X_0 = i) = Σ_{t>0} P(M_i = 1 | X_0 = i and 1st recurrence at time t) · P(1st recurrence at t | X_0 = i) = Σ_{t>0} P(no recurrence after time t | X_t = i) · P(1st recurrence at time t | X_0 = i) = Σ_{t>0} P(M_i = 0 | X_0 = i) · P(1st recurrence at time t | X_0 = i) // Time-invariant = (1 − f_i) Σ_{t>0} P(1st recurrence at time t | X_0 = i) = (1 − f_i) f_i.
P(M_i = n | X_0 = i) = Σ_{t>0} P(M_i = n | X_0 = i and 1st recurrence at time t) · P(1st recurrence at t | X_0 = i) = Σ_{t>0} P(n − 1 recurrences after time t | X_t = i) · P(1st recurrence at time t | X_0 = i) = Σ_{t>0} P(M_i = n − 1 | X_0 = i) · P(1st recurrence at time t | X_0 = i) // Induction on n = (1 − f_i) f_i^{n−1} Σ_{t>0} P(1st recurrence at time t | X_0 = i) = (1 − f_i) f_i^n. // Geometric₀(1 − f_i) distribution

61 Classification of states (cont’d). Corollary. E[M_i | X_0 = i] = f_i / (1 − f_i) for f_i < 1. // Mean of the Geometric₀(1 − f_i) distribution. Corollary. E[M_i | X_0 = i] = ∞ if and only if f_i = 1. Proposition. E[M_i | X_0 = i] = Σ_{t≥1} (the ii-th entry in the matrix Π^t). Proof. Given i, let Y_t be the characteristic r.v. of the event X_t = i. Thus M_i = Σ_{t≥1} Y_t, and hence E[M_i | X_0 = i] = Σ_{t≥1} P(X_t = i | X_0 = i) // Mean of sum is sum of means. = Σ_{t≥1} (the ii-th entry in the matrix Π^t). // Chapman-Kolmogorov

62 Prof. Bob Li Classification of states (cont’d). Summary. f_i = 1 ⟺ E[M_i | X_0 = i] = ∞ ⟺ Σ_{t≥1} (the ii-th entry in the matrix Π^t) = ∞ // Chapman-Kolmogorov // The final condition can be used to verify the recurrence of the 1- or 2-dim symmetric random walk, but not of the 3-dim one. Corollary. In a finite-state Markov chain, there is at least one recurrent state. In an infinite-state Markov chain however, it is possible that all states are transient. We shall see that this is the case for the 1-dim asymmetric free random walk.

63 Prof. Bob Li Examples. Example. In the 3-state weather model, all three states are recurrent. Example. In the Markov chain for the prisoner in the dark cell, only the “freedom” state is recurrent.

64 Prof. Bob Li Lecture 3 Sep. 22, 2011

65 Prof. Bob Li 1-dim free random walk. At each game, the gambler wins $1 with the probability p and loses $1 with the probability q = 1 − p. Let X_t be the total net winning after t games. Theorem. In an asymmetric 1-dimensional free random walk, every state is transient. In the symmetric 1-dimensional free random walk, every state is recurrent, and hence the process almost surely will return to the initial state within finite time. // “Almost surely” means “with probability 1.” Proof. Next slide.

66 1-dim free random walk. Proof. The t-step return probability P_{ii}^{(t)} is 0 for odd t, and for t = 2n it is P_{ii}^{(2n)} = C(2n, n) p^n q^n. // To go from i to i in exactly 2n steps, there must be exactly n steps to the right and n to the left. Whether the state i is transient or recurrent hinges on the convergence of the series Σ_{t≥1} P_{ii}^{(t)} = Σ_{n≥1} C(2n, n) (pq)^n. // t = 2n. By Stirling's formula, C(2n, n) ~ 4^n/√(πn) as n → ∞, so the terms behave like (4pq)^n/√(πn). If p ≠ 1/2, then 4pq < 1 and the series converges. If p = 1/2, then 4pq = 1 and the series diverges. // By the “integral test” applied to Σ 1/√(πn).

67 1-dim symmetric random walk. Problem. Show that the symmetric random walk will almost surely reach the state to the right of the initial state. Solution. 1 = P[will ever return to 0 | X_0 = 0] = (1/2) P[will ever return to 0 | X_0 = 0 and X_1 = 1] + (1/2) P[will ever return to 0 | X_0 = 0 and X_1 = −1] // Conditioning on X_1 under the a priori of X_0 = 0 = (1/2) P[will ever return to 0 | X_1 = 1] + (1/2) P[will ever return to 0 | X_1 = −1]. // Memoryless property: only the newest information counts. Since each of the two conditional probabilities is at most 1 and they average to 1, both equal 1. In particular, 1 = P[will ever reach state 0 | X_1 = −1], which by symmetry is exactly the probability of ever reaching the state to the right of the initial state. Prof. Bob Li

68 Positive recurrent states. Definition. For the Markov chain X_0, X_1, X_2, ...: The waiting time T_i for state i is the smallest index t > 0 such that X_t = i. When there is no such index t, define T_i = ∞. The state i is positive recurrent if E[T_i | X_0 = i] < ∞. Theorem. Positive recurrent ⟹ recurrent. Proof. E[T_i | X_0 = i] < ∞ ⟹ P(T_i = ∞ | X_0 = i) = 0 ⟹ P(No recurrence | X_0 = i) = 0 ⟹ State i is recurrent. Prof. Bob Li

69 State classification. States split into transient and recurrent; recurrent states split further into positive recurrent and non-positive recurrent (e.g., the symmetric 1-dim random walk, as we shall prove later). [Slide photo: Amah Rock (望夫石), Shatin, HK, near CUHK]

70 Prof. Bob Li Positive recurrent states (cont’d). Theorem. Recurrent states in a finite-state Markov chain are all positive recurrent. Proof. Omitted. Remarks. Fact. The 2-dim symmetric free random walk is also recurrent. It cannot be positive recurrent. Why? Fact. The 3-dim symmetric free random walk turns out to be transient. The 4-dim symmetric free random walk is also transient. Why?

71 Prof. Bob Li Equivalent classes of states. Definition. State j is said to be accessible from state i if, starting from state i, the probability is nonzero for the Markov chain to ever enter state j, that is, the ij-th entry in the matrix Π^t is positive for some t ≥ 0. If two states i and j are accessible to each other, we say that they communicate. // Mathematically, state communication is an “equivalence relation,” that is: reflexive, symmetric, and transitive. Example. A 3-state Markov chain with only one equivalent class of states.

72 Prof. Bob Li Equivalent classes of states (cont’d). Example. A Markov chain with four states in three equivalent classes: {0, 1}, {2} and {3}. Only state 2 is transient.

73 Prof. Bob Li Equivalent classes of states (cont’d). Theorem. Recurrence/transience is a class property. // Thus it makes sense to mention a recurrent class or a transient class. Proof. Let states i and j belong to an equivalent class. Write P_{ij}^{(k)} for the ij-th entry in Π^k. Thus P_{ij}^{(k)} > 0 and P_{ji}^{(m)} > 0 for some k and m. Suppose that i is recurrent, i.e., Σ_t P_{ii}^{(t)} = ∞. We shall prove that so is j: Σ_t P_{jj}^{(m+t+k)} ≥ P_{ji}^{(m)} (Σ_t P_{ii}^{(t)}) P_{ij}^{(k)} = ∞.

74 Prof. Bob Li Equivalent classes of states (cont’d). Example. A Markov chain with five states in three equivalent classes: {0, 1} is recurrent; {2, 3} is recurrent; {4} is transient.

75 Prof. Bob Li [skip] Equivalent classes of states (cont’d). Definition. When there is only one equivalence class, the Markov chain is said to be irreducible. Theorem. All states of a finite-state irreducible Markov chain are recurrent (and hence positive recurrent by an aforementioned theorem). Proof. Not all states can be transient in a finite-state Markov chain. Problem. Prove that a transient state cannot be accessed from a recurrent state.

76 [skip] Calculation of expected time & multiplicity. Example of Gambler’s Ruin. A gambler in Macau needs $7 to return to HK but has only $3. He starts to wage $1 at a time and wins with probability p = 0.4. The transition matrix of his fortune is as shown at right. One way to calculate f_ij = P(X_t = j for some t > 0 | X_0 = i) is by first considering the quantity s_{i,j} = E[number of visits to j | start at i] and by conditioning on whether the state j is ever entered. (See p.197 of [Ross, 6th ed.] for detail.) Below, we calculate s_{i,j} for all i, j = 1 to 6.

77 Prof. Bob Li [skip] Example of Gambler’s Ruin (cont’d). Conditioning on the outcome of the initial play, s_{i,j} = δ_{i,j} + p s_{i+1,j} + q s_{i−1,j}. // The Kronecker delta: δ_{i,j} = 1 when i = j; else = 0. Denote by S the 6 × 6 matrix (s_{ij})_{1≤i≤6, 1≤j≤6}. Then, S = I + Π_T S, where I is the 6 × 6 identity matrix and Π_T denotes the sub-matrix of transition probabilities among the transient states $1 to $6 (circumscribed in blue on the slide). Thus, I = S − Π_T S = (I − Π_T) S, and S = (I − Π_T)^{−1}. As p = 0.4 and q = 0.6, we can explicitly calculate this inverse matrix and find, for instance, s_{3,5} = .9228.
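The inversion is a one-liner in numpy (a sketch; the state ordering $1..$6 is the obvious one):

```python
import numpy as np

p, q = 0.4, 0.6
Q = np.zeros((6, 6))              # transitions among transient states $1..$6
for i in range(6):                # $0 and $7 are absorbing, hence left out
    if i + 1 < 6:
        Q[i, i + 1] = p           # win $1
    if i - 1 >= 0:
        Q[i, i - 1] = q           # lose $1

S = np.linalg.inv(np.eye(6) - Q)  # S[i, j] = s_{i+1, j+1}
print(S[2, 4])                    # s_{3,5} = 0.9228...
```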

78 Calculation of expected time till freedom. The transition graph for the prisoner in the dark cell is shown on the slide; ‘freedom’ is the only recurrent state, i.e., the terminal state.

79 Calculation of expected time till freedom. Method 1: Given the initial state = cell, let the frequency of entering the internal tunnel be M_T, which is Geometric₀(1/3) distributed. Thus, E[Waiting time for freedom | initial state = cell] = E[2 M_T + 3] = 2 E M_T + 3 = 2·2 + 3 = 7. Method 2: Calculate from the Markov chain. For every state S, define the conditional r.v. T_S = (Waiting time for freedom | initial state = S) and adopt the abbreviation e_S = E T_S. We have a matrix equation (shown on the slide), which means conditioning on the outcome of the 1st step.

80 Prof. Bob Li Calculation of expected time till freedom. In other words, we have the following system of linear equations: e_freedom = 0; e_R = 1 + e_freedom; e_F = 1 + e_R; e_T = 1 + e_cell; e_cell = 1 + (1/3) e_F + (2/3) e_T. // Mean of sum is sum of means. From these, we can solve for e_S for all states S. // Even though we only wanted to compute e_cell, the Markov-chain computation gives e_S for all states S. It turns out that e_R = 1, e_F = 2, e_cell = 7, e_T = 8.
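The same system in matrix form is e = 1 + Q e over the non-absorbing states, i.e., e = (I − Q)⁻¹ 1 (a sketch; states ordered (T, cell, F, R)):

```python
import numpy as np

Q = np.array([[0,   1, 0,   0],   # T -> cell after one more hour
              [2/3, 0, 1/3, 0],   # cell -> T (holes B, C) or F (hole A)
              [0,   0, 0,   1],   # F -> R
              [0,   0, 0,   0]])  # R -> freedom (absorbing, excluded)
e = np.linalg.solve(np.eye(4) - Q, np.ones(4))
print(e)                          # [8. 7. 2. 1.] = (e_T, e_cell, e_F, e_R)
```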

81 Example of pattern occurrence in coin tossing. Q. In fair-coin tossing, the expected waiting time for the pattern HTHT to occur consecutively is not 16. In the transition graph (shown on the slide, with HTHT as the terminal state), a solid arrow means HEAD and a dotted one means TAIL; the probability of every arrow is 0.5. For every state S, define the conditional r.v. T_S = (waiting time for HTHT | initial state = S) and write e_S = E T_S. Conditioning on the first step yields a matrix equation as before.

82 Example of pattern occurrence in coin tossing. This implies a system of five simultaneous equations. // Mean of sum is sum of means. Solving them, it turns out that e_null = 20. An alternative calculation of this quantity by martingale is instant. See: S.-Y. R. Li, “A martingale approach to the study of occurrence of sequence patterns in repeated experiments,” Annals of Probability, v.8, 1980.

83 Prof. Bob Li Example of pattern occurrence in coin tossing. Q. What is the expected waiting time for THTT to occur consecutively? For every state S, define e_S = E[waiting time for THTT | initial state = S]. It turns out that e_null = 18.

84 Prof. Bob Li Example of pattern occurrence in coin tossing. Q. What is the expected waiting time for either HTHT or THTT, whichever occurs first? Homework. For every state S, define e_S = E[waiting time till HTHT or THTT | initial state = S]. In particular, e_HTHT = e_THTT = 0 (the two terminal states). Calculate e_null.

85 Probabilities toward multiple terminal states. Q. Which pattern between THTT and HTHT is more likely to occur first? For every state S, define p_S as P(HTHT eventually occurs | initial state = S). In particular, p_HTHT = 1 and p_THTT = 0. The above question becomes: p_null = ? This question does not pertain to the “elapsed time” at any state. The state H eventually leads to HT and hence can be merged into the state HT. Similarly, we also merge the state T into the state TH. The transition graph is shown on the slide.

86 Prof. Bob Li Probabilities toward multiple terminal states. The column vector of p_S is an eigenvector of the transition matrix with eigenvalue 1. (Do not confuse this with the limiting distribution, which is a row eigenvector.)

87 Random-walk gambler. A gambler in Macau needs $n for boat fare to go home but has only $i. He gambles $1 at a time and wins with probability p each time. The two terminal states are at the ends. Denote p_i = P{The gambler will reach $n | starting at $i}. We are to calculate p_i for all i.

88 Calculation by Markov chain. p_i = p · P{Will reach $n | start from $i; initial play wins} + (1 − p) · P{Will reach $n | start from $i; initial play loses} // Conditioning on the outcome of the initial step again, except that we do not use the matrix notation this time. = p · P{Will reach $n | start from $i+1} + (1 − p) · P{Will reach $n | start from $i−1}. Thus, p_i = p p_{i+1} + (1 − p) p_{i−1}. The problem becomes to solve the 2nd-order difference equation with the boundary conditions: p_0 = 0 and p_n = 1.

89 Calculation by Markov chain. p_i = p p_{i+1} + (1 − p) p_{i−1} ⟹ p p_i + (1 − p) p_i = p p_{i+1} + (1 − p) p_{i−1} ⟹ (1 − p)(p_i − p_{i−1}) = p (p_{i+1} − p_i). Write x_i = p_{i+1} − p_i for 0 ≤ i < n. Then (1 − p) x_{i−1} = p x_i. This is only a 1st-order difference equation: {x_i}_{0≤i<n} is a geometric progression with the common ratio r = (1 − p)/p = q/p, i.e., x_i = r^i x_0 = r^i p_1. // x_0 = p_1 − p_0 = p_1

90 Calculation by Markov chain. p_{i+1} = p_i + r^i p_1 // p_0 = 0 = [p_{i−1} + r^{i−1} p_1] + r^i p_1 // Recursively = … = p_0 + [1 + r + … + r^i] p_1. // Telescope; p_0 = 0. Thus, p_i = [1 + r + … + r^{i−1}] p_1.

91 Calculation by Markov chain. It remains to calculate p_1 from the boundary condition p_n = 1. Taking i = n in particular, 1 = p_n = [1 + r + … + r^{n−1}] p_1, so that p_1 = (1 − r)/(1 − r^n) when r ≠ 1, and hence p_i = (1 − r^i)/(1 − r^n) = (1 − (q/p)^i)/(1 − (q/p)^n). Later, a martingale will turn this long calculation into an instant one.

92 Prof. Bob Li Gambler’s Ruin. Example. A gambler in Macau starts with $900, needs $1000 to return to HK, gambles $1 at a time on the casino roulette, and wins with probability p = 18/38 at each bet. (Thus, n = 1000 and i = 900.) Then, (1 − p)/p = 10/9 and p_i = (1 − (10/9)^900)/(1 − (10/9)^1000) ≈ (10/9)^{−100} ≈ 2.66 × 10^{−5}. With probability ≈ 0.99997, the gambler is doomed.
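The closed form is worth wrapping in a function (a sketch):

```python
def p_reach(i, n, p):
    """P{gambler starting with $i reaches $n before $0}, betting $1 at a time."""
    q = 1 - p
    if p == q:
        return i / n              # symmetric case
    r = q / p
    return (1 - r**i) / (1 - r**n)

print(p_reach(900, 1000, 18/38))  # ≈ 2.66e-05; the gambler is doomed
```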

93 Remark. Optimal strategy over random walk. Assumptions. The gambler in Macau needs $N to go home but has only $i. At the casino he can bet any integer amount at a time, and the probability of winning is 18/38. Since this is not a fair gamble to him, his fortune process is a “super martingale,” while the “martingale” concept formalizes fair gamble. Strategy to optimize the chance of going home: 1. Whenever he has $N/2 or less, bet all. 2. Whenever he has more than $N/2, bet the amount that he is short of. Intuitive reasoning: Because of the disadvantage, he is better off keeping the gambling process as short as possible. The formal proof of the optimality of the said strategy is beyond the scope of this course. Prof. Bob Li

94 [skip] Simplex algorithm in linear programming. Minimize c^T x // c^T = (c_1 … c_n), x^T = (x_1 … x_n) subject to A x = b // A is an m × n matrix of rank m, b^T = (b_1 … b_m), and x ≥ 0. The optimal value of x is at an extreme point of the feasibility region, a polytope in the n-dim space, and hence has at least n − m components equal to 0. There are up to C(n, m) extreme points. Label them from 1 to N by the increasing value of the objective function c^T x. The simplex method always moves from an extreme point to a better extreme point.

95 Prof. Bob Li Efficiency of simplex algorithm in linear programming. The simplex method always moves from an extreme point to a better extreme point. Assumption for the convenience of performance analysis: When the algorithm is at the L-th point, the next extreme point will be equally likely to be any one among the (L−1)-st, (L−2)-nd, …, 1-st.

96 Efficiency of simplex algorithm in linear programming (cont’d). Let T_i be the number of transitions to go from state i to state 1. In particular, T_1 = 0. Conditioning on the first transition, E[T_i] = 1 + (1/(i−1)) Σ_{1≤j<i} E[T_j].

97 Efficiency of simplex algorithm in linear programming (cont’d). Shifting the index, E[T_{i+1}] = 1 + (1/i) Σ_{1≤j≤i} E[T_j]. Take the difference between the last two equations (after clearing denominators): E[T_{i+1}] − E[T_i] = 1/i, whence E[T_N] = Σ_{1≤j<N} 1/j ≈ ln N. // Approx. by Stirling’s formula: with N = C(n, m), ln N ≈ m [c ln c − (c−1) ln(c−1)], where c = n/m.

98 Efficiency of simplex algorithm in linear programming (cont’d). Let c = n/m. Therefore, under the said assumption, the number of simplex-method iterations from the worst start is only about ln C(n, m) ≈ m [c ln c − (c−1) ln(c−1)]. Numerical example. If n = 8000 and m = 1000, then c = 8, and the estimate is 1000 (8 ln 8 − 7 ln 7) ≈ 3014 iterations.
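Both the harmonic-sum estimate ln C(n, m) and its Stirling simplification are a few lines (a sketch):

```python
from math import lgamma, log

def ln_binom(n, m):                    # ln C(n, m), exact via log-gamma
    return lgamma(n + 1) - lgamma(m + 1) - lgamma(n - m + 1)

n, m = 8000, 1000
c = n / m
print(ln_binom(n, m))                  # ≈ 3010: E[T] = sum_{j<N} 1/j ≈ ln N
print(m * (c * log(c) - (c - 1) * log(c - 1)))   # Stirling version, ≈ 3014
```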

99 Prof. Bob Li 1.4 Time reversed Markov chain

100 Prof. Bob Li Reverse transition probability. Let P_ij denote the transition probability and π_i the stationary probability of an ergodic Markov chain. Assume that the Markov chain has been in operation for a long time. Consider the reverse process X_n, X_{n−1}, X_{n−2}, …, where n > m ≫ 0. Define the reverse transition probability Q_ij at a time m: Q_ij = P(X_m = j | X_{m+1} = i) = P(X_m = j) P(X_{m+1} = i | X_m = j) / P(X_{m+1} = i) = π_j P_ji / π_i.

101 Prof. Bob Li Mnemonic interpretation of π_j P_ji = π_i Q_ij: π_j P_ji = stationary rate at which the chain flows from j to i; π_i Q_ij = stationary rate at which the chain flows backward from i to j.

102 Prof. Bob Li Time reversed Markov chain. Theorem. The reverse process X_m, X_{m−1}, …, where m ≫ 0, is by itself a Markov chain with the transition probability Q_ij = π_j P_ji / π_i.

103–104 Prof. Bob Li [Proof of the theorem, by conditioning on the state at the time m+1; the slides’ equations did not survive transcription.]

105 The formula Q_ij = π_j P_ji / π_i calculates the reverse transition matrix [Q_ij] from the forward transition matrix [P_ij] and the limiting distribution (… π_i …). The proposition below offers a way to calculate, in the other direction, the limiting distribution (… π_i …) and the reverse transition matrix [Q_ij] starting from the forward transition matrix [P_ij].

106 Proposition 1. For an irreducible stationary ergodic Markov chain with the transition probabilities P_ij, if one can find positive numbers π_i and Q_ij meeting the equations: // “Irreducible” = can reach any state i from any state j. (a) π_i Q_ij = π_j P_ji for all i, j; (b) Σ_i π_i = 1 // Normalizer; (c) Σ_i Q_ji = 1 for each j // Normalizer; then π_i are the limiting probabilities and Q_ij are the transition probabilities of the reversed chain. Proof. Summing both sides of (a) over j and invoking (c), π_i = Σ_j π_j P_ji. In the matrix form, (… π_i …) [P_ij] = (… π_i …). That is, the row vector (… π_i …) is an eigenvector of the transition matrix [P_ij] with the eigenvalue 1. It must be the limiting distribution because of (b). This, together with (c), shows that Q_ij are the transition probabilities of the reversed Markov chain.

107 Example of light bulb at the lighthouse. When the light bulb at a very old lighthouse fails during day n, it is replaced at the beginning of the day n+1. Define the integer random variables: L = lifespan of light bulb (rounded up to integer) // Distribution of L is given by the manufacturer. X_n = age of the bulb at the end of day n. Thus, {X_n, n = 1, 2, 3, …} is a Markov chain. // e.g., {X_n, n = 1, 2, 3, …} = {1, 2, 3, 1, 1, …}. The transition probabilities are P_{i,i+1} = P(L > i | L ≥ i) and P_{i,1} = P(L = i | L ≥ i).

108 In the long run, the transition probabilities of the reverse chain are: When j > 1, clearly Q_{j,j−1} = 1. For all i ≥ 1, Q_{1,i} = P{Age of the light bulb yesterday was i | light bulb today is a new one} = P{Age of the light bulb yesterday was i | a light bulb failed yesterday} = P{Age of a light bulb on its day of failing is i} // Yesterday was also in limiting distribution. = P{Lifespan of a light bulb is i} = P{L = i}. // Distribution of L is given. By the above Proposition 1, we shall solve {π_i} from the equations (a) and (b) below; then {π_i} is the limiting distribution: (a) π_i Q_ij = π_j P_ji for all i, j // Need this for j = 1 and j ≠ 1; (b) Σ_i π_i = 1 // Normalizer; (c) Σ_i Q_ji = 1 for each j // Clearly true.

109 Taking j = 1 in (a), π_1 Q_{1,i} = π_i P_{i,1}, i.e., π_1 P{L = i} = π_i P(L = i | L ≥ i), so that π_i = π_1 P(L ≥ i) for all i ≥ 1. From (b), π_1 Σ_i P(L ≥ i) = 1, i.e., π_1 = 1/EL and π_i = P(L ≥ i)/EL. Thus, part of (a) together with (b) uniquely determines {π_i}. We have yet to verify the remaining part of (a), that is, for all j > 1 (and all i ≥ 1). Since Q_{j,j−1} = 1, it suffices to verify just for j = i+1 > 1, that is, to show π_{i+1} = π_i P_{i,i+1}. Indeed, π_i P_{i,i+1} = [P(L ≥ i)/EL] · P(L > i | L ≥ i) = P(L > i)/EL = P(L ≥ i+1)/EL = π_{i+1}.

110 Prof. Bob Li Time reversibility. Definition. A stationary ergodic Markov chain is said to be time reversible if P_ij = Q_ij for all i and j. // Recall that π_i Q_ij = π_j P_ji; equivalently, π_i P_ij = π_j P_ji for all i and j. // The stationary rate at which the chain flows from i to j is equal to the stationary rate at which it flows from j to i. An analogy of the stationary rate of transition is the annual trade volume from a nation to another. The limiting distribution is reached when there is overall trade balance for every nation. A time-reversible Markov chain is like trade balance between any two nations. This is a stronger statement.

111 Prof. Bob Li Proposition 2. The existence of any nonnegative numbers π_i summing to 1 such that π_i P_ij = π_j P_ji for all i, j makes the Markov chain time reversible. Moreover, such numbers π_i represent the limiting probabilities. Proof. // The proof is similar to and simpler than that of Proposition 1. Summing both sides of the equation over i, Σ_i π_i P_ij = π_j Σ_i P_ji = π_j. // Σ_i P_ji = 1 = the total probability. That is, the row vector (… π_i …) is an eigenvector of the transition matrix [P_ij] with the eigenvalue 1. Hence it is the limiting distribution.

112 Random walk between two blocking states. The fact that {π_j} is the limiting distribution corresponds to the red cut in the slide’s diagram: it balances the influx to state i with the outflow from it. In other words, {π_j} is a (row) eigenvector of the transition matrix with eigenvalue 1. Because the transition graph is a single string of states, the influx to the state group {0, 1, …, i−1} balances with the outflow from it. In other words, the flow from i−1 to i and the flow from i to i−1 are in equilibrium, as embodied by the simple blue cut. Thus, the process is time reversible. Prof. Bob Li

113 Random walk between two blocking states. The flow from i−1 to i and the flow from i to i−1 are in equilibrium. This is equivalent to the eigenvector statement and is mathematically simpler. We now calculate π_j from it. Writing p_i for the probability of stepping up from state i (and 1 − p_i for stepping down), the cut equation reads π_{i−1} p_{i−1} = π_i (1 − p_i). This is only a 1st-order difference equation, and the boundary condition is the equation of normalization: Σ_i π_i = 1.

114 Prof. Bob Li Taking the product of the above equalities, we can express all π_i in terms of π_0: π_i = π_0 · (p_0 p_1 ⋯ p_{i−1}) / ((1 − p_1)(1 − p_2) ⋯ (1 − p_i)). Next, calculate π_0 from the boundary condition … (Homework)

115 Prof. Bob Li Lecture 4 Sep. 29, 2011 & Oct. 6, 2011

116 Prof. Bob Li Time-reversibility of random walk on 0, …, M. The special case when p_i = p for all i.

117 Prof. Bob Li Time-reversibility of random walk on 0, …, M. The special case when p_i = p ≠ ½ for all i. // Here the ratio p/(1−p) ≠ 1, i.e., p ≠ ½.

118 Time-reversibility of random walk on 0, …, M. The special case when p_i = ½ for all i. Hence π_i = 1/(M+1) for all i. // Oscillation between 0 and M leads to the uniform stationary distribution. Intuitive interpretation. Consider random walk around the cycle shown on the slide, in which every line indicates bidirectional transition with probability ½.

119 Prof. Bob Li Another way of oscillation. When p_i = ½ for all i except that p_0 = 1 and p_M = 0 (the two ends reflect deterministically), it turns out that π_i = 1/M for 0 < i < M and π_0 = π_M = 1/2M. Intuitive interpretation. Consider random walk around the cycle shown on the slide.

120 Prof. Bob Li Time-reversibility of random walk on 0, …, M. The special case of a two-urn model: M molecules are distributed between two urns. At each transition, one of the M molecules is chosen at random for relocation from its urn to the other.

121 Prof. Bob Li Model the number of molecules in one urn as a random walk between 0 and M with p_i = (M − i)/M. Hence, the cut equations give π_i = C(M, i) π_0. Applying the equation Σ_i π_i = 1, π_0 = 1/2^M, so π_i = C(M, i)/2^M. Conclusion. In the limiting distribution, it is as if all M molecules are distributed to the two urns randomly and independently. Homework: Give an intuitive interpretation of this conclusion.

122 Prof. Bob Li Traveling around a weighted graph. States of the Markov chain = nodes in a graph. Every edge (i, j) in the graph is associated with a weight w_ij = w_ji ≥ 0. Transition probabilities P_ij = w_ij / Σ_k w_ik. // Weight-proportional transition. We want to find π_i such that Σ_i π_i = 1 and π_i P_ij = π_j P_ji, because of Proposition 2. ⟺ π_i w_ij / Σ_k w_ik = π_j w_ji / Σ_k w_jk ⟺ π_i / Σ_k w_ik = π_j / Σ_k w_jk // w_ij = w_ji ⟺ π_i / Σ_k w_ik = c for some constant c independent of i. 1 = Σ_i π_i = c Σ_i Σ_k w_ik // Calculate c by normalization. ⟹ c = 1 / Σ_i Σ_k w_ik ⟹ π_i = Σ_k w_ik / Σ_i Σ_k w_ik.

123 Prof. Bob Li Traveling around a weighted graph. Interpretation of the formula π_i = Σ_k w_ik / Σ_i Σ_k w_ik. Approximating by rational numbers, we may assume that the w_ij are all integers. Replace each edge (i, j) with w_ij non-weighted parallel edges. Then, a transition from a node i means moving through a randomly selected outgoing edge. The formula says that, regardless of the network topology, the limiting frequency of visiting a node is proportional to the number of its adjacent edges.
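A quick empirical check on a toy graph (a sketch; the weights are arbitrary):

```python
import random

w = {0: {1: 2, 2: 1}, 1: {0: 2, 2: 3}, 2: {0: 1, 1: 3}}  # w_ij = w_ji

def step(i):
    nbrs, wts = zip(*w[i].items())
    return random.choices(nbrs, weights=wts)[0]   # weight-proportional move

counts, i, n = {0: 0, 1: 0, 2: 0}, 0, 200_000
for _ in range(n):
    i = step(i)
    counts[i] += 1

total = sum(sum(nb.values()) for nb in w.values())
print({j: counts[j] / n for j in counts})          # empirical frequency
print({j: sum(w[j].values()) / total for j in w})  # π_j by the formula
```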

124 Prof. Bob Li 1.5 Stopping time

125 Prof. Bob Li Theorem 1. Let X_1, X_2, ..., X_n, ... be i.i.d. and N ≥ 0 be a discrete random variable that is independent of all X_n. Then, E[X_1 + X_2 + … + X_N] = EN · EX_1. // Think of N as the number of customers to a shop and X_n as the amount spent by the n-th customer. Proof. E[X_1 + … + X_N] = Σ_n P(N = n) E[X_1 + … + X_n | N = n] // Conditioning on N = Σ_n P(N = n) E[X_1 + … + X_n] // N is independent of all X_n. = Σ_n P(N = n) · n EX_1 // Mean of sum is sum of means. = EX_1 EN.

126 Stopping time and Wald’s equation. Q. Roll a die repeatedly until the side “4” shows up. How many rolls does it take on the average? Calculative solution. Let N ≥ 1 represent the waiting time for “4”. P(N = n) = (5/6)^{n−1} (1/6) // Geometric₁(1/6), so EN = 6. Intuitive solution. Let X_n = 1 or 0 depending on the event for the n-th roll to show “4.” // X_n is called the characteristic r.v. of the event. Thus EX_n = P{X_n = 1} = 1/6. We speculate that EN is the reciprocal of this probability 1/6. If Theorem 1 in the above could apply, then we could justify the intuition by 1 = E[X_1 + … + X_N] = EN EX_1 = EN/6. However, Theorem 1 requires the independence of N from all X_n, which unfortunately is not the case here. So, what to do? We shall strengthen Theorem 1 by relaxing the independence condition in it.

127 Stopping time. Definition. Consider a sequence X_1, X_2, ..., X_n, ... of random variables. A nonnegative integer random variable N is called a stopping time (or stopping rule) of this sequence if, for every n, the event {N ≥ n} is independent of the random variable X_n. // For all n and all x, the events {N ≥ n} and {X_n ≤ x} are independent. // Equivalently, the characteristic r.v. of the event {N ≥ n} is indep. of X_n. X_n represents the outcome of the n-th experiment. Right before the n-th experiment, you need to decide whether to stop without knowing how X_n will turn out. The decision, though, may depend on the past knowledge of X_1, X_2, ..., X_{n−1}. Hence, the stopping time N is not necessarily independent of the process X_1, X_2, ..., X_{n−1}, but it abides with causality in life.

128 Prof. Bob Li Examples of stopping times: Roll a die until the side “4” appears. Gamble $1 at a time at a casino until midnight or losing all money. Gamble $1 at a time at a casino until there is only $k left. Examples of non-stopping times: Gamble $1 at a time at a casino until right before losing the last $1. Gamble $1 at a time at a casino until right before my luck turns bad.

129 Wald’s equation. Theorem (Wald's Equation). Let X_1, X_2, ..., X_n, ... be i.i.d. with a stopping time N. If EN < ∞, then E[X_1 + … + X_N] = EN · EX_1. Proof. Define the r.v. Y_n = 1 when N ≥ n, and Y_n = 0 otherwise. // Characteristic r.v. of the event {N ≥ n}. Observe two things: the stopping time means the independence between Y_n and X_n; and X_n Y_n = X_n when N ≥ n, = 0 otherwise. Hence we can remove the randomness in the upper bound of the summation by X_1 + … + X_N = Σ_{n≥1} X_n Y_n. [Photo: Abraham Wald]

130 Prof. Bob Li Wald’s equation. Thus, E[X_1 + … + X_N] = E[Σ_{n≥1} X_n Y_n] = Σ_{n≥1} E[X_n Y_n] // Mean of sum is sum of means. = Σ_{n≥1} EX_n EY_n // X_n and Y_n are indep. = EX_1 Σ_{n≥1} EY_n // Identical distribution = EX_1 Σ_{n≥1} P(N ≥ n) // Y_n = characteristic r.v. of event {N ≥ n}. = EX_1 EN.

131 Prof. Bob Li Example. Gambler’s ruin on casino roulette. Q. A gambler starts with $100 and intends to keep betting $1 on ODD at the roulette until he loses all. How long does this process take on the average? Answer. Let X_n be the net winning on the n-th bet. Then, X_1, X_2, ..., X_n, ... are i.i.d. with X_n = +1 with probability 18/38 and −1 with probability 20/38. Thus, EX_n = P(X_n = 1) − P(X_n = −1) = −1/19. Let the random variable N represent the number of bets till losing all. Clearly, N is a stopping time for the process X_1, X_2, ..., X_n, ... By Wald’s equation, −100 = E[X_1 + … + X_N] = EN · EX_1 = −EN/19. Hence EN = 1900 < ∞.

132 Example of non-positive recurrent Markov chain. Theorem. The symmetric 1-dim free random walk is not positive recurrent. // It was shown to be recurrent. Proof. Let Y_n be the winning on the n-th game in fair gamble. This defines i.i.d. r.v. with EY_1 = 0. The symmetric 1-dim free random walk is represented by the Markov chain X_0, X_1, X_2, ..., where X_n = Σ_{j≤n} Y_j. Let T_{01} be the waiting time for state 1 given X_0 = 0. Clearly T_{01} is a stopping time for the process Y_1, Y_2, ... Claim that ET_{01} = ∞. If ET_{01} < ∞, Wald's Equation would have yielded the following contradiction: 1 = E[Y_1 + … + Y_{T_{01}}] = ET_{01} · EY_1 = ET_{01} · 0 = 0. Thus ET_{01} must be ∞. Similarly, ET_{10} = ∞ = ET_{−1,0}. Conditioning on the first move, we have ET_{00} = 1 + ET_{10}/2 + ET_{−1,0}/2 = ∞.

133 Prof. Bob Li Revisiting the board game. Advance the token by rolling a die. Let p_n = P{ever landing on square #n}. Previously, we found that p_6 ≥ p_n for all n. Q. lim_{n→∞} p_n = ? Intuitive solution. As n gets very large, the value of p_n should have little correlation to n. Thus, lim_{n→∞} p_n exists. The average value per roll of the die is 7/2. So, 2 out of every 7 squares should be landed on. Therefore lim_{n→∞} p_n = 2/7.

134 To articulate the intuition in rigor. Denote X_n = outcome of the n-th roll; Y_n = characteristic r.v. of the event {ever landing on n}, so p_n = P{Y_n = 1} = EY_n; T_k = waiting time until the square k is passed = Y_1 + Y_2 + … + Y_k + 1. // Count only landed squares, plus the final roll that passes k. Then k+1 ≤ X_1 + X_2 + … + X_{T_k} ≤ k+6. // Count both landed and passed squares. Clearly T_k is a stopping time for the process X_1, X_2, …, X_n, … From Wald's Equation, E[X_1 + X_2 + … + X_{T_k}] = ET_k · EX_1 = (7/2) ET_k, so k+1 ≤ (7/2) ET_k ≤ k+6, i.e., 2k/7 + 2/7 ≤ ET_k ≤ 2k/7 + 12/7. Since ET_k = EY_1 + … + EY_k + 1 = p_1 + … + p_k + 1, 2k/7 + 2/7 ≤ p_1 + p_2 + … + p_k + 1 ≤ 2k/7 + 12/7.

135 To articulate the intuition in rigor. Dividing by k: 2/7 − 5/(7k) ≤ (p_1 + p_2 + … + p_k)/k ≤ 2/7 + 5/(7k). Hence, since lim_{n→∞} p_n exists, lim_{n→∞} p_n = lim_{k→∞} (p_1 + p_2 + … + p_k)/k = 2/7. Homework. 1. If we roll two dice instead of just one, show that lim_{n→∞} p_n = 1/7. 2. Define W_k as the waiting time until the square k is landed on or passed. Show that both T_k and W_k are stopping times for the process X_1, X_2, …, X_n, … 3. Clearly T_k = Y_1 + Y_2 + … + Y_k + 1. How does W_k relate to Y_1 + Y_2 + … + Y_k?

136 Prof. Bob Li 1.6 Martingales

137 Sum of randomly many i.i.d. Theorem. When X and Y are independent r.v., E[XY] = EX EY, Cov(X,Y) = 0, and hence Var(X+Y) = Var(X) + Var(Y). Proof. E[XY] = Σ_x Σ_y xy P(X = x) P(Y = y) // By independence = EX EY. Cov(X,Y) = E[(X − EX)(Y − EY)] = E[XY] − E[X(EY)] − E[(EX)Y] + EX EY = E[XY] − (EY)EX − (EX)EY + EX EY = EX EY − EY EX − EX EY + EX EY = 0.

138 Prof. Bob Li Corollary. Let X_1, X_2, ..., X_n be i.i.d. Then, Var(X_1 + X_2 + ... + X_n) = n Var(X_1). Proof. Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n) // By independence = n Var(X_1). // Identical distribution. Contrast: Var(nX) = n² Var(X). // The contrast leads to the Central Limit Theorem, Law of Large Numbers, Chebyshev inequality, etc. Proof. Var(nX) = E[(nX)²] − (E[nX])² = E[n²X²] − (n EX)² = n² E[X²] − n² (EX)² = n² Var(X).

139 Prof. Bob Li Martingale. Definition. A process X_1, X_2, ..., X_n, ... is called a martingale if, for all k, E|X_k| < ∞ and E[X_{k+1} | X_k = x_k, …, X_1 = x_1] = x_k. // This is sometimes abbreviated as E[X_{k+1} | X_k, …, X_1] = X_k. Example. X_k = net cumulative winning after k games in fair gamble. Counterexample. When you gamble on casino roulette, E[X_{k+1} | X_k = x_k, …, X_1 = x_1] < x_k. The process X_1, X_2, ..., X_n, ... is then called a “super-martingale” by J. Doob, ironically.

140 Prof. Bob Li Martingale = fair gamble. Intuitive definition. A martingale is a stochastic process X_0, X_1, X_2, ... with X_k = cumulative net winning after k games in fair gamble. Rigorous definition. A martingale is a stochastic process X_0, X_1, X_2, ... such that E|X_k| < ∞ and E[X_{k+1} | X_k, …, X_0] = X_k for all k.

141 Prof. Bob Li History of martingale. The word martingale came from middle-age French. It was first related to the concept of probability through a betting system in the 18th century. Paul Lévy (1886~1971) formulated the mathematical concept.

142 Prof. Bob Li Martingale Stopping Theorem. The intuitive fact of E[Net gain in fair gamble upon stopping] = 0 is rigorously formulated as: Martingale Stopping Theorem (Joseph Doob). Let X_0 = 0, X_1, ..., X_n, ... be a martingale and N a stopping time. If E|X_N| < ∞ and lim inf_{n→∞} E[X_n · 1_{N>n}] = 0, then EX_N = 0. Proof. Omitted.

143 An artificial casino of fair gamble. A gambler brings in $1 and bets on Head in fair-coin tossing. Upon winning every time, he doubles his fortune and then parlays it on the next coin toss. The process continues until he loses. Thus the stopping time N is Geometric₁(1/2) distributed. Upon stopping, the net winning of the gambler is $(−1), which differs from the initial winning of $0. Therefore, the Martingale Stopping Theorem does not apply here. This example shows that the “lim inf condition” in the Martingale Stopping Theorem is not superfluous.

144 Gambler’s random walk. A gambler in Macau needs $n for boat fare to go home but has only $i. So he gambles $1 at a time and wins with probability p each time. Recall the long Markov-chain calculation for p_i = P{Will reach $n | starting at $i} = (1 − (q/p)^i)/(1 − (q/p)^n). We shall derive this with almost no calculation at all.

145 Instant calculation by martingale. The random walk itself is not a martingale. Labels of states are in an arithmetic progression. Deploy a geometric progression instead, with the common ratio r = q/p = the odds.

146 Instant calculation by martingale. The random walk is not a martingale. Labels of states are in an arithmetic progression. Deploy a geometric progression instead, with the common ratio r = q/p = the odds. By symmetry, we shall assume that p < q. Relabel state $k as $r^k. From $r^k to $r^{k−1}, the loss is $(r^k − r^{k−1}); from $r^k to $r^{k+1}, the gain is $(r^{k+1} − r^k) = $r(r^k − r^{k−1}). The expected one-step change is p·r(r^k − r^{k−1}) − q·(r^k − r^{k−1}) = (pr − q)(r^k − r^{k−1}) = 0, since r = q/p. So the relabeled random walk becomes a martingale X_0 = r^i, X_1, ..., X_n, ...

147 Prof. Bob Li Instant calculation by martingale. The time T till the gambler reaches either end is a stopping time. From the Martingale Stopping Theorem, r^i = EX_T = p_i · r^n + (1 − p_i) · r^0, hence p_i = (r^i − 1)/(r^n − 1) = (1 − (q/p)^i)/(1 − (q/p)^n). // Same solution as before.

148 Prof. Bob Li Possible applications of this martingale with r = q/p = the odds. Financial engineering: take r in this martingale to be the rate of interest, inflation, depreciation, … Information engineering: r can be, for instance, the attenuation of signal strength.

149 Prof. Bob Li Expected waiting time. We have solved p_i = P{Will reach $n | starting at $i} by Markov chain and by martingale. Now let W = waiting time from state i till reaching either end = a stopping time. Q: EW = ? Below we give a solution by a different martingale.

150 Prof. Bob Li Expected waiting time. If p < q, there is an average loss of $(q − p) per step. If p > q, there is an average gain of $(p − q) per step. By symmetry, we shall assume the former case.

151 Expected waiting time The average loss per step is $(q − p). To convert the random walk into a martingale, one way is to compensate the gambler by the amount $(q − p) per bet. The compensated fortune after k steps, S_k + (q − p)k, is then a martingale starting at i. By the Martingale Stopping Theorem, the expected net gain upon stopping is 0: E[S_W + (q − p)W] = i Since E[S_W] = p_i · n + (1 − p_i) · 0 = p_i n, this gives p_i n + (q − p) EW = i. Thus, EW = (i − p_i n)/(q − p).
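The same kind of check for EW (again with the hypothetical values i = 3, n = 10, p = 0.45; the closed form assumes p ≠ q):

```python
# EW = (i - p_i * n)/(q - p) from the compensated martingale, vs. simulation.
import random

def expected_W(i, n, p):
    q = 1 - p
    r = q / p
    p_i = (r**i - 1) / (r**n - 1)    # P{reach $n before $0}
    return (i - p_i * n) / (q - p)

def simulate_W(i, n, p, trials=100_000, seed=3):
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        s, steps = i, 0
        while 0 < s < n:
            s += 1 if rng.random() < p else -1
            steps += 1
        total_steps += steps
    return total_steps / trials

i, n, p = 3, 10, 0.45
print("formula   :", expected_W(i, n, p))
print("simulation:", simulate_W(i, n, p))
```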

152 Prof. Bob Li Homework. In the NBA championship, a “best 4 out of 7” series is played between Miami and Dallas. Assume that the result of every game is a toss-up. A friendly bookie accepts bets at fair odds on every single game. Starting with an initial capital of $10,000, your mission is to place your bets and eventually double the money if Miami wins the championship. Show that there is one and only one way to accomplish this mission. Hint. Formulate the problem rigorously by the martingale concept. Think of the expected value after each game.

153 Non-binary non-uniform pattern Roll a special die repeatedly, where P{a} = 1/2, P{b} = 1/3, P{c} = 1/6. The average waiting time for the pattern aba = ? How is this calculated, and why? Reference 14: S.-Y. R. Li, “A martingale approach to the study of occurrence of sequence patterns in repeated experiments,” Annals of Probability, vol. 8, 1980.

154 How to calculate average waiting time? Calculate 3 bits for the pattern aba: δ_j = 1 if the last j letters of aba coincide with its first j letters, and δ_j = 0 otherwise. Here δ_1 = 1, δ_2 = 0, δ_3 = 1. Formula. The average waiting time for the pattern aba is δ_1/P{a} + δ_2/(P{a}P{b}) + δ_3/(P{a}P{b}P{a})

155 How to calculate average waiting time? Formula. The average waiting time for the pattern aba is 1/(1/2) + 1/(1/2 · 1/3 · 1/2) = 2 + 12 = 14 // The full overlap aba is worth 12; the single-letter overlap a is worth 2.

156 Proof by an artificial casino A gambler arrives with $1 and bets on the pattern aba in the rolling of the special die. This is a fair gamble! If a appears on the 1st roll, he receives $2 in total and parlays it on the 2nd roll. If b then appears, he receives $6 in total and parlays it on the 3rd roll. If a appears again, he receives $12 in total. Game over.

157 Gambling team enters the artificial casino Before every roll, a new gambler enters with $1 and starts betting on aba. Y_1 = a: the gambler who entered before roll 1 now holds $2.

158 Gambling team enters the artificial casino Y_1 = a, Y_2 = c: the first gambler, whose $2 was riding on b, is wiped out to $0; the second gambler, who bet his $1 on a, is wiped out to $0 as well.

159 Gambling team enters the artificial casino Y_1 = a, Y_2 = c, Y_3 = b, Y_4 = a, Y_5 = a, Y_6 = b, Y_7 = a: the game stops as aba appears. The gambler who entered before roll 5 ends with $12 and the one who entered before roll 7 ends with $2; every other gambler holds $0. Total receipt of the team = 12 + 2 = 14, regardless of the outcome of die rolling. The net gain of the team upon stopping is 14 − N, where N is the random variable representing the waiting time for aba. Because this is a fair gamble, 0 = E[Net gain upon stopping] = 14 − EN ⇒ EN = 14

160 Average waiting time for a pattern Formula. The average waiting time for the pattern aba is 1/(1/2 · 1/3 · 1/2) + 1/(1/2) = 12 + 2 = 14 // The 3rd-to-last gambler holds $12 in the end; the last gambler holds $2.

161 Prof. Bob Li Lecture 5 Oct. 13, 2011

162 Prof. Bob Li Generalizing the example Formula. For a pattern B = b_1 b_2 … b_m, calculate the bits δ_1, …, δ_m as before (δ_j = 1 iff the last j letters of B equal its first j letters). Then, the average waiting time for B is δ_1/P{Y=b_1} + δ_2/(P{Y=b_1}P{Y=b_2}) + … + δ_m/(P{Y=b_1}P{Y=b_2}⋯P{Y=b_m})
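The formula transcribes directly into code. The sketch below is my own (function and variable names are mine, not from the paper); the aba example serves as a check:

```python
# Expected waiting time for a pattern B under i.i.d. letters:
#   sum over j of delta_j / (P{b_1} * ... * P{b_j}),
# where delta_j = 1 iff the last j letters of B equal its first j letters.
def avg_waiting_time(B, prob):
    m = len(B)
    total = 0.0
    for j in range(1, m + 1):
        if B[m - j:] == B[:j]:           # delta_j = 1
            denom = 1.0
            for letter in B[:j]:
                denom *= prob[letter]
            total += 1.0 / denom
    return total

print(avg_waiting_time("aba", {"a": 1/2, "b": 1/3, "c": 1/6}))  # -> 14.0
```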

163 Prof. Bob Li The correlation operator ∗ Example. With the bits δ_1 = 1, δ_2 = 0, δ_3 = 1, the average waiting time for the pattern aba is aba ∗ aba = 1/(1/2 · 1/3 · 1/2) + 1/(1/2) = 12 + 2 = 14

164 Correlation ∗ between two different patterns For A = acab and B = abc, the bit δ_j = 1 iff the last j letters of A equal the first j letters of B (align the suffix ab of acab with the prefix ab of abc). Here δ_1 = 0, δ_2 = 1, δ_3 = 0, so acab ∗ abc = δ_1/P{Y=a} + δ_2/(P{Y=a}P{Y=b}) + δ_3/(P{Y=a}P{Y=b}P{Y=c}) = 1/(1/2 · 1/3) = 6

165 Artificial casino for betting on the pattern abc Suppose at one point the outcomes are acab, i.e., Y_1 = a, Y_2 = c, Y_3 = a, Y_4 = b. The gambler who entered before roll 3 holds $6 (his parlays on a and then b have succeeded so far); every other gambler holds $0. Total holding of the team = acab ∗ abc = 6

166 Prof. Bob Li Correlation ∗ in the special case of fair-coin toss With P{H} = P{T} = 1/2, the term for each overlap is δ_j/(1/2)^j = δ_j · 2^j. Hence, for length-4 patterns, A ∗ B = 2δ_1 + 4δ_2 + 8δ_3 + 16δ_4 = 2 × binary(δ_4 δ_3 δ_2 δ_1) Example. Av. wait for HTHH = HTHH ∗ HTHH = 2 × binary(1001) = 18 Av. wait for THTH = THTH ∗ THTH = 2 × binary(1010) = 20 HTHH ∗ THTH = 2 × binary(0000) = 0 THTH ∗ HTHH = 2 × binary(0101) = 10
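The same δ-bit computation, written as a general correlation routine (my own sketch and naming) and evaluated on the four fair-coin examples above:

```python
# A * B = sum over j of delta_j / (P{b_1} * ... * P{b_j}), where delta_j = 1
# iff the last j letters of A equal the first j letters of B.
def correlation(A, B, prob):
    total = 0.0
    for j in range(1, min(len(A), len(B)) + 1):
        if A[len(A) - j:] == B[:j]:
            denom = 1.0
            for letter in B[:j]:
                denom *= prob[letter]
            total += 1.0 / denom
    return total

fair = {"H": 1/2, "T": 1/2}
print(correlation("HTHH", "HTHH", fair))   # 18 = 2 x binary(1001)
print(correlation("THTH", "THTH", fair))   # 20 = 2 x binary(1010)
print(correlation("HTHH", "THTH", fair))   #  0 = 2 x binary(0000)
print(correlation("THTH", "HTHH", fair))   # 10 = 2 x binary(0101)
```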

167 Prof. Bob Li THTH vs. HTHH There are two teams of gamblers, called Tortoises and Hares. Before each toss of the fair coin, a new Tortoise joins the casino and bets $1 on the pattern THTH, and a new Hare bets $1 on the pattern HTHH. Denote: N_A = waiting time for A = THTH in coin tossing N_B = waiting time for B = HTHH in coin tossing N = min{N_A, N_B}, a stopping time for the coin-toss process p = P{THTH prevails upon the stopping time N} // 1 − p = P{HTHH prevails upon the stopping time N} We want to calculate p.

168 Prof. Bob Li Let X_n be the fortune of the Tortoises minus the fortune of the Hares after n tosses. The process X_1, X_2, X_3, … is a martingale, because the gamble is fair. (X_N | the pattern THTH prevails) = A ∗ A − A ∗ B (X_N | the pattern HTHH prevails) = B ∗ A − B ∗ B By the Martingale Stopping Theorem, 0 = EX_N = p(A ∗ A − A ∗ B) + (1 − p)(B ∗ A − B ∗ B) Hence p : (1 − p) = (B ∗ B − B ∗ A) : (A ∗ A − A ∗ B) = (18 − 0) : (20 − 10) = 9 : 5 Equivalently, p = 9/14. Homework. Compute P{A occurs before B} with a biased coin where P{H} = 0.4.
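A Monte Carlo sketch of the race itself (mine, not from the notes); the estimates should hover around p = 9/14 ≈ 0.643 and, for the stopping time, around EN = 90/7 ≈ 12.86 as derived on the next slide:

```python
# Race THTH against HTHH with a fair coin; estimate P{THTH prevails} and EN.
import random

def race(A="THTH", B="HTHH", trials=200_000, seed=4):
    rng = random.Random(seed)
    a_wins, total_length = 0, 0
    for _ in range(trials):
        s = ""
        while True:
            s += rng.choice("HT")
            if s.endswith(A):
                a_wins += 1
                break
            if s.endswith(B):
                break
        total_length += len(s)
    return a_wins / trials, total_length / trials

p_hat, en_hat = race()
print("P{THTH prevails} ~", p_hat)   # ~ 9/14 = 0.6428...
print("EN               ~", en_hat)  # ~ 90/7 = 12.857...
```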

169 Prof. Bob Li Alternative calculation of THTH vs. HTHH B ∗ B = EN_B = EN + E[N_B − N] = EN + p E[N_B − N | N = N_A] + (1 − p) E[N_B − N | N = N_B] = EN + p(B ∗ B − A ∗ B) + 0 ⇒ EN = p A ∗ B + (1 − p) B ∗ B Symmetrically, EN = p A ∗ A + (1 − p) B ∗ A Equating the two expressions for EN, p : (1 − p) = (B ∗ B − B ∗ A) : (A ∗ A − A ∗ B) = (18 − 0) : (20 − 10) = 9 : 5 This alternative calculation also yields EN = 90/7.

170 Prof. Bob Li The general problem Let Y_1, Y_2, …, Y_n, … be i.i.d. random variables representing outcomes of repeated experiments. Given a collection of sequence patterns: T,T,G,C,T; A,T,G,C; G,G,G,G,G,G,G; C,C,A; C,A,T,C What is the probability for each pattern to win the race? What is the average waiting time until a winner emerges?

171 The general problem Let Y_1, Y_2, …, Y_n, … be i.i.d. random variables representing outcomes of repeated experiments. Given a collection of sequence patterns: T,T,G,C,T; A,T,G,C; G,G,G,G,G,G,G; C,C,A; C,A,T,C What is the probability for each pattern to win the race? What is the average waiting time until a winner emerges? Any number of patterns Uneven lengths Non-uniform probability distribution Non-binary: A, T, G, C, … Seems a hard problem!

172 Prof. Bob Li The Main Theorem [Li '80] Main Theorem. Let the random variable N be the waiting time of the i.i.d. process Y_1, Y_2, …, till any of the n competing patterns A_1, A_2, …, A_n appears. Denote by p_i the winning probability of A_i. Then, for every j = 1, …, n, p_1 (A_1 ∗ A_j) + p_2 (A_2 ∗ A_j) + … + p_n (A_n ∗ A_j) = EN These n equations, together with p_1 + p_2 + … + p_n = 1, determine the n + 1 unknowns p_1, …, p_n and EN. // In my lifetime, I have rarely had such a simple solution to a seemingly formidable problem.
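Assuming the linear-system formulation stated above, the theorem mechanizes readily. The sketch below (my own code and naming) solves the n + 1 equations with numpy and recovers p = 9/14 and EN = 90/7 for the THTH-vs-HTHH race:

```python
# Solve: for each j, sum_i p_i * (A_i * A_j) = EN, together with sum_i p_i = 1.
import numpy as np

def correlation(A, B, prob):
    total = 0.0
    for j in range(1, min(len(A), len(B)) + 1):
        if A[len(A) - j:] == B[:j]:
            denom = 1.0
            for letter in B[:j]:
                denom *= prob[letter]
            total += 1.0 / denom
    return total

def solve_race(patterns, prob):
    n = len(patterns)
    M = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    for j in range(n):                 # fairness equation for the team on A_j
        for i in range(n):
            M[j, i] = correlation(patterns[i], patterns[j], prob)
        M[j, n] = -1.0                 # ... minus EN equals 0
    M[n, :n] = 1.0                     # the winning probabilities sum to 1
    rhs[n] = 1.0
    x = np.linalg.solve(M, rhs)
    return x[:n], x[n]                 # (p_1, ..., p_n), EN

probs, EN = solve_race(["THTH", "HTHH"], {"H": 1/2, "T": 1/2})
print(probs)   # -> [0.642857... 0.357142...]  i.e. 9/14 and 5/14
print(EN)      # -> 12.857...                  i.e. 90/7
```

The same routine handles any number of patterns, uneven lengths, and non-uniform alphabets, e.g. the DNA patterns on the earlier slide.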

173 Prof. Bob Li The Main Theorem [Li '80] Main Theorem+. Let a pattern A be present since the beginning. Let the random variable N be the stopping time of the i.i.d. process Y_1, Y_2, …, till any of the n competing patterns A_1, A_2, …, A_n appears. Denote by p_i the winning probability of A_i. Then, for every j = 1, …, n, p_1 (A_1 ∗ A_j) + … + p_n (A_n ∗ A_j) − A ∗ A_j = EN together with p_1 + … + p_n = 1.

174 Prof. Bob Li A lemma to the Main Theorem+ Lemma. Let a pattern A be present since the beginning. Then, the expected waiting time for a pattern B is B ∗ B − A ∗ B // provided the pattern B is not a connected subsequence of the pattern A The lemma is the special case of Main Theorem+ with only one competing pattern A_1 = B.

175 Prof. Bob Li Earlier results on fair coin-toss ⇐ Applying the aforementioned Main Theorem to just one or two patterns in fair coin-toss John Conway first discovered the integer binary(δ_k … δ_2 δ_1) in fair coin-toss and called it the leading number L(A, B). His computation results below were quoted in Martin Gardner’s “Mathematical Games” column in Scientific American, 1974: The average waiting time for a pattern B = 2 × L(B, B). The odds for pattern B to precede pattern A are [L(A, A) − L(A, B)] : [L(B, B) − L(B, A)]

176 Prof. Bob Li Earlier results on coin-toss ⇐ Applying the Main Theorem to particular coin-toss patterns Even earlier, William Feller found that, in fair coin-toss: Average waiting time for HHHHHH = 126 Average waiting time for HHTTHH = 70 He also considered the biased coin with P{H} = p and P{T} = q and, through a lengthy markov-chain argument, derived P{a run of m H’s precedes a run of n T’s}.

177 Mathematical irony The more general the concept, the more transparent the theory: any number of patterns, uneven lengths, non-binary alphabets, non-uniform probability distributions.

