1 Markov Chains as a Learning Tool

2 Markov Process: Simple Example

Weather:
- raining today: 40% rain tomorrow, 60% no rain tomorrow
- not raining today: 20% rain tomorrow, 80% no rain tomorrow

Stochastic finite state machine with two states, rain and no rain:
rain -> rain 0.4, rain -> no rain 0.6, no rain -> rain 0.2, no rain -> no rain 0.8.

3 Markov Process: Simple Example (cont.)

Weather:
- raining today: 40% rain tomorrow, 60% no rain tomorrow
- not raining today: 20% rain tomorrow, 80% no rain tomorrow

The transition matrix:

            Rain   No rain
  Rain      0.4    0.6
  No rain   0.2    0.8

Stochastic matrix: rows sum to 1.
Doubly stochastic matrix: rows and columns sum to 1.
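The weather chain above can be sketched in a few lines of code; the matrix values come from the slide, while the function names and the five-step walk are illustrative choices.

```python
import random

# The weather chain from the slide as a transition matrix,
# with a row-sum check and a short simulated walk.
P = {
    "rain":    {"rain": 0.4, "no rain": 0.6},
    "no rain": {"rain": 0.2, "no rain": 0.8},
}

def is_stochastic(matrix, tol=1e-9):
    """Every row of a stochastic matrix must sum to 1."""
    return all(abs(sum(row.values()) - 1.0) < tol for row in matrix.values())

def step(state, rng):
    """Sample tomorrow's weather given today's state."""
    return "rain" if rng.random() < P[state]["rain"] else "no rain"

rng = random.Random(0)
state = "rain"
for _ in range(5):          # simulate five days of weather
    state = step(state, rng)
```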

4 Markov Process

Markov property: X_{t+1}, the state of the system at time t+1, depends only on the state of the system at time t:
Pr[X_{t+1} | X_1, ..., X_t] = Pr[X_{t+1} | X_t].

Stationary assumption: the transition probabilities are independent of the time t.

Let X_i be the weather of day i, 1 <= i <= t. In general the probability of X_{t+1} could depend on all of X_1, ..., X_t; the Markov property says that X_t alone determines it.

5 Markov Process: Gambler's Example

- The gambler starts with $10 (the initial state).
- At each play, one of the following happens:
  - the gambler wins $1 with probability p
  - the gambler loses $1 with probability 1-p
- The game ends when the gambler goes broke ($0) or gains a fortune of $100. (Both 0 and 100 are absorbing states.)

States 0, 1, 2, ..., 99, 100, starting at 10; from each interior state the chain moves up with probability p and down with probability 1-p.
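The gambler's chain is easy to explore by Monte Carlo simulation. A minimal sketch, with the start ($10), goal ($100), and win probability taken from the slide; the number of trials is an arbitrary choice.

```python
import random

def play(start=10, goal=100, p=0.5, rng=None):
    """Run one game to absorption; return the absorbing state (0 or goal)."""
    rng = rng or random.Random()
    x = start
    while 0 < x < goal:
        x += 1 if rng.random() < p else -1
    return x

rng = random.Random(42)
games = [play(start=10, goal=100, p=0.5, rng=rng) for _ in range(200)]
ruin_rate = games.count(0) / len(games)
# For a fair game (p = 0.5) the exact ruin probability from $10 is
# 1 - 10/100 = 0.9, so the estimate should land in that vicinity.
```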

6 Markov Process

- A Markov process is described by a stochastic FSM.
- A Markov chain is a random walk on this graph (a distribution over paths).
- The edge weights give us the one-step transition probabilities, e.g. Pr[X_{t+1} = i+1 | X_t = i] = p.
- We can ask more complex questions, like multi-step probabilities Pr[X_{t+2} = j | X_t = i].

7 Markov Process: Coke vs. Pepsi Example

Given that a person's last cola purchase was Coke, there is a 90% chance that his next cola purchase will also be Coke. If a person's last cola purchase was Pepsi, there is an 80% chance that his next cola purchase will also be Pepsi.

The transition matrix:

          Coke   Pepsi
  Coke    0.9    0.1
  Pepsi   0.2    0.8

8 Markov Process: Coke vs. Pepsi Example (cont.)

Given that a person is currently a Pepsi purchaser, what is the probability that he will purchase Coke two purchases from now?

Pr[Pepsi -> ? -> Coke] = Pr[Pepsi -> Coke -> Coke] + Pr[Pepsi -> Pepsi -> Coke]
                       = 0.2 * 0.9 + 0.8 * 0.2 = 0.34
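The two-step computation above is the (Pepsi, Coke) entry of the squared transition matrix. A sketch, indexing state 0 as Coke and state 1 as Pepsi:

```python
# Square the transition matrix and read off the (Pepsi, Coke) entry.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.9, 0.1],   # from Coke:  90% Coke, 10% Pepsi
     [0.2, 0.8]]   # from Pepsi: 20% Coke, 80% Pepsi

P2 = matmul(P, P)
# P2[1][0] is Pr[Pepsi -> ? -> Coke] = 0.2*0.9 + 0.8*0.2 = 0.34
```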

9 Markov Process: Coke vs. Pepsi Example (cont.)

Given that a person is currently a Coke purchaser, what is the probability that he will buy Pepsi at the third purchase from now? (The answer is the (Coke, Pepsi) entry of P^3, the cube of the transition matrix.)
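The transcript does not preserve the slide's numeric answer; it can be recovered by cubing the transition matrix (state 0 = Coke, state 1 = Pepsi):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.9, 0.1],
     [0.2, 0.8]]
P3 = matmul(matmul(P, P), P)
# P3[0][1] = Pr[a Coke buyer purchases Pepsi three purchases from now] = 0.219
```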

10 Markov Process: Coke vs. Pepsi Example (cont.)

Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?

Let Q_i be the distribution in week i, so Q_0 = (0.6, 0.4) is the initial distribution. Then
Q_3 = Q_0 * P^3 = (0.6438, 0.3562),
i.e. Pr[X_3 = Coke] = 0.6 * 0.781 + 0.4 * 0.438 = 0.6438.
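The Q_3 = Q_0 P^3 computation above can be sketched by propagating the initial distribution three steps through the chain:

```python
P = [[0.9, 0.1],
     [0.2, 0.8]]
Q = [0.6, 0.4]            # Q_0, the week-0 distribution (60% Coke, 40% Pepsi)
for _ in range(3):        # Q_{i+1} = Q_i * P
    Q = [sum(Q[i] * P[i][j] for i in range(2)) for j in range(2)]
# Q is now Q_3 = (0.6438, 0.3562)
```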

11 Markov Process: Coke vs. Pepsi Example (cont.)

Simulation: plotting Pr[X_i = Coke] against the week i shows that, whatever the starting distribution, the chain converges to the stationary distribution, in which the Coke fraction is 2/3.
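A sketch of the convergence the plot shows: iterate Q_{i+1} = Q_i P from any starting point and watch Q settle at (2/3, 1/3). (Solving pi = pi * P directly gives pi_Coke = 0.2 / (0.1 + 0.2) = 2/3.)

```python
P = [[0.9, 0.1],
     [0.2, 0.8]]
Q = [0.0, 1.0]            # start with everyone drinking Pepsi
for _ in range(200):      # power iteration: Q_{i+1} = Q_i * P
    Q = [sum(Q[i] * P[i][j] for i in range(2)) for j in range(2)]
# Q converges to the stationary distribution (2/3, 1/3),
# independent of the starting distribution.
```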

12 How to obtain the stochastic matrix?

- Solve the linear equations that the transition probabilities must satisfy, or
- Learn it from examples, e.g. count what letters follow what letters in English words: mast, tame, same, teams, team, meat, steam, stem.
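The counting step can be sketched directly from the slide's word list; '@' marks the start of a word, and '$' stands in here for the slide's end marker '\0'.

```python
from collections import defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]

# counts[cur][nxt] = number of times letter nxt follows letter cur
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    seq = "@" + w + "$"
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
# e.g. counts["a"]["m"] == 5: 'm' follows 'a' in tame, same, teams, team, steam
```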

13 How to obtain the stochastic matrix?

Counts table vs. stochastic matrix. Rows are the current letter, columns the next letter; @ marks the start of a word and \0 the end.

The counts C from the eight example words:

       a   s   t   m   e   \0
  a    0   1   1   5   0   0
  e    4   0   0   1   0   2
  m    1   1   0   0   3   3
  s    1   0   3   0   0   1
  t    1   0   0   0   4   2
  @    0   3   3   2   0   0

The stochastic matrix P (each row of C divided by its row sum):

       a    s    t    m    e    \0
  a    0    1/7  1/7  5/7  0    0
  e    4/7  0    0    1/7  0    2/7
  m    1/8  1/8  0    0    3/8  3/8
  s    1/5  0    3/5  0    0    1/5
  t    1/7  0    0    0    4/7  2/7
  @    0    3/8  3/8  2/8  0    0
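Normalizing the counts row by row yields the stochastic matrix; a sketch, rebuilding the counts from the same word list ('$' again plays the role of '\0'):

```python
from collections import defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    seq = "@" + w + "$"
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1

# Divide each row of the counts by its row sum to get the stochastic matrix.
P = {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
     for cur, row in counts.items()}
# e.g. P["s"]["t"] == 3/5, matching the 's' row of the table on the slide.
```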

14 Application of the Stochastic Matrix

Using the stochastic matrix to generate a random word:
- Generate a random first letter according to the @ row.
- For each current letter, generate a random next letter according to that letter's row, until the end marker \0 is drawn.

To sample, use the cumulative-count table A: if C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + ... + C[r,j].

       a   s   t   m   e   \0
  a    -   1   2   7   -   -
  e    4   -   -   5   -   7
  m    1   2   -   -   5   8
  s    1   -   4   -   -   5
  t    1   -   -   -   5   7
  @    -   3   6   8   -   -

15 Application of the Stochastic Matrix

Using the cumulative table A to generate a random word:
- Generate a random first letter: generate a random number x between 1 and 8. If 1 <= x <= 3, the letter is 's'; if 4 <= x <= 6, the letter is 't'; otherwise, it is 'm'.
- For each current letter, generate a random next letter: suppose the current letter is 's', and generate a random number x between 1 and 5. If x = 1, the next letter is 'a'; if 2 <= x <= 4, the next letter is 't'; otherwise, the current letter is an ending letter.

       a   s   t   m   e   \0
  a    -   1   2   7   -   -
  e    4   -   -   5   -   7
  m    1   2   -   -   5   8
  s    1   -   4   -   -   5
  t    1   -   -   -   5   7
  @    -   3   6   8   -   -

If C[r,j] > 0, let A[r,j] = C[r,1] + C[r,2] + ... + C[r,j].
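The generation procedure above can be sketched end to end: draw x in 1..(row sum) and walk the counts cumulatively, which is exactly the lookup against table A ('$' stands in for the slide's '\0').

```python
import random
from collections import defaultdict

words = ["mast", "tame", "same", "teams", "team", "meat", "steam", "stem"]
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    seq = "@" + w + "$"
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1

def sample_next(cur, rng):
    """Draw x in 1..row_sum and walk the cumulative counts, as with A."""
    row = counts[cur]
    x = rng.randint(1, sum(row.values()))
    for nxt, c in row.items():
        x -= c
        if x <= 0:
            return nxt

def random_word(rng):
    word, cur = "", "@"
    while True:
        cur = sample_next(cur, rng)
        if cur == "$":      # end marker drawn: the word is complete
            return word
        word += cur

rng = random.Random(0)
samples = [random_word(rng) for _ in range(5)]
```

Every generated word starts with 's', 't', or 'm' (the only letters the @ row allows) and uses only letters seen in the training words.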

16 Supervised vs. Unsupervised

- Decision tree learning is "supervised learning": we know the correct output of each training example.
- Learning based on Markov chains is "unsupervised learning": we do not know which "next letter" is the correct output; we only learn its distribution.

17 K-Nearest Neighbor

Features:
- All instances correspond to points in an n-dimensional Euclidean space.
- Classification is delayed until a new instance arrives.
- Classification is done by comparing the feature vectors of the different points.
- The target function may be discrete- or real-valued.

18 1-Nearest Neighbor

19 3-Nearest Neighbor

20 Example: Identify Animal Type

14 examples, 10 attributes, 5 types. What is the type of this new animal?

21 K-Nearest Neighbor

- An arbitrary instance x is represented by (a_1(x), a_2(x), ..., a_n(x)), where a_i(x) denotes the i-th feature of x.
- Euclidean distance between two instances:
  d(x_i, x_j) = sqrt( sum_{r=1}^{n} (a_r(x_i) - a_r(x_j))^2 )
- For a continuous-valued target function, predict the mean value of the k nearest training examples.
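A minimal k-NN sketch matching the slide: Euclidean distance over feature vectors, majority vote among the k nearest. The toy training set is made up for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)"""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label) pairs; majority vote of the k nearest."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set: two clusters in the plane.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B")]
```

Note that all the work happens at query time: "learning" is just storing `train`, which is why k-NN is called a lazy method.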

22 Distance-Weighted Nearest Neighbor Algorithm

- Assign weights to the neighbors based on their distance from the query point; the weight may be the inverse square of the distance.
- All training points may influence a particular query; letting every training point vote this way is Shepard's method.
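A sketch of the inverse-square weighting mentioned above: each neighbor votes with weight 1/d^2, and with k set to the whole training set this is Shepard's method. The small epsilon guarding against division by zero, and the toy data, are assumptions for illustration.

```python
import math
from collections import defaultdict

def weighted_classify(query, examples, k=None, eps=1e-12):
    """examples: (feature_vector, label) pairs; k=None uses all points (Shepard's method)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    ranked = sorted(examples, key=lambda ex: dist(query, ex[0]))
    if k is not None:
        ranked = ranked[:k]
    score = defaultdict(float)
    for vec, label in ranked:
        # inverse-square weight: nearer neighbors count for more
        score[label] += 1.0 / (dist(query, vec) ** 2 + eps)
    return max(score, key=score.get)

train = [((0.0, 0.0), "A"), ((0.2, 0.1), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
```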

23 Remarks

+ A highly effective inductive inference method for noisy training data and complex target functions.
+ The target function for the whole space may be described as a combination of less complex local approximations.
+ Learning is very simple: just store the training examples.
- Classification is time-consuming (except 1-NN).

