Markov chains. Assume a gene that has three alleles A, B, and C, which can mutate into each other. The transition probabilities are collected in a transition matrix (probability matrix). In a left probability matrix the column sums add to 1; in a right probability matrix the row sums add to 1. Transition matrices are always square. The main diagonal contains the probabilities of no change. In the example matrix (rows and columns labelled A, B, C), 68% of A stays A, 12% mutates into B, and 20% into C; 7% mutates from B to A and 10% from C to A.
Calculating probabilities. The entries of $P$ are the probabilities to reach another state in the next step. The entries of $P^2$ are the probabilities to reach another state in exactly two steps. The probability to reach any state in exactly n steps is given by the entries of $P^n$.
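The n-step rule can be checked numerically. The following is a minimal pure-Python sketch, assuming a left (column-stochastic) matrix in which entry `P[i][j]` is the probability of moving from allele j to allele i. Only the first column and the first row are taken from the slide; the other four entries are hypothetical values chosen so that every column sums to 1, and the helper names `mat_mul` and `mat_pow` are made up for this illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Return p multiplied by itself n times (n = 0 gives the identity)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


# Columns = "from" allele, rows = "to" allele, in the order A, B, C.
# First column (0.68, 0.12, 0.20) and first row (0.68, 0.07, 0.10) are from
# the slide; the remaining entries are ASSUMED for the sake of the example.
P = [[0.68, 0.07, 0.10],
     [0.12, 0.78, 0.05],
     [0.20, 0.15, 0.85]]

P2 = mat_pow(P, 2)   # probabilities for exactly two steps
P10 = mat_pow(P, 10) # probabilities for exactly ten steps
```

Here `P2[0][0]` is the probability that an A allele is again A after exactly two steps, counting every intermediate route.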
Assume, for instance, that you have a virus with N strains. Assume further that in each generation strain i mutates to another strain j with probability $a_{i \to j}$. The probability to stay the same is therefore $1 - \sum_{j \ne i} a_{i \to j}$. What is the probability that after k generations the virus is the same strain as at the beginning?
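Initial allele frequencies and allele frequencies in the first generation. Given the initial allele frequencies, what are the frequencies in the next generation? They follow from multiplying the transition matrix by the vector of initial frequencies: $X_1 = P X_0$.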
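A one-generation update can be sketched as a matrix-vector product. The example assumes a column-stochastic matrix (columns sum to 1); the matrix entries beyond the slide's first row and column, the initial frequencies `x0`, and the function name `next_generation` are all hypothetical.

```python
def next_generation(p, x):
    """Apply a column-stochastic matrix p to a frequency vector x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]


# Order of states: A, B, C. Entries not stated on the slide are ASSUMED.
P = [[0.68, 0.07, 0.10],
     [0.12, 0.78, 0.05],
     [0.20, 0.15, 0.85]]

x0 = [0.5, 0.3, 0.2]          # hypothetical initial allele frequencies
x1 = next_generation(P, x0)   # frequencies in the first generation
```

Because every column of `P` sums to 1, the frequencies in `x1` again add to 1.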
A Markov chain is a process where the state at step n depends only on the transition probabilities and the realized value at step n-1. A Markov chain doesn't have a memory. Andrey Markov (1856-1922). In reality, transition probabilities might change; the model assumes constant transition probabilities.
Does our mutation process above reach stable allele frequencies, or do they change forever? We get stable frequencies if $P X_n = X_n$. $X_n$ is then a steady-state (stationary probability, or equilibrium) vector, and the associated eigenvalue is 1. For an irreducible, aperiodic chain the equilibrium vector is independent of the initial conditions. The largest eigenvalue (principal eigenvalue) of every probability matrix equals 1, and there is an associated stationary probability vector that defines the equilibrium conditions (Perron-Frobenius theorem).
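One way to see the equilibrium is to apply the matrix repeatedly until the frequencies stop changing (power iteration). This is a sketch under assumptions rather than the method used on the slides: the matrix is column-stochastic, the entries not given on the first slide are hypothetical, and the names `step` and `stationary` are invented here.

```python
def step(p, x):
    """One generation: multiply column-stochastic p by frequency vector x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]


def stationary(p, x, tol=1e-12, max_iter=10_000):
    """Iterate x -> p x until successive vectors differ by less than tol."""
    for _ in range(max_iter):
        nxt = step(p, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("power iteration did not converge")


# Order of states: A, B, C. Entries not stated on the slide are ASSUMED.
P = [[0.68, 0.07, 0.10],
     [0.12, 0.78, 0.05],
     [0.20, 0.15, 0.85]]

u_from_a = stationary(P, [1.0, 0.0, 0.0])  # start with allele A only
u_from_c = stationary(P, [0.0, 0.0, 1.0])  # start with allele C only
```

Both starting points end at the same vector, which illustrates that for this all-positive matrix the equilibrium does not depend on the initial frequencies.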
Eigenvalues and eigenvectors of probability matrices. Column sums of probability matrices are 1; row sums might be higher. The eigenvalues of probability matrices and their transposes are identical. One of the eigenvalues of a probability matrix is 1. If one of the diagonal entries of P is 1, the matrix is called absorbing. In this case the eigenvector of the largest eigenvalue contains only zeros and one 1. Absorbing chains become monodominant, that is, dominated by one element. To get frequencies, the eigenvector has to be rescaled (normalized).
Normalizing the stationary state vector. Frequencies have to add to unity! The stationary frequencies are obtained by dividing each entry of the eigenvector by the sum of all its entries.
Final frequencies. The eigenvector entries have to be rescaled so that they sum to 1. N=1000
Do all Markov chains converge? Recurrent and aperiodic chains are called ergodic. The probability matrix theorem states that every irreducible ergodic transition matrix has a steady-state vector T to which the process converges. In such a chain you can leave every state. If a state D cannot be left, the chain is absorbing. (Figure panels: closed part, recurrent part, periodic chain.)
Absorbing chains. A chain is called absorbing if it contains states without exit; in the example it is impossible to leave state D. The other states are called transient. Any absorbing Markov chain finally converges to the absorbing states. (Figure: states A, B, C, D, divided into a closed part and an absorbing part.)
The time to reach the absorbing state. Assume a drunkard walking randomly through five streets. In the first street is his home, in the last a bar. At either home or bar he stays. (Figure: five states in a row from Home to Bar, with transition probabilities of 0.5.)
The canonical form. We rearrange the transition matrix to have the s absorbing states in the upper left corner and the t transient states in the lower right corner. This gives four compartments, one of which is the transient part. After n steps, the unknown matrix in the result contains information about the probabilities of reaching an absorbing state from state B, C, or D.
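The block structure described above can be written out explicitly. The formulas on the slide were lost in extraction, so the following is a reconstruction under assumptions: a column-stochastic matrix, $I$ for the $s \times s$ identity block, $Q$ for the transient part, and $R$ as an assumed name for the block of one-step probabilities from transient to absorbing states.

```latex
P =
\begin{pmatrix}
I & R \\
0 & Q
\end{pmatrix},
\qquad
P^{n} =
\begin{pmatrix}
I & R\,\bigl(I + Q + Q^{2} + \dots + Q^{n-1}\bigr) \\
0 & Q^{n}
\end{pmatrix}
```

The upper right block of $P^n$ is the "unknown matrix": its entries are the probabilities of having been absorbed within n steps when starting from each transient state.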
Multiplication of probabilities gives ever smaller values, so $Q^n \to 0$ as n grows. The remaining sum is a simple geometric series: $I + Q + Q^2 + \dots = (I - Q)^{-1} = N$, where $I$ is the identity matrix. The entries $b_{ij}$ of the matrix B contain the probabilities of ending in absorbing state i when started in state j. The entries $n_{ij}$ of the fundamental matrix N of Q contain the expected number of times the process is in state i when started in state j.
Summing the entries $n_{ij}$ of N over all states i gives the expected number of steps the chain spends in the transient states when started in state j (afterwards it falls into the absorbing state). t is the vector of these sums: for each starting state it gives the expected number of steps before the chain is absorbed. The drunkard's walk: the expected number of steps to reach the absorbing state, and the probability of reaching the absorbing state from any of the transient states.
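The drunkard's walk is small enough to compute exactly. The sketch below assumes the walk moves to either neighbouring street with probability 1/2, uses the column convention (entry [i][j] is the probability of going from j to i), and names the transient-to-absorbing block `R`; the Gauss-Jordan helper `invert` is written for this example and is not taken from the slides.

```python
from fractions import Fraction


def invert(m):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(m)
    # Augment m with the identity matrix on the right.
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Clear the pivot column in all other rows.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


half, zero = Fraction(1, 2), Fraction(0)

# Transient states: streets 2, 3, 4 (columns = from, rows = to).
Q = [[zero, half, zero],
     [half, zero, half],
     [zero, half, zero]]
# Absorbing states: Home (row 0) and Bar (row 1), reached from streets 2, 3, 4.
R = [[half, zero, zero],
     [zero, zero, half]]

I_minus_Q = [[Fraction(int(i == j)) - Q[i][j] for j in range(3)] for i in range(3)]
N = invert(I_minus_Q)  # fundamental matrix N = (I - Q)^-1

# Expected number of steps before absorption, per starting street (column sums).
t = [sum(N[i][j] for i in range(3)) for j in range(3)]
# Absorption probabilities B = R N (row 0: ends at Home, row 1: ends at Bar).
B = [[sum(R[a][i] * N[i][j] for i in range(3)) for j in range(3)] for a in range(2)]
```

Starting in the middle street, the walk takes 4 steps on average and ends at home or at the bar with equal probability; starting next to home it takes 3 steps and ends at home with probability 3/4.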
Expected return (recurrence) times. If we start at state D, how long does it take on average to return to D? The rescaled eigenvector u of the probability matrix P gives the steady-state frequencies of being in state i. The expected return time $t_{ii}$ from state i back to i is given by the inverse of the i-th element $u_i$ of the eigenvector u: $t_{ii} = 1/u_i$. In the long run it takes about 9 steps to return to D. (Figure: transition graph of the states A, B, C, D, E with edge probabilities 0.33, 0.25, 0.05, 0.15, 0.25, 0.50, 0.35.)
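The return-time rule can be tried on a chain small enough to check by hand. The five-state matrix of the slide is not preserved in the transcript, so this sketch uses a hypothetical two-state, column-stochastic chain; the function names are also invented for the example.

```python
def step(p, x):
    """Multiply column-stochastic matrix p by frequency vector x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]


def stationary(p, tol=1e-13, max_iter=10_000):
    """Power iteration from uniform frequencies, rescaled to sum to 1."""
    n = len(p)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(p, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            total = sum(nxt)
            return [v / total for v in nxt]
        x = nxt
    raise RuntimeError("power iteration did not converge")


# HYPOTHETICAL chain: state 0 is left with probability 0.1,
# state 1 is left with probability 0.5 (columns = from, rows = to).
P = [[0.9, 0.5],
     [0.1, 0.5]]

u = stationary(P)                       # steady-state frequencies
return_times = [1.0 / ui for ui in u]   # expected return time t_ii = 1 / u_i
```

For state 1 the result can be confirmed directly: with probability 0.5 the chain returns in one step, otherwise it needs one step out plus on average 10 steps to come back, giving 1 + 0.5 × 10 = 6.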
First passage times in ergodic chains. If we start at state D, how long does it take on average to reach state A? (Figure: transition graph of the states A, B, C, D, E with edge probabilities 0.33, 0.25, 0.05, 0.15, 0.25, 0.50, 0.35.) We have to consider all possible paths from D to A, for example:
D → C → A: 0.25 × 0.05 = 0.0125
D → E → B → A: 0.25 × 0.33 × 0.15 = 0.012375
D → E → B → C → A: 0.25 × 0.33 × 0.35 × 0.05 = 0.00144375
D → E → D → C → A: 0.25 × 0.33 × 0.25 × 0.05 = 0.00103125
and so on. The expected number of steps from point j to point k is the sum over all such paths of the path length weighted by the path probability, which is impractical to enumerate by hand. The fundamental matrix of an ergodic chain avoids the enumeration: $N = (I - P + W)^{-1}$, where W is the matrix containing only the rescaled stationary point vector. Applied to the original probability matrix P, the fundamental matrix N contains information on the expected number of times the process is in state i when started in state j. The expected number of steps $t_{jk}$ to get from j to k is obtained from the entries of the fundamental matrix N divided by the respective entry of the (rescaled) stationary point vector.
You have sunny, cloudy, and rainy days with respective transition probabilities. How long does it take on average for a sunny day to follow a rainy day? How long does it take on average until a sunny day comes back?
Probabilities of DNA substitution. We assume equal substitution probabilities. If the probability of each single substitution is p (figure: the nucleotides A, T, C, G connected by arrows labelled p), the probability that A mutates to T, C, or G is $P_{\neg A} = p + p + p = 3p$. The probability of no mutation is $p_A = 1 - 3p$, so that $p(A \to T) + p(A \to C) + p(A \to G) + p(A \to A) = 1$. For independent events the probabilities multiply: the probability that A mutates to T and C to G is $P_{AC} = p \times p$. The construction of evolutionary trees from DNA sequence data.
The probability matrix has rows and columns A, T, C, G, with $1 - 3p$ on the main diagonal and $p$ everywhere else. What is the probability that after 5 generations A did not change? The Jukes-Cantor model (JC69) assumes that all substitution probabilities are equal.
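The five-generation question can be answered by raising the matrix to the fifth power. The sketch below uses exact fractions; the value p = 1/100 is a hypothetical choice, and the helper names are made up. It distinguishes two readings of "did not change": being A again after five generations (back-mutations allowed) and never having mutated at all.

```python
from fractions import Fraction


def jc69_matrix(p):
    """4 x 4 matrix with 1 - 3p on the diagonal and p elsewhere."""
    return [[1 - 3 * p if i == j else p for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Return m multiplied by itself n times, starting from the identity."""
    size = len(m)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


p = Fraction(1, 100)  # HYPOTHETICAL per-generation substitution probability

P5 = mat_pow(jc69_matrix(p), 5)
prob_a_is_a = P5[0][0]           # A is A after 5 generations (back-mutation allowed)
never_mutated = (1 - 3 * p) ** 5 # no substitution in any of the 5 generations

# Closed form for this symmetric matrix: 1/4 + 3/4 * (1 - 4p)^n
closed_form = Fraction(1, 4) + Fraction(3, 4) * (1 - 4 * p) ** 5
```

The matrix entry agrees exactly with the closed form, and it is slightly larger than `never_mutated` because paths such as A → T → A are included.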
Arrhenius model. The Jukes-Cantor model assumes equal substitution probabilities among the 4 nucleotides A, T, G, C. From the substitution probability after time t we obtain the transition matrix (substitution matrix). Four quantities are needed: the probability that nothing changes, which is the zero term of the Poisson distribution; the probability of at least one substitution; the probability to reach a nucleotide from any other; and the probability that a nucleotide doesn't change after time t.
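The formulas of this slide did not survive extraction. The following is a reconstruction under stated assumptions, not necessarily the slide's own parametrization: substitution events arrive as a Poisson process with an assumed rate $\lambda$, and each event replaces the nucleotide by one drawn uniformly from all four (including the current one).

```latex
\begin{aligned}
P(\text{no event in } t) &= e^{-\lambda t}
  && \text{(zero term of the Poisson distribution)} \\
P(\text{at least one event in } t) &= 1 - e^{-\lambda t} \\
P_{i \to j}(t) &= \tfrac{1}{4}\bigl(1 - e^{-\lambda t}\bigr), \quad j \ne i \\
P_{i \to i}(t) &= e^{-\lambda t} + \tfrac{1}{4}\bigl(1 - e^{-\lambda t}\bigr)
  = \tfrac{1}{4} + \tfrac{3}{4}\, e^{-\lambda t}
\end{aligned}
```

As a check, $3 P_{i \to j}(t) + P_{i \to i}(t) = 1$ for every t, and for long times every entry tends to 1/4.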
Probability for a single difference. This gives the mean time to get x different sites in a sequence of n nucleotides. It is also a measure of distance that depends only on the number of substitutions. What is the probability of n differences after time t? We use the principle of maximum likelihood and the Bernoulli distribution.
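The slide's own formula is not preserved, so as an illustration the sketch below uses the standard Jukes-Cantor distance, $d = -\tfrac{3}{4}\ln\bigl(1 - \tfrac{4}{3}\,x/n\bigr)$, which estimates the number of substitutions per site from x observed differences among n sites. The function names and the example counts are hypothetical.

```python
import math


def jc69_distance(x, n):
    """Jukes-Cantor distance from x differing sites out of n compared sites."""
    p_hat = x / n  # observed proportion of differing sites
    if p_hat >= 0.75:
        # At 3/4 the sequences are saturated and the logarithm is undefined.
        raise ValueError("proportion of differences must be below 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)


def expected_difference(d):
    """Inverse relation: expected proportion of differing sites at distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


d = jc69_distance(10, 100)  # hypothetical: 10 differences in 100 sites
```

The distance is somewhat larger than the observed proportion of 0.1 because it corrects for sites that were hit more than once.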
Phylogenetic trees are the basis of any systematic classification. (Figure: phylogenetic tree of Gorilla, Pan paniscus, Pan troglodytes, Homo sapiens, and Homo neanderthalensis; axes: time and divergence as the number of substitutions.)