Dayhoff’s Markov Model of Evolution. Brands of Soup Revisited Brand A Brand B P(B|A) = 2/7 P(A|B) = 2/7.


1 Dayhoff’s Markov Model of Evolution

2 Brands of Soup Revisited
Transition diagram: Brand A and Brand B, with P(B|A) = 2/7 and P(A|B) = 2/7.

3 Brands of Soup Revisited
Transition diagram: Brand A and Brand B, with P(B|A) = p = 2/7 and P(A|B) = p = 2/7.
Conditional probability formulas:
P(A_k) = P(A_{k-1}) (1-p) + P(B_{k-1}) p = 5/7 P(A_{k-1}) + 2/7 P(B_{k-1})
P(B_k) = P(A_{k-1}) p + P(B_{k-1}) (1-p) = 2/7 P(A_{k-1}) + 5/7 P(B_{k-1})

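4 Brands of Soup Revisited
Transition diagram: Brand A and Brand B, with P(B|A) = p = 2/7 and P(A|B) = p = 2/7.
Conditional probability formulas:
P(A_k) = P(A_{k-1}) (1-p) + P(B_{k-1}) p = 5/7 P(A_{k-1}) + 2/7 P(B_{k-1})
P(B_k) = P(A_{k-1}) p + P(B_{k-1}) (1-p) = 2/7 P(A_{k-1}) + 5/7 P(B_{k-1})
Matrix representation (the coefficients of the two formulas, one row per starting brand):
[ 5/7  2/7 ]
[ 2/7  5/7 ]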
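The two conditional probability formulas can be iterated directly. The following is a minimal sketch, not part of the original slides; the starting distribution (everyone buys Brand A) and the names `step` and `history` are assumptions made for illustration. Exact fractions are used so the 5/7 and 2/7 coefficients stay visible.

```python
from fractions import Fraction

p = Fraction(2, 7)  # switching probability from the slide


def step(pa, pb):
    """Apply the two conditional probability formulas once."""
    new_pa = pa * (1 - p) + pb * p
    new_pb = pa * p + pb * (1 - p)
    return new_pa, new_pb


# Assumed starting point (not from the slides): everyone buys Brand A.
pa, pb = Fraction(1), Fraction(0)
history = [(pa, pb)]
for _ in range(3):
    pa, pb = step(pa, pb)
    history.append((pa, pb))

for k, (a, b) in enumerate(history):
    print(k, a, b)
```

After one step the distribution is (5/7, 2/7), which is simply the Brand A row of the matrix; each further step mixes the two brands a little more.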


8 Markov Processes Can Be Represented by Matrices
E.g., a 3-state process (transition diagram with edge probabilities 1/2, 1/3, 1/4) can be represented with this matrix:

9 Each Step Involves an Inner Product
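As an illustration of the inner-product view, here is a small sketch. The 3-state matrix `M` below is hypothetical (it is not the matrix from the slides), and the row convention (row = current state, column = next state) is an assumption chosen to match the "rows sum to 1" property. The new probability of each state is the inner product of the current distribution with the corresponding column.

```python
from fractions import Fraction as F

# Hypothetical 3-state row-stochastic matrix (NOT the matrix from the slide).
M = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 4), F(1, 4), F(1, 2)],
]


def inner(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def step(dist, matrix):
    """One Markov step: inner product of dist with each column of matrix."""
    n = len(matrix)
    columns = [[matrix[i][j] for i in range(n)] for j in range(n)]
    return [inner(dist, col) for col in columns]


dist = [F(1), F(0), F(0)]      # assumed start: certainly in state 0
after_one = step(dist, M)      # equals row 0 of M
after_two = step(after_one, M)
print(after_one)
print(after_two)
```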

10

11 Markov Matrix Properties
- The probabilities in each row must sum to 1.
- No change = identity matrix.
- If well-behaved*, multiplying the matrix by itself many times converges to a limit.
  – Within each column of the limit matrix all elements are identical, so every row is the same.
  – Each row of the limit matrix gives the “equilibrium probabilities” for the process.
*(1) Every state can transition to every other state at least indirectly, and (2) the greatest common divisor of the cycle lengths in the transition diagram is 1.
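The convergence property can be checked numerically on the two-brand soup matrix. This is a rough sketch in plain Python; the helper `matmul` and the choice of 60 extra multiplications are assumptions for illustration, not something taken from the slides.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Soup matrix from the earlier slides: stay with 5/7, switch with 2/7.
M = [[5 / 7, 2 / 7],
     [2 / 7, 5 / 7]]

power = M
for _ in range(60):            # after the loop, power holds M multiplied 61 times
    power = matmul(power, M)

row_sums = [sum(row) for row in power]
print(power)                   # both rows approach the same equilibrium
print(row_sums)
```

Both rows end up at essentially (1/2, 1/2): in the long run the starting brand no longer matters, which is what "equilibrium probabilities" means here.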

12 Ask Mathematica! Recall m =

13 Margaret Dayhoff
Had a large (for 1978) database of related proteins.
DAYHOFF, M. O., R. M. SCHWARTZ, and B. C. ORCUTT. 1978. A model of evolutionary change in proteins. Pp. 345-352 in M. O. DAYHOFF, ed. Atlas of Protein Sequence and Structure. Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.
Asked “what is the probability that two aligned sequences are related by evolution?”

14 Dayhoff Model Amino acids change over time independently of their position in a protein. (simplifying assumption) The probability of a substitution depends only on the amino acids involved and not on the prior history (Markov model).

15 A Sequence Alignment
>gi|1173266|sp|P44374|RS5_HAEIN 30S ribosomal protein S5
Length = 166
Score = 263 bits (672), Expect = 1e-70
Identities = 154/166 (92%), Positives = 159/166 (95%)

Query: 1   MAHIEKQAGELQEKLIAVNRVSKTVKGGRIFSFTALTVVGDGNGRVGFGYGKAREVPAAI 60
           M++IEKQ GELQEKLIAVNRVSKTVKGGRI SFTALTVVGDGNGRVGFGYGKAREVPAAI
Sbjct: 1   MSNIEKQVGELQEKLIAVNRVSKTVKGGRIMSFTALTVVGDGNGRVGFGYGKAREVPAAI 60

Query: 61  QKAMEKARRNMINVALNNGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV 120
           QKAMEKARRNMINVALN GTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV
Sbjct: 61  QKAMEKARRNMINVALNEGTLQHPVKGVHTGSRVFMQPASEGTGIIAGGAMRAVLEVAGV 120

Query: 121 HNVLAKAYGSTNPINVVRATIDGLENMNSPEMVAAKRGKSVEEILG 166
            NVL+KAYGSTNPINVVRATID L NM SPEMVAAKRGK+V+EILG
Sbjct: 121 RNVLSKAYGSTNPINVVRATIDALANMKSPEMVAAKRGKTVDEILG 166

(Example alignment from a BLAST search)

16 Observed Substitution Frequencies
A
R  30
N 109  17
D 154   0 532
C  33  10   0   0
Q  93 120  50  76   0
E 266   0  94 831   0 422
G 579  10 156 162  10  30 112
H  21 103 226  43  10 243  23  10
I  66  30  36  13  17   8  35   0   3
L  95  17  37   0   0  75  15  17  40 253
K  57 477 322  85   0 147 104  60  23  43  39
M  29  17   0   0   0  20   7   7   0  57 207  90
F  20   7   7   0   0   0   0  17  20  90 167   0  17
P 345  67  27  10   ?  93  40  49  50   7  43   ?   4   7
S 772 137 432  98 117  47  86 450  26  20  32 168  20  40 269
T 590  20 169  57  10  37  31  50  14 129  52 200  28  10  73 696
W   0  27   3   0   0   0   0   0   3   0  13   0   0  10   0  17   0
Y  20   3  36   0  30   0  10   0  40  13  23  10   0 260   0  22  23   6
V 365  20  13  17  33  27  37  97  30 661 303  17  77  10  50  43 186   0  17
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y

17 Building a Markov Model
- From the observed substitution data, Dayhoff et al. were able to estimate the joint probabilities of two amino acids substituting for each other.
- This yields a big symmetric matrix of joint probabilities, M_ab = P(b∩a), dominated by its diagonal elements M_aa.
- But the matrix of joint probabilities P(b∩a) does not represent a Markov process. Recall that the elements of a Markov process’ matrix are conditional probabilities, P(b|a) = P(b∩a) / P(a).
- P(a) is just the probability (frequency) of amino acid a, so each row a of M_ab is divided by the frequency of the corresponding amino acid. The diagonal elements are then all close to 1.
- Dayhoff then adjusts the small off-diagonal elements by a common factor that makes the expected number of amino acid substitutions equal to 1 in 100.
- The diagonal elements are then adjusted to make each row add up to 1, as required by the law of total probability.
- This is the PAM1 Markov matrix (PAM = Point Accepted Mutation; 1 = 1% substitution frequency).
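The construction steps above can be sketched on a toy example. Everything concrete here is an assumption for illustration: the three-letter alphabet `XYZ`, the symmetric pair counts, and the variable names are hypothetical and are not Dayhoff's data. The sketch follows the row convention (each row sums to 1).

```python
# Hypothetical symmetric pair counts for a toy 3-letter alphabet (NOT real data).
letters = "XYZ"
counts = [
    [900, 30, 20],
    [30, 500, 10],
    [20, 10, 480],
]
n = len(letters)

# Joint probabilities P(a and b): counts normalised by the grand total.
total = sum(sum(row) for row in counts)
joint = [[c / total for c in row] for row in counts]

# P(a): the marginal frequency of each letter.
freq = [sum(row) for row in joint]

# Conditional probabilities P(b|a) = P(a and b) / P(a), row by row.
cond = [[joint[a][b] / freq[a] for b in range(n)] for a in range(n)]

# Common factor that makes the expected substitution rate 1 in 100.
raw_rate = sum(freq[a] * cond[a][b]
               for a in range(n) for b in range(n) if a != b)
scale = 0.01 / raw_rate

# Scale the off-diagonals, then set each diagonal so the row sums to 1.
pam1 = []
for a in range(n):
    row = [scale * cond[a][b] if a != b else 0.0 for b in range(n)]
    row[a] = 1.0 - sum(row)
    pam1.append(row)

expected_change = sum(freq[a] * (1.0 - pam1[a][a]) for a in range(n))
print(pam1)
print(expected_change)
```

The final check recomputes the expected fraction of substituted positions from the finished matrix; by construction it should come out at 1%.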

18 Using the PAM Model
- The PAM1 Markov matrix can be multiplied by itself to yield the PAM2 Markov matrix, and again to yield the PAM3 matrix, etc.
- PAM1 is a “unit of evolutionary distance”.
- PAM250 is commonly used. Note that this does not mean 250% of the amino acids have been substituted; it’s more like 80%.
- The PAM Markov matrices arrived at by matrix multiplication need to be converted into the scoring matrices that one would use for BLAST or CLUSTALW.
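Repeated multiplication can be sketched with a toy matrix. The matrix below is an assumption made for illustration (20 equally frequent letters, every substitution equally likely), not the real PAM1, so the printed numbers will not match the real PAM250; the helpers `matmul`, `mat_power`, and `fraction_unchanged` are likewise hypothetical names.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Multiply m by itself n times, starting from the identity matrix."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


SIZE = 20
q = [1.0 / SIZE] * SIZE  # assumed uniform frequencies (not realistic)

# Toy "PAM1": 99% chance of no change, the remaining 1% spread evenly.
toy_pam1 = [[0.99 if i == j else 0.01 / (SIZE - 1) for j in range(SIZE)]
            for i in range(SIZE)]


def fraction_unchanged(matrix, freqs):
    """Expected fraction of positions showing the same letter as at the start."""
    return sum(freqs[a] * matrix[a][a] for a in range(len(freqs)))


toy_pam2 = mat_power(toy_pam1, 2)
toy_pam250 = mat_power(toy_pam1, 250)
print(fraction_unchanged(toy_pam1, q))
print(fraction_unchanged(toy_pam250, q))
```

Even after 250 steps the unchanged fraction stays above zero, because later substitutions can hit positions that have already changed or change them back.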

19 Probability of an Alignment
- In a random model, the two proteins x and y are independent, so the probability of their alignment is the product of the probabilities {q_a} of all the amino acids. (Note that the q_a are not all the same value of 1/20.)
- In a match model, the proteins have descended from a common ancestor protein and the amino acid sequences are no longer independent. In this model, the probability can be expressed through a matrix of joint probabilities {p_ab}. (Note that p_ab = p_ba because neither protein is “first”.)
- Dayhoff and coworkers could estimate these probabilities from the frequencies of amino acid substitutions they observed in their database of evolutionarily related proteins.
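The two models can be written out as follows. This is a sketch in the usual textbook notation, which is an assumption here since the slide's own formulas are not in the transcript: x_i and y_i denote the amino acids at aligned position i of an ungapped alignment, R is the random model, and M is the match model.

\[
P(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i},
\qquad
P(x, y \mid M) = \prod_i p_{x_i y_i}
\]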

20 A Log-Odds Score
- We are interested in the ratio of the match model probability of the alignment to the random model probability.
- In practice, we usually take the log of these quantities for a substitution “scoring” matrix. This changes the multiplications into additions and reduces round-off error.
- S(a,b) is the number you usually see in a substitution matrix. These numbers are usually rounded to integers to ease computation.
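A log-odds score of this kind can be sketched as S(a,b) = scale × log10( p_ab / (q_a q_b) ), rounded to an integer. The two-letter alphabet, the probabilities, and the scale factor of 10 below are all assumptions chosen for illustration, not values from the slides.

```python
import math

# Hypothetical background frequencies and joint probabilities (NOT real data).
# The joint table is symmetric and its marginals equal q.
q = {"X": 0.6, "Y": 0.4}
p = {
    ("X", "X"): 0.5, ("X", "Y"): 0.1,
    ("Y", "X"): 0.1, ("Y", "Y"): 0.3,
}


def score(a, b, scale=10.0):
    """Rounded log-odds score: match-model odds over random-model odds."""
    return round(scale * math.log10(p[(a, b)] / (q[a] * q[b])))


def alignment_score(x, y):
    """Score an ungapped alignment by adding the per-position scores."""
    return sum(score(a, b) for a, b in zip(x, y))


matrix = {pair: score(*pair) for pair in p}
print(matrix)
print(alignment_score("XXY", "XYY"))
```

Pairs that occur more often than chance get positive scores and pairs that occur less often get negative ones, so adding scores along an alignment corresponds to multiplying the odds ratios.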

21 Questions? I will post a Mathematica notebook.

