CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods: Models of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological.


1 CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Distance Matrix Methods: Models of Evolution Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark gorm@cbs.dtu.dk

2 Distance Matrix Methods
1. Construct multiple alignment of sequences
2. Construct table listing all pairwise differences (distance matrix)
3. Construct tree from pairwise distances

Gorilla   : ACGTCGTA
Human     : ACGTTCCT
Chimpanzee: ACGTTTCG

     Go  Hu  Ch
Go   -   4   4
Hu       -   2
Ch           -

[Figure: tree with leaves Go, Hu, Ch and branch lengths 2, 1, 1, 1]
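Step 2 can be illustrated with a minimal Python sketch. It counts mismatching columns for every pair of the three aligned sequences shown on this slide. The function names are illustrative choices, not part of the original presentation, and the sketch assumes the sequences are already aligned to equal length.

```python
from itertools import combinations

# Aligned sequences from the slide (equal length, no gaps).
alignment = {
    "Go": "ACGTCGTA",
    "Hu": "ACGTTCCT",
    "Ch": "ACGTTTCG",
}

def count_differences(seq_a, seq_b):
    """Number of alignment columns where the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)

def distance_matrix(seqs):
    """Pairwise difference counts, keyed by (name_a, name_b)."""
    return {
        (a, b): count_differences(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }

distances = distance_matrix(alignment)
for (a, b), d in distances.items():
    print(a, b, d)
```

The result reproduces the table on the slide: 4 differences for Go/Hu, 4 for Go/Ch, and 2 for Hu/Ch.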

3 Optimal Branch Lengths: Least Squares
Fit between a given tree and the observed distances can be expressed as a "sum of squared differences":

$Q = \sum_{j>i} (D_{ij} - d_{ij})^2$

where $D_{ij}$ is the observed distance between sequences i and j, and $d_{ij}$ is the distance along the tree.
Find the branch lengths that minimize Q: this is the optimal set of branch lengths for this tree.

[Figure: four-taxon tree with terminal branches a (S1), d (S3), c (S2), e (S4) and internal branch b separating S1, S3 from S2, S4]

Goal: $D_{ij} \approx d_{ij}$, with the distance along the tree given by
$D_{12} \approx d_{12} = a + b + c$
$D_{13} \approx d_{13} = a + d$
$D_{14} \approx d_{14} = a + b + e$
$D_{23} \approx d_{23} = d + b + c$
$D_{24} \approx d_{24} = c + e$
$D_{34} \approx d_{34} = d + b + e$
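The least-squares fit for this four-taxon tree can be sketched in plain Python. The path matrix below encodes the six tree-distance equations from the slide. The observed distances are hypothetical values (generated from a = 1, b = 2, c = 3, d = 4, e = 5), and solving the normal equations by Gaussian elimination is one possible approach, not necessarily the one used in the original course. This plain least-squares sketch does not force branch lengths to be non-negative.

```python
# Rows: taxon pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
# Columns: branches a, b, c, d, e. A 1 means the branch lies on the path.
PATHS = [
    [1, 1, 1, 0, 0],  # d12 = a + b + c
    [1, 0, 0, 1, 0],  # d13 = a + d
    [1, 1, 0, 0, 1],  # d14 = a + b + e
    [0, 1, 1, 1, 0],  # d23 = d + b + c
    [0, 0, 1, 0, 1],  # d24 = c + e
    [0, 1, 0, 1, 1],  # d34 = d + b + e
]

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def least_squares_branches(paths, observed):
    """Minimize Q via the normal equations (P^T P) x = P^T D."""
    m = len(paths[0])
    ptp = [[sum(r[i] * r[j] for r in paths) for j in range(m)] for i in range(m)]
    ptd = [sum(r[i] * big_d for r, big_d in zip(paths, observed)) for i in range(m)]
    return solve_linear(ptp, ptd)

def q_value(paths, observed, branches):
    """Sum of squared differences between observed and tree distances."""
    fitted = [sum(p * x for p, x in zip(row, branches)) for row in paths]
    return sum((big_d - d) ** 2 for big_d, d in zip(observed, fitted))

# Hypothetical observed distances D12, D13, D14, D23, D24, D34.
observed = [6, 5, 8, 9, 8, 11]
branches = least_squares_branches(PATHS, observed)
q = q_value(PATHS, observed, branches)
print([round(x, 6) for x in branches], round(q, 6))
```

Because the example distances fit the tree exactly, the recovered branch lengths match the generating values and Q is essentially zero. With real data the distances rarely fit perfectly and Q stays positive.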

4 Superimposed Substitutions
Actual number of evolutionary events: 5
Observed number of differences: 2
Distance is (almost) always underestimated

[Figure: sequence ACGGTGC changing into GCGGTGA, with intermediate states (C, T) that are overwritten by later substitutions]

5 Model-based correction for superimposed substitutions
Goal: try to infer the real number of evolutionary events (the real distance) based on
1. Observed data (sequence alignment)
2. A model of how evolution occurs

6 Jukes and Cantor Model
Four nucleotides assumed to be equally frequent (f = 0.25)
All 12 substitution rates assumed to be equal
Under this model the corrected distance is:

$D_{JC} = -0.75 \times \ln(1 - 1.33 \times D_{OBS})$

For instance: $D_{OBS} = 0.43 \Rightarrow D_{JC} = 0.64$

Rate matrix:
      A     C     G     T
A   -3α     α     α     α
C     α   -3α     α     α
G     α     α   -3α     α
T     α     α     α   -3α
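The Jukes and Cantor correction is simple enough to check numerically. The sketch below uses the exact factor 4/3 where the slide rounds to 1.33. The function name and the input check are illustrative additions.

```python
import math

def jukes_cantor_distance(d_obs):
    """Corrected distance under the Jukes and Cantor model.

    d_obs is the observed proportion of differing sites. The formula is
    only defined for d_obs < 0.75, where the logarithm's argument is positive.
    """
    if not 0 <= d_obs < 0.75:
        raise ValueError("observed distance must satisfy 0 <= d_obs < 0.75")
    return -0.75 * math.log(1 - (4.0 / 3.0) * d_obs)

# Worked example from the slide.
print(round(jukes_cantor_distance(0.43), 2))  # 0.64
```

The corrected distance is always at least as large as the observed one, and the gap widens as the observed distance approaches 0.75.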

7 Other models of evolution

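8 General Time Reversible Model
Time-reversibility: the amount of change from state x to y is equal to the amount of change from y to x

$\pi_A \times P_{AG} = \pi_G \times P_{GA} \Rightarrow \pi_A \times \pi_G \times r_{AG} = \pi_G \times \pi_A \times r_{AG}$

where $r_{AG}$ denotes the rate parameter shared by the A to G and G to A substitutions.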
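The time-reversibility condition can be checked with a short Python sketch. It assumes the usual construction in which the rate from x to y is the frequency of the target base times a parameter shared by both directions of the pair. The base frequencies, the pair parameters, and the names `freqs`, `exch`, and `rate` are hypothetical values chosen for illustration, not taken from the presentation.

```python
from itertools import combinations

BASES = "ACGT"

# Hypothetical equilibrium base frequencies (must sum to 1).
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}

# Hypothetical symmetric parameters, one per unordered pair of bases.
exch = {
    ("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 0.5,
    ("C", "G"): 0.8, ("C", "T"): 4.5, ("G", "T"): 1.0,
}

def rate(x, y):
    """Rate from x to y: target frequency times the shared pair parameter."""
    key = (x, y) if (x, y) in exch else (y, x)
    return freqs[y] * exch[key]

def is_time_reversible(tol=1e-12):
    """Check pi_x * rate(x, y) == pi_y * rate(y, x) for every pair of bases."""
    return all(
        abs(freqs[x] * rate(x, y) - freqs[y] * rate(y, x)) < tol
        for x, y in combinations(BASES, 2)
    )

print(rate("A", "G"), rate("G", "A"), is_time_reversible())
```

The two directions of a pair can have different rates (here A to G and G to A differ because the target frequencies differ), yet the flow weighted by base frequency is the same both ways.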

