Inferring phylogenetic trees: Distance and maximum likelihood methods

GENOME 373: Genomic Informatics Prof. William Stafford Noble

Outline: distance methods (Fitch-Margoliash, neighbor joining, UPGMA); maximum likelihood.

One-minute responses
Is the parsimony model biologically accurate? No. Parsimony ignores back-mutation, parallel mutation, etc.
The following tree can have a score of 2 or 3, correct? [Tree not shown.] Correct. However, the idea of parsimony is to select the tree with the smallest number of mutations along the tree.
Is it biologically acceptable to make the assumptions of the JC model? No. The assumptions are made for statistical reasons: essentially, we often don’t know the proper values for the more parameter-rich models.
What other considerations can be taken to get a better tree? The most important ones are site-by-site variation in mutation rate and dependencies between adjacent sites.
Is there any way to check whether the tree obtained is significant? You can check whether individual branches are significant using something called “bootstrap analysis.”
Still unclear how to use these trees in a biological way. Primarily, these trees are used to understand evolutionary history.
Will we be using any of the phylogeny software in this class? No.

One-minute responses
What’s a real event that is your “oracle” that tells you the true evolutionary history of substitutions for Jukes-Cantor? There is no oracle, and luckily, you don’t need one in order for Jukes-Cantor to work.
It was difficult to understand how you were computing parsimony scores at first.

Distance methods: multiple sequence alignment → pairwise distance matrix → phylogenetic tree. The tree is built from the distance matrix by Fitch-Margoliash, neighbor-joining, or UPGMA.
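A minimal Python sketch of the first arrow, from an alignment to a pairwise distance matrix. It assumes gap-free, equal-length sequences and applies the Jukes-Cantor correction $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$, where $p$ is the fraction of differing sites. Choosing JC here, the helper names, and the labels `s1`, `s2`, `s3` are illustrative assumptions; the sequences are the alignment rows shown in the maximum likelihood slides.

```python
import math
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance; infinite once p reaches 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """JC distance for every unordered pair of named, aligned sequences."""
    return {(a, b): jc_distance(p_distance(seqs[a], seqs[b]))
            for a, b in combinations(seqs, 2)}


# Hypothetical labels for three alignment rows taken from the slides.
alignment = {"s1": "ACGCGTTGGG", "s2": "ACGCAATGAA", "s3": "ACACAGGGAA"}
for pair, d in distance_matrix(alignment).items():
    print(pair, round(d, 3))
```

The correction diverges as $p$ approaches 0.75, so the sketch returns infinity there; a real pipeline would also need a policy for gaps.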

Star topology: the sum of all branch lengths is $S^* = a + b + c + d + e$.
Summing all distances in the matrix (each pair once) counts each branch four times; branch $a$, for example, appears in $d_{AB}$, $d_{AC}$, $d_{AD}$ and $d_{AE}$. Hence the sum of all distances in the matrix is $4S^*$.
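A quick numerical check of the counting argument, using hypothetical branch lengths. On a star tree every distance $d_{ij}$ is the sum of two leaf branches, and each branch appears in $N-1$ distances (four when $N = 5$), so dividing the matrix sum by $N-1$ recovers $S^*$.

```python
from itertools import combinations

# Hypothetical branch lengths for a five-leaf star tree (leaf -> branch length).
branches = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0}


def star_distances(branch_len: dict[str, float]) -> dict[tuple[str, str], float]:
    """On a star tree, the distance between two leaves is the sum of their branches."""
    return {(i, j): branch_len[i] + branch_len[j]
            for i, j in combinations(branch_len, 2)}


def star_sum_from_distances(d: dict[tuple[str, str], float], n_leaves: int) -> float:
    """Recover S* from the matrix: each branch occurs in n_leaves - 1 distances."""
    return sum(d.values()) / (n_leaves - 1)


d = star_distances(branches)
s_star = sum(branches.values())
print(s_star, star_sum_from_distances(d, len(branches)))  # 15.0 15.0
```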

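Adding one branch: joining A and B through a new internal branch $f$ gives a total branch length of
$$S = a + b + c + d + e + f = \frac{d_{AC} + d_{AD} + d_{AE} + d_{BC} + d_{BD} + d_{BE}}{6} + \frac{d_{AB}}{2} + \frac{d_{CD} + d_{CE} + d_{DE}}{3}$$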
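The same formula generalized to $N$ leaves, where the denominators $2(N-2)$ and $N-2$ become 6 and 3 when $N = 5$. The sketch checks it on a hypothetical additive tree in which A and B hang off one internal node and C, D, E off the other, so $S$ should equal the true total branch length, 21. Distances are keyed by `frozenset` so that $d_{ij} = d_{ji}$ automatically; that representation is a choice made for this example.

```python
from itertools import combinations


def pair_sum(d: dict[frozenset, float], leaves: list[str], i: str, j: str) -> float:
    """Total branch length S after joining leaves i and j off the star.

    For N leaves: (sum of d_ik + d_jk over the other leaves k) / (2(N - 2))
    + d_ij / 2 + (sum of distances among the other leaves) / (N - 2).
    """
    n = len(leaves)
    rest = [k for k in leaves if k not in (i, j)]
    to_pair = sum(d[frozenset((i, k))] + d[frozenset((j, k))] for k in rest)
    among_rest = sum(d[frozenset(p)] for p in combinations(rest, 2))
    return to_pair / (2 * (n - 2)) + d[frozenset((i, j))] / 2 + among_rest / (n - 2)


# Hypothetical additive tree: A (length 1) and B (2) hang off node X, the X-Y
# branch has length 3, and C (4), D (5), E (6) hang off node Y. Total = 21.
raw = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9, ("A", "E"): 10,
       ("B", "C"): 9, ("B", "D"): 10, ("B", "E"): 11,
       ("C", "D"): 9, ("C", "E"): 10, ("D", "E"): 11}
dist = {frozenset(k): float(v) for k, v in raw.items()}
leaves = ["A", "B", "C", "D", "E"]
print(pair_sum(dist, leaves, "A", "B"))  # 21.0
```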

Neighbor joining: Add one branch to the star topology and compute the difference between $S^*$ and $S$. Repeat for each pair of leaves in the tree. Choose the pair that yields the largest difference (the closest neighbors). Join that pair, treat it as a single new node, and repeat until all pairs are joined.
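One selection step of neighbor joining, sketched on a hypothetical five-leaf additive tree: compute $S^*$ from the matrix sum, evaluate $S$ for every pair, and keep the pair with the largest $S^* - S$. A full implementation would then replace the chosen pair with a new node, recompute its distances to the remaining leaves, and repeat; that bookkeeping is omitted here.

```python
from itertools import combinations


def pair_sum(d, leaves, i, j):
    """Total branch length S after joining leaves i and j off the star."""
    n = len(leaves)
    rest = [k for k in leaves if k not in (i, j)]
    to_pair = sum(d[frozenset((i, k))] + d[frozenset((j, k))] for k in rest)
    among_rest = sum(d[frozenset(p)] for p in combinations(rest, 2))
    return to_pair / (2 * (n - 2)) + d[frozenset((i, j))] / 2 + among_rest / (n - 2)


def choose_neighbors(d, leaves):
    """Return the pair with the largest S* - S, together with that difference."""
    s_star = sum(d[frozenset(p)] for p in combinations(leaves, 2)) / (len(leaves) - 1)
    gains = {p: s_star - pair_sum(d, leaves, *p) for p in combinations(leaves, 2)}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical additive tree: A (1) and B (2) hang off node X, X-Y has length 3,
# and C (4), D (5), E (6) hang off node Y, so A and B are the true neighbors.
raw = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9, ("A", "E"): 10,
       ("B", "C"): 9, ("B", "D"): 10, ("B", "E"): 11,
       ("C", "D"): 9, ("C", "E"): 10, ("D", "E"): 11}
dist = {frozenset(k): float(v) for k, v in raw.items()}
leaves = ["A", "B", "C", "D", "E"]
print(choose_neighbors(dist, leaves))  # (('A', 'B'), 1.5)
```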

UPGMA: unweighted pair group method with arithmetic mean.
Also known as agglomerative hierarchical clustering (with average linkage). Basic idea: iteratively connect the two most closely related sequences (or clusters of sequences).

UPGMA example: a pairwise distance matrix for seven yeast species (Scer, Spar, Smik, Sbay, Skud, Scas, Sklu). [Matrix values not recoverable from the transcript.]

UPGMA: Find the smallest off-diagonal element in the matrix.

UPGMA: Merge the two rows (and the two columns) of the closest pair into one, replacing each pair of distances with their average.
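One UPGMA iteration in Python, on a small hypothetical four-sequence matrix rather than the yeast data: find the closest pair, then merge their rows and columns. The sketch weights each average by cluster size, which keeps every entry equal to the mean leaf-to-leaf distance between clusters; when both clusters are single sequences, as in the first merger, this is simply the average of the two rows. The function names and the `"A-B"` naming of merged clusters are illustrative.

```python
def closest_pair(dist: dict[frozenset, float]) -> frozenset:
    """Smallest off-diagonal entry; the dict stores each pair i != j once."""
    return min(dist, key=dist.get)


def merge_closest(dist: dict[frozenset, float], sizes: dict[str, int]):
    """Merge the closest pair into one row/column of size-weighted averages."""
    i, j = sorted(closest_pair(dist))
    new = f"{i}-{j}"
    # Keep every entry that involves neither member of the merged pair.
    merged = {p: v for p, v in dist.items() if i not in p and j not in p}
    for k in sizes:
        if k not in (i, j):
            merged[frozenset((new, k))] = (
                sizes[i] * dist[frozenset((i, k))] + sizes[j] * dist[frozenset((j, k))]
            ) / (sizes[i] + sizes[j])
    new_sizes = {k: s for k, s in sizes.items() if k not in (i, j)}
    new_sizes[new] = sizes[i] + sizes[j]
    return merged, new_sizes


# Hypothetical distances between four sequences A, B, C, D (not the yeast data).
pairs = [("A", "B", 2.0), ("A", "C", 6.0), ("B", "C", 8.0),
         ("A", "D", 10.0), ("B", "D", 12.0), ("C", "D", 9.0)]
dist = {frozenset((a, b)): v for a, b, v in pairs}
sizes = {"A": 1, "B": 1, "C": 1, "D": 1}
merged, sizes = merge_closest(dist, sizes)
for pair, value in merged.items():
    print(sorted(pair), value)
```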

UPGMA: the updated distance matrix, with one fewer row and column. [Matrix values not recoverable from the transcript.]

UPGMA: Each merger creates a subtree. Here Smik and Sbay are joined into the subtree Smik-Sbay, which takes their place in the matrix.
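A compact sketch of the whole UPGMA loop, again on a hypothetical four-sequence matrix with size-weighted averages. Each merger becomes a nested tuple, i.e. a subtree, and its height is taken as half the merged distance, the usual UPGMA convention for placing the new subtree's root. The loop stops when a single cluster remains.

```python
def upgma(dist: dict[frozenset, float]):
    """Build a rooted tree by repeatedly merging the two closest clusters.

    Returns the tree as nested tuples plus a list of (subtree, height) merges.
    """
    dist = dict(dist)                                # do not modify the caller's dict
    sizes = {leaf: 1 for pair in dist for leaf in pair}
    merges = []
    while len(sizes) > 1:
        pair = min(dist, key=dist.get)               # smallest off-diagonal element
        i, j = sorted(pair, key=str)                 # deterministic child order
        new = (i, j)                                 # each merger creates a subtree
        merges.append((new, dist[pair] / 2))         # subtree height: half the distance
        for k in [c for c in sizes if c not in (i, j)]:
            dist[frozenset((new, k))] = (
                sizes[i] * dist[frozenset((i, k))] + sizes[j] * dist[frozenset((j, k))]
            ) / (sizes[i] + sizes[j])
        # Drop the rows and columns of the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        sizes[new] = sizes.pop(i) + sizes.pop(j)
    (root,) = sizes
    return root, merges


# Hypothetical distances between four sequences A, B, C, D (not the yeast data).
pairs = [("A", "B", 2.0), ("A", "C", 6.0), ("B", "C", 8.0),
         ("A", "D", 10.0), ("B", "D", 12.0), ("C", "D", 9.0)]
tree, merges = upgma({frozenset((a, b)): v for a, b, v in pairs})
print(tree)
for subtree, height in merges:
    print(subtree, round(height, 3))
```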

Maximum likelihood
  for each possible tree
    for each column of the alignment
      compute the likelihood of the column, given the tree
  return the tree with the highest likelihood
Similar to parsimony, but capable of using a model of evolution. Computationally expensive. DNAML is the PHYLIP program for maximum likelihood. FastDNAML is a fast clone (
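The outer loop over every possible tree is what makes exhaustive maximum likelihood expensive. This short sketch uses the standard count of unrooted binary topologies for $n$ labeled leaves, $(2n-5)!! = 3 \times 5 \times \cdots \times (2n-5)$: four taxa give 3 trees, while the seven yeast species in the UPGMA example already give 945.

```python
def num_unrooted_trees(n: int) -> int:
    """Unrooted binary topologies for n >= 3 labeled leaves: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):   # multiply 3 * 5 * ... * (2n - 5)
        count *= k
    return count


for n in (4, 5, 7, 10, 20):
    print(n, num_unrooted_trees(n))
```

Practical programs therefore search tree space heuristically instead of scoring every topology.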

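Computing the likelihood
[Figure: an alignment (rows ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA) and a four-leaf tree whose leaves carry one alignment column: T, T, A, G.]
What is the probability of observing this column, $\Pr(\text{column} \mid \text{tree}, \text{model})$, given this tree and an assumed model of evolution?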

[Figure: the tree redrawn with different assignments of nucleotides to its internal nodes.]
Solution: Enumerate all possible assignments of nucleotides to the internal nodes. Compute the probability of each assigned tree, and sum.

[Figure: the same tree with one particular assignment of nucleotides to its internal nodes.]
What is the probability of observing this column, given this assigned tree and an assumed model of evolution?

The probability of observing a substitution from A to T on a branch of length $m$ is given by the evolutionary model. The model also supplies the base frequencies $\pi_A, \pi_C, \pi_G, \pi_T$; the probability of the ancestral (root) state being A is just $\pi_A$.
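The slides leave the evolutionary model open. As an assumption, here is the Jukes-Cantor (JC69) model, with $m$ measured in expected substitutions per site: all four base frequencies are 1/4, a base stays the same with probability $\tfrac14 + \tfrac34 e^{-4m/3}$, and changes to each specific other base with probability $\tfrac14 - \tfrac14 e^{-4m/3}$.

```python
import math

BASES = "ACGT"
PI = {b: 0.25 for b in BASES}   # JC69: equal base frequencies pi_A = ... = pi_T


def jc_prob(parent: str, child: str, m: float) -> float:
    """P(child base | parent base) across a branch of m expected substitutions/site."""
    e = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e


print(jc_prob("A", "T", 0.1), jc_prob("A", "A", 0.1))
print(sum(jc_prob("A", b, 0.1) for b in BASES))   # rows sum to 1 (up to rounding)
```

At $m = 0$ nothing changes; as $m$ grows, every outcome approaches the base frequency 1/4.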

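[Figure: the assigned tree, with the root term $L_0$ and the six branch terms $L_1, \dots, L_6$.]
The desired probability is the product of the root probability $L_0$ (for example, $\pi_A$ when the root is assigned A) and the probabilities of the branches:
$$L(\text{tree}) = L_0 \times L_1 \times L_2 \times L_3 \times L_4 \times L_5 \times L_6$$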
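A sketch of $L_0 \times L_1 \times \cdots \times L_6$ for one assigned tree. It assumes JC69, a balanced rooted four-leaf tree (a root with internal children `u` and `v`, each carrying two leaves), and a hypothetical length of 0.1 on every branch; the node names, branch lengths, and the particular internal assignment are illustrative, not read from the slide.

```python
import math

PI = {b: 0.25 for b in "ACGT"}   # JC69 base frequencies


def jc_prob(parent, child, m):
    """JC69 probability of `child` at the end of a branch of length m."""
    e = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e


# Hypothetical rooted four-leaf tree: node -> list of (child, branch length).
TREE = {"root": [("u", 0.1), ("v", 0.1)],
        "u": [("leaf1", 0.1), ("leaf2", 0.1)],
        "v": [("leaf3", 0.1), ("leaf4", 0.1)]}


def assigned_tree_prob(states: dict[str, str]) -> float:
    """Root frequency (L0) times one transition probability per branch (L1..L6)."""
    prob = PI[states["root"]]
    for parent, children in TREE.items():
        for child, m in children:
            prob *= jc_prob(states[parent], states[child], m)
    return prob


leaves = {"leaf1": "T", "leaf2": "T", "leaf3": "A", "leaf4": "G"}
internal = {"root": "A", "u": "T", "v": "A"}   # one illustrative assignment
print(assigned_tree_prob({**leaves, **internal}))
```

The branches are independent given their endpoints, which is why their probabilities multiply.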

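[Figure: the tree under different internal-node assignments, labeled tree1, tree2, tree3, ….]
The probability of the column, given the tree, is the sum of the probabilities of the individually assigned trees:
$$L(\text{tree}) = L(\text{tree}_1) + L(\text{tree}_2) + L(\text{tree}_3) + \cdots$$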
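Summing over internal-node assignments, under the same kind of assumptions (JC69, a hypothetical balanced four-leaf tree, every branch 0.1). With three internal nodes there are $4^3 = 64$ assigned trees per column; because the assignments are mutually exclusive, their probabilities add. A useful sanity check is that the likelihoods of all $4^4 = 256$ possible leaf columns sum to 1.

```python
import math
from itertools import product

BASES = "ACGT"
PI = {b: 0.25 for b in BASES}   # JC69 base frequencies


def jc_prob(parent, child, m):
    """JC69 probability of `child` at the end of a branch of length m."""
    e = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e


# Hypothetical rooted four-leaf tree: node -> list of (child, branch length).
TREE = {"root": [("u", 0.1), ("v", 0.1)],
        "u": [("leaf1", 0.1), ("leaf2", 0.1)],
        "v": [("leaf3", 0.1), ("leaf4", 0.1)]}
INTERNAL = ["root", "u", "v"]


def assigned_tree_prob(states):
    """Root frequency times one transition probability per branch."""
    prob = PI[states["root"]]
    for parent, children in TREE.items():
        for child, m in children:
            prob *= jc_prob(states[parent], states[child], m)
    return prob


def column_likelihood(leaf_states: dict[str, str]) -> float:
    """Sum over all 4**3 = 64 internal-node assignments (mutually exclusive events)."""
    total = 0.0
    for combo in product(BASES, repeat=len(INTERNAL)):
        states = {**leaf_states, **dict(zip(INTERNAL, combo))}
        total += assigned_tree_prob(states)
    return total


column = {"leaf1": "T", "leaf2": "T", "leaf3": "A", "leaf4": "G"}
print(column_likelihood(column))

# Sanity check: the likelihoods of all 4**4 possible leaf columns sum to 1.
leaf_names = ["leaf1", "leaf2", "leaf3", "leaf4"]
print(sum(column_likelihood(dict(zip(leaf_names, c)))
          for c in product(BASES, repeat=4)))
```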


Maximum likelihood revisited
  for each possible tree
    for each column of the alignment
      for each assignment of internal nodes
        for each branch
          compute the probability of that branch
        assigned tree probability ← multiply branch probabilities
      column probability ← sum assigned tree probabilities
    tree probability ← multiply column probabilities
  return the tree with the highest probability
Multiply probabilities of independent events. Add probabilities of mutually exclusive events.
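Putting the loops together for the smallest interesting case: four taxa, three possible unrooted topologies. Assumptions: JC69, every branch fixed at a hypothetical length 0.1 (a real program would also optimize branch lengths), and a made-up alignment in which s1 and s2 share four sites that s3 and s4 do not. Each topology is rooted in the middle of its internal branch for the calculation. Column probabilities are multiplied by adding their logarithms, which avoids floating-point underflow on long alignments.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(parent: str, child: str, m: float) -> float:
    """JC69 probability of `child` at the end of a branch of length m."""
    e = math.exp(-4.0 * m / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e


def tree_log_likelihood(pairs, alignment, m=0.1):
    """Log-likelihood of the four-leaf tree ((a, b), (c, d)) with every branch = m.

    The tree is rooted between internal nodes u (above a, b) and v (above c, d).
    """
    (a, b), (c, d) = pairs
    log_l = 0.0
    for site in range(len(alignment[a])):          # for each column
        column = 0.0
        for r, u, v in product(BASES, repeat=3):   # for each internal assignment
            p = 0.25 * jc_prob(r, u, m) * jc_prob(r, v, m)   # root and internal branches
            p *= jc_prob(u, alignment[a][site], m) * jc_prob(u, alignment[b][site], m)
            p *= jc_prob(v, alignment[c][site], m) * jc_prob(v, alignment[d][site], m)
            column += p                            # add mutually exclusive events
        log_l += math.log(column)                  # multiply independent columns
    return log_l


def best_tree(alignment):
    """Try all three unrooted topologies on four taxa; keep the most likely one."""
    first, *rest = sorted(alignment)
    topologies = [((first, x), tuple(y for y in rest if y != x)) for x in rest]
    scores = {t: tree_log_likelihood(t, alignment) for t in topologies}
    return max(scores, key=scores.get), scores


# Made-up alignment: s1 and s2 share AAAA where s3 and s4 have TTTT.
aln = {"s1": "AAAACCCCGG", "s2": "AAAACCCCGT",
       "s3": "TTTTCCCCGG", "s4": "TTTTCCCCAG"}
best, scores = best_tree(aln)
print(best)
for topology, score in scores.items():
    print(topology, round(score, 3))
```

Constant and single-taxon columns score the same on all three topologies here, so the four shared AAAA/TTTT sites decide the winner, grouping s1 with s2.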

Overview
Parsimony
Distance methods: computing distances; finding the tree (Fitch-Margoliash, neighbor-joining, UPGMA)
Maximum likelihood
