1
How many characters are needed to reconstruct the true tree?
Mareike Fischer and Mike Steel
Future Directions in Phylogenetic Methods and Models, 17–21 Dec 2007
2
The Problem
Given: a sequence of characters (e.g. DNA).
Wanted: a reconstruction of the ‘true’ tree.
Solution: Maximum Parsimony, Maximum Likelihood, etc.
But: is the sequence long enough for a reliable reconstruction?
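For binary characters on four taxa, Maximum Parsimony can be made concrete. The sketch below is a minimal illustration, not the authors' code: it scores every character on the three possible unrooted quartet topologies with Fitch's algorithm (rooting each tree on its interior edge) and sums the scores; MP prefers the topology with the smallest total. The names `TOPOLOGIES`, `fitch_quartet` and `mp_scores` are hypothetical.

```python
from typing import Dict, Sequence, Set, Tuple

# The three unrooted 4-taxon topologies, each written as its split ab|cd.
# Taxa are indexed 0..3, so "12|34" pairs taxa 0,1 against taxa 2,3.
Split = Tuple[Tuple[int, int], Tuple[int, int]]
TOPOLOGIES: Dict[str, Split] = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def fitch_merge(left: Set[int], right: Set[int]) -> Tuple[Set[int], int]:
    """One Fitch step: keep the intersection if it is non-empty (no change),
    otherwise take the union and count one state change."""
    common = left & right
    if common:
        return common, 0
    return left | right, 1


def fitch_quartet(char: Sequence[int], split: Split) -> int:
    """Parsimony score of one character on the quartet ab|cd,
    computed by rooting the tree on its interior edge."""
    (a, b), (c, d) = split
    u, cost_u = fitch_merge({char[a]}, {char[b]})
    v, cost_v = fitch_merge({char[c]}, {char[d]})
    _, cost_root = fitch_merge(u, v)
    return cost_u + cost_v + cost_root


def mp_scores(characters: Sequence[Sequence[int]]) -> Dict[str, int]:
    """Total parsimony score of each topology; MP picks the smallest."""
    return {
        name: sum(fitch_quartet(ch, split) for ch in characters)
        for name, split in TOPOLOGIES.items()
    }


chars = [(0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 0, 0, 1)]
scores = mp_scores(chars)
print(scores)                       # {'12|34': 5, '13|24': 6, '14|23': 7}
print(min(scores, key=scores.get))  # 12|34
```

Constant and singleton characters cost the same on every quartet, so only the three split patterns can change MP's choice.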
3
Previous Approaches
1. Churchill, von Haeseler and Navidi (1992): 4-taxon scenario.
Observations: the probability of reconstructing the true tree increases with the length of the interior edge. “Bringing the outer nodes closer to the central branch can increase [this probability] dramatically.”
[Figure: reconstruction probability vs. interior edge length, for increasing numbers of characters.]
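The qualitative effect of the interior edge can be reproduced with a small Monte Carlo experiment. This is an illustrative sketch only, not the simulation setup of Churchill et al.: it assumes the 2-state symmetric model with equal pending edges, and it uses the fact that for binary characters on a quartet, MP selects the split supported by the most sites (ties are counted as failures). All function names and parameter values are hypothetical.

```python
import math
import random
from typing import Iterable, Tuple

Site = Tuple[int, int, int, int]


def change_prob(t: float) -> float:
    """P(state change) along an edge of length t, 2-state symmetric model."""
    return 0.5 * (1.0 - math.exp(-2.0 * t))


def simulate_site(rng: random.Random, pending: float, interior: float) -> Site:
    """Evolve one binary character on the quartet 12|34: interior node u joins
    leaves 1 and 2, interior node v joins leaves 3 and 4."""
    q_pend, q_int = change_prob(pending), change_prob(interior)
    u = rng.randint(0, 1)                  # uniform (stationary) state at u
    v = u ^ int(rng.random() < q_int)      # possibly flip across the interior edge
    s1, s2, s3, s4 = (par ^ int(rng.random() < q_pend) for par in (u, u, v, v))
    return (s1, s2, s3, s4)


def mp_recovers_true_tree(sites: Iterable[Site]) -> bool:
    """MP on binary quartet data picks the split supported by most sites;
    count a success only if 12|34 strictly beats both alternatives."""
    support = {"12|34": 0, "13|24": 0, "14|23": 0}
    for s1, s2, s3, s4 in sites:
        if s1 == s2 and s3 == s4 and s1 != s3:
            support["12|34"] += 1
        elif s1 == s3 and s2 == s4 and s1 != s2:
            support["13|24"] += 1
        elif s1 == s4 and s2 == s3 and s1 != s2:
            support["14|23"] += 1
    return support["12|34"] > max(support["13|24"], support["14|23"])


def recovery_prob(pending: float, interior: float, k: int,
                  trials: int = 400, seed: int = 1) -> float:
    """Monte Carlo estimate of P(MP recovers 12|34 from k characters)."""
    rng = random.Random(seed)
    hits = sum(
        mp_recovers_true_tree([simulate_site(rng, pending, interior) for _ in range(k)])
        for _ in range(trials)
    )
    return hits / trials


for interior in (0.01, 0.05, 0.2):
    print(interior, recovery_prob(pending=0.1, interior=interior, k=50))
```

With the pending edges held fixed, a longer interior edge should raise the estimated probability, and so should a larger `k`.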
4
Previous Approaches
2. Yang (1998): 4-taxon scenario, with the interior edge ‘fixed’ at 5% of the tree length; 5 different tree shapes were investigated.
Observations:
‘Farris Zone’: MP performs better.
‘Felsenstein Zone’: ML performs better.
The optimal length for the interior edge ranges between 0.015 and 0.025.
[Figure: reconstruction probability vs. tree length.]
5
Our Approach
Limitation: most previous approaches are based on simulations.
Our approach: a mathematical analysis of the influence of branch lengths on tree reconstruction. We investigate MP first and consider other methods afterwards.
6
Already known (Steel and Székely, 2002)
[Figure: quartet tree with an interior edge of length x and four pending edges of length y.]
Here, the number k of characters needed to reconstruct the true tree grows at rate …
But what happens if we fix the ratio of the edge lengths (y := px) and then take the value of x that minimizes k?
7
Our Approach
Setting: 4 taxa, pending edges of length px (with p > 1), a short interior edge of length x, and the 2-state symmetric model.
[Figure: quartet tree with an interior edge of length x and pending edges of length px.]
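In this setting the site-pattern probabilities can be computed exactly by summing over the states of the two interior nodes. The sketch below assumes that taxa 1 and 2 form a cherry (true tree 12|34) and that the state at the interior node joining them is uniform; it then collapses the 16 binary patterns into the three informative split classes. Function names are hypothetical.

```python
import itertools
import math
from typing import Dict, Tuple

Pattern = Tuple[int, ...]


def transition(t: float, parent: int, child: int) -> float:
    """P(child state | parent state) on an edge of length t (2-state symmetric model)."""
    change = 0.5 * (1.0 - math.exp(-2.0 * t))
    return change if parent != child else 1.0 - change


def pattern_probs(p: float, x: float) -> Dict[Pattern, float]:
    """Exact probabilities of the 16 binary patterns on the quartet 12|34 with
    pending edges p*x and interior edge x, summing over the interior states
    u (joining leaves 1, 2) and v (joining leaves 3, 4)."""
    probs: Dict[Pattern, float] = {}
    for pattern in itertools.product((0, 1), repeat=4):
        total = 0.0
        for u, v in itertools.product((0, 1), repeat=2):
            term = 0.5 * transition(x, u, v)          # uniform state at u
            for leaf_state, parent in zip(pattern, (u, u, v, v)):
                term *= transition(p * x, parent, leaf_state)
            total += term
        probs[pattern] = total
    return probs


def split_class_probs(p: float, x: float) -> Dict[str, float]:
    """Collapse the patterns into the three informative split classes."""
    out = {"12|34": 0.0, "13|24": 0.0, "14|23": 0.0}
    for (s1, s2, s3, s4), pr in pattern_probs(p, x).items():
        if s1 == s2 and s3 == s4 and s1 != s3:
            out["12|34"] += pr
        elif s1 == s3 and s2 == s4 and s1 != s2:
            out["13|24"] += pr
        elif s1 == s4 and s2 == s3 and s1 != s2:
            out["14|23"] += pr
    return out


print(split_class_probs(p=4.0, x=0.05))
```

The true split 12|34 gets a slightly higher probability than the two rival splits, which are equally likely because all pending edges have the same length.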
8
Main Result
For ‘reliable’ MP reconstruction:
k grows at least at rate $p^2$;
for the optimal value of x, k grows at rate $p^2$.
9
The Standard Normal Distribution
The constants $c_\varepsilon$ and $c_\varepsilon'$ determine the size $\varepsilon$ of the area under the curve of the standard normal distribution.
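The exact formulas involving $c_\varepsilon$ and $c_\varepsilon'$ are not preserved here. Assuming they act as standard normal cut-offs, the sketch shows how a cut-off and the tail area $\varepsilon$ correspond, using Python's `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail_area(c: float) -> float:
    """Area under the standard normal density to the right of the cut-off c."""
    return 1.0 - STD_NORMAL.cdf(c)


def cutoff_for_area(eps: float) -> float:
    """Cut-off c that leaves exactly eps of the area to its right."""
    return STD_NORMAL.inv_cdf(1.0 - eps)


for eps in (0.10, 0.05, 0.01):
    c = cutoff_for_area(eps)
    print(f"eps = {eps:.2f}: c = {c:.4f}, tail area = {upper_tail_area(c):.4f}")
```

A smaller $\varepsilon$ requires a larger cut-off, and a larger cut-off translates into more characters.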
10
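Idea of Proof: 1. Applying the CLT
Set $X_i$ i.i.d. ($i = 1, \dots, k$) and $Z_k = \sum_{i=1}^{k} X_i$. Then, by the CLT, $Z_k$ is approximately normally distributed with mean $\mu_k$ and standard deviation $\sigma_k$.
Note that the true tree $T_1$ will be favored over $T_2$ if and only if $Z_k > 0$.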
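To gauge how good the CLT step is, one can compare the exact probability that $Z_k > 0$ with the normal approximation $\Phi(\mu_k/\sigma_k)$. The sketch assumes that each $X_i$ takes values in $\{-1, 0, 1\}$ (consistent with the probabilities $P(X_1 = 1)$ and $P(X_1 = -1)$ used in the proof); the site probabilities below are arbitrary illustrative numbers, not values from the talk, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple


def exact_sum_distribution(p_plus: float, p_minus: float, k: int) -> Dict[int, float]:
    """Exact distribution of Z_k = X_1 + ... + X_k for i.i.d. X_i in {-1, 0, 1},
    built by adding one site at a time (discrete convolution)."""
    p_zero = 1.0 - p_plus - p_minus
    dist: Dict[int, float] = {0: 1.0}
    for _ in range(k):
        nxt: Dict[int, float] = {}
        for z, pr in dist.items():
            for step, w in ((1, p_plus), (0, p_zero), (-1, p_minus)):
                nxt[z + step] = nxt.get(z + step, 0.0) + pr * w
        dist = nxt
    return dist


def prob_true_tree_favored(p_plus: float, p_minus: float, k: int) -> Tuple[float, float]:
    """Return (exact P(Z_k > 0), CLT approximation Phi(mu_k / sigma_k))."""
    dist = exact_sum_distribution(p_plus, p_minus, k)
    exact = sum(pr for z, pr in dist.items() if z > 0)
    mu = p_plus - p_minus                  # E[X_1]
    var = p_plus + p_minus - mu ** 2       # Var(X_1)
    mu_k, sigma_k = k * mu, math.sqrt(k * var)
    return exact, NormalDist().cdf(mu_k / sigma_k)


print(prob_true_tree_favored(0.3, 0.2, 200))
```

For a few hundred sites the two numbers should already agree closely; the remaining gap comes mainly from $Z_k$ being integer-valued.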
11
Idea of Proof: 2. The Hadamard Representation
Since the $X_i$ are i.i.d., $\mu_k$ and $\sigma_k$ depend only on $k$ and the probabilities $P(X_1 = 1)$ and $P(X_1 = -1)$. These probabilities can be computed using the ‘Hadamard representation’ (here, $\theta = e^{-2x}$). Note that $P(X_1 = 1)$ and $P(X_1 = -1)$ depend only on $x$ and $p$. Thus, for fixed $p$, the ratio of $\mu_k$ to $\sigma_k$ can be used to find a value of $x$ that minimizes $k$.
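One standard form of the Hadamard representation for the 2-state symmetric model is the Hadamard conjugation of Hendy and Penny, $s = H^{-1}\exp(H\gamma)$, where $\gamma$ holds the edge lengths indexed by splits (with $\gamma_\emptyset = -\sum_e t_e$) and $s$ holds the split-pattern probabilities; we assume this is the representation meant. The sketch applies it to the quartet 12|34 with pending edges $px$ and interior edge $x$, and reads off $P(X_1 = 1)$ and, taking $T_2$ to be 13|24 (an assumption), $P(X_1 = -1)$. Names are hypothetical.

```python
import itertools
import math
from typing import Dict, FrozenSet

# Index splits of taxa {1, 2, 3, 4} by the side A of {1, 2, 3} not containing taxon 4,
# e.g. frozenset({1, 2}) is the split 12|34 and frozenset() the trivial split.
SUBSETS = [frozenset(c) for r in range(4) for c in itertools.combinations((1, 2, 3), r)]


def h(a: FrozenSet[int], b: FrozenSet[int]) -> int:
    """Hadamard matrix entry H[a, b] = (-1)^{|a & b|}."""
    return -1 if len(a & b) % 2 else 1


def split_spectrum(p: float, x: float) -> Dict[FrozenSet[int], float]:
    """Hadamard conjugation s = H^{-1} exp(H gamma) for the quartet 12|34 with
    pending edges of length p*x and interior edge of length x."""
    edge_lengths = {
        frozenset({1}): p * x,
        frozenset({2}): p * x,
        frozenset({3}): p * x,
        frozenset({1, 2, 3}): p * x,  # pending edge of taxon 4
        frozenset({1, 2}): x,         # interior edge (split 12|34)
    }
    gamma = {a: edge_lengths.get(a, 0.0) for a in SUBSETS}
    gamma[frozenset()] = -sum(edge_lengths.values())
    # exp((H gamma)_b) is the product of theta_e = e^{-2 t_e} over the edges crossed
    # an odd number of times by paths pairing up b (plus taxon 4 when |b| is odd).
    r = {b: math.exp(sum(h(a, b) * g for a, g in gamma.items())) for b in SUBSETS}
    # H is symmetric with H @ H = 8 I, so H^{-1} = H / 8.
    return {a: sum(h(a, b) * r[b] for b in SUBSETS) / 8.0 for a in SUBSETS}


s = split_spectrum(p=4.0, x=0.05)
p_plus = s[frozenset({1, 2})]    # P(X_1 = 1): pattern supports T_1 = 12|34
p_minus = s[frozenset({1, 3})]   # P(X_1 = -1), assuming T_2 = 13|24
print(p_plus, p_minus)
```

Written out, with $a = e^{-2px}$ and $b = e^{-2x}$, this gives $P(X_1 = 1) = ((1+a^2)^2 - 4a^2 b)/8$ and $P(X_1 = -1) = (1-a^2)^2/8$, so both depend only on $x$ and $p$.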
12
The Hadamard Representation
13
Idea of Proof: 2. The $X_i$ are i.i.d.
Since the $X_i$ are i.i.d., we have $\mu_k = k\,\mathbb{E}[X_1]$ and $\sigma_k^2 = k\,\mathrm{Var}(X_1)$.
14
Summary and Extension
For MP, the number $k$ of characters needed to reliably reconstruct the true tree grows at rate $p^2$.
Can other methods do better (e.g. rate $p$)? No! [This can be shown using the ‘Hellinger distance’.]
15
The Hellinger Distance
$S$: set of site patterns; $p, q$: probability distributions on $S$.
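The slide's formula is not preserved, and conventions differ. The sketch assumes the normalised form $d_H(p, q)^2 = \tfrac12 \sum_{s \in S} (\sqrt{p(s)} - \sqrt{q(s)})^2$ (the code writes the distributions as `P`, `Q` to avoid a clash with the edge-length ratio $p$). Under this convention, for $k$ i.i.d. sites $1 - d_H(p^k, q^k)^2 = (1 - d_H(p, q)^2)^k$, so roughly $k \gtrsim 1/d_H^2$ sites are needed before the two trees can be told apart by any method. The sketch computes $d_H$ between the site-pattern distributions of $T_1 = 12|34$ and $T_2 = 13|24$ (an assumption) on the setting's branch lengths, and reports $p^2$ times the largest squared distance over a grid of $x$. A value that stays bounded as $p$ grows is consistent with the $p^2$ lower bound; this is a numerical illustration with hypothetical names, not the authors' argument.

```python
import itertools
import math
from typing import Dict, Tuple

Pattern = Tuple[int, ...]


def transition(t: float, parent: int, child: int) -> float:
    """P(child | parent) on an edge of length t (2-state symmetric model)."""
    change = 0.5 * (1.0 - math.exp(-2.0 * t))
    return change if parent != child else 1.0 - change


def probs_12_34(p: float, x: float) -> Dict[Pattern, float]:
    """Site-pattern distribution on T_1 = 12|34 (pending edges p*x, interior edge x)."""
    return {
        pat: sum(
            0.5 * transition(x, u, v)
            * math.prod(transition(p * x, par, s) for s, par in zip(pat, (u, u, v, v)))
            for u, v in itertools.product((0, 1), repeat=2)
        )
        for pat in itertools.product((0, 1), repeat=4)
    }


def probs_13_24(p: float, x: float) -> Dict[Pattern, float]:
    """Same branch lengths on T_2 = 13|24, obtained by swapping taxa 2 and 3."""
    t1 = probs_12_34(p, x)
    return {(s1, s2, s3, s4): t1[(s1, s3, s2, s4)] for (s1, s2, s3, s4) in t1}


def hellinger(P: Dict[Pattern, float], Q: Dict[Pattern, float]) -> float:
    """Hellinger distance normalised to [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(P[s]) - math.sqrt(Q[s])) ** 2 for s in P))


def scaled_max_distance(p: float, grid: int = 600) -> float:
    """Max over a log-spaced grid of x in [1e-6, 1] of p^2 * d_H(T_1, T_2)^2."""
    xs = [10 ** (-6 + 6 * i / (grid - 1)) for i in range(grid)]
    return max(p * p * hellinger(probs_12_34(p, x), probs_13_24(p, x)) ** 2 for x in xs)


for p in (10, 20, 40):
    print(p, round(scaled_max_distance(p), 4))
```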
16
Outlook
Questions for future work:
What happens when you approach the ‘Felsenstein Zone’?
What happens in general with different tree shapes or more taxa?
17
Thanks…
… to my supervisor Mike Steel,
… to the Newton Institute for organizing this great conference,
… to the Allan Wilson Centre for financing my research,
… to YOU for listening or at least waking up early enough to read this message.
18
The only true tree… is a Christmas tree. (And it does not even require reconstruction!)
Merry Christmas!