How to date (Xuhua Xia, xxia@uottawa.ca, http://dambe.bio.uottawa.ca)


2 Objectives
Two major objectives of molecular phylogenetics:
- Branching patterns (speciation or gene duplication events)
- Dating of the speciation or gene duplication events
Classification of methods:
- Criteria used:
  - Maximum likelihood (ML) method (e.g., PAML)
  - Bayesian methods (e.g., BEAST)
  - Least-squares (LS) method based on distance matrices (e.g., DAMBE)
- Hard- or soft-bound calibration
- Global or local clock
Data needed:
- A topology
- A set of aligned sequences OR a distance matrix satisfying the molecular clock hypothesis (either globally or locally)
- Calibration points:
  - One or more from the fossil record
  - From sampling times, for rapidly evolving species (e.g., RNA viruses)

3 Rationale
Given two species i and j, we can compute the evolutionary distance between them (d_ij, the number of substitutions per site), but how do we know the time (T) since they diverged from their common ancestor? Knowing d_ij alone does not give us time; we also need the rate of change (r, the equivalent of speed). If a runner runs at a constant speed of 10 km/hr and has covered 20 km, then he has run for 2 hours (= 20 km / (10 km/hr)). Likewise, if r = 0.02 substitutions per site per myr and d_ij = 0.04 substitutions per site, then d_ij/r = 2 myr of evolution separates the two species. Because this distance accumulated along both lineages, each lineage has evolved for 1 myr, so the two species diverged from their common ancestor 1 myr ago (T = d_ij/(2r)).
Quantities to estimate: r (rate) and T (time).
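A minimal Python sketch of the arithmetic above, assuming a strict clock and that r is the rate along a single lineage, so that the two diverging lineages together accumulate d_ij = 2rT. The function name is illustrative only.

```python
def divergence_time(d_ij: float, r: float) -> float:
    """Time (myr) since two species split under a strict molecular clock.

    d_ij: substitutions per site between species i and j.
    r:    substitutions per site per myr along ONE lineage (assumption).
    """
    # Both lineages evolve for T myr after the split, so d_ij = 2 * r * T.
    return d_ij / (2.0 * r)


if __name__ == "__main__":
    # Numbers from the slide: r = 0.02 substitutions/site/myr, d_ij = 0.04.
    print(divergence_time(0.04, 0.02))  # 1.0
```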

4 The LS method in linear regression
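Model: Y = a + bX, with residual R = a + bX - Y for each observation:
X    Y      R (residual)
3    11.5   a + 3b - 11.5
2    7.5    a + 2b - 7.5
1    5      a + b - 5
4    14     a + 4b - 14
The LS estimates of a and b are the values that minimize the residual sum of squares, RSS = ΣR². RSS = 0 means a perfect fit of the linear model to the data; a large RSS means a poor fit.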
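To make the LS idea concrete, here is a short Python sketch that fits Y = a + bX to the four data points in the table with the ordinary least-squares formulas and reports the resulting RSS. The helper names fit_line and rss are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of a and b in Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def rss(a, b, xs, ys):
    """Residual sum of squares, with residual R = a + bX - Y as on the slide."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))


X = [3, 2, 1, 4]
Y = [11.5, 7.5, 5, 14]
a, b = fit_line(X, Y)
print(round(a, 4), round(b, 4), round(rss(a, b, X, Y), 4))  # 1.75 3.1 0.45
```

No other choice of a and b gives a smaller RSS for these four points; the nonzero minimum (0.45) shows that the data do not lie exactly on a straight line.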

5 The rationale of the LS method
[Figure: a tree of four species (Sp1 to Sp4) next to their pairwise distance matrix]
      Sp1   Sp2   Sp3
Sp2   d12
Sp3   d13   d23
Sp4   d14   d24   d34
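The LS criterion for a tree can be written as a sum over species pairs of (d_ij/r - 2T_ij)², where T_ij is the age of the node at which species i and j meet. This has the same form as the tip-dating criterion later in this deck, with all tips sampled at the same time. The Python sketch below assumes a hypothetical topology (((Sp1,Sp2)T1,Sp3)T2,Sp4)T3 and hypothetical clock-like distances; it only evaluates RSS for candidate values of r and the node ages.

```python
# Hypothetical topology: (((Sp1,Sp2)T1,Sp3)T2,Sp4)T3.
# Each pair of species is mapped to the node where their lineages meet.
MRCA = {("Sp1", "Sp2"): "T1",
        ("Sp1", "Sp3"): "T2", ("Sp2", "Sp3"): "T2",
        ("Sp1", "Sp4"): "T3", ("Sp2", "Sp4"): "T3", ("Sp3", "Sp4"): "T3"}

# Hypothetical clock-like distances (substitutions/site), generated with
# r = 0.01/myr, T1 = 2, T2 = 5, T3 = 10 myr, so d_ij = 2 * r * T_ij.
D = {("Sp1", "Sp2"): 0.04,
     ("Sp1", "Sp3"): 0.10, ("Sp2", "Sp3"): 0.10,
     ("Sp1", "Sp4"): 0.20, ("Sp2", "Sp4"): 0.20, ("Sp3", "Sp4"): 0.20}


def rss(r, times):
    """Sum over pairs of (d_ij / r - 2 * T_ij)^2; zero means a perfect clock fit."""
    return sum((D[p] / r - 2.0 * times[MRCA[p]]) ** 2 for p in D)


print(rss(0.01, {"T1": 2.0, "T2": 5.0, "T3": 10.0}))  # ~0 (up to rounding)
print(rss(0.01, {"T1": 3.0, "T2": 5.0, "T3": 10.0}))  # > 0: T1 is off
```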

6 Multiple calibration points
[Figure: four-species tree (Sp1 to Sp4) with pairwise distances d12, d13, d23, d14, d24 and d34]
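With several calibration points, the rate and the remaining node ages can be estimated jointly. Substituting u = 1/r makes each residual u·d_ij - 2T_ij linear, so the fit becomes an ordinary linear least-squares problem. This Python sketch assumes numpy, a hypothetical topology (((Sp1,Sp2)T1,Sp3)T2,Sp4)T3, hypothetical clock-like distances, and hard calibrations that fix T2 and T3 exactly. It illustrates the LS idea and is not DAMBE's implementation.

```python
import numpy as np

# (d_ij, index of MRCA node) for the hypothetical tree (((Sp1,Sp2)T1,Sp3)T2,Sp4)T3;
# node indices 0, 1, 2 stand for T1, T2, T3.
PAIRS = [(0.04, 0), (0.10, 1), (0.10, 1), (0.20, 2), (0.20, 2), (0.20, 2)]


def ls_date_hard(pairs, n_nodes, calib):
    """Least-squares dating with hard calibrations.

    calib maps node index -> fixed age. Unknowns are u = 1/r and the ages of
    uncalibrated nodes; each pair contributes the residual u*d_ij - 2*T_ij.
    """
    free = [k for k in range(n_nodes) if k not in calib]
    col = {k: 1 + i for i, k in enumerate(free)}  # column 0 holds u
    A = np.zeros((len(pairs), 1 + len(free)))
    b = np.zeros(len(pairs))
    for row, (d_ij, node) in enumerate(pairs):
        A[row, 0] = d_ij
        if node in col:
            A[row, col[node]] = -2.0
        else:  # calibrated node: 2*T is a known constant on the right-hand side
            b[row] = 2.0 * calib[node]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    ages = {k: float(x[col[k]]) for k in free}
    ages.update(calib)
    return 1.0 / float(x[0]), ages


# Two calibration points: T2 = 5 and T3 = 10 myr.
r, ages = ls_date_hard(PAIRS, 3, {1: 5.0, 2: 10.0})
print(round(r, 6), {k: round(v, 6) for k, v in sorted(ages.items())})
```

Because these hypothetical distances are perfectly clock-like, the fit recovers r = 0.01 and T1 = 2 myr with zero RSS; with real data, conflicting calibrations would leave a positive RSS.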


8 [Figure: dated ape tree (gibbon, sumatran orangutan, orangutan, gorilla, bonobo, chimpanzee, human) with (a) hard calibration points and (b) soft calibration points at 14 and 7 million years]
Node ages in myr (node labels inferred from the figure layout):
Node                                         (a) hard          (b) soft
gibbon split (root)                          20.655 ± 1.221    20.903 ± 1.503
sumatran orangutan vs. orangutan             3.104 ± 0.273     3.206 ± 0.280
orangutans vs. African apes (14 myr calib.)  14 ± 0            14.757 ± 0.217
gorilla split                                7.079 ± 0.527     7.258 ± 0.530
bonobo vs. chimpanzee                        1.754 ± 0.184     1.818 ± 0.180
chimpanzee vs. human (7 myr calib.)          7 ± 0             5.487 ± 0.434
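One way to turn hard calibrations into soft ones is to leave every node age free and add a weighted pseudo-observation weight·(T - t_cal) for each calibrated node. A node can then move away from its calibration when the sequence data disagree with it. The slide does not say how DAMBE implements soft bounds, so treat the penalty formulation, the weight, the hypothetical tree and the deliberately conflicting calibrations (5 and 12 myr) as assumptions in this numpy sketch.

```python
import numpy as np

# (d_ij, MRCA node index) for the hypothetical tree (((Sp1,Sp2)T1,Sp3)T2,Sp4)T3.
# Under a clock these distances imply T3 = 2 * T2.
PAIRS = [(0.04, 0), (0.10, 1), (0.10, 1), (0.20, 2), (0.20, 2), (0.20, 2)]


def ls_date_soft(pairs, n_nodes, calib, weight=1.0):
    """All node ages are free; each calibration adds the residual weight*(T_node - t_cal)."""
    rows, rhs = [], []
    for d_ij, node in pairs:
        row = np.zeros(1 + n_nodes)  # column 0 holds u = 1/r
        row[0], row[1 + node] = d_ij, -2.0
        rows.append(row)
        rhs.append(0.0)
    for node, t_cal in calib.items():
        row = np.zeros(1 + n_nodes)
        row[1 + node] = weight
        rows.append(row)
        rhs.append(weight * t_cal)
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
    return 1.0 / x[0], x[1:]


# Calibrations that conflict with the clock-like data (12 / 5 is not 2).
r, ages = ls_date_soft(PAIRS, 3, {1: 5.0, 2: 12.0})
print(r, ages)  # T2 is pulled above 5 and T3 below 12
```

With hard bounds, T2 and T3 would stay at exactly 5 and 12 myr. Raising weight pushes the soft fit toward that hard fit.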

9 Dating with local clocks
[Figure: four-OTU tree (OTU1 to OTU4). (a) Tree with branch lengths. (b) Fit with local clocks: calibration T1 = 10, t2 = 5, local rates r0 = 0.6, r1 = 3 and r2 = 1.2, RSS = 0. (c) The same tree fitted with a single rate r.]
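Under local clocks, each branch's expected length is its rate class's rate multiplied by the branch duration, so pairwise distances no longer scale with node age alone. The Python sketch below computes expected pairwise distances on a hypothetical four-OTU tree in which the OTU1/OTU2 clade evolves four times faster than the rest. The tree, ages and rates are illustrative assumptions and are not the figure's values.

```python
# Hypothetical tree (((OTU1,OTU2)n3,OTU3)n2,OTU4)root; ages in myr.
PARENT = {"OTU1": "n3", "OTU2": "n3", "n3": "n2", "OTU3": "n2",
          "n2": "root", "OTU4": "root"}
AGE = {"OTU1": 0.0, "OTU2": 0.0, "OTU3": 0.0, "OTU4": 0.0,
       "n3": 2.0, "n2": 5.0, "root": 10.0}
# Local clocks: each branch (keyed by its child node) has its own rate.
RATE = {"OTU1": 2.0, "OTU2": 2.0, "n3": 0.5, "OTU3": 0.5, "n2": 0.5, "OTU4": 0.5}


def lineage(node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def branch_length(child):
    """Expected substitutions on the branch above `child`: rate x duration."""
    return RATE[child] * (AGE[PARENT[child]] - AGE[child])


def expected_distance(a, b):
    """Sum of expected branch lengths on the path between tips a and b."""
    pa, pb = lineage(a), lineage(b)
    mrca = next(n for n in pa if n in pb)
    return sum(branch_length(c) for path in (pa, pb)
               for c in path[:path.index(mrca)])


for pair in [("OTU1", "OTU2"), ("OTU1", "OTU3"), ("OTU3", "OTU4")]:
    print(pair, expected_distance(*pair))
```

Here OTU1 and OTU2, the most recently diverged pair, are no closer (8.0) than OTU1 and OTU3 (8.0). A single global rate could not reproduce these distances without leaving a positive RSS.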

10 Method comparison
[Figure: dating results compared between RY07 and BEAST]

11 [Figure: dated primate tree with calibration times of 10, 35 and 77 Myr and node ages shown as estimate ± standard deviation. Taxa: Pan, Homo, Gorilla, Pongo, Macaca, Callithrix, Daubentonia, Cheirogaleus, Mirza, M. ravelobensis, M. tavaratra, M. berthae, M. myoxinus, M. rufus (rufus1, rufus2), M. sambiranensis, M. griseorufus, M. murinus, Lepilemur, Propithecus, Hapalemur, Lemur, Eulemur, Varecia, Loris, Galago]

12 [Figure: the same primate tree, calibrated at 10, 35 and 77 my, with node ages shown as confidence intervals]
An alternative way of presenting the variation of an estimated time is a confidence interval.
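The slide does not state how its intervals were obtained. A normal approximation built from an estimate and its standard deviation is one common construction, sketched here in Python with hypothetical numbers.

```python
from statistics import NormalDist


def normal_ci(estimate, sd, level=0.95):
    """Two-sided interval estimate +/- z * sd under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level = 0.95
    return estimate - z * sd, estimate + z * sd


low, high = normal_ci(10.0, 1.0)  # hypothetical node age: 10 +/- 1 myr
print(round(low, 2), round(high, 2))  # 8.04 11.96
```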


14 Rationale of Tip-Dating
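[Figure: tree of six samples (s1 to s6) collected at different times (sampling-age labels 15, 25, 30, 20 and 10 yr) with unknown node ages t1 to t5]
If sample i was collected a_i years before the most recent sample and t_ij is the age of the common ancestor of samples i and j, a molecular clock implies $d_{ij} = r(2t_{ij} - a_i - a_j)$. The node ages are therefore estimated by minimizing

$$\mathrm{RSS} = \sum_{i<j}\left(\frac{d_{ij}}{r} + a_i + a_j - 2t_{ij}\right)^2$$

Here a_1 = 0, a_2 = 15, a_3 = 10, a_4 = 20, a_5 = 30 and a_6 = 25 yr, so the sum expands to

$$\begin{aligned}
\mathrm{RSS} ={}& \left(\tfrac{d_{12}}{r}+15-2t_1\right)^2+\left(\tfrac{d_{13}}{r}+10-2t_3\right)^2+\left(\tfrac{d_{14}}{r}+20-2t_3\right)^2+\left(\tfrac{d_{15}}{r}+30-2t_5\right)^2+\left(\tfrac{d_{16}}{r}+25-2t_5\right)^2\\
&+\left(\tfrac{d_{23}}{r}+25-2t_3\right)^2+\left(\tfrac{d_{24}}{r}+35-2t_3\right)^2+\left(\tfrac{d_{25}}{r}+45-2t_5\right)^2+\left(\tfrac{d_{26}}{r}+40-2t_5\right)^2\\
&+\left(\tfrac{d_{34}}{r}+30-2t_2\right)^2+\left(\tfrac{d_{35}}{r}+40-2t_5\right)^2+\left(\tfrac{d_{36}}{r}+35-2t_5\right)^2\\
&+\left(\tfrac{d_{45}}{r}+50-2t_5\right)^2+\left(\tfrac{d_{46}}{r}+45-2t_5\right)^2+\left(\tfrac{d_{56}}{r}+55-2t_4\right)^2
\end{aligned}$$

with r = 0.01.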
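The tip-dating criterion can be fitted as a linear least-squares problem in u = 1/r and t1 to t5, because each residual u·d_ij + a_i + a_j - 2t_ij is linear in those unknowns. The slide fixes r = 0.01. This numpy sketch also estimates r, which the spread of sampling ages makes possible; fixing r would simply move the u column to the right-hand side. The sampling ages and tree follow the RSS terms above. The node ages used to simulate the distances are hypothetical.

```python
import numpy as np

# Sampling ages a_i in years before the most recent sample (s1).
AGE = {1: 0.0, 2: 15.0, 3: 10.0, 4: 20.0, 5: 30.0, 6: 25.0}

# MRCA node (1..5) of each pair on the tree implied by the RSS terms:
# (((s1,s2)t1,(s3,s4)t2)t3,(s5,s6)t4)t5
MRCA = {(1, 2): 1, (3, 4): 2, (5, 6): 4}
MRCA.update({(i, j): 3 for i in (1, 2) for j in (3, 4)})
MRCA.update({(i, j): 5 for i in (1, 2, 3, 4) for j in (5, 6)})


def simulate(r, t):
    """Perfectly clock-like distances d_ij = r * (2 t_ij - a_i - a_j)."""
    return {(i, j): r * (2.0 * t[k] - AGE[i] - AGE[j]) for (i, j), k in MRCA.items()}


def tip_date(d):
    """Least-squares fit of u = 1/r and t1..t5 from residuals u*d_ij + a_i + a_j - 2*t_ij."""
    pairs = sorted(d)
    A = np.zeros((len(pairs), 6))  # column 0: u; columns 1..5: t1..t5
    b = np.zeros(len(pairs))
    for row, (i, j) in enumerate(pairs):
        A[row, 0] = d[(i, j)]
        A[row, MRCA[(i, j)]] = -2.0
        b[row] = -(AGE[i] + AGE[j])  # known sampling ages go to the right-hand side
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    rss = float(np.sum((A @ x - b) ** 2))
    return 1.0 / x[0], {k: float(x[k]) for k in range(1, 6)}, rss


true_t = {1: 25.0, 2: 30.0, 3: 40.0, 4: 35.0, 5: 50.0}  # hypothetical node ages
r_hat, t_hat, rss = tip_date(simulate(0.01, true_t))
print(r_hat, t_hat, rss)
```

With noise-free simulated distances, the fit returns r = 0.01, the simulated node ages and an RSS of essentially zero. Real sequence data would leave a positive RSS.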

15 Final dated tree
[Figure: final dated tree with tips labelled by sampling year (s3@1980, s4@1970, s5@1960, s6@1965) and internal nodes dated to 1940, 1950 and 1960]

16 Dates with standard deviation
Estimated node dates (year ± SD): 1902.81 ± 8.42, 1875.51 ± 13.38, 1817.23 ± 16.71, 1791.98 ± 21.08, 1792.77 ± 22.00, 1770.96 ± 24.30, 1766.78 ± 25.64

17 Dating and cospeciation
[Figure: host tree (H1 to H14) facing parasite tree (P14 to P1)]

18 Dating and cospeciation
[Figure: host tree (H1 to H14) and parasite tree (P1 to P14), labelled "Coincidental cophylogeny"]


21 Dating gene duplication
[Figure: gene tree in which a duplication at T0 gives paralog A (S1A to S12A) and paralog B (S1B to S6B), with speciation nodes T1 and T2]
Dating gene duplication events:
1. The gene duplication event occurred at T0.
2. Two approaches are used to approximate T0:
   - If the duplicated genes, or their third codon positions, conform to the molecular clock, estimate T0 directly (next slide).
   - If the duplicated genes do not conform to the molecular clock, use genes that do conform to it to estimate T1, which underestimates T0.

22 Estimating T0
[Figure: the duplicated gene tree with duplication node T0, speciation nodes T1 to T5 in the paralog-B subtree and the corresponding nodes T1' to T5' in the paralog-A subtree]
Ti and Ti' can be estimated from either nonsynonymous or synonymous substitutions, designated Ti.N and Ti.N', and Ti.S and Ti.S'. Ideally, both paralogous genes conform to the molecular clock and Ti = Ti' for every i. In practice:
- Ti.N = Ti.N' and Ti.S = Ti.S': rare.
- Ti.N = Ti.N' and Ti.S ≠ Ti.S': very rare.
- Ti.N ≠ Ti.N' and Ti.S = Ti.S': common.
- Ti.N ≠ Ti.N' and Ti.S ≠ Ti.S': most common.

23 Dating gene duplication
[Figure: gene tree with paralog A (S1A to S12A) and paralog B (S1B to S6B), showing the duplication node M and a younger node N]
The loss of the paralogous lineage leading to S5B and S6B (or to S5A and S6A) causes the gene duplication time to be underestimated, because the inferred duplication shifts from node M to node N. Failing to sample the lineage leading to S5B and S6B has the same effect.

24 Application: HIV transmission in North America (NA)
Hypothesis 1 (advocated by Haitian scientists): Some North Americans went to Africa and contracted HIV-1. They then visited Haiti as sex tourists in the 1960s and transmitted HIV-1 to Haitian women.
Hypothesis 2: Haitians contracted HIV-1 before the sex tourism of the 1960s. Some North Americans went to Haiti as sex tourists, contracted HIV-1 there, and brought the disease to North America.
The two hypotheses make different predictions about both the HIV-1 tree topology and the times of divergence.

