
Distance-based methods Xuhua Xia


Slide 1. Distance-based methods
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca

Slide 2. Lecture Outline
Objectives of this lecture:
- Grasp the basic concepts of distance-based tree-building algorithms.
- Learn the least-squares criterion and the minimum evolution criterion, and how to use them to construct a tree.
Distance-based methods:
- Genetic distance: generally defined as the number of substitutions per site (JC69, K80, TN84, F84, TN93, and LogDet distances).
- Tree-building algorithms: UPGMA, Neighbor-joining, Fitch-Margoliash, and FastME.

Slide 3. Genetic Distances
Assuming a substitution model, we can estimate the genetic distance between two nucleotide or amino acid sequences, i.e., their observed difference corrected for multiple substitutions at the same site. Examples include the JC69, K80, and TN93 distances.
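For reference, a sketch of the commonly used closed forms of two of these distances, assuming p is the proportion of differing sites, P the proportion of sites with transitional differences, and Q the proportion of sites with transversional differences (the longer TN93 formula is not reproduced here):

```latex
K_{\mathrm{JC69}} = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}p\right)

K_{\mathrm{K80}} = -\frac{1}{2}\ln(1 - 2P - Q) - \frac{1}{4}\ln(1 - 2Q)
```

With p = 0.25, P = 4/24, and Q = 2/24, these give the values $K_{\text{JC69}} = 0.304099$ and $K_{\text{K80}} = 0.3150786$ listed in Slide 5.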

Slide 4. Calculation of $K_{\text{JC69}}$
[Figure: Species 1 and Species 2 descend from a common ancestor (ancestor and descendants all labeled with the sequence AACGACGATCG), each along a branch of length t, so the time between Species 1 and Species 2 is 2t.]
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT
p = 6/24 = 0.25
$K_{\text{JC69}} = 0.304099$
Genetic distances are scaled to be the number of substitutions per site.
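The calculation on this slide can be reproduced with a minimal Python sketch, assuming the two aligned sequences contain only unambiguous A/C/G/T and no gaps. `p_distance` and `jc69` are hypothetical helper names, not functions from the lecture or from any library.

```python
import math

# Aligned sequences from the slide; codon spacing is removed before comparison.
SP1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
SP2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(s1, s2))
    return differences / len(s1)


def jc69(p: float) -> float:
    """JC69 distance (substitutions per site) from the proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance(SP1, SP2)
print(f"p = {p}, K_JC69 = {jc69(p):.6f}")  # p = 0.25, K_JC69 = 0.304099
```

The guard in `jc69` reflects that the logarithm's argument must stay positive; for p at or above 0.75 the JC69 correction has no finite value.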

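Slide 5. Numerical Illustration
Sp1: AAG CCT CGG GGC CCT TAT TTT TTG
     ||  |   ||| ||| |   ||| ||| ||
Sp2: AAT CTC CGG GGC CTC TAT TTT TTT
What are P and Q (the proportions of sites with transitional and transversional differences)? Of the six differing sites, four are transitions (C/T) and two are transversions (G/T), so P = 4/24 and Q = 2/24.
Comparison of distances:
- p = 0.25
- Poisson distance: $-\ln(1 - p) = 0.288$
- $K_{\text{JC69}} = 0.304099$
- $K_{\text{K80}} = 0.3150786$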
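The comparison on this slide can be sketched in Python by classifying each difference as a transition or a transversion. This assumes unambiguous nucleotides and no gaps; `ts_tv_proportions` is a hypothetical helper, and the K80 line uses the standard closed form rather than any code from the lecture.

```python
import math

SP1 = "AAG CCT CGG GGC CCT TAT TTT TTG".replace(" ", "")
SP2 = "AAT CTC CGG GGC CTC TAT TTT TTT".replace(" ", "")

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_proportions(s1: str, s2: str) -> tuple[float, float]:
    """Return (P, Q): proportions of sites with transitional / transversional differences."""
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        # A<->G and C<->T are transitions; any purine<->pyrimidine change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions / len(s1), transversions / len(s1)


P, Q = ts_tv_proportions(SP1, SP2)
p = P + Q  # proportion of all differing sites
poisson = -math.log(1 - p)
k_jc69 = -0.75 * math.log(1 - 4 * p / 3)
k_k80 = -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
print(f"P = {P:.4f}, Q = {Q:.4f}")
print(f"Poisson = {poisson:.3f}, JC69 = {k_jc69:.6f}, K80 = {k_k80:.7f}")
```

The printed distances increase from Poisson to JC69 to K80, matching the ordering on the slide: each model corrects more for substitutions that are hidden by later changes at the same site.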

Slide 6. Distance-based phylogenetic algorithms

Slide 7. A Star Tree (Completely Unresolved Tree)
[Figure: a star tree joining Human, Chimpanzee, Gorilla, Orangutan, and Gibbon at a single central node.]

Slide 8. Genetic Distance Matrix
Matrix of genetic distances ($D_{ij}$):

         Human  Chimp  Gorilla  Orang  Gibbon
Human           0.015  0.045    0.143  0.198
Chimp                  0.030    0.126  0.179
Gorilla                         0.092  0.179
Orang                                  0.179
Gibbon

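Slide 9. UPGMA
Starting from the matrix in Slide 8, UPGMA joins the closest pair, Human and Chimp (0.015), and replaces them with the cluster hu-ch, whose distance to each remaining species is the average of its members' distances:
$D_{\text{(hu-ch),go}} = (D_{\text{hu,go}} + D_{\text{ch,go}})/2 = 0.038$
$D_{\text{(hu-ch),or}} = (D_{\text{hu,or}} + D_{\text{ch,or}})/2 = 0.135$
$D_{\text{(hu-ch),gi}} = (D_{\text{hu,gi}} + D_{\text{ch,gi}})/2 = 0.189$

         hu-ch  Gorilla  Orang  Gibbon
hu-ch           0.038    0.135  0.189
Gorilla                  0.092  0.179
Orang                           0.179
Gibbon

The smallest entry (0.038) now joins Gorilla to hu-ch.
[Figure: the star tree being resolved step by step by UPGMA.]
Intermediate trees (taxa in the last pair of parentheses are still unresolved): (hu,ch),(go,or,gi), then ((hu,ch),go),(or,gi)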
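The averaging step can be sketched in Python as the mean of all pairwise distances between the members of two clusters. The dictionary layout and the helper names `d` and `cluster_distance` are illustrative assumptions; the same function also covers the three- and four-member averages on the next slide.

```python
# Upper triangle of the distance matrix from Slide 8.
D = {
    ("Human", "Chimp"): 0.015, ("Human", "Gorilla"): 0.045,
    ("Human", "Orang"): 0.143, ("Human", "Gibbon"): 0.198,
    ("Chimp", "Gorilla"): 0.030, ("Chimp", "Orang"): 0.126,
    ("Chimp", "Gibbon"): 0.179, ("Gorilla", "Orang"): 0.092,
    ("Gorilla", "Gibbon"): 0.179, ("Orang", "Gibbon"): 0.179,
}


def d(a: str, b: str) -> float:
    """Look up a distance regardless of the order of the two taxa."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]


def cluster_distance(c1: tuple, c2: tuple) -> float:
    """UPGMA distance: mean of all pairwise distances between members of two clusters."""
    return sum(d(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


hu_ch = ("Human", "Chimp")
for other in ("Gorilla", "Orang", "Gibbon"):
    print(other, round(cluster_distance(hu_ch, (other,)), 4))
```

The unrounded results are 0.0375, 0.1345, and 0.1885; the slide shows them rounded to three decimals.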

Slide 10. UPGMA (continued)
After Gorilla joins hu-ch, distances to the new cluster average over all three members:
$D_{\text{(hu-ch-go),or}} = (D_{\text{hu,or}} + D_{\text{ch,or}} + D_{\text{go,or}})/3 = 0.120$
$D_{\text{(hu-ch-go),gi}} = (D_{\text{hu,gi}} + D_{\text{ch,gi}} + D_{\text{go,gi}})/3 = 0.185$

          hu-ch-go  Orang  Gibbon
hu-ch-go            0.120  0.185
Orang                      0.179
Gibbon

The smallest entry (0.120) joins Orang to hu-ch-go, leaving Gibbon:
$D_{\text{(hu-ch-go-or),gi}} = (D_{\text{hu,gi}} + D_{\text{ch,gi}} + D_{\text{go,gi}} + D_{\text{or,gi}})/4 = 0.184$
[Figure: the successive UPGMA trees.]
Resulting tree: ((((hu,ch),go),or),gi)

Slide 11. Phylogenetic Relationship from UPGMA
[Figure: the three successive UPGMA distance matrices from Slides 8 to 10.]

Slide 12. Branch Lengths
In UPGMA, each new node is placed at half the distance at which its two clusters are joined, and each branch length is the difference in height between the nodes at its two ends:
$D_{\text{hu,ch}} = 0.015$, so the (hu,ch) node height is 0.0075
$D_{\text{(hu-ch),go}} = 0.038$, so the ((hu,ch),go) node height is 0.019
$D_{\text{(hu-ch-go),or}} = 0.120$, so the next node height is 0.060
$D_{\text{(hu-ch-go-or),gi}} = 0.184$, so the root height is 0.092
Trees with branch lengths:
((hu:0.0075,ch:0.0075),(go,or,gi))
(((hu:0.0075,ch:0.0075):0.0115,go:0.019),(or,gi))
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092)
[Figure: the UPGMA tree of Human, Chimp, Gorilla, Orang, and Gibbon with node heights 0.0075, 0.019, 0.06, and 0.092.]
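The height rule can be checked with a short Python sketch that uses the rounded joining distances from the slides. The node labels and the `EDGES` list are illustrative assumptions describing the tree built above.

```python
# Distances at which UPGMA joined clusters (rounded values from the slides).
join_distance = {
    "(hu,ch)": 0.015,
    "((hu,ch),go)": 0.038,
    "(((hu,ch),go),or)": 0.120,
    "root": 0.184,
}

# Each new node sits at half its joining distance; leaves (absent here) have height 0.
height = {node: dist / 2 for node, dist in join_distance.items()}

# (parent, child) pairs of the final UPGMA tree.
EDGES = [
    ("(hu,ch)", "hu"), ("(hu,ch)", "ch"),
    ("((hu,ch),go)", "(hu,ch)"), ("((hu,ch),go)", "go"),
    ("(((hu,ch),go),or)", "((hu,ch),go)"), ("(((hu,ch),go),or)", "or"),
    ("root", "(((hu,ch),go),or)"), ("root", "gi"),
]

# A branch length is the parent's height minus the child's height.
branch_length = {child: height[parent] - height.get(child, 0.0) for parent, child in EDGES}
for child, length in branch_length.items():
    print(f"{child}: {length:.4f}")
```

Because every leaf sits at height 0 and every node at half its joining distance, all root-to-tip paths have the same length (0.092 here), which is the clock-like property UPGMA assumes.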

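Slide 13. Final UPGMA Tree
[Figure: the final UPGMA tree of Human, Chimp, Gorilla, Orang, and Gibbon, with node heights 0.092, 0.060, 0.019, and 0.0075 and divergence times of 19, 13, 8, and 6 MY (million years).]
((((hu:0.0075,ch:0.0075):0.0115,go:0.019):0.041,or:0.06):0.032,gi:0.092);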
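The whole procedure from Slides 8 to 13 can be sketched as one Python function that returns a Newick string. This is a minimal illustration, not code from the lecture or from a phylogenetics package: `upgma` and the internal `node0`, `node1`, ... cluster ids are hypothetical, and ties between equal distances are broken arbitrarily by `min`.

```python
# Upper triangle of the distance matrix from Slide 8.
TAXA = ["Human", "Chimp", "Gorilla", "Orang", "Gibbon"]
PAIRS = {
    ("Human", "Chimp"): 0.015, ("Human", "Gorilla"): 0.045,
    ("Human", "Orang"): 0.143, ("Human", "Gibbon"): 0.198,
    ("Chimp", "Gorilla"): 0.030, ("Chimp", "Orang"): 0.126,
    ("Chimp", "Gibbon"): 0.179, ("Gorilla", "Orang"): 0.092,
    ("Gorilla", "Gibbon"): 0.179, ("Orang", "Gibbon"): 0.179,
}


def upgma(taxa, pairs):
    """Build a rooted UPGMA tree and return it as a Newick string."""
    dist = {frozenset(pair): value for pair, value in pairs.items()}
    # cluster id -> (Newick subtree, number of taxa, height of its root node)
    clusters = {t: (t, 1, 0.0) for t in taxa}
    n_joined = 0
    while len(clusters) > 1:
        # Join the closest pair; dict order decides which one is written first.
        closest = min(dist, key=dist.get)
        a, b = [c for c in clusters if c in closest]
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        height = dist.pop(closest) / 2  # the new node sits at half the joining distance
        subtree = f"({tree_a}:{height - h_a:.5f},{tree_b}:{height - h_b:.5f})"
        new_id = f"node{n_joined}"
        n_joined += 1
        # Distance to every other cluster = size-weighted mean of the two old distances.
        for k in clusters:
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            dist[frozenset((new_id, k))] = (n_a * d_ak + n_b * d_bk) / (n_a + n_b)
        # Put the new cluster first so it is written on the left, as on the slides.
        clusters = {new_id: (subtree, n_a + n_b, height), **clusters}
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


print(upgma(TAXA, PAIRS))
```

The size-weighted mean is equivalent to averaging over all member pairs, as in Slides 9 and 10. Because the sketch keeps unrounded distances, some branch lengths differ slightly from the slide's rounded values (for example, 0.01125 rather than 0.0115 above the (hu,ch) node), while the topology is the same.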

Slide 14. Distance-based methods
- Distance matrix
- Tree-building algorithms:
  - UPGMA
  - Neighbor-joining
  - FastME
  - Fitch-Margoliash
- Criterion-based methods: the least-squares method
  - Branch-length estimation
  - Tree-selection criterion

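Slide 15. For three OTUs
Example distance matrix and its symbolic form:

   1      2      3
1         0.092  0.179
2                0.179
3

   1      2      3
1         d12    d13
2                d23
3

[Figure: an unrooted three-taxon tree with branch lengths $x_1$, $x_2$, and $x_3$ leading to OTUs 1, 2, and 3.]
$d_{12} = x_1 + x_2$
$d_{13} = x_1 + x_3$
$d_{23} = x_2 + x_3$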
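With three OTUs there are three equations in three unknowns, so the branch lengths can be solved exactly: adding the two equations that contain $x_1$ and subtracting the third leaves $2x_1$, and likewise for $x_2$ and $x_3$. A minimal Python sketch using the example matrix above (`three_taxon_branches` is a hypothetical helper name):

```python
def three_taxon_branches(d12: float, d13: float, d23: float) -> tuple[float, float, float]:
    """Solve d12 = x1 + x2, d13 = x1 + x3, d23 = x2 + x3 for the three branch lengths."""
    x1 = (d12 + d13 - d23) / 2
    x2 = (d12 + d23 - d13) / 2
    x3 = (d13 + d23 - d12) / 2
    return x1, x2, x3


x1, x2, x3 = three_taxon_branches(0.092, 0.179, 0.179)
print(round(x1, 4), round(x2, 4), round(x3, 4))
```

Here the tree reproduces all three distances exactly, which is why least squares only becomes necessary once there are more pairwise distances than branches, as in the four-taxon case.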

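Slide 16. Least-squares method
[Figure: an unrooted four-taxon tree in which the internal branch $x_5$ separates taxa 1 and 2 from taxa 3 and 4; external branches $x_1$ to $x_4$ lead to taxa 1 to 4.]
Example distance matrix (lower triangle; the first line gives the number of taxa) and its symbolic form:

4
Sp1
Sp2  0.3
Sp3  0.4  0.5
Sp4  0.4  0.6  0.6

4
Sp1
Sp2  d12
Sp3  d13  d23
Sp4  d14  d24  d34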

Slide 17. Least-squares method
[Figure: the four-taxon tree with branches $x_1$ to $x_5$.]
Distances implied by the tree (sums of branch lengths along each path):
$d'_{12} = x_1 + x_2$
$d'_{13} = x_1 + x_5 + x_3$
$d'_{14} = x_1 + x_5 + x_4$
$d'_{23} = x_2 + x_5 + x_3$
$d'_{24} = x_2 + x_5 + x_4$
$d'_{34} = x_3 + x_4$
Squared deviations of the observed distances from the tree distances:
$(d_{12} - d'_{12})^2 = [d_{12} - (x_1 + x_2)]^2$
$(d_{13} - d'_{13})^2 = [d_{13} - (x_1 + x_5 + x_3)]^2$
$(d_{14} - d'_{14})^2 = [d_{14} - (x_1 + x_5 + x_4)]^2$
$(d_{23} - d'_{23})^2 = [d_{23} - (x_2 + x_5 + x_3)]^2$
$(d_{24} - d'_{24})^2 = [d_{24} - (x_2 + x_5 + x_4)]^2$
$(d_{34} - d'_{34})^2 = [d_{34} - (x_3 + x_4)]^2$
Least-squares method: find the $x_i$ values that minimize SS, the sum of these squared deviations.
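The SS objective can be evaluated directly in a short Python sketch. It uses the example matrix of Slide 16 and the least-squares estimates reported in Slide 19; the `PATHS` table and `sum_of_squares` are illustrative names, and the perturbed value of $x_5$ is an arbitrary choice to show that SS rises away from the optimum.

```python
# Observed distances from Slide 16.
D_OBS = {(1, 2): 0.3, (1, 3): 0.4, (1, 4): 0.4, (2, 3): 0.5, (2, 4): 0.6, (3, 4): 0.6}

# Branches on the path between each pair of taxa in the tree ((1,2),(3,4)); 5 is internal.
PATHS = {
    (1, 2): (1, 2), (1, 3): (1, 5, 3), (1, 4): (1, 5, 4),
    (2, 3): (2, 5, 3), (2, 4): (2, 5, 4), (3, 4): (3, 4),
}


def sum_of_squares(x: dict) -> float:
    """SS = sum over pairs of (d_ij - d'_ij)^2, where d'_ij adds the branch lengths on the path."""
    return sum((D_OBS[pair] - sum(x[k] for k in path)) ** 2 for pair, path in PATHS.items())


ls_estimates = {1: 0.075, 2: 0.225, 3: 0.275, 4: 0.325, 5: 0.025}
perturbed = {**ls_estimates, 5: 0.05}  # arbitrary change to the internal branch
print(sum_of_squares(ls_estimates), sum_of_squares(perturbed))
```

At the least-squares estimates four of the six pairs miss by 0.025 each, giving SS = 0.0025; no other choice of $x_1$ to $x_5$ does better on this tree.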

Slide 18. Least-squares method
$$SS = [d_{12} - (x_1 + x_2)]^2 + [d_{13} - (x_1 + x_5 + x_3)]^2 + [d_{14} - (x_1 + x_5 + x_4)]^2 + [d_{23} - (x_2 + x_5 + x_3)]^2 + [d_{24} - (x_2 + x_5 + x_4)]^2 + [d_{34} - (x_3 + x_4)]^2$$
Taking the partial derivative of SS with respect to each $x_i$, we have
$\partial SS/\partial x_1 = -2d_{12} + 6x_1 + 2x_2 - 2d_{13} + 4x_5 + 2x_3 - 2d_{14} + 2x_4$
$\partial SS/\partial x_2 = -2d_{12} + 2x_1 + 6x_2 - 2d_{23} + 4x_5 + 2x_3 - 2d_{24} + 2x_4$
$\partial SS/\partial x_3 = -2d_{13} + 2x_1 + 4x_5 + 6x_3 - 2d_{23} + 2x_2 - 2d_{34} + 2x_4$
$\partial SS/\partial x_4 = -2d_{14} + 2x_1 + 4x_5 + 6x_4 - 2d_{24} + 2x_2 - 2d_{34} + 2x_3$
$\partial SS/\partial x_5 = -2d_{13} + 4x_1 + 8x_5 + 4x_3 - 2d_{14} + 4x_4 - 2d_{23} + 4x_2 - 2d_{24}$
Setting these partial derivatives to 0 and solving for the $x_i$, we have
$x_1 = d_{13}/4 + d_{12}/2 - d_{23}/4 + d_{14}/4 - d_{24}/4$
$x_2 = d_{12}/2 - d_{13}/4 + d_{23}/4 - d_{14}/4 + d_{24}/4$
$x_3 = d_{13}/4 + d_{23}/4 + d_{34}/2 - d_{14}/4 - d_{24}/4$
$x_4 = d_{14}/4 - d_{13}/4 - d_{23}/4 + d_{34}/2 + d_{24}/4$
$x_5 = -d_{12}/2 + d_{23}/4 - d_{34}/2 + d_{14}/4 + d_{24}/4 + d_{13}/4$
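Setting each derivative to zero gives a linear system, which can be solved numerically as a check on the closed-form solutions. The sketch below is an assumption-laden illustration, not the lecture's method: it builds the equations from the path table (each row equals the slide's derivative divided by 2) and solves them by Gauss-Jordan elimination with Python's `fractions.Fraction`, so the result is exact.

```python
from fractions import Fraction

# Observed distances from Slide 16, as exact fractions to avoid rounding noise.
D_OBS = {(1, 2): Fraction("0.3"), (1, 3): Fraction("0.4"), (1, 4): Fraction("0.4"),
         (2, 3): Fraction("0.5"), (2, 4): Fraction("0.6"), (3, 4): Fraction("0.6")}
# Branches (1..5) on the path between each pair in the tree ((1,2),(3,4)).
PATHS = {(1, 2): {1, 2}, (1, 3): {1, 3, 5}, (1, 4): {1, 4, 5},
         (2, 3): {2, 3, 5}, (2, 4): {2, 4, 5}, (3, 4): {3, 4}}
BRANCHES = [1, 2, 3, 4, 5]

# dSS/dx_k = 0 rearranges to: sum_j (#paths containing k and j) * x_j = sum of d over paths containing k.
# Each row is one such equation: [coefficients for x1..x5, right-hand side].
rows = []
for k in BRANCHES:
    coeffs = [Fraction(sum(1 for path in PATHS.values() if k in path and j in path)) for j in BRANCHES]
    rhs = sum(D_OBS[pair] for pair, path in PATHS.items() if k in path)
    rows.append(coeffs + [rhs])

# Gauss-Jordan elimination (exact, since all entries are Fractions).
n = len(rows)
for col in range(n):
    pivot_row = next(r for r in range(col, n) if rows[r][col] != 0)
    rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
    pivot = rows[col][col]
    rows[col] = [v / pivot for v in rows[col]]
    for r in range(n):
        if r != col and rows[r][col] != 0:
            factor = rows[r][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]

x = [row[-1] for row in rows]
print([float(v) for v in x])
```

The first row is [3, 1, 1, 1, 2 | d12 + d13 + d14], i.e. $\partial SS/\partial x_1 = 0$ divided by 2. The exact solution is 3/40, 9/40, 11/40, 13/40, and 1/40, the same values the closed-form expressions give.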

Slide 19. Least-squares method
Substituting the example distances from Slide 16 ($d_{12} = 0.3$, $d_{13} = 0.4$, $d_{23} = 0.5$, $d_{14} = 0.4$, $d_{24} = 0.6$, $d_{34} = 0.6$) into the solutions for $x_1$ to $x_5$ in Slide 18 gives:
$x_1 = 0.075$, $x_2 = 0.225$, $x_3 = 0.275$, $x_4 = 0.325$, $x_5 = 0.025$
[Figure: the four-taxon tree with branches $x_1$ to $x_5$.]
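The substitution can be written as a small Python function that transcribes the Slide 18 solutions term by term. `quartet_ls_branches` is a hypothetical name, and the function applies only to the topology ((1,2),(3,4)).

```python
def quartet_ls_branches(d12, d13, d14, d23, d24, d34):
    """Least-squares branch lengths x1..x5 for the tree ((1,2),(3,4)), per the Slide 18 solutions."""
    x1 = d13 / 4 + d12 / 2 - d23 / 4 + d14 / 4 - d24 / 4
    x2 = d12 / 2 - d13 / 4 + d23 / 4 - d14 / 4 + d24 / 4
    x3 = d13 / 4 + d23 / 4 + d34 / 2 - d14 / 4 - d24 / 4
    x4 = d14 / 4 - d13 / 4 - d23 / 4 + d34 / 2 + d24 / 4
    x5 = -d12 / 2 + d23 / 4 - d34 / 2 + d14 / 4 + d24 / 4 + d13 / 4
    return x1, x2, x3, x4, x5


x = quartet_ls_branches(d12=0.3, d13=0.4, d14=0.4, d23=0.5, d24=0.6, d34=0.6)
print([round(v, 3) for v in x], "TreeLen =", round(sum(x), 3))
```

Summing the five estimates gives this tree's total length, 0.925, which is the quantity the minimum evolution criterion compares across topologies.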

Slide 20. Minimum Evolution Criterion
[Figure: the three possible unrooted trees for four taxa, ((1,2),(3,4)), ((1,3),(2,4)), and ((1,4),(2,3)), each with branches $x_1$ to $x_5$.]
The minimum evolution (ME) criterion: the tree with the shortest total length (TreeLen, the sum of its estimated branch lengths $x_1 + x_2 + x_3 + x_4 + x_5$) is the best tree.
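The criterion can be applied to the example matrix in a Python sketch, assuming branch lengths for each topology are estimated by ordinary least squares as in Slides 17 to 19. The function names are hypothetical; relabeling the Slide 18 solutions shows that the two external branches on each side sum to the observed distance between those taxa, so only the internal branch needs its own formula.

```python
from fractions import Fraction

# Observed distances from Slide 16, stored as exact fractions.
D_OBS = {
    frozenset(pair): Fraction(value)
    for pair, value in {
        (1, 2): "0.3", (1, 3): "0.4", (1, 4): "0.4",
        (2, 3): "0.5", (2, 4): "0.6", (3, 4): "0.6",
    }.items()
}


def d(a, b):
    """Observed distance between taxa a and b (order-independent)."""
    return D_OBS[frozenset((a, b))]


def tree_length(i, j, k, m):
    """TreeLen of the unrooted tree ((i,j),(k,m)) with least-squares branch lengths.

    Relabeling the Slide 18 solutions gives x_i + x_j = d_ij and x_k + x_m = d_km,
    plus the internal branch computed below.
    """
    internal = (d(i, k) + d(i, m) + d(j, k) + d(j, m)) / 4 - (d(i, j) + d(k, m)) / 2
    return d(i, j) + d(k, m) + internal


TOPOLOGIES = {
    "((1,2),(3,4))": (1, 2, 3, 4),
    "((1,3),(2,4))": (1, 3, 2, 4),
    "((1,4),(2,3))": (1, 4, 2, 3),
}
tree_lengths = {name: tree_length(*taxa) for name, taxa in TOPOLOGIES.items()}
for name, length in tree_lengths.items():
    print(name, float(length))
best = min(tree_lengths.values())
print("shortest:", [name for name, length in tree_lengths.items() if length == best])
```

For this small example, ((1,2),(3,4)) and ((1,4),(2,3)) tie at TreeLen = 0.925, while ((1,3),(2,4)) is longer (0.95) and has a negative internal branch (-0.05), so on these data the ME criterion alone does not single out one tree.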

