Phylogenetic Tree Construction using Pathway Analysis Bioengineering 190C Project By: Harry Choi Nick Lin Gabe Kwong Li Yan Christina Yau.


1 Phylogenetic Tree Construction using Pathway Analysis
Bioengineering 190C Project
By: Harry Choi, Nick Lin, Gabe Kwong, Li Yan, Christina Yau

2 Background
Traditional Approach
- Comparison of single orthologs between organisms
- Distance matrix generation from similarity scores
- Hierarchical clustering
- Tree construction
Disadvantage
- Sensitive to choice of gene for comparison
- Possible inconsistency of trees generated

3 Our Approach
- N annotated organisms are to be clustered
- A reference organism is chosen
- A pathway in the reference organism is chosen
- A pool of orthologs in the N organisms is generated by BLAST
- Analysis of the ortholog pool generates a vector representing each organism
- Distances are calculated from the vectors
- Hierarchical clustering
- Tree construction

4 Rationale for Approach
- A pathway takes multiple genes into account
- Individual differences between genes are not directly taken into account
- All genes considered are related to each other in cellular function
- Actual function is better conserved than sequence identity
- Better consistency in the trees generated

5 Program Design
Modular design divided into the following portions:
- BLAST
- Analysis of BLAST results
- Hierarchical clustering
- Tree construction
The design allows components to be reused in different applications with minor changes.
The design allows individual subroutines to be used recursively to generate the desired results with minimal changes.

6 Step 1: BLAST
Conserved proteins from the Wnt pathway will be used as an example.

7 Wnt pathway
Wnt proteins form a family of highly conserved secreted signaling molecules that regulate cell-to-cell interactions during embryogenesis. Wnt genes and Wnt signaling are also implicated in cancer. The Wnt pathway is found in many organisms, such as Drosophila, Caenorhabditis elegans, Xenopus, chicken, mouse, zebrafish, and human.

8 Wnt pathway (cont)
Choose the 6 most conserved proteins from this pathway as seed proteins: Wnt, Frizzled, Dsh, Apc, Axin, Tcf (Roel Nusse, 2002)

9 5 organisms
- Drosophila: 54455 sequences
- Mouse: 77143 sequences
- C. elegans: 62256 sequences
- Zebrafish: 3069 sequences
- Xenopus: 5174 sequences

10 Strategy
- The seed protein (pr1) from Organism 1 (O1) is BLASTed against the 4 other organisms.
- The secondary seed proteins (pr2, …, pr9) are each BLASTed against their respective 4 other organisms.
[Diagram: hit tables over organisms O1 to O5, one for the seed protein pr1 and one for each secondary seed, listing hits pr1 through pr26]
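The two-round search on this slide can be sketched roughly as follows. This is only an illustration: `two_round_pool` and the `search` callable are hypothetical names, and `search` stands in for an actual BLAST run by returning the IDs of hits for a query protein in a target organism.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical stand-in for a BLAST run: (query protein, target organism) -> hit IDs.
SearchFn = Callable[[str, str], List[str]]


def two_round_pool(seed: str, seed_org: str, organisms: List[str],
                   search: SearchFn) -> Set[str]:
    """Collect the seed, its hits, and the hits of those hits."""
    pool: Set[str] = {seed}
    secondary: List[Tuple[str, str]] = []  # (protein, organism it belongs to)

    # Round 1: seed protein against every other organism.
    for org in organisms:
        if org == seed_org:
            continue
        for hit in search(seed, org):
            secondary.append((hit, org))
            pool.add(hit)

    # Round 2: each secondary seed against every organism except its own.
    for protein, protein_org in secondary:
        for org in organisms:
            if org != protein_org:
                pool.update(search(protein, org))
    return pool


# Toy lookup table used in place of real BLAST results (made-up IDs).
TOY_HITS = {
    ("pr1", "O2"): ["pr2"],
    ("pr1", "O3"): ["pr3"],
    ("pr2", "O1"): ["pr1"],
    ("pr2", "O3"): ["pr3", "pr4"],
}


def toy_search(query: str, organism: str) -> List[str]:
    return TOY_HITS.get((query, organism), [])
```

With the toy table, `two_round_pool("pr1", "O1", ["O1", "O2", "O3"], toy_search)` picks up pr4 only in the second round, which is the point of using secondary seeds.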


12 Example output file from BLAST: 15 secondary seed proteins
wg_85190 wg_celegans_7508752 1.70e-41
wg_85190 wg_celegans_3880389 1.70e-41
wg_85190 wg_celegans_17539494 1.70e-41
wg_85190 wg_zebrafish_103816 1.20e-80
wg_85190 wg_zebrafish_833600 1.20e-80
wg_85190 wg_zebrafish_18859559 1.20e-80
wg_85190 wg_zebrafish_139740 1.20e-80
wg_85190 wg_xenopus_65236 1.40e-76
wg_85190 wg_xenopus_69039 1.40e-76
wg_85190 wg_xenopus_139748 1.40e-76
wg_85190 wg_mouse_293671 2.50e-78
wg_85190 wg_mouse_387388 2.50e-78
wg_85190 wg_mouse_69037 2.50e-78
wg_85190 wg_mouse_13529431 2.50e-78
wg_85190 wg_mouse_139744 2.50e-78

13 Example output file from BLAST (cont)
wg_celegans_7508752 wg_celegans_7508752_drosophila_6537292 1.30e-90
wg_celegans_7508752 wg_celegans_7508752_drosophila_12018324 1.30e-90
wg_celegans_7508752 wg_celegans_7508752_xenopus_422628 1.10e-96
wg_celegans_7508752 wg_celegans_7508752_xenopus_313268 1.10e-96
wg_celegans_7508752 wg_celegans_7508752_xenopus_465484 1.10e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_202406 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_227507 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_111253 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_14789729 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_6678599 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_mouse_14424475 2.40e-96
wg_celegans_7508752 wg_celegans_7508752_zebrafish_1256778 2.30e-94
wg_celegans_7508752 wg_celegans_7508752_zebrafish_18859567 2.30e-94
wg_celegans_7508752 wg_celegans_7508752_zebrafish_2501662 2.30e-94
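A minimal sketch of how output in this three-column form (query, hit, E-value) might be read and filtered. The function name `parse_hits`, the default cutoff of 1e-40, and the rule that the organism is the second-to-last underscore-separated token of the hit name are all assumptions inferred from the sample records, not details given in the slides.

```python
from typing import List, Tuple

Hit = Tuple[str, str, str, float]  # (query, hit, organism, e_value)


def parse_hits(text: str, max_evalue: float = 1e-40) -> List[Hit]:
    """Parse whitespace-separated 'query hit evalue' records, keeping strong hits."""
    hits: List[Hit] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip headers, blank lines, or malformed records
        query, hit, evalue_text = fields
        evalue = float(evalue_text)
        if evalue <= max_evalue:
            # Assumed naming scheme: ..._<organism>_<numeric id>
            organism = hit.split("_")[-2]
            hits.append((query, hit, organism, evalue))
    return hits


SAMPLE = """\
wg_85190 wg_celegans_7508752 1.70e-41
wg_85190 wg_zebrafish_103816 1.20e-80
wg_celegans_7508752 wg_celegans_7508752_drosophila_6537292 1.30e-90
"""
```

Calling `parse_hits(SAMPLE, max_evalue=1e-50)` would drop the C. elegans record and keep the zebrafish and Drosophila ones.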

14 Step 2: Analysis of BLAST Results, i.e. Metric Determination

15 Metric Determination
Common algorithms used to calculate a distance metric from similarity scores include (1 - %Identity) and $S = e^{-d/2}$ (Shepard 1987). A different algorithm is used for this project.

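16 Rules the Metric Must Satisfy
- The distance between a gene and itself must be zero: $D_{ii} = 0$.
- Commutative property (symmetry): $D_{ij} = D_{ji}$.
- Triangle inequality: $D_{ij} + D_{ik} \geq D_{jk}$.
[Figure: triangle with vertices i, j, k and sides $D_{ij}$, $D_{ik}$, $D_{jk}$]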
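The three rules can be checked mechanically on any candidate distance matrix. The following is a small sketch, assuming the matrix is given as a full square list of lists; `is_metric` and the tolerance `tol` are illustrative names, not part of the project.

```python
from itertools import product
from typing import List


def is_metric(d: List[List[float]], tol: float = 1e-9) -> bool:
    """Return True if d satisfies D_ii = 0, D_ij = D_ji, and D_ij + D_ik >= D_jk."""
    n = len(d)
    # Rule 1: zero distance from an item to itself.
    if any(abs(d[i][i]) > tol for i in range(n)):
        return False
    # Rule 2: symmetry.
    if any(abs(d[i][j] - d[j][i]) > tol for i, j in product(range(n), repeat=2)):
        return False
    # Rule 3: triangle inequality for every triple (i, j, k).
    for i, j, k in product(range(n), repeat=3):
        if d[i][j] + d[i][k] < d[j][k] - tol:
            return False
    return True
```

For example, `[[0, 1, 2], [1, 0, 1], [2, 1, 0]]` passes, while replacing the 2s with 5s breaks the triangle inequality.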

17 Our Algorithm
Determine the unique gene pool from all the organisms that meet the threshold for a particular gene in the pathway.
[Flowchart: candidate genes g1 to g4 (e.g. Wg-Drosophila, Celegans_17531491) are tested in turn ("Is g1 unique?" Yes; "Is g2 unique?" No) before being added to the gene pool]
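One way to read the uniqueness check is as de-duplication of the hits that passed the threshold. The sketch below assumes that two genes count as the same when their identifiers match; the slides do not say how uniqueness was actually decided, and `build_gene_pool` is a hypothetical name.

```python
from typing import Dict, List


def build_gene_pool(hits_by_organism: Dict[str, List[str]]) -> List[str]:
    """Merge per-organism hit lists into one pool with no repeated gene."""
    pool: List[str] = []
    seen = set()
    for genes in hits_by_organism.values():
        for gene in genes:
            if gene not in seen:  # "Is this gene unique?" -> yes, add it
                seen.add(gene)
                pool.append(gene)
    return pool
```

A pool built this way keeps first-seen order, so the position of each gene can later serve as a coordinate index.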

18 Gene Vectors
Gene pool of the entire Wnt pathway:

             g1  g2  g3  ...  gn
Drosophila    1   0   0  ...   1
Mouse         0   1   1  ...   0
Zebrafish     0   0   0  ...   1
...

Homolog of gn found in Zebrafish. No homolog of gn found in Mouse.
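The presence/absence encoding can be written in a few lines. This sketch assumes a 1 means a homolog of the pooled gene was found in the organism and a 0 means it was not; `gene_vector` is an illustrative name.

```python
from typing import Iterable, List


def gene_vector(pool: List[str], homologs_found: Iterable[str]) -> List[int]:
    """One coordinate per pooled gene: 1 if the organism has a homolog, else 0."""
    found = set(homologs_found)
    return [1 if gene in found else 0 for gene in pool]
```

For a pool `["g1", "g2", "g3", "g4"]`, an organism with homologs of g1 and g4 gets the vector `[1, 0, 0, 1]`.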

19 Euclidean Distance
- Vectors are in N-dimensional space
- Determine the Euclidean distance by taking the square root of the sum of the squared differences:

$$D_{ij} = \sqrt{(D_{i1} - D_{j1})^2 + \dots + (D_{in} - D_{jn})^2} = \sqrt{(1-0)^2 + (1-1)^2 + (0-1)^2 + \dots}$$
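As a small worked illustration of the formula, the function below computes the distance between two gene vectors of equal length. The name `euclidean_distance` is an assumption for this sketch.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Using only the three terms written out on the slide, `euclidean_distance([1, 1, 0], [0, 1, 1])` gives the square root of 2.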

20 Distance Matrix

        O1      O2      O3     ...  On
O1      0
O2      D_21    0
O3      D_31    D_32    0
...
On      D_n1    ...                 0

Since Euclidean distances commute ($D_{ij} = D_{ji}$), the matrix is triangular.
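Because the distances are symmetric, only the lower triangle needs to be stored. A minimal sketch, assuming the gene vectors are held in a dictionary keyed by organism name; `lower_triangle` and the toy vectors in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def lower_triangle(
    vectors: Dict[str, Sequence[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Return organism names and rows where rows[i][j] = D_ij for j <= i."""
    names = list(vectors)
    rows = [
        [euclidean(vectors[names[i]], vectors[names[j]]) for j in range(i + 1)]
        for i in range(len(names))
    ]
    return names, rows
```

For three toy organisms `{"O1": [1, 0], "O2": [0, 0], "O3": [1, 1]}`, the rows have lengths 1, 2 and 3, with zeros on the diagonal.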

21 Step 3: Hierarchical Clustering

22 Hierarchical Clustering
There are two types of clustering:
- Successive fusions (agglomerative clustering)
- Separation (divisive clustering)

23 Hierarchical Clustering
In this project, an agglomerative clustering algorithm has been employed.
Idea: the most similar objects are grouped first. These groups are then merged according to their similarities, until all are fused into one single cluster.

24 Hierarchical Clustering
Input of the clustering program: any N x N triangular matrix containing the pairwise distances between the organisms, D = {d_jk}.

25 Hierarchical Clustering
Feed the N x N matrix in as the input, and the clustering method will output an (N-1) x (N-1) matrix. In this case, it will be a 4 x 4 matrix.

26 Hierarchical Clustering
New distances are determined between the new group and each of the remaining organisms.

27 Hierarchical Clustering
Continue with the clustering until all the organisms are fused into one cluster:

$$d_{(135)2} = \min\{d_{(35)2}, d_{12}\} = \min\{7, 9\} = 7$$
$$d_{(135)4} = \min\{d_{(35)4}, d_{14}\} = \min\{8, 6\} = 6$$
$$d_{(135)(24)} = \min\{d_{(135)2}, d_{(135)4}\} = \min\{7, 6\} = 6$$
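The minimum rule used in these updates is what is usually called single-linkage agglomerative clustering. The sketch below is one straightforward way to implement it, not the project's code. The 5 x 5 distance table is an assumption: the slides quote only $d_{12} = 9$ and $d_{14} = 6$ directly, and the other entries were chosen to be consistent with the group distances quoted above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = Tuple[int, ...]

# Illustrative pairwise distances (assumed; see the note above).
PAIRS = {
    (1, 2): 9, (1, 3): 3, (1, 4): 6, (1, 5): 11,
    (2, 3): 7, (2, 4): 5, (2, 5): 10,
    (3, 4): 9, (3, 5): 2,
    (4, 5): 8,
}
D: Dict[FrozenSet[int], float] = {frozenset(k): v for k, v in PAIRS.items()}


def single_linkage(labels: List[int], d: Dict[FrozenSet[int], float]
                   ) -> List[Tuple[Cluster, Cluster, float]]:
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters: List[Cluster] = [(x,) for x in labels]
    merges = []

    def cluster_distance(c1: Cluster, c2: Cluster) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(d[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges
```

With the assumed table, `single_linkage([1, 2, 3, 4, 5], D)` performs N-1 = 4 merges and ends by joining (1, 3, 5) with (2, 4) at distance 6, matching the last line of the slide.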

28 Hierarchical Clustering
The outputs of each run:
- The names of the organisms that are grouped together
- The distance between the two organisms
After N-1 iterations, the outputs are saved to a file and used to draw the phylogenetic tree.

29 Step 4: Phylogenetic Tree Construction

30 Tree Construction Sample Input
Flat file of clusters and distances, e.g. sample1.txt:
A B 4.5
B C 5.2
E D 5.8
C E 12.4
Or, e.g. sample2.txt:
A B 4.5
A B C 5.2
E D 5.8
A B C E D 12.4

31 Tree Construction Sample Input (continued)
Requirements for the input file:
- Each line must represent one cluster
- The first entries are the leaves in the cluster
- The last entry is the distance
- No more than two new leaves can be added in a cluster
- Each entry must be delimited by a tab
Flexibility:
- The file can list all leaves in the cluster, or a new leaf and any leaf from previous clusters
- The subroutine can be reused to generate a tree from any file by modifying one line of code
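The file requirements translate into a short reader. This is a sketch under the stated rules only (tab-delimited entries, distance last, at most two new leaves per line); `parse_clusters` is a hypothetical name and the project's own reader may differ.

```python
from typing import List, Tuple


def parse_clusters(text: str) -> List[Tuple[List[str], float]]:
    """Read tab-delimited cluster lines into (leaves, distance) pairs."""
    clusters: List[Tuple[List[str], float]] = []
    seen = set()
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        *leaves, distance_text = line.split("\t")
        new_leaves = [leaf for leaf in leaves if leaf not in seen]
        if len(new_leaves) > 2:
            raise ValueError(f"line {line_number}: more than two new leaves")
        seen.update(new_leaves)
        clusters.append((leaves, float(distance_text)))
    return clusters


SAMPLE2 = "A\tB\t4.5\nA\tB\tC\t5.2\nE\tD\t5.8\nA\tB\tC\tE\tD\t12.4\n"
```

Both input styles described above go through the same reader, since a line may either repeat earlier leaves or name just one of them alongside the new leaf.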

32 Tree Construction Method
Subclass intree
- Read in the file line by line
- Add new leaves to Vector leaves
- Array elts tracks the number of leaves added to the vector in each cluster
- Array d tracks the distances between elements in the cluster
Subclass treed
- Convert distances to pixels
- Draw the tree and leaves in a JFrame
- Draw the distance scale in the JFrame
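The distance-to-pixel step can be illustrated with a simple linear scale. The slides do not say how the conversion was done, so the approach here (largest distance mapped to the full drawing width) and the names `to_pixels` and `width_px` are assumptions.

```python
from typing import List, Sequence


def to_pixels(distances: Sequence[float], width_px: int) -> List[int]:
    """Scale distances linearly so the largest one spans width_px pixels."""
    largest = max(distances)
    if largest <= 0:
        raise ValueError("at least one distance must be positive")
    return [round(d / largest * width_px) for d in distances]
```

With the sample distances 4.5, 5.2, 5.8 and 12.4 and a 400-pixel-wide drawing area, the final cluster lands at pixel 400 and the first at pixel 145.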

33 Tree Construction Sample Output
[Figure: tree with leaves A, B, C, D, E drawn against a distance scale from 1 to 8]
