Practical on phylogenetic trees based on sequence alignments Kyrylo Bessonov November 26th, 2013.


1 Practical on phylogenetic trees based on sequence alignments Kyrylo Bessonov November 26th, 2013

2 Talk plan
How to build phylogenetic trees of two types:
– Unrooted
– Rooted
Context: comparison of viral proteins of dengue virus
Examples on phylogenetic tree building:
– Dengue virus

3 Building a phylo tree using ape
ape (Analyses of Phylogenetics and Evolution) provides:
– Functions to create and manipulate phylogenetic trees
– Graphical exploration of phylogenetic data
To build a phylogenetic tree:
– Download protein sequences from a database
– Align the sequences
– Calculate pairwise distances between the aligned sequences (the code in this practical uses dist.alignment() for this step)
– Build and visualize a phylogenetic tree using ape
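The last two steps of the workflow above can be tried without any database access. The following is a minimal sketch, assuming the ape package is installed; the four sequence names and all distance values are hypothetical and stand in for the output of the alignment and distance steps.

```r
library(ape)  # assumes ape is installed

# Hypothetical pairwise distances between four toy sequences A-D
m <- matrix(c(0.0, 0.2, 0.6, 0.7,
              0.2, 0.0, 0.6, 0.7,
              0.6, 0.6, 0.0, 0.3,
              0.7, 0.7, 0.3, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
toy_dist <- as.dist(m)

# Neighbor-joining tree from the distance matrix, drawn unrooted
toy_tree <- nj(toy_dist)
plot(toy_tree, type = "unrooted")
```

With real data, toy_dist would be replaced by the distance object computed from the aligned protein sequences.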

4 Building an unrooted phylogenetic tree (1)
# install required libraries
install.packages("seqinr")
install.packages("muscle")
install.packages("ape")
library("seqinr")
library("muscle")
library("ape")

multipleSeqAlignment <- function(seqnames, seqs) {
  # umax is an object of class fasta from the muscle package, used here as a template
  fasta_seqs_Object <- umax
  tmp <- data.frame(V1 = rep(0, length(seqs)), V2 = rep(0, length(seqs)))
  for (i in seq_along(seqs)) {
    tmp[i, 1] <- seqnames[i]
    tmp[i, 2] <- paste(seqs[[i]], collapse = "")
  }
  fasta_seqs_Object$seqs <- tmp
  # multiple sequence alignment
  # remove the conflicting ape library from memory
  try(detach("package:ape"), silent = TRUE)
  alignment <- muscle(seqs = fasta_seqs_Object, out = NULL)
  alignment_ape <- ape::as.alignment(matrix(alignment$seqs[, 2]))
  alignment_ape$nam <- alignment$seqs[, 1]
  return(alignment_ape)
}

5 Building an unrooted phylogenetic tree (2)
# main part of the code
choosebank("swissprot")  # selects database for query
seqnames <- c("P06747", "P0C569", "O56773", "Q5VKP1")
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
# multipleSeqAlignment() is defined on the previous slide
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
# re-attach ape, which multipleSeqAlignment() detached
library("ape")
# nj() performs the neighbor-joining tree estimation of Saitou and Nei
mytree <- nj(mydist)
# the new labels must be listed in the same order as the existing mytree$tip.label
mytree$tip.label <- c("Q5VKP1-\nWestern Caucasian bat virus\nphosphoprotein",
                      "P06747-\nrabies virus\nphosphoprotein",
                      "P0C569-\nMokola virus\nphosphoprotein",
                      "O56773-\nLagos bat virus\nphosphoprotein")
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 0.8, no.margin = TRUE, srt = 50)

6 Unrooted Phylogenetic Tree
Phylogenetic tree showing the distances between four viral protein sequences. The genetic distance between O56773 and P0C569 is the smallest.

7 Unrooted phylogenetic tree (1) The lengths of the branches in the plot of the tree are proportional to the amount of evolutionary change (estimated by number of mutations) along the tree branches This is an unrooted phylogenetic tree as it does not contain an outgroup sequence, that is a sequence of a protein that is known to be more distantly related to the other proteins in the tree than they are to each other.

8 Unrooted phylogenetic tree(2) As a result, we cannot tell which direction evolutionary time ran in along the internal branches of the tree. For example, we cannot tell whether the node representing the common ancestor of (O56773, P0C569) was an ancestor of the node representing the common ancestor of (Q5VKP1, P06747), or the other way around.
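Whether a tree object is rooted, and what its branch lengths are, can be checked directly in ape. The sketch below assumes ape is installed and uses a hypothetical four-tip tree written in Newick format rather than the virus data from the slides.

```r
library(ape)  # assumes ape is installed

# Hypothetical unrooted tree: the basal node has three descendants
toy_tree <- read.tree(text = "((A:0.1,B:0.1):0.25,C:0.15,D:0.2);")

# Reports whether ape treats the tree as rooted
rooted_flag <- is.rooted(toy_tree)

# Branch lengths, one per edge of the tree
toy_tree$edge.length

# Tip-to-tip distances measured along the branches
path_lengths <- cophenetic(toy_tree)
path_lengths["A", "B"]
```

For a tree like this one, is.rooted() returns FALSE, which matches the point above: the branch lengths are known, but the direction of time along the internal branch is not.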

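9 Distance matrix
Inspecting the calculated distance matrix between the aligned sequences confirms the results seen in the phylogenetic tree.
The closest pair is the O56773 and P0C569 proteins.
[Distance matrix table: columns Q5VKP1, P06747, P0C569; rows P06747, P0C569, O56773; numeric values not preserved in the transcript]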
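Finding the closest pair in a distance object can also be done programmatically. The following base R sketch uses made-up distance values (the real ones are not preserved here), chosen only so that O56773 and P0C569 come out as the closest pair, as stated on the slide.

```r
# Hypothetical distances; real values would come from the distance step
ids <- c("Q5VKP1", "P06747", "P0C569", "O56773")
m <- matrix(0, 4, 4, dimnames = list(ids, ids))
m[lower.tri(m)] <- c(0.50, 0.55, 0.52, 0.45, 0.47, 0.20)
m <- m + t(m)
toy_dist <- as.dist(m)

# Convert to a full matrix and keep each pair only once
dm <- as.matrix(toy_dist)
dm[upper.tri(dm, diag = TRUE)] <- NA

# Locate the smallest remaining entry and report its row and column names
idx <- arrayInd(which.min(dm), dim(dm))
closest <- c(rownames(dm)[idx[1, 1]], colnames(dm)[idx[1, 2]])
closest
```

The same lines should work on any object of class dist, since as.matrix() keeps the sequence labels as row and column names.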

10 Rooted phylogenetic tree
In order to convert the unrooted tree into a rooted tree, we need to add an outgroup sequence
– Outgroup: a taxon outside the group of interest, which will branch off at the base of the phylogeny
– Examples: Caenorhabditis elegans (UniProt accession Q10572) and Caenorhabditis remanei (UniProt E3M2K8)
If we were to build a phylogenetic tree of the Fox-1 homologues in vertebrates, the distantly related sequence from worms would probably be a good choice of outgroup, since the protein is from a different taxon/group (worms)
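The effect of rooting on an outgroup can be illustrated on a small hypothetical tree before working with real sequences. This sketch assumes ape is installed; the tip names and branch lengths are invented for illustration.

```r
library(ape)  # assumes ape is installed

# Hypothetical unrooted tree with one distant tip ("worm")
unrooted <- read.tree(text = "((human:0.1,monkey:0.1):0.2,rat:0.3,worm:0.9);")

# Root on the outgroup; resolve.root = TRUE makes the root node bifurcating
rooted <- root(unrooted, outgroup = "worm", resolve.root = TRUE)

is.rooted(unrooted)
is.rooted(rooted)
plot(rooted)
```

After rooting, the outgroup branches off at the base of the tree and the remaining tips form a single clade.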

11 Building a rooted phylogenetic tree (1)
# BUILDING A ROOTED TREE OF PROTEIN SEQUENCES (FOX1)
# Q9NWB1 - Human
# Q17QD3 - Cow
# Q95KI0 - Monkey
# A1A5R1 - Rat
# Q10572 - Worm C. elegans (Root)
# E1G4K8 - Eye worm
seqnames <- c("Q9NWB1", "Q17QD3", "Q95KI0", "A1A5R1", "Q10572", "E1G4K8")
choosebank("swissprot")  # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)

12 Building a rooted phylogenetic tree (2)
library("ape")
mytree <- nj(mydist)
# the new labels must be listed in the same order as the existing mytree$tip.label
mytree$tip.label <- c("E1G4K8-Eye worm", "Q10572-C.elegans(Root)", "A1A5R1-Rat",
                      "Q9NWB1-Human", "Q17QD3-Cow", "Q95KI0-Monkey")
myrootedtree <- root(mytree, outgroup = "Q10572-C.elegans(Root)", resolve.root = TRUE)
# Printing a rooted tree gives a summary of this form
# (the tip labels in this printout come from a different set of accessions than seqnames):
# Phylogenetic tree with 6 tips and 5 internal nodes.
# Tip labels:
# [1] "E1G4K8" "Q8WS01" "Q9VT99" "A8NSK3" "Q10572" "E3M2K8"
# Rooted; includes branch lengths.
plot.phylo(myrootedtree, edge.color = "blue", edge.width = 3, type = "p")

13 Rooted tree of FOX1 proteins
– The invertebrates are grouped together
– Worms form a distinct group, yet with a large genetic distance
– Human FOX1 is closest to the monkey and cow sequences
[Figure: rooted tree; the worm sequences are labelled as the outgroup]

14 Distance matrix
[Distance matrix table: columns E1G4K8, Q10572, A1A5R1, Q9NWB1, Q17QD3; rows Q10572, A1A5R1, Q9NWB1, Q17QD3, Q95KI0; numeric values not preserved in the transcript]
As expected, eye worms are the most distantly related species to vertebrates
Cow and monkey have the closest relationship and the lowest genetic distance
Table legend:
Q9NWB1 – Human
Q95KI0 – Monkey
Q10572 – Worm C. elegans (Root)
Q17QD3 – Cow
A1A5R1 – Rat
E1G4K8 – Eye worm

15 Rooted tree
– Time runs from left to right
– Monkey, Cow and Human have common ancestor 3
– Ancestor 1 is common to ancestors 2 and 3
[Figure: rooted tree with a TIME axis running left to right]

16 Exercises on phylogenetic tree building
Q1. Calculate the genetic distances between the following NS1 proteins from different Dengue virus strains: Dengue virus 1 NS1 protein (UniProt: Q9YRR4), Dengue virus 2 NS1 protein (UniProt: Q9YP96), Dengue virus 3 NS1 protein (UniProt: B0LSS3), and Dengue virus 4 NS1 protein (UniProt: Q6TFL5). Which viruses are the most closely related, and which are the least closely related, based on the genetic distances?
Note: Dengue virus causes Dengue fever, which is classified by the WHO as a neglected tropical disease. There are four main types of Dengue virus: Dengue virus 1, Dengue virus 2, Dengue virus 3, and Dengue virus 4.
Q2. Build an unrooted phylogenetic tree of the NS1 proteins from Dengue virus 1, Dengue virus 2, Dengue virus 3 and Dengue virus 4, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?

17 Exercises on phylogenetic tree building
Q3. The Zika virus is related to Dengue viruses, but is not a Dengue virus, and so can be used as an outgroup in phylogenetic trees of Dengue virus sequences. UniProt accession Q32ZE1 consists of a sequence with similarity to the Dengue NS1 protein, so seems to be a related protein from Zika virus. Build a rooted phylogenetic tree of the Dengue NS1 proteins based on an alignment, using the Zika virus protein as the outgroup. Which are the most closely related Dengue virus proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?

18 Answers
Question 1:
Summary of viral proteins and UniProt accession numbers:
Q9YRR4 – Dengue virus 1 NS1 protein
Q9YP96 – Dengue virus 2 NS1 protein
B0LSS3 – Dengue virus 3 NS1 protein
Q6TFL5 – Dengue virus 4 NS1 protein
seqnames <- c("Q9YRR4", "Q9YP96", "B0LSS3", "Q6TFL5")
choosebank("swissprot")  # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
mydist

19 Answers
Q1. The distance matrix is as follows
[Distance matrix table: columns Q6TFL5, Q9YRR4, Q9YP96; rows Q9YRR4, Q9YP96, B0LSS3; numeric values not preserved in the transcript]
The most distant are Q9YP96 (V2) and Q6TFL5 (V4), with a genetic distance of 0.33, while the most closely related are Q9YP96 (V2) and B0LSS3 (V3), with a genetic distance of 0.227
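To make the meaning of such distance values more concrete, here is a base R sketch of one common definition: the square root of the fraction of mismatching aligned positions. This is an assumption for illustration, based on how the seqinr documentation describes its identity-based option; the calls in this practical use the default settings of dist.alignment(), which may score residues differently. The three short aligned sequences are hypothetical.

```r
# Hypothetical aligned sequences of equal length; "-" would mark a gap
aln <- c(s1 = "DMGYWIESEK", s2 = "DMGYWIESAK", s3 = "DMGCWVESAR")

# Distance = sqrt(1 - fraction of identical positions), ignoring gapped columns
identity_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  keep <- x != "-" & y != "-"
  sqrt(1 - mean(x[keep] == y[keep]))
}

# Fill a symmetric matrix with all pairwise distances
n <- length(aln)
dm <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dm[i, j] <- identity_distance(aln[[i]], aln[[j]])
  }
}
toy_dist <- as.dist(dm)
toy_dist
```

Under this definition, s1 and s2 differ at one of ten positions, so their distance is sqrt(0.1); smaller values mean more similar sequences.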

20 Answers
Question 2:
library("ape")
mytree <- nj(mydist)
# plotting unrooted tree based on the alignment of whole protein sequences
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)
# trim each sequence to the region between the DMGY and GEDG motifs to reduce gaps
seqs_trim <- seqs
for (i in seq_along(seqs)) {
  s <- paste(seqs_trim[[i]], collapse = "")
  start_pos <- regexpr("DMGY", s)[1]
  stop_pos <- regexpr("GEDG", s)[1]
  seqs_trim[[i]] <- seqs_trim[[i]][start_pos:stop_pos]
}

21 Answers
Question 2 (continued):
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim)
mydist <- dist.alignment(alignment_ape)
mydist
library("ape")
mytree <- nj(mydist)
# tree based on the best aligned portion
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)

22 The resulting Q2 unrooted tree
This unrooted tree agrees with the genetic distance matrix calculated in Q1. The tree suggests that B0LSS3 and Q9YP96 are the most closely related proteins.
To improve the quality of the tree, it is best to select a region that has a minimal number of gaps between the protein sequences.
Below you can see that there are regions with lots of gaps. Let's build another tree based on the bolded (most conserved) region to see if it is the same.
Alignment of proteins:
Q6TFL5 DMGCVVSWNGKELKC…KDQKAVHADMGYWIESSKNQTWQIEKASLIEVKTCLWPKTHTL…GMEIRPLSEKEENMVKSQVTA
Q9YRR4 DMGYWIESEKNETWKLARASFIEVKTCIWPKSHTL…GMEI
Q9YP96 DSGCVVSWKNKELKC…KDNRAVHADMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTL…GMEIRPLKEKEENLVNSLVTA
B0LSS3 ASHADMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTL…
[Figure: unrooted tree built using the full lengths of the proteins]
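Selecting a conserved region by motif can be wrapped in a small helper that also handles sequences where a motif is missing. The sketch below is base R only; the function name trim_between and the toy sequence are hypothetical. It mirrors the answer code in keeping residues from the first residue of the start motif up to the first residue of the end motif.

```r
# Hypothetical helper: keep the residues between two motifs
trim_between <- function(residues, start_motif, end_motif) {
  s <- paste(residues, collapse = "")
  start_pos <- regexpr(start_motif, s, fixed = TRUE)[1]
  end_pos <- regexpr(end_motif, s, fixed = TRUE)[1]
  # regexpr() returns -1 when a motif is absent; leave such sequences untouched
  if (start_pos == -1 || end_pos == -1 || end_pos < start_pos) {
    return(residues)
  }
  residues[start_pos:end_pos]
}

# Toy sequence stored as a character vector, one residue per element
toy <- strsplit("AAHADMGYWIESKKGEDGCWY", "")[[1]]
trimmed <- paste(trim_between(toy, "DMGY", "GEDG"), collapse = "")
trimmed

# A sequence without the end motif is returned unchanged
unchanged <- paste(trim_between(toy, "DMGY", "XXXX"), collapse = "")
```

Without such a guard, a missing motif would produce an index range starting or ending at -1, which either fails or silently drops residues.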

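23 Answers
The resulting tree looks the same, but we achieved overall better resolution between the proteins
Whole protein sequences used:
[Distance matrix table: columns Q6TFL5, Q9YRR4, Q9YP96; rows Q9YRR4, Q9YP96, B0LSS3; numeric values not preserved in the transcript]
Best aligned portion of protein sequences used (built using the bolded region):
[Distance matrix table: columns Q6TFL5, Q9YRR4, Q9YP96; rows Q9YRR4, Q9YP96, B0LSS3; numeric values not preserved in the transcript]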
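Whether two trees "look the same" can be checked numerically rather than by eye. The following sketch assumes ape is installed and uses its dist.topo() function on hypothetical four-tip trees; the tip names and branch lengths are invented and do not come from the Dengue data.

```r
library(ape)  # assumes ape is installed

# Two hypothetical trees with the same branching pattern but different branch lengths
tree_full <- read.tree(text = "((A:0.10,B:0.12):0.05,C:0.20,D:0.25);")
tree_trim <- read.tree(text = "((A:0.08,B:0.09):0.11,C:0.15,D:0.18);")

# A third hypothetical tree that groups A with C instead of B
tree_other <- read.tree(text = "((A:0.10,C:0.10):0.05,B:0.20,D:0.25);")

# Topological distance: zero means the branching patterns agree
same_topology <- as.numeric(dist.topo(tree_full, tree_trim))
diff_topology <- as.numeric(dist.topo(tree_full, tree_other))
same_topology
diff_topology
```

A value of zero for the first comparison indicates identical topology; any difference in resolution then shows up only in the branch lengths.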

24 Answers
Question 3:
# Q3: building a rooted tree with Q89277 (yellow fever virus) as the outgroup
library("seqinr")
library("muscle")
library("ape")
seqnames <- c("Q9YRR4", "Q9YP96", "B0LSS3", "Q6TFL5", "Q89277")
choosebank("swissprot")  # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
mydist
library("ape")
mytree <- nj(mydist)
myrootedtree <- root(mytree, outgroup = "Q89277", resolve.root = TRUE)
plot.phylo(myrootedtree, type = "p", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)

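25 Answers
The rooted tree for Q3 is built here using the yellow fever virus protein (Q89277) as the outgroup; the question as posed suggests the Zika virus protein (Q32ZE1) for this role
Most closely related viruses:
– B0LSS3 and Q9YP96
This rooted tree tells you which of the Dengue virus NS1 proteins branched off the earliest from the ancestors. An unrooted tree does not provide ancestry information (i.e. time sequence)
[Distance matrix table: columns Q89277, Q6TFL5, Q9YRR4, Q9YP96; rows Q6TFL5, Q9YRR4, Q9YP96, B0LSS3; numeric values not preserved in the transcript]
[Figure: rooted tree with Q89277 labelled as the outgroup]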

26 References
Ape library for phylogenetic trees and ancestry with bootstrap methods: project.org/web/packages/ape/ape.pdf

