Multiple sequence alignment


1 Multiple sequence alignment
Tutorial 5 Multiple sequence alignment

2 Multiple Sequence Alignment – When?
More than two sequences (DNA or protein)
Evolutionary relation: homology → phylogenetic tree
Detect motifs
Unaligned sequences:
GTCGTAGTCGGCTCGAC
GTCTAGCGAGCGTGAT
GCGAAGAGGCGAGC
GCCGTCGCGTCGTAAC
Aligned sequences:
GTCGTAGTCG-GC-TCGAC
GTC-TAG-CGAGCGT-GAT
GC-GAAG-AG-GCG-AG-C
GCCGTCG-CG-TCGTA-AC
[Figure: tree with leaves A, D, B, C]

3 Multiple Sequence Alignment – How?
Dynamic programming: gives the optimal alignment, but is exponential in the number of sequences.
Progressive: an efficient heuristic.
[Figure: the four example sequences before and after alignment, with a tree over leaves A, D, B, C]

4 Hierarchical Clustering
A way to represent similarities graphically. Sums up a pairwise distance matrix as a dendrogram. Not all matrices can be embedded in a tree without error.
TGTTAAC
TGT-AAC
TGT--AC
ATGT--C
ATGTGGC

5 ClustalW
1. Pairwise alignment: calculate the distance matrix.
2. Build the guide tree.
3. Progressive alignment using the guide tree.
“CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice”, J. D. Thompson et al.

6 ClustalW: Progressive (incremental)
At each step, align two existing alignments or sequences. Gaps present in older alignments remain fixed. The guide tree is built with the Neighbor Joining algorithm.

7 Neighbor Joining Algorithm
An agglomerative hierarchical clustering method. Constructs an unrooted tree.

8 Neighbor Joining (Not assuming equal divergence)
Step-by-step summary:
1. Calculate all pairwise distances.
2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
3. Define a new node (x).
4. Calculate Dix and Djx, the distances of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
5. Continue until two nodes remain, then connect them with an edge.

9 Step 1. Calculate all pairwise distances.
     A    B    C    D
B   22
C   39   41
D   39   41   18
E   41   43   20   10

10 Measuring Distance Problem: unrelated sequences approach a fraction of difference expected by chance  The distance measure converges. Jukes-Cantor

11 Measuring Distance (cont)
Euclidean distance: given a multiple sequence alignment, calculate the square root of the sum of the scores at every position between two sequences. The score at a position increases in proportion to the dissimilarity between the two residues.
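The slide does not say which per-position score is used. A minimal sketch of the calculation as described, assuming a hypothetical 0/1 score (0 for identical symbols, 1 otherwise, gaps included); any score that grows with residue dissimilarity could be passed in instead:

```python
import math
from typing import Callable


def mismatch_score(a: str, b: str) -> float:
    """Hypothetical per-position score: 0 for identical symbols, 1 otherwise."""
    return 0.0 if a == b else 1.0


def alignment_distance(
    row1: str,
    row2: str,
    score: Callable[[str, str], float] = mismatch_score,
) -> float:
    """Square root of the summed per-position scores of two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    return math.sqrt(sum(score(a, b) for a, b in zip(row1, row2)))


if __name__ == "__main__":
    # Two rows of an alignment that differ at a single column.
    print(alignment_distance("TGTTAAC", "TGT-AAC"))
```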

12 Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
Mij = Dij - (ri + rj) / (N - 2)
Mij: relative distance between i and j.
Dij: distance between i and j from the distance table.
ri: distance of i from all other sequences.
N: number of leaves (= sequences) left in the tree.
Negative values: as the average distance from the common ancestor to the rest of the nodes increases, Mij has a lower value.
Select the pair that produces the lowest value. Reevaluate M with every iteration.

13 Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
[Table: the pairwise distance table from Step 1, shown again]

14 Step 2. Pick two nodes (i and j) for which the relative distance is minimal (lowest).
Mij table:
       A       B       C       D
B    -74
C    -47.3   -47.3
D    -44     -44     -57.3
E    -44     -44     -57.3   -64
A,B is the pair with the minimal Mij value. The Mij table is used only to choose the closest pair (lowest value), not for calculating the distances.
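The selection step can be checked numerically. The sketch below assumes the relative distance is Mij = Dij - (ri + rj) / (N - 2), with ri the sum of distances from i to every other node, and assumes the ten pairwise distances of the worked example; both reproduce the -74 reported for A,B. Function and variable names are hypothetical:

```python
from itertools import combinations

# Pairwise distances of the worked example (assumed; stored once per pair).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}


def dist(table, a, b):
    """Look up a symmetric distance stored once per pair."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def relative_distances(table, nodes):
    """Return {(i, j): Mij} for every unordered pair of nodes."""
    n = len(nodes)
    # r[i]: summed distance from node i to all other nodes.
    r = {i: sum(dist(table, i, k) for k in nodes if k != i) for i in nodes}
    return {
        (i, j): dist(table, i, j) - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(nodes, 2)
    }


nodes = ["A", "B", "C", "D", "E"]
M = relative_distances(D, nodes)
closest = min(M, key=M.get)  # pair with the lowest relative distance

if __name__ == "__main__":
    print(closest, M[closest])
```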

15 Step 3. Define a new node (x)
[Figure: tree in which the new node X joins A and B; C, D and E are not yet joined]

16 Step 4. Calculate Dix and Djx, the distance of the chosen nodes i and j to the new node x, as well as the distance from x to all other nodes.
Now we’ll calculate the distance from X to all other nodes.
     X    C    D
C   29
D   29   18
E   31   20   10
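The formulas for this step are not in the transcript. A minimal sketch, assuming the standard Neighbor Joining updates Dix = Dij/2 + (ri - rj) / (2(N - 2)), Djx = Dij - Dix and Dxk = (Dik + Djk - Dij) / 2, and assuming the pairwise distances of the worked example; joining A and B reproduces the 29 and 31 entries of the table. All names are hypothetical:

```python
# Pairwise distances of the worked example (assumed; stored once per pair).
D = {
    ("A", "B"): 22, ("A", "C"): 39, ("A", "D"): 39, ("A", "E"): 41,
    ("B", "C"): 41, ("B", "D"): 41, ("B", "E"): 43,
    ("C", "D"): 18, ("C", "E"): 20, ("D", "E"): 10,
}


def dist(table, a, b):
    """Look up a symmetric distance stored once per pair."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def join_pair(table, nodes, i, j, new):
    """Join i and j into node `new`; needs at least three nodes.

    Returns ((Dix, Djx), new_table), where new_table holds the distances
    among the remaining nodes plus the distances from `new` to each of them.
    """
    n = len(nodes)
    r_i = sum(dist(table, i, k) for k in nodes if k != i)
    r_j = sum(dist(table, j, k) for k in nodes if k != j)
    d_ij = dist(table, i, j)
    d_ix = d_ij / 2 + (r_i - r_j) / (2 * (n - 2))
    d_jx = d_ij - d_ix
    rest = [k for k in nodes if k not in (i, j)]
    # Distances among untouched nodes carry over unchanged.
    new_table = {
        (a, b): dist(table, a, b)
        for pos, a in enumerate(rest)
        for b in rest[pos + 1:]
    }
    for k in rest:
        new_table[(new, k)] = (dist(table, i, k) + dist(table, j, k) - d_ij) / 2
    return (d_ix, d_jx), new_table


branches, D2 = join_pair(D, ["A", "B", "C", "D", "E"], "A", "B", "X")

if __name__ == "__main__":
    print("A-X, B-X:", branches)
    print("X to C, D, E:", D2[("X", "C")], D2[("X", "D")], D2[("X", "E")])
```

Under these assumptions the branch lengths come out as 10 for A to X and 12 for B to X, which sum to the original A,B distance of 22.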

17 Step 5 - Continue until two nodes remain
New Mi,j table:
      X     C     D
C   -49
D   -44   -44
E   -44   -44   -49
[Figure: tree with nodes A, B, C, D, E, X, Y; the new node Y joins X and C]

18 New Di,j table
     Y    D
D    9
E   11   10
Only 2 nodes are left. Let’s calculate all the distances to Z.
[Figure: tree with nodes A, B, C, D, E, X, Y, Z]

19 The tree
Branch lengths: A to X 10, B to X 12, X to Y 20, C to Y 9, Y to Z 5, D to Z 4, E to Z 6.
And in Newick tree format: ((C,(D,E)),(A,B));
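Newick separates sibling subtrees with commas, wraps each group in parentheses, and ends the tree with a semicolon. A minimal sketch that serializes a topology held as nested tuples (a hypothetical representation chosen for illustration; branch lengths are omitted):

```python
def to_newick(tree) -> str:
    """Serialize a nested-tuple topology to a Newick string without branch lengths."""

    def fmt(node) -> str:
        if isinstance(node, str):  # leaf: just its label
            return node
        # internal node: comma-separated children inside parentheses
        return "(" + ",".join(fmt(child) for child in node) + ")"

    return fmt(tree) + ";"


# Topology of the worked example: (A,B) on one side, C with (D,E) on the other.
example_tree = (("C", ("D", "E")), ("A", "B"))

if __name__ == "__main__":
    print(to_newick(example_tree))
```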

20 ClustalW - Input
Input sequences
Scoring matrix
Gap scoring
Output format
Address

21 ClustalW - Output
Match strength in decreasing order: * : .

22 ClustalW - Output

23 ClustalW - Output

24 ClustalW - Output

25 ClustalW - Output
Pairwise alignment scores
Building tree
Building alignment
Final score

26 ClustalW - Output

27 ClustalW - Output
Sequence names
Sequence positions
Match strength in decreasing order: * : .

28 ClustalW - Output

29 ClustalW - Output
Branch length

30 ClustalW - Output

31 ClustalW - Output

