What you should know by now Concepts: Pairwise alignment Global, semi-global and local alignment Dynamic programming Sequence similarity (Sum-of-Pairs)


1 What you should know by now Concepts: Pairwise alignment Global, semi-global and local alignment Dynamic programming Sequence similarity (Sum-of-Pairs) Profiles (a basic understanding)

2 Biological Definitions for Related Sequences Homologues are similar sequences in two different organisms that have been derived from a common ancestor sequence. Homologues can be described as either orthologues or paralogues. Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution. Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event. Xenologues are similar sequences that do not share the same evolutionary origin, but rather have arisen out of horizontal transfer events through symbiosis, viruses, etc.

3 So this means … Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html

4 Multiple Sequence Alignment Sequences can be conserved across species and perform similar or identical functions. Alignments of such sequences hold information about which regions have high mutation rates over evolutionary time and which are evolutionarily conserved, and allow identification of regions or domains that are critical to functionality. Sequences can be mutated or rearranged to perform an altered function. Alignments then show which changes in the sequence have caused a change in the functionality. Multiple sequence alignment: the idea is to take three or more sequences and align them so that the greatest number of similar characters are aligned in the same column of the alignment.

5 What to ask yourself How do we get a multiple alignment? (three or more sequences) Which way is best? –Do we go for max accuracy, least computational time or the best compromise? What do we want to achieve each time?

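6 Multiple alignment profiles (Gribskov et al. 1987) [Figure: a profile with one column per alignment position i and one row per residue type (A, C, D, ..., W, Y) plus a gap row. Each column holds the frequencies fA, fC, fD, ..., fW, fY. Gap-open and gap-extension penalties (gapo, gapx) are position dependent and differ between core regions and gapped regions.]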

7 Profile building Example: each amino acid is represented as a frequency, penalties as weights. [Figure: a profile with rows A, C, D, ..., W, Y and a gap-penalty row; each column at position i holds example residue frequencies and a position-dependent gap penalty.]
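A minimal sketch of profile building, assuming a toy alignment that is not from the slides. The function name build_profile and the rule used for the position-dependent gap weights (one minus the gap frequency in the column) are illustrative assumptions, not the scheme of Gribskov et al.

```python
from collections import Counter


def build_profile(alignment):
    """Return one dict per alignment column mapping each symbol to its frequency."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for i in range(length):
        column = [row[i] for row in alignment]
        counts = Counter(column)
        profile.append({sym: n / len(column) for sym, n in counts.items()})
    return profile


# Toy alignment, made up for illustration.
alignment = ["ACDW", "ACEW", "A-DW", "SCDY"]
profile = build_profile(alignment)

# Assumed rule: columns that already contain gaps get a lower gap weight.
gap_weights = [1.0 - column.get("-", 0.0) for column in profile]

for i, (column, weight) in enumerate(zip(profile, gap_weights)):
    print(i, column, "gap weight:", weight)
```

Each column sums to 1 over the symbols it contains, so the profile can be read directly as the frequency table sketched on the slide.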

8 Profile-sequence alignment [Figure: a single sequence aligned against a profile with rows A, C, D, ..., V, W, Y.]
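Profile-sequence alignment can be sketched as ordinary global dynamic programming in which a residue is scored against a whole profile column instead of a single residue. The match/mismatch values, the linear gap penalty and the function names below are assumptions chosen for illustration; a real implementation would use a substitution matrix and the profile's own position-dependent gap penalties.

```python
def column_score(column, residue, match=1.0, mismatch=-1.0):
    """Frequency-weighted match/mismatch score of a residue against one profile column."""
    return sum(
        freq * (match if sym == residue else mismatch)
        for sym, freq in column.items()
        if sym != "-"
    )


def align_sequence_to_profile(profile, seq, gap=-2.0):
    """Global (Needleman-Wunsch) score of a sequence against a profile, linear gaps."""
    rows, cols = len(profile) + 1, len(seq) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(profile[i - 1], seq[j - 1]),
                score[i - 1][j] + gap,  # profile column aligned to a gap
                score[i][j - 1] + gap,  # sequence residue aligned to a gap
            )
    return score[-1][-1]


# Toy three-column profile, made up for illustration.
profile = [{"A": 1.0}, {"C": 0.5, "D": 0.5}, {"W": 1.0}]
print(align_sequence_to_profile(profile, "ACW"))
print(align_sequence_to_profile(profile, "AW"))
```

Profile-profile alignment follows the same recurrence, with column_score replaced by a score between two columns.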

9 Profile-profile alignment [Figure: a profile with rows A, C, D, ..., Y aligned against a second profile with rows A, C, D, ..., V, W, Y.]

10 Multiple alignment methods Multi-dimensional dynamic programming Progressive alignment Iterative alignment

11 Simultaneous multiple alignment Multi-dimensional dynamic programming The combinatorial explosion: 2 sequences of length n require n^2 comparisons. The number of comparisons increases exponentially with the number of sequences, i.e. n^N, where n is the length of the sequences and N is the number of sequences. This quickly becomes impractical, even for a small number of short sequences.
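The growth of n^N can be made concrete with a few lines of arithmetic. The helper names are hypothetical; the second helper uses the fact that each cell of an N-dimensional dynamic programming matrix has 2^N - 1 predecessor cells.

```python
def dp_cells(n, num_sequences):
    """Approximate number of cells in the N-dimensional matrix (n^N, borders ignored)."""
    return n ** num_sequences


def predecessors_per_cell(num_sequences):
    """Each cell is reached from 2^N - 1 neighbouring cells."""
    return 2 ** num_sequences - 1


for num_sequences in range(2, 7):
    cells = dp_cells(250, num_sequences)
    print(num_sequences, "sequences:", cells, "cells,",
          predecessors_per_cell(num_sequences), "predecessors each")
```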

12 Multi-dimensional dynamic programming (Murata et al., 1985) [Figure: dynamic programming matrix with axes Sequence 1, Sequence 2, Sequence 3.]

13 The MSA approach MSA (Lipman et al., 1989, PNAS 86, 4412): 1) Calculate all pairwise alignment scores (SP). 2) Use the scores to predict a tree. 3) Calculate pair weights based on the tree. 4) Produce a heuristic alignment based on the tree. 5) Calculate the maximum weight for each sequence pair. 6) Determine the spatial positions that must be calculated to obtain the optimal alignment. 7) Perform the optimal alignment. 8) Report the weight found compared to the maximum weight previously found. Drawbacks: extremely slow and memory intensive; max 8-9 sequences of ~250 residues.
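The sum-of-pairs (SP) score that MSA optimises can be sketched as follows. The match, mismatch and gap values, the toy alignment and the optional pair weights are assumptions for illustration only; MSA itself derives its pair weights from the tree.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of symbols; a gap against a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, weights=None):
    """Sum the (optionally weighted) pairwise scores over all pairs of rows."""
    total = 0
    for x, y in combinations(range(len(alignment)), 2):
        weight = 1 if weights is None else weights[(x, y)]
        total += weight * sum(
            pair_score(a, b) for a, b in zip(alignment[x], alignment[y])
        )
    return total


# Toy alignment, made up for illustration.
alignment = ["ACD", "A-D", "ACE"]
print(sum_of_pairs(alignment))
print(sum_of_pairs(alignment, weights={(0, 1): 1, (0, 2): 2, (1, 2): 0}))
```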

14 The DCA approach DCA (Stoye et al. 1997): 1) Iteratively split the sequences at optimal cut points. 2) Use MSA to align the resulting segments. 3) Concatenate the segment alignments.

15 So in effect … [Figure: Sequence 1, Sequence 2, Sequence 3.]

16 Multiple alignment methods Multi-dimensional dynamic programming Progressive alignment Iterative alignment

17 Progressive alignment 1) Perform pairwise alignments of all of the sequences 2) Use the alignment scores to produce a dendrogram using neighbour-joining methods 3) Align the sequences sequentially, guided by the relationships indicated by the tree. Methods: Biopat (first method ever), MULTAL (Taylor 1987), DIALIGN (1&2, Morgenstern 1996), PRRP (Gotoh 1996), Clustal (Thompson et al. 1994), Praline (Heringa 1999), T-Coffee (Notredame 2000), POA (Lee 2002)
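Step 1 can be sketched as an all-against-all table of global alignment scores. The scoring values and the toy sequences are assumptions; real tools use substitution matrices and affine gap penalties.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, keeping only two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """Symmetric matrix of pairwise global alignment scores."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            matrix[i][j] = matrix[j][i] = nw_score(seqs[i], seqs[j])
    return matrix


# Toy sequences, made up for illustration.
seqs = ["ACDW", "ACW", "SCDY"]
for row in similarity_matrix(seqs):
    print(row)
```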

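18 Progressive multiple alignment [Figure: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix; the scores are converted to distances, the distances give a guide tree, and the guide tree drives the multiple alignment. Iteration possibilities are indicated.]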
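The "scores to distances" and guide tree steps can be sketched as below. Two assumptions are made loudly here: similarities are taken to be fractions between 0 and 1 so that distance = 1 - similarity, and the tree is built with simple average-linkage (UPGMA-style) clustering rather than neighbour-joining, because it is shorter to show. Sequences are numbered from 0, not from 1 as on the slide.

```python
def scores_to_distances(sim):
    """Assumed conversion: distance = 1 - similarity, for similarities in [0, 1]."""
    return [[1.0 - s for s in row] for row in sim]


def guide_tree(dist):
    """Average-linkage clustering; returns the merge order as nested tuples."""
    # cluster id -> (subtree, list of member sequence indices)
    clusters = {i: (i, [i]) for i in range(len(dist))}
    next_id = len(dist)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = clusters[ids[x]], clusters[ids[y]]
                # mean distance between all members of the two clusters
                d = sum(dist[i][j] for i in a[1] for j in b[1]) / (len(a[1]) * len(b[1]))
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        _, p, q = best
        a, b = clusters.pop(p), clusters.pop(q)
        clusters[next_id] = ((a[0], b[0]), a[1] + b[1])
        next_id += 1
    return next(iter(clusters.values()))[0]


# Toy similarity matrix for four sequences, made up for illustration.
sim = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.4],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.4, 0.8, 1.0],
]
tree = guide_tree(scores_to_distances(sim))
print(tree)
```

The nested tuples give the order in which a progressive method would align sequences and profiles: innermost pairs first, the root last.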

19 General progressive multiple alignment technique (follow generated tree) [Figure: sequences 1-5 are merged step by step along the guide tree, from the closest pair up to the root; d marks the distance axis.]

20 Praline progressive strategy [Figure: stepwise merging of sequences 1-5; d marks the distance axis.]

21 There are problems: Accuracy is very important! Errors are propagated into the progressive steps. “Once a gap, always a gap” (Feng & Doolittle, 1987)

22 Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, 816-831, 1995)

