BNFO 602 Multiple sequence alignment Usman Roshan.


1 BNFO 602 Multiple sequence alignment Usman Roshan

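2 Optimal pairwise alignment
Sum of pairs (SP) optimization: find the alignment of two sequences that maximizes the similarity score for an arbitrary scoring matrix. The Needleman-Wunsch algorithm finds an optimal alignment in O(mn) time and space.
Recursion: $M(i,j) = \max\{\, M(i-1,j-1) + s(x_i, y_j),\; M(i-1,j) + g,\; M(i,j-1) + g \,\}$, with $M(0,0) = 0$, $M(i,0) = i\,g$, and $M(0,j) = j\,g$.
Traceback: for each cell, record which of the three cases attained the maximum, then follow those pointers from $(m,n)$ back to $(0,0)$.
Here $M(i,j)$ is the score of the optimal alignment of $x_{1..i}$ and $y_{1..j}$, $s(x_i, y_j)$ is the substitution score, and $g$ is the gap penalty (taken as a negative number here, since the score is maximized).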
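
A minimal Python sketch of the Needleman-Wunsch recursion and traceback named on this slide. The function name `needleman_wunsch`, the scoring values (match +1, mismatch -1, gap penalty g = -2), and the tie-breaking rule are illustrative assumptions, not values taken from the lecture.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, g=-2):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y).

    Scoring values are assumed for illustration; g is a negative gap penalty.
    """
    m, n = len(x), len(y)
    # M[i][j] = score of the best alignment of x[:i] and y[:j]
    M = [[0] * (n + 1) for _ in range(m + 1)]
    # B[i][j] records which case gave M[i][j]: "D"iagonal, "U"p, or "L"eft
    B = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        M[i][0], B[i][0] = i * g, "U"
    for j in range(1, n + 1):
        M[0][j], B[0][j] = j * g, "L"

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (M[i - 1][j - 1] + s, "D"),  # align x_i with y_j
                (M[i - 1][j] + g, "U"),      # x_i against a gap
                (M[i][j - 1] + g, "L"),      # y_j against a gap
            ]
            # On ties the first candidate wins, so the diagonal is preferred.
            M[i][j], B[i][j] = max(candidates, key=lambda c: c[0])

    # Traceback from (m, n) to (0, 0), building the alignment in reverse.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        move = B[i][j]
        if move == "D":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "U":
            ax.append(x[i - 1]); ay.append("_"); i -= 1
        else:
            ax.append("_"); ay.append(y[j - 1]); j -= 1
    return M[m][n], "".join(reversed(ax)), "".join(reversed(ay))


score, ax, ay = needleman_wunsch("ACT", "ACGT")
print(score, ax, ay)
```

Both tables have (m+1)(n+1) cells and each cell is filled in constant time, which is where the O(mn) time and space bound comes from.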

3 Multiple sequence alignment
“Two sequences whisper, multiple sequences shout out loud” (Arthur Lesk)
Computationally very hard: NP-hard

4 Multiple sequence alignment
Unaligned sequences:
  GGCTT
  TAGGCCTT
  TAGCCCTTA
  ACACTTC
  ACTT
Aligned sequences:
  _G__GCTT_
  TAGGCCTT_
  TAGCCCTTA
  A__CACTTC
  A__C_CTT_
Conserved regions help us to identify functionality

5 Sum of pairs score

6 What is the sum of pairs score of this alignment?
1. Computing the alignment with the optimal SP score is NP-hard, so we resort to heuristics.
2. Plenty of work has been done in this area; many standard heuristic approaches from computer science have been applied.
3. Popular programs are based on profiles.
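
The sum of pairs score can be sketched in Python as below: every pair of rows is scored column by column and the results are added up. The scoring values (match +1, mismatch -1, gap -2), the convention that a gap facing a gap scores 0, and the names `pair_score` and `sum_of_pairs` are assumptions for illustration; the `_` gap character follows the notation used in this transcript.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2, gap_char="_"):
    """Score one column for one pair of rows (assumed scoring values)."""
    if a == gap_char and b == gap_char:
        return 0  # assumption: a gap facing a gap contributes nothing
    if a == gap_char or b == gap_char:
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2, gap_char="_"):
    """Add up pair_score over every pair of rows and every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(a, b, match, mismatch, gap, gap_char)
        for row1, row2 in combinations(alignment, 2)
        for a, b in zip(row1, row2)
    )


print(sum_of_pairs(["AC_T", "ACGT", "A_GT"]))
```

With k rows and L columns this takes O(k² L) time, so evaluating a given alignment is cheap; it is finding the alignment that maximizes the score that is hard.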

7 Profile A profile can be described by a set of vectors of nucleotide/residue frequencies. For each position i of the alignment, we compute the normalized frequency of nucleotides A, C, G, and T
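
A short Python sketch of building such a profile from an alignment, with one frequency vector per column. The function name `profile` is hypothetical, and leaving gap characters out of the normalization is an assumption; the slide does not say how gaps are counted.

```python
def profile(alignment, alphabet="ACGT"):
    """Return one frequency vector (a dict keyed by nucleotide) per column."""
    columns = []
    for column in zip(*alignment):  # iterate over alignment columns
        counts = {c: column.count(c) for c in alphabet}
        # Assumption: gaps are ignored, so frequencies are normalized
        # over the nucleotides actually present in the column.
        total = sum(counts.values())
        columns.append(
            {c: (counts[c] / total if total else 0.0) for c in alphabet}
        )
    return columns


for position, vector in enumerate(profile(["AC", "AG", "A_"]), start=1):
    print(position, vector)
```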

8 Aligning a profile vector to a nucleotide
ClustalW/MUSCLE
– Let f be the profile vector
– $\mathrm{Score}(f, j) = \sum_{i \in \{A,C,G,T\}} f_i \, S(i,j)$
– where S(i,j) is the substitution scoring matrix
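
A minimal Python sketch of scoring a profile vector against a single nucleotide, assuming the score is the frequency-weighted sum of substitution scores, i.e. the sum over nucleotides i of f_i · S(i,j). The function name `profile_score` and the toy matrix `S` (+1 on the diagonal, -1 elsewhere) are hypothetical, not the matrices used by ClustalW or MUSCLE.

```python
ALPHABET = "ACGT"

# Toy substitution matrix, assumed for illustration only.
S = {a: {b: (1 if a == b else -1) for b in ALPHABET} for a in ALPHABET}


def profile_score(f, j, S):
    """Frequency-weighted substitution score of profile vector f against j."""
    return sum(f[i] * S[i][j] for i in f)


f = {"A": 0.75, "C": 0.25, "G": 0.0, "T": 0.0}
print(profile_score(f, "A", S))
print(profile_score(f, "G", S))
```

When f puts all its weight on one nucleotide, the score reduces to the ordinary substitution score S(i,j), so aligning a single sequence is a special case of aligning a profile.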

9 Iterative alignment (heuristic for sum-of-pairs)
– Pick a random sequence from input set S
– Do (n-1) pairwise alignments and align it to the closest sequence t in S
– Remove t from S and compute the profile of the alignment
– While sequences remain in S
  – Do |S| pairwise alignments and align to the closest one t
  – Remove t from S

10 Iterative alignment
– Once the alignment is computed, randomly divide it into two parts
– Compute the profile of each sub-alignment and realign the profiles
– If the sum-of-pairs score of the new alignment is better than the previous one, keep it; otherwise continue with a different division, until a specified iteration limit is reached

11 Progressive alignment
– Idea: perform profile alignments in the order dictated by a tree
– Given a guide tree, do a post-order traversal and align sequences in that order
– Widely used heuristic
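
The order of merges implied by a guide tree can be sketched in Python as follows. The tree encoding (nested 2-tuples with sequence names as leaves) and the function name `alignment_order` are assumptions made for this example; the sketch only lists which groups would be aligned at each step and does not perform the profile alignments themselves.

```python
def alignment_order(tree):
    """Return the list of merges in post-order: children before their parent.

    A leaf is a sequence name (str); an internal node is a (left, right) tuple.
    Each merge is a pair (left_group, right_group) of lists of sequence names.
    """
    merges = []

    def visit(node):
        if isinstance(node, str):  # leaf: a single sequence
            return [node]
        left, right = node
        left_group = visit(left)    # finish the left subtree first
        right_group = visit(right)  # then the right subtree
        merges.append((left_group, right_group))  # then align the two groups
        return left_group + right_group

    visit(tree)
    return merges


guide_tree = (("A", "B"), ("C", ("D", "E")))
for left_group, right_group in alignment_order(guide_tree):
    print(left_group, "+", right_group)
```

A tree with n leaves yields n-1 merges, so a progressive aligner performs n-1 pairwise (sequence or profile) alignments once the guide tree is fixed.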

12 Popular alignment programs
– ClustalW: most popular, progressive alignment
– MUSCLE: combines progressive and iterative alignment; uses the log-expectation score
– T-COFFEE: consistency-based alignment; aligns sequences in the multiple alignment so that they stay close to the optimal pairwise alignments
– PROBCONS: expected accuracy; progressive scheme based on probabilistic consistency
– MAFFT: alignment based on the Fast Fourier Transform

13 Evaluation of multiple sequence alignments
– Compare to benchmark “true” alignments
– Use simulation
– Measure conservation of an alignment
– Measure accuracy of phylogenetic trees
– How well does it align motifs?
– More…

14 Benchmarking alignment programs http://nar.oxfordjournals.org/content/38/15/4917.abstract

