Multiple sequence alignment (msa)


1 Multiple sequence alignment (msa)
Lecture 8 CS566

2 Motivation
“Two swallows do not make a summer”
- Discover conserved regions
- Predict important regions of the protein
- Discover domains
- Search for additional members of a protein family (profile-based searching)
- Build phylogenetic trees

3 Topics
- Scoring schemes
  - Pairwise
  - N-way
- Optimal algorithms
  - Multidimensional dynamic programming
- Heuristic algorithms
  - Progressive
  - Iterative

4 Scoring schemes
- Alignment score = $\sum_{l} C_l$, where $C_l$ is the score of column $l$
- Column score $C_l$
  - Ideally: based on n-way joint probability (n-generalized AAS)
  - Sum of pairs: $C_l = \sum_{i<j} s_{ij}$
    - Based on amino acid substitution matrices
    - Gap-gap = 0; gap-char = $-g$
    - Commonest scheme used
    - Fallacious: assumes only 2-way and not n-way joint probabilities
    - Score not proportional to number of sequences in alignment
  - N-way sums
    - Need to know central point of reference (ancestral sequence)
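
The sum-of-pairs scheme above can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function names and the toy match/mismatch values (standing in for an amino acid substitution matrix) are assumptions.

```python
from itertools import combinations

GAP = "-"

def pair_score(a, b, match=1, mismatch=-1, gap_penalty=2):
    """Score one pair of symbols; toy values stand in for a substitution matrix."""
    if a == GAP and b == GAP:
        return 0                  # gap-gap = 0
    if a == GAP or b == GAP:
        return -gap_penalty       # gap-char = -g
    return match if a == b else mismatch

def column_score(column, **scores):
    """C_l: sum of s_ij over all pairs i < j in one alignment column."""
    return sum(pair_score(a, b, **scores) for a, b in combinations(column, 2))

def alignment_score(rows, **scores):
    """Alignment score: sum of the column scores C_l over all columns l."""
    return sum(column_score(col, **scores) for col in zip(*rows))

if __name__ == "__main__":
    print(alignment_score(["AC-", "ACG", "A-G"]))
```

A column of n symbols contributes n(n-1)/2 pair terms, so the raw score grows roughly quadratically with the number of sequences rather than in proportion to it.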

5 Multidimensional Dynamic Programming
- Line up n sequences in a grid having n dimensions
- Score each cell as the maximum of
  - Lining up all corresponding characters, AND
  - All possible combinations of gaps and characters
- Note choice made
- Reconstruct alignment by traceback
- Global or local dynamic programming?
- Space complexity?
- Time complexity?
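
The recurrence above can be written generically for any number of sequences. The following is a sketch under stated assumptions: global alignment, sum-of-pairs column scoring with toy match/mismatch/gap values, and hypothetical function names. It is only practical for a handful of very short sequences.

```python
from itertools import combinations, product

def sp_column(col, match=1, mismatch=-1, gap=2):
    """Sum-of-pairs score of one column (toy scores, not a real matrix)."""
    total = 0
    for a, b in combinations(col, 2):
        if a == "-" and b == "-":
            continue                      # gap-gap = 0
        total += -gap if "-" in (a, b) else (match if a == b else mismatch)
    return total

def align_n(seqs):
    """Global alignment of n sequences by full n-dimensional DP."""
    n = len(seqs)
    origin = (0,) * n
    end = tuple(len(s) for s in seqs)
    # Each move advances a non-empty subset of the sequences by one character;
    # the sequences that do not advance receive a gap in that column.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]
    best, back = {origin: 0}, {}
    # Lexicographic order guarantees every predecessor cell is filled in first.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == origin:
            continue
        options = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = tuple(s[c - 1] if d else "-" for s, c, d in zip(seqs, cell, m))
            options.append((best[prev] + sp_column(col), prev, col))
        score, prev, col = max(options)
        best[cell] = score
        back[cell] = (prev, col)          # note the choice made
    # Traceback from the far corner to the origin.
    cell, cols = end, []
    while cell != origin:
        cell, col = back[cell]
        cols.append(col)
    rows = ["".join(r) for r in zip(*reversed(cols))]
    return best[end], rows

if __name__ == "__main__":
    print(align_n(["ACG", "AG", "ACG"]))
```

In this sketch the table holds one entry per grid cell, that is, the product of (length + 1) over all sequences, and each cell examines 2^n - 1 moves, which is why the exact method does not scale beyond a few sequences.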

6 MSA – Efficient Multidimensional Dynamic Programming
- Carrillo-Lipman MSA algorithm
  - Uses pairwise dynamic programming to identify sub-matrix regions of near-optimality
  - n-dimensional dynamic programming is carried out within the intersection of the near-optimal regions
  - Still limited to only a few sequences
- Is this an optimal algorithm or not?
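
The idea of restricting the n-dimensional search to pairwise near-optimal regions can be illustrated as follows. This is a rough sketch, not the MSA program itself: the `slack` parameter is a hypothetical stand-in for the Carrillo-Lipman bound (which is not derived here), and the scoring values and function names are assumptions.

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=2):
    """Needleman-Wunsch table: F[i][j] is the best score for a[:i] vs b[:j]."""
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        F[i][0] = -gap * i
    for j in range(1, len(b) + 1):
        F[0][j] = -gap * j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] - gap, F[i][j - 1] - gap)
    return F

def near_optimal_cells(a, b, slack=0):
    """Cells (i, j) lying on some pairwise alignment within `slack` of optimal."""
    F = nw_matrix(a, b)
    R = nw_matrix(a[::-1], b[::-1])       # suffix scores via reversed strings
    best = F[len(a)][len(b)]
    cells = set()
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            through = F[i][j] + R[len(a) - i][len(b) - j]
            if through >= best - slack:
                cells.add((i, j))
    return cells

def allowed(cell, regions):
    """True if every pairwise projection of an n-dimensional cell is near-optimal.

    `regions` maps a pair of sequence indices (k, l) to its set of cells.
    """
    return all((cell[k], cell[l]) in region for (k, l), region in regions.items())

if __name__ == "__main__":
    print(sorted(near_optimal_cells("ACG", "ACG")))
```

An n-dimensional dynamic program would then skip every cell for which `allowed` is false, so a larger `slack` widens the search space and a smaller one prunes more aggressively.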

7 Progressive alignment
- New concepts
  - Consider aligning alignments to alignments/sequences en bloc
  - Hierarchical/sequential order of alignment (“Once a cobbler, always a cobbler”)
- Heuristic
- Fast

8 Progressive alignment - Clustal
- Compute all pairwise alignments
- Convert alignment scores into distances
- Build guide tree (phylogenetic tree)
- Align sequences in order suggested by ‘guide tree’
- Position specific scoring system used
  - Gap costs depend on position
- Composition based scoring system used
  - Percentage similarity dictates choice of scoring matrix
  - Weighting based on composition bias
- Only ‘cross-terms’ (profile-profile) used in scoring
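
The first steps (distances, then a guide tree that fixes the order of alignment) might look like the sketch below. It makes two assumptions that should not be read as Clustal's actual method: distance is taken as one minus fractional identity over aligned residue pairs, and the tree is built by simple average-linkage clustering. All names are hypothetical.

```python
from itertools import combinations

def identity_distance(row_a, row_b):
    """1 - fraction of identical residues, over columns where both rows have one.

    Assumes the two aligned rows share at least one residue-residue column.
    """
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    return 1 - sum(x == y for x, y in pairs) / len(pairs)

def guide_tree(dist):
    """Average-linkage clustering.

    `dist` maps frozenset({a, b}) to a distance for every pair of leaf names.
    Returns the tree as nested tuples plus the list of merges in order.
    """
    leaves = sorted({name for pair in dist for name in pair})
    clusters = [(name, [name]) for name in leaves]    # (subtree, member leaves)

    def average_distance(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    order = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_distance(*p))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
        order.append((c1[0], c2[0]))                  # align these two next
    return clusters[0][0], order

if __name__ == "__main__":
    d = {frozenset(p): v for p, v in {
        ("A", "B"): 0.1, ("A", "C"): 0.5, ("B", "C"): 0.6,
        ("A", "D"): 0.9, ("B", "D"): 0.8, ("C", "D"): 0.7}.items()}
    print(guide_tree(d))
```

The merge order is what a progressive aligner would follow: the closest pair is aligned first, and each later merge aligns a sequence or an existing alignment to another en bloc.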

9 Progressive alignment - Clustal
- ClustalV (now history!)
- ClustalW (takes weighting into account for composition bias)
- ClustalX (graphical interface)

10 Iterative refinement-1
“Once a cobbler, now a king!”
- Iterative algorithm:
  - Compute all pairwise similarities
  - Start with best pair
  - Add ‘most-similar’ sequence to profile successively till none left
  - Remove and re-align each sequence till convergence
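
The final step, removing each sequence and re-aligning it to the rest, can be sketched as below. Several things here are assumptions rather than the lecture's method: an initial alignment is taken as given, re-alignment uses a plain sequence-to-profile dynamic program with toy sum-of-pairs scores, a change is accepted only if the total score strictly improves, and all function names are hypothetical.

```python
from itertools import combinations

def pair(a, b, match=1, mismatch=-1, gap=2):
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -gap
    return match if a == b else mismatch

def sp_score(rows):
    return sum(pair(a, b) for col in zip(*rows) for a, b in combinations(col, 2))

def drop_empty_columns(rows):
    cols = [c for c in zip(*rows) if any(ch != "-" for ch in c)]
    return ["".join(r) for r in zip(*cols)]

def add_to_profile(profile, seq):
    """Align one sequence to a fixed alignment; only cross-terms are scored."""
    cols = list(zip(*profile))
    gap_col = ("-",) * len(profile)
    cross = lambda col, ch: sum(pair(c, ch) for c in col)
    F = [[0] * (len(seq) + 1) for _ in range(len(cols) + 1)]
    back = {}
    for i in range(len(cols) + 1):
        for j in range(len(seq) + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i and j:   # profile column against a residue
                options.append((F[i-1][j-1] + cross(cols[i-1], seq[j-1]), (i-1, j-1)))
            if i:         # profile column against a gap
                options.append((F[i-1][j] + cross(cols[i-1], "-"), (i-1, j)))
            if j:         # new all-gap profile column against a residue
                options.append((F[i][j-1] + cross(gap_col, seq[j-1]), (i, j-1)))
            F[i][j], back[(i, j)] = max(options)
    out, i, j = [], len(cols), len(seq)
    while (i, j) != (0, 0):
        pi, pj = back[(i, j)]
        col = cols[i - 1] if pi < i else gap_col
        out.append(col + (seq[j - 1] if pj < j else "-",))
        i, j = pi, pj
    return ["".join(r) for r in zip(*reversed(out))]

def refine(rows, max_rounds=10):
    """Remove and re-align each sequence until no round improves the score."""
    rows, best = list(rows), sp_score(rows)
    for _ in range(max_rounds):
        improved = False
        for k in range(len(rows)):
            rest = drop_empty_columns(rows[:k] + rows[k + 1:])
            new = add_to_profile(rest, rows[k].replace("-", ""))
            new = new[:k] + [new[-1]] + new[k:-1]     # restore input order
            if sp_score(new) > best:
                rows, best, improved = new, sp_score(new), True
        if not improved:
            break
    return rows, best

if __name__ == "__main__":
    print(refine(["ACGT-", "-ACGT", "ACGT-"]))
```

Unlike a purely progressive method, an early misplacement (here the shifted middle row) can be undone in a later pass, which is the point of the slide's motto.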

11 Iterative refinement-2
- Genetic programming-based msa
  - Create initial random alignment
  - Score alignment
  - Retain better scoring half of alignment
  - Mutate remaining half of alignment with ideas from genetic recombination
    - Random gap insertion
    - En bloc shifts
    - Probabilistic order of alignment
  - Score resulting alignment
  - Iterate till convergence
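
A toy version of this loop is sketched below. It is heavily simplified and rests on assumptions: the population is a list of candidate alignments of fixed width, the only mutation operator moves a single gap within one row (en bloc shifts and recombination are omitted), a fixed number of generations replaces a convergence test, and scoring uses toy sum-of-pairs values. All names are hypothetical.

```python
import random
from itertools import combinations

def sp_score(rows, match=1, mismatch=-1, gap=2):
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            total += -gap if "-" in (a, b) else (match if a == b else mismatch)
    return total

def random_alignment(seqs, width, rng):
    """Place each sequence's residues, in order, at random columns of a row."""
    rows = []
    for s in seqs:
        row = ["-"] * width
        for pos, ch in zip(sorted(rng.sample(range(width), len(s))), s):
            row[pos] = ch
        rows.append("".join(row))
    return rows

def mutate(rows, rng):
    """Move one gap in one row to a new position; residue order is preserved."""
    rows = list(rows)
    i = rng.randrange(len(rows))
    gaps = [k for k, ch in enumerate(rows[i]) if ch == "-"]
    if not gaps:
        return rows
    row = list(rows[i])
    del row[rng.choice(gaps)]
    row.insert(rng.randrange(len(row) + 1), "-")
    rows[i] = "".join(row)
    return rows

def evolve(seqs, width, pop_size=20, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [random_alignment(seqs, width, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sp_score, reverse=True)
        keep = pop[: pop_size // 2]                               # better half
        changed = [mutate(a, rng) for a in pop[pop_size // 2:]]   # remaining half
        pop = keep + changed
    return max(pop, key=sp_score)

if __name__ == "__main__":
    best = evolve(["ACGT", "AGT", "ACT"], width=5)
    print(best, sp_score(best))
```

Because the better half is carried over unchanged, the best score in the population never decreases from one generation to the next, although nothing in this sketch guarantees that the optimum is reached.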

