1
Class 5: Multiple Sequence Alignment
2
Multiple sequence alignment
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWESNG--
Homologous residues are aligned together in columns.
Homologous - in the structural and evolutionary sense.
Ideally, a column of aligned residues occupies similar 3D structural positions.
3
Multiple alignment – why?
- Identify sequences that belong to a family
  Family – a collection of homologous sequences with similar sequence, 3D structure, function, or evolutionary history
- Find features that are conserved across the whole family
  Highly conserved regions, core structural elements
4
The relation between the divergence of sequence and structure [Durbin p. 137, redrawn from data in Chothia and Lesk (1986)]
5
Scoring a multiple alignment (1)
Important features of multiple alignment:
- Some positions are more conserved than others
  Position-specific scoring
- Sequences are not independent (they are related by a phylogenetic tree)
  Ideally, specify a complete model of molecular sequence evolution
6
Scoring a multiple alignment (2) Unfortunately, not enough data … Assumption (1) Columns of alignment are statistically independent.
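A sketch of how the score decomposes under Assumption (1). The notation is assumed here for illustration: $m$ is the alignment, $m_i$ is its $i$-th column, $S(m_i)$ is the score of that column, and $G$ is a separate term that scores gaps.

```latex
S(m) = G + \sum_{i} S(m_i)
```

With this form, any column-scoring function (such as an entropy or sum-of-pairs measure) can be plugged in for $S(m_i)$.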
7
Minimum entropy Assumption (2) Symbols within columns are independent Entropy measure
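A minimal sketch of a column entropy measure, assuming the common form S(m_i) = -sum_a c_ia log p_ia, where c_ia is the count of residue a in column i and p_ia is estimated as the observed frequency. Ignoring gap characters and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Entropy-style score of one column: -sum_a c_a * log(p_a).

    Gap characters are ignored (an assumption of this sketch), and p_a is
    the plain observed frequency with no pseudocounts.
    """
    residues = [c for c in column if c != gap]
    counts = Counter(residues)
    total = len(residues)
    return -sum(n * math.log(n / total) for n in counts.values())

def alignment_entropy(rows):
    """Sum of column entropies; relies on columns being independent."""
    return sum(column_entropy(col) for col in zip(*rows))
```

A fully conserved column scores 0, and more variable columns score higher, so a good alignment under this measure is one that minimizes the total.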
8
Sum of pairs (SP) Columns are scored by a “sum of pairs” function, using a substitution scoring matrix Note:
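A minimal sketch of sum-of-pairs column scoring. The `toy_score` function stands in for a real substitution matrix such as BLOSUM or PAM; its match, mismatch, and gap values are illustrative assumptions, as is scoring a gap-gap pair as 0.

```python
from itertools import combinations

def toy_score(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pair score; a real use would look up a substitution matrix."""
    if a == "-" and b == "-":
        return 0  # assumption: a gap aligned to a gap costs nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sp_column(column, score=toy_score):
    """Sum the pair score over every unordered pair of symbols in the column."""
    return sum(score(a, b) for a, b in combinations(column, 2))

def sp_alignment(rows, score=toy_score):
    """Total SP score: sum of the per-column scores."""
    return sum(sp_column(col, score) for col in zip(*rows))
```

For N sequences each column contributes N(N-1)/2 pair terms, so the cost of scoring grows quadratically with the number of sequences.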
9
Multidimensional DP
11
Complexity Space: Time:
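A rough sketch of why full multidimensional DP does not scale, assuming N sequences of equal length L. The DP table has one cell per combination of prefix lengths, and each cell looks back at every non-empty subset of sequences that could advance. The function name is hypothetical.

```python
def full_dp_cost(num_seqs, length):
    """Return (cells, cell_updates) for exhaustive multidimensional DP.

    cells: (L + 1) ** N table entries, i.e. space on the order of L ** N.
    cell_updates: each cell considers 2 ** N - 1 predecessor moves,
    i.e. time on the order of 2 ** N * L ** N (ignoring the cost of
    scoring each column).
    """
    cells = (length + 1) ** num_seqs
    moves_per_cell = 2 ** num_seqs - 1
    return cells, cells * moves_per_cell
```

For example, `full_dp_cost(2, 100)` gives the familiar quadratic pairwise table, while adding even a few more sequences of that length pushes the table far beyond practical memory.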
12
Pairwise projections of MA
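A minimal sketch of a pairwise projection: take two rows of a multiple alignment and drop the columns where both rows have a gap. The function name and the toy rows in the usage note are assumptions for illustration.

```python
def project_pair(rows, k, l, gap="-"):
    """Project a multiple alignment onto sequences k and l.

    Columns in which both selected rows contain a gap are removed,
    leaving a valid pairwise alignment of the two sequences.
    """
    pairs = [(a, b) for a, b in zip(rows[k], rows[l])
             if not (a == gap and b == gap)]
    top = "".join(a for a, _ in pairs)
    bottom = "".join(b for _, b in pairs)
    return top, bottom
```

With rows `["AC-G", "A--G", "ACTG"]`, projecting onto rows 0 and 1 removes the third column, where both rows have a gap. The projected alignment is not necessarily the optimal pairwise alignment of those two sequences.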
13
MSA (i) [Carrillo and Lipman, 1988]
14
MSA (ii)
15
MSA (iii) Algorithm sketch
16
Progressive alignment methods (i)
Basic idea: construct a succession of pairwise (PW) alignments
Variations:
- PW alignment order
- One growing alignment or subfamilies
- Alignment and scoring procedure
17
Progressive alignment methods (ii)
Most important heuristic – align the most similar pairs first.
Many algorithms build a “guide tree”:
- Leaves – sequences
- Interior nodes – alignments
- Root – complete multiple alignment
18
Feng-Doolittle (1987)
- Calculate all pairwise distances from alignment scores
- Construct a guide tree using hierarchical clustering
- The highest-scoring pairwise alignment determines how a sequence is aligned to a group
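A minimal sketch of turning an alignment score into a distance, assuming the normalization usually given for Feng-Doolittle: D = -log S_eff with S_eff = (S_obs - S_rand) / (S_max - S_rand). Here S_obs is the observed pairwise alignment score, S_max is the average of the two self-alignment scores, and S_rand is the expected score for shuffled sequences. The function and parameter names are hypothetical.

```python
import math

def feng_doolittle_distance(s_obs, s_max, s_rand):
    """Convert a pairwise alignment score into a distance, D = -log(S_eff).

    S_eff rescales the observed score so that a random-level score maps
    to 0 and a self-alignment-level score maps to 1.
    """
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Scores at or below the random baseline have no finite distance.
        raise ValueError("observed score must exceed the random baseline")
    return -math.log(s_eff)
```

Identical sequences get distance 0, and the distance grows without bound as the observed score approaches the random baseline. The resulting distance matrix is what the hierarchical clustering step consumes.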
19
Profile alignment
- Use profiles for group-to-sequence and group-to-group alignments
CLUSTALW (Thompson et al., 1994):
- Similar to Feng-Doolittle, but uses profile alignment methods
- Numerous heuristics
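A minimal sketch of what a profile can look like: per-column symbol frequencies for a group of aligned sequences, plus a frequency-weighted score for aligning one symbol against a profile column. This is not CLUSTALW's actual scheme, which adds sequence weighting and position-specific gap penalties; the function names and the `score` callable are assumptions for illustration.

```python
from collections import Counter

def build_profile(rows):
    """Return one {symbol: frequency} dict per alignment column.

    Gaps are counted as an ordinary symbol in this sketch.
    """
    depth = len(rows)
    profile = []
    for column in zip(*rows):
        counts = Counter(column)
        profile.append({sym: n / depth for sym, n in counts.items()})
    return profile

def score_symbol(profile_column, symbol, score):
    """Average pair score of `symbol` against every symbol in the column.

    `score` is any pair-scoring callable, e.g. a substitution matrix lookup.
    """
    return sum(freq * score(symbol, other)
               for other, freq in profile_column.items())
```

Scoring against the profile rather than against a single representative sequence lets conserved columns count for more than variable ones when a new sequence or group is added.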
20
Iterative Refinement
- Addresses the “frozen” sub-alignment problem
- Iteratively realign sequences or groups to a profile of the rest
- Barton and Sternberg (1987):
  1. Align the two most similar sequences
  2. Align the current profile to the most similar remaining sequence
  3. Remove each sequence and realign it to the profile of the rest