Published by Beatrice Hill. Modified over 8 years ago.
1
Multiple alignment: Feng-Doolittle algorithm
2
Why multiple alignments?
- Alignment of more than two sequences
- Usually gives better information about conserved regions and function (more data)
- Better estimate of significance when using a sequence of unknown function
- Must use multiple alignments when establishing phylogenetic relationships
3
Dynamic programming extended to many dimensions?
- No: uses up too much computer time and space
- E.g., with sequences of 200 amino acids, a pairwise alignment must evaluate $4 \times 10^{4}$ matrix elements
- If 3 sequences, $8 \times 10^{6}$ matrix elements
- If 6 sequences, $6.4 \times 10^{13}$ matrix elements
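The numbers on this slide follow from raising the sequence length to the number of sequences. A minimal sketch, assuming all sequences have the same length and ignoring the extra boundary row and column of a real dynamic-programming matrix (the function name `dp_cells` is only illustrative):

```python
def dp_cells(length: int, n_sequences: int) -> int:
    """Approximate number of cells in a full dynamic-programming matrix
    for n_sequences sequences that all have the given length."""
    return length ** n_sequences


# Reproduce the three cases from the slide (200 residues per sequence).
for n in (2, 3, 6):
    print(f"{n} sequences: {dp_cells(200, n):.1e} matrix elements")
```

Each added sequence multiplies the work by another factor of 200, which is why the exact method stops being practical after a handful of sequences.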
4
Need to find a more efficient method
- Give up the guarantee of an optimum alignment in exchange for a good alignment found much faster
5
Feng-Doolittle algorithm
- Does all pairwise alignments and scores them
- Converts pairwise scores to "distances":
  $D = -\log S_{\mathrm{eff}} = -\log\left[\dfrac{S_{\mathrm{obs}} - S_{\mathrm{rand}}}{S_{\mathrm{max}} - S_{\mathrm{rand}}}\right]$
  - $S_{\mathrm{obs}}$ = pairwise alignment score
  - $S_{\mathrm{rand}}$ = expected score for random alignment
  - $S_{\mathrm{max}}$ = average of self-alignments of the two sequences
6
As $S_{\mathrm{obs}}$ approaches $S_{\mathrm{rand}}$ (increasing evolutionary distance), $S_{\mathrm{eff}}$ goes down toward 0; taking the negative log turns it into a positive distance that grows as the sequences diverge
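The distance formula can be written as a small function. This is a sketch under stated assumptions: the function and parameter names are invented for illustration, the natural logarithm is used (the slide does not specify the base), and an effective score of zero or less is mapped to infinity, which is a choice made here and not something the slides state.

```python
import math


def feng_doolittle_distance(s_obs, s_rand, s_self_a, s_self_b):
    """Convert a pairwise alignment score into a distance D = -log(S_eff).

    s_obs:    observed pairwise alignment score
    s_rand:   expected score for a random alignment
    s_self_a: score of the first sequence aligned with itself
    s_self_b: score of the second sequence aligned with itself
    """
    s_max = (s_self_a + s_self_b) / 2.0  # average of the two self-alignments
    if s_max <= s_rand:
        raise ValueError("S_max must exceed S_rand for the ratio to make sense")
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0:
        # Assumption: no better than random is treated as infinitely distant.
        return math.inf
    return -math.log(s_eff)


# Identical sequences give distance 0; a lower observed score gives a larger distance.
print(feng_doolittle_distance(100, 0, 100, 100))
print(feng_doolittle_distance(50, 0, 100, 100))
```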
7
- Once the distances have been calculated, construct a guide tree (more in the phylogeny class), which tells what order to group the sequences
- Sequences can be aligned with sequences or groups; groups can be aligned with groups
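How a distance table turns into a grouping order can be sketched with a simple average-linkage clustering. The slides do not say which tree-building method is used, so average linkage is an assumption chosen for brevity, and the names `guide_tree_order`, `average_distance` and the example distances are all hypothetical.

```python
from itertools import combinations


def average_distance(c1, c2, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def guide_tree_order(names, dist):
    """Return the list of merges (closest clusters first).

    names: list of sequence names
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    """
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Pick the two clusters that are closest on average.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(pair[0], pair[1], dist))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


# Made-up distances: A and B are close, C and D are close, everything else is far.
example_dist = {frozenset(pair): 1.0 for pair in combinations("ABCD", 2)}
example_dist[frozenset("AB")] = 0.1
example_dist[frozenset("CD")] = 0.2

for left, right in guide_tree_order(list("ABCD"), example_dist):
    print(left, "+", right)
```

The merge list is the order in which a progressive aligner would do sequence-sequence, sequence-group and group-group alignments.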
8
- Sequence-sequence alignments: dynamic programming
- Sequence-group alignments: all possible pairwise alignments between the sequence and the members of the group are tried; the highest scoring pair determines how the sequence is aligned to the group
- Group-group alignments: all possible pairwise alignments of sequences between the groups are tried; the highest scoring pair determines how the groups are aligned
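The sequence-group rule can be sketched as: score the new sequence against every member of the group and keep the best-scoring partner. The sketch below assumes a global alignment score with a simple linear gap penalty and toy match/mismatch values; a real program would use a substitution matrix and separate gap-opening and extension penalties. The names `nw_score` and `best_partner` are hypothetical.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score by dynamic programming (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def best_partner(seq, group):
    """Return the aligned group member that scores highest against seq.

    Gap characters already present in the group rows are stripped
    before the pairwise comparison.
    """
    return max(group, key=lambda row: nw_score(seq, row.replace("-", "")))


group = ["AC-T", "TTTT"]
print(best_partner("ACGT", group))
```

The pairwise alignment with the chosen partner then dictates where the new sequence is placed relative to the whole group.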
9
Example
[Figure: Seq1, Seq2, Seq3, Seq4 and Seq5 are combined step by step (Alignment 1, Alignment 2, Alignment 3) into the final alignment]
10
- Notice that this method does not guarantee the optimum alignment; just a good one.
- Gaps are preserved from alignment to alignment: "once a gap, always a gap"
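"Once a gap, always a gap" can be illustrated by what happens when a later alignment step needs a new gap inside an existing group: the gap is inserted as a whole column, and the gaps already in the group stay where they are. A minimal sketch with a hypothetical helper name and made-up sequences:

```python
def insert_gap_column(group, position):
    """Insert a gap at the same column in every row of an aligned group."""
    return [row[:position] + "-" + row[position:] for row in group]


group = ["AC-GT", "ACTGT"]          # an earlier alignment, already containing a gap
widened = insert_gap_column(group, 2)

for row in widened:
    print(row)
```

The rows grow by one column together, so residues that were aligned in the earlier step remain aligned and no earlier gap is ever removed.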
11
In-class exercise
- Retrieve sequences from multalign.apr into BioScout
- Run Gap in BioScout on all pairwise combinations of the sequences in multalign.apr; use a gap penalty of 6 and an extension penalty of 2
- Record alignment scores of each pairwise comparison
- Save pairwise alignments
12
In-class exercise, cont.
- Use raw alignment scores as distance measures (a higher score means a smaller distance); make a guide tree based on these scores
- In Vector NTI:
  - Select all sequences in multalign.apr (in the sequence pane)
  - Choose Alignment from the toolbar at the top
  - Choose Alignment Setup from the pulldown
  - Choose multiple alignment; take the defaults, choose OK
  - Choose Alignment again; this time choose Align Selected Sequences from the pulldown
13
In-class exercise, cont.
- Note that ClustalW does some other things that the Pileup program discussed on the tape does not; we are going to ignore those things for the moment
- Compare ClustalW's guide tree (visible in the Phylogenetic Tree Pane, the tab at the bottom of the window) with yours
14
In-class exercise, cont.
- Carefully examine ClustalW's alignment; compare it to the individual pairwise alignments you saved. Are there differences?
15
Start refining alignment:
- Use structural info if you have it
- Find patterns if you don't
- Use amino acid structure handout from beginning of class for substitution decisions!
16
ClustalW
- Most widely used multiple alignment method
- Similar strategy to the Feng-Doolittle approach implemented as Pileup, but more complex and gives generally superior results
- Ad hoc nature of the program can be mysterious
17
Advantageous differences
- Gap penalties vary locally:
  - By observed frequency (in database) after each residue
  - By simple structure prediction: lower gap penalties in probable loop regions
  - By proximity to existing gaps: higher gap penalties when within 8 residues of an existing gap
18
Advantages, cont.
- Change in substitution matrix choice depending on distance computed for guide tree
  - Substitution matrix families
- Profile construction (more later)
- Weighting of sequences in profiles depending on evolutionary distance computed for guide tree
  - More similar sequences get less weight than less similar sequences
19
In-class exercise II
- Change a few parameters in the ClustalW program (gap, gap extension, substitution matrix, etc.) one at a time: this is done in Alignment Setup.
- After each run with a different change, save the alignment project with some descriptive name that you can remember (e.g., gap20 or blosum)
- Compare alignment results with different parameters changed
20
MultAlin
- MultAlin is also a heuristic algorithm that builds up a multiple alignment from a group of pairwise alignments
- It differs from Pileup and Clustal in that the guide tree is recalculated based on the results of each alignment step
- Because this leads to cycles of tree building and alignment, MultAlin can take a long time to run. It stops once the overall alignment score stops improving
21
Scoring a multiple sequence alignment
- Assumptions:
  - Sequences (rows) independent
  - Positions (columns) independent
  - Neither assumption is true ...
- Score of a column is the (possibly weighted) sum of all the pairwise comparisons (i.e., substitution matrix values) within that column
- Score of a multiple alignment is the sum of scores for all columns
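The column-by-column scoring described on this slide can be sketched as an unweighted sum-of-pairs score. The sketch assumes toy values (match +1, mismatch -1, residue against gap -2, gap against gap 0) in place of a real substitution matrix, and all function names are hypothetical.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Toy stand-in for a substitution matrix lookup."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def column_score(column):
    """Sum of all pairwise comparisons within one alignment column."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))


def sum_of_pairs(alignment):
    """Score of a multiple alignment: the sum of its column scores."""
    return sum(column_score(column) for column in zip(*alignment))


alignment = ["ACG",
             "A-G",
             "ATG"]
print([column_score(column) for column in zip(*alignment)])
print(sum_of_pairs(alignment))
```

Because each column is scored on its own and every pair of rows counts equally, the sketch makes both independence assumptions from the slide explicit.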