Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.


1 Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics

2 Multiple Sequence Alignment One amino acid sequence plays coy; a pair of homologous sequences whisper; many aligned sequences shout out loud. Very informative

3 Definition A global alignment of a set of sequences is obtained by inserting gap characters into each sequence so that –the resulting sequences all have the same length and –no “column” consists only of gap characters
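The two conditions in this definition can be checked mechanically. The sketch below is only an illustration: the function name `is_valid_global_alignment` and the use of `-` as the gap character are assumptions, not something the slides specify.

```python
def is_valid_global_alignment(rows, gap="-"):
    """Check the structural conditions of a global alignment.

    rows: list of gapped sequences (strings), one per aligned sequence.
    Returns True if all rows have equal length and no column is all gaps.
    """
    if not rows:
        return False
    # Condition 1: every gapped sequence has the same length.
    if len({len(r) for r in rows}) != 1:
        return False
    # Condition 2: zip(*rows) walks the alignment column by column;
    # each column must contain at least one non-gap character.
    return all(any(ch != gap for ch in col) for col in zip(*rows))
```

For example, `["AC-G", "A-TG"]` passes, while `["A-G", "A-T"]` fails because its middle column contains only gaps.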

4 Example: Chromo domains aligned

5 Use of alignments High sequence similarity usually means significant structural and/or functional similarity; the reverse need not be true. Homologous proteins (sharing a common ancestor) can vary significantly over large parts of their sequences, yet still retain common 2D patterns, 3D patterns, or a common active site or binding site. Comparing several sequences in a family can reveal what the family has in common. A feature shared by many sequences can be significant, even though the same feature shared by only two sequences need not be. Multiple alignment can also be used to derive evolutionary history.

6 Use of alignments Predict features of aligned objects –conserved positions: structurally/functionally important

7 Conserved positions
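As a rough illustration of how conserved positions might be picked out of an alignment, the sketch below reports every column in which a single residue reaches a chosen frequency. The function name, the `threshold` parameter, and the toy sequences in the usage note are all assumptions made for the example; they are not taken from the slides.

```python
from collections import Counter

def conserved_columns(rows, threshold=1.0, gap="-"):
    """Return (column index, residue) pairs for conserved columns.

    A column counts as conserved when its most common character is not
    a gap and occurs in at least `threshold` of the rows.
    """
    conserved = []
    for idx, col in enumerate(zip(*rows)):
        residue, count = Counter(col).most_common(1)[0]
        if residue != gap and count / len(col) >= threshold:
            conserved.append((idx, residue))
    return conserved
```

With the made-up alignment `["MKW-A", "MRWGA", "MKWGS"]` and the default threshold, only the fully conserved columns 0 (`M`) and 2 (`W`) are reported. Real analyses usually weight substitutions by similarity rather than demanding identity, which this sketch does not attempt.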

8 Use of alignments Predict features of aligned objects –conserved positions: structurally/functionally important –patterns of hydrophobicity/hydrophilicity: secondary structure elements

9 Helix pattern

10 Use of alignments Predict features of aligned objects –conserved positions: structurally/functionally important –patterns of hydrophobicity/hydrophilicity: secondary structure elements –“gappy” regions: loops/variable regions

11 Loop?

12 Use of alignments: make patterns/profiles An alignment can be turned into a profile or a pattern, which is then matched against a sequence database to identify new family members. Profiles/patterns can also be used to predict the family membership of new sequences. Databases of profiles/patterns –PROSITE –PFAM –PRINTS –...

13 Prosite: Motifs for classification [Diagram: a protein sequence is matched against Prosite pattern 1, Prosite pattern 2, ..., Prosite pattern n, corresponding to Family 1, Family 2, ..., Family n. Labels: Pattern (regular expression), Profile.]

14 Pattern from alignment [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]
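A pattern in this notation maps fairly directly onto a regular expression, which is one way to scan sequences for it. The converter below is a minimal sketch that handles only the syntax appearing on this slide plus `{...}` exclusions (`x`, bracketed residue sets, and `(n)` or `(n,m)` repeats); the function name is hypothetical and anchors such as `<` and `>` are not supported.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(5,6)" into its core and repeat counts.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # x matches any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # {ABC} means "not A, B or C"
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

SLIDE_PATTERN = ("[FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)"
                 "-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC]")
SLIDE_REGEX = prosite_to_regex(SLIDE_PATTERN)
```

The resulting string can be passed to `re.search` together with a protein sequence; any sequence used to try it out here would be invented test data rather than a real family member.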

15 Alignment problem Given a set of sequences, produce a multiple alignment which corresponds as well as possible to the biological relationships between the corresponding bio-molecules

16 For homologous proteins Two residues should be aligned (on top of each other) –if they are homologous (evolved from the same residue in a common ancestor protein) –if they are structurally equivalent

17 Automatic approach Need a way of scoring alignments –a fitness function that quantifies the “goodness” of an alignment. Need an algorithm for finding alignments with good scores. Not all methods provide a scoring function for the final alignment!
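The slide does not say which fitness function is meant. One common choice, used here purely as an assumed example, is the sum-of-pairs score: every column is scored by summing a pairwise score over all pairs of rows. The match, mismatch, and gap values below are arbitrary placeholders; a realistic scheme would use a substitution matrix.

```python
from itertools import combinations

def sum_of_pairs(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a multiple alignment (toy scoring values)."""
    total = 0
    for col in zip(*rows):                  # iterate over columns
        for a, b in combinations(col, 2):   # every pair of rows
            if a == "-" and b == "-":
                continue                    # gap against gap scores nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```

Three identical rows `["AC", "AC", "AC"]` score 6 under these values, since each of the two columns contributes three matching pairs.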

18 Analysis of fitness function One can test whether the alignments that are optimal under a given fitness function correspond well to the biological relationships between the sequences, for example when the structures of (some of) the proteins are known.

19 Align by use of dynamic programming Dynamic programming finds the best alignment of k sequences under a given scoring scheme. For two sequences there are three different column types; for three sequences there are seven (in general, 2^k - 1 for k sequences). Below, x means an amino acid and - a blank:
Sequence1 x - x x - - x
Sequence2 x x - x - x -
Sequence3 x x x - x - x
Time complexity is O(n^k) for sequences of length n.
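The column-type counts quoted on this slide can be reproduced by enumerating every way of placing a residue or a blank in each row and discarding the all-blank column. The helper name below is made up for the illustration.

```python
from itertools import product

def column_types(k):
    """All column types for k sequences: 'x' = residue, '-' = blank.

    The column consisting only of blanks is excluded, which leaves
    2**k - 1 types.
    """
    return [col for col in product("x-", repeat=k) if "x" in col]
```

`len(column_types(2))` gives 3 and `len(column_types(3))` gives 7, matching the slide. Each column type corresponds to one predecessor cell in the dynamic programming recurrence, so the work per cell grows exponentially with k on top of the n^k cells.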

20 Use of dynamic programming Dynamic programming finds the best alignment of k sequences under a given scoring scheme

21 Algorithm for dynamic programming
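The transcript gives only the title of this slide, so the following is a sketch of how such an algorithm could look, not a reconstruction of it. It fills a k-dimensional table in which each cell holds the best score for aligning the corresponding prefixes, trying every column type as the last column. The sum-of-pairs scoring with fixed match/mismatch/gap values is an assumption, and only the optimal score is returned (no traceback).

```python
from itertools import product

def column_score(col, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one column (assumed toy scoring values)."""
    score = 0
    for i in range(len(col)):
        for j in range(i + 1, len(col)):
            a, b = col[i], col[j]
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score

def best_alignment_score(seqs):
    """Optimal global alignment score of k sequences by exhaustive DP."""
    k = len(seqs)
    # A move says which sequences contribute a residue to the last column;
    # the all-zero move (a column of only gaps) is not allowed.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {(0,) * k: 0}
    # product() yields cells in lexicographic order, so every predecessor
    # of a cell has already been filled in when the cell is reached.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if not any(cell):
            continue
        candidates = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = tuple(s[p] if d else "-" for s, p, d in zip(seqs, prev, m))
            candidates.append(best[prev] + column_score(col))
        best[cell] = max(candidates)
    return best[tuple(len(s) for s in seqs)]
```

The table has one cell per combination of prefix lengths, which is where the n^k factor comes from; in practice this is only feasible for a handful of short sequences.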

