Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.

Similar presentations


Presentation on theme: "Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics."— Presentation transcript:

1 Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics

2 Alignment problem Given a set of sequences, produce a multiple alignment which corresponds as well as possible to the biological relationships between the corresponding bio-molecules

3 For homologous proteins  Two residues should be aligned (on top of each other) if they are homologous (evolved from the same residue in a common ancestor protein) if they are structurally equivalent

4 Automatic approach  Need a way of scoring alignments fitness function which for an alignment quantifies its “goodness”  Need an algorithm for finding alignments with good scores  Not all methods provide a scoring function for the final alignment!

5 Analysis of fitness function  One can test whether the alignments optimal under a given fitness function correspond well to the biological relationships between the sequences  For example, if the structure of (some of) the proteins are known.

6 Scoring Alignments  In order to find an optimal alignment, we need to be able to measure how good an alignment is Sum of pairs (SP) method: in a column, score each pair of letters and total the scores. Pairs of gaps score 0. Total up scores for each column

7 SP Method Example  Using BLOSUM62 matrix, gap penalty -8  In column 1, we have pairs -,S S,S  k(k-1)/2 pairs per column -IK SIK SSE -8 - 8 + 4 = -12

8 Align by use of dynamic programming  Dynamic programming finds best alignment of k sequences with given scoring scheme  For two sequences there are three different column types  For three sequences there are seven different column types x means an amino acid, - a blank Sequence1 x - x x - - x Sequence2 x x - x - x - Sequence3 x x x - x - x

9 Use of dynamic programming  Dynamic programming finds best alignment of k sequences given scoring scheme

10 Algorithm for dynamic programming

11 Analysis  O(n k ) entries to fill  Each entry combines O(2 k ) other entries  Costs O(k 2 ) to calculate each SP score  Overall cost is O(k 2 2 k n k ), or exponential in the number of sequences!  NP-complete

12 General progressive alignment Algorithm. General progressive alignment. Progressive alignment of the sequences {s 1, s 2,..., s m } Var C current set of alignments begin C := ∅ for i := 1 to m do C := C union {{s i }} end one alignment of each seq. for i := 1 to m − 1 do choose two alignments A p,A q from C; C := C − {A p,A q } A r := align(A p,A q );C := C union {A r } end C now contains the (single) final alignment end

13 The Clustal Algorithm  Three steps: 1 Compare all pairs of sequences to obtain a similarity matrix 2 Based on the similarity matrix, make a guide tree relating all the sequences 3 Perform progressive alignment where the order of the alignments is determined by the guide tree

14 (A) 1 pairwise comparison 2 clustering/making tree (B) 3 Align according to tree

15 Clustal - summary  Does not use a score for the final alignment  Each pairwise alignment is done using dynamic programming  Heuristics are used - tailored to globular proteins  Graphical version: ClustalX

16 Phylogeny  The basic principle is that the origin of similarity is common ancestry.  The field of phylogeny has the goals of working out the relationships among species, populations, individuals, or genes.  Usually expressed as a tree.

17 Phylogeny  The basic principle is that the origin of similarity is common ancestry.  The field of phylogeny has the goals of working out the relationships among species, populations, individuals, or genes.  Usually expressed as a tree.

18 Phylogeny  A statement of phylogeny among objects assumes homology and depends on classification.  Phylogeny states a topology of the relationships based on classification according to similarity of one or more sets of characters, or on a model of evolutionary processes.

19

20 Phylogeny  It is rare for species relationships and ancestry to be directly observable.  Evolutionary trees determined from genetic data are often based on inferences from the patterns of similarity, which are all that is observable among species living now.


Download ppt "Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics."

Similar presentations


Ads by Google