Pairwise sequence alignment Lecture 02


1 Pairwise sequence alignment Lecture 02

2 Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural and functional analysis of newly determined sequences.  The most fundamental comparison process is sequence alignment.  Search for common character patterns  Pairwise sequence alignment is the basis of database similarity searching and multiple sequence alignment

3 Evolutionary Basis  DNA and proteins are products of evolution.  The linear sequences of nucleotide bases and amino acids form the primary structure of DNA and proteins.  They can be considered molecular fossils that encode the history of millions of years of evolution.  Over this time they change through  random mutations  selection

4 Evolutionary Basis cont.…  Traces of evolution may nevertheless still exist, making it possible to identify common ancestry.  This is because residues that perform key functional and structural roles tend to be preserved by natural selection, while others tend to mutate more frequently.  Therefore, sequence alignment can identify patterns of conservation and variation.  The degree of sequence conservation in an alignment reveals the evolutionary relatedness of different sequences.  Variation between the sequences reflects the changes that occurred during evolution in the form of substitutions, insertions and deletions.

5 Why sequence alignment?  It helps to characterize the functions of unknown sequences.  Significant similarity between two sequences suggests that they belong to the same family.  It is the basis for predicting the structure and function of uncharacterized sequences.  It also allows inference of the relatedness (evolutionary relationship) of the two sequences under study.  If they share significant similarity, they are very likely derived from a common evolutionary origin.

6 Sequence Homology vs. Sequence Similarity  Sequence homology is an inference or conclusion about a common ancestral relationship, drawn from a sequence similarity comparison when the two sequences share a sufficiently high degree of similarity.  Sequence similarity is the percentage of aligned residues that are similar in physicochemical properties such as size and charge.  Sequence similarity can be quantified using percentages; homology is a qualitative statement.  Ex: two sequences share 40% similarity.  The two sequences are either homologous or nonhomologous.
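The distinction between identity and similarity can be made concrete with a small sketch. The residue groups below are a hypothetical grouping chosen only for illustration (real analyses usually rely on a substitution matrix), and the function name and the choice of denominator (aligned, non-gap positions) are assumptions, not something the lecture specifies.

```python
# Hypothetical grouping of amino acids by physicochemical property
# (an assumption for illustration, not a standard definition).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def percent_identity_and_similarity(a, b, gap="-"):
    """Return (percent identity, percent similarity) for two aligned strings.

    Only positions where neither sequence has a gap are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = similar = 0
    for p, q in zip(a, b):
        if p == gap or q == gap:
            continue  # skip gap positions
        aligned += 1
        if p == q:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(p in g and q in g for g in SIMILAR_GROUPS):
            similar += 1
    if aligned == 0:
        return 0.0, 0.0
    return 100.0 * identical / aligned, 100.0 * similar / aligned


identity, similarity = percent_identity_and_similarity("KLDE", "RLEA")
print(identity, similarity)
```

Whether such a percentage is high enough to infer homology remains a separate, qualitative judgment.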

7 Methods  Pairwise sequence alignment  The task of locating equivalent regions of two sequences to maximize their overall similarity.  Alignment strategies:  Global Alignment  Local Alignment

8 Global Alignment  It is assumed that the two sequences to be aligned are generally similar over their entire length.  Alignment is carried out from the beginning to the end of both sequences.  Finds the best possible alignment across the entire length of the two sequences.  More applicable for aligning two closely related sequences.  May not generate optimal results for divergent sequences, because locally similar regions can be missed.

9 Local Alignment  Finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequences.  More applicable for aligning more divergent sequences.  Can be used to search for conserved patterns in DNA or protein.  The sequences being aligned can be of different lengths.

10 Alignment Algorithms  Fundamentally similar for both global and local alignments  They only differ in the optimization strategy used in aligning similar residues.  Methods:  The dot matrix method  The dynamic programming method  The word method

11 Dynamic Programming Method  It is a method that determines optimal alignment by matching two sequences for all possible pairs of characters between the two sequences.  Works by first constructing a two-dimensional matrix whose axes are the two sequences to be compared.  The residue matching is according to a particular scoring matrix.  The scores are calculated one row at a time.  The scores are accumulated along the diagonal going from the upper left corner to the lower right corner.  The best score is given by the bottom right corner of the matrix.

12 Dynamic Programming Method  To find the optimal alignment, backtrack through the matrix in reverse order, from the lower right-hand corner towards the origin of the matrix in the upper left-hand corner.  The best matching path is the one that has the maximum total score.

13 Gap penalties  Gaps represent insertions and deletions in a sequence alignment.  There is no precise cost for introducing insertions and deletions.  Linear score  γ(g) = -dg, where d is a fixed cost per gap position and g is the gap length  The longer the gap, the higher the penalty  Affine score  γ(g) = -(d + (g-1)e), where d and e are fixed penalties and e < d  d: gap open penalty  e: gap extension penalty
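The two gap scoring functions can be sketched in a few lines. The default values for d and e are arbitrary assumptions chosen so that e < d, and the function names are illustrative only.

```python
def linear_gap(g, d=2):
    """Linear gap score: gamma(g) = -d * g for a gap of length g."""
    return -d * g


def affine_gap(g, d=5, e=1):
    """Affine gap score: gamma(g) = -(d + (g - 1) * e).

    d is the gap open penalty and e the gap extension penalty (e < d).
    A gap of length 0 costs nothing.
    """
    if g == 0:
        return 0
    return -(d + (g - 1) * e)


for g in (1, 2, 3):
    print(g, linear_gap(g), affine_gap(g))
```

With an affine score, opening a gap is expensive but extending it is cheap, so one long gap is penalized less than several short gaps of the same total length.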

14 Global alignment (by Needleman & Wunsch)  Given two sequences u and v and a scoring matrix delta, find the alignment with the maximal score.  The number of possible alignments is very large: it grows exponentially with the sequence lengths.

15 Number of possible alignments
Sequence 1: CAATGA
Sequence 2: ATTGAT
Alignment 1:
CAATGA_
_ATTGAT
Alignment 2:
CAATGA
ATTGAT
Alignment 3:
CAATGA_
AT_TGAT
Can we find the highest scoring alignment by enumerating all possible alignments and picking the best?
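To get a feel for why enumeration is impractical, the number of alignments can be counted with a simple recurrence: every alignment ends in a residue pair, a gap in the first sequence, or a gap in the second. This sketch assumes that counting convention (alignments that differ only in the order of adjacent gaps are counted separately); the function name is illustrative.

```python
def count_alignments(n, m):
    """Count global alignments of sequences of length n and m.

    N(i, j) = N(i-1, j-1) + N(i-1, j) + N(i, j-1), with N(i, 0) = N(0, j) = 1.
    """
    N = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            N[i][j] = N[i - 1][j - 1] + N[i - 1][j] + N[i][j - 1]
    return N[n][m]


print(count_alignments(6, 6))  # two 6-residue sequences, as in the example above
print(count_alignments(100, 100))
```

Even two 6-residue sequences already have thousands of possible alignments under this convention, and the count explodes for sequences of realistic length.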

16 Dynamic programming idea  Let x's length be n  Let y's length be m  Construct an (m+1) × (n+1) matrix F  F(j, i) = score of the best alignment of x_1…x_i with y_1…y_j  (Figure: the matrix F with the letters of x along one axis and the letters of y along the other; the highlighted cell holds the score of the best alignment of AAA to AG.)

17 DP for global alignment with linear gap penalty  DP recurrence relation:  F(j, i) = max { F(j-1, i-1) + s(x_i, y_j),  F(j-1, i) - d,  F(j, i-1) - d }  F(j, i) is the score of the best partial alignment between x_1…x_i and y_1…y_j

18 DP algorithm sketch: global alignment  initialize first row and column of matrix  fill in rest of matrix from top to bottom, left to right  for each F(j, i), save pointer(s) to cell(s) that resulted in best score  F(m, n) holds the optimal alignment score; trace pointers back from F(m, n) to F(0, 0) to recover alignment
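The steps above can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes a simple match/mismatch score in place of a full substitution matrix, uses d = 2 as the linear gap penalty, and recomputes the traceback choices from the filled matrix instead of storing pointers. It returns only one optimal alignment even when several exist.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, d=2):
    """Global alignment with a linear gap penalty d.

    Returns (score, aligned_x, aligned_y). Scoring values are assumptions.
    """
    n, m = len(x), len(y)
    # F[j][i] = score of the best alignment of x[:i] with y[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        F[0][i] = -d * i  # x[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[j][0] = -d * j  # y[:j] aligned against gaps only

    def s(a, b):
        return match if a == b else mismatch

    # Fill the matrix top to bottom, left to right.
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(F[j - 1][i - 1] + s(x[i - 1], y[j - 1]),
                          F[j - 1][i] - d,
                          F[j][i - 1] - d)

    # Trace back from F[m][n] to F[0][0].
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[j][i] == F[j - 1][i - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[j][i] == F[j][i - 1] - d:
            ax.append(x[i - 1]); ay.append("-")  # gap in y
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])  # gap in x
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("CAATGA", "ATTGAT"))
```

Filling the matrix takes time and memory proportional to n × m, which is what makes the approach feasible compared with enumerating every alignment.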

19 Global alignment example  suppose we choose the following scoring scheme:  d (penalty for aligning with a gap) = 2

20 Initializing matrix: global alignment with linear gap penalty

21 Global alignment example

22 DP comments  works for either DNA or protein sequences, although the substitution matrices used differ  finds an optimal alignment  the exact algorithm (and computational complexity) depends on gap penalty function

23 Local alignment  so far we have discussed global alignment, where we are looking for best match between sequences from one end to the other  often we want a local alignment, the best match between subsequences of x and y

24 Local alignment DP algorithm  original formulation: Smith & Waterman, Journal of Molecular Biology, 1981  interpretation of array values is somewhat different:  F ( i, j ) = score of the best alignment of a suffix of x[1…i ] and a suffix of y[1…j ]

25 Local alignment DP algorithm  the recurrence relation is slightly different than for global algorithm
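For reference, the standard Smith-Waterman recurrence with a linear gap penalty d can be written as below, in the F(i, j) notation of the local alignment slides. Here s(x_i, y_j) is the substitution score; the only change from the global recurrence is the extra 0 option, which lets an alignment start afresh at any cell.

```latex
F(i, j) = \max
\begin{cases}
0 \\
F(i-1, j-1) + s(x_i, y_j) \\
F(i-1, j) - d \\
F(i, j-1) - d
\end{cases}
```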

26 Local alignment DP algorithm  Initialization:  F(0, j) = F(i, 0) = 0  traceback:  find maximum value of F(i, j); can be anywhere in matrix  stop when we get to a cell with value 0
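A minimal sketch of the local alignment procedure is given below. As assumptions for illustration, it uses match +1, mismatch -1 and a linear gap penalty of 2, recomputes traceback choices from the matrix instead of storing pointers, and reports only the first maximum-scoring cell it finds.

```python
def smith_waterman(x, y, match=1, mismatch=-1, d=2):
    """Local alignment with a linear gap penalty d.

    Returns (score, aligned_x, aligned_y). Scoring values are assumptions.
    """
    n, m = len(x), len(y)
    # F[i][j] = best score of an alignment ending at x[i-1], y[j-1];
    # row 0 and column 0 are initialized to 0.
    F = [[0] * (m + 1) for _ in range(n + 1)]

    def s(a, b):
        return match if a == b else mismatch

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the maximum cell until a cell with value 0 is reached.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-")  # gap in y
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])  # gap in x
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("AAGA", "TTAAG"))
```

Because the traceback starts at the maximum cell and stops at the first 0, the reported alignment covers only the best-matching subsequences and ignores the rest of both sequences.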

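27 Local alignment example  Scoring: match +1, mismatch -1, gap -2
      A  A  G  A
   0  0  0  0  0
T  0
T  0
A  0
A  0
G  0

28 Local alignment example
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0
A  0
G  0

29
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0  1  1  0  1
A  0
G  0

30
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0  1  1  0  1
A  0  1  2  0  1
G  0

31
      A  A  G  A
   0  0  0  0  0
T  0  0  0  0  0
T  0  0  0  0  0
A  0  1  1  0  1
A  0  1  2  0  1
G  0  0  0  3  1
Highest score: 3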

32 Local alignment DP algorithm  No negative values in local alignment DP array  Optimal local alignment will never have a gap on either end  Local alignment: “Smith-Waterman”  Global alignment: “Needleman-Wunsch”

