Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman.


1 Alignment methods April 26, 2011 Return Quiz 1 today Return homework #4 today. Next homework due Tues, May 3 Learning objectives- Understand the Smith-Waterman algorithm. Have a general understanding about PAM and BLOSUM scoring matrices. Workshop-Compare scoring matrices.

2 Smith-Waterman Algorithm
Advances in Applied Mathematics, 2:482-489 (1981)
Smith-Waterman algorithm
- Performs local alignment; the same matrix recurrence without the zero floor gives a global alignment
- Memory intensive
- Common searching programs such as BLAST use faster heuristic approximations of the SW algorithm

3 Smith-Waterman algorithm

$$
M_{i,j} = \max \begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence 1)} \\
M_{i-1,j} + w & \text{(gap in sequence 2)} \\
0
\end{cases}
$$

Where:
- $M_{i-1,j-1}$ is the value in the cell diagonally adjacent to $M_{i,j}$ (the cell up and to the left of $M_{i,j}$).
- $s_{i,j}$ is the value for the match or mismatch in the $M_{i,j}$ cell.
- $M_{i,j-1}$ is the value in the cell above $M_{i,j}$.
- $M_{i-1,j}$ is the value in the cell to the left of $M_{i,j}$.
- $w$ is the value for the gap penalty.

4 Two sequences to align Sequence 1: ABCNJRQCLCRPM Sequence 2: AJCJNRCKCRBP

5 Initialization step: create a matrix with M + 1 columns and N + 1 rows, where M = number of letters in sequence 1 and N = number of letters in sequence 2. The extra first column and first row are filled with 0’s.

6 Matrix fill step: Each position $M_{i,j}$ is defined to be the MAXIMUM score at position i,j

$$
M_{i,j} = \max \begin{cases}
M_{i-1,j-1} + s_{i,j} & \text{(match or mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence 1)} \\
M_{i-1,j} + w & \text{(gap in sequence 2)}
\end{cases}
$$

7

8 Sequence 1: ABCNJ-RQCLCR-PM Sequence 2: AJC-JNR-CKCRBP- Score : 8
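The initialization and matrix fill steps that lead to this score can be sketched in Python. The slides do not state the scoring values, so this sketch assumes match = 1, mismatch = 0 and gap = 0, which happens to reproduce the score of 8 for the two sequences; the function name `fill_matrix` is hypothetical.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the scoring matrix: sequence 1 across the columns (i),
    sequence 2 down the rows (j). Scoring values are assumptions."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    # Initialization step: M[j][i] is row j, column i; first row and column are 0.
    M = [[0] * cols for _ in range(rows)]
    # Matrix fill step: each cell takes the maximum of its three neighbours.
    for j in range(1, rows):
        for i in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            M[j][i] = max(
                M[j - 1][i - 1] + s,  # diagonal: match or mismatch
                M[j - 1][i] + gap,    # cell above: gap in sequence 1
                M[j][i - 1] + gap,    # cell to the left: gap in sequence 2
            )
    return M


M = fill_matrix("ABCNJRQCLCRPM", "AJCJNRCKCRBP")
print(M[-1][-1])  # 8 under the assumed scoring values
```

With these assumed values the bottom-right cell counts the matched letters in the best end-to-end alignment, which is why it agrees with the eight matched columns shown above.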

9 Smith-Waterman (local alignment)
a. Initializes edges of the matrix with zeros.
b. Searches for sequence matches.
c. Assigns a score to each pair of amino acids
-uses similarity scores
-uses positive scores for related residues
-uses negative scores for substitutions and gaps
d. Scores are summed for placement into Mi,j. If any sum result is below 0, a 0 is placed into Mi,j.
e. Backtracing begins at the maximum value found anywhere in the matrix.
f. Backtrace continues until it meets an Mi,j value of 0.
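Steps a through f can be sketched as a short Python function. This is a minimal illustration, not the implementation used by any search program: the scoring function `simple_score` (+2 match, -1 mismatch) and the gap value of -2 are assumptions standing in for a real similarity matrix such as PAM or BLOSUM.

```python
def smith_waterman(seq1, seq2, score, gap=-2):
    """Return (best_score, aligned_seq1, aligned_seq2) for a local alignment."""
    cols, rows = len(seq1) + 1, len(seq2) + 1
    M = [[0] * cols for _ in range(rows)]  # step a: edges are zeros
    best, best_pos = 0, (0, 0)
    for j in range(1, rows):
        for i in range(1, cols):
            M[j][i] = max(
                M[j - 1][i - 1] + score(seq1[i - 1], seq2[j - 1]),
                M[j - 1][i] + gap,   # gap in sequence 1
                M[j][i - 1] + gap,   # gap in sequence 2
                0,                   # step d: sums below 0 become 0
            )
            if M[j][i] > best:
                best, best_pos = M[j][i], (j, i)
    # Steps e and f: backtrace from the maximum until a 0 is reached.
    j, i = best_pos
    a1, a2 = [], []
    while j > 0 and i > 0 and M[j][i] > 0:
        if M[j][i] == M[j - 1][i - 1] + score(seq1[i - 1], seq2[j - 1]):
            a1.append(seq1[i - 1])
            a2.append(seq2[j - 1])
            j, i = j - 1, i - 1
        elif M[j][i] == M[j - 1][i] + gap:
            a1.append("-")
            a2.append(seq2[j - 1])
            j -= 1
        else:
            a1.append(seq1[i - 1])
            a2.append("-")
            i -= 1
    return best, "".join(reversed(a1)), "".join(reversed(a2))


def simple_score(a, b):
    """Assumed toy scoring: +2 for identical letters, -1 otherwise."""
    return 2 if a == b else -1


print(smith_waterman("QQQCRPWW", "AACRPLL", simple_score))  # (6, 'CRP', 'CRP')
```

Only the shared segment CRP is reported because the zero floor discards the poorly matching flanks, which is the behaviour that distinguishes a local from a global alignment.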

10 BLOSUM 45 Scoring Matrix

11
Sequence 1:  A   W   G   H   E
Sequence 2:  A   W   –   H   E
Score:       5  15  -8  10   6
Total score: 28
Percent similarity: 4/5 x 100 = 80%
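The arithmetic on this slide can be checked with a few lines of Python. The per-column scores are copied from the slide; the variable names are illustrative only.

```python
seq1 = "AWGHE"
seq2 = "AW-HE"
column_scores = [5, 15, -8, 10, 6]  # scores per aligned column, from the slide

total_score = sum(column_scores)
# Count columns where both sequences carry the same residue (the gap does not count).
matching_columns = sum(a == b for a, b in zip(seq1, seq2))
percent_similarity = 100 * matching_columns / len(seq1)

print(total_score, percent_similarity)  # 28 80.0
```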

12 How does one achieve the “perfect database search”? Consider the following:
- Scoring Matrices (PAM vs. BLOSUM)
- Local alignment algorithm
- Database
- Search Parameters
  - Expect Value: change threshold for score reporting
  - Filtering: remove repeat sequences

13 Which Scoring Matrix to use?
PAM-1     BLOSUM-100   Small evolutionary distance   High identity within short sequences
PAM-250   BLOSUM-20    Large evolutionary distance   Low identity within long sequences

14 BLOSUM Scoring Matrices
Which BLOSUM Matrix to use?
BLOSUM   Identity (up to)
80       80%
62       62% (usually default value)
35       35%
If you are comparing sequences that are very similar, use BLOSUM 80. Sequences that are more divergent (dissimilar) than 20% are given very low scores in this matrix.

15 Logic behind PAM scoring matrix

16 Original amino acid Replacement amino acid

17 Figure 4.2 Numbers of accepted point mutations (multiplied by 10). A total of 1572 exchanges are shown. Positions with red dashes are Mjj values. Modified from Dayhoff, 1978.

18 Relative mutability calculations Figure 4.3 Simplified example to show how relative mutability is calculated.

19 Development of the Mutation Probability Matrix.

20 Development of the Mutation Probability Matrix. (2) Figure 4.4. Mutational Probability Matrix (partial). This shows only 5 of the 20 amino acids in the MPM. Numbers were multiplied by 10,000 to make them easier to read. The numbers in each column add up to 10,000. The replacement amino acids are in the top row and the original amino acids are in the left column. Mjj values shown are 9867, 9913, 9822, 9859 and 9973.

21

22 What percent of amino acids differ in the MPM? Weighted by how often each amino acid occurs, the unchanged (Mjj) values total 99%. There is a 1% difference overall.

23 Conversion of the PAM1 Mutational Probability Matrix to the PAM1 Scoring Matrix.
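The transcript does not preserve the conversion formula, so the following is a sketch that assumes the standard Dayhoff log-odds form: the mutation probability is divided by the background frequency of the replacement amino acid, and the score is 10 times the base-10 logarithm of that ratio, rounded to an integer. The function name and the example values are hypothetical, not taken from the slide.

```python
import math


def log_odds_score(mutation_prob, replacement_freq):
    """Assumed Dayhoff-style score: 10 * log10(M_ij / f_i), rounded."""
    return round(10 * math.log10(mutation_prob / replacement_freq))


# Hypothetical values: a replacement seen as often as chance predicts scores 0,
# one seen ten times less often than chance scores -10.
print(log_odds_score(0.04, 0.04))
print(log_odds_score(0.004, 0.04))
```

A positive score therefore marks a replacement observed more often than chance, and a negative score one observed less often.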

24 Conversion of the PAM1 Mutational Probability Matrix to other PAM scoring matrices. Mutation Probability Matrices are generated by the equation (PAM1 MPM)^n, where n is the number listed in the first column.
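Raising the PAM1 matrix to the power n can be illustrated with plain Python lists. The 2 x 2 matrix `toy_mpm` below is a made-up stand-in for the real 20 x 20 PAM1 matrix (its columns sum to 1, as the MPM columns do before scaling by 10,000), and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mat_pow(m, n):
    """Return m multiplied by itself n times (identity matrix for n = 0)."""
    size = len(m)
    result = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy matrix, NOT real PAM1 data: each column sums to 1.
toy_mpm = [[0.99, 0.02],
           [0.01, 0.98]]
toy_250 = mat_pow(toy_mpm, 250)
column_sums = [sum(row[c] for row in toy_250) for c in range(2)]
print(toy_250[0][0], column_sums)
```

The columns still sum to 1 after 250 multiplications, while the diagonal (unchanged) entries shrink, reflecting the larger evolutionary distance.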

25

