Scoring Matrices, April 23, 2009


1 Scoring Matrices
April 23, 2009
Learning objectives:
1) Last word on global alignment.
2) Understand how the Smith-Waterman algorithm can be applied to perform local alignment.
3) Have a general understanding of the PAM and BLOSUM scoring matrices.
Homework 3 and 4 due today
Quiz 1 today
Writing topic due today
Homework 5 due Thursday, April 30

2 Global Alignment output file
Global: HBA_HUMAN vs HBB_HUMAN
Score: 290.50
HBA_HUMAN   1 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFP 44
              |:| :|: | | |||| : | | ||| |: : :| |: :|
HBB_HUMAN   1 VHLTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFE 43
HBA_HUMAN  45 HF.DLS.....HGSAQVKGHGKKVADALTNAVAHVDDMPNALSAL 83
              | ||| |: :|| ||||| | :: :||:|:: : |
HBB_HUMAN  44 SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATL 88
HBA_HUMAN  84 SDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKF 128
              |:|| || ||| ||:|| : |: || | |||| | |: |
HBB_HUMAN  89 SELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKV 133
HBA_HUMAN 129 LASVSTVLTSKYR 141
              :| |: | ||
HBB_HUMAN 134 VAGVANALAHKYH 146
%id = 45.32; %similarity = 63.31 (88/139 * 100)
Overall %id = 43.15; Overall %similarity = 60.27 (88/146 * 100)
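The percentages in this output can be checked by hand. The worked arithmetic below is a sketch under stated assumptions: 88 is taken to be the number of similar positions, 139 appears to be the number of aligned residue pairs, and 146 is the length of the longer sequence (HBB_HUMAN). The count of 63 identical positions is not printed in the output; it is inferred here from the reported 45.32%.

```latex
\begin{align*}
\%\,\text{similarity} &= \frac{88}{139} \times 100 \approx 63.31
  & \text{overall } \%\,\text{similarity} &= \frac{88}{146} \times 100 \approx 60.27 \\
\%\,\text{id} &= \frac{63}{139} \times 100 \approx 45.32
  & \text{overall } \%\,\text{id} &= \frac{63}{146} \times 100 \approx 43.15
\end{align*}
```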

3 Smith-Waterman Algorithm
Advances in Applied Mathematics, 2:482-489 (1981)
Smith-Waterman algorithm
- Can be used for local alignment
- Memory intensive
- Common search programs such as BLAST use a faster heuristic that approximates the SW algorithm

4 Smith-Waterman (cont. 1)
a. Initializes the edges of the matrix with zeros.
b. Searches for sequence matches.
c. Assigns a score to each pair of amino acids:
- uses similarity scores
- uses positive scores for related residues
- uses negative scores for unrelated substitutions and gaps
d. Scores are summed for placement into Mi,j. If any sum is below 0, a 0 is placed into Mi,j.
e. Backtracing begins at the maximum value found anywhere in the matrix.
f. Backtracing continues until it meets an Mi,j value of 0.
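Steps a through d can be written compactly as a recurrence. This is a sketch assuming a single linear gap penalty; the symbols a_i and b_j (the residues being compared), s (the substitution score taken from the scoring matrix), and g (the gap penalty, a negative number) are notation introduced here, not taken from the slides.

```latex
\begin{align*}
M_{i,0} &= M_{0,j} = 0 \\
M_{i,j} &= \max\bigl\{\, 0,\;
  M_{i-1,j-1} + s(a_i, b_j),\;
  M_{i-1,j} + g,\;
  M_{i,j-1} + g \,\bigr\}
\end{align*}
```

The 0 inside the maximum is what makes the alignment local: a run of poor scores resets the cell instead of dragging later cells negative.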

5 Smith-Waterman (cont. 2)
Put zeros on the top row and left column. Assign initial scores based on a scoring matrix. Calculate new scores based on adjacent cell scores. If a sum is less than or equal to zero, begin new scoring with the next cell. This example uses the BLOSUM45 scoring matrix with a gap penalty of -8.
        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   0   0   0   0   0   0   0
P   0   0   0   0   0   0   0   0   0   0   0
A   0   0   0   5   0   5   0   0   0   0   0
W   0   0   0   0   3   0  20  12   4   0   0
H   0  10   2   0   0   1  12  18  22  14   6
E   0   2  16   8   0   0   4  10  18  28  20
A   0   0   8  21  13   5   0   4  10  20  27
E   0   0   6  13  19  12   4   0   4  16  26

6 Smith-Waterman (cont. 3)
Begin the backtrace at the maximum value found anywhere in the matrix. Continue the backtrace until the score falls to zero.
        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   0   0   0   0   0   0   0
P   0   0   0   0   0   0   0   0   0   0   0
A   0   0   0   5   0   5   0   0   0   0   0
W   0   0   0   0   3   0  20  12   4   0   0
H   0  10   2   0   0   1  12  18  22  14   6
E   0   2  16   8   0   0   4  10  18  28  20
A   0   0   8  21  13   5   0   4  10  20  27
E   0   0   6  13  19  12   4   0   4  16  26
AWGHE
|| ||
AW-HE
Score=28
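The fill and backtrace steps can be sketched in a few lines of Python. This is an illustration, not the program used to produce the slides. The match scores for A, W, H and E and the gap penalty of -8 come from this transcript; every mismatch is given a flat -2, which is an assumption standing in for the real off-diagonal BLOSUM45 entries, so some intermediate cells differ from the matrix shown. The names `score_pair` and `smith_waterman` are hypothetical.

```python
# Match scores for identical residues, as listed for the BLOSUM45 example.
MATCH_SCORES = {"A": 5, "W": 15, "H": 10, "E": 6}
MISMATCH = -2  # assumption: flat stand-in for all off-diagonal matrix entries
GAP = -8       # gap penalty used in the slides


def score_pair(x, y):
    """Toy substitution score; not the full BLOSUM45 matrix."""
    if x == y:
        # Hypothetical default of 5 for identical residues not listed above.
        return MATCH_SCORES.get(x, 5)
    return MISMATCH


def smith_waterman(seq_a, seq_b, score=score_pair, gap=GAP):
    """Return (best score, aligned part of seq_a, aligned part of seq_b)."""
    rows, cols = len(seq_b) + 1, len(seq_a) + 1
    # Top row and left column stay at zero.
    m = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = m[i - 1][j - 1] + score(seq_a[j - 1], seq_b[i - 1])
            up = m[i - 1][j] + gap
            left = m[i][j - 1] + gap
            # Negative sums are replaced by zero.
            m[i][j] = max(0, diag, up, left)
            if m[i][j] > best:
                best, best_pos = m[i][j], (i, j)

    # Backtrace from the maximum cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and m[i][j] > 0:
        if m[i][j] == m[i - 1][j - 1] + score(seq_a[j - 1], seq_b[i - 1]):
            top.append(seq_a[j - 1])
            bottom.append(seq_b[i - 1])
            i, j = i - 1, j - 1
        elif m[i][j] == m[i][j - 1] + gap:
            top.append(seq_a[j - 1])  # gap in seq_b
            bottom.append("-")
            j -= 1
        else:
            top.append("-")           # gap in seq_a
            bottom.append(seq_b[i - 1])
            i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = smith_waterman("HEAGAWGHEE", "PAWHEAE")
print(result)
```

Even with the simplified mismatch score, the two example sequences give the same local alignment and score as the slide (AWGHE over AW-HE, score 28). Swapping in a complete substitution matrix only requires passing a different `score` function.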

7 Calculation of similarity score and percent similarity
Sequence 1          A    W    G    H    E
Sequence 2          A    W    -    H    E
BLOSUM45 scores     5   15   -8   10    6
-8 = gap penalty (novel)
% similarity = number of positive scores divided by number of AAs in region x 100
% similarity = 4/5 x 100 = 80%
Similarity score = 28
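The same calculation as a short Python sketch. The column scores are the ones listed on this slide; the function name `similarity_summary` is hypothetical.

```python
def similarity_summary(column_scores):
    """Return (similarity score, percent similarity) for one aligned region."""
    total = sum(column_scores)
    # A column counts as "similar" when its score is positive.
    positives = sum(1 for s in column_scores if s > 0)
    percent = 100.0 * positives / len(column_scores)
    return total, percent


# Scores for A/A, W/W, G/-, H/H, E/E from the slide.
summary = similarity_summary([5, 15, -8, 10, 6])
print(summary)  # (28, 80.0)
```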

8 Why search sequence databases?
1. I have just sequenced something. What is known about the thing I sequenced?
2. I have a unique sequence. Does it have similarity to another gene of known function?
3. I found a new protein sequence in a lower organism. Is it similar to a protein from another species?

9 Perfect searches for similar sequences in a database
- First “hit” should be an exact match.
- Next “hits” should contain all of the genes that are related to your gene (homologs).
- Next “hits” should be similar but are not homologs.

10 How does one achieve the “perfect search”?
Consider the following:
- Scoring Matrices (PAM vs. BLOSUM)
- Local alignment algorithm
- Database Search Parameters
- Expect Value: change threshold for score reporting
- Translation: of DNA sequence into protein
- Filtering: remove repeat sequences

11 Which Scoring Matrix to use?
PAM-1     BLOSUM-100   Small evolutionary distance   High identity within short sequences
PAM-250   BLOSUM-20    Large evolutionary distance   Low identity within long sequences

12 BLOSUM Scoring Matrices
Which BLOSUM Matrix to use?
BLOSUM   Identity (up to)
80       80%
62       62% (usually default value)
35       35%
If you are comparing sequences that are very similar, use BLOSUM 80. Sequences that are more divergent (dissimilar) than 20% are given very low scores in this matrix.

