Evolution and Scoring Rules


1 Evolution and Scoring Rules. Example: Score = 5 x (# matches) + (-4) x (# mismatches) + (-7) x (total length of all gaps). Example: Score = 5 x (# matches) + (-4) x (# mismatches) + (-5) x (# gap openings) + (-2) x (total length of all gaps).
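The two example rules can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the function names, the use of "-" as the gap character, and the sample alignment are all assumptions made here.

```python
def gap_runs(row):
    """Return the lengths of all maximal runs of '-' in one aligned row."""
    runs, current = [], 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def score_alignment(row1, row2, match=5, mismatch=-4, gap_open=0, gap_extend=-7):
    """Score a finished pairwise alignment given as two equal-length rows.

    Defaults reproduce the first example rule (no gap-opening term).
    Pass gap_open=-5, gap_extend=-2 for the second example rule.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            continue  # gap columns are scored through the gap terms below
        if x == y:
            matches += 1
        else:
            mismatches += 1
    runs = gap_runs(row1) + gap_runs(row2)
    return (match * matches + mismatch * mismatches
            + gap_open * len(runs) + gap_extend * sum(runs))


# Hypothetical alignment: 4 matches, 1 mismatch, one gap of length 2.
print(score_alignment("ACGTTAC", "AC--TAG"))                              # rule 1
print(score_alignment("ACGTTAC", "AC--TAG", gap_open=-5, gap_extend=-2))  # rule 2
```

On this sample alignment the first rule gives 20 - 4 - 14 = 2 and the second gives 20 - 4 - 5 - 4 = 7, so the same alignment ranks differently depending on how gaps are charged.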

2

3

4 Scoring Matrices

5 Scoring Rules vs. Scoring Matrices. Nucleotide vs. amino acid sequence. The choice of a scoring rule can strongly influence the outcome of sequence analysis. Scoring matrices implicitly represent a particular theory of evolution. Elements of the matrices specify the similarity of one residue to another.

6 DNA (A, T, G, C) is copied by replication and transcribed 1:1 into RNA (A, U, G, C); RNA is translated 3:1 into protein (20 amino acids). Translation (protein synthesis): every 3 nucleotides (a codon) are translated into one amino acid.
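A minimal sketch of the 1:1 and 3:1 mappings on this slide, in Python. The codon table below is deliberately a tiny subset of the standard genetic code, and the function names and sample sequence are assumptions made for illustration.

```python
# Partial codon table (assumption: only the codons needed for the example;
# the full standard table has 64 entries). "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "GCC": "A", "AAA": "K", "UGG": "W",
    "UUU": "F", "GGA": "G", "UAA": "*",
}


def transcribe(dna_coding_strand):
    """1:1 step: each DNA base maps to one RNA base (T becomes U)."""
    return dna_coding_strand.replace("T", "U")


def translate(rna):
    """3:1 step: each codon of three bases maps to one amino acid."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon ends the protein
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("ATGGCCAAATGGTAA")
print(rna)             # AUGGCCAAAUGGUAA
print(translate(rna))  # MAKW
```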

7 Nucleotide sequence determines the amino acid sequence

8 Translation (protein synthesis): the RNA is read 5’ -> 3’, and the protein is built from N-terminus to C-terminus.

9

10

11 Log Likelihoods used as Scoring Matrices. PAM (Percent Accepted Mutations): 1500 changes in 71 groups with > 85% similarity. BLOSUM (Blocks Substitution Matrix): 2000 “blocks” from 500 families.

12 Log Likelihoods used as Scoring Matrices: BLOSUM

13 Likelihood Ratio for Aligning a Single Pair of Residues. Numerator (above): the probability that two residues are aligned by evolutionary descent. Denominator (below): the probability that they are aligned by chance. Pi and Pj are the frequencies (abundances) of residues i and j in all protein sequences.
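The formula image for this slide did not survive in the transcript. From the description, the ratio presumably has the following form, where the symbol q_ij (the probability that residues i and j are aligned by evolutionary descent) and the name R_ij are labels assumed here, not taken from the slide:

```latex
\[
  R_{ij} \;=\; \frac{q_{ij}}{p_i \, p_j}
\]
```

A ratio above 1 means the pair is seen together more often in related sequences than chance pairing of two residues with abundances p_i and p_j would predict; a ratio below 1 means it is seen less often.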

14 Likelihood Ratio of Aligning Two Sequences

15 The alignment score of two sequences is the log likelihood ratio of the alignment under two models: common ancestry and chance.
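Written out, and assuming an ungapped alignment whose positions are treated as independent, the log turns the product of per-position ratios into a sum of per-pair scores. The symbols are assumptions made here: a_k and b_k are the residues in column k, q is the probability of a pair under common ancestry, p is residue abundance, and s is the resulting matrix entry.

```latex
\[
  S \;=\; \log \prod_{k=1}^{n} \frac{q_{a_k b_k}}{p_{a_k}\, p_{b_k}}
    \;=\; \sum_{k=1}^{n} \log \frac{q_{a_k b_k}}{p_{a_k}\, p_{b_k}}
    \;=\; \sum_{k=1}^{n} s(a_k, b_k)
\]
```

This is why alignment scores can be computed by simply adding up matrix entries column by column.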

16 PAM and BLOSUM matrices are all log likelihood matrices. More specifically: an alignment that scores 6 means that the alignment by common ancestry is 2^(6/2) = 8 times as likely as expected by chance.
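The exponent 6/2 implies that scores here are expressed in half-bit units (score = 2 x log2 of the odds). A small Python sketch of the conversion, with the function name and the `units_per_bit` parameter being assumptions made for illustration:

```python
def odds_from_score(score, units_per_bit=2):
    """Convert a log-odds score to a likelihood ratio.

    Assumes score = units_per_bit * log2(odds); units_per_bit=2 means
    half-bit units, which is what the 2^(6/2) example implies.
    """
    return 2 ** (score / units_per_bit)


print(odds_from_score(6))   # 8.0
print(odds_from_score(0))   # 1.0, no more likely than chance
print(odds_from_score(-2))  # 0.5, half as likely as chance
```

Matrices published in other units need a different conversion, so the scale of a matrix should be checked before interpreting its scores this way.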

17 BLOSUM Matrices for Protein. S. Henikoff and J. Henikoff (1992). “Amino acid substitution matrices from protein blocks”. PNAS 89: 10915-10919. Training data: ~2000 conserved blocks from the BLOCKS database, which are ungapped, aligned protein segments. Each block represents a conserved region of a protein family.

18 Constructing BLOSUM Matrices of Specific Similarities. Sets of sequences have widely varying similarity. Sequences with similarity above a threshold are clustered. If the clustering threshold is 62%, the final matrix is BLOSUM62.

19 A toy example of constructing a BLOSUM matrix from 4 training sequences

20 Constructing a BLOSUM matrix: 1. Counting mutations

21 Constructing a BLOSUM matrix: 2. Tallying mutation frequencies

22 Constructing a BLOSUM matrix: 3. Matrix of mutation probabilities

23 Constructing a BLOSUM matrix: 4. Calculate abundance of each residue (marginal probability)

24 5. Obtaining a BLOSUM matrix
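The five steps of the toy construction can be sketched as follows. The four training sequences from the slide are not preserved in the transcript, so the block below is a made-up two-letter example, and the function name is an assumption. The sketch skips the similarity clustering step and uses half-bit scores (2 x log2), which is an assumption consistent with the 2^(6/2) example earlier in the deck.

```python
from collections import defaultdict
from itertools import combinations
from math import log2


def toy_blosum(block):
    """Build a BLOSUM-style log-odds table from one ungapped block.

    block: list of equal-length aligned sequences (no clustering applied).
    Returns (pair probabilities, residue abundances, rounded scores).
    """
    # Steps 1-2: count residue pairs in every column. Each unordered pair
    # is split evenly over (a, b) and (b, a) to keep the table symmetric.
    counts = defaultdict(float)
    total_pairs = 0
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[(a, b)] += 0.5
            counts[(b, a)] += 0.5
            total_pairs += 1

    # Step 3: turn counts into pair probabilities that sum to 1.
    q = {pair: c / total_pairs for pair, c in counts.items()}

    # Step 4: residue abundance is the marginal (row sum) of q.
    p = defaultdict(float)
    for (a, _), prob in q.items():
        p[a] += prob

    # Step 5: log-odds score in half-bit units, rounded to an integer.
    # Pairs never observed in the block get no entry (their odds are 0).
    scores = {
        (a, b): round(2 * log2(prob / (p[a] * p[b])))
        for (a, b), prob in q.items()
    }
    return q, dict(p), scores


# Hypothetical training block over a two-letter alphabet.
q, p, scores = toy_blosum(["AAC", "AAC", "ACC", "CAC"])
print(p)       # abundance of A and C, both 0.5 here
print(scores)  # identical pairs score above 0, the A/C pair below 0
```

Real BLOSUM construction also down-weights clustered sequences and covers all 20 amino acids; this sketch only shows how counts become probabilities and then scores.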

25 Constructing the real BLOSUM62 Matrix

26 Steps 1-3: Mutation Frequency Table

27 4. Calculate Amino Acid Abundance

28 5. Obtaining BLOSUM62 Matrix

29

30 PAM Matrices (Point Accepted Mutations): mutations accepted by natural selection.

31 PAM Matrices: Accepted Point Mutation. Source: M.O. Dayhoff, ed. (1978). Atlas of Protein Sequence and Structure, Suppl. 3. National Biomedical Research Foundation, 1. Based on evolutionary principles.

32 Constructing PAM Matrix: Training Data

33 PAM: Phylogenetic Tree

34 PAM: Accepted Point Mutation

35 Mutability

36 Total Mutation Rate is the total mutation rate of all amino acids

37 Normalize Total Mutation Rate

38 Mutation Probability Matrix Normalized Such that the Total Mutation Rate is 1%
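One way to read this normalization: rescale the off-diagonal entries of a mutation probability matrix until the abundance-weighted chance of any change is 1%. The Python sketch below does this for a made-up two-residue matrix. The row/column convention (row = original residue, column = replacement), the function name, and the toy numbers are assumptions; the slide's own formula is not preserved in the transcript.

```python
def normalize_to_pam1(M, freqs, target=0.01):
    """Rescale M so that the total mutation rate equals `target`.

    M[i][j]: probability that residue i is replaced by residue j
             (each row sums to 1; M[i][i] is the chance of no change).
    freqs[i]: abundance of residue i.
    """
    n = len(M)
    # Total mutation rate: abundance-weighted probability of any change.
    total_rate = sum(freqs[i] * (1.0 - M[i][i]) for i in range(n))
    scale = target / total_rate
    normalized = []
    for i in range(n):
        row = [M[i][j] * scale for j in range(n)]
        # The diagonal absorbs the remainder so the row still sums to 1.
        row[i] = 1.0 - sum(row[j] for j in range(n) if j != i)
        normalized.append(row)
    return normalized


# Hypothetical two-residue example.
M = [[0.90, 0.10],
     [0.20, 0.80]]
freqs = [0.5, 0.5]
pam1 = normalize_to_pam1(M, freqs)
rate = sum(freqs[i] * (1.0 - pam1[i][i]) for i in range(2))
print(pam1)
print(rate)  # approximately 0.01
```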

39 Mutation Probability Matrix (transposed) M*10000

40 PAM1 mutation probability matrix. What is the PAM2 mutation probability matrix? It describes the mutations that happen in twice the evolutionary period of PAM1.

41 PAM Matrix: Assumptions

42 In two PAM1 periods: {A → R} = {A → A and A → R} or {A → N and N → R} or {A → D and D → R} or … or {A → V and V → R}
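Because the alternatives are mutually exclusive and the two periods are assumed independent, the enumeration becomes a sum of products. In the notation assumed here, M_XY is the PAM1 probability that residue X is replaced by residue Y, and the superscript (2) marks the two-period matrix:

```latex
\[
  M^{(2)}_{AR} \;=\; \sum_{X \,\in\, \{\text{20 amino acids}\}} M_{AX}\, M_{XR}
\]
```

This is exactly the (A, R) entry of the matrix product of the PAM1 matrix with itself.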

43 Entries in a PAM-2 Mutation Probability Matrix

44 PAM-k Mutation Probability Matrix
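Extending the two-period argument, the PAM-k mutation probability matrix is the PAM1 matrix multiplied by itself k times. A dependency-free Python sketch follows; the two-residue matrix is a made-up stand-in for the real 20 x 20 PAM1 matrix, and the function names are assumptions.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_k(pam1, k):
    """Return pam1 raised to the k-th power (k PAM1 periods in a row)."""
    n = len(pam1)
    # Start from the identity matrix: zero periods means no change.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, pam1)
    return result


# Hypothetical two-residue PAM1 matrix (rows sum to 1).
pam1 = [[0.99, 0.01],
        [0.02, 0.98]]
pam2 = pam_k(pam1, 2)
print(pam2[0][1])  # 0.99*0.01 + 0.01*0.98, approximately 0.0197
pam250 = pam_k(pam1, 250)
print([sum(row) for row in pam250])  # each row still sums to about 1
```

Note that the off-diagonal entry after two periods is slightly less than twice the PAM1 value, because some changes are reversed or overwritten.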

45 PAM-1 log likelihood matrix

46 PAM-k log likelihood matrix
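The formula for this slide is not preserved in the transcript. A plausible general form, consistent with the likelihood-ratio slides earlier in the deck, divides the k-period mutation probability by the abundance of the replacement residue. The symbols are assumptions made here: M^k is the PAM-k mutation probability matrix with row i as the original residue, p_j is the abundance of residue j, and c and b are the scaling constant and log base, which the transcript does not specify.

```latex
\[
  S^{(k)}_{ij} \;=\; c \,\log_b \frac{\left(M^{k}\right)_{ij}}{p_j}
\]
```

Entries are then typically rounded to integers for use in alignment programs.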

47 PAM-250

48 Matrix and approximate sequence similarity: PAM60 for 60%, PAM80 for 50%, PAM120 for 40%. The PAM-250 matrix provides a better-scoring alignment than lower-numbered PAM matrices for proteins of 14-27% similarity.

49 Sources of Error in PAM

50 Comparing Scoring Matrices. PAM: based on extrapolation of a small evolutionary period; tracks evolutionary origins; homologous sequences during evolution. BLOSUM: based on a range of evolutionary periods; conserved blocks; finds conserved domains.

51 Choice of Scoring Matrix

52 Global Alignment with Affine Gaps: Complex Dynamic Programming

53 Problem with Independent Gap Penalties. The occurrence of x consecutive deletions/insertions is more likely than the occurrence of x isolated mutations. We should therefore penalize a gap of length x less than x times the penalty for a single gap.

54 Affine Gap Penalty. w2 is the penalty for each gap position; w1 is the extra penalty for the first gap position.
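As a formula, with W(x) as an assumed name for the total penalty of one gap of length x:

```latex
\[
  W(x) \;=\; w_1 + x\, w_2
  \qquad\text{versus}\qquad
  x \cdot W(1) \;=\; x\,(w_1 + w_2) \quad\text{for } x \text{ isolated gaps}
\]
```

Using the values from the example scoring rule on the first slide (w1 = -5, w2 = -2), one gap of length 3 costs -5 + 3(-2) = -11, while three separate single-position gaps cost 3(-7) = -21.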

55 Scoring Rule not Additive! We need to know whether the current gap is a new gap or the continuation of an existing gap. Use three dynamic programming matrices to keep track of the previous step.

56 S1 is the vertical sequence; S2 is the horizontal sequence. a(i,j) (from diagonal): current position is a match. b(i,j) (from left): current position is a gap in S1. c(i,j) (from above): current position is a gap in S2. Filling the next element in each matrix depends on the previous step, which is stored in the three matrices.

57

58 Last step a match: new gap in S2. Last step a gap in S2: a continued gap in S2. Last step a gap in S1: a gap in S2 following a gap in S1.
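The three-matrix scheme can be sketched in Python as below. The recurrences on the slides are not preserved in the transcript, so this follows the standard three-state formulation using the slide's naming (a, b, c, w1, w2). The function name, the default scores (taken from the example rule on the first slide), and the choice to charge a full opening penalty when one gap directly follows a gap in the other sequence are assumptions. Only the optimal score is returned; traceback is omitted.

```python
NEG_INF = float("-inf")


def affine_global_score(s1, s2, match=5, mismatch=-4, w1=-5, w2=-2):
    """Optimal global alignment score with affine gaps (cost w1 + x*w2).

    s1 runs down the rows (vertical), s2 along the columns (horizontal).
    a[i][j]: best score with s1[i-1] aligned to s2[j-1] (from diagonal)
    b[i][j]: best score ending with a gap in s1 (from left)
    c[i][j]: best score ending with a gap in s2 (from above)
    """
    n, m = len(s1), len(s2)
    a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    c = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    a[0][0] = 0
    for j in range(1, m + 1):
        b[0][j] = w1 + j * w2  # leading gap in s1
    for i in range(1, n + 1):
        c[i][0] = w1 + i * w2  # leading gap in s2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            a[i][j] = sub + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            # Gap in s1: new after a match, continued, or new after a gap in s2.
            b[i][j] = max(a[i][j - 1] + w1 + w2,
                          b[i][j - 1] + w2,
                          c[i][j - 1] + w1 + w2)
            # Gap in s2: new after a match, continued, or new after a gap in s1.
            c[i][j] = max(a[i - 1][j] + w1 + w2,
                          c[i - 1][j] + w2,
                          b[i - 1][j] + w1 + w2)
    return max(a[n][m], b[n][m], c[n][m])


print(affine_global_score("ACGTTAC", "ACTAG"))
```

Keeping three matrices is what lets each cell know whether the previous step was a match, a gap in S1, or a gap in S2, which a single matrix cannot record.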

59

60

61

62

63

64

65 Decisions in Sequence Alignment: Local or global alignment? Which program to use? Type of scoring matrix? Value of gap penalty?

66 A ij *10

67 PAM-k log-likelihood matrix

68

