Pairwise Sequence Analysis-III

Amino-acid substitution matrices
- PAM matrices
  - Derivation
  - Limitation
- BLOSUM matrices
Lecture 4 CS566

Amino-acid substitution matrix
Goal
- To find log [Pjoint(x,y) / Pindependent(x,y)]
- To find probabilistic measures of “interchangeability” between amino acids
Concepts
- Accepted mutation: a replacement that does not disrupt function
- Markov chain (1st order): the next state (amino acid) in time is decided entirely by the current value of the state
  - Analogy: the “odds of winning for a team in the play-offs” (it does not matter how the team got there!)
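The log-odds goal above can be illustrated with a minimal sketch. The function name `log_odds`, the choice of base 2, and all probabilities below are hypothetical values chosen for illustration, not figures from the lecture.

```python
import math

def log_odds(p_joint, p_x, p_y):
    """Log-odds (base 2) of seeing x aligned with y versus seeing them by chance.

    p_joint: probability of the aligned pair (x, y) in related sequences.
    p_x, p_y: background probabilities of x and y; their product is the
    probability of the pair under independence.
    """
    return math.log2(p_joint / (p_x * p_y))

# Hypothetical numbers: a pair seen more often than chance scores positive,
# a pair seen less often than chance scores negative.
score_favoured = log_odds(p_joint=0.02, p_x=0.05, p_y=0.05)
score_disfavoured = log_odds(p_joint=0.001, p_x=0.05, p_y=0.05)
print(score_favoured, score_disfavoured)
```

A positive score means the pair is observed more often than independent sampling would predict, which is the sense of "interchangeability" used here.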

Point Accepted Mutation (PAM) Matrix
- Pioneering work by Margaret Dayhoff et al. (1978)
- Based on an evolutionary model
- PAM n matrix
  - Scores assume an average of n substitutions per 100 residues
  - The larger the value of n, the greater the evolutionary distance between the sequences being compared

PAM Matrix Generation Assumption
- Based on atomic substitutions (“What you see is what you got”): an observed difference is taken to be a single direct change, A=>G and not A=>S=>G
- Uses sets of highly related sequences (>85% similarity)

PAM Matrix Generation
1. Build a phylogenetic (“family”) tree for each set of sequences to establish the sequence of atomic changes
2. Count residue populations and substitutions
3. Estimate the probability of replacement for each pair of residues
4. Normalize to 1% average replacement and generate the mutation probability matrix
5. Generate the PAM1 matrix
6. Generate other PAM matrices (e.g., PAM250)

Phylogenetic trees
[Figure: tree for a set of 4 sequences that have either C or D at a certain position in the alignment]
- Each substitution is typically double-counted, as C=>D as well as D=>C
- Counts to keep track of
  - Frequency of each residue
  - Frequency of each kind of substitution
  - Frequency of each residue’s involvement in substitution
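The counting and 1% normalization steps can be sketched as follows. This is a minimal illustration in the spirit of the Dayhoff procedure, not the lecture's own code: the function names, the toy tree edges, and the residue occurrence counts are all hypothetical, and the matrix is indexed as (original residue, replacement residue).

```python
from collections import Counter

def count_substitutions(edges):
    """Count every changed tree edge in both directions (C=>D and D=>C)."""
    counts = Counter()
    for a, b in edges:
        if a != b:
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def pam1_mutation_matrix(counts, occurrences, target=0.01):
    """Build a mutation probability matrix scaled to `target` average replacement."""
    residues = sorted(occurrences)
    total = sum(occurrences.values())
    freq = {r: occurrences[r] / total for r in residues}
    # How often each residue is involved in a substitution.
    changes = {r: sum(counts[(r, o)] for o in residues if o != r) for r in residues}
    # Relative mutability: involvement in substitutions per occurrence.
    mutability = {r: changes[r] / occurrences[r] for r in residues}
    # Scale factor so that the frequency-weighted chance of change equals target.
    lam = target / sum(freq[r] * mutability[r] for r in residues)
    M = {}
    for src in residues:
        for dst in residues:
            if src == dst:
                M[(src, dst)] = 1.0 - lam * mutability[src]
            elif changes[src]:
                M[(src, dst)] = lam * mutability[src] * counts[(src, dst)] / changes[src]
            else:
                M[(src, dst)] = 0.0
    return M, freq

# Hypothetical toy data: tree edges as (ancestor residue, descendant residue).
edges = [("C", "D"), ("C", "C"), ("D", "D"), ("D", "E"), ("C", "D")]
occurrences = {"C": 40, "D": 40, "E": 20}  # hypothetical residue populations
counts = count_substitutions(edges)
M, freq = pam1_mutation_matrix(counts, occurrences)
```

Each row of `M` sums to 1, and the frequency-weighted probability that a residue changes is 1%, which is what defines one PAM unit.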

PAM n% mutation matrix generation
- Raise the PAM 1 mutation matrix to the nth power (n successive applications of PAM 1) to obtain the PAM n% mutation matrix
- Helps to model “what you see is not what you got” by representing longer evolutionary distances
- PAM 250 implies 250% average substitutions, i.e., an average of 2.5 substitutions per aligned position, and NOT a completely different pair of protein sequences (repeated and reverse substitutions at the same position leave some residues unchanged)
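Extrapolating to larger PAM distances amounts to repeated multiplication of the PAM 1 mutation matrix by itself. The sketch below uses a hypothetical two-residue matrix rather than the real 20x20 one, and the helper names `mat_mul` and `mat_pow` are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """Return m raised to the nth power by n successive multiplications."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Hypothetical two-residue PAM 1: each residue changes with probability 1%.
pam1 = [[0.99, 0.01],
        [0.01, 0.99]]
pam250 = mat_pow(pam1, 250)
print(pam250[0])
```

Every row of the result still sums to 1, so the PAM 250 matrix remains a set of replacement probabilities over the longer evolutionary interval.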

PAM n% matrix generation
A given PAM n% mutation matrix is converted to log-odds form by dividing each entry by the relative abundance of the replacement residue, taking the log, rounding, and averaging the x=>y and y=>x scores
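A minimal sketch of the log-odds conversion follows. It assumes the common convention of scaling by 10 and using log base 10, and it averages the two directions before rounding so the result stays an integer; the function name, the toy mutation matrix, and the frequencies are hypothetical.

```python
import math

def log_odds_matrix(M, freq, scale=10.0):
    """Convert mutation probabilities M[(x, y)] (x replaced by y) to integer scores."""
    residues = sorted(freq)
    # Divide each entry by the abundance of the replacement residue, then take the log.
    raw = {(x, y): scale * math.log10(M[(x, y)] / freq[y])
           for x in residues for y in residues}
    # Average the x=>y and y=>x values so the final matrix is symmetric.
    return {(x, y): round((raw[(x, y)] + raw[(y, x)]) / 2)
            for x in residues for y in residues}

# Hypothetical two-residue example.
freq = {"C": 0.4, "D": 0.6}
M = {("C", "C"): 0.7, ("C", "D"): 0.3,
     ("D", "C"): 0.2, ("D", "D"): 0.8}
scores = log_odds_matrix(M, freq)
print(scores)
```

Identities receive positive scores here because a residue is more likely to be conserved than to be matched by chance, while the C/D replacement is scored negatively.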

Point Accepted Mutation (PAM) Matrix
Limitation
- Based on only one type of mutational event
- Ignores rarer types of mutations that are observed only over longer periods of time
- Because of the above, the model does not fit as well for more divergent sequences

BLOSUMx matrices
- Matrix scores for different evolutionary distances are derived independently, not extrapolated from a single matrix
- Much larger dataset (better sampling)
- Sequences are grouped into BLOCKS (ungapped aligned regions); x represents the % similarity threshold used to cluster sequences within a block
- Intrablock substitutions are used to characterize the log odds
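The way intrablock substitutions turn into log odds can be sketched on a toy block. This is a simplified illustration that skips the clustering by % similarity: it counts residue pairs within each column, compares observed pair frequencies with those expected from the residue frequencies, and reports scores in half-bit units. The function name and the block itself are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def blosum_scores(block):
    """Compute log-odds scores (half-bits) from one ungapped block of sequences."""
    pair_counts = Counter()
    for column in zip(*block):
        # Every pair of sequences contributes one residue pair per column.
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: c / total for pair, c in pair_counts.items()}  # observed pair frequencies
    p = Counter()  # residue frequencies implied by the pairs
    for (a, b), qab in q.items():
        if a == b:
            p[a] += qab
        else:
            p[a] += qab / 2
            p[b] += qab / 2
    scores = {}
    for (a, b), qab in q.items():
        # Expected frequency of the unordered pair if residues paired at random.
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(qab / expected))
    return scores

# Hypothetical block of four aligned sequences, three columns wide.
block = ["CCD", "CDD", "CCD", "DCD"]
scores = blosum_scores(block)
print(scores)
```

Only pairs that actually occur in the block receive a score in this sketch; a real derivation pools counts over many blocks so that all residue pairs are sampled.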

Choice of appropriate matrix
The matrix should be chosen based on the percent similarity of the sequences being analyzed
- PAM250 for 20% similarity
- PAM120 for 40% similarity
- PAM80 for 50% similarity
- PAM60 for 60% similarity
- BLOSUM?
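As a small illustration of this rule of thumb, the sketch below picks the PAM matrix whose listed similarity is closest to that of the sequences at hand. The function name `choose_pam` and the nearest-value rule are assumptions made for the example; only the four similarity/matrix pairings come from the list above.

```python
# Pairings taken from the list above: percent similarity -> PAM matrix.
PAM_BY_SIMILARITY = {20: "PAM250", 40: "PAM120", 50: "PAM80", 60: "PAM60"}

def choose_pam(percent_similarity):
    """Return the PAM matrix listed for the nearest percent similarity."""
    nearest = min(PAM_BY_SIMILARITY, key=lambda s: abs(s - percent_similarity))
    return PAM_BY_SIMILARITY[nearest]

print(choose_pam(22))
print(choose_pam(58))
```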
