Comparing Two Protein Sequences


1 Comparing Two Protein Sequences
Cédric Notredame

2 Our Scope
-Look once Under the Hood
-Pairwise Alignment methods are POWERFUL
-Pairwise Alignment methods are LIMITED
-If You Understand the LIMITS they Become VERY POWERFUL

3 Outline
-WHY Does It Make Sense To Compare Sequences ?
-HOW Can we Compare Two Sequences ?
-HOW Can we Align Two Sequences ?
-HOW can I Search a Database ?

4 Why Does It Make Sense To Compare Sequences ?
Sequence Evolution

5 Why Do We Want To Compare Sequences
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| |  |||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
EXTRAPOLATE ?????? Homology? SwissProt

6 Why Do We Want To Compare Sequences

7 Why Does It Make Sense To Align Sequences ?
-Evolution is our Real Tool.
-Nature is LAZY and Keeps re-using Stuff.
-Evolution is mostly DIVERGENT: Same Sequence -> Same Ancestor

8 Why Does It Make Sense To Align Sequences ?
Same Sequence -> Same Function, Same Origin, Same 3D Fold. Many Counter-examples!

9 Comparing Is Reconstructing Evolution

10 An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutations + Selection
Mutations and deletions are the engines of evolution, but selection does the steering… As shown here, it is often impossible to tell apart insertions and deletions, hence their generic name: indels. Next: Homology

11 An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
Mutations + Selection
Mutations and deletions are the engines of evolution, but selection does the steering… As shown here, it is often impossible to tell apart insertions and deletions, hence their generic name: indels. Next: Homology
Deletion, Insertion, Mutation:
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN

12 Evolution is NOT Always Divergent…
Chen et al, 97, PNAS, 94, AFGP with (ThrAlaAla)n Similar To Trypsinogen N AFGP with (ThrAlaAla)n S NOT Similar to Trypsinogen

13 Evolution is NOT Always Divergent
AFGP with (ThrAlaAla)n Similar To Trypsinogen NOT Similar to Trypsinogen N S. SIMILAR Sequences BUT DIFFERENT origin

14 Evolution is NOT always Divergent…
But in MOST cases, you may assume it is… Similar Function DOES NOT REQUIRE Similar Sequence. [Diagram labels: Same Sequence, Function, 3D Fold, Origin; Similar Sequence; Historical Legacy]

15 How Do Sequences Evolve
Each Portion of a Genome has its own Agenda.

16 How Do Sequences Evolve ?
CONSTRAINED Genome Positions Evolve SLOWLY
EVERY Protein Family Has its Own Level Of Constraint
Table columns: Family, Ks, Ka. Families listed: Histone, Insulin, Interleukin I, a-Globin, Apolipoprot. AI, Interferon G
Rates in Substitutions/site/Billion Years as measured on Mouse Vs Human (80 Million years). Ks: Synonymous Mutations, Ka: Non-Synonymous (Non-Neutral) Mutations.

17 Different molecular clocks for different proteins--another prediction
The Neutral Theory also makes another prediction about molecular clocks, namely that different types of proteins will have different clock rates. In particular, proteins whose structures are such that a small change in the amino acid sequence can impair the function of that protein should evolve at the slowest rates, whereas proteins whose amino acid sequences can be modified fairly dramatically WITHOUT impairing function should evolve at the fastest rates. Do we, in fact, see evidence of this? Yes. Consider the fibrinopeptide class of proteins. These proteins are involved in blood clotting. They can perform this function even when there are numerous amino acid changes. They evolve at a relatively rapid rate, as the slope of the line relating aa substitutions to time shows (slide). On the other hand, cytochrome c, a protein involved in respiratory metabolism, cannot tolerate many changes to its aa sequence without losing function. As the slide shows, it evolves (“its clock ticks”) at a much slower rate.

18 How Do Sequences Evolve ? The amino Acids Venn Diagram
To Make Things Worse, Every Residue has its Own Personality. [Venn diagram grouping the amino acids into Aliphatic, Aromatic, Hydrophobic, Small and Polar classes]

19 How Do Sequences Evolve ?
In a structure, each Amino Acid plays a Special Role (OmpR, Cter Domain). In the core, SIZE MATTERS. On the surface, CHARGE MATTERS (- +).

20 How Do Sequences Evolve ?
Accepted Mutations Depend on the Structure. Big -> Big, Small -> Small, NO DELETION. Charged -> Charged (+ - -), Small <-> Big or Small, DELETIONS.

21 How Can We Compare Sequences ?
Substitution Matrices

22 How Can We Compare Sequences ?
To Compare Two Sequences, We need: Their Structure, Their Function. We Do Not Have Them !!!

23 How Can We Compare Sequences ?
We will Need To Replace Structural Information With Sequence Information. Same Sequence -> Same Origin, Same Function, Same 3D Fold. It CANNOT Work ALL THE TIME !!!

24 How Can We Compare Sequences ?
To Compare Sequences, We need to Compare Residues. We Need to Know How Much it COSTS to SUBSTITUTE:
-an Alanine into an Isoleucine
-a Tryptophan into a Glycine
The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX. How to derive that matrix?

25 How Can We Compare Sequences ?
[Venn diagram of the amino acids: Aliphatic, Aromatic, Hydrophobic, Small, Polar] Using Knowledge Could Work, But we do not know enough about Evolution and Structure. Using Data works better.

26 How Can We Compare Sequences ? Making a Substitution Matrix
-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
-Align them…
-Count each mutation in the alignments:
 -25 Tryptophans into Phenylalanine
 -30 Isoleucines into Leucine
-For each mutation, set the substitution score to the log-odds ratio: $\log\left(\frac{\text{Observed}}{\text{Expected by chance}}\right)$
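As a rough illustration of the log-odds idea above, the sketch below turns an observed pair frequency and two background frequencies into a score. The function name `log_odds`, the choice of base 2 and every number in the example are assumptions made for illustration; real matrices such as PAM or BLOSUM involve further steps (scaling, rounding) that are not shown.

```python
import math


def log_odds(observed_freq, freq_a, freq_b, base=2.0):
    """Log-odds score for finding residue a aligned with residue b.

    observed_freq:  fraction of aligned columns showing a in the first
                    sequence and b in the second
    freq_a, freq_b: background frequencies of a and b
    """
    # Frequency of the pair if the two residues were paired at random.
    expected_freq = freq_a * freq_b
    return math.log(observed_freq / expected_freq, base)


# Hypothetical frequencies, for illustration only.
score = log_odds(observed_freq=0.0008, freq_a=0.013, freq_b=0.040)
print(f"W/F log-odds score: {score:.2f}")
```

A positive score means the pair is seen more often than chance predicts, a negative score means it is seen less often, and a score near zero means the alignments carry no information about that pair.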

27 You’re kidding! … I was struck by lightning twice too!!
Gary Larson, The Far Side

28 How Can We Compare Sequences ? Making a Substitution Matrix
The Diagonal Indicates How Conserved a residue tends to be. W is VERY Conserved. Cysteines that make disulfide bridges and those that do not get averaged. Some Residues are Easier To mutate into other similar Residues.

29 How Can We Compare Sequences ? Making a Substitution Matrix

30

31 How Can We Compare Sequences ? Using Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion
Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment

32 Scoring an Alignment
Most popular Substitution Matrices: PAM250, Blosum62 (Most widely used)
TPEA
¦| |
APGA
Raw Score = 1 + 6 + 0 + 2 = 9
Question: Is it possible to get such a good alignment by chance only?
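The raw score of an ungapped alignment is just the sum of the substitution scores of its columns. The sketch below reproduces the total of 9 for TPEA against APGA; the four pair scores are assumptions chosen so that the sum matches the slide, and a real run would load a complete matrix such as PAM250 or BLOSUM62 instead. The names `PAIR_SCORES`, `pair_score` and `raw_score` are hypothetical.

```python
# Pair scores assumed for illustration; only the four pairs needed here.
PAIR_SCORES = {
    ("T", "A"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}


def pair_score(a, b, scores):
    """Look up a symmetric substitution score."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def raw_score(seq1, seq2, scores):
    """Sum the column scores of two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq1, seq2))


print(raw_score("TPEA", "APGA", PAIR_SCORES))  # 9
```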

33 Insertions and Deletions
Gap Penalties: Opening a gap is more expensive than extending it (Gap Opening Penalty, Gap Extension Penalty).
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
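A minimal sketch of this opening/extension scheme follows. It assumes the convention cost = GOP + GEP x length (some programs charge GEP only from the second gap position onward), and the penalty values 10 and 1 are placeholders rather than recommended settings. The function names are hypothetical.

```python
import re


def affine_gap_cost(length, gop=10.0, gep=1.0):
    """Cost of one gap of the given length: one opening plus one extension per position."""
    if length <= 0:
        return 0.0
    return gop + gep * length


def total_gap_penalty(aligned_seq, gop=10.0, gep=1.0):
    """Sum the affine cost of every run of '-' in an aligned sequence."""
    runs = re.findall(r"-+", aligned_seq)
    return sum(affine_gap_cost(len(run), gop, gep) for run in runs)


print(total_gap_penalty("GARFIELDTHE----CAT"))  # one gap of length 4: 14.0
print(total_gap_penalty("GARFIELD-T-H-E-CAT"))  # four gaps of length 1: 44.0
```

With these numbers a single four-residue gap is much cheaper than four scattered one-residue gaps, which is the behaviour the slide describes.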

34 How Can We Compare Sequences ? Limits of the substitution Matrices
-They ignore non-local interactions and Assume that identical residues are equal
-They assume evolution rate to be constant
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
Mutations + Selection

35 How Can We Compare Sequences ? Limits of the substitution Matrices
Substitution Matrices Cannot Work !!!

36 How Can We Compare Sequences ? Limits of the substitution Matrices
I know… But at least, could I get some idea of when they are likely to do all right?

37 How Can We Compare Sequences ?
The Twilight Zone. [Plot of % Sequence Identity against Length, with marks at 30% identity and Length 100] Similar Sequence: Similar Structure, Same 3D Fold. Different Sequence: Structure ???? (Twilight Zone)

38 How Can We Compare Sequences ?
The Twilight Zone Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
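The rule of thumb above can be checked on any existing pairwise alignment. The sketch below is one possible reading of it: identity is counted over columns where neither sequence has a gap, and both thresholds are strict ("more than"). Other definitions of percent identity exist, and the function names are hypothetical.

```python
def aligned_columns(aln1, aln2):
    """Return the column pairs in which neither sequence has a gap."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    return [(a, b) for a, b in zip(aln1, aln2) if a != "-" and b != "-"]


def percent_identity(aln1, aln2):
    """Percentage of gap-free columns that hold identical residues."""
    columns = aligned_columns(aln1, aln2)
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def outside_twilight_zone(aln1, aln2, min_identity=30.0, min_length=100):
    """True when the alignment has > min_identity % identity over > min_length residues."""
    length = len(aligned_columns(aln1, aln2))
    return percent_identity(aln1, aln2) > min_identity and length > min_length


# Identical residues, but far too short to pass the length criterion.
print(percent_identity("GARFIELDTHE----CAT", "GARFIELDTHELASTCAT"))
print(outside_twilight_zone("GARFIELDTHE----CAT", "GARFIELDTHELASTCAT"))
```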

39

40

41

42

43 How Can We Compare Sequences ? Which Matrix Shall I use
The Initial PAM matrix was computed on 80% similar Proteins. It has been extrapolated to more distantly related sequences: Pam 250, Pam 350. Other Matrices Exist: BLOSUM 42, BLOSUM 62

44 How Can We Compare Sequences ? Which Matrix Shall I use
-PAM: Distant Proteins -> High Index (PAM 350)
-BLOSUM: Distant Proteins -> Low Index (Blosum30)
-GONNET 250 > BLOSUM62 > PAM 250. But This will depend on: The Family, The Program Used and Its Tuning.
Choosing The Right Matrix may be Tricky… Insertions, Deletions?

45 HOW Can we Align Two Sequences ?
Dot Matrices Global Alignments Local Alignment

46

47 Dot Matrices QUESTION What are the elements shared by two sequences ?

48 Dot Matrices
>Seq1 THEFATCAT
>Seq2 THELASTCAT
Window, Stringency

49 Dot Matrices Sequences Window size Stringency

50 Dot Matrices Stringency: Window=1, Stringency=1; Window=11, Stringency=7
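To make the window and stringency parameters concrete, here is a small sketch of a dot matrix. It assumes the common definition in which a dot is drawn at (i, j) when the windows starting at position i of the first sequence and position j of the second share at least `stringency` identical residues; the function names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Return the set of (i, j) window start positions that pass the filter."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Draw the matrix as text: seq2 along the top, seq1 down the side."""
    lines = ["  " + seq2]
    for i, residue in enumerate(seq1):
        row = "".join("*" if (i, j) in dots else "." for j in range(len(seq2)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1, seq2 = "THEFATCAT", "THELASTCAT"
print(render(seq1, seq2, dot_matrix(seq1, seq2, window=1, stringency=1)))
print()
print(render(seq1, seq2, dot_matrix(seq1, seq2, window=3, stringency=3)))
```

With window=1 every shared letter produces a dot, including many chance matches; with window=3 and stringency=3 only the start positions of shared three-letter words remain, which is the noise filtering effect that the two parameters are meant to give.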

51 Dot Matrices x y x y x

52 Dot Matrices

53 Dot Matrices

54 Dot Matrices

55 Dot Matrices

56 Dot Matrices Limits
-Visual aid
-Best Way to EXPLORE the Sequence Organisation
-Does NOT provide us with an ALIGNMENT
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| |  |||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

57 Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
Affine Gap Penalty: [plot of gap Cost against gap length L, showing GOP and GEP]
Parsimony: Evolution takes the simplest path (So We Think…)

58 Insertions and Deletions
Gap Penalties: Opening a gap is more expensive than extending it (Gap Opening Penalty, Gap Extension Penalty).
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT

59 Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
-DYNAMIC PROGRAMMING
>Seq1 THEFATCAT
>Seq2 THEFASTCAT
DYNAMIC PROGRAMMING ->
THEFA-TCAT
THEFASTCAT

60 Global Alignments: Brute Force Enumeration
Candidate alignments of FAT with FAST include:
----FAT
FAST---

---FAT-
FAST---

--F-AT-
FAST---
Number of alignments to enumerate: $\frac{(L_1+L_2)!}{(L_1)!\,(L_2)!}$
DYNAMIC PROGRAMMING
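The count quoted on this slide, (L1+L2)! / ((L1)! * (L2)!), is a binomial coefficient, so its growth is easy to check. The sketch below evaluates it with the standard library; the function name `count_alignments` is hypothetical.

```python
import math


def count_alignments(len1, len2):
    """(L1 + L2)! / (L1! * L2!), the brute-force count quoted on the slide."""
    return math.comb(len1 + len2, len1)


print(count_alignments(4, 3))      # FAST vs FAT: 35
print(count_alignments(100, 100))  # two short proteins: a 59-digit number
```

Even for two sequences of 100 residues the count is astronomically large, which is why enumeration is replaced by dynamic programming.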

61 Global Alignments Dynamic Programming (Needleman and Wunsch)
Match=1 MisMatch=-1 Gap=-1
[Three snapshots of the dynamic programming matrix being filled, with F A S T along one axis and F A T along the other; the borders are initialised with the gap costs -1 -2 -3 -4 and -1 -2 -3]
Resulting alignment:
F A S T
F A - T
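The matrix filling and traceback can be sketched in a few lines. This version uses the slide's scores (Match=1, MisMatch=-1, Gap=-1) with a simple linear gap cost rather than separate opening and extension penalties, and a real program would use a substitution matrix in place of match/mismatch. The function name and the tie-breaking order in the traceback (diagonal first) are choices made for this sketch.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_seq1, aligned_seq2)."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Borders: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Each cell keeps the best of: substitution, gap in seq2, gap in seq1.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner to the top-left corner.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


print(needleman_wunsch("FAST", "FAT"))  # (2, 'FAST', 'FA-T')
```

The work grows with the product of the two sequence lengths, since each cell of the matrix is filled once.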

62 Global Alignments DYNAMIC PROGRAMMING: Global Alignments are very sensitive to gap Penalties (GOP, GEP)

63 Global Alignments DYNAMIC PROGRAMMING
-Global Alignments are very sensitive to gap Penalties
-Global Alignments do not take into account the MODULAR nature of Proteins
C: Vitamin K dep. Ca Binding, K: Kringle Domain, G: Growth Factor module, F: Finger Module

64 Local Alignments: LOCAL Alignment vs GLOBAL Alignment. Smith And Waterman (SW) = LOCAL Alignment
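For comparison with the global case, here is a minimal sketch of Smith and Waterman's local alignment. It uses the same simple scores as above (match=1, mismatch=-1, linear gap=-1), which are assumptions for illustration; the two changes from the global algorithm are that cell scores are never allowed to drop below zero and that the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Best local alignment; returns (score, aligned_part1, aligned_part2)."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback from the best cell until a zero is reached.
    out1, out2 = [], []
    i, j = best_pos
    while score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[i - 1]); out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1]); out2.append("-")
            i -= 1
        else:
            out1.append("-"); out2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(out1)), "".join(reversed(out2))


# Only the shared module is reported; the unrelated flanks are ignored.
print(smith_waterman("XXXXCATYYYY", "ZZCATWW"))  # (3, 'CAT', 'CAT')
```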

65 Local Alignments We now have a PairWise Comparison Algorithm, We are ready to search Databases

66 Database Search
Q: QUERY, Comparison Engine (SW), Database, E-values
E-value: How many times do we expect such an Alignment by chance?
[Figure: database hits with their E-values: 1.10e-20 10 1.10e-100 1.10e-2 1.10e-1 3 1 6 20 15 13]
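One common way to turn a score into an expected number of chance hits is the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below assumes that form; the `lam` and `k` defaults are placeholder values, since in practice they depend on the scoring matrix and gap penalties and are estimated by the search program, and the function name `e_value` is hypothetical.

```python
import math


def e_value(score, query_len, db_len, lam=0.3, k=0.1):
    """Expected number of chance alignments scoring at least `score`.

    lam and k are placeholder statistical parameters, not fitted values.
    """
    return k * query_len * db_len * math.exp(-lam * score)


# Hypothetical search: a 100-residue query against 1,000,000 residues.
for s in (20, 40, 80):
    print(s, f"{e_value(s, 100, 1_000_000):.3g}")
```

Two properties are worth noting: the E-value falls exponentially as the score rises, and it grows in proportion to the database size, so the same alignment is less significant in a larger database.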

67

68 CONCLUSION

69 Sequence Comparison
-Thanks to evolution, We CAN compare Sequences
-There is a relation between Sequence and Structure.
-Substitution matrices only work well with similar Sequences (More than 30% id).
-The Easiest way to Compare Two Sequences is a dotplot.

70 A few Addresses

71

72

73

74

75

76

