Cédric Notredame (22/02/2016) Comparing Two Protein Sequences Cédric Notredame.

1 Cédric Notredame (22/02/2016) Comparing Two Protein Sequences Cédric Notredame

2 Cédric Notredame (22/02/2016) Our Scope: Pairwise Alignment methods are POWERFUL. Pairwise Alignment methods are LIMITED. If You Understand the LIMITS, they Become VERY POWERFUL. Look once Under the Hood.

3 Cédric Notredame (22/02/2016) Outline -WHY Does It Make Sense To Compare Sequences -HOW Can we Align Two Sequences ? -HOW can I Search a Database ? -HOW Can we Compare Two Sequences ?

4 Cédric Notredame (22/02/2016) Why Does It Make Sense To Compare Sequences ? Sequence Evolution

5 Cédric Notredame (22/02/2016) Why Do We Want To Compare Sequences
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
Homology? EXTRAPOLATE from the SwissProt entry to the unknown sequence (?????).

6 Cédric Notredame (22/02/2016) Why Do We Want To Compare Sequences

7 Cédric Notredame (22/02/2016) Why Does It Make Sense To Align Sequences ? -Evolution is our Real Tool. -Nature is LAZY and Keeps re-using Stuff. -Evolution is mostly DIVERGENT: Same Sequence → Same Ancestor.

8 Cédric Notredame (22/02/2016) Why Does It Make Sense To Align Sequences ? Same Sequence → Same Function, Same 3D Fold, Same Origin. Many Counter-examples!

9 Cédric Notredame (22/02/2016) Comparing Is Reconstructing Evolution

10 Cédric Notredame (22/02/2016) An Alignment is a STORY. [Figure: the sequences ADKPKRPLSAYMLWLN, ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN, related by Mutations + Selection]

11 Cédric Notredame (22/02/2016) An Alignment is a STORY
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Labels on the alignment: Mutation, Insertion, Deletion. [Figure: the sequences ADKPKRPLSAYMLWLN, ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN, related by Mutations + Selection]

12 Cédric Notredame (22/02/2016) Evolution is NOT Always Divergent… One AFGP with (ThrAlaAla)n is Similar To Trypsinogen; another AFGP with (ThrAlaAla)n is NOT Similar to Trypsinogen. [Figure labels: N, S] (Chen et al, 97, PNAS, 94, 3811-16)

13 Cédric Notredame (22/02/2016) Evolution is NOT Always Divergent. One AFGP with (ThrAlaAla)n is Similar To Trypsinogen; another AFGP with (ThrAlaAla)n is NOT Similar to Trypsinogen. [Figure labels: N, S] SIMILAR Sequences BUT DIFFERENT origin.

14 Cédric Notredame (22/02/2016) Evolution is NOT always Divergent… But in MOST cases, you may assume it is… Same Sequence → Same Function, Same 3D Fold, Same Origin. Similar Function DOES NOT REQUIRE Similar Sequence → Historical Legacy.

15 Cédric Notredame (22/02/2016) How Do Sequences Evolve Each Portion of a Genome has its own Agenda.

16 Cédric Notredame (22/02/2016) How Do Sequences Evolve ? CONSTRAINED Genome Positions Evolve SLOWLY. EVERY Protein Family Has its Own Level Of Constraint.
Family            Ks    Ka
Histone3          6.4   0
Insulin           4.0   0.1
Interleukin I     4.6   1.4
Globin            5.1   0.6
Apolipoprot. AI   4.5   1.6
Interferon G      8.6   2.8
Rates in Substitutions/site/Billion Years as measured on Mouse Vs Human (80 Million years). Ks: Synonymous Mutations, Ka: Non-Synonymous (Non-Neutral) Mutations.
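
The level of constraint can be summarised as a Ka/Ks ratio per family. The sketch below is only an illustration: it assumes the flattened table reads as (Ks, Ka) pairs, with the first row taken as Histone3 with Ks = 6.4 and Ka = 0, and the helper name `ka_ks_ratio` is hypothetical.

```python
# Rates as read from the slide's table (an assumed reading of the
# flattened text), in substitutions/site/billion years.
# Tuple order: (Ks, Ka).
RATES = {
    "Histone3": (6.4, 0.0),
    "Insulin": (4.0, 0.1),
    "Interleukin I": (4.6, 1.4),
    "Globin": (5.1, 0.6),
    "Apolipoprot. AI": (4.5, 1.6),
    "Interferon G": (8.6, 2.8),
}


def ka_ks_ratio(ks, ka):
    """Return Ka/Ks; a smaller ratio suggests a more constrained family."""
    return ka / ks


ratios = {family: ka_ks_ratio(ks, ka) for family, (ks, ka) in RATES.items()}

# Print families from most constrained to least constrained.
for family, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{family:16s} Ka/Ks = {ratio:.2f}")
```

Under this reading, the histone sits at the constrained end of the list and the families with Ka above 1 sit at the other end.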

17 Cédric Notredame (22/02/2016) Different molecular clocks for different proteins--another prediction

18 Cédric Notredame (22/02/2016) How Do Sequences Evolve ? To Make Things Worse, Every Residue has its Own Personality. [Figure: The amino Acids Venn Diagram, with overlapping sets labelled Aliphatic, Aromatic, Hydrophobic, Polar and Small]

19 Cédric Notredame (22/02/2016) How Do Sequences Evolve ? In a structure, each Amino Acid plays a Special Role. In the core, SIZE MATTERS. On the surface, CHARGE MATTERS. [Figure: OmpR, Cter Domain, with charges marked - - +]

20 Cédric Notredame (22/02/2016) How Do Sequences Evolve ? Accepted Mutations Depend on the Structure. Big -> Big; Small -> Small; NO DELETION. Charged -> Charged; Small -> Big or Small; DELETIONS. [Figure: structure with charges marked - - +]

21 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Substitution Matrices

22 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? To Compare Two Sequences, We need: Their Function, Their Structure. We Do Not Have Them !!!

23 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? We will Need To Replace Structural Information With Sequence Information. Same Sequence → Same Function, Same 3D Fold, Same Origin. It CANNOT Work ALL THE TIME !!!

24 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? To Compare Sequences, We need to Compare Residues. We Need to Know How Much it COSTS to SUBSTITUTE: an Alanine into an Isoleucine, a Tryptophan into a Glycine, … The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX. How to derive that matrix?

25 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Using Knowledge Could Work, But we do not know enough about Evolution and Structure. Using Data works better. [Figure: amino acid Venn diagram, with overlapping sets labelled Aliphatic, Aromatic, Hydrophobic, Polar and Small]

26 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Making a Substitution Matrix -Take 100 nice pairs of Protein Sequences, easy to align (80% identical). -Align them… -Count each mutation in the alignments: -25 Tryptophans into Phenylalanine -30 Isoleucines into Leucine … -For each mutation, set the substitution score to the log-odds ratio: Log(Observed / Expected by chance)
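
The recipe above can be sketched in a few lines. This is a minimal illustration, not the procedure used for any published matrix: the slide does not give the logarithm base or how "expected by chance" is estimated, so the sketch assumes log base 2 and background frequencies taken from the same aligned columns. The function name `log_odds_scores` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def log_odds_scores(aligned_pairs):
    """Build log-odds substitution scores from gap-free aligned columns.

    aligned_pairs: iterable of (residue_a, residue_b) tuples taken from
    trusted alignments. Returns {(a, b): score}, where each unordered
    pair is stored once as a sorted tuple.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[tuple(sorted((a, b)))] += 1
        residue_counts[a] += 1
        residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        p_a = residue_counts[a] / total_residues
        p_b = residue_counts[b] / total_residues
        # An unordered pair of two different residues can arise in two
        # orders, (a, b) or (b, a), hence the factor 2.
        expected = p_a * p_b if a == b else 2 * p_a * p_b
        scores[(a, b)] = math.log2(observed / expected)
    return scores


# Toy counts, invented for illustration only.
toy_pairs = [("W", "W")] * 8 + [("W", "F")] * 2 + [("F", "F")] * 10
toy_scores = log_odds_scores(toy_pairs)
for pair, score in sorted(toy_scores.items()):
    print(pair, round(score, 2))
```

A pair seen more often than chance predicts gets a positive score, and a pair seen less often gets a negative one.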

27 Cédric Notredame (22/02/2016) You're kidding! … I was struck by lightning twice too!! Gary Larson, The Far Side

29 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Making a Substitution Matrix: The Diagonal Indicates How Conserved a residue tends to be. W is VERY Conserved. Some Residues are Easier To mutate into other, similar residues. Cysteines that make disulfide bridges and those that do not get averaged.

30 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Making a Substitution Matrix

32 How Can We Compare Sequences ? Using Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Labels on the alignment: Mutation, Insertion, Deletion. Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment.

33 Cédric Notredame (22/02/2016) Most popular Substitution Matrices: PAM250, Blosum62 (Most widely used). Scoring an Alignment:
TPEA
¦| |
APGA
Raw Score = 1 + 6 + 0 + 2 = 9
Question: Is it possible to get such a good alignment by chance only?
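
The raw score is the sum of one substitution score per aligned column. The sketch below reproduces the slide's total of 9 for TPEA against APGA, assuming the four values 1, 6, 0 and 2 apply to the columns in left-to-right order. The dictionary `SLIDE_SCORES` holds only those four entries and is not a complete matrix; `raw_score` is a hypothetical helper name.

```python
# The four column scores shown on the slide, assumed to be in column order.
# A real substitution matrix covers all 20 x 20 residue pairs.
SLIDE_SCORES = {
    ("T", "A"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}


def raw_score(seq_a, seq_b, matrix):
    """Sum substitution scores over the columns of an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        # Substitution matrices are symmetric, so accept either order.
        total += matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return total


print(raw_score("TPEA", "APGA", SLIDE_SCORES))  # 1 + 6 + 0 + 2 = 9
```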

34 Cédric Notredame (22/02/2016) Insertions and Deletions: Gap Penalties. Opening a gap is more expensive than extending it.
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
[The slide repeats this alignment with the gap annotated by its Gap Opening Penalty and Gap Extension Penalty]

35 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Limits of the substitution Matrices: They ignore non-local interactions and Assume that identical residues are equal. They assume evolution rate to be constant. [Figure: the sequences ADKPKRPLSAYMLWLN, ADKPKRPKPRLSAYMLWLN and ADKPRRPLS-YMLWLN, related by Mutations + Selection]

36 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Limits of the substitution Matrices Substitution Matrices Cannot Work !!!

37 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Limits of the substitution Matrices: I know… But at least, could I get some idea of when they are likely to do all right?

38 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? The Twilight Zone. [Figure: %Sequence Identity (up to 100) plotted against Length. Above 30%: Similar Sequence, Similar Structure (Same 3D Fold). Below 30%: Twilight Zone, Different Sequence, Structure ????]

39 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? The Twilight Zone Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
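
The rule of thumb can be checked mechanically once two sequences are aligned. The sketch below is an illustration under stated assumptions: percent identity is computed over gap-free columns only (other definitions divide by a different length), and the function names `percent_identity` and `outside_twilight_zone` are hypothetical. The example pair is the wheat alignment shown earlier in the presentation.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (identity in %, number of gap-free columns) for an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    matches = sum(1 for a, b in columns if a == b)
    identity = 100.0 * matches / len(columns) if columns else 0.0
    return identity, len(columns)


def outside_twilight_zone(identity, length, min_identity=30.0, min_length=100):
    """Apply the slide's rule: more than 30% identity over more than 100 residues."""
    return identity > min_identity and length > min_length


wheat = "--DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER"
other = "KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA"
identity, length = percent_identity(wheat, other)
print(f"{identity:.1f}% identity over {length} gap-free columns")
print("Meets the rule of thumb:", outside_twilight_zone(identity, length))
```

The example pair is highly identical but shorter than 100 residues, so it does not meet the length half of the rule.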

44 How Can We Compare Sequences ? Which Matrix Shall I use? The Initial PAM matrix was computed on 80% similar Proteins. It has been extrapolated to more distantly related sequences: Pam 250, Pam 350. Other Matrices Exist: BLOSUM 42, BLOSUM 62.

45 Cédric Notredame (22/02/2016) How Can We Compare Sequences ? Which Matrix Shall I use? PAM: Distant Proteins → High Index (PAM 350). BLOSUM: Distant Proteins → Low Index (Blosum30). GONNET 250 > BLOSUM62 > PAM 250. But This will depend on: The Family; The Program Used and Its Tuning. Choosing The Right Matrix may be Tricky… Insertions, Deletions?

46 Cédric Notredame (22/02/2016) HOW Can we Align Two Sequences ? Dot Matrices, Global Alignments, Local Alignment

48 Dot Matrices QUESTION What are the elements shared by two sequences ?

49 Cédric Notredame (22/02/2016) Dot Matrices >Seq1 THEFATCAT >Seq2 THELASTCAT [Figure: dot matrix with one sequence along each axis; parameters: Window, Stringency]

50 Cédric Notredame (22/02/2016) Dot Matrices: Sequences, Window size, Stringency

51 Cédric Notredame (22/02/2016) Dot Matrices: Stringency. [Figure panels: Window=1, Stringency=1; Window=11, Stringency=7; Window=25, Stringency=15]
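
The effect of the two parameters can be tried on the short sequences from the earlier slide. The sketch below assumes the simplest scheme: a dot is drawn where at least `stringency` of the `window` consecutive residue pairs are identical. Real dot-plot tools may score windows with a substitution matrix instead, and the function names `dot_matrix` and `render` are hypothetical.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return the set of (i, j) window start positions that earn a dot.

    A dot is placed at (i, j) when at least `stringency` of the `window`
    residue pairs seq_x[i + k], seq_y[j + k] are identical.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_x, seq_y, dots):
    """Draw the matrix as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + seq_x]
    for j, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq1, seq2 = "THEFATCAT", "THELASTCAT"
noisy = dot_matrix(seq1, seq2, window=1, stringency=1)
strict = dot_matrix(seq1, seq2, window=3, stringency=3)
print(render(seq1, seq2, noisy))
print()
print(render(seq1, seq2, strict))
```

With a window of 1 every chance identity earns a dot; the larger window keeps only runs such as THE and TCAT.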

52 Cédric Notredame (22/02/2016) Dot Matrices [Figure: dot matrix sketches with axes labelled x and y]

53 Cédric Notredame (22/02/2016) Dot Matrices http://myhits.isb-sib.ch/cgi-bin/dotlet

57 Cédric Notredame (22/02/2016) Dot Matrices Limits: -Visual aid -Best Way to EXPLORE the Sequence Organisation -Does NOT provide us with an ALIGNMENT
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

58 Cédric Notredame (22/02/2016) Global Alignments: -Take 2 Nice Protein Sequences -A good Substitution Matrix (blosum) -A Gap opening Penalty (GOP) -A Gap extension Penalty (GEP). [Figure: Affine Gap Penalty, Cost plotted against gap length L, with GOP and GEP marked] Parsimony: Evolution takes the simplest path (So We Think…)
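
An affine penalty charges once for opening a gap and then a smaller amount per position. The sketch below assumes the convention cost = GOP + GEP × L, which the slide does not spell out; some programs use GOP + GEP × (L − 1) instead. The penalty values are hypothetical, chosen only to make the contrast visible.

```python
def affine_gap_cost(length, gop, gep):
    """Cost of one gap of `length` positions under an affine penalty.

    Assumed convention: cost = GOP + GEP * length.
    """
    if length <= 0:
        return 0
    return gop + gep * length


# Hypothetical penalty values, for illustration only.
GOP, GEP = 10, 1

one_long_gap = affine_gap_cost(4, GOP, GEP)         # one gap of 4 positions
four_short_gaps = 4 * affine_gap_cost(1, GOP, GEP)  # four gaps of 1 position
print(one_long_gap, four_short_gaps)
```

Under this scheme a single gap of four positions is much cheaper than four separate gaps, which matches the parsimony argument on the slide.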

60 Cédric Notredame (22/02/2016) Global Alignments: -Take 2 Nice Protein Sequences -A good Substitution Matrix (blosum) -A Gap opening Penalty (GOP) -A Gap extension Penalty (GEP) >Seq1 THEFATCAT >Seq2 THEFASTCAT → DYNAMIC PROGRAMMING →
THEFA-TCAT
THEFASTCAT

61 Cédric Notredame (22/02/2016) Global Alignments: Brute Force Enumeration of FAT against FAST. Candidate alignments include:
----FAT
FAST---

---FAT-
FAST---

--F-AT-
FAST---
Number of alignments to enumerate: (L1+L2)! / ((L1)! * (L2)!). Alternative: DYNAMIC PROGRAMMING.
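
The growth of the slide's formula (L1+L2)! / (L1! × L2!) is easy to see numerically. The sketch below only evaluates that formula, which is a binomial coefficient; the function name `alignment_count` is hypothetical.

```python
import math


def alignment_count(l1, l2):
    """Evaluate the slide's formula (L1 + L2)! / (L1! * L2!)."""
    return math.comb(l1 + l2, l1)


for l1, l2 in [(3, 4), (10, 10), (100, 100)]:
    print(f"L1={l1}, L2={l2}: {alignment_count(l1, l2)}")
```

Even for two sequences of 100 residues the count has dozens of digits, which is why enumeration is not an option.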

62 Cédric Notredame (22/02/2016) Global Alignments: DYNAMIC PROGRAMMING (Needleman and Wunsch). Match=1, MisMatch=-1, Gap=-1. [Figure: the score matrix for FAT against FAST, filled in step by step and then traced back] Resulting alignment:
FAST
FA-T
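
The matrix fill and traceback can be sketched as follows, using the slide's scores (Match=1, MisMatch=-1, Gap=-1) and a linear gap penalty. This is a minimal illustration of the Needleman and Wunsch scheme, not a production implementation: it uses no substitution matrix and no affine gaps, and the function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] is the best score for aligning seq_a[:i] with seq_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in seq_b
                              score[i][j - 1] + gap)      # gap in seq_a

    # Trace back from the bottom-right corner to the top-left corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_fat, aligned_fast = needleman_wunsch("FAT", "FAST")
print(best_score)
print(aligned_fast)
print(aligned_fat)
```

For FAT against FAST the optimum scores 2 (three matches and one gap) and places the gap opposite the S.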

63 Cédric Notredame (22/02/2016) Global Alignments DYNAMIC PROGRAMMING Global Alignments are very sensitive to gap Penalties GOP GEP

64 Cédric Notredame (22/02/2016) Global Alignments: DYNAMIC PROGRAMMING. Global Alignments are very sensitive to gap Penalties. Global Alignments do not take into account the MODULAR nature of Proteins. [Figure legend: C: K vitamin dep. Ca Binding; K: Kringle Domain; G: Growth Factor module; F: Finger Module]

65 Cédric Notredame (22/02/2016) Local Alignments: GLOBAL Alignment versus LOCAL Alignment. Smith And Waterman (SW) = LOCAL Alignment
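
A local alignment differs from a global one in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The sketch below is a minimal Smith and Waterman illustration with a linear gap penalty and hypothetical scores (match=1, mismatch=-1, gap=-1). The two input strings are toy examples, not real proteins, and the function name `smith_waterman` is hypothetical.

```python
def smith_waterman(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b) for the best local segment.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # The extra 0 lets a new local alignment start anywhere.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy strings: a shared module surrounded by unrelated flanks.
print(smith_waterman("XXXGARFIELDYYY", "ZZGARFIELDWW"))
```

Only the shared module is reported; the unrelated flanks are left out rather than forced into the alignment.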

66 Cédric Notredame (22/02/2016) Local Alignments We now have a PairWise Comparison Algorithm, We are ready to search Databases

67 Cédric Notredame (22/02/2016) Database Search: QUERY (Q) → Comparison Engine (SW) → Database. E-values: How many times do we expect such an Alignment by chance? [Figure: database hits listed with their scores and E-values, ranging from 1.10e-100 to 10]
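
The question "how often by chance?" can be explored empirically by shuffling. The sketch below is a stand-in, not the E-value that search programs report: those are derived analytically and scale with database size, whereas this simply measures the fraction of shuffled subjects that score at least as well as the real one. The scores (match=1, mismatch=-1, gap=-1), the function names `local_score` and `chance_fraction`, and the toy sequences are all assumptions.

```python
import random


def local_score(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (linear gaps), computed row by row."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for res_a in seq_a:
        cur = [0]
        for j, res_b in enumerate(seq_b, start=1):
            sub = match if res_a == res_b else mismatch
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def chance_fraction(query, subject, trials=200, seed=0):
    """Return (observed score, fraction of shuffles scoring at least as well)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = local_score(query, subject)
    residues = list(subject)
    hits = 0
    for _ in range(trials):
        # Shuffling keeps the composition but destroys the order.
        rng.shuffle(residues)
        if local_score(query, "".join(residues)) >= observed:
            hits += 1
    return observed, hits / trials


observed, fraction = chance_fraction("GARFIELDTHECAT", "GARFIELDTHELASTCAT")
print(observed, fraction)
```

A fraction near zero suggests the observed score is unlikely to arise from composition alone; more trials give a finer estimate.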

69 CONCLUSION

70 Cédric Notredame (22/02/2016) Sequence Comparison: -Thanks to evolution, We CAN compare Sequences. -There is a relation between Sequence and Structure. -Substitution matrices only work well with similar Sequences (More than 30% id). -The Easiest way to Compare Two Sequences is a dotplot.

71 Cédric Notredame (22/02/2016) A few Addresses
