Finding the Beta Helix Motif By Marcin Mejran. Papers Predicting The β-Helix Fold From Protein Sequence Data by Phil Bradley, Lenore Cowen, Matthew Menke,

1 Finding the Beta Helix Motif By Marcin Mejran

2 Papers: "Predicting the β-Helix Fold from Protein Sequence Data" by Phil Bradley, Lenore Cowen, Matthew Menke, Jonathan King, Bonnie Berger; "Segmentation Conditional Random Fields (SCRFs): A New Approach for Protein Fold Recognition" by Yan Liu, Jaime Carbonell, Peter Weigele, and Vanathi Gopalakrishnan

3 Secondary Structure. Beta strand: forms β-sheets. Alpha helix: stands alone. Can combine into more complex structures: beta sheets, beta helices. Images from: http://www.people.virginia.edu/~rjh9u/prot2ndstruct.html

4 β-sheet

5 Second-and-a-Half Structure: beta helix, beta barrel, beta trefoil

6 β-Helix

7 Helix composed of three parallel β-sheets; three β-strands per "rung"; connecting "loops"; not in eukaryotes; secreted by various bacteria; right- and left-handed

8 β-Helix: few solved structures (9 SCOP superfamilies; 14 right-handed (RH) solved structures in PDB); solved structures differ widely. [Figure labels: B3, T2, B2, B1]

9 β-Helix: the T2 turn is a unique two-residue loop; β-strands are 3 to 5 residues; T1 and T3 vary in size and may contain secondary structures; β-strands interact between rungs

10 β-Helix: a good choice from a computational point of view. "Nice" structure: repeating, parallel β-strands; rungs have similar structure; stacking is predictable; well-conserved β-strands across superfamilies

11 β-Helix: long-range interactions (close in 3D but not in 1D); "non-unique" features (B2-T2-B3 segment); unique features not clearly shown in sequence; usual methods don't work. Image from: http://www.cryst.bbk.ac.uk/PPS2/course/section10/all_beta.html

12 BetaWrap: "wraps" sequences around the helix and finds the best "wrap"; uses the B2 and B3 strands and the T2 turn (the rest of the rung varies greatly in size); decomposes into sub-problems: rungs, finding multiple rungs, finding B1 by local optimization

13 Hydrophobic/charged. Hydrophobic: dislikes water. Hydrophilic: likes water. Charged: on the outside. [Figure labels: B3, T2, B2, B1] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

14 BetaWrap: Rungs. Given a T2 turn, find the next T2 turn. [Figure: candidate rung with B2, T2, B3] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

15 BetaWrap: Rungs. More weight given to inward-pointing pairs; certain stacked amino acids preferred; penalty for highly charged inward residues; penalizes too few or too many residues. [Figure labels: B3, T2, B2, B1] Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

16 BetaWrap: Multiple Rungs. Find multiple initial B2-T2-B3 segments; match a pattern based on hydrophobic residues (which appear on the inside). Pattern classes: Φ: A,F,I,L,M,V,W,Y; [symbol missing]: D,E,R,K; X: any. Example sequences: AFDEMVRKYE, FIFDDEAK, EDEMVMVFD
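
The pattern scan on slide 16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the actual template is not recoverable from the transcript, so `EXAMPLE_TEMPLATE`, the `h`/`x` encoding, and both function names are hypothetical. Only the hydrophobic class Φ is taken from the slide.

```python
# Hydrophobic class (Φ) as listed on slide 16.
HYDROPHOBIC = set("AFILMVWY")

# Hypothetical template encoding: 'h' = hydrophobic (Φ), 'x' = any residue.
# This is an illustrative template, NOT the pattern used by BetaWrap.
EXAMPLE_TEMPLATE = "hxxxh"


def matches_template(window: str, template: str) -> bool:
    """Return True if every 'h' position of the template holds a hydrophobic residue."""
    if len(window) != len(template):
        return False
    for residue, symbol in zip(window, template):
        if symbol == "h" and residue not in HYDROPHOBIC:
            return False
    return True


def find_candidates(sequence: str, template: str) -> list[int]:
    """Return every start index at which the template matches the sequence."""
    width = len(template)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if matches_template(sequence[start:start + width], template)
    ]


if __name__ == "__main__":
    # One of the example sequences shown on the slide.
    print(find_candidates("AFDEMVRKYE", EXAMPLE_TEMPLATE))  # [0, 1, 4]
```

Each returned index would be one candidate starting position from which rungs are then searched.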

17 BetaWrap: Multiple Rungs. Dynamic programming (DP) is used to find 5 rungs in either direction from the initial positions; α-helix filtering; take the average score of the top 10 remaining wraps. Image from: http://betawrap.lcs.mit.edu/BetaTalk.ppt

18 BetaWrap: Completing. Find B1 positions (highest-scoring parse; does not affect the wrap score); further filtering on hydrophobic residues in T1 and T2

19 Training. Seven-fold cross-validation (partitioned based on families). Scores calculated for: α-helix filtering threshold; B1-score threshold; hydrophobic count threshold; distribution of unmatched residues between rungs. Image from: http://www.ornl.gov/info/ornlreview/v37_1_04/article_21.shtml
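
Family-based partitioning means that all proteins of one family land in the same fold, so a held-out fold never shares a family with the training data. The sketch below shows one way to do this. The greedy balancing rule, the function names, and the toy protein and family labels are assumptions for illustration, not details from the paper.

```python
from collections import defaultdict


def family_folds(proteins: dict[str, str], n_folds: int = 7) -> list[list[str]]:
    """Split protein ids into folds so that no family is spread over two folds.

    `proteins` maps a protein id to its family label (hypothetical input format).
    """
    by_family = defaultdict(list)
    for protein_id, family in proteins.items():
        by_family[family].append(protein_id)

    folds: list[list[str]] = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption): place the largest families first,
    # each into the fold that currently holds the fewest proteins.
    for family in sorted(by_family, key=lambda f: (-len(by_family[f]), f)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(by_family[family]))
    return folds


def train_test_splits(folds: list[list[str]]):
    """Yield (train, test) id lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [p for i, fold in enumerate(folds) if i != held_out for p in fold]
        yield train, test


if __name__ == "__main__":
    toy = {"p1": "famA", "p2": "famA", "p3": "famB", "p4": "famC"}
    for train, test in train_test_splits(family_folds(toy, n_folds=2)):
        print("train:", train, "test:", test)
```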

20 BetaWrap: Results

21 Correctly identifies beta-helices; correctly separates helices and non-helices; can predict β-helices across families

22 BetaWrap: Summary. Pros: finds beta-helices; accurate. Cons: still makes errors (rung placement); hard-coded information (over-fitting; hard to generalize)

23 Conditional Random Fields (CRFs). [Figure: graphical models of an HMM and a CRF, each over states y1 … y6 and observations x1 … x6]

24 Hidden Markov Model: set of states; transition probabilities; emission probabilities; only the sequence of emitted residues is given (find the sequence of true states); generative. [Figure: each state carries an emission table of residue probabilities, e.g. A: .2, B: .8]

25 Hidden Markov Model. HMM: maximize P(x,y|θ) = P(y|x,θ)P(x|θ), where x: emitted state/given sequence; y: "hidden"/true state; P(x,y|θ): joint probability of x and y; P(y|x,θ): probability of y given x; P(x|θ): probability of x. Need to make assumptions about the distribution of x

26 Viterbi Algorithm (HMM): find the most likely path, i.e. the most likely sequence of hidden states. [Figure: trellis with emission terms e_1(x_i), e_2(x_i), e_3(x_i) for observations x_1 … x_4]

27 Viterbi Algorithm (HMM). [Figure: trellis with emission terms e_1(x_i), e_2(x_i), e_3(x_i) for observations x_1 … x_4] $v(i,j) = \max\big(v(i-1,1)\,t_{1,j}\,e_j(x_i),\; v(i-1,2)\,t_{2,j}\,e_j(x_i),\; \dots,\; v(i-1,k)\,t_{k,j}\,e_j(x_i)\big)$
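
As a minimal sketch of the recurrence on slide 27, the Python function below fills the table v(i, j) and then traces back the most likely state path. The two states `S` and `L`, all probability values, and the explicit start distribution `start_p` are illustrative assumptions. The .2/.8 emission values merely echo the table on slide 24.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden-state path, probability of that path)."""
    # v[i][j]: probability of the best path that ends in state j at position i.
    v = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[i][j]: predecessor of state j on that best path

    for i in range(1, len(observations)):
        v.append({})
        back.append({})
        for j in states:
            # Pick the predecessor k that maximizes v(i-1, k) * t_{k,j}.
            best_prev = max(states, key=lambda k: v[i - 1][k] * trans_p[k][j])
            v[i][j] = (
                v[i - 1][best_prev] * trans_p[best_prev][j] * emit_p[j][observations[i]]
            )
            back[i][j] = best_prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]


if __name__ == "__main__":
    states = ("S", "L")  # hypothetical states, e.g. strand and loop
    start_p = {"S": 0.5, "L": 0.5}
    trans_p = {"S": {"S": 0.9, "L": 0.1}, "L": {"S": 0.1, "L": 0.9}}
    emit_p = {"S": {"A": 0.2, "B": 0.8}, "L": {"A": 0.8, "B": 0.2}}
    print(viterbi("BBA", states, start_p, trans_p, emit_p))
```

For long sequences the products underflow, so practical implementations usually sum log-probabilities instead of multiplying probabilities.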

28 HMM Disadvantages: there is a strong independence assumption; long-range interactions are difficult to model; overlapping features are difficult to model

29 Conditional Random Fields (CRFs): replace transition and emission probabilities with a set of feature functions f(i,j,k); feature functions are based on all xs, not just one; not generative. [Figure: trellis with feature functions f(1,0,1), f(2,0,1), f(3,0,1) at x_1 and f(1,i,n), f(2,i,n), f(3,i,n) at x_n for n = 2 … 4]

30 Conditional Random Fields (CRFs). HMM: maximize P(x,y|θ) = P(y|x,θ)P(x|θ). CRF: maximize P(y|x,θ). Do not make assumptions about the underlying distribution

31 Viterbi for CRFs: same method as for the HMM. [Figure: trellis with feature functions f(1,0,1), f(2,0,1), f(3,0,1) at x_1 and f(1,i,n), f(2,i,n), f(3,i,n) at x_n for n = 2 … 4]
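
To illustrate why the same dynamic program carries over, the sketch below decodes a linear-chain CRF by replacing the product of transition and emission probabilities with a weighted sum of feature functions. Everything concrete here is an assumption for illustration: the two states, the feature functions `f_hydro` and `f_stay`, their weights, and the feature signature `f(prev, cur, x, i)`, which differs from the f(i,j,k) notation on the slides. Each feature receives the whole sequence `x`, so it could inspect any position, not just `x[i]`.

```python
HYDROPHOBIC = set("AFILMVWY")


def f_hydro(prev, cur, x, i):
    """Hypothetical feature: fires when 'strand' coincides with a hydrophobic residue."""
    return 1.0 if (cur == "strand") == (x[i] in HYDROPHOBIC) else 0.0


def f_stay(prev, cur, x, i):
    """Hypothetical feature: fires when the state does not change."""
    return 1.0 if prev == cur else 0.0


def crf_viterbi(x, states, features, weights):
    """Return (best state path, its unnormalized log-score) for a linear chain."""

    def score(prev, cur, i):
        # Weighted sum of all feature functions at position i.
        return sum(w * f(prev, cur, x, i) for f, w in zip(features, weights))

    best = [{s: score(None, s, 0) for s in states}]
    back = [{}]
    for i in range(1, len(x)):
        best.append({})
        back.append({})
        for cur in states:
            prev = max(states, key=lambda p: best[i - 1][p] + score(p, cur, i))
            best[i][cur] = best[i - 1][prev] + score(prev, cur, i)
            back[i][cur] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    print(crf_viterbi("VVDG", ("strand", "turn"), [f_hydro, f_stay], [2.0, 0.5]))
```

Decoding never needs the normalization term, because it is the same for every candidate path over a given `x`.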

32 Conditional Random Fields (CRFs): states should form a chain; the likelihood function is convex for a chain; Z_0 = normalization factor; λ_k = weights
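
The formula that slide 32 annotates did not survive in the transcript. For orientation, the standard linear-chain CRF has the form below. This is the textbook form, given here as an assumption rather than a reconstruction of the slide: the feature signature f_k(y_{i-1}, y_i, x, i) is the usual one and may differ from the slide's own notation, and Z_0 is taken to be the normalizing sum that makes the probabilities add up to 1.

```latex
P(y \mid x, \theta) = \frac{1}{Z_0}
  \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y_{i-1}, y_i, x, i) \Big),
\qquad
Z_0 = \sum_{y'} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k \, f_k(y'_{i-1}, y'_i, x, i) \Big)
```

Training adjusts the weights λ_k to maximize this conditional probability on labeled sequences. For a chain, Z_0 can be computed with a forward pass analogous to the Viterbi recursion, with the maximum replaced by a sum.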

33 Segmented CRFs: each state corresponds to a structure; represented as a graph G (nodes, i.e. states, represent secondary structures; edges represent interactions); chains are nicer than graphs

34 Segmented CRFs. G = <V, E1, E2>, where E1: edges between neighbors; E2: edges for long-range interactions. E1 edges can be implied in the model

35 Only E2 needs to be explicitly considered; however, the graph needs to be a chain for E2; deterministic state transitions

36 Beta-Helix CRF

37 Combined states: B23 = B2, B3, T2. Size assumptions: B23: 8 residues; B1: 3 residues; T1, T3: 1 to 80 residues

38 Intra-Node Features: regular expression template for B23. Example: FIFDDEAK. Pattern classes: Φ: A,F,I,L,M,V,W,Y; [symbol missing]: D,E,R,K; X: any

39 Intra-Node Features: probabilistic motif profiles for B23 and B1; use HMMER to generate profiles from known B23 and B1 sequences

40 Intra-Node Features: secondary structure prediction with PSIPRED (helps locate T1 and T3; 76 to 78% accuracy for α-helices and coils); segment length for T1 and T3 (estimated as a density function)

41 Inter-Node Features: side-chain alignment scores (alignment between B23 regions; more weight given to inward-pointing pairs). [Figure labels: B3, T2, B2]

42 Inter-Node Features: parallel beta-sheet alignment scores; distance between adjacent B23 segments

43 SCRF: Results

44

45 Summary: discovered a new beta-helix protein (Sf6 gp14); detected beta-helices in plants (none known before); more robust than BetaWrap

46 Questions
