1/60 An Iterative Relaxation Technique for the NMR Backbone Assignment Problem Wen-Lian Hsu Institute of Information Science Academia Sinica.

1 An Iterative Relaxation Technique for the NMR Backbone Assignment Problem
Wen-Lian Hsu, Institute of Information Science, Academia Sinica

2 Characteristics of Our Method
- Model this as a constraint satisfaction problem
- Solve it using natural language parsing techniques
  - Both top-down and bottom-up
- An iterative approach
  - Create spin systems based on noisy data.
  - Link spin systems by using maximum independent set finding techniques.

3 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

4 Blind Man’s Elephant
- We cannot directly “see” the positions of these atoms (the structure).
- But we can measure a set of parameters (with constraints) on these atoms,
  - which can help us infer their coordinates.
- Each experiment can determine only a subset of the parameters (with noise).
- To combine the parameters from different experiments, we need to stitch them together.

5 The Flow of NMR Experiments
Get protein samples -> Collect NMR spectra -> Resonance assignment -> Structure constraints -> Calculation and simulation
- Energy minimization
- Fitness of structure constraints

6 Find out Chemical Shift for Each Atom
- Backbone atoms: Ca, Cb, C’, N, NH
  - Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
- Side chain: all others (especially CHs)
  - TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY
[Figure: chemical shift assignment for the atoms of one amino acid]

7 Some Relevant Parameters
[Figure: peptide backbone (-N-C-C- repeats) with side chains, annotated with chemical shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60, 30-35]

8 Three important experiments
- Backbone: Ca, Cb, C’, N, NH
- Experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
  - sequential assignment
  - chemical shifts of Ca, Cb, NH

9 Our NMR spectra
- HSQC
- CBCA(CO)NH (2 peaks)
- HNCACB (4 peaks)
[Figure labels: CBCANH, CBCA(CO)NH]

10 HSQC Spectra
- HSQC peaks (1 chemical shift for one amino acid)
  H      N       Intensity
  8.109  118.60  65920032

11 CBCA(CO)NH Spectra
- CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
  H      N       C      Intensity
  8.116  118.25  16.37  79238811
  8.109  118.60  36.52  65920032

12 CBCANH Spectra
- CBCANH peaks (4 chemical shifts for one amino acid)
  - Ca (+), Cb (-)
  H      N       C      Intensity
  8.116  118.25  16.37  79238811
  8.109  118.60  36.52  -65920032
  8.117  118.90  61.58  -51223894
  8.119  117.25  57.42  109928374

13 A Dataset Example
- HSQC
- HNCACB: 4 peaks
- CBCA(CO)NH: 2 peaks
[Figure: peaks plotted on N and H axes]

14 Backbone Assignment
- Goal
  - Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
- General approaches
  - Generate spin systems
    - A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
  - Link spin systems

15 Ambiguities
- All 4-point experiments are mixed together
- All 2-point experiments are mixed together
- Each spin system can be mapped to several amino acids in the protein sequence
- False positives, false negatives

16 Previous Approaches
- Constrained bipartite matching problem
  - The spin system might be ambiguous
  - Can’t deal with ambiguous links
[Figure: a legal matching and an illegal matching under constraints]

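17 Natural Language Processing: Signal or Noise?
- Speech recognition: homophone selection
- Example sentence: 台北市一位小孩走失了 ("A child in Taipei City has gone missing")
- Candidate words competing for the same sounds:
  - 台北市 (Taipei City), 台北 (Taipei), 小孩 (child), 走失 (to go missing)
  - 適宜 (suitable), 事宜 (matters, arrangements)
  - 一位 (one person), 一味 (blindly, persistently), 移位 (to shift position)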

18 An Error-Tolerant Algorithm

19 Phrase, Sentence Combination

20 Hierarchical Analysis
- Sentence-meaning templates (句意模版)
- Sentence-pattern templates (句型模版)
- Phrase templates (片語模版)
- Word templates (字詞模版)

21 Perfect Group
- Each spin group contains 6 points:
  - 4 points are from the first experiment
  - 2 points are from the second experiment
[Figure: two adjacent residues with their backbone atoms (N, H, C, O)]

23 A Perfect Spin System Group
CBCA(CO)NH
  N        H      C       Intensity
  113.293  7.897  56.294  1.64325e+008
  113.293  7.897  27.853  1.08099e+008
CBCANH
  N        H      C       Intensity
  113.293  7.92   62.544  8.52851e+007
  113.293  7.92   56.294  4.71331e+007
  113.293  7.92   68.483  -8.54121e+007
  113.293  7.92   28.165  -3.49346e+007
Resulting spin system
  Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
  56.294   28.165   62.544  68.483

24 False Positives and False Negatives
- False positives
  - Noise with high intensity
  - Produce fake spin systems
- False negatives
  - Peaks with low intensity
  - Missing peaks
- In real wet-lab data, nearly 50% of the peaks are noise (false positives).

25 Spin System Group
[Figure: perfect, false negative, and false positive groups plotted on N and H axes]

26 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

27 Main Idea
- Deal with false negatives in the spin system generation procedure.
- Eliminate false positives in the spin system linking procedure.
- Perform spin system generation and linking in an iterative fashion.

28 Spin System Group Generation
- Three types of spin system groups are generated based on the quality of the CBCANH data:
  - Perfect
  - Weak false negative
  - Severe false negative

29 Perfect Spin Systems
- A spin system is determined without any added pseudo peak.
CBCA(CO)NH
  N        H      C       Intensity
  113.293  7.897  56.294  1.64325e+008
  113.293  7.897  27.853  1.08099e+008
CBCANH
  N        H      C       Intensity
  113.293  7.92   62.544  8.52851e+007
  113.293  7.92   56.294  4.71331e+007
  113.293  7.92   68.483  -8.54121e+007
  113.293  7.92   28.165  -3.49346e+007
Resulting spin system
  Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
  56.294   28.165   62.544  68.483
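
The grouping rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Peak` class, the function name `perfect_group`, and the tolerances `tol_n`, `tol_h`, and `tol_c` are assumptions chosen so that the slide's numbers group together. The sketch relies only on what the slides state: a perfect group has 2 CBCA(CO)NH peaks and 4 CBCANH peaks sharing the same N and H shifts, CBCANH intensities are positive for Ca and negative for Cb, and the carbons also seen in CBCA(CO)NH belong to residue i-1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    n: float          # 15N chemical shift (ppm)
    h: float          # amide 1H chemical shift (ppm)
    c: float          # 13C chemical shift (ppm)
    intensity: float  # in CBCANH the sign separates Ca (+) from Cb (-)


def perfect_group(cbcaconh, cbcanh, tol_n=0.5, tol_h=0.05, tol_c=0.5):
    """Return the four carbon shifts of a perfect group, or None.

    The tolerances are illustrative assumptions, not values from the slides.
    """
    if len(cbcaconh) != 2 or len(cbcanh) != 4:
        return None
    root = cbcaconh[0]
    for p in cbcaconh + cbcanh:
        # All six peaks must share (approximately) the same N and H shifts.
        if abs(p.n - root.n) > tol_n or abs(p.h - root.h) > tol_h:
            return None
    prev_carbons = [p.c for p in cbcaconh]  # carbons of residue i-1

    def split(peaks):
        # A carbon also seen in CBCA(CO)NH belongs to i-1, the other to i.
        prev = [p.c for p in peaks
                if any(abs(p.c - c) <= tol_c for c in prev_carbons)]
        own = [p.c for p in peaks
               if all(abs(p.c - c) > tol_c for c in prev_carbons)]
        if len(prev) == 1 and len(own) == 1:
            return prev[0], own[0]
        return None

    ca = split([p for p in cbcanh if p.intensity > 0])
    cb = split([p for p in cbcanh if p.intensity < 0])
    if ca is None or cb is None:
        return None
    return {"Ca(i-1)": ca[0], "Cb(i-1)": cb[0], "Ca(i)": ca[1], "Cb(i)": cb[1]}


# Peaks copied from the slide.
cbcaconh = [Peak(113.293, 7.897, 56.294, 1.64325e8),
            Peak(113.293, 7.897, 27.853, 1.08099e8)]
cbcanh = [Peak(113.293, 7.92, 62.544, 8.52851e7),
          Peak(113.293, 7.92, 56.294, 4.71331e7),
          Peak(113.293, 7.92, 68.483, -8.54121e7),
          Peak(113.293, 7.92, 28.165, -3.49346e7)]
group = perfect_group(cbcaconh, cbcanh)
print(group)
```

With these assumed tolerances the sketch recovers the spin system listed on the slide; dropping any CBCANH peak makes it return `None`, which is where the false-negative cases come in.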

30 Weak False Negative Spin System Group
- A spin system is determined with an added pseudo peak.
CBCA(CO)NH
  N        H      C       Intensity
  115.481  9.604  60.044  1.30407e+008
  115.481  9.604  30.66   6.93923e+007
CBCANH
  N        H      C       Intensity
  115.481  9.616  59.419  2.25295e+008
  115.481  9.616  31.291  -4.82097e+007
  115.481  9.616  27.853  -1.33326e+008
  115.481  9.604  60.044  1.30407e+008   (added pseudo peak, Ca)
Resulting spin system
  Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
  60.044   31.291   59.419  27.583

31 Severe False Negative Spin System Group
- A spin system is determined with two added pseudo peaks.
CBCA(CO)NH
  N        H      C       Intensity
  119.857  8.435  28.166  3.36293e+007
  119.857  8.435  59.419  1.56434e+008
CBCANH
  N        H      C       Intensity
  119.856  8.477  58.481  3.7353e+008
  119.856  8.477  28.79   -2.55735e+008
  119.857  8.435  28.166  3.36293e+007   (added pseudo peak)
  119.857  8.435  59.419  1.56434e+008   (added pseudo peak)
Resulting spin system
  Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
  59.419   28.166   58.481  28.79
Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.
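
The three group types differ only in how many of the four expected CBCANH peaks are missing, and therefore in how many pseudo peaks are added. A small sketch of that bookkeeping follows; the function name `classify_group` is hypothetical, and the rule is read off the three example slides rather than taken from the authors' code.

```python
def classify_group(n_cbcanh_peaks, expected=4):
    """Map the number of observed CBCANH peaks to a group type.

    Returns (label, number_of_pseudo_peaks_to_add).
    """
    missing = expected - n_cbcanh_peaks
    labels = {
        0: "perfect",                # no pseudo peak needed
        1: "weak false negative",    # one pseudo peak added
        2: "severe false negative",  # two pseudo peaks added
    }
    if missing not in labels:
        raise ValueError(f"unsupported number of CBCANH peaks: {n_cbcanh_peaks}")
    return labels[missing], missing


for count in (4, 3, 2):
    print(count, classify_group(count))
```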

32 A note on spin system generation
- To generate *ALL* possible spin systems, a peak can be included in more than one spin system.
  - False positives are eliminated in the spin system linking procedure.
  - False negatives are treated by adding pseudo peaks.
- A rule-based mechanism is used to filter out incompatible spin systems (false positives).
  - Adopt a maximum weight independent set algorithm

33 Spin System Linking
- Goal
  - Link spin systems into segments that are as long as possible.
- Constraints
  - Each spin system is uniquely assigned to a position of the target protein sequence.
  - Two spin systems are linked only if the chemical shift differences of their intra- and inter-residues are less than the predefined thresholds.
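
The linking constraint can be illustrated with a short Python sketch. The `SpinSystem` tuple, the function `can_link`, and the 0.4 ppm thresholds are assumptions for illustration; the slides only say that the thresholds are predefined. The four spin systems are the ones shown on the positioning slide, with the garbled value `55356` read as 55.356, which is also an assumption.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca(i-1), inter-residue
    cb_prev: float  # Cb(i-1), inter-residue (0 when absent, e.g. glycine)
    ca: float       # Ca(i), intra-residue
    cb: float       # Cb(i), intra-residue


def can_link(a, b, tol_ca=0.4, tol_cb=0.4):
    """True if b may directly follow a: a's own carbons match b's i-1 carbons."""
    return (abs(a.ca - b.ca_prev) <= tol_ca
            and abs(a.cb - b.cb_prev) <= tol_cb)


s1 = SpinSystem(55.266, 38.675, 44.555, 0.0)
s2 = SpinSystem(44.417, 0.0, 55.043, 30.04)
s3 = SpinSystem(44.417, 0.0, 30.665, 28.72)
s4 = SpinSystem(55.356, 29.782, 60.044, 37.541)

print(can_link(s1, s2), can_link(s2, s4), can_link(s3, s4))
```

Under these thresholds `s1 -> s2 -> s4` forms a chain, while `s3` cannot precede `s4`, so `s3` is a candidate false positive that the linking step would leave out.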

34 A Peculiar Parking Lot (valet parking)
- Information you have: the make of your car, and the car parked in front of you (approximately).
- Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).

35 Backbone Assignment DGRIGEIKGRKTLATPAVRRLAMENNIKLS

36 Spin System Positioning
- We assign spin system groups to a protein sequence according to their codes.
Sequence codes: D = 50, G = 10, R = 40, I = 50|51
Spin systems and their codes
  Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)      codes
  55.266   38.675   44.555  0       => 50 10
  44.417   0        55.043  30.04   => 10 40
  44.417   0        30.665  28.72   => 10 40
  55.356   29.782   60.044  37.541  => 40 50
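
Once each spin system carries a pair of codes, positioning it amounts to scanning the sequence for adjacent residues whose codes match. The sketch below assumes only the four residue codes shown on the slide (D = 50, G = 10, R = 40, I = 50 or 51); the table `RESIDUE_CODES` and the function `candidate_positions` are hypothetical names, and how a code is derived from the chemical shifts is not covered here because the slides do not spell it out.

```python
# Codes taken from the slide; an isoleucine accepts either 50 or 51.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, prev_code, own_code):
    """Return 0-based indices i such that residue i-1 accepts prev_code
    and residue i accepts own_code."""
    return [i for i in range(1, len(sequence))
            if prev_code in RESIDUE_CODES[sequence[i - 1]]
            and own_code in RESIDUE_CODES[sequence[i]]]


sequence = "DGRI"
for codes in [(50, 10), (10, 40), (40, 50)]:
    print(codes, candidate_positions(sequence, *codes))
```

On a longer sequence a code pair usually matches several positions, which is the ambiguity the linking and independent-set steps are meant to resolve.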

37 Link Spin System Groups
[Figure: the spin systems (55.266 38.675 44.555 0), (44.417 0 55.043 30.04), (44.417 0 30.665 28.72), and (55.356 29.782 60.044 37.541) linked into Segment 1, Segment 2, and Segment 3 over the sequence DGRI]

38 Iterative Concatenation
[Figure: spin systems 1, 2, ..., 56 are concatenated step by step (Step 1, Step 2, ..., Step n-1, Step n) into segments (Segment 1, Segment 2, Segment 31, ..., Segment 78, Segment 79, ..., Segment 99) placed along the sequence DGRI....FKJJREKL....]

39 Conflict Segments
- Two kinds of conflict segments
  - Overlap (e.g. segment 71, segment 99)
  - Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
[Figure: Segments 71, 78, 79, 97, 98, and 99 placed on the sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]

40 A Graph Model for Spin System Linking
- G(V, E)
  - V: a set of nodes (segments).
  - E: (u, v), where u, v ∈ V and u and v are in conflict.
- Goal
  - Assign as many non-conflicting segments as possible => find the maximum independent set of G.

41 An Example of G
- Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
  Segment1: SP12->SP13->SP14
  Segment2: SP9->SP13->SP20->SP4
  Segment3: SP8->SP15->SP21
  Segment4: SP7->SP1->SP15->SP3
[Figure: Seg1, Seg2, Seg3, and Seg4 placed on the sequence, and the conflict graph on these four nodes with edges labeled SP13, SP15, and Overlap]
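
A sketch of how the conflict graph could be built from the two conflict rules (overlapping positions, shared spin system). The spin system lists are taken from the example; the start positions are hypothetical, since the transcript does not preserve where each segment sits on the sequence, and the function names are illustrative.

```python
from itertools import combinations


def conflicts(seg_a, seg_b):
    """Two segments conflict if they overlap on the sequence
    or use the same spin system."""
    (start_a, sps_a), (start_b, sps_b) = seg_a, seg_b
    pos_a = set(range(start_a, start_a + len(sps_a)))
    pos_b = set(range(start_b, start_b + len(sps_b)))
    return bool(pos_a & pos_b) or bool(set(sps_a) & set(sps_b))


def conflict_edges(segments):
    """Return the edge set E of the conflict graph G(V, E)."""
    return {(u, v) for u, v in combinations(sorted(segments), 2)
            if conflicts(segments[u], segments[v])}


# name -> (hypothetical 0-based start position, spin systems in order)
segments = {
    "Seg1": (0, ["SP12", "SP13", "SP14"]),
    "Seg2": (14, ["SP9", "SP13", "SP20", "SP4"]),
    "Seg3": (2, ["SP8", "SP15", "SP21"]),
    "Seg4": (8, ["SP7", "SP1", "SP15", "SP3"]),
}
edges = conflict_edges(segments)
print(sorted(edges))
```

With these assumed positions, Seg1 and Seg2 conflict through SP13, Seg3 and Seg4 through SP15, and Seg1 and Seg3 through an overlap, giving the three kinds of edge labels that appear in the figure.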

42 Segment weight
- The longer a segment is, the higher its weight.
- The less frequent a segment is, the higher its weight.

43 Find Maximum Weight Independent Set of G
- Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
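
To make the selection step concrete, here is a deliberately simple greedy heuristic for a weighted independent set. It is not the Boppana and Halldórsson algorithm cited above, and the weight formula `length / frequency` is only one assumed way to make longer and rarer segments score higher; the slides give no formula. The graph and the frequencies are illustrative.

```python
def segment_weight(length, frequency):
    """Assumed weight: grows with segment length, shrinks with frequency."""
    return length / frequency


def greedy_independent_set(weights, edges):
    """Pick nodes in order of decreasing weight, skipping any node
    adjacent to one that has already been picked."""
    neighbours = {v: set() for v in weights}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    chosen, blocked = [], set()
    for v in sorted(weights, key=lambda name: (-weights[name], name)):
        if v not in blocked:
            chosen.append(v)
            blocked |= neighbours[v]
    return chosen


# Segment lengths follow the four-segment example; frequencies are made up.
weights = {
    "Seg1": segment_weight(3, 1),
    "Seg2": segment_weight(4, 2),
    "Seg3": segment_weight(3, 1),
    "Seg4": segment_weight(4, 1),
}
edges = {("Seg1", "Seg2"), ("Seg1", "Seg3"), ("Seg3", "Seg4")}
chosen = greedy_independent_set(weights, edges)
print(chosen, sum(weights[v] for v in chosen))
```

A greedy pass like this carries no optimality guarantee in general; on this small graph it happens to find the heaviest independent set.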

44 An Iterative Approach
- We perform spin system generation and linking iteratively.
- Three stages.

45 First Stage
- Generate perfect spin systems;
  - Perform spin system concatenation on the newly generated perfect spin systems to generate segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments;
  - Drop spin systems (and related peaks) that are used in the resulting segments.

46 Second Stage
- Generate weak false negative spin systems.
  - Perform segment extension on the resulting segments of the first stage (using unused perfect and newly generated weak false negative spin systems);
  - Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments;
  - Drop spin systems (and related peaks) that are used in the resulting segments.

47 Third Stage
- Generate severe false negative spin systems.
  - Perform segment extension on the resulting segments of the second stage (using unused perfect and weak false negative spin systems, as well as newly generated severe false negative ones);
  - Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments.
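
The control flow shared by the three stages can be sketched as a single loop. Everything here is a hypothetical skeleton: `run_stages` takes the generation, concatenation/extension, and MaxIndSet steps as caller-supplied functions, and the toy stand-ins below (integers as spin systems, consecutive runs as segments) exist only to make the loop runnable. One simplification: the sketch also drops used spin systems after the last stage, which the third-stage slide does not mention.

```python
def run_stages(generators, concatenate, max_ind_set, min_length=3):
    """Run one generate -> link -> select -> drop pass per stage."""
    unused, accepted = [], []
    for generate in generators:
        unused = unused + generate()          # new spin systems for this stage
        candidates = [seg for seg in concatenate(accepted, unused)
                      if len(seg) >= min_length]
        accepted = max_ind_set(candidates)    # keep non-conflicting segments
        used = {sp for seg in accepted for sp in seg}
        unused = [sp for sp in unused if sp not in used]
    return accepted, unused


def toy_concatenate(segments, spin_systems):
    """Toy stand-in: merge everything into runs of consecutive integers."""
    values = sorted({sp for seg in segments for sp in seg} | set(spin_systems))
    runs = []
    for v in values:
        if runs and v == runs[-1][-1] + 1:
            runs[-1].append(v)
        else:
            runs.append([v])
    return runs


stages = [
    lambda: [1, 2, 3, 7],  # stands in for "perfect" spin systems
    lambda: [4, 8],        # stands in for "weak false negative"
    lambda: [9, 20],       # stands in for "severe false negative"
]
segments, leftover = run_stages(stages, toy_concatenate, lambda segs: segs)
print(segments, leftover)
```

In the toy run the first stage yields one segment, the second stage extends it, and the third stage forms a second segment from spin systems that were too short to keep earlier.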

48 Segment Extension
[Figure: a segment on the sequence ....FKJJREKL.... is extended with new spin systems (e.g. 109) drawn from the pool of spin systems 1, 2, ..., 45, 12, 29]

49 Segment Extension
[Figure: candidate segments (including 71, 77, 78, 97, 99 and the extended segments 97' and 99') placed on the sequence DGRGEKGRKTLATPAVRRLAMENNIKLS, before and after MaxIndSet]

50 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

51 Experimental Results
- Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:
  - Average precision: 87.5%
  - Average recall: 73.1%
- Perfect data from BMRB: 99.1%

52 Real Wet-Lab Datasets
- The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.
  Datasets                                            sbd    lbd
  # of amino acids                                    53     85
  # of amino acids assigned manually by biologists    42     80
  # of HSQC peaks                                     58     78
  # of CBCA(CO)NH peaks                               258    271
  # of HNCACB peaks                                   224    620
  # of expected CBCA(CO)NH peaks                      84     160
  # of expected HNCACB peaks                          168    320
  False positives in CBCA(CO)NH                       67.4%  41.0%
  False positives in HNCACB                           25.0%  48.4%
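
The false positive percentages in this table are consistent with a simple ratio: the share of observed peaks beyond the expected number. The sketch below checks that arithmetic; the function name is hypothetical, and the formula is an inference from the table (it assumes every expected peak is among the observed ones), not a definition given on the slide.

```python
def false_positive_rate(observed, expected):
    """Fraction of observed peaks in excess of the expected peaks."""
    return (observed - expected) / observed


table = {
    ("sbd", "CBCA(CO)NH"): (258, 84),
    ("lbd", "CBCA(CO)NH"): (271, 160),
    ("sbd", "HNCACB"): (224, 168),
    ("lbd", "HNCACB"): (620, 320),
}
rates = {key: round(100 * false_positive_rate(*counts), 1)
         for key, counts in table.items()}
for key, rate in rates.items():
    print(key, f"{rate}%")
```

The expected counts themselves follow from the manual assignments: 2 CBCA(CO)NH peaks and 4 HNCACB peaks per assigned residue (42 and 80 residues).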

53 Experimental Results on Real Data
  Datasets                     sbd   lbd
  # of amino acids             53    85
  # of assigned amino acids    42    81
  # of HSQC peaks              58    78
  # of CBCANH peaks            224   620
  # of CBCA(CO)NH peaks        258   271

                   # correctly assigned   # assigned   Accuracy   Recall
  Method on sbd    32                     35           91.4%      76.2%
  Method on lbd    56                     67           83.6%      70.0%
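
The reported accuracy and recall can be reproduced from the counts with the usual precision and recall ratios. This is a check of the arithmetic rather than the authors' evaluation script, and the function name is hypothetical. One assumption is flagged in the code: for lbd the reference count is taken as 80, the manually assigned count from the dataset description, because 56/80 gives the reported 70.0%.

```python
def precision_recall(correct, assigned, reference):
    """Precision = correct / assigned; recall = correct / reference."""
    return correct / assigned, correct / reference


results = {
    "sbd": precision_recall(correct=32, assigned=35, reference=42),
    # Assumption: 80 manually assigned residues, which matches the 70.0% recall.
    "lbd": precision_recall(correct=56, assigned=67, reference=80),
}
for name, (precision, recall) in results.items():
    print(name, f"{100 * precision:.1f}%", f"{100 * recall:.1f}%")

avg_precision = sum(p for p, _ in results.values()) / len(results)
avg_recall = sum(r for _, r in results.values()) / len(results)
print(f"average: {100 * avg_precision:.1f}% {100 * avg_recall:.1f}%")
```

The two averages agree with the 87.5% precision and 73.1% recall quoted in the summary of the experimental results.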

54 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

55 Conclusion
- We model the backbone assignment problem as a constraint satisfaction problem.
- This problem is solved using a natural language parsing technique (both bottom-up and top-down approaches).
- The same approach seems to work for a large class of noise reduction problems that are discrete in nature.

