
ISA 2013. 05. 28 Kim Hye mi. Introduction Input Spectrum data (Protein database) Peptide assignment Peptide validation manual validation PeptideProphet.


1 ISA 2013. 05. 28 Kim Hye mi

2 Introduction. Workflow: input: spectrum data (protein database); peptide assignment; peptide validation (manual validation, PeptideProphet, target/decoy); protein assignment and validation; output: interpretation, quantitation.

3 Introduction. Proteome researchers must devise a way to distinguish correct from incorrect peptide identifications, and the target-decoy search strategy is an effective way to do so. The strategy is simple to implement. The 'target' protein sequence database represents the protein mixture to be analyzed. The 'decoy' database is built by reversing the target protein sequences, which minimizes the number of peptide sequences in common between target and decoy.
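The construction described on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's tool: the function names, the `DECOY_` prefix, and the two toy sequences are assumptions made for the example.

```python
def reverse_decoys(targets, prefix="DECOY_"):
    """Build decoy entries by reversing each target protein sequence.

    `targets` maps protein names to amino acid strings. The prefix used
    to mark decoy entries is a hypothetical naming convention.
    """
    return {prefix + name: seq[::-1] for name, seq in targets.items()}


def concatenate(targets, decoys):
    """Merge target and decoy entries into one composite database."""
    combined = dict(targets)
    combined.update(decoys)
    return combined


# Toy sequences, invented for illustration only.
targets = {"P1": "MKTAYIAKQR", "P2": "GLSDGEWQLV"}
decoys = reverse_decoys(targets)
database = concatenate(targets, decoys)
```

Marking decoy entries by name makes it easy to count decoy hits after the search, which is what the later estimates rely on.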

4 Introduction. Target-decoy search strategy: with FP estimations it is possible to derive other measurements that help evaluate and compare scoring methods and data sets. Rather than deciding whether individual peptide-spectral matches (PSMs) are correct or incorrect, the composite target-decoy database evaluates false positive (FP) rates in large PSM populations.

5 Introduction. Measurements derived from decoy database search results.
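The measurements themselves are not listed in the transcript. As a hedged sketch, the following assumes the usual convention for a concatenated search in which target and decoy false positives are equally likely: every decoy hit stands for one undetected incorrect target hit, so the estimated number of false positives is twice the decoy count. The function name and the example counts are assumptions.

```python
def target_decoy_estimates(total_hits, decoy_hits):
    """Estimate FP count, FP rate and precision from decoy hits.

    Assumes incorrect matches fall on target and decoy sequences with
    equal probability, so estimated FPs = 2 * decoy hits.
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= decoy_hits <= total_hits:
        raise ValueError("decoy_hits must lie between 0 and total_hits")
    estimated_fp = 2 * decoy_hits
    fp_rate = estimated_fp / total_hits
    return {
        "estimated_fp": estimated_fp,
        "fp_rate": fp_rate,
        "precision": 1 - fp_rate,
    }


# Invented counts: 1000 PSMs pass the filter, 25 of them are decoy hits.
summary = target_decoy_estimates(total_hits=1000, decoy_hits=25)
```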

6 Introduction. Target-decoy search strategy, points covered: the two assumptions mentioned are reasonable (target and decoy databases do not overlap; target and decoy false positives are equally likely); concatenated database searches are preferable to separate searches; estimating the theoretical error of target-decoy false positive rates; alternate decoy database constructions can be similarly effective.

7 Assumption 1: target and decoy databases do not overlap. For decoy hits to be necessarily incorrect, their sequences must not be present in the target database. Very short peptides were found in both the target and decoy databases, but practically no peptides (0.02%) longer than eight amino acids were in common between the two.
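One way to check this kind of overlap is to digest every target sequence and its reversal and compare the resulting peptide sets above a length cutoff. The sketch below is an assumption-laden simplification: it cleaves after every K or R (no proline rule, no missed cleavages), and the function names are hypothetical.

```python
import re


def tryptic_peptides(sequence):
    """Split a protein after every K or R (simplified trypsin rule)."""
    return re.findall(r"[^KR]*[KR]|[^KR]+", sequence)


def shared_fraction(target_seqs, min_length):
    """Fraction of distinct target peptides that also occur among the
    peptides of the reversed (decoy) sequences, counting only peptides
    with at least `min_length` residues."""
    target = {p for s in target_seqs
              for p in tryptic_peptides(s) if len(p) >= min_length}
    decoy = {p for s in target_seqs
             for p in tryptic_peptides(s[::-1]) if len(p) >= min_length}
    if not target:
        return 0.0
    return len(target & decoy) / len(target)
```

Running `shared_fraction` with increasing `min_length` on a real database would show how quickly the overlap shrinks as peptides get longer.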

8 Assumption 1: target and decoy databases do not overlap. International Protein Index sequence database.

9 Assumption 2: target and decoy false positives are equally likely. The validity of this assumption can be tested in two ways. First, the search algorithm must be presented with equal numbers of target and decoy peptides. Second, the necessarily incorrect peptide hits should be equally distributed between target and decoy hits.

10 Assumption 2: target and decoy false positives are equally likely. The distributions of considered peptides were practically the same for target and decoy peptides, regardless of mass tolerance. (Figure panels: target database, decoy database.)

11 Assumption 2: target and decoy false positives are equally likely. Comparing these curves indicated substantial correspondence between target- and decoy-derived peptides.

12 Assumption 2: target and decoy false positives are equally likely. Top-ranked peptides showed a strong bias toward target database hits, unlike lower-ranked matches. We extended this idea by modifying MS/MS spectra to prevent any correct identifications from being made.

13 Concatenated database searches are preferable to separate searches. When decoy sequences are searched separately, target and decoy sequences cannot compete for the top-ranked score, and decoy hits may often receive elevated scores relative to other top-ranked hits. In a concatenated search, MS/MS spectra are searched once against a single database consisting of target and decoy sequences.
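The difference between the two search modes can be illustrated with a toy model. It assumes each spectrum has already been scored against its best target and best decoy candidate; the data layout, the function names, and the tie-breaking in favor of the target are all assumptions made for the example.

```python
def concatenated_top_hits(spectra):
    """One top-ranked hit per spectrum: target and decoy compete.

    `spectra` is a list of (best_target_score, best_decoy_score) pairs.
    Ties go to the target here, which is an arbitrary choice.
    """
    hits = []
    for target_score, decoy_score in spectra:
        if decoy_score > target_score:
            hits.append(("decoy", decoy_score))
        else:
            hits.append(("target", target_score))
    return hits


def separate_top_hits(spectra):
    """Two top-ranked hits per spectrum: no competition takes place."""
    target_hits = [("target", t) for t, _ in spectra]
    decoy_hits = [("decoy", d) for _, d in spectra]
    return target_hits, decoy_hits


# Invented scores for three spectra.
spectra = [(3.1, 1.2), (1.0, 1.4), (2.2, 2.0)]
competed = concatenated_top_hits(spectra)
target_only, decoy_only = separate_top_hits(spectra)
```

In the separate search every spectrum contributes a decoy hit, even when a target candidate would have outscored it, whereas the concatenated search reports a decoy only where it wins.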

14 Concatenated database searches are preferable to separate searches. The separate searching method forces one to assume that all peptide assignments are incorrect below the score at which decoy hits outnumber target hits, leading to overestimated FP rates.

15 Concatenated database searches are preferable to separate searches. Separate searching overestimates the FP rate: a separate search cannot estimate correct identifications in the range where decoy hits outnumber target hits (0.8 – 2.3). In a concatenated search, target and decoy sequences compete, making it possible to estimate the distribution of low-scoring correct identifications (0.8 – 2.3).

16 Concatenated database searches are preferable to separate searches. Direct comparison of FP rates: separate database searches can overestimate FP rates by > 35% relative to concatenated searches.

17 Estimating theoretical error of target-decoy false positive rates. One criticism of the target-decoy approach is that one can never know exactly which or how many selected PSMs are incorrect. These estimations can be expected to deviate substantially from the actual number of FPs when the number of returned hits is very small or the number of returned decoy hits is very large.

18 Estimating theoretical error of target-decoy false positive rates. Based on these findings, it was possible to place confidence intervals on target-decoy estimations, given the number of total hits returned and the estimated precision rate derived from the decoy hits. A program was written to simulate the FP rate: it randomly assigned each of the remaining incorrect hits a 'target' or 'decoy' state. A larger standard deviation indicates less reliable precision rate estimations.
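The simulation described on this slide can be approximated as follows. This is a sketch under stated assumptions rather than the original program: each incorrect hit is labeled target or decoy with probability 0.5, the precision estimate is taken as 1 - 2 * decoys / total, and the function name, trial count, and seed are invented for the example.

```python
import random
import statistics


def simulate_precision_sd(total_hits, true_precision, trials=2000, seed=0):
    """Standard deviation of target-decoy precision estimates.

    For each trial, every incorrect hit is randomly labeled 'decoy'
    with probability 0.5, and precision is re-estimated from the
    resulting decoy count.
    """
    rng = random.Random(seed)
    n_incorrect = round(total_hits * (1 - true_precision))
    estimates = []
    for _ in range(trials):
        decoys = sum(rng.random() < 0.5 for _ in range(n_incorrect))
        estimates.append(1 - 2 * decoys / total_hits)
    return statistics.pstdev(estimates)


# Same underlying precision, two sample sizes.
sd_small = simulate_precision_sd(100, 0.9, trials=500)
sd_large = simulate_precision_sd(10000, 0.9, trials=500)
```

Comparing `sd_small` with `sd_large` shows the effect the slide describes: with fewer returned hits, the precision estimate scatters more widely around the true value.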

19 Estimating theoretical error of target-decoy false positive rates. A log-transformed version of the previous figure.

20 Estimating theoretical error of target-decoy false positive rates. The relationship between the slopes and precision: the slopes of these lines are related to the underlying precision rate.

21 Estimating theoretical error of target-decoy false positive rates. The relationship between the slopes and precision suggests that the expected standard deviation of a precision rate estimation can be calculated from the precision rate and sample size.

22 Estimating theoretical error of target-decoy false positive rates. The relationship between the standard deviation (σ) of error and the sample size (N).

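23 Estimating theoretical error of target-decoy false positive rates. The expected standard deviation of error: therefore, the expected standard deviation of the error can be expressed using the following equation (the equation is not preserved in this transcript).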
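The slide's own formula is not available here. As a hedged illustration of how such an expression can arise, the derivation below assumes that each of the N(1 - P) incorrect hits is independently a decoy with probability 1/2 and that precision is estimated as 1 - 2D/N, where D is the number of decoy hits. The symbols D and \hat{P} are introduced for this sketch, and the result may differ in form from the presenter's equation.

```latex
\begin{align*}
D &\sim \mathrm{Binomial}\!\left(N(1-P),\ \tfrac{1}{2}\right),
  & \operatorname{Var}(D) &= \frac{N(1-P)}{4}, \\
\hat{P} &= 1 - \frac{2D}{N},
  & \operatorname{Var}(\hat{P}) &= \frac{4\operatorname{Var}(D)}{N^{2}} = \frac{1-P}{N}, \\
\sigma &= \sqrt{\frac{1-P}{N}},
  & \log \sigma &= -\tfrac{1}{2}\log N + \tfrac{1}{2}\log(1-P).
\end{align*}
```

Under these assumptions, σ depends only on the precision rate P and the sample size N, which agrees with the qualitative statement on the preceding slides.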

24 Alternate decoy database constructions can be similarly effective. Protein sequence reversal; a modified sequence reversal method; two stochastic methods: random and Markov chain model.

25 Alternate decoy database constructions can be similarly effective. Both stochastic databases produced more peptides and were constrained to have amino acid compositions similar to the target database.
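The two stochastic constructions can be sketched roughly as below. The slides do not give the exact procedures, so both functions are assumptions: the first samples residues independently from the target's amino acid frequencies, and the second uses a first-order Markov chain over adjacent residues. Function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_decoy(sequence, rng):
    """Sample residues independently, weighted by the target's composition."""
    if not sequence:
        return ""
    counts = Counter(sequence)
    residues = list(counts)
    weights = [counts[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=len(sequence)))


def markov_decoy(sequence, rng):
    """Generate a decoy from first-order residue-to-residue transitions."""
    if not sequence:
        return ""
    transitions = defaultdict(list)
    for current, following in zip(sequence, sequence[1:]):
        transitions[current].append(following)
    current = rng.choice(sequence)
    out = [current]
    while len(out) < len(sequence):
        options = transitions.get(current)
        # Fall back to composition sampling when a residue has no successor.
        current = rng.choice(options) if options else rng.choice(sequence)
        out.append(current)
    return "".join(out)


rng = random.Random(1)
target = "MKTAYIAKQRGLSDGEWQLV"  # toy sequence, invented for illustration
decoy_random = random_decoy(target, rng)
decoy_markov = markov_decoy(target, rng)
```

Both decoys keep the target's length and alphabet, so their amino acid compositions stay close to the target's on average, which is the constraint named on the slide.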

26 Alternate decoy database constructions can be similarly effective. Incorrect identifications were equally distributed. Both stochastic methods performed essentially identically to one another. The distribution of target and decoy sequences being incorrectly matched was not the desired 50% but decidedly skewed (63%). For estimating FP identifications, use a factor of 1.6 (≈1/0.63).
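The factor on this slide follows from simple scaling: if a share s of all incorrect matches lands on decoy sequences, the total number of incorrect matches is the decoy count divided by s. The sketch below assumes the 63% figure is the decoy share; the function name and example count are hypothetical.

```python
def estimated_false_positives(decoy_hits, decoy_share=0.5):
    """Scale decoy hits up to an estimate of all incorrect hits.

    `decoy_share` is the assumed fraction of incorrect matches that
    fall on decoy sequences: 0.5 gives the usual factor of 2, and
    0.63 gives roughly 1.6.
    """
    if not 0 < decoy_share <= 1:
        raise ValueError("decoy_share must be in (0, 1]")
    return decoy_hits / decoy_share


factor_skewed = 1 / 0.63  # about 1.59, rounded to 1.6 on the slide
fp_balanced = estimated_false_positives(50)        # decoy share of 50%
fp_skewed = estimated_false_positives(63, 0.63)    # decoy share of 63%
```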

