Using Prior Knowledge to Improve Scoring in High-Throughput Top-Down Proteomics Experiments Rich LeDuc Le-Shin Wu.


1 Using Prior Knowledge to Improve Scoring in High-Throughput Top-Down Proteomics Experiments Rich LeDuc Le-Shin Wu

2 The “Scoring” Problem Proteoforms are hypotheses about what was in MS. The model “knows” the process. Output is a ranked list of hypotheses. Science builds on prior knowledge. [Figure: competing hypotheses (1 2 3 4) pass through a process model to give a ranked list of hypotheses (2 3 1 4), with a measure of confidence, under a given model.]

3 Meng-Kelleher p-score ‘P score’ = $P_{f,n} = \frac{(xf)^n \, e^{-xf}}{n!}$, where f is the number of matching fragment ions, n is the number of matches, and $M_a$ is the mass accuracy. F. Meng, B. Cargile, L. Miller, J. Johnson, and N. Kelleher, Nat. Biotechnol., 2001, 19, 952-957.
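As a minimal sketch, the Poisson expression on this slide can be evaluated directly. The transcript does not preserve how x is defined, so it is treated here as a caller-supplied per-fragment chance of a random match; that reading, and the function name `p_score`, are assumptions for illustration.

```python
import math


def p_score(n: int, f: int, x: float) -> float:
    """Poisson probability of exactly n matches when the mean is x * f.

    n: number of matches
    f: fragment-ion count, as named on the slide
    x: assumed per-fragment probability of a chance match (not defined in the transcript)
    """
    if n < 0 or f < 0 or x < 0:
        raise ValueError("n, f and x must be non-negative")
    mean = x * f
    # (xf)^n * e^(-xf) / n!
    return mean ** n * math.exp(-mean) / math.factorial(n)
```

A smaller value means the observed number of matches is less likely to have arisen by chance under this model.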

4

5 Specific Example

6 Bayesian Approach
• Prior probability of the proteoform
• Likelihood of the proteoform given the observed data
• Probability of the data
• Posterior probability of the proteoform after making the observations
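The four labelled terms on this slide are the parts of Bayes' theorem. Writing H for a candidate proteoform and D for the observed data (symbols chosen here for illustration, not taken from the slide), the standard statement is:

```latex
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
```

Here P(H) is the prior, P(D | H) is the likelihood, P(D) is the probability of the data, and P(H | D) is the posterior.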

7 The Scoring Model From Bayes' theorem we have: [equation] From independence we can: [equation] Which gives our final scoring function: [equation]
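The equations on this slide did not survive in the transcript. One plausible reading, offered as an assumption rather than as the presenters' exact formula, is that the MS1 precursor data D_1 and the MS2 fragment data D_2 are treated as independent given a candidate proteoform H_j, so the likelihood factors:

```latex
P(H_j \mid D_1, D_2)
  = \frac{P(D_1 \mid H_j)\, P(D_2 \mid H_j)\, P(H_j)}
         {\sum_{i} P(D_1 \mid H_i)\, P(D_2 \mid H_i)\, P(H_i)}
```

The symbols H_j, D_1 and D_2 are hypothetical names introduced for this sketch.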

8 MS1 Generative Model Given a certain theoretical proteoform, what is the probability of seeing the observed precursor mass? Likelihood Fun Facts
• Area does not equal one.
• Need some level for “wrong precursor mass”.
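A minimal sketch of an MS1 likelihood with the two properties listed on this slide might look like the following. The Gaussian shape, the `tolerance_da` and `floor` parameters, and their default values are all assumptions made for illustration; the transcript does not say which functional form the presenters used.

```python
import math


def ms1_likelihood(observed_mass: float,
                   theoretical_mass: float,
                   tolerance_da: float = 1.0,
                   floor: float = 1e-6) -> float:
    """Unnormalized likelihood of an observed precursor mass.

    The value peaks at 1.0 when the masses agree, so its area is not one,
    and it never drops below `floor`, which stands in for the
    "wrong precursor mass" level.
    """
    if tolerance_da <= 0:
        raise ValueError("tolerance_da must be positive")
    delta = observed_mass - theoretical_mass
    # Gaussian-shaped bump centred on the theoretical mass (assumed form).
    bump = math.exp(-0.5 * (delta / tolerance_da) ** 2)
    return max(bump, floor)
```

Keeping a non-zero floor means a proteoform with a mismatched precursor mass is penalized rather than excluded outright.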

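9 MS2 Generative Model [Figure: probability versus fragment mass, with the mass axis running from 0 to I; labels $w_i$ and $m_i$ mark the fragment peaks, and the baseline is labelled Noise = k.]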
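The figure labels on this slide (a noise level k, with w_i and m_i marking fragment peaks) suggest a baseline-plus-peaks model. The sketch below is one way to write such a model, assuming Gaussian peaks of width `sigma` at each theoretical fragment mass; the peak shape, `sigma`, and both function names are hypothetical and not taken from the presentation.

```python
import math


def ms2_density(mass, fragment_masses, weights, noise_k, sigma=0.01):
    """Noise baseline k plus weighted Gaussian peaks at the masses m_i."""
    if len(fragment_masses) != len(weights):
        raise ValueError("fragment_masses and weights must have equal length")
    if noise_k <= 0 or sigma <= 0:
        raise ValueError("noise_k and sigma must be positive")
    total = noise_k
    norm = sigma * math.sqrt(2.0 * math.pi)
    for m_i, w_i in zip(fragment_masses, weights):
        z = (mass - m_i) / sigma
        total += w_i * math.exp(-0.5 * z * z) / norm
    return total


def ms2_log_likelihood(observed_masses, fragment_masses, weights, noise_k, sigma=0.01):
    """Sum of log densities, treating observed fragment masses as independent."""
    return sum(
        math.log(ms2_density(m, fragment_masses, weights, noise_k, sigma))
        for m in observed_masses
    )
```

An observed mass that lands on a theoretical fragment raises the log-likelihood, while an unexplained mass contributes only the noise term.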

10

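11 Lambda Scores Assume that prior to scoring, each sequence had an equal probability of being the correct sequence. This means that if we are considering k sequences, then our prior probability is just $P(H_i) = 1/k$. So then, the ratio of the posterior over the prior is $P(H_i \mid D) / P(H_i) = k \, P(H_i \mid D)$, where $H_i$ is the i-th candidate sequence and D is the observed data.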
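The posterior-over-prior ratio described on this slide can be sketched for k candidate sequences with a uniform prior. The transcript does not show whether the lambda score is this ratio itself or a transform of it, so the code computes only the ratio; the function name `posterior_over_prior` is hypothetical.

```python
def posterior_over_prior(likelihoods):
    """Posterior/prior ratio for each of k candidates under a uniform prior.

    likelihoods: one non-negative likelihood value per candidate sequence.
    """
    k = len(likelihoods)
    if k == 0:
        raise ValueError("need at least one candidate")
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("at least one likelihood must be positive")
    prior = 1.0 / k
    # With a uniform prior, the prior cancels in Bayes' theorem, so the
    # posterior for each candidate is its share of the total likelihood.
    return [(value / total) / prior for value in likelihoods]
```

A ratio above 1 means the data made that candidate more probable than it was before scoring; the ratios always sum to k.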

12 Lambda Spread The lambda score spreads hits with the same number of matching fragment ions.

13 Room for Improvement One set of real observations scored against 890,000 random “theoretical” proteoforms. [Figure panels: “Initial Version”; “I to Max of all proteoforms Theoretical Mass”.]

14 Scoring Models Compared Ahlf, D.R., Compton, P.D., Tran, J.C., Early, B.P., Thomas, P.M., Kelleher, N.L. “Evaluation of the Compact High-Field Orbitrap for Top-Down Proteomics of Human Cells”, J. Proteome Res., 2012, 11, 4308-4314. PMCID: PMC3437942.

15 Future Directions
• Add oxidation for MS1.
• Improve modeling of various processes.
• Incorporate into a search engine.

16 Conclusions
• Include prior knowledge: Science builds on itself.
• There is a system that gives a framework for including prior knowledge in models.
• This particular implementation is better than older scoring systems, and it can improve!

17 Acknowledgements and Questions
• Kelleher group for providing the data.
• All the many colleagues with whom I have worked on this project over the years.
• Of course all the related funding agencies, but specifically NSF ABI-1062432.

18 Acknowledgements & disclaimer
• This material is based upon work supported by the National Science Foundation under Grant No. ABI-1062432.
• This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute.
• Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

19 License terms
• Please cite as: LeDuc, R.D., Using Prior Knowledge to Improve Scoring in High-Throughput Top-Down Proteomics Experiments, presented at ASMS 2013 Minneapolis MN June 2013.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, contents of this presentation are copyright 2013 by the Trustees of Indiana University.
• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

