
Statistical Consideration for Identification and Quantification in Top-Down Proteomics Richard LeDuc National Center for Genome Analysis Support Discovery.




1 Statistical Consideration for Identification and Quantification in Top-Down Proteomics Richard LeDuc National Center for Genome Analysis Support Discovery Omics with Top Down Proteomics

2 Acknowledgements Leonid Zamdborg Shannee Babai Bryon Early Ian Spauling Kevin Glowacz Eric Bluhm Vinayak Viswanathan Yong-Bin Kim Ryan Fellers Tom Januszyk Brian Cis Chris Strouse Seyoung Sohn Greg Taylor Joe Sola Lee Bynum Andrew Birck All the other numerous members of the KRG who have contributed insights over the years. Drs. Neil Kelleher, Paul Thomas, and Andy Forbes, and ProSight Development Team (past and present) Yury Bukhman, James McCurdy, Adam Halstead, Irene Ong (Area 3), Mary Lipton (PNNL), Kathryn Richmond (Enabling Technologies) and others Proteomics Core Reid Townsend Petra Gilmore Cheryl Lichti James Malone Alan Davis Michael Gross (NCRR Mass Spec) Henry Rohrs (NCRR Mass Spec) Ron Bose (Oncology) Mike Boyne (FDA) Jeffry Hiken (Genetics) Le-Shin Wu, Carrie Ganote, Tom Doak, Bill Barnett, and a cast of thousands Limbrick Laboratory David Limbrick Diego Morales Holtzman Laboratory David Holtzman Rick Perrin Jacqueline Payton Chengjie Xiong (Biostatistics) National Center for Genome Analysis Support Washington Univ. School of Medicine The Kelleher Research Group

3 Differential Omics Studies 1. RNA-seq, bottom-up proteomics, metabolomics. 2. Looking for a list of discovered entities that have different expression levels between treatments. 3. Very popular for target discovery. 4. Frequently done on organisms before a genome is completed.

4 ‘P score’ = $P_{f,n} = \dfrac{(xf)^{n}\, e^{-xf}}{n!}$, where f is the number of input fragment ions, n is the number of matches, and $M_a$ is the mass accuracy. F. Meng, B. Cargile, L. Miller, J. Johnson, and N. Kelleher, Nat. Biotechnol., 2001, 19, 952-957. “Kelleher P-Score” Example
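The Poisson form on this slide can be evaluated directly. The sketch below is a minimal illustration, not the ProSight implementation: the slide does not define x, so it is passed in as a parameter and assumed here to be the per-fragment chance of a random match. The function name `poisson_p_score` is hypothetical.

```python
import math


def poisson_p_score(x: float, f: int, n: int) -> float:
    """Evaluate P_{f,n} = (x*f)^n * exp(-x*f) / n! as written on the slide.

    x: assumed per-fragment probability of a chance match (not defined on the slide)
    f: number of input fragment ions
    n: number of matching fragment ions
    """
    if x <= 0 or f <= 0 or n < 0:
        raise ValueError("x and f must be positive and n must be non-negative")
    lam = x * f  # expected number of chance matches
    # Work in log space so large n does not overflow the factorial.
    log_p = n * math.log(lam) - lam - math.lgamma(n + 1)
    return math.exp(log_p)


if __name__ == "__main__":
    # Illustrative values only: 100 fragment ions, 12 matches, x = 0.01.
    print(poisson_p_score(x=0.01, f=100, n=12))
```

Smaller values indicate that the observed number of matches is less likely to have arisen by chance under this model.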

5 Modeling the Scrambled P-Scores [Figure panels: Motivation; Goodness of Fit] 9,839 MS/MS queries (MS1 and MS2 data)

6 Better is better, but the easy ones are easy

7 Computers Ask the Darndest Questions

8 Top Down Proteomics! Three pillars of proteomics: identification, characterization, and quantification. Top-down proteomic studies are underway. These are large and complex studies. (At several institutions, a typical production bottom-up study would have 200+ LC runs.)

9 Top Down Proteomics Biometrics Sources of variation: 1. Intensity calculation 2. LC alignment 3. Mass spec physics 4. Separation (different fractions, etc.) 5. Protein isolation (ChIP, RBC ghosts, etc.) 6. Tissue variation 7. Individual variation 8. Population variation 9. Random and systemic errors

10 Experimental Design Ronald A. Fisher (1926): "The Arrangement of Field Experiments". All measurements have errors. All biological systems have individual variation. The goal of experimental design is to structure the experiment so that the variation can be partitioned, typically testing the variation between groups against the variation within groups. [Design diagram: Healthy Group and Diseased Group, each with Sub 1 and Sub 2 and replicates R1 to R8]
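Testing between-group variation against within-group variation can be sketched as a one-way ANOVA F statistic. This is a simplified illustration under stated assumptions: it treats each value as an independent unit and ignores the nesting of replicates within subjects shown in the design diagram, and the function name `f_statistic` and the example numbers are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """Return the one-way ANOVA F ratio: between-group MS / within-group MS.

    groups: list of lists of measurements, one inner list per treatment group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


if __name__ == "__main__":
    healthy = [1.0, 2.0, 3.0]   # made-up intensities
    diseased = [2.0, 3.0, 4.0]  # made-up intensities
    print(f_statistic([healthy, diseased]))
```

A full analysis of the slide's design would add a subject term so that replicate-level noise is not mistaken for biological variation.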

11 Typical Results: Human RBC Ghosts [Figure: RAP1A in Control Samples vs. PNH Samples; gel panels for Coomassie, Catalase, and Peroxiredoxin with molecular-weight markers; lanes CON_1, CON_6, PNH_9, PNH_10]

12 Populations of Experiments Instead of doing 1 experiment, you are doing an unknown number of experiments. The number of experiments is determined by how many unique entities are observed consistently over the entire set of observations. [Figure: Control Samples vs. PNH Samples]
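Counting the entities that are observed consistently across the whole set of observations can be sketched as a set intersection. This is only one possible reading of "consistently" (present in every run); the function name `consistently_observed` and the identifiers in the example are hypothetical.

```python
def consistently_observed(runs):
    """Return the set of entity IDs that appear in every run.

    runs: iterable of iterables of entity identifiers, one per LC run.
    """
    run_sets = [set(r) for r in runs]
    if not run_sets:
        return set()
    return set.intersection(*run_sets)


if __name__ == "__main__":
    runs = [
        ["A", "B", "C"],
        ["B", "C"],
        ["C", "B", "D"],
    ]
    shared = consistently_observed(runs)
    # Each shared entity is effectively its own experiment.
    print(len(shared), sorted(shared))
```

A looser rule, such as presence in a minimum fraction of runs, would change the count and therefore the number of tests being performed.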

13 Typical Results: Breast Cancer Model

14 Sources of Variation: The Model Where i = 1 or 2 and represents the two preparations; j = 1 to 3 for each digestion within a given preparation; k = 1 to 3 for each injection (or run) within each digestion; and l = 1 to the number of peptides for the given protein. Under this model, let
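The model equation itself did not survive in this transcript. One nested random-effects form that is consistent with the indices defined above is sketched below; it is an assumption for illustration, not the presenter's equation, and every symbol other than i, j, k, and l is hypothetical.

```latex
% Assumed nested model: preparation, digestion within preparation,
% injection within digestion, and a peptide effect.
y_{ijkl} = \mu + P_i + D_{j(i)} + R_{k(ij)} + \pi_l + \varepsilon_{ijkl},
\qquad
P_i \sim N(0, \sigma_P^2),\;
D_{j(i)} \sim N(0, \sigma_D^2),\;
R_{k(ij)} \sim N(0, \sigma_R^2),\;
\varepsilon_{ijkl} \sim N(0, \sigma_\varepsilon^2)
```

Under a form like this, each variance term corresponds to one stage of the workflow, so its estimate indicates where replication is most useful.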

15 Variance Component Estimates

16 Power Calculations Human Subjects: n=20; Inbred Mice: n=5
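The link between variability and required sample size can be sketched with the standard normal-approximation formula for a two-sided, two-sample comparison of means. This is a generic textbook calculation, not the one behind the numbers on the slide; the function name `samples_per_group` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate subjects needed per group to detect a mean difference.

    sigma: assumed standard deviation within each group
    delta: smallest difference in means worth detecting
    alpha: two-sided significance level
    power: desired probability of detecting the difference
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Doubling the standard deviation roughly quadruples the sample size.
    print(samples_per_group(sigma=1.0, delta=1.0))
    print(samples_per_group(sigma=2.0, delta=1.0))
```

This is why a more variable population generally needs more subjects than a genetically uniform one for the same effect size.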

17 Systems Analysis What to do with the laundry lists of significant genes? Gene Ontology analysis. Gene Set Enrichment Analysis. Often paired with RNA or metabolomic data. Creates a third level of analysis.
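One common way to ask whether a list of significant genes is enriched for a Gene Ontology term is a hypergeometric over-representation test. The sketch below shows that test only; it is not the rank-based Gene Set Enrichment Analysis algorithm, and the function name `enrichment_p_value` and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: total number of genes considered
    annotated:  genes in the population carrying the term
    selected:   size of the significant-gene list
    overlap:    significant genes that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the hypergeometric probabilities from the observed overlap upward.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 40 of 2000 genes carry the term, 8 of 100 hits do.
    print(enrichment_p_value(population=2000, annotated=40, selected=100, overlap=8))
```

Because many terms are usually tested at once, the resulting p-values would normally be adjusted for multiple comparisons.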

18 To Review Everything is in place for top-down proteomic studies. In any discovery omic study, extreme care must be taken: do lots of pilot work to understand the behavior of your analytic system. Technology and mathematical formalism do not trump biology. (Bad experimental design results in bad experiments.)

19 Shameless NCGAS Plug Funded by National Science Foundation 1. Large memory clusters for assembly 2. Bioinformatics consulting for biologists 3. Optimized software for better efficiency Partner Institutions: Extreme Science and Engineering Discovery Environment (XSEDE); Texas Advanced Computing Center (TACC) at the University of Texas at Austin; San Diego Supercomputer Center (SDSC) at the University of California, San Diego; Pittsburgh Supercomputing Center (PSC). Open for business at: http://ncgas.org Questions?

20 Acknowledgements & disclaimer This material is based upon work supported in part by the National Science Foundation under Grant No. ABI-1062432. This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute. Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

21 License terms Please cite as: LeDuc, R.D., Statistical Consideration for Identification and Quantification in Top-Down Proteomics, presented at the 2013 Sanibel Conference on Top Down Mass Spectrometry, St. Petersburg FL, 1/27/2013. Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse. Except where otherwise noted, contents of this presentation are copyright 2011 by the Trustees of Indiana University. This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

