Statistical Consideration for Identification and Quantification in Top-Down Proteomics Richard LeDuc National Center for Genome Analysis Support Discovery.


Statistical Consideration for Identification and Quantification in Top-Down Proteomics Richard LeDuc National Center for Genome Analysis Support Discovery Omics with Top Down Proteomics

Acknowledgements Leonid Zamdborg Shannee Babai Bryon Early Ian Spauling Kevin Glowacz Eric Bluhm Vinayak Viswanathan Yong-Bin Kim Ryan Fellers Tom Januszyk Brian Cis Chris Strouse Seyoung Sohn Greg Taylor Joe Sola Lee Bynum Andrew Birck All the other numerous members of the KRG who have contributed insights over the years. Drs. Neil Kelleher, Paul Thomas, and Andy Forbes, and ProSight Development Team (past and present) Yury Bukhman, James McCurdy, Adam Halstead, Irene Ong (Area 3), Mary Lipton (PNNL), Kathryn Richmond (Enabling Technologies) and others Proteomics Core Reid Townsend Petra Gilmore Cheryl Lichti James Malone Alan Davis Michael Gross (NCRR Mass Spec) Henry Rohrs (NCRR Mass Spec) Ron Bose (Oncology) Mike Boyne (FDA) Jeffry Hiken (Genetics) Le-Shin Wu, Carrie Ganote, Tom Doak, Bill Barnett, and a cast of thousands Limbrick Laboratory David Limbrick Diego Morales Holtzman Laboratory David Holtzman Rick Perrin Jacqueline Payton Chengjie Xiong (Biostatistics) National Center for Genome Analysis Support Washington Univ. School of Medicine The Kelleher Research Group

Differential Omics Studies
1. RNA-seq, bottom-up proteomics, metabolomics
2. Looking for a list of discovered entities that have different expression levels between treatments
3. Very popular for target discovery
4. Frequently done on organisms before a genome is completed

“Kelleher P-Score”
‘P score’: $P_{f,n} = \dfrac{(xf)^{n}\, e^{-xf}}{n!}$
where f is the number of input fragment ions, n is the number of matches, and M_a is the mass accuracy.
F. Meng, B. Cargile, L. Miller, J. Johnson, and N. Kelleher, Nat. Biotechnol., 2001, 19,
Example
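The P score has the form of a Poisson probability with mean xf, so it can be evaluated directly. The sketch below is only an illustration of that formula: it treats x as a per-fragment probability of a chance match that is supplied by the caller (an assumption, since the slide does not show how x follows from the mass accuracy), and the function name `p_score` is made up for this example.

```python
import math


def p_score(x: float, f: int, n: int) -> float:
    """Poisson probability of exactly n chance matches among f fragment ions.

    x is assumed to be the per-fragment probability of a random match;
    its relation to the mass accuracy is not modeled here.
    """
    lam = x * f  # expected number of chance matches
    return (lam ** n) * math.exp(-lam) / math.factorial(n)


# Hypothetical inputs: 100 fragment ions, 1% chance-match probability each.
for n_matches in (0, 1, 5, 10):
    print(n_matches, p_score(0.01, 100, n_matches))
```

With these made-up inputs the expected number of chance matches is 1, so the probability falls off quickly as the number of matches grows, which is what makes a large match count unlikely to be random.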

Modeling the Scrambled P-Scores
[Figure panels: Motivation; Goodness of Fit. 9,839 MS/MS queries (MS1 and MS2 data)]

Better is better, but the easy ones are easy

Computers Ask the Darndest Questions

Top Down Proteomics!
Three pillars of proteomics: identification, characterization, quantification.
Top-down proteomic studies are underway. These are large and complex studies. (At several institutions, a typical production bottom-up study would have 200+ LC runs.)

Top Down Proteomics Biometrics
Sources of variation:
1. Intensity calculation
2. LC alignment
3. Mass spec physics
4. Separation (different fractions, etc.)
5. Protein isolation (ChIP, RBC ghosts, etc.)
6. Tissue variation
7. Individual variation
8. Population variation
9. Random and systemic errors

Experimental Design
Ronald A. Fisher (1926): "The Arrangement of Field Experiments"
All measurements have errors.
All biological systems have individual variation.
The goal of experimental design is to arrange the experiment so that the variation can be partitioned.
Typically this means testing the variation between groups against the variation within groups.
[Diagram: a Healthy Group and a Diseased Group, each with subjects Sub 1 and Sub 2 and replicates R1 to R8]
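Testing between-group variation against within-group variation can be sketched as a one-way analysis of variance. The example below is a minimal illustration, not the analysis used in the talk: the intensities are invented, the assignment of four replicates to each subject is an assumption, and replicates are averaged per subject first so that the subject, not the replicate, is the unit being compared.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: design[group] -> one list of replicate values per subject.
design = {
    "healthy": [[10.1, 10.3, 9.9, 10.2], [10.8, 11.0, 10.9, 10.7]],
    "diseased": [[12.0, 12.2, 11.9, 12.1], [12.9, 13.1, 12.8, 13.0]],
}

# Collapse replicates so each subject contributes a single value.
subject_means = [[mean(reps) for reps in subjects] for subjects in design.values()]
f_stat, df_b, df_w = one_way_f(subject_means)
print(f"F = {f_stat:.2f} on {df_b} and {df_w} degrees of freedom")
```

With only two subjects per group the within-group degrees of freedom are very small, which is one reason the number of biological subjects matters more than the number of replicate runs.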

Typical Results: Human RBC Ghosts
[Figure: Control Samples vs. PNH Samples; labels RAP1A, Coomassie, Catalase, Peroxiredoxin; sample labels CON_1, CON_6, PNH_9, PNH_10]

Populations of Experiments
Instead of doing one experiment, you are doing an unknown number of experiments.
The number of experiments is determined by how many unique entities are observed consistently over the entire set of observations.
[Figure: Control Samples vs. PNH Samples]
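Counting the entities that are observed consistently can be sketched as a simple tally across runs. Everything in this example is hypothetical: the entity IDs, the contents of each run, and the `min_fraction` threshold that defines "consistently" are illustrative choices, not values from the study.

```python
from collections import Counter


def consistently_observed(runs, min_fraction=1.0):
    """Return the entities seen in at least min_fraction of the runs.

    runs maps a run or sample name to the set of entity IDs observed in it.
    """
    counts = Counter(entity for ids in runs.values() for entity in set(ids))
    needed = min_fraction * len(runs)
    return {entity for entity, seen in counts.items() if seen >= needed}


# Hypothetical observations for four samples.
runs = {
    "CON_1": {"A", "B", "C"},
    "CON_6": {"A", "B"},
    "PNH_9": {"A", "C"},
    "PNH_10": {"A", "B"},
}

testable = consistently_observed(runs, min_fraction=0.75)
print(len(testable), "entities, so", len(testable), "simultaneous comparisons")
```

Each entity that passes the filter becomes its own comparison between treatments, so the size of the returned set is the number of experiments being run at once.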

Typical Results: Breast Cancer Model

Sources of Variation: The Model
where
i = 1 or 2 represents the two preparations,
j = 1 to 3 for each digestion within a given preparation,
k = 1 to 3 for each injection (or run) within each digestion, and
l = 1 to the number of peptides for the given protein.
Under this model, let
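The index definitions for i, j, k, and l are consistent with a fully nested random-effects model. The form below is a sketch under that assumption only: the symbols y, μ, P, D, R, ε and the variance names are introduced here for illustration and are not taken from the slides, and treating every term as a random effect is itself an assumption.

```latex
y_{ijkl} = \mu + P_i + D_{j(i)} + R_{k(ij)} + \varepsilon_{l(ijk)},
\qquad
P_i \sim N(0, \sigma_P^2),\;
D_{j(i)} \sim N(0, \sigma_D^2),\;
R_{k(ij)} \sim N(0, \sigma_R^2),\;
\varepsilon_{l(ijk)} \sim N(0, \sigma_\varepsilon^2)
```

Under such a model the total variance of a single measurement would split as $\sigma_P^2 + \sigma_D^2 + \sigma_R^2 + \sigma_\varepsilon^2$, one component each for preparation, digestion, injection, and peptide-level residual.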

Variance Component Estimates

Power Calculations
Human subjects: n = 20
Inbred mice: n = 5
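How the required sample size grows with individual variation can be sketched with the standard normal-approximation formula for comparing two group means, n = 2((z_{1-α/2} + z_{power})σ/δ)². This is a generic textbook calculation, not the method behind the numbers on the slide, and the standard deviations and effect size below are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    sigma is the between-subject standard deviation and delta is the
    difference in means to detect (same units as sigma).
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)


# Hypothetical: the same effect size in a more and a less variable population.
print(n_per_group(sigma=1.0, delta=1.0))
print(n_per_group(sigma=0.5, delta=1.0))
```

Because the required n scales with the square of sigma, halving the between-subject variation cuts the sample size to roughly a quarter, which is the general reason a genetically uniform population can need far fewer subjects than an outbred one.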

Systems Analysis
What to do with the laundry lists of significant genes?
Gene Ontology Analysis
Gene Set Enrichment Analysis
Often paired with RNA or metabolomic data.
Creates a third level of analysis.
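One common way to ask whether a list of significant genes is enriched for a Gene Ontology term is an over-representation test based on the hypergeometric distribution. The sketch below illustrates that general technique, which the slides do not specify, and it is not the rank-based Gene Set Enrichment Analysis procedure; all counts are hypothetical.

```python
from math import comb


def enrichment_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `overlap` annotated genes when `selected`
    genes are drawn at random from `population` genes, of which `annotated`
    carry the term of interest.
    """
    upper = min(annotated, selected)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(population, selected)


# Hypothetical: 2,000 measured genes, 40 carry the term,
# 100 are significant, and 8 of those carry the term.
print(enrichment_p(2000, 40, 100, 8))
```

Since one such test is run per term, the resulting p-values form another large family of comparisons that has to be interpreted together.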

To Review
Everything is in place for top-down proteomic studies.
In any discovery omic study, extreme care must be taken: do lots of pilot work to understand the behavior of your analytic system.
Technology and mathematical formalism do not trump biology. (Bad experimental design results in bad experiments.)

Shameless NCGAS Plug
Funded by the National Science Foundation
1. Large memory clusters for assembly
2. Bioinformatics consulting for biologists
3. Optimized software for better efficiency
Partner institutions:
Extreme Science and Engineering Discovery Environment (XSEDE)
Texas Advanced Computing Center (TACC) at the University of Texas at Austin
San Diego Supercomputer Center (SDSC) at the University of California, San Diego
Pittsburgh Supercomputing Center (PSC)
Open for business at:
Questions?

Acknowledgements & disclaimer
This material is based upon work supported in part by the National Science Foundation under Grants No. ABI
This work was supported in part by the Lilly Endowment, Inc. and the Indiana University Pervasive Technology Institute.
Any opinions presented here are those of the presenter(s) and do not necessarily represent the opinions of the National Science Foundation or any other funding agencies.

License terms
Please cite as: LeDuc, R.D., Statistical Consideration for Identification and Quantification in Top-Down Proteomics, presented at the 2013 Sanibel Conference on Top Down Mass Spectrometry, St. Petersburg FL, 1/27/2013.
Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
Except where otherwise noted, contents of this presentation are copyright 2011 by the Trustees of Indiana University.
This document is released under the Creative Commons Attribution 3.0 Unported license. This license includes the following terms: You are free to share (to copy, distribute and transmit the work) and to remix (to adapt the work) under the following conditions: attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.