Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011 Top-down/bottom-up proteomics Post-translational modifications.

Slides:



Advertisements
Similar presentations
Genomes and Proteomes genome: complete set of genetic information in organism gene sequence contains recipe for making proteins (genotype) proteome: complete.
Advertisements

Protein Quantitation II: Multiple Reaction Monitoring
Protein Quantitation II: Multiple Reaction Monitoring
Proteomics Informatics – Protein characterization I: post-translational modifications (Week 10)
MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.
Proteomics Informatics – Protein characterization: post-translational modifications and protein-protein interactions (Week 10)
Proteomics Informatics – Databases, data repositories and standardization (Week 6)
Protein Sequencing and Identification by Mass Spectrometry.
Proteomics The proteome is larger than the genome due to alternative splicing and protein modification. As we have said before we need to know All protein-protein.
Lawrence Hunter, Ph.D. Director, Computational Bioscience Program University of Colorado School of Medicine
Proteomics Informatics – Protein identification II: search engines and protein sequence databases (Week 5)
Proteomics Informatics Workshop Part I: Protein Identification
Previous Lecture: Regression and Correlation
HOW MASS SPECTROMETRY CAN IMPROVE YOUR RESEARCH
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
My contact details and information about submitting samples for MS
Proteomics Informatics (BMSC-GA 4437) Course Director David Fenyö Contact information
Proteomics Informatics Workshop Part III: Protein Quantitation
Fa 05CSE182 CSE182-L9 Mass Spectrometry Quantitation and other applications.
Proteome.
Tryptic digestion Proteomics Workflow for Gel-based and LC-coupled Mass Spectrometry Protein or peptide pre-fractionation is a prerequisite for the reduction.
A highly abbreviated introduction to proteomics
Karl Clauser Proteomics and Biomarker Discovery Taming Errors for Peptides with Post-Translational Modifications Bioinformatics for MS Interest Group ASMS.
Proteomics Informatics – Data Analysis and Visualization (Week 13)
Comparison of chicken light and dark meat using LC MALDI-TOF mass spectrometry as a model system for biomarker discovery WP 651 Jie Du; Stephen J. Hattan.
Production of polypeptides, Da, and middle-down analysis by LC-MSMS Catherine Fenselau 1, Joseph Cannon 1, Nathan Edwards 2, Karen Lohnes 1,
The dynamic nature of the proteome
PROTEIN STRUCTURE NAME: ANUSHA. INTRODUCTION Frederick Sanger was awarded his first Nobel Prize for determining the amino acid sequence of insulin, the.
Introduction The GPM project (The Global Proteome Machine Organization) Salvador Martínez de Bartolomé Bioinformatics support –
Center for Human Health and the Environment
Proteomics Informatics – Protein Characterization II: Protein Interactions (Week 11)
Common parameters At the beginning one need to set up the parameters.
Proteomics Informatics – Databases, data repositories and standardization (Week 7)
Laxman Yetukuri T : Modeling of Proteomics Data
INF380 - Proteomics-101 INF380 – Proteomics Chapter 10 – Spectral Comparison Spectral comparison means that an experimental spectrum is compared to theoretical.
Lecture 9. Functional Genomics at the Protein Level: Proteomics.
Genomics II: The Proteome Using high-throughput methods to identify proteins and to understand their function.
Proteomics What is it? How is it done? Are there different kinds? Why would you want to do it (what can it tell you)?
CSE182 CSE182-L11 Protein sequencing and Mass Spectrometry.
Multiple flavors of mass analyzers Single MS (peptide fingerprinting): Identifies m/z of peptide only Peptide id’d by comparison to database, of predicted.
Overview of Mass Spectrometry
EBI is an Outstation of the European Molecular Biology Laboratory. In silico analysis of accurate proteomics, complemented by selective isolation of peptides.
1 CH908 Structural Analysis by Mass Spectrometry revision lecture. Prof. Peter O’Connor.
1 I. Introduction 1.Definition: Protein Characterization/Proteomics i.Classical Proteomics ii.Functional Proteomics 2.Mass spectrometery I.Advantages in.
Isotope Labeled Internal Standards in Skyline
Proteomics Informatics (BMSC-GA 4437) Instructor David Fenyö Contact information
Salamanca, March 16th 2010 Participants: Laboratori de Proteomica-HUVH Servicio de Proteómica-CNB-CSIC Participants: Laboratori de Proteomica-HUVH Servicio.
ISOMATCH-web For automatic matching of isotope peak distributions ■ Automatic matching of a raw spectrum (ASCII format) to theoretical isotopic distributions.
Oct 2011 SDMBT1 Lecture 11 Some quantitation methods with LC-MS a.ICAT b.iTRAQ c.Proteolytic 18 O labelling d.SILAC e.AQUA f.Label Free quantitation.
Deducing protein composition from complex protein preparations by MALDI without peptide separation.. TP #419 Kenneth C. Parker SimulTof Corporation, Sudbury,
Proteomics Informatics (BMSC-GA 4437) Course Directors David Fenyö Kelly Ruggles Beatrix Ueberheide Contact information
What is proteomics? Richard Mbasu and Ben Richards.
Protein quantitation I: Overview (Week 5). Fractionation Digestion LC-MS Lysis MS Sample i Protein j Peptide k Proteomic Bioinformatics – Quantitation.
김지형. Introduction precursor peptides are dynamically selected for fragmentation with exclusion to prevent repetitive acquisition of MS/MS spectra.
Ho-Tak Lau, Hyong Won Suh, Martin Golkowski, and Shao-En Ong
RANIA MOHAMED EL-SHARKAWY Lecturer of clinical chemistry Medical Research Institute, Alexandria University MEDICAL RESEARCH INSTITUTE– ALEXANDRIA UNIVERSITY.
Protein/Peptide Quantification
Proteomics Informatics David Fenyő
Interpretation of Mass Spectra I
A perspective on proteomics in cell biology
Proteomics Informatics –
Proteomics Informatics –
A, high resolution MS/MS spectrum (lower panel) of 1435
Example of MS/MS spectrum of peptide FPTLTGFNR (hypothetical protein with signal peptide EAK88888; N77) from a protein digestion mixture prepared by labeling.
Shotgun Proteomics in Neuroscience
Tryptic phosphopeptides of AdIGFBP-5, [γ-32P]ATP-labeled in vitro by phosphorylation with CK2, were separated by HPLC and detected and sequenced by mass.
Tryptic phosphopeptides of 32P-labeled IGFBP-5 from T47D cells separated by HPLC and sequenced by tandem MS.a, tandem MS spectrum of the triply charged.
Proteomics Informatics David Fenyő
Interpretation of Mass Spectra
Protein Identification David Fenyő
Presentation transcript:

Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011 Top-down/bottom-up proteomics Post-translational modifications Protein complexes Cross-linking The Global Proteome Machine Database

MS MS/MS Biological System Samples Information about each sample Information about the biological system Measurements What does the sample contain? How much? Proteomics Informatics Experimental Design Data Analysis Information Integration Sample Preparation What does the sample contain? How much?

Biological System Information about each sample Information about the biological system What does the sample contain? How much? Sample Preparation Experimental Design Data Analysis Information Integration MS MS/MS Samples Measurements Sample Preparation What does the sample contain? How much? Enrichment Separation etc Digestion Top down Bottom up PeptidesProteins Fragmentation Fragments

Top down / bottom up Top down Bottom up mass/charge intensity

Top down Bottom up Charge distribution mass/charge intensity mass/charge intensity

Top down Bottom up Isotope distribution mass/charge intensity mass/charge intensity

Fragmentation Top downBottom up Fragmentation

Correlations between modifications Top down Bottom up

Alternative Splicing Top down Bottom up Exon 123

Top down Kellie et al., Molecular BioSystems 2010 Protein mass spectra Fragment mass spectra

Non-Covalent Protein Complexes Schreiber et al., Nature 2011

Dynamic Range in Proteomics Large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome Experimental Dynamic Range Distribution of Protein Amounts Log (Protein Amount) Number of Proteins The goal is to identify and characterize all components of a proteome Desired Dynamic Range

Experimental Designs Simulated

Parameters in Simulation ● Distribution of protein amounts in sample ● Loss of peptides before binding to the column ● Loss of peptides after elution off the column ● Distribution of mass spectrometric response for different peptides present at the same amount ● Total amount of peptides that are loaded on column (limited by column loading capacity) ● # of peptide fractions ● # of Proteins in each fraction ● Total amount of peptides that are loaded on column (limited by column loading capacity) ● # of peptide fractions ● Dynamic range of mass spectrometer ● Detection limit of mass spectrometer

Simulation Results for 1D-LC-MS Complex Mixtures of Proteins RPC Digestion MS Analysis No Protein Separation Protein Separation: 10 fractions Protein Separation: 10 fractions No Protein Separation Tissue Body Fluid

Success Rate of a Proteomics Experiment DEFINITION: The success rate of a proteomics experiment is defined as the number of proteins detected divided by the total number of proteins in the proteome. Log (Protein Amount) Number of Proteins Proteins Detected Distribution of Protein Amounts

Relative Dynamic Range of a Proteomics Experiment DEFINITION: RELATIVE DYNAMIC RANGE, RDR x, where x is e.g. 10%, 50%, or 90% Log (Protein Amount) RDR 90 RDR 50 RDR 10 Fraction of Proteins Detected Number of Proteins Proteins Detected Distribution of Protein Amounts

Repeat Analysis 1 Analysis2 Analyses3 Analyses4 Analyses5 Analyses6 Analyses7 Analyses8 Analyses

Repeat Analysis: Comparison of Simulations and Experiments

Number of Proteins in Mixture TissueBody Fluid 112 RDR 50 Success Rate Tissue Body Fluid 1 1 Tissue 2 2 2

Amount loaded and peptide separation 1. Protein separation 2. Amount loaded 3. Peptide separation Order: Tissue Protein separation Tissue Protein separation Amount loaded Tissue Protein separation Peptide separation Amount loaded 1. Protein separation 2. Peptide separation 3. Amount loaded Protein separation Tissue Protein separation Peptide separation Tissue Protein separation Amount loaded Peptide separation Protein separation Amount loaded Peptide separation Ranges: Protein separation: – 3000 proteins in each fraction Amount loaded: 0.1 ug – 10 ug Peptide separation: 100 – 1000 fractions

Phosphopeptide identification m precursor = 2000 Da  m precursor = 1 Da  m fragment = 0.5 Da Phosphorylation Localization of modifications

Localization (d min =3) m precursor = 2000 Da  m precursor = 1 Da  m fragment = 0.5 Da Phosphorylation d min >=3 for 47% of human tryptic peptides Localization of modifications

Localization (d min =2) m precursor = 2000 Da  m precursor = 1 Da  m fragment = 0.5 Da Phosphorylation d min =2 for 33% of human tryptic peptides Localization of modifications

Localization (d min =1) m precursor = 2000 Da  m precursor = 1 Da  m fragment = 0.5 Da Phosphorylation d min =1 for 20% of human tryptic peptides Localization of modifications

Localization (d=1*) m precursor = 2000 Da  m precursor = 1 Da  m fragment = 0.5 Da Phosphorylation Localization of modifications

Peptide with two possible modification sites Localization of modifications

Peptide with two possible modification sites MS/MS spectrum m/z Intensity Localization of modifications

Peptide with two possible modification sites MS/MS spectrum m/z Intensity Matching Localization of modifications

Peptide with two possible modification sites MS/MS spectrum m/z Intensity Matching Which assignment does the data support? 1, 1 or 2, or 1 and 2? Localization of modifications

AAYYQK Visualization of evidence for localization AAYYQK

Visualization of evidence for localization

Estimation of global false localization rate using decoy sites By counting how many times the phosphorylation is localized to amino acids that can not be phosphorylated we can estimate the false localization rate as a function of amino acid frequency. Amino acid frequency False localization frequency Y

How much can we trust a single localization assignment? If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

Is it a mixture or not? If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

1 and or 2 Ø Localization of modifications

Protein Complexes A B A C D Digestion Mass spectrometry

Tackett et al. JPR 2005 Protein Complexes – specific/non-specific binding

Sowa et al., Cell 2009 Protein Complexes – specific/non-specific binding

Choi et al., Nature Methods 2010

Analysis of Non-Covalent Protein Complexes Taverner et al., Acc Chem Res 2008

Determining the architectures of macromolecular assemblies Alber et al., Nature 2007

M/Z Peptides Fragments Fragmentation Proteolytic Peptides Enzymatic Digestion Protein Complex Chemical Cross-Linking MS MS/MS Isolation Cross-Linked Protein Complex Interaction Partners by Chemical Cross-Linking

M/Z Peptides Fragments Fragmentation Proteolytic Peptides Enzymatic Digestion Protein Complex Chemical Cross-Linking MS MS/MS Isolation Cross-Linked Protein Complex Interaction Sites by Chemical Cross-Linking

Cross-linking protein n peptides with reactive groups (n-1)n/2 potential ways to cross-link peptides pairwise + many additional uninformative forms Protein A + IgG heavy chain 990 possible peptide pairs Yeast NPC ˜ 10 6 possible peptide pairs

Cross-linking Mass spectrometers have a limited dynamic range and it therefore important to limit the number of possible reactions not to dilute the cross-linked peptides. For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and required to give informative fragmentation. High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides. Because the cross-linked peptides are often large, CAD is not ideal, but instead ETD is recommended.

Search Results

GPMDB

Year (as of Jan 1 st ) Assigned spectra Sequence-spectrum assignments in GPMDB

Human Genes Observed in GPMDB

Proteotypic peptide relative composition

Comparison with GPMDB Most proteins show very reproducible peptide patterns

Comparison with GPMDB

Global frequency of observing a peptide Peptide SequenceObservations FSTVAGESGSADTVR2633 FNTANDDNVTQVR2432 AFYVNVLNEEQR1722 LVNANGEAVYCK1701 GPLLVQDVVFTDEMAHFDR1637 LSQEDPDYGIR1560 LFAYPDTHR1499 NLSVEDAAR1400 FYTEDGNWDLVGNNTPIFFIR1386 ADVLTTGAGNPVGDK1338

If the number of times a peptide sequence (i) has been observed is n i, then for a particular protein: Global frequency of observing a peptide

Define a normalized global frequency of observation for a particular peptide sequence from a particular protein as: Global frequency of observing a peptide (ω)

Peptide Sequenceω FSTVAGESGSADTVR0.08 FNTANDDNVTQVR0.07 AFYVNVLNEEQR0.05 LVNANGEAVYCK0.05 GPLLVQDVVFTDEMAHFDR0.05 LSQEDPDYGIR0.04 LFAYPDTHR0.04 NLSVEDAAR0.04 FYTEDGNWDLVGNNTPIFFIR0.04 ADVLTTGAGNPVGDK0.04 Global frequency of observation (ω), catalase

ω Peptide sequences Global frequency of observation (ω), catalase

For any set peptides observed in an experiment assigned to a particular protein (1 to j ): Omega (Ω) value for a protein identification

Protein IDΩ (z=2)Ω (z=3) SERPINB SNRPD CFL SNRPE PPIA CSTA PFN CAT GLRX CALM FABP Protein Ω’s for a set of identifications

Part of Best Practices Integrative Informatics Consultation Service (BPIC) at the NYU Center for Health Informatics and Bioinformatics (CHIBI) Contact or Walk-in Clinic: Wednesday, February 23, 3-5 pm 227 E 30th Street, 7th Floor, Room #739 Proteomics Consultation

Proteomics Informatics Workshop Part III: Protein Quantitation February 25, 2011 Metabolic labeling – SILAC Chemical labeling Label-free quantitation Spectrum counting Stoichiometry Protein processing and degradation Biomarker discovery and verification

Proteomics Informatics Workshop Part I: Protein Identification, February 4, 2011 Part II: Protein Characterization, February 18, 2011 Part III: Protein Quantitation, February 25, 2011