Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011 Top-down/bottom-up proteomics Post-translational modifications.

1 Proteomics Informatics Workshop Part II: Protein Characterization David Fenyö February 18, 2011 Top-down/bottom-up proteomics Post-translational modifications Protein complexes Cross-linking The Global Proteome Machine Database

2 Proteomics Informatics [Workflow diagram. Boxes: Biological System; Samples; Measurements (MS, MS/MS); Information about each sample (What does the sample contain? How much?); Information about the biological system. Steps: Experimental Design; Sample Preparation; Data Analysis; Information Integration]

3 Sample Preparation [Workflow diagram as on slide 2, with Sample Preparation expanded: Enrichment, Separation, etc.; Digestion. Top down: Proteins, Fragmentation, Fragments. Bottom up: Peptides, Fragmentation, Fragments]

4 Top down / bottom up [Figure: top down and bottom up panels; axes: mass/charge vs. intensity]

5 Charge distribution [Figure: top down and bottom up panels; axes: mass/charge vs. intensity; charge state labels: 1+, 2+, 3+, 4+ and 27+, 31+]
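Where each charge state falls on the mass/charge axis follows from the standard relation m/z = (M + z·m_p)/z for a molecule of neutral mass M carrying z protons. The following is a minimal Python sketch of that relation; the two example masses (a 2000 Da peptide and a 30000 Da protein) are illustrative assumptions and are not taken from the slide.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton

def mz(neutral_mass, charge):
    """Return m/z of a positive ion formed by adding `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def charge_series(neutral_mass, charges):
    """Map each charge state to its m/z value."""
    return {z: mz(neutral_mass, z) for z in charges}

# Hypothetical peptide (bottom up): low charge states 1+ to 4+
peptide = charge_series(2000.0, range(1, 5))
# Hypothetical intact protein (top down): high charge states 27+ to 31+
protein = charge_series(30000.0, range(27, 32))

for label, series in (("peptide", peptide), ("protein", protein)):
    for z, value in series.items():
        print(f"{label} {z}+: m/z = {value:.3f}")
```

Higher charge states move an ion to lower m/z, which is why the many charge states of an intact protein appear as a series of closely spaced peaks.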

6 Isotope distribution [Figure: top down and bottom up panels; axes: mass/charge vs. intensity]

7 Fragmentation [Figure: top down and bottom up panels]

8 Correlations between modifications Top down Bottom up

9 Alternative Splicing [Figure: top down and bottom up panels; exons 1, 2, 3]

10 Top down Kellie et al., Molecular BioSystems 2010 Protein mass spectra Fragment mass spectra

11 Non-Covalent Protein Complexes Schreiber et al., Nature 2011

12 Dynamic Range in Proteomics. The goal is to identify and characterize all components of a proteome. There is a large discrepancy between the experimental dynamic range and the range of amounts of different proteins in a proteome. [Figure: Distribution of Protein Amounts; axes: Log (Protein Amount) vs. Number of Proteins; marked ranges: Experimental Dynamic Range, Desired Dynamic Range]

13 Experimental Designs Simulated

14 Parameters in Simulation
● Distribution of protein amounts in sample
● Loss of peptides before binding to the column
● Loss of peptides after elution off the column
● Distribution of mass spectrometric response for different peptides present at the same amount
● Total amount of peptides that are loaded on column (limited by column loading capacity)
● # of peptide fractions
● # of proteins in each fraction
● Dynamic range of mass spectrometer
● Detection limit of mass spectrometer

15 Simulation Results for 1D-LC-MS. Workflow: Complex Mixtures of Proteins, Digestion, RPC, MS Analysis. [Figure: panels for Tissue and Body Fluid, each with No Protein Separation and with Protein Separation: 10 fractions]

16 Success Rate of a Proteomics Experiment. DEFINITION: The success rate of a proteomics experiment is defined as the number of proteins detected divided by the total number of proteins in the proteome. [Figure: Distribution of Protein Amounts with Proteins Detected; axes: Log (Protein Amount) vs. Number of Proteins]
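The definition above can be sketched in a few lines of Python. The log-normal distribution of protein amounts and the single detection threshold used here are assumptions chosen for illustration; they are not the model used in the simulations on these slides.

```python
import random

def success_rate(amounts, detection_limit):
    """Number of proteins detected divided by the total number of proteins."""
    if not amounts:
        raise ValueError("proteome must contain at least one protein")
    detected = sum(1 for amount in amounts if amount >= detection_limit)
    return detected / len(amounts)

# Hypothetical proteome: log10(amount) drawn from a normal distribution
rng = random.Random(0)
proteome = [10 ** rng.gauss(0.0, 1.5) for _ in range(10000)]

# Hypothetical detection limit, in the same arbitrary units as the amounts
print(f"success rate: {success_rate(proteome, detection_limit=1.0):.2f}")
```

Lowering the detection limit or narrowing the distribution of amounts raises the success rate, which is the trade-off the experimental designs on the following slides explore.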

17 Relative Dynamic Range of a Proteomics Experiment. DEFINITION: RELATIVE DYNAMIC RANGE, RDR_x, where x is e.g. 10%, 50%, or 90%. [Figure: Distribution of Protein Amounts with Proteins Detected and Fraction of Proteins Detected; axes: Log (Protein Amount) vs. Number of Proteins; marked: RDR_90, RDR_50, RDR_10]

18 Repeat Analysis [Figure: panels for 1 Analysis, 2 Analyses, 3 Analyses, 4 Analyses, 5 Analyses, 6 Analyses, 7 Analyses, 8 Analyses]

19 Repeat Analysis: Comparison of Simulations and Experiments

20 Number of Proteins in Mixture [Figure: RDR_50 and Success Rate panels for Tissue and Body Fluid]

21 Amount loaded and peptide separation
Order (two variants):
1. Protein separation 2. Amount loaded 3. Peptide separation
1. Protein separation 2. Peptide separation 3. Amount loaded
Ranges:
Protein separation: 30000 – 3000 proteins in each fraction
Amount loaded: 0.1 µg – 10 µg
Peptide separation: 100 – 1000 fractions
[Figure: Tissue panels for each order, adding the steps one at a time]

22 Localization of modifications: Phosphopeptide identification. Phosphorylation; m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da

23 Localization of modifications: Localization (d_min = 3). Phosphorylation; m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min >= 3 for 47% of human tryptic peptides

24 Localization of modifications: Localization (d_min = 2). Phosphorylation; m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min = 2 for 33% of human tryptic peptides

25 Localization of modifications: Localization (d_min = 1). Phosphorylation; m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da. d_min = 1 for 20% of human tryptic peptides

26 Localization of modifications: Localization (d = 1*). Phosphorylation; m_precursor = 2000 Da, Δm_precursor = 1 Da, Δm_fragment = 0.5 Da

27 Peptide with two possible modification sites Localization of modifications

28 Localization of modifications: Peptide with two possible modification sites [Figure: MS/MS spectrum; axes: m/z vs. intensity]

29 Localization of modifications: Peptide with two possible modification sites. Matching [Figure: MS/MS spectrum; axes: m/z vs. intensity]

30 Localization of modifications: Peptide with two possible modification sites. Matching. Which assignment does the data support? 1, 1 or 2, or 1 and 2? [Figure: MS/MS spectrum; axes: m/z vs. intensity]

31 Visualization of evidence for localization [Figure: peptide AAYYQK]

32 Visualization of evidence for localization

33 [Figure: panels labeled 3, 2, 1]

34 Estimation of global false localization rate using decoy sites. By counting how many times the phosphorylation is localized to amino acids that cannot be phosphorylated, we can estimate the false localization rate as a function of amino acid frequency. [Figure: axes: Amino acid frequency vs. False localization frequency; labeled point: Y]
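The counting step can be sketched as follows. This is a minimal Python illustration that assumes the localization search was allowed to place the phosphorylation on any residue, and that only S, T and Y are true phosphorylation sites; the function name and the input data are hypothetical.

```python
from collections import Counter

# Assumption: only serine, threonine and tyrosine can carry the modification
PHOSPHO_SITES = set("STY")

def decoy_localization_frequencies(localized_residues):
    """Fraction of all localizations assigned to each decoy (non-S/T/Y) residue."""
    total = len(localized_residues)
    if total == 0:
        raise ValueError("no localizations given")
    counts = Counter(r for r in localized_residues if r not in PHOSPHO_SITES)
    return {residue: n / total for residue, n in counts.items()}

# Hypothetical result set: the residue each phosphorylation was localized to
calls = list("SSSSTTSYSA" "SSTSGSSTSS")
frequencies = decoy_localization_frequencies(calls)

for residue, freq in sorted(frequencies.items()):
    print(f"{residue}: {freq:.3f}")
```

Plotting these decoy frequencies against how often each amino acid occurs in the searched peptides gives the kind of relation the slide describes, from which a false localization frequency for real sites could be read off at their amino acid frequency.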

35 How much can we trust a single localization assignment? If we can generate the distribution of scores for assignment 1 when 2 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.
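Once such a score distribution is available, the probability of reaching a given score by chance can be estimated empirically. The sketch below assumes the null scores have already been generated (the slide does not say how) and uses the common add-one correction so the estimate is never exactly zero; both the function name and the numbers are hypothetical.

```python
def empirical_p_value(observed_score, null_scores):
    """Estimate the chance of a null score being at least as high as the observed one."""
    if not null_scores:
        raise ValueError("null distribution must not be empty")
    at_least_as_high = sum(1 for s in null_scores if s >= observed_score)
    # Add-one correction: counts the observed score as one member of the null set
    return (at_least_as_high + 1) / (len(null_scores) + 1)

# Hypothetical scores for assignment 1 obtained when assignment 2 is correct
null_scores = [0.8, 1.1, 1.4, 2.0, 2.3, 2.9, 3.1, 3.6, 4.2]
print(empirical_p_value(5.0, null_scores))
```

A small value indicates that the observed score for assignment 1 is unlikely to arise when assignment 2 is the correct one.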

36 Is it a mixture or not? If we can generate the distribution of scores for assignment 2 when 1 is the correct assignment, it is possible to estimate the probability of obtaining a certain score by chance for a given peptide sequence and MS/MS spectrum assignment.

37 Localization of modifications [Figure: outcome categories: 1; 1 or 2; 1 and 2; Ø]

38 Protein Complexes [Diagram: proteins A, B, C, D; Digestion; Mass spectrometry]

39 Protein Complexes – specific/non-specific binding (Tackett et al., JPR 2005)

40 Protein Complexes – specific/non-specific binding (Sowa et al., Cell 2009)

41 Choi et al., Nature Methods 2010

42 Analysis of Non-Covalent Protein Complexes Taverner et al., Acc Chem Res 2008

43 Determining the architectures of macromolecular assemblies Alber et al., Nature 2007

44 Interaction Partners by Chemical Cross-Linking. Workflow: Protein Complex, Chemical Cross-Linking, Cross-Linked Protein Complex, Enzymatic Digestion, Proteolytic Peptides, MS (Peptides; axis: M/Z), Isolation, Fragmentation, MS/MS (Fragments; axis: M/Z)

45 Interaction Sites by Chemical Cross-Linking. Workflow: Protein Complex, Chemical Cross-Linking, Cross-Linked Protein Complex, Enzymatic Digestion, Proteolytic Peptides, MS (Peptides; axis: M/Z), Isolation, Fragmentation, MS/MS (Fragments; axis: M/Z)

46 Cross-linking. For a protein with n peptides with reactive groups there are (n-1)n/2 potential ways to cross-link peptides pairwise, plus many additional uninformative forms. Protein A + IgG heavy chain: 990 possible peptide pairs. Yeast NPC: ~10^6 possible peptide pairs.
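The pair count is quick to check numerically. In this short Python sketch the helper names are hypothetical; it shows that 990 pairs corresponds to n = 45 reactive peptides, and that on the order of 10^6 pairs requires roughly 1400 peptides.

```python
import math

def pairwise_crosslinks(n):
    """Number of distinct peptide pairs among n reactive peptides: (n-1)n/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2

def peptides_for_pairs(pairs):
    """Smallest n for which (n-1)n/2 is at least `pairs`."""
    n = (1 + math.isqrt(1 + 8 * pairs)) // 2
    while pairwise_crosslinks(n) < pairs:
        n += 1
    return n

print(pairwise_crosslinks(45))
print(peptides_for_pairs(10**6))
```

Because the count grows quadratically with n, doubling the number of reactive peptides roughly quadruples the number of candidate pairs a search has to consider.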

47 Cross-linking. Mass spectrometers have a limited dynamic range, and it is therefore important to limit the number of possible reactions so as not to dilute the cross-linked peptides. For identification of a cross-linked peptide pair, both peptides have to be sufficiently long and have to give informative fragmentation. High mass accuracy MS/MS is recommended because the spectrum will be a mixture of fragment ions from two peptides. Because the cross-linked peptides are often large, CAD is not ideal; ETD is recommended instead.

48 Search Results

49

50

51 GPMDB

52 Sequence-spectrum assignments in GPMDB [Figure: axes: Year (as of Jan 1st) vs. Assigned spectra]

53 Human Genes Observed in GPMDB

54 Proteotypic peptide relative composition

55 Comparison with GPMDB Most proteins show very reproducible peptide patterns

56 Comparison with GPMDB

57 Global frequency of observing a peptide
Peptide Sequence          Observations
FSTVAGESGSADTVR           2633
FNTANDDNVTQVR             2432
AFYVNVLNEEQR              1722
LVNANGEAVYCK              1701
GPLLVQDVVFTDEMAHFDR       1637
LSQEDPDYGIR               1560
LFAYPDTHR                 1499
NLSVEDAAR                 1400
FYTEDGNWDLVGNNTPIFFIR     1386
ADVLTTGAGNPVGDK           1338

58 Global frequency of observing a peptide. If the number of times a peptide sequence (i) has been observed is n_i, then for a particular protein:

59 Global frequency of observing a peptide (ω). Define a normalized global frequency of observation for a particular peptide sequence from a particular protein as:
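One form consistent with the ω values listed for catalase (2633 observations corresponding to ω ≈ 0.08) is normalization by the total number of observations of all peptides from the protein. The expressions below are an assumed reconstruction on that basis, not equations taken from the slides; the symbol N is introduced here for the total.

```latex
N = \sum_{i} n_i ,
\qquad
\omega_i = \frac{n_i}{N} ,
\qquad
\sum_{i} \omega_i = 1
```

Under this form ω_i is the fraction of all observations of the protein that are accounted for by peptide i.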

60 Global frequency of observation (ω), catalase
Peptide Sequence          ω
FSTVAGESGSADTVR           0.08
FNTANDDNVTQVR             0.07
AFYVNVLNEEQR              0.05
LVNANGEAVYCK              0.05
GPLLVQDVVFTDEMAHFDR       0.05
LSQEDPDYGIR               0.04
LFAYPDTHR                 0.04
NLSVEDAAR                 0.04
FYTEDGNWDLVGNNTPIFFIR     0.04
ADVLTTGAGNPVGDK           0.04
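A normalized frequency of this kind can be computed from observation counts as sketched below. The sketch assumes ω is each peptide's count divided by the total count for the protein, which is an assumption rather than a formula stated in the transcript. It also normalizes over only the ten catalase peptides listed on slide 57, so its values come out larger than the ω values tabulated above, which appear to be normalized over all catalase peptides.

```python
def normalized_frequencies(observations):
    """Divide each peptide's observation count by the total for the protein."""
    total = sum(observations.values())
    if total <= 0:
        raise ValueError("total number of observations must be positive")
    return {peptide: n / total for peptide, n in observations.items()}

# Observation counts for the ten catalase peptides listed on slide 57
catalase_counts = {
    "FSTVAGESGSADTVR": 2633,
    "FNTANDDNVTQVR": 2432,
    "AFYVNVLNEEQR": 1722,
    "LVNANGEAVYCK": 1701,
    "GPLLVQDVVFTDEMAHFDR": 1637,
    "LSQEDPDYGIR": 1560,
    "LFAYPDTHR": 1499,
    "NLSVEDAAR": 1400,
    "FYTEDGNWDLVGNNTPIFFIR": 1386,
    "ADVLTTGAGNPVGDK": 1338,
}

omega = normalized_frequencies(catalase_counts)
for peptide, value in omega.items():
    print(f"{peptide:<24}{value:.3f}")
```

The ranking of peptides is the same as in the table; only the scale depends on how many peptides are included in the total.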

61 Global frequency of observation (ω), catalase [Figure: ω by peptide sequence]

62 Omega (Ω) value for a protein identification. For any set of peptides observed in an experiment and assigned to a particular protein (1 to j):
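A natural reading, consistent with the Ω values between 0 and 1 reported for a set of identifications, is that Ω is the sum of the ω values of the peptides observed for the protein. The Python sketch below implements that reading as an assumption; the function name is hypothetical, and the ω values are the rounded catalase values from slide 60.

```python
def protein_omega(observed_peptides, omega):
    """Sum the omega values of the distinct observed peptides (assumed form of Omega)."""
    # Peptides without a recorded omega contribute nothing to the sum
    return sum(omega.get(peptide, 0.0) for peptide in set(observed_peptides))

# Rounded omega values for catalase peptides, from slide 60
catalase_omega = {
    "FSTVAGESGSADTVR": 0.08,
    "FNTANDDNVTQVR": 0.07,
    "AFYVNVLNEEQR": 0.05,
    "LFAYPDTHR": 0.04,
}

# Hypothetical experiment: two distinct catalase peptides, one seen twice
observed = ["FSTVAGESGSADTVR", "LFAYPDTHR", "FSTVAGESGSADTVR"]
print(f"Omega = {protein_omega(observed, catalase_omega):.2f}")
```

Under this reading, an identification supported by the most frequently observed peptides of a protein has Ω close to 1, while one supported only by rarely observed peptides has a low Ω.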

63 Protein Ω’s for a set of identifications
Protein ID    Ω (z=2)    Ω (z=3)
SERPINB1      0.88       0.82
SNRPD1        0.88       0.59
CFL1          0.81       0.87
SNRPE         0.8        0.81
PPIA          0.79       0.64
CSTA          0.79       0.36
PFN1          0.76       0.61
CAT           0.71       0.78
GLRX          0.66       0.8
CALM1         0.62       0.76
FABP5         0.57       0.17

64 Part of Best Practices Integrative Informatics Consultation Service (BPIC) at the NYU Center for Health Informatics and Bioinformatics (CHIBI) Contact InformaticsConsultation@nyumc.org or David.Fenyo@nyumc.org Walk-in Clinic: Wednesday, February 23, 3-5 pm 227 E 30th Street, 7th Floor, Room #739 Proteomics Consultation

65 Proteomics Informatics Workshop Part III: Protein Quantitation February 25, 2011 Metabolic labeling – SILAC Chemical labeling Label-free quantitation Spectrum counting Stoichiometry Protein processing and degradation Biomarker discovery and verification

66 Proteomics Informatics Workshop Part I: Protein Identification, February 4, 2011 Part II: Protein Characterization, February 18, 2011 Part III: Protein Quantitation, February 25, 2011

