Principles of Shotgun Proteomics and Proteogenomics Boris Maček Proteome Center Tuebingen InnoMol Proteomics Workshop April 8, 2014.


1 Principles of Shotgun Proteomics and Proteogenomics Boris Maček Proteome Center Tuebingen InnoMol Proteomics Workshop April 8, 2014

2 General MS-based proteomics workflow (Aebersold R and Mann M. 2003. Nature 422: 198-207)

3 Principle of protein database search
Database: translated genomic sequence, converted to theoretical spectra for proteins.
Theoretical spectra that fall into the defined mass range are each compared to our fragment ion spectra.
[Figure: measured and theoretical spectra, intensity vs. m/z]
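The matching step on this slide can be sketched in a few lines. This is a minimal illustration, not the scoring used by any particular search engine: the function names, the tolerances and the shared-peak count are assumptions chosen for clarity, and `precursor_mass` is taken to be the neutral monoisotopic peptide mass.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O (Da)
PROTON = 1.00728   # mass of a proton (Da)


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def fragment_ions(seq):
    """Singly charged b- and y-ion m/z values for every backbone cleavage."""
    ions = []
    for i in range(1, len(seq)):
        b = sum(RESIDUE_MASS[a] for a in seq[:i]) + PROTON
        y = sum(RESIDUE_MASS[a] for a in seq[i:]) + WATER + PROTON
        ions.extend([b, y])
    return ions


def count_matches(observed, theoretical, tol=0.5):
    """Number of theoretical ions with an observed peak within tol (m/z units)."""
    return sum(any(abs(o - t) <= tol for o in observed) for t in theoretical)


def search(precursor_mass, observed, candidates, mass_tol=0.01, frag_tol=0.5):
    """Keep candidates inside the precursor mass window, rank by matched ions."""
    scored = []
    for pep in candidates:
        if abs(peptide_mass(pep) - precursor_mass) <= mass_tol:
            score = count_matches(observed, fragment_ions(pep), frag_tol)
            scored.append((score, pep))
    return sorted(scored, reverse=True)


# Hypothetical usage: the "observed" spectrum is simulated from one candidate.
candidates = ["VGSESWWQSK", "KPPQIR", "VGSESWWQKS"]
observed = fragment_ions("VGSESWWQSK")
results = search(peptide_mass("VGSESWWQSK"), observed, candidates)
```

Only candidates whose mass falls inside the precursor window are scored at all, which is why the size of the database (and of the mass window) directly drives search time.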

4 Principle of protein database search
Database: Homo sapiens Reference Proteome, 71,434 entries (20,246 reviewed proteins; 51,188 unreviewed)
Translated genomic sequence, converted to theoretical spectra for proteins
Software: MaxQuant
Example database entries (FASTA):
>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGAR
RSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQP
ESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLG
LALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLT
LWTSENQGDEGDAGEGEN
>sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1
MDDREDLVYQAKLAEQAERYDEMVESMKKVAGMDVELTVEERNLLSVAYKNVIGARR
ASWRIISSIEQKEENKGGEDKLKMIREYRQMVETELKLICCDILDVLDKHLIPAANT
GESKVFYYKMKGDYHRYLAEFATGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRL
GLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNL
TLWTSDMQGDGEEQNKEALQDVEDENQ
>sp|P62258-2|1433E_HUMAN Isoform SV of 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE
MVESMKKVAGMDVELTVEERNLLSVAYKNVIGARRASWRIISSIEQKEENKGGEDKL
KMIREYRQMVETELKLICCDILDVLDKHLIPAANTGESKVFYYKMKGDYHRYLAEFA
TGNDRKEAAENSLVAYKAASDIAMTELPPTHPIRLGLALNFSVFYYEILNSPDRACR
LAKAAFDDAIAELDTLSEESYKDSTLIMQLLRDNLTLWTSDMQGDGEEQNKEALQDV
EDENQ
>sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH PE=1 SV=4
MGDREQLLQRARLAEQAERYDDMASAMKAVTELNEPLSNEDRNLLSVAYKNVVGARR
SSWRVISSIEQKTMADGNEKKLEKVKAYREKIEKELETVCNDVLSLLDKFLIKNCND
FQYESKVFYLKMKGDYYRYLAEVASGEKKNSVVEASEAAYKEAFEISKEQMQPTHPI
RLGLALNFSVFYYEIQNAPEQACLLAKQAFDDAIAELDTLNEDSYKDSTLIMQLLRD
NLTLWTSDQQDEEAGEGN
>tr|F2Z3E5|F2Z3E5_HUMAN Hydroxyacid-oxoacid transhydrogenase, mitochondrial OS=Homo sapiens GN=ADHFE1 PE=4 SV=1
MAAAARARVAYLLRQLQRAACQCPTHSHTYSQDGCFKY
>tr|Q5SS58|Q5SS58_HUMAN MHC class I polypeptide-related sequence A OS=Homo sapiens GN=MICA PE=4 SV=2
MGQRDQGLDRERKGPQDDPGSYQGPERRNFLKEDAMKTKTHYHAMHADCLQELRRYL
ESGVVLRRTVPPMVNVTRSEASEGNITVTCRASSFYPRNIILTWRQDGVSLSHDTQQ
WGDVLPDGNGTYQTWVATRICRGEEQRFTCYMEHSGNHSTHPVPSGKVLVLQSHWQT
FHVSAVAAGCCYFCYYYFLCPLL
>tr|Q5T409|Q5T409_HUMAN Disrupted in schizophrenia 1 OS=Homo sapiens GN=DISC1 PE=2 SV=1
MPGGGPQGAPAAAGGGGVSHRAGSRDCLPPAACFRRRRLARRPGYMRSSTGPGIGFL
SPAVGTLFRFPGGVSGEESHHSESRARQCGLDSRGLLVRSPVSKSAAAPTVTSVRGT
SAHFGIQLRGGTRLPDRLSWPCGPGSAGWQQEFAAMDSSETLDASWEAACSDGARRV
RAAGSLPSAELSSNSCSPGCGPEVPPTPPGSHSAFTSSFSFIRLSLGSAGERGEAEG
CPPSREAESHCQSPQEMGAKAASLDGPHEDPRCLSRPFSLLATRVSADLAQAARNSS
RPERDMHSLPDMDPGSSSSLDPSLAGCGGDGSSGSGDAHSWDTLLRKWEPVLRDCLL
RNRRQMEVISLRLKLQKLQEDAVENDDYDKAETLQQRLEDLEQEKISLHFQLPSRQP
ALSSFLGHLAAQVQAALRRGATQQASGDDTHTPLRMEPRLLEPTAQDSLHVSITRRD
WLLQEKQQLQKEIEALQARMFVLEAKDQQLRREIEEQEQQLQWQGCDLTPLVGQLSL
GQLQEVSKALQDTLASAGQIPFHAEPPETIRSLQERIKSLNLSLKEITTKVCMSEKF
CSTLRKKVNDIETQLPALLEAKMHAISGNHFWTAKDLTEEIRSLTSEREGLEGLLSK
LLVLSSRNVKKLGSVKEDYNRLRREVEHQETAYETSVKENTMKYMETLKNKLCSCKC
PLLGKVWEADLEACRLLIQSLQLQEARGSLSVEDERQMDDLEGAAPPIPPRLHSEDK
RKTPLKESYILSAELGEKCEDIGKKLLYLEDQLHTAIHSHDEDLIHSLRRELQMVKE
TLQAMILQLQPAKEAGEREAAASCMTAGVHEAQA
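The reviewed/unreviewed split quoted for the reference proteome follows from the UniProt header prefix: `sp` marks Swiss-Prot (reviewed) and `tr` marks TrEMBL (unreviewed) entries. A minimal sketch of counting them is shown here; the helper names are hypothetical, and the embedded example uses two headers from the slide, with the first sequence cut to its first line to keep the block short.

```python
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def review_status(header):
    """Map the UniProt database prefix to a review status."""
    db = header.split("|", 1)[0]
    return {"sp": "reviewed", "tr": "unreviewed"}.get(db, "other")


# Two entries from the slide; the first sequence is truncated to one line.
example = """>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGAR
>tr|F2Z3E5|F2Z3E5_HUMAN Hydroxyacid-oxoacid transhydrogenase, mitochondrial OS=Homo sapiens GN=ADHFE1 PE=4 SV=1
MAAAARARVAYLLRQLQRAACQCPTHSHTYSQDGCFKY
""".splitlines()

entries = list(read_fasta(example))
counts = Counter(review_status(header) for header, _ in entries)
```

Run over a full reference proteome file, the same counter would be expected to reproduce the reviewed and unreviewed totals given on the slide.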

5 MS instrumentation in proteomics (Aebersold R and Mann M. 2003. Nature 422: 198-207)

6 Coupling LC to MS for complex mixture analysis
Nanoflow LC/MS interface set-up: Proxeon Easy nLC nanoflow LC system coupled to an LTQ-Orbitrap
Column: 12-15 cm, 75 µm, spray tip 8 µm, packed with reverse-phase C18 beads, 3 µm
Sample loading: ~700 nl/min
Gradient elution: ~200 nl/min
Platinum wire, 2.0 kV
No precolumn or split!

7 Coupling LC to MS for complex mixture analysis: BSA tryptic in-solution digest, 50 fmol on column

8 LTQ-Orbitrap (2005)
Components: source, linear ion trap (LTQ), C-trap, octopole collision cell, Orbitrap
LTQ-FT MS/MS optimized scan cycle (time axis 0-1800 msec): MS full scan (Orbitrap-MS) → peptide mass measurement; MS2 (LTQ-MS) → peptide sequencing

9 Data processing workflow: MaxQuant

10 Acquisition speed: LTQ Orbitrap XL and LTQ Orbitrap Velos (legend: □ CID identified, + CID not identified)

11 Acquisition speed (# of MS/MS scans)

12 Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC)
Resting cells: "normal AA" (Lys-12C6)
Treated cells (drug, GF): "heavy AA" (Lys-13C6)
Combine and lyse, protein purification or fractionation
Proteolysis (trypsin, Lys-C, etc.)
Quantitation and identification by MS (nanoscale LC-MS/MS)
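The arithmetic behind SILAC quantitation can be sketched as follows. Each 13C6-lysine adds six times the 13C/12C mass difference to a peptide, so light and heavy forms appear as a pair separated by that shift divided by the charge. The function names and the median-of-ratios summary are assumptions made for illustration, not a description of what any specific software does.

```python
from statistics import median

C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C (Da)


def heavy_mz(light_mz, charge, n_lys, carbons_per_lys=6):
    """Expected m/z of the heavy partner of a light peptide ion."""
    shift = n_lys * carbons_per_lys * C13_MINUS_C12
    return light_mz + shift / charge


def silac_ratio(light_intensities, heavy_intensities):
    """Median heavy/light ratio over paired scans (one robust summary)."""
    ratios = [h / l for l, h in zip(light_intensities, heavy_intensities) if l > 0]
    return median(ratios)


# Hypothetical doubly charged peptide with one lysine, light form at m/z 500.
partner = heavy_mz(500.0, charge=2, n_lys=1)

# Hypothetical intensities from three scans across the elution peak.
ratio = silac_ratio([1.0, 2.0, 4.0], [2.0, 4.0, 8.0])
```

With the labeling scheme on the slide, a heavy/light ratio above 1 would indicate higher abundance in the treated cells than in the resting cells.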

13 Current research at the PCT
Proteogenomics: B. subtilis, E. coli (Krug et al, 2011, Mol Biosystems; 2013 MCP); Pristionchus pacificus (Borchert et al, 2010, Genome Res); cancer cell lines/tissues
Proteomics for systems biology: in-depth sequencing and quantitation of model organisms (B. subtilis, E. coli, S. pombe, A. thaliana) (Soufi et al, 2010, J Prot Res; Schütz et al, 2011, Plant Cell; Soufi et al, 2012, Curr Opinion Microbiol; Soares et al, 2013, JPR)
Phosphoproteomics: targets of Aurora kinase in S. pombe (Koch et al, 2011, Science Signaling); targets of protein kinase D in human cells (Franz-Wachtel et al., 2012, MCP); targets of S/T/Y kinases and phosphatases in B. subtilis and E. coli
Protein modifications: ubiquitylation (Ikeda et al, 2011, Nature); lysine acetylation (Carpy et al., in preparation)
Clinical proteomics: genetic rescue of Fragile X phenotype in FMR1 KO mice

14 Super-SILAC in Bacteria

15

16 E. coli: Replicate 1 and 2
Parameter                      Number
Total MS/MS                    757,835
Total Peptides Identified      18,273
Total Proteins Identified      2,292
Single Peptide Hits            6.5%
Total Proteins Quantified*     1,923
*in all phases of growth
Soufi et al. in preparation

17 Biological reproducibility Soufi et al. in preparation

18 Proteome dynamics during growth Soufi et al. in preparation

19 Dynamics of stress proteins during growth Soufi et al. in preparation

20 Estimation of absolute copy numbers: UPS standard (iBAQ)
[Figure: growth curve, OD600 vs. time (min), with sampling points T1-T7]
Soufi et al. in preparation
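A rough sketch of how a spiked-in standard can turn iBAQ values into copy numbers is given here. iBAQ divides a protein's summed peptide intensity by its number of theoretically observable peptides; a log-log linear fit on standard proteins of known amount then maps iBAQ to moles. All numbers, the function names and the cell count are hypothetical, and the slide does not specify the exact calibration procedure used.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def ibaq(summed_intensity, n_theoretical_peptides):
    """iBAQ value: summed peptide intensity per theoretically observable peptide."""
    return summed_intensity / n_theoretical_peptides


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards):
    """standards: list of (iBAQ value, known amount in fmol) for spiked proteins."""
    xs = [math.log10(value) for value, _ in standards]
    ys = [math.log10(fmol) for _, fmol in standards]
    return fit_line(xs, ys)


def copies_per_cell(ibaq_value, slope, intercept, n_cells):
    """Convert an iBAQ value to molecules per cell via the calibration line."""
    fmol = 10 ** (slope * math.log10(ibaq_value) + intercept)
    return fmol * 1e-15 * AVOGADRO / n_cells


# Hypothetical standards and a hypothetical sample of 1e7 cells.
slope, intercept = calibrate([(1e6, 0.5), (1e7, 5.0), (1e8, 50.0)])
copies = copies_per_cell(ibaq(7e7, 7), slope, intercept, n_cells=1e7)
```

The copy number is only as good as the cell count and the linearity of the standard curve, so both would need to be checked on real data.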

21 Summary of absolutely quantified proteins
                           During Growth   Membrane Proteins
Identified                 2,292           684
Quantified (All Phases)    1,923           588
Absolutely Quantified      2,096           494
Soufi et al. in preparation

22 Most abundant Proteins (ES)
Protein                                            Copies per cell (ES)
Elongation factor Tu 1;P-43                        341,047.56
Outer membrane protein A                           313,464.22
Braun lipoprotein                                  216,037.00
Cysteine synthase A;O                              187,791.26
Enolase                                            164,914.38
DNA-binding protein HU-alpha                       136,208.45
Scavengase P20;Thiol peroxidase                    131,599.61
Glyceraldehyde-3-phosphate dehydrogenase A         127,416.09
Malate dehydrogenase                               123,943.77
IDP;Isocitrate dehydrogenase [NADP]                117,787.02
High-affinity zinc uptake system protein znuA      111,748.80
Cadmium-induced protein yodA                       107,098.12
Outer membrane protein C                           106,108.02
50S ribosomal protein L6                           98,724.11
Universal stress protein A                         94,784.63
Soufi et al. in preparation

23 Dynamic range of protein abundance
[Figure: histogram, count vs. log2 protein copy number. Blue: all proteins. Red: membrane proteins]
Soufi et al. in preparation

24 Proteogenomics
Application of tandem mass spectrometry to genome re-annotation
Search MS/MS spectra against a database containing the complete genome translated in 6 reading frames
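A six-frame translation of the kind described on this slide can be sketched as below. The sketch assumes the standard genetic code and an input containing only A, C, G and T; the frame labels (`+1` to `-3`) and function names are illustrative choices, and alternative start codons are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the translations of all six reading frames, keyed by frame label."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_orfs(protein, min_len=1):
    """Split a frame translation at stop codons, keeping stretches >= min_len."""
    return [part for part in protein.split("*") if len(part) >= min_len]


frames = six_frame_translation("ATGGCCTAA")
```

For database construction, each frame would typically be split at stop codons and short stretches discarded, which is what `stop_to_stop_orfs` illustrates.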

25 Problem: database size and structure
Incompatibility with some data processing programs
Long search times
Decreased sensitivity of database search
Unequal target and decoy search spaces: most translated frames are in fact decoy sequences, leading to overestimation of the FDR
"Usual" proteomics applications: Predicted ORFs (target) and REV_Predicted ORFs (decoy)
Proteogenomics applications: Predicted ORFs plus Frame1 to Frame6 (target) and REV_Predicted ORFs plus REV_Frame1 to REV_Frame6 (decoy)
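The target-decoy estimate that this slide refers to can be sketched as follows, assuming reversed sequences as decoys (as the REV_ prefix suggests) and the simple decoys/targets estimator. The function names and the example scores are hypothetical.

```python
def make_decoy(sequence):
    """Reversed-sequence decoy for one database entry."""
    return sequence[::-1]


def fdr_at_threshold(psms, threshold):
    """Estimated FDR among target hits scoring at or above the threshold.

    psms: list of (score, is_decoy) tuples, one per spectrum match.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest score threshold whose estimated FDR does not exceed max_fdr."""
    best = None
    for score in sorted({s for s, _ in psms}, reverse=True):
        if fdr_at_threshold(psms, score) <= max_fdr:
            best = score
    return best


# Hypothetical matches: (score, is_decoy).
psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
fdr = fdr_at_threshold(psms, 6)
cutoff = score_cutoff(psms, max_fdr=0.25)
```

The estimator assumes that a wrong match is equally likely to land on a target or a decoy sequence. In a six-frame database, where most "target" frames do not encode protein, that assumption is strained, which is the concern raised on the slide.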

26 Results I: Proteogenomics of E. coli
Model Gram-negative bacterium
Small (4.6 Mb) and well characterized genome
~4,300 protein coding genes (manually annotated and reviewed)
Comprehensive high accuracy MS dataset comprising >42,000 unique peptide sequences from >2,600 proteins
Hypothesis: genome annotation approaches completeness
Assessment of general properties of a simple proteogenomic experiment
                                               MQ               TPP
MS/MS spectra acquired                         1,941,724        1,941,724
MS/MS spectra identified                       370,231          162,028
MS/MS spectra identified (%)                   19.1             8.3
Peptide sequences                              33,964           25,724
Novel / decoy / lab contaminant peptides       263 / 336 / 306  590209 (digits fused in the transcript)
E. coli proteins                               2,653            2,524

27 Results I: Proteogenomics of E. coli (1.9M peptide mass spectra)

28 Proteogenomics of E. coli (Krug et al. Mol Cell Proteomics, 2013)
[Figure, panels A-D: genome position (Mb) with tracks for annotated genes, detected peptides and six-frame ORFs, shown for the regions yhjA / treF / yhjB and fepA / fes / ybdZ]
Detected peptides shown: VGSESWWQSK (within the six-frame ORF ...TWGYGVTALKVGSESWWQSKHGPEWQRLNDEMFEVTFWWRDPQGSEEY...) and KPPQIRISL... (within ...NAVFKPPQIRISL)
Reported peptide scores: PEP = 0.027976, PP = 0.9504; PEP = 4.02E-08, PP = 0.9999

29 Majority of Novel Peptides are False Positives Results I Krug et al. Mol Cell Proteomics, 2013

30 Assessment of Processing Workflows Results I Krug et al. Mol Cell Proteomics, 2013

31 Deep Proteome Coverage of Escherichia coli
20-fold base coverage of 27.5% genome sequence
[Figure: histogram of MS/MS scans. Mean: 20 scans. Median: 7 scans]
Results I. Krug et al. Mol Cell Proteomics, 2013

32 Conclusions
Proteomics reaches analytical capacity to identify and quantify all gene products in microorganisms grown in culture.
Several regulatory protein modifications (e.g. S/T/Y-phosphorylation, lysine acetylation) can routinely be analyzed on a global scale.
Many challenges ahead: analysis of H/D-phosphorylation; analysis of environmental samples; coverage of genome/protein sequence by detected peptides.
Future developments: faster MS/MS acquisition; smarter acquisition software; large-scale targeted proteomics; metaproteomics and individual proteomics.

33 Acknowledgements
Proteome Center Tuebingen: Boumediene Soufi, Nelson C. Soares, Philipp Spät, Karsten Krug, Alejandro Carpy, Sasa Popic, Silke Wahl
Funding

