La Cristalera, Miraflores de la Sierra, December 2012
HPP
Use of SEQUEST search results with ProteoRed.org MIAPE Extractor
INDEX
1. A working workflow to extract MIAPE information from Proteome Discoverer 1.3 search results using ProteoRed MIAPE Toolkit
   Óscar Gallardo, Joan Villanueva, Montserrat Carrascal, Joaquín Abián
2. Data dependent acquisition using inclusion list (IL)
   Joan Villanueva, Óscar Gallardo, Joaquín Abián, Montserrat Carrascal
MASCOT WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MGF, Mascot; output file: mzIdentML. MIAPE generation: MIAPE Extractor, MIAPE Generator Tool; results: MIAPE MS, MIAPE MSI.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MSF, MGF; output file: mzIdentML; MIAPE Extractor.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Diagram step: RAW to MGF. Tool labels: (GPL), (GPL), LP-CSIC/UAB.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Diagram step: MSF to mzIdentML. Label: A. Medina, August 2012.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Diagram step: MSF, .Prot.XML, mzIdentML.]

ProCon console output:
  % finished
  TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai.
  TaxID for organismName unknown: Sphaerochaeta globosa...
  TaxID for organismName unknown: Leptospira borgpetersenii serovar.....
  MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
  SequenceCollection written
  CV term for unknown modification Deamidated / Da (N, Q) not found.
  CV term for unknown modification Acetyl / Da (Any NTerminus) not found.
  Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)
    at de.mpc.Prot2MzIdent.PD12ToMzIdentML.getProteinDetectionProtocol(PD12ToMzIdentML.java:851)

1. ProCon was unable to correctly interpret the controlled vocabulary used by Proteome Discoverer to identify post-translational modifications (PTMs).
2. ProCon also had problems with its internal array references.
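The "CV term ... not found" messages can be collected and compared with a modification vocabulary after a conversion run. The following is only a minimal sketch: the function name and the two-entry lookup table are illustrative assumptions (Acetyl and Deamidated are listed with their Unimod accessions), and it is not part of ProCon or of the MIAPE tools.

```python
import re

# Illustrative two-entry table (assumption): a real vocabulary would be much larger.
UNIMOD = {
    "Acetyl": "UNIMOD:1",
    "Deamidated": "UNIMOD:7",
}

# Matches log entries such as:
#   CV term for unknown modification Deamidated / Da (N, Q) not found.
LOG_PATTERN = re.compile(
    r"CV term for unknown modification (.+?) / .*?Da \((.+?)\) not found"
)


def unresolved_modifications(log_text):
    """Return (name, sites, accession) for each modification the log reports as unknown."""
    found = []
    for name, sites in LOG_PATTERN.findall(log_text):
        # accession is None when the name is not in the lookup table
        entry = (name, sites, UNIMOD.get(name))
        if entry not in found:  # the same message may be logged several times
            found.append(entry)
    return found
```

A list like this shows which modifications would need to be mapped by hand in the resulting mzIdentML file.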
OMSSA WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MGF; output file: OMX. Tool labels: (GPL), (GPL), LP-CSIC/UAB.]
[Workflow diagram, next step. Mass spectra identification: RAW, MGF; output file: OMX, mzIdentML; MIAPE Extractor.]
OMSSA WORKFLOW (Ó. Gallardo)
[Diagram step: OMX to mzIdentML. Labels: A. Medina, August 2012; BIG, real-world file.]

Console output:
  % finished
  TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai.
  TaxID for organismName unknown: Sphaerochaeta globosa...
  TaxID for organismName unknown: Leptospira borgpetersenii serovar.....
  MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
  SequenceCollection written
  CV term for unknown modification Deamidated / Da (N, Q) not found.
  CV term for unknown modification Acetyl / Da (Any NTerminus) not found.
  Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)

mzIdentML Parsers were unable to process big OMX files because of internal memory management problems.
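Memory problems with large XML result files are often avoided by streaming the file instead of loading the whole tree. The sketch below shows that general technique with Python's standard library. It does not describe how mzIdentML Parsers work internally, and the element name passed by the caller is an assumption about the file being read.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{urn:x}hit' down to 'hit'."""
    return tag.rsplit("}", 1)[-1]


def count_elements(source, wanted):
    """Count elements named `wanted` while streaming through `source`.

    `source` is a file name or a binary file object. Each matching element
    is cleared once it has been handled, which releases its text and children
    so that the parsed content does not accumulate in memory.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) == wanted:
            count += 1
            elem.clear()
    return count


# Usage with a tiny in-memory document (element names are made up):
demo = io.BytesIO(b"<root><hit><pep>A</pep></hit><hit><pep>B</pep></hit></root>")
n_hits = count_elements(demo, "hit")
```

The same loop could extract fields from each element before clearing it, instead of only counting.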
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Diagram step: MSF, .Prot.XML, mzIdentML.]

ProCon console output:
  % finished
  TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai.
  TaxID for organismName unknown: Sphaerochaeta globosa...
  TaxID for organismName unknown: Leptospira borgpetersenii serovar.....
  MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
  SequenceCollection written
  CV term for unknown modification Deamidated / Da (N, Q) not found.
  CV term for unknown modification Acetyl / Da (Any NTerminus) not found.
  Exception in thread "AWT-EventQueue-0" java.lang.NullPointerException
    at de.mpc.Prot2MzIdent.AParamHandler.createThresholdParameterList(AParamHandler.java:526)
    at de.mpc.Prot2MzIdent.PD12ToMzIdentML.getProteinDetectionProtocol(PD12ToMzIdentML.java:851)

1. ProCon was unable to correctly identify post-translational modifications (PTMs), marking all of them as "unknown modification" in the resulting mzIdentML file.
2. ProCon still had problems with its internal array references.
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Diagram step: MSF, .Prot.XML, mzIdentML.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MSF, MGF; output file: .Prot.XML, mzIdentML. MIAPE generation: MIAPE Extractor, MIAPE Generator Tool.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MSF, MGF; output file: .Prot.XML, mzIdentML. MIAPE generation: MIAPE Extractor.]

ProCon console output:
  % finished
  TaxID for organismName unknown: Leptospira interrogans serogroup Icterohaemorrhagiae serovar Lai.
  TaxID for organismName unknown: Sphaerochaeta globosa...
  TaxID for organismName unknown: Leptospira borgpetersenii serovar.....
  MyProgressBar for getSpectrumIdentificationListAndProteinDetectionListAndPeptideEvidences for SEQ finished
  SequenceCollection written
  CV term for unknown modification Deamidated / Da (N, Q) not found.
  CV term for unknown modification Acetyl / Da (Any NTerminus) not found.

Spectrum IDs did not match between the MGF file and the mzIdentML file.
[Diagram labels: ID (mgf), ID (mzid); PepMS, Charge, RT.]
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Workflow diagram. Mass spectra identification: RAW, MSF, MGF; output file: .Prot.XML, mzIdentML. MIAPE generation: MIAPE Extractor, MIAPE Generator Tool; results: MIAPE MS, MIAPE MSI. ID labels linked through PepMS, Charge, RT.]
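The slides indicate that spectra were linked through PepMS, Charge and RT when the IDs of the MGF and mzIdentML files did not agree. A minimal sketch of that idea follows. It assumes standard MGF header keys (TITLE, PEPMASS, CHARGE, RTINSECONDS), uses plain dictionaries as hypothetical stand-ins for parsed mzIdentML identifications, and picks rounding tolerances arbitrarily. It is not the MIAPE Extractor implementation.

```python
def parse_mgf(text):
    """Collect the KEY=value header lines of every BEGIN IONS / END IONS block."""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.upper()] = value
    return spectra


def match_key(mz, charge, rt, mz_decimals=2, rt_decimals=0):
    """Build a lookup key from precursor m/z, charge and retention time.

    Rounding is a crude tolerance (assumption); values that fall on either
    side of a rounding boundary will not match.
    """
    return (round(float(mz), mz_decimals), int(charge), round(float(rt), rt_decimals))


def link_ids(mgf_text, identifications):
    """Map each identification id to the TITLE of the matching MGF spectrum, or None."""
    index = {}
    for spec in parse_mgf(mgf_text):
        key = match_key(
            spec["PEPMASS"].split()[0],   # PEPMASS may carry an intensity after the m/z
            spec["CHARGE"].rstrip("+"),   # CHARGE is written like "2+"
            spec["RTINSECONDS"],
        )
        index[key] = spec["TITLE"]
    return {
        ident["id"]: index.get(match_key(ident["mz"], ident["charge"], ident["rt"]))
        for ident in identifications
    }
```

For real data a tolerance window on m/z and RT would be more robust than rounding, at the cost of a slightly longer lookup.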
PROTEOME DISCOVERER WORKFLOW (Ó. Gallardo)
[Complete workflow diagram. Mass spectra identification: RAW, MSF, MGF; output file: .Prot.XML, mzIdentML. MIAPE generation: MIAPE Extractor, MIAPE Generator Tool; results: MIAPE MS, MIAPE MSI.]
WORK IN PROGRESS (Ó. Gallardo)

1. Uploading of MSF + mzIdentML files through MIAPE Extractor is not yet automated.
2. Although we can generate MIAPE data from Sequest search results, MIAPE Toolkit does not work very well with this data at the analysis stage: we cannot retrieve the identified proteins, there are problems with the Sequest Score fields, …
   1. We are working on an automation script to automate MIAPE Extractor data extraction: MIAPE Extractor Automator v
   Development of MIAPE Extractor and MIAPE Generator tool continues, with improvements in each version.

1. Export of Prot.XML files from the MSF files, and the subsequent conversion of MSF + Prot.XML files to mzIdentML files, is not automated.
2. ProCon still has some errors, is very slow with large files, and is memory hungry.
   ProCon developers are working on a new version that does not need Prot.XML files, making the conversion process much faster and easier.
INDEX
2. Data dependent acquisition using inclusion list (IL)
   Joan Villanueva, Óscar Gallardo, Joaquín Abián, Montserrat Carrascal
Data dependent acquisition with inclusion list (J. Villanueva)

RATIONALE FOR USING DDP WITH INCLUSION LIST (IL):
a.- Most target proteins assigned to the groups of the shotgun project were not detected using shotgun approaches.
b.- The few detected peptides were not optimal for MRM analysis (not proteotypic, containing Met/Cys, with missed cleavages).
c.- Preliminary tests at LP-CSIC/UAB using targeted approaches, which require a limited list of peptides (the list of target m/z values must be restricted to 20-30), failed to detect the target proteins.
DDP with an inclusion list increases the probability of positively detecting low-abundance proteins/peptides without the constraints of targeted approaches.

16 PROTEINS SELECTED FOR INCLUSION LIST
- 6 proteins assigned to the LP-CSIC/UAB laboratory
- 10 proteins assigned to MRM labs and not detected by shotgun

Laboratory  Uniprot  Name
Canals      P69905   HBA_HUMAN
FB          Q6GPI1   CTRB2_HUMAN
CG          P24855   DNAS1_HUMAN
MPV         Q6A1A2   PDPK2_HUMAN
FC          P16444   DPEP1_HUMAN
CG          Q9BSW7   SYT17_HUMAN
CG          P11597   CETP_HUMAN
MPV         P15391   CD19_HUMAN
CG          Q53FZ2   ACSM3_HUMAN
FV          Q8N4N3   KLH36_HUMAN
Abian       Q9BUU2   METTL22_HUMAN
Abian       P33076   CIITA_HUMAN
Abian       Q9Y661   HS3ST4_HUMAN
Abian       Q14703   MBTPS1_HUMAN
Abian       B7ZMK8   PRSS36_HUMAN
Abian       A4GXA9   EME2_HUMAN
Procedure: Data Dependent with IL (J. Villanueva)

To obtain the inclusion list:
1.- All tryptic peptides of 7-25 AA.
2.- m/z values assuming z=2 and z=3 for all peptides.
3.- Filter duplicate m/z values (software requirement).
Number of m/z values in the inclusion list: 556 (num peptides 282)

Signal ID (the m/z column did not survive extraction)
P33076_GCTLLLTARPR
P11597_VFHSLAK
P16444_YPDLIAELLR
Q53FZ2_EGWGNLK
P24855_YDIALVQEVR
Q8N4N3_VASMNQR
Q8N4N3_VKPAVCSLLPK
Q14703_APCPGCSHLTLK
Q9Y661_AISDYTQTLSK
Q9BSW7_TAVEQWHSLR
P69905_VDPVNFK
P16444_TLEQMDVVHR
A4GXA9_MGLLAVGPDLSR

Sample workflow:
- Samples CCD18 and MCF7
- Aliquot 250 µg protein
- OffGel (12 fractions)
- FASP digestion
- LC-MS/MS (DDP, IL, Targeted)
- Proteome Discoverer
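The three inclusion-list steps can be sketched in a few lines. This is an illustration under stated assumptions, not the script used for the slides: no missed cleavages, no cleavage before proline (the listed peptide GCTLLLTARPR is consistent with that rule), unmodified monoisotopic residue masses (no carbamidomethylation of Cys), and duplicates detected after rounding m/z to two decimals. The 282 peptides give 282 × 2 = 564 candidate values, so the 556 values reported imply that 8 duplicates were filtered.

```python
# Monoisotopic residue masses in Da (unmodified residues: an assumption).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(sequence, min_len=7, max_len=25):
    """Step 1: cleave after K/R unless followed by P; keep peptides of 7-25 residues."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if min_len <= len(p) <= max_len]


def mz(peptide, charge):
    """Step 2: m/z of the peptide carrying `charge` protons."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def build_inclusion_list(proteins, charges=(2, 3), decimals=2):
    """Step 3: keep the first entry for every distinct rounded m/z value.

    `proteins` maps an accession to its sequence; entries are labelled
    accession_peptide, like the Signal IDs on the slide.
    """
    seen, entries = set(), []
    for accession, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            for z in charges:
                value = round(mz(peptide, z), decimals)
                if value not in seen:
                    seen.add(value)
                    entries.append((f"{accession}_{peptide}", value))
    return entries
```

Calling build_inclusion_list with the 16 UniProt sequences would give a list comparable to the one on the slide, although the exact count depends on the assumptions above.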
DATA DEPENDENT WITH INCLUSION LIST: LTQ-ORBITRAP (J. Villanueva)
[Figure: MS traces. Sample VH: MCF-7. Panels: Offgel Fr6, Offgel Fr7.]
RESULTS: Inclusion list and targeted (J. Villanueva)

DATA PROCESSING FOR IL DATA:
1.- MGF generation with PDv
2.- Database search: Proteome Discoverer and Mascot
3.- FDR 5%

RESULT: Data dependent with IL: the 282 listed peptides were undetected (the same as in the targeted experiments).
- Low amount of target proteins
- Proteins not expressed in these cells
Chromosome 16 protein description: Data Dependent Analysis (J. Villanueva)

DATA PROCESSING:
1.- MGF generation with PDv
2.- Database search: Proteome Discoverer (and Mascot)
3.- Search results and filtering (1% FDR): MIAPE Extractor (Data Inspector Module) and Proteome Discoverer.

Work in progress:
MIAPE EXTRACTOR: the data could be uploaded and the FDR process could be completed.
Data Inspector Module: detected errors still to be solved: unable to extract protein information from SEQUEST data.
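The slides do not say how the FDR used in the filtering step was computed. The sketch below assumes the common target-decoy estimate (decoy hits divided by target hits above a score cutoff) and a score where higher is better. The function name and input layout are hypothetical, and this is not the MIAPE Extractor or Proteome Discoverer implementation.

```python
def filter_by_fdr(psms, max_fdr=0.01):
    """Return the target PSMs accepted at the given FDR threshold.

    `psms` is a list of (score, is_decoy) pairs. The list is ranked by
    decreasing score, the running ratio decoys / targets is evaluated at
    each rank, and the deepest rank still within `max_fdr` is the cutoff.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = 0
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = rank
    return [p for p in ranked[:cutoff] if not p[1]]
```

With max_fdr=0.01 this corresponds to the 1% filter and with max_fdr=0.05 to the 5% filter mentioned for the IL data. Different score definitions in Mascot and SEQUEST are one reason the two searches could give different counts after filtering.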
Number of proteins that passed the 1% FDR filter (J. Villanueva)

                                         MIAPE EXTRACTOR              PROTEOME DISCOVERER
Sample  Acquisition method  Search method  Num peptides  Num proteins  Num peptides  Num proteins
MCF7    DDP                 MASCOT
MCF7    DDP                 SEQUEST
CCD18   DDP                 MASCOT
CCD18   DDP                 SEQUEST

Work in progress...

1.- Significant differences between searching algorithms
Need an in-depth data revision.