PSI-Proteome Informatics update Andy Jones PSI 2013 Liverpool
PSI-PI outputs
Formats and guidelines for proteome informatics
Standard formats:
– mzIdentML
– mzQuantML
– mzTab
Reporting guidelines:
– MIAPE MSI
– MIAPE Quant
MIAPE documents
Originally one MIAPE document:
– MIAPE Mass Spectrometry Informatics (MSI), containing both identification guidelines and quant guidelines
Now split into two documents:
– MIAPE MSI (ident only) and MIAPE Quant
MIAPE MSI status:
– MIAPE MSI 1.1 published in 2008
– Working group 2011-2012: minor updates to requirements and removal of quant parts
– MIAPE MSI 1.2 still needs to be re-submitted to the PSI process
Plan for meeting:
– Final issues with MIAPE MSI and alignment with mzIdentML?
MIAPE Quant timeline
Work started in Dec 2010 by ProteoRed groups of experts
Shared with PSI working groups in March 2011
Revision at PSI meeting (Heidelberg), April 2011
PSI review:
– Public and external review ended in August 2012
– Major revision accepted in October 2012
Journal of Proteomics:
– Submitted on 15th February
– Accepted on 27th February after minor revision
Martínez-Bartolomé, S., Deutsch, E. W., Binz, P.-A., Jones, A. R., Eisenacher, M., Mayer, G., Campos, A., Canals, F., Bech-Serra, J.-J., Carrascal, M., Gay, M., Paradela, A., Navajas, R., Marcilla, M., Hernáez, M. L., Gutiérrez-Blázquez, M. D., Velarde, L. F. C., Aloria, K., Beaskoetxea, J., Medina-Aunon, J. A., and Albar, J. P. Guidelines for reporting quantitative mass spectrometry based experiments in proteomics. Journal of Proteomics, 2013, in press. http://www.sciencedirect.com/science/article/pii/S1874391913001024
No planned work for meeting
mzIdentML
Timeline:
– Original 1.0 version in Aug 2009
– Version 1.1 stable (Aug 2011)
– Manuscript published in MCP in 2012
PSI 2013 to-do list:
– Updates to protein grouping
– PTM localisation / ambiguity scoring
– General discussion of data compression issues
Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., Siepen, J., Hubbard, S., Selley, J., Searle, B., Shofstahl, J., Seymour, S., Julian, R., Binz, P.-A., Deutsch, E. W., Hermjakob, H., Reisinger, F., Griss, J., Vizcaino, J. A., Chambers, M., Pizarro, A., and Creasy, D. (2012) The mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 11, M111.014381.
Tooling for mzIdentML
Tool | Status | IMPORT/EXPORT
Mascot | mzIdentML version 1.0 available in Mascot version 2.3; mzid stable version 1.1 available in Mascot version 2.4 | EXPORT
OMSSA | http://code.google.com/p/mzidentml-lib/ from the U.Liverpool group | EXPORT
MSGF+ | Full support for results into mzid 1.1 | EXPORT
Peaks | Native export of mzIdentML version 1.1 | EXPORT
Phenyx | Exporter to mzIdentML v1.1 now available - contact GeneBio for details. | EXPORT
PLGS | |
ProCon | Conversion of SEQUEST *.out into mzIdentML (SpectrumIdentificationResults only); conversion of ProteomeDiscoverer 1.2 + 1.3 (Thermo) *.msf and *.prot.xml files into mzIdentML | EXPORT
ProteinPilot | Contact is Sean Seymour |
ProteinScape | Work in progress |
SEQUEST - Native | ProCon (see above) |
ProteoWizard | pepXML converter available now - impl. of C++ library for reading/writing mzIdentML / interface for importing other formats | IMPORT AND EXPORT
SEQUEST - BioWorks | Work in progress |
SEQUEST - Proteome Discoverer | Work in progress (exporters available from ProCon project) |
SpectraST | ProteoWizard conversion from pepXML | EXPORT
Spectrum Mill | |
X!Tandem | http://code.google.com/p/mzidentml-lib/ | EXPORT
OpenMS | Fully supported in release 1.9 | IMPORT AND EXPORT
Scaffold | Available now in Scaffold version 3.0 | EXPORT
Scaffold PTM | Scaffold PTM tool imports identifications in mzIdentML | IMPORT
TPP | pepXML to mzid from ProteoWizard | IMPORT AND EXPORT
MIAPE MSI Extractor | Tool available from the ProteoRed team | IMPORT
CSV exporter | http://code.google.com/p/mzidentml-lib/ | IMPORT
PAAnalyzer | Imports and exports mzIdentML (only v1.0) | IMPORT AND EXPORT
Myrimatch | Identifications exported in mzIdentML | EXPORT
TagRecon | Identifications exported in mzIdentML | EXPORT
Pepitome | Identifications exported in mzIdentML | EXPORT
IDPicker | Version 3.x implements mzIdentML import | IMPORT
jmzIdentML | Java API for reading and writing mzIdentML | IMPORT AND EXPORT
Formats
mzQuantML
– Output of quantitative software
– Quantitative values about proteins, protein groups, peptides and features (quantified regions on mass spec); also small molecules...
– Relative or absolute values for single samples (Assays) or groups of replicates (StudyVariables)
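The Assay / StudyVariable distinction above can be sketched in a few lines of Python. This is only an illustration of the idea, not the mzQuantML schema or the jmzquantml API: the class names mirror the slide, but the fields, the `summarise` method and the choice of the mean as the summary value are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Assay:
    """One single sample: a quantitative value per protein accession."""
    assay_id: str
    values: Dict[str, float]


@dataclass
class StudyVariable:
    """A group of replicate assays (hypothetical minimal model)."""
    name: str
    assays: List[Assay]

    def summarise(self) -> Dict[str, float]:
        # Collect every protein seen in any replicate of this group.
        proteins = set()
        for assay in self.assays:
            proteins.update(assay.values)
        # Mean over the replicates that actually report the protein.
        # The mean is an illustrative choice, not something the format mandates.
        return {
            p: mean(a.values[p] for a in self.assays if p in a.values)
            for p in sorted(proteins)
        }


# Made-up accessions and values, purely for illustration.
control = StudyVariable(
    "control",
    [
        Assay("assay_1", {"P1": 10.0, "P2": 4.0}),
        Assay("assay_2", {"P1": 14.0}),
    ],
)
summary = control.summarise()
print(summary)
```

Per-assay values and per-group values are kept as separate layers here, which is the same split the format makes between single samples and groups of replicates.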
mzQuantML status
Version 1.0 rc-1 submitted to the PSI process, October 2011
Version 1.0 rc-2, June 2012
Re-submitted to PSI process in October 2012 and manuscript submitted to MCP; minor correction received
Completed PSI process in Feb 2013: version 1.0 release
– Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC)
– Schema is fixed, with each technique defined by separate semantic rules, implemented in validator software
– Manuscript re-submitted to MCP, awaiting outcome
Implementations
Java API for creating example files (version 1.0 release): http://code.google.com/p/jmzquantml/
Java-based validator (version 1.0 release): http://code.google.com/p/mzquantml-validator/
Software for converting output files from MaxQuant and Progenesis:
– Qi, D et al. OMICS 16(9): 489-495; http://code.google.com/p/maxquant-mzquantml-convertor/
Implementation in OpenMS for some techniques
Beta Java library of routines inc. mzTab exporter: http://code.google.com/p/mzq-lib/
Beta Excel to mzQuantML converter for spectral count data: http://code.google.com/p/tsv-or-csv-mzquantml-converter/
mzq to-do list:
Need to add SRM support
– Local testing of SRM encoding and conversion from Skyline
– Need wider input on our mapping and writing semantic rules for software
Need to check whether protein grouping and mod scoring map onto format okay
mzTab
To provide a simple and efficient way of exchanging results from MS approaches.
– Simple summary “final” report of the experimental results: peptides and proteins identified and quantified
– Small molecules included (metabolomics)
– Technical and biological metadata
– Spectra can be referenced in optional columns.
– Set of mandatory and optional attributes (very flexible).
Four sections:
– (Optional) Metadata section
– (Optional) Protein section
– (Optional) Peptide section
– (Optional) Small Molecule section (metabolomics)
Can report MS derived data at different levels:
– Single experiments
– Multiple (possibly linked) experiments (merged files)
– Data generated as a result of a query to a bioinformatics resource
– Possible to add a reliability score for each identification
Easy to parse and use by the research community, systems biologists as well as providers of knowledge bases. It can be used by non-experts in bioinformatics and/or proteomics.
http://mztab.googlecode.com
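To illustrate the "easy to parse" claim, here is a minimal Python sketch that splits a tab-separated mzTab-style text into its sections. It assumes the three-letter line prefixes of the mzTab 1.0 specification (`MTD` metadata, `PRH`/`PRT` protein header and rows, `PEH`/`PEP` peptide, `SMH`/`SML` small molecule, `COM` comment); the function name `parse_mztab` and the example content are hypothetical, and this is not the jmzTab API.

```python
from collections import defaultdict

# Header prefix -> row prefix for the three tabular sections (assumed from mzTab 1.0).
HEADER_TO_ROWS = {"PRH": "PRT", "PEH": "PEP", "SMH": "SML"}


def parse_mztab(text):
    """Return (metadata dict, {row prefix: list of row dicts})."""
    metadata = {}
    headers = {}
    sections = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix = fields[0]
        if not prefix or prefix == "COM":
            continue  # skip blank lines and comments
        if prefix == "MTD":
            if len(fields) >= 3:
                metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROWS:
            headers[HEADER_TO_ROWS[prefix]] = fields[1:]
        elif prefix in headers:
            # Pair each value with the column name from the section header.
            sections[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return metadata, dict(sections)


# Made-up content, not taken from a real mzTab file.
EXAMPLE = "\n".join([
    "COM\tHypothetical example",
    "MTD\tmzTab-version\t1.0.0",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PEH\tsequence\taccession",
    "PEP\tEXAMPLEK\tP12345",
])

metadata, sections = parse_mztab(EXAMPLE)
print(metadata)
print(sections)
```

Because every section is optional, the sketch only returns the sections that actually occur in the input; a file without small molecules simply has no `SML` key.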
mzTab status
Submitted to the PSI document process in May 2012.
To do: now addressing the remaining (minor) comments after the second round of review, so we hope that version 1.0 will soon be formalised.
Publication (revised version) under review in MCP.
Current implementations:
– jmzTab (Java API): two versions have been developed. Version 2.0 (Q.W. Xu, nearly finished, with more functionality) will be the maintained version. Version 1.0 (J. Griss) will not be maintained further.
– mzTab Validator, PRIDE XML to mzTab converter and mzTab merger in beta status.
– PRIDE Converter 2.
– OpenMS (version 1.10)
– R/Bioconductor package MSnbase (L. Gatto, Cambridge University)
– LipidDataAnalyzer (University of Graz)
– MetaboLights (EBI) and COSMOS EU project: a slightly modified version is currently in use; we are working in contact with them.
PSI-PI work done
mzIdentML
– Minor schema issues
  Optional attribute (dBSequence_ref) on ProteinDetectionHypothesis (would be better if mandatory): update spec doc encouraging best practice
  Pre-fractionation: update spec doc encouraging best practice; one SpectrumIdentificationList where possible
  Retention time reporting: update spec doc encouraging best practice; align with mzML CVs
– Support for crosslinking results
  Sketched a possible reporting format that looks to cover most simple cases
  Needs (considerable) further testing in local implementations and follow-up by calls
– Mod localisation
  Sketched some possible encodings
  Needs follow-up calls and implementation in software
  Keen to build this support into mzid 1.1, but the model is going to be a work-around.
– Protein grouping
  Reported back current progress of working group (key members not present here)
  New members will join the working group
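As a rough illustration of the best-practice point about the optional database sequence reference on ProteinDetectionHypothesis, the following Python sketch lists hypotheses that lack it. It is an assumption-laden example rather than part of any PSI validator: the attribute spelling `dBSequence_ref` is taken from the mzIdentML 1.1 schema as recalled here, the function name is hypothetical, and the XML fragment is made up and far from a complete mzIdentML document. Elements are matched by local name so the sketch does not depend on the exact namespace URI.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def hypotheses_missing_db_ref(xml_text, attr="dBSequence_ref"):
    """Return ids of ProteinDetectionHypothesis elements without the reference."""
    root = ET.fromstring(xml_text)
    return [
        el.get("id")
        for el in root.iter()
        if local_name(el.tag) == "ProteinDetectionHypothesis" and not el.get(attr)
    ]


# Hypothetical fragment with a placeholder namespace, for illustration only.
SAMPLE = """<MzIdentML xmlns="http://example.org/hypothetical">
  <ProteinAmbiguityGroup id="PAG_1">
    <ProteinDetectionHypothesis id="PDH_1" dBSequence_ref="DBSeq_1"/>
    <ProteinDetectionHypothesis id="PDH_2"/>
  </ProteinAmbiguityGroup>
</MzIdentML>"""

missing = hypotheses_missing_db_ref(SAMPLE)
print(missing)
```

A check like this is the kind of semantic rule that a schema with an optional attribute cannot enforce on its own, which is why the slide points to the specification document instead.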
PSI-PI work done
mzQuantML
– Sketched SRM example files for label-based encoding
– Still need a sketched example for label-free, but it seems straightforward
– Plan to build export software very soon from Skyline (prototype already done) and mProphet
– Write up semantic encoding rules
– Submit to PSI doc process as a Community Practice document
PSI-PI work done
mzTab
Finalised minor issues from second round of PSI doc process review
Deadline 1st May for re-submitting final “release 1.0” document
– Needs minor updates to document and example files