Processing of MS data using open source software - MZmine


1 Processing of MS data using open source software - MZmine
Mark Earll, Mark Seymour, Mark Forster, Dave Portwood, Chris Pudney - Syngenta

2 We bring plant potential to life
Syngenta is one of the world’s leading companies with more than 24,000 employees in over 90 countries dedicated to our purpose: Bringing plant potential to life. Our Crop Protection and Seeds products help growers increase crop yields and productivity. We contribute to meeting the growing global demand for food, feed and fuel and are committed to protecting the environment, promoting health and improving the quality of life.

3 Metabolomics
Metabolomics applications within Syngenta. Plant breeding: early detection of desirable traits is very valuable; sensory and nutritional profiling, relating chemical composition to taste and health. Fundamental research: drought tolerance, ripening processes, effect of genetic manipulation. New agrochemical products: mode of action studies.

4 MZmine project history
The MZmine project was initiated in 2004 by Matej Orešič of the Quantitative Biology and Bioinformatics group at VTT Technical Research Centre of Finland and Mikko Katajamaa of the Computational Systems Biology Research group at Turku Centre for Biotechnology. The first release, in 2005, introduced the data processing workflow and implemented simple methods for each data processing task and for data visualization. The project entered a second phase when Tomáš Pluskal of the G0 Cell Unit at Okinawa Institute of Science and Technology joined in 2006 and started redesigning the software framework towards modularity. A paper about MZmine 2 was published in BMC Bioinformatics in 2010. Syngenta sponsored two rounds of development, in 2011 and 2012, extending the software for GC-MS use, improving peak picking and adding various usability features.
T. Pluskal, S. Castillo, A. Villar-Briones, M. Orešič, MZmine 2: Modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data, BMC Bioinformatics 11:395 (2010).

5 Summary of Syngenta sponsored improvements
Added an interface to R to allow incorporation of R-based tools: baseline correction module (ptw), XCMS centWave peak picking, XCMS CAMERA deconvolution. Identification: NIST search for GC-MS data, ChemSpider, customisable adduct list. Various usability and plotting improvements: save batch mode workflows, 3D and 2D plot enhancements.

6 How MZmine works
Mass detection: MZmine detects mass features. It then constructs a series of extracted ion chromatograms and detects peaks in the extracted ion chromatograms.
[Figure panels: Extracted single ion chromatograms; Chromatographic peak detection]
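The middle step, building extracted ion chromatograms from per-scan mass features, can be illustrated with a minimal sketch. This is not MZmine's chromatogram builder: the function name `build_eics`, the data layout and the `mz_tol` value are assumptions made for illustration only.

```python
def build_eics(scans, mz_tol=0.01):
    """Group per-scan mass features into extracted ion chromatograms (EICs).

    scans: list of (rt, [(mz, intensity), ...]) holding detected mass features.
    Returns a list of dicts {"mz": running mean m/z, "points": [(rt, intensity)]}.
    """
    eics = []
    for rt, features in scans:
        for mz, inten in features:
            # Find the closest existing chromatogram within the m/z tolerance
            best = None
            for eic in eics:
                d = abs(eic["mz"] - mz)
                if d <= mz_tol and (best is None or d < abs(best["mz"] - mz)):
                    best = eic
            if best is None:
                # No match: this mass feature starts a new chromatogram
                eics.append({"mz": mz, "points": [(rt, inten)]})
            else:
                # Match: update the running mean m/z and append the data point
                n = len(best["points"])
                best["mz"] = (best["mz"] * n + mz) / (n + 1)
                best["points"].append((rt, inten))
    return eics


# Hypothetical three-scan example with two ions
scans = [
    (1.0, [(100.001, 10.0), (250.100, 5.0)]),
    (2.0, [(100.002, 50.0), (250.101, 6.0)]),
    (3.0, [(100.000, 12.0)]),
]
eics = build_eics(scans)
```

Each resulting chromatogram is then passed to chromatographic peak detection.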

7 Alignment across samples
Multiple runs: alignment is made at the peak list level. Retention time normalisation by reference to common peaks. Scans are then merged with m/z and RT tolerances and, optionally, charge, ID and isotope pattern matching. Retrospective gap filling is used to obtain values where peaks were not detected in some runs. This enables more meaningful statistical analysis.
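A minimal sketch of tolerance-based alignment, assuming each run has already been reduced to a peak list of (m/z, RT, area) tuples. The names `align` and `missing` and the tolerance values are hypothetical and do not reproduce MZmine's aligner; the optional charge, ID and isotope pattern checks are left out.

```python
def align(peak_lists, mz_tol=0.01, rt_tol=0.2):
    """peak_lists: {sample: [(mz, rt, area), ...]} -> list of aligned rows."""
    rows = []  # each row: {"mz": ..., "rt": ..., "areas": {sample: area}}
    for sample, peaks in peak_lists.items():
        for mz, rt, area in peaks:
            for row in rows:
                # Join an existing row if both tolerances are satisfied
                if (abs(row["mz"] - mz) <= mz_tol
                        and abs(row["rt"] - rt) <= rt_tol
                        and sample not in row["areas"]):
                    row["areas"][sample] = area
                    break
            else:
                # No compatible row: start a new one
                rows.append({"mz": mz, "rt": rt, "areas": {sample: area}})
    return rows


def missing(rows, samples):
    """(row index, sample) pairs that gap filling would have to revisit."""
    return [(i, s) for i, row in enumerate(rows)
            for s in samples if s not in row["areas"]]


# Hypothetical peak lists for two runs
peak_lists = {
    "run_A": [(100.000, 5.0, 1000.0), (200.000, 7.0, 500.0)],
    "run_B": [(100.005, 5.1, 900.0)],
}
rows = align(peak_lists)
gaps = missing(rows, list(peak_lists))
```

Here the second row has no value for `run_B`; gap filling would go back to the raw data of that run and integrate the expected m/z and RT window.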

8 Fundamental problems with LC-MS and GC-MS data
3-dimensional data. Mass dimension: accurate mass, enormous range of possible masses; drift in mass; mass feature detection; fragmentation + adducts. Chromatographic dimension: RT drift; column bleed / artefacts; peak detection.

9 Baseline correction
Baseline correction added from the R ptw package. Uses asymmetric least squares (Eilers 2004): assigns weights to points above and below an estimated trend line. Smoothing parameter: larger value = more smoothing (10^3 to 10^7). Asymmetry parameter: weight for points above/below the trend line (0 to 1). Baseline correction is applied either to the whole scan or to narrow m/z strips; 1 or 0.5 m/z works well in practice.
Eilers, P.H.C. (2004) "Parametric Time Warping", Analytical Chemistry, 76 (2), 404-411.
Boelens, H.F.M., Eilers, P.H.C., Hankemeier, T. (2005) "Sign constraints improve the detection of differences between complex spectral data sets: LC-IR as an example", Analytical Chemistry, 77, 7998-8007.
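The roles of the two parameters can be seen in a small pure-Python sketch of asymmetric least squares smoothing. It is not the ptw code: the function names, the dense solver (only suitable for short signals) and the default values `lam=1e5` and `p=0.01` are assumptions chosen for this example.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination (matrix is positive definite here)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            if f != 0.0:
                for j in range(k, n):
                    a[i][j] -= f * a[k][j]
                b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Estimate a baseline z by minimising sum(w*(y-z)^2) + lam*sum((d2 z)^2)."""
    n = len(y)
    # D'D for the second-difference operator D (the smoothness penalty)
    dtd = [[0.0] * n for _ in range(n)]
    coeffs = (1.0, -2.0, 1.0)
    for i in range(n - 2):
        for r in range(3):
            for c in range(3):
                dtd[i + r][i + c] += coeffs[r] * coeffs[c]
    w = [1.0] * n
    z = list(y)
    for _ in range(n_iter):
        a = [[lam * dtd[i][j] + (w[i] if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        z = solve(a, [w[i] * y[i] for i in range(n)])
        # Asymmetry: points above the trend line get weight p, points below 1 - p
        w = [p if y[i] > z[i] else 1.0 - p for i in range(n)]
    return z


# Synthetic chromatogram: sloping baseline plus one Gaussian peak at index 30
signal = [10 + 0.1 * i + 100 * math.exp(-((i - 30) ** 2) / 8.0) for i in range(60)]
baseline = als_baseline(signal)
corrected = [s - b for s, b in zip(signal, baseline)]
```

A small `p` makes the fit ignore points above the trend line, so the estimate hugs the bottom of the signal; a larger `lam` stiffens it so that it cannot follow the peaks.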

10 How does baseline correction affect results?
Peak-area ratio   1/2     1/3     2/3
mean              0.49    0.69    1.41
sd                0.06    0.14    0.20
CV (%)            12.04   19.98   13.94
SE                0.12 (single value; column not recoverable)
Preliminary work: ratios calculated for 3 well-characterised peaks, using various settings of the smoothing parameter and asymmetry. CV values are of the same order of magnitude as typical metabolomic experimental variation.
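For reference, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below shows the arithmetic; the five ratio values are invented for illustration and are not the measurements behind the slide.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation: sample standard deviation as % of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical peak-area ratios from five baseline-correction settings
ratios = [0.45, 0.52, 0.50, 0.47, 0.51]
mean_ratio = statistics.mean(ratios)
cv = cv_percent(ratios)
```

Applied to the rounded summary values for ratio 1/2, 100 × 0.06 / 0.49 gives about 12.2, in line with the reported 12.04 once rounding of the mean and sd is taken into account.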

11 Mass detection
The mass detection stage detects features in the mass domain. The aim is to detect features and avoid noise; preview options assist with choosing parameters. MZmine then extracts single ion chromatograms.
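As a rough illustration of what a noise-level parameter does, the sketch below keeps local maxima of a profile spectrum that exceed a threshold. It is a simplified stand-in, not one of MZmine's mass detectors; `detect_masses` and the example spectrum are hypothetical.

```python
def detect_masses(spectrum, noise_level):
    """Return (mz, intensity) for local maxima at or above noise_level.

    spectrum: list of (mz, intensity) pairs sorted by m/z (profile data).
    """
    features = []
    for i in range(1, len(spectrum) - 1):
        mz, inten = spectrum[i]
        if (inten >= noise_level
                and inten > spectrum[i - 1][1]
                and inten >= spectrum[i + 1][1]):
            features.append((mz, inten))
    return features


# Hypothetical profile spectrum: one real ion near m/z 100 and a small bump
spectrum = [
    (99.98, 2), (99.99, 40), (100.00, 120), (100.01, 35),
    (100.02, 3), (100.50, 8), (100.51, 4), (101.00, 1),
]
strict = detect_masses(spectrum, noise_level=20)
loose = detect_masses(spectrum, noise_level=5)
```

With the higher threshold only the ion at m/z 100.00 is kept; lowering it also admits the small bump at m/z 100.50, which is the trade-off the preview is meant to help with.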

12 Peak detection
MZmine splits ion chromatograms into individual peaks. Problems: peak recognition is rarely perfect; split peaks; noisy features. 2 new features: maximum peak width (simple but effective); implemented the centWave module from XCMS.
Ralf Tautenhahn, Christoph Böttcher, and Steffen Neumann, "Highly sensitive feature detection for high resolution LC/MS", BMC Bioinformatics 2008, 9:504.
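Why a maximum peak width is "simple but effective" can be shown with a threshold-based picker that rejects regions that are too broad to be chromatographic peaks. This is a sketch only, not MZmine's or centWave's algorithm; `pick_peaks`, its parameters and the example trace are assumptions.

```python
def pick_peaks(eic, baseline_level, min_height, max_width):
    """eic: list of (rt, intensity). Returns accepted peaks as dicts."""
    peaks, region = [], []
    # The sentinel point is never above the baseline, so it closes the last region
    for rt, inten in eic + [(None, float("-inf"))]:
        if inten > baseline_level:
            region.append((rt, inten))
            continue
        if region:
            width = region[-1][0] - region[0][0]
            apex_rt, apex = max(region, key=lambda point: point[1])
            # Reject low regions and regions wider than a real peak should be
            if apex >= min_height and width <= max_width:
                peaks.append({"start": region[0][0], "apex": apex_rt,
                              "end": region[-1][0], "height": apex})
            region = []
    return peaks


# Hypothetical trace: a sharp peak followed by a broad, flat hump (e.g. bleed)
intensities = [0, 0, 5, 50, 120, 40, 4, 0,
               30, 31, 29, 30, 32, 30, 31, 30, 29, 30, 0]
eic = [(i / 10, float(v)) for i, v in enumerate(intensities)]
peaks = pick_peaks(eic, baseline_level=1.0, min_height=20.0, max_width=0.5)
```

The broad hump passes the height test but is discarded by the width limit, leaving only the sharp peak.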

13 UPLC-MS Peak Identification
In-house library generated by UPLC: positive / negative ion, polar / apolar methods. Retention time and m/z values are stored as a .CSV file. Common adducts are added to the database (e.g. [M+Na]). Custom database search in MZmine with RT and m/z tolerances. We also experimented with RT modelling, in combination with accurate mass, to predict compounds not yet run. This is highly approximate and is used for scouting for suspects to run next in the UPLC-RT library.
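A custom database search of this kind amounts to a two-tolerance lookup. The sketch below assumes a three-column CSV layout (`name,mz,rt`) and made-up entries; the real library format, compound names and tolerances are not given in the slides.

```python
import csv
import io

# Hypothetical library contents (name, m/z, retention time in minutes)
LIBRARY_CSV = """name,mz,rt
compound_A,181.0707,2.35
compound_B,203.0526,2.35
compound_C,342.1162,5.80
"""


def load_library(text):
    """Parse the CSV text into (name, mz, rt) tuples."""
    return [(row["name"], float(row["mz"]), float(row["rt"]))
            for row in csv.DictReader(io.StringIO(text))]


def search(library, mz, rt, mz_tol=0.005, rt_tol=0.1):
    """Names of library entries within both the m/z and the RT tolerance."""
    return [name for name, lib_mz, lib_rt in library
            if abs(lib_mz - mz) <= mz_tol and abs(lib_rt - rt) <= rt_tol]


library = load_library(LIBRARY_CSV)
hit = search(library, mz=181.0710, rt=2.40)
no_hit = search(library, mz=342.1160, rt=4.00)
```

The second query matches on mass but not on retention time, so it returns nothing; requiring both is what keeps the number of false hits manageable.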

14 Extending MZmine for GC data
GC-MS data is similar to UPLC data from a chromatographic viewpoint: sharpness of peaks is similar, and MZmine handles the data well. Rather longer runs need more RAM, and/or collect in centroid mode. GC-MS data is fragmented and searchable: NIST MS Library search.

15 GC Data I - NIST Search
First implementation: send a simple time slice to NIST MS Search (RT tolerance); limit the number of peaks sent; forward and reverse match factor; all matching peaks get labelled with the ID.

16 GC Data II - Deconvolution
Experimenting with the CAMERA* algorithm from XCMS. The CAMERA algorithm correlates peak shapes and clusters features to form "pseudo spectra". This enables deconvolution of closely overlapping peaks with differing peak shapes. The pseudo spectra generated are sent to NIST.
* Carsten Kuhl, Ralf Tautenhahn, Steffen Neumann, IPB Halle
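The idea of clustering features by peak-shape correlation can be sketched as follows. This is a greedy toy version using Pearson correlation, not the CAMERA implementation; the function names, the `min_corr` threshold and the intensity profiles are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_features(profiles, min_corr=0.9):
    """profiles: {mz: [intensity per scan]} -> list of pseudo spectra (m/z lists)."""
    groups = []
    for mz, profile in profiles.items():
        for group in groups:
            # Compare against the first member of each existing group
            if pearson(profiles[group[0]], profile) >= min_corr:
                group.append(mz)
                break
        else:
            groups.append([mz])
    return groups


# Hypothetical fragments of two compounds eluting two scans apart
profiles = {
    73.0: [0, 10, 50, 100, 50, 10, 0, 0],
    147.0: [0, 5, 25, 50, 25, 5, 0, 0],
    91.0: [0, 0, 0, 10, 50, 100, 50, 10],
    65.0: [0, 0, 0, 4, 20, 40, 20, 4],
}
pseudo_spectra = group_features(profiles)
```

Fragments that rise and fall together end up in the same pseudo spectrum even though the two compounds overlap in time, which is what makes the subsequent library search cleaner.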

17 NIST search of Pseudo spectra
"Must have same identities" enables the selection of pseudo spectra. NIST identities are then overwritten onto the peak list identities. Currently under development!

18 2D and 3D plotting enhancements
3D plot: an intensity slider has been added, along with the ability to change font and font colour. 2D plot: with many overlaid chromatograms the legend would obscure the plots, so a new icon now toggles the legend. Peak annotations from a peak list may now be overlaid on TIC plots.

19 Identification Improvements
The adduct list is now customisable, with import/export as a CSV file. Online ChemSpider search (requires a ChemSpider security token).
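What an adduct list contributes to identification can be shown with a short calculation of expected m/z values for singly charged adducts. The CSV layout below is hypothetical and is not MZmine's export format; the mass shifts are the usual monoisotopic values (proton 1.007276, sodium minus an electron 22.989218).

```python
import csv
import io

# Hypothetical adduct list; mass shifts in Da for singly charged ions
ADDUCTS_CSV = """adduct,mass_shift
[M+H]+,1.007276
[M+Na]+,22.989218
[M-H]-,-1.007276
"""


def load_adducts(text):
    """Parse the CSV text into {adduct name: mass shift}."""
    return {row["adduct"]: float(row["mass_shift"])
            for row in csv.DictReader(io.StringIO(text))}


def adduct_mz(neutral_mass, adducts):
    """Expected m/z of each singly charged adduct of a neutral monoisotopic mass."""
    return {name: round(neutral_mass + shift, 4)
            for name, shift in adducts.items()}


# Glucose, C6H12O6, monoisotopic mass 180.06339
mzs = adduct_mz(180.06339, load_adducts(ADDUCTS_CSV))
```

Editing the CSV is then enough to make the search consider, or ignore, a particular adduct.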

20 Batch mode analysis
Batch mode allows you to concatenate analysis steps into a batch. It allows unattended operation (e.g. overnight). Batch scripts can now be loaded and saved as XML files.

21 Design of Experiments (DoE) approach to testing
How to optimise the many parameters in MZmine? Design of Experiments is useful for testing software and the sensitivity of parameters. It requires some sensible measure of peak picking quality. Trygg, Johansson et al. have used a dilution series of a pooled sample: the r² of a linear regression of peak response against concentration may be used to distinguish 'good' from 'bad' peaks. Some DoE has already been used in MZmine, and more is planned. Here the time taken to do baseline correction is studied using a CCC design.
Mattias Eliasson, Stefan Rännar, Rasmus Madsen, Erik Johansson, Emma Marsden-Edwards, John P. Shockcor and Johan Trygg, "A Strategy for Optimizing UPLC-MS Data-Processing Parameters - a DoE approach", submitted to Analytical Chemistry.
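The dilution-series quality measure reduces to computing r² for each peak across the series. A minimal sketch, with invented dilution factors and peak areas; any cut-off used to separate 'good' from 'bad' peaks would be a choice of the analyst, not something stated in the slides.

```python
def r_squared(x, y):
    """Squared Pearson correlation, i.e. r² of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical dilution series of a pooled sample (relative concentration)
concentration = [1.0, 0.5, 0.25, 0.125]
good_peak = [1000.0, 510.0, 240.0, 130.0]   # area tracks concentration
bad_peak = [300.0, 900.0, 100.0, 700.0]     # area unrelated to concentration

r2_good = r_squared(concentration, good_peak)
r2_bad = r_squared(concentration, bad_peak)
```

Counting the peaks with a high r² gives a single response that a designed experiment can optimise over the processing parameters.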

22 Summary of Our Identification Process
UPLC-MS: search against the custom accurate mass / retention time library; search against the accurate mass library and predicted RT from the QSRM model; search individual peaks against online databases using isotope pattern matching; a great deal of hand curation! GC-MS: NIST MS Search; hand curation (still painful). "Gotchas": a full search of public databases brings up many unsuitable hits. Gap filling averages accurate mass across samples, which can prevent peak recognition, so search the custom database before gap filling.

23 The Future
Tomáš Pluskal has recently been discussing version 3 of MZmine with the community. The MZmine GUI is seen as an attractive feature to be retained. The data model may be extended to allow GCxGC-MS and MSe data. Possible incorporation of the Guineu software (Sandra Castillo). Stephan Beisken (EBI) has adapted some MZmine code into KNIME nodes. For us this is a very attractive combination: workflow tools are ideal for multistep data processing; they give transparency and reproducibility advantages and audit trail capability; and they integrate with downstream processing already done in KNIME.
T. Pluskal, T. Uehara, M. Yanagida, Highly accurate chemical formula prediction tool utilizing high-resolution mass spectra, MS/MS fragmentation, heuristic rules, and isotope pattern matching, Anal Chem (2012).

24 Advantages of open source model
We have found the open source model to be very useful. Rapid development: ability to tailor software to our needs; direct communication with developers; incorporation of cutting-edge tools and algorithms. Cost effective: substantial initial investment but no licence restriction on deployment across many users; finding new MS applications outside metabolomics. Technical support: active MZmine forum. Community: we hope that by donating code we can encourage further development by the community.

25 Thanks and Acknowledgements
MZmine team: Tomáš Pluskal, Matej Orešič, Mikko Katajamaa, Chris Pudney. Syngenta: Mark Seymour, Mark Forster, Martin Cip, Dave Portwood, Aniko Kende. IPB-Halle: Carsten Kuhl, Ralf Tautenhahn, Steffen Neumann. EBI: Christoph Steinbeck, Stephan Beisken, Dominic Clark. Thermo: Madalina Oppermann. Umetrics / Umeå University: Erik Johansson, Johan Trygg.
