Use of retention indices for compound identification in GC/MS analysis

Prof. Tom Wenseleers, Department of Biology, University of Leuven, Belgium

Retention index as orthogonal filter in GC/MS
One advantage of GC/MS compared to LC/MS is the greater reproducibility of the relative elution order and relative retention time of compounds. Retention index (RI) = relative retention time normalised to closely eluting n-alkanes. Powerful orthogonal filter for compound ID!

Example 1: sesquiterpenes
Even if you could do perfect prediction of EI-MS fragments and intensities (CFM-ID), this would still not be enough for compound ID unless combined with RI info. Three C15H24 isomers (RIs from MassFinder 4): cis-muurola-3,5-diene, RI = 1447; germacrene D, RI = 1479; bicyclosesquiphellandrene, RI = 1487.
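A minimal sketch of how an RI window can separate candidates with near-identical spectra, using the three RI values quoted on this slide. The function name `filter_by_ri`, the measured RI of 1480 and the 10-unit window are illustrative assumptions, not part of the original talk.

```python
# RI values as quoted on the slide (MassFinder 4).
CANDIDATES = {
    "cis-muurola-3,5-diene": 1447,
    "germacrene D": 1479,
    "bicyclosesquiphellandrene": 1487,
}

def filter_by_ri(measured_ri, candidates, window=10.0):
    """Keep candidates whose library RI lies within +/- window of the
    measured RI and return (name, |delta RI|) pairs, closest first."""
    hits = [
        (name, abs(library_ri - measured_ri))
        for name, library_ri in candidates.items()
        if abs(library_ri - measured_ri) <= window
    ]
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical measured RI; the window value is an assumption.
for name, delta in filter_by_ri(1480, CANDIDATES):
    print(f"{name}: delta RI = {delta}")
```

With these assumed numbers, cis-muurola-3,5-diene is excluded and germacrene D ranks first; a narrower window would also exclude bicyclosesquiphellandrene.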

RI as orthogonal filter in GC/MS
NIST MS Search, 2014 version: ca. 85% of all hits with match factors > 800 can be removed if the retention index is used as an orthogonal filter in the MS search with an RI tolerance window (Babushok 2015). An RI mismatch can also be applied as a penalty in the matching score. This is better than a hard RI tolerance window, especially when used with predicted RIs. The NIST2014 MS library includes median experimentally measured RIs and, for some compounds, estimated RIs (S. Stein's 2007 group contribution method), e.g.:
$ESTIMATED_KOVATS_RI 703
$RI_ESTIMATION_ERROR 89
$EXPERIMENTAL_RI_MEDIAN/DEVIATION/#DATA "SemiStdNP=805/0/1 StdPolar=1376/0/1"
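The difference between a hard window and a score penalty can be sketched as follows. This is not the formula used by NIST MS Search; the linear penalty, the function name `penalized_score` and all parameter values are assumptions chosen only to illustrate the idea.

```python
def penalized_score(match_factor, ri_measured, ri_library,
                    tolerance=10.0, penalty_per_unit=2.0, max_penalty=200.0):
    """Subtract a penalty from the spectral match factor that grows linearly
    once |delta RI| exceeds the tolerance (assumed functional form)."""
    excess = max(0.0, abs(ri_measured - ri_library) - tolerance)
    return match_factor - min(max_penalty, penalty_per_unit * excess)

# Two hypothetical hits with the same spectral match factor of 900.
print(penalized_score(900, 1480, 1479))  # RI within tolerance, no penalty
print(penalized_score(900, 1480, 1447))  # 33 units off, score is reduced

# For a predicted library RI, the tolerance could be widened to the stated
# estimation error (89 in the example record above), so that an uncertain
# prediction is not punished as hard as a measured value would be.
print(penalized_score(900, 760, 703, tolerance=89.0))
```

Unlike a hard window, a mismatching hit is demoted rather than discarded, which leaves room for a strong spectral match to survive an imprecise RI.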

Example 2: insect hydrocarbons
Insect hydrocarbons: protective coating + pheromones, e.g. sex pheromones, species and gender recognition cues, primer pheromones. High number of compounds with highly similar mass spectra (e.g. > 2,000 compounds identified in Hymenopteran insects). Standard library search fails in 99% of the cases. Often coelution of multiple compounds, and deconvolution doesn't always work. Example shown: mix of 11,15,23- and 13,17,23-trimethylpentatriacontane.

Example 2: insect hydrocarbons
237 homologous biosynthetic series (Kather & Martin 2015). Huge number of possible isomers, but a restricted set of homologous biosynthetic series. And RIs can be very accurately predicted for such defined, simple compounds. RI model based on a neural net with 36 geometrical descriptors.

Predicted RIs as orthogonal filter
RI model applied to 10,000 combinatorially produced methylalkanes that form part of observed homologous series. Custom mass spectral identifier based on comparison of spectra to the equivalent straight-chain alkane to get diagnostic ions, and matching to possible candidates based on RI and elution order constraints. Correctly predicts the identity of methylalkanes, including mixes, in ca. 90% of all cases and was 100% accurate for 15 synthesized mono-, di-, tri- & tetramethylalkanes. More spectacular applications should be possible in combination with simulation of EI spectra! Example shown: 5-methyltricosane, measured RI vs. predicted RI (values not given).

Predicted RIs as orthogonal filter
For library compounds with a measured EI spectrum but no measured RI, RI prediction has much potential as an orthogonal filter. Also able to restrict possible hits with in-silico predicted EI mass spectra (CFM-ID, QCEIMS). But: more accurate RI models needed. Mean absolute error of S. Stein's group contribution method (Stein 2007) = 87 RI units, which is too much! Also better, biologically plausible combinatorial libraries needed (BioTransformer or hand-crafted?). Working on it! (Jalali-Heravi & Fatemi 2000)

Couple of issues
Although the potential is clear, there are still some open issues...

Issue 1: Traditional formulae no good
Traditional formulae:
Kovats RI for isothermal programs (interpolation on log RT scale): I_x = 100n + 100 [log(t_x) − log(t_n)] / [log(t_(n+1)) − log(t_n)]
Linear retention index (Van den Dool and Kratz) for temperature-programmed runs (interpolation on linear RT scale): I_x = 100n + 100 (t_x − t_n) / (t_(n+1) − t_n)
where t_n and t_(n+1) are the retention times of the reference n-alkanes (carbon numbers n and n+1) eluting immediately before and after compound X, and t_x is the retention time of compound X.
But: traditional formulae are only used for convenience, not accuracy.
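The two traditional formulae can be written out as a short sketch. The function names, the dictionary layout (carbon number mapped to retention time) and the example retention times are assumptions for illustration. The code follows the formulae exactly as given on the slide; note that the Kovats index is normally computed on adjusted retention times (hold-up time subtracted), which is not done here.

```python
import math
from bisect import bisect_right

def _bracket(t_x, alkane_rts):
    """Return (n, t_n, t_n1) for the consecutive n-alkanes bracketing t_x.
    alkane_rts maps carbon number -> retention time, increasing with n."""
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    i = bisect_right(times, t_x) - 1
    if t_x == times[-1]:
        i = len(times) - 2          # last alkane: use the final interval
    if i < 0 or i > len(times) - 2:
        raise ValueError("retention time outside the alkane ladder")
    if carbons[i + 1] != carbons[i] + 1:
        raise ValueError("bracketing alkanes must be consecutive (n, n+1)")
    return carbons[i], times[i], times[i + 1]

def kovats_ri(t_x, alkane_rts):
    """Kovats RI: interpolation on the log retention time scale."""
    n, t_n, t_n1 = _bracket(t_x, alkane_rts)
    return 100 * n + 100 * (math.log(t_x) - math.log(t_n)) / (math.log(t_n1) - math.log(t_n))

def linear_ri(t_x, alkane_rts):
    """Van den Dool and Kratz linear RI: interpolation on the linear scale."""
    n, t_n, t_n1 = _bracket(t_x, alkane_rts)
    return 100 * n + 100 * (t_x - t_n) / (t_n1 - t_n)

# Made-up retention times (min) for C14, C15 and C16.
ladder = {14: 10.0, 15: 12.0, 16: 14.5}
print(linear_ri(11.0, ladder))   # 1450.0
print(kovats_ri(11.0, ladder))   # slightly higher than the linear value
```

The two values differ by a couple of units for the same peak, which shows why the formula used should be recorded alongside any published RI.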

Much better: cubic spline interpolation
Traditional formulae are no good because the relationship is neither linear nor log-linear. Much better to use interpolating cubic splines (Halang et al. 1978, Girard 1996, Messadi et al. 1990). More accurate & less dependent on the specific temperature program used, with better inter-machine correspondence (< 1-2 units, compared with larger deviations for traditional LRI). Deviation between LRIs and cubic spline RIs is up to 37 units for the Adams library. My GC/MS package that's in development (GCMSPrime) can automatically recognize RI markers and calculate cubic spline RIs. In R: smooth.spline()
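A minimal sketch of cubic spline RIs, assuming a natural interpolating cubic spline fitted through the points (alkane retention time, 100 × carbon number). This is not the GCMSPrime implementation, and it is an interpolating spline rather than the smoothing spline that R's smooth.spline() fits; the function names and retention times are made up for illustration.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Fit a natural cubic spline through (xs, ys); xs strictly increasing.
    Returns a function that evaluates the spline inside [xs[0], xs[-1]]."""
    n = len(xs) - 1
    if n < 2:
        raise ValueError("need at least three knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    if any(step <= 0 for step in h):
        raise ValueError("xs must be strictly increasing")
    # Right-hand side of the tridiagonal system for the second derivatives.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward elimination (natural boundary: zero curvature at both ends).
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the polynomial coefficients of each interval.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x outside the fitted range (no extrapolation)")
        j = min(bisect_right(xs, x) - 1, n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate

def spline_ri_function(alkane_rts):
    """alkane_rts maps carbon number -> retention time of that n-alkane."""
    carbons = sorted(alkane_rts)
    return natural_cubic_spline([alkane_rts[c] for c in carbons],
                                [100.0 * c for c in carbons])

# Made-up retention times (min) for C14 to C17.
ri_of = spline_ri_function({14: 10.0, 15: 12.0, 16: 14.5, 17: 17.5})
print(round(ri_of(11.0), 1))
```

Because the spline uses the whole alkane ladder rather than only the two bracketing alkanes, it follows the curvature of the retention time versus carbon number relationship instead of assuming straight segments.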

Issue 2: Libraries not integrated & not open
NIST 2014 GC RI library / NIST Webbook: 82,868 compounds (Webbook free, with detailed column & temperature programme info & source; indexed in simplified form in PubChem and in extensive but ugly form in ChemSpider)
Pherobase (insect pheromones & plant metabolites, 13,000 compounds, with source & column info, free)
Adams RI+MS terpenoids library (2,200 compounds, DB-5)
LRI and odour database (4,400 compounds, with column info & source, free)
König terpenoids library (2,000 compounds, DB-1, free)
Flavornet (700 compounds, DB-1, DB-5, DB-WAX, source, free)
Rouseff citrus database (500 compounds, DB-5, DB-WAX, free)
FiehnLib (1,200 compounds, DB-5ms, commercial)
None of these has a web API... Sometimes chemical structure information is not given or is unreliable. Many (ca. 1/5 to 1/3) of the compounds represented in these RI libraries do not have an EI mass spectrum available in NIST or Wiley. NOTE: check overlap Pherobase vs. NIST 2014.

Some in Russian ?!

Will make R code available to access these...
Panels shown: PubChem, Pherobase, DB-1, DB-5, DB-WAX, NIST Webbook.

Issue 3: Data quality
Some databases, like the Adams terpenoid library, are high quality, with primary data recorded under uniform GC conditions. For others, quality varies. E.g. for many studies included in the NIST2014 RI DB it is unclear which formula was used; these are put down as "Normal alkane RI". Wide variety of column phases (287 in NIST), column diameters, phase thicknesses & temperature programmes used. None used cubic spline interpolation. Different attempts at curation (e.g. iMatch2, Koo et al. 2014). Example shown: -pinene, NIST Webbook, DB-5 or equivalent.

Issue 4: Which standards to use?
Original scale used n-alkanes, e.g. a C7-C40 alkane ladder. Automated identification of linear alkanes is tricky due to high match factors with other aliphatic compounds (but possible). For this reason, some labs (e.g. Fiehn lab, cf. vocBinBase) opt instead to use C4:C24 even-chain FAMEs, with fixed RI values for the C4 and C24 FAMEs (values not given) (Skogerson et al. 2011). Also Lee retention indices (Lee et al. 1979), which use PAHs: benzene, naphthalene, phenanthrene, chrysene and picene (RI = 200 to 600; equivalent alkane RI not given), but too few compounds to be accurate & the scale doesn't go far enough. Nice boiling point correlation though. It is confusing to have several scales around. I would at least suggest always providing RIs on a linear alkane scale; conversion is easy for a particular column type. Figure: vocBinBase FAMEs vs. linear alkanes.

Perspectives
Ideally could include both newly measured, highly accurate data, plus perhaps consensus estimates of old published RI values for the 3 main types of columns (DB-5, DB-1 & DB-WAX, cf. PubChem), and predicted values.
Standardized fields necessary for column & temperature programme meta-info! Guidelines?
A separate but relationally linked RI database would also be possible.
Should people be asked to also upload their internal or external standard spectra, including retention times? This would allow MoNA to do its own calculation of the retention index (otherwise people might use less ideal formulae).
Quality flag for RI data (Gold = new data, Bronze = old literature data, Predicted = QSPR predicted) and estimated accuracy, both for old literature data and new experimentally measured data.
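One way such quality-flagged records might look is sketched below. The class and field names are hypothetical, not a MoNA schema, and using the median with the median absolute deviation as the "consensus estimate" and its accuracy is an assumption (the NIST library quoted earlier also reports medians). The literature values are made up.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import median
from typing import Optional

class Quality(Enum):
    GOLD = "new data"
    BRONZE = "old literature data"
    PREDICTED = "QSPR predicted"

@dataclass
class RIRecord:
    compound: str
    column_class: str                 # e.g. "Semi-standard non-polar"
    ri: float
    quality: Quality
    estimated_error: Optional[float] = None

def consensus_record(compound, column_class, literature_ris):
    """Collapse several published RIs into one Bronze record: median value,
    with the median absolute deviation as a rough accuracy estimate."""
    centre = median(literature_ris)
    spread = median(abs(value - centre) for value in literature_ris)
    return RIRecord(compound, column_class, centre, Quality.BRONZE, spread)

# Made-up literature values for a hypothetical compound.
record = consensus_record("compound A", "Semi-standard non-polar",
                          [1478, 1480, 1481, 1485, 1502])
print(record)
```

The median keeps the single outlying value (1502) from shifting the consensus, which matters when old literature data were computed with different formulae.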

In the NIST RI lib they use the following fields in their SDF file:
<KOVATS INDEX> 1207 # for experimentally measured RIs
<COLUMN TYPE> Capillary / Packed (packed is of little use as it is too inaccurate)
<COLUMN CLASS> Semi-standard non-polar (for DB-5) / Standard non-polar (for DB-1) / Standard polar (DB-WAX)
<ACTIVE PHASE> DB-5
<COLUMN LENGTH> 30 m
<CARRIER GAS> He
<COLUMN DIAMETER> 0.2 mm
<PHASE THICKNESS> 0.25 um
<DATA TYPE> Linear RI / Kovats RI / Lee RI / Normal alkane RI / Lee RI value specified by scale definition / Normal alkane RI value specified by scale definition / cubicspline (Lee can probably be ignored as it's pretty useless and inaccurate)
<PROGRAM TYPE> Ramp / Isothermal / Complex
<START T> 50 C
<END T> 250 C
<HEAT RATE> 5 K/min
<START TIME> 3 min
<END TIME> 15 min
<ESTIMATED KOVATS RI> XXX # for QSPR-predicted RI value
<RI ESTIMATION ERROR> XXX # for QSPR-predicted RI value

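Fields like those listed above can be read with a few lines of code. The sketch assumes the usual SDF data-item layout (a `> <FIELD NAME>` header line, the value on the following line or lines, then a blank line); the actual layout of the NIST file may differ, and the record below is an invented fragment using the example values from the slide.

```python
import re

# Header line of an SDF data item, e.g. "> <KOVATS INDEX>".
SDF_HEADER = re.compile(r"^>\s*<([^>]+)>\s*$")

def parse_sdf_fields(record_text):
    """Return {field name: value} for the data items of one SDF record."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        header = SDF_HEADER.match(line)
        if header:
            current = header.group(1).strip()
            fields[current] = []
        elif line.strip() in ("", "$$$$"):
            current = None              # blank line or record end closes the item
        elif current is not None:
            fields[current].append(line.strip())
    return {name: "\n".join(values) for name, values in fields.items()}

# Invented record fragment (molecule block omitted).
record = """> <KOVATS INDEX>
1207

> <COLUMN CLASS>
Semi-standard non-polar

> <DATA TYPE>
Linear RI

$$$$
"""
meta = parse_sdf_fields(record)
print(float(meta["KOVATS INDEX"]), meta["COLUMN CLASS"], meta["DATA TYPE"])
```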
File formats & databases
But: no set standard for including RI information in mass spectral data files (MSP, JCAMP, SDF). Current approach: include meta-info in comment fields of MSP files to keep them backward compatible with older software. Lib2NIST can retain meta-information; I can give you the guidelines to make sure that RI is preserved during library conversion to NIST format...
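As an illustration of RI meta-info carried as plain text, the sketch below parses a value of the form quoted earlier in this talk for the NIST2014 library ("SemiStdNP=805/0/1 StdPolar=1376/0/1", i.e. median/deviation/number of data points per column class). The function name is hypothetical, and the assumption that every token has exactly this shape is mine.

```python
def parse_experimental_ri(value):
    """Parse 'Class=median/deviation/n ...' into
    {column class: (median RI, deviation, number of data points)}."""
    parsed = {}
    for token in value.strip().strip('"').split():
        column_class, _, numbers = token.partition("=")
        ri_median, deviation, n_data = numbers.split("/")
        parsed[column_class] = (float(ri_median), float(deviation), int(n_data))
    return parsed

ri_info = parse_experimental_ri('"SemiStdNP=805/0/1 StdPolar=1376/0/1"')
print(ri_info["SemiStdNP"])   # (805.0, 0.0, 1)
print(ri_info["StdPolar"])    # (1376.0, 0.0, 1)
```

Keeping the values in a simple key=value text form is what allows older software to ignore them while newer tools parse them.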

Perspectives
Leverage of using retention indices for compound ID in GC/MS would be especially strong if it could be combined with other orthogonal filters:
parallel chemical ionization run to get masses of molecular ions & isotope abundances, ideally with high mass accuracy to enable formula ID (or accurate mass recalibration, cf. Cerno MassWorks, who claim ca. 10 ppm mass accuracy on single quadrupole machines; open source implementation desirable). Should be relationally linked to EI spectra in MoNA.
restricting possible formulas based on compound libraries and the 7 golden rules (Kind & Fiehn 2007)
libraries with in-silico predicted EI spectra (CFM-ID / QCEIMS / regression models for simple compounds) and QSPR-predicted RIs
compound libraries: ideal are natural product libraries or combinatorial libraries that respect observed homologous biosynthetic series (metabolite prediction methods needed)
Open databases to collect RI information in a standardized way, and standardized file format specs allowing for RI info, are desirable, as well as widely available software to calculate RIs in a high-quality way.
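To give a feel for what ca. 10 ppm mass accuracy buys as a formula filter, here is a small sketch. It uses neutral monoisotopic masses for simplicity (the electron mass is ignored); the second candidate formula, the measured mass and the function names are assumptions chosen for illustration.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}

def neutral_mass(counts):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())

def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def formulas_within(measured, candidates, tol_ppm=10.0):
    """Names of candidate formulas whose mass lies within tol_ppm."""
    return [name for name, counts in candidates.items()
            if abs(ppm_error(measured, neutral_mass(counts))) <= tol_ppm]

# Two formulas with the same nominal mass of 204.
candidates = {
    "C15H24": {"C": 15, "H": 24},
    "C14H20O": {"C": 14, "H": 20, "O": 1},
}
measured = 204.1885   # hypothetical recalibrated molecular ion mass
print(formulas_within(measured, candidates))   # ['C15H24']
```

The two formulas differ by roughly 180 ppm, so a 10 ppm window separates them easily; the RI filter then has to distinguish only between isomers of the surviving formula.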
