New algorithms for high-resolution metabolomics
A case study on trypanosome parasites
Rainer Breitling – Groningen Bioinformatics Centre, University of Groningen
Michael P. Barrett – Infection & Immunity Division, University of Glasgow
Breitling et al., Ab initio prediction of metabolomic networks using FT-ICR MS, Metabolomics, 2006, 2:155
Breitling et al., Precision mapping of the metabolome, Trends in Biotechnology, 2006, 24:543
The biological context – trypanosomiasis
- Sleeping sickness is a major health problem in tropical Africa.
- Current drugs are becoming ineffective and are a health risk themselves (they kill up to 10% of patients instead of curing them).
- Metabolite profiling in drug-treated and mutant parasites may identify new drug targets.
- First pilot study: compare the metabolome of in vivo and in vitro parasites to the composition of the media, to identify metabolite scavenging.
To test the technology, a simple experiment was done on the pathogen that causes sleeping sickness (historically the most important disease of tropical Africa, because it limits successful animal husbandry, and still a major cause of human morbidity and mortality). The drugs are very old, resistance is developing, and the side effects are severe, so new drug targets are being sought. We wanted to find the metabolites that the parasite takes up (scavenges) from the medium, and also those that are produced specifically by the parasite. Both should lead to interesting drug targets: transporters in the first case, enzymes in the second.
FT-ICR mass spectrometry
- High-resolution mass spectrometry: detection and separation take place in an ion trap.
- Ions cycle in a magnetic field and are excited by radio-frequency pulses. The resonance frequency depends on the mass/charge ratio of an ion.
- The detected signal is a superposition of the signals from all ions. It is deconvoluted by Fourier transformation, and the result corresponds directly to a mass spectrum.
- The stronger the magnetic field, the better the resolution (similar to NMR); in principle it is almost unlimited.
- Result: measurement of very small mass differences at very high accuracy in complex mixtures of biomolecules.
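The dependence of the resonance frequency on mass/charge can be illustrated with the textbook cyclotron relation f = z·e·B / (2π·m). The following is only a minimal sketch: the 7 T field strength is an assumed value (the slides do not state the magnet used), the function name is hypothetical, and the neutral masses from the standards table are used directly, ignoring adducts and the electron mass.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per unified atomic mass unit


def cyclotron_frequency_hz(mass_da, charge=1, field_tesla=7.0):
    """Unperturbed cyclotron frequency f = z*e*B / (2*pi*m), in hertz.

    field_tesla=7.0 is an assumed magnet strength, not a value from the slides.
    """
    return charge * ELEMENTARY_CHARGE * field_tesla / (2 * math.pi * mass_da * DALTON)


# Masses of glutathione and oxidized glutathione from the standards table.
f_gsh = cyclotron_frequency_hz(307.083807)
f_gssg = cyclotron_frequency_hz(612.152)
print(f"glutathione: {f_gsh / 1e3:.1f} kHz")
print(f"oxidized glutathione: {f_gssg / 1e3:.1f} kHz")
```

Heavier ions cycle more slowly, and doubling the field doubles every frequency, which is why a stronger magnet spreads neighbouring masses further apart in the frequency domain.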
The advantage of high resolution
- The chemical composition of a metabolite can be estimated.
- Exact identification by mass may be possible (within limits).
Example: CH6N2 (methylhydrazine), Mw = 46.0718, versus C2H6O (ethanol), Mw = 46.0684.
Only a limited number of molecular formulae can explain a given exact mass within acceptable limits of accuracy. The complexity of this task increases rapidly with the total mass (an interesting computational challenge). And of course, mass alone cannot discriminate between compounds with the same formula but different connectivity.
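The idea that only a few molecular formulae fit a given exact mass can be sketched with a brute-force search. This is an illustrative sketch, not the method used in the cited papers: it is restricted to C, H, N and O, applies no chemical valence rules, and uses monoisotopic masses (the quantity a mass spectrometer measures), whereas the Mw values quoted on the slide are average molecular weights. All function names are hypothetical.

```python
# Monoisotopic masses of the most abundant isotopes (standard values).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def monoisotopic_mass(counts):
    """Mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in counts.items())


def formula_string(counts):
    """Render {element: count} as a formula such as 'C2H6O'."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in counts.items() if n > 0)


def candidate_formulae(target_mass, ppm=5.0):
    """Brute-force all CcHhNnOo compositions within +/- ppm of target_mass."""
    tol = target_mass * ppm * 1e-6
    hits = []
    for c in range(int(target_mass // MONO["C"]) + 1):
        for n in range(int(target_mass // MONO["N"]) + 1):
            for o in range(int(target_mass // MONO["O"]) + 1):
                rest = target_mass - monoisotopic_mass({"C": c, "N": n, "O": o})
                if rest < -tol:
                    break  # adding more oxygen only overshoots further
                h = round(rest / MONO["H"])  # fill the remainder with hydrogen
                counts = {"C": c, "H": h, "N": n, "O": o}
                if h >= 0 and abs(monoisotopic_mass(counts) - target_mass) <= tol:
                    hits.append(formula_string(counts))
    return hits


ethanol = monoisotopic_mass({"C": 2, "H": 6, "O": 1})
tight = candidate_formulae(ethanol, ppm=5.0)
loose = candidate_formulae(ethanol, ppm=300.0)
print("5 ppm:", tight)
print("300 ppm:", loose)
```

At a tight tolerance ethanol is separated from methylhydrazine; at a loose tolerance both compositions (and others) become candidates. The triple loop also shows why the search space grows quickly with total mass.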
High accuracy confirmed by standards
Compound                 Predicted mass   Measured mass   ppm       Average S/N
glutathione              307.083807       307.0835        1         438
oxidized glutathione     612.152          612.1516                  328
trypanothione            723.3044         723.3036                  16
oxidized trypanothione   721.2887         721.2889                  281
NADP                     743.075458       743.0766        2         442
NAD                      663.109125       663.1096                  1229
ATP                      506.99575        506.9945                  289
ADP                      427.029418       427.0293                  118
AMP                      347.063086       347.0633                  14
berenil                  281.138894       281.139                   9
pentamidine              340.1899         340.1897                  67
DB75                     304.132411       304.1325                  115
melarsen oxide           292.00538        292.0053                  113
spermine                 202.215747       [202.1721]      216       -
spermidine               145.157898       [141.1402]      28466
putrescine               88.100048        [100]           119000
ornithine                132.089878       [128.0479]      31566
Strong signal-to-noise ratios (S/N) are detected for most compounds in the standard mixture, but some of them (the polyamines) are not detectable at all. Those that are found are measured accurately to the third decimal place.
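The ppm column can be reproduced from the predicted and measured masses. A minimal sketch, assuming the error is expressed relative to the predicted mass (the slide does not say which mass is used as the reference); the function name is hypothetical.

```python
def ppm_error(measured, predicted):
    """Signed mass error in parts per million, relative to the predicted mass."""
    return (measured - predicted) / predicted * 1e6


# (predicted, measured) pairs taken from the standards table.
standards = {
    "glutathione": (307.083807, 307.0835),
    "NADP": (743.075458, 743.0766),
    "spermidine": (145.157898, 141.1402),
}

for name, (predicted, measured) in standards.items():
    print(f"{name}: {ppm_error(measured, predicted):.1f} ppm")
```

Errors of a few ppm indicate a genuine match, while the bracketed polyamine entries are off by thousands of ppm and cannot be the expected compound.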
Overview of experimental results
1251 mass peaks were detected in total in the four sample types (Breitling et al., Metabolomics, 2006, 2:155).
The global results are displayed in the form of a Venn diagram. Usually this is done for 3 sets at most, but there are solutions for 4 sets (as here) and even for 5, although they become increasingly difficult to interpret. The message is that there is (A) a large set of ubiquitous metabolites, (B) a smaller set of parasite-specific metabolites, and (C) some metabolites that are restricted to a single sample type. Also, the in vivo samples are consistently more complex.
Can we use accuracy to get identities?
- Searches against the PubChem database to identify putative molecular identities
- Few useful hits, suggesting that many metabolites are novel
- But some hits reveal interesting clues: many are fatty acid related, and this can be used to guide further, more targeted exploration
The high accuracy limits the number of possible hits tremendously. Usually there is at most a single mass hit (the lists are still quite long, because there are usually many “isoforms” sharing that mass).
Tool: MetabolomeExplorer Classic (Breitling, unpubl.)
Phospholipids of regular structure
Possible variations:
- Length of side chain, in steps of 2C units (+C2H4)
- Degree of unsaturation (-H2)
- Type of headgroup (choline, ethanolamine, glycine…)
- Connection via ester or ether bond (acyl or alkyl lipids)
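Because the variations are regular, the expected masses of a lipid family can be enumerated from a single identified member. The sketch below covers only chain length (+C2H4) and unsaturation (-H2); headgroup and ester/ether variants are left out. The base mass 809.5939 (C38:4) is the one quoted later in the slides; the function name and the range limits are illustrative assumptions.

```python
H = 1.00782503207      # monoisotopic mass of hydrogen
C = 12.0               # monoisotopic mass of carbon
C2H4 = 2 * C + 4 * H   # one chain-elongation step
H2 = 2 * H             # one additional double bond removes H2


def lipid_series(base_mass, base_carbons, base_double_bonds,
                 max_extra_c2=2, max_extra_double_bonds=2):
    """Expected masses for longer and/or more unsaturated relatives of a lipid."""
    series = {}
    for k in range(max_extra_c2 + 1):
        for d in range(max_extra_double_bonds + 1):
            label = f"C{base_carbons + 2 * k}:{base_double_bonds + d}"
            series[label] = base_mass + k * C2H4 - d * H2
    return series


family = lipid_series(809.5939, base_carbons=38, base_double_bonds=4)
for label, mass in family.items():
    print(f"{label}  {mass:.4f}")
```

Each predicted mass can then be looked up in the peak list, so a handful of confident identifications can seed the annotation of a whole lipid class.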
The phospholipid metabolome of trypanosomes Even a small number of good hits allows further exploration.
Do mass differences contain additional information?
[Figure: mass differences from all possible pairwise comparisons, with a cluster of common distances highlighted. Breitling et al., Trends in Biotechnology, 2006, 24:543]
Do mass differences contain additional information?
Real masses                                                          Random masses
Difference     Frequency   Formula            Exact mass             Difference     Frequency
2.015950785    382         H2                 2.015650074            92.7097502     7
21.98312914    326         Na-H               21.98194466            205.304917
1.003209507    284         13C isotope        1.00335484             52.82462466
24.00000115    260         C2                 24                     193.6001474    6
26.01629789    237         C2H2               26.01565007            243.2921378
28.03188991    218         C2H4               28.03130015            254.7535545
4.032019289    197         H4                 4.031300148            6.467240667
1.012596951    164         H2-13C isotope     1.012295234            52.69339973
3.019108784    148         H2+13C isotope     3.019004914            21.98649217
22.99695714    140         C2-13C isotope     22.99664516            22.12482588
TOTAL          25370 (in 2472 clusters of >5)                        115 (+/-22) (in 19 +/- 4 clusters of >5)
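The search for clusters of common distances can be sketched as follows. This is a simplified illustration: it bins all pairwise differences at a fixed width and counts the bins, which is an assumption, since the slides do not describe the actual clustering procedure. The toy peak list is invented for the example.

```python
from collections import Counter
from itertools import combinations


def common_differences(masses, bin_width=0.001, min_count=2):
    """Count pairwise mass differences per bin; keep bins seen at least min_count times."""
    counts = Counter()
    for light, heavy in combinations(sorted(masses), 2):
        counts[round((heavy - light) / bin_width)] += 1
    return {
        round(bin_index * bin_width, 6): n
        for bin_index, n in counts.most_common()
        if n >= min_count
    }


# Toy peak list (invented): two pairs are separated by H2 (2.01565).
peaks = [100.0, 102.01565, 104.0313, 128.0313]
recurring = common_differences(peaks)
print(recurring)
```

With n peaks there are n(n-1)/2 differences, so the 1251 peaks of the pilot study give several hundred thousand comparisons; the same counting applied to randomly placed masses provides the baseline shown in the right-hand columns of the table.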
Biochemically expected transformations
- Not all kinds of mass differences are equally interesting.
- But some are particularly important, because they are expected:
  - (de)hydrogenation
  - (de)amination
  - (de)phosphorylation
  - …and many more (about 100 are really common)
Biochemically expected transformations
Real masses:
Transformation                    Frequency   Formula    Exact mass
hydrogenation/dehydrogenation     284         H2         2.015650074
                                  211         C2H2       26.01565007
ethyl addition (-H2O)             191         C2H4       28.03130015
hydroxylation (-H)                84          O          15.99491464
palmitoylation (-H2O)             57          C16H30O    238.2296658
ketol group (-H2O)                            C2H2O      42.01056471
methanol (-H2O)                   56          CH2        14.01565007
condensation/dehydration          40          H2O        18.01056471
Formic Acid (-H2O)                28          CO         27.99491464
Carboxylation                     25          CO2        43.98982928
TOTAL                             1438
RANDOM masses: Glycine 8; cytosine (-H); Threonine 7; Serine; isoprene addition (-H); primary amine 6; Leucine; carbamoyl P transfer (-H2PO4); TOTAL 271 (+/- 25)
If masses are randomly distributed, their differences are not enriched in interesting transformations (random masses), but in the real data there are many of them. For example, 284 pairs of metabolites differ by a mass of 2.01565 (+/- 1 ppm), corresponding to a hydrogenation/dehydrogenation reaction.
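Matching mass differences against a list of expected transformations can be sketched like this. The exact masses are taken from the table; the tolerance rule (ppm of the heavier mass of each pair) and the example peaks other than glutathione are assumptions made for illustration only.

```python
# A few "textbook" transformations with the exact masses given in the table.
TRANSFORMATIONS = {
    "hydrogenation/dehydrogenation (H2)": 2.015650074,
    "methanol (-H2O) (CH2)": 14.01565007,
    "hydroxylation (-H) (O)": 15.99491464,
    "condensation/dehydration (H2O)": 18.01056471,
    "carboxylation (CO2)": 43.98982928,
}


def annotate_pairs(masses, ppm=1.0):
    """Return (lighter, heavier, transformation) for every matching pair of masses."""
    hits = []
    ordered = sorted(masses)
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            diff = heavy - light
            tol = heavy * ppm * 1e-6  # assumed: tolerance scales with the heavier mass
            for name, delta in TRANSFORMATIONS.items():
                if abs(diff - delta) <= tol:
                    hits.append((light, heavy, name))
    return hits


# 307.083807 is glutathione; the other two peaks are hypothetical.
pairs = annotate_pairs([307.083807, 309.099457, 325.094372])
for light, heavy, name in pairs:
    print(f"{light:.6f} -> {heavy:.6f}: {name}")
```

Counting the hits per transformation over the full peak list gives the frequency column; repeating the count on randomized masses gives the baseline.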
Visualization of “common” metabolic relationships Based on the common “textbook transformations” one can find the metabolic neighbors of a certain mass…
Visualization of “common” metabolic relationships
[Figure: “metabolic network” of masses that correlate with the amount of 809.5939 (C38:4) in trypanosome metabolism]
…and this can be repeated iteratively to build an entire network of interrelated metabolites. The result corresponds to a biochemical pathway map, although not every step is necessarily catalyzed by an enzyme (some of the mass differences may link compounds with related formulae but without any metabolic relationship).
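The iterative neighbour search can be sketched as a breadth-first expansion from a seed mass. This is a minimal illustration under stated assumptions: only three transformations are used, the tolerance is taken as ppm of the heavier mass, and all peaks except the 809.5939 seed are invented.

```python
from collections import deque

# Small subset of expected transformations (exact masses from the slides).
DELTAS = {"H2": 2.015650074, "C2H4": 28.03130015, "O": 15.99491464}


def build_network(seed, masses, ppm=1.0):
    """Expand from seed, linking masses that differ by a known transformation."""
    seen = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        for other in masses:
            if other == current:
                continue
            diff = abs(other - current)
            tol = max(other, current) * ppm * 1e-6  # assumed tolerance rule
            for label, delta in DELTAS.items():
                if abs(diff - delta) <= tol:
                    edges.add((min(current, other), max(current, other), label))
                    if other not in seen:
                        seen.add(other)
                        queue.append(other)
    return seen, edges


# Seed from the slides plus hypothetical related peaks and one unrelated peak.
peaks = [809.5939, 811.60955, 837.6252, 839.64085, 500.0]
nodes, edges = build_network(809.5939, peaks)
print(sorted(nodes))
print(sorted(edges))
```

An edge here only says that two masses differ by a plausible transformation; as noted above, it does not prove that an enzyme connects the two compounds.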
de novo network generation
In the end, the de novo network building process yields a huge graph. It is difficult to visualize, navigate and analyze, which poses interesting challenges for bioinformatics.
Does this network have a random structure, or are there certain patterns?
Degree distributions
[Figure: degree distributions. Left panel: metabolites, exponential, random net. Right panel: transformations, power law, scale-free net.]
The distribution of “textbook transformations” in the trypanosome metabolome follows a power law (a straight line in a log-log plot) [right]. The distribution of “clusters of common distances” is closer to an exponential distribution [left]. The reason is simple: many reactions involve small molecules, which are not detectable in the FTMS machine. These compounds would be hubs in the network. They are missing here, but are implicitly considered in the “textbook transformations”.
Lesson: understand the limits of the data acquisition before trying an analysis.
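Whether a degree distribution looks like a power law, P(k) ∝ k^(-γ), can be checked roughly by fitting a straight line in log-log space. The sketch below uses an ordinary least-squares slope, which is a crude estimator chosen for brevity and not necessarily the analysis behind the slide; the function names and the toy graph are illustrative.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the number of nodes that have that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(degree).

    For a power law the slope approximates -gamma. Needs at least two
    distinct degrees, otherwise the denominator is zero.
    """
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy star graph: one hub connected to four leaves.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
dist = degree_distribution(star)
slope = loglog_slope(dist)
print(dict(dist), slope)
```

An exponential distribution would instead appear as a straight line when log(count) is plotted against the degree itself, so comparing the two fits is one simple way to distinguish the left and right panels.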
Conclusions
- FT-ICR MS provides highly accurate measurements of metabolites in complex mixtures
- Accuracy is sufficient to identify metabolites based on mass information
- Mass differences are particularly informative
- De novo metabolic network construction and exploration are a distinct possibility
- New analysis tools are necessary to make full use of the available information
MetabolomeExplorer platform Scheltema et al., submitted