Doing statistics with homonuclear 2D-NMR spectra: handling and preliminary study of their repeatability. Baptiste FERAUD, Bernadette GOVAERTS (UCL, ISBA)


1 Doing statistics with homonuclear 2D-NMR spectra: handling and preliminary study of their repeatability. Baptiste FERAUD, Bernadette GOVAERTS (UCL, ISBA), Michel VERLEYSEN (UCL, MLG). PhD Day, September 14, 2012

2 Baptiste Feraud - UCL - ISBA / Machine Learning Group
OUTLINE
- WHAT? Some definitions for a good start (metabolomics, 1D and 2D-NMR experiments)
- WHY? Why use two-dimensional tools instead of « traditional » 1D spectra: benefits from a user's point of view
- HOW? Statistics: how to handle 2D-NMR data and spectra? Example from a first 2D-COSY experimental design
- NEED STATISTICAL GUARANTEES? A rigorous study of the repeatability and robustness of 2D-NMR tools is needed: clustering approaches and preliminary results

3 WHAT?
Metabolomics is the scientific study of chemical processes involving metabolites. Specifically, it is the systematic study of the unique chemical fingerprints that specific cellular processes leave behind.
Metabonomics is the study of biological responses to a stressor (drug, disease…) at the level of metabolites.
Applications: pharmacology, pre-clinical drug trials, toxicology, newborn screening, clinical chemistry, quality control of food and medicinal plants, …
Data acquisition: Nuclear Magnetic Resonance spectroscopy vs. mass spectrometry (mass-to-charge ratio); 1D-NMR (see Réjane Rousseau's thesis, 2011) vs. 2D-NMR

4 1D: mainly 1H-NMR (proton NMR, or hydrogen-1 NMR) and carbon-13 NMR.
2D (more recently), homonuclear experiments:
- COSY (COrrelated SpectroscopY): the first method for determining which signals arise from neighboring protons (usually up to four bonds apart). Correlations appear when there is spin-spin coupling between protons (i.e. between nuclei that are close to each other through chemical bonds).
- TOCSY (TOtal Correlation SpectroscopY): creates correlations between all protons within a given spin system, not just between geminal or vicinal protons as in COSY. Magnetization is transferred successively as long as successive protons are coupled, and the transfer is interrupted by small or zero proton-proton couplings.

5 - NOESY (Nuclear Overhauser Effect SpectroscopY): useful for determining which signals arise from protons that are close to each other in space even if they are not bonded. A NOESY spectrum yields through-space correlations. (…)
Heteronuclear experiments: heteronuclear correlation is used to assign the spectrum of another nucleus once the spectrum of one nucleus is known. For small molecules, 1H is usually correlated with 13C, while for biomolecules 1H is also commonly correlated with 15N (HSQC: Heteronuclear Single Quantum Coherence).

6 SOME GRAPHICS…

9 WHY?
Biomarker? Or biomarkers?
- 1D protein spectra are often far too complex for interpretation
- Signals overlap heavily
- Ambiguous or overlapping resonances …
Additional spectral dimension = extra information (obvious):
- it separates the contributions made by individual resonances
- it allows the analysis and quantification of off-diagonal peaks
QUESTION: extra information = relevant information?

10 HOW?
Let's start with a first 1D and 2D COSY experimental design: M1, M2, M3, M4
4 mixtures = 4 cell culture systems containing various metabolites (fetal bovine serum, glutamax, amino acids, vitamins, inorganic salts, proteins, …)
Expected: M1, M2 and M4 quite close to one another
(Data provided by Pascal de Tullio, Pharmaceutical Chemistry, ULg)

11 HOW?
Let's start with a first 1D and 2D COSY experimental design: M1, M2, M3, M4 (…)
Sampling: 3 samples per mixture

12 HOW?
Let's start with a first 1D and 2D COSY experimental design: M1, M2, M3, M4 (…)
Time: 3 repetitions per sample
- Samples are subjected to freezing and thawing.
- Risks: degradation and bacterial contamination because of the duration of the 2D analysis.
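The design described on these slides (4 mixtures, 3 samples per mixture, 3 repetitions per sample) can be enumerated as a quick sanity check on the number of spectra. This is only an illustrative sketch: the label format `M1_S1_R1` is a hypothetical naming convention, not one taken from the presentation.

```python
from itertools import product

# Factors of the design as stated on the slides.
mixtures = ["M1", "M2", "M3", "M4"]
samples = [1, 2, 3]        # 3 samples per mixture
repetitions = [1, 2, 3]    # 3 repetitions in time per sample

# Hypothetical label for each acquired spectrum: mixture_sample_repetition.
runs = [f"{m}_S{s}_R{r}" for m, s, r in product(mixtures, samples, repetitions)]

# 4 x 3 x 3 spectra in total.
n_spectra = len(runs)
```

Under these assumptions `n_spectra` equals 36, which matches the number of spectra used in the rest of the study.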

13 36 measurements = 36 spectra = 36 peak lists (4 mixtures × 3 samples × 3 repetitions)
From individual peak lists … to a global peak list:
- Individual peak list, columns C1 | C2 | INT: all the points of one specific spectrum.
- Global peak list, columns C1 | C2 | INT1 | P1 | INT2 | P2 | …: includes all pairs of coordinates that appear in at least one of the 36 spectra, i.e. 2 + 2 × 36 = 74 columns.
INT: intensity vectors. P: position vectors (binary: presence or absence of a peak at each pair of coordinates).
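As a minimal sketch of the step from individual peak lists to a global peak list, the function below takes the union of all coordinate pairs and builds, for each spectrum, one intensity vector and one binary position vector aligned with that union. The function name, the spectrum labels and the toy values are assumptions made for illustration; the slides do not show how the authors implemented this.

```python
from typing import Dict, List, Tuple

Peak = Tuple[float, float, float]  # (C1, C2, INT)


def build_global_peak_list(spectra: Dict[str, List[Peak]]):
    """Return (coords, intensities, positions).

    coords: sorted list of every (C1, C2) pair seen in at least one spectrum.
    intensities[name]: INT vector aligned with coords (0.0 where no peak).
    positions[name]: binary P vector aligned with coords (1 = peak present).
    """
    coords = sorted({(c1, c2) for peaks in spectra.values() for c1, c2, _ in peaks})
    index = {xy: i for i, xy in enumerate(coords)}
    intensities, positions = {}, {}
    for name, peaks in spectra.items():
        int_vec = [0.0] * len(coords)
        pos_vec = [0] * len(coords)
        for c1, c2, inten in peaks:
            i = index[(c1, c2)]
            int_vec[i] += inten   # accumulate if a pair is listed twice
            pos_vec[i] = 1
        intensities[name] = int_vec
        positions[name] = pos_vec
    return coords, intensities, positions


# Toy example with two hypothetical spectra.
toy_spectra = {
    "M1_S1_R1": [(1.2, 3.4, 10.0), (2.0, 2.0, 5.0)],
    "M3_S1_R1": [(2.0, 2.0, 7.0), (4.1, 0.9, 1.5)],
}
coords, INT, P = build_global_peak_list(toy_spectra)
```

With 36 spectra, stacking the two coordinate columns with the 36 INT and 36 P vectors gives the 74-column layout described on the slide.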

14 REPEATABILITY?
As for 1D tools, we need to verify the statistical performance and reliability of 2D data and spectra.
Some pre-processing:
- Symmetrisation: by removing negative intensities (or intensities too close to zero), which result from an inappropriate choice of baseline.
- Bucketing: by controlling the size of the data matrix via the number of decimals kept for the coordinates. One decimal → (909 × 74); two decimals → (2348 × 74); three decimals → (3250 × 74).
- Detection of outliers among spectra via the intensity vectors.
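The first two pre-processing steps can be sketched as follows. This assumes that bucketing means rounding both coordinates to a fixed number of decimals and summing the intensities that fall into the same bucket; the summing rule, the `min_intensity` threshold and the function name are assumptions, since the slides only state that the number of decimals controls the matrix size.

```python
from collections import defaultdict


def preprocess(peaks, decimals=1, min_intensity=0.0):
    """Drop non-positive (or too small) intensities, then bucket coordinates.

    peaks: iterable of (C1, C2, INT) tuples for one spectrum.
    decimals: number of decimals kept for C1 and C2 (fewer = coarser buckets).
    min_intensity: peaks with INT <= this value are discarded (assumed rule).
    """
    buckets = defaultdict(float)
    for c1, c2, inten in peaks:
        if inten <= min_intensity:
            continue
        key = (round(c1, decimals), round(c2, decimals))
        buckets[key] += inten  # assumed: intensities in one bucket are summed
    return sorted((c1, c2, v) for (c1, c2), v in buckets.items())


raw = [(1.234, 3.456, 10.0), (1.21, 3.49, 2.0), (2.0, 2.0, -1.0)]
coarse = preprocess(raw, decimals=1)  # the two positive peaks share a bucket
fine = preprocess(raw, decimals=2)    # they stay separate
```

Fewer decimals merge more peaks into the same row, which is consistent with the matrix shrinking from 3250 rows (three decimals) to 909 rows (one decimal).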

15 REPEATABILITY?
An intuitive way to evaluate the repeatability / reproducibility of 2D spectra is unsupervised (blind) multivariate clustering. If we manage to separate and recover our 4 mixtures starting from the 36 spectra → done!
1) Clustering on position vectors
- This requires distances or similarity measures adapted to binary vectors, such as Ochiai, Dice, Jaccard, Russell-Rao, Kulczynski …
- Ward and K-means algorithms
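The similarity measures named on this slide can all be written from the same 2 × 2 contingency counts between two binary position vectors: a (both 1), b (first only), c (second only), d (both 0). The sketch below uses the usual textbook definitions; the Kulczynski variant shown is the second Kulczynski coefficient, which is an assumption because the slide does not say which variant was used. Returning 0.0 when a denominator is zero is also a choice made here.

```python
import math


def contingency(x, y):
    """Counts (a, b, c, d) for two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = len(x) - a - b - c
    return a, b, c, d


def _ratio(num, den):
    # Assumed convention: similarity is 0.0 when the denominator vanishes.
    return num / den if den else 0.0


def jaccard(x, y):
    a, b, c, _ = contingency(x, y)
    return _ratio(a, a + b + c)


def dice(x, y):
    a, b, c, _ = contingency(x, y)
    return _ratio(2 * a, 2 * a + b + c)


def ochiai(x, y):
    a, b, c, _ = contingency(x, y)
    return _ratio(a, math.sqrt((a + b) * (a + c)))


def russell_rao(x, y):
    a, b, c, d = contingency(x, y)
    return _ratio(a, a + b + c + d)


def kulczynski(x, y):
    a, b, c, _ = contingency(x, y)
    return 0.5 * (_ratio(a, a + b) + _ratio(a, a + c))


def to_distance(similarity):
    """Turn a similarity in [0, 1] into a dissimilarity for clustering."""
    return 1.0 - similarity
```

A clustering algorithm would then be run on the matrix of `to_distance(...)` values computed for every pair of the 36 position vectors.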

16 Example of result (Ochiai-Ward, 2 decimals)

17 Example of result (Ochiai-Ward, 2 decimals): in the vast majority of cases, we can already isolate mixture 3.

18 2) Clustering on intensity vectors
- Normalization of each vector so that its sum = 1
- Euclidean distance
- Ward and K-means algorithms
RESULTS:
→ Generally, all mixtures are well recovered by the algorithms, in spite of the sampling procedure and time repetitions!
→ Best result obtained with the one-decimal matrix (which shows the value of bucketing): just one error!
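The three ingredients listed on this slide (sum-to-one normalization, Euclidean distance, Ward clustering) can be sketched in plain Python. The Ward step below is a naive greedy implementation that repeatedly merges the two clusters whose fusion least increases the within-cluster sum of squares; it is meant only to illustrate the criterion on toy data and is not the software the authors used.

```python
import math


def normalize(vec):
    """Scale a non-negative intensity vector so that its entries sum to 1."""
    total = sum(vec)
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return [v / total for v in vec]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def ward_clusters(vectors, k):
    """Greedy Ward agglomeration down to k clusters; returns lists of indices."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(v)) for i, v in enumerate(vectors)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                ni, nj = len(mi), len(mj)
                # Ward criterion: increase in within-cluster sum of squares.
                cost = ni * nj / (ni + nj) * euclidean(ci, cj) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((mi + mj, centroid))
    return [sorted(members) for members, _ in clusters]


# Toy intensity vectors: two pairs that differ only by a scale factor.
toy = [[10, 1, 0], [20, 2, 0], [0, 1, 9], [0, 2, 18]]
labels = ward_clusters([normalize(v) for v in toy], k=2)
```

The toy data illustrates why the normalization matters: vectors that differ only in overall intensity become identical after scaling and end up in the same cluster.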

19 Example of result (Ward, 1 decimal)

20 Validation: example with K-means
Number of clusters: from 2 to 6
Validation measure: Dunn index (ratio between the minimal inter-cluster distance and the maximal intra-cluster distance).
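The Dunn index defined on this slide can be computed directly from a partition. Several variants exist; the sketch below assumes the common one in which the inter-cluster distance is the smallest distance between two points of different clusters and the intra-cluster distance is the largest diameter of any cluster. Which variant the authors used is not stated.

```python
import math
from itertools import combinations


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dunn_index(clusters):
    """clusters: list of clusters, each a list of points (sequences of floats).

    Returns min inter-cluster distance / max intra-cluster diameter.
    Higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    min_inter = min(
        euclidean(p, q)
        for ca, cb in combinations(clusters, 2)
        for p in ca
        for q in cb
    )
    max_intra = max(
        (euclidean(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_intra == 0:
        return math.inf  # every cluster is a single point or duplicates
    return min_inter / max_intra


good = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
bad = [[(0.0, 0.0), (3.0, 0.0)], [(0.0, 1.0), (3.0, 1.0)]]
```

For model selection as described on the slide, one would compute this value for each K-means partition with 2 to 6 clusters and prefer the number of clusters that gives the largest index.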

21 3) 2D vs. 1D (current work)
Warning: be very careful to compare only what is objectively comparable! This implies the same pre-processing procedures in the 1D and 2D cases (very hard…). But we can:
- eliminate negative intensities,
- apply the same normalization to the intensities,
- use the same number of decimals,
- remove outliers (PCA),
- choose a resolution proportional or equal to that of the 2D horizontal axis, etc.
By doing this, we can already see that repeatability can be better in 2D than in 1D!

22 1D clustering (Ward)

23 CONCLUSION
It is commonly accepted by users (biologists, pharmacologists, healthcare professionals…) that the recent introduction of 2D-NMR methods represents a huge qualitative leap for metabolomic investigations. For them, it is obvious and natural that more information = more power.
BUT… for the moment, no statistical study has clearly proved this… So we are trying to fill this gap.
We are working to show, with encouraging first results, that 2D-NMR tools (at first, COSY) are statistically robust and, moreover, that the 2D-COSY experiment seems to be more repeatable and reliable than the corresponding 1D methods!

24 CONCLUSION
Perspectives:
► take the 1D vs. 2D comparisons further
► improve 2D data pre-processing
► apply the same procedures to NOESY and heteronuclear methods (same conclusions?)
► implement supervised classification methods (such as SVM, Lasso…) in order to make predictions and to identify discriminating zones (biomarkers)
► work with « challenging » real datasets (disease, drug…)

25 THANK YOU FOR YOUR ATTENTION

