METABOLOMICS & Biomarker discovery


1 METABOLOMICS & Biomarker discovery
Welcome, my name is Anika Vaarhorst and I work at the department of Molecular Epidemiology. I will give you an introduction to metabolomics, followed by a short practical assignment in which you will analyze a metabolomics dataset. Anika Vaarhorst, Section of Molecular Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands

2 What is Metabolomics The nonbiased identification and quantification of all metabolites in a biological system. Metabolites are the biochemicals, including lipids, sugars, nucleotides, amino acids and related amines, of < 2000 Dalton to be found in biological fluids. All metabolites combined make the human metabolome. Metabolomics is the unbiased identification and quantification of all metabolites in a biological system, for example a human being. Metabolites are all the biochemical molecules, including lipids, sugars, nucleotides and amino acids, of less than 2000 Dalton that can be found in a biological fluid. The metabolites are the result of all the biochemical processes in the body. All metabolites combined make up the human metabolome. How big is 2000 Dalton? The Dalton is the standard unit for indicating mass on an atomic or molecular scale. 1 Dalton is defined as one twelfth of the mass of a neutral atom of carbon-12, which is approximately 1.66 × 10^-27 kg. A mol is the amount of substance that contains as many particles as there are atoms in 12 grams of carbon-12. The number of atoms or molecules per mol is the Avogadro constant, approximately 6.022 × 10^23 mol^-1. Thus, when a substance has a molecular mass of 1 Dalton, 1 gram of this substance contains about 6.022 × 10^23 particles. For example, a protein has a molecular weight of
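To get a feeling for these numbers, the conversion between Dalton, kilograms and particle counts can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the constants are rounded reference values, and glucose (about 180 Da) is chosen here as an arbitrary small metabolite, not one taken from the slide.

```python
# Rounded reference values for the two constants used on this slide.
DALTON_KG = 1.66053906660e-27   # mass of 1 Dalton in kilograms
AVOGADRO = 6.02214076e23        # particles per mol

def daltons_to_kg(mass_da: float) -> float:
    """Mass of a single particle, converted from Dalton to kilograms."""
    return mass_da * DALTON_KG

def particles_per_gram(mass_da: float) -> float:
    """Number of particles in 1 gram of a substance with the given molecular mass.

    A substance of M Dalton has a molar mass of M grams per mol,
    so 1 gram holds AVOGADRO / M particles.
    """
    return AVOGADRO / mass_da

# The 2000 Dalton upper limit for metabolites, expressed in kilograms.
upper_limit_kg = daltons_to_kg(2000)

# Illustrative assumption: glucose, with a molecular mass of about 180.16 Da.
glucose_per_gram = particles_per_gram(180.16)

print(f"2000 Da = {upper_limit_kg:.3e} kg")
print(f"1 g of glucose holds about {glucose_per_gram:.3e} molecules")
```

Even the largest metabolite in this definition weighs only a few times 10^-24 kg, which is why molecular masses are reported in Dalton and not in kilograms.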

3 Why metabolomics More than 4000 metabolites can be measured by different platforms in blood. Not all at high throughput yet. Blood is the highway for degraded, secreted, discarded and synthesized molecules. Indicates tissue lesions, organ dysfunction and pathological state. As an -omics technology it is close to biomedical phenotypes. Why would you study metabolomics? Nowadays it is possible to measure more than 4000 different metabolites in blood using different platforms, although not all metabolites can be measured at high throughput yet. Blood is easily accessible and is the highway for degraded, secreted, discarded and synthesized molecules. These metabolites can hopefully indicate tissue lesions, organ dysfunction and a pathological state. Also, among the different -omics technologies, this one is the closest to the biomedical phenotypes.

4 Epigenome This is illustrated in this slide. You can focus on the genome, the transcriptome, the proteome and the metabolome. When you combine information from the metabolome with the proteome or the transcriptome, it is possible to detect metabolic pathways that are affected by disease or by lifestyle factors.

5 Here you see a part of the valine, leucine and isoleucine degradation pathway in humans. The white rectangles represent different kinds of metabolites. The green ovals represent enzymes. The pink ovals represent co-enzymes or co-factors, and the orange ovals represent transcription factors. The green blobs are multimeric enzymes. This pathway starts with leucine, valine and isoleucine, which are degraded into substances such as acetyl-CoA that can, for example, enter the citrate cycle. Pathman.smpdb.ca

6 Metabolites marking diabetes in patients
Before you can identify pathways that are affected by, for example, diabetes, it is important to identify the metabolites that differ between diseased and non-diseased subjects. This was done in a study by Suhre, in which 40 diabetes patients were compared to 60 healthy controls. The concentrations of 420 metabolites were determined in the fasting blood samples of the participants. Of these 420 metabolites, 28 were associated with diabetes after accounting for multiple testing. This slide shows the metabolites that were affected by diabetes status. Based on this information it can be concluded that diabetes affects carbohydrate metabolism, branched chain amino acid metabolism and lipid metabolism; mild signals of ketosis were also found (3-hydroxybutyrate). Creatinine and 1,5-anhydroglucitol were also increased in diabetes patients, which could indicate impaired renal function. And, not surprisingly, it was possible to detect in patients the medication they used for their diabetes. Suhre et al. PLoS ONE | November 2010 | Volume 5 | Issue 11

7 environment Phenotype Genotype
Administration of branched chain amino acids increased insulin resistance. [Diagram labels: environment, metabolome, phenotype, genotype.] Suhre et al., Nat: genetically determined metabotypes, 37 genetic loci accounting for variance in level. Wang et al., Nat Med 2011: markers of 4 x increased T2D risk: branched chain amino acids, tyrosine and phenylalanine. You can study not only which metabolites are associated with a disease, but also which genetic variants are associated with which metabolites. This was done in a study by Suhre et al. For this study they measured more than 250 metabolites. They found that 37 genetic loci were associated with metabolite levels. 25 of these genetic variants show effect sizes that are unusually high for genome-wide association studies. These 25 genetic variants account for 10 to 60 percent of the differences in metabolite levels per allele copy. In a study published in 2011, Wang and co-workers followed 2422 participants for 12 years, of whom 201 developed diabetes. In these individuals metabolites were measured. They found that branched chain amino acids (isoleucine, leucine and valine), tyrosine and phenylalanine predict future diabetes. From research in animals and humans it is known that the administration of branched chain amino acids promotes insulin resistance. This indicates that it is also important to account for the environment in relation to metabolomics.

8 Psychogios et al. 2011 PloS One
To detect metabolites, several platforms can be used. In a paper published in 2011, 5 different platforms were used to identify as many metabolites as possible. In total, using these 5 platforms, more than 3500 different metabolites could be identified and quantified in serum. This Venn diagram shows that it is possible to identify 49 compounds with the NMR platform, of which 20 are unique to this platform. The remaining metabolites overlap with a platform based on gas chromatography coupled to mass spectrometry and with the commercial Biocrates platform.

9 A step-by-step approach
Biological experiment → Sample extraction → NMR analysis → Raw data → Data preprocessing → Clean data → Data pretreatment → Data fit for analysis → Data analysis → Rank the important metabolites. Here is a step-by-step approach for performing a metabolomics study. First you start with a study in which there are cases and controls. From these subjects you draw blood, and the samples are then measured using, for example, NMR spectroscopy. The resulting raw data need to be preprocessed to obtain clean data. Next the researcher checks the clean data to see if there are, for example, individuals with outlying data points; these persons are removed from further analysis. Now you can test which metabolites are associated with a disease or phenotype and rank the metabolites according to their importance. Van den Berg et al BMC Genomics

10 Sample analysis 1H-NMR spectroscopy
The sample is in the tube, which is in the probe, which is in the core of the magnetic field. [Figure labels: vacuum, liquid nitrogen, liquid helium, coil, core.] For now I will focus on 1H-NMR spectroscopy. The H stands for hydrogen-1, the most common hydrogen isotope. NMR stands for nuclear magnetic resonance. With 1H-NMR spectroscopy you can measure the amount of hydrogen-1 isotopes in solutions. A 1H-NMR spectrometer is basically a very large magnet, and in the middle of this magnet a probe is placed. In this probe there is a tube in which the sample is placed. The sample is exposed to a magnetic field which can be varied from high to low. At the same time the sample is also exposed to electromagnetic radiation at a fixed frequency. When the hydrogen atoms absorb this electromagnetic radiation, this becomes visible as a peak in a 1H-NMR spectrum, as is shown in the next slide.

11 Metabolomics, NMR 1, imidazole; 2, urea; 3, D-glucose; 4, L-lactic acid; 5, glycerol; 6, L-glutamine; 7, L-alanine; 8, DSS; 9, glycine; 10, L-glutamic acid; 11, L-valine; 12, L-proline; 13, L-lysine; 14, L-histidine; 15, L-threonine; 16, propylene glycol; 17, L-leucine; 18, L-tyrosine; 19, L-phenylalanine; 20, methanol; 21, creatinine; 22, 3-hydroxybutyric acid; 23, ornithine; 24, L-isoleucine; 25, citric acid; 26, acetic acid; 27, carnitine; 28, 2-hydroxybutyric acid; 29, creatine; 30, betaine; 31, formic acid; 32, isopropyl alcohol; 33, pyruvic acid; 34, choline; 35, acetone; 36, glycerol. Analyse known variables: 50. Here is an example of an NMR spectrum. The location of the peaks depends on the chemical environment of the hydrogen atoms. Depending on the chemical environment, hydrogen atoms can be shielded by a large number of electrons. In that case a high magnetic field is needed to generate a peak, and the peak becomes visible in the right part of the spectrum. When a low magnetic field is needed to generate a peak, it becomes visible in the left part of the spectrum. With this technique it is possible to identify and quantify around 50 different metabolites.

12 Data pretreatment
Check for outliers. Check for distribution. Centering. Scaling. Transformations. Data pretreatment consists of checking for outliers and checking how the metabolites are distributed. If metabolites show a skewed distribution, this can be corrected by transforming the data; centering and scaling then put the metabolites on a comparable scale. This will be shown during the practical.
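As a rough illustration of these pretreatment steps, the sketch below log-transforms, centers and scales a simulated samples-by-metabolites matrix with NumPy. Everything in it is an assumption made for the example: the function names, the simulated log-normal data, the choice of a natural log transformation with unit-variance scaling, and the cut-off of 4 standard deviations for flagging outliers. The practical may use different choices.

```python
import numpy as np

def pretreat(X: np.ndarray) -> np.ndarray:
    """Log-transform, mean-center and scale a samples x metabolites matrix."""
    X_log = np.log(X)                        # transformation: reduces right skew
    centered = X_log - X_log.mean(axis=0)    # centering: each metabolite gets mean 0
    return centered / X_log.std(axis=0, ddof=1)  # scaling: unit variance per metabolite

def flag_outliers(Z: np.ndarray, threshold: float = 4.0) -> np.ndarray:
    """Boolean mask of samples with any pretreated value beyond +/- threshold SD.

    The threshold is an arbitrary illustrative choice.
    """
    return (np.abs(Z) > threshold).any(axis=1)

# Simulated, strictly positive and right-skewed concentrations:
# 50 samples and 3 hypothetical metabolites.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=1.0, sigma=0.5, size=(50, 3))

Z = pretreat(X)
outlier_mask = flag_outliers(Z)
print(Z.mean(axis=0).round(3), Z.std(axis=0, ddof=1).round(3))
print("samples flagged as outliers:", int(outlier_mask.sum()))
```

After pretreatment every metabolite has mean 0 and standard deviation 1, so a metabolite measured in high concentrations no longer dominates one measured in low concentrations.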

13 Data analysis
Univariate analysis. Univariate analysis combined with stepwise regression: multicollinearity. LASSO regression, elastic net, ridge regression, PLS-DA. There are several approaches to test which metabolites differ between cases and controls. The simplest approach is to test, per metabolite, whether it is associated with a disease or phenotype. Some studies perform stepwise regression with the metabolites that were significant in the univariate analysis. If you use this method you have to be aware of multicollinearity. Multicollinearity is a situation where 2 or more metabolites are highly correlated. This occurs a lot in metabolomics data, as you will see in the practical. With LASSO regression you can
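Multicollinearity can be made visible with a correlation matrix or with variance inflation factors (VIF). The sketch below is only an illustration on simulated data: the variable names `leucine`, `isoleucine` and `glucose` are hypothetical labels for simulated columns, not real measurements, and the helper function is written for this example.

```python
import numpy as np

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF per column: 1 / (1 - R^2) from regressing that column on all others."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every column except column j.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ coef
        r_squared = 1.0 - residuals.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Simulated data: two columns share a common source, the third is independent.
rng = np.random.default_rng(1)
shared = rng.normal(size=500)
leucine = shared + 0.2 * rng.normal(size=500)
isoleucine = shared + 0.2 * rng.normal(size=500)
glucose = rng.normal(size=500)
X = np.column_stack([leucine, isoleucine, glucose])

corr = np.corrcoef(X, rowvar=False)
vifs = variance_inflation_factors(X)
print(corr.round(2))
print(vifs.round(1))
```

A high VIF for the two correlated columns signals that a regression model cannot reliably separate their individual effects, which is the problem stepwise regression runs into.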

14 Multiple testing
Bonferroni correction: 100 tests, test with a significance level of 0.05. P after Bonferroni correction: 0.05/100 = 0.0005. For metabolomics too conservative. Replicate your findings in independent studies. Cross-validation. With metabolomics it is not unusual to test hundreds of metabolites, so it is important to account for multiple testing. A well-known method is the Bonferroni correction. For example, you have 100 metabolites and you set your significance level to 0.05. Now you have to divide 0.05 by 100. This results in a threshold of 0.0005. Thus a metabolite with a p-value lower than 0.0005 is significant after accounting for multiple testing. This method is very conservative, which is why some use false discovery rates to account for multiple testing. The false discovery rate is the expected proportion of false positives among all discoveries. The table gives some more explanation. M is the total number of hypotheses tested. For example, if you test 100 metabolites, m would be 100. T is the number of true positives and F is the number of false positives. S is the total number of features called significant. M0 is the number of metabolites not associated with the phenotype in reality. M1 is the number of metabolites associated with the phenotype in reality. Storey and Tibshirani 2003, PNAS
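The difference between the two corrections can be sketched in plain Python. Note the assumptions: the p-values are made up for the example, and the false discovery rate is controlled here with the Benjamini-Hochberg step-up procedure, which is a common alternative and not the q-value method of Storey and Tibshirani cited on the slide.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Significance threshold per test after Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values: list[float], fdr: float = 0.05) -> list[bool]:
    """Flag discoveries with the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= fdr * k / m.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            largest_rank = rank
    # Every p-value ranked at or below k is called significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            significant[idx] = True
    return significant

# The example from the slide: 100 tests at a significance level of 0.05.
threshold = bonferroni_threshold(0.05, 100)

# Made-up p-values for six hypothetical metabolites.
p_values = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.6]
bonf = [p <= bonferroni_threshold(0.05, len(p_values)) for p in p_values]
bh = benjamini_hochberg(p_values)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

On these illustrative p-values Bonferroni keeps two metabolites and Benjamini-Hochberg keeps four, which shows in a small way why Bonferroni is considered conservative.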

15 Confounding Confounder variable: a variable other than the predictor variables that potentially affects the outcome variable. Prevent confounding: matching, stratification. Controlling for confounding: include the known confounders as covariates in your model. [Diagram labels: metabolite, outcome variable, confounder.]
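What including a confounder as a covariate does can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated so that the confounder drives both the metabolite and the outcome while the metabolite itself has no effect, and `ols_coefficients` is a helper written for this example using ordinary least squares.

```python
import numpy as np

def ols_coefficients(predictors: np.ndarray, outcome: np.ndarray) -> np.ndarray:
    """Least-squares coefficients; an intercept is added as the first term."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

# Simulated data: the confounder influences both metabolite and outcome,
# and the metabolite has no direct effect on the outcome.
rng = np.random.default_rng(42)
n = 2000
confounder = rng.normal(size=n)
metabolite = confounder + rng.normal(size=n)
outcome = 2.0 * confounder + rng.normal(size=n)

# Model 1: outcome ~ metabolite (confounder ignored).
crude_slope = ols_coefficients(metabolite, outcome)[1]

# Model 2: outcome ~ metabolite + confounder (confounder as covariate).
adjusted_slope = ols_coefficients(
    np.column_stack([metabolite, confounder]), outcome
)[1]

print(f"crude slope:    {crude_slope:.2f}")
print(f"adjusted slope: {adjusted_slope:.2f}")
```

Without adjustment the metabolite appears clearly associated with the outcome; once the confounder is in the model, its coefficient shrinks towards zero.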

16 Problems: Confounding
Brindle JT et al., Nat Med. 8(12), → NMR spectroscopy is diagnostic for the occurrence and severity of CAD. But according to Kirschenlohr et al., Nat Med. 12(6), gender and statin treatment affect the 'biomarkers' of disease → groups must be stratified. NMR analysis of plasma is a weak predictor for CAD. Examples of problems with data analysis.

17 BBMRI Rainbow RP4 Metabolomics Applying Metabolomics in Dutch cohorts
The part about the BBMRI project

18 High throughput / high resolution NMR
Reference populations: Leiden Longevity Study (LLS), Netherlands Twin Register (NTR), Erasmus Rucphen Family study (ERF). Selection based on existing metabolomics data. Extensive phenotypic data. High throughput / high resolution NMR: LUMC, Deelder et al. Mass spectrometry: Netherlands Metabolomics Centre, lipid platform, Hankemeier et al. Mass spectrometry: Biocrates platform, Gieger et al.

19 Netherlands Twin Registry Leiden Longevity Study Erasmus Rucphen Lipidomics Matrix Citrate plasma Stored at -30°C -80°C Fasted yes no Yes N 3000 2201 1H-NMR EDTA plasma Serum No 2487 Biocrates 267 (yes)/390(no) 1900 657 994

20 327 metabolites measured
[Venn diagram: Biocrates N=163, 1H-NMR N=52, Lipidomics N=129; region counts shown: 146, 12, 5, 40, 124.]

21 The practical
Long-lived siblings. Spouses as controls. Offspring of long-lived siblings. Which metabolites differ between controls and offspring of long-lived siblings?

