Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris.


1 Metabolic Network Inference from Multiple Types of Genomic Data Yoshihiro Yamanishi Centre de Bio-informatique, Ecole des Mines de Paris

2 Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

3 Metabolic network
The metabolic network consists of enzyme proteins and chemical compounds
6018 genes in the yeast genome
1120 genes with EC numbers
668 genes with pathway information
(in KEGG as of Sep. 2004)
Problem: parts of the pathways are unknown and many enzyme genes are missing

4 Network inference methods
For gene regulatory network
- Bayesian network (Friedman et al., 2000; Imoto et al., 2002)
- Boolean network (Akutsu et al., 2000)
- Graphical modeling (Toh et al., 2001)
For protein interaction network
- Joint graph method (Marcotte et al., 1999)
- Mirror tree method (Pazos et al., 2001)

5 Objectives
Develop a method to infer metabolic gene networks in a supervised context
Integrate heterogeneous genomic data in the framework of network inference
Reconstruct unknown pathways and identify genes for missing enzymes

6 Kernel in this study
Kernel: representation of the similarity between two genes (e.g., correlation coefficient)
Kernel matrix: similarity matrix of a set of genes

7 An example of the kernel
Suppose we have a set of genes x_1, x_2, …, x_N and represent them by gene expression profiles

8 An example of kernel matrix This can be regarded as a kind of similarity matrix
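As a minimal sketch of what such a kernel matrix can look like, the snippet below takes the correlation coefficient as the similarity between expression profiles. The function name `correlation_kernel` and the toy values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def correlation_kernel(profiles):
    """Return the N x N matrix of Pearson correlations between gene profiles (rows)."""
    profiles = np.asarray(profiles, dtype=float)
    return np.corrcoef(profiles)

# Toy expression profiles: 4 genes x 5 experiments (made-up values).
X = np.array([
    [0.1, 0.4, 0.6, 0.2, -0.3],
    [0.2, 0.9, 1.8, 0.7, -0.3],
    [0.6, 0.7, -1.0, 0.8, 1.2],
    [1.2, 0.3, 1.9, -0.1, -0.7],
])
K = correlation_kernel(X)  # K[i, j] is the similarity between gene i and gene j
```

The result is symmetric with ones on the diagonal, so it can be read directly as a gene-by-gene similarity matrix.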

9 Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix based on a genomic dataset (genes 1 to 9) and configuration of the genes]

10-18 Direct network inference
Assumption: connected proteins in the network share high similarity in the data
[Figure: similarity matrix (genes 1 to 9) and predicted network]
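One simple reading of the direct approach is to connect every pair of genes whose similarity exceeds a cutoff. The sketch below assumes exactly that; the name `direct_inference`, the cutoff value, and the toy matrix are hypothetical and the slides do not specify how the cutoff is chosen.

```python
import numpy as np

def direct_inference(K, threshold):
    """Predict an edge for every gene pair whose similarity exceeds the threshold."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the network is undirected
            if K[i, j] > threshold:
                edges.append((i, j))
    return edges

# Toy similarity matrix for 4 genes (made-up values).
K_toy = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.1, 0.8, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
])
predicted = direct_inference(K_toy, threshold=0.5)
```

Lowering the threshold adds edges to the predicted network, which is why the approach is naturally evaluated over all thresholds at once.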

19 Evaluation of the direct approach using gene expression data
Gold standard data: metabolic network of 668 yeast genes in KEGG/Pathway
Expression data: 157 experiments (SMD)
[Figure: ROC curve, true positives against false positives]
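A ROC curve for this kind of evaluation can be sketched by ranking all gene pairs by similarity and checking each against the gold standard. The code below is an illustration under stated assumptions: the helper names are hypothetical, the toy matrices are made up, and tied scores between an edge and a non-edge are not treated specially.

```python
import numpy as np

def pair_scores(K, A):
    """Flatten the upper triangles of a similarity matrix K and a gold-standard adjacency A."""
    iu = np.triu_indices(K.shape[0], k=1)
    return K[iu], A[iu]

def roc_curve_points(scores, labels):
    """False and true positive rates as the similarity threshold is lowered."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = np.asarray(labels, dtype=int)[order]
    tp = np.concatenate(([0], np.cumsum(labels)))
    fp = np.concatenate(([0], np.cumsum(1 - labels)))
    return fp / fp[-1], tp / tp[-1]

def roc_score(scores, labels):
    """Area under the ROC curve, by the trapezoid rule."""
    fpr, tpr = roc_curve_points(scores, labels)
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

# Toy similarity matrix and gold-standard network for 4 genes (made-up values).
K_toy = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.1, 0.8, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
])
A_gold = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 3)]:
    A_gold[i, j] = A_gold[j, i] = 1

scores, labels = pair_scores(K_toy, A_gold)
auc = roc_score(scores, labels)
```

A score of 0.5 corresponds to random ranking of gene pairs and 1.0 to a ranking that places every true edge above every non-edge.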

20 Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

21 An illustration of formalism
[Figure: protein network with an unknown pathway, and similarity matrix in expression]

22 An illustration of formalism
[Figure: protein network with an unknown pathway, and similarity matrix in expression; the known part is used for training]

23 Supervised network inference
Key idea: use of partially known network information
[Figure: original space; legend: training set]

24 Supervised network inference
[Figure: original space; legend: training set, edge predicted by direct approach]

25 Supervised network inference
[Figure: original space; legend: training set, true edge]

26 Supervised network inference 1/2
Step 1: map proteins to a space where interacting proteins are close to each other
[Figure: original space and feature space; legend: training set, true edge]

27 Supervised network inference 2/2
[Figure: original space and feature space; legend: training set, test set, true edge]

28 Supervised network inference 2/2
Step 2: predict interacting protein pairs involving the test set
[Figure: original space and feature space; legend: training set, test set, true edge]
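The two steps can be illustrated with a deliberately simplified linear stand-in. It is not kernel CCA and not the authors' metric-learning algorithm: it only learns a projection in which genes connected in the known (training) network lie close together, then measures distances between test genes in that projected space. All names, the regularisation value, and the synthetic data are assumptions made for this sketch.

```python
import numpy as np

def learn_projection(X_train, A_train, n_components=1, reg=0.01):
    """Step 1: find directions along which connected training genes are close.

    Minimises the squared differences over known edges relative to the
    regularised total variance, via a whitened eigenproblem.
    """
    X_train = np.asarray(X_train, dtype=float)
    A_train = np.asarray(A_train, dtype=float)
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    n, p = Xc.shape
    L = np.diag(A_train.sum(axis=1)) - A_train       # graph Laplacian of the known network
    S_edge = Xc.T @ L @ Xc / n                       # spread along known edges
    S_tot = Xc.T @ Xc / n + reg * np.eye(p)          # regularised total spread
    vals, vecs = np.linalg.eigh(S_tot)
    whiten = vecs @ np.diag(vals ** -0.5) @ vecs.T   # S_tot^(-1/2)
    _, V = np.linalg.eigh(whiten @ S_edge @ whiten)  # eigenvalues in ascending order
    W = whiten @ V[:, :n_components]                 # keep the smallest-ratio directions
    return mean, W

def feature_space_distances(X, mean, W):
    """Step 2: pairwise distances between genes after projection."""
    Z = (np.asarray(X, dtype=float) - mean) @ W
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

# Synthetic data: one informative feature plus four noisy ones (all made up).
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 20)                        # hidden pathway membership
signal = np.where(group == 0, -1.0, 1.0) + 0.1 * rng.standard_normal(40)
noise = 3.0 * rng.standard_normal((40, 4))
X = np.column_stack([signal, noise])
A = (group[:, None] == group[None, :]).astype(float) # genes in the same pathway are linked
np.fill_diagonal(A, 0.0)

train = np.r_[0:15, 20:35]
test = np.r_[15:20, 35:40]
mean, W = learn_projection(X[train], A[np.ix_(train, train)])
D = feature_space_distances(X[test], mean, W)        # small D[i, j] suggests an edge
```

Test-gene pairs with a small distance in the learned space would be predicted as interacting. The real methods operate on kernel matrices rather than explicit feature vectors.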

29 Algorithm Kernel CCA (Yamanishi et al., 2004) Distance metric learning (Vert et al., 2004)

30 Result of the supervised learning: ROC curve by cross-validation
[Figure: ROC curves for the direct approach and the supervised approach]

31 Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

32 Various genomic data
Data: phylogenetic profile; structure: bit strings; gene-gene relationship: evolutionary similarity
Data: localization data; gene-gene relationship: co-localization similarity
Data: gene expression; structure: numerical vectors; gene-gene relationship: co-expression similarity

33 Data of the yeast S. cerevisiae
Expression: 6059 genes with 157 experiments (SMD database)
Localization: 6059 proteins with 23 intracellular locations (Huh et al., 2003)
Phylogenetic profile: 6059 proteins with 145 organisms (KEGG/Ortholog Cluster)

34 Gene expression profiles
        exp1 exp2 exp3 exp4 exp5 … exp P
gene 1 (0.1, 0.4, 0.6, 0.2, -0.3, …, 1.5)
gene 2 (0.2, 0.9, 1.8, 0.7, -0.3, …, 0.4)
gene 3 (0.6, 0.7, -1.0, 0.8, 1.2, …, 0.6)
…
gene N (1.2, 0.3, 1.9, -0.1, -0.7, …, 0.1)
Numerical vectors of the gene expression ratio (rows: genes; columns: experiments or time series)

35 Phylogenetic profiles
        org1 org2 org3 org4 org5 … org P
gene 1 (1, 1, 0, 0, 0, …, 1)
gene 2 (1, 0, 1, 0, 1, …, 0)
gene 3 (0, 1, 0, 0, 1, …, 0)
…
gene N (1, 0, 1, 0, 0, …, 1)
Bit strings in which the presence and absence of the genes are coded as 1 or 0 across organisms (rows: genes; columns: organisms)
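The slides do not state which kernel is applied to the bit strings. As one plausible illustration, the sketch below uses a plain inner product, which counts the organisms in which both genes are present. The six-column toy profiles are made-up values, not real data.

```python
import numpy as np

# Toy phylogenetic profiles: 4 genes x 6 organisms (1 = gene present, 0 = absent).
P = np.array([
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
])

# Assumed similarity: inner product of the bit strings, i.e. the number of
# organisms that contain both genes.
K_phylo = P @ P.T
```

Under this choice the diagonal entry for a gene is simply the number of organisms in which it occurs.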

36 An illustration of our network inference procedure
[Figure: INPUT (gene expression, protein localization, phylogenetic profile), similarity matrix of genes, inference step, OUTPUT (gene network)]

37 Data representation and integration
[Figure: genomic data and the corresponding similarity matrix]

38 Evaluating the weight for each data source
1. Apply the method to each data source individually
2. Evaluate its biological relevance by the ROC score
ROC score: area under the ROC curve
[Figure: ROC curve]

39 Evaluating the weight by the ROC scores
For each data source, compute (ROC score - 0.5) and use it as the weight
[Figure: weights for expression, localization, and phylogenetic profile]
Evolutionary information seems to be useful

40 The resulting normalized weights: The effect of data integration
[Figure: ROC curve]
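The weighting scheme can be sketched as follows. The ROC scores in the example are invented for illustration and are not the values from the talk. Normalising the weights to sum to one, and giving zero weight to a source that scores at or below chance, are assumptions of this sketch.

```python
import numpy as np

def integration_weights(roc_scores):
    """Weight each data source by (ROC score - 0.5), normalised to sum to 1."""
    names = list(roc_scores)
    raw = np.array([roc_scores[name] - 0.5 for name in names])
    raw = np.clip(raw, 0.0, None)  # assumption: no negative weights
    return dict(zip(names, raw / raw.sum()))

def integrate_kernels(kernels, weights):
    """Weighted sum of the similarity matrices of all data sources."""
    return sum(weights[name] * np.asarray(kernels[name], dtype=float) for name in kernels)

# Hypothetical ROC scores (made up for this example).
roc_scores = {"expression": 0.60, "localization": 0.55, "phylogenetic": 0.75}
weights = integration_weights(roc_scores)

# Toy 2 x 2 similarity matrices, one per data source (made-up values).
kernels = {
    "expression": np.array([[1.0, 0.2], [0.2, 1.0]]),
    "localization": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "phylogenetic": np.array([[1.0, 0.8], [0.8, 1.0]]),
}
K_integrated = integrate_kernels(kernels, weights)
```

The integrated matrix can then be used wherever a single-source similarity matrix was used before.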

41 Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
- Missing enzyme gene estimation
Concluding remarks

42 Comprehensive prediction of a global gene network
We predicted a network of 6059 genes
Possible biological applications
1. Estimate unknown pathways
2. Predict biochemical function for hypothetical proteins
3. Identify missing enzyme genes

43 Prediction for a role in pathways
YJR137C (whose detailed function was unknown as of Sep. 2003) is connected with EC:1.8.4.8 and EC:2.5.1.47 in the predicted network

44 Prediction for a role in pathways
Recently, YJR137C was reported to be annotated as EC:1.8.1.2

45 Outline
Motivation: metabolic network
Method: network inference
- Supervised network inference
- Multiple data integration
Application
- Global network prediction
Concluding remarks

46 Summary
We developed supervised approaches to infer the metabolic network from multiple genomic data
The accuracy improved with supervised learning and weighted data integration
We showed some possibilities for obtaining new biological findings

47 Collaborators
For the methods
- Jean-Philippe Vert (Ecole des Mines)
- Minoru Kanehisa (Kyoto University)
For the biochemical experiments
- Hisaaki Mihara, Motoharu Ohsaki, Hisashi Muramatsu, Nobuyoshi Esaki (Kyoto University)

