Learning Metabolic Network Inhibition using Abductive Stochastic Logic Programming Jianzhong Chen, Stephen Muggleton, José Santos Imperial College, London.

Florence, 2nd August 2007

Summary
Background: metabolic networks, ILP and SLPs
Problem: learning metabolic network inhibition
Modeling: abductive SLPs
Results: improvement by extracting probabilistic examples
Conclusions and discussion

Metabolic Network
Metabolites interact with each other in a complex metabolic network, in which enzymes catalyze the transformation of one metabolite into another. Metabolites are the nodes of this graph and enzymes are its arcs. The metabolic network data were taken from the Kyoto Encyclopedia of Genes and Genomes (KEGG).
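
The graph view above can be sketched as a small adjacency structure. This is a minimal Python sketch and not the authors' data model. The class `MetabolicGraph`, the enzyme labels `e1` to `e4` and the chosen reaction directions are hypothetical. Only the metabolite names come from the partial network listed in these slides.

```python
from collections import defaultdict


class MetabolicGraph:
    """Metabolites are nodes; each reaction is an arc labelled by its enzyme."""

    def __init__(self):
        self.out_arcs = defaultdict(list)  # substrate -> [(enzyme, product), ...]
        self.in_arcs = defaultdict(list)   # product -> [(enzyme, substrate), ...]

    def add_reaction(self, substrate, enzyme, product):
        self.out_arcs[substrate].append((enzyme, product))
        self.in_arcs[product].append((enzyme, substrate))

    def products_of(self, metabolite):
        # .get avoids creating empty entries for unknown metabolites
        return sorted(p for _, p in self.out_arcs.get(metabolite, []))

    def substrates_of(self, metabolite):
        return sorted(s for _, s in self.in_arcs.get(metabolite, []))

    def metabolites(self):
        return set(self.out_arcs) | set(self.in_arcs)


# Hypothetical enzyme labels e1..e4; directions chosen for illustration only.
g = MetabolicGraph()
for substrate, enzyme, product in [
    ("citrate", "e1", "isocitrate"),
    ("isocitrate", "e2", "2-oxo-glutarate"),
    ("2-oxo-glutarate", "e3", "succinate"),
    ("succinate", "e4", "fumarate"),
]:
    g.add_reaction(substrate, enzyme, product)

print(g.products_of("isocitrate"))   # ['2-oxo-glutarate']
print(g.substrates_of("succinate"))  # ['2-oxo-glutarate']
print(len(g.metabolites()))          # 5
```

Keeping both outgoing and incoming arcs makes "what does X produce" and "what produces X" equally cheap to answer. The inhibition rules later in the talk need both kinds of question.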

Excerpt of the rat metabolic network

Introducing ILP
Inductive Logic Programming (ILP) is an expressive machine learning technique that takes as input a description of a problem written as a logic program. This description is divided into background knowledge (i.e. facts, rules and constraints about the domain) and observations.

Induction and Abduction
Induction: inference of a general theory that entails as many observations as possible (the observations are causes).
Abduction: inference of the hypothesis that best explains the available observations (the observations are consequences).
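
To make the abduction side concrete, here is a minimal Python sketch of propositional abduction by exhaustive search. It assumes a hypothetical three-metabolite chain a → b → c whose rules mimic the shape of the concentration rules in this talk: an inhibited reaction lowers its product and raises its substrate, and an active reaction passes a decrease downstream. The function returns the smallest sets of abducible facts that entail the observations without violating an integrity constraint. All atom names are illustrative. Real abductive ILP systems search far more cleverly than this brute-force enumeration. Induction would instead generalise new rules from many such examples, and is not sketched here.

```python
from itertools import combinations


def closure(rules, facts):
    """Forward-chain propositional Horn rules (head, body) to a fixpoint."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known


def abduce(rules, abducibles, observations, constraints):
    """Return the subset-minimal sets of abducibles that entail every observation
    without deriving both atoms of any constraint pair."""
    explanations = []
    for size in range(len(abducibles) + 1):
        for combo in combinations(abducibles, size):
            hypothesis = set(combo)
            if any(found <= hypothesis for found in explanations):
                continue  # contains a smaller explanation, so it is not minimal
            derived = closure(rules, hypothesis)
            violated = any(a in derived and b in derived for a, b in constraints)
            if not violated and observations <= derived:
                explanations.append(hypothesis)
    return explanations


# Hypothetical chain a -> b -> c: inhibiting a reaction lowers its product and
# raises its substrate; an active reaction passes a decrease downstream.
rules = [
    ("down_b", ["inhibited_ab"]), ("up_a", ["inhibited_ab"]),
    ("down_c", ["inhibited_bc"]), ("up_b", ["inhibited_bc"]),
    ("down_c", ["noninhibited_bc", "down_b"]),
]
abducibles = ["inhibited_ab", "inhibited_bc", "noninhibited_bc"]
constraints = [("up_b", "down_b"), ("inhibited_bc", "noninhibited_bc")]

explanations = abduce(rules, abducibles, {"down_b", "down_c"}, constraints)
print(explanations)  # one explanation: inhibited_ab together with noninhibited_bc
```

The hypothesis {inhibited_ab, inhibited_bc} also entails both observations. However, it makes b both up and down, so the integrity constraint rejects it. This is the role the constraints play in the background knowledge of the talk.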

Problem
Build a model to predict enzyme inhibition caused by the injection of a toxin (hydrazine).
Motivation
Predicting the inhibitory effects of a drug is crucial in drug development, in order to understand the drug's possible side effects on the recipient's metabolic system.

Experimental setting
Our dataset consists of two groups of 10 rats each. The control group was injected with a placebo and the case group with 30 mg of hydrazine. The hydrazine injection changes metabolite concentrations in the rats. The level of change was measured relative to the control rats.

Important point
It is not possible to observe the inhibitory effects of the drug directly (i.e. which enzymes it inhibited). We can, however, measure metabolite concentrations at given hours after the injection. We therefore abduce the inhibition status of the enzymes from the observed metabolite concentrations.

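Background knowledge
% X goes down if the reaction producing X from Y is inhibited.
concentration(X, down, T) :- reactionnode(X, Enz, Y), inhibited(Enz, Y, X, T).
% X goes down if the reaction from Y to X is active and Y goes down.
concentration(X, down, T) :- reactionnode(X, Enz, Y), noninhibited(Enz, Y, X, T), concentration(Y, down, T).
% X goes up if the reaction consuming X to produce Y is inhibited.
concentration(X, up, T) :- reactionnode(Y, Enz, X), inhibited(Enz, X, Y, T).
% X goes up if the reaction from Y to X is active and Y goes up.
concentration(X, up, T) :- reactionnode(X, Enz, Y), noninhibited(Enz, Y, X, T), concentration(Y, up, T).
% Integrity constraints: no metabolite is both up and down; no reaction is both inhibited and noninhibited.
:- concentration(M, up, T), concentration(M, down, T).
:- inhibited(Enz, From, To, T), noninhibited(Enz, From, To, T).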
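
The four rules and two constraints above can be read operationally by forward chaining. The following is a minimal Python sketch and not the Prolog system used in the talk. It drops the time argument T, assuming a single time point. The metabolites `a`, `b`, `c` and enzymes `e1`, `e2` are hypothetical. Given a fixed set of inhibited/noninhibited facts, the sketch computes every concentration change the rules entail and then checks both integrity constraints.

```python
def derive_concentrations(reactions, inhibited, noninhibited):
    """Least fixpoint of the four concentration/3 rules for one time point.

    reactions:    set of (x, enz, y) tuples, one per reactionnode(X, Enz, Y)
    inhibited:    set of (enz, frm, to) tuples, one per inhibited(Enz, From, To, T)
    noninhibited: set of (enz, frm, to) tuples, one per noninhibited(Enz, From, To, T)
    Returns a set of (metabolite, 'up' | 'down') pairs.
    """
    derived = set()
    for x, enz, y in reactions:
        if (enz, y, x) in inhibited:
            derived.add((x, "down"))  # rule 1: reactionnode(X,Enz,Y), inhibited(Enz,Y,X,T)
            derived.add((y, "up"))    # rule 3 with its X bound to y and its Y bound to x
    changed = True
    while changed:  # rules 2 and 4 propagate changes along noninhibited reactions
        changed = False
        for x, enz, y in reactions:
            if (enz, y, x) not in noninhibited:
                continue
            for level in ("down", "up"):
                if (y, level) in derived and (x, level) not in derived:
                    derived.add((x, level))
                    changed = True
    return derived


def consistent(derived, inhibited, noninhibited):
    """Check both integrity constraints of the background knowledge."""
    up = {m for m, level in derived if level == "up"}
    down = {m for m, level in derived if level == "down"}
    return not (up & down) and not (inhibited & noninhibited)


# Hypothetical toy network: reversible reactions a <-> b (e1) and b <-> c (e2).
reactions = {("a", "e1", "b"), ("b", "e1", "a"), ("b", "e2", "c"), ("c", "e2", "b")}
inhibited = {("e1", "a", "b")}      # reaction a -> b blocked
noninhibited = {("e2", "b", "c")}   # reaction b -> c active
derived = derive_concentrations(reactions, inhibited, noninhibited)
print(sorted(derived))                               # [('a', 'up'), ('b', 'down'), ('c', 'down')]
print(consistent(derived, inhibited, noninhibited))  # True
```

Abduction runs this in reverse. It searches for inhibited/noninhibited facts whose derived concentrations match the observations while keeping both constraints satisfied.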

Background knowledge: partial network
reactionnode('l-2-aminoadipate', ' ', '2-oxo-glutarate').
reactionnode('2-oxo-glutarate', ' ', 'l-2-aminoadipate').
reactionnode('2-oxo-glutarate', ' ', 'isocitrate').
reactionnode('isocitrate', ' ', '2-oxo-glutarate').
reactionnode('2-oxo-glutarate', ' ', 'succinate').
reactionnode('succinate', ' ', '2-oxo-glutarate').
reactionnode('isocitrate', ' ', 'citrate').
reactionnode('citrate', ' ', 'isocitrate').
reactionnode('isocitrate', ' ', 'trans-aconitate').
reactionnode('trans-aconitate', ' ', 'isocitrate').
reactionnode('citrate', ' ', 'fumarate').
reactionnode('fumarate', ' ', 'citrate').
reactionnode('succinate', ' ', 'fumarate').
reactionnode('fumarate', ' ', 'succinate').
reactionnode('succinate', ' ', 'hippurate').
reactionnode('hippurate', ' ', 'succinate').
reactionnode('citrate', ' ', 'taurine').
reactionnode('taurine', ' ', 'citrate').
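
A quick sanity check on the partial network is to parse the facts and confirm that every reaction is listed in both directions. This Python sketch assumes the facts are available as text in exactly the form above. The regular expression and the helper names are illustrative. The enzyme field is kept as the blank `' '` shown on the slide.

```python
import re

# The partial network above, one reactionnode(X, Enz, Y) fact per line.
FACTS = """
reactionnode('l-2-aminoadipate', ' ', '2-oxo-glutarate').
reactionnode('2-oxo-glutarate', ' ', 'l-2-aminoadipate').
reactionnode('2-oxo-glutarate', ' ', 'isocitrate').
reactionnode('isocitrate', ' ', '2-oxo-glutarate').
reactionnode('2-oxo-glutarate', ' ', 'succinate').
reactionnode('succinate', ' ', '2-oxo-glutarate').
reactionnode('isocitrate', ' ', 'citrate').
reactionnode('citrate', ' ', 'isocitrate').
reactionnode('isocitrate', ' ', 'trans-aconitate').
reactionnode('trans-aconitate', ' ', 'isocitrate').
reactionnode('citrate', ' ', 'fumarate').
reactionnode('fumarate', ' ', 'citrate').
reactionnode('succinate', ' ', 'fumarate').
reactionnode('fumarate', ' ', 'succinate').
reactionnode('succinate', ' ', 'hippurate').
reactionnode('hippurate', ' ', 'succinate').
reactionnode('citrate', ' ', 'taurine').
reactionnode('taurine', ' ', 'citrate').
"""

PATTERN = re.compile(r"reactionnode\('([^']*)',\s*'([^']*)',\s*'([^']*)'\)\.")


def parse_reactions(text):
    """Return (x, enzyme, y) tuples for every reactionnode/3 fact in text."""
    return [m.groups() for m in PATTERN.finditer(text)]


reactions = parse_reactions(FACTS)
arcs = {(x, y) for x, _, y in reactions}
# A reaction is listed reversibly if the opposite arc is also present.
one_way = sorted((x, y) for x, y in arcs if (y, x) not in arcs)
metabolites = {m for x, _, y in reactions for m in (x, y)}
print(len(reactions), len(metabolites), one_way)  # 18 9 []
```

The check shows that the 18 facts describe 9 metabolites linked by 9 reversible reactions.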

Observations 8 hours after injection
concentration('citrate', up, 8).
concentration('2-oxo-glutarate', down, 8).
concentration('succinate', up, 8).
concentration('l-2-aminoadipate', up, 8).
concentration('creatine', down, 8).
concentration('creatinine', down, 8).
concentration('hippurate', up, 8).
concentration('beta-alanine', down, 8).
concentration('lactate', up, 8).
concentration('methylamine', up, 8).
concentration('trans-aconitate', down, 8).
concentration('formate', down, 8).
concentration('taurine', up, 8).
concentration('acetate', down, 8).
concentration('nmna', down, 8).
concentration('nmnd', up, 8).
concentration('tmao', down, 8).
concentration('fumarate', up, 8).
concentration('l-as', up, 8).
concentration('glucose', down, 8).

Discovered abducibles
noninhibited(' ', succinate, hippurate, 8).
noninhibited(' ', fumarate, nmnd, 8).
noninhibited(' ', fumarate, nmna, 8).
noninhibited(' ', formaldehyde, formate, 8).
inhibited(' ', 'l-2-aminoadipate', '2-oxo-glutarate', 8).
inhibited(' ', isocitrate, 'trans-aconitate', 8).
inhibited(' ', fumarate, citrate, 8).
inhibited(' ', fumarate, succinate, 8).
inhibited(' ', taurine, citrate, 8).
inhibited(' ', 'l-as', arginine, 8).
inhibited(' ', ornithine, creatine, 8).
inhibited(' ', sarcosine, creatinine, 8).
inhibited(' ', methylamine, formaldehyde, 8).
inhibited(' ', tmao, formaldehyde, 8).
inhibited(' ', lactate, 'acryloyl-coA', 8).
inhibited(' ', 'beta-alanine', 'acryloyl-coA', 8).
inhibited(' ', glucose, pyruvate, 8).
inhibited(' ', acetate, acetylCoA, 8).

Introducing SLPs
ILP has great modeling power, but more can be done if probabilities are attached to the background knowledge rules. Attaching probabilities to rules is, in brief, the idea behind Stochastic Logic Programs (SLPs).

Reformulating the problem
For SLP modeling we started from the same logic program as in the ILP case, including the abducibles as part of the model. The difference is that probabilities were attached to the inhibited/4 and noninhibited/4 predicates as well as to the concentration/3 rules. The SLP system then has to find the probability values that maximize the likelihood of the observed concentrations.
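
One way to read the attached probabilities, as in SLPs generally, is that a derivation's probability is the product of the labels of the clauses it uses. A goal's probability then sums this product over all successful derivations. The Python sketch below assumes a hypothetical propositionalised version of a chain a → b → c with made-up labels. Each reaction's inhibited/noninhibited pair sums to 1, as do the two clauses for each concentration atom. This is not the authors' encoding. It computes unnormalised derivation mass rather than the normalised distribution a full SLP system would use.

```python
def goal_probability(atom, program, fact_labels):
    """Sum, over the derivations of atom, of the product of the labels used.

    Assumes an acyclic, propositional program, so the derivations of the atoms
    in a clause body combine independently. Atoms with no fact label and no
    clause have no derivation and get probability 0.
    """
    if atom in fact_labels:
        return fact_labels[atom]
    total = 0.0
    for label, body in program.get(atom, []):
        p = label
        for sub_goal in body:
            p *= goal_probability(sub_goal, program, fact_labels)
        total += p
    return total


# Hypothetical labels for the abducible facts of two reactions a -> b and b -> c.
fact_labels = {
    "inhibited_ab": 0.8, "noninhibited_ab": 0.2,
    "inhibited_bc": 0.1, "noninhibited_bc": 0.9,
}
# Hypothetical labelled clauses: (label, body) pairs for each concentration atom.
program = {
    "down_b": [(0.5, ["inhibited_ab"]), (0.5, ["noninhibited_ab", "down_a"])],
    "down_c": [(0.5, ["inhibited_bc"]), (0.5, ["noninhibited_bc", "down_b"])],
}
print(round(goal_probability("down_c", program, fact_labels), 4))  # 0.23
```

Here P(down_b) = 0.5 · 0.8 = 0.4, because down_a has no derivation. P(down_c) is then 0.5 · 0.1 + 0.5 · 0.9 · 0.4 = 0.23. Parameter learning would adjust these labels so that the observed concentrations become as likely as possible.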

Significance of this approach
In real-world problems the status of an entity is rarely just two-fold (on/off). Most problems are better modeled if the modeling system allows for a certain degree of fuzziness. Also, the previously discovered rules may not all be equally important, so we leave it to the SLP system to determine their best relative weighting.

Novelty of our work
We divided SLP modeling into two variants, Categorical SLP (CSLP) and Probabilistic SLP (PSLP). The only difference between them is that PSLP learns from probabilistic examples (a confidence of being up or down) rather than categorical ones (strictly up or down). The problem then becomes how to derive these confidences from the dataset.
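
The difference between CSLP and PSLP learning can be illustrated with a single hypothetical parameter θ, the probability that a metabolite is derived as down. Each example contributes c·log θ + (1 − c)·log(1 − θ), where c is its confidence of being down. For categorical examples c is 0 or 1; for probabilistic ones it can be anywhere in [0, 1]. This Python sketch uses made-up confidences and maximizes that likelihood by grid search. Real SLP parameter estimation is more involved. In this one-parameter case the optimum is simply the mean confidence.

```python
import math


def log_likelihood(theta, confidences):
    """Weighted Bernoulli log-likelihood; c is the confidence an example is 'down'."""
    return sum(c * math.log(theta) + (1 - c) * math.log(1 - theta) for c in confidences)


def fit_theta(confidences, steps=999):
    """Grid-search the label theta in (0, 1) that maximizes the log-likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, confidences))


categorical = [1, 1, 1, 0]             # CSLP-style: each example fully down (1) or up (0)
probabilistic = [0.95, 0.9, 0.8, 0.4]  # PSLP-style: confidence of being down
print(fit_theta(categorical), fit_theta(probabilistic))
```

Categorical examples give θ = 0.75. The probabilistic examples land near their mean confidence of 0.7625. They also keep the information that the fourth example was only weakly "up", which rounding to 0/1 throws away.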

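Extracting Probabilistic Examples from Scientific Data
Pnorm(1.72, 0, 1) = 0.9573, i.e. the standard normal cumulative distribution function (mean 0, standard deviation 1) evaluated at 1.72.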
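
The value above is the standard normal CDF, which can be computed from the error function. The `prob_up` helper is a hypothetical illustration of how a case-vs-control difference might be turned into a confidence of being up: it computes a Welch-style z-score and passes it through the CDF. The slides do not say how the 1.72 was obtained, so treat that part as an assumption. Under this reading, the confidence of being down would be 1 − P(up).

```python
import math


def pnorm(q, mean=0.0, sd=1.0):
    """Normal CDF, equivalent to R's pnorm(q, mean, sd), via the error function."""
    return 0.5 * (1.0 + math.erf((q - mean) / (sd * math.sqrt(2.0))))


def prob_up(case, control):
    """Hypothetical: Welch-style z-score of case vs control means, mapped to P(up)."""
    m1, m0 = sum(case) / len(case), sum(control) / len(control)
    v1 = sum((x - m1) ** 2 for x in case) / (len(case) - 1)      # sample variances
    v0 = sum((x - m0) ** 2 for x in control) / (len(control) - 1)
    z = (m1 - m0) / math.sqrt(v1 / len(case) + v0 / len(control))
    return pnorm(z)


print(round(pnorm(1.72, 0, 1), 4))  # 0.9573
```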

Prediction accuracy of CSLP vs PSLP
Metabolite       Concentration  Empirical prob.  CSLP    PSLP
citrate          down           0.9843           0.69    0.686
2-og             down           1.000            0.568   0.69
succinate        down           0.9368           0.259   0.297
l-2-aa           up             0.9962           0.658   0.828
creatine         down           0.5052           0.307   0.443
creatinine       down           0.5798           0.322   0.493
hippurate        down           0.7136           0.303   0.166
beta-alanine     up             0.9659           0.567   0.686
lactate          up             0.9503           0.54    0.516
methylamine      up             1.000            0.301   0.525
trans-aconitate  down           0.6488           0.392   0.441
formate          down           0.9368           0.392   0.423
taurine          up             0.7362           0.65    0.81
acetate          up             0.6727           0.556   0.539
nmna             up             0.5239           0.489   0.492
nmnd             up             0.6414           0.489   0.499
tmao             up             0.5166           0.31    0.112
fumarate         up             0.697            0.297   0.502
l-as             up             0.6748           0.504   0.507
glucose          up             0.8096           0.557   0.531
Average accuracy                                 68.34%  72.75%
P-value: 4.12%
Default accuracy and ILP accuracy = 60%

Abductive SLP model

Learned metabolic network with Probabilistic SLP

Conclusions and Discussion
Problems of this kind are almost impossible to model without a rich machine learning framework such as logic programming. SLPs produce a richer description of the underlying biological reality than ILP. Additionally, learning SLPs from probabilistic examples leads to a statistically significant improvement in predictive accuracy (72.75% for PSLP vs 68.34% for CSLP, p-value 4.12%).