Automated Exploration of Bioinformatics Spaces
Simon Colton, Computational Bioinformatics Laboratory


Purpose of the Talk
• To make you aware of another tool which may have some potential for use in the Metalog project
• To get feedback on this potential
• To briefly describe two other projects

The Substructure Server
• Old-style approach to using machine learning (ML) for predictive toxicology
  – What do the positives have in common that the negatives do not?
  – For chemicals, possibly using ILP is like using a sledgehammer to crack a nut
• Substructures are often the answer (e.g., mutagenesis)
  – Substructure server looks explicitly for substructures
• Vehicle for me to understand ML in predictive toxicology and server-client technology
  – May even be of some use one day
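
The question "what do the positives have in common that the negatives do not?" can be sketched as a FIND-S style routine. This is a minimal illustration in Python, not the server's Prolog code: it assumes each chemical is already described as a set of substructure labels, and the labels `frag_a`, `frag_b`, `frag_c` are hypothetical.

```python
def find_s(positives):
    """Most specific hypothesis: the substructures shared by every positive."""
    hypothesis = None
    for example in positives:
        fragments = set(example)
        hypothesis = fragments if hypothesis is None else hypothesis & fragments
    return hypothesis if hypothesis is not None else set()


def covers(hypothesis, example):
    """A hypothesis covers an example if the example contains all its substructures."""
    return hypothesis <= set(example)


# Hypothetical data: each chemical is a set of substructure labels.
positives = [{"frag_a", "frag_b", "frag_c"}, {"frag_a", "frag_b"}]
negatives = [{"frag_b", "frag_c"}]

hypothesis = find_s(positives)
# The hypothesis is only useful if it covers no negative example.
consistent = not any(covers(hypothesis, n) for n in negatives)
```

An empty hypothesis covers every example, so a real routine would need to treat that case as "nothing in common" rather than as a finding.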

Substructure Server Development
• Team
  – Simon Colton
    • Prolog machine learning routine (FIND-S)
  – Saravanan Anandathiyagar
    • Server technology
  – Laurence Darby
    • Distributing the process over our Linux farm
      – Gives roughly a 5 times speed-up
  – A.N. Other master's student (TBA)
    • Front end (Babel)
    • Back end (Molgen, etc.)

Old-Style Predictive Toxicology
• Reason 1:
  – Using only chemistry, attributes etc.
    • Not using biochemical pathways
• Reason 2:
  – Using predictive machine learning
    • Not using descriptive machine learning

Predictive Induction in Bioinformatics
• Interesting problem found
  – Interesting from a biochemistry perspective
  – Interesting from a computer science perspective
• Packaged as prediction/classification
  – Turned into positives and negatives
  – Much work done to shoe-horn into a prediction task
• Reason(s) learned why positives are positive
  – Almost guaranteed that any answer found will be interesting, because the problem is interesting

Generating Hypotheses
• Predictive machine learning produces hypotheses of the form:
  – A → Toxic
  – Toxic → C
  – B → Toxic
  – D → ¬Toxic
  – etc.
• With any luck, A, B or C will be interesting in their own right
  – And enter the biochemistry literature!

But what if…
• There was an interesting relationship
  – Between a concept and a subset of the positives. Isn’t this interesting?
• Examples:
  – A → Toxic & B
  – C → ¬Toxic & D & E

Predictive versus Descriptive Learning
• Predictive learning
  – You know what you are looking for
  – You just don’t know what it looks like
• Descriptive learning
  – You don’t know what you are looking for
  – But you want to find something interesting
• Eventually:
  – You don’t even know you are looking for something

Descriptive Induction
• Not as goal directed as predictive induction
• Same background information given
  – Perhaps no categorisation into pos & neg
• A theory is produced which contains:
  – Examples
  – Concepts which categorise/describe sets of examples
  – Hypotheses which relate concepts
  – Explanations which explain the hypotheses
• For instance:
  – Acid + Base → Salt + Water
• Tools are supplied so that
  – The user can extract interesting parts of the theory

The HR System in 3 Slides
• Concept formation
  – Starts with background info like Progol
  – Builds new concepts from old ones
    • Using one of 15 production rules
    • (composition, instantiation, counting, matching, etc.)
    • Unary or binary
    • Many settings for how concept formation occurs
  – Derives examples & definition of concepts
• Heuristic search (if user specifies)
  – Uses a best first search
    • 20+ measures of interestingness for concepts/conjectures
    • Chooses to build new concepts from best old ones
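
To make the idea of production rules concrete, here is a much-simplified Python sketch of three of the rules named above, applied to a divisibility concept. It is an illustration under assumptions, not HR's implementation: concepts are represented as plain sets, composition is reduced to conjunction, and all function names are invented for this example.

```python
def divisors_relation(limit):
    """Background concept: pairs (n, d) where d divides n, for 1 <= n <= limit."""
    return {(n, d) for n in range(1, limit + 1)
            for d in range(1, n + 1) if n % d == 0}


def counting(relation):
    """Counting rule: map each n to the number of pairs (n, _) in the relation."""
    counts = {}
    for n, _ in relation:
        counts[n] = counts.get(n, 0) + 1
    return counts


def instantiation(relation, value):
    """Instantiation rule: fix the second argument, leaving a set of n."""
    return {n for n, d in relation if d == value}


def composition(examples_a, examples_b):
    """Composition rule, simplified to conjunction: objects in both concepts."""
    return examples_a & examples_b


divides = divisors_relation(30)
tau = counting(divides)                    # number of divisors of each n
even = instantiation(divides, 2)           # integers divisible by 2
multiples_of_3 = instantiation(divides, 3)
multiples_of_6 = composition(even, multiples_of_3)
```

Each rule takes one or two old concepts and returns a new one together with its examples, which is what allows patterns between concepts to be noticed empirically afterwards.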

The HR System in 3 Slides
• Conjecture Making
  – “Proper” induction!
  – Notices patterns in examples for concepts
    • Newly formed concept has no examples
      – Makes a non-existence conjecture
    • Two concepts have exactly the same examples
      – Makes an equivalence conjecture
    • One concept’s examples are subset of another
      – Makes an implication conjecture
  – Extracts simpler hypotheses from empirical ones
  – Able to make “near-conjectures”
    • Patterns don’t have to be exact
    • User specifies a tolerance level
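
The three empirical patterns and the tolerance-based near-conjectures can be sketched as set comparisons. The Python below is a minimal illustration under assumptions: concepts are given as named example sets, the tolerance is taken to be the allowed fraction of exceptions, and `make_conjectures` and the concept names are hypothetical rather than HR's own.

```python
from itertools import combinations
from math import isqrt


def make_conjectures(concepts, tolerance=0.0):
    """Compare example sets pairwise and report the patterns found."""
    conjectures = []
    nonempty = {}
    for name, examples in concepts.items():
        examples = set(examples)
        if examples:
            nonempty[name] = examples
        else:
            conjectures.append(("non-existence", name, None))
    for (na, a), (nb, b) in combinations(nonempty.items(), 2):
        if a == b:
            conjectures.append(("equivalence", na, nb))
            continue
        # Check both directions: does x imply y, exactly or nearly?
        for (x, xs), (y, ys) in (((na, a), (nb, b)), ((nb, b), (na, a))):
            exceptions = len(xs - ys)
            if exceptions == 0:
                conjectures.append(("implication", x, y))
            elif exceptions / len(xs) <= tolerance:
                conjectures.append(("near-implication", x, y))
    return conjectures


def divisor_count(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)


numbers = range(1, 31)
concepts = {
    "square": {n for n in numbers if isqrt(n) ** 2 == n},
    "odd": {n for n in numbers if n % 2 == 1},
    "prime": {n for n in numbers if divisor_count(n) == 2},
    "odd_square": {n for n in numbers if n % 2 == 1 and isqrt(n) ** 2 == n},
    "odd_divisor_count": {n for n in numbers if divisor_count(n) % 2 == 1},
    "even_prime_over_2": {n for n in numbers
                          if n > 2 and n % 2 == 0 and divisor_count(n) == 2},
}
found = make_conjectures(concepts, tolerance=0.15)
```

With a tolerance of 0.15 the sketch reports "prime implies odd" as a near-conjecture, because 2 is the only exception among the ten primes up to 30; with a tolerance of 0 that pattern is not reported.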

The HR System in 3 Slides
• Generating explanations
  – User supplies a set of axioms
  – HR appeals to a third party theorem prover
    • And a third party model generator (Otter/MACE)
  – To attempt to prove/disprove
    • That the hypothesis follows from the axioms
• Sometimes, explanations are interesting
  – In domains such as group theory
    • Explanations are proofs of theorems
• Sometimes, explanations show that a hypothesis is dull
  – Anything provable by the theorem prover is trivial

Extreme(!) Theory Formation
• All my best examples are from maths
• Given only one concept:
  – How to divide two integers
• HR finds the conjecture
  – Odd refactorable numbers are squares
• Invented concepts:
  – Odd, square, refactorable, (even, tau, …)
• Made concept of odd refactorables
  – Noticed the examples are a subset of the examples for square numbers
• No proof supplied (I proved this one)
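
The conjecture can be checked empirically in a few lines of Python. The sketch assumes the usual definitions: tau(n) is the number of divisors of n, and n is refactorable when tau(n) divides n. The search limit of 2000 is arbitrary, and the check is evidence, not a proof.

```python
from math import isqrt


def tau(n):
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def is_refactorable(n):
    """n is refactorable if its number of divisors divides n."""
    return n % tau(n) == 0


def is_square(n):
    return isqrt(n) ** 2 == n


LIMIT = 2000  # arbitrary bound for the empirical check
odd_refactorables = [n for n in range(1, LIMIT + 1)
                     if n % 2 == 1 and is_refactorable(n)]
counterexamples = [n for n in odd_refactorables if not is_square(n)]
```

One way to see why no counterexample appears: if n is odd and tau(n) divides n, then tau(n) must be odd, and an integer has an odd number of divisors exactly when it is a square.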

What HR Can Deliver
• HR generates hypotheses like Progol
  – But there are too many
  – Require filters to prune dull ones
• Some concepts might be interesting aside from their relation to toxicity
• HR points out interesting examples
  – E.g., a molecule has the only occurrence of a particular sub-molecule
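
The "only occurrence" kind of interesting example can be sketched as a frequency count. This Python fragment is an illustration only, not one of HR's measures of interestingness: it assumes each molecule is described by a set of sub-molecule labels, and the names `m1`, `m2`, `m3` and `sub_a`, `sub_b`, `sub_c` are hypothetical.

```python
from collections import Counter


def unique_occurrences(objects):
    """For each object, list the features that no other object has."""
    counts = Counter(f for feats in objects.values() for f in set(feats))
    return {name: sorted(f for f in set(feats) if counts[f] == 1)
            for name, feats in objects.items()}


# Hypothetical molecules described by the sub-molecules they contain.
molecules = {
    "m1": {"sub_a", "sub_b"},
    "m2": {"sub_b", "sub_c"},
    "m3": {"sub_b"},
}
unique = unique_occurrences(molecules)
```

Here `m1` and `m2` each contain a sub-molecule found nowhere else, while `m3` has nothing unique, so only the first two would be flagged.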

Interesting New Angle
• Anomaly detection
• First experiments in analysis of Bach chorale melodies
  – Which ones were different to the rest
    • Not necessarily breaking rules
    • Could be: something occurring more often
  – “Parsimony outlier” measure of interestingness
• Hope to try this with metabolic pathways
  – Give me 30 pathways
    • I’ll give you reasons why each is unique
  – Give me an invented pathway
    • I’ll show you possible reasons it’s wrong…

What I need
• Objects of interest
  – Pathways
• Background concepts
  – Ways to describe the pathways
• Axioms
  – What we know is true about pathways
• Measures of interestingness
  – Essential to separate the wheat from chaff
  – Evolve over time as we use HR together

Future for my Work
• Form theories about biochemical data
• Domain of interest
  – Pathways
• Technical problems
  – Enabling HR to work with probabilistic information (not yet possible)
  – Enabling HR to work with larger datasets
  – Understanding pathways!

The Amaze Database
• Bioinformatics MSc. Project
  – Organised by Marek Sergot
• Challenge
  – To resurrect the Amaze database
    • Of biochemical pathways
  – EBI originally, now Université libre de Bruxelles
  – To get hold of data, put into a database, put a front-end onto this, etc.
  – And write translation routines
    • So that we can get at the information
• This is a resource we should use
  – Please let me know your requirements