Using data sets to simulate evolution within complex environments
Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University

Main Issue
Does the complexity of the environment significantly affect evolutionary processes?
Here “complexity” means that there are exploitable patterns in the environment, but these are difficult to discover.
Adding randomness to an environment and/or to fitness is not satisfactory.
The NK model of fitness adjusts the difficulty of a fitness space (second-order uniformity).
Using data sets to simulate evolution, Bruce Edmonds, Complexity of Evolutionary Processes, Manchester, June 13th 2011, slide 2
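For readers unfamiliar with it, the NK model (Kauffman) assigns fitness to a string of N binary loci, where each locus contributes a value that depends on its own state and on the states of K other loci; raising K makes the landscape more rugged and so harder to search. The sketch below is a minimal Python illustration under stated assumptions: neighbours are chosen at random and contributions are drawn uniformly from [0, 1). It is one common variant, not the specific setup referred to in the talk, and the function name `make_nk_landscape` is hypothetical.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Build a random NK fitness function over bit strings of length n.

    Each locus i interacts with k other loci (chosen at random here, an
    assumption); its contribution is looked up in a random table indexed by
    the states of locus i and its k neighbours.
    """
    rng = random.Random(seed)
    neighbours = []
    tables = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        neighbours.append([i] + rng.sample(others, k))
        # One random contribution in [0, 1) for every combination of k+1 bits.
        tables.append({bits: rng.random()
                       for bits in itertools.product((0, 1), repeat=k + 1)})

    def fitness(genome):
        assert len(genome) == n
        total = 0.0
        for i in range(n):
            key = tuple(genome[j] for j in neighbours[i])
            total += tables[i][key]
        return total / n  # mean contribution, so fitness lies in [0, 1)

    return fitness


if __name__ == "__main__":
    f = make_nk_landscape(n=10, k=2, seed=1)
    print(f([0] * 10), f([1] * 10))
```

With `k=0` the landscape is smooth and additive, while `k` close to `n - 1` gives an almost uncorrelated one; in both cases the difficulty is set by a single parameter applied across the whole space.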

Idea of Talk
Evolutionary data mining applies ideas from biological evolution to data mining, i.e. finding patterns in data.
Data sets exist for the purpose of testing different machine-learning (ML) algorithms; they have patterns in them, albeit ones that are difficult to discover.
Reversing this, I am suggesting the use of complex data sets as a test bed to investigate how the complexity of the environment might affect evolution.

The Data Set Environment
Find a rich data set (preferably one derived from a naturally complex system) with many independent variables.
The gene of an individual is an arbitrary arithmetic expression stored as a tree (or by a similar technique).
Resource in the model is distributed to individuals that predict the outcome variable of local data better than their competitors.
The genes are mutated and crossed as the simulation progresses.
Individuals are selected for or against depending on their total success in predicting.
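The genes described on this slide can be sketched as expression trees in the usual genetic-programming style. The following Python sketch is an illustration under assumptions, not the talk's implementation: the operator set (`+`, `-`, `*` and a protected `/`), the constant range, the tuple representation and all function names are hypothetical. It shows evaluation against one data row, tree depth, and subtree mutation and crossover.

```python
import operator
import random


# Hypothetical operator set: the slides only say "arithmetic expression".
def protected_div(a, b):
    """Division that returns 1.0 for a near-zero divisor (a common GP convention)."""
    return a / b if abs(b) > 1e-9 else 1.0


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": protected_div}

# A gene is a leaf (variable name or numeric constant) or a tuple (op, left, right).


def evaluate(tree, row):
    """Evaluate an expression tree on one data row (a dict of variable values)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]  # variable, e.g. "chol"
    return tree           # numeric constant


def depth(tree):
    """Tree depth (a leaf has depth 1), one simple measure of gene complexity."""
    if isinstance(tree, tuple):
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 1


def random_tree(variables, max_depth, rng):
    """Grow a random expression no deeper than max_depth."""
    if max_depth <= 1 or rng.random() < 0.3:
        return rng.choice(variables) if rng.random() < 0.7 else float(rng.randint(0, 3))
    return (rng.choice(list(OPS)),
            random_tree(variables, max_depth - 1, rng),
            random_tree(variables, max_depth - 1, rng))


def subtrees(tree):
    """Every node of the tree, root first."""
    nodes = [tree]
    if isinstance(tree, tuple):
        nodes += subtrees(tree[1]) + subtrees(tree[2])
    return nodes


def replace_random_subtree(tree, make_new, rng):
    """Walk down from the root, stop at a random node, and swap in make_new()."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return make_new()
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, replace_random_subtree(left, make_new, rng), right)
    return (op, left, replace_random_subtree(right, make_new, rng))


def mutate(tree, variables, rng):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    return replace_random_subtree(tree, lambda: random_tree(variables, 3, rng), rng)


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    donor = rng.choice(subtrees(b))
    return replace_random_subtree(a, lambda: donor, rng)


gene = ("+", "ca", ("/", "fbs", "oldpeak"))  # hypothetical gene
row = {"ca": 1.0, "fbs": 0.0, "oldpeak": 2.3}
print(evaluate(gene, row), depth(gene))      # 1.0 3
```

Immutable tuples keep the sketch simple: mutation and crossover build new trees, so a parent's gene is never altered when a child is produced.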

Cleveland Heart Disease Data
281 data points
13 diagnostic variables: age, sex, cp (chest pain type), trestbps (resting blood pressure), chol (cholesterol), fbs (fasting blood sugar), restecg (resting ECG type), thalach (maximum heart rate), exang (exercise-induced angina), oldpeak (ST depression induced by exercise), slope (slope of the peak exercise ST segment), ca (number of major blood vessels), thal
Predicts severity of heart disease (0-4)

The Evolutionary Model I
Figure: Data Space
Individuals each have a gene composed of an arithmetic expression that predicts HD (heart disease) from the other 13 variables.
For each data point (or a random subset of them), evaluate (a random selection of) nearby individuals to determine the share of fitness each receives, depending on predictive success.
The sum of fitness determines which individuals breed and which die.
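The resource-sharing step can be sketched as follows. This is a minimal illustration, assuming individuals and data points share one space, that "near" means within a fixed radius, and that each data point hands out one unit of resource in proportion to 1 / (1 + absolute error). The slide does not specify the sharing rule, so that formula, the class names and `distribute_resource` are all hypothetical. For brevity every data point is used, rather than a random subset.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Individual:
    x: float
    y: float
    predict: Callable[[dict], float]  # e.g. evaluation of the individual's expression gene
    fitness: float = 0.0


@dataclass
class DataPoint:
    x: float
    y: float
    row: dict       # explanatory variables
    outcome: float  # the value to be predicted (e.g. HD severity)


def distribute_resource(points, individuals, radius, sample_size, rng):
    """Share one unit of resource per data point among nearby individuals."""
    for ind in individuals:
        ind.fitness = 0.0
    for p in points:
        near = [ind for ind in individuals
                if math.hypot(ind.x - p.x, ind.y - p.y) <= radius]
        if not near:
            continue  # nobody is close enough to claim this point
        # Evaluate only a random selection of the nearby individuals.
        judged = rng.sample(near, min(sample_size, len(near)))
        # Hypothetical sharing rule: weight falls off with absolute error.
        weights = [1.0 / (1.0 + abs(ind.predict(p.row) - p.outcome)) for ind in judged]
        total = sum(weights)
        for ind, w in zip(judged, weights):
            ind.fitness += w / total
```

Because each point distributes a fixed amount, individuals in crowded regions compete for the same resource, while an accurate predictor alone in a sparse region can still do well.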

The Evolutionary Model II
Figure: Data Space
N times:
1. Probabilistically select a winner on fitness.
2. Probabilistically select a loser on lack of fitness.
3. Kill the loser.
Then either:
4. propagate the winner locally, with possible mutation, or
5. mate the winner with another local individual chosen on fitness.
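The N selection events per tick could look like the Python sketch below. It is a hedged reading of the slide, not the author's code: roulette-wheel selection via `random.Random.choices`, the dictionary layout of individuals, offspring placed uniformly within `radius` of the winner, and the parameters `p_mate` and `p_mutate` are all assumptions. Gene operators are passed in as functions so that any gene representation can be used.

```python
import math
import random


def birth_death_step(pop, rng, mutate, crossover, radius, p_mate=0.5, p_mutate=0.1):
    """One of the N selection events per tick.

    pop is a list of dicts with keys "x", "y", "gene", "fitness" (a hypothetical
    layout); mutate(gene, rng) and crossover(gene_a, gene_b, rng) return new genes.
    """
    eps = 1e-9  # keeps every weight positive so random.choices never sees all zeros
    n = len(pop)
    fits = [ind["fitness"] for ind in pop]
    # 1. Probabilistically select a winner on fitness (roulette wheel).
    w = rng.choices(range(n), weights=[f + eps for f in fits])[0]
    # 2. Probabilistically select a loser on lack of fitness, excluding the winner.
    top = max(fits)
    others = [i for i in range(n) if i != w]
    lose = rng.choices(others, weights=[top - fits[i] + eps for i in others])[0]
    winner = pop[w]
    # Local partners for option 5: within `radius` of the winner, not the loser.
    mates = [pop[i] for i in others if i != lose and
             math.hypot(pop[i]["x"] - winner["x"], pop[i]["y"] - winner["y"]) <= radius]
    if mates and rng.random() < p_mate:
        # 5. Mate with another local individual chosen on fitness.
        mate = rng.choices(mates, weights=[m["fitness"] + eps for m in mates])[0]
        gene = crossover(winner["gene"], mate["gene"], rng)
    else:
        # 4. Propagate the winner locally, with possible mutation.
        gene = winner["gene"]
        if rng.random() < p_mutate:
            gene = mutate(gene, rng)
    child = {"x": winner["x"] + rng.uniform(-radius, radius),
             "y": winner["y"] + rng.uniform(-radius, radius),
             "gene": gene, "fitness": 0.0}
    # 3. Kill the loser: the child takes its slot, so the population size is fixed.
    pop[lose] = child


def run_tick(pop, n_events, rng, **kwargs):
    """Repeat the birth-death step N times."""
    for _ in range(n_events):
        birth_death_step(pop, rng, **kwargs)
```

Offspring positions are not clipped or wrapped here; a full model would need to decide what happens at the edges of the space.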

Start of Simulation (HD Data)
Individuals each have a gene, which is an arithmetic expression (an example is shown in the figure).
Data points from the set are distributed over the space according to two variables: chol (x) and thalach (y).
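Placing data points by two of their variables might be done as below. This Python sketch assumes both variables are min-max scaled into the unit square (which would make a locality radius such as 0.1 meaningful) and that initial individuals are scattered uniformly at random; neither detail is stated on the slide, and the function names are hypothetical.

```python
import random


def place_points(rows, x_var="chol", y_var="thalach"):
    """Positions in the unit square from two variables, each min-max scaled."""
    def scaler(var):
        values = [r[var] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # guard against a constant column
        return lambda v: (v - lo) / span
    sx, sy = scaler(x_var), scaler(y_var)
    return [(sx(r[x_var]), sy(r[y_var])) for r in rows]


def scatter_individuals(n, rng):
    """Initial individuals placed uniformly at random over the same unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


rows = [{"chol": 200, "thalach": 150}, {"chol": 300, "thalach": 100},
        {"chol": 250, "thalach": 200}]  # made-up example values
print(place_points(rows))  # [(0.0, 0.5), (1.0, 0.0), (0.5, 1.0)]
```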

After 25 ticks (HD Data)

After 50 ticks (HD Data)

After 75 ticks (HD Data)

After 300 ticks, then 100 ticks without variation
Figure labels: ca; 1; sex; slope; restecg+fbs+1fbs/oldpeak

Illustrative Results
Heart Disease data set
20 runs with each setting
1000 individuals, 1000 iterations
Locality parameter 0.1 (radius)
Comparison of original vs ersatz data sets
Fixed normal noise N(0, 0.1) added to both data sets
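The settings above can be collected in one place, together with a sketch of how the fixed noise might be applied. This assumes that 0.1 is the standard deviation of the noise, that "fixed" means the noise is drawn once when the data set is prepared rather than at every evaluation, and that the caller chooses which variables receive noise. All of these, and the names `SETTINGS` and `add_fixed_noise`, are hypothetical.

```python
import random

# Values reported on the slide; the dictionary layout itself is hypothetical.
SETTINGS = {
    "runs_per_setting": 20,
    "individuals": 1000,
    "iterations": 1000,
    "locality_radius": 0.1,
    "noise_mean": 0.0,
    "noise_sd": 0.1,  # assumption: 0.1 is a standard deviation, not a variance
}


def add_fixed_noise(rows, variables, mean, sd, seed):
    """Return a copy of rows with Gaussian noise added once to each listed variable.

    Assumption: 'fixed' noise is drawn once here, not re-drawn at every evaluation.
    """
    rng = random.Random(seed)
    noisy = []
    for row in rows:
        new = dict(row)  # leave the original data untouched
        for v in variables:
            new[v] = row[v] + rng.gauss(mean, sd)
        noisy.append(new)
    return noisy
```

Using the same seed for the original and the ersatz data set is one way to keep the added noise comparable between the two conditions.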

Ersatz Data Set
A comparison data set, built as follows:
For each variable separately, approximate its values by a normal distribution.
Then reconstruct a data set by sampling each value independently from these distributions.
This results in a data set with a similar shape and randomness, but in which the predicted variable is not linked to the explanatory variables.
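The ersatz construction translates fairly directly into code. The Python sketch below fits a mean and (population) standard deviation to each variable and then samples every value independently from the fitted normal distributions. The function name and the use of `statistics.pstdev` are assumptions, and no rounding to valid categorical values is attempted.

```python
import random
import statistics


def make_ersatz(rows, seed=0):
    """Build an 'ersatz' copy of a data set given as a list of dicts.

    For each variable separately, fit a normal distribution (mean and standard
    deviation of its values), then sample every value of every row independently.
    """
    rng = random.Random(seed)
    variables = list(rows[0])
    params = {v: (statistics.mean(r[v] for r in rows),
                  statistics.pstdev([r[v] for r in rows])) for v in variables}
    # Same number of rows as the original; each value drawn on its own.
    return [{v: rng.gauss(*params[v]) for v in variables} for _ in rows]
```

Because each value is sampled independently, the marginal distributions are only approximately preserved, while correlations between variables, including the one between outcome and predictors, disappear.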

Fitness (HD Data)

Spread (HD Data)

Gene Complexity/Depth (HD Data)

All Runs’ Complexity
Panels: Original; Ersatz

Fitness (White Wine Data)

Depth (White Wine Data)

Depth (White Wine Data)
Panels: Original; Original with 0.1 noise

Depth – locality 0.1 (HD Data)

Depth – locality 0.2 (HD Data)

Depth – locality 0.4 (HD Data)

Concluding Questions
When might the complexity of the environment affect evolutionary processes?
How might the complexity of the environment affect evolutionary processes?
Will models with a simple environment tell us about evolution in the wild?
– When, and about what aspects, will models with simple environments be sufficient?
– In what ways might evolution differ in complex environments?
What kind of complexity might we need?
How might one measure this complexity in the wild (if this is even possible)?

The End
Bruce Edmonds, Centre for Policy Modelling

White Wine Quality Data
4898 data points
11 diagnostic variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol
Predicts judged quality of wine (0-10)