Symbolic Regression via Genetic Programming AI Project #2 Biointelligence lab Cho, Dong-Yeon


1 Symbolic Regression via Genetic Programming AI Project #2 Biointelligence lab Cho, Dong-Yeon (dycho@bi.snu.ac.kr)

2 Example (1/2)
© 2005 SNU CSE Biointelligence Lab
Data
- Relationship between A and P

A       P
0.39    0.24
0.72    0.61
1.00    1.00
1.52    1.84
5.20    11.9
9.53    29.4
19.1    83.5

3 Example (2/2)
Kepler's Third Law
- The square of any planet's (sidereal) orbital period P is proportional to the cube of its mean distance A (semi-major axis) from the Sun.

Planet     A (AU)    P (years)
Mercury    0.39      0.24
Venus      0.72      0.61
Earth      1.00      1.00
Mars       1.52      1.84
Jupiter    5.20      11.9
Saturn     9.53      29.4
Uranus     19.1      83.5
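The proportionality can be checked directly on the table. The short Python sketch below computes P^2 / A^3 for each planet; if Kepler's Third Law holds, the ratio should be roughly the same constant c for every row. Earth is taken as A = P = 1.00, and the names `DATA` and `kepler_ratio` are illustrative choices, not part of the slides.

```python
# Planet data from the table: name -> (A, P).
DATA = {
    "Mercury": (0.39, 0.24),
    "Venus": (0.72, 0.61),
    "Earth": (1.00, 1.00),
    "Mars": (1.52, 1.84),
    "Jupiter": (5.20, 11.9),
    "Saturn": (9.53, 29.4),
    "Uranus": (19.1, 83.5),
}


def kepler_ratio(a, p):
    """Return P^2 / A^3, which Kepler's Third Law predicts to be a constant c."""
    return p ** 2 / a ** 3


# One ratio per planet; values close to each other support P^2 = c * A^3.
RATIOS = {name: kepler_ratio(a, p) for name, (a, p) in DATA.items()}

if __name__ == "__main__":
    for name, c in RATIOS.items():
        print(f"{name:8s} P^2/A^3 = {c:.3f}")
```

With these units every ratio comes out close to 1, which is the relationship the GP run is supposed to rediscover from the data alone.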

4 Evolving the Programs (1/2)
Koza's Algorithm
1. Choose a set of possible functions and terminals for the program: F = {+, -, *, /, √}, T = {A}.
2. Generate an initial population of random trees (programs) using the set of possible functions and terminals.
3. Calculate the fitness of each program in the population by running it on a set of "fitness cases" (a set of inputs for which the correct output is known).
4. Apply selection, crossover, and mutation to the population to form a new population.
5. Repeat steps 3 and 4 for some number of generations.
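The five steps can be sketched as a small Python program. This is a minimal illustration under several assumptions that are not in the slides: trees are nested tuples, division and square root are protected, selection is a size-3 tournament, the best individual is copied unchanged (elitism), children deeper than 6 are discarded, and all parameter values are arbitrary. The fitness cases are the (A, P) pairs from the Kepler example.

```python
import math
import random

FUNCS = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1}  # step 1: functions with arities
CASES = [(0.39, 0.24), (0.72, 0.61), (1.00, 1.00), (1.52, 1.84),
         (5.20, 11.9), (9.53, 29.4), (19.1, 83.5)]   # fitness cases (A, P)


def random_tree(max_depth, rng):
    """Grow a random tree; every leaf is the single terminal 'A'."""
    if max_depth == 0 or rng.random() < 0.3:
        return "A"
    op = rng.choice(list(FUNCS))
    return (op,) + tuple(random_tree(max_depth - 1, rng) for _ in range(FUNCS[op]))


def evaluate(tree, a):
    if tree == "A":
        return a
    op, args = tree[0], [evaluate(t, a) for t in tree[1:]]
    if op == "+": return args[0] + args[1]
    if op == "-": return args[0] - args[1]
    if op == "*": return args[0] * args[1]
    if op == "/": return args[0] / args[1] if abs(args[1]) > 1e-9 else 1.0  # protected
    return math.sqrt(abs(args[0]))                                          # protected


def error(tree):
    """Step 3: sum of squared errors over the fitness cases (lower is better)."""
    try:
        e = sum((evaluate(tree, a) - p) ** 2 for a, p in CASES)
    except OverflowError:
        return math.inf
    return math.inf if math.isnan(e) else e


def paths(tree, prefix=()):
    """Yield the position of every node as a tuple of child indices."""
    yield prefix
    if tree != "A":
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], new),) + tree[i + 1:]


def tree_depth(tree):
    return 0 if tree == "A" else 1 + max(tree_depth(t) for t in tree[1:])


def crossover(p1, p2, rng):
    donor = get(p2, rng.choice(list(paths(p2))))
    return put(p1, rng.choice(list(paths(p1))), donor)


def mutate(tree, rng):
    return put(tree, rng.choice(list(paths(tree))), random_tree(2, rng))


def tournament(pop, rng, k=3):
    return min(rng.sample(pop, k), key=error)


def evolve(pop_size=200, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(4, rng) for _ in range(pop_size)]      # step 2
    history = []
    for _ in range(generations):                              # step 5
        best = min(pop, key=error)                            # step 3
        history.append(error(best))
        new_pop = [best]                                      # elitism (an assumption)
        while len(new_pop) < pop_size:                        # step 4
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            if rng.random() < 0.1:
                child = mutate(child, rng)
            if tree_depth(child) <= 6:                        # crude size limit
                new_pop.append(child)
        pop = new_pop
    return min(pop, key=error), history
```

Calling `evolve()` returns the best tree found and the best error per generation. Because of the elitism assumption the error curve never rises; whether the run actually finds `sqrt(A * A * A)` depends on the seed and the parameters.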

5 Evolving Lisp Programs (2/2)
Kepler's Third Law: P^2 = cA^3

FORTRAN
    PROGRAM ORBITAL_PERIOD
    ! Mars
    A = 1.52
    P = SQRT(A * A * A)
    PRINT *, P
    END PROGRAM ORBITAL_PERIOD

LISP
    (defun orbital_period ()
      ; Mars
      (setf A 1.52)
      (sqrt (* A (* A A))))

[Figure: parse tree of the Lisp expression]
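The reason Lisp is convenient for GP is that the program text already is its parse tree. The following Python sketch mirrors that idea with nested tuples, assuming the representation `(operator, child, ...)`; the names `TREE`, `OPS`, `evaluate`, and `to_lisp` are illustrative and not taken from the slides.

```python
import math

# The Lisp expression (sqrt (* A (* A A))) as a nested tuple: (operator, child, ...).
TREE = ("sqrt", ("*", "A", ("*", "A", "A")))

# Only the operators needed for this one expression.
OPS = {"sqrt": math.sqrt, "*": lambda x, y: x * y}


def evaluate(node, env):
    """Evaluate a tree bottom-up; leaves are variable names looked up in env."""
    if isinstance(node, str):
        return env[node]
    op, *children = node
    return OPS[op](*(evaluate(c, env) for c in children))


def to_lisp(node):
    """Print the tree back in Lisp prefix notation."""
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_lisp(c) for c in node[1:]]) + ")"


if __name__ == "__main__":
    print(to_lisp(TREE))
    print(evaluate(TREE, {"A": 1.52}))  # orbital period for Mars
```

Crossover and mutation then reduce to replacing one nested tuple by another, which is why GP systems usually work on this tree form rather than on source text.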

6 Symbolic Regression by GP
Objective
- Find the function f for the given data (x, y).
Data sets
- Sets 1 and 2: 11 pairs
- Set 3: 50 pairs

7 Functions and Terminals
Functions
- Numerical operators: {+, -, *, /, exp, log, sin, cos, sqrt}
- Some operators must be protected against illegal operations (for example, division by zero).
Terminals
- Input and constants: {x, R}, where R ∈ [a, b]
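One possible way to protect the operators is sketched below in Python. The fallback values (1.0 for division by a near-zero denominator, 0.0 for the logarithm of a near-zero argument), the use of the absolute value, and the clamp on the exponent are common conventions assumed here for illustration; the slides do not prescribe them.

```python
import math
import random


def protected_div(x, y, eps=1e-9):
    """Division that returns 1.0 instead of failing when y is (almost) zero."""
    return x / y if abs(y) > eps else 1.0


def protected_log(x, eps=1e-9):
    """Logarithm of |x|; returns 0.0 when x is (almost) zero."""
    return math.log(abs(x)) if abs(x) > eps else 0.0


def protected_sqrt(x):
    """Square root of |x|, so negative arguments are legal."""
    return math.sqrt(abs(x))


def protected_exp(x, limit=50.0):
    """Exponential with the argument clamped to [-limit, limit] to avoid overflow."""
    return math.exp(max(-limit, min(limit, x)))


def random_constant(a, b, rng=random):
    """Draw the terminal R uniformly from [a, b]."""
    return rng.uniform(a, b)
```

Whatever convention is chosen, it should be stated in the report, because the fallback values become part of the evolved function.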

8 Initialization
A maximum initial tree depth D_max is set.
Full method (each branch has depth = D_max):
- nodes at depth d < D_max are randomly chosen from the function set F
- nodes at depth d = D_max are randomly chosen from the terminal set T
Grow method (each branch has depth ≤ D_max):
- nodes at depth d < D_max are randomly chosen from F ∪ T
- nodes at depth d = D_max are randomly chosen from T
Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population.
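The two methods differ only in what may be placed above depth D_max, which the Python sketch below makes explicit. It assumes trees are nested tuples, an illustrative function subset, constants drawn from [-1, 1], and a "ramp" that cycles the depth limit through 2..D_max (a common convention that the slide does not spell out).

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1}  # name -> arity (illustrative subset)
TERMINALS = ["x", "R"]  # "R" stands for a random constant from [a, b]


def make_terminal(rng, a=-1.0, b=1.0):
    t = rng.choice(TERMINALS)
    return rng.uniform(a, b) if t == "R" else t


def make_function(rng, depth, d_max, method):
    op = rng.choice(list(FUNCTIONS))
    children = tuple(build(rng, depth + 1, d_max, method) for _ in range(FUNCTIONS[op]))
    return (op,) + children


def build(rng, depth, d_max, method):
    """Build one tree with the 'full' or the 'grow' method."""
    if depth == d_max:                      # d = D_max: terminals only
        return make_terminal(rng)
    if method == "full":                    # d < D_max: functions only
        return make_function(rng, depth, d_max, method)
    # grow, d < D_max: choose uniformly from F ∪ T
    n_f, n_t = len(FUNCTIONS), len(TERMINALS)
    if rng.random() < n_f / (n_f + n_t):
        return make_function(rng, depth, d_max, method)
    return make_terminal(rng)


def depth_of(tree):
    """Depth of the deepest leaf; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(c) for c in tree[1:])


def ramped_half_and_half(pop_size, d_max, rng):
    """First half built with 'full', second half with 'grow'; limits cycle 2..d_max."""
    depths = range(2, d_max + 1)
    pop = []
    for i in range(pop_size):
        method = "full" if i < pop_size // 2 else "grow"
        pop.append(build(rng, 0, depths[i % len(depths)], method))
    return pop
```

For example, `ramped_half_and_half(100, 6, random.Random(0))` yields 50 full trees and 50 grow trees with depth limits from 2 to 6, which gives the initial population a mix of shapes and sizes.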

9 Fitness Functions
- Relative squared error
- The number of outputs that are within a given percentage of the correct value
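The transcript does not preserve the formulas for these two measures, so the Python sketch below rests on assumptions: the relative squared error is taken to be the sum of squared errors divided by the sum of squared deviations of the targets from their mean, and a "hit" is an output whose absolute error is at most the given percentage of the correct value. The function names are illustrative.

```python
def relative_squared_error(y_true, y_pred):
    """Sum of squared errors, normalised by the spread of the targets (assumed form)."""
    mean = sum(y_true) / len(y_true)
    numerator = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    denominator = sum((t - mean) ** 2 for t in y_true)
    return numerator / denominator


def hits(y_true, y_pred, percent):
    """Count the outputs that lie within percent % of the correct value."""
    return sum(
        1
        for t, p in zip(y_true, y_pred)
        if abs(p - t) <= abs(t) * percent / 100.0
    )


# Usage on the Kepler data: candidate function f(A) = A ** 1.5.
A = [0.39, 0.72, 1.00, 1.52, 5.20, 9.53, 19.1]
P = [0.24, 0.61, 1.00, 1.84, 11.9, 29.4, 83.5]
PRED = [a ** 1.5 for a in A]
```

The first measure is an error to be minimised, while the second is a count to be maximised, so a GP implementation has to treat the two with opposite signs when comparing individuals.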

10 Selection (1/2)
Fitness proportional (roulette wheel) selection
- The roulette wheel can be constructed as follows.
- Calculate the total fitness of the population: F = f_1 + ... + f_pop_size, where f_k is the fitness of chromosome v_k.
- Calculate the selection probability p_k for each chromosome v_k: p_k = f_k / F.
- Calculate the cumulative probability q_k for each chromosome v_k: q_k = p_1 + ... + p_k.

11 Procedure: Proportional_Selection
- Generate a random number r from the range [0, 1].
- If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_(k-1) < r ≤ q_k.

k     p_k         q_k
1     0.082407    0.082407
2     0.110652    0.193059
3     0.131931    0.324989
4     0.121423    0.446412
5     0.072597    0.519009
6     0.128834    0.647843
7     0.077959    0.725802
8     0.102013    0.827802
9     0.083663    0.911479
10    0.088521    1.000000
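The procedure amounts to a search in the list of cumulative probabilities. The Python sketch below uses the p_k values from the table; the function names are illustrative, and the q_k values are recomputed as running sums, so they can differ from the printed table in the last digits.

```python
import bisect
import itertools
import random

# Selection probabilities p_k from the table (k = 1..10).
P = [0.082407, 0.110652, 0.131931, 0.121423, 0.072597,
     0.128834, 0.077959, 0.102013, 0.083663, 0.088521]


def selection_probabilities(fitnesses):
    """p_k = f_k / F, where F is the total fitness (all f_k assumed non-negative)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def cumulative(probs):
    """q_k = p_1 + ... + p_k."""
    return list(itertools.accumulate(probs))


def select(q, r):
    """Return the 1-based index k of the first chromosome with r <= q_k."""
    k = bisect.bisect_left(q, r)          # 0-based index of the first q_k >= r
    return min(k, len(q) - 1) + 1         # clamp guards against rounding in q[-1]


def roulette(q, rng):
    """One spin of the wheel."""
    return select(q, rng.random())


Q = cumulative(P)
```

For instance, `select(Q, 0.2)` returns 3, because q_2 = 0.193059 < 0.2 ≤ q_3. Repeating `roulette(Q, rng)` pop_size times gives the mating pool for the next generation.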

12 Selection (2/2)
Tournament selection
- Tournament size q
Ranking-based selection
- 2 ≤ [symbol lost] ≤ POP_SIZE
- 1 ≤ η+ ≤ 2 and η- = 2 - η+
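Both schemes can be sketched in a few lines of Python. The ranking part assumes the standard linear-ranking formula, in which the best individual receives an expected η+ copies and the worst η- = 2 - η+ copies; the Greek symbols are missing from the transcript, so that reading, like the function names, is an assumption. Higher fitness is treated as better here.

```python
import random


def tournament_select(population, fitness, q, rng):
    """Draw q distinct individuals at random and return the fittest of them."""
    contestants = rng.sample(population, q)
    return max(contestants, key=fitness)


def linear_ranking_probabilities(n, eta_plus):
    """Selection probabilities for ranks 0 (worst) to n - 1 (best), with n >= 2.

    Assumed form: p_i = (eta_minus + (eta_plus - eta_minus) * i / (n - 1)) / n,
    with eta_minus = 2 - eta_plus and 1 <= eta_plus <= 2.
    """
    eta_minus = 2.0 - eta_plus
    return [
        (eta_minus + (eta_plus - eta_minus) * i / (n - 1)) / n
        for i in range(n)
    ]


def ranking_select(population, fitness, eta_plus, rng):
    """Sort by fitness (worst first) and sample one individual by rank probability."""
    ranked = sorted(population, key=fitness)
    weights = linear_ranking_probabilities(len(ranked), eta_plus)
    return rng.choices(ranked, weights=weights, k=1)[0]
```

A larger tournament size q or a larger η+ increases the selection pressure; with η+ = 1 ranking selection degenerates to uniform random choice, and with q equal to the population size the tournament always returns the best individual.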

13 GP Flowchart
[Figure: flowcharts of the GA loop and the GP loop]

14 Bloat
Bloat = "survival of the fattest", i.e., the tree sizes in the population increase over time.
The reasons are the subject of ongoing research and debate.
Countermeasures are needed, e.g.
- Prohibiting variation operators that would deliver "too big" children
- Parsimony pressure: a penalty for being oversized
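Both countermeasures need a measure of tree size. The Python sketch below assumes trees stored as nested tuples, a size measured in nodes, and a linear penalty with a hypothetical weight `alpha`; the slides do not fix the form of the penalty term, so this is only one possible choice.

```python
def size(tree):
    """Number of nodes in a tree stored as (operator, child, ...) tuples."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def penalized_error(error, tree, alpha=0.01):
    """Parsimony pressure: add a penalty proportional to tree size (lower is better)."""
    return error + alpha * size(tree)


def accept_child(tree, max_size):
    """Size limit: reject children that a variation operator made too big."""
    return size(tree) <= max_size


# Two trees that compute the same function x * x; the second carries dead weight.
SMALL = ("*", "x", "x")
LARGE = ("+", ("*", "x", "x"), ("-", "x", "x"))
```

With equal raw error, `penalized_error` prefers `SMALL` over `LARGE`. The weight `alpha` trades accuracy against size, which is one way to run the "effects of the penalty term" experiments.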

16 Experiments
At least three problems (+ your own data)
Various experimental setups
- Termination condition: maximum_generation
- 2 models × 3 settings × 20 runs
- Polynomial and general
- Effects of the penalty term
- Selection methods and their parameters
- Crossover p_c and mutation p_m

17 Results
For each problem
- Result table and your analysis
- Present the optimal function.
- Readable form and predicted function graph with data
- Draw a learning curve for the run where the best solution was found.
- You can draw all learning curves in one plot.

             Polynomial                       General
             Average ± SD    Best    Worst    Average ± SD    Best    Worst
Setting 1
Setting 2
Setting 3
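Each cell group of the table summarises the final errors of the repeated runs for one model and one setting. A minimal Python sketch of that summary follows, assuming lower error is better and using the sample standard deviation; the function name and the numbers in the usage line are placeholders, not results.

```python
import statistics


def summarize(final_errors):
    """Average, SD, best, and worst of the final errors of repeated runs."""
    return {
        "average": statistics.mean(final_errors),
        "sd": statistics.stdev(final_errors),   # sample SD; needs at least 2 runs
        "best": min(final_errors),              # lowest error
        "worst": max(final_errors),             # highest error
    }


def format_row(setting, summary):
    """One table row in the 'Average ± SD  Best  Worst' layout."""
    return (f"{setting:10s} {summary['average']:.4f} ± {summary['sd']:.4f}  "
            f"{summary['best']:.4f}  {summary['worst']:.4f}")


if __name__ == "__main__":
    # Placeholder values standing in for the final errors of repeated runs.
    print(format_row("Setting 1", summarize([1.0, 2.0, 3.0])))
```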

18 [Figure: learning curve; x-axis: Generation, y-axis: Fitness (Error)]

19 References
Source codes
- GP libraries (C, C++, JAVA, …)
- MATLAB toolbox
Web sites
- http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
- http://cs.gmu.edu/~eclab/projects/ecj/
- http://www.geneticprogramming.com/GPpages/software.html
- …

20 Pay Attention!
Due: May 3, 2005
Submission
- Source code and executable file(s)
- Proper comments in the source code
- Via e-mail
- Report: hardcopy!!
- Running environments
- Results for many experiments with various parameter settings
- Analysis and explanation of the results in your own way

